fixed variable name issues in hmm_kernel and added documentation #105

Merged 2 commits on Apr 19, 2023
43 changes: 31 additions & 12 deletions utils/prediction/cvfolds.m
@@ -1,11 +1,30 @@
function foldsi = cvfolds(Y,CVscheme,allcs,perm)
% allcs can be a N x 1 vector with family memberships, an (N x N) matrix
% with family relationships, or empty.
% perm: whether to permute subjects when assigning them to folds or not (if
% not, they will be assigned (accounting for family structure) in the order
% they appear
function foldsi = cvfolds(Y, CVscheme, allcs, perm)
% foldsi = cvfolds(Y, CVscheme, allcs, perm)
%
% creates (stratified) CV folds using either LOO or k-fold CV, optionally
% accounting for family structure
%
% INPUT:
% Y: data to be split into folds. If this is a vector
% with more than 4 unique values, it is treated as one
% continuous variable. If this is a matrix, its columns
% are treated as classes and the folds are automatically
% stratified
% CVscheme: CV scheme to be applied, 0 for LOO, k>1 for k-fold CV
% allcs: can be an N x 1 vector with family memberships, an (N x N)
% matrix with family relationships, or empty.
% perm: only relevant when using family structure; whether to
% permute subjects when assigning them to folds (if 0,
% subjects are assigned to folds in the order they
% appear). When not using family structure, subjects are
% always randomly assigned
%
% OUTPUT:
% foldsi: cell array of k CV folds, each containing the indices
% of Y assigned to that fold
%
% Diego Vidaurre
% edits Christine Ahrends
% edits Christine Ahrends, 2023

if nargin<3, allcs = []; end
is_cs_matrix = (size(allcs,2) == 2);
@@ -16,11 +35,6 @@

[N,q] = size(Y);

if perm==1
indperm = randperm(N);
Y = Y(indperm,:);
end

if CVscheme==0, nfolds = N;
else nfolds = CVscheme;
end
@@ -59,6 +73,11 @@
foldsDONE = false(nfolds,1); foldsUSED = false(nfolds,1);
j = 1;

if perm==1
indperm = randperm(N);
Y = Y(indperm,:);
end

while j<=N
if grotDONE(j), j = j+1; continue; end
Jj = j;
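For orientation, a minimal usage sketch of the updated cvfolds signature (a hypothetical example; the data, fold count, and variable names are illustrative, not part of the PR):

Y = randn(100, 1);               % >4 unique values, so treated as one continuous variable
foldsi = cvfolds(Y, 10, [], 1);  % CVscheme = 10 -> 10-fold CV, no family structure
test_idx = foldsi{1};            % indices of Y assigned to the first fold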
1 change: 0 additions & 1 deletion utils/prediction/hmm_gradient.m
@@ -3,7 +3,6 @@
%
% computes the gradient of HMM log-likelihood for time series X (single
% subject/session) with respect to specified parameters
% for use with HMM-MAR toolbox (https://github.com/OHBA-analysis/HMM-MAR)
%
% INPUT:
% X: example data (timeseries of a single subject/session, in
37 changes: 19 additions & 18 deletions utils/prediction/hmm_kernel.m
@@ -1,10 +1,9 @@
function [K, feat, Dist] = hmm_kernel(X_all, hmm, options)
% [K, feat, Dist] = hmm_kernel(X_all, hmm, options)
function [Kernel, feat, Dist] = hmm_kernel(X_all, hmm, options)
% [Kernel, feat, Dist] = hmm_kernel(X_all, hmm, options)
%
% hmm_kernel computes a kernel and feature matrix from HMMs
% implemented for linear and Gaussian versions of Fisher kernel & naive
% Computes a kernel, feature matrix, and/or distance matrix from HMMs.
% Implemented for linear and Gaussian versions of Fisher kernel & "naive"
% kernels
% for use with HMM-MAR toolbox (https://github.com/OHBA-analysis/HMM-MAR)
%
% INPUT:
% X_all: timeseries of all subjects/sessions (cell of size samples x 1)
@@ -20,18 +19,20 @@
% + type: which type of features to compute, one of 'Fisher'
% for the Fisher kernel, 'naive' for the naive kernel, or
% 'naive_norm' for the naive normalised kernel
% + kernel: which kernel to compute, one of either 'linear' or 'Gaussian'
% + shape: which shape of kernel, either 'linear' or 'Gaussian'
% + normalisation:
% (optional) how to normalise features, e.g. 'L2-norm'
%
% OUTPUT:
% K: Kernel specified by options.type and options.kernel (matrix
% Kernel: Kernel specified by options.type and options.shape (matrix
% of size samples x samples), e.g. for options.type='Fisher'
% and options.kernel = 'linear', this is the linear Fisher kernel
% and options.shape = 'linear', this is the linear Fisher kernel
% feat: features from which kernel was constructed (matrix of size
% samples x features), e.g. for options.type='Fisher', this
% will be the gradients of the log-likelihood of each subject
% w.r.t. to the specified parameters (i.e. Fisher scores),
% w.r.t. the specified parameters (i.e. Fisher scores)
% Dist: Distance matrix for Gaussian kernel (matrix of size samples x
% samples)
%
% Christine Ahrends, Aarhus University (2022)

@@ -42,13 +43,13 @@
normalisation = 'none';
end

if ~isfield(options, 'kernel') || isempty(options.kernel)
kernel = 'linear';
if ~isfield(options, 'shape') || isempty(options.shape)
shape = 'linear';
else
kernel = options.kernel;
shape = options.shape;
end

if strcmpi(kernel, 'Gaussian')
if strcmpi(shape, 'Gaussian')
if ~isfield(options, 'tau') || isempty(options.tau)
tau = 1; % radius of the Gaussian kernel; ideally tune this in CV
else
@@ -63,7 +64,7 @@

% get features (compute gradient if requested)
for s = 1:S
[~, feat(s,:)] = hmm_gradient(X_all{s}, hmm, options); % if options.type='vectorised', this does not compute the gradient, but simply does dual estimation and vectorises the subject-level parameters
[~, feat(s,:)] = hmm_gradient(X_all{s}, hmm, options); % if options.type is 'naive' or 'naive_norm', this does not compute the gradient, but simply does dual estimation and vectorises the subject-level parameters
end
%
% NOTE: features will be in embedded space (e.g. embedded lags &
@@ -75,17 +76,17 @@
feat = bsxfun(@rdivide, feat, max(sqrt(sum(feat.^2, 1)), realmin));
end

if strcmpi(kernel, 'linear')
K = feat*feat';
if strcmpi(shape, 'linear')
Kernel = feat*feat';

elseif strcmpi(kernel, 'Gaussian')
elseif strcmpi(shape, 'Gaussian')
% get norm of feature vectors
for i = 1:S
for j = 1:S
Dist(i,j) = sqrt(sum(abs(feat(i,:)-feat(j,:)).^2)).^2;
end
end
K = exp(-Dist/(2*tau^2));
Kernel = exp(-Dist/(2*tau^2));
end

end
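A side note on the Gaussian branch above: squaring the square root leaves Dist(i,j) as the plain squared Euclidean distance between feature vectors, so the double loop can be vectorised. A sketch of the equivalent computation (assuming the Statistics and Machine Learning Toolbox is available for pdist2):

Dist = pdist2(feat, feat).^2;    % ||feat(i,:) - feat(j,:)||^2 for all pairs
Kernel = exp(-Dist/(2*tau^2));   % Gaussian (RBF) kernel with radius tau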
22 changes: 11 additions & 11 deletions utils/prediction/predictPhenotype.m
@@ -1,4 +1,4 @@
function [predictedY,predictedYD,YD,stats] = predictPhenotype (Yin,Din,options,varargin)
function [predictedY,predictedYD,YD,stats] = predictPhenotype(Yin,Din,options,varargin)
%
% Kernel ridge regression estimation from a distance matrix, using (stratified) LOO.
% Using this function assumes that the HMM was run once, outside the cross-validation loop
@@ -26,7 +26,7 @@
% + CVfolds - prespecified CV folds for the outer loop
% + biascorrect - whether we correct for bias in the estimation
% (Smith et al. 2019, NeuroImage)
% + kernel - which kernel to use ('linear' or 'gaussian')
% + shape - which kernel shape to use ('linear' or 'gaussian')
% + verbose - display progress?
% cs optional (no. subjects X no. subjects) dependency structure matrix
% specifying possible relations between subjects (e.g., family
@@ -57,12 +57,12 @@
% adapted to different kernels:
% Christine Ahrends, Aarhus University, 2022

if ~isfield(options, 'kernel')
kernel = 'linear';
if ~isfield(options, 'shape')
shape = 'linear';
else
kernel = options.kernel;
shape = options.shape;
end
if strcmp(kernel, 'gaussian')
if strcmp(shape, 'gaussian')
Din(eye(size(Din,1))==1) = 0;
end
[N,q] = size(Yin);
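For context on the estimation step inside the collapsed lines below: kernel ridge regression solves for dual weights on the training kernel and applies them to the train-test kernel. A minimal sketch of the idea (an assumption about the elided code, not a verbatim excerpt; K and K2 follow the naming used later in the diff, while Ytrain, beta, and alph are illustrative names):

beta = (K + alph*eye(size(K,1))) \ Ytrain;  % dual ridge solution on the (train x train) kernel K
predictedY = K2 * beta;                     % apply to the (test x train) kernel K2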
@@ -93,7 +93,7 @@
else
alpha = options.alpha;
end
if strcmp(kernel, 'gaussian')
if strcmp(shape, 'gaussian')
if ~isfield(options,'sigmafact')
sigmafact = [1/5 1/3 1/2 1 2 3 5];
else
@@ -242,12 +242,12 @@
Nji = length(Qji);
QD2 = QDin(QJ,Qji);

if strcmp(kernel, 'gaussian')
if strcmp(shape, 'gaussian')
sigmabase = auto_sigma(QD);
sigma = sigmf * sigmabase;
K = gauss_kernel(QD,sigma);
K2 = gauss_kernel(QD2,sigma);
elseif strcmp(kernel, 'linear')
elseif strcmp(shape, 'linear')
K = QD;
K2 = QD2;
end
@@ -271,13 +271,13 @@
alph = alpha(ialph);
Dii = D(ind,ind); D2ii = D2(:,ind);

if strcmp(kernel, 'gaussian')
if strcmp(shape, 'gaussian')
sigmf = sigmafact(isigm);
sigmabase = auto_sigma(D);
sigma = sigmf * sigmabase;
K = gauss_kernel(Dii,sigma);
K2 = gauss_kernel(D2ii,sigma);
elseif strcmp(kernel, 'linear')
elseif strcmp(shape, 'linear')
K = Dii;
K2 = D2ii;
sigmf = NaN;
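Finally, a hypothetical end-to-end sketch connecting the two renamed options (the Dist output of hmm_kernel is used as the distance matrix Din; note that predictPhenotype compares with case-sensitive strcmp, so 'gaussian' must be lower case there, while hmm_kernel uses case-insensitive strcmpi):

options_k = struct('type', 'Fisher', 'shape', 'Gaussian');
[~, ~, Dist] = hmm_kernel(X_all, hmm, options_k);    % subjects x subjects distance matrix
options_p = struct('shape', 'gaussian');
[predictedY, ~, ~, stats] = predictPhenotype(Yin, Dist, options_p);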