Description
I am using your fda package to apply FPCA to data in which the number of measurements varies from patient to patient. Before calling pca.fd, I build the functional data objects from two arrays (argvals and y) whose number of rows equals the maximum number of visits and whose number of columns equals the sample size. For any patient with fewer measurements than this maximum, both the argvals and y arrays are padded with NAs after the patient's last visit.
I tried smoothing these data with the smooth.basis function from your package, but it stops as soon as it detects NAs. I then tried smooth.basis.sparse, which seems designed for this kind of data layout. However, running it also produces an error. The cause appears to be that the function builds a matrix with ncol = dim(data)[2], where data is not an argument of the function. I have rewritten the function, replacing the data and time arguments with y and argvals:
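library(fda)

smooth.basis.sparse = function (argvals, y, fdParobj, fdnames = NULL, covariates = NULL,
                                method = "chol", dfscale = 1)
{
  # Extract the basis from whichever kind of object was supplied
  if (is.fdPar(fdParobj)) {
    basisobj = fdParobj$fd$basis
  } else if (is.fd(fdParobj)) {
    basisobj = fdParobj$basis
  } else if (is.basis(fdParobj)) {
    basisobj = fdParobj
  } else {
    stop("fdParobj is not a fdPar, fd, or a basis object.")
  }
  # One column of coefficients per subject (column of y)
  coefs = matrix(0, nrow = basisobj$nbasis, ncol = ncol(y))
  for (i in seq_len(ncol(y))) {
    curve = y[, i]
    # Keep only this subject's observed visits (drop the NA padding)
    keep = !is.na(curve) & !is.na(argvals[, i])
    # Pass fdParobj (not basisobj) so any roughness penalty is kept, and name
    # covariates/method: passed positionally they would land in wtvec and fdnames
    curve.smooth = smooth.basis(argvals[keep, i], curve[keep], fdParobj,
                                covariates = covariates, method = method)
    coefs[, i] = curve.smooth$fd$coefs
  }
  datafd = fd(coefs, basisobj, fdnames)
  return(datafd)
}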
I also added the index i to argvals in the call to smooth.basis so that measurement times can differ from one patient to another. However, because nbasis equals 4 and some patients have only 2 measurements, smooth.basis fails with an error stating that the number of basis functions exceeds the number of points to be smoothed.
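In least-squares terms, 4 basis functions and 2 points give a 4 x 4 normal matrix of rank at most 2, so the coefficients are not identifiable; adding a roughness penalty makes the system solvable. The base-R sketch below illustrates that mechanism only. The visit times and values are hypothetical, and the second-difference penalty on the coefficients is a P-spline-style stand-in, not fda's own integrated squared second-derivative penalty.

```r
library(splines)  # ships with every R installation

# Hypothetical patient with only two visits (illustrative values only)
t_obs <- c(1, 4)
y_obs <- c(2.0, 3.5)

# Cubic B-spline basis with 4 functions on [0, 10] and no interior knots,
# analogous to nbasis = 4, norder = 4 in fda; B is a 2 x 4 matrix
B <- matrix(bs(t_obs, df = 4, degree = 3, intercept = TRUE,
               Boundary.knots = c(0, 10)), nrow = length(t_obs))

# Unpenalized least squares: the 4 x 4 normal matrix B'B has rank 2,
# so the 4 coefficients are not identifiable from 2 points
XtX <- crossprod(B)
rank_ols <- qr(XtX)$rank

# Penalized least squares: add lambda * P, where P is a second-difference
# penalty on the coefficients (stand-in for a second-derivative penalty);
# lambda = 0.1 is an arbitrary choice
D <- diff(diag(4), differences = 2)
P <- crossprod(D)
lambda <- 0.1
A <- XtX + lambda * P
rank_pen <- qr(A)$rank

# The penalized system is nonsingular, so the coefficients are unique
coef_pen <- solve(A, crossprod(B, y_obs))
fitted <- drop(B %*% coef_pen)
print(c(rank_ols = rank_ols, rank_pen = rank_pen))
print(fitted)
```

Because a straight line through both visits has zero second differences, the penalized fit reproduces the two observations exactly. fda's second-derivative penalty (Lfdobj = 2) also leaves straight lines unpenalized, which suggests why a positive lambda in fdPar should make the two-visit case solvable in smooth.basis.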
My question is therefore: is there a way to use pca.fd on data where patients have different measurement times and different numbers of measurements? With such data, the argvals and y arrays have size max_number_of_measurements x number_of_subjects, and any subject with fewer measurements than the maximum is padded with NAs after the last measurement in both arrays. Perhaps I am applying the function incorrectly; I would be glad to hear your advice on how to apply pca.fd correctly to this kind of data.
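A possible end-to-end pipeline, sketched under stated assumptions: simulated padded arrays (hypothetical data, every patient with at least two distinct visit times), each patient smoothed separately with smooth.basis using an fdPar with a positive lambda (the value 1 is arbitrary), and the resulting coefficients combined into a single fd object for pca.fd. It assumes that smooth.basis accepts fewer points than basis functions once lambda > 0.

```r
library(fda)

set.seed(1)
n_subj   <- 20   # hypothetical sample size
max_vis  <- 6    # hypothetical maximum number of visits
rangeval <- c(0, 10)

# Simulated padded layout: max_vis x n_subj, NA after each patient's last visit
argvals <- matrix(NA_real_, nrow = max_vis, ncol = n_subj)
y       <- matrix(NA_real_, nrow = max_vis, ncol = n_subj)
for (i in seq_len(n_subj)) {
  n_i <- sample(2:max_vis, 1)                        # at least 2 visits per patient
  t_i <- sort(runif(n_i, rangeval[1], rangeval[2]))  # patient-specific times
  argvals[seq_len(n_i), i] <- t_i
  y[seq_len(n_i), i] <- sin(t_i / 3) + rnorm(1) + rnorm(n_i, sd = 0.1)
}

# Cubic B-splines with 4 basis functions plus a second-derivative penalty;
# lambda = 1 is an arbitrary choice, not a tuned value
basisobj <- create.bspline.basis(rangeval, nbasis = 4, norder = 4)
fdParobj <- fdPar(basisobj, Lfdobj = 2, lambda = 1)

# Smooth each patient using only its observed (non-NA) visits
coefs <- matrix(0, nrow = basisobj$nbasis, ncol = n_subj)
for (i in seq_len(n_subj)) {
  keep <- !is.na(argvals[, i]) & !is.na(y[, i])
  fit  <- smooth.basis(argvals[keep, i], y[keep, i], fdParobj)
  coefs[, i] <- fit$fd$coefs
}
datafd <- fd(coefs, basisobj)

# Functional PCA on the combined fd object
pca <- pca.fd(datafd, nharm = 2)
print(pca$varprop)
```

Smoothing each patient separately borrows no information across patients, so for patients with only two or three visits the shape of the fitted curve is largely dictated by the penalty (with two visits, a straight line).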
Thanks a lot for your help!