Using fda with unequal number of measurement #7
Comments
First of all, many thanks for pointing out the error. You are entirely correct about what we should have had; I've updated the function, also to account for three-dimensional arrays for y. As for your question: pca.fd assumes you already have your data in functional form. However, there are techniques for the type of situation you describe. They are based on first smoothing an empirical covariance matrix computed from the observations that you have at each pair of time points. pcaPACE.R in this package does some of this, although we need to improve its documentation. fdapace is another package available on CRAN that carries this out as well, and you may find similar functionality in 'refund'.
Hi, and thanks a lot for your answer. Just a few small remarks: I think a closing parenthesis is missing on line 83 of smooth.basis.sparse.R. Also, 'time' on lines 79 and 88 should be replaced by 'argvals' if the visit scheme is identical for all subjects. If the visit scheme is allowed to vary, I think 'time' should be defined as 'time = argvals[,i]' inside the loop over i. Our main reason for using the fda package is that, in the computational speed tests we ran, it was much faster than the other FPCA packages we tried. In our work we need to compute many FPCAs, so fda was our preferred choice. The only challenge we are facing is using fda with unequal visit schemes. To my understanding, smooth.basis.sparse cannot handle this situation: I still get an error when the number of points to be smoothed is smaller than the size of the spline basis I defined. If you have any guidance on how to use the fda package with such data, I would be really happy to hear it. Thanks a lot for your help!
Well, that's a good lesson in always checking that your code works! It should be fixed now. The code assumes a common set of time points for argvals, but accounts for NAs in the values of y for each curve. We would need a list structure to specify completely different times for each curve. As far as your application goes, pca.fd partly gains its speed from being able to work with pre-smoothed functional data. You may be able to get somewhere by increasing the smoothing penalty, but in your example pre-smoothing would still mean smoothing two observations, for which the answer would be a straight line, and not a very stable one at that. I doubt it would give you very good principal components, so I would recommend using something like PACE in this case. I suspect that the implementation in the refund package is considerably faster than the one in fdapace, or you might look at our implementation. For ours, the steps at a high level are: (1) smooth.sparse.mean to get a mean function; (2) covPACE to get the covariance of the residuals at every pair of time points that are jointly measured in at least one function; (3) pcaPACE for the PCA. These do have the advantage of working with basis coefficients, so they may be faster to compute.
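To illustrate the "list structure" idea, here is a minimal base R sketch that turns NA-padded matrices (one column per subject) into per-subject lists of times and values. The toy data and the names `Lt`, `Ly`, and `to_list` are assumptions made for illustration only; they are not part of the fda package. Lists of this shape are, as far as I know, the kind of input that `fdapace::FPCA` takes through its `Ly` and `Lt` arguments, but check that package's documentation before relying on it.

```r
# Toy NA-padded data (illustrative): at most 4 visits (rows), 3 subjects (columns).
argvals <- cbind(c(0, 1, 2, 3),
                 c(0, 2, NA, NA),
                 c(0.5, 1.5, 3, NA))
y <- cbind(c(1.0, 1.2, 1.9, 2.4),
           c(0.8, 1.5, NA, NA),
           c(1.1, 1.3, 2.2, NA))

# Keep only visits where both the time and the value are observed.
keep <- !is.na(argvals) & !is.na(y)

# Hypothetical helper: one vector per subject, NAs dropped.
to_list <- function(m, keep) {
  lapply(seq_len(ncol(m)), function(i) m[keep[, i], i])
}

Lt <- to_list(argvals, keep)  # observation times per subject
Ly <- to_list(y, keep)        # measurements per subject

lengths(Lt)  # number of observed visits per subject
```

Each subject keeps only its own visit times, so no padding is needed downstream.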
I am currently using your fda package to apply FPCA to data where the number of measurements per patient can vary. Before calling pca.fd, I therefore build the functional objects from input arrays in which the number of rows equals the maximum number of visits and the number of columns equals the sample size. For any patient who has fewer measurements than this maximum, both the argvals and y arrays are filled with NAs after the last visit.
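For concreteness, the padded layout described above can be sketched in base R as follows. The visit times, values, and the helper `pad` are made-up illustrations, not data or code from this issue.

```r
# Hypothetical per-patient visit times and measurements (illustrative only).
times  <- list(c(0, 1, 2, 3), c(0, 2), c(0.5, 1.5, 3))
values <- list(c(1.0, 1.2, 1.9, 2.4), c(0.8, 1.5), c(1.1, 1.3, 2.2))

max_visits <- max(lengths(times))

# Pad a vector with trailing NAs up to the maximum number of visits.
pad <- function(x) c(x, rep(NA_real_, max_visits - length(x)))

argvals <- sapply(times, pad)   # max_visits x n_patients matrix of times
y       <- sapply(values, pad)  # max_visits x n_patients matrix of values

dim(y)
```

Rows index visits and columns index patients, with NAs after each patient's last visit.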
I have tried smoothing these data with the smooth.basis function from your package, but it fails as soon as it detects NAs. I then tried smooth.basis.sparse, which seems suited to this kind of data scheme. However, running it also ends in an error. The cause seems to be that the function tries to build a matrix with ncol = dim(data)[2], where data is not an argument of the function. I have rewritten the function, replacing the data and time arguments with y and argvals:
I have also added the index i to argvals when calling smooth.basis, so that the measurement times can vary from one patient to another. However, because nbasis equals 4 and some patients have only 2 measurements, smooth.basis fails with an error because the number of basis functions exceeds the number of points to be smoothed.
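One way to see in advance which patients would hit this problem is to count the non-missing observations per column and compare the counts with nbasis. This is a minimal base R sketch on toy data; `n_obs` and `too_few` are illustrative names.

```r
nbasis <- 4

# Toy NA-padded measurements: 5 visits max (rows), 3 patients (columns).
y <- cbind(c(1.0, 1.2, 1.9, 2.4, 2.6),
           c(0.8, 1.5, NA, NA, NA),
           c(1.1, 1.3, 2.2, NA, NA))

n_obs   <- colSums(!is.na(y))      # observed points per patient
too_few <- which(n_obs < nbasis)   # patients with fewer points than basis functions

too_few
```

Patients flagged in `too_few` have fewer points than basis functions, so an unpenalised per-patient fit with that basis is underdetermined for them.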
Hence my question: is there any way to apply the pca.fd function to data where each patient has different measurement times and a different number of measurements? With this kind of data, the argvals and y arrays would be of size max_number_of_measurements x number_of_subjects, and any subject with fewer measurements than the maximum would be filled with NAs after the last measurement in both arrays. Maybe the way I have tried to apply the function is wrong, and I would be happy to get your insights on how to correctly apply pca.fd to this kind of data.
Thanks a lot for your help!