collinearity error with control vs treatment test for multiple subjects #9

Open
shobhitagrawal1 opened this issue Oct 15, 2023 · 4 comments

Comments

@shobhitagrawal1

Hi,
Really interesting work and really thankful for the general ease of use!
My data has several subjects, each belonging to either control or treatment, so the formula I am trying is
lemur(sce, design = ~ subject + condition, n_embedding = 30, test_fraction = 0.5)
However, I am getting this error:

Error in handle_design_parameter(design, data, col_data) :
The model matrix seems degenerate ('matrix_rank(design_matrix) < ncol(design_matrix)'). Some columns are perfectly collinear. Did you maybe include the same coefficient twice?

My understanding is that the one-hot encodings of control and treatment are being flagged as collinear. Could you please tell me how to run a typical multi-subject, two-condition analysis (treating the subjects as biological replicates)?

I appreciate any help.
Thank you,
shobhit

@const-ae
Owner

Hi shobhit,

thank you :)

To fit a multi-subject two-condition analysis, set the design to ~ condition (i.e., drop the subject). This fits a single coefficient explaining the treatment effect for each gene.

If the subject effects are so strong that corresponding cells from different subjects are still not aligned after calling align_by_grouping or align_harmony, you can call either method with the argument alignment_design = ~ condition + subject or alignment_design = ~ condition * subject to make the alignment more flexible. However, I advise using a different design and alignment_design only if absolutely necessary, as it complicates the interpretation of the effects.

Best,
Constantin
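
The error can be reproduced outside lemur. When every subject belongs to exactly one condition, the condition column of the model matrix is a sum of subject columns, so ~ subject + condition is rank deficient while ~ condition is not. A minimal base R sketch, using hypothetical toy metadata (four subjects with three cells each; the subject names s1 to s4 and the condition labels are made up for illustration):

```r
# Hypothetical toy metadata: each subject appears in exactly one condition
col_data <- data.frame(
  subject   = factor(rep(c("s1", "s2", "s3", "s4"), each = 3)),
  condition = factor(rep(c("control", "control", "treatment", "treatment"), each = 3))
)

# Design from the original question: intercept + 3 subject columns + 1 condition column
full_design <- model.matrix(~ subject + condition, data = col_data)

# Design suggested in the reply: intercept + 1 condition column
reduced_design <- model.matrix(~ condition, data = col_data)

# Compare the matrix rank with the number of columns
cat("full:    rank", qr(full_design)$rank, "of", ncol(full_design), "columns\n")
cat("reduced: rank", qr(reduced_design)$rank, "of", ncol(reduced_design), "columns\n")
```

In this toy example the conditiontreatment column equals subjects3 + subjects4, so the full design has rank 4 with 5 columns, which is the kind of perfect collinearity the error message reports. The reduced design has full rank.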

@shobhitagrawal1
Author

Dear Constantin,
Thank you very much for the prompt reply, much appreciated.
I was also thinking of using just condition for the fit, together with align_by_grouping. My only hesitation concerns the replicates that the neighborhood analysis needs: will that still be possible without the replicates appearing in the design matrix?

thank you once again
shobhit

@const-ae
Owner

Yes. The way the replicates are specified is through the group_by argument in find_de_neighborhoods. Here you would set group_by = vars(subject, condition).
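
Putting the advice from this thread together, the workflow might look roughly like the sketch below. It assumes that sce has colData columns named subject and condition, and that the condition levels are called "control" and "treatment" (the level names are an assumption; adjust them to your data). The wrapper function name is hypothetical, and the alignment step is optional.

```r
library(lemur)

# Hypothetical wrapper; `sce` is assumed to be a SingleCellExperiment whose
# colData contains the columns `subject` and `condition`
run_two_condition_analysis <- function(sce) {
  # Fit with the condition only, so the design matrix has full rank
  fit <- lemur(sce, design = ~ condition, n_embedding = 30, test_fraction = 0.5)

  # Optional: align corresponding cells across samples
  fit <- align_harmony(fit)

  # Treatment effect per gene and cell; level names are assumptions
  fit <- test_de(fit, contrast = cond(condition = "treatment") - cond(condition = "control"))

  # Replicates enter here through group_by, not through the design
  find_de_neighborhoods(fit, group_by = vars(subject, condition))
}
```

The subject column is used only to form the pseudobulk groups for the neighborhood test, so it does not need to appear in the design formula.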

@shobhitagrawal1
Author

Thanks once again! I will give it a try and get back to you.
