I conducted the analysis; here are the setup and results.
Setup: we consider the semi-synthetic dataset. In this dataset, clusters A and B have a known sample stratification, and there is no stratification for the remaining cells. To mimic differences in composition, we apply the following preprocessing steps to generate the data:
1. Subcluster A (resp. B) into two subclusters.
2. Attribute a random binary value $c_s$ to each sample $s$, which designates one of the two subclusters.
3. In each sample $s$, discard q% of the cells belonging to subcluster $c_s$.
This aims to mimic potentially strong differences in cell composition per sample.
The aim of the experiment is to (i) verify that the distance matrix is not affected by these compositional differences and (ii) inspect the pertinence of the uncertainty estimates (see #11).
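For concreteness, the downsampling procedure described above could be sketched as follows. This is only a minimal illustration, not the code used for the experiment: the function name `downsample_subclusters`, its arguments, and the convention of labelling cells outside the stratified cluster with `-1` are all assumptions made for this example, and it handles one cluster (A or B) at a time.

```python
import numpy as np


def downsample_subclusters(sample_ids, subcluster_ids, q, seed=0):
    """Return a boolean keep-mask over cells and the per-sample choice c_s.

    sample_ids     : sample label of each cell
    subcluster_ids : 0 or 1 for cells of the stratified cluster,
                     -1 for all other cells (hypothetical convention)
    q              : percentage (0 to 100) of cells to discard
    """
    rng = np.random.default_rng(seed)
    sample_ids = np.asarray(sample_ids)
    subcluster_ids = np.asarray(subcluster_ids)

    keep = np.ones(sample_ids.shape[0], dtype=bool)
    assignment = {}
    for s in np.unique(sample_ids):
        # Random binary value c_s picking one of the two subclusters.
        c_s = int(rng.integers(0, 2))
        assignment[s.item()] = c_s

        # Cells of sample s that belong to subcluster c_s.
        candidates = np.flatnonzero((sample_ids == s) & (subcluster_ids == c_s))

        # Discard q% of those cells, chosen uniformly without replacement.
        n_drop = int(round(q / 100 * candidates.size))
        if n_drop > 0:
            dropped = rng.choice(candidates, size=n_drop, replace=False)
            keep[dropped] = False
    return keep, assignment
```

Applying the returned mask to the cell axis of the dataset gives the downsampled data for a given q. Restricting the loop to a single sample would give the one-sample-at-a-time variant.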
Results
The two figures below respectively show the RF distance of the estimated matrices and the median of the propensity scores (y-axes) against the value of q (x-axis).
> In each sample $s$, discard q% of the cells belonging to subcluster $c_s$.
Do you do this for all samples, or just one sample at a time? Doing one sample at a time makes sense to me, as we are interested in seeing whether that one sample becomes out of distribution.
as a function of downsampling: