Question on input to ENVI #7

Open
soerenab opened this issue Mar 20, 2024 · 3 comments

Comments

@soerenab

Hi,

I have a question regarding the input to ENVI. I read in one of your comments in another issue that "Also, make sure the data is not logged (in the .X), since ENVI expected unlogged counts."

However, following your tutorial and inspecting the data

import scanpy as sc

# !wget https://dp-lab-data-public.s3.amazonaws.com/ENVI/sc_data.h5ad
# !wget https://dp-lab-data-public.s3.amazonaws.com/ENVI/st_data.h5ad
st_data = sc.read_h5ad('st_data.h5ad')
sc_data = sc.read_h5ad('sc_data.h5ad')

I noticed that
st_data.X.max() = 247.12617
sc_data.X.max() = 4360.0
i.e., sc data seems "raw" while spatial data seems to have been processed in some way.
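
The check above can be made a little more systematic. The following is a minimal sketch, not part of ENVI: `summarize_matrix` and its `log_max_threshold` parameter are hypothetical names, and the threshold of 20 is an assumption (log1p-transformed counts rarely exceed it), not a documented rule. It reports the maximum, whether all values are integers (suggesting raw counts), and whether the matrix might be log-transformed.

```python
import numpy as np


def summarize_matrix(X, log_max_threshold=20.0):
    """Heuristic summary of an expression matrix (dense array or scipy sparse).

    Hypothetical helper for illustration only; the threshold is an assumption.
    """
    # Sparse matrices (which have .toarray) expose their stored values via .data;
    # dense arrays are used directly.
    values = np.asarray(X.data if hasattr(X, "toarray") else X, dtype=float).ravel()
    if values.size == 0:
        return {"max": 0.0, "integer_valued": True, "possibly_logged": False}

    max_value = float(values.max())
    # Raw counts are whole numbers; normalized or logged values usually are not.
    integer_valued = bool(np.all(values == np.round(values)))
    # Assumption: non-integer data with a small maximum may be log-transformed.
    possibly_logged = (not integer_valued) and max_value < log_max_threshold
    return {
        "max": max_value,
        "integer_valued": integer_valued,
        "possibly_logged": possibly_logged,
    }
```

Applied to `sc_data.X` and `st_data.X`, a result with `integer_valued` set to true would be consistent with raw counts, while non-integer values with a large maximum (such as 247.12617) would be consistent with normalized but un-logged data.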

Now I am wondering: how should the sc and spatial data be processed before handing them to ENVI?

Thanks a lot!

@DoronHav
Collaborator

DoronHav commented Mar 20, 2024

Hello,

The datasets (which come from a public source, https://www.nature.com/articles/s41586-021-03705-x) were processed, but the counts are not in the log domain. Specifically, the counts in the spatial data were normalized by cell size and some batch correction was performed.

We recommend going through all the standard steps of single-cell analysis (library size normalization, doublet detection, etc.) for each dataset, and then passing the processed (but un-logged) data on to ENVI.
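
As an illustration of "normalized but un-logged", here is a minimal NumPy sketch of library size normalization that stops before any log transform. It is an assumption about what such a step could look like, not ENVI's own preprocessing: `normalize_library_size` and `target_sum` are hypothetical names, and scaling to the median library size is one common convention rather than something stated in this thread. In scanpy, `sc.pp.normalize_total` performs a comparable scaling.

```python
import numpy as np


def normalize_library_size(counts, target_sum=None):
    """Scale each cell (row) to the same total count, without log-transforming.

    Hypothetical helper for illustration; not part of ENVI.
    """
    counts = np.asarray(counts, dtype=float)
    library_sizes = counts.sum(axis=1, keepdims=True)

    if target_sum is None:
        # Assumption: use the median library size of non-empty cells as the target.
        positive = library_sizes[library_sizes > 0]
        target_sum = float(np.median(positive)) if positive.size else 1.0

    # Avoid division by zero for empty cells; their rows stay all-zero.
    safe_sizes = np.where(library_sizes > 0, library_sizes, 1.0)
    return counts / safe_sizes * target_sum
```

The output stays on the count scale: no `log1p` is applied, so the values remain un-logged.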

@soerenab
Author

Thanks for the reply. Just to double check: in your comment above you recommend doing library size normalization for each dataset. Yet the dissociated dataset in the tutorial seems to contain raw counts, i.e. the data has not been normalized. So should I normalize only the spatial dataset and not the dissociated one, or does it not matter whether the dissociated dataset has been normalized?

@shahrozeabbas

Both spatial and single-cell data should be processed (filtered for low-quality data, etc.), but raw counts should be used as input to the VAE. @DoronHav please correct me if I'm wrong, but I assume this is the case.
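
To make "processed but still raw counts" concrete, here is a minimal sketch of quality filtering that removes cells and genes without changing any of the remaining values. `filter_raw_counts` and its thresholds are hypothetical and chosen only for illustration. They are not taken from ENVI or from this thread, and appropriate cutoffs depend on the dataset.

```python
import numpy as np


def filter_raw_counts(counts, min_counts_per_cell=1, min_cells_per_gene=1):
    """Drop low-quality cells (rows) and rarely detected genes (columns).

    Hypothetical helper for illustration; the kept values are left untouched,
    so the result still contains raw counts.
    """
    counts = np.asarray(counts)

    # Keep cells whose total count reaches the (assumed) minimum.
    keep_cells = counts.sum(axis=1) >= min_counts_per_cell

    # Among the kept cells, keep genes detected in enough cells.
    detected = (counts[keep_cells] > 0).sum(axis=0)
    keep_genes = detected >= min_cells_per_gene

    filtered = counts[keep_cells][:, keep_genes]
    return filtered, keep_cells, keep_genes
```

The returned boolean masks can be used to subset the corresponding cell and gene annotations in the same way.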
