Multiple-Sample Integration for filtering cell ID based off Seurat #9

Open
cfayx1996 opened this issue Mar 27, 2021 · 6 comments

@cfayx1996

Hello,

Thank you for the detailed instructions; they are very helpful. I am rather new to Python and am having a hard time filtering the loom files to match my Seurat object. My Seurat object consists of 3 individual samples that are integrated together, and I have three separate loom files that were made using Velocyto. I followed all the instructions in your tutorial up to the filtering step for the loom files. After reading in the CSV files for the cell IDs, UMAP, and cluster IDs, I moved on to the Multiple-Sample Integration step, since my cellID_obs file combines 3 samples just like your example table. I used this code:

cellID_obs_sample_one = cellID_obs[cellID_obs_sample_one[0].str.contrains("sample1_")]
cellID_obs_sample_two = cellID_obs[cellID_obs_sample_two[0].str.contrains("sample2_")]
cellID_obs_sample_three = cellID_obs[cellID_obs_sample_three[0].str.contrains("sample3_")]

sample_one = sample_one[np.isin(sample_one.obs.index, cellID_obs_sample_one)]
sample_two = sample_one[np.isin(sample_two.obs.index, cellID_obs_sample_two)]
sample_two = sample_one[np.isin(sample_two.obs.index, cellID_obs_sample_two)]

When I run the first line it errors out with:

cellID_obs_sample_one = sample_obs[cellID_obs_sample_one[0].str.contrains("sample1_")]
Traceback (most recent call last):
File "", line 1, in
NameError: name 'cellID_obs_sample_one' is not defined

If I split the cellID_obs from Seurat into 3 separate lists, one per sample, and run it, I still get an error:

cellID_obs_sample1 = pd.read_csv("/home/cfay/Documents/cellID_obs_sample1.csv")

sample_one = sample_one[np.isin(sample_one.obs.index,cellID_obs_sample1["x"])]
cellID_obs_sample2 = pd.read_csv("/home/cfay/Documents/cellID_obs_sample2csv")
sample_two = sample_two[np.isin(sample_two.obs.index,cellID_obs_sample2["x"])]
cellID_obs_sample3 = pd.read_csv("/home/cfay/Documents/cellID_obs_sample3.csv")
sample_three = sample_three[np.isin(sample_three.obs.index,cellID_obs_sample3["x"])]
sample_one = sample_one.concatenate(sample_two, sample_three)
Traceback (most recent call last):
File "", line 1, in
File "/home/cfay/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 1710, in concatenate
out.obs = concat(
File "/home/cfay/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 834, in obs
self._set_dim_df(value, "obs")
File "/home/cfay/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 783, in _set_dim_df
value_idx = self._prep_dim_index(value.index, attr)
File "/home/cfay/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 810, in _prep_dim_index
value[0], (str, bytes)
File "/home/cfay/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 4101, in getitem
return getitem(key)
IndexError: index 0 is out of bounds for axis 0 with size 0

I figure that I am doing some part of this wrong, and I wanted to know if you could help me pinpoint the issue, as I want to calculate RNA velocity and use my Seurat UMAP.
Thank you for your help and consideration!

@sweebinee

sweebinee commented Apr 26, 2021

Hi @cfayx1996,
I'm a user like you, but I think I can help you.

I think you need to check your cell IDs first, especially their pattern.

cellID_obs_sample_one = cellID_obs[cellID_obs_sample_one[0].str.contrains("sample1_")]

In this line, the pandas method str.contains() looks for the given string pattern ("sample1_") in the object it is called on (cellID_obs_sample_one[0]).
It's possible that your cell ID pattern is not "sampleX_".
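
One way to check this, as a rough sketch only: count the prefixes in the Seurat cell IDs and see how many IDs are shared with the loom barcodes. The IDs below are made up, the loom-style format is an assumption about what Velocyto output may look like, and the helper names prefix_counts and overlap_report are hypothetical (they are not from the tutorial).

```python
from collections import Counter


def prefix_counts(cell_ids, sep="_"):
    """Count the text before the first separator in each cell ID."""
    return Counter(str(cid).split(sep, 1)[0] for cid in cell_ids)


def overlap_report(loom_ids, seurat_ids):
    """Report how many loom barcodes also appear in the Seurat export."""
    loom_set, seurat_set = set(loom_ids), set(seurat_ids)
    return {
        "loom": len(loom_set),
        "seurat": len(seurat_set),
        "shared": len(loom_set & seurat_set),
    }


# Made-up IDs for illustration only; real ones come from the CSV and loom files
seurat_ids = ["sample1_AAAC", "sample1_AAAG", "sample2_AAAC"]
loom_ids = ["sample1:AAACx", "sample1:AAAGx"]

counts = prefix_counts(seurat_ids)
report = overlap_report(loom_ids, seurat_ids)
print(counts)
print(report)
```

With real data you would pass something like cellID_obs[0] and sample_one.obs.index instead of the lists. If "shared" comes out as 0, filtering would leave an empty object, which may be what leads to the IndexError when concatenating.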

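And I think you need to modify the code like this, building the filter from cellID_obs itself, since cellID_obs_sample_one is not defined yet at that point (hence the NameError):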
cellID_obs_sample_one = cellID_obs[cellID_obs[0].str.contains("sample1_")]

Also, I believe the method is spelled contains, not contrains. @basilkhuder
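
Putting the two fixes together, here is a minimal sketch with made-up cell IDs. The DataFrame stands in for the CSV exported from Seurat, and obs_index is a hypothetical stand-in for sample_one.obs.index, so this is not the tutorial's exact code.

```python
import numpy as np
import pandas as pd

# Made-up cell IDs standing in for the combined CSV exported from Seurat
cellID_obs = pd.DataFrame(
    {0: ["sample1_AAAC", "sample1_AAAG", "sample2_AAAC", "sample3_TTTG"]}
)

# Build the boolean mask from cellID_obs itself (already defined),
# and use contains (not contrains); regex=False matches the text literally
mask_one = cellID_obs[0].str.contains("sample1_", regex=False)
cellID_obs_sample_one = cellID_obs[mask_one]

# Hypothetical index standing in for sample_one.obs.index
obs_index = pd.Index(["sample1_AAAC", "sample1_CCCC", "sample1_AAAG"])

# True where a barcode from the loom object is also in the Seurat subset
keep = np.isin(obs_index, cellID_obs_sample_one[0])
print(keep)
```

The keep array is what would go inside sample_one[...] to subset the loom object. Selecting column 0 explicitly makes it clear which values are being compared.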

@AAA-3

AAA-3 commented Aug 13, 2021

Hello! I tried this solution (see #13) but it did not work for me and produced a long traceback. @cfayx1996 did you have any luck?

@cfayx1996
Author

Hi @AAA-3,

I presume you are trying to filter your data for RNA Velocity?

I tried to use this tutorial for sorting in Python, but found it was a lot easier to sort and create the object in R, since I was analyzing the data with Seurat v4.

If you are using R and Seurat I would be happy to share what I did if that would help!

@AAA-3

AAA-3 commented Aug 13, 2021

Hi @cfayx1996 Yes I am :) I’d be happy to try your method out as well!! You can email or message through the forum, whichever is convenient: Ali.a.ali@fau.de

@Marc-Benoit

Hi @cfayx1996 I am having similar trouble - is there a solution using R you could post here? Thank you!

@SimoniMD

Hi! Would you be able to share this with me, too? michael.simoni@pennmedicine.upenn.edu if you'd like to email. Thanks!
