Fail to filter loom by CellID extracted from Seurat #4

Closed
ghost opened this issue Oct 19, 2020 · 9 comments

ghost commented Oct 19, 2020

Hi,
I have a problem when trying to filter a loom file by CellIDs extracted from a Seurat object.

  1. When I import the loom file with sample = anndata.read_loom("sample.loom"), I get this warning message: "Variable names are not unique. To make them unique, call .var_names_make_unique." Do I need to run sample.var_names_make_unique()?
  2. If I ignore the warning above and go on to load cellID_obs.csv and filter with sample = sample[sample[np.isin(sample.obs.index,cellID_obs[0])]], I get the following error:

KeyError Traceback (most recent call last)
~/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2890 try:
-> 2891 return self._engine.get_loc(casted_key)
2892 except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
in
----> 1 sample = sample[sample[np.isin(sample.obs.index,cellID_obs[0])]]

~/.local/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
2900 if self.columns.nlevels > 1:
2901 return self._getitem_multilevel(key)
-> 2902 indexer = self.columns.get_loc(key)
2903 if is_integer(indexer):
2904 indexer = [indexer]

~/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2891 return self._engine.get_loc(casted_key)
2892 except KeyError as err:
-> 2893 raise KeyError(key) from err
2894
2895 if tolerance is not None:

KeyError: 0

Could you help me with this?
Thank you so much!

JingleW commented Oct 20, 2020

I ran into the same problem.

  1. After running sample.var_names_make_unique(), the "Variable names are not unique" warning no longer appeared.
  2. However, when I ran sample = sample[sample[np.isin(sample.obs.index,cellID_obs[0])]], the same error occurred. It seems that sample_one.obs.index in the loom file does not match cellID_obs. I got the cellID_obs.csv for one sample from a Seurat object composed of multiple single-cell samples.
    Thank you. @basilkhuder

Owner

basilkhuder commented Oct 20, 2020

Hi to both of you!

If you could, please open your cell observation file (in either Excel or Python) and check the name of the column that holds the IDs. Use this name for subsetting the loom file (so if the column is named "x"):

sample[np.isin(sample.obs.index,cellID_obs["x"])]

Thanks!
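
The KeyError: 0 in the original report is consistent with the CSV having a named header column, so the integer 0 is not a valid column label. A minimal sketch of checking the column name before building the filter, assuming the file has a single column named "x" (the file contents and barcodes below are made up for illustration, and a pandas index stands in for sample.obs.index):

```python
import io

import numpy as np
import pandas as pd

# Stand-in for cellID_obs.csv; the header "x" and the barcodes are assumptions.
csv_text = "x\nAAACCCAAGTATGGCG\nAAACGAAGTCTAGTGT\n"
cellID_obs = pd.read_csv(io.StringIO(csv_text))

# Inspect the column names first to see what the ID column is called.
print(list(cellID_obs.columns))

# Indexing by the integer 0 fails because no column is labelled 0.
try:
    cellID_obs[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Stand-in for sample.obs.index (hypothetical barcodes).
obs_index = pd.Index(["AAACCCAAGTATGGCG", "TTTGTTGAGGCATGCA", "AAACGAAGTCTAGTGT"])

# Boolean mask: True where the cell ID appears in the named CSV column.
mask = np.isin(obs_index, cellID_obs["x"])
print(mask)
```

If the CSV was written without a header, passing header=None to pd.read_csv would make the integer label 0 valid instead.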

@basilkhuder
Owner

  1. When I import the loom file with sample = anndata.read_loom("sample.loom"), I get this warning message: "Variable names are not unique. To make them unique, call .var_names_make_unique." Do I need to run sample.var_names_make_unique()?

Go ahead and make them unique. Check this out for more information.
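
As a rough illustration of why duplicate variable names matter, the sketch below uses a plain pandas DataFrame as a stand-in for the variable annotation (the gene symbols are invented). A duplicated label makes label-based lookup ambiguous; appending a numeric suffix to the repeats removes the ambiguity. The helper make_unique is hypothetical and only mimics the idea; in practice sample.var_names_make_unique() handles this for you.

```python
import pandas as pd

# Stand-in for the variable (gene) annotation; the symbols are invented.
var = pd.DataFrame(
    {"Chromosome": ["1", "7", "X"]},
    index=["GeneA", "GeneB", "GeneA"],
)

# A repeated label means one name refers to more than one row.
has_duplicates = not var.index.is_unique
rows_for_gene_a = len(var.loc[["GeneA"]])


def make_unique(names, join="-"):
    """Hypothetical helper: keep the first occurrence of each name and
    suffix later repeats with -1, -2, ... until the name is unused."""
    taken = set(names)
    seen = set()
    counts = {}
    result = []
    for name in names:
        if name not in seen:
            seen.add(name)
            result.append(name)
            continue
        candidate = name
        while candidate in taken:
            counts[name] = counts.get(name, 0) + 1
            candidate = f"{name}{join}{counts[name]}"
        taken.add(candidate)
        result.append(candidate)
    return result


var.index = make_unique(list(var.index))
print(list(var.index))
```

Note that the duplicates here are repeated row labels in the gene annotation, for example two annotated features sharing one symbol. They are unrelated to how many cells express a gene.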

Author

ghost commented Oct 21, 2020

Hi to both of you!

If you could, please open your cell observation file (in either Excel or Python) and check the name of the column that holds the IDs. Use this name for subsetting the loom file (so if the column is named "x"):

sample[sample[np.isin(sample.obs.index,cellID_obs["x"])]]

Thanks!

Hi,
I had thought of that as well, but I got another error (see below). I wonder whether it is because the index formats differ between sample.obs.index and cellID_obs.
For example, in cellID_obs the ID is listed as "AAACCCAAGTATGGCG-1", whereas in sample.obs.index it is "AAACCCAAGTATGGCGx".

IndexError Traceback (most recent call last)
in
----> 1 sample = sample[sample[np.isin(sample.obs.index,cellID_obs["x"])]]

~/.local/lib/python3.6/site-packages/anndata/_core/anndata.py in __getitem__(self, index)
1085 def __getitem__(self, index: Index) -> "AnnData":
1086 """Returns a sliced view of the object."""
-> 1087 oidx, vidx = self._normalize_indices(index)
1088 return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
1089

~/.local/lib/python3.6/site-packages/anndata/_core/anndata.py in _normalize_indices(self, index)
1066
1067 def _normalize_indices(self, index: Optional[Index]) -> Tuple[slice, slice]:
-> 1068 return _normalize_indices(index, self.obs_names, self.var_names)
1069
1070 # TODO: this is not quite complete...

~/.local/lib/python3.6/site-packages/anndata/_core/index.py in _normalize_indices(index, names0, names1)
32 index = index[0].values, index[1]
33 ax0, ax1 = unpack_index(index)
---> 34 ax0 = _normalize_index(ax0, names0)
35 ax1 = _normalize_index(ax1, names1)
36 return ax0, ax1

~/.local/lib/python3.6/site-packages/anndata/_core/index.py in _normalize_index(indexer, index)
104 return positions # np.ndarray[int]
105 else:
--> 106 raise IndexError(f"Unknown indexer {indexer!r} of type {type(indexer)}")
107
108

IndexError: Unknown indexer View of AnnData object with n_obs × n_vars = 0 × 32285
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
layers: 'matrix', 'ambiguous', 'spliced', 'unspliced' of type <class 'anndata._core.anndata.AnnData'>
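
The "n_obs × n_vars = 0 × 32285" in the message above shows that the inner selection matched no cells, which fits the barcode mismatch described in this comment (a "-1" suffix in cellID_obs versus a trailing "x" in sample.obs.index). One possible way to reconcile the two is to reduce both sides to the bare barcode before comparing. The helper bare_barcode below is hypothetical, and the suffix patterns it strips are assumptions based only on the two example IDs quoted here:

```python
import re


def bare_barcode(cell_id: str) -> str:
    """Hypothetical helper: strip a trailing "-<digits>" or "x" so that
    both ID styles reduce to the same bare barcode."""
    return re.sub(r"(-\d+|x)$", "", cell_id)


# The two example IDs quoted in the comment above.
seurat_ids = ["AAACCCAAGTATGGCG-1"]
loom_ids = ["AAACCCAAGTATGGCGx"]

# Compare on the bare barcode rather than on the raw strings.
wanted = {bare_barcode(cell_id) for cell_id in seurat_ids}
mask = [bare_barcode(cell_id) in wanted for cell_id in loom_ids]
print(mask)
```

One caveat: if the Seurat object combines several samples, dropping the numeric suffix can make barcodes from different samples look identical, so this is only safe when filtering one sample's loom file against that same sample's IDs.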

JingleW commented Oct 21, 2020

Hi, thanks very much for all the replies.
First, I modified cellID_obs manually; the modified cellID_obs is shown below.
[screenshot: modified cellID_obs]
My sample_one.obs.index is shown below.
[screenshot: sample_one.obs.index]
Then I tried different column names in the call sample_one = sample_one[sample_one[np.isin(sample_one.obs.index,cellID_obs["x"])]].

  1. sample_one = sample_one[sample_one[np.isin(sample_one.obs.index,cellID_obs["x"])]]
    [screenshot]

  2. sample_one = sample_one[sample_one[np.isin(sample_one.obs.index,cellID_obs["y"])]]
    [screenshot]

  3. sample_one = sample_one[sample_one[np.isin(sample_one.obs.index,cellID_obs["z"])]]
    [screenshot]

It seemed the filtering worked when the IDs in cellID_obs were exactly the same as those in sample_one.obs.index. However, an IndexError still occurred. Have I missed something, or is the script wrong?

Owner

basilkhuder commented Oct 21, 2020

Just realized the typo. It should be:

sample[np.isin(sample.obs.index,cellID_obs["x"])]

Edit: In your case, I'm guessing "z" would be the column name.
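
To spell out the correction: np.isin returns a boolean array with one entry per cell, and that array is what the indexing operator expects. Wrapping the selection in a second sample[...] hands the outer call an already-subset object instead of a mask, which is what the "Unknown indexer" message complains about. A small sketch with a pandas DataFrame standing in for the AnnData object (the DataFrame, the column "z", and the barcodes are illustrative assumptions), including a count of matches as a sanity check before subsetting:

```python
import numpy as np
import pandas as pd

# Stand-in for the per-cell table; the index plays the role of sample.obs.index.
sample_obs = pd.DataFrame(
    {"n_counts": [120, 85, 240]},
    index=["AAACCCAAGTATGGCGx", "AAACGAAGTCTAGTGTx", "TTTGTTGAGGCATGCAx"],
)

# Stand-in for cellID_obs with the ID column named "z".
cellID_obs = pd.DataFrame({"z": ["AAACCCAAGTATGGCGx", "TTTGTTGAGGCATGCAx"]})

# One boolean per cell: True if its ID is listed in the "z" column.
mask = np.isin(sample_obs.index, cellID_obs["z"])

# Sanity check: zero matches usually points to mismatched ID formats.
n_matched = int(mask.sum())
print(f"{n_matched} of {len(mask)} cells matched")

# Apply the mask exactly once.
filtered = sample_obs[mask]
print(list(filtered.index))
```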

JingleW commented Oct 21, 2020

Just realized the typo. It should be:

sample[np.isin(sample.obs.index,cellID_obs["x"])]

Edit: In your case, I'm guessing "z" would be the column name.

Yes, sample_one = sample_one[np.isin(sample_one.obs.index,cellID_obs["z"])] worked.
Thank you.

Author

ghost commented Oct 21, 2020

Hi all,
Thank you so much for the suggestions. I modified the CellID.csv as JingleW suggested and ran sample[np.isin(sample.obs.index,cellID_obs["x"])]. The problem has been resolved!

AAA-3 commented Aug 13, 2021

  1. When I import the loom file with sample = anndata.read_loom("sample.loom"), I get this warning message: "Variable names are not unique. To make them unique, call .var_names_make_unique." Do I need to run sample.var_names_make_unique()?

Go ahead and make them unique. Check this out for more information.

Hello everyone! I am a first-time user of bioinformatics tools and am curious as to why we would want the variable names to be unique. According to this AnnData page, the variable names are genes. If more than one cell is expressing a gene, would we not expect repeats? And would it not be more relevant to make the obs_names unique, since that holds our CellIDs?

I am having similar issues to the OP. My post #13 outlines everything I have tried. Any clarification would be appreciated!
Thanks!!
[screenshot]
[screenshot]

This issue was closed.