Problem with hashsolo #74

Kani0n · 2022-07-19T14:17:11Z

Hi,
I just ran my Solo analysis and now I need the information about which of my cells were predicted to be doublets and so forth.
If I understood correctly, I have to use hashsolo for this, but that just runs with no logger or information about progress, and I don't seem to get anything...
I only have an h5ad file with ~800 cells. Does this just take over a day, or am I getting something wrong?
Thank you

davek44 · 2022-07-20T01:15:53Z

HashSolo is meant for working with hashing data. Are you simply attempting to call doublets? We haven’t worked with a data set with only 800 cells before. Solo benefits from larger data sets, so you might do fine with some of the linear methods like Scrublet or DoubletFinder.

Kani0n · 2022-07-20T06:26:20Z

Yes, I just want to know which of my cells are predicted to be what (doublet, singlet). Where in the output do I see this information?

davek44 · 2022-07-20T18:24:33Z

The output file is_doublets.npy contains binary doublet calls. You can see that and all of the other output files in the lines of code after this one: https://github.com/calico/solo/blob/master/solo/solo.py#L394

Kani0n · 2022-07-22T09:42:19Z

is_doublets.npy contains over 10k values. How do I know which of these are the predictions for my cells?

davek44 · 2022-07-22T18:41:38Z

If you think your data has 800 cells, but is_doublets.npy has 10k values, then it's likely the input data isn't formatted according to Solo's assumptions. Could you send me more information about how you're running Solo and the format of your input data? Is it possible the gene expression matrix is transposed relative to the typical Cell x Gene format?

Kani0n · 2022-07-26T16:05:11Z

I actually misspoke earlier.
I have an h5ad file of 12013 cells, 816 of them predicted to be doublets by AMULET. In my output folder there is, for example, the is_doublets.npy vector, but this one, as well as all the other outputs, is 10992 long. What I now want to know is which of my cells Solo predicted to be a doublet (True inside the vector).

davek44 · 2022-07-27T03:22:56Z

Are you saying that you don’t understand how to read the vector stored in is_doublets.npy? Or are you saying that you don’t understand how your 12,013 cells were filtered down to 10,992?

Kani0n · 2022-07-29T05:00:11Z

I am interested in what cells are labelled "True" and what cells are "False".
e.g:
cell1: False
cell2: True
...

davek44 · 2022-07-29T18:50:24Z

OK, you'll want to read the is_doublets.npy file using a command like the following in a Python terminal, notebook, or script:

import numpy as np
cell_doublets = np.load('is_doublets.npy')

cell_doublets will contain a numpy array with type boolean. To determine whether Solo predicts cell 1 to be a doublet, check cell_doublets[0] in the array. To determine whether Solo predicts cell 2 to be a doublet, check cell_doublets[1] in the array. And so on.
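A minimal sketch of turning that array into the per-cell listing asked for above, assuming the calls come out in the same order as the cells that were scored. `label_cells` and the toy barcodes are hypothetical names for illustration, not part of Solo. The length check matters here because the array in this thread (10,992 values) is shorter than the input (12,013 cells), so positions cannot be matched to the original cells until the filtered-out cells are identified.

```python
import numpy as np


def label_cells(barcodes, calls):
    """Pair each cell name with its boolean doublet call.

    Hypothetical helper: assumes calls[i] belongs to barcodes[i].
    """
    calls = np.asarray(calls, dtype=bool)
    if len(barcodes) != len(calls):
        # A mismatch means some cells were dropped before scoring,
        # so a position-by-position pairing would be wrong.
        raise ValueError(
            f"{len(barcodes)} cell names but {len(calls)} doublet calls"
        )
    return {name: bool(flag) for name, flag in zip(barcodes, calls)}


# Toy stand-ins for real cell names and for np.load('is_doublets.npy').
toy_barcodes = ["cell1", "cell2", "cell3"]
toy_calls = np.array([False, True, False])

labels = label_cells(toy_barcodes, toy_calls)
for name, is_doublet in labels.items():
    print(f"{name}: {is_doublet}")
```

With real data, one would presumably pass `np.load('is_doublets.npy')` as the calls and the cell names of the AnnData object that Solo actually scored (for example `adata.obs_names` after reading the h5ad with anndata) as the barcodes.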

njbernstein · 2023-10-04T02:28:32Z

@davek44 seems like we can close this out
