Issues with using bc-geometry #670

Closed
curtisd0886 opened this issue Jun 10, 2021 · 14 comments
Labels
alevin issue is primarily related to alevin

@curtisd0886

We are currently using a protocol with a barcode strategy similar to Rhapsody's, where there are three 8nt barcodes separated by two constant regions. I am trying to use the --bc-geometry flag with the following command:

salmon alevin -l ISR -i ~/Data/salmon/cell_hash -1 R1.fq.gz -2 R2.fq.gz --umi-geometry 1[51-56] --bc-geometry 1[3-8,24-29,45-50] --read-geometry 2[1-end] -o outs/ --citeseq --featureStart 0 --featureLength 15

I am getting an output table that appears to be mapping correctly; however, the cell barcodes in the table are 16 nt long instead of the 18 nt specified in the --bc-geometry argument. Is there something I am missing?
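To make the expected barcode length concrete, here is a minimal bash sketch (not alevin's own parser) that applies the 1-based, inclusive ranges from --bc-geometry 1[3-8,24-29,45-50] and --umi-geometry 1[51-56] to a made-up 56-nt read 1. The read layout, the all-G placeholder constant regions, and the function name extract_segments are hypothetical. Each range spans six positions (3 through 8 inclusive is six bases), so the three segments concatenate to an 18-nt barcode.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate 1-based, inclusive ranges ("start-end,...")
# of a sequence, mimicking how a geometry such as 1[3-8,24-29,45-50] reads.
extract_segments() {
  local seq=$1 ranges=$2 out="" range start end
  local IFS=','                 # split the range list on commas
  for range in $ranges; do
    start=${range%-*}           # text before the dash
    end=${range#*-}             # text after the dash
    # Bash substrings are 0-based: offset = start - 1, length = end - start + 1.
    out+=${seq:start-1:end-start+1}
  done
  printf '%s\n' "$out"
}

# Made-up 56-nt read 1: 2 nt, CB1, 15-nt constant, CB2, 15-nt constant, CB3, UMI.
linker=$(printf '%15s' '' | tr ' ' G)   # all-G placeholder for a constant region
r1="TTAAAAAA${linker}CCCCCC${linker}TTTTTTACGTAC"

extract_segments "$r1" "3-8,24-29,45-50"   # prints AAAAAACCCCCCTTTTTT (18 nt)
extract_segments "$r1" "51-56"             # prints ACGTAC (6-nt UMI)
```

If alevin reports 16-nt barcodes for such a geometry, the extra two bases are being dropped somewhere after the ranges are parsed.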

@k3yavi k3yavi self-assigned this Jun 10, 2021
@k3yavi
Member

k3yavi commented Jun 10, 2021

Hi @curtisd0886 ,

Thanks for raising the issue; indeed, it's weird and not expected. Is it possible to share a subset of the data to help me replicate the issue on my end and propose a solution?

@k3yavi k3yavi added the alevin issue is primarily related to alevin label Jun 10, 2021
@rob-p
Collaborator

rob-p commented Jun 10, 2021

Hi @curtisd0886,

Indeed; thanks for sharing! @k3yavi -- I think we should take a look here and at the resulting implications. We've thus far had limited access to data with barcode lengths > 16, so I think we should also check whether there are any other places where we make such assumptions.

@curtisd0886
Author

Thanks for the quick reply! I have uploaded the first million reads as well as my zipped index folder. Let me know if you need anything else.

https://drive.google.com/drive/folders/1asnEIn_J2WCjsxql3z5zfrfhxFpO-KGY?usp=sharing

@rob-p
Copy link
Collaborator

rob-p commented Jun 14, 2021

Hi @curtisd0886,

So, the issues relevant to processing this data should be resolved in the new release (v1.5.1). However, for technical reasons related to how the different modes are handled internally, we had to restrict how certain options can be combined. Specifically, the --citeseq flag can no longer be used in conjunction with the custom geometry flags. So, if you have a non-standard --citeseq geometry, the recommendation is to use the new barcode specification format (i.e. --umi-geometry, --bc-geometry and --read-geometry) along with a couple of other flags: you should explicitly provide --keepCBFraction 1.0 and a tgMap file (even if it is just a trivial one mapping each feature to itself). @k3yavi can elaborate further if I've overlooked anything.

--Rob
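For the trivial tgMap mentioned above, here is a minimal bash sketch. It assumes the features were indexed from a tab-separated file with the feature name in the first column (for example, feature name and barcode sequence); the file names features.tsv and tg_map.tsv and the function make_trivial_tgmap are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a tgMap in which every feature maps to itself.
# Assumes the input is tab-separated with the feature name in column 1
# (e.g. feature name <TAB> barcode sequence); other columns are ignored.
make_trivial_tgmap() {
  local features=$1 tgmap=$2
  # Emit "name<TAB>name" for every line with a non-empty first column.
  awk -F '\t' '$1 != "" { print $1 "\t" $1 }' "$features" > "$tgmap"
}

# Example usage (file names are placeholders):
#   make_trivial_tgmap features.tsv tg_map.tsv
# and then add to the alevin command:  --tgMap tg_map.tsv
```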

@curtisd0886
Copy link
Author

curtisd0886 commented Jun 14, 2021

Hi @rob-p ,
Thanks for the update. This will be great. I tried using version 1.5.1 today and I am still only getting 16nt barcodes. I used the command below.

sudo ~/salmon-1.5.1_linux_x86_64/bin/salmon alevin -l ISR -i ~/Data/salmon/cell_hash -1 R1.fq.gz -2 R2.fq.gz --read-geometry 2[1-end] --bc-geometry 1[3-8,24-29,45-50] --umi-geometry 1[51-56] -o /home/cndd3/Data/Multi_3/hash_1.5.1/ --citeseq --featureStart 0 --featureLength 15 --keepCBFraction 1

I made sure to get rid of the --citeseq flag, but I am not sure if I am missing something else to get it working.

Thanks for your help with this!

@rob-p
Collaborator

rob-p commented Jun 14, 2021

Ok, I'm tagging @k3yavi since I believe he tested the hotfix with the data you shared. He may have more insight into what's going on here. By the way, the command you quote above still contains the --citeseq flag, but I assume that's just a typo.

@k3yavi
Member

k3yavi commented Jun 14, 2021

Hey, try the following command. I double-checked on 1.5.1 and it seemed to give 18-length CBs:

sudo ~/salmon-1.5.1_linux_x86_64/bin/salmon alevin -l ISR -i ~/Data/salmon/cell_hash -1 R1.fq.gz -2 R2.fq.gz --read-geometry 2[1-15] --bc-geometry 1[3-8,24-29,45-50] --umi-geometry 1[51-56] -o /home/cndd3/Data/Multi_3/hash_1.5.1/ --keepCBFraction 1 --tgMap <might have to create a tsv file with feature name tab feature name>

If the program does not exit with an error when given the command you shared, then there is probably a problem with the update, as it should throw an error when the --citeseq and geometry flags are provided together.

@curtisd0886
Copy link
Author

Sorry, I copied over the old command and modified it, forgetting to remove the --citeseq flag. When I actually ran it with Salmon, I made sure the --citeseq flag was not used. I am running it again with the command you recommended and will let you know how it goes.

Thanks again for your help.

@rob-p
Copy link
Collaborator

rob-p commented Jun 14, 2021

@curtisd0886 -- I think this may be my fault. I think the pre-compiled binary I uploaded may be cut from the wrong tag. Let me fix it and report back here.

@curtisd0886
Copy link
Author

@rob-p -- not a problem at all. I can compile my own copy if you like; I was just in a rush and used the binary instead.

@rob-p
Collaborator

rob-p commented Jun 14, 2021

Ok @curtisd0886, it should be fixed now! Sorry for the mix-up. Everything else (bioconda, docker, etc.) was cut from the tag, but the pre-compiled executable was mistakenly copied over from the master branch (before the changes were merged in) rather than from the tag. I've updated the executable.

@curtisd0886
Copy link
Author

Thank you guys. The new software did the trick and I am now getting 18 nt barcodes; however, the mapping efficiency appears to have gone down significantly. Previously it was about 8% of reads; now it is about 5.3e-5%. Any ideas where the issue might be?

@k3yavi
Member

k3yavi commented Jun 19, 2021

Hi @curtisd0886 ,

I noticed that as well, and my hunch is that it's because many cellular barcodes are being filtered out based on their frequency as the CB length increases. It is probably worth providing the list of cellular barcodes to quantify externally, using the --whitelist flag.
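One rough way to build such a list, sketched in bash: extract the 18-nt barcode from each read 1 using the same positions as --bc-geometry 1[3-8,24-29,45-50], count occurrences, and keep the N most frequent. The function name top_barcodes, the output file whitelist.txt, and the value of N are hypothetical. A plain frequency cutoff is a heuristic, not alevin's own filtering, and a barcode list from a matched gene-expression library, if one exists, is likely a better input to --whitelist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the n most frequent 18-nt cell barcodes in read 1,
# one per line. Assumes a gzipped FASTQ with 4 lines per record and the
# positions from --bc-geometry 1[3-8,24-29,45-50].
top_barcodes() {
  local fastq_gz=$1 n=$2
  # Decompress, take the sequence line of each record (line 2 of 4), and
  # concatenate the three 6-nt segments; awk substr(s, start, len) is 1-based,
  # like the geometry ranges. Reads shorter than 50 nt are skipped.
  # Then count each distinct barcode, sort by count (descending), keep the top n.
  gzip -dc "$fastq_gz" |
    awk 'NR % 4 == 2 && length($0) >= 50 {
           print substr($0, 3, 6) substr($0, 24, 6) substr($0, 45, 6)
         }' |
    sort | uniq -c | sort -k1,1nr |
    awk -v n="$n" 'NR <= n { print $2 }'
}

# Example usage (5000 is a placeholder for your expected cell count):
#   top_barcodes R1.fq.gz 5000 > whitelist.txt
# then add to the alevin command:  --whitelist whitelist.txt
```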

@rob-p
Copy link
Collaborator

rob-p commented May 26, 2022

Closing this for lack of activity, but feel free to re-open if new discussion arises.

@rob-p rob-p closed this as completed May 26, 2022