Thank you for creating this useful toolkit. When running the software on a really large combined library (~200k cells to consider), I found that the bottleneck becomes the chunk_barcoded_bam.py step, and I found two possible ways to improve it.
1. Convert the cell barcode list from a list to a set, bc = set([x.strip() for x in content]), which makes checking whether a barcode exists much faster (~800 records/s -> ~100k records/s).
2. Use pysam's read.get_tag instead of iterating over the tags:
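def getBarcode(read, tag_get):
    '''
    Parse out the barcode per-read
    '''
    # Previous approach: scan every tag on the read
    # for tg in read.tags:
    #     if(tag_get == tg[0]):
    #         return(tg[1])
    # return("AA")
    try:
        # get_tag looks the tag up directly and raises KeyError if it is absent
        return read.get_tag(tag_get)
    except KeyError:
        return "AA"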
This improves the speed further, from ~100k records/s to ~130k records/s.
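To illustrate why the first change helps so much, here is a minimal sketch comparing membership checks on a list and on a set. The barcodes are synthetic and `make_barcodes` is a hypothetical helper written only for this example; `content` stands in for the lines read from the barcode file. Absolute timings will differ by machine.

```python
import timeit


def make_barcodes(n):
    """Hypothetical helper: build n unique 16-base strings (not real barcodes)."""
    bases = "ACGT"
    barcodes = []
    for i in range(n):
        chars = []
        x = i
        # Write i in base 4 so every index maps to a distinct sequence
        for _ in range(16):
            chars.append(bases[x % 4])
            x //= 4
        barcodes.append("".join(chars))
    return barcodes


# Stand-in for the lines of the barcode file (~200k cells, as in the report)
content = [b + "\n" for b in make_barcodes(200_000)]

bc_list = [x.strip() for x in content]       # original: list of barcodes
bc_set = set([x.strip() for x in content])   # proposed: set of barcodes

# The last barcode is the worst case for a linear scan of the list
query = bc_list[-1]

list_time = timeit.timeit(lambda: query in bc_list, number=100)
set_time = timeit.timeit(lambda: query in bc_set, number=100)

print(f"list lookup: {list_time:.4f} s for 100 checks")
print(f"set lookup:  {set_time:.6f} s for 100 checks")
```

A list membership check scans the elements one by one, so its cost grows with the number of barcodes, while a set uses hashing and stays roughly constant on average. That matters here because the check runs once per read.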
Thanks for the input! Would you be able to open a PR for this?
On Jun 25, 2023, at 1:16 PM, ruochiz wrote:
Thank you for creating this useful toolkit. When running the software on a really large combined libraries (~200k cells to consider), I found the bottleneck becomes the chunk_barcoded_bam.py part, and I found possible solutions to improve it.
This is now implemented in v0.6.8. Thank you very much @ruochiz for the contribution. You should be able to pip install the latest version of the software now.