Possible improvement on chunk_barcoded_bam.py #73

ruochiz · 2023-06-25T20:15:57Z

Thank you for creating this useful toolkit. When running the software on a very large combined library (~200k cells to consider), I found that the bottleneck is the chunk_barcoded_bam.py step, and I found two possible ways to improve it.

1. Convert the cell barcode list from a list to a set, bc = set([x.strip() for x in content]), which greatly speeds up checking whether a barcode exists (~800 records/s -> ~100k records/s).
2. Use pysam's read.get_tag instead of iterating over the tags:

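def getBarcode(read, tag_get):
  '''
  Parse out the barcode per-read
  '''
  # Previous approach, iterating over every tag on the read:
  # for tg in read.tags:
  # 	if(tag_get == tg[0]):
  # 		return(tg[1])
  # return("AA")
  try:
    # pysam raises KeyError when the tag is absent from the read
    return read.get_tag(tag_get)
  except KeyError:
    return "AA"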

This improves the speed from ~100k records/s to ~130k records/s.
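To illustrate why the list-to-set change matters so much, here is a minimal, self-contained sketch. It does not use the actual chunk_barcoded_bam.py code: the barcode strings, the `count_hits` helper, and the sizes are all made up for illustration. A list membership test scans the elements one by one, while a set uses a hash lookup, so the gap grows with the number of barcodes.

```python
import timeit

# Hypothetical whitelist of 200,000 synthetic barcodes (not real data).
barcodes_list = [f"BC{i:07d}" for i in range(200_000)]
barcodes_set = set(barcodes_list)

def count_hits(queries, whitelist):
    """Count how many query barcodes are present in the whitelist."""
    return sum(1 for q in queries if q in whitelist)

# 50 barcodes near the end of the whitelist plus 50 that are absent:
# both are slow cases for a linear scan over a list.
queries = [f"BC{i:07d}" for i in range(199_950, 200_050)]

list_time = timeit.timeit(lambda: count_hits(queries, barcodes_list), number=1)
set_time = timeit.timeit(lambda: count_hits(queries, barcodes_set), number=1)

print(f"list lookup: {list_time:.4f} s, set lookup: {set_time:.6f} s")
```

The exact timings depend on the machine, but both containers return the same hits; only the lookup cost differs.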


caleblareau · 2023-06-25T23:06:07Z

Thanks for the input! Would you be able to open a PR for this?

caleblareau · 2023-06-26T21:29:45Z

Now implemented in v0.6.8. Thank you very much @ruochiz for the contribution. You should be able to pip install the latest version of the software now.

caleblareau closed this as completed Jun 26, 2023
