# Processing 10x .bam files without MD tags

In [1]:
import os
import re

## Renaming chromosomes in genome .fasta files

If your `.bam` file does not have MD tags and you would like to add them, you will need the genome that the reads were aligned against. Unfortunately, the genome you [download](https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest) from the 10x website has a `GRCh38_` prefix before all human chromosomes, scaffolds, etc., which prevents samtools from finding the chromosome in the index.

In this step we remove the prefix, then run `samtools calmd` on the `.bam` file using the renamed genome.
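To see the mismatch concretely, one option is to compare the sequence names in the FASTA headers with the reference names listed in the alignment header. The following is a minimal sketch, not part of the original notebook: `fasta_names`, `sq_names`, and `missing_references` are hypothetical helpers, and the `@SQ` line in the demo is an invented example of what `samtools view -H` might print.

```python
def fasta_names(lines):
    """Return the sequence names (first token after '>') found in FASTA lines."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def sq_names(header_lines):
    """Return the SN: values from the @SQ lines of a SAM header."""
    names = []
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def missing_references(bam_names, fasta_lines):
    """Return the BAM reference names that have no matching FASTA sequence name."""
    known = set(fasta_names(fasta_lines))
    return [name for name in bam_names if name not in known]


# Demo: the FASTA header is taken from the genome; the @SQ line is an assumption.
fasta = [">GRCh38_1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF"]
header = ["@SQ\tSN:1\tLN:248956422"]
print(missing_references(sq_names(header), fasta))
```

An empty list from `missing_references` would suggest that every reference name in the alignment can be found in the genome file.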

In [4]:
GENOME_FILE = "/home/ubuntu/refdata-cellranger-GRCh38-and-mm10-3.1.0/fasta/genome.fa"
GENOME_RENAMED_FILE = "./genome_chromosome_renamed.fna"
FILE_WITH_NO_MD = "./5k_pbmc_protein_v3_possorted_genome_bam.bam"
FILE_WITH_MD = "./5k_pbmc_protein_v3_calmd.bam"

In [3]:
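# Copy the genome, stripping the `GRCh38_` prefix from FASTA header lines.
with open(GENOME_FILE, "r") as f, open(GENOME_RENAMED_FILE, "w") as g:
    for line in f:
        if line.startswith(">"):
            print(line)  # header as downloaded
            line = line.replace(">GRCh38_", ">")
            print(line)  # header as written (mm10 headers stay unchanged)
        g.write(line)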

>GRCh38_1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF

>1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF

>GRCh38_10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF

>10 dna:chromosome chromosome:GRCh38:10:1:133797422:1 REF

>GRCh38_11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF

>11 dna:chromosome chromosome:GRCh38:11:1:135086622:1 REF

>GRCh38_12 dna:chromosome chromosome:GRCh38:12:1:133275309:1 REF

>12 dna:chromosome chromosome:GRCh38:12:1:133275309:1 REF

>GRCh38_13 dna:chromosome chromosome:GRCh38:13:1:114364328:1 REF

>13 dna:chromosome chromosome:GRCh38:13:1:114364328:1 REF

>GRCh38_14 dna:chromosome chromosome:GRCh38:14:1:107043718:1 REF

>14 dna:chromosome chromosome:GRCh38:14:1:107043718:1 REF

>GRCh38_15 dna:chromosome chromosome:GRCh38:15:1:101991189:1 REF

>15 dna:chromosome chromosome:GRCh38:15:1:101991189:1 REF

>GRCh38_16 dna:chromosome chromosome:GRCh38:16:1:90338345:1 REF

>16 dna:chromosome chromosome:GRCh38:16:1:90338345:1 REF

>GRCh3

>mm10___10 dna:chromosome chromosome:GRCm38:10:1:130694993:1 REF

>mm10___10 dna:chromosome chromosome:GRCm38:10:1:130694993:1 REF

>mm10___11 dna:chromosome chromosome:GRCm38:11:1:122082543:1 REF

>mm10___11 dna:chromosome chromosome:GRCm38:11:1:122082543:1 REF

>mm10___12 dna:chromosome chromosome:GRCm38:12:1:120129022:1 REF

>mm10___12 dna:chromosome chromosome:GRCm38:12:1:120129022:1 REF

>mm10___13 dna:chromosome chromosome:GRCm38:13:1:120421639:1 REF

>mm10___13 dna:chromosome chromosome:GRCm38:13:1:120421639:1 REF

>mm10___14 dna:chromosome chromosome:GRCm38:14:1:124902244:1 REF

>mm10___14 dna:chromosome chromosome:GRCm38:14:1:124902244:1 REF

>mm10___15 dna:chromosome chromosome:GRCm38:15:1:104043685:1 REF

>mm10___15 dna:chromosome chromosome:GRCm38:15:1:104043685:1 REF

>mm10___16 dna:chromosome chromosome:GRCm38:16:1:98207768:1 REF

>mm10___16 dna:chromosome chromosome:GRCm38:16:1:98207768:1 REF

>mm10___17 dna:chromosome chromosome:GRCm38:17:1:94987271:1 REF

>mm10___17 dn
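The renaming rule can also be checked on a few header lines without reading the whole genome. This is a minimal sketch, assuming the rule is "drop `GRCh38_` when it directly follows `>`"; `strip_header_prefix` is a hypothetical helper, not something the notebook defines.

```python
def strip_header_prefix(line, prefix="GRCh38_"):
    """Remove `prefix` from a FASTA header line; return other lines unchanged."""
    marker = ">" + prefix
    if line.startswith(marker):
        return ">" + line[len(marker):]
    return line


# Header lines copied from the output above, plus one made-up sequence line.
examples = [
    ">GRCh38_1 dna:chromosome chromosome:GRCh38:1:1:248956422:1 REF",
    ">mm10___10 dna:chromosome chromosome:GRCm38:10:1:130694993:1 REF",
    "ACGTACGT",
]
for line in examples:
    print(strip_header_prefix(line))
```

Only the human header changes. The `mm10___` header and the sequence line pass through untouched, which matches the paired lines in the output.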

## Add MD tag to original BAM file

In [5]:
!samtools calmd -b $FILE_WITH_NO_MD $GENOME_RENAMED_FILE > $FILE_WITH_MD
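The `!` shell escape does not stop the notebook if samtools exits with an error. One alternative is to run the same command through `subprocess` so that a failure raises an exception. This is a minimal sketch under that assumption: `calmd_command` and `run_calmd` are hypothetical helpers, and the `runner` parameter exists only so the call can be replaced in a test.

```python
import subprocess


def calmd_command(bam_in, reference_fasta):
    """Build the argument list for the `samtools calmd -b` call used above."""
    return ["samtools", "calmd", "-b", bam_in, reference_fasta]


def run_calmd(bam_in, reference_fasta, bam_out, runner=subprocess.run):
    """Write the calmd output to `bam_out`; raise if samtools exits non-zero."""
    with open(bam_out, "wb") as out:
        # check=True makes subprocess.run raise CalledProcessError on failure.
        runner(calmd_command(bam_in, reference_fasta), stdout=out, check=True)
    return bam_out
```

In the notebook this could be called as `run_calmd(FILE_WITH_NO_MD, GENOME_RENAMED_FILE, FILE_WITH_MD)`. If the command fails, a partial output file may still be left on disk.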

In [6]:
print("successfully added MD tag")

successfully added MD tag
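The message above is printed whether or not the tags were actually written, so it can help to inspect a few records of the new file. This is a minimal sketch that works on SAM text lines; `has_md_tag` and `fraction_with_md` are hypothetical helpers, and the two records in the demo are made up for illustration.

```python
def has_md_tag(sam_line):
    """Return True if a SAM alignment line carries an MD:Z: optional field."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    return any(field.startswith("MD:Z:") for field in fields[11:])


def fraction_with_md(sam_lines):
    """Return the fraction of alignment records (header lines skipped) with MD."""
    records = [l for l in sam_lines if l.strip() and not l.startswith("@")]
    if not records:
        return 0.0
    return sum(has_md_tag(l) for l in records) / len(records)


# Made-up records: one with an MD tag, one without.
with_md = "r1\t0\t1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tMD:Z:4"
without_md = "r2\t0\t1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
print(fraction_with_md([with_md, without_md]))
```

In the notebook, the input lines could come from something like `lines = !samtools view $FILE_WITH_MD | head -n 100`. Unmapped reads are not expected to carry an MD tag, so the fraction may stay below 1.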
