# MAF file and miRNA hit table

Read the MAF file in the same folder. Give me a table listing every miRNA, hit or not, with the number of mutations that fall on it. For example:

|miR|#hits|
|---|---|
|miR1|2|
|miR2|0|
|miR3|5|

## Reading MAF

In [2]:
import pandas as pd
from pylab import *

FILE = "hgsc.bcm.edu__Mixed_curated_DNA_sequencing_level2.maf"

In [9]:
maf = pd.read_table(FILE, usecols=["Hugo_Symbol", "Chrom", "Start_Position", "End_Position"])
maf.head()

| |Hugo_Symbol|Chrom|Start_Position|End_Position|
|---|---|---|---|---|
|0|A1BG|19|58864353|58864353|
|1|A1CF|10|52573773|52573773|
|2|A2ML1|12|8995922|8995922|
|3|A4GALT|22|43089055|43089055|
|4|A4GALT|22|43089757|43089757|


## Reading miRBase

In [11]:
FILE = "../hw6/hsa.gff3"
df = pd.read_table(FILE, comment="#", usecols=["Chromosome", "Start", "End", "miR Name"])
df.head()

| |Chromosome|Start|End|miR Name|
|---|---|---|---|---|
|0|chr1|17369|17436|ID=MI0022705;Alias=MI0022705;Name=hsa-mir-6859-1|
|1|chr1|17409|17431|ID=MIMAT0027618;Alias=MIMAT0027618;Name=hsa-mi...|
|2|chr1|17369|17391|ID=MIMAT0027619;Alias=MIMAT0027619;Name=hsa-mi...|
|3|chr1|30366|30503|ID=MI0006363;Alias=MI0006363;Name=hsa-mir-1302-2|
|4|chr1|30438|30458|ID=MIMAT0005890;Alias=MIMAT0005890;Name=hsa-mi...|
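A standard GFF3 file has no header row, so the column labels used above have to come from somewhere. The sketch below shows one way to supply them with `names=` and to pull the `Name=` value out of the attributes column. It assumes the usual nine-column GFF3 layout; the helper `read_gff3`, the extra column labels, and the placeholder values (`.` and the type string) in the sample are illustrative and not taken from the original file.

```python
import io
import pandas as pd

# Standard GFF3 column order; "miR Name" is this notebook's label for the
# attributes column, the other extra labels are assumptions.
GFF3_COLUMNS = ["Chromosome", "Source", "Type", "Start", "End",
                "Score", "Strand", "Phase", "miR Name"]

# Two records built from the preview above. Fields the preview does not
# show are filled with placeholders, not real miRBase values.
SAMPLE_GFF3 = (
    "##gff-version 3\n"
    "chr1\t.\tmiRNA_primary_transcript\t17369\t17436\t.\t.\t.\t"
    "ID=MI0022705;Alias=MI0022705;Name=hsa-mir-6859-1\n"
    "chr1\t.\tmiRNA_primary_transcript\t30366\t30503\t.\t.\t.\t"
    "ID=MI0006363;Alias=MI0006363;Name=hsa-mir-1302-2\n"
)

def read_gff3(handle):
    """Read a headerless GFF3 file and keep chromosome, coordinates and name."""
    table = pd.read_csv(handle, sep="\t", comment="#",
                        header=None, names=GFF3_COLUMNS)
    # Extract the value after "Name=" up to the next ";" (or end of string).
    table["Name"] = table["miR Name"].str.extract(r"Name=([^;]+)", expand=False)
    return table[["Chromosome", "Start", "End", "Name"]]

sample = read_gff3(io.StringIO(SAMPLE_GFF3))
print(sample)
```

Parsing the name once at load time avoids repeating the string split for every lookup later on.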


## Defining a search function

In [13]:
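def pos2miR(chrom, start, end):
    """Return the names of all miRBase entries overlapping [start, end] on chrom."""
    # shrink the search space to the given chromosome (miRBase names carry a "chr" prefix)
    chunk = df[df["Chromosome"] == "chr" + str(chrom)]
    out = []
    for idx, row in chunk.iterrows():
        # two closed intervals overlap when each starts before the other ends;
        # this also catches a mutation that spans an entire miRNA
        if row["Start"] <= end and start <= row["End"]:
            # get the name (either hsa-mir-### or hsa-miR-###)
            miR = row["miR Name"].split("Name=")[1].split(";")[0]
            out.append(miR)
    return out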

In [15]:
# test
pos2miR(1, 17370, 17370)

['hsa-mir-6859-1', 'hsa-miR-6859-3p']

In [17]:
# test2
pos2miR("Y", 2609229, 2609229)

['hsa-mir-6089-2', 'hsa-miR-6089']
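The row-by-row loop in `pos2miR` can also be written as a single boolean mask, which is usually much faster when it is called once per mutation. The sketch below is one possible version, not the notebook's method: `overlapping_mirs` is a hypothetical name, it expects a table with an already-parsed `Name` column, and the toy table reuses only the two complete rows from the miRBase preview. It uses the general interval-overlap test, so a mutation that spans a whole miRNA is also reported.

```python
import pandas as pd

# Toy stand-in for the miRBase table (two rows from the preview above).
mirs = pd.DataFrame({
    "Chromosome": ["chr1", "chr1"],
    "Start": [17369, 30366],
    "End": [17436, 30503],
    "Name": ["hsa-mir-6859-1", "hsa-mir-1302-2"],
})

def overlapping_mirs(table, chrom, start, end):
    """Return names of rows whose [Start, End] overlaps [start, end] on chrom."""
    mask = (
        (table["Chromosome"] == "chr" + str(chrom))
        & (table["Start"] <= end)   # the miRNA starts before the mutation ends
        & (table["End"] >= start)   # the miRNA ends after the mutation starts
    )
    return table.loc[mask, "Name"].tolist()

print(overlapping_mirs(mirs, 1, 17370, 17370))  # point mutation inside a miRNA
print(overlapping_mirs(mirs, 1, 30000, 31000))  # range spanning a whole miRNA
```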

## Searching for hits

In [19]:
from collections import Counter
miR2hits = Counter()
for idx, row in maf.iterrows():
    print(idx, end='\r')  # progress indicator, overwritten in place
    hits = pos2miR(row["Chrom"], row["Start_Position"], row["End_Position"])
    for hit in hits:
        print(row["Hugo_Symbol"], row["Chrom"], row["Start_Position"], row["End_Position"], "hits on", hit)
        miR2hits[hit] += 1  # count one mutation on this miRNA

MAGEC3 X 140926205 140926205 hits on hsa-mir-320d-2
VARS2 6 30890885 30890885 hits on hsa-mir-4640


In [20]:
miR2hits

Counter({'hsa-mir-4640': 1, 'hsa-mir-320d-2': 1})
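`miR2hits` only holds miRNAs with at least one hit, while the task asks for non-hit miRNAs as well. A minimal sketch of the final step follows. It assumes a list of all miRNA names is available (for example, parsed from the miRBase table); the helpers `hit_table` and `to_markdown` and the short `all_mirs` list are illustrative, built from names that appear in the outputs above.

```python
from collections import Counter

def hit_table(all_mirs, hits):
    """Return (name, count) rows for every miRNA, hit or not, most-hit first."""
    # A Counter returns 0 for missing keys, so non-hit miRNAs get a count of 0.
    rows = [(name, hits[name]) for name in sorted(set(all_mirs))]
    # sorted() is stable, so ties keep their alphabetical order.
    return sorted(rows, key=lambda row: -row[1])

def to_markdown(rows):
    """Format the rows as a markdown table like the example in the task."""
    lines = ["|miR|#hits|", "|---|---|"]
    lines += ["|{}|{}|".format(name, count) for name, count in rows]
    return "\n".join(lines)

# Counts from the search above, plus a small illustrative list of miRNA names.
hits = Counter({"hsa-mir-4640": 1, "hsa-mir-320d-2": 1})
all_mirs = ["hsa-mir-4640", "hsa-mir-320d-2", "hsa-mir-6859-1", "hsa-mir-1302-2"]

table = hit_table(all_mirs, hits)
print(to_markdown(table))
```

With the real data, `all_mirs` would be every name extracted from the miRBase table, so the output lists each miRNA exactly once.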