# Parsing CSV files from individual genes in gnomAD

Example: file obtained for [GPX2](https://gnomad.broadinstitute.org/gene/ENSG00000176153):

```csv
Chromosome,Position,rsIDs,Reference,Alternate,Source,Filters - exomes,Filters - genomes,Transcript,HGVS Consequence,Protein Consequence,Transcript Consequence,VEP Annotation,ClinVar Clinical Significance,ClinVar Variation ID,Flags,Allele Count,Allele Number,Allele Frequency,Homozygote Count,Hemizygote Count,Allele Count African/African American,Allele Number African/African American,Homozygote Count African/African American,Hemizygote Count African/African American,Allele Count Latino/Admixed American,Allele Number Latino/Admixed American,Homozygote Count Latino/Admixed American,Hemizygote Count Latino/Admixed American,Allele Count Ashkenazi Jewish,Allele Number Ashkenazi Jewish,Homozygote Count Ashkenazi Jewish,Hemizygote Count Ashkenazi Jewish,Allele Count East Asian,Allele Number East Asian,Homozygote Count East Asian,Hemizygote Count East Asian,Allele Count European (Finnish),Allele Number European (Finnish),Homozygote Count European (Finnish),Hemizygote Count European (Finnish),Allele Count European (non-Finnish),Allele Number European (non-Finnish),Homozygote Count European (non-Finnish),Hemizygote Count European (non-Finnish),Allele Count Other,Allele Number Other,Homozygote Count Other,Hemizygote Count Other,Allele Count South Asian,Allele Number South Asian,Homozygote Count South Asian,Hemizygote Count South Asian
14,65406155,rs1471966487,G,T,gnomAD Exomes,PASS,NA,ENST00000389614.5,c.*51C>A,,c.*51C>A,3_prime_UTR_variant,,,,1,218184,0.0000045832875004583285,0,0,0,15192,0,0,0,31042,0,0,0,6818,0,0,1,17494,0,0,0,19274,0,0,0,100426,0,0,0,5126,0,0,0,22812,0,0
14,65406166,rs1418718413,C,T,gnomAD Exomes,PASS,NA,ENST00000389614.5,c.*40G>A,,c.*40G>A,3_prime_UTR_variant,,,,1,229576,0.00000435585601282364,0,0,0,15322,0,0,0,32902,0,0,0,7906,0,0,1,17834,0,0,0,20042,0,0,0,104758,0,0,0,5506,0,0,0,25306,0,0
```
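The per-population columns follow a regular naming pattern (`Allele Count <population>`, `Allele Number <population>`), so per-population allele frequencies can be derived from the header alone. The following is a minimal sketch, assuming pandas is installed. The sample is a trimmed copy of the two rows above, and the helper name `population_frequencies` is hypothetical.

```python
import io
import pandas as pd

# Trimmed copy of the two example rows above (only a few columns kept).
SAMPLE = """Chromosome,Position,rsIDs,Allele Count,Allele Number,Allele Count East Asian,Allele Number East Asian,Allele Count South Asian,Allele Number South Asian
14,65406155,rs1471966487,1,218184,1,17494,0,22812
14,65406166,rs1418718413,1,229576,1,17834,0,25306
"""

def population_frequencies(df):
    """Return one allele-frequency column per population found in the header."""
    prefix = "Allele Count "  # trailing space skips the global "Allele Count" column
    out = df[["Chromosome", "Position", "rsIDs"]].copy()
    for col in df.columns:
        if col.startswith(prefix):
            pop = col[len(prefix):]
            # frequency = allele count / allele number for that population
            out["AF " + pop] = df[col] / df["Allele Number " + pop]
    return out

df = pd.read_csv(io.StringIO(SAMPLE))
freqs = population_frequencies(df)
print(freqs)
```

On a full export, populations with an allele number of 0 would produce `NaN` or `inf` here, so those rows may need filtering first.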

In [1]:
#!pip install pandas
import pandas as pd
from pathlib import Path

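# define the directory where the data is kept; expanduser() resolves the leading "~"
datadir = Path("~/data").expanduser()
# export for ENSG00000176153 (GPX2), kept for reference
# filename = datadir/"gnomAD_v2.1.1_ENSG00000176153_2022_03_22_11_24_32.csv"
# export actually loaded below
filename = datadir/"gnomAD_v2.1.1_ENSG00000233276_2022_03_22_10_54_50.csv"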


# load the CSV file
data = pd.read_csv(filename)
# print(data)
# keep the two columns of interest, only for variants with a protein consequence
columns = ["Protein Consequence", "ClinVar Clinical Significance"]
data = data.loc[data["Protein Consequence"].notna(), columns]
# show the variants that also carry a ClinVar annotation
data[data["ClinVar Clinical Significance"].notna()]

#contain_values = df[df[

FileNotFoundError: [Errno 2] No such file or directory: '/home/jordivilla/data/gnomAD_v2.1.1_ENSG00000233276_2022_03_22_10_54_50.csv'
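
The `FileNotFoundError` above means the CSV export is not at the expected location. The following is a minimal sketch of looking for the export before loading it, using only `pathlib`. The helper name `find_gnomad_csv` and the glob pattern are assumptions based on the file names used in this notebook.

```python
from pathlib import Path

def find_gnomad_csv(datadir, gene_id):
    """Return the last matching gnomAD export for gene_id (by name), or None."""
    datadir = Path(datadir).expanduser()  # resolve a leading "~"
    if not datadir.is_dir():
        return None
    # File names look like gnomAD_<version>_<gene id>_<timestamp>.csv;
    # the timestamp sorts chronologically for a given version.
    matches = sorted(datadir.glob(f"gnomAD_*_{gene_id}_*.csv"))
    return matches[-1] if matches else None

path = find_gnomad_csv("~/data", "ENSG00000233276")
if path is None:
    print("No export found; download the CSV from the gnomAD gene page first.")
else:
    print("Found", path)
```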

In [1]:
from Bio import Entrez
import sys

Entrez.email="jordi.villa@uvic.cat"

list_of_genes=['GPX1','GPX2','GPX3','GPX4','GPX5','GPX6','GPX7','GPX8']
term_entrez=' OR '.join(list_of_genes)

print('querying for... ',term_entrez)

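# esearch returns XML by default, which is what Entrez.read expects, so the
# rettype/retmod arguments are dropped ("retmod" was a misspelling of "retmode").
# Note: only 'GPX3' is queried here, not term_entrez.
esearch_result = Entrez.esearch(db="gene", term='GPX3')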
parsed_result = Entrez.read(esearch_result)
print(parsed_result)

print(parsed_result['IdList'])
quit()  # debugging stop: nothing below this line runs until it is removed

request = Entrez.epost("gene",id=",".join(parsed_result['IdList']))
try:
    result = Entrez.read(request)
except RuntimeError as e:
    print("An error occurred while retrieving the annotations.")
    print("The error returned was %s" % e)
    sys.exit(-1)

webEnv = result["WebEnv"]
queryKey = result["QueryKey"]
efetch_result = Entrez.efetch(db="gene", webenv=webEnv, query_key = queryKey, retmode="xml")
genes = Entrez.read(efetch_result)
print(genes)


querying for...  GPX1 OR GPX2 OR GPX3 OR GPX4 OR GPX5 OR GPX6 OR GPX7 OR GPX8
{'Count': '505', 'RetMax': '20', 'RetStart': '0', 'IdList': ['7124', '3569', '3091', '5743', '3576', '5468', '4780', '5444', '2952', '2950', '4233', '2908', '13982', '335', '155871', '6648', '10135', '860', '2034', '9314'], 'TranslationSet': [], 'TranslationStack': [{'Term': 'GPX3[All Fields]', 'Field': 'All Fields', 'Count': '505', 'Explode': 'N'}, 'GROUP'], 'QueryTranslation': 'GPX3[All Fields]'}
['7124', '3569', '3091', '5743', '3576', '5468', '4780', '5444', '2952', '2950', '4233', '2908', '13982', '335', '155871', '6648', '10135', '860', '2034', '9314']


KeyboardInterrupt: 
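
The `QueryTranslation` in the output above is `GPX3[All Fields]`, so the 505 hits include any gene record that mentions GPX3 anywhere, and only the first 20 IDs are returned (`RetMax`). A possible way to narrow the search is to restrict each symbol to the gene-name field and to one organism. This is a sketch under assumptions: the helper names `build_gene_term` and `search_gene_ids` are hypothetical, and the choice of the `[Gene Name]` and `[Organism]` field tags and of `retmax=100` is mine, not the notebook's.

```python
def build_gene_term(symbols, organism="Homo sapiens"):
    """Build an Entrez Gene query restricted to gene symbols and one organism."""
    names = " OR ".join(f"{s}[Gene Name]" for s in symbols)
    return f'({names}) AND "{organism}"[Organism]'

def search_gene_ids(symbols, email, retmax=100):
    """Run the esearch; needs Biopython and network access."""
    from Bio import Entrez  # imported here so build_gene_term works without Biopython
    Entrez.email = email
    handle = Entrez.esearch(db="gene", term=build_gene_term(symbols), retmax=retmax)
    try:
        return Entrez.read(handle)["IdList"]
    finally:
        handle.close()

list_of_genes = ['GPX1', 'GPX2']
print(build_gene_term(list_of_genes))
# prints: (GPX1[Gene Name] OR GPX2[Gene Name]) AND "Homo sapiens"[Organism]
```

Calling `search_gene_ids(list_of_genes, email)` would then return the matching IDs, which can be passed to `Entrez.epost` as in the cell above.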