# Assignment 2
Look for any information about the gene *PGM1* (encoding phosphoglucomutase 1) in the NCBI `gene` database and print the output. Mutations in this gene cause glycogen storage disease. PGM1 is an evolutionarily conserved enzyme that regulates one of the most important carbohydrate trafficking points in the metabolism of prokaryotic and eukaryotic organisms: the interconversion of glucose 1-phosphate and glucose 6-phosphate.

PGM1 deficiency is an inherited metabolic disorder in humans and causes a congenital disorder of glycosylation (PGM1-CDG). Affected patients show multiple disease phenotypes, including dilated cardiomyopathy, exercise intolerance, and hepatopathy, reflecting the central role of the enzyme in glucose metabolism.
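The assignment targets the `gene` database, while the Research cell below queries `pubmed`. A bare `PGM1` term in `gene` can also return PGM1 orthologs from other species, so a field-restricted term is more precise. A minimal sketch, assuming the `gene` database's `[sym]` (gene symbol) and `[orgn]` (organism) field tags; `gene_query` is a hypothetical helper, not part of Biopython:

```python
def gene_query(symbol, organism="human"):
    """Build an Entrez search term that matches one gene symbol within one organism."""
    # [sym] restricts the match to the gene symbol field, [orgn] to the organism field
    return f"{symbol}[sym] AND {organism}[orgn]"


term = gene_query("PGM1")
print(term)  # PGM1[sym] AND human[orgn]
```

With the notebook's helpers, the term would be used as `esearch('gene', term)`, and the resulting IDs passed on to `esummary('gene', ...)`.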

## Libraries

```python
from Bio import Entrez
from pprint import pprint
import xml.etree.ElementTree as ET
```

## Functions

```python
Entrez.email = "giacomo.villa.mi@gmail.com"  # NCBI asks E-utilities users to identify themselves

def info():
    """Print the names of all Entrez databases."""
    handle = Entrez.einfo()
    result = Entrez.read(handle)
    handle.close()
    print("Available databases:")
    print(", ".join(db.capitalize() for db in result['DbList']))
    print("Use lowercase database names when searching!")

def db_info(db):
    """Pretty-print the EInfo description of one database (fields, links, record count)."""
    handle = Entrez.einfo(db=db)
    record = Entrez.read(handle)
    handle.close()
    pprint(record['DbInfo'])  # pprint prints by itself; print(pprint(...)) would add "None"

def good_print(text):
    pprint(text)

def esearch(db, query, num_max=20):
    """Return the ESearch result (Count, IdList, ...) for `query`, with at most `num_max` IDs."""
    handle = Entrez.esearch(db=db, term=query, retmax=num_max)
    record = Entrez.read(handle, validate=True)
    handle.close()
    return record

def esummary(db, id_val):
    handle = Entrez.esummary(db=db, id=id_val)
    record = Entrez.read(handle, validate=True)
    handle.close()
    return record

def efetch(db, id_val, retmode='xml'):
    # Entrez.read only parses XML, so keep retmode='xml' when using this helper
    handle = Entrez.efetch(db=db, id=id_val, retmode=retmode)
    record = Entrez.read(handle, validate=True)
    handle.close()
    return record

def get_idList(record):
    return record['IdList']
```
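`esummary` and `efetch` accept several UIDs in one request as a comma-separated string, so the hundreds of PubMed hits found below do not need one request each; without an API key, NCBI allows only about three requests per second. A minimal sketch of grouping IDs before calling those helpers; `id_batches` and the batch size of 200 are illustrative assumptions:

```python
def id_batches(ids, size=200):
    """Yield comma-separated groups of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        # Entrez accepts several UIDs in one request as a comma-separated list
        yield ",".join(ids[start:start + size])


# Hypothetical IDs, only to show the grouping
example = [str(n) for n in range(1, 6)]
print(list(id_batches(example, size=2)))  # ['1,2', '3,4', '5']
```

With the notebook's helpers, this would be used as `for batch in id_batches(ids): docs = esummary('pubmed', batch)`.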


## Main

### Print some useful info

```python
info()
db_info('gene')
```
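`db_info('gene')` prints the whole `DbInfo` record; its `FieldList` entries hold the field tags that can be appended to search terms in square brackets. A minimal sketch that extracts only those tags; `search_fields` is a hypothetical helper, and `sample_info` is a hand-written, trimmed record in the same shape, not real EInfo output:

```python
def search_fields(db_info):
    """Map each field tag of an EInfo DbInfo record to its readable name."""
    # Each FieldList entry carries a short tag ('Name') and a label ('FullName')
    return {field["Name"]: field["FullName"] for field in db_info.get("FieldList", [])}


# Hand-written sample (an assumption) with the same keys as record['DbInfo']
sample_info = {
    "DbName": "gene",
    "FieldList": [
        {"Name": "ALL", "FullName": "All Fields"},
        {"Name": "ORGN", "FullName": "Organism"},
    ],
}
for tag, label in sorted(search_fields(sample_info).items()):
    print(f"[{tag}] {label}")
```

On live data, the same function would be called as `search_fields(Entrez.read(Entrez.einfo(db='gene'))['DbInfo'])`.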

### Research

```python
result = esearch('pubmed', 'PGM1', 1000)
pprint(result)
ids = get_idList(result)
print(ids)
print()
```

```
{'Count': '768',
 'IdList': ['32221390', '32185602', '32171858', '32109608', '32057119', '32030330', '31956197', '31872653', '31867695', '31805863', '31700124', '31691304', '31645594', '31563034', '31562743', '31518962', '31501489', '31285465', '31142285', '31119523', '31077402', '31075182', '31056658', '31002879', '30985866', '30982613', '30862562', '30737079', '30653653', '30641270', '30567101', '30485831', '30335765', '30262252', '30174305', '30149860', '30122451', '30120520', '30086874', '30064602', '30048639', '29883552', '29858906', '29788479', '29752652', '29702557', '29497882', '29280746', '29112118', '28882528', '28855921', '28855392', '28844145', '28837627', '28794993', '28737584', '28617415', '28588557', '28190645', '28139241', '28126686', '28117557', '28108845', '20301507', '27861333', '27778364', '27663060', '27515780', '27351072', '27342503', '27206562', '27150525', '27060284', '27023439', '26972339', '26908106', '26768186', '26758299', '26511157', '26307094', '26303607',
```
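ESearch returns at most `retmax` IDs in `IdList`, while `Count` reports the total number of hits as a string. Here `Count` is 768 and `retmax` was 1000, so one request was enough; with a smaller `retmax`, the remaining IDs would come from further `Entrez.esearch` calls using the `retstart` offset. A minimal sketch of that bookkeeping; `pages`, `is_complete`, and the page size of 500 are hypothetical choices:

```python
def pages(total, page_size=500):
    """Return (retstart, retmax) pairs that together cover `total` hits."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return [(start, min(page_size, total - start)) for start in range(0, total, page_size)]


def is_complete(record):
    """True when IdList already contains every hit reported by Count."""
    # ESearch reports Count as a string, so convert it before comparing
    return len(record["IdList"]) >= int(record["Count"])


print(pages(768, 500))  # [(0, 500), (500, 268)]
```

Each pair would then feed a call such as `Entrez.esearch(db='pubmed', term='PGM1', retstart=start, retmax=size)`.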

```python
summary = esummary('pubmed', 178589)
```

```python
good_print(summary[0])
```

```
{'ArticleIds': {'doi': '10.1159/000152764',
                'eid': '178589',
                'medline': [],
                'pubmed': ['178589'],
                'rid': '178589'},
 'AuthorList': ['Mondovano JA', 'Gaensslen RE'],
 'DOI': '10.1159/000152764',
 'ELocationID': '',
 'EPubDate': '',
 'ESSN': '1423-0062',
 'FullJournalName': 'Human heredity',
 'HasAbstract': IntegerElement(1, attributes={}),
 'History': {'entrez': '1975/01/01 00:00',
             'medline': ['1975/01/01 00:01'],
             'pubmed': ['1975/01/01 00:00']},
 'ISSN': '0001-5652',
 'Id': '178589',
 'Issue': '6',
 'Item': [],
 'LangList': ['English'],
 'LastAuthor': 'Gaensslen RE',
 'NlmUniqueID': '0200525',
 'Pages': '488-92',
 'PmcRefCount': IntegerElement(0, attributes={}),
 'PubDate': '1975',
 'PubStatus': 'ppublish',
 'PubTypeList': ['Journal Article'],
 'RecordStatus': 'PubMed - indexed for MEDLINE',
 'References': [],
 'SO': '1975;25(6):488-92',
 'Source': 'Hum Hered',
 'Title': 'Distribution of adenylate
```
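The ESummary record above has many fields, but only a few are needed to tabulate the search results, and `HasAbstract` shows which articles are worth an `efetch` for the abstract analysis. A minimal sketch; `summary_row` is a hypothetical helper, the year is assumed to be the first four characters of `PubDate`, and `doc` copies only the relevant values printed above (with `HasAbstract` as a plain `1`, since Biopython's `IntegerElement` is an `int` subclass):

```python
def summary_row(docsum):
    """Reduce one PubMed ESummary record to a few fields for tabulation."""
    pub_date = str(docsum.get("PubDate", ""))
    # Assumption: PubDate starts with a four-digit year, e.g. '1975' or '2020 Mar 27'
    year = int(pub_date[:4]) if pub_date[:4].isdigit() else None
    return {
        "pmid": str(docsum["Id"]),
        "year": year,
        "journal": str(docsum.get("Source", "")),
        "n_authors": len(docsum.get("AuthorList", [])),
        "has_abstract": bool(int(docsum.get("HasAbstract", 0))),
    }


# Values copied from the summary printed above (only the fields used here)
doc = {"Id": "178589", "PubDate": "1975", "Source": "Hum Hered",
       "AuthorList": ["Mondovano JA", "Gaensslen RE"], "HasAbstract": 1}
print(summary_row(doc))
```

Applied to every ESummary record, the `has_abstract` flag would select which PMIDs to pass to `efetch`.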

```python
fetch = efetch('pubmed', 178589)
```

```python
good_print(fetch['PubmedArticle'][0]['MedlineCitation']['Article']['Abstract']['AbstractText'][0])
# print(fetch['PubmedArticle'][0].keys())
# dict_keys(['MedlineCitation', 'PubmedData'])
```

```
'Blood samples from unrelated persons living in New York, N.Y., were examined for phosphoglucomutase (PGM) and adenylate kinase (AK) phenotypes, the sample consisting of 164 Caucasians, 133 Negroes, 129 persons of Spanish origin or descent, and 156 Chinese for PGM and 136 Caucasians, 134 Negroes, 136 persons of Spanish origin or descent, and 156 Chinese for AK. The PGM1 gene frequency was found to be 0.7774 for Caucasians, 0.8083 for Negroes, 0.7461 for Hispanic persons, and 0.7917 for Chinese. One Hispanic person had a very rare type, PGM 8-1-FAST. The AK1 gene frequency was found to be 0.9669 for Caucasians, 0.9813 for Negroes, 0.9779 for Hispanic persons, and 1.000 for Chinese.'
```
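The cell above indexes straight into `Abstract`, which works for this record but raises `KeyError` for PubMed records that have no abstract, which can happen for some of the 768 hits. Structured abstracts also split `AbstractText` into several labelled sections, so `[0]` keeps only the first one. A minimal sketch of a tolerant lookup; `dig` is a hypothetical helper and `article` is a hand-written example, not real efetch output:

```python
def dig(record, *path, default=None):
    """Follow a chain of keys/indices, returning `default` as soon as a step is missing."""
    current = record
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            # Missing key, index out of range, or a step that cannot be indexed this way
            return default
    return current


# Hand-written article without an Abstract element
article = {"MedlineCitation": {"Article": {"ArticleTitle": "Example"}}}
print(dig(article, "MedlineCitation", "Article", "ArticleTitle"))  # Example
print(dig(article, "MedlineCitation", "Article", "Abstract", "AbstractText", 0))  # None
```

For one fetched article, `" ".join(dig(fetch['PubmedArticle'][0], 'MedlineCitation', 'Article', 'Abstract', 'AbstractText', default=[]))` would keep every section of a structured abstract and give an empty string when there is none.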


## Analysis

Interesting fields:
* From `Entrez.esummary` on the gene database
    * Organism: `['DocumentSummarySet']['DocumentSummary'][0]['Organism']`
    * Summary: `['DocumentSummarySet']['DocumentSummary'][0]['Summary']` (for a word cloud)
* From `Entrez.efetch` on the gene database
    * `Entrezgene_track-info`
* Analysis of the PubMed articles about this gene
    * Word cloud of the abstracts: `['PubmedArticle'][0]['MedlineCitation']['Article']['Abstract']['AbstractText'][0]`
    * Language: `['PubmedArticle'][0]['MedlineCitation']['Article']['Language']` (probably all English)
    * Year: `['PubmedArticle'][0]['PubmedData']['History'][0]`
    * Country: `['PubmedArticle'][0]['MedlineCitation']['MedlineJournalInfo']['Country']`
* Analysis of the taxonomy
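For the PubMed part of the analysis, a word cloud needs word frequencies, while language, year, and country only need counting. A minimal stdlib sketch, assuming the abstracts and field values have already been extracted as plain strings; `word_frequencies`, `tally`, and the short stop-word list are illustrative assumptions, not a standard resource. If the third-party `wordcloud` package is used, its `WordCloud.generate_from_frequencies` method accepts a dict of counts like the one built here.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption); a real analysis would use a fuller one
STOP_WORDS = {"the", "and", "for", "was", "were", "with", "from", "that", "this", "are"}


def word_frequencies(abstracts, min_length=3):
    """Count lowercase words across abstracts, skipping stop words and short tokens."""
    counts = Counter()
    for text in abstracts:
        # Tokens must start with a letter, so numbers such as 0.7774 are ignored
        for word in re.findall(r"[a-z][a-z0-9]*", text.lower()):
            if len(word) >= min_length and word not in STOP_WORDS:
                counts[word] += 1
    return counts


def tally(values):
    """Count each non-empty value, e.g. publication years, languages or countries."""
    return Counter(value for value in values if value)


# Two sentences taken from the abstract printed above
texts = [
    "The PGM1 gene frequency was found to be 0.7774 for Caucasians.",
    "The AK1 gene frequency was found to be 1.000 for Chinese.",
]
counts = word_frequencies(texts)
print(counts["gene"])  # 2
print(counts.most_common(3))
```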