## Search for the enzymes that regulate each pathway
The enzymes for each pathway were found through KEGG.
#### Glycolysis pathway:
phosphoglucomutase;
aldose 1-epimerase;
enolase;
glucose-6-phosphate isomerase;
#### Pentose phosphate pathway:
glucose-6-phosphate isomerase;
transketolase;
fructose-bisphosphate aldolase;
fructose-1,6-bisphosphatase;
#### TCA cycle pathway:
pyruvate dehydrogenase E1;
isocitrate dehydrogenase;
pyruvate dehydrogenase E2;
malate dehydrogenase
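
The search terms in the following cell pair each enzyme name with an organism filter. As a minimal sketch, the same strings could be generated instead of typed by hand, which avoids slips such as a missing comma between list items. The helper name `build_terms` and the `ORGANISMS` tuple are illustrative and are not part of the original notebook.

```python
from itertools import product

# Organisms queried in this notebook, used as Entrez [ORGN] filters.
ORGANISMS = ("homo sapiens", "drosophila", "Escherichia coli")

def build_terms(enzymes, organisms=ORGANISMS):
    """Return one '<organism>[ORGN] <enzyme>' search term per pair."""
    # product() varies the organism fastest, so terms are grouped by enzyme.
    return [f"{org}[ORGN] {enz}" for enz, org in product(enzymes, organisms)]

glycolysis_enzymes = [
    "phosphoglucomutase",
    "aldose 1-epimerase",
    "enolase",
    "glucose-6-phosphate isomerase",
]
example_terms = build_terms(glycolysis_enzymes)
print(len(example_terms))
print(example_terms[0])
```

With four enzymes and three organisms this yields twelve terms, in the same enzyme-by-enzyme order as the hand-written lists.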


In [21]:
# Use Entrez to look up the source gene record for each protein
from Bio import SeqIO
from Bio import Entrez
Entrez.email = 'zihuixu1@berkeley.edu'

glycolysis_terms = ['homo sapiens[ORGN] phosphoglucomutase','drosophila[ORGN] phosphoglucomutase','Escherichia coli[ORGN] phosphoglucomutase',
                  'homo sapiens[ORGN] aldose 1-epimerase','drosophila[ORGN] aldose 1-epimerase','Escherichia coli[ORGN] aldose 1-epimerase',
                  'homo sapiens[ORGN] enolase','drosophila[ORGN] enolase','Escherichia coli[ORGN] enolase',
                  'homo sapiens[ORGN] glucose-6-phosphate isomerase', 'drosophila[ORGN] glucose-6-phosphate isomerase','Escherichia coli[ORGN] glucose-6-phosphate isomerase']
pentose_terms = ['homo sapiens[ORGN] glucose-6-phosphate isomerase','drosophila[ORGN] glucose-6-phosphate isomerase','Escherichia coli[ORGN] glucose-6-phosphate isomerase',
                 'homo sapiens[ORGN] transketolase', 'drosophila[ORGN] transketolase', 'Escherichia coli[ORGN] transketolase',
                 'homo sapiens[ORGN] fructose-bisphosphate aldolase','drosophila[ORGN] fructose-bisphosphate aldolase', 'Escherichia coli[ORGN] fructose-bisphosphate aldolase',
                 'homo sapiens[ORGN] fructose-1,6-bisphosphatase','drosophila[ORGN] fructose-1,6-bisphosphatase', 'Escherichia coli[ORGN] fructose-1,6-bisphosphatase']
citric_terms = ['homo sapiens[ORGN] pyruvate dehydrogenase E1','drosophila[ORGN] pyruvate dehydrogenase E1','Escherichia coli[ORGN] pyruvate dehydrogenase E1',
                'homo sapiens[ORGN] pyruvate dehydrogenase E2', 'drosophila[ORGN] pyruvate dehydrogenase E2', 'Escherichia coli[ORGN] pyruvate dehydrogenase E2',
                'homo sapiens[ORGN] dihydrolipoamide dehydrogenase', 'drosophila[ORGN] dihydrolipoamide dehydrogenase', 'Escherichia coli[ORGN] dihydrolipoamide dehydrogenase',
                'homo sapiens[ORGN] phosphoenolpyruvate carboxykinase', 'drosophila[ORGN] phosphoenolpyruvate carboxykinase', 'Escherichia coli[ORGN] phosphoenolpyruvate carboxykinase']

Prot_gene=[]
def accession(cycle_name, cycle_list):
    for i in cycle_list:
        handle = Entrez.esearch(db='protein',
                               term=i,
                               sort='relevance',
                               idtype='acc'
                               )
        seq=Entrez.read(handle)['IdList']
        if seq:
            First_seq=seq[0]
            handle=Entrez.efetch(db='protein', id=First_seq, rettype='gb', retmode='text')
            temp = SeqIO.read(handle,'gb')
            gene= temp.annotations['db_source'] 
            Prot_gene.append(gene)
            
accession("Glycolysis",glycolysis_terms)
accession("Pentose phosphate pathway",pentose_terms)
accession("Citric acid cycle", citric_terms)
# Get the accession number of the gene
gene_accession_num=[]
for i in Prot_gene:
    length=i.split()
    gene_accession_num.append(length[-1])
print(gene_accession_num)

['FUIG01000013.1', 'XM_023311391.1', 'AFAS01000018.1', 'NM_138801.2', 'XM_023317388.1', 'CP026473.1', 'X66610.1', 'XM_023310563.1', 'X82400.1', 'PROSITE:PS51463', 'XM_023313082.1', 'PROSITE:PS51463', 'PROSITE:PS51463', 'XM_023313082.1', 'PROSITE:PS51463', 'X67688.1', 'AF047336.1', 'AXTJ01000031.1', 'X07292.1', 'XM_023316276.1', 'AKNI01000030.1', 'L10320.1', 'LC058525.1', 'ADWR01000051.1', 'NM_005390.4', 'XM_023318640.1', 'XM_015178001.1', 'PROSITE:PS51826', 'AH003583.2', 'FJ907946.1', 'CP002516.1', 'L12760.1', 'NM_079060.3', 'AXTZ01000072.1']
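
The printed list contains a few `PROSITE:` cross-references and some repeated accessions. These appear because the last token of `db_source` is not always a nucleotide accession. A small sketch of cleaning the list before querying the nucleotide database follows. The rule used here (drop anything containing a colon) is an assumption based only on the values printed above, and `sample_ids` is a short excerpt of that list.

```python
# A few values copied from the printed list above.
sample_ids = ['FUIG01000013.1', 'NM_138801.2', 'PROSITE:PS51463',
              'XM_023313082.1', 'PROSITE:PS51463', 'XM_023313082.1']

# Assumption: entries with a "DB:" style prefix are cross-references,
# not nucleotide accessions, so they are skipped.
filtered = [acc for acc in sample_ids if ":" not in acc]

# dict.fromkeys removes duplicates while keeping first-seen order.
nucleotide_ids = list(dict.fromkeys(filtered))
print(nucleotide_ids)
```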


In [22]:
#Create a table for genes
import sqlite3
conn=sqlite3.connect('my.db')
c=conn.cursor()
# IF EXISTS keeps the cell from failing on a fresh database
c.execute("""DROP TABLE IF EXISTS GENES""")
c.execute("""CREATE TABLE GENES(name TEXT,
                                description TEXT,
                                organism TEXT,
                                nucleotide_sequence TEXT);""")
def table(gene):
    for t in gene:
        handle = Entrez.esearch(db='nucleotide',
                        term=t,
                        sort='relevance',
                        idtype='acc')
        seqs = Entrez.read(handle)['IdList']
        if seqs:
            first_seq = seqs[0]
            handle=Entrez.efetch(db='nucleotide', id=first_seq, rettype='gb', retmode='text')
            temp = SeqIO.read(handle, 'gb')
            c.execute("INSERT INTO GENES VALUES (?, ?, ?, ?)", 
                      (temp.name, temp.description, temp.annotations['organism'], str(temp.seq)))
        else:
            continue
table(gene_accession_num)
conn.commit()


The gene table is very large. To avoid crashing the notebook, it is not printed here.
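
Since the full table is too large to print, one option is to show a summary in which each sequence is replaced by its length. The sketch below uses an in-memory stand-in for the `GENES` table with a single made-up placeholder row. The column name `sequence` and the connection name `demo` are assumptions for illustration only.

```python
import sqlite3

# In-memory stand-in for the GENES table; the row is a made-up placeholder.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE GENES (name TEXT, description TEXT, organism TEXT, sequence TEXT)"
)
demo.execute(
    "INSERT INTO GENES VALUES (?, ?, ?, ?)",
    ("DEMO_1", "placeholder record", "placeholder organism", "ATGC" * 250),
)

# Report the sequence length instead of the sequence itself,
# and cap the number of rows returned.
summary = demo.execute(
    "SELECT name, organism, length(sequence) FROM GENES LIMIT 5"
).fetchall()
print(summary)
```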

In [23]:
#Create a table for pathways
c.execute("""DROP TABLE IF EXISTS PATHWAYS""")
c.execute("""CREATE TABLE PATHWAYS (name TEXT,
                                   description TEXT);""")

c.execute("""INSERT INTO PATHWAYS (name,
                                  description)
                          VALUES ('glycolysis',
                                  'convert C6H12O6 into pyruvate'),
                                  ('citric acid cycle',
                                   'released stored energy through oxidation of acetyl CoA'),
                                  ('pentose phosphate pathway',
                                  'oxidation of glucose-6P and generates a ribulose-5P');""")
conn.commit()

In [24]:
#print the pathway table
c.execute('SELECT * FROM PATHWAYS')
print(c.fetchall())

[('glycolysis', 'convert C6H12O6 into pyruvate'), ('citric acid cycle', 'released stored energy through oxidation of acetyl CoA'), ('pentose phosphate pathway', 'oxidation of glucose-6P and generates a ribulose-5P')]


In [25]:
#Create a table for enzymes
c.execute("""DROP TABLE IF EXISTS ENZYMES""")
c.execute("""CREATE TABLE ENZYMES (name TEXT,
                                 function TEXT,
                                 commission number TEXT,
                                 pathway TEXT);""")
c.execute("""INSERT INTO ENZYMES VALUES('phosphoglucomutase',
                                  'transfers a phosphate group on an α-D-glucose monomer from the 1 to the 6 position',
                                  'EC:5.4.2.2',
                                  'glycolysis'),
                                  
                                  ('Aldose 1-epimerase',
                                  'catalyzes the chemical reaction between alpha-D-glucose and beta-D-glucose',
                                  'EC:5.1.3.3',
                                  'glycolysis'),
                                  
                                  ('Enolase',
                                  'catalyze the conversion of 2-phosphoglycerate(2-PG) to phosphoenolpyruvate(PEP)',
                                  'EC:4.2.1.11',
                                  'glycolysis'),
                                  
                                  ('phosphoglycerate kinase',
                                  'atalyzes the reversible transfer of a phosphate group from 1,3-bisphosphoglycerate to ADP',
                                  'EC:2.7.2.3',
                                  'glycolysis'),
                                  
                                  ('Glucose-6-phosphate isomerase',
                                  'interconvert glucose 6-phosphate and fructose 6-phosphate',
                                  'EC:5.3.1.9',
                                  'pentose phosphate pathway'),
                                  
                                  ('Transketolase',
                                  'Catalyzes the transfer of a two-carbon ketol group from a ketose donor to an aldose acceptor',
                                  'EC:2.2.1.1',
                                  'pentose phosphate pathway'),
                                  
                                  ('Fructose-bisphosphate aldolase',
                                  'splits the aldol and fructose 1,6-bisphosphate into the triose phosphates dihydroxyacetone phosphate(DHAP) and glyceraldehyde 3-phosphate(G3P)',
                                  'EC:4.1.2.13',
                                  'pentose phosphate pathway'),
                                  
                                  ('Fructose-1,6-bisphosphatase',
                                  'converts fructose-1,6-bisphosphate to fructose 6-phosphate',
                                  'EC:3.1.3.11',
                                  'pentose phosphate pathway'),
                                  
                                  ('Pyruvate dehydrogenase E1 component',
                                  'pyruvate decarboxylation and reductive acetylation of lipoic acid',
                                  'EC:1.2.4.1',
                                  'citric acid cycle'),
                                  
                                  ('Pyruvate dehydrogenase E2 component',
                                  'bind on lipoate-thioester and by transacylation produce acetyl-CoA',
                                  'EC:2.3.1.12',
                                  'citric acid cycle'),
                                  
                                  ('Isocitrate dehydrogenase',
                                  'catalyzes the oxidative decarboxylation of isocitrate, producing alpha-ketoglutarate and CO₂',
                                  'EC:1.1.1.42',
                                  'citric acid cycle'),
                               
                                  ('Malate dehydrogenase',
                                  'reversibly catalyzes the oxidation of malate to oxaloacetate using the reduction of NAD⁺ to NADH.',
                                  'EC:1.1.1.37',
                                  'citric acid cycle');""")
conn.commit()

In [40]:
#print out enzyme table
c.execute('SELECT * FROM ENZYMES')
print(c.fetchall())

[('phosphoglucomutase', 'transfers a phosphate group on an α-D-glucose monomer from the 1 to the 6 position', 'EC:5.4.2.2', 'glycolysis'), ('Aldose 1-epimerase', 'catalyzes the chemical reaction between alpha-D-glucose and beta-D-glucose', 'EC:5.1.3.3', 'glycolysis'), ('Enolase', 'catalyze the conversion of 2-phosphoglycerate(2-PG) to phosphoenolpyruvate(PEP)', 'EC:4.2.1.11', 'glycolysis'), ('phosphoglycerate kinase', 'atalyzes the reversible transfer of a phosphate group from 1,3-bisphosphoglycerate to ADP', 'EC:2.7.2.3', 'glycolysis'), ('Glucose-6-phosphate isomerase', 'interconvert glucose 6-phosphate and fructose 6-phosphate', 'EC:5.3.1.9', 'pentose phosphate pathway'), ('Transketolase', 'Catalyzes the transfer of a two-carbon ketol group from a ketose donor to an aldose acceptor', 'EC:2.2.1.1', 'pentose phosphate pathway'), ('Fructose-bisphosphate aldolase', 'splits the aldol and fructose 1,6-bisphosphate into the triose phosphates dihydroxyacetone phosphate(DHAP) and glyceraldehy

In [53]:
# Join each enzyme to its pathway (enzyme-pathway association)
c.execute("""SELECT ENZYMES.name,ENZYMES.function,ENZYMES.commission number,ENZYMES.pathway,PATHWAYS.description 
          FROM ENZYMES
          INNER JOIN PATHWAYS
          ON ENZYMES.pathway = PATHWAYS.name""")
print(c.fetchall())

[('phosphoglucomutase', 'transfers a phosphate group on an α-D-glucose monomer from the 1 to the 6 position', 'EC:5.4.2.2', 'glycolysis', 'convert C6H12O6 into pyruvate'), ('Aldose 1-epimerase', 'catalyzes the chemical reaction between alpha-D-glucose and beta-D-glucose', 'EC:5.1.3.3', 'glycolysis', 'convert C6H12O6 into pyruvate'), ('Enolase', 'catalyze the conversion of 2-phosphoglycerate(2-PG) to phosphoenolpyruvate(PEP)', 'EC:4.2.1.11', 'glycolysis', 'convert C6H12O6 into pyruvate'), ('phosphoglycerate kinase', 'atalyzes the reversible transfer of a phosphate group from 1,3-bisphosphoglycerate to ADP', 'EC:2.7.2.3', 'glycolysis', 'convert C6H12O6 into pyruvate'), ('Glucose-6-phosphate isomerase', 'interconvert glucose 6-phosphate and fructose 6-phosphate', 'EC:5.3.1.9', 'pentose phosphate pathway', 'oxidation of glucose-6P and generates a ribulose-5P'), ('Transketolase', 'Catalyzes the transfer of a two-carbon ketol group from a ketose donor to an aldose acceptor', 'EC:2.2.1.1', 'p

The joined table above shows that pathways and enzymes have a one-to-many relationship: multiple enzymes are involved in glycolysis, for example. Each gene in the gene table has a one-to-one relationship with an enzyme.
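
One way to check the one-to-many relationship is to count the enzymes attached to each pathway. The sketch below rebuilds a small subset of the two tables in memory, using names taken from the cells above. The connection name `demo`, the reduced column set, and the `REFERENCES` clause are assumptions added for illustration.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE PATHWAYS (name TEXT PRIMARY KEY, description TEXT)")
# Each enzyme row points at exactly one pathway; a pathway may have many enzymes.
demo.execute("CREATE TABLE ENZYMES (name TEXT, pathway TEXT REFERENCES PATHWAYS(name))")

demo.executemany("INSERT INTO PATHWAYS VALUES (?, ?)", [
    ("glycolysis", "convert C6H12O6 into pyruvate"),
    ("pentose phosphate pathway", "oxidation of glucose-6P and generates a ribulose-5P"),
])
demo.executemany("INSERT INTO ENZYMES VALUES (?, ?)", [
    ("phosphoglucomutase", "glycolysis"),
    ("Aldose 1-epimerase", "glycolysis"),
    ("Enolase", "glycolysis"),
    ("Transketolase", "pentose phosphate pathway"),
])

# LEFT JOIN keeps pathways even if they have no enzymes yet.
counts = demo.execute("""
    SELECT PATHWAYS.name, COUNT(ENZYMES.name)
    FROM PATHWAYS
    LEFT JOIN ENZYMES ON ENZYMES.pathway = PATHWAYS.name
    GROUP BY PATHWAYS.name
    ORDER BY PATHWAYS.name
""").fetchall()
print(counts)
```

A count greater than one for a pathway confirms that several enzymes map to it. A count of zero would point to a mismatch between the pathway name stored in `ENZYMES` and the one stored in `PATHWAYS`.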