In [1]:
from Bio.KEGG import REST

human_pathways = REST.kegg_list("pathway", "hsa").read()
human_pathways

'path:hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\npath:hsa00020\tCitrate cycle (TCA cycle) - Homo sapiens (human)\npath:hsa00030\tPentose phosphate pathway - Homo sapiens (human)\npath:hsa00040\tPentose and glucuronate interconversions - Homo sapiens (human)\npath:hsa00051\tFructose and mannose metabolism - Homo sapiens (human)\npath:hsa00052\tGalactose metabolism - Homo sapiens (human)\npath:hsa00053\tAscorbate and aldarate metabolism - Homo sapiens (human)\npath:hsa00061\tFatty acid biosynthesis - Homo sapiens (human)\npath:hsa00062\tFatty acid elongation - Homo sapiens (human)\npath:hsa00071\tFatty acid degradation - Homo sapiens (human)\npath:hsa00072\tSynthesis and degradation of ketone bodies - Homo sapiens (human)\npath:hsa00100\tSteroid biosynthesis - Homo sapiens (human)\npath:hsa00120\tPrimary bile acid biosynthesis - Homo sapiens (human)\npath:hsa00130\tUbiquinone and other terpenoid-quinone biosynthesis - Homo sapiens (human)\npath:hsa00140\tSteroid hormo
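
The listing above is one long string with a tab between each identifier and its description. As a minimal sketch, it could be turned into a dictionary for easier lookup. The helper name `parse_kegg_list` is hypothetical, and the sample string reuses the first three lines of the output above so the block runs without a network call.

```python
# Sample text in the same tab-separated layout as the kegg_list output above
sample_listing = (
    "path:hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\n"
    "path:hsa00020\tCitrate cycle (TCA cycle) - Homo sapiens (human)\n"
    "path:hsa00030\tPentose phosphate pathway - Homo sapiens (human)\n"
)


def parse_kegg_list(text):
    """Map each entry identifier to its description (hypothetical helper)."""
    entries = {}
    for line in text.rstrip().split("\n"):
        # split on the first tab only, in case a description contains one
        entry, description = line.split("\t", 1)
        entries[entry] = description
    return entries


pathways = parse_kegg_list(sample_listing)
print(len(pathways))
print(pathways["path:hsa00020"])
```

With the real data, `parse_kegg_list(human_pathways)` should give one key per human pathway.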

In [2]:
# Filter all human pathways for repair pathways
repair_pathways = []
for line in human_pathways.rstrip().split("\n"):
    entry, description = line.split("\t")
    if "repair" in description:
        repair_pathways.append(entry)
repair_pathways[2]

'path:hsa03430'
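
The filter above matches the lowercase word "repair" exactly, so a description spelled "Repair" would be missed. A case-insensitive variant might look like the sketch below. The function name `filter_entries` is an assumption, and the sample listing is a short stand-in built from entries shown in this notebook.

```python
def filter_entries(listing, keyword):
    """Return identifiers whose description contains keyword, ignoring case."""
    keyword = keyword.lower()
    matches = []
    for line in listing.rstrip().split("\n"):
        entry, description = line.split("\t", 1)
        if keyword in description.lower():
            matches.append(entry)
    return matches


# Short stand-in for the full human_pathways string
sample_listing = (
    "path:hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\n"
    "path:hsa00020\tCitrate cycle (TCA cycle) - Homo sapiens (human)\n"
    "path:hsa03430\tMismatch repair - Homo sapiens (human)\n"
)

print(filter_entries(sample_listing, "Repair"))
```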

In [3]:
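# Fetch the KEGG flat file for each repair pathway
repair_genes = []  # filled in later, when the GENE section is parsed
for pathway in repair_pathways:
    pathway_file = REST.kegg_get(pathway).read()  # query and read each pathway
# after the loop, pathway_file holds only the last pathway fetched
pathway_file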

'ENTRY       hsa03430                    Pathway\nNAME        Mismatch repair - Homo sapiens (human)\nDESCRIPTION DNA mismatch repair (MMR) is a highly conserved biological pathway that plays a key role in maintaining genomic stability. MMR corrects DNA mismatches generated during DNA replication, thereby preventing mutations from becoming permanent in dividing cells. MMR also suppresses homologous recombination and was recently shown to play a role in DNA damage signaling. Defects in MMR are associated with genome-wide instability, predisposition to certain types of cancer including HNPCC, resistance to certain chemotherapeutic agents, and abnormalities in meiosis and sterility in mammalian systems.\n            The Escherichia coli MMR pathway has been extensively studied and is well characterized. In E. coli, the mismatch-activated MutS-MutL-ATP complex licenses MutH to incise the nearest unmethylated GATC sequence. UvrD and an exonuclease generate a gap. This gap is filled by pol I
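
The record above is a fixed-width flat file: a section label occupies the first 12 columns, and continuation lines leave those columns blank. A rough sketch of splitting such a record into sections follows. The helper `split_sections` is hypothetical, and the sample record is an abridged stand-in with placeholder descriptions and one made-up gene line (`1111  HYPO1`), not real KEGG content.

```python
# Abridged, partly made-up record in the 12-column KEGG flat-file layout
sample_record = (
    "ENTRY       hsa03430                    Pathway\n"
    "NAME        Mismatch repair - Homo sapiens (human)\n"
    "GENE        3978  LIG1; placeholder description\n"
    "            1111  HYPO1; placeholder description\n"
)


def split_sections(record):
    """Group the lines of a flat-file record by section label."""
    sections = {}
    current = None
    for line in record.rstrip().split("\n"):
        label = line[:12].strip()  # blank on continuation lines
        if label:
            current = label
        if current is not None:
            sections.setdefault(current, []).append(line[12:].strip())
    return sections


sections = split_sections(sample_record)

genes = []
for entry in sections.get("GENE", []):
    # everything before the first "; " holds the gene ID and symbol
    identifiers, _, description = entry.partition("; ")
    gene_id, symbol = identifiers.split()
    genes.append((gene_id, symbol))
print(genes)
```

Grouping by section first keeps the GENE parsing separate from the bookkeeping of which section a line belongs to.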

In [7]:
    
# Parse the pathway file fetched last in the loop above (hsa03430),
# keeping track of which section of the flat file we are in and
# reading only the GENE section
current_section = None
for line in pathway_file.rstrip().split("\n"):
    section = line[:12].strip()  # section names sit in the first 12 columns
    if section != "":
        current_section = section

    if current_section == "GENE":
        # each GENE line reads "<gene_id> <symbol>; <description>"
        gene_identifiers, gene_description = line[12:].split("; ", 1)
        gene_id, gene_symbol = gene_identifiers.split()

        if gene_symbol not in repair_genes:
            repair_genes.append(gene_symbol)

# gene_id and gene_symbol still hold the last GENE entry that was parsed
print("GeneID and gene symbol are:", gene_id, gene_symbol)
# note: repair_genes only covers the single pathway parsed above
print("There are %d DNA repair pathways and %d repair genes. The genes are:" %
      (len(repair_pathways), len(repair_genes)))
print(", ".join(repair_genes))

GeneID and gene symbol are: 3978 LIG1
There are 3 DNA repair pathways and 23 repair genes. The genes are:
SSBP1, PMS2, MLH1, MSH6, MSH2, MSH3, MLH3, RFC1, RFC4, RFC2, RFC5, RFC3, PCNA, EXO1, RPA1, RPA2, RPA3, RPA4, POLD1, POLD2, POLD3, POLD4, LIG1
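
The 23 genes listed above come from a single pathway file, because only the last record fetched was parsed. To gather genes across all three repair pathways, the parsing could be moved inside the fetch loop. The sketch below is one way to do that under stated assumptions: `gene_symbols` and `collect_genes` are hypothetical helpers, the `fetch` argument stands in for the network call, and `fake_records` is made-up data so the block runs offline.

```python
def gene_symbols(record):
    """Yield gene symbols from the GENE section of a flat-file record."""
    current = None
    for line in record.rstrip().split("\n"):
        label = line[:12].strip()
        if label:
            current = label
        if current == "GENE":
            identifiers = line[12:].split("; ", 1)[0]
            parts = identifiers.split()
            if len(parts) == 2:  # skip lines that lack "<id> <symbol>"
                yield parts[1]


def collect_genes(pathway_ids, fetch):
    """Parse every pathway record and return unique symbols in order."""
    seen = []
    for pathway_id in pathway_ids:
        for symbol in gene_symbols(fetch(pathway_id)):
            if symbol not in seen:
                seen.append(symbol)
    return seen


# Made-up records standing in for REST.kegg_get(...).read()
fake_records = {
    "path:A": "GENE        1  AAA; placeholder\n            2  BBB; placeholder\n",
    "path:B": "GENE        2  BBB; placeholder\n            3  CCC; placeholder\n",
}

genes = collect_genes(list(fake_records), fake_records.__getitem__)
print(", ".join(genes))
```

With Biopython available, passing `repair_pathways` and `lambda pid: REST.kegg_get(pid).read()` as `fetch` should query each pathway in turn. The gene count would then differ from the 23 reported above.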
