# KEGG API Biopython Notebook

This notebook demonstrates basic functions from Biopython's `Bio.KEGG` module that simplify access to the KEGG REST API.

With these functions you can search, find, and retrieve information from the KEGG database, a common need in many kinds of analysis.



## Install necessary packages

In [None]:
%pip install biopython

In [None]:
%pip install pandas

## Load packages into notebook

In [8]:
from Bio.KEGG import REST
import pandas as pd
import re

## Define a helper function

The `get_df` helper converts a tab-separated KEGG response into a two-column pandas DataFrame. It is used in later sections.

In [102]:
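def get_df(data):
    """Convert a tab-separated KEGG response handle into a two-column DataFrame."""
    # Read the whole response and split it into one string per line
    lines = data.read().strip().split("\n")
    data_list = []
    for line in lines:
        parts = line.split("\t")
        # Keep only well-formed lines with exactly two tab-separated fields
        if len(parts) == 2:
            data_list.append(parts)
    return pd.DataFrame(data_list, columns=["Data", "Name"])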
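
Because `get_df` only calls `.read()` on its argument, the parsing step can be tried offline on any file-like object. The sketch below is a standard-library variant of the same logic. The name `parse_kegg_rows` and the sample text are illustrative assumptions, not part of Biopython.

```python
import io


def parse_kegg_rows(handle):
    """Return (id, name) tuples from a tab-separated, file-like KEGG response."""
    rows = []
    for line in handle.read().strip().split("\n"):
        parts = line.split("\t")
        # Same rule as get_df: skip lines without exactly two fields
        if len(parts) == 2:
            rows.append((parts[0], parts[1]))
    return rows


# Illustrative sample in the tab-separated format returned by kegg_list
sample = io.StringIO(
    "map00010\tGlycolysis / Gluconeogenesis\n"
    "malformed line without a tab\n"
    "map00910\tNitrogen metabolism\n"
)
rows = parse_kegg_rows(sample)
print(rows)
```

The malformed line is dropped silently, which is also what `get_df` does. If skipped lines matter for your analysis, counting them would be a reasonable extension.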

## Explore KEGG databases

KEGG is a large database made up of several sub-databases. To list all available databases, call the `kegg_info` function with the argument "kegg".


In [14]:
print(REST.kegg_info("kegg").read())

kegg             Kyoto Encyclopedia of Genes and Genomes
kegg             Release 110.0+/06-08, Jun 24
                 Kanehisa Laboratories
                 pathway   1,172,423 entries
                 brite       390,942 entries
                 module          561 entries
                 orthology    26,794 entries
                 genome       24,739 entries
                 genes     54,564,849 entries
                 compound     19,356 entries
                 glycan       11,220 entries
                 reaction     12,088 entries
                 rclass        3,194 entries
                 enzyme        8,158 entries
                 network       1,549 entries
                 variant       1,452 entries
                 disease       2,750 entries
                 drug         12,449 entries
                 dgroup        2,471 entries
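
The entry counts in this output are plain text, so comparing databases requires parsing them first. The following is a minimal sketch that assumes each count line ends with a comma-grouped number followed by the word "entries", as in the output above. The helper name `parse_entry_counts` is hypothetical.

```python
def parse_entry_counts(info_text):
    """Map each database name to its entry count from kegg_info-style text."""
    counts = {}
    for line in info_text.splitlines():
        fields = line.split()
        # Count lines look like: "<name> <number with commas> entries"
        if len(fields) >= 3 and fields[-1] == "entries":
            name = fields[-3]
            number = fields[-2].replace(",", "")
            if number.isdigit():
                counts[name] = int(number)
    return counts


# Excerpt copied from the output above
sample = """kegg             Release 110.0+/06-08, Jun 24
                 Kanehisa Laboratories
                 pathway   1,172,423 entries
                 module          561 entries
                 genes     54,564,849 entries
"""
counts = parse_entry_counts(sample)
print(counts["module"])
```

With a live connection, the same helper could be fed `REST.kegg_info("kegg").read()` instead of the sample string.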



To explore a specific database, pass its name to the same function:

In [16]:
print(REST.kegg_info("module").read())

module           KEGG Module Database
md               Release 110.0+/06-08, Jun 24
                 Kanehisa Laboratories
                 561 entries

linked db        pathway
                 ko
                 <org>
                 genome
                 compound
                 glycan
                 reaction
                 enzyme
                 pubmed



## Listing all entries in a specific database

In [104]:
print(REST.kegg_list("pathway").read())

map01100	Metabolic pathways
map01110	Biosynthesis of secondary metabolites
map01120	Microbial metabolism in diverse environments
map01200	Carbon metabolism
map01210	2-Oxocarboxylic acid metabolism
map01212	Fatty acid metabolism
map01230	Biosynthesis of amino acids
map01232	Nucleotide metabolism
map01250	Biosynthesis of nucleotide sugars
map01240	Biosynthesis of cofactors
map01220	Degradation of aromatic compounds
map00010	Glycolysis / Gluconeogenesis
map00020	Citrate cycle (TCA cycle)
map00030	Pentose phosphate pathway
map00040	Pentose and glucuronate interconversions
map00051	Fructose and mannose metabolism
map00052	Galactose metabolism
map00053	Ascorbate and aldarate metabolism
map00500	Starch and sucrose metabolism
map00520	Amino sugar and nucleotide sugar metabolism
map00620	Pyruvate metabolism
map00630	Glyoxylate and dicarboxylate metabolism
map00640	Propanoate metabolism
map00650	Butanoate metabolism
map00660	C5-Branched dibasic acid metabolism
map00562	Inositol phosphate metabol

We can turn this output into a table with the `get_df` helper:

In [106]:
df_pathways = get_df(REST.kegg_list("pathway"))
df_pathways

| | Data | Name |
|---|---|---|
| 0 | map01100 | Metabolic pathways |
| 1 | map01110 | Biosynthesis of secondary metabolites |
| 2 | map01120 | Microbial metabolism in diverse environments |
| 3 | map01200 | Carbon metabolism |
| 4 | map01210 | 2-Oxocarboxylic acid metabolism |
| ... | ... | ... |
| 566 | map07035 | Prostaglandins |
| 567 | map07110 | Benzoic acid family |
| 568 | map07112 | 1,2-Diphenyl substitution family |
| 569 | map07114 | Naphthalene family |


We can then search the table for specific pathways:

In [193]:
df_pathways[df_pathways["Name"].str.contains("Nitrogen")]

| | Data | Name |
|---|---|---|
| 32 | map00910 | Nitrogen metabolism |
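
Note that `str.contains` is case-sensitive by default, so a search for "nitrogen" would not match this row unless `case=False` is passed. The sketch below shows the same case-insensitive idea on plain `(id, name)` tuples using only the standard library. The helper name `filter_by_keyword` is an assumption made for illustration.

```python
def filter_by_keyword(rows, keyword):
    """Return the (id, name) rows whose name contains keyword, ignoring case."""
    needle = keyword.casefold()
    return [(entry_id, name) for entry_id, name in rows if needle in name.casefold()]


# A few rows taken from the outputs shown in this notebook
rows = [
    ("map01100", "Metabolic pathways"),
    ("map00910", "Nitrogen metabolism"),
    ("map00020", "Citrate cycle (TCA cycle)"),
]
matches = filter_by_keyword(rows, "nitrogen")
print(matches)
```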


## Finding an ID within a database

The `kegg_find` function finds entries that match query keywords or other query data in a specific database.
For example, we can search the "pathway" database for the ID found above:

In [191]:
print(REST.kegg_find("pathway","map00910").read())

path:map00910	Nitrogen metabolism
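
The identifier returned here carries a database prefix (`path:`), while the listing earlier showed the bare ID (`map00910`). When combining results from both calls, it can help to separate the two parts. This is a small sketch; `split_kegg_id` is a hypothetical helper, not a Biopython function.

```python
def split_kegg_id(prefixed_id):
    """Split 'path:map00910' into ('path', 'map00910'); prefix is '' if absent."""
    prefix, sep, entry_id = prefixed_id.partition(":")
    if not sep:
        # No colon found, so the whole string is the bare ID
        return "", prefixed_id
    return prefix, entry_id


# Line copied from the output above
line = "path:map00910\tNitrogen metabolism"
prefixed_id, name = line.split("\t")
result = split_kegg_id(prefixed_id)
print(result)
```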



## Getting entry information from an ID
Usually we want more information about an ID. For that we can use the `kegg_get` function.

For example, to get more information on "Nitrogen metabolism":

In [162]:
print(REST.kegg_get("map00910").read())

ENTRY       map00910                    Pathway
NAME        Nitrogen metabolism
DESCRIPTION The biological process of the nitrogen cycle is a complex interplay among many microorganisms catalyzing different reactions, where nitrogen is found in various oxidation states ranging from +5 in nitrate to -3 in ammonia. The core nitrogen cycle involves four reduction pathways and two oxidation pathways. Nitrogen fixation [MD:M00175] is the process of reducing atmospheric molecular nitrogen to ammonia, a biologically useful reduced form incorporated into amino acids and other vital compounds. The ability of fixing atmospheric nitrogen by the nitrogenase enzyme complex is present in restricted prokaryotes (diazotrophs). The other reduction pathways are assimilatory nitrate reduction [MD:M00531] and dissimilatory nitrate reduction [MD:M00530] both for conversion to ammonia, and denitrification [MD:M00529]. Denitrification is a respiration in which nitrate or nitrite is reduced as a terminal elec
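
The entry text is a flat file in which each field name (ENTRY, NAME, DESCRIPTION, ...) starts at the beginning of a line. A rough way to work with it is to group lines by field name. The sketch below assumes that continuation lines are indented and that an entry ends with `///`. The helper name `parse_flat_entry` is hypothetical, and the MODULE lines in the sample are reconstructed for illustration rather than copied from a live response.

```python
def parse_flat_entry(text):
    """Group the lines of a KEGG flat-file entry by field name."""
    fields = {}
    current = None
    for line in text.splitlines():
        if line.startswith("///"):  # assumed entry terminator
            break
        if not line.strip():
            continue
        if not line[0].isspace():
            # A new field starts at column 0; the value follows the key
            key, _, value = line.partition(" ")
            current = key
            fields.setdefault(current, []).append(value.strip())
        elif current is not None:
            # Indented lines continue the previous field
            fields[current].append(line.strip())
    return fields


sample = (
    "ENTRY       map00910                    Pathway\n"
    "NAME        Nitrogen metabolism\n"
    "MODULE      M00175  Nitrogen fixation, nitrogen => ammonia [PATH:map00910]\n"
    "            M00528  Nitrification, ammonia => nitrite [PATH:map00910]\n"
    "///\n"
)
entry = parse_flat_entry(sample)
print(entry["NAME"])
```

This treats every indented line as a continuation, so nested sub-fields (if any appear in an entry) end up under their parent field. That is usually enough for quick exploration.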

The entry also lists all the modules associated with the pathway. We can extract
them with a regular expression (regex):

In [198]:
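# Capture the module ID, the module name, and the pathway ID from each module line
pattern = r"(M0\d+)\s+(.*)\s\[PATH:(.*)\]"

# Fetch the entry text and collect every (ModuleID, Name, mapID) match
modules_info = re.findall(pattern, REST.kegg_get("map00910").read())

df_modules = pd.DataFrame(modules_info, columns=["ModuleID", "Name", "mapID"])
df_modules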

| | ModuleID | Name | mapID |
|---|---|---|---|
| 0 | M00175 | Nitrogen fixation, nitrogen => ammonia | map00910 |
| 1 | M00528 | Nitrification, ammonia => nitrite | map00910 |
| 2 | M00529 | Denitrification, nitrate => nitrogen | map00910 |
| 3 | M00530 | Dissimilatory nitrate reduction, nitrate => am... | map00910 |
| 4 | M00531 | Assimilatory nitrate reduction, nitrate => amm... | map00910 |
| 5 | M00804 | Complete nitrification, comammox, ammonia => n... | map00910 |
| 6 | M00973 | Anammox, nitrite + ammonia => nitrogen | map00910 |
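
The pattern above relies on greedy `.*` groups, which works because `.` does not cross line breaks. A slightly stricter variant with named groups can make the intent easier to read. The sketch below assumes module IDs are `M` followed by five digits and that each module line ends with a single `[PATH:...]` tag. The sample lines are reconstructed from the table above for illustration, and `extract_modules` is a hypothetical helper.

```python
import re

# Assumed line shape: "<module id>  <name> [PATH:<map id>]"
MODULE_LINE = re.compile(
    r"(?P<module_id>M\d{5})\s+(?P<name>.+?)\s+\[PATH:(?P<map_id>[^\]]+)\]"
)


def extract_modules(text):
    """Return one dict per module line with its ID, name, and pathway ID."""
    return [match.groupdict() for match in MODULE_LINE.finditer(text)]


sample = (
    "MODULE      M00175  Nitrogen fixation, nitrogen => ammonia [PATH:map00910]\n"
    "            M00528  Nitrification, ammonia => nitrite [PATH:map00910]\n"
)
modules = extract_modules(sample)
print(modules[0]["name"])
```

A list of dicts like this can be passed directly to `pd.DataFrame(...)`, with the group names becoming the column names.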
