# <img src="https://github.com/JuliaLang/julia-logo-graphics/raw/master/images/julia-logo-color.png" height="100" /> _Colab Notebook Template_

## Instructions
1. Work on a copy of this notebook: _File_ > _Save a copy in Drive_ (you will need a Google account). Alternatively, you can download the notebook using _File_ > _Download .ipynb_, then upload it to [Colab](https://colab.research.google.com/).
2. If you need a GPU: _Runtime_ > _Change runtime type_ > _Hardware accelerator_ = _GPU_.
3. Execute the following cell (click on it and press Ctrl+Enter) to install Julia, IJulia and other packages (if needed, update `JULIA_VERSION` and the other parameters). This takes a couple of minutes.
4. Reload this page (press Ctrl+R, or ⌘+R, or the F5 key) and continue to the next section.

_Notes_:
* If your Colab Runtime gets reset (e.g., due to inactivity), repeat steps 2, 3 and 4.
* After installation, if you want to change the Julia version or activate/deactivate the GPU, you will need to reset the Runtime: _Runtime_ > _Factory reset runtime_ and repeat steps 3 and 4.

In [None]:
%%shell
set -e

#---------------------------------------------------#
JULIA_VERSION="1.8.2" # any version ≥ 0.7.0
JULIA_PACKAGES="IJulia BenchmarkTools"
JULIA_PACKAGES_IF_GPU="CUDA" # or CuArrays for older Julia versions
JULIA_NUM_THREADS=2
#---------------------------------------------------#

if [ -z `which julia` ]; then
  # Install Julia
  JULIA_VER=`cut -d '.' -f -2 <<< "$JULIA_VERSION"`
  echo "Installing Julia $JULIA_VERSION on the current Colab Runtime..."
  BASE_URL="https://julialang-s3.julialang.org/bin/linux/x64"
  URL="$BASE_URL/$JULIA_VER/julia-$JULIA_VERSION-linux-x86_64.tar.gz"
  wget -nv $URL -O /tmp/julia.tar.gz # -nv means "not verbose"
  tar -x -f /tmp/julia.tar.gz -C /usr/local --strip-components 1
  rm /tmp/julia.tar.gz

  # Install Packages
  nvidia-smi -L &> /dev/null && export GPU=1 || export GPU=0
  if [ $GPU -eq 1 ]; then
    JULIA_PACKAGES="$JULIA_PACKAGES $JULIA_PACKAGES_IF_GPU"
  fi
  for PKG in `echo $JULIA_PACKAGES`; do
    echo "Installing Julia package $PKG..."
    julia -e 'using Pkg; pkg"add '$PKG'; precompile;"' &> /dev/null
  done

  # Install kernel and rename it to "julia"
  echo "Installing IJulia kernel..."
  julia -e 'using IJulia; IJulia.installkernel("julia", env=Dict(
      "JULIA_NUM_THREADS"=>"'"$JULIA_NUM_THREADS"'"))'
  KERNEL_DIR=`julia -e "using IJulia; print(IJulia.kerneldir())"`
  KERNEL_NAME=`ls -d "$KERNEL_DIR"/julia*`
  mv -f $KERNEL_NAME "$KERNEL_DIR"/julia

  echo ''
  echo "Successfully installed `julia -v`!"
  echo "Please reload this page (press Ctrl+R, ⌘+R, or the F5 key) then"
  echo "jump to the 'Checking the Installation' section."
fi

Installing Julia 1.8.2 on the current Colab Runtime...
2023-07-30 19:35:33 URL:https://julialang-s3.julialang.org/bin/linux/x64/1.8/julia-1.8.2-linux-x86_64.tar.gz [135859273/135859273] -> "/tmp/julia.tar.gz" [1]
Installing Julia package IJulia...
Installing Julia package BenchmarkTools...
Installing IJulia kernel...
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mInstalling julia kernelspec in /root/.local/share/jupyter/kernels/julia-1.8

Successfully installed julia version 1.8.2!
Please reload this page (press Ctrl+R, ⌘+R, or the F5 key) then
jump to the 'Checking the Installation' section.

## Checking the Installation




In [None]:
versioninfo()

Julia Version 1.8.2
Commit 36034abf260 (2022-09-29 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 2 × Intel(R) Xeon(R) CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, broadwell)
  Threads: 2 on 2 virtual cores
Environment:
  LD_LIBRARY_PATH = /usr/local/nvidia/lib:/usr/local/nvidia/lib64
  JULIA_NUM_THREADS = 2


## Install Packages

In [None]:
using Pkg

In [None]:
Pkg.add("Revise")
Pkg.add("DataFrames")
#Pkg.add("BenchmarkTools")
Pkg.add(url="https://github.com/bwbioinfo/KEGGAPI.jl")

[32m[1m    Updating[22m[39m registry at `~/.julia/registries/General.toml`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m   Installed[22m[39m OrderedCollections ─ v1.6.2
[32m[1m   Installed[22m[39m CodeTracking ─────── v1.3.1
[32m[1m   Installed[22m[39m Requires ─────────── v1.3.0
[32m[1m   Installed[22m[39m JuliaInterpreter ─── v0.9.23
[32m[1m   Installed[22m[39m LoweredCodeUtils ─── v2.3.0
[32m[1m   Installed[22m[39m Revise ───────────── v3.5.3
[32m[1m    Updating[22m[39m `~/.julia/environments/v1.8/Project.toml`
 [90m [295af30f] [39m[92m+ Revise v3.5.3[39m
[32m[1m    Updating[22m[39m `~/.julia/environments/v1.8/Manifest.toml`
 [90m [da1fd8a2] [39m[92m+ CodeTracking v1.3.1[39m
 [90m [aa1ae85d] [39m[92m+ JuliaInterpreter v0.9.23[39m
 [90m [6f1432cf] [39m[92m+ LoweredCodeUtils v2.3.0[39m
 [90m [bac558e1] [39m[92m+ OrderedCollections v1.6.2[39m
 [90m [ae029012] [39m[92m+ Requires v1.3.0[39m
 [90m [295af30f] [39m

In [None]:
using Revise
using DataFrames
using BenchmarkTools
using KEGGAPI


In [None]:
# Get information about the KEGG database
kegg_info = KEGGAPI.info("kegg");
print(kegg_info)

kegg             Kyoto Encyclopedia of Genes and Genomes
kegg             Release 107.0+/07-28, Jul 23
                 Kanehisa Laboratories
                 pathway   1,068,402 entries
                 brite       357,981 entries
                 module          556 entries
                 orthology    25,943 entries
                 genome       22,621 entries
                 genes     49,148,640 entries
                 compound     19,119 entries
                 glycan       11,222 entries
                 reaction     11,941 entries
                 rclass        3,195 entries
                 enzyme        8,056 entries
                 network       1,508 entries
                 variant       1,180 entries
                 disease       2,645 entries
                 drug         12,184 entries
                 dgroup        2,435 entries


In [None]:
# Get a list of pathways in the KEGG database
kegg_pathways = KEGGAPI.list("pathway");
DataFrame(
    kegg_pathways.data,
    kegg_pathways.colnames
    )

| Row | ID | Pathway |
| --- | --- | --- |
|  | String | String |
| 1 | map01100 | Metabolic pathways |
| 2 | map01110 | Biosynthesis of secondary metabolites |
| 3 | map01120 | Microbial metabolism in diverse environments |
| 4 | map01200 | Carbon metabolism |
| 5 | map01210 | 2-Oxocarboxylic acid metabolism |
| 6 | map01212 | Fatty acid metabolism |
| 7 | map01230 | Biosynthesis of amino acids |
| 8 | map01232 | Nucleotide metabolism |
| 9 | map01250 | Biosynthesis of nucleotide sugars |
| 10 | map01240 | Biosynthesis of cofactors |


In [None]:
kegg_pathways.data[1]

565-element Vector{String}:
 "map01100"
 "map01110"
 "map01120"
 "map01200"
 "map01210"
 "map01212"
 "map01230"
 "map01232"
 "map01250"
 "map01240"
 "map01220"
 "map00010"
 "map00020"
 ⋮
 "map07216"
 "map07219"
 "map07024"
 "map07217"
 "map07218"
 "map07025"
 "map07034"
 "map07035"
 "map07110"
 "map07112"
 "map07114"
 "map07117"

In [None]:
# Get a list of human (hsa) pathways in the KEGG database
kegg_pathways_human = KEGGAPI.list("pathway/hsa");
DataFrame(
    kegg_pathways_human.data,
    kegg_pathways_human.colnames
    )

| Row | ID | Pathway |
| --- | --- | --- |
|  | String | String |
| 1 | hsa01100 | Metabolic pathways - Homo sapiens (human) |
| 2 | hsa01200 | Carbon metabolism - Homo sapiens (human) |
| 3 | hsa01210 | 2-Oxocarboxylic acid metabolism - Homo sapiens (human) |
| 4 | hsa01212 | Fatty acid metabolism - Homo sapiens (human) |
| 5 | hsa01230 | Biosynthesis of amino acids - Homo sapiens (human) |
| 6 | hsa01232 | Nucleotide metabolism - Homo sapiens (human) |
| 7 | hsa01250 | Biosynthesis of nucleotide sugars - Homo sapiens (human) |
| 8 | hsa01240 | Biosynthesis of cofactors - Homo sapiens (human) |
| 9 | hsa00010 | Glycolysis / Gluconeogenesis - Homo sapiens (human) |
| 10 | hsa00020 | Citrate cycle (TCA cycle) - Homo sapiens (human) |


In [None]:
join(kegg_pathways_human.data[1][1:30], "+")

"hsa01100+hsa01200+hsa01210+hsa01212+hsa01230+hsa01232+hsa01250+hsa01240+hsa00010+hsa00020+hsa00030+hsa00040+hsa00051+hsa00052+hsa00053+hsa00500+hsa00520+hsa00620+hsa00630+hsa00640+hsa00650+hsa00562+hsa00190+hsa00910+hsa00920+hsa00061+hsa00062+hsa00071+hsa00100+hsa00120"

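The `+`-joined string above follows the KEGG REST convention of separating multiple entry IDs with `+`. The sketch below shows one way to split a long ID vector into several such query strings. `batch_queries` and its `batch_size` keyword are hypothetical names, not part of KEGGAPI.jl, and the default of 10 is an assumption that should be checked against the limits of the KEGG operation being called.

```julia
# Split a vector of KEGG IDs into "+"-joined query strings, each holding
# at most `batch_size` IDs. `batch_queries` is a hypothetical helper,
# not a KEGGAPI.jl function.
function batch_queries(ids::AbstractVector{<:AbstractString}; batch_size::Int=10)
    batch_size > 0 || throw(ArgumentError("batch_size must be positive"))
    # Iterators.partition yields consecutive chunks; the last may be shorter.
    return [join(chunk, "+") for chunk in Iterators.partition(ids, batch_size)]
end

# A few IDs taken from the human pathway listing in this notebook.
ids = ["hsa01100", "hsa01200", "hsa01210", "hsa01212", "hsa01230"]
queries = batch_queries(ids; batch_size=2)
println(queries)
```

Each element of `queries` can then be passed wherever a multi-entry ID string is accepted.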
In [None]:
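# Find the pathways linked to each human (hsa) gene entry.
# Assign outside @btime: a variable assigned inside the macro stays local to the benchmark.
kegg_link_pathway = @btime KEGGAPI.link("pathway", "hsa")
DataFrame(
    kegg_link_pathway.data,
    kegg_link_pathway.colnames
    )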

  1.604 s (182556 allocations: 20.79 MiB)


| Row | Target ID | Source ID |
| --- | --- | --- |
|  | String | String |
| 1 | hsa:10327 | path:hsa00010 |
| 2 | hsa:124 | path:hsa00010 |
| 3 | hsa:125 | path:hsa00010 |
| 4 | hsa:126 | path:hsa00010 |
| 5 | hsa:127 | path:hsa00010 |
| 6 | hsa:128 | path:hsa00010 |
| 7 | hsa:130 | path:hsa00010 |
| 8 | hsa:130589 | path:hsa00010 |
| 9 | hsa:131 | path:hsa00010 |
| 10 | hsa:160287 | path:hsa00010 |
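
The link result pairs one gene ID with one pathway ID per row, so a common next step is to collect the genes belonging to each pathway. The following is a minimal sketch in plain Julia, assuming the two columns are available as equal-length string vectors (as `data[1]` and `data[2]` appear to be here). `group_by_pathway` is a hypothetical helper and the sample IDs are placeholders, not real KEGG entries.

```julia
# Group gene IDs by pathway ID. `group_by_pathway` is a hypothetical helper,
# not part of KEGGAPI.jl.
function group_by_pathway(genes::AbstractVector{<:AbstractString},
                          pathways::AbstractVector{<:AbstractString})
    length(genes) == length(pathways) ||
        throw(DimensionMismatch("genes and pathways must have the same length"))
    groups = Dict{String,Vector{String}}()
    for (gene, pathway) in zip(genes, pathways)
        # Create the pathway's gene list on first sight, then append to it.
        push!(get!(() -> String[], groups, pathway), gene)
    end
    return groups
end

# Placeholder data with the same two-column shape as the link result.
genes    = ["g1", "g2", "g3"]
pathways = ["p1", "p1", "p2"]
groups = group_by_pathway(genes, pathways)
println(groups["p1"])
```

With the real result, the call would presumably look like `group_by_pathway(kegg_link_pathway.data[1], kegg_link_pathway.data[2])`.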


In [None]:
kegg_link_pathway.data[1]

36385-element Vector{String}:
 "hsa:10327"
 "hsa:124"
 "hsa:125"
 "hsa:126"
 "hsa:127"
 "hsa:128"
 "hsa:130"
 "hsa:130589"
 "hsa:131"
 "hsa:160287"
 "hsa:1737"
 "hsa:1738"
 "hsa:2023"
 ⋮
 "hsa:8517"
 "hsa:857"
 "hsa:858"
 "hsa:859"
 "hsa:8878"
 "hsa:90"
 "hsa:9181"
 "hsa:91860"
 "hsa:92"
 "hsa:93"
 "hsa:9446"
 "hsa:9817"

In [None]:
# Get a list of organisms in the KEGG database.
# Assign outside @btime so that kegg_organisms is defined after the benchmark.
kegg_organisms = @btime KEGGAPI.list("organism");
DataFrame(
    kegg_organisms.data,
    kegg_organisms.colnames
    )

  1.697 s (83390 allocations: 10.93 MiB)


| Row | T. number | Organism | Species | Phylogeny |
| --- | --- | --- | --- | --- |
|  | String | String | String | String |
| 1 | T01001 | hsa | Homo sapiens (human) | Eukaryotes;Animals;Vertebrates;Mammals |
| 2 | T01005 | ptr | Pan troglodytes (chimpanzee) | Eukaryotes;Animals;Vertebrates;Mammals |
| 3 | T02283 | pps | Pan paniscus (bonobo) | Eukaryotes;Animals;Vertebrates;Mammals |
| 4 | T02442 | ggo | Gorilla gorilla gorilla (western lowland gorilla) | Eukaryotes;Animals;Vertebrates;Mammals |
| 5 | T01416 | pon | Pongo abelii (Sumatran orangutan) | Eukaryotes;Animals;Vertebrates;Mammals |
| 6 | T03265 | nle | Nomascus leucogenys (northern white-cheeked gibbon) | Eukaryotes;Animals;Vertebrates;Mammals |
| 7 | T08803 | hmh | Hylobates moloch (silvery gibbon) | Eukaryotes;Animals;Vertebrates;Mammals |
| 8 | T01028 | mcc | Macaca mulatta (rhesus monkey) | Eukaryotes;Animals;Vertebrates;Mammals |
| 9 | T02918 | mcf | Macaca fascicularis (crab-eating macaque) | Eukaryotes;Animals;Vertebrates;Mammals |
| 10 | T08579 | mthb | Macaca thibetana thibetana (Pere David's macaque) | Eukaryotes;Animals;Vertebrates;Mammals |


In [None]:
# Find entries in the compound database related to glucose
@time kegg_find_compound = KEGGAPI.find("compound", "glucose")
DataFrame(
    kegg_find_compound.data,
    kegg_find_compound.colnames
    )

In [None]:
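# Download the pathway map image for hsa00010 (human glycolysis / gluconeogenesis)
@time kegg_image = KEGGAPI.get_image("hsa00010")
isa(kegg_image, Vector)  # check that the image was returned as a Vector
@time KEGGAPI.save_image(kegg_image, "glycolysis.png")  # write the image to a PNG file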

In [None]:
# Map NCBI Gene IDs to KEGG gene IDs for E. coli (eco)
@time kegg_conv_eco = KEGGAPI.conv("eco", "ncbi-geneid")
DataFrame(
    kegg_conv_eco.data,
    kegg_conv_eco.colnames
    )

  2.131120 seconds (23.56 k allocations: 2.993 MiB)


| Row | Target ID | Source ID |
| --- | --- | --- |
|  | String | String |
| 1 | ncbi-geneid:944742 | eco:b0001 |
| 2 | ncbi-geneid:945803 | eco:b0002 |
| 3 | ncbi-geneid:947498 | eco:b0003 |
| 4 | ncbi-geneid:945198 | eco:b0004 |
| 5 | ncbi-geneid:944747 | eco:b0005 |
| 6 | ncbi-geneid:944749 | eco:b0006 |
| 7 | ncbi-geneid:944745 | eco:b0007 |
| 8 | ncbi-geneid:944748 | eco:b0008 |
| 9 | ncbi-geneid:944760 | eco:b0009 |
| 10 | ncbi-geneid:944792 | eco:b0010 |
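
Every ID in this result carries a database prefix (`ncbi-geneid:` or `eco:`), which often has to be removed before joining against other tables. The sketch below strips the prefix in plain Julia and parses the NCBI Gene IDs as integers. `strip_prefix` is a hypothetical helper, not part of KEGGAPI.jl, and the sample values are copied from the first rows of the table.

```julia
# Drop everything up to and including the first ":" of a prefixed ID.
# `strip_prefix` is a hypothetical helper, not a KEGGAPI.jl function.
strip_prefix(id::AbstractString) = last(split(id, ":"; limit=2))

# First rows of the conversion table shown in this notebook.
ncbi_ids = ["ncbi-geneid:944742", "ncbi-geneid:945803"]
eco_ids  = ["eco:b0001", "eco:b0002"]

ncbi_numeric = parse.(Int, strip_prefix.(ncbi_ids))  # integers for the NCBI side
eco_loci     = String.(strip_prefix.(eco_ids))       # plain locus tags for the KEGG side

println(ncbi_numeric)
println(eco_loci)
```

An ID without a `:` is returned unchanged by `strip_prefix`, so the helper is safe to apply to already-cleaned values.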


In [None]:
kegg_conv_eco.data[1] |> println

["ncbi-geneid:944742", "ncbi-geneid:945803", "ncbi-geneid:947498", "ncbi-geneid:945198", "ncbi-geneid:944747", "ncbi-geneid:944749", "ncbi-geneid:944745", "ncbi-geneid:944748", "ncbi-geneid:944760", "ncbi-geneid:944792", "ncbi-geneid:944771", "ncbi-geneid:948295", "ncbi-geneid:944751", "ncbi-geneid:944750", "ncbi-geneid:944753", "ncbi-geneid:944754", "ncbi-geneid:944756", "ncbi-geneid:944758", "ncbi-geneid:944757", "ncbi-geneid:944743", "ncbi-geneid:948449", "ncbi-geneid:944759", "ncbi-geneid:949128", "ncbi-geneid:949129", "ncbi-geneid:944761", "ncbi-geneid:944800", "ncbi-geneid:944807", "ncbi-geneid:944777", "ncbi-geneid:944796", "ncbi-geneid:944762", "ncbi-geneid:949025", "ncbi-geneid:944775", "ncbi-geneid:944795", "ncbi-geneid:948999", "ncbi-geneid:948995", "ncbi-geneid:944886", "ncbi-geneid:948997", "ncbi-geneid:949064", "ncbi-geneid:944765", "ncbi-geneid:947316", "ncbi-geneid:948939", "ncbi-geneid:948958", "ncbi-geneid:948590", "ncbi-geneid:944766", "ncbi-geneid:944767", "ncbi-gen