# ComPath - Pathway name similarity comparison


##### Author: Daniel Domingo-Fernandez

The goal of this notebook is to support mapping KEGG pathways to their corresponding Reactome pathways. Pathway names from the two repositories are compared by string similarity, and pairs above a similarity threshold are listed as candidates that can later be confirmed manually as representing the same pathway.
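The name comparison below relies on `difflib.SequenceMatcher.ratio()`, which returns 2M/T, where M is the number of characters in matching blocks and T is the combined length of both strings. A minimal sketch that checks this on name pairs that also appear in the mapping output. Note that the comparison is case-sensitive, so capitalization differences alone lower the score.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity in [0, 1]: 2 * matched characters / total characters."""
    return SequenceMatcher(None, a, b).ratio()


# Identical names score 1.0
print(name_similarity("Apoptosis", "Apoptosis"))  # 1.0

# "Cell " and "ycle" match (9 characters) out of 20 characters in total
print(name_similarity("Cell cycle", "Cell Cycle"))  # 0.9

# "Base ", "xcision " and "epair" match (18 characters) out of 40
print(name_similarity("Base excision repair", "Base Excision Repair"))  # 0.9
```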

### Notebook imports

In [1]:
import os
import sys
import time
from difflib import SequenceMatcher

import pandas as pd

### Notebook configuration

In [2]:
time.asctime()

'Mon Jan 29 10:53:11 2018'

In [3]:
print(sys.version)

3.4.5 (default, Dec 11 2017, 14:22:24) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]


In [4]:
# Root of the ComPath repository, read from the COMPATH environment variable
COMPATH_PATH = os.environ['COMPATH']
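`os.environ['COMPATH']` raises a bare `KeyError` when the variable is not set. A minimal sketch of a lookup with a more readable error follows; the helper name `get_compath_path` and its message are illustrative assumptions, not part of ComPath.

```python
import os


def get_compath_path(env_var="COMPATH"):
    """Return the directory named by ``env_var`` (hypothetical helper).

    Raises RuntimeError with an explanatory message instead of a bare
    KeyError when the environment variable is not set.
    """
    path = os.environ.get(env_var)
    if path is None:
        raise RuntimeError(
            "Set the {} environment variable to the root of the ComPath "
            "repository".format(env_var)
        )
    return path
```

With this helper, the cell above would become `COMPATH_PATH = get_compath_path()`.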

### Load resources

In [5]:
kegg_csv = os.path.join(COMPATH_PATH, 'src', 'compath', 'static', 'resources', 'excel', 'kegg_gene_sets.csv')
reactome_csv = os.path.join(COMPATH_PATH, 'src', 'compath', 'static', 'resources', 'excel', 'reactome_gene_sets.csv')

# Each column holds one gene set; the column header is the pathway name
kegg_dataframe = pd.read_csv(kegg_csv, dtype=object)
reactome_dataframe = pd.read_csv(reactome_csv, dtype=object)

# Strip the ' - Homo sapiens (human)' suffix from the KEGG pathway names
kegg_dataframe.columns = [
    kegg_pathway.replace(' - Homo sapiens (human)', '')
    for kegg_pathway in kegg_dataframe.columns
]
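The loading step treats every CSV column as one pathway, with its name as the header. The toy example below assumes the genes are listed down each column (a layout assumption) and uses made-up gene lists that are not taken from the ComPath resources. It shows the suffix stripping, and that `pd.read_csv` pads shorter gene sets with NaN, which `dropna()` removes.

```python
import io

import pandas as pd

SPECIES_SUFFIX = ' - Homo sapiens (human)'

# Hypothetical gene sets: one column per pathway, genes listed down each column
toy_csv = io.StringIO(
    "Apoptosis - Homo sapiens (human),Cell cycle - Homo sapiens (human)\n"
    "CASP3,CDK1\n"
    "BAX,CCNB1\n"
    ",CDC20\n"
)

toy_dataframe = pd.read_csv(toy_csv, dtype=object)

# Drop the species suffix from every column name
toy_dataframe = toy_dataframe.rename(columns=lambda name: name.replace(SPECIES_SUFFIX, ''))

# Shorter gene sets are padded with NaN, so drop missing values per column
gene_sets = {
    pathway: set(toy_dataframe[pathway].dropna())
    for pathway in toy_dataframe.columns
}

print(list(toy_dataframe.columns))  # ['Apoptosis', 'Cell cycle']
print(sorted(gene_sets['Apoptosis']))  # ['BAX', 'CASP3']
```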


### Pathway name-based mapping

In [6]:
# Minimum SequenceMatcher ratio for a Reactome name to count as a candidate match
SIMILARITY_THRESHOLD = 0.8

# One set of candidate Reactome pathways per KEGG pathway, in KEGG column order
name_based_mapping = list()

for kegg_pathway in kegg_dataframe.columns.values:

    matching = set()

    for reactome_pathway in reactome_dataframe.columns.values:

        sim_value = SequenceMatcher(None, kegg_pathway, reactome_pathway).ratio()

        # Skip Reactome names below the similarity threshold
        if sim_value < SIMILARITY_THRESHOLD:
            continue

        matching.add(reactome_pathway)

    # Append even when nothing matched, so entries stay aligned with the KEGG columns
    name_based_mapping.append(matching)

    if not matching:
        continue

    print("{} has the following matching {}".format(kegg_pathway, matching))


Antigen processing and presentation has the following matching {'Antigen processing-Cross presentation'}
Apelin signaling pathway has the following matching {'Reelin signalling pathway'}
Apoptosis has the following matching {'Apoptosis'}
Arachidonic acid metabolism has the following matching {'Arachidonic acid metabolism'}
Arginine biosynthesis has the following matching {'Agmatine biosynthesis', 'Pyrimidine biosynthesis', 'Androgen biosynthesis', 'Serine biosynthesis'}
Axon guidance has the following matching {'Axon guidance'}
Base excision repair has the following matching {'Base Excision Repair'}
Biotin metabolism has the following matching {'Nicotinate metabolism'}
Caffeine metabolism has the following matching {'Creatine metabolism'}
Calcium signaling pathway has the following matching {'Reelin signalling pathway'}
Cell cycle has the following matching {'Cell Cycle'}
Cellular senescence has the following matching {'Cellular Senescence'}
Chemokine signaling pathway has the followin
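The output shows why the candidates need manual review: pairs such as 'Apelin signaling pathway' / 'Reelin signalling pathway' and 'Biotin metabolism' / 'Nicotinate metabolism' pass the 0.8 threshold although they are different pathways. To make it easier to try other thresholds, the loop can be wrapped in a function. `map_by_name` below is a hypothetical helper, not part of ComPath. Its `ignore_case` option is a variation the notebook does not use: it lowercases both names before comparing.

```python
from difflib import SequenceMatcher


def map_by_name(source_names, target_names, threshold=0.8, ignore_case=False):
    """Return one set of candidate target names per source name.

    A target is a candidate when its SequenceMatcher ratio with the source
    name is at least ``threshold``. The result is aligned with
    ``source_names``: names without a candidate get an empty set.
    """
    def normalize(name):
        return name.lower() if ignore_case else name

    mapping = []
    for source in source_names:
        matches = {
            target
            for target in target_names
            if SequenceMatcher(None, normalize(source), normalize(target)).ratio() >= threshold
        }
        mapping.append(matches)
    return mapping


kegg_names = ['Base excision repair', 'Apoptosis', 'Axon guidance']
reactome_names = ['Base Excision Repair', 'Apoptosis', 'Cell Cycle']

print(map_by_name(kegg_names, reactome_names))
# [{'Base Excision Repair'}, {'Apoptosis'}, set()]
```

Keeping an empty set for unmatched names preserves the one-entry-per-KEGG-pathway alignment that the export step depends on.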

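##### Export column

The matches are printed in KEGG column order, one entry per KEGG pathway, so the output can be pasted as a column of the mapping overview table. Each entry lists the matching Reactome names separated by commas, or "." when there is no match.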
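Pasting printed lines into a spreadsheet only works while the lines stay aligned with the KEGG pathway order. As an alternative sketch, which is not what the notebook does, both columns can be put into a DataFrame and written to CSV so each KEGG pathway stays on the same row as its candidates. The helper `build_mapping_table` and the column headers are hypothetical, and the inputs are toy values.

```python
import io

import pandas as pd


def build_mapping_table(kegg_names, name_based_mapping):
    """Pair each KEGG pathway with its comma-separated Reactome candidates.

    Pathways without a candidate get "." as a placeholder, mirroring the
    printed column.
    """
    return pd.DataFrame({
        'KEGG pathway': list(kegg_names),
        'Reactome candidates': [
            ', '.join(sorted(matches)) if matches else '.'
            for matches in name_based_mapping
        ],
    }, columns=['KEGG pathway', 'Reactome candidates'])


table = build_mapping_table(
    ['Apoptosis', 'Axon guidance', 'ABC transporters'],
    [{'Apoptosis'}, {'Axon guidance'}, set()],
)

# Written to an in-memory buffer here; pass a file path to to_csv to save to disk
buffer = io.StringIO()
table.to_csv(buffer, index=False)
print(buffer.getvalue())
```

In the notebook, the call would be `build_mapping_table(kegg_dataframe.columns, name_based_mapping)`. Sorting the candidates makes the exported text independent of set iteration order.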

In [7]:
for mapped_terms in name_based_mapping:

    # KEGG pathways without a candidate get "." as a placeholder,
    # so that every KEGG pathway occupies exactly one line
    if not mapped_terms:
        print(".")
        continue

    # Join the matches with commas instead of printing the set's curly brackets
    print(", ".join(mapped_terms))


.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Antigen processing-Cross presentation
Reelin signalling pathway
Apoptosis
.
Arachidonic acid metabolism
.
Agmatine biosynthesis, Pyrimidine biosynthesis, Androgen biosynthesis, Serine biosynthesis
.
.
.
.
.
.
Axon guidance
.
.
.
.
Base Excision Repair
.
.
.
Nicotinate metabolism
.
.
.
Creatine metabolism
Reelin signalling pathway
.
.
.
.
Cell Cycle
Cellular Senescence
.
.
.
Reelin signalling pathway
.
.
.
.
.
.
Citric acid cycle (TCA cycle)
.
.
.
.
.
.
.
.
.
DNA Replication
.
.
.
.
Ligand-receptor interactions
.
.
.
.
.
.
.
.
.
.
Fanconi Anemia Pathway
Digestion and absorption
Fatty acyl-CoA biosynthesis
.
.
Fatty acid metabolism
.
.
.
.
.
Agmatine biosynthesis, Wax biosynthesis, Fructose biosynthesis
.
.
.
Fructose metabolism, Glucose metabolism, Glycogen metabolism, Galactose catabolism
.
.
.
.
.
.
Glucose metabolism
Glycosphingolipid metabolism
Glycosp