# Conversion to other organisms

Most of the prior knowledge stored in `OmniPath` is derived from human data and is therefore annotated with human gene symbols. However, we can use homology to convert these gene names to other organisms.

To show how to do this in `decoupler`, we will load the `MSigDB` database and convert its gene symbols to mouse and fly.

In a clean environment, you first need to install `pypath`:

```bash
pip install pypath-omnipath
```

```python
import decoupler as dc

# Download the MSigDB gene sets through OmniPath
msigdb = dc.get_resource('MSigDB')
msigdb
```

| | genesymbol | collection | geneset |
|---|---|---|---|
| 0 | MAFF | chemical_and_genetic_perturbations | BOYAULT_LIVER_CANCER_SUBCLASS_G56_DN |
| 1 | MAFF | chemical_and_genetic_perturbations | ELVIDGE_HYPOXIA_UP |
| 2 | MAFF | chemical_and_genetic_perturbations | NUYTTEN_NIPP1_TARGETS_DN |
| 3 | MAFF | immunesigdb | GSE17721_POLYIC_VS_GARDIQUIMOD_4H_BMDC_DN |
| 4 | MAFF | chemical_and_genetic_perturbations | SCHAEFFER_PROSTATE_DEVELOPMENT_12HR_UP |
| ... | ... | ... | ... |
| 3838543 | PRAMEF22 | go_biological_process | GOBP_POSITIVE_REGULATION_OF_CELL_POPULATION_PR... |
| 3838544 | PRAMEF22 | go_biological_process | GOBP_APOPTOTIC_PROCESS |
| 3838545 | PRAMEF22 | go_biological_process | GOBP_REGULATION_OF_CELL_DEATH |
| 3838546 | PRAMEF22 | go_biological_process | GOBP_NEGATIVE_REGULATION_OF_DEVELOPMENTAL_PROCESS |


For this example, we will keep only the `hallmark` gene set collection:

```python
# Filter by hallmark
msigdb = msigdb[msigdb['collection'] == 'hallmark']

# Remove duplicated entries
msigdb = msigdb[~msigdb.duplicated(['geneset', 'genesymbol'])]
msigdb
```

| | genesymbol | collection | geneset |
|---|---|---|---|
| 233 | MAFF | hallmark | HALLMARK_IL2_STAT5_SIGNALING |
| 250 | MAFF | hallmark | HALLMARK_COAGULATION |
| 270 | MAFF | hallmark | HALLMARK_HYPOXIA |
| 373 | MAFF | hallmark | HALLMARK_TNFA_SIGNALING_VIA_NFKB |
| 377 | MAFF | hallmark | HALLMARK_COMPLEMENT |
| ... | ... | ... | ... |
| 1449668 | STXBP1 | hallmark | HALLMARK_PANCREAS_BETA_CELLS |
| 1450315 | ELP4 | hallmark | HALLMARK_PANCREAS_BETA_CELLS |
| 1450526 | GCG | hallmark | HALLMARK_PANCREAS_BETA_CELLS |
| 1450731 | PCSK2 | hallmark | HALLMARK_PANCREAS_BETA_CELLS |


We can then translate the resulting resource into mouse genes. Organisms can be specified by their common name, Latin name or [NCBI Taxonomy identifier](https://www.ncbi.nlm.nih.gov/taxonomy).
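The three ways of naming an organism all resolve to the same species. As a rough illustration only, the sketch below normalizes a name or identifier to an NCBI Taxonomy ID. The `TAXIDS` table and the `to_taxid` helper are hypothetical and are not how `decoupler` resolves organisms internally; the table covers just the organisms used on this page.

```python
from typing import Union

# Hypothetical lookup table (NOT decoupler's internal one): a few well-known
# NCBI Taxonomy identifiers for the organisms used on this page.
TAXIDS = {
    'human': 9606, 'homo sapiens': 9606,
    'mouse': 10090, 'mus musculus': 10090,
    'rat': 10116, 'rattus norvegicus': 10116,
    'fly': 7227, 'fruit fly': 7227, 'drosophila melanogaster': 7227,
}


def to_taxid(organism: Union[str, int]) -> int:
    """Resolve a common name, Latin name or taxonomy ID to an NCBI Taxonomy ID."""
    if isinstance(organism, int):
        # Already a taxonomy identifier
        return organism
    key = str(organism).strip().lower()
    if key.isdigit():
        # Taxonomy identifier passed as a string, e.g. '7227'
        return int(key)
    try:
        return TAXIDS[key]
    except KeyError:
        raise ValueError(f'Unknown organism: {organism!r}') from None


print(to_taxid('mouse'), to_taxid('Mus musculus'), to_taxid(10090))
```

All three calls in the last line refer to the same organism, which is why `'mouse'`, `'Mus musculus'` and `10090` are interchangeable in the examples on this page.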

<div class="alert alert-info">

**Note**
    
Translating to an organism for the first time might take a while (~15 minutes). Because the data is cached, subsequent runs will be faster. If you need to reset the cache, run `rm -r .pypath/cache/`.

</div>
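The same reset can also be done from Python, for example from inside the notebook or on a system without `rm`. This is a minimal sketch that assumes, as the note does, that the cache lives in `.pypath/cache/` relative to the current working directory:

```python
import shutil
from pathlib import Path

# Same relative location as in the note (assumed relative to the working directory)
cache_dir = Path('.pypath') / 'cache'

# Delete the cached downloads; ignore_errors=True also skips the error
# raised when the folder does not exist
shutil.rmtree(cache_dir, ignore_errors=True)
```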

```python
# Translate targets
mouse_msigdb = dc.translate_net(
    msigdb,
    target_organism='mouse',
    unique_by=('geneset', 'genesymbol'),
)
mouse_msigdb
```


Note that homology conversion can gain or lose genes: a human gene may have several orthologs in the target organism, or none at all.
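To make these gains and losses concrete, here is a minimal pandas sketch of ortholog mapping. It is not how `translate_net` is implemented: the `translate_by_homology` function and the `orthologs` table are hypothetical, and the gene symbols (`H1` to `H4`, `t1a`, ...) are made up. A gene with two orthologs yields two rows, a gene without orthologs is dropped, and two genes mapping to the same ortholog create a duplicated pair, which we remove; we assume this is roughly what the `unique_by` argument guards against.

```python
import pandas as pd


def translate_by_homology(net, orthologs, gene_col='genesymbol',
                          unique_by=('geneset', 'genesymbol')):
    """Hypothetical sketch: map `gene_col` through a ('source', 'target') ortholog table."""
    # Inner join: genes without any ortholog disappear (loss)
    merged = net.merge(orthologs, left_on=gene_col, right_on='source', how='inner')
    # Replace the source symbol by its ortholog; one-to-many mappings add rows (gain)
    merged = merged.drop(columns=[gene_col, 'source']).rename(columns={'target': gene_col})
    # Many-to-one mappings can repeat a (geneset, gene) pair; keep a single copy
    merged = merged.drop_duplicates(subset=list(unique_by)).reset_index(drop=True)
    return merged[list(net.columns)]


# Toy gene sets in long format, one row per (geneset, genesymbol)
net = pd.DataFrame({
    'geneset': ['SET_A', 'SET_A', 'SET_A', 'SET_B', 'SET_B'],
    'genesymbol': ['H1', 'H2', 'H3', 'H1', 'H4'],
})

# Made-up orthology: H1 has two orthologs, H2 and H3 share one, H4 has none
orthologs = pd.DataFrame({
    'source': ['H1', 'H1', 'H2', 'H3'],
    'target': ['t1a', 't1b', 't2', 't2'],
})

mapped = translate_by_homology(net, orthologs)
print(mapped)
```

Here `SET_A` keeps three genes (`t1a`, `t1b`, `t2`) even though `H2` and `H3` collapsed into one, while `SET_B` loses `H4` but gains a second ortholog of `H1`.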

Now let us try the fruit fly, this time using its NCBI Taxonomy identifier (`7227`):

```python
# Translate targets
fly_msigdb = dc.translate_net(
    msigdb,
    target_organism=7227,
    unique_by=('genesymbol', 'geneset'),
)
fly_msigdb
```

The `translate_net` function gives finer control over the conversion, but in most cases it is enough to pass the desired organism to the functions that download the data:

```python
spw = dc.get_resource('SignaLink_pathway', organism='rat')
spw
```

PROGENy and CollecTRI have their own dedicated functions, which work in a similar way:

```python
dc.get_progeny(organism='Mus musculus')
```

```python
dc.get_collectri(organism='mouse')
```