# Build the Orphanet/HPO layer

The following code extracts disease-phenotype associations from Orphanet and creates two networks:
* a monoplex network with disease-phenotype associations: **`network/multiplex/Orpha/Disease_Phenotype.tsv`**
* a monoplex network with disease-phenotype and phenotype-phenotype associations: **`network/multiplex/Orpha/Disease_PhenotypeOntology.tsv`**

The identifiers used in these networks are described in the following files:
* **`data/OrphaDisease_HPO_extract.tsv`** contains OrphaCode, disease name, HPO ids and HPO terms
* **`data/hpo_terms.tsv`** contains all HPO ids and terms


## 1) Build disease-phenotype network from Orphanet Data

Download the Orphanet file `en_product4.xml` from https://www.orphadata.com/data/xml/en_product4.xml and save it in the `data/` directory.
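The download step can also be scripted. The following is a minimal sketch using only the standard library; the helper name `download_if_missing` is hypothetical and not part of the original notebook, and the URL is the one given above.

```python
import urllib.request
from pathlib import Path

# URL taken from the text above
ORPHADATA_URL = "https://www.orphadata.com/data/xml/en_product4.xml"


def download_if_missing(url, destination):
    """Download `url` to `destination` unless the file already exists.

    Hypothetical helper for illustration; returns the destination path.
    """
    destination = Path(destination)
    if destination.exists():
        # Keep the existing copy so repeated runs do not hit the server again
        return destination
    destination.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(destination))
    return destination
```

A call such as `download_if_missing(ORPHADATA_URL, "../data/en_product4.xml")` would place the file where the next cells expect it.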

In [1]:
import xml.etree.ElementTree as ET
import csv

In [2]:
tree = ET.parse("../data/en_product4.xml")
root = tree.getroot()

In [3]:
# Write two files in one pass: a descriptive TSV with the extracted data
# and the edge list used as the network file for MultiXrank
with open('../data/OrphaDisease_HPO_extract.tsv', 'w', newline='', encoding='utf-8') as info_file, \
     open('../network/multiplex/Orpha/Disease_Phenotype.tsv', 'w', newline='', encoding='utf-8') as net_file:
    info_writer = csv.writer(info_file, delimiter='\t')
    net_writer = csv.writer(net_file, delimiter='\t')

    # Iterate over disorders and their associated HPO terms
    for disorder in root.iter('Disorder'):
        orpha_code = "ORPHA:" + disorder.find('OrphaCode').text
        orpha_name = disorder.find('Name').text
        for hpo in disorder.iter("HPO"):
            hpo_id = hpo.find("HPOId").text
            hpo_term = hpo.find("HPOTerm").text
            # One descriptive row and one network edge per association
            info_writer.writerow([orpha_code, orpha_name, hpo_id, hpo_term])
            net_writer.writerow([orpha_code, hpo_id])
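To make the extraction logic easier to follow, here is a small self-contained sketch that runs the same traversal on an inline XML fragment. The fragment is a hypothetical, heavily simplified stand-in for `en_product4.xml` (the real file has more nesting and more fields), and its values are illustrative only. The function name `extract_associations` is an assumption introduced for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: only the tags used by the loop are kept
SAMPLE_XML = """
<Root>
  <Disorder>
    <OrphaCode>58</OrphaCode>
    <Name>Alexander disease</Name>
    <HPOList>
      <HPO><HPOId>HP:0000256</HPOId><HPOTerm>Macrocephaly</HPOTerm></HPO>
      <HPO><HPOId>HP:0001250</HPOId><HPOTerm>Seizure</HPOTerm></HPO>
    </HPOList>
  </Disorder>
</Root>
""".strip()


def extract_associations(root):
    """Return (orpha_code, disease_name, hpo_id, hpo_term) tuples."""
    rows = []
    for disorder in root.iter("Disorder"):
        orpha_code = "ORPHA:" + disorder.findtext("OrphaCode")
        orpha_name = disorder.findtext("Name")
        # iter() searches all descendants, so the nesting depth of HPO does not matter
        for hpo in disorder.iter("HPO"):
            rows.append((orpha_code, orpha_name,
                         hpo.findtext("HPOId"), hpo.findtext("HPOTerm")))
    return rows


rows = extract_associations(ET.fromstring(SAMPLE_XML))
for row in rows:
    print("\t".join(row))
```

Each tuple corresponds to one row of `OrphaDisease_HPO_extract.tsv`; keeping only the first and third fields gives one edge of the network file.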

## 2) Add HP ontology

Create a new tsv file containing previously extracted Disease-HPO associations AND the full HP ontology.

Download the HPO ontology in OBO format (`hp.obo`) from https://hpo.jax.org/app/data/ontology and save it in the `data/` directory.

In [4]:
import obonet
import networkx
import pandas as pd

In [5]:
# Read previously computed Disease-HPO monoplex
dis_hpo_net = pd.read_csv('../network/multiplex/Orpha/Disease_Phenotype.tsv', sep = '\t', header=None)
dis_hpo_net

| | 0 | 1 |
|---|---|---|
| 0 | ORPHA:58 | HP:0000256 |
| 1 | ORPHA:58 | HP:0001249 |
| 2 | ORPHA:58 | HP:0001250 |
| 3 | ORPHA:58 | HP:0001257 |
| 4 | ORPHA:58 | HP:0001274 |
| ... | ... | ... |
| 111760 | ORPHA:397596 | HP:0011110 |
| 111761 | ORPHA:397596 | HP:0012758 |
| 111762 | ORPHA:397596 | HP:0031692 |
| 111763 | ORPHA:397596 | HP:0031693 |


Load the HPO ontology, append its edges to `dis_hpo_net`, and store the result in **`network/multiplex/Orpha/Disease_PhenotypeOntology.tsv`**

In [6]:
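# Read the OBO file into a networkx graph
graph = obonet.read_obo("../data/hp.obo")

# Extract edges (obonet edges point from the child term to the parent term)
# and keep only the two endpoint columns
ontology = networkx.to_pandas_edgelist(graph)[['source', 'target']]
ontology.columns = [0, 1]

# Append the ontology edges to the disease-phenotype edges
full_net = pd.concat([dis_hpo_net, ontology], ignore_index=True)

# Write to TSV without header or index
full_net.to_csv("../network/multiplex/Orpha/Disease_PhenotypeOntology.tsv", sep='\t', header=False, index=False)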
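As a quick sanity check on the combined file, one could count how many edges are disease-phenotype and how many are phenotype-phenotype. The sketch below is not part of the original notebook: it uses only the standard library, classifies each edge by the prefix of its two identifiers, and runs on a small illustrative sample rather than the real file. The helper names are hypothetical.

```python
import csv
import io
from collections import Counter

# Illustrative sample in the same two-column, tab-separated layout
SAMPLE_TSV = (
    "ORPHA:58\tHP:0000256\n"
    "ORPHA:58\tHP:0001250\n"
    "HP:0000256\tHP:0000240\n"
)


def edge_kind(source, target):
    """Label an edge by the identifier prefixes of its endpoints."""
    return source.split(":")[0] + "-" + target.split(":")[0]


def count_edge_kinds(handle):
    """Count edges per kind in a tab-separated edge list."""
    reader = csv.reader(handle, delimiter="\t")
    return Counter(edge_kind(row[0], row[1]) for row in reader if len(row) >= 2)


counts = count_edge_kinds(io.StringIO(SAMPLE_TSV))
print(dict(counts))
```

To run it on the real output, replace the `io.StringIO(...)` argument with an open file handle on `Disease_PhenotypeOntology.tsv`.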

Store the correspondence between HPO ids and HPO terms in the TSV file **`data/hpo_terms.tsv`**

In [7]:
# Map each HPO id to its term name
id_to_name = {id_: data.get('name') for id_, data in graph.nodes(data=True)}

# Write one "id <tab> term" row per HPO node
with open('../data/hpo_terms.tsv', 'w', newline='', encoding='utf-8') as hpo_file:
    hpo_writer = csv.writer(hpo_file, delimiter='\t')
    for hpo_id, hpo_name in id_to_name.items():
        hpo_writer.writerow([hpo_id, hpo_name])
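The resulting file can later be used to turn HPO ids back into readable terms. A minimal sketch of that lookup is shown below, assuming the two-column layout written above; the function name `load_terms` and the sample rows are illustrative and not part of the original notebook.

```python
import csv
import io

# Illustrative sample in the layout of hpo_terms.tsv (id <tab> term)
SAMPLE_TERMS = "HP:0000256\tMacrocephaly\nHP:0001250\tSeizure\n"


def load_terms(handle):
    """Read an id-to-term mapping from a tab-separated file handle."""
    reader = csv.reader(handle, delimiter="\t")
    # Skip malformed rows that do not have both an id and a term
    return {row[0]: row[1] for row in reader if len(row) >= 2}


terms = load_terms(io.StringIO(SAMPLE_TERMS))

# Label a list of ids, falling back to a placeholder for unknown ones
labels = [terms.get(hpo_id, "unknown") for hpo_id in ["HP:0000256", "HP:9999999"]]
print(labels)
```

With the real file, pass `open('../data/hpo_terms.tsv', newline='', encoding='utf-8')` instead of the in-memory sample.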