# Query and filter data from NeDRex via the API

## 1. Query a filtered PPI from NeDRex
### Interactive Exercise: Explore the API Documentation

Before we start querying data, let's explore the NeDRex API documentation to understand the available parameters:

1. **Go to the NeDRex API documentation**: https://exbio.wzw.tum.de/repo4eu_nedrex_open/
2. **Navigate to PPI routes** (check the parameters of the protein-protein interaction endpoint)
3. **Find the correct parameter names for**:
   - Filtering by **reviewed proteins only** (hint: look for reviewed/curated protein parameters)
   - Filtering by **methods score**
   - Filtering by **evidence type**
4. **Update the `ppi_body` dictionary below** with the correct parameter names

### Current Filter Criteria
- Only reviewed proteins: True
- Evidence type: experimental (exp)
- Method score threshold: 13.75

In [None]:
import requests

class NeDRexService:
    # No trailing slash: routes are appended below as f"{API_LINK}/<route>"
    API_LINK = "https://exbio.wzw.tum.de/repo4eu_nedrex_open"


def getEdges(body):
    all_edges_reviewed_proteins = []
    upper_limit = body.get("limit", 10000)
    offset = 0

    while True:
        # "skip" is the pagination offset: it starts at 0 and is advanced by
        # "limit" after every full page to request the next set of edges
        body["skip"] = offset
        try:
            # POST request to the ppi endpoint with the body
            response = requests.post(
                url=f"{NeDRexService.API_LINK}/ppi", json=body, headers={"content-type": "application/json"}
            )
            response.raise_for_status()
            data = response.json()
            # collect all edges in a list
            all_edges_reviewed_proteins.extend(data)
            if len(data) < upper_limit:
                # This means that we have reached the end of the edges
                break
            # increase the offset by the upper limit to get the next set of edges
            offset += upper_limit
        except requests.exceptions.RequestException as e:
            print(f"HTTP request failed: {e}")
            return None
    return all_edges_reviewed_proteins
        
def createNedrexGraph(filename, body):
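    edges_review_proteins = getEdges(body)
    if not edges_review_proteins:
        # getEdges returns None if a request failed and [] if nothing matched
        print("No edges retrieved; nothing written.")
        return
    print("Number of queried edges overall: ", len(edges_review_proteins))
    print("Example edge: ", edges_review_proteins[0])

    # Write the edges to a CSV file in the format "memberOne,memberTwo".
    # removeprefix (Python 3.9+) drops exactly the "uniprot." prefix, whereas
    # lstrip(".uniprot") strips any leading run of those characters.
    with open(filename, "w") as f:
        for edge in edges_review_proteins:
            member_one = str(edge["memberOne"]).removeprefix("uniprot.")
            member_two = str(edge["memberTwo"]).removeprefix("uniprot.")
            f.write(f"{member_one},{member_two}\n")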


if __name__ == "__main__":
    # TODO: Update this body with the correct parameter names from the API documentation
    # Visit: https://exbio.wzw.tum.de/repo4eu_nedrex_open/ 
    # Look at PPI routes to find the correct parameter names for:
    # - Reviewed proteins filtering
    # - Methods score filtering
    # - Evidence type filtering
    ppi_body = {
        "limit": 10000,
        # Add your parameters here after checking the documentation:
        # "PARAMETER_NAME_FOR_REVIEWED_PROTEINS": [True],
        # "PARAMETER_NAME_FOR_EVIDENCE": ["exp"],
        # "PARAMETER_NAME_FOR_METHODS_SCORE": 13.75
    }
    createNedrexGraph(filename="../../data/NeDRex_api/filtered_ppi_only_reviewed_proteins.csv", body=ppi_body)

### 🔍 Solution (Hidden - Expand to see the answer)

<details>
<summary>Click here to see the solution after you've tried exploring the documentation</summary>

The correct parameter names are:
- **Reviewed proteins filtering**: `reviewed_proteins`
- **Evidence filtering**: `iid_evidence` 
- **Methods score filtering**: `methods_score_cutoff`

Complete working body:
```python
ppi_body = {
    "limit": 10000,
    "reviewed_proteins": [True],
    "iid_evidence": ["exp"],
    "methods_score_cutoff": 13.75
}
```

**Explanation**:
- `reviewed_proteins`: Boolean list to include only reviewed/curated proteins from UniProt
- `iid_evidence`: List of evidence types to filter by (e.g., ["exp"] for experimental evidence)
- `methods_score_cutoff`: Minimum score threshold for methods confidence
- `limit`: Maximum number of edges to return per request (pagination)

</details>
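
The `getEdges` function above pages through the results with `skip` and `limit`. The following minimal sketch isolates that loop and runs it against an in-memory list instead of the API, so it can be tried offline. `fetch_all`, `fetch_page`, and `fake_edges` are hypothetical stand-ins and not part of NeDRex.

```python
def fetch_all(fetch_page, limit):
    """Collect all items by repeatedly calling fetch_page(skip, limit)."""
    items = []
    skip = 0
    while True:
        page = fetch_page(skip, limit)
        items.extend(page)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < limit:
            break
        skip += limit
    return items


# Hypothetical stand-in for the API: 25 fake edges served from memory.
fake_edges = [
    {"memberOne": f"uniprot.P{i:05d}", "memberTwo": "uniprot.Q00001"}
    for i in range(25)
]
calls = []  # records the skip value of every simulated request


def fetch_page(skip, limit):
    calls.append(skip)
    return fake_edges[skip : skip + limit]


all_items = fetch_all(fetch_page, limit=10)
print(len(all_items), calls)  # 25 [0, 10, 20]
```

If the total number of edges is an exact multiple of `limit`, this loop issues one extra request that returns an empty page before it stops.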

## 2. Query PDIs from NeDRex

In addition to PPIs, NeDRex contains many other node types and relationships.

Next, let's query some protein-drug interactions (PDIs).

Check the available edge types (https://exbio.wzw.tum.de/repo4eu_nedrex_open/list_edge_collections) and fill in the correct type below.
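
One way to inspect the available edge types from Python is to call the `list_edge_collections` route linked above. The sketch below uses only the standard library. It assumes that the route answers a plain GET request with a JSON array of collection names; that response shape is an assumption, so check the API documentation if it differs. The helper names `endpoint_url` and `list_edge_collections` are chosen for illustration.

```python
import json
from urllib.request import urlopen

API_LINK = "https://exbio.wzw.tum.de/repo4eu_nedrex_open"


def endpoint_url(route):
    """Join the base URL and a route with exactly one slash in between."""
    return f"{API_LINK.rstrip('/')}/{route.lstrip('/')}"


def list_edge_collections(timeout=30):
    """Fetch the edge collection names (assumed: a JSON array of strings)."""
    url = endpoint_url("list_edge_collections")
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no network access.
print(endpoint_url("list_edge_collections"))
```

Calling `list_edge_collections()` requires network access to the NeDRex server; printing its result shows the names you can pass as `type` in the cell below.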


In [14]:
import requests

class NeDRexService:
    # No trailing slash: routes are appended below as f"{API_LINK}/<route>"
    API_LINK = "https://exbio.wzw.tum.de/repo4eu_nedrex_open"

# Function to get the edges from the NeDRex API
# type: type of the edges to query
# body: request body with "source_domain_id", "target_domain_id",
#       "attributes", "skip" and "limit"; "skip" is updated in place for pagination
def getEdges(
    type: str, body
):
    all_edges = []
    upper_limit = body.get("limit", 10000)

    offset = 0
    while True:
        body["skip"] = offset
        try:
            response = requests.post(
                url=f"{NeDRexService.API_LINK}/{type}/attributes/json", json=body
            )
            response.raise_for_status()
            data = response.json()
            all_edges.extend(data)
            if len(data) < upper_limit:
                break
            offset += upper_limit
        except requests.exceptions.RequestException as e:
            print(f"HTTP request failed: {e}")
            return None
    return all_edges

def createNedrexGraph(filename, type: str, body):
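    edges_pdi = getEdges(type, body)
    if not edges_pdi:
        # getEdges returns None if a request failed and [] if nothing matched
        print("No edges retrieved; nothing written.")
        return
    print("Number of queried edges overall: ", len(edges_pdi))
    print("Example edge: ", edges_pdi[0])

    # Write the edges to a CSV file in the format "sourceDomainId,targetDomainId".
    # removeprefix (Python 3.9+) drops exactly the given prefix, whereas
    # lstrip(".drugbank") strips any leading run of those characters.
    with open(filename, "w") as f:
        for edge in edges_pdi:
            drug = str(edge["sourceDomainId"]).removeprefix("drugbank.")
            protein = str(edge["targetDomainId"]).removeprefix("uniprot.")
            f.write(f"{drug},{protein}\n")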

if __name__ == "__main__":

    pdi_body = {
        "source_domain_id": [],
        "target_domain_id": [],
        "attributes": ["sourceDomainId", "targetDomainId"],
        "skip": 0,
        "limit": 1000,
    }
    
    # TODO: fill in the correct edge type to query PDIs
    createNedrexGraph(filename="../../data/NeDRex_api/pdi.csv", type="TODO", body=pdi_body)

### 🔍 Solution (Hidden - Expand to see the answer)

<details>
<summary>Click here to see the solution after you've tried exploring the documentation</summary>

The correct edge type is `drug_has_target`:

```python
createNedrexGraph(filename="../../data/NeDRex_api/pdi.csv", type="drug_has_target", body=pdi_body)
```

</details>
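
The CSV-writing code in this notebook removes database prefixes such as `drugbank.` and `uniprot.` from node IDs. One pitfall is worth knowing here: `str.lstrip` takes a set of characters, not a prefix, so it can remove more than intended. The sketch below contrasts it with `str.removeprefix` (Python 3.9+). `odd_id` is a made-up ID for illustration only.

```python
drug_id = "drugbank.DB12530"
odd_id = "uniprot.protein42"  # hypothetical ID that continues with lowercase letters

# lstrip strips every leading character contained in the given set.
print(drug_id.lstrip(".drugbank"))  # DB12530 (works only because "D" is not in the set)
print(odd_id.lstrip(".uniprot"))    # ein42 (the leading "prot" is stripped as well)

# removeprefix removes the exact prefix once, or nothing if it is absent.
print(drug_id.removeprefix("drugbank."))  # DB12530
print(odd_id.removeprefix("uniprot."))    # protein42
```

Accessions that start with an uppercase letter or a digit survive `lstrip` by accident, which is why the problem is easy to miss.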

## 3. Get seed genes from NeDRex for Huntington's disease
### TODO
- Check the documentation to find the correct edge type identifier for seed genes (genes associated with a disorder): https://exbio.wzw.tum.de/repo4eu_nedrex_open/
- Check the directionality of that edge in the NeDRex graph: we have the MONDO ID for Huntington's disease and want to retrieve the seed genes linked to it
- Query the seed genes with the correct parameters, using the functions provided below
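
To reason about directionality, it helps to see how the `sources` flag of `getEdgesInBatches` in the cell below ends up in the request body: the IDs are sent either as `source_domain_id` or as `target_domain_id`. A minimal sketch of that mapping, using IDs from the `protein_has_go_annotation` examples in the cell below (the helper name `build_body` is hypothetical):

```python
def build_body(ids, sources=True, limit=10000):
    """Return the request body for querying edges by source or target IDs."""
    return {
        "source_domain_id": list(ids) if sources else [],
        "target_domain_id": [] if sources else list(ids),
        "attributes": ["sourceDomainId", "targetDomainId", "dataSources", "type"],
        "skip": 0,
        "limit": limit,
    }


# In the examples below, protein IDs are passed as sources and GO IDs as
# targets of protein_has_go_annotation edges.
by_protein = build_body(["uniprot.A0A024RBG1"], sources=True)
by_go = build_body(["go.0003723"], sources=False)

print(by_protein["source_domain_id"], by_protein["target_domain_id"])
print(by_go["source_domain_id"], by_go["target_domain_id"])
```

The same decision applies to the disorder ID in this exercise: determine on which side of the edge the disorder sits, then set `sources` accordingly.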

In [None]:
import requests
from gprofiler import GProfiler

gp = GProfiler(return_dataframe=True)

class NeDRexService:
    # No trailing slash: routes are appended below as f"{API_LINK}/<route>"
    API_LINK = "https://exbio.wzw.tum.de/repo4eu_nedrex_open"

# Function to get the edges from the NeDRex API
# type: type of the edges to query
# source_domain_ids: list of source domain ids to query for the edges
# target_domain_ids: list of target domain ids to query for the edges
# extra_attributes: additional attributes to query for the edges
def getEdges(
    type: str, source_domain_ids=None, target_domain_ids=None, extra_attributes=None
):
    # None defaults avoid sharing one mutable list between calls
    all_edges = []
    attributes = ["sourceDomainId", "targetDomainId", "dataSources", "type"]
    attributes.extend(extra_attributes or [])
    upper_limit = 10000

    body = {
        "source_domain_id": source_domain_ids or [],
        "target_domain_id": target_domain_ids or [],
        "attributes": attributes,
        "skip": 0,
        "limit": upper_limit,
    }
    offset = 0
    while True:
        body["skip"] = offset
        try:
            response = requests.post(
                url=f"{NeDRexService.API_LINK}/{type}/attributes/json", json=body
            )
            response.raise_for_status()
            data = response.json()
            all_edges.extend(data)
            if len(data) < upper_limit:
                break
            offset += upper_limit
        except requests.exceptions.RequestException as e:
            print(f"HTTP request failed: {e}")
            return None
    return all_edges

# Function to get the edges from the NeDRex API in batches
# type: type of the edges to query
# ids: list of node ids to query for the edges
# sources: boolean to indicate whether the node ids are source or target domain ids
# extra_attributes: additional attributes to query for the edges
def getEdgesInBatches(type: str, ids, sources=True, extra_attributes=[]):
    batch_size = 10000
    all_edges = []
    for i in range(0, len(ids), batch_size):
        batch_nodes = ids[i : i + batch_size]
        if sources:
            edges_batch = getEdges(
                type, source_domain_ids=batch_nodes, extra_attributes=extra_attributes
            )
        else:
            edges_batch = getEdges(
                type, target_domain_ids=batch_nodes, extra_attributes=extra_attributes
            )
        if edges_batch is not None:
            all_edges.extend(edges_batch)
        else:
            print(f"Error occurred while fetching batch starting at index {i}.")
            break
    return all_edges

if __name__ == "__main__":
    ############# EXAMPLE CALLS TO THE NEDREX API ##################
    
    # Example call to get the edges for the protein_has_go_annotation type -> by protein ids as source domain ids
    edges_to_get_proteins = ["uniprot.A0A024RBG1", "uniprot.A0A075B6H9", "uniprot.A0A075B6I6"]
    edges_go = getEdgesInBatches(
        "protein_has_go_annotation",
        ids=edges_to_get_proteins,
        extra_attributes=["qualifiers"],
    )
    print("Example edge: ",edges_go[0])
    
    # Example call to get the edges for the protein_has_go_annotation type -> by go ids as target domain ids
    edges_to_get_go = ["go.0003723", "go.0005829", "go.0046872"]
    edges_go = getEdgesInBatches(
        "protein_has_go_annotation",
        ids=edges_to_get_go,
        sources=False,
        extra_attributes=["qualifiers"],
    )
    print("Example edge: ",edges_go[0])
    
    # Example call to get the edges for the drug_has_contraindication type -> by drug ids as source domain ids and disorder ids as target domain ids
    drug_ids = ["drugbank.DB12530", "drugbank.DB08893"]
    disorder_ids = ["mondo.0005344", "mondo.0005044"]
    edges_contraindications = getEdges(
        "drug_has_contraindication",
        source_domain_ids=drug_ids,
        target_domain_ids=disorder_ids,
    )
    print("Example edge: ",edges_contraindications[0])
    
    #############################################################
    
    # MONDO disease ID for Huntington's disease
    disease_ids = ["mondo.0007739"]
    # TODO: Find solution as described above
    edges_seed_genes = getEdgesInBatches(
        "TODO_EDGE_TYPE",
        ids=disease_ids,
        sources="TODO" # change to True or False based on the correct edge type
    )
    disgenet_edges = [edge for edge in edges_seed_genes 
                 if "dataSources" in edge and "disgenet" in edge["dataSources"]]
    print("Number of edges from disgenet: ", len(disgenet_edges))
    print("Example edge: ", disgenet_edges[0])
        
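    # removeprefix (Python 3.9+) drops exactly the "entrez." prefix;
    # lstrip(".entrez") would strip any leading run of those characters
    seeds_entrez = [edge["sourceDomainId"].removeprefix("entrez.") for edge in disgenet_edges]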
    
    ############################### CONVERTING ENTREZ IDs TO UNIPROT IDs ###############################
    converted = gp.convert(organism='hsapiens',
        query=list(seeds_entrez),
        target_namespace='UNIPROTSWISSPROT_ACC', numeric_namespace="ENTREZGENE_ACC")

    seeds_uniprot = set(converted['converted'])
    
    #############################################################
    
    output_file = "../../data/NeDRex_api/seed_genes_huntingtons_disease.csv"
    print(f"Writing {len(seeds_uniprot)} seed genes to {output_file}")
    # Write the seed genes to a file
    with open(output_file, 'w') as file:
        for gene in seeds_uniprot:
            file.write(gene + '\n')


### 🔍 Solution (Hidden - Expand to see the answer)

<details>
<summary>Click here to see the solution after you've tried exploring the documentation</summary>

The correct implementation is:

```python
# MONDO disease ID for Huntington's disease
disease_ids = ["mondo.0007739"]

# The correct edge type is "gene_associated_with_disorder"
# Since we have disorder IDs and want genes, disorder should be target_domain_id
edges_seed_genes = getEdgesInBatches(
    "gene_associated_with_disorder",
    ids=disease_ids,
    sources=False  # disorder IDs are target_domain_ids, so sources=False
)
```

**Explanation**:
- **Edge type**: `gene_associated_with_disorder` - connects genes to disorders
- **Directionality**: Genes are `sourceDomainId`, disorders are `targetDomainId`
- **Parameter**: `sources=False` because we're providing disorder IDs as target domain IDs

**Key insights from the API documentation**:
- The edge goes from gene → disorder (gene as source, disorder as target)
- When querying by disorder ID, we need to set `sources=False` to indicate we're providing target domain IDs
- The resulting edges will have genes in `sourceDomainId` and disorders in `targetDomainId`

</details>
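
The post-processing step after the query (keep only DisGeNET-backed edges, then extract the gene IDs) can be tried offline on a few hand-written edges. The edge dictionaries below are invented for illustration and only mimic the attributes requested in the code above; they are not real NeDRex output, and `seed_ids` is a hypothetical helper name.

```python
# Invented example edges; real responses may contain more fields.
edges = [
    {"sourceDomainId": "entrez.1111", "targetDomainId": "mondo.0007739",
     "dataSources": ["disgenet", "other_source"]},
    {"sourceDomainId": "entrez.2222", "targetDomainId": "mondo.0007739",
     "dataSources": ["other_source"]},
    {"sourceDomainId": "entrez.3333", "targetDomainId": "mondo.0007739"},
]


def seed_ids(edges, data_source="disgenet"):
    """Return gene IDs (prefix removed) of edges backed by data_source."""
    return [
        edge["sourceDomainId"].removeprefix("entrez.")
        for edge in edges
        # Edges without a dataSources attribute are skipped.
        if data_source in edge.get("dataSources", [])
    ]


seeds = seed_ids(edges)
print(seeds)  # ['1111']
if not seeds:
    print("No matching edges; check the edge type and the sources flag.")
```

Checking for an empty result before indexing into the list avoids an `IndexError` when the edge type or direction was chosen incorrectly.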