# Using RDFLib to query the integrated dataset

In this part of the lab, we will learn how to query the integrated dataset from a Python application.

First make sure you have all the needed packages installed:

In [None]:
!pip3 install rdflib

RDFLib will be used to query the integrated dataset using SPARQL queries.

More information on RDFLib can be found in the [RDFLib documentation](https://rdflib.readthedocs.io/en/stable/).

Next, we load the integrated dataset we created in Part 2 of the lab.

**IMPORTANT:** We assume that the solution to Part 2 is stored in the file triples.ttl, in the same folder as this notebook.

In [None]:
import rdflib

# Create an empty graph and load the Turtle file produced in Part 2
g = rdflib.Graph()
g.parse("triples.ttl", format="turtle")

We will use the following function to help print out the results:

In [None]:
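def query(query_str):
    """Run a SPARQL query on the graph g and print each result row as a list of strings."""
    qres = g.query(query_str)
    if len(qres) == 0:
        print('No results found')
    else:
        for row in qres:
            # Convert every RDF term in the row to a plain string for printing
            print([str(item) for item in row])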

The following example shows how we can retrieve the first and last names of the patients with a single query, even though the two values originally came from different files:

In [None]:
query_str= """
        PREFIX schema: <http://schema.org/>
        PREFIX obo: <http://purl.obolibrary.org/obo/>
        SELECT *
        WHERE {
          ?patient obo:firstName ?givenName. 
          ?patient obo:lastName  ?lastName. 
          
        }"""
query(query_str)

# Querying exercise

Below are three predefined queries. Your task is to work out what each query does. You can execute the queries to get better insight into which data each one retrieves.


Query 1:

In [None]:
query_str= """
        PREFIX obo: <http://purl.obolibrary.org/obo/>
        PREFIX schema: <http://schema.org/>

        SELECT  ?firstName ?lastName
        WHERE {
          ?p a obo:Person. 
          ?p obo:firstName ?firstName.
          ?p obo:lastName ?lastName.
          ?p obo:has_disease [obo:has_symptom [a obo:WeightLoss]]
        }"""
query(query_str)

What data does Query 1 retrieve?

- [ ] a) Retrieves the first and last names of patients who do not have weight loss as one of their disease symptoms.
- [ ] b) Retrieves the first and last names of patients who have weight loss as one of their disease symptoms.
- [ ] c) Retrieves the IDs of patients who have weight loss as one of their disease symptoms.
- [ ] d) Retrieves the IDs of patients who have a disease with weight loss as its only symptom.


Query 2:

In [None]:
query_str= """
        PREFIX obo: <http://purl.obolibrary.org/obo/>
        SELECT (Count(?p) as ?num_patients) ?country
        WHERE {
          ?p a obo:Person; obo:citizenOf ?country; obo:has_disease ?disease.
          }
          GROUP BY ?country
          ORDER BY DESC(?num_patients)
          """
query(query_str)

What data does Query 2 retrieve?

- [ ] a) Retrieves, for each country, the number of patients that have at least one disease, in descending order of the number of patients.
- [ ] b) Counts the patients that have a country definition, ordered by country.
- [ ] c) Retrieves for each disease, the number of patients that have that disease.
- [ ] d) Counts the number of diseases each person has and orders them by country.

Query 3: 


In [None]:
query_str= """
        PREFIX obo: <http://purl.obolibrary.org/obo/>
        PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
        SELECT ?p ?age
        WHERE {
          ?p a obo:Person; 
             obo:has_disease ?d;
             obo:hasAge ?age
          
          FILTER(xsd:integer(?age) > "30"^^xsd:integer)
        }"""
query(query_str)

What data does Query 3 retrieve?

- [ ] a) Retrieves the age of each patient that has a disease.
- [ ] b) Counts the patients that are older than 30 and have at least one disease.
- [ ] c) Retrieves the patients that are older than 30 and have at least one disease.
- [ ] d) Counts the number of diseases each person has.
