# RDF-centered conversion
This notebook converts ZIB triples to CAPACITY fields.

In [1]:
from rdflib import Graph, URIRef, Namespace, Literal
from rdflib.namespace import RDFS,SKOS, RDF, OWL
from pathlib import Path

## Ontologies

### ZIB ontology
A small sample ontology for describing ZIB concepts.

This ontology contains the following ZIB classes.

| Name          | Description  |
|---------------|--------------|
| [MedicationUse](https://zibs.nl/wiki/MedicationUse2-v1.1.1(2020EN)) |    |
| [PharmaceuticalProduct](https://zibs.nl/wiki/PharmaceuticalProduct-v2.1.2(2020EN)) | Partial information model used in MedicationUse |
| ZIBConcept | The parent class of all concrete ZIB classes|

### CAPACITY ontology
A similarly simple ontology. For now we focus on the `carmed` field. Every multiple-choice answer has its own class.

| Name | Description|
|------|------------|
| CapacityField| Parent class of all concrete capacity field classes|
|carmed___1|    |
|carmed___3|    |
|carmed___4|    |
|carmed___5|    |
|carmed___6|    |
|carmed___7|    |
|carmed___8|    |
|carmed___15|    |
|carmed___16|    |
|carmed___17|    |
|carmed___9|    |
|carmed___10|    |
|carmed___11|    |
|carmed___12|    |
|carmed___13|    |
|carmed___14|    |
|carmed___99|    |

To reason about these concepts, we need to load both ontologies into our graph.

In [2]:
patient_graph = Graph()


# Load ZIB ontology into graph
ZIB = Namespace('http://example.org/ZIB#')
patient_graph.bind('ZIB', ZIB)
zib_ontology = '../../ontologies/zib.owl'
patient_graph.parse(zib_ontology)

# Load CAPACITY ontology
CAPACITY = Namespace('http://example.org/capacity#')
patient_graph.bind('CAPACITY', CAPACITY)
capacity_ontology = '../../ontologies/capacity.owl'
patient_graph.parse(capacity_ontology)


<Graph identifier=Nc5384001955747f99a08fc04c8e30b38 (<class 'rdflib.graph.Graph'>)>

# Sample records
Using the ZIB ontology, we create some sample records. Patient Bob uses *quinidine*, which has code *C01BA01* in the ATC code system, and *atenolol*, which has code *C07AB03*, so he needs two *MedicationUse* records.
A second patient, Susan, uses acetylsalicylic acid (*B01AC06*) and gets one *MedicationUse* record.

In [3]:
# Some namespaces we use
ATC = Namespace('http://purl.bioontology.org/ontology/ATC/')
UATC = Namespace('http://purl.bioontology.org/ontology/UATC/')


patient = Namespace('http://example.org/patient/')
zib_record = Namespace('http://example.org/zib_record/')

# Bind the prefixes used in the SPARQL queries below
patient_graph.bind('ZIB', ZIB)
patient_graph.bind('ATC', ATC)
patient_graph.bind('UATC', UATC)
patient_graph.bind('RDFS', RDFS)
patient_graph.bind('SKOS', SKOS)
patient_graph.bind('OWL', OWL)

# Quinidine record
record_ref = zib_record.bobsmedication
patient_graph.add((patient.bob, ZIB.hasZibRecord, record_ref))
patient_graph.add((record_ref, RDF.type, ZIB.MedicationUse))
patient_graph.add((record_ref, ZIB.medicationCode, UATC.C01BA01))

# Atenolol record
record_ref = zib_record.bobsOtherMedication
patient_graph.add((patient.bob, ZIB.hasZibRecord, record_ref))
patient_graph.add((record_ref, RDF.type, ZIB.MedicationUse))
patient_graph.add((record_ref, ZIB.medicationCode, UATC.C07AB03))


# Susan's medication: she takes acetylsalicylic acid (B01AC06)
record_ref = zib_record.susansmedication
patient_graph.add((patient.susan, ZIB.hasZibRecord, record_ref))
patient_graph.add((record_ref, RDF.type, ZIB.MedicationUse))
patient_graph.add((record_ref, ZIB.medicationCode, UATC.B01AC06))


# Adding ATC ontology
On their own, the ZIB records do not contain enough information to derive the CAPACITY fields. To infer higher-level information about the medication, we need to add the ATC ontology.

In [4]:
from zib_uploader.tools import get

atc_path = 'atc.ttl'
atc_url = 'http://data.bioontology.org/ontologies/ATC/submissions/12/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb'

atc_file = get(atc_url)

patient_graph.parse(str(atc_file), format='turtle')

<Graph identifier=Nc5384001955747f99a08fc04c8e30b38 (<class 'rdflib.graph.Graph'>)>

Now we need to check whether we can use the ATC ontology to infer additional information about Bob's medication use.

A *DeductiveClosure* is an expansion of the knowledge base with all knowledge that can logically be derived from the original knowledge.

In [5]:
# import owlrl

# owlrl.DeductiveClosure(owlrl.CombinedClosure.RDFS_OWLRL_Semantics).expand(patient_graph)

# query = \
# '''
# select ?medication ?superclass
# where {    
#    <http://example.org/patient/bob> ZIB:hasZibRecord ?zibRecord .
#     ?zibRecord ZIB:medicationCode ?medication .
#     ?medication RDFS:subClassOf ?superclass
# }
# '''

# result = patient_graph.query(query)

# for r in result:
#     print(f'Medication {r[0]} is subclass of {r[1]}')

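It turns out that transitive relations can also be followed without a reasoning engine by appending `*` to the property in the SPARQL query. This is a property path that matches zero or more `RDFS:subClassOf` steps, which is why each medication is also listed as a subclass of itself in the output below.

If this is the most complicated reasoning we need, we can do without a reasoning engine.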
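To make the `*` operator concrete, here is a minimal plain-Python sketch of what a zero-or-more path over `subClassOf` computes. The `PARENT` dictionary and the `superclasses` helper are hypothetical and only mirror the ATC hierarchy of quinidine; the sketch also assumes every class has at most one parent, which a real ontology does not guarantee.

```python
# Hypothetical child -> parent links, mirroring the ATC hierarchy of quinidine.
# Assumption: every class has at most one direct parent.
PARENT = {
    "C01BA01": "C01BA",
    "C01BA": "C01B",
    "C01B": "C01",
    "C01": "C",
}


def superclasses(code, parent=PARENT):
    """Return the code itself plus all of its ancestors, like `subClassOf*`."""
    chain = [code]  # zero steps: the starting class matches itself
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])  # one more subClassOf step
    return chain


print(superclasses("C01BA01"))
# ['C01BA01', 'C01BA', 'C01B', 'C01', 'C']
```

Using `+` instead of `*` in the SPARQL query would require at least one step and therefore drop the starting class from the result.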

In [6]:
query = \
'''
select ?medication ?superclass
where {    
   <http://example.org/patient/bob> ZIB:hasZibRecord ?zibRecord .
    ?zibRecord ZIB:medicationCode ?medication .
    ?medication RDFS:subClassOf* ?superclass
}
'''

result = patient_graph.query(query)

for r in result:
    print(f'Medication {r[0]} is subclass of {r[1]}')

Medication http://purl.bioontology.org/ontology/UATC/C01BA01 is subclass of http://purl.bioontology.org/ontology/UATC/C01BA01
Medication http://purl.bioontology.org/ontology/UATC/C01BA01 is subclass of http://purl.bioontology.org/ontology/UATC/C01BA
Medication http://purl.bioontology.org/ontology/UATC/C01BA01 is subclass of http://purl.bioontology.org/ontology/UATC/C01B
Medication http://purl.bioontology.org/ontology/UATC/C01BA01 is subclass of http://purl.bioontology.org/ontology/UATC/C01
Medication http://purl.bioontology.org/ontology/UATC/C01BA01 is subclass of http://purl.bioontology.org/ontology/UATC/C
Medication http://purl.bioontology.org/ontology/UATC/C01BA01 is subclass of http://www.w3.org/2002/07/owl#Thing
Medication http://purl.bioontology.org/ontology/UATC/C07AB03 is subclass of http://purl.bioontology.org/ontology/UATC/C07AB03
Medication http://purl.bioontology.org/ontology/UATC/C07AB03 is subclass of http://purl.bioontology.org/ontology/UATC/C07AB
Medication http://purl.

This information can be used to derive the values of several *carmed* fields in the CAPACITY codebook.
The mapping sheet (a CSV file) specifies how the multiple-choice values map to ATC codes.

In [7]:
import pandas as pd
mapping_url = 'https://raw.githubusercontent.com/FAIR-data-for-CAPACITY/CAPACITY-mapping/master/codes-to-capacity-mapping.csv'
mapping = get(mapping_url)
mapping_df = pd.read_csv(mapping)
mapping_df = mapping_df.apply(lambda x: x.str.strip())
URI_template = 'http://example.org/capacity/{field_name}/{choice_code}'
choice_template = '{field_name}___{choice_code}'

# For now focus on multiple choice
mapping_df = mapping_df[~mapping_df['capacity_field_choice_code'].isna()]


for row in mapping_df.itertuples():
    field_name = row.capacity_field_name
    choice_code = row.capacity_field_choice_code
    
    capacity_uri = URI_template.format(field_name=field_name, choice_code=choice_code)
    choice_name = choice_template.format(field_name=field_name, choice_code=choice_code)
    
    capacity_uri = URIRef(capacity_uri)
    atc_uri = URIRef(UATC + str(row.atc_code))
    
    patient_graph.add((atc_uri, OWL.sameAs, capacity_uri))
    patient_graph.add((capacity_uri, SKOS.prefLabel, Literal(row.capacity_field_description)))
    patient_graph.add((capacity_uri, CAPACITY.fieldName, Literal(row.capacity_field_name)))
    patient_graph.add((capacity_uri, CAPACITY.choiceName, Literal(choice_name)))
    
    # Group all capacity classes under one superclass, CAPACITY.Choice
    patient_graph.add((capacity_uri, RDFS.subClassOf, CAPACITY.Choice))


http://example.org/capacity/cvrisk_ckd_sev/or eGFR 45 does not look like a valid URI, trying to serialize this will break.
http://example.org/capacity/cvrisk_ckd_sev/or eGFR 30 does not look like a valid URI, trying to serialize this will break.
http://example.org/capacity/cvrisk_ckd_sev/or eGFR 45 does not look like a valid URI, trying to serialize this will break.
http://example.org/capacity/cvrisk_copd_sev/FEV1 50 does not look like a valid URI, trying to serialize this will break.
http://example.org/capacity/cvrisk_copd_sev/FEV1 30 does not look like a valid URI, trying to serialize this will break.
http://example.org/capacity/carhist_bmi/Text (number) or cal does not look like a valid URI, trying to serialize this will break.
http://example.org/capacity/cvrisk_bmi_inchpound/Text (number) or cal does not look like a valid URI, trying to serialize this will break.
http://example.org/capacity/capacity_cardiac_baseline_ass essment_required_complete/0 does not look like a valid URI, tr
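The warnings above come from field names and choice codes that contain spaces or other characters that are not allowed in a URI. One possible way around this, sketched here as an assumption rather than as part of the original pipeline, is to percent-encode both parts before filling in the template. The helper name `capacity_uri` is hypothetical; note that this changes the resulting URIs, so any query that relies on the unencoded form would need the same treatment.

```python
from urllib.parse import quote

URI_TEMPLATE = "http://example.org/capacity/{field_name}/{choice_code}"


def capacity_uri(field_name, choice_code):
    """Build a capacity URI with both path segments percent-encoded."""
    # safe="" also encodes "/", so a value can never add an extra path segment
    return URI_TEMPLATE.format(
        field_name=quote(str(field_name), safe=""),
        choice_code=quote(str(choice_code), safe=""),
    )


print(capacity_uri("carmed", 1))
# http://example.org/capacity/carmed/1
print(capacity_uri("cvrisk_ckd_sev", "or eGFR 45"))
# http://example.org/capacity/cvrisk_ckd_sev/or%20eGFR%2045
```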

In [8]:
# import pandas as pd 

# def str_to_tuples(s, sep):
#     l = s.split(sep)
#     l = [(l[i], l[i+1]) for i in range(0, len(l), 2)]
    
#     return l
    

# carmed_to_atc = 'C07A|Beta blocking agents|C01B|ANTIARRHYTHMICS, CLASS I AND III|C01AA05|digoxin|C03|Diuretics|C08|CALCIUM CHANNEL BLOCKERS|C09A | ACE inhibitors, plain|C09C|ANGIOTENSIN II RECEPTOR BLOCKERS (ARBs), PLAIN|C03DA|Aldosterone antagonists|C09CA03| valsartan|C02KX|Antihypertensives for pulmonary hypertension(phosphodisesterase)|B01AC|Platelet aggregation inhibitors excl.heparin|B01AA|Vitamin K antagonists(coumarin)|B01AE|direct thrombin inhibitors (DOAC)|C10|Lipid modifying agents|A10A|Insulins and analogues|A10B|bloog glucose lowering drugs, excl. insulins(Oral antidiabetic agents'
# carmed_to_atc = str_to_tuples(carmed_to_atc, '|')
# carmed_to_atc = map(lambda x: x[0], carmed_to_atc)
# carmed_to_atc = list(carmed_to_atc)
                

# carmed_field_index = '0-None, 1-Betablocker, 3-Antiarrhytmic drugs, 4-Digoxine, 5- Diuretics, 6- Calcium channel blocker, 7- ACE inhibitor, 8- Angiotensin II receptor blocker, 15- Aldosterone antagonist, 16 -Sacubitrivil/valsartan(Entresto),17-Phospodiesterase inhibitors,9 -antiplatelet agents, 10- coumarin, 11- direct oral anticoagulants(DOAC), 12-Lipid lowering agents, 13-Insulin, 14- Oral antidiabetic agents, 99-other cardiovascular medication'
# carmed_field_index = carmed_field_index.split(',')
# carmed_field_index = map(lambda x: x.split('-'), carmed_field_index)
# carmed_field_index = map(lambda x: tuple(x), carmed_field_index)
# carmed_field_index = list(carmed_field_index)

# display(f'carmed_field_index length: {len(carmed_field_index)}  carmed_to_atc: {len(carmed_to_atc)}')

# # Make sure the lists are of the same length so we can line them up and see what's missing
# carmed_to_atc = [''] + carmed_to_atc
# display(carmed_to_atc)
# display(carmed_field_index)

# combined = list(zip(carmed_to_atc, carmed_field_index))
# combined_df = pd.DataFrame(combined, columns=['ATC', 'choice'])

# combined_df.ATC = combined_df.ATC.str.strip()
# combined_df['number'] = combined_df['choice'].map(lambda x: int(x[0].strip()))
# display(combined_df)

# capacity_root_class = URIRef('http://example.org/capacity/capacityValue')
# capacity_fieldname_property =  URIRef('http://example.org/capacity/fieldName')

# for row in combined_df.itertuples():
#     if row.number == 0:
#         continue
        
#     capacity_uri =  URIRef(f'http://example.org/capacity/carmed/{row.number}')
#     atc_uri = URIRef(UATC + row.ATC)
    
#     print(f'CAPACITY uri: {capacity_uri} sameas atc uri: {atc_uri}')
               
#     patient_graph.add((atc_uri, OWL.sameAs, capacity_uri))
#     patient_graph.add((capacity_uri, SKOS.prefLabel, Literal(row.choice[1])))
#     patient_graph.add((capacity_uri, capacity_fieldname_property, Literal(f'carmed___{row.number}')))
    
#     # Let's combine all capacity classes under one superclass capacityValue
#     patient_graph.add((capacity_uri, RDFS.subClassOf, capacity_root_class))


Now we have the triples to connect medication to CAPACITY fields. Let's check if our patient Bob uses any cardiovascular medication.

# Querying CAPACITY data
For our conversion it would be convenient to have a list of all the CAPACITY values per patient.

In [13]:
query = \
'''
select ?patient ?field ?field_name ?choice_name
where {    
    ?patient ZIB:hasZibRecord ?zibRecord .
    ?zibProperty RDFS:subPropertyOf ZIB:zibProperty .
    ?zibRecord ?zibProperty ?zibValue .
    ?zibValue RDFS:subClassOf*/OWL:sameAs* ?field .
    ?field RDFS:subClassOf CAPACITY:Choice .
    ?field SKOS:prefLabel ?label .
    ?field CAPACITY:fieldName ?field_name .
    ?field CAPACITY:choiceName ?choice_name .
}
'''

result = patient_graph.query(query)
result_df = pd.DataFrame(result, columns=['patient', 'capacity_class', 'capacity_field', 'choice_name'])
display(result_df)

| | patient | capacity_class | capacity_field | choice_name |
|---|---------|----------------|----------------|-------------|
| 0 | http://example.org/patient/bob | http://example.org/capacity/carmed/1 | carmed | carmed___1 |
| 1 | http://example.org/patient/bob | http://example.org/capacity/carmed/3 | carmed | carmed___3 |
| 2 | http://example.org/patient/bob | http://example.org/capacity/carmed_bb/1 | carmed_bb | carmed_bb___1 |
| 3 | http://example.org/patient/bob | http://example.org/capacity/carmed_antiarrh/1 | carmed_antiarrh | carmed_antiarrh___1 |
| 4 | http://example.org/patient/susan | http://example.org/capacity/carmed_antiplate/1 | carmed_antiplate | carmed_antiplate___1 |
| 5 | http://example.org/patient/susan | http://example.org/capacity/carmed/9 | carmed | carmed___9 |
| 6 | http://example.org/patient/bob | http://example.org/capacity/carmed_arrhyth_cla... | carmed_arrhyth_class1 | carmed_arrhyth_class1___1 |


# Converting answers to CAPACITY codebook format
We convert the data to a list of dicts, with every dict representing one patient.
It is possible to upload a pandas DataFrame to REDCap, but since many fields are optional, a list of dicts will probably be more compact.

In [14]:
def collapse_patient(group: pd.DataFrame) -> dict:
    fields = list(group.choice_name.apply(lambda x: x.value).values)
    
    # Assuming all fields are multiple choice. If a field exists for a patient, its value is True
    values = [True] * len(fields)
    
    row = dict(zip(fields, values))
    row['subjid'] = group.patient.iloc[0].toPython()
    
    return row
    

# Get one row per patient
redcap_records = result_df.groupby(by='patient').apply(collapse_patient).tolist()
display(redcap_records)

[{'carmed___1': True,
  'carmed___3': True,
  'carmed_bb___1': True,
  'carmed_antiarrh___1': True,
  'carmed_arrhyth_class1___1': True,
  'subjid': 'http://example.org/patient/bob'},
 {'carmed_antiplate___1': True,
  'carmed___9': True,
  'subjid': 'http://example.org/patient/susan'}]
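For readers who want to see the shape of the transformation without pandas, the same collapse can be sketched with the standard library only. The `rows` list and the `collapse` helper are hypothetical stand-ins for the query result and for `collapse_patient`; as above, the sketch assumes every field is a multiple-choice field whose presence means `True`.

```python
from collections import defaultdict

# Hypothetical (patient, choice_name) pairs, as plain strings
rows = [
    ("http://example.org/patient/bob", "carmed___1"),
    ("http://example.org/patient/bob", "carmed___3"),
    ("http://example.org/patient/susan", "carmed___9"),
]


def collapse(rows):
    """Group rows into one dict per patient, keyed by choice name."""
    records = defaultdict(dict)
    for patient, choice_name in rows:
        records[patient]["subjid"] = patient
        # Assumption: a choice that is present for a patient is simply True
        records[patient][choice_name] = True
    return list(records.values())


records = collapse(rows)
print(records)
```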

In [15]:
from redcap import Project
import os

api_url = 'https://redcap.heart-institute.nl/api/'
api_key = os.environ['REDCAP_TOKEN']

project = Project(api_url, api_key)

project.import_records(redcap_records)

{'count': 2}

## TODO:
- Make the system work for open answers, not just multiple choice.