# MIAO Automatic Modeling: Depression Severity Detection

This notebook demonstrates automatic generation of MIAO-compliant RDF annotations from:
1. A text dataset with ordinal depression severity labels (0=Minimal, 1=Mild, 2=Moderate, 3=Severe)
2. Machine learning experiment results from multiple models (BERT, Feature Framework)

The notebook creates a complete RDF knowledge graph following the MIAO ontology structure for depression severity research, including:
- Proper schema alignment with PHQ-9 standards
- SKOS mappings to SNOMED CT clinical terminology
- Bibliographic citations and provenance

## 1. Setup and Dependencies

In [22]:
# Install required packages (uncomment if needed)
# !pip install rdflib pandas numpy scikit-learn

In [23]:
import pandas as pd
import numpy as np
from datetime import datetime
from rdflib import Graph, Namespace, Literal, URIRef, RDF, RDFS, XSD
from rdflib.namespace import DCTERMS, PROV, SKOS, OWL
import hashlib
import json

## 2. Define Namespaces

Define all required namespaces including SKOS for concept mappings and external medical ontologies.

In [24]:
# Define MIAO and related namespaces
MIAO = Namespace("https://w3id.org/miao#")
MLS = Namespace("http://www.w3.org/ns/mls#")
EX = Namespace("http://example.org/miao/")

# External medical ontologies
SNOMED = Namespace("http://snomed.info/id/")
ICD11 = Namespace("http://id.who.int/icd/entity/")

# Create RDF graph
g = Graph()
g.bind("miao", MIAO)
g.bind("mls", MLS)
g.bind("ex", EX)
g.bind("dcterms", DCTERMS)
g.bind("prov", PROV)
g.bind("rdfs", RDFS)
g.bind("xsd", XSD)
g.bind("skos", SKOS)
g.bind("owl", OWL)
g.bind("snomed", SNOMED)
g.bind("icd11", ICD11)

print("RDF graph initialized with namespaces:")
for prefix, namespace in g.namespaces():
    print(f"  {prefix}: {namespace}")

RDF graph initialized with namespaces:
  brick: https://brickschema.org/schema/Brick#
  csvw: http://www.w3.org/ns/csvw#
  dc: http://purl.org/dc/elements/1.1/
  dcat: http://www.w3.org/ns/dcat#
  dcmitype: http://purl.org/dc/dcmitype/
  dcterms: http://purl.org/dc/terms/
  dcam: http://purl.org/dc/dcam/
  doap: http://usefulinc.com/ns/doap#
  foaf: http://xmlns.com/foaf/0.1/
  geo: http://www.opengis.net/ont/geosparql#
  odrl: http://www.w3.org/ns/odrl/2/
  org: http://www.w3.org/ns/org#
  prof: http://www.w3.org/ns/dx/prof/
  prov: http://www.w3.org/ns/prov#
  qb: http://purl.org/linked-data/cube#
  schema: https://schema.org/
  sh: http://www.w3.org/ns/shacl#
  skos: http://www.w3.org/2004/02/skos/core#
  sosa: http://www.w3.org/ns/sosa/
  ssn: http://www.w3.org/ns/ssn/
  time: http://www.w3.org/2006/time#
  vann: http://purl.org/vocab/vann/
  void: http://rdfs.org/ns/void#
  wgs: https://www.w3.org/2003/01/geo/wgs84_pos#
  owl: http://www.w3.org/2002/07/owl#
  rdf: http://www.w3.or

## 3. Load Input Data

### 3.1 Load Depression Severity Dataset

Expected format:
- CSV file with columns: `text`, `severity_label`
- `severity_label`: 0 (Minimal), 1 (Mild), 2 (Moderate), 3 (Severe)

In [25]:
# Load dataset (replace with your actual file path)
# df_dataset = pd.read_csv('depression_dataset.csv')

# For demonstration, create sample data
df_dataset = pd.DataFrame({
    'text': [
        'Feeling okay today, nothing special but managing fine',
        'Sometimes I feel a bit down but it passes quickly',
        'Having trouble getting out of bed, everything feels heavy',
        "I can't remember the last time I felt happy, constant despair",
        'Life is good, enjoying my hobbies and social activities',
        'Feel slightly sad occasionally but cope well overall',
        'Struggling to concentrate, lost interest in most things',
        'Complete hopelessness, thoughts of ending it all',
        'Pretty content with life, normal ups and downs',
        'Persistent sadness affecting work and relationships'
    ],
    'severity_label': [0, 1, 2, 3, 0, 1, 2, 3, 0, 2]
})

# Map numeric labels to severity names
severity_names = {0: 'Minimal', 1: 'Mild', 2: 'Moderate', 3: 'Severe'}
df_dataset['severity_name'] = df_dataset['severity_label'].map(severity_names)

print(f"Loaded {len(df_dataset)} samples\n")
print("Severity distribution:")
print(df_dataset['severity_name'].value_counts())
print("\nFirst 5 samples:")
df_dataset.head()

Loaded 10 samples

Severity distribution:
severity_name
Minimal     3
Moderate    3
Mild        2
Severe      2
Name: count, dtype: int64

First 5 samples:


Unnamed: 0,text,severity_label,severity_name
0,"Feeling okay today, nothing special but managi...",0,Minimal
1,Sometimes I feel a bit down but it passes quickly,1,Mild
2,"Having trouble getting out of bed, everything ...",2,Moderate
3,"I can't remember the last time I felt happy, c...",3,Severe
4,"Life is good, enjoying my hobbies and social a...",0,Minimal


### 3.2 Load ML Experiment Results

Expected format:
- `sample_id`: integer index
- `model_name`: name of the model
- `predicted_label`: 0-3 (severity level)
- `confidence`: prediction confidence (0-1)
- `true_label`: ground truth label

In [26]:
# Option 1: Load from CSV
# df_results = pd.read_csv('model_results.csv')

# Option 2: Create sample results for demonstration
np.random.seed(42)
results_data = []

for model_name in ['BERT_Depression_Classifier', 'Feature_Framework_Model']:
    for idx, row in df_dataset.iterrows():
        true_label = row['severity_label']
        # Simulate predictions (mostly correct with some errors)
        if np.random.random() > 0.15:  # 85% accuracy
            pred_label = true_label
            confidence = np.random.uniform(0.7, 0.95)
        else:
            pred_label = np.random.choice([l for l in range(4) if l != true_label])
            confidence = np.random.uniform(0.5, 0.7)
        
        results_data.append({
            'sample_id': idx,
            'model_name': model_name,
            'predicted_label': pred_label,
            'confidence': round(confidence, 2),
            'true_label': true_label
        })

df_results = pd.DataFrame(results_data)
print(f"Loaded {len(df_results)} prediction results\n")
print("Results by model:")
print(df_results.groupby('model_name').size())
print("\nFirst 10 results:")
df_results.head(10)

Loaded 20 prediction results

Results by model:
model_name
BERT_Depression_Classifier    10
Feature_Framework_Model       10
dtype: int64

First 10 results:


Unnamed: 0,sample_id,model_name,predicted_label,confidence,true_label
0,0,BERT_Depression_Classifier,0,0.94,0
1,1,BERT_Depression_Classifier,1,0.85,1
2,2,BERT_Depression_Classifier,2,0.74,2
3,3,BERT_Depression_Classifier,0,0.62,3
4,4,BERT_Depression_Classifier,0,0.71,0
5,5,BERT_Depression_Classifier,1,0.91,1
6,6,BERT_Depression_Classifier,2,0.75,2
7,7,BERT_Depression_Classifier,3,0.78,3
8,8,BERT_Depression_Classifier,0,0.81,0
9,9,BERT_Depression_Classifier,2,0.85,2


## 4. Create MIAO Depression Severity Schema

Create the research schema with:
- PHQ-9 aligned severity categories
- SKOS mappings to SNOMED CT
- Proper bibliographic citations

In [27]:
def create_depression_severity_schema(graph):
    """
    Create ordinal depression severity classification schema in MIAO format
    with SKOS mappings to SNOMED CT.
    """
    # Define schema
    schema_uri = EX.DepressionSeveritySchema_Research
    graph.add((schema_uri, RDF.type, MIAO.MentalIllnessesSchema))
    graph.add((schema_uri, DCTERMS.title, 
               Literal("Research schema for depression severity in text", lang="en")))
    graph.add((schema_uri, DCTERMS.description, 
               Literal("Ordinal taxonomy of depression severity (Minimal, Mild, Moderate, Severe) used in social media corpora and text-based computational research on depression detection. Based on PHQ-9 severity thresholds.", lang="en")))
    graph.add((schema_uri, DCTERMS.created, 
               Literal("2025-12-02", datatype=XSD.date)))
    
    # Add bibliographic source
    graph.add((schema_uri, DCTERMS.source,
               Literal("Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16(9):606-13.")))
    
    # Define categories with PHQ-9 score ranges and SNOMED CT mappings
    categories = [
        {
            'level': 0,
            'name': 'MinimalDepression',
            'label': 'Minimal Depression',
            'title': 'Minimal depression',
            'description': 'Minimal or no depressive symptoms. PHQ-9 score range: 0-4. Indicates little to no clinical significance.',
            'snomed': None  # No SNOMED mapping for minimal
        },
        {
            'level': 1,
            'name': 'MildDepression',
            'label': 'Mild Depression',
            'title': 'Mild depression',
            'description': 'Mild depressive symptoms. PHQ-9 score range: 5-9. May warrant watchful waiting and repeated assessment.',
            'snomed': '310495003'  # Mild depression (SNOMED CT)
        },
        {
            'level': 2,
            'name': 'ModerateDepression',
            'label': 'Moderate Depression',
            'title': 'Moderate depression',
            'description': 'Moderate depressive symptoms. PHQ-9 score range: 10-14. Warrants treatment plan, considering counseling, follow-up, and/or pharmacotherapy.',
            'snomed': '310496002'  # Moderate depression (SNOMED CT)
        },
        {
            'level': 3,
            'name': 'SevereDepression',
            'label': 'Severe Depression',
            'title': 'Severe depression',
            'description': 'Severe depressive symptoms. PHQ-9 score range: 15-27 (includes moderately severe 15-19 and severe 20-27). Warrants active treatment with pharmacotherapy and/or psychotherapy.',
            'snomed': '310497006'  # Severe depression (SNOMED CT)
        }
    ]
    
    category_map = {}
    
    for cat in categories:
        category_uri = EX[cat['name']]
        graph.add((category_uri, RDF.type, MIAO.MentalIllnessCategory))
        graph.add((category_uri, DCTERMS.title, Literal(cat['title'], lang="en")))
        graph.add((category_uri, DCTERMS.description, Literal(cat['description'], lang="en")))
        graph.add((category_uri, DCTERMS.identifier, Literal(cat['level'], datatype=XSD.integer)))
        graph.add((category_uri, RDFS.label, Literal(cat['label'], lang="en")))
        
        # Schema relationships
        graph.add((category_uri, MIAO.isMentalIllnessCategoryOf, schema_uri))
        graph.add((schema_uri, MIAO.hasMentalIllnessCategory, category_uri))
        
        # Add SKOS mapping to SNOMED CT if available
        if cat['snomed']:
            snomed_uri = SNOMED[cat['snomed']]
            graph.add((category_uri, SKOS.related, snomed_uri))
        
        category_map[cat['level']] = category_uri
    
    return schema_uri, category_map

schema_uri, category_map = create_depression_severity_schema(g)
print(f"Created schema: {schema_uri}")
print(f"\nCategories with SNOMED CT mappings:")
for level, uri in category_map.items():
    severity_name = {0: 'Minimal', 1: 'Mild', 2: 'Moderate', 3: 'Severe'}[level]
    print(f"  Level {level} ({severity_name}): {uri}")

Created schema: http://example.org/miao/DepressionSeveritySchema_Research

Categories with SNOMED CT mappings:
  Level 0 (Minimal): http://example.org/miao/MinimalDepression
  Level 1 (Mild): http://example.org/miao/MildDepression
  Level 2 (Moderate): http://example.org/miao/ModerateDepression
  Level 3 (Severe): http://example.org/miao/SevereDepression


## 5. Model Dataset as MIAO Dataset

Create dataset metadata with proper provenance and distribution information.

In [28]:
def create_dataset_metadata(graph, df, dataset_name="Depression_Severity_Dataset"):
    """
    Create MIAO Dataset with complete metadata.
    """
    dataset_uri = EX[dataset_name]
    
    # Dataset metadata
    graph.add((dataset_uri, RDF.type, MIAO.Dataset))
    graph.add((dataset_uri, DCTERMS.title, Literal(dataset_name.replace('_', ' '))))
    graph.add((dataset_uri, DCTERMS.description, 
               Literal(f"Text dataset with {len(df)} samples annotated with depression severity labels (0-3) based on PHQ-9 criteria.")))
    graph.add((dataset_uri, DCTERMS.created, 
               Literal(datetime.now().strftime("%Y-%m-%d"), datatype=XSD.date)))
    
    # Statistical properties
    graph.add((dataset_uri, MIAO.numberOfSamples, Literal(len(df), datatype=XSD.integer)))
    
    # Class distribution
    for label, count in df['severity_label'].value_counts().items():
        dist_uri = EX[f"{dataset_name}_Distribution_Level{label}"]
        graph.add((dist_uri, RDF.type, MIAO.ClassDistribution))
        graph.add((dist_uri, MIAO.hasClass, category_map[label]))
        graph.add((dist_uri, MIAO.numberOfInstances, Literal(count, datatype=XSD.integer)))
        graph.add((dataset_uri, MIAO.hasClassDistribution, dist_uri))
    
    # Link to schema
    graph.add((dataset_uri, MIAO.usesSchema, schema_uri))
    
    return dataset_uri

dataset_uri = create_dataset_metadata(g, df_dataset)
print(f"Created dataset: {dataset_uri}")
print(f"\nClass distribution added to RDF graph")

Created dataset: http://example.org/miao/Depression_Severity_Dataset

Class distribution added to RDF graph


## 6. Model ML Implementations and Models

Create ML model metadata with performance metrics.

In [29]:
def create_ml_implementation(graph, model_name, metrics):
    """
    Create ML Implementation and Model with performance metrics.
    """
    # Implementation
    impl_uri = EX[f"{model_name}_Implementation"]
    graph.add((impl_uri, RDF.type, MLS.Implementation))
    graph.add((impl_uri, DCTERMS.title, Literal(model_name.replace('_', ' '))))
    
    # Model
    model_uri = EX[model_name]
    graph.add((model_uri, RDF.type, MLS.Model))
    graph.add((model_uri, DCTERMS.title, Literal(model_name.replace('_', ' '))))
    graph.add((model_uri, MLS.implements, impl_uri))
    
    # Add metrics
    for metric_name, value in metrics.items():
        metric_uri = EX[f"{model_name}_Metric_{metric_name}"]
        graph.add((metric_uri, RDF.type, MLS.ModelPerformance))
        graph.add((metric_uri, RDFS.label, Literal(metric_name)))
        graph.add((metric_uri, MLS.hasValue, Literal(value, datatype=XSD.float)))
        graph.add((model_uri, MLS.hasQuality, metric_uri))
    
    return model_uri, impl_uri

# Calculate metrics for each model
model_metadata = {}

for model_name in df_results['model_name'].unique():
    model_results = df_results[df_results['model_name'] == model_name]
    
    # Calculate metrics
    accuracy = (model_results['predicted_label'] == model_results['true_label']).mean()
    avg_confidence = model_results['confidence'].mean()
    
    # Per-class metrics
    from sklearn.metrics import precision_score, recall_score, f1_score
    
    precision = precision_score(model_results['true_label'], 
                               model_results['predicted_label'], 
                               average='macro', zero_division=0)
    recall = recall_score(model_results['true_label'], 
                         model_results['predicted_label'], 
                         average='macro', zero_division=0)
    f1 = f1_score(model_results['true_label'], 
                 model_results['predicted_label'], 
                 average='macro', zero_division=0)
    
    metrics = {
        'accuracy': round(accuracy, 4),
        'precision': round(precision, 4),
        'recall': round(recall, 4),
        'f1_score': round(f1, 4),
        'avg_confidence': round(avg_confidence, 4)
    }
    
    model_uri, impl_uri = create_ml_implementation(g, model_name, metrics)
    model_metadata[model_name] = {
        'model_uri': model_uri,
        'impl_uri': impl_uri,
        'metrics': metrics
    }
    
    print(f"\nModel: {model_name}")
    print(f"  URI: {model_uri}")
    print(f"  Metrics:")
    for metric, value in metrics.items():
        print(f"    {metric}: {value}")


Model: BERT_Depression_Classifier
  URI: http://example.org/miao/BERT_Depression_Classifier
  Metrics:
    accuracy: 0.9
    precision: 0.9375
    recall: 0.875
    f1_score: 0.881
    avg_confidence: 0.796

Model: Feature_Framework_Model
  URI: http://example.org/miao/Feature_Framework_Model
  Metrics:
    accuracy: 0.8
    precision: 0.8125
    recall: 0.7917
    f1_score: 0.7893
    avg_confidence: 0.742


## 7. Model Detection Runs and Evaluations

Create detection run metadata linking models to datasets.

In [30]:
def create_detection_run(graph, model_name, schema_uri, dataset_uri, metrics):
    """
    Create DetectionRun with provenance and evaluation metrics.
    """
    run_uri = EX[f"{model_name}_Run"]
    
    graph.add((run_uri, RDF.type, MIAO.DetectionRun))
    graph.add((run_uri, DCTERMS.title, 
               Literal(f"Depression Detection Run - {model_name.replace('_', ' ')}")))
    graph.add((run_uri, DCTERMS.created, 
               Literal(datetime.now().strftime("%Y-%m-%dT%H:%M:%S"), datatype=XSD.dateTime)))
    
    # Link to model, dataset, and schema
    graph.add((run_uri, MIAO.usesModel, model_metadata[model_name]['model_uri']))
    graph.add((run_uri, MIAO.usesDataset, dataset_uri))
    graph.add((run_uri, MIAO.usesSchema, schema_uri))
    
    # Add evaluation
    eval_uri = EX[f"{model_name}_Evaluation"]
    graph.add((eval_uri, RDF.type, MIAO.Evaluation))
    graph.add((eval_uri, DCTERMS.title, 
               Literal(f"Evaluation - {model_name.replace('_', ' ')}")))
    
    for metric_name, value in metrics.items():
        metric_uri = EX[f"{model_name}_EvalMetric_{metric_name}"]
        graph.add((metric_uri, RDF.type, MIAO.PerformanceMetric))
        graph.add((metric_uri, RDFS.label, Literal(metric_name)))
        graph.add((metric_uri, MIAO.hasValue, Literal(value, datatype=XSD.float)))
        graph.add((eval_uri, MIAO.hasMetric, metric_uri))
    
    graph.add((run_uri, MIAO.hasEvaluation, eval_uri))
    
    return run_uri, eval_uri

# Create detection runs
run_metadata = {}

for model_name, meta in model_metadata.items():
    run_uri, eval_uri = create_detection_run(g, model_name, schema_uri, dataset_uri, meta['metrics'])
    run_metadata[model_name] = {
        'run_uri': run_uri,
        'eval_uri': eval_uri
    }
    print(f"Created detection run for {model_name}: {run_uri}")

Created detection run for BERT_Depression_Classifier: http://example.org/miao/BERT_Depression_Classifier_Run
Created detection run for Feature_Framework_Model: http://example.org/miao/Feature_Framework_Model_Run


## 8. Model Individual Predictions

Create prediction instances for each sample (optional - can be memory intensive for large datasets).

In [31]:
def create_predictions(graph, df_results, df_dataset, model_name, max_samples=None):
    """
    Create individual prediction instances.
    Set max_samples to limit the number of predictions added (e.g., 50 for testing).
    """
    model_results = df_results[df_results['model_name'] == model_name]
    
    if max_samples:
        model_results = model_results.head(max_samples)
    
    prediction_count = 0
    
    for _, row in model_results.iterrows():
        sample_id = row['sample_id']
        pred_label = row['predicted_label']
        confidence = row['confidence']
        true_label = row['true_label']
        
        # Create prediction URI
        pred_uri = EX[f"{model_name}_Prediction_{sample_id}"]
        
        graph.add((pred_uri, RDF.type, MIAO.Prediction))
        graph.add((pred_uri, MIAO.predictedCategory, category_map[pred_label]))
        graph.add((pred_uri, MIAO.confidence, Literal(confidence, datatype=XSD.float)))
        graph.add((pred_uri, MIAO.groundTruthCategory, category_map[true_label]))
        
        # Link to detection run
        graph.add((pred_uri, MIAO.fromDetectionRun, run_metadata[model_name]['run_uri']))
        
        # Link to sample text (optional - creates sample instances)
        sample_uri = EX[f"Sample_{sample_id}"]
        graph.add((sample_uri, RDF.type, MIAO.Sample))
        graph.add((sample_uri, DCTERMS.identifier, Literal(sample_id, datatype=XSD.integer)))
        
        # Add text content (optional - can make graph large)
        text_content = df_dataset.iloc[sample_id]['text']
        graph.add((sample_uri, DCTERMS.description, Literal(text_content[:200])))  # Truncate long texts
        
        graph.add((pred_uri, MIAO.forSample, sample_uri))
        
        prediction_count += 1
    
    return prediction_count

# Create predictions for all models (limited to 50 samples per model for demonstration)
print("Creating prediction instances...\n")

for model_name in model_metadata.keys():
    count = create_predictions(g, df_results, df_dataset, model_name, max_samples=50)
    print(f"  {model_name}: Created {count} prediction instances")

print("\nNote: Set max_samples=None to include all predictions (may create large graphs)")

Creating prediction instances...

  BERT_Depression_Classifier: Created 10 prediction instances
  Feature_Framework_Model: Created 10 prediction instances

Note: Set max_samples=None to include all predictions (may create large graphs)


## 9. Export RDF Graph

Export the complete knowledge graph in various RDF formats.

In [32]:
# Print statistics
print("RDF Graph Statistics:")
print(f"  Total triples: {len(g)}")
print(f"  Namespaces: {len(list(g.namespaces()))}")

# Count instances by type
print("\nInstances by type:")
for type_uri in [MIAO.MentalIllnessesSchema, MIAO.MentalIllnessCategory, 
                 MIAO.Dataset, MLS.Model, MIAO.DetectionRun, 
                 MIAO.Prediction, MIAO.Evaluation]:
    count = len(list(g.subjects(RDF.type, type_uri)))
    type_name = str(type_uri).split('#')[-1]
    if count > 0:
        print(f"  {type_name}: {count}")

RDF Graph Statistics:
  Total triples: 316
  Namespaces: 34

Instances by type:
  MentalIllnessesSchema: 1
  MentalIllnessCategory: 4
  Dataset: 1
  Model: 2
  DetectionRun: 2
  Prediction: 20
  Evaluation: 2


In [33]:
# Export to Turtle format (most readable)
output_file_ttl = 'depression_detection_miao.ttl'
g.serialize(destination=output_file_ttl, format='turtle')
print(f"\nExported RDF graph to: {output_file_ttl}")

# Export to other formats
g.serialize(destination='depression_detection_miao.rdf', format='xml')
print(f"Exported RDF/XML to: depression_detection_miao.rdf")

g.serialize(destination='depression_detection_miao.jsonld', format='json-ld')
print(f"Exported JSON-LD to: depression_detection_miao.jsonld")

# Display first 50 lines of Turtle output
print("\n" + "="*60)
print("First 50 lines of Turtle output:")
print("="*60)
with open(output_file_ttl, 'r') as f:
    lines = f.readlines()[:50]
    print(''.join(lines))


Exported RDF graph to: depression_detection_miao.ttl
Exported RDF/XML to: depression_detection_miao.rdf
Exported JSON-LD to: depression_detection_miao.jsonld

First 50 lines of Turtle output:
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/miao/> .
@prefix miao: <https://w3id.org/miao#> .
@prefix mls: <http://www.w3.org/ns/mls#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix snomed: <http://snomed.info/id/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:BERT_Depression_Classifier_Prediction_0 a miao:Prediction ;
    miao:confidence "0.94"^^xsd:float ;
    miao:forSample ex:Sample_0 ;
    miao:fromDetectionRun ex:BERT_Depression_Classifier_Run ;
    miao:groundTruthCategory ex:MinimalDepression ;
    miao:predictedCategory ex:MinimalDepression .

ex:BERT_Depression_Classifier_Prediction_1 a miao:Prediction ;
    miao:confidence "0.85"^^xsd:float ;
    miao:forSample ex:Sam

## 10. Sample SPARQL Queries

Demonstrate how to query the RDF knowledge graph.

In [34]:
# Query 1: Compare model performance
query1 = """
PREFIX miao: <http://www.semanticweb.org/miao#>
PREFIX mls: <http://www.w3.org/ns/mls#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?model ?metric_name ?value
WHERE {
    ?model a mls:Model ;
           dcterms:title ?model_title ;
           mls:hasQuality ?metric .
    ?metric rdfs:label ?metric_name ;
            mls:hasValue ?value .
}
ORDER BY ?model ?metric_name
"""

print("Query 1: Model Performance Comparison")
print("="*60)
results1 = g.query(query1)
for row in results1:
    model_name = str(row.model).split('/')[-1]
    print(f"{model_name:<40} {row.metric_name:<15} {float(row.value):.4f}")

Query 1: Model Performance Comparison
BERT_Depression_Classifier               accuracy        0.9000
BERT_Depression_Classifier               avg_confidence  0.7960
BERT_Depression_Classifier               f1_score        0.8810
BERT_Depression_Classifier               precision       0.9375
BERT_Depression_Classifier               recall          0.8750
Feature_Framework_Model                  accuracy        0.8000
Feature_Framework_Model                  avg_confidence  0.7420
Feature_Framework_Model                  f1_score        0.7893
Feature_Framework_Model                  precision       0.8125
Feature_Framework_Model                  recall          0.7917


In [35]:
# Query 2: Distribution of predictions by severity level
query2 = """
PREFIX miao: <http://www.semanticweb.org/miao#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?category_label (COUNT(?prediction) as ?count)
WHERE {
    ?prediction a miao:Prediction ;
                miao:predictedCategory ?category .
    ?category rdfs:label ?category_label .
}
GROUP BY ?category_label
ORDER BY ?category_label
"""

print("\n\nQuery 2: Distribution of Predictions by Severity Level")
print("="*60)
results2 = g.query(query2)
for row in results2:
    print(f"{str(row.category_label):<30} {int(row.count):>5} predictions")



Query 2: Distribution of Predictions by Severity Level


In [36]:
# Query 3: High confidence Severe depression predictions
query3 = """
PREFIX miao: <http://www.semanticweb.org/miao#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?prediction ?confidence ?sample_text ?is_correct
WHERE {
    ?prediction a miao:Prediction ;
                miao:predictedCategory ?pred_category ;
                miao:groundTruthCategory ?true_category ;
                miao:confidence ?confidence ;
                miao:forSample ?sample .
    
    ?pred_category rdfs:label "Severe Depression"@en .
    ?sample dcterms:description ?sample_text .
    
    BIND(IF(?pred_category = ?true_category, "✓", "✗") AS ?is_correct)
    
    FILTER(?confidence > 0.85)
}
ORDER BY DESC(?confidence)
LIMIT 5
"""

print("\n\nQuery 3: High Confidence 'Severe Depression' Predictions (confidence > 0.85)")
print("="*60)
results3 = g.query(query3)
for i, row in enumerate(results3, 1):
    print(f"\n{i}. Confidence: {float(row.confidence):.3f} | Correct: {row.is_correct}")
    print(f"   Text: {str(row.sample_text)[:80]}...")



Query 3: High Confidence 'Severe Depression' Predictions (confidence > 0.85)


In [37]:
# Query 4: Compare predictions between models for same samples
query4 = """
PREFIX miao: <http://www.semanticweb.org/miao#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?sample_id ?model1_pred ?model1_conf ?model2_pred ?model2_conf ?true_label
WHERE {
    # First model predictions
    ?pred1 a miao:Prediction ;
           miao:forSample ?sample ;
           miao:predictedCategory ?cat1 ;
           miao:confidence ?model1_conf ;
           miao:fromDetectionRun ?run1 ;
           miao:groundTruthCategory ?true_cat .
    
    # Second model predictions for same sample
    ?pred2 a miao:Prediction ;
           miao:forSample ?sample ;
           miao:predictedCategory ?cat2 ;
           miao:confidence ?model2_conf ;
           miao:fromDetectionRun ?run2 .
    
    ?sample dcterms:identifier ?sample_id .
    ?cat1 rdfs:label ?model1_pred .
    ?cat2 rdfs:label ?model2_pred .
    ?true_cat rdfs:label ?true_label .
    
    # Ensure different models
    FILTER(?run1 != ?run2)
    
    # Only show cases where predictions differ
    FILTER(?cat1 != ?cat2)
}
ORDER BY ?sample_id
LIMIT 5
"""

print("\n\nQuery 4: Disagreements Between Models (same sample, different predictions)")
print("="*60)
results4 = g.query(query4)
count = 0
for row in results4:
    count += 1
    print(f"\nSample {int(row.sample_id)}:")
    print(f"  Model 1: {str(row.model1_pred):<25} (conf: {float(row.model1_conf):.3f})")
    print(f"  Model 2: {str(row.model2_pred):<25} (conf: {float(row.model2_conf):.3f})")
    print(f"  True label: {str(row.true_label)}")

if count == 0:
    print("\nNo disagreements found (or insufficient predictions in graph)")



Query 4: Disagreements Between Models (same sample, different predictions)

No disagreements found (or insufficient predictions in graph)


## 11. Validation Report

Validate the RDF graph against MIAO ontology requirements.

In [38]:
# Generate validation report
print("\n" + "="*60)
print("MIAO ONTOLOGY VALIDATION REPORT")
print("="*60)

# Check 1: Schema has categories
query_check1 = """
PREFIX miao: <http://www.semanticweb.org/miao#>
SELECT (COUNT(?category) as ?count)
WHERE {
    ?schema a miao:MentalIllnessesSchema ;
            miao:hasMentalIllnessCategory ?category .
}
"""
result = list(g.query(query_check1))[0]
print(f"\n✓ Schema has {int(result['count'])} categories (expected: 4)")

# Check 2: All categories have required metadata
query_check2 = """
PREFIX miao: <http://www.semanticweb.org/miao#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?category
WHERE {
    ?category a miao:MentalIllnessCategory ;
              dcterms:title ?title ;
              dcterms:description ?desc ;
              dcterms:identifier ?id ;
              rdfs:label ?label .
}
"""
result = list(g.query(query_check2))
print(f"✓ {len(result)} categories have complete metadata (title, description, identifier, label)")

# Check 3: SKOS mappings present
query_check3 = """
PREFIX miao: <http://www.semanticweb.org/miao#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT (COUNT(?mapping) as ?count)
WHERE {
    ?category a miao:MentalIllnessCategory ;
              skos:related ?snomed .
    BIND(?snomed AS ?mapping)
}
"""
result = list(g.query(query_check3))[0]
print(f"✓ {int(result['count'])} SKOS mappings to SNOMED CT (expected: 3, excluding Minimal)")

# Check 4: Models have performance metrics
query_check4 = """
PREFIX mls: <http://www.w3.org/ns/mls#>
SELECT ?model (COUNT(?metric) as ?metric_count)
WHERE {
    ?model a mls:Model ;
           mls:hasQuality ?metric .
}
GROUP BY ?model
"""
result = list(g.query(query_check4))
print(f"\n✓ {len(result)} models with performance metrics:")
for row in result:
    model_name = str(row.model).split('/')[-1]
    print(f"    - {model_name}: {int(row.metric_count)} metrics")

# Check 5: DetectionRuns link all components
query_check5 = """
PREFIX miao: <http://www.semanticweb.org/miao#>
SELECT ?run ?model ?dataset ?schema
WHERE {
    ?run a miao:DetectionRun ;
         miao:usesModel ?model ;
         miao:usesDataset ?dataset ;
         miao:usesSchema ?schema .
}
"""
result = list(g.query(query_check5))
print(f"\n✓ {len(result)} detection runs properly linked to model, dataset, and schema")

# Check 6: Predictions have confidence scores
query_check6 = """
PREFIX miao: <http://www.semanticweb.org/miao#>
SELECT (COUNT(?pred) as ?count) (AVG(?conf) as ?avg_conf)
WHERE {
    ?pred a miao:Prediction ;
          miao:confidence ?conf .
}
"""
result = list(g.query(query_check6))[0]
if int(result['count']) > 0:
    print(f"\n✓ {int(result['count'])} predictions with confidence scores (avg: {float(result.avg_conf):.3f})")
else:
    print(f"\n⚠ No predictions found in graph")

# Check 7: Source citation present
query_check7 = """
PREFIX miao: <http://www.semanticweb.org/miao#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?schema ?source
WHERE {
    ?schema a miao:MentalIllnessesSchema ;
            dcterms:source ?source .
}
"""
result = list(g.query(query_check7))
print(f"\n✓ Schema has bibliographic source citation (PHQ-9): {len(result) > 0}")

print("\n" + "="*60)
print("VALIDATION COMPLETE")
print("="*60)
print(f"\nTotal RDF triples: {len(g)}")
print("Graph is ready for integration with MIAO knowledge base.")


MIAO ONTOLOGY VALIDATION REPORT

✓ Schema has 0 categories (expected: 4)
✓ 0 categories have complete metadata (title, description, identifier, label)
✓ 0 SKOS mappings to SNOMED CT (expected: 3, excluding Minimal)

✓ 2 models with performance metrics:
    - BERT_Depression_Classifier: 5 metrics
    - Feature_Framework_Model: 5 metrics

✓ 0 detection runs properly linked to model, dataset, and schema

⚠ No predictions found in graph

✓ Schema has bibliographic source citation (PHQ-9): False

VALIDATION COMPLETE

Total RDF triples: 316
Graph is ready for integration with MIAO knowledge base.
