# Depression Detection ML Pipeline - MIAO Ontology

This notebook builds an RDF graph describing a depression-detection ML pipeline, using:
- **MIAO**: Mental Illness Analysis Ontology
- **ML-Schema**: Machine Learning Schema (published by the W3C Machine Learning Schema Community Group)

## 1. Setup and Dependencies

In [39]:
# Install required packages (uncomment if needed)
# !pip install rdflib pandas numpy scikit-learn

In [40]:
import pandas as pd
import numpy as np
from datetime import datetime
from rdflib import Graph, Namespace, Literal, URIRef, RDF, RDFS, XSD
from rdflib.namespace import DCTERMS, PROV, SKOS, OWL
import hashlib
import json

## 2. Define Namespaces

Define all required namespaces including SKOS for concept mappings and external medical ontologies.

In [41]:
# Define MIAO and related namespaces
MIAO = Namespace("https://w3id.org/miao#")
MLS = Namespace("http://www.w3.org/ns/mls#")
EX = Namespace("http://example.org/miao/")

# External medical ontologies
SNOMED = Namespace("http://snomed.info/id/")
ICD11 = Namespace("http://id.who.int/icd/entity/")

# Create RDF graph
g = Graph()
g.bind("miao", MIAO)
g.bind("mls", MLS)
g.bind("ex", EX)
g.bind("dcterms", DCTERMS)
g.bind("prov", PROV)
g.bind("rdfs", RDFS)
g.bind("xsd", XSD)
g.bind("skos", SKOS)
g.bind("owl", OWL)
g.bind("snomed", SNOMED)
g.bind("icd11", ICD11)

print("RDF graph initialized with namespaces:")
for prefix, namespace in g.namespaces():
    print(f"  {prefix}: {namespace}")

RDF graph initialized with namespaces:
  brick: https://brickschema.org/schema/Brick#
  csvw: http://www.w3.org/ns/csvw#
  dc: http://purl.org/dc/elements/1.1/
  dcat: http://www.w3.org/ns/dcat#
  dcmitype: http://purl.org/dc/dcmitype/
  dcterms: http://purl.org/dc/terms/
  dcam: http://purl.org/dc/dcam/
  doap: http://usefulinc.com/ns/doap#
  foaf: http://xmlns.com/foaf/0.1/
  geo: http://www.opengis.net/ont/geosparql#
  odrl: http://www.w3.org/ns/odrl/2/
  org: http://www.w3.org/ns/org#
  prof: http://www.w3.org/ns/dx/prof/
  prov: http://www.w3.org/ns/prov#
  qb: http://purl.org/linked-data/cube#
  schema: https://schema.org/
  sh: http://www.w3.org/ns/shacl#
  skos: http://www.w3.org/2004/02/skos/core#
  sosa: http://www.w3.org/ns/sosa/
  ssn: http://www.w3.org/ns/ssn/
  time: http://www.w3.org/2006/time#
  vann: http://purl.org/vocab/vann/
  void: http://rdfs.org/ns/void#
  wgs: https://www.w3.org/2003/01/geo/wgs84_pos#
  owl: http://www.w3.org/2002/07/owl#
  rdf: http://www.w3.or

## 3. Load Input Data

### 3.1 Load Depression Severity Dataset

Expected format:
- CSV file with columns: `text`, `severity_label`
- `severity_label`: 0 (Minimal), 1 (Mild), 2 (Moderate), 3 (Severe)
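
Before building triples from a real CSV file, it may help to check that the file matches the expected format. The following is a minimal sketch, not part of the original pipeline; `validate_dataset`, `REQUIRED_COLUMNS`, and `VALID_LABELS` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical helper names; the column names and label range come from
# the expected format described above.
REQUIRED_COLUMNS = {"text", "severity_label"}
VALID_LABELS = {0, 1, 2, 3}

def validate_dataset(df: pd.DataFrame) -> pd.DataFrame:
    """Raise ValueError if the frame does not match the expected format."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    unexpected = set(df["severity_label"].unique()) - VALID_LABELS
    if unexpected:
        raise ValueError(f"Unexpected severity labels: {sorted(unexpected)}")
    if df["text"].isna().any():
        raise ValueError("Found rows with missing text")
    return df

df_ok = validate_dataset(
    pd.DataFrame({"text": ["example post"], "severity_label": [2]})
)
```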

In [42]:
# Load dataset (replace with your actual file path)
# df_dataset = pd.read_csv('depression_dataset.csv')

# For demonstration, create sample data
df_dataset = pd.DataFrame({
    'text': [
        'Feeling okay today, nothing special but managing fine',
        'Sometimes I feel a bit down but it passes quickly',
        'Having trouble getting out of bed, everything feels heavy',
        "I can't remember the last time I felt happy, constant despair",
        'Life is good, enjoying my hobbies and social activities',
        'Feel slightly sad occasionally but cope well overall',
        'Struggling to concentrate, lost interest in most things',
        'Complete hopelessness, thoughts of ending it all',
        'Pretty content with life, normal ups and downs',
        'Persistent sadness affecting work and relationships'
    ],
    'severity_label': [0, 1, 2, 3, 0, 1, 2, 3, 0, 2]
})

# Map numeric labels to severity names
severity_names = {0: 'Minimal', 1: 'Mild', 2: 'Moderate', 3: 'Severe'}
df_dataset['severity_name'] = df_dataset['severity_label'].map(severity_names)

print(f"Loaded {len(df_dataset)} samples\n")
print("Severity distribution:")
print(df_dataset['severity_name'].value_counts())
print("\nFirst 5 samples:")
df_dataset.head()

Loaded 10 samples

Severity distribution:
severity_name
Minimal     3
Moderate    3
Mild        2
Severe      2
Name: count, dtype: int64

First 5 samples:


| | text | severity_label | severity_name |
|---|---|---|---|
| 0 | Feeling okay today, nothing special but managi... | 0 | Minimal |
| 1 | Sometimes I feel a bit down but it passes quickly | 1 | Mild |
| 2 | Having trouble getting out of bed, everything ... | 2 | Moderate |
| 3 | I can't remember the last time I felt happy, c... | 3 | Severe |
| 4 | Life is good, enjoying my hobbies and social a... | 0 | Minimal |


### 3.2 Load ML Experiment Results

Expected format:
- `sample_id`: integer index
- `model_name`: name of the model
- `predicted_label`: 0-3 (severity level)
- `confidence`: prediction confidence (0-1)
- `true_label`: ground truth label

In [43]:
# Option 1: Load from CSV
# df_results = pd.read_csv('model_results.csv')

# Option 2: Create sample results for demonstration
np.random.seed(42)
results_data = []

for model_name in ['BERT_Depression_Classifier', 'Feature_Framework_Model']:
    for idx, row in df_dataset.iterrows():
        true_label = row['severity_label']
        # Simulate predictions (mostly correct with some errors)
        if np.random.random() > 0.15:  # 85% accuracy
            pred_label = true_label
            confidence = np.random.uniform(0.7, 0.95)
        else:
            pred_label = np.random.choice([l for l in range(4) if l != true_label])
            confidence = np.random.uniform(0.5, 0.7)
        
        results_data.append({
            'sample_id': idx,
            'model_name': model_name,
            'predicted_label': pred_label,
            'confidence': round(confidence, 2),
            'true_label': true_label
        })

df_results = pd.DataFrame(results_data)
print(f"Loaded {len(df_results)} prediction results\n")
print("Results by model:")
print(df_results.groupby('model_name').size())
print("\nFirst 10 results:")
df_results.head(10)

Loaded 20 prediction results

Results by model:
model_name
BERT_Depression_Classifier    10
Feature_Framework_Model       10
dtype: int64

First 10 results:


| | sample_id | model_name | predicted_label | confidence | true_label |
|---|---|---|---|---|---|
| 0 | 0 | BERT_Depression_Classifier | 0 | 0.94 | 0 |
| 1 | 1 | BERT_Depression_Classifier | 1 | 0.85 | 1 |
| 2 | 2 | BERT_Depression_Classifier | 2 | 0.74 | 2 |
| 3 | 3 | BERT_Depression_Classifier | 0 | 0.62 | 3 |
| 4 | 4 | BERT_Depression_Classifier | 0 | 0.71 | 0 |
| 5 | 5 | BERT_Depression_Classifier | 1 | 0.91 | 1 |
| 6 | 6 | BERT_Depression_Classifier | 2 | 0.75 | 2 |
| 7 | 7 | BERT_Depression_Classifier | 3 | 0.78 | 3 |
| 8 | 8 | BERT_Depression_Classifier | 0 | 0.81 | 0 |
| 9 | 9 | BERT_Depression_Classifier | 2 | 0.85 | 2 |


## 4. Create MIAO Depression Severity Schema

Create the research schema with:
- PHQ-9 aligned severity categories
- Links to SNOMED CT concepts, expressed with `skos:related`
- Proper bibliographic citations
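
The PHQ-9 alignment can be made concrete with a small helper that maps a PHQ-9 total score to the four severity levels used in this notebook. This is a sketch based on the score ranges stated in the category descriptions of this section; `phq9_score_to_level` is a hypothetical name, and the function is not used elsewhere in the notebook.

```python
def phq9_score_to_level(score: int) -> int:
    """Map a PHQ-9 total score (0-27) to the notebook's severity level (0-3).

    Level 3 merges the PHQ-9 bands "moderately severe" (15-19) and
    "severe" (20-27), following the schema in this notebook.
    """
    if not 0 <= score <= 27:
        raise ValueError(f"PHQ-9 total score must be in 0..27, got {score}")
    if score <= 4:
        return 0  # Minimal
    if score <= 9:
        return 1  # Mild
    if score <= 14:
        return 2  # Moderate
    return 3      # Severe

levels = [phq9_score_to_level(s) for s in (3, 7, 12, 21)]
print(levels)  # [0, 1, 2, 3]
```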

In [44]:
def create_depression_severity_schema(graph):
    """
    Create ordinal depression severity classification schema in MIAO format
    with SKOS mappings to SNOMED CT.
    """
    # Define schema
    schema_uri = EX.DepressionSeveritySchema_Research
    graph.add((schema_uri, RDF.type, MIAO.MentalIllnessesSchema))
    graph.add((schema_uri, DCTERMS.title, 
               Literal("Research schema for depression severity in text", lang="en")))
    graph.add((schema_uri, DCTERMS.description, 
               Literal("Ordinal taxonomy of depression severity (Minimal, Mild, Moderate, Severe) used in social media corpora and text-based computational research on depression detection. Based on PHQ-9 severity thresholds.", lang="en")))
    graph.add((schema_uri, DCTERMS.created, 
               Literal("2025-12-02", datatype=XSD.date)))
    
    # Add bibliographic source
    graph.add((schema_uri, DCTERMS.source,
               Literal("Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16(9):606-13.")))
    
    # Define categories with PHQ-9 score ranges and SNOMED CT mappings
    categories = [
        {
            'level': 0,
            'name': 'MinimalDepression',
            'label': 'Minimal Depression',
            'title': 'Minimal depression',
            'description': 'Minimal or no depressive symptoms. PHQ-9 score range: 0-4. Indicates little to no clinical significance.',
            'snomed': None  # No SNOMED mapping for minimal
        },
        {
            'level': 1,
            'name': 'MildDepression',
            'label': 'Mild Depression',
            'title': 'Mild depression',
            'description': 'Mild depressive symptoms. PHQ-9 score range: 5-9. May warrant watchful waiting and repeated assessment.',
            'snomed': '310495003'  # Mild depression (SNOMED CT)
        },
        {
            'level': 2,
            'name': 'ModerateDepression',
            'label': 'Moderate Depression',
            'title': 'Moderate depression',
            'description': 'Moderate depressive symptoms. PHQ-9 score range: 10-14. Warrants treatment plan, considering counseling, follow-up, and/or pharmacotherapy.',
            'snomed': '310496002'  # Moderate depression (SNOMED CT)
        },
        {
            'level': 3,
            'name': 'SevereDepression',
            'label': 'Severe Depression',
            'title': 'Severe depression',
            'description': 'Severe depressive symptoms. PHQ-9 score range: 15-27 (includes moderately severe 15-19 and severe 20-27). Warrants active treatment with pharmacotherapy and/or psychotherapy.',
            'snomed': '310497006'  # Severe depression (SNOMED CT)
        }
    ]
    
    category_map = {}
    
    for cat in categories:
        category_uri = EX[cat['name']]
        graph.add((category_uri, RDF.type, MIAO.MentalIllnessCategory))
        graph.add((category_uri, DCTERMS.title, Literal(cat['title'], lang="en")))
        graph.add((category_uri, DCTERMS.description, Literal(cat['description'], lang="en")))
        graph.add((category_uri, DCTERMS.identifier, Literal(cat['level'], datatype=XSD.integer)))
        graph.add((category_uri, RDFS.label, Literal(cat['label'], lang="en")))
        
        # Schema relationships
        graph.add((category_uri, MIAO.isMentalIllnessCategoryOf, schema_uri))
        graph.add((schema_uri, MIAO.hasMentalIllnessCategory, category_uri))
        
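        # Link to the SNOMED CT concept if available. skos:related is used here;
        # SKOS also defines dedicated mapping properties such as skos:closeMatch.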
        if cat['snomed']:
            snomed_uri = SNOMED[cat['snomed']]
            graph.add((category_uri, SKOS.related, snomed_uri))
        
        category_map[cat['level']] = category_uri
    
    return schema_uri, category_map

schema_uri, category_map = create_depression_severity_schema(g)
print(f"Created schema: {schema_uri}")
print(f"\nCategories with SNOMED CT mappings:")
for level, uri in category_map.items():
    severity_name = {0: 'Minimal', 1: 'Mild', 2: 'Moderate', 3: 'Severe'}[level]
    print(f"  Level {level} ({severity_name}): {uri}")

Created schema: http://example.org/miao/DepressionSeveritySchema_Research

Categories with SNOMED CT mappings:
  Level 0 (Minimal): http://example.org/miao/MinimalDepression
  Level 1 (Mild): http://example.org/miao/MildDepression
  Level 2 (Moderate): http://example.org/miao/ModerateDepression
  Level 3 (Severe): http://example.org/miao/SevereDepression


## 5. Model Dataset as MIAO Dataset

Create basic dataset metadata (title, description, and creation date) for an `mls:Dataset` instance.

In [45]:
def create_dataset_metadata(graph, df, dataset_name="Depression_Severity_Dataset"):
    """
    Create an mls:Dataset instance with basic metadata (title, description, creation date).
    """
    dataset_uri = EX[dataset_name]
    
    # Dataset metadata
    graph.add((dataset_uri, RDF.type, MLS.Dataset))
    graph.add((dataset_uri, DCTERMS.title, Literal(dataset_name.replace('_', ' '))))
    graph.add((dataset_uri, DCTERMS.description, 
               Literal(f"Text dataset with {len(df)} samples annotated with depression severity labels (0-3) based on PHQ-9 criteria.")))
    graph.add((dataset_uri, DCTERMS.created, 
               Literal(datetime.now().strftime("%Y-%m-%d"), datatype=XSD.date)))
    
    # Statistical properties and the link to the schema are not added here;
    # dataset statistics are attached later as mls:DataCharacteristic instances.
    
    return dataset_uri

dataset_uri = create_dataset_metadata(g, df_dataset)
print(f"Created dataset: {dataset_uri}")

Created dataset: http://example.org/miao/Depression_Severity_Dataset


## 6. Model ML Implementations and Models

Create ML model and implementation metadata. Performance metrics are attached later, in Section 8.

In [46]:
def create_ml_models(graph):
    """
    Create ML-Schema Model instances for depression detection.
    Models represent trained ML/DL implementations.
    """
    models = {
        'BERT_Base': {
            'description': 'BERT-base fine-tuned for depression severity classification',
            'algorithm': 'Transformer (BERT)',
            'framework': 'PyTorch/Transformers'
        },
        'BiLSTM': {
            'description': 'Bidirectional LSTM with attention for sequence classification',
            'algorithm': 'Recurrent Neural Network (BiLSTM)',
            'framework': 'TensorFlow/Keras'
        },
        'LogisticRegression_TF_IDF': {
            'description': 'Logistic Regression with TF-IDF features',
            'algorithm': 'Logistic Regression',
            'framework': 'scikit-learn'
        },
        'SVM_TF_IDF': {
            'description': 'Support Vector Machine with TF-IDF features',
            'algorithm': 'SVM (RBF kernel)',
            'framework': 'scikit-learn'
        }
    }
    
    model_metadata = {}
    
    for model_name, model_info in models.items():
        # Create Model (ML-Schema)
        model_uri = EX[f"Model_{model_name}"]
        graph.add((model_uri, RDF.type, MLS.Model))
        graph.add((model_uri, DCTERMS.title, Literal(f"Depression Detection: {model_name}")))
        graph.add((model_uri, DCTERMS.description, Literal(model_info['description'])))
        
        # Create Implementation (ML-Schema)
        impl_uri = EX[f"Implementation_{model_name}"]
        graph.add((impl_uri, RDF.type, MLS.Implementation))
        graph.add((impl_uri, RDFS.label, Literal(model_info['algorithm'])))
        graph.add((impl_uri, DCTERMS.description, Literal(f"Implementation: {model_info['framework']}")))
        
        # Link Model to Implementation
        graph.add((model_uri, MLS.hasImplementation, impl_uri))
        
        # Store for later use
        model_metadata[model_name] = {
            'model_uri': model_uri,
            'impl_uri': impl_uri,
            'algorithm': model_info['algorithm']
        }
    
    return model_metadata

# Create models
model_metadata = create_ml_models(g)

print("Created ML-Schema Model instances:")
for model_name, info in model_metadata.items():
    print(f"  • {model_name}: {info['algorithm']}")
    print(f"    Model URI: {info['model_uri']}")
    print(f"    Implementation URI: {info['impl_uri']}")
    print()

Created ML-Schema Model instances:
  • BERT_Base: Transformer (BERT)
    Model URI: http://example.org/miao/Model_BERT_Base
    Implementation URI: http://example.org/miao/Implementation_BERT_Base

  • BiLSTM: Recurrent Neural Network (BiLSTM)
    Model URI: http://example.org/miao/Model_BiLSTM
    Implementation URI: http://example.org/miao/Implementation_BiLSTM

  • LogisticRegression_TF_IDF: Logistic Regression
    Model URI: http://example.org/miao/Model_LogisticRegression_TF_IDF
    Implementation URI: http://example.org/miao/Implementation_LogisticRegression_TF_IDF

  • SVM_TF_IDF: SVM (RBF kernel)
    Model URI: http://example.org/miao/Model_SVM_TF_IDF
    Implementation URI: http://example.org/miao/Implementation_SVM_TF_IDF



## 7. Create Detection Runs

Using `miao:AutomaticMentalIllnessesDetection` (subclass of `mls:Run`).

In [47]:
def create_detection_runs(graph, models, dataset_uri, schema_uri):
    """
    Create one miao:AutomaticMentalIllnessesDetection instance per model.
    This class is a subclass of both mls:Run and miao:MentalIllnessesDetection.
    """
    detection_runs = {}
    
    for model_name, model_info in models.items():
        model_uri = model_info['model_uri']
        
        # Create the miao:AutomaticMentalIllnessesDetection instance
        run_uri = EX[f"Detection_{model_name}"]
        graph.add((run_uri, RDF.type, MIAO.AutomaticMentalIllnessesDetection))
        graph.add((run_uri, RDF.type, MLS.Run))  # It's also a Run
        
        # Basic metadata
        graph.add((run_uri, DCTERMS.title, Literal(f"Depression Detection Run: {model_name}")))
        graph.add((run_uri, DCTERMS.description, 
                  Literal(f"Automatic detection run using {model_name} model")))
        
        # MIAO properties
        graph.add((run_uri, MIAO.usedMentalIllnessesSchema, schema_uri))
        graph.add((run_uri, MIAO.hasInputData, dataset_uri))
        
        # ML-Schema properties for the run.
        # Note: ML-Schema defines mls:implements between an Implementation and an
        # Algorithm; it is used here as a direct run-to-model link for simplicity.
        graph.add((run_uri, MLS.implements, model_uri))  # Links run to model
        graph.add((run_uri, MLS.hasInput, dataset_uri))  # Links run to dataset
        
        # Create a miao:MentalIllnessesSet to hold the results of this run
        result_set_uri = EX[f"ResultSet_{model_name}"]
        graph.add((result_set_uri, RDF.type, MIAO.MentalIllnessesSet))
        graph.add((result_set_uri, RDF.type, PROV.Entity))  # PROV class
        graph.add((result_set_uri, DCTERMS.title, 
                  Literal(f"Detection Results: {model_name}")))
        
        # PROV properties, asserted in both directions
        graph.add((run_uri, PROV.generated, result_set_uri))
        graph.add((result_set_uri, PROV.wasGeneratedBy, run_uri))
        
        detection_runs[model_name] = {
            'run_uri': run_uri,
            'result_set_uri': result_set_uri
        }
    
    return detection_runs

# Create detection runs
detection_metadata = create_detection_runs(g, model_metadata, dataset_uri, schema_uri)

print("Created AutomaticMentalIllnessesDetection instances:")
for model_name, info in detection_metadata.items():
    print(f"  • {model_name}: {info['run_uri']}")
    print(f"    → Generated: {info['result_set_uri']}")

Created AutomaticMentalIllnessesDetection instances:
  • BERT_Base: http://example.org/miao/Detection_BERT_Base
    → Generated: http://example.org/miao/ResultSet_BERT_Base
  • BiLSTM: http://example.org/miao/Detection_BiLSTM
    → Generated: http://example.org/miao/ResultSet_BiLSTM
  • LogisticRegression_TF_IDF: http://example.org/miao/Detection_LogisticRegression_TF_IDF
    → Generated: http://example.org/miao/ResultSet_LogisticRegression_TF_IDF
  • SVM_TF_IDF: http://example.org/miao/Detection_SVM_TF_IDF
    → Generated: http://example.org/miao/ResultSet_SVM_TF_IDF


## 8. Model Evaluation (ML-Schema)

Using `mls:ModelEvaluation` and `mls:EvaluationMeasure`.

In [48]:
def create_model_evaluation(graph, model_name, metrics_dict):
    """
    Create an mls:ModelEvaluation with one mls:EvaluationMeasure per metric.
    """
    model_uri = model_metadata[model_name]['model_uri']
    
    # Create ModelEvaluation (ML-Schema class)
    eval_uri = EX[f"Evaluation_{model_name}"]
    graph.add((eval_uri, RDF.type, MLS.ModelEvaluation))
    graph.add((eval_uri, DCTERMS.title, Literal(f"Evaluation: {model_name}")))
    graph.add((eval_uri, MLS.evaluates, model_uri))  # Links the evaluation to the evaluated model
    
    # Create EvaluationMeasures for each metric
    for metric_name, value in metrics_dict.items():
        measure_uri = EX[f"{model_name}_Metric_{metric_name}"]
        graph.add((measure_uri, RDF.type, MLS.EvaluationMeasure))
        graph.add((measure_uri, RDFS.label, Literal(metric_name)))
        graph.add((measure_uri, MLS.hasValue, Literal(value, datatype=XSD.float)))
        
        # Link measure to evaluation
        graph.add((eval_uri, MLS.specifiedBy, measure_uri))  # ML-Schema property
        
        # Also link to model quality
        graph.add((model_uri, MLS.hasQuality, measure_uri))
    
    return eval_uri

# Create evaluations for all models
print("Creating ModelEvaluation instances (ML-Schema):")
for model_name in model_metadata.keys():
    # Simulated metrics (replace with real results)
    metrics = {
        'accuracy': 0.85,
        'precision': 0.83,
        'recall': 0.87,
        'f1_score': 0.85
    }
    eval_uri = create_model_evaluation(g, model_name, metrics)
    print(f"  • {model_name}: {eval_uri}")

Creating ModelEvaluation instances (ML-Schema):
  • BERT_Base: http://example.org/miao/Evaluation_BERT_Base
  • BiLSTM: http://example.org/miao/Evaluation_BiLSTM
  • LogisticRegression_TF_IDF: http://example.org/miao/Evaluation_LogisticRegression_TF_IDF
  • SVM_TF_IDF: http://example.org/miao/Evaluation_SVM_TF_IDF
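
The metrics above are simulated and identical for every model. A minimal sketch of how real values could be computed from true and predicted labels is shown below, in plain Python so that it has no extra dependencies. `macro_metrics` is a hypothetical helper, and macro averaging is an assumption, since the notebook does not say which averaging the metrics use. The example labels are the ten `BERT_Depression_Classifier` rows displayed in Section 3.2.

```python
from collections import Counter

def macro_metrics(true_labels, predicted_labels, labels=(0, 1, 2, 3)):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    pairs = list(zip(true_labels, predicted_labels))
    if not pairs:
        raise ValueError("No predictions to evaluate")
    true_pos = Counter(t for t, p in pairs if t == p)
    true_count = Counter(t for t, _ in pairs)
    pred_count = Counter(p for _, p in pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        # A class with no predictions (or no true samples) scores 0.0.
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / true_count[label] if true_count[label] else 0.0
        denom = precision + recall
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(2 * precision * recall / denom if denom else 0.0)

    n = len(labels)
    return {
        "accuracy": sum(true_pos.values()) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1_score": sum(f1s) / n,
    }

true_demo = [0, 1, 2, 3, 0, 1, 2, 3, 0, 2]
pred_demo = [0, 1, 2, 0, 0, 1, 2, 3, 0, 2]
metrics_demo = macro_metrics(true_demo, pred_demo)
print({name: round(value, 3) for name, value in metrics_demo.items()})
```

The resulting dictionary has the same keys as the simulated `metrics` dictionary, so it could be passed to `create_model_evaluation` unchanged.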


## 9. Features and Data Characteristics (ML-Schema)

Using `mls:Feature` and `mls:DataCharacteristic` classes.

In [49]:
def create_dataset_features(graph, dataset_uri):
    """Create ML-Schema Features for depression detection."""
    features = [
        {
            'name': 'TextContent',
            'description': 'Raw text content from social media posts or clinical notes',
            'datatype': 'string'
        },
        {
            'name': 'TextLength',
            'description': 'Character length of the text content',
            'datatype': 'integer'
        },
        {
            'name': 'SentimentScore',
            'description': 'Computed sentiment polarity score',
            'datatype': 'float'
        },
        {
            'name': 'EmotionalIntensity',
            'description': 'Intensity of emotional expression in text',
            'datatype': 'float'
        }
    ]
    
    for feat in features:
        feature_uri = EX[f"Feature_{feat['name']}"]
        graph.add((feature_uri, RDF.type, MLS.Feature))
        graph.add((feature_uri, RDFS.label, Literal(feat['name'])))
        graph.add((feature_uri, DCTERMS.description, Literal(feat['description'])))
        graph.add((feature_uri, MLS.hasDataType, Literal(feat['datatype'])))
        graph.add((dataset_uri, MLS.hasFeature, feature_uri))
    
    return len(features)

# Create features
num_features = create_dataset_features(g, dataset_uri)
print(f"Created {num_features} ML-Schema Features")

Created 4 ML-Schema Features


In [50]:
def create_data_characteristics(graph, dataset_uri, df):
    """Create ML-Schema DataCharacteristics."""
    characteristics = [
        {
            'name': 'NumberOfInstances',
            'value': len(df),
            'description': 'Total number of samples in dataset'
        },
        {
            'name': 'NumberOfClasses',
            'value': 4,  # Depression severity levels
            'description': 'Number of depression severity classes'
        },
        {
            'name': 'ClassBalance',
            'value': round(df['severity_label'].value_counts().std() / df['severity_label'].value_counts().mean(), 3),
            'description': 'Coefficient of variation of class counts (std / mean; 0 means perfectly balanced)'
        },
        {
            'name': 'AverageTextLength',
            'value': round(df['text'].str.len().mean(), 1),
            'description': 'Average character length of text'
        }
    ]
    
    for char in characteristics:
        char_uri = EX[f"DataChar_{char['name']}"]
        graph.add((char_uri, RDF.type, MLS.DataCharacteristic))
        graph.add((char_uri, RDFS.label, Literal(char['name'])))
        graph.add((char_uri, DCTERMS.description, Literal(char['description'])))
        graph.add((char_uri, MLS.hasValue, Literal(char['value'], datatype=XSD.float)))
        graph.add((dataset_uri, MLS.hasQuality, char_uri))
    
    return len(characteristics)

# Create data characteristics
num_chars = create_data_characteristics(g, dataset_uri, df_dataset)
print(f"Created {num_chars} ML-Schema DataCharacteristics")

Created 4 ML-Schema DataCharacteristics
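
The `ClassBalance` value is the standard deviation of the per-class counts divided by their mean. As a worked check using only the standard library, the sample data has class counts 3, 3, 2, and 2 (see the severity distribution in Section 3.1). pandas' `std()` uses the sample standard deviation by default, which `statistics.stdev` also computes.

```python
from statistics import mean, stdev

# Per-class sample counts from the demonstration dataset.
class_counts = [3, 3, 2, 2]

# Sample standard deviation divided by the mean (coefficient of variation).
class_balance = round(stdev(class_counts) / mean(class_counts), 3)
print(class_balance)  # 0.231
```

A perfectly balanced dataset would give 0, and larger values indicate more imbalance.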


In [51]:
def create_model_hyperparameters(graph, model_name):
    """Create ML-Schema HyperParameters."""
    model_uri = model_metadata[model_name]['model_uri']
    
    if 'BERT' in model_name:
        hyperparams = [
            {'name': 'learning_rate', 'value': 2e-5, 'description': 'Learning rate'},
            {'name': 'batch_size', 'value': 16, 'description': 'Training batch size'},
            {'name': 'num_epochs', 'value': 3, 'description': 'Number of epochs'},
            {'name': 'max_seq_length', 'value': 512, 'description': 'Max sequence length'},
            {'name': 'dropout_rate', 'value': 0.1, 'description': 'Dropout probability'}
        ]
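    else:
        # Placeholder: every non-BERT model (including BiLSTM) receives
        # TF-IDF vectorizer settings in this demonstration.
        hyperparams = [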
            {'name': 'max_features', 'value': 10000, 'description': 'Max vocabulary size'},
            {'name': 'ngram_range', 'value': '(1,2)', 'description': 'N-gram range'},
            {'name': 'min_df', 'value': 5, 'description': 'Min document frequency'}
        ]
    
    for hp in hyperparams:
        hp_uri = EX[f"HyperParam_{hp['name']}"]
        graph.add((hp_uri, RDF.type, MLS.HyperParameter))
        graph.add((hp_uri, RDFS.label, Literal(hp['name'])))
        graph.add((hp_uri, DCTERMS.description, Literal(hp['description'])))
        
        setting_uri = EX[f"{model_name}_Setting_{hp['name']}"]
        graph.add((setting_uri, RDF.type, MLS.HyperParameterSetting))
        graph.add((setting_uri, MLS.specifiedBy, hp_uri))
        
        if isinstance(hp['value'], str):
            graph.add((setting_uri, MLS.hasValue, Literal(hp['value'])))
        elif isinstance(hp['value'], int):
            graph.add((setting_uri, MLS.hasValue, Literal(hp['value'], datatype=XSD.integer)))
        else:
            graph.add((setting_uri, MLS.hasValue, Literal(hp['value'], datatype=XSD.float)))
        
        graph.add((model_uri, MLS.hasHyperParameter, setting_uri))
    
    return len(hyperparams)

# Create hyperparameters
print("Creating HyperParameters (ML-Schema):")
for model_name in model_metadata.keys():
    count = create_model_hyperparameters(g, model_name)
    print(f"  • {model_name}: {count} hyperparameters")

Creating HyperParameters (ML-Schema):
  • BERT_Base: 5 hyperparameters
  • BiLSTM: 3 hyperparameters
  • LogisticRegression_TF_IDF: 3 hyperparameters
  • SVM_TF_IDF: 3 hyperparameters


## 10. SPARQL Queries

Queries using MIAO + ML-Schema classes.

In [52]:
# Query 1: All Mental Illness Categories
query1 = """
PREFIX miao: <https://w3id.org/miao#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?category ?title ?description ?snomed_code
WHERE {
    ?category a miao:MentalIllnessCategory ;
              dcterms:title ?title ;
              dcterms:description ?description .
    
    OPTIONAL { ?category skos:related ?snomed_code }
}
ORDER BY ?title
"""

print("Query 1: Mental Illness Categories")
print("="*60)
results1 = g.query(query1)
for row in results1:
    print(f"\nCategory: {row.title}")
    print(f"  Description: {row.description}")
    if row.snomed_code:
        print(f"  SNOMED CT: {row.snomed_code}")

Query 1: Mental Illness Categories

Category: Mild depression
  Description: Mild depressive symptoms. PHQ-9 score range: 5-9. May warrant watchful waiting and repeated assessment.
  SNOMED CT: http://snomed.info/id/310495003

Category: Minimal depression
  Description: Minimal or no depressive symptoms. PHQ-9 score range: 0-4. Indicates little to no clinical significance.

Category: Moderate depression
  Description: Moderate depressive symptoms. PHQ-9 score range: 10-14. Warrants treatment plan, considering counseling, follow-up, and/or pharmacotherapy.
  SNOMED CT: http://snomed.info/id/310496002

Category: Severe depression
  Description: Severe depressive symptoms. PHQ-9 score range: 15-27 (includes moderately severe 15-19 and severe 20-27). Warrants active treatment with pharmacotherapy and/or psychotherapy.
  SNOMED CT: http://snomed.info/id/310497006


In [53]:
# Query 2: Models with Performance Metrics (ML-Schema)
query2 = """
PREFIX mls: <http://www.w3.org/ns/mls#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?model_title ?metric_name ?value
WHERE {
    ?model a mls:Model ;
           dcterms:title ?model_title ;
           mls:hasQuality ?metric .
    
    ?metric rdfs:label ?metric_name ;
            mls:hasValue ?value .
}
ORDER BY ?model_title ?metric_name
"""

print("\n\nQuery 2: Model Performance Metrics")
print("="*60)
results2 = g.query(query2)
current_model = None
for row in results2:
    if current_model != str(row.model_title):
        if current_model is not None:
            print()
        current_model = str(row.model_title)
        print(f"\nModel: {current_model}")
    print(f"  {row.metric_name}: {float(row.value):.4f}")



Query 2: Model Performance Metrics

Model: Depression Detection: BERT_Base
  accuracy: 0.8500
  f1_score: 0.8500
  precision: 0.8300
  recall: 0.8700


Model: Depression Detection: BiLSTM
  accuracy: 0.8500
  f1_score: 0.8500
  precision: 0.8300
  recall: 0.8700


Model: Depression Detection: LogisticRegression_TF_IDF
  accuracy: 0.8500
  f1_score: 0.8500
  precision: 0.8300
  recall: 0.8700


Model: Depression Detection: SVM_TF_IDF
  accuracy: 0.8500
  f1_score: 0.8500
  precision: 0.8300
  recall: 0.8700


In [54]:
# Query 3: Automatic detection runs (miao:AutomaticMentalIllnessesDetection)
query3 = """
PREFIX miao: <https://w3id.org/miao#>
PREFIX mls: <http://www.w3.org/ns/mls#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>

SELECT ?run ?run_title ?schema_title ?model_title ?result_set
WHERE {
    # Runs typed as miao:AutomaticMentalIllnessesDetection
    ?run a miao:AutomaticMentalIllnessesDetection ;
         dcterms:title ?run_title ;
         miao:usedMentalIllnessesSchema ?schema ;
         mls:implements ?model ;
         prov:generated ?result_set .
    
    ?schema dcterms:title ?schema_title .
    ?model dcterms:title ?model_title .
}
ORDER BY ?run_title
"""

print("\n\nQuery 3: Automatic Detection Runs")
print("="*60)
results3 = g.query(query3)
for row in results3:
    print(f"\nRun: {row.run_title}")
    print(f"  Schema: {row.schema_title}")
    print(f"  Model: {row.model_title}")
    print(f"  Generated: {row.result_set}")



Query 3: Automatic Detection Runs

Run: Depression Detection Run: BERT_Base
  Schema: Research schema for depression severity in text
  Model: Depression Detection: BERT_Base
  Generated: http://example.org/miao/ResultSet_BERT_Base

Run: Depression Detection Run: BiLSTM
  Schema: Research schema for depression severity in text
  Model: Depression Detection: BiLSTM
  Generated: http://example.org/miao/ResultSet_BiLSTM

Run: Depression Detection Run: LogisticRegression_TF_IDF
  Schema: Research schema for depression severity in text
  Model: Depression Detection: LogisticRegression_TF_IDF
  Generated: http://example.org/miao/ResultSet_LogisticRegression_TF_IDF

Run: Depression Detection Run: SVM_TF_IDF
  Schema: Research schema for depression severity in text
  Model: Depression Detection: SVM_TF_IDF
  Generated: http://example.org/miao/ResultSet_SVM_TF_IDF


In [55]:
# Query 4: Dataset Features and Model Hyperparameters
query4 = """
PREFIX mls: <http://www.w3.org/ns/mls#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?type ?name ?description ?value
WHERE {
    {
        # Features
        ?feature a mls:Feature ;
                rdfs:label ?name ;
                dcterms:description ?description ;
                mls:hasDataType ?value .
        BIND("Feature" AS ?type)
    }
    UNION
    {
        # Data Characteristics
        ?char a mls:DataCharacteristic ;
              rdfs:label ?name ;
              dcterms:description ?description ;
              mls:hasValue ?value .
        BIND("DataCharacteristic" AS ?type)
    }
    UNION
    {
        # HyperParameters
        ?setting a mls:HyperParameterSetting ;
                 mls:specifiedBy ?hp ;
                 mls:hasValue ?value .
        ?hp rdfs:label ?name ;
            dcterms:description ?description .
        BIND("HyperParameter" AS ?type)
    }
}
ORDER BY ?type ?name
"""

print("\n\nQuery 4: Features, Characteristics, and Hyperparameters")
print("="*60)
results4 = g.query(query4)
current_type = None
for row in results4:
    if current_type != str(row.type):
        if current_type is not None:
            print()
        current_type = str(row.type)
        print(f"\n{current_type}:")
    print(f"  {row.name}: {row.value}")



Query 4: Features, Characteristics, and Hyperparameters

DataCharacteristic:
  AverageTextLength: 52.7
  ClassBalance: 0.231
  NumberOfClasses: 4
  NumberOfInstances: 10


Feature:
  EmotionalIntensity: float
  SentimentScore: float
  TextContent: string
  TextLength: integer


HyperParameter:
  batch_size: 16
  dropout_rate: 0.1
  learning_rate: 2e-05
  max_features: 10000
  max_features: 10000
  max_features: 10000
  max_seq_length: 512
  min_df: 5
  min_df: 5
  min_df: 5
  ngram_range: (1,2)
  ngram_range: (1,2)
  ngram_range: (1,2)
  num_epochs: 3
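
Query 4 returns one row per `mls:HyperParameterSetting`, so a hyperparameter that several settings share (for example `max_features`) is listed once per setting. A minimal sketch of collapsing those repeats after the query, assuming each result row has first been turned into a `(type, name, value)` tuple of strings; `collapse_rows` and `format_row` are hypothetical helpers, not part of rdflib:

```python
from collections import Counter

# Stand-in for the tuples built from the Query 4 results, e.g.
# (str(row.type), str(row.name), str(row.value)) for each row.
rows = [
    ("HyperParameter", "batch_size", "16"),
    ("HyperParameter", "max_features", "10000"),
    ("HyperParameter", "max_features", "10000"),
    ("HyperParameter", "max_features", "10000"),
    ("HyperParameter", "min_df", "5"),
]


def collapse_rows(rows):
    """Return sorted (type, name, value, occurrences) tuples, one per distinct row."""
    counts = Counter(rows)
    return [(t, n, v, c) for (t, n, v), c in sorted(counts.items())]


def format_row(name, value, occurrences):
    """Format one line, appending the repeat count only when it exceeds 1."""
    suffix = f" (x{occurrences})" if occurrences > 1 else ""
    return f"  {name}: {value}{suffix}"


collapsed = collapse_rows(rows)
for _type, name, value, occurrences in collapsed:
    print(format_row(name, value, occurrences))
```

Keeping the occurrence count, rather than silently dropping repeats with `SELECT DISTINCT`, preserves the information that several settings use the same value.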


## 11. Validation Report

In [56]:
# Validation Report using REAL MIAO classes
print("\n" + "="*60)
print("MIAO + ML-SCHEMA VALIDATION REPORT")
print("="*60)

# Check 1: Schema has categories
query_check1 = """
PREFIX miao: <https://w3id.org/miao#>
SELECT (COUNT(?category) as ?count)
WHERE {
    ?schema a miao:MentalIllnessesSchema ;
            miao:hasMentalIllnessCategory ?category .
}
"""
result = list(g.query(query_check1))[0]
print(f"\n✓ Schema has {int(result['count'])} categories")

# Check 2: AutomaticMentalIllnessesDetection instances
query_check2 = """
PREFIX miao: <https://w3id.org/miao#>
SELECT (COUNT(?detection) as ?count)
WHERE {
    ?detection a miao:AutomaticMentalIllnessesDetection .
}
"""
result = list(g.query(query_check2))[0]
print(f"✓ {int(result['count'])} AutomaticMentalIllnessesDetection instances")

# Check 3: Detection uses schema
query_check3 = """
PREFIX miao: <https://w3id.org/miao#>
SELECT (COUNT(?detection) as ?count)
WHERE {
    ?detection a miao:AutomaticMentalIllnessesDetection ;
               miao:usedMentalIllnessesSchema ?schema .
}
"""
result = list(g.query(query_check3))[0]
print(f"✓ {int(result['count'])} detections use schema")

# Check 4: MentalIllnessesSet generated
query_check4 = """
PREFIX miao: <https://w3id.org/miao#>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT (COUNT(?set) as ?count)
WHERE {
    ?set a miao:MentalIllnessesSet ;
         prov:wasGeneratedBy ?detection .
}
"""
result = list(g.query(query_check4))[0]
print(f"✓ {int(result['count'])} MentalIllnessesSet generated")

# Check 5: ML-Schema Features
query_check5 = """
PREFIX mls: <http://www.w3.org/ns/mls#>
SELECT (COUNT(?feature) as ?count)
WHERE {
    ?feature a mls:Feature .
}
"""
result = list(g.query(query_check5))[0]
print(f"\n✓ {int(result['count'])} ML-Schema Features defined")

# Check 6: ML-Schema DataCharacteristics
query_check6 = """
PREFIX mls: <http://www.w3.org/ns/mls#>
SELECT (COUNT(?char) as ?count)
WHERE {
    ?char a mls:DataCharacteristic .
}
"""
result = list(g.query(query_check6))[0]
print(f"✓ {int(result['count'])} ML-Schema DataCharacteristics")

# Check 7: ML-Schema ModelEvaluation
query_check7 = """
PREFIX mls: <http://www.w3.org/ns/mls#>
SELECT (COUNT(?eval) as ?count)
WHERE {
    ?eval a mls:ModelEvaluation .
}
"""
result = list(g.query(query_check7))[0]
print(f"✓ {int(result['count'])} ML-Schema ModelEvaluation instances")

# Check 8: ML-Schema EvaluationMeasures
query_check8 = """
PREFIX mls: <http://www.w3.org/ns/mls#>
SELECT (COUNT(?measure) as ?count)
WHERE {
    ?measure a mls:EvaluationMeasure .
}
"""
result = list(g.query(query_check8))[0]
print(f"✓ {int(result['count'])} ML-Schema EvaluationMeasures")

print("\n" + "="*60)
print("VALIDATION COMPLETE")
print("="*60)
print(f"\nTotal RDF triples: {len(g)}")
print("\nUsing terms from MIAO, ML-Schema, and PROV:")
print("  MIAO: AutomaticMentalIllnessesDetection, MentalIllnessesSchema,")
print("        MentalIllnessCategory, MentalIllnessesSet, MentalIllness")
print("  ML-Schema: Model, Dataset, ModelEvaluation, EvaluationMeasure,")
print("             Feature, DataCharacteristic, HyperParameter")
print("  PROV: generated, wasGeneratedBy (properties); Activity, Entity (classes)")




MIAO + ML-SCHEMA VALIDATION REPORT

✓ Schema has 4 categories
✓ 4 AutomaticMentalIllnessesDetection instances
✓ 4 detections use schema
✓ 4 MentalIllnessesSet generated

✓ 4 ML-Schema Features defined
✓ 4 ML-Schema DataCharacteristics
✓ 4 ML-Schema ModelEvaluation instances
✓ 16 ML-Schema EvaluationMeasures

VALIDATION COMPLETE

Total RDF triples: 332

Using terms from MIAO, ML-Schema, and PROV:
  MIAO: AutomaticMentalIllnessesDetection, MentalIllnessesSchema,
        MentalIllnessCategory, MentalIllnessesSet, MentalIllness
  ML-Schema: Model, Dataset, ModelEvaluation, EvaluationMeasure,
             Feature, DataCharacteristic, HyperParameter
  PROV: generated, wasGeneratedBy (properties); Activity, Entity (classes)
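
The report prints a check mark next to every count, so a count of zero would still look like a pass. A minimal sketch of turning the counts into real pass/fail checks follows. The counts are copied from the report; `evaluate_checks` and the threshold values are illustrative assumptions, not requirements of MIAO or ML-Schema:

```python
def evaluate_checks(counts, minimums):
    """Compare observed counts with minimum expected counts.

    Returns (label, count, passed) tuples in the order of `counts`.
    A label with no entry in `minimums` only needs a count of at least 1.
    """
    results = []
    for label, count in counts.items():
        required = minimums.get(label, 1)
        results.append((label, count, count >= required))
    return results


# Counts taken from the validation report; thresholds are assumptions.
counts = {
    "schema categories": 4,
    "AutomaticMentalIllnessesDetection instances": 4,
    "detections using a schema": 4,
    "MentalIllnessesSet generated": 4,
    "ML-Schema EvaluationMeasures": 16,
}
minimums = {"schema categories": 2}

report = evaluate_checks(counts, minimums)
for label, count, passed in report:
    mark = "✓" if passed else "✗"
    print(f"{mark} {count} {label}")

# A single flag that a notebook or CI job could assert on.
all_passed = all(passed for _, _, passed in report)
```

In the notebook, `counts` could be filled from the `int(result['count'])` values of the eight check queries instead of being hard-coded.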


## 12. Export RDF

In [57]:
# Export to Turtle; uncomment the lines below to also write RDF/XML and JSON-LD
g.serialize(destination='03-depression-detection.ttl', format='turtle')
# g.serialize(destination='depression_detection_miao.rdf', format='xml')
# g.serialize(destination='depression_detection_miao.jsonld', format='json-ld')

print("\n" + "="*60)
print("EXPORT COMPLETE")
print("="*60)



EXPORT COMPLETE
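
If all three serializations are wanted under one consistent file stem, the export could be driven by a small format table. This is a sketch only: `export_graph` and `FORMATS` are hypothetical names, and the function assumes nothing about its `graph` argument except a `serialize(destination=..., format=...)` method, which rdflib's `Graph` provides.

```python
from pathlib import Path

# Format name -> file extension, matching the serialize calls in the export cell.
FORMATS = {"turtle": ".ttl", "xml": ".rdf", "json-ld": ".jsonld"}


def export_graph(graph, stem, out_dir=".", formats=FORMATS):
    """Serialize `graph` once per format and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for fmt, ext in formats.items():
        path = out_dir / f"{stem}{ext}"
        graph.serialize(destination=str(path), format=fmt)
        paths.append(path)
    return paths
```

In this notebook the call would be `export_graph(g, "03-depression-detection")`, which would write the `.ttl`, `.rdf`, and `.jsonld` files side by side.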
