# Client API Tutorial

This notebook demonstrates how to use the `MedCATTrainerSession` class to interact with the MedCATTrainer API. We'll cover:

1. Setting up a MedCATTrainer session
2. Exploring available resources (users, datasets, models)
3. Creating new resources (datasets, models, users)
4. Creating annotation projects with different approaches
5. Downloading and saving annotations

These steps provide a complete workflow for programmatically managing medical text annotation projects with MedCATTrainer.

<u>__SETUP:__</u>

You need to have a [MedCATtrainer service running locally](http://localhost:8001/).

The default credentials after setup are:

```bash
username: admin
password: admin
```

The administrative console can be found here: http://localhost:8001/admin/

The admin console is where you can manually interact with MedCATtrainer and set up projects.




## 1. Setup and Authentication

First, let's import the necessary classes and set up our session:

In [5]:
import os
import json
import sys
sys.path.append('../client')
from mctclient import MedCATTrainerSession, MCTDataset, MCTConceptDB, MCTVocab, MCTModelPack, MCTUser, MCTProject

In [6]:
# Initialize the session

# Set environment variables for authentication. These are the default values and setting them is optional.
os.environ['MCTRAINER_USERNAME'] = 'admin'
os.environ['MCTRAINER_PASSWORD'] = 'admin'
mct_server = 'http://localhost:8001' # Default server is http://localhost:8001 if not specified
# With the environment variables set, the session can be created without arguments:
# session = MedCATTrainerSession()

# Alternatively, initialize the session with explicit arguments and change them if required.
session = MedCATTrainerSession(server=mct_server, username='admin', password='admin') # Wrapper for the MedCATTrainer API.
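
To avoid hardcoding credentials in the notebook, the connection settings can be collected in one place before the session is created. The following is a minimal sketch, not part of the client library: `resolve_connection` is a hypothetical helper, and `MCTRAINER_SERVER` is an assumed variable name used only for illustration (the tutorial itself only mentions `MCTRAINER_USERNAME` and `MCTRAINER_PASSWORD`).

```python
import os


def resolve_connection(env=None):
    """Collect connection settings, falling back to the tutorial defaults."""
    env = os.environ if env is None else env
    return {
        # MCTRAINER_SERVER is a hypothetical variable name, assumed for this sketch.
        "server": env.get("MCTRAINER_SERVER", "http://localhost:8001"),
        "username": env.get("MCTRAINER_USERNAME", "admin"),
        "password": env.get("MCTRAINER_PASSWORD", "admin"),
    }


# Demo with an explicit mapping instead of the real environment.
settings = resolve_connection({"MCTRAINER_USERNAME": "annotator1"})
print(settings["server"], settings["username"])
```

The resulting dictionary uses the same keyword names as the call above, so it could be passed on as `MedCATTrainerSession(**settings)`.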

## 2. Explore Available Resources

Let's check what resources are already available in the MedCATTrainer instance:

In [7]:
# Get users
users = session.get_users()
print("Users:")
for user in users:
    print(user)
print()

# Get datasets
datasets = session.get_datasets()
print("Datasets:")
for dataset in datasets:
    print(dataset)
print()

# Get concept databases and vocabularies
concept_dbs, vocabs = session.get_models()
print("Concept DBs:")
for cdb in concept_dbs:
    print(cdb)
print()
print("Vocabularies:")
for vocab in vocabs:
    print(vocab)
print()

# Get modelpacks
model_packs = session.get_model_packs()
print("ModelPacks:")
for model_pack in model_packs:
    print(model_pack)
print()

# Get meta tasks
meta_tasks = session.get_meta_tasks()
print("Meta Tasks:")
for i, task in enumerate(meta_tasks):
    print(f"{i+1} : {task.name}")
print()

# Get relation tasks
rel_tasks = session.get_rel_tasks()
print("Relation Tasks:")
for i, task in enumerate(rel_tasks):
    print(f"{i+1} : {task.name}")

Users:
3 : annotator2
2 : annotator1
1 : admin

Datasets:
1 : Example Dataset 	 http://localhost:8001/media/Example_Dataset.csv
2 : Neurology Notes 	 http://localhost:8001/media/neurology_notes.csv
3 : SG-example-docs 	 http://localhost:8001/media/sg-sample-docs.csv

Concept DBs:
1 : umls_cdb 	 http://localhost:8001/media/cdb.dat
2 : snomed_cdb 	 http://localhost:8001/media/snomed-cdb.dat
3 : snomed_2022_modelpack_CDB 	 http://localhost:8001/media/Users/k1897038/projects/MedCATtrainer/webapp/api/media/20230227__kch_gstt_trained_model_494c3717f637bb89/cdb.dat
8 : medcat_full_pack_CDB 	 http://localhost:8001/media/Users/k1897038/projects/MedCATtrainer/webapp/api/media/medcat_model_pack_u3fB9G5/cdb.dat
12 : snomed-2023-bert-metacats_CDB 	 http://localhost:8001/media/Users/k1897038/projects/MedCATtrainer/webapp/api/media/20230227__kch_gstt_trained_model_bert_metacats_138689a7bb83cb0a/cdb.dat
13 : de_id_modelpack_CDB 	 http://localhost:8001/media/Users/k1897038/projects/MedCATtrainer/webapp

## 3. Upload new resources to MedCATtrainer

Before we create a project we need to create and upload all the required resources. We'll start with a dataset:


In [None]:
# Create a new dataset to be annotated.
neurology_dataset = session.create_dataset(
    name="Neurology Notes",  # Names must be unique
    dataset_file="./example_data/neuro.csv"  # This CSV should have at least these two columns: "name" and "text"
)
print(f"Created dataset: {neurology_dataset}")
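
If you do not have a suitable file yet, a CSV with the two required columns can be generated with the standard library. This is a minimal sketch: the two notes are invented placeholder text, and `write_dataset_csv` and the temporary output path are hypothetical names chosen for illustration.

```python
import csv
import os
import tempfile

# Invented placeholder documents as (name, text) pairs, not real clinical data.
notes = [
    ("note_001", "Patient presented with a generalised tonic-clonic seizure."),
    ("note_002", "History of epilepsy, currently well controlled on medication."),
]


def write_dataset_csv(path, rows):
    """Write rows to a CSV file with the "name" and "text" header columns."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "text"])
        writer.writerows(rows)


csv_path = os.path.join(tempfile.mkdtemp(), "neuro_demo.csv")
write_dataset_csv(csv_path, notes)

# Read the file back to confirm the header and the number of documents.
with open(csv_path, newline="", encoding="utf-8") as f:
    rows_read = list(csv.DictReader(f))
print(f"Wrote {len(rows_read)} documents to {csv_path}")
```

The path in `csv_path` could then be used as the `dataset_file` argument of `session.create_dataset`.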

### 3.1 Creating MedCAT Models

We have two options for creating models:

1. Upload separate CDB and Vocab files
2. Upload a complete model pack ZIP

Let's explore both approaches:

In [None]:
# If you don't have these MedCAT components or a model pack, you can download examples here:
# Download vocab.dat
!wget -O ./example_data/vocab.dat https://cogstack-medcat-example-models.s3.eu-west-2.amazonaws.com/medcat-example-models/vocab.dat
# Download snomed-cdb-mc-v1.cdb
!wget -O ./example_data/snomed-cdb-mc-v1.cdb https://cogstack-medcat-example-models.s3.eu-west-2.amazonaws.com/medcat-example-models/snomed-cdb-mc-v1.cdb
# Download model pack (this is a zip file)
!wget -O ./example_data/medcat_model_pack.zip https://cogstack-medcat-example-models.s3.eu-west-2.amazonaws.com/medcat-example-models/medcat_model_pack_c4e0d25701ce4e88.zip

# If you already have these files, skip this cell.

In [None]:
# Option 1: Upload separate CDB and Vocab files
example_cdb = MCTConceptDB(name="example_cdbv1", conceptdb_file="./example_data/snomed-cdb-mc-v1.cdb")
example_vocab = MCTVocab(name="example_vocabv2", vocab_file="./example_data/vocab.dat")

# Create the model in the MedCATTrainer instance
cdb, vocab = session.create_medcat_model(example_cdb, example_vocab)
print(f"Created CDB: {cdb}")
print(f"Created Vocab: {vocab}")

In [None]:
# Option 2: Upload a complete modelpack ZIP
# This contains CDB, Vocab, and potentially MetaCAT and RelCAT models
medcat_model_pack = MCTModelPack(
    name="medcat_full_pack",
    model_pack_zip="./example_data/medcat_model_pack.zip"  # Same location as the download cell above
)
session.create_medcat_model_pack(medcat_model_pack)
print(f"Created model pack: {medcat_model_pack}")
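
Both upload options depend on local file paths, so a quick pre-flight check can catch a wrong path before any API call is made. The sketch below uses only the standard library; `missing_files` is a hypothetical helper, and the demo uses temporary files rather than the real model files.

```python
import os
import tempfile


def missing_files(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(p)]


# Demo with one real temporary file and one path that does not exist.
tmp_dir = tempfile.mkdtemp()
present = os.path.join(tmp_dir, "vocab.dat")
open(present, "wb").close()
absent = os.path.join(tmp_dir, "medcat_model_pack.zip")

not_found = missing_files([present, absent])
print("Missing:", not_found)
```

In the notebook, the same helper could be called with the CDB, Vocab, and model pack paths before running the upload cells.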

### 3.2 Creating a New User

If we need to add an annotator to our project:

In [None]:
new_user = session.create_user(username="annotator1", password="secure_password")
print(f"Created user: {new_user}")

## 4. Creating Annotation Projects

Now we can create annotation projects using our resources.

First, let's check again which resources are available in the MedCATTrainer instance after Part 3:

In [None]:
# Get users
users = session.get_users()
print("Users:")
for user in users:
    print(user)
print()

# Get datasets
datasets = session.get_datasets()
print("Datasets:")
for dataset in datasets:
    print(dataset)
print()

# Get concept databases and vocabularies
concept_dbs, vocabs = session.get_models()
print("Concept DBs:")
for cdb in concept_dbs:
    print(cdb)
print()
print("Vocabularies:")
for vocab in vocabs:
    print(vocab)
print()

# Get modelpacks
model_packs = session.get_model_packs()
print("ModelPacks:")
for model_pack in model_packs:
    print(model_pack)
print()

# Get meta tasks
meta_tasks = session.get_meta_tasks()
print("Meta Tasks:")
for i, task in enumerate(meta_tasks):
    print(f"{i+1} : {task.name}")
print()

# Get relation tasks
rel_tasks = session.get_rel_tasks()
print("Relation Tasks:")
for i, task in enumerate(rel_tasks):
    print(f"{i+1} : {task.name}")


In [None]:
# Method 1: Create a project with separate CDB and Vocab
neuro_project = session.create_project(
    name="Neurology Annotation Project",
    description="Demo annotation project of neurology conditions, epilepsy & seizure",
    members=list(users),  # Add all users
    dataset=datasets[-1],
    concept_db=concept_dbs[-1],
    vocab=vocabs[-1],
    cuis=["84757009", "91175000"],  # Whitelist Filter CUIs/concepts
    #meta_tasks=["Temporality", "Certainty"],  # Can specify by name or by object
    #rel_tasks=["Has_Finding"] # only add this relational extraction task if absolutely required
)

print(f"Created project: {neuro_project}")
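
Selecting resources with `[-1]` depends on the order in which the server returns them. A lookup by name is less fragile. The following is a sketch under the assumption that the returned objects expose a `name` attribute (as the task objects printed earlier do); `find_by_name` is a hypothetical helper, and `SimpleNamespace` objects stand in for real resources.

```python
from types import SimpleNamespace


def find_by_name(resources, name):
    """Return the single resource whose .name equals name, or raise LookupError."""
    matches = [r for r in resources if getattr(r, "name", None) == name]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one resource named {name!r}, found {len(matches)}"
        )
    return matches[0]


# Stand-in objects for the demo; real code would pass session.get_datasets().
demo_datasets = [
    SimpleNamespace(name="Example Dataset"),
    SimpleNamespace(name="Neurology Notes"),
]
chosen = find_by_name(demo_datasets, "Neurology Notes")
print(chosen.name)
```

With a live session, `dataset=find_by_name(datasets, "Neurology Notes")` would replace `dataset=datasets[-1]`.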

In [None]:
# Method 2: Create a project with a modelpack

# Re-run the resource exploration cell above first so that `users`, `datasets` and `model_packs` are up to date.
general_project = session.create_project(
    name="Demo General Medical Annotation",
    description="Annotation of neurology medical conditions",
    members=list(users),  # All users
    dataset=datasets[-1],  # Use existing dataset
    modelpack=model_packs[-1],  # Use existing model pack
    # cuis_file="./resources/mct_filter.json",  # Load whitelist concepts from a file ["concept1", "concept2"]
)

print(f"Created project with model pack: {general_project}")
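
The commented-out `cuis_file` argument expects a JSON file containing a list of concept identifiers. A minimal sketch of writing such a file is shown below; it reuses the two CUIs from Method 1, and the temporary path is an assumption for illustration.

```python
import json
import os
import tempfile

# The same whitelist used in Method 1, stored as a plain JSON list.
filter_cuis = ["84757009", "91175000"]

filter_path = os.path.join(tempfile.mkdtemp(), "mct_filter.json")
with open(filter_path, "w", encoding="utf-8") as f:
    json.dump(filter_cuis, f)

# Read the file back to confirm it holds a list of strings.
with open(filter_path, encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded)
```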

## 5. Retrieving Project Annotations

After annotators have worked on the projects, we can download the annotations:

In [None]:
# Get all projects
mct_projects = session.get_projects()

# Download annotations for all projects
projects = session.get_project_annos(mct_projects)

print(f"Downloaded annotations for {len(mct_projects)} projects:")
for p in projects['projects']:
    print(p['name'])

In [None]:
# Inspect all details from a single export
projects['projects'][0]
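
A compact summary is often easier to read than the full export. The sketch below counts annotations per project. It assumes each project entry holds a `documents` list whose items each hold an `annotations` list; only the `projects` and `name` keys are confirmed by the cells above, so check the key names against your own export. `count_annotations` is a hypothetical helper and `demo_export` is a stand-in structure.

```python
def count_annotations(export):
    """Map each project name to its total number of annotations."""
    counts = {}
    for project in export.get("projects", []):
        # "documents" and "annotations" are assumed key names.
        documents = project.get("documents", [])
        counts[project.get("name")] = sum(
            len(doc.get("annotations", [])) for doc in documents
        )
    return counts


# Stand-in export with one project, two documents and three annotations.
demo_export = {
    "projects": [
        {
            "name": "Demo",
            "documents": [
                {"annotations": [{}, {}]},
                {"annotations": [{}]},
            ],
        }
    ]
}
summary = count_annotations(demo_export)
print(summary)
```

With real data, `count_annotations(projects)` would be called on the export downloaded above.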

## 6. Saving Annotations for Analysis

Finally, let's save the annotations to a file for later analysis:

In [None]:
# Save MCT export / annotations to a file
with open("./example_data/medical_annotations.json", "w") as f:
    json.dump(projects, f, indent=2)

print("Annotations saved to ./example_data/medical_annotations.json")

# End of Tutorial