# MedCATTrainer API Examples
This notebook shows how to programmatically upload data, permission users, and create projects, so that users can be set up for large distributed annotation projects.
- Create Datasets in MedCATTrainer
- Create CDB and Vocab MedCAT models in MedCATTrainer
- Create Projects in MedCATTrainer

In [1]:
import requests
import json
import pandas as pd
from pprint import pprint

In [2]:
URL = 'http://localhost:8001' # Should be set to your running deployment, IP / PORT if not running on localhost:8001

## Sample Dataset
The sample data comes from [MT-Samples](https://www.mtsamples.com/). A subset of this dataset is available under example_data/*.csv.

This guide works with 3 datasets, but the same approach can be used with 100s if needed.

In [3]:
ortho_notes = pd.read_csv('example_data/ortho.csv')
neuro_notes = pd.read_csv('example_data/neuro.csv')
cardio_notes = pd.read_csv('example_data/cardio.csv')

## Accessing the MedCATTrainer API
API access is via a username / password. Upon login the API auth endpoint provides an auth token that must be used for all following requests.

In [18]:
payload = {"username": "test", "password": "foobar312$"}
# Request a token from the auth endpoint of the deployment configured in URL
resp = requests.post(f'{URL}/api/api-token-auth/', json=payload)
headers = {
    'Authorization': f'Token {json.loads(resp.text)["token"]}',
}
headers

{'Authorization': 'Token dece457ae382482d9edc999fdbf37dd39ea2893c'}
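The login step above can be wrapped in a small helper that fails loudly when the credentials are rejected. This is a minimal sketch, not part of MedCATTrainer: `get_auth_headers` is a hypothetical name, and the `post` parameter exists only so that `requests.post` can be swapped for a stub when no server is running.

```python
def get_auth_headers(url, username, password, post=None):
    """Log in and return the Authorization header dict for later requests.

    `post` defaults to requests.post; it is injectable so the helper can be
    exercised without a running MedCATTrainer instance.
    """
    if post is None:
        import requests  # imported lazily so a stub can be used instead
        post = requests.post
    resp = post(f'{url}/api/api-token-auth/',
                json={'username': username, 'password': password})
    resp.raise_for_status()  # surface 4xx/5xx responses, e.g. a wrong password
    return {'Authorization': f'Token {resp.json()["token"]}'}
```

A call such as `headers = get_auth_headers(URL, 'test', password)` would then replace the cell above. Reading `password` from an environment variable or a prompt keeps it out of the saved notebook.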

### Resource APIs 
The MedCATTrainer API follows a RESTful architecture. Objects are created, updated and deleted under their respective resource paths.

In [19]:
json.loads(requests.get(f'{URL}/api/', headers=headers).text)

{'users': 'http://localhost:8001/api/users/',
 'entities': 'http://localhost:8001/api/entities/',
 'project-annotate-entities': 'http://localhost:8001/api/project-annotate-entities/',
 'project-groups': 'http://localhost:8001/api/project-groups/',
 'documents': 'http://localhost:8001/api/documents/',
 'annotated-entities': 'http://localhost:8001/api/annotated-entities/',
 'meta-annotations': 'http://localhost:8001/api/meta-annotations/',
 'meta-tasks': 'http://localhost:8001/api/meta-tasks/',
 'meta-task-values': 'http://localhost:8001/api/meta-task-values/',
 'relations': 'http://localhost:8001/api/relations/',
 'entity-relations': 'http://localhost:8001/api/entity-relations/',
 'concept-dbs': 'http://localhost:8001/api/concept-dbs/',
 'vocabs': 'http://localhost:8001/api/vocabs/',
 'datasets': 'http://localhost:8001/api/datasets/'}
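The resource listing above suggests a simple URL convention: a collection lives at `/api/<resource>/` and a single object at `/api/<resource>/<id>/`. The sketch below illustrates that convention. `resource_url` and `METHODS` are hypothetical helpers, and the mapping to `DELETE` is assumed from usual REST practice rather than confirmed against MedCATTrainer.

```python
def resource_url(base_url, resource, obj_id=None):
    """Build the collection URL, or the detail URL when obj_id is given."""
    url = f'{base_url.rstrip("/")}/api/{resource}/'
    return url if obj_id is None else f'{url}{obj_id}/'


# Conventional REST verb for each operation on a resource (assumed mapping).
METHODS = {'list': 'GET', 'create': 'POST', 'update': 'PUT', 'delete': 'DELETE'}

print(METHODS['create'], resource_url('http://localhost:8001', 'concept-dbs'))
print(METHODS['update'], resource_url('http://localhost:8001', 'concept-dbs', 21))
```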

### Create Datasets
A MedCATTrainer 'Dataset' is a set of documents that is uploaded into the trainer and used for one or more annotation projects. 
The trainer interface accepts CSV / XLSX files with 2 columns, namely **name** and **text**. 

An example DataFrame in this format is shown below. 

The API call below can be used to upload and create multiple datasets, one for each example DataFrame.

In [6]:
# Add a name column to each dataset, using subject_id when the CSV provides it
# and falling back to the row index otherwise
for notes in (ortho_notes, neuro_notes, cardio_notes):
    ids = notes['subject_id'] if 'subject_id' in notes.columns else notes.index
    notes['name'] = [f'Subject {i}' for i in ids]
ortho_notes.loc[:, ['name', 'text']].head(3)

In [7]:
# Collate datasets, with Names
datasets = [('Neuro Notes', neuro_notes), ('Cardio Notes', cardio_notes), ('Ortho Notes', ortho_notes)]

In [None]:
# POST dataset API with list of datasets
dataset_ids = []
for name, d_s in datasets:
    payload = {
        'dataset_name': name,   # Name that appears in the trainer
        'dataset': d_s.loc[:, ['name', 'text']].to_dict(),  # Dictionary representation of only the name and text columns
        'description': f'{name} first 20 notes from each category' # Description that appears in the trainer
    }
    resp = requests.post(f'{URL}/api/create-dataset/', json=payload, headers=headers)
    dataset_ids.append(json.loads(resp.text)['dataset_id'])
# New datasets created in the trainer have the following IDs
dataset_ids
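Before posting, it can help to check that each dataset dictionary really has the two expected columns. The sketch below works on a plain dict in the shape that `DataFrame.to_dict()` produces with its default orient (column name mapped to a dict of row index and value). `check_dataset_dict` is a hypothetical helper and the example texts are made up.

```python
def check_dataset_dict(dataset):
    """Raise ValueError unless `dataset` has matching 'name' and 'text' columns.

    Returns the number of documents when the check passes.
    """
    missing = {'name', 'text'} - set(dataset)
    if missing:
        raise ValueError(f'missing columns: {sorted(missing)}')
    if set(dataset['name']) != set(dataset['text']):
        raise ValueError('name and text columns cover different rows')
    empty = [row for row, text in dataset['text'].items() if not str(text).strip()]
    if empty:
        raise ValueError(f'rows with empty text: {empty}')
    return len(dataset['text'])


# Same shape as DataFrame.to_dict() output: {column -> {row index -> value}}
example = {
    'name': {0: 'Subject 1', 1: 'Subject 2'},
    'text': {0: 'Patient presents with knee pain.', 1: 'Follow-up after ACL repair.'},
}
print(check_dataset_dict(example))  # number of documents in the example
```

In the upload loop, calling `check_dataset_dict(payload['dataset'])` before `requests.post` would catch a missing column locally instead of relying on the server response.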

### Create CDBs and Vocabularies
The MedCAT models used by MedCATTrainer are produced by instances of the MedCAT classes medcat.cdb.CDB and medcat.utils.vocab.Vocabulary. Calling save_dict('\<file location\>') on an instance writes a file that can be loaded in another instance of MedCAT (via load_dict()), or within MedCATTrainer.

Example models are provided in the MedCAT repository: https://github.com/CogStack/MedCAT

#### Upload a CDB

In [10]:
!ls -l ../../medcat-models/

total 4020900
drwxrwxr-x 6 cerberus cerberus      4096 Jan 17  2024 20230227__kch_gstt_trained_model_494c3717f637bb89
-rw-rw-r-- 1 cerberus cerberus 954511008 Jan 17  2024 20230227__kch_gstt_trained_model_494c3717f637bb89.zip
-rw-r--r-- 1 cerberus cerberus 392429568 Jan  5  2024 cdb.dat
-rw-r--r-- 1 cerberus cerberus 607359360 Jun  7  2023 cdb-medmen.dat
drwxrwxr-x 4 cerberus cerberus      4096 Feb  8  2024 deid_medcat_n2c2_modelpack
-rw-r--r-- 1 cerberus cerberus 487520245 Feb  8  2024 deid_medcat_n2c2_modelpack.zip
-rw-r--r-- 1 cerberus cerberus    181988 Jan  5  2024 icd_11_cbd_utf8.dat
drwxrwxr-x 6 cerberus cerberus      4096 Feb 22  2024 medcat_model_pack_v1.4.0
-rw-rw-r-- 1 cerberus cerberus 841304525 Jan  5  2024 medcat_model_pack_v1.4.0.zip
-rw-r--r-- 1 cerberus cerberus       223 Jan  5  2024 pain_custom_cdb.dat
-rw-r--r-- 1 cerberus cerberus     25599 Jan  5  2024 pain_term.dat
-rw-r--r-- 1 cerberus cerberus    772635 Jan  5  2024 rxnorm_analgesic_cbd_utf8.dat
-rw-r--r-- 1 ce

In [None]:
from medcat.cdb import CDB

In [None]:
cdb = CDB.load('../../medcat-models/deid_medcat_n2c2_modelpack/cdb.dat')

<medcat.cdb.CDB at 0x704e66be4050>

In [15]:
txt = json.loads(requests.post(f'{URL}/api/concept-dbs/', headers=headers,
                               data={'name': 'example_cdb', 'use_for_training': True},
                               files={'cdb_file': open('../../medcat-models/deid_medcat_n2c2_modelpack/cdb.dat', 'rb')}).text)

In [20]:
txt

{'id': 21,
 'name': 'example_cdb',
 'cdb_file': 'http://localhost:8001/media/cdb_HjbPPUI.dat',
 'use_for_training': True,
 'create_time': '2024-09-20T21:07:03.062580Z',
 'last_modified': '2024-09-20T21:07:03.062610Z',
 'last_modified_by': 1}
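The upload cell above leaves the file handle open. A variant that closes the file and checks the response status could look like the sketch below. `upload_cdb` is a hypothetical helper; the endpoint and form fields are the ones used in the cell above, and `post` is injectable only so the function can be tried without a server.

```python
def upload_cdb(url, headers, path, name, post=None):
    """POST a CDB file to /api/concept-dbs/ and return the new CDB's ID."""
    if post is None:
        import requests  # imported lazily so a stub can be used instead
        post = requests.post
    with open(path, 'rb') as cdb_file:  # closed even if the request fails
        resp = post(f'{url}/api/concept-dbs/', headers=headers,
                    data={'name': name, 'use_for_training': True},
                    files={'cdb_file': cdb_file})
    resp.raise_for_status()  # stop here on 4xx/5xx instead of parsing an error body
    return resp.json()['id']
```

Returning only the ID keeps the result ready for use as `concept_db` when creating projects later in the notebook.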

In [22]:
txt = json.loads(requests.put(f'{URL}/api/concept-dbs/21/', headers=headers,
                               data={'name': 'example_cdb-EDITED', 'use_for_training': True},
                               files={'cdb_file': open('../../medcat-models/deid_medcat_n2c2_modelpack/cdb.dat', 'rb')}).text)

In [17]:
json.loads(requests.post(f'{URL}/api/concept-dbs/', headers=headers,
                         data={'name': 'example_cdb', 'use_for_training': True},
                         files={'cdb_file': open('../../medcat-models/deid_medcat_n2c2_modelpack/cdb.dat', 'rb')}).text)

{'id': 21,
 'name': 'example_cdb',
 'cdb_file': 'http://localhost:8001/media/cdb_HjbPPUI.dat',
 'use_for_training': True,
 'create_time': '2024-09-20T21:07:03.062580Z',
 'last_modified': '2024-09-20T21:07:03.062610Z',
 'last_modified_by': 1}

#### Upload a Vocabulary

In [None]:
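# Assumes the upload field is named vocab_file, mirroring cdb_file for concept DBs;
# check this against your MedCATTrainer version
txt = json.loads(requests.post(f'{URL}/api/vocabs/', headers=headers,
                               files={'vocab_file': open('<<LOCATION OF vocab>>', 'rb')}).text)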

## Create Projects
'Projects' are individual annotation projects that can broadly be used to:
- Improve an existing MedCAT model by providing feedback (correct, incorrect) on MedCAT annotations, adding synonyms, abbreviations etc. for existing concepts, or adding entirely new concepts that the current CDB does not capture. The MedCAT model is re-trained between each document.
- Inspect existing annotations of a MedCAT model and only collect annotations.

**Each new project is 'wired' up with existing users, models and datasets via their respective IDs. You should already have set up one or more users, a Concept Database and a Vocabulary via the admin page http://{deployment_url}/admin/auth/user/.**

<!-- ![Admin Page](imgs/admin_page.png) -->
<div>
<img src="./../docs/_static/img/admin_page.png" width="350px"/>
</div>

Once you've created each object via the /admin/ page, return here to collect the user IDs and the MedCAT model IDs.

### User Permissions
First create the user accounts. 

Then collect the IDs of the users that you want to permission for the new projects.

In [None]:
resp = json.loads(requests.get(f'{URL}/api/users/', headers=headers).text)['results']
pprint(resp)
users_ids = [u['id'] for u in resp]
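The cell above permissions every user returned by the API. To permission only specific annotators, the list can be filtered by username first. This is a sketch: `ids_for_usernames` is a hypothetical helper, and it assumes each user record carries a `username` field alongside `id`. The sample records are made up.

```python
def ids_for_usernames(users, usernames):
    """Return IDs for the requested usernames, failing on any unknown name."""
    by_name = {u['username']: u['id'] for u in users}
    unknown = sorted(set(usernames) - set(by_name))
    if unknown:
        raise KeyError(f'no such users: {unknown}')
    return [by_name[name] for name in usernames]


# Made-up records in the shape assumed for the /api/users/ results.
sample_users = [{'id': 1, 'username': 'admin'},
                {'id': 2, 'username': 'annotator_a'},
                {'id': 3, 'username': 'annotator_b'}]
print(ids_for_usernames(sample_users, ['annotator_a', 'annotator_b']))  # [2, 3]
```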

### MedCAT Models
Each project is configured with a MedCAT Concept Database (CDB), and Vocabulary (Vocab). 

In [None]:
all_cdbs = json.loads(requests.get(f'{URL}/api/concept-dbs/', headers=headers).text)['results']
# the CDB ID we'll use for this example
cdb_to_use = all_cdbs[0]['id']
# you might have many CDBs here. First 2 cdbs:
all_cdbs[0:2]

In [None]:
# You'll probably only have one vocabulary
all_vocabs = json.loads(requests.get(f'{URL}/api/vocabs/', headers=headers).text)['results']
vocab_to_use = all_vocabs[0]['id']
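The list endpoints above return their objects under a `results` key, which suggests paginated responses. If a deployment holds more objects than fit on one page, a loop like the sketch below could gather them all. It assumes each page also carries a `next` URL that is `None` on the last page, which is not confirmed by the outputs shown here. `fetch_all` is a hypothetical helper and `get` is injectable so it can be tried without a server.

```python
def fetch_all(url, headers, get=None):
    """Collect 'results' from every page of a paginated list endpoint."""
    if get is None:
        import requests  # imported lazily so a stub can be used instead
        get = requests.get
    items = []
    while url:
        resp = get(url, headers=headers)
        resp.raise_for_status()
        page = resp.json()
        items.extend(page['results'])
        url = page.get('next')  # assumed: None (or absent) on the last page
    return items
```

With this helper, `all_cdbs = fetch_all(f'{URL}/api/concept-dbs/', headers)` would stand in for the single-page request above.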

### Project Creation
We'll create 3 projects, one for each dataset, with both users able to access all projects. 

We'll leave the CUI and TUI filters blank, allowing for all concepts to appear for all these projects. 

|Parameter|Description|
|---------|-----------|
|name| Name of the project that appears on the landing page|
|description| Description as it appears on the landing page, e.g. 'Example projects'|
|cuis       | Comma separated list of CUIs, if needed; leave blank to allow all concepts |
|type_ids   | A comma separated list of Type IDs. Type IDs are logical groupings of CUIs such as 'disease', or 'symptom'|
|dataset    | The set of documents to be annotated|
|concept_db | Previously retrieved CDB ID  |
|cdb_search_filter|**list** of CDB IDs that are used to look up concepts when adding annotations to a document| 
|vocab      | Previously retrieved vocab ID|
|members    | **list** of users for the project |

In [None]:
project_names = [d_n for d_n, d in datasets]

In [None]:
project_ids = []
for d_id, p_name in zip(dataset_ids, project_names):
    payload = {
        'name': f'{p_name} Annotation Project',
        'description': 'Example projects',
        'cuis': '',
        'tuis': '',
        'dataset': d_id,
        'concept_db': cdb_to_use,
        'vocab': vocab_to_use,
        'members': users_ids
    }
    project_ids.append(json.loads(requests.post(f'{URL}/api/project-annotate-entities/', json=payload, headers=headers).text))
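To restrict a project to specific concepts rather than leaving the filters blank, the CUIs have to be joined into a comma separated string. The sketch below builds a payload with the same keys as the loop above. `build_project_payload` is a hypothetical helper, and the IDs and CUIs in the example call are illustrative placeholders.

```python
def build_project_payload(name, dataset_id, cdb_id, vocab_id, member_ids,
                          cuis=(), description='Example projects'):
    """Assemble the JSON body used when creating an annotation project."""
    return {
        'name': name,
        'description': description,
        'cuis': ','.join(cuis),  # empty string means no CUI filter
        'tuis': '',              # left blank, as in the loop above
        'dataset': dataset_id,
        'concept_db': cdb_id,
        'vocab': vocab_id,
        'members': list(member_ids),
    }


example_payload = build_project_payload(
    'Cardio Notes Annotation Project', dataset_id=1, cdb_id=21, vocab_id=1,
    member_ids=[2, 3], cuis=['C0018801', 'C0004238'])
print(example_payload['cuis'])  # C0018801,C0004238
```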

Newly created projects are now available to the assigned users. With the method above, many projects for specific conditions can be created, configured and permissioned in seconds.

![](../docs/_static/img/new_projects.png)