# MedCATTrainer API Examples
This notebook shows how to programmatically upload data, assign user permissions, and create projects, so that users can be set up for large distributed annotation projects.
- Create Datasets in MedCATTrainer
- Create Projects in MedCATTrainer

In [118]:
import requests
import json
import pandas as pd
from pprint import pprint

In [87]:
URL = 'http://localhost:8001' # Should be set to your running deployment, IP / PORT if not running on localhost:8001

## Sample Dataset
The sample data comes from [MT-Samples](https://www.mtsamples.com/). A subset of this dataset is available under example_data/*.csv.

This guide works with 3 datasets, but the same approach can be used with hundreds if needed.

In [88]:
# notes = pd.read_csv('/Users/tom/phd/mt-samples_scraped/noteevents.csv', index_col=0)
# notes.category = [c.strip() for c in notes.category]
# notes.category.value_counts()
# notes[notes.category == 'Cardiovascular / Pulmonary'].iloc[0:20].to_csv('example_data/cardio.csv', index=False)
# notes[notes.category == 'Orthopedic'].iloc[0:20].to_csv('example_data/ortho.csv', index=False)
# notes[notes.category == 'Neurology'].iloc[0:20].to_csv('example_data/neuro.csv', index=False)

In [89]:
ortho_notes = pd.read_csv('example_data/ortho.csv')
neuro_notes = pd.read_csv('example_data/neuro.csv')
cardio_notes = pd.read_csv('example_data/cardio.csv')

## Accessing the MedCATTrainer API
API access is via a username and password. On login, the API auth endpoint returns an auth token that must be sent with all subsequent requests.

In [90]:
payload = {"username": "admin", "password": "admin"}
# Log in once, then send the returned token in the Authorization header of every request
token = requests.post(f'{URL}/api/api-token-auth/', json=payload).json()['token']
headers = {'Authorization': f'Token {token}'}
headers

{'Authorization': 'Token b61e36a984a5929367b891afc6ac5f4b9d51926e'}
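
The login step can also be wrapped in a small helper that fails early on bad credentials. This is a minimal sketch, not part of MedCATTrainer: `fetch_auth_headers` is a hypothetical name, and the HTTP call is passed in as `post` (for example `requests.post`) so the helper can be tried without a running deployment. It assumes the auth endpoint returns JSON with a `token` field, as the output above suggests.

```python
def fetch_auth_headers(post, base_url, username, password):
    """Log in and return the Authorization header dict for later requests.

    `post` is any callable with the same interface as requests.post
    (a hypothetical injection point, so the helper can be exercised offline).
    """
    resp = post(f"{base_url}/api/api-token-auth/",
                json={"username": username, "password": password})
    # Raise on a 4xx/5xx response instead of hitting a KeyError on 'token' later.
    resp.raise_for_status()
    return {"Authorization": f"Token {resp.json()['token']}"}
```

With a live deployment this would be called as `headers = fetch_auth_headers(requests.post, URL, 'admin', 'admin')`.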

### Resource APIs 
The MedCATTrainer API follows a RESTful architecture: objects are created, updated, and deleted under their respective resource paths.

In [91]:
json.loads(requests.get(f'{URL}/api/', headers=headers).text)

{'users': 'http://localhost/api/users/',
 'concepts': 'http://localhost/api/concepts/',
 'entities': 'http://localhost/api/entities/',
 'project-annotate-entities': 'http://localhost/api/project-annotate-entities/',
 'documents': 'http://localhost/api/documents/',
 'annotated-entities': 'http://localhost/api/annotated-entities/',
 'meta-annotations': 'http://localhost/api/meta-annotations/',
 'meta-tasks': 'http://localhost/api/meta-tasks/',
 'meta-task-values': 'http://localhost/api/meta-task-values/',
 'concept-dbs': 'http://localhost/api/concept-dbs/',
 'vocabs': 'http://localhost/api/vocabs/',
 'datasets': 'http://localhost/api/datasets/',
 'icd-codes': 'http://localhost/api/icd-codes/',
 'opcs-codes': 'http://localhost/api/opcs-codes/'}

### Create Datasets
A MedCATTrainer 'Dataset' is a set of documents that is uploaded into the trainer and used for one or more annotation projects.
The trainer interface accepts CSV / XLSX files with 2 columns, **name** and **text**.

An example DataFrame in this format is shown below.

The API below can be used to upload and create multiple datasets, one for each example DataFrame.

In [92]:
# Add a name column to each dataset
ortho_notes['name'] = ortho_notes.subject_id.apply(lambda l: f'Subject {l}')
neuro_notes['name'] = neuro_notes.subject_id.apply(lambda l: f'Subject {l}')
cardio_notes['name'] = cardio_notes.subject_id.apply(lambda l: f'Subject {l}')
ortho_notes.loc[:, ['name', 'text']].head(3)

| |name|text|
|---|---|---|
|0|Subject 7|EXAM:,MRI LEFT KNEE WITHOUT CONTRAST,CLINICAL:...|
|1|Subject 7|REASON FOR CONSULTATION: , Left hip fracture.,...|
|2|Subject 7|REASON FOR CONSULTATION: , Left hip fracture.,...|


In [163]:
# Collate datasets, with Names
datasets = [('Neuro Notes', neuro_notes), ('Cardio Notes', cardio_notes), ('Ortho Notes', ortho_notes)]

In [164]:
# POST dataset API with list of datasets
dataset_ids = []
for d_name, dataset in datasets:
    payload = {
        'dataset_name': d_name,   # Name that appears in the trainer
        'dataset': dataset.loc[:, ['name', 'text']].to_dict(),  # Dictionary representation of only the name and text columns
        'description': f'{d_name} first 20 notes from each category' # Description that appears in the trainer
    }
    resp = requests.post(f'{URL}/api/create-dataset/', json=payload, headers=headers)
    dataset_ids.append(json.loads(resp.text)['dataset_id']) 
# New datasets created in the trainer have the following IDs
dataset_ids

[1294, 1295, 1296]
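
For reference, `DataFrame.to_dict()` with its default orientation produces a `{column: {row_index: value}}` mapping, which is the shape sent as `dataset` above. The sketch below builds the same shape from plain records, without pandas, and checks for the two required columns first. `build_dataset_payload` is a hypothetical helper rather than part of the MedCATTrainer API, and the sample record is invented for illustration.

```python
def build_dataset_payload(dataset_name, records, description=""):
    """Build a create-dataset payload from a list of {'name', 'text'} dicts."""
    for i, rec in enumerate(records):
        missing = {"name", "text"} - set(rec)
        if missing:
            raise ValueError(f"record {i} is missing columns: {sorted(missing)}")
    # Mirror DataFrame.to_dict(): one dict per column, keyed by row index.
    dataset = {
        col: {i: rec[col] for i, rec in enumerate(records)}
        for col in ("name", "text")
    }
    return {"dataset_name": dataset_name, "dataset": dataset, "description": description}


payload = build_dataset_payload(
    "Demo Notes",
    [{"name": "Subject 1", "text": "Example note text."}],  # invented sample record
    description="Illustrative dataset",
)
```

Checking the columns before the POST keeps a malformed file from creating a partially usable dataset in the trainer.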

## Create Projects
'Projects' are individual annotation projects that can broadly be used to:
- Improve an existing MedCAT model by providing feedback (correct, incorrect) on MedCAT annotations, by adding more synonyms, abbreviations, etc. for existing concepts, or by adding entirely new concepts if the current CDB does not capture them. The MedCAT model is re-trained between each document.
- Inspect existing annotations of a MedCAT model and only collect annotations.

**Each new project is 'wired' up with existing users, models and datasets via their respective IDs. You should have already set up User(s), a Concept Database and a Vocabulary via the admin page http://{deployment_url}/admin/auth/user/.**

<!-- ![Admin Page](imgs/admin_page.png) -->
<div>
<img src="imgs/admin_page.png" width="350px"/>
</div>

Once you've created each object via the /admin/ page, return here to collect the user IDs and the MedCAT model IDs.

### User Permissions
First, create the user accounts.

Then collect the IDs of the users that you want to permission for the new projects.

In [165]:
resp = json.loads(requests.get(f'{URL}/api/users/', headers=headers).text)['results']
pprint(resp)
users_ids = [u['id'] for u in resp]

[{'email': '',
  'id': 2,
  'url': 'http://localhost/api/users/2/',
  'username': 'Test_1'},
 {'email': '',
  'id': 1,
  'url': 'http://localhost/api/users/1/',
  'username': 'admin'}]
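
The cell above permissions every user returned by the API. To permission only selected users, the records can be filtered by username. This is a minimal sketch; `select_user_ids` is a hypothetical helper, and the records are copied from the output above.

```python
def select_user_ids(users, usernames):
    """Return the IDs of the users whose username is in `usernames`."""
    wanted = set(usernames)
    found = {u["username"]: u["id"] for u in users if u["username"] in wanted}
    unknown = wanted - set(found)
    if unknown:
        raise KeyError(f"no such user(s): {sorted(unknown)}")
    # Preserve the order in which the usernames were requested.
    return [found[name] for name in usernames]


# Records copied from the /api/users/ output above.
users = [
    {"email": "", "id": 2, "url": "http://localhost/api/users/2/", "username": "Test_1"},
    {"email": "", "id": 1, "url": "http://localhost/api/users/1/", "username": "admin"},
]
annotator_ids = select_user_ids(users, ["Test_1"])
```

The resulting list can be passed as the `members` value of a project in place of `users_ids`.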


### MedCAT Models
Each project is configured with a MedCAT Concept Database (CDB), and Vocabulary (Vocab). 

In [166]:
all_cdbs = json.loads(requests.get(f'{URL}/api/concept-dbs/', headers=headers).text)['results']
# the CDB ID we'll use for this example
cdb_to_use = all_cdbs[0]['id']
# you might have many CDBs here. First 2 cdbs: 
all_cdbs[0:2]

[{'id': 3,
  'name': 'umls_cdb_full',
  'cdb_file': 'http://localhost/media/0.2.7_umls_2m_mimic.dat',
  'use_for_training': True},
 {'id': 4,
  'name': 'umls_cdb_full_search_drop_down',
  'cdb_file': 'http://localhost/media/0.2.7_umls_2m_mimic_4P6as8R.dat',
  'use_for_training': False}]
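
Taking `all_cdbs[0]` depends on the order in which the API returns the CDBs. Selecting by name is more explicit when several CDBs exist. The sketch below uses a hypothetical helper, `pick_cdb_id`, on the two records shown in the output above.

```python
def pick_cdb_id(cdbs, name):
    """Return the ID of the single CDB called `name`."""
    matches = [c["id"] for c in cdbs if c["name"] == name]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one CDB named {name!r}, found {len(matches)}")
    return matches[0]


# Records copied from the /api/concept-dbs/ output above.
cdbs = [
    {"id": 3, "name": "umls_cdb_full",
     "cdb_file": "http://localhost/media/0.2.7_umls_2m_mimic.dat",
     "use_for_training": True},
    {"id": 4, "name": "umls_cdb_full_search_drop_down",
     "cdb_file": "http://localhost/media/0.2.7_umls_2m_mimic_4P6as8R.dat",
     "use_for_training": False},
]
cdb_id = pick_cdb_id(cdbs, "umls_cdb_full")
```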

In [167]:
# You'll probably only have one vocabulary
all_vocabs = json.loads(requests.get(f'{URL}/api/vocabs/', headers=headers).text)['results']
vocab_to_use = all_vocabs[0]['id']

### Project Creation
We'll create 3 projects, one for each dataset, with both users able to access all projects. 

We'll leave the CUI and TUI filters blank, allowing for all concepts to appear for all these projects. 

|Parameter|Description|
|---------|-----------|
|name| Name of the project that appears on the landing page|
|description| Description as it appears on the landing page, e.g. 'Example projects'|
|cuis       | A comma separated list of CUIs to filter by, if needed; blank allows all concepts |
|tuis       | A comma separated list of TUIs. TUIs are logical groupings of CUIs such as 'disease', or 'symptom'|
|dataset    | The set of documents to be annotated|
|concept_db | Previously retrieved CDB ID  |
|vocab      | Previously retrieved vocab ID|
|members    | **list** of users for the project |
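
The parameters in the table can be assembled by a small function that accepts the CUI and TUI filters as Python lists and joins them into comma separated strings. This is a sketch under assumptions: `build_project_payload` is a hypothetical helper, the CUI values and the vocab ID are placeholders, and it assumes the server expects the separator without surrounding spaces.

```python
def build_project_payload(name, dataset_id, cdb_id, vocab_id, member_ids,
                          cuis=(), tuis=(), description=""):
    """Assemble the JSON body for POST /api/project-annotate-entities/."""
    return {
        "name": name,
        "description": description,
        # Filters are sent as comma separated strings; empty means no filter.
        "cuis": ",".join(cuis),
        "tuis": ",".join(tuis),
        "dataset": dataset_id,
        "concept_db": cdb_id,
        "vocab": vocab_id,
        "members": list(member_ids),
    }


project_payload = build_project_payload(
    "Neuro Notes Annotation Project",
    dataset_id=1294,                 # first dataset ID returned earlier
    cdb_id=3,                        # first CDB ID listed earlier
    vocab_id=1,                      # placeholder vocab ID
    member_ids=[2, 1],
    cuis=["C0000001", "C0000002"],   # placeholder CUIs, for illustration only
    description="Example projects",
)
```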

In [168]:
project_names = [d_n for d_n, d in datasets]

In [172]:
project_ids = []
for d_id, p_name in zip(dataset_ids, project_names):
    payload = {
        'name': f'{p_name} Annotation Project',
        'description': 'Example projects', 
        'cuis': '', 
        'tuis': '',
        'dataset': d_id,
        'concept_db': cdb_to_use, 
        'vocab': vocab_to_use, 
        'members': users_ids
    }
    project_ids.append(json.loads(requests.post(f'{URL}/api/project-annotate-entities/', json=payload, headers=headers).text))

The newly created projects are now available to the assigned users. With the method above, many projects for specific conditions can be created, configured and permissioned in seconds.

![](imgs/new_projects.png)