*Copyright 2018 MD.ai, Inc.   
Licensed under the Apache License, Version 2.0*

# Create additional annotations using the MD.ai Annotator 

#### Pneumonia Detection Challenge: https://www.kaggle.com/c/rsna-pneumonia-detection-challenge        

For further data exploration and to create your own additional annotations, clone the MD.ai project at:  
https://public.md.ai/annotator/project/LxR6zdR2.  

You can then create your own team, add new labels and additional annotations. The “Users“ tab will allow you to create teams and assign exams to team members. You can track progress and export your new annotations in JSON format.

Further instructions and videos are available at https://docs.md.ai.

### Clone the Annotator Project

RSNA Pneumonia Detection Challenge Annotator project URL:  
https://public.md.ai/annotator/project/LxR6zdR2/workspace. 

To add annotations to the cloned project, you need to clone the project first. 

First, navigate to the original project URL (above), click on "Clone Project" button. 

![GitHub Logo](https://raw.githubusercontent.com/mdai/ml-lessons/master/images/kaggle-project-clone-small.gif)


**Intro to deep learning for medical imaging lessons**

- Lesson 1. Classification of chest vs. adominal X-rays using TensorFlow/Keras [Github](https://github.com/mdai/ml-lessons/blob/master/lesson1-xray-images-classification.ipynb) [Annotator](https://public.md.ai/annotator/project/PVq9raBJ)

- Lesson 2. Lung X-Rays Semantic Segmentation using UNets. [Github](https://github.com/mdai/ml-lessons/blob/master/lesson2-lung-xrays-segmentation.ipynb)
[Annotator](https://public.md.ai/annotator/project/aGq4k6NW/workspace) 

- Lesson 3. RSNA Pneumonia detection using Kaggle data format [Github](https://github.com/mdai/ml-lessons/blob/master/lesson3-rsna-pneumonia-detection-kaggle.ipynb) [Annotator](https://public.md.ai/annotator/project/LxR6zdR2/workspace) 
  
- Lesson 3. RSNA Pneumonia detection using MD.ai python client library [Github](https://github.com/mdai/ml-lessons/blob/master/lesson3-rsna-pneumonia-detection-mdai-client-lib.ipynb) [Annotator](https://public.md.ai/annotator/project/LxR6zdR2/workspace)

- MD.ai python client libray URL: https://github.com/mdai/mdai-client-py
- MD.ai documentation URL: https://docs.md.ai/

In [1]:
import os
import sys
import json
import pydicom
import pandas as pd

### Import the `mdai` library

Run the block below to install the `mdai` client library into your python environment.

In [2]:
!pip install --upgrade --quiet mdai
import mdai
mdai.__version__

'0.0.5'

In [3]:
# Root directory of the project 
ROOT_DIR = os.path.abspath('./lesson3-data')

### Create an `mdai` client

The mdai client requires an access token, which authenticates you as the user. To create a new token or select an existing token, navigate to the "Personal Access Tokens" tab on your user settings page at the specified MD.ai domain (e.g., public.md.ai).

**Important: keep your access tokens safe. Do not ever share your tokens.**

In [16]:
mdai_client = mdai.Client(domain='public.md.ai', access_token="MY_PERSONAL_ACCESS_TOKEN")

Successfully authenticated to public.md.ai.


### Define project

Define a project you have access to by passing in the project id. The project id can be found in the URL in the following format: `https://public.md.ai/annotator/project/{project_id}`.

For example, `project_id` would be `XXXX` for `https://public.md.ai/annotator/project/XXXX`.

Specify optional `path` as the data directory (if left blank, will default to current working directory).

In [5]:
# use cloned project_id! 
CLONED_PROJECT_ID = 'EoBKoMBG' 
p = mdai_client.project(project_id=CLONED_PROJECT_ID, path=ROOT_DIR)

Using path '/home/txia/mdai-git/ml-lessons/lesson3-data' for data.
Preparing annotations export for project EoBKoMBG...                                                
Preparing images export for project EoBKoMBG...                                                     
Using cached annotations data for project EoBKoMBG.
Using cached images data for project EoBKoMBG.


## Prepare data

### Grab the label ids. You'll need these to create a label dictionary.

In [6]:
p.show_label_groups()

Label Group, Id: G_q563m2, Name: Default group
	Labels:
	Id: L_NBy1aB, Name: Lung Opacity
	Id: L_Wdjx2B, Name: No Lung Opacity



### Set label ids

Selected label ids must be explicitly set by `Project#set_label_ids` method in order to prepare datasets.

## Note: Your label ids and dataset ids will be different. Use show_label_groups() and show_datasets() to find your specific ids.

In [1]:
# this maps label ids to class ids
# make sure this matches the Kaggle dataset
# target = 0: No Lung Opacity 
# target = 1: Lung Opacity 
labels_dict = {
    'L_Wdjx2B':0, # target = 0, background 
    'L_NBy1aB':1, # target = 1, lung opacity 
              }

print(labels_dict)
p.set_labels_dict(labels_dict)

{'L_Wdjx2B': 0, 'L_NBy1aB': 1}


NameError: name 'p' is not defined

### Use this formula to find your specific dataset id and use it to load dataset.

In [8]:
p.show_datasets()

Datasets:
Id: D_gEX5do, Name: stage 1 train



In [9]:
dataset = p.get_dataset_by_id('D_gEX5do')
dataset.prepare()

In [10]:
dataset.show_classes()

Label id: L_Wdjx2B, Class id: 0, Class text: No Lung Opacity
Label id: L_NBy1aB, Class id: 1, Class text: Lung Opacity


In [11]:
# generate kaggle labels format (see, stage_1_train_labels.csv)

# use dataset object from above 
image_ids = dataset.get_image_ids() 

kaggle_data = []
for image_id in image_ids: 
    ds = pydicom.dcmread(image_id)    
    anns = dataset.get_annotations_by_image_id(image_id)
    for ann in anns: 
        labelId = ann['labelId']
        target = int(dataset.label_id_to_class_id(labelId))

        if target == 0: 
            kaggle_data.append((ds.PatientID, None, None, None, None, target))

        elif target == 1: 

            x = ann['data']['x']
            y = ann['data']['y']
            height = ann['data']['height']
            width = ann['data']['width']
            kaggle_data.append((ds.PatientID, x, y, height, width, target))
        else: 
            raise ValueError('Target {}  is invalid.'.format(target))
            
kaggle_df = pd.DataFrame(kaggle_data, columns=['patientId', 'x', 'y', 'width', 'height', 'Target'])

In [12]:
kaggle_df.to_csv('stage_1_train_cloned.csv')

In [13]:
kaggle_df.head()

Unnamed: 0,patientId,x,y,width,height,Target
0,00322d4d-1c29-4943-afc9-b6754be640eb,111.5954,92.98391,634.40922,497.87585,1
1,003d8fa0-6bf1-40ed-b54c-ac657f8495c5,,,,,0
2,00313ee0-9eaa-42f4-b0ab-c148ed3241cd,,,,,0
3,0004cfab-14fd-4e49-80ba-63a80b6bddd6,175.58974,281.97101,429.23523,286.53734,1
4,00436515-870c-4b36-a041-de91049b9ab4,350.52874,609.69195,117.70117,133.00229,1


In [14]:
!cat stage_1_train_cloned.csv

,patientId,x,y,width,height,Target
0,00322d4d-1c29-4943-afc9-b6754be640eb,111.5954,92.98391,634.40922,497.87585,1
1,003d8fa0-6bf1-40ed-b54c-ac657f8495c5,,,,,0
2,00313ee0-9eaa-42f4-b0ab-c148ed3241cd,,,,,0
3,0004cfab-14fd-4e49-80ba-63a80b6bddd6,175.58974,281.97101,429.23523,286.53734,1
4,00436515-870c-4b36-a041-de91049b9ab4,350.52874,609.69195,117.70117,133.00229,1
5,00436515-870c-4b36-a041-de91049b9ab4,177.50805,229.51724,249.52643,209.50804,1
6,00436515-870c-4b36-a041-de91049b9ab4,645.95862,349.57241,149.48047,105.93103,1


**At this point, you could merge the training labels in this csv with the original training labels csv (i.e., stage_1_train_labels.csv); either using the command line, or read the csv data via pandas and merge the data.**