# Improving Data Quality in Medical Imaging using Custom Vision - Training images

## MPP AI Capstone

The data files for the Capstone are available here after registration and login:
https://www.datasciencecapstone.org/competitions/8/ct-scans/page/26/

Extract the Capstone files into the same directory as this Jupyter Notebook and you are ready to go.

## Custom Vision init

https://customvision.ai

Documentation:

https://docs.microsoft.com/en-us/azure/opbuildpdf/cognitive-services/Custom-Vision-Service/toc.pdf?branch=live


To use the tutorial, you need to do the following: 
- Install either Python 2.7+ or Python 3.5+. 
- Install pip. 
- Install Git.

To build this example, install the Preview Python SDK for the Custom Vision API with pip:

pip install azure-cognitiveservices-vision-customvision


If you encounter a "Filename too long" error, make sure that long path support is enabled in Git:

git config --system core.longpaths true
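
Before running the cells below, it can help to confirm that the SDK is actually visible to the notebook kernel. The following is a minimal sketch (Python 3 only); the module name is taken from the import statements used later in this notebook, and the helper `is_importable` is a hypothetical name introduced here for illustration.

```python
import importlib.util

# Package path taken from the imports used later in this notebook.
SDK_MODULE = "azure.cognitiveservices.vision.customvision"


def is_importable(module_name):
    """Return True if module_name can be located by the import system."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised when a parent package (for example "azure") is missing.
        return False


if is_importable(SDK_MODULE):
    print("Custom Vision SDK found.")
else:
    print("Custom Vision SDK not found; run the pip command above.")
```

If the check fails inside Jupyter but pip reports the package as installed, the kernel is probably using a different Python environment than the one pip installed into.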


### Initializing the Custom Vision pipe and creating a new project


In [1]:
from azure.cognitiveservices.vision.customvision.training import training_api 
from azure.cognitiveservices.vision.customvision.training.models import ImageUrlCreateEntry

# Replace with a valid key 
# Obtain your training and prediction key by signing in to Custom Vision Service and going to your account settings. 

training_key = "d673b5ebb5e9453ebe212f948a472ac0" 

# Prediction key is used later and provided just before usage
# prediction_key = "84b7316448e948eaa65fbba6360c7f78"

trainer = training_api.TrainingApi(training_key)

# Create a new project

print ("Creating project...") 
project = trainer.create_project("MPP AI CS2 API1")
print ("Project created!")


Creating project...
Project created!
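
Hard-coding keys in a notebook makes it easy to share them by accident. One possible alternative, sketched below, is to read the key from an environment variable. The variable name `CUSTOM_VISION_TRAINING_KEY` and the helper `read_key` are assumptions made for this example, not something the Custom Vision service requires.

```python
import os

# Hypothetical variable name; any name works as long as it is set before starting Jupyter.
TRAINING_KEY_ENV = "CUSTOM_VISION_TRAINING_KEY"


def read_key(env_name, environ=os.environ):
    """Return the key stored in env_name, or raise a clear error if it is unset."""
    key = environ.get(env_name, "").strip()
    if not key:
        raise RuntimeError("Set the %s environment variable first." % env_name)
    return key


# Usage in the cell above, instead of the literal string:
# training_key = read_key(TRAINING_KEY_ENV)
```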


### Adding tags

In [2]:
# Make tags in the new project

or0_tag = trainer.create_tag(project.id, "orientation0") 
or1_tag = trainer.create_tag(project.id, "orientation1")
or2_tag = trainer.create_tag(project.id, "orientation2")
or3_tag = trainer.create_tag(project.id, "orientation3")
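
The four calls above differ only in the tag name, so they could also be written as a loop. The sketch below assumes the same `trainer.create_tag(project_id, name)` call used in this cell; the helper `create_orientation_tags` is a hypothetical name introduced for illustration.

```python
def create_orientation_tags(trainer, project_id, count=4):
    """Create tags "orientation0" .. "orientation<count-1>" and return them keyed by name."""
    tags = {}
    for index in range(count):
        name = "orientation%d" % index
        tags[name] = trainer.create_tag(project_id, name)
    return tags


# Usage with the objects created in the cells above:
# tags = create_orientation_tags(trainer, project.id)
# or0_tag = tags["orientation0"]
```

Keeping the tags in a dictionary keyed by name makes it easier to pair each image folder with its tag later on.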


### Sorting images

In [3]:
import os
import pandas as pd
import shutil as sh
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

# Create one subfolder per orientation label.
# os.path.join keeps the paths portable across operating systems.

train_dir = "train"
orientations = ["0", "1", "2", "3"]

for orientation in orientations:
    class_dir = os.path.join(train_dir, "orientation" + orientation)
    if not os.path.exists(class_dir):
        os.makedirs(class_dir)

# Read the labels (columns: id, orientation)

labels = pd.read_csv("train_labels.csv", header=0, dtype=str)
# print(labels)

# Copy each image into the subfolder that matches its label

for orientation in orientations:
    class_dir = os.path.join(train_dir, "orientation" + orientation)
    file_names = labels[labels.orientation == orientation].id + ".png"
    # print(file_names)
    for file_name in file_names:
        sh.copy2(os.path.join(train_dir, file_name), class_dir)
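
The sorting step can also be done without pandas. The following is a sketch under the assumption that `train_labels.csv` has the columns `id` and `orientation` (as used in the cell above) and that each image is stored as `<id>.png` in the training folder. The function name `sort_images_by_label` is hypothetical.

```python
import csv
import os
import shutil


def sort_images_by_label(train_dir, labels_csv):
    """Copy <id>.png into train_dir/orientation<label> and return the count per label."""
    counts = {}
    with open(labels_csv, newline="") as f:
        for row in csv.DictReader(f):
            label = row["orientation"]
            target_dir = os.path.join(train_dir, "orientation" + label)
            os.makedirs(target_dir, exist_ok=True)
            shutil.copy2(os.path.join(train_dir, row["id"] + ".png"), target_dir)
            counts[label] = counts.get(label, 0) + 1
    return counts


# Usage:
# counts = sort_images_by_label("train", "train_labels.csv")
# print(counts)
```

Returning the per-label counts gives a quick check that the classes are roughly balanced before uploading.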
    


### Uploading images

In [4]:
import os
import pandas as pd
import shutil as sh

# Upload the images. This might take a while, up to 1 hour with 4000 images.
train_dir = "train"

# Pair each image subfolder with the tag created earlier
tags_by_dir = [
    (os.path.join(train_dir, "orientation0"), or0_tag),
    (os.path.join(train_dir, "orientation1"), or1_tag),
    (os.path.join(train_dir, "orientation2"), or2_tag),
    (os.path.join(train_dir, "orientation3"), or3_tag),
]

for class_dir, tag in tags_by_dir:
    for image_name in os.listdir(class_dir):
        with open(os.path.join(class_dir, image_name), mode="rb") as img_data:
            trainer.create_images_from_data(project.id, img_data.read(), [ tag.id ])
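
Uploading one image per request is the main reason this step is slow. If your SDK version offers a batch upload call (an assumption; check its documentation before relying on it), the file list can be split into fixed-size chunks first. The sketch below only shows the chunking. `upload_batch` is a hypothetical callable that you would supply, and the default batch size of 64 is an illustrative value, not a confirmed service limit.

```python
def chunks(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upload_in_batches(file_names, upload_batch, batch_size=64):
    """Call upload_batch once per chunk and return the number of calls made."""
    calls = 0
    for batch in chunks(list(file_names), batch_size):
        upload_batch(batch)  # hypothetical: one request for the whole batch
        calls += 1
    return calls


# Usage with a stand-in callable that only reports the batch size:
# upload_in_batches(os.listdir("train/orientation0"), lambda batch: print(len(batch)))
```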




### Training

This is the first iteration in the project, so it is marked as the default iteration once training completes.

In [5]:
import time

print ("Training...") 
iteration = trainer.train_project(project.id) 
while (iteration.status != "Completed"):
    iteration = trainer.get_iteration(project.id, iteration.id)
    print ("Training status: " + iteration.status)
    time.sleep(1)

# The iteration is now trained. Make it the default project endpoint

trainer.update_iteration(project.id, iteration.id, is_default=True)
print ("Done!")

Training...
Training status: Training
Training status: Training
Training status: Training
[the same status line is printed 38 times in total]
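
The polling loop above runs until the status equals "Completed", so it would never stop if training ended in any other state. A more defensive variant is sketched below. The failure status name "Failed" is an assumption, and `wait_for_status` is a hypothetical helper, not part of the SDK.

```python
import time


def wait_for_status(get_status, done="Completed", failed=("Failed",),
                    poll_seconds=5.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_status() until it returns done; raise on a failure state or timeout.

    Returns the total number of seconds spent sleeping between polls.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == done:
            return waited
        if status in failed:  # assumed failure state name(s)
            raise RuntimeError("Training ended with status: " + status)
        if waited >= timeout_seconds:
            raise RuntimeError("Timed out while status was: " + status)
        sleep(poll_seconds)
        waited += poll_seconds


# Usage with the objects from the cell above:
# wait_for_status(lambda: trainer.get_iteration(project.id, iteration.id).status)
```

Polling every few seconds instead of every second also keeps the cell output short.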


### Testing 1st run (then use separate TestingImages notebook)

In [6]:
from azure.cognitiveservices.vision.customvision.prediction import prediction_endpoint 
from azure.cognitiveservices.vision.customvision.prediction.prediction_endpoint import models

prediction_key = "84b7316448e948eaa65fbba6360c7f78"
predictor = prediction_endpoint.PredictionEndpoint(prediction_key)


# Send each test image to the prediction endpoint and get back the prediction results.
# This might take some time, up to 15 min with 1000 images.
# Write one CSV row per image: the file name followed by the probability of each orientation tag.

test_dir = "test"
tag_names = ["orientation0", "orientation1", "orientation2", "orientation3"]

with open("predictions.csv", "w") as f:
    f.write("id,pred0,pred1,pred2,pred3\n")
    for image_name in os.listdir(test_dir):
        with open(os.path.join(test_dir, image_name), mode="rb") as test_data:
            results = predictor.predict_image(project.id, test_data.read())
        # Look up probabilities by tag name so the columns do not depend on the order of the results.
        # A tag missing from the results is written as 0.00 (an assumption).
        probabilities = {p.tag_name: p.probability for p in results.predictions}
        row = [image_name] + ["{0:.2f}".format(probabilities.get(name, 0.0)) for name in tag_names]
        f.write(",".join(row) + "\n")

print("Done testing, results in predictions.csv")



Done testing, results in predictions.csv
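
To turn the probabilities into a single predicted orientation per image, take the column with the highest value in each row. The sketch below assumes a CSV with the header `id,pred0,pred1,pred2,pred3` and one numeric probability per column; `predicted_orientations` is a hypothetical helper name.

```python
import csv
import io


def predicted_orientations(csv_text):
    """Map each image id to the index of its highest-probability column."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        scores = [float(row["pred%d" % i]) for i in range(4)]
        # On ties, index() returns the first (lowest) orientation.
        result[row["id"]] = scores.index(max(scores))
    return result


# Usage:
# with open("predictions.csv") as f:
#     print(predicted_orientations(f.read()))
```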
