# AutoML Vision in Jupyter Notebook:

# Predicting Referable Diabetic Retinopathy using the Messidor Dataset

## David K Ryan 

Data is publicly available from: http://www.adcis.net/en/third-party/messidor/

This notebook trains a model that predicts referable diabetic retinopathy from retinal images:

#### Evaluation Metrics 

PR-AUC: 0.845  
Precision: 75.36%  
Recall: 75.36%   
Using a score threshold of 0.5   


________________________

## Data curation 

1. Download the image data from the Messidor website as a zip file.
2. Upload the zip file to Google Drive.
3. Unzip the file within Google Drive.
4. Download the CSV with the metadata and labels.
5. Rearrange the images into folders according to label using the notebook google_drive_file_manipulation.ipynb.
6. Upload the images from Google Drive to Google Cloud Storage using the notebook drive_to_gcs.ipynb.
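
The rearranging step can be sketched on a local copy of the data as follows. This is a minimal sketch and not the contents of google_drive_file_manipulation.ipynb: the column names `Image name` and `Retinopathy grade`, the cut-off of grade 2 for "referable", and the folder names `referable` / `non_referable` are all assumptions that should be checked against the downloaded metadata.

```python
import csv
import shutil
from pathlib import Path

# Assumed column names in the metadata CSV; adjust to match the actual file.
IMAGE_COLUMN = "Image name"
GRADE_COLUMN = "Retinopathy grade"
# Assumed cut-off: grades at or above this value are treated as referable.
REFERABLE_MIN_GRADE = 2


def label_for_grade(grade):
    """Map a retinopathy grade to a (hypothetical) folder/label name."""
    return "referable" if grade >= REFERABLE_MIN_GRADE else "non_referable"


def sort_images_by_label(metadata_csv, image_dir, output_dir):
    """Copy each image listed in the CSV into output_dir/<label>/.

    Returns a dict mapping label -> number of images copied.
    Rows whose image file is missing from image_dir are skipped.
    """
    counts = {}
    with open(metadata_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            source = Path(image_dir) / row[IMAGE_COLUMN]
            if not source.is_file():
                continue
            label = label_for_grade(int(row[GRADE_COLUMN]))
            target_dir = Path(output_dir) / label
            target_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target_dir / source.name)
            counts[label] = counts.get(label, 0) + 1
    return counts
```

Copying rather than moving keeps the original download intact, so the grouping can be redone if the label definition changes.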

## Project set-up

1. Set up a Google Cloud VM instance via AI Notebooks and open Jupyter Notebook from the console.
2. Ensure billing is enabled.
3. Enable the AI APIs in the console.
4. Set up service accounts and keys.
5. In Jupyter, open a new terminal window and make sure the Google Cloud SDK points at the correct project ID:
    1. gcloud init
    2. Follow the instructions to verify the project and compute location.
    3. ! gcloud config set project $PROJECT_ID (note: this also ensures you are in the correct project)
6. Set the service account role for AutoML (see code below).
    
See further: https://aihub.cloud.google.com/u/0/p/products%2Ffd607928-12f3-4aa1-a523-ad4431a96ed6


In [None]:
#### Set service account roles (drop the leading "!" when running in a terminal) ####
!gcloud projects add-iam-policy-binding $PROJECT_ID \
   --member="user:your-userid@your-domain" \
   --role="roles/automl.admin"
!gcloud projects add-iam-policy-binding $PROJECT_ID \
   --member="serviceAccount:service-account-name" \
   --role="roles/automl.editor"

# Note: the member must be given without brackets, e.g. --member="user:d************@gmail.com" --role="roles/automl.admin"

Create a Google Cloud Storage bucket using the console. It must be in the same geographical region as the model: us-central1 (Iowa).

In [None]:
# Ensure libraries are installed 
!pip install -U google-cloud-storage
!pip install -U google-cloud-automl
!pip install -U protobuf

# Import libraries 
import tensorflow as tf
import numpy as np

# Import the Google AutoML client library
from google.cloud import automl_v1beta1 as automl

In [None]:
PROJECT_ID = ""
COMPUTE_REGION=""
BUCKET_NAME=""

In [None]:
# Create an AutoML client
client = automl.AutoMlClient()

In [None]:
# Derive the full GCP path to the project location, reusing the client created above.
# COMPUTE_REGION should be set to the same location ("us-central1").
project_location = f"projects/{PROJECT_ID}/locations/us-central1"

In [None]:
# Specify a name for the dataset
DATASET_NAME="messidor"

# Specify the image classification type for the dataset.
dataset_metadata = {"classification_type": 'MULTICLASS'}

In [None]:
# Set dataset name and metadata of the dataset.
my_dataset = {
    "display_name": DATASET_NAME,
    "image_classification_dataset_metadata": dataset_metadata}

In [None]:
# Create a dataset with the dataset metadata in the region.
response = client.create_dataset(parent=project_location, dataset=my_dataset)

In [None]:
# Display the dataset information.
print("Dataset name: {}".format(response.name))
print("Dataset id: {}".format(response.name.split("/")[-1]))
print("Dataset display name: {}".format(response.display_name))
print("Image classification dataset metadata:")
print("\t{}".format(response.image_classification_dataset_metadata))
print("Dataset example count: {}".format(response.example_count))

# Save the dataset ID
dataset_id = response.name.split("/")[-1]

In [None]:
# Get the full path of the dataset.
dataset_full_id = client.dataset_path(
    PROJECT_ID, COMPUTE_REGION, dataset_id)

dataset_full_id

Note: 

- Each row of the CSV file must have the form gs://path/to/file.jpg,label
- When saving the CSV file (e.g. with pandas to_csv), use index=False, header=False
- The bucket and the model must be in the same geographical location
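
The notes above can be turned into a small helper. This is a minimal sketch that uses the standard-library `csv` module rather than pandas so that it is self-contained; the function name `build_import_csv` and the bucket and file names are hypothetical.

```python
import csv
import io


def build_import_csv(rows):
    """Return CSV text with one 'gs://...,label' row per image.

    No header row and no index column are written.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for gcs_uri, label in rows:
        if not gcs_uri.startswith("gs://"):
            raise ValueError(f"Not a Cloud Storage URI: {gcs_uri}")
        writer.writerow([gcs_uri, label])
    return buffer.getvalue()


# Hypothetical bucket and file names, for illustration only.
example_rows = [
    ("gs://my-bucket/referable/image_001.jpg", "referable"),
    ("gs://my-bucket/non_referable/image_002.jpg", "non_referable"),
]
csv_text = build_import_csv(example_rows)
print(csv_text)
```

The resulting text can be saved to a file and uploaded to the bucket; the gs:// path of that file is what `CSV_DATASET` should point to.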

In [None]:
# Specify the location of the CSV file for the dataset
CSV_DATASET = "gs://"

In [None]:
# Configure images 
input_config = {"gcs_source": {"input_uris": [CSV_DATASET]}}

In [None]:
# Import data from the input URI.
response = client.import_data(name=dataset_full_id, input_config=input_config)

In [None]:
# synchronous check of operation status.
print("Data imported. {}".format(response.result()))

The import can be verified in the Google Cloud console.

In [None]:
#print all eligible datasets in the project 
response = client.list_datasets(parent=project_location)

print("List of datasets:")
for dataset in response:
    # Display the dataset information.
    print("Dataset name: {}".format(dataset.name))
    print("Dataset id: {}".format(dataset.name.split("/")[-1]))
    print("Dataset display name: {}".format(dataset.display_name))
    print("Image classification dataset metadata:")
    print("\t{}".format(dataset.image_classification_dataset_metadata))
    print("Dataset example count: {}\n".format(dataset.example_count))

In [None]:
# Specify a name for your model.
MODEL_NAME="messidor_dr_model"

In [None]:
# Set training for a maximum of 1 hour
train_budget=1

In [None]:
# Define the model
my_model = {
    "display_name": MODEL_NAME,
    # dataset_id expects the bare dataset ID saved earlier, not the full resource path
    "dataset_id": dataset_id,
    "image_classification_model_metadata": {"train_budget": train_budget},
}

In [None]:
# Create a model with the model metadata in the region.
response = client.create_model(parent=project_location, model=my_model)

In [None]:
# Locate the training operation name 
print("Training operation name: {}".format(response.operation.name))

In [None]:
# synchronous check of operation status.
print("Training done. {}".format(response.result()))

# Save the model ID
model_id = response.result().name.split("/")[-1]

In [None]:
# Get the full path of the model.
model_full_id = client.model_path(PROJECT_ID, COMPUTE_REGION, model_id)

In [None]:
# Get complete detail of the model.
model = client.get_model(name=model_full_id)

In [None]:
model

In [None]:
# Retrieve deployment state (compare against the enum value, not a string)
if model.deployment_state == automl.Model.DeploymentState.DEPLOYED:
    deployment_state = 'deployed'
else:
    deployment_state = 'undeployed'

In [None]:
# Display the model information.
print("Model name: {}".format(model.name))
print("Model id: {}".format(model.name.split("/")[-1]))
print("Model display name: {}".format(model.display_name))
print("Image classification model metadata:")
print(
    "Training budget: {}".format(
        model.image_classification_model_metadata.train_budget
    )
)
print(
    "Training cost: {}".format(
        model.image_classification_model_metadata.train_cost
    )
)
print(
    "Stop reason: {}".format(
        model.image_classification_model_metadata.stop_reason
    )
)
print(
    "Base model id: {}".format(
        model.image_classification_model_metadata.base_model_id
    )
)
print("Model deployment state: {}".format(deployment_state))

In [None]:
# List evaluations 

# Get the full path of the model.
model_full_id = client.model_path(PROJECT_ID, COMPUTE_REGION, model_id)

# List all the model evaluations for the model (no filter is applied).
response = client.list_model_evaluations(parent=model_full_id)

print("List of model evaluations:")
for element in response:
    print(element)

The AutoML Vision console shows the confusion matrix and evaluation metrics for the hold-out test set.
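
For reference, precision and recall at a fixed score threshold follow directly from confusion-matrix counts. The sketch below uses made-up counts, not the Messidor results, and the function name is hypothetical.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from confusion-matrix counts."""
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    # Guard against division by zero when a class is never predicted or never present.
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    recall = true_positives / actual_positive if actual_positive else 0.0
    return precision, recall


# Made-up counts for illustration only.
precision, recall = precision_recall(
    true_positives=30, false_positives=10, false_negatives=20
)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.75 recall=0.60
```

Precision and recall coincide whenever the number of false positives equals the number of false negatives, which is one way identical values such as those reported at the top of this notebook can arise.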

In [None]:
# Batch predict 

project_id = ""
model_id = ""
input_uri = ""
output_uri = ""

- The input URI is a CSV file in Cloud Storage (gs://...) that lists the gs:// path of every image to batch-predict, one per row and without labels. Save it with header=False, index=False.
- The output URI is the Cloud Storage location (gs://...) under which the JSON output will be saved.


In [None]:
# instantiate the prediction client 
prediction_client = automl.PredictionServiceClient()

In [None]:
# Get the full path of the model.
model_full_id = f"projects/{project_id}/locations/us-central1/models/{model_id}"

In [None]:
gcs_source = automl.GcsSource(input_uris=[input_uri])

In [None]:
#prediction 
input_config = automl.BatchPredictInputConfig(gcs_source=gcs_source)
gcs_destination = automl.GcsDestination(output_uri_prefix=output_uri)
output_config = automl.BatchPredictOutputConfig(
    gcs_destination=gcs_destination)

In [None]:
response = prediction_client.batch_predict(
    name=model_full_id,
    input_config=input_config,
    output_config=output_config
)

print("Waiting for operation to complete...")
print(
    f"Batch Prediction results saved to Cloud Storage bucket. {response.result()}")
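
Once the output files have been downloaded, each prediction line can be reduced to its top label. This is a sketch under stated assumptions: it assumes the output is JSON Lines, that each line has an `ID` field with the image URI and an `annotations` list, and that each annotation carries a display name and a `classification` score. The layout and key casing should be checked against the actual output files; the function name is hypothetical.

```python
import json


def top_prediction(jsonl_line):
    """Return (image_uri, label, score) for the highest-scoring annotation.

    Returns (image_uri, None, None) when the line has no scored annotations.
    """
    record = json.loads(jsonl_line)
    image_uri = record.get("ID")
    best_label, best_score = None, None
    for annotation in record.get("annotations", []):
        # Accept snake_case or camelCase keys, since the exact casing is an assumption.
        label = annotation.get("display_name", annotation.get("displayName"))
        score = annotation.get("classification", {}).get("score")
        if score is not None and (best_score is None or score > best_score):
            best_label, best_score = label, score
    return image_uri, best_label, best_score
```

Applying `top_prediction` to every line of a downloaded file gives one (image, label, score) tuple per image, which can then be compared against the known labels.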

## Evaluation Metrics

<img src="https://github.com/dkdryan/retinal_deeplearning/blob/master/auc_messidor_automl.png?raw=true" width="40%">
<img src="https://github.com/dkdryan/retinal_deeplearning/blob/master/confusion_matrix_automl.png?raw=true" width="40%">