# Cellular Imaging Demo

We will be using the Malaria dataset in this demo.

The TFDS (TensorFlow Datasets) Malaria dataset contains 27,558 segmented cell images taken from thin blood smear slides, split equally between parasitized and uninfected cells. The original data source is the [NIH](https://lhncbc.nlm.nih.gov/publication/pub9932).

This lab runs training on AI Platform on CPUs, TPUs, and GPUs.

No predictions are run in this lab, but you can view the model training code in trainer/trainer.py.

# Step 1

## Repository Cloning and Setup

In [None]:
!echo $PWD

In [None]:
# Clone the TensorFlow models repo from GitHub

!git clone https://github.com/tensorflow/models.git \
  --branch=v2.1.0 \
  --depth=1

In [None]:
# Copy the trainer folder into the models folder

!cp -r trainer  models/

In [None]:
# Change the current working directory to models
import os
os.chdir('models/')

os.getcwd()

In [None]:
# Create the setup.py file

In [None]:
%%writefile setup.py
from setuptools import find_packages
from setuptools import setup

REQUIRED_PACKAGES=['tensorflow-datasets~=3.1', 
                   'pip>=20.2',
                   'absl-py<0.9,>=0.7']

setup(
    name='official',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    include_package_data=True,
    packages=find_packages()
)

# Step 2

## Training the model on AI Platform with CPUs

Replace *project_id*, *bucket_id*, *student_path*, *region*, and *data_dir* with values for your own environment.
The gsutil command creates the bucket for you; if the bucket already exists, gsutil prints a message saying so.

In [None]:
project_id='ai-fulcrum-demo'
bucket_id='maven-user10'
student_path='cellular-image'
region='us-central1'
data_dir='amazing-public-data/Cellular_Imaging_Data'
!gsutil mb -c standard -l {region} gs://{bucket_id}

In [None]:
bucket_path=f'{bucket_id}/{student_path}'
model_dir=f'{bucket_path}/cellular_img__CPU_model_files'

%env BUCKET_ID=$bucket_id
%env PROJECT_ID=$project_id
%env REGION=$region
%env DATA_DIR=$data_dir
%env MODEL_DIR=$model_dir
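The variables above hold bucket paths without the `gs://` scheme; the `gcloud` commands later in this notebook prepend it. As a minimal sketch of what those arguments resolve to, the helper below (`gcs_uris` is a hypothetical name, not part of the lab code) builds the full URIs from the same inputs.

```python
def gcs_uris(bucket_id, student_path, data_dir,
             model_subdir="cellular_img__CPU_model_files"):
    """Build the gs:// URIs that the job submission commands end up using.

    Hypothetical helper for illustration; it mirrors the f-strings in the
    cell above and adds the gs:// prefix that the gcloud commands prepend.
    """
    bucket_path = f"{bucket_id}/{student_path}"
    return {
        "staging_bucket": f"gs://{bucket_id}",
        "model_dir": f"gs://{bucket_path}/{model_subdir}",
        "data_dir": f"gs://{data_dir}",
    }


# Example values copied from the cells above.
uris = gcs_uris("maven-user10", "cellular-image",
                "amazing-public-data/Cellular_Imaging_Data")
for name, uri in uris.items():
    print(f"{name}: {uri}")
```

Printing the URIs before submitting a job is a cheap way to catch a doubled slash or a missing path segment.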

In [None]:
import time
from datetime import datetime, timedelta
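The job-submission cells in this notebook build a timestamped job name by shifting `datetime.now()` by five hours, which only yields Central Time if the kernel clock is UTC. The sketch below makes that assumption explicit with a timezone-aware timestamp and checks the name against the AI Platform rule that job IDs start with a letter and contain only letters, digits, and underscores. `make_job_name` and `CENTRAL` are hypothetical names, and the fixed UTC-5 offset does not follow daylight saving changes.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-5 offset, matching the hours=-5 shift in this notebook.
CENTRAL = timezone(timedelta(hours=-5))


def make_job_name(prefix, now_utc=None):
    """Return prefix + a YYYYmmdd_HHMMSS timestamp in the CENTRAL offset."""
    if now_utc is None:
        now_utc = datetime.now(timezone.utc)
    stamp = now_utc.astimezone(CENTRAL).strftime("%Y%m%d_%H%M%S")
    name = f"{prefix}{stamp}"
    # Job IDs must start with a letter and use only letters, digits, underscores.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
        raise ValueError(f"invalid job name: {name!r}")
    return name


print(make_job_name("cellular_img_CPU"))
```

Because the timestamp goes down to the second, re-running a submission cell produces a new, unique job name each time.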

In [None]:
now=(datetime.now() + timedelta(hours=-5)).strftime("%Y%m%d_%H%M%S") # Central Time (UTC-5), assuming the kernel clock is UTC
%env JOB_NAME=cellular_img_CPU{now}

!gcloud ai-platform jobs submit training $JOB_NAME \
  --package-path trainer \
  --module-name trainer.trainer  \
  --region $REGION \
  --python-version 3.7 \
  --runtime-version 2.1 \
  --staging-bucket gs://$BUCKET_ID \
  -- \
  --tpu local \
  --model_dir gs://$MODEL_DIR \
  --data_dir gs://$DATA_DIR \
  --train_epochs 1 \
  --distribution_strategy off \
  --num_gpus 0 \
  --download False

                
# Optional: uncomment the command below to stream logs. Streaming blocks until
# training is done, so subsequent cells do not run early.
# Remove '> /dev/null' to see step-by-step output of the model build steps.
# !gcloud ai-platform jobs stream-logs $JOB_NAME > /dev/null

# Show the current status of the job
!gcloud ai-platform jobs describe $JOB_NAME --format="value(state)"

# This code loops 20 times to show the job status within the Python notebook.
# The model should exit with a status of "SUCCEEDED."
# (If it does not within 20 loops, you can check on the job in the terminal window with the first suggested bash line in the output below.)
cmd = 'gcloud ai-platform jobs describe $JOB_NAME --format="value(state)"'
for i in range(20):
    time.sleep(10)
    !{cmd}
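The loop above always polls 20 times, even after the job has finished. A minimal sketch of an alternative is shown below: it stops as soon as the job reaches a terminal state. `get_job_state` and `wait_for_job` are hypothetical helper names; the `gcloud` invocation is the same `describe` command used above and assumes an authenticated gcloud CLI on the PATH.

```python
import subprocess
import time

# Terminal states for an AI Platform Training job.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def get_job_state(job_name):
    """Return the job state string reported by `gcloud ... jobs describe`."""
    result = subprocess.run(
        ["gcloud", "ai-platform", "jobs", "describe", job_name,
         "--format=value(state)"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return result.stdout.strip()


def wait_for_job(job_name, fetch_state=get_job_state,
                 max_polls=20, interval_s=10, sleep=time.sleep):
    """Poll until the job is in a terminal state or max_polls is reached.

    Returns the last state seen, which may be non-terminal on timeout.
    """
    state = None
    for _ in range(max_polls):
        state = fetch_state(job_name)
        print(state)
        if state in TERMINAL_STATES:
            break
        sleep(interval_s)
    return state
```

In the notebook this could be called as `wait_for_job(os.environ["JOB_NAME"])`. The `fetch_state` and `sleep` parameters exist only so the loop can be exercised without calling gcloud or actually waiting.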

# Step 3

### Train the model on the AI Platform Using TPUs

#### Prerequisites before using a Cloud TPU
#### Authorizing your Cloud TPU to access your project 

[TPU Environment Setup](https://cloud.google.com/ai-platform/training/docs/using-tpus#console) - The Data Engineer needs to complete these steps to set up the environment before TPUs can be used.

In [None]:
now=(datetime.now() + timedelta(hours=-5)).strftime("%Y%m%d_%H%M%S") # Central Time (UTC-5), assuming the kernel clock is UTC
%env JOB_NAME=cellular_img_TPU{now}

!gcloud ai-platform jobs submit training $JOB_NAME \
  --scale-tier BASIC_TPU \
  --package-path trainer \
  --module-name trainer.trainer  \
  --region $REGION \
  --python-version 3.7 \
  --runtime-version 2.1 \
  --staging-bucket gs://$BUCKET_ID \
  -- \
  --model_dir gs://$MODEL_DIR \
  --data_dir gs://$DATA_DIR \
  --train_epochs 1 \
  --distribution_strategy tpu \
  --download False

                
# Optional: uncomment the command below to stream logs. Streaming blocks until
# training is done, so subsequent cells do not run early.
# Remove '> /dev/null' to see step-by-step output of the model build steps.
# !gcloud ai-platform jobs stream-logs $JOB_NAME > /dev/null

# Show the current status of the job
!gcloud ai-platform jobs describe $JOB_NAME --format="value(state)"

# This code loops 20 times to show the job status within the Python notebook.
# The model should exit with a status of "SUCCEEDED."
# (If it does not within 20 loops, you can check on the job in the terminal window with the first suggested bash line in the output below.)
cmd = 'gcloud ai-platform jobs describe $JOB_NAME --format="value(state)"'
for i in range(20):
    time.sleep(10)
    !{cmd}

Once the CPU and TPU jobs have both completed, run the "gcloud ai-platform jobs describe" command in the terminal for each job. You should see that the TPU job ran about 30% faster. (Compare each job's createTime and endTime.)
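As a minimal sketch of that comparison, the snippet below turns a job's createTime and endTime into a duration and reports how much less time one job took than another. It assumes the timestamps look like `2020-09-01T12:00:00Z` (UTC, whole seconds); `parse_ts` and `job_seconds` are hypothetical helpers, and the timestamps are placeholder values for illustration, not measurements from this lab.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse a UTC timestamp of the assumed form 'YYYY-MM-DDTHH:MM:SSZ'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def job_seconds(create_time, end_time):
    """Elapsed seconds between a job's createTime and endTime."""
    return (parse_ts(end_time) - parse_ts(create_time)).total_seconds()


# Placeholder timestamps; substitute the values from your own jobs.
cpu_s = job_seconds("2020-09-01T12:00:00Z", "2020-09-01T12:10:00Z")
tpu_s = job_seconds("2020-09-01T12:30:00Z", "2020-09-01T12:38:00Z")
saved = 1 - tpu_s / cpu_s
print(f"CPU: {cpu_s:.0f}s, TPU: {tpu_s:.0f}s, time saved: {saved:.0%}")
```

Note that createTime includes time spent queued and provisioning hardware, so the difference reflects end-to-end job time rather than pure training time.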

# Step 4

### Train the model on the AI Platform Using GPUs

Useful links:
1. [Distributed Training](https://www.tensorflow.org/guide/distributed_training)
1. [Mirrored Strategy](https://www.tensorflow.org/api_docs/python/tf/distribute/MirroredStrategy)
1. [Using GPUs in AI Platform](https://cloud.google.com/ai-platform/training/docs/using-gpus)

In [None]:
now=(datetime.now() + timedelta(hours=-5)).strftime("%Y%m%d_%H%M%S") # Central Time (UTC-5), assuming the kernel clock is UTC
%env JOB_NAME=cellular_img_GPU{now}

# Fill in every <ADD HERE> placeholder before running this cell.
# For --num_gpus, you can use 1 for now.
!gcloud ai-platform jobs submit training $JOB_NAME \
  --scale-tier <ADD HERE> \
  --package-path <ADD HERE> \
  --module-name <ADD HERE>  \
  --region us-east1 \
  --python-version 3.7 \
  --runtime-version 2.1 \
  --staging-bucket <ADD HERE> \
  -- \
  --model_dir <ADD HERE> \
  --data_dir <ADD HERE> \
  --train_epochs 1 \
  --num_gpus=<ADD HERE> \
  --distribution_strategy <ADD HERE>  \
  --download False

                
# Optional: uncomment the command below to stream logs. Streaming blocks until
# training is done, so subsequent cells do not run early.
# Remove '> /dev/null' to see step-by-step output of the model build steps.
# !gcloud ai-platform jobs stream-logs $JOB_NAME > /dev/null

# Show the current status of the job
!gcloud ai-platform jobs describe $JOB_NAME --format="value(state)"

# This code loops 20 times to show the job status within the Python notebook.
# The model should exit with a status of "SUCCEEDED."
# (If it does not within 20 loops, you can check on the job in the terminal window with the first suggested bash line in the output below.)
cmd = 'gcloud ai-platform jobs describe $JOB_NAME --format="value(state)"'
for i in range(20):
    time.sleep(10)
    !{cmd}

If you run the "gcloud ai-platform jobs describe" command in the terminal for the GPU job, you'll find that its run time is closer to the TPU job's than to the CPU job's. (Compare each job's createTime and endTime.)