# Computer Vision Example:  Image Classification with WMLA

This workflow is documented here: https://developer.ibm.com/technologies/artificial-intelligence/tutorials/use-computer-vision-with-dli-watson-machine-learning-accelerator/

### Contents

- [Introduction](#Introduction)
- [Upload this notebook to your environment](#Upload-notebook)
- [Download dataset and model](#Download-dataset-model)
- [Import dataset](#Import-dataset)
- [Build the model](#Build-the-model)
- [Tune Hyper-parameter](#Tune-hyper-parameter)
- [Run Training](#Run-training)
- [Inspect Training Run](#Inspect-training-run)
- [Create an inference model](#Create-an-inference-model)
- [Test it out](#Test-it-out)

## Introduction
[Back to top](#Contents)

This notebook details the process of performing a basic computer vision image classification example using the Deep Learning Impact functionality within Watson Machine Learning Accelerator.  

Please visit the [Watson Machine Learning Accelerator Learning Path](https://developer.ibm.com/series/learning-path-get-started-with-watson-machine-learning-accelerator/) for further insight into Watson ML Accelerator.

For this lab we will build a **custom image classifier** using WMLA.

In [9]:
## Imports
import os,sys


In [50]:
def get_config(cfg_in=None):
    cfg_in = cfg_in or {}  # avoid sharing a mutable default argument between calls
    cfg = {}
    cfg['userid']="b0p036aa"
    # location of git clone ....
    cfg['repo_dir']="/gpfs/home/s4s004/b0p036aa/wmla-learning-path"
    cfg['image_dir']="/gpfs/home/s4s004/b0p036aa/wmla-learning-path/images"
    cfg["classes"]=["cars","busses","trucks"]  
    cfg["num_images"] = {"train":200,"valid":20,"test":20}
    cfg["d_partitions"]=["train"]
    
    # overwrite configs if passed
    for (k,v) in cfg_in.items() :
        npt("Overriding Config {}:{} with {}".format(k,cfg.get(k),v))  # .get avoids a KeyError for keys without a default
        cfg[k] = v

    # non overrideable ...
    cfg["jpeginfo"] =cfg['repo_dir']+"/utils/jpeginfo"
    cfg["googliser"] =cfg['repo_dir']+"/utils/googliser.sh"
    
    return cfg

# utility print function
def nprint(mystring) :
    print("**{}** : {}".format(sys._getframe(1).f_code.co_name,mystring))
npt=nprint


## Download dataset and model
Here you will define your own image classification project. We will use Google to grab images and will build custom models.

Now we are ready to go. Let's get started and download the dataset from GitHub.

The first step is to change the working directory to your Spectrum Scale (GPFS) directory.

## Define classes for our dataset 

Here we are going to build our own dataset. Think of three categories into which you would like to classify images.  In this example, we will use:
* busses
* trucks 
* cars

We will use an open source tool called *googliser* to download our images from Google Images.

For a COVID-19-based example, you could make your classes something like:
* "people wearing masks"
* "people posing street"


In [44]:
#################################################################################################
# @@ Students : Customize this cell with your custom classes for image classification
################################################################################################

# Overrides for lab

mycfg = {
    'repo_dir':"/gpfs/home/s4s004/vanstee/2020-05-wmla/wmla-learning-path",
    'image_dir':"/gpfs/home/s4s004/vanstee/2020-05-wmla/images",
    "classes":["people wearing masks","people posing street","people skateboarding","people on bikes"],  ## <<- CLASS Enter your search terms here 
    "d_partitions":["train"],
}
cfg=get_config(mycfg)

**get_config** : Overriding Config repo_dir:/gpfs/home/s4s004/b0p036aa/wmla-learning-path with /gpfs/home/s4s004/vanstee/2020-05-wmla/wmla-learning-path
**get_config** : Overriding Config image_dir:/gpfs/home/s4s004/b0p036aa/wmla-learning-path/images with /gpfs/home/s4s004/vanstee/2020-05-wmla/images
**get_config** : Overriding Config classes:['cars', 'busses', 'trucks'] with ['people wearing masks', 'people posing street', 'people skateboarding', 'people on bikes']
**get_config** : Overriding Config d_partitions:['train'] with ['train']


In [45]:
# Helpers to make directories
def class_folder_name(base,d_part,cls) :
    return base+"/"+d_part+"/"+ cls.replace(" ","_")

def makeDirIfNotExist(directory) :
    if not os.path.exists(directory):  
        npt("Making directory {}".format(directory))
        os.makedirs(directory) 
    else :
        npt("Directory {} already exists .. ".format(directory))

# Build directory hierarchy
#   [train|valid|test ]
#    -----------------> [class1 | class2 | class...]
for d_part in cfg["d_partitions"] :
    for cls in cfg["classes"] :
        directory=class_folder_name(cfg['image_dir'],d_part,cls)
        makeDirIfNotExist(directory)


**makeDirIfNotExist** : Directory /gpfs/home/s4s004/vanstee/2020-05-wmla/images/train/people_wearing_masks already exists .. 
**makeDirIfNotExist** : Directory /gpfs/home/s4s004/vanstee/2020-05-wmla/images/train/people_posing_street already exists .. 
**makeDirIfNotExist** : Directory /gpfs/home/s4s004/vanstee/2020-05-wmla/images/train/people_skateboarding already exists .. 
**makeDirIfNotExist** : Directory /gpfs/home/s4s004/vanstee/2020-05-wmla/images/train/people_on_bikes already exists .. 


In [46]:
# install googliser
def install_googliser():
    googliser_directory = cfg['repo_dir']+"/googliser"
    if not os.path.exists(googliser_directory):  
        npt("Installing Googliser here : {} ".format(googliser_directory))
        os.chdir(cfg['repo_dir'])
        !git clone https://github.com/teracow/googliser
    else :
        npt("Googliser already installed here : {} ".format(googliser_directory))

    googliser = cfg['repo_dir']+"/googliser/googliser.sh"

    return googliser 
googliser = install_googliser()
!ls {googliser}

**install_googliser** : Googliser already installed here : /gpfs/home/s4s004/vanstee/2020-05-wmla/wmla-learning-path/googliser 
/gpfs/home/s4s004/vanstee/2020-05-wmla/wmla-learning-path/googliser/googliser.sh


In [47]:
# The code below will download files to train folder only to avoid duplicate downloads.  
# We then move a few files over.  This can be done manually or programmatically.  For our example
# we will let FastAI do the work for us!

def download_images(cfg):
    utility_dir = cfg['repo_dir']
    for d_p in cfg["d_partitions"] : # train only for now ..
        for cls in cfg["classes"] :
            current_dir =class_folder_name(cfg['image_dir'],d_p,cls)
            #os.chdir(current_dir)
            os.chdir(utility_dir)
            command = googliser + \
                      " --o {}".format(current_dir) +\
                      " --phrase \"{}\"".format(cls) + \
                      " --parallel 50 --upper-size 500000 --lower-size 2000 " + \
                      " -n {}".format(cfg['num_images'][d_p]) + \
                      " --format jpg --timeout 15 --safesearch-off "
            npt(command)
            !{command}
    npt("Downloads complete!")
download_images(cfg)

**download_images** : /gpfs/home/s4s004/vanstee/2020-05-wmla/wmla-learning-path/googliser/googliser.sh --o /gpfs/home/s4s004/vanstee/2020-05-wmla/images/train/people_wearing_masks --phrase "people wearing masks" --parallel 50 --upper-size 500000 --lower-size 2000  -n 200 --format jpg --timeout 15 --safesearch-off 
 googliser.sh v:200212 PID:64281

   Google: 0/10 web pages OK, 1/10 are in progress ... 0/10 web pages OK, 8/10 are in progress ...

 download: 0/90 images OK ... 65/90 images OK, 13/50 are in progress and 12 have failed ...

 download: 0/85 images OK ... 52/85 images OK, 7/50 are in progress and 26 have failed ...

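The comments in the download cell mention that a few files can be moved from `train` into the other partitions manually or programmatically. The notebook itself leaves this to FastAI, so the following is only a minimal sketch of the programmatic route. The helper name `split_partition`, its arguments, and the fixed random seed are assumptions made for illustration and are not part of the original notebook.

```python
import os
import random
import shutil

def split_partition(image_dir, cls_folder, counts, seed=0):
    """Move counts[part] randomly chosen files from train/<cls_folder>
    into <part>/<cls_folder> (hypothetical helper, for illustration only)."""
    train_dir = os.path.join(image_dir, "train", cls_folder)
    # Sort first so the shuffle is reproducible for a given seed
    files = sorted(f for f in os.listdir(train_dir)
                   if os.path.isfile(os.path.join(train_dir, f)))
    random.Random(seed).shuffle(files)
    moved = {}
    for part, n in counts.items():
        dest = os.path.join(image_dir, part, cls_folder)
        os.makedirs(dest, exist_ok=True)
        # Take the next n files and keep the remainder for later partitions
        chosen, files = files[:n], files[n:]
        for name in chosen:
            shutil.move(os.path.join(train_dir, name), os.path.join(dest, name))
        moved[part] = chosen
    return moved

# Example call (paths and class folder are placeholders):
# split_partition(cfg['image_dir'], "people_on_bikes", {"valid": 20, "test": 20})
```

If fewer files are available than requested, the later partitions simply receive fewer images, so it is worth checking the returned counts.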
In [49]:
# clean with jpeginfo! 
def clean_up_bad_jpegs(cfg,ext_list):
    for extension in ext_list:
        os.chdir(cfg['image_dir'])
        nprint("Search for Error files in {}......".format(cfg['image_dir']))
        # handle both jpg //jpeg
        command = "find . -name \"*.{}\"".format(extension) + \
          " | xargs -i {}".format(cfg["jpeginfo"]) + \
          " -c {} | grep ERROR"
        nprint("Running command : {}".format(command))
        !{command}
        nprint("Removing any error files listed above")
        command = command + ' | cut -d " " -f1 | xargs -i rm {} '
        nprint("Running command : {}".format(command))
        !{command}
        nprint("Done")

def remove_non_jpg(cfg,ext_list):
    # NOTE: this only prints the find command for each extension; nothing is deleted
    for extension in ext_list:
        command = "find . -name \"*.{}\"".format(extension)
        nprint(command)
    
clean_up_bad_jpegs(cfg,["jpg","jpeg"])

remove_non_jpg(cfg,["png","webp"])

**clean_up_bad_jpegs** : Search for Error files in /gpfs/home/s4s004/vanstee/2020-05-wmla/images......


KeyError: 'jpeginfo'
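
The cleanup cell above shells out to the external `jpeginfo` utility. Where that utility is not available, a rough pure-Python check can catch truncated downloads. The sketch below is a heuristic rather than a replacement for `jpeginfo`: it only tests whether a file starts with the JPEG start-of-image marker (`FF D8`) and ends with the end-of-image marker (`FF D9`). The function names are hypothetical, and valid files with trailing bytes after the end marker would be flagged as suspect.

```python
import os

def looks_like_complete_jpeg(path):
    """Heuristic: file starts with the SOI marker (FF D8) and ends with the EOI marker (FF D9)."""
    with open(path, "rb") as fh:
        data = fh.read()
    return len(data) >= 4 and data[:2] == b"\xff\xd8" and data[-2:] == b"\xff\xd9"

def find_suspect_jpegs(root, extensions=(".jpg", ".jpeg")):
    """Walk root and return a sorted list of JPEG-named files that fail the marker check."""
    suspects = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                full = os.path.join(dirpath, name)
                if not looks_like_complete_jpeg(full):
                    suspects.append(full)
    return sorted(suspects)

# Example call (the path is a placeholder); review the list before deleting anything:
# for path in find_suspect_jpegs(cfg['image_dir']):
#     print(path)
```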

In [6]:
cd ../test

/tmp/CIFAR-10-images/test


In [7]:
testing_path = %pwd

#### Copy the Dataset Training and Testing folder

In [8]:
print ('training_path: ' + training_path)
print ('testing_path:' + testing_path)

training_path: /tmp/CIFAR-10-images/train
testing_path:/tmp/CIFAR-10-images/test


### Download model

In [9]:
cd ../..

/tmp


In [10]:
!git clone https://us-south.git.cloud.ibm.com/ibmconductor-deep-learning-impact/dli-1.2.3-tensorflow-samples.git

Cloning into 'dli-1.2.3-tensorflow-samples'...
remote: Enumerating objects: 308, done.[K
remote: Counting objects: 100% (308/308), done.[K
remote: Compressing objects: 100% (227/227), done.[K
remote: Total 539 (delta 111), reused 252 (delta 79)[K
Receiving objects: 100% (539/539), 448.54 KiB | 0 bytes/s, done.
Resolving deltas: 100% (212/212), done.


In [11]:
cd dli-1.2.3-tensorflow-samples/tensorflow-1.13.1/cifar10

/tmp/dli-1.2.3-tensorflow-samples/tensorflow-1.13.1/cifar10


In [12]:
model_path = %pwd
print ('model_path: '+ model_path)

model_path: /tmp/dli-1.2.3-tensorflow-samples/tensorflow-1.13.1/cifar10


## Import Dataset
<a id='Import-dataset'></a>
[Back to top](#Contents)

**Data scientists can bring in their own dataset and transform it into a common output format in Watson ML Accelerator.  In this scenario, raw images are converted into TensorFlow Records (TFRecords) format.**

1. Let's switch back to the browser:  https://IP_address:8443/platform
2. At the top left, select **Workload** > **Spark** > **Deep Learning**
3. Select the **Datasets** tab, and click **New**
4. Retrieve the dataset training_path and testing_path





In [13]:
print ('training_path: ' + training_path)
print ('testing_path:' + testing_path)

training_path: /tmp/CIFAR-10-images/train
testing_path:/tmp/CIFAR-10-images/test


5. Click **Images for Object Classification**. When presented with a dialog box, provide a unique name (let's use "Cifar10") and select TFRecords for 'Dataset stores images in', then set "Training folder" and "Testing folder" to the folders that contain the images obtained in the previous step ("**/tmp/CIFAR-10-images/train**" and "**/tmp/CIFAR-10-images/test**").  The other fields are fine with their default settings. When you're ready, click **Create**.

<br>

![](https://github.com/IBM/wmla-assets/raw/master/WMLA-learning-journey/image-classification-with-WMLA-UI/Shared-images/ImportDataset.png)
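
Before pointing the import dialog at the two folders, it can help to confirm that both contain the same class subfolders. The sketch below assumes a layout with one subfolder per class, as in the directory hierarchy built earlier in this notebook; whether the importer strictly requires matching class sets is an assumption, and the helper names are hypothetical.

```python
import os

def class_folders(path):
    """Return the sorted names of the immediate subdirectories of path."""
    return sorted(d for d in os.listdir(path)
                  if os.path.isdir(os.path.join(path, d)))

def compare_class_layout(training_path, testing_path):
    """Report class folders that appear in only one of the two partitions."""
    train = set(class_folders(training_path))
    test = set(class_folders(testing_path))
    return {
        "only_in_train": sorted(train - test),
        "only_in_test": sorted(test - train),
    }

# Example call with the paths printed above:
# print(compare_class_layout(training_path, testing_path))
```

Two empty lists in the result mean the partitions agree on their class folders.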

In [14]:
### Remove dataset from the file system.   
### Before proceeding to this step please ensure the Import Dataset is in FINISHED state

!rm -rf /tmp/CIFAR-10-images
!rm /tmp/CIFAR10-images.zip

## Build the model

<a id='Build-the-model'></a>
[Back to top](#Contents)

1. Select the Models tab and click **New** > **Add Location**
2. Retrieve the model path


In [15]:
print ('model_path: '+ model_path)

model_path: /tmp/dli-1.2.3-tensorflow-samples/tensorflow-1.13.1/cifar10


3. When presented with a dialog box, enter the following attributes:
![](https://github.com/IBM/wmla-assets/raw/master/WMLA-learning-journey/image-classification-with-WMLA-UI/Shared-images/modelcreation3.png)
<br>
4. Select the **Tensorflow-cifar10** and click **Next**.

5. When presented with a dialog box, ensure that the Training engine is set to singlenode and that the data set points to the one you just created
![](https://github.com/IBM/wmla-assets/raw/master/WMLA-learning-journey/image-classification-with-WMLA-UI/Shared-images/modelcreation1.png)
<br>
6. Set the following parameters and click **Add**
![](https://github.com/IBM/wmla-assets/raw/master/WMLA-learning-journey/image-classification-with-WMLA-UI/Shared-images/modelcreation2.png)
<br>
7.  The model is now ready to be trained.

In [16]:
## Clean up Model 
### Before proceeding to this step please ensure the Model Creation is in FINISHED state

!rm -rf /tmp/dli-1.2.3-tensorflow-samples

## Tune Hyper-parameter

**Watson ML Accelerator automates the search for optimal hyperparameters by running tuning jobs in parallel with four out-of-the-box search algorithms (Random Search, Bayesian, TPE, and Hyperband) before the training process begins.**

<a id='Tune-hyper-parameter'></a>
[Back to top](#Contents)

1. You can search for optimal hyperparameters by using automated hyperparameter tuning.
1. Back at the **Models** tab, **click** on the model 
1. Navigate from the **Overview panel** to the **Hyperparameter Tuning** panel
1. Click **New**
1. When presented with a dialog box, enter the following values and click **Start Tuning**

![](https://github.com/IBM/wmla-assets/raw/master/WMLA-learning-journey/image-classification-with-WMLA-UI/Shared-images/modeltune1.png)
![](https://github.com/IBM/wmla-assets/raw/master/WMLA-learning-journey/image-classification-with-WMLA-UI/Shared-images/modeltune2.png)
1. Under the **Hyperparameter Tuning** panel, click on the hyperparameter search job 
![](https://github.com/IBM/wmla-assets/raw/master/WMLA-learning-journey/image-classification-with-WMLA-UI/Shared-images/modeltune3.png)
1. Navigate from the **Input panel** to the **Progress panel** and **Best panel** to review the optimal set of hyperparameters
![](https://github.com/IBM/wmla-assets/raw/master/WMLA-learning-journey/image-classification-with-WMLA-UI/Shared-images/modeltune4.png)
![](https://github.com/IBM/wmla-assets/raw/master/WMLA-learning-journey/image-classification-with-WMLA-UI/Shared-images/modeltune5.png)
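
To give a feel for what the simplest of the listed algorithms does, here is a minimal sketch of random search over a continuous search space. It is not Watson ML Accelerator's implementation: the function names, the `learning_rate` range, and the toy objective standing in for a validation loss are all assumptions made for illustration.

```python
import random

def random_search(objective, space, n_trials=20, seed=0):
    """Sample each hyperparameter uniformly from its (low, high) range
    and keep the sample with the lowest objective value."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high)
                  for name, (low, high) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_loss(params):
    # Stand-in for a real validation loss; its minimum is at learning_rate = 0.01
    return (params["learning_rate"] - 0.01) ** 2

best_params, best_score = random_search(
    toy_loss, {"learning_rate": (0.0001, 0.1)}, n_trials=50)
print(best_params, best_score)
```

In a real tuning job each trial would be a full or partial training run, which is why running trials in parallel matters.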

## Run Training

<a id='Run-training'></a>
[Back to top](#Contents)

1. Back at the **Models** tab, select the model you created in the previous step and click **Train**
1. When presented with a dialog box, keep the default parameters and click **Start Training**
![](https://github.com/IBM/wmla-assets/raw/master/WMLA-learning-journey/image-classification-with-WMLA-UI/Shared-images/modeltrain1.png)

## Inspect Training Run

<a id='Inspect-training-run'></a>
[Back to top](#Contents)

**Spectrum Deep Learning Impact Insight gives data scientists visualizations for monitoring training metrics, including loss rate and accuracy, as epochs continue to execute.  With this insight, a data scientist can decide to terminate model training if there is no further gain in accuracy and no further drop in loss rate.**

1. From the **Train** submenu of the **Models** tab, select the model that is training by clicking the link.
1. Navigate from the **Overview panel** to the **Training** panel, and click the most recent link. You can watch as the results roll in.
![](https://github.com/IBM/wmla-assets/raw/master/WMLA-learning-journey/image-classification-with-WMLA-UI/Shared-images/modeltrain2.png)
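
The decision described above, stopping once the loss no longer drops, can be written down as a simple patience rule. The sketch below is only an illustration of that reasoning applied to a list of per-epoch loss values. The function name, the `patience` window, and the `min_delta` threshold are assumptions, not settings exposed by the product.

```python
def should_stop(val_losses, patience=3, min_delta=0.0):
    """Return True when none of the last `patience` epochs improved on the
    best earlier loss by more than `min_delta`."""
    if len(val_losses) <= patience:
        return False  # not enough history to judge
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best >= best_before - min_delta

# Loss has plateaued around 0.70 for three epochs
print(should_stop([1.0, 0.8, 0.7, 0.71, 0.72, 0.70]))
# Loss is still dropping every epoch
print(should_stop([1.0, 0.8, 0.7, 0.6, 0.5, 0.4]))
```

The same rule works for accuracy if the values are negated, since a gain in accuracy then shows up as a drop.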

## Create an inference model

**You are now ready to validate your training result by deploying your trained model as an inference service.   
You can submit inference requests to the inference REST API endpoint.**

<a id='Create-an-inference-model'></a>
[Back to top](#Contents)


1. From the Training view, click Create Inference Model.
![](https://github.com/IBM/wmla-assets/raw/master/WMLA-learning-journey/image-classification-with-WMLA-UI/Shared-images/inference1.png)
1. This creates a new model in the Models tab. You can view it by going to the Inference submenu.
![](https://github.com/IBM/wmla-assets/raw/master/WMLA-learning-journey/image-classification-with-WMLA-UI/Shared-images/inference2.png)

## Test it out
<a id='Test-it-out'></a>
[Back to top](#Contents)

1. Download [inference test image](https://github.com/IBM/wmla-assets/raw/master/WMLA-learning-journey/image-classification-with-WMLA-UI/Shared-images/car.jpg) to your laptop

1. Go back to the Models tab, select the new inference model, and click Test. At the new Testing overview screen, select New Test.
![](https://github.com/IBM/wmla-assets/raw/master/WMLA-learning-journey/image-classification-with-WMLA-UI/Shared-images/inference3.png)

1.  When presented with a dialog box, click **Choose File** to load the inference test image.  Click **Start Test**
![](https://github.com/IBM/wmla-assets/raw/master/WMLA-learning-journey/image-classification-with-WMLA-UI/Shared-images/inference4.png)

1. Wait for the test state to change from RUNNING to FINISHED.  Click the link to view the results of the test.
![](https://github.com/IBM/wmla-assets/raw/master/WMLA-learning-journey/image-classification-with-WMLA-UI/Shared-images/inference5.png)

1. As you can see, the images are available as a thumbnail preview along with their classified label and probability.

![](https://github.com/IBM/wmla-assets/raw/master/WMLA-learning-journey/image-classification-with-WMLA-UI/Shared-images/inference6.png)
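
The label and probability shown for each image are typically derived from the model's raw output scores. As a rough illustration, and assuming the classifier ends in a softmax layer (the source does not state this), the sketch below turns a list of scores into probabilities and ranks the labels. The scores and label names are made-up example values.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_predictions(logits, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda lp: lp[1], reverse=True)
    return ranked[:k]

# Hypothetical scores for three classes
for label, prob in top_predictions([2.0, 0.5, 0.1], ["cars", "busses", "trucks"], k=2):
    print("{}: {:.3f}".format(label, prob))
```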

#### This is version 1.0 and its content is copyright of IBM.   All rights reserved.   


