# Amazon Lookout for Vision Python SDK

In this notebook we will show you how to run k-fold Cross Validation with the Amazon Lookout for
Vision Python SDK. It gives you a programmatic way of interacting with this service and adds a lot of
helper functions that complement the service, like:

* create manifest file
* push manifest file to S3
* check image sizes if they comply with the service
* check image shapes if you need to rescale images
* rescale images based on optimal shape
* upload images to S3 in the appropriate structure
* k-fold Cross Validation

**Requirements**

Have your images on locally. The anomaly images should be stored in a separate folder than the normal images.
Also note that the only formats allowed are: jpeg, jpg and png.
The following url describes the quotas/limitation of images for training and validation --> https://docs.aws.amazon.com/lookout-for-vision/latest/developer-guide/limits.html

## Training a Model

First let's set some general variables that you need:

* input_bucket: the S3 bucket that contains your images for training a model
* project_name: the unique name of the Amazon Lookout for Vision project
* output_bucket: a bucket where your model and inference results are stored (can be same as input_bucket)
* n_splits: number of cross validations to perform, this is equivalent to k in k-fold cross validation
* normal: folder name containing the normal images
* anomaly: folder name containing the anomaly images

In [1]:
# Install the SDK using pip
#!pip uninstall lookoutvision

In [2]:
# Import all the libraries needed to get started:
from lookoutvision.image import Image
from lookoutvision.lookoutvision import LookoutForVision
import boto3

In [3]:
input_bucket = "YOUR_S3_BUCKET_FOR_TRAINING"
project_name = "YOUR_PROJECT_NAME"
# Evaluation output
output_bucket = "YOUR_S3_BUCKET_FOR_INFERENCE" # can be same as input_bucket
n_splits = 3 #number of splits for k-fold Cross Validation
normal = "FOLDER_NAME_OF_NORMAL_IMAGES"
anomaly = "FOLDER_NAME_OF_ANOMALY_IMAGES"
seed = 0

Instantiate Image class to interact with your local images and upload them to S3.

In [4]:
img = Image()

In [5]:
# Check if your local images comply with the service
sizes = img.check_image_sizes(verbose=False, normal="noncloud", anomaly="cloud")
print(sizes)

The following image is not compliant: noncloud/.DS_Store
The following image is not compliant: cloud/.DS_Store
{'noncloud': {'no_of_images': 51, 'compliant_images': 50, 'compliant': False, 'min_size': 64, 'max_size': 4096}, 'cloud': {'no_of_images': 51, 'compliant_images': 50, 'compliant': False, 'min_size': 64, 'max_size': 4096}}


In [6]:
# Check if all image shapes are the same
shapes = img.check_image_shapes(verbose=True, normal="noncloud", anomaly="cloud")
print(shapes)

The following image is not compliant: noncloud/.DS_Store
The following image is not compliant: cloud/.DS_Store
{'noncloud': {'no_of_images': 51, 'compliant': 50, 'status': 'Downsize images!', 'min_image_shape': (256, 256, 3), 'image_metadata': {'noncloud/train_775.jpg': (256, 256, 3), 'noncloud/train_198.jpg': (256, 256, 3), 'noncloud/train_987.jpg': (256, 256, 3), 'noncloud/train_606.jpg': (256, 256, 3), 'noncloud/train_570.jpg': (256, 256, 3), 'noncloud/train_176.jpg': (256, 256, 3), 'noncloud/train_837.jpg': (256, 256, 3), 'noncloud/train_83.jpg': (256, 256, 3), 'noncloud/train_1161.jpg': (256, 256, 3), 'noncloud/train_110.jpg': (256, 256, 3), 'noncloud/train_845.jpg': (256, 256, 3), 'noncloud/train_890.jpg': (256, 256, 3), 'noncloud/train_933.jpg': (256, 256, 3), 'noncloud/train_885.jpg': (256, 256, 3), 'noncloud/train_1016.jpg': (256, 256, 3), 'noncloud/train_329.jpg': (256, 256, 3), 'noncloud/train_937.jpg': (256, 256, 3), 'noncloud/train_658.jpg': (256, 256, 3), 'noncloud/train_

In [7]:
# If not: rescale them
# Note: you don't need to specify a prefix. If you do a new folder is generated for you being named
# rescaled_good and rescaled_bad. Without prefix your original images will be overwritten
resc = img.rescale(prefix="rescaled_", normal="noncloud", anomaly="cloud")
print(resc)

No rescaling needed!
{'rescaled_noncloud': 'Ok', 'rescaled_cloud': 'Ok'}


In [8]:
# Check again in rescaled folder (if you created it)
sizes = img.check_image_sizes(prefix="rescaled_", normal="noncloud", anomaly="cloud", verbose=False)
print(sizes)

Error: Methods requires folders rescaled_noncloud/ and rescaled_cloud/ with images in this location!
{}


In [9]:
# Check again in rescaled folder (if you created it)
shapes = img.check_image_shapes(prefix="rescaled_", normal="noncloud", anomaly="cloud", verbose=True)
print(shapes)

Error: Methods requires folders rescaled_noncloud/ and rescaled_cloud/ with images in this location!
{}


Once you prepared your images, have them all in the same shape and they comply with the service's rules you can upload them to your S3 bucket.
Before you upload them you need to split them into k folds to run k-fold Cross Validation. In the following the images which reside in the normal
and anomaly folders are split into n_splits different datasets.

In [10]:
training_normal, training_anomaly, validation_normal, validation_anomaly = img.kfold_split(n_splits=n_splits,
                                                                                           normal=normal,
                                                                                           anomaly=anomaly,
                                                                                           seed=0)

  training_data_normal = np.array(normal_img)[datasets_idx]
  training_data_anomaly = np.array(anomaly_img)[datasets_idx]


In [12]:
for i in range(len(training_normal)):
    print(f"Dataset {i}: Training images normal class: {len(training_normal[i])}, "
          f"Training images anomaly class: {len(training_anomaly[i])}, "
          f"Validation images normal class: {len(validation_normal[i])}, "
          f"Validation images anomaly class: {len(validation_anomaly[i])}")

Dataset 0: Training images normal class: 33, Training images anomaly class: 33, Validation images normal class: 17, Validation images anomaly class: 17
Dataset 1: Training images normal class: 33, Training images anomaly class: 33, Validation images normal class: 17, Validation images anomaly class: 17
Dataset 2: Training images normal class: 34, Training images anomaly class: 34, Validation images normal class: 16, Validation images anomaly class: 16


The k different datsets are uploaded to S3 by utilizing the Image function kfold_upload and passing a bucket, prefix and project name

In [13]:
upload_data = True
if upload_data:
    img.kfold_upload(input_bucket, f"amazon-lookout-for-vision-python-sdk/data/{project_name}/", project_name,
                 training_normal, training_anomaly, validation_normal, validation_anomaly)

Finally the k-fold Cross Validation is performed by training k models, on k different datasets and evaluating them on k different validation sets.
The validation results are returned with the call to train_k_fold.

In [None]:
l4v = LookoutForVision(project_name=project_name)
kfold_summary = l4v.train_k_fold(input_bucket=input_bucket,
                                 output_bucket=input_bucket,
                                 s3_path=f"amazon-lookout-for-vision-python-sdk/data/{project_name}/",
                                 n_splits=n_splits,
                                 parallel_training=True,
                                 delete_kfold_projects=True)

Project cloud_detection does not exist yet...use the create_project() method to set up your first project
[('sagemaker-sabina-148244586595-eu-west-1', 'sagemaker-sabina-148244586595-eu-west-1', 'k_fold_', 0, False), ('sagemaker-sabina-148244586595-eu-west-1', 'sagemaker-sabina-148244586595-eu-west-1', 'k_fold_', 1, False), ('sagemaker-sabina-148244586595-eu-west-1', 'sagemaker-sabina-148244586595-eu-west-1', 'k_fold_', 2, False)]
Project cloud_detection_0 does not exist yet...use the create_project() method to set up your first project
Project cloud_detection_1 does not exist yet...use the create_project() method to set up your first project
Creating the project: cloud_detection_0
Creating the project: cloud_detection_1
Creating dataset(s): Creating dataset(s): --------!
!
Model training started: -Model training started: --------------------------------------------------------------------------!


View the validation results and to investigate whether your the model is overfitting to different subsets.

In [18]:
kfold_summary



Unnamed: 0,ROCAUC,AveragePrecision,Precision,Recall,F1Score,model_name,model_version,NumberOfTrainImages,NumberOfTestImages
0,0.99308,0.993808,1.0,0.941176,0.969697,cloud_detection_0,1,66,34
1,0.958478,0.951572,0.894737,1.0,0.944444,cloud_detection_1,1,66,34
2,0.941406,0.966927,0.9375,0.9375,0.9375,cloud_detection_2,1,68,32
