## Wafer Classification using Amazon Lookout for Vision - Sample Jupyter Notebook

The wafer image used in this notebook is taken from https://upload.wikimedia.org/wikipedia/commons/e/e2/Silicon_wafer.jpg. The sample pictures in the good/ and bad/ folders are modified versions of that image.

### Environmental variables

As a first step, we define the two global variables needed for this notebook:

- bucket: the S3 bucket that you will create and then use as your source for Amazon Lookout for Vision
    - Note: Please read the comments carefully. Depending on your region you need to uncomment the correct command
- project: the project name you want to use in Amazon Lookout for Vision

In [None]:
import os
import boto3

bucket = "MY_BUCKET_NAME"
project = "MY_PROJECT_NAME"
os.environ["BUCKET"] = bucket
os.environ["PROJECT"] = project
os.environ["REGION"] = boto3.session.Session().region_name

You can check your region with:

In [None]:
# Check your region:
print(boto3.session.Session().region_name)

Depending on your region, follow the instructions in the next cell:

In [None]:
## Create your S3 bucket:
## if your region is us-east-1 please execute:

# !aws s3api create-bucket --bucket $BUCKET

## in all other cases use:

# !aws s3api create-bucket --bucket $BUCKET --create-bucket-configuration LocationConstraint=$REGION
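
The region-dependent choice can also be made in Python instead of uncommenting a line by hand. The following is a minimal sketch, not part of the original notebook: `bucket_request` and `create_bucket` are hypothetical helper names, and it assumes the S3 rule that `us-east-1` is the one region where `LocationConstraint` has to be omitted. boto3 is imported only inside the function that actually talks to AWS.

```python
def bucket_request(bucket_name, region):
    """Build the keyword arguments for an S3 create_bucket call."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be passed as a constraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(bucket_name, region):
    """Create the bucket with boto3 (imported lazily; needs AWS credentials)."""
    import boto3

    s3 = boto3.client("s3", region_name=region)
    return s3.create_bucket(**bucket_request(bucket_name, region))


# Example with placeholder names; no AWS call is made here.
print(bucket_request("MY_BUCKET_NAME", "eu-west-1"))
```

In the notebook, `create_bucket(bucket, os.environ["REGION"])` would then replace both commented-out commands.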

### Install the newest version of AWS CLI

When this notebook was created, Amazon SageMaker notebook instances still had an older version of the AWS CLI installed, which did not support the newest services such as Amazon Lookout for Vision. As it never hurts to run the newest version, the upcoming cells can remain.

Note: The code is taken from https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-linux.html. That page also has more information on updating the CLI.

In [None]:
# Download CLI v2
!curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"

In [None]:
# Unzip here
!unzip awscliv2.zip

In [None]:
# Run update
!sudo ./aws/install --update

In [None]:
# Check version
!aws --version

In [None]:
# Remove downloaded/extracted files from local instance
!rm -rf aws
!rm awscliv2.zip

### Create folders that will be necessary for Amazon Lookout for Vision

If you already have pre-labeled images available, as is the case in this example, you can establish a folder structure that defines the training and validation datasets for Amazon Lookout for Vision. Images are labeled for Amazon Lookout for Vision via the corresponding folder (normal=good, anomaly=bad). We will create this structure locally and then upload it to S3.

For more on Amazon Lookout for Vision, see also:
- https://aws.amazon.com/lookout-for-vision/ and
- https://aws.amazon.com/blogs/aws/amazon-lookout-for-vision-new-machine-learning-service-that-simplifies-defect-detection-for-manufacturing/

In [None]:
!mkdir training
!mkdir training/anomaly
!mkdir training/normal
!mkdir validation
!mkdir validation/anomaly
!mkdir validation/normal
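
Running the `mkdir` cell a second time fails because the folders already exist. As a hedged alternative, the same structure can be created from Python so that the step is repeatable. `make_dataset_folders` is a hypothetical helper name, and the demo runs in a temporary directory rather than in the notebook folder.

```python
import os
import tempfile


def make_dataset_folders(root):
    """Create <root>/<dataset>/<label> for both datasets and both labels."""
    created = []
    for dataset in ("training", "validation"):
        for label in ("normal", "anomaly"):
            path = os.path.join(root, dataset, label)
            # exist_ok=True makes the call safe to repeat on a re-run.
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created


# Demonstrated in a temporary directory so nothing is left behind;
# in the notebook, root would be "." instead.
with tempfile.TemporaryDirectory() as tmp:
    folders = make_dataset_folders(tmp)
    print(len(folders))
```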

## Image Preparation and EDA

Next, we explore the available images a little. Then, to create many more images for model training, we rotate the images slightly and save each rotation as a new image. In the end we aim for roughly 90% good and 10% bad images, which is a likely ratio, at least in the Yield Enhancement department of a semiconductor company.

### Load libraries and check out main image

In [None]:
# Ignore warnings for skimage to avoid frustration in this little test case :)
import skimage
import numpy as np
import warnings

warnings.filterwarnings('ignore')

In [None]:
# Load the original image (wafer.jpg)
import os
from skimage import io
from skimage.transform import rotate
from skimage import img_as_ubyte

filename = 'good/wafer.jpg'
wafer = io.imread(filename)

In [None]:
# Use matplotlib to display the image
from matplotlib import pyplot as plt

plt.imshow(wafer)
plt.show()

In [None]:
# On top: load all available images (good & bad) and display them row-wise:
# Top row: good images
# Bottom row: bad images

# Define plot figure
fig = plt.figure(figsize=(12, 12))
columns = 5
rows = 2

# For all images in good/ and bad/ folder...
i = 1
path = ["good/", "bad/"]
for p in path:
    files = os.listdir(path=p)
    # ...go through all files...
    for file in files:
        # ...skip entries that are not images (notebook checkpoints)...
        if ".ipynb_checkpoints" in file:
            continue
        # ...and load each remaining image and add it to the plot figure
        img = io.imread(os.path.join(p, file))
        fig.add_subplot(rows, columns, i)
        plt.imshow(img)
        i += 1
# Finally show the image:
plt.show()

### Generate MORE images

For the sake of simplicity we will do the following:

- Take all good images
    - Rotate each of them 360 times (1 degree at a time)
    - Randomly save each rotated image to either the training/ or the validation/ folder (ratio: 80:20)
- Take all bad images
    - Rotate each of them 36 times (10 degrees at a time)
    - Randomly save each rotated image to either the training/ or the validation/ folder (ratio: 80:20)

In a real-world scenario you might want to apply even more transformations (color mapping, etc.). For this example we keep it simple.
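
The rotation counts above determine the class balance. The following worked example is only a sketch: it assumes five good and five bad source images (a guess based on the 2x5 plot grid, not something the text states) and uses the standard library's `random` module with a fixed seed in place of `np.random.choice`, so that the 80:20 assignment is repeatable.

```python
import random


def rotation_angles(step):
    """Angles in degrees covering one full turn at the given step size."""
    return list(range(0, 360, step))


def assign_split(rng, validation_share=0.2):
    """Return 'validation' with probability validation_share, else 'training'."""
    return "validation" if rng.random() < validation_share else "training"


# Assumed counts of source images (hypothetical, for illustration only).
n_good, n_bad = 5, 5

good_total = n_good * len(rotation_angles(1))   # 5 * 360 = 1800
bad_total = n_bad * len(rotation_angles(10))    # 5 * 36 = 180
good_share = good_total / (good_total + bad_total)
print(f"good: {good_total}, bad: {bad_total}, good share: {good_share:.1%}")

# A seeded generator makes the random 80:20 assignment repeatable.
rng = random.Random(42)
splits = [assign_split(rng) for _ in range(good_total)]
print(splits.count("training"), splits.count("validation"))
```

With equal numbers of good and bad source images, the share of good images is 360/396, or about 90.9%, however many source images there are. Because each image is assigned independently, the 80:20 split is approximate rather than exact.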

In [None]:
# For each type of wafer good/ or bad/...
path = ["good/", "bad/"]
for p in path:
    # ...print the path (to know where the loop is) and
    # list all files.
    print(p)
    files = os.listdir(path=p)
    for file in files:
        # Skip entries that are not real images (notebook checkpoints).
        if ".ipynb_checkpoints" in file:
            continue
        # If it's a real image print it (to know where the loop is) and
        # form a filename and load the corresponding wafer image:
        print(file)
        filename = '{}{}'.format(p, file)
        wafer = io.imread(filename)
        # steps = 1 if it's a good wafer ("1 degree at a time"),
        # if it's a bad wafer then make steps = 10 ("10 degrees at a time").
        # Also note that the subfolder (normal = good / anomaly = bad wafer)
        # is set here:
        steps = 1
        subfolder = "normal"
        if p == "bad/":
            steps = 10
            subfolder = "anomaly"
        # Now it's time to rotate each image and save it to the corresponding folder.
        # An important concept is to randomly assign each image either the training
        # or the validation dataset:
        for i in range(0, 360, steps):
            # ~80% training data - ~20% validation data:
            choice = np.random.choice(a=[0, 1], size=1, replace=True, p=[0.8, 0.2])[0]
            # Check the choice and set the main folder:
            folder = "training"
            if choice == 1:
                folder = "validation"
            # Form the file name, rotate the image and save it:
            fname = "{}/{}/{}_{}".format(folder, subfolder, str(i), file)
            rot = rotate(image=wafer, angle=i)
            rot = img_as_ubyte(image=rot)
            io.imsave(fname=fname, arr=rot)

### Generate the *manifest* files

You might be familiar with manifest files if you have ever used Amazon SageMaker Ground Truth. If you are not, don't worry about this section too much.

If you are still interested in what's happening, you can continue reading:

Each dataset (training/ as well as validation/) needs a manifest file. Amazon Lookout for Vision uses this file to determine where to look for the images. The manifest follows a fixed structure and is JSON formatted. The most important keys are *source-ref*, the S3 location of each file (its path also shows whether the image belongs to training or validation); *auto-label*, the value of each label (0=bad, 1=good); *class-name*, which is taken from the folder name (normal or anomaly); and *creation-date*, which lets you know when an image was put in place. All other fields are pre-set for you.

Each manifest file itself contains N JSON objects, where N is the number of images that are used in this dataset.

In [None]:
# Datetime for datetime generation and json to dump the JSON object
# to the corresponding files:
from datetime import datetime
import json

# Current date and time in manifest file format:
now = datetime.now()
dttm = now.strftime("%Y-%m-%dT%H:%M:%S.%f")

# The two datasets used: training and validation
datasets = ["training", "validation"]

# For each dataset...
for ds in datasets:
    # ...list the folder available (normal or anomaly).
    folders = os.listdir("./{}".format(ds))
    # Then open the manifest file for this dataset ("w" overwrites any
    # existing file, so re-running the cell does not duplicate entries)...
    with open("{}.manifest".format(ds), "w") as f:
        for folder in folders:
            # ...and iterate through both folders by first listing
            # the corresponding files and setting the appropriate label
            # (as noted above: 1 = good, 0 = bad):
            files = os.listdir("./{}/{}".format(ds, folder))
            label = 1
            if folder == "anomaly":
                label = 0
            # For each file in the folder...
            for file in files:
                # ...generate a manifest JSON object and save it to the manifest
                # file. Don't forget to add '\n' to generate a new line:
                manifest = {
                  "source-ref": "s3://{}/{}/{}/{}".format(bucket, ds, folder, file),
                  "auto-label": label,
                  "auto-label-metadata": {
                    "confidence": 1,
                    "job-name": "labeling-job/auto-label",
                    "class-name": folder,
                    "human-annotated": "yes",
                    "creation-date": dttm,
                    "type": "groundtruth/image-classification"
                  }
                }
                f.write(json.dumps(manifest)+"\n")
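
Before uploading, it can be worth checking that each manifest contains the expected number of good and bad entries. The sketch below is an optional addition, not part of the original workflow: `count_labels` is a hypothetical helper that reads a JSON Lines file and counts the values of the `auto-label` key. The demo writes a tiny made-up manifest to a temporary directory instead of touching the real files.

```python
import json
import os
import tempfile
from collections import Counter


def count_labels(manifest_path, label_key="auto-label"):
    """Count entries per label value in a JSON Lines manifest file."""
    counts = Counter()
    with open(manifest_path) as f:
        for line in f:
            if line.strip():  # skip blank lines
                counts[json.loads(line)[label_key]] += 1
    return counts


# Self-contained demo on a tiny hypothetical manifest in a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.manifest")
    with open(demo_path, "w") as f:
        for label in (1, 1, 1, 0):
            f.write(json.dumps({"source-ref": "s3://MY_BUCKET_NAME/x.jpg",
                                "auto-label": label}) + "\n")
    demo_counts = count_labels(demo_path)

print(dict(demo_counts))
```

In the notebook, `count_labels("training.manifest")` and `count_labels("validation.manifest")` should together add up to the number of generated images.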

### Upload manifest files and images to S3

Now it's time to upload all the images and the manifest files:

In [None]:
# Upload manifest files:
!aws s3 cp training.manifest s3://$BUCKET/training.manifest
!aws s3 cp validation.manifest s3://$BUCKET/validation.manifest

In [None]:
# Upload images:
!aws s3 cp training/ s3://$BUCKET/training/ --recursive
!aws s3 cp validation/ s3://$BUCKET/validation/ --recursive
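
The manifest entries point to `s3://<bucket>/<dataset>/<class>/<file>`, so the uploaded keys have to match that layout exactly. The following sketch shows how local paths map to S3 keys, using hypothetical helper names (`s3_keys_for`, `upload_folder`). The upload itself uses boto3's `upload_file` and is only defined here, not executed; the demo checks the key mapping on a temporary folder.

```python
import os
import tempfile


def s3_keys_for(local_dir):
    """Yield (local_path, s3_key) pairs for every file below local_dir."""
    prefix = os.path.basename(os.path.normpath(local_dir))
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            local_path = os.path.join(root, name)
            relative = os.path.relpath(local_path, local_dir)
            # S3 keys always use forward slashes, whatever the local OS uses.
            yield local_path, "/".join([prefix] + relative.split(os.sep))


def upload_folder(local_dir, bucket_name):
    """Upload a folder with boto3 (imported lazily; needs AWS credentials)."""
    import boto3

    s3 = boto3.client("s3")
    for local_path, key in s3_keys_for(local_dir):
        s3.upload_file(local_path, bucket_name, key)


# Offline demo: one empty placeholder file in a temporary training/ folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "training", "normal")
    os.makedirs(folder)
    open(os.path.join(folder, "0_wafer.jpg"), "wb").close()
    demo_keys = [key for _path, key in s3_keys_for(os.path.join(tmp, "training"))]

print(demo_keys)
```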

## Amazon Lookout for Vision

We are almost done. You have a couple of options for creating your Amazon Lookout for Vision project: the console, the CLI or boto3. We chose the CLI for this example. We also highly recommend checking out the console, where it is very simple to create a project and train a model. This is what we should show to our customers, too!

The steps we take with CLI are:

1. Create a project (the name has been set right at the beginning)
2. Tell your project where to find your training dataset. This is done via the manifest file for training.
3. Tell your project where to find your validation dataset. This is done via the manifest file for validation.
    - Note: This step is optional. In general all 'validation' related code, etc. is optional. Amazon Lookout for Vision will also work with 'training' dataset only. We chose to use both as training and validation is a common (best) practice when training AI/ML models. And we should always let our customer know this to help them get to the next level.
4. Create a model. This command will trigger the model training and validation.

**Note**: Training a model can take a few hours as it uses deep learning in the background. Once your model is trained (you can, for instance, check progress in the console), you can continue with this notebook.

In [None]:
!aws lookoutvision create-project --project-name $PROJECT
# The single quotes are closed around $BUCKET so that the shell can expand it
!aws lookoutvision create-dataset --project-name $PROJECT --dataset-type train --dataset-source 'GroundTruthManifest={S3Object={Bucket='$BUCKET',Key=training.manifest,VersionId="1"}}'
!aws lookoutvision create-dataset --project-name $PROJECT --dataset-type test --dataset-source 'GroundTruthManifest={S3Object={Bucket='$BUCKET',Key=validation.manifest,VersionId="1"}}'
!aws lookoutvision create-model --project-name $PROJECT --output-config 'S3Location={Bucket='$BUCKET',Prefix=model}'
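
Instead of checking the console by hand, the model status can also be polled from code. The following is a minimal sketch under stated assumptions: `wait_for_status` and `model_status` are hypothetical helper names, the status strings (`TRAINING`, `TRAINED`, `TRAINING_FAILED`) are the ones the service is expected to report, and the status lookup relies on boto3's `describe_model` call. The demo uses a fake status source so that it runs without AWS access.

```python
import time


def wait_for_status(get_status, target, failed=(), poll_seconds=60, sleep=time.sleep):
    """Poll get_status() until it returns target; raise on a failure status."""
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError("model entered status {}".format(status))
        sleep(poll_seconds)


def model_status(project_name, model_version="1"):
    """Read the model status with boto3 (imported lazily; needs AWS credentials)."""
    import boto3

    client = boto3.client("lookoutvision")
    response = client.describe_model(ProjectName=project_name, ModelVersion=model_version)
    return response["ModelDescription"]["Status"]


# Offline demo with a fake status source instead of a real AWS call.
fake_statuses = iter(["TRAINING", "TRAINING", "TRAINED"])
result = wait_for_status(lambda: next(fake_statuses), "TRAINED",
                         failed=("TRAINING_FAILED",), sleep=lambda s: None)
print(result)
```

In the notebook this would be called as `wait_for_status(lambda: model_status(project), "TRAINED", failed=("TRAINING_FAILED",))`. The same helper could wait for a started model by using `"HOSTED"` as the target and `"HOSTING_FAILED"` as the failure status.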

### Model Deployment

Getting the model into an operational state is as easy as telling it to "start". This process also takes a few minutes, so please be patient. You can again check the status of the model in the console (or via the CLI).

In [None]:
!aws lookoutvision start-model --project-name $PROJECT --model-version 1 --min-inference-units 1

### Make Predictions

#### CLI

Making predictions via the CLI requires the project name, the model version, the content type and a sample image. We are using a local image from the SageMaker instance:

In [None]:
!aws lookoutvision detect-anomalies --project-name $PROJECT --model-version 1 --content-type image/jpeg --body ./bad/particle.jpg

#### Boto3

You can also use boto3 to achieve the same. An example application that uses it this way can be found at https://collaborate-corp.amazon.com/nuxeo/ui/#!/doc/976afb34-b53c-4ddc-a8a6-7d316dc0f951.

As mentioned earlier, the newest boto3 version is not available in all of our services. This is also true for AWS Lambda. That's why the above artifact uses Lambda Layers to enable the boto3 call below.

In [None]:
import boto3

client = boto3.client("lookoutvision")

# Open the image in binary mode; the with-block closes the file afterwards.
with open("/home/ec2-user/SageMaker/jupyter-notebook/bad/particle.jpg", "rb") as image:
    response = client.detect_anomalies(
        ProjectName=project,
        ModelVersion='1',
        Body=image,
        ContentType='image/jpeg'
    )

In [None]:
response["DetectAnomalyResult"]
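
The result is a dictionary whose `IsAnomalous` and `Confidence` fields carry the prediction. As a hedged illustration of how an application might read them, the sketch below turns such a dictionary into a short verdict. `summarize_prediction` is a hypothetical helper, the sample values are made up, and the `min_confidence` cut-off is an arbitrary choice rather than a service default.

```python
def summarize_prediction(result, min_confidence=0.5):
    """Turn a DetectAnomalyResult dict into a short verdict string."""
    verdict = "anomaly" if result["IsAnomalous"] else "normal"
    confidence = result["Confidence"]
    # min_confidence is an arbitrary illustrative cut-off, not a service default.
    flag = "" if confidence >= min_confidence else " (low confidence)"
    return "{} at {:.1%} confidence{}".format(verdict, confidence, flag)


# Hypothetical result values, shaped like response["DetectAnomalyResult"].
sample = {"IsAnomalous": True, "Confidence": 0.973}
print(summarize_prediction(sample))
```

In the notebook, `summarize_prediction(response["DetectAnomalyResult"])` would apply the same logic to the real response.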

# BE FRUGAL

If you don't need your model anymore, please stop it to save costs!

In [None]:
# !aws lookoutvision stop-model --project-name $PROJECT --model-version 1