## Amazon Lookout for Vision Lab

To help you learn about creating a model, Amazon Lookout for Vision provides example images of circuit boards (circuit_board) that you can use. These images are taken from https://docs.aws.amazon.com/lookout-for-vision/latest/developer-guide/su-prepare-example-images.html.

### Environment variables

As a first step, define the two global variables needed for this notebook:

- bucket: the S3 bucket that you will create and then use as your source for Amazon Lookout for Vision
    - Note: Please read the comments carefully. Depending on your region you need to uncomment the correct command
- project: the project name you want to use in Amazon Lookout for Vision

In [1]:
import os
import boto3

bucket = "udacity-ml-aws-large-dist-models"
project = "circuitproject"
os.environ["BUCKET"] = bucket
os.environ["REGION"] = boto3.session.Session().region_name

#client = boto3.client('lookoutvision')
client=boto3.Session().client('sagemaker')
print(client)

<botocore.client.SageMaker object at 0x7f23c4650df0>


You can check your region here with:

In [2]:
# Check your region:
print(boto3.session.Session().region_name)

us-east-2


Depending on your region, follow the instructions in the next cell:

## Image Preparation and EDA

For background on Amazon Lookout for Vision, see also
- https://aws.amazon.com/lookout-for-vision/ and
- https://aws.amazon.com/blogs/aws/amazon-lookout-for-vision-new-machine-learning-service-that-simplifies-defect-detection-for-manufacturing/

If you already have pre-labeled images available, as is the case in this example, you can set up a folder structure that defines the training and validation sets. Images are then labeled for Amazon Lookout for Vision by the folder they sit in (normal=good, anomaly=bad).
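To make the folder convention concrete, here is a minimal sketch that counts the files in each split/label folder. The helper name `summarize_dataset` is hypothetical, and the demo builds a throwaway tree with empty placeholder files rather than real images; the `circuitboard/<split>/<label>` layout mirrors the paths used later in this notebook.

```python
import tempfile
from pathlib import Path


def summarize_dataset(root):
    """Count files per (split, label) folder under root.

    Hypothetical helper: expects root/<split>/<label>/<files>.
    """
    counts = {}
    for split_dir in sorted(Path(root).iterdir()):
        if not split_dir.is_dir():
            continue
        for label_dir in sorted(split_dir.iterdir()):
            if label_dir.is_dir():
                n_files = sum(1 for p in label_dir.iterdir() if p.is_file())
                counts[(split_dir.name, label_dir.name)] = n_files
    return counts


# Demo on a temporary tree with one empty placeholder file per folder.
with tempfile.TemporaryDirectory() as tmp:
    for split in ("train", "test"):
        for label in ("normal", "anomaly"):
            folder = Path(tmp) / "circuitboard" / split / label
            folder.mkdir(parents=True)
            (folder / "{}-{}_1.jpg".format(split, label)).touch()
    demo_counts = summarize_dataset(Path(tmp) / "circuitboard")

print(demo_counts)
```

Running the same helper against your real `./circuitboard` folder is a quick way to spot an empty or misnamed label folder before generating manifests.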

We will import the sample images provided by Amazon Lookout for Vision. If you're importing your own images, you will prepare them at this stage.

### Generate the *manifest* files

You might be familiar with manifest files if you have ever used Amazon SageMaker Ground Truth. If you are not, don't worry about this section too much.

If you are still interested in what's happening, you can continue reading:

Each dataset (training/ as well as validation/) needs a manifest file. Amazon Lookout for Vision uses this file to determine where to look for the images. The manifest follows a fixed structure and is JSON formatted. The most important keys are *source-ref*, the location of each file; *auto-label*, the value of each label (0=bad, 1=good); *folder*, which indicates whether Amazon Lookout for Vision is using training or validation; and *creation-date*, which lets you know when an image was put in place. All other fields are pre-set for you.

Each manifest file itself contains N JSON objects, one per line, where N is the number of images used in this dataset.
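As an illustration of a single manifest line, the sketch below builds one entry and round-trips it through `json`. The field layout is the one this notebook writes to its manifest files; the helper name `make_manifest_entry`, the bucket `example-bucket`, the file name `img_1.jpg`, and the fixed date are assumptions made only for the example.

```python
import json
from datetime import datetime


def make_manifest_entry(bucket, project, ds, folder, file, created):
    """Build one manifest object (hypothetical helper)."""
    # As noted above: 1 = good (normal), 0 = bad (anomaly).
    label = 0 if folder == "anomaly" else 1
    return {
        "source-ref": "s3://{}/{}/{}/{}/{}".format(bucket, project, ds, folder, file),
        "auto-label": label,
        "auto-label-metadata": {
            "confidence": 1,
            "job-name": "labeling-job/auto-label",
            "class-name": folder,
            "human-annotated": "yes",
            "creation-date": created,
            "type": "groundtruth/image-classification",
        },
    }


# A fixed timestamp keeps the example reproducible.
created = datetime(2021, 1, 1).strftime("%Y-%m-%dT%H:%M:%S.%f")
entry = make_manifest_entry(
    "example-bucket", "circuitproject", "train", "anomaly", "img_1.jpg", created
)

# Each entry is serialized onto exactly one line of the manifest file.
line = json.dumps(entry)
parsed = json.loads(line)
print(parsed["source-ref"], parsed["auto-label"])
```

Because `json.dumps` emits no newlines by default, appending `"\n"` after each object is enough to keep one object per line.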

In [3]:
# Datetime for datetime generation and json to dump the JSON object
# to the corresponding files:
from datetime import datetime
import json

# Current date and time in manifest file format:
now = datetime.now()
dttm = now.strftime("%Y-%m-%dT%H:%M:%S.%f")

# The two datasets used: train and test
datasets = ["train", "test"]

# For each dataset...
for ds in datasets:
    # ...list the folder available (normal or anomaly).
    #print(ds)
    folders = os.listdir("./circuitboard/{}".format(ds))
    # Then open the manifest file for this dataset...
    with open("{}.manifest".format(ds), "w") as f:
        for folder in folders:
            filecount=0
            #print(folder)
            # ...and iterate through both folders by first listing
            # the corresponding files and setting the appropriate label
            # (as noted above: 1 = good, 0 = bad):
            files = os.listdir("./circuitboard/{}/{}".format(ds, folder))
            label = 1
            if folder == "anomaly":
                label = 0
            # For each file in the folder...
            for file in files:
                filecount+=1
                print(filecount)
                # Comment out the following two lines to use the entire dataset
                if filecount>20:
                    break
                # ...generate a manifest JSON object and save it to the manifest
                # file. Don't forget to add '\n' to generate a new line:
                manifest = {
                  "source-ref": "s3://{}/{}/{}/{}/{}".format(bucket,project, ds, folder, file),
                  "auto-label": label,
                  "auto-label-metadata": {
                    "confidence": 1,
                    "job-name": "labeling-job/auto-label",
                    "class-name": folder,
                    "human-annotated": "yes",
                    "creation-date": dttm,
                    "type": "groundtruth/image-classification"
                  }
                }
                f.write(json.dumps(manifest)+"\n")

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20


### Upload manifest files and images to S3

Now it's time to upload all the images and the manifest files:

In [4]:
# Upload manifest files to S3 bucket:
!aws s3 cp train.manifest s3://{bucket}/{project}/train.manifest
!aws s3 cp test.manifest s3://{bucket}/{project}/test.manifest

upload: ./train.manifest to s3://udacity-ml-aws-large-dist-models/circuitproject/train.manifest
upload: ./test.manifest to s3://udacity-ml-aws-large-dist-models/circuitproject/test.manifest


In [5]:
# Upload images to S3 bucket:
!aws s3 cp circuitboard/train/normal s3://{bucket}/{project}/train/normal --recursive
!aws s3 cp circuitboard/train/anomaly s3://{bucket}/{project}/train/anomaly --recursive

!aws s3 cp circuitboard/test/normal s3://{bucket}/{project}/test/normal --recursive
!aws s3 cp circuitboard/test/anomaly s3://{bucket}/{project}/test/anomaly --recursive

upload: circuitboard/train/normal/train-normal_1.jpg to s3://udacity-ml-aws-large-dist-models/circuitproject/train/normal/train-normal_1.jpg
upload: circuitboard/train/normal/train-normal_16.jpg to s3://udacity-ml-aws-large-dist-models/circuitproject/train/normal/train-normal_16.jpg
upload: circuitboard/train/normal/train-normal_15.jpg to s3://udacity-ml-aws-large-dist-models/circuitproject/train/normal/train-normal_15.jpg
upload: circuitboard/train/normal/train-normal_14.jpg to s3://udacity-ml-aws-large-dist-models/circuitproject/train/normal/train-normal_14.jpg
upload: circuitboard/train/normal/train-normal_17.jpg to s3://udacity-ml-aws-large-dist-models/circuitproject/train/normal/train-normal_17.jpg
upload: circuitboard/train/normal/train-normal_11.jpg to s3://udacity-ml-aws-large-dist-models/circuitproject/train/normal/train-normal_11.jpg
upload: circuitboard/train/normal/train-normal_13.jpg to s3://udacity-ml-aws-large-dist-models/circuitproject/train/normal/train-normal_13.jpg
u
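If you prefer to stay in Python rather than shelling out to the AWS CLI, the uploads could be sketched with boto3 as below. This is an alternative under assumptions, not what the cells above run: `plan_uploads` and `upload_all` are hypothetical helper names, and `upload_all` needs boto3 plus valid AWS credentials. The planning step is separate so that you can inspect the S3 keys before anything is uploaded.

```python
import tempfile
from pathlib import Path


def plan_uploads(local_root, project):
    """Return (local_path, s3_key) pairs for every file under local_root."""
    local_root = Path(local_root)
    pairs = []
    for path in sorted(local_root.rglob("*")):
        if path.is_file():
            # Keep the relative folder structure, e.g. train/normal/<file>.
            rel = path.relative_to(local_root).as_posix()
            pairs.append((str(path), "{}/{}".format(project, rel)))
    return pairs


def upload_all(pairs, bucket):
    """Upload each planned pair (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so planning works without boto3

    s3 = boto3.client("s3")
    for local_path, key in pairs:
        s3.upload_file(local_path, bucket, key)


# Demo of the planning step only, on a temporary folder with a placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    img = Path(tmp) / "train" / "normal" / "train-normal_1.jpg"
    img.parent.mkdir(parents=True)
    img.touch()
    demo_pairs = plan_uploads(tmp, "circuitproject")

print(demo_pairs[0][1])
```

With the real data, `upload_all(plan_uploads("circuitboard", project), bucket)` would produce the same key layout as the `aws s3 cp ... --recursive` commands.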

In [7]:
import json

# The manifest holds one JSON object per line, so parse it line by line;
# json.loads expects a string, not the file object itself:
with open('train.manifest', 'r') as f:
    train_manifest = [json.loads(line) for line in f if line.strip()]
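Since a manifest contains one JSON object per line, it has to be read line by line rather than as a single JSON document. The sketch below uses that to run a quick sanity check, counting entries per class. The helper name `count_labels` is hypothetical, and the demo writes a tiny stand-in manifest to a temporary folder instead of reading `train.manifest`.

```python
import json
import os
import tempfile
from collections import Counter


def count_labels(manifest_path):
    """Count manifest entries per class name (hypothetical helper)."""
    counts = Counter()
    with open(manifest_path, "r") as f:
        for line in f:
            if line.strip():  # skip blank lines
                entry = json.loads(line)
                counts[entry["auto-label-metadata"]["class-name"]] += 1
    return counts


# Demo with a three-line stand-in manifest.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.manifest")
    with open(demo_path, "w") as f:
        for name in ("normal", "normal", "anomaly"):
            f.write(json.dumps({"auto-label-metadata": {"class-name": name}}) + "\n")
    label_counts = count_labels(demo_path)

print(label_counts)
```

Calling `count_labels("train.manifest")` on the file generated earlier should report at most 20 entries per class while the 20-file limit is in place.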