# AWS SageMaker Tutorial Part 3
https://smusg.udemy.com/course/build-an-aws-machine-learning-pipeline-for-object-detection/learn/lecture/36969246#overview
-	https://github.com/patrikszepesi/EcoAI (MERN app code)
-	https://github.com/patrikszepesi/MachineLearningSourceCodeCourse2 (ML code)


# Colab Shortcuts
Moving / Creating / Deleting
*   move cell up ctrl+m K
*   move cell down ctrl+m J
*   create a new cell above ctrl+m a
*   create a new cell below ctrl+m b
*   delete a cell ctrl+m d

Conversion
*   convert a text cell to a code cell ctrl + m + y
*   convert a code cell to a text cell ctrl + m + m (double tap m)

Find and Replace
*   find and replace within cell ctrl + shift + h
*   find and replace within entire notebook ctrl + h

Running code
*   ctrl + enter to run current cell
*   alt + Enter to run current cell and create new cell below
*   ctrl + shift + enter to run selection




```
# ctrl + m + i to interrupt execution
# ctrl + m + l to toggle line numbers
# ctrl + m + o to toggle output
print('1')
```


# Setting up Step Functions
State machine

AWS Step Functions is a serverless orchestration service that integrates with AWS Lambda functions and other AWS services to build business-critical applications.

In the graphical console, you can view your application's workflow as a series of event-driven "steps". Step Functions is based on state machines and tasks.

* A state machine is a workflow.
* A task is a state in a workflow that represents a single unit of work performed by another AWS service.

Each step in a workflow is a state. There are three ways to create a state machine:
* Using drag and drop nodes (Design your workflow visually)
* Coding from scratch (Write your workflow)
* Using pre-built templates (Run a sample project)
<br>
<br>
<center><img src="img/stepfunction 00.png"/></center>
<p style="text-align: center">
    <b>Step Function Creation</b>
</p><br>

<div style="display: flex; justify-content: center; align-items: center;">
    <img src="img/stepfunction 01.png" style="margin: 10px; max-width: 50%; height: auto;"/>
    <img src="img/stepfunction 02.png" style="margin: 10px; max-width: 50%; height: auto;"/>
</div>
<p style="text-align: center">
    <b>Optional visual method</b>
</p>

<div style="display: flex; justify-content: center; align-items: center;">
    <img src="img/stepfunction 03.png" style="margin: 10px; max-width: 50%; height: auto;"/>
    <img src="img/stepfunction 04.png" style="margin: 10px; max-width: 50%; height: auto;"/>
</div>
<p style="text-align: center">
    <b>Select Job Poller under "Run a sample project..."</b>
</p>

For the sake of learning to code, select "Run a sample project..." and scroll down to select "Job Poller".

In an asynchronous process, the system can perform other tasks, or simply wait, without blocking resources while the batch job completes.
The polling mechanism checks the job status at intervals and, once the job is complete, triggers the next step (a Lambda function) without blocking the entire workflow.

We use this sample project because our workflow consists of:
* starting a SageMaker batch transform job
* checking whether that job has finished
* if the job has finished, saving the results to S3 using a Lambda function
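
The three steps above can be sketched as an Amazon States Language definition, written here as a Python dictionary. This is a minimal sketch, not the definition that the "Job Poller" sample project generates: the state names, the 60-second wait, the `ResultPath` values, and the placeholder Lambda ARNs are all assumptions for illustration.

```python
import json

# Hypothetical placeholder ARNs; substitute the ARNs of your own Lambda functions.
SUBMIT_JOB_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:SubmitJob"
CHECK_JOB_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:CheckJob"
SAVE_RESULTS_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:SaveResults"

definition = {
    "Comment": "Sketch of a job poller: submit, wait, check, branch",
    "StartAt": "Submit Job",
    "States": {
        # Task state: start the batch transform job
        "Submit Job": {
            "Type": "Task",
            "Resource": SUBMIT_JOB_ARN,
            "ResultPath": "$.job",
            "Next": "Wait",
        },
        # Wait state: pause between status checks (interval is an assumption)
        "Wait": {"Type": "Wait", "Seconds": 60, "Next": "Get Job Status"},
        # Task state: ask for the current job status
        "Get Job Status": {
            "Type": "Task",
            "Resource": CHECK_JOB_ARN,
            "ResultPath": "$.status",
            "Next": "Job Complete?",
        },
        # Choice state: branch on the status string, otherwise poll again
        "Job Complete?": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.status", "StringEquals": "Completed", "Next": "Save Results"},
                {"Variable": "$.status", "StringEquals": "Failed", "Next": "Job Failed"},
            ],
            "Default": "Wait",
        },
        # Task state: clean the outputs and save them to S3
        "Save Results": {"Type": "Task", "Resource": SAVE_RESULTS_ARN, "End": True},
        "Job Failed": {
            "Type": "Fail",
            "Error": "BatchJobFailed",
            "Cause": "The batch transform job failed",
        },
    },
}

print(json.dumps(definition, indent=2))
```

The `Default` branch of the Choice state loops back to `Wait`, which is what makes the workflow a poller rather than a single check.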

<div style="display: flex; justify-content: center; align-items: center;">
    <img src="img/stepfunction 05.png" style="margin: 10px; max-width: 50%; height: auto;"/>
    <img src="img/stepfunction 06.png" style="margin: 10px; max-width: 50%; height: auto;"/>
</div>
<p style="text-align: center">
    <b>Click the StackID to view CloudFormation process</b>
</p>

Deploy the resources necessary to create the step function using AWS CloudFormation, which automates creation of the Lambda functions. (**Note: this can take up to 10 minutes.**)

<div style="display: flex; justify-content: center; align-items: center;">
    <img src="img/stepfunction 07.png" style="margin: 10px; max-width: 50%; height: auto;"/>
    <img src="img/stepfunction 08.png" style="margin: 10px; max-width: 50%; height: auto;"/>
</div>
<p style="text-align: center">
    <b>Refresh often to check CloudFormation completion.</b>
</p>


<center><img src="img/stepfunction 09.png"/></center>
<div style="display: flex; justify-content: center; align-items: center;">
    <img src="img/stepfunction 10.png" style="margin: 10px; max-width: 50%; height: auto;"/>
    <img src="img/stepfunction 11.png" style="margin: 10px; max-width: 50%; height: auto;"/>
</div>
<p style="text-align: center">
    <b>View of created State Machine workflow for AWS Step Function.</b>
</p>

# Modify Submit Job Lambda Function
Proceed into the Pol-Submit Lambda function to modify it. Remember to save by clicking "Deploy".

<center><img src="img/lambda 001.png"/></center>

```
import json
import boto3
from datetime import datetime
import random

client = boto3.client('sagemaker')

def lambda_handler(event, context):
    
    # %H is the zero-padded 24-hour clock; lowercase %h is not an hour directive
    date_today = datetime.today().strftime('%Y-%m-%d-%H')

    year = date_today[0:4]
    month = date_today[5:7]
    day = date_today[8:10]
    hour = date_today[11:13]
    
    print(year,month,day, hour)
```
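
The slicing can be checked locally with a fixed timestamp before deploying. This is a small sketch: `split_date` is a hypothetical helper name, and the slices assume the format string `'%Y-%m-%d-%H'` (uppercase `%H` for the 24-hour clock).

```python
from datetime import datetime

def split_date(moment):
    """Return (year, month, day, hour) as zero-padded strings."""
    stamp = moment.strftime('%Y-%m-%d-%H')  # e.g. '2025-02-11-09'
    return stamp[0:4], stamp[5:7], stamp[8:10], stamp[11:13]

example = datetime(2025, 2, 11, 9, 30)
print(split_date(example))  # → ('2025', '02', '11', '09')
```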

<div style="display: flex; justify-content: center; align-items: center;">
    <img src="img/lambda 002.png" style="margin: 10px; max-width: 50%; height: auto;"/>
    <img src="img/lambda 003.png" style="margin: 10px; max-width: 50%; height: auto;"/>
</div>
<p style="text-align: center">
    <b>Modify the Lambda function code and deploy it. Create any test event for the test</b>
</p>


```
{
    "key1": 100,
    "key2": 200
}
```

<!-- <center><img src="img/lambda 05.png"/></center> -->
<div style="display: flex; justify-content: center; align-items: center;">
    <img src="img/lambda 005.png" style="margin: 10px; max-width: 50%; height: auto;"/>
    <img src="img/lambda 004.png" style="margin: 10px; max-width: 50%; height: auto;"/>
</div>
<p style="text-align: center">
    <b>Run test to ensure Lambda code works.</b>
</p>


# Creating model that triggers Lambda Function
Now, in a SEPARATE tab, open the best training job from the hyperparameter tuning job in AWS SageMaker. Click "Create Model" there and set the model name to:
```
object-detection-plastic
```
<center><img src="img/lambda 006.png"/></center>
<div style="display: flex; justify-content: center; align-items: center;">
    <img src="img/lambda 007.png" style="margin: 10px; max-width: 50%; height: auto;"/>
    <img src="img/lambda 008.png" style="margin: 10px; max-width: 50%; height: auto;"/>
</div>
<p style="text-align: center">
    <b>Create Model within best training job.</b>
</p>

Create a new IAM role for this model. Once it's created, click its link to view the IAM role in the IAM Management Console in a new tab.

Click "Add Permissions" > "Attach Policies" to set up the necessary policies for it. For simplicity, choose to add "AdministratorAccess" (In proper projects, IAM policies would be more nuanced).

<div style="display: flex; justify-content: center; align-items: center;">
    <img src="img/lambda 009.png" style="margin: 10px; max-width: 50%; height: auto;"/>
    <img src="img/lambda 010.png" style="margin: 10px; max-width: 50%; height: auto;"/>
</div>
<p style="text-align: center">
    <b>Set policies for the model's IAM role.</b>
</p>

With the IAM role set up, return to the 'Create Model' tab and click "Create model" at the bottom.

# S3 URI linkage
Next, open another SEPARATE tab to the S3 storage. 

Go into the bucket created for uploading and outputting images ('obj-detection-batch-transform'), open the folder that holds the uploaded images, and copy the S3 URI of the '2023' folder by clicking "**Copy S3 URI**".

<div style="display: flex; justify-content: center; align-items: center;">
    <img src="img/lambda 011.png" style="margin: 10px; max-width: 50%; height: auto;"/>
</div>
<p style="text-align: center">
    <b>Copy the S3 URI of the folder.</b>
</p>
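
A copied S3 URI has the form `s3://<bucket>/<prefix>`. As a rough sketch of how such a URI breaks down, the hypothetical helper below splits it into the bucket name and the key prefix. The example URI is an assumption that combines the bucket name above with an `images/2023/` folder.

```python
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix/' into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")

# Illustrative URI only; use the one copied from your own bucket.
bucket, prefix = split_s3_uri("s3://obj-detection-batch-transform/images/2023/")
print(bucket)  # → obj-detection-batch-transform
print(prefix)  # → images/2023/
```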

At this point, the Lambda code should look like this (with the relevant model name and S3 folder URIs): 

```
import json
import boto3
from datetime import datetime
import random

client = boto3.client('sagemaker')

def lambda_handler(event, context):
    
    date_today = datetime.today().strftime('%Y-%m-%d')

    year = date_today[0:4]
    month = date_today[5:7]
    day = date_today[8:10]
    
    print(year,month,day)
    
    response = client.create_transform_job(
        
        # The job name must be unique, so a second run on the same day
        # needs a different name.
        TransformJobName = f'{year}-{month}-{day}-object-detection-unique',
        ModelName = 'object-detection-plastic',
        
        # Maximum payload size of a single inference request, in MB (the default is 6).
        # With SplitType 'None', each image is sent as one request,
        # so this must be at least the size of the largest image.
        MaxPayloadInMB = 100,
        
        # Declare that the data source is S3 with 'S3Prefix'
        # and point S3Uri at today's folder of uploaded images.
        # Replace the bucket and folder names with your own.
        TransformInput = {
            
            'DataSource': {
                'S3DataSource': {
                    'S3DataType': 'S3Prefix',
                    'S3Uri': f's3://obj-detection-batch-transform/images/{year}/{month}/{day}/'
                }
                
            },
            
            'ContentType' : 'image/jpeg',
            'CompressionType': 'None',
            'SplitType': 'None'
        },
        
        # Link output to the S3 bucket's "batch-output" folder
        TransformOutput = {
            'S3OutputPath': f's3://obj-detection-batch-transform/batch-output/{year}/{month}/{day}',
            'AssembleWith': 'None'
        },
        
        # Specify the instance type and count used to run the model
        TransformResources = {
            'InstanceType': 'ml.m4.xlarge',
            'InstanceCount': 1
        },
        
        # InputFilter and OutputFilter are JSONPath expressions that select a portion
        # of the input data to pass to the algorithm and of the output to save.
        # '$' (the default) passes the entire record.
        DataProcessing = {
            'InputFilter': '$',
            'OutputFilter': '$',
            'JoinSource': 'None'
        }
        
    )
    
    return {
        'body': response
    }
```
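
Because the SubmitJob and CheckJob functions must agree on the transform job name, it can help to build the name in one place. The helper below is a sketch under that assumption: `build_job_name` is a hypothetical function, and the suffix simply mirrors the name used in this tutorial. Transform job names must be unique within an account and Region, so a second run on the same day would need a different suffix.

```python
from datetime import datetime

def build_job_name(moment, suffix="object-detection-unique"):
    """Build the date-based transform job name used by both Lambda functions."""
    return f"{moment:%Y-%m-%d}-{suffix}"

name = build_job_name(datetime(2025, 2, 11))
print(name)  # → 2025-02-11-object-detection-unique
```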

# Modify SubmitJob Lambda Function Permissions

The Lambda function itself requires permission to start the batch transform job, run the model, and write to S3.

Go to the "Configuration" tab > "Permissions" and click the role name. Once again in the IAM Management Console, click "Add Permissions" > "Attach Policies" and select 'AdministratorAccess'.

<!-- <center><img src="img/lambda 011.png"/></center> -->
<div style="display: flex; justify-content: center; align-items: center;">
    <img src="img/lambda 012.png" style="margin: 10px; max-width: 50%; height: auto;"/>
    <img src="img/lambda 013.png" style="margin: 10px; max-width: 50%; height: auto;"/>
</div>
<div style="display: flex; justify-content: center; align-items: center;">
    <img src="img/lambda 014.png" style="margin: 10px; max-width: 50%; height: auto;"/>
    <img src="img/lambda 015.png" style="margin: 10px; max-width: 50%; height: auto;"/>
</div>
<p style="text-align: center">
    <b>Attach the policy to the Lambda function's IAM role.</b>
</p>
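
'AdministratorAccess' keeps the tutorial simple, but a narrower policy is preferable in a real project. The following is only a sketch of what a least-privilege policy document for these Lambda functions might look like, built as a Python dictionary. The statement IDs are made up, the action list is an assumption based on the calls used in this tutorial (`create_transform_job`, `describe_transform_job`, and S3 reads, writes, and listing), and it has not been validated against a live account. Additional permissions, such as CloudWatch Logs access, may also be needed.

```python
import json

BUCKET = "obj-detection-batch-transform"  # bucket name as used in this tutorial

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Start and inspect batch transform jobs
            "Sid": "BatchTransform",
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreateTransformJob",
                "sagemaker:DescribeTransformJob",
            ],
            "Resource": "*",
        },
        {
            # List the objects in the bucket (bucket-level permission)
            "Sid": "ListBucket",
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {
            # Read prediction files and write cleaned JSON (object-level permissions)
            "Sid": "ReadWriteObjects",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```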

# Modify CheckJob Lambda Function
Now, do the same for the Job Status checking Lambda function. Modify its code and deploy it. 

<center><img src="img/lambda 100.png"/></center>

<div style="display: flex; justify-content: center; align-items: center;">
    <img src="img/lambda 101.png" style="margin: 10px; max-width: 50%; height: auto;"/>
    <img src="img/lambda 015.png" style="margin: 10px; max-width: 50%; height: auto;"/>
</div>
<p style="text-align: center">
    <b>Modify the Lambda function code and deploy it.</b>
</p>

Ensure that administrator access is provided to the CheckJob function as well.

```
import json
import boto3
from datetime import datetime

client = boto3.client('sagemaker')

def lambda_handler(event, context):
    
    # Compute the date on every invocation: module-level code only runs on a
    # cold start, so a warm Lambda container could otherwise reuse a stale date.
    date_today = datetime.today().strftime('%Y-%m-%d')
    year = date_today[0:4]
    month = date_today[5:7]
    day = date_today[8:10]
    
    print(year, month, day)
    
    try:
        
        # Must match the TransformJobName used by the SubmitJob function
        name = f'{year}-{month}-{day}-object-detection-unique'
        
        response = client.describe_transform_job(TransformJobName = name)
        
        jobStatus = response['TransformJobStatus']
        
        return jobStatus # e.g. Failed, Completed, InProgress
        
    except Exception as e:
        print(e)
        message = 'Error getting Batch Job status'
        print(message)
        raise Exception(message) from e
```
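
The status string returned by this function is what the state machine branches on. As a plain-Python sketch of that decision, the hypothetical `next_step` function below maps a `TransformJobStatus` value to the next action. The returned labels are illustrative and are not names from the sample project.

```python
def next_step(job_status):
    """Decide what the workflow should do for a given TransformJobStatus."""
    if job_status == "Completed":
        return "save_results"   # hand over to the third Lambda function
    if job_status in ("Failed", "Stopped"):
        return "fail"           # the job will not produce results
    return "wait"               # InProgress or Stopping: poll again later

for status in ("InProgress", "Completed", "Failed"):
    print(status, "->", next_step(status))
```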

# Create 3rd Lambda Function
Lastly, create a new Lambda function (to clean and sort the detection outputs). Set the runtime to `Python 3.9`.

<center><img src="img/lambda 200.png"/></center>
<div style="display: flex; justify-content: center; align-items: center;">
    <img src="img/lambda 201.png" style="margin: 10px; max-width: 50%; height: auto;"/>
    <img src="img/lambda 202.png" style="margin: 10px; max-width: 50%; height: auto;"/>
</div>
<p style="text-align: center">
    <b>Create a new 3rd Lambda function.</b>
</p>
<center><img src="img/lambda 203.png"/></center>
<p style="text-align: center">
    <b>Write the Lambda function code and deploy it.</b>
</p>

Ensure that administrator access is provided to this third Lambda function as well.

<div style="display: flex; justify-content: center; align-items: center;">
    <!-- <img src="img/lambda 203.png" style="margin: 10px; max-width: 50%; height: auto;"/> -->
    <img src="img/lambda 204.png" style="margin: 10px; max-width: 50%; height: auto;"/>
</div>

```
import json
import boto3
import os
from datetime import datetime

s3 = boto3.client('s3')
BUCKET = 'obj-detection-batch-transform'

def lambda_handler(event, context):
    
    # Compute the date on every invocation: module-level code only runs on a
    # cold start, so a warm Lambda container could otherwise reuse a stale date.
    date_today = datetime.today().strftime('%Y-%m-%d')
    year = date_today[0:4]
    month = date_today[5:7]
    day = date_today[8:10]
    folder = f'batch-output/{year}/{month}/{day}'
    
    # Maps each output file key to its list of kept detections
    dict_obj = {}
    
    paginator = s3.get_paginator('list_objects_v2')
    pages = paginator.paginate(Bucket = BUCKET, Prefix = folder)
    
    for page in pages:
        # 'Contents' is missing from a page that lists no objects
        for obj in page.get('Contents', []):
            file_key = obj['Key']
            # Skip keys too short to be prediction files (e.g. the folder placeholder)
            if len(file_key) <= 30:
                continue
            response = s3.get_object(Bucket = BUCKET, Key = file_key)
            content = response['Body']
            jsonObject = json.loads(content.read())
            detections = jsonObject['prediction']
            
            temp_arr = []
            # Keep only the detections with a confidence score of at least 0.25
            for det in detections:
                (klass, score, x0, y0, x1, y1) = det
                if score < 0.25:
                    continue
                arr = [klass, score, x0, y0, x1, y1]
                temp_arr.append(arr)
            dict_obj[file_key] = temp_arr
    
    results = json.dumps(dict_obj, indent = 4)
    
    # Save the results to a JSON file (e.g. 2025_02_11.json) in /tmp,
    # the only writable directory in a Lambda function
    file_name = f'/tmp/{year}_{month}_{day}.json'
    with open(file_name, 'w') as outfile:
        outfile.write(results)
    
    desired_name_s3 = f"cleansed-jsons/{year}/{month}/{day}/{year}_{month}_{day}.json"
    
    s3_resource = boto3.resource('s3')
    s3_resource.Bucket(BUCKET).upload_file(file_name, desired_name_s3)
    
    os.remove(file_name) # Remove the file from /tmp
    
    return {
        'body': results
    }
```
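
The filtering step can be tried locally without any AWS access. The sketch below isolates it in a hypothetical `filter_detections` function and runs it on a made-up prediction, assuming each detection has the form `[class, score, x0, y0, x1, y1]` as unpacked in the Lambda code above.

```python
import json

def filter_detections(prediction, min_score=0.25):
    """Keep only the detections whose confidence score is at least min_score."""
    kept = []
    for klass, score, x0, y0, x1, y1 in prediction:
        if score >= min_score:
            kept.append([klass, score, x0, y0, x1, y1])
    return kept

# Made-up sample in the shape of one batch transform output file
sample = json.loads(
    '{"prediction": [[0.0, 0.91, 0.1, 0.2, 0.5, 0.6], [1.0, 0.12, 0.3, 0.3, 0.4, 0.4]]}'
)
kept = filter_detections(sample["prediction"])
print(kept)  # → [[0.0, 0.91, 0.1, 0.2, 0.5, 0.6]]
```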

# End