# Inventory Monitoring at Distribution Centres
Train a model to count the number of items in a box using Amazon SageMaker.

**TODO**: Give a helpful introduction to what this notebook is for. Remember that comments, explanations and good documentation make your project informative and professional.

**Note:** This notebook contains code and markdown cells with TODOs that you have to complete. They are guidelines to help you finish your project while meeting the requirements in the project rubric. Feel free to change the order of the TODOs or use more than one cell to complete a task.

In [None]:
# TODO: Install any packages that you might need

In [2]:
import os
import json
import boto3
from tqdm import tqdm  # imported as a callable: the download loop calls tqdm(v) directly
import sagemaker

from sagemaker.pytorch import PyTorch

## Data Preparation
**TODO:** Run the cell below to download the data.

The cell below creates a folder called `train_data`, downloads the training data and arranges it in subfolders. Each subfolder contains the images whose number of objects equals the name of the folder. For instance, every image in folder `1` shows exactly 1 object. The images are not divided into training, testing or validation sets. If you feel the number of samples is not enough, you can always download more data (instructions can be found [here](https://registry.opendata.aws/amazon-bin-imagery/)). However, we are not assessing you on the accuracy of your final trained model, but on how you create your machine learning engineering pipeline.

In [None]:
def download_and_arrange_data():
    s3_client = boto3.client('s3')

    with open('file_list.json', 'r') as f:
        d=json.load(f)

    for k, v in d.items():
        print(f"Downloading Images with {k} objects")
        directory=os.path.join('train_data', k)
        if not os.path.exists(directory):
            os.makedirs(directory)
        for file_path in tqdm(v):
            file_name=os.path.basename(file_path).split('.')[0]+'.jpg'
            s3_client.download_file('aft-vbi-pds', os.path.join('bin-images', file_name),
                             os.path.join(directory, file_name))

download_and_arrange_data()

## Dataset
**TODO:** Explain what dataset you are using for this project. Give a small overview of the classes, class distributions, etc. that can help anyone not familiar with the dataset get a better understanding of it. You can find more information about the data [here](https://registry.opendata.aws/amazon-bin-imagery/).
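
One way to back the overview with numbers is to count the images in each class folder. The sketch below assumes the `train_data/<count>/` layout described above and that every image ends in `.jpg`; the helper name `count_images_per_class` is hypothetical and not part of the project template.

```python
import os


def count_images_per_class(root):
    """Return {class_folder_name: number_of_jpg_files} for each subfolder of root."""
    counts = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        if os.path.isdir(class_dir):
            counts[name] = sum(
                1 for f in os.listdir(class_dir) if f.lower().endswith(".jpg")
            )
    return counts


# Only report if the data has already been downloaded (assumed folder name).
if os.path.isdir("train_data"):
    counts = count_images_per_class("train_data")
    total = sum(counts.values())
    for label, n in counts.items():
        share = n / total if total else 0.0
        print(f"{label} object(s): {n} images ({share:.1%})")
```

A strongly uneven distribution here would be worth mentioning in the dataset description, since it affects how accuracy should be read.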

In [None]:
#TODO: Perform any data cleaning or data preprocessing

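# Split the data into train and test (80/20 per class)
from os import listdir, rename, makedirs
from os.path import isfile, join

for i in range(0, 6):
    mypath = f'train_data/{i}'
    # sorted() makes the split deterministic; listdir order is arbitrary
    files = sorted(f for f in listdir(mypath) if isfile(join(mypath, f)))
    num_test_objects = int(len(files) * 0.2)
    # Use an explicit start index: files[-0:] would select every file
    test_objects = files[len(files) - num_test_objects:]

    # Creates test_data/ as well; no error if the folder already exists
    makedirs(f"test_data/{i}", exist_ok=True)

    for o in test_objects:
        rename(f"train_data/{i}/{o}", f"test_data/{i}/{o}")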

In [None]:
#TODO: Upload the data to AWS S3
# Upload each local folder to its own prefix, matching the URIs passed to estimator.fit()
!aws s3 cp train_data s3://proj-5/train_data --recursive
!aws s3 cp test_data s3://proj-5/test_data --recursive

## Model Training
**TODO:** This is the part where you can train a model. The type or architecture of the model you use is not important. 

**Note:** You will need to use the `train.py` script to train your model.

In [8]:
#TODO: Declare your model training hyperparameter.
hyperparameters = {
    "batch-size": "32",
    "lr": "0.001",
}

In [9]:
#TODO: Create your training estimator
role = sagemaker.get_execution_role()

estimator = PyTorch(
    entry_point='train.py',
    base_job_name='mn2',
    role=role,
    instance_count=1,
    instance_type='ml.c4.2xlarge',
    framework_version='1.4.0',
    py_version='py3',
    hyperparameters=hyperparameters,
)

INFO:botocore.credentials:Found credentials from IAM Role: BaseNotebookInstanceEc2InstanceRole


In [None]:
!python testing.py

2023-04-15 12:33:42,118 | [INFO] Creating model
2023-04-15 12:33:42,595 | [INFO] Creating loss function and optimizer
2023-04-15 12:33:42,595 | [INFO] Creating data loaders
2023-04-15 12:33:42,625 | [INFO] Starting training
2023-04-15 12:33:42,626 | [INFO] Epoch 1 of 14
2023-04-15 12:33:42,626 | [INFO] Testing
Distinct labels in test_loader: {0, 1, 2, 3, 4}
before
1


In [None]:
# TODO: Fit your estimator
estimator.fit({"train": "s3://proj-5/train_data/", "test": "s3://proj-5/test_data/"}, wait=True)

INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: mn2-2023-04-15-23-26-02-978


2023-04-15 23:26:03 Starting - Starting the training job...
2023-04-15 23:26:19 Starting - Preparing the instances for training......
2023-04-15 23:27:21 Downloading - Downloading input data......
2023-04-15 23:28:13 Training - Training image download completed. Training in progress.
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2023-04-15 23:28:22,855 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2023-04-15 23:28:22,858 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2023-04-15 23:28:22,869 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2023-04-15 23:28:22,871 sagemaker_pytorch_container.training INFO     Invoking user training script.
2023-04-15 23:28:23,019 sagemaker-containers INFO     Module default_user_module_name does not provide a setup.py. 

## Standout Suggestions
You do not need to perform the tasks below to finish your project. However, you can attempt these tasks to turn your project into a more advanced portfolio piece.

### Hyperparameter Tuning
**TODO:** Here you can perform hyperparameter tuning to increase the performance of your model. You are encouraged to:
- tune as many hyperparameters as you can to get the best performance from your model
- explain why you chose to tune those particular hyperparameters and the ranges.
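
To make the idea of a search space concrete, here is a minimal, library-free sketch of random search over the two hyperparameters this notebook already passes to `train.py` (`lr` and `batch-size`). The ranges are hypothetical values chosen only for illustration, and the helper `sample_config` is not part of SageMaker. In a real run you would more likely hand equivalent ranges to SageMaker's `HyperparameterTuner` instead of sampling by hand.

```python
import math
import random

# Hypothetical ranges, for illustration only.
SEARCH_SPACE = {
    "lr": ("log_uniform", 1e-4, 1e-1),        # learning rates span orders of magnitude
    "batch-size": ("choice", [32, 64, 128]),  # a small set of discrete options
}


def sample_config(space, rng):
    """Draw one hyperparameter configuration from the search space."""
    config = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "log_uniform":
            low, high = spec[1], spec[2]
            # Sample uniformly in log space so each decade is equally likely.
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        elif kind == "choice":
            config[name] = rng.choice(spec[1])
        else:
            raise ValueError(f"Unknown parameter kind: {kind}")
    return config


rng = random.Random(0)  # fixed seed so the candidates are reproducible
candidates = [sample_config(SEARCH_SPACE, rng) for _ in range(5)]
for candidate in candidates:
    print(candidate)
```

Sampling the learning rate on a log scale is a common choice because useful values tend to differ by factors of ten rather than by fixed increments.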


In [None]:
#TODO: Create your hyperparameter search space

In [None]:
#TODO: Create your training estimator

In [None]:
# TODO: Fit your estimator

In [None]:
# TODO: Find the best hyperparameters

### Model Profiling and Debugging
**TODO:** Use model debugging and profiling to better monitor and debug your model training job.

In [None]:
# TODO: Set up debugging and profiling rules and hooks

In [None]:
# TODO: Create and fit an estimator

In [None]:
# TODO: Plot a debugging output.

**TODO**: Is there some anomalous behaviour in your debugging output? If so, what is the error and how will you fix it?  
**TODO**: If not, suppose there was an error. What would that error look like and how would you have fixed it?

In [None]:
# TODO: Display the profiler output

### Model Deploying and Querying
**TODO:** Can you deploy your model to an endpoint and then query that endpoint to get a result?

In [None]:
# TODO: Deploy your model to an endpoint

In [None]:
# TODO: Run a prediction on the endpoint

In [None]:
# TODO: Remember to shut down/delete your endpoint once your work is done

### Cheaper Training and Cost Analysis
**TODO:** Can you perform a cost analysis of your system and then use spot instances to lessen your model training cost?
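
A cost analysis can start from simple arithmetic: training cost is roughly the billable time multiplied by the hourly instance price. The sketch below assumes you read the training and billable seconds from the job summary; the hourly rate and the durations are hypothetical placeholders (look up the real price for your instance type and region), and both function names are made up for this example.

```python
def training_cost(billable_seconds, hourly_rate_usd):
    """Estimated cost in USD for the billed portion of a training job."""
    return billable_seconds / 3600 * hourly_rate_usd


def spot_savings_percent(training_seconds, billable_seconds):
    """Share of training time that was not billed, as a percentage."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100


# Hypothetical numbers, for illustration only.
HOURLY_RATE_USD = 0.50    # placeholder, not a real price
TRAINING_SECONDS = 1800   # wall-clock training time of the job
BILLABLE_SECONDS = 600    # what a spot run might actually be billed for

on_demand = training_cost(TRAINING_SECONDS, HOURLY_RATE_USD)
spot = training_cost(BILLABLE_SECONDS, HOURLY_RATE_USD)
savings = spot_savings_percent(TRAINING_SECONDS, BILLABLE_SECONDS)
print(f"On-demand: ${on_demand:.2f}, spot: ${spot:.2f}, savings: {savings:.1f}%")
```

The same two functions can be reused to compare instance types by changing only the hourly rate and the measured durations.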

In [None]:
# TODO: Cost Analysis

In [None]:
# TODO: Train your model using a spot instance

### Multi-Instance Training
**TODO:** Can you train your model on multiple instances?

In [None]:
# TODO: Train your model on Multiple Instances