# Inventory Monitoring at Distribution Centers

This notebook guides you through building, training, and deploying a machine learning model for **Inventory Monitoring at Distribution Centers** using **AWS SageMaker**. The goal is to create a model that can count the number of objects in bins based on images. This project uses the **Amazon Bin Image Dataset**, which contains images from Amazon Fulfillment Centers showing bins with various objects. 

The key steps in this project include:
1. Data preparation and upload to S3.
2. Model training on SageMaker using a custom script (`train.py`).
3. Model deployment to a SageMaker endpoint for inference.
4. Optional tasks such as hyperparameter tuning, debugging, profiling, and cost analysis.

This project focuses on implementing a machine learning engineering pipeline rather than achieving high accuracy.


In [1]:
# TODO: Install any packages that you might need
!pip install tqdm boto3



In [2]:
# TODO: Import any packages that you might need
import os
import json
import boto3
import sagemaker
from sagemaker import get_execution_role
from tqdm import tqdm

sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


## Data Preparation
**TODO:** Run the cell below to download the data.

The cell below creates a folder called `train_data`, downloads the training data, and arranges it in subfolders. Each subfolder contains images in which the number of objects equals the folder name. For instance, every image in folder `1` shows exactly 1 object. The images are not divided into training, testing, or validation sets. If you feel the number of samples is not enough, you can always download more data (instructions can be found [here](https://registry.opendata.aws/amazon-bin-imagery/)). However, we are not assessing you on the accuracy of your final trained model, but on how you create your machine learning engineering pipeline.

In [3]:
#import os
#import json
#import boto3

def download_and_arrange_data():
    """Download the images listed in file_list.json into train_data/<object count>/."""
    s3_client = boto3.client('s3')

    with open('file_list.json', 'r') as f:
        d = json.load(f)

    for k, v in d.items():
        print(f"Downloading Images with {k} objects")
        directory = os.path.join('train_data', k)
        os.makedirs(directory, exist_ok=True)
        for file_path in tqdm(v):
            file_name = os.path.basename(file_path).split('.')[0] + '.jpg'
            # S3 keys always use '/', so build the key explicitly rather than with os.path.join
            s3_client.download_file('aft-vbi-pds', f'bin-images/{file_name}',
                                    os.path.join(directory, file_name))

download_and_arrange_data()

Downloading Images with 1 objects


100%|██████████| 1228/1228 [04:40<00:00,  4.38it/s]


Downloading Images with 2 objects


100%|██████████| 2299/2299 [08:55<00:00,  4.29it/s]


Downloading Images with 3 objects


100%|██████████| 2666/2666 [10:14<00:00,  4.34it/s]


Downloading Images with 4 objects


100%|██████████| 2373/2373 [08:59<00:00,  4.40it/s]


Downloading Images with 5 objects


100%|██████████| 1875/1875 [07:09<00:00,  4.37it/s]


## Dataset

For this project, we are using the **Amazon Bin Image Dataset**, which contains images of bins from Amazon Fulfillment Centers. Each image shows a bin with one or more items inside, where items are placed randomly. This dataset allows us to build a machine learning model that can classify images based on the number of objects in each bin, which is essential for efficient inventory tracking and management in distribution centers.

### Structure of the Dataset
Each image in the dataset is associated with a metadata file containing details about the items in the bin. The key information for our project is the `EXPECTED_QUANTITY`, which indicates the number of objects present in each bin. This quantity ranges from **1 to 5 objects**, and we use it to create labeled data for training a classification model with five classes:
- **Class 1**: Images with 1 object in the bin
- **Class 2**: Images with 2 objects in the bin
- **Class 3**: Images with 3 objects in the bin
- **Class 4**: Images with 4 objects in the bin
- **Class 5**: Images with 5 objects in the bin
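
The mapping from a metadata record to one of the five classes can be sketched as follows. This is a minimal illustration, assuming the metadata file has already been loaded into a Python dict; `quantity_to_label` is a hypothetical helper, and only the `EXPECTED_QUANTITY` field is taken from the dataset description above.

```python
import json
from typing import Optional


def quantity_to_label(metadata: dict, max_count: int = 5) -> Optional[str]:
    """Return the class folder name ("1".."5") for one record, or None if out of range."""
    quantity = metadata.get("EXPECTED_QUANTITY")
    # Reject missing values, non-integers, and booleans (bool is a subclass of int).
    if not isinstance(quantity, int) or isinstance(quantity, bool):
        return None
    if 1 <= quantity <= max_count:
        return str(quantity)
    return None


# Illustrative record containing only the field used here (not real dataset content).
example = json.loads('{"EXPECTED_QUANTITY": 3}')
print(quantity_to_label(example))
```

Records that fall outside the 1 to 5 range return `None`, so they can be skipped when building the class folders.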

### Data Preprocessing
For this project, the images have been organized into folders by the number of objects (1–5), based on the `EXPECTED_QUANTITY` field in the metadata. We then split the data into training, validation, and test sets to facilitate model evaluation and ensure generalization.

This dataset helps us train a classification model to determine the number of items in a bin from an image, enabling automated inventory counting at distribution centers.

More information about the dataset can be found [here](https://registry.opendata.aws/amazon-bin-imagery/).

The `split_data` function in the next cell splits the data into `train`, `validation`, and `test` sets based on the specified ratios (70%, 20%, and 10% by default). The output directory `processed_data` will contain subdirectories for the training, validation, and test sets, each further organized by object count.

In [4]:
#TODO: Perform any data cleaning or data preprocessing
import shutil
from sklearn.model_selection import train_test_split

# Split data into train, validation, and test sets
def split_data(data_dir, output_dir, train_ratio=0.7, val_ratio=0.2, test_ratio=0.1):
    if os.path.exists(output_dir):
        shutil.rmtree(output_dir)
    os.makedirs(os.path.join(output_dir, 'train'))
    os.makedirs(os.path.join(output_dir, 'validation'))
    os.makedirs(os.path.join(output_dir, 'test'))
    
    for object_count in os.listdir(data_dir):
        images = os.listdir(os.path.join(data_dir, object_count))
        train, temp = train_test_split(images, train_size=train_ratio)
        val, test = train_test_split(temp, train_size=val_ratio/(val_ratio + test_ratio))
        
        for subset, subset_images in zip(['train', 'validation', 'test'], [train, val, test]):
            subset_dir = os.path.join(output_dir, subset, object_count)
            os.makedirs(subset_dir, exist_ok=True)
            for image in subset_images:
                shutil.copy2(os.path.join(data_dir, object_count, image), subset_dir)
        print(f"Data split for category '{object_count}' complete.")

# Run data split
split_data('train_data', 'processed_data')


Data split for category '1' complete.
Data split for category '2' complete.
Data split for category '3' complete.
Data split for category '4' complete.
Data split for category '5' complete.
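
One way to sanity-check the split is to count the files that ended up in each subset and class. The sketch below assumes the `processed_data/<subset>/<object count>/` layout produced by the cell above; `count_images` is a hypothetical helper, not part of the original notebook.

```python
import os
from collections import defaultdict


def count_images(root: str) -> dict:
    """Return {subset: {class_name: file_count}} for a root/subset/class/file layout."""
    counts = defaultdict(dict)
    for subset in sorted(os.listdir(root)):
        subset_dir = os.path.join(root, subset)
        if not os.path.isdir(subset_dir):
            continue
        for class_name in sorted(os.listdir(subset_dir)):
            class_dir = os.path.join(subset_dir, class_name)
            if os.path.isdir(class_dir):
                counts[subset][class_name] = len(os.listdir(class_dir))
    return dict(counts)


# Only report if the split has actually been produced in the working directory.
if os.path.isdir("processed_data"):
    for subset, per_class in count_images("processed_data").items():
        print(subset, per_class, "total:", sum(per_class.values()))
```

The per-class totals across the three subsets should add up to the number of images downloaded for that class.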


In [5]:
#TODO: Upload the data to AWS S3

import sagemaker
from sagemaker.session import Session
from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorch
#import boto3
#import os

session = sagemaker.Session()

bucket=session.default_bucket()
print("Default Bucket: {}".format(bucket))

region = session.boto_region_name
print("AWS Region: {}".format(region))

role = get_execution_role() #sagemaker iam role
print("RoleArn: {}".format(role))

data = "s3://{}/{}/".format(bucket, "inventory-monitoring")
print(f"Uploading data to S3 at {data}")
output = "s3://{}/{}/".format(bucket, "output")
model_dir = "s3://{}/{}/".format(bucket, "model")
os.environ["DEFAULT_S3_BUCKET"] =bucket
os.environ['SM_CHANNEL_TRAIN']=data 
os.environ['SM_OUTPUT_DATA_DIR']=output
os.environ['SM_MODEL_DIR']=model_dir

Default Bucket: sagemaker-eu-west-1-940426109786
AWS Region: eu-west-1
RoleArn: arn:aws:iam::940426109786:role/service-role/AmazonSageMaker-ExecutionRole-20241111T123978
Uploading data to S3 at s3://sagemaker-eu-west-1-940426109786/inventory-monitoring/


In [7]:
s3_data_path = session.upload_data(path='processed_data', bucket=bucket, key_prefix='inventory-monitoring/data')

In [9]:
output

's3://sagemaker-eu-west-1-940426109786/output/'

## Model Training
**TODO:** This is the part where you can train a model. The type or architecture of the model you use is not important. 

**Note:** You will need to use the `train.py` script to train your model.
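
The contents of `train.py` are not shown in this notebook. As a rough sketch of how such a script typically receives its inputs: SageMaker passes each hyperparameter as a command-line argument (for example `--batch-size 64`) and exposes the data and model locations through environment variables such as `SM_CHANNEL_TRAIN` and `SM_MODEL_DIR`. The argument names below mirror the hyperparameters used in this notebook; the `--data-dir` and `--model-dir` names and the local fallback paths are assumptions for illustration.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths for a training script."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as CLI flags matching the keys of the hyperparameters dict.
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--lr", type=float, default=0.1)
    # The "train" channel is exposed as SM_CHANNEL_TRAIN inside the training container.
    # The fallback values are assumptions that allow a local dry run.
    parser.add_argument("--data-dir", default=os.environ.get("SM_CHANNEL_TRAIN", "processed_data"))
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    return parser.parse_args(argv)


# Passing an empty list avoids picking up Jupyter's own command-line arguments.
args = parse_args([])
print(args.batch_size, args.epochs, args.lr)
```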

In [15]:
#TODO: Declare your model training hyperparameter.
#NOTE: You do not need to do hyperparameter tuning. You can use fixed hyperparameter values
# Define hyperparameters for the training job
hyperparameters = {
    "batch-size": 64,
    "epochs": 3,
    "lr": 0.1
}


In [16]:
#TODO: Create your training estimator
# Import the PyTorch estimator
from sagemaker.pytorch import PyTorch

# Create a PyTorch Estimator
estimator = PyTorch(
    entry_point="train.py",                 # Path to the training script
    role=role,                              # IAM role with permissions
    instance_count=1,                       # Number of instances
    instance_type="ml.m5.xlarge",           # Instance type for training
    framework_version="1.8",                # PyTorch version
    py_version="py3",                       # Python version
    hyperparameters=hyperparameters,        # Hyperparameters defined above
    output_path=output                      # Path for saving output artifacts
)


In [17]:
# TODO: Fit your estimator
# Start the training job
estimator.fit({"train": s3_data_path})

INFO:sagemaker.image_uris:image_uri is not presented, retrieving image_uri based on instance_type, framework etc.
INFO:sagemaker:Creating training-job with name: pytorch-training-2024-11-15-22-20-24-440


2024-11-15 22:20:25 Starting - Starting the training job...
2024-11-15 22:20:39 Starting - Preparing the instances for training...
2024-11-15 22:21:04 Downloading - Downloading input data......
2024-11-15 22:22:24 Downloading - Downloading the training image...
2024-11-15 22:22:35 Training - Training image download completed. Training in progress.
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2024-11-15 22:22:43,539 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2024-11-15 22:22:43,542 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2024-11-15 22:22:43,551 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2024-11-15 22:22:43,553 sagemaker_pytorch_container.training INFO     Invoking user training script.
2024-11-15 22:22:43,706 sagemaker-training-t

## Standout Suggestions
You do not need to perform the tasks below to finish your project. However, you can attempt these tasks to turn your project into a more advanced portfolio piece.

### Hyperparameter Tuning
**TODO:** Here you can perform hyperparameter tuning to improve the performance of your model. You are encouraged to:
- tune as many hyperparameters as you can to get the best performance from your model;
- explain why you chose to tune those particular hyperparameters and their ranges.
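
A possible starting point for the search space is sketched below. The ranges, the metric name `validation:loss`, and the log pattern in the regex are all assumptions: they only work if `train.py` prints a matching line such as `Validation loss: 0.123`. `build_tuner` is a hypothetical helper; it imports the SageMaker SDK lazily so the search space itself can be inspected without it.

```python
# Illustrative search space; the bounds and choices are assumptions, not tuned values.
search_space = {
    "lr": {"min": 0.001, "max": 0.1},
    "batch-size": [32, 64, 128],
}


def build_tuner(estimator, max_jobs=4, max_parallel_jobs=2):
    """Wrap an existing estimator in a HyperparameterTuner over search_space."""
    from sagemaker.tuner import (
        CategoricalParameter,
        ContinuousParameter,
        HyperparameterTuner,
    )

    ranges = {
        "lr": ContinuousParameter(search_space["lr"]["min"], search_space["lr"]["max"]),
        "batch-size": CategoricalParameter(search_space["batch-size"]),
    }
    return HyperparameterTuner(
        estimator=estimator,
        objective_metric_name="validation:loss",
        objective_type="Minimize",
        hyperparameter_ranges=ranges,
        # Assumed log format; adjust the regex to whatever train.py actually prints.
        metric_definitions=[
            {"Name": "validation:loss", "Regex": r"Validation loss: ([0-9\.]+)"}
        ],
        max_jobs=max_jobs,
        max_parallel_jobs=max_parallel_jobs,
    )
```

Calling `build_tuner(estimator).fit({"train": s3_data_path})` would launch the tuning jobs, and `best_training_job()` on the tuner reports the winner afterwards. The learning rate is a natural first candidate because it usually has the largest effect on whether training converges at all.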


In [None]:
#TODO: Create your hyperparameter search space

In [None]:
#TODO: Create your training estimator

In [None]:
# TODO: Fit your estimator

In [None]:
# TODO: Find the best hyperparameters

### Model Profiling and Debugging
**TODO:** Use model debugging and profiling to better monitor and debug your model training job.

In [None]:
# TODO: Set up debugging and profiling rules and hooks

In [None]:
# TODO: Create and fit an estimator

In [None]:
# TODO: Plot a debugging output.

**TODO**: Is there some anomalous behaviour in your debugging output? If so, what is the error and how will you fix it?  
**TODO**: If not, suppose there was an error. What would that error look like and how would you have fixed it?
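
To make the question concrete, the sketch below shows the kind of check that flags two common training anomalies: a loss that becomes non-finite and a loss that stops decreasing. This is plain Python for illustration only, not the SageMaker Debugger API; `diagnose_losses` and its `patience` parameter are hypothetical.

```python
import math


def diagnose_losses(losses, patience=3):
    """Return a short diagnosis string for a sequence of per-step training losses."""
    # A NaN or infinite loss usually means the optimization has diverged.
    if any(math.isnan(x) or math.isinf(x) for x in losses):
        return "non-finite loss"
    best = math.inf
    steps_without_improvement = 0
    for loss in losses:
        if loss < best:
            best = loss
            steps_without_improvement = 0
        else:
            steps_without_improvement += 1
            # Flag the run once the loss has failed to improve `patience` times in a row.
            if steps_without_improvement >= patience:
                return "loss not decreasing"
    return "ok"


print(diagnose_losses([1.6, 1.5, 1.45, 1.4]))
```

If either anomaly appears, a learning rate that is too high is one common cause, so lowering it is a reasonable first fix to try.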

In [None]:
# TODO: Display the profiler output

### Model Deploying and Querying
**TODO:** Can you deploy your model to an endpoint and then query that endpoint to get a result?
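
Once an endpoint exists, turning its response into an object count could look like the sketch below. It assumes the endpoint returns five class scores per image, ordered so that index 0 corresponds to 1 object; both helpers are hypothetical and work with any object exposing a `predict` method.

```python
def scores_to_count(scores):
    """Map class scores (index 0 = 1 object) to a predicted object count."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return best_index + 1


def count_objects(predictor, image_bytes):
    """Send image bytes to a predictor-like object and return the predicted count."""
    response = predictor.predict(image_bytes)
    # Accept either a flat list of scores or a batch containing a single list.
    scores = response[0] if isinstance(response[0], (list, tuple)) else response
    return scores_to_count(list(scores))
```

With the SageMaker Python SDK, `estimator.deploy(initial_instance_count=1, instance_type=...)` returns a predictor, and `predictor.delete_endpoint()` removes the endpoint afterwards. Whether the endpoint accepts raw JPEG bytes depends on the inference script and serializer, which are not shown in this notebook.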

In [None]:
# TODO: Deploy your model to an endpoint

In [None]:
# TODO: Run a prediction on the endpoint

In [None]:
# TODO: Remember to shut down/delete your endpoint once your work is done

### Cheaper Training and Cost Analysis
**TODO:** Can you perform a cost analysis of your system and then use spot instances to lessen your model training cost?
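
A simple cost model for a training job is run time multiplied by the hourly instance price and the instance count. The sketch below is a rough illustration: the hourly price must be looked up for your region and instance type (the value in the example call is made up), and the helper names are hypothetical. The keys in `spot_kwargs` are estimator arguments for managed spot training; the time limits are placeholder values.

```python
def training_cost(seconds, hourly_price, instance_count=1):
    """Estimated cost of a job: billed hours x hourly price x number of instances."""
    return seconds / 3600 * hourly_price * instance_count


def spot_savings_percent(training_seconds, billable_seconds):
    """Savings from spot training, given total and billable seconds of a job."""
    return (1 - billable_seconds / training_seconds) * 100


# Extra estimator arguments for managed spot training.
# max_wait must be at least max_run; both values here are placeholders.
spot_kwargs = {"use_spot_instances": True, "max_run": 3600, "max_wait": 7200}

# Hypothetical price of 0.25 per hour, for illustration only.
print(training_cost(seconds=1800, hourly_price=0.25))
```

These arguments could be passed to the estimator as `PyTorch(..., **spot_kwargs)`. The training and billable seconds appear at the end of the job log, which makes it possible to compare the on-demand and spot runs directly.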

In [None]:
# TODO: Cost Analysis

In [None]:
# TODO: Train your model using a spot instance

### Multi-Instance Training
**TODO:** Can you train your model on multiple instances?
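
On the notebook side, multi-instance training starts with setting `instance_count` to a value greater than 1 on the estimator. Inside the container, the training script can discover the cluster layout from the `SM_HOSTS` and `SM_CURRENT_HOST` environment variables that SageMaker sets. The sketch below is a minimal illustration: `cluster_info` is a hypothetical helper, and the rank it returns is a per-host index, not a per-process rank.

```python
import json
import os


def cluster_info(env=None):
    """Derive world size and host rank from SageMaker's cluster environment variables."""
    env = os.environ if env is None else env
    # SM_HOSTS is a JSON list of host names; fall back to a single host for local runs.
    hosts = sorted(json.loads(env.get("SM_HOSTS", '["algo-1"]')))
    current = env.get("SM_CURRENT_HOST", hosts[0])
    return {
        "world_size": len(hosts),
        "rank": hosts.index(current),
        "is_leader": current == hosts[0],
    }


print(cluster_info({"SM_HOSTS": '["algo-1", "algo-2"]', "SM_CURRENT_HOST": "algo-2"}))
```

How the hosts then communicate (for example through a distributed data-parallel backend) is a separate choice that depends on the training script.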

In [None]:
# TODO: Train your model on multiple instances