# **Image Moderation Workshop**

## Prepare training data for custom model
We will use the pictures in custom-model-training/data for demonstration in this project. Before training the model, we will augment the data to create more training examples.

First, let's check how many labels the training dataset contains. For now there is only one label:

In [None]:
with open('data/labels/train/classes.txt') as f:
    categories = f.read().splitlines()

print('Custom labels:', categories)
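
Beyond listing the label names, it can help to see how many annotated boxes each label has. The following is a minimal sketch, assuming the annotation files in `data/labels/train` use the YOLO text format (one box per line, with the class index as the first field). The helper name `count_boxes_per_class` is made up for this illustration.

```python
from collections import Counter
from pathlib import Path


def count_boxes_per_class(label_dir, class_names):
    """Count YOLO bounding boxes per class name across all label files."""
    counts = Counter({name: 0 for name in class_names})
    for label_file in Path(label_dir).glob("*.txt"):
        if label_file.name == "classes.txt":
            continue  # the class list is not an annotation file
        for line in label_file.read_text().splitlines():
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            class_id = int(fields[0])  # assumed: first field is the class index
            counts[class_names[class_id]] += 1
    return dict(counts)


label_dir = Path("data/labels/train")
if label_dir.is_dir():
    names = (label_dir / "classes.txt").read_text().splitlines()
    print(count_boxes_per_class(label_dir, names))
```

A label with very few boxes is a good candidate for more augmented copies.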

### Data Augmentation

If we don't have much data for training, data augmentation is a way to enlarge the training set by creating modified copies of images from an existing dataset.
`Albumentations` efficiently implements a rich variety of image transform operations for data augmentation. We will use this library to enrich our training dataset.
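
For object detection, every geometric transform applied to an image must also be applied to its bounding boxes. The sketch below illustrates that idea with a horizontal flip in plain Python. It does not use `Albumentations` and is not how `ImageAugmentor` is implemented; the function names and the assumption of YOLO-normalized boxes `(class_id, x_center, y_center, width, height)` are for illustration only.

```python
def hflip_yolo_boxes(boxes):
    """Mirror YOLO boxes (class_id, x_center, y_center, width, height) left to right.

    Coordinates are normalized to [0, 1], so only x_center changes.
    """
    return [(cls, 1.0 - xc, yc, w, h) for cls, xc, yc, w, h in boxes]


def hflip_image(rows):
    """Mirror an image stored as a list of pixel rows."""
    return [list(reversed(row)) for row in rows]


# A tiny 2x3 "image" and one box centered in the left half.
image = [[1, 2, 3],
         [4, 5, 6]]
boxes = [(0, 0.25, 0.5, 0.2, 0.4)]

print(hflip_image(image))       # [[3, 2, 1], [6, 5, 4]]
print(hflip_yolo_boxes(boxes))  # [(0, 0.75, 0.5, 0.2, 0.4)]
```

The box keeps its size and vertical position, while its center moves to the mirrored side of the image.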

In [None]:
!pip install -q albumentations opencv-python-headless
!pip install -qU opencv-python

Let's use `albumentations` to generate 9 augmented copies of each image in the dataset. This takes a minute or two; please wait for the whole process to finish.

In [None]:
import glob
import os
from pathlib import Path
from image_augment import ImageAugmentor
import tqdm

copies_per_image = 9
label_id = 0

print(f'We will create {copies_per_image} copies for label {label_id}')
ia = ImageAugmentor({label_id:copies_per_image}, "data/images/train/", "data/labels/train/")
    
for image_name in tqdm.tqdm(glob.glob('data/images/train/*')):
    
    label_name = os.path.join('data/labels/train', f"{Path(image_name).stem}.txt")

    ia.bbox_augmentation(image_name, label_name)
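
After augmentation, a quick sanity check is to confirm that every image still has a matching label file. This is a minimal sketch, assuming the augmented copies are written into the same `data/images/train` and `data/labels/train` directories and follow the same "image stem plus `.txt`" naming used in the loop above. The helper `images_without_labels` is hypothetical.

```python
from pathlib import Path


def images_without_labels(image_dir, label_dir):
    """Return names of images that have no same-stem .txt file in label_dir."""
    label_stems = {p.stem for p in Path(label_dir).glob("*.txt")}
    return sorted(
        p.name
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.stem not in label_stems
    )


if Path("data/images/train").is_dir():
    missing = images_without_labels("data/images/train", "data/labels/train")
    print(f"{len(missing)} image(s) without a label file")
```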

## Prepare training data

To train a custom model that detects our custom label, we will use YOLOv5, a real-time object detection framework that ships with pretrained models. We will:
- Download a pretrained YOLO model
- Write the YAML config for the training step
- Upload our training dataset and the pretrained model

### Download pretrained model
Now we will download yolov5s6.pt for this training.

In [None]:
project = "content-moderation"
yolo_version = '6.2'

!mkdir -p data/cfg
!mkdir -p data/weights
!wget https://github.com/ultralytics/yolov5/releases/download/v$yolo_version/yolov5s6.pt --directory-prefix data/weights

### Write the yolo training config file
Next, we create a training config file with the following content:

In [None]:
%%writefile data/cfg/content-moderation.yaml

train: /opt/ml/input/data/images/train/
val: /opt/ml/input/data/images/val/

# number of labels
nc: 1
# label names
names: ['tulip']
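
The `nc` value must match the number of entries in `names`. One way to keep the two in sync is to render the config from the label list instead of typing it by hand. The sketch below is an assumption-laden illustration: `build_dataset_config` is a made-up helper, and it only handles label names that contain no quote characters.

```python
def build_dataset_config(train_dir, val_dir, names):
    """Render a YOLOv5-style dataset config with nc derived from names."""
    quoted = ", ".join(f"'{n}'" for n in names)
    return (
        f"train: {train_dir}\n"
        f"val: {val_dir}\n"
        "\n"
        "# number of labels\n"
        f"nc: {len(names)}\n"
        "# label names\n"
        f"names: [{quoted}]\n"
    )


print(build_dataset_config("/opt/ml/input/data/images/train/",
                           "/opt/ml/input/data/images/val/",
                           ["tulip"]))
```

The returned string could then be written to `data/cfg/content-moderation.yaml` in place of the `%%writefile` cell.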

### Upload the data for training
Upload the data to the default SageMaker S3 bucket.

In [None]:
from datetime import datetime
import sagemaker

sagemaker_session = sagemaker.Session()
#training_job_name = job_name
sagemaker_default_bucket = sagemaker_session.default_bucket()
project_and_time_prefix = project+'-'+ datetime.now().strftime("%Y-%m-%d-%H-%M-%S")

In [None]:
!aws s3 sync data/ s3://$sagemaker_default_bucket/training_data/$project_and_time_prefix

## Train the custom model
With the training data prepared, let's train our model to detect the custom label with a SageMaker training job. You can find the training job listed on the AWS console under SageMaker -> Training -> Training jobs.

After training, you will find the trained model in S3.

In [None]:
import os
import sagemaker


from sagemaker.pytorch import PyTorch
from sagemaker.pytorch.model import PyTorchModel
import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.INFO)

sagemaker_session = sagemaker.Session()
instance_type = 'ml.g4dn.12xlarge'
role = sagemaker.get_execution_role()

## hyperparameters for training
git_config = {'repo': 'https://github.com/ultralytics/yolov5.git', 'branch': 'v6.2'}
hyperparameters = {'data': '/opt/ml/input/data/cfg/{}.yaml'.format(project), 
                   'cfg': 'models/yolov5s.yaml',
                   'hyp': 'data/hyps/hyp.scratch-med.yaml', 
                   'weights': '/opt/ml/input/data/weights/yolov5s6.pt',  # checkpoint downloaded into data/weights above
                   'project': '/opt/ml/model/',
                   'name': 'tutorial',
                   'img': 640, 
                   'batch-size': 64,
                   'batch': 10,
                   'epochs': 60,
                   'device': '0,1,2,3',
                   'workers': 16
} 

## define training job
metric_definitions = [
    {'Name': 'Precision', 'Regex': r'all\s+[0-9.]+\s+[0-9.]+\s+([0-9.]+)'},
    {'Name': 'Recall', 'Regex': r'all\s+[0-9.]+\s+[0-9.]+\s+[0-9.]+\s+([0-9.]+)'},
    {'Name': 'mAP@.5', 'Regex': r'all\s+[0-9.]+\s+[0-9.]+\s+[0-9.]+\s+[0-9.]+\s+([0-9.]+)'},
    {'Name': 'mAP@.5:.95', 'Regex': r'all\s+[0-9.]+\s+[0-9.]+\s+[0-9.]+\s+[0-9.]+\s+[0-9.]+\s+([0-9.]+)'}
]

estimator = PyTorch(entry_point='train.py',
                    source_dir='.',
                    git_config=git_config,
                    role=role,
                    hyperparameters=hyperparameters,
                    framework_version='1.13.1',  # '1.8.1', '1.9.1'
                    py_version='py39',  # 'py3', 'py38'
                    script_mode=True,       
                    instance_count=1,  # 1 or 2 or ...
                    instance_type=instance_type,
                    train_max_wait=72 * 60 * 60,
                    use_spot_instances=True,
                    metric_definitions = metric_definitions,
                    distribution={"torch_distributed": {"enabled": True}},
                    base_job_name=f'yolo-{yolo_version.replace(".", "")}-hyp-med-no-aug-v6'
)

## fire the training job
data_location = f's3://{sagemaker_default_bucket}/training_data/{project_and_time_prefix}'
inputs = {'cfg': data_location+'/cfg',
          'weights': data_location+'/weights',
          'images': data_location+'/images',
          'labels': data_location+'/labels'}

estimator.fit(inputs)
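
The `metric_definitions` regexes each capture one column from a log row that starts with `all` and is followed by numeric columns. To see which column each pattern picks up, they can be tried locally on a sample row. The row below is made up for illustration (its values are not from a real training run), and `extract_metrics` is a hypothetical helper.

```python
import re

# Same patterns as in metric_definitions above.
metric_patterns = {
    "Precision": r"all\s+[0-9.]+\s+[0-9.]+\s+([0-9.]+)",
    "Recall": r"all\s+[0-9.]+\s+[0-9.]+\s+[0-9.]+\s+([0-9.]+)",
    "mAP@.5": r"all\s+[0-9.]+\s+[0-9.]+\s+[0-9.]+\s+[0-9.]+\s+([0-9.]+)",
    "mAP@.5:.95": r"all\s+[0-9.]+\s+[0-9.]+\s+[0-9.]+\s+[0-9.]+\s+[0-9.]+\s+([0-9.]+)",
}

# Invented sample row: "all", two count columns, then four metric columns.
sample_line = "all        128        929      0.577      0.414       0.46      0.279"


def extract_metrics(line, patterns):
    """Apply each regex to the line and return the captured value as a float."""
    found = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, line)
        if match:
            found[name] = float(match.group(1))
    return found


print(extract_metrics(sample_line, metric_patterns))
# {'Precision': 0.577, 'Recall': 0.414, 'mAP@.5': 0.46, 'mAP@.5:.95': 0.279}
```

Each pattern skips one more numeric column than the previous one, so the four metrics map to the third through sixth numbers after `all`.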

In [None]:
job_name = estimator.latest_training_job.name

print(f'Trained model location: s3://{sagemaker_default_bucket}/{job_name}/output/model.tar.gz')

## Prepare the model for inference

We will prepare the output model:
- Download the trained model
- Pack it with the inference code

### Download the trained model

Download the trained model to the file model.tar.gz.

In [None]:
!rm -f model.tar.gz
!aws s3 cp s3://$sagemaker_default_bucket/$job_name/output/model.tar.gz model.tar.gz

In [None]:
!rm -rf tutorial
!tar -zxf model.tar.gz

### Prepare for the model deployment

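Build the inference model archive: copy the inference code and the best weights into one directory and pack it as inference-pytorch.tar.gz.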

In [None]:
!rm -rf model-inference
!rm -f inference-pytorch.tar.gz
!mkdir model-inference
!cp -R code model-inference/
!cp tutorial/weights/best.pt model-inference/
!cd model-inference && tar -czvf ../inference-pytorch.tar.gz *
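
Before uploading, it can be worth checking that the archive has the layout the commands above are meant to produce: `best.pt` and the `code` directory at the top level. This is a minimal sketch using the standard `tarfile` module; `top_level_entries` is a made-up helper name.

```python
import os
import tarfile


def top_level_entries(archive_path):
    """Return the sorted top-level names inside a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted({member.name.split("/")[0] for member in tar.getmembers()})


if os.path.exists("inference-pytorch.tar.gz"):
    print(top_level_entries("inference-pytorch.tar.gz"))
```

If the weights or the code directory show up nested under another folder, the `tar` command was probably run from the wrong directory.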

In [None]:
!aws s3 cp inference-pytorch.tar.gz s3://$sagemaker_default_bucket/output/inference-pytorch.tar.gz