### Configuring the base folder containing the training images

All images are stored in an S3 bucket, organized as follows:

    completeDataset
    ├── redcard
    |  ├── redcard.1.jpg
    |  ├── redcard.2.jpg
    |  └── . . .
    ├── yellowcard
    |  ├── ycard.1.jpg
    |  ├── ycard.2.jpg
    |  └── . . .
    ├── change
    |  ├── change.1.jpg
    |  ├── change.2.jpg
    |  └── . . .
    └── . . .

In [1]:
# S3 bucket
bucket_name = 'ahlimgdata'

# Prefix of the folder containing the dataset image subfolders (one per label)
dataset_name = 'completeDataset'
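Because each label has its own subfolder, the label of an image can be read directly from its object key. A minimal sketch, assuming keys of the form `<dataset>/<label>/<file>`; the helper name `label_from_key` is hypothetical and not part of the notebook:

```python
from pathlib import PurePosixPath


def label_from_key(key, dataset_prefix="completeDataset"):
    """Return the label folder for an S3 object key, or None if the key
    does not sit directly inside a label folder of the dataset."""
    parts = PurePosixPath(key).parts
    if len(parts) == 3 and parts[0] == dataset_prefix:
        return parts[1]
    return None


print(label_from_key("completeDataset/redcard/redcard.1.jpg"))
print(label_from_key("completeDataset/yellowcard/ycard.2.jpg"))
```

Note that the file name prefix (`ycard`) does not have to match the folder name (`yellowcard`); only the folder determines the label.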

### Environment setup

We set up the connections and authentication to the AWS services.

In [2]:
import sagemaker
import cv2
from sagemaker import get_execution_role
from sagemaker.amazon.amazon_estimator import get_image_uri

role = get_execution_role()
sess = sagemaker.Session()

# Built-in container image of the image classification algorithm
training_image = get_image_uri(sess.boto_region_name, 'image-classification', repo_version="latest")

### Preparing the data for the model

Before launching the training job we need to:

* Create some files that tell SageMaker which images belong to each label
* Upload these additional files to S3
* Configure the model to use these files for training and validation

The image classification algorithm needs to know the image->label mapping. To provide it we use LST or RecordIO files, which we generate with the Python script im2rec.py.
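For reference, an LST file is a tab-separated text file with one line per image: an integer index, the numeric label, and the image path relative to the dataset root. The sketch below builds such lines by hand for a toy listing. `build_lst_lines` is a hypothetical helper for illustration, and the alphabetical label numbering is an assumption that matches the class list printed later in this notebook; it is not a copy of im2rec.py's logic.

```python
def build_lst_lines(files_by_label):
    """Build LST-style lines: index <TAB> label id <TAB> relative path."""
    # Assumption: label ids follow the alphabetical order of the folder names
    class_ids = {name: i for i, name in enumerate(sorted(files_by_label))}
    lines = []
    index = 0
    for name in sorted(files_by_label):
        for file_name in sorted(files_by_label[name]):
            lines.append(f"{index}\t{class_ids[name]}\t{name}/{file_name}")
            index += 1
    return class_ids, lines


class_ids, lines = build_lst_lines({
    "redcard": ["redcard.1.jpg", "redcard.2.jpg"],
    "change": ["change.1.jpg"],
})
print(class_ids)
print("\n".join(lines))
```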

In [3]:
# Locate im2rec.py in our environment and set some variables

base_dir = '/tmp'

%env BASE_DIR = $base_dir
%env S3_DATA_BUCKET_NAME = $bucket_name
%env DATASET_NAME = $dataset_name

import sys,os

suffix = '/mxnet/tools/im2rec.py'
im2rec = list(filter( (lambda x: os.path.isfile(x + suffix )), sys.path))[0] + suffix
%env IM2REC = $im2rec

env: BASE_DIR=/tmp
env: S3_DATA_BUCKET_NAME=ahlimgdata
env: DATASET_NAME=completeDataset
env: IM2REC=/home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/tools/im2rec.py
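The one-liner above fails with a bare `IndexError` when im2rec.py is not present under any `sys.path` entry. A slightly more defensive sketch, where `find_on_sys_path` is a hypothetical helper name:

```python
import os
import sys


def find_on_sys_path(relative_path, search_paths=None):
    """Return the first existing file `<entry>/<relative_path>` among the
    given search paths (sys.path by default), or raise FileNotFoundError."""
    if search_paths is None:
        search_paths = sys.path
    for entry in search_paths:
        candidate = os.path.join(entry, relative_path)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(f"{relative_path} not found on the search path")


try:
    print(find_on_sys_path(os.path.join("mxnet", "tools", "im2rec.py")))
except FileNotFoundError as err:
    # Reached when MXNet (or its tools folder) is not installed
    print(err)
```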


### Fetching the training images from S3

To create the RecordIO files for training and validation, we first need to download the images to the local filesystem.

In [4]:
# Download the images from S3
!aws s3 sync s3://$S3_DATA_BUCKET_NAME/$DATASET_NAME $BASE_DIR/$DATASET_NAME --quiet

### Resizing the images

In [5]:
# %%bash
# CATEGORIES = ["change", "redcard", "yellowcard"]
import cv2
import matplotlib.pyplot as plt

categories = !ls $BASE_DIR/$DATASET_NAME
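dataset_dir = os.path.join(base_dir, dataset_name)

# Resize every image in place to 224x224, matching image_shape='3,224,224' used for training
for category in categories:
    label_path = os.path.join(dataset_dir, category)
    for img in os.listdir(label_path):
        img_path = os.path.join(label_path, img)
        i = cv2.imread(img_path)
        if i is None:
            # cv2.imread returns None for files it cannot decode; skip them
            continue
        new_img = cv2.resize(i, (224, 224))
        cv2.imwrite(img_path, new_img)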


# /home/ec2-user/anaconda3/bin/python -m pip install --upgrade pip
# pip install opencv-python
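The resize loop reads every entry of a label folder with `cv2.imread`. If the folders may contain other files (for example hidden files left by sync tools), one option is to filter by extension first. A minimal sketch; the helper name `list_image_files` and the extension set are assumptions:

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed set, adjust as needed


def list_image_files(folder):
    """Return the sorted paths of files in `folder` with an image extension."""
    paths = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        extension = os.path.splitext(name)[1].lower()
        if os.path.isfile(path) and extension in IMAGE_EXTENSIONS:
            paths.append(path)
    return paths
```

The returned paths could then be passed to the resize step instead of the raw `os.listdir` result.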

### Creating the RecordIO files for training

The im2rec.py script can create both LST and RecordIO files.

In [6]:
%%bash
# Use im2rec.py to convert our images into RecordIO files

# Clean existing LST and REC files out of the working directory
# (on a first run there is nothing to delete, so rm reports harmless errors)

cd $BASE_DIR
rm *.rec
rm *.lst

# First create two LST files (training and test); make sure the images are placed in the correct class folders.
# The script also prints the list of all label classes, which we save to a file

echo "Creazione LST file"
python $IM2REC --list --recursive --pass-through --test-ratio=0.3 --train-ratio=0.7 $DATASET_NAME $DATASET_NAME > ${DATASET_NAME}_classes

echo "Classi label:"
cat ${DATASET_NAME}_classes

# Create the RecordIO files from the LST files
echo "Creazione RecordIO files"
python $IM2REC --num-thread=4 ${DATASET_NAME}_train.lst $DATASET_NAME
python $IM2REC --num-thread=4 ${DATASET_NAME}_test.lst $DATASET_NAME
ls -lh *.rec

Creazione LST file
Classi label:
change 0
esultanza 1
game 2
goal 3
guardalinee 4
palo 5
punizione 6
redcard 7
rigore 8
var 9
yellowcard 10
Creazione RecordIO files
Creating .rec file from /tmp/completeDataset_train.lst in /tmp
time: 0.004594326019287109  count: 0
Creating .rec file from /tmp/completeDataset_test.lst in /tmp
time: 0.014521121978759766  count: 0
-rw-rw-r-- 1 ec2-user ec2-user 5.7M May 21 14:58 completeDataset_test.rec
-rw-rw-r-- 1 ec2-user ec2-user  14M May 21 14:58 completeDataset_train.rec


rm: cannot remove ‘*.rec’: No such file or directory
rm: cannot remove ‘*.lst’: No such file or directory
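Before training it can be useful to check how the split distributed the images across classes. The sketch below counts samples per label in LST-formatted text (index, label, path, separated by tabs). The sample content is made up for illustration, and `count_per_label` is a hypothetical helper:

```python
from collections import Counter


def count_per_label(lst_text):
    """Count samples per label id in tab-separated LST text."""
    counts = Counter()
    for line in lst_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # fields[0] is the index, fields[-1] the path, the label sits in between;
        # float() first so that labels written as "7.000000" are accepted too
        label = int(float(fields[1]))
        counts[label] += 1
    return counts


sample = (
    "0\t0.000000\tchange/change.1.jpg\n"
    "1\t7.000000\tredcard/redcard.1.jpg\n"
    "2\t7.000000\tredcard/redcard.2.jpg\n"
)
print(count_per_label(sample))
```

With the real files, the text would come from reading `{base_dir}/{dataset_name}_train.lst` and the matching `_test.lst`.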


### Uploading the training and test data (RecordIO files)

We save the files to S3 so that SageMaker can use them for training.

In [7]:
# Upload the files to the S3 bucket that the SageMaker session is using
bucket = sess.default_bucket()

s3train_path = 's3://{}/{}/train/'.format(bucket, dataset_name)
s3validation_path = 's3://{}/{}/validation/'.format(bucket, dataset_name)

# Remove pre-existing data
!aws s3 rm s3://{bucket}/{dataset_name}/train --recursive
!aws s3 rm s3://{bucket}/{dataset_name}/validation --recursive

# Upload the REC files to the train and validation channels
!aws s3 cp /tmp/{dataset_name}_train.rec $s3train_path
!aws s3 cp /tmp/{dataset_name}_test.rec $s3validation_path

delete: s3://sagemaker-us-east-2-693949087897/completeDataset/train/completeDataset_train.rec
delete: s3://sagemaker-us-east-2-693949087897/completeDataset/validation/completeDataset_test.rec
upload: ../../../tmp/completeDataset_train.rec to s3://sagemaker-us-east-2-693949087897/completeDataset/train/completeDataset_train.rec
upload: ../../../tmp/completeDataset_test.rec to s3://sagemaker-us-east-2-693949087897/completeDataset/validation/completeDataset_test.rec


### Configuring the data used for training

We tell SageMaker where to find the RecordIO files to use for training.

In [8]:
train_data = sagemaker.session.s3_input(
    s3train_path,
    distribution = 'FullyReplicated',
    content_type = 'application/x-recordio',
    s3_data_type = 'S3Prefix'
)

validation_data = sagemaker.session.s3_input(
    s3validation_path,
    distribution = 'FullyReplicated',
    content_type = 'application/x-recordio',
    s3_data_type = 'S3Prefix'
)

data_channels = {'train' : train_data, 'validation' : validation_data}

### Training

#### Creating an image classifier with a basic configuration

In [9]:
s3_output_location = 's3://{}/{}/output'.format(bucket, dataset_name)

image_classifier = sagemaker.estimator.Estimator(
    training_image,
    role,
    train_instance_count = 1,
    train_instance_type = 'ml.p2.xlarge',
    output_path = s3_output_location,
    sagemaker_session = sess
)

print(s3_output_location)

s3://sagemaker-us-east-2-693949087897/completeDataset/output


### Setting some hyperparameters

In [10]:
num_classes=! ls -l {base_dir}/{dataset_name} | wc -l
num_classes=int(num_classes[0]) - 1

num_training_samples=! cat {base_dir}/{dataset_name}_train.lst | wc -l
num_training_samples = int(num_training_samples[0])

# Base hyperparameters derived from the dataset
base_hyperparameters=dict(
    use_pretrained_model=1,
    image_shape='3,224,224',
    num_classes=num_classes,
    num_training_samples=num_training_samples,
)

# Hyperparameters that help the model train successfully
hyperparameters={
    **base_hyperparameters,
    **dict(
        learning_rate=0.001,
        mini_batch_size=5,
    )
}


image_classifier.set_hyperparameters(**hyperparameters)

hyperparameters

{'use_pretrained_model': 1,
 'image_shape': '3,224,224',
 'num_classes': 11,
 'num_training_samples': 600,
 'learning_rate': 0.001,
 'mini_batch_size': 5}
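The two shell pipelines above can also be written in plain Python, which avoids the minus-one correction for the `total` line printed by `ls -l`. A sketch under the assumption that every subdirectory of the dataset folder is one class; the helper names are hypothetical:

```python
import math
import os


def count_classes(dataset_dir):
    """Count the subdirectories of dataset_dir (one per label)."""
    with os.scandir(dataset_dir) as entries:
        return sum(1 for entry in entries if entry.is_dir())


def count_samples(lst_path):
    """Count the non-empty lines of an LST file (one per image)."""
    with open(lst_path) as lst_file:
        return sum(1 for line in lst_file if line.strip())


def batches_per_epoch(num_samples, mini_batch_size):
    """Number of mini-batches needed to see every sample once."""
    return math.ceil(num_samples / mini_batch_size)


# With the values shown above: 600 samples in batches of 5
print(batches_per_epoch(600, 5))
```

In the notebook these would be called as `count_classes(os.path.join(base_dir, dataset_name))` and `count_samples(f"{base_dir}/{dataset_name}_train.lst")`.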

### Starting the training

In [11]:
%%time

import time
now = str(int(time.time()))
training_job_name = 'IC-' + dataset_name.replace('_', '-') + '-' + now

image_classifier.fit(inputs=data_channels, job_name=training_job_name, logs=True)

job = image_classifier.latest_training_job
model_path = f"{base_dir}/{job.name}"

print(f"\n\n Finished training! The model is available for download at: {image_classifier.output_path}/{job.name}/output/model.tar.gz")

2020-05-21 14:58:06 Starting - Starting the training job...
2020-05-21 14:58:08 Starting - Launching requested ML instances.........
2020-05-21 14:59:38 Starting - Preparing the instances for training.........
2020-05-21 15:01:19 Downloading - Downloading input data...
2020-05-21 15:01:56 Training - Downloading the training image......
2020-05-21 15:02:51 Training - Training image download completed. Training in progress.
Docker entrypoint called with argument(s): train
[05/21/2020 15:02:54 INFO 140130713106240] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/image_classification/default-input.json: {u'beta_1': 0.9, u'gamma': 0.9, u'beta_2': 0.999, u'optimizer': u'sgd', u'use_pretrained_model': 0, u'eps': 1e-08, u'epochs': 30, u'lr_scheduler_factor': 0.1, u'num_layers': 152, u'image_shape': u'3,224,224', u'precision_dtype': u'float32', u'mini_batch_size': 32, u'weight_decay': 0.0001, u'learning_rate': 0.1, u'momentum': 0}
[05/21/2020 15:

[05/21/2020 15:06:26 INFO 140130713106240] Epoch[3] Validation-accuracy=0.866667
[05/21/2020 15:06:27 INFO 140130713106240] Storing the best model with validation accuracy: 0.866667
[05/21/2020 15:06:27 INFO 140130713106240] Saved checkpoint to "/opt/ml/model/image-classification-0004.params"
[05/21/2020 15:06:35 INFO 140130713106240] Epoch[4] Batch [20] Speed: 13.023 samples/sec accuracy=0.942857
[05/21/2020 15:06:42 INFO 140130713106240] Epoch[4] Batch [40] Speed: 13.577 samples/sec accuracy=0.951220
[05/21/2020 15:06:49 INFO 140130713106240] Epoch[4] Batch [60] Speed: 13.754 samples/sec accuracy=0.963934
[05/21/2020 15:06:56 INFO 140130713106240] Epoch[4] Batch [80] Speed: 13.843 samples/sec accuracy=0.967901
[05/21/2020 15:07:03 INFO 140130713106240] Epoch[4] Batch [100] Speed: 13.908 samples/sec accuracy=0.966337
[05/21/2020 15:07:10 INFO 140130713106240] Epoch[4] Train-accur

[05/21/2020 15:13:16 INFO 140130713106240] Epoch[12] Batch [40] Speed: 13.545 samples/sec accuracy=0.975610
[05/21/2020 15:13:23 INFO 140130713106240] Epoch[12] Batch [60] Speed: 13.721 samples/sec accuracy=0.977049
[05/21/2020 15:13:30 INFO 140130713106240] Epoch[12] Batch [80] Speed: 13.818 samples/sec accuracy=0.980247
[05/21/2020 15:13:37 INFO 140130713106240] Epoch[12] Batch [100] Speed: 13.851 samples/sec accuracy=0.984158
[05/21/2020 15:13:44 INFO 140130713106240] Epoch[12] Train-accuracy=0.986667
[05/21/2020 15:13:44 INFO 140130713106240] Epoch[12] Time cost=42.809
[05/21/2020 15:13:50 INFO 140130713106240] Epoch[12] Validation-accuracy=0.884615
[05/21/2020 15:13:58 INFO 140130713106240] Epoch[13] Batch [20] Speed: 12.983 samples/sec accuracy=0.990476
[05/21/2020 15:14:06 INFO 140130713106240] Epoch[13] Batch [40] Speed: 13.528 samples/sec accuracy=0.985366

[05/21/2020 15:20:32 INFO 140130713106240] Epoch[21] Batch [20] Speed: 12.993 samples/sec accuracy=1.000000
[05/21/2020 15:20:39 INFO 140130713106240] Epoch[21] Batch [40] Speed: 13.551 samples/sec accuracy=0.985366
[05/21/2020 15:20:46 INFO 140130713106240] Epoch[21] Batch [60] Speed: 13.733 samples/sec accuracy=0.986885
[05/21/2020 15:20:53 INFO 140130713106240] Epoch[21] Batch [80] Speed: 13.822 samples/sec accuracy=0.990123
[05/21/2020 15:21:00 INFO 140130713106240] Epoch[21] Batch [100] Speed: 13.884 samples/sec accuracy=0.992079
[05/21/2020 15:21:07 INFO 140130713106240] Epoch[21] Train-accuracy=0.993333
[05/21/2020 15:21:07 INFO 140130713106240] Epoch[21] Time cost=42.738
[05/21/2020 15:21:13 INFO 140130713106240] Epoch[21] Validation-accuracy=0.898039
[05/21/2020 15:21:21 INFO 140130713106240] Epoch[22] Batch [20] Speed: 12.976 samples/sec accuracy=1.000000

Training seconds: 1642
Billable seconds: 1642


 Finished training! The model is available for download at: s3://sagemaker-us-east-2-693949087897/completeDataset/output/IC-completeDataset-1590073086/output/model.tar.gz
CPU times: user 4.33 s, sys: 152 ms, total: 4.49 s
Wall time: 30min 59s
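The training log reports one `Validation-accuracy` line per epoch. A small sketch for pulling those values out of captured log text with a regular expression; the pattern is written against the log lines shown above, and `validation_accuracies` is a hypothetical helper:

```python
import re

VALIDATION_RE = re.compile(r"Epoch\[(\d+)\] Validation-accuracy=([0-9.]+)")


def validation_accuracies(log_text):
    """Return {epoch: accuracy} for every Validation-accuracy line found."""
    return {
        int(epoch): float(accuracy)
        for epoch, accuracy in VALIDATION_RE.findall(log_text)
    }


# Three lines copied from the log excerpt above
log_excerpt = """
[05/21/2020 15:06:26 INFO 140130713106240] Epoch[3] Validation-accuracy=0.866667
[05/21/2020 15:13:50 INFO 140130713106240] Epoch[12] Validation-accuracy=0.884615
[05/21/2020 15:21:13 INFO 140130713106240] Epoch[21] Validation-accuracy=0.898039
"""
accuracies = validation_accuracies(log_excerpt)
best_epoch = max(accuracies, key=accuracies.get)
print(best_epoch, accuracies[best_epoch])
```

Since the log shown here is only an excerpt, the best epoch found this way refers to the excerpt, not necessarily to the whole run.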


### Deploying the trained model

Once the model has finished training, we use the same image classifier object (image_classifier) to create a fully managed endpoint.

In [12]:
%%time

# Deploy the model to the endpoint; this can take several minutes

deployed_endpoint = image_classifier.deploy(
    initial_instance_count = 1,
    instance_type = 'ml.t2.medium'
)

-----------------------!
CPU times: user 413 ms, sys: 1.16 ms, total: 415 ms
Wall time: 11min 32s


### Cleanup

When we are done with the endpoint, we can delete it so that the instances backing it are released.

In [13]:
# deployed_endpoint.delete_endpoint()

### Using the newly deployed endpoint from Python code

To use the endpoint that was just deployed, call the function defined in the next cell: it takes the path of the image to classify and the list of classes used for training.

In [14]:
# import json
# import numpy as np
# import os

# def classify_deployed(file_name, classes):
#     payload = None
#     with open(file_name, 'rb') as f:
#         payload = f.read()
#         payload = bytearray(payload)
        
#     deployed_endpoint.content_type = 'application/x-image'
#     result = json.loads(deployed_endpoint.predict(payload))
#     best_prob_index = np.argmax(result)
#     return (classes[best_prob_index], result[best_prob_index])
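The core of the commented function is an argmax over the probabilities returned by the endpoint. The sketch below shows that step on its own, without calling the endpoint. The probability values are made up and `top_class` is a hypothetical helper. It assumes the class list is ordered by the label ids printed when the LST files were created (change 0 through yellowcard 10).

```python
CLASSES = [
    "change", "esultanza", "game", "goal", "guardalinee", "palo",
    "punizione", "redcard", "rigore", "var", "yellowcard",
]


def top_class(probabilities, classes=CLASSES):
    """Return (class name, probability) for the highest-scoring class."""
    if len(probabilities) != len(classes):
        raise ValueError("expected one probability per class")
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return classes[best_index], probabilities[best_index]


# Made-up response with most of the probability mass on index 7 (redcard)
fake_response = [0.01, 0.01, 0.02, 0.01, 0.01, 0.01, 0.01, 0.85, 0.03, 0.02, 0.02]
print(top_class(fake_response))
```

The length check matters because a class list shorter than the model's output would either fail with an `IndexError` or silently return the wrong name.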

In [15]:
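# TODO: add an evaluation run

# !aws s3 cp --recursive s3://ahlimgdata/test/ ./test/
#%ls test

# Class names in label-id order, as printed when the LST files were created
# classes = ["change", "esultanza", "game", "goal", "guardalinee", "palo",
#            "punizione", "redcard", "rigore", "var", "yellowcard"]
# for img in os.listdir("test/"):
#     print(img)
#     print(classify_deployed("test/" + img, classes))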


