# Huggingface Sagemaker - Vision Transformer

### Image Classification with `google/vit` on `cifar10`

1. [Introduction](#Introduction)  
2. [Development Environment and Permissions](#Development-Environment-and-Permissions)
    1. [Installation](#Installation)  
    2. [Permissions](#Permissions)
3. [Preprocessing](#Preprocessing)   
    1. [convert features and transform images](#convert-features-and-transform-images)  
    2. [Uploading data to sagemaker_session_bucket](#Uploading-data-to-sagemaker_session_bucket)  
4. [Training code](#Training-code)  
5. [Fine-tuning & starting Sagemaker Training Job](#Fine-tuning-\&-starting-Sagemaker-Training-Job)  
    1. [Creating an Estimator and start a training job](#Creating-an-Estimator-and-start-a-training-job)  

# Introduction

Welcome to our end-to-end image classification example. In this demo, we will use the Hugging Face `transformers` and `datasets` libraries together with Amazon SageMaker to fine-tune a pre-trained Vision Transformer (ViT) for multi-class image classification on CIFAR-10.

The script and notebook are inspired by [Niels Rogge's](https://github.com/NielsRogge) example notebook, [Fine-tune the Vision Transformer on CIFAR-10](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/VisionTransformer/Fine_tuning_the_Vision_Transformer_on_CIFAR_10_with_the_%F0%9F%A4%97_Trainer.ipynb). Niels also contributed the Vision Transformer to `transformers`.


_**NOTE: You can run this demo in Sagemaker Studio, on your local machine, or on Sagemaker Notebook Instances.**_

![Bildschirmfoto%202021-06-09%20um%2010.08.22.png](attachment:Bildschirmfoto%202021-06-09%20um%2010.08.22.png)

# Development Environment and Permissions


_**Use at least a `t3.large` instance; otherwise preprocessing will take a long time.**_

## Installation

_*Note:* We only install the required libraries from Hugging Face and AWS. You also need PyTorch or TensorFlow if it is not already installed._

In [None]:
%pip install "comet_ml>=3.44.0" "sagemaker>=2.140.0" "transformers~=4.36.1" "datasets" s3fs "torch~=2.1.0" --upgrade

## Permissions

_If you are going to use Sagemaker in a local environment, you need access to an IAM Role with the required permissions for Sagemaker. You can find out more about this [here](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html)_

In [None]:
import sagemaker
import boto3

# Uncomment if you need to use a specific AWS profile
# boto3.setup_default_session(profile_name="profile")

sess = sagemaker.Session()
# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it does not exist
sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

role = None

# Uncomment if you need to use a specific AWS Sagemaker Role
# role = "arn:aws:iam::276069367280:role/service-role/AmazonSageMaker-ExecutionRole-20240620T150642"

if role is None:
    try:
        role = sagemaker.get_execution_role()
    except ValueError:
        iam = boto3.client("iam")
        role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

In [None]:
print(role)

# Preprocessing

We use the `datasets` library to download and preprocess the `cifar10` dataset. After preprocessing, the dataset is uploaded to our `sagemaker_session_bucket` to be used within our training job. [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) is a labeled subset of the 80 million tiny images dataset. It was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.


_Note from Niels: "In the ViT paper, the best results were obtained when fine-tuning at a higher resolution. For this, one interpolates the pre-trained absolute position embeddings."_
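
To make the note above concrete, here is a small arithmetic sketch of why the position embeddings have to be interpolated. It assumes square images, 16x16 patches, and a 224x224 pre-training resolution (as the `patch16-224` part of the checkpoint name suggests), plus one extra embedding for the `[CLS]` token. The helper name `num_position_embeddings` and the 384 target resolution are illustrative assumptions, not part of this notebook.

```python
def num_position_embeddings(image_size: int, patch_size: int = 16) -> int:
    """Number of position embeddings a ViT needs for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    # one embedding per patch, plus one for the [CLS] token
    return patches_per_side**2 + 1


pretrain = num_position_embeddings(224)  # resolution assumed for pre-training
finetune = num_position_embeddings(384)  # hypothetical higher fine-tuning resolution
print(pretrain, finetune)  # 197 577
```

Because the pre-trained checkpoint only holds embeddings for the smaller grid, fine-tuning at a higher resolution requires resampling them to the larger grid. This notebook keeps the 224x224 resolution, so no interpolation is needed here.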



## Convert Features and transform images

In [None]:
from transformers import AutoProcessor
from datasets import load_dataset
import numpy as np
from PIL import Image
from random import randint

# dataset used
dataset_name = "cifar10"

# s3 key prefix for the data
s3_prefix = "samples/datasets/cifar10"

# FeatureExtractor used in preprocessing
model_name = "google/vit-base-patch16-224-in21k"

image_processor = AutoProcessor.from_pretrained(model_name)

We downsample the dataset to make preprocessing faster.

In [None]:
# load dataset
train_dataset, test_dataset = load_dataset(
    dataset_name, split=["train[:500]", "test[:200]"]
)

# display a random sample
train_dataset[randint(0, len(train_dataset) - 1)]["img"]

In [None]:
from datasets import Features, Array3D

# we need to extend the features
features = Features(
    {
        **train_dataset.features,
        "pixel_values": Array3D(dtype="float32", shape=(3, 224, 224)),
    }
)

# extractor helper function
def preprocess_images(examples):
    # get batch of images
    images = examples["img"]
    inputs = image_processor(images=images)
    examples["pixel_values"] = inputs["pixel_values"]

    return examples


# preprocess dataset
train_dataset = train_dataset.map(preprocess_images, batched=True, features=features)
test_dataset = test_dataset.map(preprocess_images, batched=True, features=features)

# set to torch format for training
train_dataset.set_format("torch", columns=["pixel_values", "label"])
test_dataset.set_format("torch", columns=["pixel_values", "label"])

# remove the unused raw image column from both splits
train_dataset = train_dataset.remove_columns("img")
test_dataset = test_dataset.remove_columns("img")
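
As a rough illustration of what the image processor does to each pixel, the sketch below rescales an 8-bit channel value to `[0, 1]` and then normalizes it. The mean and standard deviation of `0.5` are an assumption about this checkpoint's preprocessing config; you can check the actual values on `image_processor.image_mean` and `image_processor.image_std`. The helper `rescale_and_normalize` is hypothetical and leaves out the resize to 224x224.

```python
def rescale_and_normalize(pixel: int, mean: float = 0.5, std: float = 0.5) -> float:
    """Map an 8-bit channel value to the range the model is assumed to expect."""
    scaled = pixel / 255.0  # rescale 0..255 to 0..1
    return (scaled - mean) / std  # center on the mean and divide by the std


values = [rescale_and_normalize(p) for p in (0, 255)]
print(values)  # [-1.0, 1.0]
```

Under these assumptions, every entry in `pixel_values` lies between -1 and 1.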

## Uploading data to `sagemaker_session_bucket`

After processing the datasets, we use the `FileSystem` [integration](https://huggingface.co/docs/datasets/filesystems.html) of `datasets` to upload them to S3.

In [None]:
import botocore
from s3fs import S3FileSystem

# save train_dataset to s3
training_input_path = f"s3://{sess.default_bucket()}/{s3_prefix}/train"
train_dataset.save_to_disk(training_input_path, num_shards=1)

# save test_dataset to s3
test_input_path = f"s3://{sess.default_bucket()}/{s3_prefix}/test"
test_dataset.save_to_disk(test_input_path, num_shards=1)

print(f"train dataset is uploaded to {training_input_path}")
print(f"test dataset is uploaded to {test_input_path}")

# Training code

Here is our training code:

In [None]:
%%writefile src/train.py

import comet_ml
from transformers import ViTForImageClassification, Trainer, TrainingArguments, default_data_collator
from datasets import load_from_disk, load_metric
import random
import logging
import sys
import argparse
import os
import numpy as np
import subprocess

subprocess.run([
        "git",
        "config",
        "--global",
        "user.email",
        "sagemaker@huggingface.co",
    ], check=True)
subprocess.run([
        "git",
        "config",
        "--global",
        "user.name",
        "sagemaker",
    ], check=True)


def main(args):
    experiment = comet_ml.start()
    
    # Set up logging
    logger = logging.getLogger(__name__)

    logging.basicConfig(
        level=logging.getLevelName("INFO"),
        handlers=[logging.StreamHandler(sys.stdout)],
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    )

    # load datasets
    train_dataset = load_from_disk(args.training_dir)
    test_dataset = load_from_disk(args.test_dir)
    num_classes = train_dataset.features["label"].num_classes


    logger.info(f" loaded train_dataset length is: {len(train_dataset)}")
    logger.info(f" loaded test_dataset length is: {len(test_dataset)}")

    metric_name = "accuracy"
    # compute metrics function for multi-class classification

    metric = load_metric(metric_name)

    def compute_metrics(eval_pred):
        predictions, labels = eval_pred
        predictions = np.argmax(predictions, axis=1)
        return metric.compute(predictions=predictions, references=labels)

    # download model from model hub
    model = ViTForImageClassification.from_pretrained(args.model_name,num_labels=num_classes)
    
    # change labels
    id2label =  {key:train_dataset.features["label"].names[index] for index,key in enumerate(model.config.id2label.keys())}
    label2id =  {train_dataset.features["label"].names[index]:value for index,value in enumerate(model.config.label2id.values())}
    model.config.id2label = id2label
    model.config.label2id = label2id
    
    
    # define training args
    training_args = TrainingArguments(
        output_dir=args.output_dir,
        num_train_epochs=args.num_train_epochs,
        per_device_train_batch_size=args.per_device_train_batch_size,
        per_device_eval_batch_size=args.per_device_eval_batch_size,
        warmup_steps=args.warmup_steps,
        weight_decay=args.weight_decay,
        evaluation_strategy="steps",
        logging_dir=f"{args.output_dir}/logs",
        learning_rate=float(args.learning_rate),
        load_best_model_at_end=True,
        metric_for_best_model=metric_name,
    )
    
    
    # create Trainer instance
    trainer = Trainer(
        model=model,
        args=training_args,
        compute_metrics=compute_metrics,
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
        data_collator=default_data_collator,
    )

    # train model
    trainer.train()

    # evaluate model
    eval_result = trainer.evaluate(eval_dataset=test_dataset)

    # write eval results to a file that can be accessed later in the S3 output
    with open(os.path.join(args.output_dir, "eval_results.txt"), "w") as writer:
        print("***** Eval results *****")
        for key, value in sorted(eval_result.items()):
            writer.write(f"{key} = {value}\n")

    # save the model to output_dir; SageMaker uploads /opt/ml/model to S3 when the job ends
    trainer.save_model(args.output_dir)


if __name__ == "__main__":

    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument("--model_name", type=str)
    parser.add_argument("--output_dir", type=str,default="/opt/ml/model")
    parser.add_argument("--extra_model_name", type=str,default="sagemaker")
    parser.add_argument("--dataset", type=str,default="cifar10")
    parser.add_argument("--task", type=str,default="image-classification")

    parser.add_argument("--num_train_epochs", type=int, default=3)
    parser.add_argument("--per_device_train_batch_size", type=int, default=32)
    parser.add_argument("--per_device_eval_batch_size", type=int, default=64)
    parser.add_argument("--warmup_steps", type=int, default=500)
    parser.add_argument("--weight_decay", type=float, default=0.01)
    parser.add_argument("--learning_rate", type=float, default=2e-5)

    parser.add_argument("--training_dir", type=str, default=os.environ["SM_CHANNEL_TRAIN"])
    parser.add_argument("--test_dir", type=str, default=os.environ["SM_CHANNEL_TEST"])

    args, _ = parser.parse_known_args()

    main(args)
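
The `compute_metrics` function in the script takes the model's logits, picks the highest-scoring class for each image, and compares it with the label. The dependency-free sketch below mirrors that logic. `accuracy_from_logits` is a hypothetical stand-in for the `accuracy` metric and is not the code the script runs, and the logits are made up for illustration.

```python
def accuracy_from_logits(logits, labels):
    """Return accuracy given per-class scores and integer labels."""
    # argmax over the class axis: index of the largest score in each row
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# four made-up examples with three classes each
logits = [
    [0.1, 2.0, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 3.0],
    [2.0, 1.0, 0.5],
]
labels = [1, 0, 2, 1]
print(accuracy_from_logits(logits, labels))  # {'accuracy': 0.75}
```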

We also need to add a few dependencies:

In [None]:
%%writefile src/requirements.txt

comet_ml

# Fine-tuning & starting Sagemaker Training Job

To create a SageMaker training job we need a `HuggingFace` Estimator. The Estimator handles end-to-end Amazon SageMaker training and deployment tasks. In an Estimator, we define which fine-tuning script should be used as `entry_point`, which `instance_type` should be used, which `hyperparameters` are passed in, and so on. The hyperparameters are passed to the script as command-line arguments, so the training container runs a command like this:

```bash
/opt/conda/bin/python train.py --num_train_epochs 1 --model_name google/vit-base-patch16-224-in21k --per_device_train_batch_size 16
```
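
The hand-off shown in the command above can be imitated locally. The sketch below flattens a hyperparameter dictionary into command-line arguments and parses them with a reduced copy of the parser from `train.py`. The helper `to_cli_args` is hypothetical and only mimics the idea; the actual conversion happens inside the SageMaker training container.

```python
import argparse


def to_cli_args(hyperparameters: dict) -> list:
    """Flatten a hyperparameter dict into ['--key', 'value', ...]."""
    cli_args = []
    for key, value in hyperparameters.items():
        cli_args += [f"--{key}", str(value)]
    return cli_args


hyperparameters = {
    "num_train_epochs": 1,
    "model_name": "google/vit-base-patch16-224-in21k",
    "per_device_train_batch_size": 16,
}

# reduced copy of the parser defined in train.py
parser = argparse.ArgumentParser()
parser.add_argument("--model_name", type=str)
parser.add_argument("--num_train_epochs", type=int, default=3)
parser.add_argument("--per_device_train_batch_size", type=int, default=32)
parser.add_argument("--learning_rate", type=float, default=2e-5)

# parse_known_args ignores any extra arguments the container may append
args, _ = parser.parse_known_args(to_cli_args(hyperparameters))
print(args.num_train_epochs, args.per_device_train_batch_size, args.learning_rate)
```

Hyperparameters that are not passed, such as `learning_rate` here, fall back to the defaults declared in the parser.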

## Creating an Estimator and start a training job

In [None]:
from sagemaker.huggingface import HuggingFace

# hyperparameters, which are passed into the training job
hyperparameters = {
    "num_train_epochs": 3,  # train epochs
    "per_device_train_batch_size": 16,  # batch size
    "model_name": model_name,  # model which will be trained on
}

In [None]:
import comet_ml.config

COMET_API_KEY = comet_ml.config.get_config()["comet.api_key"]

huggingface_estimator = HuggingFace(
    entry_point="train.py",
    source_dir="./src",
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    role=role,
    transformers_version="4.36",
    pytorch_version="2.1",
    py_version="py310",
    hyperparameters=hyperparameters,
    environment={
        "COMET_API_KEY": COMET_API_KEY,
    },
)

In [None]:
# starting the train job with our uploaded datasets as input
huggingface_estimator.fit({"train": training_input_path, "test": test_input_path})
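
Once the job finishes, SageMaker packages the contents of `/opt/ml/model` (the script's default `output_dir`) into the model artifact in S3, including the `eval_results.txt` file written by `train.py`. The sketch below parses that file's `key = value` format after you have downloaded and extracted the artifact. `parse_eval_results` is a hypothetical helper, and the sample text contains made-up placeholder values, not results from a real run.

```python
def parse_eval_results(text: str) -> dict:
    """Parse 'key = value' lines as written by train.py into a dict of floats."""
    results = {}
    for line in text.splitlines():
        if " = " not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(" = ", 1)
        results[key.strip()] = float(value)
    return results


# placeholder content for illustration only
sample = "epoch = 3.0\neval_accuracy = 0.9\neval_loss = 0.35\n"
print(parse_eval_results(sample))
```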