# Sentiment Analysis App with Hugging Face Distributed GPU Training and PyTorch Lightning

In this notebook, we reimplement the work from the SageMaker Project notebook, this time with a pretrained language model from Hugging Face and with PyTorch Lightning for training. The model trained in the SageMaker Project notebook was an LSTM, an architecture that is now outdated for language modeling.

## Setup

In [None]:
import sagemaker
from sagemaker.huggingface import HuggingFace
sagemaker.__version__

In [None]:
import sagemaker
from sagemaker.huggingface import HuggingFace

# gets role for executing training job
role = sagemaker.get_execution_role()
hyperparameters = {
    'model_name_or_path': 'distilbert-base-uncased',
    'output_dir': '/opt/ml/model',
    'dataset_name': 'imdb',
    'do_train': True,
    'do_eval': True,
    'per_device_train_batch_size': 12,
    'num_train_epochs': 5,
    'max_seq_length': 128,
    'fp16': True,
    'pad_to_max_length': True,
}

# git configuration to download our fine-tuning script
git_config = {'repo': 'https://github.com/huggingface/transformers.git','branch': 'v4.6.1'}

# configuration for running training on smdistributed Data Parallel
# smdistributed = SageMaker Distributed
distribution = {'smdistributed': {'dataparallel':{'enabled': True}}}

# creates Hugging Face estimator
huggingface_estimator = HuggingFace(
    entry_point='run_glue.py',
    source_dir='./examples/pytorch/text-classification',
    instance_type='ml.p3dn.24xlarge',  # has 8 GPUs
    instance_count=2,  # changed to 2 instances
    role=role,
    git_config=git_config,
    transformers_version='4.6.1',
    pytorch_version='1.7.1',
    py_version='py36',
    hyperparameters=hyperparameters,
    distribution=distribution,  # without this, the smdistributed config above is never used
)

# starting the train job
huggingface_estimator.fit()
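
As a rough check on the configuration above, the effective (global) batch size under data parallelism is the per-device batch size multiplied by the total number of GPUs. The sketch below assumes one worker per GPU on both instances and no gradient accumulation. The constant and function names are illustrative and not part of the SageMaker API.

```python
# Illustrative arithmetic only; these names are not part of any SageMaker API.
INSTANCE_COUNT = 2                 # instance_count passed to the estimator
GPUS_PER_INSTANCE = 8              # ml.p3dn.24xlarge has 8 GPUs
PER_DEVICE_TRAIN_BATCH_SIZE = 12   # from the hyperparameters dict


def global_batch_size(instance_count, gpus_per_instance, per_device_batch_size):
    """Samples consumed per optimizer step across all workers.

    Assumes one worker per GPU and no gradient accumulation.
    """
    world_size = instance_count * gpus_per_instance
    return world_size * per_device_batch_size


print(global_batch_size(INSTANCE_COUNT, GPUS_PER_INSTANCE, PER_DEVICE_TRAIN_BATCH_SIZE))  # 192
```

Under these assumptions each optimizer step sees 192 examples, so the number of steps per epoch is much lower than in a single-GPU run with the same per-device batch size.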

In [None]:
!pip install transformers torch==1.6.0

In [None]:
import os
import tarfile
from sagemaker.s3 import S3Downloader

local_path = 'imdb_sentiment_distributed_transformer'

os.makedirs(local_path, exist_ok=True)

# download the trained model artifact from S3
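
SageMaker training jobs typically store the trained model as a gzip-compressed tar archive, which is presumably why `tarfile` is imported in the cell above. The following is a minimal sketch of the unpacking step, assuming the archive has already been downloaded into `local_path`. The helper name `extract_model_archive` and the `model.tar.gz` filename are assumptions and do not come from the original notebook.

```python
import os
import tarfile


def extract_model_archive(archive_path, dest_dir):
    """Unpack a gzip-compressed tar archive into dest_dir.

    Returns the list of member names found in the archive.
    """
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = archive.getnames()
        archive.extractall(path=dest_dir)
    return names


# Hypothetical usage, assuming the archive was downloaded to local_path:
# extract_model_archive(os.path.join(local_path, "model.tar.gz"), local_path)
```

Only unpack archives you produced yourself this way, since `extractall` writes whatever paths the archive contains.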
