# Training the model remotely on a SageMaker Training Instance

In this notebook we train a recommendation model on the trade data we created and produce recommendations for each investor.

The model code is in `train.py`. We execute this script remotely on a SageMaker training instance.

We start by importing the SageMaker library and getting the execution role, the AWS region and a SageMaker session.

In [None]:
import boto3
import sagemaker
from sagemaker import get_execution_role

aws_role = get_execution_role()           # IAM role the training job will assume
aws_region = boto3.Session().region_name  # region of the current session
sess = sagemaker.Session()
aws_role

Next we define the S3 bucket and the input and output paths that we are going to use.

In [None]:
bucket = sess.default_bucket()

training_dataset_s3_path = f"s3://{bucket}/input"
s3_output_location = f"s3://{bucket}/output"
print(training_dataset_s3_path)
print(s3_output_location)

Now we upload our data to the bucket.

In [None]:
!aws s3 cp trades.tsv {training_dataset_s3_path}/trades.tsv
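The same upload can be done from Python with boto3 instead of the AWS CLI. The sketch below is a minimal alternative, assuming the `input/trades.tsv` key layout used above; the helper name `upload_training_data` is hypothetical, and the S3 client is passed in so the function can be exercised without AWS access.

```python
def upload_training_data(s3_client, local_path, bucket, key):
    """Upload a local file to s3://bucket/key and return the resulting S3 URI.

    Hypothetical helper; s3_client is expected to be a boto3 S3 client,
    e.g. boto3.client("s3").
    """
    # boto3's S3 client provides upload_file(Filename, Bucket, Key)
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

In the notebook this would be called as `upload_training_data(boto3.client("s3"), "trades.tsv", bucket, "input/trades.tsv")`, which returns the same path as `f"{training_dataset_s3_path}/trades.tsv"`.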

Before starting the training job, we collect the code that will be executed in `source_dir`: the Python scripts, the wheel files and `requirements.txt`.

In [None]:
!mkdir -p source_dir
!cp *.py *.whl requirements.txt source_dir
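The two shell commands above can also be written in plain Python, which is handy outside a notebook. This is a minimal sketch; the function name `prepare_source_dir` is hypothetical. Unlike `cp`, it silently skips patterns that match no files instead of failing.

```python
import glob
import os
import shutil


def prepare_source_dir(src_dir=".", dest_dir="source_dir"):
    """Copy *.py, *.whl and requirements.txt from src_dir into dest_dir.

    Mirrors `mkdir -p source_dir` followed by
    `cp *.py *.whl requirements.txt source_dir`.
    Returns the sorted list of copied file names.
    """
    os.makedirs(dest_dir, exist_ok=True)  # like mkdir -p
    copied = []
    for pattern in ["*.py", "*.whl", "requirements.txt"]:
        for path in glob.glob(os.path.join(src_dir, pattern)):
            shutil.copy2(path, dest_dir)  # copy file contents and metadata
            copied.append(os.path.basename(path))
    return sorted(copied)
```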

Now we define the training job as an Estimator object. 

The execution environment is defined by the instance type, the Docker image and the `source_dir`.

The command is defined by the entry point and the hyperparameters.

In [None]:
from sagemaker.estimator import Estimator
from sagemaker.utils import name_from_base

training_job_name = name_from_base("job")  # appends a timestamp so the job name is unique

estimator = Estimator(
    role=aws_role,
    instance_count=1,
    instance_type='ml.c5.4xlarge',
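    # Region-specific image URI (us-east-1); sagemaker.image_uris.retrieve can look up the URI for other regions
    image_uri='763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.12.1-cpu-py38-ubuntu20.04-sagemaker',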
    source_dir='./source_dir',
    entry_point="train.py",
    hyperparameters={  # passed to train.py as command-line arguments
        'datapath': 'trades.tsv',
        'training-index': 71,
        'prediction-index': 73,
    },
    max_run=360000,  # maximum runtime in seconds (100 hours)
)

Next we start the training job, passing the S3 input path as the `training` channel, together with the job name.

In [None]:
estimator.fit({"training": training_dataset_s3_path}, logs=True, job_name=training_job_name)
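For reference, this is roughly how the hyperparameters and the `training` channel reach the script. In script mode the SageMaker training toolkit passes hyperparameters to the entry point as `--key value` command-line arguments (e.g. `--training-index 71`) and downloads the `training` channel to `/opt/ml/input/data/training`, which it exposes as the `SM_CHANNEL_TRAINING` environment variable. The actual `train.py` is not shown here, so the parser below is a hypothetical sketch; in particular, the `--data-dir` option is an illustrative addition, not something SageMaker passes.

```python
import argparse
import os


def parse_args(argv=None):
    """Hypothetical argument parser for train.py (the real script may differ)."""
    parser = argparse.ArgumentParser()
    # Hyperparameters from the Estimator arrive as --key value pairs
    parser.add_argument("--datapath", type=str, default="trades.tsv")
    parser.add_argument("--training-index", type=int, required=True)
    parser.add_argument("--prediction-index", type=int, required=True)
    # Hypothetical option: where the "training" channel lives inside the container
    parser.add_argument(
        "--data-dir",
        type=str,
        default=os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training"),
    )
    return parser.parse_args(argv)
```

Inside the container, `args = parse_args()` would then let the script open `os.path.join(args.data_dir, args.datapath)`, i.e. the `trades.tsv` file uploaded earlier. Note that argparse turns `--training-index` into the attribute `args.training_index`.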

Now we download the job output from S3 and extract the archives.

In [None]:
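# Download everything stored for this job; without an explicit output_path,
# the estimator writes to s3://<default bucket>/<job name>/
!rm -rf training_job
!aws s3 sync s3://{bucket}/{training_job_name} training_job
# model.tar.gz holds /opt/ml/model, output.tar.gz holds /opt/ml/output/data
!tar xvf training_job/output/model.tar.gz -C training_job/output
!tar xvf training_job/output/output.tar.gz -C training_job/output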
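The `tar xvf` commands above can be reproduced with Python's `tarfile` module. This is a minimal sketch; the function name `extract_job_archives` is hypothetical. On Python versions that support extraction filters it uses the `"data"` filter, which rejects unsafe members such as absolute paths.

```python
import os
import tarfile


def extract_job_archives(output_dir="training_job/output",
                         archives=("model.tar.gz", "output.tar.gz")):
    """Extract each archive into output_dir, like `tar xvf <archive> -C <output_dir>`.

    Returns a dict mapping each archive name to the member names it contained.
    """
    extracted = {}
    for name in archives:
        path = os.path.join(output_dir, name)
        with tarfile.open(path, "r:*") as tar:  # auto-detect compression, as tar does
            extracted[name] = tar.getnames()
            if hasattr(tarfile, "data_filter"):  # extraction filters are available
                tar.extractall(output_dir, filter="data")
            else:
                tar.extractall(output_dir)
    return extracted
```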

Finally, we load and explore the recommendations.

In [None]:
import pandas as pd

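# Recommendations produced by the training job (inside the extracted output archive)
df = pd.read_csv('training_job/output/TransE_l2_trades_2017-03-31_2017-09-30_0/reco.tsv', sep='\t')
df = df.drop(columns='rel')  # drop the relation column
df.columns = ['investor', 'security', 'score']
df.shape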

In [None]:
df
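To get a quick overview, it can help to keep only the best few recommendations per investor. The sketch below assumes that a higher `score` means a stronger recommendation; whether that holds depends on how `train.py` computes the score. The function name `top_k_per_investor` is hypothetical.

```python
import pandas as pd


def top_k_per_investor(df: pd.DataFrame, k: int = 5) -> pd.DataFrame:
    """Return the k highest-scoring securities for each investor.

    Assumes columns 'investor', 'security', 'score' and that higher scores are better.
    """
    return (
        df.sort_values(["investor", "score"], ascending=[True, False])
          .groupby("investor", sort=False)
          .head(k)  # keeps the first k rows of each investor, in sorted order
          .reset_index(drop=True)
    )
```

With the DataFrame loaded above, `top_k_per_investor(df, k=10)` would list ten candidate securities per investor.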

To look up a specific investor without typing the exact name, we use a fuzzy matching library, `thefuzz`.

In [None]:
%pip install "thefuzz[speedup]"

In [None]:
from thefuzz import process

In [None]:
query = 'rennaissan'
choices = df.investor.unique()
response = process.extractOne(query, choices)[0]  # extractOne returns (best match, score)
response
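If installing `thefuzz` is not an option, Python's standard `difflib` module offers a simpler alternative. It is a different scorer (the `difflib.SequenceMatcher` ratio, not thefuzz's), so rankings can differ on close calls. The helper name `best_match` is hypothetical, and it assumes all choices are strings.

```python
import difflib


def best_match(query, choices):
    """Return the choice most similar to query (case-insensitive), or None if there are none."""
    # Map lowercase names back to their original spelling
    lowered = {c.lower(): c for c in choices}
    # cutoff=0.0 accepts every candidate; n=1 keeps only the best-scoring one
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=0.0)
    return lowered[hits[0]] if hits else None
```

In this notebook it would be called as `best_match(query, df.investor.unique())`.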

In [None]:
df[df.investor.eq(response)]