# Training your First Model in Python

In the previous recipe, we generated a scatterplot to explore the relationship between the two variables in the dataset. In this recipe, we will use the SageMaker **Linear Learner** built-in algorithm to build a linear regression model that predicts a professional’s monthly salary from the number of months of relevant managerial experience. The recipe demonstrates how a SageMaker built-in algorithm is used in a machine learning experiment: we perform the train-test split and then run the training job.

![Overview of the steps in this recipe](../Extra/chap01/05.png)

The image above summarizes what we will do in this recipe. Using the DataFrame stored in the recipe `Visualizing and Understanding your Data in Python`, we will perform the train-test split and use the training dataset to train and build the model.

### How to do it...

In [None]:
%store -r df_all_data

In [None]:
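from sklearn.model_selection import train_test_split

# Feature (months of managerial experience) and target (monthly salary)
X = df_all_data['management_experience_months'].values
y = df_all_data['monthly_salary'].values

# Hold out 30% of the records for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)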
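
To make the holdout idea concrete, here is a minimal pure-Python sketch of a shuffled 70/30 split. It is an illustration only and not how scikit-learn implements `train_test_split`; the helper name `holdout_split` and the sample values are hypothetical.

```python
import random


def holdout_split(features, targets, test_size=0.3, seed=0):
    """Shuffle paired records and hold out a fraction for testing.

    Hypothetical helper for illustration; not scikit-learn's implementation.
    """
    if len(features) != len(targets):
        raise ValueError("features and targets must have the same length")

    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible

    n_test = round(len(indices) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    x_train = [features[i] for i in train_idx]
    x_test = [features[i] for i in test_idx]
    y_train = [targets[i] for i in train_idx]
    y_test = [targets[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Made-up sample values: months of experience and monthly salary
months = [1, 5, 12, 18, 24, 30, 36, 48, 60, 72]
salary = [1000, 1200, 1500, 1800, 2100, 2300, 2600, 3100, 3700, 4200]

x_tr, x_te, y_tr, y_te = holdout_split(months, salary)
print(len(x_tr), len(x_te))
```

With ten records and `test_size=0.3`, seven records end up in the training set and three in the test set. Each feature stays paired with its target because both lists are indexed by the same shuffled positions.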

In [None]:
X_train

In [None]:
X_test

In [None]:
import pandas as pd

# Linear Learner expects the target in the first CSV column, so monthly_salary goes first
df_training_data = pd.DataFrame({'monthly_salary': y_train, 'management_experience_months': X_train})
df_training_data

In [None]:
!mkdir -p tmp

In [None]:
# Write without a header row or index column so the first column is the target
df_training_data.to_csv('tmp/training_data.csv', header=False, index=False)
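
Linear Learner reads CSV training data without a header row and treats the first column as the target. The sketch below, using only the standard library, shows the kind of sanity check that can be run on the file contents before uploading. The function name `check_training_csv` and the sample rows are hypothetical.

```python
import csv
import io


def check_training_csv(text, expected_columns=2):
    """Return the parsed rows if every cell is numeric, else raise ValueError.

    Hypothetical helper: a header row such as 'monthly_salary,...' fails the
    numeric check, which is the mistake this is meant to catch.
    """
    rows = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != expected_columns:
            raise ValueError(
                f"line {line_no}: expected {expected_columns} columns, got {len(row)}"
            )
        try:
            rows.append([float(cell) for cell in row])
        except ValueError:
            raise ValueError(f"line {line_no}: non-numeric value in {row}") from None
    return rows


# Made-up rows in the expected layout: target first, then the feature
sample = "1500.0,12\n2100.0,24\n"
rows = check_training_csv(sample)
print(rows)
```

Passing the contents of a file that was written with `header=True` would raise a `ValueError` on line 1, since the column names are not numeric.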

In [None]:
s3_bucket = '<insert bucket name here>'
prefix = 'chapter01'

In [None]:
!aws s3 cp tmp/training_data.csv s3://{s3_bucket}/{prefix}/input/training_data.csv

In [None]:
import sagemaker
import boto3
from sagemaker import get_execution_role

# IAM role attached to this notebook, used by the training job to access AWS resources
role = get_execution_role()
session = sagemaker.Session()
region_name = boto3.Session().region_name

In [None]:
training_s3_input_location = f"s3://{s3_bucket}/{prefix}/input/training_data.csv"
training_s3_output_location = f"s3://{s3_bucket}/{prefix}/output/"

In [None]:
from sagemaker.inputs import TrainingInput

train = TrainingInput(training_s3_input_location, content_type="text/csv")

In [None]:
train.__dict__

In [None]:
from sagemaker.image_uris import retrieve

# Look up the Linear Learner container image URI for this region (version "1")
container = retrieve("linear-learner", region_name, "1")
container

In [None]:
estimator = sagemaker.estimator.Estimator(
    container,
    role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path=training_s3_output_location,
    sagemaker_session=session)

In [None]:
# 'regressor' selects linear regression (continuous target); each mini-batch holds 4 records
estimator.set_hyperparameters(predictor_type='regressor', mini_batch_size=4)
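
With `predictor_type='regressor'`, the goal is a line `salary ≈ w * months + b` that fits the training data. As a rough illustration of that objective, the sketch below computes the ordinary least squares solution for a single feature in closed form. This is not the optimization procedure Linear Learner runs internally; the function name `fit_line` and the data points are made up.

```python
def fit_line(xs, ys):
    """Closed-form least squares fit of y = w * x + b for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # w = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


# Made-up points lying exactly on salary = 50 * months + 1000
months = [0, 12, 24, 36]
salary = [1000, 1600, 2200, 2800]

w, b = fit_line(months, salary)
print(w, b)
```

Because the sample points lie exactly on a line, the fit recovers a slope of 50 and an intercept of 1000. Real salary data is noisy, so a trained model gives the line that is closest on average instead of an exact match.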

In [None]:
estimator.fit({'train': train})

In [None]:
# S3 location of the trained model artifact produced by the training job
model_data = estimator.model_data
model_data
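
`estimator.model_data` is an S3 URI pointing at the model artifact. If a later step needs the bucket and key separately, for example to download the artifact with an S3 client, the URI can be split with the standard library. The helper name `split_s3_uri` and the example URI below are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical URI; the real value comes from estimator.model_data
example_uri = "s3://example-bucket/chapter01/output/example-job/output/model.tar.gz"
bucket, key = split_s3_uri(example_uri)
print(bucket)
print(key)
```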

In [None]:
%store model_data

In [None]:
# Note: despite the variable name, this is the URI of the container image used for training
model_uri = estimator.image_uri
model_uri

In [None]:
%store model_uri

In [None]:
%store X_test
%store y_test

<img align="left" width="130" src="https://raw.githubusercontent.com/PacktPublishing/Amazon-SageMaker-Cookbook/master/Extra/cover-small-padded.png"/>

This notebook contains the code to help readers work through one of the recipes of the book [Machine Learning with Amazon SageMaker Cookbook: 80 proven recipes for data scientists and developers to perform ML experiments and deployments](https://www.amazon.com/Machine-Learning-Amazon-SageMaker-Cookbook/dp/1800567030).