# Using Built-in Algorithms

In this lesson, we explore the built-in algorithms that Amazon SageMaker provides for model training. By the end of this lesson, you will be able to identify several of these algorithms, match them to their use cases, and train a model with one of them.

## Learning Objectives
- Identify built-in algorithms available in SageMaker.
- Understand the use cases for different algorithms.
- Train a model using a built-in algorithm effectively.

## Why This Matters

Built-in algorithms simplify model training, so you can focus on data and results rather than implementation details. You configure a training job (algorithm, hyperparameters, input data, and instance type) instead of writing the training code yourself, which shortens the path from a dataset to a trained model.

### Concept 1: Built-in Algorithms

Built-in algorithms are implementations that SageMaker provides and maintains, so you can apply established machine learning techniques without implementing them from scratch. Each one is packaged as a container image that a training job refers to by its image URI.

In [None]:
# Example: List of built-in algorithms in SageMaker
# This code snippet lists some of the built-in algorithms available in SageMaker.

built_in_algorithms = [
    'XGBoost',
    'Linear Learner',
    'Factorization Machines',
    'K-Means',
    'Object Detection'
]

print('Built-in Algorithms in SageMaker:')
for algo in built_in_algorithms:
    print('-', algo)
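
The algorithms listed above address different kinds of learning problems. As a rough orientation, the sketch below groups them by learning type and task. The `ALGORITHM_CATALOG` dictionary and the `algorithms_for` helper are hypothetical names used only for illustration (they are not part of the SageMaker SDK), and the task lists are simplified summaries rather than a complete description of each algorithm.

```python
# Illustrative catalog (hypothetical structure, not a SageMaker API)
ALGORITHM_CATALOG = {
    'XGBoost': {'learning': 'supervised', 'tasks': ['classification', 'regression']},
    'Linear Learner': {'learning': 'supervised', 'tasks': ['classification', 'regression']},
    'Factorization Machines': {'learning': 'supervised', 'tasks': ['classification', 'regression']},
    'K-Means': {'learning': 'unsupervised', 'tasks': ['clustering']},
    'Object Detection': {'learning': 'supervised', 'tasks': ['object detection']},
}


def algorithms_for(task):
    """Return the names of catalog entries that list the given task, sorted alphabetically."""
    return sorted(name for name, info in ALGORITHM_CATALOG.items() if task in info['tasks'])


print(algorithms_for('clustering'))
print(algorithms_for('regression'))
```

Several algorithms can serve the same task, which is why the next concept looks at how to choose between them.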

### Micro-Exercise 1

Identify at least three built-in algorithms available in SageMaker.

In [None]:
# Micro-exercise starter code
# List of built-in algorithms
# Fill in the list with your answers
my_built_in_algorithms = [
    'Algorithm 1',
    'Algorithm 2',
    'Algorithm 3'
]

print('Your built-in algorithms:')
for algo in my_built_in_algorithms:
    print('-', algo)

### Concept 2: Algorithm Selection

Algorithm selection involves choosing the most suitable algorithm for a given problem based on the nature of the data and the desired outcome. Different algorithms have different strengths and weaknesses.

In [None]:
# Example: Factors influencing algorithm choice
# This code snippet illustrates how to choose an algorithm based on the problem type.

problem_type = 'classification'

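if problem_type == 'classification':
    recommended_algorithm = 'XGBoost'
elif problem_type == 'regression':
    recommended_algorithm = 'Linear Learner'
elif problem_type == 'clustering':
    recommended_algorithm = 'K-Means'
else:
    # Fail loudly instead of silently recommending K-Means for a typo or unsupported type
    raise ValueError(f'Unknown problem type: {problem_type}')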

print(f'Recommended algorithm for {problem_type}: {recommended_algorithm}')
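
The problem type itself usually follows from the data: whether labels exist, and whether the label is a category or a number. The sketch below encodes that reasoning as a small function. `recommend_algorithm` and its two parameters are hypothetical and the rule is a teaching heuristic that mirrors the pairings used in this lesson; in practice more than one built-in algorithm can fit the same problem.

```python
def recommend_algorithm(has_labels, target_is_numeric=False):
    """Suggest a built-in algorithm from two coarse properties of the data."""
    if not has_labels:
        # No labels: only unsupervised methods such as clustering apply
        return 'K-Means'
    if target_is_numeric:
        # Labeled data with a continuous target: regression
        return 'Linear Learner'
    # Labeled data with a categorical target: classification
    return 'XGBoost'


print(recommend_algorithm(has_labels=True))
print(recommend_algorithm(has_labels=True, target_is_numeric=True))
print(recommend_algorithm(has_labels=False))
```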

### Micro-Exercise 2

Match the following algorithms to their appropriate use cases:
1. XGBoost
2. Linear Learner
3. K-Means

- A. Predicting housing prices
- B. Customer segmentation
- C. Credit scoring
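
If you prefer to record your answer in code, a starter cell in the style of Micro-Exercise 1 might look like the sketch below. The names `ALGORITHMS`, `USE_CASES`, `my_matches`, and `is_complete_matching` are hypothetical. The helper only checks that every algorithm has been paired with a different letter; it does not check whether the pairing is correct.

```python
# Micro-exercise starter code (hypothetical helper, not part of SageMaker)
ALGORITHMS = ['XGBoost', 'Linear Learner', 'K-Means']
USE_CASES = {
    'A': 'Predicting housing prices',
    'B': 'Customer segmentation',
    'C': 'Credit scoring',
}


def is_complete_matching(matches):
    """Return True if each algorithm is matched and each use-case letter is used exactly once."""
    return set(matches) == set(ALGORITHMS) and set(matches.values()) == set(USE_CASES)


# Replace each None with 'A', 'B', or 'C'
my_matches = {
    'XGBoost': None,
    'Linear Learner': None,
    'K-Means': None,
}

print('Matching complete:', is_complete_matching(my_matches))
```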

## Examples Section

### Example 1: Using XGBoost for Classification Tasks
This example demonstrates how to use the XGBoost algorithm for a classification task in finance, such as credit scoring.

In [None]:
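# Example code for training XGBoost model
import sagemaker
from sagemaker import get_execution_role, image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

# Define the role and session
# get_execution_role() works inside SageMaker notebooks; elsewhere, supply an IAM role ARN instead
role = get_execution_role()
session = sagemaker.Session()

# Look up the built-in XGBoost container image for this region
# XGBoost images are versioned; '1.7-1' is an assumed example, so use a version available to you
xgboost_image = image_uris.retrieve(
    framework='xgboost',
    region=session.boto_region_name,
    version='1.7-1'
)

# Define the XGBoost estimator
xgboost_estimator = Estimator(
    image_uri=xgboost_image,
    role=role,
    instance_count=1,
    instance_type='ml.m5.large',
    output_path='s3://your-bucket/output',
    sagemaker_session=session
)

# Set hyperparameters
xgboost_estimator.set_hyperparameters(
    objective='binary:logistic',  # binary classification; the model outputs a probability
    num_round=100                 # number of boosting rounds
)

# Train the model
# For CSV input, built-in XGBoost expects no header row and the label in the first column
train_input = TrainingInput('s3://your-bucket/train-data', content_type='text/csv')
xgboost_estimator.fit({'train': train_input})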
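
With `objective='binary:logistic'`, the trained model returns a probability for the positive class rather than a hard label. The sketch below shows one way to turn such scores into decisions. The `to_labels` helper, the default threshold of 0.5, and the sample scores are illustrative assumptions, not output from a trained model.

```python
def to_labels(probabilities, threshold=0.5):
    """Map positive-class probabilities to 0/1 labels using a decision threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


scores = [0.12, 0.50, 0.87, 0.49]  # made-up model outputs for four applicants

print(to_labels(scores))                 # default threshold
print(to_labels(scores, threshold=0.8))  # stricter threshold flags fewer positives
```

In a credit scoring setting, the threshold is a business choice: raising it trades missed positives for fewer false alarms, so it is worth choosing deliberately rather than accepting 0.5 by default.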

### Example 2: Training a Linear Regression Model
This example shows how to train a linear regression model for predicting housing prices using SageMaker's built-in Linear Learner algorithm. It reuses the `role` and `session` defined in Example 1.

In [None]:
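# Example code for training Linear Regression model
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

# Define the Linear Learner estimator
# The generic Estimator is used here so that hyperparameters and S3 inputs
# are passed the same way as in the XGBoost example
linear_estimator = Estimator(
    image_uri=image_uris.retrieve('linear-learner', session.boto_region_name),
    role=role,
    instance_count=1,
    instance_type='ml.m5.large',
    output_path='s3://your-bucket/output',
    sagemaker_session=session
)

# Set hyperparameters
linear_estimator.set_hyperparameters(
    feature_dim=10,              # number of feature columns; must match your data
    predictor_type='regressor',  # predict a continuous value
    mini_batch_size=32
)

# Train the model
# For CSV input, Linear Learner expects no header row and the label in the first column
train_input = TrainingInput('s3://your-bucket/train-data', content_type='text/csv')
linear_estimator.fit({'train': train_input})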
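
Once a regression model produces predictions, a common way to summarize its error is the root mean squared error (RMSE). The sketch below computes it in plain Python. The `rmse` helper and the price values are made up for illustration; they do not come from a trained Linear Learner model.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(actual) != len(predicted):
        raise ValueError('actual and predicted must have the same length')
    if not actual:
        raise ValueError('at least one value is required')
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical housing prices, in thousands
actual_prices = [200.0, 310.0, 150.0]
predicted_prices = [210.0, 300.0, 150.0]

print(rmse(actual_prices, predicted_prices))
```

RMSE is expressed in the same unit as the target, so here it reads as a typical error in thousands.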

## Main Exercise
### Training a Model with XGBoost
In this exercise, you will load a dataset into SageMaker, select the XGBoost algorithm, configure the training job parameters, and evaluate the model's performance.

In [None]:
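# Load dataset and train XGBoost model
import pandas as pd
from sklearn.model_selection import train_test_split

# Load your dataset (reading s3:// paths with pandas requires the s3fs package)
data = pd.read_csv('s3://your-bucket/dataset.csv')

# Split the dataset into training and testing sets
# random_state makes the split reproducible
train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)

# Save the datasets to S3
# Built-in XGBoost expects CSV with no header row and the label in the first column,
# so reorder the columns before saving if your label is elsewhere
train_data.to_csv('s3://your-bucket/train-data', index=False, header=False)
test_data.to_csv('s3://your-bucket/test-data', index=False, header=False)

# Now you can use the previous XGBoost code to train the model.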
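
The exercise also asks you to evaluate the model, which the cell above does not cover. Assuming you have already obtained predicted 0/1 labels for the held-out test set (how you get them, for example from an endpoint or a batch job, is not shown here), the sketch below computes accuracy, precision, and recall in plain Python. The `classification_metrics` helper and the sample labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError('inputs must be non-empty and of equal length')
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        'accuracy': (tp + tn) / len(pairs),
        # Report 0.0 when a ratio is undefined (no predicted or no actual positives)
        'precision': tp / (tp + fp) if (tp + fp) else 0.0,
        'recall': tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for six test rows
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

print(classification_metrics(y_true, y_pred))
```

Looking at precision and recall alongside accuracy matters when the classes are imbalanced, which is common in tasks such as credit scoring.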

## Common Mistakes
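- Choosing an algorithm that does not match the problem type, for example applying K-Means (unsupervised clustering) to a labeled classification task.
- Leaving hyperparameters at their initial values instead of tuning them once an algorithm has been selected.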

## Recap & Next Steps
In this lesson, we learned about the built-in algorithms in SageMaker and how to select the appropriate algorithm for different use cases. We also practiced training models using XGBoost and Linear Learner. In the next lesson, we will explore hyperparameter tuning to optimize model performance.