# Music Recommendation Project

This is the second section of the Capstone Project for Udacity's Machine Learning Engineer Nanodegree.

This notebook imports the cleaned data from the first notebook, implements a baseline algorithm and a more complex algorithm, optimizes hyper-parameters, and saves the model.

Author: Ben Walsh \
February 7, 2021

## Contents

1. [Feature Data Import](#feature-data-import)
2. Baseline Model
3. [XGBoost Model](#xgb-model)
4. [Save Model](#save-model)

## <a class="anchor" id="feature-data-import"></a>1. Feature Data Import

### Import libraries

In [1]:
import pandas as pd
import numpy as np
import xgboost as xgb
import pickle

import os
import json
import datetime

First, set the paths to the cleaned training data: the feature matrix X and the target y.

In [2]:
X_train_file = './data-input-clean/X_train.csv'
y_train_file = './data-input-clean/y_train.csv'

### Import training data

In [3]:
# Fail early with a clear error instead of hitting a NameError in a later cell
if not os.path.exists(X_train_file):
    raise FileNotFoundError('Training data file {} not found!'.format(X_train_file))
X_train = pd.read_csv(X_train_file)

if not os.path.exists(y_train_file):
    raise FileNotFoundError('Training data file {} not found!'.format(y_train_file))
y_train = pd.read_csv(y_train_file)
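Before training, it can help to confirm that the features and the target line up row for row. The following is a minimal sketch of such a check, not part of the original notebook: the helper name `check_aligned` is hypothetical, and plain lists stand in for `X_train` and `y_train` so the snippet runs on its own.

```python
def check_aligned(features, target):
    """Raise ValueError unless features and target have the same number of rows."""
    n_features, n_target = len(features), len(target)
    if n_features != n_target:
        raise ValueError(
            'Row count mismatch: {} feature rows vs {} target rows'.format(n_features, n_target)
        )
    return n_features


# Plain lists stand in for X_train and y_train (assumption for illustration only).
n_rows = check_aligned([[0.1, 0.2], [0.3, 0.4]], [1, 0])
```

Because `len()` of a pandas DataFrame is its number of rows, the same call should work as `check_aligned(X_train, y_train)` once both files are loaded.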

## 2. Baseline Model

## <a class="anchor" id="xgb-model"></a>3. XGBoost Model

In [20]:
xgb_hyper_params = {'objective': 'reg:squarederror',  # 'reg:linear' is the deprecated name for this objective
                    'colsample_bytree': 0.3,
                    'learning_rate': 0.1,
                    'max_depth': 10,
                    'alpha': 10,  # L1 regularization strength (alias of reg_alpha)
                    'n_estimators': 10}

In [21]:
# Unpack the dictionary so every hyper-parameter is passed as a keyword argument
xgb_model = xgb.XGBRegressor(**xgb_hyper_params)

In [22]:
xgb_model.fit(X_train, y_train)



XGBRegressor(alpha=10, colsample_bytree=0.3, max_depth=10, n_estimators=10)
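The introduction mentions hyper-parameter optimization, but no search is shown in this section. As a rough sketch of one way to enumerate candidate settings, the snippet below expands a small grid into parameter dictionaries. The grid values and the helper name `expand_grid` are hypothetical, and scoring each candidate is left out.

```python
import itertools

# A subset of the fixed settings used above; the grid values are hypothetical.
base_params = {'colsample_bytree': 0.3, 'alpha': 10, 'n_estimators': 10}
param_grid = {'learning_rate': [0.05, 0.1], 'max_depth': [5, 10]}


def expand_grid(base, grid):
    """Return one parameter dict per combination of the grid values."""
    keys = sorted(grid)
    candidates = []
    for values in itertools.product(*(grid[k] for k in keys)):
        candidate = dict(base)               # copy so the base settings stay untouched
        candidate.update(zip(keys, values))  # overlay this combination
        candidates.append(candidate)
    return candidates


candidates = expand_grid(base_params, param_grid)
```

Each candidate could then be passed as `xgb.XGBRegressor(**candidate)` and compared on held-out data, which this sketch does not do.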

## <a class="anchor" id="save-model"></a>4. Save Model

Get timestamp for history and to ensure a unique model name. 

In [23]:
timestamp = datetime.datetime.now()
# Zero-padded fields keep the model names sortable and unambiguous
timestamp_str = timestamp.strftime('%Y-%m-%d-%H-%M-%S-%f')


Save the model with pickle.

In [24]:
model_folder = './saved_models'
os.makedirs(model_folder, exist_ok=True)  # create the folder only if it is missing

In [25]:
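# Use a context manager so the file is flushed and closed after writing
with open('{}/model-{}'.format(model_folder, timestamp_str), 'wb') as model_file:
    pickle.dump(xgb_model, model_file)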
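A saved model can be read back with `pickle.load`. The round trip below is a minimal sketch: a plain dictionary stands in for the trained model so the snippet runs on its own, and the file name `model-example` is hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for the trained model (assumption for illustration only).
stand_in_model = {'max_depth': 10, 'n_estimators': 10}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'model-example')  # hypothetical file name

    # Write the object, then read it back from the same path.
    with open(model_path, 'wb') as model_file:
        pickle.dump(stand_in_model, model_file)
    with open(model_path, 'rb') as model_file:
        restored_model = pickle.load(model_file)
```

With a real `XGBRegressor`, unpickling generally requires a compatible xgboost installation to be importable, so it is worth recording the library version alongside the saved file.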