# Music Recommendation Project

Algorithm development for the music recommendation project: feature import, baseline algorithm, complex algorithm, and hyper-parameter optimization.

Author: Ben Walsh \
February 7, 2021

## Contents

1. [Feature Data Import](#feature-data-import)
2. Baseline Model
3. XGBoost Model
4. Evaluate Model

## <a class="anchor" id="feature-data-import"></a>1. Feature Data Import

### Import libraries

In [4]:
import pandas as pd
import os
import xgboost as xgb

First, define the paths to the cleaned data: the feature matrices (X) and the targets (y) for training and testing.

In [5]:
X_train_file = './data-input-clean/X_train.csv'
X_test_file = './data-input-clean/X_test.csv'
y_train_file = './data-input-clean/y_train.csv'
y_test_file = './data-input-clean/y_test.csv'

### Import training/test data

In [6]:
if os.path.exists(X_train_file):
    X_train = pd.read_csv(X_train_file)
else:
    print('Training data file {} not found!'.format(X_train_file))

if os.path.exists(X_test_file):
    X_test = pd.read_csv(X_test_file)
else:
    print('Test data file {} not found!'.format(X_test_file))

if os.path.exists(y_train_file):
    y_train = pd.read_csv(y_train_file)
else:
    print('Training data file {} not found!'.format(y_train_file))

if os.path.exists(y_test_file):
    y_test = pd.read_csv(y_test_file)
else:
    print('Test data file {} not found!'.format(y_test_file))
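
The checks above print a message when a file is missing, but the notebook then continues and fails later with a `NameError` on the undefined variable. One possible alternative, sketched below, is to verify all paths up front and stop with a single error that lists every missing file. The helper names `find_missing_files` and `require_files` are hypothetical and not part of the original notebook.

```python
from pathlib import Path


def find_missing_files(paths):
    """Return the paths (as strings, in input order) that are not existing files."""
    return [str(p) for p in paths if not Path(p).is_file()]


def require_files(paths):
    """Raise FileNotFoundError naming every missing file; do nothing if all exist."""
    missing = find_missing_files(paths)
    if missing:
        raise FileNotFoundError('Missing data files: ' + ', '.join(missing))


# Hypothetical usage with the paths defined earlier in this notebook:
# require_files([X_train_file, X_test_file, y_train_file, y_test_file])
# X_train = pd.read_csv(X_train_file)
```

Failing early keeps the error next to its cause, so a missing CSV is reported before any model code runs.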

## 2. Baseline Model
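
The notebook does not yet define a baseline. As a rough illustration only, and not the author's method, one common choice for a classification-style target is a majority-class baseline: predict the most frequent training label for every test row and measure accuracy. The sketch below assumes the targets are discrete labels (for example 0/1), which the source does not state; the function name `majority_class_baseline` is hypothetical.

```python
from collections import Counter


def majority_class_baseline(train_targets, test_targets):
    """Predict the most common training label for every test row.

    Returns a tuple (majority_label, accuracy_percent).
    Assumes both arguments are non-empty sequences of discrete labels.
    """
    if len(train_targets) == 0 or len(test_targets) == 0:
        raise ValueError('train_targets and test_targets must be non-empty')
    # most_common(1) returns a list with one (label, count) pair
    majority_label, _ = Counter(train_targets).most_common(1)[0]
    hits = sum(1 for t in test_targets if t == majority_label)
    return majority_label, 100.0 * hits / len(test_targets)


# Toy example with made-up labels, not the project's data
label, accuracy = majority_class_baseline([1, 1, 0, 1], [1, 0, 1, 0, 1])
print('Majority label = {}, baseline accuracy = {:.2f}%'.format(label, accuracy))
```

With the notebook's DataFrames, and assuming each target file holds a single column, the inputs could be passed as `y_train.iloc[:, 0].tolist()` and `y_test.iloc[:, 0].tolist()`. A model's accuracy is only informative relative to a reference point of this kind.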

## 3. XGBoost Model

In [7]:
xgb_model = xgb.XGBRegressor(objective='reg:squarederror',  # replaces the deprecated 'reg:linear'
                             colsample_bytree=0.3,  # fraction of features sampled per tree
                             learning_rate=0.1,
                             max_depth=5,
                             alpha=10,  # L1 regularization (alias of reg_alpha)
                             n_estimators=10)

In [8]:
xgb_model.fit(X_train, y_train)



XGBRegressor(alpha=10, colsample_bytree=0.3, max_depth=5, n_estimators=10)

## 4. Evaluate Model

Compare training and testing accuracy

In [9]:
y_predict_train = xgb_model.predict(X_train)
y_predict_test = xgb_model.predict(X_test)

In [10]:
# Round the continuous predictions to the nearest integer label, then reshape
# them to a column so they line up element-wise with the (n, 1) target arrays
y_predict_train = y_predict_train.round().reshape(-1, 1)
y_predict_test = y_predict_test.round().reshape(-1, 1)

In [11]:
print('Accuracy on training set = {:.2f}%'.format(100*(y_predict_train == y_train.values).sum() / len(y_train)))

Accuracy on training set = 61.67%


In [12]:
print('Accuracy on testing set = {:.2f}%'.format(100*(y_predict_test == y_test.values).sum() / len(y_test)))

Accuracy on testing set = 61.69%


### Observations

With its initial parameters, and with no song features or newly engineered features in the training data, the XGBoost model reaches 61.69% accuracy on the test set. This is nearly identical to the 61.67% training accuracy, which suggests the model is not overfitting.
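
The comparison above can be made explicit with a small helper that computes the rounded-prediction accuracy and the train/test gap. This is a dependency-free sketch of the same calculation the cells above perform with NumPy; the function names `rounded_accuracy` and `accuracy_gap` are hypothetical, and the prediction and target values in the example are made up for illustration.

```python
def rounded_accuracy(predictions, targets):
    """Percentage of predictions that equal the target after rounding to the nearest integer."""
    if len(predictions) != len(targets):
        raise ValueError('predictions and targets must have the same length')
    matches = sum(1 for p, t in zip(predictions, targets) if round(p) == t)
    return 100.0 * matches / len(targets)


def accuracy_gap(train_accuracy, test_accuracy):
    """Training accuracy minus test accuracy; a large positive value hints at overfitting."""
    return train_accuracy - test_accuracy


# Toy example: predictions round to 0, 1, 1, 0, so three of four match the targets
print('Toy accuracy = {:.2f}%'.format(rounded_accuracy([0.2, 0.7, 0.9, 0.4], [0, 1, 0, 0])))

# Gap between the accuracies reported in this notebook
print('Train/test gap = {:.2f} percentage points'.format(accuracy_gap(61.67, 61.69)))
```

A gap close to zero is consistent with the observation that the model is not overfitting; it says nothing by itself about whether the accuracy level is good.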