## Module 4 Final Project Submission

Please fill out:
* Student name: **Chelsea Power**
* Student pace: **part time**
* Scheduled project review date/time: **8/30/19 at 5 pm ET**
* Instructor name: **Brandon Lewis**
* Blog post URL: 


## Purpose

This project focuses on predicting listener mood in order to recommend songs and maintain listener engagement. A neural network will predict the song mood (the variation in a user's emotion associated with the track they are playing). Based on the network's output, a context-aware recommendation system will then recommend the next song.
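The recommendation step can be illustrated in a simplified, content-based form. This is a sketch under stated assumptions, not the project's method. It assumes the network's predicted mood is expressed as target values for audio features such as `valence` and `energy` (columns that `context_content_features.csv` provides), and it ranks candidate tracks by Euclidean distance to that target. The function `recommend_next` and the `target` mapping are hypothetical. A full context-aware system would also weigh context features such as time zone or language.

```python
import numpy as np
import pandas as pd


def recommend_next(tracks, target, current_track_id=None, k=5):
    """Hypothetical helper: return the k tracks closest to a target mood.

    tracks: one row per track, with a 'track_id' column plus audio feature columns.
    target: assumed mapping of feature name -> desired value (e.g. predicted valence/energy).
    """
    features = list(target)
    candidates = tracks.dropna(subset=features)
    if current_track_id is not None:
        # Do not recommend the track the user is already playing
        candidates = candidates[candidates["track_id"] != current_track_id]
    # Euclidean distance from each track's feature vector to the target point
    target_vec = np.array([target[f] for f in features], dtype=float)
    diffs = candidates[features].to_numpy(dtype=float) - target_vec
    distances = np.sqrt((diffs ** 2).sum(axis=1))
    return candidates.assign(distance=distances).nsmallest(k, "distance")


tracks = pd.DataFrame({
    "track_id": ["a", "b", "c", "d"],
    "valence": [0.9, 0.2, 0.8, 0.5],
    "energy": [0.8, 0.3, 0.7, 0.5],
})
next_tracks = recommend_next(tracks, {"valence": 0.85, "energy": 0.75}, current_track_id="a", k=2)
print(next_tracks[["track_id", "distance"]])
```

Distance-based ranking keeps the example transparent; a learned model could replace it once the mood labels exist.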

## Data Dictionary

**#nowplaying-RS Dataset (3 files)**

This dataset describes listening events through context and content features. It contains 11.6 million music listening events of 139K users and 346K tracks collected from Twitter. The dataset comes with a rich set of item content features and user context features, as well as timestamps of the listening events. Moreover, some of the user context features imply the cultural origin of the users, while others, such as hashtags, give clues to the emotional state of a user underlying a listening event.

- **user_track_hashtag_timestamp.csv** contains basic information about each listening event: id, user_id, track_id, hashtag and created_at.
- **context_content_features.csv** contains all context and content features. Each listening event provides the event id, user_id, track_id, artist_id, content features of the track (instrumentalness, liveness, speechiness, danceability, valence, loudness, tempo, acousticness, energy, mode, key) and context features of the listening event (coordinates, place and geo (each as GeoJSON), tweet_language, created_at, user_lang, time_zone, and the entities contained in the tweet).
- **sentiment_values.csv** contains sentiment information for hashtags: the hashtag itself and the sentiment values gathered via four sentiment dictionaries (AFINN, Opinion Lexicon, SentiStrength Lexicon and VADER). For each dictionary, the file lists the minimum, maximum, sum and average sentiment of the hashtag's tokens (empty values if unavailable). Because most hashtags consist of a single token, these four values are equal in most cases. Each lexicon scores a different set of terms, so the file is rather sparse. Its comma-separated columns are `hashtag, vader_min, vader_max, vader_sum, vader_avg, afinn_min, afinn_max, afinn_sum, afinn_avg, ol_min, ol_max, ol_sum, ol_avg, ss_min, ss_max, ss_sum, ss_avg`, where the prefix `ol` stands for the Opinion Lexicon and `ss` for SentiStrength.
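Because the lexica use different score ranges and most hashtags are scored by only some of them, one option is to collapse them into a single feature. A minimal sketch, assuming the `*_avg` column names listed above: each lexicon's average is rescaled to [-1, 1] by dividing by that column's largest absolute value (an assumption, not part of the dataset's documentation), and the rescaled scores are averaged over whichever lexica scored the hashtag. `combined_sentiment` is a hypothetical helper.

```python
import pandas as pd

AVG_COLUMNS = ["vader_avg", "afinn_avg", "ol_avg", "ss_avg"]


def combined_sentiment(sentiment):
    """Hypothetical helper: one sentiment score per hashtag row, in [-1, 1]."""
    cols = [c for c in AVG_COLUMNS if c in sentiment.columns]
    scores = sentiment[cols]
    # Rescale each lexicon by its largest absolute value so the scales are comparable (assumption)
    scaled = scores / scores.abs().max()
    # mean() skips NaN, so each hashtag is averaged only over the lexica that scored it
    return scaled.mean(axis=1)


example = pd.DataFrame({
    "hashtag": ["happy", "sad", "nowplaying"],
    "vader_avg": [2.0, -1.0, None],
    "afinn_avg": [3.0, None, None],
    "ol_avg": [1.0, -1.0, None],
    "ss_avg": [None, -2.0, None],
})
example["sentiment"] = combined_sentiment(example)
print(example[["hashtag", "sentiment"]])
```

Hashtags that no lexicon scored keep a missing value, which the missing-value checks later in the notebook will surface.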

### OBSERVE: Understand and Load the Datasets

In [0]:
# Import required libraries
import pandas as pd
import numpy as np

df1 = pd.read_csv('user_track_hashtag_timestamp.csv')

#Look at size of the dataset
df1.shape

In [0]:
#Look at the columns and first 10 rows of the dataset
df1.head(10)

In [0]:
#Load second dataset
df2 = pd.read_csv('context_content_features.csv')

#Look at size of the dataset
df2.shape

In [0]:
#Look at the columns and first 10 rows of the dataset
df2.head(10)

In [0]:
#Load third dataset
df3 = pd.read_csv('sentiment_values.csv')

#Look at size of the dataset
df3.shape

In [0]:
#Look at the columns and first 10 rows of the dataset
df3.head(10)

In [0]:
#Determine how to join datasets - rename columns / drop unnecessary columns per csv file
#Merge csv files into new dataset
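One way to combine the three files, sketched under assumptions that should be checked against the real headers: the two event files are assumed to share `user_id`, `track_id` and `created_at`, and hashtags are lower-cased before joining because the casing in tweets may differ from the sentiment file. `merge_nowplaying` is a hypothetical helper; the left join keeps listening events whose hashtag has no sentiment score.

```python
import pandas as pd


def merge_nowplaying(events, features, sentiment):
    """Hypothetical helper: join the three nowplaying-RS files into one DataFrame."""
    keys = ["user_id", "track_id", "created_at"]  # assumed shared join keys
    # Drop the second file's 'id' column so the merge does not create id_x / id_y duplicates
    merged = events.merge(features.drop(columns=["id"], errors="ignore"), on=keys, how="inner")
    # Lower-case hashtags on both sides (assumption: casing differs between the files)
    merged["hashtag"] = merged["hashtag"].str.lower()
    sentiment = sentiment.assign(hashtag=sentiment["hashtag"].str.lower())
    sentiment = sentiment.drop_duplicates(subset="hashtag")
    # Left join: keep every listening event even if no lexicon scored its hashtag
    return merged.merge(sentiment, on="hashtag", how="left")


events = pd.DataFrame({
    "id": [1, 2],
    "user_id": [10, 11],
    "track_id": ["t1", "t2"],
    "hashtag": ["NowPlaying", "happy"],
    "created_at": ["2014-01-01 10:00:00", "2014-01-01 11:00:00"],
})
features = pd.DataFrame({
    "id": [1, 2],
    "user_id": [10, 11],
    "track_id": ["t1", "t2"],
    "created_at": ["2014-01-01 10:00:00", "2014-01-01 11:00:00"],
    "valence": [0.5, 0.9],
})
sentiment_table = pd.DataFrame({"hashtag": ["happy"], "vader_avg": [2.0]})
df_example = merge_nowplaying(events, features, sentiment_table)
print(df_example)
```

With the notebook's frames this would be `df_music = merge_nowplaying(df1, df2, df3)`. With 11.6 million events, dropping unneeded columns before merging helps keep memory use manageable.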

### SCRUB: Data Preparation
- Data type conversions (e.g. numeric data mistakenly encoded as objects)
- Detect and deal with missing values
- Remove unnecessary columns

In [0]:
#Look at column types
df_music.dtypes
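The first bullet above (data type conversions) could look like the sketch below. It assumes the column names from the data dictionary; which columns are numeric and which to drop are choices left to the caller, and `scrub_types` is a hypothetical helper.

```python
import pandas as pd


def scrub_types(df, numeric_cols, drop_cols=()):
    """Hypothetical helper: fix dtypes and drop unneeded columns."""
    out = df.drop(columns=list(drop_cols), errors="ignore").copy()
    # Numbers stored as text (object dtype) become floats; unparseable values become NaN
    for col in numeric_cols:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Parse timestamps so that time-based features can be derived later
    if "created_at" in out.columns:
        out["created_at"] = pd.to_datetime(out["created_at"], errors="coerce")
    return out


raw = pd.DataFrame({
    "tempo": ["120.5", "not a number"],
    "created_at": ["2014-01-01 10:00:00", "2014-01-02 11:30:00"],
    "entities": ["x", "y"],
})
clean = scrub_types(raw, numeric_cols=["tempo"], drop_cols=["entities"])
print(clean.dtypes)
```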

## Check for null/missing values

In [0]:
# Count the missing values in each column (one total per column name)
df_music.isnull().sum()
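Once the counts are known, a common next step is to drop columns that are mostly empty. The data dictionary notes that the sentiment columns are sparse, so they may warrant imputation rather than removal; the 50% cutoff below is an arbitrary assumption and `drop_sparse_columns` is a hypothetical helper.

```python
import pandas as pd


def drop_sparse_columns(df, max_missing=0.5):
    """Hypothetical helper: drop columns whose share of missing values exceeds max_missing."""
    missing_share = df.isnull().mean()  # fraction of NaN per column
    keep = missing_share[missing_share <= max_missing].index
    return df[keep], missing_share.sort_values(ascending=False)


frame = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [None, None, None, 1],
    "c": [1, None, 3, 4],
})
kept, report = drop_sparse_columns(frame)
print(report)
print(list(kept.columns))
```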

## Explore the data
- Look at the distribution of the data
- Look for multicollinearity
- Remove unnecessary features

In [0]:
#Look at value counts of the predictor variable
df_music.XXX.value_counts()

In [0]:
# Visualize data
import seaborn as sns
import matplotlib.pyplot as plt

sns.countplot(x='XXX', data=df_music, palette='hls')
plt.show()

**Observation:**

In [0]:
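# Create a dataset of the continuous (numeric) features and look at their distributions
df_continuous = df_music.select_dtypes(include='number')
df_continuous.hist(figsize=(16, 12), bins=30)
plt.show()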

**Summary:**

In [0]:
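#Create correlation heatmap - check for multicollinearity
from matplotlib import pyplot as plt
import seaborn as sns

# Restrict to numeric columns: recent pandas versions raise an error on non-numeric data in corr()
correlation = df_music.select_dtypes(include='number').corr()
plt.figure(figsize=(14, 12))
heatmap = sns.heatmap(correlation, annot=True, linewidths=0, vmin=-1, vmax=1, cmap="RdBu_r")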
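Reading a large annotated heatmap is error-prone, so it can help to list the strongly correlated pairs directly. This sketch uses absolute Pearson correlation and an assumed cutoff of 0.75; `correlated_pairs` is a hypothetical helper, and the cutoff is a judgment call rather than a standard.

```python
import numpy as np
import pandas as pd


def correlated_pairs(df, threshold=0.75):
    """Hypothetical helper: feature pairs whose absolute correlation exceeds threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    cols = corr.columns
    # Upper triangle without the diagonal, so each pair is reported once
    i, j = np.triu_indices(len(cols), k=1)
    values = corr.to_numpy()[i, j]
    pairs = pd.Series(values, index=pd.MultiIndex.from_arrays([cols[i], cols[j]]))
    return pairs[pairs > threshold].sort_values(ascending=False)


x = np.arange(10, dtype=float)
demo = pd.DataFrame({
    "energy": x,
    "loudness": 2 * x + 1,
    "mode": [1, -1] * 5,
    "track_id": list("abcdefghij"),
})
print(correlated_pairs(demo))
```

From each reported pair, one feature is a candidate for removal before modeling.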

**Summary:**

## Import packages and generate the data

In [0]:
# Package imports
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import sklearn
import sklearn.datasets
import sklearn.linear_model

# Display plots inline and change default figure size
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (6.0, 6.0)

In [0]:
# Generate a toy dataset (two noisy concentric circles) and plot it
np.random.seed(123)
sample_size = 500
X, y = sklearn.datasets.make_circles(n_samples=sample_size, noise=0.1, random_state=123)

plt.scatter(X[:, 0], X[:, 1], s=20, c=y, edgecolors="gray")

## Model 1: Logistic Regression
- Normalize the data prior to fitting the model
- Train-Test Split
- Fit the model
- Predict
- Evaluate
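The logistic regression cell below fits and scores on the full dataset, so the split and normalization steps listed above are not yet applied. A NumPy-only sketch of those two steps, assuming a 75/25 split: the scaling statistics come from the training split alone so that no test information leaks into training. `split_and_standardize` is a hypothetical helper; scikit-learn's `train_test_split` and `StandardScaler` provide the same functionality.

```python
import numpy as np


def split_and_standardize(X, y, test_size=0.25, seed=123):
    """Hypothetical helper: shuffled train/test split, then z-score scaling."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train, X_test = X[train_idx], X[test_idx]
    # Scaling statistics come from the training split only (no test-set leakage)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    return (X_train - mean) / std, (X_test - mean) / std, y[train_idx], y[test_idx]


X_demo = np.arange(40, dtype=float).reshape(20, 2)
y_demo = np.arange(20) % 2
X_train, X_test, y_train, y_test = split_and_standardize(X_demo, y_demo)
print(X_train.shape, X_test.shape)
```

In the notebook this would become `X_train, X_test, y_train, y_test = split_and_standardize(X, y)`, followed by `clf.fit(X_train, y_train)` and `clf.score(X_test, y_test)`.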

In [0]:
#Train the logistic regression classifier
clf = sklearn.linear_model.LogisticRegressionCV()
clf.fit(X, y)

In [0]:
# Helper function to plot a decision boundary that will visualize the classification performance
def plot_decision_boundary(pred_func):
    # Set min and max values and give it some padding
    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    h = 0.01
    
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    
    # Predict the function value for the whole grid
    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral)

In [0]:
#Create a decision boundary using the predictions made by the logistic regression model
plot_decision_boundary(lambda x: clf.predict(x))
plt.title("Logistic Regression")

In [0]:
#Store the predictions to calculate the accuracy
clf_predict = clf.predict(X)

In [0]:
print('The logistic regression model has a training accuracy of: '
      + str(np.mean(clf_predict == y) * 100) + '%')
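Accuracy on the training data says little about how the model generalizes or which class it confuses. A small NumPy sketch of a confusion matrix with precision and recall for the 0/1 labels used here, treating 1 as the positive class (an assumption); `binary_metrics` is a hypothetical helper, and `sklearn.metrics.confusion_matrix` produces the same matrix layout.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Hypothetical helper: confusion matrix, accuracy, precision and recall for 0/1 labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        # Rows are true labels (0, 1), columns are predicted labels (0, 1)
        "confusion": np.array([[tn, fp], [fn, tp]]),
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


metrics = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(metrics)
```

With the notebook's variables this would be `binary_metrics(y, clf_predict)`.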

## Model 2: Decision Tree

## Build a neural network

In [0]:
#Define some useful variables and parameters for gradient descent:
num_examples = len(X) #training set size
nn_input_dim = 2 #input layer dimensionality
nn_output_dim = 2 #output layer dimensionality

# Gradient descent parameters (these values may need tuning)
epsilon = 0.01 # learning rate for gradient descent
reg_lambda = 0.01 # regularization strength

In [0]:
#Helper function to evaluate the total loss on the dataset
def calculate_loss(model):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation to calculate our predictions
    z1 = X.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    # Calculating the loss
    correct_logprobs = -np.log(probs[range(num_examples), y])
    data_loss = np.sum(correct_logprobs)
    # Add regularization term to loss (optional)
    data_loss += reg_lambda/2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)))
    return 1./num_examples * data_loss
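For reference, the quantity `calculate_loss` returns can be written out. With $N$ = `num_examples`, $\lambda$ = `reg_lambda`, $y_n$ the label of example $n$ and $\hat{y}$ the softmax output, the forward pass and regularized cross-entropy loss are:

$$
\begin{aligned}
z_1 &= X W_1 + b_1, \qquad a_1 = \tanh(z_1), \qquad z_2 = a_1 W_2 + b_2, \\
\hat{y}_{nk} &= \frac{\exp(z_{2,nk})}{\sum_{j} \exp(z_{2,nj})}, \\
L &= \frac{1}{N}\left[-\sum_{n=1}^{N} \log \hat{y}_{n y_n} + \frac{\lambda}{2}\left(\lVert W_1 \rVert_F^2 + \lVert W_2 \rVert_F^2\right)\right].
\end{aligned}
$$

As in the code, the regularization term is divided by $N$ together with the data term, and the biases $b_1, b_2$ are not regularized.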

In [0]:
#Helper function to predict an output (0 or 1)
def predict(model, x):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation
    z1 = x.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    return np.argmax(probs, axis=1)

In [0]:
# This function learns parameters for the neural network and returns the model.
# - nn_hdim: Number of nodes in the hidden layer
# - num_passes: Number of passes through the training data for gradient descent
# - print_loss: If True, print the loss every 1000 iterations
def build_model(nn_hdim, num_passes=20000, print_loss=False):
    
    # Initialize the parameters to random values. We need to learn these.
    np.random.seed(0)
    W1 = np.random.randn(nn_input_dim, nn_hdim) / np.sqrt(nn_input_dim)
    b1 = np.zeros((1, nn_hdim))
    W2 = np.random.randn(nn_hdim, nn_output_dim) / np.sqrt(nn_hdim)
    b2 = np.zeros((1, nn_output_dim))

    # This is what we return at the end
    model = {}
    
    # Full-batch gradient descent: every pass uses the whole training set
    for i in range(0, num_passes):

        # Forward propagation
        z1 = X.dot(W1) + b1
        a1 = np.tanh(z1)
        z2 = a1.dot(W2) + b2
        exp_scores = np.exp(z2)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

        # Backpropagation
        delta3 = probs
        delta3[range(num_examples), y] -= 1
        dW2 = (a1.T).dot(delta3)
        db2 = np.sum(delta3, axis=0, keepdims=True)
        delta2 = delta3.dot(W2.T) * (1 - np.power(a1, 2))
        dW1 = np.dot(X.T, delta2)
        db1 = np.sum(delta2, axis=0, keepdims=True)

        # Add regularization terms (b1 and b2 don't have regularization terms)
        dW2 += reg_lambda * W2
        dW1 += reg_lambda * W1

        # Gradient descent parameter update
        W1 += -epsilon * dW1
        b1 += -epsilon * db1
        W2 += -epsilon * dW2
        b2 += -epsilon * db2
        
        # Assign new parameters to the model
        model = { 'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}
        
        # Optionally print the loss.
        # This is expensive because it uses the whole dataset, so we don't want to do it too often.
        if print_loss and i % 1000 == 0:
            print("Loss after iteration %i: %f" % (i, calculate_loss(model)))
    
    return model
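A numerical gradient check is a standard way to confirm that backpropagation code is correct. The sketch below re-implements the same tanh/softmax network in a self-contained form (so it does not touch the notebook's globals) and compares its analytic gradients, $\delta_3 = \hat{y} - \mathrm{onehot}(y)$, $\partial/\partial W_2 = a_1^\top \delta_3 + \lambda W_2$, $\delta_2 = \delta_3 W_2^\top \odot (1 - a_1^2)$, $\partial/\partial W_1 = X^\top \delta_2 + \lambda W_1$, against central finite differences. Like `build_model`, it differentiates the summed loss, i.e. $N$ times the value `calculate_loss` returns. `loss_and_grads` and `max_gradient_error` are hypothetical helper names.

```python
import numpy as np


def loss_and_grads(params, X, y, reg_lambda):
    """Summed regularized cross-entropy loss and its analytic gradients."""
    W1, b1, W2, b2 = params["W1"], params["b1"], params["W2"], params["b2"]
    rows = np.arange(len(X))
    # Forward pass: tanh hidden layer, softmax output
    a1 = np.tanh(X @ W1 + b1)
    z2 = a1 @ W2 + b2
    exp_scores = np.exp(z2 - z2.max(axis=1, keepdims=True))
    probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)
    loss = -np.log(probs[rows, y]).sum() + reg_lambda / 2 * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    # Backward pass, same formulas as build_model
    delta3 = probs.copy()
    delta3[rows, y] -= 1
    delta2 = (delta3 @ W2.T) * (1 - a1 ** 2)
    grads = {
        "W1": X.T @ delta2 + reg_lambda * W1,
        "b1": delta2.sum(axis=0, keepdims=True),
        "W2": a1.T @ delta3 + reg_lambda * W2,
        "b2": delta3.sum(axis=0, keepdims=True),
    }
    return loss, grads


def max_gradient_error(params, X, y, reg_lambda=0.01, h=1e-5):
    """Largest absolute gap between analytic and central-difference gradients."""
    _, grads = loss_and_grads(params, X, y, reg_lambda)
    worst = 0.0
    for name, value in params.items():
        for idx in np.ndindex(value.shape):
            original = value[idx]
            value[idx] = original + h
            loss_plus, _ = loss_and_grads(params, X, y, reg_lambda)
            value[idx] = original - h
            loss_minus, _ = loss_and_grads(params, X, y, reg_lambda)
            value[idx] = original  # restore the parameter
            numeric = (loss_plus - loss_minus) / (2 * h)
            worst = max(worst, abs(numeric - grads[name][idx]))
    return worst


rng = np.random.default_rng(0)
X_check = rng.normal(size=(6, 2))
y_check = np.array([0, 1, 0, 1, 1, 0])
params = {
    "W1": rng.normal(size=(2, 3)),
    "b1": np.zeros((1, 3)),
    "W2": rng.normal(size=(3, 2)),
    "b2": np.zeros((1, 2)),
}
print(max_gradient_error(params, X_check, y_check))
```

A very small result (far below the gradient magnitudes) suggests the analytic gradients are right; a large gap would point to a bug in the backward pass.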

## A network with a hidden layer of size 3
Train the network with a hidden layer size of 3

In [0]:
# Build a model with a 3-dimensional hidden layer
model = build_model(3, print_loss=True)

# Plot the decision boundary
plot_decision_boundary(lambda x: predict(model, x))
plt.title("Decision Boundary for hidden layer size 3")

## Varying the hidden layer size

In [0]:
plt.figure(figsize=(16, 32))
hidden_layer_dimensions = [1, 2, 3, 4, 5, 20, 50]
for i, nn_hdim in enumerate(hidden_layer_dimensions):
    plt.subplot(5, 2, i+1)
    plt.title('Hidden Layer size %d' % nn_hdim)
    model = build_model(nn_hdim)
    plot_decision_boundary(lambda x: predict(model, x))
plt.show()

### Citation

@inproceedings{smc18,
title = {#nowplaying-RS: A New Benchmark Dataset for Building Context-Aware Music Recommender Systems},
author = {Asmita Poddar and Eva Zangerle and Yi-Hsuan Yang},
url = {http://mac.citi.sinica.edu.tw/~yang/pub/poddar18smc.pdf},
year = {2018},
date = {2018-07-04},
booktitle = {Proceedings of the 15th Sound & Music Computing Conference},
address = {Limassol, Cyprus},
note = {code at https://github.com/asmitapoddar/nowplaying-RS-Music-Reco-FM},
tppubtype = {inproceedings}
}