# Neural network with scikit-learn

The goal of this notebook is to build feed forward neural network model using scikit-learn library to predicting sentiment from product reviews. You will do the following:

 * Load product reviews.
 * Implement feed forward neural network model using scikit-learn.
 * Tuning some parameters.

In [5]:
# Import some libs

import pandas
import numpy as np
import json

In [3]:
def remove_punctuation(text):
    import string
    return text.translate(string.punctuation)

def get_numpy_data(dataframe, features, label):
    dataframe.loc[:, 'intercept'] = 1
    features = ['intercept'] + features
    feature_matrix = dataframe.loc[:, features].values
    label_array = dataframe.loc[:, label].values
    return (feature_matrix, label_array)

def get_product_reviews_data():
    products_df = pandas.read_csv('/content/drive/MyDrive/FUNIX Progress/MLP303x_1.1-A_EN/data/amazon_baby_subset.csv')

    with open('/content/drive/MyDrive/FUNIX Progress/MLP303x_1.1-A_EN/data/important_words.json', 'r') as f:
        important_words = json.loads(f.read())

    products_df = products_df.fillna({'review':''})  # fill in N/A's in the review column
    products_df.loc[:, 'review_clean'] = products_df['review'].apply(remove_punctuation)

    for word in important_words:
        products_df.loc[:, word] = products_df['review_clean'].apply(lambda s : s.split().count(word))

    sentiment_train_data = products_df.sample(frac=0.8, random_state=100)
    sentiment_validation_data = products_df.drop(sentiment_train_data.index)

    sentiment_X_train, sentiment_y_train = get_numpy_data(sentiment_train_data, important_words, 'sentiment')
    sentiment_X_valid, sentiment_y_valid = get_numpy_data(sentiment_validation_data, important_words, 'sentiment')

    print ('*****Sentiment data shape*****')
    print ('sentiment_X_train.shape: ', sentiment_X_train.shape)
    print ('sentiment_y_train.shape: ', sentiment_y_train.shape)
    print ('sentiment_X_valid.shape: ', sentiment_X_valid.shape)
    print ('sentiment_y_valid.shape: ', sentiment_y_valid.shape)

    return (sentiment_X_train, sentiment_y_train), (sentiment_X_valid, sentiment_y_valid)

## Load product reviews dataset
Like previous module, we load, preprocess data, convert and split them into train and test datasets. We dont't focus on that in this notebook, so you can just run the following cells. You can check out the load data code inside the folder **utils**.

In [6]:
train_set, val_set = get_product_reviews_data()

sentiment_X_train, sentiment_y_train = train_set
sentiment_X_valid, sentiment_y_valid = val_set

*****Sentiment data shape*****
sentiment_X_train.shape:  (42458, 194)
sentiment_y_train.shape:  (42458,)
sentiment_X_valid.shape:  (10614, 194)
sentiment_y_valid.shape:  (10614,)


# Build Feed forward neural network classifier using scikit learn
Now, let's use the built-in Feed forword Neural network learner [sklearn.neural_network.MLPClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier).

## Chose the hidden layer sizes
Lets first check the number of features in each dataset then we will choose the hidden layer size appropriately.

In [7]:
print("Sentiment feature size: ", sentiment_X_train.shape[1])

Sentiment feature size:  194


For the sentiment dataset we will choose 2 hidden layers with 64, 8 neurons correspondingly.

In [8]:
from sklearn.neural_network import MLPClassifier

clf_sentiment = MLPClassifier(solver='adam', alpha=1e-5, activation='relu',
                    hidden_layer_sizes=(64, 8), random_state=1)

clf_sentiment.fit(sentiment_X_train, sentiment_y_train)

print("***Sentiment result***")
print("Train accuracy: {}".format(clf_sentiment.score(sentiment_X_train, sentiment_y_train)))
print("Validation accuracy: {}".format(clf_sentiment.score(sentiment_X_valid, sentiment_y_valid)))

***Sentiment result***
Train accuracy: 0.9614206981016534
Validation accuracy: 0.7157527793480309




As you can see a simple feed forward neural network can get pretty reasonable results and the neural network can fit the pattern in the train set really well.
<br>
**Quiz**: What is the validation accuracy?
<br>
**Your answer:** 71.58%