# Cellar Tracker MLP Classifier

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder

## Data Prep
Read the data directly from data.world into a pandas DataFrame.

In [None]:
df = pd.read_csv('https://query.data.world/s/r8VVStpwMgTZlScD_VyeGd6JUEX4fm')

### Analysis

Looking at the available columns, there are a few candidate features for a classifier:
- review/text is the text of the review written by the user
- review/userId or review/userName identify the user in question
- wine/name references the year, vintner, and name of the wine (so it includes wine/year)
- wine/variant is the style of wine
- wine/year is the year of vintage

review/points is the target column for regression (it could be repurposed for classification if desired).
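
As a rough illustration of repurposing review/points for classification, the scores can be binned into discrete classes. This is a minimal sketch on made-up data: the toy frame, the bin edges, the class labels, and the `review/class` column name are all assumptions for illustration, not something taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the real data frame; these scores are made up.
toy = pd.DataFrame({'review/points': [82.0, 88.0, 91.0, 95.0, 89.0]})

# Hypothetical bin edges and labels; choose ones that suit the real score distribution.
bins = [0, 85, 90, 100]
labels = ['low', 'mid', 'high']

# pd.cut places each score in the (left, right] interval it falls into
# and returns a categorical column with the matching label.
toy['review/class'] = pd.cut(toy['review/points'], bins=bins, labels=labels)
print(toy)
```

Fixed edges keep the classes easy to interpret; if balanced classes matter more, quantile-based binning with `pd.qcut` is an alternative.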

In [None]:
df.head() # view some of the data frame

In [None]:
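# We have to deal with nulls. For simplicity, we'll drop the affected rows, but any standard imputation option is on the table.

# .copy() makes clean_df an independent frame, so adding a column later doesn't trigger SettingWithCopyWarning.
clean_df = df[df['review/points'].notnull()]
clean_df = clean_df[clean_df['wine/year'].notnull()].copy()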

# Since we're headed towards a neural network, let's use LabelEncoder to convert each wine variant into
# a unique integer ID. fit_transform both learns the ID encoding and returns the transformed variants
# for us to save back into the data frame.
encoder = LabelEncoder()
clean_df['wine/variantId'] = encoder.fit_transform(clean_df['wine/variant'].values)
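
To make the encoding step concrete, here is a small sketch of what LabelEncoder does, using a made-up list of variants rather than the real wine/variant column. The variable names (`variants`, `demo_encoder`, `ids`, `recovered`) are illustrative only.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up variants standing in for the wine/variant column.
variants = ['Pinot Noir', 'Chardonnay', 'Pinot Noir', 'Riesling']

demo_encoder = LabelEncoder()

# fit_transform learns the sorted set of unique labels and maps each one to its index.
ids = demo_encoder.fit_transform(variants)
print(list(demo_encoder.classes_))  # ['Chardonnay', 'Pinot Noir', 'Riesling']
print(list(ids))                    # [1, 0, 1, 2]

# inverse_transform maps the integer IDs back to the original labels.
recovered = demo_encoder.inverse_transform(ids)
print(list(recovered))
```

Note that the IDs follow sorted label order, so they carry no meaning about the wines themselves; a network would typically feed them to an embedding layer or one-hot encode them rather than treat them as numeric magnitudes.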

In [None]:
clean_df[clean_df['wine']