# Wine Price Prediction - Model Definition v3

- **v1** - Trained simple DL model and LGB model
- **v2** - Added wider and deeper DL models
- **v3** - Updated model configurations for the updated dataset

## Data and Use case

[**Wine Reviews** - 130k wine reviews with variety, location, winery, price, and description](https://www.kaggle.com/zynicide/wine-reviews/home)

This dataset is available on Kaggle and contains around 130k wine reviews. The data was scraped from [WineEnthusiast](http://www.winemag.com/?s=&drink_type=wine) on November 22nd, 2017.

I plan to use this dataset to develop a model that predicts the price of a wine from a specified set of parameters, such as wine variety, region, and desired quality. Such a model could be integrated into a mobile application that suggests a price range while shopping for wine, without the need for an online search.

## Model Definition

Our problem is a supervised machine learning task: based on historical data, we want to predict wine price values. As we saw, our dataset contains mostly categorical features (in the first iteration at least).

According to the course requirements, we need to implement a solution with at least one deep learning algorithm and at least one non-deep learning algorithm. Let's start.

### Model Performance Indicator

We start by choosing a model performance indicator.

For our problem we select **the mean squared error (MSE)**, which measures the average of the squares of the errors, that is, the average squared difference between the predicted values and the actual values. MSE has the same units of measurement as the square of the quantity being estimated.

I will also look at the square root of MSE, **the root-mean-square error (RMSE)**, which has the same units as the quantity being estimated.
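
As a minimal sketch of the two indicators, the snippet below computes MSE and RMSE with NumPy. The helper names `mse` and `rmse` and the price values are made up for illustration and are not part of the original notebook.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def rmse(y_true, y_pred):
    """Square root of MSE, expressed in the same units as the target."""
    return float(np.sqrt(mse(y_true, y_pred)))


# Hypothetical prices in dollars, for illustration only.
actual_prices = [20.0, 35.0, 50.0]
predicted_prices = [22.0, 30.0, 56.0]

print(mse(actual_prices, predicted_prices))   # in squared dollars
print(rmse(actual_prices, predicted_prices))  # in dollars
```

Because RMSE is in dollars, it is the easier of the two to read as a typical size of the pricing error.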

## Deep Learning Algorithm

I will create an MLP model using Keras to solve this regression problem. Keras makes it simple and quick to define an MLP model. It is also supported in Watson Studio with TensorFlow as a back-end.

In [1]:
from keras.models import Sequential
from keras.layers import Dense, Dropout

Using TensorFlow backend.


Let's start with a relatively simple MLP network that includes just two layers.

In [2]:
num_of_features = 1146 # number of inputs

model = Sequential()  # Instantiate sequential model
model.add(Dense(1200, activation='relu', input_shape=(num_of_features,)))
model.add(Dense(1))
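
To get a feel for the size of this network, the sketch below counts the trainable parameters of a stack of fully connected layers in plain Python. The helper names `dense_param_count` and `mlp_param_count` are hypothetical; the arithmetic assumes each `Dense` layer has one weight per input-unit pair plus one bias per unit, which is the Keras default.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


def mlp_param_count(n_inputs, layer_units):
    """Sum the parameters over a stack of fully connected layers."""
    total = 0
    for n_units in layer_units:
        total += dense_param_count(n_inputs, n_units)
        n_inputs = n_units  # this layer's output feeds the next layer
    return total


num_of_features = 1146  # number of inputs, as in the cell above

# Hidden layer with 1200 units followed by a single output unit.
print(mlp_param_count(num_of_features, [1200, 1]))  # 1377601
```

With roughly 1.4 million parameters and around 130k reviews, the model has far more parameters than training examples.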

Let's compile our model with the Adam optimizer and MSE as the loss function.

In [3]:
model.compile(optimizer='adam', loss='mse')

Let's save our model for future training.

In [4]:
model.save("wine-price-prediction.model.keras.h5")

### Deeper Model

In [5]:
model = Sequential()  # Instantiate sequential model
model.add(Dense(1536, activation='relu', input_shape=(num_of_features,)))
model.add(Dropout(0.3))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(1))

model.compile(optimizer='adam', loss='mse')

model.save("wine-price-prediction.model.keras.deep.h5")

### Wider Model

In [6]:
model = Sequential()  # Instantiate sequential model
model.add(Dense(3000, activation='relu', input_shape=(num_of_features,)))
model.add(Dropout(0.3))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(1))

model.compile(optimizer='adam', loss='mse')

model.save("wine-price-prediction.model.keras.wide.h5")

## Non-Deep Learning Model

As noted above, our dataset contains mostly categorical features (in the first iteration at least). Tree models are particularly well suited for this type of dataset, since each branch of a tree can be mapped to a distinct value of a categorical feature.

Tree models are often so effective at fitting the data that they tend to overfit. A common solution is to use tree ensembles, which reduce variance without increasing bias too much.
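
The intuition behind ensembles can be illustrated with a small simulation: averaging many noisy but unbiased predictors gives a prediction with much lower variance than any single one. This is only a sketch under a strong assumption, namely that the predictors are independent. Real trees trained on the same data are correlated, so the gain is smaller in practice. All values below are made up.

```python
import numpy as np

rng = np.random.default_rng(123)

true_value = 30.0   # hypothetical "true" price
noise_std = 10.0    # spread of a single noisy predictor
n_trials = 10_000
n_models = 25

# Each row is one trial; each column is one independent, unbiased predictor.
predictions = true_value + rng.normal(0.0, noise_std, size=(n_trials, n_models))

single_var = predictions[:, 0].var()            # variance of one predictor
ensemble_var = predictions.mean(axis=1).var()   # variance of the averaged ensemble

print(f"single predictor variance: {single_var:.1f}")
print(f"ensemble of {n_models} variance: {ensemble_var:.1f}")
```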

Here we have chosen a Gradient Boosted Trees model, where an ensemble of trees is trained sequentially: each new tree is fitted to the errors (residuals) left by its predecessors, and its output is added to the collective prediction of the ensemble. Trees can grow exponentially large for large datasets with many variables. In our case the dataset is relatively small.

I am going to use the LightGBM gradient boosting framework (https://lightgbm.readthedocs.io/en/latest/). This framework is popular on Kaggle and provides a Python API, so I will be able to use it in Watson Studio.
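
The sequential idea can be sketched in a few lines of NumPy. This is a toy illustration for squared-error loss using one-split trees (stumps) on a single numeric feature, not how LightGBM is implemented. The function names, the synthetic data, and the hyperparameters are all assumptions made for this example.

```python
import numpy as np


def fit_stump(x, residuals):
    """Find the single threshold split on x that minimizes squared error."""
    best = None
    for threshold in np.unique(x)[:-1]:  # skip the max so both sides are non-empty
        left = residuals[x <= threshold]
        right = residuals[x > threshold]
        error = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or error < best[0]:
            best = (error, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)


def fit_gradient_boosting(x, y, n_estimators=100, learning_rate=0.1):
    """Each new stump is fitted to the residuals of the current ensemble."""
    base = float(y.mean())
    prediction = np.full(y.shape, base, dtype=float)
    stumps = []
    for _ in range(n_estimators):
        residuals = y - prediction  # what the ensemble still gets wrong
        stump = fit_stump(x, residuals)
        prediction += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps


def predict_gradient_boosting(model, x):
    base, learning_rate, stumps = model
    prediction = np.full(x.shape, base, dtype=float)
    for stump in stumps:
        prediction += learning_rate * predict_stump(stump, x)
    return prediction


# Synthetic data: a made-up relation between review points and price.
rng = np.random.default_rng(123)
x = rng.uniform(80, 100, size=200)
y = 10 + 0.15 * (x - 80) ** 2 + rng.normal(0, 2, size=200)

model = fit_gradient_boosting(x, y)
baseline_mse = float(np.mean((y - y.mean()) ** 2))
train_mse = float(np.mean((y - predict_gradient_boosting(model, x)) ** 2))
print(f"baseline MSE: {baseline_mse:.2f}, boosted MSE: {train_mse:.2f}")
```

The small `learning_rate` plays the same role as in the LightGBM configuration: each tree contributes only a fraction of its correction, so many trees are needed.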

In [8]:
# !pip install lightgbm
import lightgbm as lgb
from lightgbm import LGBMModel

import pickle

In [9]:
seed = 123  # fixed random seed for reproducible training

In [10]:
# Bag 75% of the rows every 9 iterations and sample 80% of the features per tree.
model2 = LGBMModel(objective="regression", metric=["mse", "rmse"], num_leaves=50,
                   learning_rate=0.01, bagging_fraction=0.75, feature_fraction=0.8,
                   bagging_freq=9, n_estimators=5000, random_state=seed)

In [11]:
with open('wine-price-prediction.model.lgbm.pickle', 'wb') as file:
    pickle.dump(model2, file)