# Keras Tutorial: Deep Learning in Python (Predict Wine Type)

This Keras tutorial introduces you to deep learning in Python: learn to preprocess your data and to model, evaluate and optimize neural networks.

## Introducing Artificial Neural Networks

### Perceptrons

The simplest neural network is the **perceptron**, which consists of a single neuron. It has a simple tree structure:
several input nodes and a single output node, which is connected to each input node.

1. Input nodes: Each input node is associated with a numerical value.
2. Connections: Each connection that departs from the input node has a weight associated with it.
3. The values of the input nodes and the weights of the connections are brought together in a weighted sum, which is passed through a function $f$:
$$
y = f\bigg(\sum_{i=1}^D w_i * x_i\bigg)
$$
4. This result will be the input for a transfer or **activation** function. The most intuitive way to think of
it is as a step function like the following:

$$
f(x) = \begin{cases}
    0 & x < 0 \\
    \frac{1}{2} & x = 0 \\
    1 & x > 0
\end{cases}
$$

Of course, this is a discontinuous function. Because of this, we choose a continuous variant, the sigmoid function.
5. As a result, you have the output node, which is associated with the function (such as the sigmoid function) of the 
weighted sum of the input nodes.
6. Lastly, the perceptron may have an additional parameter, called a bias, which you can consider as the weight associated
with an additional input node that is permanently set to 1. The bias value is critical because it allows you to shift the
activation function to the left or right, which can determine the success of your learning.
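
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than code from the tutorial: the function names `sigmoid` and `perceptron` and the example inputs and weights are made up for the demonstration.

```python
import math

def sigmoid(z):
    """Continuous activation that squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs plus the bias, passed through the sigmoid."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)

# Hypothetical example values, chosen only for illustration.
x = [1.0, 2.0]
w = [0.5, -0.25]

print(perceptron(x, w))            # weighted sum is 0, so the output is 0.5
print(perceptron(x, w, bias=2.0))  # a positive bias pushes the output towards 1
```

With the same inputs and weights, changing only the bias moves the output, which is the "shift" of the activation function described in step 6.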

### Multi-Layer Perceptrons

Networks of perceptrons are multi-layer perceptrons, also known as **feed-forward neural networks**. These are more complex networks than
the perceptron, as they consist of multiple neurons that are organized in layers. The number of layers is usually limited to two or
three. Multi-layer perceptrons are often fully connected.

Note that while the perceptron could only represent linear separations between classes, the multi-layer perceptron overcomes that limitation
and can also represent more complex decision boundaries.
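
XOR is the classic example of a problem that a single perceptron cannot solve, because no straight line separates its two classes. The sketch below hand-picks the weights (they are illustrative assumptions, not learned values) for a network with one hidden layer of two neurons, and it uses a hard threshold instead of the sigmoid to keep the arithmetic easy to follow.

```python
def step(z):
    """Hard threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0

def neuron(inputs, weights, bias):
    """A single perceptron with a step activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_network(x1, x2):
    # Hidden layer: one neuron behaves like OR, the other like AND.
    h_or = neuron([x1, x2], [1, 1], -0.5)
    h_and = neuron([x1, x2], [1, 1], -1.5)
    # Output neuron fires when OR is on but AND is off.
    return neuron([h_or, h_and], [1, -1], -0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_network(a, b))
```

Each hidden neuron still draws a single straight line; it is the combination of the two in the output neuron that produces the non-linear decision boundary.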

## Predicting Wine Types: Red or White?

### Loading The Data

In [None]:
import pandas as pd

white = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv", sep=';')
red = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", sep=";")

### Data Exploration

In [None]:
print(white.info())
print(red.info())

In [None]:
red.head()

In [None]:
white.tail()

In [None]:
red.sample(5)

In [None]:
white.describe()

In [None]:
pd.isnull(red)

### Visualizing The Data

#### Alcohol

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 2)

ax[0].hist(red.alcohol, 10, facecolor='red', alpha=0.5, label="Red wine")
ax[1].hist(white.alcohol, 10, facecolor='white', ec="black", lw=0.5, alpha=0.5, label="White wine")

fig.subplots_adjust(left=0, right=1, bottom=0, top=0.5, hspace=0.05, wspace=1)
ax[0].set_ylim([0, 1000])
ax[0].set_xlabel("Alcohol in % Vol")
ax[0].set_ylabel("Frequency")
ax[1].set_xlabel("Alcohol in % Vol")
ax[1].set_ylabel("Frequency")
ax[0].legend(loc='best')
ax[1].legend(loc='best')
fig.suptitle("Distribution of Alcohol in % Vol")

plt.show()

As the histograms show, the alcohol levels of the red and white wines are mostly the same:
most wines have around 9% of alcohol.

#### Sulfates

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 2, figsize=(8, 4))

ax[0].scatter(red['quality'], red["sulphates"], color="red")
ax[1].scatter(white['quality'], white['sulphates'], color="white", edgecolors="black", lw=0.5)

ax[0].set_title("Red Wine")
ax[1].set_title("White Wine")
ax[0].set_xlabel("Quality")
ax[1].set_xlabel("Quality")
ax[0].set_ylabel("Sulphates")
ax[1].set_ylabel("Sulphates")
ax[0].set_xlim([0,10])
ax[1].set_xlim([0,10])
ax[0].set_ylim([0,2.5])
ax[1].set_ylim([0,2.5])
fig.subplots_adjust(wspace=0.5)
fig.suptitle("Wine Quality by Amount of Sulphates")

plt.show()

As the scatter plots show, the red wine seems to contain more sulfates than the white wine, which has
fewer values above $1g/dm^3$. For the white wine, there only seem to be a couple of exceptions that fall just
above $1g/dm^3$, while the red wines have clearly more of them.

#### Acidity

In [None]:
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(570)

redlabels = np.unique(red['quality'])
whitelabels = np.unique(white['quality'])

fig, ax = plt.subplots(1, 2, figsize=(8, 4))
# One random RGBA color per quality label (6 labels for red, 7 for white)
redcolors = np.random.rand(6,4)
whitecolors = np.append(redcolors, np.random.rand(1,4), axis=0)

# Draw one scatter series per quality label so that each label gets its own legend entry
for i in range(len(redcolors)):
    redy = red['alcohol'][red.quality == redlabels[i]]
    redx = red['volatile acidity'][red.quality == redlabels[i]]
    # color= avoids the ambiguity warning that c= raises for a single RGBA row
    ax[0].scatter(redx, redy, color=redcolors[i])
for i in range(len(whitecolors)):
    whitey = white['alcohol'][white.quality == whitelabels[i]]
    whitex = white['volatile acidity'][white.quality == whitelabels[i]]
    ax[1].scatter(whitex, whitey, color=whitecolors[i])
    
ax[0].set_title("Red Wine")
ax[1].set_title("White Wine")
ax[0].set_xlim([0,1.7])
ax[1].set_xlim([0,1.7])
ax[0].set_ylim([5,15.5])
ax[1].set_ylim([5,15.5])
ax[0].set_xlabel("Volatile Acidity")
ax[0].set_ylabel("Alcohol")
ax[1].set_xlabel("Volatile Acidity")
ax[1].set_ylabel("Alcohol") 
ax[0].legend(redlabels, loc='best', bbox_to_anchor=(1.3, 1))
ax[1].legend(whitelabels, loc='best', bbox_to_anchor=(1.3, 1))
fig.suptitle("Alcohol - Volatile Acidity")
fig.subplots_adjust(top=0.85, wspace=0.7)

plt.show()

In the image above, you see that the levels that you have read about above especially hold for the white wine: most 
wines with label 8 have volatile acidity levels of 0.5 or below.

### Preprocess Data

In [None]:
red['type'] = 1
white['type'] = 0

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
wines = pd.concat([red, white], ignore_index=True)

### Correlation Matrix

In [None]:
import seaborn as sns
corr = wines.corr()
sns.heatmap(corr, 
            xticklabels=corr.columns.values,
            yticklabels=corr.columns.values)

There are some variables that correlate, such as **density** and **residual sugar**.
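
By default, `DataFrame.corr()` computes the Pearson correlation coefficient for every pair of columns. As a rough sketch of what a single cell of that matrix means, the snippet below computes the coefficient by hand with NumPy. The helper name `pearson` and the sample values are hypothetical and are not taken from the wine data.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    denominator = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    return (a_centered * b_centered).sum() / denominator

# Hypothetical measurements, made up for illustration.
sugar = [1.0, 2.0, 3.0, 4.0]
density = [0.991, 0.993, 0.994, 0.998]

print(pearson(sugar, density))             # close to 1: the values rise together
print(np.corrcoef(sugar, density)[0, 1])   # NumPy's built-in result for comparison
```

Values near 1 or -1 indicate a strong linear relationship, while values near 0 indicate little or none.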

### Train and Test Sets

In [None]:
from sklearn.model_selection import train_test_split

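# Features: the 11 physicochemical columns (quality and type are left out)
X = wines.iloc[:, 0:11]
# Target: the wine type (1 = red, 0 = white) as a flat array
y = np.ravel(wines.type)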

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [None]:
X[:5]

### Standardize The Data

In [None]:
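from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, so that no information
# from the test set leaks into the preprocessing
scaler = StandardScaler().fit(X_train)

# Apply the same training-set mean and standard deviation to both sets
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)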
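
Standardization rescales each feature to zero mean and unit variance. The following is a minimal NumPy sketch of that arithmetic on hypothetical toy values, with the mean and standard deviation estimated from the training rows and then reused for the held-out rows. It mirrors what `StandardScaler` computes but is not a replacement for it.

```python
import numpy as np

# Hypothetical toy features (rows are samples, columns are features).
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

# Statistics are estimated from the training rows only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training mean and std

print(train_scaled.mean(axis=0))  # approximately 0 for every column
print(train_scaled.std(axis=0))   # approximately 1 for every column
print(test_scaled)
```

Because the test rows are scaled with the training statistics, their mean and variance will generally not be exactly 0 and 1, which is expected.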

### Model Data

In [None]:
from keras.models import Sequential
from keras.layers import Dense 

model = Sequential()
model.add(Dense(12, activation='relu', input_shape=(11,)))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

In [None]:
model.output_shape
model.summary()
model.get_config()
model.get_weights()
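
The parameter counts reported by `model.summary()` can be checked by hand: a fully connected layer has one weight for every input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name introduced only for this check.

```python
def dense_params(n_inputs, n_units):
    """Weights (one per input-unit pair) plus biases (one per unit)."""
    return n_inputs * n_units + n_units

# Layer sizes as defined in the model above: 11 inputs -> 12 -> 8 -> 1.
layer_sizes = [11, 12, 8, 1]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [144, 104, 9]
print(sum(counts))  # 257
```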

### Compile and Fit

In [None]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=20, batch_size=1, verbose=1)
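
To give a feel for what the `binary_crossentropy` loss measures, here is a small plain-Python sketch of the formula. The function below is an illustration, not the Keras implementation; the clipping constant `eps` and the sample values are assumptions made for the example.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical labels and predicted probabilities.
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
unsure = [0.5, 0.5, 0.5, 0.5]

print(binary_crossentropy(labels, confident))  # small loss: predictions are close to the labels
print(binary_crossentropy(labels, unsure))     # log(2), roughly 0.693
```

The loss shrinks as the predicted probabilities move towards the true labels, which is what the optimizer tries to achieve during `fit`.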

In [None]:
y_pred = model.predict(X_test)

In [None]:
y_pred[:5]

### Evaluate Model

In [None]:
score = model.evaluate(X_test, y_test, verbose=1)
print(score)

Note that the data is somewhat imbalanced: white wines are far more abundant than red ones. A high accuracy might
therefore just reflect the class distribution, since a model that always predicts white would already be right most of the time!
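
A quick way to put the accuracy in perspective is to compute the score of a trivial classifier that always predicts white. The sketch below assumes the row counts listed for the two UCI wine-quality files (1,599 red and 4,898 white); `white.info()` and `red.info()` show the counts for the data that was actually loaded.

```python
# Row counts as listed for the two UCI wine-quality files (treated here as an assumption).
n_red = 1599
n_white = 4898

# A "model" that always answers white (type 0) is right for every white wine.
baseline_accuracy = n_white / (n_red + n_white)
print(round(baseline_accuracy, 3))  # 0.754
```

Any accuracy reported by the network is only informative to the extent that it exceeds this baseline.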

In [None]:
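from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score, cohen_kappa_score

# model.predict returns sigmoid probabilities; threshold them at 0.5 to get class labels
y_pred_labels = (y_pred > 0.5).astype(int).ravel()

confusion_matrix(y_test, y_pred_labels)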
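
Precision, recall and the F1 score can all be derived from the four cells of a binary confusion matrix. The sketch below shows the arithmetic on hypothetical counts (they are not results from this model), treating red wine (type 1) as the positive class. The helper name `binary_metrics` is made up for the example.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Hypothetical counts, chosen only for illustration.
precision, recall, f1, accuracy = binary_metrics(tp=40, fp=10, fn=20, tn=130)
print(precision, recall, f1, accuracy)
```

In this made-up case the accuracy (0.85) looks better than the recall (about 0.67), which is the kind of gap that an imbalanced dataset can hide.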