# Loading the dataset

In [1]:
!cat ../data/winequality.names

Citation Request:
  This dataset is public available for research. The details are described in [Cortez et al., 2009]. 
  Please include this citation if you plan to use this database:

  P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. 
  Modeling wine preferences by data mining from physicochemical properties.
  In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

  Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016
                [Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf
                [bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib

1. Title: Wine Quality 

2. Sources
   Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009
   
3. Past Usage:

  P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. 
  Modeling wine preferences by data mining from physicochemical properties.
  In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 016

In [2]:
!head ../data/winequality-red.csv

"fixed acidity";"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"
7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
7.8;0.88;0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5
7.8;0.76;0.04;2.3;0.092;15;54;0.997;3.26;0.65;9.8;5
11.2;0.28;0.56;1.9;0.075;17;60;0.998;3.16;0.58;9.8;6
7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5
7.4;0.66;0;1.8;0.075;13;40;0.9978;3.51;0.56;9.4;5
7.9;0.6;0.06;1.6;0.069;15;59;0.9964;3.3;0.46;9.4;5
7.3;0.65;0;1.2;0.065;15;21;0.9946;3.39;0.47;10;7
7.8;0.58;0.02;2;0.073;9;18;0.9968;3.36;0.57;9.5;7


In [3]:
data = dlmread("../data/winequality-red.csv", ";", 1, 0); % skip the header row of feature names
size(data)

ans =

   1599     12



# Splitting into training and test sets

In [4]:
X = data(:,1:11); % inputs
Y = data(:, 12);  % labels

In [5]:
X_train = X(1:1119, :); % 70% for training
Y_train = Y(1:1119);

X_test = X(1120:1599, :);  % 30% for test
Y_test = Y(1120:1599);
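
The split above takes the first 1119 rows in file order. If the rows are ordered in some systematic way (an assumption, the notebook does not check this), a shuffled split may be preferable. The sketch below shows one way to do that with `randperm`; the small `data` matrix and the `_s` variable names are stand-ins chosen so the cell runs on its own.

```octave
% Stand-in for the wine matrix: 10 rows, last column plays the role of the label
data = [(1:10)', (101:110)'];

M_total = size(data, 1);
idx = randperm(M_total);            % random permutation of the row indices
n_train = floor(0.7 * M_total);     % about 70% of the rows go to training

train_rows = data(idx(1:n_train), :);
test_rows  = data(idx(n_train+1:end), :);

X_train_s = train_rows(:, 1:end-1); % inputs
Y_train_s = train_rows(:, end);     % labels
X_test_s  = test_rows(:, 1:end-1);
Y_test_s  = test_rows(:, end);
```

Every row ends up in exactly one of the two sets, and inputs stay aligned with their labels because whole rows are moved together.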

# ANN Architecture

![Diagram](../Diagram.png)

In [6]:
% M: number of labeled inputs
% N: number of features (length of the input vector)
[M, N] = size(X_train)

M =  1119
N =  11


In [7]:
O = 10 % number of neurons in the hidden layer

O =  10


In [8]:
% initial weights matrix as small random values
% one extra column per neuron for the bias input
W = rand(O, N+1).*0.01; % W: Ox(N+1)

In [9]:
% adding a leading column of ones for the bias (one sample per row)
X_train_bias = [ones(size(X_train,1),1), X_train]; % X_train_bias: Mx(N+1)

## ANN function:

Feedforward output:


$ \mathbf{u} = \mathbf{Wx} $  
$ \mathbf{a} = f(\mathbf{u}) = \tanh(\mathbf{u})$  
$ y = \sum \mathbf{a}$

$$
    y = \sum \tanh{(\mathbf{Wx})}
$$
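
As an illustration of the feedforward equations, here is a minimal sketch for a single input. The values of `x` and `W` are made up for the example (two hidden neurons, a bias entry plus two features); they are not taken from the wine data.

```octave
% One input with a leading 1 for the bias (hypothetical values)
x = [1, 0.5, -0.2];          % 1 x (N+1)

% Hypothetical weights: O = 2 neurons, N+1 = 3 inputs
W = [ 0.1, 0.2, 0.3;
     -0.1, 0.0, 0.4];        % O x (N+1)

u = W * x';                  % pre-activations u = Wx, O x 1
a = tanh(u);                 % hidden-layer outputs a = tanh(u), O x 1
y = sum(a);                  % network output y = sum of a, scalar
```

Here `x` is stored as a row and transposed, so that `W * x'` matches $\mathbf{Wx}$ with $\mathbf{x}$ as a column vector.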

## Backpropagation


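Gradient descent update for minimizing the error function $J$, where the superscript $(k)$ is the iteration index and $\alpha$ the learning rate:

$$
\mathbf{W}^{(k+1)} = \mathbf{W}^{(k)} - \alpha \nabla{J} =  \mathbf{W}^{(k)} - \alpha \frac{\partial{J}}{\partial{\mathbf{W}}}
$$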

Defining the error function as the squared error, where $s$ is the target label and $y$ is the network output:

$$
J = e^2 = (s-y)^2
$$

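Applying the chain rule to find the gradient of $J$:

$$
\frac{\partial{J}}{\partial{\mathbf{W}}} = \frac{\partial{J}}{\partial{e}} \frac{\partial{e}}{\partial{y}} \frac{\partial{y}}{\partial{\mathbf{a}}} \frac{\partial{\mathbf{a}}}{\partial{\mathbf{u}}} \frac{\partial{\mathbf{u}}}{\partial{\mathbf{W}}} = (2e) \, (-1) \, \mathbf{\dot{F}(u)} \, \mathbf{1} \, \mathbf{x}^\top = \boldsymbol{\delta} \, \mathbf{x}^\top
$$

Here $\boldsymbol{\delta} = -2e \, \mathbf{\dot{F}(u)} \, \mathbf{1}$ is an $O \times 1$ vector, $\mathbf{1}$ is the $O \times 1$ vector of ones (the derivative of the sum $y$ with respect to $\mathbf{a}$), and the diagonal matrix $\mathbf{\dot{F}(u)}$ is defined as: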

$$
\mathbf{\dot{F}(u)} = 
\begin{bmatrix}
  \dot{f}(u_1) & 0 & \cdots  & 0 \\
  0 & \dot{f}(u_2) & \cdots  & 0 \\
  \vdots   & \vdots & \ddots & \vdots \\
  0 & 0 & \cdots  & \dot{f}(u_O) \\
\end{bmatrix}
$$

and:

$\dot{f}(u) = \frac{d \tanh(u)}{du} = \operatorname{sech}^2(u)$
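
The derivative above can be checked numerically. The following sketch compares $\operatorname{sech}^2(u)$ with a central finite difference of $\tanh$ on a few sample points; the grid and the step `h` are arbitrary choices made for the example.

```octave
u = linspace(-2, 2, 9);                            % sample points (arbitrary)
h = 1e-6;                                          % finite-difference step (arbitrary)

numeric  = (tanh(u + h) - tanh(u - h)) / (2*h);    % central difference of tanh
analytic = sech(u).^2;                             % closed-form derivative

max_gap = max(abs(numeric - analytic));            % largest disagreement
```

A small `max_gap` indicates that the closed form and the numerical estimate agree on these points.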

Error metric:

$$
MSE = \frac{1}{M}\sum_{m=1}^{M} e_m^2 = \frac{1}{M}\sum_{m=1}^{M} (s_m - y_m)^2
$$
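
As a sketch of how this metric could be evaluated over a whole set at once, the cell below runs the feedforward pass on every row of a small made-up matrix and averages the squared errors. `Xb`, `s`, and `W` are hypothetical stand-ins, not the notebook's variables.

```octave
% Made-up inputs: M = 3 samples, each with a leading 1 for the bias
Xb = [1,  0.5, -0.2;
      1, -1.0,  0.3;
      1,  0.0,  0.8];          % M x (N+1)
s  = [0.2; -0.1; 0.5];         % made-up targets, M x 1

W = [ 0.1, 0.2, 0.3;
     -0.1, 0.0, 0.4];          % hypothetical weights, O x (N+1)

Y   = sum(tanh(Xb * W'), 2);   % network output for every sample, M x 1
e   = s - Y;                   % per-sample errors
MSE = (e' * e) / numel(e);     % mean of the squared errors
```

`Xb * W'` computes the pre-activations of all samples in one product, so no explicit loop over samples is needed for evaluation.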

In [10]:
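function Delta = get_delta(u, s, y)
    % u: pre-activations of the hidden layer (O values)
    % s: target label, y: network output
    O = numel(u);

    % diagonal matrix with f'(u_i) = sech^2(u_i) on the diagonal
    F_prime = diag(sech(u(:)).^2); % F_prime: OxO

    % delta = -2 * e * F'(u) * 1, with e = s - y
    Delta = -2*(s-y)*F_prime*ones(O,1); % Delta: Ox1
end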
                
                

In [11]:
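function [W, MSE] = backprop_step(x, d, W, alpha, epsilon)
    % one step of backpropagation (one input x)
    % x: 1x(N+1) input row including the bias entry, d: target label
    % W: Ox(N+1) weights, returned after the update
    % epsilon is not used inside a single step

    % feedforward
    u = x*W';      % u: 1xO
    a = tanh(u);   % output of hidden layer a: 1xO
    y = sum(a);    % y: 1x1

    % backpropagation
    e = d - y;                  % error e: 1x1
    Delta = get_delta(u, d, y); % Delta: Ox1
    grad = Delta*x;             % gradient of J w.r.t. W: Ox(N+1)
    W = W - alpha.*grad;

    % metric: squared error of this sample
    MSE = e^2;
end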
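
One way to gain confidence in a hand-derived gradient is to compare it with finite differences of $J$. The sketch below does this for the single-hidden-layer network described in this notebook, computing the analytic gradient as $\boldsymbol{\delta}\,\mathbf{x}^\top$ with $\boldsymbol{\delta} = -2e\,\mathbf{\dot{F}(u)}\,\mathbf{1}$. All numeric values (`x`, `W`, `s`, `h`) are made up for the check.

```octave
x = [1, 0.5, -0.2];                       % 1 x (N+1), hypothetical input with bias
W = [ 0.1, 0.2, 0.3;
     -0.1, 0.0, 0.4];                     % O x (N+1), hypothetical weights
s = 0.7;                                  % hypothetical target

% squared error as a function of the weight matrix
J = @(Wm) (s - sum(tanh(Wm * x')))^2;

% analytic gradient: delta * x
u = W * x';                               % O x 1
e = s - sum(tanh(u));                     % scalar error
delta = -2 * e * sech(u).^2;              % O x 1, same as -2*e*F'(u)*ones
grad_analytic = delta * x;                % O x (N+1)

% numeric gradient: central difference on every weight
h = 1e-6;
grad_numeric = zeros(size(W));
for k = 1:numel(W)
    W_plus  = W;  W_plus(k)  = W_plus(k)  + h;
    W_minus = W;  W_minus(k) = W_minus(k) - h;
    grad_numeric(k) = (J(W_plus) - J(W_minus)) / (2*h);
end

max_gap = max(abs(grad_analytic(:) - grad_numeric(:)));
```

If `max_gap` is not close to zero, the sign or the shape of the analytic gradient is a likely place to look first.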

In [12]:
backprop()

error: 'backprop' undefined near line 1 column 1
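
The call fails because no function named `backprop` has been defined in the notebook; only the single-sample `backprop_step` exists. A training routine would need to repeat that step over all samples for several epochs. The sketch below is one possible shape for such a loop, written inline so it runs on its own. The data, the number of neurons `O`, the learning rate `alpha`, and `n_epochs` are all hypothetical values, not settings from the original notebook.

```octave
% Made-up stand-in data: 4 samples, bias column plus 2 features
Xb = [1,  0.2, -0.1;
      1, -0.3,  0.4;
      1,  0.5,  0.5;
      1, -0.6, -0.2];                      % M x (N+1)
d  = [0.3; -0.2; 0.6; -0.5];               % targets, M x 1

O = 3;                                     % hidden neurons (hypothetical)
alpha = 0.05;                              % learning rate (hypothetical)
n_epochs = 200;                            % passes over the data (hypothetical)
W = 0.01 * rand(O, size(Xb, 2));           % small random initial weights

mse_history = zeros(n_epochs, 1);
for epoch = 1:n_epochs
    sq_err = 0;
    for m = 1:size(Xb, 1)
        x = Xb(m, :);                      % one sample, 1 x (N+1)
        u = W * x';                        % pre-activations, O x 1
        y = sum(tanh(u));                  % network output
        e = d(m) - y;                      % error for this sample
        delta = -2 * e * sech(u).^2;       % O x 1
        W = W - alpha * (delta * x);       % gradient descent update
        sq_err = sq_err + e^2;
    end
    mse_history(epoch) = sq_err / size(Xb, 1);   % MSE seen during this epoch
end
```

Inspecting `mse_history` (for example with `plot(mse_history)`) gives a quick view of whether the error is decreasing over the epochs.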
