# Table of Contents
 <p><div class="lev1 toc-item"><a href="#An-Introduction-to-Torch-Framework" data-toc-modified-id="An-Introduction-to-Torch-Framework-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>An Introduction to Torch Framework</a></div><div class="lev2 toc-item"><a href="#Context" data-toc-modified-id="Context-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Context</a></div><div class="lev2 toc-item"><a href="#In-a-nutshell-:" data-toc-modified-id="In-a-nutshell-:-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>In a nutshell :</a></div><div class="lev1 toc-item"><a href="#Implementations" data-toc-modified-id="Implementations-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Implementations</a></div><div class="lev2 toc-item"><a href="#Loading-the-required-modules" data-toc-modified-id="Loading-the-required-modules-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Loading the required modules</a></div><div class="lev2 toc-item"><a href="#Initiallizing-default-parameters" data-toc-modified-id="Initiallizing-default-parameters-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Initiallizing default parameters</a></div><div class="lev2 toc-item"><a href="#Linear-Models---Overview" data-toc-modified-id="Linear-Models---Overview-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Linear Models - Overview</a></div><div class="lev3 toc-item"><a href="#Linear-Models-:-definition" data-toc-modified-id="Linear-Models-:-definition-2.3.1"><span class="toc-item-num">2.3.1&nbsp;&nbsp;</span>Linear Models : definition</a></div><div class="lev3 toc-item"><a href="#Fitting-linear-model" data-toc-modified-id="Fitting-linear-model-2.3.2"><span class="toc-item-num">2.3.2&nbsp;&nbsp;</span>Fitting linear model</a></div><div class="lev2 toc-item"><a href="#Linear-Regression-for-Classification-using-Torch" data-toc-modified-id="Linear-Regression-for-Classification-using-Torch-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Linear Regression for Classification using Torch</a></div><div class="lev3 toc-item"><a href="#Batch-Gradient-Descent" data-toc-modified-id="Batch-Gradient-Descent-2.4.1"><span class="toc-item-num">2.4.1&nbsp;&nbsp;</span>Batch Gradient Descent</a></div>

# An Introduction to Torch Framework

## Context

* **Institution** Pierre & Marie Curie Sorbonne University
* **Master DAC** Machine Learning and Deep Learning
* **Professors** Patrick Gallinari, Ludovic Denoyer, Nicolas Baskiotis
* **Students** : Youcef Benyettou & David Panou

This notebook follows the instructions given in _"Guide LUA"_ written by **Prof. Denoyer** and **Prof. Baskiotis** at UPMC.
Its intent is to allow the newcomers to get into LUA parametric models.

## In a nutshell : 
1. A loss can :
    * Can calculate its value by comparing the output for a given input 
    * Can compute its derivative for a given entry data point
2. A module can :
    * Can compute it's output for a given input (using its forward method)
    * Can update its gradient using its entry value and is delta (i.e the derivative of its error for a given output which is transmitted by the following module  loss)
    * Can compute its delta (the derivative of its error compared to ..)


# Implementations

## Loading the required modules

In [40]:
require 'nn'
require 'gnuplot'
require 'tools'

## Initiallizing default parameters

In [166]:
-- 1: Create a artificial dataset
DIMENSION=2 -- Entry dataset features dimensions
n_points=1000 -- Number of samples
  
-- We create two gaussians
mean_positive=torch.Tensor(DIMENSION):fill(1); 
var_positive=1.0

mean_negative=torch.Tensor(DIMENSION):fill(-1); 
var_negative=1.0

xs=torch.Tensor(n_points,DIMENSION)
ys=torch.Tensor(n_points,1)
for i=1,n_points/2 do  xs[i]:copy(torch.randn(DIMENSION)*var_positive+mean_positive); ys[i][1]=1 end
for i=n_points/2+1,n_points do xs[i]:copy(torch.randn(DIMENSION)*var_negative+mean_negative); ys[i][1]=-1 end
   

## Linear Models - Overview

### Linear Models : definition
A linear model is defined by the following equation :

$$ \hat{Y} = \hat{\beta_{0}} + \sum_{j=1}^{p}X_{j}*\hat{\beta_{j}}$$

With the $\beta$ vector being the model parameters.  This can be noted by the following vector product  : $$ \hat{Y} = \beta^T X$$ 

By design, linear models makes **huge** structural assumption about the data at hand. Thus sometimes, prediction power can be lowered but is very stable.

Such a discussion is about the **bias-variance dilemma** that is reminded at the end of this notebook.

### Fitting linear model

Linear models are usually fitted using the parameter combination that minimize the **Residual Sum of Square** _(RSS)_ parameter.

$$ RSS(\beta) = \sum_{i=1}^{N}(y_{i}-x^{T}_{i}\beta)^2$$

## Linear Regression for Classification using Torch

In [205]:
-- 2 : Creation of the model
model = nn.Linear(2,1)
criterion = nn.MSECriterion()
model:zeroGradParameters() -- Here, we are setting the cummulative of the gradients parameters to be equal to 0
model:reset() -- I don't know what this command is for but it was in the TP TODO : Look for it

### Batch Gradient Descent

Batch Gradient Descent is an optimization method that is using several simple points of the data to optimize a given Loss function, as opposed to Stochastic Gradient Descent that only use single-sample points.

**Pro**

The benefit of batch gradient descent is that the trajectory of the weight vector is smoothed, compared to that in corresponding single-sample algorithms , since at each update the full set of misclassified patterns is used, the local statistical variations in the misclassified patterns tend to cancel while the large-scale trend does not.

Thus, if the samples are linearly separable, all of the possible correction vectors form a linearly separable set, and if η(k) satisfies. The sequence of weight vectors produced by the gradient descent algorithm for will always converge to a solution vector.

**Cons**

On the other side, if the algorithm is to be used in an online (and by online, we mean *REAL* online) manner, then fixed size batch can't be used since frequency of new samples arrival is to be determined.

It has to be reminded that Batch doesn't treat data points has a TimeSeries but in a iid manner.

In [223]:
-- 3 : Learning Loop
learning_rate= torch.random(1,10)/10000 -- We put a random weight parameter in this.
maxEpoch = 50
all_losses={}
all_losses["stochastic"]= {}
all_losses["batch"]= {}
model:zeroGradParameters()

for iteration=1,maxEpoch do
    loss=0
    model:forward(xs)
    loss=criterion:forward(output,ys)
    model:backward(xs,criterion:backward(output,ys))
    updated_parameters = model:updateParameters(learning_rate)
    all_losses['batch'][iteration] = loss
end

--gnuplot.plot(torch.Tensor(all_losses['stochastic'])) 
gnuplot.plot(torch.Tensor(all_losses['batch'])) 

In [221]:
all_losses

{
  batch : 
    {


      1 : 3.5892947504199
      2 : 3.5854547499488
      3 : 3.5777839570721
      4 : 3.5663007657187
      5 : 3.5510327113318
      6 : 3.5320164044777
      7 : 3.5092974425746
      8 : 3.4829302999542


      9 : 3.4529781965211
      10 : 3.4195129453283
      11 : 3.3826147794355
      12 : 3.3423721584706
      13 : 3.2988815553588
      14 : 3.2522472237377
      15 : 3.2025809466173
      16 : 3.1500017668943
      17 : 3.0946357003697
      18 : 3.0366154319642
      19 : 2.9760799958658
      20 : 2.9131744403792
      21 : 2.8480494782906
      22 : 2.780861123589
      23 : 2.7117703154234
      24 : 2.6409425302035
      25 : 2.5685473827806
      26 : 2.4947582176728
      27 : 2.4197516913208
      28 : 2.3437073463836
      29 : 2.2668071791025
      30 : 2.1892352007785
      31 : 2.1111769944207


      32 : 2.0328192676386
      33 : 1.9543494028559
      34 : 1.8759550059339
      35 : 1.7978234542925
      36 : 1.7201414456227
      37 : 1.6430945482768
      38 : 1.5668667544252
      39 : 1.491640037057
      40 : 1.4175939118943
      41 : 1.3449050052778
      42 : 1.2737466290676
      43 : 1.2042883635843
      44 : 1.1366956495986
      45 : 1.0711293903527


      46 : 1.0077455645754
      47 : 0.94669485142475
      48 : 0.88812226826195
      49 : 0.83216682213306
      50 : 0.77896117579735
    }
  stochastic : table: 0x0f1ed8d0
}


In [48]:
Y = torch.Tensor(50,1)
for i=1,50 do Y[i]=i end
x = torch.Tensor(50,1)

In [87]:
gnuplot.plot(Y,)

...rs/david/torch/install/share/lua/5.1/gnuplot/gnuplot.lua:613: bad argument #2 to 'format' (number expected, got userdata)
stack traceback:
	[C]: in function 'format'
	...rs/david/torch/install/share/lua/5.1/gnuplot/gnuplot.lua:613: in function 'gnuplot_string'
	...rs/david/torch/install/share/lua/5.1/gnuplot/gnuplot.lua:835: in function 'gnulplot'
	...rs/david/torch/install/share/lua/5.1/gnuplot/gnuplot.lua:976: in function 'f'
	[string "local f = function() return gnuplot.plot(Y,to..."]:1: in main chunk
	[C]: in function 'xpcall'
	/Users/david/torch/install/share/lua/5.1/itorch/main.lua:179: in function </Users/david/torch/install/share/lua/5.1/itorch/main.lua:143>
	/Users/david/torch/install/share/lua/5.1/lzmq/poller.lua:75: in function 'poll'
	/Users/david/torch/install/share/lua/5.1/lzmq/impl/loop.lua:307: in function 'poll'
	/Users/david/torch/install/share/lua/5.1/lzmq/impl/loop.lua:325: in function 'sleep_ex'
	/Users/david/torch/install/share/lua/5.1/lzmq/impl/loop.lua:370: in function 'start'
	/Users/david/torch/install/share/lua/5.1/itorch/main.lua:350: in main chunk
	[C]: in function 'require'
	(command line):1: in main chunk
	[C]: at 0x010eaa4bd0: 