> “For the simplicity on this side of complexity, I wouldn't give you a fig. But for the simplicity on the other side of complexity, for that I would give you anything I have.” 

Oliver Wendell Holmes, Jr. 

# Theano - introduction

_Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently._

- Linear algebra
- Symbolic differentiation
- Optimized for GPUs
- Parallel execution over shared memory
- About 1.8x as fast as NumPy on a CPU (speed-ups vary with the workload)
- About 11x as fast as NumPy on a GPU

See http://deeplearning.net/software/theano

### Installation

To install Theano in your current Anaconda environment run:

$ pip install theano

### Load all required libraries

In [25]:
import numpy         as np
import itertools     as it
import theano        as th
import theano.tensor as T
th.__version__

'0.8.2'

### Symbolic mathematics: arithmetic

Symbolic mathematics works with mathematical expressions and formulas that contain variables. For example, $x^2 + 3x$ is an expression and $y = x^2 + 3x$ is a formula. This is distinct from numerical mathematics, which computes with specific numbers rather than variables.

In particular, Theano can compute symbolic _derivatives_ and _gradients_ of these symbolic expressions.

Below we step through the details to demonstrate symbolic mathematics in Theano.

For the first example, create _scalar_ variables `x`, `y` and `z`. 

In [147]:
x = T.scalar()
y = T.scalar()
z = T.scalar()

Store in `result` an expression in variables `x`, `y` and `z`. 

In [148]:
result = x * y + z

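The variable `result` holds a symbolic expression; together with the assignment `result = x * y + z` it can also be read as a formula. No arithmetic is performed at this point.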

Compile this expression into a function and run this function with numeric input.

In [150]:
result_function = th.function(inputs=[x, y, z],
                              outputs=result)

result_function(3,2,1)

array(7.0)
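Conceptually, `result` is not a number but a graph of operations that Theano compiles and then evaluates. The toy sketch below illustrates that build-then-evaluate idea in plain Python; the classes `Var`, `Add` and `Mul` are hypothetical and are not how Theano is implemented internally.

```python
class Var:
    """A named placeholder, similar in spirit to a Theano scalar variable."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, env):
        # Look up the numeric value bound to this variable's name.
        return env[self.name]

    # Operator overloading builds graph nodes instead of computing numbers.
    def __add__(self, other):
        return Add(self, other)

    def __mul__(self, other):
        return Mul(self, other)


class Add(Var):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, env):
        return self.left.evaluate(env) + self.right.evaluate(env)


class Mul(Var):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, env):
        return self.left.evaluate(env) * self.right.evaluate(env)


x, y, z = Var("x"), Var("y"), Var("z")
result = x * y + z                                # builds a graph; nothing is computed yet
print(result.evaluate({"x": 3, "y": 2, "z": 1}))  # 3 * 2 + 1 = 7
```

Theano does much more at compile time (graph optimization and C or GPU code generation), but the split between building an expression and evaluating it is the same.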

Theano can also create symbolic _vectors_ (in addition to creating scalar variables as above). 

Create vector variables `x` and `y`. 

In [209]:
x = T.dvector()
y = T.dvector()

This function returns the vector which is the element-wise sum of the input vectors.

In [210]:
vec_result = x + y
f = th.function(inputs=[x, y],
                outputs=vec_result)
f(np.array([1,2,3]),
  np.array([4,5,6]))

array([ 5.,  7.,  9.])

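Notice that the length of the input vectors `x` and `y` is not specified, so the same compiled function accepts vectors of any length (the two inputs must still have equal lengths).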

In [211]:
f(np.array([1,2,3,-1,-2,-3]),
  np.array([5,6,7,8,9,10]))

array([  6.,   8.,  10.,   7.,   7.,   7.])
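For comparison, plain NumPy computes the same element-wise sum directly, which is handy for checking the output of a compiled Theano function. NumPy requires the two inputs to have matching lengths and raises `ValueError` otherwise.

```python
import numpy as np

a = np.array([1, 2, 3, -1, -2, -3], dtype=float)
b = np.array([5, 6, 7, 8, 9, 10], dtype=float)
total = a + b  # element-wise: 1+5, 2+6, ..., -3+10

# Vectors of different lengths cannot be added element-wise.
try:
    np.array([1.0, 2.0, 3.0]) + np.array([1.0, 2.0])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(total, mismatch_raised)
```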

The dot product is a fundamental operation in linear algebra and so in machine learning. 

This function returns the _dot product_ of the input vectors. 

In [212]:
vec_result = x.dot(y)
f = th.function(inputs=[x, y],
                outputs=vec_result)
f(np.array([1,2,3]),
  np.array([4,5,6]))

array(32.0)
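The dot product multiplies corresponding elements and adds up the products. A minimal pure-Python sketch (the helper `dot` is hypothetical, written only for illustration):

```python
def dot(u, v):
    """Return the dot product of two equal-length sequences of numbers."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    # Multiply corresponding elements and sum the products.
    return sum(ui * vi for ui, vi in zip(u, v))

print(dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```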

Theano can also create symbolic matrices, which are two-dimensional arrays (with rows and columns).

Create _matrix_ variables `x` and `y`. 

In [155]:
x = T.dmatrix()
y = T.dmatrix()

This function returns the matrix which is the element-wise sum of the input matrices.

In [156]:
mat_result = x + y
f = th.function(inputs=[x, y],
                outputs=mat_result)

In [157]:
f(np.array([[1,2,3],[10,20,30]]),
  np.array([[4,5,6],[40,50,60]]))

array([[  5.,   7.,   9.],
       [ 50.,  70.,  90.]])

Notice that the shape of the input matrices, `x` and `y`, is not specified.

In [158]:
f(np.array([[1,2],[10,20],[100,200]]),
  np.array([[4,5],[40,50],[400,500]]))

array([[   5.,    7.],
       [  50.,   70.],
       [ 500.,  700.]])

The following matrices have a single row.

In [99]:
f(np.array([[1,2,3]]),
  np.array([[4,5,6]]))

array([[ 5.,  7.,  9.]])

This function returns the matrix which is the element-wise product of the input matrices.

In [213]:
mat_mult = x * y
f = th.function(inputs=[x, y],
                outputs=mat_mult)
f(np.array([[1,2],[3,4]]),
  np.array([[5,6],[7,8]]))

array([[  5.,  12.],
       [ 21.,  32.]])
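The operator `*` is element-wise, which is a different operation from the matrix product computed next. A short NumPy sketch contrasting the two on the same inputs (`@` is NumPy's matrix-multiplication operator):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b   # multiply entries in matching positions: [[5, 12], [21, 32]]
matrix_prod = a @ b   # rows of a dotted with columns of b:     [[19, 22], [43, 50]]

print(elementwise)
print(matrix_prod)
```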

This function returns the matrix which is the _matrix product_ (using the _dot product_) of the input matrices.

The multiplication implemented in the following example is explained, step by step, in [this example](https://www.mathsisfun.com/algebra/matrix-multiplying.html).

In [214]:
mat_mult = x.dot(y)
f = th.function(inputs=[x, y],
                outputs=mat_mult)

In [218]:
np.array([[1,2,3],[4,5,6]])

array([[1, 2, 3],
       [4, 5, 6]])

In [219]:
np.array([[7,8],[9,10],[11,12]])

array([[ 7,  8],
       [ 9, 10],
       [11, 12]])

In [217]:
f(np.array([[1,2,3],[4,5,6]]),
  np.array([[7,8],[9,10],[11,12]]))

array([[  58.,   64.],
       [ 139.,  154.]])
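Each entry of the matrix product is the dot product of a row of the first matrix with a column of the second, so the inner dimensions must agree (here 3). A pure-Python sketch with a hypothetical helper `matmul`:

```python
def matmul(A, B):
    """Multiply matrices given as lists of rows; A is n x m, B is m x p."""
    n, m, p = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("columns of A must equal rows of B")
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

print(matmul([[1, 2, 3], [4, 5, 6]],
             [[7, 8], [9, 10], [11, 12]]))  # [[58, 64], [139, 154]]
```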

See [Baby Steps - Algebra](http://deeplearning.net/software/theano/tutorial/adding.html) in the Theano documentation for more examples and details on:

- data types
- tensors (multi-dimensional arrays)
- 32- and 64-bit variables

### Symbolic mathematics: differentiation and the gradient

Create a symbolic scalar variable `x`; below we differentiate an expression in `x`.

In [221]:
x = T.scalar()

Store in `y` the expression `x^2 + 3x + 4` (in the variable `x`).

In [222]:
y = x ** 2 + 3*x + 4

Take the derivative of `y` with respect to `x`, which is `2x + 3`.

In [223]:
dydx = T.grad(y,x)

Create a function from the expression in `dydx`.

In [224]:
f_dydx = th.function(inputs=[x],
                     outputs=dydx)

f_dydx(2)

array(7.0)
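A quick numerical sanity check of a symbolic derivative is the central difference $(f(x+h) - f(x-h)) / 2h$ for a small step $h$. A sketch (the step size `h = 1e-6` is an arbitrary choice):

```python
def y(x):
    return x**2 + 3*x + 4

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by the slope of the secant through x - h and x + h."""
    return (f(x + h) - f(x - h)) / (2 * h)

print(central_difference(y, 2.0))  # close to 7.0 = 2*2 + 3
```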

Create scalar variables `x` and `y`. Below we build an expression in `x` and `y` and take its partial derivatives.

In [225]:
x = T.dscalar()
y = T.dscalar()

Create an expression `z`, which is `x^3 + y^2 + xy`. 

In [226]:
z = x**3 + y**2 + x*y

Create the (partial) derivative of `z` with respect to `x`, which is `3x^2 + y`.

In [231]:
dzdx = T.grad(z,x)
f_dzdx = th.function(inputs=[x,y],
                     outputs=dzdx)
f_dzdx(2,3)

array(15.0)

Create the (partial) derivative of `z` with respect to `y`, which is `2y + x`.

In [232]:
dzdy = T.grad(z,y)
f_dzdy = th.function(inputs=[x,y],
                     outputs=dzdy)
f_dzdy(2,3)

array(8.0)
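The same central-difference check applies to partial derivatives: perturb one variable while holding the other fixed. A sketch for $z = x^3 + y^2 + xy$ at the point $(2, 3)$:

```python
def z(x, y):
    return x**3 + y**2 + x*y

def partial_derivatives(f, x, y, h=1e-6):
    """Central-difference estimates of (df/dx, df/dy) at the point (x, y)."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)  # y held fixed
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)  # x held fixed
    return dfdx, dfdy

print(partial_derivatives(z, 2.0, 3.0))  # close to (15.0, 8.0)
```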

Compute both partial derivatives of `z` at once by passing the list `[x, y]` to `T.grad`. The result is the vector function
$\left(3x^2 + y,\ 2y + x\right)$,
which evaluates to a vector for any pair of values of $x$ and $y$.

In [233]:
dzdxdy = T.grad(z,[x,y])
f_dzdxdy = th.function(inputs=[x,y],
                       outputs=dzdxdy)
f_dzdxdy(2,3)

[array(15.0), array(8.0)]

__Alert! Don't miss this!__

The vector $(15,8)$ is the direction of greatest increase of the function $z(x,y) = x^3 + y^2 + xy$ at the point $(2,3)$.

The function `f_dzdxdy` returns the direction of greatest increase of $z(x,y) = x^3 + y^2 + xy$ at any point specified by the input values of `x` and `y`.

The vector function $\nabla z = \left(3x^2 + y,\ 2y + x\right)$ is called the _gradient_ of the function $z = x^3 + y^2 + xy$.
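One way to see the "greatest increase" claim numerically: estimate the rate of change of $z$ along many unit directions at $(2, 3)$ and keep the largest. The best direction should line up with the gradient $(15, 8)$, and the largest rate should be its length $\sqrt{15^2 + 8^2} = 17$. A brute-force sketch (sampling 3600 directions is an arbitrary choice):

```python
import math

def z(x, y):
    return x**3 + y**2 + x*y

px, py, h = 2.0, 3.0, 1e-6
n_dirs = 3600
best_angle, best_rate = None, -math.inf
for k in range(n_dirs):
    theta = 2 * math.pi * k / n_dirs
    ux, uy = math.cos(theta), math.sin(theta)  # unit direction
    # Central-difference estimate of the directional derivative along (ux, uy).
    rate = (z(px + h*ux, py + h*uy) - z(px - h*ux, py - h*uy)) / (2 * h)
    if rate > best_rate:
        best_angle, best_rate = theta, rate

print(best_angle, math.atan2(8, 15))  # both about 0.49 radians
print(best_rate)                      # about 17
```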

__Alert! Don't miss this!__

It is sometimes simpler to create an expression from a single vector variable instead of two or more scalar variables. It is also more general since vectors need not be a fixed length.

Importantly, Theano can take derivatives of expressions in vector variables, as long as the expression returns a scalar result.

Create a vector variable `x`.

In [235]:
x = T.dvector()

This function returns the dot product of the input vector with itself, which is the sum of the squares of its elements (`x.dot(x)` is already a scalar, so `T.sum` leaves it unchanged).

This expression can be written $z = x_1^2 + x_2^2 + ... + x_n^2$ where the variables $x_i$ are single elements of the vector variable $x$.

In [241]:
z = T.sum(x.dot(x))
f = th.function(inputs=[x],
                outputs=z)
f([1,2,3])

array(14.0)

The partial derivative of $z$ with respect to $x_i$ is $2x_i$.

In [244]:
dzdx = T.grad(z,x)
f_dzdx = th.function(inputs=[x],
                     outputs=dzdx)
f_dzdx([-2,3])

array([-4.,  6.])
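The result $2x_i$ can be checked numerically by perturbing one component of the vector at a time. A NumPy sketch with a hypothetical helper `numeric_grad`:

```python
import numpy as np

def sum_sq(v):
    return float(np.dot(v, v))  # x_1^2 + ... + x_n^2

def numeric_grad(f, v, h=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at v."""
    v = np.asarray(v, dtype=float)
    grad = np.zeros_like(v)
    for i in range(v.size):
        step = np.zeros_like(v)
        step[i] = h  # perturb only component i
        grad[i] = (f(v + step) - f(v - step)) / (2 * h)
    return grad

print(numeric_grad(sum_sq, [-2, 3]))  # close to [-4, 6], i.e. 2 * x
```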

Now we create a more involved example of the derivative/gradient of an expression of a single vector variable.

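Create a _shared_ variable `s` holding the exponents 0, 1 and 2. It is used as a constant here, although the value of a shared variable persists between function calls and can be updated.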

In [250]:
s = th.shared(np.array([0,1,2]))

For a vector `x` of length 3, create an expression that is the sum of each component of `x` raised to the corresponding power in the shared variable `s`.

The resulting expression is $z = x_1^0 + x_2^1 + x_3^2 = 1 + x_2 + x_3^2$.

In [253]:
z = T.sum(x**s)
f = th.function(inputs=[x],
                outputs=z)
f([6,5,4])

array(22.0)

In [244]:
dzdx = T.grad(z,x)
f_dzdx = th.function(inputs=[x],
                     outputs=dzdx)
f_dzdx([6,5,4])

array([ 0.,  1.,  8.])
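By the power rule, $\partial z / \partial x_i = s_i x_i^{s_i - 1}$, so for the exponents $(0, 1, 2)$ the gradient is $(0, 1, 2x_3)$. A NumPy sketch evaluating $z$ and this formula at the point $(6, 5, 4)$ used above:

```python
import numpy as np

s = np.array([0, 1, 2])        # exponents, as in the shared variable s
x = np.array([6.0, 5.0, 4.0])  # the input point

z = np.sum(x ** s)             # 6**0 + 5**1 + 4**2 = 22
grad = s * x ** (s - 1)        # power rule per component: [0, 1, 2*4]

print(z, grad)
```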

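Changing the exponents to (1, 2, 3) gives $z = x_1 + x_2^2 + x_3^3$, whose gradient is $(1,\ 2x_2,\ 3x_3^2)$; at the point $(3, 2, 4)$ this is $(1, 4, 48)$. See the [tensor module reference](http://deeplearning.net/software/theano/library/tensor/basic.html) for the full set of tensor operations.

In [204]:
s2 = th.shared(np.array([1,2,3]))  # exponents 1, 2, 3
result = T.sum(x**s2)
dfdx = T.grad(result,x)
fun_dfdx = th.function(inputs=[x],
                       outputs=dfdx)
fun_dfdx([3,2,4])

array([  1.,   4.,  48.])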

See the Theano documentation:  
- [Derivatives](http://deeplearning.net/software/theano/tutorial/gradients.html)
- [Complex variable derivatives](http://deeplearning.net/software/theano/proposals/complex_gradient.html)

See [Gradient](https://en.wikipedia.org/wiki/Gradient#Definition) at Wikipedia.

### Shared variables

[Theano documentation](http://deeplearning.net/software/theano/tutorial/examples.html#using-shared-variables)

### Linear regression example

### Multiple linear regression example

### Set initial parameters

In [9]:
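N_variables    =     3  # number of input variables
training_steps = 10000  # number of gradient-descent iterations

x_start        = 0      # lower end of each input range
x_stop         = 1      # upper end of each input range
x_numof        = 5      # grid points per input variable
N_samples      = x_numof**N_variables  # training sample size (5**3 = 125)

rnd_mul        = 0.001  # scale of the Gaussian noise added to the targets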

### Create artificial weighting vector

In [10]:
x_gen    = np.random.randint(1, 4, 1+N_variables)
x_gen

array([3, 3, 2, 2])

These are the "true" weights (intercept first). Below they are used to generate the targets `y_train` from the inputs `x_train`; training should recover them.

### Create sample dataset `x_train`

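The first column of `x_train` is all ones (it multiplies the intercept weight). The remaining `N_variables` columns list every point of a grid with `x_numof` equally spaced values from `x_start` to `x_stop` per variable, giving 5**3 = 125 rows.

In [11]:
x_train = np.column_stack([np.ones(x_numof**N_variables),
                           np.array(list(it.product(np.linspace(x_start, x_stop, x_numof),
                                                    repeat=N_variables)))])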

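### Create sample vector `y_train` of target variables

Each target is the dot product of a row of `x_train` with `x_gen`, plus Gaussian noise scaled by `rnd_mul`.
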
In [12]:
y_train = x_train.dot(x_gen) + np.random.randn(x_numof**N_variables) * rnd_mul
y_train.shape, y_train

((125,), array([  2.99943259,   3.5025544 ,   4.00181057,   4.50031131,
          5.00124727,   3.50004208,   3.99976366,   4.50049924,
          5.001439  ,   5.50060442,   3.99969206,   4.49849135,
          4.999687  ,   5.49862285,   5.99984756,   4.50015618,
          5.00002981,   5.50164639,   6.00026615,   6.50241561,
          4.99939253,   5.50093789,   6.00028882,   6.49994953,
          6.99997334,   3.75089448,   4.25065735,   4.74980069,
          5.25116708,   5.74928774,   4.25004467,   4.75031688,
          5.24913927,   5.75251056,   6.25068597,   4.74983374,
          5.24983452,   5.74846755,   6.24747626,   6.74830639,
          5.25022626,   5.74777387,   6.25070263,   6.75056415,
          7.25001836,   5.7504393 ,   6.2502899 ,   6.75097436,
          7.2511491 ,   7.75154026,   4.50081073,   5.00236813,
          5.50137596,   5.99953593,   6.50085864,   4.99912806,
          5.50056034,   6.00033535,   6.50119423,   6.99937306,
          5.50262817,   5.999951

### Declare Theano symbolic variables

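- `x` is a symbolic matrix; `x_train` is supplied for it when the compiled functions are called
- `y` is a symbolic vector; `y_train` is supplied for it
- `w` is a shared variable holding the weights to be learned, initialized with random values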

In [13]:
x = T.matrix("x")
y = T.vector("y")

w = th.shared(np.random.randn(1+N_variables), 
              name="w")
print('Initial model (w):',w.get_value())

Initial model (w): [-0.62460173 -0.30536624 -1.13324668  0.81653627]


### Create the Theano expression `prediction`, the matrix-vector product of `x` and `w`

In [14]:
prediction = T.dot(x,w) 

### Create the `predict` function

- input: one or more rows of the `x_train` array
- output: the product of those rows with the current weight vector `w`

The output is a vector whose length equals the number of rows of the input matrix.

In [15]:
predict = th.function(inputs=[x],
                      outputs=prediction)

For example: 

In [16]:
predict(x_train[0:3])

array([-0.62460173, -0.42046766, -0.21633359])

### Create the cost function and calculate its gradient

- `cost` is the mean square error between the predictions and target values

Notice that the variables in the cost function are `x`, `w` and `y`. 

In [17]:
cost = T.mean(T.sqr(T.dot(x,w) - y))

### Compute the gradient with respect to the _vector_ `w` of weights

In [18]:
cost_grad = T.grad(cost, w) 
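For the mean squared error $C(w) = \frac{1}{N}\lVert Xw - y\rVert^2$, the gradient has the closed form $\nabla_w C = \frac{2}{N} X^\top (Xw - y)$; mathematically, this is the quantity that `cost_grad` represents. A NumPy sketch comparing the formula with a finite-difference estimate on small random data (the data sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))  # 8 samples, 4 weights (arbitrary sizes)
y = rng.normal(size=8)
w = rng.normal(size=4)

def mse(w):
    r = X @ w - y
    return r @ r / len(y)

analytic = 2 / len(y) * X.T @ (X @ w - y)

# Central differences, one weight at a time.
h = 1e-6
numeric = np.array([(mse(w + h * e) - mse(w - h * e)) / (2 * h)
                    for e in np.eye(len(w))])

print(np.max(np.abs(analytic - numeric)))  # tiny (round-off only)
```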

### Create a Theano _training_ function (called `train`)

- input: `x=x_train` and `y=y_train`
- output: the `cost` (mean squared error) between the target values in `y_train` and the predictions `T.dot(x,w)`
- updates: each call replaces `w` with `w - 0.1 * cost_grad`, one gradient-descent step with learning rate 0.1

In [19]:
train = th.function(
          inputs=[x,y],
          outputs=[cost],
          updates=[(w, w - 0.1 * cost_grad)])
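Each call to `train` performs one gradient-descent step, $w \leftarrow w - 0.1\,\nabla_w C$. The NumPy sketch below runs the same update on a noise-free version of the dataset built above. The generating weights are fixed to `[3, 3, 2, 2]` (the values printed earlier) and the initial `w` uses a fixed seed; both are assumptions made so the result is reproducible.

```python
import itertools as it
import numpy as np

x_gen = np.array([3.0, 3.0, 2.0, 2.0])  # intercept first
grid = np.linspace(0, 1, 5)
X = np.column_stack([np.ones(5**3),
                     np.array(list(it.product(grid, repeat=3)))])
y = X @ x_gen                           # noise-free targets

rng = np.random.default_rng(0)
w = rng.normal(size=4)                  # random initial weights
learning_rate = 0.1

for step in range(10000):
    residual = X @ w - y
    grad = 2 / len(y) * X.T @ residual  # gradient of the mean squared error
    w = w - learning_rate * grad        # same rule as the Theano `updates`

print(w)  # very close to [3, 3, 2, 2]
```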

### Train to modify weights in `w`

In [20]:
for i in range(training_steps):
    err = train(x_train,
                y_train)
    print('err :',err)  # prints the cost at every step; the output below is truncated

err : [array(58.05881643915938)]
err : [array(23.905517369722293)]
err : [array(9.956890787158775)]
err : [array(4.2547160325467415)]
err : [array(1.9185542789838046)]
err : [array(0.9565679465203732)]
err : [array(0.5558229181666526)]
err : [array(0.3845228122892324)]
err : [array(0.3072352265547781)]
err : [array(0.2686691893531648)]
err : [array(0.24624210848583938)]
err : [array(0.23072124521621518)]
err : [array(0.21832302838464226)]
err : [array(0.20748856980002656)]
err : [array(0.19756764325332474)]
err : [array(0.18828155179042547)]
err : [array(0.17950387833679254)]
err : [array(0.17117097323666214)]
err : [array(0.1632453918815809)]
err : [array(0.15570094394047257)]
err : [array(0.14851656418408535)]
err : [array(0.1416737849476245)]
err : [array(0.1351556797849263)]
err : [array(0.12894640876831331)]
err : [array(0.12303101049305609)]
err : [array(0.11739529586996812)]
err : [array(0.11202578450818598)]
err : [array(0.10690965947651185)]
err : [array(0.10203473051292844)]


In [21]:
print("Generating model:",x_gen)
print("Final model (w):",w.get_value())
print("predictions (on x_train):",predict(x_train))
print("target values (y_train):",y_train)

Generating model: [3 3 2 2]
Final model (w): [ 3.00040362  2.99976511  1.99969358  2.00008028]
predictions (on x_train): [ 3.00040362  3.50042369  4.00044376  4.50046383  5.0004839   3.50032701
  4.00034709  4.50036716  5.00038723  5.5004073   4.00025041  4.50027048
  5.00029055  5.50031062  6.00033069  4.5001738   5.00019388  5.50021395
  6.00023402  6.50025409  5.0000972   5.50011727  6.00013734  6.50015741
  7.00017748  3.7503449   4.25036497  4.75038504  5.25040511  5.75042518
  4.25026829  4.75028836  5.25030843  5.7503285   6.25034857  4.75019169
  5.25021176  5.75023183  6.2502519   6.75027197  5.25011508  5.75013515
  6.25015522  6.75017529  7.25019536  5.75003848  6.25005855  6.75007862
  7.25009869  7.75011876  4.50028618  5.00030625  5.50032632  6.00034639
  6.50036646  5.00020957  5.50022964  6.00024971  6.50026978  7.00028985
  5.50013297  6.00015304  6.50017311  7.00019318  7.50021325  6.00005636
  6.50007643  7.0000965   7.50011657  8.00013664  6.49997976  6.99999983
  7
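As a cross-check on the trained `w`, ordinary least squares also has a closed-form solution. A sketch that rebuilds the dataset the same way (with the generating weights fixed to `[3, 3, 2, 2]` and a fixed noise seed, both assumptions) and solves it with `np.linalg.lstsq`:

```python
import itertools as it
import numpy as np

x_gen = np.array([3.0, 3.0, 2.0, 2.0])
grid = np.linspace(0, 1, 5)
X = np.column_stack([np.ones(5**3),
                     np.array(list(it.product(grid, repeat=3)))])

rng = np.random.default_rng(1)
y = X @ x_gen + rng.standard_normal(len(X)) * 0.001  # small Gaussian noise, like rnd_mul

w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes ||X w - y||^2
print(w_ls)  # close to [3, 3, 2, 2], differing only through the noise
```

Gradient descent and the closed form minimize the same cost, so with enough iterations the trained weights should agree with `w_ls` on the same data.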

Modified from 

- http://deeplearning.net/software/theano/tutorial/examples.html#a-real-example-logistic-regression