## Theano Tutorial: Basics

Theano is a Python library for defining, optimizing, and evaluating mathematical expressions, especially ones involving matrices and tensors. It is one of TensorFlow's "competitors".

Pros:
* Simpler
* More open to different hardware (libgpuarray by default)
* Numpy integration
* Compiled (?)

Cons:
* Smaller community
* No Sessions
* No TensorBoard
* Compiled (?)

In [2]:
import theano  # later cells call theano.dot and theano.function
from theano import function
import theano.tensor as T
import numpy as np  # NumPy arrays are used for input and output

### Theano
Theano's workflow is similar to TensorFlow's:

We first describe a computation graph.

The graph is statically typed: each variable's type must be declared up front and cannot change afterwards.

Then the function "theano.function" compiles the graph and prepares it for use, returning a function that can be called like any other Python function.

In [3]:
#Define a computational graph
x = T.dscalar()
y = T.dscalar()
z = x + y
f = function([x,y], z) #Compile in C

In [4]:
print(f)
print(z)
print(f(1, 2))
print(z.eval({x: 1, y: 2})) #eval compiles behind the scenes on its first call; convenient, but less flexible than function

<theano.compile.function_module.Function object at 0x7f43088ca240>
Elemwise{add,no_inplace}.0
3.0
3.0


### Vectors and Matrices
Example: $Ax + b$

In [5]:
A = T.dmatrix('A')
x = T.dvector('x')
b = T.dvector('b')
z = theano.dot(A, x) + b 
f = function([A, x, b], z)

In [6]:
A = np.array([[1, -1], [-1, 1]])
x = np.array([2, 1])
b = np.array([1, 1])
f(A, x, b)

array([ 2.,  0.])
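
The result can be checked by hand. Using the same values of $A$, $x$ and $b$ as in the cell above:

$$
Ax + b =
\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}
\begin{pmatrix} 2 \\ 1 \end{pmatrix}
+ \begin{pmatrix} 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 \\ -1 \end{pmatrix}
+ \begin{pmatrix} 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 2 \\ 0 \end{pmatrix}
$$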

### Types of Variables
Prefix: data type and size
* b = byte (8-bit integer)
* w = 16-bit integer
* i = 32-bit integer
* l = 64-bit integer
* f = float (32-bit)
* d = double (64-bit float)
* c = complex number

Stem: shape

scalar, vector, matrix, row, col, tensor3, tensor4, tensor5
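
Each constructor name is a prefix followed by a stem, so "dmatrix" is a matrix of doubles and "iscalar" is a 32-bit integer scalar. The sketch below decodes such names in plain Python. The `decode` helper and both dictionaries are hypothetical names made up for illustration and are not part of Theano; the dtype strings reflect my understanding of the Theano documentation.

```python
# Illustrative helper, not part of Theano: decode a constructor name such as
# "dmatrix" into the dtype and number of dimensions it stands for.
PREFIX_TO_DTYPE = {
    "b": "int8",
    "w": "int16",
    "i": "int32",
    "l": "int64",
    "f": "float32",
    "d": "float64",
    "c": "complex64",
}

STEM_TO_NDIM = {
    "scalar": 0,
    "vector": 1,
    "matrix": 2,
    "row": 2,  # a matrix with exactly one row
    "col": 2,  # a matrix with exactly one column
    "tensor3": 3,
    "tensor4": 4,
    "tensor5": 5,
}


def decode(name):
    """Split a name like 'dmatrix' into a (dtype, ndim) pair."""
    prefix, stem = name[0], name[1:]
    return PREFIX_TO_DTYPE[prefix], STEM_TO_NDIM[stem]


for name in ("dscalar", "iscalar", "dvector", "dmatrix", "ftensor3"):
    print(name, decode(name))
```

The names used in this tutorial so far (dscalar, dvector, dmatrix, iscalar) all follow this pattern.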

### Variables with Inner State

Shared variables are hybrid: they are symbolic (they can be used to build the computational graph), and they also hold a concrete value that can be read and written like a normal variable through the get_value() and set_value() methods.

In [8]:
from theano import shared
count = shared(0) #Creates a shared variable initialized to 0
inc = T.iscalar()
result = count + inc
acc = function([inc], result)

In [9]:
print(acc(1))
print(count.get_value())

1
0


In [10]:
count.set_value(acc(1))
print(count.get_value())
count.set_value(acc(1))
print(count.get_value())
count.set_value(acc(10))
print(count.get_value())

1
2
12


This update logic can be built into the function itself through the optional "updates" argument of theano.function. It takes a list of (shared variable, update expression) pairs and applies the updates after the outputs have been computed.

In [11]:
count = shared(0)
inc = T.iscalar()
acc = function([inc], count + inc, updates=[(count, count + inc)])

In [12]:
print(acc(1))
print(acc(1))
print(acc(20))
print(count.get_value())

1
2
22
22
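
The order of operations matters here: the returned value and the update expression are both computed from the state the shared variable had before the call, and only then is the new value stored. The sketch below imitates that behavior in plain Python. The `Accumulator` class is a hypothetical stand-in written for illustration, not Theano code.

```python
# Plain-Python stand-in (hypothetical class, not Theano code) for a compiled
# function with one shared variable and one entry in `updates`.
class Accumulator:
    def __init__(self, initial=0):
        self.count = initial  # plays the role of the shared variable

    def __call__(self, inc):
        output = self.count + inc     # output is computed from the old state
        new_count = self.count + inc  # update expression, also from the old state
        self.count = new_count        # state changes only after both are computed
        return output


acc_sketch = Accumulator()
print(acc_sketch(1))
print(acc_sketch(1))
print(acc_sketch(20))
print(acc_sketch.count)
```

Because the output and the update expression are the same here, each call returns the new total, matching the values printed by the notebook cell.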


### Random Variables
A special kind of shared variable: a stream of random numbers, drawn when needed. By default, each call to a compiled function updates the stream's state, so a new random number is generated every time.

In [13]:
from theano.tensor.shared_randomstreams import RandomStreams
srng = RandomStreams(seed=0) 
num = srng.uniform()
f = function([], num)
g = function([], num, no_default_updates=True)

In [14]:
print("Update generator")
for _ in range(3):
    print(f())
print("\nDo not update generator")
for _ in range(3):
    print(g())

Update generator
0.48604732751846313
0.6857123374938965
0.9855760335922241

Do not update generator
0.19559641182422638
0.19559641182422638
0.19559641182422638
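
The difference between the two functions comes down to whether the generator's internal state is allowed to advance after a draw. The sketch below reproduces the idea with Python's standard `random` module rather than Theano. The two helper functions are hypothetical names chosen for illustration, and the numbers drawn will differ from the ones Theano prints.

```python
import random

rng = random.Random(0)  # seeded generator; plays the role of the stream's state


def draw_and_update():
    """Draw a number and let the generator state advance (default behavior)."""
    return rng.random()


def draw_without_update():
    """Draw a number, then restore the state so the next draw repeats it."""
    state = rng.getstate()
    value = rng.random()
    rng.setstate(state)
    return value


advancing = [draw_and_update() for _ in range(3)]
frozen = [draw_without_update() for _ in range(3)]

print("distinct values when updating:", len(set(advancing)))
print("distinct values when not updating:", len(set(frozen)))
```

When the state is restored after every draw, the same number comes back each time, which is the pattern seen with no_default_updates=True.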


### Gradients and Derivation (Basic)

The graph format makes it especially easy to calculate derivatives and gradients with the chain rule, since each node is a simple operation that is composed with others to build a complex expression (see: http://cs231n.github.io/optimization-2/ )

With that, Theano's "grad" function can symbolically calculate the gradient or derivative of a function:

In [18]:
from theano import pp #pretty print
x = T.dscalar('x')
y = x ** 2 
dy = T.grad(y, x) #dy/dx
f = theano.function([x], dy)
pp(dy)

'((fill((x ** TensorConstant{2}), TensorConstant{1.0}) * TensorConstant{2}) * (x ** (TensorConstant{2} - TensorConstant{1})))'

In [20]:
print(f(3), f(0), f(1))

6.0 0.0 2.0
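
To make the chain-rule idea concrete, here is a minimal sketch of a computation graph with a single operation (multiplication) and a backward pass, written in plain Python. It is not how Theano is implemented: the `Node` class and the `backward`, `square_grad` and `central_difference` functions are hypothetical names introduced for illustration. The derivative of $x^2$ is $2x$, so the values at 3, 0 and 1 should be 6, 0 and 2, and a central finite difference is included as an independent check.

```python
class Node:
    """A value in a tiny computation graph (illustrative, not Theano's API)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent_node, local_derivative)
        self.grad = 0.0

    def __mul__(self, other):
        # d(a*b)/da = b and d(a*b)/db = a
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))


def backward(node, upstream=1.0):
    """Chain rule: push the upstream derivative along every path to the inputs."""
    node.grad += upstream
    for parent, local_derivative in node.parents:
        backward(parent, upstream * local_derivative)


def square_grad(x0):
    """Derivative of x * x at x0, obtained from the graph."""
    x = Node(float(x0))
    y = x * x
    backward(y)
    return x.grad


def central_difference(func, x0, h=1e-6):
    """Numerical derivative, used only as a sanity check."""
    return (func(x0 + h) - func(x0 - h)) / (2 * h)


for x0 in (3, 0, 1):
    print(x0, square_grad(x0), central_difference(lambda v: v ** 2, x0))
```

The recursive `backward` visits every path from the output to each input, which is simple but inefficient for large graphs; real frameworks process nodes in topological order instead.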
