# Theano Introduction
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. 

#### Why Theano, you ask?
- tight integration with NumPy – Use numpy.ndarray in Theano-compiled functions.
- transparent use of a GPU – Perform data-intensive calculations up to 140x faster than with a CPU (float32 only).
- efficient symbolic differentiation – Theano computes derivatives for functions with one or many inputs.
- speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny.
- dynamic C code generation – Evaluate expressions faster.
- extensive unit-testing and self-verification – Detect and diagnose many types of errors.
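The log(1+x) point is a floating-point issue, not something specific to Theano. In plain Python the naive formula loses all precision for tiny x, while the standard library's log1p does not. Theano's graph optimizer can rewrite log(1+x) into log1p(x) automatically; this sketch only shows why that rewrite matters.

```python
import math

x = 1e-20
# In float64, 1 + 1e-20 rounds to exactly 1.0, so the naive formula returns 0.0
naive = math.log(1 + x)
# log1p computes log(1 + x) without forming 1 + x first, so precision is kept
stable = math.log1p(x)
print(naive)   # 0.0
print(stable)  # approximately 1e-20
```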

## 1. Getting started

### 1.1 Checking your installation

Check that Theano imports properly:

In [None]:
import numpy
import theano
import theano.tensor as T
print(theano.__version__)

### 1.2 Theano tensors
Theano represents data with symbolic <b>tensors</b>. In the examples that follow we shall see some of the common tensor types, such as scalars (dscalar) and matrices (dmatrix). Each tensor type comes with a set of routines. The full list of tensor types and routines is here:

http://deeplearning.net/software/theano/library/tensor/basic.html

### 1.3 Adding two scalars
Reference : http://deeplearning.net/software/theano/tutorial/adding.html

In [None]:
import numpy
import theano.tensor as T
from theano import function

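# Data structure: scalar (the "d" prefix means double precision, float64)
x = T.dscalar('x')
y = T.dscalar('y')
z = x + y                # symbolic expression: nothing is computed yet
f = function([x, y], z)  # compile the expression graph into a callable

# Execute the function
f(4, 5)                  # array(9.0)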
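If the idea of a "symbolic" expression is new, the toy sketch below may help. It uses hypothetical classes (Scalar, Add, make_function) that are not part of Theano; the only point is that z = x + y records an operation to perform later, and the compiled function supplies concrete values when it is called. Theano additionally optimizes the graph and generates C code, which this sketch does not attempt.

```python
class Scalar:
    """A named placeholder (hypothetical, not Theano's API): no value until evaluation."""
    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        # Adding placeholders builds a graph node instead of computing a number
        return Add(self, other)

    def evaluate(self, env):
        return env[self.name]


class Add:
    """A graph node recording that its two inputs should be summed."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, env):
        return self.left.evaluate(env) + self.right.evaluate(env)


def make_function(inputs, output):
    """Return a callable that binds positional arguments to the input placeholders."""
    def compiled(*args):
        env = {var.name: val for var, val in zip(inputs, args)}
        return output.evaluate(env)
    return compiled


x_toy = Scalar('x')
y_toy = Scalar('y')
z_toy = x_toy + y_toy                    # a graph node, not a number
f_toy = make_function([x_toy, y_toy], z_toy)
print(f_toy(4, 5))                       # 9
```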

### 1.4 Adding two matrices
Reference : http://deeplearning.net/software/theano/tutorial/adding.html

In [1]:
import numpy as np
import theano.tensor as T
from theano import function

# Data structure: matrix (the "d" prefix means float64)
x = T.dmatrix('x')
y = T.dmatrix('y')
z = x + y
f = function([x, y], z)

# Execute the function: the sum is elementwise, giving [[11, 22], [33, 44]]
f([[1, 2], [3, 4]], [[10, 20], [30, 40]])


### 1.4.1 Exercise:
Reference : http://deeplearning.net/software/theano/tutorial/adding.html

Write a program that takes a set of 2D points as input and does the following transformation on it:

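- x -> x + y - 1 and y -> y - x + 1

Because of the constant terms, this is an affine map (a linear part plus a translation) rather than a purely linear one. You can loop over all the points, or multiply the points by the matrix of the linear part and then add the offset vector (-1, 1).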

In [None]:
import numpy
import theano.tensor as T
from theano import function

# Your code here



# END

# Execute the function

f([[1, 4], [2, 5]])
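If you want something to check your Theano function against, here is a plain NumPy sketch of the same affine map, using the row-vector convention (each row of the input is one point). The names A, t and transform are illustrative choices, not part of the exercise.

```python
import numpy as np

# Linear part: x' = x + y, y' = -x + y
A = np.array([[1.0, 1.0],
              [-1.0, 1.0]])
# Translation part: x' gets -1, y' gets +1
t = np.array([-1.0, 1.0])

def transform(points):
    """Apply x -> x + y - 1, y -> y - x + 1 to each row (x, y) of `points`."""
    points = np.asarray(points, dtype=float)
    # With points stored as rows, p' = p @ A.T + t
    return points @ A.T + t

print(transform([[1, 4], [2, 5]]))  # rows (4, 4) and (6, 4)
```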

### 1.5 Shared Variables
It is also possible to make a function with an internal state. For example, let’s say we want to make an accumulator: at the beginning, the state is initialized to zero. Then, on each function call, the state is incremented by the function’s argument.

First let’s define the accumulator function. It adds its argument to the internal state, and returns the old state value.

In [None]:
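import theano.tensor as T
from theano import function, shared

state = shared(0)
inc = T.iscalar('inc')
# Return the old state, then apply the update state <- state + inc
accumulator = function([inc], state, updates=[(state, state + inc)])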

This code introduces a few new concepts. The shared function constructs so-called shared variables. These are hybrid symbolic and non-symbolic variables: they can be used in symbolic expressions just like the objects returned by dmatrices(...), but they also hold an internal value, and that value is what the variable takes in every function that uses it. The name comes from the fact that this value is shared between all such functions. It can be read and modified with the .get_value() and .set_value() methods, as we will see shortly.

The other new thing in this code is the updates parameter of function. updates must be supplied with a list of pairs of the form (shared-variable, new expression). It can also be a dictionary whose keys are shared variables and whose values are the new expressions. Either way, it means “whenever this function runs, replace the value of each shared variable with the result of the corresponding expression”. Above, our accumulator replaces the state's value with the sum of the state and the increment amount.

Let’s try it out!

In [None]:
print(state.get_value())   # 0
accumulator(1)             # returns the old state, 0
print(state.get_value())   # 1
accumulator(300)           # returns the old state, 1
print(state.get_value())   # 301

It is possible to reset the state. Just use the .set_value() method:

In [None]:
state.set_value(-1)
accumulator(3)             # returns the old state, -1
print(state.get_value())   # 2
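The key detail of the accumulator is ordering: the output is computed from the state as it was before the call, and only then is the update applied. The pure-Python sketch below mimics that behaviour with a hypothetical ToyAccumulator class; it is not Theano's API, just a model of what the compiled function does.

```python
class ToyAccumulator:
    """Hypothetical stand-in for the Theano accumulator (not Theano's API)."""

    def __init__(self, value=0):
        self._value = value          # plays the role of the shared variable

    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value

    def __call__(self, inc):
        old = self._value            # the output is computed from the old state...
        self._value = old + inc      # ...then the update (state, state + inc) is applied
        return old


acc = ToyAccumulator()
print(acc(1), acc.get_value())      # 0 1
print(acc(300), acc.get_value())    # 1 301
acc.set_value(-1)
print(acc(3), acc.get_value())      # -1 2
```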

### 1.6 Random numbers
The way to think about putting randomness into Theano’s computations is to put random variables in your graph. Theano will allocate a NumPy RandomState object (a random number generator) for each such variable, and draw from it as necessary. We will call this sort of sequence of random numbers a random stream. Random streams are at their core shared variables, so the observations on shared variables hold here as well. Theano's random objects are defined and implemented in RandomStreams and, at a lower level, in RandomStreamsBase.

In [None]:
from theano.tensor.shared_randomstreams import RandomStreams
from theano import function
srng = RandomStreams(seed=234)
rv_u = srng.uniform((2,2))
rv_n = srng.normal((2,2))
f = function([], rv_u)
g = function([], rv_n, no_default_updates=True)    #Not updating rv_n.rng
nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)

In [None]:
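f_val0 = f()
f_val1 = f()               # different numbers from f_val0
g_val0, g_val1 = g(), g()  # identical: g does not update rv_n.rng
nearly_zeros()             # ~0 up to rounding: rv_u is drawn only once per call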
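The difference between f and g comes down to whether the generator state is updated after each draw. The NumPy sketch below shows the same idea with a RandomState object: drawing advances its state, and restoring a saved state reproduces the earlier draw. This is an analogy, not Theano's internal code, and seed 234 here will not reproduce the numbers Theano produces.

```python
import numpy as np

# The generator object plays the role of the RNG stored in a random stream
np_rng = np.random.RandomState(234)
saved = np_rng.get_state()             # snapshot of the generator state

a = np_rng.uniform(size=(2, 2))        # drawing advances the state...
b = np_rng.uniform(size=(2, 2))        # ...so the next draw differs (like f)

np_rng.set_state(saved)                # as if the state had never been updated
c = np_rng.uniform(size=(2, 2))        # reproduces the first draw (like g)

print(np.array_equal(a, b))  # False
print(np.array_equal(a, c))  # True
```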

### 1.7 Gradients

Gradients are central to optimization problems. Here we see how to compute a simple gradient. For the Jacobian and Hessian see: http://deeplearning.net/software/theano/tutorial/gradients.html

In [None]:
import numpy
import theano
import theano.tensor as T
from theano import pp
x = T.dscalar('x')
y = x ** 2
gy = T.grad(y, x)
print(pp(gy))  # the gradient graph prior to optimization
f = theano.function([x], gy)
print(f(4))                            # 8.0, since d(x**2)/dx = 2x
print(numpy.allclose(f(94.2), 188.4))  # True
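A quick way to sanity-check any gradient, symbolic or hand-derived, is to compare it with a central finite difference. This pure-Python sketch does that for y = x**2 at the two points used above; the helper names are illustrative.

```python
def square(x):
    return x ** 2

def square_grad(x):
    # The analytic derivative, the same expression T.grad derives: 2 * x
    return 2 * x

def central_difference(func, x, h=1e-5):
    """Approximate d func / dx at x with a central finite difference."""
    return (func(x + h) - func(x - h)) / (2 * h)

for x0 in (4.0, 94.2):
    print(x0, square_grad(x0), central_difference(square, x0))
```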

### 1.8 A simple logistic regression in Theano

Read the code carefully to gain clarity on all the concepts described above.

Reference: http://deeplearning.net/software/theano/tutorial/examples.html#a-real-example-logistic-regression

In [None]:
# A basic classifier based on logistic regression

import numpy
import theano
import theano.tensor as T
rng = numpy.random

N = 400                                   # training sample size
feats = 784                               # number of input variables

# generate a dataset: D = (input_values, target_class)
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))
training_steps = 10000

# Declare Theano symbolic variables
x = T.dmatrix("x")
y = T.dvector("y")

# initialize the weight vector w randomly
#
# this and the following bias variable b
# are shared so they keep their values
# between training iterations (updates)
w = theano.shared(rng.randn(feats), name="w")

# initialize the bias term
b = theano.shared(0., name="b")

print("Initial model:")
print(w.get_value())
print(b.get_value())

# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))   # Probability that target = 1
prediction = p_1 > 0.5                    # The prediction thresholded
xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1) # Cross-entropy loss function
cost = xent.mean() + 0.01 * (w ** 2).sum()  # The cost to minimize (with L2 weight decay)
gw, gb = T.grad(cost, [w, b])               # Gradient of the cost w.r.t. the
                                            # weight vector w and bias term b
                                            # (see section 1.7 on gradients)

# Compile
train = theano.function(
          inputs=[x,y],
          outputs=[prediction, xent],
          updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))
predict = theano.function(inputs=[x], outputs=prediction)

# Train
for i in range(training_steps):
    pred, err = train(D[0], D[1])

print("Final model:")
print(w.get_value())
print(b.get_value())
print("target values for D:")
print(D[1])
print("prediction on D:")
print(predict(D[0]))
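T.grad derives gw and gb symbolically. To see what it is computing, here is a NumPy sketch of the same cost with gradients derived by hand, checked against central finite differences on a small random problem. The problem size, seed and variable names are arbitrary choices for illustration, kept distinct from the Theano variables above.

```python
import numpy as np

def cost_and_grads(X, y, w, b, l2=0.01):
    """Same cost as the Theano graph above, with gradients derived by hand."""
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))                      # p_1
    xent = -y * np.log(p) - (1 - y) * np.log(1 - p)   # per-sample cross-entropy
    cost = xent.mean() + l2 * (w ** 2).sum()
    # For sigmoid + cross-entropy, d xent_i / d z_i simplifies to p_i - y_i
    err = p - y
    gw = X.T @ err / len(y) + 2 * l2 * w
    gb = err.mean()
    return cost, gw, gb

# A small random problem (sizes and seed are arbitrary)
chk_rng = np.random.RandomState(0)
X_chk = chk_rng.randn(20, 5)
y_chk = chk_rng.randint(0, 2, size=20).astype(float)
w_chk = chk_rng.randn(5)
b_chk = 0.3

cost_chk, gw_chk, gb_chk = cost_and_grads(X_chk, y_chk, w_chk, b_chk)

# Central finite differences for the first weight and for the bias
h = 1e-6
step = np.zeros_like(w_chk)
step[0] = h
num_gw0 = (cost_and_grads(X_chk, y_chk, w_chk + step, b_chk)[0]
           - cost_and_grads(X_chk, y_chk, w_chk - step, b_chk)[0]) / (2 * h)
num_gb = (cost_and_grads(X_chk, y_chk, w_chk, b_chk + h)[0]
          - cost_and_grads(X_chk, y_chk, w_chk, b_chk - h)[0]) / (2 * h)
print(gw_chk[0], num_gw0)
print(gb_chk, num_gb)
```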

### 1.8.1 Exercise
Just for the heck of it, try the following:

Q1 : What happens if we modify the cost function?

In [None]:
# Your code here

Q2 : What happens if we modify the number of training steps?

In [None]:
# Your code here

Theano has many more features than we can cover here. You can explore them in the official tutorial:

http://deeplearning.net/software/theano/tutorial

### It's all good, but.....

Theano is a highly customizable and powerful library for machine learning, but it could certainly use more abstraction. With this in mind, easier-to-use wrappers have been written on top of it that keep most of its functionality. One of them is Keras. Have a look...

# Keras Introduction:
Keras is a minimalist, highly modular neural networks library, written in Python and capable of running on <b>top of either TensorFlow or Theano</b>. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.

Use Keras if you need a deep learning library that:

- allows for easy and fast prototyping (through total modularity, minimalism, and extensibility).
- supports both convolutional networks and recurrent networks, as well as combinations of the two.
- supports arbitrary connectivity schemes (including multi-input and multi-output training).
- runs seamlessly on CPU and GPU.

## 1. Getting Started
Reference: http://keras.io/
### 1.1 Checking your Installation

In [None]:
# Check if keras imports successfully
from keras.models import Model
# Import the Sequential model class from keras
from keras.models import Sequential
# Import the layers you wish to use in your net
from keras.layers.core import Dense, Dropout, Activation
from keras.layers import Input
# Import the optimization algorithms that you wish to use
from keras.optimizers import SGD, Adam, RMSprop
# Import other utilities that help in data formatting etc.
from keras.utils import np_utils

### 1.2 Generate Data
We shall write a simple logistic regression here to see how Keras works. First generate some random points for data.

In [None]:
import numpy as np
np.random.seed(1337)                       # for reproducibility
N = 400                                   # training sample size
feat = 784                                # number of input variables
labels = np.random.randint(low=0, high=2, size=(N, 1))
x = np.random.randn(N, feat)
x = x.astype('float32')
print(np.shape(labels))
print(np.shape(x))

# convert class vectors to binary class matrices
labels = np_utils.to_categorical(labels, 2)
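to_categorical turns each integer class label into a one-hot row, so labels goes from shape (400, 1) to (400, 2). A NumPy sketch of the same conversion; one_hot is a hypothetical helper, not part of Keras.

```python
import numpy as np

def one_hot(class_ids, num_classes):
    """Hypothetical equivalent of to_categorical for integer class labels."""
    class_ids = np.asarray(class_ids).ravel()          # (N, 1) -> (N,)
    # Row k of the identity matrix is the one-hot vector for class k
    return np.eye(num_classes, dtype='float32')[class_ids]

print(one_hot([[0], [1], [1]], 2))
```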

### 1.3 Building a net

Instantiate a sequential model.

In [None]:
model = Sequential()

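Add a fully connected (Dense) layer with two output units, one per class. Its weights and bias are the regression parameters.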

In [None]:
model.add(Dense(2, input_shape=(784,)))

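Add a softmax activation, so that the two outputs form a probability distribution over the classes (as categorical_crossentropy expects):

In [None]:
model.add(Activation('softmax'))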
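Dense(2, input_shape=(784,)) computes x·W + b with W of shape (784, 2). When those two outputs go through a softmax, the probability of class 1 equals a sigmoid applied to the difference of the two logits, which is why a two-unit softmax model amounts to logistic regression. A NumPy sketch with small random weights standing in for trained ones (all names here are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)     # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

demo_rng = np.random.RandomState(0)
x_demo = demo_rng.randn(3, 784).astype('float32')
W_demo = demo_rng.randn(784, 2) * 0.01       # Dense(2) kernel: 784 inputs -> 2 units
b_demo = np.zeros(2)                         # Dense(2) bias

logits = x_demo @ W_demo + b_demo            # what the Dense(2) layer computes
probs = softmax(logits)

# With two classes, softmax reduces to a sigmoid of the logit difference
p_class1 = sigmoid(logits[:, 1] - logits[:, 0])
print(np.allclose(probs[:, 1], p_class1))    # True
```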

### 1.4 Compiling the net

In [None]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

### 1.5 Training the net

In [None]:
history = model.fit(x, labels,
                    batch_size=100, epochs=20,    # Keras 1.x called this argument nb_epoch
                    verbose=1,
                    validation_data=(x, labels))  # the training data itself, for illustration only

### 1.6. Plotting accuracy and loss

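We might be interested in seeing the accuracy or convergence of our model. Since we passed the training data as validation_data, the "validation" curves here are computed on the same data as the training curves. To plot them: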

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
# list all data in history
print(history.history.keys())
# the accuracy key is 'acc' in older Keras versions and 'accuracy' in newer ones
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
# summarize history for accuracy
plt.plot(history.history[acc_key])
plt.plot(history.history['val_' + acc_key])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

Well, that was quick. That's how simple Keras is.

## 2. Exercise
Write code that reads an image and downsamples it.
Hint: You'll need a network with an input layer and a suitable pooling layer.
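A max pooling layer (in Keras, MaxPooling2D with pool_size=(2, 2)) keeps the maximum of each non-overlapping 2x2 block. This NumPy sketch of that operation on a single-channel array may help you check your network's output; reading the image from disk is left out, and max_pool_2x2 is a hypothetical helper name.

```python
import numpy as np

def max_pool_2x2(image):
    """Downsample a 2-D array by taking the max over non-overlapping 2x2 blocks."""
    h, w = image.shape
    h2, w2 = h // 2, w // 2
    cropped = image[:h2 * 2, :w2 * 2]          # drop an odd last row/column
    # Axes: (block row, row within block, block column, column within block)
    blocks = cropped.reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

img = np.arange(16).reshape(4, 4)
print(max_pool_2x2(img))  # [[ 5  7]
                          #  [13 15]]
```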

In [None]:
# Your code here

#### Now that the basics are out of the way, let's get serious...