# Adding Two Scalars

In [2]:
import numpy as np
import theano
import theano.tensor as T
from theano import function

In Theano, all symbols must be _typed_. Below we use `T.dscalar`, the type for a "0-dimensional array (scalar) of type double". This is a Theano `Type`.

In [3]:
x = T.dscalar('x')
y = T.dscalar('y')

Note that `dscalar` is _not_ a class, so `x` and `y` are not instances of `dscalar`. They are instances of `TensorVariable`. However, `x` and `y` are assigned the Theano Type `dscalar` in their `type` field, as you can see below:

In [4]:
type(x)

theano.tensor.var.TensorVariable

In [5]:
x.type

TensorType(float64, scalar)

In [6]:
T.dscalar

TensorType(float64, scalar)

In [7]:
x.type is T.dscalar

True

By calling `T.dscalar` with a string argument, you create a _Variable_ representing a floating-point scalar quantity with the given name. If you provide no argument, the symbol will be unnamed. Names are not required, but will often help with debugging.

Next, we can combine `x` and `y` into their sum, `z`:

In [8]:
z = x + y

`z` is yet another variable, which represents the addition of `x` and `y`. You can use the `pp` function to pretty-print the computation associated with `z`:

In [9]:
from theano import pp
print(pp(z))

(x + y)


The last step is to create a function taking `x` and `y` as inputs, and giving `z` as output: 

In [10]:
f = function([x, y], z)

In [11]:
f(2, 3)

array(5.)

Note that there is a slight delay when executing the `function` instruction. This is because, behind the scenes, `f` was being compiled into C code.

The first argument to `function` is a list of Variables that will be provided as inputs to the function. The second argument is a single Variable or a list of Variables. For either case, the second argument is what we want to see as output when we apply the function. `f` may then be used like a normal Python function.
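The `array(5.)` returned by `f(2, 3)` is a 0-dimensional NumPy array rather than a plain Python float. The sketch below uses only NumPy (no Theano required) to show what such a value looks like and how to unwrap it. Here `result` is a hand-built stand-in, under the assumption that it matches what `f` returns.

```python
import numpy as np

# Stand-in for the value returned by f(2, 3): a 0-dimensional float64 array.
result = np.array(5.0)

shape = result.shape             # empty tuple: the array has no axes
ndim = result.ndim               # number of dimensions
dtype_name = result.dtype.name   # double precision, the "d" in dscalar

# Two common ways to get a plain Python float back out.
as_float = float(result)
as_item = result.item()

print(shape, ndim, dtype_name, as_float, as_item)
```

Either `float(...)` or `.item()` is usually enough when the scalar has to be passed on to code that expects a built-in number.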

# Adding Two Matrices
This next step simply requires that we instantiate `x` and `y` using the matrix Types:

In [12]:
x = T.dmatrix('x')
y = T.dmatrix('y')
z = x + y
f = function([x, y] , z)

We can see that we are able to pass in either Python lists or NumPy arrays.

In [13]:
f([[1,2],[3,4]], [[10,20],[30,40]])

array([[11., 22.],
       [33., 44.]])

In [14]:
f(np.array([[1,2], [3,4]]), np.array([[10,20], [30,40]]))

array([[11., 22.],
       [33., 44.]])
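Both calls return floating-point results even though the inputs are integers, which is consistent with `dmatrix` being a float64 type. A rough NumPy-only analogue is sketched below, assuming the inputs are converted to float64 arrays before the addition. This illustrates the observed behaviour and does not describe Theano's internal conversion code.

```python
import numpy as np

# A nested Python list and a NumPy integer array, both converted to float64.
a = np.asarray([[1, 2], [3, 4]], dtype="float64")
b = np.asarray(np.array([[10, 20], [30, 40]]), dtype="float64")

total = a + b
print(total.dtype, total.tolist())
```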

# Shared Variables
It is also possible to make a function with an internal state. For example, let's say we want to make an accumulator: at the beginning, the state is initialized to zero. Then, on each function call, the state is incremented by the function's argument. 

First, let's define the `accumulator` function. It adds its argument to the internal state, and returns the old state value. 

In [15]:
from theano import shared
state = shared(0)
inc = T.iscalar('inc')
accumulator = function([inc], state, updates=[(state, state+inc)])

The above code introduces a few new concepts. The `shared` function constructs so-called _shared variables_. These are hybrid symbolic and non-symbolic variables whose value may be shared between multiple functions. Shared variables can be used in symbolic expressions just like the objects returned by `dmatrices(...)` but they also have an internal value that defines the value taken by this symbolic variable in all the functions that use it. It is called a _shared_ variable because its value is shared between many functions. The value can be accessed and modified by the `.get_value()` and `.set_value()` methods. 

The other new thing in this code is the `updates` parameter of `function`. `updates` must be supplied with a list of pairs of the form `(shared-variable, new expression)`. It can also be a dictionary whose keys are shared variables and whose values are the new expressions. Either way, it means "whenever this function runs, it will replace the `.value` of each shared variable with the result of the corresponding expression". Above, our accumulator replaces the `state`'s value with the sum of the state and the increment amount.
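The semantics described above (return the output computed from the old state, then write the new values back) can be mimicked in plain Python. The sketch below is a toy model only: `ToyShared` and `toy_function` are hypothetical names invented for illustration and are not part of Theano.

```python
class ToyShared:
    """Hypothetical stand-in for a shared variable (not Theano's class)."""

    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value


def toy_function(output, updates):
    """Build a callable that mimics function(..., updates=...).

    output:  callable(arg) -> returned value, evaluated on the OLD state
    updates: list of (ToyShared, callable(arg) -> new value) pairs
    """
    def run(arg):
        result = output(arg)
        # Evaluate every new value against the old state, then write them back.
        new_values = [(var, expr(arg)) for var, expr in updates]
        for var, value in new_values:
            var.set_value(value)
        return result
    return run


state = ToyShared(0)
accumulator = toy_function(
    output=lambda inc: state.get_value(),
    updates=[(state, lambda inc: state.get_value() + inc)],
)

first = accumulator(1)      # returns the old state
second = accumulator(300)   # returns the state left by the first call
print(first, second, state.get_value())
```

The returned values lag one call behind the stored state, which is the same pattern the real `accumulator` shows in the cells that follow.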

We can now try this out:

In [16]:
print(state.get_value())

0


In [17]:
accumulator(1)

array(0)

In [18]:
print(state.get_value())

1


In [19]:
accumulator(300)

array(1)

In [20]:
print(state.get_value())

301


We can also reset the state by using the `.set_value()` method:

In [21]:
state.set_value(-1)

In [22]:
accumulator(3)

array(-1)

In [23]:
print(state.get_value())

2


As we mentioned above, you can define more than one function to use the same shared variable. These functions can all update the value.

In [24]:
decrementor = function([inc], state, updates=[(state, state-inc)])

In [25]:
decrementor(2)

array(2)

In [26]:
print(state.get_value())

0


You might be wondering why the updates mechanism exists. You can always achieve a similar result by returning the new expressions, and working with them in NumPy as usual. The updates mechanism can be a syntactic convenience, but it is mainly there for efficiency. Updates to shared variables can sometimes be done more quickly using in-place algorithms (e.g. low-rank matrix updates). Also, Theano has more control over where and how shared variables are allocated, which is one of the important elements of getting good performance on the GPU.
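To make the phrase "in-place" concrete, here is a small NumPy-only sketch of a rank-1 (low-rank) matrix update done in place versus out of place. It only illustrates the general idea of reusing existing memory and makes no claim about how Theano schedules its own updates.

```python
import numpy as np

A = np.eye(3)
u = np.array([1.0, 2.0, 3.0])
v = np.array([0.5, 0.0, -0.5])

same_object = A  # second reference to the same array object

# In-place rank-1 update: the result is written into A's existing buffer.
A += np.outer(u, v)
updated_in_place = same_object is A

# Out-of-place version: a brand-new array is allocated for the result.
B = A + np.outer(u, v)
allocated_new = not np.shares_memory(A, B)

print(updated_in_place, allocated_new)
```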

# Graph Structures
The first step in writing Theano code is to write down all mathematical relations using symbolic placeholders (variables). When writing down these expressions you use operations like `+`, `-`, `**`, `sum()`, `tanh()`. All these are represented internally as **ops**. An op represents a certain computation on some type of inputs producing some type of output. You can see it as a _function definition_ in most programming languages.

Theano represents symbolic mathematical computations as graphs. These graphs are composed of interconnected _Apply_, _Variable_ and _Op_ nodes. An _Apply_ node represents the application of an _op_ to some _variables_. It is important to distinguish between the definition of a computation, which is represented by an op, and its application to some actual data, which is represented by the apply node. Furthermore, data types are represented by Type instances. Here is a piece of code and a diagram showing the structure built by that piece of code. This should help you understand how these pieces fit together:

In [34]:
x = T.dmatrix('x')
y = T.dmatrix('y')
z = x + y

### Diagram

<img src="https://drive.google.com/uc?id=19KR8tIZk0EVPkyeV1o5FgDRm5r6am6VK">

Arrows represent references to the Python objects pointed at. The blue box is an Apply node. Red boxes are Variable nodes. Green circles are Ops. Purple boxes are Types.

When we create _Variables_ and then _Apply Ops_ to them to make more Variables, we build a bi-partite, directed, acyclic graph. Variables point to the Apply nodes representing the function application producing them via their `owner` field. These Apply nodes point in turn to their input and output Variables via their `inputs` and `outputs` fields. (Apply instances also contain a list of references to their `outputs`, but those pointers don’t count in this graph.)

The `owner` fields of both `x` and `y` point to `None` because they are not the result of another computation. If one of them were the result of another computation, its `owner` field would point to another blue box, as `z`'s does, and so on.

Note that the `Apply` instance's `outputs` field points to `z`, and `z.owner` points back to the `Apply` instance.
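The relationships between `owner`, `inputs` and `outputs` can be reproduced with a few lines of plain Python. The classes below (`ToyVariable`, `ToyApply`) and the helpers `add` and `describe` are hypothetical, simplified stand-ins written for this illustration. They are not Theano's actual implementation, and the op is reduced to a plain string.

```python
class ToyVariable:
    """Hypothetical stand-in for a Variable node."""

    def __init__(self, name=None, owner=None):
        self.name = name
        self.owner = owner   # Apply node that produced this variable, or None


class ToyApply:
    """Hypothetical stand-in for an Apply node: one op applied to some inputs."""

    def __init__(self, op, inputs):
        self.op = op
        self.inputs = list(inputs)
        # The output variable points back to this node through its owner field.
        self.outputs = [ToyVariable(owner=self)]


def add(a, b):
    """Apply the op named 'add' and return its single output variable."""
    return ToyApply("add", [a, b]).outputs[0]


def describe(var):
    """Walk from an output back to its inputs through the owner field."""
    if var.owner is None:
        return var.name
    args = ", ".join(describe(v) for v in var.owner.inputs)
    return f"{var.owner.op}({args})"


x = ToyVariable("x")
y = ToyVariable("y")
z = add(x, y)

print(x.owner, y.owner)          # inputs have no owner
print(z.owner.outputs[0] is z)   # the Apply node and z reference each other
print(describe(z))
```

The recursive `describe` helper follows the same route as a manual traversal: from an output to its `owner`, then on to that node's `inputs`.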

## Traversing the graph
The graph can be traversed starting from outputs (the result of some computation) down to its inputs using the owner field. Take for example the following code:

In [35]:
x = theano.tensor.dmatrix('x')
y = x * 2

If you enter `type(y.owner)` you get `<class 'theano.gof.graph.Apply'>`, which is the apply node that connects the op and the inputs to get this output. You can now print the name of the op that is applied to get y:

In [36]:
type(y.owner)

theano.gof.graph.Apply

In [38]:
y.owner.op.name

'Elemwise{mul,no_inplace}'

Hence, an elementwise multiplication is used to compute y. This multiplication is done between the inputs:

In [39]:
len(y.owner.inputs)

2

In [42]:
y.owner.inputs[0]

x

In [43]:
y.owner.inputs[1]

InplaceDimShuffle{x,x}.0

Note that the second input is not 2, as we might have expected. This is because 2 was first made broadcastable to a matrix of the same shape as `x`, which is done with the `DimShuffle` op:

In [44]:
type(y.owner.inputs[1])

theano.tensor.var.TensorVariable

In [45]:
type(y.owner.inputs[1].owner)

theano.gof.graph.Apply

In [46]:
y.owner.inputs[1].owner.op

<theano.tensor.elemwise.DimShuffle at 0x115018d68>

In [47]:
y.owner.inputs[1].owner.inputs

[TensorConstant{2}]
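The `{x,x}` in `InplaceDimShuffle{x,x}` can be read as "add two broadcastable axes" to the scalar constant. A NumPy analogue of that step is sketched below, on the assumption that a length-1 axis plays the role of a broadcastable dimension. The names `x_val`, `two` and `y_val` are illustrative only.

```python
import numpy as np

x_val = np.array([[1.0, 2.0], [3.0, 4.0]])

# Give the scalar 2 two length-1 axes, roughly what the {x,x} pattern denotes.
two = np.asarray(2.0).reshape(1, 1)

# Elementwise multiply: the (1, 1) array is broadcast against the (2, 2) matrix.
y_val = two * x_val

print(two.shape)
print(y_val)
```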

# Derivative in Theano
Now let’s use Theano for a slightly more sophisticated task: create a function which computes the derivative of some expression `y` with respect to its parameter `x`. To do this we will use the macro `T.grad`. For instance, we can compute the gradient of $x^2$ with respect to $x$:

$$\frac{d(x^2)}{dx} = 2x$$

In [29]:
from theano import pp

In [33]:
x = T.dscalar('x')
y = x ** 2
gy = T.grad(y, x)
display(pp(gy)) # Pretty print gradient prior to optimization

f = theano.function([x], gy)
f(4)

'((fill((x ** TensorConstant{2}), TensorConstant{1.0}) * TensorConstant{2}) * (x ** (TensorConstant{2} - TensorConstant{1})))'

array(8.)
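The unoptimized expression printed above looks more complicated than $2x$, but it reduces to the same value. As a sanity check, the sketch below evaluates that expression term by term in NumPy and compares it with a central finite-difference estimate. The helper names `unoptimized_grad` and `central_difference` are made up for this example, and `np.full_like` is used as an assumed equivalent of `fill`.

```python
import numpy as np


def unoptimized_grad(x):
    """Evaluate the pretty-printed gradient expression term by term."""
    fill = np.full_like(x ** 2, 1.0)      # fill((x ** 2), 1.0)
    return (fill * 2) * (x ** (2 - 1))    # (... * 2) * (x ** (2 - 1))


def central_difference(func, x, eps=1e-6):
    """Numerical estimate of d func / dx at x."""
    return (func(x + eps) - func(x - eps)) / (2 * eps)


x0 = np.float64(4.0)
symbolic_value = unoptimized_grad(x0)
numeric_value = central_difference(lambda v: v ** 2, x0)

print(symbolic_value, numeric_value)
```

Both numbers should agree with the `array(8.)` returned by `f(4)`, up to the small error inherent in finite differences.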

# Logistic Regression Example

In [28]:
import numpy as np
import theano 
import theano.tensor as T
rng = np.random

In [None]:
N = 400      # Training sample size
feats = 784  # Number of input variables

# generate a dataset: D = (input_values, target_class)
# D[0].shape = (400, 784), D[1].shape = (400,)
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))
training_steps = 10000

# Declare Theano symbolic variables
x = T.dmatrix('x')
y = T.dvector('y')

# Initialize the weight vector w randomly
#
# w and the bias variable b below are shared variables,
# so they keep their updated values between training iterations
w = theano.shared(rng.randn(feats), name='w')

# Initialize bias term
b = theano.shared(0., name='b')

print('Initial model: ')
print(w.get_value())
print(b.get_value())

# ------- Construct Theano Expression Graph ---------
# Prediction, Probability that target = 1 
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))      
prediction = p_1 > 0.5                        # Prediction threshold

# Cross-entropy loss, one value per training example
xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1)

# Average the cross-entropy values and add L2 regularization
cost = xent.mean() + 0.01 * (w ** 2).sum()

# Compute the gradient of the cost (w/ reg), w.r.t weight vector w and bias term b
gw, gb = T.grad(cost, [w,b])                  

# Compile
train = theano.function(
  inputs=[x,y], 
  outputs=[prediction, xent], 
  updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb))
)
predict = theano.function(inputs=[x], outputs=prediction)

# Train
for i in range(training_steps):
  pred, err = train(D[0], D[1])
  
print("Final model:")
print(w.get_value())
print(b.get_value())
print("target values for D:")
print(D[1])
print("prediction on D:")
print(predict(D[0]))
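To see what `T.grad(cost, [w, b])` has to produce for this model, the gradients can also be derived by hand and checked numerically. The sketch below is a NumPy-only illustration on a much smaller random dataset (20 samples, 5 features, chosen arbitrarily). `cost_fn` and `grad_fn` are hypothetical helpers, not part of the tutorial code, and the closed-form gradient is hand-derived rather than taken from Theano.

```python
import numpy as np


def cost_fn(w, b, X, y):
    """Mean cross-entropy plus the 0.01 * sum(w ** 2) penalty."""
    p_1 = 1.0 / (1.0 + np.exp(-X.dot(w) - b))
    xent = -y * np.log(p_1) - (1 - y) * np.log(1 - p_1)
    return xent.mean() + 0.01 * (w ** 2).sum()


def grad_fn(w, b, X, y):
    """Hand-derived gradients of cost_fn with respect to w and b."""
    p_1 = 1.0 / (1.0 + np.exp(-X.dot(w) - b))
    diff = (p_1 - y) / len(y)
    gw = X.T.dot(diff) + 0.02 * w   # data term plus derivative of the penalty
    gb = diff.sum()
    return gw, gb


rng = np.random.RandomState(0)
X = rng.randn(20, 5)
y = rng.randint(0, 2, size=20).astype(float)
w = 0.1 * rng.randn(5)
b = 0.0

gw, gb = grad_fn(w, b, X, y)

# Central finite differences as an independent check of the formulas.
eps = 1e-6
num_gw = np.array([
    (cost_fn(w + eps * e, b, X, y) - cost_fn(w - eps * e, b, X, y)) / (2 * eps)
    for e in np.eye(5)
])
num_gb = (cost_fn(w, b + eps, X, y) - cost_fn(w, b - eps, X, y)) / (2 * eps)

# One gradient-descent step with the same 0.1 learning rate as above.
w_new, b_new = w - 0.1 * gw, b - 0.1 * gb
cost_before = cost_fn(w, b, X, y)
cost_after = cost_fn(w_new, b_new, X, y)

print(np.allclose(gw, num_gw, atol=1e-6), cost_after < cost_before)
```

The update `w - 0.1 * gw`, `b - 0.1 * gb` mirrors the `updates` argument passed to `train`. The difference is that Theano derives `gw` and `gb` symbolically from `cost` instead of relying on a hand-written formula.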