# Short multiple linear regression example

### [Theano tutorial](http://theano.readthedocs.org/en/latest/tutorial/index.html#tutorial)

### [Theano-Tutorials (Newmu)](https://github.com/Newmu/Theano-Tutorials)


In [2]:
import theano as th
th.__version__

'0.8.1'

In [3]:
import matplotlib.pyplot as plt
import numpy as np
import theano.tensor as T
# from theano import tensor as T

## Linear regression example: find slope

The `linspace` function returns an array of equally spaced numbers. With `num=N` and `N = 5`, as in the cell below, it yields `5` values starting at `-1` and ending at `1`.

In [4]:
N = 5
m = 3
b0 = 4
x_start = -1
x_stop  = +1 
rnd_mul = 0.01

trX = np.linspace(start=x_start, stop=x_stop, num=N)
trX[0:5]

In [5]:
trX = np.array([np.ones(N), 
                np.linspace(start=x_start, stop=x_stop, num=N)]).T
trX[0:5,]

array([[ 1. , -1. ],
       [ 1. , -0.5],
       [ 1. ,  0. ],
       [ 1. ,  0.5],
       [ 1. ,  1. ]])

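# Leftover from an earlier single-variable version; the next cell overwrites trY.
# With N = 5 the slice trY[50:51] is empty.
trY = m * trX + np.random.randn(*trX.shape) * rnd_mul + b0
trY[50:51]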

In [6]:
trY = trX.dot(np.array([2,3])) + np.random.randn(N) * rnd_mul
trY

array([-0.99162584,  0.49919199,  2.00775363,  3.50243213,  4.99510299])
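
As a quick sanity check on this data, here is a minimal NumPy sketch that rebuilds the same kind of design matrix and recovers the weights with a closed-form least-squares fit. It is not part of the original notebook: the fixed seed and the names `X_design`, `Y_noisy`, and `w_hat` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (an assumption) so the sketch is repeatable

N = 5
true_w = np.array([2.0, 3.0])   # [intercept, slope], as in the cell above
x = np.linspace(-1, 1, N)
X_design = np.column_stack([np.ones(N), x])   # same layout as trX
Y_noisy = X_design.dot(true_w) + rng.standard_normal(N) * 0.01

# Closed-form least-squares fit; lstsq returns
# (solution, residuals, rank, singular values).
w_hat, _, _, _ = np.linalg.lstsq(X_design, Y_noisy, rcond=None)
print(w_hat)  # close to [2, 3], up to the small noise
```

Any iterative fit on this data should land near the same two numbers, which gives a reference point for the gradient-based approach.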

### The input is a matrix and the output is a vector

[Basic Tensor Functionality](http://deeplearning.net/software/theano/library/tensor/basic.html) (documentation)

In [7]:
X = T.matrix() # X = T.scalar()
Y = T.vector() # Y = T.scalar()
#XY = T.dot(X,Y)
#XY_f = th.function(inputs=[X,Y],outputs=XY)
#XY_f(trX, np.array([2,3]))

T.zeros_like(Y) # ???

### The model

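- determined by: `w`, a weight vector interpreted as an _intercept_ and a _slope_
- input: `X`, a matrix whose first column is all ones
- output: `T.dot(X, w)`, one prediction per row of `X`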
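
To make the role of the column of ones concrete, here is a small NumPy sketch of the same model. The function name `model_np` and the weights `w_example` are hypothetical, chosen only for illustration.

```python
import numpy as np

x = np.linspace(-1, 1, 5)
X_design = np.column_stack([np.ones_like(x), x])  # column of ones carries the intercept
w_example = np.array([2.0, 3.0])                  # hypothetical weights: [intercept, slope]

def model_np(X, w):
    """NumPy counterpart of T.dot(X, w): one prediction per row of X."""
    return X.dot(w)

pred = model_np(X_design, w_example)

# Each prediction equals w[0] * 1 + w[1] * x, i.e. intercept + slope * x.
assert np.allclose(pred, w_example[0] + w_example[1] * x)
print(pred)  # values: -1, 0.5, 2, 3.5, 5
```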

In [8]:
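# NOTE: trX has 2 columns, so a 3-element w cannot be multiplied with it
# (see the shape mismatch reported by gradient_f further down).
w = th.shared(np.zeros(3).T, name="w")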


def model(X, w):
    return T.dot(X,w)

# w = th.shared(np.asarray(0.0, dtype=th.config.floatX))

# b = th.shared(np.asarray(0.0, dtype=th.config.floatX))


y = model(X, w)
X, w, y

(<TensorType(float64, matrix)>, w, dot.0)

In [105]:
y_f = th.function(inputs=[X,w],givens=[w],outputs=y)

ValueError: length not known: w [id A]


In [98]:
sdf = T.dot(X,Y)
model_f = th.function(inputs=[X,Y],outputs=sdf)
model_f(trX, trY[0:2])

array([-1.49853058, -1.24724508, -0.99595959, -0.74467409, -0.4933886 ])

### Calculate the cost function

In [100]:
#cost = T.mean(T.sqr(y - Y))
cost = T.sum(T.sqr(y - Y))
th.pp(cost)

'Sum{acc_dtype=float64}(sqr(((<TensorType(float64, matrix)> \\dot w) - <TensorType(float64, vector)>)))'

In [101]:
cost_f = th.function(inputs=[y,Y], outputs=cost)

The cost function works when it is called with numeric arrays:

In [102]:
cost_f(np.array([1,2]),np.array([4,0]))

array(13.0)
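
The value `13.0` can be reproduced by hand. The sketch below is a plain NumPy stand-in for the compiled Theano function; `cost_np` is a hypothetical name, not something defined in the notebook.

```python
import numpy as np

def cost_np(pred, target):
    """Sum of squared differences, mirroring T.sum(T.sqr(y - Y))."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.sum((pred - target) ** 2)

value = cost_np([1, 2], [4, 0])
print(value)  # 13.0, since (1 - 4)**2 + (2 - 0)**2 = 9 + 4
```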

In [65]:
cost_f(y,
       trY)

TypeError: ('Bad input argument to theano function with name "<ipython-input-54-f7c06619124c>:1"  at index 0(0-based)', 'Expected an array-like object, but found a Variable: maybe you are trying to call a function on a (possibly shared) variable instead of a numeric array?')

### Calculate the gradient of the cost formula

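Think of the gradient as a slope in several dimensions. For a function `f` of `n` variables, it is the vector whose components are the `n` partial derivatives of `f`, so the gradient is itself a vector-valued function. Here `f` is `cost` and the variables are the entries of `w`.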

See [Wikipedia - Gradient](https://en.wikipedia.org/wiki/Gradient)


In [56]:
gradient = T.grad(cost=cost, wrt=w)
th.pp(gradient)

'(<TensorType(float64, matrix)>.T \\dot ((fill(sqr(((<TensorType(float64, matrix)> \\dot w) - <TensorType(float64, vector)>)), fill(Sum{acc_dtype=float64}(sqr(((<TensorType(float64, matrix)> \\dot w) - <TensorType(float64, vector)>))), TensorConstant{1.0})) * ((<TensorType(float64, matrix)> \\dot w) - <TensorType(float64, vector)>)) * TensorConstant{2}))'
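
Stripped of the `fill` bookkeeping, the printed expression has the form `X.T dot ((X dot w - Y) * 2)`, which is the usual gradient of a summed squared error. The NumPy sketch below compares that formula with a finite-difference estimate. The helper names and the noise-free targets are assumptions made for the check.

```python
import numpy as np

def cost_np(X, w, Y):
    residual = X.dot(w) - Y
    return np.sum(residual ** 2)

def grad_np(X, w, Y):
    """Analytic gradient of the summed squared error: 2 * X^T (X w - Y)."""
    return 2.0 * X.T.dot(X.dot(w) - Y)

def numeric_grad(X, w, Y, eps=1e-6):
    """Central finite differences, used only to check grad_np."""
    g = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        g[i] = (cost_np(X, w + step, Y) - cost_np(X, w - step, Y)) / (2 * eps)
    return g

x = np.linspace(-1, 1, 5)
X_design = np.column_stack([np.ones_like(x), x])
Y_clean = X_design.dot(np.array([2.0, 3.0]))  # noise-free targets for the check
w0 = np.zeros(2)

analytic = grad_np(X_design, w0, Y_clean)
numeric = numeric_grad(X_design, w0, Y_clean)
assert np.allclose(analytic, numeric, atol=1e-4)
print(analytic)  # values: -20, -15
```

Both components are negative at `w0 = 0`, so increasing either weight lowers the cost there.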

In [57]:
gradient_f = th.function(inputs=[X,Y], outputs=gradient)

In [61]:
trX, trY

(array([[ 1. , -1. ],
        [ 1. , -0.5],
        [ 1. ,  0. ],
        [ 1. ,  0.5],
        [ 1. ,  1. ]]),
 array([-0.99595959,  0.50257099,  2.01093111,  3.5004867 ,  5.01870034]))

In [62]:
gradient_f(trX,trY.T)

ValueError: Shape mismatch: A.shape[1] != x.shape[0]
Apply node that caused the error: CGemv{no_inplace}(<TensorType(float64, vector)>, TensorConstant{2.0}, <TensorType(float64, matrix)>, w, TensorConstant{-2.0})
Toposort index: 2
Inputs types: [TensorType(float64, vector), TensorType(float64, scalar), TensorType(float64, matrix), TensorType(float64, vector), TensorType(float64, scalar)]
Inputs shapes: [(5,), (), (5, 2), (3,), ()]
Inputs strides: [(8,), (), (8, 40), (8,), ()]
Inputs values: [array([-0.99595959,  0.50257099,  2.01093111,  3.5004867 ,  5.01870034]), array(2.0), 'not shown', array([ 0.,  0.,  0.]), array(-2.0)]
Outputs clients: [[CGemv{inplace}(AllocEmpty{dtype='float64'}.0, TensorConstant{1.0}, InplaceDimShuffle{1,0}.0, CGemv{no_inplace}.0, TensorConstant{0.0})]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
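
The `Inputs shapes` line in the traceback lists `(5, 2)` for the matrix and `(3,)` for `w`: the matrix has two columns but `w` holds three weights. The same mismatch can be reproduced in plain NumPy. In this sketch, `w_bad` and `w_ok` are hypothetical names.

```python
import numpy as np

X_design = np.column_stack([np.ones(5), np.linspace(-1, 1, 5)])  # shape (5, 2)

w_bad = np.zeros(3)  # three weights, but X_design has only two columns
try:
    X_design.dot(w_bad)
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("shape mismatch:", err)

w_ok = np.zeros(X_design.shape[1])  # one weight per column
print(X_design.dot(w_ok).shape)     # (5,)
```

Sizing the shared variable from the number of columns in `trX` (two here) would likely avoid the error in the Theano version as well.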

In [58]:
w.get_value() # gradient function is -2 times difference b/w

array([ 0.,  0.,  0.])

The gradient points in the direction of steepest ascent. Since we want to minimize `cost`, we step the other way: the negative of `gradient`, scaled by a small learning rate, is added to `w`.
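
A single step shows the effect of the sign. This is a NumPy sketch on noise-free data, with hypothetical helper names: stepping against the gradient lowers the cost, while stepping along it raises the cost.

```python
import numpy as np

x = np.linspace(-1, 1, 5)
X_design = np.column_stack([np.ones_like(x), x])
Y_clean = X_design.dot(np.array([2.0, 3.0]))

def cost_np(w):
    return np.sum((X_design.dot(w) - Y_clean) ** 2)

def grad_np(w):
    return 2.0 * X_design.T.dot(X_design.dot(w) - Y_clean)

learning_rate = 0.01
w_before = np.zeros(2)
w_after = w_before - learning_rate * grad_np(w_before)  # step against the gradient
w_wrong = w_before + learning_rate * grad_np(w_before)  # step along the gradient

print(cost_np(w_before), cost_np(w_after), cost_np(w_wrong))
```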

In [26]:
updates = [[w, w - gradient * 0.01]] # 0.01 is called the "learning rate"
updates

[[<TensorType(float64, scalar)>, Elemwise{sub,no_inplace}.0]]

In [27]:
train = th.function(inputs=[X, Y], outputs=cost, updates=updates, allow_input_downcast=True)

In [28]:
for i in range(1000):
    for x, y in zip(trX, trY):
        train(x, y)

In [29]:
print(w.get_value()) # fitted weight
print(b.get_value()) # stays at its initial value: b is not in `updates`

3.7462299745043968
0.0
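
The recorded outputs above show a scalar `w`, so they appear to come from an earlier single-variable run. For the matrix model with an intercept column, a full-batch version of the same update rule can be sketched in plain NumPy. The seed, the variable names, and the choice of full-batch updates (rather than one row at a time) are assumptions, not the notebook's own code.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, an assumption for repeatability

N = 5
x = np.linspace(-1, 1, N)
X_design = np.column_stack([np.ones(N), x])
Y_noisy = X_design.dot(np.array([2.0, 3.0])) + rng.standard_normal(N) * 0.01

learning_rate = 0.01
w_fit = np.zeros(X_design.shape[1])  # one weight per column of X_design
for _ in range(1000):
    # Gradient of sum((X w - Y)**2) with respect to w
    gradient = 2.0 * X_design.T.dot(X_design.dot(w_fit) - Y_noisy)
    w_fit = w_fit - learning_rate * gradient

print(w_fit)  # roughly [2, 3]: intercept first, slope second
```

Because the intercept is folded into `w` through the column of ones, no separate `b` is needed in this formulation.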
