# SymPy Intro

Take a look at [Linear Regression: A Tale of a Transform](https://safwanahmad.github.io/2018/01/21/Linear-Regression-A-Tale-of-a-Transform.html).

# TensorFlow and Keras
and why they matter

# Warming up
* Implement a function that computes the sum of squares of the numbers from 0 to N-1
* Use NumPy or plain Python
* `numpy.arange(N)` gives an array of the numbers from 0 to N-1

In [None]:
import numpy as np

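def sum_squares(N):
    # float64 avoids silent int64 overflow: for N = 10**8 the exact sum exceeds 2**63
    return (np.arange(N, dtype=np.float64)**2).sum()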

In [None]:
%%time
sum_squares(10**8)
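Whatever the implementation, the result can be cross-checked against the closed form $\sum_{k=0}^{N-1} k^2 = \frac{(N-1)N(2N-1)}{6}$. The plain-Python sketch below (function names are illustrative) uses arbitrary-precision integers, which also shows that for `N = 10**8` the exact sum exceeds the int64 range, so a fixed-width int64 array would silently overflow.

```python
def sum_squares_loop(N):
    # Direct summation of k**2 for k = 0 .. N-1 (exact: Python ints never overflow)
    return sum(k * k for k in range(N))

def sum_squares_closed_form(N):
    # Closed form for 0**2 + 1**2 + ... + (N-1)**2
    return (N - 1) * N * (2 * N - 1) // 6

# The two agree on small inputs
for n in range(50):
    assert sum_squares_loop(n) == sum_squares_closed_form(n)

big = sum_squares_closed_form(10**8)
INT64_MAX = 2**63 - 1
print(big, big > INT64_MAX)
```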

# TensorFlow teaser

Doing the very same thing in TensorFlow:

In [None]:
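import tensorflow as tf

# This notebook uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session);
# a session is what actually executes the graph
session = tf.InteractiveSession()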

In [None]:
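# I am going to be a function parameter
N = tf.placeholder(tf.int64)

# I am a recipe for producing the sum of squares of range(N), given N
# (cast to float64: the exact sum for N = 10**8 does not fit in int64)
result = tf.reduce_sum(tf.cast(tf.range(N), tf.float64)**2)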

In [None]:
%%time

session.run(result, feed_dict={N : 10**8})

# How does it work?
__If you're currently in the classroom, chances are I am explaining this wall of text right now.__
1. You define inputs for your future function;
2. You write a recipe for some transformation of the inputs;
3. You ask the session to compute it with the values you feed it.

You have just built a graph!
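The three steps above can be mimicked with a toy deferred-evaluation graph in plain Python. This is only an analogy for intuition, not how TensorFlow is implemented; `Placeholder`, `Op` and `run` are hypothetical names.

```python
class Placeholder:
    """An input: it has no value until one is fed at run time."""
    def __init__(self, name):
        self.name = name

class Op:
    """A transformation: a function applied to the values of its input nodes."""
    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs

def run(node, feed):
    """Evaluate `node` recursively, taking placeholder values from `feed`."""
    if isinstance(node, Placeholder):
        if node not in feed:
            raise KeyError("no value fed for placeholder " + node.name)
        return feed[node]
    args = [run(inp, feed) for inp in node.inputs]
    return node.fn(*args)

# 1. Define an input
N = Placeholder("N")
# 2. Write a recipe: sum(range(N) ** 2) as three chained transformations
rng = Op(lambda n: list(range(n)), N)
squares = Op(lambda xs: [x * x for x in xs], rng)
result = Op(sum, squares)
# Nothing has been computed yet: `result` is only a description
# 3. Ask for a value, feeding N
print(run(result, {N: 10}))  # 285
```

The same `result` recipe can be run again with a different value for `N`, which is exactly what `session.run(result, feed_dict={N: ...})` does with a real graph.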

* There are two main kinds of entities: "Inputs" and "Transformations"
* Both can be numbers, vectors, matrices, tensors, etc.
* Both can be integers, floats or booleans; integers and floats come in various sizes (e.g. int32, int64, float32, float64).


* An input is a placeholder for function parameters
  * e.g. `N` from the example above


* Transformations are recipes for computing something given inputs and other transformations
  * `tf.reduce_sum(tf.range(N)**2)` is a chain of 3 sequential transformations of `N`
 
Still confused? We are going to fix that.

In [None]:
3 # a rank 0 tensor; this is a scalar with shape []
[1., 2., 3.] # a rank 1 tensor; this is a vector with shape [3]
[[1., 2., 3.], [4., 5., 6.]] # a rank 2 tensor; a matrix with shape [2, 3]
[[[1., 2., 3.]], [[7., 8., 9.]]] # a rank 3 tensor with shape [2, 1, 3]
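To make "rank" and "shape" concrete, here is a small plain-Python helper (hypothetical, for illustration only) that computes the shape of a regularly nested list; the rank is simply the length of the shape.

```python
def shape_of(value):
    """Shape of a regularly nested list; a scalar has shape []."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        # Assumes regular nesting: every sub-list has the same shape as the first one
        value = value[0] if value else None
    return shape

examples = [
    3,
    [1., 2., 3.],
    [[1., 2., 3.], [4., 5., 6.]],
    [[[1., 2., 3.]], [[7., 8., 9.]]],
]
for ex in examples:
    s = shape_of(ex)
    print("rank", len(s), "shape", s)
```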

In [None]:
#Inputs
example_input_integer = tf.Variable(1)

example_input_tensor = tf.Variable(np.ones((100, 3, 50, 50)))
# don't worry, we won't need tensors yet

input_vector = tf.Variable(np.ones(5, dtype=np.float32)) # vector of float32 values

In [None]:
#Transformations

# transformation: elementwise multiplication
double_the_vector = input_vector * 2

#elementwise cosine
elementwise_cosine = tf.cos(input_vector)

#difference between squared vector and vector itself
vector_squares = input_vector**2 - input_vector

In [None]:
# Practice time:
# create two vectors of type float32
my_vector = student.init_float32_vector()
my_vector2 = student.init_one_more_such_vector()

In [None]:
# Write a transformation (recipe):
# (vec1)*(vec2) / (sin(vec1) + 1)
my_transformation = student.implementwhatwaswrittenabove()

In [None]:
print(my_transformation)
# it's okay that this isn't a number: it's a symbolic tensor (a recipe), not a value

# Compiling
* So far we have been using "symbolic" variables and transformations
  * defining the recipe for a computation, but not computing anything
* To use the recipe, run it in a session (`session.run`), feeding concrete values for its inputs

In [None]:
# The next lines define a function that takes two vectors and computes your transformation
def my_function(val1, val2):
    a = <the first vector that my_transformation depends on>
    b = <the second vector that my_transformation depends on>
    outputs = [<What do we compute (can be a list of several transformation)>]
    
    return session.run(outputs, {a : val1, b : val2})

In [None]:
# using the function with Python lists:
print('using python lists:')
print(my_function([1, 2, 3], [4, 5, 6]))
print()

#Or using numpy arrays:
# note: the 'float' (float64) values are cast to the dtype of the second input, float32
print('using numpy arrays:')
print(my_function(np.arange(10),
                  np.linspace(5, 6, 10, dtype='float')))

* When debugging, you generally want to reduce the amount of computation. For example, if you are about to feed a neural network a batch of 1000 samples, consider taking just the first 2.
* If you really need to debug a computationally heavy graph, note that `optimizer='fast_compile'` is a Theano compilation flag; TensorFlow's `session.run` has no such option, so start by shrinking the inputs as above.

# Do It Yourself

In [None]:
# Quest #1 - implement a function that computes a mean squared error of two input vectors
# Your function has to take 2 vectors and return a single number

<student.define_inputs_and_transformations()>

def compute_mse(<student.define_function_arguments()>):
    <student.compile_function()>
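For reference, the mean squared error of two length-$n$ vectors is $\frac{1}{n}\sum_i (a_i - b_i)^2$. A plain-Python version such as the sketch below (the name `reference_mse` is illustrative) can help you check your TensorFlow result on tiny inputs before running the full tests.

```python
def reference_mse(a, b):
    """Mean of the squared elementwise differences of two equal-length sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

print(reference_mse([1, 2, 3], [4, 5, 6]))  # 9.0
```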

In [None]:
# Tests
from sklearn.metrics import mean_squared_error

for n in [1, 5, 10, 10**3]:
    elems = [np.arange(n), np.arange(n, 0, -1), np.zeros(n),
             np.ones(n), np.random.random(n), np.random.randint(100, size=n)]
    
    for el in elems:
        for el_2 in elems:
            true_mse = np.array(mean_squared_error(el, el_2))
            my_mse = compute_mse(el, el_2)
            
            if not np.allclose(true_mse, my_mse):
                print('Wrong result:')
                print('mse({},{})'.format(el, el_2))
                print('should be: {}, but your function returned {}'.format(true_mse, my_mse))
                raise ValueError('Something went wrong')

print('All tests passed')

You have already seen two types of graph inputs:
* placeholders
* Variables

There are also constants:

`tf.constant(5)` - a constant gets its value when it is created, unlike Variables (which must be initialized before use) and placeholders (which are fed at run time).
You can think of Variables as global variables: they keep their values inside a session between `session.run` calls, and those values can be read from outside the graph.

In [None]:
a = tf.Variable(np.ones([200, 150, 5, 3]), expected_shape=[None, 150, 5, 3])

try:
    session.run(a)
except tf.errors.FailedPreconditionError:
    print("A variable was not initialized")

In [None]:
# To initialize Variables, run the global initializer op
init = tf.global_variables_initializer()
session.run(init)

In [None]:
session.run(a)

# Your turn

In [None]:
# Write a recipe (transformation) that computes the elementwise product of a Variable shared_vector_1 and input_constant
# Then wrap it in a Python function of the scalar (e.g. feed the scalar via feed_dict in session.run)

input_constant = tf.constant(5.0) # dtype is inferred (or can be set explicitly with the dtype= argument)

constant_times_variable = <student.write_recipe()>

variable_times_n = <student.make_a_function()>

In [None]:
print('variable_times_n(5)', variable_times_n(5))

print('variable_times_n(-0.5)', variable_times_n(-0.5))

In [None]:
# Changing the value of the Variable (the output should change);
# assign() only creates an op, so it has to be run in the session
session.run(shared_vector_1.assign(5 * shared_vector_1))

print('variable_times_n(5)', variable_times_n(5))

print('variable_times_n(-0.5)', variable_times_n(-0.5))

# tf.gradients - why TensorFlow matters
* TensorFlow can compute derivatives and gradients automatically
* Derivatives are computed symbolically (TensorFlow adds gradient operations to the graph), not numerically by finite differences

Limitations:
* `tf.gradients` differentiates a __scalar__: if you pass a non-scalar transformation, it returns the gradient of the sum of its elements, with respect to one or several inputs or transformations (scalars, vectors or tensors).
* The transformation must have a floating-point dtype (e.g. float32 or float64) throughout the whole computation graph
  * a derivative with respect to an integer makes no mathematical sense
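Concretely, for the TensorFlow 1.x `tf.gradients(ys, xs)` with the default `grad_ys=None`, each returned tensor is the derivative of the sum of all elements of `ys`:

$$
\texttt{tf.gradients}(y, x) = \frac{\partial}{\partial x}\sum_i y_i,
\qquad\text{e.g. } y = \left(x^2,\ 3x\right) \;\Rightarrow\; \frac{\partial}{\partial x}\left(x^2 + 3x\right) = 2x + 3.
$$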


In [None]:
my_scalar = tf.placeholder(dtype=tf.float64)

scalar_squared = tf.reduce_sum(my_scalar**2)

# derivative of scalar_squared with respect to my_scalar
# (tf.gradients returns a list with one gradient per input, hence [0])
derivative = tf.gradients(scalar_squared, my_scalar)[0]

fun = lambda x: session.run(scalar_squared, {my_scalar: x})
grad = lambda x: session.run(derivative, {my_scalar: x})

In [None]:
import matplotlib.pyplot as plt
plt.style.use('ggplot')

%matplotlib inline

x = np.linspace(-3, 3)
x_squared = list(map(fun, x))
x_squared_der = list(map(grad, x))

plt.plot(x, x_squared, label='x^2')
plt.plot(x, x_squared_der, label='derivative')
plt.legend(loc='best')

# Why that rocks

In [None]:
my_vector = tf.placeholder(tf.float64)

# Compute the gradient of the following weird function with respect to my_scalar and my_vector
# warning! Trying to understand the meaning of this function may result in permanent brain damage

weird_psychotic_function = tf.reduce_mean((my_vector+my_scalar)**(1+tf.nn.moments(my_vector, axes=[0])[1]) + 1. / tf.log(my_scalar + tf.sqrt(my_scalar**2 + 1))) / (my_scalar**2 + 1) + 0.01 * tf.sin(2 * my_scalar**1.5) * (tf.reduce_sum(my_vector) * my_scalar**2) * tf.exp((my_scalar - 4)**2) / (1 + tf.exp((my_scalar - 4)**2)) * (1. - (tf.exp( - (my_scalar - 4)**2)) / (1 + tf.exp( - (my_scalar - 4)**2)))**2

der_by_scalar, der_by_vector = <student.compute_grad_over_scalar_and_vector()>

compute_weird_function = lambda x, y: session.run(weird_psychotic_function, {my_scalar : x, my_vector : y })
compute_der_by_scalar = lambda x, y: session.run(der_by_scalar, {my_scalar : x, my_vector : y })
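Once you have `der_by_scalar` and `der_by_vector`, a central finite-difference check is a cheap way to sanity-check them. The helper below is a hypothetical plain-Python sketch; `f` stands for any Python function of a scalar and a vector (for instance a wrapper around `compute_weird_function`).

```python
def numerical_gradients(f, s, v, eps=1e-6):
    """Central-difference estimates of df/ds and df/dv for f(s, v),
    where s is a float and v is a list of floats."""
    d_s = (f(s + eps, v) - f(s - eps, v)) / (2 * eps)
    d_v = []
    for i in range(len(v)):
        v_plus = list(v)
        v_plus[i] += eps
        v_minus = list(v)
        v_minus[i] -= eps
        d_v.append((f(s, v_plus) - f(s, v_minus)) / (2 * eps))
    return d_s, d_v

# Example with a known answer: f(s, v) = s**2 * sum(v)
# => df/ds = 2 * s * sum(v), df/dv_i = s**2
f = lambda s, v: s ** 2 * sum(v)
d_s, d_v = numerical_gradients(f, 2.0, [1.0, 2.0, 3.0])
print(d_s, d_v)  # approximately 24.0 and [4.0, 4.0, 4.0]
```

Finite differences are slow (two function evaluations per coordinate) and only approximate, which is why they serve as a check rather than a replacement for `tf.gradients`.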

In [None]:
#Plotting your derivative
vector_0 = [1, 2, 3]

scalar_space = np.linspace(0, 7)

y = [compute_weird_function(x, vector_0) for x in scalar_space]
plt.plot(scalar_space, y, label='function')
y_der_by_scalar = [compute_der_by_scalar(x, vector_0) for x in scalar_space]
plt.plot(scalar_space, y_der_by_scalar, label='derivative')
plt.grid()
plt.legend(loc='best')

# Logistic regression example

Implement the regular logistic regression training algorithm.

Tips:
* Weights fit in as a Variable
* X and y are placeholders (inputs)
* Write 2 functions:
  * train_function(X, y) - returns the loss and updates the weights __(by running an assign op)__
  * predict_function(X) - just computes the probabilities ("y") given the data


We shall train on a two-class subset of scikit-learn's digits dataset (8x8 images of handwritten digits, a small MNIST-like dataset)
* please note that the targets y are {0, 1}, not {-1, 1} as in some formulae
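For reference, with predicted probabilities $p = \sigma(Xw)$ and targets $y \in \{0, 1\}$, the mean logistic loss is $-\frac{1}{n}\sum_i \left[y_i \log p_i + (1-y_i)\log(1-p_i)\right]$ and its gradient with respect to $w$ is $\frac{1}{n}X^\top(p - y)$. The plain-Python sketch below shows how these pieces fit together under simplifying assumptions: toy data, a constant feature instead of a separate bias, and plain full-batch gradient descent; all names are illustrative. In the TensorFlow version the gradient would come from `tf.gradients` instead of being written by hand.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, X):
    # Probability of class 1 for each row x: sigmoid(x . w)
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, x))) for x in X]

def log_loss(w, X, y, tiny=1e-12):
    # Mean logistic loss; `tiny` guards against log(0)
    p = predict(w, X)
    return -sum(yi * math.log(pi + tiny) + (1 - yi) * math.log(1 - pi + tiny)
                for yi, pi in zip(y, p)) / len(y)

def gradient(w, X, y):
    # (1/n) * X^T (p - y), one entry per weight
    p = predict(w, X)
    n = len(y)
    return [sum((pi - yi) * x[j] for x, yi, pi in zip(X, y, p)) / n
            for j in range(len(w))]

def train_step(w, X, y, lr=0.5):
    # One full-batch gradient-descent step; returns (new weights, loss before the step)
    loss = log_loss(w, X, y)
    g = gradient(w, X, y)
    return [wj - lr * gj for wj, gj in zip(w, g)], loss

# Toy 1-D data; the constant first feature plays the role of a bias term
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
w = [0.0, 0.0]
losses = []
for _ in range(200):
    w, loss = train_step(w, X, y)
    losses.append(loss)

labels = [1 if p > 0.5 else 0 for p in predict(w, X)]
print(losses[0], losses[-1], labels)
```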

In [None]:
from sklearn.datasets import load_digits
mnist = load_digits(n_class=2)

X = mnist.data
y = mnist.target

print('y [shape - {}]:{}'.format(str(y.shape), y[:10]))
print('X [shape - {}]:'.format(str(X.shape)))
print(X[:3])

In [None]:
# inputs and Variables
shared_weights = <student.code_me()>
input_X = <student.code_me()>
input_y = <student.code_me()>

In [None]:
predicted_y = <predicted probabilities for input_X>
loss = <logistic loss (scalar, mean over sample)>

grad = <gradient of loss over model weights>

# In TensorFlow, updates are assign ops that you run together with the loss
update_weights = shared_weights.assign(<new weights after gradient step>) # implement your favorite stochastic optimization algorithm

In [None]:
train_function = <function that takes X and y, runs the loss and the weight update in the session, and returns the log loss>
predict_function = <function that takes X and computes probabilities of y>

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [None]:
from sklearn.metrics import roc_auc_score

for i in range(5):
    loss_i = train_function(X_train,y_train)
    print('loss at iter {}:{}'.format(i, loss_i))
    print('train auc:', roc_auc_score(y_train,predict_function(X_train)))
    print('test auc:', roc_auc_score(y_test,predict_function(X_test)))
    
print('resulting weights:')
plt.imshow(session.run(shared_weights).reshape(8, -1))
plt.colorbar()

In [None]:
import tensorflow as tf

x = tf.placeholder(tf.float32)
f = x / x

with tf.Session() as session:
    print(session.run(f, {x : 0}))  # 0/0 gives nan for floats rather than raising an error

# Bonus task

Implement the regression assignment with SymPy.

# Report

I did such and such, that did that cool thing and my awesome logistic regression bloated out that stuff. Finally, I did that thing and felt like Einstein. That cool article and that kind of weed helped me so much (if any).