<a href="https://colab.research.google.com/github/ahmadhajmosa/Machine-learning-labs/blob/Mareike/Session_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab on Machine Learning and Applications in Intelligent Vehicles
## Session 1: Introduction


# Session 2: 05.06 - 13:00 - 14:30




## Intro:

TensorFlow is a powerful framework for implementing and deploying large-scale deep learning models. It is widely used in both research and production. TensorFlow's objective is to combine scale and flexibility.

In this session, we will learn the following:

1. TF programming stack
2. TF programming concepts, including computation graphs, operations and sessions.
3. Implementation of linear regression
4. Implementation of feed-forward neural networks

## TF stack:

TensorFlow is a framework composed of two core building blocks: a library for defining computational graphs and a runtime for executing such graphs on a variety of different hardware.


![alt text](https://www.tensorflow.org/images/layers.png)


Before going into details about the stack, let us talk about computational graphs.

### Computational Graphs

A directed graph is a data structure consisting of nodes (vertices) and edges. It’s a set of vertices connected pairwise by directed edges.

Graphs come in many shapes and sizes and are used to solve many real-life problems, such as representing networks including telephone networks, circuit networks, road networks, and even social networks. 
![alt text](https://cdn-images-1.medium.com/max/800/1*V6aYjD3AxDbEKYahkGqVQw.png)

TensorFlow uses directed graphs internally to represent computations; these are called data flow graphs (or computational graphs).

The nodes in a TF data flow graph mostly represent operations, variables and placeholders.

Take for example the following operation:
![alt text](https://cdn-images-1.medium.com/max/800/1*6E3sfit6DCeJ9mOz17g4bA.png)

To create a computational graph out of this program, we create nodes for each of the operations in our program, along with the input variables a and b. In fact, a and b could be constants if they don’t change. If one node is used as the input to another operation we draw a directed arrow that goes from one node to another.

The computational graph for this program might look like this:
![alt text](https://cdn-images-1.medium.com/max/800/1*vPb9E0Yd1QUAD0oFmAgaOw.png)

Operations create or manipulate data according to specific rules. In TensorFlow those rules are called Ops, short for operations. Variables on the other hand represent shared, persistent state that can be manipulated by running Ops on those variables.
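
To make the idea concrete, here is a minimal sketch of such a graph in plain Python. The names `Constant`, `Op` and `evaluate` are hypothetical helpers that only mimic the concept; they are not TensorFlow APIs. The values of `a` and `b` and the final division node are assumptions too, since the example program is only shown as an image.

```python
import operator


class Constant:
    """Graph node holding a fixed value (hypothetical helper, not a TensorFlow class)."""

    def __init__(self, value, name):
        self.value = value
        self.name = name
        self.inputs = []  # constants have no incoming edges


class Op:
    """Graph node that applies a function to the outputs of its input nodes."""

    def __init__(self, fn, inputs, name):
        self.fn = fn
        self.inputs = inputs  # directed edges: input node -> this node
        self.name = name


def evaluate(node):
    """Recursively evaluate a node by first evaluating everything it depends on."""
    if isinstance(node, Constant):
        return node.value
    return node.fn(*(evaluate(n) for n in node.inputs))


# Building the graph computes nothing yet; it only records the structure.
a = Constant(15, "a")  # assumed value
b = Constant(5, "b")   # assumed value
prod = Op(operator.mul, [a, b], "prod")
total = Op(operator.add, [a, b], "total")
result = Op(operator.truediv, [prod, total], "result")  # assumed final step

print(evaluate(prod), evaluate(total), evaluate(result))
```

Note that defining the nodes and running them are two separate steps, which is loosely the role a session plays in TensorFlow.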

The question now is: what are the advantages of representing operations as a directed graph? The main advantages are **parallelism** and what is called **dependency-driven scheduling**.

For example, consider again the following code:
![alt text](https://cdn-images-1.medium.com/max/800/1*6E3sfit6DCeJ9mOz17g4bA.png)

At the most fundamental level, most computer programs are composed of two things: primitive operations and an order in which these operations are executed, often sequentially, line by line. This means we would first multiply a and b, and only once this expression has been evaluated would we compute their sum. Computational graphs, on the other hand, specify only the dependencies between the operations.

If we look at our computational graph, we see that we could execute the multiplication and the addition in parallel, because these two operations do not depend on each other. So we can use the topology of the graph to drive the scheduling of operations and execute them in the most efficient manner, e.g. by using multiple GPUs on a single machine or even by distributing the execution across multiple machines.

Another key advantage is portability. The graph is a language-independent representation of our code. So we can build the graph in Python, save the model (TensorFlow uses protocol buffers), and restore the model in a different language, say C++, if you want to go really fast.
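
Dependency-driven scheduling can be sketched in a few lines of plain Python. The function `schedule_levels` and the node names are hypothetical, and the `result` node is an assumption added for illustration. Nodes that end up in the same level do not depend on each other and could therefore run in parallel.

```python
# Dependencies of each node: node -> list of nodes it needs as input.
# The "result" node is an assumption added for illustration.
deps = {
    "a": [],
    "b": [],
    "prod": ["a", "b"],
    "total": ["a", "b"],
    "result": ["prod", "total"],
}


def schedule_levels(deps):
    """Group nodes into levels; nodes within one level have no mutual dependencies."""
    done = set()
    levels = []
    remaining = dict(deps)
    while remaining:
        # A node is ready once all of its inputs have been computed.
        ready = sorted(n for n, d in remaining.items() if all(x in done for x in d))
        if not ready:
            raise ValueError("cycle detected: not a valid computational graph")
        levels.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return levels


levels = schedule_levels(deps)
print(levels)  # [['a', 'b'], ['prod', 'total'], ['result']]
```

The multiplication and the addition land in the same level, which is exactly the parallelism described above.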
 
 

--------------------------------
# References:

https://medium.com/@d3lm/understand-tensorflow-by-mimicking-its-api-from-scratch-faa55787170d

https://www.tensorflow.org/guide/extend/architecture

https://www.tensorflow.org/guide/low_level_intro

  
 






# placeholder: tensors that are fed in externally, for example input tensors and target output tensors
# variables: tensors that represent the parameters of the network/graph, i.e. the NN weights

In [6]:
# Cf. lecture slides, session 5 (CNN)
# Note: this cell uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session)

import tensorflow as tf
import numpy as np

# Parameters
learning_rate = 0.001
training_iters = 2000
batch_size = 128

# Network parameters (just one layer this time)
num_inputs = 3
num_outputs = 4
num_samples = 10

# training data
x_gr = np.random.rand(num_samples,num_inputs)
y_gr = np.random.rand(num_samples,num_outputs)

# tf Graph input
# dtype is float32; shape is [number of samples, number of inputs], where None leaves the number of samples open
x = tf.placeholder(tf.float32, [None, num_inputs])
y = tf.placeholder(tf.float32, [None, num_outputs])

# weights are variables, not placeholders
# tf.Variable means the tensor can be optimized; it is initialized with random numbers here.
# The first dimension must equal the number of inputs, the second the number of outputs.
w_1 = tf.Variable(tf.random_normal([num_inputs, num_outputs]))  # linear regression

# a bias variable would be created the same way; it is omitted here

# model
y_p = tf.matmul(x, w_1)

# cost
cost = tf.reduce_mean(tf.pow(y - y_p, 2))
# TensorFlow offers reduce_mean and reduce_sum; reduce_mean averages the squared error over all samples and outputs
# the objective is to minimize this cost, which is done with a gradient-based optimizer


# optimization
# Adam is a gradient-based optimizer; to maximize an objective instead, minimize its negative
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# once these five steps are done, the graph of the network is configured
# -> you always have to define these (inputs, outputs, dimensions, optimization, ...)

# initializing the graph and the weights
init = tf.global_variables_initializer()

# launch the graph
with tf.Session() as sess:
    sess.run(init)
    
    for i in range(100):

        # run one optimization step: the first argument is the op to run (here the optimizer)
        # the keys of feed_dict are the placeholders defined above; both must be fed
        sess.run(optimizer, feed_dict={x: x_gr, y: y_gr})

        pr_cost = sess.run(cost, feed_dict={x: x_gr, y: y_gr})

        print('iter: ', i, 'cost: ', pr_cost)
      
    y_p_p = sess.run(y_p, feed_dict={x: x_gr, y: y_gr})
    
    print('predicted ', y_p_p)
    print('real', y_gr)

iter:  0 cost:  3.1211493
iter:  1 cost:  3.116713
iter:  2 cost:  3.1122823
iter:  3 cost:  3.1078563
iter:  4 cost:  3.1034353
iter:  5 cost:  3.0990195
iter:  6 cost:  3.094609
iter:  7 cost:  3.090204
iter:  8 cost:  3.0858042
iter:  9 cost:  3.0814102
iter:  10 cost:  3.0770218
iter:  11 cost:  3.0726383
iter:  12 cost:  3.0682614
iter:  13 cost:  3.0638897
iter:  14 cost:  3.059524
iter:  15 cost:  3.055164
iter:  16 cost:  3.0508106
iter:  17 cost:  3.0464625
iter:  18 cost:  3.0421205
iter:  19 cost:  3.037785
iter:  20 cost:  3.0334554
iter:  21 cost:  3.0291321
iter:  22 cost:  3.024815
iter:  23 cost:  3.0205038
iter:  24 cost:  3.0161994
iter:  25 cost:  3.0119014
iter:  26 cost:  3.0076096
iter:  27 cost:  3.003324
iter:  28 cost:  2.999045
iter:  29 cost:  2.9947724
iter:  30 cost:  2.9905066
iter:  31 cost:  2.986247
iter:  32 cost:  2.9819942
iter:  33 cost:  2.9777477
iter:  34 cost:  2.9735081
iter:  35 cost:  2.969275
iter:  36 cost:  2.9650486
iter:  37 cost:  2.960
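
To see what the optimizer does under the hood, the same model can be sketched in NumPy alone. This is not the TensorFlow cell above: plain gradient descent is swapped in for Adam, and the learning rate of 0.1 and the fixed random seed are assumptions chosen for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumption) for reproducibility
num_inputs, num_outputs, num_samples = 3, 4, 10

# random training data, as in the TensorFlow cell
x_gr = rng.random((num_samples, num_inputs))
y_gr = rng.random((num_samples, num_outputs))
w = rng.normal(size=(num_inputs, num_outputs))


def mse(w):
    """Mean squared error over all samples and outputs."""
    return np.mean((y_gr - x_gr @ w) ** 2)


learning_rate = 0.1  # assumed value for plain gradient descent
costs = []
for i in range(100):
    err = x_gr @ w - y_gr                 # shape (num_samples, num_outputs)
    grad = 2.0 * x_gr.T @ err / err.size  # derivative of the mean squared error w.r.t. w
    w -= learning_rate * grad             # gradient descent step
    costs.append(mse(w))

print('first cost:', costs[0], 'last cost:', costs[-1])
```

The cost should fall from iteration to iteration, mirroring the trend in the output above.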





Before we jump into TensorFlow, we will implement our first neural network model using the Python NumPy package. NumPy is the fundamental package for scientific computing with Python, covering areas such as:

1. Linear Algebra
2. Statistics
3. Calculus

## A brief intro to Numpy operations:

1. Creating a Vector:
Here we use Numpy to create a 1-D Array which we then call a vector.





In [0]:
#Load Library
import numpy as np

#Create a vector as a Row
vector_row = np.array([1,2,3])

#Create vector as a Column
vector_column = np.array([[1],[2],[3]])

2. Creating a Matrix
We create a 2-D array in NumPy and call it a matrix. It contains 2 rows and 3 columns.

In [0]:
#Load Library
import numpy as np

#Create a Matrix
matrix = np.array([[1,2,3],[4,5,6]])
print(matrix)

3. Selecting Elements


In [0]:
#Load Library
import numpy as np

#Create a vector as a Row
vector_row = np.array([ 1,2,3,4,5,6 ])

#Create a Matrix
matrix = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(matrix)

#Select 3rd element of Vector
print(vector_row[2])

#Select 2nd row 2nd column
print(matrix[1,1])
#Select all elements of a vector
print(vector_row[:])
#Select everything up to and including the 3rd element
print(vector_row[:3])
#Select everything after the 3rd element
print(vector_row[3:])
#Select the last element
print(vector_row[-1])
#Select the first 2 rows and all the columns of the matrix
print(matrix[:2,:])
#Select all rows and the 2nd column of the matrix
print(matrix[:,1:2])
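
One detail worth checking is the difference between indexing a column with an integer and with a slice. The short sketch below (variable names `col_1d` and `col_2d` are chosen here for illustration) shows that the slice keeps the result two-dimensional.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_1d = matrix[:, 1]    # integer index: the column axis is dropped
col_2d = matrix[:, 1:2]  # slice: the column axis is kept

print(col_1d.shape)  # (3,)
print(col_2d.shape)  # (3, 1)
```

This matters later when shapes have to line up in matrix products.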


4. Describing a Matrix

In [0]:
import numpy as np


#Create a Matrix
matrix = np.array([[1,2,3],[4,5,6],[7,8,9]])
#View the Number of Rows and Columns
print(matrix.shape)
#View the number of elements (rows*columns)
print(matrix.size)
#View the number of Dimensions(2 in this case)
print(matrix.ndim)

5. Finding the max and min values

In [0]:
#Load Library
import numpy as np

#Create a Matrix
matrix = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(matrix)
#Return the max element
print(np.max(matrix))
#Return the min element
print(np.min(matrix))
#To find the max element in each column
print(np.max(matrix,axis=0))
#To find the max element in each row
print(np.max(matrix,axis=1))

6. Reshaping Arrays


In [0]:
#Load Library
import numpy as np

#Create a Matrix
matrix = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(matrix)
#Reshape
print(matrix.reshape(9,1))
#Here -1 says as many columns as needed and 1 row
print(matrix.reshape(1,-1))
#If we provide only 1 value Reshape would return a 1-d array of that length
print(matrix.reshape(9))
#We can also use the Flatten method to convert a matrix to 1-d array
print(matrix.flatten())
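
`reshape` and `flatten` look similar but differ in how they handle memory. As a small sketch: for a contiguous array like the one here, `reshape` returns a view on the same data, while `flatten` always returns a copy.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

reshaped = matrix.reshape(9)  # a view on the same data for this contiguous array
flat = matrix.flatten()       # always a new copy

print(np.shares_memory(matrix, reshaped))  # True
print(np.shares_memory(matrix, flat))      # False

reshaped[0] = 100  # also changes matrix[0, 0]
flat[1] = -1       # leaves matrix untouched
print(matrix[0, 0], matrix[0, 1])  # 100 2
```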

7. Calculating Dot Products

In [0]:
#Load Library
import numpy as np

#Create vector-1
vector_1 = np.array([ 1,2,3 ])
#Create vector-2
vector_2 = np.array([ 4,5,6 ])
#Calculate Dot Product
print(np.dot(vector_1,vector_2))
#Alternatively you can use @ to calculate dot products
print(vector_1 @ vector_2)
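
For 2-D arrays, `@` and `np.dot` both perform matrix multiplication. The sketch below uses example shapes chosen for illustration: a 4 x 3 matrix times a 3 x 10 matrix gives a 4 x 10 result.

```python
import numpy as np

W = np.arange(12).reshape(4, 3)  # 4 x 3
X = np.ones((3, 10))             # 3 x 10

Y = W @ X                        # (4 x 3) @ (3 x 10) -> 4 x 10
print(Y.shape)
print(np.array_equal(Y, np.dot(W, X)))
```

The inner dimensions (here 3) must match, otherwise NumPy raises a `ValueError`.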

## Linear regression in Numpy:

---



Write the numpy code for the following model:

$Y = WX + b$

where $X$ is a 3x10 matrix: 10 samples and 3 features

$Y$ is a 4x10 matrix: 10 samples and 4 outputs

$W$ is the weight matrix with shape 4x3: connecting 3 inputs to 4 outputs

$b$ is a vector of size 4 (one bias per output)


In [0]:
#Load Library
import numpy as np

# Generate a random X (we do not have real data)
X = np.random.rand(3,10)
display(X.shape)

# Generate a random weight matrix
W = np.random.rand(4,3)

# Generate a random bias 
b = np.random.rand(4,1)

# Calculate Y
Y= np.dot(W,X) + b
display(Y.shape)
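
The addition of `b` relies on NumPy broadcasting: the (4, 1) bias column is added to each of the 10 sample columns. The sketch below checks this against an explicit copy of the bias; the variable name `Y_explicit` and the fixed seed are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumption) for reproducibility
X = rng.random((3, 10))
W = rng.random((4, 3))
b = rng.random((4, 1))

Y = W @ X + b                             # b is broadcast across the 10 columns
Y_explicit = W @ X + np.tile(b, (1, 10))  # same computation with b copied explicitly

print(np.allclose(Y, Y_explicit))
```

A bias of shape `(4,)` instead of `(4, 1)` would not broadcast against a `(4, 10)` array, which is why the column shape is used.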


## One neuron model in numpy:

A single neuron has multiple inputs and one output. In addition to the linear regression model, we need to add non-linearity through an activation function:

$Y = f(WX + b)$

where $X$ is an n x m matrix: m samples and n features/inputs

$f(g)= \frac{1}{1+\exp(-g)}$ is the sigmoid activation function

$Y$ is a 1 x m matrix: one output per sample

$W$ is the weight matrix with shape 1 x n: connecting the n inputs to the single output

$b$ is a scalar bias





In [0]:
# load Library
import numpy as np 

f = lambda x: 1.0/(1.0 + np.exp(-x)) # activation function (use sigmoid)

# Generate a random X (we do not have real data)
X = np.random.rand(3,10)


# Generate a random weights vector
W = np.random.rand(1,3)


# Generate a random bias 
b = np.random.rand()

# Calculate Y
Y= f(np.dot(W,X) + b)
display(Y)
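
Two properties of the sigmoid are handy later on: it maps 0 to 0.5, and its derivative can be written as $f(g)(1 - f(g))$. The sketch below checks the derivative numerically with a central difference; the function names `sigmoid` and `sigmoid_grad` are chosen here for illustration.

```python
import numpy as np


def sigmoid(g):
    """Sigmoid activation, same formula as the lambda above."""
    return 1.0 / (1.0 + np.exp(-g))


def sigmoid_grad(g):
    """Analytic derivative of the sigmoid: f(g) * (1 - f(g))."""
    s = sigmoid(g)
    return s * (1.0 - s)


g = np.linspace(-5.0, 5.0, 11)
eps = 1e-6
# central finite difference as an independent check of the analytic derivative
numeric = (sigmoid(g + eps) - sigmoid(g - eps)) / (2 * eps)

print(sigmoid(0.0))  # 0.5
print(np.allclose(sigmoid_grad(g), numeric, atol=1e-6))
```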


## One hidden layer model in numpy:

The difference from the one-neuron model is simple: the first layer now has nh1 outputs (the hidden neurons) instead of one, and its output is fed into a second layer that produces the final output.

In [0]:
# load Library
import numpy as np 

#Suppose we have the following NN architecture

m = 10 # Number of samples
ni= 3 # Number of input neurons
h = 1 # Number of hidden layers
nh1 = 4 # Number of neurons in the hidden layer 1
no =1 # Number of neurons in the output layer



f = lambda x: 1.0/(1.0 + np.exp(-x)) # activation function (use sigmoid)

# Generate a random X (we do not have a real data)
X = np.random.rand(ni,m)


# Generate a random weights vector for the first hidden layer
W1 = np.random.rand(nh1,ni)


# Generate a random bias for the first hidden layer 
b1 = np.random.rand(nh1,1)

# Generate a random weights vector for the output layer
W2 = np.random.rand(no,nh1)

# Generate a random bias for the output layer 
b2 = np.random.rand(no,1)

# Calculate output of the first hidden layer
Yh1= f(np.dot(W1,X) + b1)

# Calculate output of the output layer

Y= f(np.dot(W2,Yh1) + b2)

display(Yh1.shape)
display(Y.shape)
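
The two layer computations follow the same pattern, so they can be folded into a loop. The following is a minimal sketch, assuming a hypothetical helper `forward` that takes a list of `(W, b)` pairs and applies the sigmoid after every layer; the fixed seed is an assumption as well.

```python
import numpy as np


def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))


def forward(X, layers):
    """Propagate X (features x samples) through a list of (W, b) pairs."""
    A = X
    for W, b in layers:
        A = sigmoid(W @ A + b)  # output of one layer is the input of the next
    return A


rng = np.random.default_rng(0)  # fixed seed (assumption) for reproducibility
m, ni, nh1, no = 10, 3, 4, 1
X = rng.random((ni, m))
layers = [
    (rng.random((nh1, ni)), rng.random((nh1, 1))),  # hidden layer
    (rng.random((no, nh1)), rng.random((no, 1))),   # output layer
]

Y = forward(X, layers)
print(Y.shape)  # (1, 10)
```

Adding another hidden layer then only means appending one more `(W, b)` pair with matching shapes.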

## Gradient descent in Numpy:
Let us now start training a neural network.
We start by implementing simple gradient descent for linear regression.
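
For reference, the gradients used in the next cell follow from the cost below. The factor $\frac{1}{2m}$ is an assumed convention that removes the factor 2 from the derivatives; the cell itself tracks the plain sum of squared errors for its convergence check.

$$J(W_1, b_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(b_1 + W_1 x_i - y_i\right)^2$$

$$\frac{\partial J}{\partial b_1} = \frac{1}{m}\sum_{i=1}^{m}\left(b_1 + W_1 x_i - y_i\right), \qquad \frac{\partial J}{\partial W_1} = \frac{1}{m}\sum_{i=1}^{m}\left(b_1 + W_1 x_i - y_i\right)x_i$$

Each step then moves both parameters against their gradients with learning rate $\alpha$:

$$b_1 \leftarrow b_1 - \alpha\,\frac{\partial J}{\partial b_1}, \qquad W_1 \leftarrow W_1 - \alpha\,\frac{\partial J}{\partial W_1}$$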

In [0]:
import numpy as np

converged = False
n_iter = 0  # iteration counter (avoids shadowing the built-in iter)
m = 10  # Number of samples
ni = 1  # Number of input neurons
h = 1   # Number of hidden layers
no = 1  # Number of neurons in the output layer

# Generate a random X (we do not have real data)
X = np.random.rand(m)
display(X)

# learning rate
alpha = 0.01

# early stopping criterion
ep = 0.001

# maximum number of training iterations
max_iter = 100

# Generate a random weight for the output layer
W1 = np.random.rand()

# Generate a random bias for the output layer
b1 = np.random.rand()

# Generate a random ground truth
Y_gr = np.random.rand(m)

# initial sum of squared errors
J = sum([(b1 + W1*X[i] - Y_gr[i])**2 for i in range(m)])

while not converged:
    # compute the gradient over all training samples
    grad_b = 1.0/m * sum([(b1 + W1*X[i] - Y_gr[i]) for i in range(m)])       # d/d_b1
    grad_w = 1.0/m * sum([(b1 + W1*X[i] - Y_gr[i])*X[i] for i in range(m)])  # d/d_W1

    # update both parameters, each with its own gradient
    b1 = b1 - alpha * grad_b
    W1 = W1 - alpha * grad_w

    # sum of squared errors after the update
    e = sum([(b1 + W1*X[i] - Y_gr[i])**2 for i in range(m)])

    if abs(J - e) <= ep:
        print('Converged, iterations: ', n_iter, '!!!')
        converged = True

    J = e        # update error
    n_iter += 1  # update iteration counter

    if n_iter == max_iter:
        print('Max iterations exceeded!')
        converged = True
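
The per-sample sums can also be written with NumPy array operations. The sketch below is a vectorized variant, not the cell above: the learning rate of 0.5, the fixed number of 5000 steps and the fixed seed are assumptions, and the result is compared with the closed-form least-squares fit from `np.linalg.lstsq` as a sanity check.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumption) for reproducibility
m = 10
X = rng.random(m)
Y_gr = rng.random(m)

W1, b1 = rng.random(), rng.random()
alpha = 0.5  # assumed learning rate, larger than above to converge in fewer steps
for _ in range(5000):
    residual = b1 + W1 * X - Y_gr        # prediction error per sample
    b1 -= alpha * residual.mean()        # gradient w.r.t. the bias
    W1 -= alpha * (residual * X).mean()  # gradient w.r.t. the weight

# Closed-form least-squares fit for comparison
A = np.column_stack([X, np.ones(m)])
(W_ls, b_ls), *_ = np.linalg.lstsq(A, Y_gr, rcond=None)

print(W1, b1)
print(W_ls, b_ls)
```

Both parameter pairs should agree closely, which indicates that gradient descent has reached the least-squares solution.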

## Assignment 1
### Backpropagation in Numpy:
