## Mathematics Fundamentals with PyTorch

In this notebook we explore some of the mathematical notions needed to understand deep learning, not through theorems and text but through code, using [PyTorch](http://pytorch.org/) for the demonstrations.

### Prologue

Deep learning draws on mathematical concepts from several fields: matrix operations from *Linear Algebra*, derivatives and optimization from *Real Analysis*, and density functions and distributions from *Probability Theory*. If you are interested in cutting-edge research, mastery of these is required. Here we are more interested in the hacker's way: understand the concepts well enough to engineer our projects and apply machine learning.

### PyTorch

PyTorch is a deep learning framework, like [TensorFlow](https://www.tensorflow.org) or [Caffe2](https://caffe2.ai). It provides primitives such as linear algebra operations and optimization routines for building neural networks. Networks are represented as computational graphs, in which each operation is a node of the graph: ![computationalgraph](https://colah.github.io/posts/2015-08-Backprop/img/tree-def.png)
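
As a rough illustration of a computational graph, the sketch below assumes the pictured graph is the common example e = (a + b) * (b + 1) and builds it node by node. It is written against a recent PyTorch release (an assumption; the printed outputs elsewhere in this notebook come from an older one), and the input values 2 and 1 are chosen only for illustration.

```python
import torch

# Leaf nodes of the graph; requires_grad=True asks autograd to track them.
a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

# Intermediate nodes: c = a + b and d = b + 1.
c = a + b
d = b + 1

# Output node: e = c * d.
e = c * d

# Walk the graph backwards to obtain de/da and de/db.
e.backward()

print(e.item())       # value of the output node
print(a.grad.item())  # de/da = d
print(b.grad.item())  # de/db = c + d
```

Each arithmetic operation adds a node to the graph, and `backward()` traverses those nodes in reverse order to accumulate the gradients.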



### Linear Algebra with PyTorch

PyTorch can replace NumPy, and it can run on a GPU for large performance gains. For an in-depth numerical linear algebra course, see [fast.ai numerical linear algebra](https://github.com/fastai/numerical-linear-algebra).
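
A minimal sketch of that interplay, assuming NumPy is installed and a recent PyTorch release: an array is wrapped as a tensor, moved to a GPU if one happens to be available, and multiplied there. The array contents are arbitrary illustration values.

```python
import numpy as np
import torch

# A NumPy array and a tensor that shares its memory.
arr = np.arange(6, dtype=np.float32).reshape(2, 3)
t = torch.from_numpy(arr)

# Tensor arithmetic converts back to NumPy without fuss.
doubled = (t * 2).numpy()

# Pick the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t_dev = t.to(device)

# 2x3 times 3x2 matrix product, brought back to the CPU for printing.
result = (t_dev @ t_dev.T).cpu()
print(doubled)
print(result)
```

The same code runs unchanged on machines with or without a GPU, because only the `device` value differs.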

#### Matrix, Tensors and NEO

A vector is a 1-dimensional array of numbers, a matrix is a 2-dimensional array of numbers, and a tensor, well, a tensor is an n-dimensional array of numbers. Tensors describe linear relations between geometric spaces; for our purposes, think of them as buckets of numbers.


In [1]:
# Getting Started with PyTorch

from __future__ import print_function
import torch

# Tensor construction: torch.Tensor(*sizes) allocates uninitialized memory,
# so the printed values are whatever happened to be in that memory
x = torch.Tensor(5, 3)     # a 5x3 matrix
y = torch.Tensor(3)        # a vector of length 3
z = torch.Tensor(3, 3, 3)  # a 3-d tensor

print(x,y,z)


1.00000e-02 *
 -1.3144  0.0000 -1.3144
  0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000
  0.0000  0.0000  0.0000
[torch.FloatTensor of size 5x3]
 
-1.3144e-02
 4.5560e-41
 2.7836e-37
[torch.FloatTensor of size 3]
 
(0 ,.,.) = 
 -1.3145e-02  4.5560e-41 -1.3145e-02
  4.5560e-41  1.8217e-44  4.1058e-42
  1.0511e-39  4.6608e-32  2.5353e+30

(1 ,.,.) = 
  1.6816e-44  3.6371e+27  1.1866e+27
  1.1422e-40  3.1407e-24  1.5046e-36
  9.9402e-32  9.4780e-38  9.4780e-38

(2 ,.,.) = 
  9.4780e-38  3.9729e-34  1.4708e-39
  5.0782e+31  2.2561e-43  0.0000e+00
 -1.3145e-02  4.5560e-41 -1.3145e-02
[torch.FloatTensor of size 3x3x3]
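
The values printed above look arbitrary because `torch.Tensor(sizes)` does not initialize its memory. As a small sketch (using a recent PyTorch release, which is an assumption), these constructors give defined contents, and `dim()` reports how many dimensions each object has:

```python
import torch

v = torch.zeros(3)        # vector: 1 dimension, all zeros
m = torch.ones(5, 3)      # matrix: 2 dimensions, all ones
t = torch.rand(3, 3, 3)   # 3-d tensor, uniform samples in [0, 1)

# Print the number of dimensions and the shape of each object.
for name, x in [("v", v), ("m", m), ("t", t)]:
    print(name, x.dim(), tuple(x.shape))
```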



In [2]:
# Numerical operations can be run on tensors (here, element-wise multiplication)
a = torch.Tensor(3,3,3)
print(a*z)


(0 ,.,.) = 
1.00000e-04 *
   1.7278  0.0000  1.7278
   0.0000  0.0000  0.0000
   0.0000  0.0000  0.0000

(1 ,.,.) = 
1.00000e-04 *
   0.0000  0.0000  0.0000
   0.0000  0.0000  0.0000
   0.0000  0.0000  0.0000

(2 ,.,.) = 
1.00000e-04 *
   0.0000  0.0000  0.0000
   0.0000  0.0000  0.0000
  -0.0000  0.0000 -0.0000
[torch.FloatTensor of size 3x3x3]



Machine learning methods such as topic modeling and recommender systems make extensive use of matrix decompositions, for example SVD or non-negative matrix factorization. PCA, for example, can be used for background noise removal or for feature reduction for visualization.



In [9]:
a = torch.rand(3,3)
a.svd()

(
 -0.3976  0.8140  0.4234
 -0.7138  0.0156 -0.7002
 -0.5766 -0.5807  0.5748
 [torch.FloatTensor of size 3x3], 
  1.7103
  0.6783
  0.3554
 [torch.FloatTensor of size 3], 
 -0.6871 -0.5968 -0.4144
 -0.6337  0.7713 -0.0600
 -0.3555 -0.2214  0.9081
 [torch.FloatTensor of size 3x3])
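
To see what the three factors mean, the sketch below rebuilds the matrix from them and then keeps only the largest singular value, which is the idea behind SVD-based dimensionality reduction. It uses `torch.linalg.svd` from recent PyTorch releases (an assumption; the cell above uses the older `a.svd()` method, which returns V rather than its transpose). The seed is arbitrary.

```python
import torch

torch.manual_seed(0)   # fixed seed so reruns use the same matrix
a = torch.rand(3, 3)

# a = U @ diag(S) @ Vh, with the singular values in S sorted in descending order.
U, S, Vh = torch.linalg.svd(a)
reconstructed = U @ torch.diag(S) @ Vh

# Keep only the largest singular value: a rank-1 approximation of a.
rank1 = S[0] * torch.outer(U[:, 0], Vh[0, :])

# The approximation error (Frobenius norm) is determined by the discarded singular values.
error = torch.linalg.norm(a - rank1)
expected_error = torch.sqrt(S[1] ** 2 + S[2] ** 2)

print(torch.allclose(a, reconstructed, atol=1e-5))
print(error.item(), expected_error.item())
```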

In [12]:
# logistic regression: sigmoid(W*X + b)
# note: `*` below is an element-wise product; torch.mm(W, X) would give the matrix product
b = torch.rand(3)
W = torch.rand(3,3)
X = torch.rand(3,3)

L = torch.sigmoid(W*X+b)
print(L)



 0.5644  0.7086  0.6762
 0.5857  0.5998  0.6093
 0.6172  0.5999  0.6642
[torch.FloatTensor of size 3x3]
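
For comparison, here is a sketch of the more common form of logistic regression, where each sample's features are combined with a weight vector through a matrix product. The shapes (4 samples, 3 features) and the random values are hypothetical and chosen only for illustration.

```python
import torch

torch.manual_seed(0)
X = torch.rand(4, 3)   # 4 samples with 3 features each (hypothetical shapes)
w = torch.rand(3)      # one weight per feature
b = torch.rand(1)      # scalar bias

# X @ w gives one score per sample; sigmoid maps each score into (0, 1).
scores = X @ w + b
probs = torch.sigmoid(scores)

print(probs.shape)
print(probs)
```

Each entry of `probs` can be read as the predicted probability of the positive class for one sample.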



### Autograd

Autograd is a package that provides automatic differentiation: given a function f(x), it can compute the derivative df/dx.

This can be done in different ways. One is [reverse-mode differentiation](https://stats.stackexchange.com/questions/224140/step-by-step-example-of-reverse-mode-automatic-differentiation). There are also methods that use dual numbers of the form z = a + b*epsilon, where epsilon is nilpotent: epsilon is not zero, but epsilon^2 = 0.




In [13]:
# reverse mode differentiation
class Const(object):
    def __init__(self,value):
        self.value = value
    def evaluate(self):
        return self.value
    def backpropagate(self,gradient):
        pass
    def __str__(self):
        return str(self.value)
    
class Var(object):
    def __init__(self,init_value,name):
        self.value = init_value
        self.name = name
        self.gradient = 0
    def evaluate(self):
        return self.value
    def backpropagate(self,gradient):
        self.gradient += gradient
    def __str__(self):
        return self.name
class BinaryOperator(object):
    def __init__(self,a,b):
        self.a = a
        self.b = b
class Add(BinaryOperator):
    def evaluate(self):
        self.value = self.a.evaluate() + self.b.evaluate()
        return self.value
    def backpropagate(self,gradient):
        self.a.backpropagate(gradient)
        self.b.backpropagate(gradient)
    def __str__(self):
        return "{} + {}".format(self.a,self.b)
    
class Mul(BinaryOperator):
    def evaluate(self):
        self.value = self.a.evaluate() * self.b.evaluate()
        return self.value
    def backpropagate(self,gradient):
        self.a.backpropagate(gradient * self.b.value)
        self.b.backpropagate(gradient * self.a.value)
    def __str__(self):
        return "{} * {}".format(self.a,self.b)
    
x = Var(3,name='x')
y = Var(4,name='y')

f = Add(Mul(Mul(x,x),y),Add(y,Const(2))) # f(x,y) = x^2*y + y + 2

result = f.evaluate()
# backpropagate takes the gradient of the output with respect to the current node;
# at the output node itself this is df/df, which is one
f.backpropagate(1.0)
print("f(x,y)=",f)
print("f(3,4)=",result)
print("df/dx=",x.gradient)
print("df/dy=",y.gradient)

f(x,y)= x * x * y + y + 2
f(3,4)= 42
df/dx= 24.0
df/dy= 10.0
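
The dual-number approach mentioned earlier can be sketched in a few lines as well. The `Dual` class and its `lift` helper are hypothetical names made up for this illustration, not part of any library. Only addition and multiplication are supported, which is enough for the same f(x, y) = x^2*y + y + 2.

```python
class Dual:
    """Number of the form a + b*eps, where eps**2 == 0."""

    def __init__(self, real, eps=0.0):
        self.real = real  # function value
        self.eps = eps    # derivative carried alongside the value

    @staticmethod
    def lift(other):
        # Treat plain numbers as constants (zero derivative).
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.real + other.real, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, because eps**2 == 0.
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)

    __rmul__ = __mul__


def f(x, y):
    return x * x * y + y + 2


# Seed eps = 1 on the variable we differentiate with respect to.
value = f(Dual(3.0), Dual(4.0)).real
df_dx = f(Dual(3.0, 1.0), Dual(4.0)).eps
df_dy = f(Dual(3.0), Dual(4.0, 1.0)).eps

print(value, df_dx, df_dy)  # 42.0 24.0 10.0
```

This is forward-mode differentiation: it needs one pass per input variable, whereas the reverse-mode code above gets both partial derivatives from a single backward pass.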


In [14]:
# using pytorch autograd package
from torch.autograd import Variable

In [31]:
x = Variable(torch.Tensor([-1,3,4]),requires_grad=True)
y = x**x + 3
z = y * y * 3
out = z.mean()
out.backward()
print(x.grad) # d(out)/dx; the entry for x = -1 is nan because differentiating x**x involves log(x)


Variable containing:
1.00000e+05 *
     nan
  0.0340
  3.1644
[torch.FloatTensor of size 3]
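
One way to gain confidence in these numbers is to compare autograd with a hand-derived gradient. The sketch below repeats the computation with positive inputs only, so that log(x) is defined everywhere, and uses the tensor API of recent PyTorch releases (an assumption; the cell above uses the older `Variable` wrapper).

```python
import torch

# Positive inputs only, so the derivative of x**x is defined for every entry.
x = torch.tensor([1.0, 3.0, 4.0], requires_grad=True)
y = x**x + 3
out = (y * y * 3).mean()
out.backward()

# Hand-derived gradient for 3 entries:
# d(out)/dx_i = (1/3) * 6 * y_i * x_i**x_i * (log(x_i) + 1)
with torch.no_grad():
    analytic = 2 * (x**x + 3) * x**x * (torch.log(x) + 1)

print(x.grad)
print(torch.allclose(x.grad, analytic, rtol=1e-4))
```

The entries for x = 3 and x = 4 should agree with the second and third values printed above.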



### Activation Functions

Activation functions are core building blocks of neural networks: they introduce non-linearities into the neurons.

In [32]:
x = torch.Tensor(3,3) # uninitialized 3x3 tensor, so its contents are arbitrary

In [34]:
print(x.sigmoid()) # sigmoid is defined as f(x) = 1 / (1 + e^-x)
print(x.tanh()) # tanh is the hyperbolic tangent function


 0.0000  0.5000  0.5000
 0.5000  0.5000  0.5000
 0.5000  0.5000  0.5000
[torch.FloatTensor of size 3x3]


-1.0000  0.0000  0.0000
 0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000
[torch.FloatTensor of size 3x3]
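
Because the tensor above is uninitialized, its outputs are hard to interpret. The sketch below evaluates both functions on a known grid of inputs and checks them against their definitions; the grid from -3 to 3 is an arbitrary choice for illustration.

```python
import torch

x = torch.linspace(-3, 3, steps=7)  # -3, -2, ..., 3

sig = torch.sigmoid(x)
tanh = torch.tanh(x)

# Both functions written out from their definitions.
sig_manual = 1 / (1 + torch.exp(-x))
tanh_manual = (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x))

print(sig)   # values in (0, 1), equal to 0.5 at x = 0
print(tanh)  # values in (-1, 1), equal to 0 at x = 0
print(torch.allclose(sig, sig_manual), torch.allclose(tanh, tanh_manual, atol=1e-6))
```

Sigmoid squashes its input into (0, 1) and tanh into (-1, 1), which matches the 0.5 and 0 entries printed above for inputs near zero.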

