# Deep Learning Course - Introduction

This is the introduction to the Deep Learning course. 

In this week's entry we will cover: 

 * Course Scope 
 * Prerequisites 
 * Course Structure 
 * Assignments and Assessment
 * Coding
 * Server Environment 

## Course Scope 

In this Deep Learning course we will trace through the most important concepts in Deep Learning, such as the Backpropagation Algorithm, Convolutional Neural Networks, Recurrent Neural Networks and Restricted Boltzmann Machines.

The course emphasises concepts that are commonly used in practical Deep Learning research and applications, and relies heavily on running examples. Lecture notes will therefore be provided as Jupyter notebooks that students can download, edit and run on their own machines. 

Some content commonly found in other courses will not be covered here. For example the following will not be addressed: 

 * The history of neural networks 
 * The biological inspiration of neural networks 

The interested student is directed to the many good external resources, such as Geoff Hinton's online course, for detail on these topics. 

This course is not intended to be self-contained. While detailed notes with working examples will be given from week to week as Jupyter notebooks, links to external resources, such as videos from Geoff Hinton and Andrew Ng, will also be suggested for additional coverage or for a different perspective on a given question. 

## Prerequisites 

This course is intended as an Introduction to Deep Learning for students who have already completed undergraduate or Masters level modules on:

 * Artificial Intelligence
 * Machine Learning
 
In the first few weeks the essential background topics of Linear and Logistic Regression will be re-introduced. No other background Machine Learning concepts, such as the distinctions between learning types or the benchmarking of machine learning performance, will be addressed. Students who have no background in Machine Learning should first take the Machine Learning module in the MSc programme. Students who have completed a Machine Learning module but feel rusty should consult John Kelleher's textbook or Andrew Ng's free online Introduction to Machine Learning video series. 

The module is also suitable for someone who has already taken an online module in Deep Learning or otherwise consulted books etc. While little of the material will be new to such students, the course will give you an opportunity to reflect on important topics with the support of fellow students. 

All assignments must be coded in Python with related libraries. If you have not already coded in Python, now is a good time to start. Introductory tutorials on Python and on the scikit-learn (`sklearn`) package should be consulted. In short, make sure that you can load a data set, perform k-fold cross validation of an SVM-based classifier, and then compute and present metrics such as the F1 score. 
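
As a rough self-check of the skills listed above, the following is a minimal sketch of that workflow. The choice of the built-in Iris data set, the RBF kernel, five folds and macro-averaged F1 are all assumptions made for illustration; they are not requirements of the course.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Load a small built-in data set (chosen here purely for illustration)
X, y = load_iris(return_X_y=True)

# An SVM classifier and a 5-fold stratified split of the data
clf = SVC(kernel="rbf", C=1.0)
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Macro-averaged F1 weights each of the three classes equally
scores = cross_val_score(clf, X, y, cv=folds, scoring="f1_macro")

print("F1 per fold:", scores.round(3))
print("Mean F1: %.3f" % scores.mean())
```

If you can read, modify and rerun a script like this without difficulty, you have the level of Python needed to begin the assignments.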

## Course Structure 

This is a 13 class programme with the following structure for the Spring 2018 semester. The course structure is subject to change and this content should be considered indicative. 

###### Class 1 - Introduction and Linear Regression
 * Course Scope 
 * Prerequisites 
 * Course Structure 
 * Assignments and Assessment
 * Coding 
 * Server Environment 
 * Fitting functions to data
 * Cost Functions 
 * Gradient Descent 
 * Normalization of Data
 
###### Class 2 - Logistic Regression
 * Non-linear Linear Regression
 * Regularization
 * The Logistic Function
 * The Cost Function for Logistic Units
 * Limits for Logistic Function
 * Higher Order Functions
 * Regularization for Logistic Regression
 
###### Class 3 - Neural Network Representations and Forward Propagation
 * From Linear to Non-Linear Classifiers
 * Units
 * Layers
 * Bias Units
 * Building non-linear functions
 * The Feed Forward Algorithm
 
###### Class 4 - Backpropagation
 * Overview of Backpropagation Methods
 * Deriving the Backpropagation Equations
 * The Backpropagation Algorithm
 * Visualizing Backpropagation in TensorFlow
 
###### Class 5 - Refining Backpropagation
 * Cross Entropy Loss function 
 * Hyperbolic Tangent Units
 * Rectified Linear Units
 * Softmax Layers 
 
###### Class 6 - Preventing Overfitting
 * Regularization in Neural Networks
 * Early Stopping
 * Dropout
 
###### Class 7 - Convolutional Neural Networks
 * Test
 * Convolutions 
 * Pooling Layers
 * Implementing a CNN
 * Scaling networks with a GPU

###### Class 8 - Convolutional Neural Networks pt 2
 * Practical Applications
 * Assignment Review
 * Advanced CNN Applications

###### Class 9 - Recurrent Neural Networks I
 * Basic topology
 * Motivating examples
 * Long Short Term Memory
 * Language Modelling Example
 
###### Class 10 - Recurrent Neural Networks II
 * Gated Recurrent Units
 * Dialogue Systems Example 
 
###### Class 11 - Unsupervised Training I
 * Motivation
 * Representations and Dimensionality Reduction
 * Autoencoders
 
###### Class 12 - Unsupervised Training II
 * Restricted Boltzmann Machines
 * RBM Training
 
###### Week 13 - Class Test 
 * Final Class Test 

Classes will be delivered approximately weekly on Tuesday evenings at 6pm. 

### Delivery Model

The course is a 5 ECTS course at Level 9/10. There are two contact hours per week, but students are expected to engage in a significant amount of self-study each week. The contact hours will be used for group discussion, tests, reviewing assignments, and addressing any issues with respect to the delivery of the course. With the exception of Week 1, the hours are not used to deliver material. Instead students are expected to study the content for each class in advance of that class. For example, for the class in Week 2 it is expected that students will have studied the course notes on Linear Regression, reviewed any relevant videos, and completed any relevant questions or assignments. Thus the class is delivered with a **flipped classroom** model. All content will be made available in the Deep Learning module on Webcourses. 

## Assignments and Assessment
This is a 100\% Continuous Assessment course. The assessment of the course will be broken down as follows:

 * 40\% on in-class tests
 * 60\% on Coding Challenge 

There will be two in-class tests to encourage students to continuously engage with the material and take on assignments. 

The Challenge / Project is a coding / modeling task which will allow the student to demonstrate a clear understanding of the concepts covered in the course. A dataset will be provided and students will be required to submit operational code and a short but detailed report on their model. Specific instructions on the Challenge will be provided. The Challenge will be due at the end of the semester. 

## Coding 
In this course we will use Python extensively for all examples and assignments. Rather than using vanilla Python we will, where appropriate, make use of Python packages that provide enhanced functionality for numerical computing and Deep Learning. Examples have been coded using Python 3 rather than Python 2.7. It is possible to run these examples in Python 2.7, but in most cases imports from `__future__` will be necessary. 

The following packages are some of the most frequently used in this course:
 * **numpy**
 * **scipy** - not to be confused with the SciKits such as scikit-learn, which are separate packages built on top of it
 * **matplotlib**

If you aren't familiar with either Python or these specific packages, now is the time to get familiar. 

There are many other interesting packages for numerical computing in Python. One interesting one is **Pandas**. This provides a frame based indexing mechanism similar to the one you may be used to in the R programming language. Pandas examples can in many cases be more intuitive, but there is an overhead in computing based on Pandas frames. For this reason our examples will be based on numpy. 

Course notes and all examples will be coded in the **Jupyter Notebook** environment. Notebooks will be made available on Webcourses for download. Students who do not already use Jupyter Notebook should install it. 

As indicated, the course will place emphasis on practical understanding - with attention placed on both the practical implementation and use of important model types. As such models will be explained with two types of examples: **The Hard Way** and **The Easy Way**. 

### The Hard Way

In each week most material will explain how to implement key concepts from first principles. We refer to this way of doing things as **The Hard Way**. In these examples we will make use of the **numpy** library for performing numerical operations such as matrix multiplication or transposition. However, we will in general be designing and coding important concepts such as neuron types, the backpropagation algorithm etc. with little reliance on well known libraries. The emphasis here will be on understanding how the algorithm works. In these cases many simplifying assumptions will often be made.

### The Easy Way

In each week we will also use examples to show how the newly introduced concept can be quickly implemented using well known 3rd party libraries such as **scipy**, **TensorFlow** or **PyTorch**. The emphasis here will often be on more complex examples which demonstrate the true power or limitations of the models that we are investigating.

## Server Environment 

Students are expected to have working installations of scipy and TensorFlow on their own machines for testing. Where possible you should use a machine with a GPU and test out your code on both CPU and GPU architectures. 

However, for the Assignment 1 and Assignment 2 tasks, each student will be given some time on one of our GPU nodes to run tests on their code. Code will be run using **Slurm** to avoid conflicts. Keep in mind that only one node will be shared between all students. It will therefore be necessary to test your code completely in advance, and to leave yourself sufficient time for testing, given that there may be significant demand on the machine from other students. 

Details on the server environment, including the IP address of your machine, your username and password, and instructions on how to deploy a process with **Slurm**, will be provided when the first assignment is given out. 

# Appendices 

Below you will find a couple of useful reference posts for issues such as notation and linear algebra in Python. 

## Appendix A - Notation

Given a training set we talk about:
 * $m$ = number of training examples
 * $x$'s = input variables or features
 * $y$'s = output variable or target
 * $(x,y)$ = one training example
 * $(x^{(i)},y^{(i)})$ = refers specifically to the $i^{th}$ training case
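
To connect this notation to code, here is a small sketch using a made-up training set. The values, the variable names and the extra symbol `n` (the number of features) are assumptions for illustration only. Note that the mathematical index of a training case starts at 1, while numpy indexing starts at 0.

```python
import numpy as np

# Hypothetical training set: each row of X is one example, each column a feature
X = np.array([[2104, 3],
              [1600, 3],
              [2400, 4],
              [1416, 2]])
y = np.array([400, 330, 369, 232])

m = X.shape[0]   # number of training examples
n = X.shape[1]   # number of features per example (not part of the notation above)

i = 2                           # mathematical (1-based) index of a training case
x_i, y_i = X[i - 1], y[i - 1]   # numpy indexing is 0-based

print(m, n)
print(x_i, y_i)
```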

## Appendix B - Linear Algebra in Python

In Python we can use the `numpy` library to easily define and perform operations on vectors and matrices. 

### Matrices
The term **matrix** refers to a 2D rectangular array of numbers. 

In [1]:
import numpy as np

# We can define a matrix directly from a string of numbers where rows are delimited by semi-colons
A = np.matrix('1 2 3; 4 5 6')
print(A)

# Alternatively we can define the same matrix from a series of vectors where each vector defines a row of the matrix
B = np.matrix([[1, 2, 3], [4, 5, 6]])
print(B)

# Alternatively we can use numpy's array constructor to create a 2D array, i.e., our matrix
B = np.array([[1, 2, 3], [4, 5, 6]])
print(B)

[[1 2 3]
 [4 5 6]]
[[1 2 3]
 [4 5 6]]
[[1 2 3]
 [4 5 6]]


A matrix will have a certain *dimensionality* defined in terms of the number of rows and the number of columns. We can use standard set notation to describe this dimensionality, giving the number of rows first. For example, the set of matrices of real numbers with 2 rows and 3 columns is written $\mathbb{R}^{2 \times 3}$. 

The **shape** attribute of a numpy array or matrix gives the number of rows and columns. 

In [2]:
print(A.shape)

(2, 3)


We refer to individual objects within the matrix as **elements**. We use subscript notation on the matrix name in order to refer to individual objects. In general $M_{i,j}$ will refer to the element found on the $i^{th}$ row and $j^{th}$ column of M. 

Numpy supports a wide range of methods for indexing and slicing arrays which we will not cover here. In the simple case we can however use indices to operate directly on the matrix as follows. 

In [3]:
print(A[1,1])

5


Note that indexing on the rows and columns of numpy matrices begins at 0. 
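
Although indexing and slicing are not covered in detail here, a few basic forms are worth recognising. The sketch below uses a plain `np.array` named `M` (an illustrative name) rather than `np.matrix`, because slicing an `np.matrix` always returns a 2D result.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])

first_row = M[0, :]    # all columns of row 0
last_col = M[:, -1]    # all rows of the last column
block = M[:, 0:2]      # every row, columns 0 and 1

print(first_row)
print(last_col)
print(block)

# Basic slices are views onto M, so copy first if M must stay unchanged
N = M.copy()
N[1, 1] = 50
print(M[1, 1], N[1, 1])
```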

### Vectors
A vector is a 1D array and as such can be thought of as a special case of a matrix with only one column. 

While the vector is in general a special case of a matrix, we typically use different operators to create and work with vectors. 

In [4]:
# create a numpy vector by way of a standard python list fed into the array constructor
v1 = np.array([2,3,1,0])
print(v1)

# note that this is not equivalent to attempting to create the array as a single row of a matrix
v2 = np.matrix([[2,3,1,0]])
print(v2)

# index an element in the vector
print(v1[1])

[2 3 1 0]
[[2 3 1 0]]
3
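
The difference between a 1D array and a single-row matrix is easiest to see by comparing shapes. The following sketch uses illustrative names (`v`, `row`, `col`) and plain numpy arrays.

```python
import numpy as np

v = np.array([2, 3, 1, 0])        # 1D array: a single axis of length 4
row = np.array([[2, 3, 1, 0]])    # 2D array with one row
col = v.reshape(-1, 1)            # 2D array with one column

print(v.shape)      # (4,)
print(row.shape)    # (1, 4)
print(col.shape)    # (4, 1)

# Transposing a 1D array changes nothing, whereas a 2D row becomes a column
print(v.T.shape)    # (4,)
print(row.T.shape)  # (4, 1)
```

This matters in practice because a 1D array has no notion of being a row or a column until it is reshaped into two dimensions.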


### Matrix and Vector Basic Operations

We can add and subtract matrices which are of the same dimensionality to result in a new matrix which is of that same dimensionality. 

In [5]:
C = A + B
print(C)
D = C - A
print(D)

[[ 2  4  6]
 [ 8 10 12]]
[[1 2 3]
 [4 5 6]]


We can do the same for vectors.

In [6]:
v2 = np.array([10,20,30,40])

v3 = v1 + v2
print(v3)
v3 = v2 - v1
print(v3)

[12 23 31 40]
[ 8 17 29 40]
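
To see what happens when the dimensionalities do not match, consider the sketch below. The arrays `P` and `Q` are illustrative. Note also that numpy relaxes the same-shape rule in some cases through broadcasting, for example when adding a scalar to an array.

```python
import numpy as np

P = np.array([[1, 2, 3],
              [4, 5, 6]])    # shape (2, 3)
Q = np.array([[1, 2],
              [3, 4],
              [5, 6]])       # shape (3, 2)

try:
    P + Q
    shapes_compatible = True
except ValueError as err:
    # numpy reports that the operands cannot be broadcast together
    shapes_compatible = False
    print("Cannot add:", err)

# Transposing Q gives it the same shape as P, so the sum is defined
print(P + Q.T)
```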


We can also directly apply scalar multiplication and division operations to matrices and vectors.  

In [7]:
E = A * 2
print(E)
F = A / 2
print(F)
v4 = v2 * 2
print(v4)
v5 = v2 / 2
print(v5)

[[ 2  4  6]
 [ 8 10 12]]
[[0.5 1.  1.5]
 [2.  2.5 3. ]]
[20 40 60 80]
[ 5. 10. 15. 20.]


### Inner / Scalar Product
The Inner Product or Scalar Product of two vectors is the scalar result of summing the pairwise products of elements in two vectors of equal length. Geometrically, the Scalar Product is often interpreted as a measure of similarity between two vectors: once the vectors are scaled to unit length it equals the cosine of the angle between them. This is often used, for example, to calculate document similarities. 

In Python we can calculate the scalar product of two vectors using numpy's `inner` function. 

In [8]:
v3 = np.inner(v1,v2)
print(v3)

110
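
As an illustration of the document similarity use case, the sketch below computes the cosine similarity, i.e. the inner product of two vectors after each is scaled to unit length. The helper name `cosine_similarity` and the term-count vectors are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Inner product of a and b after scaling both to unit length."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical term-count vectors for three short documents
doc1 = np.array([2, 3, 1, 0])
doc2 = np.array([4, 6, 2, 0])   # same direction as doc1, twice the length
doc3 = np.array([0, 0, 0, 5])   # shares no terms with doc1

sim_12 = cosine_similarity(doc1, doc2)
sim_13 = cosine_similarity(doc1, doc3)
print(sim_12)   # close to 1: the documents use terms in the same proportions
print(sim_13)   # 0: the documents have no terms in common
```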


Note that the `dot` function will also produce this result when applied to two vectors. However the `dot` function can also be applied to matrices and higher-order arrays. 

### Matrix-Matrix Multiplication
Given two matrices we can calculate the matrix product of these matrices so long as the number of columns in the first matrix equals the number of rows in the second. 

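For two matrices A and B with dimensions $R_{A} \times C_{A}$ and $R_{B} \times C_{B}$, where $C_{A} = R_{B}$, the dimensionality of the resultant matrix product is given by: 

\begin{equation}
(R_{A} \times C_{A}) \cdot (R_{B} \times C_{B}) \rightarrow R_{A} \times C_{B}
\end{equation}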

The operation for calculating the matrix product is straightforward: the entry for row i column j in the resultant matrix C is the dot product of the i$^{th}$ row of A and the j$^{th}$ column of B. 

![matrix matrix multiplication](figures/img792.gif)

In Python we can use the numpy function **matmul** to perform matrix-matrix multiplication. 

In [9]:
G = np.matrix('1 2; 3, 4; 5, 6')
print(G)
H = np.matmul(A,G)
print(H)

[[1 2]
 [3 4]
 [5 6]]
[[22 28]
 [49 64]]
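
In the spirit of The Hard Way, the row-by-column definition can also be written out directly. The function `matmul_by_hand` below is a hypothetical helper for illustration, assuming plain 2D numpy arrays; in practice `np.matmul` (or the equivalent `@` operator for arrays) should be preferred.

```python
import numpy as np

def matmul_by_hand(left, right):
    """Multiply two 2D arrays using the row-by-column definition."""
    rows_l, cols_l = left.shape
    rows_r, cols_r = right.shape
    if cols_l != rows_r:
        raise ValueError("columns of left must equal rows of right")
    out = np.zeros((rows_l, cols_r))
    for i in range(rows_l):
        for j in range(cols_r):
            # entry (i, j) is the dot product of row i of left and column j of right
            out[i, j] = np.dot(left[i, :], right[:, j])
    return out

A_arr = np.array([[1, 2, 3], [4, 5, 6]])     # shape (2, 3)
G_arr = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2)

H_arr = matmul_by_hand(A_arr, G_arr)
print(H_arr.shape)                           # (2, 2)
print(np.allclose(H_arr, A_arr @ G_arr))     # agrees with numpy's own product
```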


Remember that matrix-matrix multiplication is not, in general, commutative. 
\begin{equation}
  A \times B \neq B \times A
\end{equation}

In [10]:
A = np.matrix('1, 2, 3; 4, 5, 6; 7, 8, 9')
B = np.matrix('1, 1, 1; 2, 2, 2; 3, 3, 3')
print(A)
print(B)
print(np.matmul(A,B))
print(np.matmul(B,A))

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1 1 1]
 [2 2 2]
 [3 3 3]]
[[14 14 14]
 [32 32 32]
 [50 50 50]]
[[12 15 18]
 [24 30 36]
 [36 45 54]]


### Matrix Identity and Inverse 

Although matrix multiplication is not commutative in general, it is commutative when one of the matrices is the identity matrix $I$ of matching size: 

\begin{equation}
 A \times I = I \times A = A
\end{equation}

Here $I$ is the Identity Matrix, a square matrix in which all diagonal elements are 1 and all off-diagonal elements are 0. 

Numpy allows us to easily define an identity matrix of a specified size. 

In [11]:
I = np.identity(3)
print(I)
print(A)
print(np.matmul(A,I))
print(np.matmul(I,A))

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]


Just as we can define the inverse for a non-zero real number $X \in R$ as $\frac{1}{X}$, we can also define the inverse for a matrix. 

Beginning first with the case of real numbers, we note that: 

\begin{equation}
 X \times INV(X) = 1
\end{equation}

where 1 is the multiplicative identity for real numbers. 

This gives us an intuition of how the inverse is defined for matrices:  

\begin{equation}
 A \times INV(A) = I 
\end{equation}

i.e., the inverse of a matrix A should be defined such that the matrix product of A by it produces an identity matrix. 

While we can straightforwardly calculate the inverse of a non-zero real number X as $\frac{1}{X}$, the calculation of the inverse of a matrix is more complicated and involves calculations of Determinants and Cofactors of matrices, which we will not consider here. Fortunately we can calculate the inverse directly in numpy. 

In [12]:
import numpy.linalg as la
X = np.matrix('100, -200, 403; 44, -5, 607; 27, 98, -59')
B = la.inv(X)
print(B)
print(np.matmul(X,B))

# note that inversion fails or gives unreliable results when the matrix is singular or close to singular (determinant at or near 0)
print("Uncomment lines in the block to run this - but expect an error!") 
# B = la.inv(A)
# print(np.matmul(A,B))

[[ 0.00746988 -0.00349497  0.01506633]
 [-0.0023959   0.00211775  0.00542254]
 [-0.00056121  0.00191823 -0.00104746]]
[[ 1.00000000e+00  4.31512465e-17  2.29850861e-17]
 [ 4.47775497e-17  1.00000000e+00 -5.16080234e-17]
 [ 4.32596667e-17 -3.10081821e-17  1.00000000e+00]]
Uncomment lines in the block to run this - but expect an error!
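
One defensive pattern, sketched below under stated assumptions, is to check the rank of a matrix before inverting it and to fall back to the Moore-Penrose pseudo-inverse when the matrix is singular. The helper `invert_or_pseudo_invert` and the names `S` and `X_arr` are hypothetical. Whether a pseudo-inverse is an acceptable substitute depends on the application.

```python
import numpy as np
import numpy.linalg as la

def invert_or_pseudo_invert(M):
    """Return (matrix, used_pinv): the inverse of M if M has full rank,
    otherwise the Moore-Penrose pseudo-inverse."""
    if la.matrix_rank(M) == M.shape[0]:
        return la.inv(M), False
    return la.pinv(M), True

S = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])   # row 3 = 2 * row 2 - row 1, so S is singular

print(la.matrix_rank(S))       # 2, i.e. fewer than the 3 rows
print(la.cond(S))              # a very large condition number

S_inv, used_pinv = invert_or_pseudo_invert(S)
print(used_pinv)

X_arr = np.array([[100., -200., 403.],
                  [44., -5., 607.],
                  [27., 98., -59.]])
X_inv, used_pinv_x = invert_or_pseudo_invert(X_arr)
print(np.allclose(X_arr @ X_inv, np.eye(3)))
```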


### Matrix Transpose
We can also define the transpose of a matrix $A$, written $A^{T}$, as the matrix obtained by interchanging the rows and columns of $A$. 

\begin{equation}
   (A^{T})_{ij} = A_{ji}
\end{equation}

In [13]:
C = A.T
print(A)
print(C)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1 4 7]
 [2 5 8]
 [3 6 9]]
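
A few standard properties of the transpose can be checked numerically. The arrays `P` and `Q` below are illustrative.

```python
import numpy as np

P = np.array([[1, 2, 3],
              [4, 5, 6]])    # shape (2, 3)
Q = np.array([[1, 2],
              [3, 4],
              [5, 6]])       # shape (3, 2)

# Rows and columns swap, so a (2, 3) array becomes (3, 2)
print(P.T.shape)

# Element (i, j) of the transpose is element (j, i) of the original
print(P.T[2, 0] == P[0, 2])

# Transposing twice restores the original matrix
print(np.array_equal(P.T.T, P))

# The transpose of a product is the product of the transposes in reverse order
print(np.array_equal((P @ Q).T, Q.T @ P.T))
```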
