
# Fundamentals of Deep Learning 

In [0]:
import keras as kf
import tensorflow as tf
print(tf.__version__)
print(kf.__version__)

Using TensorFlow backend.


1.14.0
2.2.5


## two essential characteristics of how deep learning learns from data (Chollet: Deep Learning with Python, p. 18)

- incremental, layer-by-layer way in which increasingly complex representations are developed
- these intermediate incremental representations are learned jointly

## three important properties of deep learning (Chollet: Deep Learning with Python, p. 23)

- Simplicity: Deep learning removes the need for feature engineering, replacing
complex, brittle, engineering-heavy pipelines with simple, end-to-end trainable
models that are typically built using only five or six different tensor operations.

- Scalability: Deep learning is highly amenable to parallelization on GPUs or
TPUs, so it can take full advantage of Moore’s law. In addition, deep-learning
models are trained by iterating over small batches of data, allowing them to be
trained on datasets of arbitrary size. 

- Versatility and reusability: Unlike many prior machine-learning approaches,
deep-learning models can be trained on additional data without restarting from
scratch, making them viable for continuous online learning—an important
property for very large production models. Furthermore, trained deep-learning
models are repurposable and thus reusable (transfer learning).

## What is a tensor?

A tensor is a container for data, almost always numerical data. So, it’s a
container for numbers. You may already be familiar with matrices, which are 2D tensors: tensors are a generalization of matrices to an arbitrary number of dimensions
(note that in the context of tensors, a dimension is often called an axis).

A tensor is defined by three key attributes:

- Number of axes (rank): For instance, a 3D tensor has three axes, and a matrix has
two axes. This is also called the tensor’s ndim in Python libraries such as NumPy.

- Shape: This is a tuple of integers that describes how many dimensions the tensor has along each axis. For instance, a matrix with 3 rows and 5 columns has shape
(3, 5), and a 3D tensor that stacks three such matrices has shape (3, 3, 5). A vector has a shape
with a single element, such as (5,), whereas a scalar has an empty shape, ().

- Data type (usually called dtype in Python libraries): This is the type of the data
contained in the tensor; for instance, a tensor’s type could be float32, uint8,
float64, and so on. On rare occasions, you may see a char tensor. Note that
string tensors don’t exist in NumPy (or in most other libraries), because tensors
live in preallocated, contiguous memory segments, and strings, being variable
length, would preclude the use of this implementation.
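The three attributes above can be inspected directly on NumPy arrays. This is a small sketch; the variable names and example values are made up for illustration.

```python
import numpy as np

# Example tensors of increasing rank (values are arbitrary)
scalar = np.array(12)                          # 0 axes
vector = np.array([1, 2, 3, 4, 5])             # 1 axis
matrix = np.zeros((3, 5), dtype="float32")     # 2 axes
cube = np.zeros((3, 3, 5), dtype="uint8")      # 3 axes

# Print rank (ndim), shape and dtype of each tensor
for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    print(name, "| axes:", t.ndim, "| shape:", t.shape, "| dtype:", t.dtype)
```

The integer dtype of `vector` depends on the platform, which is why it is not spelled out here.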

In [0]:
from keras.datasets import mnist

In [0]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Downloading data from https://s3.amazonaws.com/img-datasets/mnist.npz


In [0]:
print('axes: ', train_images.ndim)
# 60000 images
print('shape: ', train_images.shape)
print('dtype: ', train_images.dtype)
print('type: ', type(train_images))

axes:  3
shape:  (60000, 28, 28)
dtype:  uint8
type:  <class 'numpy.ndarray'>


Axis 0 is the samples axis (also called the batch axis); here it indexes the 60000 images.

## Preprocessing

- scaling of input data into the [0, 1] interval
- reshaping of data
- categorical (one-hot) encoding of the labels
- vectorizing of data, since only tensors are allowed as input
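The preprocessing steps listed above could look roughly like this in plain NumPy. To keep the sketch self-contained it uses a small random stand-in with the same layout as the MNIST arrays (uint8 images of 28x28 pixels, integer labels 0 to 9) instead of the real dataset; the names `images`, `labels`, `x` and `y` are placeholders.

```python
import numpy as np

# Stand-in data with the same layout as MNIST (assumption: 8 fake samples)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(8, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=8)

# Reshaping: flatten each 28x28 image into a vector of 784 features
x = images.reshape((images.shape[0], 28 * 28))

# Scaling: cast to float32 and map pixel values from [0, 255] to [0, 1]
x = x.astype("float32") / 255.0

# One-hot encoding: one row per sample, a single 1 in the label's column
num_classes = 10
y = np.zeros((labels.shape[0], num_classes), dtype="float32")
y[np.arange(labels.shape[0]), labels] = 1.0

print(x.shape, x.dtype)
print(y.shape)
```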

## Layers

- densely connected (fully connected) layers: simple vector data, stored in 2D tensors of shape (samples, features); example: sentiment analysis

- recurrent layers: sequence data, stored in 3D tensors of shape (samples, timesteps, features)

- 2D convolution layers (Conv2D): image data, stored in 4D tensors, e.g. of shape (samples, height, width, channels) in the channels-last format

- information bottleneck: a layer with too few neurons can drop information that is relevant for the task, and later layers cannot recover it -> increase the number of neurons, for example in a multiclass classification problem
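To make the tensor shapes concrete, here is a NumPy sketch of the input layouts named above, plus a hand-written forward pass of one densely connected layer. It assumes the usual definition `output = relu(dot(input, W) + b)`; the sizes (4 samples, 16 features, 8 units) and all variable names are arbitrary choices for illustration, not Keras code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input layouts for the three layer types (all sizes are made up)
vector_batch = rng.normal(size=(4, 16))        # (samples, features)
sequence_batch = rng.normal(size=(4, 10, 16))  # (samples, timesteps, features)
image_batch = rng.normal(size=(4, 28, 28, 1))  # (samples, height, width, channels)

# One dense layer with 8 units: a weight matrix and a bias vector
W = rng.normal(size=(16, 8))
b = np.zeros(8)

# Forward pass: affine transformation followed by relu
dense_out = np.maximum(0.0, vector_batch @ W + b)

print(vector_batch.ndim, sequence_batch.ndim, image_batch.ndim)
print(dense_out.shape)
```

The output has one row per sample and one column per unit, so a layer with very few units squeezes every sample into very few numbers, which is the bottleneck effect described above.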

## Loss functions

- binary crossentropy: two-class classification problem

- categorical crossentropy: many-class classification problem

- mean squared error: regression problem

- connectionist temporal classification (CTC): sequence-learning problem
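The first three losses can be written out in a few lines of NumPy. This is a sketch of the textbook formulas, not the Keras implementations; the function names mirror the loss names above, and the `eps` clipping is an assumption added to avoid `log(0)`.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true in {0, 1}, y_pred is the predicted probability of class 1
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(losses))

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true is one-hot, y_pred holds one probability per class (rows sum to 1)
    y_pred = np.clip(y_pred, eps, 1.0)
    losses = -np.sum(y_true * np.log(y_pred), axis=1)
    return float(np.mean(losses))

def mean_squared_error(y_true, y_pred):
    # Average squared difference between targets and predictions
    return float(np.mean((y_true - y_pred) ** 2))

bce = binary_crossentropy(np.array([1.0, 0.0]), np.array([0.5, 0.5]))
cce = categorical_crossentropy(np.array([[0.0, 1.0, 0.0]]),
                               np.array([[0.25, 0.5, 0.25]]))
mse = mean_squared_error(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0]))
print(bce, cce, mse)
```

CTC is left out because it needs an alignment procedure over sequences and does not reduce to a one-line formula.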

## Activation functions

- activation functions introduce non-linearity; without them a stack of layers can only compute a linear transformation of its input, so non-linearity is what gives access to a much richer hypothesis space that benefits from deep representations

- relu is a good default choice
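A quick NumPy check of why the non-linearity matters: two stacked linear layers without an activation are equivalent to a single linear layer, while putting relu in between breaks that equivalence. The shapes and names are arbitrary, and biases are omitted to keep the sketch short.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))    # 5 samples, 4 features
W1 = rng.normal(size=(4, 6))   # weights of the first layer
W2 = rng.normal(size=(6, 3))   # weights of the second layer

# Two linear layers in a row ...
two_linear = (x @ W1) @ W2
# ... are the same as one linear layer with weights W1 @ W2
one_linear = x @ (W1 @ W2)
collapses = np.allclose(two_linear, one_linear)

def relu(z):
    # Element-wise max(0, z)
    return np.maximum(0.0, z)

# With relu between the layers the result is no longer that linear map
with_relu = relu(x @ W1) @ W2
differs = not np.allclose(with_relu, one_linear)

print("linear stack collapses:", collapses)
print("relu stack differs:", differs)
```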

## Optimizers

- rmsprop is a good default choice
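For intuition, the commonly described RMSprop update divides each gradient by a running average of its recent magnitude. The sketch below applies that textbook rule to the toy function f(w) = w²; it is not the Keras implementation, and the helper `rmsprop_step` and its hyperparameter values are assumptions chosen for the example.

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=0.1, rho=0.9, eps=1e-7):
    # Running average of squared gradients
    cache = rho * cache + (1.0 - rho) * grad ** 2
    # Scale the step by the root of that average
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Minimize f(w) = w**2, whose gradient is 2 * w
w, cache = 5.0, 0.0
for _ in range(200):
    grad = 2.0 * w
    w, cache = rmsprop_step(w, grad, cache)

print(w)
```

Because the step is normalized by the gradient magnitude, the step size is governed mainly by `lr` rather than by how steep the loss surface happens to be.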

## binary classification

- network should end with a Dense layer with one unit and sigmoid activation function

- loss function: binary_crossentropy
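What the single sigmoid unit does can be sketched in NumPy: it squashes the raw output (often called a logit) into a probability between 0 and 1, which is then thresholded to get a class. The example values and the 0.5 threshold are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Raw outputs of the final one-unit layer for three samples (made up)
logits = np.array([-2.0, 0.0, 3.0])

# Probability of the positive class for each sample
probs = sigmoid(logits)

# Predict class 1 when the probability exceeds 0.5
predicted = (probs > 0.5).astype(int)

print(probs)
print(predicted)
```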

## classification problems

- single-label, multiclass classification: many classes, but each data point is assigned to exactly one category

- multi-label, multiclass classification: each data point can belong to several classes at once
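The difference shows up in how the targets are encoded. In this NumPy sketch (class count and labels are made up), single-label targets become one-hot rows with exactly one 1, while multi-label targets become multi-hot rows with a 1 for every class the sample belongs to.

```python
import numpy as np

num_classes = 4

# Single-label: one class index per sample -> one-hot rows
single_label = np.array([2, 0, 3])
one_hot = np.eye(num_classes, dtype="float32")[single_label]

# Multi-label: a list of class indices per sample -> multi-hot rows
multi_label = [[0, 2], [1], [0, 1, 3]]
multi_hot = np.zeros((len(multi_label), num_classes), dtype="float32")
for row, classes in enumerate(multi_label):
    multi_hot[row, classes] = 1.0

print(one_hot)
print(multi_hot)
```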

## multiclass classification

- network should end with a Dense layer with one unit per class and a softmax activation function

- intermediate layers should not have too few neurons (see information bottleneck)

- loss function: categorical_crossentropy
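A NumPy sketch of what softmax does to the final layer's raw outputs: it turns one score per class into a probability distribution over the classes, and the predicted class is the one with the highest probability. The `softmax` helper and the example scores are illustrative, not Keras code.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability (result is unchanged)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Raw scores for two samples and three classes (made up)
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])

probs = softmax(logits)
predicted_class = probs.argmax(axis=-1)

print(probs)
print(probs.sum(axis=-1))
print(predicted_class)
```

Each row of `probs` sums to 1, which is what categorical_crossentropy expects when it compares the prediction against a one-hot target.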