# Notes from 

* TensorFlow for Deep Learning by Bharath Ramsundar and Reza Bosagh Zadeh (O'Reilly). Copyright 2018 Reza Zadeh, Bharath Ramsundar, 978-1-491-98045-3

## Deep Learning Primitives

Most deep architectures are built by combining and recombining a limited set of architectural primitives.

### Fully Connected Layer

* Transforms a list of inputs into a list of outputs
* Any input value can affect any output value
* Has many learnable parameters
* Its main advantage is that it assumes no structure in the inputs
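
A minimal NumPy sketch of a fully connected layer, assuming the common form y = Wx + b with no activation function (in practice a nonlinearity such as ReLU is usually applied afterward). The function name and shapes are illustrative, not taken from the book.

```python
import numpy as np


def fully_connected(x, W, b):
    """Map an input vector x of shape (n_in,) to an output of shape (n_out,).

    Each output is a weighted sum of *all* inputs plus a bias, so any
    input value can affect any output value. W and b are the learnable
    parameters.
    """
    return W @ x + b


rng = np.random.default_rng(0)
n_in, n_out = 4, 3
W = rng.normal(size=(n_out, n_in))  # n_out * n_in = 12 learnable weights
b = np.zeros(n_out)                 # n_out = 3 learnable biases
x = rng.normal(size=n_in)

y = fully_connected(x, W, b)
print(y.shape)          # (3,)
print(W.size + b.size)  # 15
```

The parameter count grows as n_in × n_out: a layer mapping 1,000 inputs to 1,000 outputs already has 1,000 × 1,000 + 1,000 = 1,001,000 learnable parameters.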

### Convolutional Layer (images to images)

* A convolutional network assumes special spatial structure in its input
* In particular, it assumes that inputs that are close to each other spatially are semantically related
* Makes most sense for images  
* Convolutional layers transform images into images
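
A minimal single-channel sketch of the operation behind a convolutional layer, assuming "valid" padding, stride 1, and the cross-correlation convention that deep learning libraries call convolution. The 2x2 kernel is a hypothetical example; in a real layer its weights are learned, and there are usually many input and output channels.

```python
import numpy as np


def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation of a single-channel image with a kernel.

    Each output pixel depends only on a small neighborhood of input pixels,
    and the same kernel weights are reused at every position.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum over the kh x kw patch whose top-left corner is (i, j)
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


image = np.arange(25, dtype=float).reshape(5, 5)  # image[i, j] = 5 * i + j
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                  # hypothetical diagonal-difference filter
feature_map = conv2d(image, kernel)
print(feature_map.shape)  # (4, 4)
print(feature_map[0, 0])  # -6.0, since image[i, j] - image[i + 1, j + 1] = -6 everywhere
```

Because each output pixel only sees a small neighborhood, the layer builds in the assumption that nearby pixels are related, and reusing one kernel at every position keeps the parameter count small (4 weights here, regardless of image size). The output is again an image.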

### Recurrent Neural Network Layers (RNN)

* Allow neural networks to learn from sequences of inputs
* Assume that the input evolves from step to step following a defined update rule that can be learned from data
* This update rule provides a prediction of the next state in the sequence given all the states that have come before
* Very useful for tasks such as language modeling, where engineers seek to build systems that can predict the next word a user will type from what they have typed so far
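
A minimal sketch of a simple (Elman-style) recurrent layer, assuming the common update rule h_t = tanh(W_xh x_t + W_hh h_{t-1} + b). The weight names are illustrative. The same rule and the same weights are applied at every step, so each state summarizes everything seen so far.

```python
import numpy as np


def rnn_forward(xs, W_xh, W_hh, b_h, h0):
    """Run a simple RNN over a sequence xs of shape (T, n_in).

    Returns the hidden states for every step, shape (T, n_hidden).
    """
    h = h0
    states = []
    for x in xs:
        # The learned update rule: new state from current input and previous state
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
n_in, n_hidden, T = 3, 5, 4
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)
xs = rng.normal(size=(T, n_in))

states = rnn_forward(xs, W_xh, W_hh, b_h, h0=np.zeros(n_hidden))
print(states.shape)  # (4, 5)
```

To predict the next element of a sequence, an output layer (for example a fully connected layer followed by a softmax over the vocabulary) would be applied to each state h_t; that part is omitted here.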

### Long Short-Term Memory Cells

* A modification of the RNN layer that allows signals from deeper in the past to make their way to the present
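
A minimal sketch of one step of a standard LSTM cell. The gate layout (input, forget, output, and candidate blocks stacked in one weight matrix) and all names are assumptions for illustration. The key detail is the additive cell-state update c_t = f * c_{t-1} + i * g: when the forget gate is near 1 and the input gate is near 0, the cell state is carried forward almost unchanged, which is how a signal from far in the past can reach the present.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4n, n_in), U: (4n, n), b: (4n,) for hidden size n."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])          # input gate: how much new content to write
    f = sigmoid(z[n:2 * n])      # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * n:3 * n])  # output gate: how much of the cell to expose
    g = np.tanh(z[3 * n:4 * n])  # candidate content
    c = f * c_prev + i * g       # additive cell-state update
    h = o * np.tanh(c)
    return h, c


# Hand-set (not learned) weights: forget gate ~1, input gate ~0
n_in, n = 2, 3
W = np.zeros((4 * n, n_in))
U = np.zeros((4 * n, n))
b = np.zeros(4 * n)
b[0:n] = -10.0     # input gate bias: sigmoid(-10) is about 0
b[n:2 * n] = 10.0  # forget gate bias: sigmoid(10) is about 1

h, c = np.zeros(n), np.array([1.0, -2.0, 0.5])
for _ in range(50):
    h, c = lstm_step(np.ones(n_in), h, c, W, U, b)
print(np.round(c, 2))  # remains close to the initial [1.0, -2.0, 0.5]
```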


## Deep Learning Architectures

Not an exhaustive list.

### LeNet

* Arguably the first prominent "deep" convolutional architecture
* Introduced in 1988
* Performed optical character recognition (OCR) for documents
* The computational cost of LeNet was extreme for the computer hardware available at the time

### AlexNet

* Based on a modification of LeNet run on powerful graphics processing units (GPUs)
* In 2010, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was first organized
* In 2012, the AlexNet architecture entered and dominated the challenge, with error rates half those of its nearest competitors


### ResNet

* Winner of the ILSVRC 2015 challenge
* Extended up to 130 layers deep, in contrast to the 8-layer AlexNet architecture
* When networks grow this deep, they run into the vanishing gradient problem: the training signal weakens as it propagates back through many layers
* ResNet introduced residual (skip) connections to control this attenuation
* These connections allow part of the signal from deeper layers to pass through undiminished
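
A toy NumPy comparison, assuming residual layers of the form x + F(x) with F(x) = tanh(Wx) and small random weights. This is a simplification: real ResNet blocks use convolutions, batch normalization, and ReLU. It tracks the forward signal rather than gradients, but the effect is analogous: the plain stack shrinks the signal at every layer, while the identity path in the residual stack carries it through.

```python
import numpy as np


def plain_layer(x, W):
    return np.tanh(W @ x)


def residual_layer(x, W):
    # Skip connection: the input is added back to the layer's output,
    # so the identity path passes x through undiminished.
    return x + np.tanh(W @ x)


rng = np.random.default_rng(0)
n, depth = 16, 100
# Small weights: each plain layer scales the signal by roughly 0.1
Ws = [rng.normal(scale=0.1 / np.sqrt(n), size=(n, n)) for _ in range(depth)]
x0 = rng.normal(size=n)

x_plain, x_res = x0.copy(), x0.copy()
for W in Ws:
    x_plain = plain_layer(x_plain, W)
    x_res = residual_layer(x_res, W)

print(np.linalg.norm(x0))       # input norm
print(np.linalg.norm(x_plain))  # vanishingly small after 100 plain layers
print(np.linalg.norm(x_res))    # same order of magnitude as the input norm
```

The backward pass behaves similarly: the Jacobian of x + F(x) is I + ∂F/∂x, so the identity term gives gradients a path that does not shrink with depth.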

### Neural Captioning Model

* Automatically generates captions describing the contents of images
* Does so by combining convolutional networks with an LSTM layer
* The entire system is trained *end-to-end*

### Google Neural Machine Translation

* Uses the paradigm of end-to-end training
* Depends on the fundamental building block of the LSTM, which it stacks over a dozen times and trains on an extremely large dataset of translated sentences
* Achieved a breakthrough in machine translation, cutting the gap between human and machine translation quality by up to 60%

### One-Shot Models

* Perhaps the most interesting new idea in machine/deep learning
* Such systems can learn to make meaningful predictions from only a few examples

### AlphaGo

* AlphaGo, from Google DeepMind, defeated one of the world's strongest Go champions
* Some of the key ideas from AlphaGo include the use of a deep value network and deep policy network
* The value network provides an estimate of the value of a board position
* The policy network helps estimate the best move to take given a current board state
* Monte Carlo tree search (MCTS), combined with these two networks, overcame the large branching factor of Go
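
A sketch of how policy and value estimates can steer Monte Carlo tree search, assuming the PUCT-style selection rule described in the AlphaGo paper: pick the move maximizing Q(s, a) + c_puct · P(s, a) · sqrt(Σ_b N(s, b)) / (1 + N(s, a)). The function name, the constant c_puct = 1.0, and the plain-list inputs are hypothetical; a full search also expands leaves, evaluates them, and backs values up the tree.

```python
import math


def select_move(priors, visit_counts, mean_values, c_puct=1.0):
    """PUCT-style move selection at one node of the search tree.

    priors:       P(s, a), policy-network probability of each move
    visit_counts: N(s, a), how often search has explored each move
    mean_values:  Q(s, a), average evaluation (e.g. from the value network)
    """
    total_visits = sum(visit_counts)
    best_move, best_score = None, -math.inf
    for move, (p, n, q) in enumerate(zip(priors, visit_counts, mean_values)):
        # Exploration bonus: large for moves the policy favors but search has rarely tried
        u = c_puct * p * math.sqrt(total_visits) / (1 + n)
        score = q + u
        if score > best_score:
            best_move, best_score = move, score
    return best_move


# Move 0 has the highest prior but has already been explored heavily
print(select_move([0.6, 0.3, 0.1], [10, 1, 0], [0.1, 0.2, 0.0]))  # 1
```

The prior from the policy network focuses the search on a few promising moves, which is what tames Go's branching factor, while Q(s, a) increasingly dominates as visit counts grow.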

### Generative Adversarial Networks (GANs)

* Uses two competing neural networks, the generator and the adversary (also called the discriminator)
* The generator tries to produce samples that look as if they were drawn from the training distribution (for example, realistic images)
* The discriminator tries to distinguish samples produced by the generator from true data samples
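
A minimal sketch of the two competing objectives, assuming the standard binary cross-entropy formulation of GANs with the commonly used "non-saturating" generator loss; the notes do not specify a loss. d_real and d_fake are hypothetical discriminator outputs (the probability that a sample is real) on a batch of real and a batch of generated samples.

```python
import numpy as np


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Cross-entropy with real samples labeled 1 and generated samples labeled 0."""
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))


def generator_loss(d_fake, eps=1e-12):
    """Non-saturating loss: the generator is rewarded when D(G(z)) is close to 1."""
    return -np.mean(np.log(d_fake + eps))


# A discriminator that is currently winning: confident on both real and fake
d_real = np.array([0.9, 0.9])
d_fake = np.array([0.1, 0.1])
print(discriminator_loss(d_real, d_fake))  # about 0.211 (low: discriminator is doing well)
print(generator_loss(d_fake))              # about 2.303 (high: generator is being caught)
```

Training alternates between lowering discriminator_loss with respect to the discriminator's weights and lowering generator_loss with respect to the generator's weights. At the theoretical equilibrium the discriminator outputs 0.5 everywhere, giving a discriminator loss of 2 ln 2 ≈ 1.386.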

### Neural Turing Machines

* First attempt at making a deep learning architecture capable of learning arbitrary algorithms
* Adds an external memory bank to an LSTM-like system
* Allows the deep architecture to make use of scratch space to compute more sophisticated functions
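
A minimal sketch of content-based addressing, one way a Neural Turing Machine controller can read from its external memory, assuming the cosine-similarity-plus-softmax form from the original NTM paper. Names are illustrative; the full model also has location-based addressing and erase/add write operations, which are omitted here.

```python
import numpy as np


def content_read(memory, key, beta):
    """Read from memory (N slots x M width) by similarity to a key vector.

    Cosine similarity between the key and each memory row is sharpened by
    beta and normalized with a softmax into read weights; the read vector
    is the weighted sum of memory rows.
    """
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    similarity = memory @ key / norms
    scores = beta * similarity
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ memory, weights


memory = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
read, w = content_read(memory, key=np.array([0.9, 0.1, 0.0]), beta=20.0)
print(np.argmax(w))  # 0: the slot most similar to the key dominates the read
```

Because every step is differentiable, the controller can learn where to read (and, in the full model, write) by gradient descent, which is what lets the memory act as trainable scratch space.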