Implementations of various deep learning architectures + extra theoretical information
Feature Extraction (local connectivity):
- Apply a feature extraction filter == a convolution layer.
- Add a bias term to the filter output.
- Activate with a non-linear function, e.g. ReLU, for thresholding.
- Pooling: dimensionality reduction of the feature maps generated in the steps above, while still maintaining spatial invariance.
Classification:
- Apply a fully connected layer that uses the high-level features from the feature-extraction steps above to perform classification (see the code sketch after the list below).
Other potential applications; to achieve these, only the fully connected layer (step 1 under Classification) is modified:
- Segmentation.
- Object Detection.
- Regression.
- Probabilistic Control.
- Autonomous navigation.
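A minimal sketch of the pipeline above, assuming PyTorch; the channel counts, input size, and number of classes are illustrative choices, not taken from this repo:

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Convolution (+ bias) -> ReLU -> pooling for feature extraction, then a fully connected head."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1, bias=True),  # feature extraction filter + bias
            nn.ReLU(),                                              # non-linear activation (thresholding)
            nn.MaxPool2d(2),                                        # pooling: downsample the feature maps
        )
        self.classifier = nn.Linear(16 * 14 * 14, num_classes)      # fully connected layer on the high-level features

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

logits = SmallCNN()(torch.randn(8, 1, 28, 28))   # e.g. a batch of 8 single-channel 28x28 images -> (8, 10) logits
```

Swapping the classifier head for a different output layer (e.g. per-pixel outputs or box regressors) is what adapts the same feature extractor to segmentation, detection, or regression.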
Applications of generative models:
- Density Estimation.
- Sample Generation.
- Input data x is encoded (compressed / "auto-encoded") into a lower-dimensional latent space z; a decoder network then learns to reconstruct x from z as x̂ (x hat).
- Loss function == squared pixel-by-pixel difference between x and x̂ (a reconstruction loss).
- VAEs introduce a stochastic/probabilistic aspect: the encoder outputs means and standard deviations, from which the latent features z are sampled.
- VAE loss = reconstruction loss + regularisation term (enforces continuity and completeness of the latent space).
- Regularisation term D == the distance between two distributions; a prior is placed on the latent space, and KL-divergence can be used to quantify the distance between the learned latent distribution and this prior.
- To counter the challenge that stochastic sampling blocks backpropagation, the reparametrisation trick is introduced == z is computed from deterministic means and SDs plus a separate random noise term, moving the randomness away from the means and SDs (see the sketch below).
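A minimal VAE sketch, assuming PyTorch; the fully connected encoder/decoder, the 784/16 dimensions, and the use of MSE for the reconstruction term are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)   # encoder outputs means and log-variances
        self.dec = nn.Linear(z_dim, x_dim)       # decoder reconstructs x_hat from z

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        # Reparametrisation: z = mu + sigma * eps keeps mu/log_var deterministic,
        # pushing the randomness into eps so gradients can flow through them.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * log_var) * eps
        x_hat = torch.sigmoid(self.dec(z))
        return x_hat, mu, log_var

def vae_loss(x, x_hat, mu, log_var):
    recon = F.mse_loss(x_hat, x, reduction="sum")                   # pixel-wise reconstruction loss
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())  # KL divergence to a unit-Gaussian prior
    return recon + kl

x = torch.rand(8, 784)                    # e.g. a batch of flattened images in [0, 1]
x_hat, mu, log_var = VAE()(x)
loss = vae_loss(x, x_hat, mu, log_var)
```

Dropping the KL term recovers a plain autoencoder trained only on the reconstruction loss.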
- The generator starts from noise (z == a latent-space sample) and learns to imitate the input data from it.
- The discriminator tries to tell real data apart from the fakes created by the generator (it should assign fakes a low probability of being real).
- The process is iterated until the generator's samples are scored as real, i.e. the discriminator can no longer tell them apart.
- Global optimum: the generator reproduces the true data distribution; training is a minimax game between generator and discriminator (see the training-step sketch below).
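A sketch of one adversarial training step, assuming PyTorch and the standard binary cross-entropy formulation of the minimax objective; the network sizes and learning rates are placeholders:

```python
import torch
import torch.nn as nn

z_dim, x_dim = 32, 784
G = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, x_dim), nn.Tanh())        # generator: noise z -> fake sample
D = nn.Sequential(nn.Linear(x_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())  # discriminator: sample -> P(real)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real):
    batch = real.size(0)
    fake = G(torch.randn(batch, z_dim))

    # Discriminator update: push D(real) -> 1 and D(fake) -> 0.
    d_loss = bce(D(real), torch.ones(batch, 1)) + bce(D(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: fool the discriminator, i.e. push D(fake) -> 1.
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

train_step(torch.randn(16, x_dim))   # e.g. one iteration on a batch of 16 "real" samples
```

The two updates pull D(G(z)) in opposite directions, which is the minimax game described above.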
3.1 RNN
- Output at time t is a function of the input at time t and the past memory (hidden state) from time t-1.
Steps:
- Initialise the weight matrices (e.g. with small random values) and the hidden state (to zeros).
- Define the core function, the forward pass, which consists of:
- Updating the hidden state from the current input and the previous hidden state.
- Computing the output, returning both the output and the new hidden state (see the sketch after these steps).
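A from-scratch sketch of these steps in plain NumPy; the tanh activation, dimensions, and random weight initialisation are illustrative assumptions:

```python
import numpy as np

class SimpleRNN:
    def __init__(self, input_dim, hidden_dim, output_dim):
        # Initialise the weight matrices (small random values) and start the hidden state at zero.
        self.W_xh = np.random.randn(hidden_dim, input_dim) * 0.01
        self.W_hh = np.random.randn(hidden_dim, hidden_dim) * 0.01
        self.W_hy = np.random.randn(output_dim, hidden_dim) * 0.01
        self.h = np.zeros(hidden_dim)

    def forward(self, x):
        # Update the hidden state from the current input and the previous state,
        # then compute the output; both are returned.
        self.h = np.tanh(self.W_xh @ x + self.W_hh @ self.h)
        y = self.W_hy @ self.h
        return y, self.h

rnn = SimpleRNN(input_dim=8, hidden_dim=16, output_dim=4)
for x_t in np.random.randn(5, 8):       # a toy sequence of 5 time steps
    y_t, h_t = rnn.forward(x_t)
```

The same weight matrices are reused at every time step, which is how the design criterion of parameter sharing across the sequence (below) is met.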
Design Criteria:
- Handle sequences of variable length.
- Track long-term dependencies.
- Maintain info about order.
- Share parameters across the sequence.
3.2 LSTM
Concepts:
- Maintains a cell state.
- Has gates that control information flow: eliminate the irrelevant, keep the relevant.
- Backpropagation through time with largely uninterrupted gradient flow through the cell state (see the sketch below).
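A minimal sketch using PyTorch's built-in LSTM cell to illustrate the hidden state / cell state pair; the sizes and the toy sequence are placeholders:

```python
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=8, hidden_size=16)
h = torch.zeros(1, 16)   # hidden state
c = torch.zeros(1, 16)   # cell state: the pathway that keeps gradient flow largely uninterrupted in BPTT

for x_t in torch.randn(5, 1, 8):   # a toy sequence of 5 time steps, batch size 1
    # Internally the cell computes forget/input/output gates that decide what to discard
    # from and add to the cell state, and what to expose as the new hidden state.
    h, c = cell(x_t, (h, c))
```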