Commit

added notes
SpellOnYou committed Apr 23, 2020
1 parent 349bc10 commit 525869c
Showing 51 changed files with 2,211 additions and 3,566 deletions.
1,637 changes: 0 additions & 1,637 deletions _posts/collab-copy.ipynb

This file was deleted.

File renamed without changes.
File renamed without changes.
64 changes: 0 additions & 64 deletions _posts/part1/2020-04-15-lesson06-note.md

This file was deleted.

3 changes: 0 additions & 3 deletions _posts/part1/_2020-04-13-lesson05-note.md

This file was deleted.

144 changes: 144 additions & 0 deletions _posts/part1v3/2020-04-15-v3-2019-lesson06-note.md
@@ -0,0 +1,144 @@
---
layout: post
title: "fastai 2019 course-v3 Part1, lesson06"
author: dionne
categories: [ fastai-v3 ]
image: assets/images/att_00069.png
---

# Lesson 06

## Rossmann (Tabular)

- Tabular data: be careful to distinguish categorical variables from continuous variables.
- If the dependent variable's datatype is int, fastai assumes it is a classification problem, not a regression.
- Root mean square percentage error is used as the loss function.
- When you assign `y_range`, it's better to set it a little higher than the actual maximum, because the output goes through a sigmoid and can never quite reach the top of the range.
- The intermediate layers are weight matrices of size 1) 1000 and 2) 500, which means the matrix between them alone has 1000 * 500 = 500,000 parameters.
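As a rough sketch of how this looks in code (assuming fastai v1, that `data` is the Rossmann `TabularDataBunch` built earlier in the lesson, and that `train_df` holds the training dataframe; the dropout values are illustrative), the learner with [1000, 500] hidden layers and a padded `y_range` might be created like this. The `learn.model` call below then prints the resulting layers:

~~~python
from fastai.tabular import *   # fastai v1 tabular API

# pad y_range a bit above the real maximum, since the sigmoid never reaches its bounds
max_log_y = np.log(np.max(train_df['Sales']) * 1.2)
y_range = torch.tensor([0, max_log_y], device=defaults.device)

learn = tabular_learner(data, layers=[1000, 500],  # two hidden layers: 1000 then 500 units
                        ps=[0.001, 0.01],          # per-layer dropout (illustrative values)
                        emb_drop=0.04,             # embedding dropout
                        y_range=y_range,
                        metrics=exp_rmspe)         # root mean square percentage error
~~~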


~~~python
learn.model
~~~

### What is dropout and embedding dropout?

[Nitish Srivastava, Dropout: A Simple way to prevent Neural Networks from Overfitting](http://jmlr.org/papers/v15/srivastava14a.html)

- You specify dropout with a `p` value, either for a specific layer or applied to all the layers.
- In the PyTorch code: 1) a Bernoulli draw decides whether each activation is kept; 2) the kept values are rescaled depending on the noise level, so with p = 0.5 the noise multiplier becomes 2 (kept) or stays 0 (dropped).
- According to the PyTorch code, this only happens at training time; we do nothing at test time, which means you don't have to do anything special at inference time.
- <b>TODO</b>: find out on the forums what `inference time` means - related to NVIDIA, GPUs.
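A minimal sketch (not the actual PyTorch source) of that Bernoulli-then-rescale behaviour at training time:

~~~python
import torch

def dropout(x, p=0.5, training=True):
    # At test / inference time dropout is a no-op: nothing special is needed.
    if not training or p == 0.:
        return x
    # 1) a Bernoulli draw decides, per activation, whether it is kept (1) or dropped (0)
    mask = torch.bernoulli(torch.full_like(x, 1 - p))
    # 2) rescale by 1/(1-p): for p = 0.5 kept activations are doubled, dropped ones stay 0,
    #    so the expected value of each activation is unchanged
    return x * mask / (1 - p)

x = torch.randn(2, 4)
print(dropout(x, p=0.5, training=True))
print(dropout(x, p=0.5, training=False))   # identical to x
~~~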

- Embedding dropout is just dropout applied to the embedding.
- Continuous variables and embedding layers are treated differently. <b>TODO</b>: still can't understand why embedding dropout is effective, or why it is needed.
- The idea: delete some of the outputs of the embedding at random.
- And it worked well, especially on Kaggle.

### Batch Normalization

[Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/pdf/1502.03167.pdf) -> its explanation turned out to be false, according to [How Does Batch Normalization Help Optimization?](https://arxiv.org/pdf/1805.11604.pdf)

- The key is the `multiplicative` bias \$$ \gamma $$ and the `additive` bias \$$ \beta $$.
- Explanation:
- Let \$$ \hat{y} = f(w_1, w_2, w_3, \dots, x) $$ with MSE loss, and suppose `y_range` should be between 1 and 5,
- while the activation function's output ends up between -1 and +1.
- To mitigate this problem we could add another parameter, like \$$ w_n $$,
- but there are so many interactions in the process that it is easier to just re-scale the output directly, which is what \$$ \gamma $$ and \$$ \beta $$ do.
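A bare-bones sketch (a simplification, not the library code) of a batchnorm forward pass at training time, showing where the multiplicative and additive biases come in; the running statistics are covered in the next section:

~~~python
import torch

def batchnorm_1d(x, gamma, beta, eps=1e-5):
    # normalize each activation over the mini-batch...
    mean = x.mean(dim=0)
    var = x.var(dim=0, unbiased=False)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    # ...then let the network re-scale and shift the output directly:
    # gamma is the multiplicative bias, beta the additive bias (both learned)
    return gamma * x_hat + beta

x = torch.randn(32, 10)                    # a mini-batch of 32 activation vectors of size 10
gamma, beta = torch.ones(10), torch.zeros(10)
print(batchnorm_1d(x, gamma, beta).shape)  # torch.Size([32, 10])
~~~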

### Momentum parameter at BatchNorm1d
- Different from the momentum used in optimization.
- This momentum controls an exponentially weighted moving average of the mean and standard deviation.
- If it is a small number, the running mean and standard deviation vary less from mini-batch to mini-batch >> less regularization effect. (If it is a large number, they vary more from mini-batch to mini-batch >> more regularization effect.)
- TODO: not sure, but my understanding is that this is not about `how to update the parameters` but about `how much of the previous value to reflect when scaling and shifting`.
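For reference, a tiny sketch of the running-statistics update that this momentum controls, following the PyTorch convention (where `momentum` weights the new mini-batch statistic; the numbers are made up):

~~~python
# exponentially weighted moving average of the mean (same idea for the standard deviation)
momentum = 0.1                      # PyTorch's default for BatchNorm1d
running_mean = 0.0
for batch_mean in [0.9, 1.1, 1.0]:  # per-mini-batch means (made-up values)
    running_mean = (1 - momentum) * running_mean + momentum * batch_mean
    print(running_mean)
~~~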

<End of Rossmann>

Q. Is there a preference between batchnorm and the other regularizations (dropout, weight decay)?
A. No, always try them and see the results.

## lesson6-pets-more

### Data Augmentation

- Data augmentation is the last regularization technique covered.
- `get_transforms` has lots of parameters (not all of them covered yet) -> check the documentation.
- Remember you can run everything in the docs yourself, because they are built with nbdev.
- TODO: try this!!
- The essence of data augmentation is that the label must be preserved while the transformed image still makes sense.
- e.g. tilting: because it is optically sensible, you can always change the viewing angle of the data.
- Padding modes are zeros, border, and reflection; `reflection` works best most of the time, so it is the default.
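A hedged example of playing with those parameters (assuming the fastai v1 API and a `path` whose sub-folders are the class labels; the values echo the ones tried in the lesson6-pets-more notebook):

~~~python
from fastai.vision import *   # fastai v1

# more aggressive augmentation than the defaults, but the labels stay valid
tfms = get_transforms(max_rotate=20, max_zoom=1.3, max_lighting=0.4,
                      max_warp=0.4, p_affine=1., p_lighting=1.)

# padding_mode defaults to 'reflection'; 'zeros' and 'border' are the alternatives
data = (ImageList.from_folder(path)        # `path` is assumed: one folder per class
        .split_by_rand_pct(0.2)
        .label_from_folder()
        .transform(tfms, size=224, padding_mode='reflection')
        .databunch(bs=64)
        .normalize(imagenet_stats))
~~~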

### Convolutional Kernel (What is convolution?)


- We will build a heat map from scratch, i.e. a map of the parts of the image the convolutions focus on.

![setosa_visualization]()

- http://setosa.io/ev/image-kernels/
- an interactive JavaScript visualization of how convolution works
- The kernel does an element-wise multiplication with each image patch and sums the results.
- So the output is one pixel smaller at the borders -> hence padding is used, and fastai uses reflection padding, as mentioned.
- Why does this kernel (matrix) help catch horizontal edges?
- Because the kernel `(picture2)` weights each row differently, it fires where intensity changes from row to row, i.e. at horizontal edges.
- Why does this feel familiar? Because the intuition is similar to the Zeiler & Fergus `(paper)` paper.
- CNN from different viewpoints `link`
- Each output pixel is the result of a different linear equation.
- If you connect this with the usual picture of neural network nodes, you can see that specific input nodes are connected to specific output nodes.
- **Summarize**: a CNN does 1) a matmul in which some of the elements are always zero, and 2) uses the same weights for every row, which is called `weight tying, 1:18:50` `(picture)`
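A small sketch of that element-wise-multiply-and-sum, using a hand-made horizontal-edge kernel (the exact weights are illustrative, not the ones from the slides):

~~~python
import torch
import torch.nn.functional as F

# the weights differ from row to row, so the kernel fires where intensity changes
# vertically, i.e. along horizontal edges
k = torch.tensor([[ 1.,  1.,  1.],
                  [ 0.,  0.,  0.],
                  [-1., -1., -1.]])

img = torch.rand(1, 1, 28, 28)                       # a mini-batch of one grayscale image
edges = F.conv2d(img, k[None, None])                  # no padding: one pixel lost per border
print(edges.shape)                                    # torch.Size([1, 1, 26, 26])
edges_padded = F.conv2d(img, k[None, None], padding=1)
print(edges_padded.shape)                             # torch.Size([1, 1, 28, 28])
~~~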

#### Further lowdown

- Because an image generally has 3 channels, we need a rank-3 kernel.
- And **multiplying across all the channels still produces a single output pixel**. (`draw it yourself`)
- But one kernel only catches one feature, like horizontal edges, so we make more kernels, and the output becomes (h * w * n_kernels).
- That `kernel` dimension then becomes the `channel` dimension of the next layer.
- **Conv2d**: with a 3 by 3 kernel and stride-2 convolution -> output is (h/2 * w/2 * n_kernels).
- Stride 2 means skipping or jumping over every other input pixel,
- which keeps memory use from getting out of control.
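A quick check of those shapes (a sketch, not lesson code): 3 input channels, 8 kernels, stride 2.

~~~python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=8,   # 8 kernels -> 8 output channels
                 kernel_size=3, stride=2, padding=1)
x = torch.rand(1, 3, 64, 64)                      # one RGB image
print(conv(x).shape)                              # torch.Size([1, 8, 32, 32]) -> (h/2, w/2, n_kernels)
~~~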

~~~python
learn.model       # print the layers of the model
learn.summary()   # layer-by-layer output shapes and parameter counts
~~~
TODO: work through the blocks of conv kernels yourself:

- A big kernel size is usually used in the first layer (we will study this in part 2).


- A kernel that highlights the bottom-right edge (`pic / draw`)
- `torch.Tensor.expand`: memory-efficient, because we have to cover the RGB channels.
- We do not make a separate kernel per channel; we make one rank-4 kernel.
- A 4d tensor is just a stack of kernels.
- `t[None].shape` creates a new unit axis. Why do we make it? Because the model should operate on a mini-batch, not on a single image.
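A sketch of those two tricks (the kernel values are illustrative; the lesson notebook uses a similar hand-made edge kernel):

~~~python
import torch

k = torch.tensor([[ 0., -1.,  1.],
                  [-1., -1.,  1.],
                  [ 1.,  1.,  1.]])
# expand is memory-efficient: it *views* the same 3x3 weights across the 3 RGB channels,
# giving one rank-4 kernel of shape (n_kernels, channels, h, w) instead of separate copies
k4 = k.expand(1, 3, 3, 3)
print(k4.shape)             # torch.Size([1, 3, 3, 3])

t = torch.rand(3, 352, 352)      # a single image: channels x height x width
# t[None] adds a leading unit axis, because the model works on mini-batches, not single images
print(t[None].shape)             # torch.Size([1, 3, 352, 352])
~~~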

### Average pooling, feature

- Suppose our pre-trained model's conv body outputs activations of size `11 by 11 by 512` `pic 4`, and my classification task has 37 classes.
- Take each channel's face, which is 11 by 11, and `mean` it, so that we get a 512-by-1 tensor.
- Then make a 512-by-37 matrix and multiply, so that we get a 37-by-1 matrix of class scores.

- Each channel coming out of the convolution block corresponds to a feature.
- So, when we do transfer learning without unfreezing, every element of that last 512-by-1 vector should represent (or be able to catch) one feature.
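A shape-level sketch of that pooling-then-multiply step (sizes as in the lecture, weights random):

~~~python
import torch

acts = torch.rand(512, 11, 11)     # conv-body output: 512 feature maps of 11 x 11
pooled = acts.mean(dim=(1, 2))     # average each 11x11 face -> one number per feature, shape (512,)
W = torch.rand(512, 37)            # head weight matrix: 512 features -> 37 classes
scores = pooled @ W                # shape (37,): one score per class
print(pooled.shape, scores.shape)  # torch.Size([512]) torch.Size([37])
~~~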

### Heatmap, Hook

~~~
hook_output(model[0]) -> acts -> avg_acts
~~~
- If we average the activation block over the `feature` axis, the resulting 11-by-11 matrix depicts `how activated was that area?` -> that is the heatmap, `avg_acts`.

- And `acts` comes from a hook, which is a more advanced PyTorch feature.
- A hook lets you hook into the PyTorch machinery itself and run arbitrary PyTorch code there.
- Why is this cool? Normally you only get the final outputs of the forward pass, but with a hook we can interrupt the forward pass and grab intermediate results.
- In particular, we can store the output of the convolutional part of the model, which comes before the average pooling.
- Think back to how we cut the model off `after` the conv part.
- With fast.ai, the original convolutional part of the model is *the first thing in the model*, so it can be obtained with `learn.model.eval()[0]`.
- The activations are captured with `hook_output`; having hooked the output, we can pass our x minibatch through the model.
- Not directly, though: the minibatch first has to be normalized and put onto the GPU.
- The `one_item()` function does this when we have a single item. `TODO: this is an assignment` do it yourself without the one_item function.
- And `.cuda()` puts it on the GPU.
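A condensed sketch of that pipeline, assuming the fastai v1 `hook_output` from the lesson6-pets-more notebook (`learn`, `data`, and the image `x` are assumed to exist already):

~~~python
from fastai.callbacks.hooks import hook_output

m = learn.model.eval()            # [0] is the conv part, [1] the head
xb, _ = data.one_item(x)          # normalize one image and wrap it as a 1-item mini-batch
xb = xb.cuda()                    # put it on the GPU

with hook_output(m[0]) as hook_a: # hook the output of the convolutional part
    preds = m(xb)                 # forward pass; the hook stores the intermediate output

acts = hook_a.stored[0].cpu()     # roughly (512, 11, 11)
avg_acts = acts.mean(0)           # average over the feature axis -> an 11 x 11 heatmap
print(avg_acts.shape)
~~~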

- You should print out the shapes of your tensors very often, and try to think about why they are what they are.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
9 changes: 9 additions & 0 deletions _posts/part1v3/_2020-04-13-lesson05-note.md
@@ -0,0 +1,9 @@
There are two ways to compute derivatives:
1) finite differences
2) analytic

### GD, SGD, online GD

- gradient descent - treat the whole dataset as one batch.
- SGD - update per mini-batch; behaviour depends on the batch size.
- online gradient descent - treat a single example as the batch.
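A toy sketch (not from the lesson) contrasting the two ways of getting a derivative, for the loss L(w) = mean((w*x - y)^2), plus full-batch gradient descent steps:

~~~python
def loss(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def finite_diff_grad(w, xs, ys, eps=1e-6):
    # 1) finite differences: nudge w slightly and measure the change in the loss
    return (loss(w + eps, xs, ys) - loss(w - eps, xs, ys)) / (2 * eps)

def analytic_grad(w, xs, ys):
    # 2) analytic: dL/dw = mean(2 * (w*x - y) * x)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # made-up data, true w = 2
w, lr = 0.5, 0.05
for _ in range(20):
    # gradient descent uses all of xs/ys per step; SGD would use a mini-batch,
    # online gradient descent a single (x, y) pair per step
    w -= lr * analytic_grad(w, xs, ys)
print(w, finite_diff_grad(w, xs, ys), analytic_grad(w, xs, ys))
~~~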
49 changes: 49 additions & 0 deletions _posts/part1v3/_2020-04-17-lessonn07-note.ipynb
@@ -0,0 +1,49 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2019 v3 fastai course, Lesson07"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- What's the problem of 'deep' neural network?\n",
"- why 56-layer model is worse than 20-layer?</br>\n",
"\n",
"[Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
17 changes: 17 additions & 0 deletions _posts/part1v4/_2020-04-23-lesson06.md
@@ -0,0 +1,17 @@
---
layout: post
title: "fastai 2020 course-v4 Part1, lesson06"
author: dionne
categories: [ fastai-v3 ]
image: assets/images/att_00069.png
---


# Lesson 06

Q. Why use the steepest point of the learning-rate finder curve, not the minimum?
A. We do also consider the minimum, and use the minimum point divided by 10.
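A hedged sketch of the corresponding fastai2 call from the 2020 course (the exact return values of `lr_find` have changed between versions, so treat this as an assumption to verify):

~~~python
lr_min, lr_steep = learn.lr_find()   # lr_min is (roughly) the loss minimum divided by 10
print(f"minimum/10: {lr_min:.2e}, steepest point: {lr_steep:.2e}")
learn.fine_tune(2, base_lr=lr_steep) # either suggestion is a reasonable starting point
~~~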

### Unfreezing and transfer learning
- We throw away the last layer.
-