Commit

added notes
SpellOnYou committed Apr 23, 2020
1 parent 349bc10 commit 525869c
Showing 51 changed files with 2,211 additions and 3,566 deletions.
1,637 changes: 0 additions & 1,637 deletions _posts/collab-copy.ipynb

This file was deleted.

File renamed without changes.
File renamed without changes.
64 changes: 0 additions & 64 deletions _posts/part1/2020-04-15-lesson06-note.md

This file was deleted.

3 changes: 0 additions & 3 deletions _posts/part1/_2020-04-13-lesson05-note.md

This file was deleted.

144 changes: 144 additions & 0 deletions _posts/part1v3/2020-04-15-v3-2019-lesson06-note.md
@@ -0,0 +1,144 @@
---
layout: post
title: "fastai 2019 course-v3 Part1, lesson06"
author: dionne
categories: [ fastai-v3 ]
image: assets/images/att_00069.png
---

# Lesson 06

## Rossmann (Tabular)

- Tabular data: be careful to distinguish categorical variables from continuous variables.
- If the dependent variable's datatype is int, fastai assumes it is a classification problem, not a regression.
- Root mean square percentage error is used as the loss function.
- When you assign `y_range`, it's better to set it a little higher than the actual maximum, because the output goes through a sigmoid and can never quite reach the top of the range.
- The intermediate layers are weight matrices of size 1) 1000 and 2) 500, which means the matrix between them alone has 1000 * 500 = 500,000 parameters.
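As a rough sketch of how this looks in code (assuming fastai v1, that `data` is the Rossmann `TabularDataBunch` built earlier in the lesson, and that `train_df` holds the training dataframe; the dropout values are illustrative), the learner with [1000, 500] hidden layers and a padded `y_range` might be created like this. The `learn.model` call below then prints the resulting layers:

~~~python
from fastai.tabular import *   # fastai v1 tabular API

# pad y_range a bit above the real maximum, since the sigmoid never reaches its bounds
max_log_y = np.log(np.max(train_df['Sales']) * 1.2)
y_range = torch.tensor([0, max_log_y], device=defaults.device)

learn = tabular_learner(data, layers=[1000, 500],  # two hidden layers: 1000 then 500 units
                        ps=[0.001, 0.01],          # per-layer dropout (illustrative values)
                        emb_drop=0.04,             # embedding dropout
                        y_range=y_range,
                        metrics=exp_rmspe)         # root mean square percentage error
~~~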


~~~python
learn.model
~~~

### What is dropout and embedding dropout?

[Nitish Srivastava, Dropout: A Simple way to prevent Neural Networks from Overfitting](http://jmlr.org/papers/v15/srivastava14a.html)

- You specify dropout with a `p` value, either for a specific layer or applied to all the layers.
- In the PyTorch code: 1) a Bernoulli draw decides whether each activation is kept; 2) the kept values are rescaled depending on the noise level, so with p = 0.5 the noise multiplier becomes 2 (kept) or stays 0 (dropped).
- According to the PyTorch code, this only happens at training time; we do nothing at test time, which means you don't have to do anything special at inference time.
- <b>TODO</b>: find out on the forums what `inference time` means - related to NVIDIA, GPUs.
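A minimal sketch (not the actual PyTorch source) of that Bernoulli-then-rescale behaviour at training time:

~~~python
import torch

def dropout(x, p=0.5, training=True):
    # At test / inference time dropout is a no-op: nothing special is needed.
    if not training or p == 0.:
        return x
    # 1) a Bernoulli draw decides, per activation, whether it is kept (1) or dropped (0)
    mask = torch.bernoulli(torch.full_like(x, 1 - p))
    # 2) rescale by 1/(1-p): for p = 0.5 kept activations are doubled, dropped ones stay 0,
    #    so the expected value of each activation is unchanged
    return x * mask / (1 - p)

x = torch.randn(2, 4)
print(dropout(x, p=0.5, training=True))
print(dropout(x, p=0.5, training=False))   # identical to x
~~~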

- Embedding dropout is just dropout applied to the embedding.
- Continuous variables and embedding layers are treated differently. <b>TODO</b>: still can't understand why embedding dropout is effective, or why it is needed.
- The idea: delete some of the outputs of the embedding at random.
- And it worked well, especially on Kaggle.

### Batch Normalization

[Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/pdf/1502.03167.pdf) -> its explanation turned out to be false, according to [How Does Batch Normalization Help Optimization?](https://arxiv.org/pdf/1805.11604.pdf)

- The key is the `multiplicative` bias \$$ \gamma $$ and the `additive` bias \$$ \beta $$.
- Explanation:
- Let \$$ \hat{y} = f(w_1, w_2, w_3, \dots, x) $$ with MSE loss, and suppose `y_range` should be between 1 and 5,
- while the activation function's output ends up between -1 and +1.
- To mitigate this problem we could add another parameter, like \$$ w_n $$,
- but there are so many interactions in the process that it is easier to just re-scale the output directly, which is what \$$ \gamma $$ and \$$ \beta $$ do.
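A bare-bones sketch (a simplification, not the library code) of a batchnorm forward pass at training time, showing where the multiplicative and additive biases come in; the running statistics are covered in the next section:

~~~python
import torch

def batchnorm_1d(x, gamma, beta, eps=1e-5):
    # normalize each activation over the mini-batch...
    mean = x.mean(dim=0)
    var = x.var(dim=0, unbiased=False)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    # ...then let the network re-scale and shift the output directly:
    # gamma is the multiplicative bias, beta the additive bias (both learned)
    return gamma * x_hat + beta

x = torch.randn(32, 10)                    # a mini-batch of 32 activation vectors of size 10
gamma, beta = torch.ones(10), torch.zeros(10)
print(batchnorm_1d(x, gamma, beta).shape)  # torch.Size([32, 10])
~~~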

### Momentum parameter at BatchNorm1d
- Different from the momentum used in optimization.
- This momentum controls an exponentially weighted moving average of the mean and standard deviation.
- If it is a small number, the running mean and standard deviation vary less from mini-batch to mini-batch >> less regularization effect. (If it is a large number, they vary more from mini-batch to mini-batch >> more regularization effect.)
- TODO: not sure, but my understanding is that this is not about `how to update the parameters` but about `how much of the previous value to reflect when scaling and shifting`.
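For reference, a tiny sketch of the running-statistics update that this momentum controls, following the PyTorch convention (where `momentum` weights the new mini-batch statistic; the numbers are made up):

~~~python
# exponentially weighted moving average of the mean (same idea for the standard deviation)
momentum = 0.1                      # PyTorch's default for BatchNorm1d
running_mean = 0.0
for batch_mean in [0.9, 1.1, 1.0]:  # per-mini-batch means (made-up values)
    running_mean = (1 - momentum) * running_mean + momentum * batch_mean
    print(running_mean)
~~~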

<End of Rossmann>

Q. Is there a preference between batchnorm and the other regularizations (dropout, weight decay)?
A. No, always try them and see the results.

## lesson6-pets-more

### Data Augmentation

- Data augmentation is the last regularization technique covered.
- `get_transforms` has lots of parameters (not all of them covered yet) -> check the documentation.
- Remember you can run everything in the docs yourself, because they are built with nbdev.
- TODO: try this!!
- The essence of data augmentation is that the label must be preserved while the transformed image still makes sense.
- e.g. tilting: because it is optically sensible, you can always change the viewing angle of the data.
- Padding modes are zeros, border, and reflection; `reflection` works best most of the time, so it is the default.
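A hedged example of playing with those parameters (assuming the fastai v1 API and a `path` whose sub-folders are the class labels; the values echo the ones tried in the lesson6-pets-more notebook):

~~~python
from fastai.vision import *   # fastai v1

# more aggressive augmentation than the defaults, but the labels stay valid
tfms = get_transforms(max_rotate=20, max_zoom=1.3, max_lighting=0.4,
                      max_warp=0.4, p_affine=1., p_lighting=1.)

# padding_mode defaults to 'reflection'; 'zeros' and 'border' are the alternatives
data = (ImageList.from_folder(path)        # `path` is assumed: one folder per class
        .split_by_rand_pct(0.2)
        .label_from_folder()
        .transform(tfms, size=224, padding_mode='reflection')
        .databunch(bs=64)
        .normalize(imagenet_stats))
~~~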

### Convolutional Kernel (What is convolution?)


- We will build a heat map from scratch, i.e. a map of the parts of the image the convolutions focus on.

![setosa_visualization]()

- http://setosa.io/ev/image-kernels/
- an interactive JavaScript visualization of how convolution works
- The kernel does an element-wise multiplication with each image patch and sums the results.
- So the output is one pixel smaller at the borders -> hence padding is used, and fastai uses reflection padding, as mentioned.
- Why does this kernel (matrix) help catch horizontal edges?
- Because the kernel `(picture2)` weights each row differently, it fires where intensity changes from row to row, i.e. at horizontal edges.
- Why does this feel familiar? Because the intuition is similar to the Zeiler & Fergus `(paper)` paper.
- CNN from different viewpoints `link`
- Each output pixel is the result of a different linear equation.
- If you connect this with the usual picture of neural network nodes, you can see that specific input nodes are connected to specific output nodes.
- **Summarize**: a CNN does 1) a matmul in which some of the elements are always zero, and 2) uses the same weights for every row, which is called `weight tying, 1:18:50` `(picture)`
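A small sketch of that element-wise-multiply-and-sum, using a hand-made horizontal-edge kernel (the exact weights are illustrative, not the ones from the slides):

~~~python
import torch
import torch.nn.functional as F

# the weights differ from row to row, so the kernel fires where intensity changes
# vertically, i.e. along horizontal edges
k = torch.tensor([[ 1.,  1.,  1.],
                  [ 0.,  0.,  0.],
                  [-1., -1., -1.]])

img = torch.rand(1, 1, 28, 28)                       # a mini-batch of one grayscale image
edges = F.conv2d(img, k[None, None])                  # no padding: one pixel lost per border
print(edges.shape)                                    # torch.Size([1, 1, 26, 26])
edges_padded = F.conv2d(img, k[None, None], padding=1)
print(edges_padded.shape)                             # torch.Size([1, 1, 28, 28])
~~~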

#### Further lowdown

- Because an image generally has 3 channels, we need a rank-3 kernel.
- And **multiplying across all the channels still produces a single output pixel**. (`draw it yourself`)
- But one kernel only catches one feature, like horizontal edges, so we make more kernels, and the output becomes (h * w * n_kernels).
- That `kernel` dimension then becomes the `channel` dimension of the next layer.
- **Conv2d**: with a 3 by 3 kernel and stride-2 convolution -> output is (h/2 * w/2 * n_kernels).
- Stride 2 means skipping or jumping over every other input pixel,
- which keeps memory use from getting out of control.
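A quick check of those shapes (a sketch, not lesson code): 3 input channels, 8 kernels, stride 2.

~~~python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=8,   # 8 kernels -> 8 output channels
                 kernel_size=3, stride=2, padding=1)
x = torch.rand(1, 3, 64, 64)                      # one RGB image
print(conv(x).shape)                              # torch.Size([1, 8, 32, 32]) -> (h/2, w/2, n_kernels)
~~~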

~~~python
learn.model       # print the layers of the model
learn.summary()   # layer-by-layer output shapes and parameter counts
~~~
TODO: work through the blocks of conv kernels yourself:

- A big kernel size is usually used in the first layer (we will study this in part 2).


- A kernel that highlights the bottom-right edge (`pic / draw`)
- `torch.Tensor.expand`: memory-efficient, because we have to cover the RGB channels.
- We do not make a separate kernel per channel; we make one rank-4 kernel.
- A 4d tensor is just a stack of kernels.
- `t[None].shape` creates a new unit axis. Why do we make it? Because the model should operate on a mini-batch, not on a single image.
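A sketch of those two tricks (the kernel values are illustrative; the lesson notebook uses a similar hand-made edge kernel):

~~~python
import torch

k = torch.tensor([[ 0., -1.,  1.],
                  [-1., -1.,  1.],
                  [ 1.,  1.,  1.]])
# expand is memory-efficient: it *views* the same 3x3 weights across the 3 RGB channels,
# giving one rank-4 kernel of shape (n_kernels, channels, h, w) instead of separate copies
k4 = k.expand(1, 3, 3, 3)
print(k4.shape)             # torch.Size([1, 3, 3, 3])

t = torch.rand(3, 352, 352)      # a single image: channels x height x width
# t[None] adds a leading unit axis, because the model works on mini-batches, not single images
print(t[None].shape)             # torch.Size([1, 3, 352, 352])
~~~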

### Average pooling, feature

- Suppose our pre-trained model's conv body outputs activations of size `11 by 11 by 512` `pic 4`, and my classification task has 37 classes.
- Take each channel's face, which is 11 by 11, and `mean` it, so that we get a 512-by-1 tensor.
- Then make a 512-by-37 matrix and multiply, so that we get a 37-by-1 matrix of class scores.

- Each channel coming out of the convolution block corresponds to a feature.
- So, when we do transfer learning without unfreezing, every element of that last 512-by-1 vector should represent (or be able to catch) one feature.
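A shape-level sketch of that pooling-then-multiply step (sizes as in the lecture, weights random):

~~~python
import torch

acts = torch.rand(512, 11, 11)     # conv-body output: 512 feature maps of 11 x 11
pooled = acts.mean(dim=(1, 2))     # average each 11x11 face -> one number per feature, shape (512,)
W = torch.rand(512, 37)            # head weight matrix: 512 features -> 37 classes
scores = pooled @ W                # shape (37,): one score per class
print(pooled.shape, scores.shape)  # torch.Size([512]) torch.Size([37])
~~~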

### Heatmap, Hook

~~~
hook_output(model[0]) -> acts -> avg_acts
~~~
- If we average the activation block over the `feature` axis, the resulting 11-by-11 matrix depicts `how activated was that area?` -> that is the heatmap, `avg_acts`.

- And `acts` comes from a hook, which is a more advanced PyTorch feature.
- A hook lets you hook into the PyTorch machinery itself and run arbitrary PyTorch code there.
- Why is this cool? Normally you only get the final outputs of the forward pass, but with a hook we can interrupt the forward pass and grab intermediate results.
- In particular, we can store the output of the convolutional part of the model, which comes before the average pooling.
- Think back to how we cut the model off `after` the conv part.
- With fast.ai, the original convolutional part of the model is *the first thing in the model*, so it can be obtained with `learn.model.eval()[0]`.
- The activations are captured with `hook_output`; having hooked the output, we can pass our x minibatch through the model.
- Not directly, though: the minibatch first has to be normalized and put onto the GPU.
- The `one_item()` function does this when we have a single item. `TODO: this is an assignment` do it yourself without the one_item function.
- And `.cuda()` puts it on the GPU.
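A condensed sketch of that pipeline, assuming the fastai v1 `hook_output` from the lesson6-pets-more notebook (`learn`, `data`, and the image `x` are assumed to exist already):

~~~python
from fastai.callbacks.hooks import hook_output

m = learn.model.eval()            # [0] is the conv part, [1] the head
xb, _ = data.one_item(x)          # normalize one image and wrap it as a 1-item mini-batch
xb = xb.cuda()                    # put it on the GPU

with hook_output(m[0]) as hook_a: # hook the output of the convolutional part
    preds = m(xb)                 # forward pass; the hook stores the intermediate output

acts = hook_a.stored[0].cpu()     # roughly (512, 11, 11)
avg_acts = acts.mean(0)           # average over the feature axis -> an 11 x 11 heatmap
print(avg_acts.shape)
~~~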

- You should print out the shapes of your tensors very often, and try to think about why they are what they are.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
9 changes: 9 additions & 0 deletions _posts/part1v3/_2020-04-13-lesson05-note.md
@@ -0,0 +1,9 @@
There are two ways to compute derivatives:
1) finite differences
2) analytic

### GD, SGD, online GD

- gradient descent - treat the whole dataset as one batch.
- SGD - update per mini-batch; behaviour depends on the batch size.
- online gradient descent - treat a single example as the batch.
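A toy sketch (not from the lesson) contrasting the two ways of getting a derivative, for the loss L(w) = mean((w*x - y)^2), plus full-batch gradient descent steps:

~~~python
def loss(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def finite_diff_grad(w, xs, ys, eps=1e-6):
    # 1) finite differences: nudge w slightly and measure the change in the loss
    return (loss(w + eps, xs, ys) - loss(w - eps, xs, ys)) / (2 * eps)

def analytic_grad(w, xs, ys):
    # 2) analytic: dL/dw = mean(2 * (w*x - y) * x)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # made-up data, true w = 2
w, lr = 0.5, 0.05
for _ in range(20):
    # gradient descent uses all of xs/ys per step; SGD would use a mini-batch,
    # online gradient descent a single (x, y) pair per step
    w -= lr * analytic_grad(w, xs, ys)
print(w, finite_diff_grad(w, xs, ys), analytic_grad(w, xs, ys))
~~~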
49 changes: 49 additions & 0 deletions _posts/part1v3/_2020-04-17-lessonn07-note.ipynb
@@ -0,0 +1,49 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2019 v3 fastai course, Lesson07"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- What's the problem of 'deep' neural network?\n",
"- why 56-layer model is worse than 20-layer?</br>\n",
"\n",
"[Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
17 changes: 17 additions & 0 deletions _posts/part1v4/_2020-04-23-lesson06.md
@@ -0,0 +1,17 @@
---
layout: post
title: "fastai 2020 course-v4 Part1, lesson06"
author: dionne
categories: [ fastai-v3 ]
image: assets/images/att_00069.png
---


# Lesson 06

Q. Why use the steepest point of the learning-rate finder curve, not the minimum?
A. We do also consider the minimum, and use the minimum point divided by 10.
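A hedged sketch of the corresponding fastai2 call from the 2020 course (the exact return values of `lr_find` have changed between versions, so treat this as an assumption to verify):

~~~python
lr_min, lr_steep = learn.lr_find()   # lr_min is (roughly) the loss minimum divided by 10
print(f"minimum/10: {lr_min:.2e}, steepest point: {lr_steep:.2e}")
learn.fine_tune(2, base_lr=lr_steep) # either suggestion is a reasonable starting point
~~~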

### Unfreezing and transfer learning
- We throw away the last layer.
-