diff --git a/_site/2020/01/retrosprect-of-acl-paper-2020/index.html b/_site/2020/01/retrosprect-of-acl-paper-2020/index.html
index 5c9f8cf605..f7de9e92b2 100644
--- a/_site/2020/01/retrosprect-of-acl-paper-2020/index.html
+++ b/_site/2020/01/retrosprect-of-acl-paper-2020/index.html
@@ -19,9 +19,9 @@
-
+
+{"description":"2020 Annual Conference of the Association for Computational Linguistics","author":{"@type":"Person","name":"dionne"},"@type":"BlogPosting","url":"http://localhost:4000/2020/01/retrosprect-of-acl-paper-2020/","publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"http://localhost:4000/assets/images/logo.png"},"name":"dionne"},"image":"http://localhost:4000/assets/images/acl2020.png","headline":"Retrospect of ACL 2020 paper writing","dateModified":"2020-01-29T00:00:00+09:00","datePublished":"2020-01-29T00:00:00+09:00","mainEntityOfPage":{"@type":"WebPage","@id":"http://localhost:4000/2020/01/retrosprect-of-acl-paper-2020/"},"@context":"http://schema.org"}
@@ -161,96 +161,101 @@
"body": " {% if page. url == / %} {% assign latest_post = site. posts[0] %} <div class= topfirstimage style= background-image: url({% if latest_post. image contains :// %}{{ latest_post. image }}{% else %} {{site. baseurl}}/{{ latest_post. image}}{% endif %}); height: 200px; background-size: cover; background-repeat: no-repeat; ></div> {{ latest_post. title }} : {{ latest_post. excerpt | strip_html | strip_newlines | truncate: 136 }} In {% for category in latest_post. categories %} {{ category }}, {% endfor %} {{ latest_post. date | date: '%b %d, %Y' }} {%- assign second_post = site. posts[1] -%} {% if second_post. image %} <img class= w-100 src= {% if second_post. image contains :// %}{{ second_post. image }}{% else %}{{ second_post. image | absolute_url }}{% endif %} alt= {{ second_post. title }} > {% endif %} {{ second_post. title }} : In {% for category in second_post. categories %} {{ category }}, {% endfor %} {{ second_post. date | date: '%b %d, %Y' }} {%- assign third_post = site. posts[2] -%} {% if third_post. image %} <img class= w-100 src= {% if third_post. image contains :// %}{{ third_post. image }}{% else %}{{site. baseurl}}/{{ third_post. image }}{% endif %} alt= {{ third_post. title }} > {% endif %} {{ third_post. title }} : In {% for category in third_post. categories %} {{ category }}, {% endfor %} {{ third_post. date | date: '%b %d, %Y' }} {%- assign fourth_post = site. posts[3] -%} {% if fourth_post. image %} <img class= w-100 src= {% if fourth_post. image contains :// %}{{ fourth_post. image }}{% else %}{{site. baseurl}}/{{ fourth_post. image }}{% endif %} alt= {{ fourth_post. title }} > {% endif %} {{ fourth_post. title }} : In {% for category in fourth_post. categories %} {{ category }}, {% endfor %} {{ fourth_post. date | date: '%b %d, %Y' }} {% for post in site. posts %} {% if post. tags contains sticky %} {{post. title}} {{ post. excerpt | strip_html | strip_newlines | truncate: 136 }} Read More {% endif %}{% endfor %} {% endif %} All Stories: {% for post in paginator. posts %} {% include main-loop-card. html %} {% endfor %} {% if paginator. total_pages > 1 %} {% if paginator. previous_page %} « Prev {% else %} « {% endif %} {% for page in (1. . paginator. total_pages) %} {% if page == paginator. page %} {{ page }} {% elsif page == 1 %} {{ page }} {% else %} {{ page }} {% endif %} {% endfor %} {% if paginator. next_page %} Next » {% else %} » {% endif %} {% endif %} {% include sidebar-featured. html %} "
}, {
"id": 12,
+ "url": "http://localhost:4000/2020/04/v3-2019-lesson06-note/",
+ "title": "fastai 2019 course-v3 Part1, lesson06",
+ "body": "2020/04/15 - Lesson 06Rossmann(Tabular): Tabular data: be careful on Categorical variable vs Continuous variable. if datatype is int, fastai think it is classification, not a regression. Root mean square percentage error. as loss function. When you assign the y_range, it’s better to assign little bit more than actual maximum. > because it’s sigmoid. intermediate layers, which is weight matrix is 1) 1000, and 2) 500 -> which means our parameter would be 500*1000. learn. modelWhat is dropout and embedding dropout?: Nitish Srivastava, Dropout: A Simple way to prevent Neural Networks from Overfitting you can dropout with p value, make it specified to specific layer, or make it applied to all the layers. Pytorch code 1) bernoulli, which decides whether you will hold it? 2) and divide the noise value depends on noise value. so noise became 2 or remain 0. According to pytorch code, We do change at training time, but we do nothing at test time. and this means you don’t have to do anything special with inference time. ’ TODO: find at forums what is inference time - Related to NVIDIA, GPU. Embedding dropout is just a dropout. It’s different between continuous variable and embedding layer. TODO Still can’t understand. why embedding dropout is effective. or,… in need. Let’s delete at random, some of the results of the embedding. and It worked well especially at Kaggle Batch Normalization: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift -> came out false! According to How Does Batch Normalization Help Optimization? The key was multiplicative bias {\gamma} and additive bias {\beta}` Explain Let $$ \hat{y} = f(w_1, w_2, w_3, … , x)} $$ , loss = MSE , Then y_range should be between 1 and 5` And Activation function ends with -1 -> +1 To mitigate this problem, we can add the other parameter, like $$w_n$$ But there’re so much interactions in the process so just re-scale the output. Momentum parameter at BatchNorm1d: Different from momentum like in optimization. This momentum is Exponentially weighted moving average of the mean, instead of deviation. If this is small number: mean standard deviation would be less from mini_batch to mini_batch » less regularization effect. (If this is large number, variation would be greater from mini_batch to mini_batch » more regularization effect) TODO: can’t sure, but i understand, this is not about how to update parameter but about how much reflect previous value when scale and shift Q. Preference between batchnorm and the other regularizations(drop out, weight decay)A. Nope, always try and see the results## lesson6-pets-more### Data Augmentation- Last reg- `get_transforms` has lots of params (even not yet learned all) -> check documentation - Remember you can implement all the doc contents bc it's made from nbdev - TODO: try this!!- Essence of data augmentation is you should maintain the label, while somewhat making sense. - ex) tilt, because it's optically sensible, you can always change the angle of the data view. - zeros, border, and reflection but always `reflection` works most of the time, so that is the default### Convolutional Kernel(What is convolution?)- Will make heat\_map from scratch, which means the parts convolution focuses on![setosa_visualization]()- http://setosa. io/ev/image-kernels/ - javascript thing - How convolution works - Kernel. which does element-wise multiplication, and sum them up - so it has on pixel less at borders -> so it uses padding, and fastai uses reflection as said. 
- why this Kernel(matrix) helps catching horizontal edge side? - because this kernel`(picture2)` weights differently, depends on `x axis` - why familiar, because it's similar intuition with fugus`(paper)` paper- CNN from different viewpoints`link` - output of pixel is results from different linear equations. - If you connect this with represents of neural network nodes, you can see that the specific inp nodes connected with specific out nodes. - **Summarize**: cnn does 1) matmul some of the elements are always zero 2) same weight for every row, which is called `weight time? weight. . ?, 1:18:50` `(picture)`#### Further lowdown- Because generally image has 3 channels, we need rank 3 kernel. - And **do multiply with all channel output is one pixel**. (`draw by your self`) - but this kernel will catch one feature, like horizontal, so that we make more kernel so that output becomes (h * w * kernel) - And that `kernel` come to `channel`- **Conv2d**: with 3 by 3 kernel, stride 2 conv -> (h/2 * w/2 * kernel) - skip or jump over input pixel - to protect from memory out of control~~~pythonlearn. modellearn. summary()~~~TODO: understand yourself the blocks of conv-kernel: - Usually use big kernel size at first layer (will study this at part2)- Bottom right highlighting kernel(`pic / draw`)- `torch. tensor. expand`: for memory efficient, because we should do RGB- We do not make separate kernel, but make rank 4 kernel - 4d tensor is just stacked kernel- `t[None]. shape` create new unit axis, and why? we make this -> it should move unit of batch, not one size image. ### Average pooling, feature- suppose our pre-trained model results in size of `11 by 11 by 512 ` `pic 4` and my classification task has 37 classes * take the first face of channel, which is 11 by 11 and `mean` it, so that make rank 2 tensor, 512 by 1 * and make 2d matrix, which is 512 by 37 and multiply so that we can get 37 by 1 matrix. - Feature, at convolution block - So, when we transfer-learning without unfreeze, every element of last matrix (512 by 1) should represent(or could catch) each feature. ### Heatmap, Hook~~~hook_output(model[0]) -> acts -> avg_acts~~~- if we average the block with `axis=feature`, result of matrix(11 by 11) depicts `how activated was that area?` -> it is heatmap, `avg_acts`- and acts comes from hook, which is more advanced pytorch feature. - hook into pytorch machine itself, and run any arbitrary Pytorch code - Why this is cool?: Normally it gives set of outputs of forward pass, but we can interrupt and hook the forward pass. - Also can store the output of the convolutional part of the model, which is before avg_pooling- Thinking back when we do cut off `after` the conv part. - but with fast. ai the original convolutional part of the model would be *the first thing in the model*, specifically could be given from `learn. model. eval()[0]` - And this is gotten from `hooked_output` and having hooked the output, we can pass our x_minibatch to output. - Not directly, but with normalized, minibatch, put on to the gpu - `one_item()` function do it, when we have one data `TODO: this is assignment` do it yourself without one_item function - and `. cuda()` put it on gpu- you should print out very often the shape of tensor, and try think why. "
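The Bernoulli-keep-then-rescale behaviour described in the dropout notes above can be sketched in a few lines. A minimal sketch (my own illustration, not the actual fastai/Pytorch source) of inverted dropout, where kept activations are divided by the keep probability so that nothing special needs to happen at inference time:

~~~python
import torch

def dropout(x, p=0.5, training=True):
    # At test time we do nothing, as the note says about the Pytorch code.
    if not training or p == 0.:
        return x
    # 1) A bernoulli draw decides whether each activation is kept...
    keep = torch.zeros_like(x).bernoulli_(1 - p)
    # 2) ...and we divide by the keep probability, so with p=0.5 a kept
    #    value is scaled to 2x its size and a dropped one becomes 0.
    return x * keep / (1 - p)

x = torch.ones(5)
print(dropout(x))                  # e.g. tensor([2., 0., 2., 2., 0.])
print(dropout(x, training=False))  # unchanged at inference time
~~~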
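The heatmap recipe (hook the conv body, then average over the feature axis) can likewise be sketched with a plain Pytorch forward hook; fastai's hook_output wraps this mechanism. The model and sizes below are stand-ins chosen to reproduce the 11 by 11 by 512 example:

~~~python
import torch
import torch.nn as nn

# Stand-in for the convolutional part of a pre-trained model (model[0]).
body = nn.Sequential(nn.Conv2d(3, 512, 3, stride=2), nn.ReLU())

acts = {}
def hook(module, inp, out):
    acts['conv'] = out.detach()            # conv output, before avg_pooling

handle = body.register_forward_hook(hook)
_ = body(torch.randn(1, 3, 24, 24))        # the forward pass fills acts
handle.remove()

avg_acts = acts['conv'][0].mean(0)         # average over the feature axis
print(avg_acts.shape)                      # torch.Size([11, 11]) -> heatmap
~~~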
+ }, {
+ "id": 13,
+ "url": "http://localhost:4000/2020/04/qna-image-segmentation/",
+ "title": "[Q&A] Image Segmentation, using Unet with Driving Video data",
+ "body": "2020/04/02 - This post is about my questions while I was studying USF Deep Learning course about image segmentation task. All the answers are from the course, source code, library document, or document. I cared about being clear at reporting information including source of information, however if there are still anything unclear, please contact me. And thank you Jeremy&Rachael for everything. Also Thank you Cambridge Computer Vision Lab to made us to study with your labor. The Cambridge-driving Labeled Video Database (CamVid) is the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes. If someone is interested in this project, please check the site and see the details. Now, let’s start first using jupyter’s one of tricks which I love most. It enables cell to print the code without print function. from IPython. core. interactiveshell import InteractiveShell# pretty print all cell's output and not just the last oneInteractiveShell. ast_node_interactivity = all from fastai. vision import *from fastai. callbacks. hooks import *from fastai. utils. mem import *path = untar_data(URLs. CAMVID) # The locations where the data and models are downloaded are set in config. ymlpath. ls() I’m trying to accustomed to using pathlib module, not just it became built-in module in python, but I felt uncomfortable myself with os module. However, still unpredictable conflicts are remain, even in the quite standard library like Pytorch, tensorflow, onnx. (it require me string for path. not PosixPath. will send PR. . ) [PosixPath('/root/. fastai/data/camvid/valid. txt'), PosixPath('/root/. fastai/data/camvid/images'), PosixPath('/root/. fastai/data/camvid/labels'), PosixPath('/root/. fastai/data/camvid/codes. txt')]path_img = path/'images'path_lbl = path/'labels'fnames = get_image_files(path_img) #filenamelbl_names = get_image_files(path_lbl)1. (Play with data) My Hypothesis: File name has A_B format. and A / B would be at key-value position. Use collections - defaultdict Default Dict: Link: easy to group a sequence of key and value pairs into a dictionary of list?from collections import defaultdictfnames[0], lbl_names[0](PosixPath('/root/. fastai/data/camvid/images/0001TP_009210. png'), PosixPath('/root/. fastai/data/camvid/labels/0016E5_01800_P. png'))files = [tuple(i. stem. split('_')) for i in fnames]labels = [tuple(i. stem. split('_')[:-1]) for i in lbl_names]d = defaultdict(list)for k, v in files: d[k]. append(v)d. keys()len(d['0001TP'])124for k, v in d. 
items(): print(k, v)0001TP ['009210', '008850', '007350', '008970', '009840', '010140', '008490', '008520', '009540', '008250', '008340', '006840', '007860', '007410', '007740', '009870', '010080', '007890', '008790', '010020', '008400', '007080', '008280', '010380', '009330', '009060', '007470', '006810', '009720', '008580', '007110', '008730', '009150', '007680', '009780', '007800', '007290', '008760', '009510', '008640', '008310', '007440', '006900', '007500', '008460', '009030', '008130', '009480', '009900', '010230', '009270', '008040', '007590', '007950', '009990', '008550', '007260', '008100', '007530', '006960', '008190', '009420', '009930', '009000', '007830', '008940', '006690', '009570', '008880', '010170', '007560', '009300', '006750', '009360', '010200', '007320', '008010', '009120', '007620', '007200', '007140', '010320', '006720', '008670', '007230', '008370', '010260', '009690', '006930', '009090', '007770', '010290', '010350', '008610', '008070', '009600', '008430', '009450', '007380', '009240', '007710', '007170', '008160', '008910', '007020', '006780', '007050', '009960', '009810', '008220', '009180', '009750', '010050', '009660', '010110', '007920', '009630', '007650', '006990', '008700', '009390', '007980', '008820', '006870']0016E5 ['01290', '08159', '05760', '08133', '08063', '06660', '00960', '05850', '00750', '06960', '08035', '08107', '07975', '08017', '05610', '07140', '08119', '08027', '07170', '08400', '08093', '02100', '06390', '04470', '08340', '06060', '00600', '07470', '08151', '07800', '01620', '05730', '01530', '00690', '08430', '05940', '01980', '07320', '08069', '07965', '04380', '05430', '01410', '06780', '08007', '08087', '08079', '06600', '08109', '05490', '00901', '04590', '04680', '08045', '01770', '06690', '08085', '06810', '00420', '08011', '07440', '02190', '06300', '04800', '01500', '00450', '08029', '01470', '06330', '07997', '08067', '05370', '08013', '08190', '00840', '02370', '08049', '08135', '01440', '06870', '05820', '05280', '08051', '04440', '08091', '01380', '00630', '07290', '05520', '04770', '00540', '07995', '07999', '05550', '07920', '08101', '08141', '08053', '04620', '08103', '05160', '07350', '08057', '06030', '06000', '08550', '07963', '08089', '05970', '08047', '05640', '06240', '05220', '04350', '01590', '07959', '01950', '08117', '06180', '01560', '05400', '08043', '07680', '00780', '08081', '07050', '01020', '01350', '04530', '06720', '07969', '08149', '08003', '08131', '08129', '08033', '05460', '01650', '07530', '08023', '05340', '08640', '05100', '08075', '01230', '04980', '02070', '01080', '06210', '05910', '08009', '01800', '05190', '02400', '08083', '08019', '07620', '07200', '07890', '08059', '06990', '04410', '08121', '08123', '06930', '08137', '08147', '08095', '06570', '06150', '08153', '06840', '05250', '00510', '08370', '08580', '08113', '07410', '08097', '01200', '04950', '07770', '07650', '04710', '06090', '08055', '07110', '07981', '00990', '08250', '08127', '01920', '07985', '08220', '08005', '08157', '05130', '08071', '01140', '04830', '07740', '08143', '06120', '02040', '08111', '08115', '00660', '08280', '06420', '07983', '02220', '05700', '01860', '01260', '04920', '06510', '07020', '08073', '08105', '08125', '06360', '07860', '07993', '00810', '06540', '08099', '08139', '02010', '07973', '08155', '07991', '06630', '00480', '06750', '04890', '08001', '08025', '00870', '08490', '01830', '07977', '05010', '01170', '07961', '01680', '01050', '07987', '07080', '04560', '00930', '05310', '02340', '05790', 
'08460', '00720', '08031', '02280', '08039', '08037', '08065', '06270', '08077', '06900', '04650', '06480', '07230', '08041', '06450', '00570', '07989', '04740', '07979', '02250', '07380', '00390', '01710', '07590', '08021', '08520', '07500', '01110', '04500', '02310', '07971', '02130', '05580', '05880', '08610', '08310', '08145', '05670', '04860', '07260', '08015', '07967', '01740', '01320', '07560', '07830', '01890', '08061', '02160', '07710', '05070', '05040']Seq05VD ['f00030', 'f02550', 'f03450', 'f01110', 'f00480', 'f00210', 'f04590', 'f04170', 'f01800', 'f03990', 'f03360', 'f03900', 'f02070', 'f00810', 'f03690', 'f01350', 'f01530', 'f04980', 'f05100', 'f03060', 'f00900', 'f03870', 'f02460', 'f01470', 'f02370', 'f02820', 'f04080', 'f02760', 'f04860', 'f02250', 'f04200', 'f00270', 'f03720', 'f02850', 'f04410', 'f01200', 'f03090', 'f02010', 'f03930', 'f00090', 'f01650', 'f01890', 'f03840', 'f03030', 'f02130', 'f01230', 'f04110', 'f02520', 'f04140', 'f04020', 'f00060', 'f03420', 'f01560', 'f00120', 'f04290', 'f02340', 'f00300', 'f01380', 'f00870', 'f01860', 'f02970', 'f04560', 'f02730', 'f00330', 'f04530', 'f03780', 'f01770', 'f03390', 'f05040', 'f02430', 'f03330', 'f00660', 'f01740', 'f02100', 'f04800', 'f04050', 'f00510', 'f02790', 'f04350', 'f00690', 'f00540', 'f02490', 'f00960', 'f00930', 'f04230', 'f02880', 'f03600', 'f01020', 'f01500', 'f02400', 'f04830', 'f04470', 'f03300', 'f02670', 'f00450', 'f01980', 'f01170', 'f01620', 'f04500', 'f01080', 'f03180', 'f05070', 'f03150', 'f04950', 'f01440', 'f03510', 'f01710', 'f00360', 'f04770', 'f02910', 'f01050', 'f00630', 'f04320', 'f00570', 'f03240', 'f02190', 'f01140', 'f03540', 'f02220', 'f02640', 'f03960', 'f00000', 'f04920', 'f01950', 'f00990', 'f03480', 'f03000', 'f00420', 'f04620', 'f03210', 'f00780', 'f03570', 'f01590', 'f00750', 'f01920', 'f04650', 'f03750', 'f03630', 'f02310', 'f02610', 'f02580', 'f04740', 'f02280', 'f04680', 'f00390', 'f00720', 'f03660', 'f02040', 'f03270', 'f00180', 'f03810', 'f01410', 'f01290', 'f03120', 'f00840', 'f04440', 'f00150', 'f01260', 'f02700', 'f02940', 'f00600', 'f01830', 'f04260', 'f05010', 'f04890', 'f02160', 'f00240', 'f04380', 'f01680', 'f04710', 'f01320']0006R0 ['f02820', 'f03690', 'f03180', 'f02550', 'f01020', 'f03660', 'f02340', 'f01170', 'f02610', 'f02940', 'f01290', 'f02100', 'f01350', 'f03270', 'f03870', 'f01380', 'f01980', 'f03810', 'f02430', 'f02310', 'f01830', 'f03480', 'f02970', 'f01890', 'f03210', 'f03930', 'f02040', 'f02070', 'f02400', 'f01560', 'f03030', 'f01770', 'f01590', 'f01950', 'f03420', 'f01650', 'f03450', 'f00990', 'f03630', 'f01500', 'f03570', 'f00930', 'f03090', 'f03360', 'f02880', 'f02460', 'f01440', 'f01920', 'f01230', 'f03840', 'f02730', 'f01620', 'f02220', 'f03750', 'f03330', 'f03540', 'f02520', 'f02790', 'f01050', 'f03120', 'f01800', 'f01140', 'f01860', 'f01530', 'f01470', 'f02670', 'f02490', 'f01260', 'f01110', 'f02760', 'f01680', 'f03150', 'f02580', 'f03300', 'f02280', 'f01200', 'f03390', 'f03510', 'f02640', 'f02190', 'f02370', 'f01320', 'f02130', 'f03600', 'f03240', 'f03780', 'f03720', 'f02700', 'f01410', 'f01080', 'f02850', 'f01710', 'f03900', 'f03060', 'f01740', 'f02010', 'f02250', 'f00960', 'f03000', 'f02160', 'f02910']for k, v in d. items(): print(k, len(d[k]))0001TP 1240016E5 305Seq05VD 1710006R0 101for i in d2. keys(): print(i,len(d2[i]))0016E5 3050001TP 1240006R0 101Seq05VD 171files[0], labels[0](('0001TP', '009210'), ('0016E5', '01800'))2. My question: Link: Why do we need masking? and does color from fastai library? 
(have to look into source code) What do the parameter alpha do? When people make masked img, would it be have ranged integer limit? Does image normalization related with this?lbl_sorted = sorted(lbl_names)f_sorted = sorted(fnames)lbl_1 = lbl_sorted[33]f_1 = f_sorted[33]img = open_image(lbl_1)mask = open_mask(lbl_1)_,axs = plt. subplots(1,2, figsize=(10,5))# img. show(ax=axs[0], y=mask, title='masked')img. show(ax=axs[0], title='1')mask. show(ax=axs[1], title='2', alpha=1. ) img_2 = open_image(f_1)mask_2 = open_mask(f_1)_,axs = plt. subplots(1,2, figsize=(10,5))# img. show(ax=axs[0], y=mask, title='masked')img_2. show(ax=axs[0], title='3',)mask_2. show(ax=axs[1], title='4', alpha=1. ) open_mask(lbl_1). data. shapetorch. Size([1, 720, 960])open_mask(lbl_1). data. shapetorch. Size([1, 720, 960])open_image(f_1). data. shapetorch. Size([3, 720, 960])open_image(f_1). data. shapetorch. Size([3, 720, 960])img. data #labeled datatensor([[[0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], . . . , [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176]], [[0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], . . . , [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176]], [[0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], . . . , [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176]]])mask. data # after mask, labeled datatensor([[[ 4, 4, 4, . . . , 21, 21, 21], [ 4, 4, 4, . . . , 21, 21, 21], [ 4, 4, 4, . . . , 21, 21, 21], . . . , [17, 17, 17, . . . , 30, 30, 30], [17, 17, 17, . . . , 30, 30, 30], [17, 17, 17, . . . , 30, 30, 30]]])img_2. data, mask_2. data(tensor([[[0. 0706, 0. 0667, 0. 0706, . . . , 0. 6431, 0. 6549, 0. 6627], [0. 0745, 0. 0706, 0. 0706, . . . , 0. 6431, 0. 6510, 0. 6549], [0. 0784, 0. 0706, 0. 0745, . . . , 0. 6392, 0. 6588, 0. 6588], . . . , [0. 0863, 0. 0824, 0. 0824, . . . , 0. 1333, 0. 1216, 0. 1255], [0. 0902, 0. 0863, 0. 0824, . . . , 0. 1255, 0. 1176, 0. 1216], [0. 0863, 0. 0824, 0. 0784, . . . , 0. 1137, 0. 1059, 0. 1137]], [[0. 0706, 0. 0667, 0. 0706, . . . , 0. 7490, 0. 7608, 0. 7686], [0. 0745, 0. 0706, 0. 0706, . . . , 0. 7451, 0. 7569, 0. 7608], [0. 0784, 0. 0706, 0. 0745, . . . , 0. 7412, 0. 7529, 0. 7529], . . . , [0. 0980, 0. 0941, 0. 0941, . . . , 0. 1804, 0. 1686, 0. 1725], [0. 1059, 0. 1020, 0. 0980, . . . , 0. 1725, 0. 1647, 0. 1686], [0. 1020, 0. 0980, 0. 0941, . . . , 0. 1608, 0. 1529, 0. 1608]], [[0. 0784, 0. 0745, 0. 0784, . . . , 0. 7569, 0. 7686, 0. 7765], [0. 0824, 0. 0784, 0. 0784, . . . , 0. 7647, 0. 7647, 0. 7686], [0. 0784, 0. 0706, 0. 0745, . . . , 0. 7608, 0. 7647, 0. 7647], . . . , [0. 1216, 0. 1176, 0. 1176, . . . , 0. 2000, 0. 1882, 0. 1922], [0. 1176, 0. 1137, 0. 1098, . . . , 0. 1843, 0. 1765, 0. 1804], [0. 1137, 0. 1098, 0. 
1059, . . . , 0. 1725, 0. 1647, 0. 1725]]]), tensor([[[ 18, 17, 18, . . . , 183, 186, 188], [ 19, 18, 18, . . . , 183, 185, 186], [ 20, 18, 19, . . . , 182, 185, 185], . . . , [ 25, 24, 24, . . . , 43, 40, 41], [ 26, 25, 24, . . . , 41, 39, 40], [ 25, 24, 23, . . . , 38, 36, 38]]]))3. What is a difference between image and imageSegment?: imageSegment An ImageSegment object has the same properties as an Image. The only difference is that when applying the transformations to an ImageSegment, it will ignore the functions that deal with lighting and keep values of 0 and 1. It’s easy to show the segmentation mask over the associated Image by using the y argument of show_image. img = open_image(fnames[0])mask = open_mask(lbl_names[0])_,axs = plt. subplots(1,3, figsize=(8,4))img. show(ax=axs[0], title='no mask')img. show(ax=axs[1], y=mask, title='masked') #seg mask over the img using y argmask. show(ax=axs[2], title='mask only', alpha=1. ) vision. image ##4. Why/How img div by 255 and how it results fast. ai : vision. image - If div=True, pixel values are divided by 255. to become floats between 0. and 1. At times, you want to get rid of distortions caused by lights and shadows in an image. Normalizing the RGB values of an image can at times be a simple and effective way of achieving this. So sum of the pixel’s value over all channels(which is S) divides each intensified channel so that nomalized value will be R/S, G/S and B/S (where, S=R+G+B). Detailed explain here4. Python Evaluation Order: Python evaluates expressions from left to right. Notice that while evaluating an assignment, the right-hand side is evaluated before the left-hand side. mask_tmp, trg_tmp, void_tmp = 2, 1, 10mask_tmp = trg_tmp != void_tmpprint(mask_tmp, trg_tmp, void_tmp) # (1) target is not same with voidTrue 1 10# Example 1x = 1y = 2x,y = y,x+yx, y(2, 3)# Example 2x = 1y = 2x = yy = x+yx, y(2, 4)5. model learner parameter :: pct_start: A: Percentage of total number of epochs when learning rate rises during one cycle. Q: Sorry, I still confused that one cycle in the new API only runs one epoch. How the percentage of total number of epochs works? Can you give a example? If learn. fit_one_cycle(10, slice(1e-4,1e-3,1e-2), pct_start=0. 05)??A: Ok, strictly correct answer would be percentage of iterations, so you can have lr both increase and decrease during same epoch. In your example, say, you have 100 iterations per epoch, then for half an epoch (0. 05 * (10 * 100) = 50) lr will rise, then slowly decrease. Q2: Thanks for this explanation … so essentially, it is the percentage of overall iterations where the LR is increasing, correct? So, given the default of 0. 3, it means that your LR is going up for 30% of your iterations and then decreasing over the last 70%. Is that a correct summation of what is happening? A2: Yes, I think that’s correct. You can verify that by changing its value and check:learn. recorder. plot_lr() For example if pct_start = 0. 2 source: forums. fastai "
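As a side note on the R/S, G/S, B/S normalization mentioned in section 4, here is a minimal numpy sketch (hypothetical pixel values, not part of the notebook above):

~~~python
import numpy as np

img = np.random.randint(1, 256, (720, 960, 3)).astype(float)  # fake RGB image
s = img.sum(axis=2, keepdims=True)   # S = R + G + B for each pixel
normalized = img / s                 # channels become R/S, G/S, B/S
print(normalized.sum(axis=2).max())  # every pixel now sums to 1.0
~~~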
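The iteration arithmetic in the pct_start answers is easy to verify by hand; a small sketch using the numbers from the Q&A above:

~~~python
# pct_start splits the one-cycle schedule by iterations, not by epochs.
epochs, iters_per_epoch, pct_start = 10, 100, 0.05
total_iters = epochs * iters_per_epoch
rising = int(total_iters * pct_start)  # LR rises for these iterations
falling = total_iters - rising         # then slowly decreases for the rest
print(rising, falling)                 # 50 950 -> half an epoch of warm-up
~~~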
+ }, {
+ "id": 14,
"url": "http://localhost:4000/2020/03/note08-fastai-4/",
"title": "Gradient backward, Chain Rule, Refactoring",
- "body": "2020/03/02 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring” Lecture 08 - Deep Learning From Foundations-part2 “ Homework: calculus for machine learning einsum conventionCONTENTS: Foundation version Gradients backward pass decompose function chain rule with code check the result using Pytorch autograd Refactor model Layers as classes Modue. forward() Without einsum nn. Linear and nn. Module Forward process Foundation version: Gradients backward pass: Gradients is output with respect to parameter we’ve done this work in this path(below) to simplify this calculus, we can just change it into, So, you should know of the derivative of each bit on its own, and then you multiply them all together. As a result, it would be over cross over the data. So you can get gradient, output with respect to parameter What order should we calculate? BTW, why Jeremy wrote , not Loss function?1 decompose function We want to get derivative of which forms But, we have a estimation of answer (we call it y hat) now So, I will decompose funciton to trace target variable. Using the above forward pass, we can suppose some function from the end. start from , We know MSE funciton got two parameters, output, and target . from MSE’s input we know function’s output and supposing v is input of that function, similarly, v became output of chain rule with code examplify backward process by random sampling To get a variable, I modified forward model a little def model_ping(out = 'x_train'): l1 = lin(x_train, w1, b1) # one linear layer l2 = relu(l1) # one relu layer l3 = lin(l2, w2, b2) # one more linear layer return eval(out) Be careful we don’t use mse_loss in backward process1) start with the very last function, which is loss funciton. MSE If we codify this formula,def mse_grad(inp, targ): #mse_input(1000,1), mse_targ (1000,1) # grad of loss with respect to output of previous layer inp. g = 2. * (inp. squeeze() - targ). unsqueeze(-1) / inp. shape[0] And, this can be examplified like below. Notice that input of gradient function is same with forward functiony_hat = model_ping('l3') #get value from forward modely_hat. g = ((y_hat. squeeze(-1)-y_train). unsqueeze(-1))/y_hat. shape[0]y_hat. g. shape>>> torch. Size([50000, 1]) We can just calculate using broadcasting, not using squeeze. then why should do and unsqueeze again?🎯 It’s related with random access memory(RAM). . If I don’t squeeze, (I’m using colab) it out of RAM. 2) Derivative of linear2 function This process’s weight dimensions defined by axis=1, axis=2. axis=0 dimension means size of data. This will be summazed by . sum(0) method. unsqeeze(-1)&unsqeeze(1) seperates the dimension, and make a dot product, and vanish axis=0 dimension. def lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowlin2 = model_ping('l2'); #get value from forward modellin2. g = y_hat. g@w2. t(); w2. g = (lin2. unsqueeze(-1) * y_hat. g. unsqueeze(1)). sum(0);b2. g = y_hat. g. sum(0);lin2. g. shape, w2. g. shape, b2. g. shape>>> torch. Size([50000, 50])torch. Size([50, 1])torch. Size([1]) Notice going reverse order, we’re passing in gradient backward3) derivative of ReLU def relu_grad(inp, out): # grad of relu with respect to input activations inp. 
g = (inp>0). float() * out. g Examplified belowlin1=model_ping('l1') #get value from forward modellin1. g = (lin1>0). float() * lin2. g;lin1. g. shape>>> torch. Size([50000, 50])4) Derivative of linear1 Same process with 2) but, this process’s weight hasdef lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowx_train. g = lin1. g @ w1. t(); w1. g = (x_train. unsqueeze(-1) * lin1. g. unsqueeze(1)). sum(0); b1. g = lin1. g. sum(0);x_train. g. shape, w1. g. shape, b1. g. shape>>> torch. Size([50000, 784])torch. Size([784, 50])torch. Size([50])5) Then it goes backward pass def forward_and_backward(inp, targ): # forward pass: l1 = inp @ w1 + b1 l2 = relu(l1) out = l2 @ w2 + b2 # we don't actually need the loss in backward! loss = mse(out, targ) # backward pass: mse_grad(out, targ) lin_grad(l2, out, w2, b2) relu_grad(l1, l2) lin_grad(inp, l1, w1, b1)Version 1 (Basic)- Wall time: 1. 95 s Summary Notice that output of function at forward pass became input of backward pass backpropagation is just the chain rule value loss (loss=mse(out,targ)) is not used in gradient calcuation. Because, it doesn’t appear with the weight. w1g, w2g, b1g, b2g, ig will be used for optimizercheck the result using Pytorch autograd require_grad_ is the magical function, which can automatic differentiation. 2 This magical auto gradified tensor keep track what happend in forward (taking loss function), and do the backward3 So it saves our time to differentiate ourselves ⤵️ THis is benchmark…. . Version 2 (torch autograd)- Wall time: 3. 81 µs Refactor model: Amazingly, just refactoring our main pieces, it comes down up to Pytorch package. 🌟 Implement yourself, Practice, practice, practice! 🌟 Layers as classes: Relu and Linear are layers in oue neural net. -> make it as classes For the forward, using __call__ for the both of forward & backward. Because ‘call’ means we treat this as a function. class Lin(): def __init__(self, w, b): self. w,self. b = w,b def __call__(self, inp): self. inp = inp self. out = inp@self. w + self. b return self. out def backward(self): self. inp. g = self. out. g @ self. w. t() # Creating a giant outer product, just to sum it, is inefficient! self. w. g = (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) self. b. g = self. out. g. sum(0) Remember that in lin_grad function, we save bias&weight!!!!!💬 inp. g : gradient of the output with respect to the input. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 w. g : gradient of the output with respect to the weight. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 b. g : gradient of the output with respect to the bias. {: style=”color:grey; font-size: 90%; text-align: center;”} class Model(): def __init__(self, w1, b1, w2, b2): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ) def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() refer to Jeremy’s Model class, he put layers in list Dionne’s self-study note: Decomposing Jeremy’s Model class init needs weight, bias but not x data when call that class(a. k. a function) it gave x data and y label! jeremy composited function in layers. x = l(x) so concise…. . 
also utilized that layer list when backward ust reversing it (using python list’s method) And he is recursively calling the function on the result of the previous thing. ⬇️for l in self. layers: x = l(x)Q2: Don’t I need to declare magical autograd function, requires_grad_?{: style=”color:red; font-size: 130%; text-align: center;”} [The questions migrated to this article] Version 3 (refactoring - layer to class)- Wall time: 5. 25 µs Modue. forward(): Duplicate code makes execution time slow. Role of __call__ changed. No more __call__ for implementing forward pass. By initializing the forward with __call__, Module. forward() use overriding to maximize reusability. So any layer inherit Module, can use parent’s function. gradient of the output with respect to the weight (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) can be reexpressed using einsum, torch. einsum( bi,bj->ij , inp, out. g) Defining forward and Module enables Pytorch to out almost duplicatesVersion 4 (Module & einsum)- Wall time: 4. 29 µs Q2: Isn’t there any way to use broadcasting? Why we should use outer product?{: style=”color:red; font-size: 130%; text-align: center;”} Without einsum: Replacing einsum to matrix product is even more faster. torch. einsum( bi,bj->ij , inp, out. g)can be reexpressed using matrix product, inp. t() @ out. gVersion 5 (without einsum)- Wall time: 3. 81 µs nn. Linear and nn. Module: Torch’s package nn. Linear and nn. Module Version 6 (torch package)- Wall time: 5. 01 µs Final, Using torch. nn. Linear & torch. nn. Module~~~pythonclass Model(nn. Module): def init(self, n_in, nh, n_out): super(). init() self. layers = [nn. Linear(n_in,nh), nn. ReLU(), nn. Linear(nh,n_out)] self. loss = mse def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x. squeeze(), targ)class Model(): def init(self): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ)def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() ~~~ Footnote: fast. ai forums Lesson-8 ↩ pytorch docs - autograd ↩ stackoverflow - finding methods a object has ↩ "
+ "body": "2020/03/02 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring ” Lecture 08 - Deep Learning From Foundations-part2 “ Homework: calculus for machine learning einsum conventionCONTENTS: Foundation version Gradients backward pass decompose function chain rule with code check the result using Pytorch autograd Refactor model Layers as classes Modue. forward() Without einsum nn. Linear and nn. Module Forward process Foundation version: Gradients backward pass: Gradients is output with respect to parameter we’ve done this work in this path(below) to simplify this calculus, we can just change it into, So, you should know of the derivative of each bit on its own, and then you multiply them all together. As a result, it would be over cross over the data. So you can get gradient, output with respect to parameter What order should we calculate? BTW, why Jeremy wrote , not Loss function?1 decompose function We want to get derivative of which forms But, we have a estimation of answer (we call it y hat) now So, I will decompose funciton to trace target variable. Using the above forward pass, we can suppose some function from the end. start from , We know MSE funciton got two parameters, output, and target . from MSE’s input we know function’s output and supposing v is input of that function, similarly, v became output of chain rule with code examplify backward process by random sampling To get a variable, I modified forward model a little def model_ping(out = 'x_train'): l1 = lin(x_train, w1, b1) # one linear layer l2 = relu(l1) # one relu layer l3 = lin(l2, w2, b2) # one more linear layer return eval(out) Be careful we don’t use mse_loss in backward process1) start with the very last function, which is loss funciton. MSE If we codify this formula,def mse_grad(inp, targ): #mse_input(1000,1), mse_targ (1000,1) # grad of loss with respect to output of previous layer inp. g = 2. * (inp. squeeze() - targ). unsqueeze(-1) / inp. shape[0] And, this can be examplified like below. Notice that input of gradient function is same with forward functiony_hat = model_ping('l3') #get value from forward modely_hat. g = ((y_hat. squeeze(-1)-y_train). unsqueeze(-1))/y_hat. shape[0]y_hat. g. shape>>> torch. Size([50000, 1]) We can just calculate using broadcasting, not using squeeze. then why should do and unsqueeze again?🎯 It’s related with random access memory(RAM). . If I don’t squeeze, (I’m using colab) it out of RAM. 2) Derivative of linear2 function This process’s weight dimensions defined by axis=1, axis=2. axis=0 dimension means size of data. This will be summazed by . sum(0) method. unsqeeze(-1)&unsqeeze(1) seperates the dimension, and make a dot product, and vanish axis=0 dimension. def lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowlin2 = model_ping('l2'); #get value from forward modellin2. g = y_hat. g@w2. t(); w2. g = (lin2. unsqueeze(-1) * y_hat. g. unsqueeze(1)). sum(0);b2. g = y_hat. g. sum(0);lin2. g. shape, w2. g. shape, b2. g. shape>>> torch. Size([50000, 50])torch. Size([50, 1])torch. Size([1]) Notice going reverse order, we’re passing in gradient backward3) derivative of ReLU def relu_grad(inp, out): # grad of relu with respect to input activations inp. 
g = (inp>0). float() * out. g Examplified belowlin1=model_ping('l1') #get value from forward modellin1. g = (lin1>0). float() * lin2. g;lin1. g. shape>>> torch. Size([50000, 50])4) Derivative of linear1 Same process with 2) but, this process’s weight hasdef lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowx_train. g = lin1. g @ w1. t(); w1. g = (x_train. unsqueeze(-1) * lin1. g. unsqueeze(1)). sum(0); b1. g = lin1. g. sum(0);x_train. g. shape, w1. g. shape, b1. g. shape>>> torch. Size([50000, 784])torch. Size([784, 50])torch. Size([50])5) Then it goes backward pass def forward_and_backward(inp, targ): # forward pass: l1 = inp @ w1 + b1 l2 = relu(l1) out = l2 @ w2 + b2 # we don't actually need the loss in backward! loss = mse(out, targ) # backward pass: mse_grad(out, targ) lin_grad(l2, out, w2, b2) relu_grad(l1, l2) lin_grad(inp, l1, w1, b1)Version 1 (Basic)- Wall time: 1. 95 s Summary Notice that output of function at forward pass became input of backward pass backpropagation is just the chain rule value loss (loss=mse(out,targ)) is not used in gradient calcuation. Because, it doesn’t appear with the weight. w1g, w2g, b1g, b2g, ig will be used for optimizercheck the result using Pytorch autograd require_grad_ is the magical function, which can automatic differentiation. 2 This magical auto gradified tensor keep track what happend in forward (taking loss function), and do the backward3 So it saves our time to differentiate ourselves Postfix underscore means in pytorch, in-place function, What is in-place function?⤵️ THis is benchmark…. . Version 2 (torch autograd)- Wall time: 3. 81 µs Refactor model: Amazingly, just refactoring our main pieces, it comes down up to Pytorch package. 🌟 Implement yourself, Practice, practice, practice! 🌟 Layers as classes: Relu and Linear are layers in oue neural net. -> make it as classes For the forward, using __call__ for the both of forward & backward. Because ‘call’ means we treat this as a function. class Lin(): def __init__(self, w, b): self. w,self. b = w,b def __call__(self, inp): self. inp = inp self. out = inp@self. w + self. b return self. out def backward(self): self. inp. g = self. out. g @ self. w. t() # Creating a giant outer product, just to sum it, is inefficient! self. w. g = (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) self. b. g = self. out. g. sum(0) Remember that in lin_grad function, we save bias&weight!!!!!💬 inp. g : gradient of the output with respect to the input. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 w. g : gradient of the output with respect to the weight. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 b. g : gradient of the output with respect to the bias. {: style=”color:grey; font-size: 90%; text-align: center;”} class Model(): def __init__(self, w1, b1, w2, b2): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ) def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() refer to Jeremy’s Model class, he put layers in list Dionne’s self-study note: Decomposing Jeremy’s Model class init needs weight, bias but not x data when call that class(a. k. a function) it gave x data and y label! jeremy composited function in layers. x = l(x) so concise…. . 
also utilized that layer list when backward ust reversing it (using python list’s method) And he is recursively calling the function on the result of the previous thing. ⬇️for l in self. layers: x = l(x)Q2: Don’t I need to declare magical autograd function, requires_grad_?{: style=”color:red; font-size: 130%; text-align: center;”} [The questions migrated to this article] Version 3 (refactoring - layer to class)- Wall time: 5. 25 µs Modue. forward(): Duplicate code makes execution time slow. Role of __call__ changed. No more __call__ for implementing forward pass. By initializing the forward with __call__, Module. forward() use overriding to maximize reusability. So any layer inherit Module, can use parent’s function. gradient of the output with respect to the weight (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) can be reexpressed using einsum, torch. einsum( bi,bj->ij , inp, out. g) Defining forward and Module enables Pytorch to out almost duplicatesVersion 4 (Module & einsum)- Wall time: 4. 29 µs Q2: Isn’t there any way to use broadcasting? Why we should use outer product?{: style=”color:red; font-size: 130%; text-align: center;”} Without einsum: Replacing einsum to matrix product is even more faster. torch. einsum( bi,bj->ij , inp, out. g)can be reexpressed using matrix product, inp. t() @ out. gVersion 5 (without einsum)- Wall time: 3. 81 µs nn. Linear and nn. Module: Torch’s package nn. Linear and nn. Module Version 6 (torch package)- Wall time: 5. 01 µs Final, Using torch. nn. Linear & torch. nn. Module~~~pythonclass Model(nn. Module): def init(self, n_in, nh, n_out): super(). init() self. layers = [nn. Linear(n_in,nh), nn. ReLU(), nn. Linear(nh,n_out)] self. loss = mse def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x. squeeze(), targ)class Model(): def init(self): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ)def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() ~~~ Footnote: fast. ai forums Lesson-8 ↩ pytorch docs - autograd ↩ stackoverflow - finding methods a object has ↩ "
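The note says to check the manual gradients against Pytorch autograd but does not show the check. A minimal sketch with assumed shapes (100 rows instead of 50000) of how that comparison could look:

~~~python
import torch

x, y = torch.randn(100, 784), torch.randn(100)
w1 = torch.randn(784, 50).requires_grad_()
b1 = torch.zeros(50, requires_grad=True)
w2 = torch.randn(50, 1).requires_grad_()
b2 = torch.zeros(1, requires_grad=True)

out = (x @ w1 + b1).clamp_min(0.) @ w2 + b2   # lin -> relu -> lin
loss = ((out.squeeze(-1) - y) ** 2).mean()    # mse
loss.backward()
print(w1.grad.shape)  # torch.Size([784, 50]); compare with the manual w1.g
~~~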
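And a sketch of the Module refactor the note describes, where __call__ does the common plumbing and each layer only overrides forward (close in spirit to the lesson notebook, reconstructed here from the description above):

~~~python
class Module():
    def __call__(self, *args):
        self.args = args
        self.out = self.forward(*args)   # subclasses override forward
        return self.out

    def forward(self):
        raise NotImplementedError('Module.forward')

class Relu(Module):
    def forward(self, inp):
        return inp.clamp_min(0.) - 0.5

    def backward(self):
        inp, = self.args
        inp.g = (inp > 0).float() * self.out.g
~~~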
}, {
- "id": 13,
+ "id": 15,
"url": "http://localhost:4000/2020/03/note08-fastai-3/",
"title": "Implement forward&backward pass from scratch",
"body": "2020/03/01 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring1. The forward and backward passes: 1. 1 Normalization: train_mean,train_std = x_train. mean(),x_train. std()>>> train_mean,train_std(tensor(0. 1304), tensor(0. 3073))Remember! Dataset, which is x_train, mean and standard deviation is not 0&1. But we need them to be which means we should substract means and divide data by std. You should not standarlize validation set because training set and validation set should be aparted. after normalize, mean is close to zero, and standard deviation is close to 1. 1. 2 Variable definition: n,m: size of the training set c: the number of activations we need in our model2. Foundation Version: 2. 1 Basic architecture: Our model has one hidden layer, output to have 10 activations, used in cross entropy. But in process of building architecture, we will use mean square error, output to have 1 activations and lator change it to cross entropy number of hidden unit; 50see below pic We want to make w1&w2 mean and std be 0&1. why initializating and make mean zero and std one is important? paper highlighting importance of normalisation - training 10,000 layer network without regularisation1 2. 1. 1 simplified kaiming initQ: Why we did init, normalize with only validation data? Because we can not handle and get statistics from each value of x_valid?{: style=”color:red; font-size: 130%; text-align: center;”} what about hidden(first) layer?w1 = torch. randn(m,nh)b1 = torch. zeros(nh)t = lin(x_valid, w1, b1) # hidden>>> t. mean(), t. std()((tensor(2. 3191), tensor(27. 0303))In output(second) layer, w2 = torch. randn(nh,1)b2 = torch. zeros(1)t2 = lin(t, w2, b2) # output>>> t2. mean(), t2. std()(tensor(-58. 2665), tensor(170. 9717)) which is terribly far from normalzed value. But if we apply simplified kaiming init w1 = torch. randn(m,nh)/math. sqrt(m); b1 = torch. zeros(nh)w2 = torch. randn(nh,1)/math. sqrt(nh); b2 = torch. zeros(1)t = lin(x_valid, w1, b1)t. mean(),t. std()>>> (tensor(-0. 0516), tensor(0. 9354)) But, actually, we use activations not only linear function After applying activations relu at linear layer, mean and deviation became 0. 5. 2. 1. 2 Glorrot initializationPaper2: Understanding the difficulty of training deep feedforward neural networks Gaussian(, bell shaped, normal distributions) is not trained very well. How to initialize neural nets? with the size of layer , the number of filters . But there is No acount for import of ReLU If we got 1000 layers, vanishing gradients problem emerges2. 1. 3 Kaiming initializatingPaper3: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Kaiming He, explained here rectifier: rectified linear unit rectifier network: neural network with rectifier linear units This is kaiming init, and why suddenly replace one to two on a top? to avoid vanishing gradient(weights) But it doesn’t give very nice mean tough. 2. 1. 4 Pytorch package Why fan_out? according to pytorch documentation, choosing 'fan_in' preserves the magnitude of the variance of the wights in the forward pass. choosing 'fan_out' preserves the magnitues in the backward pass(, which means matmul; with transposed matrix) ➡️ in the other words, torch use fan_out cz pytorch transpose in linear transformaton. What about CNN in Pytorch?I tried torch. nn. 
Conv2d. conv2d_forward?? Jeremy digged into using torch. nn. modules. conv. _ConvNd. reset_parameters?? 2 in Pytorch, it doesn’t seem to be implemented kaiming init in right formula. so we should use our own operation. But actually, this has been discussed in Pytorch community before. 3 4 Jeremy said it enhanced variance also, so I sampled 100 times and counted better results. To make sure the shape seems sensible. check with assert. (remember we will replace 1 to 10 in cross entropy)assert model(x_valid). shape==torch. Size([x_valid. shape[0],1])>>> model(x_valid). shape(10000, 1) We have made Relu, init, linear, it seems we can forward pass code we need for basic architecture nh = 50def lin(x, w, b): return x@w + b;w1 = torch. randn(m,nh)*math. sqrt(2. /m ); b1 = torch. zeros(nh)w2 = torch. randn(nh,1); b2 = torch. zeros(1)def relu(x): return x. clamp_min(0. ) - 0. 5t1 = relu(lin(x_valid, w1, b1))def model(xb): l1 = lin(xb, w1, b1) l2 = relu(l1) l3 = lin(l2, w2, b2) return l32. 2 Loss function: MSE: Mean squared error need unit vector, so we remove unit axis. def mse(output, targ): return (output. squeeze(-1) - targ). pow(2). mean() In python, in case you remove axis, you use ‘squeeze’, or add axis use ‘unsqueeze’ torch. squeeze where code commonly broken. so, when you use squeeze, clarify dimension axis you want to removetmp = torch. tensor([1,1])tmp. squeeze()>>> tensor([1, 1]) make sure to make as float when you calculateBut why??? because it is tensor?{: style=”color:red; font-size: 130%;”} Here’s the error when I don’t transform the data type ---------------------------------------------------------------------------TypeError Traceback (most recent call last)<ipython-input-22-ae6009bef8b4> in <module>()----> 1 y_train = get_data()[1] # call data again 2 mse(preds, y_train)TypeError: 'map' object is not subscriptable This is forward passFootnote: Other materials: Understanding the difficulty of training deep feedforward neural networks, paper that introduced Xavier initialization Fixup Initialization: Residual Learning Without Normalization ↩ Pytorch implementaion on Kaiming init of conv and linear layers ↩ Pytorch kaiming init issue ↩ Pytorch kaiming init explained ↩ "
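A quick sketch of the claim that simplified kaiming init keeps the post-ReLU statistics in a reasonable range (MNIST-like sizes assumed; exact numbers vary with the random seed):

~~~python
import math
import torch

m, nh = 784, 50
x = torch.randn(10000, m)                  # normalized input: mean 0, std 1
w1 = torch.randn(m, nh) * math.sqrt(2./m)  # simplified kaiming init for ReLU
t = (x @ w1).clamp_min(0.)
print(t.mean().item(), t.std().item())     # roughly 0.5 and 0.8, not 0 and 1
~~~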
}, {
- "id": 14,
+ "id": 16,
"url": "http://localhost:4000/2020/03/note08-fastai-2/",
"title": "What's inside Pytorch Operator?",
"body": "2020/03/01 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, RefactoringWhat’s inside Pytorch Operator?: Section02 Time comparison with pure Python: Matmul with broadcasting> 3194. 95 times faster Einstein summation> 16090. 91 times faster Pytorch’s operator> 49166. 67 times faster 1. Elementwise op: 1. 1 Frobenius norm: above converted into (m*m). sum(). sqrt() Plus, don’t suffer from mathmatical symbols. He also copy and paste that equations from wikipedia. and if you need latex form, download it from archive. 2. Elementwise Matmul: What is the meaning of elementwise? We do not calculate each component. But all of the component at once. Because, length of column of A and row of B are fixed. How much time we saved? So now that takes 1. 37ms. We have removed one line of code and it is a 178 times faster…#TODOI don’t know where the 5 from. but keep it. Maybe this is related with frobenius norm…?as a result, the code before for k in range(ac): c[i,j] += a[i,k] + b[k,j]the code after c[i,j] = (a[i,:] * b[:,j]). sum()To compare it (result betweet original and adjusted version) we use not test_eq but other function. The reason for this is that due to rounding errors from math operations, matrices may not be exactly the same. As a result, we want a function that will “is a equal to b within some tolerance” #exportdef near(a,b): return torch. allclose(a, b, rtol=1e-3, atol=1e-5)def test_near(a,b): test(a,b,near)test_near(t1, matmul(m1, m2))3. Broadcasting: Now, we will use the broadcasting and removec[i,j] = (a[i,:] * b[:,j]). sum() How it works?>>> a=tensor([[10,10,10], [20,20,20], [30,30,30]])>>> b=tensor([1,2,3,])>>> a,b (tensor([[10, 10, 10], [20, 20, 20], [30, 30, 30]]),tensor([1, 2, 3])) >>> a+btensor([[11, 12, 13], [21, 22, 23], [31, 32, 33]]) <Figure 2> demonstrated how array b is broadcasting(or copied but not occupy memory) to compatible with a. Refered from numpy_tutorial there is no loop, but it seems there is exactly the loop. This is not from jeremy (actually after a moment he cover it) but i wondered How to broadcast an array by columns? c=tensor([[1],[2],[3]])a+ctensor([[11, 11, 11], [22, 22, 22], [33, 33, 33]])s What is tensor. stride()?help(t. stride)Help on built-in function stride: stride(…) method of torch. Tensor instancestride(dim) -> tuple or intReturns the stride of :attr:’self’ tensor. Stride is the jump necessary to go from one element to the next one in the specified dimension :attr:’dim’. A tuple of all strides is returned when no argument is passed in. Otherwise, an integer value is returned as the stride in the particular dimension :attr:’dim’. Args: dim (int, optional): the desired dimension in which stride is requiredExample::* x = torch. tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])`x. stride()>>> (5, 1)x. stride(0)>>> 5x. stride(-1)>>> 1 unsqueeze & None index We can manipulate rank of tensor Special value ‘None’, which means please squeeze a new axis here== please broadcast herec = torch. tensor([10,20,30])c[None,:] in c, squeeze a new axis in here please. 2. 2 Matmul with broadcasting: for i in range(ar):# c[i,j] = (a[i,:]). *[:,j]. sum() #previous c[i] = (a[i]. unsqueeze(-1) * b). sum(dim=0) And Using None also (As howard teached)c[i] = (a[i ]. unsqueeze(-1) * b). sum(dim=0) #howardc[i] = (a[i][:,None] * b). sum(dim=0) # using Nonec[i] = (a[i,:,None]*b). 
sum(dim=0)⭐️Tips🌟 1) Anytime there’s a trailinng(final) colon in numpy or pytorch you can delete it ex) c[i, :] = c [i]2) any number of colon commas at the start, you can switch it with the single elipsis. ex) c[:,:,:,:,i] = c […,i] 2. 3 Broadcasting Rules: What if we tensor. size([1,3]) * tensor. size([3,1])? torch. Size([3, 3]) What is scale???? What if they are one array is times of the other array? ex) Image : 256 x 256 x 3Scale : 128 x 256 x 3Result: ? Why I did not inserted axis via None, but happened broadcasting? >>> c * c[:,None]tensor([[100. , 200. , 300. ], [200. , 400. , 600. ], [300. , 600. , 900. ]])maybe it broadcast cz following array has 3 rows as same principle, no matter what nature shape was, if we do the operation tensor broadcasts to the other. >>> c==c[None]tensor([[True, True, True]])>>> c[None]==c[None,:]tensor([[True, True, True]])>>>c[None,:]==ctensor([[True, True, True]])3. Einstein summation: Creates batch-wise, remove inner most loop, and replaced it with an elementwise producta. k. ac[i,j] += a[i,k] * b[k,j]inner most loop c[i,j] = (a[i,:] * b[:,j]). sum()elementwise product Because K is repeated so we do a dot product. And it is torch. Usage of einsum()1) transpose2) diagnalisation tracing3) batch-wise (matmul) … einstein summation notationdef matmul(a,b): return torch. einsum('ik,kj->ij', a, b)so after all, we are now 16000 times faster than Python. 4. Pytorch op: 49166. 67 times faster than pure python And we will use this matrix multiplication in Fully Connect forward, with some initialized parameters and ReLU. But before that, we need initialized parameters and ReLU, Footnote: TensorRank ti noteResources: Frobenius Norm Review Broadcasting Review (especially Rule) Refer colab! (I totally confused with extension of arrays) torch. allclose Review np. einsum Reviewh "
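The broadcasting and einsum formulations above can be checked against Pytorch's own operator with the same tolerance trick as test_near; a minimal self-contained sketch with assumed small shapes:

~~~python
import torch

a, b = torch.randn(5, 784), torch.randn(784, 10)

def matmul_broadcast(a, b):
    c = torch.zeros(a.shape[0], b.shape[1])
    for i in range(a.shape[0]):
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)  # broadcast row i over b
    return c

def matmul_einsum(a, b):
    return torch.einsum('ik,kj->ij', a, b)

# Rounding errors mean we compare within a tolerance, not exactly.
assert torch.allclose(matmul_broadcast(a, b), a @ b, rtol=1e-3, atol=1e-5)
assert torch.allclose(matmul_einsum(a, b), a @ b, rtol=1e-3, atol=1e-5)
~~~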
}, {
- "id": 15,
+ "id": 17,
"url": "http://localhost:4000/2020/02/note08-fastai-1/",
"title": "What is the meaning of 'deep-learning from foundations?'",
"body": "2020/02/29 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring” Lecture 08 - Deep Learning From Foundations-part2 “ I don’t know if you read this article, but I heartily appreciate Rachael Thomas and Jeremy Howard for providing these priceless lectures for free Homework: Review concepts 16 concepts from Course 1 (lessons 1 - 7)(1) Affine Functions & non-linearities; 2) Parameters & activations; 3) Random initialization & transfer learning; 4) SGD, Momentum, Adam; 5) Convolutions; Batch-norm; 6) Dropout; 7) Data augmentation; 8) Weight decay; 9) Res/dense blocks; 10) Image classification and regression; 11)Embeddings; 12) Continuous & Categorical variables; 13) Collaborative filtering; 14) Language models; 15) NLP classification; 16) Segmentation; U-net; GANS) Make sure you understand broadcasting Read section 2. 2 in Delving Deep into Rectifiers Try to replicate as much of the notebooks as you can without peeking; when you get stuck, peek at the lesson notebook, but then close it and try to do it yourself calculus for machine learning based on weight… einsum conventionCONTENTS: What is going on in this course? What is ‘from foundations’? Steps to a basic modern CNN model Today’s implementation goal: 1) matmul -> 4) FC backward Library development using jupyter notebook jupyter notebook certainly can make module Elementwise ops How can we make python faster? What is element wise operation? FootnoteWhat is going on in this course?: What is ‘from foundations’?: 1) Recreate fast. ai and Pytorch 2) using pure python Evade OverfittingOverfit : validation error getting worsetraining loss < validation loss Know the name of the symbol you usefind in this page if you don’t know the symbol that you are using or just draw it here (run by ML!) Steps to a basic modern CNN model: 1) Matrix multiplication -> 2) Relu/Initialization -> 3) Fully-connected Forward-> 4) Fully-connected Backward -> 5) Train loop -> 6) Convolution-> 7) Optimization ->8) Batchnormalization -> 9) Resnet Today’s implementation goal: 1) matmul -> 4) FC backward: Library development using jupyter notebook: what is assers? jupyter notebook certainly can make module: There will be #export tag that Howard (and we) want to extract special notebook2script. py will detect sign of #expert and convert following into python module and test ittest\_eq(TEST,'test')test\_eq(TEST,'test1') what is run_notebook. py? when you want to test your module in command line interface !python run\_notebook. py 01_matmul. ipynb Is there any difference between 1) and 2)?1) test -> test01 2) test01 -> test #TODO I don’t know yet look into run_notebook. py, package fire Jeremy used. What is that?read and run the code in a notebook, and in the process, Jeremy made Python Fire library called!shockingly, fire takes any kind of function and converts into CLI command. fire library was released by Google open source, Thursday, March 2, 2017 Get data pytorch and numpy are pretty much same. variable c explains how many pixels there are in in MNIST, 28 pixels PyTorch’s view() method: torch function that manipulating tensor, and squeeze() in torch & mathmatical operation similar function Rao & McMahan said usually this functions result in feature vector. In part 1, you can use view function several times. 
Initial Python model: which is linear, like $Xw$ (weight) $+ a$ (bias) $= Y$. If you don't know how to multiply matrices, refer to this matmul visualization site. How much time does it take if we use a pure Python function? matmul, a typical matrix multiplication function, takes about 1 second to process a single training example! (perhaps assuming a stochastic setting, with 5 data points in validation) At that rate it would take about 11.36 hours to update the parameters for even a single layer and a single iteration! (if it were my computer, it would be 14 hours..)🤪 THIS is why we need to consider 'time' & 'space'. This is kinda slow - what if we could speed it up by 50,000 times? Let's try! Elementwise ops: How can we make Python faster?: If we want to calculate faster, remove the pythonic calculation by passing the computation down to something that is written in a language other than Python, like PyTorch. According to the PyTorch docs it uses C++ (via ATen), which is why calling PyTorch's elementwise ops from Python is so much faster. What is an elementwise operation?: the items form pairs, and we operate on the corresponding components. Footnote: notebooks material, video, broadcasting excel"
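For reference, a self-contained sketch of the slow pure-Python baseline this note times (the shapes are illustrative, not the full dataset):

~~~python
import torch

def matmul(a, b):
    # the slow baseline: three nested Python loops
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br, "inner dimensions must match"
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            for k in range(ac):
                c[i, j] += a[i, k] * b[k, j]
    return c

m1 = torch.randn(5, 784)
m2 = torch.randn(784, 10)
assert torch.allclose(matmul(m1, m2), m1 @ m2, rtol=1e-3, atol=1e-5)
~~~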
}, {
- "id": 16,
+ "id": 18,
"url": "http://localhost:4000/2020/02/what-is-convolution/",
"title": "Digging into convolution",
"body": "2020/02/28 - Issues 1) Kaiming Initializtion in Pytorch was in trouble. 1 2) Jeremy started to dig in, in lesson09, but I didn’t know why the size of tensor is 2 and even understand this spreadsheet data. 3 Homework: Read Visualizing and Understanding Convolutional Networks paper What is a convolution? Visualization one kernel Matthew D Zeiler & Rob Fergus Paper Convolution can be represented as matmul Padding Kernel has rank 3 How can we find a side-edge, a gradient and area of constant weight? What is a convolution?: A convolutional neural network is that your red, green, and blue pixels go into the simple computation, and something comes out of that, and then the result of that goes into a second layer, and the result of that goes into the third layer and so forth. Visualization: one kernel Refer this site for visualizing CNN filteringMatthew D Zeiler & Rob Fergus PaperLecture01 Nine examples of the actual coefficients from the **first layer** Convolution can be represented as matmul: CNNs from different viewpoints {align-items: center;} [A B C D E F G H I J] is 3 by 3 image data flatten to vector. As a result, convolution is a just matrix just two things happens Some of entries are set to zeros at all the times same color always have the same weight. That called weight time / wegith sharing So, we can implement a convolution with matrix multiplication. But, we don’t do that because it’s slow!Padding: What most of libraries do is just put zeros asdie of matrix fast. ai uses reflection paddings (what is this? Jeremy said he uttered it)Kernel has rank 3: As standard picture input would be 4 5, it would be actually 3d, not 2d. If we make kernel as a 3x3 size, we pass over same kernel all the different Red, Green, Blue Pixels. This could make problem, because, if we want to detect frog, which is green, we would want more activations on the green(I made a test cell in my colab 6) How can we find a side-edge, a gradient and area of constant weight?: Not top-edge! One kernel can find only the top-edge, so we should stack the kernels 7 So, we pass it through bunch of kernels to the input images, and that process gives us height x width x corresponding number of kernels. Usually that number of chanel is 16 And if we want to get the more channels and features, we should repeat that process This process gives rise to memory out of control, we do the stride #### conv-example. xlsx 2 convolutional filters At a second layer, filter is 3x3x2 tensor, because to add up together the first layer’s channel. Reference: Problem was math. sqrt(5) was not kaiming initialization formula, Implementation in Pytorch ↩ size of tensor, lecture09 ↩ conv-example. xlsx ↩ Why do computer use red, green and blue instead of primary colors ↩ Grayscale is a group of shades without any visible color. … Each of these dots has its own brightness level as well and, therefore, can be converted to grayscale. A grayscale image is one with all color information removed. ↩ Testing RGB and grayscale ↩ stack kernel and make new rank of tensor at output, Lesson06-2019 ↩ "
}, {
- "id": 17,
+ "id": 19,
"url": "http://localhost:4000/2020/02/dps-week8/",
- "title": "Digital Product School week 8&9",
- "body": "2020/02/24 - The 8th week retropect at Digital Product School Week 8/9 - Ship your MVP/Release next iteration each day This week's schedule CONTENT: Preparing engineering weekly Agile Process Daily Stand-up Making application flowchart (feat draw. io) / ER diagram Flowchart, understaning user journey ER diagram Engineering weekly AI lunch Connecting firebase andPreparing engineering weekly: This week at Wednesday, I planned to explain the Language Modelings, mainly focusing ELMo, ULMFiT, BERT and GPT-2. Slides is available here Changed the presentation, because there were people who are not in ML domain. hereWhenever I do the presentation, I learn more than the information I give them. At the same time, I realize I need to learn more than I know. Agile Process: One of a priceless lesson I learnt from digital product school, was experience of doing agile work. Before I came here, it was a little bit vague concept. I’m not sure ‘what is agile’ but this is what we tried to make agile process. Daily Stand-up: Sharing the works everyday helps interdisciplinary team to work better. Since product started to get higher fidelity, the gap between engineer and non-engineer increased. Actually I didn’t planned to explain concept because I thougth I would be lose my audience when I start to explain. But as daily stand-up, which shares our progess, goes day by day, I planed and reported the issues. And it made each other’s topic feel more familiar. I think point is very important, because at that point people start to be curious. So we can actively ask to the others, and that momwnr, we can explain the point teammate dosen’t know. Each color means every different section. Red: Our team goal, Blue: Interaction designer, Green: Product manager, Yellow: Software/AI engineer This week engineer's main plan Each of us try to explain what we are doing, but things become easier when we are asked. Because we explained something was important to us before, but if we asked it is something important for the others. Making application flowchart (feat draw. io) / ER diagram: Before we start the party, we should clarify the flowchart and ER diagram of our application. Flowchart, understaning user journey: Thanks for google, we could use draw. io for our framechart framework. Actually, we cana choice other good flatform, but draw. io has connected app throgh google drive, most of our engineer was used to it. And after this job, I got to know there is also (of course) rule with the symbols, color, size, space, scaling and direction of arrow -reference. But why we should do this? WE have made our storymap before!! I think storymap is for visualize our status and app. So it should be shared with whole the team, and they should able to understand each role’s issue. But flowchart is more like testing technical feasibility, and error that user can experience. So it could be little more specific, complicated, and hypothetical. This week engineer's main plan ER diagram: Even if we use NoSQL database through firebase, my team was accustomed to SQL more. That what we educated when we were at college, so we had to organize our concept while we were learning NoSQL. Engineering weekly: Every engineering weekly we exchange our knowledge each other so that we can grow together. Before today, my AI collegues presented regression, knn and it was my turn. I prepared slide that explain about pre-trained language model, but my header advised me if I go deep of theoretical things, I would lose my audience. 
So I decided to brief BERT mode, how I can contribute to other team’s project. Since BERT was breakthrough of NLP industry, I tried to explain how it can be applied to hands on product and how it can help people in their product. The result was quite motivative to me. They gave feedback that since it wasn’t that much theoretical, they could enjoy it, and useful information. Someone asked me do I had learned of presentation before. I was really happy with their feedback! AI lunch: Connecting firebase and: "
+ "title": "My life in Digital Product School - week 8/19/10",
+ "body": "2020/02/24 - The 8/9/10th week retropect at Digital Product School Week 8 - Ship your MVPWeek 9/10 - Release next iteration each day Week 8th schedule CONTENT: Agile Product Development Daily Stand-up(planning) Gemba Walk Sprint Reviews Engineering weeklyAgile Product Development: One of a priceless lesson I learnt from digital product school, was experience of doing agile work. Before I came here, it was a little bit vague concept. I’m still not sure ‘what is agile’ but this is how we tried to make agile process. Daily Stand-up(planning): Sharing the works everyday helps interdisciplinary team to work better. Since product started to get higher fidelity, the gap between engineer and non-engineer increased. Actually I didn’t planned to explain concept because I thougth I would be lose my audience when I start to explain. But as daily stand-up, which shares our progess, goes day by day, I planed and reported the issues. And it made each other’s topic feel more familiar. I think point is very important, because at that point people start to be curious. So we can actively ask to the others, and that momwnr, we can explain the point teammate dosen’t know. Each color means every different section. Red: Our team goal, Blue: Interaction designer, Green: Product manager, Yellow: Software/AI engineer This week engineer's main plan Each of us try to explain what we are doing, but things become easier when we are asked. Because we explained something was important to us before, but if we asked it is something important for the others. Gemba Walk: Team Cero with core team Every 2 weeks, we do the Gemba work, which is ‘question everything to the core team’ time. At this period, people can ask anything related to our product, workshop, and framework. Core team will help just for each team, and each team can solve the problem related to their work. < br/>Why we need this session? because with workshop and general schedule, core team has no time just focus on each team. So through this session, we can have opportunity to understand each program and workshop, like why we are using this platform, and when is the due of our small project, and we have this problem and we need help for this. whatever small problem you have, core team is always willing to help you. Sprint Reviews: Every Friday, we have time to summarise what we did for the week. Maybe we need HMW question and our storymap to share our process and then tell and share what we did try, what point we succeeded and what point it was deviant of our prediction, and why we tried it. . Sprint of Ve-link And then, just after all team’s ppt, we do vote with such a cute marvel. Always it’s very difficult to vote (of course you can’t vote to your team!) Because it depends on criteria what do I value!But since this is process of our agile work, I try to focus on what they have changed since last week, and why they did it, how they did it. Engineering weekly: Every engineering weekly we exchange our knowledge each other so that we can grow together. Everyone have their knowledge to share and we can be tutor and at the same time can be of tutee. Previously, my AI collegues presented regression, knn. And because I’m somewhat specialized to NLP, I prepared slide that explain about pre-trained language model, but my header advised me if I go deep of theoretical things, I would lose my audience. So I decided to brief BERT mode, how I can contribute to other team’s project. 
Since BERT was breakthrough of NLP industry, I tried to explain how it can be applied to hands on product and how it can help people in their product. The result was quite motivative to me. They gave feedback that since it wasn’t that much theoretical, they could enjoy it, and useful information. Someone asked me do I had learned of presentation before. I was really happy with their feedback! "
}, {
- "id": 18,
+ "id": 20,
"url": "http://localhost:4000/2020/02/fast.ai-nlp-note-16/",
"title": "Algorithmic bias",
"body": "2020/02/20 - Algorithms can encode & magnify human bias Case Study 1: Facial Recognition & Predictive Policing: Joy Buolamwini & Timnit Gebru, gendershades. org Microsoft, FACE+, IBM - All of these things are sell now. Largest gap between $\therefore\ Lighter Male\ >\ Darker\ Female $ This US mayor joked cops should “mount . 50-caliber” guns where AI predicts crime With machine learning, with automation, there’s a 99% success, so that robot is ㅡwill beㅡ99% accurate in telling us what is going to happen next, which is really interesting. - city official in Lancater, CA, approving on using IBM for public security Bias: Bias is type of error Statistical Bias: difference between a statistic’s expected value and the true value Unjust Bias: disproportionate preference for or prejudice against a group Unconscious bias: bias that we don’t realize we have But, term bias is too generic to be productive. Different sources of bias have different causes Representation Bias: Dataset was not representative of the algorithm that might be used on later. Above : Data is okay, but algorithm has some problem. Below : Data has error. For example, object detection production that performs very well in common product of US. But in contrast, change of target product region, like Zimbabwe, Solomon Island, and so on, reduced the performence remarkably. It is not the algorithmic problem, so we should care about data volume of region. Evaluation Bias: Benchmark datasets spur on research, 4. 4% of IJB-A images are dark-skinned women. 2/3 of ImageNet images from the West (Sharkar et al, 2017) Case Study 2: Recidivism Algorithm Used Prison Sentencing: Case Study 3: Online Ad Delivery: Bias in NLP: ( Nothing to do with the course, but I’m researching this field these days. ) But all about Englsih ImpactThe person is doctor. The person is nurse -> 그는 의사다. 그녀는 간호사다. Concept of “biased data” often too generic to be useful: Different sources of bias have different sources Data, models and systems are not unchanging numbers on a screen. They’re the result of a complex process that starts with years of historical context and involves a series of choices and norms, from data measurement to model evaluation to human interpretation. - Harini Suresh, “The problem with Biased Data” Five Sources of Bias in ML: Representation Bias Evaluation Bias Measurement Bias Aggregation Bias(46:02) Historical Bias(46:26) A few studies(47:13) Racial Bias, Even when we have good intentions(new york times)(47:10) gender(48:59) Humans are biased, so why does algorithmic bias matter?: Algorithms & humans are used differently (humans are usually decision maker) Algorithms are accurate and objective No way to apeal if there if error processed large scale cheap Machine learning can amplify bias Machine learning can create feedback loops. Technology is power. And with that comes responsibility. Solutions: Analyze a project at work/school: Questions about AI 5 types of bias (Suresh & Guttag) Datasheets for datasets, Modelcards for model reporting Accuracy rate on different sub-groups Work with domain experts & those impacted Increase diversity in our workspace Advocate for good policy Be on the ongoing lookout for bias"
}, {
- "id": 19,
+ "id": 21,
"url": "http://localhost:4000/2020/02/classifier-city/",
"title": "Making a classifier with image dataset made from gooogle",
"body": "2020/02/15 - CONTENTS: Creating dataset from google images Using google_images_download Create ImageDataBunch Train model fit_one_cycle() Let’s find-tune Let’s train the whole model! Let’s make batch size bigger! Interpretation Model in productionCode can be found hereDeployed model here Making a classifier which can distinguish Seoul from Munich and Sanfrancisco!(hoping my well in Munich!) Creating dataset from google images: In machine learning, you always need data before you build your model. You can use either URLs or google_images_download package. Since Jeremy explained specifically, I will try the other. Using google_images_download: note: This is not google official package Refer to Official Doncument, put that arguments. from google_images_download import google_images_downloadresponse = google_images_download. googleimagesdownload() #class instantiationout_dir = os. path. abspath('. . /. . /materials/dataset/pkg/')os. mkdir(out_dir)arguments = { keywords : Cebu,Munich,Seoul , print_urls :True, suffix_keywords : city , output_directory :out_dir, type : photo , }paths = response. download(arguments) #passing the arguments to the functionprint(paths)and if you need, here is main code. Create ImageDataBunch: We need to separate validation set because we just grabbed these imagese from Google. Most of the dataset we use (kaggle/research) splited into train / validation / test so if they are not devided beforehand we should make databunch, and Jeremy recommended assign 20% to validation. Help on function verify_images in module fastai. vision. data:verify_images(path: Union[pathlib. Path, str], delete: bool = True, max_workers: int = 4, max_size: int = None, recurse: bool = False, dest: Union[pathlib. Path, str] = '. ', n_channels: int = 3, interp=2, ext: str = None, img_format: str = None, resume: bool = None, **kwargs) Check if the images in `path` aren't broken, maybe resize them and copy it in `dest`. Data from google image url Data from package Train model: len(class) len(train) len(valid) Data_url 3 432 108 Data_pkg 3 216 53 Uisng model: restnet34 1, Measurement: accuracy 2 fit_one_cycle(): What is fit one cycle? Cyclical Learning Rates for Training Neural Networks One of the way to find good learning rate. Core idea is to start with small learning rate (like 1e-4, 1e-3) and increase the learning rate after each mini-batch till loss starts exploding. And pick up learning rate one order lower than exploding point. For example, plotted learning rate is like below picture, picking up around 1e-2 is the best way. Why this methods Traditionally, the learning rate is decreased as the learning starts converging with time. But this paper suggests to cycle our learning rate, because it makes us avoid local minimum. Basically this cyclic method enables us to explore whole of loss function so that find out global minimum. In other words, higher learning rate behaves like regularisation. Let’s find-tune: Do train just one last layer by learning rate found by find_lr This section you should find the strongest downward slope that kind of sticking around for quite a while. And choose just one order lower than lowest point. As explained before, I will pick up 1e-2. And of course, this is fine-tuning, we don’t need discriminative learning rate yet. Let’s train the whole model!: link When you plot the learning rate again, maybe you will get soaring shape of learning rate. Rule of thumb, When you slice the learning rate, use learning rate you used at unfrozen part. 
Divide it by 5 or 10 and put it on maximum bound. At minimum bound, get the point just before it soared, and divide it by 10. Let’s make batch size bigger!: Since default batch size is 64, I tried it to 128. And it gets way more better result(even it’s still underfitting!) And if I freeze model and train whole model again, the model would be better. Also, you can use this method to the other big dataset model training! Interpretation: See the confusion matrix. Result is quite great. *Since I’m using colab, I will skip data cleansing. But I highly recommend you to use ImageCleaner widget, only if you are using jupyter notebook (not jupyter lab) Model in production: You can deploy your model in simple way. I referred fast. ai, and used render(it’s free for limited time). You can find detailed document here. and you can create a route like this. @app. route( /classify-url , methods=[ GET ])async def classify_url(request): bytes = await get_bytes(request. query_params[ url ]) img = open_image(BytesIO(bytes)) _,_,losses = learner. predict(img) return JSONResponse({ predictions : sorted( zip(cat_learner. data. classes, map(float, losses)), key=lambda p: p[1], reverse=True ) })You can find my deployed model here Reference: How to create a deep learning dataset using Google Images towardsdatascience - one cycle policy Deep Residual Learning for Image Recognition ↩ Accuracy_and_precision ↩ "
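Putting the steps above together, a minimal sketch assuming fastai v1, with `path` pointing at the downloaded image folders (one folder per city, used as the label); the seed, image size, and epoch count are my own illustrative choices:

~~~python
from fastai.vision import *

np.random.seed(42)                      # fix the random 20% validation split
data = ImageDataBunch.from_folder(
    path, train=".", valid_pct=0.2,     # hold out 20% for validation
    ds_tfms=get_transforms(), size=224, bs=64,
).normalize(imagenet_stats)

learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(4)                  # train the head with the one-cycle policy
learn.lr_find()                         # then inspect with learn.recorder.plot()
~~~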
}, {
- "id": 20,
+ "id": 22,
"url": "http://localhost:4000/2020/02/dps-week5/",
"title": "Digital Product School week 5",
"body": "2020/02/09 - The 5th week retropect at Digital Product School Week 5 - Create a Storymap and sync it with Lean Canvas This week's schedule CONTENT: How to create our story map Prepare your story Discover your product’s AI potentialMondayHow to create our story map: We need this 'aha' moment There was a Milestone workshop, about our weekly goal. As we are agile working, we go fast and change every week’s goal. This week we will finalize our story map based on user’s pain-point and HMW questions. How should we make our story-map Basically we should make story map based on this rule Tell stories, don’t just write them! We always need context, that means all the story component should be connected Visualize your product to establish a shared understanding and speed up discussions! Post-it filled of text is not enough, we should fill it with visualizations then team mates can understand it fast Only discuss in front our your story map! (Speed) So we can update our story-map as soon as we change our opinion And also Use a story map to find the parts that matter most and to identify holes in your idea! Since the story map consists of techinical part, we should consider each story’s technical feasibility Minimise output, maximise outcome and impact! Build tests to figure out what’s minimum and what’s viable! This story map functions to find out our minimum value of ideas Work iteratively: Change your story map according to your learnings! We should repeat this process again and again PMs: Make sure Storymap is up to date!Prepare your story: team cero, our whole story map Our goal Technical feasibility of our storyWhat is your strategy to make user achieve something? This would be our expand point Discover your product’s AI potential: How can we apply AI to our product? Let’s write down our ‘HMW’ questions, and find out all p ossibilities. These are suggestion of possibilities, so don’t attached to feasibility (we will do in at lean start-up) Software section's expectation AI section's expectationTuesday Engineer's task, week5This 5th week, engineers settled WendesdayThursdayFriday"
}, {
- "id": 21,
+ "id": 23,
"url": "http://localhost:4000/2020/02/GPU-time/",
"title": "4 reasons took much time to setting GPU for fast.ai than I expected",
"body": "2020/02/05 - Motivation: Before now, me as a undergraduate student, I was parsimony who usually depend on colab, kaggle, friend’s server(occasional) whenever i need GPU. . And this time it’s been for a while to install GPU than I expected and I share the several component that stood in my way. Written at Oct 24 2019, if you think this is deprecated, please do not have a leap of faith. Just for the record, I’ve used Kaggle, Colab, GCP, Azure, EC2 as GPU cloud. 1. Did not know there is JupyterLab option in Google Cloud Platform. : At the first time when GCP came out, there was no AI Platform service. So from starting vm instance to launching jupyter and installing packages, I did all of the things myself. (and I learned 🤗) $ curl -O https://repo. continuum. io/archive/Anaconda3-5. 0. 1-Linux-x86_64. sh[Downloading conda in ssh] I created VM instance,selected zone, machine type and disk type. Then, define firewall rules and in ssh terminal, install jupyter and other packages. But you can do all of these things just using AI Platform. [AI Platform] I think it especially save your time if you are living in Asia-Pacific, which google doesn’t support not that much GPU resources. 2. Consider if the platform has limited resources in a region you live in. : I live in South Korea, East Asia, and it seems like this region has lots of limitation in GPU (except quite expensive AWS) And the Taiwan which was the only one region where I can launch my own VM with GPU (I tried all the other regions in the list) sometimes do normaly, but not always. 😥After launching, I did several works and next day I could not start VM. (I didn’t count it, but tried it a few hours because I didn’t want cost any more time…) Endlessly failed to start instance, then I choose to move AWS as an alternative way. 3. Fast. ai gives deliberate guide and I didn’t know it. : Fast. ai offer the guide for all available platform. (Colab, salamander, Gradient, Kaggle, Colab, and so on) It is so important, and really needs, because cloud computing options are vary as occasion and purpose arise. I didn’t know fast. ai has manual to running GCP, and I think it’s as good a reason as any for me to be have taken time. It helped me so much when I had aws and shortened my time. I don’t want to read all of the manual in amazno. . (It is recommended. . but I’d rather read GIT PRO now…) ssh -i ~/. ssh/<your_private_key_pair> -L localhost:8888:localhost:8888 ubuntu@<your instance IP>4. You should wait to add more volume just after add volume, by building AWS EC2. : Since Elastic Block Store(EBS) storage supports optimized storage, users can’t extend storage volume two times in a row. Unfortunately, at the first time, I didn’t know it (again 👻) and when VM lacked volume, I doubled dist capacity (76*2) at a rough but It needs more. <!– this time I installed GPU in two years, and it became little complicated compared to 2 years ago. And this time for the first time(maybe not the first time. . but i handled it in my class or with my friend. but it’s my first time on my own. ) I very I’m started to using used google colab, kaggleand, GCP-JupyterLab, ec2 - friend made, aws vm machine but I had a environment variable but i did not know of it. On these days, I could not get a resources from taiwan… I couldn’t notice a deliberate Anyway, as a result I tried myself gcp myself and aws ec2 with fast. 
ai But I think doing on my self surely takes much time (in this point I wonder why I’m doing this, and should remind me, especially I was studying disk volume optimization) disk volume exceed - https://askubuntu. com/questions/919748/no-space-left-on-device-even-though-there-is: "
}, {
- "id": 22,
+ "id": 24,
"url": "http://localhost:4000/2020/02/dps-week4/",
"title": "Digital Product School week 4",
"body": "2020/02/01 - The 4th week retropect at Digital Product School Week 4 - Find solution ideas and run experiments [This week’s schedule] CONTENT: Ideation Techniques What is ideation techniques? Generating idea in my team AIdeation Team brain storming of idea Die Produkt MacherMondayIdeation Techniques: [slides from @steffen] What is ideation techniques?: We tried to find out user’s painpoint last week. Tried to users talk about their, pain point. No question directly, but extract from them their pain with transportation. Generating idea in my team: AIdeation: TuesdayTeam brain storming of idea: Based on generated idea on Monday, we extended our idea doing rolling-paper! Die Produkt Macher: What is lean start-up? Lean startup is a methodology for developing businesses and products that aims to shorten product development cycles and rapidly discover if a proposed business model is viable; this is achieved by adopting a combination of business-hypothesis-driven experimentation, iterative product releases, and validated learning. - wikipedia WendesdayThursdayFriday"
}, {
- "id": 23,
+ "id": 25,
"url": "http://localhost:4000/2020/01/retrosprect-of-acl-paper-2020/",
"title": "Retrospect of ACL 2020 paper writing",
"body": "2020/01/29 - 2020 Annual Conference of the Association for Computational Linguistics Why I can’t use ‘Cebuano’ for the research?: Why I had to change target language from ‘Cebuano’ to ‘Tagalog’?-> No language translator options except google translation. But before knowing that I already consult my friend, whose mother tongue is English. So I had to aplogize her, but couldn’t tell her why suddenly I changed my plan. -> I realized there are many languages even can’t be researched at all. . -> Getting accustomed to discrimination makes misunderstanding, sometimes. At my country, we couldn’t use music streaming service, because of legal problem. But at that moment, I thought it was discrimination, which is done by music company. "
}, {
- "id": 24,
+ "id": 26,
"url": "http://localhost:4000/2020/01/Git-Merge/",
"title": "Why am I not listed as a contributor?!",
"body": "2020/01/10 - From the end of last year, big changes have witnessed in NLP research. Embracing an unprecedented growth, I started to study new exciting results and advances. In doing so, I noticed I’m not listed as contributor of repo which my PR accessed. How did I come to a repository?: When I’m stuck, I would prefer to code, than to go deep in theory. (It must be so. . too much to understand 🤒)It was BERT released by Google AI I felt keenly the necessity of implementing, because not only couldn’t understand the way they figured out positional encoding formula, but how it actually works. What does it mean to “scale” dot product in Attention? (Now I know it’s far from my section 😂) Figure 1. Scaled Dot Product. Adopted from tensorflow blogWhat was the code error?: For implement code in paper, I read the papers Transformer and BERT, structured the model, and refered the others’ code. Meanwhile, I found out a small error in tokenization process, which was changing a token into [MASK], enabled bidirectional representation. I’ve made PR, and got merged. But I was not in contributors. Why?: Figure 2. Merged Pull request Adopted from graykode projectActually I happened to know there can be couple of reasons github doesn’t include my name as contributor. Well, if contributors tab has more than 100 people, in which case it shows you up only if you are in the top 100 contributors because displaying too many contributors can make webpages down. Somethimes, however, it doesn’t that problem. Why not? Two possibilities are there. First, According to Joel-Glovier, if repository maintainer merged-as-a-rebase PR will end up showing as maintainer’s commit. But maintainer shouldn’t normally do this. Second, if you happend to commit using a different git email that what is in your GitHub profile, it will not be attached to your Github user, and “doesn’t show up” as you. Reference: Michał Chromiak’s blog Github: why are my contributions are not showing on my profile atlassian-gitfetch"
}, {
- "id": 25,
- "url": "http://localhost:4000/2019/12/lesson1-fastai/",
- "title": "Fine Grained Classification",
- "body": "2019/12/31 - Finally you can solve the mystery behind this weird drawing. . through this course. juptyer notebook magic: %reload_ext autoreload%autoreload 2%matplotlib inlinethis is special directives to jupyter notebook, not python code. And it is called ‘magics’ (but i think jeremy is magicion) If somebody changes underlying library code while I’m running this, please reload it automatically If somebody asks to plot something, then please plot it here in this Jupyter NotebookDon’t hesitate to import start~ Digging into untar_data, path. ls: Union[pathlib. Path, str]: typed programming language? -> maybe i think disclaim the type beforehand for sure. Q. like assert? path. ls()this is some module that fast. ai made because os. listdir(‘path’) is unconvinient. Python3 pathlib library!: pathlib "
- }, {
- "id": 26,
+ "id": 27,
"url": "http://localhost:4000/2019/12/jeremy-howard/",
"title": "Jeremy Howard",
"body": "2019/12/15 - This is journey to find out ‘who am I trying to be?’: How he impacted me? The person who made me start Computer Vision again. He emphasized the importance of studying NLP and Computer together to understand the deep-learning. He didn’t order it to study, but always he pursuade me with reasonable way. “It’s not just something I can throw away. NLP and computer vision a few weeks apart and that’s going to force your brain to realize like ‘oh I have to remember this’” He made me admit my failure in deep-learning. I started to objectify where am I. What should I do when I’m frustrated. “Keep going. You’re not expected to remember everything. Yet. You’re not expected to understand everything. Yet. You’re not expected to know why everything works. Yet. ” His articles are numerous, below. What is torch. nn Really? High Performance Numeric Programming with Swift: Explorations and Reflections C++11, random distributions, and Swift And especially, I like this book. Designing great data products Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing. Designing great data products And he is also famous for words. Here are some. we’re going to try and use that to really understand what’s going on. So to warn you, none of it is rocket science but a lot of its going to look really new. So don’t expect to get it the first time but expect to listen and jump into the notebook try a few things test things out look particularly at like tensor shapes and inputs and outputs to check your understanding then go back and listen again. But and kind of try it, a few times, because you will get there right, it’s just that there’s going to be a lot of new concepts because we haven’t done that much stuff in pure Pytorch. Lesson 6: Deep Learning 2019 "
}, {
- "id": 27,
+ "id": 28,
"url": "http://localhost:4000/2019/11/julia-evans/",
"title": "Julia Evans",
"body": "2019/11/20 - This is journey to find out ‘who am I trying to be?’: The women who surprised me in many ways. First, she approached me to teaching some concepts drawing cartoons. It was at Hackers news, which was hightest ranks. Personally I have the use of not to reading title, so and cartoon was so cute and clear. I naturally gonna understood mechanism and astonished by her explaination ability. Her value, which she was taught by many people so want to do same things, moved me. Volume of her knowledge, that just reading post title is a deal of work, amazed me. "
}, {
- "id": 28,
+ "id": 29,
"url": "http://localhost:4000/2019/11/coc-retropective/",
"title": "Retrospective on Pycon 2019 Korea (CoC Committee)",
"body": "2019/11/05 - When I was volunteer, it seems like busy and hectic to managing that crowded conference. In my experience, to get things moving, it needs hierarchy. But it didn’t. Organizers emphasized our responsibility, and if I passed each other’s burden, It could be my burden next time. In solidarity of the obligation, we finished conference well. And after participating PyCon Korea 2018 as volunteer, I’ve joined PyCon Korea Organizer last year. <Figure 1> First meeting of PyCon 2019 Korea Organizers It’s been a while since PyCon 2019 finished. It’s held on Aug 15 - 18, at Coex Grand Balloom <Figure 2> Ongoing session, speaking on news comment processing <Figure 3> Sponsor Booth iin Coex Hall <Figure 4> After PyCon 2019, with all of volunteer, organizer, speakers 😍 🥰 Serving as part of the coc TF, I spent large fraction of last year doing CoC job. here’s the path what we’ve been grappled with to grasp a solution. First half: Before the conference Toward Diverse Community: Formally we’ve been reusing and modifying PyCon US CoC, but we needed fit in Korean and I was part of that to revise code of conduct. Except ‘That’ Diversity, Because it is ‘Harassment’: Specific point was harassment, and the others were not. process of finding the points. How can we settle this point?Second half: During the conference Handling the potential Harassment: Disjunction of policy and real-time situation: This ‘PyCon 2019 Korea retrospective series’ would be devided into 3 Episodes. “Retrospective on Pycon 2019 Korea (CoC Committee)” “Retrospective on Pycon 2019 Korea (Program Chair)” (20 Nov, To Be Update) “Maintaining participation while still making timely decisions” (29 Nov, To Be Update)"
}, {
- "id": 29,
+ "id": 30,
"url": "http://localhost:4000/2019/11/elif-shafak/",
"title": "Elif Shafak",
"body": "2019/11/05 - This is journey to find out ‘who am I trying to be?’: For creative-minded people, Istanbul is a treasure. ’ Photo © Chris Boland, licensed under CC BY-NC-ND 2. 0 it suddenly felt like what I was trying to convey was more complicated and detailed than what the circumstances allowed me to say. And I did what I usually do in similar situations: I stammered, I shut down, and I stopped talking. I stopped talking because the truth was complicated, even though I knew, deep within, that one should never, ever remain silent for fear of complexity. <Figure 1> Elif Shafak Photo credit: www. elifsafak. com. tr I want to talk about emotions and the need to boost our emotional intelligence. I think it’s a pity that mainstream political theory pays very little attention to emotions. Oftentimes, analysts and experts are so busy with data and metrics that they seem to forget those things in life that are difficult to measure and perhaps impossible to cluster under statistical models. But I think this is a mistake, for two main reasons. We are emotional beings. I think it’s going to be one of our biggest intellectual challenges, because our political systems are replete with emotions. In country after country, we have seen illiberal politicians exploiting these emotions. And yet within the academia and among the intelligentsia, we are yet to take emotions seriously. I think we should. 1 2 Reference: British Council Worldwide ↩ Ted Talk ↩ "
}, {
- "id": 30,
+ "id": 31,
"url": "http://localhost:4000/2019/01/dps-week1/",
"title": "Digital Product School week 1",
"body": "2019/01/11 - The 1th week retropect at Digital Product School [This week’s schedule] CONTENT: Welcome to Digital Product School! Trip to Spitzingsee Welcome to Design Office Specifying our goal of product Welcome to Digital Product School!: Trip to Spitzingsee: At the first day of Digital Product School, we had a off-site with all of batch 9 people. All the costs were managed by dps. At the beautiful mountain, we settled the team, and got my team goal. Basically, there are two kind of team in DPS. (1) Wild team - the team has fixed topic(2) Company team - the team which has specific stakeholders, and also topic defined by that stakeholders The Core-team will fix what team you will join in DPS for 3 months based on ymy professionals, they announce it at off-site. [My team for 3 months at DPS] And we decide on my batch #9 theme song. How? Each team draw for songs and pitch ‘why this song should be batch #9 theme song’The result? Imagine dragon - Believer (I didn’t know at the moment, this song would be stamped in my memory) We have a workshop for getting to know each other. For example, we share 1) what do I expect from 3 months of dps, 2) when I feel happy in my life time, 3) what I worked for last week, 4) what was my last project and 5) what plays important role in my life My team's board Cero Welcome to Design Office: At first day of design office, we had workshop, which celebrates my day in dps also discuss specific rule, menifesto and stakeholders We get sticker and attach it in map depends on my nationality Now time to get to know my team’s stakeholders. What they want for us? What they expect from us? How free my team are on the topic?To be honest, it is endless tug-of-war. We should discuss with my stakeholders, endlessly, and find out solution which can meet interest of users, stakeholders and my team. Basically, my team’s main stakeholder is ADAC, but BMW, City of munich and Nokia will also participate as my team’s stakeholders. Specifying our goal of product: "
diff --git a/_site/2020/02/GPU-time/index.html b/_site/2020/02/GPU-time/index.html
index 0e58c9da21..2511842bff 100644
--- a/_site/2020/02/GPU-time/index.html
+++ b/_site/2020/02/GPU-time/index.html
@@ -19,9 +19,9 @@
-
+
+{"description":"Motivation","author":{"@type":"Person","name":"dionne"},"@type":"BlogPosting","url":"http://localhost:4000/2020/02/GPU-time/","publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"http://localhost:4000/assets/images/logo.png"},"name":"dionne"},"image":"http://localhost:4000/assets/images/10.png","headline":"4 reasons took much time to setting GPU for fast.ai than I expected","dateModified":"2020-02-05T00:00:00+09:00","datePublished":"2020-02-05T00:00:00+09:00","mainEntityOfPage":{"@type":"WebPage","@id":"http://localhost:4000/2020/02/GPU-time/"},"@context":"http://schema.org"}
@@ -161,96 +161,101 @@
"body": " {% if page. url == / %} {% assign latest_post = site. posts[0] %} <div class= topfirstimage style= background-image: url({% if latest_post. image contains :// %}{{ latest_post. image }}{% else %} {{site. baseurl}}/{{ latest_post. image}}{% endif %}); height: 200px; background-size: cover; background-repeat: no-repeat; ></div> {{ latest_post. title }} : {{ latest_post. excerpt | strip_html | strip_newlines | truncate: 136 }} In {% for category in latest_post. categories %} {{ category }}, {% endfor %} {{ latest_post. date | date: '%b %d, %Y' }} {%- assign second_post = site. posts[1] -%} {% if second_post. image %} <img class= w-100 src= {% if second_post. image contains :// %}{{ second_post. image }}{% else %}{{ second_post. image | absolute_url }}{% endif %} alt= {{ second_post. title }} > {% endif %} {{ second_post. title }} : In {% for category in second_post. categories %} {{ category }}, {% endfor %} {{ second_post. date | date: '%b %d, %Y' }} {%- assign third_post = site. posts[2] -%} {% if third_post. image %} <img class= w-100 src= {% if third_post. image contains :// %}{{ third_post. image }}{% else %}{{site. baseurl}}/{{ third_post. image }}{% endif %} alt= {{ third_post. title }} > {% endif %} {{ third_post. title }} : In {% for category in third_post. categories %} {{ category }}, {% endfor %} {{ third_post. date | date: '%b %d, %Y' }} {%- assign fourth_post = site. posts[3] -%} {% if fourth_post. image %} <img class= w-100 src= {% if fourth_post. image contains :// %}{{ fourth_post. image }}{% else %}{{site. baseurl}}/{{ fourth_post. image }}{% endif %} alt= {{ fourth_post. title }} > {% endif %} {{ fourth_post. title }} : In {% for category in fourth_post. categories %} {{ category }}, {% endfor %} {{ fourth_post. date | date: '%b %d, %Y' }} {% for post in site. posts %} {% if post. tags contains sticky %} {{post. title}} {{ post. excerpt | strip_html | strip_newlines | truncate: 136 }} Read More {% endif %}{% endfor %} {% endif %} All Stories: {% for post in paginator. posts %} {% include main-loop-card. html %} {% endfor %} {% if paginator. total_pages > 1 %} {% if paginator. previous_page %} « Prev {% else %} « {% endif %} {% for page in (1. . paginator. total_pages) %} {% if page == paginator. page %} {{ page }} {% elsif page == 1 %} {{ page }} {% else %} {{ page }} {% endif %} {% endfor %} {% if paginator. next_page %} Next » {% else %} » {% endif %} {% endif %} {% include sidebar-featured. html %} "
}, {
"id": 12,
+ "url": "http://localhost:4000/2020/04/v3-2019-lesson06-note/",
+ "title": "fastai 2019 course-v3 Part1, lesson06",
+ "body": "2020/04/15 - Lesson 06Rossmann(Tabular): Tabular data: be careful on Categorical variable vs Continuous variable. if datatype is int, fastai think it is classification, not a regression. Root mean square percentage error. as loss function. When you assign the y_range, it’s better to assign little bit more than actual maximum. > because it’s sigmoid. intermediate layers, which is weight matrix is 1) 1000, and 2) 500 -> which means our parameter would be 500*1000. learn. modelWhat is dropout and embedding dropout?: Nitish Srivastava, Dropout: A Simple way to prevent Neural Networks from Overfitting you can dropout with p value, make it specified to specific layer, or make it applied to all the layers. Pytorch code 1) bernoulli, which decides whether you will hold it? 2) and divide the noise value depends on noise value. so noise became 2 or remain 0. According to pytorch code, We do change at training time, but we do nothing at test time. and this means you don’t have to do anything special with inference time. ’ TODO: find at forums what is inference time - Related to NVIDIA, GPU. Embedding dropout is just a dropout. It’s different between continuous variable and embedding layer. TODO Still can’t understand. why embedding dropout is effective. or,… in need. Let’s delete at random, some of the results of the embedding. and It worked well especially at Kaggle Batch Normalization: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift -> came out false! According to How Does Batch Normalization Help Optimization? The key was multiplicative bias {\gamma} and additive bias {\beta}` Explain Let $$ \hat{y} = f(w_1, w_2, w_3, … , x)} $$ , loss = MSE , Then y_range should be between 1 and 5` And Activation function ends with -1 -> +1 To mitigate this problem, we can add the other parameter, like $$w_n$$ But there’re so much interactions in the process so just re-scale the output. Momentum parameter at BatchNorm1d: Different from momentum like in optimization. This momentum is Exponentially weighted moving average of the mean, instead of deviation. If this is small number: mean standard deviation would be less from mini_batch to mini_batch » less regularization effect. (If this is large number, variation would be greater from mini_batch to mini_batch » more regularization effect) TODO: can’t sure, but i understand, this is not about how to update parameter but about how much reflect previous value when scale and shift Q. Preference between batchnorm and the other regularizations(drop out, weight decay)A. Nope, always try and see the results## lesson6-pets-more### Data Augmentation- Last reg- `get_transforms` has lots of params (even not yet learned all) -> check documentation - Remember you can implement all the doc contents bc it's made from nbdev - TODO: try this!!- Essence of data augmentation is you should maintain the label, while somewhat making sense. - ex) tilt, because it's optically sensible, you can always change the angle of the data view. - zeros, border, and reflection but always `reflection` works most of the time, so that is the default### Convolutional Kernel(What is convolution?)- Will make heat\_map from scratch, which means the parts convolution focuses on![setosa_visualization]()- http://setosa. io/ev/image-kernels/ - javascript thing - How convolution works - Kernel. which does element-wise multiplication, and sum them up - so it has on pixel less at borders -> so it uses padding, and fastai uses reflection as said. 
- why this Kernel(matrix) helps catching horizontal edge side? - because this kernel`(picture2)` weights differently, depends on `x axis` - why familiar, because it's similar intuition with fugus`(paper)` paper- CNN from different viewpoints`link` - output of pixel is results from different linear equations. - If you connect this with represents of neural network nodes, you can see that the specific inp nodes connected with specific out nodes. - **Summarize**: cnn does 1) matmul some of the elements are always zero 2) same weight for every row, which is called `weight time? weight. . ?, 1:18:50` `(picture)`#### Further lowdown- Because generally image has 3 channels, we need rank 3 kernel. - And **do multiply with all channel output is one pixel**. (`draw by your self`) - but this kernel will catch one feature, like horizontal, so that we make more kernel so that output becomes (h * w * kernel) - And that `kernel` come to `channel`- **Conv2d**: with 3 by 3 kernel, stride 2 conv -> (h/2 * w/2 * kernel) - skip or jump over input pixel - to protect from memory out of control~~~pythonlearn. modellearn. summary()~~~TODO: understand yourself the blocks of conv-kernel: - Usually use big kernel size at first layer (will study this at part2)- Bottom right highlighting kernel(`pic / draw`)- `torch. tensor. expand`: for memory efficient, because we should do RGB- We do not make separate kernel, but make rank 4 kernel - 4d tensor is just stacked kernel- `t[None]. shape` create new unit axis, and why? we make this -> it should move unit of batch, not one size image. ### Average pooling, feature- suppose our pre-trained model results in size of `11 by 11 by 512 ` `pic 4` and my classification task has 37 classes * take the first face of channel, which is 11 by 11 and `mean` it, so that make rank 2 tensor, 512 by 1 * and make 2d matrix, which is 512 by 37 and multiply so that we can get 37 by 1 matrix. - Feature, at convolution block - So, when we transfer-learning without unfreeze, every element of last matrix (512 by 1) should represent(or could catch) each feature. ### Heatmap, Hook~~~hook_output(model[0]) -> acts -> avg_acts~~~- if we average the block with `axis=feature`, result of matrix(11 by 11) depicts `how activated was that area?` -> it is heatmap, `avg_acts`- and acts comes from hook, which is more advanced pytorch feature. - hook into pytorch machine itself, and run any arbitrary Pytorch code - Why this is cool?: Normally it gives set of outputs of forward pass, but we can interrupt and hook the forward pass. - Also can store the output of the convolutional part of the model, which is before avg_pooling- Thinking back when we do cut off `after` the conv part. - but with fast. ai the original convolutional part of the model would be *the first thing in the model*, specifically could be given from `learn. model. eval()[0]` - And this is gotten from `hooked_output` and having hooked the output, we can pass our x_minibatch to output. - Not directly, but with normalized, minibatch, put on to the gpu - `one_item()` function do it, when we have one data `TODO: this is assignment` do it yourself without one_item function - and `. cuda()` put it on gpu- you should print out very often the shape of tensor, and try think why. "
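A minimal sketch of the hook workflow described above, assuming fastai v1, a `data` DataBunch, and a trained `learn` from cnn_learner:

~~~python
from fastai.vision import *
from fastai.callbacks.hooks import hook_output

x, y = data.valid_ds[0]            # one image and its label
xb, _ = data.one_item(x)           # normalized minibatch of one
xb = xb.cuda()                     # put it on the GPU

m = learn.model.eval()
with hook_output(m[0]) as hook:    # m[0] = the convolutional body
    preds = m(xb)
acts = hook.stored[0]              # e.g. 512 x 11 x 11 activations
avg_acts = acts.mean(0)            # 11 x 11: how activated was each area?
~~~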
+ }, {
+ "id": 13,
+ "url": "http://localhost:4000/2020/04/qna-image-segmentation/",
+ "title": "[Q&A] Image Segmentation, using Unet with Driving Video data",
+ "body": "2020/04/02 - This post is about my questions while I was studying USF Deep Learning course about image segmentation task. All the answers are from the course, source code, library document, or document. I cared about being clear at reporting information including source of information, however if there are still anything unclear, please contact me. And thank you Jeremy&Rachael for everything. Also Thank you Cambridge Computer Vision Lab to made us to study with your labor. The Cambridge-driving Labeled Video Database (CamVid) is the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes. If someone is interested in this project, please check the site and see the details. Now, let’s start first using jupyter’s one of tricks which I love most. It enables cell to print the code without print function. from IPython. core. interactiveshell import InteractiveShell# pretty print all cell's output and not just the last oneInteractiveShell. ast_node_interactivity = all from fastai. vision import *from fastai. callbacks. hooks import *from fastai. utils. mem import *path = untar_data(URLs. CAMVID) # The locations where the data and models are downloaded are set in config. ymlpath. ls() I’m trying to accustomed to using pathlib module, not just it became built-in module in python, but I felt uncomfortable myself with os module. However, still unpredictable conflicts are remain, even in the quite standard library like Pytorch, tensorflow, onnx. (it require me string for path. not PosixPath. will send PR. . ) [PosixPath('/root/. fastai/data/camvid/valid. txt'), PosixPath('/root/. fastai/data/camvid/images'), PosixPath('/root/. fastai/data/camvid/labels'), PosixPath('/root/. fastai/data/camvid/codes. txt')]path_img = path/'images'path_lbl = path/'labels'fnames = get_image_files(path_img) #filenamelbl_names = get_image_files(path_lbl)1. (Play with data) My Hypothesis: File name has A_B format. and A / B would be at key-value position. Use collections - defaultdict Default Dict: Link: easy to group a sequence of key and value pairs into a dictionary of list?from collections import defaultdictfnames[0], lbl_names[0](PosixPath('/root/. fastai/data/camvid/images/0001TP_009210. png'), PosixPath('/root/. fastai/data/camvid/labels/0016E5_01800_P. png'))files = [tuple(i. stem. split('_')) for i in fnames]labels = [tuple(i. stem. split('_')[:-1]) for i in lbl_names]d = defaultdict(list)for k, v in files: d[k]. append(v)d. keys()len(d['0001TP'])124for k, v in d. 
items(): print(k, v)0001TP ['009210', '008850', '007350', '008970', '009840', '010140', '008490', '008520', '009540', '008250', '008340', '006840', '007860', '007410', '007740', '009870', '010080', '007890', '008790', '010020', '008400', '007080', '008280', '010380', '009330', '009060', '007470', '006810', '009720', '008580', '007110', '008730', '009150', '007680', '009780', '007800', '007290', '008760', '009510', '008640', '008310', '007440', '006900', '007500', '008460', '009030', '008130', '009480', '009900', '010230', '009270', '008040', '007590', '007950', '009990', '008550', '007260', '008100', '007530', '006960', '008190', '009420', '009930', '009000', '007830', '008940', '006690', '009570', '008880', '010170', '007560', '009300', '006750', '009360', '010200', '007320', '008010', '009120', '007620', '007200', '007140', '010320', '006720', '008670', '007230', '008370', '010260', '009690', '006930', '009090', '007770', '010290', '010350', '008610', '008070', '009600', '008430', '009450', '007380', '009240', '007710', '007170', '008160', '008910', '007020', '006780', '007050', '009960', '009810', '008220', '009180', '009750', '010050', '009660', '010110', '007920', '009630', '007650', '006990', '008700', '009390', '007980', '008820', '006870']0016E5 ['01290', '08159', '05760', '08133', '08063', '06660', '00960', '05850', '00750', '06960', '08035', '08107', '07975', '08017', '05610', '07140', '08119', '08027', '07170', '08400', '08093', '02100', '06390', '04470', '08340', '06060', '00600', '07470', '08151', '07800', '01620', '05730', '01530', '00690', '08430', '05940', '01980', '07320', '08069', '07965', '04380', '05430', '01410', '06780', '08007', '08087', '08079', '06600', '08109', '05490', '00901', '04590', '04680', '08045', '01770', '06690', '08085', '06810', '00420', '08011', '07440', '02190', '06300', '04800', '01500', '00450', '08029', '01470', '06330', '07997', '08067', '05370', '08013', '08190', '00840', '02370', '08049', '08135', '01440', '06870', '05820', '05280', '08051', '04440', '08091', '01380', '00630', '07290', '05520', '04770', '00540', '07995', '07999', '05550', '07920', '08101', '08141', '08053', '04620', '08103', '05160', '07350', '08057', '06030', '06000', '08550', '07963', '08089', '05970', '08047', '05640', '06240', '05220', '04350', '01590', '07959', '01950', '08117', '06180', '01560', '05400', '08043', '07680', '00780', '08081', '07050', '01020', '01350', '04530', '06720', '07969', '08149', '08003', '08131', '08129', '08033', '05460', '01650', '07530', '08023', '05340', '08640', '05100', '08075', '01230', '04980', '02070', '01080', '06210', '05910', '08009', '01800', '05190', '02400', '08083', '08019', '07620', '07200', '07890', '08059', '06990', '04410', '08121', '08123', '06930', '08137', '08147', '08095', '06570', '06150', '08153', '06840', '05250', '00510', '08370', '08580', '08113', '07410', '08097', '01200', '04950', '07770', '07650', '04710', '06090', '08055', '07110', '07981', '00990', '08250', '08127', '01920', '07985', '08220', '08005', '08157', '05130', '08071', '01140', '04830', '07740', '08143', '06120', '02040', '08111', '08115', '00660', '08280', '06420', '07983', '02220', '05700', '01860', '01260', '04920', '06510', '07020', '08073', '08105', '08125', '06360', '07860', '07993', '00810', '06540', '08099', '08139', '02010', '07973', '08155', '07991', '06630', '00480', '06750', '04890', '08001', '08025', '00870', '08490', '01830', '07977', '05010', '01170', '07961', '01680', '01050', '07987', '07080', '04560', '00930', '05310', '02340', '05790', 
'08460', '00720', '08031', '02280', '08039', '08037', '08065', '06270', '08077', '06900', '04650', '06480', '07230', '08041', '06450', '00570', '07989', '04740', '07979', '02250', '07380', '00390', '01710', '07590', '08021', '08520', '07500', '01110', '04500', '02310', '07971', '02130', '05580', '05880', '08610', '08310', '08145', '05670', '04860', '07260', '08015', '07967', '01740', '01320', '07560', '07830', '01890', '08061', '02160', '07710', '05070', '05040']Seq05VD ['f00030', 'f02550', 'f03450', 'f01110', 'f00480', 'f00210', 'f04590', 'f04170', 'f01800', 'f03990', 'f03360', 'f03900', 'f02070', 'f00810', 'f03690', 'f01350', 'f01530', 'f04980', 'f05100', 'f03060', 'f00900', 'f03870', 'f02460', 'f01470', 'f02370', 'f02820', 'f04080', 'f02760', 'f04860', 'f02250', 'f04200', 'f00270', 'f03720', 'f02850', 'f04410', 'f01200', 'f03090', 'f02010', 'f03930', 'f00090', 'f01650', 'f01890', 'f03840', 'f03030', 'f02130', 'f01230', 'f04110', 'f02520', 'f04140', 'f04020', 'f00060', 'f03420', 'f01560', 'f00120', 'f04290', 'f02340', 'f00300', 'f01380', 'f00870', 'f01860', 'f02970', 'f04560', 'f02730', 'f00330', 'f04530', 'f03780', 'f01770', 'f03390', 'f05040', 'f02430', 'f03330', 'f00660', 'f01740', 'f02100', 'f04800', 'f04050', 'f00510', 'f02790', 'f04350', 'f00690', 'f00540', 'f02490', 'f00960', 'f00930', 'f04230', 'f02880', 'f03600', 'f01020', 'f01500', 'f02400', 'f04830', 'f04470', 'f03300', 'f02670', 'f00450', 'f01980', 'f01170', 'f01620', 'f04500', 'f01080', 'f03180', 'f05070', 'f03150', 'f04950', 'f01440', 'f03510', 'f01710', 'f00360', 'f04770', 'f02910', 'f01050', 'f00630', 'f04320', 'f00570', 'f03240', 'f02190', 'f01140', 'f03540', 'f02220', 'f02640', 'f03960', 'f00000', 'f04920', 'f01950', 'f00990', 'f03480', 'f03000', 'f00420', 'f04620', 'f03210', 'f00780', 'f03570', 'f01590', 'f00750', 'f01920', 'f04650', 'f03750', 'f03630', 'f02310', 'f02610', 'f02580', 'f04740', 'f02280', 'f04680', 'f00390', 'f00720', 'f03660', 'f02040', 'f03270', 'f00180', 'f03810', 'f01410', 'f01290', 'f03120', 'f00840', 'f04440', 'f00150', 'f01260', 'f02700', 'f02940', 'f00600', 'f01830', 'f04260', 'f05010', 'f04890', 'f02160', 'f00240', 'f04380', 'f01680', 'f04710', 'f01320']0006R0 ['f02820', 'f03690', 'f03180', 'f02550', 'f01020', 'f03660', 'f02340', 'f01170', 'f02610', 'f02940', 'f01290', 'f02100', 'f01350', 'f03270', 'f03870', 'f01380', 'f01980', 'f03810', 'f02430', 'f02310', 'f01830', 'f03480', 'f02970', 'f01890', 'f03210', 'f03930', 'f02040', 'f02070', 'f02400', 'f01560', 'f03030', 'f01770', 'f01590', 'f01950', 'f03420', 'f01650', 'f03450', 'f00990', 'f03630', 'f01500', 'f03570', 'f00930', 'f03090', 'f03360', 'f02880', 'f02460', 'f01440', 'f01920', 'f01230', 'f03840', 'f02730', 'f01620', 'f02220', 'f03750', 'f03330', 'f03540', 'f02520', 'f02790', 'f01050', 'f03120', 'f01800', 'f01140', 'f01860', 'f01530', 'f01470', 'f02670', 'f02490', 'f01260', 'f01110', 'f02760', 'f01680', 'f03150', 'f02580', 'f03300', 'f02280', 'f01200', 'f03390', 'f03510', 'f02640', 'f02190', 'f02370', 'f01320', 'f02130', 'f03600', 'f03240', 'f03780', 'f03720', 'f02700', 'f01410', 'f01080', 'f02850', 'f01710', 'f03900', 'f03060', 'f01740', 'f02010', 'f02250', 'f00960', 'f03000', 'f02160', 'f02910']for k, v in d. items(): print(k, len(d[k]))0001TP 1240016E5 305Seq05VD 1710006R0 101for i in d2. keys(): print(i,len(d2[i]))0016E5 3050001TP 1240006R0 101Seq05VD 171files[0], labels[0](('0001TP', '009210'), ('0016E5', '01800'))2. My question: Link: Why do we need masking? and does color from fastai library? 
(have to look into source code) What do the parameter alpha do? When people make masked img, would it be have ranged integer limit? Does image normalization related with this?lbl_sorted = sorted(lbl_names)f_sorted = sorted(fnames)lbl_1 = lbl_sorted[33]f_1 = f_sorted[33]img = open_image(lbl_1)mask = open_mask(lbl_1)_,axs = plt. subplots(1,2, figsize=(10,5))# img. show(ax=axs[0], y=mask, title='masked')img. show(ax=axs[0], title='1')mask. show(ax=axs[1], title='2', alpha=1. ) img_2 = open_image(f_1)mask_2 = open_mask(f_1)_,axs = plt. subplots(1,2, figsize=(10,5))# img. show(ax=axs[0], y=mask, title='masked')img_2. show(ax=axs[0], title='3',)mask_2. show(ax=axs[1], title='4', alpha=1. ) open_mask(lbl_1). data. shapetorch. Size([1, 720, 960])open_mask(lbl_1). data. shapetorch. Size([1, 720, 960])open_image(f_1). data. shapetorch. Size([3, 720, 960])open_image(f_1). data. shapetorch. Size([3, 720, 960])img. data #labeled datatensor([[[0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], . . . , [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176]], [[0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], . . . , [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176]], [[0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], . . . , [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176]]])mask. data # after mask, labeled datatensor([[[ 4, 4, 4, . . . , 21, 21, 21], [ 4, 4, 4, . . . , 21, 21, 21], [ 4, 4, 4, . . . , 21, 21, 21], . . . , [17, 17, 17, . . . , 30, 30, 30], [17, 17, 17, . . . , 30, 30, 30], [17, 17, 17, . . . , 30, 30, 30]]])img_2. data, mask_2. data(tensor([[[0. 0706, 0. 0667, 0. 0706, . . . , 0. 6431, 0. 6549, 0. 6627], [0. 0745, 0. 0706, 0. 0706, . . . , 0. 6431, 0. 6510, 0. 6549], [0. 0784, 0. 0706, 0. 0745, . . . , 0. 6392, 0. 6588, 0. 6588], . . . , [0. 0863, 0. 0824, 0. 0824, . . . , 0. 1333, 0. 1216, 0. 1255], [0. 0902, 0. 0863, 0. 0824, . . . , 0. 1255, 0. 1176, 0. 1216], [0. 0863, 0. 0824, 0. 0784, . . . , 0. 1137, 0. 1059, 0. 1137]], [[0. 0706, 0. 0667, 0. 0706, . . . , 0. 7490, 0. 7608, 0. 7686], [0. 0745, 0. 0706, 0. 0706, . . . , 0. 7451, 0. 7569, 0. 7608], [0. 0784, 0. 0706, 0. 0745, . . . , 0. 7412, 0. 7529, 0. 7529], . . . , [0. 0980, 0. 0941, 0. 0941, . . . , 0. 1804, 0. 1686, 0. 1725], [0. 1059, 0. 1020, 0. 0980, . . . , 0. 1725, 0. 1647, 0. 1686], [0. 1020, 0. 0980, 0. 0941, . . . , 0. 1608, 0. 1529, 0. 1608]], [[0. 0784, 0. 0745, 0. 0784, . . . , 0. 7569, 0. 7686, 0. 7765], [0. 0824, 0. 0784, 0. 0784, . . . , 0. 7647, 0. 7647, 0. 7686], [0. 0784, 0. 0706, 0. 0745, . . . , 0. 7608, 0. 7647, 0. 7647], . . . , [0. 1216, 0. 1176, 0. 1176, . . . , 0. 2000, 0. 1882, 0. 1922], [0. 1176, 0. 1137, 0. 1098, . . . , 0. 1843, 0. 1765, 0. 1804], [0. 1137, 0. 1098, 0. 
1059, . . . , 0. 1725, 0. 1647, 0. 1725]]]), tensor([[[ 18, 17, 18, . . . , 183, 186, 188], [ 19, 18, 18, . . . , 183, 185, 186], [ 20, 18, 19, . . . , 182, 185, 185], . . . , [ 25, 24, 24, . . . , 43, 40, 41], [ 26, 25, 24, . . . , 41, 39, 40], [ 25, 24, 23, . . . , 38, 36, 38]]]))3. What is the difference between Image and ImageSegment?: imageSegment An ImageSegment object has the same properties as an Image. The only difference is that when applying transformations to an ImageSegment, it ignores the functions that deal with lighting and keeps values of 0 and 1. It’s easy to show the segmentation mask over the associated Image by using the y argument of show_image. img = open_image(fnames[0])mask = open_mask(lbl_names[0])_,axs = plt.subplots(1,3, figsize=(8,4))img.show(ax=axs[0], title='no mask')img.show(ax=axs[1], y=mask, title='masked') #seg mask over the img using y argmask.show(ax=axs[2], title='mask only', alpha=1.) vision.image 4. Why/how is the image divided by 255, and what does it produce? fast.ai : vision.image - If div=True, pixel values are divided by 255. to become floats between 0. and 1. At times, you want to get rid of distortions caused by lights and shadows in an image. Normalizing the RGB values of an image can be a simple and effective way of achieving this: the sum of the pixel’s values over all channels (S = R + G + B) divides each channel's intensity, so that the normalized values are R/S, G/S and B/S. Detailed explanation here5. Python evaluation order: Python evaluates expressions from left to right. Notice that while evaluating an assignment, the right-hand side is evaluated before the left-hand side. mask_tmp, trg_tmp, void_tmp = 2, 1, 10mask_tmp = trg_tmp != void_tmpprint(mask_tmp, trg_tmp, void_tmp) # (1) target is not the same as voidTrue 1 10# Example 1x = 1y = 2x,y = y,x+yx, y(2, 3)# Example 2x = 1y = 2x = yy = x+yx, y(2, 4)6. Model learner parameter :: pct_start: A: Percentage of the total number of epochs when the learning rate rises during one cycle. Q: Sorry, I'm still confused: one cycle in the new API only runs one epoch, so how does the percentage of the total number of epochs work? Can you give an example, say learn.fit_one_cycle(10, slice(1e-4,1e-3,1e-2), pct_start=0.05)?A: OK, strictly the correct answer would be the percentage of iterations, so the lr can both increase and decrease during the same epoch. In your example, say you have 100 iterations per epoch; then for half an epoch (0.05 * (10 * 100) = 50 iterations) the lr will rise, then slowly decrease. Q2: Thanks for this explanation … so essentially, it is the percentage of overall iterations where the LR is increasing, correct? So, given the default of 0.3, it means that your LR goes up for 30% of your iterations and then decreases over the last 70%. Is that a correct summation of what is happening? A2: Yes, I think that’s correct. You can verify it by changing the value and checking learn.recorder.plot_lr(), for example with pct_start = 0.2 source: forums.fastai "
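As a sanity check on the pct_start arithmetic in that Q&A, here is a small sketch; the epoch and iteration counts are the hypothetical ones from the forum answer, not measured values.

~~~python
# Reproduces the forum answer's arithmetic: pct_start is a fraction of
# *total iterations*, not of epochs (numbers are the Q&A's hypothetical ones).
epochs, iters_per_epoch, pct_start = 10, 100, 0.05
total_iters = epochs * iters_per_epoch        # 1000
rising = int(pct_start * total_iters)         # 50 iterations: LR increases
falling = total_iters - rising                # 950 iterations: LR decreases
print(rising, falling)                        # 50 950
~~~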
+ }, {
+ "id": 14,
"url": "http://localhost:4000/2020/03/note08-fastai-4/",
"title": "Gradient backward, Chain Rule, Refactoring",
- "body": "2020/03/02 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring” Lecture 08 - Deep Learning From Foundations-part2 “ Homework: calculus for machine learning einsum conventionCONTENTS: Foundation version Gradients backward pass decompose function chain rule with code check the result using Pytorch autograd Refactor model Layers as classes Modue. forward() Without einsum nn. Linear and nn. Module Forward process Foundation version: Gradients backward pass: Gradients is output with respect to parameter we’ve done this work in this path(below) to simplify this calculus, we can just change it into, So, you should know of the derivative of each bit on its own, and then you multiply them all together. As a result, it would be over cross over the data. So you can get gradient, output with respect to parameter What order should we calculate? BTW, why Jeremy wrote , not Loss function?1 decompose function We want to get derivative of which forms But, we have a estimation of answer (we call it y hat) now So, I will decompose funciton to trace target variable. Using the above forward pass, we can suppose some function from the end. start from , We know MSE funciton got two parameters, output, and target . from MSE’s input we know function’s output and supposing v is input of that function, similarly, v became output of chain rule with code examplify backward process by random sampling To get a variable, I modified forward model a little def model_ping(out = 'x_train'): l1 = lin(x_train, w1, b1) # one linear layer l2 = relu(l1) # one relu layer l3 = lin(l2, w2, b2) # one more linear layer return eval(out) Be careful we don’t use mse_loss in backward process1) start with the very last function, which is loss funciton. MSE If we codify this formula,def mse_grad(inp, targ): #mse_input(1000,1), mse_targ (1000,1) # grad of loss with respect to output of previous layer inp. g = 2. * (inp. squeeze() - targ). unsqueeze(-1) / inp. shape[0] And, this can be examplified like below. Notice that input of gradient function is same with forward functiony_hat = model_ping('l3') #get value from forward modely_hat. g = ((y_hat. squeeze(-1)-y_train). unsqueeze(-1))/y_hat. shape[0]y_hat. g. shape>>> torch. Size([50000, 1]) We can just calculate using broadcasting, not using squeeze. then why should do and unsqueeze again?🎯 It’s related with random access memory(RAM). . If I don’t squeeze, (I’m using colab) it out of RAM. 2) Derivative of linear2 function This process’s weight dimensions defined by axis=1, axis=2. axis=0 dimension means size of data. This will be summazed by . sum(0) method. unsqeeze(-1)&unsqeeze(1) seperates the dimension, and make a dot product, and vanish axis=0 dimension. def lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowlin2 = model_ping('l2'); #get value from forward modellin2. g = y_hat. g@w2. t(); w2. g = (lin2. unsqueeze(-1) * y_hat. g. unsqueeze(1)). sum(0);b2. g = y_hat. g. sum(0);lin2. g. shape, w2. g. shape, b2. g. shape>>> torch. Size([50000, 50])torch. Size([50, 1])torch. Size([1]) Notice going reverse order, we’re passing in gradient backward3) derivative of ReLU def relu_grad(inp, out): # grad of relu with respect to input activations inp. 
g = (inp>0). float() * out. g Examplified belowlin1=model_ping('l1') #get value from forward modellin1. g = (lin1>0). float() * lin2. g;lin1. g. shape>>> torch. Size([50000, 50])4) Derivative of linear1 Same process with 2) but, this process’s weight hasdef lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowx_train. g = lin1. g @ w1. t(); w1. g = (x_train. unsqueeze(-1) * lin1. g. unsqueeze(1)). sum(0); b1. g = lin1. g. sum(0);x_train. g. shape, w1. g. shape, b1. g. shape>>> torch. Size([50000, 784])torch. Size([784, 50])torch. Size([50])5) Then it goes backward pass def forward_and_backward(inp, targ): # forward pass: l1 = inp @ w1 + b1 l2 = relu(l1) out = l2 @ w2 + b2 # we don't actually need the loss in backward! loss = mse(out, targ) # backward pass: mse_grad(out, targ) lin_grad(l2, out, w2, b2) relu_grad(l1, l2) lin_grad(inp, l1, w1, b1)Version 1 (Basic)- Wall time: 1. 95 s Summary Notice that output of function at forward pass became input of backward pass backpropagation is just the chain rule value loss (loss=mse(out,targ)) is not used in gradient calcuation. Because, it doesn’t appear with the weight. w1g, w2g, b1g, b2g, ig will be used for optimizercheck the result using Pytorch autograd require_grad_ is the magical function, which can automatic differentiation. 2 This magical auto gradified tensor keep track what happend in forward (taking loss function), and do the backward3 So it saves our time to differentiate ourselves ⤵️ THis is benchmark…. . Version 2 (torch autograd)- Wall time: 3. 81 µs Refactor model: Amazingly, just refactoring our main pieces, it comes down up to Pytorch package. 🌟 Implement yourself, Practice, practice, practice! 🌟 Layers as classes: Relu and Linear are layers in oue neural net. -> make it as classes For the forward, using __call__ for the both of forward & backward. Because ‘call’ means we treat this as a function. class Lin(): def __init__(self, w, b): self. w,self. b = w,b def __call__(self, inp): self. inp = inp self. out = inp@self. w + self. b return self. out def backward(self): self. inp. g = self. out. g @ self. w. t() # Creating a giant outer product, just to sum it, is inefficient! self. w. g = (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) self. b. g = self. out. g. sum(0) Remember that in lin_grad function, we save bias&weight!!!!!💬 inp. g : gradient of the output with respect to the input. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 w. g : gradient of the output with respect to the weight. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 b. g : gradient of the output with respect to the bias. {: style=”color:grey; font-size: 90%; text-align: center;”} class Model(): def __init__(self, w1, b1, w2, b2): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ) def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() refer to Jeremy’s Model class, he put layers in list Dionne’s self-study note: Decomposing Jeremy’s Model class init needs weight, bias but not x data when call that class(a. k. a function) it gave x data and y label! jeremy composited function in layers. x = l(x) so concise…. . 
also utilized that layer list when backward ust reversing it (using python list’s method) And he is recursively calling the function on the result of the previous thing. ⬇️for l in self. layers: x = l(x)Q2: Don’t I need to declare magical autograd function, requires_grad_?{: style=”color:red; font-size: 130%; text-align: center;”} [The questions migrated to this article] Version 3 (refactoring - layer to class)- Wall time: 5. 25 µs Modue. forward(): Duplicate code makes execution time slow. Role of __call__ changed. No more __call__ for implementing forward pass. By initializing the forward with __call__, Module. forward() use overriding to maximize reusability. So any layer inherit Module, can use parent’s function. gradient of the output with respect to the weight (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) can be reexpressed using einsum, torch. einsum( bi,bj->ij , inp, out. g) Defining forward and Module enables Pytorch to out almost duplicatesVersion 4 (Module & einsum)- Wall time: 4. 29 µs Q2: Isn’t there any way to use broadcasting? Why we should use outer product?{: style=”color:red; font-size: 130%; text-align: center;”} Without einsum: Replacing einsum to matrix product is even more faster. torch. einsum( bi,bj->ij , inp, out. g)can be reexpressed using matrix product, inp. t() @ out. gVersion 5 (without einsum)- Wall time: 3. 81 µs nn. Linear and nn. Module: Torch’s package nn. Linear and nn. Module Version 6 (torch package)- Wall time: 5. 01 µs Final, Using torch. nn. Linear & torch. nn. Module~~~pythonclass Model(nn. Module): def init(self, n_in, nh, n_out): super(). init() self. layers = [nn. Linear(n_in,nh), nn. ReLU(), nn. Linear(nh,n_out)] self. loss = mse def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x. squeeze(), targ)class Model(): def init(self): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ)def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() ~~~ Footnote: fast. ai forums Lesson-8 ↩ pytorch docs - autograd ↩ stackoverflow - finding methods a object has ↩ "
+ "body": "2020/03/02 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring ” Lecture 08 - Deep Learning From Foundations-part2 “ Homework: calculus for machine learning einsum conventionCONTENTS: Foundation version Gradients backward pass decompose function chain rule with code check the result using Pytorch autograd Refactor model Layers as classes Modue. forward() Without einsum nn. Linear and nn. Module Forward process Foundation version: Gradients backward pass: Gradients is output with respect to parameter we’ve done this work in this path(below) to simplify this calculus, we can just change it into, So, you should know of the derivative of each bit on its own, and then you multiply them all together. As a result, it would be over cross over the data. So you can get gradient, output with respect to parameter What order should we calculate? BTW, why Jeremy wrote , not Loss function?1 decompose function We want to get derivative of which forms But, we have a estimation of answer (we call it y hat) now So, I will decompose funciton to trace target variable. Using the above forward pass, we can suppose some function from the end. start from , We know MSE funciton got two parameters, output, and target . from MSE’s input we know function’s output and supposing v is input of that function, similarly, v became output of chain rule with code examplify backward process by random sampling To get a variable, I modified forward model a little def model_ping(out = 'x_train'): l1 = lin(x_train, w1, b1) # one linear layer l2 = relu(l1) # one relu layer l3 = lin(l2, w2, b2) # one more linear layer return eval(out) Be careful we don’t use mse_loss in backward process1) start with the very last function, which is loss funciton. MSE If we codify this formula,def mse_grad(inp, targ): #mse_input(1000,1), mse_targ (1000,1) # grad of loss with respect to output of previous layer inp. g = 2. * (inp. squeeze() - targ). unsqueeze(-1) / inp. shape[0] And, this can be examplified like below. Notice that input of gradient function is same with forward functiony_hat = model_ping('l3') #get value from forward modely_hat. g = ((y_hat. squeeze(-1)-y_train). unsqueeze(-1))/y_hat. shape[0]y_hat. g. shape>>> torch. Size([50000, 1]) We can just calculate using broadcasting, not using squeeze. then why should do and unsqueeze again?🎯 It’s related with random access memory(RAM). . If I don’t squeeze, (I’m using colab) it out of RAM. 2) Derivative of linear2 function This process’s weight dimensions defined by axis=1, axis=2. axis=0 dimension means size of data. This will be summazed by . sum(0) method. unsqeeze(-1)&unsqeeze(1) seperates the dimension, and make a dot product, and vanish axis=0 dimension. def lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowlin2 = model_ping('l2'); #get value from forward modellin2. g = y_hat. g@w2. t(); w2. g = (lin2. unsqueeze(-1) * y_hat. g. unsqueeze(1)). sum(0);b2. g = y_hat. g. sum(0);lin2. g. shape, w2. g. shape, b2. g. shape>>> torch. Size([50000, 50])torch. Size([50, 1])torch. Size([1]) Notice going reverse order, we’re passing in gradient backward3) derivative of ReLU def relu_grad(inp, out): # grad of relu with respect to input activations inp. 
g = (inp>0). float() * out. g Examplified belowlin1=model_ping('l1') #get value from forward modellin1. g = (lin1>0). float() * lin2. g;lin1. g. shape>>> torch. Size([50000, 50])4) Derivative of linear1 Same process with 2) but, this process’s weight hasdef lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowx_train. g = lin1. g @ w1. t(); w1. g = (x_train. unsqueeze(-1) * lin1. g. unsqueeze(1)). sum(0); b1. g = lin1. g. sum(0);x_train. g. shape, w1. g. shape, b1. g. shape>>> torch. Size([50000, 784])torch. Size([784, 50])torch. Size([50])5) Then it goes backward pass def forward_and_backward(inp, targ): # forward pass: l1 = inp @ w1 + b1 l2 = relu(l1) out = l2 @ w2 + b2 # we don't actually need the loss in backward! loss = mse(out, targ) # backward pass: mse_grad(out, targ) lin_grad(l2, out, w2, b2) relu_grad(l1, l2) lin_grad(inp, l1, w1, b1)Version 1 (Basic)- Wall time: 1. 95 s Summary Notice that output of function at forward pass became input of backward pass backpropagation is just the chain rule value loss (loss=mse(out,targ)) is not used in gradient calcuation. Because, it doesn’t appear with the weight. w1g, w2g, b1g, b2g, ig will be used for optimizercheck the result using Pytorch autograd require_grad_ is the magical function, which can automatic differentiation. 2 This magical auto gradified tensor keep track what happend in forward (taking loss function), and do the backward3 So it saves our time to differentiate ourselves Postfix underscore means in pytorch, in-place function, What is in-place function?⤵️ THis is benchmark…. . Version 2 (torch autograd)- Wall time: 3. 81 µs Refactor model: Amazingly, just refactoring our main pieces, it comes down up to Pytorch package. 🌟 Implement yourself, Practice, practice, practice! 🌟 Layers as classes: Relu and Linear are layers in oue neural net. -> make it as classes For the forward, using __call__ for the both of forward & backward. Because ‘call’ means we treat this as a function. class Lin(): def __init__(self, w, b): self. w,self. b = w,b def __call__(self, inp): self. inp = inp self. out = inp@self. w + self. b return self. out def backward(self): self. inp. g = self. out. g @ self. w. t() # Creating a giant outer product, just to sum it, is inefficient! self. w. g = (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) self. b. g = self. out. g. sum(0) Remember that in lin_grad function, we save bias&weight!!!!!💬 inp. g : gradient of the output with respect to the input. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 w. g : gradient of the output with respect to the weight. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 b. g : gradient of the output with respect to the bias. {: style=”color:grey; font-size: 90%; text-align: center;”} class Model(): def __init__(self, w1, b1, w2, b2): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ) def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() refer to Jeremy’s Model class, he put layers in list Dionne’s self-study note: Decomposing Jeremy’s Model class init needs weight, bias but not x data when call that class(a. k. a function) it gave x data and y label! jeremy composited function in layers. x = l(x) so concise…. . 
also utilized that layer list when backward ust reversing it (using python list’s method) And he is recursively calling the function on the result of the previous thing. ⬇️for l in self. layers: x = l(x)Q2: Don’t I need to declare magical autograd function, requires_grad_?{: style=”color:red; font-size: 130%; text-align: center;”} [The questions migrated to this article] Version 3 (refactoring - layer to class)- Wall time: 5. 25 µs Modue. forward(): Duplicate code makes execution time slow. Role of __call__ changed. No more __call__ for implementing forward pass. By initializing the forward with __call__, Module. forward() use overriding to maximize reusability. So any layer inherit Module, can use parent’s function. gradient of the output with respect to the weight (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) can be reexpressed using einsum, torch. einsum( bi,bj->ij , inp, out. g) Defining forward and Module enables Pytorch to out almost duplicatesVersion 4 (Module & einsum)- Wall time: 4. 29 µs Q2: Isn’t there any way to use broadcasting? Why we should use outer product?{: style=”color:red; font-size: 130%; text-align: center;”} Without einsum: Replacing einsum to matrix product is even more faster. torch. einsum( bi,bj->ij , inp, out. g)can be reexpressed using matrix product, inp. t() @ out. gVersion 5 (without einsum)- Wall time: 3. 81 µs nn. Linear and nn. Module: Torch’s package nn. Linear and nn. Module Version 6 (torch package)- Wall time: 5. 01 µs Final, Using torch. nn. Linear & torch. nn. Module~~~pythonclass Model(nn. Module): def init(self, n_in, nh, n_out): super(). init() self. layers = [nn. Linear(n_in,nh), nn. ReLU(), nn. Linear(nh,n_out)] self. loss = mse def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x. squeeze(), targ)class Model(): def init(self): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ)def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() ~~~ Footnote: fast. ai forums Lesson-8 ↩ pytorch docs - autograd ↩ stackoverflow - finding methods a object has ↩ "
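For reference, here is a minimal sketch of the Module refactor this note describes: __call__ stores the arguments and delegates to forward(), and backward() replays them into a bwd() hook. This follows the lesson's idea as I understand it; names like `bwd` and the Relu subclass are an approximation, not the canonical notebook code.

~~~python
# Sketch of the Module refactor: __call__ stores args, delegates to
# forward(); backward() replays the stored output and inputs into bwd().
class Module():
    def __call__(self, *args):
        self.args = args
        self.out = self.forward(*args)
        return self.out
    def forward(self): raise NotImplementedError
    def backward(self): self.bwd(self.out, *self.args)

class Relu(Module):
    def forward(self, inp): return inp.clamp_min(0.) - 0.5
    def bwd(self, out, inp): inp.g = (inp > 0).float() * out.g
~~~

Any subclass then only writes forward() and bwd(); the bookkeeping lives once in the parent, which is the duplicate-removal the note refers to.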
}, {
- "id": 13,
+ "id": 15,
"url": "http://localhost:4000/2020/03/note08-fastai-3/",
"title": "Implement forward&backward pass from scratch",
"body": "2020/03/01 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring1. The forward and backward passes: 1. 1 Normalization: train_mean,train_std = x_train. mean(),x_train. std()>>> train_mean,train_std(tensor(0. 1304), tensor(0. 3073))Remember! Dataset, which is x_train, mean and standard deviation is not 0&1. But we need them to be which means we should substract means and divide data by std. You should not standarlize validation set because training set and validation set should be aparted. after normalize, mean is close to zero, and standard deviation is close to 1. 1. 2 Variable definition: n,m: size of the training set c: the number of activations we need in our model2. Foundation Version: 2. 1 Basic architecture: Our model has one hidden layer, output to have 10 activations, used in cross entropy. But in process of building architecture, we will use mean square error, output to have 1 activations and lator change it to cross entropy number of hidden unit; 50see below pic We want to make w1&w2 mean and std be 0&1. why initializating and make mean zero and std one is important? paper highlighting importance of normalisation - training 10,000 layer network without regularisation1 2. 1. 1 simplified kaiming initQ: Why we did init, normalize with only validation data? Because we can not handle and get statistics from each value of x_valid?{: style=”color:red; font-size: 130%; text-align: center;”} what about hidden(first) layer?w1 = torch. randn(m,nh)b1 = torch. zeros(nh)t = lin(x_valid, w1, b1) # hidden>>> t. mean(), t. std()((tensor(2. 3191), tensor(27. 0303))In output(second) layer, w2 = torch. randn(nh,1)b2 = torch. zeros(1)t2 = lin(t, w2, b2) # output>>> t2. mean(), t2. std()(tensor(-58. 2665), tensor(170. 9717)) which is terribly far from normalzed value. But if we apply simplified kaiming init w1 = torch. randn(m,nh)/math. sqrt(m); b1 = torch. zeros(nh)w2 = torch. randn(nh,1)/math. sqrt(nh); b2 = torch. zeros(1)t = lin(x_valid, w1, b1)t. mean(),t. std()>>> (tensor(-0. 0516), tensor(0. 9354)) But, actually, we use activations not only linear function After applying activations relu at linear layer, mean and deviation became 0. 5. 2. 1. 2 Glorrot initializationPaper2: Understanding the difficulty of training deep feedforward neural networks Gaussian(, bell shaped, normal distributions) is not trained very well. How to initialize neural nets? with the size of layer , the number of filters . But there is No acount for import of ReLU If we got 1000 layers, vanishing gradients problem emerges2. 1. 3 Kaiming initializatingPaper3: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Kaiming He, explained here rectifier: rectified linear unit rectifier network: neural network with rectifier linear units This is kaiming init, and why suddenly replace one to two on a top? to avoid vanishing gradient(weights) But it doesn’t give very nice mean tough. 2. 1. 4 Pytorch package Why fan_out? according to pytorch documentation, choosing 'fan_in' preserves the magnitude of the variance of the wights in the forward pass. choosing 'fan_out' preserves the magnitues in the backward pass(, which means matmul; with transposed matrix) ➡️ in the other words, torch use fan_out cz pytorch transpose in linear transformaton. What about CNN in Pytorch?I tried torch. nn. 
Conv2d. conv2d_forward?? Jeremy digged into using torch. nn. modules. conv. _ConvNd. reset_parameters?? 2 in Pytorch, it doesn’t seem to be implemented kaiming init in right formula. so we should use our own operation. But actually, this has been discussed in Pytorch community before. 3 4 Jeremy said it enhanced variance also, so I sampled 100 times and counted better results. To make sure the shape seems sensible. check with assert. (remember we will replace 1 to 10 in cross entropy)assert model(x_valid). shape==torch. Size([x_valid. shape[0],1])>>> model(x_valid). shape(10000, 1) We have made Relu, init, linear, it seems we can forward pass code we need for basic architecture nh = 50def lin(x, w, b): return x@w + b;w1 = torch. randn(m,nh)*math. sqrt(2. /m ); b1 = torch. zeros(nh)w2 = torch. randn(nh,1); b2 = torch. zeros(1)def relu(x): return x. clamp_min(0. ) - 0. 5t1 = relu(lin(x_valid, w1, b1))def model(xb): l1 = lin(xb, w1, b1) l2 = relu(l1) l3 = lin(l2, w2, b2) return l32. 2 Loss function: MSE: Mean squared error need unit vector, so we remove unit axis. def mse(output, targ): return (output. squeeze(-1) - targ). pow(2). mean() In python, in case you remove axis, you use ‘squeeze’, or add axis use ‘unsqueeze’ torch. squeeze where code commonly broken. so, when you use squeeze, clarify dimension axis you want to removetmp = torch. tensor([1,1])tmp. squeeze()>>> tensor([1, 1]) make sure to make as float when you calculateBut why??? because it is tensor?{: style=”color:red; font-size: 130%;”} Here’s the error when I don’t transform the data type ---------------------------------------------------------------------------TypeError Traceback (most recent call last)<ipython-input-22-ae6009bef8b4> in <module>()----> 1 y_train = get_data()[1] # call data again 2 mse(preds, y_train)TypeError: 'map' object is not subscriptable This is forward passFootnote: Other materials: Understanding the difficulty of training deep feedforward neural networks, paper that introduced Xavier initialization Fixup Initialization: Residual Learning Without Normalization ↩ Pytorch implementaion on Kaiming init of conv and linear layers ↩ Pytorch kaiming init issue ↩ Pytorch kaiming init explained ↩ "
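To pin down the normalization rule stated at the top of this note, here is a minimal sketch (assuming `x_train`/`x_valid` tensors as in the notebook): the validation set is normalized with the training statistics on purpose.

~~~python
# Normalize with *training* statistics only; never let validation data
# leak into the statistics (sketch, assuming x_train/x_valid tensors).
def normalize(x, m, s): return (x - m) / s

train_mean, train_std = x_train.mean(), x_train.std()
x_train = normalize(x_train, train_mean, train_std)
x_valid = normalize(x_valid, train_mean, train_std)  # train stats on purpose
~~~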
}, {
- "id": 14,
+ "id": 16,
"url": "http://localhost:4000/2020/03/note08-fastai-2/",
"title": "What's inside Pytorch Operator?",
"body": "2020/03/01 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, RefactoringWhat’s inside Pytorch Operator?: Section02 Time comparison with pure Python: Matmul with broadcasting> 3194. 95 times faster Einstein summation> 16090. 91 times faster Pytorch’s operator> 49166. 67 times faster 1. Elementwise op: 1. 1 Frobenius norm: above converted into (m*m). sum(). sqrt() Plus, don’t suffer from mathmatical symbols. He also copy and paste that equations from wikipedia. and if you need latex form, download it from archive. 2. Elementwise Matmul: What is the meaning of elementwise? We do not calculate each component. But all of the component at once. Because, length of column of A and row of B are fixed. How much time we saved? So now that takes 1. 37ms. We have removed one line of code and it is a 178 times faster…#TODOI don’t know where the 5 from. but keep it. Maybe this is related with frobenius norm…?as a result, the code before for k in range(ac): c[i,j] += a[i,k] + b[k,j]the code after c[i,j] = (a[i,:] * b[:,j]). sum()To compare it (result betweet original and adjusted version) we use not test_eq but other function. The reason for this is that due to rounding errors from math operations, matrices may not be exactly the same. As a result, we want a function that will “is a equal to b within some tolerance” #exportdef near(a,b): return torch. allclose(a, b, rtol=1e-3, atol=1e-5)def test_near(a,b): test(a,b,near)test_near(t1, matmul(m1, m2))3. Broadcasting: Now, we will use the broadcasting and removec[i,j] = (a[i,:] * b[:,j]). sum() How it works?>>> a=tensor([[10,10,10], [20,20,20], [30,30,30]])>>> b=tensor([1,2,3,])>>> a,b (tensor([[10, 10, 10], [20, 20, 20], [30, 30, 30]]),tensor([1, 2, 3])) >>> a+btensor([[11, 12, 13], [21, 22, 23], [31, 32, 33]]) <Figure 2> demonstrated how array b is broadcasting(or copied but not occupy memory) to compatible with a. Refered from numpy_tutorial there is no loop, but it seems there is exactly the loop. This is not from jeremy (actually after a moment he cover it) but i wondered How to broadcast an array by columns? c=tensor([[1],[2],[3]])a+ctensor([[11, 11, 11], [22, 22, 22], [33, 33, 33]])s What is tensor. stride()?help(t. stride)Help on built-in function stride: stride(…) method of torch. Tensor instancestride(dim) -> tuple or intReturns the stride of :attr:’self’ tensor. Stride is the jump necessary to go from one element to the next one in the specified dimension :attr:’dim’. A tuple of all strides is returned when no argument is passed in. Otherwise, an integer value is returned as the stride in the particular dimension :attr:’dim’. Args: dim (int, optional): the desired dimension in which stride is requiredExample::* x = torch. tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])`x. stride()>>> (5, 1)x. stride(0)>>> 5x. stride(-1)>>> 1 unsqueeze & None index We can manipulate rank of tensor Special value ‘None’, which means please squeeze a new axis here== please broadcast herec = torch. tensor([10,20,30])c[None,:] in c, squeeze a new axis in here please. 2. 2 Matmul with broadcasting: for i in range(ar):# c[i,j] = (a[i,:]). *[:,j]. sum() #previous c[i] = (a[i]. unsqueeze(-1) * b). sum(dim=0) And Using None also (As howard teached)c[i] = (a[i ]. unsqueeze(-1) * b). sum(dim=0) #howardc[i] = (a[i][:,None] * b). sum(dim=0) # using Nonec[i] = (a[i,:,None]*b). 
sum(dim=0)⭐️Tips🌟 1) Anytime there’s a trailinng(final) colon in numpy or pytorch you can delete it ex) c[i, :] = c [i]2) any number of colon commas at the start, you can switch it with the single elipsis. ex) c[:,:,:,:,i] = c […,i] 2. 3 Broadcasting Rules: What if we tensor. size([1,3]) * tensor. size([3,1])? torch. Size([3, 3]) What is scale???? What if they are one array is times of the other array? ex) Image : 256 x 256 x 3Scale : 128 x 256 x 3Result: ? Why I did not inserted axis via None, but happened broadcasting? >>> c * c[:,None]tensor([[100. , 200. , 300. ], [200. , 400. , 600. ], [300. , 600. , 900. ]])maybe it broadcast cz following array has 3 rows as same principle, no matter what nature shape was, if we do the operation tensor broadcasts to the other. >>> c==c[None]tensor([[True, True, True]])>>> c[None]==c[None,:]tensor([[True, True, True]])>>>c[None,:]==ctensor([[True, True, True]])3. Einstein summation: Creates batch-wise, remove inner most loop, and replaced it with an elementwise producta. k. ac[i,j] += a[i,k] * b[k,j]inner most loop c[i,j] = (a[i,:] * b[:,j]). sum()elementwise product Because K is repeated so we do a dot product. And it is torch. Usage of einsum()1) transpose2) diagnalisation tracing3) batch-wise (matmul) … einstein summation notationdef matmul(a,b): return torch. einsum('ik,kj->ij', a, b)so after all, we are now 16000 times faster than Python. 4. Pytorch op: 49166. 67 times faster than pure python And we will use this matrix multiplication in Fully Connect forward, with some initialized parameters and ReLU. But before that, we need initialized parameters and ReLU, Footnote: TensorRank ti noteResources: Frobenius Norm Review Broadcasting Review (especially Rule) Refer colab! (I totally confused with extension of arrays) torch. allclose Review np. einsum Reviewh "
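Putting the three variants from this note side by side, here is a self-contained sketch on random data; the shapes are arbitrary, and near-equality is checked with the same tolerances as the test_near helper above.

~~~python
import torch
# Compare the broadcasting matmul, einsum, and PyTorch's operator
# on random data (sketch; shapes are arbitrary).
a, b = torch.randn(5, 3), torch.randn(3, 4)

def matmul_broadcast(a, b):
    c = torch.zeros(a.shape[0], b.shape[1])
    for i in range(a.shape[0]):
        # broadcast row i of a (as a column) against all of b, sum over k
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)
    return c

c1 = matmul_broadcast(a, b)
c2 = torch.einsum('ik,kj->ij', a, b)
c3 = a @ b
assert torch.allclose(c1, c3, rtol=1e-3, atol=1e-5)
assert torch.allclose(c2, c3, rtol=1e-3, atol=1e-5)
~~~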
}, {
- "id": 15,
+ "id": 17,
"url": "http://localhost:4000/2020/02/note08-fastai-1/",
"title": "What is the meaning of 'deep-learning from foundations?'",
"body": "2020/02/29 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring” Lecture 08 - Deep Learning From Foundations-part2 “ I don’t know if you read this article, but I heartily appreciate Rachael Thomas and Jeremy Howard for providing these priceless lectures for free Homework: Review concepts 16 concepts from Course 1 (lessons 1 - 7)(1) Affine Functions & non-linearities; 2) Parameters & activations; 3) Random initialization & transfer learning; 4) SGD, Momentum, Adam; 5) Convolutions; Batch-norm; 6) Dropout; 7) Data augmentation; 8) Weight decay; 9) Res/dense blocks; 10) Image classification and regression; 11)Embeddings; 12) Continuous & Categorical variables; 13) Collaborative filtering; 14) Language models; 15) NLP classification; 16) Segmentation; U-net; GANS) Make sure you understand broadcasting Read section 2. 2 in Delving Deep into Rectifiers Try to replicate as much of the notebooks as you can without peeking; when you get stuck, peek at the lesson notebook, but then close it and try to do it yourself calculus for machine learning based on weight… einsum conventionCONTENTS: What is going on in this course? What is ‘from foundations’? Steps to a basic modern CNN model Today’s implementation goal: 1) matmul -> 4) FC backward Library development using jupyter notebook jupyter notebook certainly can make module Elementwise ops How can we make python faster? What is element wise operation? FootnoteWhat is going on in this course?: What is ‘from foundations’?: 1) Recreate fast. ai and Pytorch 2) using pure python Evade OverfittingOverfit : validation error getting worsetraining loss < validation loss Know the name of the symbol you usefind in this page if you don’t know the symbol that you are using or just draw it here (run by ML!) Steps to a basic modern CNN model: 1) Matrix multiplication -> 2) Relu/Initialization -> 3) Fully-connected Forward-> 4) Fully-connected Backward -> 5) Train loop -> 6) Convolution-> 7) Optimization ->8) Batchnormalization -> 9) Resnet Today’s implementation goal: 1) matmul -> 4) FC backward: Library development using jupyter notebook: what is assers? jupyter notebook certainly can make module: There will be #export tag that Howard (and we) want to extract special notebook2script. py will detect sign of #expert and convert following into python module and test ittest\_eq(TEST,'test')test\_eq(TEST,'test1') what is run_notebook. py? when you want to test your module in command line interface !python run\_notebook. py 01_matmul. ipynb Is there any difference between 1) and 2)?1) test -> test01 2) test01 -> test #TODO I don’t know yet look into run_notebook. py, package fire Jeremy used. What is that?read and run the code in a notebook, and in the process, Jeremy made Python Fire library called!shockingly, fire takes any kind of function and converts into CLI command. fire library was released by Google open source, Thursday, March 2, 2017 Get data pytorch and numpy are pretty much same. variable c explains how many pixels there are in in MNIST, 28 pixels PyTorch’s view() method: torch function that manipulating tensor, and squeeze() in torch & mathmatical operation similar function Rao & McMahan said usually this functions result in feature vector. In part 1, you can use view function several times. 
Initial python model Which is Linear, like $Xw$(weight)$+a$(bias) $= Y$ If you don’t know hou to multiple matrix, refer this site matmul visulization site How many time spends if we we use pure python function matmul, typical matrix multiplication function, takes about 1 second for calculating 1 single train data! (maybe assumed stochastic, 5 data points in validation) it takes about 11. 36 hours to update parameters even single layer and 1 iteration! (if that was my computer, it would be 14 hours. . )🤪 THIS is why we need to consider ‘time’&’space’ This is kinda slow - what if we could speed it up by 50,000 times? Let’s try! Elementwise ops: How can we make python faster?: If we want to calculate faster, then do remove pythonic calcuation, by passing its computation down to something that is written something other than python, like pytorch. According to PyTorch doc it uses C++ (via ATen), so we are going to implement that function with python. What is element wise operation?: items makes a pair, operate corresponding componentFootnote: notebooks material video broadcasting excel"
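For concreteness, the pure-Python matmul whose slowness is being measured looks roughly like this; it is a sketch of the lesson notebook's triple-loop version, not a transcription of it.

~~~python
import torch
# The triple-loop matmul whose slowness motivates this whole section
# (a sketch of the notebook's pure-Python version).
def matmul(a, b):
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br                  # inner dimensions must match
    c = torch.zeros(ar, bc)
    for i in range(ar):              # rows of a
        for j in range(bc):          # columns of b
            for k in range(ac):      # inner dimension
                c[i, j] += a[i, k] * b[k, j]
    return c
~~~

Every `+=` here is a separate Python-level operation, which is exactly the overhead that broadcasting, einsum, and the native operator remove.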
}, {
- "id": 16,
+ "id": 18,
"url": "http://localhost:4000/2020/02/what-is-convolution/",
"title": "Digging into convolution",
"body": "2020/02/28 - Issues 1) Kaiming Initializtion in Pytorch was in trouble. 1 2) Jeremy started to dig in, in lesson09, but I didn’t know why the size of tensor is 2 and even understand this spreadsheet data. 3 Homework: Read Visualizing and Understanding Convolutional Networks paper What is a convolution? Visualization one kernel Matthew D Zeiler & Rob Fergus Paper Convolution can be represented as matmul Padding Kernel has rank 3 How can we find a side-edge, a gradient and area of constant weight? What is a convolution?: A convolutional neural network is that your red, green, and blue pixels go into the simple computation, and something comes out of that, and then the result of that goes into a second layer, and the result of that goes into the third layer and so forth. Visualization: one kernel Refer this site for visualizing CNN filteringMatthew D Zeiler & Rob Fergus PaperLecture01 Nine examples of the actual coefficients from the **first layer** Convolution can be represented as matmul: CNNs from different viewpoints {align-items: center;} [A B C D E F G H I J] is 3 by 3 image data flatten to vector. As a result, convolution is a just matrix just two things happens Some of entries are set to zeros at all the times same color always have the same weight. That called weight time / wegith sharing So, we can implement a convolution with matrix multiplication. But, we don’t do that because it’s slow!Padding: What most of libraries do is just put zeros asdie of matrix fast. ai uses reflection paddings (what is this? Jeremy said he uttered it)Kernel has rank 3: As standard picture input would be 4 5, it would be actually 3d, not 2d. If we make kernel as a 3x3 size, we pass over same kernel all the different Red, Green, Blue Pixels. This could make problem, because, if we want to detect frog, which is green, we would want more activations on the green(I made a test cell in my colab 6) How can we find a side-edge, a gradient and area of constant weight?: Not top-edge! One kernel can find only the top-edge, so we should stack the kernels 7 So, we pass it through bunch of kernels to the input images, and that process gives us height x width x corresponding number of kernels. Usually that number of chanel is 16 And if we want to get the more channels and features, we should repeat that process This process gives rise to memory out of control, we do the stride #### conv-example. xlsx 2 convolutional filters At a second layer, filter is 3x3x2 tensor, because to add up together the first layer’s channel. Reference: Problem was math. sqrt(5) was not kaiming initialization formula, Implementation in Pytorch ↩ size of tensor, lecture09 ↩ conv-example. xlsx ↩ Why do computer use red, green and blue instead of primary colors ↩ Grayscale is a group of shades without any visible color. … Each of these dots has its own brightness level as well and, therefore, can be converted to grayscale. A grayscale image is one with all color information removed. ↩ Testing RGB and grayscale ↩ stack kernel and make new rank of tensor at output, Lesson06-2019 ↩ "
}, {
- "id": 17,
+ "id": 19,
"url": "http://localhost:4000/2020/02/dps-week8/",
- "title": "Digital Product School week 8&9",
- "body": "2020/02/24 - The 8th week retropect at Digital Product School Week 8/9 - Ship your MVP/Release next iteration each day This week's schedule CONTENT: Preparing engineering weekly Agile Process Daily Stand-up Making application flowchart (feat draw. io) / ER diagram Flowchart, understaning user journey ER diagram Engineering weekly AI lunch Connecting firebase andPreparing engineering weekly: This week at Wednesday, I planned to explain the Language Modelings, mainly focusing ELMo, ULMFiT, BERT and GPT-2. Slides is available here Changed the presentation, because there were people who are not in ML domain. hereWhenever I do the presentation, I learn more than the information I give them. At the same time, I realize I need to learn more than I know. Agile Process: One of a priceless lesson I learnt from digital product school, was experience of doing agile work. Before I came here, it was a little bit vague concept. I’m not sure ‘what is agile’ but this is what we tried to make agile process. Daily Stand-up: Sharing the works everyday helps interdisciplinary team to work better. Since product started to get higher fidelity, the gap between engineer and non-engineer increased. Actually I didn’t planned to explain concept because I thougth I would be lose my audience when I start to explain. But as daily stand-up, which shares our progess, goes day by day, I planed and reported the issues. And it made each other’s topic feel more familiar. I think point is very important, because at that point people start to be curious. So we can actively ask to the others, and that momwnr, we can explain the point teammate dosen’t know. Each color means every different section. Red: Our team goal, Blue: Interaction designer, Green: Product manager, Yellow: Software/AI engineer This week engineer's main plan Each of us try to explain what we are doing, but things become easier when we are asked. Because we explained something was important to us before, but if we asked it is something important for the others. Making application flowchart (feat draw. io) / ER diagram: Before we start the party, we should clarify the flowchart and ER diagram of our application. Flowchart, understaning user journey: Thanks for google, we could use draw. io for our framechart framework. Actually, we cana choice other good flatform, but draw. io has connected app throgh google drive, most of our engineer was used to it. And after this job, I got to know there is also (of course) rule with the symbols, color, size, space, scaling and direction of arrow -reference. But why we should do this? WE have made our storymap before!! I think storymap is for visualize our status and app. So it should be shared with whole the team, and they should able to understand each role’s issue. But flowchart is more like testing technical feasibility, and error that user can experience. So it could be little more specific, complicated, and hypothetical. This week engineer's main plan ER diagram: Even if we use NoSQL database through firebase, my team was accustomed to SQL more. That what we educated when we were at college, so we had to organize our concept while we were learning NoSQL. Engineering weekly: Every engineering weekly we exchange our knowledge each other so that we can grow together. Before today, my AI collegues presented regression, knn and it was my turn. I prepared slide that explain about pre-trained language model, but my header advised me if I go deep of theoretical things, I would lose my audience. 
So I decided to brief BERT mode, how I can contribute to other team’s project. Since BERT was breakthrough of NLP industry, I tried to explain how it can be applied to hands on product and how it can help people in their product. The result was quite motivative to me. They gave feedback that since it wasn’t that much theoretical, they could enjoy it, and useful information. Someone asked me do I had learned of presentation before. I was really happy with their feedback! AI lunch: Connecting firebase and: "
+ "title": "My life in Digital Product School - week 8/19/10",
+ "body": "2020/02/24 - The 8/9/10th week retropect at Digital Product School Week 8 - Ship your MVPWeek 9/10 - Release next iteration each day Week 8th schedule CONTENT: Agile Product Development Daily Stand-up(planning) Gemba Walk Sprint Reviews Engineering weeklyAgile Product Development: One of a priceless lesson I learnt from digital product school, was experience of doing agile work. Before I came here, it was a little bit vague concept. I’m still not sure ‘what is agile’ but this is how we tried to make agile process. Daily Stand-up(planning): Sharing the works everyday helps interdisciplinary team to work better. Since product started to get higher fidelity, the gap between engineer and non-engineer increased. Actually I didn’t planned to explain concept because I thougth I would be lose my audience when I start to explain. But as daily stand-up, which shares our progess, goes day by day, I planed and reported the issues. And it made each other’s topic feel more familiar. I think point is very important, because at that point people start to be curious. So we can actively ask to the others, and that momwnr, we can explain the point teammate dosen’t know. Each color means every different section. Red: Our team goal, Blue: Interaction designer, Green: Product manager, Yellow: Software/AI engineer This week engineer's main plan Each of us try to explain what we are doing, but things become easier when we are asked. Because we explained something was important to us before, but if we asked it is something important for the others. Gemba Walk: Team Cero with core team Every 2 weeks, we do the Gemba work, which is ‘question everything to the core team’ time. At this period, people can ask anything related to our product, workshop, and framework. Core team will help just for each team, and each team can solve the problem related to their work. < br/>Why we need this session? because with workshop and general schedule, core team has no time just focus on each team. So through this session, we can have opportunity to understand each program and workshop, like why we are using this platform, and when is the due of our small project, and we have this problem and we need help for this. whatever small problem you have, core team is always willing to help you. Sprint Reviews: Every Friday, we have time to summarise what we did for the week. Maybe we need HMW question and our storymap to share our process and then tell and share what we did try, what point we succeeded and what point it was deviant of our prediction, and why we tried it. . Sprint of Ve-link And then, just after all team’s ppt, we do vote with such a cute marvel. Always it’s very difficult to vote (of course you can’t vote to your team!) Because it depends on criteria what do I value!But since this is process of our agile work, I try to focus on what they have changed since last week, and why they did it, how they did it. Engineering weekly: Every engineering weekly we exchange our knowledge each other so that we can grow together. Everyone have their knowledge to share and we can be tutor and at the same time can be of tutee. Previously, my AI collegues presented regression, knn. And because I’m somewhat specialized to NLP, I prepared slide that explain about pre-trained language model, but my header advised me if I go deep of theoretical things, I would lose my audience. So I decided to brief BERT mode, how I can contribute to other team’s project. 
Since BERT was a breakthrough in the NLP industry, I tried to explain how it can be applied to hands-on products and how it can help people with their own products. The result was quite motivating for me. They gave feedback that since it wasn't that theoretical, they could enjoy it and still take away useful information. Someone asked me whether I had studied presentation before. I was really happy with their feedback! "
}, {
- "id": 18,
+ "id": 20,
"url": "http://localhost:4000/2020/02/fast.ai-nlp-note-16/",
"title": "Algorithmic bias",
"body": "2020/02/20 - Algorithms can encode & magnify human bias Case Study 1: Facial Recognition & Predictive Policing: Joy Buolamwini & Timnit Gebru, gendershades. org Microsoft, FACE+, IBM - All of these things are sell now. Largest gap between $\therefore\ Lighter Male\ >\ Darker\ Female $ This US mayor joked cops should “mount . 50-caliber” guns where AI predicts crime With machine learning, with automation, there’s a 99% success, so that robot is ㅡwill beㅡ99% accurate in telling us what is going to happen next, which is really interesting. - city official in Lancater, CA, approving on using IBM for public security Bias: Bias is type of error Statistical Bias: difference between a statistic’s expected value and the true value Unjust Bias: disproportionate preference for or prejudice against a group Unconscious bias: bias that we don’t realize we have But, term bias is too generic to be productive. Different sources of bias have different causes Representation Bias: Dataset was not representative of the algorithm that might be used on later. Above : Data is okay, but algorithm has some problem. Below : Data has error. For example, object detection production that performs very well in common product of US. But in contrast, change of target product region, like Zimbabwe, Solomon Island, and so on, reduced the performence remarkably. It is not the algorithmic problem, so we should care about data volume of region. Evaluation Bias: Benchmark datasets spur on research, 4. 4% of IJB-A images are dark-skinned women. 2/3 of ImageNet images from the West (Sharkar et al, 2017) Case Study 2: Recidivism Algorithm Used Prison Sentencing: Case Study 3: Online Ad Delivery: Bias in NLP: ( Nothing to do with the course, but I’m researching this field these days. ) But all about Englsih ImpactThe person is doctor. The person is nurse -> 그는 의사다. 그녀는 간호사다. Concept of “biased data” often too generic to be useful: Different sources of bias have different sources Data, models and systems are not unchanging numbers on a screen. They’re the result of a complex process that starts with years of historical context and involves a series of choices and norms, from data measurement to model evaluation to human interpretation. - Harini Suresh, “The problem with Biased Data” Five Sources of Bias in ML: Representation Bias Evaluation Bias Measurement Bias Aggregation Bias(46:02) Historical Bias(46:26) A few studies(47:13) Racial Bias, Even when we have good intentions(new york times)(47:10) gender(48:59) Humans are biased, so why does algorithmic bias matter?: Algorithms & humans are used differently (humans are usually decision maker) Algorithms are accurate and objective No way to apeal if there if error processed large scale cheap Machine learning can amplify bias Machine learning can create feedback loops. Technology is power. And with that comes responsibility. Solutions: Analyze a project at work/school: Questions about AI 5 types of bias (Suresh & Guttag) Datasheets for datasets, Modelcards for model reporting Accuracy rate on different sub-groups Work with domain experts & those impacted Increase diversity in our workspace Advocate for good policy Be on the ongoing lookout for bias"
}, {
- "id": 19,
+ "id": 21,
"url": "http://localhost:4000/2020/02/classifier-city/",
"title": "Making a classifier with image dataset made from gooogle",
"body": "2020/02/15 - CONTENTS: Creating dataset from google images Using google_images_download Create ImageDataBunch Train model fit_one_cycle() Let’s find-tune Let’s train the whole model! Let’s make batch size bigger! Interpretation Model in productionCode can be found hereDeployed model here Making a classifier which can distinguish Seoul from Munich and Sanfrancisco!(hoping my well in Munich!) Creating dataset from google images: In machine learning, you always need data before you build your model. You can use either URLs or google_images_download package. Since Jeremy explained specifically, I will try the other. Using google_images_download: note: This is not google official package Refer to Official Doncument, put that arguments. from google_images_download import google_images_downloadresponse = google_images_download. googleimagesdownload() #class instantiationout_dir = os. path. abspath('. . /. . /materials/dataset/pkg/')os. mkdir(out_dir)arguments = { keywords : Cebu,Munich,Seoul , print_urls :True, suffix_keywords : city , output_directory :out_dir, type : photo , }paths = response. download(arguments) #passing the arguments to the functionprint(paths)and if you need, here is main code. Create ImageDataBunch: We need to separate validation set because we just grabbed these imagese from Google. Most of the dataset we use (kaggle/research) splited into train / validation / test so if they are not devided beforehand we should make databunch, and Jeremy recommended assign 20% to validation. Help on function verify_images in module fastai. vision. data:verify_images(path: Union[pathlib. Path, str], delete: bool = True, max_workers: int = 4, max_size: int = None, recurse: bool = False, dest: Union[pathlib. Path, str] = '. ', n_channels: int = 3, interp=2, ext: str = None, img_format: str = None, resume: bool = None, **kwargs) Check if the images in `path` aren't broken, maybe resize them and copy it in `dest`. Data from google image url Data from package Train model: len(class) len(train) len(valid) Data_url 3 432 108 Data_pkg 3 216 53 Uisng model: restnet34 1, Measurement: accuracy 2 fit_one_cycle(): What is fit one cycle? Cyclical Learning Rates for Training Neural Networks One of the way to find good learning rate. Core idea is to start with small learning rate (like 1e-4, 1e-3) and increase the learning rate after each mini-batch till loss starts exploding. And pick up learning rate one order lower than exploding point. For example, plotted learning rate is like below picture, picking up around 1e-2 is the best way. Why this methods Traditionally, the learning rate is decreased as the learning starts converging with time. But this paper suggests to cycle our learning rate, because it makes us avoid local minimum. Basically this cyclic method enables us to explore whole of loss function so that find out global minimum. In other words, higher learning rate behaves like regularisation. Let’s find-tune: Do train just one last layer by learning rate found by find_lr This section you should find the strongest downward slope that kind of sticking around for quite a while. And choose just one order lower than lowest point. As explained before, I will pick up 1e-2. And of course, this is fine-tuning, we don’t need discriminative learning rate yet. Let’s train the whole model!: link When you plot the learning rate again, maybe you will get soaring shape of learning rate. Rule of thumb, When you slice the learning rate, use learning rate you used at unfrozen part. 
Divide it by 5 or 10 and put it on maximum bound. At minimum bound, get the point just before it soared, and divide it by 10. Let’s make batch size bigger!: Since default batch size is 64, I tried it to 128. And it gets way more better result(even it’s still underfitting!) And if I freeze model and train whole model again, the model would be better. Also, you can use this method to the other big dataset model training! Interpretation: See the confusion matrix. Result is quite great. *Since I’m using colab, I will skip data cleansing. But I highly recommend you to use ImageCleaner widget, only if you are using jupyter notebook (not jupyter lab) Model in production: You can deploy your model in simple way. I referred fast. ai, and used render(it’s free for limited time). You can find detailed document here. and you can create a route like this. @app. route( /classify-url , methods=[ GET ])async def classify_url(request): bytes = await get_bytes(request. query_params[ url ]) img = open_image(BytesIO(bytes)) _,_,losses = learner. predict(img) return JSONResponse({ predictions : sorted( zip(cat_learner. data. classes, map(float, losses)), key=lambda p: p[1], reverse=True ) })You can find my deployed model here Reference: How to create a deep learning dataset using Google Images towardsdatascience - one cycle policy Deep Residual Learning for Image Recognition ↩ Accuracy_and_precision ↩ "
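For reference, a minimal end-to-end sketch of the training steps described above, using the fastai v1 API; the folder path is the hypothetical download directory from the google_images_download step:
~~~python
from fastai.vision import *

# Hypothetical folder produced by google_images_download, one subfolder per city
path = Path('../../materials/dataset/pkg/')

# Scraped data has no predefined split, so hold out 20% for validation
data = (ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
                                   ds_tfms=get_transforms(), size=224, bs=64)
        .normalize(imagenet_stats))

learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(4)                      # train the head first

learn.unfreeze()                            # then fine-tune the whole model
learn.lr_find()
learn.recorder.plot()                       # pick a rate below the steepest slope
learn.fit_one_cycle(2, max_lr=slice(1e-5, 1e-3))
~~~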
}, {
- "id": 20,
+ "id": 22,
"url": "http://localhost:4000/2020/02/dps-week5/",
"title": "Digital Product School week 5",
"body": "2020/02/09 - The 5th week retropect at Digital Product School Week 5 - Create a Storymap and sync it with Lean Canvas This week's schedule CONTENT: How to create our story map Prepare your story Discover your product’s AI potentialMondayHow to create our story map: We need this 'aha' moment There was a Milestone workshop, about our weekly goal. As we are agile working, we go fast and change every week’s goal. This week we will finalize our story map based on user’s pain-point and HMW questions. How should we make our story-map Basically we should make story map based on this rule Tell stories, don’t just write them! We always need context, that means all the story component should be connected Visualize your product to establish a shared understanding and speed up discussions! Post-it filled of text is not enough, we should fill it with visualizations then team mates can understand it fast Only discuss in front our your story map! (Speed) So we can update our story-map as soon as we change our opinion And also Use a story map to find the parts that matter most and to identify holes in your idea! Since the story map consists of techinical part, we should consider each story’s technical feasibility Minimise output, maximise outcome and impact! Build tests to figure out what’s minimum and what’s viable! This story map functions to find out our minimum value of ideas Work iteratively: Change your story map according to your learnings! We should repeat this process again and again PMs: Make sure Storymap is up to date!Prepare your story: team cero, our whole story map Our goal Technical feasibility of our storyWhat is your strategy to make user achieve something? This would be our expand point Discover your product’s AI potential: How can we apply AI to our product? Let’s write down our ‘HMW’ questions, and find out all p ossibilities. These are suggestion of possibilities, so don’t attached to feasibility (we will do in at lean start-up) Software section's expectation AI section's expectationTuesday Engineer's task, week5This 5th week, engineers settled WendesdayThursdayFriday"
}, {
- "id": 21,
+ "id": 23,
"url": "http://localhost:4000/2020/02/GPU-time/",
"title": "4 reasons took much time to setting GPU for fast.ai than I expected",
"body": "2020/02/05 - Motivation: Before now, me as a undergraduate student, I was parsimony who usually depend on colab, kaggle, friend’s server(occasional) whenever i need GPU. . And this time it’s been for a while to install GPU than I expected and I share the several component that stood in my way. Written at Oct 24 2019, if you think this is deprecated, please do not have a leap of faith. Just for the record, I’ve used Kaggle, Colab, GCP, Azure, EC2 as GPU cloud. 1. Did not know there is JupyterLab option in Google Cloud Platform. : At the first time when GCP came out, there was no AI Platform service. So from starting vm instance to launching jupyter and installing packages, I did all of the things myself. (and I learned 🤗) $ curl -O https://repo. continuum. io/archive/Anaconda3-5. 0. 1-Linux-x86_64. sh[Downloading conda in ssh] I created VM instance,selected zone, machine type and disk type. Then, define firewall rules and in ssh terminal, install jupyter and other packages. But you can do all of these things just using AI Platform. [AI Platform] I think it especially save your time if you are living in Asia-Pacific, which google doesn’t support not that much GPU resources. 2. Consider if the platform has limited resources in a region you live in. : I live in South Korea, East Asia, and it seems like this region has lots of limitation in GPU (except quite expensive AWS) And the Taiwan which was the only one region where I can launch my own VM with GPU (I tried all the other regions in the list) sometimes do normaly, but not always. 😥After launching, I did several works and next day I could not start VM. (I didn’t count it, but tried it a few hours because I didn’t want cost any more time…) Endlessly failed to start instance, then I choose to move AWS as an alternative way. 3. Fast. ai gives deliberate guide and I didn’t know it. : Fast. ai offer the guide for all available platform. (Colab, salamander, Gradient, Kaggle, Colab, and so on) It is so important, and really needs, because cloud computing options are vary as occasion and purpose arise. I didn’t know fast. ai has manual to running GCP, and I think it’s as good a reason as any for me to be have taken time. It helped me so much when I had aws and shortened my time. I don’t want to read all of the manual in amazno. . (It is recommended. . but I’d rather read GIT PRO now…) ssh -i ~/. ssh/<your_private_key_pair> -L localhost:8888:localhost:8888 ubuntu@<your instance IP>4. You should wait to add more volume just after add volume, by building AWS EC2. : Since Elastic Block Store(EBS) storage supports optimized storage, users can’t extend storage volume two times in a row. Unfortunately, at the first time, I didn’t know it (again 👻) and when VM lacked volume, I doubled dist capacity (76*2) at a rough but It needs more. <!– this time I installed GPU in two years, and it became little complicated compared to 2 years ago. And this time for the first time(maybe not the first time. . but i handled it in my class or with my friend. but it’s my first time on my own. ) I very I’m started to using used google colab, kaggleand, GCP-JupyterLab, ec2 - friend made, aws vm machine but I had a environment variable but i did not know of it. On these days, I could not get a resources from taiwan… I couldn’t notice a deliberate Anyway, as a result I tried myself gcp myself and aws ec2 with fast. 
ai But I think doing on my self surely takes much time (in this point I wonder why I’m doing this, and should remind me, especially I was studying disk volume optimization) disk volume exceed - https://askubuntu. com/questions/919748/no-space-left-on-device-even-though-there-is: "
}, {
- "id": 22,
+ "id": 24,
"url": "http://localhost:4000/2020/02/dps-week4/",
"title": "Digital Product School week 4",
"body": "2020/02/01 - The 4th week retropect at Digital Product School Week 4 - Find solution ideas and run experiments [This week’s schedule] CONTENT: Ideation Techniques What is ideation techniques? Generating idea in my team AIdeation Team brain storming of idea Die Produkt MacherMondayIdeation Techniques: [slides from @steffen] What is ideation techniques?: We tried to find out user’s painpoint last week. Tried to users talk about their, pain point. No question directly, but extract from them their pain with transportation. Generating idea in my team: AIdeation: TuesdayTeam brain storming of idea: Based on generated idea on Monday, we extended our idea doing rolling-paper! Die Produkt Macher: What is lean start-up? Lean startup is a methodology for developing businesses and products that aims to shorten product development cycles and rapidly discover if a proposed business model is viable; this is achieved by adopting a combination of business-hypothesis-driven experimentation, iterative product releases, and validated learning. - wikipedia WendesdayThursdayFriday"
}, {
- "id": 23,
+ "id": 25,
"url": "http://localhost:4000/2020/01/retrosprect-of-acl-paper-2020/",
"title": "Retrospect of ACL 2020 paper writing",
"body": "2020/01/29 - 2020 Annual Conference of the Association for Computational Linguistics Why I can’t use ‘Cebuano’ for the research?: Why I had to change target language from ‘Cebuano’ to ‘Tagalog’?-> No language translator options except google translation. But before knowing that I already consult my friend, whose mother tongue is English. So I had to aplogize her, but couldn’t tell her why suddenly I changed my plan. -> I realized there are many languages even can’t be researched at all. . -> Getting accustomed to discrimination makes misunderstanding, sometimes. At my country, we couldn’t use music streaming service, because of legal problem. But at that moment, I thought it was discrimination, which is done by music company. "
}, {
- "id": 24,
+ "id": 26,
"url": "http://localhost:4000/2020/01/Git-Merge/",
"title": "Why am I not listed as a contributor?!",
"body": "2020/01/10 - From the end of last year, big changes have witnessed in NLP research. Embracing an unprecedented growth, I started to study new exciting results and advances. In doing so, I noticed I’m not listed as contributor of repo which my PR accessed. How did I come to a repository?: When I’m stuck, I would prefer to code, than to go deep in theory. (It must be so. . too much to understand 🤒)It was BERT released by Google AI I felt keenly the necessity of implementing, because not only couldn’t understand the way they figured out positional encoding formula, but how it actually works. What does it mean to “scale” dot product in Attention? (Now I know it’s far from my section 😂) Figure 1. Scaled Dot Product. Adopted from tensorflow blogWhat was the code error?: For implement code in paper, I read the papers Transformer and BERT, structured the model, and refered the others’ code. Meanwhile, I found out a small error in tokenization process, which was changing a token into [MASK], enabled bidirectional representation. I’ve made PR, and got merged. But I was not in contributors. Why?: Figure 2. Merged Pull request Adopted from graykode projectActually I happened to know there can be couple of reasons github doesn’t include my name as contributor. Well, if contributors tab has more than 100 people, in which case it shows you up only if you are in the top 100 contributors because displaying too many contributors can make webpages down. Somethimes, however, it doesn’t that problem. Why not? Two possibilities are there. First, According to Joel-Glovier, if repository maintainer merged-as-a-rebase PR will end up showing as maintainer’s commit. But maintainer shouldn’t normally do this. Second, if you happend to commit using a different git email that what is in your GitHub profile, it will not be attached to your Github user, and “doesn’t show up” as you. Reference: Michał Chromiak’s blog Github: why are my contributions are not showing on my profile atlassian-gitfetch"
}, {
- "id": 25,
- "url": "http://localhost:4000/2019/12/lesson1-fastai/",
- "title": "Fine Grained Classification",
- "body": "2019/12/31 - Finally you can solve the mystery behind this weird drawing. . through this course. juptyer notebook magic: %reload_ext autoreload%autoreload 2%matplotlib inlinethis is special directives to jupyter notebook, not python code. And it is called ‘magics’ (but i think jeremy is magicion) If somebody changes underlying library code while I’m running this, please reload it automatically If somebody asks to plot something, then please plot it here in this Jupyter NotebookDon’t hesitate to import start~ Digging into untar_data, path. ls: Union[pathlib. Path, str]: typed programming language? -> maybe i think disclaim the type beforehand for sure. Q. like assert? path. ls()this is some module that fast. ai made because os. listdir(‘path’) is unconvinient. Python3 pathlib library!: pathlib "
- }, {
- "id": 26,
+ "id": 27,
"url": "http://localhost:4000/2019/12/jeremy-howard/",
"title": "Jeremy Howard",
"body": "2019/12/15 - This is journey to find out ‘who am I trying to be?’: How he impacted me? The person who made me start Computer Vision again. He emphasized the importance of studying NLP and Computer together to understand the deep-learning. He didn’t order it to study, but always he pursuade me with reasonable way. “It’s not just something I can throw away. NLP and computer vision a few weeks apart and that’s going to force your brain to realize like ‘oh I have to remember this’” He made me admit my failure in deep-learning. I started to objectify where am I. What should I do when I’m frustrated. “Keep going. You’re not expected to remember everything. Yet. You’re not expected to understand everything. Yet. You’re not expected to know why everything works. Yet. ” His articles are numerous, below. What is torch. nn Really? High Performance Numeric Programming with Swift: Explorations and Reflections C++11, random distributions, and Swift And especially, I like this book. Designing great data products Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing. Designing great data products And he is also famous for words. Here are some. we’re going to try and use that to really understand what’s going on. So to warn you, none of it is rocket science but a lot of its going to look really new. So don’t expect to get it the first time but expect to listen and jump into the notebook try a few things test things out look particularly at like tensor shapes and inputs and outputs to check your understanding then go back and listen again. But and kind of try it, a few times, because you will get there right, it’s just that there’s going to be a lot of new concepts because we haven’t done that much stuff in pure Pytorch. Lesson 6: Deep Learning 2019 "
}, {
- "id": 27,
+ "id": 28,
"url": "http://localhost:4000/2019/11/julia-evans/",
"title": "Julia Evans",
"body": "2019/11/20 - This is journey to find out ‘who am I trying to be?’: The women who surprised me in many ways. First, she approached me to teaching some concepts drawing cartoons. It was at Hackers news, which was hightest ranks. Personally I have the use of not to reading title, so and cartoon was so cute and clear. I naturally gonna understood mechanism and astonished by her explaination ability. Her value, which she was taught by many people so want to do same things, moved me. Volume of her knowledge, that just reading post title is a deal of work, amazed me. "
}, {
- "id": 28,
+ "id": 29,
"url": "http://localhost:4000/2019/11/coc-retropective/",
"title": "Retrospective on Pycon 2019 Korea (CoC Committee)",
"body": "2019/11/05 - When I was volunteer, it seems like busy and hectic to managing that crowded conference. In my experience, to get things moving, it needs hierarchy. But it didn’t. Organizers emphasized our responsibility, and if I passed each other’s burden, It could be my burden next time. In solidarity of the obligation, we finished conference well. And after participating PyCon Korea 2018 as volunteer, I’ve joined PyCon Korea Organizer last year. <Figure 1> First meeting of PyCon 2019 Korea Organizers It’s been a while since PyCon 2019 finished. It’s held on Aug 15 - 18, at Coex Grand Balloom <Figure 2> Ongoing session, speaking on news comment processing <Figure 3> Sponsor Booth iin Coex Hall <Figure 4> After PyCon 2019, with all of volunteer, organizer, speakers 😍 🥰 Serving as part of the coc TF, I spent large fraction of last year doing CoC job. here’s the path what we’ve been grappled with to grasp a solution. First half: Before the conference Toward Diverse Community: Formally we’ve been reusing and modifying PyCon US CoC, but we needed fit in Korean and I was part of that to revise code of conduct. Except ‘That’ Diversity, Because it is ‘Harassment’: Specific point was harassment, and the others were not. process of finding the points. How can we settle this point?Second half: During the conference Handling the potential Harassment: Disjunction of policy and real-time situation: This ‘PyCon 2019 Korea retrospective series’ would be devided into 3 Episodes. “Retrospective on Pycon 2019 Korea (CoC Committee)” “Retrospective on Pycon 2019 Korea (Program Chair)” (20 Nov, To Be Update) “Maintaining participation while still making timely decisions” (29 Nov, To Be Update)"
}, {
- "id": 29,
+ "id": 30,
"url": "http://localhost:4000/2019/11/elif-shafak/",
"title": "Elif Shafak",
"body": "2019/11/05 - This is journey to find out ‘who am I trying to be?’: For creative-minded people, Istanbul is a treasure. ’ Photo © Chris Boland, licensed under CC BY-NC-ND 2. 0 it suddenly felt like what I was trying to convey was more complicated and detailed than what the circumstances allowed me to say. And I did what I usually do in similar situations: I stammered, I shut down, and I stopped talking. I stopped talking because the truth was complicated, even though I knew, deep within, that one should never, ever remain silent for fear of complexity. <Figure 1> Elif Shafak Photo credit: www. elifsafak. com. tr I want to talk about emotions and the need to boost our emotional intelligence. I think it’s a pity that mainstream political theory pays very little attention to emotions. Oftentimes, analysts and experts are so busy with data and metrics that they seem to forget those things in life that are difficult to measure and perhaps impossible to cluster under statistical models. But I think this is a mistake, for two main reasons. We are emotional beings. I think it’s going to be one of our biggest intellectual challenges, because our political systems are replete with emotions. In country after country, we have seen illiberal politicians exploiting these emotions. And yet within the academia and among the intelligentsia, we are yet to take emotions seriously. I think we should. 1 2 Reference: British Council Worldwide ↩ Ted Talk ↩ "
}, {
- "id": 30,
+ "id": 31,
"url": "http://localhost:4000/2019/01/dps-week1/",
"title": "Digital Product School week 1",
"body": "2019/01/11 - The 1th week retropect at Digital Product School [This week’s schedule] CONTENT: Welcome to Digital Product School! Trip to Spitzingsee Welcome to Design Office Specifying our goal of product Welcome to Digital Product School!: Trip to Spitzingsee: At the first day of Digital Product School, we had a off-site with all of batch 9 people. All the costs were managed by dps. At the beautiful mountain, we settled the team, and got my team goal. Basically, there are two kind of team in DPS. (1) Wild team - the team has fixed topic(2) Company team - the team which has specific stakeholders, and also topic defined by that stakeholders The Core-team will fix what team you will join in DPS for 3 months based on ymy professionals, they announce it at off-site. [My team for 3 months at DPS] And we decide on my batch #9 theme song. How? Each team draw for songs and pitch ‘why this song should be batch #9 theme song’The result? Imagine dragon - Believer (I didn’t know at the moment, this song would be stamped in my memory) We have a workshop for getting to know each other. For example, we share 1) what do I expect from 3 months of dps, 2) when I feel happy in my life time, 3) what I worked for last week, 4) what was my last project and 5) what plays important role in my life My team's board Cero Welcome to Design Office: At first day of design office, we had workshop, which celebrates my day in dps also discuss specific rule, menifesto and stakeholders We get sticker and attach it in map depends on my nationality Now time to get to know my team’s stakeholders. What they want for us? What they expect from us? How free my team are on the topic?To be honest, it is endless tug-of-war. We should discuss with my stakeholders, endlessly, and find out solution which can meet interest of users, stakeholders and my team. Basically, my team’s main stakeholder is ADAC, but BMW, City of munich and Nokia will also participate as my team’s stakeholders. Specifying our goal of product: "
diff --git a/_site/2020/02/classifier-city/index.html b/_site/2020/02/classifier-city/index.html
index 97866cc76a..b9d18103eb 100644
--- a/_site/2020/02/classifier-city/index.html
+++ b/_site/2020/02/classifier-city/index.html
@@ -19,9 +19,9 @@
-
+
+{"description":"CONTENTS","author":{"@type":"Person","name":"dionne"},"@type":"BlogPosting","url":"http://localhost:4000/2020/02/classifier-city/","publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"http://localhost:4000/assets/images/logo.png"},"name":"dionne"},"image":"http://localhost:4000/assets/images/munich2.jpg","headline":"Making a classifier with image dataset made from gooogle","dateModified":"2020-02-15T00:00:00+09:00","datePublished":"2020-02-15T00:00:00+09:00","mainEntityOfPage":{"@type":"WebPage","@id":"http://localhost:4000/2020/02/classifier-city/"},"@context":"http://schema.org"}
@@ -161,96 +161,101 @@
"body": " {% if page. url == / %} {% assign latest_post = site. posts[0] %} <div class= topfirstimage style= background-image: url({% if latest_post. image contains :// %}{{ latest_post. image }}{% else %} {{site. baseurl}}/{{ latest_post. image}}{% endif %}); height: 200px; background-size: cover; background-repeat: no-repeat; ></div> {{ latest_post. title }} : {{ latest_post. excerpt | strip_html | strip_newlines | truncate: 136 }} In {% for category in latest_post. categories %} {{ category }}, {% endfor %} {{ latest_post. date | date: '%b %d, %Y' }} {%- assign second_post = site. posts[1] -%} {% if second_post. image %} <img class= w-100 src= {% if second_post. image contains :// %}{{ second_post. image }}{% else %}{{ second_post. image | absolute_url }}{% endif %} alt= {{ second_post. title }} > {% endif %} {{ second_post. title }} : In {% for category in second_post. categories %} {{ category }}, {% endfor %} {{ second_post. date | date: '%b %d, %Y' }} {%- assign third_post = site. posts[2] -%} {% if third_post. image %} <img class= w-100 src= {% if third_post. image contains :// %}{{ third_post. image }}{% else %}{{site. baseurl}}/{{ third_post. image }}{% endif %} alt= {{ third_post. title }} > {% endif %} {{ third_post. title }} : In {% for category in third_post. categories %} {{ category }}, {% endfor %} {{ third_post. date | date: '%b %d, %Y' }} {%- assign fourth_post = site. posts[3] -%} {% if fourth_post. image %} <img class= w-100 src= {% if fourth_post. image contains :// %}{{ fourth_post. image }}{% else %}{{site. baseurl}}/{{ fourth_post. image }}{% endif %} alt= {{ fourth_post. title }} > {% endif %} {{ fourth_post. title }} : In {% for category in fourth_post. categories %} {{ category }}, {% endfor %} {{ fourth_post. date | date: '%b %d, %Y' }} {% for post in site. posts %} {% if post. tags contains sticky %} {{post. title}} {{ post. excerpt | strip_html | strip_newlines | truncate: 136 }} Read More {% endif %}{% endfor %} {% endif %} All Stories: {% for post in paginator. posts %} {% include main-loop-card. html %} {% endfor %} {% if paginator. total_pages > 1 %} {% if paginator. previous_page %} « Prev {% else %} « {% endif %} {% for page in (1. . paginator. total_pages) %} {% if page == paginator. page %} {{ page }} {% elsif page == 1 %} {{ page }} {% else %} {{ page }} {% endif %} {% endfor %} {% if paginator. next_page %} Next » {% else %} » {% endif %} {% endif %} {% include sidebar-featured. html %} "
}, {
"id": 12,
+ "url": "http://localhost:4000/2020/04/v3-2019-lesson06-note/",
+ "title": "fastai 2019 course-v3 Part1, lesson06",
+ "body": "2020/04/15 - Lesson 06Rossmann(Tabular): Tabular data: be careful on Categorical variable vs Continuous variable. if datatype is int, fastai think it is classification, not a regression. Root mean square percentage error. as loss function. When you assign the y_range, it’s better to assign little bit more than actual maximum. > because it’s sigmoid. intermediate layers, which is weight matrix is 1) 1000, and 2) 500 -> which means our parameter would be 500*1000. learn. modelWhat is dropout and embedding dropout?: Nitish Srivastava, Dropout: A Simple way to prevent Neural Networks from Overfitting you can dropout with p value, make it specified to specific layer, or make it applied to all the layers. Pytorch code 1) bernoulli, which decides whether you will hold it? 2) and divide the noise value depends on noise value. so noise became 2 or remain 0. According to pytorch code, We do change at training time, but we do nothing at test time. and this means you don’t have to do anything special with inference time. ’ TODO: find at forums what is inference time - Related to NVIDIA, GPU. Embedding dropout is just a dropout. It’s different between continuous variable and embedding layer. TODO Still can’t understand. why embedding dropout is effective. or,… in need. Let’s delete at random, some of the results of the embedding. and It worked well especially at Kaggle Batch Normalization: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift -> came out false! According to How Does Batch Normalization Help Optimization? The key was multiplicative bias {\gamma} and additive bias {\beta}` Explain Let $$ \hat{y} = f(w_1, w_2, w_3, … , x)} $$ , loss = MSE , Then y_range should be between 1 and 5` And Activation function ends with -1 -> +1 To mitigate this problem, we can add the other parameter, like $$w_n$$ But there’re so much interactions in the process so just re-scale the output. Momentum parameter at BatchNorm1d: Different from momentum like in optimization. This momentum is Exponentially weighted moving average of the mean, instead of deviation. If this is small number: mean standard deviation would be less from mini_batch to mini_batch » less regularization effect. (If this is large number, variation would be greater from mini_batch to mini_batch » more regularization effect) TODO: can’t sure, but i understand, this is not about how to update parameter but about how much reflect previous value when scale and shift Q. Preference between batchnorm and the other regularizations(drop out, weight decay)A. Nope, always try and see the results## lesson6-pets-more### Data Augmentation- Last reg- `get_transforms` has lots of params (even not yet learned all) -> check documentation - Remember you can implement all the doc contents bc it's made from nbdev - TODO: try this!!- Essence of data augmentation is you should maintain the label, while somewhat making sense. - ex) tilt, because it's optically sensible, you can always change the angle of the data view. - zeros, border, and reflection but always `reflection` works most of the time, so that is the default### Convolutional Kernel(What is convolution?)- Will make heat\_map from scratch, which means the parts convolution focuses on![setosa_visualization]()- http://setosa. io/ev/image-kernels/ - javascript thing - How convolution works - Kernel. which does element-wise multiplication, and sum them up - so it has on pixel less at borders -> so it uses padding, and fastai uses reflection as said. 
- why this Kernel(matrix) helps catching horizontal edge side? - because this kernel`(picture2)` weights differently, depends on `x axis` - why familiar, because it's similar intuition with fugus`(paper)` paper- CNN from different viewpoints`link` - output of pixel is results from different linear equations. - If you connect this with represents of neural network nodes, you can see that the specific inp nodes connected with specific out nodes. - **Summarize**: cnn does 1) matmul some of the elements are always zero 2) same weight for every row, which is called `weight time? weight. . ?, 1:18:50` `(picture)`#### Further lowdown- Because generally image has 3 channels, we need rank 3 kernel. - And **do multiply with all channel output is one pixel**. (`draw by your self`) - but this kernel will catch one feature, like horizontal, so that we make more kernel so that output becomes (h * w * kernel) - And that `kernel` come to `channel`- **Conv2d**: with 3 by 3 kernel, stride 2 conv -> (h/2 * w/2 * kernel) - skip or jump over input pixel - to protect from memory out of control~~~pythonlearn. modellearn. summary()~~~TODO: understand yourself the blocks of conv-kernel: - Usually use big kernel size at first layer (will study this at part2)- Bottom right highlighting kernel(`pic / draw`)- `torch. tensor. expand`: for memory efficient, because we should do RGB- We do not make separate kernel, but make rank 4 kernel - 4d tensor is just stacked kernel- `t[None]. shape` create new unit axis, and why? we make this -> it should move unit of batch, not one size image. ### Average pooling, feature- suppose our pre-trained model results in size of `11 by 11 by 512 ` `pic 4` and my classification task has 37 classes * take the first face of channel, which is 11 by 11 and `mean` it, so that make rank 2 tensor, 512 by 1 * and make 2d matrix, which is 512 by 37 and multiply so that we can get 37 by 1 matrix. - Feature, at convolution block - So, when we transfer-learning without unfreeze, every element of last matrix (512 by 1) should represent(or could catch) each feature. ### Heatmap, Hook~~~hook_output(model[0]) -> acts -> avg_acts~~~- if we average the block with `axis=feature`, result of matrix(11 by 11) depicts `how activated was that area?` -> it is heatmap, `avg_acts`- and acts comes from hook, which is more advanced pytorch feature. - hook into pytorch machine itself, and run any arbitrary Pytorch code - Why this is cool?: Normally it gives set of outputs of forward pass, but we can interrupt and hook the forward pass. - Also can store the output of the convolutional part of the model, which is before avg_pooling- Thinking back when we do cut off `after` the conv part. - but with fast. ai the original convolutional part of the model would be *the first thing in the model*, specifically could be given from `learn. model. eval()[0]` - And this is gotten from `hooked_output` and having hooked the output, we can pass our x_minibatch to output. - Not directly, but with normalized, minibatch, put on to the gpu - `one_item()` function do it, when we have one data `TODO: this is assignment` do it yourself without one_item function - and `. cuda()` put it on gpu- you should print out very often the shape of tensor, and try think why. "
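A minimal sketch of the train-vs-inference dropout behaviour described above, in plain PyTorch and independent of the lesson's code:
~~~python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()
print(drop(x))  # roughly half the entries zeroed, survivors scaled to 1/(1-p) = 2.0

drop.eval()
print(drop(x))  # identity: nothing special is needed at inference time
~~~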
+ }, {
+ "id": 13,
+ "url": "http://localhost:4000/2020/04/qna-image-segmentation/",
+ "title": "[Q&A] Image Segmentation, using Unet with Driving Video data",
+ "body": "2020/04/02 - This post is about my questions while I was studying USF Deep Learning course about image segmentation task. All the answers are from the course, source code, library document, or document. I cared about being clear at reporting information including source of information, however if there are still anything unclear, please contact me. And thank you Jeremy&Rachael for everything. Also Thank you Cambridge Computer Vision Lab to made us to study with your labor. The Cambridge-driving Labeled Video Database (CamVid) is the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes. If someone is interested in this project, please check the site and see the details. Now, let’s start first using jupyter’s one of tricks which I love most. It enables cell to print the code without print function. from IPython. core. interactiveshell import InteractiveShell# pretty print all cell's output and not just the last oneInteractiveShell. ast_node_interactivity = all from fastai. vision import *from fastai. callbacks. hooks import *from fastai. utils. mem import *path = untar_data(URLs. CAMVID) # The locations where the data and models are downloaded are set in config. ymlpath. ls() I’m trying to accustomed to using pathlib module, not just it became built-in module in python, but I felt uncomfortable myself with os module. However, still unpredictable conflicts are remain, even in the quite standard library like Pytorch, tensorflow, onnx. (it require me string for path. not PosixPath. will send PR. . ) [PosixPath('/root/. fastai/data/camvid/valid. txt'), PosixPath('/root/. fastai/data/camvid/images'), PosixPath('/root/. fastai/data/camvid/labels'), PosixPath('/root/. fastai/data/camvid/codes. txt')]path_img = path/'images'path_lbl = path/'labels'fnames = get_image_files(path_img) #filenamelbl_names = get_image_files(path_lbl)1. (Play with data) My Hypothesis: File name has A_B format. and A / B would be at key-value position. Use collections - defaultdict Default Dict: Link: easy to group a sequence of key and value pairs into a dictionary of list?from collections import defaultdictfnames[0], lbl_names[0](PosixPath('/root/. fastai/data/camvid/images/0001TP_009210. png'), PosixPath('/root/. fastai/data/camvid/labels/0016E5_01800_P. png'))files = [tuple(i. stem. split('_')) for i in fnames]labels = [tuple(i. stem. split('_')[:-1]) for i in lbl_names]d = defaultdict(list)for k, v in files: d[k]. append(v)d. keys()len(d['0001TP'])124for k, v in d. 
items(): print(k, v) (this prints each video-sequence key with its full list of frame ids; the long lists are omitted here, and the counts below summarize them) for k, v in d.items(): print(k, len(d[k])) 0001TP 124 0016E5 305 Seq05VD 171 0006R0 101 for i in d2.keys(): print(i, len(d2[i])) 0016E5 305 0001TP 124 0006R0 101 Seq05VD 171 files[0], labels[0] (('0001TP', '009210'), ('0016E5', '01800')) 2. My question: Link: Why do we need masking? And does the color come from the fastai library?
(have to look into the source code) What does the parameter alpha do? When people make a masked img, does it have a ranged integer limit? Is image normalization related to this? lbl_sorted = sorted(lbl_names) f_sorted = sorted(fnames) lbl_1 = lbl_sorted[33] f_1 = f_sorted[33] img = open_image(lbl_1) mask = open_mask(lbl_1) _,axs = plt.subplots(1,2, figsize=(10,5)) img.show(ax=axs[0], title='1') mask.show(ax=axs[1], title='2', alpha=1.) img_2 = open_image(f_1) mask_2 = open_mask(f_1) _,axs = plt.subplots(1,2, figsize=(10,5)) img_2.show(ax=axs[0], title='3') mask_2.show(ax=axs[1], title='4', alpha=1.) open_mask(lbl_1).data.shape torch.Size([1, 720, 960]) open_image(f_1).data.shape torch.Size([3, 720, 960]) img.data # the label file opened as an image: 3 channels of floats in [0, 1] (values like 0.0157, 0.0824, 0.1176; long dump omitted) mask.data # the same file opened as a mask: a single channel of integer class codes, tensor([[[ 4, 4, 4, ..., 21, 21, 21], ..., [17, 17, 17, ..., 30, 30, 30]]]) img_2.data, mask_2.data # the raw photo as floats in [0, 1], and the photo wrongly opened as a mask: integers in 0-255 (long dumps omitted) 3. What is the difference between Image and ImageSegment?: ImageSegment An ImageSegment object has the same properties as an Image. The only difference is that when applying transformations to an ImageSegment, it will ignore the functions that deal with lighting and keep values of 0 and 1. It's easy to show the segmentation mask over the associated Image by using the y argument of show_image. img = open_image(fnames[0]) mask = open_mask(lbl_names[0]) _,axs = plt.subplots(1,3, figsize=(8,4)) img.show(ax=axs[0], title='no mask') img.show(ax=axs[1], y=mask, title='masked') # seg mask over the img using the y arg mask.show(ax=axs[2], title='mask only', alpha=1.) vision.image 4. Why/how is img divided by 255 and how does fast.ai do it: vision.image - If div=True, pixel values are divided by 255 to become floats between 0. and 1. At times, you want to get rid of distortions caused by lights and shadows in an image. Normalizing the RGB values of an image can at times be a simple and effective way of achieving this: the sum of a pixel's values over all channels (S = R+G+B) divides each channel, so the normalized values are R/S, G/S and B/S. Detailed explanation here 5. Python evaluation order: Python evaluates expressions from left to right. Notice that while evaluating an assignment, the right-hand side is evaluated before the left-hand side. mask_tmp, trg_tmp, void_tmp = 2, 1, 10 mask_tmp = trg_tmp != void_tmp print(mask_tmp, trg_tmp, void_tmp) # (1) target is not the same as void True 1 10 # Example 1 x = 1 y = 2 x,y = y,x+y x, y (2, 3) # Example 2 x = 1 y = 2 x = y y = x+y x, y (2, 4) 6. Model learner parameter: pct_start: A: Percentage of the total number of epochs when the learning rate rises during one cycle. Q: Sorry, I'm still confused: one cycle in the new API only runs one epoch, so how does the percentage of the total number of epochs work? Can you give an example, e.g. learn.fit_one_cycle(10, slice(1e-4,1e-3,1e-2), pct_start=0.05)? A: OK, the strictly correct answer would be percentage of iterations, so the lr can both increase and decrease during the same epoch. In your example, say you have 100 iterations per epoch; then for half an epoch (0.05 * (10 * 100) = 50 iterations) the lr will rise, then slowly decrease. Q2: Thanks for this explanation… so essentially, it is the percentage of overall iterations where the LR is increasing, correct? So, given the default of 0.3, your LR goes up for 30% of your iterations and then decreases over the last 70%. Is that a correct summation of what is happening? A2: Yes, I think that's correct. You can verify that by changing its value and checking learn.recorder.plot_lr(), for example with pct_start = 0.2. source: forums.fastai "
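To see the pct_start effect concretely, a small sketch using the fastai v1 API; MNIST_SAMPLE is just a convenient stand-in dataset, my own assumption rather than part of the Q&A:
~~~python
from fastai.vision import *

# Any Learner works; MNIST_SAMPLE is small enough to run quickly
path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path)
learn = cnn_learner(data, models.resnet18, metrics=accuracy)

# LR rises for the first 20% of all iterations, then anneals for the rest
learn.fit_one_cycle(1, max_lr=1e-3, pct_start=0.2)
learn.recorder.plot_lr()
~~~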
+ }, {
+ "id": 14,
"url": "http://localhost:4000/2020/03/note08-fastai-4/",
"title": "Gradient backward, Chain Rule, Refactoring",
- "body": "2020/03/02 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring” Lecture 08 - Deep Learning From Foundations-part2 “ Homework: calculus for machine learning einsum conventionCONTENTS: Foundation version Gradients backward pass decompose function chain rule with code check the result using Pytorch autograd Refactor model Layers as classes Modue. forward() Without einsum nn. Linear and nn. Module Forward process Foundation version: Gradients backward pass: Gradients is output with respect to parameter we’ve done this work in this path(below) to simplify this calculus, we can just change it into, So, you should know of the derivative of each bit on its own, and then you multiply them all together. As a result, it would be over cross over the data. So you can get gradient, output with respect to parameter What order should we calculate? BTW, why Jeremy wrote , not Loss function?1 decompose function We want to get derivative of which forms But, we have a estimation of answer (we call it y hat) now So, I will decompose funciton to trace target variable. Using the above forward pass, we can suppose some function from the end. start from , We know MSE funciton got two parameters, output, and target . from MSE’s input we know function’s output and supposing v is input of that function, similarly, v became output of chain rule with code examplify backward process by random sampling To get a variable, I modified forward model a little def model_ping(out = 'x_train'): l1 = lin(x_train, w1, b1) # one linear layer l2 = relu(l1) # one relu layer l3 = lin(l2, w2, b2) # one more linear layer return eval(out) Be careful we don’t use mse_loss in backward process1) start with the very last function, which is loss funciton. MSE If we codify this formula,def mse_grad(inp, targ): #mse_input(1000,1), mse_targ (1000,1) # grad of loss with respect to output of previous layer inp. g = 2. * (inp. squeeze() - targ). unsqueeze(-1) / inp. shape[0] And, this can be examplified like below. Notice that input of gradient function is same with forward functiony_hat = model_ping('l3') #get value from forward modely_hat. g = ((y_hat. squeeze(-1)-y_train). unsqueeze(-1))/y_hat. shape[0]y_hat. g. shape>>> torch. Size([50000, 1]) We can just calculate using broadcasting, not using squeeze. then why should do and unsqueeze again?🎯 It’s related with random access memory(RAM). . If I don’t squeeze, (I’m using colab) it out of RAM. 2) Derivative of linear2 function This process’s weight dimensions defined by axis=1, axis=2. axis=0 dimension means size of data. This will be summazed by . sum(0) method. unsqeeze(-1)&unsqeeze(1) seperates the dimension, and make a dot product, and vanish axis=0 dimension. def lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowlin2 = model_ping('l2'); #get value from forward modellin2. g = y_hat. g@w2. t(); w2. g = (lin2. unsqueeze(-1) * y_hat. g. unsqueeze(1)). sum(0);b2. g = y_hat. g. sum(0);lin2. g. shape, w2. g. shape, b2. g. shape>>> torch. Size([50000, 50])torch. Size([50, 1])torch. Size([1]) Notice going reverse order, we’re passing in gradient backward3) derivative of ReLU def relu_grad(inp, out): # grad of relu with respect to input activations inp. 
g = (inp>0). float() * out. g Examplified belowlin1=model_ping('l1') #get value from forward modellin1. g = (lin1>0). float() * lin2. g;lin1. g. shape>>> torch. Size([50000, 50])4) Derivative of linear1 Same process with 2) but, this process’s weight hasdef lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowx_train. g = lin1. g @ w1. t(); w1. g = (x_train. unsqueeze(-1) * lin1. g. unsqueeze(1)). sum(0); b1. g = lin1. g. sum(0);x_train. g. shape, w1. g. shape, b1. g. shape>>> torch. Size([50000, 784])torch. Size([784, 50])torch. Size([50])5) Then it goes backward pass def forward_and_backward(inp, targ): # forward pass: l1 = inp @ w1 + b1 l2 = relu(l1) out = l2 @ w2 + b2 # we don't actually need the loss in backward! loss = mse(out, targ) # backward pass: mse_grad(out, targ) lin_grad(l2, out, w2, b2) relu_grad(l1, l2) lin_grad(inp, l1, w1, b1)Version 1 (Basic)- Wall time: 1. 95 s Summary Notice that output of function at forward pass became input of backward pass backpropagation is just the chain rule value loss (loss=mse(out,targ)) is not used in gradient calcuation. Because, it doesn’t appear with the weight. w1g, w2g, b1g, b2g, ig will be used for optimizercheck the result using Pytorch autograd require_grad_ is the magical function, which can automatic differentiation. 2 This magical auto gradified tensor keep track what happend in forward (taking loss function), and do the backward3 So it saves our time to differentiate ourselves ⤵️ THis is benchmark…. . Version 2 (torch autograd)- Wall time: 3. 81 µs Refactor model: Amazingly, just refactoring our main pieces, it comes down up to Pytorch package. 🌟 Implement yourself, Practice, practice, practice! 🌟 Layers as classes: Relu and Linear are layers in oue neural net. -> make it as classes For the forward, using __call__ for the both of forward & backward. Because ‘call’ means we treat this as a function. class Lin(): def __init__(self, w, b): self. w,self. b = w,b def __call__(self, inp): self. inp = inp self. out = inp@self. w + self. b return self. out def backward(self): self. inp. g = self. out. g @ self. w. t() # Creating a giant outer product, just to sum it, is inefficient! self. w. g = (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) self. b. g = self. out. g. sum(0) Remember that in lin_grad function, we save bias&weight!!!!!💬 inp. g : gradient of the output with respect to the input. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 w. g : gradient of the output with respect to the weight. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 b. g : gradient of the output with respect to the bias. {: style=”color:grey; font-size: 90%; text-align: center;”} class Model(): def __init__(self, w1, b1, w2, b2): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ) def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() refer to Jeremy’s Model class, he put layers in list Dionne’s self-study note: Decomposing Jeremy’s Model class init needs weight, bias but not x data when call that class(a. k. a function) it gave x data and y label! jeremy composited function in layers. x = l(x) so concise…. . 
also utilized that layer list when backward ust reversing it (using python list’s method) And he is recursively calling the function on the result of the previous thing. ⬇️for l in self. layers: x = l(x)Q2: Don’t I need to declare magical autograd function, requires_grad_?{: style=”color:red; font-size: 130%; text-align: center;”} [The questions migrated to this article] Version 3 (refactoring - layer to class)- Wall time: 5. 25 µs Modue. forward(): Duplicate code makes execution time slow. Role of __call__ changed. No more __call__ for implementing forward pass. By initializing the forward with __call__, Module. forward() use overriding to maximize reusability. So any layer inherit Module, can use parent’s function. gradient of the output with respect to the weight (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) can be reexpressed using einsum, torch. einsum( bi,bj->ij , inp, out. g) Defining forward and Module enables Pytorch to out almost duplicatesVersion 4 (Module & einsum)- Wall time: 4. 29 µs Q2: Isn’t there any way to use broadcasting? Why we should use outer product?{: style=”color:red; font-size: 130%; text-align: center;”} Without einsum: Replacing einsum to matrix product is even more faster. torch. einsum( bi,bj->ij , inp, out. g)can be reexpressed using matrix product, inp. t() @ out. gVersion 5 (without einsum)- Wall time: 3. 81 µs nn. Linear and nn. Module: Torch’s package nn. Linear and nn. Module Version 6 (torch package)- Wall time: 5. 01 µs Final, Using torch. nn. Linear & torch. nn. Module~~~pythonclass Model(nn. Module): def init(self, n_in, nh, n_out): super(). init() self. layers = [nn. Linear(n_in,nh), nn. ReLU(), nn. Linear(nh,n_out)] self. loss = mse def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x. squeeze(), targ)class Model(): def init(self): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ)def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() ~~~ Footnote: fast. ai forums Lesson-8 ↩ pytorch docs - autograd ↩ stackoverflow - finding methods a object has ↩ "
+ "body": "2020/03/02 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring ” Lecture 08 - Deep Learning From Foundations-part2 “ Homework: calculus for machine learning einsum conventionCONTENTS: Foundation version Gradients backward pass decompose function chain rule with code check the result using Pytorch autograd Refactor model Layers as classes Modue. forward() Without einsum nn. Linear and nn. Module Forward process Foundation version: Gradients backward pass: Gradients is output with respect to parameter we’ve done this work in this path(below) to simplify this calculus, we can just change it into, So, you should know of the derivative of each bit on its own, and then you multiply them all together. As a result, it would be over cross over the data. So you can get gradient, output with respect to parameter What order should we calculate? BTW, why Jeremy wrote , not Loss function?1 decompose function We want to get derivative of which forms But, we have a estimation of answer (we call it y hat) now So, I will decompose funciton to trace target variable. Using the above forward pass, we can suppose some function from the end. start from , We know MSE funciton got two parameters, output, and target . from MSE’s input we know function’s output and supposing v is input of that function, similarly, v became output of chain rule with code examplify backward process by random sampling To get a variable, I modified forward model a little def model_ping(out = 'x_train'): l1 = lin(x_train, w1, b1) # one linear layer l2 = relu(l1) # one relu layer l3 = lin(l2, w2, b2) # one more linear layer return eval(out) Be careful we don’t use mse_loss in backward process1) start with the very last function, which is loss funciton. MSE If we codify this formula,def mse_grad(inp, targ): #mse_input(1000,1), mse_targ (1000,1) # grad of loss with respect to output of previous layer inp. g = 2. * (inp. squeeze() - targ). unsqueeze(-1) / inp. shape[0] And, this can be examplified like below. Notice that input of gradient function is same with forward functiony_hat = model_ping('l3') #get value from forward modely_hat. g = ((y_hat. squeeze(-1)-y_train). unsqueeze(-1))/y_hat. shape[0]y_hat. g. shape>>> torch. Size([50000, 1]) We can just calculate using broadcasting, not using squeeze. then why should do and unsqueeze again?🎯 It’s related with random access memory(RAM). . If I don’t squeeze, (I’m using colab) it out of RAM. 2) Derivative of linear2 function This process’s weight dimensions defined by axis=1, axis=2. axis=0 dimension means size of data. This will be summazed by . sum(0) method. unsqeeze(-1)&unsqeeze(1) seperates the dimension, and make a dot product, and vanish axis=0 dimension. def lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowlin2 = model_ping('l2'); #get value from forward modellin2. g = y_hat. g@w2. t(); w2. g = (lin2. unsqueeze(-1) * y_hat. g. unsqueeze(1)). sum(0);b2. g = y_hat. g. sum(0);lin2. g. shape, w2. g. shape, b2. g. shape>>> torch. Size([50000, 50])torch. Size([50, 1])torch. Size([1]) Notice going reverse order, we’re passing in gradient backward3) derivative of ReLU def relu_grad(inp, out): # grad of relu with respect to input activations inp. 
g = (inp>0). float() * out. g Examplified belowlin1=model_ping('l1') #get value from forward modellin1. g = (lin1>0). float() * lin2. g;lin1. g. shape>>> torch. Size([50000, 50])4) Derivative of linear1 Same process with 2) but, this process’s weight hasdef lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowx_train. g = lin1. g @ w1. t(); w1. g = (x_train. unsqueeze(-1) * lin1. g. unsqueeze(1)). sum(0); b1. g = lin1. g. sum(0);x_train. g. shape, w1. g. shape, b1. g. shape>>> torch. Size([50000, 784])torch. Size([784, 50])torch. Size([50])5) Then it goes backward pass def forward_and_backward(inp, targ): # forward pass: l1 = inp @ w1 + b1 l2 = relu(l1) out = l2 @ w2 + b2 # we don't actually need the loss in backward! loss = mse(out, targ) # backward pass: mse_grad(out, targ) lin_grad(l2, out, w2, b2) relu_grad(l1, l2) lin_grad(inp, l1, w1, b1)Version 1 (Basic)- Wall time: 1. 95 s Summary Notice that output of function at forward pass became input of backward pass backpropagation is just the chain rule value loss (loss=mse(out,targ)) is not used in gradient calcuation. Because, it doesn’t appear with the weight. w1g, w2g, b1g, b2g, ig will be used for optimizercheck the result using Pytorch autograd require_grad_ is the magical function, which can automatic differentiation. 2 This magical auto gradified tensor keep track what happend in forward (taking loss function), and do the backward3 So it saves our time to differentiate ourselves Postfix underscore means in pytorch, in-place function, What is in-place function?⤵️ THis is benchmark…. . Version 2 (torch autograd)- Wall time: 3. 81 µs Refactor model: Amazingly, just refactoring our main pieces, it comes down up to Pytorch package. 🌟 Implement yourself, Practice, practice, practice! 🌟 Layers as classes: Relu and Linear are layers in oue neural net. -> make it as classes For the forward, using __call__ for the both of forward & backward. Because ‘call’ means we treat this as a function. class Lin(): def __init__(self, w, b): self. w,self. b = w,b def __call__(self, inp): self. inp = inp self. out = inp@self. w + self. b return self. out def backward(self): self. inp. g = self. out. g @ self. w. t() # Creating a giant outer product, just to sum it, is inefficient! self. w. g = (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) self. b. g = self. out. g. sum(0) Remember that in lin_grad function, we save bias&weight!!!!!💬 inp. g : gradient of the output with respect to the input. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 w. g : gradient of the output with respect to the weight. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 b. g : gradient of the output with respect to the bias. {: style=”color:grey; font-size: 90%; text-align: center;”} class Model(): def __init__(self, w1, b1, w2, b2): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ) def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() refer to Jeremy’s Model class, he put layers in list Dionne’s self-study note: Decomposing Jeremy’s Model class init needs weight, bias but not x data when call that class(a. k. a function) it gave x data and y label! jeremy composited function in layers. x = l(x) so concise…. . 
also utilized that layer list when backward ust reversing it (using python list’s method) And he is recursively calling the function on the result of the previous thing. ⬇️for l in self. layers: x = l(x)Q2: Don’t I need to declare magical autograd function, requires_grad_?{: style=”color:red; font-size: 130%; text-align: center;”} [The questions migrated to this article] Version 3 (refactoring - layer to class)- Wall time: 5. 25 µs Modue. forward(): Duplicate code makes execution time slow. Role of __call__ changed. No more __call__ for implementing forward pass. By initializing the forward with __call__, Module. forward() use overriding to maximize reusability. So any layer inherit Module, can use parent’s function. gradient of the output with respect to the weight (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) can be reexpressed using einsum, torch. einsum( bi,bj->ij , inp, out. g) Defining forward and Module enables Pytorch to out almost duplicatesVersion 4 (Module & einsum)- Wall time: 4. 29 µs Q2: Isn’t there any way to use broadcasting? Why we should use outer product?{: style=”color:red; font-size: 130%; text-align: center;”} Without einsum: Replacing einsum to matrix product is even more faster. torch. einsum( bi,bj->ij , inp, out. g)can be reexpressed using matrix product, inp. t() @ out. gVersion 5 (without einsum)- Wall time: 3. 81 µs nn. Linear and nn. Module: Torch’s package nn. Linear and nn. Module Version 6 (torch package)- Wall time: 5. 01 µs Final, Using torch. nn. Linear & torch. nn. Module~~~pythonclass Model(nn. Module): def init(self, n_in, nh, n_out): super(). init() self. layers = [nn. Linear(n_in,nh), nn. ReLU(), nn. Linear(nh,n_out)] self. loss = mse def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x. squeeze(), targ)class Model(): def init(self): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ)def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() ~~~ Footnote: fast. ai forums Lesson-8 ↩ pytorch docs - autograd ↩ stackoverflow - finding methods a object has ↩ "
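To make the Module refactor above concrete, here is a minimal sketch in the spirit of the note (class names follow the note; details may differ from the actual lesson notebook): __call__ stores the inputs and output, each layer overrides forward(), and Lin.backward() uses the matrix-product form from Version 5.
~~~python
import torch

class Module():
    # __call__ does the common plumbing; subclasses only override forward()
    def __call__(self, *args):
        self.args = args
        self.out = self.forward(*args)
        return self.out
    def forward(self):
        raise NotImplementedError('Module.forward')

class Lin(Module):
    def __init__(self, w, b):
        self.w, self.b = w, b
    def forward(self, inp):
        return inp @ self.w + self.b
    def backward(self):
        inp = self.args[0]
        inp.g = self.out.g @ self.w.t()
        self.w.g = inp.t() @ self.out.g  # matrix-product form of the einsum/outer-product sum
        self.b.g = self.out.g.sum(0)

# tiny smoke test with made-up shapes
w, b = torch.randn(784, 50), torch.zeros(50)
lin = Lin(w, b)
out = lin(torch.randn(64, 784))
out.g = torch.ones_like(out)  # pretend upstream gradient
lin.backward()
print(lin.w.g.shape, lin.b.g.shape)  # torch.Size([784, 50]) torch.Size([50])
~~~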
}, {
- "id": 13,
+ "id": 15,
"url": "http://localhost:4000/2020/03/note08-fastai-3/",
"title": "Implement forward&backward pass from scratch",
"body": "2020/03/01 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring1. The forward and backward passes: 1. 1 Normalization: train_mean,train_std = x_train. mean(),x_train. std()>>> train_mean,train_std(tensor(0. 1304), tensor(0. 3073))Remember! Dataset, which is x_train, mean and standard deviation is not 0&1. But we need them to be which means we should substract means and divide data by std. You should not standarlize validation set because training set and validation set should be aparted. after normalize, mean is close to zero, and standard deviation is close to 1. 1. 2 Variable definition: n,m: size of the training set c: the number of activations we need in our model2. Foundation Version: 2. 1 Basic architecture: Our model has one hidden layer, output to have 10 activations, used in cross entropy. But in process of building architecture, we will use mean square error, output to have 1 activations and lator change it to cross entropy number of hidden unit; 50see below pic We want to make w1&w2 mean and std be 0&1. why initializating and make mean zero and std one is important? paper highlighting importance of normalisation - training 10,000 layer network without regularisation1 2. 1. 1 simplified kaiming initQ: Why we did init, normalize with only validation data? Because we can not handle and get statistics from each value of x_valid?{: style=”color:red; font-size: 130%; text-align: center;”} what about hidden(first) layer?w1 = torch. randn(m,nh)b1 = torch. zeros(nh)t = lin(x_valid, w1, b1) # hidden>>> t. mean(), t. std()((tensor(2. 3191), tensor(27. 0303))In output(second) layer, w2 = torch. randn(nh,1)b2 = torch. zeros(1)t2 = lin(t, w2, b2) # output>>> t2. mean(), t2. std()(tensor(-58. 2665), tensor(170. 9717)) which is terribly far from normalzed value. But if we apply simplified kaiming init w1 = torch. randn(m,nh)/math. sqrt(m); b1 = torch. zeros(nh)w2 = torch. randn(nh,1)/math. sqrt(nh); b2 = torch. zeros(1)t = lin(x_valid, w1, b1)t. mean(),t. std()>>> (tensor(-0. 0516), tensor(0. 9354)) But, actually, we use activations not only linear function After applying activations relu at linear layer, mean and deviation became 0. 5. 2. 1. 2 Glorrot initializationPaper2: Understanding the difficulty of training deep feedforward neural networks Gaussian(, bell shaped, normal distributions) is not trained very well. How to initialize neural nets? with the size of layer , the number of filters . But there is No acount for import of ReLU If we got 1000 layers, vanishing gradients problem emerges2. 1. 3 Kaiming initializatingPaper3: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Kaiming He, explained here rectifier: rectified linear unit rectifier network: neural network with rectifier linear units This is kaiming init, and why suddenly replace one to two on a top? to avoid vanishing gradient(weights) But it doesn’t give very nice mean tough. 2. 1. 4 Pytorch package Why fan_out? according to pytorch documentation, choosing 'fan_in' preserves the magnitude of the variance of the wights in the forward pass. choosing 'fan_out' preserves the magnitues in the backward pass(, which means matmul; with transposed matrix) ➡️ in the other words, torch use fan_out cz pytorch transpose in linear transformaton. What about CNN in Pytorch?I tried torch. nn. 
Conv2d. conv2d_forward?? Jeremy digged into using torch. nn. modules. conv. _ConvNd. reset_parameters?? 2 in Pytorch, it doesn’t seem to be implemented kaiming init in right formula. so we should use our own operation. But actually, this has been discussed in Pytorch community before. 3 4 Jeremy said it enhanced variance also, so I sampled 100 times and counted better results. To make sure the shape seems sensible. check with assert. (remember we will replace 1 to 10 in cross entropy)assert model(x_valid). shape==torch. Size([x_valid. shape[0],1])>>> model(x_valid). shape(10000, 1) We have made Relu, init, linear, it seems we can forward pass code we need for basic architecture nh = 50def lin(x, w, b): return x@w + b;w1 = torch. randn(m,nh)*math. sqrt(2. /m ); b1 = torch. zeros(nh)w2 = torch. randn(nh,1); b2 = torch. zeros(1)def relu(x): return x. clamp_min(0. ) - 0. 5t1 = relu(lin(x_valid, w1, b1))def model(xb): l1 = lin(xb, w1, b1) l2 = relu(l1) l3 = lin(l2, w2, b2) return l32. 2 Loss function: MSE: Mean squared error need unit vector, so we remove unit axis. def mse(output, targ): return (output. squeeze(-1) - targ). pow(2). mean() In python, in case you remove axis, you use ‘squeeze’, or add axis use ‘unsqueeze’ torch. squeeze where code commonly broken. so, when you use squeeze, clarify dimension axis you want to removetmp = torch. tensor([1,1])tmp. squeeze()>>> tensor([1, 1]) make sure to make as float when you calculateBut why??? because it is tensor?{: style=”color:red; font-size: 130%;”} Here’s the error when I don’t transform the data type ---------------------------------------------------------------------------TypeError Traceback (most recent call last)<ipython-input-22-ae6009bef8b4> in <module>()----> 1 y_train = get_data()[1] # call data again 2 mse(preds, y_train)TypeError: 'map' object is not subscriptable This is forward passFootnote: Other materials: Understanding the difficulty of training deep feedforward neural networks, paper that introduced Xavier initialization Fixup Initialization: Residual Learning Without Normalization ↩ Pytorch implementaion on Kaiming init of conv and linear layers ↩ Pytorch kaiming init issue ↩ Pytorch kaiming init explained ↩ "
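The two key moves above (normalizing the validation set with the training statistics, and scaling the weights by sqrt(2/m)) fit in a few lines. A runnable sketch, with random stand-in tensors in the note's MNIST shapes (the real x_train/x_valid come from the lesson's data):
~~~python
import math
import torch

# Random stand-ins in the note's MNIST shapes; the real tensors come from the lesson's data
n, m, nh = 50000, 784, 50
x_train = torch.randn(n, m) * 0.31 + 0.13
x_valid = torch.randn(10000, m) * 0.31 + 0.13

# Normalize the validation set with the *training* statistics
train_mean, train_std = x_train.mean(), x_train.std()
x_train = (x_train - train_mean) / train_std
x_valid = (x_valid - train_mean) / train_std

def lin(x, w, b): return x @ w + b
def relu(x): return x.clamp_min(0.) - 0.5

# Simplified Kaiming init: scale by sqrt(2/fan_in) to survive the ReLU
w1 = torch.randn(m, nh) * math.sqrt(2. / m)
b1 = torch.zeros(nh)
t1 = relu(lin(x_valid, w1, b1))
print(t1.mean(), t1.std())  # mean near 0, std far closer to 1 than without the scaling
~~~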
}, {
- "id": 14,
+ "id": 16,
"url": "http://localhost:4000/2020/03/note08-fastai-2/",
"title": "What's inside Pytorch Operator?",
"body": "2020/03/01 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, RefactoringWhat’s inside Pytorch Operator?: Section02 Time comparison with pure Python: Matmul with broadcasting> 3194. 95 times faster Einstein summation> 16090. 91 times faster Pytorch’s operator> 49166. 67 times faster 1. Elementwise op: 1. 1 Frobenius norm: above converted into (m*m). sum(). sqrt() Plus, don’t suffer from mathmatical symbols. He also copy and paste that equations from wikipedia. and if you need latex form, download it from archive. 2. Elementwise Matmul: What is the meaning of elementwise? We do not calculate each component. But all of the component at once. Because, length of column of A and row of B are fixed. How much time we saved? So now that takes 1. 37ms. We have removed one line of code and it is a 178 times faster…#TODOI don’t know where the 5 from. but keep it. Maybe this is related with frobenius norm…?as a result, the code before for k in range(ac): c[i,j] += a[i,k] + b[k,j]the code after c[i,j] = (a[i,:] * b[:,j]). sum()To compare it (result betweet original and adjusted version) we use not test_eq but other function. The reason for this is that due to rounding errors from math operations, matrices may not be exactly the same. As a result, we want a function that will “is a equal to b within some tolerance” #exportdef near(a,b): return torch. allclose(a, b, rtol=1e-3, atol=1e-5)def test_near(a,b): test(a,b,near)test_near(t1, matmul(m1, m2))3. Broadcasting: Now, we will use the broadcasting and removec[i,j] = (a[i,:] * b[:,j]). sum() How it works?>>> a=tensor([[10,10,10], [20,20,20], [30,30,30]])>>> b=tensor([1,2,3,])>>> a,b (tensor([[10, 10, 10], [20, 20, 20], [30, 30, 30]]),tensor([1, 2, 3])) >>> a+btensor([[11, 12, 13], [21, 22, 23], [31, 32, 33]]) <Figure 2> demonstrated how array b is broadcasting(or copied but not occupy memory) to compatible with a. Refered from numpy_tutorial there is no loop, but it seems there is exactly the loop. This is not from jeremy (actually after a moment he cover it) but i wondered How to broadcast an array by columns? c=tensor([[1],[2],[3]])a+ctensor([[11, 11, 11], [22, 22, 22], [33, 33, 33]])s What is tensor. stride()?help(t. stride)Help on built-in function stride: stride(…) method of torch. Tensor instancestride(dim) -> tuple or intReturns the stride of :attr:’self’ tensor. Stride is the jump necessary to go from one element to the next one in the specified dimension :attr:’dim’. A tuple of all strides is returned when no argument is passed in. Otherwise, an integer value is returned as the stride in the particular dimension :attr:’dim’. Args: dim (int, optional): the desired dimension in which stride is requiredExample::* x = torch. tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])`x. stride()>>> (5, 1)x. stride(0)>>> 5x. stride(-1)>>> 1 unsqueeze & None index We can manipulate rank of tensor Special value ‘None’, which means please squeeze a new axis here== please broadcast herec = torch. tensor([10,20,30])c[None,:] in c, squeeze a new axis in here please. 2. 2 Matmul with broadcasting: for i in range(ar):# c[i,j] = (a[i,:]). *[:,j]. sum() #previous c[i] = (a[i]. unsqueeze(-1) * b). sum(dim=0) And Using None also (As howard teached)c[i] = (a[i ]. unsqueeze(-1) * b). sum(dim=0) #howardc[i] = (a[i][:,None] * b). sum(dim=0) # using Nonec[i] = (a[i,:,None]*b). 
sum(dim=0)⭐️Tips🌟 1) Anytime there’s a trailinng(final) colon in numpy or pytorch you can delete it ex) c[i, :] = c [i]2) any number of colon commas at the start, you can switch it with the single elipsis. ex) c[:,:,:,:,i] = c […,i] 2. 3 Broadcasting Rules: What if we tensor. size([1,3]) * tensor. size([3,1])? torch. Size([3, 3]) What is scale???? What if they are one array is times of the other array? ex) Image : 256 x 256 x 3Scale : 128 x 256 x 3Result: ? Why I did not inserted axis via None, but happened broadcasting? >>> c * c[:,None]tensor([[100. , 200. , 300. ], [200. , 400. , 600. ], [300. , 600. , 900. ]])maybe it broadcast cz following array has 3 rows as same principle, no matter what nature shape was, if we do the operation tensor broadcasts to the other. >>> c==c[None]tensor([[True, True, True]])>>> c[None]==c[None,:]tensor([[True, True, True]])>>>c[None,:]==ctensor([[True, True, True]])3. Einstein summation: Creates batch-wise, remove inner most loop, and replaced it with an elementwise producta. k. ac[i,j] += a[i,k] * b[k,j]inner most loop c[i,j] = (a[i,:] * b[:,j]). sum()elementwise product Because K is repeated so we do a dot product. And it is torch. Usage of einsum()1) transpose2) diagnalisation tracing3) batch-wise (matmul) … einstein summation notationdef matmul(a,b): return torch. einsum('ik,kj->ij', a, b)so after all, we are now 16000 times faster than Python. 4. Pytorch op: 49166. 67 times faster than pure python And we will use this matrix multiplication in Fully Connect forward, with some initialized parameters and ReLU. But before that, we need initialized parameters and ReLU, Footnote: TensorRank ti noteResources: Frobenius Norm Review Broadcasting Review (especially Rule) Refer colab! (I totally confused with extension of arrays) torch. allclose Review np. einsum Reviewh "
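Putting the versions side by side makes the progression clear. A small sketch of the implementations compared above (shapes kept small so the pure-Python loop finishes quickly; timings will of course vary by machine), tied together by the same tolerance check as test_near:
~~~python
import torch

a, b = torch.randn(64, 32), torch.randn(32, 16)

def matmul_loops(a, b):
    # Pure-Python triple loop: the slow baseline
    ar, ac = a.shape
    br, bc = b.shape
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            for k in range(ac):
                c[i, j] += a[i, k] * b[k, j]
    return c

def matmul_broadcast(a, b):
    # One row at a time, broadcasting a[i] over b's columns
    c = torch.zeros(a.shape[0], b.shape[1])
    for i in range(a.shape[0]):
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)
    return c

def matmul_einsum(a, b):
    # k is repeated, so einsum contracts (dot-products) over it
    return torch.einsum('ik,kj->ij', a, b)

for f in (matmul_loops, matmul_broadcast, matmul_einsum):
    assert torch.allclose(f(a, b), a @ b, rtol=1e-3, atol=1e-5)
~~~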
}, {
- "id": 15,
+ "id": 17,
"url": "http://localhost:4000/2020/02/note08-fastai-1/",
"title": "What is the meaning of 'deep-learning from foundations?'",
"body": "2020/02/29 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring” Lecture 08 - Deep Learning From Foundations-part2 “ I don’t know if you read this article, but I heartily appreciate Rachael Thomas and Jeremy Howard for providing these priceless lectures for free Homework: Review concepts 16 concepts from Course 1 (lessons 1 - 7)(1) Affine Functions & non-linearities; 2) Parameters & activations; 3) Random initialization & transfer learning; 4) SGD, Momentum, Adam; 5) Convolutions; Batch-norm; 6) Dropout; 7) Data augmentation; 8) Weight decay; 9) Res/dense blocks; 10) Image classification and regression; 11)Embeddings; 12) Continuous & Categorical variables; 13) Collaborative filtering; 14) Language models; 15) NLP classification; 16) Segmentation; U-net; GANS) Make sure you understand broadcasting Read section 2. 2 in Delving Deep into Rectifiers Try to replicate as much of the notebooks as you can without peeking; when you get stuck, peek at the lesson notebook, but then close it and try to do it yourself calculus for machine learning based on weight… einsum conventionCONTENTS: What is going on in this course? What is ‘from foundations’? Steps to a basic modern CNN model Today’s implementation goal: 1) matmul -> 4) FC backward Library development using jupyter notebook jupyter notebook certainly can make module Elementwise ops How can we make python faster? What is element wise operation? FootnoteWhat is going on in this course?: What is ‘from foundations’?: 1) Recreate fast. ai and Pytorch 2) using pure python Evade OverfittingOverfit : validation error getting worsetraining loss < validation loss Know the name of the symbol you usefind in this page if you don’t know the symbol that you are using or just draw it here (run by ML!) Steps to a basic modern CNN model: 1) Matrix multiplication -> 2) Relu/Initialization -> 3) Fully-connected Forward-> 4) Fully-connected Backward -> 5) Train loop -> 6) Convolution-> 7) Optimization ->8) Batchnormalization -> 9) Resnet Today’s implementation goal: 1) matmul -> 4) FC backward: Library development using jupyter notebook: what is assers? jupyter notebook certainly can make module: There will be #export tag that Howard (and we) want to extract special notebook2script. py will detect sign of #expert and convert following into python module and test ittest\_eq(TEST,'test')test\_eq(TEST,'test1') what is run_notebook. py? when you want to test your module in command line interface !python run\_notebook. py 01_matmul. ipynb Is there any difference between 1) and 2)?1) test -> test01 2) test01 -> test #TODO I don’t know yet look into run_notebook. py, package fire Jeremy used. What is that?read and run the code in a notebook, and in the process, Jeremy made Python Fire library called!shockingly, fire takes any kind of function and converts into CLI command. fire library was released by Google open source, Thursday, March 2, 2017 Get data pytorch and numpy are pretty much same. variable c explains how many pixels there are in in MNIST, 28 pixels PyTorch’s view() method: torch function that manipulating tensor, and squeeze() in torch & mathmatical operation similar function Rao & McMahan said usually this functions result in feature vector. In part 1, you can use view function several times. 
Initial python model Which is Linear, like $Xw$(weight)$+a$(bias) $= Y$ If you don’t know hou to multiple matrix, refer this site matmul visulization site How many time spends if we we use pure python function matmul, typical matrix multiplication function, takes about 1 second for calculating 1 single train data! (maybe assumed stochastic, 5 data points in validation) it takes about 11. 36 hours to update parameters even single layer and 1 iteration! (if that was my computer, it would be 14 hours. . )🤪 THIS is why we need to consider ‘time’&’space’ This is kinda slow - what if we could speed it up by 50,000 times? Let’s try! Elementwise ops: How can we make python faster?: If we want to calculate faster, then do remove pythonic calcuation, by passing its computation down to something that is written something other than python, like pytorch. According to PyTorch doc it uses C++ (via ATen), so we are going to implement that function with python. What is element wise operation?: items makes a pair, operate corresponding componentFootnote: notebooks material video broadcasting excel"
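As a taste of what Fire does (a hypothetical demo.py, not the lesson's run_notebook.py):
~~~python
# demo.py -- hypothetical file, not the lesson's run_notebook.py
import fire

def scale(x: float, factor: float = 2.0):
    """Multiply x by factor; Fire turns this signature into CLI arguments."""
    return x * factor

if __name__ == '__main__':
    # Exposes the function as a CLI: `python demo.py 3 --factor=10` prints 30.0
    fire.Fire(scale)
~~~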
}, {
- "id": 16,
+ "id": 18,
"url": "http://localhost:4000/2020/02/what-is-convolution/",
"title": "Digging into convolution",
"body": "2020/02/28 - Issues 1) Kaiming Initializtion in Pytorch was in trouble. 1 2) Jeremy started to dig in, in lesson09, but I didn’t know why the size of tensor is 2 and even understand this spreadsheet data. 3 Homework: Read Visualizing and Understanding Convolutional Networks paper What is a convolution? Visualization one kernel Matthew D Zeiler & Rob Fergus Paper Convolution can be represented as matmul Padding Kernel has rank 3 How can we find a side-edge, a gradient and area of constant weight? What is a convolution?: A convolutional neural network is that your red, green, and blue pixels go into the simple computation, and something comes out of that, and then the result of that goes into a second layer, and the result of that goes into the third layer and so forth. Visualization: one kernel Refer this site for visualizing CNN filteringMatthew D Zeiler & Rob Fergus PaperLecture01 Nine examples of the actual coefficients from the **first layer** Convolution can be represented as matmul: CNNs from different viewpoints {align-items: center;} [A B C D E F G H I J] is 3 by 3 image data flatten to vector. As a result, convolution is a just matrix just two things happens Some of entries are set to zeros at all the times same color always have the same weight. That called weight time / wegith sharing So, we can implement a convolution with matrix multiplication. But, we don’t do that because it’s slow!Padding: What most of libraries do is just put zeros asdie of matrix fast. ai uses reflection paddings (what is this? Jeremy said he uttered it)Kernel has rank 3: As standard picture input would be 4 5, it would be actually 3d, not 2d. If we make kernel as a 3x3 size, we pass over same kernel all the different Red, Green, Blue Pixels. This could make problem, because, if we want to detect frog, which is green, we would want more activations on the green(I made a test cell in my colab 6) How can we find a side-edge, a gradient and area of constant weight?: Not top-edge! One kernel can find only the top-edge, so we should stack the kernels 7 So, we pass it through bunch of kernels to the input images, and that process gives us height x width x corresponding number of kernels. Usually that number of chanel is 16 And if we want to get the more channels and features, we should repeat that process This process gives rise to memory out of control, we do the stride #### conv-example. xlsx 2 convolutional filters At a second layer, filter is 3x3x2 tensor, because to add up together the first layer’s channel. Reference: Problem was math. sqrt(5) was not kaiming initialization formula, Implementation in Pytorch ↩ size of tensor, lecture09 ↩ conv-example. xlsx ↩ Why do computer use red, green and blue instead of primary colors ↩ Grayscale is a group of shades without any visible color. … Each of these dots has its own brightness level as well and, therefore, can be converted to grayscale. A grayscale image is one with all color information removed. ↩ Testing RGB and grayscale ↩ stack kernel and make new rank of tensor at output, Lesson06-2019 ↩ "
}, {
- "id": 17,
+ "id": 19,
"url": "http://localhost:4000/2020/02/dps-week8/",
- "title": "Digital Product School week 8&9",
- "body": "2020/02/24 - The 8th week retropect at Digital Product School Week 8/9 - Ship your MVP/Release next iteration each day This week's schedule CONTENT: Preparing engineering weekly Agile Process Daily Stand-up Making application flowchart (feat draw. io) / ER diagram Flowchart, understaning user journey ER diagram Engineering weekly AI lunch Connecting firebase andPreparing engineering weekly: This week at Wednesday, I planned to explain the Language Modelings, mainly focusing ELMo, ULMFiT, BERT and GPT-2. Slides is available here Changed the presentation, because there were people who are not in ML domain. hereWhenever I do the presentation, I learn more than the information I give them. At the same time, I realize I need to learn more than I know. Agile Process: One of a priceless lesson I learnt from digital product school, was experience of doing agile work. Before I came here, it was a little bit vague concept. I’m not sure ‘what is agile’ but this is what we tried to make agile process. Daily Stand-up: Sharing the works everyday helps interdisciplinary team to work better. Since product started to get higher fidelity, the gap between engineer and non-engineer increased. Actually I didn’t planned to explain concept because I thougth I would be lose my audience when I start to explain. But as daily stand-up, which shares our progess, goes day by day, I planed and reported the issues. And it made each other’s topic feel more familiar. I think point is very important, because at that point people start to be curious. So we can actively ask to the others, and that momwnr, we can explain the point teammate dosen’t know. Each color means every different section. Red: Our team goal, Blue: Interaction designer, Green: Product manager, Yellow: Software/AI engineer This week engineer's main plan Each of us try to explain what we are doing, but things become easier when we are asked. Because we explained something was important to us before, but if we asked it is something important for the others. Making application flowchart (feat draw. io) / ER diagram: Before we start the party, we should clarify the flowchart and ER diagram of our application. Flowchart, understaning user journey: Thanks for google, we could use draw. io for our framechart framework. Actually, we cana choice other good flatform, but draw. io has connected app throgh google drive, most of our engineer was used to it. And after this job, I got to know there is also (of course) rule with the symbols, color, size, space, scaling and direction of arrow -reference. But why we should do this? WE have made our storymap before!! I think storymap is for visualize our status and app. So it should be shared with whole the team, and they should able to understand each role’s issue. But flowchart is more like testing technical feasibility, and error that user can experience. So it could be little more specific, complicated, and hypothetical. This week engineer's main plan ER diagram: Even if we use NoSQL database through firebase, my team was accustomed to SQL more. That what we educated when we were at college, so we had to organize our concept while we were learning NoSQL. Engineering weekly: Every engineering weekly we exchange our knowledge each other so that we can grow together. Before today, my AI collegues presented regression, knn and it was my turn. I prepared slide that explain about pre-trained language model, but my header advised me if I go deep of theoretical things, I would lose my audience. 
So I decided to brief BERT mode, how I can contribute to other team’s project. Since BERT was breakthrough of NLP industry, I tried to explain how it can be applied to hands on product and how it can help people in their product. The result was quite motivative to me. They gave feedback that since it wasn’t that much theoretical, they could enjoy it, and useful information. Someone asked me do I had learned of presentation before. I was really happy with their feedback! AI lunch: Connecting firebase and: "
+ "title": "My life in Digital Product School - week 8/19/10",
+ "body": "2020/02/24 - The 8/9/10th week retropect at Digital Product School Week 8 - Ship your MVPWeek 9/10 - Release next iteration each day Week 8th schedule CONTENT: Agile Product Development Daily Stand-up(planning) Gemba Walk Sprint Reviews Engineering weeklyAgile Product Development: One of a priceless lesson I learnt from digital product school, was experience of doing agile work. Before I came here, it was a little bit vague concept. I’m still not sure ‘what is agile’ but this is how we tried to make agile process. Daily Stand-up(planning): Sharing the works everyday helps interdisciplinary team to work better. Since product started to get higher fidelity, the gap between engineer and non-engineer increased. Actually I didn’t planned to explain concept because I thougth I would be lose my audience when I start to explain. But as daily stand-up, which shares our progess, goes day by day, I planed and reported the issues. And it made each other’s topic feel more familiar. I think point is very important, because at that point people start to be curious. So we can actively ask to the others, and that momwnr, we can explain the point teammate dosen’t know. Each color means every different section. Red: Our team goal, Blue: Interaction designer, Green: Product manager, Yellow: Software/AI engineer This week engineer's main plan Each of us try to explain what we are doing, but things become easier when we are asked. Because we explained something was important to us before, but if we asked it is something important for the others. Gemba Walk: Team Cero with core team Every 2 weeks, we do the Gemba work, which is ‘question everything to the core team’ time. At this period, people can ask anything related to our product, workshop, and framework. Core team will help just for each team, and each team can solve the problem related to their work. < br/>Why we need this session? because with workshop and general schedule, core team has no time just focus on each team. So through this session, we can have opportunity to understand each program and workshop, like why we are using this platform, and when is the due of our small project, and we have this problem and we need help for this. whatever small problem you have, core team is always willing to help you. Sprint Reviews: Every Friday, we have time to summarise what we did for the week. Maybe we need HMW question and our storymap to share our process and then tell and share what we did try, what point we succeeded and what point it was deviant of our prediction, and why we tried it. . Sprint of Ve-link And then, just after all team’s ppt, we do vote with such a cute marvel. Always it’s very difficult to vote (of course you can’t vote to your team!) Because it depends on criteria what do I value!But since this is process of our agile work, I try to focus on what they have changed since last week, and why they did it, how they did it. Engineering weekly: Every engineering weekly we exchange our knowledge each other so that we can grow together. Everyone have their knowledge to share and we can be tutor and at the same time can be of tutee. Previously, my AI collegues presented regression, knn. And because I’m somewhat specialized to NLP, I prepared slide that explain about pre-trained language model, but my header advised me if I go deep of theoretical things, I would lose my audience. So I decided to brief BERT mode, how I can contribute to other team’s project. 
Since BERT was breakthrough of NLP industry, I tried to explain how it can be applied to hands on product and how it can help people in their product. The result was quite motivative to me. They gave feedback that since it wasn’t that much theoretical, they could enjoy it, and useful information. Someone asked me do I had learned of presentation before. I was really happy with their feedback! "
}, {
- "id": 18,
+ "id": 20,
"url": "http://localhost:4000/2020/02/fast.ai-nlp-note-16/",
"title": "Algorithmic bias",
"body": "2020/02/20 - Algorithms can encode & magnify human bias Case Study 1: Facial Recognition & Predictive Policing: Joy Buolamwini & Timnit Gebru, gendershades. org Microsoft, FACE+, IBM - All of these things are sell now. Largest gap between $\therefore\ Lighter Male\ >\ Darker\ Female $ This US mayor joked cops should “mount . 50-caliber” guns where AI predicts crime With machine learning, with automation, there’s a 99% success, so that robot is ㅡwill beㅡ99% accurate in telling us what is going to happen next, which is really interesting. - city official in Lancater, CA, approving on using IBM for public security Bias: Bias is type of error Statistical Bias: difference between a statistic’s expected value and the true value Unjust Bias: disproportionate preference for or prejudice against a group Unconscious bias: bias that we don’t realize we have But, term bias is too generic to be productive. Different sources of bias have different causes Representation Bias: Dataset was not representative of the algorithm that might be used on later. Above : Data is okay, but algorithm has some problem. Below : Data has error. For example, object detection production that performs very well in common product of US. But in contrast, change of target product region, like Zimbabwe, Solomon Island, and so on, reduced the performence remarkably. It is not the algorithmic problem, so we should care about data volume of region. Evaluation Bias: Benchmark datasets spur on research, 4. 4% of IJB-A images are dark-skinned women. 2/3 of ImageNet images from the West (Sharkar et al, 2017) Case Study 2: Recidivism Algorithm Used Prison Sentencing: Case Study 3: Online Ad Delivery: Bias in NLP: ( Nothing to do with the course, but I’m researching this field these days. ) But all about Englsih ImpactThe person is doctor. The person is nurse -> 그는 의사다. 그녀는 간호사다. Concept of “biased data” often too generic to be useful: Different sources of bias have different sources Data, models and systems are not unchanging numbers on a screen. They’re the result of a complex process that starts with years of historical context and involves a series of choices and norms, from data measurement to model evaluation to human interpretation. - Harini Suresh, “The problem with Biased Data” Five Sources of Bias in ML: Representation Bias Evaluation Bias Measurement Bias Aggregation Bias(46:02) Historical Bias(46:26) A few studies(47:13) Racial Bias, Even when we have good intentions(new york times)(47:10) gender(48:59) Humans are biased, so why does algorithmic bias matter?: Algorithms & humans are used differently (humans are usually decision maker) Algorithms are accurate and objective No way to apeal if there if error processed large scale cheap Machine learning can amplify bias Machine learning can create feedback loops. Technology is power. And with that comes responsibility. Solutions: Analyze a project at work/school: Questions about AI 5 types of bias (Suresh & Guttag) Datasheets for datasets, Modelcards for model reporting Accuracy rate on different sub-groups Work with domain experts & those impacted Increase diversity in our workspace Advocate for good policy Be on the ongoing lookout for bias"
}, {
- "id": 19,
+ "id": 21,
"url": "http://localhost:4000/2020/02/classifier-city/",
"title": "Making a classifier with image dataset made from gooogle",
"body": "2020/02/15 - CONTENTS: Creating dataset from google images Using google_images_download Create ImageDataBunch Train model fit_one_cycle() Let’s find-tune Let’s train the whole model! Let’s make batch size bigger! Interpretation Model in productionCode can be found hereDeployed model here Making a classifier which can distinguish Seoul from Munich and Sanfrancisco!(hoping my well in Munich!) Creating dataset from google images: In machine learning, you always need data before you build your model. You can use either URLs or google_images_download package. Since Jeremy explained specifically, I will try the other. Using google_images_download: note: This is not google official package Refer to Official Doncument, put that arguments. from google_images_download import google_images_downloadresponse = google_images_download. googleimagesdownload() #class instantiationout_dir = os. path. abspath('. . /. . /materials/dataset/pkg/')os. mkdir(out_dir)arguments = { keywords : Cebu,Munich,Seoul , print_urls :True, suffix_keywords : city , output_directory :out_dir, type : photo , }paths = response. download(arguments) #passing the arguments to the functionprint(paths)and if you need, here is main code. Create ImageDataBunch: We need to separate validation set because we just grabbed these imagese from Google. Most of the dataset we use (kaggle/research) splited into train / validation / test so if they are not devided beforehand we should make databunch, and Jeremy recommended assign 20% to validation. Help on function verify_images in module fastai. vision. data:verify_images(path: Union[pathlib. Path, str], delete: bool = True, max_workers: int = 4, max_size: int = None, recurse: bool = False, dest: Union[pathlib. Path, str] = '. ', n_channels: int = 3, interp=2, ext: str = None, img_format: str = None, resume: bool = None, **kwargs) Check if the images in `path` aren't broken, maybe resize them and copy it in `dest`. Data from google image url Data from package Train model: len(class) len(train) len(valid) Data_url 3 432 108 Data_pkg 3 216 53 Uisng model: restnet34 1, Measurement: accuracy 2 fit_one_cycle(): What is fit one cycle? Cyclical Learning Rates for Training Neural Networks One of the way to find good learning rate. Core idea is to start with small learning rate (like 1e-4, 1e-3) and increase the learning rate after each mini-batch till loss starts exploding. And pick up learning rate one order lower than exploding point. For example, plotted learning rate is like below picture, picking up around 1e-2 is the best way. Why this methods Traditionally, the learning rate is decreased as the learning starts converging with time. But this paper suggests to cycle our learning rate, because it makes us avoid local minimum. Basically this cyclic method enables us to explore whole of loss function so that find out global minimum. In other words, higher learning rate behaves like regularisation. Let’s find-tune: Do train just one last layer by learning rate found by find_lr This section you should find the strongest downward slope that kind of sticking around for quite a while. And choose just one order lower than lowest point. As explained before, I will pick up 1e-2. And of course, this is fine-tuning, we don’t need discriminative learning rate yet. Let’s train the whole model!: link When you plot the learning rate again, maybe you will get soaring shape of learning rate. Rule of thumb, When you slice the learning rate, use learning rate you used at unfrozen part. 
Divide it by 5 or 10 and put it on maximum bound. At minimum bound, get the point just before it soared, and divide it by 10. Let’s make batch size bigger!: Since default batch size is 64, I tried it to 128. And it gets way more better result(even it’s still underfitting!) And if I freeze model and train whole model again, the model would be better. Also, you can use this method to the other big dataset model training! Interpretation: See the confusion matrix. Result is quite great. *Since I’m using colab, I will skip data cleansing. But I highly recommend you to use ImageCleaner widget, only if you are using jupyter notebook (not jupyter lab) Model in production: You can deploy your model in simple way. I referred fast. ai, and used render(it’s free for limited time). You can find detailed document here. and you can create a route like this. @app. route( /classify-url , methods=[ GET ])async def classify_url(request): bytes = await get_bytes(request. query_params[ url ]) img = open_image(BytesIO(bytes)) _,_,losses = learner. predict(img) return JSONResponse({ predictions : sorted( zip(cat_learner. data. classes, map(float, losses)), key=lambda p: p[1], reverse=True ) })You can find my deployed model here Reference: How to create a deep learning dataset using Google Images towardsdatascience - one cycle policy Deep Residual Learning for Image Recognition ↩ Accuracy_and_precision ↩ "
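For reference, the whole pipeline in the fastai v1 API used in this post looks roughly like this (the data path and hyperparameters are placeholders, not the exact ones from my notebook):
~~~python
from fastai.vision import *

path = Path('data/cities')  # hypothetical folder with one subfolder per city
# Hold out 20% of the downloaded Google images as the validation set
data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224, bs=64)

learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.lr_find()               # LR range test: raise the LR until the loss explodes
learn.recorder.plot()         # pick about one order of magnitude below the explosion
learn.fit_one_cycle(4, max_lr=1e-2)

learn.unfreeze()              # then train the whole model with a sliced LR
learn.fit_one_cycle(2, max_lr=slice(1e-5, 1e-3))
~~~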
}, {
- "id": 20,
+ "id": 22,
"url": "http://localhost:4000/2020/02/dps-week5/",
"title": "Digital Product School week 5",
"body": "2020/02/09 - The 5th week retropect at Digital Product School Week 5 - Create a Storymap and sync it with Lean Canvas This week's schedule CONTENT: How to create our story map Prepare your story Discover your product’s AI potentialMondayHow to create our story map: We need this 'aha' moment There was a Milestone workshop, about our weekly goal. As we are agile working, we go fast and change every week’s goal. This week we will finalize our story map based on user’s pain-point and HMW questions. How should we make our story-map Basically we should make story map based on this rule Tell stories, don’t just write them! We always need context, that means all the story component should be connected Visualize your product to establish a shared understanding and speed up discussions! Post-it filled of text is not enough, we should fill it with visualizations then team mates can understand it fast Only discuss in front our your story map! (Speed) So we can update our story-map as soon as we change our opinion And also Use a story map to find the parts that matter most and to identify holes in your idea! Since the story map consists of techinical part, we should consider each story’s technical feasibility Minimise output, maximise outcome and impact! Build tests to figure out what’s minimum and what’s viable! This story map functions to find out our minimum value of ideas Work iteratively: Change your story map according to your learnings! We should repeat this process again and again PMs: Make sure Storymap is up to date!Prepare your story: team cero, our whole story map Our goal Technical feasibility of our storyWhat is your strategy to make user achieve something? This would be our expand point Discover your product’s AI potential: How can we apply AI to our product? Let’s write down our ‘HMW’ questions, and find out all p ossibilities. These are suggestion of possibilities, so don’t attached to feasibility (we will do in at lean start-up) Software section's expectation AI section's expectationTuesday Engineer's task, week5This 5th week, engineers settled WendesdayThursdayFriday"
}, {
- "id": 21,
+ "id": 23,
"url": "http://localhost:4000/2020/02/GPU-time/",
"title": "4 reasons took much time to setting GPU for fast.ai than I expected",
"body": "2020/02/05 - Motivation: Before now, me as a undergraduate student, I was parsimony who usually depend on colab, kaggle, friend’s server(occasional) whenever i need GPU. . And this time it’s been for a while to install GPU than I expected and I share the several component that stood in my way. Written at Oct 24 2019, if you think this is deprecated, please do not have a leap of faith. Just for the record, I’ve used Kaggle, Colab, GCP, Azure, EC2 as GPU cloud. 1. Did not know there is JupyterLab option in Google Cloud Platform. : At the first time when GCP came out, there was no AI Platform service. So from starting vm instance to launching jupyter and installing packages, I did all of the things myself. (and I learned 🤗) $ curl -O https://repo. continuum. io/archive/Anaconda3-5. 0. 1-Linux-x86_64. sh[Downloading conda in ssh] I created VM instance,selected zone, machine type and disk type. Then, define firewall rules and in ssh terminal, install jupyter and other packages. But you can do all of these things just using AI Platform. [AI Platform] I think it especially save your time if you are living in Asia-Pacific, which google doesn’t support not that much GPU resources. 2. Consider if the platform has limited resources in a region you live in. : I live in South Korea, East Asia, and it seems like this region has lots of limitation in GPU (except quite expensive AWS) And the Taiwan which was the only one region where I can launch my own VM with GPU (I tried all the other regions in the list) sometimes do normaly, but not always. 😥After launching, I did several works and next day I could not start VM. (I didn’t count it, but tried it a few hours because I didn’t want cost any more time…) Endlessly failed to start instance, then I choose to move AWS as an alternative way. 3. Fast. ai gives deliberate guide and I didn’t know it. : Fast. ai offer the guide for all available platform. (Colab, salamander, Gradient, Kaggle, Colab, and so on) It is so important, and really needs, because cloud computing options are vary as occasion and purpose arise. I didn’t know fast. ai has manual to running GCP, and I think it’s as good a reason as any for me to be have taken time. It helped me so much when I had aws and shortened my time. I don’t want to read all of the manual in amazno. . (It is recommended. . but I’d rather read GIT PRO now…) ssh -i ~/. ssh/<your_private_key_pair> -L localhost:8888:localhost:8888 ubuntu@<your instance IP>4. You should wait to add more volume just after add volume, by building AWS EC2. : Since Elastic Block Store(EBS) storage supports optimized storage, users can’t extend storage volume two times in a row. Unfortunately, at the first time, I didn’t know it (again 👻) and when VM lacked volume, I doubled dist capacity (76*2) at a rough but It needs more. <!– this time I installed GPU in two years, and it became little complicated compared to 2 years ago. And this time for the first time(maybe not the first time. . but i handled it in my class or with my friend. but it’s my first time on my own. ) I very I’m started to using used google colab, kaggleand, GCP-JupyterLab, ec2 - friend made, aws vm machine but I had a environment variable but i did not know of it. On these days, I could not get a resources from taiwan… I couldn’t notice a deliberate Anyway, as a result I tried myself gcp myself and aws ec2 with fast. 
ai But I think doing on my self surely takes much time (in this point I wonder why I’m doing this, and should remind me, especially I was studying disk volume optimization) disk volume exceed - https://askubuntu. com/questions/919748/no-space-left-on-device-even-though-there-is: "
}, {
- "id": 22,
+ "id": 24,
"url": "http://localhost:4000/2020/02/dps-week4/",
"title": "Digital Product School week 4",
"body": "2020/02/01 - The 4th week retropect at Digital Product School Week 4 - Find solution ideas and run experiments [This week’s schedule] CONTENT: Ideation Techniques What is ideation techniques? Generating idea in my team AIdeation Team brain storming of idea Die Produkt MacherMondayIdeation Techniques: [slides from @steffen] What is ideation techniques?: We tried to find out user’s painpoint last week. Tried to users talk about their, pain point. No question directly, but extract from them their pain with transportation. Generating idea in my team: AIdeation: TuesdayTeam brain storming of idea: Based on generated idea on Monday, we extended our idea doing rolling-paper! Die Produkt Macher: What is lean start-up? Lean startup is a methodology for developing businesses and products that aims to shorten product development cycles and rapidly discover if a proposed business model is viable; this is achieved by adopting a combination of business-hypothesis-driven experimentation, iterative product releases, and validated learning. - wikipedia WendesdayThursdayFriday"
}, {
- "id": 23,
+ "id": 25,
"url": "http://localhost:4000/2020/01/retrosprect-of-acl-paper-2020/",
"title": "Retrospect of ACL 2020 paper writing",
"body": "2020/01/29 - 2020 Annual Conference of the Association for Computational Linguistics Why I can’t use ‘Cebuano’ for the research?: Why I had to change target language from ‘Cebuano’ to ‘Tagalog’?-> No language translator options except google translation. But before knowing that I already consult my friend, whose mother tongue is English. So I had to aplogize her, but couldn’t tell her why suddenly I changed my plan. -> I realized there are many languages even can’t be researched at all. . -> Getting accustomed to discrimination makes misunderstanding, sometimes. At my country, we couldn’t use music streaming service, because of legal problem. But at that moment, I thought it was discrimination, which is done by music company. "
}, {
- "id": 24,
+ "id": 26,
"url": "http://localhost:4000/2020/01/Git-Merge/",
"title": "Why am I not listed as a contributor?!",
"body": "2020/01/10 - From the end of last year, big changes have witnessed in NLP research. Embracing an unprecedented growth, I started to study new exciting results and advances. In doing so, I noticed I’m not listed as contributor of repo which my PR accessed. How did I come to a repository?: When I’m stuck, I would prefer to code, than to go deep in theory. (It must be so. . too much to understand 🤒)It was BERT released by Google AI I felt keenly the necessity of implementing, because not only couldn’t understand the way they figured out positional encoding formula, but how it actually works. What does it mean to “scale” dot product in Attention? (Now I know it’s far from my section 😂) Figure 1. Scaled Dot Product. Adopted from tensorflow blogWhat was the code error?: For implement code in paper, I read the papers Transformer and BERT, structured the model, and refered the others’ code. Meanwhile, I found out a small error in tokenization process, which was changing a token into [MASK], enabled bidirectional representation. I’ve made PR, and got merged. But I was not in contributors. Why?: Figure 2. Merged Pull request Adopted from graykode projectActually I happened to know there can be couple of reasons github doesn’t include my name as contributor. Well, if contributors tab has more than 100 people, in which case it shows you up only if you are in the top 100 contributors because displaying too many contributors can make webpages down. Somethimes, however, it doesn’t that problem. Why not? Two possibilities are there. First, According to Joel-Glovier, if repository maintainer merged-as-a-rebase PR will end up showing as maintainer’s commit. But maintainer shouldn’t normally do this. Second, if you happend to commit using a different git email that what is in your GitHub profile, it will not be attached to your Github user, and “doesn’t show up” as you. Reference: Michał Chromiak’s blog Github: why are my contributions are not showing on my profile atlassian-gitfetch"
}, {
- "id": 25,
- "url": "http://localhost:4000/2019/12/lesson1-fastai/",
- "title": "Fine Grained Classification",
- "body": "2019/12/31 - Finally you can solve the mystery behind this weird drawing. . through this course. juptyer notebook magic: %reload_ext autoreload%autoreload 2%matplotlib inlinethis is special directives to jupyter notebook, not python code. And it is called ‘magics’ (but i think jeremy is magicion) If somebody changes underlying library code while I’m running this, please reload it automatically If somebody asks to plot something, then please plot it here in this Jupyter NotebookDon’t hesitate to import start~ Digging into untar_data, path. ls: Union[pathlib. Path, str]: typed programming language? -> maybe i think disclaim the type beforehand for sure. Q. like assert? path. ls()this is some module that fast. ai made because os. listdir(‘path’) is unconvinient. Python3 pathlib library!: pathlib "
- }, {
- "id": 26,
+ "id": 27,
"url": "http://localhost:4000/2019/12/jeremy-howard/",
"title": "Jeremy Howard",
"body": "2019/12/15 - This is journey to find out ‘who am I trying to be?’: How he impacted me? The person who made me start Computer Vision again. He emphasized the importance of studying NLP and Computer together to understand the deep-learning. He didn’t order it to study, but always he pursuade me with reasonable way. “It’s not just something I can throw away. NLP and computer vision a few weeks apart and that’s going to force your brain to realize like ‘oh I have to remember this’” He made me admit my failure in deep-learning. I started to objectify where am I. What should I do when I’m frustrated. “Keep going. You’re not expected to remember everything. Yet. You’re not expected to understand everything. Yet. You’re not expected to know why everything works. Yet. ” His articles are numerous, below. What is torch. nn Really? High Performance Numeric Programming with Swift: Explorations and Reflections C++11, random distributions, and Swift And especially, I like this book. Designing great data products Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing. Designing great data products And he is also famous for words. Here are some. we’re going to try and use that to really understand what’s going on. So to warn you, none of it is rocket science but a lot of its going to look really new. So don’t expect to get it the first time but expect to listen and jump into the notebook try a few things test things out look particularly at like tensor shapes and inputs and outputs to check your understanding then go back and listen again. But and kind of try it, a few times, because you will get there right, it’s just that there’s going to be a lot of new concepts because we haven’t done that much stuff in pure Pytorch. Lesson 6: Deep Learning 2019 "
}, {
- "id": 27,
+ "id": 28,
"url": "http://localhost:4000/2019/11/julia-evans/",
"title": "Julia Evans",
"body": "2019/11/20 - This is journey to find out ‘who am I trying to be?’: The women who surprised me in many ways. First, she approached me to teaching some concepts drawing cartoons. It was at Hackers news, which was hightest ranks. Personally I have the use of not to reading title, so and cartoon was so cute and clear. I naturally gonna understood mechanism and astonished by her explaination ability. Her value, which she was taught by many people so want to do same things, moved me. Volume of her knowledge, that just reading post title is a deal of work, amazed me. "
}, {
- "id": 28,
+ "id": 29,
"url": "http://localhost:4000/2019/11/coc-retropective/",
"title": "Retrospective on Pycon 2019 Korea (CoC Committee)",
"body": "2019/11/05 - When I was volunteer, it seems like busy and hectic to managing that crowded conference. In my experience, to get things moving, it needs hierarchy. But it didn’t. Organizers emphasized our responsibility, and if I passed each other’s burden, It could be my burden next time. In solidarity of the obligation, we finished conference well. And after participating PyCon Korea 2018 as volunteer, I’ve joined PyCon Korea Organizer last year. <Figure 1> First meeting of PyCon 2019 Korea Organizers It’s been a while since PyCon 2019 finished. It’s held on Aug 15 - 18, at Coex Grand Balloom <Figure 2> Ongoing session, speaking on news comment processing <Figure 3> Sponsor Booth iin Coex Hall <Figure 4> After PyCon 2019, with all of volunteer, organizer, speakers 😍 🥰 Serving as part of the coc TF, I spent large fraction of last year doing CoC job. here’s the path what we’ve been grappled with to grasp a solution. First half: Before the conference Toward Diverse Community: Formally we’ve been reusing and modifying PyCon US CoC, but we needed fit in Korean and I was part of that to revise code of conduct. Except ‘That’ Diversity, Because it is ‘Harassment’: Specific point was harassment, and the others were not. process of finding the points. How can we settle this point?Second half: During the conference Handling the potential Harassment: Disjunction of policy and real-time situation: This ‘PyCon 2019 Korea retrospective series’ would be devided into 3 Episodes. “Retrospective on Pycon 2019 Korea (CoC Committee)” “Retrospective on Pycon 2019 Korea (Program Chair)” (20 Nov, To Be Update) “Maintaining participation while still making timely decisions” (29 Nov, To Be Update)"
}, {
- "id": 29,
+ "id": 30,
"url": "http://localhost:4000/2019/11/elif-shafak/",
"title": "Elif Shafak",
"body": "2019/11/05 - This is journey to find out ‘who am I trying to be?’: For creative-minded people, Istanbul is a treasure. ’ Photo © Chris Boland, licensed under CC BY-NC-ND 2. 0 it suddenly felt like what I was trying to convey was more complicated and detailed than what the circumstances allowed me to say. And I did what I usually do in similar situations: I stammered, I shut down, and I stopped talking. I stopped talking because the truth was complicated, even though I knew, deep within, that one should never, ever remain silent for fear of complexity. <Figure 1> Elif Shafak Photo credit: www. elifsafak. com. tr I want to talk about emotions and the need to boost our emotional intelligence. I think it’s a pity that mainstream political theory pays very little attention to emotions. Oftentimes, analysts and experts are so busy with data and metrics that they seem to forget those things in life that are difficult to measure and perhaps impossible to cluster under statistical models. But I think this is a mistake, for two main reasons. We are emotional beings. I think it’s going to be one of our biggest intellectual challenges, because our political systems are replete with emotions. In country after country, we have seen illiberal politicians exploiting these emotions. And yet within the academia and among the intelligentsia, we are yet to take emotions seriously. I think we should. 1 2 Reference: British Council Worldwide ↩ Ted Talk ↩ "
}, {
- "id": 30,
+ "id": 31,
"url": "http://localhost:4000/2019/01/dps-week1/",
"title": "Digital Product School week 1",
"body": "2019/01/11 - The 1th week retropect at Digital Product School [This week’s schedule] CONTENT: Welcome to Digital Product School! Trip to Spitzingsee Welcome to Design Office Specifying our goal of product Welcome to Digital Product School!: Trip to Spitzingsee: At the first day of Digital Product School, we had a off-site with all of batch 9 people. All the costs were managed by dps. At the beautiful mountain, we settled the team, and got my team goal. Basically, there are two kind of team in DPS. (1) Wild team - the team has fixed topic(2) Company team - the team which has specific stakeholders, and also topic defined by that stakeholders The Core-team will fix what team you will join in DPS for 3 months based on ymy professionals, they announce it at off-site. [My team for 3 months at DPS] And we decide on my batch #9 theme song. How? Each team draw for songs and pitch ‘why this song should be batch #9 theme song’The result? Imagine dragon - Believer (I didn’t know at the moment, this song would be stamped in my memory) We have a workshop for getting to know each other. For example, we share 1) what do I expect from 3 months of dps, 2) when I feel happy in my life time, 3) what I worked for last week, 4) what was my last project and 5) what plays important role in my life My team's board Cero Welcome to Design Office: At first day of design office, we had workshop, which celebrates my day in dps also discuss specific rule, menifesto and stakeholders We get sticker and attach it in map depends on my nationality Now time to get to know my team’s stakeholders. What they want for us? What they expect from us? How free my team are on the topic?To be honest, it is endless tug-of-war. We should discuss with my stakeholders, endlessly, and find out solution which can meet interest of users, stakeholders and my team. Basically, my team’s main stakeholder is ADAC, but BMW, City of munich and Nokia will also participate as my team’s stakeholders. Specifying our goal of product: "
diff --git a/_site/2020/02/dps-week4/index.html b/_site/2020/02/dps-week4/index.html
index 1e781c8b17..d4bdbf567a 100644
--- a/_site/2020/02/dps-week4/index.html
+++ b/_site/2020/02/dps-week4/index.html
@@ -19,9 +19,9 @@
-
+
+{"description":"The 4th week retropect at Digital Product School","author":{"@type":"Person","name":"dionne"},"@type":"BlogPosting","url":"http://localhost:4000/2020/02/dps-week4/","publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"http://localhost:4000/assets/images/logo.png"},"name":"dionne"},"image":"http://localhost:4000/assets/images/week4/week4-ourteam.JPG","headline":"Digital Product School week 4","dateModified":"2020-02-01T00:00:00+09:00","datePublished":"2020-02-01T00:00:00+09:00","mainEntityOfPage":{"@type":"WebPage","@id":"http://localhost:4000/2020/02/dps-week4/"},"@context":"http://schema.org"}
@@ -161,96 +161,101 @@
"body": " {% if page. url == / %} {% assign latest_post = site. posts[0] %} <div class= topfirstimage style= background-image: url({% if latest_post. image contains :// %}{{ latest_post. image }}{% else %} {{site. baseurl}}/{{ latest_post. image}}{% endif %}); height: 200px; background-size: cover; background-repeat: no-repeat; ></div> {{ latest_post. title }} : {{ latest_post. excerpt | strip_html | strip_newlines | truncate: 136 }} In {% for category in latest_post. categories %} {{ category }}, {% endfor %} {{ latest_post. date | date: '%b %d, %Y' }} {%- assign second_post = site. posts[1] -%} {% if second_post. image %} <img class= w-100 src= {% if second_post. image contains :// %}{{ second_post. image }}{% else %}{{ second_post. image | absolute_url }}{% endif %} alt= {{ second_post. title }} > {% endif %} {{ second_post. title }} : In {% for category in second_post. categories %} {{ category }}, {% endfor %} {{ second_post. date | date: '%b %d, %Y' }} {%- assign third_post = site. posts[2] -%} {% if third_post. image %} <img class= w-100 src= {% if third_post. image contains :// %}{{ third_post. image }}{% else %}{{site. baseurl}}/{{ third_post. image }}{% endif %} alt= {{ third_post. title }} > {% endif %} {{ third_post. title }} : In {% for category in third_post. categories %} {{ category }}, {% endfor %} {{ third_post. date | date: '%b %d, %Y' }} {%- assign fourth_post = site. posts[3] -%} {% if fourth_post. image %} <img class= w-100 src= {% if fourth_post. image contains :// %}{{ fourth_post. image }}{% else %}{{site. baseurl}}/{{ fourth_post. image }}{% endif %} alt= {{ fourth_post. title }} > {% endif %} {{ fourth_post. title }} : In {% for category in fourth_post. categories %} {{ category }}, {% endfor %} {{ fourth_post. date | date: '%b %d, %Y' }} {% for post in site. posts %} {% if post. tags contains sticky %} {{post. title}} {{ post. excerpt | strip_html | strip_newlines | truncate: 136 }} Read More {% endif %}{% endfor %} {% endif %} All Stories: {% for post in paginator. posts %} {% include main-loop-card. html %} {% endfor %} {% if paginator. total_pages > 1 %} {% if paginator. previous_page %} « Prev {% else %} « {% endif %} {% for page in (1. . paginator. total_pages) %} {% if page == paginator. page %} {{ page }} {% elsif page == 1 %} {{ page }} {% else %} {{ page }} {% endif %} {% endfor %} {% if paginator. next_page %} Next » {% else %} » {% endif %} {% endif %} {% include sidebar-featured. html %} "
}, {
"id": 12,
+ "url": "http://localhost:4000/2020/04/v3-2019-lesson06-note/",
+ "title": "fastai 2019 course-v3 Part1, lesson06",
+ "body": "2020/04/15 - Lesson 06Rossmann(Tabular): Tabular data: be careful on Categorical variable vs Continuous variable. if datatype is int, fastai think it is classification, not a regression. Root mean square percentage error. as loss function. When you assign the y_range, it’s better to assign little bit more than actual maximum. > because it’s sigmoid. intermediate layers, which is weight matrix is 1) 1000, and 2) 500 -> which means our parameter would be 500*1000. learn. modelWhat is dropout and embedding dropout?: Nitish Srivastava, Dropout: A Simple way to prevent Neural Networks from Overfitting you can dropout with p value, make it specified to specific layer, or make it applied to all the layers. Pytorch code 1) bernoulli, which decides whether you will hold it? 2) and divide the noise value depends on noise value. so noise became 2 or remain 0. According to pytorch code, We do change at training time, but we do nothing at test time. and this means you don’t have to do anything special with inference time. ’ TODO: find at forums what is inference time - Related to NVIDIA, GPU. Embedding dropout is just a dropout. It’s different between continuous variable and embedding layer. TODO Still can’t understand. why embedding dropout is effective. or,… in need. Let’s delete at random, some of the results of the embedding. and It worked well especially at Kaggle Batch Normalization: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift -> came out false! According to How Does Batch Normalization Help Optimization? The key was multiplicative bias {\gamma} and additive bias {\beta}` Explain Let $$ \hat{y} = f(w_1, w_2, w_3, … , x)} $$ , loss = MSE , Then y_range should be between 1 and 5` And Activation function ends with -1 -> +1 To mitigate this problem, we can add the other parameter, like $$w_n$$ But there’re so much interactions in the process so just re-scale the output. Momentum parameter at BatchNorm1d: Different from momentum like in optimization. This momentum is Exponentially weighted moving average of the mean, instead of deviation. If this is small number: mean standard deviation would be less from mini_batch to mini_batch » less regularization effect. (If this is large number, variation would be greater from mini_batch to mini_batch » more regularization effect) TODO: can’t sure, but i understand, this is not about how to update parameter but about how much reflect previous value when scale and shift Q. Preference between batchnorm and the other regularizations(drop out, weight decay)A. Nope, always try and see the results## lesson6-pets-more### Data Augmentation- Last reg- `get_transforms` has lots of params (even not yet learned all) -> check documentation - Remember you can implement all the doc contents bc it's made from nbdev - TODO: try this!!- Essence of data augmentation is you should maintain the label, while somewhat making sense. - ex) tilt, because it's optically sensible, you can always change the angle of the data view. - zeros, border, and reflection but always `reflection` works most of the time, so that is the default### Convolutional Kernel(What is convolution?)- Will make heat\_map from scratch, which means the parts convolution focuses on![setosa_visualization]()- http://setosa. io/ev/image-kernels/ - javascript thing - How convolution works - Kernel. which does element-wise multiplication, and sum them up - so it has on pixel less at borders -> so it uses padding, and fastai uses reflection as said. 
- why this Kernel(matrix) helps catching horizontal edge side? - because this kernel`(picture2)` weights differently, depends on `x axis` - why familiar, because it's similar intuition with fugus`(paper)` paper- CNN from different viewpoints`link` - output of pixel is results from different linear equations. - If you connect this with represents of neural network nodes, you can see that the specific inp nodes connected with specific out nodes. - **Summarize**: cnn does 1) matmul some of the elements are always zero 2) same weight for every row, which is called `weight time? weight. . ?, 1:18:50` `(picture)`#### Further lowdown- Because generally image has 3 channels, we need rank 3 kernel. - And **do multiply with all channel output is one pixel**. (`draw by your self`) - but this kernel will catch one feature, like horizontal, so that we make more kernel so that output becomes (h * w * kernel) - And that `kernel` come to `channel`- **Conv2d**: with 3 by 3 kernel, stride 2 conv -> (h/2 * w/2 * kernel) - skip or jump over input pixel - to protect from memory out of control~~~pythonlearn. modellearn. summary()~~~TODO: understand yourself the blocks of conv-kernel: - Usually use big kernel size at first layer (will study this at part2)- Bottom right highlighting kernel(`pic / draw`)- `torch. tensor. expand`: for memory efficient, because we should do RGB- We do not make separate kernel, but make rank 4 kernel - 4d tensor is just stacked kernel- `t[None]. shape` create new unit axis, and why? we make this -> it should move unit of batch, not one size image. ### Average pooling, feature- suppose our pre-trained model results in size of `11 by 11 by 512 ` `pic 4` and my classification task has 37 classes * take the first face of channel, which is 11 by 11 and `mean` it, so that make rank 2 tensor, 512 by 1 * and make 2d matrix, which is 512 by 37 and multiply so that we can get 37 by 1 matrix. - Feature, at convolution block - So, when we transfer-learning without unfreeze, every element of last matrix (512 by 1) should represent(or could catch) each feature. ### Heatmap, Hook~~~hook_output(model[0]) -> acts -> avg_acts~~~- if we average the block with `axis=feature`, result of matrix(11 by 11) depicts `how activated was that area?` -> it is heatmap, `avg_acts`- and acts comes from hook, which is more advanced pytorch feature. - hook into pytorch machine itself, and run any arbitrary Pytorch code - Why this is cool?: Normally it gives set of outputs of forward pass, but we can interrupt and hook the forward pass. - Also can store the output of the convolutional part of the model, which is before avg_pooling- Thinking back when we do cut off `after` the conv part. - but with fast. ai the original convolutional part of the model would be *the first thing in the model*, specifically could be given from `learn. model. eval()[0]` - And this is gotten from `hooked_output` and having hooked the output, we can pass our x_minibatch to output. - Not directly, but with normalized, minibatch, put on to the gpu - `one_item()` function do it, when we have one data `TODO: this is assignment` do it yourself without one_item function - and `. cuda()` put it on gpu- you should print out very often the shape of tensor, and try think why. "
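To make the bernoulli-then-divide step above concrete, here is a minimal sketch of inverted dropout in plain PyTorch (my own illustration, not the lesson's code; the function name, tensor x, and probability p are made up):

~~~python
import torch

def dropout_sketch(x, p=0.5, training=True):
    # At test time dropout is the identity: nothing special at inference.
    if not training or p == 0.:
        return x
    # 1) A Bernoulli draw decides whether each unit is kept (1 with prob 1-p).
    mask = torch.empty_like(x).bernoulli_(1 - p)
    # 2) Divide the kept values by (1-p) so the expected activation is unchanged;
    #    with p=0.5 each kept unit becomes 2 and dropped units stay 0.
    return x * mask / (1 - p)

x = torch.ones(4, 3)
print(dropout_sketch(x, p=0.5))           # entries are 0. or 2.
print(dropout_sketch(x, training=False))  # unchanged at inference time
~~~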
+ }, {
+ "id": 13,
+ "url": "http://localhost:4000/2020/04/qna-image-segmentation/",
+ "title": "[Q&A] Image Segmentation, using Unet with Driving Video data",
+ "body": "2020/04/02 - This post is about my questions while I was studying USF Deep Learning course about image segmentation task. All the answers are from the course, source code, library document, or document. I cared about being clear at reporting information including source of information, however if there are still anything unclear, please contact me. And thank you Jeremy&Rachael for everything. Also Thank you Cambridge Computer Vision Lab to made us to study with your labor. The Cambridge-driving Labeled Video Database (CamVid) is the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes. If someone is interested in this project, please check the site and see the details. Now, let’s start first using jupyter’s one of tricks which I love most. It enables cell to print the code without print function. from IPython. core. interactiveshell import InteractiveShell# pretty print all cell's output and not just the last oneInteractiveShell. ast_node_interactivity = all from fastai. vision import *from fastai. callbacks. hooks import *from fastai. utils. mem import *path = untar_data(URLs. CAMVID) # The locations where the data and models are downloaded are set in config. ymlpath. ls() I’m trying to accustomed to using pathlib module, not just it became built-in module in python, but I felt uncomfortable myself with os module. However, still unpredictable conflicts are remain, even in the quite standard library like Pytorch, tensorflow, onnx. (it require me string for path. not PosixPath. will send PR. . ) [PosixPath('/root/. fastai/data/camvid/valid. txt'), PosixPath('/root/. fastai/data/camvid/images'), PosixPath('/root/. fastai/data/camvid/labels'), PosixPath('/root/. fastai/data/camvid/codes. txt')]path_img = path/'images'path_lbl = path/'labels'fnames = get_image_files(path_img) #filenamelbl_names = get_image_files(path_lbl)1. (Play with data) My Hypothesis: File name has A_B format. and A / B would be at key-value position. Use collections - defaultdict Default Dict: Link: easy to group a sequence of key and value pairs into a dictionary of list?from collections import defaultdictfnames[0], lbl_names[0](PosixPath('/root/. fastai/data/camvid/images/0001TP_009210. png'), PosixPath('/root/. fastai/data/camvid/labels/0016E5_01800_P. png'))files = [tuple(i. stem. split('_')) for i in fnames]labels = [tuple(i. stem. split('_')[:-1]) for i in lbl_names]d = defaultdict(list)for k, v in files: d[k]. append(v)d. keys()len(d['0001TP'])124for k, v in d. 
items(): print(k, v)0001TP ['009210', '008850', '007350', '008970', '009840', '010140', '008490', '008520', '009540', '008250', '008340', '006840', '007860', '007410', '007740', '009870', '010080', '007890', '008790', '010020', '008400', '007080', '008280', '010380', '009330', '009060', '007470', '006810', '009720', '008580', '007110', '008730', '009150', '007680', '009780', '007800', '007290', '008760', '009510', '008640', '008310', '007440', '006900', '007500', '008460', '009030', '008130', '009480', '009900', '010230', '009270', '008040', '007590', '007950', '009990', '008550', '007260', '008100', '007530', '006960', '008190', '009420', '009930', '009000', '007830', '008940', '006690', '009570', '008880', '010170', '007560', '009300', '006750', '009360', '010200', '007320', '008010', '009120', '007620', '007200', '007140', '010320', '006720', '008670', '007230', '008370', '010260', '009690', '006930', '009090', '007770', '010290', '010350', '008610', '008070', '009600', '008430', '009450', '007380', '009240', '007710', '007170', '008160', '008910', '007020', '006780', '007050', '009960', '009810', '008220', '009180', '009750', '010050', '009660', '010110', '007920', '009630', '007650', '006990', '008700', '009390', '007980', '008820', '006870']0016E5 ['01290', '08159', '05760', '08133', '08063', '06660', '00960', '05850', '00750', '06960', '08035', '08107', '07975', '08017', '05610', '07140', '08119', '08027', '07170', '08400', '08093', '02100', '06390', '04470', '08340', '06060', '00600', '07470', '08151', '07800', '01620', '05730', '01530', '00690', '08430', '05940', '01980', '07320', '08069', '07965', '04380', '05430', '01410', '06780', '08007', '08087', '08079', '06600', '08109', '05490', '00901', '04590', '04680', '08045', '01770', '06690', '08085', '06810', '00420', '08011', '07440', '02190', '06300', '04800', '01500', '00450', '08029', '01470', '06330', '07997', '08067', '05370', '08013', '08190', '00840', '02370', '08049', '08135', '01440', '06870', '05820', '05280', '08051', '04440', '08091', '01380', '00630', '07290', '05520', '04770', '00540', '07995', '07999', '05550', '07920', '08101', '08141', '08053', '04620', '08103', '05160', '07350', '08057', '06030', '06000', '08550', '07963', '08089', '05970', '08047', '05640', '06240', '05220', '04350', '01590', '07959', '01950', '08117', '06180', '01560', '05400', '08043', '07680', '00780', '08081', '07050', '01020', '01350', '04530', '06720', '07969', '08149', '08003', '08131', '08129', '08033', '05460', '01650', '07530', '08023', '05340', '08640', '05100', '08075', '01230', '04980', '02070', '01080', '06210', '05910', '08009', '01800', '05190', '02400', '08083', '08019', '07620', '07200', '07890', '08059', '06990', '04410', '08121', '08123', '06930', '08137', '08147', '08095', '06570', '06150', '08153', '06840', '05250', '00510', '08370', '08580', '08113', '07410', '08097', '01200', '04950', '07770', '07650', '04710', '06090', '08055', '07110', '07981', '00990', '08250', '08127', '01920', '07985', '08220', '08005', '08157', '05130', '08071', '01140', '04830', '07740', '08143', '06120', '02040', '08111', '08115', '00660', '08280', '06420', '07983', '02220', '05700', '01860', '01260', '04920', '06510', '07020', '08073', '08105', '08125', '06360', '07860', '07993', '00810', '06540', '08099', '08139', '02010', '07973', '08155', '07991', '06630', '00480', '06750', '04890', '08001', '08025', '00870', '08490', '01830', '07977', '05010', '01170', '07961', '01680', '01050', '07987', '07080', '04560', '00930', '05310', '02340', '05790', 
'08460', '00720', '08031', '02280', '08039', '08037', '08065', '06270', '08077', '06900', '04650', '06480', '07230', '08041', '06450', '00570', '07989', '04740', '07979', '02250', '07380', '00390', '01710', '07590', '08021', '08520', '07500', '01110', '04500', '02310', '07971', '02130', '05580', '05880', '08610', '08310', '08145', '05670', '04860', '07260', '08015', '07967', '01740', '01320', '07560', '07830', '01890', '08061', '02160', '07710', '05070', '05040']Seq05VD ['f00030', 'f02550', 'f03450', 'f01110', 'f00480', 'f00210', 'f04590', 'f04170', 'f01800', 'f03990', 'f03360', 'f03900', 'f02070', 'f00810', 'f03690', 'f01350', 'f01530', 'f04980', 'f05100', 'f03060', 'f00900', 'f03870', 'f02460', 'f01470', 'f02370', 'f02820', 'f04080', 'f02760', 'f04860', 'f02250', 'f04200', 'f00270', 'f03720', 'f02850', 'f04410', 'f01200', 'f03090', 'f02010', 'f03930', 'f00090', 'f01650', 'f01890', 'f03840', 'f03030', 'f02130', 'f01230', 'f04110', 'f02520', 'f04140', 'f04020', 'f00060', 'f03420', 'f01560', 'f00120', 'f04290', 'f02340', 'f00300', 'f01380', 'f00870', 'f01860', 'f02970', 'f04560', 'f02730', 'f00330', 'f04530', 'f03780', 'f01770', 'f03390', 'f05040', 'f02430', 'f03330', 'f00660', 'f01740', 'f02100', 'f04800', 'f04050', 'f00510', 'f02790', 'f04350', 'f00690', 'f00540', 'f02490', 'f00960', 'f00930', 'f04230', 'f02880', 'f03600', 'f01020', 'f01500', 'f02400', 'f04830', 'f04470', 'f03300', 'f02670', 'f00450', 'f01980', 'f01170', 'f01620', 'f04500', 'f01080', 'f03180', 'f05070', 'f03150', 'f04950', 'f01440', 'f03510', 'f01710', 'f00360', 'f04770', 'f02910', 'f01050', 'f00630', 'f04320', 'f00570', 'f03240', 'f02190', 'f01140', 'f03540', 'f02220', 'f02640', 'f03960', 'f00000', 'f04920', 'f01950', 'f00990', 'f03480', 'f03000', 'f00420', 'f04620', 'f03210', 'f00780', 'f03570', 'f01590', 'f00750', 'f01920', 'f04650', 'f03750', 'f03630', 'f02310', 'f02610', 'f02580', 'f04740', 'f02280', 'f04680', 'f00390', 'f00720', 'f03660', 'f02040', 'f03270', 'f00180', 'f03810', 'f01410', 'f01290', 'f03120', 'f00840', 'f04440', 'f00150', 'f01260', 'f02700', 'f02940', 'f00600', 'f01830', 'f04260', 'f05010', 'f04890', 'f02160', 'f00240', 'f04380', 'f01680', 'f04710', 'f01320']0006R0 ['f02820', 'f03690', 'f03180', 'f02550', 'f01020', 'f03660', 'f02340', 'f01170', 'f02610', 'f02940', 'f01290', 'f02100', 'f01350', 'f03270', 'f03870', 'f01380', 'f01980', 'f03810', 'f02430', 'f02310', 'f01830', 'f03480', 'f02970', 'f01890', 'f03210', 'f03930', 'f02040', 'f02070', 'f02400', 'f01560', 'f03030', 'f01770', 'f01590', 'f01950', 'f03420', 'f01650', 'f03450', 'f00990', 'f03630', 'f01500', 'f03570', 'f00930', 'f03090', 'f03360', 'f02880', 'f02460', 'f01440', 'f01920', 'f01230', 'f03840', 'f02730', 'f01620', 'f02220', 'f03750', 'f03330', 'f03540', 'f02520', 'f02790', 'f01050', 'f03120', 'f01800', 'f01140', 'f01860', 'f01530', 'f01470', 'f02670', 'f02490', 'f01260', 'f01110', 'f02760', 'f01680', 'f03150', 'f02580', 'f03300', 'f02280', 'f01200', 'f03390', 'f03510', 'f02640', 'f02190', 'f02370', 'f01320', 'f02130', 'f03600', 'f03240', 'f03780', 'f03720', 'f02700', 'f01410', 'f01080', 'f02850', 'f01710', 'f03900', 'f03060', 'f01740', 'f02010', 'f02250', 'f00960', 'f03000', 'f02160', 'f02910']for k, v in d. items(): print(k, len(d[k]))0001TP 1240016E5 305Seq05VD 1710006R0 101for i in d2. keys(): print(i,len(d2[i]))0016E5 3050001TP 1240006R0 101Seq05VD 171files[0], labels[0](('0001TP', '009210'), ('0016E5', '01800'))2. My question: Link: Why do we need masking? and does color from fastai library? 
(have to look into source code) What do the parameter alpha do? When people make masked img, would it be have ranged integer limit? Does image normalization related with this?lbl_sorted = sorted(lbl_names)f_sorted = sorted(fnames)lbl_1 = lbl_sorted[33]f_1 = f_sorted[33]img = open_image(lbl_1)mask = open_mask(lbl_1)_,axs = plt. subplots(1,2, figsize=(10,5))# img. show(ax=axs[0], y=mask, title='masked')img. show(ax=axs[0], title='1')mask. show(ax=axs[1], title='2', alpha=1. ) img_2 = open_image(f_1)mask_2 = open_mask(f_1)_,axs = plt. subplots(1,2, figsize=(10,5))# img. show(ax=axs[0], y=mask, title='masked')img_2. show(ax=axs[0], title='3',)mask_2. show(ax=axs[1], title='4', alpha=1. ) open_mask(lbl_1). data. shapetorch. Size([1, 720, 960])open_mask(lbl_1). data. shapetorch. Size([1, 720, 960])open_image(f_1). data. shapetorch. Size([3, 720, 960])open_image(f_1). data. shapetorch. Size([3, 720, 960])img. data #labeled datatensor([[[0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], . . . , [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176]], [[0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], . . . , [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176]], [[0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], . . . , [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176]]])mask. data # after mask, labeled datatensor([[[ 4, 4, 4, . . . , 21, 21, 21], [ 4, 4, 4, . . . , 21, 21, 21], [ 4, 4, 4, . . . , 21, 21, 21], . . . , [17, 17, 17, . . . , 30, 30, 30], [17, 17, 17, . . . , 30, 30, 30], [17, 17, 17, . . . , 30, 30, 30]]])img_2. data, mask_2. data(tensor([[[0. 0706, 0. 0667, 0. 0706, . . . , 0. 6431, 0. 6549, 0. 6627], [0. 0745, 0. 0706, 0. 0706, . . . , 0. 6431, 0. 6510, 0. 6549], [0. 0784, 0. 0706, 0. 0745, . . . , 0. 6392, 0. 6588, 0. 6588], . . . , [0. 0863, 0. 0824, 0. 0824, . . . , 0. 1333, 0. 1216, 0. 1255], [0. 0902, 0. 0863, 0. 0824, . . . , 0. 1255, 0. 1176, 0. 1216], [0. 0863, 0. 0824, 0. 0784, . . . , 0. 1137, 0. 1059, 0. 1137]], [[0. 0706, 0. 0667, 0. 0706, . . . , 0. 7490, 0. 7608, 0. 7686], [0. 0745, 0. 0706, 0. 0706, . . . , 0. 7451, 0. 7569, 0. 7608], [0. 0784, 0. 0706, 0. 0745, . . . , 0. 7412, 0. 7529, 0. 7529], . . . , [0. 0980, 0. 0941, 0. 0941, . . . , 0. 1804, 0. 1686, 0. 1725], [0. 1059, 0. 1020, 0. 0980, . . . , 0. 1725, 0. 1647, 0. 1686], [0. 1020, 0. 0980, 0. 0941, . . . , 0. 1608, 0. 1529, 0. 1608]], [[0. 0784, 0. 0745, 0. 0784, . . . , 0. 7569, 0. 7686, 0. 7765], [0. 0824, 0. 0784, 0. 0784, . . . , 0. 7647, 0. 7647, 0. 7686], [0. 0784, 0. 0706, 0. 0745, . . . , 0. 7608, 0. 7647, 0. 7647], . . . , [0. 1216, 0. 1176, 0. 1176, . . . , 0. 2000, 0. 1882, 0. 1922], [0. 1176, 0. 1137, 0. 1098, . . . , 0. 1843, 0. 1765, 0. 1804], [0. 1137, 0. 1098, 0. 
1059, . . . , 0. 1725, 0. 1647, 0. 1725]]]), tensor([[[ 18, 17, 18, . . . , 183, 186, 188], [ 19, 18, 18, . . . , 183, 185, 186], [ 20, 18, 19, . . . , 182, 185, 185], . . . , [ 25, 24, 24, . . . , 43, 40, 41], [ 26, 25, 24, . . . , 41, 39, 40], [ 25, 24, 23, . . . , 38, 36, 38]]]))3. What is the difference between Image and ImageSegment?: imageSegment An ImageSegment object has the same properties as an Image. The only difference is that when applying the transformations to an ImageSegment, it will ignore the functions that deal with lighting and keep values of 0 and 1. It's easy to show the segmentation mask over the associated Image by using the y argument of show_image. img = open_image(fnames[0])mask = open_mask(lbl_names[0])_,axs = plt. subplots(1,3, figsize=(8,4))img. show(ax=axs[0], title='no mask')img. show(ax=axs[1], y=mask, title='masked') #seg mask over the img using y argmask. show(ax=axs[2], title='mask only', alpha=1. ) vision. image 4. Why/how is the image divided by 255, and how does fast. ai do it? : vision. image - If div=True, pixel values are divided by 255. to become floats between 0. and 1. At times, you want to get rid of distortions caused by lights and shadows in an image. Normalizing the RGB values of an image can at times be a simple and effective way of achieving this. So the sum of the pixel's values over all channels (call it S) divides each channel's intensity, so that the normalized values are R/S, G/S and B/S (where S = R+G+B). Detailed explanation here 5. Python Evaluation Order: Python evaluates expressions from left to right. Notice that while evaluating an assignment, the right-hand side is evaluated before the left-hand side. mask_tmp, trg_tmp, void_tmp = 2, 1, 10mask_tmp = trg_tmp != void_tmpprint(mask_tmp, trg_tmp, void_tmp) # (1) target is not same with voidTrue 1 10# Example 1x = 1y = 2x,y = y,x+yx, y(2, 3)# Example 2x = 1y = 2x = yy = x+yx, y(2, 4)6. model learner parameter: pct_start: A: Percentage of the total number of epochs during which the learning rate rises in one cycle. Q: Sorry, I'm still confused that one cycle in the new API only runs one epoch. How does the percentage of the total number of epochs work? Can you give an example, say learn. fit_one_cycle(10, slice(1e-4,1e-3,1e-2), pct_start=0. 05)? A: Ok, the strictly correct answer would be percentage of iterations, so you can have lr both increase and decrease during the same epoch. In your example, say you have 100 iterations per epoch; then for half an epoch (0. 05 * (10 * 100) = 50) lr will rise, then slowly decrease. Q2: Thanks for this explanation … so essentially, it is the percentage of overall iterations where the LR is increasing, correct? So, given the default of 0. 3, it means that your LR is going up for 30% of your iterations and then decreasing over the last 70%. Is that a correct summation of what is happening? A2: Yes, I think that's correct. You can verify that by changing its value and checking: learn. recorder. plot_lr() For example if pct_start = 0. 2 source: forums. fastai "
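A tiny sketch of the arithmetic in that forum answer (my own illustration; the epoch and iteration counts are the hypothetical numbers from the quote, not anything computed by fastai):

~~~python
# One-cycle schedule: LR rises for pct_start of all iterations, then falls.
epochs, iters_per_epoch = 10, 100   # hypothetical numbers from the quote
total_iters = epochs * iters_per_epoch

for pct_start in (0.05, 0.3):
    rising = int(pct_start * total_iters)
    falling = total_iters - rising
    print(f"pct_start={pct_start}: LR rises for {rising} iters, falls for {falling}")
# pct_start=0.05: LR rises for 50 iters, falls for 950
# pct_start=0.3:  LR rises for 300 iters, falls for 700
~~~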
+ }, {
+ "id": 14,
"url": "http://localhost:4000/2020/03/note08-fastai-4/",
"title": "Gradient backward, Chain Rule, Refactoring",
- "body": "2020/03/02 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring” Lecture 08 - Deep Learning From Foundations-part2 “ Homework: calculus for machine learning einsum conventionCONTENTS: Foundation version Gradients backward pass decompose function chain rule with code check the result using Pytorch autograd Refactor model Layers as classes Modue. forward() Without einsum nn. Linear and nn. Module Forward process Foundation version: Gradients backward pass: Gradients is output with respect to parameter we’ve done this work in this path(below) to simplify this calculus, we can just change it into, So, you should know of the derivative of each bit on its own, and then you multiply them all together. As a result, it would be over cross over the data. So you can get gradient, output with respect to parameter What order should we calculate? BTW, why Jeremy wrote , not Loss function?1 decompose function We want to get derivative of which forms But, we have a estimation of answer (we call it y hat) now So, I will decompose funciton to trace target variable. Using the above forward pass, we can suppose some function from the end. start from , We know MSE funciton got two parameters, output, and target . from MSE’s input we know function’s output and supposing v is input of that function, similarly, v became output of chain rule with code examplify backward process by random sampling To get a variable, I modified forward model a little def model_ping(out = 'x_train'): l1 = lin(x_train, w1, b1) # one linear layer l2 = relu(l1) # one relu layer l3 = lin(l2, w2, b2) # one more linear layer return eval(out) Be careful we don’t use mse_loss in backward process1) start with the very last function, which is loss funciton. MSE If we codify this formula,def mse_grad(inp, targ): #mse_input(1000,1), mse_targ (1000,1) # grad of loss with respect to output of previous layer inp. g = 2. * (inp. squeeze() - targ). unsqueeze(-1) / inp. shape[0] And, this can be examplified like below. Notice that input of gradient function is same with forward functiony_hat = model_ping('l3') #get value from forward modely_hat. g = ((y_hat. squeeze(-1)-y_train). unsqueeze(-1))/y_hat. shape[0]y_hat. g. shape>>> torch. Size([50000, 1]) We can just calculate using broadcasting, not using squeeze. then why should do and unsqueeze again?🎯 It’s related with random access memory(RAM). . If I don’t squeeze, (I’m using colab) it out of RAM. 2) Derivative of linear2 function This process’s weight dimensions defined by axis=1, axis=2. axis=0 dimension means size of data. This will be summazed by . sum(0) method. unsqeeze(-1)&unsqeeze(1) seperates the dimension, and make a dot product, and vanish axis=0 dimension. def lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowlin2 = model_ping('l2'); #get value from forward modellin2. g = y_hat. g@w2. t(); w2. g = (lin2. unsqueeze(-1) * y_hat. g. unsqueeze(1)). sum(0);b2. g = y_hat. g. sum(0);lin2. g. shape, w2. g. shape, b2. g. shape>>> torch. Size([50000, 50])torch. Size([50, 1])torch. Size([1]) Notice going reverse order, we’re passing in gradient backward3) derivative of ReLU def relu_grad(inp, out): # grad of relu with respect to input activations inp. 
g = (inp>0). float() * out. g Examplified belowlin1=model_ping('l1') #get value from forward modellin1. g = (lin1>0). float() * lin2. g;lin1. g. shape>>> torch. Size([50000, 50])4) Derivative of linear1 Same process with 2) but, this process’s weight hasdef lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowx_train. g = lin1. g @ w1. t(); w1. g = (x_train. unsqueeze(-1) * lin1. g. unsqueeze(1)). sum(0); b1. g = lin1. g. sum(0);x_train. g. shape, w1. g. shape, b1. g. shape>>> torch. Size([50000, 784])torch. Size([784, 50])torch. Size([50])5) Then it goes backward pass def forward_and_backward(inp, targ): # forward pass: l1 = inp @ w1 + b1 l2 = relu(l1) out = l2 @ w2 + b2 # we don't actually need the loss in backward! loss = mse(out, targ) # backward pass: mse_grad(out, targ) lin_grad(l2, out, w2, b2) relu_grad(l1, l2) lin_grad(inp, l1, w1, b1)Version 1 (Basic)- Wall time: 1. 95 s Summary Notice that output of function at forward pass became input of backward pass backpropagation is just the chain rule value loss (loss=mse(out,targ)) is not used in gradient calcuation. Because, it doesn’t appear with the weight. w1g, w2g, b1g, b2g, ig will be used for optimizercheck the result using Pytorch autograd require_grad_ is the magical function, which can automatic differentiation. 2 This magical auto gradified tensor keep track what happend in forward (taking loss function), and do the backward3 So it saves our time to differentiate ourselves ⤵️ THis is benchmark…. . Version 2 (torch autograd)- Wall time: 3. 81 µs Refactor model: Amazingly, just refactoring our main pieces, it comes down up to Pytorch package. 🌟 Implement yourself, Practice, practice, practice! 🌟 Layers as classes: Relu and Linear are layers in oue neural net. -> make it as classes For the forward, using __call__ for the both of forward & backward. Because ‘call’ means we treat this as a function. class Lin(): def __init__(self, w, b): self. w,self. b = w,b def __call__(self, inp): self. inp = inp self. out = inp@self. w + self. b return self. out def backward(self): self. inp. g = self. out. g @ self. w. t() # Creating a giant outer product, just to sum it, is inefficient! self. w. g = (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) self. b. g = self. out. g. sum(0) Remember that in lin_grad function, we save bias&weight!!!!!💬 inp. g : gradient of the output with respect to the input. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 w. g : gradient of the output with respect to the weight. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 b. g : gradient of the output with respect to the bias. {: style=”color:grey; font-size: 90%; text-align: center;”} class Model(): def __init__(self, w1, b1, w2, b2): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ) def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() refer to Jeremy’s Model class, he put layers in list Dionne’s self-study note: Decomposing Jeremy’s Model class init needs weight, bias but not x data when call that class(a. k. a function) it gave x data and y label! jeremy composited function in layers. x = l(x) so concise…. . 
also utilized that layer list when backward ust reversing it (using python list’s method) And he is recursively calling the function on the result of the previous thing. ⬇️for l in self. layers: x = l(x)Q2: Don’t I need to declare magical autograd function, requires_grad_?{: style=”color:red; font-size: 130%; text-align: center;”} [The questions migrated to this article] Version 3 (refactoring - layer to class)- Wall time: 5. 25 µs Modue. forward(): Duplicate code makes execution time slow. Role of __call__ changed. No more __call__ for implementing forward pass. By initializing the forward with __call__, Module. forward() use overriding to maximize reusability. So any layer inherit Module, can use parent’s function. gradient of the output with respect to the weight (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) can be reexpressed using einsum, torch. einsum( bi,bj->ij , inp, out. g) Defining forward and Module enables Pytorch to out almost duplicatesVersion 4 (Module & einsum)- Wall time: 4. 29 µs Q2: Isn’t there any way to use broadcasting? Why we should use outer product?{: style=”color:red; font-size: 130%; text-align: center;”} Without einsum: Replacing einsum to matrix product is even more faster. torch. einsum( bi,bj->ij , inp, out. g)can be reexpressed using matrix product, inp. t() @ out. gVersion 5 (without einsum)- Wall time: 3. 81 µs nn. Linear and nn. Module: Torch’s package nn. Linear and nn. Module Version 6 (torch package)- Wall time: 5. 01 µs Final, Using torch. nn. Linear & torch. nn. Module~~~pythonclass Model(nn. Module): def init(self, n_in, nh, n_out): super(). init() self. layers = [nn. Linear(n_in,nh), nn. ReLU(), nn. Linear(nh,n_out)] self. loss = mse def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x. squeeze(), targ)class Model(): def init(self): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ)def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() ~~~ Footnote: fast. ai forums Lesson-8 ↩ pytorch docs - autograd ↩ stackoverflow - finding methods a object has ↩ "
+ "body": "2020/03/02 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring ” Lecture 08 - Deep Learning From Foundations-part2 “ Homework: calculus for machine learning einsum conventionCONTENTS: Foundation version Gradients backward pass decompose function chain rule with code check the result using Pytorch autograd Refactor model Layers as classes Modue. forward() Without einsum nn. Linear and nn. Module Forward process Foundation version: Gradients backward pass: Gradients is output with respect to parameter we’ve done this work in this path(below) to simplify this calculus, we can just change it into, So, you should know of the derivative of each bit on its own, and then you multiply them all together. As a result, it would be over cross over the data. So you can get gradient, output with respect to parameter What order should we calculate? BTW, why Jeremy wrote , not Loss function?1 decompose function We want to get derivative of which forms But, we have a estimation of answer (we call it y hat) now So, I will decompose funciton to trace target variable. Using the above forward pass, we can suppose some function from the end. start from , We know MSE funciton got two parameters, output, and target . from MSE’s input we know function’s output and supposing v is input of that function, similarly, v became output of chain rule with code examplify backward process by random sampling To get a variable, I modified forward model a little def model_ping(out = 'x_train'): l1 = lin(x_train, w1, b1) # one linear layer l2 = relu(l1) # one relu layer l3 = lin(l2, w2, b2) # one more linear layer return eval(out) Be careful we don’t use mse_loss in backward process1) start with the very last function, which is loss funciton. MSE If we codify this formula,def mse_grad(inp, targ): #mse_input(1000,1), mse_targ (1000,1) # grad of loss with respect to output of previous layer inp. g = 2. * (inp. squeeze() - targ). unsqueeze(-1) / inp. shape[0] And, this can be examplified like below. Notice that input of gradient function is same with forward functiony_hat = model_ping('l3') #get value from forward modely_hat. g = ((y_hat. squeeze(-1)-y_train). unsqueeze(-1))/y_hat. shape[0]y_hat. g. shape>>> torch. Size([50000, 1]) We can just calculate using broadcasting, not using squeeze. then why should do and unsqueeze again?🎯 It’s related with random access memory(RAM). . If I don’t squeeze, (I’m using colab) it out of RAM. 2) Derivative of linear2 function This process’s weight dimensions defined by axis=1, axis=2. axis=0 dimension means size of data. This will be summazed by . sum(0) method. unsqeeze(-1)&unsqeeze(1) seperates the dimension, and make a dot product, and vanish axis=0 dimension. def lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowlin2 = model_ping('l2'); #get value from forward modellin2. g = y_hat. g@w2. t(); w2. g = (lin2. unsqueeze(-1) * y_hat. g. unsqueeze(1)). sum(0);b2. g = y_hat. g. sum(0);lin2. g. shape, w2. g. shape, b2. g. shape>>> torch. Size([50000, 50])torch. Size([50, 1])torch. Size([1]) Notice going reverse order, we’re passing in gradient backward3) derivative of ReLU def relu_grad(inp, out): # grad of relu with respect to input activations inp. 
g = (inp>0). float() * out. g Examplified belowlin1=model_ping('l1') #get value from forward modellin1. g = (lin1>0). float() * lin2. g;lin1. g. shape>>> torch. Size([50000, 50])4) Derivative of linear1 Same process with 2) but, this process’s weight hasdef lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowx_train. g = lin1. g @ w1. t(); w1. g = (x_train. unsqueeze(-1) * lin1. g. unsqueeze(1)). sum(0); b1. g = lin1. g. sum(0);x_train. g. shape, w1. g. shape, b1. g. shape>>> torch. Size([50000, 784])torch. Size([784, 50])torch. Size([50])5) Then it goes backward pass def forward_and_backward(inp, targ): # forward pass: l1 = inp @ w1 + b1 l2 = relu(l1) out = l2 @ w2 + b2 # we don't actually need the loss in backward! loss = mse(out, targ) # backward pass: mse_grad(out, targ) lin_grad(l2, out, w2, b2) relu_grad(l1, l2) lin_grad(inp, l1, w1, b1)Version 1 (Basic)- Wall time: 1. 95 s Summary Notice that output of function at forward pass became input of backward pass backpropagation is just the chain rule value loss (loss=mse(out,targ)) is not used in gradient calcuation. Because, it doesn’t appear with the weight. w1g, w2g, b1g, b2g, ig will be used for optimizercheck the result using Pytorch autograd require_grad_ is the magical function, which can automatic differentiation. 2 This magical auto gradified tensor keep track what happend in forward (taking loss function), and do the backward3 So it saves our time to differentiate ourselves Postfix underscore means in pytorch, in-place function, What is in-place function?⤵️ THis is benchmark…. . Version 2 (torch autograd)- Wall time: 3. 81 µs Refactor model: Amazingly, just refactoring our main pieces, it comes down up to Pytorch package. 🌟 Implement yourself, Practice, practice, practice! 🌟 Layers as classes: Relu and Linear are layers in oue neural net. -> make it as classes For the forward, using __call__ for the both of forward & backward. Because ‘call’ means we treat this as a function. class Lin(): def __init__(self, w, b): self. w,self. b = w,b def __call__(self, inp): self. inp = inp self. out = inp@self. w + self. b return self. out def backward(self): self. inp. g = self. out. g @ self. w. t() # Creating a giant outer product, just to sum it, is inefficient! self. w. g = (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) self. b. g = self. out. g. sum(0) Remember that in lin_grad function, we save bias&weight!!!!!💬 inp. g : gradient of the output with respect to the input. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 w. g : gradient of the output with respect to the weight. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 b. g : gradient of the output with respect to the bias. {: style=”color:grey; font-size: 90%; text-align: center;”} class Model(): def __init__(self, w1, b1, w2, b2): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ) def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() refer to Jeremy’s Model class, he put layers in list Dionne’s self-study note: Decomposing Jeremy’s Model class init needs weight, bias but not x data when call that class(a. k. a function) it gave x data and y label! jeremy composited function in layers. x = l(x) so concise…. . 
also utilized that layer list when backward ust reversing it (using python list’s method) And he is recursively calling the function on the result of the previous thing. ⬇️for l in self. layers: x = l(x)Q2: Don’t I need to declare magical autograd function, requires_grad_?{: style=”color:red; font-size: 130%; text-align: center;”} [The questions migrated to this article] Version 3 (refactoring - layer to class)- Wall time: 5. 25 µs Modue. forward(): Duplicate code makes execution time slow. Role of __call__ changed. No more __call__ for implementing forward pass. By initializing the forward with __call__, Module. forward() use overriding to maximize reusability. So any layer inherit Module, can use parent’s function. gradient of the output with respect to the weight (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) can be reexpressed using einsum, torch. einsum( bi,bj->ij , inp, out. g) Defining forward and Module enables Pytorch to out almost duplicatesVersion 4 (Module & einsum)- Wall time: 4. 29 µs Q2: Isn’t there any way to use broadcasting? Why we should use outer product?{: style=”color:red; font-size: 130%; text-align: center;”} Without einsum: Replacing einsum to matrix product is even more faster. torch. einsum( bi,bj->ij , inp, out. g)can be reexpressed using matrix product, inp. t() @ out. gVersion 5 (without einsum)- Wall time: 3. 81 µs nn. Linear and nn. Module: Torch’s package nn. Linear and nn. Module Version 6 (torch package)- Wall time: 5. 01 µs Final, Using torch. nn. Linear & torch. nn. Module~~~pythonclass Model(nn. Module): def init(self, n_in, nh, n_out): super(). init() self. layers = [nn. Linear(n_in,nh), nn. ReLU(), nn. Linear(nh,n_out)] self. loss = mse def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x. squeeze(), targ)class Model(): def init(self): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ)def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() ~~~ Footnote: fast. ai forums Lesson-8 ↩ pytorch docs - autograd ↩ stackoverflow - finding methods a object has ↩ "
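As a sanity check on the hand-written gradients above, here is a minimal sketch comparing one of them against autograd (my own illustration, not the lecture's code; the tiny shapes, seed, and tolerance are made up, while relu, lin, and mse follow the definitions used in this note):

~~~python
import torch

# Tiny stand-ins for the note's data and parameters (shapes are made up).
torch.manual_seed(0)
x, targ = torch.randn(64, 784), torch.randn(64)
w1, b1 = torch.randn(784, 50) / 784**0.5, torch.zeros(50)
w2, b2 = torch.randn(50, 1) / 50**0.5, torch.zeros(1)

def forward(inp, targ, w1, b1, w2, b2):
    l1 = inp @ w1 + b1
    l2 = l1.clamp_min(0.)                          # relu
    out = l2 @ w2 + b2
    return ((out.squeeze(-1) - targ)**2).mean()    # mse

# Hand-written backward result for w2 (mse_grad then lin_grad, as in the note):
l1 = x @ w1 + b1; l2 = l1.clamp_min(0.); out = l2 @ w2 + b2
out_g = 2. * (out.squeeze(-1) - targ).unsqueeze(-1) / out.shape[0]
w2_g_manual = (l2.unsqueeze(-1) * out_g.unsqueeze(1)).sum(0)

# Autograd version: mark the parameter, rerun the forward pass, call backward.
w2a = w2.clone().requires_grad_(True)
forward(x, targ, w1, b1, w2a, b2).backward()
print(torch.allclose(w2_g_manual, w2a.grad, atol=1e-6))  # True, up to float tolerance
~~~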
}, {
- "id": 13,
+ "id": 15,
"url": "http://localhost:4000/2020/03/note08-fastai-3/",
"title": "Implement forward&backward pass from scratch",
"body": "2020/03/01 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring1. The forward and backward passes: 1. 1 Normalization: train_mean,train_std = x_train. mean(),x_train. std()>>> train_mean,train_std(tensor(0. 1304), tensor(0. 3073))Remember! Dataset, which is x_train, mean and standard deviation is not 0&1. But we need them to be which means we should substract means and divide data by std. You should not standarlize validation set because training set and validation set should be aparted. after normalize, mean is close to zero, and standard deviation is close to 1. 1. 2 Variable definition: n,m: size of the training set c: the number of activations we need in our model2. Foundation Version: 2. 1 Basic architecture: Our model has one hidden layer, output to have 10 activations, used in cross entropy. But in process of building architecture, we will use mean square error, output to have 1 activations and lator change it to cross entropy number of hidden unit; 50see below pic We want to make w1&w2 mean and std be 0&1. why initializating and make mean zero and std one is important? paper highlighting importance of normalisation - training 10,000 layer network without regularisation1 2. 1. 1 simplified kaiming initQ: Why we did init, normalize with only validation data? Because we can not handle and get statistics from each value of x_valid?{: style=”color:red; font-size: 130%; text-align: center;”} what about hidden(first) layer?w1 = torch. randn(m,nh)b1 = torch. zeros(nh)t = lin(x_valid, w1, b1) # hidden>>> t. mean(), t. std()((tensor(2. 3191), tensor(27. 0303))In output(second) layer, w2 = torch. randn(nh,1)b2 = torch. zeros(1)t2 = lin(t, w2, b2) # output>>> t2. mean(), t2. std()(tensor(-58. 2665), tensor(170. 9717)) which is terribly far from normalzed value. But if we apply simplified kaiming init w1 = torch. randn(m,nh)/math. sqrt(m); b1 = torch. zeros(nh)w2 = torch. randn(nh,1)/math. sqrt(nh); b2 = torch. zeros(1)t = lin(x_valid, w1, b1)t. mean(),t. std()>>> (tensor(-0. 0516), tensor(0. 9354)) But, actually, we use activations not only linear function After applying activations relu at linear layer, mean and deviation became 0. 5. 2. 1. 2 Glorrot initializationPaper2: Understanding the difficulty of training deep feedforward neural networks Gaussian(, bell shaped, normal distributions) is not trained very well. How to initialize neural nets? with the size of layer , the number of filters . But there is No acount for import of ReLU If we got 1000 layers, vanishing gradients problem emerges2. 1. 3 Kaiming initializatingPaper3: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Kaiming He, explained here rectifier: rectified linear unit rectifier network: neural network with rectifier linear units This is kaiming init, and why suddenly replace one to two on a top? to avoid vanishing gradient(weights) But it doesn’t give very nice mean tough. 2. 1. 4 Pytorch package Why fan_out? according to pytorch documentation, choosing 'fan_in' preserves the magnitude of the variance of the wights in the forward pass. choosing 'fan_out' preserves the magnitues in the backward pass(, which means matmul; with transposed matrix) ➡️ in the other words, torch use fan_out cz pytorch transpose in linear transformaton. What about CNN in Pytorch?I tried torch. nn. 
Conv2d. conv2d_forward?? Jeremy digged into using torch. nn. modules. conv. _ConvNd. reset_parameters?? 2 in Pytorch, it doesn’t seem to be implemented kaiming init in right formula. so we should use our own operation. But actually, this has been discussed in Pytorch community before. 3 4 Jeremy said it enhanced variance also, so I sampled 100 times and counted better results. To make sure the shape seems sensible. check with assert. (remember we will replace 1 to 10 in cross entropy)assert model(x_valid). shape==torch. Size([x_valid. shape[0],1])>>> model(x_valid). shape(10000, 1) We have made Relu, init, linear, it seems we can forward pass code we need for basic architecture nh = 50def lin(x, w, b): return x@w + b;w1 = torch. randn(m,nh)*math. sqrt(2. /m ); b1 = torch. zeros(nh)w2 = torch. randn(nh,1); b2 = torch. zeros(1)def relu(x): return x. clamp_min(0. ) - 0. 5t1 = relu(lin(x_valid, w1, b1))def model(xb): l1 = lin(xb, w1, b1) l2 = relu(l1) l3 = lin(l2, w2, b2) return l32. 2 Loss function: MSE: Mean squared error need unit vector, so we remove unit axis. def mse(output, targ): return (output. squeeze(-1) - targ). pow(2). mean() In python, in case you remove axis, you use ‘squeeze’, or add axis use ‘unsqueeze’ torch. squeeze where code commonly broken. so, when you use squeeze, clarify dimension axis you want to removetmp = torch. tensor([1,1])tmp. squeeze()>>> tensor([1, 1]) make sure to make as float when you calculateBut why??? because it is tensor?{: style=”color:red; font-size: 130%;”} Here’s the error when I don’t transform the data type ---------------------------------------------------------------------------TypeError Traceback (most recent call last)<ipython-input-22-ae6009bef8b4> in <module>()----> 1 y_train = get_data()[1] # call data again 2 mse(preds, y_train)TypeError: 'map' object is not subscriptable This is forward passFootnote: Other materials: Understanding the difficulty of training deep feedforward neural networks, paper that introduced Xavier initialization Fixup Initialization: Residual Learning Without Normalization ↩ Pytorch implementaion on Kaiming init of conv and linear layers ↩ Pytorch kaiming init issue ↩ Pytorch kaiming init explained ↩ "
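As a quick check of the simplified kaiming init described above, here is a hedged sketch; m and nh follow the note, but the input is random stand-in data rather than the normalized MNIST x_valid:

~~~python
import math
import torch

m, nh = 784, 50
x = torch.randn(10000, m)                  # pretend-normalized input (mean≈0, std≈1)

def lin(x, w, b): return x @ w + b
def relu(x): return x.clamp_min(0.) - 0.5  # the shifted relu from the note

w1 = torch.randn(m, nh) * math.sqrt(2. / m)  # kaiming: scale by sqrt(2/fan_in)
b1 = torch.zeros(nh)
t = relu(lin(x, w1, b1))
print(t.mean(), t.std())  # mean near 0 thanks to the -0.5 shift; std much closer to 1
~~~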
}, {
- "id": 14,
+ "id": 16,
"url": "http://localhost:4000/2020/03/note08-fastai-2/",
"title": "What's inside Pytorch Operator?",
"body": "2020/03/01 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, RefactoringWhat’s inside Pytorch Operator?: Section02 Time comparison with pure Python: Matmul with broadcasting> 3194. 95 times faster Einstein summation> 16090. 91 times faster Pytorch’s operator> 49166. 67 times faster 1. Elementwise op: 1. 1 Frobenius norm: above converted into (m*m). sum(). sqrt() Plus, don’t suffer from mathmatical symbols. He also copy and paste that equations from wikipedia. and if you need latex form, download it from archive. 2. Elementwise Matmul: What is the meaning of elementwise? We do not calculate each component. But all of the component at once. Because, length of column of A and row of B are fixed. How much time we saved? So now that takes 1. 37ms. We have removed one line of code and it is a 178 times faster…#TODOI don’t know where the 5 from. but keep it. Maybe this is related with frobenius norm…?as a result, the code before for k in range(ac): c[i,j] += a[i,k] + b[k,j]the code after c[i,j] = (a[i,:] * b[:,j]). sum()To compare it (result betweet original and adjusted version) we use not test_eq but other function. The reason for this is that due to rounding errors from math operations, matrices may not be exactly the same. As a result, we want a function that will “is a equal to b within some tolerance” #exportdef near(a,b): return torch. allclose(a, b, rtol=1e-3, atol=1e-5)def test_near(a,b): test(a,b,near)test_near(t1, matmul(m1, m2))3. Broadcasting: Now, we will use the broadcasting and removec[i,j] = (a[i,:] * b[:,j]). sum() How it works?>>> a=tensor([[10,10,10], [20,20,20], [30,30,30]])>>> b=tensor([1,2,3,])>>> a,b (tensor([[10, 10, 10], [20, 20, 20], [30, 30, 30]]),tensor([1, 2, 3])) >>> a+btensor([[11, 12, 13], [21, 22, 23], [31, 32, 33]]) <Figure 2> demonstrated how array b is broadcasting(or copied but not occupy memory) to compatible with a. Refered from numpy_tutorial there is no loop, but it seems there is exactly the loop. This is not from jeremy (actually after a moment he cover it) but i wondered How to broadcast an array by columns? c=tensor([[1],[2],[3]])a+ctensor([[11, 11, 11], [22, 22, 22], [33, 33, 33]])s What is tensor. stride()?help(t. stride)Help on built-in function stride: stride(…) method of torch. Tensor instancestride(dim) -> tuple or intReturns the stride of :attr:’self’ tensor. Stride is the jump necessary to go from one element to the next one in the specified dimension :attr:’dim’. A tuple of all strides is returned when no argument is passed in. Otherwise, an integer value is returned as the stride in the particular dimension :attr:’dim’. Args: dim (int, optional): the desired dimension in which stride is requiredExample::* x = torch. tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])`x. stride()>>> (5, 1)x. stride(0)>>> 5x. stride(-1)>>> 1 unsqueeze & None index We can manipulate rank of tensor Special value ‘None’, which means please squeeze a new axis here== please broadcast herec = torch. tensor([10,20,30])c[None,:] in c, squeeze a new axis in here please. 2. 2 Matmul with broadcasting: for i in range(ar):# c[i,j] = (a[i,:]). *[:,j]. sum() #previous c[i] = (a[i]. unsqueeze(-1) * b). sum(dim=0) And Using None also (As howard teached)c[i] = (a[i ]. unsqueeze(-1) * b). sum(dim=0) #howardc[i] = (a[i][:,None] * b). sum(dim=0) # using Nonec[i] = (a[i,:,None]*b). 
sum(dim=0)⭐️Tips🌟 1) Anytime there’s a trailinng(final) colon in numpy or pytorch you can delete it ex) c[i, :] = c [i]2) any number of colon commas at the start, you can switch it with the single elipsis. ex) c[:,:,:,:,i] = c […,i] 2. 3 Broadcasting Rules: What if we tensor. size([1,3]) * tensor. size([3,1])? torch. Size([3, 3]) What is scale???? What if they are one array is times of the other array? ex) Image : 256 x 256 x 3Scale : 128 x 256 x 3Result: ? Why I did not inserted axis via None, but happened broadcasting? >>> c * c[:,None]tensor([[100. , 200. , 300. ], [200. , 400. , 600. ], [300. , 600. , 900. ]])maybe it broadcast cz following array has 3 rows as same principle, no matter what nature shape was, if we do the operation tensor broadcasts to the other. >>> c==c[None]tensor([[True, True, True]])>>> c[None]==c[None,:]tensor([[True, True, True]])>>>c[None,:]==ctensor([[True, True, True]])3. Einstein summation: Creates batch-wise, remove inner most loop, and replaced it with an elementwise producta. k. ac[i,j] += a[i,k] * b[k,j]inner most loop c[i,j] = (a[i,:] * b[:,j]). sum()elementwise product Because K is repeated so we do a dot product. And it is torch. Usage of einsum()1) transpose2) diagnalisation tracing3) batch-wise (matmul) … einstein summation notationdef matmul(a,b): return torch. einsum('ik,kj->ij', a, b)so after all, we are now 16000 times faster than Python. 4. Pytorch op: 49166. 67 times faster than pure python And we will use this matrix multiplication in Fully Connect forward, with some initialized parameters and ReLU. But before that, we need initialized parameters and ReLU, Footnote: TensorRank ti noteResources: Frobenius Norm Review Broadcasting Review (especially Rule) Refer colab! (I totally confused with extension of arrays) torch. allclose Review np. einsum Reviewh "
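The three stages above (broadcasting, einsum, plain torch op) can be compared side by side. A small sketch with toy tensors, reusing the test-within-tolerance idea from the note:

~~~python
import torch

a, b = torch.randn(64, 784), torch.randn(784, 10)   # toy shapes

def matmul_broadcast(a, b):
    c = torch.zeros(a.shape[0], b.shape[1])
    for i in range(a.shape[0]):
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)   # broadcast row i over b
    return c

def matmul_einsum(a, b):
    return torch.einsum('ik,kj->ij', a, b)           # einstein summation

def near(x, y): return torch.allclose(x, y, rtol=1e-3, atol=1e-5)

assert near(matmul_broadcast(a, b), a @ b)           # pytorch op as reference
assert near(matmul_einsum(a, b), a @ b)
~~~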
}, {
- "id": 15,
+ "id": 17,
"url": "http://localhost:4000/2020/02/note08-fastai-1/",
"title": "What is the meaning of 'deep-learning from foundations?'",
"body": "2020/02/29 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring” Lecture 08 - Deep Learning From Foundations-part2 “ I don’t know if you read this article, but I heartily appreciate Rachael Thomas and Jeremy Howard for providing these priceless lectures for free Homework: Review concepts 16 concepts from Course 1 (lessons 1 - 7)(1) Affine Functions & non-linearities; 2) Parameters & activations; 3) Random initialization & transfer learning; 4) SGD, Momentum, Adam; 5) Convolutions; Batch-norm; 6) Dropout; 7) Data augmentation; 8) Weight decay; 9) Res/dense blocks; 10) Image classification and regression; 11)Embeddings; 12) Continuous & Categorical variables; 13) Collaborative filtering; 14) Language models; 15) NLP classification; 16) Segmentation; U-net; GANS) Make sure you understand broadcasting Read section 2. 2 in Delving Deep into Rectifiers Try to replicate as much of the notebooks as you can without peeking; when you get stuck, peek at the lesson notebook, but then close it and try to do it yourself calculus for machine learning based on weight… einsum conventionCONTENTS: What is going on in this course? What is ‘from foundations’? Steps to a basic modern CNN model Today’s implementation goal: 1) matmul -> 4) FC backward Library development using jupyter notebook jupyter notebook certainly can make module Elementwise ops How can we make python faster? What is element wise operation? FootnoteWhat is going on in this course?: What is ‘from foundations’?: 1) Recreate fast. ai and Pytorch 2) using pure python Evade OverfittingOverfit : validation error getting worsetraining loss < validation loss Know the name of the symbol you usefind in this page if you don’t know the symbol that you are using or just draw it here (run by ML!) Steps to a basic modern CNN model: 1) Matrix multiplication -> 2) Relu/Initialization -> 3) Fully-connected Forward-> 4) Fully-connected Backward -> 5) Train loop -> 6) Convolution-> 7) Optimization ->8) Batchnormalization -> 9) Resnet Today’s implementation goal: 1) matmul -> 4) FC backward: Library development using jupyter notebook: what is assers? jupyter notebook certainly can make module: There will be #export tag that Howard (and we) want to extract special notebook2script. py will detect sign of #expert and convert following into python module and test ittest\_eq(TEST,'test')test\_eq(TEST,'test1') what is run_notebook. py? when you want to test your module in command line interface !python run\_notebook. py 01_matmul. ipynb Is there any difference between 1) and 2)?1) test -> test01 2) test01 -> test #TODO I don’t know yet look into run_notebook. py, package fire Jeremy used. What is that?read and run the code in a notebook, and in the process, Jeremy made Python Fire library called!shockingly, fire takes any kind of function and converts into CLI command. fire library was released by Google open source, Thursday, March 2, 2017 Get data pytorch and numpy are pretty much same. variable c explains how many pixels there are in in MNIST, 28 pixels PyTorch’s view() method: torch function that manipulating tensor, and squeeze() in torch & mathmatical operation similar function Rao & McMahan said usually this functions result in feature vector. In part 1, you can use view function several times. 
Initial python model Which is Linear, like $Xw$(weight)$+a$(bias) $= Y$ If you don’t know hou to multiple matrix, refer this site matmul visulization site How many time spends if we we use pure python function matmul, typical matrix multiplication function, takes about 1 second for calculating 1 single train data! (maybe assumed stochastic, 5 data points in validation) it takes about 11. 36 hours to update parameters even single layer and 1 iteration! (if that was my computer, it would be 14 hours. . )🤪 THIS is why we need to consider ‘time’&’space’ This is kinda slow - what if we could speed it up by 50,000 times? Let’s try! Elementwise ops: How can we make python faster?: If we want to calculate faster, then do remove pythonic calcuation, by passing its computation down to something that is written something other than python, like pytorch. According to PyTorch doc it uses C++ (via ATen), so we are going to implement that function with python. What is element wise operation?: items makes a pair, operate corresponding componentFootnote: notebooks material video broadcasting excel"
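For the #export / test_eq workflow mentioned above, here is a hedged sketch of what such test helpers can look like, modeled on the dev-notebook pattern; the exact fast.ai definitions may differ:

~~~python
import operator

def test(a, b, cmp, cname=None):
    if cname is None: cname = cmp.__name__
    assert cmp(a, b), f"{cname}:\n{a}\n{b}"     # fail loudly with both values

def test_eq(a, b): test(a, b, operator.eq, '==')

TEST = 'test'
test_eq(TEST, 'test')      # passes silently
# test_eq(TEST, 'test1')   # would raise AssertionError
~~~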
}, {
- "id": 16,
+ "id": 18,
"url": "http://localhost:4000/2020/02/what-is-convolution/",
"title": "Digging into convolution",
"body": "2020/02/28 - Issues 1) Kaiming Initializtion in Pytorch was in trouble. 1 2) Jeremy started to dig in, in lesson09, but I didn’t know why the size of tensor is 2 and even understand this spreadsheet data. 3 Homework: Read Visualizing and Understanding Convolutional Networks paper What is a convolution? Visualization one kernel Matthew D Zeiler & Rob Fergus Paper Convolution can be represented as matmul Padding Kernel has rank 3 How can we find a side-edge, a gradient and area of constant weight? What is a convolution?: A convolutional neural network is that your red, green, and blue pixels go into the simple computation, and something comes out of that, and then the result of that goes into a second layer, and the result of that goes into the third layer and so forth. Visualization: one kernel Refer this site for visualizing CNN filteringMatthew D Zeiler & Rob Fergus PaperLecture01 Nine examples of the actual coefficients from the **first layer** Convolution can be represented as matmul: CNNs from different viewpoints {align-items: center;} [A B C D E F G H I J] is 3 by 3 image data flatten to vector. As a result, convolution is a just matrix just two things happens Some of entries are set to zeros at all the times same color always have the same weight. That called weight time / wegith sharing So, we can implement a convolution with matrix multiplication. But, we don’t do that because it’s slow!Padding: What most of libraries do is just put zeros asdie of matrix fast. ai uses reflection paddings (what is this? Jeremy said he uttered it)Kernel has rank 3: As standard picture input would be 4 5, it would be actually 3d, not 2d. If we make kernel as a 3x3 size, we pass over same kernel all the different Red, Green, Blue Pixels. This could make problem, because, if we want to detect frog, which is green, we would want more activations on the green(I made a test cell in my colab 6) How can we find a side-edge, a gradient and area of constant weight?: Not top-edge! One kernel can find only the top-edge, so we should stack the kernels 7 So, we pass it through bunch of kernels to the input images, and that process gives us height x width x corresponding number of kernels. Usually that number of chanel is 16 And if we want to get the more channels and features, we should repeat that process This process gives rise to memory out of control, we do the stride #### conv-example. xlsx 2 convolutional filters At a second layer, filter is 3x3x2 tensor, because to add up together the first layer’s channel. Reference: Problem was math. sqrt(5) was not kaiming initialization formula, Implementation in Pytorch ↩ size of tensor, lecture09 ↩ conv-example. xlsx ↩ Why do computer use red, green and blue instead of primary colors ↩ Grayscale is a group of shades without any visible color. … Each of these dots has its own brightness level as well and, therefore, can be converted to grayscale. A grayscale image is one with all color information removed. ↩ Testing RGB and grayscale ↩ stack kernel and make new rank of tensor at output, Lesson06-2019 ↩ "
}, {
- "id": 17,
+ "id": 19,
"url": "http://localhost:4000/2020/02/dps-week8/",
- "title": "Digital Product School week 8&9",
- "body": "2020/02/24 - The 8th week retropect at Digital Product School Week 8/9 - Ship your MVP/Release next iteration each day This week's schedule CONTENT: Preparing engineering weekly Agile Process Daily Stand-up Making application flowchart (feat draw. io) / ER diagram Flowchart, understaning user journey ER diagram Engineering weekly AI lunch Connecting firebase andPreparing engineering weekly: This week at Wednesday, I planned to explain the Language Modelings, mainly focusing ELMo, ULMFiT, BERT and GPT-2. Slides is available here Changed the presentation, because there were people who are not in ML domain. hereWhenever I do the presentation, I learn more than the information I give them. At the same time, I realize I need to learn more than I know. Agile Process: One of a priceless lesson I learnt from digital product school, was experience of doing agile work. Before I came here, it was a little bit vague concept. I’m not sure ‘what is agile’ but this is what we tried to make agile process. Daily Stand-up: Sharing the works everyday helps interdisciplinary team to work better. Since product started to get higher fidelity, the gap between engineer and non-engineer increased. Actually I didn’t planned to explain concept because I thougth I would be lose my audience when I start to explain. But as daily stand-up, which shares our progess, goes day by day, I planed and reported the issues. And it made each other’s topic feel more familiar. I think point is very important, because at that point people start to be curious. So we can actively ask to the others, and that momwnr, we can explain the point teammate dosen’t know. Each color means every different section. Red: Our team goal, Blue: Interaction designer, Green: Product manager, Yellow: Software/AI engineer This week engineer's main plan Each of us try to explain what we are doing, but things become easier when we are asked. Because we explained something was important to us before, but if we asked it is something important for the others. Making application flowchart (feat draw. io) / ER diagram: Before we start the party, we should clarify the flowchart and ER diagram of our application. Flowchart, understaning user journey: Thanks for google, we could use draw. io for our framechart framework. Actually, we cana choice other good flatform, but draw. io has connected app throgh google drive, most of our engineer was used to it. And after this job, I got to know there is also (of course) rule with the symbols, color, size, space, scaling and direction of arrow -reference. But why we should do this? WE have made our storymap before!! I think storymap is for visualize our status and app. So it should be shared with whole the team, and they should able to understand each role’s issue. But flowchart is more like testing technical feasibility, and error that user can experience. So it could be little more specific, complicated, and hypothetical. This week engineer's main plan ER diagram: Even if we use NoSQL database through firebase, my team was accustomed to SQL more. That what we educated when we were at college, so we had to organize our concept while we were learning NoSQL. Engineering weekly: Every engineering weekly we exchange our knowledge each other so that we can grow together. Before today, my AI collegues presented regression, knn and it was my turn. I prepared slide that explain about pre-trained language model, but my header advised me if I go deep of theoretical things, I would lose my audience. 
So I decided to brief BERT mode, how I can contribute to other team’s project. Since BERT was breakthrough of NLP industry, I tried to explain how it can be applied to hands on product and how it can help people in their product. The result was quite motivative to me. They gave feedback that since it wasn’t that much theoretical, they could enjoy it, and useful information. Someone asked me do I had learned of presentation before. I was really happy with their feedback! AI lunch: Connecting firebase and: "
+ "title": "My life in Digital Product School - week 8/19/10",
+ "body": "2020/02/24 - The 8/9/10th week retropect at Digital Product School Week 8 - Ship your MVPWeek 9/10 - Release next iteration each day Week 8th schedule CONTENT: Agile Product Development Daily Stand-up(planning) Gemba Walk Sprint Reviews Engineering weeklyAgile Product Development: One of a priceless lesson I learnt from digital product school, was experience of doing agile work. Before I came here, it was a little bit vague concept. I’m still not sure ‘what is agile’ but this is how we tried to make agile process. Daily Stand-up(planning): Sharing the works everyday helps interdisciplinary team to work better. Since product started to get higher fidelity, the gap between engineer and non-engineer increased. Actually I didn’t planned to explain concept because I thougth I would be lose my audience when I start to explain. But as daily stand-up, which shares our progess, goes day by day, I planed and reported the issues. And it made each other’s topic feel more familiar. I think point is very important, because at that point people start to be curious. So we can actively ask to the others, and that momwnr, we can explain the point teammate dosen’t know. Each color means every different section. Red: Our team goal, Blue: Interaction designer, Green: Product manager, Yellow: Software/AI engineer This week engineer's main plan Each of us try to explain what we are doing, but things become easier when we are asked. Because we explained something was important to us before, but if we asked it is something important for the others. Gemba Walk: Team Cero with core team Every 2 weeks, we do the Gemba work, which is ‘question everything to the core team’ time. At this period, people can ask anything related to our product, workshop, and framework. Core team will help just for each team, and each team can solve the problem related to their work. < br/>Why we need this session? because with workshop and general schedule, core team has no time just focus on each team. So through this session, we can have opportunity to understand each program and workshop, like why we are using this platform, and when is the due of our small project, and we have this problem and we need help for this. whatever small problem you have, core team is always willing to help you. Sprint Reviews: Every Friday, we have time to summarise what we did for the week. Maybe we need HMW question and our storymap to share our process and then tell and share what we did try, what point we succeeded and what point it was deviant of our prediction, and why we tried it. . Sprint of Ve-link And then, just after all team’s ppt, we do vote with such a cute marvel. Always it’s very difficult to vote (of course you can’t vote to your team!) Because it depends on criteria what do I value!But since this is process of our agile work, I try to focus on what they have changed since last week, and why they did it, how they did it. Engineering weekly: Every engineering weekly we exchange our knowledge each other so that we can grow together. Everyone have their knowledge to share and we can be tutor and at the same time can be of tutee. Previously, my AI collegues presented regression, knn. And because I’m somewhat specialized to NLP, I prepared slide that explain about pre-trained language model, but my header advised me if I go deep of theoretical things, I would lose my audience. So I decided to brief BERT mode, how I can contribute to other team’s project. 
Since BERT was breakthrough of NLP industry, I tried to explain how it can be applied to hands on product and how it can help people in their product. The result was quite motivative to me. They gave feedback that since it wasn’t that much theoretical, they could enjoy it, and useful information. Someone asked me do I had learned of presentation before. I was really happy with their feedback! "
}, {
- "id": 18,
+ "id": 20,
"url": "http://localhost:4000/2020/02/fast.ai-nlp-note-16/",
"title": "Algorithmic bias",
"body": "2020/02/20 - Algorithms can encode & magnify human bias Case Study 1: Facial Recognition & Predictive Policing: Joy Buolamwini & Timnit Gebru, gendershades. org Microsoft, FACE+, IBM - All of these things are sell now. Largest gap between $\therefore\ Lighter Male\ >\ Darker\ Female $ This US mayor joked cops should “mount . 50-caliber” guns where AI predicts crime With machine learning, with automation, there’s a 99% success, so that robot is ㅡwill beㅡ99% accurate in telling us what is going to happen next, which is really interesting. - city official in Lancater, CA, approving on using IBM for public security Bias: Bias is type of error Statistical Bias: difference between a statistic’s expected value and the true value Unjust Bias: disproportionate preference for or prejudice against a group Unconscious bias: bias that we don’t realize we have But, term bias is too generic to be productive. Different sources of bias have different causes Representation Bias: Dataset was not representative of the algorithm that might be used on later. Above : Data is okay, but algorithm has some problem. Below : Data has error. For example, object detection production that performs very well in common product of US. But in contrast, change of target product region, like Zimbabwe, Solomon Island, and so on, reduced the performence remarkably. It is not the algorithmic problem, so we should care about data volume of region. Evaluation Bias: Benchmark datasets spur on research, 4. 4% of IJB-A images are dark-skinned women. 2/3 of ImageNet images from the West (Sharkar et al, 2017) Case Study 2: Recidivism Algorithm Used Prison Sentencing: Case Study 3: Online Ad Delivery: Bias in NLP: ( Nothing to do with the course, but I’m researching this field these days. ) But all about Englsih ImpactThe person is doctor. The person is nurse -> 그는 의사다. 그녀는 간호사다. Concept of “biased data” often too generic to be useful: Different sources of bias have different sources Data, models and systems are not unchanging numbers on a screen. They’re the result of a complex process that starts with years of historical context and involves a series of choices and norms, from data measurement to model evaluation to human interpretation. - Harini Suresh, “The problem with Biased Data” Five Sources of Bias in ML: Representation Bias Evaluation Bias Measurement Bias Aggregation Bias(46:02) Historical Bias(46:26) A few studies(47:13) Racial Bias, Even when we have good intentions(new york times)(47:10) gender(48:59) Humans are biased, so why does algorithmic bias matter?: Algorithms & humans are used differently (humans are usually decision maker) Algorithms are accurate and objective No way to apeal if there if error processed large scale cheap Machine learning can amplify bias Machine learning can create feedback loops. Technology is power. And with that comes responsibility. Solutions: Analyze a project at work/school: Questions about AI 5 types of bias (Suresh & Guttag) Datasheets for datasets, Modelcards for model reporting Accuracy rate on different sub-groups Work with domain experts & those impacted Increase diversity in our workspace Advocate for good policy Be on the ongoing lookout for bias"
}, {
- "id": 19,
+ "id": 21,
"url": "http://localhost:4000/2020/02/classifier-city/",
"title": "Making a classifier with image dataset made from gooogle",
"body": "2020/02/15 - CONTENTS: Creating dataset from google images Using google_images_download Create ImageDataBunch Train model fit_one_cycle() Let’s find-tune Let’s train the whole model! Let’s make batch size bigger! Interpretation Model in productionCode can be found hereDeployed model here Making a classifier which can distinguish Seoul from Munich and Sanfrancisco!(hoping my well in Munich!) Creating dataset from google images: In machine learning, you always need data before you build your model. You can use either URLs or google_images_download package. Since Jeremy explained specifically, I will try the other. Using google_images_download: note: This is not google official package Refer to Official Doncument, put that arguments. from google_images_download import google_images_downloadresponse = google_images_download. googleimagesdownload() #class instantiationout_dir = os. path. abspath('. . /. . /materials/dataset/pkg/')os. mkdir(out_dir)arguments = { keywords : Cebu,Munich,Seoul , print_urls :True, suffix_keywords : city , output_directory :out_dir, type : photo , }paths = response. download(arguments) #passing the arguments to the functionprint(paths)and if you need, here is main code. Create ImageDataBunch: We need to separate validation set because we just grabbed these imagese from Google. Most of the dataset we use (kaggle/research) splited into train / validation / test so if they are not devided beforehand we should make databunch, and Jeremy recommended assign 20% to validation. Help on function verify_images in module fastai. vision. data:verify_images(path: Union[pathlib. Path, str], delete: bool = True, max_workers: int = 4, max_size: int = None, recurse: bool = False, dest: Union[pathlib. Path, str] = '. ', n_channels: int = 3, interp=2, ext: str = None, img_format: str = None, resume: bool = None, **kwargs) Check if the images in `path` aren't broken, maybe resize them and copy it in `dest`. Data from google image url Data from package Train model: len(class) len(train) len(valid) Data_url 3 432 108 Data_pkg 3 216 53 Uisng model: restnet34 1, Measurement: accuracy 2 fit_one_cycle(): What is fit one cycle? Cyclical Learning Rates for Training Neural Networks One of the way to find good learning rate. Core idea is to start with small learning rate (like 1e-4, 1e-3) and increase the learning rate after each mini-batch till loss starts exploding. And pick up learning rate one order lower than exploding point. For example, plotted learning rate is like below picture, picking up around 1e-2 is the best way. Why this methods Traditionally, the learning rate is decreased as the learning starts converging with time. But this paper suggests to cycle our learning rate, because it makes us avoid local minimum. Basically this cyclic method enables us to explore whole of loss function so that find out global minimum. In other words, higher learning rate behaves like regularisation. Let’s find-tune: Do train just one last layer by learning rate found by find_lr This section you should find the strongest downward slope that kind of sticking around for quite a while. And choose just one order lower than lowest point. As explained before, I will pick up 1e-2. And of course, this is fine-tuning, we don’t need discriminative learning rate yet. Let’s train the whole model!: link When you plot the learning rate again, maybe you will get soaring shape of learning rate. Rule of thumb, When you slice the learning rate, use learning rate you used at unfrozen part. 
Divide it by 5 or 10 and put it on maximum bound. At minimum bound, get the point just before it soared, and divide it by 10. Let’s make batch size bigger!: Since default batch size is 64, I tried it to 128. And it gets way more better result(even it’s still underfitting!) And if I freeze model and train whole model again, the model would be better. Also, you can use this method to the other big dataset model training! Interpretation: See the confusion matrix. Result is quite great. *Since I’m using colab, I will skip data cleansing. But I highly recommend you to use ImageCleaner widget, only if you are using jupyter notebook (not jupyter lab) Model in production: You can deploy your model in simple way. I referred fast. ai, and used render(it’s free for limited time). You can find detailed document here. and you can create a route like this. @app. route( /classify-url , methods=[ GET ])async def classify_url(request): bytes = await get_bytes(request. query_params[ url ]) img = open_image(BytesIO(bytes)) _,_,losses = learner. predict(img) return JSONResponse({ predictions : sorted( zip(cat_learner. data. classes, map(float, losses)), key=lambda p: p[1], reverse=True ) })You can find my deployed model here Reference: How to create a deep learning dataset using Google Images towardsdatascience - one cycle policy Deep Residual Learning for Image Recognition ↩ Accuracy_and_precision ↩ "
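Pulling the steps above together, here is a hedged sketch of the whole recipe in the fastai v1 API used by the course; the folder path is a hypothetical placeholder, and the learning rates are the kind of values the note suggests, not tuned results:

~~~python
from fastai.vision import *

path = Path('data/cities')        # hypothetical: one subfolder per class
data = (ImageDataBunch.from_folder(path, valid_pct=0.2, size=224,
                                   ds_tfms=get_transforms())
        .normalize(imagenet_stats))
learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(4)                            # train the new head first
learn.unfreeze()                                  # then fine-tune everything
learn.lr_find(); learn.recorder.plot()            # pick LR before the loss soars
learn.fit_one_cycle(2, max_lr=slice(1e-5, 1e-3))  # discriminative learning rates
~~~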
}, {
- "id": 20,
+ "id": 22,
"url": "http://localhost:4000/2020/02/dps-week5/",
"title": "Digital Product School week 5",
"body": "2020/02/09 - The 5th week retropect at Digital Product School Week 5 - Create a Storymap and sync it with Lean Canvas This week's schedule CONTENT: How to create our story map Prepare your story Discover your product’s AI potentialMondayHow to create our story map: We need this 'aha' moment There was a Milestone workshop, about our weekly goal. As we are agile working, we go fast and change every week’s goal. This week we will finalize our story map based on user’s pain-point and HMW questions. How should we make our story-map Basically we should make story map based on this rule Tell stories, don’t just write them! We always need context, that means all the story component should be connected Visualize your product to establish a shared understanding and speed up discussions! Post-it filled of text is not enough, we should fill it with visualizations then team mates can understand it fast Only discuss in front our your story map! (Speed) So we can update our story-map as soon as we change our opinion And also Use a story map to find the parts that matter most and to identify holes in your idea! Since the story map consists of techinical part, we should consider each story’s technical feasibility Minimise output, maximise outcome and impact! Build tests to figure out what’s minimum and what’s viable! This story map functions to find out our minimum value of ideas Work iteratively: Change your story map according to your learnings! We should repeat this process again and again PMs: Make sure Storymap is up to date!Prepare your story: team cero, our whole story map Our goal Technical feasibility of our storyWhat is your strategy to make user achieve something? This would be our expand point Discover your product’s AI potential: How can we apply AI to our product? Let’s write down our ‘HMW’ questions, and find out all p ossibilities. These are suggestion of possibilities, so don’t attached to feasibility (we will do in at lean start-up) Software section's expectation AI section's expectationTuesday Engineer's task, week5This 5th week, engineers settled WendesdayThursdayFriday"
}, {
- "id": 21,
+ "id": 23,
"url": "http://localhost:4000/2020/02/GPU-time/",
"title": "4 reasons took much time to setting GPU for fast.ai than I expected",
"body": "2020/02/05 - Motivation: Before now, me as a undergraduate student, I was parsimony who usually depend on colab, kaggle, friend’s server(occasional) whenever i need GPU. . And this time it’s been for a while to install GPU than I expected and I share the several component that stood in my way. Written at Oct 24 2019, if you think this is deprecated, please do not have a leap of faith. Just for the record, I’ve used Kaggle, Colab, GCP, Azure, EC2 as GPU cloud. 1. Did not know there is JupyterLab option in Google Cloud Platform. : At the first time when GCP came out, there was no AI Platform service. So from starting vm instance to launching jupyter and installing packages, I did all of the things myself. (and I learned 🤗) $ curl -O https://repo. continuum. io/archive/Anaconda3-5. 0. 1-Linux-x86_64. sh[Downloading conda in ssh] I created VM instance,selected zone, machine type and disk type. Then, define firewall rules and in ssh terminal, install jupyter and other packages. But you can do all of these things just using AI Platform. [AI Platform] I think it especially save your time if you are living in Asia-Pacific, which google doesn’t support not that much GPU resources. 2. Consider if the platform has limited resources in a region you live in. : I live in South Korea, East Asia, and it seems like this region has lots of limitation in GPU (except quite expensive AWS) And the Taiwan which was the only one region where I can launch my own VM with GPU (I tried all the other regions in the list) sometimes do normaly, but not always. 😥After launching, I did several works and next day I could not start VM. (I didn’t count it, but tried it a few hours because I didn’t want cost any more time…) Endlessly failed to start instance, then I choose to move AWS as an alternative way. 3. Fast. ai gives deliberate guide and I didn’t know it. : Fast. ai offer the guide for all available platform. (Colab, salamander, Gradient, Kaggle, Colab, and so on) It is so important, and really needs, because cloud computing options are vary as occasion and purpose arise. I didn’t know fast. ai has manual to running GCP, and I think it’s as good a reason as any for me to be have taken time. It helped me so much when I had aws and shortened my time. I don’t want to read all of the manual in amazno. . (It is recommended. . but I’d rather read GIT PRO now…) ssh -i ~/. ssh/<your_private_key_pair> -L localhost:8888:localhost:8888 ubuntu@<your instance IP>4. You should wait to add more volume just after add volume, by building AWS EC2. : Since Elastic Block Store(EBS) storage supports optimized storage, users can’t extend storage volume two times in a row. Unfortunately, at the first time, I didn’t know it (again 👻) and when VM lacked volume, I doubled dist capacity (76*2) at a rough but It needs more. <!– this time I installed GPU in two years, and it became little complicated compared to 2 years ago. And this time for the first time(maybe not the first time. . but i handled it in my class or with my friend. but it’s my first time on my own. ) I very I’m started to using used google colab, kaggleand, GCP-JupyterLab, ec2 - friend made, aws vm machine but I had a environment variable but i did not know of it. On these days, I could not get a resources from taiwan… I couldn’t notice a deliberate Anyway, as a result I tried myself gcp myself and aws ec2 with fast. 
ai But I think doing on my self surely takes much time (in this point I wonder why I’m doing this, and should remind me, especially I was studying disk volume optimization) disk volume exceed - https://askubuntu. com/questions/919748/no-space-left-on-device-even-though-there-is: "
}, {
- "id": 22,
+ "id": 24,
"url": "http://localhost:4000/2020/02/dps-week4/",
"title": "Digital Product School week 4",
"body": "2020/02/01 - The 4th week retropect at Digital Product School Week 4 - Find solution ideas and run experiments [This week’s schedule] CONTENT: Ideation Techniques What is ideation techniques? Generating idea in my team AIdeation Team brain storming of idea Die Produkt MacherMondayIdeation Techniques: [slides from @steffen] What is ideation techniques?: We tried to find out user’s painpoint last week. Tried to users talk about their, pain point. No question directly, but extract from them their pain with transportation. Generating idea in my team: AIdeation: TuesdayTeam brain storming of idea: Based on generated idea on Monday, we extended our idea doing rolling-paper! Die Produkt Macher: What is lean start-up? Lean startup is a methodology for developing businesses and products that aims to shorten product development cycles and rapidly discover if a proposed business model is viable; this is achieved by adopting a combination of business-hypothesis-driven experimentation, iterative product releases, and validated learning. - wikipedia WendesdayThursdayFriday"
}, {
- "id": 23,
+ "id": 25,
"url": "http://localhost:4000/2020/01/retrosprect-of-acl-paper-2020/",
"title": "Retrospect of ACL 2020 paper writing",
"body": "2020/01/29 - 2020 Annual Conference of the Association for Computational Linguistics Why I can’t use ‘Cebuano’ for the research?: Why I had to change target language from ‘Cebuano’ to ‘Tagalog’?-> No language translator options except google translation. But before knowing that I already consult my friend, whose mother tongue is English. So I had to aplogize her, but couldn’t tell her why suddenly I changed my plan. -> I realized there are many languages even can’t be researched at all. . -> Getting accustomed to discrimination makes misunderstanding, sometimes. At my country, we couldn’t use music streaming service, because of legal problem. But at that moment, I thought it was discrimination, which is done by music company. "
}, {
- "id": 24,
+ "id": 26,
"url": "http://localhost:4000/2020/01/Git-Merge/",
"title": "Why am I not listed as a contributor?!",
"body": "2020/01/10 - From the end of last year, big changes have witnessed in NLP research. Embracing an unprecedented growth, I started to study new exciting results and advances. In doing so, I noticed I’m not listed as contributor of repo which my PR accessed. How did I come to a repository?: When I’m stuck, I would prefer to code, than to go deep in theory. (It must be so. . too much to understand 🤒)It was BERT released by Google AI I felt keenly the necessity of implementing, because not only couldn’t understand the way they figured out positional encoding formula, but how it actually works. What does it mean to “scale” dot product in Attention? (Now I know it’s far from my section 😂) Figure 1. Scaled Dot Product. Adopted from tensorflow blogWhat was the code error?: For implement code in paper, I read the papers Transformer and BERT, structured the model, and refered the others’ code. Meanwhile, I found out a small error in tokenization process, which was changing a token into [MASK], enabled bidirectional representation. I’ve made PR, and got merged. But I was not in contributors. Why?: Figure 2. Merged Pull request Adopted from graykode projectActually I happened to know there can be couple of reasons github doesn’t include my name as contributor. Well, if contributors tab has more than 100 people, in which case it shows you up only if you are in the top 100 contributors because displaying too many contributors can make webpages down. Somethimes, however, it doesn’t that problem. Why not? Two possibilities are there. First, According to Joel-Glovier, if repository maintainer merged-as-a-rebase PR will end up showing as maintainer’s commit. But maintainer shouldn’t normally do this. Second, if you happend to commit using a different git email that what is in your GitHub profile, it will not be attached to your Github user, and “doesn’t show up” as you. Reference: Michał Chromiak’s blog Github: why are my contributions are not showing on my profile atlassian-gitfetch"
}, {
- "id": 25,
- "url": "http://localhost:4000/2019/12/lesson1-fastai/",
- "title": "Fine Grained Classification",
- "body": "2019/12/31 - Finally you can solve the mystery behind this weird drawing. . through this course. juptyer notebook magic: %reload_ext autoreload%autoreload 2%matplotlib inlinethis is special directives to jupyter notebook, not python code. And it is called ‘magics’ (but i think jeremy is magicion) If somebody changes underlying library code while I’m running this, please reload it automatically If somebody asks to plot something, then please plot it here in this Jupyter NotebookDon’t hesitate to import start~ Digging into untar_data, path. ls: Union[pathlib. Path, str]: typed programming language? -> maybe i think disclaim the type beforehand for sure. Q. like assert? path. ls()this is some module that fast. ai made because os. listdir(‘path’) is unconvinient. Python3 pathlib library!: pathlib "
- }, {
- "id": 26,
+ "id": 27,
"url": "http://localhost:4000/2019/12/jeremy-howard/",
"title": "Jeremy Howard",
"body": "2019/12/15 - This is journey to find out ‘who am I trying to be?’: How he impacted me? The person who made me start Computer Vision again. He emphasized the importance of studying NLP and Computer together to understand the deep-learning. He didn’t order it to study, but always he pursuade me with reasonable way. “It’s not just something I can throw away. NLP and computer vision a few weeks apart and that’s going to force your brain to realize like ‘oh I have to remember this’” He made me admit my failure in deep-learning. I started to objectify where am I. What should I do when I’m frustrated. “Keep going. You’re not expected to remember everything. Yet. You’re not expected to understand everything. Yet. You’re not expected to know why everything works. Yet. ” His articles are numerous, below. What is torch. nn Really? High Performance Numeric Programming with Swift: Explorations and Reflections C++11, random distributions, and Swift And especially, I like this book. Designing great data products Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing. Designing great data products And he is also famous for words. Here are some. we’re going to try and use that to really understand what’s going on. So to warn you, none of it is rocket science but a lot of its going to look really new. So don’t expect to get it the first time but expect to listen and jump into the notebook try a few things test things out look particularly at like tensor shapes and inputs and outputs to check your understanding then go back and listen again. But and kind of try it, a few times, because you will get there right, it’s just that there’s going to be a lot of new concepts because we haven’t done that much stuff in pure Pytorch. Lesson 6: Deep Learning 2019 "
}, {
- "id": 27,
+ "id": 28,
"url": "http://localhost:4000/2019/11/julia-evans/",
"title": "Julia Evans",
"body": "2019/11/20 - This is journey to find out ‘who am I trying to be?’: The women who surprised me in many ways. First, she approached me to teaching some concepts drawing cartoons. It was at Hackers news, which was hightest ranks. Personally I have the use of not to reading title, so and cartoon was so cute and clear. I naturally gonna understood mechanism and astonished by her explaination ability. Her value, which she was taught by many people so want to do same things, moved me. Volume of her knowledge, that just reading post title is a deal of work, amazed me. "
}, {
- "id": 28,
+ "id": 29,
"url": "http://localhost:4000/2019/11/coc-retropective/",
"title": "Retrospective on Pycon 2019 Korea (CoC Committee)",
"body": "2019/11/05 - When I was volunteer, it seems like busy and hectic to managing that crowded conference. In my experience, to get things moving, it needs hierarchy. But it didn’t. Organizers emphasized our responsibility, and if I passed each other’s burden, It could be my burden next time. In solidarity of the obligation, we finished conference well. And after participating PyCon Korea 2018 as volunteer, I’ve joined PyCon Korea Organizer last year. <Figure 1> First meeting of PyCon 2019 Korea Organizers It’s been a while since PyCon 2019 finished. It’s held on Aug 15 - 18, at Coex Grand Balloom <Figure 2> Ongoing session, speaking on news comment processing <Figure 3> Sponsor Booth iin Coex Hall <Figure 4> After PyCon 2019, with all of volunteer, organizer, speakers 😍 🥰 Serving as part of the coc TF, I spent large fraction of last year doing CoC job. here’s the path what we’ve been grappled with to grasp a solution. First half: Before the conference Toward Diverse Community: Formally we’ve been reusing and modifying PyCon US CoC, but we needed fit in Korean and I was part of that to revise code of conduct. Except ‘That’ Diversity, Because it is ‘Harassment’: Specific point was harassment, and the others were not. process of finding the points. How can we settle this point?Second half: During the conference Handling the potential Harassment: Disjunction of policy and real-time situation: This ‘PyCon 2019 Korea retrospective series’ would be devided into 3 Episodes. “Retrospective on Pycon 2019 Korea (CoC Committee)” “Retrospective on Pycon 2019 Korea (Program Chair)” (20 Nov, To Be Update) “Maintaining participation while still making timely decisions” (29 Nov, To Be Update)"
}, {
- "id": 29,
+ "id": 30,
"url": "http://localhost:4000/2019/11/elif-shafak/",
"title": "Elif Shafak",
"body": "2019/11/05 - This is journey to find out ‘who am I trying to be?’: For creative-minded people, Istanbul is a treasure. ’ Photo © Chris Boland, licensed under CC BY-NC-ND 2. 0 it suddenly felt like what I was trying to convey was more complicated and detailed than what the circumstances allowed me to say. And I did what I usually do in similar situations: I stammered, I shut down, and I stopped talking. I stopped talking because the truth was complicated, even though I knew, deep within, that one should never, ever remain silent for fear of complexity. <Figure 1> Elif Shafak Photo credit: www. elifsafak. com. tr I want to talk about emotions and the need to boost our emotional intelligence. I think it’s a pity that mainstream political theory pays very little attention to emotions. Oftentimes, analysts and experts are so busy with data and metrics that they seem to forget those things in life that are difficult to measure and perhaps impossible to cluster under statistical models. But I think this is a mistake, for two main reasons. We are emotional beings. I think it’s going to be one of our biggest intellectual challenges, because our political systems are replete with emotions. In country after country, we have seen illiberal politicians exploiting these emotions. And yet within the academia and among the intelligentsia, we are yet to take emotions seriously. I think we should. 1 2 Reference: British Council Worldwide ↩ Ted Talk ↩ "
}, {
- "id": 30,
+ "id": 31,
"url": "http://localhost:4000/2019/01/dps-week1/",
"title": "Digital Product School week 1",
"body": "2019/01/11 - The 1th week retropect at Digital Product School [This week’s schedule] CONTENT: Welcome to Digital Product School! Trip to Spitzingsee Welcome to Design Office Specifying our goal of product Welcome to Digital Product School!: Trip to Spitzingsee: At the first day of Digital Product School, we had a off-site with all of batch 9 people. All the costs were managed by dps. At the beautiful mountain, we settled the team, and got my team goal. Basically, there are two kind of team in DPS. (1) Wild team - the team has fixed topic(2) Company team - the team which has specific stakeholders, and also topic defined by that stakeholders The Core-team will fix what team you will join in DPS for 3 months based on ymy professionals, they announce it at off-site. [My team for 3 months at DPS] And we decide on my batch #9 theme song. How? Each team draw for songs and pitch ‘why this song should be batch #9 theme song’The result? Imagine dragon - Believer (I didn’t know at the moment, this song would be stamped in my memory) We have a workshop for getting to know each other. For example, we share 1) what do I expect from 3 months of dps, 2) when I feel happy in my life time, 3) what I worked for last week, 4) what was my last project and 5) what plays important role in my life My team's board Cero Welcome to Design Office: At first day of design office, we had workshop, which celebrates my day in dps also discuss specific rule, menifesto and stakeholders We get sticker and attach it in map depends on my nationality Now time to get to know my team’s stakeholders. What they want for us? What they expect from us? How free my team are on the topic?To be honest, it is endless tug-of-war. We should discuss with my stakeholders, endlessly, and find out solution which can meet interest of users, stakeholders and my team. Basically, my team’s main stakeholder is ADAC, but BMW, City of munich and Nokia will also participate as my team’s stakeholders. Specifying our goal of product: "
diff --git a/_site/2020/02/dps-week5/index.html b/_site/2020/02/dps-week5/index.html
index 52b62a8739..fb805db71d 100644
--- a/_site/2020/02/dps-week5/index.html
+++ b/_site/2020/02/dps-week5/index.html
@@ -19,9 +19,9 @@
-
+
+{"description":"The 5th week retropect at Digital Product School","author":{"@type":"Person","name":"dionne"},"@type":"BlogPosting","url":"http://localhost:4000/2020/02/dps-week5/","publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"http://localhost:4000/assets/images/logo.png"},"name":"dionne"},"image":"http://localhost:4000/assets/images/week5/user-storymap.png","headline":"Digital Product School week 5","dateModified":"2020-02-09T00:00:00+09:00","datePublished":"2020-02-09T00:00:00+09:00","mainEntityOfPage":{"@type":"WebPage","@id":"http://localhost:4000/2020/02/dps-week5/"},"@context":"http://schema.org"}
@@ -161,96 +161,101 @@
"body": " {% if page. url == / %} {% assign latest_post = site. posts[0] %} <div class= topfirstimage style= background-image: url({% if latest_post. image contains :// %}{{ latest_post. image }}{% else %} {{site. baseurl}}/{{ latest_post. image}}{% endif %}); height: 200px; background-size: cover; background-repeat: no-repeat; ></div> {{ latest_post. title }} : {{ latest_post. excerpt | strip_html | strip_newlines | truncate: 136 }} In {% for category in latest_post. categories %} {{ category }}, {% endfor %} {{ latest_post. date | date: '%b %d, %Y' }} {%- assign second_post = site. posts[1] -%} {% if second_post. image %} <img class= w-100 src= {% if second_post. image contains :// %}{{ second_post. image }}{% else %}{{ second_post. image | absolute_url }}{% endif %} alt= {{ second_post. title }} > {% endif %} {{ second_post. title }} : In {% for category in second_post. categories %} {{ category }}, {% endfor %} {{ second_post. date | date: '%b %d, %Y' }} {%- assign third_post = site. posts[2] -%} {% if third_post. image %} <img class= w-100 src= {% if third_post. image contains :// %}{{ third_post. image }}{% else %}{{site. baseurl}}/{{ third_post. image }}{% endif %} alt= {{ third_post. title }} > {% endif %} {{ third_post. title }} : In {% for category in third_post. categories %} {{ category }}, {% endfor %} {{ third_post. date | date: '%b %d, %Y' }} {%- assign fourth_post = site. posts[3] -%} {% if fourth_post. image %} <img class= w-100 src= {% if fourth_post. image contains :// %}{{ fourth_post. image }}{% else %}{{site. baseurl}}/{{ fourth_post. image }}{% endif %} alt= {{ fourth_post. title }} > {% endif %} {{ fourth_post. title }} : In {% for category in fourth_post. categories %} {{ category }}, {% endfor %} {{ fourth_post. date | date: '%b %d, %Y' }} {% for post in site. posts %} {% if post. tags contains sticky %} {{post. title}} {{ post. excerpt | strip_html | strip_newlines | truncate: 136 }} Read More {% endif %}{% endfor %} {% endif %} All Stories: {% for post in paginator. posts %} {% include main-loop-card. html %} {% endfor %} {% if paginator. total_pages > 1 %} {% if paginator. previous_page %} « Prev {% else %} « {% endif %} {% for page in (1. . paginator. total_pages) %} {% if page == paginator. page %} {{ page }} {% elsif page == 1 %} {{ page }} {% else %} {{ page }} {% endif %} {% endfor %} {% if paginator. next_page %} Next » {% else %} » {% endif %} {% endif %} {% include sidebar-featured. html %} "
}, {
"id": 12,
+ "url": "http://localhost:4000/2020/04/v3-2019-lesson06-note/",
+ "title": "fastai 2019 course-v3 Part1, lesson06",
+ "body": "2020/04/15 - Lesson 06Rossmann(Tabular): Tabular data: be careful on Categorical variable vs Continuous variable. if datatype is int, fastai think it is classification, not a regression. Root mean square percentage error. as loss function. When you assign the y_range, it’s better to assign little bit more than actual maximum. > because it’s sigmoid. intermediate layers, which is weight matrix is 1) 1000, and 2) 500 -> which means our parameter would be 500*1000. learn. modelWhat is dropout and embedding dropout?: Nitish Srivastava, Dropout: A Simple way to prevent Neural Networks from Overfitting you can dropout with p value, make it specified to specific layer, or make it applied to all the layers. Pytorch code 1) bernoulli, which decides whether you will hold it? 2) and divide the noise value depends on noise value. so noise became 2 or remain 0. According to pytorch code, We do change at training time, but we do nothing at test time. and this means you don’t have to do anything special with inference time. ’ TODO: find at forums what is inference time - Related to NVIDIA, GPU. Embedding dropout is just a dropout. It’s different between continuous variable and embedding layer. TODO Still can’t understand. why embedding dropout is effective. or,… in need. Let’s delete at random, some of the results of the embedding. and It worked well especially at Kaggle Batch Normalization: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift -> came out false! According to How Does Batch Normalization Help Optimization? The key was multiplicative bias {\gamma} and additive bias {\beta}` Explain Let $$ \hat{y} = f(w_1, w_2, w_3, … , x)} $$ , loss = MSE , Then y_range should be between 1 and 5` And Activation function ends with -1 -> +1 To mitigate this problem, we can add the other parameter, like $$w_n$$ But there’re so much interactions in the process so just re-scale the output. Momentum parameter at BatchNorm1d: Different from momentum like in optimization. This momentum is Exponentially weighted moving average of the mean, instead of deviation. If this is small number: mean standard deviation would be less from mini_batch to mini_batch » less regularization effect. (If this is large number, variation would be greater from mini_batch to mini_batch » more regularization effect) TODO: can’t sure, but i understand, this is not about how to update parameter but about how much reflect previous value when scale and shift Q. Preference between batchnorm and the other regularizations(drop out, weight decay)A. Nope, always try and see the results## lesson6-pets-more### Data Augmentation- Last reg- `get_transforms` has lots of params (even not yet learned all) -> check documentation - Remember you can implement all the doc contents bc it's made from nbdev - TODO: try this!!- Essence of data augmentation is you should maintain the label, while somewhat making sense. - ex) tilt, because it's optically sensible, you can always change the angle of the data view. - zeros, border, and reflection but always `reflection` works most of the time, so that is the default### Convolutional Kernel(What is convolution?)- Will make heat\_map from scratch, which means the parts convolution focuses on![setosa_visualization]()- http://setosa. io/ev/image-kernels/ - javascript thing - How convolution works - Kernel. which does element-wise multiplication, and sum them up - so it has on pixel less at borders -> so it uses padding, and fastai uses reflection as said. 
- why does this kernel (matrix) help catch a horizontal edge? - because this kernel`(picture2)` weights differently depending on the `x axis` - why does it feel familiar? because it's a similar intuition to the fugus`(paper)` paper - CNN from different viewpoints`link` - each output pixel is the result of a different linear equation. - If you connect this with the node view of a neural network, you can see that specific input nodes are connected with specific output nodes. - **Summarize**: a cnn does a matmul where 1) some of the elements are always zero and 2) the same weight appears in every row, which is called `weight tying` `(picture)` #### Further lowdown - Because an image generally has 3 channels, we need a rank-3 kernel. - And **we multiply over all channels, so the output is one pixel**. (`draw it yourself`) - but this kernel will catch only one feature, like horizontal edges, so we make more kernels so that the output becomes (h * w * kernels) - And that `kernel` dimension becomes the `channel` - **Conv2d**: with a 3 by 3 kernel, a stride-2 conv -> (h/2 * w/2 * kernels) - skips or jumps over input pixels - to keep memory from getting out of control

~~~python
learn.model
learn.summary()
~~~

TODO: understand the blocks of the conv kernel yourself: - Usually a big kernel size is used at the first layer (will study this in part2) - Bottom-right highlighting kernel (`pic / draw`) - `torch.tensor.expand`: for memory efficiency, because we should handle RGB - We do not make separate kernels, but one rank-4 kernel - a 4d tensor is just stacked kernels - `t[None].shape` creates a new unit axis - why do we make this? because the model should take a batch, not a single image. ### Average pooling, feature - suppose our pre-trained model results in a size of `11 by 11 by 512` `pic 4` and my classification task has 37 classes * take the first face of the channel, which is 11 by 11, and `mean` it, so that we get a 512 by 1 tensor * then make a 2d matrix, which is 512 by 37, and multiply, so that we get a 37 by 1 matrix. - Feature, at the convolution block - So, when we do transfer learning without unfreezing, every element of the last matrix (512 by 1) should represent (or could catch) one feature. ### Heatmap, Hook

~~~
hook_output(model[0]) -> acts -> avg_acts
~~~

- if we average the block over `axis=feature`, the resulting matrix (11 by 11) depicts `how activated was that area?` -> it is the heatmap, `avg_acts` - and acts comes from a hook, which is a more advanced pytorch feature. - hook into the pytorch machinery itself, and run any arbitrary Pytorch code - Why is this cool?: Normally a forward pass just gives a set of outputs, but we can interrupt and hook into the forward pass. - We can also store the output of the convolutional part of the model, which is before avg_pooling - Think back to how we cut off `after` the conv part. - with fast.ai the original convolutional part of the model is *the first thing in the model*, specifically available as `learn.model.eval()[0]` - And this is captured with `hook_output`; having hooked the output, we can pass our x_minibatch through. - Not directly, though: it has to be normalized, made into a minibatch, and put on the gpu - the `one_item()` function does it when we have one data point `TODO: this is an assignment` do it yourself without the one_item function - and `.cuda()` puts it on the gpu - you should print out the shape of your tensors very often, and try to think why. "
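To make the Bernoulli-then-scale behaviour described above concrete, here is a minimal sketch (my own illustration with p=0.5, not the lecture's code):

~~~python
import torch

def dropout_manual(x, p=0.5, training=True):
    # at test time dropout is the identity: nothing special at inference
    if not training:
        return x
    # bernoulli decides which activations survive (1 with probability 1-p)
    mask = torch.zeros_like(x).bernoulli_(1 - p)
    # divide by (1-p) so the expected activation is unchanged;
    # with p=0.5 kept values become 2x and dropped ones stay 0
    return x * mask / (1 - p)

x = torch.ones(3, 4)
print(dropout_manual(x))                   # entries are 2.0 or 0.0
print(dropout_manual(x, training=False))   # unchanged at eval time
~~~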
+ }, {
+ "id": 13,
+ "url": "http://localhost:4000/2020/04/qna-image-segmentation/",
+ "title": "[Q&A] Image Segmentation, using Unet with Driving Video data",
+ "body": "2020/04/02 - This post is about my questions while I was studying USF Deep Learning course about image segmentation task. All the answers are from the course, source code, library document, or document. I cared about being clear at reporting information including source of information, however if there are still anything unclear, please contact me. And thank you Jeremy&Rachael for everything. Also Thank you Cambridge Computer Vision Lab to made us to study with your labor. The Cambridge-driving Labeled Video Database (CamVid) is the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes. If someone is interested in this project, please check the site and see the details. Now, let’s start first using jupyter’s one of tricks which I love most. It enables cell to print the code without print function. from IPython. core. interactiveshell import InteractiveShell# pretty print all cell's output and not just the last oneInteractiveShell. ast_node_interactivity = all from fastai. vision import *from fastai. callbacks. hooks import *from fastai. utils. mem import *path = untar_data(URLs. CAMVID) # The locations where the data and models are downloaded are set in config. ymlpath. ls() I’m trying to accustomed to using pathlib module, not just it became built-in module in python, but I felt uncomfortable myself with os module. However, still unpredictable conflicts are remain, even in the quite standard library like Pytorch, tensorflow, onnx. (it require me string for path. not PosixPath. will send PR. . ) [PosixPath('/root/. fastai/data/camvid/valid. txt'), PosixPath('/root/. fastai/data/camvid/images'), PosixPath('/root/. fastai/data/camvid/labels'), PosixPath('/root/. fastai/data/camvid/codes. txt')]path_img = path/'images'path_lbl = path/'labels'fnames = get_image_files(path_img) #filenamelbl_names = get_image_files(path_lbl)1. (Play with data) My Hypothesis: File name has A_B format. and A / B would be at key-value position. Use collections - defaultdict Default Dict: Link: easy to group a sequence of key and value pairs into a dictionary of list?from collections import defaultdictfnames[0], lbl_names[0](PosixPath('/root/. fastai/data/camvid/images/0001TP_009210. png'), PosixPath('/root/. fastai/data/camvid/labels/0016E5_01800_P. png'))files = [tuple(i. stem. split('_')) for i in fnames]labels = [tuple(i. stem. split('_')[:-1]) for i in lbl_names]d = defaultdict(list)for k, v in files: d[k]. append(v)d. keys()len(d['0001TP'])124for k, v in d. 
items(): print(k, v) # prints each video-sequence key with its full list of frame ids; the long output is shortened here, e.g. 0001TP ['009210', '008850', '007350', …] for k, v in d.items(): print(k, len(d[k])) 0001TP 124 0016E5 305 Seq05VD 171 0006R0 101 for i in d2.keys(): print(i, len(d2[i])) 0016E5 305 0001TP 124 0006R0 101 Seq05VD 171 files[0], labels[0] (('0001TP', '009210'), ('0016E5', '01800')) 2. My question: Link: Why do we need masking? And does the color come from the fastai library? 
(have to look into the source code) What does the parameter alpha do? When people make a masked img, does it have a ranged integer limit? Is image normalization related to this? lbl_sorted = sorted(lbl_names) f_sorted = sorted(fnames) lbl_1 = lbl_sorted[33] f_1 = f_sorted[33] img = open_image(lbl_1) mask = open_mask(lbl_1) _,axs = plt.subplots(1,2, figsize=(10,5)) img.show(ax=axs[0], title='1') mask.show(ax=axs[1], title='2', alpha=1.) img_2 = open_image(f_1) mask_2 = open_mask(f_1) _,axs = plt.subplots(1,2, figsize=(10,5)) img_2.show(ax=axs[0], title='3') mask_2.show(ax=axs[1], title='4', alpha=1.) open_mask(lbl_1).data.shape torch.Size([1, 720, 960]) open_image(f_1).data.shape torch.Size([3, 720, 960]) img.data # labeled data opened as an image: a float tensor, identical across the three channels, e.g. tensor([[[0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], ..., [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176]]]) mask.data # the same file opened as a mask: integer class ids per pixel, e.g. tensor([[[ 4, 4, 4, ..., 21, 21, 21], ..., [17, 17, 17, ..., 30, 30, 30]]]) img_2.data, mask_2.data # the raw image as floats ([3, 720, 960]) and, opened as a mask, raw intensities as ints ([1, 720, 960]) 3. What is the difference between Image and ImageSegment?: ImageSegment: An ImageSegment object has the same properties as an Image. The only difference is that when applying transformations to an ImageSegment, it will ignore the functions that deal with lighting and keep values of 0 and 1. It’s easy to show the segmentation mask over the associated Image by using the y argument of show_image. img = open_image(fnames[0]) mask = open_mask(lbl_names[0]) _,axs = plt.subplots(1,3, figsize=(8,4)) img.show(ax=axs[0], title='no mask') img.show(ax=axs[1], y=mask, title='masked') # seg mask over the img using the y arg mask.show(ax=axs[2], title='mask only', alpha=1.) vision.image 4. Why/how is an img divided by 255, and what does it result in? fast.ai: vision.image - If div=True, pixel values are divided by 255. to become floats between 0. and 1. At times, you want to get rid of distortions caused by lights and shadows in an image. Normalizing the RGB values of an image can at times be a simple and effective way of achieving this. So the sum of the pixel’s values over all channels (which is S) divides each channel, so that the normalized values become R/S, G/S and B/S (where S=R+G+B). Detailed explanation here. 5. Python evaluation order: Python evaluates expressions from left to right. Notice that while evaluating an assignment, the right-hand side is evaluated before the left-hand side. mask_tmp, trg_tmp, void_tmp = 2, 1, 10 mask_tmp = trg_tmp != void_tmp print(mask_tmp, trg_tmp, void_tmp) # (1) target is not the same as void True 1 10 # Example 1 x = 1 y = 2 x,y = y,x+y x, y (2, 3) # Example 2 x = 1 y = 2 x = y y = x+y x, y (2, 4) 6. model learner parameter :: pct_start: A: Percentage of the total number of epochs during which the learning rate rises in one cycle. Q: Sorry, I’m still confused: one cycle in the new API only runs once. How does the percentage of the total number of epochs work? Can you give an example, say learn.fit_one_cycle(10, slice(1e-4,1e-3,1e-2), pct_start=0.05)? A: OK, the strictly correct answer would be percentage of iterations, so the lr can both increase and decrease during the same epoch. In your example, say you have 100 iterations per epoch; then for half an epoch (0.05 * (10 * 100) = 50 iterations) the lr will rise, then slowly decrease. Q2: Thanks for this explanation … so essentially, it is the percentage of overall iterations where the LR is increasing, correct? So, given the default of 0.3, the LR goes up for 30% of your iterations and then decreases over the last 70%. Is that a correct summation of what is happening? A2: Yes, I think that’s correct. You can verify that by changing its value and checking learn.recorder.plot_lr(). For example if pct_start = 0.2 source: forums.fastai "
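A quick sanity check of the arithmetic in that forum answer (plain Python; the 100 iterations per epoch is the answer's assumed figure, not something fit_one_cycle fixes):

~~~python
# fit_one_cycle(10) with ~100 iterations per epoch and pct_start=0.05:
epochs, iters_per_epoch, pct_start = 10, 100, 0.05
total_iters = epochs * iters_per_epoch
rising = int(pct_start * total_iters)
print(rising)                # 50 -> the LR rises for half an epoch
print(total_iters - rising)  # 950 iterations of decay afterwards
~~~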
+ }, {
+ "id": 14,
"url": "http://localhost:4000/2020/03/note08-fastai-4/",
"title": "Gradient backward, Chain Rule, Refactoring",
- "body": "2020/03/02 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring” Lecture 08 - Deep Learning From Foundations-part2 “ Homework: calculus for machine learning einsum conventionCONTENTS: Foundation version Gradients backward pass decompose function chain rule with code check the result using Pytorch autograd Refactor model Layers as classes Modue. forward() Without einsum nn. Linear and nn. Module Forward process Foundation version: Gradients backward pass: Gradients is output with respect to parameter we’ve done this work in this path(below) to simplify this calculus, we can just change it into, So, you should know of the derivative of each bit on its own, and then you multiply them all together. As a result, it would be over cross over the data. So you can get gradient, output with respect to parameter What order should we calculate? BTW, why Jeremy wrote , not Loss function?1 decompose function We want to get derivative of which forms But, we have a estimation of answer (we call it y hat) now So, I will decompose funciton to trace target variable. Using the above forward pass, we can suppose some function from the end. start from , We know MSE funciton got two parameters, output, and target . from MSE’s input we know function’s output and supposing v is input of that function, similarly, v became output of chain rule with code examplify backward process by random sampling To get a variable, I modified forward model a little def model_ping(out = 'x_train'): l1 = lin(x_train, w1, b1) # one linear layer l2 = relu(l1) # one relu layer l3 = lin(l2, w2, b2) # one more linear layer return eval(out) Be careful we don’t use mse_loss in backward process1) start with the very last function, which is loss funciton. MSE If we codify this formula,def mse_grad(inp, targ): #mse_input(1000,1), mse_targ (1000,1) # grad of loss with respect to output of previous layer inp. g = 2. * (inp. squeeze() - targ). unsqueeze(-1) / inp. shape[0] And, this can be examplified like below. Notice that input of gradient function is same with forward functiony_hat = model_ping('l3') #get value from forward modely_hat. g = ((y_hat. squeeze(-1)-y_train). unsqueeze(-1))/y_hat. shape[0]y_hat. g. shape>>> torch. Size([50000, 1]) We can just calculate using broadcasting, not using squeeze. then why should do and unsqueeze again?🎯 It’s related with random access memory(RAM). . If I don’t squeeze, (I’m using colab) it out of RAM. 2) Derivative of linear2 function This process’s weight dimensions defined by axis=1, axis=2. axis=0 dimension means size of data. This will be summazed by . sum(0) method. unsqeeze(-1)&unsqeeze(1) seperates the dimension, and make a dot product, and vanish axis=0 dimension. def lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowlin2 = model_ping('l2'); #get value from forward modellin2. g = y_hat. g@w2. t(); w2. g = (lin2. unsqueeze(-1) * y_hat. g. unsqueeze(1)). sum(0);b2. g = y_hat. g. sum(0);lin2. g. shape, w2. g. shape, b2. g. shape>>> torch. Size([50000, 50])torch. Size([50, 1])torch. Size([1]) Notice going reverse order, we’re passing in gradient backward3) derivative of ReLU def relu_grad(inp, out): # grad of relu with respect to input activations inp. 
g = (inp>0). float() * out. g Examplified belowlin1=model_ping('l1') #get value from forward modellin1. g = (lin1>0). float() * lin2. g;lin1. g. shape>>> torch. Size([50000, 50])4) Derivative of linear1 Same process with 2) but, this process’s weight hasdef lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowx_train. g = lin1. g @ w1. t(); w1. g = (x_train. unsqueeze(-1) * lin1. g. unsqueeze(1)). sum(0); b1. g = lin1. g. sum(0);x_train. g. shape, w1. g. shape, b1. g. shape>>> torch. Size([50000, 784])torch. Size([784, 50])torch. Size([50])5) Then it goes backward pass def forward_and_backward(inp, targ): # forward pass: l1 = inp @ w1 + b1 l2 = relu(l1) out = l2 @ w2 + b2 # we don't actually need the loss in backward! loss = mse(out, targ) # backward pass: mse_grad(out, targ) lin_grad(l2, out, w2, b2) relu_grad(l1, l2) lin_grad(inp, l1, w1, b1)Version 1 (Basic)- Wall time: 1. 95 s Summary Notice that output of function at forward pass became input of backward pass backpropagation is just the chain rule value loss (loss=mse(out,targ)) is not used in gradient calcuation. Because, it doesn’t appear with the weight. w1g, w2g, b1g, b2g, ig will be used for optimizercheck the result using Pytorch autograd require_grad_ is the magical function, which can automatic differentiation. 2 This magical auto gradified tensor keep track what happend in forward (taking loss function), and do the backward3 So it saves our time to differentiate ourselves ⤵️ THis is benchmark…. . Version 2 (torch autograd)- Wall time: 3. 81 µs Refactor model: Amazingly, just refactoring our main pieces, it comes down up to Pytorch package. 🌟 Implement yourself, Practice, practice, practice! 🌟 Layers as classes: Relu and Linear are layers in oue neural net. -> make it as classes For the forward, using __call__ for the both of forward & backward. Because ‘call’ means we treat this as a function. class Lin(): def __init__(self, w, b): self. w,self. b = w,b def __call__(self, inp): self. inp = inp self. out = inp@self. w + self. b return self. out def backward(self): self. inp. g = self. out. g @ self. w. t() # Creating a giant outer product, just to sum it, is inefficient! self. w. g = (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) self. b. g = self. out. g. sum(0) Remember that in lin_grad function, we save bias&weight!!!!!💬 inp. g : gradient of the output with respect to the input. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 w. g : gradient of the output with respect to the weight. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 b. g : gradient of the output with respect to the bias. {: style=”color:grey; font-size: 90%; text-align: center;”} class Model(): def __init__(self, w1, b1, w2, b2): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ) def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() refer to Jeremy’s Model class, he put layers in list Dionne’s self-study note: Decomposing Jeremy’s Model class init needs weight, bias but not x data when call that class(a. k. a function) it gave x data and y label! jeremy composited function in layers. x = l(x) so concise…. . 
also utilized that layer list when backward ust reversing it (using python list’s method) And he is recursively calling the function on the result of the previous thing. ⬇️for l in self. layers: x = l(x)Q2: Don’t I need to declare magical autograd function, requires_grad_?{: style=”color:red; font-size: 130%; text-align: center;”} [The questions migrated to this article] Version 3 (refactoring - layer to class)- Wall time: 5. 25 µs Modue. forward(): Duplicate code makes execution time slow. Role of __call__ changed. No more __call__ for implementing forward pass. By initializing the forward with __call__, Module. forward() use overriding to maximize reusability. So any layer inherit Module, can use parent’s function. gradient of the output with respect to the weight (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) can be reexpressed using einsum, torch. einsum( bi,bj->ij , inp, out. g) Defining forward and Module enables Pytorch to out almost duplicatesVersion 4 (Module & einsum)- Wall time: 4. 29 µs Q2: Isn’t there any way to use broadcasting? Why we should use outer product?{: style=”color:red; font-size: 130%; text-align: center;”} Without einsum: Replacing einsum to matrix product is even more faster. torch. einsum( bi,bj->ij , inp, out. g)can be reexpressed using matrix product, inp. t() @ out. gVersion 5 (without einsum)- Wall time: 3. 81 µs nn. Linear and nn. Module: Torch’s package nn. Linear and nn. Module Version 6 (torch package)- Wall time: 5. 01 µs Final, Using torch. nn. Linear & torch. nn. Module~~~pythonclass Model(nn. Module): def init(self, n_in, nh, n_out): super(). init() self. layers = [nn. Linear(n_in,nh), nn. ReLU(), nn. Linear(nh,n_out)] self. loss = mse def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x. squeeze(), targ)class Model(): def init(self): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ)def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() ~~~ Footnote: fast. ai forums Lesson-8 ↩ pytorch docs - autograd ↩ stackoverflow - finding methods a object has ↩ "
+ "body": "2020/03/02 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring ” Lecture 08 - Deep Learning From Foundations-part2 “ Homework: calculus for machine learning einsum conventionCONTENTS: Foundation version Gradients backward pass decompose function chain rule with code check the result using Pytorch autograd Refactor model Layers as classes Modue. forward() Without einsum nn. Linear and nn. Module Forward process Foundation version: Gradients backward pass: Gradients is output with respect to parameter we’ve done this work in this path(below) to simplify this calculus, we can just change it into, So, you should know of the derivative of each bit on its own, and then you multiply them all together. As a result, it would be over cross over the data. So you can get gradient, output with respect to parameter What order should we calculate? BTW, why Jeremy wrote , not Loss function?1 decompose function We want to get derivative of which forms But, we have a estimation of answer (we call it y hat) now So, I will decompose funciton to trace target variable. Using the above forward pass, we can suppose some function from the end. start from , We know MSE funciton got two parameters, output, and target . from MSE’s input we know function’s output and supposing v is input of that function, similarly, v became output of chain rule with code examplify backward process by random sampling To get a variable, I modified forward model a little def model_ping(out = 'x_train'): l1 = lin(x_train, w1, b1) # one linear layer l2 = relu(l1) # one relu layer l3 = lin(l2, w2, b2) # one more linear layer return eval(out) Be careful we don’t use mse_loss in backward process1) start with the very last function, which is loss funciton. MSE If we codify this formula,def mse_grad(inp, targ): #mse_input(1000,1), mse_targ (1000,1) # grad of loss with respect to output of previous layer inp. g = 2. * (inp. squeeze() - targ). unsqueeze(-1) / inp. shape[0] And, this can be examplified like below. Notice that input of gradient function is same with forward functiony_hat = model_ping('l3') #get value from forward modely_hat. g = ((y_hat. squeeze(-1)-y_train). unsqueeze(-1))/y_hat. shape[0]y_hat. g. shape>>> torch. Size([50000, 1]) We can just calculate using broadcasting, not using squeeze. then why should do and unsqueeze again?🎯 It’s related with random access memory(RAM). . If I don’t squeeze, (I’m using colab) it out of RAM. 2) Derivative of linear2 function This process’s weight dimensions defined by axis=1, axis=2. axis=0 dimension means size of data. This will be summazed by . sum(0) method. unsqeeze(-1)&unsqeeze(1) seperates the dimension, and make a dot product, and vanish axis=0 dimension. def lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowlin2 = model_ping('l2'); #get value from forward modellin2. g = y_hat. g@w2. t(); w2. g = (lin2. unsqueeze(-1) * y_hat. g. unsqueeze(1)). sum(0);b2. g = y_hat. g. sum(0);lin2. g. shape, w2. g. shape, b2. g. shape>>> torch. Size([50000, 50])torch. Size([50, 1])torch. Size([1]) Notice going reverse order, we’re passing in gradient backward3) derivative of ReLU def relu_grad(inp, out): # grad of relu with respect to input activations inp. 
g = (inp>0). float() * out. g Examplified belowlin1=model_ping('l1') #get value from forward modellin1. g = (lin1>0). float() * lin2. g;lin1. g. shape>>> torch. Size([50000, 50])4) Derivative of linear1 Same process with 2) but, this process’s weight hasdef lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowx_train. g = lin1. g @ w1. t(); w1. g = (x_train. unsqueeze(-1) * lin1. g. unsqueeze(1)). sum(0); b1. g = lin1. g. sum(0);x_train. g. shape, w1. g. shape, b1. g. shape>>> torch. Size([50000, 784])torch. Size([784, 50])torch. Size([50])5) Then it goes backward pass def forward_and_backward(inp, targ): # forward pass: l1 = inp @ w1 + b1 l2 = relu(l1) out = l2 @ w2 + b2 # we don't actually need the loss in backward! loss = mse(out, targ) # backward pass: mse_grad(out, targ) lin_grad(l2, out, w2, b2) relu_grad(l1, l2) lin_grad(inp, l1, w1, b1)Version 1 (Basic)- Wall time: 1. 95 s Summary Notice that output of function at forward pass became input of backward pass backpropagation is just the chain rule value loss (loss=mse(out,targ)) is not used in gradient calcuation. Because, it doesn’t appear with the weight. w1g, w2g, b1g, b2g, ig will be used for optimizercheck the result using Pytorch autograd require_grad_ is the magical function, which can automatic differentiation. 2 This magical auto gradified tensor keep track what happend in forward (taking loss function), and do the backward3 So it saves our time to differentiate ourselves Postfix underscore means in pytorch, in-place function, What is in-place function?⤵️ THis is benchmark…. . Version 2 (torch autograd)- Wall time: 3. 81 µs Refactor model: Amazingly, just refactoring our main pieces, it comes down up to Pytorch package. 🌟 Implement yourself, Practice, practice, practice! 🌟 Layers as classes: Relu and Linear are layers in oue neural net. -> make it as classes For the forward, using __call__ for the both of forward & backward. Because ‘call’ means we treat this as a function. class Lin(): def __init__(self, w, b): self. w,self. b = w,b def __call__(self, inp): self. inp = inp self. out = inp@self. w + self. b return self. out def backward(self): self. inp. g = self. out. g @ self. w. t() # Creating a giant outer product, just to sum it, is inefficient! self. w. g = (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) self. b. g = self. out. g. sum(0) Remember that in lin_grad function, we save bias&weight!!!!!💬 inp. g : gradient of the output with respect to the input. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 w. g : gradient of the output with respect to the weight. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 b. g : gradient of the output with respect to the bias. {: style=”color:grey; font-size: 90%; text-align: center;”} class Model(): def __init__(self, w1, b1, w2, b2): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ) def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() refer to Jeremy’s Model class, he put layers in list Dionne’s self-study note: Decomposing Jeremy’s Model class init needs weight, bias but not x data when call that class(a. k. a function) it gave x data and y label! jeremy composited function in layers. x = l(x) so concise…. . 
also utilized that layer list when backward ust reversing it (using python list’s method) And he is recursively calling the function on the result of the previous thing. ⬇️for l in self. layers: x = l(x)Q2: Don’t I need to declare magical autograd function, requires_grad_?{: style=”color:red; font-size: 130%; text-align: center;”} [The questions migrated to this article] Version 3 (refactoring - layer to class)- Wall time: 5. 25 µs Modue. forward(): Duplicate code makes execution time slow. Role of __call__ changed. No more __call__ for implementing forward pass. By initializing the forward with __call__, Module. forward() use overriding to maximize reusability. So any layer inherit Module, can use parent’s function. gradient of the output with respect to the weight (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) can be reexpressed using einsum, torch. einsum( bi,bj->ij , inp, out. g) Defining forward and Module enables Pytorch to out almost duplicatesVersion 4 (Module & einsum)- Wall time: 4. 29 µs Q2: Isn’t there any way to use broadcasting? Why we should use outer product?{: style=”color:red; font-size: 130%; text-align: center;”} Without einsum: Replacing einsum to matrix product is even more faster. torch. einsum( bi,bj->ij , inp, out. g)can be reexpressed using matrix product, inp. t() @ out. gVersion 5 (without einsum)- Wall time: 3. 81 µs nn. Linear and nn. Module: Torch’s package nn. Linear and nn. Module Version 6 (torch package)- Wall time: 5. 01 µs Final, Using torch. nn. Linear & torch. nn. Module~~~pythonclass Model(nn. Module): def init(self, n_in, nh, n_out): super(). init() self. layers = [nn. Linear(n_in,nh), nn. ReLU(), nn. Linear(nh,n_out)] self. loss = mse def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x. squeeze(), targ)class Model(): def init(self): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ)def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() ~~~ Footnote: fast. ai forums Lesson-8 ↩ pytorch docs - autograd ↩ stackoverflow - finding methods a object has ↩ "
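The Model above references Relu() and Mse() classes that this note never shows. A minimal sketch following the same pattern as Lin, reconstructed from the gradient formulas above (my reconstruction in the lesson's style, so treat the details as assumptions):

~~~python
class Relu():
    def __call__(self, inp):
        self.inp = inp
        self.out = inp.clamp_min(0.) - 0.5
        return self.out
    def backward(self):
        # grad of relu with respect to its input, as in relu_grad above
        self.inp.g = (self.inp > 0).float() * self.out.g

class Mse():
    def __call__(self, inp, targ):
        self.inp, self.targ = inp, targ
        self.out = (inp.squeeze() - targ).pow(2).mean()
        return self.out
    def backward(self):
        # same formula as mse_grad above
        self.inp.g = 2. * (self.inp.squeeze() - self.targ).unsqueeze(-1) / self.targ.shape[0]
~~~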
}, {
- "id": 13,
+ "id": 15,
"url": "http://localhost:4000/2020/03/note08-fastai-3/",
"title": "Implement forward&backward pass from scratch",
"body": "2020/03/01 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring1. The forward and backward passes: 1. 1 Normalization: train_mean,train_std = x_train. mean(),x_train. std()>>> train_mean,train_std(tensor(0. 1304), tensor(0. 3073))Remember! Dataset, which is x_train, mean and standard deviation is not 0&1. But we need them to be which means we should substract means and divide data by std. You should not standarlize validation set because training set and validation set should be aparted. after normalize, mean is close to zero, and standard deviation is close to 1. 1. 2 Variable definition: n,m: size of the training set c: the number of activations we need in our model2. Foundation Version: 2. 1 Basic architecture: Our model has one hidden layer, output to have 10 activations, used in cross entropy. But in process of building architecture, we will use mean square error, output to have 1 activations and lator change it to cross entropy number of hidden unit; 50see below pic We want to make w1&w2 mean and std be 0&1. why initializating and make mean zero and std one is important? paper highlighting importance of normalisation - training 10,000 layer network without regularisation1 2. 1. 1 simplified kaiming initQ: Why we did init, normalize with only validation data? Because we can not handle and get statistics from each value of x_valid?{: style=”color:red; font-size: 130%; text-align: center;”} what about hidden(first) layer?w1 = torch. randn(m,nh)b1 = torch. zeros(nh)t = lin(x_valid, w1, b1) # hidden>>> t. mean(), t. std()((tensor(2. 3191), tensor(27. 0303))In output(second) layer, w2 = torch. randn(nh,1)b2 = torch. zeros(1)t2 = lin(t, w2, b2) # output>>> t2. mean(), t2. std()(tensor(-58. 2665), tensor(170. 9717)) which is terribly far from normalzed value. But if we apply simplified kaiming init w1 = torch. randn(m,nh)/math. sqrt(m); b1 = torch. zeros(nh)w2 = torch. randn(nh,1)/math. sqrt(nh); b2 = torch. zeros(1)t = lin(x_valid, w1, b1)t. mean(),t. std()>>> (tensor(-0. 0516), tensor(0. 9354)) But, actually, we use activations not only linear function After applying activations relu at linear layer, mean and deviation became 0. 5. 2. 1. 2 Glorrot initializationPaper2: Understanding the difficulty of training deep feedforward neural networks Gaussian(, bell shaped, normal distributions) is not trained very well. How to initialize neural nets? with the size of layer , the number of filters . But there is No acount for import of ReLU If we got 1000 layers, vanishing gradients problem emerges2. 1. 3 Kaiming initializatingPaper3: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Kaiming He, explained here rectifier: rectified linear unit rectifier network: neural network with rectifier linear units This is kaiming init, and why suddenly replace one to two on a top? to avoid vanishing gradient(weights) But it doesn’t give very nice mean tough. 2. 1. 4 Pytorch package Why fan_out? according to pytorch documentation, choosing 'fan_in' preserves the magnitude of the variance of the wights in the forward pass. choosing 'fan_out' preserves the magnitues in the backward pass(, which means matmul; with transposed matrix) ➡️ in the other words, torch use fan_out cz pytorch transpose in linear transformaton. What about CNN in Pytorch?I tried torch. nn. 
Conv2d. conv2d_forward?? Jeremy digged into using torch. nn. modules. conv. _ConvNd. reset_parameters?? 2 in Pytorch, it doesn’t seem to be implemented kaiming init in right formula. so we should use our own operation. But actually, this has been discussed in Pytorch community before. 3 4 Jeremy said it enhanced variance also, so I sampled 100 times and counted better results. To make sure the shape seems sensible. check with assert. (remember we will replace 1 to 10 in cross entropy)assert model(x_valid). shape==torch. Size([x_valid. shape[0],1])>>> model(x_valid). shape(10000, 1) We have made Relu, init, linear, it seems we can forward pass code we need for basic architecture nh = 50def lin(x, w, b): return x@w + b;w1 = torch. randn(m,nh)*math. sqrt(2. /m ); b1 = torch. zeros(nh)w2 = torch. randn(nh,1); b2 = torch. zeros(1)def relu(x): return x. clamp_min(0. ) - 0. 5t1 = relu(lin(x_valid, w1, b1))def model(xb): l1 = lin(xb, w1, b1) l2 = relu(l1) l3 = lin(l2, w2, b2) return l32. 2 Loss function: MSE: Mean squared error need unit vector, so we remove unit axis. def mse(output, targ): return (output. squeeze(-1) - targ). pow(2). mean() In python, in case you remove axis, you use ‘squeeze’, or add axis use ‘unsqueeze’ torch. squeeze where code commonly broken. so, when you use squeeze, clarify dimension axis you want to removetmp = torch. tensor([1,1])tmp. squeeze()>>> tensor([1, 1]) make sure to make as float when you calculateBut why??? because it is tensor?{: style=”color:red; font-size: 130%;”} Here’s the error when I don’t transform the data type ---------------------------------------------------------------------------TypeError Traceback (most recent call last)<ipython-input-22-ae6009bef8b4> in <module>()----> 1 y_train = get_data()[1] # call data again 2 mse(preds, y_train)TypeError: 'map' object is not subscriptable This is forward passFootnote: Other materials: Understanding the difficulty of training deep feedforward neural networks, paper that introduced Xavier initialization Fixup Initialization: Residual Learning Without Normalization ↩ Pytorch implementaion on Kaiming init of conv and linear layers ↩ Pytorch kaiming init issue ↩ Pytorch kaiming init explained ↩ "
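As a small illustration of the normalization point above - the validation set gets the training set's statistics, not its own (this mirrors the lesson notebook's approach; a sketch, not the verbatim code):

~~~python
def normalize(x, m, s): return (x - m) / s

train_mean, train_std = x_train.mean(), x_train.std()
x_train = normalize(x_train, train_mean, train_std)
# NB: normalize the validation set with the *training* statistics,
# so both sets live on the same scale
x_valid = normalize(x_valid, train_mean, train_std)
~~~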
}, {
- "id": 14,
+ "id": 16,
"url": "http://localhost:4000/2020/03/note08-fastai-2/",
"title": "What's inside Pytorch Operator?",
"body": "2020/03/01 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, RefactoringWhat’s inside Pytorch Operator?: Section02 Time comparison with pure Python: Matmul with broadcasting> 3194. 95 times faster Einstein summation> 16090. 91 times faster Pytorch’s operator> 49166. 67 times faster 1. Elementwise op: 1. 1 Frobenius norm: above converted into (m*m). sum(). sqrt() Plus, don’t suffer from mathmatical symbols. He also copy and paste that equations from wikipedia. and if you need latex form, download it from archive. 2. Elementwise Matmul: What is the meaning of elementwise? We do not calculate each component. But all of the component at once. Because, length of column of A and row of B are fixed. How much time we saved? So now that takes 1. 37ms. We have removed one line of code and it is a 178 times faster…#TODOI don’t know where the 5 from. but keep it. Maybe this is related with frobenius norm…?as a result, the code before for k in range(ac): c[i,j] += a[i,k] + b[k,j]the code after c[i,j] = (a[i,:] * b[:,j]). sum()To compare it (result betweet original and adjusted version) we use not test_eq but other function. The reason for this is that due to rounding errors from math operations, matrices may not be exactly the same. As a result, we want a function that will “is a equal to b within some tolerance” #exportdef near(a,b): return torch. allclose(a, b, rtol=1e-3, atol=1e-5)def test_near(a,b): test(a,b,near)test_near(t1, matmul(m1, m2))3. Broadcasting: Now, we will use the broadcasting and removec[i,j] = (a[i,:] * b[:,j]). sum() How it works?>>> a=tensor([[10,10,10], [20,20,20], [30,30,30]])>>> b=tensor([1,2,3,])>>> a,b (tensor([[10, 10, 10], [20, 20, 20], [30, 30, 30]]),tensor([1, 2, 3])) >>> a+btensor([[11, 12, 13], [21, 22, 23], [31, 32, 33]]) <Figure 2> demonstrated how array b is broadcasting(or copied but not occupy memory) to compatible with a. Refered from numpy_tutorial there is no loop, but it seems there is exactly the loop. This is not from jeremy (actually after a moment he cover it) but i wondered How to broadcast an array by columns? c=tensor([[1],[2],[3]])a+ctensor([[11, 11, 11], [22, 22, 22], [33, 33, 33]])s What is tensor. stride()?help(t. stride)Help on built-in function stride: stride(…) method of torch. Tensor instancestride(dim) -> tuple or intReturns the stride of :attr:’self’ tensor. Stride is the jump necessary to go from one element to the next one in the specified dimension :attr:’dim’. A tuple of all strides is returned when no argument is passed in. Otherwise, an integer value is returned as the stride in the particular dimension :attr:’dim’. Args: dim (int, optional): the desired dimension in which stride is requiredExample::* x = torch. tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])`x. stride()>>> (5, 1)x. stride(0)>>> 5x. stride(-1)>>> 1 unsqueeze & None index We can manipulate rank of tensor Special value ‘None’, which means please squeeze a new axis here== please broadcast herec = torch. tensor([10,20,30])c[None,:] in c, squeeze a new axis in here please. 2. 2 Matmul with broadcasting: for i in range(ar):# c[i,j] = (a[i,:]). *[:,j]. sum() #previous c[i] = (a[i]. unsqueeze(-1) * b). sum(dim=0) And Using None also (As howard teached)c[i] = (a[i ]. unsqueeze(-1) * b). sum(dim=0) #howardc[i] = (a[i][:,None] * b). sum(dim=0) # using Nonec[i] = (a[i,:,None]*b). 
sum(dim=0)⭐️Tips🌟 1) Anytime there’s a trailinng(final) colon in numpy or pytorch you can delete it ex) c[i, :] = c [i]2) any number of colon commas at the start, you can switch it with the single elipsis. ex) c[:,:,:,:,i] = c […,i] 2. 3 Broadcasting Rules: What if we tensor. size([1,3]) * tensor. size([3,1])? torch. Size([3, 3]) What is scale???? What if they are one array is times of the other array? ex) Image : 256 x 256 x 3Scale : 128 x 256 x 3Result: ? Why I did not inserted axis via None, but happened broadcasting? >>> c * c[:,None]tensor([[100. , 200. , 300. ], [200. , 400. , 600. ], [300. , 600. , 900. ]])maybe it broadcast cz following array has 3 rows as same principle, no matter what nature shape was, if we do the operation tensor broadcasts to the other. >>> c==c[None]tensor([[True, True, True]])>>> c[None]==c[None,:]tensor([[True, True, True]])>>>c[None,:]==ctensor([[True, True, True]])3. Einstein summation: Creates batch-wise, remove inner most loop, and replaced it with an elementwise producta. k. ac[i,j] += a[i,k] * b[k,j]inner most loop c[i,j] = (a[i,:] * b[:,j]). sum()elementwise product Because K is repeated so we do a dot product. And it is torch. Usage of einsum()1) transpose2) diagnalisation tracing3) batch-wise (matmul) … einstein summation notationdef matmul(a,b): return torch. einsum('ik,kj->ij', a, b)so after all, we are now 16000 times faster than Python. 4. Pytorch op: 49166. 67 times faster than pure python And we will use this matrix multiplication in Fully Connect forward, with some initialized parameters and ReLU. But before that, we need initialized parameters and ReLU, Footnote: TensorRank ti noteResources: Frobenius Norm Review Broadcasting Review (especially Rule) Refer colab! (I totally confused with extension of arrays) torch. allclose Review np. einsum Reviewh "
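To tie the broadcasting and einsum versions together, here is a runnable consolidation of the snippets above (my own sketch; the shapes are illustrative, not the lesson's MNIST batch):

~~~python
import torch

def matmul_bcast(a, b):
    ar, ac = a.shape
    br, bc = b.shape
    c = torch.zeros(ar, bc)
    for i in range(ar):
        # broadcast row i of a against all of b, then sum over k
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)
    return c

def matmul_einsum(a, b):
    return torch.einsum('ik,kj->ij', a, b)

a, b = torch.randn(5, 784), torch.randn(784, 10)
# compare within tolerance, as test_near does above
assert torch.allclose(matmul_bcast(a, b), a @ b, rtol=1e-3, atol=1e-5)
assert torch.allclose(matmul_einsum(a, b), a @ b, rtol=1e-3, atol=1e-5)
~~~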
}, {
- "id": 15,
+ "id": 17,
"url": "http://localhost:4000/2020/02/note08-fastai-1/",
"title": "What is the meaning of 'deep-learning from foundations?'",
"body": "2020/02/29 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring” Lecture 08 - Deep Learning From Foundations-part2 “ I don’t know if you read this article, but I heartily appreciate Rachael Thomas and Jeremy Howard for providing these priceless lectures for free Homework: Review concepts 16 concepts from Course 1 (lessons 1 - 7)(1) Affine Functions & non-linearities; 2) Parameters & activations; 3) Random initialization & transfer learning; 4) SGD, Momentum, Adam; 5) Convolutions; Batch-norm; 6) Dropout; 7) Data augmentation; 8) Weight decay; 9) Res/dense blocks; 10) Image classification and regression; 11)Embeddings; 12) Continuous & Categorical variables; 13) Collaborative filtering; 14) Language models; 15) NLP classification; 16) Segmentation; U-net; GANS) Make sure you understand broadcasting Read section 2. 2 in Delving Deep into Rectifiers Try to replicate as much of the notebooks as you can without peeking; when you get stuck, peek at the lesson notebook, but then close it and try to do it yourself calculus for machine learning based on weight… einsum conventionCONTENTS: What is going on in this course? What is ‘from foundations’? Steps to a basic modern CNN model Today’s implementation goal: 1) matmul -> 4) FC backward Library development using jupyter notebook jupyter notebook certainly can make module Elementwise ops How can we make python faster? What is element wise operation? FootnoteWhat is going on in this course?: What is ‘from foundations’?: 1) Recreate fast. ai and Pytorch 2) using pure python Evade OverfittingOverfit : validation error getting worsetraining loss < validation loss Know the name of the symbol you usefind in this page if you don’t know the symbol that you are using or just draw it here (run by ML!) Steps to a basic modern CNN model: 1) Matrix multiplication -> 2) Relu/Initialization -> 3) Fully-connected Forward-> 4) Fully-connected Backward -> 5) Train loop -> 6) Convolution-> 7) Optimization ->8) Batchnormalization -> 9) Resnet Today’s implementation goal: 1) matmul -> 4) FC backward: Library development using jupyter notebook: what is assers? jupyter notebook certainly can make module: There will be #export tag that Howard (and we) want to extract special notebook2script. py will detect sign of #expert and convert following into python module and test ittest\_eq(TEST,'test')test\_eq(TEST,'test1') what is run_notebook. py? when you want to test your module in command line interface !python run\_notebook. py 01_matmul. ipynb Is there any difference between 1) and 2)?1) test -> test01 2) test01 -> test #TODO I don’t know yet look into run_notebook. py, package fire Jeremy used. What is that?read and run the code in a notebook, and in the process, Jeremy made Python Fire library called!shockingly, fire takes any kind of function and converts into CLI command. fire library was released by Google open source, Thursday, March 2, 2017 Get data pytorch and numpy are pretty much same. variable c explains how many pixels there are in in MNIST, 28 pixels PyTorch’s view() method: torch function that manipulating tensor, and squeeze() in torch & mathmatical operation similar function Rao & McMahan said usually this functions result in feature vector. In part 1, you can use view function several times. 
Initial python model Which is Linear, like $Xw$(weight)$+a$(bias) $= Y$ If you don’t know hou to multiple matrix, refer this site matmul visulization site How many time spends if we we use pure python function matmul, typical matrix multiplication function, takes about 1 second for calculating 1 single train data! (maybe assumed stochastic, 5 data points in validation) it takes about 11. 36 hours to update parameters even single layer and 1 iteration! (if that was my computer, it would be 14 hours. . )🤪 THIS is why we need to consider ‘time’&’space’ This is kinda slow - what if we could speed it up by 50,000 times? Let’s try! Elementwise ops: How can we make python faster?: If we want to calculate faster, then do remove pythonic calcuation, by passing its computation down to something that is written something other than python, like pytorch. According to PyTorch doc it uses C++ (via ATen), so we are going to implement that function with python. What is element wise operation?: items makes a pair, operate corresponding componentFootnote: notebooks material video broadcasting excel"
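As a rough illustration of the speed gap discussed above, here is a minimal sketch (my own toy timing, not the lesson notebook's exact code; the matrix sizes are assumptions for demonstration):

~~~python
import time
import torch

def matmul_pure(a, b):
    # naive triple loop: every multiply-add runs as slow Python bytecode
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            for k in range(ac):
                c[i, j] += a[i, k] * b[k, j]
    return c

a = torch.randn(5, 784)   # 5 flattened 28x28 MNIST-like rows (c = 28)
b = torch.randn(784, 10)  # weights of one linear layer

t0 = time.perf_counter(); c1 = matmul_pure(a, b); t1 = time.perf_counter()
c2 = a.matmul(b)          # dispatched down to C++ (ATen)
t2 = time.perf_counter()

print(f'pure python: {t1 - t0:.3f}s vs torch: {t2 - t1:.6f}s')
assert torch.allclose(c1, c2, atol=1e-4)  # same result, wildly different speed
~~~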
}, {
- "id": 16,
+ "id": 18,
"url": "http://localhost:4000/2020/02/what-is-convolution/",
"title": "Digging into convolution",
"body": "2020/02/28 - Issues 1) Kaiming Initializtion in Pytorch was in trouble. 1 2) Jeremy started to dig in, in lesson09, but I didn’t know why the size of tensor is 2 and even understand this spreadsheet data. 3 Homework: Read Visualizing and Understanding Convolutional Networks paper What is a convolution? Visualization one kernel Matthew D Zeiler & Rob Fergus Paper Convolution can be represented as matmul Padding Kernel has rank 3 How can we find a side-edge, a gradient and area of constant weight? What is a convolution?: A convolutional neural network is that your red, green, and blue pixels go into the simple computation, and something comes out of that, and then the result of that goes into a second layer, and the result of that goes into the third layer and so forth. Visualization: one kernel Refer this site for visualizing CNN filteringMatthew D Zeiler & Rob Fergus PaperLecture01 Nine examples of the actual coefficients from the **first layer** Convolution can be represented as matmul: CNNs from different viewpoints {align-items: center;} [A B C D E F G H I J] is 3 by 3 image data flatten to vector. As a result, convolution is a just matrix just two things happens Some of entries are set to zeros at all the times same color always have the same weight. That called weight time / wegith sharing So, we can implement a convolution with matrix multiplication. But, we don’t do that because it’s slow!Padding: What most of libraries do is just put zeros asdie of matrix fast. ai uses reflection paddings (what is this? Jeremy said he uttered it)Kernel has rank 3: As standard picture input would be 4 5, it would be actually 3d, not 2d. If we make kernel as a 3x3 size, we pass over same kernel all the different Red, Green, Blue Pixels. This could make problem, because, if we want to detect frog, which is green, we would want more activations on the green(I made a test cell in my colab 6) How can we find a side-edge, a gradient and area of constant weight?: Not top-edge! One kernel can find only the top-edge, so we should stack the kernels 7 So, we pass it through bunch of kernels to the input images, and that process gives us height x width x corresponding number of kernels. Usually that number of chanel is 16 And if we want to get the more channels and features, we should repeat that process This process gives rise to memory out of control, we do the stride #### conv-example. xlsx 2 convolutional filters At a second layer, filter is 3x3x2 tensor, because to add up together the first layer’s channel. Reference: Problem was math. sqrt(5) was not kaiming initialization formula, Implementation in Pytorch ↩ size of tensor, lecture09 ↩ conv-example. xlsx ↩ Why do computer use red, green and blue instead of primary colors ↩ Grayscale is a group of shades without any visible color. … Each of these dots has its own brightness level as well and, therefore, can be converted to grayscale. A grayscale image is one with all color information removed. ↩ Testing RGB and grayscale ↩ stack kernel and make new rank of tensor at output, Lesson06-2019 ↩ "
}, {
- "id": 17,
+ "id": 19,
"url": "http://localhost:4000/2020/02/dps-week8/",
- "title": "Digital Product School week 8&9",
- "body": "2020/02/24 - The 8th week retropect at Digital Product School Week 8/9 - Ship your MVP/Release next iteration each day This week's schedule CONTENT: Preparing engineering weekly Agile Process Daily Stand-up Making application flowchart (feat draw. io) / ER diagram Flowchart, understaning user journey ER diagram Engineering weekly AI lunch Connecting firebase andPreparing engineering weekly: This week at Wednesday, I planned to explain the Language Modelings, mainly focusing ELMo, ULMFiT, BERT and GPT-2. Slides is available here Changed the presentation, because there were people who are not in ML domain. hereWhenever I do the presentation, I learn more than the information I give them. At the same time, I realize I need to learn more than I know. Agile Process: One of a priceless lesson I learnt from digital product school, was experience of doing agile work. Before I came here, it was a little bit vague concept. I’m not sure ‘what is agile’ but this is what we tried to make agile process. Daily Stand-up: Sharing the works everyday helps interdisciplinary team to work better. Since product started to get higher fidelity, the gap between engineer and non-engineer increased. Actually I didn’t planned to explain concept because I thougth I would be lose my audience when I start to explain. But as daily stand-up, which shares our progess, goes day by day, I planed and reported the issues. And it made each other’s topic feel more familiar. I think point is very important, because at that point people start to be curious. So we can actively ask to the others, and that momwnr, we can explain the point teammate dosen’t know. Each color means every different section. Red: Our team goal, Blue: Interaction designer, Green: Product manager, Yellow: Software/AI engineer This week engineer's main plan Each of us try to explain what we are doing, but things become easier when we are asked. Because we explained something was important to us before, but if we asked it is something important for the others. Making application flowchart (feat draw. io) / ER diagram: Before we start the party, we should clarify the flowchart and ER diagram of our application. Flowchart, understaning user journey: Thanks for google, we could use draw. io for our framechart framework. Actually, we cana choice other good flatform, but draw. io has connected app throgh google drive, most of our engineer was used to it. And after this job, I got to know there is also (of course) rule with the symbols, color, size, space, scaling and direction of arrow -reference. But why we should do this? WE have made our storymap before!! I think storymap is for visualize our status and app. So it should be shared with whole the team, and they should able to understand each role’s issue. But flowchart is more like testing technical feasibility, and error that user can experience. So it could be little more specific, complicated, and hypothetical. This week engineer's main plan ER diagram: Even if we use NoSQL database through firebase, my team was accustomed to SQL more. That what we educated when we were at college, so we had to organize our concept while we were learning NoSQL. Engineering weekly: Every engineering weekly we exchange our knowledge each other so that we can grow together. Before today, my AI collegues presented regression, knn and it was my turn. I prepared slide that explain about pre-trained language model, but my header advised me if I go deep of theoretical things, I would lose my audience. 
So I decided to brief BERT mode, how I can contribute to other team’s project. Since BERT was breakthrough of NLP industry, I tried to explain how it can be applied to hands on product and how it can help people in their product. The result was quite motivative to me. They gave feedback that since it wasn’t that much theoretical, they could enjoy it, and useful information. Someone asked me do I had learned of presentation before. I was really happy with their feedback! AI lunch: Connecting firebase and: "
+ "title": "My life in Digital Product School - week 8/19/10",
+ "body": "2020/02/24 - The 8/9/10th week retropect at Digital Product School Week 8 - Ship your MVPWeek 9/10 - Release next iteration each day Week 8th schedule CONTENT: Agile Product Development Daily Stand-up(planning) Gemba Walk Sprint Reviews Engineering weeklyAgile Product Development: One of a priceless lesson I learnt from digital product school, was experience of doing agile work. Before I came here, it was a little bit vague concept. I’m still not sure ‘what is agile’ but this is how we tried to make agile process. Daily Stand-up(planning): Sharing the works everyday helps interdisciplinary team to work better. Since product started to get higher fidelity, the gap between engineer and non-engineer increased. Actually I didn’t planned to explain concept because I thougth I would be lose my audience when I start to explain. But as daily stand-up, which shares our progess, goes day by day, I planed and reported the issues. And it made each other’s topic feel more familiar. I think point is very important, because at that point people start to be curious. So we can actively ask to the others, and that momwnr, we can explain the point teammate dosen’t know. Each color means every different section. Red: Our team goal, Blue: Interaction designer, Green: Product manager, Yellow: Software/AI engineer This week engineer's main plan Each of us try to explain what we are doing, but things become easier when we are asked. Because we explained something was important to us before, but if we asked it is something important for the others. Gemba Walk: Team Cero with core team Every 2 weeks, we do the Gemba work, which is ‘question everything to the core team’ time. At this period, people can ask anything related to our product, workshop, and framework. Core team will help just for each team, and each team can solve the problem related to their work. < br/>Why we need this session? because with workshop and general schedule, core team has no time just focus on each team. So through this session, we can have opportunity to understand each program and workshop, like why we are using this platform, and when is the due of our small project, and we have this problem and we need help for this. whatever small problem you have, core team is always willing to help you. Sprint Reviews: Every Friday, we have time to summarise what we did for the week. Maybe we need HMW question and our storymap to share our process and then tell and share what we did try, what point we succeeded and what point it was deviant of our prediction, and why we tried it. . Sprint of Ve-link And then, just after all team’s ppt, we do vote with such a cute marvel. Always it’s very difficult to vote (of course you can’t vote to your team!) Because it depends on criteria what do I value!But since this is process of our agile work, I try to focus on what they have changed since last week, and why they did it, how they did it. Engineering weekly: Every engineering weekly we exchange our knowledge each other so that we can grow together. Everyone have their knowledge to share and we can be tutor and at the same time can be of tutee. Previously, my AI collegues presented regression, knn. And because I’m somewhat specialized to NLP, I prepared slide that explain about pre-trained language model, but my header advised me if I go deep of theoretical things, I would lose my audience. So I decided to brief BERT mode, how I can contribute to other team’s project. 
Since BERT was breakthrough of NLP industry, I tried to explain how it can be applied to hands on product and how it can help people in their product. The result was quite motivative to me. They gave feedback that since it wasn’t that much theoretical, they could enjoy it, and useful information. Someone asked me do I had learned of presentation before. I was really happy with their feedback! "
}, {
- "id": 18,
+ "id": 20,
"url": "http://localhost:4000/2020/02/fast.ai-nlp-note-16/",
"title": "Algorithmic bias",
"body": "2020/02/20 - Algorithms can encode & magnify human bias Case Study 1: Facial Recognition & Predictive Policing: Joy Buolamwini & Timnit Gebru, gendershades. org Microsoft, FACE+, IBM - All of these things are sell now. Largest gap between $\therefore\ Lighter Male\ >\ Darker\ Female $ This US mayor joked cops should “mount . 50-caliber” guns where AI predicts crime With machine learning, with automation, there’s a 99% success, so that robot is ㅡwill beㅡ99% accurate in telling us what is going to happen next, which is really interesting. - city official in Lancater, CA, approving on using IBM for public security Bias: Bias is type of error Statistical Bias: difference between a statistic’s expected value and the true value Unjust Bias: disproportionate preference for or prejudice against a group Unconscious bias: bias that we don’t realize we have But, term bias is too generic to be productive. Different sources of bias have different causes Representation Bias: Dataset was not representative of the algorithm that might be used on later. Above : Data is okay, but algorithm has some problem. Below : Data has error. For example, object detection production that performs very well in common product of US. But in contrast, change of target product region, like Zimbabwe, Solomon Island, and so on, reduced the performence remarkably. It is not the algorithmic problem, so we should care about data volume of region. Evaluation Bias: Benchmark datasets spur on research, 4. 4% of IJB-A images are dark-skinned women. 2/3 of ImageNet images from the West (Sharkar et al, 2017) Case Study 2: Recidivism Algorithm Used Prison Sentencing: Case Study 3: Online Ad Delivery: Bias in NLP: ( Nothing to do with the course, but I’m researching this field these days. ) But all about Englsih ImpactThe person is doctor. The person is nurse -> 그는 의사다. 그녀는 간호사다. Concept of “biased data” often too generic to be useful: Different sources of bias have different sources Data, models and systems are not unchanging numbers on a screen. They’re the result of a complex process that starts with years of historical context and involves a series of choices and norms, from data measurement to model evaluation to human interpretation. - Harini Suresh, “The problem with Biased Data” Five Sources of Bias in ML: Representation Bias Evaluation Bias Measurement Bias Aggregation Bias(46:02) Historical Bias(46:26) A few studies(47:13) Racial Bias, Even when we have good intentions(new york times)(47:10) gender(48:59) Humans are biased, so why does algorithmic bias matter?: Algorithms & humans are used differently (humans are usually decision maker) Algorithms are accurate and objective No way to apeal if there if error processed large scale cheap Machine learning can amplify bias Machine learning can create feedback loops. Technology is power. And with that comes responsibility. Solutions: Analyze a project at work/school: Questions about AI 5 types of bias (Suresh & Guttag) Datasheets for datasets, Modelcards for model reporting Accuracy rate on different sub-groups Work with domain experts & those impacted Increase diversity in our workspace Advocate for good policy Be on the ongoing lookout for bias"
}, {
- "id": 19,
+ "id": 21,
"url": "http://localhost:4000/2020/02/classifier-city/",
"title": "Making a classifier with image dataset made from gooogle",
"body": "2020/02/15 - CONTENTS: Creating dataset from google images Using google_images_download Create ImageDataBunch Train model fit_one_cycle() Let’s find-tune Let’s train the whole model! Let’s make batch size bigger! Interpretation Model in productionCode can be found hereDeployed model here Making a classifier which can distinguish Seoul from Munich and Sanfrancisco!(hoping my well in Munich!) Creating dataset from google images: In machine learning, you always need data before you build your model. You can use either URLs or google_images_download package. Since Jeremy explained specifically, I will try the other. Using google_images_download: note: This is not google official package Refer to Official Doncument, put that arguments. from google_images_download import google_images_downloadresponse = google_images_download. googleimagesdownload() #class instantiationout_dir = os. path. abspath('. . /. . /materials/dataset/pkg/')os. mkdir(out_dir)arguments = { keywords : Cebu,Munich,Seoul , print_urls :True, suffix_keywords : city , output_directory :out_dir, type : photo , }paths = response. download(arguments) #passing the arguments to the functionprint(paths)and if you need, here is main code. Create ImageDataBunch: We need to separate validation set because we just grabbed these imagese from Google. Most of the dataset we use (kaggle/research) splited into train / validation / test so if they are not devided beforehand we should make databunch, and Jeremy recommended assign 20% to validation. Help on function verify_images in module fastai. vision. data:verify_images(path: Union[pathlib. Path, str], delete: bool = True, max_workers: int = 4, max_size: int = None, recurse: bool = False, dest: Union[pathlib. Path, str] = '. ', n_channels: int = 3, interp=2, ext: str = None, img_format: str = None, resume: bool = None, **kwargs) Check if the images in `path` aren't broken, maybe resize them and copy it in `dest`. Data from google image url Data from package Train model: len(class) len(train) len(valid) Data_url 3 432 108 Data_pkg 3 216 53 Uisng model: restnet34 1, Measurement: accuracy 2 fit_one_cycle(): What is fit one cycle? Cyclical Learning Rates for Training Neural Networks One of the way to find good learning rate. Core idea is to start with small learning rate (like 1e-4, 1e-3) and increase the learning rate after each mini-batch till loss starts exploding. And pick up learning rate one order lower than exploding point. For example, plotted learning rate is like below picture, picking up around 1e-2 is the best way. Why this methods Traditionally, the learning rate is decreased as the learning starts converging with time. But this paper suggests to cycle our learning rate, because it makes us avoid local minimum. Basically this cyclic method enables us to explore whole of loss function so that find out global minimum. In other words, higher learning rate behaves like regularisation. Let’s find-tune: Do train just one last layer by learning rate found by find_lr This section you should find the strongest downward slope that kind of sticking around for quite a while. And choose just one order lower than lowest point. As explained before, I will pick up 1e-2. And of course, this is fine-tuning, we don’t need discriminative learning rate yet. Let’s train the whole model!: link When you plot the learning rate again, maybe you will get soaring shape of learning rate. Rule of thumb, When you slice the learning rate, use learning rate you used at unfrozen part. 
Divide it by 5 or 10 and put it on maximum bound. At minimum bound, get the point just before it soared, and divide it by 10. Let’s make batch size bigger!: Since default batch size is 64, I tried it to 128. And it gets way more better result(even it’s still underfitting!) And if I freeze model and train whole model again, the model would be better. Also, you can use this method to the other big dataset model training! Interpretation: See the confusion matrix. Result is quite great. *Since I’m using colab, I will skip data cleansing. But I highly recommend you to use ImageCleaner widget, only if you are using jupyter notebook (not jupyter lab) Model in production: You can deploy your model in simple way. I referred fast. ai, and used render(it’s free for limited time). You can find detailed document here. and you can create a route like this. @app. route( /classify-url , methods=[ GET ])async def classify_url(request): bytes = await get_bytes(request. query_params[ url ]) img = open_image(BytesIO(bytes)) _,_,losses = learner. predict(img) return JSONResponse({ predictions : sorted( zip(cat_learner. data. classes, map(float, losses)), key=lambda p: p[1], reverse=True ) })You can find my deployed model here Reference: How to create a deep learning dataset using Google Images towardsdatascience - one cycle policy Deep Residual Learning for Image Recognition ↩ Accuracy_and_precision ↩ "
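Putting the pieces above together, here is a minimal end-to-end sketch in fastai v1 (the folder layout, class names, and seed are my assumptions; the original notebook is linked above):

~~~python
from fastai.vision import *

path = Path('data/cities')  # hypothetical: one subfolder per class

# drop broken downloads before building the DataBunch
for c in ['Seoul', 'Munich', 'Sanfrancisco']:
    verify_images(path/c, delete=True, max_size=500)

np.random.seed(42)  # no predefined split, so hold out 20% for validation
data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224,
                                  bs=64).normalize(imagenet_stats)

learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.lr_find()                  # plot with learn.recorder.plot()
learn.fit_one_cycle(4)           # train just the head first
learn.unfreeze()                 # then fine-tune the whole model
learn.fit_one_cycle(2, max_lr=slice(1e-5, 1e-3))
~~~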
}, {
- "id": 20,
+ "id": 22,
"url": "http://localhost:4000/2020/02/dps-week5/",
"title": "Digital Product School week 5",
"body": "2020/02/09 - The 5th week retropect at Digital Product School Week 5 - Create a Storymap and sync it with Lean Canvas This week's schedule CONTENT: How to create our story map Prepare your story Discover your product’s AI potentialMondayHow to create our story map: We need this 'aha' moment There was a Milestone workshop, about our weekly goal. As we are agile working, we go fast and change every week’s goal. This week we will finalize our story map based on user’s pain-point and HMW questions. How should we make our story-map Basically we should make story map based on this rule Tell stories, don’t just write them! We always need context, that means all the story component should be connected Visualize your product to establish a shared understanding and speed up discussions! Post-it filled of text is not enough, we should fill it with visualizations then team mates can understand it fast Only discuss in front our your story map! (Speed) So we can update our story-map as soon as we change our opinion And also Use a story map to find the parts that matter most and to identify holes in your idea! Since the story map consists of techinical part, we should consider each story’s technical feasibility Minimise output, maximise outcome and impact! Build tests to figure out what’s minimum and what’s viable! This story map functions to find out our minimum value of ideas Work iteratively: Change your story map according to your learnings! We should repeat this process again and again PMs: Make sure Storymap is up to date!Prepare your story: team cero, our whole story map Our goal Technical feasibility of our storyWhat is your strategy to make user achieve something? This would be our expand point Discover your product’s AI potential: How can we apply AI to our product? Let’s write down our ‘HMW’ questions, and find out all p ossibilities. These are suggestion of possibilities, so don’t attached to feasibility (we will do in at lean start-up) Software section's expectation AI section's expectationTuesday Engineer's task, week5This 5th week, engineers settled WendesdayThursdayFriday"
}, {
- "id": 21,
+ "id": 23,
"url": "http://localhost:4000/2020/02/GPU-time/",
"title": "4 reasons took much time to setting GPU for fast.ai than I expected",
"body": "2020/02/05 - Motivation: Before now, me as a undergraduate student, I was parsimony who usually depend on colab, kaggle, friend’s server(occasional) whenever i need GPU. . And this time it’s been for a while to install GPU than I expected and I share the several component that stood in my way. Written at Oct 24 2019, if you think this is deprecated, please do not have a leap of faith. Just for the record, I’ve used Kaggle, Colab, GCP, Azure, EC2 as GPU cloud. 1. Did not know there is JupyterLab option in Google Cloud Platform. : At the first time when GCP came out, there was no AI Platform service. So from starting vm instance to launching jupyter and installing packages, I did all of the things myself. (and I learned 🤗) $ curl -O https://repo. continuum. io/archive/Anaconda3-5. 0. 1-Linux-x86_64. sh[Downloading conda in ssh] I created VM instance,selected zone, machine type and disk type. Then, define firewall rules and in ssh terminal, install jupyter and other packages. But you can do all of these things just using AI Platform. [AI Platform] I think it especially save your time if you are living in Asia-Pacific, which google doesn’t support not that much GPU resources. 2. Consider if the platform has limited resources in a region you live in. : I live in South Korea, East Asia, and it seems like this region has lots of limitation in GPU (except quite expensive AWS) And the Taiwan which was the only one region where I can launch my own VM with GPU (I tried all the other regions in the list) sometimes do normaly, but not always. 😥After launching, I did several works and next day I could not start VM. (I didn’t count it, but tried it a few hours because I didn’t want cost any more time…) Endlessly failed to start instance, then I choose to move AWS as an alternative way. 3. Fast. ai gives deliberate guide and I didn’t know it. : Fast. ai offer the guide for all available platform. (Colab, salamander, Gradient, Kaggle, Colab, and so on) It is so important, and really needs, because cloud computing options are vary as occasion and purpose arise. I didn’t know fast. ai has manual to running GCP, and I think it’s as good a reason as any for me to be have taken time. It helped me so much when I had aws and shortened my time. I don’t want to read all of the manual in amazno. . (It is recommended. . but I’d rather read GIT PRO now…) ssh -i ~/. ssh/<your_private_key_pair> -L localhost:8888:localhost:8888 ubuntu@<your instance IP>4. You should wait to add more volume just after add volume, by building AWS EC2. : Since Elastic Block Store(EBS) storage supports optimized storage, users can’t extend storage volume two times in a row. Unfortunately, at the first time, I didn’t know it (again 👻) and when VM lacked volume, I doubled dist capacity (76*2) at a rough but It needs more. <!– this time I installed GPU in two years, and it became little complicated compared to 2 years ago. And this time for the first time(maybe not the first time. . but i handled it in my class or with my friend. but it’s my first time on my own. ) I very I’m started to using used google colab, kaggleand, GCP-JupyterLab, ec2 - friend made, aws vm machine but I had a environment variable but i did not know of it. On these days, I could not get a resources from taiwan… I couldn’t notice a deliberate Anyway, as a result I tried myself gcp myself and aws ec2 with fast. 
ai But I think doing on my self surely takes much time (in this point I wonder why I’m doing this, and should remind me, especially I was studying disk volume optimization) disk volume exceed - https://askubuntu. com/questions/919748/no-space-left-on-device-even-though-there-is: "
}, {
- "id": 22,
+ "id": 24,
"url": "http://localhost:4000/2020/02/dps-week4/",
"title": "Digital Product School week 4",
"body": "2020/02/01 - The 4th week retropect at Digital Product School Week 4 - Find solution ideas and run experiments [This week’s schedule] CONTENT: Ideation Techniques What is ideation techniques? Generating idea in my team AIdeation Team brain storming of idea Die Produkt MacherMondayIdeation Techniques: [slides from @steffen] What is ideation techniques?: We tried to find out user’s painpoint last week. Tried to users talk about their, pain point. No question directly, but extract from them their pain with transportation. Generating idea in my team: AIdeation: TuesdayTeam brain storming of idea: Based on generated idea on Monday, we extended our idea doing rolling-paper! Die Produkt Macher: What is lean start-up? Lean startup is a methodology for developing businesses and products that aims to shorten product development cycles and rapidly discover if a proposed business model is viable; this is achieved by adopting a combination of business-hypothesis-driven experimentation, iterative product releases, and validated learning. - wikipedia WendesdayThursdayFriday"
}, {
- "id": 23,
+ "id": 25,
"url": "http://localhost:4000/2020/01/retrosprect-of-acl-paper-2020/",
"title": "Retrospect of ACL 2020 paper writing",
"body": "2020/01/29 - 2020 Annual Conference of the Association for Computational Linguistics Why I can’t use ‘Cebuano’ for the research?: Why I had to change target language from ‘Cebuano’ to ‘Tagalog’?-> No language translator options except google translation. But before knowing that I already consult my friend, whose mother tongue is English. So I had to aplogize her, but couldn’t tell her why suddenly I changed my plan. -> I realized there are many languages even can’t be researched at all. . -> Getting accustomed to discrimination makes misunderstanding, sometimes. At my country, we couldn’t use music streaming service, because of legal problem. But at that moment, I thought it was discrimination, which is done by music company. "
}, {
- "id": 24,
+ "id": 26,
"url": "http://localhost:4000/2020/01/Git-Merge/",
"title": "Why am I not listed as a contributor?!",
"body": "2020/01/10 - From the end of last year, big changes have witnessed in NLP research. Embracing an unprecedented growth, I started to study new exciting results and advances. In doing so, I noticed I’m not listed as contributor of repo which my PR accessed. How did I come to a repository?: When I’m stuck, I would prefer to code, than to go deep in theory. (It must be so. . too much to understand 🤒)It was BERT released by Google AI I felt keenly the necessity of implementing, because not only couldn’t understand the way they figured out positional encoding formula, but how it actually works. What does it mean to “scale” dot product in Attention? (Now I know it’s far from my section 😂) Figure 1. Scaled Dot Product. Adopted from tensorflow blogWhat was the code error?: For implement code in paper, I read the papers Transformer and BERT, structured the model, and refered the others’ code. Meanwhile, I found out a small error in tokenization process, which was changing a token into [MASK], enabled bidirectional representation. I’ve made PR, and got merged. But I was not in contributors. Why?: Figure 2. Merged Pull request Adopted from graykode projectActually I happened to know there can be couple of reasons github doesn’t include my name as contributor. Well, if contributors tab has more than 100 people, in which case it shows you up only if you are in the top 100 contributors because displaying too many contributors can make webpages down. Somethimes, however, it doesn’t that problem. Why not? Two possibilities are there. First, According to Joel-Glovier, if repository maintainer merged-as-a-rebase PR will end up showing as maintainer’s commit. But maintainer shouldn’t normally do this. Second, if you happend to commit using a different git email that what is in your GitHub profile, it will not be attached to your Github user, and “doesn’t show up” as you. Reference: Michał Chromiak’s blog Github: why are my contributions are not showing on my profile atlassian-gitfetch"
}, {
- "id": 25,
- "url": "http://localhost:4000/2019/12/lesson1-fastai/",
- "title": "Fine Grained Classification",
- "body": "2019/12/31 - Finally you can solve the mystery behind this weird drawing. . through this course. juptyer notebook magic: %reload_ext autoreload%autoreload 2%matplotlib inlinethis is special directives to jupyter notebook, not python code. And it is called ‘magics’ (but i think jeremy is magicion) If somebody changes underlying library code while I’m running this, please reload it automatically If somebody asks to plot something, then please plot it here in this Jupyter NotebookDon’t hesitate to import start~ Digging into untar_data, path. ls: Union[pathlib. Path, str]: typed programming language? -> maybe i think disclaim the type beforehand for sure. Q. like assert? path. ls()this is some module that fast. ai made because os. listdir(‘path’) is unconvinient. Python3 pathlib library!: pathlib "
- }, {
- "id": 26,
+ "id": 27,
"url": "http://localhost:4000/2019/12/jeremy-howard/",
"title": "Jeremy Howard",
"body": "2019/12/15 - This is journey to find out ‘who am I trying to be?’: How he impacted me? The person who made me start Computer Vision again. He emphasized the importance of studying NLP and Computer together to understand the deep-learning. He didn’t order it to study, but always he pursuade me with reasonable way. “It’s not just something I can throw away. NLP and computer vision a few weeks apart and that’s going to force your brain to realize like ‘oh I have to remember this’” He made me admit my failure in deep-learning. I started to objectify where am I. What should I do when I’m frustrated. “Keep going. You’re not expected to remember everything. Yet. You’re not expected to understand everything. Yet. You’re not expected to know why everything works. Yet. ” His articles are numerous, below. What is torch. nn Really? High Performance Numeric Programming with Swift: Explorations and Reflections C++11, random distributions, and Swift And especially, I like this book. Designing great data products Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing. Designing great data products And he is also famous for words. Here are some. we’re going to try and use that to really understand what’s going on. So to warn you, none of it is rocket science but a lot of its going to look really new. So don’t expect to get it the first time but expect to listen and jump into the notebook try a few things test things out look particularly at like tensor shapes and inputs and outputs to check your understanding then go back and listen again. But and kind of try it, a few times, because you will get there right, it’s just that there’s going to be a lot of new concepts because we haven’t done that much stuff in pure Pytorch. Lesson 6: Deep Learning 2019 "
}, {
- "id": 27,
+ "id": 28,
"url": "http://localhost:4000/2019/11/julia-evans/",
"title": "Julia Evans",
"body": "2019/11/20 - This is journey to find out ‘who am I trying to be?’: The women who surprised me in many ways. First, she approached me to teaching some concepts drawing cartoons. It was at Hackers news, which was hightest ranks. Personally I have the use of not to reading title, so and cartoon was so cute and clear. I naturally gonna understood mechanism and astonished by her explaination ability. Her value, which she was taught by many people so want to do same things, moved me. Volume of her knowledge, that just reading post title is a deal of work, amazed me. "
}, {
- "id": 28,
+ "id": 29,
"url": "http://localhost:4000/2019/11/coc-retropective/",
"title": "Retrospective on Pycon 2019 Korea (CoC Committee)",
"body": "2019/11/05 - When I was volunteer, it seems like busy and hectic to managing that crowded conference. In my experience, to get things moving, it needs hierarchy. But it didn’t. Organizers emphasized our responsibility, and if I passed each other’s burden, It could be my burden next time. In solidarity of the obligation, we finished conference well. And after participating PyCon Korea 2018 as volunteer, I’ve joined PyCon Korea Organizer last year. <Figure 1> First meeting of PyCon 2019 Korea Organizers It’s been a while since PyCon 2019 finished. It’s held on Aug 15 - 18, at Coex Grand Balloom <Figure 2> Ongoing session, speaking on news comment processing <Figure 3> Sponsor Booth iin Coex Hall <Figure 4> After PyCon 2019, with all of volunteer, organizer, speakers 😍 🥰 Serving as part of the coc TF, I spent large fraction of last year doing CoC job. here’s the path what we’ve been grappled with to grasp a solution. First half: Before the conference Toward Diverse Community: Formally we’ve been reusing and modifying PyCon US CoC, but we needed fit in Korean and I was part of that to revise code of conduct. Except ‘That’ Diversity, Because it is ‘Harassment’: Specific point was harassment, and the others were not. process of finding the points. How can we settle this point?Second half: During the conference Handling the potential Harassment: Disjunction of policy and real-time situation: This ‘PyCon 2019 Korea retrospective series’ would be devided into 3 Episodes. “Retrospective on Pycon 2019 Korea (CoC Committee)” “Retrospective on Pycon 2019 Korea (Program Chair)” (20 Nov, To Be Update) “Maintaining participation while still making timely decisions” (29 Nov, To Be Update)"
}, {
- "id": 29,
+ "id": 30,
"url": "http://localhost:4000/2019/11/elif-shafak/",
"title": "Elif Shafak",
"body": "2019/11/05 - This is journey to find out ‘who am I trying to be?’: For creative-minded people, Istanbul is a treasure. ’ Photo © Chris Boland, licensed under CC BY-NC-ND 2. 0 it suddenly felt like what I was trying to convey was more complicated and detailed than what the circumstances allowed me to say. And I did what I usually do in similar situations: I stammered, I shut down, and I stopped talking. I stopped talking because the truth was complicated, even though I knew, deep within, that one should never, ever remain silent for fear of complexity. <Figure 1> Elif Shafak Photo credit: www. elifsafak. com. tr I want to talk about emotions and the need to boost our emotional intelligence. I think it’s a pity that mainstream political theory pays very little attention to emotions. Oftentimes, analysts and experts are so busy with data and metrics that they seem to forget those things in life that are difficult to measure and perhaps impossible to cluster under statistical models. But I think this is a mistake, for two main reasons. We are emotional beings. I think it’s going to be one of our biggest intellectual challenges, because our political systems are replete with emotions. In country after country, we have seen illiberal politicians exploiting these emotions. And yet within the academia and among the intelligentsia, we are yet to take emotions seriously. I think we should. 1 2 Reference: British Council Worldwide ↩ Ted Talk ↩ "
}, {
- "id": 30,
+ "id": 31,
"url": "http://localhost:4000/2019/01/dps-week1/",
"title": "Digital Product School week 1",
"body": "2019/01/11 - The 1th week retropect at Digital Product School [This week’s schedule] CONTENT: Welcome to Digital Product School! Trip to Spitzingsee Welcome to Design Office Specifying our goal of product Welcome to Digital Product School!: Trip to Spitzingsee: At the first day of Digital Product School, we had a off-site with all of batch 9 people. All the costs were managed by dps. At the beautiful mountain, we settled the team, and got my team goal. Basically, there are two kind of team in DPS. (1) Wild team - the team has fixed topic(2) Company team - the team which has specific stakeholders, and also topic defined by that stakeholders The Core-team will fix what team you will join in DPS for 3 months based on ymy professionals, they announce it at off-site. [My team for 3 months at DPS] And we decide on my batch #9 theme song. How? Each team draw for songs and pitch ‘why this song should be batch #9 theme song’The result? Imagine dragon - Believer (I didn’t know at the moment, this song would be stamped in my memory) We have a workshop for getting to know each other. For example, we share 1) what do I expect from 3 months of dps, 2) when I feel happy in my life time, 3) what I worked for last week, 4) what was my last project and 5) what plays important role in my life My team's board Cero Welcome to Design Office: At first day of design office, we had workshop, which celebrates my day in dps also discuss specific rule, menifesto and stakeholders We get sticker and attach it in map depends on my nationality Now time to get to know my team’s stakeholders. What they want for us? What they expect from us? How free my team are on the topic?To be honest, it is endless tug-of-war. We should discuss with my stakeholders, endlessly, and find out solution which can meet interest of users, stakeholders and my team. Basically, my team’s main stakeholder is ADAC, but BMW, City of munich and Nokia will also participate as my team’s stakeholders. Specifying our goal of product: "
diff --git a/_site/2020/02/dps-week8/index.html b/_site/2020/02/dps-week8/index.html
index 1040d821ee..47b0b2adc4 100644
--- a/_site/2020/02/dps-week8/index.html
+++ b/_site/2020/02/dps-week8/index.html
@@ -4,24 +4,24 @@
-
Digital Product School week 8&9 | SpellOnYou
+
My life in Digital Product School - week 8/9/10 | SpellOnYou
-
Digital Product School week 8&9 | SpellOnYou
+
My life in Digital Product School - week 8/9/10 | SpellOnYou
-
+
-
-
+
+
-
+
+{"description":"The 8/9/10th week retropect at Digital Product School","author":{"@type":"Person","name":"dionne"},"@type":"BlogPosting","url":"http://localhost:4000/2020/02/dps-week8/","publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"http://localhost:4000/assets/images/logo.png"},"name":"dionne"},"image":"http://localhost:4000/assets/images/week8/gate.png","headline":"My life in Digital Product School - week 8/19/10","dateModified":"2020-02-24T00:00:00+09:00","datePublished":"2020-02-24T00:00:00+09:00","mainEntityOfPage":{"@type":"WebPage","@id":"http://localhost:4000/2020/02/dps-week8/"},"@context":"http://schema.org"}
@@ -161,96 +161,101 @@
"body": " {% if page. url == / %} {% assign latest_post = site. posts[0] %} <div class= topfirstimage style= background-image: url({% if latest_post. image contains :// %}{{ latest_post. image }}{% else %} {{site. baseurl}}/{{ latest_post. image}}{% endif %}); height: 200px; background-size: cover; background-repeat: no-repeat; ></div> {{ latest_post. title }} : {{ latest_post. excerpt | strip_html | strip_newlines | truncate: 136 }} In {% for category in latest_post. categories %} {{ category }}, {% endfor %} {{ latest_post. date | date: '%b %d, %Y' }} {%- assign second_post = site. posts[1] -%} {% if second_post. image %} <img class= w-100 src= {% if second_post. image contains :// %}{{ second_post. image }}{% else %}{{ second_post. image | absolute_url }}{% endif %} alt= {{ second_post. title }} > {% endif %} {{ second_post. title }} : In {% for category in second_post. categories %} {{ category }}, {% endfor %} {{ second_post. date | date: '%b %d, %Y' }} {%- assign third_post = site. posts[2] -%} {% if third_post. image %} <img class= w-100 src= {% if third_post. image contains :// %}{{ third_post. image }}{% else %}{{site. baseurl}}/{{ third_post. image }}{% endif %} alt= {{ third_post. title }} > {% endif %} {{ third_post. title }} : In {% for category in third_post. categories %} {{ category }}, {% endfor %} {{ third_post. date | date: '%b %d, %Y' }} {%- assign fourth_post = site. posts[3] -%} {% if fourth_post. image %} <img class= w-100 src= {% if fourth_post. image contains :// %}{{ fourth_post. image }}{% else %}{{site. baseurl}}/{{ fourth_post. image }}{% endif %} alt= {{ fourth_post. title }} > {% endif %} {{ fourth_post. title }} : In {% for category in fourth_post. categories %} {{ category }}, {% endfor %} {{ fourth_post. date | date: '%b %d, %Y' }} {% for post in site. posts %} {% if post. tags contains sticky %} {{post. title}} {{ post. excerpt | strip_html | strip_newlines | truncate: 136 }} Read More {% endif %}{% endfor %} {% endif %} All Stories: {% for post in paginator. posts %} {% include main-loop-card. html %} {% endfor %} {% if paginator. total_pages > 1 %} {% if paginator. previous_page %} « Prev {% else %} « {% endif %} {% for page in (1. . paginator. total_pages) %} {% if page == paginator. page %} {{ page }} {% elsif page == 1 %} {{ page }} {% else %} {{ page }} {% endif %} {% endfor %} {% if paginator. next_page %} Next » {% else %} » {% endif %} {% endif %} {% include sidebar-featured. html %} "
}, {
"id": 12,
+ "url": "http://localhost:4000/2020/04/v3-2019-lesson06-note/",
+ "title": "fastai 2019 course-v3 Part1, lesson06",
+ "body": "2020/04/15 - Lesson 06Rossmann(Tabular): Tabular data: be careful on Categorical variable vs Continuous variable. if datatype is int, fastai think it is classification, not a regression. Root mean square percentage error. as loss function. When you assign the y_range, it’s better to assign little bit more than actual maximum. > because it’s sigmoid. intermediate layers, which is weight matrix is 1) 1000, and 2) 500 -> which means our parameter would be 500*1000. learn. modelWhat is dropout and embedding dropout?: Nitish Srivastava, Dropout: A Simple way to prevent Neural Networks from Overfitting you can dropout with p value, make it specified to specific layer, or make it applied to all the layers. Pytorch code 1) bernoulli, which decides whether you will hold it? 2) and divide the noise value depends on noise value. so noise became 2 or remain 0. According to pytorch code, We do change at training time, but we do nothing at test time. and this means you don’t have to do anything special with inference time. ’ TODO: find at forums what is inference time - Related to NVIDIA, GPU. Embedding dropout is just a dropout. It’s different between continuous variable and embedding layer. TODO Still can’t understand. why embedding dropout is effective. or,… in need. Let’s delete at random, some of the results of the embedding. and It worked well especially at Kaggle Batch Normalization: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift -> came out false! According to How Does Batch Normalization Help Optimization? The key was multiplicative bias {\gamma} and additive bias {\beta}` Explain Let $$ \hat{y} = f(w_1, w_2, w_3, … , x)} $$ , loss = MSE , Then y_range should be between 1 and 5` And Activation function ends with -1 -> +1 To mitigate this problem, we can add the other parameter, like $$w_n$$ But there’re so much interactions in the process so just re-scale the output. Momentum parameter at BatchNorm1d: Different from momentum like in optimization. This momentum is Exponentially weighted moving average of the mean, instead of deviation. If this is small number: mean standard deviation would be less from mini_batch to mini_batch » less regularization effect. (If this is large number, variation would be greater from mini_batch to mini_batch » more regularization effect) TODO: can’t sure, but i understand, this is not about how to update parameter but about how much reflect previous value when scale and shift Q. Preference between batchnorm and the other regularizations(drop out, weight decay)A. Nope, always try and see the results## lesson6-pets-more### Data Augmentation- Last reg- `get_transforms` has lots of params (even not yet learned all) -> check documentation - Remember you can implement all the doc contents bc it's made from nbdev - TODO: try this!!- Essence of data augmentation is you should maintain the label, while somewhat making sense. - ex) tilt, because it's optically sensible, you can always change the angle of the data view. - zeros, border, and reflection but always `reflection` works most of the time, so that is the default### Convolutional Kernel(What is convolution?)- Will make heat\_map from scratch, which means the parts convolution focuses on![setosa_visualization]()- http://setosa. io/ev/image-kernels/ - javascript thing - How convolution works - Kernel. which does element-wise multiplication, and sum them up - so it has on pixel less at borders -> so it uses padding, and fastai uses reflection as said. 
- why this Kernel(matrix) helps catching horizontal edge side? - because this kernel`(picture2)` weights differently, depends on `x axis` - why familiar, because it's similar intuition with fugus`(paper)` paper- CNN from different viewpoints`link` - output of pixel is results from different linear equations. - If you connect this with represents of neural network nodes, you can see that the specific inp nodes connected with specific out nodes. - **Summarize**: cnn does 1) matmul some of the elements are always zero 2) same weight for every row, which is called `weight time? weight. . ?, 1:18:50` `(picture)`#### Further lowdown- Because generally image has 3 channels, we need rank 3 kernel. - And **do multiply with all channel output is one pixel**. (`draw by your self`) - but this kernel will catch one feature, like horizontal, so that we make more kernel so that output becomes (h * w * kernel) - And that `kernel` come to `channel`- **Conv2d**: with 3 by 3 kernel, stride 2 conv -> (h/2 * w/2 * kernel) - skip or jump over input pixel - to protect from memory out of control~~~pythonlearn. modellearn. summary()~~~TODO: understand yourself the blocks of conv-kernel: - Usually use big kernel size at first layer (will study this at part2)- Bottom right highlighting kernel(`pic / draw`)- `torch. tensor. expand`: for memory efficient, because we should do RGB- We do not make separate kernel, but make rank 4 kernel - 4d tensor is just stacked kernel- `t[None]. shape` create new unit axis, and why? we make this -> it should move unit of batch, not one size image. ### Average pooling, feature- suppose our pre-trained model results in size of `11 by 11 by 512 ` `pic 4` and my classification task has 37 classes * take the first face of channel, which is 11 by 11 and `mean` it, so that make rank 2 tensor, 512 by 1 * and make 2d matrix, which is 512 by 37 and multiply so that we can get 37 by 1 matrix. - Feature, at convolution block - So, when we transfer-learning without unfreeze, every element of last matrix (512 by 1) should represent(or could catch) each feature. ### Heatmap, Hook~~~hook_output(model[0]) -> acts -> avg_acts~~~- if we average the block with `axis=feature`, result of matrix(11 by 11) depicts `how activated was that area?` -> it is heatmap, `avg_acts`- and acts comes from hook, which is more advanced pytorch feature. - hook into pytorch machine itself, and run any arbitrary Pytorch code - Why this is cool?: Normally it gives set of outputs of forward pass, but we can interrupt and hook the forward pass. - Also can store the output of the convolutional part of the model, which is before avg_pooling- Thinking back when we do cut off `after` the conv part. - but with fast. ai the original convolutional part of the model would be *the first thing in the model*, specifically could be given from `learn. model. eval()[0]` - And this is gotten from `hooked_output` and having hooked the output, we can pass our x_minibatch to output. - Not directly, but with normalized, minibatch, put on to the gpu - `one_item()` function do it, when we have one data `TODO: this is assignment` do it yourself without one_item function - and `. cuda()` put it on gpu- you should print out very often the shape of tensor, and try think why. "
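Since the hook mechanism is the crux of the heatmap section, here is a minimal PyTorch-only sketch (a stand-in model, not fastai's hook_output; the shapes are chosen to mirror the 11 by 11 by 512 example):

~~~python
import torch
import torch.nn as nn

# stand-in for the convolutional part of a pre-trained model
body = nn.Sequential(nn.Conv2d(3, 512, 3, stride=2, padding=1),
                     nn.ReLU(), nn.AdaptiveAvgPool2d(11))

acts = {}
def hook(module, inp, out):
    # interrupt the forward pass and stash the activations
    acts['conv'] = out.detach()

handle = body.register_forward_hook(hook)
x = torch.randn(1, 3, 352, 352)   # one normalized 'mini-batch' of one image
_ = body(x)
handle.remove()

fmap = acts['conv'][0]            # (512, 11, 11) feature map
heatmap = fmap.mean(0)            # average over the feature axis -> (11, 11)
print(heatmap.shape)              # 'how activated was that area?'
~~~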
+ }, {
+ "id": 13,
+ "url": "http://localhost:4000/2020/04/qna-image-segmentation/",
+ "title": "[Q&A] Image Segmentation, using Unet with Driving Video data",
+ "body": "2020/04/02 - This post is about my questions while I was studying USF Deep Learning course about image segmentation task. All the answers are from the course, source code, library document, or document. I cared about being clear at reporting information including source of information, however if there are still anything unclear, please contact me. And thank you Jeremy&Rachael for everything. Also Thank you Cambridge Computer Vision Lab to made us to study with your labor. The Cambridge-driving Labeled Video Database (CamVid) is the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes. If someone is interested in this project, please check the site and see the details. Now, let’s start first using jupyter’s one of tricks which I love most. It enables cell to print the code without print function. from IPython. core. interactiveshell import InteractiveShell# pretty print all cell's output and not just the last oneInteractiveShell. ast_node_interactivity = all from fastai. vision import *from fastai. callbacks. hooks import *from fastai. utils. mem import *path = untar_data(URLs. CAMVID) # The locations where the data and models are downloaded are set in config. ymlpath. ls() I’m trying to accustomed to using pathlib module, not just it became built-in module in python, but I felt uncomfortable myself with os module. However, still unpredictable conflicts are remain, even in the quite standard library like Pytorch, tensorflow, onnx. (it require me string for path. not PosixPath. will send PR. . ) [PosixPath('/root/. fastai/data/camvid/valid. txt'), PosixPath('/root/. fastai/data/camvid/images'), PosixPath('/root/. fastai/data/camvid/labels'), PosixPath('/root/. fastai/data/camvid/codes. txt')]path_img = path/'images'path_lbl = path/'labels'fnames = get_image_files(path_img) #filenamelbl_names = get_image_files(path_lbl)1. (Play with data) My Hypothesis: File name has A_B format. and A / B would be at key-value position. Use collections - defaultdict Default Dict: Link: easy to group a sequence of key and value pairs into a dictionary of list?from collections import defaultdictfnames[0], lbl_names[0](PosixPath('/root/. fastai/data/camvid/images/0001TP_009210. png'), PosixPath('/root/. fastai/data/camvid/labels/0016E5_01800_P. png'))files = [tuple(i. stem. split('_')) for i in fnames]labels = [tuple(i. stem. split('_')[:-1]) for i in lbl_names]d = defaultdict(list)for k, v in files: d[k]. append(v)d. keys()len(d['0001TP'])124for k, v in d. 
items(): print(k, v)0001TP ['009210', '008850', '007350', '008970', '009840', '010140', '008490', '008520', '009540', '008250', '008340', '006840', '007860', '007410', '007740', '009870', '010080', '007890', '008790', '010020', '008400', '007080', '008280', '010380', '009330', '009060', '007470', '006810', '009720', '008580', '007110', '008730', '009150', '007680', '009780', '007800', '007290', '008760', '009510', '008640', '008310', '007440', '006900', '007500', '008460', '009030', '008130', '009480', '009900', '010230', '009270', '008040', '007590', '007950', '009990', '008550', '007260', '008100', '007530', '006960', '008190', '009420', '009930', '009000', '007830', '008940', '006690', '009570', '008880', '010170', '007560', '009300', '006750', '009360', '010200', '007320', '008010', '009120', '007620', '007200', '007140', '010320', '006720', '008670', '007230', '008370', '010260', '009690', '006930', '009090', '007770', '010290', '010350', '008610', '008070', '009600', '008430', '009450', '007380', '009240', '007710', '007170', '008160', '008910', '007020', '006780', '007050', '009960', '009810', '008220', '009180', '009750', '010050', '009660', '010110', '007920', '009630', '007650', '006990', '008700', '009390', '007980', '008820', '006870']0016E5 ['01290', '08159', '05760', '08133', '08063', '06660', '00960', '05850', '00750', '06960', '08035', '08107', '07975', '08017', '05610', '07140', '08119', '08027', '07170', '08400', '08093', '02100', '06390', '04470', '08340', '06060', '00600', '07470', '08151', '07800', '01620', '05730', '01530', '00690', '08430', '05940', '01980', '07320', '08069', '07965', '04380', '05430', '01410', '06780', '08007', '08087', '08079', '06600', '08109', '05490', '00901', '04590', '04680', '08045', '01770', '06690', '08085', '06810', '00420', '08011', '07440', '02190', '06300', '04800', '01500', '00450', '08029', '01470', '06330', '07997', '08067', '05370', '08013', '08190', '00840', '02370', '08049', '08135', '01440', '06870', '05820', '05280', '08051', '04440', '08091', '01380', '00630', '07290', '05520', '04770', '00540', '07995', '07999', '05550', '07920', '08101', '08141', '08053', '04620', '08103', '05160', '07350', '08057', '06030', '06000', '08550', '07963', '08089', '05970', '08047', '05640', '06240', '05220', '04350', '01590', '07959', '01950', '08117', '06180', '01560', '05400', '08043', '07680', '00780', '08081', '07050', '01020', '01350', '04530', '06720', '07969', '08149', '08003', '08131', '08129', '08033', '05460', '01650', '07530', '08023', '05340', '08640', '05100', '08075', '01230', '04980', '02070', '01080', '06210', '05910', '08009', '01800', '05190', '02400', '08083', '08019', '07620', '07200', '07890', '08059', '06990', '04410', '08121', '08123', '06930', '08137', '08147', '08095', '06570', '06150', '08153', '06840', '05250', '00510', '08370', '08580', '08113', '07410', '08097', '01200', '04950', '07770', '07650', '04710', '06090', '08055', '07110', '07981', '00990', '08250', '08127', '01920', '07985', '08220', '08005', '08157', '05130', '08071', '01140', '04830', '07740', '08143', '06120', '02040', '08111', '08115', '00660', '08280', '06420', '07983', '02220', '05700', '01860', '01260', '04920', '06510', '07020', '08073', '08105', '08125', '06360', '07860', '07993', '00810', '06540', '08099', '08139', '02010', '07973', '08155', '07991', '06630', '00480', '06750', '04890', '08001', '08025', '00870', '08490', '01830', '07977', '05010', '01170', '07961', '01680', '01050', '07987', '07080', '04560', '00930', '05310', '02340', '05790', 
'08460', '00720', '08031', '02280', '08039', '08037', '08065', '06270', '08077', '06900', '04650', '06480', '07230', '08041', '06450', '00570', '07989', '04740', '07979', '02250', '07380', '00390', '01710', '07590', '08021', '08520', '07500', '01110', '04500', '02310', '07971', '02130', '05580', '05880', '08610', '08310', '08145', '05670', '04860', '07260', '08015', '07967', '01740', '01320', '07560', '07830', '01890', '08061', '02160', '07710', '05070', '05040']Seq05VD ['f00030', 'f02550', 'f03450', 'f01110', 'f00480', 'f00210', 'f04590', 'f04170', 'f01800', 'f03990', 'f03360', 'f03900', 'f02070', 'f00810', 'f03690', 'f01350', 'f01530', 'f04980', 'f05100', 'f03060', 'f00900', 'f03870', 'f02460', 'f01470', 'f02370', 'f02820', 'f04080', 'f02760', 'f04860', 'f02250', 'f04200', 'f00270', 'f03720', 'f02850', 'f04410', 'f01200', 'f03090', 'f02010', 'f03930', 'f00090', 'f01650', 'f01890', 'f03840', 'f03030', 'f02130', 'f01230', 'f04110', 'f02520', 'f04140', 'f04020', 'f00060', 'f03420', 'f01560', 'f00120', 'f04290', 'f02340', 'f00300', 'f01380', 'f00870', 'f01860', 'f02970', 'f04560', 'f02730', 'f00330', 'f04530', 'f03780', 'f01770', 'f03390', 'f05040', 'f02430', 'f03330', 'f00660', 'f01740', 'f02100', 'f04800', 'f04050', 'f00510', 'f02790', 'f04350', 'f00690', 'f00540', 'f02490', 'f00960', 'f00930', 'f04230', 'f02880', 'f03600', 'f01020', 'f01500', 'f02400', 'f04830', 'f04470', 'f03300', 'f02670', 'f00450', 'f01980', 'f01170', 'f01620', 'f04500', 'f01080', 'f03180', 'f05070', 'f03150', 'f04950', 'f01440', 'f03510', 'f01710', 'f00360', 'f04770', 'f02910', 'f01050', 'f00630', 'f04320', 'f00570', 'f03240', 'f02190', 'f01140', 'f03540', 'f02220', 'f02640', 'f03960', 'f00000', 'f04920', 'f01950', 'f00990', 'f03480', 'f03000', 'f00420', 'f04620', 'f03210', 'f00780', 'f03570', 'f01590', 'f00750', 'f01920', 'f04650', 'f03750', 'f03630', 'f02310', 'f02610', 'f02580', 'f04740', 'f02280', 'f04680', 'f00390', 'f00720', 'f03660', 'f02040', 'f03270', 'f00180', 'f03810', 'f01410', 'f01290', 'f03120', 'f00840', 'f04440', 'f00150', 'f01260', 'f02700', 'f02940', 'f00600', 'f01830', 'f04260', 'f05010', 'f04890', 'f02160', 'f00240', 'f04380', 'f01680', 'f04710', 'f01320']0006R0 ['f02820', 'f03690', 'f03180', 'f02550', 'f01020', 'f03660', 'f02340', 'f01170', 'f02610', 'f02940', 'f01290', 'f02100', 'f01350', 'f03270', 'f03870', 'f01380', 'f01980', 'f03810', 'f02430', 'f02310', 'f01830', 'f03480', 'f02970', 'f01890', 'f03210', 'f03930', 'f02040', 'f02070', 'f02400', 'f01560', 'f03030', 'f01770', 'f01590', 'f01950', 'f03420', 'f01650', 'f03450', 'f00990', 'f03630', 'f01500', 'f03570', 'f00930', 'f03090', 'f03360', 'f02880', 'f02460', 'f01440', 'f01920', 'f01230', 'f03840', 'f02730', 'f01620', 'f02220', 'f03750', 'f03330', 'f03540', 'f02520', 'f02790', 'f01050', 'f03120', 'f01800', 'f01140', 'f01860', 'f01530', 'f01470', 'f02670', 'f02490', 'f01260', 'f01110', 'f02760', 'f01680', 'f03150', 'f02580', 'f03300', 'f02280', 'f01200', 'f03390', 'f03510', 'f02640', 'f02190', 'f02370', 'f01320', 'f02130', 'f03600', 'f03240', 'f03780', 'f03720', 'f02700', 'f01410', 'f01080', 'f02850', 'f01710', 'f03900', 'f03060', 'f01740', 'f02010', 'f02250', 'f00960', 'f03000', 'f02160', 'f02910']for k, v in d. items(): print(k, len(d[k]))0001TP 1240016E5 305Seq05VD 1710006R0 101for i in d2. keys(): print(i,len(d2[i]))0016E5 3050001TP 1240006R0 101Seq05VD 171files[0], labels[0](('0001TP', '009210'), ('0016E5', '01800'))2. My question: Link: Why do we need masking? and does color from fastai library? 
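To sanity-check the hypothesis, here is a minimal sketch of my own; it assumes fnames and lbl_names from above and the '_P' label-suffix convention visible in the CamVid paths:
~~~python
# Every image stem, plus the '_P' suffix CamVid's label files carry,
# should appear among the label stems.
img_stems = {f.stem for f in fnames}
lbl_stems = {l.stem for l in lbl_names}
assert all(f'{s}_P' in lbl_stems for s in img_stems)
~~~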
2. My questions: Link: Why do we need masking? And does the color come from the fastai library? (I have to look into the source code.) What does the parameter alpha do? When people make a masked image, is there a bounded integer range? Is image normalization related to this? lbl_sorted = sorted(lbl_names) f_sorted = sorted(fnames) lbl_1 = lbl_sorted[33] f_1 = f_sorted[33] img = open_image(lbl_1) mask = open_mask(lbl_1) _, axs = plt.subplots(1, 2, figsize=(10, 5)) # img.show(ax=axs[0], y=mask, title='masked') img.show(ax=axs[0], title='1') mask.show(ax=axs[1], title='2', alpha=1.) img_2 = open_image(f_1) mask_2 = open_mask(f_1) _, axs = plt.subplots(1, 2, figsize=(10, 5)) img_2.show(ax=axs[0], title='3') mask_2.show(ax=axs[1], title='4', alpha=1.) open_mask(lbl_1).data.shape torch.Size([1, 720, 960]) open_image(f_1).data.shape torch.Size([3, 720, 960]) img.data # the label file opened as an image: floats in [0, 1] tensor([[[0.0157, 0.0157, …, 0.0824], …, [0.0667, 0.0667, …, 0.1176]]]) (full printout truncated) mask.data # the same label file opened as a mask: integer class codes per pixel tensor([[[ 4, 4, 4, …, 21, 21, 21], …, [17, 17, 17, …, 30, 30, 30]]]) img_2.data, mask_2.data # the photo as floats in [0, 1], and as raw integers when opened as a mask (tensor([[[0.0706, 0.0667, …]]]), tensor([[[ 18, 17, 18, …, 183, 186, 188], …]]))
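Before moving on, a toy illustration of those two data types; this is my own sketch of the div=True behaviour described below, not the library's code:
~~~python
import torch

# open_image divides raw pixel values by 255 into floats in [0, 1];
# open_mask keeps them as integer class codes.
raw = torch.tensor([[0, 128, 255]])
as_image = raw.float() / 255   # tensor([[0.0000, 0.5020, 1.0000]])
as_mask = raw.long()           # tensor([[  0, 128, 255]])
~~~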
3. What is the difference between Image and ImageSegment? imageSegment: An ImageSegment object has the same properties as an Image. The only difference is that when applying transformations to an ImageSegment, it will ignore the functions that deal with lighting and keep values of 0 and 1. It's easy to show the segmentation mask over the associated Image by using the y argument of show_image. img = open_image(fnames[0]) mask = open_mask(lbl_names[0]) _, axs = plt.subplots(1, 3, figsize=(8, 4)) img.show(ax=axs[0], title='no mask') img.show(ax=axs[1], y=mask, title='masked') # segmentation mask over the image, using the y argument mask.show(ax=axs[2], title='mask only', alpha=1.) 4. Why/how is the image divided by 255, and how does fast.ai do it? vision.image: if div=True, pixel values are divided by 255 to become floats between 0. and 1. At times you want to get rid of distortions caused by lights and shadows in an image. Normalizing the RGB values of an image can be a simple and effective way of achieving this: each channel is divided by the sum of the pixel's values over all channels, so the normalized values are R/S, G/S and B/S (where S = R + G + B). Detailed explanation here. 5. Python evaluation order: Python evaluates expressions from left to right. Notice that while evaluating an assignment, the right-hand side is evaluated before the left-hand side. mask_tmp, trg_tmp, void_tmp = 2, 1, 10 mask_tmp = trg_tmp != void_tmp print(mask_tmp, trg_tmp, void_tmp) # (1) target is not the same as void True 1 10 # Example 1 x = 1 y = 2 x, y = y, x + y x, y (2, 3) # Example 2 x = 1 y = 2 x = y y = x + y x, y (2, 4) 6. Model learner parameter: pct_start: A: Percentage of the total number of epochs during which the learning rate rises within one cycle. Q: Sorry, I'm still confused: one cycle in the new API only runs one epoch, so how does the percentage of the total number of epochs work? Can you give an example, e.g. learn.fit_one_cycle(10, slice(1e-4,1e-3,1e-2), pct_start=0.05)? A: OK, the strictly correct answer would be percentage of iterations, so the lr can both increase and decrease during the same epoch. In your example, say you have 100 iterations per epoch; then for half an epoch (0.05 * (10 * 100) = 50 iterations) the lr will rise, then slowly decrease. Q2: Thanks for this explanation... so essentially, it is the percentage of overall iterations where the LR is increasing, correct? So given the default of 0.3, the LR goes up for 30% of your iterations and then decreases over the last 70%. Is that a correct summation of what is happening? A2: Yes, I think that's correct. You can verify it by changing the value and checking learn.recorder.plot_lr(), for example with pct_start = 0.2. source: forums.fastai
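As a small sanity check of that answer, here is the same arithmetic as code (the numbers are the hypothetical ones from the Q&A above):
~~~python
# fit_one_cycle(10, ..., pct_start=0.05) with 100 iterations per epoch:
epochs, iters_per_epoch, pct_start = 10, 100, 0.05
total = epochs * iters_per_epoch
rising = int(total * pct_start)
print(rising, total - rising)  # 50 iterations rising, 950 decreasing
~~~
"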
+ }, {
+ "id": 14,
"url": "http://localhost:4000/2020/03/note08-fastai-4/",
"title": "Gradient backward, Chain Rule, Refactoring",
- "body": "2020/03/02 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring” Lecture 08 - Deep Learning From Foundations-part2 “ Homework: calculus for machine learning einsum conventionCONTENTS: Foundation version Gradients backward pass decompose function chain rule with code check the result using Pytorch autograd Refactor model Layers as classes Modue. forward() Without einsum nn. Linear and nn. Module Forward process Foundation version: Gradients backward pass: Gradients is output with respect to parameter we’ve done this work in this path(below) to simplify this calculus, we can just change it into, So, you should know of the derivative of each bit on its own, and then you multiply them all together. As a result, it would be over cross over the data. So you can get gradient, output with respect to parameter What order should we calculate? BTW, why Jeremy wrote , not Loss function?1 decompose function We want to get derivative of which forms But, we have a estimation of answer (we call it y hat) now So, I will decompose funciton to trace target variable. Using the above forward pass, we can suppose some function from the end. start from , We know MSE funciton got two parameters, output, and target . from MSE’s input we know function’s output and supposing v is input of that function, similarly, v became output of chain rule with code examplify backward process by random sampling To get a variable, I modified forward model a little def model_ping(out = 'x_train'): l1 = lin(x_train, w1, b1) # one linear layer l2 = relu(l1) # one relu layer l3 = lin(l2, w2, b2) # one more linear layer return eval(out) Be careful we don’t use mse_loss in backward process1) start with the very last function, which is loss funciton. MSE If we codify this formula,def mse_grad(inp, targ): #mse_input(1000,1), mse_targ (1000,1) # grad of loss with respect to output of previous layer inp. g = 2. * (inp. squeeze() - targ). unsqueeze(-1) / inp. shape[0] And, this can be examplified like below. Notice that input of gradient function is same with forward functiony_hat = model_ping('l3') #get value from forward modely_hat. g = ((y_hat. squeeze(-1)-y_train). unsqueeze(-1))/y_hat. shape[0]y_hat. g. shape>>> torch. Size([50000, 1]) We can just calculate using broadcasting, not using squeeze. then why should do and unsqueeze again?🎯 It’s related with random access memory(RAM). . If I don’t squeeze, (I’m using colab) it out of RAM. 2) Derivative of linear2 function This process’s weight dimensions defined by axis=1, axis=2. axis=0 dimension means size of data. This will be summazed by . sum(0) method. unsqeeze(-1)&unsqeeze(1) seperates the dimension, and make a dot product, and vanish axis=0 dimension. def lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowlin2 = model_ping('l2'); #get value from forward modellin2. g = y_hat. g@w2. t(); w2. g = (lin2. unsqueeze(-1) * y_hat. g. unsqueeze(1)). sum(0);b2. g = y_hat. g. sum(0);lin2. g. shape, w2. g. shape, b2. g. shape>>> torch. Size([50000, 50])torch. Size([50, 1])torch. Size([1]) Notice going reverse order, we’re passing in gradient backward3) derivative of ReLU def relu_grad(inp, out): # grad of relu with respect to input activations inp. 
g = (inp>0). float() * out. g Examplified belowlin1=model_ping('l1') #get value from forward modellin1. g = (lin1>0). float() * lin2. g;lin1. g. shape>>> torch. Size([50000, 50])4) Derivative of linear1 Same process with 2) but, this process’s weight hasdef lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowx_train. g = lin1. g @ w1. t(); w1. g = (x_train. unsqueeze(-1) * lin1. g. unsqueeze(1)). sum(0); b1. g = lin1. g. sum(0);x_train. g. shape, w1. g. shape, b1. g. shape>>> torch. Size([50000, 784])torch. Size([784, 50])torch. Size([50])5) Then it goes backward pass def forward_and_backward(inp, targ): # forward pass: l1 = inp @ w1 + b1 l2 = relu(l1) out = l2 @ w2 + b2 # we don't actually need the loss in backward! loss = mse(out, targ) # backward pass: mse_grad(out, targ) lin_grad(l2, out, w2, b2) relu_grad(l1, l2) lin_grad(inp, l1, w1, b1)Version 1 (Basic)- Wall time: 1. 95 s Summary Notice that output of function at forward pass became input of backward pass backpropagation is just the chain rule value loss (loss=mse(out,targ)) is not used in gradient calcuation. Because, it doesn’t appear with the weight. w1g, w2g, b1g, b2g, ig will be used for optimizercheck the result using Pytorch autograd require_grad_ is the magical function, which can automatic differentiation. 2 This magical auto gradified tensor keep track what happend in forward (taking loss function), and do the backward3 So it saves our time to differentiate ourselves ⤵️ THis is benchmark…. . Version 2 (torch autograd)- Wall time: 3. 81 µs Refactor model: Amazingly, just refactoring our main pieces, it comes down up to Pytorch package. 🌟 Implement yourself, Practice, practice, practice! 🌟 Layers as classes: Relu and Linear are layers in oue neural net. -> make it as classes For the forward, using __call__ for the both of forward & backward. Because ‘call’ means we treat this as a function. class Lin(): def __init__(self, w, b): self. w,self. b = w,b def __call__(self, inp): self. inp = inp self. out = inp@self. w + self. b return self. out def backward(self): self. inp. g = self. out. g @ self. w. t() # Creating a giant outer product, just to sum it, is inefficient! self. w. g = (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) self. b. g = self. out. g. sum(0) Remember that in lin_grad function, we save bias&weight!!!!!💬 inp. g : gradient of the output with respect to the input. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 w. g : gradient of the output with respect to the weight. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 b. g : gradient of the output with respect to the bias. {: style=”color:grey; font-size: 90%; text-align: center;”} class Model(): def __init__(self, w1, b1, w2, b2): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ) def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() refer to Jeremy’s Model class, he put layers in list Dionne’s self-study note: Decomposing Jeremy’s Model class init needs weight, bias but not x data when call that class(a. k. a function) it gave x data and y label! jeremy composited function in layers. x = l(x) so concise…. . 
also utilized that layer list when backward ust reversing it (using python list’s method) And he is recursively calling the function on the result of the previous thing. ⬇️for l in self. layers: x = l(x)Q2: Don’t I need to declare magical autograd function, requires_grad_?{: style=”color:red; font-size: 130%; text-align: center;”} [The questions migrated to this article] Version 3 (refactoring - layer to class)- Wall time: 5. 25 µs Modue. forward(): Duplicate code makes execution time slow. Role of __call__ changed. No more __call__ for implementing forward pass. By initializing the forward with __call__, Module. forward() use overriding to maximize reusability. So any layer inherit Module, can use parent’s function. gradient of the output with respect to the weight (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) can be reexpressed using einsum, torch. einsum( bi,bj->ij , inp, out. g) Defining forward and Module enables Pytorch to out almost duplicatesVersion 4 (Module & einsum)- Wall time: 4. 29 µs Q2: Isn’t there any way to use broadcasting? Why we should use outer product?{: style=”color:red; font-size: 130%; text-align: center;”} Without einsum: Replacing einsum to matrix product is even more faster. torch. einsum( bi,bj->ij , inp, out. g)can be reexpressed using matrix product, inp. t() @ out. gVersion 5 (without einsum)- Wall time: 3. 81 µs nn. Linear and nn. Module: Torch’s package nn. Linear and nn. Module Version 6 (torch package)- Wall time: 5. 01 µs Final, Using torch. nn. Linear & torch. nn. Module~~~pythonclass Model(nn. Module): def init(self, n_in, nh, n_out): super(). init() self. layers = [nn. Linear(n_in,nh), nn. ReLU(), nn. Linear(nh,n_out)] self. loss = mse def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x. squeeze(), targ)class Model(): def init(self): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ)def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() ~~~ Footnote: fast. ai forums Lesson-8 ↩ pytorch docs - autograd ↩ stackoverflow - finding methods a object has ↩ "
+ "body": "2020/03/02 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring ” Lecture 08 - Deep Learning From Foundations-part2 “ Homework: calculus for machine learning einsum conventionCONTENTS: Foundation version Gradients backward pass decompose function chain rule with code check the result using Pytorch autograd Refactor model Layers as classes Modue. forward() Without einsum nn. Linear and nn. Module Forward process Foundation version: Gradients backward pass: Gradients is output with respect to parameter we’ve done this work in this path(below) to simplify this calculus, we can just change it into, So, you should know of the derivative of each bit on its own, and then you multiply them all together. As a result, it would be over cross over the data. So you can get gradient, output with respect to parameter What order should we calculate? BTW, why Jeremy wrote , not Loss function?1 decompose function We want to get derivative of which forms But, we have a estimation of answer (we call it y hat) now So, I will decompose funciton to trace target variable. Using the above forward pass, we can suppose some function from the end. start from , We know MSE funciton got two parameters, output, and target . from MSE’s input we know function’s output and supposing v is input of that function, similarly, v became output of chain rule with code examplify backward process by random sampling To get a variable, I modified forward model a little def model_ping(out = 'x_train'): l1 = lin(x_train, w1, b1) # one linear layer l2 = relu(l1) # one relu layer l3 = lin(l2, w2, b2) # one more linear layer return eval(out) Be careful we don’t use mse_loss in backward process1) start with the very last function, which is loss funciton. MSE If we codify this formula,def mse_grad(inp, targ): #mse_input(1000,1), mse_targ (1000,1) # grad of loss with respect to output of previous layer inp. g = 2. * (inp. squeeze() - targ). unsqueeze(-1) / inp. shape[0] And, this can be examplified like below. Notice that input of gradient function is same with forward functiony_hat = model_ping('l3') #get value from forward modely_hat. g = ((y_hat. squeeze(-1)-y_train). unsqueeze(-1))/y_hat. shape[0]y_hat. g. shape>>> torch. Size([50000, 1]) We can just calculate using broadcasting, not using squeeze. then why should do and unsqueeze again?🎯 It’s related with random access memory(RAM). . If I don’t squeeze, (I’m using colab) it out of RAM. 2) Derivative of linear2 function This process’s weight dimensions defined by axis=1, axis=2. axis=0 dimension means size of data. This will be summazed by . sum(0) method. unsqeeze(-1)&unsqeeze(1) seperates the dimension, and make a dot product, and vanish axis=0 dimension. def lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowlin2 = model_ping('l2'); #get value from forward modellin2. g = y_hat. g@w2. t(); w2. g = (lin2. unsqueeze(-1) * y_hat. g. unsqueeze(1)). sum(0);b2. g = y_hat. g. sum(0);lin2. g. shape, w2. g. shape, b2. g. shape>>> torch. Size([50000, 50])torch. Size([50, 1])torch. Size([1]) Notice going reverse order, we’re passing in gradient backward3) derivative of ReLU def relu_grad(inp, out): # grad of relu with respect to input activations inp. 
Refactor model: Amazingly, just by refactoring our main pieces, the code converges toward what the PyTorch package itself looks like. 🌟 Implement it yourself: practice, practice, practice! 🌟 Layers as classes: ReLU and Linear are layers in our neural net -> make them classes. For the forward pass, we use __call__ for both forward & backward, because 'call' means we treat the object as a function. class Lin(): def __init__(self, w, b): self.w, self.b = w, b def __call__(self, inp): self.inp = inp self.out = inp @ self.w + self.b return self.out def backward(self): self.inp.g = self.out.g @ self.w.t() # Creating a giant outer product, just to sum it, is inefficient! self.w.g = (self.inp.unsqueeze(-1) * self.out.g.unsqueeze(1)).sum(0) self.b.g = self.out.g.sum(0) Remember that in the lin_grad function we saved the bias & weight gradients!!!!! 💬 inp.g: gradient of the output with respect to the input. 💬 w.g: gradient of the output with respect to the weight. 💬 b.g: gradient of the output with respect to the bias. class Model(): def __init__(self, w1, b1, w2, b2): self.layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self.loss = Mse() def __call__(self, x, targ): for l in self.layers: x = l(x) return self.loss(x, targ) def backward(self): self.loss.backward() for l in reversed(self.layers): l.backward() Referring to Jeremy's Model class: he put the layers in a list. Dionne's self-study note, decomposing Jeremy's Model class: __init__ needs the weights and biases, but not the x data; when you call the class (a.k.a. as a function) you give it the x data and the y label! Jeremy composed the functions in layers - x = l(x), so concise - and he also utilized that layer list for backward by just reversing it (using Python's list method), recursively calling each function on the result of the previous one. ⬇️ for l in self.layers: x = l(x) Q2: Don't I need to declare the magical autograd function, requires_grad_? [The questions migrated to this article] Version 3 (refactoring - layer to class) - Wall time: 5.25 µs
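The next refactor moves the shared __call__ logic into a base class. As a rough sketch of the shape this takes, reconstructed from the description below (so details may differ from the lesson notebook):
~~~python
class Module():
    def __call__(self, *args):
        self.args = args
        self.out = self.forward(*args)   # each layer overrides forward()
        return self.out
    def forward(self): raise NotImplementedError
    def backward(self): self.bwd(self.out, *self.args)

class Relu(Module):
    def forward(self, inp): return inp.clamp_min(0.) - 0.5
    def bwd(self, out, inp): inp.g = (inp > 0).float() * out.g
~~~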
Module.forward(): Duplicate code makes execution slow. The role of __call__ changed: no more __call__ for implementing the forward pass itself. By routing the forward pass through __call__, Module.forward() uses overriding to maximize reusability, so any layer that inherits Module can use the parent's functions. The gradient of the output with respect to the weight, (self.inp.unsqueeze(-1) * self.out.g.unsqueeze(1)).sum(0), can be re-expressed using einsum: torch.einsum('bi,bj->ij', inp, out.g). Defining forward and Module lets us factor out almost all the duplication. Version 4 (Module & einsum) - Wall time: 4.29 µs Q2: Isn't there any way to use broadcasting? Why should we use an outer product? Without einsum: Replacing einsum with a plain matrix product is even faster: torch.einsum('bi,bj->ij', inp, out.g) can be re-expressed as the matrix product inp.t() @ out.g. Version 5 (without einsum) - Wall time: 3.81 µs nn.Linear and nn.Module: Torch's own nn.Linear and nn.Module. Version 6 (torch package) - Wall time: 5.01 µs Finally, using torch.nn.Linear & torch.nn.Module: ~~~python
class Model(nn.Module):
    def __init__(self, n_in, nh, n_out):
        super().__init__()
        self.layers = [nn.Linear(n_in,nh), nn.ReLU(), nn.Linear(nh,n_out)]
        self.loss = mse
    def __call__(self, x, targ):
        for l in self.layers: x = l(x)
        return self.loss(x.squeeze(), targ)
~~~ Footnote: fast.ai forums Lesson-8 ↩ pytorch docs - autograd ↩ stackoverflow - finding methods an object has ↩ "
}, {
- "id": 13,
+ "id": 15,
"url": "http://localhost:4000/2020/03/note08-fastai-3/",
"title": "Implement forward&backward pass from scratch",
"body": "2020/03/01 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring1. The forward and backward passes: 1. 1 Normalization: train_mean,train_std = x_train. mean(),x_train. std()>>> train_mean,train_std(tensor(0. 1304), tensor(0. 3073))Remember! Dataset, which is x_train, mean and standard deviation is not 0&1. But we need them to be which means we should substract means and divide data by std. You should not standarlize validation set because training set and validation set should be aparted. after normalize, mean is close to zero, and standard deviation is close to 1. 1. 2 Variable definition: n,m: size of the training set c: the number of activations we need in our model2. Foundation Version: 2. 1 Basic architecture: Our model has one hidden layer, output to have 10 activations, used in cross entropy. But in process of building architecture, we will use mean square error, output to have 1 activations and lator change it to cross entropy number of hidden unit; 50see below pic We want to make w1&w2 mean and std be 0&1. why initializating and make mean zero and std one is important? paper highlighting importance of normalisation - training 10,000 layer network without regularisation1 2. 1. 1 simplified kaiming initQ: Why we did init, normalize with only validation data? Because we can not handle and get statistics from each value of x_valid?{: style=”color:red; font-size: 130%; text-align: center;”} what about hidden(first) layer?w1 = torch. randn(m,nh)b1 = torch. zeros(nh)t = lin(x_valid, w1, b1) # hidden>>> t. mean(), t. std()((tensor(2. 3191), tensor(27. 0303))In output(second) layer, w2 = torch. randn(nh,1)b2 = torch. zeros(1)t2 = lin(t, w2, b2) # output>>> t2. mean(), t2. std()(tensor(-58. 2665), tensor(170. 9717)) which is terribly far from normalzed value. But if we apply simplified kaiming init w1 = torch. randn(m,nh)/math. sqrt(m); b1 = torch. zeros(nh)w2 = torch. randn(nh,1)/math. sqrt(nh); b2 = torch. zeros(1)t = lin(x_valid, w1, b1)t. mean(),t. std()>>> (tensor(-0. 0516), tensor(0. 9354)) But, actually, we use activations not only linear function After applying activations relu at linear layer, mean and deviation became 0. 5. 2. 1. 2 Glorrot initializationPaper2: Understanding the difficulty of training deep feedforward neural networks Gaussian(, bell shaped, normal distributions) is not trained very well. How to initialize neural nets? with the size of layer , the number of filters . But there is No acount for import of ReLU If we got 1000 layers, vanishing gradients problem emerges2. 1. 3 Kaiming initializatingPaper3: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Kaiming He, explained here rectifier: rectified linear unit rectifier network: neural network with rectifier linear units This is kaiming init, and why suddenly replace one to two on a top? to avoid vanishing gradient(weights) But it doesn’t give very nice mean tough. 2. 1. 4 Pytorch package Why fan_out? according to pytorch documentation, choosing 'fan_in' preserves the magnitude of the variance of the wights in the forward pass. choosing 'fan_out' preserves the magnitues in the backward pass(, which means matmul; with transposed matrix) ➡️ in the other words, torch use fan_out cz pytorch transpose in linear transformaton. What about CNN in Pytorch?I tried torch. nn. 
What about CNNs in PyTorch? I tried torch.nn.Conv2d.conv2d_forward??; Jeremy dug in using torch.nn.modules.conv._ConvNd.reset_parameters??2 In PyTorch, kaiming init doesn't seem to be implemented with the right formula, so we should use our own operation. This has actually been discussed in the PyTorch community before.3 4 Jeremy said it enhanced the variance as well, so I sampled 100 times and counted the better results. To make sure the shapes seem sensible, check with assert (remember we will replace 1 with 10 for cross entropy): assert model(x_valid).shape == torch.Size([x_valid.shape[0], 1]) >>> model(x_valid).shape (10000, 1) We have made relu, init, and linear; it seems we can write the forward-pass code we need for the basic architecture: nh = 50 def lin(x, w, b): return x@w + b w1 = torch.randn(m,nh)*math.sqrt(2./m); b1 = torch.zeros(nh) w2 = torch.randn(nh,1); b2 = torch.zeros(1) def relu(x): return x.clamp_min(0.) - 0.5 t1 = relu(lin(x_valid, w1, b1)) def model(xb): l1 = lin(xb, w1, b1) l2 = relu(l1) l3 = lin(l2, w2, b2) return l3 2.2 Loss function: MSE: Mean squared error needs a rank-1 vector, so we remove the trailing unit axis: def mse(output, targ): return (output.squeeze(-1) - targ).pow(2).mean() In Python, to remove an axis you use 'squeeze', and to add an axis you use 'unsqueeze'. torch.squeeze is where code commonly breaks, so when you use squeeze, clarify the dimension axis you want to remove: tmp = torch.tensor([1,1]) tmp.squeeze() >>> tensor([1, 1]) Make sure to convert to float when you calculate. But why??? Because it is a tensor? Here's the error when I don't transform the data type: --------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-22-ae6009bef8b4> in <module>() ----> 1 y_train = get_data()[1] # call data again 2 mse(preds, y_train) TypeError: 'map' object is not subscriptable This is the forward pass. Footnote: Other materials: Understanding the difficulty of training deep feedforward neural networks, the paper that introduced Xavier initialization; Fixup Initialization: Residual Learning Without Normalization ↩ Pytorch implementation of Kaiming init for conv and linear layers ↩ Pytorch kaiming init issue ↩ Pytorch kaiming init explained ↩ "
}, {
- "id": 14,
+ "id": 16,
"url": "http://localhost:4000/2020/03/note08-fastai-2/",
"title": "What's inside Pytorch Operator?",
"body": "2020/03/01 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, RefactoringWhat’s inside Pytorch Operator?: Section02 Time comparison with pure Python: Matmul with broadcasting> 3194. 95 times faster Einstein summation> 16090. 91 times faster Pytorch’s operator> 49166. 67 times faster 1. Elementwise op: 1. 1 Frobenius norm: above converted into (m*m). sum(). sqrt() Plus, don’t suffer from mathmatical symbols. He also copy and paste that equations from wikipedia. and if you need latex form, download it from archive. 2. Elementwise Matmul: What is the meaning of elementwise? We do not calculate each component. But all of the component at once. Because, length of column of A and row of B are fixed. How much time we saved? So now that takes 1. 37ms. We have removed one line of code and it is a 178 times faster…#TODOI don’t know where the 5 from. but keep it. Maybe this is related with frobenius norm…?as a result, the code before for k in range(ac): c[i,j] += a[i,k] + b[k,j]the code after c[i,j] = (a[i,:] * b[:,j]). sum()To compare it (result betweet original and adjusted version) we use not test_eq but other function. The reason for this is that due to rounding errors from math operations, matrices may not be exactly the same. As a result, we want a function that will “is a equal to b within some tolerance” #exportdef near(a,b): return torch. allclose(a, b, rtol=1e-3, atol=1e-5)def test_near(a,b): test(a,b,near)test_near(t1, matmul(m1, m2))3. Broadcasting: Now, we will use the broadcasting and removec[i,j] = (a[i,:] * b[:,j]). sum() How it works?>>> a=tensor([[10,10,10], [20,20,20], [30,30,30]])>>> b=tensor([1,2,3,])>>> a,b (tensor([[10, 10, 10], [20, 20, 20], [30, 30, 30]]),tensor([1, 2, 3])) >>> a+btensor([[11, 12, 13], [21, 22, 23], [31, 32, 33]]) <Figure 2> demonstrated how array b is broadcasting(or copied but not occupy memory) to compatible with a. Refered from numpy_tutorial there is no loop, but it seems there is exactly the loop. This is not from jeremy (actually after a moment he cover it) but i wondered How to broadcast an array by columns? c=tensor([[1],[2],[3]])a+ctensor([[11, 11, 11], [22, 22, 22], [33, 33, 33]])s What is tensor. stride()?help(t. stride)Help on built-in function stride: stride(…) method of torch. Tensor instancestride(dim) -> tuple or intReturns the stride of :attr:’self’ tensor. Stride is the jump necessary to go from one element to the next one in the specified dimension :attr:’dim’. A tuple of all strides is returned when no argument is passed in. Otherwise, an integer value is returned as the stride in the particular dimension :attr:’dim’. Args: dim (int, optional): the desired dimension in which stride is requiredExample::* x = torch. tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])`x. stride()>>> (5, 1)x. stride(0)>>> 5x. stride(-1)>>> 1 unsqueeze & None index We can manipulate rank of tensor Special value ‘None’, which means please squeeze a new axis here== please broadcast herec = torch. tensor([10,20,30])c[None,:] in c, squeeze a new axis in here please. 2. 2 Matmul with broadcasting: for i in range(ar):# c[i,j] = (a[i,:]). *[:,j]. sum() #previous c[i] = (a[i]. unsqueeze(-1) * b). sum(dim=0) And Using None also (As howard teached)c[i] = (a[i ]. unsqueeze(-1) * b). sum(dim=0) #howardc[i] = (a[i][:,None] * b). sum(dim=0) # using Nonec[i] = (a[i,:,None]*b). 
2.2 Matmul with broadcasting: for i in range(ar): # c[i,j] = (a[i,:] * b[:,j]).sum() # previous c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0) And using None as well (as Howard taught): c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0) # howard c[i] = (a[i][:,None] * b).sum(dim=0) # using None c[i] = (a[i,:,None] * b).sum(dim=0) ⭐️Tips🌟 1) Anytime there's a trailing (final) colon in numpy or pytorch you can delete it, e.g. c[i, :] = c[i] 2) Any number of colon-commas at the start can be switched for a single ellipsis, e.g. c[:,:,:,:,i] = c[...,i] 2.3 Broadcasting Rules: What if we multiply a tensor of size [1,3] by one of size [3,1]? We get torch.Size([3, 3]). What is scale???? What if one array's shape is a multiple of the other's? e.g. Image: 256 x 256 x 3, Scale: 128 x 256 x 3, Result: ? Why did broadcasting happen even though I did not insert an axis via None? >>> c * c[:,None] tensor([[100., 200., 300.], [200., 400., 600.], [300., 600., 900.]]) Maybe it broadcasts because the following array has 3 rows; by the same principle, no matter what the original shape was, in an operation one tensor broadcasts to the other. >>> c == c[None] tensor([[True, True, True]]) >>> c[None] == c[None,:] tensor([[True, True, True]]) >>> c[None,:] == c tensor([[True, True, True]]) 3. Einstein summation: Works batch-wise, removes the innermost loop, and replaces it with an elementwise product, a.k.a. c[i,j] += a[i,k] * b[k,j] (innermost loop) becomes c[i,j] = (a[i,:] * b[:,j]).sum() (elementwise product). Because k is repeated, we do a dot product. Usage of einsum(): 1) transpose 2) diagonalisation/tracing 3) batch-wise (matmul) … Einstein summation notation: def matmul(a,b): return torch.einsum('ik,kj->ij', a, b) So after all, we are now about 16,000 times faster than pure Python. 4. Pytorch op: 49166.67 times faster than pure Python. We will use this matrix multiplication in the fully connected forward pass, with some initialized parameters and ReLU. But before that, we need the initialized parameters and ReLU. Footnote: TensorRank ti note Resources: Frobenius Norm Review, Broadcasting Review (especially the rules; refer to the colab! I was totally confused by the extension of arrays), torch.allclose Review, np.einsum Review "
}, {
- "id": 15,
+ "id": 17,
"url": "http://localhost:4000/2020/02/note08-fastai-1/",
"title": "What is the meaning of 'deep-learning from foundations?'",
"body": "2020/02/29 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring” Lecture 08 - Deep Learning From Foundations-part2 “ I don’t know if you read this article, but I heartily appreciate Rachael Thomas and Jeremy Howard for providing these priceless lectures for free Homework: Review concepts 16 concepts from Course 1 (lessons 1 - 7)(1) Affine Functions & non-linearities; 2) Parameters & activations; 3) Random initialization & transfer learning; 4) SGD, Momentum, Adam; 5) Convolutions; Batch-norm; 6) Dropout; 7) Data augmentation; 8) Weight decay; 9) Res/dense blocks; 10) Image classification and regression; 11)Embeddings; 12) Continuous & Categorical variables; 13) Collaborative filtering; 14) Language models; 15) NLP classification; 16) Segmentation; U-net; GANS) Make sure you understand broadcasting Read section 2. 2 in Delving Deep into Rectifiers Try to replicate as much of the notebooks as you can without peeking; when you get stuck, peek at the lesson notebook, but then close it and try to do it yourself calculus for machine learning based on weight… einsum conventionCONTENTS: What is going on in this course? What is ‘from foundations’? Steps to a basic modern CNN model Today’s implementation goal: 1) matmul -> 4) FC backward Library development using jupyter notebook jupyter notebook certainly can make module Elementwise ops How can we make python faster? What is element wise operation? FootnoteWhat is going on in this course?: What is ‘from foundations’?: 1) Recreate fast. ai and Pytorch 2) using pure python Evade OverfittingOverfit : validation error getting worsetraining loss < validation loss Know the name of the symbol you usefind in this page if you don’t know the symbol that you are using or just draw it here (run by ML!) Steps to a basic modern CNN model: 1) Matrix multiplication -> 2) Relu/Initialization -> 3) Fully-connected Forward-> 4) Fully-connected Backward -> 5) Train loop -> 6) Convolution-> 7) Optimization ->8) Batchnormalization -> 9) Resnet Today’s implementation goal: 1) matmul -> 4) FC backward: Library development using jupyter notebook: what is assers? jupyter notebook certainly can make module: There will be #export tag that Howard (and we) want to extract special notebook2script. py will detect sign of #expert and convert following into python module and test ittest\_eq(TEST,'test')test\_eq(TEST,'test1') what is run_notebook. py? when you want to test your module in command line interface !python run\_notebook. py 01_matmul. ipynb Is there any difference between 1) and 2)?1) test -> test01 2) test01 -> test #TODO I don’t know yet look into run_notebook. py, package fire Jeremy used. What is that?read and run the code in a notebook, and in the process, Jeremy made Python Fire library called!shockingly, fire takes any kind of function and converts into CLI command. fire library was released by Google open source, Thursday, March 2, 2017 Get data pytorch and numpy are pretty much same. variable c explains how many pixels there are in in MNIST, 28 pixels PyTorch’s view() method: torch function that manipulating tensor, and squeeze() in torch & mathmatical operation similar function Rao & McMahan said usually this functions result in feature vector. In part 1, you can use view function several times. 
Initial python model: which is linear, like $Xw$(weight)$+a$(bias) $= Y$. If you don't know how to multiply matrices, refer to this matmul visualization site. How much time does it take if we use a pure Python function? matmul, the typical matrix multiplication function, takes about 1 second to calculate a single training example! (maybe assumed stochastic, 5 data points in validation) It would take about 11.36 hours to update the parameters for even a single layer and 1 iteration! (If that were my computer, it would be 14 hours..)🤪 THIS is why we need to consider 'time' & 'space'. This is kinda slow - what if we could speed it up by 50,000 times? Let's try! Elementwise ops: How can we make Python faster?: If we want to calculate faster, remove the pythonic calculation by passing the computation down to something written in a language other than Python, like PyTorch. According to the PyTorch docs it uses C++ (via ATen), so that is what we are effectively calling; in this course we first implement the same function in Python ourselves. What is an elementwise operation?: items make pairs, and we operate on the corresponding components. Footnote: notebooks material, video, broadcasting excel "
}, {
- "id": 16,
+ "id": 18,
"url": "http://localhost:4000/2020/02/what-is-convolution/",
"title": "Digging into convolution",
"body": "2020/02/28 - Issues 1) Kaiming Initializtion in Pytorch was in trouble. 1 2) Jeremy started to dig in, in lesson09, but I didn’t know why the size of tensor is 2 and even understand this spreadsheet data. 3 Homework: Read Visualizing and Understanding Convolutional Networks paper What is a convolution? Visualization one kernel Matthew D Zeiler & Rob Fergus Paper Convolution can be represented as matmul Padding Kernel has rank 3 How can we find a side-edge, a gradient and area of constant weight? What is a convolution?: A convolutional neural network is that your red, green, and blue pixels go into the simple computation, and something comes out of that, and then the result of that goes into a second layer, and the result of that goes into the third layer and so forth. Visualization: one kernel Refer this site for visualizing CNN filteringMatthew D Zeiler & Rob Fergus PaperLecture01 Nine examples of the actual coefficients from the **first layer** Convolution can be represented as matmul: CNNs from different viewpoints {align-items: center;} [A B C D E F G H I J] is 3 by 3 image data flatten to vector. As a result, convolution is a just matrix just two things happens Some of entries are set to zeros at all the times same color always have the same weight. That called weight time / wegith sharing So, we can implement a convolution with matrix multiplication. But, we don’t do that because it’s slow!Padding: What most of libraries do is just put zeros asdie of matrix fast. ai uses reflection paddings (what is this? Jeremy said he uttered it)Kernel has rank 3: As standard picture input would be 4 5, it would be actually 3d, not 2d. If we make kernel as a 3x3 size, we pass over same kernel all the different Red, Green, Blue Pixels. This could make problem, because, if we want to detect frog, which is green, we would want more activations on the green(I made a test cell in my colab 6) How can we find a side-edge, a gradient and area of constant weight?: Not top-edge! One kernel can find only the top-edge, so we should stack the kernels 7 So, we pass it through bunch of kernels to the input images, and that process gives us height x width x corresponding number of kernels. Usually that number of chanel is 16 And if we want to get the more channels and features, we should repeat that process This process gives rise to memory out of control, we do the stride #### conv-example. xlsx 2 convolutional filters At a second layer, filter is 3x3x2 tensor, because to add up together the first layer’s channel. Reference: Problem was math. sqrt(5) was not kaiming initialization formula, Implementation in Pytorch ↩ size of tensor, lecture09 ↩ conv-example. xlsx ↩ Why do computer use red, green and blue instead of primary colors ↩ Grayscale is a group of shades without any visible color. … Each of these dots has its own brightness level as well and, therefore, can be converted to grayscale. A grayscale image is one with all color information removed. ↩ Testing RGB and grayscale ↩ stack kernel and make new rank of tensor at output, Lesson06-2019 ↩ "
}, {
- "id": 17,
+ "id": 19,
"url": "http://localhost:4000/2020/02/dps-week8/",
- "title": "Digital Product School week 8&9",
- "body": "2020/02/24 - The 8th week retropect at Digital Product School Week 8/9 - Ship your MVP/Release next iteration each day This week's schedule CONTENT: Preparing engineering weekly Agile Process Daily Stand-up Making application flowchart (feat draw. io) / ER diagram Flowchart, understaning user journey ER diagram Engineering weekly AI lunch Connecting firebase andPreparing engineering weekly: This week at Wednesday, I planned to explain the Language Modelings, mainly focusing ELMo, ULMFiT, BERT and GPT-2. Slides is available here Changed the presentation, because there were people who are not in ML domain. hereWhenever I do the presentation, I learn more than the information I give them. At the same time, I realize I need to learn more than I know. Agile Process: One of a priceless lesson I learnt from digital product school, was experience of doing agile work. Before I came here, it was a little bit vague concept. I’m not sure ‘what is agile’ but this is what we tried to make agile process. Daily Stand-up: Sharing the works everyday helps interdisciplinary team to work better. Since product started to get higher fidelity, the gap between engineer and non-engineer increased. Actually I didn’t planned to explain concept because I thougth I would be lose my audience when I start to explain. But as daily stand-up, which shares our progess, goes day by day, I planed and reported the issues. And it made each other’s topic feel more familiar. I think point is very important, because at that point people start to be curious. So we can actively ask to the others, and that momwnr, we can explain the point teammate dosen’t know. Each color means every different section. Red: Our team goal, Blue: Interaction designer, Green: Product manager, Yellow: Software/AI engineer This week engineer's main plan Each of us try to explain what we are doing, but things become easier when we are asked. Because we explained something was important to us before, but if we asked it is something important for the others. Making application flowchart (feat draw. io) / ER diagram: Before we start the party, we should clarify the flowchart and ER diagram of our application. Flowchart, understaning user journey: Thanks for google, we could use draw. io for our framechart framework. Actually, we cana choice other good flatform, but draw. io has connected app throgh google drive, most of our engineer was used to it. And after this job, I got to know there is also (of course) rule with the symbols, color, size, space, scaling and direction of arrow -reference. But why we should do this? WE have made our storymap before!! I think storymap is for visualize our status and app. So it should be shared with whole the team, and they should able to understand each role’s issue. But flowchart is more like testing technical feasibility, and error that user can experience. So it could be little more specific, complicated, and hypothetical. This week engineer's main plan ER diagram: Even if we use NoSQL database through firebase, my team was accustomed to SQL more. That what we educated when we were at college, so we had to organize our concept while we were learning NoSQL. Engineering weekly: Every engineering weekly we exchange our knowledge each other so that we can grow together. Before today, my AI collegues presented regression, knn and it was my turn. I prepared slide that explain about pre-trained language model, but my header advised me if I go deep of theoretical things, I would lose my audience. 
So I decided to brief BERT mode, how I can contribute to other team’s project. Since BERT was breakthrough of NLP industry, I tried to explain how it can be applied to hands on product and how it can help people in their product. The result was quite motivative to me. They gave feedback that since it wasn’t that much theoretical, they could enjoy it, and useful information. Someone asked me do I had learned of presentation before. I was really happy with their feedback! AI lunch: Connecting firebase and: "
+ "title": "My life in Digital Product School - week 8/19/10",
+ "body": "2020/02/24 - The 8/9/10th week retropect at Digital Product School Week 8 - Ship your MVPWeek 9/10 - Release next iteration each day Week 8th schedule CONTENT: Agile Product Development Daily Stand-up(planning) Gemba Walk Sprint Reviews Engineering weeklyAgile Product Development: One of a priceless lesson I learnt from digital product school, was experience of doing agile work. Before I came here, it was a little bit vague concept. I’m still not sure ‘what is agile’ but this is how we tried to make agile process. Daily Stand-up(planning): Sharing the works everyday helps interdisciplinary team to work better. Since product started to get higher fidelity, the gap between engineer and non-engineer increased. Actually I didn’t planned to explain concept because I thougth I would be lose my audience when I start to explain. But as daily stand-up, which shares our progess, goes day by day, I planed and reported the issues. And it made each other’s topic feel more familiar. I think point is very important, because at that point people start to be curious. So we can actively ask to the others, and that momwnr, we can explain the point teammate dosen’t know. Each color means every different section. Red: Our team goal, Blue: Interaction designer, Green: Product manager, Yellow: Software/AI engineer This week engineer's main plan Each of us try to explain what we are doing, but things become easier when we are asked. Because we explained something was important to us before, but if we asked it is something important for the others. Gemba Walk: Team Cero with core team Every 2 weeks, we do the Gemba work, which is ‘question everything to the core team’ time. At this period, people can ask anything related to our product, workshop, and framework. Core team will help just for each team, and each team can solve the problem related to their work. < br/>Why we need this session? because with workshop and general schedule, core team has no time just focus on each team. So through this session, we can have opportunity to understand each program and workshop, like why we are using this platform, and when is the due of our small project, and we have this problem and we need help for this. whatever small problem you have, core team is always willing to help you. Sprint Reviews: Every Friday, we have time to summarise what we did for the week. Maybe we need HMW question and our storymap to share our process and then tell and share what we did try, what point we succeeded and what point it was deviant of our prediction, and why we tried it. . Sprint of Ve-link And then, just after all team’s ppt, we do vote with such a cute marvel. Always it’s very difficult to vote (of course you can’t vote to your team!) Because it depends on criteria what do I value!But since this is process of our agile work, I try to focus on what they have changed since last week, and why they did it, how they did it. Engineering weekly: Every engineering weekly we exchange our knowledge each other so that we can grow together. Everyone have their knowledge to share and we can be tutor and at the same time can be of tutee. Previously, my AI collegues presented regression, knn. And because I’m somewhat specialized to NLP, I prepared slide that explain about pre-trained language model, but my header advised me if I go deep of theoretical things, I would lose my audience. So I decided to brief BERT mode, how I can contribute to other team’s project. 
Since BERT was a breakthrough in the NLP industry, I tried to explain how it can be applied to a hands-on product and how it can help people in their own products. The result was quite motivating for me. They gave feedback that, since it wasn't that theoretical, they could enjoy it and found the information useful. Someone even asked me whether I had taken presentation training before. I was really happy with their feedback! "
}, {
- "id": 18,
+ "id": 20,
"url": "http://localhost:4000/2020/02/fast.ai-nlp-note-16/",
"title": "Algorithmic bias",
"body": "2020/02/20 - Algorithms can encode & magnify human bias Case Study 1: Facial Recognition & Predictive Policing: Joy Buolamwini & Timnit Gebru, gendershades. org Microsoft, FACE+, IBM - All of these things are sell now. Largest gap between $\therefore\ Lighter Male\ >\ Darker\ Female $ This US mayor joked cops should “mount . 50-caliber” guns where AI predicts crime With machine learning, with automation, there’s a 99% success, so that robot is ㅡwill beㅡ99% accurate in telling us what is going to happen next, which is really interesting. - city official in Lancater, CA, approving on using IBM for public security Bias: Bias is type of error Statistical Bias: difference between a statistic’s expected value and the true value Unjust Bias: disproportionate preference for or prejudice against a group Unconscious bias: bias that we don’t realize we have But, term bias is too generic to be productive. Different sources of bias have different causes Representation Bias: Dataset was not representative of the algorithm that might be used on later. Above : Data is okay, but algorithm has some problem. Below : Data has error. For example, object detection production that performs very well in common product of US. But in contrast, change of target product region, like Zimbabwe, Solomon Island, and so on, reduced the performence remarkably. It is not the algorithmic problem, so we should care about data volume of region. Evaluation Bias: Benchmark datasets spur on research, 4. 4% of IJB-A images are dark-skinned women. 2/3 of ImageNet images from the West (Sharkar et al, 2017) Case Study 2: Recidivism Algorithm Used Prison Sentencing: Case Study 3: Online Ad Delivery: Bias in NLP: ( Nothing to do with the course, but I’m researching this field these days. ) But all about Englsih ImpactThe person is doctor. The person is nurse -> 그는 의사다. 그녀는 간호사다. Concept of “biased data” often too generic to be useful: Different sources of bias have different sources Data, models and systems are not unchanging numbers on a screen. They’re the result of a complex process that starts with years of historical context and involves a series of choices and norms, from data measurement to model evaluation to human interpretation. - Harini Suresh, “The problem with Biased Data” Five Sources of Bias in ML: Representation Bias Evaluation Bias Measurement Bias Aggregation Bias(46:02) Historical Bias(46:26) A few studies(47:13) Racial Bias, Even when we have good intentions(new york times)(47:10) gender(48:59) Humans are biased, so why does algorithmic bias matter?: Algorithms & humans are used differently (humans are usually decision maker) Algorithms are accurate and objective No way to apeal if there if error processed large scale cheap Machine learning can amplify bias Machine learning can create feedback loops. Technology is power. And with that comes responsibility. Solutions: Analyze a project at work/school: Questions about AI 5 types of bias (Suresh & Guttag) Datasheets for datasets, Modelcards for model reporting Accuracy rate on different sub-groups Work with domain experts & those impacted Increase diversity in our workspace Advocate for good policy Be on the ongoing lookout for bias"
}, {
- "id": 19,
+ "id": 21,
"url": "http://localhost:4000/2020/02/classifier-city/",
"title": "Making a classifier with image dataset made from gooogle",
"body": "2020/02/15 - CONTENTS: Creating dataset from google images Using google_images_download Create ImageDataBunch Train model fit_one_cycle() Let’s find-tune Let’s train the whole model! Let’s make batch size bigger! Interpretation Model in productionCode can be found hereDeployed model here Making a classifier which can distinguish Seoul from Munich and Sanfrancisco!(hoping my well in Munich!) Creating dataset from google images: In machine learning, you always need data before you build your model. You can use either URLs or google_images_download package. Since Jeremy explained specifically, I will try the other. Using google_images_download: note: This is not google official package Refer to Official Doncument, put that arguments. from google_images_download import google_images_downloadresponse = google_images_download. googleimagesdownload() #class instantiationout_dir = os. path. abspath('. . /. . /materials/dataset/pkg/')os. mkdir(out_dir)arguments = { keywords : Cebu,Munich,Seoul , print_urls :True, suffix_keywords : city , output_directory :out_dir, type : photo , }paths = response. download(arguments) #passing the arguments to the functionprint(paths)and if you need, here is main code. Create ImageDataBunch: We need to separate validation set because we just grabbed these imagese from Google. Most of the dataset we use (kaggle/research) splited into train / validation / test so if they are not devided beforehand we should make databunch, and Jeremy recommended assign 20% to validation. Help on function verify_images in module fastai. vision. data:verify_images(path: Union[pathlib. Path, str], delete: bool = True, max_workers: int = 4, max_size: int = None, recurse: bool = False, dest: Union[pathlib. Path, str] = '. ', n_channels: int = 3, interp=2, ext: str = None, img_format: str = None, resume: bool = None, **kwargs) Check if the images in `path` aren't broken, maybe resize them and copy it in `dest`. Data from google image url Data from package Train model: len(class) len(train) len(valid) Data_url 3 432 108 Data_pkg 3 216 53 Uisng model: restnet34 1, Measurement: accuracy 2 fit_one_cycle(): What is fit one cycle? Cyclical Learning Rates for Training Neural Networks One of the way to find good learning rate. Core idea is to start with small learning rate (like 1e-4, 1e-3) and increase the learning rate after each mini-batch till loss starts exploding. And pick up learning rate one order lower than exploding point. For example, plotted learning rate is like below picture, picking up around 1e-2 is the best way. Why this methods Traditionally, the learning rate is decreased as the learning starts converging with time. But this paper suggests to cycle our learning rate, because it makes us avoid local minimum. Basically this cyclic method enables us to explore whole of loss function so that find out global minimum. In other words, higher learning rate behaves like regularisation. Let’s find-tune: Do train just one last layer by learning rate found by find_lr This section you should find the strongest downward slope that kind of sticking around for quite a while. And choose just one order lower than lowest point. As explained before, I will pick up 1e-2. And of course, this is fine-tuning, we don’t need discriminative learning rate yet. Let’s train the whole model!: link When you plot the learning rate again, maybe you will get soaring shape of learning rate. Rule of thumb, When you slice the learning rate, use learning rate you used at unfrozen part. 
Divide it by 5 or 10 and put it on maximum bound. At minimum bound, get the point just before it soared, and divide it by 10. Let’s make batch size bigger!: Since default batch size is 64, I tried it to 128. And it gets way more better result(even it’s still underfitting!) And if I freeze model and train whole model again, the model would be better. Also, you can use this method to the other big dataset model training! Interpretation: See the confusion matrix. Result is quite great. *Since I’m using colab, I will skip data cleansing. But I highly recommend you to use ImageCleaner widget, only if you are using jupyter notebook (not jupyter lab) Model in production: You can deploy your model in simple way. I referred fast. ai, and used render(it’s free for limited time). You can find detailed document here. and you can create a route like this. @app. route( /classify-url , methods=[ GET ])async def classify_url(request): bytes = await get_bytes(request. query_params[ url ]) img = open_image(BytesIO(bytes)) _,_,losses = learner. predict(img) return JSONResponse({ predictions : sorted( zip(cat_learner. data. classes, map(float, losses)), key=lambda p: p[1], reverse=True ) })You can find my deployed model here Reference: How to create a deep learning dataset using Google Images towardsdatascience - one cycle policy Deep Residual Learning for Image Recognition ↩ Accuracy_and_precision ↩ "
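The two-stage recipe above (train the head first, then unfreeze with a sliced learning rate) can be sketched roughly as follows, assuming a fastai v1 environment; `out_dir` is the download folder from earlier, and the exact rates are placeholders you would read off the `lr_find()` plot rather than fixed values:
~~~python
from fastai.vision import *

# Build a databunch straight from the downloaded keyword folders, holding out
# 20% for validation since Google images come with no predefined split.
data = ImageDataBunch.from_folder(out_dir, train='.', valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224,
                                  bs=64).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet34, metrics=accuracy)

# Stage 1: train only the head, at the rate picked from the lr_find() plot.
learn.fit_one_cycle(4, max_lr=1e-2)

# Stage 2: unfreeze and train the whole model with a sliced learning rate.
# Max bound: the stage-1 rate divided by 5-10; min bound: roughly the point
# just before the loss soars, divided by 10.
learn.unfreeze()
learn.fit_one_cycle(4, max_lr=slice(1e-5, 1e-3))
~~~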
}, {
- "id": 20,
+ "id": 22,
"url": "http://localhost:4000/2020/02/dps-week5/",
"title": "Digital Product School week 5",
"body": "2020/02/09 - The 5th week retropect at Digital Product School Week 5 - Create a Storymap and sync it with Lean Canvas This week's schedule CONTENT: How to create our story map Prepare your story Discover your product’s AI potentialMondayHow to create our story map: We need this 'aha' moment There was a Milestone workshop, about our weekly goal. As we are agile working, we go fast and change every week’s goal. This week we will finalize our story map based on user’s pain-point and HMW questions. How should we make our story-map Basically we should make story map based on this rule Tell stories, don’t just write them! We always need context, that means all the story component should be connected Visualize your product to establish a shared understanding and speed up discussions! Post-it filled of text is not enough, we should fill it with visualizations then team mates can understand it fast Only discuss in front our your story map! (Speed) So we can update our story-map as soon as we change our opinion And also Use a story map to find the parts that matter most and to identify holes in your idea! Since the story map consists of techinical part, we should consider each story’s technical feasibility Minimise output, maximise outcome and impact! Build tests to figure out what’s minimum and what’s viable! This story map functions to find out our minimum value of ideas Work iteratively: Change your story map according to your learnings! We should repeat this process again and again PMs: Make sure Storymap is up to date!Prepare your story: team cero, our whole story map Our goal Technical feasibility of our storyWhat is your strategy to make user achieve something? This would be our expand point Discover your product’s AI potential: How can we apply AI to our product? Let’s write down our ‘HMW’ questions, and find out all p ossibilities. These are suggestion of possibilities, so don’t attached to feasibility (we will do in at lean start-up) Software section's expectation AI section's expectationTuesday Engineer's task, week5This 5th week, engineers settled WendesdayThursdayFriday"
}, {
- "id": 21,
+ "id": 23,
"url": "http://localhost:4000/2020/02/GPU-time/",
"title": "4 reasons took much time to setting GPU for fast.ai than I expected",
"body": "2020/02/05 - Motivation: Before now, me as a undergraduate student, I was parsimony who usually depend on colab, kaggle, friend’s server(occasional) whenever i need GPU. . And this time it’s been for a while to install GPU than I expected and I share the several component that stood in my way. Written at Oct 24 2019, if you think this is deprecated, please do not have a leap of faith. Just for the record, I’ve used Kaggle, Colab, GCP, Azure, EC2 as GPU cloud. 1. Did not know there is JupyterLab option in Google Cloud Platform. : At the first time when GCP came out, there was no AI Platform service. So from starting vm instance to launching jupyter and installing packages, I did all of the things myself. (and I learned 🤗) $ curl -O https://repo. continuum. io/archive/Anaconda3-5. 0. 1-Linux-x86_64. sh[Downloading conda in ssh] I created VM instance,selected zone, machine type and disk type. Then, define firewall rules and in ssh terminal, install jupyter and other packages. But you can do all of these things just using AI Platform. [AI Platform] I think it especially save your time if you are living in Asia-Pacific, which google doesn’t support not that much GPU resources. 2. Consider if the platform has limited resources in a region you live in. : I live in South Korea, East Asia, and it seems like this region has lots of limitation in GPU (except quite expensive AWS) And the Taiwan which was the only one region where I can launch my own VM with GPU (I tried all the other regions in the list) sometimes do normaly, but not always. 😥After launching, I did several works and next day I could not start VM. (I didn’t count it, but tried it a few hours because I didn’t want cost any more time…) Endlessly failed to start instance, then I choose to move AWS as an alternative way. 3. Fast. ai gives deliberate guide and I didn’t know it. : Fast. ai offer the guide for all available platform. (Colab, salamander, Gradient, Kaggle, Colab, and so on) It is so important, and really needs, because cloud computing options are vary as occasion and purpose arise. I didn’t know fast. ai has manual to running GCP, and I think it’s as good a reason as any for me to be have taken time. It helped me so much when I had aws and shortened my time. I don’t want to read all of the manual in amazno. . (It is recommended. . but I’d rather read GIT PRO now…) ssh -i ~/. ssh/<your_private_key_pair> -L localhost:8888:localhost:8888 ubuntu@<your instance IP>4. You should wait to add more volume just after add volume, by building AWS EC2. : Since Elastic Block Store(EBS) storage supports optimized storage, users can’t extend storage volume two times in a row. Unfortunately, at the first time, I didn’t know it (again 👻) and when VM lacked volume, I doubled dist capacity (76*2) at a rough but It needs more. <!– this time I installed GPU in two years, and it became little complicated compared to 2 years ago. And this time for the first time(maybe not the first time. . but i handled it in my class or with my friend. but it’s my first time on my own. ) I very I’m started to using used google colab, kaggleand, GCP-JupyterLab, ec2 - friend made, aws vm machine but I had a environment variable but i did not know of it. On these days, I could not get a resources from taiwan… I couldn’t notice a deliberate Anyway, as a result I tried myself gcp myself and aws ec2 with fast. 
ai But I think doing on my self surely takes much time (in this point I wonder why I’m doing this, and should remind me, especially I was studying disk volume optimization) disk volume exceed - https://askubuntu. com/questions/919748/no-space-left-on-device-even-though-there-is: "
}, {
- "id": 22,
+ "id": 24,
"url": "http://localhost:4000/2020/02/dps-week4/",
"title": "Digital Product School week 4",
"body": "2020/02/01 - The 4th week retropect at Digital Product School Week 4 - Find solution ideas and run experiments [This week’s schedule] CONTENT: Ideation Techniques What is ideation techniques? Generating idea in my team AIdeation Team brain storming of idea Die Produkt MacherMondayIdeation Techniques: [slides from @steffen] What is ideation techniques?: We tried to find out user’s painpoint last week. Tried to users talk about their, pain point. No question directly, but extract from them their pain with transportation. Generating idea in my team: AIdeation: TuesdayTeam brain storming of idea: Based on generated idea on Monday, we extended our idea doing rolling-paper! Die Produkt Macher: What is lean start-up? Lean startup is a methodology for developing businesses and products that aims to shorten product development cycles and rapidly discover if a proposed business model is viable; this is achieved by adopting a combination of business-hypothesis-driven experimentation, iterative product releases, and validated learning. - wikipedia WendesdayThursdayFriday"
}, {
- "id": 23,
+ "id": 25,
"url": "http://localhost:4000/2020/01/retrosprect-of-acl-paper-2020/",
"title": "Retrospect of ACL 2020 paper writing",
"body": "2020/01/29 - 2020 Annual Conference of the Association for Computational Linguistics Why I can’t use ‘Cebuano’ for the research?: Why I had to change target language from ‘Cebuano’ to ‘Tagalog’?-> No language translator options except google translation. But before knowing that I already consult my friend, whose mother tongue is English. So I had to aplogize her, but couldn’t tell her why suddenly I changed my plan. -> I realized there are many languages even can’t be researched at all. . -> Getting accustomed to discrimination makes misunderstanding, sometimes. At my country, we couldn’t use music streaming service, because of legal problem. But at that moment, I thought it was discrimination, which is done by music company. "
}, {
- "id": 24,
+ "id": 26,
"url": "http://localhost:4000/2020/01/Git-Merge/",
"title": "Why am I not listed as a contributor?!",
"body": "2020/01/10 - From the end of last year, big changes have witnessed in NLP research. Embracing an unprecedented growth, I started to study new exciting results and advances. In doing so, I noticed I’m not listed as contributor of repo which my PR accessed. How did I come to a repository?: When I’m stuck, I would prefer to code, than to go deep in theory. (It must be so. . too much to understand 🤒)It was BERT released by Google AI I felt keenly the necessity of implementing, because not only couldn’t understand the way they figured out positional encoding formula, but how it actually works. What does it mean to “scale” dot product in Attention? (Now I know it’s far from my section 😂) Figure 1. Scaled Dot Product. Adopted from tensorflow blogWhat was the code error?: For implement code in paper, I read the papers Transformer and BERT, structured the model, and refered the others’ code. Meanwhile, I found out a small error in tokenization process, which was changing a token into [MASK], enabled bidirectional representation. I’ve made PR, and got merged. But I was not in contributors. Why?: Figure 2. Merged Pull request Adopted from graykode projectActually I happened to know there can be couple of reasons github doesn’t include my name as contributor. Well, if contributors tab has more than 100 people, in which case it shows you up only if you are in the top 100 contributors because displaying too many contributors can make webpages down. Somethimes, however, it doesn’t that problem. Why not? Two possibilities are there. First, According to Joel-Glovier, if repository maintainer merged-as-a-rebase PR will end up showing as maintainer’s commit. But maintainer shouldn’t normally do this. Second, if you happend to commit using a different git email that what is in your GitHub profile, it will not be attached to your Github user, and “doesn’t show up” as you. Reference: Michał Chromiak’s blog Github: why are my contributions are not showing on my profile atlassian-gitfetch"
}, {
- "id": 25,
- "url": "http://localhost:4000/2019/12/lesson1-fastai/",
- "title": "Fine Grained Classification",
- "body": "2019/12/31 - Finally you can solve the mystery behind this weird drawing. . through this course. juptyer notebook magic: %reload_ext autoreload%autoreload 2%matplotlib inlinethis is special directives to jupyter notebook, not python code. And it is called ‘magics’ (but i think jeremy is magicion) If somebody changes underlying library code while I’m running this, please reload it automatically If somebody asks to plot something, then please plot it here in this Jupyter NotebookDon’t hesitate to import start~ Digging into untar_data, path. ls: Union[pathlib. Path, str]: typed programming language? -> maybe i think disclaim the type beforehand for sure. Q. like assert? path. ls()this is some module that fast. ai made because os. listdir(‘path’) is unconvinient. Python3 pathlib library!: pathlib "
- }, {
- "id": 26,
+ "id": 27,
"url": "http://localhost:4000/2019/12/jeremy-howard/",
"title": "Jeremy Howard",
"body": "2019/12/15 - This is journey to find out ‘who am I trying to be?’: How he impacted me? The person who made me start Computer Vision again. He emphasized the importance of studying NLP and Computer together to understand the deep-learning. He didn’t order it to study, but always he pursuade me with reasonable way. “It’s not just something I can throw away. NLP and computer vision a few weeks apart and that’s going to force your brain to realize like ‘oh I have to remember this’” He made me admit my failure in deep-learning. I started to objectify where am I. What should I do when I’m frustrated. “Keep going. You’re not expected to remember everything. Yet. You’re not expected to understand everything. Yet. You’re not expected to know why everything works. Yet. ” His articles are numerous, below. What is torch. nn Really? High Performance Numeric Programming with Swift: Explorations and Reflections C++11, random distributions, and Swift And especially, I like this book. Designing great data products Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing. Designing great data products And he is also famous for words. Here are some. we’re going to try and use that to really understand what’s going on. So to warn you, none of it is rocket science but a lot of its going to look really new. So don’t expect to get it the first time but expect to listen and jump into the notebook try a few things test things out look particularly at like tensor shapes and inputs and outputs to check your understanding then go back and listen again. But and kind of try it, a few times, because you will get there right, it’s just that there’s going to be a lot of new concepts because we haven’t done that much stuff in pure Pytorch. Lesson 6: Deep Learning 2019 "
}, {
- "id": 27,
+ "id": 28,
"url": "http://localhost:4000/2019/11/julia-evans/",
"title": "Julia Evans",
"body": "2019/11/20 - This is journey to find out ‘who am I trying to be?’: The women who surprised me in many ways. First, she approached me to teaching some concepts drawing cartoons. It was at Hackers news, which was hightest ranks. Personally I have the use of not to reading title, so and cartoon was so cute and clear. I naturally gonna understood mechanism and astonished by her explaination ability. Her value, which she was taught by many people so want to do same things, moved me. Volume of her knowledge, that just reading post title is a deal of work, amazed me. "
}, {
- "id": 28,
+ "id": 29,
"url": "http://localhost:4000/2019/11/coc-retropective/",
"title": "Retrospective on Pycon 2019 Korea (CoC Committee)",
"body": "2019/11/05 - When I was volunteer, it seems like busy and hectic to managing that crowded conference. In my experience, to get things moving, it needs hierarchy. But it didn’t. Organizers emphasized our responsibility, and if I passed each other’s burden, It could be my burden next time. In solidarity of the obligation, we finished conference well. And after participating PyCon Korea 2018 as volunteer, I’ve joined PyCon Korea Organizer last year. <Figure 1> First meeting of PyCon 2019 Korea Organizers It’s been a while since PyCon 2019 finished. It’s held on Aug 15 - 18, at Coex Grand Balloom <Figure 2> Ongoing session, speaking on news comment processing <Figure 3> Sponsor Booth iin Coex Hall <Figure 4> After PyCon 2019, with all of volunteer, organizer, speakers 😍 🥰 Serving as part of the coc TF, I spent large fraction of last year doing CoC job. here’s the path what we’ve been grappled with to grasp a solution. First half: Before the conference Toward Diverse Community: Formally we’ve been reusing and modifying PyCon US CoC, but we needed fit in Korean and I was part of that to revise code of conduct. Except ‘That’ Diversity, Because it is ‘Harassment’: Specific point was harassment, and the others were not. process of finding the points. How can we settle this point?Second half: During the conference Handling the potential Harassment: Disjunction of policy and real-time situation: This ‘PyCon 2019 Korea retrospective series’ would be devided into 3 Episodes. “Retrospective on Pycon 2019 Korea (CoC Committee)” “Retrospective on Pycon 2019 Korea (Program Chair)” (20 Nov, To Be Update) “Maintaining participation while still making timely decisions” (29 Nov, To Be Update)"
}, {
- "id": 29,
+ "id": 30,
"url": "http://localhost:4000/2019/11/elif-shafak/",
"title": "Elif Shafak",
"body": "2019/11/05 - This is journey to find out ‘who am I trying to be?’: For creative-minded people, Istanbul is a treasure. ’ Photo © Chris Boland, licensed under CC BY-NC-ND 2. 0 it suddenly felt like what I was trying to convey was more complicated and detailed than what the circumstances allowed me to say. And I did what I usually do in similar situations: I stammered, I shut down, and I stopped talking. I stopped talking because the truth was complicated, even though I knew, deep within, that one should never, ever remain silent for fear of complexity. <Figure 1> Elif Shafak Photo credit: www. elifsafak. com. tr I want to talk about emotions and the need to boost our emotional intelligence. I think it’s a pity that mainstream political theory pays very little attention to emotions. Oftentimes, analysts and experts are so busy with data and metrics that they seem to forget those things in life that are difficult to measure and perhaps impossible to cluster under statistical models. But I think this is a mistake, for two main reasons. We are emotional beings. I think it’s going to be one of our biggest intellectual challenges, because our political systems are replete with emotions. In country after country, we have seen illiberal politicians exploiting these emotions. And yet within the academia and among the intelligentsia, we are yet to take emotions seriously. I think we should. 1 2 Reference: British Council Worldwide ↩ Ted Talk ↩ "
}, {
- "id": 30,
+ "id": 31,
"url": "http://localhost:4000/2019/01/dps-week1/",
"title": "Digital Product School week 1",
"body": "2019/01/11 - The 1th week retropect at Digital Product School [This week’s schedule] CONTENT: Welcome to Digital Product School! Trip to Spitzingsee Welcome to Design Office Specifying our goal of product Welcome to Digital Product School!: Trip to Spitzingsee: At the first day of Digital Product School, we had a off-site with all of batch 9 people. All the costs were managed by dps. At the beautiful mountain, we settled the team, and got my team goal. Basically, there are two kind of team in DPS. (1) Wild team - the team has fixed topic(2) Company team - the team which has specific stakeholders, and also topic defined by that stakeholders The Core-team will fix what team you will join in DPS for 3 months based on ymy professionals, they announce it at off-site. [My team for 3 months at DPS] And we decide on my batch #9 theme song. How? Each team draw for songs and pitch ‘why this song should be batch #9 theme song’The result? Imagine dragon - Believer (I didn’t know at the moment, this song would be stamped in my memory) We have a workshop for getting to know each other. For example, we share 1) what do I expect from 3 months of dps, 2) when I feel happy in my life time, 3) what I worked for last week, 4) what was my last project and 5) what plays important role in my life My team's board Cero Welcome to Design Office: At first day of design office, we had workshop, which celebrates my day in dps also discuss specific rule, menifesto and stakeholders We get sticker and attach it in map depends on my nationality Now time to get to know my team’s stakeholders. What they want for us? What they expect from us? How free my team are on the topic?To be honest, it is endless tug-of-war. We should discuss with my stakeholders, endlessly, and find out solution which can meet interest of users, stakeholders and my team. Basically, my team’s main stakeholder is ADAC, but BMW, City of munich and Nokia will also participate as my team’s stakeholders. Specifying our goal of product: "
@@ -335,7 +340,7 @@
-
Digital Product School week 8&9
+
My life in Digital Product School - week 8/9/10
Follow
@@ -351,7 +356,7 @@ Digital Product School week 8&9
-
+
@@ -391,46 +396,34 @@
diff --git a/_site/2020/03/note08-fastai-2/index.html b/_site/2020/03/note08-fastai-2/index.html
index 7b3320aeb8..e02d17a07d 100644
--- a/_site/2020/03/note08-fastai-2/index.html
+++ b/_site/2020/03/note08-fastai-2/index.html
@@ -19,9 +19,9 @@
-
+
+{"description":"This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring","author":{"@type":"Person","name":"dionne"},"@type":"BlogPosting","url":"http://localhost:4000/2020/03/note08-fastai-2/","publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"http://localhost:4000/assets/images/logo.png"},"name":"dionne"},"image":"http://localhost:4000/assets/images/30.png","headline":"What’s inside Pytorch Operator?","dateModified":"2020-03-01T00:00:00+09:00","datePublished":"2020-03-01T00:00:00+09:00","mainEntityOfPage":{"@type":"WebPage","@id":"http://localhost:4000/2020/03/note08-fastai-2/"},"@context":"http://schema.org"}
@@ -161,96 +161,101 @@
"body": " {% if page. url == / %} {% assign latest_post = site. posts[0] %} <div class= topfirstimage style= background-image: url({% if latest_post. image contains :// %}{{ latest_post. image }}{% else %} {{site. baseurl}}/{{ latest_post. image}}{% endif %}); height: 200px; background-size: cover; background-repeat: no-repeat; ></div> {{ latest_post. title }} : {{ latest_post. excerpt | strip_html | strip_newlines | truncate: 136 }} In {% for category in latest_post. categories %} {{ category }}, {% endfor %} {{ latest_post. date | date: '%b %d, %Y' }} {%- assign second_post = site. posts[1] -%} {% if second_post. image %} <img class= w-100 src= {% if second_post. image contains :// %}{{ second_post. image }}{% else %}{{ second_post. image | absolute_url }}{% endif %} alt= {{ second_post. title }} > {% endif %} {{ second_post. title }} : In {% for category in second_post. categories %} {{ category }}, {% endfor %} {{ second_post. date | date: '%b %d, %Y' }} {%- assign third_post = site. posts[2] -%} {% if third_post. image %} <img class= w-100 src= {% if third_post. image contains :// %}{{ third_post. image }}{% else %}{{site. baseurl}}/{{ third_post. image }}{% endif %} alt= {{ third_post. title }} > {% endif %} {{ third_post. title }} : In {% for category in third_post. categories %} {{ category }}, {% endfor %} {{ third_post. date | date: '%b %d, %Y' }} {%- assign fourth_post = site. posts[3] -%} {% if fourth_post. image %} <img class= w-100 src= {% if fourth_post. image contains :// %}{{ fourth_post. image }}{% else %}{{site. baseurl}}/{{ fourth_post. image }}{% endif %} alt= {{ fourth_post. title }} > {% endif %} {{ fourth_post. title }} : In {% for category in fourth_post. categories %} {{ category }}, {% endfor %} {{ fourth_post. date | date: '%b %d, %Y' }} {% for post in site. posts %} {% if post. tags contains sticky %} {{post. title}} {{ post. excerpt | strip_html | strip_newlines | truncate: 136 }} Read More {% endif %}{% endfor %} {% endif %} All Stories: {% for post in paginator. posts %} {% include main-loop-card. html %} {% endfor %} {% if paginator. total_pages > 1 %} {% if paginator. previous_page %} « Prev {% else %} « {% endif %} {% for page in (1. . paginator. total_pages) %} {% if page == paginator. page %} {{ page }} {% elsif page == 1 %} {{ page }} {% else %} {{ page }} {% endif %} {% endfor %} {% if paginator. next_page %} Next » {% else %} » {% endif %} {% endif %} {% include sidebar-featured. html %} "
}, {
"id": 12,
+ "url": "http://localhost:4000/2020/04/v3-2019-lesson06-note/",
+ "title": "fastai 2019 course-v3 Part1, lesson06",
+ "body": "2020/04/15 - Lesson 06Rossmann(Tabular): Tabular data: be careful on Categorical variable vs Continuous variable. if datatype is int, fastai think it is classification, not a regression. Root mean square percentage error. as loss function. When you assign the y_range, it’s better to assign little bit more than actual maximum. > because it’s sigmoid. intermediate layers, which is weight matrix is 1) 1000, and 2) 500 -> which means our parameter would be 500*1000. learn. modelWhat is dropout and embedding dropout?: Nitish Srivastava, Dropout: A Simple way to prevent Neural Networks from Overfitting you can dropout with p value, make it specified to specific layer, or make it applied to all the layers. Pytorch code 1) bernoulli, which decides whether you will hold it? 2) and divide the noise value depends on noise value. so noise became 2 or remain 0. According to pytorch code, We do change at training time, but we do nothing at test time. and this means you don’t have to do anything special with inference time. ’ TODO: find at forums what is inference time - Related to NVIDIA, GPU. Embedding dropout is just a dropout. It’s different between continuous variable and embedding layer. TODO Still can’t understand. why embedding dropout is effective. or,… in need. Let’s delete at random, some of the results of the embedding. and It worked well especially at Kaggle Batch Normalization: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift -> came out false! According to How Does Batch Normalization Help Optimization? The key was multiplicative bias {\gamma} and additive bias {\beta}` Explain Let $$ \hat{y} = f(w_1, w_2, w_3, … , x)} $$ , loss = MSE , Then y_range should be between 1 and 5` And Activation function ends with -1 -> +1 To mitigate this problem, we can add the other parameter, like $$w_n$$ But there’re so much interactions in the process so just re-scale the output. Momentum parameter at BatchNorm1d: Different from momentum like in optimization. This momentum is Exponentially weighted moving average of the mean, instead of deviation. If this is small number: mean standard deviation would be less from mini_batch to mini_batch » less regularization effect. (If this is large number, variation would be greater from mini_batch to mini_batch » more regularization effect) TODO: can’t sure, but i understand, this is not about how to update parameter but about how much reflect previous value when scale and shift Q. Preference between batchnorm and the other regularizations(drop out, weight decay)A. Nope, always try and see the results## lesson6-pets-more### Data Augmentation- Last reg- `get_transforms` has lots of params (even not yet learned all) -> check documentation - Remember you can implement all the doc contents bc it's made from nbdev - TODO: try this!!- Essence of data augmentation is you should maintain the label, while somewhat making sense. - ex) tilt, because it's optically sensible, you can always change the angle of the data view. - zeros, border, and reflection but always `reflection` works most of the time, so that is the default### Convolutional Kernel(What is convolution?)- Will make heat\_map from scratch, which means the parts convolution focuses on![setosa_visualization]()- http://setosa. io/ev/image-kernels/ - javascript thing - How convolution works - Kernel. which does element-wise multiplication, and sum them up - so it has on pixel less at borders -> so it uses padding, and fastai uses reflection as said. 
- why this Kernel(matrix) helps catching horizontal edge side? - because this kernel`(picture2)` weights differently, depends on `x axis` - why familiar, because it's similar intuition with fugus`(paper)` paper- CNN from different viewpoints`link` - output of pixel is results from different linear equations. - If you connect this with represents of neural network nodes, you can see that the specific inp nodes connected with specific out nodes. - **Summarize**: cnn does 1) matmul some of the elements are always zero 2) same weight for every row, which is called `weight time? weight. . ?, 1:18:50` `(picture)`#### Further lowdown- Because generally image has 3 channels, we need rank 3 kernel. - And **do multiply with all channel output is one pixel**. (`draw by your self`) - but this kernel will catch one feature, like horizontal, so that we make more kernel so that output becomes (h * w * kernel) - And that `kernel` come to `channel`- **Conv2d**: with 3 by 3 kernel, stride 2 conv -> (h/2 * w/2 * kernel) - skip or jump over input pixel - to protect from memory out of control~~~pythonlearn. modellearn. summary()~~~TODO: understand yourself the blocks of conv-kernel: - Usually use big kernel size at first layer (will study this at part2)- Bottom right highlighting kernel(`pic / draw`)- `torch. tensor. expand`: for memory efficient, because we should do RGB- We do not make separate kernel, but make rank 4 kernel - 4d tensor is just stacked kernel- `t[None]. shape` create new unit axis, and why? we make this -> it should move unit of batch, not one size image. ### Average pooling, feature- suppose our pre-trained model results in size of `11 by 11 by 512 ` `pic 4` and my classification task has 37 classes * take the first face of channel, which is 11 by 11 and `mean` it, so that make rank 2 tensor, 512 by 1 * and make 2d matrix, which is 512 by 37 and multiply so that we can get 37 by 1 matrix. - Feature, at convolution block - So, when we transfer-learning without unfreeze, every element of last matrix (512 by 1) should represent(or could catch) each feature. ### Heatmap, Hook~~~hook_output(model[0]) -> acts -> avg_acts~~~- if we average the block with `axis=feature`, result of matrix(11 by 11) depicts `how activated was that area?` -> it is heatmap, `avg_acts`- and acts comes from hook, which is more advanced pytorch feature. - hook into pytorch machine itself, and run any arbitrary Pytorch code - Why this is cool?: Normally it gives set of outputs of forward pass, but we can interrupt and hook the forward pass. - Also can store the output of the convolutional part of the model, which is before avg_pooling- Thinking back when we do cut off `after` the conv part. - but with fast. ai the original convolutional part of the model would be *the first thing in the model*, specifically could be given from `learn. model. eval()[0]` - And this is gotten from `hooked_output` and having hooked the output, we can pass our x_minibatch to output. - Not directly, but with normalized, minibatch, put on to the gpu - `one_item()` function do it, when we have one data `TODO: this is assignment` do it yourself without one_item function - and `. cuda()` put it on gpu- you should print out very often the shape of tensor, and try think why. "
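The dropout behaviour described above (a bernoulli mask at training time, rescaling so the expectation is unchanged, and a no-op at test time) can be sketched in plain PyTorch; this is an illustrative re-implementation, not the fastai or PyTorch source:
~~~python
import torch

def dropout_sketch(x, p=0.5, training=True):
    # Test time: do nothing, exactly as the note says.
    if not training or p == 0.0:
        return x
    # Training time: bernoulli decides which activations survive, then the
    # mask is divided by (1 - p), so with p=0.5 each entry becomes 2 or 0.
    mask = torch.bernoulli(torch.full_like(x, 1 - p)) / (1 - p)
    return x * mask
~~~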
+ }, {
+ "id": 13,
+ "url": "http://localhost:4000/2020/04/qna-image-segmentation/",
+ "title": "[Q&A] Image Segmentation, using Unet with Driving Video data",
+ "body": "2020/04/02 - This post is about my questions while I was studying USF Deep Learning course about image segmentation task. All the answers are from the course, source code, library document, or document. I cared about being clear at reporting information including source of information, however if there are still anything unclear, please contact me. And thank you Jeremy&Rachael for everything. Also Thank you Cambridge Computer Vision Lab to made us to study with your labor. The Cambridge-driving Labeled Video Database (CamVid) is the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes. If someone is interested in this project, please check the site and see the details. Now, let’s start first using jupyter’s one of tricks which I love most. It enables cell to print the code without print function. from IPython. core. interactiveshell import InteractiveShell# pretty print all cell's output and not just the last oneInteractiveShell. ast_node_interactivity = all from fastai. vision import *from fastai. callbacks. hooks import *from fastai. utils. mem import *path = untar_data(URLs. CAMVID) # The locations where the data and models are downloaded are set in config. ymlpath. ls() I’m trying to accustomed to using pathlib module, not just it became built-in module in python, but I felt uncomfortable myself with os module. However, still unpredictable conflicts are remain, even in the quite standard library like Pytorch, tensorflow, onnx. (it require me string for path. not PosixPath. will send PR. . ) [PosixPath('/root/. fastai/data/camvid/valid. txt'), PosixPath('/root/. fastai/data/camvid/images'), PosixPath('/root/. fastai/data/camvid/labels'), PosixPath('/root/. fastai/data/camvid/codes. txt')]path_img = path/'images'path_lbl = path/'labels'fnames = get_image_files(path_img) #filenamelbl_names = get_image_files(path_lbl)1. (Play with data) My Hypothesis: File name has A_B format. and A / B would be at key-value position. Use collections - defaultdict Default Dict: Link: easy to group a sequence of key and value pairs into a dictionary of list?from collections import defaultdictfnames[0], lbl_names[0](PosixPath('/root/. fastai/data/camvid/images/0001TP_009210. png'), PosixPath('/root/. fastai/data/camvid/labels/0016E5_01800_P. png'))files = [tuple(i. stem. split('_')) for i in fnames]labels = [tuple(i. stem. split('_')[:-1]) for i in lbl_names]d = defaultdict(list)for k, v in files: d[k]. append(v)d. keys()len(d['0001TP'])124for k, v in d. 
items(): print(k, v)0001TP ['009210', '008850', '007350', '008970', '009840', '010140', '008490', '008520', '009540', '008250', '008340', '006840', '007860', '007410', '007740', '009870', '010080', '007890', '008790', '010020', '008400', '007080', '008280', '010380', '009330', '009060', '007470', '006810', '009720', '008580', '007110', '008730', '009150', '007680', '009780', '007800', '007290', '008760', '009510', '008640', '008310', '007440', '006900', '007500', '008460', '009030', '008130', '009480', '009900', '010230', '009270', '008040', '007590', '007950', '009990', '008550', '007260', '008100', '007530', '006960', '008190', '009420', '009930', '009000', '007830', '008940', '006690', '009570', '008880', '010170', '007560', '009300', '006750', '009360', '010200', '007320', '008010', '009120', '007620', '007200', '007140', '010320', '006720', '008670', '007230', '008370', '010260', '009690', '006930', '009090', '007770', '010290', '010350', '008610', '008070', '009600', '008430', '009450', '007380', '009240', '007710', '007170', '008160', '008910', '007020', '006780', '007050', '009960', '009810', '008220', '009180', '009750', '010050', '009660', '010110', '007920', '009630', '007650', '006990', '008700', '009390', '007980', '008820', '006870']0016E5 ['01290', '08159', '05760', '08133', '08063', '06660', '00960', '05850', '00750', '06960', '08035', '08107', '07975', '08017', '05610', '07140', '08119', '08027', '07170', '08400', '08093', '02100', '06390', '04470', '08340', '06060', '00600', '07470', '08151', '07800', '01620', '05730', '01530', '00690', '08430', '05940', '01980', '07320', '08069', '07965', '04380', '05430', '01410', '06780', '08007', '08087', '08079', '06600', '08109', '05490', '00901', '04590', '04680', '08045', '01770', '06690', '08085', '06810', '00420', '08011', '07440', '02190', '06300', '04800', '01500', '00450', '08029', '01470', '06330', '07997', '08067', '05370', '08013', '08190', '00840', '02370', '08049', '08135', '01440', '06870', '05820', '05280', '08051', '04440', '08091', '01380', '00630', '07290', '05520', '04770', '00540', '07995', '07999', '05550', '07920', '08101', '08141', '08053', '04620', '08103', '05160', '07350', '08057', '06030', '06000', '08550', '07963', '08089', '05970', '08047', '05640', '06240', '05220', '04350', '01590', '07959', '01950', '08117', '06180', '01560', '05400', '08043', '07680', '00780', '08081', '07050', '01020', '01350', '04530', '06720', '07969', '08149', '08003', '08131', '08129', '08033', '05460', '01650', '07530', '08023', '05340', '08640', '05100', '08075', '01230', '04980', '02070', '01080', '06210', '05910', '08009', '01800', '05190', '02400', '08083', '08019', '07620', '07200', '07890', '08059', '06990', '04410', '08121', '08123', '06930', '08137', '08147', '08095', '06570', '06150', '08153', '06840', '05250', '00510', '08370', '08580', '08113', '07410', '08097', '01200', '04950', '07770', '07650', '04710', '06090', '08055', '07110', '07981', '00990', '08250', '08127', '01920', '07985', '08220', '08005', '08157', '05130', '08071', '01140', '04830', '07740', '08143', '06120', '02040', '08111', '08115', '00660', '08280', '06420', '07983', '02220', '05700', '01860', '01260', '04920', '06510', '07020', '08073', '08105', '08125', '06360', '07860', '07993', '00810', '06540', '08099', '08139', '02010', '07973', '08155', '07991', '06630', '00480', '06750', '04890', '08001', '08025', '00870', '08490', '01830', '07977', '05010', '01170', '07961', '01680', '01050', '07987', '07080', '04560', '00930', '05310', '02340', '05790', 
'08460', '00720', '08031', '02280', '08039', '08037', '08065', '06270', '08077', '06900', '04650', '06480', '07230', '08041', '06450', '00570', '07989', '04740', '07979', '02250', '07380', '00390', '01710', '07590', '08021', '08520', '07500', '01110', '04500', '02310', '07971', '02130', '05580', '05880', '08610', '08310', '08145', '05670', '04860', '07260', '08015', '07967', '01740', '01320', '07560', '07830', '01890', '08061', '02160', '07710', '05070', '05040']Seq05VD ['f00030', 'f02550', 'f03450', 'f01110', 'f00480', 'f00210', 'f04590', 'f04170', 'f01800', 'f03990', 'f03360', 'f03900', 'f02070', 'f00810', 'f03690', 'f01350', 'f01530', 'f04980', 'f05100', 'f03060', 'f00900', 'f03870', 'f02460', 'f01470', 'f02370', 'f02820', 'f04080', 'f02760', 'f04860', 'f02250', 'f04200', 'f00270', 'f03720', 'f02850', 'f04410', 'f01200', 'f03090', 'f02010', 'f03930', 'f00090', 'f01650', 'f01890', 'f03840', 'f03030', 'f02130', 'f01230', 'f04110', 'f02520', 'f04140', 'f04020', 'f00060', 'f03420', 'f01560', 'f00120', 'f04290', 'f02340', 'f00300', 'f01380', 'f00870', 'f01860', 'f02970', 'f04560', 'f02730', 'f00330', 'f04530', 'f03780', 'f01770', 'f03390', 'f05040', 'f02430', 'f03330', 'f00660', 'f01740', 'f02100', 'f04800', 'f04050', 'f00510', 'f02790', 'f04350', 'f00690', 'f00540', 'f02490', 'f00960', 'f00930', 'f04230', 'f02880', 'f03600', 'f01020', 'f01500', 'f02400', 'f04830', 'f04470', 'f03300', 'f02670', 'f00450', 'f01980', 'f01170', 'f01620', 'f04500', 'f01080', 'f03180', 'f05070', 'f03150', 'f04950', 'f01440', 'f03510', 'f01710', 'f00360', 'f04770', 'f02910', 'f01050', 'f00630', 'f04320', 'f00570', 'f03240', 'f02190', 'f01140', 'f03540', 'f02220', 'f02640', 'f03960', 'f00000', 'f04920', 'f01950', 'f00990', 'f03480', 'f03000', 'f00420', 'f04620', 'f03210', 'f00780', 'f03570', 'f01590', 'f00750', 'f01920', 'f04650', 'f03750', 'f03630', 'f02310', 'f02610', 'f02580', 'f04740', 'f02280', 'f04680', 'f00390', 'f00720', 'f03660', 'f02040', 'f03270', 'f00180', 'f03810', 'f01410', 'f01290', 'f03120', 'f00840', 'f04440', 'f00150', 'f01260', 'f02700', 'f02940', 'f00600', 'f01830', 'f04260', 'f05010', 'f04890', 'f02160', 'f00240', 'f04380', 'f01680', 'f04710', 'f01320']0006R0 ['f02820', 'f03690', 'f03180', 'f02550', 'f01020', 'f03660', 'f02340', 'f01170', 'f02610', 'f02940', 'f01290', 'f02100', 'f01350', 'f03270', 'f03870', 'f01380', 'f01980', 'f03810', 'f02430', 'f02310', 'f01830', 'f03480', 'f02970', 'f01890', 'f03210', 'f03930', 'f02040', 'f02070', 'f02400', 'f01560', 'f03030', 'f01770', 'f01590', 'f01950', 'f03420', 'f01650', 'f03450', 'f00990', 'f03630', 'f01500', 'f03570', 'f00930', 'f03090', 'f03360', 'f02880', 'f02460', 'f01440', 'f01920', 'f01230', 'f03840', 'f02730', 'f01620', 'f02220', 'f03750', 'f03330', 'f03540', 'f02520', 'f02790', 'f01050', 'f03120', 'f01800', 'f01140', 'f01860', 'f01530', 'f01470', 'f02670', 'f02490', 'f01260', 'f01110', 'f02760', 'f01680', 'f03150', 'f02580', 'f03300', 'f02280', 'f01200', 'f03390', 'f03510', 'f02640', 'f02190', 'f02370', 'f01320', 'f02130', 'f03600', 'f03240', 'f03780', 'f03720', 'f02700', 'f01410', 'f01080', 'f02850', 'f01710', 'f03900', 'f03060', 'f01740', 'f02010', 'f02250', 'f00960', 'f03000', 'f02160', 'f02910']for k, v in d. items(): print(k, len(d[k]))0001TP 1240016E5 305Seq05VD 1710006R0 101for i in d2. keys(): print(i,len(d2[i]))0016E5 3050001TP 1240006R0 101Seq05VD 171files[0], labels[0](('0001TP', '009210'), ('0016E5', '01800'))2. My question: Link: Why do we need masking? and does color from fastai library? 
(have to look into source code) What do the parameter alpha do? When people make masked img, would it be have ranged integer limit? Does image normalization related with this?lbl_sorted = sorted(lbl_names)f_sorted = sorted(fnames)lbl_1 = lbl_sorted[33]f_1 = f_sorted[33]img = open_image(lbl_1)mask = open_mask(lbl_1)_,axs = plt. subplots(1,2, figsize=(10,5))# img. show(ax=axs[0], y=mask, title='masked')img. show(ax=axs[0], title='1')mask. show(ax=axs[1], title='2', alpha=1. ) img_2 = open_image(f_1)mask_2 = open_mask(f_1)_,axs = plt. subplots(1,2, figsize=(10,5))# img. show(ax=axs[0], y=mask, title='masked')img_2. show(ax=axs[0], title='3',)mask_2. show(ax=axs[1], title='4', alpha=1. ) open_mask(lbl_1). data. shapetorch. Size([1, 720, 960])open_mask(lbl_1). data. shapetorch. Size([1, 720, 960])open_image(f_1). data. shapetorch. Size([3, 720, 960])open_image(f_1). data. shapetorch. Size([3, 720, 960])img. data #labeled datatensor([[[0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], . . . , [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176]], [[0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], . . . , [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176]], [[0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], [0. 0157, 0. 0157, 0. 0157, . . . , 0. 0824, 0. 0824, 0. 0824], . . . , [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176], [0. 0667, 0. 0667, 0. 0667, . . . , 0. 1176, 0. 1176, 0. 1176]]])mask. data # after mask, labeled datatensor([[[ 4, 4, 4, . . . , 21, 21, 21], [ 4, 4, 4, . . . , 21, 21, 21], [ 4, 4, 4, . . . , 21, 21, 21], . . . , [17, 17, 17, . . . , 30, 30, 30], [17, 17, 17, . . . , 30, 30, 30], [17, 17, 17, . . . , 30, 30, 30]]])img_2. data, mask_2. data(tensor([[[0. 0706, 0. 0667, 0. 0706, . . . , 0. 6431, 0. 6549, 0. 6627], [0. 0745, 0. 0706, 0. 0706, . . . , 0. 6431, 0. 6510, 0. 6549], [0. 0784, 0. 0706, 0. 0745, . . . , 0. 6392, 0. 6588, 0. 6588], . . . , [0. 0863, 0. 0824, 0. 0824, . . . , 0. 1333, 0. 1216, 0. 1255], [0. 0902, 0. 0863, 0. 0824, . . . , 0. 1255, 0. 1176, 0. 1216], [0. 0863, 0. 0824, 0. 0784, . . . , 0. 1137, 0. 1059, 0. 1137]], [[0. 0706, 0. 0667, 0. 0706, . . . , 0. 7490, 0. 7608, 0. 7686], [0. 0745, 0. 0706, 0. 0706, . . . , 0. 7451, 0. 7569, 0. 7608], [0. 0784, 0. 0706, 0. 0745, . . . , 0. 7412, 0. 7529, 0. 7529], . . . , [0. 0980, 0. 0941, 0. 0941, . . . , 0. 1804, 0. 1686, 0. 1725], [0. 1059, 0. 1020, 0. 0980, . . . , 0. 1725, 0. 1647, 0. 1686], [0. 1020, 0. 0980, 0. 0941, . . . , 0. 1608, 0. 1529, 0. 1608]], [[0. 0784, 0. 0745, 0. 0784, . . . , 0. 7569, 0. 7686, 0. 7765], [0. 0824, 0. 0784, 0. 0784, . . . , 0. 7647, 0. 7647, 0. 7686], [0. 0784, 0. 0706, 0. 0745, . . . , 0. 7608, 0. 7647, 0. 7647], . . . , [0. 1216, 0. 1176, 0. 1176, . . . , 0. 2000, 0. 1882, 0. 1922], [0. 1176, 0. 1137, 0. 1098, . . . , 0. 1843, 0. 1765, 0. 1804], [0. 1137, 0. 1098, 0. 
1059, . . . , 0. 1725, 0. 1647, 0. 1725]]]), tensor([[[ 18, 17, 18, . . . , 183, 186, 188], [ 19, 18, 18, . . . , 183, 185, 186], [ 20, 18, 19, . . . , 182, 185, 185], . . . , [ 25, 24, 24, . . . , 43, 40, 41], [ 26, 25, 24, . . . , 41, 39, 40], [ 25, 24, 23, . . . , 38, 36, 38]]])) 3. What is the difference between Image and ImageSegment?: ImageSegment: An ImageSegment object has the same properties as an Image. The only difference is that when applying transformations to an ImageSegment, it will ignore the functions that deal with lighting and keep values of 0 and 1. It’s easy to show the segmentation mask over the associated Image by using the y argument of show_image. img = open_image(fnames[0]) mask = open_mask(lbl_names[0]) _,axs = plt.subplots(1,3, figsize=(8,4)) img.show(ax=axs[0], title='no mask') img.show(ax=axs[1], y=mask, title='masked') # seg mask over the img using the y arg mask.show(ax=axs[2], title='mask only', alpha=1.) vision.image 4. Why/how is the image divided by 255 and how fast.ai does it: vision.image - If div=True, pixel values are divided by 255. to become floats between 0. and 1. At times, you want to get rid of distortions caused by lights and shadows in an image. Normalizing the RGB values of an image can at times be a simple and effective way of achieving this. The sum of a pixel’s values over all channels (call it S) divides each channel, so the normalized values will be R/S, G/S and B/S (where S = R + G + B). Detailed explanation here 5. Python Evaluation Order: Python evaluates expressions from left to right. Notice that while evaluating an assignment, the right-hand side is evaluated before the left-hand side. mask_tmp, trg_tmp, void_tmp = 2, 1, 10 mask_tmp = trg_tmp != void_tmp print(mask_tmp, trg_tmp, void_tmp) # (1) target is not the same as void True 1 10 # Example 1 x = 1 y = 2 x,y = y,x+y x, y (2, 3) # Example 2 x = 1 y = 2 x = y y = x+y x, y (2, 4) 6. Model learner parameter :: pct_start: A: Percentage of the total number of epochs when the learning rate rises during one cycle. Q: Sorry, I'm still confused: one cycle in the new API only runs one epoch, so how does the percentage of the total number of epochs work? Can you give an example, e.g. learn.fit_one_cycle(10, slice(1e-4,1e-3,1e-2), pct_start=0.05)? A: Ok, the strictly correct answer would be percentage of iterations, so the lr can both increase and decrease during the same epoch. In your example, say you have 100 iterations per epoch; then for half an epoch (0.05 * (10 * 100) = 50 iterations) the lr will rise, then slowly decrease. Q2: Thanks for this explanation… so essentially, it is the percentage of overall iterations where the LR is increasing, correct? So, given the default of 0.3, it means that your LR is going up for 30% of your iterations and then decreasing over the last 70%. Is that a correct summation of what is happening? A2: Yes, I think that’s correct. You can verify that by changing its value and checking learn.recorder.plot_lr(), for example with pct_start = 0.2 source: forums.fastai "
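The R/S, G/S, B/S normalisation from section 4 can be written as a few tensor operations. A minimal sketch, assuming a float image tensor of shape (3, H, W) like `img.data` above; the function name and the random example are mine, not from the course:
~~~python
import torch

def chromaticity_normalize(img):
    # S = R + G + B per pixel; dividing each channel by S removes overall
    # intensity (lights and shadows), keeping only the colour proportions.
    s = img.sum(dim=0, keepdim=True).clamp(min=1e-8)  # guard against S == 0
    return img / s

# Illustrative usage on a random "image"
img = torch.rand(3, 4, 4)
print(chromaticity_normalize(img).sum(dim=0))  # each pixel now sums to 1
~~~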
+ }, {
+ "id": 14,
"url": "http://localhost:4000/2020/03/note08-fastai-4/",
"title": "Gradient backward, Chain Rule, Refactoring",
- "body": "2020/03/02 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring” Lecture 08 - Deep Learning From Foundations-part2 “ Homework: calculus for machine learning einsum conventionCONTENTS: Foundation version Gradients backward pass decompose function chain rule with code check the result using Pytorch autograd Refactor model Layers as classes Modue. forward() Without einsum nn. Linear and nn. Module Forward process Foundation version: Gradients backward pass: Gradients is output with respect to parameter we’ve done this work in this path(below) to simplify this calculus, we can just change it into, So, you should know of the derivative of each bit on its own, and then you multiply them all together. As a result, it would be over cross over the data. So you can get gradient, output with respect to parameter What order should we calculate? BTW, why Jeremy wrote , not Loss function?1 decompose function We want to get derivative of which forms But, we have a estimation of answer (we call it y hat) now So, I will decompose funciton to trace target variable. Using the above forward pass, we can suppose some function from the end. start from , We know MSE funciton got two parameters, output, and target . from MSE’s input we know function’s output and supposing v is input of that function, similarly, v became output of chain rule with code examplify backward process by random sampling To get a variable, I modified forward model a little def model_ping(out = 'x_train'): l1 = lin(x_train, w1, b1) # one linear layer l2 = relu(l1) # one relu layer l3 = lin(l2, w2, b2) # one more linear layer return eval(out) Be careful we don’t use mse_loss in backward process1) start with the very last function, which is loss funciton. MSE If we codify this formula,def mse_grad(inp, targ): #mse_input(1000,1), mse_targ (1000,1) # grad of loss with respect to output of previous layer inp. g = 2. * (inp. squeeze() - targ). unsqueeze(-1) / inp. shape[0] And, this can be examplified like below. Notice that input of gradient function is same with forward functiony_hat = model_ping('l3') #get value from forward modely_hat. g = ((y_hat. squeeze(-1)-y_train). unsqueeze(-1))/y_hat. shape[0]y_hat. g. shape>>> torch. Size([50000, 1]) We can just calculate using broadcasting, not using squeeze. then why should do and unsqueeze again?🎯 It’s related with random access memory(RAM). . If I don’t squeeze, (I’m using colab) it out of RAM. 2) Derivative of linear2 function This process’s weight dimensions defined by axis=1, axis=2. axis=0 dimension means size of data. This will be summazed by . sum(0) method. unsqeeze(-1)&unsqeeze(1) seperates the dimension, and make a dot product, and vanish axis=0 dimension. def lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowlin2 = model_ping('l2'); #get value from forward modellin2. g = y_hat. g@w2. t(); w2. g = (lin2. unsqueeze(-1) * y_hat. g. unsqueeze(1)). sum(0);b2. g = y_hat. g. sum(0);lin2. g. shape, w2. g. shape, b2. g. shape>>> torch. Size([50000, 50])torch. Size([50, 1])torch. Size([1]) Notice going reverse order, we’re passing in gradient backward3) derivative of ReLU def relu_grad(inp, out): # grad of relu with respect to input activations inp. 
g = (inp>0). float() * out. g Examplified belowlin1=model_ping('l1') #get value from forward modellin1. g = (lin1>0). float() * lin2. g;lin1. g. shape>>> torch. Size([50000, 50])4) Derivative of linear1 Same process with 2) but, this process’s weight hasdef lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowx_train. g = lin1. g @ w1. t(); w1. g = (x_train. unsqueeze(-1) * lin1. g. unsqueeze(1)). sum(0); b1. g = lin1. g. sum(0);x_train. g. shape, w1. g. shape, b1. g. shape>>> torch. Size([50000, 784])torch. Size([784, 50])torch. Size([50])5) Then it goes backward pass def forward_and_backward(inp, targ): # forward pass: l1 = inp @ w1 + b1 l2 = relu(l1) out = l2 @ w2 + b2 # we don't actually need the loss in backward! loss = mse(out, targ) # backward pass: mse_grad(out, targ) lin_grad(l2, out, w2, b2) relu_grad(l1, l2) lin_grad(inp, l1, w1, b1)Version 1 (Basic)- Wall time: 1. 95 s Summary Notice that output of function at forward pass became input of backward pass backpropagation is just the chain rule value loss (loss=mse(out,targ)) is not used in gradient calcuation. Because, it doesn’t appear with the weight. w1g, w2g, b1g, b2g, ig will be used for optimizercheck the result using Pytorch autograd require_grad_ is the magical function, which can automatic differentiation. 2 This magical auto gradified tensor keep track what happend in forward (taking loss function), and do the backward3 So it saves our time to differentiate ourselves ⤵️ THis is benchmark…. . Version 2 (torch autograd)- Wall time: 3. 81 µs Refactor model: Amazingly, just refactoring our main pieces, it comes down up to Pytorch package. 🌟 Implement yourself, Practice, practice, practice! 🌟 Layers as classes: Relu and Linear are layers in oue neural net. -> make it as classes For the forward, using __call__ for the both of forward & backward. Because ‘call’ means we treat this as a function. class Lin(): def __init__(self, w, b): self. w,self. b = w,b def __call__(self, inp): self. inp = inp self. out = inp@self. w + self. b return self. out def backward(self): self. inp. g = self. out. g @ self. w. t() # Creating a giant outer product, just to sum it, is inefficient! self. w. g = (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) self. b. g = self. out. g. sum(0) Remember that in lin_grad function, we save bias&weight!!!!!💬 inp. g : gradient of the output with respect to the input. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 w. g : gradient of the output with respect to the weight. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 b. g : gradient of the output with respect to the bias. {: style=”color:grey; font-size: 90%; text-align: center;”} class Model(): def __init__(self, w1, b1, w2, b2): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ) def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() refer to Jeremy’s Model class, he put layers in list Dionne’s self-study note: Decomposing Jeremy’s Model class init needs weight, bias but not x data when call that class(a. k. a function) it gave x data and y label! jeremy composited function in layers. x = l(x) so concise…. . 
also utilized that layer list when backward ust reversing it (using python list’s method) And he is recursively calling the function on the result of the previous thing. ⬇️for l in self. layers: x = l(x)Q2: Don’t I need to declare magical autograd function, requires_grad_?{: style=”color:red; font-size: 130%; text-align: center;”} [The questions migrated to this article] Version 3 (refactoring - layer to class)- Wall time: 5. 25 µs Modue. forward(): Duplicate code makes execution time slow. Role of __call__ changed. No more __call__ for implementing forward pass. By initializing the forward with __call__, Module. forward() use overriding to maximize reusability. So any layer inherit Module, can use parent’s function. gradient of the output with respect to the weight (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) can be reexpressed using einsum, torch. einsum( bi,bj->ij , inp, out. g) Defining forward and Module enables Pytorch to out almost duplicatesVersion 4 (Module & einsum)- Wall time: 4. 29 µs Q2: Isn’t there any way to use broadcasting? Why we should use outer product?{: style=”color:red; font-size: 130%; text-align: center;”} Without einsum: Replacing einsum to matrix product is even more faster. torch. einsum( bi,bj->ij , inp, out. g)can be reexpressed using matrix product, inp. t() @ out. gVersion 5 (without einsum)- Wall time: 3. 81 µs nn. Linear and nn. Module: Torch’s package nn. Linear and nn. Module Version 6 (torch package)- Wall time: 5. 01 µs Final, Using torch. nn. Linear & torch. nn. Module~~~pythonclass Model(nn. Module): def init(self, n_in, nh, n_out): super(). init() self. layers = [nn. Linear(n_in,nh), nn. ReLU(), nn. Linear(nh,n_out)] self. loss = mse def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x. squeeze(), targ)class Model(): def init(self): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ)def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() ~~~ Footnote: fast. ai forums Lesson-8 ↩ pytorch docs - autograd ↩ stackoverflow - finding methods a object has ↩ "
+ "body": "2020/03/02 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring ” Lecture 08 - Deep Learning From Foundations-part2 “ Homework: calculus for machine learning einsum conventionCONTENTS: Foundation version Gradients backward pass decompose function chain rule with code check the result using Pytorch autograd Refactor model Layers as classes Modue. forward() Without einsum nn. Linear and nn. Module Forward process Foundation version: Gradients backward pass: Gradients is output with respect to parameter we’ve done this work in this path(below) to simplify this calculus, we can just change it into, So, you should know of the derivative of each bit on its own, and then you multiply them all together. As a result, it would be over cross over the data. So you can get gradient, output with respect to parameter What order should we calculate? BTW, why Jeremy wrote , not Loss function?1 decompose function We want to get derivative of which forms But, we have a estimation of answer (we call it y hat) now So, I will decompose funciton to trace target variable. Using the above forward pass, we can suppose some function from the end. start from , We know MSE funciton got two parameters, output, and target . from MSE’s input we know function’s output and supposing v is input of that function, similarly, v became output of chain rule with code examplify backward process by random sampling To get a variable, I modified forward model a little def model_ping(out = 'x_train'): l1 = lin(x_train, w1, b1) # one linear layer l2 = relu(l1) # one relu layer l3 = lin(l2, w2, b2) # one more linear layer return eval(out) Be careful we don’t use mse_loss in backward process1) start with the very last function, which is loss funciton. MSE If we codify this formula,def mse_grad(inp, targ): #mse_input(1000,1), mse_targ (1000,1) # grad of loss with respect to output of previous layer inp. g = 2. * (inp. squeeze() - targ). unsqueeze(-1) / inp. shape[0] And, this can be examplified like below. Notice that input of gradient function is same with forward functiony_hat = model_ping('l3') #get value from forward modely_hat. g = ((y_hat. squeeze(-1)-y_train). unsqueeze(-1))/y_hat. shape[0]y_hat. g. shape>>> torch. Size([50000, 1]) We can just calculate using broadcasting, not using squeeze. then why should do and unsqueeze again?🎯 It’s related with random access memory(RAM). . If I don’t squeeze, (I’m using colab) it out of RAM. 2) Derivative of linear2 function This process’s weight dimensions defined by axis=1, axis=2. axis=0 dimension means size of data. This will be summazed by . sum(0) method. unsqeeze(-1)&unsqeeze(1) seperates the dimension, and make a dot product, and vanish axis=0 dimension. def lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowlin2 = model_ping('l2'); #get value from forward modellin2. g = y_hat. g@w2. t(); w2. g = (lin2. unsqueeze(-1) * y_hat. g. unsqueeze(1)). sum(0);b2. g = y_hat. g. sum(0);lin2. g. shape, w2. g. shape, b2. g. shape>>> torch. Size([50000, 50])torch. Size([50, 1])torch. Size([1]) Notice going reverse order, we’re passing in gradient backward3) derivative of ReLU def relu_grad(inp, out): # grad of relu with respect to input activations inp. 
g = (inp>0). float() * out. g Examplified belowlin1=model_ping('l1') #get value from forward modellin1. g = (lin1>0). float() * lin2. g;lin1. g. shape>>> torch. Size([50000, 50])4) Derivative of linear1 Same process with 2) but, this process’s weight hasdef lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowx_train. g = lin1. g @ w1. t(); w1. g = (x_train. unsqueeze(-1) * lin1. g. unsqueeze(1)). sum(0); b1. g = lin1. g. sum(0);x_train. g. shape, w1. g. shape, b1. g. shape>>> torch. Size([50000, 784])torch. Size([784, 50])torch. Size([50])5) Then it goes backward pass def forward_and_backward(inp, targ): # forward pass: l1 = inp @ w1 + b1 l2 = relu(l1) out = l2 @ w2 + b2 # we don't actually need the loss in backward! loss = mse(out, targ) # backward pass: mse_grad(out, targ) lin_grad(l2, out, w2, b2) relu_grad(l1, l2) lin_grad(inp, l1, w1, b1)Version 1 (Basic)- Wall time: 1. 95 s Summary Notice that output of function at forward pass became input of backward pass backpropagation is just the chain rule value loss (loss=mse(out,targ)) is not used in gradient calcuation. Because, it doesn’t appear with the weight. w1g, w2g, b1g, b2g, ig will be used for optimizercheck the result using Pytorch autograd require_grad_ is the magical function, which can automatic differentiation. 2 This magical auto gradified tensor keep track what happend in forward (taking loss function), and do the backward3 So it saves our time to differentiate ourselves Postfix underscore means in pytorch, in-place function, What is in-place function?⤵️ THis is benchmark…. . Version 2 (torch autograd)- Wall time: 3. 81 µs Refactor model: Amazingly, just refactoring our main pieces, it comes down up to Pytorch package. 🌟 Implement yourself, Practice, practice, practice! 🌟 Layers as classes: Relu and Linear are layers in oue neural net. -> make it as classes For the forward, using __call__ for the both of forward & backward. Because ‘call’ means we treat this as a function. class Lin(): def __init__(self, w, b): self. w,self. b = w,b def __call__(self, inp): self. inp = inp self. out = inp@self. w + self. b return self. out def backward(self): self. inp. g = self. out. g @ self. w. t() # Creating a giant outer product, just to sum it, is inefficient! self. w. g = (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) self. b. g = self. out. g. sum(0) Remember that in lin_grad function, we save bias&weight!!!!!💬 inp. g : gradient of the output with respect to the input. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 w. g : gradient of the output with respect to the weight. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 b. g : gradient of the output with respect to the bias. {: style=”color:grey; font-size: 90%; text-align: center;”} class Model(): def __init__(self, w1, b1, w2, b2): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ) def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() refer to Jeremy’s Model class, he put layers in list Dionne’s self-study note: Decomposing Jeremy’s Model class init needs weight, bias but not x data when call that class(a. k. a function) it gave x data and y label! jeremy composited function in layers. x = l(x) so concise…. . 
also utilized that layer list when backward ust reversing it (using python list’s method) And he is recursively calling the function on the result of the previous thing. ⬇️for l in self. layers: x = l(x)Q2: Don’t I need to declare magical autograd function, requires_grad_?{: style=”color:red; font-size: 130%; text-align: center;”} [The questions migrated to this article] Version 3 (refactoring - layer to class)- Wall time: 5. 25 µs Modue. forward(): Duplicate code makes execution time slow. Role of __call__ changed. No more __call__ for implementing forward pass. By initializing the forward with __call__, Module. forward() use overriding to maximize reusability. So any layer inherit Module, can use parent’s function. gradient of the output with respect to the weight (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) can be reexpressed using einsum, torch. einsum( bi,bj->ij , inp, out. g) Defining forward and Module enables Pytorch to out almost duplicatesVersion 4 (Module & einsum)- Wall time: 4. 29 µs Q2: Isn’t there any way to use broadcasting? Why we should use outer product?{: style=”color:red; font-size: 130%; text-align: center;”} Without einsum: Replacing einsum to matrix product is even more faster. torch. einsum( bi,bj->ij , inp, out. g)can be reexpressed using matrix product, inp. t() @ out. gVersion 5 (without einsum)- Wall time: 3. 81 µs nn. Linear and nn. Module: Torch’s package nn. Linear and nn. Module Version 6 (torch package)- Wall time: 5. 01 µs Final, Using torch. nn. Linear & torch. nn. Module~~~pythonclass Model(nn. Module): def init(self, n_in, nh, n_out): super(). init() self. layers = [nn. Linear(n_in,nh), nn. ReLU(), nn. Linear(nh,n_out)] self. loss = mse def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x. squeeze(), targ)class Model(): def init(self): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ)def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() ~~~ Footnote: fast. ai forums Lesson-8 ↩ pytorch docs - autograd ↩ stackoverflow - finding methods a object has ↩ "
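To make the final refactor concrete, here is a minimal sketch of the Module pattern this note arrives at, in the spirit of the lesson's Lin class (the .g gradient attribute is the note's own convention, not a PyTorch feature, and the code below is a sketch rather than the lesson's exact notebook):

~~~python
import torch

class Module():
    # __call__ stores the inputs and output, then dispatches to forward(),
    # so subclasses only define forward() and bwd()
    def __call__(self, *args):
        self.args = args
        self.out = self.forward(*args)
        return self.out
    def forward(self): raise NotImplementedError('subclasses override forward')
    def backward(self): self.bwd(self.out, *self.args)

class Lin(Module):
    def __init__(self, w, b): self.w, self.b = w, b
    def forward(self, inp): return inp @ self.w + self.b
    def bwd(self, out, inp):
        inp.g = out.g @ self.w.t()
        self.w.g = inp.t() @ out.g   # matrix-product form, no giant outer product
        self.b.g = out.g.sum(0)
~~~

Swapping the einsum for inp.t() @ out.g is exactly the Version 5 change: the outer-product-then-sum collapses into a single matmul.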
}, {
- "id": 13,
+ "id": 15,
"url": "http://localhost:4000/2020/03/note08-fastai-3/",
"title": "Implement forward&backward pass from scratch",
"body": "2020/03/01 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring1. The forward and backward passes: 1. 1 Normalization: train_mean,train_std = x_train. mean(),x_train. std()>>> train_mean,train_std(tensor(0. 1304), tensor(0. 3073))Remember! Dataset, which is x_train, mean and standard deviation is not 0&1. But we need them to be which means we should substract means and divide data by std. You should not standarlize validation set because training set and validation set should be aparted. after normalize, mean is close to zero, and standard deviation is close to 1. 1. 2 Variable definition: n,m: size of the training set c: the number of activations we need in our model2. Foundation Version: 2. 1 Basic architecture: Our model has one hidden layer, output to have 10 activations, used in cross entropy. But in process of building architecture, we will use mean square error, output to have 1 activations and lator change it to cross entropy number of hidden unit; 50see below pic We want to make w1&w2 mean and std be 0&1. why initializating and make mean zero and std one is important? paper highlighting importance of normalisation - training 10,000 layer network without regularisation1 2. 1. 1 simplified kaiming initQ: Why we did init, normalize with only validation data? Because we can not handle and get statistics from each value of x_valid?{: style=”color:red; font-size: 130%; text-align: center;”} what about hidden(first) layer?w1 = torch. randn(m,nh)b1 = torch. zeros(nh)t = lin(x_valid, w1, b1) # hidden>>> t. mean(), t. std()((tensor(2. 3191), tensor(27. 0303))In output(second) layer, w2 = torch. randn(nh,1)b2 = torch. zeros(1)t2 = lin(t, w2, b2) # output>>> t2. mean(), t2. std()(tensor(-58. 2665), tensor(170. 9717)) which is terribly far from normalzed value. But if we apply simplified kaiming init w1 = torch. randn(m,nh)/math. sqrt(m); b1 = torch. zeros(nh)w2 = torch. randn(nh,1)/math. sqrt(nh); b2 = torch. zeros(1)t = lin(x_valid, w1, b1)t. mean(),t. std()>>> (tensor(-0. 0516), tensor(0. 9354)) But, actually, we use activations not only linear function After applying activations relu at linear layer, mean and deviation became 0. 5. 2. 1. 2 Glorrot initializationPaper2: Understanding the difficulty of training deep feedforward neural networks Gaussian(, bell shaped, normal distributions) is not trained very well. How to initialize neural nets? with the size of layer , the number of filters . But there is No acount for import of ReLU If we got 1000 layers, vanishing gradients problem emerges2. 1. 3 Kaiming initializatingPaper3: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Kaiming He, explained here rectifier: rectified linear unit rectifier network: neural network with rectifier linear units This is kaiming init, and why suddenly replace one to two on a top? to avoid vanishing gradient(weights) But it doesn’t give very nice mean tough. 2. 1. 4 Pytorch package Why fan_out? according to pytorch documentation, choosing 'fan_in' preserves the magnitude of the variance of the wights in the forward pass. choosing 'fan_out' preserves the magnitues in the backward pass(, which means matmul; with transposed matrix) ➡️ in the other words, torch use fan_out cz pytorch transpose in linear transformaton. What about CNN in Pytorch?I tried torch. nn. 
Conv2d. conv2d_forward?? Jeremy digged into using torch. nn. modules. conv. _ConvNd. reset_parameters?? 2 in Pytorch, it doesn’t seem to be implemented kaiming init in right formula. so we should use our own operation. But actually, this has been discussed in Pytorch community before. 3 4 Jeremy said it enhanced variance also, so I sampled 100 times and counted better results. To make sure the shape seems sensible. check with assert. (remember we will replace 1 to 10 in cross entropy)assert model(x_valid). shape==torch. Size([x_valid. shape[0],1])>>> model(x_valid). shape(10000, 1) We have made Relu, init, linear, it seems we can forward pass code we need for basic architecture nh = 50def lin(x, w, b): return x@w + b;w1 = torch. randn(m,nh)*math. sqrt(2. /m ); b1 = torch. zeros(nh)w2 = torch. randn(nh,1); b2 = torch. zeros(1)def relu(x): return x. clamp_min(0. ) - 0. 5t1 = relu(lin(x_valid, w1, b1))def model(xb): l1 = lin(xb, w1, b1) l2 = relu(l1) l3 = lin(l2, w2, b2) return l32. 2 Loss function: MSE: Mean squared error need unit vector, so we remove unit axis. def mse(output, targ): return (output. squeeze(-1) - targ). pow(2). mean() In python, in case you remove axis, you use ‘squeeze’, or add axis use ‘unsqueeze’ torch. squeeze where code commonly broken. so, when you use squeeze, clarify dimension axis you want to removetmp = torch. tensor([1,1])tmp. squeeze()>>> tensor([1, 1]) make sure to make as float when you calculateBut why??? because it is tensor?{: style=”color:red; font-size: 130%;”} Here’s the error when I don’t transform the data type ---------------------------------------------------------------------------TypeError Traceback (most recent call last)<ipython-input-22-ae6009bef8b4> in <module>()----> 1 y_train = get_data()[1] # call data again 2 mse(preds, y_train)TypeError: 'map' object is not subscriptable This is forward passFootnote: Other materials: Understanding the difficulty of training deep feedforward neural networks, paper that introduced Xavier initialization Fixup Initialization: Residual Learning Without Normalization ↩ Pytorch implementaion on Kaiming init of conv and linear layers ↩ Pytorch kaiming init issue ↩ Pytorch kaiming init explained ↩ "
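As a quick numerical check of the init discussion above, a small sketch (the sizes follow the note's MNIST setup; the exact statistics vary run to run, and the built-in call at the end is shown only for comparison):

~~~python
import math, torch

m, nh = 784, 50                       # input size and hidden units, as in the note
x = torch.randn(10000, m)             # stand-in for the normalized x_valid

w_naive   = torch.randn(m, nh)                      # hidden std blows up to ~sqrt(m)
w_kaiming = torch.randn(m, nh) * math.sqrt(2. / m)  # simplified kaiming init

t = torch.relu(x @ w_kaiming)
print(t.mean().item(), t.std().item())  # roughly 0.5 and 0.8 after ReLU

# the built-in: mode='fan_out' matches nn.Linear's transposed weight layout
w_pt = torch.zeros(nh, m)
torch.nn.init.kaiming_normal_(w_pt, mode='fan_out', nonlinearity='relu')
~~~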
}, {
- "id": 14,
+ "id": 16,
"url": "http://localhost:4000/2020/03/note08-fastai-2/",
"title": "What's inside Pytorch Operator?",
"body": "2020/03/01 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, RefactoringWhat’s inside Pytorch Operator?: Section02 Time comparison with pure Python: Matmul with broadcasting> 3194. 95 times faster Einstein summation> 16090. 91 times faster Pytorch’s operator> 49166. 67 times faster 1. Elementwise op: 1. 1 Frobenius norm: above converted into (m*m). sum(). sqrt() Plus, don’t suffer from mathmatical symbols. He also copy and paste that equations from wikipedia. and if you need latex form, download it from archive. 2. Elementwise Matmul: What is the meaning of elementwise? We do not calculate each component. But all of the component at once. Because, length of column of A and row of B are fixed. How much time we saved? So now that takes 1. 37ms. We have removed one line of code and it is a 178 times faster…#TODOI don’t know where the 5 from. but keep it. Maybe this is related with frobenius norm…?as a result, the code before for k in range(ac): c[i,j] += a[i,k] + b[k,j]the code after c[i,j] = (a[i,:] * b[:,j]). sum()To compare it (result betweet original and adjusted version) we use not test_eq but other function. The reason for this is that due to rounding errors from math operations, matrices may not be exactly the same. As a result, we want a function that will “is a equal to b within some tolerance” #exportdef near(a,b): return torch. allclose(a, b, rtol=1e-3, atol=1e-5)def test_near(a,b): test(a,b,near)test_near(t1, matmul(m1, m2))3. Broadcasting: Now, we will use the broadcasting and removec[i,j] = (a[i,:] * b[:,j]). sum() How it works?>>> a=tensor([[10,10,10], [20,20,20], [30,30,30]])>>> b=tensor([1,2,3,])>>> a,b (tensor([[10, 10, 10], [20, 20, 20], [30, 30, 30]]),tensor([1, 2, 3])) >>> a+btensor([[11, 12, 13], [21, 22, 23], [31, 32, 33]]) <Figure 2> demonstrated how array b is broadcasting(or copied but not occupy memory) to compatible with a. Refered from numpy_tutorial there is no loop, but it seems there is exactly the loop. This is not from jeremy (actually after a moment he cover it) but i wondered How to broadcast an array by columns? c=tensor([[1],[2],[3]])a+ctensor([[11, 11, 11], [22, 22, 22], [33, 33, 33]])s What is tensor. stride()?help(t. stride)Help on built-in function stride: stride(…) method of torch. Tensor instancestride(dim) -> tuple or intReturns the stride of :attr:’self’ tensor. Stride is the jump necessary to go from one element to the next one in the specified dimension :attr:’dim’. A tuple of all strides is returned when no argument is passed in. Otherwise, an integer value is returned as the stride in the particular dimension :attr:’dim’. Args: dim (int, optional): the desired dimension in which stride is requiredExample::* x = torch. tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])`x. stride()>>> (5, 1)x. stride(0)>>> 5x. stride(-1)>>> 1 unsqueeze & None index We can manipulate rank of tensor Special value ‘None’, which means please squeeze a new axis here== please broadcast herec = torch. tensor([10,20,30])c[None,:] in c, squeeze a new axis in here please. 2. 2 Matmul with broadcasting: for i in range(ar):# c[i,j] = (a[i,:]). *[:,j]. sum() #previous c[i] = (a[i]. unsqueeze(-1) * b). sum(dim=0) And Using None also (As howard teached)c[i] = (a[i ]. unsqueeze(-1) * b). sum(dim=0) #howardc[i] = (a[i][:,None] * b). sum(dim=0) # using Nonec[i] = (a[i,:,None]*b). 
sum(dim=0)⭐️Tips🌟 1) Anytime there’s a trailinng(final) colon in numpy or pytorch you can delete it ex) c[i, :] = c [i]2) any number of colon commas at the start, you can switch it with the single elipsis. ex) c[:,:,:,:,i] = c […,i] 2. 3 Broadcasting Rules: What if we tensor. size([1,3]) * tensor. size([3,1])? torch. Size([3, 3]) What is scale???? What if they are one array is times of the other array? ex) Image : 256 x 256 x 3Scale : 128 x 256 x 3Result: ? Why I did not inserted axis via None, but happened broadcasting? >>> c * c[:,None]tensor([[100. , 200. , 300. ], [200. , 400. , 600. ], [300. , 600. , 900. ]])maybe it broadcast cz following array has 3 rows as same principle, no matter what nature shape was, if we do the operation tensor broadcasts to the other. >>> c==c[None]tensor([[True, True, True]])>>> c[None]==c[None,:]tensor([[True, True, True]])>>>c[None,:]==ctensor([[True, True, True]])3. Einstein summation: Creates batch-wise, remove inner most loop, and replaced it with an elementwise producta. k. ac[i,j] += a[i,k] * b[k,j]inner most loop c[i,j] = (a[i,:] * b[:,j]). sum()elementwise product Because K is repeated so we do a dot product. And it is torch. Usage of einsum()1) transpose2) diagnalisation tracing3) batch-wise (matmul) … einstein summation notationdef matmul(a,b): return torch. einsum('ik,kj->ij', a, b)so after all, we are now 16000 times faster than Python. 4. Pytorch op: 49166. 67 times faster than pure python And we will use this matrix multiplication in Fully Connect forward, with some initialized parameters and ReLU. But before that, we need initialized parameters and ReLU, Footnote: TensorRank ti noteResources: Frobenius Norm Review Broadcasting Review (especially Rule) Refer colab! (I totally confused with extension of arrays) torch. allclose Review np. einsum Reviewh "
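Putting the section's variants side by side, a small self-contained sketch (the shapes here are arbitrary; the timing ratios above came from the lecture's MNIST-sized matrices):

~~~python
import torch

def matmul_broadcast(a, b):
    # one Python loop left: each row of a is broadcast against all of b
    c = torch.zeros(a.shape[0], b.shape[1])
    for i in range(a.shape[0]):
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)
    return c

a, b = torch.randn(5, 3), torch.randn(3, 4)
expected = a @ b                                   # PyTorch's own operator
assert torch.allclose(matmul_broadcast(a, b), expected, rtol=1e-3, atol=1e-5)
assert torch.allclose(torch.einsum('ik,kj->ij', a, b), expected, rtol=1e-3, atol=1e-5)
~~~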
}, {
- "id": 15,
+ "id": 17,
"url": "http://localhost:4000/2020/02/note08-fastai-1/",
"title": "What is the meaning of 'deep-learning from foundations?'",
"body": "2020/02/29 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring” Lecture 08 - Deep Learning From Foundations-part2 “ I don’t know if you read this article, but I heartily appreciate Rachael Thomas and Jeremy Howard for providing these priceless lectures for free Homework: Review concepts 16 concepts from Course 1 (lessons 1 - 7)(1) Affine Functions & non-linearities; 2) Parameters & activations; 3) Random initialization & transfer learning; 4) SGD, Momentum, Adam; 5) Convolutions; Batch-norm; 6) Dropout; 7) Data augmentation; 8) Weight decay; 9) Res/dense blocks; 10) Image classification and regression; 11)Embeddings; 12) Continuous & Categorical variables; 13) Collaborative filtering; 14) Language models; 15) NLP classification; 16) Segmentation; U-net; GANS) Make sure you understand broadcasting Read section 2. 2 in Delving Deep into Rectifiers Try to replicate as much of the notebooks as you can without peeking; when you get stuck, peek at the lesson notebook, but then close it and try to do it yourself calculus for machine learning based on weight… einsum conventionCONTENTS: What is going on in this course? What is ‘from foundations’? Steps to a basic modern CNN model Today’s implementation goal: 1) matmul -> 4) FC backward Library development using jupyter notebook jupyter notebook certainly can make module Elementwise ops How can we make python faster? What is element wise operation? FootnoteWhat is going on in this course?: What is ‘from foundations’?: 1) Recreate fast. ai and Pytorch 2) using pure python Evade OverfittingOverfit : validation error getting worsetraining loss < validation loss Know the name of the symbol you usefind in this page if you don’t know the symbol that you are using or just draw it here (run by ML!) Steps to a basic modern CNN model: 1) Matrix multiplication -> 2) Relu/Initialization -> 3) Fully-connected Forward-> 4) Fully-connected Backward -> 5) Train loop -> 6) Convolution-> 7) Optimization ->8) Batchnormalization -> 9) Resnet Today’s implementation goal: 1) matmul -> 4) FC backward: Library development using jupyter notebook: what is assers? jupyter notebook certainly can make module: There will be #export tag that Howard (and we) want to extract special notebook2script. py will detect sign of #expert and convert following into python module and test ittest\_eq(TEST,'test')test\_eq(TEST,'test1') what is run_notebook. py? when you want to test your module in command line interface !python run\_notebook. py 01_matmul. ipynb Is there any difference between 1) and 2)?1) test -> test01 2) test01 -> test #TODO I don’t know yet look into run_notebook. py, package fire Jeremy used. What is that?read and run the code in a notebook, and in the process, Jeremy made Python Fire library called!shockingly, fire takes any kind of function and converts into CLI command. fire library was released by Google open source, Thursday, March 2, 2017 Get data pytorch and numpy are pretty much same. variable c explains how many pixels there are in in MNIST, 28 pixels PyTorch’s view() method: torch function that manipulating tensor, and squeeze() in torch & mathmatical operation similar function Rao & McMahan said usually this functions result in feature vector. In part 1, you can use view function several times. 
Initial python model Which is Linear, like $Xw$(weight)$+a$(bias) $= Y$ If you don’t know hou to multiple matrix, refer this site matmul visulization site How many time spends if we we use pure python function matmul, typical matrix multiplication function, takes about 1 second for calculating 1 single train data! (maybe assumed stochastic, 5 data points in validation) it takes about 11. 36 hours to update parameters even single layer and 1 iteration! (if that was my computer, it would be 14 hours. . )🤪 THIS is why we need to consider ‘time’&’space’ This is kinda slow - what if we could speed it up by 50,000 times? Let’s try! Elementwise ops: How can we make python faster?: If we want to calculate faster, then do remove pythonic calcuation, by passing its computation down to something that is written something other than python, like pytorch. According to PyTorch doc it uses C++ (via ATen), so we are going to implement that function with python. What is element wise operation?: items makes a pair, operate corresponding componentFootnote: notebooks material video broadcasting excel"
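For reference, the naive triple loop the lecture times first might look like this (a sketch; the commented shapes mimic a 5-image validation batch against one MNIST-sized weight matrix):

~~~python
import torch

def matmul(a, b):
    # pure-Python triple loop: one interpreter step per multiply-accumulate
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            for k in range(ac):
                c[i, j] += a[i, k] * b[k, j]
    return c

# matmul(torch.randn(5, 784), torch.randn(784, 10))  # ~1 s, per the lecture
~~~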
}, {
- "id": 16,
+ "id": 18,
"url": "http://localhost:4000/2020/02/what-is-convolution/",
"title": "Digging into convolution",
"body": "2020/02/28 - Issues 1) Kaiming Initializtion in Pytorch was in trouble. 1 2) Jeremy started to dig in, in lesson09, but I didn’t know why the size of tensor is 2 and even understand this spreadsheet data. 3 Homework: Read Visualizing and Understanding Convolutional Networks paper What is a convolution? Visualization one kernel Matthew D Zeiler & Rob Fergus Paper Convolution can be represented as matmul Padding Kernel has rank 3 How can we find a side-edge, a gradient and area of constant weight? What is a convolution?: A convolutional neural network is that your red, green, and blue pixels go into the simple computation, and something comes out of that, and then the result of that goes into a second layer, and the result of that goes into the third layer and so forth. Visualization: one kernel Refer this site for visualizing CNN filteringMatthew D Zeiler & Rob Fergus PaperLecture01 Nine examples of the actual coefficients from the **first layer** Convolution can be represented as matmul: CNNs from different viewpoints {align-items: center;} [A B C D E F G H I J] is 3 by 3 image data flatten to vector. As a result, convolution is a just matrix just two things happens Some of entries are set to zeros at all the times same color always have the same weight. That called weight time / wegith sharing So, we can implement a convolution with matrix multiplication. But, we don’t do that because it’s slow!Padding: What most of libraries do is just put zeros asdie of matrix fast. ai uses reflection paddings (what is this? Jeremy said he uttered it)Kernel has rank 3: As standard picture input would be 4 5, it would be actually 3d, not 2d. If we make kernel as a 3x3 size, we pass over same kernel all the different Red, Green, Blue Pixels. This could make problem, because, if we want to detect frog, which is green, we would want more activations on the green(I made a test cell in my colab 6) How can we find a side-edge, a gradient and area of constant weight?: Not top-edge! One kernel can find only the top-edge, so we should stack the kernels 7 So, we pass it through bunch of kernels to the input images, and that process gives us height x width x corresponding number of kernels. Usually that number of chanel is 16 And if we want to get the more channels and features, we should repeat that process This process gives rise to memory out of control, we do the stride #### conv-example. xlsx 2 convolutional filters At a second layer, filter is 3x3x2 tensor, because to add up together the first layer’s channel. Reference: Problem was math. sqrt(5) was not kaiming initialization formula, Implementation in Pytorch ↩ size of tensor, lecture09 ↩ conv-example. xlsx ↩ Why do computer use red, green and blue instead of primary colors ↩ Grayscale is a group of shades without any visible color. … Each of these dots has its own brightness level as well and, therefore, can be converted to grayscale. A grayscale image is one with all color information removed. ↩ Testing RGB and grayscale ↩ stack kernel and make new rank of tensor at output, Lesson06-2019 ↩ "
}, {
- "id": 17,
+ "id": 19,
"url": "http://localhost:4000/2020/02/dps-week8/",
- "title": "Digital Product School week 8&9",
- "body": "2020/02/24 - The 8th week retropect at Digital Product School Week 8/9 - Ship your MVP/Release next iteration each day This week's schedule CONTENT: Preparing engineering weekly Agile Process Daily Stand-up Making application flowchart (feat draw. io) / ER diagram Flowchart, understaning user journey ER diagram Engineering weekly AI lunch Connecting firebase andPreparing engineering weekly: This week at Wednesday, I planned to explain the Language Modelings, mainly focusing ELMo, ULMFiT, BERT and GPT-2. Slides is available here Changed the presentation, because there were people who are not in ML domain. hereWhenever I do the presentation, I learn more than the information I give them. At the same time, I realize I need to learn more than I know. Agile Process: One of a priceless lesson I learnt from digital product school, was experience of doing agile work. Before I came here, it was a little bit vague concept. I’m not sure ‘what is agile’ but this is what we tried to make agile process. Daily Stand-up: Sharing the works everyday helps interdisciplinary team to work better. Since product started to get higher fidelity, the gap between engineer and non-engineer increased. Actually I didn’t planned to explain concept because I thougth I would be lose my audience when I start to explain. But as daily stand-up, which shares our progess, goes day by day, I planed and reported the issues. And it made each other’s topic feel more familiar. I think point is very important, because at that point people start to be curious. So we can actively ask to the others, and that momwnr, we can explain the point teammate dosen’t know. Each color means every different section. Red: Our team goal, Blue: Interaction designer, Green: Product manager, Yellow: Software/AI engineer This week engineer's main plan Each of us try to explain what we are doing, but things become easier when we are asked. Because we explained something was important to us before, but if we asked it is something important for the others. Making application flowchart (feat draw. io) / ER diagram: Before we start the party, we should clarify the flowchart and ER diagram of our application. Flowchart, understaning user journey: Thanks for google, we could use draw. io for our framechart framework. Actually, we cana choice other good flatform, but draw. io has connected app throgh google drive, most of our engineer was used to it. And after this job, I got to know there is also (of course) rule with the symbols, color, size, space, scaling and direction of arrow -reference. But why we should do this? WE have made our storymap before!! I think storymap is for visualize our status and app. So it should be shared with whole the team, and they should able to understand each role’s issue. But flowchart is more like testing technical feasibility, and error that user can experience. So it could be little more specific, complicated, and hypothetical. This week engineer's main plan ER diagram: Even if we use NoSQL database through firebase, my team was accustomed to SQL more. That what we educated when we were at college, so we had to organize our concept while we were learning NoSQL. Engineering weekly: Every engineering weekly we exchange our knowledge each other so that we can grow together. Before today, my AI collegues presented regression, knn and it was my turn. I prepared slide that explain about pre-trained language model, but my header advised me if I go deep of theoretical things, I would lose my audience. 
So I decided to brief BERT mode, how I can contribute to other team’s project. Since BERT was breakthrough of NLP industry, I tried to explain how it can be applied to hands on product and how it can help people in their product. The result was quite motivative to me. They gave feedback that since it wasn’t that much theoretical, they could enjoy it, and useful information. Someone asked me do I had learned of presentation before. I was really happy with their feedback! AI lunch: Connecting firebase and: "
+ "title": "My life in Digital Product School - week 8/19/10",
+ "body": "2020/02/24 - The 8/9/10th week retropect at Digital Product School Week 8 - Ship your MVPWeek 9/10 - Release next iteration each day Week 8th schedule CONTENT: Agile Product Development Daily Stand-up(planning) Gemba Walk Sprint Reviews Engineering weeklyAgile Product Development: One of a priceless lesson I learnt from digital product school, was experience of doing agile work. Before I came here, it was a little bit vague concept. I’m still not sure ‘what is agile’ but this is how we tried to make agile process. Daily Stand-up(planning): Sharing the works everyday helps interdisciplinary team to work better. Since product started to get higher fidelity, the gap between engineer and non-engineer increased. Actually I didn’t planned to explain concept because I thougth I would be lose my audience when I start to explain. But as daily stand-up, which shares our progess, goes day by day, I planed and reported the issues. And it made each other’s topic feel more familiar. I think point is very important, because at that point people start to be curious. So we can actively ask to the others, and that momwnr, we can explain the point teammate dosen’t know. Each color means every different section. Red: Our team goal, Blue: Interaction designer, Green: Product manager, Yellow: Software/AI engineer This week engineer's main plan Each of us try to explain what we are doing, but things become easier when we are asked. Because we explained something was important to us before, but if we asked it is something important for the others. Gemba Walk: Team Cero with core team Every 2 weeks, we do the Gemba work, which is ‘question everything to the core team’ time. At this period, people can ask anything related to our product, workshop, and framework. Core team will help just for each team, and each team can solve the problem related to their work. < br/>Why we need this session? because with workshop and general schedule, core team has no time just focus on each team. So through this session, we can have opportunity to understand each program and workshop, like why we are using this platform, and when is the due of our small project, and we have this problem and we need help for this. whatever small problem you have, core team is always willing to help you. Sprint Reviews: Every Friday, we have time to summarise what we did for the week. Maybe we need HMW question and our storymap to share our process and then tell and share what we did try, what point we succeeded and what point it was deviant of our prediction, and why we tried it. . Sprint of Ve-link And then, just after all team’s ppt, we do vote with such a cute marvel. Always it’s very difficult to vote (of course you can’t vote to your team!) Because it depends on criteria what do I value!But since this is process of our agile work, I try to focus on what they have changed since last week, and why they did it, how they did it. Engineering weekly: Every engineering weekly we exchange our knowledge each other so that we can grow together. Everyone have their knowledge to share and we can be tutor and at the same time can be of tutee. Previously, my AI collegues presented regression, knn. And because I’m somewhat specialized to NLP, I prepared slide that explain about pre-trained language model, but my header advised me if I go deep of theoretical things, I would lose my audience. So I decided to brief BERT mode, how I can contribute to other team’s project. 
Since BERT was breakthrough of NLP industry, I tried to explain how it can be applied to hands on product and how it can help people in their product. The result was quite motivative to me. They gave feedback that since it wasn’t that much theoretical, they could enjoy it, and useful information. Someone asked me do I had learned of presentation before. I was really happy with their feedback! "
}, {
- "id": 18,
+ "id": 20,
"url": "http://localhost:4000/2020/02/fast.ai-nlp-note-16/",
"title": "Algorithmic bias",
"body": "2020/02/20 - Algorithms can encode & magnify human bias Case Study 1: Facial Recognition & Predictive Policing: Joy Buolamwini & Timnit Gebru, gendershades. org Microsoft, FACE+, IBM - All of these things are sell now. Largest gap between $\therefore\ Lighter Male\ >\ Darker\ Female $ This US mayor joked cops should “mount . 50-caliber” guns where AI predicts crime With machine learning, with automation, there’s a 99% success, so that robot is ㅡwill beㅡ99% accurate in telling us what is going to happen next, which is really interesting. - city official in Lancater, CA, approving on using IBM for public security Bias: Bias is type of error Statistical Bias: difference between a statistic’s expected value and the true value Unjust Bias: disproportionate preference for or prejudice against a group Unconscious bias: bias that we don’t realize we have But, term bias is too generic to be productive. Different sources of bias have different causes Representation Bias: Dataset was not representative of the algorithm that might be used on later. Above : Data is okay, but algorithm has some problem. Below : Data has error. For example, object detection production that performs very well in common product of US. But in contrast, change of target product region, like Zimbabwe, Solomon Island, and so on, reduced the performence remarkably. It is not the algorithmic problem, so we should care about data volume of region. Evaluation Bias: Benchmark datasets spur on research, 4. 4% of IJB-A images are dark-skinned women. 2/3 of ImageNet images from the West (Sharkar et al, 2017) Case Study 2: Recidivism Algorithm Used Prison Sentencing: Case Study 3: Online Ad Delivery: Bias in NLP: ( Nothing to do with the course, but I’m researching this field these days. ) But all about Englsih ImpactThe person is doctor. The person is nurse -> 그는 의사다. 그녀는 간호사다. Concept of “biased data” often too generic to be useful: Different sources of bias have different sources Data, models and systems are not unchanging numbers on a screen. They’re the result of a complex process that starts with years of historical context and involves a series of choices and norms, from data measurement to model evaluation to human interpretation. - Harini Suresh, “The problem with Biased Data” Five Sources of Bias in ML: Representation Bias Evaluation Bias Measurement Bias Aggregation Bias(46:02) Historical Bias(46:26) A few studies(47:13) Racial Bias, Even when we have good intentions(new york times)(47:10) gender(48:59) Humans are biased, so why does algorithmic bias matter?: Algorithms & humans are used differently (humans are usually decision maker) Algorithms are accurate and objective No way to apeal if there if error processed large scale cheap Machine learning can amplify bias Machine learning can create feedback loops. Technology is power. And with that comes responsibility. Solutions: Analyze a project at work/school: Questions about AI 5 types of bias (Suresh & Guttag) Datasheets for datasets, Modelcards for model reporting Accuracy rate on different sub-groups Work with domain experts & those impacted Increase diversity in our workspace Advocate for good policy Be on the ongoing lookout for bias"
}, {
- "id": 19,
+ "id": 21,
"url": "http://localhost:4000/2020/02/classifier-city/",
"title": "Making a classifier with image dataset made from gooogle",
"body": "2020/02/15 - CONTENTS: Creating dataset from google images Using google_images_download Create ImageDataBunch Train model fit_one_cycle() Let’s find-tune Let’s train the whole model! Let’s make batch size bigger! Interpretation Model in productionCode can be found hereDeployed model here Making a classifier which can distinguish Seoul from Munich and Sanfrancisco!(hoping my well in Munich!) Creating dataset from google images: In machine learning, you always need data before you build your model. You can use either URLs or google_images_download package. Since Jeremy explained specifically, I will try the other. Using google_images_download: note: This is not google official package Refer to Official Doncument, put that arguments. from google_images_download import google_images_downloadresponse = google_images_download. googleimagesdownload() #class instantiationout_dir = os. path. abspath('. . /. . /materials/dataset/pkg/')os. mkdir(out_dir)arguments = { keywords : Cebu,Munich,Seoul , print_urls :True, suffix_keywords : city , output_directory :out_dir, type : photo , }paths = response. download(arguments) #passing the arguments to the functionprint(paths)and if you need, here is main code. Create ImageDataBunch: We need to separate validation set because we just grabbed these imagese from Google. Most of the dataset we use (kaggle/research) splited into train / validation / test so if they are not devided beforehand we should make databunch, and Jeremy recommended assign 20% to validation. Help on function verify_images in module fastai. vision. data:verify_images(path: Union[pathlib. Path, str], delete: bool = True, max_workers: int = 4, max_size: int = None, recurse: bool = False, dest: Union[pathlib. Path, str] = '. ', n_channels: int = 3, interp=2, ext: str = None, img_format: str = None, resume: bool = None, **kwargs) Check if the images in `path` aren't broken, maybe resize them and copy it in `dest`. Data from google image url Data from package Train model: len(class) len(train) len(valid) Data_url 3 432 108 Data_pkg 3 216 53 Uisng model: restnet34 1, Measurement: accuracy 2 fit_one_cycle(): What is fit one cycle? Cyclical Learning Rates for Training Neural Networks One of the way to find good learning rate. Core idea is to start with small learning rate (like 1e-4, 1e-3) and increase the learning rate after each mini-batch till loss starts exploding. And pick up learning rate one order lower than exploding point. For example, plotted learning rate is like below picture, picking up around 1e-2 is the best way. Why this methods Traditionally, the learning rate is decreased as the learning starts converging with time. But this paper suggests to cycle our learning rate, because it makes us avoid local minimum. Basically this cyclic method enables us to explore whole of loss function so that find out global minimum. In other words, higher learning rate behaves like regularisation. Let’s find-tune: Do train just one last layer by learning rate found by find_lr This section you should find the strongest downward slope that kind of sticking around for quite a while. And choose just one order lower than lowest point. As explained before, I will pick up 1e-2. And of course, this is fine-tuning, we don’t need discriminative learning rate yet. Let’s train the whole model!: link When you plot the learning rate again, maybe you will get soaring shape of learning rate. Rule of thumb, When you slice the learning rate, use learning rate you used at unfrozen part. 
Divide it by 5 or 10 and put it on maximum bound. At minimum bound, get the point just before it soared, and divide it by 10. Let’s make batch size bigger!: Since default batch size is 64, I tried it to 128. And it gets way more better result(even it’s still underfitting!) And if I freeze model and train whole model again, the model would be better. Also, you can use this method to the other big dataset model training! Interpretation: See the confusion matrix. Result is quite great. *Since I’m using colab, I will skip data cleansing. But I highly recommend you to use ImageCleaner widget, only if you are using jupyter notebook (not jupyter lab) Model in production: You can deploy your model in simple way. I referred fast. ai, and used render(it’s free for limited time). You can find detailed document here. and you can create a route like this. @app. route( /classify-url , methods=[ GET ])async def classify_url(request): bytes = await get_bytes(request. query_params[ url ]) img = open_image(BytesIO(bytes)) _,_,losses = learner. predict(img) return JSONResponse({ predictions : sorted( zip(cat_learner. data. classes, map(float, losses)), key=lambda p: p[1], reverse=True ) })You can find my deployed model here Reference: How to create a deep learning dataset using Google Images towardsdatascience - one cycle policy Deep Residual Learning for Image Recognition ↩ Accuracy_and_precision ↩ "
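End to end, the training recipe described above looks roughly like this in fastai v1 (a sketch under the post's setup; the folder path, image size, and epoch counts are illustrative):

~~~python
from fastai.vision import *  # fastai v1, as used in the post

data = ImageDataBunch.from_folder(out_dir, train='.', valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(4)                 # train the head while the body is frozen
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()                  # pick the slice bounds from this plot
learn.fit_one_cycle(2, max_lr=slice(1e-5, 1e-3))
~~~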
}, {
- "id": 20,
+ "id": 22,
"url": "http://localhost:4000/2020/02/dps-week5/",
"title": "Digital Product School week 5",
"body": "2020/02/09 - The 5th week retropect at Digital Product School Week 5 - Create a Storymap and sync it with Lean Canvas This week's schedule CONTENT: How to create our story map Prepare your story Discover your product’s AI potentialMondayHow to create our story map: We need this 'aha' moment There was a Milestone workshop, about our weekly goal. As we are agile working, we go fast and change every week’s goal. This week we will finalize our story map based on user’s pain-point and HMW questions. How should we make our story-map Basically we should make story map based on this rule Tell stories, don’t just write them! We always need context, that means all the story component should be connected Visualize your product to establish a shared understanding and speed up discussions! Post-it filled of text is not enough, we should fill it with visualizations then team mates can understand it fast Only discuss in front our your story map! (Speed) So we can update our story-map as soon as we change our opinion And also Use a story map to find the parts that matter most and to identify holes in your idea! Since the story map consists of techinical part, we should consider each story’s technical feasibility Minimise output, maximise outcome and impact! Build tests to figure out what’s minimum and what’s viable! This story map functions to find out our minimum value of ideas Work iteratively: Change your story map according to your learnings! We should repeat this process again and again PMs: Make sure Storymap is up to date!Prepare your story: team cero, our whole story map Our goal Technical feasibility of our storyWhat is your strategy to make user achieve something? This would be our expand point Discover your product’s AI potential: How can we apply AI to our product? Let’s write down our ‘HMW’ questions, and find out all p ossibilities. These are suggestion of possibilities, so don’t attached to feasibility (we will do in at lean start-up) Software section's expectation AI section's expectationTuesday Engineer's task, week5This 5th week, engineers settled WendesdayThursdayFriday"
}, {
- "id": 21,
+ "id": 23,
"url": "http://localhost:4000/2020/02/GPU-time/",
"title": "4 reasons took much time to setting GPU for fast.ai than I expected",
"body": "2020/02/05 - Motivation: Before now, me as a undergraduate student, I was parsimony who usually depend on colab, kaggle, friend’s server(occasional) whenever i need GPU. . And this time it’s been for a while to install GPU than I expected and I share the several component that stood in my way. Written at Oct 24 2019, if you think this is deprecated, please do not have a leap of faith. Just for the record, I’ve used Kaggle, Colab, GCP, Azure, EC2 as GPU cloud. 1. Did not know there is JupyterLab option in Google Cloud Platform. : At the first time when GCP came out, there was no AI Platform service. So from starting vm instance to launching jupyter and installing packages, I did all of the things myself. (and I learned 🤗) $ curl -O https://repo. continuum. io/archive/Anaconda3-5. 0. 1-Linux-x86_64. sh[Downloading conda in ssh] I created VM instance,selected zone, machine type and disk type. Then, define firewall rules and in ssh terminal, install jupyter and other packages. But you can do all of these things just using AI Platform. [AI Platform] I think it especially save your time if you are living in Asia-Pacific, which google doesn’t support not that much GPU resources. 2. Consider if the platform has limited resources in a region you live in. : I live in South Korea, East Asia, and it seems like this region has lots of limitation in GPU (except quite expensive AWS) And the Taiwan which was the only one region where I can launch my own VM with GPU (I tried all the other regions in the list) sometimes do normaly, but not always. 😥After launching, I did several works and next day I could not start VM. (I didn’t count it, but tried it a few hours because I didn’t want cost any more time…) Endlessly failed to start instance, then I choose to move AWS as an alternative way. 3. Fast. ai gives deliberate guide and I didn’t know it. : Fast. ai offer the guide for all available platform. (Colab, salamander, Gradient, Kaggle, Colab, and so on) It is so important, and really needs, because cloud computing options are vary as occasion and purpose arise. I didn’t know fast. ai has manual to running GCP, and I think it’s as good a reason as any for me to be have taken time. It helped me so much when I had aws and shortened my time. I don’t want to read all of the manual in amazno. . (It is recommended. . but I’d rather read GIT PRO now…) ssh -i ~/. ssh/<your_private_key_pair> -L localhost:8888:localhost:8888 ubuntu@<your instance IP>4. You should wait to add more volume just after add volume, by building AWS EC2. : Since Elastic Block Store(EBS) storage supports optimized storage, users can’t extend storage volume two times in a row. Unfortunately, at the first time, I didn’t know it (again 👻) and when VM lacked volume, I doubled dist capacity (76*2) at a rough but It needs more. <!– this time I installed GPU in two years, and it became little complicated compared to 2 years ago. And this time for the first time(maybe not the first time. . but i handled it in my class or with my friend. but it’s my first time on my own. ) I very I’m started to using used google colab, kaggleand, GCP-JupyterLab, ec2 - friend made, aws vm machine but I had a environment variable but i did not know of it. On these days, I could not get a resources from taiwan… I couldn’t notice a deliberate Anyway, as a result I tried myself gcp myself and aws ec2 with fast. 
ai But I think doing on my self surely takes much time (in this point I wonder why I’m doing this, and should remind me, especially I was studying disk volume optimization) disk volume exceed - https://askubuntu. com/questions/919748/no-space-left-on-device-even-though-there-is: "
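To see whether a just-modified volume is ready for another change, you can watch its modification state with the AWS CLI; a hypothetical sketch (the volume id below is made up):

~~~
$ aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 152
$ aws ec2 describe-volumes-modifications --volume-ids vol-0123456789abcdef0
# wait until the reported ModificationState is no longer 'modifying'/'optimizing'
# before trying to extend the same volume again
~~~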
}, {
- "id": 22,
+ "id": 24,
"url": "http://localhost:4000/2020/02/dps-week4/",
"title": "Digital Product School week 4",
"body": "2020/02/01 - The 4th week retropect at Digital Product School Week 4 - Find solution ideas and run experiments [This week’s schedule] CONTENT: Ideation Techniques What is ideation techniques? Generating idea in my team AIdeation Team brain storming of idea Die Produkt MacherMondayIdeation Techniques: [slides from @steffen] What is ideation techniques?: We tried to find out user’s painpoint last week. Tried to users talk about their, pain point. No question directly, but extract from them their pain with transportation. Generating idea in my team: AIdeation: TuesdayTeam brain storming of idea: Based on generated idea on Monday, we extended our idea doing rolling-paper! Die Produkt Macher: What is lean start-up? Lean startup is a methodology for developing businesses and products that aims to shorten product development cycles and rapidly discover if a proposed business model is viable; this is achieved by adopting a combination of business-hypothesis-driven experimentation, iterative product releases, and validated learning. - wikipedia WendesdayThursdayFriday"
}, {
- "id": 23,
+ "id": 25,
"url": "http://localhost:4000/2020/01/retrosprect-of-acl-paper-2020/",
"title": "Retrospect of ACL 2020 paper writing",
"body": "2020/01/29 - 2020 Annual Conference of the Association for Computational Linguistics Why I can’t use ‘Cebuano’ for the research?: Why I had to change target language from ‘Cebuano’ to ‘Tagalog’?-> No language translator options except google translation. But before knowing that I already consult my friend, whose mother tongue is English. So I had to aplogize her, but couldn’t tell her why suddenly I changed my plan. -> I realized there are many languages even can’t be researched at all. . -> Getting accustomed to discrimination makes misunderstanding, sometimes. At my country, we couldn’t use music streaming service, because of legal problem. But at that moment, I thought it was discrimination, which is done by music company. "
}, {
- "id": 24,
+ "id": 26,
"url": "http://localhost:4000/2020/01/Git-Merge/",
"title": "Why am I not listed as a contributor?!",
"body": "2020/01/10 - From the end of last year, big changes have witnessed in NLP research. Embracing an unprecedented growth, I started to study new exciting results and advances. In doing so, I noticed I’m not listed as contributor of repo which my PR accessed. How did I come to a repository?: When I’m stuck, I would prefer to code, than to go deep in theory. (It must be so. . too much to understand 🤒)It was BERT released by Google AI I felt keenly the necessity of implementing, because not only couldn’t understand the way they figured out positional encoding formula, but how it actually works. What does it mean to “scale” dot product in Attention? (Now I know it’s far from my section 😂) Figure 1. Scaled Dot Product. Adopted from tensorflow blogWhat was the code error?: For implement code in paper, I read the papers Transformer and BERT, structured the model, and refered the others’ code. Meanwhile, I found out a small error in tokenization process, which was changing a token into [MASK], enabled bidirectional representation. I’ve made PR, and got merged. But I was not in contributors. Why?: Figure 2. Merged Pull request Adopted from graykode projectActually I happened to know there can be couple of reasons github doesn’t include my name as contributor. Well, if contributors tab has more than 100 people, in which case it shows you up only if you are in the top 100 contributors because displaying too many contributors can make webpages down. Somethimes, however, it doesn’t that problem. Why not? Two possibilities are there. First, According to Joel-Glovier, if repository maintainer merged-as-a-rebase PR will end up showing as maintainer’s commit. But maintainer shouldn’t normally do this. Second, if you happend to commit using a different git email that what is in your GitHub profile, it will not be attached to your Github user, and “doesn’t show up” as you. Reference: Michał Chromiak’s blog Github: why are my contributions are not showing on my profile atlassian-gitfetch"
}, {
- "id": 25,
- "url": "http://localhost:4000/2019/12/lesson1-fastai/",
- "title": "Fine Grained Classification",
- "body": "2019/12/31 - Finally you can solve the mystery behind this weird drawing. . through this course. juptyer notebook magic: %reload_ext autoreload%autoreload 2%matplotlib inlinethis is special directives to jupyter notebook, not python code. And it is called ‘magics’ (but i think jeremy is magicion) If somebody changes underlying library code while I’m running this, please reload it automatically If somebody asks to plot something, then please plot it here in this Jupyter NotebookDon’t hesitate to import start~ Digging into untar_data, path. ls: Union[pathlib. Path, str]: typed programming language? -> maybe i think disclaim the type beforehand for sure. Q. like assert? path. ls()this is some module that fast. ai made because os. listdir(‘path’) is unconvinient. Python3 pathlib library!: pathlib "
- }, {
- "id": 26,
+ "id": 27,
"url": "http://localhost:4000/2019/12/jeremy-howard/",
"title": "Jeremy Howard",
"body": "2019/12/15 - This is journey to find out ‘who am I trying to be?’: How he impacted me? The person who made me start Computer Vision again. He emphasized the importance of studying NLP and Computer together to understand the deep-learning. He didn’t order it to study, but always he pursuade me with reasonable way. “It’s not just something I can throw away. NLP and computer vision a few weeks apart and that’s going to force your brain to realize like ‘oh I have to remember this’” He made me admit my failure in deep-learning. I started to objectify where am I. What should I do when I’m frustrated. “Keep going. You’re not expected to remember everything. Yet. You’re not expected to understand everything. Yet. You’re not expected to know why everything works. Yet. ” His articles are numerous, below. What is torch. nn Really? High Performance Numeric Programming with Swift: Explorations and Reflections C++11, random distributions, and Swift And especially, I like this book. Designing great data products Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing. Designing great data products And he is also famous for words. Here are some. we’re going to try and use that to really understand what’s going on. So to warn you, none of it is rocket science but a lot of its going to look really new. So don’t expect to get it the first time but expect to listen and jump into the notebook try a few things test things out look particularly at like tensor shapes and inputs and outputs to check your understanding then go back and listen again. But and kind of try it, a few times, because you will get there right, it’s just that there’s going to be a lot of new concepts because we haven’t done that much stuff in pure Pytorch. Lesson 6: Deep Learning 2019 "
}, {
- "id": 27,
+ "id": 28,
"url": "http://localhost:4000/2019/11/julia-evans/",
"title": "Julia Evans",
"body": "2019/11/20 - This is journey to find out ‘who am I trying to be?’: The women who surprised me in many ways. First, she approached me to teaching some concepts drawing cartoons. It was at Hackers news, which was hightest ranks. Personally I have the use of not to reading title, so and cartoon was so cute and clear. I naturally gonna understood mechanism and astonished by her explaination ability. Her value, which she was taught by many people so want to do same things, moved me. Volume of her knowledge, that just reading post title is a deal of work, amazed me. "
}, {
- "id": 28,
+ "id": 29,
"url": "http://localhost:4000/2019/11/coc-retropective/",
"title": "Retrospective on Pycon 2019 Korea (CoC Committee)",
"body": "2019/11/05 - When I was volunteer, it seems like busy and hectic to managing that crowded conference. In my experience, to get things moving, it needs hierarchy. But it didn’t. Organizers emphasized our responsibility, and if I passed each other’s burden, It could be my burden next time. In solidarity of the obligation, we finished conference well. And after participating PyCon Korea 2018 as volunteer, I’ve joined PyCon Korea Organizer last year. <Figure 1> First meeting of PyCon 2019 Korea Organizers It’s been a while since PyCon 2019 finished. It’s held on Aug 15 - 18, at Coex Grand Balloom <Figure 2> Ongoing session, speaking on news comment processing <Figure 3> Sponsor Booth iin Coex Hall <Figure 4> After PyCon 2019, with all of volunteer, organizer, speakers 😍 🥰 Serving as part of the coc TF, I spent large fraction of last year doing CoC job. here’s the path what we’ve been grappled with to grasp a solution. First half: Before the conference Toward Diverse Community: Formally we’ve been reusing and modifying PyCon US CoC, but we needed fit in Korean and I was part of that to revise code of conduct. Except ‘That’ Diversity, Because it is ‘Harassment’: Specific point was harassment, and the others were not. process of finding the points. How can we settle this point?Second half: During the conference Handling the potential Harassment: Disjunction of policy and real-time situation: This ‘PyCon 2019 Korea retrospective series’ would be devided into 3 Episodes. “Retrospective on Pycon 2019 Korea (CoC Committee)” “Retrospective on Pycon 2019 Korea (Program Chair)” (20 Nov, To Be Update) “Maintaining participation while still making timely decisions” (29 Nov, To Be Update)"
}, {
- "id": 29,
+ "id": 30,
"url": "http://localhost:4000/2019/11/elif-shafak/",
"title": "Elif Shafak",
"body": "2019/11/05 - This is journey to find out ‘who am I trying to be?’: For creative-minded people, Istanbul is a treasure. ’ Photo © Chris Boland, licensed under CC BY-NC-ND 2. 0 it suddenly felt like what I was trying to convey was more complicated and detailed than what the circumstances allowed me to say. And I did what I usually do in similar situations: I stammered, I shut down, and I stopped talking. I stopped talking because the truth was complicated, even though I knew, deep within, that one should never, ever remain silent for fear of complexity. <Figure 1> Elif Shafak Photo credit: www. elifsafak. com. tr I want to talk about emotions and the need to boost our emotional intelligence. I think it’s a pity that mainstream political theory pays very little attention to emotions. Oftentimes, analysts and experts are so busy with data and metrics that they seem to forget those things in life that are difficult to measure and perhaps impossible to cluster under statistical models. But I think this is a mistake, for two main reasons. We are emotional beings. I think it’s going to be one of our biggest intellectual challenges, because our political systems are replete with emotions. In country after country, we have seen illiberal politicians exploiting these emotions. And yet within the academia and among the intelligentsia, we are yet to take emotions seriously. I think we should. 1 2 Reference: British Council Worldwide ↩ Ted Talk ↩ "
}, {
- "id": 30,
+ "id": 31,
"url": "http://localhost:4000/2019/01/dps-week1/",
"title": "Digital Product School week 1",
"body": "2019/01/11 - The 1th week retropect at Digital Product School [This week’s schedule] CONTENT: Welcome to Digital Product School! Trip to Spitzingsee Welcome to Design Office Specifying our goal of product Welcome to Digital Product School!: Trip to Spitzingsee: At the first day of Digital Product School, we had a off-site with all of batch 9 people. All the costs were managed by dps. At the beautiful mountain, we settled the team, and got my team goal. Basically, there are two kind of team in DPS. (1) Wild team - the team has fixed topic(2) Company team - the team which has specific stakeholders, and also topic defined by that stakeholders The Core-team will fix what team you will join in DPS for 3 months based on ymy professionals, they announce it at off-site. [My team for 3 months at DPS] And we decide on my batch #9 theme song. How? Each team draw for songs and pitch ‘why this song should be batch #9 theme song’The result? Imagine dragon - Believer (I didn’t know at the moment, this song would be stamped in my memory) We have a workshop for getting to know each other. For example, we share 1) what do I expect from 3 months of dps, 2) when I feel happy in my life time, 3) what I worked for last week, 4) what was my last project and 5) what plays important role in my life My team's board Cero Welcome to Design Office: At first day of design office, we had workshop, which celebrates my day in dps also discuss specific rule, menifesto and stakeholders We get sticker and attach it in map depends on my nationality Now time to get to know my team’s stakeholders. What they want for us? What they expect from us? How free my team are on the topic?To be honest, it is endless tug-of-war. We should discuss with my stakeholders, endlessly, and find out solution which can meet interest of users, stakeholders and my team. Basically, my team’s main stakeholder is ADAC, but BMW, City of munich and Nokia will also participate as my team’s stakeholders. Specifying our goal of product: "
diff --git a/_site/2020/03/note08-fastai-3/index.html b/_site/2020/03/note08-fastai-3/index.html
index d217f06eac..a151dc55ed 100644
--- a/_site/2020/03/note08-fastai-3/index.html
+++ b/_site/2020/03/note08-fastai-3/index.html
@@ -19,9 +19,9 @@
-
+
+{"description":"This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring","author":{"@type":"Person","name":"dionne"},"@type":"BlogPosting","url":"http://localhost:4000/2020/03/note08-fastai-3/","publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"http://localhost:4000/assets/images/logo.png"},"name":"dionne"},"image":"http://localhost:4000/assets/images/4-backward3.png","headline":"Implement forward&backward pass from scratch","dateModified":"2020-03-01T00:00:00+09:00","datePublished":"2020-03-01T00:00:00+09:00","mainEntityOfPage":{"@type":"WebPage","@id":"http://localhost:4000/2020/03/note08-fastai-3/"},"@context":"http://schema.org"}
@@ -161,96 +161,101 @@
"body": " {% if page. url == / %} {% assign latest_post = site. posts[0] %} <div class= topfirstimage style= background-image: url({% if latest_post. image contains :// %}{{ latest_post. image }}{% else %} {{site. baseurl}}/{{ latest_post. image}}{% endif %}); height: 200px; background-size: cover; background-repeat: no-repeat; ></div> {{ latest_post. title }} : {{ latest_post. excerpt | strip_html | strip_newlines | truncate: 136 }} In {% for category in latest_post. categories %} {{ category }}, {% endfor %} {{ latest_post. date | date: '%b %d, %Y' }} {%- assign second_post = site. posts[1] -%} {% if second_post. image %} <img class= w-100 src= {% if second_post. image contains :// %}{{ second_post. image }}{% else %}{{ second_post. image | absolute_url }}{% endif %} alt= {{ second_post. title }} > {% endif %} {{ second_post. title }} : In {% for category in second_post. categories %} {{ category }}, {% endfor %} {{ second_post. date | date: '%b %d, %Y' }} {%- assign third_post = site. posts[2] -%} {% if third_post. image %} <img class= w-100 src= {% if third_post. image contains :// %}{{ third_post. image }}{% else %}{{site. baseurl}}/{{ third_post. image }}{% endif %} alt= {{ third_post. title }} > {% endif %} {{ third_post. title }} : In {% for category in third_post. categories %} {{ category }}, {% endfor %} {{ third_post. date | date: '%b %d, %Y' }} {%- assign fourth_post = site. posts[3] -%} {% if fourth_post. image %} <img class= w-100 src= {% if fourth_post. image contains :// %}{{ fourth_post. image }}{% else %}{{site. baseurl}}/{{ fourth_post. image }}{% endif %} alt= {{ fourth_post. title }} > {% endif %} {{ fourth_post. title }} : In {% for category in fourth_post. categories %} {{ category }}, {% endfor %} {{ fourth_post. date | date: '%b %d, %Y' }} {% for post in site. posts %} {% if post. tags contains sticky %} {{post. title}} {{ post. excerpt | strip_html | strip_newlines | truncate: 136 }} Read More {% endif %}{% endfor %} {% endif %} All Stories: {% for post in paginator. posts %} {% include main-loop-card. html %} {% endfor %} {% if paginator. total_pages > 1 %} {% if paginator. previous_page %} « Prev {% else %} « {% endif %} {% for page in (1. . paginator. total_pages) %} {% if page == paginator. page %} {{ page }} {% elsif page == 1 %} {{ page }} {% else %} {{ page }} {% endif %} {% endfor %} {% if paginator. next_page %} Next » {% else %} » {% endif %} {% endif %} {% include sidebar-featured. html %} "
}, {
"id": 12,
+ "url": "http://localhost:4000/2020/04/v3-2019-lesson06-note/",
+ "title": "fastai 2019 course-v3 Part1, lesson06",
+ "body": "2020/04/15 - Lesson 06Rossmann(Tabular): Tabular data: be careful on Categorical variable vs Continuous variable. if datatype is int, fastai think it is classification, not a regression. Root mean square percentage error. as loss function. When you assign the y_range, it’s better to assign little bit more than actual maximum. > because it’s sigmoid. intermediate layers, which is weight matrix is 1) 1000, and 2) 500 -> which means our parameter would be 500*1000. learn. modelWhat is dropout and embedding dropout?: Nitish Srivastava, Dropout: A Simple way to prevent Neural Networks from Overfitting you can dropout with p value, make it specified to specific layer, or make it applied to all the layers. Pytorch code 1) bernoulli, which decides whether you will hold it? 2) and divide the noise value depends on noise value. so noise became 2 or remain 0. According to pytorch code, We do change at training time, but we do nothing at test time. and this means you don’t have to do anything special with inference time. ’ TODO: find at forums what is inference time - Related to NVIDIA, GPU. Embedding dropout is just a dropout. It’s different between continuous variable and embedding layer. TODO Still can’t understand. why embedding dropout is effective. or,… in need. Let’s delete at random, some of the results of the embedding. and It worked well especially at Kaggle Batch Normalization: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift -> came out false! According to How Does Batch Normalization Help Optimization? The key was multiplicative bias {\gamma} and additive bias {\beta}` Explain Let $$ \hat{y} = f(w_1, w_2, w_3, … , x)} $$ , loss = MSE , Then y_range should be between 1 and 5` And Activation function ends with -1 -> +1 To mitigate this problem, we can add the other parameter, like $$w_n$$ But there’re so much interactions in the process so just re-scale the output. Momentum parameter at BatchNorm1d: Different from momentum like in optimization. This momentum is Exponentially weighted moving average of the mean, instead of deviation. If this is small number: mean standard deviation would be less from mini_batch to mini_batch » less regularization effect. (If this is large number, variation would be greater from mini_batch to mini_batch » more regularization effect) TODO: can’t sure, but i understand, this is not about how to update parameter but about how much reflect previous value when scale and shift Q. Preference between batchnorm and the other regularizations(drop out, weight decay)A. Nope, always try and see the results## lesson6-pets-more### Data Augmentation- Last reg- `get_transforms` has lots of params (even not yet learned all) -> check documentation - Remember you can implement all the doc contents bc it's made from nbdev - TODO: try this!!- Essence of data augmentation is you should maintain the label, while somewhat making sense. - ex) tilt, because it's optically sensible, you can always change the angle of the data view. - zeros, border, and reflection but always `reflection` works most of the time, so that is the default### Convolutional Kernel(What is convolution?)- Will make heat\_map from scratch, which means the parts convolution focuses on![setosa_visualization]()- http://setosa. io/ev/image-kernels/ - javascript thing - How convolution works - Kernel. which does element-wise multiplication, and sum them up - so it has on pixel less at borders -> so it uses padding, and fastai uses reflection as said. 
- why this Kernel(matrix) helps catching horizontal edge side? - because this kernel`(picture2)` weights differently, depends on `x axis` - why familiar, because it's similar intuition with fugus`(paper)` paper- CNN from different viewpoints`link` - output of pixel is results from different linear equations. - If you connect this with represents of neural network nodes, you can see that the specific inp nodes connected with specific out nodes. - **Summarize**: cnn does 1) matmul some of the elements are always zero 2) same weight for every row, which is called `weight time? weight. . ?, 1:18:50` `(picture)`#### Further lowdown- Because generally image has 3 channels, we need rank 3 kernel. - And **do multiply with all channel output is one pixel**. (`draw by your self`) - but this kernel will catch one feature, like horizontal, so that we make more kernel so that output becomes (h * w * kernel) - And that `kernel` come to `channel`- **Conv2d**: with 3 by 3 kernel, stride 2 conv -> (h/2 * w/2 * kernel) - skip or jump over input pixel - to protect from memory out of control~~~pythonlearn. modellearn. summary()~~~TODO: understand yourself the blocks of conv-kernel: - Usually use big kernel size at first layer (will study this at part2)- Bottom right highlighting kernel(`pic / draw`)- `torch. tensor. expand`: for memory efficient, because we should do RGB- We do not make separate kernel, but make rank 4 kernel - 4d tensor is just stacked kernel- `t[None]. shape` create new unit axis, and why? we make this -> it should move unit of batch, not one size image. ### Average pooling, feature- suppose our pre-trained model results in size of `11 by 11 by 512 ` `pic 4` and my classification task has 37 classes * take the first face of channel, which is 11 by 11 and `mean` it, so that make rank 2 tensor, 512 by 1 * and make 2d matrix, which is 512 by 37 and multiply so that we can get 37 by 1 matrix. - Feature, at convolution block - So, when we transfer-learning without unfreeze, every element of last matrix (512 by 1) should represent(or could catch) each feature. ### Heatmap, Hook~~~hook_output(model[0]) -> acts -> avg_acts~~~- if we average the block with `axis=feature`, result of matrix(11 by 11) depicts `how activated was that area?` -> it is heatmap, `avg_acts`- and acts comes from hook, which is more advanced pytorch feature. - hook into pytorch machine itself, and run any arbitrary Pytorch code - Why this is cool?: Normally it gives set of outputs of forward pass, but we can interrupt and hook the forward pass. - Also can store the output of the convolutional part of the model, which is before avg_pooling- Thinking back when we do cut off `after` the conv part. - but with fast. ai the original convolutional part of the model would be *the first thing in the model*, specifically could be given from `learn. model. eval()[0]` - And this is gotten from `hooked_output` and having hooked the output, we can pass our x_minibatch to output. - Not directly, but with normalized, minibatch, put on to the gpu - `one_item()` function do it, when we have one data `TODO: this is assignment` do it yourself without one_item function - and `. cuda()` put it on gpu- you should print out very often the shape of tensor, and try think why. "
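As a sanity check on the dropout description above, here is a minimal sketch (my own, not from the lesson) showing the train-time scaling and the test-time no-op:
~~~python
# a minimal sketch of the dropout behavior described above: at train time the
# kept activations are divided by (1-p), at eval time dropout does nothing
import torch

x = torch.ones(8)
drop = torch.nn.Dropout(p=0.5)

drop.train()
print(drop(x))  # entries are 2.0 (kept, scaled by 1/(1-p)) or 0.0 (dropped)

drop.eval()
print(drop(x))  # identity: all ones, nothing special needed at inference time
~~~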
+ }, {
+ "id": 13,
+ "url": "http://localhost:4000/2020/04/qna-image-segmentation/",
+ "title": "[Q&A] Image Segmentation, using Unet with Driving Video data",
+ "body": "2020/04/02 - This post is about my questions while I was studying USF Deep Learning course about image segmentation task. All the answers are from the course, source code, library document, or document. I cared about being clear at reporting information including source of information, however if there are still anything unclear, please contact me. And thank you Jeremy&Rachael for everything. Also Thank you Cambridge Computer Vision Lab to made us to study with your labor. The Cambridge-driving Labeled Video Database (CamVid) is the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes. If someone is interested in this project, please check the site and see the details. Now, let’s start first using jupyter’s one of tricks which I love most. It enables cell to print the code without print function. from IPython. core. interactiveshell import InteractiveShell# pretty print all cell's output and not just the last oneInteractiveShell. ast_node_interactivity = all from fastai. vision import *from fastai. callbacks. hooks import *from fastai. utils. mem import *path = untar_data(URLs. CAMVID) # The locations where the data and models are downloaded are set in config. ymlpath. ls() I’m trying to accustomed to using pathlib module, not just it became built-in module in python, but I felt uncomfortable myself with os module. However, still unpredictable conflicts are remain, even in the quite standard library like Pytorch, tensorflow, onnx. (it require me string for path. not PosixPath. will send PR. . ) [PosixPath('/root/. fastai/data/camvid/valid. txt'), PosixPath('/root/. fastai/data/camvid/images'), PosixPath('/root/. fastai/data/camvid/labels'), PosixPath('/root/. fastai/data/camvid/codes. txt')]path_img = path/'images'path_lbl = path/'labels'fnames = get_image_files(path_img) #filenamelbl_names = get_image_files(path_lbl)1. (Play with data) My Hypothesis: File name has A_B format. and A / B would be at key-value position. Use collections - defaultdict Default Dict: Link: easy to group a sequence of key and value pairs into a dictionary of list?from collections import defaultdictfnames[0], lbl_names[0](PosixPath('/root/. fastai/data/camvid/images/0001TP_009210. png'), PosixPath('/root/. fastai/data/camvid/labels/0016E5_01800_P. png'))files = [tuple(i. stem. split('_')) for i in fnames]labels = [tuple(i. stem. split('_')[:-1]) for i in lbl_names]d = defaultdict(list)for k, v in files: d[k]. append(v)d. keys()len(d['0001TP'])124for k, v in d. 
items(): print(k, v)0001TP ['009210', '008850', '007350', '008970', '009840', '010140', '008490', '008520', '009540', '008250', '008340', '006840', '007860', '007410', '007740', '009870', '010080', '007890', '008790', '010020', '008400', '007080', '008280', '010380', '009330', '009060', '007470', '006810', '009720', '008580', '007110', '008730', '009150', '007680', '009780', '007800', '007290', '008760', '009510', '008640', '008310', '007440', '006900', '007500', '008460', '009030', '008130', '009480', '009900', '010230', '009270', '008040', '007590', '007950', '009990', '008550', '007260', '008100', '007530', '006960', '008190', '009420', '009930', '009000', '007830', '008940', '006690', '009570', '008880', '010170', '007560', '009300', '006750', '009360', '010200', '007320', '008010', '009120', '007620', '007200', '007140', '010320', '006720', '008670', '007230', '008370', '010260', '009690', '006930', '009090', '007770', '010290', '010350', '008610', '008070', '009600', '008430', '009450', '007380', '009240', '007710', '007170', '008160', '008910', '007020', '006780', '007050', '009960', '009810', '008220', '009180', '009750', '010050', '009660', '010110', '007920', '009630', '007650', '006990', '008700', '009390', '007980', '008820', '006870']0016E5 ['01290', '08159', '05760', '08133', '08063', '06660', '00960', '05850', '00750', '06960', '08035', '08107', '07975', '08017', '05610', '07140', '08119', '08027', '07170', '08400', '08093', '02100', '06390', '04470', '08340', '06060', '00600', '07470', '08151', '07800', '01620', '05730', '01530', '00690', '08430', '05940', '01980', '07320', '08069', '07965', '04380', '05430', '01410', '06780', '08007', '08087', '08079', '06600', '08109', '05490', '00901', '04590', '04680', '08045', '01770', '06690', '08085', '06810', '00420', '08011', '07440', '02190', '06300', '04800', '01500', '00450', '08029', '01470', '06330', '07997', '08067', '05370', '08013', '08190', '00840', '02370', '08049', '08135', '01440', '06870', '05820', '05280', '08051', '04440', '08091', '01380', '00630', '07290', '05520', '04770', '00540', '07995', '07999', '05550', '07920', '08101', '08141', '08053', '04620', '08103', '05160', '07350', '08057', '06030', '06000', '08550', '07963', '08089', '05970', '08047', '05640', '06240', '05220', '04350', '01590', '07959', '01950', '08117', '06180', '01560', '05400', '08043', '07680', '00780', '08081', '07050', '01020', '01350', '04530', '06720', '07969', '08149', '08003', '08131', '08129', '08033', '05460', '01650', '07530', '08023', '05340', '08640', '05100', '08075', '01230', '04980', '02070', '01080', '06210', '05910', '08009', '01800', '05190', '02400', '08083', '08019', '07620', '07200', '07890', '08059', '06990', '04410', '08121', '08123', '06930', '08137', '08147', '08095', '06570', '06150', '08153', '06840', '05250', '00510', '08370', '08580', '08113', '07410', '08097', '01200', '04950', '07770', '07650', '04710', '06090', '08055', '07110', '07981', '00990', '08250', '08127', '01920', '07985', '08220', '08005', '08157', '05130', '08071', '01140', '04830', '07740', '08143', '06120', '02040', '08111', '08115', '00660', '08280', '06420', '07983', '02220', '05700', '01860', '01260', '04920', '06510', '07020', '08073', '08105', '08125', '06360', '07860', '07993', '00810', '06540', '08099', '08139', '02010', '07973', '08155', '07991', '06630', '00480', '06750', '04890', '08001', '08025', '00870', '08490', '01830', '07977', '05010', '01170', '07961', '01680', '01050', '07987', '07080', '04560', '00930', '05310', '02340', '05790', 
'08460', '00720', '08031', '02280', '08039', '08037', '08065', '06270', '08077', '06900', '04650', '06480', '07230', '08041', '06450', '00570', '07989', '04740', '07979', '02250', '07380', '00390', '01710', '07590', '08021', '08520', '07500', '01110', '04500', '02310', '07971', '02130', '05580', '05880', '08610', '08310', '08145', '05670', '04860', '07260', '08015', '07967', '01740', '01320', '07560', '07830', '01890', '08061', '02160', '07710', '05070', '05040']Seq05VD ['f00030', 'f02550', 'f03450', 'f01110', 'f00480', 'f00210', 'f04590', 'f04170', 'f01800', 'f03990', 'f03360', 'f03900', 'f02070', 'f00810', 'f03690', 'f01350', 'f01530', 'f04980', 'f05100', 'f03060', 'f00900', 'f03870', 'f02460', 'f01470', 'f02370', 'f02820', 'f04080', 'f02760', 'f04860', 'f02250', 'f04200', 'f00270', 'f03720', 'f02850', 'f04410', 'f01200', 'f03090', 'f02010', 'f03930', 'f00090', 'f01650', 'f01890', 'f03840', 'f03030', 'f02130', 'f01230', 'f04110', 'f02520', 'f04140', 'f04020', 'f00060', 'f03420', 'f01560', 'f00120', 'f04290', 'f02340', 'f00300', 'f01380', 'f00870', 'f01860', 'f02970', 'f04560', 'f02730', 'f00330', 'f04530', 'f03780', 'f01770', 'f03390', 'f05040', 'f02430', 'f03330', 'f00660', 'f01740', 'f02100', 'f04800', 'f04050', 'f00510', 'f02790', 'f04350', 'f00690', 'f00540', 'f02490', 'f00960', 'f00930', 'f04230', 'f02880', 'f03600', 'f01020', 'f01500', 'f02400', 'f04830', 'f04470', 'f03300', 'f02670', 'f00450', 'f01980', 'f01170', 'f01620', 'f04500', 'f01080', 'f03180', 'f05070', 'f03150', 'f04950', 'f01440', 'f03510', 'f01710', 'f00360', 'f04770', 'f02910', 'f01050', 'f00630', 'f04320', 'f00570', 'f03240', 'f02190', 'f01140', 'f03540', 'f02220', 'f02640', 'f03960', 'f00000', 'f04920', 'f01950', 'f00990', 'f03480', 'f03000', 'f00420', 'f04620', 'f03210', 'f00780', 'f03570', 'f01590', 'f00750', 'f01920', 'f04650', 'f03750', 'f03630', 'f02310', 'f02610', 'f02580', 'f04740', 'f02280', 'f04680', 'f00390', 'f00720', 'f03660', 'f02040', 'f03270', 'f00180', 'f03810', 'f01410', 'f01290', 'f03120', 'f00840', 'f04440', 'f00150', 'f01260', 'f02700', 'f02940', 'f00600', 'f01830', 'f04260', 'f05010', 'f04890', 'f02160', 'f00240', 'f04380', 'f01680', 'f04710', 'f01320']0006R0 ['f02820', 'f03690', 'f03180', 'f02550', 'f01020', 'f03660', 'f02340', 'f01170', 'f02610', 'f02940', 'f01290', 'f02100', 'f01350', 'f03270', 'f03870', 'f01380', 'f01980', 'f03810', 'f02430', 'f02310', 'f01830', 'f03480', 'f02970', 'f01890', 'f03210', 'f03930', 'f02040', 'f02070', 'f02400', 'f01560', 'f03030', 'f01770', 'f01590', 'f01950', 'f03420', 'f01650', 'f03450', 'f00990', 'f03630', 'f01500', 'f03570', 'f00930', 'f03090', 'f03360', 'f02880', 'f02460', 'f01440', 'f01920', 'f01230', 'f03840', 'f02730', 'f01620', 'f02220', 'f03750', 'f03330', 'f03540', 'f02520', 'f02790', 'f01050', 'f03120', 'f01800', 'f01140', 'f01860', 'f01530', 'f01470', 'f02670', 'f02490', 'f01260', 'f01110', 'f02760', 'f01680', 'f03150', 'f02580', 'f03300', 'f02280', 'f01200', 'f03390', 'f03510', 'f02640', 'f02190', 'f02370', 'f01320', 'f02130', 'f03600', 'f03240', 'f03780', 'f03720', 'f02700', 'f01410', 'f01080', 'f02850', 'f01710', 'f03900', 'f03060', 'f01740', 'f02010', 'f02250', 'f00960', 'f03000', 'f02160', 'f02910'] for k, v in d.items(): print(k, len(d[k])) 0001TP 124 0016E5 305 Seq05VD 171 0006R0 101 d2 = defaultdict(list) # the same grouping, keyed from the label file stems for k, v in labels: d2[k].append(v) for i in d2.keys(): print(i, len(d2[i])) 0016E5 305 0001TP 124 0006R0 101 Seq05VD 171 files[0], labels[0] (('0001TP', '009210'), ('0016E5', '01800')) 2. My question: Link: Why do we need masking? And does the color come from the fastai library? 
(have to look into the source code) What does the parameter alpha do? When people make a masked img, does it have a ranged integer limit? Is image normalization related to this? lbl_sorted = sorted(lbl_names) f_sorted = sorted(fnames) lbl_1 = lbl_sorted[33] f_1 = f_sorted[33] img = open_image(lbl_1) mask = open_mask(lbl_1) _,axs = plt.subplots(1,2, figsize=(10,5)) # img.show(ax=axs[0], y=mask, title='masked') img.show(ax=axs[0], title='1') mask.show(ax=axs[1], title='2', alpha=1.) img_2 = open_image(f_1) mask_2 = open_mask(f_1) _,axs = plt.subplots(1,2, figsize=(10,5)) # img.show(ax=axs[0], y=mask, title='masked') img_2.show(ax=axs[0], title='3') mask_2.show(ax=axs[1], title='4', alpha=1.) open_mask(lbl_1).data.shape torch.Size([1, 720, 960]) open_image(f_1).data.shape torch.Size([3, 720, 960]) img.data #labeled data tensor([[[0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], [0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], [0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], ..., [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176], [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176], [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176]], [[0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], [0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], [0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], ..., [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176], [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176], [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176]], [[0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], [0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], [0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], ..., [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176], [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176], [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176]]]) mask.data # after mask, labeled data tensor([[[ 4, 4, 4, ..., 21, 21, 21], [ 4, 4, 4, ..., 21, 21, 21], [ 4, 4, 4, ..., 21, 21, 21], ..., [17, 17, 17, ..., 30, 30, 30], [17, 17, 17, ..., 30, 30, 30], [17, 17, 17, ..., 30, 30, 30]]]) img_2.data, mask_2.data (tensor([[[0.0706, 0.0667, 0.0706, ..., 0.6431, 0.6549, 0.6627], [0.0745, 0.0706, 0.0706, ..., 0.6431, 0.6510, 0.6549], [0.0784, 0.0706, 0.0745, ..., 0.6392, 0.6588, 0.6588], ..., [0.0863, 0.0824, 0.0824, ..., 0.1333, 0.1216, 0.1255], [0.0902, 0.0863, 0.0824, ..., 0.1255, 0.1176, 0.1216], [0.0863, 0.0824, 0.0784, ..., 0.1137, 0.1059, 0.1137]], [[0.0706, 0.0667, 0.0706, ..., 0.7490, 0.7608, 0.7686], [0.0745, 0.0706, 0.0706, ..., 0.7451, 0.7569, 0.7608], [0.0784, 0.0706, 0.0745, ..., 0.7412, 0.7529, 0.7529], ..., [0.0980, 0.0941, 0.0941, ..., 0.1804, 0.1686, 0.1725], [0.1059, 0.1020, 0.0980, ..., 0.1725, 0.1647, 0.1686], [0.1020, 0.0980, 0.0941, ..., 0.1608, 0.1529, 0.1608]], [[0.0784, 0.0745, 0.0784, ..., 0.7569, 0.7686, 0.7765], [0.0824, 0.0784, 0.0784, ..., 0.7647, 0.7647, 0.7686], [0.0784, 0.0706, 0.0745, ..., 0.7608, 0.7647, 0.7647], ..., [0.1216, 0.1176, 0.1176, ..., 0.2000, 0.1882, 0.1922], [0.1176, 0.1137, 0.1098, ..., 0.1843, 0.1765, 0.1804], [0.1137, 0.1098, 0.1059, ..., 0.1725, 0.1647, 0.1725]]]), tensor([[[ 18, 17, 18, ..., 183, 186, 188], [ 19, 18, 18, ..., 183, 185, 186], [ 20, 18, 19, ..., 182, 185, 185], ..., [ 25, 24, 24, ..., 43, 40, 41], [ 26, 25, 24, ..., 41, 39, 40], [ 25, 24, 23, ..., 38, 36, 38]]])) 3. What is the difference between Image and ImageSegment?: ImageSegment An ImageSegment object has the same properties as an Image. The only difference is that when applying transformations to an ImageSegment, it will ignore the functions that deal with lighting and keep values of 0 and 1. It’s easy to show the segmentation mask over the associated Image by using the y argument of show_image. img = open_image(fnames[0]) mask = open_mask(lbl_names[0]) _,axs = plt.subplots(1,3, figsize=(8,4)) img.show(ax=axs[0], title='no mask') img.show(ax=axs[1], y=mask, title='masked') #seg mask over the img using the y arg mask.show(ax=axs[2], title='mask only', alpha=1.) vision.image 4. Why/how is an img divided by 255 and what it results in, in fast.ai: vision.image - If div=True, pixel values are divided by 255. to become floats between 0. and 1. At times, you want to get rid of distortions caused by lights and shadows in an image. Normalizing the RGB values of an image can at times be a simple and effective way of achieving this. The sum of a pixel’s values over all channels (which is S) divides each channel, so that the normalized values will be R/S, G/S and B/S (where S=R+G+B). Detailed explanation here 5. Python Evaluation Order: Python evaluates expressions from left to right. Notice that while evaluating an assignment, the right-hand side is evaluated before the left-hand side. mask_tmp, trg_tmp, void_tmp = 2, 1, 10 mask_tmp = trg_tmp != void_tmp print(mask_tmp, trg_tmp, void_tmp) # (1) target is not the same as void True 1 10 # Example 1 x = 1 y = 2 x,y = y,x+y x, y (2, 3) # Example 2 x = 1 y = 2 x = y y = x+y x, y (2, 4) 6. model learner parameter :: pct_start: A: The percentage of the total number of epochs during which the learning rate rises in one cycle. Q: Sorry, I’m still confused: one cycle in the new API only runs one epoch. How does the percentage of the total number of epochs work? Can you give an example, say learn.fit_one_cycle(10, slice(1e-4,1e-3,1e-2), pct_start=0.05)? A: Ok, the strictly correct answer would be the percentage of iterations, so the lr can both increase and decrease during the same epoch. In your example, say you have 100 iterations per epoch; then for half an epoch (0.05 * (10 * 100) = 50 iterations) the lr will rise, then slowly decrease. Q2: Thanks for this explanation … so essentially, it is the percentage of overall iterations where the LR is increasing, correct? So, given the default of 0.3, it means that your LR goes up for 30% of your iterations and then decreases over the last 70%. Is that a correct summation of what is happening? A2: Yes, I think that’s correct. You can verify it by changing the value and checking learn.recorder.plot_lr(), for example with pct_start = 0.2. source: forums.fastai "
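To make the forum answer concrete, here is a tiny worked example (my own; it just reuses the numbers from the thread) of how many iterations the learning rate rises for:
~~~python
# a worked example of the pct_start answer above: 10 epochs of 100 iterations
# with pct_start=0.05 means the lr rises for 5% of all iterations, then decays
epochs, iters_per_epoch, pct_start = 10, 100, 0.05
total_iters = epochs * iters_per_epoch   # 1000
rising = int(pct_start * total_iters)    # 50 iterations with the lr rising
print(rising, total_iters - rising)      # -> 50 950
~~~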
+ }, {
+ "id": 14,
"url": "http://localhost:4000/2020/03/note08-fastai-4/",
"title": "Gradient backward, Chain Rule, Refactoring",
- "body": "2020/03/02 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring” Lecture 08 - Deep Learning From Foundations-part2 “ Homework: calculus for machine learning einsum conventionCONTENTS: Foundation version Gradients backward pass decompose function chain rule with code check the result using Pytorch autograd Refactor model Layers as classes Modue. forward() Without einsum nn. Linear and nn. Module Forward process Foundation version: Gradients backward pass: Gradients is output with respect to parameter we’ve done this work in this path(below) to simplify this calculus, we can just change it into, So, you should know of the derivative of each bit on its own, and then you multiply them all together. As a result, it would be over cross over the data. So you can get gradient, output with respect to parameter What order should we calculate? BTW, why Jeremy wrote , not Loss function?1 decompose function We want to get derivative of which forms But, we have a estimation of answer (we call it y hat) now So, I will decompose funciton to trace target variable. Using the above forward pass, we can suppose some function from the end. start from , We know MSE funciton got two parameters, output, and target . from MSE’s input we know function’s output and supposing v is input of that function, similarly, v became output of chain rule with code examplify backward process by random sampling To get a variable, I modified forward model a little def model_ping(out = 'x_train'): l1 = lin(x_train, w1, b1) # one linear layer l2 = relu(l1) # one relu layer l3 = lin(l2, w2, b2) # one more linear layer return eval(out) Be careful we don’t use mse_loss in backward process1) start with the very last function, which is loss funciton. MSE If we codify this formula,def mse_grad(inp, targ): #mse_input(1000,1), mse_targ (1000,1) # grad of loss with respect to output of previous layer inp. g = 2. * (inp. squeeze() - targ). unsqueeze(-1) / inp. shape[0] And, this can be examplified like below. Notice that input of gradient function is same with forward functiony_hat = model_ping('l3') #get value from forward modely_hat. g = ((y_hat. squeeze(-1)-y_train). unsqueeze(-1))/y_hat. shape[0]y_hat. g. shape>>> torch. Size([50000, 1]) We can just calculate using broadcasting, not using squeeze. then why should do and unsqueeze again?🎯 It’s related with random access memory(RAM). . If I don’t squeeze, (I’m using colab) it out of RAM. 2) Derivative of linear2 function This process’s weight dimensions defined by axis=1, axis=2. axis=0 dimension means size of data. This will be summazed by . sum(0) method. unsqeeze(-1)&unsqeeze(1) seperates the dimension, and make a dot product, and vanish axis=0 dimension. def lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowlin2 = model_ping('l2'); #get value from forward modellin2. g = y_hat. g@w2. t(); w2. g = (lin2. unsqueeze(-1) * y_hat. g. unsqueeze(1)). sum(0);b2. g = y_hat. g. sum(0);lin2. g. shape, w2. g. shape, b2. g. shape>>> torch. Size([50000, 50])torch. Size([50, 1])torch. Size([1]) Notice going reverse order, we’re passing in gradient backward3) derivative of ReLU def relu_grad(inp, out): # grad of relu with respect to input activations inp. 
g = (inp>0). float() * out. g Examplified belowlin1=model_ping('l1') #get value from forward modellin1. g = (lin1>0). float() * lin2. g;lin1. g. shape>>> torch. Size([50000, 50])4) Derivative of linear1 Same process with 2) but, this process’s weight hasdef lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowx_train. g = lin1. g @ w1. t(); w1. g = (x_train. unsqueeze(-1) * lin1. g. unsqueeze(1)). sum(0); b1. g = lin1. g. sum(0);x_train. g. shape, w1. g. shape, b1. g. shape>>> torch. Size([50000, 784])torch. Size([784, 50])torch. Size([50])5) Then it goes backward pass def forward_and_backward(inp, targ): # forward pass: l1 = inp @ w1 + b1 l2 = relu(l1) out = l2 @ w2 + b2 # we don't actually need the loss in backward! loss = mse(out, targ) # backward pass: mse_grad(out, targ) lin_grad(l2, out, w2, b2) relu_grad(l1, l2) lin_grad(inp, l1, w1, b1)Version 1 (Basic)- Wall time: 1. 95 s Summary Notice that output of function at forward pass became input of backward pass backpropagation is just the chain rule value loss (loss=mse(out,targ)) is not used in gradient calcuation. Because, it doesn’t appear with the weight. w1g, w2g, b1g, b2g, ig will be used for optimizercheck the result using Pytorch autograd require_grad_ is the magical function, which can automatic differentiation. 2 This magical auto gradified tensor keep track what happend in forward (taking loss function), and do the backward3 So it saves our time to differentiate ourselves ⤵️ THis is benchmark…. . Version 2 (torch autograd)- Wall time: 3. 81 µs Refactor model: Amazingly, just refactoring our main pieces, it comes down up to Pytorch package. 🌟 Implement yourself, Practice, practice, practice! 🌟 Layers as classes: Relu and Linear are layers in oue neural net. -> make it as classes For the forward, using __call__ for the both of forward & backward. Because ‘call’ means we treat this as a function. class Lin(): def __init__(self, w, b): self. w,self. b = w,b def __call__(self, inp): self. inp = inp self. out = inp@self. w + self. b return self. out def backward(self): self. inp. g = self. out. g @ self. w. t() # Creating a giant outer product, just to sum it, is inefficient! self. w. g = (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) self. b. g = self. out. g. sum(0) Remember that in lin_grad function, we save bias&weight!!!!!💬 inp. g : gradient of the output with respect to the input. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 w. g : gradient of the output with respect to the weight. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 b. g : gradient of the output with respect to the bias. {: style=”color:grey; font-size: 90%; text-align: center;”} class Model(): def __init__(self, w1, b1, w2, b2): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ) def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() refer to Jeremy’s Model class, he put layers in list Dionne’s self-study note: Decomposing Jeremy’s Model class init needs weight, bias but not x data when call that class(a. k. a function) it gave x data and y label! jeremy composited function in layers. x = l(x) so concise…. . 
also utilized that layer list when backward ust reversing it (using python list’s method) And he is recursively calling the function on the result of the previous thing. ⬇️for l in self. layers: x = l(x)Q2: Don’t I need to declare magical autograd function, requires_grad_?{: style=”color:red; font-size: 130%; text-align: center;”} [The questions migrated to this article] Version 3 (refactoring - layer to class)- Wall time: 5. 25 µs Modue. forward(): Duplicate code makes execution time slow. Role of __call__ changed. No more __call__ for implementing forward pass. By initializing the forward with __call__, Module. forward() use overriding to maximize reusability. So any layer inherit Module, can use parent’s function. gradient of the output with respect to the weight (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) can be reexpressed using einsum, torch. einsum( bi,bj->ij , inp, out. g) Defining forward and Module enables Pytorch to out almost duplicatesVersion 4 (Module & einsum)- Wall time: 4. 29 µs Q2: Isn’t there any way to use broadcasting? Why we should use outer product?{: style=”color:red; font-size: 130%; text-align: center;”} Without einsum: Replacing einsum to matrix product is even more faster. torch. einsum( bi,bj->ij , inp, out. g)can be reexpressed using matrix product, inp. t() @ out. gVersion 5 (without einsum)- Wall time: 3. 81 µs nn. Linear and nn. Module: Torch’s package nn. Linear and nn. Module Version 6 (torch package)- Wall time: 5. 01 µs Final, Using torch. nn. Linear & torch. nn. Module~~~pythonclass Model(nn. Module): def init(self, n_in, nh, n_out): super(). init() self. layers = [nn. Linear(n_in,nh), nn. ReLU(), nn. Linear(nh,n_out)] self. loss = mse def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x. squeeze(), targ)class Model(): def init(self): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ)def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() ~~~ Footnote: fast. ai forums Lesson-8 ↩ pytorch docs - autograd ↩ stackoverflow - finding methods a object has ↩ "
+ "body": "2020/03/02 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring ” Lecture 08 - Deep Learning From Foundations-part2 “ Homework: calculus for machine learning einsum conventionCONTENTS: Foundation version Gradients backward pass decompose function chain rule with code check the result using Pytorch autograd Refactor model Layers as classes Modue. forward() Without einsum nn. Linear and nn. Module Forward process Foundation version: Gradients backward pass: Gradients is output with respect to parameter we’ve done this work in this path(below) to simplify this calculus, we can just change it into, So, you should know of the derivative of each bit on its own, and then you multiply them all together. As a result, it would be over cross over the data. So you can get gradient, output with respect to parameter What order should we calculate? BTW, why Jeremy wrote , not Loss function?1 decompose function We want to get derivative of which forms But, we have a estimation of answer (we call it y hat) now So, I will decompose funciton to trace target variable. Using the above forward pass, we can suppose some function from the end. start from , We know MSE funciton got two parameters, output, and target . from MSE’s input we know function’s output and supposing v is input of that function, similarly, v became output of chain rule with code examplify backward process by random sampling To get a variable, I modified forward model a little def model_ping(out = 'x_train'): l1 = lin(x_train, w1, b1) # one linear layer l2 = relu(l1) # one relu layer l3 = lin(l2, w2, b2) # one more linear layer return eval(out) Be careful we don’t use mse_loss in backward process1) start with the very last function, which is loss funciton. MSE If we codify this formula,def mse_grad(inp, targ): #mse_input(1000,1), mse_targ (1000,1) # grad of loss with respect to output of previous layer inp. g = 2. * (inp. squeeze() - targ). unsqueeze(-1) / inp. shape[0] And, this can be examplified like below. Notice that input of gradient function is same with forward functiony_hat = model_ping('l3') #get value from forward modely_hat. g = ((y_hat. squeeze(-1)-y_train). unsqueeze(-1))/y_hat. shape[0]y_hat. g. shape>>> torch. Size([50000, 1]) We can just calculate using broadcasting, not using squeeze. then why should do and unsqueeze again?🎯 It’s related with random access memory(RAM). . If I don’t squeeze, (I’m using colab) it out of RAM. 2) Derivative of linear2 function This process’s weight dimensions defined by axis=1, axis=2. axis=0 dimension means size of data. This will be summazed by . sum(0) method. unsqeeze(-1)&unsqeeze(1) seperates the dimension, and make a dot product, and vanish axis=0 dimension. def lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowlin2 = model_ping('l2'); #get value from forward modellin2. g = y_hat. g@w2. t(); w2. g = (lin2. unsqueeze(-1) * y_hat. g. unsqueeze(1)). sum(0);b2. g = y_hat. g. sum(0);lin2. g. shape, w2. g. shape, b2. g. shape>>> torch. Size([50000, 50])torch. Size([50, 1])torch. Size([1]) Notice going reverse order, we’re passing in gradient backward3) derivative of ReLU def relu_grad(inp, out): # grad of relu with respect to input activations inp. 
g = (inp>0). float() * out. g Examplified belowlin1=model_ping('l1') #get value from forward modellin1. g = (lin1>0). float() * lin2. g;lin1. g. shape>>> torch. Size([50000, 50])4) Derivative of linear1 Same process with 2) but, this process’s weight hasdef lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowx_train. g = lin1. g @ w1. t(); w1. g = (x_train. unsqueeze(-1) * lin1. g. unsqueeze(1)). sum(0); b1. g = lin1. g. sum(0);x_train. g. shape, w1. g. shape, b1. g. shape>>> torch. Size([50000, 784])torch. Size([784, 50])torch. Size([50])5) Then it goes backward pass def forward_and_backward(inp, targ): # forward pass: l1 = inp @ w1 + b1 l2 = relu(l1) out = l2 @ w2 + b2 # we don't actually need the loss in backward! loss = mse(out, targ) # backward pass: mse_grad(out, targ) lin_grad(l2, out, w2, b2) relu_grad(l1, l2) lin_grad(inp, l1, w1, b1)Version 1 (Basic)- Wall time: 1. 95 s Summary Notice that output of function at forward pass became input of backward pass backpropagation is just the chain rule value loss (loss=mse(out,targ)) is not used in gradient calcuation. Because, it doesn’t appear with the weight. w1g, w2g, b1g, b2g, ig will be used for optimizercheck the result using Pytorch autograd require_grad_ is the magical function, which can automatic differentiation. 2 This magical auto gradified tensor keep track what happend in forward (taking loss function), and do the backward3 So it saves our time to differentiate ourselves Postfix underscore means in pytorch, in-place function, What is in-place function?⤵️ THis is benchmark…. . Version 2 (torch autograd)- Wall time: 3. 81 µs Refactor model: Amazingly, just refactoring our main pieces, it comes down up to Pytorch package. 🌟 Implement yourself, Practice, practice, practice! 🌟 Layers as classes: Relu and Linear are layers in oue neural net. -> make it as classes For the forward, using __call__ for the both of forward & backward. Because ‘call’ means we treat this as a function. class Lin(): def __init__(self, w, b): self. w,self. b = w,b def __call__(self, inp): self. inp = inp self. out = inp@self. w + self. b return self. out def backward(self): self. inp. g = self. out. g @ self. w. t() # Creating a giant outer product, just to sum it, is inefficient! self. w. g = (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) self. b. g = self. out. g. sum(0) Remember that in lin_grad function, we save bias&weight!!!!!💬 inp. g : gradient of the output with respect to the input. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 w. g : gradient of the output with respect to the weight. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 b. g : gradient of the output with respect to the bias. {: style=”color:grey; font-size: 90%; text-align: center;”} class Model(): def __init__(self, w1, b1, w2, b2): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ) def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() refer to Jeremy’s Model class, he put layers in list Dionne’s self-study note: Decomposing Jeremy’s Model class init needs weight, bias but not x data when call that class(a. k. a function) it gave x data and y label! jeremy composited function in layers. x = l(x) so concise…. . 
also utilized that layer list when backward ust reversing it (using python list’s method) And he is recursively calling the function on the result of the previous thing. ⬇️for l in self. layers: x = l(x)Q2: Don’t I need to declare magical autograd function, requires_grad_?{: style=”color:red; font-size: 130%; text-align: center;”} [The questions migrated to this article] Version 3 (refactoring - layer to class)- Wall time: 5. 25 µs Modue. forward(): Duplicate code makes execution time slow. Role of __call__ changed. No more __call__ for implementing forward pass. By initializing the forward with __call__, Module. forward() use overriding to maximize reusability. So any layer inherit Module, can use parent’s function. gradient of the output with respect to the weight (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) can be reexpressed using einsum, torch. einsum( bi,bj->ij , inp, out. g) Defining forward and Module enables Pytorch to out almost duplicatesVersion 4 (Module & einsum)- Wall time: 4. 29 µs Q2: Isn’t there any way to use broadcasting? Why we should use outer product?{: style=”color:red; font-size: 130%; text-align: center;”} Without einsum: Replacing einsum to matrix product is even more faster. torch. einsum( bi,bj->ij , inp, out. g)can be reexpressed using matrix product, inp. t() @ out. gVersion 5 (without einsum)- Wall time: 3. 81 µs nn. Linear and nn. Module: Torch’s package nn. Linear and nn. Module Version 6 (torch package)- Wall time: 5. 01 µs Final, Using torch. nn. Linear & torch. nn. Module~~~pythonclass Model(nn. Module): def init(self, n_in, nh, n_out): super(). init() self. layers = [nn. Linear(n_in,nh), nn. ReLU(), nn. Linear(nh,n_out)] self. loss = mse def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x. squeeze(), targ)class Model(): def init(self): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ)def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() ~~~ Footnote: fast. ai forums Lesson-8 ↩ pytorch docs - autograd ↩ stackoverflow - finding methods a object has ↩ "
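As a quick sanity check on the hand-written mse_grad above, the following sketch (my own, not from the lecture) compares it with PyTorch autograd on random data:
~~~python
# a sanity-check sketch: the hand-derived MSE gradient above should match
# what PyTorch autograd computes for the same loss
import torch

inp = torch.randn(1000, 1, requires_grad=True)   # stand-in model output
targ = torch.randn(1000)                         # stand-in targets

loss = (inp.squeeze(-1) - targ).pow(2).mean()    # mse, as in the note
loss.backward()                                  # autograd fills inp.grad

manual = 2. * (inp.detach().squeeze(-1) - targ).unsqueeze(-1) / inp.shape[0]
print(torch.allclose(inp.grad, manual))          # -> True
~~~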
}, {
- "id": 13,
+ "id": 15,
"url": "http://localhost:4000/2020/03/note08-fastai-3/",
"title": "Implement forward&backward pass from scratch",
"body": "2020/03/01 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring1. The forward and backward passes: 1. 1 Normalization: train_mean,train_std = x_train. mean(),x_train. std()>>> train_mean,train_std(tensor(0. 1304), tensor(0. 3073))Remember! Dataset, which is x_train, mean and standard deviation is not 0&1. But we need them to be which means we should substract means and divide data by std. You should not standarlize validation set because training set and validation set should be aparted. after normalize, mean is close to zero, and standard deviation is close to 1. 1. 2 Variable definition: n,m: size of the training set c: the number of activations we need in our model2. Foundation Version: 2. 1 Basic architecture: Our model has one hidden layer, output to have 10 activations, used in cross entropy. But in process of building architecture, we will use mean square error, output to have 1 activations and lator change it to cross entropy number of hidden unit; 50see below pic We want to make w1&w2 mean and std be 0&1. why initializating and make mean zero and std one is important? paper highlighting importance of normalisation - training 10,000 layer network without regularisation1 2. 1. 1 simplified kaiming initQ: Why we did init, normalize with only validation data? Because we can not handle and get statistics from each value of x_valid?{: style=”color:red; font-size: 130%; text-align: center;”} what about hidden(first) layer?w1 = torch. randn(m,nh)b1 = torch. zeros(nh)t = lin(x_valid, w1, b1) # hidden>>> t. mean(), t. std()((tensor(2. 3191), tensor(27. 0303))In output(second) layer, w2 = torch. randn(nh,1)b2 = torch. zeros(1)t2 = lin(t, w2, b2) # output>>> t2. mean(), t2. std()(tensor(-58. 2665), tensor(170. 9717)) which is terribly far from normalzed value. But if we apply simplified kaiming init w1 = torch. randn(m,nh)/math. sqrt(m); b1 = torch. zeros(nh)w2 = torch. randn(nh,1)/math. sqrt(nh); b2 = torch. zeros(1)t = lin(x_valid, w1, b1)t. mean(),t. std()>>> (tensor(-0. 0516), tensor(0. 9354)) But, actually, we use activations not only linear function After applying activations relu at linear layer, mean and deviation became 0. 5. 2. 1. 2 Glorrot initializationPaper2: Understanding the difficulty of training deep feedforward neural networks Gaussian(, bell shaped, normal distributions) is not trained very well. How to initialize neural nets? with the size of layer , the number of filters . But there is No acount for import of ReLU If we got 1000 layers, vanishing gradients problem emerges2. 1. 3 Kaiming initializatingPaper3: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Kaiming He, explained here rectifier: rectified linear unit rectifier network: neural network with rectifier linear units This is kaiming init, and why suddenly replace one to two on a top? to avoid vanishing gradient(weights) But it doesn’t give very nice mean tough. 2. 1. 4 Pytorch package Why fan_out? according to pytorch documentation, choosing 'fan_in' preserves the magnitude of the variance of the wights in the forward pass. choosing 'fan_out' preserves the magnitues in the backward pass(, which means matmul; with transposed matrix) ➡️ in the other words, torch use fan_out cz pytorch transpose in linear transformaton. What about CNN in Pytorch?I tried torch. nn. 
Conv2d. conv2d_forward?? Jeremy digged into using torch. nn. modules. conv. _ConvNd. reset_parameters?? 2 in Pytorch, it doesn’t seem to be implemented kaiming init in right formula. so we should use our own operation. But actually, this has been discussed in Pytorch community before. 3 4 Jeremy said it enhanced variance also, so I sampled 100 times and counted better results. To make sure the shape seems sensible. check with assert. (remember we will replace 1 to 10 in cross entropy)assert model(x_valid). shape==torch. Size([x_valid. shape[0],1])>>> model(x_valid). shape(10000, 1) We have made Relu, init, linear, it seems we can forward pass code we need for basic architecture nh = 50def lin(x, w, b): return x@w + b;w1 = torch. randn(m,nh)*math. sqrt(2. /m ); b1 = torch. zeros(nh)w2 = torch. randn(nh,1); b2 = torch. zeros(1)def relu(x): return x. clamp_min(0. ) - 0. 5t1 = relu(lin(x_valid, w1, b1))def model(xb): l1 = lin(xb, w1, b1) l2 = relu(l1) l3 = lin(l2, w2, b2) return l32. 2 Loss function: MSE: Mean squared error need unit vector, so we remove unit axis. def mse(output, targ): return (output. squeeze(-1) - targ). pow(2). mean() In python, in case you remove axis, you use ‘squeeze’, or add axis use ‘unsqueeze’ torch. squeeze where code commonly broken. so, when you use squeeze, clarify dimension axis you want to removetmp = torch. tensor([1,1])tmp. squeeze()>>> tensor([1, 1]) make sure to make as float when you calculateBut why??? because it is tensor?{: style=”color:red; font-size: 130%;”} Here’s the error when I don’t transform the data type ---------------------------------------------------------------------------TypeError Traceback (most recent call last)<ipython-input-22-ae6009bef8b4> in <module>()----> 1 y_train = get_data()[1] # call data again 2 mse(preds, y_train)TypeError: 'map' object is not subscriptable This is forward passFootnote: Other materials: Understanding the difficulty of training deep feedforward neural networks, paper that introduced Xavier initialization Fixup Initialization: Residual Learning Without Normalization ↩ Pytorch implementaion on Kaiming init of conv and linear layers ↩ Pytorch kaiming init issue ↩ Pytorch kaiming init explained ↩ "
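The pieces above are scattered through the note, so here is a minimal, self-contained sketch of the forward pass with simplified Kaiming init and the MSE loss, assuming MNIST-sized inputs (m = 784) and the post's nh = 50; the shapes and the -0.5 shift after ReLU follow the lesson, while the random x_valid/y_valid placeholders are illustrative only.
~~~python
# A sketch of the forward pass described above; the random tensors stand in
# for the real MNIST data, which this sketch does not download.
import math
import torch

m, nh = 784, 50                      # inputs per example, hidden units
x_valid = torch.randn(10000, m)      # placeholder for the real x_valid
y_valid = torch.randn(10000)         # placeholder target

# simplified Kaiming init: scale by sqrt(2/fan_in) to keep std near 1
w1 = torch.randn(m, nh) * math.sqrt(2. / m);  b1 = torch.zeros(nh)
w2 = torch.randn(nh, 1) * math.sqrt(2. / nh); b2 = torch.zeros(1)

def lin(x, w, b): return x @ w + b
def relu(x): return x.clamp_min(0.) - 0.5    # shift to re-center the mean

def model(xb): return lin(relu(lin(xb, w1, b1)), w2, b2)
def mse(output, targ): return (output.squeeze(-1) - targ).pow(2).mean()

assert model(x_valid).shape == torch.Size([x_valid.shape[0], 1])
print(mse(model(x_valid), y_valid))
~~~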
}, {
- "id": 14,
+ "id": 16,
"url": "http://localhost:4000/2020/03/note08-fastai-2/",
"title": "What's inside Pytorch Operator?",
"body": "2020/03/01 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, RefactoringWhat’s inside Pytorch Operator?: Section02 Time comparison with pure Python: Matmul with broadcasting> 3194. 95 times faster Einstein summation> 16090. 91 times faster Pytorch’s operator> 49166. 67 times faster 1. Elementwise op: 1. 1 Frobenius norm: above converted into (m*m). sum(). sqrt() Plus, don’t suffer from mathmatical symbols. He also copy and paste that equations from wikipedia. and if you need latex form, download it from archive. 2. Elementwise Matmul: What is the meaning of elementwise? We do not calculate each component. But all of the component at once. Because, length of column of A and row of B are fixed. How much time we saved? So now that takes 1. 37ms. We have removed one line of code and it is a 178 times faster…#TODOI don’t know where the 5 from. but keep it. Maybe this is related with frobenius norm…?as a result, the code before for k in range(ac): c[i,j] += a[i,k] + b[k,j]the code after c[i,j] = (a[i,:] * b[:,j]). sum()To compare it (result betweet original and adjusted version) we use not test_eq but other function. The reason for this is that due to rounding errors from math operations, matrices may not be exactly the same. As a result, we want a function that will “is a equal to b within some tolerance” #exportdef near(a,b): return torch. allclose(a, b, rtol=1e-3, atol=1e-5)def test_near(a,b): test(a,b,near)test_near(t1, matmul(m1, m2))3. Broadcasting: Now, we will use the broadcasting and removec[i,j] = (a[i,:] * b[:,j]). sum() How it works?>>> a=tensor([[10,10,10], [20,20,20], [30,30,30]])>>> b=tensor([1,2,3,])>>> a,b (tensor([[10, 10, 10], [20, 20, 20], [30, 30, 30]]),tensor([1, 2, 3])) >>> a+btensor([[11, 12, 13], [21, 22, 23], [31, 32, 33]]) <Figure 2> demonstrated how array b is broadcasting(or copied but not occupy memory) to compatible with a. Refered from numpy_tutorial there is no loop, but it seems there is exactly the loop. This is not from jeremy (actually after a moment he cover it) but i wondered How to broadcast an array by columns? c=tensor([[1],[2],[3]])a+ctensor([[11, 11, 11], [22, 22, 22], [33, 33, 33]])s What is tensor. stride()?help(t. stride)Help on built-in function stride: stride(…) method of torch. Tensor instancestride(dim) -> tuple or intReturns the stride of :attr:’self’ tensor. Stride is the jump necessary to go from one element to the next one in the specified dimension :attr:’dim’. A tuple of all strides is returned when no argument is passed in. Otherwise, an integer value is returned as the stride in the particular dimension :attr:’dim’. Args: dim (int, optional): the desired dimension in which stride is requiredExample::* x = torch. tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])`x. stride()>>> (5, 1)x. stride(0)>>> 5x. stride(-1)>>> 1 unsqueeze & None index We can manipulate rank of tensor Special value ‘None’, which means please squeeze a new axis here== please broadcast herec = torch. tensor([10,20,30])c[None,:] in c, squeeze a new axis in here please. 2. 2 Matmul with broadcasting: for i in range(ar):# c[i,j] = (a[i,:]). *[:,j]. sum() #previous c[i] = (a[i]. unsqueeze(-1) * b). sum(dim=0) And Using None also (As howard teached)c[i] = (a[i ]. unsqueeze(-1) * b). sum(dim=0) #howardc[i] = (a[i][:,None] * b). sum(dim=0) # using Nonec[i] = (a[i,:,None]*b). 
sum(dim=0)⭐️Tips🌟 1) Anytime there’s a trailinng(final) colon in numpy or pytorch you can delete it ex) c[i, :] = c [i]2) any number of colon commas at the start, you can switch it with the single elipsis. ex) c[:,:,:,:,i] = c […,i] 2. 3 Broadcasting Rules: What if we tensor. size([1,3]) * tensor. size([3,1])? torch. Size([3, 3]) What is scale???? What if they are one array is times of the other array? ex) Image : 256 x 256 x 3Scale : 128 x 256 x 3Result: ? Why I did not inserted axis via None, but happened broadcasting? >>> c * c[:,None]tensor([[100. , 200. , 300. ], [200. , 400. , 600. ], [300. , 600. , 900. ]])maybe it broadcast cz following array has 3 rows as same principle, no matter what nature shape was, if we do the operation tensor broadcasts to the other. >>> c==c[None]tensor([[True, True, True]])>>> c[None]==c[None,:]tensor([[True, True, True]])>>>c[None,:]==ctensor([[True, True, True]])3. Einstein summation: Creates batch-wise, remove inner most loop, and replaced it with an elementwise producta. k. ac[i,j] += a[i,k] * b[k,j]inner most loop c[i,j] = (a[i,:] * b[:,j]). sum()elementwise product Because K is repeated so we do a dot product. And it is torch. Usage of einsum()1) transpose2) diagnalisation tracing3) batch-wise (matmul) … einstein summation notationdef matmul(a,b): return torch. einsum('ik,kj->ij', a, b)so after all, we are now 16000 times faster than Python. 4. Pytorch op: 49166. 67 times faster than pure python And we will use this matrix multiplication in Fully Connect forward, with some initialized parameters and ReLU. But before that, we need initialized parameters and ReLU, Footnote: TensorRank ti noteResources: Frobenius Norm Review Broadcasting Review (especially Rule) Refer colab! (I totally confused with extension of arrays) torch. allclose Review np. einsum Reviewh "
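To tie the three speed-ups above together, here is a small sketch (assuming plain PyTorch; m1 and m2 are made-up test matrices) showing the broadcasting version and the einsum version agreeing with PyTorch's own matmul within the same tolerances as near():
~~~python
# The broadcasting and einsum matmuls discussed above, checked against @.
import torch

def matmul_broadcast(a, b):
    ar, ac = a.shape
    c = torch.zeros(ar, b.shape[1])
    for i in range(ar):
        # a[i,:,None] broadcasts row i against every column of b at once
        c[i] = (a[i, :, None] * b).sum(dim=0)
    return c

def matmul_einsum(a, b):
    return torch.einsum('ik,kj->ij', a, b)   # repeated k is summed over

m1, m2 = torch.randn(4, 5), torch.randn(5, 3)
assert torch.allclose(matmul_broadcast(m1, m2), m1 @ m2, rtol=1e-3, atol=1e-5)
assert torch.allclose(matmul_einsum(m1, m2), m1 @ m2, rtol=1e-3, atol=1e-5)
~~~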
}, {
- "id": 15,
+ "id": 17,
"url": "http://localhost:4000/2020/02/note08-fastai-1/",
"title": "What is the meaning of 'deep-learning from foundations?'",
"body": "2020/02/29 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring” Lecture 08 - Deep Learning From Foundations-part2 “ I don’t know if you read this article, but I heartily appreciate Rachael Thomas and Jeremy Howard for providing these priceless lectures for free Homework: Review concepts 16 concepts from Course 1 (lessons 1 - 7)(1) Affine Functions & non-linearities; 2) Parameters & activations; 3) Random initialization & transfer learning; 4) SGD, Momentum, Adam; 5) Convolutions; Batch-norm; 6) Dropout; 7) Data augmentation; 8) Weight decay; 9) Res/dense blocks; 10) Image classification and regression; 11)Embeddings; 12) Continuous & Categorical variables; 13) Collaborative filtering; 14) Language models; 15) NLP classification; 16) Segmentation; U-net; GANS) Make sure you understand broadcasting Read section 2. 2 in Delving Deep into Rectifiers Try to replicate as much of the notebooks as you can without peeking; when you get stuck, peek at the lesson notebook, but then close it and try to do it yourself calculus for machine learning based on weight… einsum conventionCONTENTS: What is going on in this course? What is ‘from foundations’? Steps to a basic modern CNN model Today’s implementation goal: 1) matmul -> 4) FC backward Library development using jupyter notebook jupyter notebook certainly can make module Elementwise ops How can we make python faster? What is element wise operation? FootnoteWhat is going on in this course?: What is ‘from foundations’?: 1) Recreate fast. ai and Pytorch 2) using pure python Evade OverfittingOverfit : validation error getting worsetraining loss < validation loss Know the name of the symbol you usefind in this page if you don’t know the symbol that you are using or just draw it here (run by ML!) Steps to a basic modern CNN model: 1) Matrix multiplication -> 2) Relu/Initialization -> 3) Fully-connected Forward-> 4) Fully-connected Backward -> 5) Train loop -> 6) Convolution-> 7) Optimization ->8) Batchnormalization -> 9) Resnet Today’s implementation goal: 1) matmul -> 4) FC backward: Library development using jupyter notebook: what is assers? jupyter notebook certainly can make module: There will be #export tag that Howard (and we) want to extract special notebook2script. py will detect sign of #expert and convert following into python module and test ittest\_eq(TEST,'test')test\_eq(TEST,'test1') what is run_notebook. py? when you want to test your module in command line interface !python run\_notebook. py 01_matmul. ipynb Is there any difference between 1) and 2)?1) test -> test01 2) test01 -> test #TODO I don’t know yet look into run_notebook. py, package fire Jeremy used. What is that?read and run the code in a notebook, and in the process, Jeremy made Python Fire library called!shockingly, fire takes any kind of function and converts into CLI command. fire library was released by Google open source, Thursday, March 2, 2017 Get data pytorch and numpy are pretty much same. variable c explains how many pixels there are in in MNIST, 28 pixels PyTorch’s view() method: torch function that manipulating tensor, and squeeze() in torch & mathmatical operation similar function Rao & McMahan said usually this functions result in feature vector. In part 1, you can use view function several times. 
Initial python model Which is Linear, like $Xw$(weight)$+a$(bias) $= Y$ If you don’t know hou to multiple matrix, refer this site matmul visulization site How many time spends if we we use pure python function matmul, typical matrix multiplication function, takes about 1 second for calculating 1 single train data! (maybe assumed stochastic, 5 data points in validation) it takes about 11. 36 hours to update parameters even single layer and 1 iteration! (if that was my computer, it would be 14 hours. . )🤪 THIS is why we need to consider ‘time’&’space’ This is kinda slow - what if we could speed it up by 50,000 times? Let’s try! Elementwise ops: How can we make python faster?: If we want to calculate faster, then do remove pythonic calcuation, by passing its computation down to something that is written something other than python, like pytorch. According to PyTorch doc it uses C++ (via ATen), so we are going to implement that function with python. What is element wise operation?: items makes a pair, operate corresponding componentFootnote: notebooks material video broadcasting excel"
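For reference, here is a sketch of the 'pure Python' triple loop that the timings above are measured against (toy-sized matrices here; the lecture uses real MNIST batches):
~~~python
# The naive triple-loop matmul: every scalar multiply-add goes through the
# Python interpreter, which is why it is tens of thousands of times slower.
import torch

def matmul_pure(a, b):
    ar, ac = a.shape
    br, bc = b.shape
    assert ac == br
    c = torch.zeros(ar, bc)
    for i in range(ar):
        for j in range(bc):
            for k in range(ac):
                c[i, j] += a[i, k] * b[k, j]
    return c

a, b = torch.randn(8, 16), torch.randn(16, 4)
assert torch.allclose(matmul_pure(a, b), a @ b, atol=1e-5)
~~~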
}, {
- "id": 16,
+ "id": 18,
"url": "http://localhost:4000/2020/02/what-is-convolution/",
"title": "Digging into convolution",
"body": "2020/02/28 - Issues 1) Kaiming Initializtion in Pytorch was in trouble. 1 2) Jeremy started to dig in, in lesson09, but I didn’t know why the size of tensor is 2 and even understand this spreadsheet data. 3 Homework: Read Visualizing and Understanding Convolutional Networks paper What is a convolution? Visualization one kernel Matthew D Zeiler & Rob Fergus Paper Convolution can be represented as matmul Padding Kernel has rank 3 How can we find a side-edge, a gradient and area of constant weight? What is a convolution?: A convolutional neural network is that your red, green, and blue pixels go into the simple computation, and something comes out of that, and then the result of that goes into a second layer, and the result of that goes into the third layer and so forth. Visualization: one kernel Refer this site for visualizing CNN filteringMatthew D Zeiler & Rob Fergus PaperLecture01 Nine examples of the actual coefficients from the **first layer** Convolution can be represented as matmul: CNNs from different viewpoints {align-items: center;} [A B C D E F G H I J] is 3 by 3 image data flatten to vector. As a result, convolution is a just matrix just two things happens Some of entries are set to zeros at all the times same color always have the same weight. That called weight time / wegith sharing So, we can implement a convolution with matrix multiplication. But, we don’t do that because it’s slow!Padding: What most of libraries do is just put zeros asdie of matrix fast. ai uses reflection paddings (what is this? Jeremy said he uttered it)Kernel has rank 3: As standard picture input would be 4 5, it would be actually 3d, not 2d. If we make kernel as a 3x3 size, we pass over same kernel all the different Red, Green, Blue Pixels. This could make problem, because, if we want to detect frog, which is green, we would want more activations on the green(I made a test cell in my colab 6) How can we find a side-edge, a gradient and area of constant weight?: Not top-edge! One kernel can find only the top-edge, so we should stack the kernels 7 So, we pass it through bunch of kernels to the input images, and that process gives us height x width x corresponding number of kernels. Usually that number of chanel is 16 And if we want to get the more channels and features, we should repeat that process This process gives rise to memory out of control, we do the stride #### conv-example. xlsx 2 convolutional filters At a second layer, filter is 3x3x2 tensor, because to add up together the first layer’s channel. Reference: Problem was math. sqrt(5) was not kaiming initialization formula, Implementation in Pytorch ↩ size of tensor, lecture09 ↩ conv-example. xlsx ↩ Why do computer use red, green and blue instead of primary colors ↩ Grayscale is a group of shades without any visible color. … Each of these dots has its own brightness level as well and, therefore, can be converted to grayscale. A grayscale image is one with all color information removed. ↩ Testing RGB and grayscale ↩ stack kernel and make new rank of tensor at output, Lesson06-2019 ↩ "
}, {
- "id": 17,
+ "id": 19,
"url": "http://localhost:4000/2020/02/dps-week8/",
- "title": "Digital Product School week 8&9",
- "body": "2020/02/24 - The 8th week retropect at Digital Product School Week 8/9 - Ship your MVP/Release next iteration each day This week's schedule CONTENT: Preparing engineering weekly Agile Process Daily Stand-up Making application flowchart (feat draw. io) / ER diagram Flowchart, understaning user journey ER diagram Engineering weekly AI lunch Connecting firebase andPreparing engineering weekly: This week at Wednesday, I planned to explain the Language Modelings, mainly focusing ELMo, ULMFiT, BERT and GPT-2. Slides is available here Changed the presentation, because there were people who are not in ML domain. hereWhenever I do the presentation, I learn more than the information I give them. At the same time, I realize I need to learn more than I know. Agile Process: One of a priceless lesson I learnt from digital product school, was experience of doing agile work. Before I came here, it was a little bit vague concept. I’m not sure ‘what is agile’ but this is what we tried to make agile process. Daily Stand-up: Sharing the works everyday helps interdisciplinary team to work better. Since product started to get higher fidelity, the gap between engineer and non-engineer increased. Actually I didn’t planned to explain concept because I thougth I would be lose my audience when I start to explain. But as daily stand-up, which shares our progess, goes day by day, I planed and reported the issues. And it made each other’s topic feel more familiar. I think point is very important, because at that point people start to be curious. So we can actively ask to the others, and that momwnr, we can explain the point teammate dosen’t know. Each color means every different section. Red: Our team goal, Blue: Interaction designer, Green: Product manager, Yellow: Software/AI engineer This week engineer's main plan Each of us try to explain what we are doing, but things become easier when we are asked. Because we explained something was important to us before, but if we asked it is something important for the others. Making application flowchart (feat draw. io) / ER diagram: Before we start the party, we should clarify the flowchart and ER diagram of our application. Flowchart, understaning user journey: Thanks for google, we could use draw. io for our framechart framework. Actually, we cana choice other good flatform, but draw. io has connected app throgh google drive, most of our engineer was used to it. And after this job, I got to know there is also (of course) rule with the symbols, color, size, space, scaling and direction of arrow -reference. But why we should do this? WE have made our storymap before!! I think storymap is for visualize our status and app. So it should be shared with whole the team, and they should able to understand each role’s issue. But flowchart is more like testing technical feasibility, and error that user can experience. So it could be little more specific, complicated, and hypothetical. This week engineer's main plan ER diagram: Even if we use NoSQL database through firebase, my team was accustomed to SQL more. That what we educated when we were at college, so we had to organize our concept while we were learning NoSQL. Engineering weekly: Every engineering weekly we exchange our knowledge each other so that we can grow together. Before today, my AI collegues presented regression, knn and it was my turn. I prepared slide that explain about pre-trained language model, but my header advised me if I go deep of theoretical things, I would lose my audience. 
So I decided to brief BERT mode, how I can contribute to other team’s project. Since BERT was breakthrough of NLP industry, I tried to explain how it can be applied to hands on product and how it can help people in their product. The result was quite motivative to me. They gave feedback that since it wasn’t that much theoretical, they could enjoy it, and useful information. Someone asked me do I had learned of presentation before. I was really happy with their feedback! AI lunch: Connecting firebase and: "
+ "title": "My life in Digital Product School - week 8/19/10",
+ "body": "2020/02/24 - The 8/9/10th week retropect at Digital Product School Week 8 - Ship your MVPWeek 9/10 - Release next iteration each day Week 8th schedule CONTENT: Agile Product Development Daily Stand-up(planning) Gemba Walk Sprint Reviews Engineering weeklyAgile Product Development: One of a priceless lesson I learnt from digital product school, was experience of doing agile work. Before I came here, it was a little bit vague concept. I’m still not sure ‘what is agile’ but this is how we tried to make agile process. Daily Stand-up(planning): Sharing the works everyday helps interdisciplinary team to work better. Since product started to get higher fidelity, the gap between engineer and non-engineer increased. Actually I didn’t planned to explain concept because I thougth I would be lose my audience when I start to explain. But as daily stand-up, which shares our progess, goes day by day, I planed and reported the issues. And it made each other’s topic feel more familiar. I think point is very important, because at that point people start to be curious. So we can actively ask to the others, and that momwnr, we can explain the point teammate dosen’t know. Each color means every different section. Red: Our team goal, Blue: Interaction designer, Green: Product manager, Yellow: Software/AI engineer This week engineer's main plan Each of us try to explain what we are doing, but things become easier when we are asked. Because we explained something was important to us before, but if we asked it is something important for the others. Gemba Walk: Team Cero with core team Every 2 weeks, we do the Gemba work, which is ‘question everything to the core team’ time. At this period, people can ask anything related to our product, workshop, and framework. Core team will help just for each team, and each team can solve the problem related to their work. < br/>Why we need this session? because with workshop and general schedule, core team has no time just focus on each team. So through this session, we can have opportunity to understand each program and workshop, like why we are using this platform, and when is the due of our small project, and we have this problem and we need help for this. whatever small problem you have, core team is always willing to help you. Sprint Reviews: Every Friday, we have time to summarise what we did for the week. Maybe we need HMW question and our storymap to share our process and then tell and share what we did try, what point we succeeded and what point it was deviant of our prediction, and why we tried it. . Sprint of Ve-link And then, just after all team’s ppt, we do vote with such a cute marvel. Always it’s very difficult to vote (of course you can’t vote to your team!) Because it depends on criteria what do I value!But since this is process of our agile work, I try to focus on what they have changed since last week, and why they did it, how they did it. Engineering weekly: Every engineering weekly we exchange our knowledge each other so that we can grow together. Everyone have their knowledge to share and we can be tutor and at the same time can be of tutee. Previously, my AI collegues presented regression, knn. And because I’m somewhat specialized to NLP, I prepared slide that explain about pre-trained language model, but my header advised me if I go deep of theoretical things, I would lose my audience. So I decided to brief BERT mode, how I can contribute to other team’s project. 
Since BERT was breakthrough of NLP industry, I tried to explain how it can be applied to hands on product and how it can help people in their product. The result was quite motivative to me. They gave feedback that since it wasn’t that much theoretical, they could enjoy it, and useful information. Someone asked me do I had learned of presentation before. I was really happy with their feedback! "
}, {
- "id": 18,
+ "id": 20,
"url": "http://localhost:4000/2020/02/fast.ai-nlp-note-16/",
"title": "Algorithmic bias",
"body": "2020/02/20 - Algorithms can encode & magnify human bias Case Study 1: Facial Recognition & Predictive Policing: Joy Buolamwini & Timnit Gebru, gendershades. org Microsoft, FACE+, IBM - All of these things are sell now. Largest gap between $\therefore\ Lighter Male\ >\ Darker\ Female $ This US mayor joked cops should “mount . 50-caliber” guns where AI predicts crime With machine learning, with automation, there’s a 99% success, so that robot is ㅡwill beㅡ99% accurate in telling us what is going to happen next, which is really interesting. - city official in Lancater, CA, approving on using IBM for public security Bias: Bias is type of error Statistical Bias: difference between a statistic’s expected value and the true value Unjust Bias: disproportionate preference for or prejudice against a group Unconscious bias: bias that we don’t realize we have But, term bias is too generic to be productive. Different sources of bias have different causes Representation Bias: Dataset was not representative of the algorithm that might be used on later. Above : Data is okay, but algorithm has some problem. Below : Data has error. For example, object detection production that performs very well in common product of US. But in contrast, change of target product region, like Zimbabwe, Solomon Island, and so on, reduced the performence remarkably. It is not the algorithmic problem, so we should care about data volume of region. Evaluation Bias: Benchmark datasets spur on research, 4. 4% of IJB-A images are dark-skinned women. 2/3 of ImageNet images from the West (Sharkar et al, 2017) Case Study 2: Recidivism Algorithm Used Prison Sentencing: Case Study 3: Online Ad Delivery: Bias in NLP: ( Nothing to do with the course, but I’m researching this field these days. ) But all about Englsih ImpactThe person is doctor. The person is nurse -> 그는 의사다. 그녀는 간호사다. Concept of “biased data” often too generic to be useful: Different sources of bias have different sources Data, models and systems are not unchanging numbers on a screen. They’re the result of a complex process that starts with years of historical context and involves a series of choices and norms, from data measurement to model evaluation to human interpretation. - Harini Suresh, “The problem with Biased Data” Five Sources of Bias in ML: Representation Bias Evaluation Bias Measurement Bias Aggregation Bias(46:02) Historical Bias(46:26) A few studies(47:13) Racial Bias, Even when we have good intentions(new york times)(47:10) gender(48:59) Humans are biased, so why does algorithmic bias matter?: Algorithms & humans are used differently (humans are usually decision maker) Algorithms are accurate and objective No way to apeal if there if error processed large scale cheap Machine learning can amplify bias Machine learning can create feedback loops. Technology is power. And with that comes responsibility. Solutions: Analyze a project at work/school: Questions about AI 5 types of bias (Suresh & Guttag) Datasheets for datasets, Modelcards for model reporting Accuracy rate on different sub-groups Work with domain experts & those impacted Increase diversity in our workspace Advocate for good policy Be on the ongoing lookout for bias"
}, {
- "id": 19,
+ "id": 21,
"url": "http://localhost:4000/2020/02/classifier-city/",
"title": "Making a classifier with image dataset made from gooogle",
"body": "2020/02/15 - CONTENTS: Creating dataset from google images Using google_images_download Create ImageDataBunch Train model fit_one_cycle() Let’s find-tune Let’s train the whole model! Let’s make batch size bigger! Interpretation Model in productionCode can be found hereDeployed model here Making a classifier which can distinguish Seoul from Munich and Sanfrancisco!(hoping my well in Munich!) Creating dataset from google images: In machine learning, you always need data before you build your model. You can use either URLs or google_images_download package. Since Jeremy explained specifically, I will try the other. Using google_images_download: note: This is not google official package Refer to Official Doncument, put that arguments. from google_images_download import google_images_downloadresponse = google_images_download. googleimagesdownload() #class instantiationout_dir = os. path. abspath('. . /. . /materials/dataset/pkg/')os. mkdir(out_dir)arguments = { keywords : Cebu,Munich,Seoul , print_urls :True, suffix_keywords : city , output_directory :out_dir, type : photo , }paths = response. download(arguments) #passing the arguments to the functionprint(paths)and if you need, here is main code. Create ImageDataBunch: We need to separate validation set because we just grabbed these imagese from Google. Most of the dataset we use (kaggle/research) splited into train / validation / test so if they are not devided beforehand we should make databunch, and Jeremy recommended assign 20% to validation. Help on function verify_images in module fastai. vision. data:verify_images(path: Union[pathlib. Path, str], delete: bool = True, max_workers: int = 4, max_size: int = None, recurse: bool = False, dest: Union[pathlib. Path, str] = '. ', n_channels: int = 3, interp=2, ext: str = None, img_format: str = None, resume: bool = None, **kwargs) Check if the images in `path` aren't broken, maybe resize them and copy it in `dest`. Data from google image url Data from package Train model: len(class) len(train) len(valid) Data_url 3 432 108 Data_pkg 3 216 53 Uisng model: restnet34 1, Measurement: accuracy 2 fit_one_cycle(): What is fit one cycle? Cyclical Learning Rates for Training Neural Networks One of the way to find good learning rate. Core idea is to start with small learning rate (like 1e-4, 1e-3) and increase the learning rate after each mini-batch till loss starts exploding. And pick up learning rate one order lower than exploding point. For example, plotted learning rate is like below picture, picking up around 1e-2 is the best way. Why this methods Traditionally, the learning rate is decreased as the learning starts converging with time. But this paper suggests to cycle our learning rate, because it makes us avoid local minimum. Basically this cyclic method enables us to explore whole of loss function so that find out global minimum. In other words, higher learning rate behaves like regularisation. Let’s find-tune: Do train just one last layer by learning rate found by find_lr This section you should find the strongest downward slope that kind of sticking around for quite a while. And choose just one order lower than lowest point. As explained before, I will pick up 1e-2. And of course, this is fine-tuning, we don’t need discriminative learning rate yet. Let’s train the whole model!: link When you plot the learning rate again, maybe you will get soaring shape of learning rate. Rule of thumb, When you slice the learning rate, use learning rate you used at unfrozen part. 
Divide it by 5 or 10 and put it on maximum bound. At minimum bound, get the point just before it soared, and divide it by 10. Let’s make batch size bigger!: Since default batch size is 64, I tried it to 128. And it gets way more better result(even it’s still underfitting!) And if I freeze model and train whole model again, the model would be better. Also, you can use this method to the other big dataset model training! Interpretation: See the confusion matrix. Result is quite great. *Since I’m using colab, I will skip data cleansing. But I highly recommend you to use ImageCleaner widget, only if you are using jupyter notebook (not jupyter lab) Model in production: You can deploy your model in simple way. I referred fast. ai, and used render(it’s free for limited time). You can find detailed document here. and you can create a route like this. @app. route( /classify-url , methods=[ GET ])async def classify_url(request): bytes = await get_bytes(request. query_params[ url ]) img = open_image(BytesIO(bytes)) _,_,losses = learner. predict(img) return JSONResponse({ predictions : sorted( zip(cat_learner. data. classes, map(float, losses)), key=lambda p: p[1], reverse=True ) })You can find my deployed model here Reference: How to create a deep learning dataset using Google Images towardsdatascience - one cycle policy Deep Residual Learning for Image Recognition ↩ Accuracy_and_precision ↩ "
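The training recipe above, condensed into one hedged sketch (assuming fastai v1, as the post uses; `data` is the ImageDataBunch built earlier, and the exact learning rates depend on your own lr_find plot):
~~~python
# A sketch of the fine-tune -> unfreeze -> sliced-LR recipe described above.
from fastai.vision import *

learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(4)                 # train only the head first
learn.lr_find()                        # sweep learning rates
learn.recorder.plot()                  # inspect loss vs. learning rate
learn.unfreeze()                       # now train the whole model
# min bound: point just before the loss soars, /10; max bound: head LR /5-10
learn.fit_one_cycle(2, max_lr=slice(1e-5, 1e-3))
~~~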
}, {
- "id": 20,
+ "id": 22,
"url": "http://localhost:4000/2020/02/dps-week5/",
"title": "Digital Product School week 5",
"body": "2020/02/09 - The 5th week retropect at Digital Product School Week 5 - Create a Storymap and sync it with Lean Canvas This week's schedule CONTENT: How to create our story map Prepare your story Discover your product’s AI potentialMondayHow to create our story map: We need this 'aha' moment There was a Milestone workshop, about our weekly goal. As we are agile working, we go fast and change every week’s goal. This week we will finalize our story map based on user’s pain-point and HMW questions. How should we make our story-map Basically we should make story map based on this rule Tell stories, don’t just write them! We always need context, that means all the story component should be connected Visualize your product to establish a shared understanding and speed up discussions! Post-it filled of text is not enough, we should fill it with visualizations then team mates can understand it fast Only discuss in front our your story map! (Speed) So we can update our story-map as soon as we change our opinion And also Use a story map to find the parts that matter most and to identify holes in your idea! Since the story map consists of techinical part, we should consider each story’s technical feasibility Minimise output, maximise outcome and impact! Build tests to figure out what’s minimum and what’s viable! This story map functions to find out our minimum value of ideas Work iteratively: Change your story map according to your learnings! We should repeat this process again and again PMs: Make sure Storymap is up to date!Prepare your story: team cero, our whole story map Our goal Technical feasibility of our storyWhat is your strategy to make user achieve something? This would be our expand point Discover your product’s AI potential: How can we apply AI to our product? Let’s write down our ‘HMW’ questions, and find out all p ossibilities. These are suggestion of possibilities, so don’t attached to feasibility (we will do in at lean start-up) Software section's expectation AI section's expectationTuesday Engineer's task, week5This 5th week, engineers settled WendesdayThursdayFriday"
}, {
- "id": 21,
+ "id": 23,
"url": "http://localhost:4000/2020/02/GPU-time/",
"title": "4 reasons took much time to setting GPU for fast.ai than I expected",
"body": "2020/02/05 - Motivation: Before now, me as a undergraduate student, I was parsimony who usually depend on colab, kaggle, friend’s server(occasional) whenever i need GPU. . And this time it’s been for a while to install GPU than I expected and I share the several component that stood in my way. Written at Oct 24 2019, if you think this is deprecated, please do not have a leap of faith. Just for the record, I’ve used Kaggle, Colab, GCP, Azure, EC2 as GPU cloud. 1. Did not know there is JupyterLab option in Google Cloud Platform. : At the first time when GCP came out, there was no AI Platform service. So from starting vm instance to launching jupyter and installing packages, I did all of the things myself. (and I learned 🤗) $ curl -O https://repo. continuum. io/archive/Anaconda3-5. 0. 1-Linux-x86_64. sh[Downloading conda in ssh] I created VM instance,selected zone, machine type and disk type. Then, define firewall rules and in ssh terminal, install jupyter and other packages. But you can do all of these things just using AI Platform. [AI Platform] I think it especially save your time if you are living in Asia-Pacific, which google doesn’t support not that much GPU resources. 2. Consider if the platform has limited resources in a region you live in. : I live in South Korea, East Asia, and it seems like this region has lots of limitation in GPU (except quite expensive AWS) And the Taiwan which was the only one region where I can launch my own VM with GPU (I tried all the other regions in the list) sometimes do normaly, but not always. 😥After launching, I did several works and next day I could not start VM. (I didn’t count it, but tried it a few hours because I didn’t want cost any more time…) Endlessly failed to start instance, then I choose to move AWS as an alternative way. 3. Fast. ai gives deliberate guide and I didn’t know it. : Fast. ai offer the guide for all available platform. (Colab, salamander, Gradient, Kaggle, Colab, and so on) It is so important, and really needs, because cloud computing options are vary as occasion and purpose arise. I didn’t know fast. ai has manual to running GCP, and I think it’s as good a reason as any for me to be have taken time. It helped me so much when I had aws and shortened my time. I don’t want to read all of the manual in amazno. . (It is recommended. . but I’d rather read GIT PRO now…) ssh -i ~/. ssh/<your_private_key_pair> -L localhost:8888:localhost:8888 ubuntu@<your instance IP>4. You should wait to add more volume just after add volume, by building AWS EC2. : Since Elastic Block Store(EBS) storage supports optimized storage, users can’t extend storage volume two times in a row. Unfortunately, at the first time, I didn’t know it (again 👻) and when VM lacked volume, I doubled dist capacity (76*2) at a rough but It needs more. <!– this time I installed GPU in two years, and it became little complicated compared to 2 years ago. And this time for the first time(maybe not the first time. . but i handled it in my class or with my friend. but it’s my first time on my own. ) I very I’m started to using used google colab, kaggleand, GCP-JupyterLab, ec2 - friend made, aws vm machine but I had a environment variable but i did not know of it. On these days, I could not get a resources from taiwan… I couldn’t notice a deliberate Anyway, as a result I tried myself gcp myself and aws ec2 with fast. 
ai But I think doing on my self surely takes much time (in this point I wonder why I’m doing this, and should remind me, especially I was studying disk volume optimization) disk volume exceed - https://askubuntu. com/questions/919748/no-space-left-on-device-even-though-there-is: "
}, {
- "id": 22,
+ "id": 24,
"url": "http://localhost:4000/2020/02/dps-week4/",
"title": "Digital Product School week 4",
"body": "2020/02/01 - The 4th week retropect at Digital Product School Week 4 - Find solution ideas and run experiments [This week’s schedule] CONTENT: Ideation Techniques What is ideation techniques? Generating idea in my team AIdeation Team brain storming of idea Die Produkt MacherMondayIdeation Techniques: [slides from @steffen] What is ideation techniques?: We tried to find out user’s painpoint last week. Tried to users talk about their, pain point. No question directly, but extract from them their pain with transportation. Generating idea in my team: AIdeation: TuesdayTeam brain storming of idea: Based on generated idea on Monday, we extended our idea doing rolling-paper! Die Produkt Macher: What is lean start-up? Lean startup is a methodology for developing businesses and products that aims to shorten product development cycles and rapidly discover if a proposed business model is viable; this is achieved by adopting a combination of business-hypothesis-driven experimentation, iterative product releases, and validated learning. - wikipedia WendesdayThursdayFriday"
}, {
- "id": 23,
+ "id": 25,
"url": "http://localhost:4000/2020/01/retrosprect-of-acl-paper-2020/",
"title": "Retrospect of ACL 2020 paper writing",
"body": "2020/01/29 - 2020 Annual Conference of the Association for Computational Linguistics Why I can’t use ‘Cebuano’ for the research?: Why I had to change target language from ‘Cebuano’ to ‘Tagalog’?-> No language translator options except google translation. But before knowing that I already consult my friend, whose mother tongue is English. So I had to aplogize her, but couldn’t tell her why suddenly I changed my plan. -> I realized there are many languages even can’t be researched at all. . -> Getting accustomed to discrimination makes misunderstanding, sometimes. At my country, we couldn’t use music streaming service, because of legal problem. But at that moment, I thought it was discrimination, which is done by music company. "
}, {
- "id": 24,
+ "id": 26,
"url": "http://localhost:4000/2020/01/Git-Merge/",
"title": "Why am I not listed as a contributor?!",
"body": "2020/01/10 - From the end of last year, big changes have witnessed in NLP research. Embracing an unprecedented growth, I started to study new exciting results and advances. In doing so, I noticed I’m not listed as contributor of repo which my PR accessed. How did I come to a repository?: When I’m stuck, I would prefer to code, than to go deep in theory. (It must be so. . too much to understand 🤒)It was BERT released by Google AI I felt keenly the necessity of implementing, because not only couldn’t understand the way they figured out positional encoding formula, but how it actually works. What does it mean to “scale” dot product in Attention? (Now I know it’s far from my section 😂) Figure 1. Scaled Dot Product. Adopted from tensorflow blogWhat was the code error?: For implement code in paper, I read the papers Transformer and BERT, structured the model, and refered the others’ code. Meanwhile, I found out a small error in tokenization process, which was changing a token into [MASK], enabled bidirectional representation. I’ve made PR, and got merged. But I was not in contributors. Why?: Figure 2. Merged Pull request Adopted from graykode projectActually I happened to know there can be couple of reasons github doesn’t include my name as contributor. Well, if contributors tab has more than 100 people, in which case it shows you up only if you are in the top 100 contributors because displaying too many contributors can make webpages down. Somethimes, however, it doesn’t that problem. Why not? Two possibilities are there. First, According to Joel-Glovier, if repository maintainer merged-as-a-rebase PR will end up showing as maintainer’s commit. But maintainer shouldn’t normally do this. Second, if you happend to commit using a different git email that what is in your GitHub profile, it will not be attached to your Github user, and “doesn’t show up” as you. Reference: Michał Chromiak’s blog Github: why are my contributions are not showing on my profile atlassian-gitfetch"
}, {
- "id": 25,
- "url": "http://localhost:4000/2019/12/lesson1-fastai/",
- "title": "Fine Grained Classification",
- "body": "2019/12/31 - Finally you can solve the mystery behind this weird drawing. . through this course. juptyer notebook magic: %reload_ext autoreload%autoreload 2%matplotlib inlinethis is special directives to jupyter notebook, not python code. And it is called ‘magics’ (but i think jeremy is magicion) If somebody changes underlying library code while I’m running this, please reload it automatically If somebody asks to plot something, then please plot it here in this Jupyter NotebookDon’t hesitate to import start~ Digging into untar_data, path. ls: Union[pathlib. Path, str]: typed programming language? -> maybe i think disclaim the type beforehand for sure. Q. like assert? path. ls()this is some module that fast. ai made because os. listdir(‘path’) is unconvinient. Python3 pathlib library!: pathlib "
- }, {
- "id": 26,
+ "id": 27,
"url": "http://localhost:4000/2019/12/jeremy-howard/",
"title": "Jeremy Howard",
"body": "2019/12/15 - This is journey to find out ‘who am I trying to be?’: How he impacted me? The person who made me start Computer Vision again. He emphasized the importance of studying NLP and Computer together to understand the deep-learning. He didn’t order it to study, but always he pursuade me with reasonable way. “It’s not just something I can throw away. NLP and computer vision a few weeks apart and that’s going to force your brain to realize like ‘oh I have to remember this’” He made me admit my failure in deep-learning. I started to objectify where am I. What should I do when I’m frustrated. “Keep going. You’re not expected to remember everything. Yet. You’re not expected to understand everything. Yet. You’re not expected to know why everything works. Yet. ” His articles are numerous, below. What is torch. nn Really? High Performance Numeric Programming with Swift: Explorations and Reflections C++11, random distributions, and Swift And especially, I like this book. Designing great data products Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing. Designing great data products And he is also famous for words. Here are some. we’re going to try and use that to really understand what’s going on. So to warn you, none of it is rocket science but a lot of its going to look really new. So don’t expect to get it the first time but expect to listen and jump into the notebook try a few things test things out look particularly at like tensor shapes and inputs and outputs to check your understanding then go back and listen again. But and kind of try it, a few times, because you will get there right, it’s just that there’s going to be a lot of new concepts because we haven’t done that much stuff in pure Pytorch. Lesson 6: Deep Learning 2019 "
}, {
- "id": 27,
+ "id": 28,
"url": "http://localhost:4000/2019/11/julia-evans/",
"title": "Julia Evans",
"body": "2019/11/20 - This is journey to find out ‘who am I trying to be?’: The women who surprised me in many ways. First, she approached me to teaching some concepts drawing cartoons. It was at Hackers news, which was hightest ranks. Personally I have the use of not to reading title, so and cartoon was so cute and clear. I naturally gonna understood mechanism and astonished by her explaination ability. Her value, which she was taught by many people so want to do same things, moved me. Volume of her knowledge, that just reading post title is a deal of work, amazed me. "
}, {
- "id": 28,
+ "id": 29,
"url": "http://localhost:4000/2019/11/coc-retropective/",
"title": "Retrospective on Pycon 2019 Korea (CoC Committee)",
"body": "2019/11/05 - When I was volunteer, it seems like busy and hectic to managing that crowded conference. In my experience, to get things moving, it needs hierarchy. But it didn’t. Organizers emphasized our responsibility, and if I passed each other’s burden, It could be my burden next time. In solidarity of the obligation, we finished conference well. And after participating PyCon Korea 2018 as volunteer, I’ve joined PyCon Korea Organizer last year. <Figure 1> First meeting of PyCon 2019 Korea Organizers It’s been a while since PyCon 2019 finished. It’s held on Aug 15 - 18, at Coex Grand Balloom <Figure 2> Ongoing session, speaking on news comment processing <Figure 3> Sponsor Booth iin Coex Hall <Figure 4> After PyCon 2019, with all of volunteer, organizer, speakers 😍 🥰 Serving as part of the coc TF, I spent large fraction of last year doing CoC job. here’s the path what we’ve been grappled with to grasp a solution. First half: Before the conference Toward Diverse Community: Formally we’ve been reusing and modifying PyCon US CoC, but we needed fit in Korean and I was part of that to revise code of conduct. Except ‘That’ Diversity, Because it is ‘Harassment’: Specific point was harassment, and the others were not. process of finding the points. How can we settle this point?Second half: During the conference Handling the potential Harassment: Disjunction of policy and real-time situation: This ‘PyCon 2019 Korea retrospective series’ would be devided into 3 Episodes. “Retrospective on Pycon 2019 Korea (CoC Committee)” “Retrospective on Pycon 2019 Korea (Program Chair)” (20 Nov, To Be Update) “Maintaining participation while still making timely decisions” (29 Nov, To Be Update)"
}, {
- "id": 29,
+ "id": 30,
"url": "http://localhost:4000/2019/11/elif-shafak/",
"title": "Elif Shafak",
"body": "2019/11/05 - This is journey to find out ‘who am I trying to be?’: For creative-minded people, Istanbul is a treasure. ’ Photo © Chris Boland, licensed under CC BY-NC-ND 2. 0 it suddenly felt like what I was trying to convey was more complicated and detailed than what the circumstances allowed me to say. And I did what I usually do in similar situations: I stammered, I shut down, and I stopped talking. I stopped talking because the truth was complicated, even though I knew, deep within, that one should never, ever remain silent for fear of complexity. <Figure 1> Elif Shafak Photo credit: www. elifsafak. com. tr I want to talk about emotions and the need to boost our emotional intelligence. I think it’s a pity that mainstream political theory pays very little attention to emotions. Oftentimes, analysts and experts are so busy with data and metrics that they seem to forget those things in life that are difficult to measure and perhaps impossible to cluster under statistical models. But I think this is a mistake, for two main reasons. We are emotional beings. I think it’s going to be one of our biggest intellectual challenges, because our political systems are replete with emotions. In country after country, we have seen illiberal politicians exploiting these emotions. And yet within the academia and among the intelligentsia, we are yet to take emotions seriously. I think we should. 1 2 Reference: British Council Worldwide ↩ Ted Talk ↩ "
}, {
- "id": 30,
+ "id": 31,
"url": "http://localhost:4000/2019/01/dps-week1/",
"title": "Digital Product School week 1",
"body": "2019/01/11 - The 1th week retropect at Digital Product School [This week’s schedule] CONTENT: Welcome to Digital Product School! Trip to Spitzingsee Welcome to Design Office Specifying our goal of product Welcome to Digital Product School!: Trip to Spitzingsee: At the first day of Digital Product School, we had a off-site with all of batch 9 people. All the costs were managed by dps. At the beautiful mountain, we settled the team, and got my team goal. Basically, there are two kind of team in DPS. (1) Wild team - the team has fixed topic(2) Company team - the team which has specific stakeholders, and also topic defined by that stakeholders The Core-team will fix what team you will join in DPS for 3 months based on ymy professionals, they announce it at off-site. [My team for 3 months at DPS] And we decide on my batch #9 theme song. How? Each team draw for songs and pitch ‘why this song should be batch #9 theme song’The result? Imagine dragon - Believer (I didn’t know at the moment, this song would be stamped in my memory) We have a workshop for getting to know each other. For example, we share 1) what do I expect from 3 months of dps, 2) when I feel happy in my life time, 3) what I worked for last week, 4) what was my last project and 5) what plays important role in my life My team's board Cero Welcome to Design Office: At first day of design office, we had workshop, which celebrates my day in dps also discuss specific rule, menifesto and stakeholders We get sticker and attach it in map depends on my nationality Now time to get to know my team’s stakeholders. What they want for us? What they expect from us? How free my team are on the topic?To be honest, it is endless tug-of-war. We should discuss with my stakeholders, endlessly, and find out solution which can meet interest of users, stakeholders and my team. Basically, my team’s main stakeholder is ADAC, but BMW, City of munich and Nokia will also participate as my team’s stakeholders. Specifying our goal of product: "
diff --git a/_site/2020/03/note08-fastai-4/index.html b/_site/2020/03/note08-fastai-4/index.html
index 1495f2a44b..1ddeedd73b 100644
--- a/_site/2020/03/note08-fastai-4/index.html
+++ b/_site/2020/03/note08-fastai-4/index.html
@@ -19,9 +19,9 @@
-
+
+{"description":"This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring","author":{"@type":"Person","name":"dionne"},"@type":"BlogPosting","url":"http://localhost:4000/2020/03/note08-fastai-4/","publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"http://localhost:4000/assets/images/logo.png"},"name":"dionne"},"image":"http://localhost:4000/assets/images/4-classlin.png","headline":"Gradient backward, Chain Rule, Refactoring","dateModified":"2020-03-02T00:00:00+09:00","datePublished":"2020-03-02T00:00:00+09:00","mainEntityOfPage":{"@type":"WebPage","@id":"http://localhost:4000/2020/03/note08-fastai-4/"},"@context":"http://schema.org"}
@@ -161,96 +161,101 @@
"body": " {% if page. url == / %} {% assign latest_post = site. posts[0] %} <div class= topfirstimage style= background-image: url({% if latest_post. image contains :// %}{{ latest_post. image }}{% else %} {{site. baseurl}}/{{ latest_post. image}}{% endif %}); height: 200px; background-size: cover; background-repeat: no-repeat; ></div> {{ latest_post. title }} : {{ latest_post. excerpt | strip_html | strip_newlines | truncate: 136 }} In {% for category in latest_post. categories %} {{ category }}, {% endfor %} {{ latest_post. date | date: '%b %d, %Y' }} {%- assign second_post = site. posts[1] -%} {% if second_post. image %} <img class= w-100 src= {% if second_post. image contains :// %}{{ second_post. image }}{% else %}{{ second_post. image | absolute_url }}{% endif %} alt= {{ second_post. title }} > {% endif %} {{ second_post. title }} : In {% for category in second_post. categories %} {{ category }}, {% endfor %} {{ second_post. date | date: '%b %d, %Y' }} {%- assign third_post = site. posts[2] -%} {% if third_post. image %} <img class= w-100 src= {% if third_post. image contains :// %}{{ third_post. image }}{% else %}{{site. baseurl}}/{{ third_post. image }}{% endif %} alt= {{ third_post. title }} > {% endif %} {{ third_post. title }} : In {% for category in third_post. categories %} {{ category }}, {% endfor %} {{ third_post. date | date: '%b %d, %Y' }} {%- assign fourth_post = site. posts[3] -%} {% if fourth_post. image %} <img class= w-100 src= {% if fourth_post. image contains :// %}{{ fourth_post. image }}{% else %}{{site. baseurl}}/{{ fourth_post. image }}{% endif %} alt= {{ fourth_post. title }} > {% endif %} {{ fourth_post. title }} : In {% for category in fourth_post. categories %} {{ category }}, {% endfor %} {{ fourth_post. date | date: '%b %d, %Y' }} {% for post in site. posts %} {% if post. tags contains sticky %} {{post. title}} {{ post. excerpt | strip_html | strip_newlines | truncate: 136 }} Read More {% endif %}{% endfor %} {% endif %} All Stories: {% for post in paginator. posts %} {% include main-loop-card. html %} {% endfor %} {% if paginator. total_pages > 1 %} {% if paginator. previous_page %} « Prev {% else %} « {% endif %} {% for page in (1. . paginator. total_pages) %} {% if page == paginator. page %} {{ page }} {% elsif page == 1 %} {{ page }} {% else %} {{ page }} {% endif %} {% endfor %} {% if paginator. next_page %} Next » {% else %} » {% endif %} {% endif %} {% include sidebar-featured. html %} "
}, {
"id": 12,
+ "url": "http://localhost:4000/2020/04/v3-2019-lesson06-note/",
+ "title": "fastai 2019 course-v3 Part1, lesson06",
+ "body": "2020/04/15 - Lesson 06Rossmann(Tabular): Tabular data: be careful on Categorical variable vs Continuous variable. if datatype is int, fastai think it is classification, not a regression. Root mean square percentage error. as loss function. When you assign the y_range, it’s better to assign little bit more than actual maximum. > because it’s sigmoid. intermediate layers, which is weight matrix is 1) 1000, and 2) 500 -> which means our parameter would be 500*1000. learn. modelWhat is dropout and embedding dropout?: Nitish Srivastava, Dropout: A Simple way to prevent Neural Networks from Overfitting you can dropout with p value, make it specified to specific layer, or make it applied to all the layers. Pytorch code 1) bernoulli, which decides whether you will hold it? 2) and divide the noise value depends on noise value. so noise became 2 or remain 0. According to pytorch code, We do change at training time, but we do nothing at test time. and this means you don’t have to do anything special with inference time. ’ TODO: find at forums what is inference time - Related to NVIDIA, GPU. Embedding dropout is just a dropout. It’s different between continuous variable and embedding layer. TODO Still can’t understand. why embedding dropout is effective. or,… in need. Let’s delete at random, some of the results of the embedding. and It worked well especially at Kaggle Batch Normalization: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift -> came out false! According to How Does Batch Normalization Help Optimization? The key was multiplicative bias {\gamma} and additive bias {\beta}` Explain Let $$ \hat{y} = f(w_1, w_2, w_3, … , x)} $$ , loss = MSE , Then y_range should be between 1 and 5` And Activation function ends with -1 -> +1 To mitigate this problem, we can add the other parameter, like $$w_n$$ But there’re so much interactions in the process so just re-scale the output. Momentum parameter at BatchNorm1d: Different from momentum like in optimization. This momentum is Exponentially weighted moving average of the mean, instead of deviation. If this is small number: mean standard deviation would be less from mini_batch to mini_batch » less regularization effect. (If this is large number, variation would be greater from mini_batch to mini_batch » more regularization effect) TODO: can’t sure, but i understand, this is not about how to update parameter but about how much reflect previous value when scale and shift Q. Preference between batchnorm and the other regularizations(drop out, weight decay)A. Nope, always try and see the results## lesson6-pets-more### Data Augmentation- Last reg- `get_transforms` has lots of params (even not yet learned all) -> check documentation - Remember you can implement all the doc contents bc it's made from nbdev - TODO: try this!!- Essence of data augmentation is you should maintain the label, while somewhat making sense. - ex) tilt, because it's optically sensible, you can always change the angle of the data view. - zeros, border, and reflection but always `reflection` works most of the time, so that is the default### Convolutional Kernel(What is convolution?)- Will make heat\_map from scratch, which means the parts convolution focuses on![setosa_visualization]()- http://setosa. io/ev/image-kernels/ - javascript thing - How convolution works - Kernel. which does element-wise multiplication, and sum them up - so it has on pixel less at borders -> so it uses padding, and fastai uses reflection as said. 
- why this Kernel(matrix) helps catching horizontal edge side? - because this kernel`(picture2)` weights differently, depends on `x axis` - why familiar, because it's similar intuition with fugus`(paper)` paper- CNN from different viewpoints`link` - output of pixel is results from different linear equations. - If you connect this with represents of neural network nodes, you can see that the specific inp nodes connected with specific out nodes. - **Summarize**: cnn does 1) matmul some of the elements are always zero 2) same weight for every row, which is called `weight time? weight. . ?, 1:18:50` `(picture)`#### Further lowdown- Because generally image has 3 channels, we need rank 3 kernel. - And **do multiply with all channel output is one pixel**. (`draw by your self`) - but this kernel will catch one feature, like horizontal, so that we make more kernel so that output becomes (h * w * kernel) - And that `kernel` come to `channel`- **Conv2d**: with 3 by 3 kernel, stride 2 conv -> (h/2 * w/2 * kernel) - skip or jump over input pixel - to protect from memory out of control~~~pythonlearn. modellearn. summary()~~~TODO: understand yourself the blocks of conv-kernel: - Usually use big kernel size at first layer (will study this at part2)- Bottom right highlighting kernel(`pic / draw`)- `torch. tensor. expand`: for memory efficient, because we should do RGB- We do not make separate kernel, but make rank 4 kernel - 4d tensor is just stacked kernel- `t[None]. shape` create new unit axis, and why? we make this -> it should move unit of batch, not one size image. ### Average pooling, feature- suppose our pre-trained model results in size of `11 by 11 by 512 ` `pic 4` and my classification task has 37 classes * take the first face of channel, which is 11 by 11 and `mean` it, so that make rank 2 tensor, 512 by 1 * and make 2d matrix, which is 512 by 37 and multiply so that we can get 37 by 1 matrix. - Feature, at convolution block - So, when we transfer-learning without unfreeze, every element of last matrix (512 by 1) should represent(or could catch) each feature. ### Heatmap, Hook~~~hook_output(model[0]) -> acts -> avg_acts~~~- if we average the block with `axis=feature`, result of matrix(11 by 11) depicts `how activated was that area?` -> it is heatmap, `avg_acts`- and acts comes from hook, which is more advanced pytorch feature. - hook into pytorch machine itself, and run any arbitrary Pytorch code - Why this is cool?: Normally it gives set of outputs of forward pass, but we can interrupt and hook the forward pass. - Also can store the output of the convolutional part of the model, which is before avg_pooling- Thinking back when we do cut off `after` the conv part. - but with fast. ai the original convolutional part of the model would be *the first thing in the model*, specifically could be given from `learn. model. eval()[0]` - And this is gotten from `hooked_output` and having hooked the output, we can pass our x_minibatch to output. - Not directly, but with normalized, minibatch, put on to the gpu - `one_item()` function do it, when we have one data `TODO: this is assignment` do it yourself without one_item function - and `. cuda()` put it on gpu- you should print out very often the shape of tensor, and try think why. "
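A minimal sketch of the inverted-dropout behaviour described above, in plain PyTorch rather than the library's internals; the function name and shapes are illustrative, not from the lesson notebook:
~~~python
import torch

def dropout_sketch(x, p=0.5, training=True):
    # At test time dropout is the identity, matching the note that
    # nothing special happens at inference time.
    if not training:
        return x
    # 1) Bernoulli draw decides which activations are kept.
    keep = torch.bernoulli(torch.full_like(x, 1 - p))
    # 2) Divide kept activations by (1 - p): with p=0.5 each value is
    #    either doubled or zeroed, so the expected value is unchanged.
    return x * keep / (1 - p)

x = torch.ones(2, 4)
print(dropout_sketch(x))                   # entries are 2.0 or 0.0
print(dropout_sketch(x, training=False))   # unchanged at inference
~~~
Because the kept activations are rescaled at training time, the expected activation is the same in both modes, which is why no correction is needed at inference.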
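And a sketch of the average-pooling head and the heatmap averaging described under 'Average pooling' and 'Heatmap, Hook', assuming the 11 by 11 by 512 activation block and 37 classes from the note; a random tensor stands in for a real hooked activation:
~~~python
import torch

acts = torch.randn(512, 11, 11)   # hooked conv output: channels x h x w

# Pooling head: mean over the 11x11 face of each channel -> 512 features,
# then a 512x37 matrix maps the features to 37 class scores.
feats = acts.mean(dim=(1, 2))     # shape (512,)
w = torch.randn(512, 37)
scores = feats @ w                # shape (37,)

# Heatmap: average over the feature axis instead -> an 11x11 map of
# "how activated was that area?" (avg_acts in the lesson).
avg_acts = acts.mean(dim=0)       # shape (11, 11)
print(scores.shape, avg_acts.shape)
~~~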
+ }, {
+ "id": 13,
+ "url": "http://localhost:4000/2020/04/qna-image-segmentation/",
+ "title": "[Q&A] Image Segmentation, using Unet with Driving Video data",
+ "body": "2020/04/02 - This post is about my questions while I was studying USF Deep Learning course about image segmentation task. All the answers are from the course, source code, library document, or document. I cared about being clear at reporting information including source of information, however if there are still anything unclear, please contact me. And thank you Jeremy&Rachael for everything. Also Thank you Cambridge Computer Vision Lab to made us to study with your labor. The Cambridge-driving Labeled Video Database (CamVid) is the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes. If someone is interested in this project, please check the site and see the details. Now, let’s start first using jupyter’s one of tricks which I love most. It enables cell to print the code without print function. from IPython. core. interactiveshell import InteractiveShell# pretty print all cell's output and not just the last oneInteractiveShell. ast_node_interactivity = all from fastai. vision import *from fastai. callbacks. hooks import *from fastai. utils. mem import *path = untar_data(URLs. CAMVID) # The locations where the data and models are downloaded are set in config. ymlpath. ls() I’m trying to accustomed to using pathlib module, not just it became built-in module in python, but I felt uncomfortable myself with os module. However, still unpredictable conflicts are remain, even in the quite standard library like Pytorch, tensorflow, onnx. (it require me string for path. not PosixPath. will send PR. . ) [PosixPath('/root/. fastai/data/camvid/valid. txt'), PosixPath('/root/. fastai/data/camvid/images'), PosixPath('/root/. fastai/data/camvid/labels'), PosixPath('/root/. fastai/data/camvid/codes. txt')]path_img = path/'images'path_lbl = path/'labels'fnames = get_image_files(path_img) #filenamelbl_names = get_image_files(path_lbl)1. (Play with data) My Hypothesis: File name has A_B format. and A / B would be at key-value position. Use collections - defaultdict Default Dict: Link: easy to group a sequence of key and value pairs into a dictionary of list?from collections import defaultdictfnames[0], lbl_names[0](PosixPath('/root/. fastai/data/camvid/images/0001TP_009210. png'), PosixPath('/root/. fastai/data/camvid/labels/0016E5_01800_P. png'))files = [tuple(i. stem. split('_')) for i in fnames]labels = [tuple(i. stem. split('_')[:-1]) for i in lbl_names]d = defaultdict(list)for k, v in files: d[k]. append(v)d. keys()len(d['0001TP'])124for k, v in d. 
items(): print(k, v)0001TP ['009210', '008850', '007350', '008970', '009840', '010140', '008490', '008520', '009540', '008250', '008340', '006840', '007860', '007410', '007740', '009870', '010080', '007890', '008790', '010020', '008400', '007080', '008280', '010380', '009330', '009060', '007470', '006810', '009720', '008580', '007110', '008730', '009150', '007680', '009780', '007800', '007290', '008760', '009510', '008640', '008310', '007440', '006900', '007500', '008460', '009030', '008130', '009480', '009900', '010230', '009270', '008040', '007590', '007950', '009990', '008550', '007260', '008100', '007530', '006960', '008190', '009420', '009930', '009000', '007830', '008940', '006690', '009570', '008880', '010170', '007560', '009300', '006750', '009360', '010200', '007320', '008010', '009120', '007620', '007200', '007140', '010320', '006720', '008670', '007230', '008370', '010260', '009690', '006930', '009090', '007770', '010290', '010350', '008610', '008070', '009600', '008430', '009450', '007380', '009240', '007710', '007170', '008160', '008910', '007020', '006780', '007050', '009960', '009810', '008220', '009180', '009750', '010050', '009660', '010110', '007920', '009630', '007650', '006990', '008700', '009390', '007980', '008820', '006870']0016E5 ['01290', '08159', '05760', '08133', '08063', '06660', '00960', '05850', '00750', '06960', '08035', '08107', '07975', '08017', '05610', '07140', '08119', '08027', '07170', '08400', '08093', '02100', '06390', '04470', '08340', '06060', '00600', '07470', '08151', '07800', '01620', '05730', '01530', '00690', '08430', '05940', '01980', '07320', '08069', '07965', '04380', '05430', '01410', '06780', '08007', '08087', '08079', '06600', '08109', '05490', '00901', '04590', '04680', '08045', '01770', '06690', '08085', '06810', '00420', '08011', '07440', '02190', '06300', '04800', '01500', '00450', '08029', '01470', '06330', '07997', '08067', '05370', '08013', '08190', '00840', '02370', '08049', '08135', '01440', '06870', '05820', '05280', '08051', '04440', '08091', '01380', '00630', '07290', '05520', '04770', '00540', '07995', '07999', '05550', '07920', '08101', '08141', '08053', '04620', '08103', '05160', '07350', '08057', '06030', '06000', '08550', '07963', '08089', '05970', '08047', '05640', '06240', '05220', '04350', '01590', '07959', '01950', '08117', '06180', '01560', '05400', '08043', '07680', '00780', '08081', '07050', '01020', '01350', '04530', '06720', '07969', '08149', '08003', '08131', '08129', '08033', '05460', '01650', '07530', '08023', '05340', '08640', '05100', '08075', '01230', '04980', '02070', '01080', '06210', '05910', '08009', '01800', '05190', '02400', '08083', '08019', '07620', '07200', '07890', '08059', '06990', '04410', '08121', '08123', '06930', '08137', '08147', '08095', '06570', '06150', '08153', '06840', '05250', '00510', '08370', '08580', '08113', '07410', '08097', '01200', '04950', '07770', '07650', '04710', '06090', '08055', '07110', '07981', '00990', '08250', '08127', '01920', '07985', '08220', '08005', '08157', '05130', '08071', '01140', '04830', '07740', '08143', '06120', '02040', '08111', '08115', '00660', '08280', '06420', '07983', '02220', '05700', '01860', '01260', '04920', '06510', '07020', '08073', '08105', '08125', '06360', '07860', '07993', '00810', '06540', '08099', '08139', '02010', '07973', '08155', '07991', '06630', '00480', '06750', '04890', '08001', '08025', '00870', '08490', '01830', '07977', '05010', '01170', '07961', '01680', '01050', '07987', '07080', '04560', '00930', '05310', '02340', '05790', 
'08460', '00720', '08031', '02280', '08039', '08037', '08065', '06270', '08077', '06900', '04650', '06480', '07230', '08041', '06450', '00570', '07989', '04740', '07979', '02250', '07380', '00390', '01710', '07590', '08021', '08520', '07500', '01110', '04500', '02310', '07971', '02130', '05580', '05880', '08610', '08310', '08145', '05670', '04860', '07260', '08015', '07967', '01740', '01320', '07560', '07830', '01890', '08061', '02160', '07710', '05070', '05040']Seq05VD ['f00030', 'f02550', 'f03450', 'f01110', 'f00480', 'f00210', 'f04590', 'f04170', 'f01800', 'f03990', 'f03360', 'f03900', 'f02070', 'f00810', 'f03690', 'f01350', 'f01530', 'f04980', 'f05100', 'f03060', 'f00900', 'f03870', 'f02460', 'f01470', 'f02370', 'f02820', 'f04080', 'f02760', 'f04860', 'f02250', 'f04200', 'f00270', 'f03720', 'f02850', 'f04410', 'f01200', 'f03090', 'f02010', 'f03930', 'f00090', 'f01650', 'f01890', 'f03840', 'f03030', 'f02130', 'f01230', 'f04110', 'f02520', 'f04140', 'f04020', 'f00060', 'f03420', 'f01560', 'f00120', 'f04290', 'f02340', 'f00300', 'f01380', 'f00870', 'f01860', 'f02970', 'f04560', 'f02730', 'f00330', 'f04530', 'f03780', 'f01770', 'f03390', 'f05040', 'f02430', 'f03330', 'f00660', 'f01740', 'f02100', 'f04800', 'f04050', 'f00510', 'f02790', 'f04350', 'f00690', 'f00540', 'f02490', 'f00960', 'f00930', 'f04230', 'f02880', 'f03600', 'f01020', 'f01500', 'f02400', 'f04830', 'f04470', 'f03300', 'f02670', 'f00450', 'f01980', 'f01170', 'f01620', 'f04500', 'f01080', 'f03180', 'f05070', 'f03150', 'f04950', 'f01440', 'f03510', 'f01710', 'f00360', 'f04770', 'f02910', 'f01050', 'f00630', 'f04320', 'f00570', 'f03240', 'f02190', 'f01140', 'f03540', 'f02220', 'f02640', 'f03960', 'f00000', 'f04920', 'f01950', 'f00990', 'f03480', 'f03000', 'f00420', 'f04620', 'f03210', 'f00780', 'f03570', 'f01590', 'f00750', 'f01920', 'f04650', 'f03750', 'f03630', 'f02310', 'f02610', 'f02580', 'f04740', 'f02280', 'f04680', 'f00390', 'f00720', 'f03660', 'f02040', 'f03270', 'f00180', 'f03810', 'f01410', 'f01290', 'f03120', 'f00840', 'f04440', 'f00150', 'f01260', 'f02700', 'f02940', 'f00600', 'f01830', 'f04260', 'f05010', 'f04890', 'f02160', 'f00240', 'f04380', 'f01680', 'f04710', 'f01320']0006R0 ['f02820', 'f03690', 'f03180', 'f02550', 'f01020', 'f03660', 'f02340', 'f01170', 'f02610', 'f02940', 'f01290', 'f02100', 'f01350', 'f03270', 'f03870', 'f01380', 'f01980', 'f03810', 'f02430', 'f02310', 'f01830', 'f03480', 'f02970', 'f01890', 'f03210', 'f03930', 'f02040', 'f02070', 'f02400', 'f01560', 'f03030', 'f01770', 'f01590', 'f01950', 'f03420', 'f01650', 'f03450', 'f00990', 'f03630', 'f01500', 'f03570', 'f00930', 'f03090', 'f03360', 'f02880', 'f02460', 'f01440', 'f01920', 'f01230', 'f03840', 'f02730', 'f01620', 'f02220', 'f03750', 'f03330', 'f03540', 'f02520', 'f02790', 'f01050', 'f03120', 'f01800', 'f01140', 'f01860', 'f01530', 'f01470', 'f02670', 'f02490', 'f01260', 'f01110', 'f02760', 'f01680', 'f03150', 'f02580', 'f03300', 'f02280', 'f01200', 'f03390', 'f03510', 'f02640', 'f02190', 'f02370', 'f01320', 'f02130', 'f03600', 'f03240', 'f03780', 'f03720', 'f02700', 'f01410', 'f01080', 'f02850', 'f01710', 'f03900', 'f03060', 'f01740', 'f02010', 'f02250', 'f00960', 'f03000', 'f02160', 'f02910']for k, v in d. items(): print(k, len(d[k]))0001TP 1240016E5 305Seq05VD 1710006R0 101for i in d2. keys(): print(i,len(d2[i]))0016E5 3050001TP 1240006R0 101Seq05VD 171files[0], labels[0](('0001TP', '009210'), ('0016E5', '01800'))2. My question: Link: Why do we need masking? and does color from fastai library? 
(I have to look into the source code.) What does the parameter alpha do? When people make a masked img, is there a limit on the integer range? Is image normalization related to this? (See the normalization sketch after this entry.) lbl_sorted = sorted(lbl_names) f_sorted = sorted(fnames) lbl_1 = lbl_sorted[33] f_1 = f_sorted[33] img = open_image(lbl_1) mask = open_mask(lbl_1) _,axs = plt.subplots(1,2, figsize=(10,5)) # img.show(ax=axs[0], y=mask, title='masked') img.show(ax=axs[0], title='1') mask.show(ax=axs[1], title='2', alpha=1.) img_2 = open_image(f_1) mask_2 = open_mask(f_1) _,axs = plt.subplots(1,2, figsize=(10,5)) # img.show(ax=axs[0], y=mask, title='masked') img_2.show(ax=axs[0], title='3',) mask_2.show(ax=axs[1], title='4', alpha=1.) open_mask(lbl_1).data.shape torch.Size([1, 720, 960]) open_mask(lbl_1).data.shape torch.Size([1, 720, 960]) open_image(f_1).data.shape torch.Size([3, 720, 960]) open_image(f_1).data.shape torch.Size([3, 720, 960]) img.data # labeled data tensor([[[0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], [0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], [0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], ..., [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176], [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176], [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176]], [[0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], [0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], [0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], ..., [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176], [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176], [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176]], [[0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], [0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], [0.0157, 0.0157, 0.0157, ..., 0.0824, 0.0824, 0.0824], ..., [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176], [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176], [0.0667, 0.0667, 0.0667, ..., 0.1176, 0.1176, 0.1176]]]) mask.data # after mask, labeled data tensor([[[ 4, 4, 4, ..., 21, 21, 21], [ 4, 4, 4, ..., 21, 21, 21], [ 4, 4, 4, ..., 21, 21, 21], ..., [17, 17, 17, ..., 30, 30, 30], [17, 17, 17, ..., 30, 30, 30], [17, 17, 17, ..., 30, 30, 30]]]) img_2.data, mask_2.data (tensor([[[0.0706, 0.0667, 0.0706, ..., 0.6431, 0.6549, 0.6627], [0.0745, 0.0706, 0.0706, ..., 0.6431, 0.6510, 0.6549], [0.0784, 0.0706, 0.0745, ..., 0.6392, 0.6588, 0.6588], ..., [0.0863, 0.0824, 0.0824, ..., 0.1333, 0.1216, 0.1255], [0.0902, 0.0863, 0.0824, ..., 0.1255, 0.1176, 0.1216], [0.0863, 0.0824, 0.0784, ..., 0.1137, 0.1059, 0.1137]], [[0.0706, 0.0667, 0.0706, ..., 0.7490, 0.7608, 0.7686], [0.0745, 0.0706, 0.0706, ..., 0.7451, 0.7569, 0.7608], [0.0784, 0.0706, 0.0745, ..., 0.7412, 0.7529, 0.7529], ..., [0.0980, 0.0941, 0.0941, ..., 0.1804, 0.1686, 0.1725], [0.1059, 0.1020, 0.0980, ..., 0.1725, 0.1647, 0.1686], [0.1020, 0.0980, 0.0941, ..., 0.1608, 0.1529, 0.1608]], [[0.0784, 0.0745, 0.0784, ..., 0.7569, 0.7686, 0.7765], [0.0824, 0.0784, 0.0784, ..., 0.7647, 0.7647, 0.7686], [0.0784, 0.0706, 0.0745, ..., 0.7608, 0.7647, 0.7647], ..., [0.1216, 0.1176, 0.1176, ..., 0.2000, 0.1882, 0.1922], [0.1176, 0.1137, 0.1098, ..., 0.1843, 0.1765, 0.1804], [0.1137, 0.1098, 0.1059, ..., 0.1725, 0.1647, 0.1725]]]), tensor([[[ 18, 17, 18, ..., 183, 186, 188], [ 19, 18, 18, ..., 183, 185, 186], [ 20, 18, 19, ..., 182, 185, 185], ..., [ 25, 24, 24, ..., 43, 40, 41], [ 26, 25, 24, ..., 41, 39, 40], [ 25, 24, 23, ..., 38, 36, 38]]])) 3. What is the difference between an Image and an ImageSegment?: ImageSegment - An ImageSegment object has the same properties as an Image. The only difference is that when applying transformations to an ImageSegment, it will ignore the functions that deal with lighting and keep values of 0 and 1. It's easy to show the segmentation mask over the associated Image by using the y argument of show_image. img = open_image(fnames[0]) mask = open_mask(lbl_names[0]) _,axs = plt.subplots(1,3, figsize=(8,4)) img.show(ax=axs[0], title='no mask') img.show(ax=axs[1], y=mask, title='masked') #seg mask over the img using the y arg mask.show(ax=axs[2], title='mask only', alpha=1.) vision.image ## 4. Why/how is the img divided by 255, and what does it result in? fast.ai: vision.image - If div=True, pixel values are divided by 255. to become floats between 0. and 1. At times, you want to get rid of distortions caused by lights and shadows in an image. Normalizing the RGB values of an image can at times be a simple and effective way of achieving this. The sum of a pixel's values over all channels (call it S) divides each channel's intensity, so the normalized values are R/S, G/S and B/S (where S=R+G+B); see the sketch after this entry. Detailed explanation here. 5. Python evaluation order: Python evaluates expressions from left to right. Notice that while evaluating an assignment, the right-hand side is evaluated before the left-hand side. mask_tmp, trg_tmp, void_tmp = 2, 1, 10 mask_tmp = trg_tmp != void_tmp print(mask_tmp, trg_tmp, void_tmp) # (1) target is not the same as void True 1 10 # Example 1 x = 1 y = 2 x,y = y,x+y x, y (2, 3) # Example 2 x = 1 y = 2 x = y y = x+y x, y (2, 4) 6. Model learner parameter :: pct_start: A: Percentage of the total number of epochs during which the learning rate rises within one cycle. Q: Sorry, I'm still confused - doesn't one cycle in the new API only run one epoch? How does the percentage of the total number of epochs work? Can you give an example, say learn.fit_one_cycle(10, slice(1e-4,1e-3,1e-2), pct_start=0.05)? A: Ok, the strictly correct answer would be percentage of iterations, so the lr can both increase and decrease during the same epoch. In your example, say you have 100 iterations per epoch; then for half an epoch (0.05 * (10 * 100) = 50 iterations) the lr will rise, then slowly decrease. Q2: Thanks for this explanation ... so essentially, it is the percentage of overall iterations where the LR is increasing, correct? So, given the default of 0.3, your LR goes up for 30% of your iterations and then decreases over the last 70%. Is that a correct summary of what is happening? A2: Yes, I think that's correct. You can verify it by changing the value and checking learn.recorder.plot_lr(), for example with pct_start = 0.2. source: forums.fastai "
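A minimal sketch of the R/S, G/S, B/S normalization described in question 4, in plain PyTorch; the tensor shape mirrors the CamVid images above and the variable names are illustrative:
~~~python
import torch

img = torch.rand(3, 720, 960)        # an RGB tensor, values already in [0, 1]

# S = R + G + B per pixel; dividing each channel by S removes overall
# brightness (lights and shadows) so only colour proportions remain.
s = img.sum(dim=0, keepdim=True)     # shape (1, 720, 960), broadcasts below
img_norm = img / s.clamp_min(1e-8)   # guard against division by zero

print(img_norm.sum(dim=0).mean())    # ~1.0: channels now sum to 1 per pixel
~~~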
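And a small sketch of the pct_start arithmetic from question 6; the helper function is hypothetical, the numbers match the forum example:
~~~python
def rising_iterations(epochs, iters_per_epoch, pct_start):
    # pct_start is a fraction of *iterations*, not epochs: the LR rises
    # for pct_start of all iterations, then decreases for the rest.
    total = epochs * iters_per_epoch
    return int(pct_start * total)

print(rising_iterations(10, 100, 0.05))  # 50, i.e. half an epoch of rising LR
print(rising_iterations(10, 100, 0.3))   # 300: default 30% rising, 70% falling
~~~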
+ }, {
+ "id": 14,
"url": "http://localhost:4000/2020/03/note08-fastai-4/",
"title": "Gradient backward, Chain Rule, Refactoring",
- "body": "2020/03/02 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring” Lecture 08 - Deep Learning From Foundations-part2 “ Homework: calculus for machine learning einsum conventionCONTENTS: Foundation version Gradients backward pass decompose function chain rule with code check the result using Pytorch autograd Refactor model Layers as classes Modue. forward() Without einsum nn. Linear and nn. Module Forward process Foundation version: Gradients backward pass: Gradients is output with respect to parameter we’ve done this work in this path(below) to simplify this calculus, we can just change it into, So, you should know of the derivative of each bit on its own, and then you multiply them all together. As a result, it would be over cross over the data. So you can get gradient, output with respect to parameter What order should we calculate? BTW, why Jeremy wrote , not Loss function?1 decompose function We want to get derivative of which forms But, we have a estimation of answer (we call it y hat) now So, I will decompose funciton to trace target variable. Using the above forward pass, we can suppose some function from the end. start from , We know MSE funciton got two parameters, output, and target . from MSE’s input we know function’s output and supposing v is input of that function, similarly, v became output of chain rule with code examplify backward process by random sampling To get a variable, I modified forward model a little def model_ping(out = 'x_train'): l1 = lin(x_train, w1, b1) # one linear layer l2 = relu(l1) # one relu layer l3 = lin(l2, w2, b2) # one more linear layer return eval(out) Be careful we don’t use mse_loss in backward process1) start with the very last function, which is loss funciton. MSE If we codify this formula,def mse_grad(inp, targ): #mse_input(1000,1), mse_targ (1000,1) # grad of loss with respect to output of previous layer inp. g = 2. * (inp. squeeze() - targ). unsqueeze(-1) / inp. shape[0] And, this can be examplified like below. Notice that input of gradient function is same with forward functiony_hat = model_ping('l3') #get value from forward modely_hat. g = ((y_hat. squeeze(-1)-y_train). unsqueeze(-1))/y_hat. shape[0]y_hat. g. shape>>> torch. Size([50000, 1]) We can just calculate using broadcasting, not using squeeze. then why should do and unsqueeze again?🎯 It’s related with random access memory(RAM). . If I don’t squeeze, (I’m using colab) it out of RAM. 2) Derivative of linear2 function This process’s weight dimensions defined by axis=1, axis=2. axis=0 dimension means size of data. This will be summazed by . sum(0) method. unsqeeze(-1)&unsqeeze(1) seperates the dimension, and make a dot product, and vanish axis=0 dimension. def lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowlin2 = model_ping('l2'); #get value from forward modellin2. g = y_hat. g@w2. t(); w2. g = (lin2. unsqueeze(-1) * y_hat. g. unsqueeze(1)). sum(0);b2. g = y_hat. g. sum(0);lin2. g. shape, w2. g. shape, b2. g. shape>>> torch. Size([50000, 50])torch. Size([50, 1])torch. Size([1]) Notice going reverse order, we’re passing in gradient backward3) derivative of ReLU def relu_grad(inp, out): # grad of relu with respect to input activations inp. 
g = (inp>0). float() * out. g Examplified belowlin1=model_ping('l1') #get value from forward modellin1. g = (lin1>0). float() * lin2. g;lin1. g. shape>>> torch. Size([50000, 50])4) Derivative of linear1 Same process with 2) but, this process’s weight hasdef lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowx_train. g = lin1. g @ w1. t(); w1. g = (x_train. unsqueeze(-1) * lin1. g. unsqueeze(1)). sum(0); b1. g = lin1. g. sum(0);x_train. g. shape, w1. g. shape, b1. g. shape>>> torch. Size([50000, 784])torch. Size([784, 50])torch. Size([50])5) Then it goes backward pass def forward_and_backward(inp, targ): # forward pass: l1 = inp @ w1 + b1 l2 = relu(l1) out = l2 @ w2 + b2 # we don't actually need the loss in backward! loss = mse(out, targ) # backward pass: mse_grad(out, targ) lin_grad(l2, out, w2, b2) relu_grad(l1, l2) lin_grad(inp, l1, w1, b1)Version 1 (Basic)- Wall time: 1. 95 s Summary Notice that output of function at forward pass became input of backward pass backpropagation is just the chain rule value loss (loss=mse(out,targ)) is not used in gradient calcuation. Because, it doesn’t appear with the weight. w1g, w2g, b1g, b2g, ig will be used for optimizercheck the result using Pytorch autograd require_grad_ is the magical function, which can automatic differentiation. 2 This magical auto gradified tensor keep track what happend in forward (taking loss function), and do the backward3 So it saves our time to differentiate ourselves ⤵️ THis is benchmark…. . Version 2 (torch autograd)- Wall time: 3. 81 µs Refactor model: Amazingly, just refactoring our main pieces, it comes down up to Pytorch package. 🌟 Implement yourself, Practice, practice, practice! 🌟 Layers as classes: Relu and Linear are layers in oue neural net. -> make it as classes For the forward, using __call__ for the both of forward & backward. Because ‘call’ means we treat this as a function. class Lin(): def __init__(self, w, b): self. w,self. b = w,b def __call__(self, inp): self. inp = inp self. out = inp@self. w + self. b return self. out def backward(self): self. inp. g = self. out. g @ self. w. t() # Creating a giant outer product, just to sum it, is inefficient! self. w. g = (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) self. b. g = self. out. g. sum(0) Remember that in lin_grad function, we save bias&weight!!!!!💬 inp. g : gradient of the output with respect to the input. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 w. g : gradient of the output with respect to the weight. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 b. g : gradient of the output with respect to the bias. {: style=”color:grey; font-size: 90%; text-align: center;”} class Model(): def __init__(self, w1, b1, w2, b2): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ) def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() refer to Jeremy’s Model class, he put layers in list Dionne’s self-study note: Decomposing Jeremy’s Model class init needs weight, bias but not x data when call that class(a. k. a function) it gave x data and y label! jeremy composited function in layers. x = l(x) so concise…. . 
also utilized that layer list when backward ust reversing it (using python list’s method) And he is recursively calling the function on the result of the previous thing. ⬇️for l in self. layers: x = l(x)Q2: Don’t I need to declare magical autograd function, requires_grad_?{: style=”color:red; font-size: 130%; text-align: center;”} [The questions migrated to this article] Version 3 (refactoring - layer to class)- Wall time: 5. 25 µs Modue. forward(): Duplicate code makes execution time slow. Role of __call__ changed. No more __call__ for implementing forward pass. By initializing the forward with __call__, Module. forward() use overriding to maximize reusability. So any layer inherit Module, can use parent’s function. gradient of the output with respect to the weight (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) can be reexpressed using einsum, torch. einsum( bi,bj->ij , inp, out. g) Defining forward and Module enables Pytorch to out almost duplicatesVersion 4 (Module & einsum)- Wall time: 4. 29 µs Q2: Isn’t there any way to use broadcasting? Why we should use outer product?{: style=”color:red; font-size: 130%; text-align: center;”} Without einsum: Replacing einsum to matrix product is even more faster. torch. einsum( bi,bj->ij , inp, out. g)can be reexpressed using matrix product, inp. t() @ out. gVersion 5 (without einsum)- Wall time: 3. 81 µs nn. Linear and nn. Module: Torch’s package nn. Linear and nn. Module Version 6 (torch package)- Wall time: 5. 01 µs Final, Using torch. nn. Linear & torch. nn. Module~~~pythonclass Model(nn. Module): def init(self, n_in, nh, n_out): super(). init() self. layers = [nn. Linear(n_in,nh), nn. ReLU(), nn. Linear(nh,n_out)] self. loss = mse def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x. squeeze(), targ)class Model(): def init(self): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ)def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() ~~~ Footnote: fast. ai forums Lesson-8 ↩ pytorch docs - autograd ↩ stackoverflow - finding methods a object has ↩ "
+ "body": "2020/03/02 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring ” Lecture 08 - Deep Learning From Foundations-part2 “ Homework: calculus for machine learning einsum conventionCONTENTS: Foundation version Gradients backward pass decompose function chain rule with code check the result using Pytorch autograd Refactor model Layers as classes Modue. forward() Without einsum nn. Linear and nn. Module Forward process Foundation version: Gradients backward pass: Gradients is output with respect to parameter we’ve done this work in this path(below) to simplify this calculus, we can just change it into, So, you should know of the derivative of each bit on its own, and then you multiply them all together. As a result, it would be over cross over the data. So you can get gradient, output with respect to parameter What order should we calculate? BTW, why Jeremy wrote , not Loss function?1 decompose function We want to get derivative of which forms But, we have a estimation of answer (we call it y hat) now So, I will decompose funciton to trace target variable. Using the above forward pass, we can suppose some function from the end. start from , We know MSE funciton got two parameters, output, and target . from MSE’s input we know function’s output and supposing v is input of that function, similarly, v became output of chain rule with code examplify backward process by random sampling To get a variable, I modified forward model a little def model_ping(out = 'x_train'): l1 = lin(x_train, w1, b1) # one linear layer l2 = relu(l1) # one relu layer l3 = lin(l2, w2, b2) # one more linear layer return eval(out) Be careful we don’t use mse_loss in backward process1) start with the very last function, which is loss funciton. MSE If we codify this formula,def mse_grad(inp, targ): #mse_input(1000,1), mse_targ (1000,1) # grad of loss with respect to output of previous layer inp. g = 2. * (inp. squeeze() - targ). unsqueeze(-1) / inp. shape[0] And, this can be examplified like below. Notice that input of gradient function is same with forward functiony_hat = model_ping('l3') #get value from forward modely_hat. g = ((y_hat. squeeze(-1)-y_train). unsqueeze(-1))/y_hat. shape[0]y_hat. g. shape>>> torch. Size([50000, 1]) We can just calculate using broadcasting, not using squeeze. then why should do and unsqueeze again?🎯 It’s related with random access memory(RAM). . If I don’t squeeze, (I’m using colab) it out of RAM. 2) Derivative of linear2 function This process’s weight dimensions defined by axis=1, axis=2. axis=0 dimension means size of data. This will be summazed by . sum(0) method. unsqeeze(-1)&unsqeeze(1) seperates the dimension, and make a dot product, and vanish axis=0 dimension. def lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowlin2 = model_ping('l2'); #get value from forward modellin2. g = y_hat. g@w2. t(); w2. g = (lin2. unsqueeze(-1) * y_hat. g. unsqueeze(1)). sum(0);b2. g = y_hat. g. sum(0);lin2. g. shape, w2. g. shape, b2. g. shape>>> torch. Size([50000, 50])torch. Size([50, 1])torch. Size([1]) Notice going reverse order, we’re passing in gradient backward3) derivative of ReLU def relu_grad(inp, out): # grad of relu with respect to input activations inp. 
g = (inp>0). float() * out. g Examplified belowlin1=model_ping('l1') #get value from forward modellin1. g = (lin1>0). float() * lin2. g;lin1. g. shape>>> torch. Size([50000, 50])4) Derivative of linear1 Same process with 2) but, this process’s weight hasdef lin_grad(inp, out, w, b): # grad of matmul with respect to input inp. g = out. g @ w. t() w. g = (inp. unsqueeze(-1) * out. g. unsqueeze(1)). sum(0) b. g = out. g. sum(0) Examplified belowx_train. g = lin1. g @ w1. t(); w1. g = (x_train. unsqueeze(-1) * lin1. g. unsqueeze(1)). sum(0); b1. g = lin1. g. sum(0);x_train. g. shape, w1. g. shape, b1. g. shape>>> torch. Size([50000, 784])torch. Size([784, 50])torch. Size([50])5) Then it goes backward pass def forward_and_backward(inp, targ): # forward pass: l1 = inp @ w1 + b1 l2 = relu(l1) out = l2 @ w2 + b2 # we don't actually need the loss in backward! loss = mse(out, targ) # backward pass: mse_grad(out, targ) lin_grad(l2, out, w2, b2) relu_grad(l1, l2) lin_grad(inp, l1, w1, b1)Version 1 (Basic)- Wall time: 1. 95 s Summary Notice that output of function at forward pass became input of backward pass backpropagation is just the chain rule value loss (loss=mse(out,targ)) is not used in gradient calcuation. Because, it doesn’t appear with the weight. w1g, w2g, b1g, b2g, ig will be used for optimizercheck the result using Pytorch autograd require_grad_ is the magical function, which can automatic differentiation. 2 This magical auto gradified tensor keep track what happend in forward (taking loss function), and do the backward3 So it saves our time to differentiate ourselves Postfix underscore means in pytorch, in-place function, What is in-place function?⤵️ THis is benchmark…. . Version 2 (torch autograd)- Wall time: 3. 81 µs Refactor model: Amazingly, just refactoring our main pieces, it comes down up to Pytorch package. 🌟 Implement yourself, Practice, practice, practice! 🌟 Layers as classes: Relu and Linear are layers in oue neural net. -> make it as classes For the forward, using __call__ for the both of forward & backward. Because ‘call’ means we treat this as a function. class Lin(): def __init__(self, w, b): self. w,self. b = w,b def __call__(self, inp): self. inp = inp self. out = inp@self. w + self. b return self. out def backward(self): self. inp. g = self. out. g @ self. w. t() # Creating a giant outer product, just to sum it, is inefficient! self. w. g = (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) self. b. g = self. out. g. sum(0) Remember that in lin_grad function, we save bias&weight!!!!!💬 inp. g : gradient of the output with respect to the input. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 w. g : gradient of the output with respect to the weight. {: style=”color:grey; font-size: 90%; text-align: center;”} 💬 b. g : gradient of the output with respect to the bias. {: style=”color:grey; font-size: 90%; text-align: center;”} class Model(): def __init__(self, w1, b1, w2, b2): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ) def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() refer to Jeremy’s Model class, he put layers in list Dionne’s self-study note: Decomposing Jeremy’s Model class init needs weight, bias but not x data when call that class(a. k. a function) it gave x data and y label! jeremy composited function in layers. x = l(x) so concise…. . 
also utilized that layer list when backward ust reversing it (using python list’s method) And he is recursively calling the function on the result of the previous thing. ⬇️for l in self. layers: x = l(x)Q2: Don’t I need to declare magical autograd function, requires_grad_?{: style=”color:red; font-size: 130%; text-align: center;”} [The questions migrated to this article] Version 3 (refactoring - layer to class)- Wall time: 5. 25 µs Modue. forward(): Duplicate code makes execution time slow. Role of __call__ changed. No more __call__ for implementing forward pass. By initializing the forward with __call__, Module. forward() use overriding to maximize reusability. So any layer inherit Module, can use parent’s function. gradient of the output with respect to the weight (self. inp. unsqueeze(-1) * self. out. g. unsqueeze(1)). sum(0) can be reexpressed using einsum, torch. einsum( bi,bj->ij , inp, out. g) Defining forward and Module enables Pytorch to out almost duplicatesVersion 4 (Module & einsum)- Wall time: 4. 29 µs Q2: Isn’t there any way to use broadcasting? Why we should use outer product?{: style=”color:red; font-size: 130%; text-align: center;”} Without einsum: Replacing einsum to matrix product is even more faster. torch. einsum( bi,bj->ij , inp, out. g)can be reexpressed using matrix product, inp. t() @ out. gVersion 5 (without einsum)- Wall time: 3. 81 µs nn. Linear and nn. Module: Torch’s package nn. Linear and nn. Module Version 6 (torch package)- Wall time: 5. 01 µs Final, Using torch. nn. Linear & torch. nn. Module~~~pythonclass Model(nn. Module): def init(self, n_in, nh, n_out): super(). init() self. layers = [nn. Linear(n_in,nh), nn. ReLU(), nn. Linear(nh,n_out)] self. loss = mse def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x. squeeze(), targ)class Model(): def init(self): self. layers = [Lin(w1,b1), Relu(), Lin(w2,b2)] self. loss = Mse() def __call__(self, x, targ): for l in self. layers: x = l(x) return self. loss(x, targ)def backward(self): self. loss. backward() for l in reversed(self. layers): l. backward() ~~~ Footnote: fast. ai forums Lesson-8 ↩ pytorch docs - autograd ↩ stackoverflow - finding methods a object has ↩ "
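The note says to check the result using Pytorch autograd but shows no code for the check; here is a minimal self-contained sketch of that comparison, using small random shapes in place of the MNIST-sized tensors from the lesson:
~~~python
import torch

n, m, nh = 64, 10, 8                      # small stand-ins for 50000, 784, 50
x = torch.randn(n, m); y = torch.randn(n)
w1 = torch.randn(m, nh); b1 = torch.zeros(nh)
w2 = torch.randn(nh, 1); b2 = torch.zeros(1)

def forward_and_backward(inp, targ):
    l1 = inp @ w1 + b1
    l2 = l1.clamp_min(0.)                 # relu
    out = l2 @ w2 + b2
    # backward pass, exactly as derived in the note
    out.g = 2. * (out.squeeze(-1) - targ).unsqueeze(-1) / out.shape[0]
    l2.g = out.g @ w2.t()
    w2.g = l2.t() @ out.g; b2.g = out.g.sum(0)
    l1.g = (l1 > 0).float() * l2.g
    w1.g = inp.t() @ l1.g; b1.g = l1.g.sum(0)

forward_and_backward(x, y)

# Same network with autograd: requires_grad_ tracks the forward pass.
w1a = w1.clone().requires_grad_(True); b1a = b1.clone().requires_grad_(True)
w2a = w2.clone().requires_grad_(True); b2a = b2.clone().requires_grad_(True)
out = (x @ w1a + b1a).clamp_min(0.) @ w2a + b2a
loss = (out.squeeze(-1) - y).pow(2).mean()
loss.backward()

print(torch.allclose(w1.g, w1a.grad, rtol=1e-3, atol=1e-5))  # True
print(torch.allclose(b2.g, b2a.grad, rtol=1e-3, atol=1e-5))  # True
~~~
The w.g lines use the inp.t() @ out.g form from Version 5 of the note, which computes the same quantity as the unsqueeze/outer-product version.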
}, {
- "id": 13,
+ "id": 15,
"url": "http://localhost:4000/2020/03/note08-fastai-3/",
"title": "Implement forward&backward pass from scratch",
"body": "2020/03/01 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring1. The forward and backward passes: 1. 1 Normalization: train_mean,train_std = x_train. mean(),x_train. std()>>> train_mean,train_std(tensor(0. 1304), tensor(0. 3073))Remember! Dataset, which is x_train, mean and standard deviation is not 0&1. But we need them to be which means we should substract means and divide data by std. You should not standarlize validation set because training set and validation set should be aparted. after normalize, mean is close to zero, and standard deviation is close to 1. 1. 2 Variable definition: n,m: size of the training set c: the number of activations we need in our model2. Foundation Version: 2. 1 Basic architecture: Our model has one hidden layer, output to have 10 activations, used in cross entropy. But in process of building architecture, we will use mean square error, output to have 1 activations and lator change it to cross entropy number of hidden unit; 50see below pic We want to make w1&w2 mean and std be 0&1. why initializating and make mean zero and std one is important? paper highlighting importance of normalisation - training 10,000 layer network without regularisation1 2. 1. 1 simplified kaiming initQ: Why we did init, normalize with only validation data? Because we can not handle and get statistics from each value of x_valid?{: style=”color:red; font-size: 130%; text-align: center;”} what about hidden(first) layer?w1 = torch. randn(m,nh)b1 = torch. zeros(nh)t = lin(x_valid, w1, b1) # hidden>>> t. mean(), t. std()((tensor(2. 3191), tensor(27. 0303))In output(second) layer, w2 = torch. randn(nh,1)b2 = torch. zeros(1)t2 = lin(t, w2, b2) # output>>> t2. mean(), t2. std()(tensor(-58. 2665), tensor(170. 9717)) which is terribly far from normalzed value. But if we apply simplified kaiming init w1 = torch. randn(m,nh)/math. sqrt(m); b1 = torch. zeros(nh)w2 = torch. randn(nh,1)/math. sqrt(nh); b2 = torch. zeros(1)t = lin(x_valid, w1, b1)t. mean(),t. std()>>> (tensor(-0. 0516), tensor(0. 9354)) But, actually, we use activations not only linear function After applying activations relu at linear layer, mean and deviation became 0. 5. 2. 1. 2 Glorrot initializationPaper2: Understanding the difficulty of training deep feedforward neural networks Gaussian(, bell shaped, normal distributions) is not trained very well. How to initialize neural nets? with the size of layer , the number of filters . But there is No acount for import of ReLU If we got 1000 layers, vanishing gradients problem emerges2. 1. 3 Kaiming initializatingPaper3: Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification Kaiming He, explained here rectifier: rectified linear unit rectifier network: neural network with rectifier linear units This is kaiming init, and why suddenly replace one to two on a top? to avoid vanishing gradient(weights) But it doesn’t give very nice mean tough. 2. 1. 4 Pytorch package Why fan_out? according to pytorch documentation, choosing 'fan_in' preserves the magnitude of the variance of the wights in the forward pass. choosing 'fan_out' preserves the magnitues in the backward pass(, which means matmul; with transposed matrix) ➡️ in the other words, torch use fan_out cz pytorch transpose in linear transformaton. What about CNN in Pytorch?I tried torch. nn. 
Conv2d. conv2d_forward?? Jeremy digged into using torch. nn. modules. conv. _ConvNd. reset_parameters?? 2 in Pytorch, it doesn’t seem to be implemented kaiming init in right formula. so we should use our own operation. But actually, this has been discussed in Pytorch community before. 3 4 Jeremy said it enhanced variance also, so I sampled 100 times and counted better results. To make sure the shape seems sensible. check with assert. (remember we will replace 1 to 10 in cross entropy)assert model(x_valid). shape==torch. Size([x_valid. shape[0],1])>>> model(x_valid). shape(10000, 1) We have made Relu, init, linear, it seems we can forward pass code we need for basic architecture nh = 50def lin(x, w, b): return x@w + b;w1 = torch. randn(m,nh)*math. sqrt(2. /m ); b1 = torch. zeros(nh)w2 = torch. randn(nh,1); b2 = torch. zeros(1)def relu(x): return x. clamp_min(0. ) - 0. 5t1 = relu(lin(x_valid, w1, b1))def model(xb): l1 = lin(xb, w1, b1) l2 = relu(l1) l3 = lin(l2, w2, b2) return l32. 2 Loss function: MSE: Mean squared error need unit vector, so we remove unit axis. def mse(output, targ): return (output. squeeze(-1) - targ). pow(2). mean() In python, in case you remove axis, you use ‘squeeze’, or add axis use ‘unsqueeze’ torch. squeeze where code commonly broken. so, when you use squeeze, clarify dimension axis you want to removetmp = torch. tensor([1,1])tmp. squeeze()>>> tensor([1, 1]) make sure to make as float when you calculateBut why??? because it is tensor?{: style=”color:red; font-size: 130%;”} Here’s the error when I don’t transform the data type ---------------------------------------------------------------------------TypeError Traceback (most recent call last)<ipython-input-22-ae6009bef8b4> in <module>()----> 1 y_train = get_data()[1] # call data again 2 mse(preds, y_train)TypeError: 'map' object is not subscriptable This is forward passFootnote: Other materials: Understanding the difficulty of training deep feedforward neural networks, paper that introduced Xavier initialization Fixup Initialization: Residual Learning Without Normalization ↩ Pytorch implementaion on Kaiming init of conv and linear layers ↩ Pytorch kaiming init issue ↩ Pytorch kaiming init explained ↩ "
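A small sketch of why the sqrt(2/m) factor matters, assuming a plain stack of linear+relu layers (not the lesson notebook's exact code): without the simplified kaiming scaling the activations explode or vanish with depth, with it they stay near unit scale.
~~~python
import math, torch

x = torch.randn(512, 512)

def depth_std(scale, layers=50):
    a = x
    for _ in range(layers):
        w = torch.randn(512, 512) * scale
        a = (a @ w).clamp_min(0.)        # linear layer followed by relu
    return a.std()

print(depth_std(1.0))                    # blows up (huge, or inf/nan)
print(depth_std(math.sqrt(2. / 512)))    # stays roughly on the order of 1
~~~
The factor of 2 compensates for relu zeroing out half of the activations, which is exactly the "replace the one with a two on top" point in 2.1.3.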
}, {
- "id": 14,
+ "id": 16,
"url": "http://localhost:4000/2020/03/note08-fastai-2/",
"title": "What's inside Pytorch Operator?",
"body": "2020/03/01 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, RefactoringWhat’s inside Pytorch Operator?: Section02 Time comparison with pure Python: Matmul with broadcasting> 3194. 95 times faster Einstein summation> 16090. 91 times faster Pytorch’s operator> 49166. 67 times faster 1. Elementwise op: 1. 1 Frobenius norm: above converted into (m*m). sum(). sqrt() Plus, don’t suffer from mathmatical symbols. He also copy and paste that equations from wikipedia. and if you need latex form, download it from archive. 2. Elementwise Matmul: What is the meaning of elementwise? We do not calculate each component. But all of the component at once. Because, length of column of A and row of B are fixed. How much time we saved? So now that takes 1. 37ms. We have removed one line of code and it is a 178 times faster…#TODOI don’t know where the 5 from. but keep it. Maybe this is related with frobenius norm…?as a result, the code before for k in range(ac): c[i,j] += a[i,k] + b[k,j]the code after c[i,j] = (a[i,:] * b[:,j]). sum()To compare it (result betweet original and adjusted version) we use not test_eq but other function. The reason for this is that due to rounding errors from math operations, matrices may not be exactly the same. As a result, we want a function that will “is a equal to b within some tolerance” #exportdef near(a,b): return torch. allclose(a, b, rtol=1e-3, atol=1e-5)def test_near(a,b): test(a,b,near)test_near(t1, matmul(m1, m2))3. Broadcasting: Now, we will use the broadcasting and removec[i,j] = (a[i,:] * b[:,j]). sum() How it works?>>> a=tensor([[10,10,10], [20,20,20], [30,30,30]])>>> b=tensor([1,2,3,])>>> a,b (tensor([[10, 10, 10], [20, 20, 20], [30, 30, 30]]),tensor([1, 2, 3])) >>> a+btensor([[11, 12, 13], [21, 22, 23], [31, 32, 33]]) <Figure 2> demonstrated how array b is broadcasting(or copied but not occupy memory) to compatible with a. Refered from numpy_tutorial there is no loop, but it seems there is exactly the loop. This is not from jeremy (actually after a moment he cover it) but i wondered How to broadcast an array by columns? c=tensor([[1],[2],[3]])a+ctensor([[11, 11, 11], [22, 22, 22], [33, 33, 33]])s What is tensor. stride()?help(t. stride)Help on built-in function stride: stride(…) method of torch. Tensor instancestride(dim) -> tuple or intReturns the stride of :attr:’self’ tensor. Stride is the jump necessary to go from one element to the next one in the specified dimension :attr:’dim’. A tuple of all strides is returned when no argument is passed in. Otherwise, an integer value is returned as the stride in the particular dimension :attr:’dim’. Args: dim (int, optional): the desired dimension in which stride is requiredExample::* x = torch. tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])`x. stride()>>> (5, 1)x. stride(0)>>> 5x. stride(-1)>>> 1 unsqueeze & None index We can manipulate rank of tensor Special value ‘None’, which means please squeeze a new axis here== please broadcast herec = torch. tensor([10,20,30])c[None,:] in c, squeeze a new axis in here please. 2. 2 Matmul with broadcasting: for i in range(ar):# c[i,j] = (a[i,:]). *[:,j]. sum() #previous c[i] = (a[i]. unsqueeze(-1) * b). sum(dim=0) And Using None also (As howard teached)c[i] = (a[i ]. unsqueeze(-1) * b). sum(dim=0) #howardc[i] = (a[i][:,None] * b). sum(dim=0) # using Nonec[i] = (a[i,:,None]*b). 
sum(dim=0)⭐️Tips🌟 1) Anytime there’s a trailinng(final) colon in numpy or pytorch you can delete it ex) c[i, :] = c [i]2) any number of colon commas at the start, you can switch it with the single elipsis. ex) c[:,:,:,:,i] = c […,i] 2. 3 Broadcasting Rules: What if we tensor. size([1,3]) * tensor. size([3,1])? torch. Size([3, 3]) What is scale???? What if they are one array is times of the other array? ex) Image : 256 x 256 x 3Scale : 128 x 256 x 3Result: ? Why I did not inserted axis via None, but happened broadcasting? >>> c * c[:,None]tensor([[100. , 200. , 300. ], [200. , 400. , 600. ], [300. , 600. , 900. ]])maybe it broadcast cz following array has 3 rows as same principle, no matter what nature shape was, if we do the operation tensor broadcasts to the other. >>> c==c[None]tensor([[True, True, True]])>>> c[None]==c[None,:]tensor([[True, True, True]])>>>c[None,:]==ctensor([[True, True, True]])3. Einstein summation: Creates batch-wise, remove inner most loop, and replaced it with an elementwise producta. k. ac[i,j] += a[i,k] * b[k,j]inner most loop c[i,j] = (a[i,:] * b[:,j]). sum()elementwise product Because K is repeated so we do a dot product. And it is torch. Usage of einsum()1) transpose2) diagnalisation tracing3) batch-wise (matmul) … einstein summation notationdef matmul(a,b): return torch. einsum('ik,kj->ij', a, b)so after all, we are now 16000 times faster than Python. 4. Pytorch op: 49166. 67 times faster than pure python And we will use this matrix multiplication in Fully Connect forward, with some initialized parameters and ReLU. But before that, we need initialized parameters and ReLU, Footnote: TensorRank ti noteResources: Frobenius Norm Review Broadcasting Review (especially Rule) Refer colab! (I totally confused with extension of arrays) torch. allclose Review np. einsum Reviewh "
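A self-contained timing sketch of the three fast versions compared above (broadcasting, einsum, and torch's own @ operator); absolute times and speedup ratios will differ from the lesson's machine, and the shapes are small placeholders:
~~~python
import timeit, torch

a, b = torch.randn(64, 32), torch.randn(32, 16)

def matmul_broadcast(a, b):
    c = torch.zeros(a.shape[0], b.shape[1])
    for i in range(a.shape[0]):
        # one python loop left; the row-times-matrix product broadcasts
        c[i] = (a[i].unsqueeze(-1) * b).sum(dim=0)
    return c

def matmul_einsum(a, b):
    # k is repeated, so it is summed over: a dot product over k
    return torch.einsum('ik,kj->ij', a, b)

ref = a @ b   # pytorch's own operator, the fastest of the three
for f in (matmul_broadcast, matmul_einsum, lambda a, b: a @ b):
    t = timeit.timeit(lambda: f(a, b), number=100) / 100
    print(f'{t * 1e6:.1f} µs per call')
print(torch.allclose(matmul_broadcast(a, b), ref, rtol=1e-3, atol=1e-5),
      torch.allclose(matmul_einsum(a, b), ref, rtol=1e-3, atol=1e-5))
~~~
The allclose checks play the role of test_near from the entry: the results agree within tolerance rather than exactly, because of floating-point rounding.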
}, {
- "id": 15,
+ "id": 17,
"url": "http://localhost:4000/2020/02/note08-fastai-1/",
"title": "What is the meaning of 'deep-learning from foundations?'",
"body": "2020/02/29 - This note is divided into 4 section. Section1: What is the meaning of ‘deep-learning from foundations?’ Section2: What’s inside Pytorch Operator? Section3: Implement forward&backward pass from scratch Section4: Gradient backward, Chain Rule, Refactoring” Lecture 08 - Deep Learning From Foundations-part2 “ I don’t know if you read this article, but I heartily appreciate Rachael Thomas and Jeremy Howard for providing these priceless lectures for free Homework: Review concepts 16 concepts from Course 1 (lessons 1 - 7)(1) Affine Functions & non-linearities; 2) Parameters & activations; 3) Random initialization & transfer learning; 4) SGD, Momentum, Adam; 5) Convolutions; Batch-norm; 6) Dropout; 7) Data augmentation; 8) Weight decay; 9) Res/dense blocks; 10) Image classification and regression; 11)Embeddings; 12) Continuous & Categorical variables; 13) Collaborative filtering; 14) Language models; 15) NLP classification; 16) Segmentation; U-net; GANS) Make sure you understand broadcasting Read section 2. 2 in Delving Deep into Rectifiers Try to replicate as much of the notebooks as you can without peeking; when you get stuck, peek at the lesson notebook, but then close it and try to do it yourself calculus for machine learning based on weight… einsum conventionCONTENTS: What is going on in this course? What is ‘from foundations’? Steps to a basic modern CNN model Today’s implementation goal: 1) matmul -> 4) FC backward Library development using jupyter notebook jupyter notebook certainly can make module Elementwise ops How can we make python faster? What is element wise operation? FootnoteWhat is going on in this course?: What is ‘from foundations’?: 1) Recreate fast. ai and Pytorch 2) using pure python Evade OverfittingOverfit : validation error getting worsetraining loss < validation loss Know the name of the symbol you usefind in this page if you don’t know the symbol that you are using or just draw it here (run by ML!) Steps to a basic modern CNN model: 1) Matrix multiplication -> 2) Relu/Initialization -> 3) Fully-connected Forward-> 4) Fully-connected Backward -> 5) Train loop -> 6) Convolution-> 7) Optimization ->8) Batchnormalization -> 9) Resnet Today’s implementation goal: 1) matmul -> 4) FC backward: Library development using jupyter notebook: what is assers? jupyter notebook certainly can make module: There will be #export tag that Howard (and we) want to extract special notebook2script. py will detect sign of #expert and convert following into python module and test ittest\_eq(TEST,'test')test\_eq(TEST,'test1') what is run_notebook. py? when you want to test your module in command line interface !python run\_notebook. py 01_matmul. ipynb Is there any difference between 1) and 2)?1) test -> test01 2) test01 -> test #TODO I don’t know yet look into run_notebook. py, package fire Jeremy used. What is that?read and run the code in a notebook, and in the process, Jeremy made Python Fire library called!shockingly, fire takes any kind of function and converts into CLI command. fire library was released by Google open source, Thursday, March 2, 2017 Get data pytorch and numpy are pretty much same. variable c explains how many pixels there are in in MNIST, 28 pixels PyTorch’s view() method: torch function that manipulating tensor, and squeeze() in torch & mathmatical operation similar function Rao & McMahan said usually this functions result in feature vector. In part 1, you can use view function several times. 
Initial python model Which is Linear, like $Xw$(weight)$+a$(bias) $= Y$ If you don’t know hou to multiple matrix, refer this site matmul visulization site How many time spends if we we use pure python function matmul, typical matrix multiplication function, takes about 1 second for calculating 1 single train data! (maybe assumed stochastic, 5 data points in validation) it takes about 11. 36 hours to update parameters even single layer and 1 iteration! (if that was my computer, it would be 14 hours. . )🤪 THIS is why we need to consider ‘time’&’space’ This is kinda slow - what if we could speed it up by 50,000 times? Let’s try! Elementwise ops: How can we make python faster?: If we want to calculate faster, then do remove pythonic calcuation, by passing its computation down to something that is written something other than python, like pytorch. According to PyTorch doc it uses C++ (via ATen), so we are going to implement that function with python. What is element wise operation?: items makes a pair, operate corresponding componentFootnote: notebooks material video broadcasting excel"
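A sketch of the pure-python matmul being timed in this entry, with every scalar index going through the interpreter; the 5 x 784 by 784 x 10 shapes follow the lesson's benchmark, though the exact time depends on the machine:
~~~python
import time, torch

def matmul(a, b):
    ar, ac = a.shape; br, bc = b.shape
    assert ac == br
    c = torch.zeros(ar, bc)
    for i in range(ar):          # every scalar index, multiply and add
        for j in range(bc):      # runs through the python interpreter,
            for k in range(ac):  # which is what makes this so slow
                c[i, j] += a[i, k] * b[k, j]
    return c

m1 = torch.randn(5, 784)    # 5 images, as in the lesson's timing
m2 = torch.randn(784, 10)   # one linear layer's weights
t0 = time.time(); matmul(m1, m2)
print(f'{time.time() - t0:.3f}s for 5x784 @ 784x10')
~~~
Replacing just the inner k loop with (a[i,:] * b[:,j]).sum() pushes that hot loop down into C, which is where the next section's speedups come from.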
}, {
- "id": 16,
+ "id": 18,
"url": "http://localhost:4000/2020/02/what-is-convolution/",
"title": "Digging into convolution",
"body": "2020/02/28 - Issues 1) Kaiming Initializtion in Pytorch was in trouble. 1 2) Jeremy started to dig in, in lesson09, but I didn’t know why the size of tensor is 2 and even understand this spreadsheet data. 3 Homework: Read Visualizing and Understanding Convolutional Networks paper What is a convolution? Visualization one kernel Matthew D Zeiler & Rob Fergus Paper Convolution can be represented as matmul Padding Kernel has rank 3 How can we find a side-edge, a gradient and area of constant weight? What is a convolution?: A convolutional neural network is that your red, green, and blue pixels go into the simple computation, and something comes out of that, and then the result of that goes into a second layer, and the result of that goes into the third layer and so forth. Visualization: one kernel Refer this site for visualizing CNN filteringMatthew D Zeiler & Rob Fergus PaperLecture01 Nine examples of the actual coefficients from the **first layer** Convolution can be represented as matmul: CNNs from different viewpoints {align-items: center;} [A B C D E F G H I J] is 3 by 3 image data flatten to vector. As a result, convolution is a just matrix just two things happens Some of entries are set to zeros at all the times same color always have the same weight. That called weight time / wegith sharing So, we can implement a convolution with matrix multiplication. But, we don’t do that because it’s slow!Padding: What most of libraries do is just put zeros asdie of matrix fast. ai uses reflection paddings (what is this? Jeremy said he uttered it)Kernel has rank 3: As standard picture input would be 4 5, it would be actually 3d, not 2d. If we make kernel as a 3x3 size, we pass over same kernel all the different Red, Green, Blue Pixels. This could make problem, because, if we want to detect frog, which is green, we would want more activations on the green(I made a test cell in my colab 6) How can we find a side-edge, a gradient and area of constant weight?: Not top-edge! One kernel can find only the top-edge, so we should stack the kernels 7 So, we pass it through bunch of kernels to the input images, and that process gives us height x width x corresponding number of kernels. Usually that number of chanel is 16 And if we want to get the more channels and features, we should repeat that process This process gives rise to memory out of control, we do the stride #### conv-example. xlsx 2 convolutional filters At a second layer, filter is 3x3x2 tensor, because to add up together the first layer’s channel. Reference: Problem was math. sqrt(5) was not kaiming initialization formula, Implementation in Pytorch ↩ size of tensor, lecture09 ↩ conv-example. xlsx ↩ Why do computer use red, green and blue instead of primary colors ↩ Grayscale is a group of shades without any visible color. … Each of these dots has its own brightness level as well and, therefore, can be converted to grayscale. A grayscale image is one with all color information removed. ↩ Testing RGB and grayscale ↩ stack kernel and make new rank of tensor at output, Lesson06-2019 ↩ "
}, {
- "id": 17,
+ "id": 19,
"url": "http://localhost:4000/2020/02/dps-week8/",
- "title": "Digital Product School week 8&9",
- "body": "2020/02/24 - The 8th week retropect at Digital Product School Week 8/9 - Ship your MVP/Release next iteration each day This week's schedule CONTENT: Preparing engineering weekly Agile Process Daily Stand-up Making application flowchart (feat draw. io) / ER diagram Flowchart, understaning user journey ER diagram Engineering weekly AI lunch Connecting firebase andPreparing engineering weekly: This week at Wednesday, I planned to explain the Language Modelings, mainly focusing ELMo, ULMFiT, BERT and GPT-2. Slides is available here Changed the presentation, because there were people who are not in ML domain. hereWhenever I do the presentation, I learn more than the information I give them. At the same time, I realize I need to learn more than I know. Agile Process: One of a priceless lesson I learnt from digital product school, was experience of doing agile work. Before I came here, it was a little bit vague concept. I’m not sure ‘what is agile’ but this is what we tried to make agile process. Daily Stand-up: Sharing the works everyday helps interdisciplinary team to work better. Since product started to get higher fidelity, the gap between engineer and non-engineer increased. Actually I didn’t planned to explain concept because I thougth I would be lose my audience when I start to explain. But as daily stand-up, which shares our progess, goes day by day, I planed and reported the issues. And it made each other’s topic feel more familiar. I think point is very important, because at that point people start to be curious. So we can actively ask to the others, and that momwnr, we can explain the point teammate dosen’t know. Each color means every different section. Red: Our team goal, Blue: Interaction designer, Green: Product manager, Yellow: Software/AI engineer This week engineer's main plan Each of us try to explain what we are doing, but things become easier when we are asked. Because we explained something was important to us before, but if we asked it is something important for the others. Making application flowchart (feat draw. io) / ER diagram: Before we start the party, we should clarify the flowchart and ER diagram of our application. Flowchart, understaning user journey: Thanks for google, we could use draw. io for our framechart framework. Actually, we cana choice other good flatform, but draw. io has connected app throgh google drive, most of our engineer was used to it. And after this job, I got to know there is also (of course) rule with the symbols, color, size, space, scaling and direction of arrow -reference. But why we should do this? WE have made our storymap before!! I think storymap is for visualize our status and app. So it should be shared with whole the team, and they should able to understand each role’s issue. But flowchart is more like testing technical feasibility, and error that user can experience. So it could be little more specific, complicated, and hypothetical. This week engineer's main plan ER diagram: Even if we use NoSQL database through firebase, my team was accustomed to SQL more. That what we educated when we were at college, so we had to organize our concept while we were learning NoSQL. Engineering weekly: Every engineering weekly we exchange our knowledge each other so that we can grow together. Before today, my AI collegues presented regression, knn and it was my turn. I prepared slide that explain about pre-trained language model, but my header advised me if I go deep of theoretical things, I would lose my audience. 
So I decided to brief BERT mode, how I can contribute to other team’s project. Since BERT was breakthrough of NLP industry, I tried to explain how it can be applied to hands on product and how it can help people in their product. The result was quite motivative to me. They gave feedback that since it wasn’t that much theoretical, they could enjoy it, and useful information. Someone asked me do I had learned of presentation before. I was really happy with their feedback! AI lunch: Connecting firebase and: "
+ "title": "My life in Digital Product School - week 8/19/10",
+ "body": "2020/02/24 - The 8/9/10th week retropect at Digital Product School Week 8 - Ship your MVPWeek 9/10 - Release next iteration each day Week 8th schedule CONTENT: Agile Product Development Daily Stand-up(planning) Gemba Walk Sprint Reviews Engineering weeklyAgile Product Development: One of a priceless lesson I learnt from digital product school, was experience of doing agile work. Before I came here, it was a little bit vague concept. I’m still not sure ‘what is agile’ but this is how we tried to make agile process. Daily Stand-up(planning): Sharing the works everyday helps interdisciplinary team to work better. Since product started to get higher fidelity, the gap between engineer and non-engineer increased. Actually I didn’t planned to explain concept because I thougth I would be lose my audience when I start to explain. But as daily stand-up, which shares our progess, goes day by day, I planed and reported the issues. And it made each other’s topic feel more familiar. I think point is very important, because at that point people start to be curious. So we can actively ask to the others, and that momwnr, we can explain the point teammate dosen’t know. Each color means every different section. Red: Our team goal, Blue: Interaction designer, Green: Product manager, Yellow: Software/AI engineer This week engineer's main plan Each of us try to explain what we are doing, but things become easier when we are asked. Because we explained something was important to us before, but if we asked it is something important for the others. Gemba Walk: Team Cero with core team Every 2 weeks, we do the Gemba work, which is ‘question everything to the core team’ time. At this period, people can ask anything related to our product, workshop, and framework. Core team will help just for each team, and each team can solve the problem related to their work. < br/>Why we need this session? because with workshop and general schedule, core team has no time just focus on each team. So through this session, we can have opportunity to understand each program and workshop, like why we are using this platform, and when is the due of our small project, and we have this problem and we need help for this. whatever small problem you have, core team is always willing to help you. Sprint Reviews: Every Friday, we have time to summarise what we did for the week. Maybe we need HMW question and our storymap to share our process and then tell and share what we did try, what point we succeeded and what point it was deviant of our prediction, and why we tried it. . Sprint of Ve-link And then, just after all team’s ppt, we do vote with such a cute marvel. Always it’s very difficult to vote (of course you can’t vote to your team!) Because it depends on criteria what do I value!But since this is process of our agile work, I try to focus on what they have changed since last week, and why they did it, how they did it. Engineering weekly: Every engineering weekly we exchange our knowledge each other so that we can grow together. Everyone have their knowledge to share and we can be tutor and at the same time can be of tutee. Previously, my AI collegues presented regression, knn. And because I’m somewhat specialized to NLP, I prepared slide that explain about pre-trained language model, but my header advised me if I go deep of theoretical things, I would lose my audience. So I decided to brief BERT mode, how I can contribute to other team’s project. 
Since BERT was breakthrough of NLP industry, I tried to explain how it can be applied to hands on product and how it can help people in their product. The result was quite motivative to me. They gave feedback that since it wasn’t that much theoretical, they could enjoy it, and useful information. Someone asked me do I had learned of presentation before. I was really happy with their feedback! "
}, {
- "id": 18,
+ "id": 20,
"url": "http://localhost:4000/2020/02/fast.ai-nlp-note-16/",
"title": "Algorithmic bias",
"body": "2020/02/20 - Algorithms can encode & magnify human bias Case Study 1: Facial Recognition & Predictive Policing: Joy Buolamwini & Timnit Gebru, gendershades. org Microsoft, FACE+, IBM - All of these things are sell now. Largest gap between $\therefore\ Lighter Male\ >\ Darker\ Female $ This US mayor joked cops should “mount . 50-caliber” guns where AI predicts crime With machine learning, with automation, there’s a 99% success, so that robot is ㅡwill beㅡ99% accurate in telling us what is going to happen next, which is really interesting. - city official in Lancater, CA, approving on using IBM for public security Bias: Bias is type of error Statistical Bias: difference between a statistic’s expected value and the true value Unjust Bias: disproportionate preference for or prejudice against a group Unconscious bias: bias that we don’t realize we have But, term bias is too generic to be productive. Different sources of bias have different causes Representation Bias: Dataset was not representative of the algorithm that might be used on later. Above : Data is okay, but algorithm has some problem. Below : Data has error. For example, object detection production that performs very well in common product of US. But in contrast, change of target product region, like Zimbabwe, Solomon Island, and so on, reduced the performence remarkably. It is not the algorithmic problem, so we should care about data volume of region. Evaluation Bias: Benchmark datasets spur on research, 4. 4% of IJB-A images are dark-skinned women. 2/3 of ImageNet images from the West (Sharkar et al, 2017) Case Study 2: Recidivism Algorithm Used Prison Sentencing: Case Study 3: Online Ad Delivery: Bias in NLP: ( Nothing to do with the course, but I’m researching this field these days. ) But all about Englsih ImpactThe person is doctor. The person is nurse -> 그는 의사다. 그녀는 간호사다. Concept of “biased data” often too generic to be useful: Different sources of bias have different sources Data, models and systems are not unchanging numbers on a screen. They’re the result of a complex process that starts with years of historical context and involves a series of choices and norms, from data measurement to model evaluation to human interpretation. - Harini Suresh, “The problem with Biased Data” Five Sources of Bias in ML: Representation Bias Evaluation Bias Measurement Bias Aggregation Bias(46:02) Historical Bias(46:26) A few studies(47:13) Racial Bias, Even when we have good intentions(new york times)(47:10) gender(48:59) Humans are biased, so why does algorithmic bias matter?: Algorithms & humans are used differently (humans are usually decision maker) Algorithms are accurate and objective No way to apeal if there if error processed large scale cheap Machine learning can amplify bias Machine learning can create feedback loops. Technology is power. And with that comes responsibility. Solutions: Analyze a project at work/school: Questions about AI 5 types of bias (Suresh & Guttag) Datasheets for datasets, Modelcards for model reporting Accuracy rate on different sub-groups Work with domain experts & those impacted Increase diversity in our workspace Advocate for good policy Be on the ongoing lookout for bias"
}, {
- "id": 19,
+ "id": 21,
"url": "http://localhost:4000/2020/02/classifier-city/",
"title": "Making a classifier with image dataset made from gooogle",
"body": "2020/02/15 - CONTENTS: Creating dataset from google images Using google_images_download Create ImageDataBunch Train model fit_one_cycle() Let’s find-tune Let’s train the whole model! Let’s make batch size bigger! Interpretation Model in productionCode can be found hereDeployed model here Making a classifier which can distinguish Seoul from Munich and Sanfrancisco!(hoping my well in Munich!) Creating dataset from google images: In machine learning, you always need data before you build your model. You can use either URLs or google_images_download package. Since Jeremy explained specifically, I will try the other. Using google_images_download: note: This is not google official package Refer to Official Doncument, put that arguments. from google_images_download import google_images_downloadresponse = google_images_download. googleimagesdownload() #class instantiationout_dir = os. path. abspath('. . /. . /materials/dataset/pkg/')os. mkdir(out_dir)arguments = { keywords : Cebu,Munich,Seoul , print_urls :True, suffix_keywords : city , output_directory :out_dir, type : photo , }paths = response. download(arguments) #passing the arguments to the functionprint(paths)and if you need, here is main code. Create ImageDataBunch: We need to separate validation set because we just grabbed these imagese from Google. Most of the dataset we use (kaggle/research) splited into train / validation / test so if they are not devided beforehand we should make databunch, and Jeremy recommended assign 20% to validation. Help on function verify_images in module fastai. vision. data:verify_images(path: Union[pathlib. Path, str], delete: bool = True, max_workers: int = 4, max_size: int = None, recurse: bool = False, dest: Union[pathlib. Path, str] = '. ', n_channels: int = 3, interp=2, ext: str = None, img_format: str = None, resume: bool = None, **kwargs) Check if the images in `path` aren't broken, maybe resize them and copy it in `dest`. Data from google image url Data from package Train model: len(class) len(train) len(valid) Data_url 3 432 108 Data_pkg 3 216 53 Uisng model: restnet34 1, Measurement: accuracy 2 fit_one_cycle(): What is fit one cycle? Cyclical Learning Rates for Training Neural Networks One of the way to find good learning rate. Core idea is to start with small learning rate (like 1e-4, 1e-3) and increase the learning rate after each mini-batch till loss starts exploding. And pick up learning rate one order lower than exploding point. For example, plotted learning rate is like below picture, picking up around 1e-2 is the best way. Why this methods Traditionally, the learning rate is decreased as the learning starts converging with time. But this paper suggests to cycle our learning rate, because it makes us avoid local minimum. Basically this cyclic method enables us to explore whole of loss function so that find out global minimum. In other words, higher learning rate behaves like regularisation. Let’s find-tune: Do train just one last layer by learning rate found by find_lr This section you should find the strongest downward slope that kind of sticking around for quite a while. And choose just one order lower than lowest point. As explained before, I will pick up 1e-2. And of course, this is fine-tuning, we don’t need discriminative learning rate yet. Let’s train the whole model!: link When you plot the learning rate again, maybe you will get soaring shape of learning rate. Rule of thumb, When you slice the learning rate, use learning rate you used at unfrozen part. 
Divide it by 5 or 10 and put it on maximum bound. At minimum bound, get the point just before it soared, and divide it by 10. Let’s make batch size bigger!: Since default batch size is 64, I tried it to 128. And it gets way more better result(even it’s still underfitting!) And if I freeze model and train whole model again, the model would be better. Also, you can use this method to the other big dataset model training! Interpretation: See the confusion matrix. Result is quite great. *Since I’m using colab, I will skip data cleansing. But I highly recommend you to use ImageCleaner widget, only if you are using jupyter notebook (not jupyter lab) Model in production: You can deploy your model in simple way. I referred fast. ai, and used render(it’s free for limited time). You can find detailed document here. and you can create a route like this. @app. route( /classify-url , methods=[ GET ])async def classify_url(request): bytes = await get_bytes(request. query_params[ url ]) img = open_image(BytesIO(bytes)) _,_,losses = learner. predict(img) return JSONResponse({ predictions : sorted( zip(cat_learner. data. classes, map(float, losses)), key=lambda p: p[1], reverse=True ) })You can find my deployed model here Reference: How to create a deep learning dataset using Google Images towardsdatascience - one cycle policy Deep Residual Learning for Image Recognition ↩ Accuracy_and_precision ↩ "
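As a sketch of the whole recipe above (fastai v1 names; the sample dataset and the exact learning-rate numbers are stand-ins, not the post's city data):

from fastai.vision import *

path = untar_data(URLs.MNIST_SAMPLE)    # small sample set, just to be runnable
data = ImageDataBunch.from_folder(path)
learn = cnn_learner(data, models.resnet34, metrics=accuracy)

learn.fit_one_cycle(4)                  # 1) fine-tune only the head
learn.unfreeze()
learn.lr_find()                         # 2) plot loss vs. learning rate
learn.recorder.plot()
# 3) rule of thumb: min bound = (point just before the loss soars) / 10,
#    max bound = the learning rate used before unfreezing / 5 or 10
learn.fit_one_cycle(4, max_lr=slice(1e-5, 1e-3))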
}, {
- "id": 20,
+ "id": 22,
"url": "http://localhost:4000/2020/02/dps-week5/",
"title": "Digital Product School week 5",
"body": "2020/02/09 - The 5th week retropect at Digital Product School Week 5 - Create a Storymap and sync it with Lean Canvas This week's schedule CONTENT: How to create our story map Prepare your story Discover your product’s AI potentialMondayHow to create our story map: We need this 'aha' moment There was a Milestone workshop, about our weekly goal. As we are agile working, we go fast and change every week’s goal. This week we will finalize our story map based on user’s pain-point and HMW questions. How should we make our story-map Basically we should make story map based on this rule Tell stories, don’t just write them! We always need context, that means all the story component should be connected Visualize your product to establish a shared understanding and speed up discussions! Post-it filled of text is not enough, we should fill it with visualizations then team mates can understand it fast Only discuss in front our your story map! (Speed) So we can update our story-map as soon as we change our opinion And also Use a story map to find the parts that matter most and to identify holes in your idea! Since the story map consists of techinical part, we should consider each story’s technical feasibility Minimise output, maximise outcome and impact! Build tests to figure out what’s minimum and what’s viable! This story map functions to find out our minimum value of ideas Work iteratively: Change your story map according to your learnings! We should repeat this process again and again PMs: Make sure Storymap is up to date!Prepare your story: team cero, our whole story map Our goal Technical feasibility of our storyWhat is your strategy to make user achieve something? This would be our expand point Discover your product’s AI potential: How can we apply AI to our product? Let’s write down our ‘HMW’ questions, and find out all p ossibilities. These are suggestion of possibilities, so don’t attached to feasibility (we will do in at lean start-up) Software section's expectation AI section's expectationTuesday Engineer's task, week5This 5th week, engineers settled WendesdayThursdayFriday"
}, {
- "id": 21,
+ "id": 23,
"url": "http://localhost:4000/2020/02/GPU-time/",
"title": "4 reasons took much time to setting GPU for fast.ai than I expected",
"body": "2020/02/05 - Motivation: Before now, me as a undergraduate student, I was parsimony who usually depend on colab, kaggle, friend’s server(occasional) whenever i need GPU. . And this time it’s been for a while to install GPU than I expected and I share the several component that stood in my way. Written at Oct 24 2019, if you think this is deprecated, please do not have a leap of faith. Just for the record, I’ve used Kaggle, Colab, GCP, Azure, EC2 as GPU cloud. 1. Did not know there is JupyterLab option in Google Cloud Platform. : At the first time when GCP came out, there was no AI Platform service. So from starting vm instance to launching jupyter and installing packages, I did all of the things myself. (and I learned 🤗) $ curl -O https://repo. continuum. io/archive/Anaconda3-5. 0. 1-Linux-x86_64. sh[Downloading conda in ssh] I created VM instance,selected zone, machine type and disk type. Then, define firewall rules and in ssh terminal, install jupyter and other packages. But you can do all of these things just using AI Platform. [AI Platform] I think it especially save your time if you are living in Asia-Pacific, which google doesn’t support not that much GPU resources. 2. Consider if the platform has limited resources in a region you live in. : I live in South Korea, East Asia, and it seems like this region has lots of limitation in GPU (except quite expensive AWS) And the Taiwan which was the only one region where I can launch my own VM with GPU (I tried all the other regions in the list) sometimes do normaly, but not always. 😥After launching, I did several works and next day I could not start VM. (I didn’t count it, but tried it a few hours because I didn’t want cost any more time…) Endlessly failed to start instance, then I choose to move AWS as an alternative way. 3. Fast. ai gives deliberate guide and I didn’t know it. : Fast. ai offer the guide for all available platform. (Colab, salamander, Gradient, Kaggle, Colab, and so on) It is so important, and really needs, because cloud computing options are vary as occasion and purpose arise. I didn’t know fast. ai has manual to running GCP, and I think it’s as good a reason as any for me to be have taken time. It helped me so much when I had aws and shortened my time. I don’t want to read all of the manual in amazno. . (It is recommended. . but I’d rather read GIT PRO now…) ssh -i ~/. ssh/<your_private_key_pair> -L localhost:8888:localhost:8888 ubuntu@<your instance IP>4. You should wait to add more volume just after add volume, by building AWS EC2. : Since Elastic Block Store(EBS) storage supports optimized storage, users can’t extend storage volume two times in a row. Unfortunately, at the first time, I didn’t know it (again 👻) and when VM lacked volume, I doubled dist capacity (76*2) at a rough but It needs more. <!– this time I installed GPU in two years, and it became little complicated compared to 2 years ago. And this time for the first time(maybe not the first time. . but i handled it in my class or with my friend. but it’s my first time on my own. ) I very I’m started to using used google colab, kaggleand, GCP-JupyterLab, ec2 - friend made, aws vm machine but I had a environment variable but i did not know of it. On these days, I could not get a resources from taiwan… I couldn’t notice a deliberate Anyway, as a result I tried myself gcp myself and aws ec2 with fast. 
ai But I think doing on my self surely takes much time (in this point I wonder why I’m doing this, and should remind me, especially I was studying disk volume optimization) disk volume exceed - https://askubuntu. com/questions/919748/no-space-left-on-device-even-though-there-is: "
}, {
- "id": 22,
+ "id": 24,
"url": "http://localhost:4000/2020/02/dps-week4/",
"title": "Digital Product School week 4",
"body": "2020/02/01 - The 4th week retropect at Digital Product School Week 4 - Find solution ideas and run experiments [This week’s schedule] CONTENT: Ideation Techniques What is ideation techniques? Generating idea in my team AIdeation Team brain storming of idea Die Produkt MacherMondayIdeation Techniques: [slides from @steffen] What is ideation techniques?: We tried to find out user’s painpoint last week. Tried to users talk about their, pain point. No question directly, but extract from them their pain with transportation. Generating idea in my team: AIdeation: TuesdayTeam brain storming of idea: Based on generated idea on Monday, we extended our idea doing rolling-paper! Die Produkt Macher: What is lean start-up? Lean startup is a methodology for developing businesses and products that aims to shorten product development cycles and rapidly discover if a proposed business model is viable; this is achieved by adopting a combination of business-hypothesis-driven experimentation, iterative product releases, and validated learning. - wikipedia WendesdayThursdayFriday"
}, {
- "id": 23,
+ "id": 25,
"url": "http://localhost:4000/2020/01/retrosprect-of-acl-paper-2020/",
"title": "Retrospect of ACL 2020 paper writing",
"body": "2020/01/29 - 2020 Annual Conference of the Association for Computational Linguistics Why I can’t use ‘Cebuano’ for the research?: Why I had to change target language from ‘Cebuano’ to ‘Tagalog’?-> No language translator options except google translation. But before knowing that I already consult my friend, whose mother tongue is English. So I had to aplogize her, but couldn’t tell her why suddenly I changed my plan. -> I realized there are many languages even can’t be researched at all. . -> Getting accustomed to discrimination makes misunderstanding, sometimes. At my country, we couldn’t use music streaming service, because of legal problem. But at that moment, I thought it was discrimination, which is done by music company. "
}, {
- "id": 24,
+ "id": 26,
"url": "http://localhost:4000/2020/01/Git-Merge/",
"title": "Why am I not listed as a contributor?!",
"body": "2020/01/10 - From the end of last year, big changes have witnessed in NLP research. Embracing an unprecedented growth, I started to study new exciting results and advances. In doing so, I noticed I’m not listed as contributor of repo which my PR accessed. How did I come to a repository?: When I’m stuck, I would prefer to code, than to go deep in theory. (It must be so. . too much to understand 🤒)It was BERT released by Google AI I felt keenly the necessity of implementing, because not only couldn’t understand the way they figured out positional encoding formula, but how it actually works. What does it mean to “scale” dot product in Attention? (Now I know it’s far from my section 😂) Figure 1. Scaled Dot Product. Adopted from tensorflow blogWhat was the code error?: For implement code in paper, I read the papers Transformer and BERT, structured the model, and refered the others’ code. Meanwhile, I found out a small error in tokenization process, which was changing a token into [MASK], enabled bidirectional representation. I’ve made PR, and got merged. But I was not in contributors. Why?: Figure 2. Merged Pull request Adopted from graykode projectActually I happened to know there can be couple of reasons github doesn’t include my name as contributor. Well, if contributors tab has more than 100 people, in which case it shows you up only if you are in the top 100 contributors because displaying too many contributors can make webpages down. Somethimes, however, it doesn’t that problem. Why not? Two possibilities are there. First, According to Joel-Glovier, if repository maintainer merged-as-a-rebase PR will end up showing as maintainer’s commit. But maintainer shouldn’t normally do this. Second, if you happend to commit using a different git email that what is in your GitHub profile, it will not be attached to your Github user, and “doesn’t show up” as you. Reference: Michał Chromiak’s blog Github: why are my contributions are not showing on my profile atlassian-gitfetch"
}, {
- "id": 25,
- "url": "http://localhost:4000/2019/12/lesson1-fastai/",
- "title": "Fine Grained Classification",
- "body": "2019/12/31 - Finally you can solve the mystery behind this weird drawing. . through this course. juptyer notebook magic: %reload_ext autoreload%autoreload 2%matplotlib inlinethis is special directives to jupyter notebook, not python code. And it is called ‘magics’ (but i think jeremy is magicion) If somebody changes underlying library code while I’m running this, please reload it automatically If somebody asks to plot something, then please plot it here in this Jupyter NotebookDon’t hesitate to import start~ Digging into untar_data, path. ls: Union[pathlib. Path, str]: typed programming language? -> maybe i think disclaim the type beforehand for sure. Q. like assert? path. ls()this is some module that fast. ai made because os. listdir(‘path’) is unconvinient. Python3 pathlib library!: pathlib "
- }, {
- "id": 26,
+ "id": 27,
"url": "http://localhost:4000/2019/12/jeremy-howard/",
"title": "Jeremy Howard",
"body": "2019/12/15 - This is journey to find out ‘who am I trying to be?’: How he impacted me? The person who made me start Computer Vision again. He emphasized the importance of studying NLP and Computer together to understand the deep-learning. He didn’t order it to study, but always he pursuade me with reasonable way. “It’s not just something I can throw away. NLP and computer vision a few weeks apart and that’s going to force your brain to realize like ‘oh I have to remember this’” He made me admit my failure in deep-learning. I started to objectify where am I. What should I do when I’m frustrated. “Keep going. You’re not expected to remember everything. Yet. You’re not expected to understand everything. Yet. You’re not expected to know why everything works. Yet. ” His articles are numerous, below. What is torch. nn Really? High Performance Numeric Programming with Swift: Explorations and Reflections C++11, random distributions, and Swift And especially, I like this book. Designing great data products Great predictive modeling is an important part of the solution, but it no longer stands on its own; as products become more sophisticated, it disappears into the plumbing. Designing great data products And he is also famous for words. Here are some. we’re going to try and use that to really understand what’s going on. So to warn you, none of it is rocket science but a lot of its going to look really new. So don’t expect to get it the first time but expect to listen and jump into the notebook try a few things test things out look particularly at like tensor shapes and inputs and outputs to check your understanding then go back and listen again. But and kind of try it, a few times, because you will get there right, it’s just that there’s going to be a lot of new concepts because we haven’t done that much stuff in pure Pytorch. Lesson 6: Deep Learning 2019 "
}, {
- "id": 27,
+ "id": 28,
"url": "http://localhost:4000/2019/11/julia-evans/",
"title": "Julia Evans",
"body": "2019/11/20 - This is journey to find out ‘who am I trying to be?’: The women who surprised me in many ways. First, she approached me to teaching some concepts drawing cartoons. It was at Hackers news, which was hightest ranks. Personally I have the use of not to reading title, so and cartoon was so cute and clear. I naturally gonna understood mechanism and astonished by her explaination ability. Her value, which she was taught by many people so want to do same things, moved me. Volume of her knowledge, that just reading post title is a deal of work, amazed me. "
}, {
- "id": 28,
+ "id": 29,
"url": "http://localhost:4000/2019/11/coc-retropective/",
"title": "Retrospective on Pycon 2019 Korea (CoC Committee)",
"body": "2019/11/05 - When I was volunteer, it seems like busy and hectic to managing that crowded conference. In my experience, to get things moving, it needs hierarchy. But it didn’t. Organizers emphasized our responsibility, and if I passed each other’s burden, It could be my burden next time. In solidarity of the obligation, we finished conference well. And after participating PyCon Korea 2018 as volunteer, I’ve joined PyCon Korea Organizer last year. <Figure 1> First meeting of PyCon 2019 Korea Organizers It’s been a while since PyCon 2019 finished. It’s held on Aug 15 - 18, at Coex Grand Balloom <Figure 2> Ongoing session, speaking on news comment processing <Figure 3> Sponsor Booth iin Coex Hall <Figure 4> After PyCon 2019, with all of volunteer, organizer, speakers 😍 🥰 Serving as part of the coc TF, I spent large fraction of last year doing CoC job. here’s the path what we’ve been grappled with to grasp a solution. First half: Before the conference Toward Diverse Community: Formally we’ve been reusing and modifying PyCon US CoC, but we needed fit in Korean and I was part of that to revise code of conduct. Except ‘That’ Diversity, Because it is ‘Harassment’: Specific point was harassment, and the others were not. process of finding the points. How can we settle this point?Second half: During the conference Handling the potential Harassment: Disjunction of policy and real-time situation: This ‘PyCon 2019 Korea retrospective series’ would be devided into 3 Episodes. “Retrospective on Pycon 2019 Korea (CoC Committee)” “Retrospective on Pycon 2019 Korea (Program Chair)” (20 Nov, To Be Update) “Maintaining participation while still making timely decisions” (29 Nov, To Be Update)"
}, {
- "id": 29,
+ "id": 30,
"url": "http://localhost:4000/2019/11/elif-shafak/",
"title": "Elif Shafak",
"body": "2019/11/05 - This is journey to find out ‘who am I trying to be?’: For creative-minded people, Istanbul is a treasure. ’ Photo © Chris Boland, licensed under CC BY-NC-ND 2. 0 it suddenly felt like what I was trying to convey was more complicated and detailed than what the circumstances allowed me to say. And I did what I usually do in similar situations: I stammered, I shut down, and I stopped talking. I stopped talking because the truth was complicated, even though I knew, deep within, that one should never, ever remain silent for fear of complexity. <Figure 1> Elif Shafak Photo credit: www. elifsafak. com. tr I want to talk about emotions and the need to boost our emotional intelligence. I think it’s a pity that mainstream political theory pays very little attention to emotions. Oftentimes, analysts and experts are so busy with data and metrics that they seem to forget those things in life that are difficult to measure and perhaps impossible to cluster under statistical models. But I think this is a mistake, for two main reasons. We are emotional beings. I think it’s going to be one of our biggest intellectual challenges, because our political systems are replete with emotions. In country after country, we have seen illiberal politicians exploiting these emotions. And yet within the academia and among the intelligentsia, we are yet to take emotions seriously. I think we should. 1 2 Reference: British Council Worldwide ↩ Ted Talk ↩ "
}, {
- "id": 30,
+ "id": 31,
"url": "http://localhost:4000/2019/01/dps-week1/",
"title": "Digital Product School week 1",
"body": "2019/01/11 - The 1th week retropect at Digital Product School [This week’s schedule] CONTENT: Welcome to Digital Product School! Trip to Spitzingsee Welcome to Design Office Specifying our goal of product Welcome to Digital Product School!: Trip to Spitzingsee: At the first day of Digital Product School, we had a off-site with all of batch 9 people. All the costs were managed by dps. At the beautiful mountain, we settled the team, and got my team goal. Basically, there are two kind of team in DPS. (1) Wild team - the team has fixed topic(2) Company team - the team which has specific stakeholders, and also topic defined by that stakeholders The Core-team will fix what team you will join in DPS for 3 months based on ymy professionals, they announce it at off-site. [My team for 3 months at DPS] And we decide on my batch #9 theme song. How? Each team draw for songs and pitch ‘why this song should be batch #9 theme song’The result? Imagine dragon - Believer (I didn’t know at the moment, this song would be stamped in my memory) We have a workshop for getting to know each other. For example, we share 1) what do I expect from 3 months of dps, 2) when I feel happy in my life time, 3) what I worked for last week, 4) what was my last project and 5) what plays important role in my life My team's board Cero Welcome to Design Office: At first day of design office, we had workshop, which celebrates my day in dps also discuss specific rule, menifesto and stakeholders We get sticker and attach it in map depends on my nationality Now time to get to know my team’s stakeholders. What they want for us? What they expect from us? How free my team are on the topic?To be honest, it is endless tug-of-war. We should discuss with my stakeholders, endlessly, and find out solution which can meet interest of users, stakeholders and my team. Basically, my team’s main stakeholder is ADAC, but BMW, City of munich and Nokia will also participate as my team’s stakeholders. Specifying our goal of product: "
@@ -331,7 +336,7 @@
- fast.ai-v3 ,
+ fastai-v3 ,
@@ -391,12 +396,15 @@
+