- This project contains my personal study notes for Prof. Hung-yi Lee's machine learning course; writing them up both consolidates what I learn and makes later review easier. Corrections are welcome if you spot any mistakes.
- The project covers fundamentals and practical applications in machine learning, neural networks, image processing, and NLP (see the course outline diagram or the table of contents below); interview-relevant knowledge points will be added over time.
- Chapters 1~7 are handwritten notes; Chapters 8~X are Markdown notes (Typora is recommended for viewing)
Taken from Prof. Hung-yi Lee's course website: http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML20.html
1.Gradient Descent Review
3.Stochastic Gradient Descent
6.Comparing Optimizers in Practice (Improvements to Adam and SGD)
8.Future Position in Current Step (SGDWM and AdamW)
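The optimizer entries above can be summarized as update rules; the following is a minimal NumPy sketch (function and variable names are my own, and the hyperparameter defaults are only illustrative, not the course's):

```python
import numpy as np

def sgd_momentum_step(w, grad, v, lr=0.01, beta=0.9):
    """SGD with momentum: v is an exponential moving average of past
    (negative) gradient steps, and the weights move along v."""
    v = beta * v - lr * grad
    return w + v, v

def adam_step(w, grad, m, s, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: momentum (m) plus a per-parameter adaptive scale (s),
    both bias-corrected by the step count t."""
    m = b1 * m + (1 - b1) * grad
    s = b2 * s + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    s_hat = s / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# One step on f(w) = w^2 (gradient 2w), starting from w = 1.0
w = np.array([1.0]); grad = 2 * w
w_sgd, _ = sgd_momentum_step(w, grad, v=np.zeros_like(w))
w_adam, _, _ = adam_step(w, grad, m=np.zeros_like(w), s=np.zeros_like(w), t=1)
```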
3.Parameter Sharing in Probabilistic Generative Models
6.Solving Logistic Regression via Maximum Likelihood Estimation
7.Logistic Regression v.s. Linear Regression
1.The Three Steps of DL (define the model function, define the evaluation function, pick the best model)
3.Training Tips for Deep Learning (diagnosing overfitting, choosing activation functions, adaptive learning rates, Early Stopping, regularization, Dropout)
2.CNN Architecture (Convolution and MaxPooling)
2.Problem Types GNNs Can Solve, with Related Datasets and Benchmarks
3.2 NN4G(Neural Network for Graph)
3.3 DCNN(Diffusion Convolution Neural Network)
3.4 DGC(Diffusion Graph Convolution)
3.5 MoNet (Mixture Model Networks)
3.6 GraphSage(SAmple and aggreGatE)
3.7 GAT (Graph Attention Networks)
3.8 GIN(Graph Isomorphism Network)
4.Graph Signal Processing and Spectral-based GNN
4.1 Signal and System Review (how transforms from signals and systems relate to GNNs)
4.2 Spectral Graph Theory
4.4 GCN(Graph Convolution Network)
5 Graph Generation (overview of VAE-based, GAN-based, and auto-regressive models)
1.1 RNN Application - Slot Filling
2.1 Long Short-term Memory Cell
4.1 Sentiment Analysis(Many to One)
4.2 Key Term Extraction(Many to One)
4.3 Speech Recognition(Many to Many)
4.4 Machine Translation(Many to Many)
4.5 Machine Translation(Many to Many)
4.6 Syntactic Parsing(Beyond Sequence)
4.7 Sequence-to-Sequence Auto-encoder(Text)
4.8 Sequence-to-Sequence Auto-encoder(Speech)
5.Attention-based Model (Chapter 11 covers the principles of Self-Attention in depth)
5.1 Attention-based Model基本原理
5.2 Attention-based Model Applications
6.Deep Learning v.s. Structured Learning
6.1 Comparing Deep Learning and Structured Learning
6.2 Integrating Deep Learning and Structured Learning
1.3 Count-based Embedding (GloVe Vector)
1.4 Prediction-based Embedding (basic idea, CBOW variant, Skip-gram variant)
2.1 What Information a Word Embedding Vector Encodes
1.1 Definition of Semi-supervised Learning
2.Semi-supervised Learning for Generative Model
2.1 Supervised Generative Model
2.2 Semi-supervised Generative Model
3.Low-density Separation (either black or white)
3.1 The Low-density Separation Assumption
3.3 Self-training v.s. Semi-supervised Generative Model
3.4 Entropy-based Regularization
3.5 Outlook: Semi-supervised SVM
4.Smoothness Assumption (birds of a feather flock together)
4.2 Semi-supervised Algorithms Based on the Smoothness Assumption (Cluster and then Label & Graph-based Approach)
5.Better Representation (remove the dross, keep the essence; simplify the complex)
1.2 Interpretable v.s. Powerful
2.Local Explanation: Explain the Decision (Question: Why do you think this image is a cat?)
2.1 Important Components and Gradient-based Methods
2.2 Limitations of Gradient-based Approaches
3.Global Explanation: Explain the Whole Model (Question: What do you think a "cat" looks like?)
3.1 Activation Maximization Review
3.2 “Regularization” from Generator
4.Using A Model to Explain Another Model
4.1 Basic Idea
4.2 Local Interpretable Model-Agnostic Explanations (LIME, with a Linear Model)
4.3 Local Interpretable Model-Agnostic Explanations (LIME, with a Decision Tree)
1.Explain a trained model - Attribution(Local v.s. Global attribution / Completeness / Evaluation)
2.Explain a trained model - Probing(BERT / Good Probing Model)
2.2 What does BERT learn?(BERT Rediscovers the Classical NLP Pipeline)
2.3 What might BERT not learn?
3.Explain a trained model - Heatmap (Activation map / Attention map)
3.1 Activation Map: CNN Dissection
3.2 Attention map as explanation
4.Create an explainable model
4.1 Challenges of Building an Explainable CNN Model
4.2 Constraining activation map
2.2 Fast Gradient Sign Method (FGSM)
2.3 White Box v.s. Black Box
2.4 Universal Adversarial Attack
2.5 Adversarial Reprogramming
2.6 Audio Attack & Text Attack
1.2 Network Pruning - Practical Issue
2.Knowledge Distillation
2.1 Basic Idea of Knowledge Distillation
2.2 Training Tips
3.1 Three Approaches to Parameter Quantization
4.2 Depthwise Separable Convolution
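The knowledge-distillation entries above (2.1-2.2) hinge on the student matching temperature-softened teacher outputs; a toy sketch with made-up logits and an assumed temperature T:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

teacher_logits = np.array([5.0, 2.0, 1.0])
T = 4.0                                       # distillation temperature
soft_targets = softmax(teacher_logits / T)    # softened teacher distribution
hard_targets = softmax(teacher_logits)        # near one-hot by comparison

# Softening moves probability mass onto the non-argmax classes; this
# "dark knowledge" about class similarity is what the student learns
# beyond the hard labels.
```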
1.Network Compression Review
1.1 Common Approaches to Network Compression
2.Knowledge Distillation
2.1 Basic Idea of Knowledge Distillation
3.3 More About Lottery Ticket Hypothesis
1.1 Structured Object Generation Model
2.Attention(Dynamic Conditional Generation)
2.2 Machine Translation with Attention-based Model
2.3 Speech Recognition with Attention-based Model
2.4 Image Caption with Attention-based Model
3.Tips for Training Generation Model
3.1 Attention Weight Regularization
3.2 Mismatch between Train and Test
3.4 Object level v.s. Component level
5.1 Sentiment Analysis Application
1.4 Multi-head Self-attention (illustrated with 2 heads)
2.Using Self-attention in Seq2Seq Models
2.1 Architecture of Seq2Seq with Self-attention
3.1 Model Architecture
2.Residual Shuffle Exchange Network
2.1 Switch Unit and Residual Shuffle Exchange Network
3.1 BERT
2.1 Why Dimension Reduction Is Feasible
2.2 Principal Component Analysis (PCA)
2.3 PCA – Another Point of View(SVD)
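Entries 2.2-2.3 above view PCA through the SVD; a minimal sketch of that equivalence (random data, NumPy only):

```python
import numpy as np

# PCA via SVD: centre the data, take the SVD, and the rows of Vt
# are the principal directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Xc = X - X.mean(axis=0)                 # centre each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                         # rows = principal directions
Z = Xc @ components[:2].T               # project onto top-2 components

# The covariance eigendecomposition gives the same spectrum:
# eigenvalues of cov equal S^2 / (n - 1), sorted descending.
cov = Xc.T @ Xc / (len(X) - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]
assert np.allclose(eigvals[:2], (S ** 2 / (len(X) - 1))[:2])
```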
1.Locally Linear Embedding (LLE)
2.1 Basic Idea of Laplacian Eigenmaps
3.t-distributed Stochastic Neighbor Embedding (t-SNE)
1.2 Auto-encoder – Text Retrieval
1.3 Auto-encoder – Similar Image Search
1.4 Auto-encoder – Pre-training DNN
3.More Non-Linear Dimension Reduction Model
3.1 Restricted Boltzmann Machine
4.1 More than minimizing reconstruction error
4.2 More Interpretable Embedding(Voice Conversion)
1.2 Generation Models in Practice: Pokémon Creation
2.Variational Autoencoder(VAE)
2.3 Mathematical Interpretation of VAE (Gaussian Mixture Model)
3.Generative Adversarial Network (GAN)
1.Embeddings from Language Model(ELMO)
1.1 Contextualized Word Embedding
1.2 Embeddings from Language Model(ELMO)
2.Bidirectional Encoder Representations from Transformers (BERT)
3.Enhanced Representation through Knowledge Integration (ERNIE)
4.Generative Pre-Training(GPT)
4.2 The Magic of GPT (Zero-shot Learning)
1.1 Common Self-supervised Learning Models
1.2 Categorizing Anomaly Detection by Data Type
2.Case 1 - With Label(Classifier)
3.Case 2 - Without Label(Classifier)
5.More About Anomaly Detection
5.2 Anomaly detection on Image
5.3 Anomaly detection on Audio
2.GAN as structured learning
2.1 Challenges of Structured Learning and Their Solutions
3.Can Generator learn by itself? YES!
3.1 Training the Generator on Its Own with an Auto-Encoder
4.Can Discriminator generate? YES, but difficult!
4.1 Basic Approach to Completing a Generative Task with the Discriminator
5.GAN = Generator + Discriminator
5.1 How the Generator and Discriminator Complement Each Other
1.Conditional Generation by GAN
1.1 Designing the Generator and Discriminator in Conditional GAN
2.Unsupervised Conditional Generation by GAN
2.1 Motivation for Unsupervised Conditional Generation by GAN
2.2 Approach 1: Direct Transformation
2.3 Approach 2: Projection to Common Space
1.1 The Generator and Maximum Likelihood Estimation (KL Divergence)
1.2 The Discriminator and How to Compute KL Divergence
2.fGAN: General Framework of GAN
1.1 The Conflict Between JS Divergence and Non-overlapping Distributions
1.2 Wasserstein Distance (Earth Mover's Distance)
1.5 Loss-sensitive GAN(LSGAN)
Chapter 24 - Generative Adversarial Network(Part 5 - Feature Extraction by GAN [InfoGAN / VAE-GAN / BiGAN])
2.Domain-adversarial Training
2.2 Intelligent Photo Editing
2.4 More Application of GAN on Image
1.Improving Supervised Seq-to-seq Model
1.1 Problems in Training a Regular Seq2Seq Model
1.2 Training a Seq2Seq Model with RL (Human Feedback)
1.3 Training a Seq2Seq Model with GAN (Discriminator Feedback)
2.Unsupervised Conditional Sequence Generation
2.2 Unsupervised Abstractive Summarization
2.3 Unsupervised Translation
Chapter 24 - Generative Adversarial Network(Part 8 - More GAN-based Model [SAGAN, BigGAN, SinGAN, GauGAN, GANILLA, NICE-GAN])
1.More GAN-based Models
1.4 SinGAN (crops one image into many small patches to use as training data)
1.5 GauGAN(Conditional Normalization)
1.6 GANILLA (a major upgrade of CycleGAN/DualGAN; Miyazaki-style dataset)
1.7 NICE-GAN (uses the first half of the discriminator as the encoder)
2.Labeled Source Data + Labeled Target Data
3.Labeled Source Data + Unlabeled Target Data
3.1 Domain-adversarial Training
1.Reinforcement Learning Introduction
2.Policy-based Approach(Learning an Actor)
2.1 The Three Steps of the Policy-based Approach
2.2 Step 1: Neural Network as Actor
2.3 Step 2: Goodness of Actor
2.4 Step 3: Pick the best Actor
3.Value-based Approach(Learning a Critic)
3.1 Definition of the Critic (State Value Function)
3.2 Estimating Critic(State Value Function)
3.3 Definition of the Critic (State-action Value Function)
4.1 A3C(Asynchronous Advantage Actor-Critic)
5.Inverse Reinforcement Learning
1.1 The RL Objective (maximizing expected return) and How to Optimize It (Advantage Function)
2.From on-policy to off-policy
2.2 Importance Sampling and Off-Policy Learning
2.3 Converting On-Policy to Off-Policy via Importance Sampling
3.2 PPO Algorithm and PPO2 Algorithm
1.Value-based Approach Review
1.1 Types of Value-based Approaches and Their Estimation Methods
2.Introduction of Q-Learning
3.Tips for Q-Learning (Q-Learning Variants)
3.6 Distributional Q-function
4.Q-Learning for Continuous Actions
4.1 Challenges and Solutions for Q-Learning with Continuous Actions
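For the Q-Learning entries above, the core tabular update can be sketched as follows (the tiny 2-state MDP, reward, and learning-rate values are made up for illustration):

```python
import numpy as np

n_states, n_actions = 2, 2
Q = np.zeros((n_states, n_actions))   # tabular Q-function
gamma, lr = 0.9, 0.5                  # discount factor, learning rate

def q_update(Q, s, a, r, s_next):
    """One Q-learning step: move Q(s, a) toward the bootstrapped
    target r + gamma * max_a' Q(s', a')."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += lr * (target - Q[s, a])
    return Q

# take action 1 in state 0, receive reward 1.0, land in state 1
Q = q_update(Q, s=0, a=1, r=1.0, s_next=1)   # Q[0, 1] becomes 0.5
```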
1.2 Asynchronous Advantage Actor-Critic(A3C)
2.Pathwise Derivative Policy Gradient
2.1 Borrowing the GAN Idea: Using an Actor to Solve Q-Learning's arg max Problem
2.2 Pseudocode for Pathwise Derivative Policy Gradient
2.1 Curriculum Learning: From Easy to Hard
2.2 Reverse Curriculum Generation
3.Hierarchical Reinforcement Learning
3.1 Curriculum Learning: From Easy to Hard
1.2 Some Problems with Behavior Cloning