Ailln/nlp-roadmap

Natural Language Processing Roadmap

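🗺️ A learning roadmap for Natural Language Processing

⚠️ Note:

  1. This project includes a small experiment called PCB. Here PCB does not stand for Printed Circuit Board or Process Control Block; it is short for Paper Code Blog. I believe these three things (papers, code, and blogs) let us balance theory and practice while mastering key concepts quickly!

  2. The number of stars after each paper indicates its importance (a subjective opinion, for reference only).

    1. 🌟: average;
    2. 🌟🌟: important;
    3. 🌟🌟🌟: very important.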

1 Word Segmentation

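A word is the smallest linguistic unit that can be used independently, and in natural language processing the word is usually the basic unit of processing. English has a built-in advantage here because spaces separate its words. Chinese has no explicit delimiter between words, so the first task in Chinese language processing is to split a continuous sentence into a "word sequence". This splitting process is called word segmentation. Learn more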
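To make the contrast between English and Chinese concrete, here is a minimal sketch in Python. It assumes a classic dictionary-based technique, forward maximum matching, which this roadmap does not itself prescribe. The function name `forward_max_match` and the tiny `toy_vocab` dictionary are hypothetical and chosen only for illustration.

```python
def forward_max_match(sentence, vocab, max_len=4):
    """Greedy dictionary-based segmentation.

    At each position, take the longest substring (up to max_len characters)
    that appears in vocab; fall back to a single character if nothing matches.
    """
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in vocab:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary; real segmenters use large lexicons or learned models.
toy_vocab = {"自然", "语言", "自然语言", "处理", "自然语言处理", "学习"}

# English: whitespace already marks the word boundaries.
english = "natural language processing".split()

# Chinese: boundaries must be inferred, here by longest dictionary match.
chinese = forward_max_match("我学习自然语言处理", toy_vocab, max_len=6)

print(english)
print(chinese)
```

Greedy matching is simple, but it cannot resolve ambiguous boundaries or handle out-of-vocabulary words, which is part of what the statistical and neural approaches surveyed below try to address.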

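Survey

  • 汉语分词技术综述 (A Survey of Chinese Word Segmentation Techniques) {Paper} 🌟
  • 国内中文自动分词技术研究综述 (A Survey of Domestic Research on Automatic Chinese Word Segmentation Techniques) {Paper} 🌟
  • 汉语自动分词的研究现状与困难 (Current Research and Difficulties in Automatic Chinese Word Segmentation) {Paper} 🌟🌟
  • 汉语自动分词研究评述 (A Review of Research on Automatic Chinese Word Segmentation) {Paper} 🌟🌟
  • 中文分词十年又回顾: 2007-2017 (Chinese Word Segmentation: Another Decade Review, 2007-2017) {Paper} 🌟🌟🌟
  • chinese-word-segmentation {Code}
  • 深度学习中文分词调研 (A Survey of Deep Learning for Chinese Word Segmentation) {Blog}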

2 Word Embedding

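Word embedding means finding a mapping or function that produces a representation of each word in a new space; this representation is called a "word representation". Learn more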
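As a rough illustration of the "mapping" idea, the sketch below treats an embedding as a plain lookup table from words to vectors and compares words by cosine similarity. The vectors in `toy_embeddings` are hypothetical, hand-picked values, not the output of any model listed here; real embeddings are learned from data and have far more dimensions.

```python
import math

# Hypothetical 3-dimensional vectors, hand-picked for illustration only.
toy_embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def embed(word):
    """The mapping itself: word -> vector in the new space."""
    return toy_embeddings[word]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; closer to 1 means more similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


royal = cosine_similarity(embed("king"), embed("queen"))
fruit = cosine_similarity(embed("king"), embed("apple"))

# With these toy vectors, "king" sits closer to "queen" than to "apple".
print(royal > fruit)
```

The point of the new space is that geometric closeness can stand in for similarity between words, something one-hot identifiers cannot express.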

Survey

  • Word Embeddings: A Survey {Paper} 🌟🌟🌟
  • Visualizing Attention in Transformer-Based Language Representation Models {Paper} 🌟🌟
  • PTMs: Pre-trained Models for Natural Language Processing: A Survey {Paper} {Blog} 🌟🌟🌟
  • Efficient Transformers: A Survey {Paper} 🌟🌟
  • A Survey of Transformers {Paper} 🌟🌟
  • Pre-Trained Models: Past, Present and Future {Paper} 🌟🌟
  • Pretrained Language Models for Text Generation: A Survey {Paper} 🌟
  • A Practical Survey on Faster and Lighter Transformers {Paper} 🌟
  • The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures {Paper} 🌟🌟

Core

  • NNLM: A Neural Probabilistic Language Model {Paper} {Code} {Blog} 🌟
  • W2V: Efficient Estimation of Word Representations in Vector Space {Paper} 🌟🌟
  • GloVe: Global Vectors for Word Representation {Paper} 🌟🌟
  • CharCNN: Character-level Convolutional Networks for Text Classification {Paper} {Blog} 🌟
  • ULMFiT: Universal Language Model Fine-tuning for Text Classification {Paper} 🌟
  • SiATL: An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models {Paper} 🌟
  • FastText: Bag of Tricks for Efficient Text Classification {Paper} 🌟🌟
  • CoVe: Learned in Translation: Contextualized Word Vectors {Paper} 🌟
  • ELMo: Deep contextualized word representations {Paper} 🌟🌟
  • Transformer: Attention Is All You Need {Paper} {Code} {Blog} 🌟🌟🌟
  • GPT: Improving Language Understanding by Generative Pre-Training {Paper} 🌟
  • GPT2: Language Models are Unsupervised Multitask Learners {Paper} {Code} {Blog} 🌟🌟
  • GPT3: Language Models are Few-Shot Learners {Paper} {Code} 🌟🌟🌟
  • GPT4: GPT-4 Technical Report {Paper} 🌟🌟🌟
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding {Paper} {Code} {Blog} 🌟🌟🌟
  • UniLM: Unified Language Model Pre-training for Natural Language Understanding and Generation {Paper} {Code} {Blog} 🌟🌟
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer {Paper} {Code} {Blog} 🌟
  • ERNIE(Baidu): Enhanced Representation through Knowledge Integration {Paper} {Code} 🌟
  • ERNIE(Tsinghua): Enhanced Language Representation with Informative Entities {Paper} {Code} 🌟
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach {Paper} 🌟
  • ALBERT: A Lite BERT for Self-supervised Learning of Language Representations {Paper} {Code} 🌟🌟
  • TinyBERT: Distilling BERT for Natural Language Understanding {Paper} 🌟🌟
  • FastFormers: Highly Efficient Transformer Models for Natural Language Understanding {Paper} {Code} 🌟🌟

Other

  • word2vec Parameter Learning Explained {Paper} 🌟🌟
  • Semi-supervised Sequence Learning {Paper} 🌟🌟
  • BERT Rediscovers the Classical NLP Pipeline {Paper} 🌟
  • Pre-trained Language Model Papers {Blog}
  • HuggingFace Transformers {Code}
  • Fudan FastNLP {Code}

3 Text Classification

Survey

  • A Survey on Text Classification: From Shallow to Deep Learning {Paper} 🌟🌟🌟
  • Deep Learning Based Text Classification: A Comprehensive Review {Paper} 🌟🌟

CNN

  • TextCNN: Convolutional Neural Networks for Sentence Classification {Paper} {Code} 🌟🌟🌟
  • Convolutional Neural Networks for Text Categorization: Shallow Word-level vs. Deep Character-level {Paper} 🌟
  • DPCNN: Deep Pyramid Convolutional Neural Networks for Text Categorization {Paper} {Code} 🌟🌟

4 Sequence Labeling

Survey

  • The History of Sequence Labeling (DNNs+CRF) {Blog}

Bi-LSTM + CRF

  • End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF {Paper} 🌟🌟
  • pytorch_NER_BiLSTM_CNN_CRF {Code}
  • NN_NER_tensorFlow {Code}
  • End-to-end-Sequence-Labeling-via-Bi-directional-LSTM-CNNs-CRF-Tutorial {Code}
  • Bi-directional LSTM-CNNs-CRF {Code}

Other

  • Sequence to Sequence Learning with Neural Networks {Paper} 🌟
  • Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks {Paper} 🌟

5 Dialogue Systems

Survey

  • A Survey on Dialogue Systems: Recent Advances and New Frontiers {Paper} {Blog} 🌟🌟
  • 小哥哥,检索式chatbot了解一下? (Hey, Want to Learn About Retrieval-Based Chatbots?) {Blog} 🌟🌟🌟
  • Recent Neural Methods on Slot Filling and Intent Classification for Task-Oriented Dialogue Systems: A Survey {Paper} 🌟🌟

Open Domain Dialogue Systems

  • HRED: Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models {Paper} {Code} 🌟🌟
  • Adversarial Learning for Neural Dialogue Generation {Paper} {Code} {Blog} 🌟🌟

Task Oriented Dialogue Systems

  • Joint NLU: Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling {Paper} {Code} 🌟🌟
  • BERT for Joint Intent Classification and Slot Filling {Paper} 🌟
  • Sequicity: Simplifying Task-oriented Dialogue Systems with Single Sequence-to-Sequence Architectures {Paper} {Code} 🌟🌟
  • Attention with Intention for a Neural Network Conversation Model {Paper} 🌟
  • REDP: Few-Shot Generalization Across Dialogue Tasks {Paper} {Blog} 🌟🌟
  • TEDP: Dialogue Transformers {Paper} {Code} {Blog} 🌟🌟🌟

Conversational Response Selection

  • Multi-view Response Selection for Human-Computer Conversation {Paper} 🌟🌟
  • SMN: Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots {Paper} {Code} {Blog} 🌟🌟🌟
  • DUA: Modeling Multi-turn Conversation with Deep Utterance Aggregation {Paper} {Code} {Blog} 🌟🌟
  • DAM: Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network {Paper} {Code} {Blog} 🌟🌟🌟
  • IMN: Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots {Paper} {Code} {Blog} 🌟🌟
  • Dialogue Transformers {Paper} 🌟🌟

6 Topic Model

LDA

7 Knowledge Graph

Survey

  • Towards a Definition of Knowledge Graphs {Paper} 🌟🌟🌟

8 Prompt Learning

Survey

  • PPP: Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing {Paper} {Blog} 🌟🌟🌟

9 Graph Neural Network

Survey

  • Graph Neural Networks for Natural Language Processing: A Survey {Paper} 🌟🌟

10 Sentence Embedding

Core

  • InferSent: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data {Paper} {Code} 🌟🌟
  • Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks {Paper} {Code} 🌟🌟🌟
  • BERT-flow: On the Sentence Embeddings from Pre-trained Language Models {Paper} {Code} {Blog} 🌟🌟
  • SimCSE: Simple Contrastive Learning of Sentence Embeddings {Paper} {Code} 🌟🌟🌟

References