🗺️ A learning roadmap for Natural Language Processing.
⚠️ Note:
This project includes a small experiment called PCB. Here, PCB does not mean Printed Circuit Board, nor Process Control Block; it is short for Paper-Code-Blog. I believe that these three things, papers, code, and blogs, let us cover theory and practice at the same time and master the key ideas quickly. The number of stars after each paper indicates its importance (a subjective opinion, for reference only).
- 🌟: average;
- 🌟🌟: important;
- 🌟🌟🌟: very important.
A word is the smallest linguistic unit that can function on its own, and in natural language processing the word is usually the basic unit of processing. English enjoys a natural advantage here, since its words are already delimited by spaces. Chinese, however, has no explicit boundary markers between words, so the first task before any Chinese language processing is to split a continuous sentence into a sequence of words. This splitting process is called word segmentation. Learn more
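As a quick illustration, here is a minimal segmentation sketch using the jieba library (one common open-source choice; the sample sentence and the printed output are only illustrative, and the actual split may vary with jieba's version and dictionary):

```python
# Minimal Chinese word segmentation sketch with jieba (illustrative only).
import jieba

sentence = "自然语言处理是人工智能的一个重要方向"
words = jieba.lcut(sentence)  # lcut() returns the segmentation as a list of words
print(words)
# e.g. ['自然语言', '处理', '是', '人工智能', '的', '一个', '重要', '方向']
```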
- 汉语分词技术综述 {Paper} 🌟
- 国内中文自动分词技术研究综述 {Paper} 🌟
- 汉语自动分词的研究现状与困难 {Paper} 🌟🌟
- 汉语自动分词研究评述 {Paper} 🌟🌟
- 中文分词十年又回顾: 2007-2017 {Paper} 🌟🌟🌟
- chinese-word-segmentation {Code}
- 深度学习中文分词调研 {Blog}
Word embedding means finding a mapping, or function, that takes each word to a representation in a new vector space; that representation is called a "word representation". Learn more
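As a concrete sketch, the snippet below trains Word2Vec embeddings with gensim (one common library for this; the toy corpus and hyperparameters are made up for illustration, and a real model needs far more text):

```python
# Minimal word-embedding sketch with gensim's Word2Vec (toy data, illustrative only).
from gensim.models import Word2Vec

corpus = [
    ["natural", "language", "processing"],
    ["word", "embeddings", "map", "words", "to", "vectors"],
    ["similar", "words", "end", "up", "with", "similar", "vectors"],
]

# vector_size is the dimensionality of the new space; min_count=1 keeps rare words.
model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=50)

print(model.wv["words"].shape)                 # the learned 50-dim vector for "words"
print(model.wv.most_similar("words", topn=3))  # nearest neighbors in the vector space
```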
- Word Embeddings: A Survey {Paper} 🌟🌟🌟
- Visualizing Attention in Transformer-Based Language Representation Models {Paper} 🌟🌟
- PTMs: Pre-trained Models for Natural Language Processing: A Survey {Paper} {Blog} 🌟🌟🌟
- Efficient Transformers: A Survey {Paper} 🌟🌟
- A Survey of Transformers {Paper} 🌟🌟
- Pre-Trained Models: Past, Present and Future {Paper} 🌟🌟
- Pretrained Language Models for Text Generation: A Survey {Paper} 🌟
- A Practical Survey on Faster and Lighter Transformers {Paper} 🌟
- The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures {Paper} 🌟🌟
- NNLM: A Neural Probabilistic Language Model {Paper} {Code} {Blog} 🌟
- W2V: Efficient Estimation of Word Representations in Vector Space {Paper} 🌟🌟
- Glove: Global Vectors for Word Representation {Paper} 🌟🌟
- CharCNN: Character-level Convolutional Networks for Text Classification {Paper} {Blog} 🌟
- ULMFiT: Universal Language Model Fine-tuning for Text Classification {Paper} 🌟
- SiATL: An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models {Paper} 🌟
- FastText: Bag of Tricks for Efficient Text Classification {Paper} 🌟🌟
- CoVe: Learned in Translation: Contextualized Word Vectors {Paper} 🌟
- ELMo: Deep contextualized word representations {Paper} 🌟🌟
- Transformer: Attention Is All You Need {Paper} {Code} {Blog} 🌟🌟🌟
- GPT: Improving Language Understanding by Generative Pre-Training {Paper} 🌟
- GPT2: Language Models are Unsupervised Multitask Learners {Paper} {Code} {Blog} 🌟🌟
- GPT3: Language Models are Few-Shot Learners {Paper} {Code} 🌟🌟🌟
- GPT4: GPT-4 Technical Report {Paper} 🌟🌟🌟
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding {Paper} {Code} {Blog} 🌟🌟🌟
- UniLM: Unified Language Model Pre-training for Natural Language Understanding and Generation {Paper} {Code} {Blog} 🌟🌟
- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer {Paper} {Code} {Blog} 🌟
- ERNIE(Baidu): Enhanced Representation through Knowledge Integration {Paper} {Code} 🌟
- ERNIE(Tsinghua): Enhanced Language Representation with Informative Entities {Paper} {Code} 🌟
- RoBERTa: A Robustly Optimized BERT Pretraining Approach {Paper} 🌟
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations {Paper} {Code} 🌟🌟
- TinyBERT: Distilling BERT for Natural Language Understanding {Paper} 🌟🌟
- FastFormers: Highly Efficient Transformer Models for Natural Language Understanding {Paper} {Code} 🌟🌟
- word2vec Parameter Learning Explained {Paper} 🌟🌟
- Semi-supervised Sequence Learning {Paper} 🌟🌟
- BERT Rediscovers the Classical NLP Pipeline {Paper} 🌟
- Pre-trained Language Model Papers {Blog}
- HuggingFace Transformers {Code}
- Fudan FastNLP {Code}
- A Survey on Text Classification: From Shallow to Deep Learning {Paper} 🌟🌟🌟
- Deep Learning Based Text Classification: A Comprehensive Review {Paper} 🌟🌟
- TextCNN: Convolutional Neural Networks for Sentence Classification {Paper} {Code} 🌟🌟🌟
- Convolutional Neural Networks for Text Categorization: Shallow Word-level vs. Deep Character-level {Paper} 🌟
- DPCNN: Deep Pyramid Convolutional Neural Networks for Text Categorization {Paper} {Code} 🌟🌟
- Sequence Labeling 的发展史(DNNs+CRF) {Blog}
- End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF {Paper} 🌟🌟
- pytorch_NER_BiLSTM_CNN_CRF {Code}
- NN_NER_tensorFlow {Code}
- End-to-end-Sequence-Labeling-via-Bi-directional-LSTM-CNNs-CRF-Tutorial {Code}
- Bi-directional LSTM-CNNs-CRF {Code}
- Sequence to Sequence Learning with Neural Networks {Paper} 🌟
- Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks {Paper} 🌟
- A Survey on Dialogue Systems: Recent Advances and New Frontiers {Paper} {Blog} 🌟🌟
- 小哥哥,检索式chatbot了解一下? {Blog} 🌟🌟🌟
- Recent Neural Methods on Slot Filling and Intent Classification for Task-Oriented Dialogue Systems: A Survey {Paper} 🌟🌟
- HRED: Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models {Paper} {Code} 🌟🌟
- Adversarial Learning for Neural Dialogue Generation {Paper} {Code} {Blog} 🌟🌟
- Joint NLU: Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling {Paper} {Code} 🌟🌟
- BERT for Joint Intent Classification and Slot Filling {Paper} 🌟
- Sequicity: Simplifying Task-oriented Dialogue Systems with Single Sequence-to-Sequence Architectures {Paper} {Code} 🌟🌟
- Attention with Intention for a Neural Network Conversation Model {Paper} 🌟
- REDP: Few-Shot Generalization Across Dialogue Tasks {Paper} {Blog} 🌟🌟
- TEDP: Dialogue Transformers {Paper} {Code} {Blog} 🌟🌟🌟
- Multi-view Response Selection for Human-Computer Conversation {Paper} 🌟🌟
- SMN: Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-Based Chatbots {Paper} {Code} {Blog} 🌟🌟🌟
- DUA: Modeling Multi-turn Conversation with Deep Utterance Aggregation {Paper} {Code} {Blog} 🌟🌟
- DAM: Multi-Turn Response Selection for Chatbots with Deep Attention Matching Network {Paper} {Code} {Blog} 🌟🌟🌟
- IMN: Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots {Paper} {Code} {Blog} 🌟🌟
- Towards a Definition of Knowledge Graphs {Paper} 🌟🌟🌟
- PPP: Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing {Paper} {Blog} 🌟🌟🌟
- Graph Neural Networks for Natural Language Processing: A Survey {Paper} 🌟🌟
- InferSent: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data {Paper} {Code} 🌟🌟
- Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks {Paper} {Code} 🌟🌟🌟
- BERT-flow: On the Sentence Embeddings from Pre-trained Language Models {Paper} {Code} {Blog} 🌟🌟
- SimCSE: Simple Contrastive Learning of Sentence Embeddings {Paper} {Code} 🌟🌟🌟