KDD19 Tutorial: From Shallow to Deep Language Representations: Pre-training, Fine-tuning, and Beyond


Time: TBA

Location: TBA

Tutors: Sheng Zha, Aston Zhang, Haibin Lin, Chenguang Wang, Mu Li, Alex J. Smola


Natural language processing (NLP) is at the core of the pursuit of artificial intelligence, with deep learning as the main powerhouse of recent advances. Most NLP problems remain unsolved. The compositional nature of language enables us to express complex ideas, but at the same time makes it intractable to spoon-feed enough labels to data-hungry algorithms for all situations. Recent progress on unsupervised language representation techniques brings new hope. In this hands-on tutorial, we walk through these techniques and see how NLP learning can be drastically improved by pre-training language representations on unlabelled text and then fine-tuning them. Specifically, we consider shallow representations in word embeddings such as word2vec, fastText, and GloVe, and deep representations with attention mechanisms such as BERT. We demonstrate detailed procedures and best practices for pre-training such models and fine-tuning them on downstream NLP tasks as diverse as finding synonyms and analogies, sentiment analysis, question answering, and machine translation. All the hands-on implementations use Apache MXNet (incubating) and GluonNLP, and some of the implementations are available in Dive into Deep Learning.
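As a rough illustration of the "finding synonyms and analogies" task mentioned above, the sketch below ranks words by cosine similarity and answers an analogy query with vector arithmetic. It is not taken from the tutorial notebooks: the 3-dimensional vectors in `TOY_EMBEDDINGS` are invented for illustration, and the helper names `cosine_similarity`, `nearest`, and `analogy` are hypothetical. Real pre-trained embeddings such as word2vec, fastText, or GloVe have hundreds of dimensions and would be loaded from a library rather than written by hand.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# Pre-trained embeddings would be loaded from a library instead.
TOY_EMBEDDINGS = {
    "king":  [0.8, 0.9, 0.1],
    "queen": [0.8, 0.1, 0.9],
    "man":   [0.2, 0.9, 0.1],
    "woman": [0.2, 0.1, 0.9],
    "apple": [0.1, 0.5, 0.5],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query_vec, embeddings, exclude=(), k=1):
    """Return the k words whose vectors are most similar to query_vec."""
    scored = [
        (cosine_similarity(query_vec, vec), word)
        for word, vec in embeddings.items()
        if word not in exclude
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:k]]


def analogy(a, b, c, embeddings):
    """Answer 'a is to b as c is to ?' via vec(b) - vec(a) + vec(c)."""
    target = [
        vb - va + vc
        for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    # The query words themselves are excluded from the candidates.
    return nearest(target, embeddings, exclude={a, b, c})[0]


if __name__ == "__main__":
    print(nearest(TOY_EMBEDDINGS["king"], TOY_EMBEDDINGS, exclude={"king"}))
    print(analogy("man", "king", "woman", TOY_EMBEDDINGS))
```

With these toy vectors, the analogy query "man is to king as woman is to ?" resolves to `queen`, because the vectors were constructed so that the offset between `man` and `king` matches the offset between `woman` and `queen`. Learned embeddings only approximate this kind of regularity.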

Target Audience

We are targeting engineers, scientists, and instructors in the fields of natural language processing, data mining, text mining, deep learning, machine learning, and artificial intelligence. While attendees with a good background in these areas will benefit most from this tutorial, it also gives a general audience and newcomers an introductory pointer to the presented materials.


| Time | Title | Slides | Notebooks |
|------|-------|--------|-----------|
| TBA | Basics of hands-on deep learning | | ndarray, autograd |
| TBA | Shallow language representations in word embedding | | word2vec, fasttext, GloVe, pre-train |
| TBA | Fine-tuning pre-trained word embedding | | analogy, sa-rnn, sa-cnn |
| TBA | Attention mechanisms | | attention, seq2seq with attention on mt |
| TBA | Deep language representations with attention | | |
| TBA | Fine-tuning pre-trained BERT | | finetune |
| TBA | Q & A and Closing | | |