Empowering Future NLP Experts: An Applied Course on Cutting-Edge Techniques with a Focus on Multilinguality and Language Diversity
This repository contains the materials for an applied Natural Language Processing (NLP) course designed for upper-year computer science undergraduate students. The course emphasizes state-of-the-art techniques through self-directed, project-based learning with a focus on multilinguality and language diversity.
The rapid advancement of Large Language Models (LLMs) and the widespread transformation they have brought make it necessary to incorporate these cutting-edge techniques into NLP curricula. This course is designed to empower learners to advance their language communities while preparing them for industry.
- Target Audience: Senior undergraduate students in computer science with prerequisites in linear algebra, calculus, probability, and introductory machine learning. Junior graduate students with an interest in NLP research may also join.
- Focus: Multilinguality and language diversity, ideal for empowering a diverse and multicultural student population.
- Pedagogical Approach: Self-directed learning, hands-on labs, assignments, and a comprehensive final project.
The course is 12 weeks long, with 9 weeks of lectures and 3 weeks of talks by invited speakers working on multilinguality and language diversity.
Week | Lecture Topics | Lab Notebooks | Assignments |
---|---|---|---|
1 | Course Introduction | Python and Regex | |
2 | Corpus Statistics and n-Gram Language Model | N-Gram Language Modelling | |
3 | Entropy Decisions | PyTorch Introduction | A1 |
4 | Machine Learning and Feature Classification | Naive Bayes and Text Classification | |
5 | Neural Language Models | Word Embeddings and Vector Semantics | |
6 | MIDTERM EXAM | RNN | MIDTERM |
SB | STUDY BREAK | | |
8 | Attention and Transformers | PyTorch and Attention | A2 |
9 | Large Language Models | Transformer (Illustrated and Annotated) | |
10 | Multilinguality and Language Diversity | HuggingFace1 | |
11 | Multilinguality and Language Diversity | HuggingFace2 | A3 |
12 | Multilinguality and Language Diversity | Transfer Learning | FINAL |
- A Journey through Language Modelling
  - Introduction to language modelling applied to low-resource languages.
  - Implementation of n-gram models, neural n-gram models, and transformer language models (a minimal n-gram sketch follows below).
  - Open-ended exploration to improve results.
  - This assignment is borrowed from UC Berkeley's graduate Computer Science NLP course (CS 288): Interactive Assignments for Teaching Structured Neural NLP, Project 1: Language Modeling (Gaddy et al., 2021).
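The n-gram portion of this assignment can be prototyped in a few lines. Below is a minimal sketch of a bigram model with add-one smoothing; the toy corpus, tokenization, and function names are illustrative assumptions, not part of the assignment starter code.

```python
# Minimal sketch of a bigram language model with add-one (Laplace) smoothing.
# The toy corpus is illustrative only; the assignment uses low-resource-language data.
from collections import Counter
import math

corpus = ["<s> nlp empowers language communities </s>",
          "<s> language models learn from corpora </s>"]

tokens = [sent.split() for sent in corpus]
unigrams = Counter(tok for sent in tokens for tok in sent)
bigrams = Counter((sent[i], sent[i + 1]) for sent in tokens for i in range(len(sent) - 1))
vocab_size = len(unigrams)

def bigram_logprob(prev, word):
    # P(word | prev) = (count(prev, word) + 1) / (count(prev) + |V|)
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))

def sentence_logprob(sentence):
    words = sentence.split()
    return sum(bigram_logprob(w1, w2) for w1, w2 in zip(words, words[1:]))

print(sentence_logprob("<s> language models learn </s>"))
```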
- Neural Machine Translation with Custom Vocabulary Building & Transformer
  - Implementation of a custom transformer architecture using PyTorch (see the training-step sketch below).
  - Hands-on experience with gradient descent optimization, back-propagation, and loss functions.
  - Evaluation of translation performance using BLEU scores.
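For the transformer portion, the core training step looks roughly as follows. This sketch leans on PyTorch's built-in `nn.Transformer` to keep it short, whereas the assignment asks for a custom implementation; all vocabulary sizes, dimensions, and batch shapes are hypothetical.

```python
# Minimal sketch of one teacher-forced training step for a transformer NMT model.
import torch
import torch.nn as nn

VOCAB_SIZE, D_MODEL, PAD_ID = 8000, 256, 0  # hypothetical values

class TinyNMT(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(VOCAB_SIZE, D_MODEL, padding_idx=PAD_ID)
        self.tgt_emb = nn.Embedding(VOCAB_SIZE, D_MODEL, padding_idx=PAD_ID)
        self.transformer = nn.Transformer(d_model=D_MODEL, nhead=4,
                                           num_encoder_layers=2, num_decoder_layers=2,
                                           batch_first=True)
        self.out = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, src, tgt):
        # Causal mask keeps the decoder from attending to future target tokens.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        hidden = self.transformer(self.src_emb(src), self.tgt_emb(tgt), tgt_mask=tgt_mask)
        return self.out(hidden)

model = TinyNMT()
criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

src = torch.randint(1, VOCAB_SIZE, (8, 12))   # dummy source batch
tgt = torch.randint(1, VOCAB_SIZE, (8, 10))   # dummy target batch
optimizer.zero_grad()
logits = model(src, tgt[:, :-1])              # teacher forcing: predict the next token
loss = criterion(logits.reshape(-1, VOCAB_SIZE), tgt[:, 1:].reshape(-1))
loss.backward()                               # back-propagation through the model
optimizer.step()
```

BLEU is then computed on decoded hypotheses against reference translations, for example with a library such as sacreBLEU.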
- Adapting Languages with Fine-Tuning
  - Fine-tuning existing language models to a low-resource language.
  - Exploration of full parameter fine-tuning, LoRA, and prompt tuning (see the LoRA sketch below).
  - Comprehensive evaluation of adapted models against baseline models.
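As one possible adaptation strategy, LoRA fine-tuning can be set up with the Hugging Face `transformers` and `peft` libraries as sketched below. The base model name, target modules, and hyperparameters are assumptions for illustration, not course requirements.

```python
# Minimal sketch of LoRA fine-tuning setup with Hugging Face Transformers + PEFT.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "bigscience/bloom-560m"  # hypothetical multilingual base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA injects small trainable low-rank matrices into selected projection
# layers while the original pretrained weights stay frozen.
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                         target_modules=["query_key_value"],
                         task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable

# From here, training proceeds as usual (e.g., with transformers.Trainer) on
# text in the target low-resource language.
```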
The learning outcomes of the course are designed to be SMART (specific, measurable, achievable, realistic, timely), following Bloom's taxonomy (Anderson and Krathwohl, 2001):
- Understanding and knowledge of introductory and basic NLP concepts, terminology, tasks, pipeline, and methods/techniques.
- Knowledge and application of advanced techniques in LLM such as fine-tuning and pretraining.
- Ability to reproduce code for the latest state-of-the-art NLP baselines and conduct proof-of-concept projects.
- Practice collaborative software development and debugging of Deep Learning code.
- Creation of experimental results tables and methods to test research hypotheses.
- Implementation of advanced techniques in NLP required for conducting research.
- Lorin W. Anderson and David R. Krathwohl. 2001. A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives. Longman.
- Peter F. Brown, Vincent J. Della Pietra, Peter V. deSouza, Jennifer C. Lai, and Robert L. Mercer. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467–480.
- David Gaddy, Daniel Fried, Nikita Kitaev, Mitchell Stern, Rodolfo Corona, John DeNero, and Dan Klein. 2021. Interactive assignments for teaching structured neural NLP. In Proceedings of the Fifth Workshop on Teaching NLP, pages 104–107.
- Yoav Goldberg. 2016. A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research, 57:345–420.
- And many more...
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
We hope this course empowers you to advance your language community and prepares you for a successful career in NLP!