Empowering Future NLP Experts: An Applied Course on Cutting-Edge Techniques with a Focus on Multilinguality and Language Diversity
This repository contains the materials for an applied Natural Language Processing (NLP) course designed for upper-year computer science undergraduate students. The course emphasizes state-of-the-art techniques through self-directed, project-based learning with a focus on multilinguality and language diversity.
The rapid advancement of Large Language Models (LLMs) and the widespread transformation they have brought make it necessary to incorporate these cutting-edge techniques into NLP curricula. This course is designed to empower learners to advance their language communities while preparing them for industry.
- Target Audience: Senior undergraduate students in computer science with prerequisites in linear algebra, calculus, probability, and introductory machine learning. Junior graduate students with an interest in NLP research may also join.
- Focus: Multilinguality and language diversity, ideal for empowering a diverse and multicultural student population.
- Pedagogical Approach: Self-directed learning, hands-on labs, assignments, and a comprehensive final project.
The course is 12 weeks long, with 9 weeks of lectures and 3 weeks of talks by invited speakers working on multilinguality and language diversity.
Week | Lecture Topics | Lab Notebooks | Assignments |
---|---|---|---|
1 | Course Introduction | Python and Regex | |
2 | Corpus Statistics and n-Gram Language Model | N-Gram Language Modelling | |
3 | Entropy Decisions | PyTorch Introduction | A1 |
4 | Machine Learning and Feature Classification | Naive Bayes and Text Classification | |
5 | Neural Language Models | Word Embeddings and Vector Semantics | |
6 | MIDTERM EXAM | RNN | MIDTERM |
SB | STUDY BREAK | | |
8 | Attention and Transformers | PyTorch and Attention | A2 |
9 | Large Language Models | Transformer (Illustrated and Annotated) | |
10 | Multilinguality and Language Diversity | HuggingFace1 | |
11 | Multilinguality and Language Diversity | HuggingFace2 | A3 |
12 | Multilinguality and Language Diversity | Transfer Learning | FINAL |
- A Journey through Language Modelling
  - Introduction to language modelling applied to low-resource languages.
  - Implementation of n-gram models, neural n-gram models, and transformer language models (a minimal n-gram sketch follows below).
  - Open-ended exploration to improve results.
  - This assignment is borrowed from UC Berkeley's graduate Computer Science NLP course (CS 288): Interactive Assignments for Teaching Structured Neural NLP, Project 1: Language Modeling (Gaddy et al., 2021).
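The n-gram portion of this assignment can be prototyped in a few lines. Below is a minimal sketch of a bigram model with add-one smoothing; the toy corpus, tokenization, and function names are illustrative assumptions, not part of the assignment starter code.

```python
# Minimal sketch of a bigram language model with add-one (Laplace) smoothing.
# The toy corpus is illustrative only; the assignment uses low-resource-language data.
from collections import Counter
import math

corpus = ["<s> nlp empowers language communities </s>",
          "<s> language models learn from corpora </s>"]

tokens = [sent.split() for sent in corpus]
unigrams = Counter(tok for sent in tokens for tok in sent)
bigrams = Counter((sent[i], sent[i + 1]) for sent in tokens for i in range(len(sent) - 1))
vocab_size = len(unigrams)

def bigram_logprob(prev, word):
    # P(word | prev) = (count(prev, word) + 1) / (count(prev) + |V|)
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))

def sentence_logprob(sentence):
    words = sentence.split()
    return sum(bigram_logprob(w1, w2) for w1, w2 in zip(words, words[1:]))

print(sentence_logprob("<s> language models learn </s>"))
```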
- Neural Machine Translation with Custom Vocabulary Building & Transformer
  - Implementation of a custom transformer architecture using PyTorch (see the training-step sketch below).
  - Hands-on experience with gradient descent optimization, back-propagation, and loss functions.
  - Evaluation of translation performance using BLEU scores.
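For the transformer portion, the core training step looks roughly as follows. This sketch leans on PyTorch's built-in `nn.Transformer` to keep it short, whereas the assignment asks for a custom implementation; all vocabulary sizes, dimensions, and batch shapes are hypothetical.

```python
# Minimal sketch of one teacher-forced training step for a transformer NMT model.
import torch
import torch.nn as nn

VOCAB_SIZE, D_MODEL, PAD_ID = 8000, 256, 0  # hypothetical values

class TinyNMT(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(VOCAB_SIZE, D_MODEL, padding_idx=PAD_ID)
        self.tgt_emb = nn.Embedding(VOCAB_SIZE, D_MODEL, padding_idx=PAD_ID)
        self.transformer = nn.Transformer(d_model=D_MODEL, nhead=4,
                                           num_encoder_layers=2, num_decoder_layers=2,
                                           batch_first=True)
        self.out = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, src, tgt):
        # Causal mask keeps the decoder from attending to future target tokens.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        hidden = self.transformer(self.src_emb(src), self.tgt_emb(tgt), tgt_mask=tgt_mask)
        return self.out(hidden)

model = TinyNMT()
criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

src = torch.randint(1, VOCAB_SIZE, (8, 12))   # dummy source batch
tgt = torch.randint(1, VOCAB_SIZE, (8, 10))   # dummy target batch
optimizer.zero_grad()
logits = model(src, tgt[:, :-1])              # teacher forcing: predict the next token
loss = criterion(logits.reshape(-1, VOCAB_SIZE), tgt[:, 1:].reshape(-1))
loss.backward()                               # back-propagation through the model
optimizer.step()
```

BLEU is then computed on decoded hypotheses against reference translations, for example with a library such as sacreBLEU.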
- Adapting Languages with Fine-Tuning
  - Fine-tuning existing language models to a low-resource language.
  - Exploration of full parameter fine-tuning, LoRA, and prompt tuning (see the LoRA sketch below).
  - Comprehensive evaluation of adapted models against baseline models.
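As one possible adaptation strategy, LoRA fine-tuning can be set up with the Hugging Face `transformers` and `peft` libraries as sketched below. The base model name, target modules, and hyperparameters are assumptions for illustration, not course requirements.

```python
# Minimal sketch of LoRA fine-tuning setup with Hugging Face Transformers + PEFT.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "bigscience/bloom-560m"  # hypothetical multilingual base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA injects small trainable low-rank matrices into selected projection
# layers while the original pretrained weights stay frozen.
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                         target_modules=["query_key_value"],
                         task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable

# From here, training proceeds as usual (e.g., with transformers.Trainer) on
# text in the target low-resource language.
```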
The learning outcomes of the course are designed to be SMART (specific, measurable, achievable, realistic, timely), following Bloom's taxonomy (Anderson and Krathwohl, 2001):
- Understanding and knowledge of introductory and basic NLP concepts, terminology, tasks, pipeline, and methods/techniques.
- Knowledge and application of advanced techniques in LLM such as fine-tuning and pretraining.
- Ability to reproduce code for the latest state-of-the-art NLP baselines and conduct proof-of-concept projects.
- Practice collaborative software development and debugging of Deep Learning code.
- Creation of experimental results tables and methods to test research hypotheses.
- Implementation of advanced techniques in NLP required for conducting research.
- Lorin W. Anderson and David R. Krathwohl. 2001. A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives. Longman.
- Peter F. Brown, Vincent J. Della Pietra, Peter V. deSouza, Jennifer C. Lai, and Robert L. Mercer. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467–480.
- David Gaddy, Daniel Fried, Nikita Kitaev, Mitchell Stern, Rodolfo Corona, John DeNero, and Dan Klein. 2021. Interactive assignments for teaching structured neural NLP. In Proceedings of the Fifth Workshop on Teaching NLP, pages 104–107.
- Yoav Goldberg. 2016. A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research, 57:345–420.
- And many more...
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
We hope this course empowers you to advance your language community and prepares you for a successful career in NLP!