
Empowering Future NLP Experts: An Applied Course on Cutting-Edge Techniques with a Focus on Multilinguality and Language Diversity

This repository contains the materials for an applied Natural Language Processing (NLP) course designed for upper-year computer science undergraduate students. The course emphasizes state-of-the-art techniques through self-directed, project-based learning with a focus on multilinguality and language diversity.

Course Overview

The rapid advancement and widespread adoption of Large Language Models (LLMs) have made it necessary to incorporate these cutting-edge techniques into NLP curricula. This course is designed to empower learners to advance their language communities while preparing them for industry.

Key Highlights

  • Target Audience: Senior undergraduate students in computer science, with prerequisites of linear algebra, calculus, probability, and introductory machine learning. Junior graduate students with an interest in NLP research may also join.
  • Focus: Multilinguality and language diversity, ideal for empowering a diverse and multicultural student population.
  • Pedagogical Approach: Self-directed learning, hands-on labs, assignments, and a comprehensive final project.

Course Structure and Content

The course runs 12 weeks: 9 weeks of lectures and 3 weeks of talks by invited speakers working in multilinguality and language diversity.

| Week | Lecture Topics | Lab Notebooks | Assignments |
|------|----------------|---------------|-------------|
| 1 | Course Introduction | Python and Regex | |
| 2 | Corpus Statistics and n-Gram Language Model | N-Gram Language Modelling | |
| 3 | Entropy Decisions | PyTorch Introduction | A1 |
| 4 | Machine Learning and Feature Classification | Naive Bayes and Text Classification | |
| 5 | Neural Language Models | Word Embeddings and Vector Semantics | |
| 6 | MIDTERM EXAM | RNN | MIDTERM |
| SB | STUDY BREAK | | |
| 8 | Attention and Transformers | PyTorch and Attention | A2 |
| 9 | Large Language Models | Transformer (Illustrated and Annotated) | |
| 10 | Multilinguality and Language Diversity | HuggingFace1 | |
| 11 | Multilinguality and Language Diversity | HuggingFace2 | A3 |
| 12 | Multilinguality and Language Diversity | Transfer Learning | FINAL |

Assignments

  1. A Journey through Language Modelling

    • Introduction to language modelling applied to low-resource languages.
    • Implementation of n-gram models, neural n-gram models, and transformer language models (a minimal n-gram sketch appears after this list).
    • Open-ended exploration to improve results.
    • This assignment is borrowed from UC Berkeley’s graduate NLP course (CS 288), Project 1: Language Modeling, from Interactive Assignments for Teaching Structured Neural NLP (Gaddy et al., 2021).
  2. Neural Machine Translation with Custom Vocabulary Building & Transformer

    • Implementation of a custom transformer architecture using PyTorch (see the attention sketch after this list).
    • Hands-on experience with gradient descent optimization, back-propagation, and loss functions.
    • Evaluation of translation performance using BLEU scores.
  3. Adapting Languages with Fine-Tuning

    • Fine-tuning existing language models to a low-resource language.
    • Exploration of full-parameter fine-tuning, LoRA, and prompt tuning (a LoRA setup sketch follows this list).
    • Comprehensive evaluation of adapted models against baseline models.
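
To give a flavour of Assignment 1, here is a minimal bigram language model with add-one smoothing and a perplexity computation. This is an illustrative sketch only; the class name and toy corpus are made up here and are not taken from the course materials, which go much further (neural n-gram and transformer models).

```python
import math
from collections import Counter

class BigramLM:
    """Minimal bigram language model with add-one (Laplace) smoothing.

    Illustrative sketch only; the assignment also covers neural n-gram
    and transformer language models.
    """

    def __init__(self, corpus):
        # corpus: list of token lists, e.g. [["the", "cat", "sat"], ...]
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sent in corpus:
            tokens = ["<s>"] + sent + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def prob(self, prev, word):
        # P(word | prev) with add-one smoothing over the vocabulary
        return (self.bigrams[(prev, word)] + 1) / (
            self.unigrams[prev] + self.vocab_size
        )

    def perplexity(self, sent):
        # exp of the negative average log-probability per bigram
        tokens = ["<s>"] + sent + ["</s>"]
        log_prob = sum(
            math.log(self.prob(p, w)) for p, w in zip(tokens, tokens[1:])
        )
        return math.exp(-log_prob / (len(tokens) - 1))

lm = BigramLM([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(lm.prob("the", "cat"))            # smoothed bigram probability
print(lm.perplexity(["the", "cat", "sat"]))
```

Perplexity on held-out text is the usual intrinsic metric for comparing such models, which makes the open-ended "improve the results" part of the assignment measurable.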
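For Assignment 2, the heart of any custom transformer implementation is scaled dot-product attention. The PyTorch sketch below shows that single building block; the class and tensor shapes are illustrative assumptions, not the assignment's actual scaffold, which adds multi-head attention, positional encodings, masking, and a full encoder-decoder stack.

```python
import math
import torch
import torch.nn as nn

class ScaledDotProductAttention(nn.Module):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""

    def forward(self, q, k, v, mask=None):
        # q, k, v: (batch, seq_len, d_model)
        d_k = q.size(-1)
        scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
        if mask is not None:
            # block illegal positions (e.g. padding, future tokens)
            scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        return weights @ v

attn = ScaledDotProductAttention()
q = k = v = torch.randn(2, 5, 64)   # batch of 2, length 5, d_model 64
out = attn(q, k, v)
print(out.shape)                     # torch.Size([2, 5, 64])
```

For the evaluation side, BLEU is typically computed with an existing implementation such as sacrebleu or NLTK's sentence_bleu rather than written from scratch.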
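For Assignment 3, a common way to explore LoRA is through the Hugging Face peft library. The snippet below is a minimal setup sketch under that assumption; the base model, hyperparameters, and target module names are placeholders, not the course's prescribed configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "bigscience/bloom-560m"  # placeholder multilingual base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA injects small trainable low-rank matrices into selected
# projections while the original weights stay frozen.
config = LoraConfig(
    r=8,                                  # rank of the update matrices
    lora_alpha=16,                        # scaling factor
    target_modules=["query_key_value"],   # module names vary by model
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # tiny fraction of total parameters
# ...train with the usual transformers Trainer or a custom loop...
```

peft also provides a PromptTuningConfig for the prompt-tuning variant, so the three adaptation approaches can share one training loop and be compared against a common baseline.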

Learning Outcomes

The learning outcomes of the course are designed to be SMART (specific, measurable, achievable, realistic, timely) and follow Bloom's taxonomy (Anderson and Krathwohl, 2001):

  1. Understanding of introductory NLP concepts, terminology, tasks, pipelines, and methods.
  2. Knowledge and application of advanced LLM techniques such as fine-tuning and pretraining.
  3. Ability to reproduce code for the latest state-of-the-art NLP baselines and conduct proof-of-concept projects.
  4. Practice in collaborative software development and debugging of deep learning code.
  5. Creation of experimental results tables and methods to test research hypotheses.
  6. Implementation of advanced techniques in NLP required for conducting research.

References

  • Lorin W. Anderson and David R. Krathwohl. 2001. A taxonomy for learning, teaching, and assessing: A revision of Bloom’s taxonomy of educational objectives. Longman.
  • Peter F Brown, Vincent J Della Pietra, Peter V Desouza, Jennifer C Lai, and Robert L Mercer. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467–480.
  • David Gaddy, Daniel Fried, Nikita Kitaev, Mitchell Stern, Rodolfo Corona, John DeNero, and Dan Klein. 2021. Interactive assignments for teaching structured neural NLP. In Proceedings of the Fifth Workshop on Teaching NLP, pages 104–107.
  • Yoav Goldberg. 2016. A primer on neural network models for natural language processing. Journal of Artificial Intelligence Research, 57:345–420.
  • And many more...

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.


We hope this course empowers you to advance your language community and prepares you for a successful career in NLP!
