- Sung Kim
- Jungwoo Ha (Adjunct Prof)
This course will provide students with a comprehensive understanding of Large Language Models (LLMs) and their practical applications in a production environment, using the LLM Ops methodology. We will explore the latest developments in LLMs and LLM Ops and provide hands-on training in developing, deploying, and evaluating LLMs.
TBA
- Participation: 10%
- In-class test: 10%
- Homework: 30%
- Mid-project proposal: 20%
- Final project: 30%
- HW1 (reading report): read the papers below and submit a one-page summary for each (the core attention formula from the first paper is reproduced after this list for reference)
- Transformer: https://arxiv.org/abs/1706.03762
- InstructGPT (the basis of GPT-3.5): https://arxiv.org/abs/2203.02155
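For quick reference while reading, the central operation of the first paper is scaled dot-product attention:

```latex
% Scaled dot-product attention (Vaswani et al., 2017, arXiv:1706.03762).
% Q, K, V are the query, key, and value matrices; d_k is the key dimension.
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```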
- The TA will spend one hour walking through basic fine-tuning code (see the sketch after the resource list below)
- An online test follows; students must score above 70 on it to enroll in this class
- Simple example: https://github.com/facebookresearch/llama-recipes/blob/main/examples/quickstart.ipynb
- https://artificialcorner.com/mastering-llama-2-a-comprehensive-guide-to-fine-tuning-in-google-colab-bedfcc692b7f
- https://huggingface.co/docs/trl/index
- https://aws.amazon.com/blogs/machine-learning/fine-tune-llama-2-for-text-generation-on-amazon-sagemaker-jumpstart/
- https://github.com/facebookresearch/llama-recipes
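As a companion to the resources above, here is a minimal LoRA fine-tuning sketch using Hugging Face TRL. The model ID, dataset, and hyperparameters are illustrative assumptions rather than course requirements, and `SFTTrainer`'s keyword arguments have moved between TRL versions (newer releases take them via `SFTConfig`):

```python
# A minimal supervised fine-tuning (SFT) sketch with Hugging Face TRL.
# Model, dataset, and hyperparameters below are illustrative choices only.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTTrainer

# Any instruction-tuning dataset with a "text" column works here.
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

# LoRA keeps the base weights frozen and trains small adapter matrices.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",  # gated checkpoint: requires Meta license approval
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",   # column holding the training text
    max_seq_length=512,
)
trainer.train()
trainer.save_model("./llama2-7b-sft-lora")
```

On a free Colab GPU, students would typically also pass a 4-bit quantized model (as in the Colab guide linked above) to fit the 7B weights in memory.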
- Read the paper in advance: A Survey of Large Language Models, https://arxiv.org/abs/2303.18223
Week 9 Mar 29: LLM pre-training, ecosystem, and Sovereign LLM (Jung-Woo Ha, Head of Naver AI, Adjunct Prof at HKUST) @Zoom
Qingming Festival Break Apr 5: HW reading: LLM Evaluations, https://arxiv.org/abs/2307.03109 (a toy evaluation sketch follows below)
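To make the reading concrete, below is a toy sketch of one evaluation style the survey covers: normalized exact-match scoring of model outputs against gold references. The normalization rules and sample data are illustrative assumptions, not taken from the paper:

```python
# A toy exact-match evaluator in the spirit of the benchmark-style
# evaluations surveyed in arXiv:2307.03109.
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())

def exact_match_accuracy(predictions, references) -> float:
    """Fraction of predictions that match their reference after normalization."""
    matches = sum(normalize(p) == normalize(r)
                  for p, r in zip(predictions, references))
    return matches / len(references)

# Hypothetical model outputs vs. gold answers:
preds = ["Paris.", "42", "the Transformer"]
golds = ["paris", "42", "Transformer"]
print(exact_match_accuracy(preds, golds))  # 0.666... ("the" breaks the last pair)
```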
Increasing the model size, dataset size, and amount of compute for training has been shown to steadily improve the performance of Large Language Models (LLMs). However, unlike labs affiliated with companies like Google, which have access to vast computational resources, academic labs face the challenge of finding alternative and more sustainable ways of scaling up LLMs. In this talk, I will describe our journey of pre-training a 7B-parameter model from scratch. I will delve into the technical aspects of our approach, including the architecture of our model, the training dataset, and the optimization techniques employed. Furthermore, I will discuss the computational resources and infrastructure utilized, highlighting the challenges faced and the solutions implemented to overcome them within an academic setting. In addition to the practical experience of training a large-scale LLM, I will also share some of our ongoing investigations into modular design and continual learning as potential avenues for sustainable scale-up.
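As one concrete reference point for the opening scaling claim (not part of the talk abstract itself), the parametric fit from Hoffmann et al. (2022) models pre-training loss as a function of parameter count N and training tokens D:

```latex
% Chinchilla-style scaling fit (Hoffmann et al., 2022); E, A, B, alpha,
% and beta are empirically fitted constants. Added here for reference only.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```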
- Students will learn monitoring and maintenance techniques for LLMs in a production environment and will set up monitoring and alerting mechanisms for their deployed models (a minimal monitoring sketch follows this list).
- Final project presentations to the class, with feedback from instructors and peers.
- Course review and wrap-up.
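As a pointer for the monitoring unit above, the sketch below shows one minimal, framework-agnostic pattern: wrap each LLM call, log latency and output size, and emit a warning when a latency threshold is crossed. The function names and threshold are illustrative assumptions, not course-provided code:

```python
# A minimal monitoring/alerting wrapper around an arbitrary LLM call.
# In production this would feed a metrics backend; here we just use logging.
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm-monitor")

LATENCY_ALERT_SECONDS = 2.0  # hypothetical latency SLO threshold

def monitored_generate(model_call, prompt: str) -> str:
    """Call the model, record latency and output size, alert on slow calls."""
    start = time.perf_counter()
    output = model_call(prompt)
    latency = time.perf_counter() - start
    logger.info("prompt_chars=%d output_chars=%d latency_s=%.3f",
                len(prompt), len(output), latency)
    if latency > LATENCY_ALERT_SECONDS:
        logger.warning("latency %.3fs exceeded %.1fs alert threshold",
                       latency, LATENCY_ALERT_SECONDS)
    return output

# Usage with a stand-in model:
if __name__ == "__main__":
    echo_model = lambda p: p.upper()
    monitored_generate(echo_model, "hello llm ops")
```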
Overall, the course will provide students with a solid understanding of the LLM Ops methodology, along with hands-on experience in developing, deploying, and evaluating LLMs in a production environment. These skills are valuable for careers in natural language processing. The course will be co-taught by Sung Kim and Jungwoo Ha, both of whom have extensive experience in the field and have worked on a variety of LLM-based projects.
We encourage students with a background in natural language processing, machine learning, or data science to enroll in this course. Students should have experience with programming languages such as Python and familiarity with deep learning frameworks such as TensorFlow or PyTorch.