# Group Project / Assignment 4: Instruction finetuning a Llama-3.2 model
**Assignment due 21 April 11:59pm**

Welcome to the fourth and final assignment for 50.055 Machine Learning Operations. The third and fourth assignments together form the course group project. You will continue the work on a chatbot which can answer questions about SUTD to prospective students.


**This assignment is a group assignment.**

- Read the instructions in this notebook carefully
- Add your solution code and answers in the appropriate places. The questions are marked as **QUESTION:**, the places where you need to add your code and text answers are marked as **ADD YOUR SOLUTION HERE**. The assignment is more open-ended than previous assignments, i.e. you have more freedom in how to solve the problem and how to structure your code.
- The completed notebook, including your added code and generated output will be your submission for the assignment.
- The notebook should execute without errors from start to finish when you select "Restart Kernel and Run All Cells..". Please test this before submission.
- Use the SUTD Education Cluster to solve and test the assignment. If you work in another environment, at least test your work on the SUTD Education Cluster.

**Rubric for assessment** 

Your submission will be graded using the following criteria. 
1. Code executes: your code should execute without errors. The SUTD Education cluster should be used to ensure the same execution environment.
2. Correctness: the code should produce the correct result or the text answer should state the factually correct answer.
3. Style: your code should be written in a way that is clean and efficient. Your text answers should be relevant, concise and easy to understand.
4. Partial marks will be awarded for partially correct solutions.
5. Creativity and innovation: in this assignment you have more freedom to design your solution, compared to the first assignments. You can show off your creativity and innovative mindset.
6. There is a maximum of 310 points for this assignment.

**ChatGPT policy** 

If you use AI tools, such as ChatGPT, to solve the assignment questions, you need to be transparent about its use and mark AI-generated content as such. In particular, you should include the following in addition to your final answer:
- A copy or screenshot of the prompt you used
- The name of the AI model
- The AI generated output
- An explanation of why the answer is correct or what you had to change to arrive at the correct answer

**Assignment Notes:** Please make sure to save the notebook as you go along. Submission Instructions are located at the bottom of the notebook.



### Finetuning LLMs

The goal of the assignment is to build a more advanced chatbot that can talk to prospective students and answer questions about SUTD.

We will finetune a smaller 1B LLM on question-answer pairs which we synthetically generate. Then we will compare the finetuned and non-finetuned LLMs with and without RAG to see if we were able to improve the SUTD chatbot answer quality. 

We'll be leveraging `langchain`, `llama 3.2` and `Google AI Studio with Gemini 2.0`.

Check out the docs:
- [LangChain](https://docs.langchain.com/docs/)
- [Llama 3.2](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2/)
- [Google AI Studio](https://aistudio.google.com/)

Note: Google AI Studio provides a lot of free tokens but has certain rate limits. Write your code in a way that it can handle these limits.
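
One common way to cope with rate limits is to retry a failed call with exponential backoff. The block below is a minimal sketch of that idea in plain Python. The helper name `call_with_backoff` and all of its parameters and defaults are assumptions made for illustration; they are not part of the Google AI Studio or LangChain APIs.

```python
import random
import time


def call_with_backoff(fn, max_retries=5, base_delay=2.0, max_delay=60.0, sleep=time.sleep):
    """Call fn() and retry with exponential backoff whenever it raises.

    Hypothetical helper: the name, parameters and defaults are illustrative only.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error to the caller
            # The delay doubles with every attempt, is capped at max_delay,
            # and gets a little random jitter so parallel callers do not sync up.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, 0.5))
```

Any zero-argument callable can be wrapped this way, for example `call_with_backoff(lambda: some_chain.invoke(inputs))`, where `some_chain` and `inputs` stand for whatever LLM call you are making. Catching every `Exception` keeps the sketch short; a real solution would probably retry only on rate-limit errors.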

# Install dependencies
Use pip to install all required dependencies of this assignment in the cell below. Make sure to test this on the SUTD cluster as different environments have different software pre-installed.  

In [66]:
# QUESTION: Install and import all required packages
# The rest of your code should execute without any import or dependency errors.

# **--- ADD YOUR SOLUTION HERE (10 points) ---**
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install transformers
!pip install datasets
!pip install accelerate
!pip install huggingface_hub
!pip install langchain langchain-core langchain-community langchain-experimental langchain-google-genai
!pip install google-generativeai
!pip install flashrank
!pip install openai
!pip install sentence-transformers
# the package that provides `from dotenv import load_dotenv` is published as python-dotenv
!pip install python-dotenv
!pip install pydantic
# pandas and scikit-learn are imported below, so install them explicitly
!pip install pandas scikit-learn
!pip install unsloth
!pip install peft
!pip install trl
!pip install bitsandbytes

Looking in indexes: https://download.pytorch.org/whl/cu118
Collecting google-ai-generativelanguage<0.7.0,>=0.6.16 (from langchain-google-genai)
  Using cached google_ai_generativelanguage-0.6.17-py3-none-any.whl.metadata (9.8 kB)
Using cached google_ai_generativelanguage-0.6.17-py3-none-any.whl (1.4 MB)
Installing collected packages: google-ai-generativelanguage
  Attempting uninstall: google-ai-generativelanguage
    Found existing installation: google-ai-generativelanguage 0.6.15
    Uninstalling google-ai-generativelanguage-0.6.15:
      Successfully uninstalled google-ai-generativelanguage-0.6.15
Successfully installed google-ai-generativelanguage-0.6.17


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-generativeai 0.8.4 requires google-ai-generativelanguage==0.6.15, but you have google-ai-generativelanguage 0.6.17 which is incompatible.


Collecting google-ai-generativelanguage==0.6.15 (from google-generativeai)
  Using cached google_ai_generativelanguage-0.6.15-py3-none-any.whl.metadata (5.7 kB)
Using cached google_ai_generativelanguage-0.6.15-py3-none-any.whl (1.3 MB)
Installing collected packages: google-ai-generativelanguage
  Attempting uninstall: google-ai-generativelanguage
    Found existing installation: google-ai-generativelanguage 0.6.17
    Uninstalling google-ai-generativelanguage-0.6.17:
      Successfully uninstalled google-ai-generativelanguage-0.6.17
Successfully installed google-ai-generativelanguage-0.6.15


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain-google-genai 2.1.2 requires google-ai-generativelanguage<0.7.0,>=0.6.16, but you have google-ai-generativelanguage 0.6.15 which is incompatible.


Collecting unsloth
  Using cached unsloth-2025.3.19-py3-none-any.whl.metadata (46 kB)
Collecting unsloth_zoo>=2025.3.17 (from unsloth)
  Using cached unsloth_zoo-2025.3.17-py3-none-any.whl.metadata (8.0 kB)
Collecting xformers>=0.0.27.post2 (from unsloth)
  Using cached xformers-0.0.29.post3-cp311-cp311-win_amd64.whl.metadata (1.0 kB)
Collecting bitsandbytes (from unsloth)
  Downloading bitsandbytes-0.45.5-py3-none-win_amd64.whl.metadata (5.1 kB)
Collecting triton-windows (from unsloth)
  Using cached triton_windows-3.2.0.post17-cp311-cp311-win_amd64.whl.metadata (1.5 kB)
Collecting tyro (from unsloth)
  Using cached tyro-0.9.18-py3-none-any.whl.metadata (9.2 kB)
Collecting sentencepiece>=0.2.0 (from unsloth)
  Using cached sentencepiece-0.2.0-cp311-cp311-win_amd64.whl.metadata (8.3 kB)
Collecting wheel>=0.42.0 (from unsloth)
  Using cached wheel-0.45.1-py3-none-any.whl.metadata (2.3 kB)
Collecting trl!=0.15.0,!=0.9.0,!=0.9.1,!=0.9.2,!=0.9.3,<=0.15.2,>=0.7.9 (from unsloth)
  Using cach

  You can safely remove it manually.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
grpcio-status 1.71.0 requires protobuf<6.0dev,>=5.26.1, but you have protobuf 3.20.3 which is incompatible.
langchain-google-genai 2.1.2 requires google-ai-generativelanguage<0.7.0,>=0.6.16, but you have google-ai-generativelanguage 0.6.15 which is incompatible.




# Importing libraries

In [68]:
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI
# TODO: check the difference between ChatGoogleGenerativeAI and GoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate
# take BaseModel and Field from the same pydantic version; previously Field came from
# langchain_core.pydantic_v1 while BaseModel was overridden by a later pydantic import
from pydantic import BaseModel, Field
import pandas as pd
from sklearn.model_selection import train_test_split
from datasets import Dataset, DatasetDict
from datasets import load_dataset
from huggingface_hub import login
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel
from typing import List
import os
import time
import json
import torch

load_dotenv()
GOOGLE_GENAI_API_KEY = os.getenv("GOOGLE_GENAI_API_KEY")
HUGGINGFACE_TOKEN = os.getenv("HUGGINGFACE_TOKEN")

login(token=HUGGINGFACE_TOKEN)

# Generate training data
The first step of the assignment is generating synthetic question-answer pairs which can be used for finetuning an LLM model. 
Use Google AI Studio with the Gemini models to create high-quality QA training data.
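
Models often wrap JSON in a Markdown code fence or add a sentence before it, so parsing the raw text directly can fail. The sketch below shows one way to tolerate that in plain Python. The helper name `extract_json` is hypothetical and the fence handling is deliberately simple; it is meant to illustrate the parsing problem, not to replace a library parser.

```python
import json
import re

# Build the three-backtick marker programmatically to keep this example readable.
FENCE = "`" * 3
# Matches an optionally "json"-tagged fenced block and captures its contents.
_FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def extract_json(text):
    """Parse JSON from model output, tolerating an optional Markdown code fence.

    Hypothetical helper for illustration. Raises json.JSONDecodeError
    (a ValueError) when no valid JSON can be parsed.
    """
    match = _FENCED_BLOCK.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())
```

Because invalid output raises an exception instead of returning a partial result, the caller can decide whether to retry the request or skip the item.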


In [2]:
# QUESTION: Use langchain and the Google AI Studio APIs and a model from the Gemini 2.0 family
# to create a text-generation chain that can produce and parse JSON output.
# Test it by having the LLM generate a JSON array of 3 fruits

#--- ADD YOUR SOLUTION HERE (20 points)---

# --------------------------------------------------------------------------------

# we are creating synthetic data that contains only questions and responses but not on multiple tasks, because this is a chatbot application
model = ChatGoogleGenerativeAI(google_api_key=GOOGLE_GENAI_API_KEY, model="gemini-2.0-flash", temperature=0.2, convert_system_message_to_human=True)
parser = JsonOutputParser()
input_prompt = "Generate a JSON array containing exactly 3 fruit names."
prompt = ChatPromptTemplate.from_messages([
    ("system", """
    You are a helpful assistant that generates structured JSON data.
    Your response should ONLY contain valid JSON without any additional text.
    Do not include explanation, notes, or markdown formatting.
    """),
    ("human", "{input_text}")
])
chain = ( prompt | model | parser )
response = chain.invoke({"input_text": input_prompt})
print(response)

# --------------------------------------------------------------------------------

# An alternative is to describe the expected output with a pydantic schema
class FruitList(BaseModel):
    fruits: List[str] = Field(description="A list of fruit names", min_length=3, max_length=3)
    
pydantic_parser = JsonOutputParser(pydantic_object=FruitList)
prompt = PromptTemplate(
    template  = "Answer the user's question.\n\n{format_instructions}\n\n{input}",
    input_variables = ["input"],
    # use the schema-aware parser here so the FruitList schema is included in the prompt
    partial_variables={"format_instructions": pydantic_parser.get_format_instructions()},
)
   
chain = ( prompt | model | pydantic_parser )
response = chain.invoke({"input": input_prompt})
print(response)



['apple', 'banana', 'orange']




['apple', 'banana', 'orange']


## Generate topics
When generating data, it is often helpful to guide the generation process through some hierarchical structure. 
Before we create question-answer pairs, let's generate some topics which the questions should be about.
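
The hierarchy can be pictured as topics at the top level and questions underneath each topic, which are then flattened into one record per question. Below is a small offline sketch of that flattening step. The function `build_question_records` and the stub generator `fake_questions` are hypothetical names used only for illustration; in the notebook the questions come from an LLM call.

```python
def build_question_records(topics, questions_for_topic):
    """Expand a topic -> questions hierarchy into flat {"topic", "question"} records.

    questions_for_topic is any callable that returns a list of question
    strings for a topic (hypothetical interface for this sketch).
    """
    records = []
    seen = set()
    for topic in topics:
        for question in questions_for_topic(topic):
            cleaned = question.strip()
            key = cleaned.lower()
            if key in seen:  # skip questions that were already generated for another topic
                continue
            seen.add(key)
            records.append({"topic": topic, "question": cleaned})
    return records


def fake_questions(topic):
    # Stand-in for an LLM call so the example runs without network access.
    return [f"What should I know about {topic}?", f"How does SUTD handle {topic}?"]


records = build_question_records(["Tuition Fees", "Campus Life"], fake_questions)
print(len(records))  # 4
```

Keeping the topic next to each question makes it easy to check afterwards that every topic is covered evenly.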



In [49]:
# QUESTION: Create a function 'generate_topics' which generates topics which prospective students might care about.
#
# Generate a list of 20 topics 

#--- ADD YOUR SOLUTION HERE (20 points)---
def generate_topics(num = 20):
    class TopicsList(BaseModel):
        topics: List[str] = Field(description=f"A list of {num} topics that prospective students might care about", min_length=num, max_length=num)
    
    # the topics are written to data/topics.json so that later cells can reload them
    output_dir = "data"
    os.makedirs(output_dir, exist_ok=True)
        
    filename = f"{output_dir}/topics.json"
            
    model = ChatGoogleGenerativeAI(
        google_api_key=GOOGLE_GENAI_API_KEY,
        model="gemini-2.0-flash",
        temperature=0.8, # using this value for more diverse topics
        convert_system_message_to_human=True # merges any leading system message into the following human message
    )    
    
    pydantic_parser = JsonOutputParser(pydantic_object=TopicsList)
    
    prompt = PromptTemplate(
        template=(
            "List out topics a prospective student might be interested in when "
            "choosing a university. "
            "Think of real concerns such as academic quality, campus life, tuition, "
            "social environment, and career opportunities.\n\n"
            "{format_instructions}\n\n"
            "{input}"
        ),
        input_variables=["input"],
        partial_variables={"format_instructions": pydantic_parser.get_format_instructions()},
    )
    input_prompt = (
        f"Please list exactly {num} topics that capture what prospective students care about "
        "when choosing a university like SUTD."
    )
        
    # use the parser defined in this function rather than the global one from the earlier cell
    chain = prompt | model | pydantic_parser
    response = chain.invoke({"input": input_prompt})

    topics = response["topics"]
    
    # save to disk
    with open(filename, 'w') as file:
        json.dump(topics, file)
    
    return topics

In [50]:
# test topic generation
print(generate_topics(3))



['Academic Rigor and Curriculum', 'Career Prospects and Industry Connections', 'Campus Culture and Student Life']


In [51]:
# Generate a list of 20 topics 
# We save a copy to disk and reload it from there if the file exists

# the function generates the topics and saves them to disk as data/topics.json
print(generate_topics(20))



['Academic Reputation and Ranking', 'Specific Program Strengths (e.g., Engineering, Design)', 'Faculty Expertise and Research Opportunities', 'Hands-on Learning and Project-Based Curriculum', 'Internship Opportunities and Industry Connections', 'Career Services and Graduate Employment Rate', 'Tuition Fees and Financial Aid Options', 'Scholarship Availability and Eligibility Criteria', 'Cost of Living in Singapore', 'Accommodation Options (on-campus and off-campus)', 'Campus Facilities and Resources (labs, library, maker spaces)', 'Student-Faculty Ratio and Class Sizes', 'Student Support Services (academic advising, counseling)', 'Campus Culture and Student Life', 'Diversity and Inclusion Initiatives', 'Extracurricular Activities and Clubs', 'Location and Accessibility of the University', 'Opportunities for International Exchange Programs', 'Networking Opportunities with Alumni', 'Safety and Security on Campus']


## Generate questions
Now generate a set of questions about each topic

In [52]:
# QUESTION: Create a function 'generate_questions' which generates questions about a given topic. 
# Generate a list of 10 questions per topic. In total you should have 200 questions. 
#

#--- ADD YOUR SOLUTION HERE (20 points)---
# TODO: remember to add a timeout to the function
def generate_questions(topic, num=10):
    output_dir = "data"
    os.makedirs(output_dir, exist_ok=True)

    model = ChatGoogleGenerativeAI(
        google_api_key=GOOGLE_GENAI_API_KEY,
        model="gemini-2.0-flash",
        temperature=0.4, # hyperparameter can be changed
        convert_system_message_to_human=True
    )    
    
    class Questions(BaseModel):
        questions: List[str] = Field(
            description=f"A list of {num} questions about a specific topic", 
            min_length=num, 
            max_length=num
        )
    
    parser = JsonOutputParser(pydantic_object=Questions)
    
    # keep the prompt free of university-specific details to avoid confusing the model
    prompt = PromptTemplate(
        template=(
            "Imagine you are a prospective university student wanting to know more about a specific aspect. "
            "For the topic provided, generate exactly {num_questions} questions that you might naturally ask. "
            "Topic: {topic}\n\n"
            "{format_instructions}"
        ),
        input_variables=["topic", "num_questions"],
        partial_variables={"format_instructions": parser.get_format_instructions()},
    )
    chain = prompt | model | parser   
    
    questions_data = {}
    
    try:
        print(f"Generating questions for topic: {topic}")
        response = chain.invoke({"topic": topic, "num_questions": num})
        questions_data[topic] = response["questions"]
        
    except Exception as e:
        print(f"Error generating questions for topic '{topic}': {e}")

    return questions_data

In [53]:
# test it
print(generate_questions("Academic Reputation and Program Quality", 3))



Generating questions for topic: Academic Reputation and Program Quality
{'Academic Reputation and Program Quality': ['What specific accreditations does the [Program Name] program hold, and how do these accreditations benefit students?', 'Could you provide data on graduate employment rates and the types of positions graduates typically obtain after completing the [Program Name] program?', 'How does the university ensure the curriculum for the [Program Name] program remains current and relevant to industry trends and advancements?']}


In [56]:
# QUESTION: Now let's put it together and generate 10 questions for each topic. Save the questions in a local file.

#--- ADD YOUR SOLUTION HERE (20 points)---
with open("data/topics.json", "r") as f:
    topics = json.load(f)

rate_limit = 15
requests_made = 0
all_questions = {}

for topic in topics:
    if requests_made >= rate_limit:
        time.sleep(60)
        requests_made = 0
    response = generate_questions(topic)
    requests_made += 1
    # generate_questions returns an empty dict when the call fails, so skip missing topics
    if topic in response:
        all_questions[topic] = response[topic]

with open("data/questions.json", "w") as f:
    json.dump(all_questions, f, indent=2)



Generating questions for topic: Academic Reputation and Ranking




Generating questions for topic: Specific Program Strengths (e.g., Engineering, Design)




Generating questions for topic: Faculty Expertise and Research Opportunities




Generating questions for topic: Hands-on Learning and Project-Based Curriculum




Generating questions for topic: Internship Opportunities and Industry Connections




Generating questions for topic: Career Services and Graduate Employment Rate




Generating questions for topic: Tuition Fees and Financial Aid Options




Generating questions for topic: Scholarship Availability and Eligibility Criteria




Generating questions for topic: Cost of Living in Singapore




Generating questions for topic: Accommodation Options (on-campus and off-campus)




Generating questions for topic: Campus Facilities and Resources (labs, library, maker spaces)




Generating questions for topic: Student-Faculty Ratio and Class Sizes




Generating questions for topic: Student Support Services (academic advising, counseling)




Generating questions for topic: Campus Culture and Student Life




Generating questions for topic: Diversity and Inclusion Initiatives




Generating questions for topic: Extracurricular Activities and Clubs




Generating questions for topic: Location and Accessibility of the University




Generating questions for topic: Opportunities for International Exchange Programs




Generating questions for topic: Networking Opportunities with Alumni




Generating questions for topic: Safety and Security on Campus


## Generate Answers

Now create answers for the questions. 

You can use the Google AI Studio Gemini model (assuming that they are good enough to generate good answers), your RAG system from assignment 3 or any other method you choose to generate answers for your question dataset.

Note: it is normal that some LLM calls fail, even with retry, so you may end up with fewer than 200 QA pairs, but you should have at least 160 QA pairs.
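
Since some calls can fail, it helps to drop incomplete rows and check the remaining count before building the dataset. The following is a minimal sketch of such a check in plain Python. The function name `keep_complete_pairs` and the row layout (dicts with `question` and `answer` keys) are assumptions for illustration.

```python
def keep_complete_pairs(rows, min_pairs=160):
    """Drop rows without a usable answer and check that enough QA pairs remain.

    Hypothetical helper: rows is a list of dicts with "question" and "answer" keys.
    """
    complete = [
        row for row in rows
        if isinstance(row.get("answer"), str) and row["answer"].strip()
    ]
    if len(complete) < min_pairs:
        raise ValueError(f"only {len(complete)} QA pairs, need at least {min_pairs}")
    return complete
```

Raising an error instead of silently continuing makes a shortfall visible early, before any finetuning time is spent on a dataset that is too small.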

In [57]:
# QUESTION: Generate answers to all your questions using Gemini, your SUTD RAG system or any other method.
# Split your dataset into 80% training and 20% test dataset.
# Store all question and answer pairs in a huggingface dataset `sutd_qa_dataset` and push it to your Huggingface hub. 

#--- ADD YOUR SOLUTION HERE (40 points)---

# generate answers to all the questions using Gemini and then split the dataset into 80% training and 20% test datasets.
def generate_answer(question):
    class Answer(BaseModel):
        answer: str = Field(description="The answer to the question")
        
    parser = JsonOutputParser(pydantic_object=Answer)
    prompt = PromptTemplate(
        template=(
            "Answer the following question in a friendly, clear, and brief manner, as though you "
            "are advising a prospective student. Use simple language and get straight to the point.\n\n"
            "{format_instructions}\n\n"
            "{question}"
        ),
        input_variables=["question"],
        partial_variables={"format_instructions": parser.get_format_instructions()},
    )
    
    model = ChatGoogleGenerativeAI(
        google_api_key=GOOGLE_GENAI_API_KEY, 
        model="gemini-2.0-flash",
        temperature=0.3, 
        convert_system_message_to_human=True
    )
    chain = prompt | model | parser
    return chain.invoke({"question": question})

In [59]:
# now split the dataset into 80% training and 20% test dataset

# load the questions from the json
with open("data/questions.json", "r") as f:
    questions_data = json.load(f)

# flattening the questions data with topic as the key
flat_questions = []
for topic, questions in questions_data.items():
    for question in questions:
        flat_questions.append({"topic": topic, "question": question})

questions_df = pd.DataFrame(flat_questions)

# splits
train_df, test_df = train_test_split(questions_df, test_size=0.2, random_state=42)

print(f"Training size: {len(train_df)}")
print(f"Test size: {len(test_df)}")

# save
train_df.to_csv("data/train.csv", index=False)
test_df.to_csv("data/test.csv", index=False)

Training size: 160
Test size: 40
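
To make the 80/20 arithmetic concrete, here is a plain-Python sketch of a reproducible shuffle-and-split. It only illustrates the idea and does not reproduce the exact shuffling of `train_test_split`; the function name `split_train_test` and its parameters are assumptions for illustration.

```python
import random


def split_train_test(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of items reproducibly and split it into (train, test)."""
    shuffled = list(items)
    # A dedicated Random instance keeps the split stable without touching global state.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


questions = [f"question {i}" for i in range(200)]
train, test = split_train_test(questions)
print(len(train), len(test))  # 160 40
```

Fixing the seed means the same questions land in the test set on every run, which keeps later comparisons between models fair.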


In [60]:
# test the chain
question = "When was SUTD founded?"

# Now run the answer generation chain
response = generate_answer(question)
print("\nModel Response:")
print(response["answer"])




Model Response:
SUTD was founded in 2009.


In [61]:
# now run the chain for all questions to generate answers

def process_dataset(csv_file, output_file):
    # resume from a previous run if the output file already exists (an earlier run crashed)
    try:
        df = pd.read_csv(output_file)
        print(f"Resuming from existing file: {output_file}")
    except FileNotFoundError:
        df = pd.read_csv(csv_file)

    # make sure there is an answer column and that it can hold strings
    if 'answer' not in df.columns:
        df['answer'] = None
    df['answer'] = df['answer'].astype(object)
   
    requests_made = 0
    rate_limit = 12  # reduced from 15 to 12 to be safer
    total = len(df)
    start_time = time.time()
    
    # number of rows that have an answer so far
    processed_rows = 0
    for index, row in df.iterrows():
        # skip rows that already have answers
        if pd.notna(row['answer']):
            processed_rows += 1
            continue
            
        question = row['question']
       
        # rate limit: at most `rate_limit` requests per 60-second window
        if requests_made >= rate_limit:
            remaining_time = 60 - (time.time() - start_time)
            # if the limit is reached before the 60-second window has elapsed, sleep for the remaining time
            if remaining_time > 0:
                print(f"Rate limit reached. Sleeping for {remaining_time:.2f} seconds...")
                time.sleep(remaining_time)
            # reset counter and time window
            requests_made = 0
            start_time = time.time()
       
        # generate answer
        try:
            answer = generate_answer(question=question)
            df.at[index, 'answer'] = answer["answer"]
            
            requests_made += 1
            processed_rows += 1
            print(f"Processing {csv_file}: {processed_rows}/{total}")

            # save progress every 5 answered rows
            if processed_rows % 5 == 0:
                df.to_csv(output_file, index=False)
            
        except Exception as e:
            print(f"Error processing question: {question}")
            print(f"Error: {e}")
            # save progress asap
            df.to_csv(output_file, index=False)
            
            # if quota exceeded, wait longer before moving on
            if "429" in str(e) or "exceeded" in str(e).lower() or "ResourceExhausted" in str(e):
                wait_time = 120  # 2min wait time 
                print(f"API quota exceeded. Waiting for {wait_time} seconds before continuing...")
                time.sleep(wait_time)
                requests_made = 0  # reset
                start_time = time.time() 
            # the failed row keeps an empty answer and is picked up again on the next run
            continue
   
    # save the complete dataset
    df.to_csv(output_file, index=False)
    print(f"Finished writing {output_file}")
    return df

# answers are written back into the same CSV files, so rerunning this cell resumes where it stopped
print("Generating answers for the training dataset...")
train_with_answers = process_dataset("data/train.csv", "data/train.csv")  

print("Generating answers for the testing dataset...")
test_with_answers = process_dataset("data/test.csv", "data/test.csv") 

Generating answers for the training dataset...
Resuming from existing file: data/train.csv




Processing data/train.csv: 1/160




Processing data/train.csv: 2/160




Processing data/train.csv: 3/160




Processing data/train.csv: 4/160




Processing data/train.csv: 5/160




Processing data/train.csv: 6/160




Processing data/train.csv: 7/160




Processing data/train.csv: 8/160




Processing data/train.csv: 9/160




Processing data/train.csv: 10/160




Processing data/train.csv: 11/160




Processing data/train.csv: 12/160
Rate limit reached. Sleeping for 49.00 seconds...




Processing data/train.csv: 13/160




Processing data/train.csv: 14/160




Processing data/train.csv: 15/160




Processing data/train.csv: 16/160




Processing data/train.csv: 17/160




Processing data/train.csv: 18/160




Processing data/train.csv: 19/160




Processing data/train.csv: 20/160




Processing data/train.csv: 21/160




Processing data/train.csv: 22/160




Processing data/train.csv: 23/160




Processing data/train.csv: 24/160
Rate limit reached. Sleeping for 47.67 seconds...




Processing data/train.csv: 25/160




Processing data/train.csv: 26/160




Processing data/train.csv: 27/160




Processing data/train.csv: 28/160




Processing data/train.csv: 29/160




Processing data/train.csv: 30/160




Processing data/train.csv: 31/160




Processing data/train.csv: 32/160




Processing data/train.csv: 33/160




Processing data/train.csv: 34/160




Processing data/train.csv: 35/160




Processing data/train.csv: 36/160
Rate limit reached. Sleeping for 46.32 seconds...




Processing data/train.csv: 37/160




Processing data/train.csv: 38/160




Processing data/train.csv: 39/160




Processing data/train.csv: 40/160




Processing data/train.csv: 41/160




Processing data/train.csv: 42/160




Processing data/train.csv: 43/160




Processing data/train.csv: 44/160




Processing data/train.csv: 45/160




Processing data/train.csv: 46/160




Processing data/train.csv: 47/160




Processing data/train.csv: 48/160
Rate limit reached. Sleeping for 46.00 seconds...




Processing data/train.csv: 49/160




Processing data/train.csv: 50/160




Processing data/train.csv: 51/160




Processing data/train.csv: 52/160




Processing data/train.csv: 53/160




Processing data/train.csv: 54/160




Processing data/train.csv: 55/160




Processing data/train.csv: 56/160




Processing data/train.csv: 57/160




Processing data/train.csv: 58/160




Processing data/train.csv: 59/160




Processing data/train.csv: 60/160
Rate limit reached. Sleeping for 45.80 seconds...




Processing data/train.csv: 61/160




Processing data/train.csv: 62/160




Processing data/train.csv: 63/160




Processing data/train.csv: 64/160




Processing data/train.csv: 65/160




Processing data/train.csv: 66/160




Processing data/train.csv: 67/160




Processing data/train.csv: 68/160




Processing data/train.csv: 69/160




Processing data/train.csv: 70/160




Processing data/train.csv: 71/160




Processing data/train.csv: 72/160
Rate limit reached. Sleeping for 40.45 seconds...




Processing data/train.csv: 73/160




Processing data/train.csv: 74/160




Processing data/train.csv: 75/160




Processing data/train.csv: 76/160




Processing data/train.csv: 77/160




Processing data/train.csv: 78/160




Processing data/train.csv: 79/160




Processing data/train.csv: 80/160




Processing data/train.csv: 81/160




Processing data/train.csv: 82/160




Processing data/train.csv: 83/160




Processing data/train.csv: 84/160
Rate limit reached. Sleeping for 41.61 seconds...




Processing data/train.csv: 85/160




Processing data/train.csv: 86/160




Processing data/train.csv: 87/160




Processing data/train.csv: 88/160




Processing data/train.csv: 89/160




Processing data/train.csv: 90/160




Processing data/train.csv: 91/160




Processing data/train.csv: 92/160




Processing data/train.csv: 93/160




Processing data/train.csv: 94/160




Processing data/train.csv: 95/160




Processing data/train.csv: 96/160
Rate limit reached. Sleeping for 41.78 seconds...




Processing data/train.csv: 97/160




Processing data/train.csv: 98/160




Processing data/train.csv: 99/160




Processing data/train.csv: 100/160




Processing data/train.csv: 101/160




Processing data/train.csv: 102/160




Processing data/train.csv: 103/160




Processing data/train.csv: 104/160




Processing data/train.csv: 105/160




Processing data/train.csv: 106/160




Processing data/train.csv: 107/160




Processing data/train.csv: 108/160
Rate limit reached. Sleeping for 44.41 seconds...




Processing data/train.csv: 109/160




Processing data/train.csv: 110/160




Processing data/train.csv: 111/160




Processing data/train.csv: 112/160




Processing data/train.csv: 113/160




Processing data/train.csv: 114/160




Processing data/train.csv: 115/160




Processing data/train.csv: 116/160




Processing data/train.csv: 117/160




Processing data/train.csv: 118/160




Processing data/train.csv: 119/160




Processing data/train.csv: 120/160
Rate limit reached. Sleeping for 41.98 seconds...




Processing data/train.csv: 121/160




Processing data/train.csv: 122/160




Processing data/train.csv: 123/160




Processing data/train.csv: 124/160




Processing data/train.csv: 125/160




Processing data/train.csv: 126/160




Processing data/train.csv: 127/160




Processing data/train.csv: 128/160




Processing data/train.csv: 129/160




Processing data/train.csv: 130/160




Processing data/train.csv: 131/160




Processing data/train.csv: 132/160
Rate limit reached. Sleeping for 50.14 seconds...




Processing data/train.csv: 133/160




Processing data/train.csv: 134/160




Processing data/train.csv: 135/160




Processing data/train.csv: 136/160




Processing data/train.csv: 137/160




Processing data/train.csv: 138/160




Processing data/train.csv: 139/160




Processing data/train.csv: 140/160




Processing data/train.csv: 141/160




Processing data/train.csv: 142/160




Processing data/train.csv: 143/160




Processing data/train.csv: 144/160
Rate limit reached. Sleeping for 50.57 seconds...




Processing data/train.csv: 145/160




Processing data/train.csv: 146/160




Processing data/train.csv: 147/160




Processing data/train.csv: 148/160




Processing data/train.csv: 149/160




Processing data/train.csv: 150/160




Processing data/train.csv: 151/160




Processing data/train.csv: 152/160




Processing data/train.csv: 153/160




Processing data/train.csv: 154/160




Processing data/train.csv: 155/160




Processing data/train.csv: 156/160
Rate limit reached. Sleeping for 50.54 seconds...




Processing data/train.csv: 157/160




Processing data/train.csv: 158/160




Processing data/train.csv: 159/160




Processing data/train.csv: 160/160
Finished writing data/train.csv
Generating answers for the testing dataset...
Resuming from existing file: data/test.csv




Processing data/test.csv: 1/40




Processing data/test.csv: 2/40




Processing data/test.csv: 3/40




Processing data/test.csv: 4/40




Processing data/test.csv: 5/40




Processing data/test.csv: 6/40




Processing data/test.csv: 7/40




Processing data/test.csv: 8/40




Processing data/test.csv: 9/40




Processing data/test.csv: 10/40




Processing data/test.csv: 11/40




Processing data/test.csv: 12/40
Rate limit reached. Sleeping for 50.96 seconds...




Processing data/test.csv: 13/40




Processing data/test.csv: 14/40




Processing data/test.csv: 15/40




Processing data/test.csv: 16/40




Processing data/test.csv: 17/40




Processing data/test.csv: 18/40




Processing data/test.csv: 19/40




Processing data/test.csv: 20/40




Processing data/test.csv: 21/40




Processing data/test.csv: 22/40




Processing data/test.csv: 23/40




Processing data/test.csv: 24/40
Rate limit reached. Sleeping for 50.58 seconds...




Processing data/test.csv: 25/40




Processing data/test.csv: 26/40




Processing data/test.csv: 27/40




Processing data/test.csv: 28/40




Processing data/test.csv: 29/40




Processing data/test.csv: 30/40




Processing data/test.csv: 31/40




Processing data/test.csv: 32/40




Processing data/test.csv: 33/40




Processing data/test.csv: 34/40




Processing data/test.csv: 35/40




Processing data/test.csv: 36/40
Rate limit reached. Sleeping for 50.87 seconds...




Processing data/test.csv: 37/40




Processing data/test.csv: 38/40




Processing data/test.csv: 39/40




Processing data/test.csv: 40/40
Finished writing data/test.csv


In [None]:
# push to huggingface
train_df = pd.read_csv("data/train.csv")
test_df = pd.read_csv("data/test.csv")

# convert to huggingface format
train_dataset = Dataset.from_pandas(train_df)
test_dataset = Dataset.from_pandas(test_df)

sutd_qa_dataset = DatasetDict({
    "train": train_dataset,
    "test": test_dataset
})

# huggingface push
# TODO: change username to .env
sutd_qa_dataset.push_to_hub("adi0308/sutd_qa_dataset")

Creating parquet from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 997.69ba/s]
Uploading the dataset shards: 100%|██████████| 1/1 [00:01<00:00,  1.48s/it]
Creating parquet from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 998.88ba/s]
Uploading the dataset shards: 100%|██████████| 1/1 [00:01<00:00,  1.39s/it]


CommitInfo(commit_url='https://huggingface.co/datasets/adi0308/sutd_qa_dataset/commit/0a7d8b193b7e016f64c87ec56af251f6e2f84340', commit_message='Upload dataset', commit_description='', oid='0a7d8b193b7e016f64c87ec56af251f6e2f84340', pr_url=None, repo_url=RepoUrl('https://huggingface.co/datasets/adi0308/sutd_qa_dataset', endpoint='https://huggingface.co', repo_type='dataset', repo_id='adi0308/sutd_qa_dataset'), pr_revision=None, pr_num=None)
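
The TODO above (moving the username out of the code) could be handled roughly as follows. This is a minimal sketch: the environment variable name `HF_USERNAME` and the helper `hub_repo_id` are hypothetical and not part of the original notebook.

```python
import os

def hub_repo_id(name: str, default_user: str = "adi0308") -> str:
    """Build '<user>/<name>', preferring the (hypothetical) HF_USERNAME variable."""
    # Fall back to the hard-coded username when the variable is unset or empty.
    user = os.environ.get("HF_USERNAME") or default_user
    return f"{user}/{name}"

print(hub_repo_id("sutd_qa_dataset"))
```

The result can then be passed to `push_to_hub` and `load_dataset` in place of the literal string.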

# Finetune Llama 3.2 1B model

Now use the training split of your SUTD QA dataset to finetune a smaller Llama 3.2 1B LLM using parameter-efficient finetuning (PEFT). 
We recommend the unsloth library, but you are free to choose other frameworks. You can decide the hyperparameters for the finetuning. 
Push your finetuned model to Hugging Face. 

Then we will compare the finetuned and non-finetuned LLMs, with and without RAG, to see whether we were able to improve the SUTD chatbot answer quality. 


In [65]:
# load the dataset from huggingface
# TODO: replace the username with .env
dataset = load_dataset("adi0308/sutd_qa_dataset")

train_data = dataset["train"]
test_data = dataset["test"]

def format_instruction(example):
    return {
        "text": f"<s>[INST] {example['question']} [/INST] {example['answer']} </s>"
    }
    
train_dataset = train_data.map(format_instruction)
eval_dataset = test_data.map(format_instruction)

Generating train split: 100%|██████████| 160/160 [00:00<00:00, 2872.85 examples/s]
Generating test split: 100%|██████████| 40/40 [00:00<00:00, 9782.63 examples/s]
Map: 100%|██████████| 160/160 [00:00<00:00, 9816.12 examples/s]
Map: 100%|██████████| 40/40 [00:00<00:00, 5582.36 examples/s]
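
Whatever template is used for the training text, the prompt at inference time should follow the same layout. The sketch below splits the `[INST]` format from `format_instruction` into a prompt prefix and a full training example so both share one definition. The helper names and the sample record are illustrative assumptions, not part of the original notebook.

```python
def format_prompt(question: str) -> str:
    """Prompt prefix in the same [INST] layout used for the training text."""
    return f"<s>[INST] {question} [/INST]"

def format_training_text(question: str, answer: str) -> str:
    """Full training example: the prompt prefix, then the answer and an end marker."""
    return f"{format_prompt(question)} {answer} </s>"

# Hypothetical record, only to show the resulting string.
example = {"question": "Where is SUTD located?", "answer": "(answer text from the dataset)"}
text = format_training_text(example["question"], example["answer"])
print(text)
```

Note that `<s>`, `[INST]` and `</s>` follow the Llama 2 convention. With a Llama 3.2 tokenizer they are likely treated as ordinary text rather than special tokens, so appending `tokenizer.eos_token` instead of `</s>` may be worth trying.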


In [73]:
# QUESTION: Finetune a Llama 3.2 1B model on the training split of your SUTD QA dataset.
# You need to prepare your dataset accordingly and set the hyperparameters for the training.
# Push your finetuned model to the Hugging Face model hub {YOUR_HF_NAME}/llama-3.2-1B-sutdqa

#--- ADD YOUR SOLUTION HERE (50 points)---
# load model with Unsloth optimizations
# following the docs at: https://docs.unsloth.ai/basics/tutorial-how-to-finetune-llama-3-and-use-in-ollama
# using base llama 3.2 and not instruct tuned: https://huggingface.co/unsloth/Llama-3.2-1B-bnb-4bit
# using the 4bit quantization version
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-bnb-4bit", 
    max_seq_length=2048,                       
    dtype=torch.float16,                       
    load_in_4bit=True,                          
)

# config
model = FastLanguageModel.get_peft_model(
    model,
    r=8,               # LoRA rank (set low for speed)
    lora_alpha=32,      # alpha parameter for LoRA
    lora_dropout=0.05,  # dropout parameter for LoRA
    target_modules=[    
        "q_proj", "k_proj", "v_proj", "o_proj", 
        "gate_proj", "up_proj", "down_proj"
    ],
    use_gradient_checkpointing=True,  # save memory with gradient checkpointing
    random_state=42,              
)


==((====))==  Unsloth 2025.3.19: Fast Llama patching. Transformers: 4.51.2.
   \\   /|    NVIDIA GeForce RTX 3050 Laptop GPU. Num GPUs = 1. Max memory: 4.0 GB. Platform: Windows.
O^O/ \_/ \    Torch: 2.6.0+cu118. CUDA: 8.6. CUDA Toolkit: 11.8. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
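
To get a feel for what `r=8` and `lora_alpha=32` mean in the configuration above, the following sketch estimates the number of trainable LoRA parameters and the LoRA scaling factor. The layer shapes are assumptions about the Llama 3.2 1B architecture (hidden size 2048, MLP size 8192, key/value projection width 512, 16 decoder layers) and are not stated in this notebook; verify them against `model.config` before relying on the numbers.

```python
# Assumed Llama 3.2 1B shapes (check model.config; not taken from this notebook).
HIDDEN, MLP, KV, LAYERS = 2048, 8192, 512, 16

# (in_features, out_features) of each targeted projection in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV),
    "v_proj": (HIDDEN, KV),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def lora_params(r: int) -> int:
    """Each adapted layer adds A (r x in) and B (out x r), i.e. r * (in + out) weights."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in TARGET_SHAPES.values())
    return per_layer * LAYERS

r, lora_alpha = 8, 32
print("trainable LoRA parameters:", lora_params(r))
print("LoRA scaling alpha / r:", lora_alpha / r)
```

Under these assumptions the adapter has roughly 5.6 million trainable weights, and the low-rank update is scaled by `lora_alpha / r = 4`. Doubling `r` doubles the parameter count and halves the scaling unless `lora_alpha` is raised as well.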


In [None]:
# QUESTION: Load a non-finetuned Llama 3.2 1B model and your finetuned SUTD QA Llama 3.2 1B model
# Ask each a simple test question (e.g. "What is special about SUTD?") to check that both models can generate answers

#--- ADD YOUR SOLUTION HERE (10 points)---



In [None]:
# try out the llms

query = "What is special about SUTD?"

print("Question:", query)
response_base = llm_base.invoke(query, pipeline_kwargs={"max_new_tokens": 512})
print("Answer base:", response_base)

print("---------")
response_finetune = llm_finetune.invoke(query, pipeline_kwargs={"max_new_tokens": 512})
print("Answer finetune:", response_finetune)


# Integrate and evaluate

Now integrate both the non-finetuned Llama 3.2 1B model and your finetuned model into your SUTD chatbot RAG system. 
Generate responses to the 20 questions you collected in assignment 3 using these 4 approaches:
1. non-finetuned Llama 3.2 1B model without RAG
2. finetuned Llama 3.2 1B SUTD QA model without RAG
3. non-finetuned Llama 3.2 1B model with RAG
4. finetuned Llama 3.2 1B SUTD QA model with RAG

Compare the responses and decide which system produces the most accurate and highest-quality responses.
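
The four configurations listed above form a 2x2 grid (model x RAG), which can be generated with a single loop. The sketch below is one possible structure, not a reference solution. The `fake_llm` callables and the `retrieve` function are stand-ins for the real models and retriever, and the prompt wording is an assumption.

```python
from typing import Callable, Dict, List, Optional

def build_prompt(question: str, context: Optional[str] = None) -> str:
    """Plain question prompt, optionally prefixed with retrieved context."""
    if context:
        return (
            "Use the context to answer the question.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        )
    return f"Question: {question}\nAnswer:"

def run_grid(
    questions: List[str],
    models: Dict[str, Callable[[str], str]],
    retrieve: Callable[[str], str],
) -> List[dict]:
    """Generate one answer per (question, model, RAG on/off) combination."""
    rows = []
    for question in questions:
        for model_name, generate in models.items():
            for use_rag in (False, True):
                context = retrieve(question) if use_rag else None
                rows.append({
                    "question": question,
                    "model": model_name,
                    "rag": use_rag,
                    "answer": generate(build_prompt(question, context)),
                })
    return rows

def fake_llm(tag: str) -> Callable[[str], str]:
    """Stand-in for a real model: echoes a tag and the prompt length."""
    return lambda prompt: f"[{tag}] reply to a prompt of {len(prompt)} characters"

models = {"base": fake_llm("base"), "finetuned": fake_llm("finetuned")}
results = run_grid(
    ["What is special about SUTD?"],
    models,
    retrieve=lambda q: "(retrieved passages)",  # placeholder retriever
)
for row in results:
    print(row["model"], "RAG" if row["rag"] else "no RAG", "->", row["answer"])
```

Collecting the rows in a list of dictionaries makes it easy to load them into a pandas DataFrame for a side-by-side comparison.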

In [None]:
# QUESTION: Re-create the RAG chatbot system you have created in assignment 3 but with the Llama 3.2 1B (non-tuned and finetuned) models

#--- ADD YOUR SOLUTION HERE (40 points)---


# Bonus points: LLM-as-judge evaluation 

Implement an LLM-as-judge pipeline to assess the quality of the different systems (finetuned vs. non-finetuned, RAG vs. no RAG).
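
One way to structure such a pipeline is to send each question/answer pair to a judge model with a fixed rubric and parse a numeric score from its reply. The sketch below covers only the prompt construction and score parsing. The rubric wording, the 1 to 5 scale and the `Score:` reply format are assumptions, and the call to the judge model itself is left out.

```python
import re
from typing import List, Optional

JUDGE_TEMPLATE = (
    "You are grading a chatbot answer about SUTD.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Rate the answer from 1 (poor) to 5 (excellent) for accuracy and helpfulness.\n"
    "Reply in the form 'Score: <number>' followed by a short reason."
)

def build_judge_prompt(question: str, answer: str) -> str:
    """Fill the rubric template for one question/answer pair."""
    return JUDGE_TEMPLATE.format(question=question, answer=answer)

def parse_score(judge_reply: str) -> Optional[int]:
    """Extract a 1-5 score from the judge reply; None if no valid score is found."""
    match = re.search(r"Score:\s*([1-5])\b", judge_reply)
    return int(match.group(1)) if match else None

def mean_score(judge_replies: List[str]) -> Optional[float]:
    """Average the parseable scores, ignoring replies without a valid score."""
    scores = [s for s in map(parse_score, judge_replies) if s is not None]
    return sum(scores) / len(scores) if scores else None

print(build_judge_prompt("What is special about SUTD?", "(system answer)"))
print(parse_score("Score: 4 - mostly correct"))
```

Computing `mean_score` separately for each of the four systems gives a simple number to compare. Replies that do not follow the expected format are skipped rather than counted as zero.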



In [None]:
# QUESTION: Implement an LLM-as-judge pipeline to assess the quality of the different systems (finetuned vs. non-finetuned, RAG vs. no RAG)

#--- ADD YOUR SOLUTION HERE (40 points)---

# Bonus points: chatbot UI

Implement a web UI frontend for your chatbot that you can demo in class. 


In [None]:
# QUESTION: Implement a web UI frontend for your chatbot that you can demo in class. 

#--- ADD YOUR SOLUTION HERE (40 points)---

# End

This concludes assignment 4.

Please submit this notebook with your answers and the generated output cells as a **Jupyter notebook file** via GitHub.


Every group member should do the following submission steps:
1. Create a private GitHub repository **sutd_5055mlop** under your GitHub user.
2. Add your instructors as collaborators: ddahlmeier and lucainiaoge
3. Save your submission as assignment_04_GROUP_NAME.ipynb where GROUP_NAME is the name of the group you have registered. 
4. Push the submission files to your repo 
5. Submit the link to the repo via eDimensions



**Assignment due 21 April 2025 11:59pm**