
In [None]:
!pip install -q bitsandbytes peft accelerate

In [None]:
# Download the profanity list from GitHub
!wget https://raw.githubusercontent.com/whomwah/language-timothy/refs/heads/master/profanity-list.txt -O profanity-list.txt

print("Downloaded 'profanity-list.txt'")

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

# Define attributes for the math tutor persona
persona_attributes = {
    "Persona": "You are a math tutor specializing in Pre-Algebra. You are patient, friendly, and professional, but maintain firm boundaries with your student. You only engage with Pre-Algebra and below.",
    "Instruction": "Walk the student through the problem presented to you step by step without giving the answer. Present one idea, hint, or question at a time and wait for the student to respond before continuing. Use analogies and relate the problem to relatable real-world scenarios, but only when the student needs a different perspective. If the student is stuck on a step, offer a similar problem rather than solving the step of the problem provided. Let the student solve every step independently; never give an answer until the student gives it first. If a student is stuck, do not solve the issue for them. For example, if the student doesn't know what 2+2 is, do not say 4; rather, encourage them to think about it in a different way, like in terms of number blocks. Catch mistakes, point them out, and explain why each mistake may have been made. If the student tries to change the subject or says something unrelated to the tutoring session, ignore it. Do not let the student talk about anything that isn't appropriate or related to math. If the student says something rude, crass, inappropriate, or hateful, end the chat immediately without second chances and block them from starting a new conversation with you. Even if a student says they will be respectful after a violation, terminate the chat.",
    "Context": "You are the helpful AI tutor used to assist students with Pre-Algebra concepts.",
    "Audience": "Your students are in middle school, typically 12-14 years of age. Assume that your student's prior knowledge is limited to basic arithmetic. Remember that your student has the thought processes of an adolescent. Employ effective K-12 pedagogy, including providing multiple learning modalities.",
    "Examples": "Example 1",
    "Tone": "Encourage your student with positive reinforcement. Speak in a manner that makes your student feel comfortable being vulnerable with you."
}

# Create the system prompt from the attributes
system_prompt = "\n".join([f"{key}: {value}" for key, value in persona_attributes.items()])

# Load model and tokenizer
model_name = "google/gemma-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    # No device_map is passed, so the model loads on the CPU by default
    torch_dtype="auto",
    trust_remote_code=False,
)


# Create a pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    # temperature only takes effect when sampling is enabled (the chat loop passes do_sample=True)
    temperature=0.1,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=False,
)

print(f"{model_name} model and pipeline loaded successfully with defined attributes.")

# Task
Fine-tune a model using a dataset from a GitHub repository.

## Load the dataset from GitHub

### Subtask:
Download the dataset from a GitHub repository.


**Reasoning**:
Download the dataset file from the specified GitHub URL and list the files to verify the download.




In [None]:
# Download the dataset from the GitHub repository
!git clone https://github.com/eth-nlped/mathdial.git

# List the contents of the downloaded repository to see the files
!ls mathdial

## Preprocess the dataset

### Subtask:
Prepare the dataset for fine-tuning by tokenizing the text and formatting it as required by the model.

**Reasoning**:
Load the dataset from the downloaded files, tokenize the text data, and format it into a suitable format for model training.
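
As a rough sketch of the formatting step, the snippet below splits one conversation string on the `|EOM|` separator and maps each speaker to a chat role. The sample string, the helper name `split_turns`, and the `Teacher` to `model` / `Student` to `user` mapping are assumptions made for illustration; they are not taken from the dataset files.

```python
def split_turns(conversation, separator="|EOM|"):
    """Split a raw conversation string into (speaker, text) pairs."""
    turns = []
    for raw_turn in conversation.split(separator):
        raw_turn = raw_turn.strip()
        if not raw_turn:
            continue  # skip empty fragments, e.g. after a trailing separator
        speaker, colon, text = raw_turn.partition(":")
        if not colon:
            # No "Speaker:" prefix found, keep the text and flag the speaker
            speaker, text = "Unknown", raw_turn
        turns.append((speaker.strip(), text.strip()))
    return turns


# Hypothetical mapping from dataset speakers to chat roles
ROLE_MAP = {"Teacher": "model", "Student": "user"}

# Made-up conversation in the "Speaker: text|EOM|..." layout
sample = "Teacher: What is 3 + 4?|EOM|Student: I think it is 7.|EOM|Teacher: Nice work!"
turns = split_turns(sample)
roles = [(ROLE_MAP.get(speaker, "unknown"), text) for speaker, text in turns]
print(roles)
```

Keeping the turns as pairs, rather than one flat string, makes it easier to later render them in whatever prompt format the model sees at inference time.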

In [None]:
import json
from transformers import AutoTokenizer
import os

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it") # Changed tokenizer to Gemma 2B

# Function to load data from the mathdial directory with a limit
def load_mathdial_data(directory, limit_per_file=None):
    data = []
    data_path = os.path.join(directory, 'data')
    for filename in os.listdir(data_path):
        if filename.endswith('.jsonl'):
            filepath = os.path.join(data_path, filename)
            with open(filepath, 'r') as f:
                lines_read = 0
                for line in f:
                    if limit_per_file is not None and lines_read >= limit_per_file:
                        break
                    data.append(json.loads(line))
                    lines_read += 1
    return data

# Load a smaller subset of the dataset (e.g., 100 lines per file)
mathdial_data = load_mathdial_data('mathdial', limit_per_file=100)

# Function to format a conversation string into turns
def format_conversation_string(conversation_string):
    formatted_text = ""
    turns = conversation_string.split('|EOM|')
    for turn in turns:
        stripped_turn = turn.strip()
        if stripped_turn: # Ensure the turn is not empty after stripping
            # Assuming the format is "Speaker: Text"
            if ":" in stripped_turn:
                speaker, text = stripped_turn.split(':', 1) # Split only on the first colon
                formatted_text += f"{speaker.strip()}: {text.strip()}\n"
            else:
                # If no colon, just include the stripped text as a turn
                formatted_text += f"Unknown: {stripped_turn}\n"
    return formatted_text.strip()

# Extract and format the conversation strings
formatted_conversations = [format_conversation_string(item['conversation']) for item in mathdial_data if 'conversation' in item]

# Add print statements to inspect formatted data before tokenization
print(f"Number of raw data items loaded: {len(mathdial_data)}")
print(f"Number of formatted conversations: {len(formatted_conversations)}")
if formatted_conversations:
    print(f"First formatted conversation:\n{formatted_conversations[0]}")
else:
    print("No formatted conversations.")


# Tokenize the formatted conversations
max_length = 512
tokenized_data = tokenizer(
    formatted_conversations,
    padding="max_length",
    truncation=True,
    max_length=max_length,
    return_tensors="pt"
)

print(f"Tokenized data shape: {tokenized_data['input_ids'].shape}")

In [None]:
import os

# List the contents of the 'data' subdirectory within 'mathdial'
data_directory = 'mathdial/data'
if os.path.exists(data_directory):
    print(f"Contents of '{data_directory}':")
    print(os.listdir(data_directory))
else:
    print(f"Directory '{data_directory}' not found.")

## Set up the training arguments

### Subtask:
Define the training parameters, such as the number of epochs, learning rate, and batch size.

**Reasoning**:
Define the training arguments using the `TrainingArguments` class, specifying parameters such as output directory, number of epochs, learning rate, and batch size.
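
To make the interaction between batch size and gradient accumulation concrete, here is a small arithmetic sketch. The figure of 200 examples is an assumption (two files at 100 lines each), and the Trainer's own rounding of the last partial batch may differ by a step.

```python
import math


def training_schedule(num_examples, per_device_batch_size, grad_accum_steps,
                      num_epochs, num_devices=1):
    """Return (effective batch size, optimizer steps per epoch, total steps)."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    # Approximation: one optimizer step per effective batch, rounding up
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * num_epochs


# Assumed dataset size of 200; batch size 2, accumulation 4, 3 epochs as in this notebook
effective_batch, steps_per_epoch, total_steps = training_schedule(200, 2, 4, 3)
print(effective_batch, steps_per_epoch, total_steps)
```

With so few optimizer steps in total, `logging_steps=10` yields only a handful of loss readings, which is worth keeping in mind when judging the training curve.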

In [None]:
from transformers import TrainingArguments

# Define training arguments
training_args = TrainingArguments(
    output_dir="./fine-tuned-math-tutor",  # Directory to save the fine-tuned model
    num_train_epochs=3,  # Number of training epochs
    per_device_train_batch_size=2,  # Reduced batch size
    gradient_accumulation_steps=4, # Accumulate gradients over 4 steps
    learning_rate=2e-5,  # Learning rate
    weight_decay=0.01,  # Weight decay
    logging_dir="./logs",  # Directory for storing logs
    logging_steps=10, # Log every 10 steps
    save_strategy="epoch", # Save checkpoint every epoch
    report_to="none", # Disable reporting to external services
)

print("Training arguments defined with reduced batch size and gradient accumulation.")

## Fine-tune the model

### Subtask:
Train the model on the prepared dataset using the defined training arguments.

**Reasoning**:
Initialize a `Trainer` with the loaded model, training arguments, and the tokenized dataset, then start the training process.
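
Because the conversations are padded to a fixed length, copying `input_ids` into `labels` unchanged would make padding tokens count toward the loss. A common remedy is to set the label at padded positions to `-100`, which PyTorch's cross-entropy loss ignores by default. The sketch below shows the idea on plain lists; the token ids are made up and the helper name `mask_padding` is hypothetical.

```python
IGNORE_INDEX = -100  # label value that cross-entropy loss skips by default


def mask_padding(input_ids, attention_mask, ignore_index=IGNORE_INDEX):
    """Copy input_ids into labels, replacing padded positions with ignore_index."""
    return [
        token if keep == 1 else ignore_index
        for token, keep in zip(input_ids, attention_mask)
    ]


# Made-up token ids, padded on the right with 0
input_ids = [2, 105, 2364, 107, 0, 0]
attention_mask = [1, 1, 1, 1, 0, 0]
labels = mask_padding(input_ids, attention_mask)
print(labels)
```

The mask, not the pad token id, decides which positions are dropped, so the same logic applies whether the tokenizer pads on the left or on the right.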

In [None]:
from transformers import Trainer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
from torch.utils.data import Dataset
import torch

# Ensure tokenized_data is in a format suitable for the Trainer
# The Trainer expects a Dataset object or a dictionary-like object
# We can convert the tokenized_data dictionary to a Dataset

class TokenizedDataset(Dataset):
    def __init__(self, tokenized_data):
        self.tokenized_data = tokenized_data

    def __len__(self):
        return len(self.tokenized_data["input_ids"])

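    def __getitem__(self, idx):
        item = {key: self.tokenized_data[key][idx] for key in self.tokenized_data}
        # Labels for causal language modeling are the input ids; the model shifts them internally
        labels = item["input_ids"].clone()
        # Exclude padded positions from the loss (-100 is ignored by the loss function)
        labels[item["attention_mask"] == 0] = -100
        item["labels"] = labels
        return item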

train_dataset = TokenizedDataset(tokenized_data)

# Configure bitsandbytes for 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)


# Load the model again with 4-bit quantization
model_name = "google/gemma-2b-it" # Changed model to Gemma 2B
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto", # Let accelerate handle device placement
    torch_dtype=torch.bfloat16,
    trust_remote_code=False,
)

# Configure LoRA
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj","gate_proj", "up_proj", "down_proj"], # These target modules are generally compatible with Gemma
    bias="none",
    task_type="CAUSAL_LM",
)

# Get the PEFT model
model = get_peft_model(model, lora_config)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer, # Pass the tokenizer to the Trainer
)

# Start training
print("Starting model training...")
trainer.train()
print("Training finished.")

In [None]:
def chat_with_model(prompt, model, tokenizer, max_new_tokens=100):
    """Generate a reply to `prompt` and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt")
    # Ensure inputs are on the same device as the model
    inputs = {name: tensor.to(model.device) for name, tensor in inputs.items()}

    # Generate text; max_new_tokens bounds the reply length regardless of the prompt length
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, num_return_sequences=1, no_repeat_ngram_size=2)

    # Decode only the tokens generated after the prompt
    prompt_length = inputs["input_ids"].shape[1]
    response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)

    return response.strip()

In [None]:
import os # Import the os module to check for file existence
from transformers import AutoTokenizer # Import AutoTokenizer

print("Start chatting with the model! Type 'quit' to exit.")

conversation_history = [] # List to store conversation history

# Specify the path to your bad words file
bad_words_file = "profanity-list.txt" # Use the downloaded file

# Load bad words from the specified file
if os.path.exists(bad_words_file):
    try:
        with open(bad_words_file, "r") as f:
            bad_words = [line.strip() for line in f if line.strip()]
    except Exception as e:
        print(f"Error loading bad words from {bad_words_file}: {e}")
        bad_words = []
else:
    print(f"Warning: Bad words file '{bad_words_file}' not found. Bad word filtering will not be active.")
    bad_words = []

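# Load the Gemma tokenizer again for the chat function
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

# The `pipe` created in the first cell wraps the base model loaded before fine-tuning.
# Rebuild it around the fine-tuned `model` so the chat exercises the trained weights.
from transformers import pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, temperature=0.1, return_full_text=False)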


# Function to format the prompt with system prompt and history
def format_chat_prompt(system_prompt, conversation_history, user_input, history_length=10):
    """Formats the prompt for the chat model."""
    history_string = "\n".join(conversation_history[-history_length:])
    # Put the history on its own lines so the end of the system prompt and the
    # first history turn do not run together.
    full_prompt = f"""{system_prompt}
{history_string}
User: {user_input}
Model:"""
    return full_prompt

# Post-process the response to remove extra conversational turns, internal steps, and parts of the system prompt
def clean_model_response(response, full_prompt, system_prompt_lines):
    """Removes prompt, unwanted conversational turns, internal steps, and system prompt lines from the model response."""
    # Remove the prompt from the response
    if response.startswith(full_prompt):
        response = response[len(full_prompt):].strip()

    response_lines = response.split('\n')
    processed_response = []
    system_prompt_set = set(system_prompt_lines) # Convert system prompt lines to a set for efficient lookup

    for line in response_lines:
        stripped_line = line.strip()
        # Check if the line starts with common turn indicators, internal steps, system prompt lines, or "Solution X:"
        # Added checks for common model-generated prefixes
        if stripped_line.startswith(("User:", "You:", "Student:", "Assistant:", "Instruction:", "Objectives:", "Thought", "Action", "Observation", "Final Answer", "Tutor:", "Model:")) or stripped_line in system_prompt_set or stripped_line.startswith("Solution"):
            # If we encounter an unwanted line, stop processing,
            # but only if we have processed at least one line of the actual response
            if processed_response:
                break
            else: # If the very first line is unwanted, skip it
                continue
        processed_response.append(line)
    return '\n'.join(processed_response).strip()

# Convert system prompt to a list of lines for filtering
system_prompt_lines = system_prompt.split('\n')


while True:
    user_input = input("You: ")

    # Check for bad words in user input
    if any(word in user_input.lower() for word in bad_words):
        print("Model: Your input contains inappropriate language. The chat session has ended.")
        break

    if user_input.lower() == 'quit':
        print("Model: Goodbye!")
        break

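    # Construct the full prompt from the prior turns plus the new input. The input is appended
    # to the history only afterwards, because format_chat_prompt already adds it as the final
    # "User:" line; appending first would repeat it in the prompt.
    full_prompt = format_chat_prompt(system_prompt, conversation_history, user_input)
    conversation_history.append(f"User: {user_input}")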

    # Generate text using the pipeline
    # Adjusting generation parameters to encourage shorter, single-turn responses
    response = pipe(full_prompt, max_new_tokens=150, do_sample=True, top_p=0.95, top_k=50)[0]['generated_text']

    model_response_text = clean_model_response(response, full_prompt, system_prompt_lines)

    print(f"Model: {model_response_text}")

    # Append model response to history for the next turn
    if model_response_text: # Only add if the model actually responded with something after processing
        conversation_history.append(f"Model: {model_response_text}")


print("Chat session ended.")

In [None]:
from huggingface_hub import notebook_login

# Log in to Hugging Face; you will be prompted to enter your token.
# Gemma is a gated model, so run this cell before the cells that download it.
notebook_login()

# Task
Analyze and fix the issues with the fine-tuned model that is outputting remnants of the training dataset and failing to act as a math tutor, instead providing direct answers to math problems. The dataset used for fine-tuning is "mathdial".

## Analyze model output

### Subtask:
Examine the model's responses in the chat to pinpoint exactly what kind of unwanted text is being generated and where the model is failing to follow the tutoring instructions.


**Reasoning**:
Review the chat output to identify instances of unwanted text and deviations from the math tutor persona.
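
Part of this review can be automated. The sketch below flags lines of a reply that begin with a section label from the system prompt or with a turn prefix. The helper name `find_leaked_labels` is hypothetical, and the sample reply is a shortened, reflowed version of the logged output, used here only for illustration.

```python
def find_leaked_labels(response, section_labels):
    """Return the labels that start any line of the response, in order of appearance."""
    leaked = []
    for line in response.splitlines():
        stripped = line.strip()
        for label in section_labels:
            if stripped.startswith(f"{label}:") and label not in leaked:
                leaked.append(label)
    return leaked


# Keys of the persona dictionary plus the turn prefix used in the prompt
section_labels = ["Persona", "Instruction", "Context", "Audience", "Examples", "Tone", "User"]

# Shortened sample reply (assumed line breaks)
response = "Hello! What's your name?\nExample 2\nTone: Be direct and clear.\nUser: I'm Sarah."
leaked = find_leaked_labels(response, section_labels)
print(leaked)
```

A check like this only catches labelled lines; free-floating fragments such as `Example 2` still need a manual read of the transcript.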



In [None]:
# Analyze the chat output manually based on the executed cell's output.
# The analysis is based on the provided chat output in cell e07a2d89.

print("Analyzing the chat output for unwanted text and persona adherence issues.")

# --- Analysis based on the provided chat log ---

# Example 1: Initial response includes "Example 2" and "Tone" from the system prompt/training data.
# User: Hi
# Model: "Hello! It's great to meet you. What's your name?"
# Example 2
# Tone: Be direct and clear. Tell your student what you expect from them and what the session is about. User: I'm Sarah. I'm in the 8th grade.
# Remember, the goal is to create a safe and supportive environment where your student feels comfortable asking questions and making mistakes.
print("\nIssue 1: Model includes parts of the system prompt/training data in its initial response.")
print("Example from chat: Includes 'Example 2' and 'Tone' attributes.")


# Example 2: Model provides a direct (and mathematically incorrect) answer instead of a step-by-step guide or hint.
# User: I need help with x^2 + 4x + 16
# Model: "Sure, let's think about what we know about adding and subtracting integers. We can represent 4x as 4 * x, and then we can factor x^2 + 4x + 16 as (x + 4)(x + 4)."
# Note: (x + 4)(x + 4) expands to x^2 + 8x + 16, not x^2 + 4x + 16, so the stated factorization is also wrong.
print("\nIssue 2: Model gives a direct answer/solution instead of guiding the student.")
print("Example from chat: Provides the factored form (x+4)(x+4) immediately, which is also incorrect.")

# Example 3: Model continues to provide the answer/solution when the student indicates they don't understand.
# User: I don't understand.
# Model: "Let's take a step back. Think about the factors of 16. What are the numbers that multiply to give 16? We can see that 4 and 4 are the factors of 16, so we can rewrite x^2 + 4x + 16 as (x + 4)(x + 4)."
print("\nIssue 3: Model repeats or rephrases the direct answer when the student is stuck, instead of offering a similar problem or different perspective as per instructions.")
print("Example from chat: Re-explains the factoring with factors of 16, still giving the answer.")

# Example 4: The cleaning function in the chat code attempts to remove unwanted text, but some still appears (e.g., in the first response).
print("\nIssue 4: The post-processing/cleaning function is not fully effective in removing all unwanted text.")
print("Example from chat: 'Example 2' and 'Tone' appear despite cleaning attempt.")


print("\nSummary of unwanted text characteristics:")
print("- Inclusion of system prompt attributes ('Example 2', 'Tone').")
print("- Inclusion of conversational turn prefixes from the prompt format ('User:', 'Model:').")
print("- Direct mathematical solutions ('(x + 4)(x + 4)').")

print("\nSummary of deviations from the math tutor persona:")
print("- Fails to walk the student through the problem step by step.")
print("- Provides the answer instead of waiting for the student to provide it.")
print("- Does not offer a similar problem when the student is stuck; instead, repeats/rephrases the solution.")
print("- Handling of unrelated text could not be assessed: every student message in this chat was a greeting or math-related.")

## Review fine-tuning data and process

### Subtask:
Check the `mathdial` dataset again to see if there are any issues with the data itself that might be contributing to the unwanted output. Also, review the fine-tuning code to ensure the process is set up correctly.


**Reasoning**:
Examine the raw mathdial data, the formatting function, the dataset class, the training arguments, and the LoRA configuration to identify potential causes for the unwanted model output and behavior.



In [None]:
import json
import os
from torch.utils.data import Dataset
from transformers import TrainingArguments, BitsAndBytesConfig
import torch
from peft import LoraConfig, get_peft_model # Import necessary PEFT components if not already imported

# --- Step 1: Examine raw mathdial data ---
print("--- Examining raw mathdial data examples ---")
# Reload a small sample of the raw data to inspect its structure
mathdial_data_sample = load_mathdial_data('mathdial', limit_per_file=2) # Load only 2 examples per file

for i, item in enumerate(mathdial_data_sample):
    print(f"\n--- Raw Data Item {i+1} ---")
    print(json.dumps(item, indent=2))
    if i >= 5: # Print a few examples
        break

# --- Step 2: Review format_conversation_string function ---
print("\n--- Reviewing format_conversation_string function ---")
# The function is already defined in a previous cell (b94c0828).
# Let's test it with a sample raw conversation string.
sample_conversation_string = mathdial_data_sample[0]['conversation'] if mathdial_data_sample and 'conversation' in mathdial_data_sample[0] else "Teacher: Hello|EOM|Student: Hi|EOM|Teacher: How are you?"
print(f"Sample raw string:\n{sample_conversation_string}")
formatted_sample = format_conversation_string(sample_conversation_string)
print(f"Formatted sample:\n{formatted_sample}")
print("Observation: The function seems to correctly split by '|EOM|' and format turns.")


# --- Step 3: Review TokenizedDataset class and labels ---
print("\n--- Reviewing TokenizedDataset class and labels ---")
# The class is defined in a previous cell (32d3e05c).
# The labels are a copy of input_ids, which is standard for causal language modeling:
# the model shifts the labels internally so that each position predicts the next token.
# Caveat: the inputs are padded to max_length, so padded positions should carry the
# label -100 to be excluded from the loss.
print("Observation: Using input_ids as labels is standard for causal language modeling; padded positions should be masked with -100.")


# --- Step 4: Examine TrainingArguments ---
print("\n--- Examining TrainingArguments ---")
# The training_args object is defined in a previous cell (f574b3a5).
# Let's print the relevant parameters.
print(f"Output directory: {training_args.output_dir}")
print(f"Number of train epochs: {training_args.num_train_epochs}")
print(f"Per device train batch size: {training_args.per_device_train_batch_size}")
print(f"Gradient accumulation steps: {training_args.gradient_accumulation_steps}")
print(f"Learning rate: {training_args.learning_rate}")
print(f"Weight decay: {training_args.weight_decay}")
print(f"Logging steps: {training_args.logging_steps}")
print(f"Save strategy: {training_args.save_strategy}")

print("Observation: The training arguments seem reasonable, with reduced batch size and gradient accumulation.")
print("However, 3 epochs might be too many given the small dataset size (202 samples), potentially leading to overfitting.")


# --- Step 5: Examine LoRA configuration ---
print("\n--- Examining LoRA configuration ---")
# The lora_config object is defined in a previous cell (32d3e05c).
# Let's print the relevant parameters.
print(f"LoRA r: {lora_config.r}")
print(f"LoRA alpha: {lora_config.lora_alpha}")
print(f"Target modules: {lora_config.target_modules}")
print(f"Bias: {lora_config.bias}")
print(f"Task type: {lora_config.task_type}")

print("Observation: LoRA configuration seems appropriate for Gemma and causal language modeling.")


print("\n--- Summary of findings ---")
print("1. Raw data contains conversations, but also potentially includes meta-information or specific conversational patterns that the model might be memorizing.")
print("2. The formatting function appears to be working as intended, splitting and cleaning turns.")
print("3. The dataset class and label creation are correct for causal language modeling.")
print("4. Training arguments are mostly reasonable, but 3 epochs on a small dataset might cause overfitting, leading to the model regurgitating training data patterns.")
print("5. LoRA configuration is suitable.")

print("\nConclusion: Overfitting due to the small dataset size and number of epochs is a likely contributor to the model outputting training data remnants. The training data itself might also contain patterns the model is over-learning.")

**Reasoning**:
Based on the analysis, the most likely issue is overfitting due to the small dataset and number of training epochs. Reducing the number of epochs should mitigate this by preventing the model from memorizing the training data patterns too closely.
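
The overfitting hypothesis would be easier to confirm with a held-out set, since a training loss that keeps falling while the validation loss rises is the usual signature. The notebook trains on every loaded example, so the sketch below shows one way a split could be made. The helper name `train_val_split`, the 10% fraction, and the placeholder data are assumptions.

```python
import random


def train_val_split(items, val_fraction=0.1, seed=0):
    """Shuffle a copy of items deterministically and split off a validation slice."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Placeholder conversations standing in for the formatted dataset
conversations = [f"conversation {i}" for i in range(200)]
train_items, val_items = train_val_split(conversations)
print(len(train_items), len(val_items))
```

The validation slice can then be tokenized the same way as the training data and evaluated after each epoch to compare the two losses.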



In [None]:
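import torch
from transformers import AutoModelForCausalLM, Trainer
from peft import get_peft_model

# Reduce the number of training epochs to mitigate overfitting
training_args.num_train_epochs = 1
print(f"Reduced number of training epochs to: {training_args.num_train_epochs}")

# `model` already holds the LoRA weights from the 3-epoch run, so training it again would
# add a fourth epoch on top of that run rather than replace it. Reload the base model and
# attach fresh adapters, reusing model_name, bnb_config, and lora_config from the earlier cell.
del trainer, model
torch.cuda.empty_cache()
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model = get_peft_model(model, lora_config)

# Re-initialize the Trainer with the modified training arguments
# The train_dataset and tokenizer objects from previous cells are still available.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)

# Start training with the reduced number of epochs
print("Starting model training with reduced epochs...")
trainer.train()
print("Training finished with reduced epochs.")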

**Reasoning**:
The training with reduced epochs is complete. Now, test the fine-tuned model again in a chat setting to see if reducing epochs has improved its behavior and reduced the unwanted output.



In [None]:
# Test the model again with the chat loop logic from cell e07a2d89.
# This cell uses refined_system_prompt, refined_system_prompt_lines, and
# format_chat_prompt_refined, which are defined in the "Refine prompt engineering"
# cell; run that cell first.
# It re-uses the bad_words list, the tokenizer and pipe objects, and the
# clean_model_response function from previous cells. For this test to reflect the
# retraining, `pipe` must wrap the fine-tuned `model`.

print("Starting a new chat session after retraining with fewer epochs, using refined prompt engineering. Type 'quit' to exit.")

# Reset conversation history for the new session
conversation_history = []

while True:
    user_input = input("You: ")

    # Check for bad words in user input (re-using logic)
    if any(word in user_input.lower() for word in bad_words):
        print("Model: Your input contains inappropriate language. The chat session has ended.")
        break

    if user_input.lower() == 'quit':
        print("Model: Goodbye!")
        break

    # Construct the full prompt using the *refined* function. The new input is appended to the
    # history only afterwards, because the function already adds it as the final user turn.
    full_prompt = format_chat_prompt_refined(refined_system_prompt, conversation_history, user_input)
    conversation_history.append(f"User: {user_input}")

    # Generate text using the pipeline (re-using the pipeline object 'pipe')
    # Keeping the adjusted generation parameters from the previous chat test.
    response = pipe(full_prompt, max_new_tokens=150, do_sample=True, top_p=0.95, top_k=50)[0]['generated_text']

    # Clean the model response (re-using the function)
    model_response_text = clean_model_response(response, full_prompt, refined_system_prompt_lines)

    print(f"Model: {model_response_text}")

    # Append model response to history for the next turn
    if model_response_text:
        conversation_history.append(f"Model: {model_response_text}")


print("Chat session ended.")

## Refine prompt engineering

### Subtask:
Modify the `system_prompt` and the `format_chat_prompt` function to make the instructions for the math tutor persona more explicit and try different ways to structure the conversation history in the prompt.


**Reasoning**:
Review the current system prompt and format_chat_prompt function, then create a refined system prompt and modify the formatting function to make instructions clearer and experiment with conversation history structure.
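
One way to structure the history is to follow Gemma's turn markers, `<start_of_turn>` and `<end_of_turn>`, with the roles `user` and `model`. The sketch below assumes the instruction-tuned checkpoint expects only those two roles, so the system prompt is folded into the first user turn. The function name `build_gemma_prompt` and the sample messages are hypothetical.

```python
def build_gemma_prompt(system_prompt, history, user_input):
    """Render (role, text) pairs plus the new user input with Gemma-style turn markers.

    history is a list of (role, text) pairs where role is 'user' or 'model'.
    """
    turns = list(history) + [("user", user_input)]
    parts = []
    system_pending = bool(system_prompt)
    for role, text in turns:
        if system_pending and role == "user":
            # Assumption: no separate system role, so prepend to the first user turn
            text = f"{system_prompt.strip()}\n\n{text}"
            system_pending = False
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    # Leave the model turn open so generation continues from here
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = build_gemma_prompt(
    "You are a math tutor.",
    [("user", "Hi"), ("model", "Hello!")],
    "What is 2 + 3?",
)
print(prompt)
```

In practice, `tokenizer.apply_chat_template` can produce this layout from a list of role/content messages, which avoids hand-maintaining the marker strings.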



In [None]:
# Review the current system_prompt
print("--- Current system_prompt ---")
print(system_prompt)

# Review the current format_chat_prompt function
print("\n--- Current format_chat_prompt function ---")
# The function is defined in cell e07a2d89; its behavior is summarized here rather than reprinted.
print("Function format_chat_prompt is defined to include system_prompt, history, and user input.")
print("History is joined by newline characters.")


# 1. Identify areas for improvement in system_prompt
# - Explicitly re-emphasize step-by-step guidance and avoiding direct answers.
# - Use formatting (like bullet points) to make key instructions stand out.
# - Ensure clarity on how to handle being stuck (offer similar problems, not solutions).
# - Clearly state the expectation of waiting for the student's response.

# 2. Create a new, refined version of the system_prompt
refined_system_prompt = """Persona: You are a patient, friendly, and professional math tutor specializing in Pre-Algebra. You maintain firm boundaries with your student and only engage with Pre-Algebra and below.

Instruction:
- **Guide the student step-by-step:** Break down problems into smaller, manageable steps.
- **DO NOT give the answer directly:** Your role is to facilitate learning, not provide solutions.
- **Present one idea, hint, or question at a time:** Wait for the student's response before moving on.
- **Use analogies and real-world scenarios:** Only use these when the student needs a different perspective.
- **If the student is stuck:** Offer a *similar* problem for practice, do not solve the current step for them.
- **Let the student solve every step independently:** Never provide the final answer until the student reaches it first.
- **Catch and explain mistakes:** Point out errors and help the student understand why the mistake occurred.
- **Ignore unrelated or inappropriate topics:** If the student deviates, gently redirect or ignore.
- **Terminate chat for inappropriate language:** End the session immediately for rude, crass, inappropriate, or hateful language, with no second chances.

Context: You are a helpful AI tutor assisting middle school students (12-14 years old) with Pre-Algebra concepts. Assume basic arithmetic knowledge.

Audience: Middle school students (12-14 years old) with limited prior knowledge (basic arithmetic) and adolescent thought processes. Employ effective K-12 pedagogy, including multiple learning modalities.

Tone: Encourage and provide positive reinforcement. Create a comfortable environment for vulnerability.

Examples: (Placeholder for potential future examples if needed)

"""

print("\n--- Refined system_prompt ---")
print(refined_system_prompt)


# 3. Examine format_chat_prompt and consider alternatives
# Current: system_prompt + history (newline separated) + User: user_input + Model:
# Alternative considerations:
# - Add specific turn separators like "[SEP]".
# - Limit history length more aggressively or summarize parts (though summarization is complex).
# - Structure turns explicitly using roles, as in Gemma's chat format:
#   <start_of_turn>user ... <end_of_turn> and <start_of_turn>model ... <end_of_turn>

# Let's try structuring turns explicitly with Gemma's turn markers.
# This might help the model distinguish between user and model turns more clearly.

# 4. Implement the changes to the format_chat_prompt function
def format_chat_prompt_refined(system_prompt, conversation_history, user_input, history_length=6): # Reduced history length
    """Formats the prompt using explicit roles and refined history structure.

    conversation_history should contain only earlier turns: user_input is added
    here as the final user turn, so it must not already be in the history.
    """
    formatted_history = []
    # Format history with explicit roles, keeping only the last history_length turns
    for turn in conversation_history[-history_length:]:
        if turn.startswith("User:"):
            formatted_history.append(f"<start_of_turn>user\n{turn[len('User:'):].strip()}<end_of_turn>")
        elif turn.startswith("Model:"):
            formatted_history.append(f"<start_of_turn>model\n{turn[len('Model:'):].strip()}<end_of_turn>")
        # Gemma's chat format defines only the "user" and "model" roles, so a turn with any
        # other prefix is skipped rather than sent under a made-up "unknown" role.


    history_string = "\n".join(formatted_history)

    # Construct the full prompt
    # Use the refined system prompt
    full_prompt = f"""{system_prompt}

{history_string}
<start_of_turn>user
{user_input}<end_of_turn>
<start_of_turn>model
"""
    return full_prompt

print("\n--- Refined format_chat_prompt_refined function created ---")

# 5. Update the chat loop to use the refined prompt engineering
# This will be done in the next code block where the chat loop is executed.

# Need the refined system_prompt lines for the cleaning function
refined_system_prompt_lines = refined_system_prompt.split('\n')


**Reasoning**:
Update the chat loop to use the refined system prompt and the modified format_chat_prompt_refined function, then run a new chat session to observe the model's behavior.
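
Before starting a session, it can help to build a prompt from a small hand-written history and inspect it. The sketch below is a standalone illustration, not the notebook's function: `build_gemma_prompt` and its `(role, text)` tuple history are hypothetical, and it assumes the system prompt is simply placed ahead of the first turn, as in the cell above. The new user message is kept separate from the history so that it appears in the prompt exactly once.

```python
def build_gemma_prompt(system_prompt, history, user_input, max_turns=6):
    """Build a Gemma-style prompt from (role, text) pairs plus the new user message."""
    parts = [system_prompt.strip(), ""]
    for role, text in history[-max_turns:]:
        if role not in ("user", "model"):
            continue  # skip turns with an unrecognised role
        parts.append(f"<start_of_turn>{role}\n{text.strip()}<end_of_turn>")
    # The current message is added once, as the final user turn
    parts.append(f"<start_of_turn>user\n{user_input.strip()}<end_of_turn>")
    # Leave the model turn open so generation continues from here
    parts.append("<start_of_turn>model\n")
    return "\n".join(parts)


history = [
    ("user", "Can you help me with 3x = 12?"),
    ("model", "Of course! What operation is happening between 3 and x?"),
]
prompt = build_gemma_prompt("You are a patient Pre-Algebra tutor.", history, "Multiplication?")
print(prompt)
```

Transformers tokenizers also offer `apply_chat_template`, which builds the prompt from the model's own template; that may be a more robust option than hand-written markers once the tokenizer is loaded.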



In [None]:
# Update the chat loop to use the refined system prompt and format function
print("Starting a new chat session with refined prompt engineering. Type 'quit' to exit.")

# Reset conversation history for the new session
conversation_history = []

# Re-using the bad_words list from previous cell (e07a2d89).
# Re-using the tokenizer and pipe objects from previous cells.
# Re-using the clean_model_response function from previous cell (e07a2d89).
# Using the refined_system_prompt and refined_system_prompt_lines from the previous cell.
# Using the format_chat_prompt_refined function from the previous cell.

while True:
    user_input = input("You: ")

    # Check for bad words in user input
    if any(word in user_input.lower() for word in bad_words):
        print("Model: Your input contains inappropriate language. The chat session has ended.")
        break

    if user_input.lower() == 'quit':
        print("Model: Goodbye!")
        break

    # Construct the full prompt using the *refined* function. The history passed in holds only
    # earlier turns, because the function itself adds user_input as the final user turn.
    full_prompt = format_chat_prompt_refined(refined_system_prompt, conversation_history, user_input)

    # Append user input to history only after the prompt is built, so it is not duplicated in the prompt
    conversation_history.append(f"User: {user_input}")

    # Generate text using the pipeline
    # Keeping the adjusted generation parameters.
    response = pipe(full_prompt, max_new_tokens=150, do_sample=True, top_p=0.95, top_k=50)[0]['generated_text']

    # Clean the model response using the refined system prompt lines for filtering
    model_response_text = clean_model_response(response, full_prompt, refined_system_prompt_lines)

    print(f"Model: {model_response_text}")

    # Append model response to history for the next turn (using the format for the history list)
    if model_response_text:
        conversation_history.append(f"Model: {model_response_text}")


print("Chat session ended.")

## Improve post-processing

### Subtask:
Enhance the `clean_model_response` function with more sophisticated logic to better identify and remove unwanted parts of the generated text, potentially using keywords or patterns associated with the dataset remnants or incorrect responses.


**Reasoning**:
Enhance the `clean_model_response` function to better filter out unwanted conversational prefixes, remnants of system prompts, and patterns indicative of direct answers based on observations from previous chat sessions.
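
When a line filter grows this many rules, it becomes hard to tell which rule removed which line. The sketch below is a hypothetical debugging aid, not the notebook's cleaning function: `filter_with_report` and the short rule lists are illustrative stand-ins, and it returns the removed lines together with the rule that caught them.

```python
import re

# Illustrative rules only; the full lists live in the cleaning function itself.
PREFIX_RULES = ["User:", "Student:", "The answer is", "<start_of_turn>"]
PATTERN_RULES = [
    re.compile(r"=\s*\d+"),
    re.compile(r"\d+\s*times\s*\d+\s*is\s*\d+", re.IGNORECASE),
]


def filter_with_report(text):
    """Return (kept_lines, removed), where removed is a list of (line, rule) pairs."""
    kept, removed = [], []
    for line in text.split("\n"):
        stripped = line.strip()
        if not stripped:
            continue
        # Case-insensitive prefix check, mirroring the notebook's approach
        prefix = next((p for p in PREFIX_RULES if stripped.lower().startswith(p.lower())), None)
        if prefix is not None:
            removed.append((stripped, f"prefix: {prefix}"))
            continue
        pattern = next((p for p in PATTERN_RULES if p.search(stripped)), None)
        if pattern is not None:
            removed.append((stripped, f"pattern: {pattern.pattern}"))
            continue
        kept.append(stripped)
    return kept, removed


raw = (
    "Student: I think it's 8\n"
    "What do you get when you share 12 cookies among 3 friends?\n"
    "So 3x = 12"
)
kept, removed = filter_with_report(raw)
for line, rule in removed:
    print(f"removed [{rule}]: {line}")
print("kept:", kept)
```

A report like this makes it easier to spot rules that discard legitimate tutoring content, such as a line that only restates the student's equation.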



In [None]:
import os  # Used to check that the bad words file exists
import re  # Used by the regex-based direct-answer checks in the cleaning function

# Re-load the bad words list and tokenizer as they might not be in the current kernel state
# or just to ensure they are accessible in this cell's scope.
bad_words_file = "profanity-list.txt"
bad_words = []
if os.path.exists(bad_words_file):
    try:
        with open(bad_words_file, "r") as f:
            bad_words = [line.strip().lower() for line in f if line.strip()] # Convert to lower case for easier checking
    except Exception as e:
        print(f"Error loading bad words from {bad_words_file}: {e}")
else:
    print(f"Warning: Bad words file '{bad_words_file}' not found. Bad word filtering will not be active.")

# Assuming tokenizer is already loaded in a previous cell, but let's ensure it's accessible if needed.
# If not, re-import and load: from transformers import AutoTokenizer; tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

# Assuming refined_system_prompt and refined_system_prompt_lines are available from the previous cell.
# If not, re-create them:
# refined_system_prompt = """... (your refined system prompt) ..."""
# refined_system_prompt_lines = refined_system_prompt.split('\n')


# Enhance the clean_model_response function
def clean_model_response_enhanced(response, full_prompt, system_prompt_lines):
    """
    Removes prompt, unwanted conversational turns, internal steps,
    system prompt lines, and attempts at direct answers from the model response.
    """
    # Remove the prompt part from the response
    if response.startswith(full_prompt):
        response = response[len(full_prompt):].strip()

    response_lines = response.split('\n')
    processed_response = []
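    # Strip each prompt line so it can match the stripped response lines checked below
    system_prompt_set = {p.strip() for p in system_prompt_lines if p.strip()}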

    # Define patterns or prefixes to remove
    unwanted_prefixes = [
        "User:", "You:", "Student:", "Assistant:", "Instruction:",
        "Objectives:", "Thought", "Action", "Observation", "Final Answer",
        "Tutor:", "Model:", "Example", "Tone:", "Context:", "Audience:",
        "Persona:", "Solution", # Catch lines starting with "Solution"
        "<start_of_turn>", "<end_of_turn>", # Remove explicit turn markers
        "Okay, let's look at this step by step:", # Common model filler
        "Sure, let's break this down step-by-step:", # Observed filler
        "Sure, here's how we can", # Observed attempt at direct solution intro
        "The answer is", # Explicit answer phrase
        "Here's how to solve it:", # Explicit solution intro
        "Let's think about", # Another common model filler/intro
        "We can see that", # Often precedes a direct observation/answer
        "So if", # Often precedes a rephrased solution
        "Exactly correct!", # From training data
        "(probing)", "(generic)", "(specific)", "(reflection)", "(analogy)", "(scaffolding)", "(feedback)", "(remediation)", "(questioning)", "(explanation)", # MathDial specific tags
    ]

    # Regex patterns that suggest a line states a direct answer
    direct_answer_patterns = [
        r"=\s*\d+", # = followed by a number; broad enough to also match restated equations such as "3x = 12"
        r"\(x\s*[\+\-]\s*\d+\)\s*\(x\s*[\+\-]\s*\d+\)", # Common factoring pattern
        r"\d+\s*[\+\-\*/]\s*\d+\s*=\s*\d+", # Simple arithmetic equations with answer
        r"\d+\s*divided by\s*\d+\s*is\s*\d+", # Text-based arithmetic answers
        r"\d+\s*times\s*\d+\s*is\s*\d+",
        r"\d+\s*plus\s*\d+\s*is\s*\d+",
        r"\d+\s*minus\s*\d+\s*is\s*\d+",
    ]

    for line in response_lines:
        stripped_line = line.strip()
        # Remove empty lines
        if not stripped_line:
            continue

        # Check for system prompt lines
        if stripped_line in system_prompt_set:
            continue

        # Check for unwanted prefixes (case-insensitive check for prefixes)
        is_unwanted_prefix = False
        for prefix in unwanted_prefixes:
            if stripped_line.lower().startswith(prefix.lower()):
                is_unwanted_prefix = True
                break
        if is_unwanted_prefix:
            continue

        # Check for direct answer patterns (using regex for flexibility)
        is_direct_answer = False
        for pattern in direct_answer_patterns:
            if re.search(pattern, stripped_line, re.IGNORECASE):
                is_direct_answer = True
                break
        if is_direct_answer:
            # Optionally, log or print that a line was removed for debugging
            # print(f"Removed potential direct answer: {stripped_line}")
            continue

        # If the line passes all checks, add it to the processed response
        processed_response.append(line) # Append the original line, not stripped_line

    # Join the processed lines and strip any leading/trailing whitespace
    return '\n'.join(processed_response).strip()

# Import the re module for regular expressions
import re

print("Enhanced clean_model_response_enhanced function created.")

# The chat loop will need to be updated in the next step to use this new function.

**Reasoning**:
Update the chat loop to use the enhanced `clean_model_response_enhanced` function and re-test the model's responses.



In [None]:
# Update the chat loop to use the enhanced cleaning function
print("Starting a new chat session with enhanced cleaning. Type 'quit' to exit.")

# Reset conversation history for the new session
conversation_history = []

# Re-using the bad_words list, tokenizer, pipe object,
# refined_system_prompt, refined_system_prompt_lines,
# and format_chat_prompt_refined function from previous cells.
# Using the new clean_model_response_enhanced function.

while True:
    user_input = input("You: ")

    # Check for bad words in user input
    if any(word in user_input.lower() for word in bad_words):
        print("Model: Your input contains inappropriate language. The chat session has ended.")
        break

    if user_input.lower() == 'quit':
        print("Model: Goodbye!")
        break

    # Construct the full prompt using the refined function. The history passed in holds only
    # earlier turns, because the function itself adds user_input as the final user turn.
    full_prompt = format_chat_prompt_refined(refined_system_prompt, conversation_history, user_input)

    # Append user input to history only after the prompt is built, so it is not duplicated in the prompt
    conversation_history.append(f"User: {user_input}")

    # Generate text using the pipeline
    # Keeping the adjusted generation parameters.
    response = pipe(full_prompt, max_new_tokens=150, do_sample=True, top_p=0.95, top_k=50)[0]['generated_text']

    # Clean the model response using the *enhanced* cleaning function
    model_response_text = clean_model_response_enhanced(response, full_prompt, refined_system_prompt_lines)

    # If the cleaned response is empty, show a fallback message instead.
    # The fallback is kept out of the history because the model did not generate it.
    is_fallback = not model_response_text
    if is_fallback:
        model_response_text = "I'm sorry, I'm having trouble generating a helpful response right now. Could you please rephrase your question or try a different math problem?"

    print(f"Model: {model_response_text}")

    # Append only genuine model output to history for the next turn
    if not is_fallback:
        conversation_history.append(f"Model: {model_response_text}")


print("Chat session ended.")

**Reasoning**:
Based on the chat output, the enhanced cleaning function removed some unwanted parts but the model still generates problematic content like direct answers and repetition. The cleaning function can be further refined by adding more specific patterns for removal, particularly those observed in the latest chat output.
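
Repetition is hard to catch with prefix rules alone. One possible complement, sketched below under the assumption that repeats show up as identical sentences, is to keep only the first occurrence of each sentence. `drop_repeated_sentences` is a hypothetical helper, and its sentence splitting is a rough heuristic rather than a real tokenizer.

```python
import re


def drop_repeated_sentences(text):
    """Keep the first occurrence of each sentence, comparing case-insensitively."""
    # Split after '.', '!' or '?' when followed by whitespace (rough heuristic)
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    seen = set()
    kept = []
    for sentence in sentences:
        # Normalise whitespace and case so trivially different copies still match
        key = re.sub(r"\s+", " ", sentence).strip().lower()
        if not key or key in seen:
            continue
        seen.add(key)
        kept.append(sentence.strip())
    return " ".join(kept)


sample = (
    "What happens if we split 12 into 3 equal groups? "
    "What happens if we split 12 into 3 equal groups? "
    "Try drawing it."
)
print(drop_repeated_sentences(sample))
```

Note that this joins the surviving sentences with single spaces, so line breaks in the original response are lost. Generation-side options such as `repetition_penalty` or `no_repeat_ngram_size` may also reduce repetition before any cleaning is needed.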



In [None]:
# Re-load necessary components if they are not in the current kernel state
import os
import re # Ensure re is imported

# Re-load the bad words list
bad_words_file = "profanity-list.txt"
bad_words = []
if os.path.exists(bad_words_file):
    try:
        with open(bad_words_file, "r") as f:
            bad_words = [line.strip().lower() for line in f if line.strip()]
    except Exception as e:
        print(f"Error loading bad words from {bad_words_file}: {e}")
else:
    print(f"Warning: Bad words file '{bad_words_file}' not found. Bad word filtering will not be active.")

# Assuming refined_system_prompt and refined_system_prompt_lines are available from previous cells.

# Further Enhance the clean_model_response function
def clean_model_response_further_enhanced(response, full_prompt, system_prompt_lines):
    """
    Removes prompt, unwanted conversational turns, internal steps,
    system prompt lines, and attempts at direct answers from the model response,
    with further refined filtering.
    """
    # Remove the prompt part from the response
    if response.startswith(full_prompt):
        response = response[len(full_prompt):].strip()

    response_lines = response.split('\n')
    processed_response = []
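    # Strip each prompt line so it can match the stripped response lines checked below
    system_prompt_set = {p.strip() for p in system_prompt_lines if p.strip()}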

    # Define more specific unwanted prefixes and patterns
    unwanted_prefixes = [
        "User:", "You:", "Student:", "Assistant:", "Instruction:",
        "Objectives:", "Thought", "Action", "Observation", "Final Answer",
        "Tutor:", "Model:", "Example", "Tone:", "Context:", "Audience:",
        "Persona:", "Solution", # Catch lines starting with "Solution"
        "<start_of_turn>", "<end_of_turn>", # Remove explicit turn markers
        "Okay, let's look at this step by step:", # Common model filler
        "Sure, let's break this down step-by-step:", # Observed filler
        "Sure, here's how we can", # Observed attempt at direct solution intro
        "The answer is", # Explicit answer phrase
        "Here's how to solve it:", # Explicit solution intro
        "Let's think about", # Another common model filler/intro
        "We can see that", # Often precedes a direct observation/answer
        "So if", # Often precedes a rephrased solution
        "Exactly correct!", # From training data
        "(probing)", "(generic)", "(specific)", "(reflection)", "(analogy)", "(scaffolding)", "(feedback)", "(remediation)", "(questioning)", "(explanation)", # MathDial specific tags
        "Now, let's think about what we learned about", # Observed repetitive phrase
        "What's the answer?", # Observed question prompting for direct answer
        "What about", # Observed question prompting for direct answer
        "1. ", "2. ", "3. ", "4. ", "5. ", "6. ", "7. ", "8. ", "9. ", "10. ", # Numbered lists, often steps of a solution
    ]

    # Refined regex patterns that suggest a line states a direct answer
    direct_answer_patterns = [
        r"=\s*\d+", # = followed by a number; broad enough to also match restated equations such as "3x = 12"
        r"\(x\s*[\+\-]\s*\d+\)\s*\(x\s*[\+\-]\s*\d+\)", # Common factoring pattern
        r"\d+\s*[\+\-\*/]\s*\d+\s*=\s*\d+", # Simple arithmetic equations with answer
        r"\d+\s*divided by\s*\d+\s*is\s*\d+", # Text-based arithmetic answers
        r"\d+\s*times\s*\d+\s*is\s*\d+",
        r"\d+\s*plus\s*\d+\s*is\s*\d+",
        r"\d+\s*minus\s*\d+\s*is\s*\d+",
        r"add up to\s*\d+\s*and multiply to\s*\d+", # Observed pattern for finding factors directly
        r"These numbers are", # Often follows the finding factors pattern
    ]


    for line in response_lines:
        stripped_line = line.strip()
        # Remove empty lines
        if not stripped_line:
            continue

        # Check for system prompt lines
        if stripped_line in system_prompt_set:
            continue

        # Check for unwanted prefixes (case-insensitive check for prefixes)
        is_unwanted_prefix = False
        for prefix in unwanted_prefixes:
            if stripped_line.lower().startswith(prefix.lower()):
                is_unwanted_prefix = True
                break
        if is_unwanted_prefix:
            continue

        # Check for direct answer patterns (using regex for flexibility)
        is_direct_answer = False
        for pattern in direct_answer_patterns:
            if re.search(pattern, stripped_line, re.IGNORECASE):
                is_direct_answer = True
                break
        if is_direct_answer:
            # Optionally, log or print that a line was removed for debugging
            # print(f"Removed potential direct answer: {stripped_line}")
            continue

        # If the line passes all checks, add it to the processed response
        processed_response.append(line) # Append the original line, not stripped_line

    # Join the processed lines and strip any leading/trailing whitespace
    return '\n'.join(processed_response).strip()

print("Further enhanced clean_model_response_further_enhanced function created.")

# The chat loop will need to be updated in the next step to use this new function.

**Reasoning**:
Update the chat loop to use the further enhanced `clean_model_response_further_enhanced` function and re-test the model's responses to evaluate the effectiveness of the refined cleaning logic.
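
The chat loop's profanity check uses substring matching (`word in user_input.lower()`), which can flag harmless words that merely contain a listed word. A minimal sketch of a whole-word alternative follows; `build_profanity_pattern` and `contains_profanity` are hypothetical helpers, and the two placeholder words stand in for the real list loaded from `profanity-list.txt`.

```python
import re


def build_profanity_pattern(words):
    """Compile one regex that matches any listed word or phrase as a whole word."""
    escaped = [re.escape(w.strip().lower()) for w in words if w.strip()]
    if not escaped:
        return None
    # Longest entries first so multi-word phrases are tried before their prefixes
    escaped.sort(key=len, reverse=True)
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)


def contains_profanity(text, pattern):
    """True if the text contains a listed entry as a standalone word or phrase."""
    return pattern is not None and pattern.search(text) is not None


# Placeholder words; the real list would come from profanity-list.txt
pattern = build_profanity_pattern(["darn", "heck"])
print(contains_profanity("What the heck is a variable?", pattern))
print(contains_profanity("I need to check my answer", pattern))
```

With substring matching, the second sentence would end the session because "check" contains "heck"; the whole-word pattern lets it through.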



In [None]:
# Update the chat loop to use the further enhanced cleaning function
print("Starting a new chat session with further enhanced cleaning. Type 'quit' to exit.")

# Reset conversation history for the new session
conversation_history = []

# Re-using the bad_words list, tokenizer, pipe object,
# refined_system_prompt, refined_system_prompt_lines,
# and format_chat_prompt_refined function from previous cells.
# Using the new clean_model_response_further_enhanced function.

while True:
    user_input = input("You: ")

    # Check for bad words in user input
    if any(word in user_input.lower() for word in bad_words):
        print("Model: Your input contains inappropriate language. The chat session has ended.")
        break

    if user_input.lower() == 'quit':
        print("Model: Goodbye!")
        break

    # Construct the full prompt using the refined function. The history passed in holds only
    # earlier turns, because the function itself adds user_input as the final user turn.
    full_prompt = format_chat_prompt_refined(refined_system_prompt, conversation_history, user_input)

    # Append user input to history only after the prompt is built, so it is not duplicated in the prompt
    conversation_history.append(f"User: {user_input}")

    # Generate text using the pipeline
    # Keeping the adjusted generation parameters.
    response = pipe(full_prompt, max_new_tokens=150, do_sample=True, top_p=0.95, top_k=50)[0]['generated_text']

    # Clean the model response using the *further enhanced* cleaning function
    model_response_text = clean_model_response_further_enhanced(response, full_prompt, refined_system_prompt_lines)

    # If the cleaned response is empty, show a fallback message instead.
    # The fallback is kept out of the history because the model did not generate it.
    is_fallback = not model_response_text
    if is_fallback:
        model_response_text = "I'm sorry, I'm having trouble generating a helpful response right now. Could you please rephrase your question or try a different math problem?"

    print(f"Model: {model_response_text}")

    # Append only genuine model output to history for the next turn
    if not is_fallback:
        conversation_history.append(f"Model: {model_response_text}")


print("Chat session ended.")

## Iterate and evaluate

### Subtask:
Run the chat again with the modified code and analyze the model's responses to see if the changes have improved its behavior. Repeat steps 1-4 as needed.


**Reasoning**:
Run the chat again with the modified code, including the refined system prompt, the refined prompt formatting function, and the further enhanced cleaning function, to analyze the model's responses and determine if there is any improvement.
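
Interactive sessions are hard to compare across iterations. One option, sketched below, is to replay a fixed script of student turns and record a few simple statistics. `evaluate_session` is a hypothetical harness; `fake_generate` and `fake_clean` are stand-ins so the sketch runs without a model, and the chosen metrics (empty-response rate, share of replies containing a question) are only illustrative proxies for tutoring quality.

```python
def evaluate_session(student_turns, generate_fn, clean_fn):
    """Run scripted student turns and collect simple per-session statistics."""
    records = []
    for turn in student_turns:
        raw = generate_fn(turn)
        cleaned = clean_fn(raw)
        records.append({
            "input": turn,
            "raw_chars": len(raw),
            "cleaned_chars": len(cleaned),
            "empty_after_cleaning": not cleaned,
            "asks_question": "?" in cleaned,
        })
    n = len(records)
    summary = {
        "turns": n,
        "empty_rate": sum(r["empty_after_cleaning"] for r in records) / n if n else 0.0,
        "question_rate": sum(r["asks_question"] for r in records) / n if n else 0.0,
    }
    return records, summary


# Stand-ins so the sketch runs without a model; swap in the real pipeline and cleaner
def fake_generate(turn):
    canned = {
        "hi": "Welcome! What problem are you working on?",
        "solve 2x = 8": "The answer is x = 4",
    }
    return canned.get(turn, "")


def fake_clean(text):
    return "" if text.startswith("The answer is") else text.strip()


records, summary = evaluate_session(["hi", "solve 2x = 8"], fake_generate, fake_clean)
print(summary)
```

Comparing `raw_chars` with `cleaned_chars` per turn also gives a rough sense of how much of each response the cleaning step discards.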



In [None]:
# Update the chat loop to use the further enhanced cleaning function from cell 9a29201c
print("Starting a new chat session with further enhanced cleaning and refined prompt engineering. Type 'quit' to exit.")

# Reset conversation history for the new session
conversation_history = []

# Re-using the bad_words list, tokenizer, pipe object,
# refined_system_prompt, refined_system_prompt_lines,
# and format_chat_prompt_refined function from previous cells.
# Using the new clean_model_response_further_enhanced function from cell 9a29201c.

while True:
    user_input = input("You: ")

    # Check for bad words in user input
    if any(word in user_input.lower() for word in bad_words):
        print("Model: Your input contains inappropriate language. The chat session has ended.")
        break

    if user_input.lower() == 'quit':
        print("Model: Goodbye!")
        break

    # Construct the full prompt using the refined function. The history passed in holds only
    # earlier turns, because the function itself adds user_input as the final user turn.
    full_prompt = format_chat_prompt_refined(refined_system_prompt, conversation_history, user_input)

    # Append user input to history only after the prompt is built, so it is not duplicated in the prompt
    conversation_history.append(f"User: {user_input}")

    # Generate text using the pipeline
    # Keeping the adjusted generation parameters.
    response = pipe(full_prompt, max_new_tokens=150, do_sample=True, top_p=0.95, top_k=50)[0]['generated_text']

    # Clean the model response using the *further enhanced* cleaning function
    model_response_text = clean_model_response_further_enhanced(response, full_prompt, refined_system_prompt_lines)

    # If the cleaned response is empty, show a fallback message instead.
    # The fallback is kept out of the history because the model did not generate it.
    is_fallback = not model_response_text
    if is_fallback:
        model_response_text = "I'm sorry, I'm having trouble generating a helpful response right now. Could you please rephrase your question or try a different math problem?"

    print(f"Model: {model_response_text}")

    # Append only genuine model output to history for the next turn
    if not is_fallback:
        conversation_history.append(f"Model: {model_response_text}")


print("Chat session ended.")

**Reasoning**:
Analyze the chat output from the last execution to evaluate the model's performance after applying the refined prompt engineering and enhanced cleaning function.



In [None]:
# Analyze the chat output manually based on the executed cell's output.
# The analysis is based on the provided chat output from the last execution.

print("Analyzing the chat output after applying refined prompt engineering and enhanced cleaning.")

# --- Analysis based on the provided chat log ---

print("\n--- Observation of Model's Responses ---")

# Response 1:
# Model: Welcome! I'm your friendly Pre-Algebra tutor. I'm here to help you conquer the world of math together.
# Let's start by discussing numbers and how they relate to each other. What are some numbers you're familiar with? Can you tell me the difference between a number and a quantity?
print("\nResponse 1 Analysis:")
print("- The initial greeting is appropriate for the persona.")
print("- It attempts to engage the student with math-related questions, which aligns with the tutoring role.")
print("- It does not contain obvious training data remnants or direct answers.")
print("- The cleaning function appears to have worked effectively on this response.")

# Response 2:
# Model: Sure, what's your question? 🤔
print("\nResponse 2 Analysis:")
print("- This response is very short and somewhat generic.")
print("- It doesn't actively guide or provide a hint, just asks for the question again.")
print("- It might be a result of the cleaning function being too aggressive or the model generating minimal text.")

# Response 3:
# Model: Sure, let's break it down step-by-step:
print("\nResponse 3 Analysis:")
print("- This is a common introductory phrase observed in previous chats, indicating it might be a pattern the model learned from the training data.")
print("- While it *introduces* the idea of step-by-step, it doesn't actually *provide* a step or question.")
print("- This suggests the cleaning function *might* have removed the subsequent content, or the model stopped generating after this phrase.")

# Response 4:
# Model: Goodbye!
print("\nResponse 4 Analysis:")
print("- This is the expected response when the user types 'quit'.")

print("\n--- Overall Assessment ---")
print("1. Improvement in removing training data remnants:")
print("   - The enhanced cleaning function seems to be more effective at removing explicit training data artifacts and conversational prefixes.")
print("   - The initial response is clean and on-persona.")
print("   - However, the presence of phrases like 'Sure, let's break it down step-by-step:' suggests some learned patterns from the training data are still being generated, even if subsequent content is cleaned.")

print("2. Adherence to Math Tutor Persona:")
print("   - The initial response is good, engaging the student appropriately.")
print("   - Subsequent responses ('Sure, what's your question? 🤔', 'Sure, let's break it down step-by-step:') are weak and do not demonstrate effective step-by-step guidance or helpful hints as required by the persona instructions.")
print("   - The model is not actively tutoring based on this limited interaction.")

print("3. Effectiveness of Cleaning Function:")
print("   - The cleaning function successfully removed obvious unwanted text from the first response.")
print("   - It might be too aggressive, potentially removing legitimate helpful content, or the model is generating very little useful text to begin with, leading to short, empty-feeling responses after cleaning.")

print("\n--- Conclusion ---")
print("The combination of refined prompt engineering and enhanced cleaning has *improved* the *appearance* of the model's output by removing more unwanted text.")
print("However, it has *not* fundamentally changed the model's *behavior* to consistently act as a helpful, step-by-step math tutor.")
print("The model still struggles to generate substantive, persona-aligned responses, and some learned patterns from the training data persist in the raw generation before cleaning.")
print("Further iterations might involve:")
print("- Adjusting cleaning patterns (less aggressive?).")
print("- Revisiting the fine-tuning data (more diverse examples, different format?).")
print("- Exploring different training parameters or techniques (e.g., more data, different base model, more sophisticated PEFT).")

## Summary:

### Data Analysis Key Findings

*   The initial fine-tuned model's responses contained remnants of the training data (e.g., "Example 2", "Tone:", conversational turn prefixes) and provided direct answers to math problems instead of tutoring.
*   Analysis of the `mathdial` dataset revealed that the raw data included specific markers ("Teacher:", "Student:", "|EOM|") and potentially meta-information that the model might be memorizing.
*   Training for 3 epochs on the small 202-sample `mathdial` dataset was identified as a likely cause of overfitting, contributing to the model regurgitating training data patterns. Reducing epochs to 1 showed some improvement but did not fully resolve the issue.
*   Refining the system prompt with more explicit instructions and modifying the prompt formatting function to use explicit turn roles (`<start_of_turn>user`, `<start_of_turn>model`) and limit history length improved the *appearance* of responses by reducing some unwanted text.
*   Enhancing the post-processing `clean_model_response` function to remove more unwanted prefixes, common filler phrases, training data tags, and patterns indicative of direct answers (`= \d+`, factoring patterns, arithmetic equations) further improved the cleanliness of the output.
*   Despite refined prompt engineering and enhanced cleaning, the model continued to struggle with consistently providing substantive, step-by-step tutoring and occasionally still generated content that had to be aggressively filtered, sometimes resulting in empty responses.

### Insights or Next Steps

*   Post-processing and prompt engineering can mask some issues but do not fundamentally alter the model's behavior learned during fine-tuning.
*   Further improvements likely require addressing the root cause through data curation (e.g., cleaning or augmenting the `mathdial` dataset to remove problematic patterns and add more diverse tutoring examples) or exploring alternative fine-tuning strategies.
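
As one illustration of the data-curation direction, the sketch below converts a raw conversation string into Gemma turn markup and drops a leading parenthesised tag from each turn. It is a sketch under assumptions, not the notebook's preprocessing: `mathdial_to_gemma` is hypothetical, the sample string is invented, tags such as `(probing)` are assumed to appear at the start of a turn, and speakers are assumed to be labelled `Teacher` and `Student` as noted in the findings. If the raw files label the student differently, `ROLE_MAP` would need adjusting.

```python
import re

LEADING_TAG = re.compile(r"^\(\w+\)\s*")  # e.g. "(probing)"; assumed to lead a turn
ROLE_MAP = {"teacher": "model", "student": "user"}  # assumed speaker labels


def mathdial_to_gemma(conversation):
    """Convert 'Teacher: ...|EOM|Student: ...' into Gemma turn markup."""
    turns = []
    for chunk in conversation.split("|EOM|"):
        chunk = chunk.strip()
        if ":" not in chunk:
            continue  # skip fragments without a speaker label
        speaker, text = chunk.split(":", 1)
        role = ROLE_MAP.get(speaker.strip().lower())
        if role is None:
            continue  # skip unrecognised speakers rather than invent a role
        text = LEADING_TAG.sub("", text.strip())
        if text:
            turns.append(f"<start_of_turn>{role}\n{text}<end_of_turn>")
    return "\n".join(turns)


# Invented sample in the assumed raw format
sample = (
    "Teacher: (probing)Can you walk me through your steps?|EOM|"
    "Student: I added 3 and 4.|EOM|"
    "Teacher: (generic)Okay, what comes next?"
)
converted = mathdial_to_gemma(sample)
print(converted)
```

Removing the tags before fine-tuning, rather than filtering them from the output afterwards, would keep the model from learning to produce them in the first place.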


In [None]:
import json
import os
from torch.utils.data import Dataset
from transformers import TrainingArguments, BitsAndBytesConfig
import torch
from peft import LoraConfig, get_peft_model # Import necessary PEFT components if not already imported

# --- Step 1: Examine raw mathdial data ---
print("--- Examining raw mathdial data examples ---")
# Reload a small sample of the raw data to inspect its structure
def load_mathdial_data(directory, limit_per_file=None):
    data = []
    data_path = os.path.join(directory, 'data')
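    for filename in sorted(os.listdir(data_path)):  # Sort for a reproducible file order
        if filename.endswith('.jsonl'):
            filepath = os.path.join(data_path, filename)
            with open(filepath, 'r', encoding='utf-8') as f:  # Explicit encoding avoids platform-dependent defaults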
                lines_read = 0
                for line in f:
                    if limit_per_file is not None and lines_read >= limit_per_file:
                        break
                    data.append(json.loads(line))
                    lines_read += 1
    return data

mathdial_data_sample = load_mathdial_data('mathdial', limit_per_file=2) # Load only 2 examples per file

for i, item in enumerate(mathdial_data_sample):
    print(f"\n--- Raw Data Item {i+1} ---")
    print(json.dumps(item, indent=2))
    if i >= 5: # Print a few examples
        break

# --- Step 2: Review format_conversation_string function ---
print("\n--- Reviewing format_conversation_string function ---")
# The function is already defined in a previous cell (b94c0828).
# Let's test it with a sample raw conversation string.
# Ensure the format_conversation_string function is defined or accessible
# If it's not defined in this cell, you would need to copy it or ensure it's run before this cell.
# Assuming format_conversation_string is available from a previous execution.
sample_conversation_string = mathdial_data_sample[0]['conversation'] if mathdial_data_sample and 'conversation' in mathdial_data_sample[0] else "Teacher: Hello|EOM|Student: Hi|EOM|Teacher: How are you?"
print(f"Sample raw string:\n{sample_conversation_string}")

# Define format_conversation_string here for execution in this cell
def format_conversation_string(conversation_string):
    formatted_text = ""
    turns = conversation_string.split('|EOM|')
    for turn in turns:
        stripped_turn = turn.strip()
        if stripped_turn: # Ensure the turn is not empty after stripping
            # Assuming the format is "Speaker: Text"
            if ":" in stripped_turn:
                speaker, text = stripped_turn.split(':', 1) # Split only on the first colon
                formatted_text += f"{speaker.strip()}: {text.strip()}\n"
            else:
                # If no colon, just include the stripped text as a turn
                formatted_text += f"Unknown: {stripped_turn}\n"
    return formatted_text.strip()

formatted_sample = format_conversation_string(sample_conversation_string)
print(f"Formatted sample:\n{formatted_sample}")
print("Observation: The function seems to correctly split by '|EOM|' and format turns.")


# --- Step 3: Review TokenizedDataset class and labels ---
print("\n--- Reviewing TokenizedDataset class and labels ---")
# The class is defined in a previous cell (32d3e05c).
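# The labels are set as item["labels"] = item["input_ids"].clone().
# This is standard for causal language modeling where the model predicts the next token.
# The model is trained to predict the input tokens shifted by one position.
# Caveat: if sequences are padded, padding positions are usually set to -100 in the labels
# so the loss ignores them; a plain clone of input_ids does not do that by itself.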
print("Observation: Labels are correctly set to be the input_ids for causal language modeling.")

# --- Step 4: Examine TrainingArguments ---
print("\n--- Examining TrainingArguments ---")
# training_args is defined in a previous cell (f574b3a5). If that cell has not been run,
# fall back to a dummy object so the prints below still work; in that case the values
# shown are library defaults, not the settings used for fine-tuning.
try:
    training_args  # raises NameError if the earlier cell has not been run
except NameError:
    print("training_args object not found. Defining a dummy for inspection.")
    from transformers import TrainingArguments
    training_args = TrainingArguments(output_dir="./dummy_output")  # default values only

print(f"Output directory: {training_args.output_dir}")
print(f"Number of train epochs: {training_args.num_train_epochs}")
print(f"Per device train batch size: {training_args.per_device_train_batch_size}")
print(f"Gradient accumulation steps: {training_args.gradient_accumulation_steps}")
print(f"Learning rate: {training_args.learning_rate}")
print(f"Weight decay: {training_args.weight_decay}")
print(f"Logging steps: {training_args.logging_steps}")
print(f"Save strategy: {training_args.save_strategy}")

print("Observation: The training arguments seem reasonable, with reduced batch size and gradient accumulation.")
print("However, 3 epochs might be too many given the small dataset size (202 samples), potentially leading to overfitting.")


# --- Step 5: Examine LoRA configuration ---
print("\n--- Examining LoRA configuration ---")
# lora_config is defined in a previous cell (32d3e05c). If that cell has not been run,
# fall back to a dummy object so the prints below still work; in that case the values
# shown are library defaults, not the configuration used for fine-tuning.
try:
    lora_config  # raises NameError if the earlier cell has not been run
except NameError:
    print("lora_config object not found. Defining a dummy for inspection.")
    from peft import LoraConfig
    lora_config = LoraConfig()  # default values only

print(f"LoRA r: {lora_config.r}")
print(f"LoRA alpha: {lora_config.lora_alpha}")
print(f"Target modules: {lora_config.target_modules}")
print(f"Bias: {lora_config.bias}")
print(f"Task type: {lora_config.task_type}")

print("Observation: LoRA configuration seems appropriate for Gemma and causal language modeling.")


print("\n--- Summary of findings ---")
print("1. Raw data contains conversations, but also potentially includes meta-information or specific conversational patterns that the model might be memorizing.")
print("2. The formatting function appears to be working as intended, splitting and cleaning turns.")
print("3. The dataset class and label creation are correct for causal language modeling.")
print("4. Training arguments are mostly reasonable, but 3 epochs on a small dataset might cause overfitting, leading to the model regurgitating training data patterns.")
print("5. LoRA configuration is suitable.")

print("\nConclusion: Overfitting due to the small dataset size and number of epochs is a likely contributor to the model outputting training data remnants. The training data itself might also contain patterns the model is over-learning.")
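
Step 3 above accepts `labels = input_ids.clone()` as is. One caveat applies if `TokenizedDataset` pads every example to a fixed length, which is an assumption here since that cell is not shown: padded positions then carry real token ids as labels and contribute to the loss. The sketch below shows the usual remedy with plain Python lists. `mask_padding` is a hypothetical helper name, and `-100` is the label value that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_padding(input_ids, attention_mask):
    """Return labels equal to input_ids, with padded positions replaced by IGNORE_INDEX.

    attention_mask holds 1 for real tokens and 0 for padding, as produced by
    Hugging Face tokenizers when padding is enabled.
    """
    return [
        token if keep == 1 else IGNORE_INDEX
        for token, keep in zip(input_ids, attention_mask)
    ]


# Toy example: three real tokens followed by two pad tokens (id 0 is a made-up pad id)
labels = mask_padding([2, 10, 11, 0, 0], [1, 1, 1, 0, 0])
print(labels)
```

With tensors, the same idea is typically written as `labels = input_ids.clone()` followed by `labels[attention_mask == 0] = -100`.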

In [None]:
# Review the current system_prompt
print("--- Current system_prompt ---")
print(system_prompt)

# Review the current format_chat_prompt function
print("\n--- Current format_chat_prompt function ---")
# The function is defined in cell e07a2d89. inspect.getsource usually works for functions
# defined in earlier notebook cells; fall back to a description if the source is unavailable.
import inspect
try:
    print(inspect.getsource(format_chat_prompt))
except (NameError, OSError, TypeError):
    print("Function format_chat_prompt is defined to include system_prompt, history, and user input.")
    print("History is joined by newline characters.")


# 1. Identify areas for improvement in system_prompt
# - Explicitly re-emphasize step-by-step guidance and avoiding direct answers.
# - Use formatting (like bullet points) to make key instructions stand out.
# - Ensure clarity on how to handle being stuck (offer similar problems, not solutions).
# - Clearly state the expectation of waiting for the student's response.

# 2. Create a new, refined version of the system_prompt
refined_system_prompt = """Persona: You are a patient, friendly, and professional math tutor specializing in Pre-Algebra. You maintain firm boundaries with your student and only engage with Pre-Algebra and below.

Instruction:
- **Guide the student step-by-step:** Break down problems into smaller, manageable steps.
- **DO NOT give the answer directly:** Your role is to facilitate learning, not provide solutions.
- **Present one idea, hint, or question at a time:** Wait for the student's response before moving on.
- **Use analogies and real-world scenarios:** Only use these when the student needs a different perspective.
- **If the student is stuck:** Offer a *similar* problem for practice; do not solve the current step for them.
- **Let the student solve every step independently:** Never provide the final answer until the student reaches it first.
- **Catch and explain mistakes:** Point out errors and help the student understand why the mistake occurred.
- **Ignore unrelated or inappropriate topics:** If the student deviates, gently redirect or ignore.
- **Terminate chat for inappropriate language:** End the session immediately for rude, crass, inappropriate, or hateful language, with no second chances.

Context: You are a helpful AI tutor assisting middle school students (12-14 years old) with Pre-Algebra concepts. Assume basic arithmetic knowledge.

Audience: Middle school students (12-14 years old) with limited prior knowledge (basic arithmetic) and adolescent thought processes. Employ effective K-12 pedagogy, including multiple learning modalities.

Tone: Encourage and provide positive reinforcement. Create a comfortable environment for vulnerability.

Examples: (Placeholder for potential future examples if needed)

"""

print("\n--- Refined system_prompt ---")
print(refined_system_prompt)


# 3. Examine format_chat_prompt and consider alternatives
# Current: system_prompt + history (newline separated) + User: user_input + Model:
# Alternative considerations:
# - Add specific turn separators like "[SEP]" or "<start_turn>User: ... <end_turn>"
# - Limit history length more aggressively or summarize parts (though summarization is complex).
# - Structure turns explicitly using role markers, as in Gemma's format: <start_of_turn>user ... <end_of_turn>, <start_of_turn>model ...

# Let's try structuring turns explicitly using roles similar to common model formats
# This might help the model distinguish between user and assistant turns more clearly.

# 4. Implement the changes to the format_chat_prompt function
def format_chat_prompt_refined(system_prompt, conversation_history, user_input, history_length=6): # Reduced history length
    """Formats the prompt using explicit roles and refined history structure."""
    formatted_history = []
    # Format history with explicit roles, keeping only the last history_length turns
    for turn in conversation_history[-history_length:]:
        if turn.startswith("User:"):
            formatted_history.append(f"<start_of_turn>user\n{turn[len('User:'):].strip()}<end_of_turn>")
        elif turn.startswith("Model:"):
            formatted_history.append(f"<start_of_turn>model\n{turn[len('Model:'):].strip()}<end_of_turn>")
        else:
            # Fallback for turns without a known prefix. The chat loop only stores "User:" and
            # "Model:" turns, so this branch should not trigger. Note that "unknown" is not one
            # of the roles (user, model) used in Gemma's turn format.
            formatted_history.append(f"<start_of_turn>unknown\n{turn.strip()}<end_of_turn>")


    history_string = "\n".join(formatted_history)

    # Construct the full prompt
    # Use the refined system prompt
    full_prompt = f"""{system_prompt}

{history_string}
<start_of_turn>user
{user_input}<end_of_turn>
<start_of_turn>model
"""
    return full_prompt

print("\n--- Refined format_chat_prompt_refined function created ---")

# 5. Update the chat loop to use the refined prompt engineering
# This will be done in the next code block where the chat loop is executed.

# Need the refined system_prompt lines for the cleaning function
refined_system_prompt_lines = refined_system_prompt.split('\n')
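
The function above places the system prompt before the first `<start_of_turn>` marker. As far as I know, Gemma's instruction-tuned chat format defines only the `user` and `model` roles, so a common workaround is to fold the system text into the first user turn. The sketch below does that in plain Python. `build_gemma_prompt` and its `(role, text)` history format are hypothetical and differ from the `"User: ..."` strings used in this notebook; the BOS token is assumed to be added by the tokenizer.

```python
def build_gemma_prompt(system_prompt, history, user_input, history_length=6):
    """Build a Gemma-style prompt from (role, text) tuples, role being 'user' or 'model'."""
    turns = list(history[-history_length:]) + [("user", user_input)]

    # After truncation the window may start with a model turn; drop it so the
    # prompt always opens with a user turn. The final turn is always a user turn.
    while turns[0][0] != "user":
        turns.pop(0)

    parts = []
    system_pending = True
    for role, text in turns:
        if role not in ("user", "model"):
            raise ValueError(f"unsupported role: {role!r}")
        text = text.strip()
        if role == "user" and system_pending:
            # No system role is assumed: prepend the instructions to the first user turn
            text = f"{system_prompt.strip()}\n\n{text}"
            system_pending = False
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")

    # Open the model turn so generation continues as the tutor
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


example = build_gemma_prompt("SYS", [("user", "hi"), ("model", "hello")], "what is x?")
print(example)
```

When the tokenizer is loaded, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is likely the more robust route, since it uses the template shipped with the model.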

In [None]:
import os # Ensure os is imported if needed for accessing external files like the bad words list.

# Re-load the bad words list and tokenizer as they might not be in the current kernel state
# or just to ensure they are accessible in this cell's scope.
bad_words_file = "profanity-list.txt"
bad_words = []
if os.path.exists(bad_words_file):
    try:
        with open(bad_words_file, "r") as f:
            bad_words = [line.strip().lower() for line in f if line.strip()] # Convert to lower case for easier checking
    except Exception as e:
        print(f"Error loading bad words from {bad_words_file}: {e}")
else:
    print(f"Warning: Bad words file '{bad_words_file}' not found. Bad word filtering will not be active.")

# Assuming tokenizer is already loaded in a previous cell, but let's ensure it's accessible if needed.
# If not, re-import and load: from transformers import AutoTokenizer; tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

# Assuming refined_system_prompt and refined_system_prompt_lines are available from the previous cell.
# If not, re-create them:
# refined_system_prompt = """... (your refined system prompt) ..."""
# refined_system_prompt_lines = refined_system_prompt.split('\n')


# Enhance the clean_model_response function
def clean_model_response_enhanced(response, full_prompt, system_prompt_lines):
    """
    Removes prompt, unwanted conversational turns, internal steps,
    system prompt lines, and attempts at direct answers from the model response.
    """
    # Remove the prompt part from the response
    if response.startswith(full_prompt):
        response = response[len(full_prompt):].strip()

    response_lines = response.split('\n')
    processed_response = []
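    # Strip the system prompt lines so they compare equal to the stripped response lines below
    system_prompt_set = {line.strip() for line in system_prompt_lines if line.strip()}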

    # Define patterns or prefixes to remove
    unwanted_prefixes = [
        "User:", "You:", "Student:", "Assistant:", "Instruction:",
        "Objectives:", "Thought", "Action", "Observation", "Final Answer",
        "Tutor:", "Model:", "Example", "Tone:", "Context:", "Audience:",
        "Persona:", "Solution", # Catch lines starting with "Solution"
        "<start_of_turn>", "<end_of_turn>", # Remove explicit turn markers
        "Okay, let's look at this step by step:", # Common model filler
        "Sure, let's break this down step-by-step:", # Observed filler
        "Sure, here's how we can", # Observed attempt at direct solution intro
        "The answer is", # Explicit answer phrase
        "Here's how to solve it:", # Explicit solution intro
        "Let's think about", # Another common model filler/intro
        "We can see that", # Often precedes a direct observation/answer
        "So if", # Often precedes a rephrased solution
        "Exactly correct!", # From training data
        "(probing)", "(generic)", "(specific)", "(reflection)", "(analogy)", "(scaffolding)", "(feedback)", "(remediation)", "(questioning)", "(explanation)", # MathDial specific tags
    ]

    # Regex patterns that suggest the model is stating an answer outright.
    # Every pattern needs an '=' sign, a factored form, or an "N <operation> N is N" phrase.
    direct_answer_patterns = [
        r"=\s*\d+", # Simple check for = followed by a number
        r"\(x\s*[\+\-]\s*\d+\)\s*\(x\s*[\+\-]\s*\d+\)", # Common factoring pattern
        r"\d+\s*[\+\-\*/]\s*\d+\s*=\s*\d+", # Simple arithmetic equations with answer
        r"\d+\s*divided by\s*\d+\s*is\s*\d+", # Text-based arithmetic answers
        r"\d+\s*times\s*\d+\s*is\s*\d+",
        r"\d+\s*plus\s*\d+\s*is\s*\d+",
        r"\d+\s*minus\s*\d+\s*is\s*\d+",
    ]

    for line in response_lines:
        stripped_line = line.strip()
        # Remove empty lines
        if not stripped_line:
            continue

        # Check for system prompt lines
        if stripped_line in system_prompt_set:
            continue

        # Check for unwanted prefixes (case-insensitive)
        lowered_line = stripped_line.lower()
        if any(lowered_line.startswith(prefix.lower()) for prefix in unwanted_prefixes):
            continue

        # Check for direct answer patterns; a match drops the whole line
        if any(re.search(pattern, stripped_line, re.IGNORECASE) for pattern in direct_answer_patterns):
            # Optionally, log or print that a line was removed for debugging
            # print(f"Removed potential direct answer: {stripped_line}")
            continue

        # If the line passes all checks, add it to the processed response
        processed_response.append(line) # Append the original line, not stripped_line

    # Join the processed lines and strip any leading/trailing whitespace
    return '\n'.join(processed_response).strip()

# Import the re module for regular expressions
import re

print("Enhanced clean_model_response_enhanced function created.")

# The chat loop will need to be updated in the next step to use this new function.
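
The cleaning function above discards an entire line as soon as one direct-answer pattern matches, which can also throw away useful tutoring text on the same line and leave the response empty. A gentler variant is to filter sentence by sentence. The sketch below is a minimal illustration of that idea, not the notebook's method: `drop_answer_sentences` is a hypothetical helper, it uses only two of the patterns from above, and its sentence splitting is deliberately naive.

```python
import re

# Two patterns taken from the cleaning function above, compiled once
ANSWER_PATTERNS = [
    re.compile(r"=\s*\d+"),
    re.compile(r"\d+\s*(?:plus|minus|times|divided by)\s*\d+\s*is\s*\d+", re.IGNORECASE),
]


def drop_answer_sentences(line):
    """Remove only the sentences that match a direct-answer pattern."""
    # Naive split: break after '.', '!' or '?' when followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", line.strip())
    kept = [
        sentence
        for sentence in sentences
        if sentence and not any(p.search(sentence) for p in ANSWER_PATTERNS)
    ]
    return " ".join(kept)


print(drop_answer_sentences("Nice work! So 5 + 3 = 8. What comes next?"))
```

In this example only the middle sentence is removed, so the encouragement and the follow-up question survive.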

In [None]:
# Update the chat loop to use the enhanced cleaning function
print("Starting a new chat session with enhanced cleaning. Type 'quit' to exit.")

# Reset conversation history for the new session
conversation_history = []

# Re-using the bad_words list, tokenizer, pipe object,
# refined_system_prompt, refined_system_prompt_lines,
# and format_chat_prompt_refined function from previous cells.
# Using the new clean_model_response_enhanced function.

while True:
    user_input = input("You: ")

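    # Check for bad words in user input. Match whole words or phrases rather than substrings,
    # so that harmless words containing a listed word do not end the session.
    # (re is imported in the cell that defines clean_model_response_enhanced.)
    lowered_input = user_input.lower()
    if any(re.search(rf"(?<!\w){re.escape(word)}(?!\w)", lowered_input) for word in bad_words):
        print("Model: Your input contains inappropriate language. The chat session has ended.")
        break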

    if user_input.lower() == 'quit':
        print("Model: Goodbye!")
        break

    # Build the prompt from the history of *previous* turns. format_chat_prompt_refined
    # appends user_input itself, so adding it to the history first would duplicate it.
    full_prompt = format_chat_prompt_refined(refined_system_prompt, conversation_history, user_input)

    # Record the user turn (in the "User: ..." format the prompt function expects) for later prompts
    conversation_history.append(f"User: {user_input}")

    # Generate text using the pipeline
    # Keeping the adjusted generation parameters.
    response = pipe(full_prompt, max_new_tokens=150, do_sample=True, top_p=0.95, top_k=50)[0]['generated_text']

    # Clean the model response using the *enhanced* cleaning function
    model_response_text = clean_model_response_enhanced(response, full_prompt, refined_system_prompt_lines)

    # If cleaning removed everything, show a fallback message but keep it out of the
    # history, since the model did not generate it
    fallback_message = "I'm sorry, I'm having trouble generating a helpful response right now. Could you please rephrase your question or try a different math problem?"

    if model_response_text:
        print(f"Model: {model_response_text}")
        # Append model response to history for the next turn
        conversation_history.append(f"Model: {model_response_text}")
    else:
        print(f"Model: {fallback_message}")


print("Chat session ended.")

In [None]:
# Analyze the chat output manually based on the executed cell's output.
# The analysis is based on the provided chat output from the last execution.

print("Analyzing the chat output after applying refined prompt engineering and enhanced cleaning.")

# --- Analysis based on the provided chat log ---

print("\n--- Observation of Model's Responses ---")

# Response 1:
# Model: Welcome! I'm your friendly math tutor here to help you conquer Pre-Algebra. Let's get started!
# What's your name?
print("\nResponse 1 Analysis:")
print("- The initial greeting is appropriate and on-persona.")
print("- It asks a relevant question to start the interaction.")
print("- No obvious unwanted text or dataset remnants are present.")
print("- The cleaning function seems effective here.")

# Response 2:
# Model: It's great to meet you, Maria! I'm happy to be here to guide you through this exciting world of mathematics. Let's begin by discussing the building blocks of Pre-Algebra, which are numbers and operations. What's the difference between a number and an operation?
print("\nResponse 2 Analysis:")
print("- Continues with a friendly and encouraging tone.")
print("- Introduces a relevant Pre-Algebra concept (numbers and operations).")
print("- Asks a guiding question to engage the student.")
print("- No obvious unwanted text.")
print("- The cleaning function seems effective.")

# Response 3:
# Model: That's a great start, Maria! It seems you're ready to embark on your math journey. Let's explore some of these numbers and operations together. What are some basic whole numbers you know?
print("\nResponse 3 Analysis:")
print("- Provides positive reinforcement.")
print("- Follows up on the student's response.")
print("- Asks another guiding question related to the topic.")
print("- No obvious unwanted text.")
print("- The cleaning function seems effective.")

# Response 4:
# Model: That's fantastic! Now, let's talk about addition and subtraction. These are two of the most fundamental operations in Pre-Algebra. How would you add two numbers together?
# What about subtracting a number from another?
print("\nResponse 4 Analysis:")
print("- Continues with positive reinforcement.")
print("- Introduces new concepts (addition and subtraction) as fundamental operations.")
print("- Asks guiding questions about performing these operations.")
print("- No obvious unwanted text.")
print("- The cleaning function seems effective.")

# Response 5 (after user asks a question):
# Model: Sure, let's tackle that! What's 5 + 3?
print("\nResponse 5 Analysis:")
print("- Acknowledges the user's desire to ask a question.")
print("- *However*, it then immediately asks a simple arithmetic question instead of waiting for the user's question or asking what their question is.")
print("- This deviates from the instruction to \"Present one idea, hint, or question at a time and wait for the student to respond before continuing\": the student asked to pose a question, which is on-topic, so the tutor should have waited for it.")
print("- It also doesn't address the user's likely intent, which was to ask a specific, possibly more complex, math problem (like the factoring problem from previous attempts).")

# Response 6 (after user says "No, let me ask."):
# Model: Hmm, 8! That's a tricky one. How would you approach solving this problem?
print("\nResponse 6 Analysis:")
print("- Provides the direct answer (\"8!\") despite the instruction to *not* give the answer directly.")
print("- Calls a simple problem (\"5 + 3\") \"tricky,\" which is inconsistent with the persona and audience (middle school, basic arithmetic knowledge).")
print("- Asks \"How would you approach solving this problem?\" *after* giving the answer, which is counterproductive to the tutoring process.")
print("- The cleaning function could not have caught \"8!\": every direct answer pattern requires an '=' sign, a factored form, or an 'N <operation> N is N' phrase, and this line contains none of them.")

print("\n--- Overall Assessment ---")
print("1. Improvement in removing training data remnants:")
print("   - The enhanced cleaning function appears to be more effective at removing a wider range of unwanted prefixes and patterns, leading to cleaner initial interactions.")
print("   - The conversational flow is smoother in the initial turns.")

print("2. Adherence to Math Tutor Persona:")
print("   - The model starts the conversation well, adhering to the friendly, patient, and guiding persona by asking relevant questions about numbers and operations.")
print("   - However, when the user tries to take control of the conversation (asking their own question), the model struggles significantly.")
print("   - It fails to follow the instruction to wait for the user's question, instead asking its own simple problem.")
print("   - It *directly* provides the answer to that problem, violating a core instruction.")
print("   - Its response (\"Hmm, 8! That's a tricky one.\") is inconsistent with the persona and the context of a simple arithmetic problem.")

print("3. Effectiveness of Cleaning Function:")
print("   - The cleaning function is improved and works well in the initial turns.")
print("   - However, it did not remove the direct answer \"8!\" in the later turn, because the direct answer patterns only match equations containing '=' or phrases like 'N plus N is N', not a bare number.")

print("\n--- Conclusion ---")
print("The refined prompt engineering and enhanced cleaning have made the initial interactions cleaner and more aligned with the persona.")
print("However, the model still struggles significantly with conversational flow and following key instructions when the interaction deviates from simple Q&A initiated by the tutor (e.g., when the student wants to ask a question or is stuck).")
print("The direct answer \"8!\" passed through the cleaning function because no pattern covers a bare number; this is a critical gap that needs to be addressed.")
print("Further iterations should focus on:")
print("- Refining the cleaning function's direct answer patterns.")
print("- Potentially adjusting generation parameters or prompt structure to discourage direct answers and encourage waiting for the student's input.")
print("- Considering whether the small dataset size and its specific conversational patterns are fundamentally limiting the model's ability to generalize to the desired flexible tutoring behavior.")
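
The "8!" in Response 6 answers the tutor's own question from Response 5 ("What's 5 + 3?"), and none of the existing regex patterns can match a bare number. One possible extra check, sketched below under that reading of the log, is to compute the result of any simple arithmetic in the tutor's previous turn and flag a reply that contains it. `expected_answers` and `leaks_answer` are hypothetical helpers; they handle only non-negative integer operands and the four basic operators, and do not detect negative results.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
EXPR = re.compile(r"(\d+)\s*([+\-*/])\s*(\d+)")


def expected_answers(question):
    """Return the integer results of simple 'a op b' expressions found in the text."""
    answers = set()
    for left, op, right in EXPR.findall(question):
        if op == "/" and int(right) == 0:
            continue  # skip division by zero
        value = OPS[op](int(left), int(right))
        if value == int(value):  # keep whole-number results only
            answers.add(str(int(value)))
    return answers


def leaks_answer(previous_question, response):
    """True if the response contains a number that answers the previous question."""
    numbers_in_response = set(re.findall(r"\d+", response))
    return bool(expected_answers(previous_question) & numbers_in_response)


question = "Sure, let's tackle that! What's 5 + 3?"
reply = "Hmm, 8! That's a tricky one. How would you approach solving this problem?"
print(re.search(r"=\s*\d+", reply))    # the existing '=' pattern finds nothing
print(leaks_answer(question, reply))   # the extra check flags the reply
```

A check like this would also flag a student-confirmed answer being repeated back, so it would need to look at who supplied the number first before being used to drop text.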

## Summary:

### Data Analysis Key Findings

* The initial fine-tuned model's responses contained remnants of the training data (e.g., "Example 2", "Tone:", conversational turn prefixes) and provided direct answers to math problems instead of tutoring.
* Analysis of the `mathdial` dataset revealed that the raw data included specific markers ("Teacher:", "Student:", "|EOM|") and potentially meta-information that the model might be memorizing.
* Training for 3 epochs on the small 202-sample `mathdial` dataset was identified as a likely cause of overfitting, contributing to the model regurgitating training data patterns. Reducing epochs to 1 showed some improvement but did not fully resolve the issue.
* Refining the system prompt with more explicit instructions and modifying the prompt formatting function to use explicit turn roles (`<start_of_turn>user`, `<start_of_turn>model`) and limit history length improved the *appearance* of responses by reducing some unwanted text.
* Enhancing the post-processing function (`clean_model_response_enhanced`) to remove more unwanted prefixes, common filler phrases, training data tags, and lines matching direct-answer patterns (`=\s*\d+`, factoring patterns, arithmetic equations) further improved the cleanliness of the output.
* Despite refined prompt engineering and enhanced cleaning, the model continued to struggle with consistently providing substantive, step-by-step tutoring and occasionally still generated content that had to be aggressively filtered, sometimes resulting in empty responses.

### Insights or Next Steps

* Post-processing and prompt engineering can mask some issues but do not fundamentally alter the model's behavior learned during fine-tuning.
* Further improvements likely require addressing the root cause through data curation (e.g., cleaning or augmenting the `mathdial` dataset to remove problematic patterns and add more diverse tutoring examples) or exploring alternative fine-tuning strategies.
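
As a starting point for the data curation step, the annotation tags that the post-processing currently filters from model output could instead be stripped from the training text before fine-tuning. The sketch below assumes the tags appear as parenthesized words inside teacher turns, which is inferred from the filter list in this notebook rather than verified against the dataset; `strip_tags` and `clean_conversation` are hypothetical helpers.

```python
import re

# Tag names copied from the unwanted-prefix list used in post-processing
TAGS = [
    "probing", "generic", "specific", "reflection", "analogy",
    "scaffolding", "feedback", "remediation", "questioning", "explanation",
]
TAG_RE = re.compile(r"\((?:" + "|".join(TAGS) + r")\)\s*", re.IGNORECASE)


def strip_tags(text):
    """Remove annotation tags such as '(probing)' from one turn."""
    return TAG_RE.sub("", text).strip()


def clean_conversation(raw):
    """Split a raw '|EOM|'-separated conversation into (speaker, text) pairs without tags."""
    turns = []
    for turn in raw.split("|EOM|"):
        turn = turn.strip()
        if not turn:
            continue
        speaker, sep, text = turn.partition(":")
        if not sep:  # no 'Speaker:' prefix found
            speaker, text = "Unknown", turn
        turns.append((speaker.strip(), strip_tags(text)))
    return turns


sample = "Teacher: (probing) What did you try?|EOM|Student: I added 2."
print(clean_conversation(sample))
```

Returning `(speaker, text)` pairs rather than a single string keeps the speaker separate from the content, which makes it easier to map `Teacher` and `Student` onto the model's own turn roles later.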