# Fine-Tuned PHI 4 Mini - Financial Aid Chat Use Case

This notebook downloads the Microsoft PHI 4 Mini model (`microsoft/Phi-4-mini-instruct`) from Hugging Face and fine-tunes it on a pre-made synthetic dataset of fictitious recipient records. LoRA adapters keep the fine-tuning efficient: the base model stays frozen and only the additional adapter layers are trained, which reduces memory usage. No real data is used in this dataset, and the code is free from any conflict of interest to the best of the author's knowledge. The author assumes the user has a Hugging Face token (read from the `HUGGINGFACE_TOKEN` environment variable) and runs PyTorch on the MPS device (requires macOS 13+).


## Step 1: Create a virtual environment and run make install

## Step 2: Import dependencies and deal with memory settings

In [4]:
import os
import pandas as pd
import re
import torch
import random
import boto3

# For LLM
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
)

from chat_ai.src.domain.trainers.recipient_model_trainer import RecipientModelTrainer
from chat_ai.src.domain.datasets.instruct_dataset import StudentDialogueDataset


In [3]:
# The following helps with the MPS memory issue
torch.mps.empty_cache()
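The later cells hard-code the `mps` device. On a machine without an Apple GPU, a small fallback helper avoids editing each cell. The sketch below is a convenience that is not part of the original notebook: `pick_device_name` is a hypothetical name, and it reports `cpu` when PyTorch cannot be imported.

```python
import importlib.util


def pick_device_name() -> str:
    """Return "mps", "cuda", or "cpu" depending on what this machine offers."""
    # Without PyTorch installed there is nothing to probe, so report CPU
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    # Apple GPUs are exposed through the MPS backend (absent in old PyTorch builds)
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


device_name = pick_device_name()
print(f"Using device: {device_name}")
```

The returned string can be passed to `torch.device(...)` in place of the literal `"mps"`.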


## Step 3: Load Datasets and Synthetic Database

In [5]:
# Load data used to fine-tune PHI 4 Mini
root_path = '../dist/data'
train_df = pd.read_csv(f"{root_path}/train_data.csv", index_col=0)
eval_df = pd.read_csv(f"{root_path}/eval_data.csv", index_col=0)
synthetic_student_population_df = pd.read_csv(f"{root_path}/synthetic_population_data.csv", index_col=0)

In [6]:
train_df

Unnamed: 0,context,user_input,answer
0,Student Linda Bailey (ID: S00001) has a GPA of...,What financial aid do I qualify for?,You qualify for the Need-Based Grant because y...
1,Student John Doe (ID: S00002) has a GPA of 3.8...,What financial aid am I eligible for?,John Doe qualifies for the Merit-Based Scholar...
2,Student Sarah Carter (ID: S00003) has a GPA of...,Can I apply for any financial aid?,Yes! You qualify for the STEM Excellence Award...
3,Student John Doe (ID: S10123) has a GPA of 2.3...,What am I to be eligible for financial aid?,"Due to your income, you do not qualify for a n..."


In [7]:
eval_df

Unnamed: 0,context,user_input,answer
0,Student Emily Carter (ID: S00023) has a GPA of...,Am I eligible for any financial aid?,"Yes, you qualify for the STEM Excellence Award..."
1,Student Mark Smith (ID: S00045) has a GPA of 2...,Can I get coverage with my education program?,"Unfortunately, you do not qualify for any fina..."
2,Student Olivia Jones (ID: S00030) has a GPA of...,Which financial aid can do I qualify for?,You qualify for the STEM Excellence Award sinc...
3,Student Daniel Thompson (ID: S00051) has a GPA...,What financial aid options do I have?,You are eligible for the Need-Based Grant sinc...


In [8]:
synthetic_student_population_df

Unnamed: 0,id,name,gpa,field_of_study,income
0,S00001,Marc Solomon,2.53,Anthropology,73706
1,S00002,Suzanne Kidd,2.50,Business Administration,22259
2,S00003,Michelle Griffin,2.42,Political Science,31192
3,S00004,Hannah Dixon,2.05,Engineering,530431
4,S00005,Mary Harris,3.13,Chemistry,370142
...,...,...,...,...,...
995,S00996,Nicole Clay,3.97,Chemistry,353792
996,S00997,Tammy Rose,3.34,Computer Science,414283
997,S00998,Jeff Alvarez,3.96,Psychology,450287
998,S00999,Thomas Lowe,3.02,Business Administration,54775


## Step 4: Load PHI 4 Mini Model and Train

In [9]:
import logging

# Configure logging
logging.basicConfig(
    level=logging.INFO,  # Set logging level
    format="%(asctime)s - %(levelname)s - %(message)s"
)

# Get a logger instance for the app
logger = logging.getLogger(__name__)  # Use __name__ to get module-level logger

In [10]:
# Import Base Model (used for all use cases)
model_name = "Phi-4-mini-instruct"
model_path = "microsoft/Phi-4-mini-instruct"

device = torch.device("mps")
model_trainer = RecipientModelTrainer(
    model_name=model_path, 
    dataset=train_df,
    logger=logger, 
    device=device, 
    model_type="causal",
    huggingface_token=os.getenv("HUGGINGFACE_TOKEN")
)

model, tokenizer = model_trainer.load_model(
    apply_lora=True, 
    lora_target_modules=["q_proj", "v_proj", "k_proj", "o_proj", "fc1", "fc2"]
    )

2025-03-11 22:13:58,421 - INFO - Model will be moved to mps


Loading model from microsoft/Phi-4-mini-instruct...


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

2025-03-11 22:14:02,686 - INFO - Model loaded in 4.26 seconds.


Model loaded successfully and set to use mps!
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
trainable params: 3,145,728 || all params: 3,839,167,488 || trainable%: 0.0819


  warn("The installed version of bitsandbytes was compiled without GPU support. "
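The `trainable params` figure in the log can be sanity-checked with LoRA's parameter formula: an adapter of rank `r` on a `d_out x d_in` weight adds `r * (d_in + d_out)` trainable parameters. The sketch below uses values the notebook does not state (a hidden size of 3072, 32 decoder layers, rank 16, and one adapted square projection per layer). Under those assumptions the arithmetic reproduces the logged count, which is a consistency check and not confirmation of the actual configuration.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank


# Assumed values, not taken from the notebook
hidden_size = 3072
num_layers = 32
rank = 16

per_layer = lora_trainable_params(hidden_size, hidden_size, rank)
total = per_layer * num_layers
all_params = 3_839_167_488  # "all params" value from the training log

print(f"per layer:  {per_layer:,}")
print(f"total:      {total:,}")
print(f"trainable%: {100 * total / all_params:.4f}")
```

If these assumptions hold, only one of the six names in `lora_target_modules` would have matched a module in each layer. Listing the model's module names is the reliable way to check which targets actually received adapters.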


In [None]:
train_dataset = StudentDialogueDataset(train_df.to_dict(orient='records'), tokenizer, task_type="instruct", device=device)
eval_dataset = StudentDialogueDataset(eval_df.to_dict(orient='records'), tokenizer, task_type="instruct", device=device)

In [13]:
checkpoints_dir = "../dist/models/phi4_mini/chat/checkpoints"
lora_adapter_path = "../dist/models/phi4_mini/chat/lora_adapter"

In [None]:
training_args = TrainingArguments(
    output_dir=checkpoints_dir,
    per_device_train_batch_size=2,  # Reduce batch size for MPS stability
    per_device_eval_batch_size=2,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="steps",
    logging_steps=10,
    learning_rate=2e-5,
    num_train_epochs=5,
    bf16=True,
    dataloader_pin_memory=False,
    report_to="none",
    warmup_steps=100,
)

model_trainer.train(
    training_arguments=training_args, 
    train_dataset=train_dataset, 
    eval_dataset=eval_dataset, 
    output_dir=checkpoints_dir,
    adapter_path=lora_adapter_path,
    pad_to_multiple_of=8, 
    return_tensors="pt"
    )


[2025-03-11 22:14:39,847] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to mps (auto detect)


W0311 22:14:39.978000 79982 torch/distributed/elastic/multiprocessing/redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Epoch,Training Loss,Validation Loss
1,No log,7.743078
2,No log,7.740966
3,No log,7.734684
4,No log,7.710366
5,7.345200,7.661146


2025-03-11 22:15:17,835 - INFO - Model LoRA adapter files saved at ../dist/models/phi4_mini/chat/lora_adapter
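The `No log` entries and the slow decrease in validation loss are consistent with simple step arithmetic. The sketch assumes the training set holds the 4 rows displayed earlier, no gradient accumulation, and a linear warmup schedule. None of these is confirmed by the notebook, and `training_schedule_summary` is a hypothetical helper.

```python
import math


def training_schedule_summary(num_examples, batch_size, epochs,
                              logging_steps, warmup_steps, learning_rate):
    """Summarise how many optimizer steps run and what warmup allows."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    # First epoch by whose end at least `logging_steps` steps have run
    first_logged_epoch = math.ceil(logging_steps / steps_per_epoch)
    # Under linear warmup the rate cannot exceed this fraction of the target
    warmup_fraction = min(1.0, total_steps / warmup_steps) if warmup_steps else 1.0
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "first_logged_epoch": first_logged_epoch,
        "max_learning_rate": learning_rate * warmup_fraction,
    }


summary = training_schedule_summary(
    num_examples=4,      # assumed from the displayed train_df
    batch_size=2,
    epochs=5,
    logging_steps=10,
    warmup_steps=100,
    learning_rate=2e-5,
)
print(summary)
```

With these inputs training ends after 10 steps, the first training-loss log lands in epoch 5, and warmup never completes, so the learning rate stays at or below a tenth of `2e-5`. Lowering `logging_steps` and `warmup_steps` would be one way to adapt the configuration to a small dataset.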


In [15]:
# Empty cache ahead of merge and unload
torch.mps.empty_cache()

In [16]:
# Load Base Model
device = torch.device("mps")
new_model_trainer = RecipientModelTrainer(
    model_name=model_path, 
    dataset=train_df, 
    logger=logger,
    device=device, 
    model_type="causal",
    huggingface_token=os.getenv("HUGGINGFACE_TOKEN")
)
base_model, tokenizer = new_model_trainer.load_model(apply_lora=False)

# Load LoRA Adapters
model = new_model_trainer.load_lora_adapters(
    base_model, 
    lora_adapter_path, 
    merge_adapters=True  # Merge adapters into base model
    )

# Save the Fully Fine-Tuned Model
tuned_version = 1
tuned_model_name = "chat"
tuned_model_storage_dir = f"../dist/models/phi4_mini/{tuned_model_name}"
tuned_model_storage_path = f"{tuned_model_storage_dir}/v{tuned_version}.0"
s3_model_storage_dir = f"models/phi4_mini/{tuned_model_name}/v{tuned_version}.0"

model.save_pretrained(tuned_model_storage_path)
tokenizer.save_pretrained(tuned_model_storage_path)

print(f"\nMerged {model_name} model saved (base model + LoRA adapters)!")
print(f"Model saved to {tuned_model_storage_path}")
print(f"S3 Model will be saved to {s3_model_storage_dir}")

2025-03-11 22:17:53,377 - INFO - Model will be moved to mps


Loading model from microsoft/Phi-4-mini-instruct...


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

2025-03-11 22:17:56,201 - INFO - Model loaded in 2.82 seconds.


Model loaded successfully and set to use mps!

Merged Phi-4-mini-instruct model saved (base model + LoRA adapters)!
Model saved to ../dist/models/phi4_mini/chat/v1.0
S3 Model will be saved to models/phi4_mini/chat/v1.0
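Merging folds each adapter into its base weight so that inference no longer needs the adapter layers. For a frozen weight `W` and adapter matrices `A` and `B`, the merged weight is `W + scale * (B @ A)`. The plain-Python sketch below illustrates that identity on toy matrices. It makes no claim about how `load_lora_adapters` is implemented, and every name in it is illustrative.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(a, b, scale):
    """Return a + scale * b element-wise."""
    return [[x + scale * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


W = [[1.0, 2.0], [3.0, 4.0]]  # frozen base weight (2 x 2)
A = [[0.5, -1.0]]             # LoRA down-projection (rank 1 x 2)
B = [[2.0], [1.0]]            # LoRA up-projection (2 x rank 1)
scale = 0.5                   # stands in for lora_alpha / rank
x = [[1.0], [2.0]]            # input as a column vector

# Unmerged: base path plus scaled adapter path
adapter_out = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scale)

# Merged: fold the adapter into the weight once, then use a single matmul
W_merged = add_scaled(W, matmul(B, A), scale)
merged_out = matmul(W_merged, x)

print(adapter_out, merged_out)
```

Both paths produce the same output, which is why the merged model can be saved and served as an ordinary checkpoint.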


The steps up to this point cover everything required to download the base model, fine-tune it, and save the fine-tuned result. The remainder of this notebook uses the model for the student chat use case.

## Step 5: Preprocess Synthetic Recipient Population Data
Using the synthetic population dataset, construct prompts to test the model's performance.

In [12]:
fake_students = synthetic_student_population_df.to_dict(orient="records")

print(f"{len(fake_students)} student records available")
print("\nData record shape:\n")
print(fake_students[0])

1000 student records available

Data record shape:

{'id': 'S00001', 'name': 'Marc Solomon', 'gpa': 2.53, 'field_of_study': 'Anthropology', 'income': 73706}


In [13]:
from typing import Optional

def determine_financial_aid_eligibility(gpa, field_of_study, income):
    """
    Uses a student record to determine financial aid eligibility against
    some fictitious parameters.
    """
    # Dummy logic for financial aid eligibility
    financial_aid = []
    requirements = []

    # GPA and income only record a met requirement; no award is attached to them here
    if gpa >= 3.6:
        requirements.append("GPA ≥ 3.6")
    if income <= 35000:
        requirements.append("Income ≤ $35,000")
    if field_of_study in [
        "Computer Science",
        "Engineering",
        "Mathematics",
        "IT",
        "Statistics",
    ]:
        financial_aid.append("STEM Excellence Award")
        requirements.append("Field of Study in STEM")
    if field_of_study in ["Psychology", "Sociology", "Social Work", "Anthropology"]:
        financial_aid.append("Behavioral and Social Sciences Grant")
        requirements.append("Field of Study in Behavioral and Social Sciences")
    if field_of_study in ["Paralegal", "Law"]:
        financial_aid.append("Legal Studies Full Ride")
        requirements.append("Field of Study in Legal Studies")

    return requirements, financial_aid

def retrieve_financial_aid_knowledge(student_id: Optional[str] = None, student_name: Optional[str] = None):
    """
    Retrieves financial aid eligibility knowledge for a student.
    """
    # Default to None so the lookup fails cleanly when neither argument is given
    student = None
    if student_id:
        student = next((item for item in fake_students if item["id"] == student_id), None)
    if student_name:
        student = next((item for item in fake_students if item["name"] == student_name), None)

    if not student:
        return None

    requirements, financial_aid = determine_financial_aid_eligibility(
        student["gpa"], student["field_of_study"], student["income"]
    )

    return {
        "id": student["id"],
        "name": student["name"],
        "gpa": student["gpa"],
        "field_of_study": student["field_of_study"],
        "income": student["income"],
        "financial_aid": financial_aid,
        "requirements": requirements,
    }
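`retrieve_financial_aid_knowledge` scans the full record list on every call. That is fine for 1,000 records, but the cost grows quadratically when a prompt is built for every student. One option, sketched here with hypothetical names and a two-record stand-in for `fake_students`, is to build a dictionary index once and look students up by ID.

```python
from typing import Dict, List, Optional

# Two records copied from the synthetic population shown earlier
sample_students: List[dict] = [
    {"id": "S00001", "name": "Marc Solomon", "gpa": 2.53,
     "field_of_study": "Anthropology", "income": 73706},
    {"id": "S00002", "name": "Suzanne Kidd", "gpa": 2.50,
     "field_of_study": "Business Administration", "income": 22259},
]


def build_id_index(students: List[dict]) -> Dict[str, dict]:
    """Map each student ID to its record for constant-time lookup."""
    return {student["id"]: student for student in students}


def find_student(index: Dict[str, dict], student_id: str) -> Optional[dict]:
    """Return the record for student_id, or None when the ID is unknown."""
    return index.get(student_id)


student_index = build_id_index(sample_students)
print(find_student(student_index, "S00002"))
print(find_student(student_index, "S99999"))
```

The index assumes IDs are unique. Names are not guaranteed to be unique, so a name lookup would need a list of matches per key.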

In [14]:
def format(student: dict) -> str:
    """
    Converts the retrieved student and financial aid knowledge into an LLM-readable prompt.
    """
    return f"Recipient {student['name']} (ID: {student['id']}) has a GPA of {student['gpa']} in {student['field_of_study']} " + \
        f"and an income of ${student['income']}. " + \
        f"Eligible financial aid: {', '.join(student['financial_aid']) if student.get('financial_aid') else 'None'}. " + \
        f"Requirements met: {', '.join(student['requirements']) if student.get('requirements') else 'None'}."


retrieved_knowledge = retrieve_financial_aid_knowledge("S00001")
print(retrieved_knowledge)  # Data as-is
print(format(retrieved_knowledge)) # Data formatted for LLM

{'id': 'S00001', 'name': 'Marc Solomon', 'gpa': 2.53, 'field_of_study': 'Anthropology', 'income': 73706, 'financial_aid': ['Behavioral and Social Sciences Grant'], 'requirements': ['Field of Study in Behavioral and Social Sciences']}
Recipient Marc Solomon (ID: S00001) has a GPA of 2.53 in Anthropology and an income of $73706. Eligible financial aid: Behavioral and Social Sciences Grant. Requirements met: Field of Study in Behavioral and Social Sciences.


## Step 6: Pre-build all prompts using synthetic population data

In [15]:
from typing import Optional, Tuple

def build_student_prompt(user_input: str, student_id: Optional[str] = None) -> Tuple[str, str]:
    """
    Builds student chat prompt for LLM and uses RAG to keep responses properly informed for
    the given student with student_id.
    """

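    # Look up the student and fail early if the ID is unknown
    retrieved = retrieve_financial_aid_knowledge(student_id)
    if retrieved is None:
        raise ValueError(f"No student found for ID {student_id!r}")
    knowledge = format(retrieved)

    # Return both the retrieved knowledge and the prompt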
    return (
        knowledge,
        f"""
        ### Role: You are a financial aid evaluator. Answer the user's question based on the provided student's profile. Do not repeat the question or context, do not add any data not available regarding funding amount or renewals. Make sure to use the eligible financial aid and requirements met to inform their eligibility. \n\n### Context:\n{knowledge}\n\n### User's Question:\n{user_input}\n\n###Answer: 
        """
    )

In [16]:
# Define test dataset (Example questions)
def build_test_sentences():
    test_sentences = [
        "What GPA do I need to qualify for financial aid?",
        "Am I eligible for scholarships?",
        "What grants can I apply for?",
    ]

    return test_sentences

student_prompts = []

for student in fake_students:
    test_sentence = random.choice(build_test_sentences())
    knowledge_retrieved, prompt = build_student_prompt(user_input=test_sentence, student_id=student["id"])
    student_prompts.append(prompt)

student_prompts[:5]

["\n        ### Role: You are a financial aid evaluator. Answer the user's question based on the provided student's profile. Do not repeat the question or context, do not add any data not available regarding funding amount or renewals. Make sure to use the eligible financial aid and requirements met to inform their eligibility. \n\n### Context:\nRecipient Marc Solomon (ID: S00001) has a GPA of 2.53 in Anthropology and an income of $73706. Eligible financial aid: Behavioral and Social Sciences Grant. Requirements met: Field of Study in Behavioral and Social Sciences.\n\n### User's Question:\nWhat grants can I apply for?\n\n###Answer: \n        ",
 "\n        ### Role: You are a financial aid evaluator. Answer the user's question based on the provided student's profile. Do not repeat the question or context, do not add any data not available regarding funding amount or renewals. Make sure to use the eligible financial aid and requirements met to inform their eligibility. \n\n### Context:

#### Set to start testing!

## Step 7: Run custom student inferences

In [17]:
import torch
from chat_ai.src.adapters.llms.model_loader_adapter import ModelLoaderAdapter

# Load fine-tuned model (the final merged version)
tuned_model_storage_path = "notebooks/dist/models/phi4_mini/chat"

model_loader = ModelLoaderAdapter(
    model_name="PHI4StudentChatModel",
    model_path=tuned_model_storage_path,
    model_version="1.0",
    model_type="generation",
)

model, tokenizer = model_loader.load_pretrained_model()

# model.to(device)  TODO: Figure out the cause behind mps device movement failure
model = torch.compile(model) # To boost inference speed


Loading model from /Users/erosado/work/rag-ai-chat-template/notebooks/dist/models/phi4_mini/chat/v1.0...


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Model loaded successfully and set to use mps!


In [18]:
# Select first prompt
first_prompt = random.choice(student_prompts)
first_prompt

"\n        ### Role: You are a financial aid evaluator. Answer the user's question based on the provided student's profile. Do not repeat the question or context, do not add any data not available regarding funding amount or renewals. Make sure to use the eligible financial aid and requirements met to inform their eligibility. \n\n### Context:\nRecipient Sandra James (ID: S00168) has a GPA of 2.9 in Art and an income of $326828. Eligible financial aid: None. Requirements met: None.\n\n### User's Question:\nWhat GPA do I need need to qualify for financial aid?\n\n###Answer: \n        "

In [19]:
def extract_answer(response_text):
    """
    Extracts the generated answer from the LLM response.
    """

    # Match the first "Answer:" (or "Your response:") marker and capture the text up to the next blank line
    match = re.search(
        r"(?i)(?:Your response:|Answer:)\s*\n*(.*?)(?=\n\n|\Z)",
        response_text,
        re.DOTALL,
    )

    if match:
        extracted_answer = match.group(1).strip()
        return extracted_answer
    else:
        return "No answer found."
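`re.search` returns the first marker it finds in the text. If a response ever repeats the marker, for example when the model echoes the prompt and then answers again, taking the last occurrence may be preferable. The variant below is a sketch with a hypothetical name, `extract_last_answer`, and a made-up response string.

```python
import re

# Same markers as extract_answer, matched case-insensitively
ANSWER_MARKER = re.compile(r"(?i)(?:Your response:|Answer:)")


def extract_last_answer(response_text: str) -> str:
    """Return the first paragraph that follows the last answer marker."""
    matches = list(ANSWER_MARKER.finditer(response_text))
    if not matches:
        return "No answer found."
    # Keep everything after the final marker, then cut at the first blank line
    tail = response_text[matches[-1].end():].strip()
    return tail.split("\n\n", 1)[0].strip()


sample_response = (
    "### User's Question:\nWhat grants can I apply for?\n\n"
    "###Answer: \n        First draft.\n\n"
    "Answer: You qualify for the STEM Excellence Award.\n\nExtra text."
)
print(extract_last_answer(sample_response))
print(extract_last_answer("no marker here"))
```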

In [21]:
device = model_loader.get_device()
def generate_text(prompt, max_new_tokens=100):
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Run first prompt
response = generate_text(first_prompt)
print(extract_answer(response))

Sandra James does not qualify for any financial aid as there are no eligible financial aid options available for her, and she has not met any requirements for financial aid.


In [22]:
# Run another prompt
response = generate_text(random.choice(student_prompts))
print(extract_answer(response))

Yes, you are eligible for the STEM Excellence Award.


## Step 8: Evaluate Model

### TBD - requires uploading separate dataset

## Step 9: Deploy to S3
The following is entirely optional and requires an AWS account. The author assumes the user has set up their AWS CLI and is authenticated.

In [None]:
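# s3 = boto3.client("s3")
# bucket_name = "your-sagemaker-bucket"

# # Upload merged model to S3. upload_file sends one file per call, so each
# # file in the saved model directory is uploaded under the versioned prefix.
# local_model_dir = f"{tuned_model_storage_dir}/v{tuned_version}.0"
# for file_name in os.listdir(local_model_dir):
#     s3.upload_file(f"{local_model_dir}/{file_name}", bucket_name, f"{s3_model_storage_dir}/{file_name}")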
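Because `upload_file` transfers a single object per call, uploading a saved model directory means iterating over its files. A minimal sketch follows, assuming a hypothetical helper named `upload_directory` that accepts any client exposing `upload_file(Filename, Bucket, Key)`, such as the one returned by `boto3.client("s3")`. The bucket name and paths in the usage comment are placeholders.

```python
import os


def upload_directory(s3_client, local_dir: str, bucket: str, key_prefix: str) -> list:
    """Upload every file under local_dir to bucket, preserving relative paths."""
    uploaded_keys = []
    for root, _dirs, files in os.walk(local_dir):
        for file_name in sorted(files):
            local_path = os.path.join(root, file_name)
            # Build the object key from the path relative to local_dir
            relative_path = os.path.relpath(local_path, local_dir)
            key = f"{key_prefix}/{relative_path.replace(os.sep, '/')}"
            s3_client.upload_file(local_path, bucket, key)
            uploaded_keys.append(key)
    return uploaded_keys


# Example call (requires boto3 and valid AWS credentials):
# import boto3
# upload_directory(
#     boto3.client("s3"),
#     "../dist/models/phi4_mini/chat/v1.0",
#     "your-sagemaker-bucket",
#     "models/phi4_mini/chat/v1.0",
# )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before any real upload is attempted.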
