# Fine-tuning Quantized Phi 3.5 Mini on the HF Cover Letter Dataset
Fine-tuning Phi 3.5 Mini using QLoRA with MLflow and PEFT
1. Using QLoRA and PEFT to overcome GPU limitations when fine-tuning [`microsoft/Phi-3.5-mini-instruct`](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)
2. Using MLflow to log model artifacts, hyperparameters, metrics, and prompts
3. Saving the fine-tuned model and attempting to test it.

Brief overview of the technologies used: 
* [QLoRA](https://github.com/artidoro/qlora) allows us to fine-tune large foundational models with limited GPU resources. It reduces the number of trainable parameters and also applies 4-bit quantization to the frozen pretrained model to further reduce the memory footprint.
* [PEFT](https://huggingface.co/docs/peft/en/index) lets you apply QLoRA to a pretrained model with a few lines of configuration, much like normal Transformers model training.
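To make the LoRA part of QLoRA concrete, here is a minimal pure-Python sketch of a LoRA-adapted linear layer: the frozen weight `W` is left untouched and only two small factors `A` and `B` would be trained, with the update scaled by `lora_alpha / r`. This is an illustration under simplified assumptions (tiny made-up dimensions, no quantization, no training loop), not PEFT's implementation.

```python
# Illustrative LoRA forward pass: y = W x + (alpha / r) * B (A x).
# Tiny, made-up dimensions; real layers are e.g. 3072 x 3072 with r = 16.
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Frozen base path plus a scaled low-rank update."""
    base = matvec(W, x)               # frozen pretrained weights
    update = matvec(B, matvec(A, x))  # trainable low-rank path (A: r x d_in, B: d_out x r)
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


d_in, d_out, r, alpha = 8, 8, 2, 4
random.seed(0)
W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # B starts at zero, so the adapter is initially a no-op
x = [1.0, 0.5, -0.5, 2.0, 0.0, 1.0, -1.0, 0.25]

y = lora_forward(W, A, B, x, alpha, r)
trainable = r * d_in + d_out * r  # parameters in A and B
frozen = d_out * d_in             # parameters in W
print(y == matvec(W, x), trainable, frozen)
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the pretrained one. The savings grow with layer size: for a 3072 x 3072 layer with `r = 16`, LoRA trains 98,304 parameters instead of 9,437,184.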

References and useful notebooks: 
1) https://mlflow.org/docs/latest/llms/transformers/tutorials/fine-tuning/transformers-peft.html
2) https://discuss.huggingface.co/t/tutorial-phi-3-5-fine-tuning/103461 


## Imports


Ensure that PyTorch can use CUDA so that training runs on the GPU.

In [2]:
import torch
print(torch.backends.cudnn.enabled)
print(torch.cuda.is_available())
print(torch.cuda.get_arch_list())
torch.zeros(1).cuda()

True
True
['sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90', 'sm_37', 'compute_37']


tensor([0.], device='cuda:0')

In [2]:
# data 
import pandas as pd
from datasets import load_dataset
from IPython.display import HTML, display

# loading model and training 
import torch
import transformers
from peft import LoraConfig
from trl import SFTTrainer
from transformers import (AutoModelForCausalLM, 
                          AutoTokenizer, 
                          TrainingArguments, 
                          BitsAndBytesConfig)
# mlflow 
import mlflow
import datetime 
from mlflow.models import infer_signature

  from .autonotebook import tqdm as notebook_tqdm


## Load the dataset and preview it with pandas

In [None]:
# credit -> https://mlflow.org/docs/latest/llms/transformers/tutorials/fine-tuning/transformers-peft.html  (Apache-2.0 license) 

# displays a sample of the dataset 
def display_table(dataset_or_sample):
    # A helper function to nicely display a Transformers dataset or a single sample containing multi-line strings
    pd.set_option("display.max_colwidth", None)
    pd.set_option("display.width", None)
    pd.set_option("display.max_rows", None)

    if isinstance(dataset_or_sample, dict):
        df = pd.DataFrame(dataset_or_sample, index=[0])
    else:
        df = pd.DataFrame(dataset_or_sample)

    html = df.to_html().replace("\\n", "<br>")
    styled_html = f"""<style> .dataframe th, .dataframe tbody td {{ text-align: left; padding-right: 30px; }} </style> {html}"""
    display(HTML(styled_html))


In [None]:
dataset_name = "ShashiVish/cover-letter-dataset"

# only use 5 percent of the dataset 
train_dataset = load_dataset(dataset_name, split="train[:5%]")
test_dataset = load_dataset(dataset_name, split="test[:5%]")

display_table(train_dataset.select(range(1)))

Unnamed: 0,Job Title,Preferred Qualifications,Hiring Company,Applicant Name,Past Working Experience,Current Working Experience,Skillsets,Qualifications,Cover Letter
0,Senior Java Developer,5+ years of experience in Java Development,Google,John Doe,Java Developer at XYZ for 3 years,Senior Java Developer at ABC for 2 years,"Java, Spring Boot, Hibernate, SQL",BSc in Computer Science,"I am writing to express my interest in the Senior Java Developer position at Google. With over 5 years of experience in Java development, I am confident in my ability to contribute effectively to your team. My professional experience includes designing and implementing Java applications, managing the full software development lifecycle, and troubleshooting and resolving technical issues. I also possess strong skills in Spring Boot, Hibernate and SQL. I am a diligent and dedicated professional, always looking to improve and learn new skills. I am excited about the opportunity to work with Google and contribute to your ongoing projects. I am certain that my skills and experience make me a strong candidate for this position."


In [5]:
print(f"Training dataset contains {len(train_dataset)} cv-to-coverletter pairs")
print(f"Test dataset contains {len(test_dataset)} cv-to-coverletter pairs")
column_names = list(train_dataset.features)
print(column_names)

Training dataset contains 41 cv-to-coverletter pairs
Test dataset contains 17 cv-to-coverletter pairs
['Job Title', 'Preferred Qualifications', 'Hiring Company', 'Applicant Name', 'Past Working Experience', 'Current Working Experience', 'Skillsets', 'Qualifications', 'Cover Letter']


## Organise and Format Dataset

Convert the dataset into the chat-template message format, following this documentation: https://huggingface.co/docs/transformers/en/chat_templating

An example of the chat templating format: 
>`messages = [ ` \
> `    {"role": "user", "content": "Hi there!"},`  \
> `    {"role": "assistant", "content": "Nice to meet you!"},`\
>`    {"role": "user", "content": "Can I ask a question?"}`\
>`]`

In [None]:
def apply_message_template(row): 
    messages = [
        # system prompt 
        {"content": 
         """You are a powerful cover letter generator. Generate a professional formal personalised cover letter based on job title, qualifications, hiring company name, applicant name, work experience, skills. The cover letter must be between 300 - 400 words.""", 
         "role": "system"},
        # Format the dataset row into the user prompt 
        {"content": 
        f"""Generate Cover Letter using this information:
        Job Title: {row['Job Title']}, Preferred Qualifications: {row['Preferred Qualifications']}, Hiring Company: {row['Hiring Company']}, Applicant Name: {row['Applicant Name']}, Past Working Experience: {row['Past Working Experience']}, Current Working Experience: {row['Current Working Experience']}, Skillsets:{row['Skillsets']}, Qualifications: {row['Qualifications']}""",
        "role" : "user"},
        # ideal response from assistant 
        {"content": f"{row['Cover Letter']}", "role":"assistant"}
    ] 
    return {"messages":messages} 

In [None]:
# transform dataset to chat templating 
train_dataset = train_dataset.map(apply_message_template,
                                  remove_columns=column_names)

# display our transformed dataset 
display_table(train_dataset.select(range(1)))

Unnamed: 0,messages
0,"[{'content': 'You are a powerful cover letter generator. Generate a professional formal personalised cover letter based on job title, qualifications, hiring company name, applicant name, work experience, skills. The cover letter must be between 300 - 400 words.', 'role': 'system'}, {'content': 'Generate Cover Letter using this information:  Job Title: Senior Java Developer, Preferred Qualifications: 5+ years of experience in Java Development, Hiring Company: Google, Applicant Name: John Doe, Past Working Experience: Java Developer at XYZ for 3 years, Current Working Experience: Senior Java Developer at ABC for 2 years, Skillsets:Java, Spring Boot, Hibernate, SQL, Qualifications: BSc in Computer Science', 'role': 'user'}, {'content': 'I am writing to express my interest in the Senior Java Developer position at Google. With over 5 years of experience in Java development, I am confident in my ability to contribute effectively to your team. My professional experience includes designing and implementing Java applications, managing the full software development lifecycle, and troubleshooting and resolving technical issues. I also possess strong skills in Spring Boot, Hibernate and SQL. I am a diligent and dedicated professional, always looking to improve and learn new skills. I am excited about the opportunity to work with Google and contribute to your ongoing projects. I am certain that my skills and experience make me a strong candidate for this position.', 'role': 'assistant'}]"


In [8]:
test_dataset = test_dataset.map(apply_message_template,
                                  remove_columns=column_names)

display_table(test_dataset.select(range(1)))

Unnamed: 0,messages
0,"[{'content': 'You are a powerful cover letter generator. Generate a professional formal personalised cover letter based on job title, qualifications, hiring company name, applicant name, work experience, skills. The cover letter must be between 300 - 400 words.', 'role': 'system'}, {'content': 'Generate Cover Letter using this information:  Job Title: Data Scientist, Preferred Qualifications: BSc focused on data Science/computer Science/engineering 4+ years experience Developing and shipping production grade machine learning systems 2+ years building and shipping data Science based personalization services and recommendation systems experience in data Science or machine learning engineering Strong analytical and data Science skills, Hiring Company: XYZ Corporation, Applicant Name: John Smith, Past Working Experience: Data Analyst at ABC Company, Current Working Experience: Machine Learning Engineer at DEF Company, Skillsets:Python, R, scikit-learn, Keras, Tensorflow, Qualifications: BSc in Computer Science, 5+ years of experience in data science and machine learning', 'role': 'user'}, {'content': 'Dear Hiring Manager, I am writing to express my interest in the Data Scientist position at XYZ Corporation. With my strong background in data science and machine learning, I believe I am well-suited for this role. In my previous role as a Data Analyst at ABC Company, I gained experience in identifying and engineering features for modeling. I also have a proven track record of evaluating various modeling techniques and developing models. Additionally, my current position as a Machine Learning Engineer at DEF Company has allowed me to collaborate with stakeholders and put models into production. I have a BSc in Computer Science and over 5 years of experience in data science and machine learning. I am proficient in Python, R, scikit-learn, Keras, and Tensorflow. I am eager to learn from others and contribute to the growth of the team. 
I am confident that my strong analytical and data science skills, along with my ability to work well in cross-functional teams, make me a valuable asset to XYZ Corporation. I am excited about the opportunity to contribute to the development of personalization services and recommendation systems. Thank you for considering my application. I look forward to the opportunity to discuss how my skills and qualifications align with the needs of XYZ Corporation. Sincerely, John Smith', 'role': 'assistant'}]"


## Load Model and set up MLflow tracking

Initialise MLflow to track parameters, and set up our experiment and run name.

In [None]:
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("QUANT PHI PEFT")
run_name = f"Phi-3.5-mini-Cover-Letter-QLoRA-{(datetime.datetime.now()).strftime('%Y-%m-%d-%H:%M:%S')}"

In [None]:
# configs 

training_config = {
    "report_to": "mlflow",
    "run_name": run_name,
    "fp16":True,
    "bf16": False,
    "do_eval": False,
    "learning_rate": 5.0e-06,
    "log_level": "info",
    "logging_steps": 20,
    "logging_strategy": "steps",
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 1,
    "max_steps": -1,
    "output_dir": "./checkpoint_dir",
    "overwrite_output_dir": True,
    "per_device_eval_batch_size": 1,
    "per_device_train_batch_size": 1,
    "remove_unused_columns": True,
    "save_steps": 100,
    "save_total_limit": 1,
    "seed": 0,
    "gradient_checkpointing": True,
    "gradient_checkpointing_kwargs":{"use_reentrant": False},
    "gradient_accumulation_steps": 1,
    "warmup_ratio": 0.2,
    }

peft_config = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM",
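    # Note: Phi-3.5 fuses the query/key/value projections into a single `qkv_proj`
    # module, so "q_proj" and "k_proj" match nothing here and only `lm_head`
    # receives LoRA adapters (consistent with the 562,176 trainable parameters
    # reported in the training log). Targeting "qkv_proj" (and e.g. "o_proj")
    # would adapt the attention layers instead.
    "target_modules": [
        "q_proj",
        "k_proj",
        "lm_head",
    ],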
    "modules_to_save": None,
}
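The comment on `target_modules` can be checked with a little arithmetic. LoRA adds `r * in_features + out_features * r` parameters per adapted linear layer. The sketch below assumes Phi-3.5-mini's published dimensions (hidden size 3072, vocabulary 32064, 32 layers, equal numbers of query and key/value heads); these values come from the model's config, not from this notebook.

```python
# Sketch: trainable parameters LoRA adds to one nn.Linear layer.
# Assumed Phi-3.5-mini dimensions (from its published config, not from this notebook):
HIDDEN_SIZE = 3072   # hidden_size
VOCAB_SIZE = 32064   # vocab_size
NUM_LAYERS = 32      # num_hidden_layers
R = 16               # "r" in peft_config


def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in_features) and B (out_features x r) to a frozen linear layer."""
    return r * in_features + out_features * r


# lm_head projects hidden states to vocabulary logits
lm_head_params = lora_param_count(HIDDEN_SIZE, VOCAB_SIZE, R)

# Fused qkv_proj outputs query, key and value (assumes equal numbers of query and KV heads)
qkv_params_all_layers = NUM_LAYERS * lora_param_count(HIDDEN_SIZE, 3 * HIDDEN_SIZE, R)

print(f"lm_head only:           {lm_head_params:,}")
print(f"qkv_proj in all layers: {qkv_params_all_layers:,}")
```

The `lm_head`-only figure equals the 562,176 trainable parameters in the training log exactly, which supports the reading that the attention layers were never adapted in this run.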

In [None]:
# Pass our configurations 
train_conf = TrainingArguments(**training_config)
peft_conf = LoraConfig(**peft_config)
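With `lr_scheduler_type="cosine"` and `warmup_ratio=0.2`, the learning rate ramps up linearly and then decays along half a cosine. The sketch below reproduces that shape for the 7 optimisation steps reported in the training log; it is written to mirror, as far as I understand it, the warmup-plus-cosine schedule in `transformers` (warmup steps taken as `ceil(total_steps * warmup_ratio)`), and is not a call into the library itself.

```python
import math


def warmup_cosine_lr(step: int, total_steps: int, base_lr: float, warmup_ratio: float) -> float:
    """Linear warmup followed by cosine decay to zero (half a cosine cycle)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Values from training_config; 7 total steps as reported in the training log
schedule = [warmup_cosine_lr(s, 7, 5.0e-06, 0.2) for s in range(7)]
for step, lr in enumerate(schedule):
    print(f"step {step}: lr = {lr:.3e}")
```

With so few steps, two of the seven are spent warming up, which is worth keeping in mind when reading the low number of updates this run actually made at the peak learning rate.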

In [None]:
checkpoint_path = "microsoft/Phi-3.5-mini-instruct"

# Load the model with 4-bit quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    # Use double quantization
    bnb_4bit_use_double_quant=True,
    # Use 4-bit Normal Float for storing the base model weights in GPU memory
    bnb_4bit_quant_type="nf4",
    # De-quantize the weights to 16-bit (Brain) float before the forward/backward pass
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# pass quantisation configuration to our model 
model_kwargs = dict(
    quantization_config=quantization_config, 
    use_cache=False,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map=None
)

# initialise model and tokeniser 
model = AutoModelForCausalLM.from_pretrained(checkpoint_path, **model_kwargs)
tokenizer = AutoTokenizer.from_pretrained(checkpoint_path)
tokenizer.model_max_length = 2048
tokenizer.pad_token = tokenizer.unk_token  # use unk rather than eos token to prevent endless generation
tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids(tokenizer.pad_token)
tokenizer.padding_side = 'right'


`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
Current `flash-attention` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Loading checkpoint shards: 100%|██████████| 2/2 [00:08<00:00,  4.13s/it]
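The `BitsAndBytesConfig` above stores the frozen weights as 4-bit codes and de-quantizes them to 16-bit for compute. The sketch below illustrates the general idea with a simplified blockwise absmax scheme (evenly spaced levels -7..7, one scale per block). It is a deliberately swapped-in technique: bitsandbytes' NF4 instead maps to 16 normal-distribution quantiles, and double quantization additionally quantizes the per-block scales.

```python
# Simplified blockwise absmax quantization to signed 4-bit codes (-7..7).
# NOT the NF4 scheme used by bitsandbytes; illustration only.

def quantize_block(values, levels=7):
    scale = max(abs(v) for v in values) or 1.0  # absmax of the block (avoid divide-by-zero)
    codes = [round(v / scale * levels) for v in values]
    return codes, scale


def dequantize_block(codes, scale, levels=7):
    return [c / levels * scale for c in codes]


def quantize(weights, block_size=4):
    blocks = [weights[i:i + block_size] for i in range(0, len(weights), block_size)]
    return [quantize_block(b) for b in blocks]


def dequantize(quantized):
    restored = []
    for codes, scale in quantized:
        restored.extend(dequantize_block(codes, scale))
    return restored


weights = [0.12, -0.50, 0.33, 0.05, 1.20, -0.90, 0.00, 0.40]
restored = dequantize(quantize(weights))
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(restored)
print(f"max abs error: {max_error:.4f}")
```

Each block stores only small integer codes plus one float scale, and the rounding error per weight is at most `scale / 14`; a large outlier in a block therefore costs precision for its neighbours, which is one motivation for small block sizes.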


In [None]:
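# Render each list of messages into a single prompt string with the tokenizer's chat template.
# This second pass is needed because the first pass only built the role/content messages;
# apply_chat_template adds Phi-3.5's special tokens (<|system|>, <|user|>, <|end|>, ...).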
def apply_chat_template(
    record,
    tokenizer,
):
    messages = record["messages"]
    record["text"] = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=False)
    return record

In [None]:
column_names = list(train_dataset.features)

processed_train_dataset = train_dataset.map(
    apply_chat_template,
    fn_kwargs={"tokenizer": tokenizer},
    # remove irrelevant columns
    remove_columns=column_names,
    num_proc=10,
    desc="Applying chat template to train_sft",
)

In [None]:
processed_test_dataset = test_dataset.map(
    apply_chat_template,
    fn_kwargs={"tokenizer": tokenizer},
    # remove irrelevant columns
    remove_columns=column_names,
    num_proc=10,
    desc="Applying chat template to test_sft",
)

In [None]:
# display samples from our processed datasets 
display_table(processed_train_dataset.select(range(1)))
display_table(processed_test_dataset.select(range(1)))

Unnamed: 0,text
0,"<|system|> You are a powerful cover letter generator. Generate a professional formal personalised cover letter based on job title, qualifications, hiring company name, applicant name, work experience, skills. The cover letter must be between 300 - 400 words.<|end|> <|user|> Generate Cover Letter using this information:  Job Title: Senior Java Developer, Preferred Qualifications: 5+ years of experience in Java Development, Hiring Company: Google, Applicant Name: John Doe, Past Working Experience: Java Developer at XYZ for 3 years, Current Working Experience: Senior Java Developer at ABC for 2 years, Skillsets:Java, Spring Boot, Hibernate, SQL, Qualifications: BSc in Computer Science<|end|> <|assistant|> I am writing to express my interest in the Senior Java Developer position at Google. With over 5 years of experience in Java development, I am confident in my ability to contribute effectively to your team. My professional experience includes designing and implementing Java applications, managing the full software development lifecycle, and troubleshooting and resolving technical issues. I also possess strong skills in Spring Boot, Hibernate and SQL. I am a diligent and dedicated professional, always looking to improve and learn new skills. I am excited about the opportunity to work with Google and contribute to your ongoing projects. I am certain that my skills and experience make me a strong candidate for this position.<|end|> <|endoftext|>"


Unnamed: 0,text
0,"<|system|> You are a powerful cover letter generator. Generate a professional formal personalised cover letter based on job title, qualifications, hiring company name, applicant name, work experience, skills. The cover letter must be between 300 - 400 words.<|end|> <|user|> Generate Cover Letter using this information:  Job Title: Data Scientist, Preferred Qualifications: BSc focused on data Science/computer Science/engineering 4+ years experience Developing and shipping production grade machine learning systems 2+ years building and shipping data Science based personalization services and recommendation systems experience in data Science or machine learning engineering Strong analytical and data Science skills, Hiring Company: XYZ Corporation, Applicant Name: John Smith, Past Working Experience: Data Analyst at ABC Company, Current Working Experience: Machine Learning Engineer at DEF Company, Skillsets:Python, R, scikit-learn, Keras, Tensorflow, Qualifications: BSc in Computer Science, 5+ years of experience in data science and machine learning<|end|> <|assistant|> Dear Hiring Manager, I am writing to express my interest in the Data Scientist position at XYZ Corporation. With my strong background in data science and machine learning, I believe I am well-suited for this role. In my previous role as a Data Analyst at ABC Company, I gained experience in identifying and engineering features for modeling. I also have a proven track record of evaluating various modeling techniques and developing models. Additionally, my current position as a Machine Learning Engineer at DEF Company has allowed me to collaborate with stakeholders and put models into production. I have a BSc in Computer Science and over 5 years of experience in data science and machine learning. I am proficient in Python, R, scikit-learn, Keras, and Tensorflow. I am eager to learn from others and contribute to the growth of the team. 
I am confident that my strong analytical and data science skills, along with my ability to work well in cross-functional teams, make me a valuable asset to XYZ Corporation. I am excited about the opportunity to contribute to the development of personalization services and recommendation systems. Thank you for considering my application. I look forward to the opportunity to discuss how my skills and qualifications align with the needs of XYZ Corporation. Sincerely, John Smith<|end|> <|endoftext|>"
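The rendered text above shows the structure the chat template produces. As a rough illustration, the sketch below rebuilds that structure by hand; the exact format is an assumption inferred from the output shown here, and in practice `tokenizer.apply_chat_template` should be used, since it reads the template shipped with the model.

```python
# Hand-written approximation of the Phi-3.5 chat format, inferred from the rendered
# samples above. Use tokenizer.apply_chat_template for real preprocessing.

ROLE_TAGS = {"system": "<|system|>", "user": "<|user|>", "assistant": "<|assistant|>"}


def render_phi35_chat(messages, add_generation_prompt=False, eos_token="<|endoftext|>"):
    parts = []
    for message in messages:
        parts.append(f"{ROLE_TAGS[message['role']]}\n{message['content']}<|end|>\n")
    # Either open an assistant turn for generation, or close the training example with EOS
    parts.append("<|assistant|>\n" if add_generation_prompt else eos_token)
    return "".join(parts)


example = [
    {"role": "system", "content": "You are a powerful cover letter generator."},
    {"role": "user", "content": "Job Title: Senior Java Developer, Applicant Name: John Doe"},
    {"role": "assistant", "content": "I am writing to express my interest in the Senior Java Developer position at Google."},
]
print(render_phi35_chat(example))
```

For training (`add_generation_prompt=False`) the sequence ends with the EOS token, matching the `<|endoftext|>` visible above; for inference, the prompt would stop after the user turn and open an empty `<|assistant|>` turn instead.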


## Test the base model before fine-tuning

In [None]:
# initialise a pipeline with the quantized base model 
pipeline = transformers.pipeline(model=model, tokenizer=tokenizer, task="text-generation")

# generation settings follow the Phi 3.5 model card 
generation_args = {
    "max_new_tokens": 700,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

messages = train_dataset[0]["messages"][0:2]
print(messages)

# generate from the system and user messages only (no reference assistant response)
with torch.no_grad():
    output = pipeline(messages, **generation_args)

print(output[0]['generated_text'])

#generated output in 1m 25.9s on GPU 

The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
`get_max_cache()` is deprecated for all Cache classes. Use `get_max_cache_shape()` instead. Calling `get_max_cache()` will raise error from v4.48
You are not running the flash-attention implementation, expect numerical differences.


[{'content': 'You are a powerful cover letter generator. Generate a professional formal personalised cover letter based on job title, qualifications, hiring company name, applicant name, work experience, skills. The cover letter must be between 300 - 400 words.', 'role': 'system'}, {'content': 'Generate Cover Letter using this information:\n        Job Title: Senior Java Developer, Preferred Qualifications: 5+ years of experience in Java Development, Hiring Company: Google, Applicant Name: John Doe, Past Working Experience: Java Developer at XYZ for 3 years, Current Working Experience: Senior Java Developer at ABC for 2 years, Skillsets:Java, Spring Boot, Hibernate, SQL, Qualifications: BSc in Computer Science', 'role': 'user'}]
 [John Doe]
[John's Address]
[City, State, Zip]
[Email Address]
[Phone Number]
[Date]

Hiring Manager
Google Inc.
[Google's Address]
[City, State, Zip]

Dear Hiring Manager,

I am writing to express my interest in the Senior Java Developer position at Google as

In [None]:
# test the model with the reference assistant message included 
messages = train_dataset[0]["messages"]
print(messages)

with torch.no_grad():
    output = pipeline(messages, **generation_args)

print(output[0]['generated_text'])

# generated in 1m 1.9s on GPU 

[{'content': 'You are a powerful cover letter generator. Generate a professional formal personalised cover letter based on job title, qualifications, hiring company name, applicant name, work experience, skills. The cover letter must be between 300 - 400 words.', 'role': 'system'}, {'content': 'Generate Cover Letter using this information:\n        Job Title: Senior Java Developer, Preferred Qualifications: 5+ years of experience in Java Development, Hiring Company: Google, Applicant Name: John Doe, Past Working Experience: Java Developer at XYZ for 3 years, Current Working Experience: Senior Java Developer at ABC for 2 years, Skillsets:Java, Spring Boot, Hibernate, SQL, Qualifications: BSc in Computer Science', 'role': 'user'}, {'content': 'I am writing to express my interest in the Senior Java Developer position at Google. With over 5 years of experience in Java development, I am confident in my ability to contribute effectively to your team. My professional experience includes desig

## Fine-tune the quantized model

In [None]:
# initialise the model trainer with our PEFT config 
trainer = SFTTrainer(
    model=model,
    args=train_conf,
    peft_config=peft_conf,
    train_dataset=processed_train_dataset,
    eval_dataset=processed_test_dataset,
    max_seq_length=2048,
    dataset_text_field="text",
    tokenizer=tokenizer,
    packing=True
)


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
Using auto half precision backend


In [None]:
# train our model 
with mlflow.start_run() as run:
    train_result = trainer.train()
    metrics = train_result.metrics
    trainer.log_metrics("train", metrics)
    trainer.save_metrics("train", metrics)
    trainer.save_state()

***** Running training *****
  Num examples = 7
  Num Epochs = 1
  Instantaneous batch size per device = 1
  Total train batch size (w. parallel, distributed & accumulation) = 1
  Gradient Accumulation steps = 1
  Total optimization steps = 7
  Number of trainable parameters = 562,176
100%|██████████| 7/7 [00:15<00:00,  1.88s/it]Saving model checkpoint to ./checkpoint_dir\checkpoint-7
tokenizer config file saved in ./checkpoint_dir\checkpoint-7\tokenizer_config.json
Special tokens file saved in ./checkpoint_dir\checkpoint-7\special_tokens_map.json


Training completed. Do not forget to share your model on huggingface.co/models =)


100%|██████████| 7/7 [00:16<00:00,  2.40s/it]


{'train_runtime': 16.8211, 'train_samples_per_second': 0.416, 'train_steps_per_second': 0.416, 'train_loss': 1.9283688408987862, 'epoch': 1.0}
***** train metrics *****
  epoch                    =        1.0
  total_flos               =   298255GF
  train_loss               =     1.9284
  train_runtime            = 0:00:16.82
  train_samples_per_second =      0.416
  train_steps_per_second   =      0.416


2024/11/11 01:50:33 INFO mlflow.tracking._tracking_service.client: 🏃 View run capable-quail-412 at: http://127.0.0.1:5000/#/experiments/352351548751262593/runs/89af066455ba479883e38e18391ab4cc.
2024/11/11 01:50:33 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:5000/#/experiments/352351548751262593.


In [None]:
# save trained model 
trainer.save_model("./saved_models")

Saving model checkpoint to ./saved_models
tokenizer config file saved in ./saved_models\tokenizer_config.json
Special tokens file saved in ./saved_models\special_tokens_map.json


## Log the fine-tuned quantized model 
Log the model to MLflow.

First, we define a signature so that MLflow knows what type of inputs the model accepts. Unfortunately, we could not pass the HF chat-template format (a list of role/content dicts) here, as MLflow only accepted a string-typed signature in this setup, so the rendered prompt strings are used instead.
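Because the signature is string-typed, structured applicant and job details have to be flattened into one prompt string before they are sent to the model. A minimal sketch, assuming a hypothetical helper `format_prompt` that produces a close (not byte-identical) version of the user prompt built in `apply_message_template`:

```python
# Hypothetical helper: flatten the structured fields into the single user-prompt
# string format used during training, so it fits a string-typed signature.

FIELDS = [
    "Job Title", "Preferred Qualifications", "Hiring Company", "Applicant Name",
    "Past Working Experience", "Current Working Experience", "Skillsets", "Qualifications",
]


def format_prompt(row: dict) -> str:
    details = ", ".join(f"{field}: {row[field]}" for field in FIELDS)
    return f"Generate Cover Letter using this information:\n{details}"


row = {
    "Job Title": "Senior Java Developer",
    "Preferred Qualifications": "5+ years of experience in Java Development",
    "Hiring Company": "Google",
    "Applicant Name": "John Doe",
    "Past Working Experience": "Java Developer at XYZ for 3 years",
    "Current Working Experience": "Senior Java Developer at ABC for 2 years",
    "Skillsets": "Java, Spring Boot, Hibernate, SQL",
    "Qualifications": "BSc in Computer Science",
}
print(format_prompt(row))
```

Keeping inference prompts in the same shape as the training prompts matters: the model was only ever fine-tuned on this field order and wording.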

In [None]:
# retrieve a sample input from our dataset 
sample = processed_test_dataset['text']

# register a signature 
signature = infer_signature(
    model_input=sample,
    model_output=sample,
    # save the generation configs we used 
    params=generation_args,
)

# display signature 
signature

inputs: 
  [string (required)]
outputs: 
  [string (required)]
params: 
  ['max_new_tokens': integer (default: 700), 'return_full_text': boolean (default: False), 'temperature': double (default: 0.0), 'do_sample': boolean (default: False)]

In [None]:
# retrieve run ID of our fine-tuned model 
last_run_id = mlflow.last_active_run().info.run_id

# Reload the tokenizer without the padding settings, which are only needed for training
tokenizer_no_pad = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")

prompt_template = """You are a powerful cover letter generator. Generate a professional formal personalised cover letter based on job title, qualifications, hiring company name, applicant name, work experience, skills. The cover letter must be between 300 - 400 words.
        {prompt}
         """

# log the model to MLflow 
with mlflow.start_run(run_id=last_run_id):
    mlflow.log_params(peft_config)
    mlflow.transformers.log_model(
        transformers_model={"model": trainer.model, "tokenizer": tokenizer_no_pad},
        prompt_template=prompt_template,
        signature=signature,
        artifact_path="model",  
    )

loading file tokenizer.model from cache at C:\Users\ROG\.cache\huggingface\hub\models--microsoft--Phi-3.5-mini-instruct\snapshots\af0dfb8029e8a74545d0736d30cb6b58d2f0f3f0\tokenizer.model
loading file tokenizer.json from cache at C:\Users\ROG\.cache\huggingface\hub\models--microsoft--Phi-3.5-mini-instruct\snapshots\af0dfb8029e8a74545d0736d30cb6b58d2f0f3f0\tokenizer.json
loading file added_tokens.json from cache at C:\Users\ROG\.cache\huggingface\hub\models--microsoft--Phi-3.5-mini-instruct\snapshots\af0dfb8029e8a74545d0736d30cb6b58d2f0f3f0\added_tokens.json
loading file special_tokens_map.json from cache at C:\Users\ROG\.cache\huggingface\hub\models--microsoft--Phi-3.5-mini-instruct\snapshots\af0dfb8029e8a74545d0736d30cb6b58d2f0f3f0\special_tokens_map.json
loading file tokenizer_config.json from cache at C:\Users\ROG\.cache\huggingface\hub\models--microsoft--Phi-3.5-mini-instruct\snapshots\af0dfb8029e8a74545d0736d30cb6b58d2f0f3f0\tokenizer_config.json
Special tokens have been added in t

## Attempt to run our fine-tuned model

In [None]:
# Load the model using the run ID found in the MLflow UI 
mlflow_model = mlflow.pyfunc.load_model("runs:/62da25f1c43c4623815db5e2d4e26cb8/model")

# We only pass the job and applicant details, since the system prompt is added by the prompt template.
test_dataset = load_dataset(dataset_name, split="test[:5%]")
sample = test_dataset[1]

# Note: the logged signature declares string input, so this dict may need to be
# flattened into a single prompt string before calling predict
model_input={
    "Job Title": sample['Job Title'],
    "Work Experience": f"{sample['Current Working Experience']},{sample['Past Working Experience']}",
    "Preferred Qualifications" : sample['Preferred Qualifications'], 
    "Qualifications" : sample['Qualifications'],
    "Hiring Company" : sample['Hiring Company'],
    "Applicant Name" : sample['Applicant Name'],
    "Skillsets" : sample['Skillsets'],        
}
# Inference parameters such as max_new_tokens fall back to the defaults stored in the model signature
generated_query = mlflow_model.predict(model_input)[0]
display_table({"prompt": model_input, "generated_query": generated_query})

## Conclusion and Next Steps 
I attempted to test the model. It loaded successfully, but inference took more than an hour and eventually timed out. I read about other users experiencing similarly slow inference after fine-tuning their models with PEFT. I also tried fine-tuning only a small part of the model with PEFT, but this was unsuccessful due to the limitations of my GPU.

Furthermore, I later discovered that a quantized model could not be used in the final product because of how quantization works in Hugging Face: the full model has to be downloaded and then quantized with libraries such as bitsandbytes, which require a GPU, and the coursework outline does not allow one. The model's memory footprint, which reached up to 10 GB, further highlights its unsuitability.
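A quick way to compare candidate models against a memory limit is to estimate the size of the weights alone from the parameter count and the bytes per parameter. The sketch assumes roughly 3.8B parameters, the size stated on the Phi-3.5-mini model card; activations, the KV cache and framework overhead come on top of these figures.

```python
# Back-of-the-envelope memory estimate for model weights only.
PARAMS = 3.8e9  # assumed parameter count (Phi-3.5-mini model card)
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1, "4-bit": 0.5}

estimates = {dtype: PARAMS * nbytes / 1024**3 for dtype, nbytes in BYTES_PER_PARAM.items()}
for dtype, gib in estimates.items():
    print(f"{dtype:>9}: ~{gib:.1f} GiB of weights")
```

The same arithmetic applied to a sub-1B model shows why smaller models are attractive for CPU-only deployment, even before considering inference speed.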

Hence, to address this and reduce inference time, I will first test other model pipelines exclusively on CPU to ensure that inference time is acceptable. I could limit my search to models with fewer than 1B parameters, but this could heavily reduce the quality of the model output.
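For the CPU comparison described above, a small timing harness keeps the measurements consistent across candidate pipelines. This is a minimal sketch: `generate_fn` is a hypothetical stand-in for any callable that takes a prompt and returns text (for example a `transformers` pipeline wrapped in a lambda), and the budget value is an arbitrary placeholder.

```python
# Minimal latency check for comparing candidate pipelines on CPU.
import statistics
import time


def time_inference(generate_fn, prompts, budget_seconds):
    durations = []
    for prompt in prompts:
        start = time.perf_counter()
        generate_fn(prompt)
        durations.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(durations),
        "max_s": max(durations),
        "within_budget": max(durations) <= budget_seconds,
    }


# Dummy generator so the sketch runs without downloading a model
def fake_generate(prompt):
    return prompt.upper()


report = time_inference(fake_generate, ["prompt one", "prompt two", "prompt three"], budget_seconds=60)
print(report)
```

Checking the worst case (`max_s`) against the budget, rather than only the median, catches the kind of very long generations that caused the time-out above.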