# Parameterisation and Prompt-tuning with Qwen2.5-0.5B 

This notebook was used to run MLflow experiments to evaluate the [`Qwen2.5-0.5B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) model.

For each run, the generation parameters (temperature, top_k, top_p) were changed according to one of two approaches: optimisation for analytical tasks or optimisation for more creative outputs. The analytical approach tested higher top-k/top-p values with lower temperatures, while the creative approach tested lower top-k/top-p values with higher temperatures. Each run was initially evaluated on the first 5 records of the test dataset. The sample size was kept this small because of hardware limitations and slow inference.

Once the two best-performing configurations had been identified by minimising two metrics, the automated readability index (ARI) and the Flesch-Kincaid grade level, they were re-evaluated on a larger dataset of 35 entries to check that performance was consistent across a variety of inputs. The best-performing configuration and prompt are given at the end of the notebook.
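
To make the three parameters concrete, here is a minimal sketch of how temperature, top-k and top-p narrow a next-token distribution. The function `filter_distribution` and the toy logits are hypothetical and written for illustration only; the sampling code inside `transformers` is more involved and may differ in details such as tie-breaking.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return the renormalised token probabilities that survive temperature,
    top-k and top-p (nucleus) filtering. Illustrative sketch only."""
    # Temperature: divide the logits before the softmax. Values below 1
    # sharpen the distribution, values above 1 flatten it.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(val - peak) for tok, val in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda pair: pair[1], reverse=True)

    # Top-k: keep only the k most probable tokens (0 disables the filter),
    # then renormalise before the top-p step.
    if top_k > 0:
        ranked = ranked[:top_k]
    mass = sum(prob for _, prob in ranked)
    ranked = [(tok, prob / mass) for tok, prob in ranked]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Renormalise so the surviving probabilities sum to 1.
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


# Hypothetical logits for four candidate next tokens.
toy_logits = {"experience": 2.0, "skills": 1.5, "passion": 0.5, "synergy": -1.0}

low_temp = filter_distribution(toy_logits, temperature=0.3, top_k=20, top_p=0.8)
high_temp = filter_distribution(toy_logits, temperature=1.5, top_k=20, top_p=0.8)
print("temperature 0.3:", low_temp)
print("temperature 1.5:", high_temp)
```

With the same top-k and top-p, the lower temperature leaves fewer candidate tokens, which is why temperature and the two cut-offs were varied together between runs.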

## Imports 

In [None]:
# data 
import datasets
from datasets import load_dataset
import pandas as pd
from IPython.display import HTML, display

# loading model and training 
import os
import torch
import transformers
from transformers import (AutoModelForCausalLM, 
                          AutoTokenizer, 
                          GenerationConfig,
                          pipeline)

# mlflow 
import mlflow
import datetime 

# model metrics and evaluation 
import tiktoken
import textstat
import json



## Load Dataset 

In [None]:
# credit -> https://mlflow.org/docs/latest/llms/transformers/tutorials/fine-tuning/transformers-peft.html  (Apache-2.0 license) 

# displays sample of dataset 
def displayTable(datasetOrSample):
    # A helper function to nicely display a Transformers dataset, or a single sample, that contains multi-line strings
    pd.set_option("display.max_colwidth", None)
    pd.set_option("display.width", None)
    pd.set_option("display.max_rows", None)

    if isinstance(datasetOrSample, dict):
        df = pd.DataFrame(datasetOrSample, index=[0])
    else:
        df = pd.DataFrame(datasetOrSample)

    html = df.to_html().replace("\\n", "<br>")
    styledHtml = f"""<style> .dataframe th, .dataframe tbody td {{ text-align: left; padding-right: 30px; }} </style> {html}"""
    display(HTML(styledHtml))


In [None]:
datasetName = "ShashiVish/cover-letter-dataset"

# we are only evaluating the model, so load just (part of) the test split 
testDataset = load_dataset(datasetName, split="test[:10%]")

In [None]:
print(f"Test dataset contains {len(testDataset)} cv-to-coverletter pairs")
columnNames = list(testDataset.features)
print(columnNames)

Test dataset contains 35 cv-to-coverletter pairs
['Job Title', 'Preferred Qualifications', 'Hiring Company', 'Applicant Name', 'Past Working Experience', 'Current Working Experience', 'Skillsets', 'Qualifications', 'Cover Letter']


## Organise and Format Dataset

Convert the dataset into the chat-template format, following the Hugging Face documentation: https://huggingface.co/docs/transformers/en/chat_templating

An example of the chat templating format: 
>`messages = [ ` \
> `    {"role": "user", "content": "Hi there!"},`  \
> `    {"role": "assistant", "content": "Nice to meet you!"},`\
>`    {"role": "user", "content": "Can I ask a question?"}`\
>`]`
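
For a sense of what a chat template does with such a list, the sketch below renders messages by hand in a ChatML-style layout, the layout that Qwen2.5's template is based on. The function `render_chatml` is a hypothetical, simplified stand-in (no default system prompt, no tool handling); in practice the tokenizer's `apply_chat_template` method performs this step.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render chat messages in a ChatML-style layout (simplified sketch)."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example_messages = [
    {"role": "system", "content": "You are a helpful assistant who writes tailored Cover Letters."},
    {"role": "user", "content": "Generate Cover Letter using this information: ..."},
]
prompt = render_chatml(example_messages)
print(prompt)
```

Passing only the system and user messages, and ending with an open assistant turn, is what lets the model write the cover letter itself instead of seeing the reference answer.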

In [None]:
def applyMessageTemplate(row): 
    messages = [
        # system prompt 
        {"content": 
         "You are a helpful assistant who writes tailored Cover Letters.", 
         "role": "system"},
         # Format dataset fields into the user prompt 
        {"content": 
        f"""Generate Cover Letter using this information:
        Job Title: {row['Job Title']}, Preferred Qualifications: {row['Preferred Qualifications']}, Hiring Company: {row['Hiring Company']}, Applicant Name: {row['Applicant Name']}, Past Working Experience: {row['Past Working Experience']}, Current Working Experience: {row['Current Working Experience']}, Skillsets:{row['Skillsets']}, Qualifications: {row['Qualifications']}""",
        "role" : "user"},
        # ideal response from assistant 
        {"content": f"{row['Cover Letter']}", "role":"assistant"}
    ] 
    return {"messages":messages} 

In [None]:
# transform dataset to chat templating 
testDataset = testDataset.map(applyMessageTemplate,
                                  remove_columns=columnNames)

# display our transformed dataset 
displayTable(testDataset.select(range(1)))

Unnamed: 0,messages
0,"[{'content': 'You are a helpful assistant who writes tailored Cover Letters.', 'role': 'system'}, {'content': 'Generate Cover Letter using this information:Job Title: Data Scientist, Preferred Qualifications: BSc focused on data Science/computer Science/engineering 4+ years experience Developing and shipping production grade machine learning systems 2+ years building and shipping data Science based personalization services and recommendation systems experience in data Science or machine learning engineering Strong analytical and data Science skills, Hiring Company: XYZ Corporation, Applicant Name: John Smith, Past Working Experience: Data Analyst at ABC Company, Current Working Experience: Machine Learning Engineer at DEF Company, Skillsets:Python, R, scikit-learn, Keras, Tensorflow, Qualifications: BSc in Computer Science, 5+ years of experience in data science and machine learning', 'role': 'user'}, {'content': 'Dear Hiring Manager, I am writing to express my interest in the Data Scientist position at XYZ Corporation. With my strong background in data science and machine learning, I believe I am well-suited for this role. In my previous role as a Data Analyst at ABC Company, I gained experience in identifying and engineering features for modeling. I also have a proven track record of evaluating various modeling techniques and developing models. Additionally, my current position as a Machine Learning Engineer at DEF Company has allowed me to collaborate with stakeholders and put models into production. I have a BSc in Computer Science and over 5 years of experience in data science and machine learning. I am proficient in Python, R, scikit-learn, Keras, and Tensorflow. I am eager to learn from others and contribute to the growth of the team. I am confident that my strong analytical and data science skills, along with my ability to work well in cross-functional teams, make me a valuable asset to XYZ Corporation. 
I am excited about the opportunity to contribute to the development of personalization services and recommendation systems. Thank you for considering my application. I look forward to the opportunity to discuss how my skills and qualifications align with the needs of XYZ Corporation. Sincerely, John Smith', 'role': 'assistant'}]"


## Load Model and set up MLflow tracking

Initialise MLflow tracking and set up the experiment and run name.

In [None]:
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("Qwen 0.5B Instruct parameterisation")
runName = f"Quen2.5-0.5B-Instruct-hog-optConfig-{(datetime.datetime.now()).strftime('%Y-%m-%d-%H:%M:%S')}"

Load in Qwen2.5-0.5B Model through HF transformers pipeline. 

In [None]:
modelName = "Qwen/Qwen2.5-0.5B-Instruct"

pipe = pipeline("text-generation", modelName, torch_dtype="auto", device_map="auto")
pipe.tokenizer.padding_side="left"


Test out inference with model's default parameters. 

In [None]:
# take the first record from the dataset, dropping the final assistant message (the reference answer)
messageBatch = testDataset[0]['messages'][0:2]
# a list of message dictionaries: the system prompt and the user prompt 
print(messageBatch)

resultBatch = pipe(messageBatch, max_new_tokens=512, batch_size=2)
# 3m 51.4s on cpu 

[{'content': 'You are a helpful assistant who writes tailored Cover Letters.', 'role': 'system'}, {'content': 'Generate Cover Letter using this information:Job Title:  Data Scientist, Preferred Qualifications: BSc focused on data Science/computer Science/engineering\n4+ years experience Developing and shipping production grade machine learning systems\n2+ years building and shipping data Science based personalization services and recommendation systems\nexperience in data Science or machine learning engineering\nStrong analytical and data Science skills, Hiring Company: XYZ Corporation, Applicant Name: John Smith, Past Working Experience: Data Analyst at ABC Company, Current Working Experience: Machine Learning Engineer at DEF Company, Skillsets:Python, R, scikit-learn, Keras, Tensorflow, Qualifications: BSc in Computer Science, 5+ years of experience in data science and machine learning', 'role': 'user'}]


In [None]:
# model output 
print(resultBatch[0]['generated_text'][2]['content'])

Dear Hiring Manager,

I am writing to express my interest in the Data Scientist position at XYZ Corporation. With over four years of experience developing and shipping production-grade machine learning systems and two years building and shipping data science-based personalization services and recommendation systems, I am confident that my background and skills make me an ideal candidate for this role.

As a Data Scientist with a strong analytical and data science skillset, I have extensive experience in working with Python, R, scikit-learn, Keras, and TensorFlow. My previous work as a Data Analyst at ABC Company has honed my skills in data visualization and statistical analysis, which I believe would be valuable in a Data Scientist role.

In addition to my technical expertise, I possess strong communication and collaboration skills, which I bring to the team through my past experience in machine learning engineering and data science projects. My ability to work collaboratively with cro

## Evaluating the Model 

In [None]:
# retrieve the generation configuration of the pipeline 
genConfig = pipe.generation_config

# save the gen configs so best performing model can be retrieved easily
# get path of current directory 
currPath = os.getcwd()  
configFileName = "Quen2.5-0.5B-Instruct-hog-Config.json"
# Change temp, top_k, top_p for different runs and experiments 
genConfig = genConfig.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", 
                                        max_new_tokens=512, 
                                        batch_size=2, 
                                        temperature=0.7,
                                        top_k=20,
                                        top_p=0.8,)

# save changed config as json 
genConfig.save_pretrained('qwen_config', config_file_name=configFileName)


Saving the full generation configuration not only helps us track which parameters change in each run, but also makes it easier to retrieve and record the configuration once we find the best-performing one.

The configuration is saved to a JSON file and read back because `pipe.generation_config` returns a `GenerationConfig` object rather than a dictionary, while the logging step below passes the configuration as a dictionary. (`GenerationConfig` also provides a `to_dict()` method, which would avoid the round trip through disk.)
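
The save-and-reload round trip can be sketched with the standard library alone. In this sketch a plain dictionary stands in for the `GenerationConfig` object, and the temporary directory and file name are placeholders, not the paths used in this notebook.

```python
import json
import tempfile
from pathlib import Path

# A plain dict standing in for the GenerationConfig values set above.
config = {"max_new_tokens": 512, "temperature": 0.7, "top_k": 20, "top_p": 0.8}

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "example-config.json"  # placeholder file name
    config_path.write_text(json.dumps(config, indent=2, sort_keys=True))
    # Reading the file back yields an ordinary dict.
    reloaded = json.loads(config_path.read_text())

# Keep only the sampling parameters that change between runs.
tuned = {key: reloaded[key] for key in ("temperature", "top_k", "top_p")}
print(tuned)
```

Keeping the file on disk also leaves a record of each run's settings alongside the tracked metrics.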

In [None]:
# Open JSON file to retrieve config data 
with open(f"qwen_config/{configFileName}") as jsonFile:
    configData = json.load(jsonFile)

print(configData)


{'batch_size': 2, 'bos_token_id': 151643, 'do_sample': True, 'eos_token_id': [151645, 151643], 'max_new_tokens': 512, 'pad_token_id': 151643, 'repetition_penalty': 1.1, 'temperature': 0.7, 'top_k': 20, 'top_p': 0.8, 'transformers_version': '4.46.2'}


Reformat the model inputs and expected outputs into a pandas DataFrame. The MLflow [tutorial](https://mlflow.org/docs/latest/llms/llm-evaluate/index.html) specifies the format that MLflow accepts for evaluation:


<code>evalData = pd.DataFrame( 
    {
        "inputs": [
            "What is MLflow?",
            "What is Spark?",
        ],
        "ground_truth": [
            "MLflow is an open-source platform for managing the end-to-end machine learning (ML) ",
            "Apache Spark is an open-source, distributed computing system designed for big data ",
        ],
    }
)</code>


In [None]:
# model inputs 
inputs = [record['messages'][1]['content'] for record in testDataset]
# expected outputs (reference cover letters) 
groundTruth = [record['messages'][2]['content'] for record in testDataset]

# reformat testDataset 
evalData = pd.DataFrame(
    {
        "inputs": inputs,
        "ground_truth": groundTruth,
    }
)

In [None]:
# ensure that this is the system prompt we want to test with 
systemPrompt = testDataset[0]['messages'][0]
print(systemPrompt)

{'content': 'You are a helpful assistant who writes tailored Cover Letters.', 'role': 'system'}


In [None]:
with mlflow.start_run():
    # log our model with our configs 
    loggedModelInfo = mlflow.transformers.log_model(
        transformers_model=pipe,
        artifact_path="model",
        task = "text-generation",
        model_config=configData,
        # HF Chat templating for model input 
        messages = [
            systemPrompt,
            {"role":"user", "content":"{input}"}
        ]  
    )

    # start evaluation 
    # Use predefined text metrics to evaluate our model.
    results= mlflow.evaluate(
        # retrieve logged model 
        loggedModelInfo.model_uri,
        # pass in the test dataset inputs and expected outputs for MLFlow 
        evalData,
        targets="ground_truth",
        model_type="text",
    )

    print(f"See aggregated evaluation results below: \n{results.metrics}")

    # Evaluation result for each data record is available in `results.tables`.
    evalTable = results.tables["eval_results_table"]
    print(f"See evaluation table below: \n{evalTable}")

Downloading artifacts: 100%|██████████| 18/18 [04:28<00:00, 14.89s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00,  3.33it/s]
2024/11/12 02:56:46 INFO mlflow.models.evaluation.default_evaluator: Computing model predictions.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.

See aggregated evaluation results below: 
{'toxicity/v1/mean': np.float64(0.006326894655025431), 'toxicity/v1/variance': np.float64(2.7198205797484146e-05), 'toxicity/v1/p90': np.float64(0.01205590143799782), 'toxicity/v1/ratio': 0.0, 'flesch_kincaid_grade_level/v1/mean': np.float64(15.962857142857144), 'flesch_kincaid_grade_level/v1/variance': np.float64(5.499477551020408), 'flesch_kincaid_grade_level/v1/p90': np.float64(17.44), 'ari_grade_level/v1/mean': np.float64(18.648571428571426), 'ari_grade_level/v1/variance': np.float64(9.728783673469389), 'ari_grade_level/v1/p90': np.float64(21.16)}


Downloading artifacts: 100%|██████████| 1/1 [00:00<00:00, 25.84it/s]
2024/11/12 04:04:33 INFO mlflow.tracking._tracking_service.client: 🏃 View run gentle-chimp-268 at: http://127.0.0.1:5000/#/experiments/504111313041053431/runs/ad270994a1f74291aba5fcc399405cd3.
2024/11/12 04:04:33 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://127.0.0.1:5000/#/experiments/504111313041053431.


See evaluation table below: 

This notebook's process was used to run more than 10 experiments to find the most suitable configuration. The configuration that achieved the best average scores on the automated readability index (ARI) and the Flesch-Kincaid grade level was:
- Temperature: 0.7
- Top_k: 20
- Top_p: 0.8

This was a more analytical configuration.
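
For reference, both readability metrics are simple formulas over character, word, sentence and syllable counts. The sketch below implements the standard formulas with a deliberately naive tokeniser and syllable counter (the helpers `count_syllables` and `readability` are hypothetical), so its scores will not match those of a library implementation such as `textstat` exactly.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability(text):
    """Return (Flesch-Kincaid grade level, ARI) for a piece of English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    # ARI counts letters and digits only.
    characters = sum(ch.isalnum() for word in words for ch in word)
    syllables = sum(count_syllables(word) for word in words)

    words_per_sentence = len(words) / len(sentences)
    fk_grade = 0.39 * words_per_sentence + 11.8 * (syllables / len(words)) - 15.59
    ari = 4.71 * (characters / len(words)) + 0.5 * words_per_sentence - 21.43
    return fk_grade, ari


simple_text = "I like cats. They are nice."
letter_text = ("I am writing to express my interest in the Data Scientist "
               "position at XYZ Corporation.")
fk_simple, ari_simple = readability(simple_text)
fk_letter, ari_letter = readability(letter_text)
print(f"simple: FK={fk_simple:.2f}, ARI={ari_simple:.2f}")
print(f"letter: FK={fk_letter:.2f}, ARI={ari_letter:.2f}")
```

Both scores rise with longer sentences and longer words, so a lower value indicates text that is easier to read.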