# Prompt Tuning Qwen2.5 0.5B

Why prompt tune? 
- Faster adaptation to new tasks, since only the prompt changes and the model weights stay untouched
- A flexible way to steer one model towards specific tasks

Objectives of a good prompt: 
- Prioritise information 
- Use clear and concise language 
- Remove unnecessary information 
- Use positive language 
- Specify constraints 
- Provide context 
- Test for clarity 
- Avoid overly long prompts 


References: 
1) https://medium.com/@shahshreyansh20/prompt-tuning-a-powerful-technique-for-adapting-llms-to-new-tasks-6d6fd9b83557

## Imports 

In [2]:
# data 
import torch 
from datasets import load_dataset
import pandas as pd
from IPython.display import HTML, display

# loading model and training 
from transformers import pipeline


## Load Dataset

In [3]:
# credit -> https://mlflow.org/docs/latest/llms/transformers/tutorials/fine-tuning/transformers-peft.html  (Apache-2.0 license) 

# displays sample of dataset 
def displayTable(datasetOrSample):
    # Helper to display a Hugging Face dataset (or a single sample) whose fields contain multi-line strings
    pd.set_option("display.max_colwidth", None)
    pd.set_option("display.width", None)
    pd.set_option("display.max_rows", None)

    if isinstance(datasetOrSample, dict):
        df = pd.DataFrame(datasetOrSample, index=[0])
    else:
        df = pd.DataFrame(datasetOrSample)

    html = df.to_html().replace("\\n", "<br>")
    styledHtml = f"""<style> .dataframe th, .dataframe tbody td {{ text-align: left; padding-right: 30px; }} </style> {html}"""
    display(HTML(styledHtml))


In [4]:
datasetName = "ShashiVish/cover-letter-dataset"

# we are only evaluating the model, so load just the first 10% of the test split 
testDataset = load_dataset(datasetName, split="test[:10%]")

In [5]:
print(f"Test dataset contains {len(testDataset)} cv-to-coverletter pairs")
columnNames = list(testDataset.features)
print(columnNames)

Test dataset contains 35 cv-to-coverletter pairs
['Job Title', 'Preferred Qualifications', 'Hiring Company', 'Applicant Name', 'Past Working Experience', 'Current Working Experience', 'Skillsets', 'Qualifications', 'Cover Letter']


Convert the dataset into the chat-template message format, following the Transformers documentation: https://huggingface.co/docs/transformers/en/chat_templating

An example of the chat message format: 
>`messages = [ ` \
> `    {"role": "user", "content": "Hi there!"},`  \
> `    {"role": "assistant", "content": "Nice to meet you!"},`\
>`    {"role": "user", "content": "Can I ask a question?"}`\
>`]`
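
To make the format concrete, here is a minimal sketch of how a message list might be flattened into one prompt string. It assumes the ChatML-style layout (`<|im_start|>` and `<|im_end|>` markers) used by Qwen chat models. `render_chatml` is a hypothetical helper written for illustration only; the tokenizer's own `apply_chat_template` method remains the authoritative implementation, and the real template may differ in details such as a default system message.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate a ChatML-style chat template (illustrative only)."""
    parts = []
    for message in messages:
        # Each turn is wrapped in start/end markers, with the role on the first line.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from this point.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi there!"},
]
print(render_chatml(messages))
```

The sketch shows why the order and the `role` of each message matter: the model only ever sees the concatenated string.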

In [6]:
def applyMessageTemplate(row): 
    messages = [
         # Format the dataset fields into a single user prompt 
        {"content": 
        f"""Generate Cover Letter using this information:
        Job Title: {row['Job Title']}, Preferred Qualifications: {row['Preferred Qualifications']}, Hiring Company: {row['Hiring Company']}, Applicant Name: {row['Applicant Name']}, Past Working Experience: {row['Past Working Experience']}, Current Working Experience: {row['Current Working Experience']}, Skillsets:{row['Skillsets']}, Qualifications: {row['Qualifications']}""",
        "role" : "user"},
    ] 
    return {"messages":messages} 

In [7]:
# transform the dataset into the chat message format 
testDataset = testDataset.map(applyMessageTemplate,
                                  remove_columns=columnNames)

# display our transformed dataset 
displayTable(testDataset.select(range(1)))

|   | messages |
|---|---|
| 0 | [{'content': 'Generate Cover Letter using this information:  Job Title: Data Scientist, Preferred Qualifications: BSc focused on data Science/computer Science/engineering 4+ years experience Developing and shipping production grade machine learning systems 2+ years building and shipping data Science based personalization services and recommendation systems experience in data Science or machine learning engineering Strong analytical and data Science skills, Hiring Company: XYZ Corporation, Applicant Name: John Smith, Past Working Experience: Data Analyst at ABC Company, Current Working Experience: Machine Learning Engineer at DEF Company, Skillsets:Python, R, scikit-learn, Keras, Tensorflow, Qualifications: BSc in Computer Science, 5+ years of experience in data science and machine learning', 'role': 'user'}] |


In [8]:
modelName = "Qwen/Qwen2.5-0.5B-Instruct"

# define the default pipeline 
pipe = pipeline("text-generation", modelName, torch_dtype="auto", device_map="auto")
pipe.tokenizer.padding_side="left"

# Testing Prompts 
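
Every experiment in this section pairs a different system prompt with the same user message from the dataset. A minimal sketch of that pattern follows. `build_conversation` and `fake_generate` are hypothetical names, and the stub stands in for the real `pipe(...)` call so that the snippet runs without loading a model.

```python
def build_conversation(system_prompt, user_message):
    """Return a two-turn conversation: system instruction, then the user request."""
    return [
        {"role": "system", "content": system_prompt},
        user_message,
    ]


def fake_generate(conversation):
    # Stand-in for pipe(conversation, max_new_tokens=512); a real run would
    # return the model's reply instead of this placeholder text.
    return f"(reply to {len(conversation)} messages)"


# Placeholder user turn; in the notebook this comes from testDataset[0]['messages'][0].
user_message = {"role": "user", "content": "Generate Cover Letter using this information: ..."}
candidate_prompts = [
    "Generate a cover letter",
    "You are a helpful assistant who writes tailored Cover Letters.",
]

for prompt in candidate_prompts:
    conversation = build_conversation(prompt, user_message)
    print(prompt, "->", fake_generate(conversation))
```

Keeping the user message fixed and varying only the system prompt makes the comparisons between prompts easier to interpret.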

## 1. "Generate a cover letter"

In [None]:
# get first record from dataset (we are inferencing without the last assistant prompt)
systemPrompt = {
    'content' : "Generate a cover letter", 
    'role' : 'system', 
}
messageBatch = [systemPrompt, testDataset[0]['messages'][0]]
# print the conversation (a list of chat-message dicts) 
print(messageBatch)

# generate 
resultBatch = pipe(messageBatch, max_new_tokens=512, batch_size=2)


[{'content': 'Generate a cover letter', 'role': 'system'}, {'content': 'Generate Cover Letter using this information:\n        Job Title:  Data Scientist, Preferred Qualifications: BSc focused on data Science/computer Science/engineering\n4+ years experience Developing and shipping production grade machine learning systems\n2+ years building and shipping data Science based personalization services and recommendation systems\nexperience in data Science or machine learning engineering\nStrong analytical and data Science skills, Hiring Company: XYZ Corporation, Applicant Name: John Smith, Past Working Experience: Data Analyst at ABC Company, Current Working Experience: Machine Learning Engineer at DEF Company, Skillsets:Python, R, scikit-learn, Keras, Tensorflow, Qualifications: BSc in Computer Science, 5+ years of experience in data science and machine learning', 'role': 'user'}]


In [None]:
print(resultBatch[0]['generated_text'][-1]['content'])

Dear Hiring Manager,

I am writing to express my enthusiasm for the Data Scientist position at XYZ Corporation. With over four years of experience in developing and shipping production-grade machine learning systems and building and shipping data science-based personalized services and recommendation systems, I have honed my analytical and data science skills.

My academic background has provided me with a strong foundation in both computer science and engineering, which is crucial for the field of data science and machine learning. Additionally, my past work experience as a data analyst at ABC Company and current role as a Machine Learning Engineer at DEF Company have equipped me with the necessary technical skills to deliver high-quality solutions.

In my previous roles, I have leveraged Python, R, Scikit-Learn, Keras, and TensorFlow to develop robust models that are capable of handling complex data sets. My expertise in these technologies allows me to build scalable and efficient pr

This prompt is clear and concise. However, it gives the model little context for the information supplied by the user. Over 10 runs, the average inference time was 4m 51.4s on CPU, which is not the fastest of the prompts tested. The output also seems to simply list the data given. 
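
The notebook does not show how the average was measured, so the following is only a sketch of one way to do it. `average_seconds` and `format_minutes` are hypothetical helpers, and the lambda is a stand-in workload; in the notebook the callable would wrap the `pipe(...)` call.

```python
import time


def average_seconds(run_once, repeats=10):
    """Call run_once() `repeats` times and return the mean wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)


def format_minutes(seconds):
    """Format seconds in the 'Xm Y.Ys' style used in these notes."""
    minutes, remainder = divmod(seconds, 60)
    return f"{int(minutes)}m {remainder:.1f}s"


# Stand-in workload; replace with e.g. lambda: pipe(messageBatch, max_new_tokens=512)
mean = average_seconds(lambda: sum(range(10_000)), repeats=10)
print(format_minutes(mean))
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring elapsed time.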

## 2. "You are a powerful cover letter generator. Generate a cover letter."

In [None]:
# get first record from dataset (we are inferencing without the last assistant prompt)
# You are a powerful cover letter generator. Generate a professional formal personalised cover letter based on job title, qualifications, hiring company name, applicant name, work experience, skills. The cover letter must be between 300 - 400 words.
systemPrompt = {
    'content' : "You are a powerful cover letter generator. Generate a cover letter.", 
    'role' : 'system', 
}
messageBatch = [systemPrompt, testDataset[0]['messages'][0]]
# print the conversation (a list of chat-message dicts) 
print(messageBatch)

# generate 
resultBatch = pipe(messageBatch, max_new_tokens=512, batch_size=2)

# print output 
print(resultBatch[0]['generated_text'][-1]['content'])

[{'content': 'You are a powerful cover letter generator. Generate a cover letter.', 'role': 'system'}, {'content': 'Generate Cover Letter using this information:\n        Job Title:  Data Scientist, Preferred Qualifications: BSc focused on data Science/computer Science/engineering\n4+ years experience Developing and shipping production grade machine learning systems\n2+ years building and shipping data Science based personalization services and recommendation systems\nexperience in data Science or machine learning engineering\nStrong analytical and data Science skills, Hiring Company: XYZ Corporation, Applicant Name: John Smith, Past Working Experience: Data Analyst at ABC Company, Current Working Experience: Machine Learning Engineer at DEF Company, Skillsets:Python, R, scikit-learn, Keras, Tensorflow, Qualifications: BSc in Computer Science, 5+ years of experience in data science and machine learning', 'role': 'user'}]
[Your Name]  
[Your Address]  
[City, State, Zip Code]  
[Email A

This prompt gives the model positive affirmation, although that could count as unnecessary information. The average inference time was 6m 11.3s on CPU. The output contained the addresses of the recipient and the employer. Because the cover letter may be used in different contexts, these are likely unneeded overhead and an unnecessary addition to the word count. The outputs also persistently include the sentence "Feel free to modify any part of this template to better fit your specific situation and preferences", which is not wanted. On the other hand, the output uses some of the provided statistics. It averages around 362 words.
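
Two of the checks described here, the average word count and the presence of an unwanted sentence, are easy to automate. The sketch below assumes that whitespace-separated tokens are an acceptable approximation of words. The function names and the sample strings are hypothetical and are not real model outputs.

```python
UNWANTED_PHRASES = [
    "Feel free to modify any part of this template",
]


def word_count(text):
    """Count whitespace-separated tokens."""
    return len(text.split())


def average_word_count(outputs):
    """Mean word count across several generated letters."""
    return sum(word_count(text) for text in outputs) / len(outputs)


def find_unwanted(text, phrases=UNWANTED_PHRASES):
    """Return the unwanted phrases that appear in text (case-insensitive)."""
    lowered = text.lower()
    return [phrase for phrase in phrases if phrase.lower() in lowered]


# Hypothetical sample outputs, not real model generations.
samples = [
    "Dear Hiring Manager, I am excited to apply.",
    "Sincerely, John Smith. Feel free to modify any part of this template to better fit your needs.",
]
print(average_word_count(samples))
print([find_unwanted(text) for text in samples])
```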

## 3. "You are a powerful cover letter generator. Generate a formal cover letter."

In [None]:
# get first record from dataset (we are inferencing without the last assistant prompt)
# You are a powerful cover letter generator. Generate a professional formal personalised cover letter based on job title, qualifications, hiring company name, applicant name, work experience, skills. The cover letter must be between 300 - 400 words.
systemPrompt = {
    'content' : "You are a powerful cover letter generator. Generate a formal cover letter.", 
    'role' : 'system', 
}
messageBatch = [systemPrompt, testDataset[0]['messages'][0]]
# print the conversation (a list of chat-message dicts) 
print(messageBatch)

# generate 
resultBatch = pipe(messageBatch, max_new_tokens=512, batch_size=2)

# print output 
print('\n')
print(resultBatch[0]['generated_text'][-1]['content']) 

[{'content': 'You are a powerful cover letter generator. Generate a formal cover letter.', 'role': 'system'}, {'content': 'Generate Cover Letter using this information:\n        Job Title:  Data Scientist, Preferred Qualifications: BSc focused on data Science/computer Science/engineering\n4+ years experience Developing and shipping production grade machine learning systems\n2+ years building and shipping data Science based personalization services and recommendation systems\nexperience in data Science or machine learning engineering\nStrong analytical and data Science skills, Hiring Company: XYZ Corporation, Applicant Name: John Smith, Past Working Experience: Data Analyst at ABC Company, Current Working Experience: Machine Learning Engineer at DEF Company, Skillsets:Python, R, scikit-learn, Keras, Tensorflow, Qualifications: BSc in Computer Science, 5+ years of experience in data science and machine learning', 'role': 'user'}]


[Your Name]  
[Your Address]  
[City, State, Zip Code]  

This prompt is similar to [our last prompt](#2.-"You-are-a-powerful-cover-letter-generator.-Generate-a-cover-letter."), except that it asks for a formal cover letter.  
The output is more formal, but the model failed to insert the provided data into its output. For example, it produced "[ABC Company]" instead of ABC Company. 

It averaged 5m 38.3s on CPU and 320 words. It used the statistics from the given context and emphasised them with action verbs. The output also contains addresses. 
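
Unfilled placeholders such as "[ABC Company]" can be flagged automatically. The following sketch uses a regular expression that matches square-bracketed text. `find_placeholders` is a hypothetical helper, and the pattern assumes that a placeholder never spans more than one line.

```python
import re

# Matches text in square brackets on a single line, e.g. "[Your Name]".
PLACEHOLDER_PATTERN = re.compile(r"\[[^\[\]\n]+\]")


def find_placeholders(text):
    """Return every bracketed placeholder found in a generated letter."""
    return PLACEHOLDER_PATTERN.findall(text)


letter = "[Your Name]\nI worked as a Data Analyst at [ABC Company] before joining DEF Company."
print(find_placeholders(letter))  # ['[Your Name]', '[ABC Company]']
```

A non-empty result suggests that the model produced a template rather than a letter filled in with the applicant's details.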

## 4. "You are a powerful cover letter generator. Generate a professional formal personalised cover letter based on job title, qualifications, hiring company name, applicant name, work experience, skills."

In [None]:
# get first record from dataset (we are inferencing without the last assistant prompt)
systemPrompt = {
    'content' : "You are a powerful cover letter generator. Generate a professional formal personalised cover letter based on job title, qualifications, hiring company name, applicant name, work experience, skills.", 
    'role' : 'system', 
}
messageBatch = [systemPrompt, testDataset[0]['messages'][0]]
# print the conversation (a list of chat-message dicts) 
print(messageBatch)

# generate 
resultBatch = pipe(messageBatch, max_new_tokens=512, batch_size=2)

# print output 
print('\n')
print(resultBatch[0]['generated_text'][-1]['content'])

# 6m 25.4s on CPU 
# talks about soft skills 
# uses company names without brackets 
# has name and address 
# 363 words avg. 

[{'content': 'You are a powerful cover letter generator. Generate a professional formal personalised cover letter based on job title, qualifications, hiring company name, applicant name, work experience, skills.', 'role': 'system'}, {'content': 'Generate Cover Letter using this information:\n        Job Title:  Data Scientist, Preferred Qualifications: BSc focused on data Science/computer Science/engineering\n4+ years experience Developing and shipping production grade machine learning systems\n2+ years building and shipping data Science based personalization services and recommendation systems\nexperience in data Science or machine learning engineering\nStrong analytical and data Science skills, Hiring Company: XYZ Corporation, Applicant Name: John Smith, Past Working Experience: Data Analyst at ABC Company, Current Working Experience: Machine Learning Engineer at DEF Company, Skillsets:Python, R, scikit-learn, Keras, Tensorflow, Qualifications: BSc in Computer Science, 5+ years of ex

- Long prompt that gives context for the data fields supplied by the user 
- Mentions addresses, which is unneeded overhead 
- Uses given statistics 
- 6m 25.4s on CPU avg. 
- Talks about soft skills 
- 363 words avg. 

## 5. "You are a helpful assistant who writes tailored Cover Letters."

In [None]:
# get first record from dataset (we are inferencing without the last assistant prompt)
systemPrompt = {
    'content' : "You are a helpful assistant who writes tailored Cover Letters.", 
    'role' : 'system', 
}
messageBatch = [systemPrompt, testDataset[0]['messages'][0]]
# print the conversation (a list of chat-message dicts) 
print(messageBatch)

# generate 
resultBatch = pipe(messageBatch, max_new_tokens=512, batch_size=2)

# print output 
print('\n')
print(resultBatch[0]['generated_text'][-1]['content'])

[{'content': 'You are a helpful assistant who writes tailored Cover Letters.', 'role': 'system'}, {'content': 'Generate Cover Letter using this information:\n        Job Title:  Data Scientist, Preferred Qualifications: BSc focused on data Science/computer Science/engineering\n4+ years experience Developing and shipping production grade machine learning systems\n2+ years building and shipping data Science based personalization services and recommendation systems\nexperience in data Science or machine learning engineering\nStrong analytical and data Science skills, Hiring Company: XYZ Corporation, Applicant Name: John Smith, Past Working Experience: Data Analyst at ABC Company, Current Working Experience: Machine Learning Engineer at DEF Company, Skillsets:Python, R, scikit-learn, Keras, Tensorflow, Qualifications: BSc in Computer Science, 5+ years of experience in data science and machine learning', 'role': 'user'}]


Dear Hiring Manager,

I am writing to express my strong interest in 

- 3m 47.2s on CPU avg. 
- Second-quickest inference time (only the last prompt tested, at 3m 31.6s, was quicker) 
- Does not mention addresses 
- Uses given statistics 
- Mentions soft skills 
- May hallucinate experiences 
- 251 words avg. 

## 6. "You are a helpful assistant who writes tailored Cover Letters using the information given. The letter must be 250-450 words long."

In [16]:
# get first record from dataset (we are inferencing without the last assistant prompt)
systemPrompt = {
    'content' : "You are a helpful assistant who writes tailored Cover Letters using the information given. The letter must be 250-450 words long.", 
    'role' : 'system', 
}

messageBatch = [systemPrompt, testDataset[0]['messages'][0]]
# print the conversation (a list of chat-message dicts) 
print(messageBatch)

# generate 
resultBatch = pipe(messageBatch, max_new_tokens=512, batch_size=2)

# print output 
print('\n')
print(resultBatch[0]['generated_text'][-1]['content'])

[{'content': 'You are a helpful assistant who writes tailored Cover Letters using the information given. The letter must be 250-450 words long.', 'role': 'system'}, {'content': 'Generate Cover Letter using this information:\n        Job Title:  Data Scientist, Preferred Qualifications: BSc focused on data Science/computer Science/engineering\n4+ years experience Developing and shipping production grade machine learning systems\n2+ years building and shipping data Science based personalization services and recommendation systems\nexperience in data Science or machine learning engineering\nStrong analytical and data Science skills, Hiring Company: XYZ Corporation, Applicant Name: John Smith, Past Working Experience: Data Analyst at ABC Company, Current Working Experience: Machine Learning Engineer at DEF Company, Skillsets:Python, R, scikit-learn, Keras, Tensorflow, Qualifications: BSc in Computer Science, 5+ years of experience in data science and machine learning', 'role': 'user'}]


D

- 5m 13.8s on CPU avg. 
- Does not mention addresses 
- Uses given statistics 
- 386 words avg.  
- Mentions soft skills 
- May hallucinate experiences 

## 7. "You are a helpful assistant who writes 250 to 450 word long tailored Cover Letters using the information given."

In [None]:
# get first record from dataset (we are inferencing without the last assistant prompt)
systemPrompt = {
    'content' : "You are a helpful assistant who writes 250 to 450 word long tailored Cover Letters using the information given.", 
    'role' : 'system', 
}
messageBatch = [systemPrompt, testDataset[0]['messages'][0]]
# print the conversation (a list of chat-message dicts) 
print(messageBatch)

# generate 
resultBatch = pipe(messageBatch, max_new_tokens=512, batch_size=2)

# print output 
print('\n')
print(resultBatch[0]['generated_text'][-1]['content'])

[{'content': 'You are a helpful assistant who writes 250 to 450 word long tailored Cover Letters using the information given.', 'role': 'system'}, {'content': 'Generate Cover Letter using this information:\n        Job Title:  Data Scientist, Preferred Qualifications: BSc focused on data Science/computer Science/engineering\n4+ years experience Developing and shipping production grade machine learning systems\n2+ years building and shipping data Science based personalization services and recommendation systems\nexperience in data Science or machine learning engineering\nStrong analytical and data Science skills, Hiring Company: XYZ Corporation, Applicant Name: John Smith, Past Working Experience: Data Analyst at ABC Company, Current Working Experience: Machine Learning Engineer at DEF Company, Skillsets:Python, R, scikit-learn, Keras, Tensorflow, Qualifications: BSc in Computer Science, 5+ years of experience in data science and machine learning', 'role': 'user'}]


Dear [Hiring Manage

- 3m 31.6s on CPU avg. 
- Quickest inference time, but only 183 words avg., below the requested 250 words 
- Uses given statistics 
- Does not mention addresses

# Conclusion 

Chosen prompt: "You are a helpful assistant who writes tailored Cover Letters." 

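It is clear and concise, uses the information given by the user, and contains positive language. It has the second-quickest average inference time (3m 47.2s); only the last prompt tested was quicker (3m 31.6s), but its letters averaged just 183 words. At about 251 words on average, the chosen prompt's letters are not long, but they are a reasonable length. 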
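
The measurements recorded in this notebook can be compared side by side. The sketch below copies the average times and word counts from the notes above. The short labels are shorthand for the full system prompts, and the 250 to 450 word acceptance range is an assumption borrowed from the length constraint used in the later prompts. No word count was recorded for the first prompt, so it is stored as `None`.

```python
# (label, average time as (minutes, seconds), average word count or None if not recorded)
results = [
    ("Generate a cover letter", (4, 51.4), None),
    ("Powerful generator", (6, 11.3), 362),
    ("Powerful generator, formal", (5, 38.3), 320),
    ("Powerful generator, detailed fields", (6, 25.4), 363),
    ("Helpful assistant, tailored", (3, 47.2), 251),
    ("Helpful assistant, length as second sentence", (5, 13.8), 386),
    ("Helpful assistant, length inline", (3, 31.6), 183),
]


def to_seconds(duration):
    """Convert a (minutes, seconds) pair into seconds."""
    minutes, seconds = duration
    return minutes * 60 + seconds


def fastest_within(rows, min_words, max_words):
    """Fastest prompt whose recorded average word count lies in the range."""
    eligible = [
        row for row in rows
        if row[2] is not None and min_words <= row[2] <= max_words
    ]
    return min(eligible, key=lambda row: to_seconds(row[1]))


# List all prompts from fastest to slowest.
for label, duration, words in sorted(results, key=lambda row: to_seconds(row[1])):
    print(f"{to_seconds(duration):6.1f}s  {words}  {label}")

print(fastest_within(results, 250, 450)[0])  # Helpful assistant, tailored
```

Under these assumptions, the chosen prompt is the fastest one whose letters stay within the word range.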