In [1]:
SYSTEM = """You are an intelligent assistant dedicated to extracting management levels and job titles from user queries. Before doing so, you must understand what a functional area is."""
USER = """
Definition of a Functional Area:
- A functional area is a department or group of personnel tasked with a specific organizational function. These include departments like finance, marketing, engineering, etc.

Definition of Management Level:
- A management level refers to a hierarchical position within an organization without a specific functional area. It encompasses broader titles that may include roles across different functional areas.
- Management levels include: "Board of Directors," "CSuite and President," "Executive and Sr. VP," "General Manager," "VP," "Director," "Manager," "Senior (Individual Contributor)," "Mid (Individual Contributor)," and "Junior."

Definition of a Job Title:
- A job title refers to a specific employment position combined with a functional area.
- Examples include 'VP of Engineering' (functional area: Engineering) and 'Director of Finance' (functional area: Finance).

Instructions:
1. Management Levels: Only return management levels that match the predefined set: ["Board of Directors", "CSuite and President", "Executive and Sr. VP", "General Manager", "VP", "Director", "Manager", "Senior (Individual Contributor)", "Mid (Individual Contributor)", "Junior"].
2. Job Titles: Normalize the job title after extracting it from the text. For example, convert "ceo" to "Chief Executive Officer" and include both the full title and its abbreviation if mentioned in the query, e.g., "VP of Engineering" and "Vice President of Engineering."
3. Response Format: Your response must be a dictionary with two keys, "management_levels" and "titles", each mapping to a list of management levels and job titles, respectively.
4. If a keyword is classified as a title, do not include it in the management levels, and vice versa. For example, if "VP of Engineering" is classified as a title, do not include "VP" in the management levels.

Example:
Query: "Provide a list of CFOs and VPs working in the technology sector"
Output: {"management_levels": ["VP"], "titles": ["Chief Financial Officer", "CFO"]}

Guidelines:
- "Leaders" implies CSuite/President level
- Titles without functional areas (e.g., "Founder", "Managing Director") go in titles list
- Use your general knowledge to make educated guesses about ambiguous classifications

Query: "{{query}}"
Output:"""
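The response rules in the prompt can be checked mechanically before a prediction (or a training label) is trusted. The sketch below is a hypothetical `validate_output` helper, not part of the original notebook. It assumes the response is JSON, as in the prompt's worked example, and its rule-4 check is simplified: it only catches the exact same string appearing in both lists, so it cannot tell that "VP" conflicts with "VP of Engineering".

```python
import json

# Allowed labels, copied from instruction 1 of the prompt.
ALLOWED_LEVELS = {
    "Board of Directors", "CSuite and President", "Executive and Sr. VP",
    "General Manager", "VP", "Director", "Manager",
    "Senior (Individual Contributor)", "Mid (Individual Contributor)", "Junior",
}


def validate_output(raw: str) -> dict:
    """Hypothetical helper: parse a response and raise ValueError if it breaks the prompt's rules."""
    data = json.loads(raw)  # assumes JSON output; json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict) or set(data) != {"management_levels", "titles"}:
        raise ValueError("expected exactly the keys 'management_levels' and 'titles'")
    levels, titles = data["management_levels"], data["titles"]
    if not isinstance(levels, list) or not isinstance(titles, list):
        raise ValueError("both values must be lists")
    unknown = [level for level in levels if level not in ALLOWED_LEVELS]
    if unknown:
        raise ValueError(f"levels outside the predefined set: {unknown}")
    # Rule 4, simplified: the same string may not appear in both lists.
    overlap = set(levels) & set(titles)
    if overlap:
        raise ValueError(f"keywords in both lists: {sorted(overlap)}")
    return data


# The worked example from the prompt passes the check.
example = '{"management_levels": ["VP"], "titles": ["Chief Financial Officer", "CFO"]}'
print(validate_output(example))
```

The same function can be run over the training labels before export, so that a malformed label is caught before it is baked into the fine-tuning data.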

In [2]:
from jinja2 import Template
import json

In [3]:
with open("onboarding_task_dataset.jsonl", encoding="utf-8") as f:
    # JSON Lines: one JSON object per line; skip blank lines.
    json_data = [json.loads(line) for line in f if line.strip()]
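If the dataset may contain blank lines or a malformed row, a slightly more defensive loader can report exactly where the problem is. This is a sketch with a hypothetical `load_jsonl` name; it also folds in a per-record check that each line parses to a JSON object (a Python dict).

```python
import json
from typing import Iterable, List


def load_jsonl(lines: Iterable[str]) -> List[dict]:
    """Parse JSON Lines, skipping blank lines and naming the offending line on error."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected an object, got {type(record).__name__}")
        records.append(record)
    return records


# Accepts any iterable of strings, including an open file:
# with open("onboarding_task_dataset.jsonl", encoding="utf-8") as f:
#     json_data = load_jsonl(f)
print(load_jsonl(['{"query": "CFOs in fintech"}\n', "\n"]))
```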


In [4]:
llama_data = []



In [5]:
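# Llama 3 chat format: a single <|begin_of_text|> per sequence and a blank line after each role header.
PROMPT_TEMPLATE_INFERENCE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{{system_message}}<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n{{user_message}}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n{{prefix}}"
)

# Same layout plus the completed assistant turn, with no stray indentation between turns,
# so that for the same inputs the inference prompt matches the start of a training example.
PROMPT_TEMPLATE_FINETUNING = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{{system}}<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n{{user}}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n{{assistant}}<|eot_id|>"
)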
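The two templates must agree: what the model sees at inference time should be an exact prefix of what it saw during training, and `<|begin_of_text|>` should appear once per sequence. The sketch below checks that property with plain string building instead of Jinja. `build_prompt` and `header` are hypothetical helpers, and the blank line after each header follows Meta's published Llama 3 prompt format.

```python
from typing import Optional

BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def header(role: str) -> str:
    # Llama 3 role header; the documented format puts a blank line after it.
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"


def build_prompt(system: str, user: str, assistant: Optional[str] = None) -> str:
    """Hypothetical helper: training text if `assistant` is given, inference prompt otherwise."""
    text = BOS + header("system") + system + EOT + header("user") + user + EOT + header("assistant")
    if assistant is not None:
        text += assistant + EOT  # the training target ends with <|eot_id|>
    return text


infer = build_prompt("system text", "user text")
train = build_prompt("system text", "user text", '{"management_levels": [], "titles": []}')

# The inference prompt must be an exact prefix of the training text,
# and <|begin_of_text|> must appear only once.
assert train.startswith(infer)
assert train.count(BOS) == 1
print(repr(infer))
```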


In [6]:
correct_data = []

In [7]:
# Sanity check: every parsed line should be a JSON object (a Python dict).
assert all(isinstance(item, dict) for item in json_data), "expected one JSON object per line"
print(f"Loaded {len(json_data)} records")

In [8]:
for item in json_data:
    correct_data.append({
        "query" : item["query"],
        "output" : {
            "management_levels" : item["output"]["management_level"],
            "titles" : item["output"]["title"]
        }
    })
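The key renaming in this cell can also be written as a small function that fails loudly on malformed records. `to_training_record` is a hypothetical name; the source keys (`management_level`, `title`) come from the cell above, while the requirement that both values are lists is an assumption about the dataset, added so that a bare string does not slip through unnoticed.

```python
def to_training_record(item: dict) -> dict:
    """Hypothetical helper: map the dataset's keys onto the keys the prompt asks for."""
    out = item["output"]
    levels, titles = out["management_level"], out["title"]
    # Assumption: both fields are lists of strings in the source data.
    for name, value in (("management_level", levels), ("title", titles)):
        if not isinstance(value, list):
            raise TypeError(f"output.{name} should be a list, got {type(value).__name__}")
    return {
        "query": item["query"],
        "output": {"management_levels": levels, "titles": titles},
    }


record = to_training_record(
    {"query": "CFOs and VPs in tech", "output": {"management_level": ["VP"], "title": ["CFO"]}}
)
print(record)
```

Used in place of the loop: `correct_data = [to_training_record(item) for item in json_data]`.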


In [9]:
with open("finetuning_data.json", "w") as f:
    json.dump(correct_data, f)

In [10]:
with open("finetuning_data.json") as f:
    finetuning_data = json.load(f)

In [11]:
special_tokens_data = []

In [12]:
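for item in finetuning_data:
    user = Template(USER).render({"query": item["query"]})
    prompt = Template(PROMPT_TEMPLATE_FINETUNING).render({
        "system": SYSTEM,
        "user": user,
        # Serialize to JSON so the target matches the double-quoted format of the prompt's example;
        # rendering the dict directly would produce a Python repr with single quotes.
        "assistant": json.dumps(item["output"]),
    })
    special_tokens_data.append({"text": prompt})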
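Why the assistant target should go through `json.dumps`: Jinja turns a dict into text with `str()`, which yields a Python repr with single quotes, not the JSON shown in the prompt's example. A minimal comparison using that example output:

```python
import json

output = {"management_levels": ["VP"], "titles": ["Chief Financial Officer", "CFO"]}

as_repr = str(output)         # what rendering the dict directly in a Jinja template yields
as_json = json.dumps(output)  # JSON, matching the prompt's example output

print(as_repr)  # {'management_levels': ['VP'], 'titles': ['Chief Financial Officer', 'CFO']}
print(as_json)  # {"management_levels": ["VP"], "titles": ["Chief Financial Officer", "CFO"]}

assert json.loads(as_json) == output  # round-trips cleanly
try:
    json.loads(as_repr)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False  # single-quoted keys are not valid JSON
print(repr_is_json)  # False
```

A model trained on repr-style targets learns to emit output that `json.loads` rejects, which makes downstream parsing fragile.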


In [13]:
special_tokens_data


In [14]:
with open("llama-3-finetuning-data.jsonl", "w", encoding="utf-8") as f:
    # JSON Lines: one {"text": ...} object per line.
    for item in special_tokens_data:
        f.write(json.dumps(item) + "\n")
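Before handing the file to a trainer, it can be worth reading it back and checking each record's shape. The sketch below is a hypothetical `check_finetuning_jsonl` helper; the checks (a single `text` string, exactly one leading `<|begin_of_text|>`, a final `<|eot_id|>`) are assumptions derived from the fine-tuning template, not requirements of any particular training library. The demo writes a throwaway file so it runs on its own.

```python
import json
import os
import tempfile


def check_finetuning_jsonl(path: str) -> int:
    """Hypothetical sanity check for the exported file; returns the number of records."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            text = record.get("text") if isinstance(record, dict) else None
            if not isinstance(text, str):
                raise ValueError(f"line {lineno}: expected an object with a string 'text' field")
            if not text.startswith("<|begin_of_text|>") or text.count("<|begin_of_text|>") != 1:
                raise ValueError(f"line {lineno}: expected exactly one leading <|begin_of_text|>")
            if not text.endswith("<|eot_id|>"):
                raise ValueError(f"line {lineno}: training text should end with <|eot_id|>")
            count += 1
    return count


# Demo on a throwaway file; in the notebook you would pass "llama-3-finetuning-data.jsonl".
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        f.write(json.dumps({"text": "<|begin_of_text|>...<|eot_id|>"}) + "\n")
    n_lines = check_finetuning_jsonl(path)

print(n_lines)
```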