# Use Case 1: Data Augmentation

Code authored by: Shaw Talebi

Video link: https://youtu.be/3JsgtpX_rpU <br>
Blog link: https://towardsdatascience.com/3-ai-use-cases-that-are-not-a-chatbot-f4f328a2707a

### Imports

In [1]:
import polars as pl

from openai import OpenAI
from sk import my_sk

import numpy as np

### Load data

In [2]:
# load resume data
df = pl.read_csv('data/resumes.csv')

### Extracting YoE (years of experience)

In [3]:
# set up connection to OpenAI API
client = OpenAI(api_key=my_sk)

In [4]:
# create system prompt
system_prompt = """You are a resume analysis assistant. Your task is to classify resumes into one of five experience level buckets based on the number of years of professional experience listed in the resume.

The experience level buckets are:

1. Entry Level (0-2 years): Suitable for recent graduates or individuals new to the industry.
2. Junior Level (2-5 years): Candidates with some professional experience, often having foundational skills and looking to build their expertise.
3. Mid Level (5-10 years): Professionals with substantial experience, capable of handling more complex tasks and possibly taking on leadership roles.
4. Senior Level (10-15 years): Highly experienced individuals who are often experts in their field and may hold senior or managerial positions.
5. Executive Level (15+ years): Veteran professionals with extensive experience, likely to be in top management or executive roles.

When given a resume, analyze the text to determine the total years of professional experience and classify the resume into the appropriate experience level bucket."""
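
The buckets above share their endpoints (2, 5, 10 and 15 years each appear in two ranges), so the prompt leaves boundary cases to the model. As a minimal sketch of the rubric in plain Python, the hypothetical helper `years_to_level` below assigns each boundary value to the higher bucket. That tie-breaking rule is an assumption, not something the prompt specifies.

```python
# Hypothetical helper (not part of the original notebook): maps years of
# experience to the 1-5 rubric. Assumption: a boundary value (2, 5, 10, 15)
# goes to the higher bucket, since the prompt does not say which one applies.
def years_to_level(years: float) -> int:
    if years < 0:
        raise ValueError("years must be non-negative")
    # lower bounds of levels 2..5; each bound reached moves the level up by one
    lower_bounds = [2, 5, 10, 15]
    return 1 + sum(years >= bound for bound in lower_bounds)


print([years_to_level(y) for y in (0, 1.5, 2, 7, 10, 20)])
```

A function like this could be used to spot-check the model's labels on resumes whose years of experience are already known.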

In [5]:
# build the user prompt for a single resume
def prompt_template(resume):
    return f"""I have a resume, and I need to identify the candidate's experience level. Here are the experience level buckets:

1 = Entry Level (0-2 years)
2 = Junior Level (2-5 years)
3 = Mid Level (5-10 years)
4 = Senior Level (10-15 years)
5 = Executive Level (15+ years)

Please analyze the following resume text and identify the experience level of the candidate. Ensure your response is a single digit between 1-5 indicating the experience level based on the above rubric.

### Resume

{resume}

### Output: """

In [6]:
exp_level_list = []

# classify the experience level (1-5) of each resume in df
for resume in df["Resume"]:

    prompt = prompt_template(resume)

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ],
        max_tokens=1,  # the expected answer is a single digit
        n=1,
        temperature=0.1,  # low temperature for more consistent labels
    )

    exp_level_list.append(response.choices[0].message.content)
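
The loop stores whatever single token the model returns, so one non-digit reply would make the later integer conversion fail. A minimal sketch of a guard, assuming a hypothetical helper `parse_level` that returns `None` for anything outside 1-5:

```python
from typing import Optional


# Hypothetical helper (not in the original notebook): validates one model reply.
def parse_level(reply: Optional[str]) -> Optional[int]:
    """Return the experience level as an int in 1-5, or None if the reply is invalid."""
    if reply is None:
        return None
    text = reply.strip()
    if text in {"1", "2", "3", "4", "5"}:
        return int(text)
    return None


print(parse_level(" 3 "), parse_level("N/A"))
```

Replies that come back as `None` could then be retried or reviewed by hand before converting the list to integers.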

In [8]:
# convert list to numpy array of integers
exp_level_arr = np.array(exp_level_list).astype(int)

In [10]:
exp_level_arr

array([5, 5, 5, 5, 4, 2, 3, 3, 5, 3, 3, 3, 3, 2, 3, 3, 3, 2, 2, 4, 3, 4,
       4, 4, 5, 1, 5, 5, 3, 3, 2, 1, 5, 1, 3, 5, 5, 2, 3, 3, 3, 5, 3, 4,
       3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 4, 4, 3, 4, 2, 3, 3, 4, 5, 3, 4, 3,
       5, 3, 3, 3, 2, 3, 2, 3, 3, 4, 3, 3, 5, 3, 3, 3, 3, 3, 5, 4, 2, 3,
       3, 5, 1, 2, 3, 2, 3, 3, 2, 3, 3, 5, 2])
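
To sanity-check labels like these, it can help to look at how many resumes fall into each bucket. A minimal sketch using only the standard library: `level_counts` is a hypothetical helper, and the short `sample` list stands in for `exp_level_arr.tolist()`.

```python
from collections import Counter


# Hypothetical helper (not in the original notebook): tallies labels per level.
def level_counts(levels):
    """Return a dict mapping each level 1-5 to how many times it occurs."""
    counts = Counter(int(level) for level in levels)
    return {level: counts.get(level, 0) for level in range(1, 6)}


sample = [5, 5, 4, 2, 3, 3, 1]  # illustrative values, not the full output above
print(level_counts(sample))
```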

### Add new data to df

In [None]:
df = df.with_columns(pl.Series(name="exp_level", values=exp_level_arr))

In [12]:
# write data to file
df.write_csv('data/resumes_augmented.csv')