Important aspects of a job description:
1. Job title
2. Responsibilities
3. Required Qualifications
4. Preferred Qualifications

Some job descriptions may also have:
1. Skills and Competencies
2. Experience Requirements (e.g., "5+ years of marketing experience")

These extra sections can be grouped under "Required Qualifications."
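The grouping rule above can be made concrete with a small lookup table that folds variant headings into the canonical sections. This is a hypothetical deterministic sketch, not what the notebook does (the notebook leaves this mapping to the LLM through the prompt). `SECTION_ALIASES`, `canonical_section`, and `group_sections` are illustrative names, and the table covers only the headings mentioned in this notebook.

```python
from typing import Optional

# Canonical list-valued output sections (the job title is a single string).
CANONICAL_SECTIONS = (
    "Responsibilities",
    "Required Qualifications",
    "Preferred Qualifications",
)

# Hypothetical alias table: lowercase raw heading -> canonical section.
# Illustrative only; real postings use many more heading variants.
SECTION_ALIASES = {
    "responsibilities": "Responsibilities",
    "required qualifications": "Required Qualifications",
    "preferred qualifications": "Preferred Qualifications",
    "skills and competencies": "Required Qualifications",
    "experience requirements": "Required Qualifications",
    "what we need to see": "Required Qualifications",
}


def canonical_section(heading: str) -> Optional[str]:
    """Map a raw heading to its canonical section, or None if unrecognized."""
    key = heading.strip().rstrip(":").strip().lower()
    return SECTION_ALIASES.get(key)


def group_sections(raw: dict[str, list[str]]) -> tuple[dict[str, list[str]], list[str]]:
    """Merge {raw heading: [items]} into the canonical sections.

    Returns (grouped, unmatched): grouped maps each canonical section to its
    items in input order; unmatched lists headings the table does not cover.
    """
    grouped: dict[str, list[str]] = {name: [] for name in CANONICAL_SECTIONS}
    unmatched: list[str] = []
    for heading, items in raw.items():
        section = canonical_section(heading)
        if section is None:
            unmatched.append(heading)
        else:
            grouped[section].extend(items)
    return grouped, unmatched
```

Headings that end up in `unmatched` are exactly the cases the prompt asks the model to resolve from surrounding context.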

In [2]:
from utils.with_structured_output import with_structured_output
from pydantic import BaseModel, Field
from pprint import pprint

In [14]:
class JobDescription(BaseModel):
    job_title: str                      = Field(..., alias="Job Title")
    responsibilities: list[str]         = Field(..., alias="Responsibilities")
    required_qualifications: list[str]  = Field(..., alias="Required Qualifications")
    preferred_qualifications: list[str] = Field(..., alias="Preferred Qualifications")
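The aliases tie each snake_case field to the display name the prompt asks the model to emit. That is why the dumped output below uses keys like `job_title` even though the prompt asks for `"Job Title"`: in Pydantic v2, `model_dump()` emits field names unless it is called with `by_alias=True`. The plain-Python sketch below only illustrates that renaming step; it is not Pydantic's implementation, and `ALIAS_TO_FIELD` and `remap_aliases` are hypothetical names. Like a Pydantic model with default settings, it rejects a payload that is missing a required key and ignores unexpected keys.

```python
# Hypothetical mapping mirroring the aliases declared on JobDescription.
ALIAS_TO_FIELD = {
    "Job Title": "job_title",
    "Responsibilities": "responsibilities",
    "Required Qualifications": "required_qualifications",
    "Preferred Qualifications": "preferred_qualifications",
}


def remap_aliases(payload: dict) -> dict:
    """Rename alias keys (as the LLM emits them) to Python field names.

    Raises KeyError if any required alias is missing; extra keys are
    silently dropped, similar to Pydantic's default extra="ignore".
    """
    missing = sorted(set(ALIAS_TO_FIELD) - set(payload))
    if missing:
        raise KeyError(f"missing required keys: {missing}")
    return {field: payload[alias] for alias, field in ALIAS_TO_FIELD.items()}
```

One practical consequence of declaring aliases in Pydantic v2: constructing `JobDescription(job_title=...)` with field names fails validation unless `populate_by_name=True` is set in the model's `model_config`.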

In [33]:
JOB_DESC_EXTRACTION_TEMPLATE = """
You are an expert at parsing important information from job descriptions. Given a job description, your job is to parse key details from it in this format:
    {{
        "Job Title": "<Job Title>",
        "Responsibilities": ["list", "of", "responsibilities"],
        "Required Qualifications": ["list", "of", "required", "qualifications"],
        "Preferred Qualifications": ["list", "of", "preferred", "qualifications"]
    }}
    
Note that the job title will usually be the first line of text from the job description. **If there is no clear job title, please use the context provided to infer the job title.**

Some job descriptions may have a "Skills and Competencies" section or "Experience Requirements." These should be grouped together under "Required Qualifications."

Job descriptions may have sections that do not exactly match one of the sections defined in the output format. If this is the case, please use surrounding context to infer which output section is relevant. For example, "What we need to see" is equivalent to "Required Qualifications."

Often, entries listed under "Responsibilities," "Required Qualifications," and "Preferred Qualifications" (or equivalent) will be formatted as a bulleted list. Please include all items from the bulleted list. 

**Ensure there is no overlap between output sections.** For example, the same information should not appear in both "Responsibilities" and "Required Qualifications."

Job Description text:
{job_desc}

Output:
"""
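
Because the template is filled with `str.format`, the doubled braces `{{` and `}}` render as literal braces around the example format, and only `{job_desc}` is substituted. A typo in the example format (a misplaced colon, for instance) silently produces invalid JSON that the model is then asked to imitate. A minimal sketch of a sanity check, assuming a trimmed, hypothetical stand-in `MINI_TEMPLATE` rather than the full prompt, renders the template and parses its example block with `json.loads`:

```python
import json

# Trimmed, hypothetical stand-in for JOB_DESC_EXTRACTION_TEMPLATE.
# "{{" / "}}" render as literal braces; {job_desc} is the only placeholder.
MINI_TEMPLATE = """Parse the job description into this format:
{{
    "Job Title": "<Job Title>",
    "Responsibilities": ["list", "of", "responsibilities"],
    "Required Qualifications": ["list", "of", "required", "qualifications"],
    "Preferred Qualifications": ["list", "of", "preferred", "qualifications"]
}}

Job Description text:
{job_desc}
"""


def render(template: str, job_desc: str) -> str:
    """Fill in the job description; braces inside job_desc are not re-parsed."""
    return template.format(job_desc=job_desc)


def parse_example_block(rendered: str) -> dict:
    """Parse the first {...} block of a rendered prompt as JSON.

    Assumes the example block contains no nested braces (true here) and
    raises json.JSONDecodeError if the example is not valid JSON.
    """
    start = rendered.index("{")
    end = rendered.index("}", start)
    return json.loads(rendered[start : end + 1])
```

Running the same check against the real template would require the example block to come before any brace that might appear in the job description text, which holds for `JOB_DESC_EXTRACTION_TEMPLATE`.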

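The no-overlap rule in the prompt is only a request to the model, so it can be worth verifying after extraction. A minimal sketch, assuming the parsed result is a dictionary shaped like the output of `model_dump()`, flags items that appear in more than one list section. It only catches exact matches after case and whitespace normalization, not paraphrased duplicates; `find_overlaps` and `normalize` are hypothetical helpers.

```python
from itertools import combinations

# List-valued fields of JobDescription, as they appear after model_dump().
SECTIONS = ("responsibilities", "required_qualifications", "preferred_qualifications")


def normalize(item: str) -> str:
    """Case- and whitespace-insensitive key for comparing list entries."""
    return " ".join(item.lower().split())


def find_overlaps(parsed: dict) -> list[tuple[str, str, str]]:
    """Return (section_a, section_b, item) for each item repeated across sections."""
    overlaps = []
    for a, b in combinations(SECTIONS, 2):
        seen_in_b = {normalize(x) for x in parsed.get(b, [])}
        for item in parsed.get(a, []):
            if normalize(item) in seen_in_b:
                overlaps.append((a, b, item))
    return overlaps
```

An empty list means no exact duplicates were found; a non-empty one could trigger a re-prompt or a manual review.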
In [37]:
with open("../sample-data/google-swe-senior.txt", "r", encoding="utf-8") as file:
    job_desc = file.read()

In [38]:
parsed_job_desc = with_structured_output(
    prompt=JOB_DESC_EXTRACTION_TEMPLATE.format(job_desc=job_desc),
    schema=JobDescription,
)
pprint(parsed_job_desc.model_dump())

{'job_title': 'Senior Software Engineer, AI/ML GenAI',
 'preferred_qualifications': ["Master's degree or PhD in Computer Science or "
                              'related technical field.',
                              '1 year of experience in a technical leadership '
                              'role.',
                              'Experience developing accessible technologies.'],
 'required_qualifications': ['Bachelor’s degree or equivalent practical '
                             'experience.',
                             '5 years of experience with software development '
                             'in one or more programming languages, and with '
                             'data structures/algorithms.',
                             '3 years of experience testing, maintaining, or '
                             'launching software products, and 1 year of '
                             'experience with software design and '
                             'architecture.',
   