In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("data/jobs.csv")

In [3]:
df.shape

(1079, 6)

In [4]:
df.head()

Unnamed: 0,Position,Company_Name,Location,Post_Month,Post_Year,Details
0,IT Manager,10 Percent Recruiting Ltd.,"Vancouver, British Columbia, Canada",June,2024,Position Title: IT Manager\n\nLocation: Vancou...
1,"Manager, IT Support",Procom,"Toronto, Ontario, Canada",June,2024,"On behalf of our public sector client, PROCOM ..."
2,Director of Information Technology,Southampton Financial Inc,"Toronto, Ontario, Canada",June,2024,Southampton Financials’ Mission: Bring Clarity...
3,"Manager, Information Technology Services",Town of Tillsonburg,"Tillsonburg, Ontario, Canada",June,2024,The Town of Tillsonburg is looking for a Manag...
4,Systems Manager,Accor,"Winnipeg, Manitoba, Canada",June,2024,"Company Description\n\n""Why work for Accor?""\n..."


In [5]:
df = df.head(500)

AI Processing

In [6]:
# Generate a new column that holds, for each job description, a JSON object with the following structure:
"""
{
  "role_summary": "A concise, non-technical summary of the job role. It should describe the primary responsibilities of the role in simple language, avoiding jargon. Focus on what the role does rather than listing requirements.",
  "key_terms": [
    {
      "term": "Technical Term or jargon from the job description",
      "explanation": "A simple explanation of the term in the context of the role. This helps recruiters understand technical jargon without needing domain expertise."
    }
  ],
  "skill_priorities": {
    "must_have": ["List of essential skills required for the role. These are non-negotiable and should be explicitly mentioned in the job description."],
    "nice_to_have": ["List of preferred skills that are beneficial but not mandatory. These are often marked as 'preferred,' 'a plus,' or 'optional' in the job description."]
  },
  "proposed_screening_questions_with_answers": [
    {
      "question": "A role-specific question to assess candidate expertise. The question should focus on technical skills, problem-solving, or past experiences relevant to the role.",
      "example_answer": "An example of a strong candidate response to the question. The answer should demonstrate technical depth, problem-solving, and relevance to the role."
    }
  ],
  "red_flags": [
    "Indicators of potential mismatches for the role, offering actionable insights for recruiters based on the job offer. Examples include 'Avoid candidates without cloud experience'."
  ],
  "confidence_score": "A numerical score between 0 and 100 indicating the model's confidence in the accuracy of its analysis."
}
"""


In [7]:
system_message_prompt = """
You are an AI assistant that converts job descriptions into structured JSON data to assist recruiters. Your task is to extract key details from a given job description and format them into a JSON object with the following structure:

{
  "role_summary": "A concise, non-technical summary of the job role. It should describe the primary responsibilities of the role in simple language, avoiding jargon. Focus on what the role does rather than listing requirements.",
  "key_terms": [
    {
      "term": "Technical Term or jargon from the job description",
      "explanation": "A simple explanation of the term in the context of the role. This helps recruiters understand technical jargon without needing domain expertise."
    }
  ],
  "skill_priorities": {
    "must_have": ["List of essential skills required for the role. These are non-negotiable and should be explicitly mentioned in the job description."],
    "nice_to_have": ["List of preferred skills that are beneficial but not mandatory. These are often marked as 'preferred,' 'a plus,' or 'optional' in the job description."]
  },
  "proposed_screening_questions_with_answers": [
    {
      "question": "A role-specific question to assess candidate expertise. The question should focus on technical skills, problem-solving, or past experiences relevant to the role.",
      "example_answer": "An example of a strong candidate response to the question. The answer should demonstrate technical depth, problem-solving, and relevance to the role."
    }
  ],
  "red_flags": [
    "Indicators of potential mismatches for the role, offering actionable insights for recruiters based on the job offer. Examples include 'Avoid candidates without cloud experience'."
  ],
  "confidence_score": "A numerical score between 0 and 100 indicating the model's confidence in the accuracy of its analysis."
}

Guidelines:
- Ensure the role summary is written in simple, non-technical language, focusing on primary responsibilities rather than qualifications.
- Extract key technical terms and provide clear, non-technical explanations.
- Identify and categorize skills into 'must-have' and 'nice-to-have' based on explicit mentions in the job description.
- Generate screening questions that assess relevant competencies and provide an example of a strong answer.
- Highlight red flags that indicate potential mismatches, ensuring they are actionable.
- Assign a confidence score (0-100) reflecting the reliability of the extracted insights.

IMPORTANT: Output only the raw JSON object. Do not wrap it in markdown code blocks (no ```json or ``` tags). Do not include any explanation text before or after the JSON.
Do not include any additional text or comments. The output should be a valid JSON object that can be directly used in applications. Ensure the JSON is well-structured and adheres to the specified format.
"""

In [8]:
# ollama serve -> http://127.0.0.1:11434

Create a JobAnalysis class to structure the output

In [9]:
from typing import List, Optional
from pydantic import BaseModel

class KeyTerm(BaseModel):
    term: str
    explanation: str

class SkillPriorities(BaseModel):
    must_have: List[str]
    nice_to_have: List[str]

class ScreeningQuestion(BaseModel):
    question: str
    example_answer: str

class JobAnalysis(BaseModel):
    role_summary: str
    key_terms: List[KeyTerm]
    skill_priorities: SkillPriorities
    proposed_screening_questions_with_answers: List[ScreeningQuestion]
    red_flags: List[str]
    confidence_score: float
     

Create a function that analyzes job descriptions

In [10]:
from typing import Union
import requests
import json
import time
from pydantic import ValidationError

url = "http://127.0.0.1:11434/api/chat"

llama3_model = "llama3.2"


def analyze_job_with_retries(
    job_description: str,
    prompt: str,
    model_name: str,
    max_retries: int = 3,
    initial_retry_delay: float = 1.0
) -> Union[JobAnalysis, str]:
    """
    Analyzes job descriptions with retry logic.

    Args:
        job_description: The job description text to analyze.
        prompt: System prompt for the AI model.
        model_name: Name of the LLM model to use.
        max_retries: Maximum number of retry attempts.
        initial_retry_delay: Initial delay between retries in seconds.

    Returns:
        Union[JobAnalysis, str]: Parsed job analysis results or error message.
    """
    model = model_name
    message = [
        {"role": "system", "content": f"{prompt}\n"
                                     f"The JSON object must use the schema: {json.dumps(JobAnalysis.model_json_schema(), indent=2)}\n"
                                     f"VERY IMPORTANT: Do not include markdown code blocks. Do not start with ```json and do not end with ```. Output a raw JSON object only."},
        {"role": "user", "content": f"Analyze this job description:\n\n{job_description}"}
    ]
    
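    payload = {
        "model": model,
        "messages": message,
        # Ollama's native API reads sampling parameters from "options";
        # a top-level "temperature" key is not applied.
        "options": {"temperature": 0.0},
    }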
    
    retry_delay = initial_retry_delay
    total_attempts = 0
    
    while total_attempts < max_retries:
        try:
            print(f"Attempt {total_attempts + 1} of {max_retries}...")
            response = requests.post(url, json=payload, stream=True)
            
            if response.status_code != 200:
                print(f"Error: {response.status_code}")
                print(response.text)
                raise requests.exceptions.RequestException(f"Request failed with status code: {response.status_code}")
            
            full_content = ""
            for line in response.iter_lines(decode_unicode=True):
                if line:  # Ignore empty lines
                    try:
                        # Parse each line as a JSON object
                        json_data = json.loads(line)
                        # Extract the assistant's message content
                        if "message" in json_data and "content" in json_data["message"]:
                            content = json_data["message"]["content"]
                            full_content += content
                            print(content, end="")
                    except json.JSONDecodeError:
                        print(f"\nFailed to parse line: {line}")
            
            # Remove markdown code block formatting if present
            full_content = full_content.strip()
            if full_content.startswith("```json"):
                full_content = full_content[7:]  # Remove ```json
            elif full_content.startswith("```"):
                full_content = full_content[3:]  # Remove ```
                
            if full_content.endswith("```"):
                full_content = full_content[:-3]  # Remove trailing ```
                
            full_content = full_content.strip()
            
            # Try to parse the complete response as JSON
            json_content = json.loads(full_content)
            job_analysis = JobAnalysis(**json_content)
            print("\nSuccessfully parsed job analysis!")
            return job_analysis
        
        except json.JSONDecodeError as e:
            print(f"\nJSON parsing error: {e}")
        except ValidationError as e:
            print(f"\nValidation error: {e}")
        except requests.exceptions.RequestException as e:
            print(f"\nRequest error: {e}")
        except Exception as e:
            print(f"\nUnexpected error: {e}")
        
        total_attempts += 1
        if total_attempts >= max_retries:
            break
        
        print(f"Retrying in {retry_delay} seconds...")
        time.sleep(retry_delay)
        retry_delay *= 2  # Exponential backoff
    
    print("All retry attempts exhausted. Returning error response.")
    return "Error analyzing job description after multiple attempts."
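The fence-stripping step inside `analyze_job_with_retries` can be pulled out into a small helper so it can be checked without calling the model. This is a minimal sketch, assuming the model wraps its answer in at most one fenced block. `strip_code_fence` and `FENCE` are hypothetical names and are not part of the notebook.

```python
import re

# Three backticks, built indirectly so the marker never appears literally here.
FENCE = "`" * 3

# Leading fence with an optional language tag (e.g. "json"), and a trailing fence.
_FENCE_START = re.compile(r"^" + re.escape(FENCE) + r"[A-Za-z]*\s*")
_FENCE_END = re.compile(r"\s*" + re.escape(FENCE) + r"$")


def strip_code_fence(text: str) -> str:
    """Return `text` without a surrounding markdown code fence, if one is present."""
    cleaned = text.strip()
    cleaned = _FENCE_START.sub("", cleaned, count=1)
    cleaned = _FENCE_END.sub("", cleaned, count=1)
    return cleaned.strip()


wrapped = FENCE + 'json\n{"confidence_score": 80}\n' + FENCE
print(strip_code_fence(wrapped))
```

Unlike the fixed slices `[7:]` and `[3:]`, the regular expression also handles a fence tagged with some other language name. Text that has no fence at all passes through unchanged.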

#### The llama3.2 3B model does not give satisfactory answers; validation fails with errors such as:

Validation error: 2 validation errors for JobAnalysis
proposed_screening_questions_with_answers.0.example_answer
  Field required [type=missing, input_value={'question': 'Can you exp...ions or classify data.'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
proposed_screening_questions_with_answers.1.example_answer
  Field required [type=missing, input_value={'question': 'How do you ... all on the same page.'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.9/v/missing
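Both errors above come from screening questions that the model returned without an `example_answer`. The sketch below reproduces that failure on a hand-written item and shows one possible workaround: marking the field optional. `LenientScreeningQuestion` is a hypothetical variant, not the notebook's model, and whether a missing answer is acceptable is an assumption that depends on how the output is used.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class ScreeningQuestion(BaseModel):
    question: str
    example_answer: str  # required, as in the notebook


# Hypothetical lenient variant (an assumption, not the notebook's model):
# a missing example_answer becomes None instead of raising.
class LenientScreeningQuestion(BaseModel):
    question: str
    example_answer: Optional[str] = None


# Hand-written item that mimics the shape the model returned.
incomplete = {"question": "How do you validate a model?"}

try:
    ScreeningQuestion(**incomplete)
    strict_ok = True
except ValidationError as exc:
    strict_ok = False
    print(f"Strict model rejected the item with {len(exc.errors())} error(s)")

lenient = LenientScreeningQuestion(**incomplete)
print(lenient.example_answer)
```

Keeping the field required and retrying, as `analyze_job_with_retries` does, gives stricter output. The lenient variant trades completeness for fewer failed rows.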


In [11]:
from typing import Union
import requests
import json

url = "http://127.0.0.1:11434/api/chat"

llama3_model = "llama3.2"


def analyze_job(
    job_description: str,
    prompt: str,
    model_name: str,
) -> Union[JobAnalysis, str, None]:
    """
    Analyzes a job description.

    Args:
        job_description: The job description text to analyze.
        prompt: System prompt for the AI model.
        model_name: Name of the LLM model to use.

    Returns:
        Union[JobAnalysis, str, None]: The parsed job analysis, the raw
        response text if it could not be parsed or validated, or None if
        the request failed.
    """
    model = model_name
    message = [
        {"role": "system", "content":   f"{prompt}\n"
                                        f"The JSON object must use the schema: {json.dumps(JobAnalysis.model_json_schema(), indent=2)}"},
        {"role": "user", "content":     f"Analyze this job description:\n\n{job_description}"}
    ]
    
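    payload = {
        "model": model,
        "messages": message,
        # Ollama's native API reads sampling parameters from "options";
        # a top-level "temperature" key is not applied.
        "options": {"temperature": 0.0},
    }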
    
    response = requests.post(url, json=payload, stream=True)
    
    if response.status_code == 200:
        full_content = ""
        for line in response.iter_lines(decode_unicode=True):
            if line:  # Ignore empty lines
                try:
                    # Parse each line as a JSON object
                    json_data = json.loads(line)
                    # Extract the assistant's message content
                    if "message" in json_data and "content" in json_data["message"]:
                        content = json_data["message"]["content"]
                        full_content += content
                        print(content, end="")
                except json.JSONDecodeError:
                    print(f"\nFailed to parse line: {line}")
                    
        try:
            # Try to parse the complete response as JSON
            json_content = json.loads(full_content)
            return JobAnalysis(**json_content)
        except Exception as e:
            print(f"\nCould not parse response to JobAnalysis: {e}")
            return full_content
    else:
        print(f"Error: {response.status_code}")
        print(response.text)
        return None
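With `stream=True`, the loop in `analyze_job` treats the response as newline-delimited JSON, where each line carries a fragment of the answer under `message.content`. The sketch below runs the same assembly on hand-made sample lines, so it needs no server. `collect_stream_content` is a hypothetical helper, and the sample chunks only imitate the shape the loop expects.

```python
import json
from typing import Iterable


def collect_stream_content(lines: Iterable[str]) -> str:
    """Join the message.content fragments found in newline-delimited JSON lines."""
    parts = []
    for line in lines:
        if not line:
            continue  # skip empty lines between chunks
        try:
            chunk = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(chunk, dict):
            continue
        message = chunk.get("message")
        if isinstance(message, dict) and message.get("content"):
            parts.append(message["content"])
    return "".join(parts)


# Made-up chunks for illustration only.
sample_lines = [
    json.dumps({"message": {"role": "assistant", "content": '{"role_summary": '}, "done": False}),
    "",
    json.dumps({"message": {"role": "assistant", "content": '"Leads IT"}'}, "done": False}),
    "not json",
    json.dumps({"done": True}),
]

text = collect_stream_content(sample_lines)
print(json.loads(text))
```

Collecting fragments in a list and joining once avoids repeated string concatenation. The assembled text still has to parse as JSON before it can be validated.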


In [12]:
# Example usage
job_desc = """
Note: By applying to this position you will have an opportunity to share your preferred working location from the following: Cambridge, MA, USA; Atlanta, GA, USA; Austin, TX, USA; Seattle, WA, USA.Minimum qualifications:

Master's degree in a quantitative discipline such as Statistics, Engineering, Sciences, or equivalent practical experience
4 years of experience using analytics to solve product or business problems, coding (e.g., Python, R, SQL), querying databases or statistical analysis.
4 years of experience in business visualization tools like Looker, Tableau, etc.

Preferred qualifications:

Master's degree in Computer Science, Economics, or Mathematics, or a related field.
5 years of work experience in data science or quantitative analytics with focus on statistical modeling, Machine Learning, and AI.
Experience with both SQL and Python.
Experience with Machine Learning, AI or AI Pipeline.
Experience in Google Cloud Platform.

About The Job

Cloud Learning Services (CLS) is revolutionizing direct cloud learning. We empower users of all levels with interactive labs and guided experiences to build practical skills on Google Cloud Platform and other leading technologies. Our mission is to make the cloud accessible, engaging, and enjoyable to learn.

As a Business Data Scientist, you will play a key role in uncovering valuable insights from diverse data sources to solve critical business tests. You will address a wide range of exciting projects, from establishing new measurement frameworks to identifying meaningful patterns in large datasets, ultimately empowering stakeholders to make data-driven decisions.

In this role, you will need to be a detail-oriented problem-solver with a strong foundation in data analytics, data visualization, and Artificial Intelligence/Machine Learning. You will solve the real-world problems and a commitment to continuous learning. Exceptional communication and stakeholder management skills are essential, as you will collaborate extensively with cross-functional teams. If you succeed in dynamic environments and are eager to make a real impact with data, we encourage you to apply!

The US base salary range for this full-time position is 
244,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google .

Responsibilities

Extract actionable insights from large, complex datasets and build data products like dashboards to operationalize them, driving measurable improvements in Key Performance Indicators (KPIs).
Present and communicate actionable insights and recommendations to executives, leaders, and cross-functional partners, including Product, Engineering, and Marketing teams.
Serve as a peer-reviewer and consultant to other members of the team, fostering a collaborative and knowledge-sharing environment.
Learn and share knowledge of the latest advancements in AI/ML and data science that are relevant to our work.

"""

analysis = analyze_job(job_desc, system_message_prompt, llama3_model)


# `analysis` is a JobAnalysis on success; otherwise it is the raw response text (or None if the request failed)
print(analysis)

{
  "role_summary": "A detail-oriented problem-solver with a strong foundation in data analytics, data visualization, and Artificial Intelligence/Machine Learning will uncover valuable insights from diverse data sources to solve critical business tests.",
  "key_terms": [
    {
      "term": "Machine Learning",
      "explanation": "A subset of artificial intelligence that involves training algorithms on large datasets to enable the system to make predictions or decisions"
    },
    {
      "term": "Data Visualization",
      "explanation": "The process of creating graphical representations of data to help users understand and analyze it"
    }
  ],
  "skill_priorities": {
    "must_have": [
      "4 years of experience using analytics to solve product or business problems, coding (e.g., Python, R, SQL), querying databases or statistical analysis."
    ],
    "nice_to_have": [
      "5 years of work experience in data science or quantitative analytics with focus on statistical modelin

In [13]:
analysis

'{\n  "role_summary": "A detail-oriented problem-solver with a strong foundation in data analytics, data visualization, and Artificial Intelligence/Machine Learning will uncover valuable insights from diverse data sources to solve critical business tests.",\n  "key_terms": [\n    {\n      "term": "Machine Learning",\n      "explanation": "A subset of artificial intelligence that involves training algorithms on large datasets to enable the system to make predictions or decisions"\n    },\n    {\n      "term": "Data Visualization",\n      "explanation": "The process of creating graphical representations of data to help users understand and analyze it"\n    }\n  ],\n  "skill_priorities": {\n    "must_have": [\n      "4 years of experience using analytics to solve product or business problems, coding (e.g., Python, R, SQL), querying databases or statistical analysis."\n    ],\n    "nice_to_have": [\n      "5 years of work experience in data science or quantitative analytics with focus on s
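In the run above `analysis` is a plain string, which is what `analyze_job` returns when the response cannot be parsed or validated. One option that may help is to constrain the output on the server side. The sketch below only builds a request body and sends nothing. It assumes a recent Ollama version that accepts a JSON schema (or the string "json") in `format`, reads sampling parameters from `options`, and returns a single JSON object when `stream` is false. `build_payload` and `demo_schema` are hypothetical; in the notebook the schema would come from `JobAnalysis.model_json_schema()`.

```python
import json


def build_payload(model: str, system_prompt: str, job_description: str, schema: dict) -> dict:
    """Build a chat request body that asks for schema-constrained JSON output."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Analyze this job description:\n\n{job_description}"},
        ],
        "format": schema,                 # assumption: structured-output support
        "options": {"temperature": 0.0},  # sampling parameters go under "options"
        "stream": False,                  # one JSON object instead of a stream
    }


# Trimmed, hand-written stand-in for the full JobAnalysis schema.
demo_schema = {
    "type": "object",
    "properties": {
        "role_summary": {"type": "string"},
        "confidence_score": {"type": "number"},
    },
    "required": ["role_summary", "confidence_score"],
}

payload = build_payload("llama3.2", "Return JSON only.", "Sample job text", demo_schema)
print(json.dumps(payload, indent=2))
```

Even with a schema in the request, validating the result with `JobAnalysis` remains worthwhile, since a small model can still return values that fit the schema but are of poor quality.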