In [1]:
job_description = """
 Opportunities with Episource, part of the Optum family of businesses. Join a premier provider of risk adjustment services, software and solutions that’s fueling innovation in the health care industry. Start a rewarding career where your work will empower health plans and medical groups with comprehensive end-to-end solutions designed to navigate health care efficiently. Our culture is rooted in innovation, encouraging our team to stay curious and engaged. By joining us, you become part of a global, remote/hybrid-friendly team dedicated to bridging health care gaps with a solid sense of social responsibility. At Episource, we are enriching lives, including those of our team members through Caring. Connecting. Growing together. 

As a Data Analyst, you will play a crucial role in transforming raw data into actionable insights that drive business decisions and strategies. Your responsibilities will include data collection, analysis, interpretation, and transformation, helping stakeholders understand complex datasets to make informed choices and optimize performance.

The data analyst will execute data pipelines, analyze transformed data to validate its completeness and gather insight to impact business decisions. Additionally, the data analyst will handle ad hoc analysis requests related to data surrounding medical record retrieval to assist in determining the location of medical charts.

You’ll enjoy the flexibility to work remotely * from anywhere within the U.S. as you take on some tough challenges.

Primary Responsibilities

    Collect, clean, and preprocess data from various sources to ensure accuracy and consistency
    Identify and address data quality issues, outliers, and missing values
    Perform exploratory data analysis to uncover trends, patterns, and relationships within datasets
    Oversee ingestion pipelines and troubleshoots errors related to data. Resolve errors using approved data cleaning techniques
    Translate complex data analysis into clear and actionable insights for stakeholders
    Collaborate with cross-functional teams to identify opportunities for process optimization and strategic decision-making
    Build analytics tools that utilize the data pipeline to provide actionable insights into operational efficiency, and other critical business performance metrics
    Develops and maintains solid relationships with all internal and external stakeholders
    Document analysis methodologies, assumptions, and findings for future reference and replication
    Requires an individual to maintain the ability to work in an environment with PHI / PII data 

You’ll be rewarded and recognized for your performance in an environment that will challenge you and give you clear direction on what it takes to succeed in your role as well as provide development for other roles you may be interested in.

Required Qualifications

    2+ years of experience in data analytics
    2+ years of experience writing intermediate to advanced SQL (joining many tables, SQL functions, CTE's)
    Experience working with Python (Ability to read and understand Python scripts and ability to make adjustments to scripts)
    Experience with data visualization tools such as Tableau 
    Proven knowledge of data analysis and data quality best practices
    Demonstrated excellent verbal and written communication skills, including ability to effectively communicate with internal and external customers
    Proficient in Excel (v-look ups, pivot tables)
    Ability and willingness to work occasional weekends and/or evening work

Preferred Qualifications

    Proven knowledge of healthcare data including member, claims, and provider data
    Proven knowledge of industry standards and practices related to medical records retrieval
    Demonstrated understanding of data pipelines, database structures and ETL practices 
    All employees working remotely will be required to adhere to UnitedHealth Group’s Telecommuter Policy.




"""


In [2]:
pwd

'c:\\Users\\reddy\\Downloads\\LLM_Project\\experiments\\skills'

In [3]:
import json

with open('../resume.json', encoding='utf-8') as f:
    resume_json = json.load(f)

In [4]:
candidate_skills = resume_json['skills']

In [5]:
import os
import time

from dotenv import load_dotenv
from groq import Groq

In [6]:
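load_dotenv()  # read GROQ_API_KEY from a local .env file, if one exists
api_key = os.getenv('GROQ_API_KEY')
if not api_key:
    raise RuntimeError("GROQ_API_KEY is not set; add it to .env or the environment.")
client = Groq(api_key=api_key)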

In [7]:
import json
import time

def analyze_skills(candidate_profile: dict, job_description: str, max_retries: int = 5) -> None:
    """Ask the LLM for a job-tailored skills section and save it to 'skills.json'."""
    # Step 1: Define the prompt for the first LLM call (skills optimization)
    first_prompt = f"""
    You are a resume optimization expert for ATS compliance. Your task is to tailor the candidate's skills section to the job description:

    1. **Extract Key Skills and Responsibilities**:
       - Identify and list the key skills, responsibilities, and keywords from the job description that are essential for the ideal candidate.

    2. **Optimize the Candidate's Skills Section**:
       - Review the candidate's "Skills" section carefully.
       - Prioritize and rearrange skills based on relevance to the job description.
       - Remove irrelevant skills and avoid adding new or highly technical skills the candidate doesn't possess.
       - Add transferable, miscellaneous, analytical or any other skill section if applicable, ensuring they align with the job description and are somewhat consistent with the candidate’s profile.
       - Do not include any sections (e.g., "Cloud", "Programming Languages") if they have no corresponding skills or if adding them would result in an empty or visually redundant section. Remove the section entirely if empty.
       - You may add new sections only if they are relevant and include appropriate skills.
       - Return the output in JSON format, with each section containing only non-empty skill lists.

    3. **Output Format**:
       - Provide the optimized "Skills" section in JSON format.

    ### Candidate Profile:
    {json.dumps(candidate_profile, indent=4)}

    ### Job Description:
    {job_description.strip()}

    Your output should be in the form of an enhanced JSON object for the "Skills" section.
    """

    try:
        # Step 2: Query the LLM for the optimized skills section
        first_response = client.chat.completions.create(
            model="llama3-70b-8192",
            messages=[{"role": "user", "content": first_prompt}],
            temperature=0.7,
            max_tokens=2048,
            top_p=0.9,
            stream=False,
            stop=None,
        )
        first_output = first_response.choices[0].message.content if first_response.choices else None

        if not first_output:
            print("First LLM did not produce a valid response.")
            return

        # Save the first output for review
        with open("first_output.txt", "w", encoding="utf-8") as file:
            file.write(first_output)
        print("First LLM output saved to 'first_output.txt'.")

        # Step 3: Retry parsing and saving JSON output up to max_retries times
        for attempt in range(1, max_retries + 1):
            print(f"Attempt {attempt} of {max_retries}...")

            second_prompt = f"""
            Extract and return only the JSON object from the following text. Do not include any extra explanations or comments.

            ### Input:
            {first_output}

            ### Output:
            Provide only the valid JSON object.
            """

            try:
                second_response = client.chat.completions.create(
                    model="llama3-70b-8192",
                    messages=[{"role": "user", "content": second_prompt}],
                    temperature=0.3,
                    max_tokens=1024,
                    top_p=0.9,
                    stream=False,
                    stop=None,
                )
                second_output = second_response.choices[0].message.content if second_response.choices else None

                if not second_output:
                    print("Second LLM did not produce a valid response.")
                    time.sleep(1)  # pause before retrying, as the other failure paths do
                    continue

                # Validate the JSON output
                try:
                    extracted_json = json.loads(second_output.strip())
                    # Save the JSON file
                    with open("skills.json", "w") as file:
                        json.dump(extracted_json, file, indent=4)
                    print("Optimized skills JSON saved to 'skills.json'.")
                    return  # Exit loop if successful
                except json.JSONDecodeError:
                    print("Error parsing JSON from the second LLM response.")
                    print(f"Invalid JSON output: {second_output}")
                    # Continue to the next attempt

            except Exception as e:
                print(f"Error during LLM processing on attempt {attempt}: {e}")

            # Optional: Delay between retries
            time.sleep(1)

        print(f"Failed to extract valid JSON after {max_retries} attempts.")

    except Exception as e:
        print(f"Error during LLM processing: {e}")
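
The second LLM call exists only to isolate the JSON object from the first response, and `json.loads` can still fail if the reply is wrapped in a Markdown code fence or surrounded by commentary. A local fallback could be tried before, or instead of, that second call. The following is a minimal sketch under that assumption; `extract_json_object` is a hypothetical helper and is not used by `analyze_skills` above.

```python
import json
import re


def extract_json_object(text: str):
    """Return the first JSON object found in `text`, or None if there is none.

    Hypothetical helper for illustration; not part of the notebook's function.
    """
    # Drop Markdown code-fence lines (three backticks, optional language tag)
    cleaned = re.sub(r"^\s*`{3}[a-zA-Z]*\s*$", "", text, flags=re.MULTILINE)
    decoder = json.JSONDecoder()
    # Try to decode from each opening brace until one parses as a JSON object
    for match in re.finditer(r"\{", cleaned):
        try:
            obj, _ = decoder.raw_decode(cleaned, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None
```

If this returns a dictionary, it could be written to `skills.json` directly; a `None` result would be the signal to fall back to the LLM-based extraction loop.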


In [8]:
analyze_skills(candidate_skills, job_description)

First LLM output saved to 'first_output.txt'.
Attempt 1 of 5...
Optimized skills JSON saved to 'skills.json'.
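
The prompt asks for sections that contain only non-empty skill lists, but nothing checks that the saved file meets this requirement. The sketch below shows one way to enforce it after the fact. It assumes the saved JSON is a flat mapping from section name to a list of skill strings, which the notebook does not confirm; `prune_empty_sections` and the sample data are illustrative only.

```python
def prune_empty_sections(skills: dict) -> dict:
    """Keep only sections whose value is a list with at least one non-blank string."""
    cleaned = {}
    for section, items in skills.items():
        if not isinstance(items, list):
            continue  # skip anything that is not a list of skills
        kept = [s.strip() for s in items if isinstance(s, str) and s.strip()]
        if kept:
            cleaned[section] = kept
    return cleaned


# Illustrative data only; not the real contents of skills.json
example = {"Languages": ["SQL", " Python "], "Cloud": [], "Tools": ["Tableau", ""]}
print(prune_empty_sections(example))
```

In practice the dictionary would come from `json.load` on `skills.json`, and the pruned result could be written back to the same file.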
