<a href="https://colab.research.google.com/github/asante69/Data-Analyst/blob/main/gen_ai_cv_tailoring.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Core Idea:
Take a user's base CV (text) and a job posting URL. Extract relevant information from both. Use generative AI to rewrite/update sections of the CV (like the summary, skills, or specific experience points) to better align with the job description's requirements and keywords.


## Potential Generative AI Capabilities to Showcase:
- Structured Output / JSON Mode: We'll ask the model to extract key information from the job description and return it in a structured JSON format. This is highly practical for parsing.

- Few-Shot Prompting: When generating the tailored CV summary, we'll provide the model with 1 or 2 examples (shots) of how a generic summary can be transformed into a tailored one based on sample job details. This guides the model to produce better, more relevant output.

- Document Understanding: This is inherent in the task. The model needs to read and comprehend both the CV (Document 1) and the Job Description (Document 2) to perform the tailoring. We will explicitly call this out and design prompts that require comparison and synthesis of information from both sources, such as the skill matching step.

- Information Extraction: Using an LLM to parse the job description and extract key requirements, skills, responsibilities, and company values.

- Text Summarization: Summarizing the core requirements of the job description or summarizing the candidate's relevant experience.

- Text Generation/Rewriting: Generating a new tailored CV summary or objective statement based on the job description. Rewriting experience bullet points to use keywords from the job posting.

- Skill Mapping/Gap Analysis: Identifying skills mentioned in the job description that are present or missing from the CV, potentially suggesting ways to phrase existing experience to cover gaps.

- Question Answering (applied): Framing the task as "How should I update my CV summary for this job?" and having the model generate an answer (the new summary).
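The skill mapping / gap analysis idea can be illustrated without an LLM. The sketch below uses a hypothetical helper, `skill_gap` (not part of this notebook), which does a case-insensitive literal comparison between the skills on the CV and the skills extracted from the job description. Literal matching misses synonyms such as "JS" vs "JavaScript", which is why the LLM or `sentence_transformers` approaches above are worth the extra effort.

```python
def skill_gap(cv_skills, job_skills):
    """Split job skills into those already on the CV and those missing.

    Matching is case-insensitive and ignores surrounding whitespace, but is
    otherwise literal: "JS" and "JavaScript" count as different skills.
    """
    cv_set = {skill.strip().lower() for skill in cv_skills}
    matched, missing = [], []
    for skill in job_skills:
        # Keep the job posting's wording and order in the output
        if skill.strip().lower() in cv_set:
            matched.append(skill)
        else:
            missing.append(skill)
    return matched, missing


matched, missing = skill_gap(
    cv_skills=["Python", "SQL", "Git"],
    job_skills=["python", "C++", "Embedded Linux", "git"],
)
print("Matched:", matched)  # Matched: ['python', 'git']
print("Missing:", missing)  # Missing: ['C++', 'Embedded Linux']
```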


In [None]:
import os
import shutil
import glob # To find files matching a pattern

print("Attempting to clean up previous output files...")

# Define the directory where files are saved (usually /kaggle/working/)
output_dir = "/kaggle/working/"

# List of file patterns to delete
patterns_to_delete = [
    "updated_cv*.pdf",
    "updated_cv*.tex",
    "updated_cv*.log",
    "llm_raw_output*.tex",
    "*.aux",
    "updated_cv*.out"
    # Add any other file patterns you generate
]

files_deleted_count = 0
for pattern in patterns_to_delete:
    # Find all files matching the pattern in the output directory
    files_to_remove = glob.glob(os.path.join(output_dir, pattern))
    for file_path in files_to_remove:
        try:
            if os.path.isfile(file_path):
                os.remove(file_path)
                print(f"Deleted file: {file_path}")
                files_deleted_count += 1
            # Optional: Delete directories if needed (USE WITH CAUTION)
            # elif os.path.isdir(file_path):
            #     shutil.rmtree(file_path)
            #     print(f"Deleted directory: {file_path}")
        except Exception as e:
            print(f"Error deleting {file_path}: {e}")

if files_deleted_count == 0:
    print(f"No matching files found to delete in {output_dir}")
else:
    print(f"Finished cleaning up {files_deleted_count} previous output file(s).")

print("-" * 30)

## Install Libraries

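- transformers, torch, accelerate, bitsandbytes: PyTorch and Hugging Face libraries for loading and running open models locally (bitsandbytes enables quantization so larger models fit in memory). They are commented out below because this notebook calls Gemini through the API instead.

- langchain: A popular framework for building LLM applications, helpful for structuring prompts and chains, especially in an agent-style approach (optional; commented out below).

- google-genai, google-generativeai: Google's Python SDKs for the Gemini API. google-genai is the newer SDK and is only imported to check its version; all LLM calls in this notebook use google-generativeai.

- beautifulsoup4, requests: Standard Python libraries for fetching and parsing HTML content from URLs.

- sentence_transformers: Can be used to compare the semantic similarity between CV skills and job requirements (optional; commented out below).

- pdfminer.six: Extracts text from the PDF CV, which must first be uploaded to the notebook's input data.

- validators: Checks that the job posting URL entered by the user is well-formed.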

In [None]:
!pip uninstall -qqy jupyterlab-lsp  # Remove unused conflicting packages
!pip install -U -q "google-genai==1.7.0"
#!pip install -q transformers torch accelerate bitsandbytes # For running Hugging Face models
#!pip install -q langchain langchain_community langchain_huggingface # To use LangChain framework
!pip install -q google-generativeai # Google AI Client Library
!pip install -q beautifulsoup4 requests # For web scraping
#!pip install -q sentence_transformers # Useful for semantic comparison if needed
!pip install -q pdfminer.six # To upload/read PDF CVs
!pip install -q validators # For validating the job posting URL
# Placeholder for other libraries

print("Libraries installed.")

In [None]:
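# Check the installed version of the newer google-genai SDK.
# Note: the rest of the notebook uses the older google-generativeai SDK,
# which is re-imported under the same name `genai` in the next cell.
from google import genai
from google.genai import types

genai.__version__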

In [None]:
import google.generativeai as genai  # Re-binds `genai` to the google-generativeai SDK used below
from kaggle_secrets import UserSecretsClient  # Kaggle's API for reading notebook secrets

try:
    user_secrets = UserSecretsClient()
    GOOGLE_API_KEY = user_secrets.get_secret("GOOGLE_API_KEY")

    genai.configure(api_key=GOOGLE_API_KEY)
    print("Google AI Client Configured.")

except Exception as e:
    print(f"Error configuring Google AI Client: {e}")  # Print the specific error
    print("Please ensure your GOOGLE_API_KEY secret is set correctly.")
    # Stop here: every later cell depends on a configured client
    raise
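The cell above only works on Kaggle, where `kaggle_secrets` exists, while the badge at the top also offers to open the notebook in Colab. A hedged sketch of a fallback, assuming that outside Kaggle the key is exported as an environment variable named `GOOGLE_API_KEY` (the helper `get_google_api_key` is hypothetical):

```python
import os


def get_google_api_key(secret_name="GOOGLE_API_KEY"):
    """Return the Gemini API key from Kaggle secrets, or from an environment variable.

    Assumption: outside Kaggle the key is stored in an environment variable
    with the same name as the Kaggle secret.
    """
    try:
        # Only importable inside a Kaggle notebook
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret(secret_name)
    except Exception:
        # Not on Kaggle, or the secret is missing: fall back to the environment
        pass
    key = os.environ.get(secret_name)
    if not key:
        raise RuntimeError(f"Set the {secret_name} Kaggle secret or environment variable.")
    return key


# Usage: genai.configure(api_key=get_google_api_key())
```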

In [None]:
# Install TeX Live
# --- Install LaTeX ---
print("Installing TeX Live, this might take a few minutes...")
!sudo apt-get update > /dev/null # Suppress lengthy output
!sudo apt-get install -y texlive texlive-latex-recommended texlive-fonts-recommended latexmk > /dev/null
!sudo apt-get install -y texlive-latex-base texlive-latex-extra texlive-fonts-extra > /dev/null
print("TeX Live installation complete.")

# Verify installation (optional)
!pdflatex -version
!latexmk -version

# Import Libraries:

In [None]:
import google.generativeai as genai
import os
import pandas as pd # Optional: for better display

# --- List Available Models ---
print("\nFetching available models...")
models_list = []
try:
    for m in genai.list_models():
        # Check if the model supports the 'generateContent' method (standard text generation)
        # This helps filter out specialized models if you only want text generators
        if 'generateContent' in m.supported_generation_methods:
            model_info = {
                'name': m.name,
                'display_name': m.display_name,
                'description': m.description,
                'version': m.version,
                # Add other potentially useful fields:
                # 'input_token_limit': m.input_token_limit,
                # 'output_token_limit': m.output_token_limit,
            }
            models_list.append(model_info)

    if not models_list:
        print("No models supporting 'generateContent' found or API call failed.")
    else:
        print(f"Found {len(models_list)} models supporting 'generateContent':")

        # --- Filter for "Pro" models (Case-Insensitive Search) ---
        pro_models = [
            model for model in models_list
            if 'pro' in model['name'].lower() or 'pro' in model['display_name'].lower()
        ]

        print("\n--- All Found Text Generation Models ---")
        # Optional: Display as a Pandas DataFrame for nicer formatting
        all_models_df = pd.DataFrame(models_list)
        #display(all_models_df[['name', 'display_name', 'version', 'description']]) # Use display() in Kaggle

        print("\n--- Filtered 'Pro' Models ---")
        if pro_models:
            pro_models_df = pd.DataFrame(pro_models)
            display(pro_models_df[['name', 'display_name', 'version', 'description']])
        else:
            print("No models containing 'Pro' found in the results.")

except Exception as e:
    print(f"An error occurred while listing models: {e}")

In [None]:
import os
import requests
from bs4 import BeautifulSoup
import google.generativeai as genai
from google.generativeai.types import HarmCategory, HarmBlockThreshold
import json # For handling JSON output
import time # For potential retries

from pdfminer.high_level import extract_text
import re # Useful for cleaning/parsing text

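# --- Model Configuration ---
# MODEL_NAME = "gemini-1.5-flash"  # Flash is fast and capable for many tasks, but I found Pro better at generating text
# Experimental model names (the "-exp-" suffix) can be retired; if this call fails, pick a name from the "Pro" list printed above
MODEL_NAME = "models/gemini-2.5-pro-exp-03-25"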
model = genai.GenerativeModel(MODEL_NAME)

# --- Safety Settings (Optional but Recommended) ---
safety_settings = {
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
}

# --- Generation Configuration (Optional) ---
generation_config = genai.GenerationConfig(
    temperature=0.9, # Controls randomness: lower = more focused and repeatable, higher = more varied/creative
    # max_output_tokens=1024, # Limit response length
    # response_mime_type="application/json" # Use this specifically for JSON mode capability
)

print(f"Using model: {MODEL_NAME}")
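Because experimental model names like the one above can be retired, a hedged alternative is to choose the first available model from a preference list, using the names collected in `models_list` by the listing cell. The helper `pick_model` and the example names below are illustrative assumptions, not models this notebook has verified:

```python
def pick_model(available_names, preferences, default=None):
    """Return the first name in `preferences` that appears in `available_names`.

    Names are compared with and without the "models/" prefix that list_models() uses.
    """
    def strip_prefix(name):
        return name[len("models/"):] if name.startswith("models/") else name

    # Map bare name -> full name as reported by the API
    available = {strip_prefix(name): name for name in available_names}
    for wanted in preferences:
        bare = strip_prefix(wanted)
        if bare in available:
            return available[bare]
    return default


# Hard-coded example; in the notebook you could pass [m['name'] for m in models_list]
available = ["models/gemini-1.5-flash", "models/gemini-2.5-pro"]
print(pick_model(available, ["gemini-2.5-pro-exp-03-25", "gemini-2.5-pro", "gemini-1.5-flash"]))
# models/gemini-2.5-pro
```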

# Define Input Data:

In [None]:
# --- Cell for Instructed Upload via Kaggle UI ---

import ipywidgets as widgets
from IPython.display import display, clear_output
import os
import time
import validators # For URL validation later if needed

# --- Global variables to store results ---
job_posting_url = None # Set by the confirmation handler below
cv_path = None
setup_successful = False

# --- Widgets ---
# Step 1: text box for the job posting URL
url_input = widgets.Text(
    placeholder='Paste job description URL here...',
    description='Job URL:',
    layout=widgets.Layout(width='80%'),
    disabled=False
)

# Instructions for CV upload using Kaggle UI
upload_instructions = widgets.HTML(
    value="""
    <hr>
    <b>Step 2: Upload Your CV using Kaggle UI</b>
    <ol>
        <li>Go to the <b>"File" menu</b> at the top-left OR use the <b>"+ Add data" button</b> in the right sidebar.</li>
        <li>Select <b>"Upload Data"</b>.</li>
        <li>Click <b>"Browse Files"</b> and select your CV file (e.g., MyResume.pdf).</li>
        <li>Give your upload a simple dataset title if prompted (e.g., "cv-upload").</li>
        <li>Wait for the upload to complete.</li>
        <li>Find your uploaded file in the Input section of the right sidebar (it might take a moment to appear).</li>
        <li><b>Copy the exact file path</b> (e.g., <code>/kaggle/input/cv-upload/MyResume.pdf</code>).</li>
        <li><b>Paste the full path</b> into the text box below and click "Confirm URL &amp; CV Path".</li>
    </ol>
    """
)

# Text input for the user to paste the path
path_input = widgets.Text(
    placeholder='/kaggle/input/your-dataset-name/your-cv.pdf',
    description='CV Path:',
    layout=widgets.Layout(width='80%'),
    disabled=True # Disabled until URL is entered
)

# Button to confirm the pasted path
confirm_button = widgets.Button(
    description="Confirm URL & CV Path",
    button_style='success',
    tooltip='Click after entering URL and pasting the CV path from /kaggle/input/',
    disabled=False
)

# Output area for messages
output_area = widgets.Output()

# --- Event Handlers ---

def handle_confirmation(button_instance):
    global job_posting_url, cv_path, setup_successful, path_input, url_input # Allow modification

    # Disable inputs during processing
    url_input.disabled = True
    path_input.disabled = True
    confirm_button.disabled = True

    with output_area:
        clear_output(wait=True)
        print("Processing...")
        time.sleep(0.5)

        # 1. Validate URL
        temp_job_link = url_input.value.strip()
        if not validators.url(temp_job_link):
             print(f"❌ Error: Invalid Job URL provided: '{temp_job_link}'")
             # Re-enable inputs and return
             url_input.disabled = False
             path_input.disabled = True # Keep path disabled until URL is valid
             confirm_button.disabled = False
             return
        else:
            job_posting_url = temp_job_link # Store valid URL
            print(f"✅ Job URL confirmed: {job_posting_url}")
            path_input.disabled = False # Enable path input now URL is okay

        # 2. Validate Pasted CV Path
        temp_cv_path = path_input.value.strip()
        # Remove surrounding quotes if user pastes path with them
        if temp_cv_path.startswith('"') and temp_cv_path.endswith('"'):
            temp_cv_path = temp_cv_path[1:-1]
        elif temp_cv_path.startswith("'") and temp_cv_path.endswith("'"):
            temp_cv_path = temp_cv_path[1:-1]


        if not temp_cv_path:
            print("❌ Error: Please paste the CV file path from /kaggle/input/.")
        elif not temp_cv_path.startswith("/kaggle/input/"):
             print(f"❌ Error: Path should start with '/kaggle/input/'. You entered: '{temp_cv_path}'")
        elif not os.path.exists(temp_cv_path):
             print(f"❌ Error: File not found at path '{temp_cv_path}'. Double-check the path in the sidebar.")
        elif not os.path.isfile(temp_cv_path):
             print(f"❌ Error: Path '{temp_cv_path}' points to a directory, not a file.")
        else:
            # --- Success ---
            cv_path = temp_cv_path # Store the validated path
            setup_successful = True
            print(f"✅ CV Path confirmed: {cv_path}")
            print("\n--- Setup Complete ---")
            print("You can now proceed with the next steps in your notebook.")
            # Inputs remain disabled on success
            return # Exit function successfully

        # --- If any validation failed ---
        setup_successful = False
        cv_path = None
        # Re-enable inputs for correction (URL stays enabled, path too now)
        url_input.disabled = False
        path_input.disabled = False
        confirm_button.disabled = False


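# --- Link Event Handlers ---
def handle_url_change(change):
    # Enable the CV path box as soon as something has been typed into the URL box
    path_input.disabled = not change['new'].strip()

url_input.observe(handle_url_change, names='value')
confirm_button.on_click(handle_confirmation)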

# --- Display Widgets ---
display(widgets.VBox([
    widgets.HTML("<b>Step 1: Enter Job Description URL</b>"),
    url_input,
    upload_instructions, # Display the HTML instructions
    path_input,
    confirm_button,
    output_area
]))

In [None]:
# --- INPUT SECTION ---
# Use the job URL confirmed in the widget above, if any; otherwise fall back to this example posting
if not globals().get("job_posting_url"):
    job_posting_url = "https://www.amazon.jobs/en/jobs/2956591/embedded-software-development-engineer-blink?cmpid=SPLICX0248M&utm_source=linkedin.com&utm_campaign=cxro&utm_medium=social_media&utm_content=job_posting&ss=paid"

# --- PDF INPUT SECTION ---

# IMPORTANT: If setting the path manually, update it to match your dataset name and PDF filename!
# Example: if the dataset is 'cv-data' and the file is 'MyResume_Jan2024.pdf':
# pdf_path = "/kaggle/input/cv-data/MyResume_Jan2024.pdf"

# --- Prefer the CV path confirmed in the widget; otherwise auto-detect the first PDF found ---
pdf_path = globals().get("cv_path")
if pdf_path:
    print(f"Using CV path confirmed in the widget: {pdf_path}")
else:
    input_dirs = [d for d in os.listdir('/kaggle/input') if os.path.isdir(os.path.join('/kaggle/input', d))]

    if not input_dirs:
        print("Warning: No input directories found in /kaggle/input/. Did you add data?")
    else:
        # Search for the first file ending in .pdf
        for dirname in input_dirs:
            dirpath = os.path.join('/kaggle/input', dirname)
            try:
                for filename in os.listdir(dirpath):
                    if filename.lower().endswith('.pdf'):
                        pdf_path = os.path.join(dirpath, filename)
                        print(f"Found PDF: {pdf_path}")
                        break  # Use the first PDF found
            except Exception as e:
                print(f"Could not list files in {dirpath}: {e}")
            if pdf_path:
                break  # Stop searching once one is found

# --- Define path manually if auto-detect fails or is wrong ---
# pdf_path = "/kaggle/input/YOUR_DATASET_NAME/YOUR_CV_FILENAME.pdf"  # UNCOMMENT AND SET MANUALLY IF NEEDED

base_cv_text = None
if pdf_path and os.path.exists(pdf_path):
    print(f"Attempting to read text from: {pdf_path}")
    try:
        base_cv_text = extract_text(pdf_path)
        print(f"Successfully extracted text from PDF. Length: {len(base_cv_text)} characters.")
        # Optional: Print a snippet to verify
        # print("\n--- Start of Extracted CV Text ---")
        # print(base_cv_text[:500])
        # print("--- End of Snippet ---")

    except Exception as e:
        print(f"Error extracting text from PDF '{pdf_path}': {e}")
        print("Please ensure it's a text-based PDF and not an image scan.")
        # Optional: Provide fallback mechanism
        # base_cv_text = """ PASTE YOUR CV TEXT HERE AS A FALLBACK """
else:
    if not pdf_path:
         print("Error: Could not automatically find a PDF file in /kaggle/input/.")
    else:
         print(f"Error: PDF file not found at specified path: {pdf_path}")
    print("Please ensure you have added your CV PDF via '+ Add Data' and the path is correct.")
    # Stop execution or use fallback
    # raise FileNotFoundError("CV PDF not found or readable.")

# --- Ensure base_cv_text is not None before proceeding ---
if base_cv_text is None:
     raise ValueError("Failed to load CV text from PDF. Notebook cannot proceed.")

# Optional: Define which parts of the CV you want to target for updates
target_sections = ["Summary", "Skills"]
print("Input CV and Job URL defined.")
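pdfminer's `extract_text` typically emits form-feed characters (`\x0c`) at page breaks, and PDF layouts often leave runs of spaces and blank lines that only waste prompt tokens. A hedged clean-up sketch (`clean_pdf_text` is a hypothetical helper, not part of the notebook) that could be applied to `base_cv_text` before it is sent to the LLM:

```python
import re


def clean_pdf_text(text):
    """Normalize whitespace in text extracted from a PDF while keeping line structure."""
    text = text.replace("\x0c", "\n")       # Page breaks -> newlines
    text = text.replace("\u00a0", " ")      # Non-breaking spaces -> regular spaces
    text = re.sub(r"[ \t]+", " ", text)     # Collapse runs of spaces/tabs
    text = re.sub(r" *\n *", "\n", text)    # Trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)  # Allow at most one blank line in a row
    return text.strip()


raw = "John Doe\x0c\n\n\n\nSkills:   Python,\tSQL  \n\n\n"
print(repr(clean_pdf_text(raw)))  # 'John Doe\n\nSkills: Python, SQL'
```

Usage in the notebook would be `base_cv_text = clean_pdf_text(base_cv_text)` right after extraction.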

# Adjust Section Parsing:

I initially tried extracting the sections with regular expressions (`re`), but to make the approach work for CVs with different layouts and heading names, I instead ask the LLM to read the input CV and detect the sections itself.
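For comparison, a regex approach of the kind mentioned above might look like the hedged sketch below. It assumes each heading sits alone on its own line and appears in a fixed list (`KNOWN_HEADINGS` and `split_sections` are illustrative, not the author's original code). That assumption is exactly the brittleness that motivates the LLM: a heading such as "Core Competencies" is silently missed unless someone adds it to the list.

```python
import re

# Illustrative list; real CVs use many more heading variants
KNOWN_HEADINGS = ["Summary", "Profile", "Skills", "Experience", "Education"]

HEADING_RE = re.compile(
    r"^\s*(" + "|".join(map(re.escape, KNOWN_HEADINGS)) + r")\s*:?\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_sections(cv_text):
    """Map each recognised heading (title-cased) to the text that follows it."""
    matches = list(HEADING_RE.finditer(cv_text))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs until the next recognised heading or the end of the text
        end = matches[i + 1].start() if i + 1 < len(matches) else len(cv_text)
        sections[match.group(1).title()] = cv_text[match.end():end].strip()
    return sections


cv = "Jane Doe\nSUMMARY\nEmbedded engineer.\nSkills:\nC, C++, RTOS\nEducation\nBSc EE"
print(split_sections(cv))
# {'Summary': 'Embedded engineer.', 'Skills': 'C, C++, RTOS', 'Education': 'BSc EE'}
```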

## Define helper function for text generation:

In [None]:
def generate_text_with_gemini(prompt_text, is_json_output=False):
    """Generates text using the Gemini model with error handling."""
    try:
        current_config = generation_config
        if is_json_output:
            # Gemini's JSON mode: response_mime_type="application/json" asks the model for raw JSON.
            # The prompt should still say "respond ONLY with JSON" as a safeguard.
            # Keep the notebook's temperature so JSON calls behave like the other calls.
            current_config = genai.GenerationConfig(
                temperature=generation_config.temperature,
                response_mime_type="application/json"
            )
            print("Attempting JSON output mode.")


        response = model.generate_content(
            prompt_text,
            generation_config=current_config,
            safety_settings=safety_settings
        )

        # Handle potential safety blocks or empty responses
        if not response.candidates:
             print("Warning: Response blocked or empty. Safety ratings:", response.prompt_feedback.safety_ratings)
             return None

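        # response.text is the SDK's shortcut for the text of the first candidate
        # (like response.candidates[0].content.parts[0].text for single-part replies).
        # It raises ValueError if the candidate has no text (e.g. it was stopped for safety);
        # that case is caught by the except block below.
        generated_content = response.text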

        if is_json_output:
             # Validate if the output is actually JSON
             try:
                 json.loads(generated_content)
                 print("Valid JSON received.")
             except json.JSONDecodeError:
                 print("Warning: Model did not return valid JSON despite request.")
                 print("Raw output:", generated_content[:200]) # Print snippet for debugging
                 # Fallback: Return raw text or None
                 return generated_content # Or handle error differently

        return generated_content

    except Exception as e:
        print(f"An error occurred during Gemini API call: {e}")
        # Implement retry logic if desired (e.g., for rate limits)
        # time.sleep(5) # Simple backoff
        return None # Or re-raise the exception
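The helper above notes that retry logic could be added for rate limits. A hedged sketch of a generic retry wrapper with exponential backoff follows; `call_with_retries` is hypothetical, and because which exceptions are worth retrying depends on the SDK (for rate limits this is typically `google.api_core.exceptions.ResourceExhausted`), the caller passes them in.

```python
import random
import time


def call_with_retries(fn, *args, max_attempts=4, base_delay=2.0, retry_on=(Exception,), **kwargs):
    """Call fn(*args, **kwargs), retrying with exponential backoff on the given exceptions.

    Waits about base_delay, 2*base_delay, 4*base_delay, ... seconds between attempts,
    plus up to 0.5 s of random jitter, and re-raises the last error if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args, **kwargs)
        except retry_on as e:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
            print(f"Attempt {attempt} failed ({e}); retrying in {delay:.1f}s...")
            time.sleep(delay)
```

Inside `generate_text_with_gemini`, the API call could then become `call_with_retries(model.generate_content, prompt_text, generation_config=current_config, safety_settings=safety_settings)`; the keyword arguments are passed through unchanged.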

In [None]:
# --- LLM-Based Section Extraction ---

print("Attempting to extract Summary and Skills sections using LLM...")
base_summary = "Summary could not be extracted by LLM." # Default fallback
base_skills_section = "Skills section could not be extracted by LLM." # Default fallback

if base_cv_text: # Only proceed if PDF text was loaded
    # --- Prepare the Prompt for Section Extraction ---
    prompt_extract_sections = f"""
    Analyze the following CV text and identify the main "Summary" (or Objective/Profile) section and the primary "Skills" (or Key Skills/Technical Skills/Competencies) section.

    CV Text:
    ---
    {base_cv_text}
    ---

    Task: Extract the full text content for these two sections.
    IMPORTANT: Respond ONLY with a valid JSON object containing two keys:
    1.  "summary": A string containing the full text of the candidate's summary/objective/profile section. If no such section is clearly identifiable, return null or an empty string.
    2.  "skills": A string containing the full text of the main skills listing section. This often includes bullet points. If no such section is clearly identifiable, return null or an empty string.

    Do not include any introductory text, explanations, or markdown formatting outside the JSON object itself.
    """

    print("Requesting section extraction from LLM...")
    # Use the helper function, requesting JSON output
    extracted_sections_json_str = generate_text_with_gemini(prompt_extract_sections, is_json_output=True)

    if extracted_sections_json_str:
        try:
            # Clean potential markdown code block fences (```json ... ``` or ``` ... ```),
            # without assuming the closing fence is present
            cleaned = extracted_sections_json_str.strip()
            if cleaned.startswith("```"):
                cleaned = cleaned[7:] if cleaned.startswith("```json") else cleaned[3:]
                if cleaned.rstrip().endswith("```"):
                    cleaned = cleaned.rstrip()[:-3]
                cleaned = cleaned.strip()

            # Parse the JSON response
            extracted_sections = json.loads(cleaned)
            print("--- Extracted CV Sections (JSON Parsed & Pretty Printed) ---")
            # Use json.dumps() with indent for pretty printing
            print(json.dumps(extracted_sections, indent=2))

            # Assign the extracted content, providing fallbacks if keys are missing or null
            base_summary = extracted_sections.get("summary") or base_summary # Use fallback if null/empty
            base_skills_section = extracted_sections.get("skills") or base_skills_section # Use fallback if null/empty

            if not extracted_sections.get("summary"):
                 print("Warning: LLM did not identify a 'summary' section.")
            if not extracted_sections.get("skills"):
                 print("Warning: LLM did not identify a 'skills' section.")

            print(f"LLM-Extracted Professional Summary (Snippet): {base_summary[:150]}...")
            print(f"LLM-Extracted Key Skills Section (Snippet): {base_skills_section[:150]}...")

        except json.JSONDecodeError as e:
            print(f"Error parsing JSON response from LLM for section extraction: {e}")
            print("Raw Response Snippet:", extracted_sections_json_str[:500])
            # Keep the default fallback values
        except Exception as e:
             print(f"An unexpected error occurred processing the LLM response for section extraction: {e}")
             print("Raw Response Snippet:", extracted_sections_json_str[:500])
             # Keep the default fallback values
    else:
        print("LLM did not return a response for section extraction.")
        # Keep the default fallback values

else:
    print("Skipping LLM section extraction as base_cv_text is not loaded.")

# --- Ensure the variables are ready for use in subsequent prompts ---
# The variables base_summary and base_skills_section now hold the text extracted by the LLM
# or the default error message if extraction failed or sections weren't found.

if base_summary.startswith("Summary could not"):
    print("Warning: Proceeding without a clearly identified summary section.")
if base_skills_section.startswith("Skills section could not"):
     print("Warning: Proceeding without a clearly identified skills section.")

# Data Extraction and Processing:

The function below fetches the page HTML with `requests`, parses it with BeautifulSoup, and passes the extracted text to the LLM with a prompt that lists which job-description sections to keep and which page elements to exclude.

In [None]:
def scrape_job_description(url):
    """
    Fetches content from a URL, extracts all text, and then uses
    an LLM to identify and return only the core job description.
    """
    try:
        # --- Step 1: Fetch and Parse Initial HTML ---
        response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=20) # Added timeout
        response.raise_for_status() # Raise HTTPError for bad responses (4xx or 5xx)
        soup = BeautifulSoup(response.content, 'html.parser')

        # --- Step 2: Extract All Text (Initial Noisy Extraction) ---
        # Getting all text is often necessary as specific containers fail
        all_text = ' '.join(soup.stripped_strings)

        if not all_text:
            print(f"Warning: No text could be extracted from {url} using BeautifulSoup.")
            return None # Return None if no text found at all

        print(f"Initial text extracted (length {len(all_text)}). Asking LLM to find job description...")
        # print(f"Initial text snippet: {all_text[:500]}...") # Optional: for debugging

        # --- Step 3: Use LLM to Extract Core Job Description ---
        # Limit input length to avoid exceeding token limits (adjust as needed)
        # Gemini models often have large context windows, but let's be safe.
        max_input_chars = 25000 # Example limit, adjust based on model and typical page size
        input_text_for_llm = all_text[:max_input_chars]

        prompt_llm_extract_jd = f"""
        Analyze the following text extracted from a webpage ({url}). This text contains the entire page content, including headers, footers, navigation, ads, and potentially the job description.

        Your task is to identify and extract ONLY the core job description content. This typically includes sections like:
        - Job Title / Role Name
        - Company Information (briefly, if relevant to the role itself)
        - Responsibilities / What You'll Do
        - Requirements / Qualifications / Who You Are
        - Nice-to-haves / Preferred Qualifications
        - Location / Remote work details (if part of the description)
        - Sometimes details about the team or company culture within the description body.

        Explicitly EXCLUDE common irrelevant page elements such as:
        - Main site navigation menus (e.g., links to Home, About Us, Careers Home)
        - Footer text (copyrights, privacy policy links)
        - Cookie consent banners
        - Advertisements or links to unrelated jobs
        - Application form fields or instructions ("Apply Now" buttons, "Upload Resume") unless they are literally the only text available.
        - Boilerplate text repeated on many pages (e.g., general diversity statements unless part of the JD itself).

        If a clear job description section cannot be found, return the phrase "No specific job description found."

        Extracted Webpage Text:
        ---
        {input_text_for_llm}
        ---

        Extracted Job Description:
        """

        # Use the helper function defined above; plain-text output, not JSON
        cleaned_job_description = generate_text_with_gemini(prompt_llm_extract_jd, is_json_output=False)

        if cleaned_job_description and "No specific job description found." not in cleaned_job_description:
            print("LLM successfully extracted potential job description.")
            return cleaned_job_description.strip()
        elif cleaned_job_description:
             print("LLM indicated no specific job description was found in the text.")
             return None # Return None if LLM explicitly says it can't find it
        else:
            print("Warning: LLM extraction failed or returned empty.")
            # Returning the raw page text here would pass navigation, footer, and ad noise downstream,
            # so return None and let the caller decide how to proceed.
            # return all_text[:5000]  # Alternative: truncated raw text as a last resort
            return None

    except requests.exceptions.Timeout:
         print(f"Error: Request timed out for URL {url}")
         return None
    except requests.exceptions.RequestException as e:
        print(f"Error fetching URL {url}: {e}")
        return None
    except Exception as e:
        # Catch potential errors in BeautifulSoup or the LLM call within this function
        print(f"Error processing content from {url}: {e}")
        return None
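`soup.stripped_strings` still returns navigation menus, page headers, and footers, so the LLM must filter that noise, and the 25,000-character cut-off can land before the job description ends. A hedged, dependency-free sketch using the standard library's `html.parser` drops text inside `nav`, `header`, `footer`, `script`, `style`, and `noscript` elements before anything reaches the LLM. The tag list is an assumption: some sites put the job title inside `<header>`, in which case remove it from `SKIP_TAGS`.

```python
from html.parser import HTMLParser

# Assumption: text inside these elements is page chrome rather than job content
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer"}


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything nested inside SKIP_TAGS elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_visible_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()  # Flush any buffered text
    return " ".join(parser.parts)


page = ("<html><body><nav>Home | Careers</nav><h1>Embedded Engineer</h1>"
        "<p>Write firmware in C.</p><script>track();</script>"
        "<footer>(c) 2024</footer></body></html>")
print(extract_visible_text(page))  # Embedded Engineer Write firmware in C.
```

In `scrape_job_description`, this would replace the `stripped_strings` line with `all_text = extract_visible_text(response.text)`.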

In [None]:
# Fetch and print the job description
job_description = scrape_job_description(job_posting_url)

if job_description:
    print("--- Scraped Job Description (First 500 chars) ---")
    print(job_description[:500])
    print("\n--- End of Snippet ---")
else:
    print("Failed to retrieve job description.")
    # Every later step depends on the job description, so stop here
    raise ValueError("Job Description could not be obtained.")

# Apply Gen AI Techniques:

## Use LLM to extract info from Job Description in JSON format:

In [None]:
prompt_extract_json = f"""
Analyze the following job description and extract key information.
IMPORTANT: Respond ONLY with a valid JSON object containing the following keys:
- "required_skills": A list of strings detailing specific technical and soft skills mentioned.
- "key_responsibilities": A list of strings summarizing the main duties and tasks.
- "company_focus_values": A brief string describing any stated company culture, values, or product focus. If none are explicitly mentioned, state "Not specified".
- "job_title": The job title mentioned in the description. If not clear, state "Not specified".
- "experience_level": Required or desired experience level (e.g., "Entry-level", "5+ years", "Senior"). If not clear, state "Not specified".

Do not include any introductory text or explanations outside the JSON object.

Job Description Text (first ~3000 chars):
---
{job_description[:3000]}
---

JSON Output:
"""

print("Requesting structured information from Job Description...")
extracted_info_json_str = generate_text_with_gemini(prompt_extract_json, is_json_output=True) # Request JSON

extracted_info = None
if extracted_info_json_str:
    try:
        # Clean potential markdown code block fences (```json ... ``` or ``` ... ```),
        # without assuming the closing fence is present
        cleaned = extracted_info_json_str.strip()
        if cleaned.startswith("```"):
            cleaned = cleaned[7:] if cleaned.startswith("```json") else cleaned[3:]
            if cleaned.rstrip().endswith("```"):
                cleaned = cleaned.rstrip()[:-3]
            cleaned = cleaned.strip()

        extracted_info = json.loads(cleaned)
        print("--- Extracted Job Information (JSON Parsed) ---")
        # Pretty print the JSON
        print(json.dumps(extracted_info, indent=2))
    except json.JSONDecodeError as e:
        print(f"Error parsing JSON response: {e}")
        print("Raw Response Snippet:", extracted_info_json_str[:500])
        # Handle the error - maybe try a non-JSON prompt as fallback?
    except Exception as e:
         print(f"An unexpected error occurred processing the response: {e}")
         print("Raw Response Snippet:", extracted_info_json_str[:500])

if not extracted_info:
    print("Could not extract structured information. Proceeding with caution.")
    # You might need to handle this case downstream (e.g., skip steps relying on this info)
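The fence stripping above assumes the reply is either bare JSON or fully wrapped in ``` fences. A slightly more defensive sketch is shown below; the helper name `parse_llm_json` and the default values are our own assumptions, not part of the notebook or the Gemini SDK. It also tolerates prose around the object (via `json.JSONDecoder.raw_decode`) and fills in any keys the model omitted, so downstream `.get()` calls see a consistent shape.

```python
import json
import re

# Keys requested by prompt_extract_json; the defaults here are assumptions chosen
# to match the "Not specified" convention used in that prompt.
EXPECTED_DEFAULTS = {
    "required_skills": [],
    "key_responsibilities": [],
    "company_focus_values": "Not specified",
    "job_title": "Not specified",
    "experience_level": "Not specified",
}

def parse_llm_json(raw_text):
    """Parse a JSON object from an LLM reply that may include ``` fences or extra prose."""
    text = raw_text.strip()
    # Remove a complete ```lang ... ``` wrapper if the whole reply is fenced
    fence = re.match(r"^```[a-zA-Z]*\s*(.*?)\s*```$", text, re.DOTALL)
    if fence:
        text = fence.group(1)
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to decoding from the first '{' and ignore anything after the object
        start = text.find("{")
        if start == -1:
            raise
        data, _ = json.JSONDecoder().raw_decode(text, start)
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    # Model-provided values override the defaults; missing keys get a default
    return {**EXPECTED_DEFAULTS, **data}

reply = '```json\n{"job_title": "Data Analyst", "required_skills": ["SQL", "Python"]}\n```'
info = parse_llm_json(reply)
print(info["job_title"], info["required_skills"], info["experience_level"])
# Data Analyst ['SQL', 'Python'] Not specified
```

Merging with defaults keeps later cells simple, at the cost of hiding the fact that a key was missing; log the raw reply if that distinction matters.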

## Using Few-Shot Prompting to Generate Tailored CV Summary:

In [None]:
# Ensure we have extracted info, even if basic
job_skills_str = ", ".join(extracted_info.get('required_skills', [])) if extracted_info else "relevant skills"
job_resp_str = "; ".join(extracted_info.get('key_responsibilities', [])) if extracted_info else "key duties"


# --- Construct the Few-Shot Prompt ---
prompt_rewrite_summary_few_shot = f"""
You are an expert resume writer. Your task is to rewrite a generic CV summary to be specifically tailored for a job description.

**Example 1:**
Generic Summary: Results-oriented Marketing Manager with 8 years of experience in digital campaigns and team leadership. Proven ability to increase brand awareness.
Job Highlights: Seeking B2B SaaS Marketing Lead. Requires expertise in SEO, content marketing for lead generation, and managing marketing automation platforms (HubSpot).
Tailored Summary: Experienced B2B SaaS Marketing Manager with 8 years driving lead generation through strategic SEO and content marketing. Proficient in HubSpot, focused on increasing qualified leads for SaaS products.

**Example 2:**
Generic Summary: Detail-oriented Data Analyst skilled in SQL, Python, and data visualization. Experience cleaning and interpreting large datasets.
Job Highlights: Junior Data Analyst role focused on retail analytics. Needs experience with Tableau for dashboarding and presenting findings to non-technical stakeholders.
Tailored Summary: Data Analyst proficient in SQL and Python, with experience in retail analytics. Skilled in transforming complex data into actionable insights using Tableau dashboards and communicating findings clearly to business stakeholders.

**Now, perform the task for the following:**

Generic Summary:
{base_summary}

Job Highlights:
- Key Skills: {job_skills_str}
- Responsibilities: {job_resp_str}
- Job Title: {extracted_info.get('job_title', 'Not specified') if extracted_info else 'Not specified'}

Rewrite the generic summary (2-4 sentences) to strongly align with the specific Job Highlights provided above. Focus on the most relevant skills and experiences mentioned in the generic summary that match the job requirements.

Tailored CV Summary:
"""

print("\nRequesting tailored CV summary using Few-Shot Prompting...")
tailored_summary = generate_text_with_gemini(prompt_rewrite_summary_few_shot)

print("\n--- Original CV Summary ---")
print(base_summary)
print("\n--- Generated Tailored CV Summary ---")
if tailored_summary:
    print(tailored_summary)
else:
    print("Failed to generate tailored summary.")
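The few-shot prompt above hard-codes its two examples in the f-string. As a sketch of the same technique with the shots kept as data, the hypothetical helper `build_few_shot_prompt` below (not part of the notebook) assembles the same Example / Now-perform-the-task layout from a list of dicts, which makes it easy to add, drop, or swap examples.

```python
# Hypothetical helper: assembles a few-shot prompt from example dicts, mirroring the
# layout of prompt_rewrite_summary_few_shot (shots first, then the real task).
def build_few_shot_prompt(examples, generic_summary, highlights):
    parts = [
        "You are an expert resume writer. Rewrite a generic CV summary so it is tailored to a job description.",
        "",
    ]
    for i, ex in enumerate(examples, start=1):
        parts += [
            f"**Example {i}:**",
            f"Generic Summary: {ex['generic']}",
            f"Job Highlights: {ex['highlights']}",
            f"Tailored Summary: {ex['tailored']}",
            "",
        ]
    parts += [
        "**Now, perform the task for the following:**",
        f"Generic Summary: {generic_summary}",
        f"Job Highlights: {highlights}",
        "",
        "Tailored Summary:",  # open slot the model is expected to complete
    ]
    return "\n".join(parts)

demo_examples = [
    {"generic": "Detail-oriented Data Analyst skilled in SQL and Python.",
     "highlights": "Junior Data Analyst, retail analytics, Tableau dashboards.",
     "tailored": "Data Analyst proficient in SQL and Python with retail analytics experience."},
]
print(build_few_shot_prompt(demo_examples, "Generic summary text here.", "Key Skills: SQL, Tableau"))
```

Ending the prompt with the same `Tailored Summary:` label used in every shot cues the model to answer in exactly that format.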

## Using Document Understanding to Compare skills and suggest additions or rephrasing:

In [None]:
# Simple extraction: assumes the CV marks sections as **Skills** and **Education**
skills_header = "**Skills**"
header_pos = base_cv_text.find(skills_header)
if header_pos == -1:
    base_skills_section = "No skills section found."
else:
    skills_start = header_pos + len(skills_header)
    # Assumes Education follows Skills; search only after the Skills header
    skills_end = base_cv_text.find("**Education**", skills_start)
    if skills_end != -1:
        base_skills_section = base_cv_text[skills_start:skills_end].strip()
    else:  # Skills is the last section
        base_skills_section = base_cv_text[skills_start:].strip()


# Ensure we have the required skills list from the structured extraction
required_skills_list = extracted_info.get('required_skills', []) if extracted_info else []

if not required_skills_list:
    print("\nWarning: Cannot perform skill matching as required skills were not extracted.")
    skill_analysis = "Skipped - Required skills not available."
else:
    prompt_skill_match = f"""
    Analyze the alignment between the candidate's current CV skills and the required skills for the job.

    Candidate's Current CV Skills Section:
    ---
    {base_skills_section}
    ---

    Required Skills from Job Description:
    ---
    {', '.join(required_skills_list)}
    ---

    Task:
    1.  Identify which **Required Skills** seem to be **present** in the Candidate's CV Skills Section (list them). Interpret skills broadly (e.g., 'AWS EC2' in CV matches 'AWS' or 'Cloud Computing' requirement).
    2.  Identify key **Required Skills** that appear to be **missing** or not explicitly mentioned in the Candidate's CV Skills Section (list them).
    3.  Suggest 1-2 high-priority skills from the **missing** list that the candidate should consider adding or elaborating on in their CV, potentially by rephrasing existing experience if applicable.

    Provide the analysis clearly under the headings: "Present Skills:", "Missing Skills:", and "Suggestions:".

    Analysis:
    """

    print("\nRequesting Skill Analysis (Document Understanding)...")
    skill_analysis = generate_text_with_gemini(prompt_skill_match)

print("\n--- Skills Analysis and Suggestions ---")
if skill_analysis:
    print(skill_analysis)
else:
    print("Failed to generate skill analysis.")
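The LLM prompt above is useful because it interprets skills broadly (for example, "AWS EC2" counting toward "Cloud Computing"). A cheap deterministic baseline can still help sanity-check its "Present Skills" and "Missing Skills" lists. The sketch below, using a hypothetical `keyword_skill_match` helper, is a plain case-insensitive phrase search, so it will miss synonyms by design.

```python
import re

def keyword_skill_match(cv_text, required_skills):
    """Split required skills into (present, missing) by case-insensitive phrase search."""
    present, missing = [], []
    for skill in required_skills:
        # Custom boundaries instead of \b so that "C++" or ".NET" match as whole tokens,
        # while "C" does not match inside "C++" and "Java" does not match "JavaScript".
        pattern = r"(?<![\w+#.])" + re.escape(skill) + r"(?![\w+#])"
        if re.search(pattern, cv_text, flags=re.IGNORECASE):
            present.append(skill)
        else:
            missing.append(skill)
    return present, missing

present, missing = keyword_skill_match(
    "Skills: Python, C/C++, Docker, AWS S3.",
    ["python", "C++", "Kubernetes", "AWS"],
)
print(present, missing)
# ['python', 'C++', 'AWS'] ['Kubernetes']
```

Skills that the baseline finds but the LLM lists as missing (or vice versa) are good candidates for a manual check.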

# Create Output:

## Adding a Helper Function to Escape LaTeX Characters:

In [None]:
import re

def escape_latex(text):
    """Escapes characters special to LaTeX."""
    if not isinstance(text, str):
        text = str(text) # Ensure input is a string

    # Basic escape characters
    conv = {
        '&': r'\&',
        '%': r'\%',
        '$': r'\$',
        '#': r'\#',
        '_': r'\_',
        '{': r'\{',
        '}': r'\}',
        '~': r'\textasciitilde{}',
        '^': r'\^{}',
        '\\': r'\textbackslash{}',
        '<': r'\textless{}',
        '>': r'\textgreater{}',
    }
    regex = re.compile('|'.join(re.escape(str(key)) for key in sorted(conv.keys(), key = lambda item: - len(item))))
    text = regex.sub(lambda match: conv[match.group()], text)

    # Newlines: LaTeX already treats a blank line as a paragraph break and a single
    # newline as a space, so the line structure is kept as-is. (Replacing every '\n'
    # with a space here would leave a single line and break the bullet detection below.)

    # Basic handling for bullet points ('* ' or '- ' at the start of a line).
    # This is simplistic and might need refinement.
    lines = text.splitlines()
    processed_lines = []
    in_itemize = False
    for line in lines:
        stripped_line = line.strip()
        if stripped_line.startswith(('* ', '- ')):
            item_content = stripped_line[2:].strip()
            if item_content:  # Only add if there's content after the bullet
                if not in_itemize:
                    processed_lines.append(r'\begin{itemize}')
                    in_itemize = True
                processed_lines.append(rf'  \item {item_content}')
        else:
            if in_itemize:
                processed_lines.append(r'\end{itemize}')
                in_itemize = False
            processed_lines.append(line)  # Non-bullet lines (already escaped)
    if in_itemize:  # Close itemize if it was the last thing
        processed_lines.append(r'\end{itemize}')

    return '\n'.join(processed_lines)

print("LaTeX escaping helper function defined.")
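`escape_latex` builds one compiled regex and substitutes every special character in a single pass rather than chaining `str.replace()` calls. The reduced three-entry table below (an assumption for illustration, not the full `conv` mapping) shows why: with chained replaces, the braces inserted by the backslash replacement are escaped again by the later brace replacements.

```python
import re

# Reduced escape table; the backslash entry is the one that causes trouble.
conv = {"\\": r"\textbackslash{}", "{": r"\{", "}": r"\}"}

def escape_sequential(text):
    # Naive: each replace() also sees the output of the previous one
    for char, repl in conv.items():
        text = text.replace(char, repl)
    return text

def escape_single_pass(text):
    # Same idea as escape_latex: one regex pass, so inserted text is never re-scanned
    pattern = re.compile("|".join(re.escape(c) for c in conv))
    return pattern.sub(lambda m: conv[m.group()], text)

sample = "a\\b"  # three characters: a, backslash, b
print(escape_sequential(sample))   # a\textbackslash\{\}b  (broken)
print(escape_single_pass(sample))  # a\textbackslash{}b    (correct)
```

The longest-first key sort in `escape_latex` only matters if multi-character keys are ever added to `conv`; with single-character keys it has no effect.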

## Define LaTeX CV Template:

In [None]:
# --- Define Specific LaTeX CV Template (Provided by User) ---

latex_template = r"""
\documentclass[11pt,a4paper]{article}
\usepackage[margin=0.6in]{geometry}
\usepackage{enumitem}
\usepackage{titlesec}
\usepackage[hidelinks]{hyperref}
\usepackage{parskip}

% Define custom section formatting
\titleformat{\section}{\large\bfseries}{}{0em}{}[\titlerule]
\titleformat{\subsection}{\normalsize\bfseries}{}{0em}{}

\begin{document}

\begin{center}
    {\LARGE \textbf{Aida Afshar Nia}}\\[0.2cm]  % <<< --- Placeholder/Extractable
    Calgary, AB \quad $|$ \quad +1~587-594-7668 \quad $|$ \quad \href{mailto:aidaafsharnia@gmail.com}{aidaafsharnia@gmail.com} \quad $|$ \quad \href{http://www.linkedin.com/in/aida-afsharnia-507274121/}{LinkedIn} % <<< --- Placeholder/Extractable
\end{center}

\vspace{0.3cm}

\section*{Professional Summary}
\begin{itemize}[leftmargin=*]
%%SUMMARY_ITEMS_PLACEHOLDER%%
\end{itemize}

\section*{Key Skills}
\begin{itemize}[leftmargin=*]
%%SKILLS_ITEMS_PLACEHOLDER%%
\end{itemize}

% --- Rest of the template remains unchanged (Experience, Education etc.) ---
% --- We are only dynamically updating Summary and Skills items ---

\section*{Professional Experience}

\textbf{Software Engineer at \textit{Exro Technologies Inc.} } \hfill \textit{Sep 2022 -- Present}
\begin{itemize}[leftmargin=*]
    \item Developed and maintained a suite of microservices for a Linux-based, single-core system covering everything from U-Boot and kernel customizations to high-level application logic in Python and C/C++.
    \item Designed a custom IPC core module using ZeroMQ that listens for incoming messages and redirects them based on a custom protocol, supporting both JSON and binary payloads.
    \item Implemented local data conversion/management using Protocol Buffers following SunSpec standards.
    \item Implemented microservices using Python libraries (Flask, schedule, SQLite) and object-oriented C/C++ designs to ensure organized, maintainable code.
    \item Established CI/CD pipelines with Docker and Git, automating testing and enabling seamless release and package management.
    \item Integrated AWS services (S3, IoT) for secure data transfer to web dashboards, and implemented communication via MQTT and TCP.
    \item Developed key microservices including a 2030.5 client, OffGrid Controller, Modbus client/server, RS485 interface, AWS handler, remote SSH access, and AWS command/job management service.
    \item Built automated monitoring systems to track microservice PIDs, CPU/RAM usage, and system responsiveness, ensuring high reliability in real-time testing.
    \item Contributed to full-stack development by building web servers in C++ and Python and developing dynamic client interfaces using JavaScript, HTML, PHP, CSS, and Ninja.
    \item Diagnosed and resolved issues related to inter-service communication, database interactions, and real-time data handling using tools such as oscilloscopes, logic analyzers, and log monitoring.
    \item Conducted performance optimization by monitoring and profiling microservices to analyze system bottlenecks, applying optimizations to reduce latency and improve resource utilization.
    \item Addressed real-time data handling challenges by implementing asynchronous processing and message queues, and solved scalability issues through optimized database queries and load balancing strategies.
\end{itemize}

\textbf{Research Assistant at \textit{Power Electronics Laboratory}} \hfill \textit{Sep 2019 -- Aug 2022}
\begin{itemize}[leftmargin=*]
    \item Developed control algorithms and simulation models for renewable energy systems using MATLAB/Simulink and Psim.
    \item Implemented and validated software models to improve stability and fault tolerance in energy conversion systems.
    \item Collaborated on hardware-in-loop testing setups, contributing to the integration of real-time monitoring and control software.
\end{itemize}

\textbf{Teacher Assistant \& Laboratory Instructor at \textit{University of Alberta}} \hfill \textit{Sep 2019 -- May 2022}
\begin{itemize}[leftmargin=*]
    \item Instructed over 200 students in Embedded System Design, Computer Interfacing, and Microprocessor fundamentals.
    \item Developed course projects and labs emphasizing C programming, assembly language, and real-time system debugging on STM microcontrollers.
\end{itemize}

\textbf{Research Assistant \& Web Development at \textit{Cognitive and Robotics Lab.}} \hfill \textit{Summer 2016--2019}
\begin{itemize}[leftmargin=*]
    \item Designed control and simulation projects for robotics, including data analysis for a rehabilitation device and NAO robot control using Microsoft Kinect.
    \item Developed web-based interfaces and databases (MySQL, PHP) to facilitate remote data acquisition and system monitoring.
    \item Integrated Wi-Fi modules (ESP8266) for real-time sensor data collection and remote system control.
\end{itemize}

\section*{Education}
\textbf{M.Sc. in Energy Systems}\\
University of Alberta, Edmonton, Canada \hfill \textit{Sep 2019 -- Aug 2022}

\vspace{0.2cm}

\textbf{B.Sc. in Control Engineering}\\
University of Tehran, Tehran \hfill \textit{Sep 2013 -- Jan 2018}

\section*{Publications}
\begin{itemize}[leftmargin=*]
    \item \textit{Weighted Dynamic Aggregation Modeling of Induction Machine-Based Wind Farms}
    \item \textit{Weighted Dynamic Aggregation Modeling of DC Microgrid Converters with Droop Control}
    \item \textit{Droop-Based DC Microgrids Analysis and Control Design Using a Weighted Dynamic Aggregation Modeling Approach}
\end{itemize}

\section*{References}
Available upon request.

\end{document}
"""
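# The template above already contains both placeholders, so these replace() calls are no-ops here.
# They only take effect if the original Summary/Skills items are pasted back into the template,
# and they keep a record of the original items that the placeholders stand in for.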
latex_template = latex_template.replace(r"""    \item Software Engineer with over 5 years of experience designing and implementing complex software systems on Linux-based platforms or embedded devices.
 \item Proven expertise in developing microservices—from low-level bootloaders and kernel modifications to high-level Python and C/C++ services—ensuring robust system performance and scalability.
 \item Skilled in architecting distributed systems with custom IPC mechanisms (using ZeroMQ) that enforce strict message protocols to guarantee integrity and timely responses.
 \item Experienced in containerization (Docker), virtualization (QEMU), CI/CD automation via Git, and full-stack web development.
 \item Demonstrated ability to integrate cloud services (AWS S3, AWS IoT) and implement comprehensive monitoring, debugging, and release management processes.""", "%%SUMMARY_ITEMS_PLACEHOLDER%%", 1)

latex_template = latex_template.replace(r"""    \item \textbf{Software Development:} C, C++, Python; class-based design with signal/function inter-class communication.
 \item \textbf{Microservices \& IPC:} Flask, schedule, ZeroMQ, RESTful service design, SQLite.
 \item \textbf{Containerization \& CI/CD:} Docker, Git-based CI/CD pipelines for automated testing, package management, and release versioning.
 \item \textbf{Cloud Integration:} AWS S3, AWS IoT services; secure data transfer and remote management.
 \item \textbf{System Monitoring \& Debugging:} Automated PID, CPU, and RAM monitoring; hardware-in-loop testing and core dump analysis.
 \item \textbf{Web Development:} Server-side development (C++/Python) and front-end skills (JavaScript, HTML, PHP, CSS, Ninja).
 \item \textbf{Additional Technologies:} MQTT, TCP , UART, CAN communication; currently learning Rust.""", "%%SKILLS_ITEMS_PLACEHOLDER%%", 1)


print("Specific LaTeX CV template loaded with placeholders.")
# print(latex_template) # Optional: Print to verify placeholders
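The template keeps `%%SUMMARY_ITEMS_PLACEHOLDER%%` and `%%SKILLS_ITEMS_PLACEHOLDER%%` inside `itemize` environments. If you prefer to fill them deterministically instead of having the LLM regenerate the whole document, a minimal sketch could look like the following. The helper name `fill_itemize_placeholder` and its `escape` parameter are hypothetical; `escape_latex` could be passed as `escape`, but note it turns lines starting with `- ` into nested `itemize` blocks.

```python
# Hypothetical helper: turn plain-text bullet strings into \item lines and
# substitute them for a %%...%% placeholder in the LaTeX template.
def fill_itemize_placeholder(template, placeholder, items, escape=None):
    if placeholder not in template:
        raise ValueError(f"Placeholder {placeholder!r} not found in template")
    escape = escape or (lambda s: s)  # identity unless an escaping function is given
    item_lines = "\n".join(
        f"    \\item {escape(item.strip())}" for item in items if item.strip()  # skip blank items
    )
    # str.replace, not re.sub: the replacement contains backslashes (e.g. "\item"),
    # which re.sub would try to interpret as escape sequences.
    return template.replace(placeholder, item_lines, 1)

demo = "\\begin{itemize}\n%%SUMMARY_ITEMS_PLACEHOLDER%%\n\\end{itemize}"
print(fill_itemize_placeholder(demo, "%%SUMMARY_ITEMS_PLACEHOLDER%%",
                               ["First point", "Second point"]))
```

Raising on a missing placeholder makes a silently unchanged template (as with the no-op `replace()` calls above) fail loudly instead.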

## Prepare Content and Populate Template:

In [None]:
# Prepare Inputs:
import os
import subprocess
import re
from IPython.display import FileLink, display, HTML

# --- Ensure necessary variables exist from previous steps ---
if 'latex_template' not in locals() or not latex_template:
    raise ValueError("Original LaTeX template is missing.")
if 'base_cv_text' not in locals() or not base_cv_text:
    raise ValueError("Base CV text (from PDF/LLM) is missing.")
if 'extracted_info' not in locals() or not extracted_info:
    extracted_info = {} # Provide empty dict if extraction failed
    print("Warning: Job description info not fully extracted. LLM analysis may be limited.")
# We don't strictly need tailored_summary/skills_latex_items anymore, as the LLM will regenerate them

# Prepare job info strings for the prompt
job_skills_str = ", ".join(extracted_info.get('required_skills', [])) if extracted_info.get('required_skills') else "Not specified"
job_resp_str = "; ".join(extracted_info.get('key_responsibilities', [])) if extracted_info.get('key_responsibilities') else "Not specified"
job_title_str = extracted_info.get('job_title', 'Not specified')
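The next cell builds its prompt as an `rf"""..."""` string. The `r` prefix keeps LaTeX backslashes such as `\documentclass` literal, and the `f` prefix enables `{name}` interpolation, so literal LaTeX braces must be doubled. Also note that text after `#` inside a triple-quoted string is not a Python comment: it becomes part of the prompt sent to the model, and since `#` is special in LaTeX, anything the model copies from it can break compilation. A small demonstration:

```python
section = "Summary"
# {{ and }} become literal braces; {section} is interpolated; {{{section}}} gives {Summary}
prompt = rf"""
\section*{{{section}}}  # this trailing text is part of the string, not a Python comment
\end{{document}}
"""
print(prompt)
```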

In [None]:
# --- Revised Prompt for Generating the Full Updated LaTeX Document ---
# Adding explicit anti-hallucination constraint for experience and Adding Few-shots
# Adding optimization instructions (prioritization, slight redundancy check)

prompt_generate_full_latex_optimized = rf"""
You are an expert LaTeX CV assistant. Your task is to generate a complete, updated LaTeX CV document based on the provided original template structure, original CV text context, and target job description details, adhering strictly to the modification rules.

### TASK
Generate a complete LaTeX document string as output, performing the following optimizations and updates:
1.  **Tailor Content:** Update the Summary, Skills, and Experience sections to align with the Target Job Description.
2.  **Prioritize Relevance:** Reorder items within the Key Skills section so the skills most relevant to the job description come first, and highlight relevant skills that already appear in the Professional Experience bullet points (see rule 6 below).
3.  **Optimize Content (Cautiously):** While generating the updated text for Summary, Skills, and Experience, avoid significant redundancy. Do *not* remove entire sections from the original template structure.
4.  **Maintain Originality:** Ensure all skills and experience details included are grounded in the 'Original Full CV Text'. **Do not add skills, experiences, or methodologies (like Agile/Scrum) if they are not present in the original CV text, even if mentioned in the job description.**
5.  **Format Correctly:** Output strictly valid LaTeX code matching the original template's structure, starting *exactly* with `\documentclass` and ending *exactly* with `\end{{document}}`.

### INPUTS

1.  **Original LaTeX Template Structure:**
    ```latex
    {latex_template}
    ```

2.  **Original Full CV Text (for content reference):**
    ```text
    {base_cv_text[:10000]}
    ```

3.  **Target Job Description Information:**
    - Job Title: {job_title_str}
    - Required Skills: {job_skills_str}
    - Key Responsibilities: {job_resp_str}

### EXAMPLE (Illustrates Prioritization/Highlighting)

*   *Template Snippet:*
    ```latex
    \documentclass{{article}}
    \usepackage{{enumitem}}
    \begin{{document}}
    \section*{{Summary}}
    \begin{{itemize}}[leftmargin=*]
    \item Generic point 1.
    \item Generic point 2.
    \end{{itemize}}
    \section*{{Skills}}
    \begin{{itemize}}[leftmargin=*]
    \item \textbf{{Dev:}} Python, C++
    \item \textbf{{Tools:}} Git
    \end{{itemize}}
    \section*{{Experience}}
    \textbf{{Old Job}} \hfill 2020-2021
    \begin{{itemize}}[leftmargin=*]
    \item Developed web apps using Python.
    \item Used Git for version control.
    \end{{itemize}}
    \end{{document}}
    ```
*   *CV Context:* Software dev with Python, C++, Git experience. Built web apps. Experience includes "Used Git for version control." and "Developed web apps using Python."
*   *Job Info:* Seeking Python Developer. Requires Python, Git. Responsibilities: Build backend services.
*   *Expected Output Snippet (focus on the changes):*
    ```latex
    % ... preamble ...
    \section*{{Summary}}
    \begin{{itemize}}[leftmargin=*]
    \item Software Developer highly proficient in \textbf{{Python}}, with experience building web applications. % Tailored & relevant skill emphasized
    \item Skilled in version control using \textbf{{Git}} and developing robust applications. % Tailored & relevant skill mentioned
    \end{{itemize}}
    \section*{{Skills}}
    \begin{{itemize}}[leftmargin=*]
    \item \textbf{{Dev:}} Python, C++
    \item \textbf{{Tools:}} Git
    \end{{itemize}}
    \section*{{Experience}}
    \textbf{{Old Job}} \hfill 2020-2021
    \begin{{itemize}}[leftmargin=*]
    \item Developed web apps using \textbf{{Python}}. % Highlighted relevant skill IN EXISTING text
    \item Used \textbf{{Git}} for version control. % Highlighted relevant skill IN EXISTING text
    \end{{itemize}}
    % ... rest of document ...
    \end{{document}}
    ```
*(End of Example)*

### DETAILED INSTRUCTIONS (Apply to the main INPUTS provided above)

1.  **Base Structure:** Use the full 'Original LaTeX Template Structure'.
2.  **Output Format:** Respond with ONLY the LaTeX code, starting exactly with `\documentclass` and ending exactly with `\end{{document}}`. NO extra text, comments, or markdown.
3.  **Personal Information:** Retain as is from the template for now.
4.  **Professional Summary:** Replace content inside `itemize`. Generate 3-5 concise `\item` points tailored to the Job Description. Escape LaTeX characters.
5.  **Key Skills:** Replace content inside `itemize`. Include ONLY skills from the Original CV Text. **Reorder the `\item` list (or skills within categories) to place skills highly relevant to the Job Description first.** Maintain `\textbf{{Category:}}` format if applicable. Do NOT add skills only from the job description. Escape LaTeX characters.
6.  **Professional Experience:** Replicate the structure and content from the Original CV Text. Do NOT add new bullet points or information not present originally. For each original `\item`:
    *   Compare its exact text to the Job Description.
    *   If words/phrases *already present* directly match a required skill/responsibility, enclose only those existing words/phrases in `\textbf{{highlighted text}}`. Apply highlighting judiciously.
    *   Ensure proper LaTeX escaping for the entire item.
7.  **Other Sections:** Retain Education, Publications, References based on the original structure/content, ensuring correct LaTeX format.
8.  **LaTeX Validity:** Ensure correct syntax (matched braces `{{}}`, backslashes `\`, environments).

### FINAL OUTPUT (Generate the complete, optimized, and constrained LaTeX document now)
""" # Prompt definition ends here


# --- Call the LLM to generate the full LaTeX document ---
print("Requesting LLM to generate optimized LaTeX document (Few-Shot, Constrained, Prioritized)...")
generated_latex_code = generate_text_with_gemini(prompt_generate_full_latex_optimized, is_json_output=False)

# --- More Robust Post-Processing/Validation ---
validated_latex_code = None
raw_latex_output_filename = "llm_raw_output_optimized.tex" # Optional raw output file

if generated_latex_code:
    print(f"LLM returned output (length {len(generated_latex_code)}). Processing...")
    # (Optional: Save raw output for debugging)
    # try:
    #     with open(raw_latex_output_filename, "w", encoding="utf-8") as f:
    #         f.write(generated_latex_code)
    #     print(f"Saved raw LLM output to {raw_latex_output_filename}")
    # except Exception as e:
    #     print(f"Warning: Could not save raw LLM output: {e}")

    processed_code = generated_latex_code.strip()

    # Remove potential markdown fences
    if processed_code.startswith("```latex"):
        processed_code = processed_code[len("```latex"):].strip()  # "```latex" is 8 characters
    elif processed_code.startswith("```"):
        processed_code = processed_code[3:].strip()
    if processed_code.endswith("```"):
        processed_code = processed_code[:-3].strip()

    # Find the start and end markers
    start_marker = r"\documentclass"
    end_marker = r"\end{document}"
    start_index = processed_code.find(start_marker)
    end_index = processed_code.rfind(end_marker) # Use rfind for the *last* end marker

    if start_index != -1 and end_index != -1 and start_index < end_index:
        # Extract the content *between* the markers
        extracted_code = processed_code[start_index : end_index + len(end_marker)]
        print("Successfully extracted content between \\documentclass and \\end{document}.")

        # Check for and warn about unexpected text, but don't fail validation because of it
        leading_text = processed_code[:start_index].strip()
        trailing_text = processed_code[end_index + len(end_marker):].strip()

        if leading_text:
            print(f"Warning: Found unexpected text before \\documentclass: '{leading_text}' (Ignoring)")
        if trailing_text:
            print(f"Warning: Found unexpected text after \\end{{document}}: '{trailing_text}' (Ignoring)") # Escaped }}

        # Assign the *extracted* code as the validated code
        validated_latex_code = extracted_code

        # Note: do not blanket-replace characters such as '&', '%' or '_' here. This is a
        # complete LaTeX document, so a global replace would also corrupt '%' comments and
        # characters the LLM already escaped (e.g. '\&' would become '\\&').
        # Escape individual generated fragments instead (see escape_latex).

    else:
        print(f"Error: Could not reliably find '{start_marker}' and '{end_marker}' markers in the correct order in the processed output.")
        print("--- Processed Output Start (first 200 chars) ---")
        print(processed_code[:200])
        print("--- Processed Output End (last 200 chars) ---")
        print(processed_code[-200:])
        print("--- End Processed Output ---")
        validated_latex_code = None # Ensure it's None if markers not found

else:
    print("Error: LLM failed to generate any LaTeX code response.")
    validated_latex_code = None


# --- Write to .tex file ---
tex_filename = "updated_cv_llm_generated_fewshot.tex"
if validated_latex_code:
    try:
        # Ensure the validated_latex_code starts correctly *now*
        if not validated_latex_code.startswith(r"\documentclass"):
            print("CRITICAL WARNING: Final code does not start with \\documentclass after processing! Check logic.")
            # Decide if you want to proceed or raise an error here

        with open(tex_filename, "w", encoding="utf-8") as f:
            f.write(validated_latex_code)
        print(f"Successfully wrote validated LaTeX code to {tex_filename}")
    except Exception as e:
        print(f"Error writing .tex file: {e}")
        tex_filename = None
else:
    print("Skipping .tex file writing due to failed generation or validation.")
    tex_filename = None
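Finding `\documentclass` and `\end{document}` does not guarantee the LLM output will compile; a common failure is an `itemize` that is opened but never closed. A minimal pre-compilation check, assuming a hypothetical helper `check_environments_balanced` (not part of the notebook), could be run on `validated_latex_code` before writing the file. It pairs `\begin{...}`/`\end{...}` with a stack and, as a simplification, does not skip `%` comments or verbatim blocks.

```python
import re

def check_environments_balanced(latex):
    """Return a list of \\begin/\\end pairing problems (an empty list means balanced)."""
    problems = []
    stack = []  # names of currently open environments
    for match in re.finditer(r"\\(begin|end)\{([^}]*)\}", latex):
        kind, env = match.groups()
        if kind == "begin":
            stack.append(env)
        elif not stack:
            problems.append(f"\\end{{{env}}} without matching \\begin")
        elif stack[-1] != env:
            problems.append(f"\\end{{{env}}} closes \\begin{{{stack[-1]}}}")
            stack.pop()
        else:
            stack.pop()
    problems.extend(f"\\begin{{{env}}} never closed" for env in stack)
    return problems

# Example usage:
# issues = check_environments_balanced(validated_latex_code)
# if issues: print("LaTeX structure warnings:", issues)
```

A non-empty result is a strong hint that `pdflatex` will fail, and the messages point at the environment to inspect in the raw output.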

## Perform Separate Skill Gap Analysis:

In [None]:
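# --- Prompt for Skill Gap Analysis (Human Readable) ---
# Extends the earlier "Document Understanding" prompt: adds an Experience snippet for context
# and asks for plain-text suggestions.

# Up to 1500 chars from the Experience heading onward. If the heading is not found, fall back
# to the start of the CV (find() returns -1, which would otherwise slice only the last character).
exp_pos = base_cv_text.find('Professional Experience')
experience_snippet = base_cv_text[max(exp_pos, 0):][:1500]

prompt_skill_gap_analysis = f"""
Analyze the alignment between the candidate's original CV skills/experience and the required skills for the job.

Candidate's Original CV Text (relevant parts):
---
Skills Section:
{base_skills_section}

Experience Section Snippet (for context):
{experience_snippet}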
---

Required Skills from Job Description:
---
{job_skills_str}
---

Task:
1.  Identify which **Required Skills** seem to be **present** and demonstrated in the Candidate's CV text (Skills section or Experience).
2.  Identify key **Required Skills** that appear to be **missing** or not explicitly mentioned in the Candidate's CV text.
3.  Provide suggestions for the candidate regarding these missing skills (e.g., consider learning, highlight adjacent skills, or rephrase existing experience if applicable but not obvious).

Present the analysis clearly in a human-readable format. Do not use LaTeX formatting here.
Example Output Structure:
Present Skills:
  - Skill A (Mentioned in Skills Section/Experience)
  - Skill B (Demonstrated in Project X)
Missing Skills:
  - Skill C
  - Skill D
Suggestions:
  - Consider taking a course or project focused on Skill C as it appears important for this role.
  - While Skill D is not explicitly listed, your experience with Tool Y might be partially relevant; consider mentioning Tool Y if applying.

Analysis:
"""

print("\nRequesting Skill Gap Analysis for human-readable suggestions...")
skill_gap_analysis_text = generate_text_with_gemini(prompt_skill_gap_analysis, is_json_output=False)

print("\n--- Skill Gap Analysis and Suggestions ---")
if skill_gap_analysis_text:
    # Simple cleaning of potential leading/trailing whitespace
    skill_gap_analysis_text = skill_gap_analysis_text.strip()
    print(skill_gap_analysis_text)
else:
    print("Failed to generate skill gap analysis.")
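If the gap analysis is needed programmatically (for example, to show missing skills in a table), the plain-text reply can be split back into sections. The sketch below assumes the model follows the example structure in the prompt (a heading line ending in `:` followed by `-` or `*` bullets); the helper name `split_analysis_sections` is hypothetical, and replies that deviate from that layout will simply yield empty lists.

```python
import re

BULLET = re.compile(r"^\s*[-*•]\s+")  # '-', '*' or '•' followed by whitespace

def split_analysis_sections(text, headings=("Present Skills", "Missing Skills", "Suggestions")):
    """Split the model's plain-text analysis into {heading: [bullet, ...]}."""
    sections = {h: [] for h in headings}
    current = None
    for line in text.splitlines():
        # Tolerate markdown decorations such as **Missing Skills:** or ## Suggestions:
        stripped = line.strip().strip("*#").strip()
        heading = stripped.rstrip(":").strip()
        if stripped.endswith(":") and heading in sections:
            current = heading
            continue
        if current and BULLET.match(line):
            sections[current].append(BULLET.sub("", line).strip())
    return sections

# Example usage:
# gaps = split_analysis_sections(skill_gap_analysis_text or "")
# print(gaps["Missing Skills"])
```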

## Compile LaTeX to PDF:

In [None]:
def compile_latex(tex_file_path):
    """
    Compiles the given .tex file to PDF using pdflatex.

    Args:
        tex_file_path (str): The path to the .tex file.

    Returns:
        tuple: (success: bool, log_content: str | None, error_message: str | None)
               - success: True if compilation seems successful (exit code 0), False otherwise.
               - log_content: The content of the log file if generated.
               - error_message: A specific error message if subprocess fails badly.
    """
    if not tex_file_path or not os.path.exists(tex_file_path):
        print(f"Error: LaTeX file not found at '{tex_file_path}'")
        return False, None, f"LaTeX file not found: {tex_file_path}"

    base_name = os.path.splitext(os.path.basename(tex_file_path))[0]  # Strip only the final extension
    dir_name = os.path.dirname(tex_file_path) or '.'
    pdf_file_path = os.path.join(dir_name, base_name + ".pdf")
    log_file_path = os.path.join(dir_name, base_name + ".log")
    compilation_success = False
    log_content = None
    error_message = None
    last_return_code = -99 # Initialize with a distinct value

    # Clean previous log before attempting compilation
    if os.path.exists(log_file_path):
        try:
            os.remove(log_file_path)
        except OSError as e:
            print(f"Warning: Could not remove previous log file '{log_file_path}': {e}")

    print(f"\nAttempting to compile {os.path.basename(tex_file_path)}...")
    # Run pdflatex twice for references etc. Use -halt-on-error for cleaner logs on failure.
    for i in range(2):
        print(f"Running pdflatex compilation (Pass {i+1}/2)...")
        try:
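            process = subprocess.run(
                # cwd is dir_name, so pass only the file name (a relative path would be resolved twice)
                ['pdflatex', '-interaction=nonstopmode', '-halt-on-error', os.path.basename(tex_file_path)],
                capture_output=True, text=True, check=False, timeout=180,  # Generous timeout
                cwd=dir_name
            )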
            last_return_code = process.returncode
            if process.returncode != 0:
                print(f"Compilation failed on pass {i+1} with exit code {process.returncode}.")
                error_message = f"pdflatex failed on pass {i+1} (exit code {process.returncode})"
                break # Stop if a pass fails
        except subprocess.TimeoutExpired:
             print(f"Error: pdflatex timed out on pass {i+1}.")
             error_message = "pdflatex timed out"
             last_return_code = -2 # Indicate timeout
             break
        except FileNotFoundError:
             print("Error: 'pdflatex' command not found. Is LaTeX installed in the environment?")
             error_message = "'pdflatex' command not found. Ensure TeX Live is installed."
             last_return_code = -4 # Indicate command not found
             break
        except Exception as e:
             print(f"Error running pdflatex subprocess on pass {i+1}: {e}")
             error_message = f"Subprocess error: {e}"
             last_return_code = -3 # Indicate other subprocess error
             break

    # Read log file regardless of exit code, as it might contain useful info
    try:
        if os.path.exists(log_file_path):
            with open(log_file_path, 'r', encoding='utf-8', errors='ignore') as log_file:
                log_content = log_file.read()
    except Exception as e:
        print(f"Warning: Could not read log file '{log_file_path}': {e}")
        if not error_message:
            error_message = f"Failed to read log file: {e}"


    # Check final results
    # Success requires exit code 0 AND the PDF file to exist
    if last_return_code == 0 and os.path.exists(pdf_file_path):
        print(f"Successfully compiled {os.path.basename(tex_file_path)} to {os.path.basename(pdf_file_path)}")
        compilation_success = True
    else:
        print(f"Error during LaTeX compilation (Last exit code {last_return_code}).")
        if log_content:
            error_lines = [line for line in log_content.splitlines() if line.startswith('! ')]
            if error_lines:
                print("\nPotential Error lines found in log:")
                for line in error_lines[:10]:  # Limit output
                    print(line)
                display(HTML(f"<p style='color:red;'>LaTeX compilation failed. Check '{os.path.basename(log_file_path)}' or errors above.</p>"))
            else:
                print("\nNo lines starting with '! ' found in log. Check full log for errors or warnings.")
                display(HTML(f"<p style='color:orange;'>LaTeX compilation failed (Exit code: {last_return_code}). PDF may not exist or be complete. Check '{os.path.basename(log_file_path)}'.</p>"))
        else:
            display(HTML(f"<p style='color:red;'>LaTeX compilation failed. Log file could not be read. Error: {error_message}</p>"))

    return compilation_success, log_content, error_message
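`compile_latex` surfaces log lines that start with `! `, but TeX usually also records the offending source line as `l.<number>` a few lines after each error message. A small parser, assuming this standard pdflatex log layout, can pair the two so each error can be traced back to a line in the generated `.tex` file. `extract_latex_errors` is a hypothetical helper for illustration, not part of the notebook.

```python
import re
from typing import List, Optional, Tuple

# TeX records the source line of an error as "l.<number>" followed by that line's context
LINE_REF = re.compile(r"^l\.(\d+)\b")

def extract_latex_errors(log_text: str, max_errors: int = 10) -> List[Tuple[str, Optional[int]]]:
    """Return (message, .tex line number or None) pairs for each '! ' error in a pdflatex log."""
    errors: List[Tuple[str, Optional[int]]] = []
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith("! "):
            continue
        line_no: Optional[int] = None
        # Scan a few following lines for the "l.<n>" marker, stopping at the next error
        for follow in lines[i + 1 : i + 10]:
            if follow.startswith("! "):
                break
            match = LINE_REF.match(follow)
            if match:
                line_no = int(match.group(1))
                break
        errors.append((line[2:].strip(), line_no))
        if len(errors) >= max_errors:
            break
    return errors

sample_log = """This is pdfTeX, Version 3.14
! Undefined control sequence.
l.42 \\textbff
              {Skills}
! Missing $ inserted.
<inserted text>
                $
l.57 Python & SQL_
"""
print(extract_latex_errors(sample_log))
# [('Undefined control sequence.', 42), ('Missing $ inserted.', 57)]
```

Inside `compile_latex`, this could replace the `error_lines` list comprehension so the printed messages read "line 42: Undefined control sequence." instead of the bare `!` lines.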

In [None]:
# --- Compile LaTeX to PDF ---
# `tex_filename` is set by the earlier LaTeX-generation cell. compile_latex (defined above)
# runs pdflatex twice, handles timeouts and a missing pdflatex, and reports log errors.
compilation_success = False
pdf_filename = os.path.splitext(tex_filename)[0] + ".pdf" if tex_filename else None

if tex_filename and os.path.exists(tex_filename):
    compilation_success, latex_log, latex_error = compile_latex(tex_filename)
    if compilation_success:
        display(HTML(f"<p style='color:green;'>Successfully generated {pdf_filename}.</p>"))
else:
    # The expected .tex file was not found or not generated
    print(f"\nSkipping compilation because the expected .tex file '{tex_filename}' was not found or not generated successfully.")
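The compilation code only learns that `pdflatex` is missing when `subprocess.run` raises `FileNotFoundError` partway through. A pre-flight check with the standard library's `shutil.which` can report this before any compilation is attempted. This is a minimal sketch; `find_latex_engine` is a hypothetical helper name, and it assumes `pdflatex` is expected on the `PATH`.

```python
import shutil
from typing import Optional, Sequence

def find_latex_engine(candidates: Sequence[str] = ("pdflatex",)) -> Optional[str]:
    """Return the full path of the first LaTeX engine found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)  # None when the executable is not on PATH
        if path:
            return path
    return None

engine_path = find_latex_engine()
if engine_path is None:
    print("pdflatex not found on PATH; install TeX Live before compiling the CV.")
else:
    print(f"pdflatex found at {engine_path}")
```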


## Provide Download Link:


In [None]:
# --- Display Skill Suggestions and Provide Download Link ---

print("\n" + "="*30 + " Suggestions Based on Skill Gap Analysis " + "="*30)
if skill_gap_analysis_text:
    print(skill_gap_analysis_text)
else:
    print("Skill gap analysis was not generated.")
print("="*80)


if compilation_success and os.path.exists(pdf_filename):
    print("\nDownload the updated CV PDF (LLM Generated):")
    display(FileLink(pdf_filename))
else:
    print(f"\nCould not generate {pdf_filename}.")
    if tex_filename and os.path.exists(tex_filename):
        print("You can download the generated .tex file for manual debugging:")
        display(FileLink(tex_filename))

# Output Evaluation:

In [None]:
## Proposed CV Updates & Analysis
# --- Step 3a: Evaluate Generated CV Content Against Job Description ---

evaluation_rating = None
evaluation_justification = None

# Only proceed if we have validated LaTeX code and extracted job info
if validated_latex_code and extracted_info:
    print("\n" + "="*30 + " Evaluating Generated CV Fit " + "="*30)

    # Prepare concise job info for the evaluation prompt
    eval_job_skills = extracted_info.get('required_skills', [])
    eval_job_resp = extracted_info.get('key_responsibilities', [])
    eval_job_title = extracted_info.get('job_title', 'N/A')

    if not eval_job_skills and not eval_job_resp:
        print("Warning: Insufficient job description details extracted for a meaningful evaluation.")
    else:
        # Limit the LaTeX code sent for evaluation to avoid excessive length, focus on key sections
        # Find start of summary and end of experience (approximate - adjust regex if needed)
        summary_start_match = re.search(r"\\section\*\{Professional Summary\}", validated_latex_code, re.IGNORECASE)
        experience_end_match = re.search(r"\\section\*\{Education\}", validated_latex_code, re.IGNORECASE)

        cv_content_for_eval = validated_latex_code  # Default to full code
        # Only slice when Education comes after the summary; otherwise the slice would be empty
        if (summary_start_match and experience_end_match
                and experience_end_match.start() > summary_start_match.start()):
            cv_content_for_eval = validated_latex_code[summary_start_match.start():experience_end_match.start()]
            print("Evaluating content from Summary through Experience sections...")
        elif summary_start_match:
            cv_content_for_eval = validated_latex_code[summary_start_match.start():]  # Eval from summary onwards
            print("Evaluating content from Summary section onwards...")
        else:
            print("Warning: Could not isolate key sections. Evaluating first 10000 chars of generated LaTeX.")
            cv_content_for_eval = validated_latex_code[:10000]


        prompt_evaluate_cv = rf"""
        You are an expert Technical Recruiter evaluating a candidate's CV against a specific job description.

        **Inputs:**

        1.  **Candidate CV Content (Extracted from generated LaTeX - focus on text content, ignore syntax):**
            ```latex
            {cv_content_for_eval}
            ```

        2.  **Target Job Description:**
            - Title: {eval_job_title}
            - Required Skills: {', '.join(eval_job_skills) if eval_job_skills else 'Not specified'}
            - Key Responsibilities: {'; '.join(eval_job_resp) if eval_job_resp else 'Not specified'}

        **Evaluation Task:**

        Assess how well the provided Candidate CV Content aligns with the Target Job Description. Consider the following:
        - **Summary Relevance:** Does the summary effectively highlight experience relevant to the job title and responsibilities?
        - **Skill Alignment:** Does the CV showcase the required skills prominently? Are the listed skills relevant?
        - **Experience Match:** Do the experience descriptions (especially any highlighted parts) demonstrate capabilities relevant to the job's key responsibilities and required skills?
        - **Overall Fit:** Based *only* on the provided texts, how strong is the match between the CV and the job?

        **Output Format:**

        Respond ONLY with a valid JSON object containing two keys:
        1.  `rating`: An integer score from 1 to 5, representing the overall fit.
        2.  `justification`: A brief text explanation (2-4 sentences) justifying the rating, mentioning specific strengths or weaknesses regarding the job fit.

        **Rating Scale:**
        - 1: Poor Fit (Very few relevant skills/experience)
        - 2: Weak Fit (Some overlap, but major gaps)
        - 3: Moderate Fit (Reasonable match for several requirements, some gaps remain)
        - 4: Strong Fit (Good alignment with most key requirements)
        - 5: Excellent Fit (Very strong alignment, clearly demonstrates suitability for key skills/responsibilities)

        **JSON Output:**
        """

        print("Requesting LLM to evaluate CV fit...")
        evaluation_response_str = generate_text_with_gemini(prompt_evaluate_cv, is_json_output=True)

        if evaluation_response_str:
            try:
                # Strip optional markdown code fences (``` or ```json) wrapping the JSON;
                # slicing [7:-3] would corrupt replies that lack a closing fence
                cleaned_response = evaluation_response_str.strip()
                fence_match = re.match(r"^```(?:json)?\s*(.*?)\s*```$", cleaned_response, re.DOTALL)
                if fence_match:
                    cleaned_response = fence_match.group(1)
                evaluation_response_str = cleaned_response

                evaluation_result = json.loads(evaluation_response_str)

                rating = evaluation_result.get('rating')
                justification = evaluation_result.get('justification')

                # Basic validation of rating
                if isinstance(rating, int) and 1 <= rating <= 5:
                    evaluation_rating = rating
                    print(f"Evaluation Rating: {evaluation_rating}/5")
                else:
                    print(f"Warning: Received invalid rating value: {rating}")

                if isinstance(justification, str) and justification.strip():
                    evaluation_justification = justification.strip()
                    print(f"Justification: {evaluation_justification}")
                else:
                    print("Warning: Received invalid or empty justification.")

            except json.JSONDecodeError as e:
                print(f"Error parsing JSON response from evaluation LLM: {e}")
                print("Raw Response Snippet:", evaluation_response_str[:500])
            except Exception as e:
                print(f"An unexpected error occurred processing the evaluation response: {e}")
                print("Raw Response Snippet:", evaluation_response_str[:500])
        else:
            print("LLM evaluation call failed or returned no response.")

    print("="*60)

else:
    print("\nSkipping CV Evaluation step because generated LaTeX code or job info is missing.")

print("\n" + "="*30 + " Final Results & Downloads " + "="*30)

# Display Evaluation Results
print("\n--- Generated CV Evaluation ---")
if evaluation_rating is not None:
    print(f"Rating vs. Job Description: {evaluation_rating}/5")
    print(f"Justification: {evaluation_justification if evaluation_justification else 'N/A'}")
else:
    print("Evaluation could not be completed.")

# Display Skill Suggestions
print("\n--- Skill Gap Analysis & Suggestions ---")
if 'skill_gap_analysis_text' in locals() and skill_gap_analysis_text:
    print(skill_gap_analysis_text)
else:
    print("Skill gap analysis was not generated or failed.")
print("="*80)

# Display Download Links
if compilation_success and os.path.exists(pdf_filename):
    print(f"\nDownload the updated CV PDF:")
    display(FileLink(pdf_filename))
else:
    print(f"\nCould not generate {pdf_filename}.")
    if tex_filename and os.path.exists(tex_filename):
        print(f"You can download the generated .tex file for manual debugging:")
        display(FileLink(tex_filename))
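The evaluation prompt tells the model to ignore LaTeX syntax but still sends the raw markup, so part of the context (and of the 10,000-character fallback) is spent on commands rather than CV content. A rough regex-based cleanup could be applied to `cv_content_for_eval` before the prompt is built. This is a heuristic sketch, not a LaTeX parser: the function name `latex_to_plain_text` is hypothetical, and commands with several arguments (such as `\href{url}{text}`) are flattened crudely.

```python
import re

def latex_to_plain_text(latex: str) -> str:
    """Roughly strip LaTeX markup, keeping the readable CV text (heuristic, not a parser)."""
    text = re.sub(r"(?<!\\)%.*", "", latex)                # drop comments (unescaped %)
    text = text.replace("\\\\", "\n")                      # \\ line breaks -> newlines
    text = re.sub(r"\\(?:sub)*section\*?\{([^{}]*)\}", r"\n\1\n", text)  # headings on own line
    text = re.sub(r"\\item\b\s*", "\n- ", text)            # list items -> dash bullets
    text = re.sub(r"\\(?:begin|end)\{[^{}]*\}", "", text)  # drop environment markers
    # Unwrap \command[opt]{arg} -> arg, innermost first, until nothing changes
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\\[a-zA-Z]+\*?(?:\[[^\]]*\])?\{([^{}]*)\}", r"\1", text)
    text = re.sub(r"\\[a-zA-Z]+\*?", "", text)             # remaining argument-less commands
    text = re.sub(r"\\([%&$#_])", r"\1", text)             # unescape special characters
    text = text.replace("{", "").replace("}", "")
    text = re.sub(r"[ \t]+", " ", text)                    # collapse runs of spaces
    text = re.sub(r"\s*\n\s*", "\n", text)                 # trim around breaks, drop blank lines
    return text.strip()

sample_cv = r"""
\section*{Professional Summary}
Data analyst with 5\% growth focus. % internal note
\begin{itemize}
  \item Built \textbf{\textit{SQL}} dashboards
  \item Automated reports in Python \& R
\end{itemize}
"""
print(latex_to_plain_text(sample_cv))
```

In the evaluation cell it would be called as `cv_content_for_eval = latex_to_plain_text(cv_content_for_eval)` just before `prompt_evaluate_cv` is built; the prompt's `latex` fence label would then no longer apply.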

# Limitations and Future Work:

## Discussion

**GenAI Capabilities Demonstrated:**

1.  **Structured Output (JSON):** Used the Gemini API's JSON output mode, backed by a prompt instruction as a fallback, to return the extracted job details as JSON for easier programmatic use (Phase 4, Step 9).
2.  **Few-Shot Prompting:** Included worked examples in the prompt to steer the model toward a relevant, well-tailored CV summary, as an improvement over zero-shot prompting (Phase 4, Step 10).
3.  **Document Understanding:** Had the model read, comprehend, and compare two distinct text sources (the CV and the job description analysis) to perform skill matching and gap analysis (Phase 4, Step 11).
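The skill matching in the Document Understanding step relies entirely on the model's judgment. A deterministic keyword check could serve as a cheap cross-check: it splits the job's required skills (for instance `extracted_info['required_skills']`) into those literally present in the CV text and those absent. `skill_coverage` is a hypothetical helper sketched for illustration; exact phrase matching misses synonyms and paraphrases (e.g. "PostgreSQL" for "SQL"), which is where the LLM-based analysis adds value.

```python
import re
from typing import Dict, List, Union

def skill_coverage(required_skills: List[str], cv_text: str) -> Dict[str, Union[List[str], float]]:
    """Split required skills into found/missing using case-insensitive whole-phrase matching."""
    found: List[str] = []
    missing: List[str] = []
    for skill in required_skills:
        # re.escape keeps "C++" or "Node.js" literal; the lookarounds stop "R" matching
        # inside "React" and "C" matching the start of "C++" or "C#"
        pattern = r"(?<![\w+#.])" + re.escape(skill.strip()) + r"(?![\w+#])"
        (found if re.search(pattern, cv_text, re.IGNORECASE) else missing).append(skill)
    coverage = len(found) / len(required_skills) if required_skills else 0.0
    return {"found": found, "missing": missing, "coverage": round(coverage, 2)}

cv_text = "Built dashboards in Power BI and SQL; automated ETL with Python. Some React work."
print(skill_coverage(["SQL", "Python", "R", "Tableau", "power bi"], cv_text))
# {'found': ['SQL', 'Python', 'power bi'], 'missing': ['R', 'Tableau'], 'coverage': 0.6}
```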

**Limitations:**
*   **Web Scraping Fragility:** Scraping remains a major dependency. LinkedIn, Greenhouse, and similar sites often change their layouts or render content with JavaScript, so scraping is unreliable without more advanced tools such as Selenium.
*   **API Dependency & Cost:** Relies on Google API access and may incur costs. Subject to rate limits or outages.
*   **Model Interpretation:** The model's understanding of "skill match" or "relevance" might differ from human interpretation. Output requires review.
*   **Context Window Limits:** While Gemini models have large context windows, extremely long CVs or job posts might still exceed limits (though less likely than with smaller models). We truncated the JD input as a precaution.
*   **Safety Filters:** Overly strict safety settings might block legitimate responses, while lax settings risk inappropriate content (though less likely for this use case).
*   **JSON Mode Reliability:** If `response_mime_type` isn't perfectly supported or the model deviates, JSON parsing can fail. The fallback instruction helps but isn't foolproof.
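One way to reduce this fragility is to parse the reply defensively instead of assuming a clean JSON body: locate the outermost `{...}` span (which skips code fences and surrounding prose), then validate each field. The sketch assumes the `rating`/`justification` schema used in the Output Evaluation prompt; `parse_evaluation_json` is a hypothetical helper, and its greedy brace match will fail if the model adds extra braces after the object.

```python
import json
import re
from typing import Optional, Tuple

def parse_evaluation_json(raw: Optional[str]) -> Tuple[Optional[int], Optional[str]]:
    """Return (rating, justification) from an LLM reply; None marks a missing or invalid field."""
    if not raw:
        return None, None
    # Greedy match from the first "{" to the last "}" skips ```json fences and surrounding prose
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        return None, None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None, None

    rating = data.get("rating")
    if isinstance(rating, bool):  # bool is a subclass of int, so reject it explicitly
        rating = None
    elif isinstance(rating, (int, float, str)):
        try:
            number = float(rating)  # accepts 4, 4.0 and "4"
        except ValueError:
            number = float("nan")
        rating = int(number) if number.is_integer() and 1 <= number <= 5 else None
    else:
        rating = None

    justification = data.get("justification")
    if isinstance(justification, str) and justification.strip():
        justification = justification.strip()
    else:
        justification = None
    return rating, justification

reply = '```json\n{"rating": "4", "justification": " Strong SQL match. "}\n```'
print(parse_evaluation_json(reply))  # (4, 'Strong SQL match.')
```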

**Creativity & Problem Solving:**
*   Combined web scraping, structured data extraction, and multiple prompting techniques (few-shot, instruction-based) using the Gemini API.
*   Implemented specific GenAI capabilities (Structured Output, Few-Shot, Document Understanding) relevant to the course objectives.
*   Included basic error handling for API calls and JSON parsing.

**Future Work:**
*   **Implement RAG:** Use embeddings (e.g., Google's embedding models via API or local Sentence Transformers) and vector search (FAISS, ChromaDB) to find the *most* relevant CV sections for *each* job requirement before generation – this would be a major enhancement demonstrating Embeddings, Vector Search, and RAG.
*   **Fine-tuning:** Fine-tune a smaller model specifically on CV/job description data (more advanced).
*   **Refine Scraping:** Use `Selenium` (harder in Kaggle) or specific API scraping services for job boards if available. Add more robust parsing logic for common job sites.
*   **Bullet Point Generation:** Add a step to rewrite specific experience bullet points using keywords/responsibilities from the job description.
*   **User Interface:** Build a simple UI using Gradio/Streamlit (might require running outside Kaggle or using specific integrations).
*   **Evaluation Metric:** Extend the LLM-based 1-5 fit rating from the Output Evaluation step, for example by rating the tailored summary separately or by comparing the model's ratings with human reviewers' judgments.
*   **Agent Framework:** Restructure the workflow using an agent framework (such as LangChain with Gemini integration) in which the agent decides the sequence of actions (scrape, extract, compare, rewrite).
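The retrieval half of the RAG idea can be sketched without external services: embed each job requirement and each CV section, then rank sections by cosine similarity. In this sketch a bag-of-words term-frequency vector stands in for a real embedding model (such as Google's embedding models or Sentence Transformers), and a linear scan stands in for a vector index such as FAISS or ChromaDB. The function names and sample CV sections are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

def embed(text: str) -> Dict[str, float]:
    """Toy stand-in for an embedding model: a bag-of-words term-frequency vector."""
    counts = Counter(re.findall(r"[a-z0-9+#]+", text.lower()))
    return {term: float(n) for term, n in counts.items()}

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_cv_sections(requirement: str, cv_sections: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Retrieval step of RAG: rank CV sections by similarity to one job requirement."""
    query = embed(requirement)
    scored = [(cosine(query, embed(section)), section) for section in cv_sections]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

cv_sections = [
    "Built SQL dashboards in Power BI for sales reporting",
    "Led a team of five volunteers at a community event",
    "Automated data cleaning pipelines in Python and SQL",
]
for requirement in ["Strong SQL and Python skills", "Experience with dashboards"]:
    best_score, best_section = top_cv_sections(requirement, cv_sections, k=1)[0]
    print(f"{requirement!r} -> {best_section!r} (score {best_score:.2f})")
```

The top-ranked sections for each requirement would then be placed in the generation prompt instead of the full CV, which keeps the prompt focused and shorter.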