# Initial Job Listing Generation Implementation

## Framework Overview

This notebook implements the **baseline generation prompt** for LLM-based job listing creation, translating empirically derived metrics from literature and expert interviews into systematic prompt templates.

### Core Components
- **Data preprocessing** of structured company and job information
- **Baseline generation prompt** with contextual meta-prompting  
- **Few-shot learning** with company-specific examples
- **GPT-4o integration** for initial job listing generation

---

## 1. Prompt Template Architecture

### Language-Specific Implementation
The `JobListingPromptGenerator` class creates structured templates for Dutch and English job listings:

**Template Structure:**
- System role definition (recruitment specialist)
- Job details (title, location, employment type, hours, education, experience, salary)
- Responsibilities and employment conditions
- Writing guidelines (tone, jargon level, word count)
- Content rules (gender neutrality, cliché avoidance)
- Candidate profile integration
- Company information requirements
- Output format specification
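
Each element of the template structure above maps to one or more `{placeholder}` fields in the template string. As a minimal sketch (not part of the original notebook), the standard library's `string.Formatter` can list the fields a template expects, which helps when checking that the input data has a column for each one. The shortened `demo_template` and the helper name `template_fields` are illustrative assumptions.

```python
from string import Formatter
from typing import List

def template_fields(template: str) -> List[str]:
    """Return the placeholder names of a str.format-style template, in order."""
    seen: List[str] = []
    for _, field_name, _, _ in Formatter().parse(template):
        # field_name is None for segments that contain only literal text
        if field_name and field_name not in seen:
            seen.append(field_name)
    return seen

# Hypothetical, shortened template in the same style as the notebook's templates
demo_template = """[JOB DETAILS]
Job title: {job_title}
Location: {job_location}

[WRITING GUIDELINES]
- Tone of voice: {tone_of_voice}
- Word count: {min_words} - {max_words}
"""

fields = template_fields(demo_template)
print(fields)
```

Comparing this list against the columns of the input data would show which fields are missing before any prompt is generated.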

### Few-Shot Learning Integration
**Company-specific examples:** 10 verified job listings per recruitment company are loaded from Excel files and combined per language:
- Companies A, C, D (Dutch): `combined_rest_dutch`
- Companies B, E (English): `combined_rest_english`

**Implementation:** Examples guide model behavior toward company-specific style and structure using `FewShotPromptTemplate` from LangChain.
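
The assembly idea behind a few-shot prompt can be sketched in plain Python: a prefix, the numbered examples, and a suffix holding the actual task are joined into one template, which is then filled with the job data. This is an approximation for illustration, not LangChain's implementation, and the helper names `escape_braces` and `build_few_shot_template` are assumptions. The sketch also shows one detail worth keeping in mind: if the assembled text is formatted a second time, literal braces inside an example may be misread as placeholders unless they are escaped.

```python
from typing import List

def escape_braces(text: str) -> str:
    """Double literal braces so a later str.format call leaves them intact."""
    return text.replace("{", "{{").replace("}", "}}")

def build_few_shot_template(prefix: str, examples: List[str], suffix: str) -> str:
    """Join prefix, numbered examples, and suffix into one format-style template."""
    blocks = [
        f"=== EXAMPLE {i} ===\nComplete job listing:\n{escape_braces(listing)}\n\n---"
        for i, listing in enumerate(examples, 1)
    ]
    # The suffix is left unescaped because it carries the real placeholders
    return "\n\n".join([escape_braces(prefix), *blocks, suffix])

# Hypothetical example listings; the second one contains literal braces
examples = ["We are hiring a planner.", "Salary band {B2} applies."]
template = build_few_shot_template(
    prefix="You will now receive 2 examples.",
    examples=examples,
    suffix="=== END OF EXAMPLES ===\n\nJob title: {job_title}",
)
prompt = template.format(job_title="Data Analyst")
print(prompt)
```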



In [3]:
import os
import json
import time
from typing import Dict, List, Optional, Tuple

import numpy as np
import pandas as pd
from tqdm import tqdm
from openai import OpenAI

from langchain.prompts import PromptTemplate
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.schema import BaseOutputParser


## 2. Generation Pipeline

### Batch Processing Workflow
1. **Load structured data** from Excel files (`combined_dutch_with_html.xlsx`, `combined_english_with_html.xlsx`)
2. **Generate prompts** with company-specific examples using `generate_batch_prompts()`
3. **Execute generation** via OpenAI API with temperature=0.3
4. **Export results** to Excel format for evaluation

### Technical Configuration
- **Model:** GPT-4o for multilingual capabilities
- **Temperature:** 0.3 (balanced variation with structure)
- **Rate limiting:** 1-second delay between API calls
- **Error handling:** Graceful degradation with error logging
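
The rate limiting and error handling listed above can be sketched independently of the API. The example below is a hedged illustration, not the notebook's code: `run_with_delay` and `fake_call` are hypothetical names, the stand-in function replaces the real API call so the sketch runs without credentials, and the retry with exponential backoff is an added technique that the notebook itself does not use.

```python
import time
from typing import Callable, List, Optional, Tuple

def run_with_delay(
    prompts: List[str],
    call: Callable[[str], str],
    delay: float = 1.0,
    max_retries: int = 2,
) -> Tuple[List[Optional[str]], List[Optional[str]]]:
    """Call `call` once per prompt, pausing between prompts and recording failures."""
    responses: List[Optional[str]] = []
    errors: List[Optional[str]] = []
    for prompt in prompts:
        result: Optional[str] = None
        error: Optional[str] = None
        for attempt in range(max_retries + 1):
            try:
                result, error = call(prompt), None
                break
            except Exception as exc:  # keep the batch running on any failure
                error = str(exc)
                if attempt < max_retries:
                    time.sleep(delay * (2 ** attempt))  # simple exponential backoff
        responses.append(result)
        errors.append(error)
        time.sleep(delay)  # fixed pause between prompts
    return responses, errors

# Stand-in for the API call (hypothetical): fails on empty prompts
def fake_call(prompt: str) -> str:
    if not prompt:
        raise ValueError("empty prompt")
    return prompt.upper()

responses, errors = run_with_delay(["hello", ""], fake_call, delay=0.0)
print(responses, errors)
```

A failed prompt yields `None` in the responses and its message in the errors, so one bad row does not stop the rest of the batch.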


In [16]:
df_dutch = pd.read_excel("combined_dutch_with_html.xlsx")
df_english = pd.read_excel("combined_english_with_html.xlsx")

print("NL:", df_dutch.shape)
print("EN:", df_english.shape)

NL: (10, 25)
EN: (10, 25)


In [6]:
rest_A_sample = pd.read_excel("rest_listings/rest_A.xlsx")
rest_C_sample = pd.read_excel("rest_listings/rest_C.xlsx")
rest_D_sample = pd.read_excel("rest_listings/rest_D.xlsx")
combined_rest_dutch = pd.concat([rest_A_sample, rest_C_sample, rest_D_sample], ignore_index=True)

rest_B_sample = pd.read_excel("rest_listings/rest_B.xlsx")
rest_E_sample = pd.read_excel("rest_listings/rest_E.xlsx")
combined_rest_english = pd.concat([rest_B_sample, rest_E_sample], ignore_index=True)

In [17]:

class JobListingPromptGenerator:    
    def __init__(self, language: str = 'dutch'):
        """
        Initialize the prompt generator with language support.
        
        Args:
            language (str): 'dutch' or 'english'
        """
        self.language = language.lower()
        if self.language not in ['dutch', 'english']:
            raise ValueError("Language must be 'dutch' or 'english'")
            
        self.base_template = self._create_base_template()
        self.few_shot_examples = []
        self.company_examples = {}  
        
    def _create_base_template(self) -> str:
        if self.language == 'dutch':
            return self._create_dutch_template()
        else:
            return self._create_english_template()
    
    def _create_dutch_template(self) -> str:
        return """Je bent een ervaren recruitment- en marketingspecialist, gespecialiseerd in het schrijven van hoogwaardige functieomschrijvingen voor de arbeidsmarkt.
[TAAK]
Schrijf een professionele, aantrekkelijke en inclusieve functieomschrijving gebaseerd op onderstaande informatie.

[FUNCTIEGEGEVENS]
Functietitel: {job_title}
Locatie: {job_location}
Dienstverband: {employment_type}
Uren per week: {hours_per_week}
Opleidingsniveau: {education_level}
Ervaringsniveau: {experience_level}
Salarisindicatie: {salary_range}

[FUNCTIE-INHOUD]
Verduidelijk alleen onderstaande verantwoordelijkheden; voeg niets toe en laat niets weg:
{responsibilities}

[ARBEIDSVOORWAARDEN]
Verduidelijk alleen onderstaande voorwaarden; Vermeld altijd het salarisbereik indien gegeven; voeg niets toe en laat niets weg, zet alle voorwaarden in bulletpoints:
{employment_conditions}

[SCHRIJFRICHTLIJNEN]
- Gebruik een verhalende stijl om de functieomschrijving aantrekkelijker en coherenter te maken
- Toon van stem: {tone_of_voice}
- Jargonniveau: {jargon_level} (schaal 1-5, waarbij 1 = geen jargon, 5 = veel professioneel jargon)
- Woordenaantal: {min_words} - {max_words}
- Alleen gespecialiseerde certificaten moeten in vet worden weergegeven, niet namen of software.
- {additional_syntax_rules}

[INHOUDSREGELS]
- Presenteer verantwoordelijkheden en voordelen in de context van de ervaring of reis van de kandidaat
- Gebruik volledig genderneutrale taal (vermijd hij/zij; gebruik 'je' of 'de kandidaat')
- Schrijf actief en enthousiast
- Benadruk wat het bedrijf de kandidaat te bieden heeft
- Geen emoji's
- Gebruik nooit een em-dash
- Vermijd typische clichés zoals: "Dan is deze vacature echt iets voor jou!", "Je hebt geen 9-tot-5 mentaliteit" of "Je bent de spin in het web"
- Vermijd populaire taal zoals 'sales tijger' of 'pro'
- {additional_content_rules}

[KANDIDAATPROFIEL]
Doelgroep: {target_demographic}
Ideale kandidaateigenschappen: {ideal_candidate_traits}
Verwerk dit profiel in de tekst; de genoemde woorden hoeven NIET letterlijk gebruikt te worden.

[BEDRIJFSINFORMATIE]
Bedrijfsnaam: {company_name}
Branche: {industry_sector}
Bedrijfsomschrijving: {company_description}
Schrijf een feitelijke, volledige en motiverende tekst over het bedrijf. (Bijv.: wat doet het bedrijf, omvang, onderscheidende kenmerken)

[OUTPUT FORMAAT]
Structureer de functieomschrijving als volgt: begin met de functietitel, gevolgd door de volgende kopjes in exact deze volgorde:
{format_structure}
"""

    def _create_english_template(self) -> str:
        return """You are an experienced recruitment and marketing specialist, specializing in writing high-quality job descriptions for the job market.

[TASK]
Write a professional, attractive and inclusive job description based on the information below.

[JOB DETAILS]
Job title: {job_title}
Location: {job_location}
Employment type: {employment_type}
Hours per week: {hours_per_week}
Education level: {education_level}
Experience level: {experience_level}
Salary indication: {salary_range}

[RESPONSIBILITIES]
Only clarify the responsibilities below; do not add anything and do not leave anything out:
{responsibilities}

[EMPLOYMENT CONDITIONS]
Only clarify the conditions below; always include the salary range if given; do not add anything and do not leave anything out; put all conditions in bullet points:
{employment_conditions}

[WRITING GUIDELINES]
- Use a storytelling style to make the job description more engaging and cohesive
- Tone of voice: {tone_of_voice}
- Jargon level: {jargon_level} (scale 1-5, where 1 = no jargon, 5 = lots of professional jargon)
- Word count: {min_words} - {max_words}
- Only specialized certificates should be in bold, not names or software.
- {additional_syntax_rules}

[CONTENT RULES]
- Frame responsibilities and benefits in the context of the candidate’s experience or journey
- Use completely gender-neutral language (avoid he/she; use 'you' or 'the candidate')
- Write actively and enthusiastically
- Emphasize what the company has to offer the candidate
- No emojis
- Never use an em-dash
- Avoid typical clichés like: "Then this vacancy is really something for you!", "You don't have a 9-to-5 mentality" or "You are the spider in the web"
- Avoid popular language like 'sales tiger' or 'pro'
- {additional_content_rules}

[CANDIDATE PROFILE]
Target demographic: {target_demographic}
Ideal candidate traits: {ideal_candidate_traits}
Incorporate this profile into the text; the mentioned words do NOT need to be used literally. 

[COMPANY INFORMATION]
Company name: {company_name}
Industry sector: {industry_sector}
Company description: {company_description}
Write a factual, complete and motivating text about the company. (E.g.: what does the company do, size, distinguishing characteristics)

[OUTPUT FORMAT]
Structure the job description as follows: start with the job title, followed by the following headings in exactly this order:
{format_structure}
"""
    
    def load_examples_from_dataframe(self, df: pd.DataFrame):
        """Load job listing examples from dataframe."""
        for company in df['r_comp'].unique():
            valid_listings = df[(df['r_comp'] == company) & (df['job_listing'].notna())] 
            self.company_examples[company] = valid_listings['job_listing'].tolist()
        
        print(f"\nLoaded examples for {len(self.company_examples)} companies:")
        for company, examples in self.company_examples.items():
            print(f"  - {company}: {len(examples)} examples")
    
    def get_company_examples(self, company):
        """Get examples for a specific company."""
        return self.company_examples.get(company, [])
    
    def create_prompt_with_examples(self, **kwargs) -> str:
        """Create a prompt with few-shot examples for the specified company."""
        company = kwargs.get('r_comp')
        
        if not company or company not in self.company_examples:
            print(f"Warning: No examples found for company '{company}'.")
            return PromptTemplate(
                template=self.base_template,
                input_variables=list(kwargs.keys())
            ).format(**kwargs)
        
        company_examples = self.company_examples[company]
        
        if self.language == 'dutch':
            example_template = PromptTemplate(
                template="""=== VOORBEELD {index} ===
Volledige vacaturetekst:
{full_listing}

---""",
                input_variables=["index", "full_listing"]
            )
            
            prefix = f"""Je krijgt nu {len(company_examples)} complete voorbeelden van vacatureteksten voor {company}.
Bestudeer deze voorbeelden zorgvuldig en let op:
- De specifieke schrijfstijl en toon
- De structuur en opbouw
- Het taalgebruik en formuleringen

VOORBEELDEN:"""
            
            suffix = f"""=== EINDE VOORBEELDEN ===

Gebruik EXACT dezelfde stijl, toon, en structuur als in de voorbeelden hierboven voor de volgende nieuwe vacature:

{self.base_template}"""
        
        else:
            example_template = PromptTemplate(
                template="""=== EXAMPLE {index} ===
Complete job listing:
{full_listing}

---""",
                input_variables=["index", "full_listing"]
            )
            
            prefix = f"""You will now receive {len(company_examples)} complete examples of job descriptions for {company}.
Study these examples carefully and pay attention to:
- The specific writing style and tone
- The structure and layout
- The language use and formulations

EXAMPLES:"""
            
            suffix = f"""=== END OF EXAMPLES ===

Use EXACTLY the same style, tone, and structure as in the examples above for the following new job listing:

{self.base_template}"""
        
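        # Escape literal braces so the final format() call does not treat
        # text inside an example listing as a template placeholder.
        examples_with_index = [
            {
                'index': i,
                'full_listing': str(example).replace('{', '{{').replace('}', '}}')
            }
            for i, example in enumerate(company_examples, 1)
        ]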
        
        few_shot_prompt = FewShotPromptTemplate(
            examples=examples_with_index,
            example_prompt=example_template,
            prefix=prefix,
            suffix=suffix,
            input_variables=list(kwargs.keys())
        )
        
        return few_shot_prompt.format(**kwargs)
    
    def generate_batch_prompts(self, df: pd.DataFrame, output_column: str = 'prompt') -> pd.DataFrame:
        """Generate prompts for all rows in the dataframe."""
        df = df.copy()
        prompts = []
        
        for idx, row in df.iterrows():
            params = {k: v for k, v in row.to_dict().items() if pd.notna(v)}
            
            try:
                prompt = self.create_prompt_with_examples(**params)
                prompts.append(prompt)
            except Exception as e:
                print(f"Error generating prompt for row {idx}: {str(e)}")
                prompts.append(None)
        
        df[output_column] = prompts
        return df



In [18]:

generator_dutch = JobListingPromptGenerator(language='dutch')
generator_dutch.load_examples_from_dataframe(combined_rest_dutch)
df_with_prompts_dutch = generator_dutch.generate_batch_prompts(df_dutch)

generator_english = JobListingPromptGenerator(language='english')
generator_english.load_examples_from_dataframe(combined_rest_english)
df_with_prompts_english = generator_english.generate_batch_prompts(df_english)


Loaded examples for 3 companies:
  - A: 10 examples
  - C: 10 examples
  - D: 10 examples

Loaded examples for 2 companies:
  - B: 10 examples
  - E: 10 examples
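
`generate_batch_prompts` drops empty (NaN) values from each row before formatting. A row that lacks an optional field such as `additional_syntax_rules` therefore leaves that placeholder without a value, which would presumably raise an error and store `None` as the prompt. One possible way to handle this, sketched here under the assumption that a neutral fallback text is acceptable, is a dictionary that supplies a default for missing keys. `DefaultingDict` and `fill_template` are hypothetical names, not part of the notebook.

```python
from typing import Any, Dict

class DefaultingDict(dict):
    """dict that returns a fallback string for placeholders without a value."""

    def __init__(self, *args: Any, fallback: str = "", **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.fallback = fallback

    def __missing__(self, key: str) -> str:
        # Called by dict lookups (and str.format_map) when the key is absent
        return self.fallback

def fill_template(template: str, params: Dict[str, Any], fallback: str = "n/a") -> str:
    """Fill a format-style template, substituting `fallback` for missing fields."""
    return template.format_map(DefaultingDict(params, fallback=fallback))

# Hypothetical two-field template and a row without the optional field
demo_template = "Job title: {job_title}\n- {additional_syntax_rules}"
row = {"job_title": "Planner"}
print(fill_template(demo_template, row))
```

Whether a fallback such as `n/a` is appropriate depends on the field; for required fields, failing loudly may be the safer choice.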


## 3. Output Preparation

### Export Format
Generated listings are exported to:
- `final_nl_prompts_outputs.xlsx` (Dutch results)
- `final_en_prompts_outputs.xlsx` (English results)

**Columns include:**
- Original job data
- Generated prompts
- Generated listings
- Error tracking

### Evaluation Integration
The outputs are structured for subsequent human and LLM-based evaluation in Qualtrics format, which supports the next phase of the research framework.
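
Before the outputs go to evaluators, some of the prompt's rules can be checked automatically. The sketch below is an assumption about what such a check might look like, not part of the research framework: `check_listing` is a hypothetical helper that covers only the word count range, the em-dash rule, and the he/she rule for English text.

```python
import re
from typing import Dict

EM_DASH = "\u2014"

def check_listing(text: str, min_words: int, max_words: int) -> Dict[str, bool]:
    """Check a generated English listing against a few of the prompt's rules."""
    word_count = len(text.split())
    return {
        "word_count_ok": min_words <= word_count <= max_words,
        "no_em_dash": EM_DASH not in text,
        # Matches the standalone pronouns "he" and "she", case-insensitively
        "no_he_she": re.search(r"\b(he|she)\b", text, re.IGNORECASE) is None,
    }

sample = "You will plan routes for our drivers."
print(check_listing(sample, min_words=5, max_words=10))
```

Such checks only flag surface violations; tone, cliché avoidance, and overall quality still need the human and LLM-based evaluation described above.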

In [19]:
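# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

def execute_prompts(df, prompt_column="prompt", delay=1.0):
    """Send each prompt to GPT-4o and collect the responses and any errors."""
    responses = []
    errors = []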
    
    for prompt in tqdm(df[prompt_column], desc="Processing prompts"):
        try:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
                temperature=0.3
            )
            responses.append(response.choices[0].message.content)
            errors.append(None)
            
        except Exception as e:
            print(f"Error: {str(e)}")
            responses.append(None)
            errors.append(str(e))
            
        time.sleep(delay)  
    
    df = df.copy()
    df["generated_listing"] = responses
    df["errors"] = errors
    return df

df_with_prompts_dutch = execute_prompts(df_with_prompts_dutch, delay=1.0)
df_with_prompts_english = execute_prompts(df_with_prompts_english, delay=1.0)


Processing prompts: 100%|█████████████████| 10/10 [02:07<00:00, 12.71s/it]


In [None]:
df_with_prompts_english.to_excel("final_en_prompts_outputs.xlsx", index=False)
df_with_prompts_dutch.to_excel("final_nl_prompts_outputs.xlsx", index=False)



















