# LLM-Based Evaluation Pipeline

This notebook implements an LLM-based evaluation pipeline that automatically assesses the quality of job listings.

## Key Components
1. **Evaluation Metrics**
   - Implements metrics derived from expert interviews and the literature
   - Scores each listing quantitatively across multiple quality dimensions, using more than one evaluation technique

2. **Analysis Reporting**
   - Tracks efficiency by counting pipeline steps
   - Documents pipeline performance across evaluation iterations
   - Generates detailed per-metric breakdowns of the assessments



In [3]:
import numpy as np
import pandas as pd
import os

import dspy
import openai

import phoenix as px
from phoenix.evals import llm_classify  # used by evaluate_clarity below

## 1. Evaluation Metrics
Example quality metrics:
1. Clarity


In [4]:
CLARITY_LLM_JUDGE_PROMPT = """
In this task, you will be presented with a job listing. Your objective is to evaluate the clarity
of the job listing. A "clear" job listing is well-structured, concise, easy to read, and
communicates the necessary information directly, without ambiguity or unnecessary complexity.
An "unclear" job listing deviates from the specified job or is vague, disorganized, overly complex,
or difficult to understand.

First write a detailed explanation justifying your reasoning. Focus on specific aspects of the
job listing that affect clarity, such as grammar, organization, and conciseness, and avoid stating
the final label at the beginning of your explanation.

Then give your label as a single word, either "clear" or "unclear", indicating whether the listing
is easy to understand. Do not include any other text or characters in the label.

[BEGIN DATA]
Input: {job_listing}
Answer: {response}
[END DATA]

EXPLANATION: Provide your reasoning step by step, evaluating aspects like structure, language, and readability.
LABEL: "clear" or "unclear"
"""
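To make the roles of the `{job_listing}` and `{response}` placeholders and the allowed labels (the "rails") concrete, here is a minimal, library-free sketch. It does not reproduce how Phoenix handles templates or rails internally; `render_prompt`, `snap_to_rails`, and the shortened `TEMPLATE` are hypothetical names introduced only for illustration. The sketch compares labels exactly rather than by substring, because "unclear" contains "clear".

```python
import re
from typing import List, Optional

# Shortened, hypothetical stand-in for the judge prompt; it keeps the same two placeholders.
TEMPLATE = 'Input: {job_listing}\nAnswer: {response}\nLABEL: "clear" or "unclear"'

RAILS = ["clear", "unclear"]


def render_prompt(template: str, job_listing: str, response: str) -> str:
    """Fill the two named placeholders in the template."""
    return template.format(job_listing=job_listing, response=response)


def snap_to_rails(raw_output: str, rails: List[str]) -> Optional[str]:
    """Map raw judge output to one of the allowed labels, or None if it fits none.

    Uses the text after "LABEL:" when present, otherwise the whole output,
    and requires an exact match after trimming quotes and punctuation.
    """
    match = re.search(r"LABEL:\s*(.+)", raw_output, flags=re.IGNORECASE)
    candidate = match.group(1) if match else raw_output
    candidate = candidate.strip().strip("\"'.").lower()
    return candidate if candidate in rails else None


prompt = render_prompt(TEMPLATE, "Data engineer", "We seek a data engineer to ...")
label = snap_to_rails('EXPLANATION: Sections are jumbled.\nLABEL: "unclear"', RAILS)
print(label)  # unclear
```

Output that matches neither label maps to `None` here, so a caller can treat it as unparseable instead of silently counting it as "unclear".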

In [5]:
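def evaluate_clarity(output: dict, input: dict) -> bool:
    """Return True if the LLM judge labels the generated listing as "clear"."""
    if output is None:
        return False
    # Column names must match the template placeholders {job_listing} and {response}.
    df = pd.DataFrame({"job_listing": [input.get("question")],
                       "response": [output.get("final_output")]})
    # `eval_model` is assumed to be a Phoenix evals model defined elsewhere in the notebook.
    result = llm_classify(
        data=df,
        template=CLARITY_LLM_JUDGE_PROMPT,
        rails=["clear", "unclear"],
        model=eval_model,
        provide_explanation=True
    )
    # llm_classify returns a DataFrame; read the single row's label as a scalar.
    return bool(result["label"].iloc[0] == "clear")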
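
The per-metric breakdowns mentioned under Key Components could be built from boolean verdicts like the one `evaluate_clarity` returns. The following is a minimal sketch under that assumption, not the notebook's actual reporting code: `summarize_verdicts` is a hypothetical helper, and the verdict values are made up for illustration.

```python
from typing import Dict, List


def summarize_verdicts(verdicts_by_metric: Dict[str, List[bool]]) -> Dict[str, dict]:
    """Aggregate per-listing boolean verdicts into counts and a pass rate per metric."""
    summary = {}
    for metric, verdicts in verdicts_by_metric.items():
        total = len(verdicts)
        passed = sum(verdicts)  # True counts as 1, False as 0
        summary[metric] = {
            "n": total,
            "passed": passed,
            # Leave the rate undefined when no listings were evaluated.
            "pass_rate": passed / total if total else None,
        }
    return summary


# Made-up verdicts for four listings, for illustration only.
example = summarize_verdicts({"clarity": [True, False, True, True]})
print(example["clarity"])
```

Keeping the raw counts next to the rate makes it easier to compare evaluation iterations that covered different numbers of listings.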