<a href="https://colab.research.google.com/github/Rupalib30/Colab-notebook-langchain-training/blob/main/Day3_Assignment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install langchain_google_genai
from langchain_google_genai import ChatGoogleGenerativeAI
from google.colab import userdata

llm_gemini = ChatGoogleGenerativeAI(model="gemini-2.5-flash", google_api_key=userdata.get('Rup-ai-training'))


Collecting langchain_google_genai
  Downloading langchain_google_genai-4.2.0-py3-none-any.whl.metadata (2.7 kB)
Collecting filetype<2.0.0,>=1.2.0 (from langchain_google_genai)
  Downloading filetype-1.2.0-py2.py3-none-any.whl.metadata (6.5 kB)
Downloading langchain_google_genai-4.2.0-py3-none-any.whl (66 kB)
Downloading filetype-1.2.0-py2.py3-none-any.whl (19 kB)
Installing collected packages: filetype, langchain_google_genai
Successfully installed filetype-1.2.0 langchain_google_genai-4.2.0


# Prompt Quality Scoring Agent

## Project Overview

This project implements a **Prompt Quality Scoring Agent** using LangChain and Google's Gemini LLM. The agent is designed to evaluate the quality of a given text prompt based on five key criteria: Clarity, Specificity, Context, Output Format & Constraints, and Persona Defined. For each criterion, it provides a score (1-10), a detailed explanation, and actionable suggestions for improvement. Finally, it calculates an overall score and provides a summary of the prompt's quality.

## Features

- **Structured Output**: Uses Pydantic models to ensure consistent and parseable JSON output.
- **Comprehensive Evaluation**: Assesses prompts against five critical quality dimensions.
- **Actionable Feedback**: Provides explanations for scores and practical suggestions for improvement.
- **LangChain Integration**: Uses LangChain's prompt templates, LCEL chains, and output parsers for LLM interaction and output parsing.
- **Google Gemini Powered**: Uses the `gemini-2.5-flash` model to perform the evaluation.

## Prompt Quality Criteria

1.  **Clarity (1-10)**: Checks whether the prompt is easy to understand and has a clear goal.
2.  **Specificity / Details (1-10)**: Evaluates whether sufficient details and requirements are provided.
3.  **Context (1-10)**: Checks whether background information, audience, or use case is mentioned.
4.  **Output Format & Constraints (1-10)**: Checks whether the expected output format, tone, or length is specified.
5.  **Persona Defined (1-10)**: Checks whether the prompt assigns a specific role or persona to the AI.

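**Final Score Calculation**: The overall score is intended to be the average of the five criterion scores. It is generated by the model rather than computed in code, so it can deviate from the arithmetic mean: in Test Prompt 3 the criterion scores average 2.6 but the reported overall score is 2, and in Test Prompt 5 they average 8.4 but the reported overall score is 9.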
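
As a rough check of the averaging rule, the sketch below recomputes the mean from the criterion scores reported under Test Cases and Results. The helper name `average_score` and the use of Python's built-in `round` are assumptions made for illustration; the notebook itself leaves the overall score to the model.

```python
from statistics import mean

# Criterion scores copied from the test results in this README, in the order:
# clarity, specificity, context, output format & constraints, persona defined.
# The second element is the overall score the model reported.
reported = {
    "Test Prompt 1": ([10, 10, 10, 1, 1], 6),
    "Test Prompt 3": ([8, 2, 1, 1, 1], 2),
    "Test Prompt 5": ([10, 9, 8, 10, 5], 9),
}

def average_score(scores: list[int]) -> float:
    """Arithmetic mean of the five criterion scores."""
    return mean(scores)

for name, (scores, model_overall) in reported.items():
    avg = average_score(scores)
    print(f"{name}: mean={avg:.1f}, rounded={round(avg)}, model reported={model_overall}")
```

For Test Prompts 3 and 5 the rounded mean differs from the score the model reported, which suggests that computing the average in code may be more dependable than asking the model for it.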

## Setup and Installation

1.  **Clone the repository** (or copy the notebook content):
    ```bash
    git clone <your-repo-url>
    cd prompt-quality-agent
    ```

2.  **Install Dependencies**: Install the required Python libraries using pip (in a Colab cell, prefix the command with `!`):
    ```bash
    pip install langchain_google_genai pydantic langchain_core
    ```

3.  **Google API Key**:
    - You'll need a Google API key for the Gemini model. If you don't have one, create it in [Google AI Studio](https://aistudio.google.com/).
    - In Google Colab, securely store your API key in the `Secrets` tab (accessible via the '🔑' icon on the left panel). Name the secret `Rup-ai-training`.
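
Outside Colab, `google.colab.userdata` is not available. The sketch below is one possible way to make the key lookup portable. The helper name `load_api_key` and the `GOOGLE_API_KEY` environment variable fallback are assumptions for illustration and are not part of the notebook.

```python
import os

def load_api_key(secret_name: str = "Rup-ai-training",
                 env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from Colab Secrets, else from an environment variable."""
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
        key = userdata.get(secret_name)
        if key:
            return key
    except Exception:
        # Not running in Colab, or the secret is missing or not accessible.
        pass
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"No API key found: add a Colab secret named {secret_name!r} "
            f"or set the {env_var} environment variable."
        )
    return key
```

The returned string could then be passed as `google_api_key=load_api_key()` when constructing `ChatGoogleGenerativeAI`.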

## Usage

To use the Prompt Quality Scoring Agent, follow these steps:

1.  **Import necessary classes and define Pydantic models**:
    (Run the code cells defining `CriterionEvaluation` and `PromptEvaluationOutput`.)

2.  **Initialize the LLM and Parser**:
    (Ensure the `llm_gemini` instance and `PydanticOutputParser` are initialized as shown in the notebook.)

3.  **Define the LLM Prompt Template**:
    (Run the code cell that constructs the `ChatPromptTemplate`.)

4.  **Use the `evaluate_prompt` function**:
    The core logic is encapsulated in the `evaluate_prompt` function. Pass your prompt as a string to this function.

    ```python
    from langchain_google_genai import ChatGoogleGenerativeAI
    from google.colab import userdata
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.output_parsers import PydanticOutputParser
    from pydantic import BaseModel, Field

    # Pydantic Models (as defined in the notebook)
    class CriterionEvaluation(BaseModel):
        score: int = Field(..., description="Score for the criterion (1-10)")
        explanation: str = Field(..., description="Detailed explanation for the given score")
        suggestions: str = Field(..., description="Actionable suggestions for improvement")

    class PromptEvaluationOutput(BaseModel):
        clarity: CriterionEvaluation
        specificity: CriterionEvaluation
        context: CriterionEvaluation
        output_format_constraints: CriterionEvaluation = Field(..., alias="output_format_constraints")
        persona_defined: CriterionEvaluation
        overall_score: int = Field(..., description="Overall prompt quality score (1-10)")
        summary: str = Field(..., description="Overall summary of the prompt evaluation")


    # Initialize LLM and Parser
    llm_gemini = ChatGoogleGenerativeAI(model="gemini-2.5-flash", google_api_key=userdata.get('Rup-ai-training'))
    parser = PydanticOutputParser(pydantic_object=PromptEvaluationOutput)
    format_instructions = parser.get_format_instructions()

    system_message = (
        "You are an expert prompt quality evaluator. Your task is to critically assess a given user prompt "
        "based on five key criteria: Clarity, Specificity, Context, Output Format & Constraints, and Persona Defined. "
        "Provide a score from 1 to 10 for each criterion, along with a detailed explanation "
        "and actionable suggestions for improvement. Finally, provide an overall score and a summary."
    )

    human_message = (
        "Please evaluate the following user prompt:\n\n"
        "User Prompt: {user_prompt}\n\n"
        "{format_instructions}\n\n"
        "Ensure your output adheres strictly to the specified JSON format."
    )

    prompt = ChatPromptTemplate.from_messages([
        ("system", system_message),
        ("human", human_message)
    ])

    def evaluate_prompt(user_prompt: str) -> PromptEvaluationOutput:
        chain = prompt | llm_gemini | parser
        result = chain.invoke({"user_prompt": user_prompt, "format_instructions": format_instructions})
        return result

    # Example usage:
    my_prompt = "Write a short story about a cat who discovers a magical yarn ball."
    evaluation_result = evaluate_prompt(my_prompt)
    print(evaluation_result.model_dump_json(indent=2))
    ```

## Agent Architecture

The agent's architecture is built on LangChain principles:

1.  **Pydantic Models**: `CriterionEvaluation` and `PromptEvaluationOutput` define the structured schema for the evaluation results, ensuring consistency.
2.  **LLM**: The `ChatGoogleGenerativeAI` model (`gemini-2.5-flash`) serves as the underlying large language model for performing the evaluation.
3.  **Prompt Template**: A `ChatPromptTemplate` is crafted with a system message establishing the evaluator's persona and a human message that injects the user's prompt and the Pydantic output parsing instructions.
4.  **Output Parser**: `PydanticOutputParser` supplies the format instructions that are injected into the prompt, then parses the model's reply and validates it against the `PromptEvaluationOutput` Pydantic model.
5.  **LangChain Expression Language (LCEL) Chain**: A single chain (`prompt | llm_gemini | parser`) connects these components, so one `invoke` call formats the prompt, queries the model, and returns a parsed `PromptEvaluationOutput`.
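
To make the data flow through `prompt | llm_gemini | parser` concrete, here is a dependency-free toy that mimics the three stages in plain Python. It is not LangChain's implementation: the `Step` class, `fake_llm`, and the canned JSON reply are stand-ins invented for this illustration.

```python
import json
from typing import Any, Callable

class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|` chaining."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # a | b yields a step that runs a, then feeds its output to b.
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

# Stage 1: fill the template variables to build the text sent to the model.
toy_prompt = Step(
    lambda v: f"User Prompt: {v['user_prompt']}\n\n{v['format_instructions']}"
)

# Stage 2: a fake model that ignores its input and returns a fixed JSON string.
fake_llm = Step(lambda text: '{"overall_score": 6, "summary": "canned reply"}')

# Stage 3: turn the model's raw text into a Python object.
toy_parser = Step(json.loads)

chain = toy_prompt | fake_llm | toy_parser
result = chain.invoke({
    "user_prompt": "What is the capital of France?",
    "format_instructions": "Reply with JSON only.",
})
print(result["overall_score"])
```

In the real chain the third stage also validates the parsed data against the Pydantic schema, which this toy leaves out.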

## Test Cases and Results

Below are some test prompts used to validate the agent's functionality, along with their evaluation results. These demonstrate how the agent scores prompts across various quality dimensions and provides tailored feedback.

### Test Prompt 1: Simple, Clear Prompt

**Prompt**: "What is the capital of France?"

```json
{
  "clarity": {
    "score": 10,
    "explanation": "The prompt is exceptionally clear and straightforward. It's a direct, unambiguous question that leaves no room for misinterpretation regarding what information is being sought.",
    "suggestions": "No improvements needed for clarity."
  },
  "specificity": {
    "score": 10,
    "explanation": "The prompt is highly specific, asking for a single, precise piece of information: 'the capital of France.' There is no vagueness about the desired output.",
    "suggestions": "No improvements needed for specificity."
  },
  "context": {
    "score": 10,
    "explanation": "For a simple factual query, the prompt is self-contained and provides all necessary context. The question itself defines the subject ('France') and the type of information required ('capital'). No additional background information is needed for the AI to answer accurately.",
    "suggestions": "No improvements needed for context."
  },
  "output_format_constraints": {
    "score": 1,
    "explanation": "The prompt provides absolutely no instructions regarding the desired output format or any constraints. It does not specify if the answer should be plain text, a complete sentence, a list, JSON, or any other structure. This lack of guidance can lead to varied or unstandardized responses.",
    "suggestions": "To improve, specify the desired output format. For example: 'Respond with just the name of the city.', 'Provide the answer in a complete sentence.', or 'Format the output as a JSON object with a key 'capital' and its value.'"
  },
  "persona_defined": {
    "score": 1,
    "explanation": "The prompt does not define any specific persona or role for the AI to adopt. The AI is simply expected to act as a general information provider, without any particular tone, expertise, or perspective.",
    "suggestions": "If a specific tone or expertise is desired, define a persona. For example: 'As a geography expert, what is the capital of France?' or 'Imagine you are a tour guide, tell me the capital of France.'"
  },
  "overall_score": 6,
  "summary": "The prompt 'What is the capital of France?' is an excellent example of a clear, specific, and well-contextualized request for a simple factual lookup. It is highly effective for its intended purpose due to its directness and lack of ambiguity. However, its overall score is brought down by the complete absence of output format instructions and a defined persona. While these elements are not strictly necessary for such a basic query, their inclusion would make the prompt more robust and versatile for more complex interactions or when a structured response is required."
}
```

### Test Prompt 2: Specific, Persona-driven, Formatted Prompt

**Prompt**: "As a seasoned marketing expert, generate three unique and catchy slogans for a new organic coffee brand targeting environmentally conscious millennials. The slogans should be punchy, under 10 words, and provided in a bulleted list."

```json
{
  "clarity": {
    "score": 10,
    "explanation": "The prompt is exceptionally clear, with all instructions and requirements stated in an unambiguous manner. There is no room for misinterpretation regarding the task, the product, the target audience, or the desired slogan characteristics.",
    "suggestions": "No improvements needed for clarity."
  },
  "specificity": {
    "score": 9,
    "explanation": "The prompt provides a high level of specificity, detailing the number of slogans (three), the brand type (new organic coffee), the target audience (environmentally conscious millennials), and key characteristics (unique, catchy, punchy, under 10 words). This ensures the model has ample guidance for generating relevant output.",
    "suggestions": "To achieve a perfect 10, the prompt could optionally specify a desired tone (e.g., inspiring, playful, sophisticated) or a core brand value if one exists, which might further refine the output for 'catchy' and 'unique.' However, for a general slogan generation task, it is already excellent."
  },
  "context": {
    "score": 10,
    "explanation": "Sufficient context is provided through the description of 'a new organic coffee brand' and 'targeting environmentally conscious millennials.' This background information is crucial for the AI to understand the product's nature and the audience's values, enabling it to generate appropriate and effective slogans.",
    "suggestions": "No improvements needed for context."
  },
  "output_format_constraints": {
    "score": 10,
    "explanation": "The prompt clearly defines all output format requirements and constraints: 'three unique and catchy slogans,' 'under 10 words,' and 'provided in a bulleted list.' These instructions are precise and leave no ambiguity about the expected structure and content of the response.",
    "suggestions": "No improvements needed for output format and constraints."
  },
  "persona_defined": {
    "score": 10,
    "explanation": "The persona 'As a seasoned marketing expert' is explicitly defined and highly relevant to the task of generating marketing slogans. This guides the model to adopt the appropriate expertise, tone, and strategic thinking necessary for effective branding.",
    "suggestions": "No improvements needed for persona definition."
  },
  "overall_score": 10,
  "summary": "This is an exceptionally well-crafted prompt. It excels in clarity, specificity, context, output format, and persona definition. The instructions are unambiguous, providing the AI with all necessary information to generate high-quality, relevant marketing slogans for the specified brand and target audience. The defined persona of a 'seasoned marketing expert' is perfectly aligned with the task, ensuring expert-level output. Minor optional enhancements could refine specificity further, but the prompt is already highly effective."
}
```

### Test Prompt 3: Vague Prompt

**Prompt**: "Write something about happiness."

```json
{
  "clarity": {
    "score": 8,
    "explanation": "The prompt is clear in its basic request: 'Write something about happiness.' The topic is unambiguous, and the core instruction is easy to understand.",
    "suggestions": "While the core request is clear, adding clarity on the *type* of 'something' (e.g., an essay, a poem, a short story) would significantly improve the prompt."
  },
  "specificity": {
    "score": 2,
    "explanation": "The prompt is extremely unspecific. 'Write something' offers no guidance on the desired format, length, tone, style, or specific angle of happiness to explore. This lack of detail makes it difficult for the model to generate a targeted and useful response.",
    "suggestions": "Specify the type of content (e.g., 'a short essay,' 'a poem,' 'a list of practical tips,' 'a personal reflection'), the desired length (e.g., 'around 500 words,' 'a haiku'), the tone (e.g., 'inspirational,' 'analytical,' 'humorous'), and specific aspects of happiness to cover (e.g., 'its definition,' 'ways to cultivate it,' 'the role of gratitude')."
  },
  "context": {
    "score": 1,
    "explanation": "No context whatsoever is provided. The model has no information about the purpose of the writing (e.g., for a blog post, a school assignment, a personal reflection, a speech) or the target audience. This absence of context severely limits the model's ability to tailor its response effectively.",
    "suggestions": "Provide context for the request. For whom is this 'something' intended? What is its purpose? (e.g., 'for a high school philosophy class,' 'for a motivational blog for young adults,' 'as an opening for a company wellness seminar')."
  },
  "output_format_constraints": {
    "score": 1,
    "explanation": "The prompt completely lacks any specified output format or content constraints for the generated text itself. It doesn't mention structure, headings, bullet points, specific elements to include or exclude, or any stylistic requirements.",
    "suggestions": "Define specific output format requirements (e.g., 'as a three-paragraph essay,' 'with bullet points,' 'include an introduction and conclusion'), structural constraints (e.g., 'discuss 3 different perspectives'), and any other specific elements (e.g., 'include a quote,' 'avoid jargon')."
  },
  "persona_defined": {
    "score": 1,
    "explanation": "Neither a persona for the AI nor a persona for the target audience is defined. The model doesn't know if it should write as a philosopher, a psychologist, a poet, a casual friend, or an expert, nor does it know who the intended reader is.",
    "suggestions": "Define the persona the AI should adopt (e.g., 'Act as a wise philosopher,' 'Write as a motivational speaker,' 'Adopt the tone of a friendly guide'). Also, specify the persona or characteristics of the target audience (e.g., 'for someone struggling with negativity,' 'for children learning about emotions,' 'for academics studying positive psychology')."
  },
  "overall_score": 2,
  "summary": "This prompt is extremely generic and provides minimal guidance to the AI. While the core topic is clear, the lack of specificity regarding the type of content, context, output format, and persona will likely result in a very broad and uninspired response. It's a 'write something' prompt in its most basic form, requiring significant elaboration to be truly effective."
}
```

### Test Prompt 4: Contextual/Targeted Prompt

**Prompt**: "Explain the concept of quantum entanglement to a high school student, using analogies and simple language. Structure your explanation with an introduction, two main analogies, and a conclusion."

```json
{
  "clarity": {
    "score": 10,
    "explanation": "The prompt is exceptionally clear. It precisely defines the topic (quantum entanglement), the target audience (high school student), the required pedagogical approach (analogies, simple language), and the exact structural components (introduction, two main analogies, conclusion). There is no ambiguity regarding the task.",
    "suggestions": "N/A - This criterion is perfectly met."
  },
  "specificity": {
    "score": 9,
    "explanation": "The prompt is highly specific. It pinpoints the exact concept, audience, language style, and the number and type of structural elements. This level of detail significantly guides the AI towards the desired output.",
    "suggestions": "To achieve absolute perfection, one could optionally specify a desired length for the explanation (e.g., 'around 500 words') or perhaps suggest a theme for analogies (e.g., 'relatable to everyday experiences'). However, for its current purpose, it is already very specific and effective."
  },
  "context": {
    "score": 8,
    "explanation": "The prompt provides ample internal context. It clearly establishes the 'who' (high school student) and the 'what' (explaining quantum entanglement with specific methods and structure). No external or prior conversation context is necessary for the AI to perform the task.",
    "suggestions": "While sufficient, context could be slightly enhanced if there was a specific learning objective or a particular challenge the high school student might face (e.g., 'address common misconceptions about quantum mechanics'). For a general explanation, it's already strong."
  },
  "output_format_constraints": {
    "score": 6,
    "explanation": "The prompt specifies structural constraints (introduction, two main analogies, conclusion), which is a good start. However, it lacks explicit instructions on the output *format* beyond these structural elements. It doesn't specify if Markdown headings should be used, if analogies should be clearly labeled (e.g., 'Analogy 1: [Title]'), or any other formatting to ensure readability and organization.",
    "suggestions": "Enhance this by adding explicit formatting requirements. For example: 'Format the output using Markdown. Use H2 headings for 'Introduction', 'Analogy 1', 'Analogy 2', and 'Conclusion'. Clearly label each analogy. Ensure paragraphs are concise and easy to read.'"
  },
  "persona_defined": {
    "score": 8,
    "explanation": "The prompt implicitly defines a clear persona for the AI: an educator or explainer capable of simplifying complex scientific concepts for a younger, non-expert audience. The instructions 'to a high school student, using analogies and simple language' directly shape this pedagogical persona.",
    "suggestions": "While strong implicitly, making the persona explicit could further refine the tone and style. For example, 'Act as a friendly and engaging science teacher explaining...' or 'Adopt the persona of a enthusiastic science communicator...' This can sometimes lead to an even more tailored and engaging output."
  },
  "overall_score": 8,
  "summary": "This is a very well-crafted prompt, excelling in clarity, specificity, and implicitly defining the necessary persona for the task. It provides all the essential information for the AI to generate a high-quality, structured explanation of quantum entanglement for a high school student. The primary area for improvement is the lack of explicit output formatting instructions beyond the structural components, which, if added, would ensure a polished and consistently presented final output."
}
```

### Test Prompt 5: Detailed Comparison with Format

**Prompt**: "Compare and contrast the economic policies of Keynesianism and Monetarism, focusing on their approaches to inflation and unemployment. Present your answer in a markdown table with columns for 'Policy', 'Inflation Approach', and 'Unemployment Approach'."

```json
{
  "clarity": {
    "score": 10,
    "explanation": "The prompt is exceptionally clear. It explicitly states the core task (compare and contrast), the subjects (Keynesianism and Monetarism), the specific focus areas (inflation and unemployment), and the desired output format (markdown table with exact column headers). There is no ambiguity.",
    "suggestions": "No improvements needed for clarity. The prompt is perfectly clear."
  },
  "specificity": {
    "score": 9,
    "explanation": "The prompt is highly specific. It names the two economic policies, the two key economic concepts (inflation and unemployment) to analyze for each, and even dictates the precise column headers for the output table. This leaves very little room for misinterpretation of the content required.",
    "suggestions": "While already very specific, one could slightly enhance it by asking for a brief definition of each policy's core tenets before the comparison, or perhaps asking for specific policy tools associated with each approach to inflation/unemployment. However, for a general comparison, it's excellent as is."
  },
  "context": {
    "score": 8,
    "explanation": "The prompt provides sufficient context for the task. It assumes a general understanding of economic concepts and the terms 'Keynesianism' and 'Monetarism,' which is reasonable given the academic nature of the comparison. It implicitly sets the stage for an analytical or educational response.",
    "suggestions": "To provide more specific context, the user could add a target audience (e.g., 'Explain this as if to a college student' or 'Prepare a summary for policymakers') or specify a particular historical period or economic scenario to consider. For a general overview, the current context is adequate."
  },
  "output_format_constraints": {
    "score": 10,
    "explanation": "The prompt is outstanding in defining the output format and constraints. It explicitly requests a 'markdown table' and provides the exact column headers: 'Policy', 'Inflation Approach', and 'Unemployment Approach'. This leaves no doubt about the structure of the desired response.",
    "suggestions": "No improvements needed. The output format is perfectly defined."
  },
  "persona_defined": {
    "score": 5,
    "explanation": "The prompt does not explicitly define a persona for the AI. It's a straightforward informational request without any instruction for the AI to adopt a specific role or tone.",
    "suggestions": "To improve, the user could add a persona, such as 'Act as an expert economist,' 'Imagine you are explaining this to a student,' or 'Adopt a neutral, academic tone.' This could help shape the depth, style, and complexity of the generated content."
  },
  "overall_score": 9,
  "summary": "This is a very strong and well-crafted prompt. It excels in clarity, specificity, and defining the output format, ensuring the AI understands precisely what information to provide and how to present it. The task is clearly defined, the subjects are specific, and the required output structure is unambiguous. The only minor area for potential enhancement is the explicit definition of a persona, which could further refine the tone or depth of the response, though it's not strictly necessary for a good answer to this particular query. Overall, it's an excellent example of a precise and effective prompt."
}
```

## Future Enhancements

-   **Dynamic Scoring Weights**: Implement a mechanism to dynamically adjust the weighting of criteria based on the type of prompt or user preferences.
-   **More Sophisticated LLM Integration**: Explore advanced LangChain features like agents with tool use for more complex prompt analysis.
-   **User Interface**: Develop a simple web interface for easier interaction and visualization of evaluation results.
-   **Comparative Analysis**: Allow users to compare the evaluation of multiple prompt versions side-by-side.

# Task
Next, define the Pydantic models for `CriterionEvaluation` and `PromptEvaluationOutput` to structure the agent's output.

## Define Output Structure

### Subtask:
Create Pydantic models to define the structured output for the agent. This includes a `CriterionEvaluation` model for individual criteria (score, explanation, suggestions) and a `PromptEvaluationOutput` model to encapsulate all five criteria evaluations, an overall score, and a summary.


**Reasoning**:
I need to define the Pydantic models `CriterionEvaluation` and `PromptEvaluationOutput` as specified in the instructions for structured output.



In [None]:
from pydantic import BaseModel, Field

class CriterionEvaluation(BaseModel):
    score: int = Field(..., description="Score for the criterion (1-10)")
    explanation: str = Field(..., description="Detailed explanation for the given score")
    suggestions: str = Field(..., description="Actionable suggestions for improvement")

class PromptEvaluationOutput(BaseModel):
    clarity: CriterionEvaluation
    specificity: CriterionEvaluation
    context: CriterionEvaluation
    output_format_constraints: CriterionEvaluation = Field(..., alias="output_format_constraints")
    persona_defined: CriterionEvaluation
    overall_score: int = Field(..., description="Overall prompt quality score (1-10)")
    summary: str = Field(..., description="Overall summary of the prompt evaluation")

print("Pydantic models 'CriterionEvaluation' and 'PromptEvaluationOutput' defined.")

Pydantic models 'CriterionEvaluation' and 'PromptEvaluationOutput' defined.
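
The `description` text tells the model which range to use, but it does not make Pydantic reject out-of-range values. If stricter checking is wanted, one option is to add numeric bounds to the field. The sketch below uses a separate, hypothetical model name (`BoundedCriterionEvaluation`) so that it does not alter the notebook's classes.

```python
from pydantic import BaseModel, Field, ValidationError

class BoundedCriterionEvaluation(BaseModel):
    # ge/le add validation bounds, so scores outside 1-10 raise an error.
    score: int = Field(..., ge=1, le=10, description="Score for the criterion (1-10)")
    explanation: str = Field(..., description="Detailed explanation for the given score")
    suggestions: str = Field(..., description="Actionable suggestions for improvement")

# A score inside the range is accepted.
ok = BoundedCriterionEvaluation(
    score=7, explanation="Clear goal.", suggestions="Add a length limit."
)
print(ok.score)

# A score outside the range is rejected at construction time.
try:
    BoundedCriterionEvaluation(score=0, explanation="x", suggestions="y")
except ValidationError as err:
    print(f"rejected with {len(err.errors())} validation error(s)")
```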


## Craft LLM Prompt Template

### Subtask:
Develop a detailed ChatPromptTemplate for the `llm_gemini` model. This template will guide the LLM to critically assess an input prompt against the five specified criteria. It will include system instructions for the evaluator's persona and human instructions to incorporate the user prompt and the required output format (using PydanticOutputParser's format instructions).


**Reasoning**:
I need to import the required classes, instantiate the PydanticOutputParser, and then construct the ChatPromptTemplate using the predefined system and human messages, incorporating the format instructions from the parser.



In [None]:
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate
from langchain_core.output_parsers import PydanticOutputParser

# Instantiate PydanticOutputParser with PromptEvaluationOutput
parser = PydanticOutputParser(pydantic_object=PromptEvaluationOutput)
format_instructions = parser.get_format_instructions()

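# Message templates (same text as in the README usage example above)
system_message = (
    "You are an expert prompt quality evaluator. Your task is to critically assess a given user prompt "
    "based on five key criteria: Clarity, Specificity, Context, Output Format & Constraints, and Persona Defined. "
    "Provide a score from 1 to 10 for each criterion, along with a detailed explanation "
    "and actionable suggestions for improvement. Finally, provide an overall score and a summary."
)

human_message = (
    "Please evaluate the following user prompt:\n\n"
    "User Prompt: {user_prompt}\n\n"
    "{format_instructions}\n\n"
    "Ensure your output adheres strictly to the specified JSON format."
)

# Create the ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessagePromptTemplate.from_template(system_message),
        HumanMessagePromptTemplate.from_template(human_message)
    ]
)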

print("ChatPromptTemplate and PydanticOutputParser instantiated.")

ChatPromptTemplate and PydanticOutputParser instantiated.


## Implement Scoring Agent Logic

### Subtask:
Create a Python function, `evaluate_prompt`, that encapsulates the agent's core logic. This function will:

1.  Use the previously instantiated `ChatGoogleGenerativeAI` model (`llm_gemini`).
2.  Construct a LangChain processing chain by piping the `prompt` template into `llm_gemini` and then into the `PydanticOutputParser` to obtain structured output.
3.  Invoke this chain with the user's prompt and the format instructions from the parser.
4.  Return the structured `PromptEvaluationOutput`.


**Reasoning**:
I need to define the `evaluate_prompt` function as specified, which will create and invoke a LangChain processing chain using the previously defined `prompt`, `llm_gemini`, and `parser` objects. I will then demonstrate its use with an example prompt.



In [None]:
def evaluate_prompt(user_prompt: str) -> PromptEvaluationOutput:
    # Create a LangChain processing chain
    chain = prompt | llm_gemini | parser

    # Invoke the chain with the user's prompt and format instructions
    result = chain.invoke({"user_prompt": user_prompt, "format_instructions": format_instructions})

    return result

# Demonstrate the function with an example
example_user_prompt = "What are the main causes of climate change?"
evaluation_result = evaluate_prompt(example_user_prompt)

print("Prompt Evaluation Result:")
print(evaluation_result.model_dump_json(indent=2))

Prompt Evaluation Result:
{
  "clarity": {
    "score": 10,
    "explanation": "The prompt is exceptionally clear and straightforward. The question 'What are the main causes of climate change?' is unambiguous and easy to understand, leaving no room for misinterpretation of the core request.",
    "suggestions": "No improvements needed for clarity."
  },
  "specificity": {
    "score": 6,
    "explanation": "While clear, the prompt lacks specificity regarding the desired depth, scope, or type of causes. 'Main causes' can be subjective and doesn't specify if the user wants human-induced, natural, historical, current, or a certain number of causes. It also doesn't indicate the level of detail required (e.g., high-level categories vs. specific scientific mechanisms).",
    "suggestions": "To improve specificity, consider adding details such as: 'List the top 5 anthropogenic causes of climate change,' 'Describe the primary natural and human-induced factors contributing to climate change ove

## Test the Agent

### Subtask:
Develop a series of test prompts that cover various aspects of prompt quality. Use these prompts as input to the `evaluate_prompt` function and print the structured output to verify that the agent is working as expected, providing scores, explanations, and suggestions for each criterion, along with an overall score and summary. This step will also verify the final score calculation.


**Reasoning**:
I need to create a list of diverse test prompts, iterate through them, call the `evaluate_prompt` function for each, and print the results in a structured JSON format to verify the agent's functionality as per the instructions.



In [None]:
test_prompts = [
    "What is the capital of France?", # Simple, clear prompt
    "As a seasoned marketing expert, generate three unique and catchy slogans for a new organic coffee brand targeting environmentally conscious millennials. The slogans should be punchy, under 10 words, and provided in a bulleted list.", # Specific, persona, format constraints
    "Write something about happiness.", # Vague, no context, no constraints
    "Explain the concept of quantum entanglement to a high school student, using analogies and simple language. Structure your explanation with an introduction, two main analogies, and a conclusion.", # Context, target audience, format constraints
    "Compare and contrast the economic policies of Keynesianism and Monetarism, focusing on their approaches to inflation and unemployment. Present your answer in a markdown table with columns for 'Policy', 'Inflation Approach', and 'Unemployment Approach'."
]

print("--- Starting Prompt Evaluation Tests ---")

for i, user_prompt in enumerate(test_prompts):
    print(f"\n--- Evaluating Prompt {i+1}: '{user_prompt}' ---")
    evaluation_result = evaluate_prompt(user_prompt)
    print(evaluation_result.model_dump_json(indent=2))

print("\n--- Prompt Evaluation Tests Completed ---")

--- Starting Prompt Evaluation Tests ---

--- Evaluating Prompt 1: 'What is the capital of France?' ---
{
  "clarity": {
    "score": 10,
    "explanation": "The prompt is exceptionally clear and straightforward. It's a direct, unambiguous question that leaves no room for misinterpretation regarding what information is being sought.",
    "suggestions": "No improvements needed for clarity."
  },
  "specificity": {
    "score": 10,
    "explanation": "The prompt is highly specific, asking for a single, precise piece of information: 'the capital of France.' There is no vagueness about the desired output.",
    "suggestions": "No improvements needed for specificity."
  },
  "context": {
    "score": 10,
    "explanation": "For a simple factual query, the prompt is self-contained and provides all necessary context. The question itself defines the subject ('France') and the type of information required ('capital'). No additional background information is needed for the AI to answer accurate

## Document and Prepare for Submission

### Subtask:
Create a comprehensive README file outlining the project, usage instructions, and the design choices made. Organize the source code for clarity. Generate 5-10 diverse test prompts and manually assign expected scores to demonstrate the agent's capabilities and validate its evaluations. This step is for the final submission requirements to a GitHub repository.
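
One way to use the hand-assigned expected scores is to compare them with the agent's scores and flag any criterion that differs by more than a chosen tolerance. The sketch below is illustrative only: `find_mismatches`, the tolerance of 2, the expected values, and the criterion key names are assumptions, not part of the notebook. The agent scores are the ones reported in the summary for "What is the capital of France?".

```python
from typing import Dict, List, Tuple

# Assumed key names; the notebook's actual Pydantic field names may differ.
CRITERIA = [
    "clarity",
    "specificity",
    "context",
    "output_format_constraints",
    "persona_defined",
]


def find_mismatches(
    expected: Dict[str, int],
    actual: Dict[str, int],
    tolerance: int = 2,
) -> List[Tuple[str, int, int]]:
    """Return (criterion, expected, actual) wherever the gap exceeds tolerance."""
    mismatches = []
    for name in CRITERIA:
        if abs(expected[name] - actual[name]) > tolerance:
            mismatches.append((name, expected[name], actual[name]))
    return mismatches


# Hypothetical hand-assigned expectations for "What is the capital of France?".
expected = {
    "clarity": 10,
    "specificity": 9,
    "context": 8,
    "output_format_constraints": 2,
    "persona_defined": 1,
}

# Agent scores for the same prompt, as reported in the summary.
actual = {
    "clarity": 10,
    "specificity": 10,
    "context": 10,
    "output_format_constraints": 1,
    "persona_defined": 1,
}

for criterion, want, got in find_mismatches(expected, actual, tolerance=1):
    print(f"{criterion}: expected {want}, agent gave {got}")
```

A tolerance is used because LLM scores can vary between runs, so exact equality with manual scores is usually too strict a check.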


## Summary:

### Data Analysis Key Findings

*   **Output Structure Defined**: Pydantic models `CriterionEvaluation` and `PromptEvaluationOutput` were successfully created. `CriterionEvaluation` includes a score (1-10), explanation, and suggestions. `PromptEvaluationOutput` encompasses five `CriterionEvaluation` instances (clarity, specificity, context, output format constraints, persona defined), along with an overall score and summary.
*   **LLM Prompt Template Crafted**: A `ChatPromptTemplate` was developed for the `llm_gemini` model, integrating `SystemMessagePromptTemplate` and `HumanMessagePromptTemplate`. A `PydanticOutputParser` was used to extract format instructions, ensuring the LLM's output adheres to the defined `PromptEvaluationOutput` structure.
*   **Scoring Agent Logic Implemented**: The `evaluate_prompt` function was successfully created, encapsulating the agent's core logic. It establishes a LangChain processing chain (`prompt | llm_gemini | parser`) to invoke the LLM, passing the user's prompt and format instructions, and returning a structured `PromptEvaluationOutput`.
*   **Agent Functionality Verified with Diverse Prompts**:
    *   **Simple Prompt**: The clear prompt "What is the capital of France?" received an overall score of 6. It scored 10 on clarity, specificity, and context, but 1 on output format and persona, as expected, since the prompt specifies neither.
    *   **Specific, Persona-driven, Formatted Prompt**: A detailed marketing slogan prompt achieved an overall score of 10, with high scores (9-10) across all criteria, demonstrating the agent's ability to recognize well-crafted prompts.
    *   **Vague Prompt**: A vague prompt such as "Write something about happiness" scored an overall 2, reflecting very low scores (1-2) on specificity, context, output format, and persona, correctly identifying its lack of direction.
    *   **Contextual/Targeted Prompt**: A prompt asking for an explanation of quantum entanglement aimed at a high school student scored an overall 8, with high scores for clarity (10), specificity (9), and context (8).
    *   **Detailed Comparison with Format**: A prompt requesting a comparison of Keynesian and Monetarist economic policies in a markdown table scored an overall 9, excelling in clarity (10), specificity (9), and output format constraints (10).
*   **Consistent Structured Output**: In all five test cases, the agent produced structured JSON output containing scores, detailed explanations, and actionable suggestions for each criterion, along with an overall score and summary, as defined by the Pydantic models.
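
The reported overall scores can be sanity-checked against the criterion scores. The sketch below assumes the overall score is roughly the unweighted mean of the five criterion scores; the notebook may compute it differently or leave it to the LLM, so this is a consistency check and not a description of the agent's method. The helper names and dictionary keys are hypothetical. Only the simple prompt is used, because it is the only test case for which all five scores are stated above.

```python
# Criterion scores reported above for "What is the capital of France?".
# Key names are assumed; the notebook's actual field names may differ.
simple_prompt_scores = {
    "clarity": 10,
    "specificity": 10,
    "context": 10,
    "output_format_constraints": 1,
    "persona_defined": 1,
}
reported_overall = 6


def rounded_mean(scores: dict) -> int:
    """Unweighted mean of the criterion scores, rounded to an integer."""
    values = list(scores.values())
    return round(sum(values) / len(values))


def is_consistent(scores: dict, overall: int, tolerance: int = 1) -> bool:
    """True if the reported overall score is within tolerance of the rounded mean."""
    return abs(rounded_mean(scores) - overall) <= tolerance


# (10 + 10 + 10 + 1 + 1) / 5 = 6.4, which rounds to 6.
print(rounded_mean(simple_prompt_scores))
print(is_consistent(simple_prompt_scores, reported_overall))
```

Under this assumption the reported overall score of 6 matches the rounded mean of 6.4.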

### Insights or Next Steps

*   The prompt evaluation agent returns per-criterion scores, explanations, and suggestions in a fixed structure, which gives prompt authors concrete targets when revising a prompt. The five test prompts show that the scores track obvious differences in prompt quality, such as a vague prompt scoring 2 and a detailed, persona-driven prompt scoring 10.
*   Further development could adjust the scoring to the type of prompt (e.g., weighting output format more heavily when a format is explicitly requested) and could support user-defined criteria or custom weights for specific applications.
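
The custom-weighting idea could be sketched as a weighted mean over the criterion scores. This is a hypothetical extension, not something the notebook implements: `weighted_overall`, the key names, and the weight values are all assumptions chosen for illustration.

```python
from typing import Dict


def weighted_overall(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of criterion scores; criteria without a weight default to 1.0."""
    total_weight = sum(weights.get(name, 1.0) for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(score * weights.get(name, 1.0) for name, score in scores.items())
    return weighted_sum / total_weight


# Scores reported in the summary for "What is the capital of France?".
# Key names are assumed; the notebook's actual field names may differ.
scores = {
    "clarity": 10,
    "specificity": 10,
    "context": 10,
    "output_format_constraints": 1,
    "persona_defined": 1,
}

# Hypothetical weights that count output format twice as heavily.
format_heavy = {"output_format_constraints": 2.0}

print(weighted_overall(scores, {}))            # equal weights
print(weighted_overall(scores, format_heavy))  # output format emphasized
```

With equal weights this reduces to the plain mean, so a prompt that omits format instructions is penalized more only when the caller opts in to a heavier format weight.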
