# Assignment: Evaluate Prompts with Sample Outputs Using Google AI Studio (formerly MakerSuite)


---

### Objective:
This assignment gives you hands-on experience in **evaluating the quality and characteristics of Large Language Model (LLM) outputs** generated from different types of prompts. You will use the **Google AI Studio web interface (formerly known as MakerSuite)** to experiment with various prompts, observe the model's responses, and critically analyze their effectiveness, coherence, and adherence to instructions. This process is fundamental to prompt engineering and to getting reliable results from LLMs.

---

### Instructions:
1.  **Access Google AI Studio**: Navigate to [https://aistudio.google.com/](https://aistudio.google.com/). You will need a Google account.
2.  **Start a new prompt**: In Google AI Studio, select "Create new" and then "Text prompt" (or "Chat prompt" for conversational tasks).
3.  **Experiment with Prompts**: For each task below, copy the provided prompt into the Google AI Studio interface. Adjust the model parameters (temperature, top-k, top-p, max output tokens) when a task tells you to, or whenever you want to observe their effect on the output.
4.  **Capture Outputs**: After generating a response, **copy the exact output text** and paste it into the designated markdown cell in this Jupyter Notebook. If the output is very long, paste a representative portion. You may also include **screenshots** if they help illustrate your points (e.g., showing parameter settings or multiple generated outputs).
5.  **Analysis**: For each task, provide a detailed analysis of the LLM's output. Discuss its quality, relevance, adherence to instructions, and any unexpected behaviors.
6.  **Jupyter Notebook**: All your observations, pasted outputs, and analysis must be documented in this Jupyter Notebook.

---

## Part 1: Basic Prompt Evaluation (Text Prompt)
In this section, you'll evaluate standard text generation prompts for factual, creative, and summarization tasks.

### Task 1.1: Factual Question and Parameter Impact
Test a factual question and observe how varying the `Temperature` parameter affects the output.

* **Prompt**: `Explain the concept of photosynthesis in one paragraph suitable for a high school student.`
* **Steps**:
    1.  Run the prompt with **Temperature = 0.0** (greedy decoding; outputs are nearly deterministic).
    2.  Run the prompt with **Temperature = 0.8** (more creative/diverse).
    3.  Run the prompt with **Temperature = 1.0** (even more randomness).
* **Paste Outputs Below**:
    ```
    # Output with Temperature = 0.0:
    [Paste LLM output here]

    # Output with Temperature = 0.8:
    [Paste LLM output here]

    # Output with Temperature = 1.0:
    [Paste LLM output here]
    ```

### Analysis for Task 1.1:
* How did the outputs differ between the different `Temperature` settings?
* Which temperature setting produced the most accurate/appropriate response for this factual task?
* In what scenarios would you choose a high vs. low temperature for factual queries?
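
To build intuition for what the `Temperature` setting does, the following is a minimal sketch of temperature-scaled softmax over next-token scores. The scores in `logits` are made up for illustration, and treating a temperature of 0 as "always pick the top token" is a common convention assumed here, not a statement about how Google AI Studio implements it.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature."""
    if temperature <= 0:
        # Assumption: T = 0 means greedy decoding, all mass on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]

for t in (0.0, 0.8, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures spread it across the alternatives, which is why repeated runs vary more.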

### Task 1.2: Creative Writing Prompt
Evaluate the LLM's ability to generate creative text.

* **Prompt**: `Write a short, whimsical poem about a grumpy cat who secretly loves belly rubs.`
* **Steps**:
    1.  Run the prompt with **Temperature = 0.9**.
    2.  Run it again with the same temperature (to see variability).
* **Paste Outputs Below**:
    ```
    # First Output:
    [Paste LLM output here]

    # Second Output (same prompt, same temp):
    [Paste LLM output here]
    ```

### Analysis for Task 1.2:
* Did the LLM capture the requested tone (whimsical) and subject matter (grumpy cat, belly rubs)?
* How similar or different were the two outputs generated with the same prompt and temperature?
* What aspects of the poem (rhyme, rhythm, imagery) were well-executed, and what could be improved?
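
If you want a rough number to support your comparison of the two poems, one option is a word-level similarity ratio from Python's standard `difflib` module. This is only a sketch: the two strings below are placeholders rather than real model output, and a surface-level ratio says nothing about tone or quality.

```python
from difflib import SequenceMatcher

def similarity(text_a: str, text_b: str) -> float:
    """Return a 0..1 similarity ratio over whitespace-separated words."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()

# Placeholders: replace with the two outputs you pasted above.
first_output = "A grumpy cat sat on the mat"
second_output = "A grumpy cat slept on the rug"

print(f"Word-level similarity: {similarity(first_output, second_output):.2f}")
```

A ratio near 1.0 suggests the two runs are almost identical; a low ratio suggests the model took a different route the second time.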

### Task 1.3: Text Summarization
Assess the LLM's summarization capabilities.

* **Input Text**:
    ```
    The Amazon rainforest, covering much of northwestern Brazil and extending into Colombia, Peru, and other South American countries, is the world’s largest tropical rainforest, famed for its biodiversity. It’s home to millions of species, including thousands of types of plants and animals. The Amazon River, flowing through the forest, is the second-longest river in the world and carries more water than any other river. Deforestation, primarily for cattle ranching and agriculture, poses a significant threat to the rainforest, leading to habitat loss and contributing to climate change. Efforts are underway globally to protect this vital ecosystem.
    ```
* **Prompt**:

      Summarize the following text in 2-3 sentences.

      [Paste Input Text Here]
* **Steps**:
    1.  In Google AI Studio, enter the instruction, then paste the `Input Text` in place of the `[Paste Input Text Here]` placeholder.
    2.  Run the prompt.
* **Paste Output Below**:
    ```
    [Paste LLM output here]
    ```

### Analysis for Task 1.3:
* Did the summary accurately capture the main points of the original text?
* Did it adhere to the length constraint (2-3 sentences)?
* Was the language concise and natural, or did it feel robotic/repetitive?
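
The length constraint can be checked mechanically. The sketch below counts sentences with a simple regular expression; it is a heuristic that will miscount abbreviations such as "e.g.", and the `summary` string is a placeholder, not real model output.

```python
import re

def count_sentences(text: str) -> int:
    """Rough sentence count: split after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])

def within_length(text: str, low: int = 2, high: int = 3) -> bool:
    """True when the sentence count falls inside the requested range."""
    return low <= count_sentences(text) <= high

# Placeholder: replace with the summary you pasted above.
summary = "Placeholder sentence one. Placeholder sentence two."

print(count_sentences(summary), within_length(summary))
```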

---

## Part 2: Advanced Prompt Evaluation (Data/Structured Prompt)
This section explores prompts that require information extraction or adherence to specific output formats.

### Task 2.1: Information Extraction
Extract specific pieces of information from a given text into a structured format.

* **Input Text**:
    ```
    Acme Corp announced record profits for Q3 2024, reaching $1.2 billion, up from $900 million in Q2. Their CEO, Jane Doe, stated the growth was primarily due to strong sales in their new European market and innovative product launches. The company plans to open a new R&D center in Bengaluru, India, by early 2025.
    ```
* **Prompt**:

      Extract the following information from the text below and present it as a JSON object:

      - Company Name
      - Q3 2024 Profit
      - Q2 2024 Profit
      - CEO Name
      - Reason for Growth
      - New R&D Center Location
      - New R&D Center Opening Year

      Text:
      [Paste Input Text Here]
* **Steps**:
    1.  Paste the `Input Text` into the prompt.
    2.  Run the prompt.
* **Paste Output Below**:
    ```json
    [Paste LLM output (JSON format) here]
    ```

### Analysis for Task 2.1:
* Did the LLM successfully extract all the requested information?
* Was the output strictly in valid JSON format? If not, what errors occurred?
* How accurately did it identify and categorize each piece of information?
* Discuss challenges the LLM might face with more complex or ambiguous texts for extraction.
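
Rather than judging JSON validity by eye, you can parse the pasted output in a code cell. The sketch below assumes the model uses the labels from the prompt verbatim as keys (it may well choose other names, such as `company_name`), and it strips a surrounding Markdown code fence, which models often add around JSON.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker that may wrap the JSON

# Assumed key names, copied from the labels in the prompt.
EXPECTED_KEYS = {
    "Company Name", "Q3 2024 Profit", "Q2 2024 Profit", "CEO Name",
    "Reason for Growth", "New R&D Center Location",
    "New R&D Center Opening Year",
}

def strip_code_fence(raw: str) -> str:
    """Remove a surrounding Markdown code fence, if present."""
    text = raw.strip()
    if text.startswith(FENCE) and text.endswith(FENCE):
        lines = text.splitlines()
        text = "\n".join(lines[1:-1])  # drop the opening and closing fence lines
    return text

def check_extraction(raw: str):
    """Return (parsed, problems); parsed is None when the JSON is invalid."""
    try:
        parsed = json.loads(strip_code_fence(raw))
    except json.JSONDecodeError as err:
        return None, [f"invalid JSON: {err}"]
    if not isinstance(parsed, dict):
        return parsed, ["top-level value is not a JSON object"]
    missing = sorted(EXPECTED_KEYS - set(parsed))
    return parsed, [f"missing key: {key}" for key in missing]

# Deliberately incomplete sample so the checker has something to report.
sample = FENCE + 'json\n{"Company Name": "Acme Corp"}\n' + FENCE
parsed, problems = check_extraction(sample)
print(parsed)
print(problems)
```

An empty `problems` list means the output parsed cleanly and contained every expected key; it does not confirm that the extracted values are correct, which you still need to check against the input text.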

### Task 2.2: Conditional/Rule-Based Generation
Evaluate a prompt that requires the LLM to follow specific rules or conditions based on input.

* **Prompt**:

      Analyze the sentiment of the following customer review. If the sentiment is positive, write a 1-sentence thank you message. If the sentiment is negative, write a 1-sentence apology and ask how to improve. If neutral, just say 'Sentiment: Neutral'.

      Review: "I absolutely love this new phone! The camera is amazing and the battery life is fantastic. Highly recommend!"

      ---

      Review: "The product arrived damaged and customer service was unhelpful. Very disappointed."

      ---

      Review: "The packaging was adequate and the delivery was on time. The item itself is okay."
* **Steps**:
    1.  Paste the entire prompt (including all three reviews) into Google AI Studio.
    2.  Run the prompt.
* **Paste Output Below**:
    ```
    [Paste LLM output here]
    ```

### Analysis for Task 2.2:
* Did the LLM correctly identify the sentiment for each review?
* Did it generate the appropriate response (thank you, apology, or neutral statement) for each sentiment?
* Were there any instances where the rules were not followed correctly? What might cause this?
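
The three rules in the prompt can be turned into a rough automated check. The sketch below is keyword-based and deliberately strict (for example, `Sentiment: Neutral.` with a trailing period fails), so treat a failure as a cue to look closer rather than as a verdict. The sentiment labels and the sample responses are hypothetical.

```python
import re

def follows_rule(sentiment: str, response: str) -> bool:
    """Heuristic check of one response against the rules in the prompt."""
    text = response.strip()
    lowered = text.lower()
    if sentiment == "neutral":
        # The prompt asks for this exact string.
        return text == "Sentiment: Neutral"
    if sentiment == "positive":
        # One sentence (a single terminator) that contains a thank-you.
        return "thank" in lowered and len(re.findall(r"[.!?]", text)) == 1
    if sentiment == "negative":
        # An apology plus a question about how to improve.
        return ("sorry" in lowered or "apolog" in lowered) and "?" in text
    raise ValueError(f"unknown sentiment: {sentiment}")

# Hypothetical responses; replace with the lines from your pasted output.
checks = [
    ("positive", "Thank you for the kind review!"),
    ("negative", "We are sorry about the damage; how can we improve?"),
    ("neutral", "Sentiment: Neutral"),
]
for sentiment, response in checks:
    print(sentiment, follows_rule(sentiment, response))
```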

---

## Part 3: Comparative Analysis and Refinement
This section focuses on identifying prompt weaknesses and iteratively improving them.

### Task 3.1: Identify and Debug a Sub-optimal Prompt
The following prompt tends to produce vague or overly generic responses.

* **Initial Problematic Prompt**: `Write an email about the upcoming event.`

Your task is to:
1.  Run this initial prompt in Google AI Studio and observe its typical output.
2.  Identify *why* it's problematic (e.g., lack of detail, ambiguity).
3.  Propose and implement a **refined prompt** that addresses these issues by adding more context, constraints, or examples.
4.  Run your refined prompt and compare the output.

**1. Output from Initial Problematic Prompt:**
```
[Paste LLM output here]
```

**2. Analysis of Problematic Prompt:**
[Explain why the initial prompt is problematic. What's missing? What's ambiguous?]

**3. Your Refined Prompt (Example Structure):**
```
Write an email to [Target Audience] about [Event Name], happening on [Date] at [Time] in [Location].
The email should include:
- A catchy subject line.
- A brief introduction to the event.
- [2-3 key highlights or benefits of attending].
- A call to action (e.g., 'RSVP by [Date]' or 'Register at [Link]').

Keep the tone [e.g., 'professional and exciting'].
```

**4. Output from Refined Prompt (with example variables):**

* *Example variables you used (e.g., Event Name: 'Annual Tech Conference', Date: 'October 26th'):*
  [List your specific input values for the refined prompt here]

```
[Paste LLM output here]
```

### Analysis for Task 3.1:
* What specific changes did you make to the initial prompt to refine it?
* How did the output from the refined prompt improve compared to the initial one?
* Discuss the importance of clarity, specificity, and constraints in prompt engineering.
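
One way to keep a refined prompt specific is to treat it as a template and refuse to send it while any placeholder is unfilled. The sketch below uses a shortened version of the example structure; the field names and every value other than the event name and date are made up for illustration.

```python
from string import Formatter

# Shortened, hypothetical version of the refined prompt structure.
TEMPLATE = (
    "Write an email to {audience} about the {event_name} happening on "
    "{date} at {time} in {location}.\n"
    "Keep the tone {tone}."
)

def required_fields(template: str) -> set:
    """Names of all {placeholders} that appear in the template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}

def build_prompt(template: str, values: dict) -> str:
    """Fill the template, failing loudly if any placeholder has no value."""
    missing = required_fields(template) - set(values)
    if missing:
        raise KeyError(f"missing values for: {sorted(missing)}")
    return template.format(**values)

values = {
    "audience": "all engineering staff",   # made-up value
    "event_name": "Annual Tech Conference",
    "date": "October 26th",
    "time": "9:00 AM",                     # made-up value
    "location": "the main auditorium",     # made-up value
    "tone": "professional and exciting",
}
print(build_prompt(TEMPLATE, values))
```

The initial prompt, `Write an email about the upcoming event.`, has no placeholders at all, which is another way of saying it gives the model nothing specific to work with.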

---

## Part 4: Conclusion and Reflection
In this markdown cell, provide a comprehensive summary of your findings and reflections based on this assignment.

* **Key Learnings from Evaluation**: What are the most important insights you gained about LLM behavior and prompt engineering through this hands-on evaluation?
* **Role of Parameters**: How do parameters like Temperature, Top-K, and Top-P influence the diversity and quality of outputs?
* **Challenges in Prompt Design**: What aspects of prompt design did you find most challenging?
* **Best Practices**: Based on your experience, what are 3-5 best practices for writing effective prompts for LLMs?
* **Ethical Considerations**: What ethical concerns (e.g., bias, misinformation, fairness) might arise when evaluating or deploying LLM outputs, and how can prompt design help mitigate them?
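
When reflecting on Top-K and Top-P, it may help to see the two filters applied to a toy distribution. The sketch below shows the general idea only: the token probabilities are invented, and the actual sampling pipeline behind Google AI Studio may combine or order these steps differently.

```python
def top_k_filter(probs: dict, k: int) -> dict:
    """Keep the k most probable tokens and renormalize."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

def top_p_filter(probs: dict, p: float) -> dict:
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    kept, cumulative = [], 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {tok: prob / total for tok, prob in kept}

# Hypothetical next-token distribution.
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}

print(top_k_filter(probs, 2))
print(top_p_filter(probs, 0.9))
```

Top-K fixes the number of candidates regardless of how confident the model is, while Top-P lets the candidate set shrink when one token dominates and grow when the distribution is flat.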

---

### Submission:
* Ensure all instructions are followed and outputs/analysis are clearly documented.
* Save your Jupyter Notebook as `[YourName]_Prompt_Evaluation_Assignment.ipynb`.