# SETUP

In [None]:
import pandas as pd
import google.generativeai as genai

genai.configure(api_key="ENTER API KEY")


In [8]:
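def get_dataset_string(file_path):
    """Load a CSV file and return its contents as one plain-text table."""
    try:
        df = pd.read_csv(file_path)
        return df.to_string(index=False)
    except FileNotFoundError:
        # This message is returned in place of the dataset, so check the
        # printed output below before sending any prompt.
        return "CSV file not found. "

file_path = 'dataset.csv'
data = get_dataset_string(file_path)
print(data)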

                                  Prompt Prompt_Type  Prompt_Length                                                                                                                                                             Response
   Where do you see yourself in 5 years?    Question             37                                                                            In 5 years, I see myself advancing in my career and possibly settling down with a family.
 Summarize the plot of Romeo and Juliet.     Command             39 Romeo and Juliet is a tragedy by William Shakespeare that tells the story of two young star-crossed lovers whose deaths ultimately reconcile their feuding families.
                  I'm feeling sad today.  Open-ended             22                                        I'm sorry to hear that you're feeling down. It's okay to have off days, and it's important to take time to care for yourself.
Why do you think education is important?    Question             40 

In [None]:
model = genai.GenerativeModel('gemini-2.0-flash-lite')
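
Every question cell in this notebook builds its request from the same template string. As a minimal sketch, that template could be wrapped in a helper. The name `build_full_prompt` and the `collapse_whitespace` option are assumptions introduced here for illustration, not part of the original notebook. The option exists because backslash-continued string literals keep the indentation of the following line inside the prompt text.

```python
def build_full_prompt(task: str, dataset: str, collapse_whitespace: bool = False) -> str:
    """Combine a task instruction with the dataset text using the notebook's template."""
    if collapse_whitespace:
        # Backslash-continued strings keep the next line's indentation;
        # this squeezes those runs of spaces (and any newlines) into single spaces.
        task = " ".join(task.split())
    return f"Given the following dataset from a CSV file, {task}\n\nDataset:\n{dataset}"


# Toy inputs for illustration only (not taken from dataset.csv).
demo = build_full_prompt("list   the \
        column names.", "col_a col_b\n1 2", collapse_whitespace=True)
print(demo)
```

Leave `collapse_whitespace` off for triple-quoted prompts whose line breaks carry structure, such as the few-shot templates later in the notebook.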

# QUESTIONS

Question 1: Column Exploration Challenge

"You suspect there’s a column that contains various categories or themes within prompt engineering (e.g., types of prompts or model behaviors). Craft a prompt that helps you find out what these categories might be."

Expected skill output: Schema inference, descriptive prompting

In [10]:
PROMPT = "Analyze the dataset and identify a column that categorizes the prompts by their function or purpose."


full_prompt = f"Given the following dataset from a CSV file, {PROMPT}\n\nDataset:\n{data}"

response = model.generate_content(full_prompt)
print(response.text)

Based on the dataset provided, the column that categorizes the prompts by their function or purpose is **Prompt_Type**.

Here's why:

*   **Prompt_Type** clearly labels each prompt into categories like "Question," "Command," and "Open-ended." These categories reflect the function or intent behind the prompt.

*   Other columns like "Prompt" (the actual text) and "Response" (the AI's reply) don't categorize by function.
*   "Prompt_Length" is not the category we are looking for.
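
The model's answer can be cross-checked locally. The sketch below uses only the standard library and a handful of rows copied from this notebook (the full CSV is not reproduced here). The heuristic is an assumption of this sketch: a non-numeric column whose values repeat is a likely category column.

```python
from collections import Counter

# Rows copied from the dataset preview and prompts shown in this notebook.
HEADER = ["Prompt", "Prompt_Type", "Prompt_Length"]
ROWS = [
    ("Where do you see yourself in 5 years?", "Question", 37),
    ("Summarize the plot of Romeo and Juliet.", "Command", 39),
    ("I'm feeling sad today.", "Open-ended", 22),
    ("Why do you think education is important?", "Question", 40),
    ("List the ingredients for making pizza.", "Command", 38),
]


def repeated_value_columns(header, rows):
    """Return {column: Counter} for non-numeric columns in which some value repeats."""
    result = {}
    for i, name in enumerate(header):
        values = [row[i] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            continue  # numeric columns are not treated as categories here
        counts = Counter(values)
        if len(counts) < len(values):  # at least one value occurs more than once
            result[name] = counts
    return result


candidates = repeated_value_columns(HEADER, ROWS)
print(candidates)
```

On this small sample only `Prompt_Type` qualifies, which agrees with the model's answer. On the full dataset, repeated prompts would also qualify, so a cap on the number of distinct values might be needed.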


Question 2: Detect the Outlier

"Some rows in the dataset may not align well with the rest in terms of a specific metric like accuracy, BLEU score, or token usage. Without directly referencing those metrics, craft a prompt that will help you detect anomalies in the dataset."

Expected skill: Implicit querying, understanding of outlier detection through prompting

In [None]:
PROMPT = "Examine the responses in the dataset. Are there any entries where the response seems \
          too long, or completely unrelated to the prompt, compared to the others?\
          If so, list out the rows and state the exact metric you used to categorise them as an anomaly.\
          Do not list out the rows that have no anomalies."

full_prompt = f"Given the following dataset from a CSV file, {PROMPT}\n\nDataset:\n{data}"

response = model.generate_content(full_prompt)
print(response.text)

Here's the analysis of the dataset, identifying anomalies based on the length of the response:

**Anomalies based on Response Length:**

*   **Row 12**
    *   **Metric:** The response to the prompt "What is your favorite color?" is just "My favorite color is blue."

**Justification**

*   All Open Ended responses have been classified as normal, since they are not bound by a set size of answer. 
    *   This also applies to Question type prompts.
*   The answer to the question, as the prompt already states "What is your favorite color?", a simple statement like this is the expected result.

*   **Row 121**
    *   **Metric:** The response to the prompt "What is your favorite color?" is just "My favorite color is blue."

**Justification**

*   All Open Ended responses have been classified as normal, since they are not bound by a set size of answer. 
    *   This also applies to Question type prompts.
*   The answer to the question, as the prompt already states "What is your favorite col
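
A deterministic baseline helps when judging what the model flags. The sketch below marks responses whose character length is far from the mean, using a z-score. Both the metric (response length) and the threshold are assumptions chosen for illustration; the responses are copied from rows shown in this notebook.

```python
from statistics import mean, pstdev


def flag_length_outliers(texts, threshold=2.0):
    """Return indices of texts whose length z-score exceeds the threshold."""
    lengths = [len(t) for t in texts]
    mu, sigma = mean(lengths), pstdev(lengths)
    if sigma == 0:
        return []  # all lengths identical, so nothing stands out
    return [i for i, n in enumerate(lengths) if abs(n - mu) / sigma > threshold]


# Responses copied from rows shown in this notebook.
responses = [
    "In 5 years, I see myself advancing in my career and possibly settling down with a family.",
    "I'm sorry to hear that you're feeling down. It's okay to have off days, and it's important to take time to care for yourself.",
    "That's great to hear! Good weather can really lift your spirits.",
    "I believe education is important because it empowers individuals and contributes to societal progress.",
]
print(flag_length_outliers(responses, threshold=1.5))
```

With only a few rows the standard deviation is dominated by any single extreme value, so the threshold would need tuning on the full dataset.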

Question 3: Temporal Trend Insight

"Assume the dataset includes timestamps or progression data. Design a prompt that could help you understand how the performance or focus of prompt engineering has evolved over time."

In [38]:
PROMPT = "analyse the timestamps for each entry and how the characteristics of prompts and their\
      corresponding responses have changed over time. \
    For example, have prompt lengths increased, or have certain types of prompts become more common recently?"

full_prompt = f"Given the following dataset from a CSV file, {PROMPT}\n\nDataset:\n{data}"

response = model.generate_content(full_prompt)
print(response.text)

Okay, let's analyze this dataset.  Since there are no timestamps, I'll assume the order in the data provided represents the "time" sequence.

**Overall Trends and Observations**

*   **Variety of Prompt Types:** The dataset includes a mix of Question, Command and Open-ended prompts. This suggests a variety of use cases for the AI.
*   **Repetitive Content:** A few prompts and responses are repeated frequently (e.g., "Why do you think education is important?", "Tell me a joke"). This might indicate specific areas of focus or a bias in the training data, or perhaps these are intentionally tested.
*   **Topic Diversity:** The prompts touch upon various topics, from personal feelings ("I'm feeling sad today") to technical subjects ("Explain quantum mechanics"), demonstrating versatility.
*   **Thematic Clustering:** Notice some related prompts appearing together. (e.g., a series of questions and commands related to food).

**Detailed Analysis by Type**

Here's how the characteristics evolv
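
The model points out that the dataset has no timestamps. A quick check of the header before writing the prompt avoids asking about a column that does not exist. This is a name-based heuristic only; the hint list and the fallback wording are assumptions of this sketch.

```python
# Substrings that suggest a time-related column (an assumed, non-exhaustive list).
TIME_HINTS = ("time", "date", "created")


def find_time_columns(columns):
    """Return column names that look time-related, judged by name alone."""
    return [c for c in columns if any(hint in c.lower() for hint in TIME_HINTS)]


# Column names as printed in the dataset preview above.
columns = ["Prompt", "Prompt_Type", "Prompt_Length", "Response"]
time_columns = find_time_columns(columns)

if time_columns:
    task = f"analyse how prompts and responses change over time using {time_columns}."
else:
    # No timestamp column: state the row-order assumption explicitly in the prompt.
    task = "treat the row order as the time sequence and describe how prompts and responses change."
print(task)
```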

Question 4: Who’s the Top Performer?

"Imagine this dataset tracks various LLMs or techniques. Construct a prompt that helps you determine which model or strategy consistently performs the best — without directly asking 'which model is best?'"

In [17]:
PROMPT = "Assume the rows are input and outputs of various LLMs.\
    Analyse the model responses and identify the prompt type or technique that consistently produces\
      the most detailed, relevant, and well-structured responses. \
        Provide examples of these high-performing prompts."

full_prompt = f"Given the following dataset from a CSV file, {PROMPT}\n\nDataset:\n{data}"

response = model.generate_content(full_prompt)
print(response.text)

Based on the provided dataset, the prompt type that consistently produces the most detailed, relevant, and well-structured responses is **Command**.

**Examples of High-Performing Prompts (Commands):**

*   "Summarize the plot of Romeo and Juliet."
*   "Describe the process of photosynthesis."
*   "List the ingredients for making pizza."
*   "Explain quantum mechanics."

**Reasoning:**

*   **Commands vs. Other Types:** Commands, in this dataset, consistently generated responses that directly answered the prompt by providing information in a structured manner. They delivered factual, concise explanations, lists, or summaries. Other prompt types, such as open-ended, led to less informative responses (e.g., the responses to "I'm feeling sad today" or "The weather is really nice."). Question prompts also frequently resulted in generic responses, such as "Why do you think education is important?".

*   **Structure and Detail:** Commands typically produced more detailed and structured respo

Question 5: Summarize a Class of Records

"There’s a category of prompts labeled under a common theme — like 'Chain-of-Thought' or 'Few-Shot'. Write a prompt that gives you a detailed overview of just this category."

In [18]:
PROMPT = "based on the 'Prompt' column, label the rows under a common category of prompts, such as 'Chain-of-Thought' or 'Few-Shot'.\
    provide a detailed summary of all the entries that have the same prompt type. \
    What are the common characteristics of these prompts and their responses? What is the overall purpose of this category?"

full_prompt = f"Given the following dataset from a CSV file, {PROMPT}\n\nDataset:\n{data}"

response = model.generate_content(full_prompt)
print(response.text)

Here's a breakdown of the prompt types, their common characteristics, and their overall purpose based on the provided dataset.

**1. Question**

*   **Summary of Entries:**
    *   Where do you see yourself in 5 years? (Repeated multiple times)
    *   Why do you think education is important? (Repeated multiple times)
    *   What are the benefits of exercise? (Repeated multiple times)
    *   How do you cook pasta? (Repeated multiple times)
    *   What is your favorite color? (Repeated multiple times)

*   **Common Characteristics:**
    *   They are phrased as questions.
    *   They require the language model to provide information or opinions.
    *   The answers can vary greatly in length and style depending on the specific question.
    *   The questions often seek information, opinions, or advice.

*   **Overall Purpose:**
    *   To elicit information, opinions, or guidance from the language model.
    *   To gauge the language model's knowledge and ability to formulate respon

Question 6: Column Mapping Without Metadata

"You don’t know what any of the columns are, but you want to understand which column contains numeric performance metrics. Design a prompt to help the LLM find and describe that column."

In [19]:
PROMPT = "Assume the column names are not known. Now examine the data in the dataset. Which column \
  appears to contain numerical values, and what do you think those numbers represent? Provide a \
    brief description of the column's likely purpose. If no such column exists, say so."

full_prompt = f"Given the following dataset from a CSV file, {PROMPT}\n\nDataset:\n{data}"

response = model.generate_content(full_prompt)
print(response.text)

Okay, I will analyze the data provided.

**Column with Numerical Values and Interpretation:**

The column that appears to contain numerical values is the **"Prompt_Length"** column.

**Likely Purpose:**

The numbers in the "Prompt_Length" column most likely represent the **number of characters or words** in the "Prompt" column's string.

**In summary, the "Prompt_Length" column's purpose is to provide a length measurement for each of the prompts.**
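
The model leaves open whether `Prompt_Length` counts characters or words. That can be settled locally on the rows visible in this notebook. The helper name `length_kind` is hypothetical; the prompts and recorded lengths are copied from the preview and from the examples used in later cells.

```python
# (Prompt, recorded Prompt_Length) pairs copied from this notebook.
SAMPLE = [
    ("Where do you see yourself in 5 years?", 37),
    ("Summarize the plot of Romeo and Juliet.", 39),
    ("I'm feeling sad today.", 22),
    ("Why do you think education is important?", 40),
    ("List the ingredients for making pizza.", 38),
]


def length_kind(prompt, recorded):
    """Classify a recorded length as a character count, a word count, or unknown."""
    if recorded == len(prompt):
        return "characters"
    if recorded == len(prompt.split()):
        return "words"
    return "unknown"


kinds = {length_kind(prompt, recorded) for prompt, recorded in SAMPLE}
print(kinds)
```

For these five rows the recorded value matches the character count, including spaces and punctuation.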



Question 7: Cross-column Reasoning

"You want to know if there's a correlation between the length of prompts and the scores they receive. Write a prompt that would help you explore this without asking for raw numbers."

In [22]:
PROMPT = "analyze the relationship between the `Prompt_Length` and the quality of the `Response` given by the LLM. \
  Find the patterns between the length and quality of the response.\
  For example: do longer prompts generate better or worse responses than shorter prompts?"

full_prompt = f"Given the following dataset from a CSV file, {PROMPT}\n\nDataset:\n{data}"

response = model.generate_content(full_prompt)
print(response.text)

Okay, I will analyze the dataset to find any patterns between the `Prompt_Length` and the quality of the `Response`.  It's important to note that without a clear, numerical "quality" score for the responses, my analysis will be based on subjective observation and general trends.  I'll try to look for:

*   **Relationship between length and response type/complexity:** Do longer prompts tend to be commands or questions?  Are they more likely to generate complex, nuanced responses or shorter, more factual ones?
*   **Relationship between length and response "quality" (subjective):**  Do longer prompts seem to elicit more comprehensive, detailed, or insightful responses?  Or do they sometimes lead to less focused or even rambling answers?
*   **Presence of a "sweet spot":**  Is there a prompt length that seems to generate the "best" responses, or is there no clear correlation?
*   **Other factors:** Could other variables like prompt type and the specific content of the prompts affect the q
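
As the model notes, the dataset has no numeric quality score. One rough stand-in is response length, which makes a correlation computable. The sketch below takes that proxy as an explicit assumption (length is not quality) and computes a Pearson coefficient by hand on rows copied from this notebook.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant input; reported as 0.0 in this sketch
    return cov / (sx * sy)


# (Prompt_Length, Response) pairs copied from rows shown in this notebook.
ROWS = [
    (37, "In 5 years, I see myself advancing in my career and possibly settling down with a family."),
    (22, "I'm sorry to hear that you're feeling down. It's okay to have off days, and it's important to take time to care for yourself."),
    (40, "I believe education is important because it empowers individuals and contributes to societal progress."),
    (38, "To make pizza, you'll need dough, tomato sauce, cheese, and your choice of toppings like pepperoni, mushrooms, and bell peppers."),
]
prompt_lengths = [length for length, _ in ROWS]
response_lengths = [len(response) for _, response in ROWS]  # proxy, not a quality score
r = pearson(prompt_lengths, response_lengths)
print(f"r = {r:.2f}")
```

Four rows are far too few to support a conclusion; the point is only to show a number the model's qualitative answer can be compared against once the full dataset is loaded.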

Question 8: Suggest an Optimization

"Based on the patterns in the dataset, design a prompt that encourages the model to suggest how prompt engineering techniques could be optimized — particularly those with suboptimal outcomes."

In [23]:
PROMPT = "identify a category or length of prompts that often leads to suboptimal outcomes or vague responses. \
        Suggest specific changes or optimizations to that category of prompts that could improve the quality of the responses.\
        Use examples from the dataset as a guide."

full_prompt = f"Given the following dataset from a CSV file, {PROMPT}\n\nDataset:\n{data}"

response = model.generate_content(full_prompt)
print(response.text)

Okay, I'll analyze the dataset you provided and identify a category of prompts that tend to elicit suboptimal responses. Then, I'll suggest specific changes and optimizations, using examples from the dataset.

**Category of Prompts Leading to Vague Responses:**

Based on the dataset, the "Open-ended" prompt type often leads to vague, generic, or overly simplistic responses. These prompts invite the model to share its feelings or opinions, which can be challenging for a language model. The responses frequently lack depth, originality, or specific details.

**Examples from the Dataset:**

*   **Prompt:** "I'm feeling sad today."
    *   **Response:** "I'm sorry to hear that you're feeling down. It's okay to have off days, and it's important to take time to care for yourself."
*   **Prompt:** "The weather is really nice."
    *   **Response:** "That's great to hear! Good weather can really lift your spirits."
*   **Prompt:** "The movie was amazing."
    *   **Response:** "I'm glad you enj

Question 9

"Several models in this dataset show close performance scores. Use chain-of-thought prompting to analyze why one slightly outperforms another, based on metadata like prompt type, length, or task." 

What it Tests : CoT prompting, Comparative reasoning across subtle differences

Candidate Task : Select two near-similar entries and use a CoT prompt to reason about differences in outcome. 

In [24]:
PROMPT = "analyze the performance of different prompts in it.\
      Judge the performance of a prompt by the clarity and relevance of its response.\
    Example 1:\
    Prompt: 'Why do you think education is important?'\
    Prompt_Type: Question\
    Prompt_Length: 40\
    Response: 'I believe education is important because it empowers individuals and contributes to societal progress.'\
\
    Example 2:\
    Prompt: 'Where do you see yourself in 5 years?'\
    Prompt_Type: Question\
    Prompt_Length: 37\
    Response: 'In 5 years, I see myself advancing in my career and possibly settling down with a family.'\
The responses for both these prompts are good. \
        Analyse step-by-step to explain why Example 1 might have a slightly better response than Example 2, \
        considering the `Prompt_Type` and `Prompt_Length`.\
Compare the content, length, and type of both prompts, and look for differences in the quality, depth, and applicability of the responses.\
Use the metadata (`Prompt_Type`, `Prompt_Length`) to connect the prompt's characteristics to the response's quality.\
    "

full_prompt = f"Given the following dataset from a CSV file, {PROMPT}\n\nDataset:\n{data}"

response = model.generate_content(full_prompt)
print(response.text)

Here's an analysis of the prompts and responses, focusing on clarity, relevance, and the influence of `Prompt_Type` and `Prompt_Length`:

**Overall Observations:**

*   **Question prompts** tend to elicit more thoughtful and informative responses, as they encourage the AI to explain or provide insight.
*   **Command prompts** generally receive concise, direct answers, suitable for tasks like listing ingredients or summarizing.
*   **Open-ended prompts** (e.g., "I'm feeling sad today") lead to empathetic, but often less content-rich, responses.
*   **Prompt Length** doesn't consistently correlate with the quality of the response, but well-structured prompts (regardless of length) typically provide better responses.
*   There is a strong tendency for responses to be consistent in quality for a specific prompt.

**Detailed Analysis (Example-Based):**

**Example 1: Better Response**

*   **Prompt:** "Why do you think education is important?"
    *   **Prompt\_Type:** Question
    *   **Pro

Question 10

"This dataset includes various prompt formats and their outcomes. Assume you are designing a new prompt technique. Given two successful examples, write a few-shot prompt that the model can generalize from to generate more high-performing prompts." 

What it Tests : Few-shot formatting and instruction construction, Understanding of task generalization 

Candidate Task : Extract 2–3 high-performing prompt examples and construct a few-shot template that could be reused. 

In [30]:
PROMPT = """ Analyse it and create new prompts that are clear and concise. Here are two examples of successful prompts and their corresponding attributes from a dataset.
      These prompts are considered successful because they elicit a direct and factual response.
Example 1:
Prompt: "Summarize the plot of Romeo and Juliet."
Prompt_Type: Command
Prompt_Length: 39
Response: "Romeo and Juliet is a tragedy by William Shakespeare that tells the story of two young star-crossed lovers whose deaths ultimately reconcile their feuding families."

Example 2:
Prompt: "List the ingredients for making pizza."
Prompt_Type: Command
Prompt_Length: 38
Response: "To make pizza, you'll need dough, tomato sauce, cheese, and your choice of toppings like pepperoni, mushrooms, and bell peppers."

Using these examples as a template, your task is to generate a new, high-performing prompt that follows the same "Command" style.
The new prompt should be similar in length and structure, and should be of command type, 
for example asking for a summary, list, or factual statement on a new topic.
Follow this structure:
New Prompt:
Prompt_Type: Command
Prompt_Length: [your generated length]
Prompt: '[your generated prompt]'
Response: '[Generate the response]'

Give 5 new prompts.
"""

full_prompt = f"Given the following dataset from a CSV file, {PROMPT}\n\nDataset:\n{data}"

response = model.generate_content(full_prompt)
print(response.text)

Here are five new, high-performing prompts based on the dataset, following the "Command" style:

New Prompt:
Prompt_Type: Command
Prompt_Length: 30
Prompt: 'List the steps for changing a car tire.'
Response: "To change a car tire, you'll need to: 1) Gather tools (spare tire, jack, wrench). 2) Loosen lug nuts. 3) Position the jack and raise the car. 4) Remove lug nuts and the flat tire. 5) Mount the spare tire and tighten lug nuts. 6) Lower the car and fully tighten lug nuts."

New Prompt:
Prompt_Type: Command
Prompt_Length: 28
Prompt: 'Summarize the plot of Hamlet.'
Response: "Hamlet is a tragedy about a Danish prince seeking revenge for his father's murder, which leads to widespread death and destruction in the royal court."

New Prompt:
Prompt_Type: Command
Prompt_Length: 28
Prompt: 'Explain the water cycle.'
Response: "The water cycle involves: 1) Evaporation (water turns to vapor). 2) Condensation (vapor turns to liquid). 3) Precipitation (water falls). 4) Collection (water gathers

Question 11

"You're tasked with generating a prompt in the same style as top scoring ones. Use few-shot prompting to help the LLM learn the 'style' of successful prompts, then apply it to a new topic." 

What it Tests : Pattern abstraction , Prompt consistency 

Candidate Task : Identify top prompts and create a prompt to generate new ones in the same format. 

In [33]:
PROMPT = """ Analyze the prompts and based on the prompts that perform well, generate a new prompt in the same style as the successful examples provided below.
Here are two examples of high-performing prompts from the dataset. Go through these prompts and identify the common pattern
and style they are written in.

Example 1:
Prompt: "Summarize the plot of Romeo and Juliet."
Prompt_Type: Command
Prompt_Length: 39
Response: "Romeo and Juliet is a tragedy by William Shakespeare that tells the story of two young star-crossed lovers whose deaths ultimately reconcile their feuding families."

Example 2:
Prompt: "List the ingredients for making pizza."
Prompt_Type: Command
Prompt_Length: 38
Response: "To make pizza, you'll need dough, tomato sauce, cheese, and your choice of toppings like pepperoni, mushrooms, and bell peppers."

Your task is to generate a new prompt and response that follow this same style. 
The new prompt should have a different theme than these but needs to be of the same style and format.

Follow this structure and give 3 new prompts:
New Prompt:
Prompt_Type: Command
Prompt_Length: [your generated length]
Prompt: '[your generated prompt]'
Response: '[Generate the response]'
"""

full_prompt = f"Given the following dataset from a CSV file, {PROMPT}\n\nDataset:\n{data}"

response = model.generate_content(full_prompt)
print(response.text)

Here are three new prompts generated in a similar style to the high-performing prompts provided in the dataset:

New Prompt 1:
Prompt_Type: Command
Prompt_Length: 42
Prompt: "List the steps involved in changing a car's flat tire."
Response: "To change a flat tire, you'll need to loosen the lug nuts, jack up the car, remove the flat, put on the spare, and tighten the lug nuts."

New Prompt 2:
Prompt_Type: Command
Prompt_Length: 43
Prompt: "Explain the basic principles of supply and demand in economics."
Response: "Supply and demand describes how the availability of a product and the desire for it influence its price in a market setting."

New Prompt 3:
Prompt_Type: Command
Prompt_Length: 38
Prompt: "Summarize the plot of William Shakespeare's Hamlet."
Response: "Hamlet is about the prince seeking revenge for his father's murder and it's the most famous tragedy of Shakespeare's works."
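
The `Prompt_Length` values in the generated entries above can be checked against the prompts themselves. The sketch below parses the output with a regular expression and recomputes each length as a character count, which is how the dataset's own sample rows appear to be measured. The pattern assumes the exact layout printed above, and the responses are omitted for brevity.

```python
import re

# Model output copied from the cell above (Response lines omitted).
MODEL_OUTPUT = '''New Prompt 1:
Prompt_Type: Command
Prompt_Length: 42
Prompt: "List the steps involved in changing a car's flat tire."

New Prompt 2:
Prompt_Type: Command
Prompt_Length: 43
Prompt: "Explain the basic principles of supply and demand in economics."

New Prompt 3:
Prompt_Type: Command
Prompt_Length: 38
Prompt: "Summarize the plot of William Shakespeare's Hamlet."
'''

# A Prompt_Length line followed directly by a double-quoted Prompt line.
PATTERN = re.compile(r'Prompt_Length:\s*(\d+)\s*\nPrompt:\s*"(.*)"')


def check_lengths(text):
    """Return (prompt, reported, actual) for entries whose reported length is wrong."""
    mismatches = []
    for reported, prompt in PATTERN.findall(text):
        if int(reported) != len(prompt):
            mismatches.append((prompt, int(reported), len(prompt)))
    return mismatches


mismatches = check_lengths(MODEL_OUTPUT)
for prompt, reported, actual in mismatches:
    print(f"reported {reported}, actual {actual}: {prompt}")
```

All three reported lengths differ from the recomputed character counts, which suggests that numeric fields in generated rows are better computed in code than requested from the model.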



Question 12

"You don’t know the column headers of this dataset. Use chain-of-thought reasoning to deduce what the likely columns are and what kind of data they contain." 

What it Tests : Chain-of-thought reasoning from values , Schema inference 

Candidate Task : Prompt the model to deduce schema from a few row values only.

In [35]:
PROMPT = """Assume you did not have the column headers in it, and read only the first 10 rows of the dataset.
From these rows and values you can see that:
The 1st value in each row looks like a question or a statement.
The 2nd value is one of these: "Question", "Command", or "Open-ended".
The 3rd values are positive numbers ranging from roughly 20 to 40.
The 4th values are phrases like "In 5 years..." and "Romeo and Juliet is..." which look like answers.

Based on this analysis, infer what the data represents and describe each of the 1st, 2nd, 3rd and 4th values.
Give a general description of what these column values are.
"""

full_prompt = f"Given the following dataset from a CSV file, {PROMPT}\n\nDataset:\n{data}"

response = model.generate_content(full_prompt)
print(response.text)

Here's an analysis of the dataset, along with descriptions for each column:

**General Description of the Columns:**

*   **Column 1: Prompt (Question/Statement):** This column contains the core of the interaction – a question, a command, or a statement. These are the inputs or initiations for a response.
*   **Column 2: Prompt_Type:** This column categorizes the type of prompt provided in Column 1.
*   **Column 3: Prompt_Length:** This column contains the length of each question or prompt.
*   **Column 4: Response:** This column contains the answer or the feedback.

**Analysis of the First 10 Rows:**

| **Column 1: Prompt (Question/Statement)** | **Column 2: Prompt\_Type** | **Column 3: Prompt\_Length** | **Column 4: Response**                                                                    | Description                                                                   |
| :---------------------------------------- | :------------------------- | :------------------------- | :-------
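
The prompt asks the model to ignore the headers, but the full table, header row included, is still sent. To test schema inference more strictly, the header could be removed before prompting. The standard-library sketch below does this; the function name and the ` | ` separator are assumptions, and the two CSV rows are copied from the dataset preview.

```python
import csv
import io

# Two rows copied from the dataset preview, written as CSV text.
SAMPLE_CSV = '''Prompt,Prompt_Type,Prompt_Length,Response
Where do you see yourself in 5 years?,Question,37,"In 5 years, I see myself advancing in my career and possibly settling down with a family."
I'm feeling sad today.,Open-ended,22,"I'm sorry to hear that you're feeling down. It's okay to have off days, and it's important to take time to care for yourself."
'''


def headerless_preview(csv_text, n_rows=10):
    """Return the first n_rows data rows as text, with the header row dropped."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    rows = [row for _, row in zip(range(n_rows), reader)]
    return "\n".join(" | ".join(row) for row in rows)


preview = headerless_preview(SAMPLE_CSV)
print(preview)
```

With pandas already loaded, `df.head(10).to_string(index=False, header=False)` should give a comparable headerless table.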

  

Question 13

"Design a prompt that not only analyzes failing strategies but teaches the model how to improve based on similar successful examples." 

What it Tests : Merged few-shot + CoT , Teaching-through-prompting 

Candidate Task : Provide few examples of poor-performing rows, compare with high-performing ones, and generate improvement strategy. 

In [37]:
PROMPT = """Analyse the data and the prompt performance and provide a strategy for improvement.
Here is a hypothetical example of a poor-performing prompt and its response, followed by a high-performing example for comparison.
**Poor-Performing Example:**

Example 1:
Prompt: "Tell me about cars."
Prompt_Type: Open-ended
Prompt_Length: 17
Response: "Cars are a type of vehicle used for transportation."

This response is considered poor because it is short, generic, and lacks detail: other vehicles are also used for transportation.
The response should also include aspects related to the design and looks of a car. 

Example 2:
Prompt: "Explain technology."
Prompt_Type: Open-ended
Prompt_Length: 20
Response: "Technology refers to tools and machines that help people."
Why It's Poor:
Overly simplistic and vague.
Lacks examples or categories.
Doesn't consider the scope of modern technology (e.g., AI, smartphones, etc.)

Example 3:
Prompt: "Write something about food."
Prompt_Type: Open-ended
Prompt_Length: 25
Response: "Food is something we eat to live."
Why It's Poor:
Fails to add value: doesn't mention types, culture, taste, or nutrition.
Extremely basic and redundant information.

**High-Performing Example:**

Example 1:
Prompt: "Why do you think education is important?"
Prompt_Type: Question
Prompt_Length: 40
Response: "I believe education is important because it empowers individuals and contributes to societal progress."
This response is considered high-performing because it's detailed, insightful, and goes beyond a simple definition.
It is a specific question that asks for an opinion or a deeper analysis. It's longer and more detailed, which provides better context.

Example 2:
Prompt: "How has the internet changed communication in the last decade?"
Prompt_Type: Question
Prompt_Length: 52
Response: "The internet has made communication faster, more global, and accessible, transforming both personal interactions and professional collaboration."
Why It's Good:
Focused, time-bound, and analytical.
Encourages specific, multifaceted answers.


Now analyze the difference between these examples of poor and high performing prompts and then propose a strategy for improving any poor-performing prompts.
What should we keep in mind while writing prompts so that the responses are well written?
For example: To improve the "Tell me about cars" prompt, we should make it more specific. Instead of asking for a general summary, we should ask a focused question.
Based on this, provide a new, improved version of the poor-performing prompt and explain the strategy you used to make it better.
"""

full_prompt = PROMPT  # the examples are embedded in the prompt; the dataset text is not appended here

response = model.generate_content(full_prompt)
print(response.text)

## Analysis of Poor vs. High-Performing Prompts and Improvement Strategy

**Key Differences Between Poor and High-Performing Prompts:**

The primary difference lies in the **specificity and depth** of the prompt.

*   **Poor Prompts:**
    *   Are too **broad and general**.
    *   Often ask for a simple definition or a superficial overview.
    *   Result in short, generic, and uninformative responses.
    *   Lack clear direction for the AI, leading to basic answers.
    *   Use passive language and dont specify the type of response or analysis expected.

*   **High-Performing Prompts:**
    *   Are **specific and focused**, targeting a particular aspect or angle.
    *   Often framed as questions or requests for analysis, comparison, or explanation.
    *   Encourage more detailed and insightful responses.
    *   Provide context and direction, guiding the AI towards a more complete and informative answer.
    *   Use active language which pushes the AI to take action.

**General St