# 2209261 Basic Programming NLP
## Lab 12 : NLP (Part II Large Language Model)
## Done by : 6730084521 Chatrphol Ovanonchai

# 1. Generate a Jokes Dataset with an LLM

Write a Python script that calls the Gemini API to generate a small jokes dataset based on a user-provided topic and number of jokes, then returns the results as a pandas DataFrame.

## Functional Requirements
1. **Inputs**
   - `topic` (string), e.g., `"computers"`, `"Thai food"`.
   - `n_jokes` (int), e.g., `20`.

2. **LLM Call (Gemini API)**
   - Prompt Gemini to generate `n_jokes` short, family-friendly jokes about the given topic.
   - Ask Gemini to return the output in **strict JSON** format with the following structure.

3. **Expected Output Schema**
   | Column | Type | Description |
   |---------|------|--------------|
   | id | int | Index number (1..n) |
   | topic | str | The topic used |
   | joke | str | The joke text |

In [24]:
# install libraries
!pip install -q -U google-generativeai

In [25]:
import google.generativeai as genai
import os
import pandas as pd
import json

api_key = "AIzaSyBoZ3ltxyq3IZKksNGB19SvNcH6S2nPRyQ"

genai.configure(api_key=api_key)
model = genai.GenerativeModel("gemini-2.5-flash-lite")

prompt = "Explain the difference between a Generative model and a Discriminative model in simple terms."

response = model.generate_content(prompt)
print(response.text)

Let's imagine you want to teach a computer to distinguish between pictures of cats and dogs.

**Generative Model: The "Artist"**

Think of a generative model as an **artist** who learns how to **create** new pictures of cats and dogs.

*   **What it learns:** It tries to understand *what makes a cat a cat* and *what makes a dog a dog*. It learns the typical features of each, like the shape of their ears, eyes, nose, fur patterns, and body structure. It's essentially learning the **underlying distribution** of data for each category.
*   **How it works:** It learns the probability of seeing certain pixels or features *given* that it's a cat, and the probability of seeing certain pixels or features *given* that it's a dog.
*   **What it can do:**
    *   **Generate new examples:** It can create entirely new, realistic-looking pictures of cats and dogs that it has never seen before. It can "imagine" what a new cat or dog would look like.
    *   **Classify:** Since it understands what cat

In [26]:
## Optional : Tuning output (TEst with prompt)
prompt = "Explain history of department of Elecrial Engineering , Chulalongkorn University "

# Define the generation configuration
generation_config = {
    "temperature": 0.7,  # A higher temperature for creativity
    "max_output_tokens": 300 # Keep the story relatively short
}

# Make the API call with the configuration
try:
    response = model.generate_content(
        prompt,
        generation_config=generation_config
    )
    print(response.text)
except Exception as e:
    print(f"An error occurred: {e}")

The Department of Electrical Engineering at Chulalongkorn University boasts a rich and foundational history, deeply intertwined with the development of electrical engineering education and practice in Thailand. Here's a breakdown of its evolution:

**Early Beginnings and the Genesis of Engineering Education (Pre-1940s):**

* **The Seeds of Engineering:** While not a dedicated Electrical Engineering department initially, the concept of engineering education at Chulalongkorn University began to take root in the early 20th century. The university was established in 1917, and the need for skilled professionals in various fields, including technical ones, was recognized.
* **The "School of Engineering" (1930s):** The formal establishment of an engineering faculty, or "School of Engineering," was a crucial step. This broad faculty encompassed various engineering disciplines, laying the groundwork for specialization.

**The Birth of Electrical Engineering as a Distinct Discipline (1940s):**



## Define function for joke generation

### Subtask:
Create a Python function that takes the topic and number of jokes as input, calls the Gemini API to generate jokes in JSON format, and returns the JSON response.


**Reasoning**:
Define the Python function to generate jokes using the Gemini API, construct the prompt, call the API, extract the JSON response, and return it.



In [27]:
# import json parse libraries
import pandas as pd
from pandas import json_normalize

def generate_jokes_from_gemini(topic: str, n_jokes: int):

    # Step 1 : create prompt f-string 
    prompt = f"Generate jokes about {topic} with quantity of {n_jokes} jokes , need it in JSON format having id , topics (value = {topic}) and joke keys"
    
    # Step 2 : Tuning output with generation_config
    generation_config = {
        "temperature": 0.72,  # A higher temperature for creativity
        "max_output_tokens": 500 , # Keep the story relatively short
        "response_mime_type": "application/json" # return JSON Type
    }
    
    # Step 3 : generate content with tuned output
    response = model.generate_content(prompt , generation_config = generation_config)
    
    # Step 4 : return as response.text (JSON Format)
    data = response.text
    return data # JSON String

def parse_jokes_json_to_dataframe(jokes_json):
    
    # Step 1 : loads JSON String to list/dict
    jokes_data = json.loads(jokes_json)         # string ‚Üí Python list/dict
    
    # Step 2 ; using json_normalize to convert string to dataframe
    output_df = json_normalize(jokes_data, sep="_")    # flatten JSON
    
    # Step 3 : return dataframe
    return output_df


In [28]:
topic = "animals"
n_jokes = 5

jokes_json = generate_jokes_from_gemini(topic, n_jokes)
print(jokes_json)

jokes_df = parse_jokes_json_to_dataframe(jokes_json)

display(jokes_df)

[
  {
    "id": 1,
    "topics": "animals",
    "joke": "Why don't scientists trust atoms? Because they make up everything! (Just kidding, that's a science joke, here's an animal one: What do you call a lazy kangaroo? Pouch potato!)"
  },
  {
    "id": 2,
    "topics": "animals",
    "joke": "What do you call a bear with no teeth? A gummy bear!"
  },
  {
    "id": 3,
    "topics": "animals",
    "joke": "Why did the scarecrow win an award? Because he was outstanding in his field! (Okay, another animal one: What do you get when you cross a snowman and a vampire? Frostbite!)"
  },
  {
    "id": 4,
    "topics": "animals",
    "joke": "What do you call a fish with no eyes? Fsh!"
  },
  {
    "id": 5,
    "topics": "animals",
    "joke": "Why was the elephant kicked out of the swimming pool? He kept dropping his trunks!"
  }
]


Unnamed: 0,id,topics,joke
0,1,animals,Why don't scientists trust atoms? Because they...
1,2,animals,What do you call a bear with no teeth? A gummy...
2,3,animals,Why did the scarecrow win an award? Because he...
3,4,animals,What do you call a fish with no eyes? Fsh!
4,5,animals,Why was the elephant kicked out of the swimmin...


In [29]:
topic = "science"
n_jokes = 3

jokes_json = generate_jokes_from_gemini(topic, n_jokes)
jokes_df = parse_jokes_json_to_dataframe(jokes_json)

display(jokes_df)

Unnamed: 0,id,topics,joke
0,1,science,Why did the physicist break up with the biolog...
1,2,science,What do you call a lazy kangaroo? Pouch potato!
2,3,science,Why don't scientists trust atoms? Because they...


# Task 2 ‚Äî Classify Sentiment with Gemini (LLM)

##  Goal
Use the **Gemini API** to classify the sentiment of social-media messages from the **`wisesight_sentiment`** dataset, then evaluate model performance.

---

## Task 2.1 ‚Äî Sentiment Classification with LLM
1. Call the **Gemini API** to classify sentiment for each message in the dataset.
2. **Batch processing recommended**:  
   - Send messages in batches (e.g., **20 rows at a time**) to reduce overhead.
3. The model‚Äôs **output must contain only the sentiment label**  
   - No explanation, reasoning, or extra text.
   - Example expected outputs:  
     - `"positive"`  
     - `"neutral"`  
     - `"negative"`
4. Store the predicted sentiment label in a **new column** in your DataFrame (e.g., `pred_sentiment`).

---

## Task 2.2 ‚Äî Evaluate Performance
Write a function to compute the following metrics:

### **Accuracy**
Measures the proportion of correctly predicted labels.


In [32]:
df

Unnamed: 0,texts,category,pred_sentiment
0,‡∏Å‡∏π‡∏à‡∏∞‡πÑ‡∏õ‡∏î‡∏π‡∏î‡πÉ‡∏ô‡πÄ‡∏£‡∏∑‡∏≠‡∏î‡∏≥‡∏ô‡πâ‡∏≥‡∏Ç‡∏≠‡∏á‡∏ô‡∏≤‡∏¢‡∏Å üòÖ,neu,
1,‡πÄ‡∏ï‡∏£‡∏µ‡∏¢‡∏°‡∏ï‡∏±‡∏ß‡πÉ‡∏´‡πâ‡∏û‡∏£‡πâ‡∏≠‡∏°!! ‡∏ù‡∏∂‡∏Å‡∏£‡πâ‡∏≠‡∏á‡πÄ‡∏û‡∏•‡∏á‡πÉ‡∏´‡πâ‡∏Ñ‡∏£‡∏ö ‡πÅ‡∏•‡πâ‡∏ß‡∏°‡∏≤‡∏û‡∏ö...,neu,
2,‡∏≠‡∏∏‡∏ï‡∏£‡∏î‡∏¥‡∏ï‡∏ñ‡πå‡∏°‡∏µ‡∏´‡∏°‡πâ‡∏≠‡∏ô‡πâ‡∏≥‡∏ã‡∏∏‡∏õ4‡∏ä‡πà‡∏≠‡∏á‡πÑ‡∏´‡∏°‡∏Ñ‡πà‡∏∞,neu,
3,ShowDC ‡∏á‡∏≤‡∏ô‡∏î‡∏µ ‡πÄ‡∏£‡∏≤‡πÄ‡∏Ñ‡∏¢‡∏¢‡∏¢üëçüëçüëç,pos,
4,‡∏õ‡∏õ‡∏õ‡∏õ.)),neu,
...,...,...,...
95,‡∏Ç‡∏ô‡πÄ‡∏™‡∏î‡πÑ‡∏á,neu,
96,‡∏ï‡πâ‡∏≠‡∏á‡∏£‡∏∏‡πà‡∏ô‡∏ô‡∏µ‡πâ‡πÄ‡∏•‡∏¢‡∏Ñ‡∏£‡∏±‡∏ö Honda Civic ‡∏°‡∏±‡∏ô‡πÄ‡∏õ‡πá‡∏ô‡∏£‡∏ñ‡∏ó‡∏µ‡πà‡∏°‡∏µ‡πÄ...,pos,
97,‡πÄ‡∏™‡∏µ‡∏¢‡πÄ‡∏ß‡∏•‡∏≤,neg,
98,‡∏™‡∏á‡∏™‡∏≤‡∏£‡πÑ‡∏≠‡∏ï‡∏±‡∏ß‡πÄ‡∏•‡πá‡∏Å‡∏≠‡∏∞‡∏î‡∏¥,neu,


In [33]:
def classify_sentiment_batch(texts):
    """
    Send a batch of texts to Gemini and return a list of sentiment labels.
    Output must be ONLY: neg / neu / pos
    """

    prompt = f"""
Classify sentiment for each message.
Return ONLY one label per line using: pos, neu, neg.
NO explanation.

Messages:
{texts}
"""

    response = model.generate_content(prompt)
    labels = response.text.strip().split("\n")
    labels = [x.strip().lower() for x in labels]

    return labels

batch_size = 20
preds = []
texts = df["texts"].astype(str).tolist()

for i in range(0, len(texts), batch_size):
    batch = texts[i:i+batch_size]
    preds.extend(classify_sentiment_batch(batch))

df["predicted_labels"] = preds

def calculate_accuracy(true_col, pred_col):
    correct = (true_col == pred_col).sum()
    total = len(true_col)
    return correct / total

acc = calculate_accuracy(df["category"], df["predicted_labels"])
print("Accuracy:", acc)

Accuracy: 0.62
