# Module 12: Prompt Engineering

This notebook is the submission for the graded exercise (activity 2) of the Prompt Engineering module.

## Step 0: first look at the data

First, we load the data.

In [1]:
import pandas as pd
import os

df = pd.read_csv("videogames_reviews.csv")
df.head()

Unnamed: 0.1,Unnamed: 0,Contenido,Valoración,Recomendado_binario
0,0,2 marzo so bad,No recomendado,0
1,1,10 febrero actualmente recomiendo juego contab...,No recomendado,0
2,2,9 febrero increíblemente gracioso ver cómo cdp...,No recomendado,0
3,3,the world in this game is extremely static the...,No recomendado,0
4,4,zero replayability i finished this game in abo...,No recomendado,0


In [3]:
print("Null values per column:")
print(df.isnull().sum())

Null values per column:
Unnamed: 0               0
Contenido              288
Valoración               0
Recomendado_binario      0
dtype: int64


Next, we clean the data.

In [16]:
# Drop rows with a missing review; .copy() avoids a possible SettingWithCopyWarning when a column is added later
df_clean = df.dropna(subset=["Contenido"]).copy()

## Step 1: keep the longest reviews (reducing the universe)

The first step is to reduce the 20k samples to a number that is manageable.

We keep the 100 longest reviews, measured in characters.

In [None]:
df_clean['content_length'] = df_clean['Contenido'].str.len()
df_top_100 = df_clean.nlargest(100, 'content_length').reset_index(drop=True)
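
As a small illustration of the selection above, the following sketch applies the same `str.len()` plus `nlargest` pattern to a toy DataFrame. The sample reviews and the `top_n` value are made up for the example; only the column name `Contenido` comes from the dataset.

```python
import pandas as pd

# Toy data (hypothetical reviews, not taken from the dataset)
toy = pd.DataFrame({"Contenido": ["so bad", "great story, weak combat", "ok"]})

# Character length of each review
toy["content_length"] = toy["Contenido"].str.len()

# Keep the top_n longest reviews, longest first
top_n = 2
longest = toy.nlargest(top_n, "content_length").reset_index(drop=True)
print(longest["Contenido"].tolist())
```

Since `str.len()` counts characters, a review made of one repeated character can rank among the longest, which is why a relevance filter is applied in the next step.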

We drop the irrelevant columns and rename the remaining ones to English (more standard).

In [189]:
df_top_100 = df_top_100.drop(columns=["Unnamed: 0", "Valoración", "content_length"], errors="ignore")
df_top_100 = df_top_100.rename(columns={"Recomendado_binario": "is_recomended", "Contenido": "content"})
df_top_100.head()

Unnamed: 0,content,is_recomended
0,suiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii...,1
1,this was probably my first preorder i felt tha...,1
2,oh well admittedly its difficult for to write ...,0
3,oh well admittedly its difficult for to write ...,0
4,i know many will handwave away any criticisms ...,0


## Step 2: relevance classification

As we can see, the first review could be considered not relevant, since it provides no useful information.

We use a text-generation model with prompting to classify the reviews.

In [181]:
from transformers import pipeline
import torch

model_id = "Qwen/Qwen2.5-1.5B-Instruct"

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

Loading weights: 100%|██████████| 338/338 [00:02<00:00, 117.07it/s, Materializing param=model.norm.weight]                              


In [183]:
from tqdm import tqdm

prompt = """You are an elite content moderator for a gaming forum. 
Your task is to classify user reviews into RELEVANT or SPAM.

### DEFINITIONS:
- RELEVANT: Meaningful opinions about game mechanics, bugs, story, developers, or comparisons. Even if the tone is toxic or negative, if it discusses the game, it is RELEVANT.
- SPAM: Nonsense text, character repetition (e.g., "suiiii"), keyboard smashing, or single-word shouts that provide no feedback.

### EXAMPLES:

Review: "The physics engine in this game is broken. I fell through the floor five times in an hour. Trash devs."
Result: RELEVANT

Review: "suiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii"
Result: SPAM

Review: "This game is a shovelware imitation of a game that never existed, a total garbagefire."
Result: RELEVANT

Review: "asdfghjklasdfghjkl!!!"
Result: SPAM

Review: "best game ever 10/10"
Result: RELEVANT

Review: "looooooooooooooooooooooooooooooooooooool"
Result: SPAM

### CURRENT REVIEW TO CLASSIFY:
Review: {input_text}
Result:"""

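results = []
for i, row in df_top_100.iterrows():

    # Truncate long reviews so the prompt stays short
    text = row["content"]
    reduced_text = text[:1000]
    messages = [{"role": "user", "content": prompt.format(input_text=reduced_text)}]

    # Generate only two new tokens; the label is read from this short reply
    output = pipe(messages, max_new_tokens=2, temperature=0.1)
    # With chat-style input, the last message holds the assistant reply
    result = output[0]["generated_text"][-1]["content"].strip().upper()
    # Anything that does not contain SPAM is kept as RELEVANT
    if "SPAM" in result:
        result = "SPAM"
    else:
        result = "RELEVANT"

    results.append(result)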

Both `max_new_tokens` (=2) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
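
The few-shot block of the prompt can also be assembled from a list of labelled examples, which makes it easier to add or remove examples while tuning. This is only a sketch: the helper name `build_few_shot_prompt` and the shortened `header` text are illustrative and not part of the notebook.

```python
# Hypothetical helper: builds a few-shot classification prompt from examples.
EXAMPLES = [
    ("The physics engine in this game is broken.", "RELEVANT"),
    ("suiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii", "SPAM"),
    ("best game ever 10/10", "RELEVANT"),
]

def build_few_shot_prompt(header: str, examples, review: str) -> str:
    """Return header + labelled examples + the review to classify."""
    parts = [header.strip(), "", "### EXAMPLES:", ""]
    for text, label in examples:
        parts.append(f'Review: "{text}"')
        parts.append(f"Result: {label}")
        parts.append("")
    parts.append("### CURRENT REVIEW TO CLASSIFY:")
    parts.append(f"Review: {review}")
    parts.append("Result:")
    return "\n".join(parts)

header = "Classify user reviews into RELEVANT or SPAM."
print(build_few_shot_prompt(header, EXAMPLES, "zero replayability"))
```

Keeping the examples in a list also makes it simple to check that both labels are represented.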

In [185]:
from collections import defaultdict

counts = defaultdict(int)
for res in results:
    counts[res] += 1

print(counts)

defaultdict(<class 'int'>, {'SPAM': 1, 'RELEVANT': 99})
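
As a cheap cross-check on the model's labels, a rule-based heuristic can flag the most obvious case named in the prompt: long runs of a single repeated character. The function name `looks_like_char_spam` and the threshold of 10 repetitions are assumptions chosen for illustration, not values tuned on this dataset.

```python
import re

# Assumed threshold: the same character repeated 10 or more times in a row
REPEAT_RUN = re.compile(r"(.)\1{9,}")

def looks_like_char_spam(text: str) -> bool:
    """True if the text contains a long run of one repeated character."""
    return REPEAT_RUN.search(text) is not None

print(looks_like_char_spam("su" + "i" * 20))
print(looks_like_char_spam("zero replayability i finished this game"))
```

Looking at the rows where the heuristic and the model disagree is a quick way to spot possible misclassifications.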


Only one review is marked as SPAM, so we remove it.

In [190]:
df_top_100["cat"] = results
df_top_100 = df_top_100[df_top_100["cat"] != "SPAM"]

df_top_100.info()

<class 'pandas.DataFrame'>
RangeIndex: 99 entries, 1 to 99
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   content        99 non-null     str  
 1   is_recomended  99 non-null     int64
 2   cat            99 non-null     str  
dtypes: int64(1), str(2)
memory usage: 2.4 KB


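## Step 3: data extraction

The goal now is to extract structured information from each review. We take the format required by the assignment statement as the reference.

We use the same text-generation model, this time aiming for Instructor-style structured output: a Pydantic model describes the schema, although the extraction in this notebook is done with a plain prompt and the `Output` class is not used further.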

In [None]:
from pydantic import BaseModel, Field

class Output(BaseModel):
    sentiment: str = Field(description="Sentiment extracted")
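
The `Output` model above declares only `sentiment`. A fuller schema covering the four fields used in the extraction prompt of this step could look like the sketch below. The class name `ReviewExtraction` and the `parse_extraction` helper are illustrative; mapping the string `"null"` to `None` is an assumption based on the model returning `'difficulty': 'null'` in the results.

```python
import json
from typing import List, Literal, Optional

from pydantic import BaseModel, Field, ValidationError

class ReviewExtraction(BaseModel):
    sentiment: Literal["positive", "negative"]
    difficulty: Optional[Literal["easy", "difficult"]] = None
    positive_tags: List[str] = Field(default_factory=list)
    negative_tags: List[str] = Field(default_factory=list)

def parse_extraction(raw: str) -> Optional[ReviewExtraction]:
    """Parse one model output; return None if it is not valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # Assumption: treat the literal string "null" as a missing value
    if data.get("difficulty") == "null":
        data["difficulty"] = None
    try:
        return ReviewExtraction(**data)
    except ValidationError:
        return None

sample = '{"sentiment": "positive", "positive_tags": ["story"], "negative_tags": [], "difficulty": "null"}'
print(parse_extraction(sample))
```

Restricting `sentiment` and `difficulty` to the values listed in the prompt means an unexpected value is rejected instead of silently ending up in the final CSV.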

In [213]:
from transformers import pipeline
import torch

pipe2 = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-1.5B-Instruct",
    dtype=torch.float16,
    device_map="auto"
)

Loading weights: 100%|██████████| 338/338 [00:04<00:00, 82.11it/s, Materializing param=model.norm.weight]                               
Some parameters are on the meta device because they were offloaded to the cpu and disk.


In [220]:
prompt = """<|im_start|>system
You are a specialized game analyst. Extract information from game reviews into a strict JSON format.
JSON fields:
- sentiment: sentiment extracted from the user review. Possible values: positive, negative.
- difficulty: difficulty of the game extracted from review. Possible values: easy, difficult, null (in case it is not defined).
- positive_tags: tags with positive aspects extracted from user review about game(no repeat values).
- negative_tags: tags with negative aspects extracted from user review about game(no repeat values).
Only return the JSON object, nothing else.<|im_end|>
<|im_start|>user
Review: "{input_text}"

JSON Schema:
{{
  "sentiment": "string",
  "positive_tags": ["string"],
  "negative_tags": ["string"],
  "difficulty": "string"
}}
<|im_end|>
<|im_start|>assistant
{{"""
  
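results_2 = []
for i, row in df_top_100.iterrows():

    # Process only a small sample. Note that `i` is the index label, not the
    # row position, so the cutoff depends on the labels left after filtering.
    if i == 10:
        break

    text = row["content"]
    reduced_text = text[:800]
    _prompt = prompt.format(input_text=reduced_text)
    # return_full_text=False returns only the continuation after the prompt
    output = pipe2(_prompt, max_new_tokens=200, temperature=0.7, return_full_text=False)
    # Restore the opening brace that was prefilled at the end of the prompt
    result = "{" + output[0]["generated_text"]
    results_2.append(result)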


Both `max_new_tokens` (=200) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
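
The doubled braces in the template are `str.format` escapes: `{{` and `}}` produce literal braces, while `{input_text}` is substituted. Ending the prompt with an opening brace prefills the start of the assistant's answer, which is why the code prepends `"{"` to the generated text. A minimal illustration with a shortened, hypothetical template:

```python
import json

# Shortened stand-in for the extraction prompt (not the full template)
template = 'Review: "{input_text}"\nSchema: {{"sentiment": "string"}}\nAnswer: {{'

filled = template.format(input_text="zero replayability")
print(filled)

# The model continues after the prefilled "{", so the brace is added back
generated = '"sentiment": "negative"}'  # hypothetical model continuation
restored = "{" + generated
print(json.loads(restored))
```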

## Results

In [227]:
import json

loaded_jsons = [json.loads(res) for res in results_2]

results_df = df_top_100.copy(deep=True).reset_index()
results_df = results_df[results_df.index < len(loaded_jsons)]


for (i, row), js in zip(results_df.iterrows(), loaded_jsons):
    print("REVIEW:", row["content"][:50], "...")
    print("EXTRACTED:", js)

REVIEW: this was probably my first preorder i felt that if ...
EXTRACTED: {'sentiment': 'positive', 'positive_tags': ['loving urban spaces', 'finding alleyways', 'seeing lit-up windows', 'many details and slices of life', 'full enjoyment', 'highest praise'], 'negative_tags': [], 'difficulty': 'null'}
REVIEW: oh well admittedly its difficult for to write this ...
EXTRACTED: {'sentiment': 'negative', 'positive_tags': [], 'negative_tags': ['dissapointing', 'biased', 'painful', 'huge fantasy and rpg fan', 'witcher franchise', 'favorite gaming franchises', 'entertaining quality content rich titles', 'favorite developer studio', 'topped it'], 'difficulty': 'difficult'}
REVIEW: oh well admittedly its difficult for to write this ...
EXTRACTED: {'sentiment': 'negative', 'positive_tags': ['fantasy', 'rpg', 'witcher', 'gwent', 'enjoyment'], 'negative_tags': ['dissapointing', 'biased', 'painful', 'huge fantasy and rpg fan', 'cdpr', 'their previous products', 'attitude towards their comm'], 'diffic
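
`json.loads` raises an exception if any output contains text after the JSON object or is cut off by the `max_new_tokens` limit. A more tolerant loader, sketched here with the standard library only, decodes the first JSON object and returns `None` when decoding fails. The function name `first_json_object` is illustrative.

```python
import json
from typing import Optional

_decoder = json.JSONDecoder()

def first_json_object(text: str) -> Optional[dict]:
    """Decode the first JSON object in text, ignoring anything after it."""
    start = text.find("{")
    if start == -1:
        return None
    try:
        obj, _end = _decoder.raw_decode(text, start)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None

print(first_json_object('{"sentiment": "negative"} extra text'))
print(first_json_object('{"sentiment": "negative", "diffic'))
```

Outputs that come back as `None` can then be retried or dropped instead of aborting the whole list comprehension.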

We save the results to a CSV file.

In [230]:
final_df = pd.DataFrame(loaded_jsons)
final_df["review"] = results_df["content"]
final_df.to_csv("./results.csv", index=False)

## Notes

- I considered applying zero-shot classification, but it did not give good results.
- With Qwen 2.5, the 0.5B-parameter model fell short on both tasks.
- In the prompt for the first task I had to add a few examples (few-shot) to get good results.
- In the prompt for the second task I had to state explicitly that tags must not be repeated.
- Sorry about the warnings :(