# LLMs - You Can't Please Them All
Are LLM-judges robust to adversarial inputs?

The goal of this competition is to maximize disagreement between three individual LLM-judges, while using the English language, and without repeating yourself. Each row in your submission.csv file should contain an essay with a length of approximately 100 words. 

Our unpublished LLM-as-a-judge system will return an average of three quality scores for every essay that you submit (avg_q). Quality scores will be floats in the range [0,9]. The grading system will also return measurements of both horizontal variance (avg_h) and vertical variance (min_v). Horizontal variance is defined as the variance between the scores returned by the 3 judges for a single essay, while vertical variance is defined as the variance between the scores returned by a single judge across every essay. These scores will then be combined with English language confidence scores (avg_e) and sequence similarity scores (avg_s) to penalize non-English and repetitive approaches. English scores and similarity scores are both floats in the range [0,1].

#### Import Libraries

In [None]:
import pandas as pd
import sys 
import torch
import random
import numpy as np
import pandas as pd
import gc
import time
import random
from tqdm import tqdm
import os

from IPython.display import display

from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM, AutoModel

if (not torch.cuda.is_available()): print("Sorry - GPU required!")
    
import logging
logging.getLogger('transformers').setLevel(logging.ERROR)

pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', None)

### Data

In [None]:
test_df = pd.read_csv("/kaggle/input/llms-you-cant-please-them-all/test.csv")
sample = pd.read_csv("/kaggle/input/llms-you-cant-please-them-all/sample_submission.csv")

In [None]:
test_df

### Model

#### Basic Example

In [None]:
tokenizer = AutoTokenizer.from_pretrained("/kaggle/input/gemma-2/transformers/gemma-2-2b/2")
model = AutoModelForCausalLM.from_pretrained(
    "/kaggle/input/gemma-2/transformers/gemma-2-2b/2",
    device_map="auto",
    torch_dtype=torch.bfloat16
)

input_text = "Compare and contrast the importance of self-reliance and adaptability in healthcare."
input_ids = tokenizer(input_text, return_tensors="pt")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))

### Multiple tests

In [None]:
def generate_model(model_name,word_count_to_request, temperature, top_p, repetition_penalty  ): 
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
                                                model_name,
                                                device_map="auto",
                                                torch_dtype=torch.bfloat16, 
                                                trust_remote_code=True ### ???
                                                )
    pipe = pipeline(
                    "text2text-generation", 
                    model=model, 
                    tokenizer=tokenizer, 
                    trust_remote_code=True,
                    max_new_tokens=word_count_to_request*1.3,
                    temperature=temperature,
                    top_p=top_p,
                    do_sample=True,
                    repetition_penalty = repetition_penalty
                )
    return pipe 

def generate_essay(prompt, topic, pipe): 
    final_prompt = f'{prompt} {topic}'
    response = pipe(final_prompt)
    essay = response
    return essay[0]['generated_text'].split('Answer:')[1]
    

In [None]:
model = generate_model("/kaggle/input/gemma/transformers/2b/1", 
                      word_count_to_request= 130, 
                      temperature = 0.7, 
                      top_p = 0.9,
                      repetition_penalty=1.2)

In [None]:
topics = list(test_df.topic)
print(topics)

In [None]:
prompt =  'generate a creative, human like crazy 100 words essay about the following topic:'
topic = topics[0]
essay = generate_essay(prompt, topic, model)


In [None]:
print(topic, '/n', essay)
print('largo del essay', len(essay.split(' ')))

In [None]:
essays = []
for i in range(len(test_df)):
    prompt =  'generate a creative, human like crazy 100 words essay about the following topic:'
    topic = topics[0]
    essay = generate_essay(prompt, topic, model)
    essays.append(essay)


In [None]:
sample['essay'] = essays 
sample

In [None]:
sample.to_csv('submission.csv', index = False)