# Evaluate the fine-tuned sensationalist detector model

#### Load sensationalist detector for inference

In [1]:
from unsloth import FastLanguageModel
max_seq_length = 2048 
dtype = None 
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "llama-3-8b-bnb-4bit-sft-sensationalist-detector-v2_model",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


2024-07-01 17:10:19.936878: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-07-01 17:10:19.964144: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


==((====))==  Unsloth: Fast Llama patching release 2024.6
   \\   /|    GPU: NVIDIA GeForce RTX 3060. Max memory: 11.754 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.1+cu121. CUDA = 8.6. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.26.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Unsloth 2024.6 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


### Format the test set with the same instruction prompt used for training

In [2]:
prompt = """ 
Below you will find the title, summary, and body of a news article. 
Your task is to analyze these components and classify whether the article is sensationalist or not.

Sensationalist is defined as: "presenting information in a way that is intended to provoke public interest, excitement, or anxiety, at the expense of accuracy."

### Article information:
    Title: {}
    Subheading: {}
    Body: {}
    is this article false?: {}
    
### is this article sensationalist?:
{}
"""
EOS_TOKEN = tokenizer.eos_token  # appended to every example to mark the end of the answer

def formatting_prompts_func(examples):
    """Fill the prompt template for every article in a batch of examples."""
    titles = examples["Titular"]
    summaries = examples["Copete"]
    bodies = examples["Cuerpo"]
    false_flags = examples["Falsa"]
    labels = examples["Amarillismo"]
    texts = []
    for title, summary, body, is_false, label in zip(titles, summaries, bodies, false_flags, labels):
        # EOS_TOKEN must be added explicitly; the template does not include it
        text = prompt.format(title, summary, body, is_false, label) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

### Define functions for inference, load testset and get predictions

In [3]:

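def inference(article_data, model, tokenizer):
    """Generate the model's answer for a single article (one row of the dataset)."""
    # Fill the template and leave the answer slot empty for the model to complete
    filled_prompt = prompt.format(
        article_data["Titular"],  # title
        article_data["Copete"],   # summary
        article_data["Cuerpo"],   # body
        article_data["Falsa"],    # is the article false?
        "",
    )
    inputs = tokenizer([filled_prompt], return_tensors="pt").to("cuda")

    outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
    return tokenizer.batch_decode(outputs)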

import pandas as pd
def load_dataset(dataset_path):
    dataset = pd.read_json(dataset_path, orient='index',  encoding='latin1')
    return dataset

ANSWER_HEADER = "### is this article sensationalist?:\n"

def text_generated2label(text_generated):
    """Map generated text to a binary label: 0 = "No Amarillista", 1 = anything else."""
    # Keep only the text that follows the answer header
    answer = text_generated[text_generated.find(ANSWER_HEADER) + len(ANSWER_HEADER):]
    if "No Amarillista" in answer:
        return 0
    return 1
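
The parser above falls back to label 1 whenever "No Amarillista" is absent, including when the answer header is missing from the generated text. A stricter variant could report such cases explicitly. The following is a minimal sketch, not part of the original notebook: the name `parse_label` and the choice to return `None` for unparseable output are assumptions made for illustration.

```python
ANSWER_HEADER = "### is this article sensationalist?:\n"  # same header as in the prompt

def parse_label(text_generated, header=ANSWER_HEADER):
    """Return 0 for "No Amarillista", 1 for "Amarillista", None if neither is found.

    Hypothetical helper: unlike text_generated2label, it does not default to 1.
    """
    start = text_generated.find(header)
    if start == -1:
        return None  # header missing: the prompt or the output was altered
    answer = text_generated[start + len(header):]
    # Check the longer label first, because "Amarillista" is a substring of it
    if "No Amarillista" in answer:
        return 0
    if "Amarillista" in answer:
        return 1
    return None

print(parse_label("...\n### is this article sensationalist?:\nNo Amarillista<|end_of_text|>"))
print(parse_label("...\n### is this article sensationalist?:\nAmarillista<|end_of_text|>"))
print(parse_label("...\n### is this article sensationalist? (True or False):\nFalse"))
```

Rows that come back as `None` could then be counted separately instead of being scored as "Amarillista".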

#### Load testset

In [58]:
testset = pd.read_csv("data/testset_realnews.csv", encoding='latin1')
testset["Amarillismo_binary"] = testset["Amarillismo"].apply(lambda x: 0 if x == "No Amarillista" else 1)
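
The sample generation shown later in this notebook contains sequences such as `fenÃ³menos`, which is the typical result of decoding UTF-8 bytes as Latin-1. Whether the CSV is actually UTF-8 is not confirmed here, so the following is only a sketch of how one might check a string for this pattern; `repair_mojibake` is a hypothetical helper, not part of the notebook.

```python
def repair_mojibake(text):
    """Return the repaired string if `text` looks like UTF-8 decoded as Latin-1, else None."""
    try:
        repaired = text.encode("latin1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None  # the round trip fails, so this is not a Latin-1/UTF-8 mix-up
    # Pure ASCII survives the round trip unchanged and needs no repair
    return repaired if repaired != text else None

print(repair_mojibake("fenÃ³menos"))  # candidate for repair
print(repair_mojibake("fenómenos"))   # already correct
```

If the check returns repaired text for the dataset's columns, reading the file with `encoding="utf-8"` would be worth trying, keeping in mind that the model should be evaluated on text encoded the same way as its training data.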

Make predictions for the test set

In [45]:
def make_predictions(model_name, model, tokenizer, dataset):
    predictions = pd.DataFrame(columns=["Id","Predicted_label", "True_label", "Generated_text"])
    for i in range(len(dataset)):
        article_data = dataset.iloc[i]
        text_generated = inference(article_data, model, tokenizer)
        label = text_generated2label(text_generated[0])
        predictions.loc[len(predictions)] = [article_data["Id"], label, article_data["Amarillismo_binary"], text_generated[0]]

    predictions.to_csv(f'predictions_{model_name}.csv', index=False)
    return predictions

In [59]:
predictions_llama_sft_sensationalist = make_predictions("llama-3-8b-bnb-4bit-sft-sensationalist-detector-v2-testset", model, tokenizer, testset)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



### Analyse predictions 

In [54]:
from sklearn.metrics import classification_report, balanced_accuracy_score

def generate_report(predictions):
    report = classification_report(predictions["True_label"], predictions["Predicted_label"], labels=[0,1], target_names=["No Amarillista", "Amarillista"], output_dict=True)
    report_df = pd.DataFrame(report).transpose()
    
    balanced_accuracy = balanced_accuracy_score(predictions["True_label"], predictions["Predicted_label"])
    print("Balanced accuracy: " + str(balanced_accuracy))
    return report_df, balanced_accuracy

Report for the SFT model on the test set

In [41]:
# Store the score under its own name so the imported balanced_accuracy_score function is not overwritten
report, balanced_accuracy = generate_report(predictions_llama_sft_sensationalist)
report

Balanced accuracy: 0.6078431372549019


| | precision | recall | f1-score | support |
|---|---|---|---|---|
| No Amarillista | 0.8 | 0.333333 | 0.470588 | 24.0 |
| Amarillista | 0.483871 | 0.882353 | 0.625 | 17.0 |
| accuracy | 0.560976 | 0.560976 | 0.560976 | 0.560976 |
| macro avg | 0.641935 | 0.607843 | 0.547794 | 41.0 |
| weighted avg | 0.668922 | 0.560976 | 0.534613 | 41.0 |
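
Balanced accuracy is the mean of the per-class recalls, so the reported value can be reproduced from the table above. The sketch below does this with per-class counts inferred from the reported support, precision, and recall (8 of 24 "No Amarillista" and 15 of 17 "Amarillista" articles classified correctly); these counts are a reconstruction, not values printed by the notebook.

```python
# Counts inferred from the report: support and correctly classified articles per class
support = {"No Amarillista": 24, "Amarillista": 17}
correct = {"No Amarillista": 8, "Amarillista": 15}

# Recall per class = correctly classified / total articles of that class
recalls = {name: correct[name] / support[name] for name in support}
balanced_accuracy = sum(recalls.values()) / len(recalls)
accuracy = sum(correct.values()) / sum(support.values())

for name, value in recalls.items():
    print(f"recall[{name}] = {value:.6f}")
print(f"balanced accuracy = {balanced_accuracy:.6f}")
print(f"accuracy = {accuracy:.6f}")
```

Balanced accuracy (about 0.608) is higher than plain accuracy (about 0.561) because the class with the higher recall is also the smaller one, and balanced accuracy weights both classes equally.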


Evaluate the SFT model on the training set

In [51]:
trainset = load_dataset("data/trainset.json")
trainset["Id"] = trainset.index
trainset["Amarillismo_binary"] = trainset["Amarillismo"].apply(lambda x: 0 if x == "No Amarillista" else 1)
predictions_llama_sft_sensationalist_trainset = make_predictions("llama-3-8b-bnb-4bit-sft-sensationalist-detector-v2-trainset", model, tokenizer, trainset)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



In [55]:
# Store the score under its own name so the imported balanced_accuracy_score function is not overwritten
report, balanced_accuracy = generate_report(predictions_llama_sft_sensationalist_trainset)
report

Balanced accuracy: 0.905940594059406


| | precision | recall | f1-score | support |
|---|---|---|---|---|
| No Amarillista | 1.0 | 0.811881 | 0.896175 | 101.0 |
| Amarillista | 0.841667 | 1.0 | 0.914027 | 101.0 |
| accuracy | 0.905941 | 0.905941 | 0.905941 | 0.905941 |
| macro avg | 0.920833 | 0.905941 | 0.905101 | 202.0 |
| weighted avg | 0.920833 | 0.905941 | 0.905101 | 202.0 |


#### Hallucinations

In [56]:
def detect_hallucinations(predictions):
    """Return the indices of predictions whose answer contains neither expected label."""
    answer_header = "### is this article sensationalist?:\n"
    hallucinations_responses_idx = []
    for i in range(len(predictions)):
        response = predictions["Generated_text"].iloc[i]
        response_is_sensationalist = response[response.find(answer_header) + len(answer_header):]

        # "Amarillista" is a substring of "No Amarillista", so one check covers both labels
        if "Amarillista" not in response_is_sensationalist:
            print(response_is_sensationalist)
            hallucinations_responses_idx.append(predictions.index[i])

    print(f"Number of hallucinations {len(hallucinations_responses_idx)} out of {len(predictions)} predictions.")
    return hallucinations_responses_idx

In [57]:
# Detect hallucinations in the predictions in testset
detect_hallucinations(predictions_llama_sft_sensationalist)

# Detect hallucinations in the predictions in trainset
detect_hallucinations(predictions_llama_sft_sensationalist_trainset)

Number of hallucinations 0 out of 41 predictions.
Number of hallucinations 0 out of 202 predictions.


[]

## Evaluate the base model without fine-tuning

In [1]:
from unsloth import FastLanguageModel
max_seq_length = 2048 
dtype = None 
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


2024-06-30 21:02:09.961221: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-06-30 21:02:09.986747: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


==((====))==  Unsloth: Fast Llama patching release 2024.6
   \\   /|    GPU: NVIDIA GeForce RTX 3060. Max memory: 11.754 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.1+cu121. CUDA = 8.6. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.26.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [11]:
prompt = """ 
Below you will find the title, summary, and body of a news article. 
Your task is to analyze these components and classify whether the article is sensationalist or not.

Sensationalist is defined as: "presenting information in a way that is intended to provoke public interest, excitement, or anxiety, at the expense of accuracy."

### Article information:
    Title: {}
    Subheading: {}
    Body: {}
    is this article false?: {}
    
### is this article sensationalist? (True or False):
{}
"""

In [15]:
inference(testset.iloc[2], model, tokenizer)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


['<|begin_of_text|> \nBelow you will find the title, summary, and body of a news article. \nYour task is to analyze these components and classify whether the article is sensationalist or not.\n\nSensationalist is defined as: "presenting information in a way that is intended to provoke public interest, excitement, or anxiety, at the expense of accuracy."\n\n### Article information:\n    Title: Superluna azul con eclipse y luna de sangre: cuatro fenÃ³menos en uno\n    Subheading: nan\n    Body: Superluna azul con eclipse y luna de sangre: cuatro fenÃ³menos en uno\n Este *NUMBER* de enero hemos podido disfrutar de la segunda superluna de este *NUMBER*. DespuÃ©s de la primera, que fue la madrugada del *NUMBER* al *NUMBER* de enero, la noche del martes *NUMBER* al miÃ©rcoles *NUMBER* hemos observado este fenÃ³meno astronÃ³mico. Pero la superluna solo es uno de los cuatro fenÃ³menos que se producen a la vez este *NUMBER* de enero: tambiÃ©n habrÃ¡ un eclipse lunar, una luna de sangre y una lu

In [7]:
predictions_llama_base = make_predictions("llama-3-8b-bnb-4bit", model, tokenizer, testset)

from sklearn.metrics import classification_report

report = classification_report(predictions_llama_base["True_label"], predictions_llama_base["Predicted_label"], labels=[0,1], target_names=["No Amarillista", "Amarillista"], output_dict=True)
report_df = pd.DataFrame(report).transpose()

report_df

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| No Amarillista | 0.0 | 0.0 | 0.0 | 24.0 |
| Amarillista | 0.414634 | 1.0 | 0.586207 | 17.0 |
| accuracy | 0.414634 | 0.414634 | 0.414634 | 0.414634 |
| macro avg | 0.207317 | 0.5 | 0.293103 | 41.0 |
| weighted avg | 0.171921 | 0.414634 | 0.243061 | 41.0 |
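
Every test article is labelled "Amarillista" in the report above. One possible cause, which the notebook does not confirm, is the parsing step rather than the model: the base-model prompt asks for "True or False" under a different header, so `text_generated2label` finds neither its header nor the string "No Amarillista" and falls through to label 1. The sketch below reproduces that behaviour on a made-up generation; the example string is hypothetical.

```python
def text_generated2label(text_generated):
    # Same logic as the function defined earlier in the notebook
    header = "### is this article sensationalist?:\n"
    answer = text_generated[text_generated.find(header) + len(header):]
    return 0 if "No Amarillista" in answer else 1

# Hypothetical base-model output that uses the "(True or False)" header
generated = (
    "### Article information:\n    Title: ...\n\n"
    "### is this article sensationalist? (True or False):\nFalse\n"
)

header_position = generated.find("### is this article sensationalist?:\n")
label = text_generated2label(generated)
print(header_position)  # -1 means the original header was not found
print(label)            # the answer "False" is still mapped to label 1
```

Under this assumption, a fairer comparison would map "True" and "False" answers to labels explicitly before computing the report.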
