# LLM Prediction

In [3]:
# importing the libraries

import pandas as pd
from sklearn.model_selection import train_test_split
import tensorflow as tf
from transformers import (
    TFBertModel,
    BertTokenizer,
    TFAutoModelForSequenceClassification,
    DataCollatorWithPadding,
    pipeline,
)

## Testing Sentiment Analysis


In [4]:
# loading the fine-tuned models from their Hugging Face Hub repositories

tokenizer = BertTokenizer.from_pretrained(
    "Guspfc/my-awesome-bert-model-sentiment-analysis"
)
model = TFAutoModelForSequenceClassification.from_pretrained(
    "Guspfc/my-awesome-bert-model-sentiment-analysis"
)

Some layers from the model checkpoint at Guspfc/my-awesome-bert-model-sentiment-analysis were not used when initializing TFBertForSequenceClassification: ['dropout_525']
- This IS expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertForSequenceClassification were initialized from the model checkpoint at Guspfc/my-awesome-bert-model-sentiment-analysis.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.


In [5]:
# creating a test input and tokenizing it

input_text = "I enjoyed this game, but it was really hard."
inputs = tokenizer(input_text, return_tensors="tf")

# passing the input to the model

outputs = model(**inputs)
logits = outputs.logits

# converting the model output (logits, the standard output format of language models) into
# interpretable probabilities with softmax, then taking the predicted class with argmax

probabilities = tf.nn.softmax(logits, axis=-1)
predicted_class = tf.argmax(probabilities, axis=1).numpy()[0]

# mapping the predicted class index to its label
if predicted_class == 1:
    print("Neutral")
elif predicted_class == 2:
    print("Negative")
else:
    print("Positive")

Neutral
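The cell above applies softmax to the logits and then takes the argmax. The following minimal pure-Python sketch runs those two steps on hypothetical logits (not produced by the real model). Softmax turns the logits into probabilities that sum to 1. Because softmax is monotonic, the argmax index does not change, so the softmax step is only needed when you want the probabilities themselves.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a 1-D list of logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical logits for one sentence with three classes (illustration only).
logits = [0.2, 1.5, -0.7]
probs = softmax(logits)

print([round(p, 3) for p in probs])
# softmax preserves ordering, so both calls return the same index
print(argmax(probs), argmax(logits))
```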


In [6]:
# another prediction example, this time using the Hugging Face pipeline, which also returns the score of the predicted class

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

input_text = "this game is awful"
result = classifier(input_text)

predicted_class = result[0]["label"]

print(result)

[{'label': 'LABEL_2', 'score': 0.9992133378982544}]
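The pipeline reports a generic name such as `LABEL_2`, presumably because the checkpoint's config does not define custom label names. The following sketch translates that result into a readable label. It assumes that the mapping in the sentiment cell above (1 is Neutral, 2 is Negative, anything else is Positive) describes a checkpoint with exactly three classes, 0, 1 and 2. The helper `readable_label` is hypothetical and not part of the original notebook.

```python
# Mapping inferred from the if/elif chain in the sentiment cell above
# (assumption: the checkpoint has exactly three classes, 0, 1 and 2).
SENTIMENT_LABELS = {0: "Positive", 1: "Neutral", 2: "Negative"}


def readable_label(pipeline_result, labels):
    """Turn a result like {'label': 'LABEL_2', 'score': ...} into (name, score)."""
    raw = pipeline_result["label"]      # e.g. "LABEL_2"
    index = int(raw.rsplit("_", 1)[1])  # the trailing class index, e.g. 2
    return labels[index], pipeline_result["score"]


# The result printed above, copied verbatim.
result = [{'label': 'LABEL_2', 'score': 0.9992133378982544}]
name, score = readable_label(result[0], SENTIMENT_LABELS)
print(f"{name} ({score:.4f})")
```

An alternative (not tested here) is to set `model.config.id2label` to the names above before building the pipeline, so that it reports them directly.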


## Testing Hate Speech Identification


In [7]:
# loading the fine-tuned models from their Hugging Face Hub repositories

tokenizer_h = BertTokenizer.from_pretrained("Guspfc/my-awesome-bert-model-hate-speech")
model_h = TFAutoModelForSequenceClassification.from_pretrained(
    "Guspfc/my-awesome-bert-model-hate-speech"
)

All model checkpoint layers were used when initializing TFBertForSequenceClassification.

All the layers of TFBertForSequenceClassification were initialized from the model checkpoint at Guspfc/my-awesome-bert-model-hate-speech.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.


In [11]:
# creating a test input and tokenizing it

input_text = "I hate people"
inputs = tokenizer_h(input_text, return_tensors="tf")

# passing the input to the model

outputs = model_h(**inputs)
logits = outputs.logits

# converting the model output (logits, the standard output format of language models) into
# interpretable probabilities with softmax, then taking the predicted class with argmax

probabilities = tf.nn.softmax(logits, axis=-1)
predicted_class = tf.argmax(probabilities, axis=1).numpy()[0]

# mapping the predicted class index to its label
if predicted_class == 0:
    print("Hate speech")
elif predicted_class == 1:
    print("Offensive language")
else:
    print("No issues")

Offensive language
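The sentiment and hate speech cells repeat the same steps: tokenize, get the logits, apply softmax and take the argmax. The following sketch wraps those steps in one hypothetical helper, `predict`, which is not part of the original notebook. The label tuples follow the if/elif chains above, assuming each model has exactly three classes. It converts the TF logits tensor with `np.asarray` instead of calling TF ops.

```python
import numpy as np

# Label orders taken from the two prediction cells above (index = class id).
SENTIMENT_LABELS = ("Positive", "Neutral", "Negative")
HATE_SPEECH_LABELS = ("Hate speech", "Offensive language", "No issues")


def predict(text, tokenizer, model, labels):
    """Classify one text and return (label, probability).

    Intended for the TF tokenizer/model pairs loaded above; np.asarray
    converts the logits tensor into a NumPy array.
    """
    inputs = tokenizer(text, return_tensors="tf")
    logits = np.asarray(model(**inputs).logits)[0]  # first (only) sequence
    exps = np.exp(logits - logits.max())            # numerically stable softmax
    probs = exps / exps.sum()
    best = int(np.argmax(probs))
    return labels[best], float(probs[best])
```

In the notebook this would be called as `predict("I hate people", tokenizer_h, model_h, HATE_SPEECH_LABELS)` or `predict(input_text, tokenizer, model, SENTIMENT_LABELS)`. Keeping the labels as plain tuples makes the mapping explicit in one place instead of in separate if/elif chains.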
