<a href="https://colab.research.google.com/github/beyza720/nlphw2/blob/main/inference_pipeline_power.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
from google.colab import drive
from sklearn.model_selection import train_test_split
from sklearn.utils import resample
from transformers import pipeline
import torch
from huggingface_hub import login
from sklearn.metrics import accuracy_score, classification_report
import pandas as pd

drive.mount('/content/drive')

train_path = '/content/drive/My Drive/Colab Notebooks/power-lv-train.tsv'
train_power_data = pd.read_csv(train_path, sep='\t')

# Address class imbalance by oversampling class 1 to the size of class 0, as in the multilingual masked language model fine-tuning task
class_0 = train_power_data[train_power_data['label'] == 0]
class_1 = train_power_data[train_power_data['label'] == 1]

class_1_oversampled = resample(class_1, replace=True, n_samples=len(class_0), random_state=42)
balanced_train_power_data = pd.concat([class_1_oversampled, class_0])

# Stratified 90/10 split; only the 90% portion (train_power) is used for inference below
train_power, val_power = train_test_split(
    balanced_train_power_data,
    test_size=0.1,
    stratify=balanced_train_power_data['label'],
    random_state=42
)

text = train_power['text'].tolist()
text_en = train_power['text_en'].tolist()

# Access token for the model: prompt for it at runtime instead of hard-coding a secret
from getpass import getpass
login(token=getpass("Hugging Face token: "))

pipe = pipeline(
    task="text-generation",
    model="meta-llama/Llama-3.2-1B",
    device=0,  # run on the first CUDA GPU
    torch_dtype=torch.float16  # half precision to reduce GPU memory use
)

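# Batching needs a pad token: reuse EOS, and pad on the left so each prompt ends right where generation starts
pipe.tokenizer.pad_token_id = pipe.model.config.eos_token_id
pipe.tokenizer.padding_side = 'left'
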
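# Generate continuations for the `text` and `text_en` columns (raw speeches, no instruction prompt)
results_text = pipe(text, max_new_tokens=100, temperature=0.7, batch_size=8)
results_text_en = pipe(text_en, max_new_tokens=100, temperature=0.7, batch_size=8)

# Keyword rule: predict 1 if "right" occurs in the output, else 0.
# generated_text includes the input text by default, so the input itself can trigger a match.
predicted_labels_text = [
    1 if "right" in result[0]['generated_text'].lower() else 0 for result in results_text
]
predicted_labels_en = [
    1 if "right" in result[0]['generated_text'].lower() else 0 for result in results_text_en
]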

true_labels = train_power['label'].tolist()

print("Accuracy (text):", accuracy_score(true_labels, predicted_labels_text))
print("Accuracy (text_en):", accuracy_score(true_labels, predicted_labels_en))
print("\nClassification Report (text):")
print(classification_report(true_labels, predicted_labels_text))
print("\nClassification Report (text_en):")
print(classification_report(true_labels, predicted_labels_en))


Mounted at /content/drive


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.

Accuracy (text): 0.4997057092407298
Accuracy (text_en): 0.5450264861683343

Classification Report (text):
              precision    recall  f1-score   support

           0       0.50      1.00      0.67       849
           1       0.50      0.00      0.00       850

    accuracy                           0.50      1699
   macro avg       0.50      0.50      0.34      1699
weighted avg       0.50      0.50      0.34      1699
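
The report for `text` shows recall of 1.00 for class 0 and 0.00 for class 1, so nearly every input was labelled 0. As a sanity check, the sketch below computes the accuracy a constant predictor would reach, using the support counts printed in the report (849 and 850). The helper `constant_prediction_accuracy` is hypothetical and not part of the notebook.

```python
def constant_prediction_accuracy(support: dict[int, int], predicted_label: int) -> float:
    """Accuracy obtained by predicting `predicted_label` for every sample."""
    total = sum(support.values())
    return support[predicted_label] / total


# Support counts copied from the classification report above.
support = {0: 849, 1: 850}

always_zero = constant_prediction_accuracy(support, 0)
always_one = constant_prediction_accuracy(support, 1)
print(f"always predict 0: {always_zero:.4f}")  # prints "always predict 0: 0.4997"
print(f"always predict 1: {always_one:.4f}")   # prints "always predict 1: 0.5003"
```

The always-0 value matches the reported accuracy for `text` to four decimal places, which suggests the keyword rule almost never fires on those inputs.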


Classification Report (text_en):
              precision    recall  f1-score   support

           0       0.53      0.72      0.61       849
           1       0.57      0.37      0.45       850

    accuracy                           0.55      1699
   macro avg       0.55      0.55      0.53      1699
weighted avg       0.55      0.55      0.53      1699
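
The cell above passes each speech to the model with no instruction and searches the whole `generated_text`, which by default repeats the input before the continuation. One possible alternative, sketched below under stated assumptions, wraps each speech in a prompt and parses only the continuation. The prompt wording, the helpers `build_prompt` and `parse_label`, and the label names `coalition` and `opposition` are all assumptions for illustration; the notebook does not state what labels 0 and 1 mean, so substitute the real names.

```python
from typing import Optional

# Assumed label names (hypothetical); the notebook does not say what 0 and 1 stand for.
LABEL_WORDS = {"coalition": 0, "opposition": 1}

# Hypothetical prompt wording, not taken from the notebook.
PROMPT_TEMPLATE = (
    "Speech: {speech}\n"
    "Is the speaker's party in the coalition or the opposition?\n"
    "Answer:"
)


def build_prompt(speech: str) -> str:
    """Wrap a speech in the instruction prompt."""
    return PROMPT_TEMPLATE.format(speech=speech.strip())


def parse_label(full_text: str, prompt: str) -> Optional[int]:
    """Map the model's continuation to a label, or None if no label word is found."""
    # Drop the echoed prompt so words in the speech or the question cannot trigger a match.
    continuation = full_text[len(prompt):] if full_text.startswith(prompt) else full_text
    continuation = continuation.lower()
    # Pick the label word that appears earliest in the continuation.
    hits = [
        (continuation.find(word), label)
        for word, label in LABEL_WORDS.items()
        if word in continuation
    ]
    return min(hits)[1] if hits else None


prompt = build_prompt("We oppose this bill.")
print(parse_label(prompt + " opposition", prompt))      # prints 1
print(parse_label(prompt + " I am not sure.", prompt))  # prints None
```

Returning `None` for unparseable outputs keeps them visible instead of silently counting them as class 0. When calling the pipeline, passing `return_full_text=False` should return only the continuation, in which case the prompt-stripping step becomes a no-op.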

