<a href="https://colab.research.google.com/github/Riasci/KomputasiIntelegensiaTasks/blob/main/TaskWeek7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Name: Ria Mulyadi

NPM: 2206048556

**Running the 'mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis' model**

In [1]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


**Check the model’s accuracy using a synthetic dataset**

In [2]:
import pandas as pd
import numpy as np

# Define some sample financial news headlines and their sentiment labels
data = {
    "headline": [
        "The company reported record profits this quarter.",
        "Investors are worried about the rising inflation rates.",
        "Stock prices fell sharply after the disappointing earnings report.",
        "New product launch is expected to boost sales significantly.",
        "The merger has raised concerns among regulators.",
        "Market analysts predict a downturn in the next quarter.",
        "Company X's stock has outperformed its competitors.",
        "There is uncertainty in the market due to geopolitical tensions.",
        "The acquisition is seen as a positive move for growth.",
        "Experts are optimistic about the company's future performance."
    ],
    "label": [
        "positive",  # Record profits
        "negative",  # Rising inflation
        "negative",  # Disappointing earnings
        "positive",  # Boost in sales
        "negative",  # Concerns about merger
        "negative",  # Market downturn
        "positive",  # Outperformed competitors
        "negative",  # Market uncertainty
        "positive",  # Positive acquisition
        "positive"   # Optimistic performance
    ]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Shuffle the DataFrame
df = df.sample(frac=1).reset_index(drop=True)

# Save the synthetic dataset to a CSV file
df.to_csv("synthetic_financial_news.csv", index=False)

print("Synthetic dataset created:")
print(df)


Synthetic dataset created:
                                            headline     label
0  New product launch is expected to boost sales ...  positive
1  The acquisition is seen as a positive move for...  positive
2  Stock prices fell sharply after the disappoint...  negative
3  Company X's stock has outperformed its competi...  positive
4   The merger has raised concerns among regulators.  negative
5  There is uncertainty in the market due to geop...  negative
6  Investors are worried about the rising inflati...  negative
7  Market analysts predict a downturn in the next...  negative
8  Experts are optimistic about the company's fut...  positive
9  The company reported record profits this quarter.  positive
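
The shuffle above calls `df.sample(frac=1)` without a seed, so the row order changes on every run. pandas' `sample` accepts a `random_state` argument for this. The sketch below shows the same idea using only the standard library, so it runs without pandas; the helper name `shuffled` and the seed value 42 are illustrative assumptions.

```python
import random

def shuffled(rows, seed=42):
    """Return a shuffled copy of rows; the same seed gives the same order."""
    rng = random.Random(seed)  # private generator, global random state untouched
    out = list(rows)
    rng.shuffle(out)
    return out

rows = [
    ("The company reported record profits this quarter.", "positive"),
    ("Investors are worried about the rising inflation rates.", "negative"),
    ("Stock prices fell sharply after the disappointing earnings report.", "negative"),
]

first = shuffled(rows)
second = shuffled(rows)
print(first == second)  # True: identical order on every run
```

With pandas, the equivalent change would be passing a fixed `random_state` to `df.sample`.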


In [3]:
# Import the required libraries
from transformers import pipeline
import numpy as np

# Initialize the model pipeline
pipe = pipeline("text-classification", model="mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis")

# Extract headlines from DataFrame
headlines = df["headline"].tolist()

# Make predictions
predictions = pipe(headlines)

# Extract predicted labels
predicted_labels = [pred['label'] for pred in predictions]

# Calculate accuracy
accuracy = np.mean(np.array(predicted_labels) == np.array(df["label"]))

print(f"Model accuracy: {accuracy:.2f}")




Model accuracy: 1.00
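
A perfect score on ten headlines is a coarse estimate. The sketch below recomputes accuracy with plain Python and adds a Wilson score interval to show how wide the uncertainty is at this sample size. The helper names are illustrative assumptions, and `predicted` is simply set equal to `expected` to mirror the reported 1.00 rather than being a rerun of the model.

```python
import math

def accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("label lists must have the same length")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Labels of the synthetic dataset, in the order they were defined
expected = ["positive", "negative", "negative", "positive", "negative",
            "negative", "positive", "negative", "positive", "positive"]
predicted = list(expected)  # stand-in for a run where every prediction matched

acc = accuracy(predicted, expected)
low, high = wilson_interval(correct=10, total=len(expected))
print(f"accuracy: {acc:.2f}, interval: {low:.2f} to {high:.2f}")
```

With 10 correct out of 10, the lower bound of the interval is about 0.72, so the data cannot distinguish a very good model from a perfect one.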


**Implement an attention transformer on top of that model**

In [4]:
%pip install transformers torch




In [5]:
import torch
from torch import nn
from transformers import DistilBertModel

class AttentionTransformer(nn.Module):
    def __init__(self, model_name="mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"):
        super().__init__()
        # NOTE: the checkpoint is a RoBERTa-type model, so loading it through
        # DistilBertModel triggers the "newly initialized" warning in the output below
        self.distilbert = DistilBertModel.from_pretrained(model_name)
        # New classification head: randomly initialized and not trained in this notebook
        self.fc = nn.Linear(self.distilbert.config.dim, 2)  # Assuming binary classification
        self.softmax = nn.Softmax(dim=1)

    def forward(self, input_ids, attention_mask):
        # Get the outputs from the encoder
        outputs = self.distilbert(input_ids=input_ids, attention_mask=attention_mask)

        # Extract the last hidden state: (batch, seq_len, dim)
        hidden_state = outputs.last_hidden_state

        # Pool over the sequence dimension. Mean pooling is attention with uniform
        # weights: it has no learned scores and also averages in padding positions
        attention_output = torch.mean(hidden_state, dim=1)  # Mean pooling

        # Pass through the fully connected layer
        logits = self.fc(attention_output)

        # Apply softmax over the class dimension to get probabilities
        probabilities = self.softmax(logits)

        return probabilities
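
The forward pass above averages the hidden states, which weights every position equally, padding included. An attention pooling layer instead scores each position, masks out padding, and takes a softmax-weighted sum. The sketch below shows that computation on plain Python lists so it runs without torch. The function name `attention_pool`, the scoring vector `w`, and the toy numbers are illustrative assumptions, not part of the model.

```python
import math

def attention_pool(hidden, mask, w):
    """Softmax-weighted sum of token vectors, ignoring masked positions.

    hidden: list of token vectors (seq_len x dim)
    mask:   list of 0/1 flags, 1 for real tokens, 0 for padding
    w:      scoring vector (dim); a learned parameter in a real model
    """
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w)) for h in hidden]
    # Subtract the max over real tokens for numerical stability
    top = max(s for s, m in zip(scores, mask) if m)
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden[0])
    pooled = [sum(wt * h[d] for wt, h in zip(weights, hidden)) for d in range(dim)]
    return pooled, weights

hidden = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]  # last row stands in for a padding token
mask = [1, 1, 0]

# With a zero scoring vector all real tokens get equal weight: masked mean pooling
pooled, weights = attention_pool(hidden, mask, w=[0.0, 0.0])
print(weights)  # [0.5, 0.5, 0.0]
print(pooled)   # [0.5, 0.5]
```

A non-zero `w` shifts weight toward the tokens it scores highest, which is the part a plain mean cannot express.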

**Check the model's accuracy with the attention transformer on the same synthetic dataset and compare**

In [6]:
import torch
import pandas as pd
from torch.utils.data import DataLoader, Dataset
from transformers import RobertaTokenizer  # Use RobertaTokenizer instead of DistilBertTokenizer

# Load the synthetic dataset
df = pd.read_csv("synthetic_financial_news.csv")

# Map labels to integers
label_map = {"positive": 1, "negative": 0}
df['label'] = df['label'].map(label_map)

# Initialize tokenizer
tokenizer = RobertaTokenizer.from_pretrained("mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis")

# Dataset that tokenizes each headline on access and returns tensors for the model
class FinancialNewsDataset(Dataset):
    def __init__(self, headlines, labels, tokenizer, max_len):
        self.headlines = headlines
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.headlines)

    def __getitem__(self, idx):
        headline = self.headlines[idx]
        label = self.labels[idx]

        # Tokenize the headline
        encoding = self.tokenizer.encode_plus(
            headline,
            add_special_tokens=True,
            max_length=self.max_len,
            return_token_type_ids=False,
            padding='max_length',
            truncation=True,
            return_attention_mask=True,
            return_tensors='pt'
        )

        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.tensor(label, dtype=torch.long)
        }

# Create dataset and dataloader (max_len=10 counts special tokens, so longer headlines are truncated)
dataset = FinancialNewsDataset(df['headline'].tolist(), df['label'].tolist(), tokenizer, max_len=10)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

# Initialize the AttentionTransformer model
model = AttentionTransformer()
model.eval()  # Set the model to evaluation mode

# Function to evaluate accuracy
def evaluate_model(model, dataloader):
    correct_predictions = 0
    total_predictions = 0

    with torch.no_grad():
        for batch in dataloader:
            input_ids = batch['input_ids']
            attention_mask = batch['attention_mask']
            labels = batch['labels']

            # Get model predictions
            outputs = model(input_ids, attention_mask)
            _, preds = torch.max(outputs, dim=1)

            correct_predictions += torch.sum(preds == labels).item()
            total_predictions += labels.size(0)

    return correct_predictions / total_predictions

# Calculate and print model accuracy
accuracy = evaluate_model(model, dataloader)
print(f"Model Accuracy with Attention Transformer: {accuracy:.2f}")


You are using a model of type roberta to instantiate a model of type distilbert. This is not supported for all configurations of models and can yield errors.
Some weights of DistilBertModel were not initialized from the model checkpoint at mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis and are newly initialized: ['embeddings.LayerNorm.bias', 'embeddings.LayerNorm.weight', 'embeddings.position_embeddings.weight', 'embeddings.word_embeddings.weight', 'transformer.layer.0.attention.k_lin.bias', 'transformer.layer.0.attention.k_lin.weight', 'transformer.layer.0.attention.out_lin.bias', 'transformer.layer.0.attention.out_lin.weight', 'transformer.layer.0.attention.q_lin.bias', 'transformer.layer.0.attention.q_lin.weight', 'transformer.layer.0.attention.v_lin.bias', 'transformer.layer.0.attention.v_lin.weight', 'transformer.layer.0.ffn.lin1.bias', 'transformer.layer.0.ffn.lin1.weight', 'transformer.layer.0.ffn.lin2.bias', 'transformer.layer.0.ffn.lin2.weight', 'transformer

Model Accuracy with Attention Transformer: 0.60
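
The warning above says the checkpoint is of type roberta while the class is DistilBertModel, so the listed weights were newly initialized rather than loaded. One way to avoid that, sketched here on the assumption that the goal is to reuse the fine-tuned weights and inspect their attention, is to let `AutoModelForSequenceClassification` pick the matching architecture and request the attention maps. The function name `predict_with_attention` is hypothetical, and calling it downloads the checkpoint, so the block only defines it.

```python
MODEL_NAME = "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis"

def predict_with_attention(headlines, model_name=MODEL_NAME):
    """Return (predicted label strings, per-layer attention tensors)."""
    # Imported lazily so the function can be defined without the libraries installed
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(headlines, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True)

    # id2label comes from the checkpoint config, so no label mapping is hard-coded
    ids = outputs.logits.argmax(dim=-1).tolist()
    labels = [model.config.id2label[i] for i in ids]
    # outputs.attentions: one tensor per layer, shaped (batch, heads, seq, seq)
    return labels, outputs.attentions
```

Usage would look like `labels, attentions = predict_with_attention(df["headline"].tolist())`, with `df["label"]` still holding the string labels. Depending on the installed transformers version, returning attention maps may require loading the model with `attn_implementation="eager"`.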


In [7]:
import numpy as np
import torch
import pandas as pd
from transformers import RobertaTokenizer, pipeline

# Load the synthetic dataset
df = pd.read_csv("synthetic_financial_news.csv")

# Initialize the pre-trained sentiment analysis pipeline
pretrained_pipe = pipeline("text-classification", model="mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis")

# Tokenizer for RoBERTa
tokenizer = RobertaTokenizer.from_pretrained("mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis")

# Tokenize the headlines
inputs = tokenizer(df['headline'].tolist(), padding=True, truncation=True, return_tensors="pt")

# Use the pre-trained model for predictions
pretrained_predictions = pretrained_pipe(df["headline"].tolist())
pretrained_labels = [pred['label'] for pred in pretrained_predictions]

# Calculate accuracy for pre-trained model
pretrained_accuracy = (np.array(pretrained_labels) == np.array(df["label"])).mean()
print(f"Pre-trained model accuracy: {pretrained_accuracy:.2f}")

# Instantiate the custom AttentionTransformer model
attention_model = AttentionTransformer()

# Move model to evaluation mode
attention_model.eval()

# Forward pass through the custom model
with torch.no_grad():
    attention_probabilities = attention_model(inputs['input_ids'], inputs['attention_mask'])
    attention_predictions = torch.argmax(attention_probabilities, dim=1).numpy()

# Convert numerical predictions back to labels
attention_labels = ["positive" if pred == 1 else "negative" for pred in attention_predictions]

# Calculate accuracy for the custom model
attention_accuracy = (np.array(attention_labels) == np.array(df["label"])).mean()
print(f"Attention Transformer model accuracy: {attention_accuracy:.2f}")




Pre-trained model accuracy: 1.00


You are using a model of type roberta to instantiate a model of type distilbert. This is not supported for all configurations of models and can yield errors.
Some weights of DistilBertModel were not initialized from the model checkpoint at mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis and are newly initialized: ['embeddings.LayerNorm.bias', 'embeddings.LayerNorm.weight', 'embeddings.position_embeddings.weight', 'embeddings.word_embeddings.weight', 'transformer.layer.0.attention.k_lin.bias', 'transformer.layer.0.attention.k_lin.weight', 'transformer.layer.0.attention.out_lin.bias', 'transformer.layer.0.attention.out_lin.weight', 'transformer.layer.0.attention.q_lin.bias', 'transformer.layer.0.attention.q_lin.weight', 'transformer.layer.0.attention.v_lin.bias', 'transformer.layer.0.attention.v_lin.weight', 'transformer.layer.0.ffn.lin1.bias', 'transformer.layer.0.ffn.lin1.weight', 'transformer.layer.0.ffn.lin2.bias', 'transformer.layer.0.ffn.lin2.weight', 'transformer

Attention Transformer model accuracy: 0.50


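Result: the pre-trained model (mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis) reached 100% accuracy on the synthetic dataset, whereas the AttentionTransformer reached only 50% (60% in the DataLoader-based run in cell 6). This gap says little about attention pooling itself: the warnings above show that the encoder weights were newly initialized because a roberta checkpoint was loaded through DistilBertModel, and the added linear head was never trained, so both scores are at about chance level for a balanced two-class dataset of ten headlines.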
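
To put the 50% and 60% figures in context, the sketch below simulates a classifier that guesses uniformly at random on a balanced set of ten labels, like the synthetic dataset. It is an illustrative simulation, not a rerun of the notebook; the trial count and seed are arbitrary assumptions.

```python
import random

# 5 positive (1) and 5 negative (0) labels, as in the synthetic dataset
labels = [1, 0, 0, 1, 0, 0, 1, 0, 1, 1]

def random_guess_accuracy(labels, rng):
    """Accuracy of one classifier that guesses 0 or 1 uniformly at random."""
    guesses = [rng.randint(0, 1) for _ in labels]
    return sum(g == y for g, y in zip(guesses, labels)) / len(labels)

rng = random.Random(0)
trials = [random_guess_accuracy(labels, rng) for _ in range(10_000)]

mean_acc = sum(trials) / len(trials)
# Share of random classifiers scoring between 0.4 and 0.6 inclusive
near_half = sum(0.4 <= t <= 0.6 for t in trials) / len(trials)
print(f"mean accuracy: {mean_acc:.2f}")
print(f"share of runs in [0.4, 0.6]: {near_half:.2f}")
```

A randomly initialized network is not literally a coin flip per headline, but a predictor that always outputs one class also scores exactly 50% on a balanced set, so results in this range carry no evidence that the model learned anything.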