#deberta-v3-base-absa-v1.1
https://medium.com/nlplanet/quick-intro-to-aspect-based-sentiment-analysis-c8888a09eda7

- Because the dataset is unlabeled, this study uses the pre-trained ABSA model yangheng/deberta-v3-base-absa-v1.1 to generate pseudo ground-truth sentiment labels.
- The model was trained on SemEval datasets. Given a sentence and an aspect term, it classifies the sentiment toward that aspect as positive, neutral, or negative.
- DeBERTa assigns a sentiment polarity and a confidence score to each sentence-aspect pair (used later for fine-tuning and evaluation).
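
The polarity and confidence score come from a softmax over the model's three output logits: the label is the class with the highest probability, and the confidence is that probability. The following is a minimal pure-Python sketch of that step. The logits are made up for illustration, and the helper names `softmax` and `label_with_confidence` are hypothetical, not part of any library.

```python
import math

# Label order assumed to match the mapping used later in this notebook.
SENTIMENT_LABELS = ["negative", "neutral", "positive"]

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_with_confidence(logits):
    """Return (label, confidence) for one row of three logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SENTIMENT_LABELS[best], probs[best]

# Made-up logits for illustration only, not real model output.
example_logits = [-1.2, 0.3, 2.5]
label, confidence = label_with_confidence(example_logits)
print(label, round(confidence, 3))
```

A confidence near 1/3 means the three classes were almost equally likely, so such labels are weak evidence.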


####Using a public pre-trained model
First, we install the transformers library along with the SentencePiece tokenizer, which some models in the library, such as DeBERTa, require.

In [None]:
!pip install transformers[sentencepiece]

Load Model

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch.nn.functional as F
import torch
import pandas as pd
from tqdm import tqdm

# Load the tokenizer and ABSA model
absa_tokenizer = AutoTokenizer.from_pretrained("yangheng/deberta-v3-base-absa-v1.1")
absa_model = AutoModelForSequenceClassification.from_pretrained("yangheng/deberta-v3-base-absa-v1.1")


In [None]:
# Load the CSV (update path if needed)
df = pd.read_csv("ASPECT_preprocessed.csv")

# Keep only needed columns and remove duplicates
df = df[['Id', 'cleaned_text', 'aspect']].drop_duplicates()


In [None]:
df.shape

In [None]:
# Set model to eval mode
absa_model.eval()

# Store results here
results = []

# Label names in the order of the model's output classes (0, 1, 2)
sentiment_labels = ['negative', 'neutral', 'positive']

# Loop through each row
for _, row in tqdm(df.iterrows(), total=len(df)):
    sentence = row['cleaned_text']
    aspect = row['aspect']

    # ABSA-specific input format: sentence and aspect separated by [SEP]
    encoded_input = absa_tokenizer(f"[CLS] {sentence} [SEP] {aspect} [SEP]", return_tensors="pt")

    # Inference only, so no gradients are needed
    with torch.no_grad():
        output = absa_model(**encoded_input)
        probs = F.softmax(output.logits, dim=1).numpy()[0]

    # Map probabilities to sentiment; store confidence as a plain Python float
    sentiment = sentiment_labels[probs.argmax()]
    confidence = float(probs.max())

    results.append({
        "Id": row['Id'],
        "cleaned_text": sentence,
        "aspect": aspect,
        "sentiment": sentiment,
        "confidence": confidence
    })


In [None]:
results_df = pd.DataFrame(results)

In [None]:
results_df.head()

In [None]:
# Save the pseudo-labels to CSV
results_df.to_csv("DeBERTa.csv", index=False)

In [None]:
results_df.shape
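
Since the confidence score is meant to support fine-tuning and evaluation, one possible use is to keep only high-confidence pseudo-labels. The sketch below shows the idea on a toy frame with the same column names as `results_df`. The data and the `CONFIDENCE_THRESHOLD` value of 0.80 are assumptions for illustration, not values from this study.

```python
import pandas as pd

# Toy stand-in for results_df; the values are invented for illustration.
toy_results = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "aspect": ["price", "service", "food", "price"],
    "sentiment": ["positive", "negative", "neutral", "positive"],
    "confidence": [0.97, 0.55, 0.81, 0.62],
})

# Hypothetical cutoff; tune it against a small hand-labelled sample.
CONFIDENCE_THRESHOLD = 0.80

# Keep only rows whose top-class probability reaches the cutoff.
high_conf = toy_results[toy_results["confidence"] >= CONFIDENCE_THRESHOLD]
kept_share = len(high_conf) / len(toy_results)

print(high_conf[["Id", "sentiment", "confidence"]])
print(f"kept {kept_share:.0%} of rows")
```

A stricter threshold gives cleaner labels but fewer rows, so it is worth checking how the label distribution shifts after filtering.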

####Merge file


In [None]:
import pandas as pd

# Load the datasets
aspect_df = pd.read_csv("ASPECT_preprocessed.csv")
deberta_df = pd.read_csv("DeBERTa.csv")

# Merge on 'Id', 'cleaned_text', and 'aspect'
merged_df = pd.merge(
    aspect_df[['Id', 'created_at', 'cleaned_text', 'type', 'aspect']],
    deberta_df[['Id', 'cleaned_text', 'aspect', 'sentiment', 'confidence']],
    # 'type' is not a merge key because it exists only in aspect_df
    on=['Id', 'cleaned_text', 'aspect'],
    how='inner'
)

# Rename 'sentiment' to 'deberta_sentiment'
merged_df.rename(columns={'sentiment': 'deberta_sentiment'}, inplace=True)

# Reorder columns
merged_df = merged_df[['Id', 'created_at', 'cleaned_text', 'type', 'aspect', 'deberta_sentiment', 'confidence']]

In [None]:
# Save the result to a new CSV file
merged_df.to_csv("ABSA-DeBERTa_annotate.csv", index=False)

In [None]:
merged_df.shape
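
An inner merge silently drops rows that have no matching label, so comparing shapes alone can hide problems. As a hedged sketch, the `validate` and `indicator` parameters of `pd.merge` can make the check explicit. The toy frames below stand in for `aspect_df` and `deberta_df`, and their contents are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for aspect_df and deberta_df; values are invented.
toy_aspect = pd.DataFrame({
    "Id": [1, 1, 2],
    "cleaned_text": ["good food", "good food", "slow service"],
    "aspect": ["food", "food", "service"],
    "type": ["post", "reply", "post"],
})
toy_deberta = pd.DataFrame({
    "Id": [1],
    "cleaned_text": ["good food"],
    "aspect": ["food"],
    "sentiment": ["positive"],
    "confidence": [0.95],
})

keys = ["Id", "cleaned_text", "aspect"]
checked = pd.merge(
    toy_aspect,
    toy_deberta,
    on=keys,
    how="left",              # keep unmatched rows so they can be counted
    validate="many_to_one",  # raises MergeError if label keys repeat
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

unmatched = int((checked["_merge"] == "left_only").sum())
print(f"{len(checked)} rows after merge, {unmatched} without a label")
```

If `unmatched` is zero, switching back to `how='inner'` gives the same rows as the left merge.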