# Aspect-Based Sentiment Analysis (ABSA) for Vietnamese Text

## SECTION 1: Setup and Initialization
In this section, we:
- Import the necessary libraries (json, torch, transformers)
- Load a PhoBERT (RoBERTa-architecture) sentiment model and its tokenizer
- Define a utility function for loading JSON data

The model used is "wonrax/phobert-base-vietnamese-sentiment", a PhoBERT model fine-tuned for Vietnamese sentiment classification.


In [None]:
import json
import torch
from transformers import RobertaForSequenceClassification, AutoTokenizer

# Load the model and tokenizer
model = RobertaForSequenceClassification.from_pretrained("wonrax/phobert-base-vietnamese-sentiment")
tokenizer = AutoTokenizer.from_pretrained("wonrax/phobert-base-vietnamese-sentiment", use_fast=False)

def load_json_data(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return json.load(file)
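
The later cells read the keys `GeneralFeedbackID`, `ID`, `Content`, and `Aspects` from every record, and `AspectCategory` and `AspectTerms` from every aspect. As a minimal sketch, the snippet below writes one record in that shape to a temporary file and reads it back with the same loading logic. The field values, category names, and file name are invented for illustration and do not come from the real dataset.

```python
import json
import os
import tempfile


def load_json_data(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return json.load(file)


# Hypothetical record: the keys match what the later cells read,
# the values are made up for this example.
sample_records = [
    {
        "GeneralFeedbackID": 1,
        "ID": 1,
        "Content": "Giảng viên nhiệt tình nhưng phòng học quá nóng.",
        "Aspects": [
            {"AspectCategory": "Lecturer", "AspectTerms": ["Giảng viên"]},
            {"AspectCategory": "Facility", "AspectTerms": ["phòng học"]},
        ],
    }
]

# Write the sample to a throwaway file, then load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_labelling.json")
    with open(sample_path, "w", encoding="utf-8") as file:
        json.dump(sample_records, file, ensure_ascii=False, indent=2)
    loaded = load_json_data(sample_path)

print(loaded[0]["Aspects"][0]["AspectTerms"])
```

Opening the file with `encoding='utf-8'` matters here because Vietnamese diacritics are stored unescaped when `ensure_ascii=False` is used.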


## SECTION 2: Sentiment Analysis Functions
This section includes functions for:
- Analyzing sentiment of a given text
- Determining the dominant sentiment
- Mapping sentiment to polarity (NEG, POS, NEU)

The `analyze_sentiment` function runs the loaded model on a text and returns the softmax probability of each class (negative, positive, neutral). `get_dominant_sentiment` picks the class with the highest probability, and `sentiment_to_polarity` maps that class name to its polarity code, defaulting to NEU for any unrecognized name.


In [None]:
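def analyze_sentiment(text):
    # Truncate to 256 tokens (PhoBERT-base's maximum input length) so long texts do not overflow the model
    input_ids = torch.tensor([tokenizer.encode(text, truncation=True, max_length=256)])
    with torch.no_grad():
        out = model(input_ids)
        probs = out.logits.softmax(dim=-1).tolist()[0]
    # Output class order used here: 0 = negative, 1 = positive, 2 = neutral
    return {
        "Negative": probs[0],
        "Positive": probs[1],
        "Neutral": probs[2]
    }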

def get_dominant_sentiment(sentiment_dict):
    return max(sentiment_dict, key=sentiment_dict.get)

def sentiment_to_polarity(sentiment):
    mapping = {
        "negative": "NEG",
        "positive": "POS",
        "neutral": "NEU"
    }
    return mapping.get(sentiment.lower(), "NEU")
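
The probability-to-polarity step can be tried without downloading the model. The sketch below uses a pure-Python softmax and assumes the same class order as `analyze_sentiment` (negative, positive, neutral). The helper `logits_to_polarity` and the example logits are hypothetical and exist only for this illustration.

```python
import math

# Assumed class order, matching analyze_sentiment: 0 = negative, 1 = positive, 2 = neutral.
LABELS = ["Negative", "Positive", "Neutral"]
POLARITY = {"negative": "NEG", "positive": "POS", "neutral": "NEU"}


def softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_polarity(logits):
    # Hypothetical helper: name each probability, take the largest, map it to a code.
    probs = dict(zip(LABELS, softmax(logits)))
    dominant = max(probs, key=probs.get)
    return POLARITY[dominant.lower()], round(probs[dominant], 3)


# Invented logits for illustration; real values come from model(input_ids).logits.
example_logits = [-1.0, 3.0, 0.5]
polarity, score = logits_to_polarity(example_logits)
print(polarity, score)
```

Because only the largest probability is kept, a near-tie between two classes looks the same in the output as a confident prediction, apart from the lower score.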


## SECTION 3: Feedback Processing
The `process_feedback` function is the core of the ABSA method. It:
- Iterates through aspects and their terms in the feedback
- Extracts a context window of 20 characters on either side of each aspect term's first occurrence in the content
- Analyzes sentiment for each context
- Generates results including aspect category, terms, polarity, and sentiment score

This function implements the aspect-based approach by analyzing sentiment specifically for the context of each aspect term.


In [None]:
def process_feedback(feedback):
    content = feedback['Content']
    results = []
    
    for aspect in feedback['Aspects']:
        for term in aspect['AspectTerms']:
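            # Context window: 20 characters on each side of the term's first occurrence
            index = content.find(term)
            if index == -1:
                # str.find returns -1 when the term is absent; fall back to the full content
                context = content
            else:
                start = max(0, index - 20)
                end = min(len(content), index + len(term) + 20)
                context = content[start:end]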
            
            sentiment = analyze_sentiment(context)
            dominant_sentiment = get_dominant_sentiment(sentiment)
            dominant_score = round(sentiment[dominant_sentiment], 3)
            
            results.append({
                'AspectCategory': aspect['AspectCategory'],
                'AspectTerms': term,
                'Polarity': sentiment_to_polarity(dominant_sentiment),
                'DominantScore': dominant_score
            })
    
    return results
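
A fixed character window can cut a Vietnamese syllable in half, and `str.find` returns -1 when the term does not occur in the text. The sketch below shows one possible way to handle both cases. It assumes a hypothetical helper `extract_context` that is not part of the original notebook: the window is widened to the nearest whitespace, and the full text is returned when the term is missing.

```python
def extract_context(content, term, window=20):
    """Return text around the first occurrence of `term`, widened to whole words.

    Hypothetical helper for illustration. Falls back to the full content
    when the term does not occur in it.
    """
    index = content.find(term)
    if index == -1:
        return content

    start = max(0, index - window)
    end = min(len(content), index + len(term) + window)

    # Move the left edge back until it sits at the start of a word.
    while start > 0 and not content[start - 1].isspace():
        start -= 1
    # Move the right edge forward until it sits at the end of a word.
    while end < len(content) and not content[end].isspace():
        end += 1

    return content[start:end]


# Invented example sentence.
sample = "Giảng viên nhiệt tình nhưng phòng học quá nóng và thiếu ánh sáng."
print(extract_context(sample, "phòng học", window=5))
```

Widening to whitespace keeps whole syllables in the text passed to the tokenizer. Whether that changes the predicted polarity would have to be checked on the actual data.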


## SECTION 4: Data Processing and Output
The final section of the script:
- Loads JSON data from 'manual_labelling.json'
- Processes each feedback entry using the ABSA method
- Structures the results into a new format
- Writes the processed data to 'processed.json'

This section ties together all the previous functions to process a dataset and output the results in a structured JSON format.


In [None]:
# Load and process the JSON data
json_data = load_json_data('manual_labelling.json')

# Process each feedback entry
processed_data = []
for feedback in json_data:
    processed_feedback = {
        'GeneralFeedbackID': feedback['GeneralFeedbackID'],
        'ID': feedback['ID'],
        'Content': feedback['Content'],
        'Aspects': process_feedback(feedback)
    }
    processed_data.append(processed_feedback)

# Write the processed data to a JSON file
with open('processed.json', 'w', encoding='utf-8') as file:
    json.dump(processed_data, file, ensure_ascii=False, indent=2)

print("Processing complete. Results saved to 'processed.json'.")
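
Each entry written to 'processed.json' carries a list of aspect results with the keys `AspectCategory`, `AspectTerms`, `Polarity`, and `DominantScore`. As a minimal sketch of reading that structure, the snippet below counts polarities per aspect category. The helper `polarity_counts`, the category names, and the scores are hypothetical and are not output from the real model.

```python
from collections import Counter, defaultdict

# Hypothetical records in the shape written to processed.json; all values are invented.
sample_processed = [
    {
        "GeneralFeedbackID": 1,
        "ID": 1,
        "Content": "Giảng viên nhiệt tình nhưng phòng học quá nóng.",
        "Aspects": [
            {"AspectCategory": "Lecturer", "AspectTerms": "Giảng viên",
             "Polarity": "POS", "DominantScore": 0.9},
            {"AspectCategory": "Facility", "AspectTerms": "phòng học",
             "Polarity": "NEG", "DominantScore": 0.8},
        ],
    },
    {
        "GeneralFeedbackID": 2,
        "ID": 2,
        "Content": "Giảng viên giải thích rất dễ hiểu.",
        "Aspects": [
            {"AspectCategory": "Lecturer", "AspectTerms": "Giảng viên",
             "Polarity": "POS", "DominantScore": 0.7},
        ],
    },
]


def polarity_counts(processed):
    # Tally how often each polarity code appears for every aspect category.
    counts = defaultdict(Counter)
    for entry in processed:
        for aspect in entry["Aspects"]:
            counts[aspect["AspectCategory"]][aspect["Polarity"]] += 1
    return {category: dict(tally) for category, tally in counts.items()}


summary = polarity_counts(sample_processed)
print(summary)
```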