# **Aspect-Based Sentiment Analysis (ABSA) for Product Reviews**

**Objective:** Build a system that identifies specific aspects (e.g., "battery life," "screen quality," "price") in product reviews and assigns a sentiment to each aspect.

In [1]:
# importing necessary libraries and loading the dataset
import pandas as pd
import numpy as np

df = pd.read_csv(r"C:\P_Project_1\ABSA\dataset\flipkart_reviews_dataset.csv")
df.head(10)

Unnamed: 0,product_id,product_title,rating,summary,review,location,date,upvotes,downvotes
0,ACCFZGAQJGYCYDCM,BoAt Rockerz 235v2 with ASAP charging Version ...,5,Terrific purchase,1-more flexible2-bass is very high3-sound clar...,Shirala,8 months ago,1390,276
1,ACCFZGAQJGYCYDCM,BoAt Rockerz 235v2 with ASAP charging Version ...,5,Terrific purchase,Super sound and good looking I like that prize,Visakhapatnam,8 months ago,643,133
2,ACCFZGAQJGYCYDCM,BoAt Rockerz 235v2 with ASAP charging Version ...,5,Super!,Very much satisfied with the device at this pr...,Kozhikode,"Feb, 2020",1449,328
3,ACCFZGAQJGYCYDCM,BoAt Rockerz 235v2 with ASAP charging Version ...,5,Super!,"Nice headphone, bass was very good and sound i...",Jaora,7 months ago,160,28
4,ACCFZGAQJGYCYDCM,BoAt Rockerz 235v2 with ASAP charging Version ...,5,Terrific purchase,Sound quality super battery backup super quali...,New Delhi,8 months ago,533,114
5,ACCFZGAQJGYCYDCM,BoAt Rockerz 235v2 with ASAP charging Version ...,5,Wonderful,"Wowwww it's amezing bluetooth nice look, nice ...",Bengaluru,8 months ago,172,37
6,ACCFZGAQJGYCYDCM,BoAt Rockerz 235v2 with ASAP charging Version ...,4,Pretty good,Awesome colour! Amazing experience .. but only...,Robertsonpet,8 months ago,206,46
7,ACCFZGAQJGYCYDCM,BoAt Rockerz 235v2 with ASAP charging Version ...,5,Terrific purchase,"For the first time, I am posting a review, jus...",Bhadreswar,"Feb, 2020",616,182
8,ACCFZGAQJGYCYDCM,BoAt Rockerz 235v2 with ASAP charging Version ...,4,Delightful,First of all the Delivery boy is a good guy. N...,Tirupati,"Mar, 2020",232,66
9,ACCFZGAQJGYCYDCM,BoAt Rockerz 235v2 with ASAP charging Version ...,1,Worthless,This headphone is good but not that much as i ...,Firozabad District,8 months ago,265,83


In [2]:
df.shape

(9374, 9)

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9374 entries, 0 to 9373
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   product_id     9374 non-null   object
 1   product_title  9374 non-null   object
 2   rating         9374 non-null   int64 
 3   summary        9374 non-null   object
 4   review         9374 non-null   object
 5   location       8081 non-null   object
 6   date           9374 non-null   object
 7   upvotes        9374 non-null   int64 
 8   downvotes      9374 non-null   int64 
dtypes: int64(3), object(6)
memory usage: 659.2+ KB


**Data types:**
- int64 (3 columns) → rating, upvotes, downvotes
- object (6 columns) → product_id, product_title, summary, review, location, date

**Missing values:**
- The location column has only 8081 non-null values, meaning 1293 entries are missing.
- All other columns are complete (no missing values).
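
The missing-value figure follows directly from the `df.info()` output. As a quick sanity check of the arithmetic, using only the counts shown above (the variable names are illustrative):

```python
# Counts taken from the df.info() output above.
total_rows = 9374
non_null_location = 8081

# Rows where location is missing, and their share of the dataset.
missing_location = total_rows - non_null_location
missing_share = missing_location / total_rows

print(missing_location)           # 1293
print(f"{missing_share:.1%}")     # 13.8%
```

Roughly one review in seven has no location, which is why these rows are filled rather than dropped in the next step.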

# **Performing EDA** 

Null values in the location column are replaced with the keyword 'Unknown'. This keeps every review, together with its aspects and sentiments, available for consistent analysis. Retaining the null values would also have produced N/A entries in visualizations, which hurts readability and interpretation.

In [4]:
# checking for missing values
df.isna().sum()

product_id          0
product_title       0
rating              0
summary             0
review              0
location         1293
date                0
upvotes             0
downvotes           0
dtype: int64

In [5]:
# replacing null values (assigning the result back avoids the pandas chained-assignment FutureWarning)
df['location'] = df['location'].fillna('Unknown')
display(df.isna().sum())

product_id       0
product_title    0
rating           0
summary          0
review           0
location         0
date             0
upvotes          0
downvotes        0
dtype: int64

In [6]:
# checking for duplicate values
df['review'].duplicated().sum()

np.int64(2041)

In [7]:
# removing duplicate values
df.drop_duplicates(subset=['review'], inplace=True)

In [8]:
# verifying if duplicates are removed
df['review'].duplicated().any()

np.False_

In [9]:
# final shape of the dataframe
df.shape

(7333, 9)
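
Removing 2041 duplicates from 9374 rows leaves the 7333 rows shown above. Note that `subset=['review']` compares the review text only, so an identical short review left on two different products counts as a duplicate. The sketch below illustrates that behaviour with plain Python; the rows and the `dedupe` helper are hypothetical and stand in for `drop_duplicates`, which also keeps the first occurrence by default.

```python
# Hypothetical (product, review) rows for illustration only.
rows = [
    ("product A", "good"),
    ("product B", "good"),
    ("product B", "good"),
    ("product B", "nice bass"),
]

def dedupe(rows, key):
    """Keep the first row for each distinct key, preserving order."""
    seen, kept = set(), []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            kept.append(row)
    return kept

# Key on review text only, as in drop_duplicates(subset=['review']).
by_text = dedupe(rows, key=lambda r: r[1])

# Key on product and text together, a stricter alternative.
by_product_and_text = dedupe(rows, key=lambda r: r)

print(len(by_text))              # 2
print(len(by_product_and_text))  # 3
```

Whether cross-product duplicates should be kept depends on the analysis; if per-product counts matter, deduplicating on both columns may be the safer choice.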

In [9]:
# dropping unnecessary columns
df.drop(columns=['product_id', 'rating', 'summary'], inplace=True)
df.head(10)


Unnamed: 0,product_title,review,location,date,upvotes,downvotes
0,BoAt Rockerz 235v2 with ASAP charging Version ...,1-more flexible2-bass is very high3-sound clar...,Shirala,8 months ago,1390,276
1,BoAt Rockerz 235v2 with ASAP charging Version ...,Super sound and good looking I like that prize,Visakhapatnam,8 months ago,643,133
2,BoAt Rockerz 235v2 with ASAP charging Version ...,Very much satisfied with the device at this pr...,Kozhikode,"Feb, 2020",1449,328
3,BoAt Rockerz 235v2 with ASAP charging Version ...,"Nice headphone, bass was very good and sound i...",Jaora,7 months ago,160,28
4,BoAt Rockerz 235v2 with ASAP charging Version ...,Sound quality super battery backup super quali...,New Delhi,8 months ago,533,114
5,BoAt Rockerz 235v2 with ASAP charging Version ...,"Wowwww it's amezing bluetooth nice look, nice ...",Bengaluru,8 months ago,172,37
6,BoAt Rockerz 235v2 with ASAP charging Version ...,Awesome colour! Amazing experience .. but only...,Robertsonpet,8 months ago,206,46
7,BoAt Rockerz 235v2 with ASAP charging Version ...,"For the first time, I am posting a review, jus...",Bhadreswar,"Feb, 2020",616,182
8,BoAt Rockerz 235v2 with ASAP charging Version ...,First of all the Delivery boy is a good guy. N...,Tirupati,"Mar, 2020",232,66
9,BoAt Rockerz 235v2 with ASAP charging Version ...,This headphone is good but not that much as i ...,Firozabad District,8 months ago,265,83


In [10]:
# checking the distribution of product titles
df["product_title"].value_counts()

product_title
BoAt BassHeads 100 Wired Headset                                       968
realme Buds 2 Wired Headset                                            954
OnePlus Bullets Wireless Z Bluetooth Headset                           939
realme Buds Wireless Bluetooth Headset                                 936
BoAt Rockerz 235v2 with ASAP charging Version 5.0 Bluetooth Headset    880
BoAt Airdopes 131 Bluetooth Headset                                    710
OnePlus Bullets Wireless Z Bass Edition Bluetooth Headset              703
realme Buds Q Bluetooth Headset                                        660
U&I Titanic Series - Low Price Bluetooth Neckband Bluetooth Headset    583
Name: count, dtype: int64

In [11]:
# renaming product titles for better readability
# (assigning the result back avoids the pandas chained-assignment FutureWarning)
df["product_title"] = df["product_title"].replace({
    'BoAt Rockerz 235v2 with ASAP charging Version 5.0 Bluetooth Headset': 'BoAt_Rockerz_235v2',
    'BoAt BassHeads 100 Wired Headset': 'BoAt_BassHeads_100_Wired',
    'BoAt Airdopes 131 Bluetooth Headset': 'BoAt_Airdopes_131',
    'OnePlus Bullets Wireless Z Bluetooth Headset': 'OnePlus_Bullets_Wireless_Z',
    'OnePlus Bullets Wireless Z Bass Edition Bluetooth Headset': 'OnePlus_Bullets_Wireless_Z_Bass_Edition',
    'realme Buds Wireless Bluetooth Headset': 'realme_Buds_Wireless',
    'realme Buds 2 Wired Headset': 'realme_Buds_2',
    'realme Buds Q Bluetooth Headset': 'realme_Buds_Q',
    'U&I Titanic Series - Low Price Bluetooth Neckband Bluetooth Headset': 'U&I_Titanic_Series',
})

In [12]:
# verifying the renamed product titles
df["product_title"].value_counts()

product_title
BoAt_BassHeads_100_Wired                   968
realme_Buds_2                              954
OnePlus_Bullets_Wireless_Z                 939
realme_Buds_Wireless                       936
BoAt_Rockerz_235v2                         880
BoAt_Airdopes_131                          710
OnePlus_Bullets_Wireless_Z_Bass_Edition    703
realme_Buds_Q                              660
U&I_Titanic_Series                         583
Name: count, dtype: int64

In [13]:
df.shape

(7333, 6)

In [14]:
# Data Cleaning and Preprocessing for Aspect Extraction & Sentiment Classification

import pandas as pd
import re

def clean_review(text):
    """
    Clean and preprocess review text for aspect extraction & sentiment classification.
    """
    if not isinstance(text, str):  # handle NaN or non-string values
        return ""

    # Lowercase
    text = text.lower()

    # Remove URLs
    text = re.sub(r"http\S+|www\S+|https\S+", "", text, flags=re.MULTILINE)

    # Remove HTML tags
    text = re.sub(r"<.*?>", "", text)

    # Replace numbers with space
    text = re.sub(r"\d+", " ", text)

    # Replace hyphens and underscores with space
    text = re.sub(r"[-_]", " ", text)

    # Remove any character that is not a letter or space
    text = re.sub(r"[^a-z\s]", " ", text)

    # Collapse multiple spaces into one & strip
    text = re.sub(r"\s+", " ", text).strip()

    return text


def preprocess_reviews(df, column_name="review"):
    """
    Takes a dataframe and returns it with a new column 'cleaned_review'.
    """
    df["cleaned_review"] = df[column_name].apply(clean_review)
    return df


# Example usage
if __name__ == "__main__":

    df = preprocess_reviews(df, "review")
    print(df)

                 product_title  \
0           BoAt_Rockerz_235v2   
1           BoAt_Rockerz_235v2   
2           BoAt_Rockerz_235v2   
3           BoAt_Rockerz_235v2   
4           BoAt_Rockerz_235v2   
...                        ...   
9369  BoAt_BassHeads_100_Wired   
9370  BoAt_BassHeads_100_Wired   
9371  BoAt_BassHeads_100_Wired   
9372  BoAt_BassHeads_100_Wired   
9373  BoAt_BassHeads_100_Wired   

                                                 review         location  \
0     1-more flexible2-bass is very high3-sound clar...          Shirala   
1        Super sound and good looking I like that prize    Visakhapatnam   
2     Very much satisfied with the device at this pr...        Kozhikode   
3     Nice headphone, bass was very good and sound i...            Jaora   
4     Sound quality super battery backup super quali...        New Delhi   
...                                                 ...              ...   
9369  this head phnes give good base in pluged ears ...    
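
To make the effect of the cleaning steps concrete, here is a small sketch that applies the same regex rules to a few sample strings. `clean_review_sketch` is a condensed stand-in for `clean_review` (the digit and hyphen rules are folded into the single "not a letter or space" rule, which gives the same result), and the sample strings are made up for illustration.

```python
import re

def clean_review_sketch(text):
    """Condensed version of the clean_review steps, for illustration."""
    if not isinstance(text, str):          # NaN or other non-string values
        return ""
    text = text.lower()
    text = re.sub(r"http\S+|www\S+", "", text)   # drop URLs
    text = re.sub(r"<.*?>", "", text)            # drop HTML tags
    text = re.sub(r"[^a-z\s]", " ", text)        # digits, hyphens, punctuation -> space
    return re.sub(r"\s+", " ", text).strip()     # collapse whitespace

samples = [
    "1-more flexible2-bass is very high",
    "It's great, isn't it?",
    "Battery life: 8-10 hrs. See https://example.com",
    None,
]
for sample in samples:
    print(repr(clean_review_sketch(sample)))
# 'more flexible bass is very high'
# 'it s great isn t it'
# 'battery life hrs see'
# ''
```

Two side effects are worth keeping in mind: contractions are split ("it's" becomes "it s"), and all sentence punctuation and numbers are removed, so any later step that relies on sentence boundaries or numeric values will not find them in `cleaned_review`.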

# **Aspect Extraction and Sentiment Classification Pipeline**

In [15]:
#%pip install pyabsa
import pandas as pd
import os
import re
import json
import torch
from transformers import pipeline, AutoTokenizer
from pyabsa import AspectTermExtraction as ATEPC


def aspect_sentiment_pipeline(
    input_data,
    csv_column='review_text',
    output_file='aspect_sentiment_results.json',
    max_chunk_length=256
):
    """
    Perform aspect extraction and sentiment classification on reviews from a CSV file or list.
    Uses the GPU (CUDA) when available: removes the manual batch loop but keeps review chunking.
    """

    # ---- PATCH: Load tokenizer and add missing bos/eos tokens ----
    tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
    if tokenizer.bos_token is None:
        tokenizer.add_special_tokens({"bos_token": "[CLS]"})
    if tokenizer.eos_token is None:
        tokenizer.add_special_tokens({"eos_token": "[SEP]"})
    # --------------------------------------------------------------

    # Load aspect extractor (auto_device selects the GPU if one is available)
    aspect_extractor = ATEPC.AspectExtractor(
        "english",
        auto_device=True,   # will auto-select GPU if available
        tokenizer=tokenizer,
        sentiment_model="cardiffnlp/twitter-roberta-base-sentiment"
    )

    # Standalone fallback sentiment classifier (GPU if available, otherwise CPU)
    sentiment_classifier = pipeline(
        "text-classification",
        model="cardiffnlp/twitter-roberta-base-sentiment",
        device=0 if torch.cuda.is_available() else -1  # device=0 alone fails on CPU-only machines
    )

    label_map = {
        "LABEL_0": "Negative",
        "LABEL_1": "Neutral",
        "LABEL_2": "Positive"
    }

    # Handle input: CSV or list
    if isinstance(input_data, str):
        if not os.path.exists(input_data):
            raise FileNotFoundError(f"File '{input_data}' does not exist")
        df = pd.read_csv(input_data)
        if csv_column not in df.columns:
            raise ValueError(f"CSV file does not contain '{csv_column}' column. Available: {list(df.columns)}")
        reviews = df[csv_column].astype(str).tolist()
    else:
        reviews = [str(r) for r in input_data if isinstance(r, str)]
        if not reviews:
            raise ValueError("No valid string reviews provided.")

    # ---- Preprocess reviews (keep chunks for long reviews) ----
    def split_review(review):
        review = re.sub(r'[^\x00-\x7F]+', ' ', review)
        sentences = re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?|\!)\s', review)
        chunks, current_chunk = [], ""
        for sentence in sentences:
            if len(current_chunk) + len(sentence) > max_chunk_length:
                if current_chunk:
                    chunks.append(current_chunk.strip())
                current_chunk = sentence
            else:
                current_chunk += " " + sentence
        if current_chunk:
            chunks.append(current_chunk.strip())
        return chunks if chunks else [review]

    processed_reviews, original_to_chunks_map = [], []
    for review in reviews:
        chunks = split_review(review) if len(review) > max_chunk_length * 2 else [review]
        processed_reviews.extend(chunks)
        original_to_chunks_map.append((len(chunks), review))

    # ---- GPU-parallel aspect extraction (no batch loop) ----
    aspect_results = aspect_extractor.predict(
        processed_reviews,
        print_result=False,
        pred_sentiment=True
    )

    # ---- Aggregate results per original review ----
    final_results, chunk_index, empty_aspect_count = [], 0, 0
    for num_chunks, original_review in original_to_chunks_map:
        aspect_to_sentiment = {}
        for _ in range(num_chunks):
            if chunk_index < len(aspect_results):
                chunk_result = aspect_results[chunk_index]
                aspects = chunk_result.get('aspect', [])
                sentiments = chunk_result.get('sentiment', [])
                if not aspects:
                    empty_aspect_count += 1
                for a, s in zip(aspects, sentiments):
                    aspect_to_sentiment[a] = label_map.get(s, s)
                chunk_index += 1

        aggregated_aspects = list(aspect_to_sentiment.keys())
        aggregated_sentiments = [aspect_to_sentiment[a] for a in aggregated_aspects]

        # Fallback: no aspects found → classify overall sentiment
        if not aggregated_aspects:
            aggregated_aspects = ["overall product"]
            try:
                result = sentiment_classifier(original_review, text_pair="overall product")
                aggregated_sentiments = [label_map.get(result[0]['label'], "Neutral")]
            except Exception:
                aggregated_sentiments = ["Neutral"]
            empty_aspect_count += 1

        final_results.append({
            "review": original_review,
            "aspects": aggregated_aspects,
            "sentiments": aggregated_sentiments
        })

    # ---- Save results ----
    with open(output_file, 'w') as f:
        json.dump(final_results, f, indent=2)

    return final_results
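
The chunking step inside `aspect_sentiment_pipeline` can be tried in isolation. The sketch below mirrors the accumulation logic of `split_review` but, as a simplifying assumption, uses a plainer sentence-boundary regex (split after `.`, `?` or `!`); `split_into_chunks` and the sample strings are illustrative names, not part of the notebook.

```python
import re

def split_into_chunks(review, max_chunk_length=256):
    """Group sentences into chunks of roughly max_chunk_length characters."""
    # Simplified boundary rule (assumption): split after ., ? or ! plus whitespace.
    sentences = re.split(r"(?<=[.?!])\s+", review)
    chunks, current = [], ""
    for sentence in sentences:
        if len(current) + len(sentence) > max_chunk_length:
            if current:
                chunks.append(current.strip())
            current = sentence          # start a new chunk with this sentence
        else:
            current += " " + sentence   # keep filling the current chunk
    if current:
        chunks.append(current.strip())
    return chunks if chunks else [review]

raw = "Sound is great. Battery lasts long. The cable broke!"
cleaned = "sound is great battery lasts long the cable broke"

print(split_into_chunks(raw, max_chunk_length=40))
# ['Sound is great. Battery lasts long.', 'The cable broke!']
print(split_into_chunks(cleaned, max_chunk_length=40))
# ['sound is great battery lasts long the cable broke']
```

The second call shows a consequence of the design: text without sentence punctuation has no boundaries to split on, so it comes back as a single chunk even when it is longer than `max_chunk_length`. Since `cleaned_review` has its punctuation stripped, long cleaned reviews are likely passed to the extractor unsplit; feeding the original `review` column to the pipeline would be one way to keep chunking effective. In the pipeline itself, chunking is only attempted for reviews longer than `max_chunk_length * 2`.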

Processing the Cleaned DataFrame with aspect extraction and sentiment classification


In [16]:
# Optional: if aspect_sentiment_pipeline lives in a separate Python file, import it here
# and adjust the import statement to your file structure.
def process_reviews_to_dataframe(input_df, category_column='category', review_column='review_text'):
    """
    Process a DataFrame with product categories and reviews to extract aspects and sentiments.
    Returns a new DataFrame with category, review_text, aspects, and sentiments.
    """


    # Extract aspects and sentiments using the pipeline
    reviews = input_df[review_column].tolist()
    results = aspect_sentiment_pipeline(reviews, csv_column=review_column)

    # Create output DataFrame
    output_data = []
    for i, (row, result) in enumerate(zip(input_df.itertuples(), results)):
        if result["review"] != str(getattr(row, review_column)):
            print(f"Mismatch at index {i}: Expected review '{getattr(row, review_column)}', got '{result['review']}'")
        output_data.append({
            category_column: getattr(row, category_column),
            review_column: result["review"],
            "aspects": result["aspects"],
            "sentiments": result["sentiments"]
        })

    output_df = pd.DataFrame(output_data)
    print(f"Created output DataFrame with {len(output_df)} rows")

    # Optional: save the output DataFrame to CSV (disabled)
    # output_file = "processed_reviews_output.csv"
    # try:
    #     output_df.to_csv(output_file, index=False)
    #     print(f"Output DataFrame saved to {output_file}")
    # except Exception as e:
    #     print(f"Error saving output DataFrame to CSV: {e}")

    return output_df

# Example usage
if __name__ == "__main__":


    # Process the DataFrame
    output_df = process_reviews_to_dataframe(df, category_column="product_title", review_column="cleaned_review")
    print("\nOutput DataFrame (first 5 rows):")
    print(output_df.head())



[2025-09-07 22:53:06] (2.4.2) ********** Available ATEPC model checkpoints for Version:2.4.2 (this version) **********
[2025-09-07 22:53:06] (2.4.2) Downloading checkpoint:english 
[2025-09-07 22:53:06] (2.4.2) Notice: The pretrained model are used for testing, it is recommended to train the model on your own custom datasets
[2025-09-07 22:53:06] (2.4.2) Checkpoint already downloaded, skip
[2025-09-07 22:53:06] (2.4.2) Load aspect extractor from checkpoints\ATEPC_ENGLISH_CHECKPOINT\fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43
[2025-09-07 22:53:06] (2.4.2) config: checkpoints\ATEPC_ENGLISH_CHECKPOINT\fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43\fast_lcf_atepc.config
[2025-09-07 22:53:06] (2.4.2) state_dict: checkpoints\ATEPC_ENGLISH_CHECKPOINT\fast_lcf_atepc_English_cdw_apcacc_82.36_apcf1_81.89_atef1_75.43\fast_lcf_atepc.state_dict


Device set to use cuda:0
preparing ate inference dataloader: 100%|██████████| 7333/7333 [00:07<00:00, 958.95it/s] 
extracting aspect terms: 100%|██████████| 230/230 [13:48<00:00,  3.60s/it]   
preparing apc inference dataloader: 100%|██████████| 14010/14010 [00:28<00:00, 488.17it/s]
classifying aspect sentiments: 100%|██████████| 438/438 [26:49<00:00,  3.67s/it]  


[2025-09-07 23:34:47] (2.4.2) The results of aspect term extraction have been saved in c:\P_Project_1\ABSA\Aspect Term Extraction and Polarity Classification.FAST_LCF_ATEPC.result.json


You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


Created output DataFrame with 7333 rows

Output DataFrame (first 5 rows):
        product_title                                     cleaned_review  \
0  BoAt_Rockerz_235v2  more flexible bass is very high sound clarity ...   
1  BoAt_Rockerz_235v2     super sound and good looking i like that prize   
2  BoAt_Rockerz_235v2  very much satisfied with the device at this pr...   
3  BoAt_Rockerz_235v2  nice headphone bass was very good and sound is...   
4  BoAt_Rockerz_235v2  sound quality super battery backup super quali...   

                                             aspects  \
0  [bass, sound clarity, battery back, charging s...   
1                                   [sound, looking]   
2               [design, bluetooth, vibration motor]   
3                      [bass, sound, battery backup]   
4                    [sound quality, battery backup]   

                                          sentiments  
0  [Negative, Positive, Positive, Positive, Posit...  
1                     

In [17]:
# Displaying the first 10 rows of the output DataFrame
output_df.head(10)

Unnamed: 0,product_title,cleaned_review,aspects,sentiments
0,BoAt_Rockerz_235v2,more flexible bass is very high sound clarity ...,"[bass, sound clarity, battery back, charging s...","[Negative, Positive, Positive, Positive, Posit..."
1,BoAt_Rockerz_235v2,super sound and good looking i like that prize,"[sound, looking]","[Positive, Positive]"
2,BoAt_Rockerz_235v2,very much satisfied with the device at this pr...,"[design, bluetooth, vibration motor]","[Positive, Positive, Positive]"
3,BoAt_Rockerz_235v2,nice headphone bass was very good and sound is...,"[bass, sound, battery backup]","[Positive, Positive, Positive]"
4,BoAt_Rockerz_235v2,sound quality super battery backup super quali...,"[sound quality, battery backup]","[Positive, Positive]"
5,BoAt_Rockerz_235v2,wo it s amezing bluetooth nice look nice price...,"[bluetooth, look, price, battery back up]","[Positive, Positive, Positive, Positive]"
6,BoAt_Rockerz_235v2,awesome colour amazing experience but only the...,"[colour, charging data cable, adjustable clip]","[Positive, Negative, Negative]"
7,BoAt_Rockerz_235v2,for the first time i am posting a review just ...,"[engineering, bass, comfort]","[Positive, Negative, Positive]"
8,BoAt_Rockerz_235v2,first of all the delivery boy is a good guy ni...,"[delivery, responce, packing, color]","[Positive, Positive, Positive, Negative]"
9,BoAt_Rockerz_235v2,this headphone is good but not that much as i ...,"[battery backup, sound quality, pubg]","[Positive, Positive, Negative]"


In [18]:
# Adding a new column with aspect-sentiment dictionary for easier analysis
output_df["aspect_sentiment_dict"] = output_df.apply(
    lambda row: dict(zip(row["aspects"], row["sentiments"])), axis=1
)

output_df.head(10)

Unnamed: 0,product_title,cleaned_review,aspects,sentiments,aspect_sentiment_dict
0,BoAt_Rockerz_235v2,more flexible bass is very high sound clarity ...,"[bass, sound clarity, battery back, charging s...","[Negative, Positive, Positive, Positive, Posit...","{'bass': 'Negative', 'sound clarity': 'Positiv..."
1,BoAt_Rockerz_235v2,super sound and good looking i like that prize,"[sound, looking]","[Positive, Positive]","{'sound': 'Positive', 'looking': 'Positive'}"
2,BoAt_Rockerz_235v2,very much satisfied with the device at this pr...,"[design, bluetooth, vibration motor]","[Positive, Positive, Positive]","{'design': 'Positive', 'bluetooth': 'Positive'..."
3,BoAt_Rockerz_235v2,nice headphone bass was very good and sound is...,"[bass, sound, battery backup]","[Positive, Positive, Positive]","{'bass': 'Positive', 'sound': 'Positive', 'bat..."
4,BoAt_Rockerz_235v2,sound quality super battery backup super quali...,"[sound quality, battery backup]","[Positive, Positive]","{'sound quality': 'Positive', 'battery backup'..."
5,BoAt_Rockerz_235v2,wo it s amezing bluetooth nice look nice price...,"[bluetooth, look, price, battery back up]","[Positive, Positive, Positive, Positive]","{'bluetooth': 'Positive', 'look': 'Positive', ..."
6,BoAt_Rockerz_235v2,awesome colour amazing experience but only the...,"[colour, charging data cable, adjustable clip]","[Positive, Negative, Negative]","{'colour': 'Positive', 'charging data cable': ..."
7,BoAt_Rockerz_235v2,for the first time i am posting a review just ...,"[engineering, bass, comfort]","[Positive, Negative, Positive]","{'engineering': 'Positive', 'bass': 'Negative'..."
8,BoAt_Rockerz_235v2,first of all the delivery boy is a good guy ni...,"[delivery, responce, packing, color]","[Positive, Positive, Positive, Negative]","{'delivery': 'Positive', 'responce': 'Positive..."
9,BoAt_Rockerz_235v2,this headphone is good but not that much as i ...,"[battery backup, sound quality, pubg]","[Positive, Positive, Negative]","{'battery backup': 'Positive', 'sound quality'..."
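
As one example of the "easier analysis" the dictionary column allows, the sketch below tallies sentiments per product and aspect using only the standard library. The three rows are copied from the output above (rows 3, 4 and 7); the `rows` list and `counts` structure are illustrative, and with the real data one would iterate over `output_df` instead.

```python
from collections import Counter, defaultdict

# Three rows taken from the output above (indices 3, 4 and 7).
rows = [
    {"product_title": "BoAt_Rockerz_235v2",
     "aspect_sentiment_dict": {"bass": "Positive", "sound": "Positive",
                               "battery backup": "Positive"}},
    {"product_title": "BoAt_Rockerz_235v2",
     "aspect_sentiment_dict": {"sound quality": "Positive",
                               "battery backup": "Positive"}},
    {"product_title": "BoAt_Rockerz_235v2",
     "aspect_sentiment_dict": {"engineering": "Positive", "bass": "Negative",
                               "comfort": "Positive"}},
]

# (product, aspect) -> Counter of sentiment labels
counts = defaultdict(Counter)
for row in rows:
    for aspect, sentiment in row["aspect_sentiment_dict"].items():
        counts[(row["product_title"], aspect)][sentiment] += 1

for (product, aspect), tally in sorted(counts.items()):
    print(product, aspect, dict(tally))
```

For this sample, "bass" ends up with one Positive and one Negative vote and "battery backup" with two Positive votes. Near-duplicate aspect strings such as "battery back" and "battery back up" are counted separately unless they are normalized first.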


In [19]:
# Ensure the indices are aligned before merging
df_reset = df.reset_index(drop=True)
output_df_reset = output_df.reset_index(drop=True)

# Add the original review text and metadata columns (location, date, votes) from df to output_df
output_df_with_original = output_df_reset.copy()
output_df_with_original['original_review'] = df_reset['review']
output_df_with_original['location'] = df_reset['location']
output_df_with_original['date'] = df_reset['date']
output_df_with_original['upvotes'] = df_reset['upvotes']
output_df_with_original['downvotes'] = df_reset['downvotes']


# Display the head of the new dataframe
display(output_df_with_original.head())

Unnamed: 0,product_title,cleaned_review,aspects,sentiments,aspect_sentiment_dict,original_review,location,date,upvotes,downvotes
0,BoAt_Rockerz_235v2,more flexible bass is very high sound clarity ...,"[bass, sound clarity, battery back, charging s...","[Negative, Positive, Positive, Positive, Posit...","{'bass': 'Negative', 'sound clarity': 'Positiv...",1-more flexible2-bass is very high3-sound clar...,Shirala,8 months ago,1390,276
1,BoAt_Rockerz_235v2,super sound and good looking i like that prize,"[sound, looking]","[Positive, Positive]","{'sound': 'Positive', 'looking': 'Positive'}",Super sound and good looking I like that prize,Visakhapatnam,8 months ago,643,133
2,BoAt_Rockerz_235v2,very much satisfied with the device at this pr...,"[design, bluetooth, vibration motor]","[Positive, Positive, Positive]","{'design': 'Positive', 'bluetooth': 'Positive'...",Very much satisfied with the device at this pr...,Kozhikode,"Feb, 2020",1449,328
3,BoAt_Rockerz_235v2,nice headphone bass was very good and sound is...,"[bass, sound, battery backup]","[Positive, Positive, Positive]","{'bass': 'Positive', 'sound': 'Positive', 'bat...","Nice headphone, bass was very good and sound i...",Jaora,7 months ago,160,28
4,BoAt_Rockerz_235v2,sound quality super battery backup super quali...,"[sound quality, battery backup]","[Positive, Positive]","{'sound quality': 'Positive', 'battery backup'...",Sound quality super battery backup super quali...,New Delhi,8 months ago,533,114
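
The `reset_index(drop=True)` calls matter because pandas aligns column assignment on index labels rather than on row position. After `drop_duplicates`, `df` keeps its original labels (they run up to 9373 for 7333 rows), while `output_df` has a fresh 0-based index. A minimal sketch with toy data, assuming pandas is installed, shows what would go wrong without the reset:

```python
import pandas as pd

# Toy frame whose index has a gap, as happens after drop_duplicates().
df = pd.DataFrame({"review": ["good bass", "good bass", "poor mic"]})
df = df.drop_duplicates(subset=["review"])       # index labels are now [0, 2]

# A freshly built frame gets a default index [0, 1].
output_df = pd.DataFrame({"aspects": [["bass"], ["mic"]]})

# Assignment aligns on labels: label 1 does not exist in df, so it becomes NaN.
misaligned = output_df.copy()
misaligned["original_review"] = df["review"]

# Resetting the index first makes labels and positions agree.
aligned = output_df.copy()
aligned["original_review"] = df.reset_index(drop=True)["review"]

print(misaligned)
print(aligned)
```

In the misaligned frame the second row has no review and "poor mic" is silently lost; in the aligned frame both rows carry the right text.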


In [20]:
# final shape of the dataframe
output_df_with_original.shape

(7333, 10)

In [21]:
# Save the final DataFrame to a CSV file
output_df_with_original.to_csv("final_dataset_reviews.csv", index=False)