# Final Project: Applying NLP

## Project Description
Sentiment analysis: the sentiment expressed in each product review's text will be analyzed and classified into at least three classes.

### 1. Data Collection 

#### 1.1 Collect a dataset of product reviews

- Source data: https://www.kaggle.com/code/mehmetisik/rating-product-sorting-reviews-in-amazon/input

#### 1.2 Annotate the dataset

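- Label each review as positive, negative, or neutral, based on the collected data. Here the label is derived from the `overall` star rating: 4 and above is positive, 3 is neutral, and lower ratings are negative.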
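
Before labeling the full file, it can help to try the rating-to-label rule on a handful of values. The following is a minimal sketch, not part of the original notebook: the toy ratings are made up for illustration, and the guard that leaves missing ratings unlabeled is an added assumption (without it, a missing rating would fall through to the last branch and be counted as negative).

```python
import math
from collections import Counter

def label_sentiment(overall):
    """Map a 1-5 star rating to a sentiment class.

    Assumption (not in the original notebook): a missing rating
    (None or NaN) is left unlabeled instead of being treated as negative.
    """
    if overall is None or (isinstance(overall, float) and math.isnan(overall)):
        return None
    if overall >= 4:
        return "Positive"
    if overall == 3:
        return "Neutral"
    return "Negative"

# Toy ratings for illustration only; they are not taken from the dataset.
toy_ratings = [5.0, 4.0, 3.0, 2.0, 1.0, float("nan")]
toy_labels = [label_sentiment(r) for r in toy_ratings]

# Count how many reviews fall into each class, including unlabeled ones.
label_counts = Counter(toy_labels)

print(toy_labels)
print(label_counts)
```

Counting the labels this way is a quick check on class balance, which is worth knowing before training a classifier on the annotated data.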

In [4]:
import pandas as pd

# Load the CSV file
file_path = "./data/reviews_aws/amazon_review.csv"
try:
    data = pd.read_csv(file_path)
except FileNotFoundError:
    # Stop with a clear message; SystemExit works in scripts and notebooks alike
    raise SystemExit(f"Error: File not found at {file_path}")

# Verify that the 'overall' column exists and is numeric
if 'overall' not in data.columns:
    raise SystemExit("Error: 'overall' column is missing in the dataset.")
if not pd.api.types.is_numeric_dtype(data['overall']):
    raise SystemExit("Error: 'overall' column must contain numeric data.")

# Function to label sentiment
def label_sentiment(overall):
    if overall >= 4:
        return "Positive"
    elif overall == 3:
        return "Neutral"
    else:
        return "Negative"

# Add sentiment labels
data['sentiment'] = data['overall'].apply(label_sentiment)

# Save the updated file
output_path = "./data/reviews_aws/amazon_review_labeled.csv"
data.to_csv(output_path, index=False)

# Print confirmation
print(f"Labeled file saved at: {output_path}")
print(data.head())  # Display the first few rows of the updated dataset



Labeled file saved at: ./data/reviews_aws/amazon_review_labeled.csv
       reviewerID        asin  reviewerName helpful  \
0  A3SBTW3WS4IQSN  B007WTAJTO           NaN  [0, 0]   
1  A18K1ODH1I2MVB  B007WTAJTO          0mie  [0, 0]   
2  A2FII3I2MBMUIA  B007WTAJTO           1K3  [0, 0]   
3   A3H99DFEG68SR  B007WTAJTO           1m2  [0, 0]   
4  A375ZM4U047O79  B007WTAJTO  2&amp;1/2Men  [0, 0]   

                                          reviewText  overall  \
0                                         No issues.      4.0   
1  Purchased this for my device, it worked as adv...      5.0   
2  it works as expected. I should have sprung for...      4.0   
3  This think has worked out great.Had a diff. br...      5.0   
4  Bought it with Retail Packaging, arrived legit...      5.0   

                                  summary  unixReviewTime  reviewTime  \
0                              Four Stars      1406073600  2014-07-23   
1                           MOAR SPACE!!!      1382659200  2013-