# Ulta Beauty Sentiment Analysis

This Jupyter notebook runs sentiment analysis on Ulta Beauty customer reviews collected from Yelp and Google. Each review is classified with `Kaludi/Reviews-Sentiment-Analysis`, a pre-trained transformer model from the Hugging Face model hub, and the predicted sentiment is stored alongside the review.

## Libraries and Tools Used

- **pandas:** Loads, combines, and cleans the review data.
- **transformers:** Provides the pre-trained sentiment analysis model and its tokenizer.
- **torch:** Runs the model and selects the highest-scoring class from its output logits.

## Workflow Overview

1. **Data Loading:** Yelp and Google datasets are loaded and combined.
2. **Data Preprocessing:** The leftover index column is dropped, and rows whose `text` value is missing or not a string are removed so that only valid text reviews remain.
3. **Sentiment Analysis Model:** The pre-trained model and its tokenizer are loaded with the transformers library.
4. **Sentiment Analysis Application:** The model is applied to each review, and the index of the highest-scoring class is stored as that review's sentiment label.
5. **Results Storage:** The dataset is updated with sentiment labels and saved to a new CSV file.

Feel free to explore the code for a detailed understanding of each step in the sentiment analysis process.
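
Step 4 comes down to picking the class with the highest score. The sketch below shows that reduction in plain Python, with a softmax added to turn the scores into a confidence value. The label names and the logits are made up for illustration; the real mapping for a checkpoint is stored in its `model.config.id2label`.

```python
import math

# Hypothetical label names for illustration only. The actual number and
# order of classes must be read from model.config.id2label.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, id2label=ID2LABEL):
    """Return (label, confidence) for one review's logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return id2label[best], probs[best]


# Made-up logits standing in for outputs.logits of a single review
label, confidence = predict([-1.2, 0.3, 2.5])
print(label, round(confidence, 3))
```

Keeping the confidence next to the label makes it possible to flag low-certainty predictions later, which the bare class index alone does not allow.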


In [8]:
import pandas as pd
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

In [9]:
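# Forward slashes work on every OS and avoid invalid escape sequences such as '\d'
Y_ulta_beauty = pd.read_csv('../datasets/csv/Y_ulta_beauty.csv')
G_ulta_beauty = pd.read_csv('../datasets/csv/G_ulta_beauty.csv')
# 'Unnamed: 0' is a leftover index column from an earlier CSV export
G_ulta_beauty.drop(columns=['Unnamed: 0'], inplace=True)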
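
One portable way to build the dataset paths is `pathlib`, which joins path components with the right separator for the current operating system. This is a minimal sketch that assumes the directory layout implied by the relative paths in this notebook; `DATA_DIR` and `csv_path` are hypothetical names.

```python
from pathlib import Path

# Assumed layout, taken from the relative paths used in this notebook:
# the CSV files live in ../datasets/csv relative to the notebook.
DATA_DIR = Path("..") / "datasets" / "csv"


def csv_path(name):
    """Build the path to a CSV file inside the data directory."""
    return DATA_DIR / name


yelp_path = csv_path("Y_ulta_beauty.csv")
google_path = csv_path("G_ulta_beauty.csv")
print(yelp_path.as_posix())
```

`pd.read_csv` accepts `Path` objects directly, so `pd.read_csv(yelp_path)` can replace the string literal.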

In [10]:
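# ignore_index=True gives the combined frame a fresh 0..n-1 index instead of duplicated labels
ulta_beauty = pd.concat([G_ulta_beauty, Y_ulta_beauty], ignore_index=True)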

In [16]:
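# Keep only rows whose 'text' is an actual string (drops NaN and other non-string values)
ulta_beauty = ulta_beauty.dropna(subset=['text'])
# .copy() makes the filtered frame independent, so adding a column later does not warn
ulta_beauty = ulta_beauty[ulta_beauty['text'].apply(lambda x: isinstance(x, str))].copy()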

# Load the pre-trained model and tokenizer (a public checkpoint, so no auth token is needed)
model_name = "Kaludi/Reviews-Sentiment-Analysis"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

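# Classify one review and return the index of the highest-scoring class
def analyze_sentiment(review):
    # Truncate to the tokenizer's maximum length so very long reviews do not raise an error
    inputs = tokenizer(review, return_tensors="pt", truncation=True)
    with torch.no_grad():  # inference only, so skip gradient tracking
        outputs = model(**inputs)
    return torch.argmax(outputs.logits, dim=1).item()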

# Apply the function to the 'text' column of ulta_beauty
ulta_beauty['sentiment'] = ulta_beauty['text'].apply(analyze_sentiment)
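
Applying the model one review at a time is simple but slow on large datasets. The sketch below shows one possible batching pattern in plain Python. `chunks`, `classify_in_batches`, and `fake_predict` are hypothetical helpers, not part of this notebook; in practice `predict_batch` would wrap the tokenizer (called with `padding=True, truncation=True`) and the model, which is an assumption about how the notebook could be extended.

```python
from typing import Callable, Iterator, List, Sequence


def chunks(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def classify_in_batches(
    texts: Sequence[str],
    predict_batch: Callable[[List[str]], List[int]],
    batch_size: int = 32,
) -> List[int]:
    """Run `predict_batch` over `texts` in chunks and keep the input order."""
    labels: List[int] = []
    for batch in chunks(texts, batch_size):
        preds = predict_batch(list(batch))
        if len(preds) != len(batch):
            raise ValueError("predictor must return one label per text")
        labels.extend(preds)
    return labels


# Stand-in predictor so the sketch runs without downloading a model.
# It is NOT a sentiment model: it only returns text length modulo 2.
def fake_predict(batch: List[str]) -> List[int]:
    return [len(text) % 2 for text in batch]


reviews = ["great", "bad", "ok", "love it", "meh"]  # made-up examples
labels = classify_in_batches(reviews, fake_predict, batch_size=2)
print(labels)
```

Passing the predictor in as a callable keeps the batching logic independent of the model, so it can be checked with a stand-in before the real model is plugged in.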

In [21]:
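# index=False keeps the row index out of the file, so no 'Unnamed: 0' column appears on reload
ulta_beauty.to_csv('../datasets/csv/sentiment_analysis.csv', index=False)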