### Group ID:
### Group Members Name with Student ID:
1. Student 1
2. Student 2
3. Student 3
4. Student 4


In [1]:
# Import Libraries

import pandas as pd
import spacy
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
from spacy import displacy
from imblearn.over_sampling import SMOTE

## Dataset Loading and Basic Inspection

In [2]:
try:
    df = pd.read_csv("/kaggle/input/dataset/Dataset2.csv")
    print("Dataset loaded successfully.")
    print(df.head())  # Display the first few rows
except FileNotFoundError:
    print("Error: reviews.csv not found. Please ensure the file is in the correct directory.")


Dataset loaded successfully.
                                        product_name  product_price  Rate  \
0  MAHARAJA WHITELINE 65 L Desert Air Cooler?????...           7999     5   
1  MAHARAJA WHITELINE 65 L Desert Air Cooler?????...           7999     5   
2  MAHARAJA WHITELINE 65 L Desert Air Cooler?????...           7999     3   
3  MAHARAJA WHITELINE 65 L Desert Air Cooler?????...           7999     5   
4  MAHARAJA WHITELINE 65 L Desert Air Cooler?????...           7999     1   

              Review Sentiment  
0    i am very happy  positive  
1               nice  positive  
2     air throw weak  negative  
3              super  positive  
4  too much big size   neutral  


## Data Preprocessing Function

In [3]:
def preprocess_data(df):
    """
    Preprocesses the dataset:
    - Removes unnecessary columns.
    - Converts text to lowercase.
    - Handles missing values.
    """
    df = df[['Review', 'Sentiment']].copy()  # Explicitly create a copy
    df.loc[:, 'Review'] = df['Review'].str.lower() # use loc to modify the review column
    df.dropna(inplace=True)
    return df

## Apply Preprocessing

In [4]:
if 'df' in locals():
    df = preprocess_data(df.copy())
    print("\nPreprocessed Data:")
    print(df.head())



Preprocessed Data:
              Review Sentiment
0    i am very happy  positive
1               nice  positive
2     air throw weak  negative
3              super  positive
4  too much big size   neutral


## Load spaCy Model

In [5]:
if 'df' in locals():
    nlp = spacy.load("en_core_web_sm")
    print("\nspaCy model loaded.")



spaCy model loaded.


## Dependency Tree Generation Function

In [6]:
def generate_dependency_tree(text):
    """
    Generates a dependency tree for a given text.
    """
    doc = nlp(text)
    return doc


## Sample Review Selection and Dependency Parsing

In [7]:
if 'df' in locals():
    sample_reviews = df['Review'].sample(3).tolist()
    sample_docs = [generate_dependency_tree(review) for review in sample_reviews]
    print("\nSample Reviews and Dependency Trees Generated.")
    for i, review in enumerate(sample_reviews):
        print(f"Sample Review {i+1}: {review}")



Sample Reviews and Dependency Trees Generated.
Sample Review 1: excellent air cooling the water will last you easily for two nights n the cooling gets real effective if u place it near a window n yes just make sure you change ur water every week so to keep it clean n free from any smell its size is similar to ur 16 l fridge but much slimer n yeah thats all for now but will update my review soon after quarter year
Sample Review 2: very good product in low price its our second plastic cooler and i am happy with its cooling and other features in future i definitely buy it again and recommend to others also thanks flipkart
Sample Review 3: nice cooler


## Feature Extraction Function

In [8]:
def extract_features(doc):
    """
    Extracts syntactic features from a dependency tree.
    """
    features = []
    for token in doc:
        features.append({
            'token': token.text,
            'pos': token.pos_,
            'dep': token.dep_,
            'head': token.head.text,
            'head_pos': token.head.pos_,
        })
    return features

## Extract Sentiment Patterns Function

In [9]:
def extract_sentiment_patterns(doc):
    """
    Extracts sentiment patterns from dependency trees. (example)
    """
    patterns = []
    for token in doc:
        if token.dep_ == "amod" and token.head.pos_ == "NOUN": #adjective modifier of noun
            patterns.append(f"{token.text}_{token.head.text}")
        if token.dep_ == "neg" and token.head.pos_ == "ADJ": #negation of adjective
            patterns.append(f"{token.text}_{token.head.text}")
    return patterns

## Feature Extraction for All Reviews

In [10]:
if 'df' in locals():
    all_review_features = []
    for review in df['Review']:
        doc = nlp(review)
        all_review_features.append(extract_features(doc))

## Vectorization

In [11]:
if 'df' in locals():
    flat_features = []
    for review_feature in all_review_features:
        for feature in review_feature:
            flat_features.append(feature)

    vectorizer = DictVectorizer()
    X = vectorizer.fit_transform(flat_features).tocsr()

    review_feature_lists = []
    current_review_features = []
    for review_feature in all_review_features:
        review_feature_lists.append(review_feature)

    review_vectorized_features = []
    index = 0
    for review_feature_list in review_feature_lists:
        review_vector = []
        for feature in review_feature_list:
            review_vector.append(flat_features[index])
            index+=1
        review_vectorized_features.append(review_vector)

    review_average_vectors = []
    for review_feature_list in review_vectorized_features:
        review_average_vector = X[index-len(review_feature_list):index].mean(axis=0)
        review_average_vectors.append(review_average_vector.A) # Convert to numpy array directly

    X_dense = [review_average_vector[0] for review_average_vector in review_average_vectors]

    y = df['Sentiment']
    print("\nFeatures Vectorized.")


Features Vectorized.


## Train-Test Split and Model Training

In [12]:
if 'df' in locals():
    X_train, X_test, y_train, y_test = train_test_split(X_dense, y, test_size=0.2, random_state=42)

    model = MultinomialNB()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    print("\nModel Trained.")


Model Trained.


## Model Evaluation

In [13]:
if 'df' in locals():
    X_train, X_test, y_train, y_test = train_test_split(X_dense, y, test_size=0.2, random_state=42)

    smote = SMOTE(random_state=42)
    X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)

    model = MultinomialNB()
    model.fit(X_train_resampled, y_train_resampled)
    y_pred = model.predict(X_test)

    print("Accuracy:", accuracy_score(y_test, y_pred))
    print(classification_report(y_test, y_pred))

Accuracy: 0.1039426523297491
              precision    recall  f1-score   support

    negative       0.11      0.48      0.18        27
     neutral       0.01      0.67      0.03         3
    positive       0.78      0.06      0.10       249

    accuracy                           0.10       279
   macro avg       0.30      0.40      0.10       279
weighted avg       0.71      0.10      0.11       279



## Visualization

In [14]:
if 'df' in locals():
    for i, doc in enumerate(sample_docs):
        print(f"\nReview: {sample_reviews[i]}")
        print(f"Sentiment: {df[df['Review'] == sample_reviews[i]]['Sentiment'].values[0]}")
        displacy.render(doc, style="dep", jupyter=True, options={'distance': 90})



Review: excellent air cooling the water will last you easily for two nights n the cooling gets real effective if u place it near a window n yes just make sure you change ur water every week so to keep it clean n free from any smell its size is similar to ur 16 l fridge but much slimer n yeah thats all for now but will update my review soon after quarter year
Sentiment: positive



Review: very good product in low price its our second plastic cooler and i am happy with its cooling and other features in future i definitely buy it again and recommend to others also thanks flipkart
Sentiment: positive



Review: nice cooler
Sentiment: positive


## Analysis and Reporting

In [15]:
if 'df' in locals():
    print("""
    \nAnalysis & Reporting:
    Methodology:
    1. Data preprocessing: Lowercasing, removing unnecessary columns, and handling missing values.
    2. Dependency parsing: Used spaCy to generate dependency trees for reviews.
    3. Feature extraction: Extracted syntactic features like token text, POS tags, dependencies, and head words.
    4. Sentiment classification: Trained a Multinomial Naive Bayes model using the extracted features.
    5. Visualization: Displayed dependency trees for sample reviews with sentiment labels.

    Challenges:
    - Sparse feature matrices from dependency parsing.
    - Balancing feature granularity (too specific vs. too general).
    - Accurately capturing complex sentiment expressions.
    - Model performance depends heavily on the quality of parsed dependencies.
    - Creating a single vector for each review.

    Key Insights:
    - Dependency parsing provides valuable syntactic information for sentiment analysis.
    - Relationships between words (e.g., adjective-noun) can indicate sentiment.
    - The choice of features and model significantly impacts performance.
    - More advanced dependency parsing and feature engineering could improve results.
    - Averaging word vector representations for each review improved training.
    """)


    
Analysis & Reporting:
    Methodology:
    1. Data preprocessing: Lowercasing, removing unnecessary columns, and handling missing values.
    2. Dependency parsing: Used spaCy to generate dependency trees for reviews.
    3. Feature extraction: Extracted syntactic features like token text, POS tags, dependencies, and head words.
    4. Sentiment classification: Trained a Multinomial Naive Bayes model using the extracted features.
    5. Visualization: Displayed dependency trees for sample reviews with sentiment labels.

    Challenges:
    - Sparse feature matrices from dependency parsing.
    - Balancing feature granularity (too specific vs. too general).
    - Accurately capturing complex sentiment expressions.
    - Model performance depends heavily on the quality of parsed dependencies.
    - Creating a single vector for each review.

    Key Insights:
    - Dependency parsing provides valuable syntactic information for sentiment analysis.
    - Relationships between words (e