
In [40]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelEncoder

from textblob import TextBlob

In [33]:
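from google.colab import drive

# The CSV is stored on Google Drive, so Drive has to be mounted in Colab first.
drive.mount('/content/drive')

file_path = '/content/drive/MyDrive/MIBA/вкр/aspects_dataset.csv'
data = pd.read_csv(file_path)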

In [34]:
def map_sentiment(rating):
    if rating >= 4:
        return 'Positive'
    elif rating == 3:
        return 'Neutral'
    else:
        return 'Negative'

In [35]:
data['Sentiment'] = data['rating'].apply(map_sentiment)

In [36]:
data['processed_review_text'] = data['processed_review_text'].fillna('')

In [37]:
data.head(5)

| Unnamed: 0 | app_name | app_id | rating | review_text | processed_review_text | aspect_sentiments | Sentiment |
|---|---|---|---|---|---|---|---|
| 0 | cleaner_money_manager | ru.innim.my_finance | 5 | Literally changed my life. So easy and fun to ... | liter chang life easi fun track financ app ent... | {'track': ['positive'], 'expens': ['positive']... | Positive |
| 1 | cleaner_money_manager | ru.innim.my_finance | 4 | It seems that the categories are shown in the ... | seem categori shown diagram without appar orde... | {'diagram': ['positive'], 'order': ['positive'... | Positive |
| 2 | cleaner_money_manager | ru.innim.my_finance | 4 | This app is versatile, customisable and easy t... | app versatil customis easi use coupl featur wo... | {'coupl': ['positive'], 'order': ['positive'],... | Positive |
| 3 | cleaner_money_manager | ru.innim.my_finance | 5 | Amazing! I've tried various budget apps and no... | amaz ive tri variou budget app none convinc on... | {'variou': ['neutral'], 'budget': ['neutral'],... | Positive |
| 4 | cleaner_money_manager | ru.innim.my_finance | 3 | I'm on the fence about this budget app. It has... | im fenc budget app realli nice pro enjoy con p... | {'enjoy': ['positive'], 'sync': ['positive'], ... | Neutral |


# Implementing sentiment analysis using a linear SVM (TF-IDF + LinearSVC)

In [66]:
X_train1, X_test1, y_train1, y_test1 = train_test_split(data['processed_review_text'], data['Sentiment'], test_size=0.2, random_state=42)

In [67]:
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(stop_words='english')),
    ('svm', LinearSVC(random_state=42))
])
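For intuition about what the `tfidf` step feeds the SVM, the sketch below recomputes TF-IDF weights by hand using the formula scikit-learn documents for `TfidfVectorizer`'s defaults (`smooth_idf=True`, `sublinear_tf=False`, `norm='l2'`): idf(t) = ln((1 + n) / (1 + df(t))) + 1, multiplied by the raw term count and then L2-normalized per document. It is a simplified illustration under stated assumptions: the documents are already tokenized, the vectorizer's own tokenization and stop-word removal are skipped, and the two toy documents are made up.

```python
import math
from collections import Counter


def tfidf_default_weights(docs):
    """Return one {term: weight} dict per pre-tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf, as with smooth_idf=True.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}

    rows = []
    for doc in docs:
        counts = Counter(doc)  # raw term counts (sublinear_tf=False)
        raw = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))  # L2 norm of the row
        rows.append({t: w / norm for t, w in raw.items()})
    return rows


# Toy stemmed reviews in the style of processed_review_text.
docs = [["easi", "use", "app"], ["app", "crash", "crash"]]
for row in tfidf_default_weights(docs):
    print({t: round(w, 3) for t, w in row.items()})
```

Because `app` occurs in every document it gets the minimum idf of 1 and is down-weighted relative to rarer terms such as `easi`; `LinearSVC` then learns, for each class, one linear weight per such feature.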

In [68]:
pipeline.fit(X_train1, y_train1)


In [70]:
predictions = pipeline.predict(X_test1)

In [72]:
print(classification_report(y_test1, predictions))

              precision    recall  f1-score   support

    Negative       0.66      0.64      0.65        95
     Neutral       0.33      0.11      0.17        35
    Positive       0.89      0.94      0.91       466

    accuracy                           0.84       596
   macro avg       0.63      0.56      0.58       596
weighted avg       0.82      0.84      0.83       596

In [74]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score, roc_curve
import matplotlib.pyplot as plt


accuracy = accuracy_score(y_test1, predictions)
print("Accuracy:", accuracy)

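# Rows are true labels and columns are predicted labels, both in sorted order:
# Negative, Neutral, Positive.
conf_matrix = confusion_matrix(y_test1, predictions)
print("Confusion Matrix:\n", conf_matrix)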

class_report = classification_report(y_test1, predictions)
print("Classification Report:\n", class_report)

Accuracy: 0.8422818791946308
Confusion Matrix:
 [[ 61   5  29]
 [  6   4  25]
 [ 26   3 437]]
Classification Report:
               precision    recall  f1-score   support

    Negative       0.66      0.64      0.65        95
     Neutral       0.33      0.11      0.17        35
    Positive       0.89      0.94      0.91       466

    accuracy                           0.84       596
   macro avg       0.63      0.56      0.58       596
weighted avg       0.82      0.84      0.83       596



### Overview of the Metrics
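- **Accuracy**: The model achieved an overall accuracy of approximately 84.2% (502 of the 596 test reviews). This figure should be read against the class imbalance: 466 of the 596 test reviews (78.2%) are positive, so a classifier that always predicted Positive would already reach 78.2%, and the SVM improves on that baseline by about 6 percentage points.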
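The accuracy and the majority-class baseline can be recomputed directly from the confusion matrix printed above (rows are true labels, columns are predictions, in the order Negative, Neutral, Positive). This is a small arithmetic check that uses only those printed counts.

```python
# Confusion matrix from the SVM cell: rows = true, columns = predicted,
# class order Negative, Neutral, Positive.
cm = [[61, 5, 29],
      [6, 4, 25],
      [26, 3, 437]]

total = sum(sum(row) for row in cm)               # all test reviews
correct = sum(cm[i][i] for i in range(len(cm)))   # diagonal = correct predictions
accuracy = correct / total

# Majority-class baseline: always predict the most frequent true class.
support = [sum(row) for row in cm]
baseline = max(support) / total

print(f"accuracy={accuracy:.4f} baseline={baseline:.4f} gain={accuracy - baseline:.4f}")
```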

### Detailed Analysis
- **Confusion Matrix**: The matrix shows how predictions are distributed across actual sentiments:
  - **Negative**: Out of 95 actual negative reviews, 61 were correctly identified, 5 were classified as neutral, and 29 as positive.
  - **Neutral**: Out of 35 neutral reviews, only 4 were correctly identified; 25 were misclassified as positive and 6 as negative.
  - **Positive**: This class performed best, with 437 of 466 reviews correctly identified (26 were predicted as negative and 3 as neutral), showing that the model reliably recognizes positive sentiment.

- **Classification Report**: This report provides more detailed metrics for each class:
  - **Negative Sentiments** had a precision of 0.66 and a recall of 0.64, giving an F1-score of 0.65. This is moderate performance: 32 reviews were wrongly predicted as negative (false positives for this class) and 34 negative reviews were missed (false negatives).
  - **Neutral Sentiments** perform poorly, with a precision of 0.33 and a recall of 0.11, yielding an F1-score of 0.17. The model struggles to recognize neutral reviews, most of which (25 of 35) are predicted as positive.
  - **Positive Sentiments** had high precision (0.89) and recall (0.94), giving an F1-score of 0.91. The model identifies positive reviews well, but it appears biased toward the positive class: 54 of the 130 non-positive reviews (29 negative and 25 neutral) were predicted as positive.
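As a cross-check of the report, per-class precision, recall, and F1 follow from the same confusion matrix: precision divides the diagonal entry by its column sum (all predictions of that class), recall divides it by its row sum (all true members of that class), and F1 is their harmonic mean. The sketch below is plain Python over the printed counts.

```python
labels = ["Negative", "Neutral", "Positive"]
cm = [[61, 5, 29],
      [6, 4, 25],
      [26, 3, 437]]


def per_class_metrics(cm):
    """Return (precision, recall, f1) per class from a square confusion matrix."""
    k = len(cm)
    metrics = []
    for i in range(k):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(k))  # column sum
        actual = sum(cm[i])                          # row sum (support)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


results = per_class_metrics(cm)
for name, (p, r, f) in zip(labels, results):
    print(f"{name:<9} precision={p:.2f} recall={r:.2f} f1={f:.2f}")

# Unweighted mean over classes, as in the report's "macro avg" row.
macro_f1 = sum(f for _, _, f in results) / len(results)
print(f"macro F1={macro_f1:.2f}")
```

The macro average weights the weak Neutral class as heavily as the others, which is why macro F1 (0.58) sits well below accuracy (0.84).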


# Implementing sentiment analysis using TextBlob

In [75]:
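def analyze_sentiment_textblob(text):
    # TextBlob's default analyzer scores polarity in [-1.0, 1.0] from a lexicon of
    # ordinary English words; stemmed tokens such as "easi" may not match it.
    polarity = TextBlob(text).sentiment.polarity
    if polarity > 0:
        return 'Positive'
    elif polarity == 0:
        return 'Neutral'
    else:
        return 'Negative'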
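`analyze_sentiment_textblob` labels a review Neutral only when the polarity is exactly 0, while TextBlob's polarity is a continuous score in [-1.0, 1.0]. The sketch below generalizes the same thresholding with a hypothetical `neutral_band` parameter (not part of TextBlob): the default of 0.0 reproduces the function's rule for ordinary scores, and a positive value such as 0.1 (an arbitrary illustration that would need tuning against the star-rating labels) treats weakly polar scores as Neutral. Scores are passed in directly, so TextBlob itself is not needed to run it.

```python
def polarity_to_label(polarity, neutral_band=0.0):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    neutral_band is a hypothetical tuning knob: scores with
    -neutral_band <= polarity <= neutral_band are labeled Neutral.
    """
    if polarity > neutral_band:
        return 'Positive'
    if polarity < -neutral_band:
        return 'Negative'
    return 'Neutral'


for score in (-0.4, -0.05, 0.0, 0.05, 0.4):
    print(score, polarity_to_label(score), polarity_to_label(score, neutral_band=0.1))
```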


In [76]:
data['Sentiment_TextBlob'] = data['processed_review_text'].apply(analyze_sentiment_textblob)
predictions_textblob = data['Sentiment_TextBlob']
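The TextBlob labels are computed for every row but are not scored. One way to compare them fairly with the SVM is to evaluate only the rows in the SVM's test split; because `train_test_split` keeps the pandas index of a Series, `X_test1.index` identifies those rows. The helper below is a hypothetical sketch, demonstrated on a made-up four-row frame standing in for `data`.

```python
import pandas as pd


def accuracy_on_rows(df, row_index, pred_col, true_col='Sentiment'):
    """Share of rows in row_index where df[pred_col] matches df[true_col]."""
    subset = df.loc[row_index]
    return float((subset[pred_col] == subset[true_col]).mean())


# Made-up stand-in for `data`; only the two label columns matter here.
toy = pd.DataFrame({
    'Sentiment':          ['Positive', 'Negative', 'Neutral', 'Positive'],
    'Sentiment_TextBlob': ['Positive', 'Positive', 'Neutral', 'Neutral'],
})
held_out = pd.Index([1, 2, 3])  # plays the role of X_test1.index

print(accuracy_on_rows(toy, held_out, 'Sentiment_TextBlob'))
```

On the real data this would be `accuracy_on_rows(data, X_test1.index, 'Sentiment_TextBlob')`, and `classification_report(data.loc[X_test1.index, 'Sentiment'], data.loc[X_test1.index, 'Sentiment_TextBlob'])` would give per-class figures comparable to the SVM report.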

# Implementing sentiment analysis using an LSTM (Long Short-Term Memory) model

In [49]:
!pip install tensorflow




In [52]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

In [53]:
tokenizer = Tokenizer(num_words=5000, lower=True)
tokenizer.fit_on_texts(data['processed_review_text'])
sequences = tokenizer.texts_to_sequences(data['processed_review_text'])
x = pad_sequences(sequences, maxlen=200)
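Two defaults shape `x` here. With `num_words=5000`, `texts_to_sequences` keeps only word indices below 5000 (the 4,999 most frequent words) and silently drops the rest, since no `oov_token` is set. And `pad_sequences(maxlen=200)` pads and truncates at the front (`padding='pre'`, `truncating='pre'`), so reviews longer than 200 tokens lose their beginning. The pure-Python sketch below imitates that behavior on whitespace-separated, already-cleaned text; it is an illustration, not Keras's implementation (Keras also strips punctuation through its `filters` argument).

```python
from collections import Counter


def fit_word_index(texts):
    """Index words by descending frequency, starting at 1 (0 is reserved for padding)."""
    counts = Counter(w for t in texts for w in t.lower().split())
    return {w: i for i, (w, _) in enumerate(counts.most_common(), start=1)}


def to_sequences(texts, word_index, num_words):
    """Keep only words whose index is below num_words; unknown words are dropped."""
    # .get(w, num_words) makes unknown words fail the "< num_words" test.
    return [[word_index[w] for w in t.lower().split()
             if word_index.get(w, num_words) < num_words]
            for t in texts]


def pad_pre(seqs, maxlen, value=0):
    """Left-pad with `value` and keep only the last maxlen tokens."""
    padded = []
    for s in seqs:
        s = s[-maxlen:]  # truncating='pre' drops tokens from the start
        padded.append([value] * (maxlen - len(s)) + s)
    return padded


texts = ['app crash app', 'easi app', 'crash again']
word_index = fit_word_index(texts)
print(word_index)
print(pad_pre(to_sequences(texts, word_index, num_words=3), maxlen=2))
```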

In [54]:
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(data['Sentiment'])

In [55]:
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

In [58]:
model = Sequential()
model.add(Embedding(input_dim=5000, output_dim=128, input_length=200))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(3, activation='softmax'))
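As a sanity check on model size, the trainable parameters of this architecture can be counted by hand. A Keras `LSTM` layer holds an input kernel of shape (input_dim, 4·units), a recurrent kernel of shape (units, 4·units) and a bias of length 4·units (one slice per gate), assuming the default `use_bias=True`. The totals below should match what `model.summary()` reports once the model is built.

```python
def count_params(vocab=5000, emb_dim=128, units=128, n_classes=3):
    """Parameter counts for Embedding -> LSTM -> Dense(softmax)."""
    embedding = vocab * emb_dim                           # one vector per word index
    lstm = 4 * (emb_dim * units + units * units + units)  # 4 gates: kernel, recurrent kernel, bias
    dense = units * n_classes + n_classes                 # weights + biases
    return {'embedding': embedding, 'lstm': lstm, 'dense': dense,
            'total': embedding + lstm + dense}


print(count_params())
```

Most of the roughly 772k parameters (640,000) sit in the embedding table.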

In [59]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])


In [60]:
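# The test split is also passed as validation data, so its loss and accuracy are
# reported after every epoch; it is not used to update the weights.
model.fit(X_train, y_train, epochs=10, batch_size=64, validation_data=(X_test, y_test))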


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7abb826b98a0>

In [90]:
predictions_lstm = model.predict(X_test)
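`model.predict` returns one row of softmax probabilities per review, not labels. Because `LabelEncoder` stores its `classes_` in sorted order, column 0 is Negative, 1 is Neutral, and 2 is Positive, so taking the argmax of each row and mapping it back gives the predicted label; in the notebook this would be `label_encoder.inverse_transform(predictions_lstm.argmax(axis=1))`. The sketch below shows the same decoding on made-up probabilities.

```python
import numpy as np

# Sorted class names, matching LabelEncoder.classes_ for this dataset.
class_names = np.array(['Negative', 'Neutral', 'Positive'])


def decode_softmax(probs, class_names):
    """Return the class name with the highest probability in each row."""
    return class_names[np.argmax(probs, axis=1)]


# Made-up softmax outputs for three reviews.
probs = np.array([[0.1, 0.2, 0.7],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3]])
print(decode_softmax(probs, class_names))
```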



In [61]:
loss, accuracy = model.evaluate(X_test, y_test)
print(f'Loss: {loss}, Accuracy: {accuracy}')

Loss: 0.8098092079162598, Accuracy: 0.813758373260498
