Created by: Viktor Hatina
Date: April 18, 2024

In [2]:
# Importing necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_curve, roc_auc_score
import matplotlib.pyplot as plt
import seaborn as sns


# Experiment: Sentiment Analysis on Customer Feedback and Reviews


## Description of the data
This dataset contains customer sentiments collected from various sources such as social media, review platforms, and testimonials. Each record includes the text, the sentiment (positive or negative), the source, the date/time, the user ID, the location, and a confidence score. The sentiments reflect customers' opinions of and experiences with products, services, movies, music, books, restaurants, websites, customer support, and more.


In [7]:
# Load the dataset                     
df = pd.read_csv("sentiment-analysis.csv")
df.head()

Unnamed: 0,"Text, Sentiment, Source, Date/Time, User ID, Location, Confidence Score"
0,"""I love this product!"", Positive, Twitter, 202..."
1,"""The service was terrible."", Negative, Yelp Re..."
2,"""This movie is amazing!"", Positive, IMDb, 2023..."
3,"""I'm so disappointed with their customer suppo..."
4,"""Just had the best meal of my life!"", Positive..."


In [8]:
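# Clean the dataset: split the single combined column into separate columns.
# Note: splitting on ',' keeps the space that follows each comma, so values
# such as the labels are stored as ' Positive' / ' Negative'.
combined_col = 'Text, Sentiment, Source, Date/Time, User ID, Location, Confidence Score'
data = df[combined_col].str.split(',', expand=True)
data.columns = ['Text', 'Sentiment', 'Source', 'Date/Time', 'User ID', 'Location', 'Confidence Score']
data = data.dropna()  # drop rows with missing fields
data.head()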

Unnamed: 0,Text,Sentiment,Source,Date/Time,User ID,Location,Confidence Score
0,"""I love this product!""",Positive,Twitter,2023-06-15 09:23:14,@user123,New York,0.85
1,"""The service was terrible.""",Negative,Yelp Reviews,2023-06-15 11:45:32,user456,Los Angeles,0.65
2,"""This movie is amazing!""",Positive,IMDb,2023-06-15 14:10:22,moviefan789,London,0.92
3,"""I'm so disappointed with their customer suppo...",Negative,Online Forum,2023-06-15 17:35:11,forumuser1,Toronto,0.78
4,"""Just had the best meal of my life!""",Positive,TripAdvisor,2023-06-16 08:50:59,foodie22,Paris,0.88
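Splitting on a bare comma keeps the space after each comma and would mis-split any review whose text itself contains a comma. As a minimal sketch of an alternative, the standard-library `csv` module can parse each raw record while honoring the quoted text. The `parse_row` helper and the sample record below are hypothetical and only illustrate the idea; labels parsed this way would be `'Positive'` without a leading space, so any later comparison would need to match that.

```python
import csv

COLUMNS = ["Text", "Sentiment", "Source", "Date/Time",
           "User ID", "Location", "Confidence Score"]

def parse_row(raw_line):
    """Split one raw record into named fields, honoring quoted text."""
    # skipinitialspace=True ignores the space after each comma, so quoted
    # text is recognized and labels come out without a leading space.
    reader = csv.reader([raw_line], skipinitialspace=True)
    fields = [field.strip() for field in next(reader)]
    return dict(zip(COLUMNS, fields))

# Hypothetical record whose text contains a comma
row = parse_row('"Good food, slow service.", Negative, Yelp Reviews, '
                '2023-06-15 11:45:32, user456, Los Angeles, 0.65')
print(row["Text"])       # Good food, slow service.
print(row["Sentiment"])  # Negative
```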



Sentiment analysis, also known as opinion mining, is a natural language processing (NLP) technique used to determine the sentiment or emotional tone expressed in a piece of text, such as a review, comment, tweet, or customer feedback.
The primary goal of sentiment analysis is to understand the writer's attitude or opinion towards a particular subject, entity, or topic mentioned in the text.


In [9]:
# Take relevant columns such as text (feature) and sentiment (target) to use for splitting the dataset
X = data['Text'].astype(str)
y = data['Sentiment']

In [10]:
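# Feature engineering: convert the text to TF-IDF vectors
# (note: the vectorizer is fitted on the full dataset, before the split)
vectorizer = TfidfVectorizer(max_features=5000, stop_words='english')
X_vec = vectorizer.fit_transform(X)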


The vectorization phase converts raw text into numerical representations so that machine learning algorithms can process it. Here, TF-IDF assigns each term a weight that grows with its frequency in a document and shrinks with the number of documents that contain it.


In [11]:
# Split dataset into train and test samples
X_train, X_test, y_train, y_test = train_test_split(X_vec, y, test_size=0.2, random_state=42)
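Because the vectorizer is fitted before the split, the test texts contribute to the vocabulary and IDF weights. One possible way to avoid that, sketched here on a small hypothetical dataset, is to split the raw text first and wrap the vectorizer and classifier in a scikit-learn `Pipeline`, so TF-IDF is fitted on the training texts only. Passing `stratify` keeps the class proportions similar in both parts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical rows for illustration only (not the real file)
texts = [
    "great product", "terrible service", "amazing movie",
    "awful support", "excellent meal", "horrible experience",
    "wonderful book", "disappointing website", "fantastic music",
    "poor quality",
]
labels = ["Positive", "Negative"] * 5

# Split the raw text first, keeping class proportions in both parts
X_train_txt, X_test_txt, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# The pipeline fits TF-IDF on the training texts only
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=5000, stop_words="english")),
    ("svc", LinearSVC()),
])
pipeline.fit(X_train_txt, y_tr)

print(pipeline.predict(X_test_txt))
```

A fitted pipeline also accepts raw strings directly, so a prediction helper would not need a separate `transform` call.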

In [12]:
# Model Selection and Training
model = LinearSVC()
model.fit(X_train, y_train)


### Candidate models include
1. Linear Support Vector Classifier (used above)
2. Logistic regression, particularly suitable for binary classification tasks
3. Random forest and gradient boosting
4. Deep learning models
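As a rough sketch of how two of these candidates could be compared, the snippet below runs 3-fold cross-validation for `LinearSVC` and `LogisticRegression` on a small hypothetical dataset. The data, the `candidates` dictionary, and the choice of `cv=3` are assumptions made for illustration; scores on such a tiny sample say nothing about the real dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical rows for illustration only (not the real file)
texts = [
    "great product", "terrible service", "amazing movie",
    "awful support", "excellent meal", "horrible experience",
    "wonderful book", "disappointing website", "fantastic music",
    "poor quality", "lovely staff", "bad delivery",
]
labels = ["Positive", "Negative"] * 6

candidates = {
    "LinearSVC": LinearSVC(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, clf in candidates.items():
    # Each fold refits TF-IDF on that fold's training texts only
    pipe = make_pipeline(TfidfVectorizer(stop_words="english"), clf)
    scores[name] = cross_val_score(pipe, texts, labels, cv=3).mean()
    print(name, round(scores[name], 2))
```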


In [13]:
# Function to vectorize the user input and predict its sentiment
def predict_sentiment(input_text):
    print("Text obtained: ", input_text)
    # Transform the input with the already fitted TF-IDF vectorizer
    input_vec = vectorizer.transform([input_text])

    print("Text Vectorised: ", input_vec)

    # Make prediction
    prediction = model.predict(input_vec)
    print("Models Prediction: ", prediction)
    # Labels carry a leading space from the comma split (e.g. ' Positive'),
    # so strip whitespace before comparing
    return "Positive" if prediction[0].strip() == "Positive" else "Negative"


In [14]:
# Model Evaluation
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:")
print(classification_report(y_test, y_pred))


Accuracy: 0.85
Classification Report:
              precision    recall  f1-score   support

    Negative       0.62      1.00      0.77         5
    Positive       1.00      0.80      0.89        15

    accuracy                           0.85        20
   macro avg       0.81      0.90      0.83        20
weighted avg       0.91      0.85      0.86        20



In [15]:

# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)


Confusion Matrix:
[[ 5  0]
 [ 3 12]]



Explanation of Classification Report:
- Precision: of all instances predicted as a given class, the proportion that actually belong to it. Here, precision for Negative is 5 / 8 = 0.62, because 3 of the 8 Negative predictions were actually Positive.
- Recall: of all instances that actually belong to a given class, the proportion the model identified. Here, recall for Positive is 12 / 15 = 0.80.
- F1-score: the harmonic mean of precision and recall, balancing the two metrics. It is useful when the classes are imbalanced.
- Support: the number of actual occurrences of each class in the test set. It provides context for the precision, recall, and F1-score values.

Explanation of Confusion Matrix:
The confusion matrix summarizes the model's predictions against the actual labels. In scikit-learn, rows are actual labels and columns are predicted labels, with classes in sorted order (Negative, Positive). Treating Positive as the positive class:
 - True Negative (TN), top left (5): the model correctly predicted negative instances as negative.
 - False Positive (FP), top right (0): the model incorrectly predicted negative instances as positive.
 - False Negative (FN), bottom left (3): the model incorrectly predicted positive instances as negative.
 - True Positive (TP), bottom right (12): the model correctly predicted positive instances as positive.
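The figures in the classification report can be recomputed by hand from the confusion matrix. The short sketch below does so, assuming scikit-learn's layout (rows are actual labels, columns are predicted labels, classes in the order Negative, Positive). The helper `f1` is defined here for illustration.

```python
# Confusion matrix from the output above
# rows = actual, columns = predicted, class order = (Negative, Positive)
conf = [[5, 0],
        [3, 12]]

tn, fp = conf[0]
fn, tp = conf[1]

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

precision_pos = tp / (tp + fp)   # 12 / 12
recall_pos = tp / (tp + fn)      # 12 / 15
precision_neg = tn / (tn + fn)   # 5 / 8
recall_neg = tn / (tn + fp)      # 5 / 5
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 17 / 20

print(f"Positive: precision={precision_pos:.2f} recall={recall_pos:.2f} "
      f"f1={f1(precision_pos, recall_pos):.2f}")
# Positive: precision=1.00 recall=0.80 f1=0.89
print(f"Negative: precision={precision_neg:.2f} recall={recall_neg:.2f} "
      f"f1={f1(precision_neg, recall_neg):.2f}")
# Negative: precision=0.62 recall=1.00 f1=0.77
print(f"Accuracy: {accuracy:.2f}")
# Accuracy: 0.85
```

These values agree with the classification report printed earlier.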

Additional Insights:
- A high accuracy score indicates that the model classifies a large portion of instances correctly, but with only 20 test samples the 0.85 measured here is a rough estimate.
- Precision and recall scores provide insights into the model's performance for each class, helping to understand its strengths and weaknesses.
- The confusion matrix provides a detailed breakdown of correct and incorrect predictions, aiding in understanding where the model is making errors.
- Consideration of these metrics and insights can inform further improvements to the model, such as refining features, adjusting hyperparameters, or collecting more diverse training data.


In [16]:

# Interface loop
print("Welcome to the sentiment analysis interface!")
print("Type 'exit' to quit.")
while True:
    user_input = input("Enter your comment (type 'exit' to quit): ")
    if user_input.lower() == 'exit':
        print("Exiting...")
        break
    else:
        predicted_sentiment = predict_sentiment(user_input)
        print("Predicted Sentiment:", predicted_sentiment)

Welcome to the sentiment analysis interface!
Type 'exit' to quit.


Enter your comment (type 'exit' to quit):  this was bad service


Text obtained:  this was bad service
Text Vectorised:    (0, 161)	1.0
Models Prediction:  [' Negative']
Predicted Sentiment: Negative


Enter your comment (type 'exit' to quit):  I am very happy!


Text obtained:  I am very happy!
Text Vectorised:  
Models Prediction:  [' Positive']
Predicted Sentiment: Positive


Enter your comment (type 'exit' to quit):  The agent was very helpful. Thanks


Text obtained:  The agent was very helpful. Thanks
Text Vectorised:    (0, 83)	1.0
Models Prediction:  [' Positive']
Predicted Sentiment: Positive


Enter your comment (type 'exit' to quit):  The service was poor and I did not get the information needed


Text obtained:  The service was poor and I did not get the information needed
Text Vectorised:    (0, 161)	0.341678089136951
  (0, 133)	0.4874860448469688
  (0, 119)	0.5681608220755726
  (0, 90)	0.5681608220755726
Models Prediction:  [' Negative']
Predicted Sentiment: Negative


Enter your comment (type 'exit' to quit):  exit


Exiting...
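In the session above, "I am very happy!" printed an empty "Text Vectorised" value, presumably because none of its tokens survive stop-word removal and appear in the fitted vocabulary. The model then receives an all-zero vector, so the prediction is determined by the classifier's intercept rather than by the text. A minimal sketch of a guard against this follows; `has_known_terms` and the toy vocabulary are hypothetical, and in the notebook the fitted `vectorizer` would take the place of `toy_vectorizer`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy vocabulary for illustration only
toy_vectorizer = TfidfVectorizer(stop_words="english")
toy_vectorizer.fit(["The service was terrible.", "I love this product!"])

def has_known_terms(text, fitted_vectorizer):
    """Return True if at least one token of `text` is in the vocabulary."""
    # nnz counts the stored non-zero entries of the sparse row
    return fitted_vectorizer.transform([text]).nnz > 0

print(has_known_terms("this was bad service", toy_vectorizer))  # True
print(has_known_terms("I am very happy!", toy_vectorizer))      # False
```

An interface loop could call such a check first and report that the input contains no known words instead of returning a label.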



# Thank you 

### Next Steps:
1. Improve model performance by training on more data
2. Evaluate a broader set of models
3. Test on an official dataset