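# Sentiment Analysis using Multinomial Naive Bayes

We train a sentiment classifier on labeled tweets using scikit-learn's Multinomial Naive Bayes (`MultinomialNB`) algorithm, then apply it to Amazon product reviews.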

In [1]:
# Install the packages

!pip install pandas scikit-learn > /dev/null 2>&1

In [2]:
import pandas as pd
import sklearn.model_selection as ms
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report, accuracy_score
from sklearn.naive_bayes import MultinomialNB


In [3]:
# Load the training data.
trained_data = pd.read_csv('../../data/twitter_data.csv')

In [4]:
# Clean up the data.

# Drop rows where the 'category' or 'text' column is missing (NaN or None).
trained_data = trained_data.dropna(subset=['category', 'text'])

Here are the inputs to the classification model:

| Parameter       | Description                                | Example                                   |
|:----------------|:-------------------------------------------|:------------------------------------------|
| Feature name    | Word in the vocabulary                     | ['abuses' 'again' 'from' 'this',...]      |
| Feature measure | Word count                                 | [0, 0, 1, 1,...]                          |
| Label           | The sentiment grade (-1, 0, 1)             |                                           |
| Data (X_train)  | The tweet text, vectorized as word counts  | ['this comes from cabinet which...',...]  |
| Data (y_train)  | The sentiment grade (-1, 0, 1)             | -1                                        |

In [5]:
# Vectorize the data
vec = CountVectorizer()
X = vec.fit_transform(trained_data['text'])
y = trained_data['category']

In [6]:
# Split the data into 2 datasets for training and testing.
X_train, X_test, y_train, y_test = ms.train_test_split(X, y, test_size=0.25, random_state=42)
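
Note that the vectorizer above is fit on the full dataset before the split, so its vocabulary also contains words that occur only in the test rows. One possible alternative, sketched below on a tiny made-up dataset, is to split the raw text first and let a scikit-learn pipeline fit the vectorizer on the training rows only. The data and the names `texts`, `labels`, and `pipe` are assumptions for illustration, not part of the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up dataset (hypothetical), standing in for the tweet data.
texts = ["good great", "bad awful", "great love", "awful hate",
         "love good", "hate bad", "good love", "bad hate"]
labels = [1, -1, 1, -1, 1, -1, 1, -1]

# Split the raw text first, so the test rows stay unseen by the vectorizer.
text_train, text_test, label_train, label_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels)

# The pipeline fits CountVectorizer and MultinomialNB on the training rows only.
pipe = make_pipeline(CountVectorizer(), MultinomialNB())
pipe.fit(text_train, label_train)

# predict() vectorizes the test text with the training vocabulary.
test_predictions = pipe.predict(text_test)
print(test_predictions)
```

A pipeline also keeps the vectorizer and the model together, which is convenient when predicting on new text later.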

In [7]:
# Now that the features are extracted and quantified as vectors, fit the model on the training set and predict on the test set.
model = MultinomialNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# See results.
print("Classification Report (MultinomialNB):")
print(classification_report(y_test, y_pred))
print("Accuracy Score (MultinomialNB):")
print(accuracy_score(y_test, y_pred))


Classification Report (MultinomialNB):
              precision    recall  f1-score   support

        -1.0       0.74      0.62      0.67      8933
         0.0       0.91      0.60      0.72     13860
         1.0       0.68      0.91      0.78     17950

    accuracy                           0.74     40743
   macro avg       0.78      0.71      0.73     40743
weighted avg       0.77      0.74      0.74     40743

Accuracy Score (MultinomialNB):
0.7424342831897504
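
Because scikit-learn estimators share the same `fit`/`predict` interface, another classifier, such as a linear support vector machine, could be tried in place of `MultinomialNB`. The following is only a sketch on a made-up toy dataset. `LinearSVC` with default settings is an assumption here, not something this notebook ran, and nothing is claimed about its accuracy on the tweet data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Tiny made-up dataset (hypothetical), standing in for the tweet data.
texts = ["good great", "bad awful", "great love", "awful hate",
         "love good", "hate bad", "good love", "bad hate"]
labels = [1, -1, 1, -1, 1, -1, 1, -1]

# Vectorize the text into word counts, as in the cells above.
vec_svm = CountVectorizer()
X_toy = vec_svm.fit_transform(texts)

# Fit a linear SVM on the word counts instead of Naive Bayes.
svm_model = LinearSVC()
svm_model.fit(X_toy, labels)

# Predict the grade of two new made-up texts.
toy_pred = svm_model.predict(vec_svm.transform(["great good", "awful bad"]))
print(toy_pred)
```

To compare classifiers fairly, both would need to be evaluated on the same train/test split with `classification_report`.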


In [8]:
# Load real data.
real_data = pd.read_csv('../../data/amazon_product_reviews.csv')

# The review text is in the 'Review' column of the Amazon product review data.
X_real = vec.transform(real_data['Review'])
y_pred = model.predict(X_real)

# Join the predicted sentiment back onto the review text (rows are matched by index).
data = pd.merge(real_data['Review'], pd.DataFrame(y_pred, columns=['Sentiment']), how='left', left_index=True, right_index=True)
print(data.head())

                                              Review  Sentiment
0             The HeatWave Electric Blanket keeps me        0.0
1  Still impressed with the durability of the Snu...        1.0
2  PowerFlex Resistance Bands are durable and ver...        1.0
3  TurboCharge Power Bank is my go-to for keeping...        1.0
4  NovaChill Cooler Bag is spacious and keeps my ...        1.0
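
The index-based merge above relies on `real_data` having the default 0..n-1 index. If rows had been filtered or re-indexed, the left join would leave `NaN` in `Sentiment` for any index that does not match. As a sketch of a positional alternative, `DataFrame.assign` can attach the predictions directly. The `reviews` and `predictions` values below are made up to stand in for `real_data` and `y_pred`.

```python
import numpy as np
import pandas as pd

# Made-up reviews and predictions (hypothetical stand-ins for real_data and y_pred).
reviews = pd.DataFrame({'Review': ['works well', 'broke quickly']})
predictions = np.array([1.0, -1.0])

# assign() with a NumPy array pairs rows by position, so no index alignment is involved.
labeled = reviews[['Review']].assign(Sentiment=predictions)
print(labeled)
```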


## Prompt the User for Text to Predict

As a mini deployment of the model, prompt the user to enter text and predict its sentiment.

> **Note**
> 
> The `input` function only works in a local notebook setup, not when the notebook is viewed on GitHub.


In [9]:
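# Prompt the user for a text to analyze

def get_sentiment(text, model, vectorizer):
    """Return a readable sentiment label for a single text."""
    # The vectorizer expects an iterable of documents, so wrap the text in a list.
    vector = vectorizer.transform([text])
    sentiment = model.predict(vector)[0]

    if sentiment == 0:
        return 'Neutral sentiment'
    elif sentiment == 1:
        return 'Positive sentiment'

    # The only remaining grade in the training data is -1.
    return 'Negative sentiment'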


# Keep prompting until the user interrupts the kernel (KeyboardInterrupt).
while True:
    try:
        text = input('Enter text to analyze: ')
        print(get_sentiment(text, model, vec))
    except KeyboardInterrupt:
        print('Quitting')
        break


Positive sentiment
Quitting
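
`get_sentiment` returns 'Negative sentiment' for any grade other than 0 or 1. A lookup table is one way to make the mapping explicit, so that an unexpected grade raises a `KeyError` instead of being reported as negative. This is only a sketch: `SENTIMENT_NAMES` and `label_to_name` are hypothetical names, and the grades passed in are made up. It also runs without `input`, so it works in non-interactive settings.

```python
# Map the numeric grades used in this notebook to readable names.
SENTIMENT_NAMES = {
    -1: 'Negative sentiment',
    0: 'Neutral sentiment',
    1: 'Positive sentiment',
}


def label_to_name(label):
    """Return the readable name for a numeric sentiment grade."""
    # int() lets float grades such as 1.0 match the integer keys.
    return SENTIMENT_NAMES[int(label)]


# Made-up grades, standing in for the output of model.predict().
names = [label_to_name(p) for p in [1.0, 0.0, -1.0]]
print(names)
```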
