# Sentiment Analysis using Google Cloud

In [None]:
! pip3 install --upgrade --quiet wordcloud

! pip3 install --upgrade --quiet google-cloud-aiplatform \
                                 fsspec \
                                 gcsfs

In [None]:
PROJECT_ID = "nemo-493b-final"  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}

Updated property [core/project].


In [None]:
REGION = "us-central1"  # @param {type: "string"}

#### UUID

If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users, generate a UUID for each session and append it to the names of the resources you create in this tutorial.

In [None]:
import random
import string


# Generate a uuid of a specified length (default=8)
def generate_uuid(length: int = 8) -> str:
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=length))


UUID = generate_uuid()
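
The suffix is only useful once it is appended to a resource name. The sketch below shows one way to do that; `unique_name` and the base name `sentiment-dataset` are hypothetical and exist only for illustration. Note that the suffix is a short random string rather than a standards-compliant UUID, so collisions are unlikely but not impossible.

```python
import random
import string


def generate_uuid(length: int = 8) -> str:
    """Same helper as above: a random lowercase alphanumeric suffix."""
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=length))


def unique_name(base: str, suffix: str) -> str:
    """Append a session suffix to a resource name (hypothetical helper)."""
    return f"{base}-{suffix}"


UUID = generate_uuid()

# "sentiment-dataset" is a made-up base name, not one used in this notebook.
dataset_name = unique_name("sentiment-dataset", UUID)
print(dataset_name)
```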

### Authenticate your Google Cloud account

Depending on your Jupyter environment, you may have to manually authenticate. Follow the relevant instructions below.

**1. Vertex AI Workbench**
* Do nothing as you are already authenticated.

**2. Local JupyterLab instance, uncomment and run:**

In [None]:
! gcloud auth login



**3. Colab, uncomment and run:**

In [None]:
from google.colab import auth
auth.authenticate_user()

### Create a Cloud Storage bucket

Create a storage bucket to store intermediate artifacts such as datasets.

In [None]:
BUCKET_URI = f"gs://finalprojectnemo-{PROJECT_ID}-unique"  # @param {type:"string"}

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l {REGION} -p {PROJECT_ID} {BUCKET_URI}
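
To decide whether the cell above needs to run, you could check for the bucket from Python first. This is a minimal sketch, assuming the `google-cloud-storage` client is available (this notebook imports it as `storage`) and that your credentials can read bucket metadata; `bucket_name_from_uri` and `bucket_exists` are hypothetical helper names. `lookup_bucket` returns `None` for a missing bucket, but it can still raise if the bucket exists and you lack access to it.

```python
def bucket_name_from_uri(bucket_uri: str) -> str:
    """Strip the gs:// scheme and any object path, leaving the bucket name."""
    prefix = "gs://"
    if not bucket_uri.startswith(prefix):
        raise ValueError(f"expected a gs:// URI, got {bucket_uri!r}")
    return bucket_uri[len(prefix):].split("/", 1)[0]


def bucket_exists(bucket_uri: str, project_id: str) -> bool:
    """Return True if the bucket is visible to the current credentials."""
    # Imported lazily so the parsing helper works without the client library.
    from google.cloud import storage

    client = storage.Client(project=project_id)
    # lookup_bucket returns None instead of raising when the bucket is missing.
    return client.lookup_bucket(bucket_name_from_uri(bucket_uri)) is not None


# Parsing alone needs no network access or credentials.
print(bucket_name_from_uri("gs://example-bucket/data.csv"))
```

In the notebook you would call `bucket_exists(BUCKET_URI, PROJECT_ID)` and run the `gsutil mb` cell only when it returns `False`.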

### Import libraries

In [None]:
import os
from typing import List, Optional, Union

import matplotlib.pyplot as plt
import pandas as pd
from google.cloud import aiplatform, storage
from wordcloud import STOPWORDS, WordCloud

## Load the data
<a name="section-5"></a>

Load the dataset from Cloud Storage, label each text with NLTK's VADER sentiment analyzer, and compare the resulting labels against the dataset's `target` column.

In [None]:
# Install NLTK before importing it
!pip install nltk

import nltk
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer

# Download the VADER lexicon used by SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

# Read CSV file from Google Cloud Storage
df = pd.read_csv("gs://finalprojectnemo/filtered_data.csv")

# Initialize the SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()

# Function to get sentiment
def get_sentiment(text):
    sentiment = sid.polarity_scores(text)
    if sentiment['compound'] >= 0.05:
        return 'positive'
    elif sentiment['compound'] <= -0.05:
        return 'negative'
    else:
        return 'neutral'

# Apply sentiment analysis to the DataFrame
df['sentiment'] = df['text'].apply(get_sentiment)

# Calculate sentiment percentages
sentiment_counts = df['sentiment'].value_counts()
sentiment_percentages = sentiment_counts / len(df) * 100

# Show sentiment percentages
print("Sentiment Percentages:")
print(sentiment_percentages)

# Cross-tabulate the ground-truth 'target' column against the VADER label
conf_matrix = pd.crosstab(df['target'], df['sentiment'])

# Display confusion matrix
print("\nConfusion Matrix:")
print(conf_matrix)




[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


Sentiment Percentages:
sentiment
positive    51.329442
neutral     26.588846
negative    22.081712
Name: count, dtype: float64

Confusion Matrix:
sentiment  negative  neutral  positive
target                                
0               524      323       397
4               157      497      1186


**Based on the provided sentiment percentages:**

- 51.33% of the texts were classified as positive.
- 26.59% of the texts were classified as neutral.
- 22.08% of the texts were classified as negative.

These percentages describe how VADER's labels are distributed across the dataset: positive is the most common label, followed by neutral, with negative the least common. They say nothing yet about whether the labels agree with the `target` column.
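
The percentages can be reproduced from the confusion matrix printed above by summing each label column over both target classes and dividing by the total number of rows. The sketch below does that arithmetic with the counts copied from the output; the variable names are illustrative only.

```python
# Counts copied from the crosstab output above: rows are target classes,
# columns are VADER labels.
counts = {
    0: {"negative": 524, "neutral": 323, "positive": 397},
    4: {"negative": 157, "neutral": 497, "positive": 1186},
}

# Total number of texts in the dataset.
total = sum(sum(row.values()) for row in counts.values())

# Sum each predicted-label column across both target classes.
label_totals = {
    label: sum(row[label] for row in counts.values())
    for label in ("positive", "neutral", "negative")
}

percentages = {label: 100 * n / total for label, n in label_totals.items()}
for label, pct in percentages.items():
    print(f"{label:<9}{pct:.2f}%")
```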

**Based on the provided confusion matrix:**

- For the target class 0:
  - 524 instances were classified as negative sentiment.
  - 323 instances were classified as neutral sentiment.
  - 397 instances were classified as positive sentiment.

- For the target class 4:
  - 157 instances were classified as negative sentiment.
  - 497 instances were classified as neutral sentiment.
  - 1186 instances were classified as positive sentiment.

This table cross-tabulates the VADER labels (negative, neutral, positive) against the dataset's `target` classes (0 and 4). Because the target has only two classes while VADER produces three labels, it is not a square confusion matrix: the 820 texts labeled neutral have no matching target class. Reading the table as correct versus incorrect classifications also requires knowing which sentiment each target value stands for, which this notebook does not state.
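
To turn the counts into agreement rates, each row can be normalized by its total. The sketch below assumes that target 0 means negative and target 4 means positive, as in the Sentiment140 labeling convention; the notebook does not confirm this mapping, so treat `expected_label` and the resulting numbers as conditional on that assumption.

```python
# Counts copied from the crosstab output above.
counts = {
    0: {"negative": 524, "neutral": 323, "positive": 397},
    4: {"negative": 157, "neutral": 497, "positive": 1186},
}

# ASSUMPTION: 0 = negative and 4 = positive. The notebook does not state this.
expected_label = {0: "negative", 4: "positive"}

# Share of each VADER label within each target class (rows sum to 1).
rates = {}
for target, row in counts.items():
    row_total = sum(row.values())
    rates[target] = {label: n / row_total for label, n in row.items()}

# Fraction of all texts whose VADER label matches the assumed target meaning.
total = sum(sum(row.values()) for row in counts.values())
matches = sum(counts[t][expected_label[t]] for t in counts)
agreement = matches / total

for target, label in expected_label.items():
    print(f"target {target}: {rates[target][label]:.1%} labeled {label}")
print(f"overall agreement: {agreement:.1%}")
```

Under that assumption, VADER agrees with the target more often for class 4 than for class 0, and every neutral label counts as a disagreement because the target has no neutral class.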