<a href="https://colab.research.google.com/github/1MuhammadFarhanAslam/ML-Projects/blob/main/Omicron_Sentiment_Analysis_using_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Mounting Google Drive**

In [None]:
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

# **Connecting Google Colab to Kaggle through the Kaggle API**

**To use Kaggle datasets in Google Colab, follow these steps:**

* Step 1: Install the Kaggle library in Google Colab by running the following command:

In [None]:
!pip install kaggle

**Go to the Kaggle website (https://www.kaggle.com) and sign in to your account (or create a new account if you don't have one).**

*Navigate to the dataset you want to use in your Colab notebook.*

*Click on the "Copy API command" button below the dataset description. This will copy the command to download the dataset using the Kaggle API.*

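*Create your API key file: in your Kaggle account settings, under the API section, create a new API token. This downloads a file named kaggle.json that contains your username and key.*

*In your Colab notebook, import the necessary libraries and upload kaggle.json by running the following code:*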

In [None]:
import os
import json

# Upload your Kaggle API key file (kaggle.json) to Colab using the file upload feature
from google.colab import files
files.upload()

# Read the contents of the kaggle.json file
with open('kaggle.json', 'r') as file:
    kaggle_json = json.load(file)
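Before moving the file, it can help to confirm that the upload really is a Kaggle API token. This is a minimal sketch. It assumes the standard kaggle.json layout, which is a JSON object with `username` and `key` fields. `validate_kaggle_token` is a hypothetical helper and is not part of the Kaggle library.

```python
import json

def validate_kaggle_token(raw: str) -> dict:
    """Parse kaggle.json content and check the expected fields (hypothetical helper).

    Assumes the token is a JSON object with non-empty "username" and "key" entries.
    """
    token = json.loads(raw)
    if not isinstance(token, dict):
        raise ValueError("kaggle.json must contain a JSON object")
    missing = [field for field in ("username", "key") if not token.get(field)]
    if missing:
        raise ValueError(f"kaggle.json is missing: {', '.join(missing)}")
    return token

# Usage with a placeholder token (not real credentials)
example = '{"username": "your-username", "key": "0123456789abcdef"}'
print(validate_kaggle_token(example)["username"])  # your-username
```

In the notebook, `validate_kaggle_token(json.dumps(kaggle_json))` would check the file loaded above.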

# **Important: Kaggle API Key Security**

**The command !chmod 600 ~/.kaggle/kaggle.json changes the permissions of the kaggle.json file so that access to it is restricted.**

*In Linux-based systems, including Google Colab, file permissions are represented by a three-digit octal number. The first digit sets the owner's permissions, the second sets the group's, and the third sets those of all other users.*

**Here's a breakdown of what chmod 600 does:**

* ***6 = 4 (read) + 2 (write) + 0 (execute). This means the owner of the file (here, the user who uploaded kaggle.json) can read and write it but cannot execute it. The two 0s mean the group and other users have no read, write, or execute permission.***

* ***Setting the permissions to 600 therefore ensures that only the file's owner can read or modify it. No other users (group or others) can access it.***

* **This step is important to maintain the security of your Kaggle API key, as it contains sensitive information and should not be accessible to other users of the system.**
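To make the octal notation concrete, this sketch uses Python's standard `stat` module. It builds mode 600 from named flags, shows it the way `ls -l` would, and splits it into its three digits. It only inspects a number and does not touch any file.

```python
import stat

# Build mode 600 from named flags: owner read (0o400) + owner write (0o200)
mode = stat.S_IRUSR | stat.S_IWUSR
print(oct(mode))  # 0o600

# Render the mode as `ls -l` shows it for a regular file
print(stat.filemode(stat.S_IFREG | mode))  # -rw-------

# Decode each octal digit into read (4), write (2) and execute (1) bits
for who, digit in zip(("owner", "group", "other"), f"{mode:03o}"):
    bits = int(digit)
    perms = [name for name, flag in (("read", 4), ("write", 2), ("execute", 1)) if bits & flag]
    print(f"{who}: {digit} -> {', '.join(perms) or 'none'}")
```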

In [None]:
import os
import shutil

# Move the uploaded kaggle.json file to the directory the Kaggle CLI reads from
# (shutil.move also works when source and destination are on different filesystems)
os.makedirs('/root/.kaggle', exist_ok=True)
shutil.move('kaggle.json', '/root/.kaggle/kaggle.json')

# Restrict the key file to owner read/write only (mode 600)
os.chmod('/root/.kaggle/kaggle.json', 0o600)

**or**

In [None]:
import os

# Specify the path to the kaggle.json file
kaggle_json_path = os.path.join(os.path.expanduser("~"), ".kaggle", "kaggle.json")

# Check if the kaggle.json file already exists
if os.path.exists(kaggle_json_path):
    print("kaggle.json file already exists.")
else:
    # Move the uploaded Kaggle API key file to the required directory
    # Create ~/.kaggle (-p creates parent directories as needed and does not fail if it already exists)
    !mkdir -p ~/.kaggle
    # Move kaggle.json from the current directory into ~/.kaggle/
    !mv kaggle.json ~/.kaggle/
    !chmod 600 ~/.kaggle/kaggle.json
    print("kaggle.json file moved and permissions set successfully.")
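The cell above mixes Python with `!` shell commands, and those only work inside IPython or Colab. This is a minimal pure-Python sketch of the same steps using `pathlib` and `shutil`. `install_kaggle_key` and its `home` parameter are hypothetical. They exist only so the logic can be tried against a temporary directory.

```python
import os
import shutil
from pathlib import Path

def install_kaggle_key(src, home=None):
    """Move kaggle.json into <home>/.kaggle/ and restrict it to mode 600 (hypothetical helper).

    If the destination already exists, the source file is left untouched.
    Unlike the cell above, mode 600 is re-applied even when the file was already in place.
    """
    home = Path(home) if home is not None else Path.home()
    dest = home / ".kaggle" / "kaggle.json"
    if dest.exists():
        print("kaggle.json file already exists.")
    else:
        dest.parent.mkdir(parents=True, exist_ok=True)  # like `mkdir -p ~/.kaggle`
        shutil.move(str(src), str(dest))                # like `mv kaggle.json ~/.kaggle/`
        print("kaggle.json file moved.")
    os.chmod(dest, 0o600)                               # like `chmod 600 ~/.kaggle/kaggle.json`
    return dest
```

In Colab, `install_kaggle_key("kaggle.json")` would target `/root/.kaggle/kaggle.json`, because `Path.home()` is `/root` there.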


**Verifying the Kaggle API**

In [None]:
# Verify the Kaggle API is working
!kaggle datasets list

# **Downloading the Dataset with the Kaggle API**

In [None]:
!kaggle datasets download --force gpreda/omicron-rising

**If the Kaggle API is working correctly, the cell above downloads the dataset. It runs the API command copied from the dataset page (`kaggle datasets download -d gpreda/omicron-rising`) with `--force` added:**

* **In the copied command, `-d` only names the dataset to download, in `owner/dataset-name` form. It does not control re-downloading. By default, the Kaggle CLI skips the download if an up-to-date copy of the file already exists locally. `--force` skips that check and always downloads. Because this dataset is updated daily, `--force` makes sure you always have the latest version.**

**The dataset is downloaded as a ZIP file. You can extract it with the following code:**

In [None]:
import zipfile

# Specify the path to the ZIP file
zip_file_path = 'omicron-rising.zip'

# creating directory to unzip dataset
!mkdir -p omicron-rising

# Specify the target directory to extract the files
target_directory = 'omicron-rising'

# Open the ZIP file
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    # Extract all the files to the target directory
    zip_ref.extractall(target_directory)

print("ZIP file extracted successfully.")
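Before extracting, it can be useful to check what the archive contains, for example to confirm the `omicron.csv` file name read later. This is a minimal sketch. It builds a tiny placeholder archive in memory so it runs anywhere, and the CSV content inside it is made up. With the real download, you would open `zipfile.ZipFile('omicron-rising.zip')` and call `namelist()` in the same way.

```python
import io
import tempfile
import zipfile
from pathlib import Path

# Build a tiny placeholder archive in memory (stands in for omicron-rising.zip)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("omicron.csv", "id,text\n1,example tweet\n")

with zipfile.ZipFile(buffer) as zf, tempfile.TemporaryDirectory() as tmp:
    # Inspect the members before extracting anything
    names = zf.namelist()
    print(names)  # ['omicron.csv']

    # Extract into a temporary directory and confirm the CSV arrived
    zf.extractall(tmp)
    extracted = sorted(p.name for p in Path(tmp).iterdir())
    print(extracted)  # ['omicron.csv']
```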

In [None]:
import os

# Specify the directory path
directory_path = 'omicron-rising'

# Create the directory if it doesn't already exist
if not os.path.exists(directory_path):
    os.makedirs(directory_path)
    print(f"Directory '{directory_path}' created successfully.")
else:
    print(f"Directory '{directory_path}' already exists.")


In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator


data = pd.read_csv('/content/omicron-rising/omicron.csv')
data

In [None]:
data.shape

In [None]:
data.info()

In [None]:
data.describe()

In [None]:
data.isnull().sum()

The dataset contains null values in three columns that hold textual data, so I will remove all rows that contain null values:

In [None]:
data = data.dropna()

In [None]:
data.isnull().sum()
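`dropna()` with no arguments drops a row if *any* column is null. That can discard tweets whose only gap is an optional field. This is a more targeted sketch on a small made-up DataFrame. The columns `text`, `hashtags`, and `user_location` are assumptions for illustration. Treating a missing hashtag as an empty string is a design choice and is not what the notebook does.

```python
import pandas as pd

# Hypothetical sample: every row has at least one missing value
df = pd.DataFrame({
    "text": ["omicron cases rising", None, "booster appointment booked"],
    "hashtags": [None, "['Omicron']", "['COVID19']"],
    "user_location": [None, "London", None],
})

# Drop only rows whose tweet text is missing
cleaned = df.dropna(subset=["text"]).copy()
# Treat a missing hashtag list as "no hashtags" instead of dropping the row
cleaned["hashtags"] = cleaned["hashtags"].fillna("")

print(len(df.dropna()), len(cleaned))  # 0 2
```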

# **Sentiment Analysis of Omicron Variant**

The text column contains the tweets in which people shared their opinions about the Omicron variant. Before running sentiment analysis, we need to clean and prepare this column. Here's how we can do that:

In [None]:
import nltk
import re
from nltk.corpus import stopwords
import string

nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
stopword=set(stopwords.words('english'))

In [None]:
def clean(text):
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # remove text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # remove URLs
    text = re.sub(r'<.*?>+', '', text)                 # remove HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # remove punctuation
    text = re.sub(r'\n', ' ', text)                    # replace newlines with spaces so words stay separate
    text = re.sub(r'\w*\d\w*', '', text)               # remove words containing digits
    words = [word for word in text.split() if word not in stopword]  # remove stopwords
    words = [stemmer.stem(word) for word in words]     # stem each remaining word
    return " ".join(words)

data["text"] = data["text"].apply(clean)
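To see what the regular-expression steps do on their own, this sketch applies the same regex steps as `clean` to a made-up tweet. It leaves out stopword removal and stemming so it runs without NLTK data. It replaces newlines with a space and collapses repeated spaces at the end. `regex_steps` is a hypothetical name.

```python
import re
import string

def regex_steps(text):
    """Apply clean()-style regex steps only (no stopwords, no stemming)."""
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # drop [bracketed] text
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # drop URLs
    text = re.sub(r'<.*?>+', '', text)                 # drop HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # drop punctuation
    text = re.sub(r'\n', ' ', text)                    # newlines become spaces
    text = re.sub(r'\w*\d\w*', '', text)               # drop tokens containing digits
    return " ".join(text.split())                      # collapse repeated spaces

print(regex_steps("Omicron cases up 20% today! See https://t.co/abc [video]"))
# omicron cases up today see
```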

In [None]:
data.tail(5)

Let's print the cleaned tweets and then join them all into a single string:

In [None]:
for i in data.text:
  print(i)

In [None]:
text = " ".join(i for i in data.text)
text

Now that the text column is clean, let's look at a word cloud of it to see which words people used most often in their tweets:

In [None]:
text = " ".join(i for i in data.text)
# Use a separate name so the nltk `stopwords` module imported earlier is not overwritten
wc_stopwords = set(STOPWORDS)
wordcloud = WordCloud(width=800, height=500, random_state=21, max_font_size=110, stopwords=wc_stopwords, background_color="black").generate(text)
plt.figure(figsize=(14, 10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
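A word cloud shows relative frequency visually. To see the exact counts behind it, `collections.Counter` can tally the tokens. This is a minimal sketch on placeholder tweets, and in the notebook you would pass `data.text` instead. `top_words` is a hypothetical helper, and the `ignore` set is an illustrative assumption.

```python
from collections import Counter

def top_words(texts, n=3, ignore=frozenset()):
    """Return the n most common whitespace-separated tokens across texts."""
    counts = Counter(
        word
        for text in texts
        for word in text.split()
        if word not in ignore
    )
    return counts.most_common(n)

sample = ["omicron case rise", "omicron variant spread", "new omicron case"]
print(top_words(sample, n=2, ignore={"new"}))  # [('omicron', 3), ('case', 2)]
```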

Now let's look at a word cloud of the hashtags column to see which hashtags people used most often in their tweets:

In [None]:
text = " ".join(i for i in data.hashtags)
# Use a separate name so the nltk `stopwords` module imported earlier is not overwritten
wc_stopwords = set(STOPWORDS)

wordcloud = WordCloud(width=800, height=500, random_state=21, max_font_size=110, stopwords=wc_stopwords, background_color="black").generate(text)
plt.figure(figsize=(14, 10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
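Joining the raw `hashtags` values treats each cell as plain text. In Twitter datasets of this kind, each cell often holds a string that looks like a Python list, for example `"['Omicron', 'COVID19']"`. That is an assumption to check with `data.hashtags.head()`. If it holds, the tags can be parsed and counted exactly. `parse_hashtags` is a hypothetical helper, and the sample cells are made up.

```python
import ast
from collections import Counter

def parse_hashtags(cell):
    """Turn a cell like "['Omicron', 'COVID19']" into a list of tags.

    Falls back to splitting on whitespace when the cell is not a list literal.
    """
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return str(cell).split()
    if isinstance(value, (list, tuple)):
        return [str(tag) for tag in value]
    return [str(value)]

cells = ["['Omicron', 'COVID19']", "['Omicron']", "Omicron Booster"]
counts = Counter(tag.lower() for cell in cells for tag in parse_hashtags(cell))
print(counts.most_common(1))  # [('omicron', 3)]
```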

Now I will calculate sentiment scores for the tweets about the Omicron variant. Using the text column, I will add three more columns to the dataset: Positive, Negative, and Neutral.

In [None]:
nltk.download('vader_lexicon')
sentiments = SentimentIntensityAnalyzer()

# Score each tweet once and reuse the scores for all three columns
scores = [sentiments.polarity_scores(i) for i in data["text"]]
data["Positive"] = [s["pos"] for s in scores]
data["Negative"] = [s["neg"] for s in scores]
data["Neutral"] = [s["neu"] for s in scores]

# Keep only the text and score columns; .copy() avoids SettingWithCopyWarning when columns are added later
data = data[["text", "Positive", "Negative", "Neutral"]].copy()
print(data.head())

Now let's see how most people reacted to the Omicron variant:

In [None]:
x = sum(data["Positive"])
y = sum(data["Negative"])
z = sum(data["Neutral"])

def sentiment_score(a, b, c):
    if (a>b) and (a>c):
        print("1 😊 ")
    elif (b>a) and (b>c):
        print("-1 😠 ")
    else:
        print("0 🙂 ")

sentiment_score(x, y, z)
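`sentiment_score` prints its verdict, so the result cannot be reused in later cells. This is a minimal variant that returns the value instead. It keeps the same comparisons: a strict win for positive or negative, otherwise neutral. `overall_sentiment` and the totals in the usage line are hypothetical.

```python
def overall_sentiment(pos_total, neg_total, neu_total):
    """Return 1 (positive), -1 (negative) or 0 (neutral).

    Uses the same comparisons as sentiment_score but returns instead of printing.
    """
    if pos_total > neg_total and pos_total > neu_total:
        return 1
    if neg_total > pos_total and neg_total > neu_total:
        return -1
    return 0

verdicts = {1: "Positive 😊", -1: "Negative 😠", 0: "Neutral 🙂"}
print(verdicts[overall_sentiment(12.5, 8.0, 79.5)])  # Neutral 🙂
```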

So most of the opinions were neutral. This suggests that people were mainly sharing information about the Omicron variant rather than expressing positive or negative opinions.

In [None]:
# Get the highest polarity score for each row
sentiment_cols = ["Positive", "Negative", "Neutral"]
data["Max_Sentiment"] = data[sentiment_cols].max(axis=1)

# Get the sentiment label: idxmax returns the name of the column holding the
# highest score, whereas the score itself is a float that never equals "Positive" etc.
data["SentimentLabel"] = data[sentiment_cols].idxmax(axis=1)
data
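Why `idxmax` rather than `max`: `max(axis=1)` returns the highest *score*, which is a float. A label function that compares that float with the string `"Positive"` never finds a match, so every row falls through to `"Neutral"`. `idxmax(axis=1)` returns the *column name* that holds the maximum. Here is a small check on made-up scores:

```python
import pandas as pd

# Hypothetical VADER-style scores for three tweets
sample_scores = pd.DataFrame({
    "Positive": [0.6, 0.1, 0.0],
    "Negative": [0.1, 0.5, 0.2],
    "Neutral":  [0.3, 0.4, 0.8],
})

max_score = sample_scores.max(axis=1)   # floats: 0.6, 0.5, 0.8
label = sample_scores.idxmax(axis=1)    # column names holding each maximum

# A float never equals a label string, so string comparisons on max_score all fail
print(any(score == "Positive" for score in max_score))  # False
print(label.tolist())  # ['Positive', 'Negative', 'Neutral']
```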

In [None]:
data[data['SentimentLabel']  == "Neutral" ]

In [None]:
data[data['SentimentLabel']  == "Negative" ]

Creating DataFrames for the neutral, positive, and negative tweets:

In [None]:
neutral = data[data['SentimentLabel'] == "Neutral"]
positive = data[data['SentimentLabel'] == "Positive"]
negative = data[data['SentimentLabel'] == "Negative"]
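The three filtered DataFrames can also be built in one pass with `groupby`, which avoids repeating the label strings. This is a minimal sketch on a hypothetical labelled frame. A label with no tweets has no group at all, so `.get` with an empty-frame default keeps the counts safe.

```python
import pandas as pd

# Hypothetical labelled tweets (stand-in for the notebook's `data`)
labelled = pd.DataFrame({
    "text": ["great news", "so tired of this", "new cases reported", "stay safe"],
    "SentimentLabel": ["Positive", "Negative", "Neutral", "Neutral"],
})

# One pass: map each label to the sub-DataFrame of tweets carrying it
by_label = {label: frame for label, frame in labelled.groupby("SentimentLabel")}

# A label with no tweets has no group; fall back to an empty frame with the same columns
empty = labelled.iloc[0:0]
counts = {name: len(by_label.get(name, empty)) for name in ("Neutral", "Positive", "Negative")}
print(counts)  # {'Neutral': 2, 'Positive': 1, 'Negative': 1}
```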

In [None]:
neutral

In [None]:
positive

In [None]:
negative

# **Visualization**

In [None]:
# This code creates a bar chart showing the distribution of sentiment in the dataset of tweets.
import plotly.graph_objs as go

# Create a list of x-axis labels.
x = ['Neutral', 'Positive', 'Negative']

# Create a list of y-axis values.
y = [len(neutral), len(positive), len(negative)]

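# Compute each sentiment's share of all tweets for the hover text instead of hard-coding it.
total = sum(y)
hover_text = [f"{count / total:.1%} of tweets" for count in y]

# Create a bar chart object.
fig = go.Figure(data=[go.Bar(x=x, y=y, hovertext=hover_text)])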

# Customize the aspect of the bar chart.
fig.update_traces(marker_line_color='midnightblue', marker_line_width=1.)

# Set the title of the bar chart.
fig.update_layout(title_text='Distribution of sentiments')

# Display the bar chart.
fig.show()

In [None]:
data['SentimentLabel'].value_counts()
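`value_counts(normalize=True)` returns each label's share instead of a raw count. That makes it a convenient source for hover text or annotations, so percentages do not need to be hard-coded. This is a minimal sketch on hypothetical labels. `reindex` guarantees that all three classes appear even when one has no tweets.

```python
import pandas as pd

sample_labels = pd.Series(["Neutral", "Neutral", "Neutral", "Positive"])

shares = (
    sample_labels.value_counts(normalize=True)
    .reindex(["Neutral", "Positive", "Negative"], fill_value=0.0)
)
hover = [f"{share:.0%} of tweets" for share in shares]
print(hover)  # ['75% of tweets', '25% of tweets', '0% of tweets']
```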

In [None]:
import plotly.graph_objs as go

# Create a list of x-axis labels.
x = ['Neutral', 'Positive', 'Negative']

# Create a list of y-axis values.
y = [len(neutral), len(positive), len(negative)]

# Create a pie chart object.
fig = go.Figure(data=[go.Pie(labels=x, values=y)])

# Customize the aspect of the pie chart.
fig.update_traces(hole=0.6, marker_line_color='midnightblue', marker_line_width=1.)

# Set the title of the pie chart.
fig.update_layout(title_text='Distribution of sentiments')

# Display the pie chart.
fig.show()
