<a href="https://colab.research.google.com/github/cornflake15/twitter_sentiment_analysis/blob/master/basic-sentiment-analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Basic Sentiment Analysis with vaderSentiment

## Prerequisites
Python Modules:
  - Pandas
  - vaderSentiment


In [None]:
!pip install vaderSentiment  # Only needed if vaderSentiment is not already installed (e.g. on Colab)

Collecting vaderSentiment
  Downloading https://files.pythonhosted.org/packages/76/fc/310e16254683c1ed35eeb97386986d6c00bc29df17ce280aed64d55537e9/vaderSentiment-3.3.2-py2.py3-none-any.whl (125kB)

## How to add the dataset to Google Colab
1. Copy the dataset to your own Google Drive:
  - https://drive.google.com/drive/folders/1DeDNCpQAjwKQqau1cR2druHlsWJWF50f?usp=sharing

2. Mount Google Drive using the code below.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive/')

Mounted at /content/gdrive/


## Import the necessary modules

In [None]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import numpy as np
import pandas as pd

#### Read the dataset into a pandas DataFrame

In [None]:
df = pd.read_csv('/content/gdrive/My Drive/Colab Notebooks/dataset/tweet_sample_100.csv')
df.head()

| Unnamed: 0.1 | Unnamed: 0 | tweet_text |
|---|---|---|
| 0 | 0 | @erictile_: The reason why America isn’t recov... |
| 1 | 1 | @erictile_: The reason why America isn’t recov... |
| 2 | 2 | @itskevo254: This is the first year I'm not go... |
| 3 | 3 | @ipragyasingh1: DM agra has issued u a notice ... |
| 4 | 4 | @tveitdal: The Covid-19 pandemic is threatenin... |
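
The `Unnamed: 0` column in the preview is most likely a row index that was written into the CSV when the file was saved. A minimal sketch of two ways to discard such a column, using a small in-memory CSV as a stand-in for the real file (the sample rows are made up for illustration and are not taken from the dataset):

```python
import io

import pandas as pd

# Hypothetical two-row CSV that mimics a file saved together with its index.
sample_csv = io.StringIO(",tweet_text\n0,first tweet\n1,second tweet\n")

# Option 1: drop every column whose name starts with "Unnamed" after loading.
raw = pd.read_csv(sample_csv)
clean = raw.loc[:, ~raw.columns.str.startswith("Unnamed")]

# Option 2: tell pandas that the first column is the index while loading.
sample_csv.seek(0)
indexed = pd.read_csv(sample_csv, index_col=0)

print(list(clean.columns))
print(list(indexed.columns))
```

Either option leaves `tweet_text` as the only data column, which is the only one the rest of the notebook uses.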


### Function to get the sentiment scores of a sentence

In [None]:
# Create the analyzer once; constructing it for every sentence reloads the lexicon each time.
analyzer = SentimentIntensityAnalyzer()

def sentiment_scores(sentence):
    """
    Return the sentiment dictionary for a sentence.

    The polarity_scores() method of SentimentIntensityAnalyzer
    returns a dictionary containing the pos, neg, neu, and compound scores.
    """
    return analyzer.polarity_scores(sentence)

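# Function to remove unwanted tokens (retweet markers, mentions, hashtags) from the tweets
def remove_unwanted_string(tweets):
    cleaned = []
    for tweet in tweets:
        # Build a new token list instead of removing items from the list
        # being iterated, which would skip the token after each removal.
        # 'RT' is matched as a whole token so that words containing "RT" are kept.
        tokens = [
            token for token in str(tweet).split()
            if token != 'RT' and '@' not in token and '#' not in token
        ]
        cleaned.append(' '.join(tokens))
    return cleaned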
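
One thing to watch when filtering tokens: removing items from a list while looping over that same list makes the loop skip the element that follows each removal, so consecutive mentions or hashtags can survive the cleanup. A minimal sketch comparing the two approaches (the function names and sample tokens are illustrative only):

```python
def drop_in_place(tokens):
    """Remove unwanted tokens while iterating over the same list (skips items)."""
    tokens = list(tokens)  # copy so the caller's list is left untouched
    for token in tokens:
        if '@' in token or '#' in token:
            tokens.remove(token)
    return tokens


def drop_with_filter(tokens):
    """Build a new list that keeps only the wanted tokens."""
    return [t for t in tokens if '@' not in t and '#' not in t]


sample = ['@alice', '@bob', 'great', '#news', '#today', 'story']
print(drop_in_place(sample))     # ['@bob', 'great', '#today', 'story']
print(drop_with_filter(sample))  # ['great', 'story']
```

The in-place version leaves `@bob` and `#today` behind because each removal shifts the remaining items one position to the left while the loop index still moves forward.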

#### Remove unwanted strings from the text

In [None]:
df_tweets = pd.DataFrame()
df_tweets['tweet_text'] = remove_unwanted_string(df['tweet_text'])
df_tweets.head()

| Unnamed: 0 | tweet_text |
|---|---|
| 0 | The reason why America isn’t recovering from C... |
| 1 | The reason why America isn’t recovering from C... |
| 2 | This is the first year I'm not going to Dubai ... |
| 3 | DM agra has issued u a notice for baseless ali... |
| 4 | The Covid-19 pandemic is threatening vital rai... |


#### Score the sentiment of the text

In [None]:
pos, neg, neu, comp = [], [], [], []
for text in df_tweets['tweet_text']:
    sentiment = sentiment_scores(text)
    pos.append(sentiment['pos'])
    neg.append(sentiment['neg'])
    neu.append(sentiment['neu'])
    comp.append(sentiment['compound'])
    
df_tweets_sentiment = pd.DataFrame()
# df_tweets_sentiment.columns = ['tweet_text', 'positive', 'negative', 'neutral', 'compound']
df_tweets_sentiment['tweet_text'] = df_tweets['tweet_text']
df_tweets_sentiment['positive'] = pos
df_tweets_sentiment['negative'] = neg
df_tweets_sentiment['neutral'] = neu
df_tweets_sentiment['compound'] = comp
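
The four parallel lists can also be avoided by building the DataFrame directly from the score dictionaries. A minimal sketch, assuming the scorer returns a dictionary with `pos`, `neg`, `neu`, and `compound` keys as `polarity_scores()` does. Here `fake_scores` is a hypothetical stand-in with fixed values so the snippet runs without vaderSentiment, and `score_frame` is an illustrative helper name that is not part of the notebook:

```python
import pandas as pd


def fake_scores(text):
    """Hypothetical stand-in for sentiment_scores(); returns fixed values."""
    return {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}


def score_frame(texts, scorer):
    """Return a DataFrame with one row of sentiment scores per input text."""
    texts = list(texts)
    # A list of dictionaries becomes one row per dictionary, one column per key.
    scores = pd.DataFrame([scorer(text) for text in texts])
    scores = scores.rename(
        columns={'pos': 'positive', 'neg': 'negative', 'neu': 'neutral'}
    )
    scores.insert(0, 'tweet_text', texts)
    return scores[['tweet_text', 'positive', 'negative', 'neutral', 'compound']]


result = score_frame(['first tweet', 'second tweet'], fake_scores)
print(result)
```

In the notebook this would be called as `score_frame(df_tweets['tweet_text'], sentiment_scores)`.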

In [None]:
df_tweets_sentiment.head()

| Unnamed: 0 | tweet_text | positive | negative | neutral | compound |
|---|---|---|---|---|---|
| 0 | The reason why America isn’t recovering from C... | 0.0 | 0.0 | 1.0 | 0.0 |
| 1 | The reason why America isn’t recovering from C... | 0.0 | 0.0 | 1.0 | 0.0 |
| 2 | This is the first year I'm not going to Dubai ... | 0.194 | 0.168 | 0.638 | 0.4588 |
| 3 | DM agra has issued u a notice for baseless ali... | 0.0 | 0.0 | 1.0 | 0.0 |
| 4 | The Covid-19 pandemic is threatening vital rai... | 0.115 | 0.309 | 0.576 | -0.5719 |
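
The `compound` score is usually turned into a label with fixed cutoffs. A minimal sketch using the thresholds suggested in the vaderSentiment README (at least 0.05 for positive, at most -0.05 for negative, neutral otherwise). `label_sentiment` is an illustrative helper name, not part of the library:

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score to 'positive', 'negative', or 'neutral'."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# Compound scores taken from the first rows of the output above.
for score in (0.0, 0.4588, -0.5719):
    print(score, label_sentiment(score))
```

Applied to the table, this could look like `df_tweets_sentiment['label'] = df_tweets_sentiment['compound'].apply(label_sentiment)`, where the `label` column name is an assumption.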
