# Sentiment analysis of COVID19 and BlackLivesMatter related tweets 

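This notebook runs a sentiment analysis that compares tweets retrieved for the hashtags #COVID19, #BlackLivesMatter, #kitten and #pet over two periods in May 2020.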

## 1. Import all packages <p>
***WARNING***: Before loading the packages, make sure they are installed in the Python environment you are working with. The less common packages can be installed by following the instructions at: <br>
- [GetOldTweets3](https://pypi.org/project/GetOldTweets3/)
- [VaderSentimentAnalysis](https://github.com/cjhutto/vaderSentiment)
- [sklearn](https://github.com/scikit-learn/scikit-learn)
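
A quick way to confirm that the packages can be found before running the import cell is sketched below. The helper name `find_missing` and the `REQUIRED` list are illustrative assumptions, not part of any of the packages; the check only looks up each top-level module and does not verify versions.

```python
import importlib.util

# Top-level module names imported by this notebook (assumed list).
REQUIRED = ["GetOldTweets3", "vaderSentiment", "sklearn",
            "numpy", "pandas", "matplotlib", "plotly"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules
            if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
if missing:
    print("Install these packages before continuing:", ", ".join(missing))
else:
    print("All required packages were found.")
```

If the list is not empty, install the reported packages by following the links above and run the check again.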

In [5]:
# Load packages
import GetOldTweets3 as got
import numpy as np
import csv
import re
import matplotlib.pyplot as plt
import os
import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from mpl_toolkits.mplot3d import Axes3D
import plotly.express as px
from sklearn.svm import OneClassSVM

## 2. Retrieve tweets with a certain hashtag <p>
To extract historical tweets from Twitter, the GetOldTweets3 package was used. It crawls Twitter's search results for tweets containing a given search term, without going through the official Twitter API. The query has to be run directly from the terminal using the commands listed below. Each command retrieves the tweets that match the specified search criteria and saves each tweet, along with some related information, to a csv file. The search criteria are: <p>

- **querysearch:** The term used for the query search. In this case several hashtags are used. <br>
- **since and until:** Set the timeframe in which the tweets are retrieved. <br>
- **toptweets:** Only stores tweets regarded by Twitter as Top tweets; these usually have more favourites, replies and retweets.<br>
- **maxtweets:** Sets the maximum number of tweets to retrieve. <br>
- **lang:** Filters the language of the tweets retrieved. en: English <br>
- **output:** Determines the name of the output file in which the tweets are stored. <p>

**10-15 May 2020** <br>
* GetOldTweets3 --querysearch "#kitten" --since "2020-05-10" --until "2020-05-15" --toptweets --maxtweets 10000 --lang en --output "kitten10M.csv"<br>
* GetOldTweets3 --querysearch "#pet" --since "2020-05-10" --until "2020-05-15" --toptweets --maxtweets 10000 --lang en --output "pet10M.csv"<br>
* GetOldTweets3 --querysearch "#COVID19" --since "2020-05-10" --until "2020-05-15" --toptweets --maxtweets 10000 --lang en --output "COVID10M.csv"<br>
* GetOldTweets3 --querysearch "#BlackLivesMatter" --since "2020-05-10" --until "2020-05-15" --toptweets --maxtweets 10000 --lang en --output "BLM10M.csv"<p>

**26-31 May 2020** <br>
* GetOldTweets3 --querysearch "#BlackLivesMatter" --since "2020-05-26" --until "2020-05-31" --toptweets --maxtweets 10000 --lang en --output "BLM26M.csv"<br>
* GetOldTweets3 --querysearch "#kitten" --since "2020-05-26" --until "2020-05-31" --toptweets --maxtweets 10000 --lang en --output "kitten26M.csv"<br>
* GetOldTweets3 --querysearch "#pet" --since "2020-05-26" --until "2020-05-31" --toptweets --maxtweets 10000 --lang en --output "pet26M.csv"  <br>
* GetOldTweets3 --querysearch "#COVID19" --since "2020-05-26" --until "2020-05-31" --toptweets --maxtweets 10000 --lang en --output "COVID26M.csv" <p>
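
The eight commands above differ only in hashtag, date range and output name, so they can also be generated rather than typed by hand. The following is a minimal sketch under that assumption: `build_command`, `HASHTAGS` and `PERIODS` are hypothetical names introduced here, while the flags are the ones used in the commands above. It only prints the commands; it does not run them. `shlex.join` requires Python 3.8 or later.

```python
import shlex

# Output-file prefix -> hashtag, as used in the commands above.
HASHTAGS = {
    "kitten": "#kitten",
    "pet": "#pet",
    "COVID": "#COVID19",
    "BLM": "#BlackLivesMatter",
}

# Output-file suffix -> (since, until) dates.
PERIODS = {
    "10M": ("2020-05-10", "2020-05-15"),
    "26M": ("2020-05-26", "2020-05-31"),
}


def build_command(prefix, hashtag, suffix, since, until, max_tweets=10000):
    """Return the argument list for one GetOldTweets3 query."""
    return [
        "GetOldTweets3",
        "--querysearch", hashtag,
        "--since", since,
        "--until", until,
        "--toptweets",
        "--maxtweets", str(max_tweets),
        "--lang", "en",
        "--output", f"{prefix}{suffix}.csv",
    ]


commands = [
    build_command(prefix, hashtag, suffix, since, until)
    for suffix, (since, until) in PERIODS.items()
    for prefix, hashtag in HASHTAGS.items()
]

# Print shell-quoted commands that can be pasted into a terminal.
for args in commands:
    print(shlex.join(args))
```

Each argument list could also be passed to `subprocess.run(args, check=True)` to launch the query from Python, provided the GetOldTweets3 command is available on the system path.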

To increase reproducibility, the exact files used for the example analysis are also provided with this code. Therefore it is not necessary to run the GetOldTweets3 query.  <br>

## 3. Load data from .csv files <p>

The data extracted in the previous step has to be loaded again for analysis. If the data was not extracted with the GetOldTweets3 package, other .csv files containing tweets can be loaded instead. In that case, make sure they have the same structure as the [tweet class](https://pypi.org/project/GetOldTweets3/) created by GetOldTweets3. <p> 
**INSTRUCTIONS:** To load the data, enter the name of each .csv file as one element of the `namefiles` list and run this block of code. <p> 

Each csv file is stored under its own key in a common dictionary (`all_data`). The key is the file name (without the .csv extension) and the value is a list of lists, one inner list per row of the file, holding all the variables retrieved for each tweet.

**NOTE:** Make sure that the files are located in the same folder as the main .ipynb file and that each name is written exactly as the file is named (with no extra spaces at the start or the end of the string). ***WARNING: Case sensitive!***
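
Before loading, it can help to check that every expected file is actually present. The sketch below uses a hypothetical helper, `missing_csv_files`, and demonstrates it on a temporary folder with placeholder content rather than on the real data. Whether the lookup itself is case sensitive depends on the file system, so on some systems a wrongly cased name may still be found.

```python
import tempfile
from pathlib import Path


def missing_csv_files(names, folder="."):
    """Return the names whose '<name>.csv' file is not present in folder."""
    folder = Path(folder)
    return [name for name in names if not (folder / f"{name}.csv").is_file()]


# Demo on a throwaway folder that contains only one of the two files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "BLM10M.csv").write_text("placeholder\n")
    demo_missing = missing_csv_files(["BLM10M", "COVID10M"], folder=tmp)

print(demo_missing)
```

In the notebook, calling `missing_csv_files(namefiles)` with the default folder would list any names that need correcting before the loading cell is run.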


In [6]:
#Replace elements in the list by the name of the .csv files to load
namefiles = ['BLM10M', 'BLM26M', 
            'COVID10M', 'COVID26M', 
            'kitten10M', 'kitten26M', 
            'pet10M', 'pet26M']

# Load all csv files
all_data = {}

for dataset in namefiles:
    with open(dataset + '.csv', newline='') as f:
        reader = csv.reader(f)
        list_reader = list(reader)
        all_data[dataset] = list_reader
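
Once the files are loaded, a short summary of each entry helps confirm that the structure is as expected. The sketch below assumes that the first row of every file is a header row, and it uses a made-up two-tweet sample with illustrative column names (`date`, `username`, `text`) in place of a real export. `summarize` is a hypothetical helper.

```python
import csv
import io

# Hypothetical sample standing in for one exported file.
sample_csv = (
    "date,username,text\n"
    '2020-05-10,user_a,"Hello, world"\n'
    "2020-05-11,user_b,Second tweet\n"
)

# Same structure as in the loading cell: file name -> list of rows.
all_data = {"sample": list(csv.reader(io.StringIO(sample_csv, newline="")))}


def summarize(data):
    """Return the column names and tweet count for each loaded file."""
    summary = {}
    for name, rows in data.items():
        if not rows:
            # Empty file: no header row and no tweets.
            summary[name] = {"columns": [], "n_tweets": 0}
            continue
        header, tweets = rows[0], rows[1:]
        summary[name] = {"columns": header, "n_tweets": len(tweets)}
    return summary


summary = summarize(all_data)
for name, info in summary.items():
    print(name, info["n_tweets"], info["columns"])
```

Comparing the `columns` entry across files is a simple way to spot a file whose structure differs from the others.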