# 2. Sentiment analysis and aggregation

This notebook computes a VADER sentiment score for every tweet of each company. The tweet-level scores are then aggregated into a daily sentiment score via the Fisher sentiment score, based on the number of likes, retweets, replies and quotes. Finally, the daily sentiment, the daily number of interactions and the daily number of tweets are plotted per company to give some intuition about the behaviour of these metrics.

This Jupyter Notebook provides the opportunity to quickly inspect and test daily sentiment scores.

## 2.1. Load packages and data

This section loads all relevant packages and data.

### 2.1.1. Load packages

First, import all packages needed to run this notebook. Then import the relevant classes from the Python `thesis_code` library.

In [None]:
# import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os, sys
from datetime import datetime

# Add the folder containing the sentiment analysis module to the import path
sys.path.insert(0, os.path.abspath('C:\\Users\\Jonas\\PycharmProjects\\TwitterSentimentGARCH2021\\Code\\Sentiment analysis and aggregation'))
from sentanalysis import TwitterSentimentAnalysis

### 2.1.2. Construct colors for plotting

Define the custom colour list that is used for plotting throughout this project.

In [None]:
colors = ['seagreen', 'mediumaquamarine', 'steelblue', 'cornflowerblue', 'navy', 'black']

### 2.1.3. Load data

#### 2.1.3.1. Load company names

Load the data with the company names, and specify the location where the daily sentiment data will be stored.

In [None]:
# Specify location of data + file name and location of storage
data_loc = r'C:\Users\Jonas\Documents\Data'
file_name_comp = 'company_ticker_list_all.xlsx'

# Specify location where daily sentiment scores must be stored
store_loc = r'C:\Users\Jonas\Documents\Data\Sentiment'

# Access company names DataFrame (os.path.join avoids the invalid '\c' escape sequence)
df_comp_names = pd.read_excel(os.path.join(data_loc, file_name_comp))

#### 2.1.3.2. Load return data

Load the return data to obtain the dates of the trading days in the sample.

In [None]:
# Specify a dictionary to store all daily returns in
dict_returns = {}

for tckr in df_comp_names.Symbol:
    returns_loc = f'C:\\Users\\Jonas\\Documents\\Data\\Returns\\{tckr}.csv'
    dict_returns[f'returns {tckr}'] = pd.read_csv(returns_loc)
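
The cells in this notebook spell out the full Windows path in several places. One possible way to derive every path from a single base directory, assuming the folder layout used in this notebook, is sketched below with `pathlib`. The helper names `returns_path` and `tweets_path` and the example values `'AAPL'` and `'Apple'` are hypothetical; `PureWindowsPath` is used so that the sketch behaves the same on any operating system.

```python
from pathlib import PureWindowsPath

# Base directory as used in this notebook
data_loc = PureWindowsPath(r'C:\Users\Jonas\Documents\Data')


def returns_path(ticker):
    """Path of the return file of one ticker (hypothetical helper)."""
    return data_loc / 'Returns' / f'{ticker}.csv'


def tweets_path(company_name):
    """Path of the tweet file of one company (hypothetical helper)."""
    return data_loc / 'Tweets' / f'tweets {company_name}.csv'


# 'AAPL' and 'Apple' are example values, not taken from the company list
print(returns_path('AAPL'))
print(tweets_path('Apple'))
```

The resulting path objects can be converted with `str` before being passed to `pd.read_csv`.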

## 2.2. Analysis of tweets 

This section performs the main calculations and analysis of the Twitter data. The resulting daily sentiment data is stored in a separate folder.

In [None]:
# Find tweets per company, all stored in one folder
for i in range(len(df_comp_names)):
    company_name = df_comp_names.iloc[i]['Company']

    tweets_data_loc = f'C:\\Users\\Jonas\\Documents\\Data\\Tweets\\tweets {company_name}.csv'
    
    # find tweets of current company, and store as df_tweets
    df_tweets = pd.read_csv(tweets_data_loc)
    print(df_tweets.shape)
    

### 2.2.1. Load and analyse tweets

Load the tweets retrieved via the Notebook `data_collection` (*consult this Notebook for reference*). Then, create a sentiment analysis object for every company-specific set of tweets. The class `TwitterSentimentAnalysis` separates the public metrics and splits the timestamps into date and time columns, so that the tweets can be aggregated per date. Next, the VADER sentiment lexicon is used to calculate the sentiment score of each tweet in the dataset. The tweets are aggregated via the Fisher score, which is calculated on a daily basis.
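
As a rough illustration of the preprocessing described above, the sketch below separates the public metrics and splits a timestamp into date and time. It is not the code of `TwitterSentimentAnalysis`: the helper `split_tweet_fields`, the sample rows, the dictionary-as-string format of `public_metrics` and the ISO 8601 timestamp format are all assumptions about how the tweets might look after being stored as CSV.

```python
import ast
from datetime import datetime

# Hypothetical rows; the real column contents may be formatted differently
raw_tweets = [
    {
        "created_at": "2021-03-01T14:23:05.000Z",
        "public_metrics": "{'retweet_count': 2, 'reply_count': 1, "
                          "'like_count': 5, 'quote_count': 0}",
    },
    {
        "created_at": "2021-03-02T09:00:00.000Z",
        "public_metrics": "{'retweet_count': 0, 'reply_count': 0, "
                          "'like_count': 0, 'quote_count': 0}",
    },
]


def split_tweet_fields(row):
    """Return the date, time and interaction counts of one tweet (hypothetical helper)."""
    # A dict written to CSV comes back as a string, so parse it safely
    metrics = ast.literal_eval(row["public_metrics"])
    # Assumed timestamp format: ISO 8601 with milliseconds and a trailing 'Z'
    created = datetime.strptime(row["created_at"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return {
        "date": created.date(),
        "time": created.time(),
        "interactions": sum(metrics.values()),
        **metrics,
    }


tweets = [split_tweet_fields(row) for row in raw_tweets]
print(tweets[0]["date"], tweets[0]["interactions"])
```

Keeping the date as a separate field is what makes grouping tweets per day straightforward in the aggregation step.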

In [None]:
# Find tweets per company, all stored in one folder
for i in range(len(df_comp_names)):
    company_name = df_comp_names.iloc[i]['Company']

    tweets_data_loc = f'C:\\Users\\Jonas\\Documents\\Data\\Tweets\\tweets {company_name}.csv'
    
    # find tweets of current company, and store as df_tweets
    df_tweets = pd.read_csv(tweets_data_loc)
    
    # Sort tweets in ascending order based on date and time
    df_tweets = df_tweets.sort_values(by=['created_at']).reset_index(drop=True)
    
    # Get unique trading days of company i
    trading_days = dict_returns[f'returns {df_comp_names.Symbol.iloc[i]}'].Date.unique().tolist()
    unique_trading_days = []
    for trading_day in trading_days:
        unique_trading_days.append((datetime.strptime(trading_day, '%Y-%m-%d')).date())

    # Construct a sentiment object for every company
    sentiment_obj = TwitterSentimentAnalysis(df_tweets, 'text', 'public_metrics', 'created_at', unique_trading_days)
    
    # Calculate daily sentiment based on the calculate_daily_sent method of the sentiment_obj. This dataframe also 
    # contains the number of daily interactions and the number of daily tweets, the other quantitative sentiment metrics.
    df_daily_sent = sentiment_obj.calculate_daily_sent()
    
    # Save daily sentiment dataframe as csv
    store_name = f'sentiment {company_name}.csv'
    df_daily_sent.to_csv(os.path.join(store_loc, store_name))
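
To build some intuition for interaction-based aggregation, the sketch below computes an interaction-weighted daily average of tweet-level compound scores. This is a simplified stand-in and not the Fisher score computed by `calculate_daily_sent`; the function `weighted_daily_sentiment`, the weight `1 + interactions` and the sample values are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical per-tweet records: (date, compound score, total interactions)
tweet_records = [
    ("2021-03-01", 0.60, 9),
    ("2021-03-01", -0.20, 0),
    ("2021-03-02", 0.10, 4),
]


def weighted_daily_sentiment(records):
    """Interaction-weighted mean score per date (illustrative, not the Fisher score)."""
    totals = defaultdict(lambda: [0.0, 0.0])  # date -> [weighted sum, weight sum]
    for date, compound, interactions in records:
        weight = 1 + interactions  # +1 so tweets without interactions still count
        totals[date][0] += weight * compound
        totals[date][1] += weight
    return {date: s / w for date, (s, w) in totals.items()}


daily_sentiment = weighted_daily_sentiment(tweet_records)
print(daily_sentiment)
```

In this toy example, the tweet with nine interactions dominates the score of its day, which is the general effect that interaction-based weighting is meant to capture.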

### 2.2.2. Display sentiment scores and quantitative metrics

This section displays the daily sentiment scores, the number of tweets and the number of daily interactions. These are returned both as descriptive DataFrames and in plots.

In [None]:
# Load the stored daily sentiment scores per company
for i in range(len(df_comp_names)):
    company_name = df_comp_names.iloc[i]['Company']
   
    # Read the daily sentiment dataframe from csv
    store_name = f'sentiment {company_name}.csv'
    df_daily = pd.read_csv(os.path.join(store_loc, store_name))
    display(df_daily.describe())
    
    fig, axs = plt.subplots(figsize = (20,4), nrows = 1, ncols = 3)
    
    first_date, last_date = df_daily.date.iloc[0], df_daily.date.iloc[-1]
    n = 150  # keeps every 150th x-tick so that the date labels do not overlap

    for j in range(len(axs)):
        columns = df_daily.columns
        # If the metric is not the sentiment score, plot the metric in column j+3 (number of interactions or number of tweets)
        if j != 0:
            axs[j].plot(df_daily.date, df_daily[columns[j+3]], c=colors[j])
        else:
            axs[j].plot(df_daily.date, df_daily.sentiment, c=colors[j])
        
        # Set title and xticklabels
        axs[j].set_title(f'Company: {company_name}' + '\n' f'{columns[j+3]} between {first_date} and {last_date}')
        axs[j].set_xticks(axs[j].get_xticks()[::n])
        
        axs[j].tick_params(axis='x', labelrotation = 45)
    
    plt.tight_layout()
    
    # Store figures as PNG
    fig.savefig(os.path.join(store_loc, f'sentiment metrics {company_name}'))
    
