# 1. Twitter data collection

This template illustrates the use of `data_collection`. It collects Twitter data, keeping only tweets that contain ESG-related words.

## 1.1. Load packages and data, and set user input

### 1.1.1. Loading packages

Load the required Python packages and the classes from the project's Python library, which contains all classes used in this research.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os, sys

# Load class that returns tweets
sys.path.insert(0, os.path.abspath('C:\\Users\\jdeboo\\PycharmProjects\\TwitterSentimentGARCH2021\\Code\\Data collection'))
from main_api_code import CollectTwitterData

### 1.1.2. Load Company identifier data

Load the data file containing the names and ticker symbols of the 50 largest companies in the S&P 500.

In [None]:
data_loc = r'C:\Users\Jonas\Documents\Data\company_ticker_list_all.xlsx'
df = pd.read_excel(data_loc)

# Exclude Facebook (ticker FB)
df = df[df.Symbol != 'FB']
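The loop in section 1.2 reads three columns from this file: `Company`, `Official Company Name` and `Symbol`. A minimal sketch of the expected layout, using a small hypothetical in-memory table instead of the Excel file (only the column names come from this notebook; the rows are illustrative):

```python
import pandas as pd

# Hypothetical rows; only the column names are taken from the notebook
df_example = pd.DataFrame({
    'Company': ['Apple', 'Facebook', 'Tesla'],
    'Official Company Name': ['Apple Inc.', 'Facebook, Inc.', 'Tesla, Inc.'],
    'Symbol': ['AAPL', 'FB', 'TSLA'],
})

# Same filter as in the cell above: drop the row whose ticker is FB
df_example = df_example[df_example.Symbol != 'FB']
print(df_example['Company'].tolist())  # ['Apple', 'Tesla']
```

The filter keeps the original row index (0 and 2 here), which is harmless because the collection loop selects rows by position with `iloc`.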

### 1.1.3. Create negation filter

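Each company has its own negation filter: terms and phrases excluded from that company's search query. Keeping the filters company-specific keeps each query within the query limits set by Twitter.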

In [None]:
df_negation = pd.DataFrame(columns = df['Company'].values)

negation_apple = '(-fruit -eat -"apple music" -"green apple" -apple.news -"apple watch" -music.apple -apps.apple -"red apple" -"apple juice" -IOS -"apple tea" -"apple cake" -"apple cider" -playing -games)'
negation_amazon = '(-"the amazon" -"amzn.to" -"amazon.com" -"brazilian amazon" -"rainforest" -"https://www.amazon" -"amazon.in" -"https://featurepoints.com" -"free gift cards")'
negation_chevron = None
negation_cola = None
negation_exxon = None
negation_google = None
negation_macd = None
negation_microsoft = '(-"microsoft office" -"microsoft teams")'
negation_netflix = '(-"watch netflix" -"watching netflix" -"netflix series" -"alteredcarbon" -"altered carbon")'
negation_nike = '(-"green nike" -"nike air" -"air max")'
negation_salesforce = None
negation_tesla = None
negation_walmart = None

# The filters are matched to the columns by position: keep this list in the same order as df['Company']
negation_row = [negation_apple, negation_amazon, negation_chevron, negation_cola, negation_exxon,
                negation_google, negation_macd, negation_microsoft, negation_netflix, negation_nike, negation_salesforce,
                negation_tesla, negation_walmart]

df_negation.loc[len(df_negation)] = negation_row
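Because `negation_row` is matched to the columns by position, a mismatch with the order of `df` silently assigns a filter to the wrong company. A minimal sketch of a less fragile alternative, assuming the filters are stored in a dictionary keyed by company name and appended to a hypothetical base query. `build_query`, the base query and its ESG terms are illustrative only (the real query is built inside `CollectTwitterData`), and the 1024-character limit is an assumption based on Twitter's documented limit for full-archive search under Academic Research access; check it against the access level in use:

```python
# Assumed query length limit; verify for your Twitter API access level
MAX_QUERY_LENGTH = 1024

# Hypothetical: filters keyed by company name instead of by list position
negation_filters = {
    'Microsoft': '(-"microsoft office" -"microsoft teams")',
    'Tesla': None,  # no negation filter for this company
}

def build_query(base_query, negation=None, max_length=MAX_QUERY_LENGTH):
    """Append an optional negation filter to a base query and check its length."""
    query = base_query if negation is None else f'{base_query} {negation}'
    if len(query) > max_length:
        raise ValueError(f'query has {len(query)} characters, limit is {max_length}')
    return query

# Hypothetical base query: company name combined with a few ESG terms
base = '(Microsoft OR MSFT) (sustainability OR emissions)'
print(build_query(base, negation_filters['Microsoft']))
```

Looking a filter up with `negation_filters.get(company_name)` returns `None` for a company without a filter, so no placeholder entries are needed.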

### 1.1.4. Set file location

Set the directory in which the extracted tweets are stored.

In [None]:
store_loc = 'C:\\Users\\Jonas\\Documents\\Data\\Tweets\\'

### 1.1.5. Set parameters

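Set the collection parameters: the maximum number of result pages (`n_pages`), the maximum number of results per page (`max_results`), and the start, critical and end dates.

*Note*: dates should be given as strings in ISO 8601 format, "YYYY-MM-DDTHH:mm:ssZ" (UTC).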
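The dates in the parameter cell are written as `YYYY-MM-DDTHH:MM:SSZ` strings. A minimal sketch of a format check, using a hypothetical helper `parse_api_date` that is not part of `CollectTwitterData`; the ordering check on `critical_date` is an assumption about how the three dates relate:

```python
from datetime import datetime, timezone

def parse_api_date(date_string):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' string into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(date_string, '%Y-%m-%dT%H:%M:%SZ')
    return parsed.replace(tzinfo=timezone.utc)

start = parse_api_date('2011-01-01T00:00:00Z')
critical = parse_api_date('2011-01-05T00:00:00Z')
end = parse_api_date('2021-08-31T12:00:00Z')

# Assumed sanity check: the critical date lies inside the collection window
assert start <= critical <= end
print(start.isoformat())  # 2011-01-01T00:00:00+00:00

# A compact string such as '201101010000' does not match and raises ValueError
try:
    parse_api_date('201101010000')
except ValueError:
    print('rejected: not in YYYY-MM-DDTHH:MM:SSZ format')
```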

In [None]:
# Initialize parameters
n_pages = 1000
max_results = 500
start_date = '2011-01-01T00:00:00Z'
critical_date = '2011-01-05T00:00:00Z'
end_date = '2021-08-31T12:00:00Z'
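If `n_pages` caps the number of requested result pages and `max_results` is the page size (as the names and the description above suggest), one company yields at most `n_pages * max_results` tweets, 500,000 with these values. The paging itself happens inside `CollectTwitterData`. A minimal sketch of the usual pattern, assuming the class uses a Twitter API v2 search endpoint, which returns a `next_token` in the response `meta` while more pages remain; `fetch_page` is a hypothetical stand-in for one HTTP request:

```python
def fetch_page(next_token=None):
    """Hypothetical stand-in for one search request; returns (tweets, meta)."""
    pages = {
        None: (['t1', 't2'], {'next_token': 'p2'}),
        'p2': (['t3'], {'next_token': 'p3'}),
        'p3': (['t4'], {}),  # last page: no next_token
    }
    return pages[next_token]

def collect(n_pages):
    """Follow next_token until it is absent or n_pages pages have been read."""
    collected, token = [], None
    for _ in range(n_pages):
        tweets, meta = fetch_page(token)
        collected.extend(tweets)
        token = meta.get('next_token')
        if token is None:
            break
    return collected

print(collect(n_pages=2))     # ['t1', 't2', 't3']
print(collect(n_pages=1000))  # ['t1', 't2', 't3', 't4']
```

Stopping on a missing `next_token` means a large `n_pages` only acts as an upper bound; a company with few matching tweets ends early.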

-----------------------------------------
-----------------------------------------

## 1.2. Get tweets

Collect the tweets for each company as a DataFrame and save them to a CSV file in `store_loc`, named after the company's common name (the `Company` column).

In [None]:
# Initialize the counter; get_tweets returns its updated value, which is passed on to the next company
counter = 0

Now collect tweets for every company in `df`, using each company's own negation filter from `df_negation`.

In [None]:
metric_cols = ['like_count', 'quote_count', 'reply_count', 'retweet_count']

# Start at row 8 of df; set the start index to 0 to collect tweets for all companies
for i in range(8, len(df)):
    # Set company attributes
    company_name = df.iloc[i]['Company']
    off_comp_name = df.iloc[i]['Official Company Name']
    ticker = df.iloc[i]['Symbol']

    # Collect the tweets for this company, passing its negation filter
    tweet_obj = CollectTwitterData(n_pages, max_results, start_date, end_date, critical_date,
                                   company_name, off_comp_name, ticker, counter,
                                   df_negation.loc[0, company_name])
    tweets, counter = tweet_obj.get_tweets()

    # Unpack public_metrics into separate columns, aligned on the index of the rows
    # that have metrics and selected by key name, then discard public_metrics
    tweets = tweets.reset_index(drop=True)
    metrics = tweets['public_metrics'].dropna()
    metrics = pd.DataFrame(metrics.tolist(), index=metrics.index)[metric_cols]
    tweets = tweets.drop(columns=['public_metrics']).join(metrics)

    file_name = f'tweets {company_name}.csv'
    tweets.to_csv(os.path.join(store_loc, file_name), header=True)
    print(company_name, counter)
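Each row of `public_metrics` holds a dictionary of engagement counts. A small sketch on made-up rows shows why the unpacked counts are kept on the index of the non-missing rows and selected by key name: a DataFrame built from a plain list gets a fresh 0, 1, 2, ... index, so after a missing row the counts would land on the wrong tweets, and assigning columns by position could mislabel them if the dictionary keys come in a different order.

```python
import pandas as pd

# Made-up rows; the second tweet has no public_metrics
tweets = pd.DataFrame({
    'text': ['a', 'b', 'c'],
    'public_metrics': [
        {'retweet_count': 1, 'reply_count': 0, 'like_count': 5, 'quote_count': 0},
        None,
        {'retweet_count': 2, 'reply_count': 1, 'like_count': 7, 'quote_count': 3},
    ],
})

metric_cols = ['like_count', 'quote_count', 'reply_count', 'retweet_count']
metrics = tweets['public_metrics'].dropna()
# Keep the original index so each count stays on its own tweet; select columns by name
metrics = pd.DataFrame(metrics.tolist(), index=metrics.index)[metric_cols]
tweets = tweets.drop(columns=['public_metrics']).join(metrics)

print(tweets)
```

The tweet without metrics keeps `NaN` in all four count columns, which makes those columns floating point.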

-------------------------------------------
-------------------------------------------