### Impact of influential people's tweets on people's thoughts

##### The idea of this part of the project is to explore how influential people (politicians, journalists, singers, actors, etc.) from different countries used Twitter during the coronavirus pandemic. To examine the correlation between their tweets and those of everyday Twitter users and Wikipedia users, we will use the results of the analysis done in the first part of the project.

<br/>

In [57]:
import pandas as pd
import numpy as np

import json
import pickle
from datetime import datetime, timedelta

# Twitter library
import tweepy

#Data visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

<br/>
We use the intervention data (from the provided coronawiki dataset) and the WHO dataset, which reports the number of new cases and deaths for each day of the pandemic, to define the periods of interest in which to analyse tweets for each country.

In [10]:
# Importing intervention dates for each country
data_path = './data/'

interventions = pd.read_csv(data_path + 'interventions.csv', delimiter=',', 
                            parse_dates=['1st case','1st death','School closure',
                                         'Public events banned','Lockdown','Mobility','Normalcy'])

interventions.set_index('lang', inplace = True)
interventions.head()

lang,1st case,1st death,School closure,Public events banned,Lockdown,Mobility,Normalcy
fr,2020-01-24,2020-02-14,2020-03-14,2020-03-13,2020-03-17,2020-03-16,2020-07-02
da,2020-02-27,2020-03-12,2020-03-13,2020-03-12,2020-03-18,2020-03-11,2020-06-05
de,2020-01-27,2020-03-09,2020-03-14,2020-03-22,2020-03-22,2020-03-16,2020-07-10
it,2020-01-31,2020-02-22,2020-03-05,2020-03-09,2020-03-11,2020-03-11,2020-06-26
nl,2020-02-27,2020-03-06,2020-03-11,2020-03-24,NaT,2020-03-16,2020-05-29


In [56]:
who_data = pd.read_csv(data_path + 'WHO-COVID-19-global-data.csv', delimiter=',')

who_data.set_index('Date_reported', inplace = True)
who_data.head()

Date_reported,Country_code,Country,WHO_region,New_cases,Cumulative_cases,New_deaths,Cumulative_deaths
2020-01-03,AF,Afghanistan,EMRO,0,0,0,0
2020-01-04,AF,Afghanistan,EMRO,0,0,0,0
2020-01-05,AF,Afghanistan,EMRO,0,0,0,0
2020-01-06,AF,Afghanistan,EMRO,0,0,0,0
2020-01-07,AF,Afghanistan,EMRO,0,0,0,0
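One way the daily `New_cases` column could be used to locate a period of interest is to smooth the series and take the day with the highest average. The sketch below is only an illustration under stated assumptions: `rolling_mean` and `peak_day` are hypothetical helper names, the trailing-window smoothing is one choice among many, and the numbers are toy values rather than real WHO figures.

```python
from datetime import date, timedelta


def rolling_mean(values, window=7):
    """Trailing mean over `window` items (shorter at the start of the series)."""
    means = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        means.append(sum(chunk) / len(chunk))
    return means


def peak_day(dates, new_cases, window=7):
    """Return the date on which the smoothed number of new cases is highest."""
    smoothed = rolling_mean(new_cases, window)
    best = max(range(len(smoothed)), key=smoothed.__getitem__)
    return dates[best]


# Toy values for illustration only, NOT real WHO figures.
first_day = date(2020, 3, 1)
toy_dates = [first_day + timedelta(days=i) for i in range(10)]
toy_cases = [0, 2, 5, 9, 20, 35, 30, 18, 10, 4]

peak = peak_day(toy_dates, toy_cases, window=3)
print(peak)
```

With the real data, the same idea would be applied per country after filtering the table on `Country_code`.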


We start our analysis by defining lists with the codes of the analysed countries and of the languages spoken in those countries. We will focus on European countries from the given dataset.

In [15]:
# List of countries and the languages spoken in those countries
countries = ['FR','DK','DE','IT','NL','NO','SE','RS','FI','GB']
languages = ['fr','da','de','it','nl','no','sv','sr','fi','en']
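Since the interventions table is indexed by language code while the WHO table uses `Country_code`, it may help to link the two lists explicitly. The following is a minimal sketch that assumes the two lists are aligned by position, as they appear to be above; `lang_to_country` is a hypothetical name introduced here for illustration.

```python
# Same lists as in the cell above, repeated so the sketch is self-contained.
countries = ['FR', 'DK', 'DE', 'IT', 'NL', 'NO', 'SE', 'RS', 'FI', 'GB']
languages = ['fr', 'da', 'de', 'it', 'nl', 'no', 'sv', 'sr', 'fi', 'en']

# Assumption: the i-th language is the one spoken in the i-th country.
assert len(countries) == len(languages)
lang_to_country = dict(zip(languages, countries))

print(lang_to_country['sr'], lang_to_country['it'])
```

A dictionary makes the pairing explicit, so a later lookup does not silently depend on the order of two separate lists.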

<br/>
For the first stage, we will focus on retrieving tweets of influential people from Serbia and Italy (our homelands), which makes the results easier to verify.

In [39]:
# Lists of Twitter accounts of influential people from different countries
serbian_influential_people = ['avucic', 'SerbianPM', 'DraganDjilas', 'MarinikaTepic', 'JugoslavCosic',
                              'futomaki', 'VladoGeorgiev', 'brankica_st', 'OAmidzic']
italian_influential_people = ['lorenzojova']

In [35]:
# Read the bearer tokens needed to access the Twitter API
with open(data_path+'BearerTokens.json','r') as file:
    bearer_tokens = json.load(file)

In [53]:
client = tweepy.Client(bearer_token=bearer_tokens["maja"], wait_on_rate_limit=True)

# Look up each account on Twitter by its username
serbian_users = []
for username in serbian_influential_people:
    serbian_users.append(client.get_user(username=username))

# Print the ID, display name, and username of each account
for user in serbian_users:
    print(user.data.id, user.data.name, user.data.username)

356450858 Александар Вучић avucic
3036495555 Aна Брнабић SerbianPM
205153283 Dragan Djilas DraganDjilas
834073582514888710 Marinika Tepić MarinikaTepic
2814717661 Jugoslav Ćosić JugoslavCosic
181338564 Marija Serifovic futomaki
41577631 Vlado Georgiev - Barba VladoGeorgiev
1087237020 Brankica Stankovic brankica_st
324198256 ognjen amidzic OAmidzic


<br/>
We want to retrieve tweets from different periods of the pandemic, focusing mostly on the weeks preceding intervention days and on periods with peaks in the number of new cases and deaths.
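The week preceding an intervention day can be expressed as a pair of timestamps, which could then be passed as the `start_time` and `end_time` arguments of `tweepy.Client.get_users_tweets`. This is a sketch under assumptions: `window_before` is a hypothetical helper, the seven-day length and the use of UTC midnight as the boundary are illustrative choices, and only two lockdown dates from the interventions table are used.

```python
from datetime import date, datetime, time, timedelta, timezone


def window_before(day, days=7):
    """Return (start, end) UTC datetimes covering the `days` days before `day`."""
    end = datetime.combine(day, time.min, tzinfo=timezone.utc)
    return end - timedelta(days=days), end


# Lockdown dates copied from the interventions table above.
lockdowns = {'fr': date(2020, 3, 17), 'it': date(2020, 3, 11)}

windows = {lang: window_before(day) for lang, day in lockdowns.items()}
for lang, (start, end) in windows.items():
    print(lang, start.isoformat(), end.isoformat())
```

Countries with a missing date (for example `NaT` for the Dutch lockdown) would need to be skipped before building the windows.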

For each tweet (or group of tweets) by a user, we would like to identify the topics it relates to, find the most commonly used words, and check its sentiment. All of this can help us understand how communication changed during the pandemic and how it affected people.
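As a starting point for the word-frequency part, the most common words could be counted with the standard library alone. The sketch below is a simplified illustration: `top_words` is a hypothetical helper, the sample tweets and the stopword set are made up, and a real analysis would need per-language stopword lists. Topic detection and sentiment analysis are not covered here.

```python
import re
from collections import Counter


def top_words(tweets, n=3, stopwords=frozenset()):
    """Return the `n` most common words across `tweets`, ignoring URLs and mentions."""
    words = []
    for text in tweets:
        # Drop links and @mentions, then keep runs of letters (any alphabet).
        cleaned = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        words.extend(w for w in re.findall(r"[^\W\d_]+", cleaned)
                     if w not in stopwords)
    return Counter(words).most_common(n)


# Made-up sample tweets, for illustration only.
sample = [
    "Stay home, stay safe!",
    "Please stay home https://example.com",
    "Safe travels",
]
top = top_words(sample, n=2, stopwords={"please"})
print(top)
```

The letter-only pattern also matches Cyrillic text, which matters for the Serbian accounts listed above.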