# Motivation

**Goal of the project**

The idea of the project was to carry out an exploratory analysis of two topics: COVID-19 and the European economy. Doing so should give a better understanding of the situation in European countries and of how strongly they have been affected by the virus and the quarantine. The goal is to show the reader these effects, good or bad, and possibly to indicate how they might affect the future.

**Data sets**

To achieve this, multiple datasets were needed. First, two datasets were taken from the Johns Hopkins University COVID-19 [data set](https://github.com/CSSEGISandData/COVID-19): one containing the number of confirmed cases in each country and one containing the total deaths. The data on Europe's economy was taken from the International Monetary Fund [website](https://www.imf.org/en/Publications/SPROLLS/world-economic-outlook-databases#sort=%40imfdate%20descending). More specifically, two regional datasets were used, one for the Euro zone and one for the emerging and developing economies in Europe, supplemented by a separate download for additional countries such as the UK, Sweden, and Denmark.

These specific datasets were chosen solely because of the credibility of their publishers.

In [1]:
#All the libraries needed throughout the exercises
import numpy as np 
import pandas as pd 
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go

%matplotlib inline
sns.set() # Set seaborn as default

# Basic stats

**Covid-19**
 
For the COVID-19 datasets, each day's number of cases or deaths is stored in its own column. To make plotting easier, one of the first steps was to reshape the date columns into rows. As the exploratory analysis focuses on countries in Europe, all other countries were filtered out of the datasets. Because the datasets list some countries by region, the values were summed for each country and date. Finally, the two datasets (number of cases and number of deaths) were merged so that both measures could be visualized together. An additional world population dataset was used for a single plot (the time-lapse).
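
The preprocessing steps described above can be sketched on a small toy table. This is a minimal illustration, not the notebook's actual cells: the helper `to_long`, the short `europe` list, and all numbers are assumptions made up for the example.

```python
import pandas as pd

# Toy stand-ins for the two wide tables (placeholder values, not real figures).
cases_wide = pd.DataFrame({
    "Province/State": [None, "Faroe Islands", None],
    "Country/Region": ["Denmark", "Denmark", "Brazil"],
    "1/22/20": [1, 0, 5],
    "1/23/20": [3, 1, 8],
})
deaths_wide = pd.DataFrame({
    "Province/State": [None, "Faroe Islands", None],
    "Country/Region": ["Denmark", "Denmark", "Brazil"],
    "1/22/20": [0, 0, 1],
    "1/23/20": [1, 0, 2],
})
europe = ["Denmark"]  # hypothetical short list of countries to keep


def to_long(wide, value_name):
    """Keep European rows, turn date columns into rows, sum regions per country and date."""
    kept = wide[wide["Country/Region"].isin(europe)].drop(columns=["Province/State"])
    long = kept.melt(id_vars=["Country/Region"], var_name="Date", value_name=value_name)
    long = long.rename(columns={"Country/Region": "Country"})
    return long.groupby(["Country", "Date"], as_index=False)[value_name].sum()


# Merge cases and deaths so that both measures sit in one row per country and date.
merged = to_long(cases_wide, "Cases").merge(
    to_long(deaths_wide, "Deaths"), on=["Country", "Date"], how="left"
)
print(merged)
```

Filtering before reshaping keeps the long table small, and summing after the reshape collapses the regional rows into one row per country and date.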

**International Monetary Fund**

Concerning the International Monetary Fund, all the data came from three different datasets: one for the Euro zone, one for Emerging and Developing Europe, and a third, downloaded separately, containing further countries that we wanted to include, such as the UK, Sweden, and Denmark.

After combining the three datasets, the rows containing NaN values or explanatory notes from the IMF were deleted. Several lists were then created, containing the years, the subjects, and the subjects of interest. Lastly, a function was written to plot the figures, along with a series of boolean masks for showing multiple subjects in one plot.
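
As a rough sketch of the combining and cleaning step, the example below concatenates three toy frames and drops the rows that carry no subject. It assumes that empty rows and IMF notes have no "Subject Descriptor" value; the frame names and all values are placeholders, not real IMF figures.

```python
import pandas as pd

# Toy frames standing in for the three IMF exports (placeholder values only).
euro = pd.DataFrame({
    "Country": ["Austria", "IMF footnote text"],
    "Subject Descriptor": ["Unemployment rate", None],
    "2019": ["1.0", None],
})
emerging = pd.DataFrame({
    "Country": ["Poland", None],
    "Subject Descriptor": ["Unemployment rate", None],
    "2019": ["2.0", None],
})
others = pd.DataFrame({
    "Country": ["Sweden"],
    "Subject Descriptor": ["Unemployment rate"],
    "2019": ["3.0"],
})

combined = pd.concat([euro, emerging, others], ignore_index=True)

# Assumption: empty rows and footnotes have no "Subject Descriptor".
cleaned = combined.dropna(subset=["Subject Descriptor"]).reset_index(drop=True)
cleaned["2019"] = pd.to_numeric(cleaned["2019"], errors="coerce")

# The country list is derived from the cleaned data, so no positional removal is needed.
country_list = sorted(cleaned["Country"].unique())
print(country_list)
```

Dropping rows by a missing column value avoids depending on the position of the note rows, which can shift whenever the downloaded files change.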


In [2]:
df_cases = pd.read_csv("time_series_covid19_confirmed_global.csv", engine = 'python')
df_deaths = pd.read_csv("time_series_covid19_deaths_global.csv", engine = 'python')

df_cases = df_cases.drop(['Province/State','Lat','Long'], axis=1)

In [3]:
contries_europe = ['Austria', 'Belgium', 'Cyprus', 'Estonia', 'Finland', 'France',
       'Germany', 'Greece', 'Ireland', 'Italy', 'Latvia', 'Lithuania',
       'Luxembourg', 'Malta', 'Netherlands', 'Portugal',
       'Slovakia', 'Slovenia', 'Spain',
       'Albania', 'Belarus', 'Bosnia and Herzegovina', 'Bulgaria',
       'Croatia', 'Hungary', 'Moldova', 'Montenegro',
       'North Macedonia', 'Poland', 'Romania', 'Russia', 'Serbia',
       'Turkey', 'Ukraine', 'Denmark', 'United Kingdom', 'Switzerland', 'Norway', 'Sweden', 'Iceland', 'Czechia', 'Kosovo']
filt = ~df_cases['Country/Region'].isin(contries_europe)

In [4]:
# Get indexes of the rows for countries outside Europe
indexesNotEurope = df_cases[filt].index
# Delete these row indexes from dataFrame
df_cases.drop(indexesNotEurope , inplace=True)

In [5]:
df_cases_melted = df_cases.melt(id_vars=['Country/Region'], var_name='Date')

In [6]:
grouped_cv19 = df_cases_melted.groupby(['Country/Region', 'Date'])['value'].sum()
grouped_cv19 = grouped_cv19.reset_index()
grouped_cv19 = grouped_cv19.rename(columns = {'Country/Region':'Country', 'value':'Cases'})

In [7]:
df_deaths = df_deaths.drop(['Province/State','Lat','Long'], axis=1)

In [8]:
filt = ~df_deaths['Country/Region'].isin(contries_europe)
# Get indexes of the rows for countries outside Europe
indexesNotEurope = df_deaths[filt].index
# Delete these row indexes from dataFrame
df_deaths.drop(indexesNotEurope , inplace=True)

In [9]:
df_deaths_melted = df_deaths.melt(id_vars=['Country/Region'], var_name='Date')

In [10]:
grouped_deaths = df_deaths_melted.groupby(['Country/Region', 'Date'])['value'].sum()
grouped_deaths = grouped_deaths.reset_index()
grouped_deaths = grouped_deaths.rename(columns = {'Country/Region':'Country', 'value':'Deaths'})

In [11]:
df_merged = pd.merge(grouped_cv19, grouped_deaths,  how='left', left_on=['Country','Date'], right_on = ['Country','Date'])

In [12]:
from datetime import datetime, timedelta

df_merged['Date'] = pd.to_datetime(df_merged['Date'])
# Snapshot for a fixed date, so that the plots are reproducible.
# For yesterday's data instead, compare against (datetime.now() - timedelta(1)).strftime('%Y-%m-%d')
df_total = df_merged[(df_merged['Date'] == '2020-05-14')]

In [13]:
data1 = pd.read_csv("WEO_Data_Euro.csv", encoding = "ISO-8859-1")
data2 = pd.read_csv("WEO_Data_Poor.csv", encoding = "ISO-8859-1")
data3 = pd.read_csv("WEO_Data_others.csv", encoding = "ISO-8859-1")

In [14]:
frames = [data1, data2, data3]

In [15]:
bigData = pd.concat(frames)
bigData.to_csv("completeData.csv")

In [16]:
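countryList = list(bigData.Country.unique())
# Drop the two entries at position 19, which are not countries
# (assumed to be the NaN and IMF note rows mentioned above)
countryList.pop(19)
countryList.pop(19)
countryList.sort()
years = list(bigData.columns[5:27]) # year columns
subjects = list(bigData["Subject Descriptor"].unique())
subjects = subjects[0:8]
# Subjects shown in the interactive plot (GDP, inflation, unemployment, per the button labels)
interestedsubjects = [subjects[2],subjects[3],subjects[5]]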

In [17]:
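#function to generate figures
def New_figure(country, subject, years):
    """Add one line trace for `country` and `subject` to the global figure `fig`.

    Traces start hidden ("legendonly") and can be toggled from the legend.
    """
    visible = "legendonly"
    fig.add_trace(
    go.Scatter(name=country,
               x=years,
               # first row matching the country and subject, restricted to the year columns
               y=list(bigData[(bigData["Country"] == country) & (bigData["Subject Descriptor"].str.contains(subject, regex=False))].iloc[0][5:27]),
               visible=visible))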
    
#create masks for subject visibility in the plot
# Traces are added subject by subject (one trace per country), so each mask
# is True only for the block of traces that belongs to its subject
n = len(countryList)
mask1 = [True] * n + [False] * n + [False] * n
mask2 = [False] * n + [True] * n + [False] * n
mask3 = [False] * n + [False] * n + [True] * n

In [18]:
df_imf = pd.read_csv("completeData.csv", engine = 'python', thousands=',')
df_imf = df_imf.drop(['Units', 'Scale','Country/Series-specific Notes', 'Unnamed: 0', 'Estimates Start After'], axis=1)

df_imf = df_imf[['Country','Subject Descriptor','2019']]
df_imf = df_imf.dropna()

pivoted = df_imf.pivot(index='Country', columns='Subject Descriptor', values='2019').reset_index()
df = pivoted

The first basic plot shows the number of cases per country in descending order, which gives an overview of the current situation of the coronavirus.

Another basic plot shows the unemployment rate of each country for the year 2019, based on the International Monetary Fund dataset.

In [21]:
import plotly.express as px

data_conf = df_total.sort_values(by=['Cases'], ascending=False)
fig = px.bar(data_conf, x='Country', y='Cases')
fig.show()

In [22]:
fig = px.bar(df, x='Country', y='Unemployment rate')
fig.show()

# Data Analysis

In [23]:
import plotly.graph_objects as go

fig = go.Figure(data=go.Choropleth(
    locationmode = "country names",
    locations=df_total['Country'], # Spatial coordinates
    z = df_total['Cases'], # Data to be color-coded
    colorscale = 'Reds',
    colorbar_title = "Cases",
))

fig.update_layout(
    title_text = 'Number of confirmed cases in Europe',
    geo_scope='europe', # limit map scope to Europe
    width=900,
    height=700,
)

fig.show()

In [24]:
fig = go.Figure(data=go.Choropleth(
    locationmode = "country names",
    locations=df_total['Country'], # Spatial coordinates
    z = df_total['Deaths'], # Data to be color-coded
    colorscale = 'Reds',
    colorbar_title = "Deaths",
))

fig.update_layout(
    title_text = 'Number of deaths in Europe',
    geo_scope='europe', # limit map scope to Europe
    width=800,
    height=600,
)

fig.show()

In [25]:
df_px = pd.read_csv("pop_worldometer_data.csv")

In [26]:
df_merged['Date'] = pd.to_datetime(df_merged['Date'])
# Series.dt.week is deprecated; isocalendar().week is the replacement
df_merged['Week_Number'] = df_merged['Date'].dt.isocalendar().week
#df_merged = df_merged.sort_values(by=['Week_Number'])
df_merged['DayNumber'] = df_merged['Date'].dt.strftime('%j')
df_merged = df_merged.sort_values(by=['DayNumber'])



In [27]:
df_px = df_px.rename(columns = {'Country (or dependency)':'Country'})
df_pop = pd.merge(df_merged, df_px, on='Country')
#df_pop = pd.merge(df_merged, df_px, left_on='Country', right_on='Country')


In [28]:
contries_emerging = ['Albania', 'Belarus', 'Bosnia and Herzegovina', 'Bulgaria',
       'Croatia', 'Hungary', 'Kosovo', 'Moldova', 'Montenegro',
       'North Macedonia', 'Poland', 'Romania', 'Russia', 'Serbia',
       'Turkey', 'Ukraine']
filt_yes = df_pop['Country'].isin(contries_emerging)
filt_not = ~df_pop['Country'].isin(contries_emerging)

import numpy as np 

# Label each row as "Emerging" (country in the list above) or "Developed" (all others)
df_pop['Em/Dev'] = np.where(filt_yes, "Emerging", "Developed")

In [29]:
df_sliced = df_pop[(df_pop['Date'] > '2020-03-01')] # As the virus started spreading in Europe in the beginning of March

In [30]:
px.scatter(df_sliced, x="Cases", y="Deaths", animation_frame="DayNumber", animation_group="Country",
           size="Population (2020)", color="Em/Dev", hover_name="Country",
           log_x=False, size_max=30, range_x=[0,300000], range_y=[0,40000])

In [31]:
import plotly.graph_objects as go

fig = go.Figure(data=go.Choropleth(
    locationmode = "country names",
    locations=df['Country'], # Spatial coordinates
    z = df['Gross domestic product per capita, constant prices'], # Data to be color-coded
    colorscale = 'Blues',
    colorbar_title = "GDP",
))

fig.update_layout(
    title_text = 'GDP per capita, constant prices',
    geo_scope='europe', # limit map scope to Europe
    width=900,
    height=700,
)

fig.show()

In [32]:
# Create figure
fig = go.Figure()

visibility = []
for j in interestedsubjects:
    for i in countryList:
        New_figure(i,j,years)


# Set title
fig.update_layout(
    title_text=interestedsubjects[0],
    height=600,
    hovermode = 'x'
    )

# Add range slider
fig.update_layout(
    xaxis=dict(
        rangeselector=dict(),
        rangeslider=dict(
            visible=True),
        type="date"
    )
)
# add buttons for subjects
fig.update_layout(
    updatemenus=[
        dict(
            type="buttons",
            direction="right",
            active=0,
            x=0.226,
            y=1.13,
            buttons=list([
                dict(label="GDP",
                     method="update",
                     args=[{"visible": mask1},
                           {"title": interestedsubjects[0]}]),
                dict(label="Inflation",
                     method="update",
                     args=[{"visible": mask2},
                           {"title": interestedsubjects[1]}]),
                dict(label="Unemployment",
                     method="update",
                     args=[{"visible": mask3},
                           {"title": interestedsubjects[2]}]),
            ]),
        )
    ])

fig.show()
print("          Click an entry in the legend to select/deselect a country, double click to isolate one entry")

          Click an entry in the legend to select/deselect a country, double click to isolate one entry


In [33]:
#plot for average all over europe for 2019,2020,2021
import plotly.graph_objects as go
import plotly.figure_factory as ff

bigData = bigData.dropna(subset=["Subject Descriptor"])

fig2 = go.Figure()


fig2.add_trace(
    go.Bar(name="2019",
           x=["2019"],
           y=[pd.to_numeric(bigData["2019"][bigData["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean()],
           text = [pd.to_numeric(bigData["2019"][bigData["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean()],
           textposition='outside'
          )
)

fig2.add_trace(
    go.Bar(name="2020",
           x=["2020"],
           y=[pd.to_numeric(bigData["2020"][bigData["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean()],
           text = [pd.to_numeric(bigData["2020"][bigData["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean()],
           textposition='outside'
          )
)

fig2.add_trace(
    go.Bar(name="2021",
           x=["2021"],
           y=[pd.to_numeric(bigData["2021"][bigData["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean()],
           text = [pd.to_numeric(bigData["2021"][bigData["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean()],
           textposition='outside'
          )
)

fig2.update_traces(texttemplate='%{text:.4s}', textposition='outside')
fig2.update_layout(uniformtext_minsize=15, uniformtext_mode='hide')

fig2.update_layout(
    title_text="GDP per capita average over Europe, percent change",
    width= 500,
    height=450,
    hovermode = 'x',
    margin=dict(l=20, r=20, t=30, b=20)
    )


fig2.show()

In [34]:
#plot of the averages for the Euro zone and for emerging and developing Europe for 2019,2020,2021
import plotly.graph_objects as go
import plotly.figure_factory as ff

data1 = data1.dropna(subset=["Subject Descriptor"])
data2 = data2.dropna(subset=["Subject Descriptor"])
fig3 = go.Figure()


fig3.add_trace(
    go.Bar(name="Euro Zone",
           x=["2019","2020","2021"],
           y=[pd.to_numeric(data1["2019"][data1["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean(),
             pd.to_numeric(data1["2020"][data1["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean(),
             pd.to_numeric(data1["2021"][data1["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean()],
           
           text = [pd.to_numeric(data1["2019"][data1["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean(),
             pd.to_numeric(data1["2020"][data1["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean(),
             pd.to_numeric(data1["2021"][data1["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean()],
           textposition='outside'
          )
)

fig3.add_trace(
    go.Bar(name="Emerging and developing Europe",
           x=["2019","2020","2021"],
           y=[pd.to_numeric(data2["2019"][data2["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean(),
             pd.to_numeric(data2["2020"][data2["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean(),
             pd.to_numeric(data2["2021"][data2["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean()],
           
           text = [pd.to_numeric(data2["2019"][data2["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean(),
             pd.to_numeric(data2["2020"][data2["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean(),
             pd.to_numeric(data2["2021"][data2["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean()],
           textposition='outside'
          )
)

fig3.update_traces(texttemplate='%{text:.4s}', textposition='outside')
fig3.update_layout(uniformtext_minsize=15, uniformtext_mode='hide')

fig3.update_layout(
    title_text="GDP per capita, percent change",
    width= 800,
    height=500,
    hovermode = 'x',
    margin=dict(l=20, r=20, t=30, b=20)
    )


fig3.show()

In [35]:
#plot comparing the average over Europe around the 2008 fall and the 2020 fall
import plotly.graph_objects as go
import plotly.figure_factory as ff

bigData = bigData.dropna(subset=["Subject Descriptor"])

fig4 = go.Figure()



fig4.add_trace(
    go.Bar(name="2008 fall",
           x=["2008","2009","2010"], # labels match the year columns used for y and text
           y=[pd.to_numeric(bigData["2008"][bigData["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean(),
             pd.to_numeric(bigData["2009"][bigData["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean(),
             pd.to_numeric(bigData["2010"][bigData["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean()],
           
           text = [pd.to_numeric(bigData["2008"][bigData["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean(),
             pd.to_numeric(bigData["2009"][bigData["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean(),
             pd.to_numeric(bigData["2010"][bigData["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean()],
           textposition='outside'
          )
)

fig4.add_trace(
    go.Bar(name="2020 Fall",
           x=["2019","2020","2021"],
           y=[pd.to_numeric(bigData["2019"][bigData["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean(),
             pd.to_numeric(bigData["2020"][bigData["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean(),
             pd.to_numeric(bigData["2021"][bigData["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean()],
           
           text = [pd.to_numeric(bigData["2019"][bigData["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean(),
             pd.to_numeric(bigData["2020"][bigData["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean(),
             pd.to_numeric(bigData["2021"][bigData["Subject Descriptor"].str.contains("domestic product per capita", regex=False)], errors='coerce').mean()],
           textposition='outside'
          )
)

fig4.update_traces(texttemplate='%{text:.4s}', textposition='outside')
fig4.update_layout(uniformtext_minsize=15, uniformtext_mode='hide')

fig4.update_layout(
    title_text="GDP per capita average over Europe, percent change",
    width= 1200,
    height=500,
    hovermode = 'x',
    margin=dict(l=20, r=20, t=30, b=20)
    )


fig4.show()
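
The three bar-chart cells above repeat the same expression for every year and dataset. One possible way to shorten them is a small helper, sketched below. The function name `subject_mean` and the toy frame are hypothetical and the values are placeholders, not real IMF figures.

```python
import pandas as pd


def subject_mean(data, subject, year):
    """Mean of one year column over all rows whose subject contains `subject`."""
    rows = data["Subject Descriptor"].str.contains(subject, regex=False, na=False)
    return pd.to_numeric(data.loc[rows, year], errors="coerce").mean()


# Toy frame with placeholder values.
toy = pd.DataFrame({
    "Country": ["A", "A", "B"],
    "Subject Descriptor": [
        "Gross domestic product per capita, constant prices",
        "Unemployment rate",
        "Gross domestic product per capita, constant prices",
    ],
    "2019": ["1.0", "5.0", "3.0"],
    "2020": ["-2.0", "6.0", "n/a"],
})

# One value per year, ready to be passed as y and text to a bar trace.
means = {year: subject_mean(toy, "domestic product per capita", year)
         for year in ["2019", "2020"]}
print(means)
```

Because `na=False` treats rows without a subject as non-matching, the helper would not need a preceding `dropna` call, and non-numeric entries such as "n/a" are ignored by the mean.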

# Genre

Regarding genre, for the narrative structure we chose a linear ordering, so that the user can go through the explanations, data, and plots in a sequence that makes sense and is easy to follow, since several of the plots are related to one another.

For interactivity, we included several mechanisms, leaving as much control as possible to the user. With hover details, filtering in some of the plots, and buttons for selecting different types of data, the user can either read through the material we supplied or interact with the graphs, explore different types of data, and find what interests them. This comes naturally when looking at individual countries, as each user will be most curious about their own. That choice cannot be made in advance: showing every available country by default would result in either a very long read or an unintelligible plot.

For messaging, we provide both introductory text for each plot and captions to make everything clearer.

# Visualization



To visualize the data, we chose multiple methods, depending on which was most appropriate for each type of data, and to retain the user's focus by introducing new plot types over time.
First, for country-level data, we opted for maps on which the user can hover to see more data about each country. Then, to visualize GDP over the years, we chose line plots in which the user can select the time frame, the countries of interest, and the subject of the data.

# Discussion

In conclusion, some interesting facts were found in the data, both in the COVID-19 datasets and in the IMF economic datasets. Visualizing some of them worked out even better than expected, but more analysis could still have been done, along with the addition of more plots.

More specifically, the analysis could benefit from a better visualization of the direct link between the COVID-19 cases, the deaths, and the loss in GDP, as this was not shown clearly in the project.
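
As a starting point for such a link, the two sources could be joined per country and compared on a common scale. The sketch below is only an illustration under stated assumptions: the column names `Population` and `gdp_change_2020` are hypothetical, the countries are fictional, and every value is a placeholder.

```python
import pandas as pd

# Placeholder values for fictional countries (not real data).
covid = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Deaths": [100, 400, 900],
    "Population": [1_000_000, 2_000_000, 3_000_000],
})
economy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "gdp_change_2020": [-1.0, -2.0, -3.0],  # percent change, hypothetical column
})

# Join the two sources on the country name.
linked = covid.merge(economy, on="Country")

# Normalize deaths by population so that countries of different size are comparable.
linked["deaths_per_100k"] = linked["Deaths"] * 100_000 / linked["Population"]

# Pearson correlation between the normalized deaths and the GDP change.
corr = linked["deaths_per_100k"].corr(linked["gdp_change_2020"])
print(linked[["Country", "deaths_per_100k", "gdp_change_2020"]])
print(corr)
```

The joined table could then be passed to a scatter plot with one point per country, in the same style as the other plots in the notebook.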

## Contributions

Even though the analysis and data exploration were done by both authors, Ilian worked mostly on the COVID-19 data (analyzing, preprocessing, and visualizing it), while Darius worked mostly on the economy datasets.