<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Imports" data-toc-modified-id="Imports-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Imports</a></span></li><li><span><a href="#Data-loading-and-processing" data-toc-modified-id="Data-loading-and-processing-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Data loading and processing</a></span></li><li><span><a href="#Basic-information" data-toc-modified-id="Basic-information-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Basic information</a></span></li><li><span><a href="#Freedom-of-press-across-the-world" data-toc-modified-id="Freedom-of-press-across-the-world-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Freedom of press across the world</a></span><ul class="toc-item"><li><span><a href="#Freedom-of-press-by-continent" data-toc-modified-id="Freedom-of-press-by-continent-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Freedom of press by continent</a></span></li><li><span><a href="#Change-across-the-years" data-toc-modified-id="Change-across-the-years-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Change across the years</a></span></li></ul></li><li><span><a href="#Government-forms-and-freedom-of-press" data-toc-modified-id="Government-forms-and-freedom-of-press-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Government forms and freedom of press</a></span><ul class="toc-item"><li><span><a href="#Governments-of-the-world" data-toc-modified-id="Governments-of-the-world-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Governments of the world</a></span></li><li><span><a href="#Freedom-of-press-in-democratic-and-autocratic-regimes-(TODO:-Change-header-text)" data-toc-modified-id="Freedom-of-press-in-democratic-and-autocratic-regimes-(TODO:-Change-header-text)-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Freedom of press in democratic and autocratic regimes (TODO: Change header text)</a></span></li></ul></li></ul></div>

## Imports

In [24]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sqlite3
import plotly.express as px
import plotly
import math
import pycountry_convert

from pycountry import countries
from IPython.display import HTML, display
from plotly.subplots import make_subplots

## Data loading and processing

In [2]:
reign=pd.read_csv("REIGN_2021_6.csv")
press_index=pd.read_csv("World Press Index 2021.csv")
continents=pd.read_csv("country-and-continent-codes.csv")
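One pitfall when reading these files: by default pandas treats the string `NA` as a missing value, and `NA` is also Namibia's ISO 3166-1 alpha-2 code. The minimal sketch below uses a made-up two-row CSV in the shape of the continents file, not the real file. It shows the default behaviour, and how `keep_default_na=False` together with `na_values=['']` keeps the code while still treating empty cells as missing.

```python
import io

import pandas as pd

# Hypothetical two-row extract in the same shape as the continents CSV
csv_text = "Continent_Name,Two_Letter_Country_Code\nAfrica,NA\nEurope,NO\n"

# Default parsing: "NA" is one of pandas' built-in missing-value markers
default = pd.read_csv(io.StringIO(csv_text))
print(default["Two_Letter_Country_Code"].isna().tolist())  # [True, False]

# Switch off the built-in markers but keep treating empty cells as missing
strict = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])
print(strict["Two_Letter_Country_Code"].tolist())  # ['NA', 'NO']
```

Reading the continents file this way would keep Namibia's code intact, assuming no other column in that file relies on `NA` meaning "missing".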

In [3]:
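def lookup_country(name):
    """Return pycountry's best fuzzy match for a country name, or None if nothing matches."""
    try:
        return countries.search_fuzzy(name)[0]
    except LookupError:
        return None

def add_alpha_2(name):
    """ISO 3166-1 alpha-2 code (e.g. 'NO') for a country name, or None."""
    country=lookup_country(name)
    return country.alpha_2 if country is not None else None

def add_alpha_3(name):
    """ISO 3166-1 alpha-3 code (e.g. 'NOR') for a country name, or None."""
    country=lookup_country(name)
    return country.alpha_3 if country is not None else None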

In [4]:

### Get only latest relevant info from REIGN dataset

# Keep the most recent row per country without growing a DataFrame row by row
# (DataFrame.append no longer exists in pandas 2.x). The stable mergesort keeps rows
# from the same year in file order, so tail(1) takes the last entry of the latest year,
# assuming the file lists each country's rows chronologically.
reign_latest=(reign.sort_values(by="year",kind="mergesort")
              .groupby("country")
              .tail(1))
reign_latest=reign_latest.reset_index(drop=True)[['ccode','country','leader','year','government','gov_democracy']]
reign_latest.head()

### Add country codes

#### REIGN

# Drop countries that no longer exist
reign_latest=reign_latest.query("country not in ['Czechoslovakia','Germany East','Soviet Union','Vietnam South','Yemen South','Yugoslavia']").copy()
# Replace REIGN's short names with official names so the pycountry lookup matches them
map_dict={'Cape Verde':'Cabo Verde',
          'Cen African Rep':'Central African Republic',
          'Congo-Brz':'Congo',
          'Congo/Zaire':'Congo, The Democratic Republic of the',
          'East Timor':'Timor-Leste',
          'Guinea Bissau':'Guinea-Bissau',
          'Ivory Coast':'Côte d\'Ivoire',
          'Korea North':'Korea, Democratic People\'s Republic of',
          'Korea South':'Korea, Republic of',
          'Laos':'Lao People\'s Democratic Republic',
          'St Kitts and Nevis':'Saint Kitts and Nevis',
          'St Lucia':'Saint Lucia',
          'St Vincent':'Saint Vincent and the Grenadines',
          'Swaziland':'Eswatini',
          'UKG':'United Kingdom'}
reign_latest['country']=reign_latest['country'].map(map_dict).fillna(reign_latest['country'])
# Add ISO country codes
reign_latest['alpha_2']=reign_latest['country'].apply(add_alpha_2)
reign_latest['alpha_3']=reign_latest['country'].apply(add_alpha_3)

# The fuzzy search returns the wrong country for Niger, so set its codes by hand
reign_latest.loc[reign_latest['country']=='Niger','alpha_2']='NE'
reign_latest.loc[reign_latest['country']=='Niger','alpha_3']='NER'

#### press_index

press_clean=press_index.copy()

# Drop territories that have no counterpart in the REIGN dataset
press_clean=press_clean.query("`Country Name` not in ['Northern Cyprus','Hong Kong S.A.R.','Somaliland']").copy()

# Replace countries with official names so the pycountry lookup matches them
map_dict={'South Korea':'Korea, Republic of',
          'Ivory Coast':'Côte d\'Ivoire',
          'East Timor':'Timor-Leste',
          'Guinea Bissau':'Guinea-Bissau',
          'Republic of Congo':'Congo',
          'Democratic Republic of the Congo':'Congo, The Democratic Republic of the',
          'Laos':'Lao People\'s Democratic Republic',
          'North Korea':'Korea, Democratic People\'s Republic of'}
press_clean['Country Name']=press_clean['Country Name'].map(map_dict).fillna(press_clean['Country Name'])

# OECS is a special case: the index gives a single score for the Organisation of Eastern
# Caribbean States, so copy that row once per member state and drop the aggregate row
oecs_countries=['Antigua and Barbuda','Dominica','Grenada','Saint Kitts and Nevis','Saint Lucia','Saint Vincent and the Grenadines']
oecs_row=press_clean[press_clean['Country Name']=='OECS']
oecs_members=[oecs_row.assign(**{'Country Name':country}) for country in oecs_countries]
press_clean=pd.concat([press_clean,*oecs_members],ignore_index=True)
press_clean=press_clean.query("`Country Name` not in ['OECS']").copy()

# Add ISO country codes
press_clean['alpha_2']=press_clean['Country Name'].apply(add_alpha_2)
press_clean['alpha_3']=press_clean['Country Name'].apply(add_alpha_3)
# The fuzzy search returns the wrong country for Niger, so set its codes by hand
press_clean.loc[press_clean['Country Name']=='Niger','alpha_2']='NE'
press_clean.loc[press_clean['Country Name']=='Niger','alpha_3']='NER'

### Combine dataframes

continents=continents.rename(columns={'Two_Letter_Country_Code':'alpha_2'})
continents=continents[['Continent_Name','alpha_2']]
continents.head()

combined=pd.merge(press_clean,reign_latest,how="left", on=["alpha_2","alpha_3"])

# Add continents
combined=pd.merge(combined,continents,how="left", on=["alpha_2"])
# Namibia's ISO code is "NA", which read_csv parses as a missing value by default, so
# Namibia gets no continent from the merge; set it by country code, not by row position
combined.loc[combined['alpha_3']=='NAM','Continent_Name']='Africa'

# Assign Samoa and Tonga to Asia to stay consistent with the map GeoJSONs
combined.loc[combined['country']=='Samoa','Continent_Name']='Asia'
combined.loc[combined['country']=='Tonga','Continent_Name']='Asia'

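# Change in Global Score from 2019 to 2021. Scores run from 0 (best) to 100 (worst),
# so a negative change means press freedom improved.
combined_with_change=combined.copy()
combined_with_change['Change']=combined_with_change['Global Score 2021']-combined_with_change['Global Score 2019']
combined_with_change.head()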

| | Country Name | Abuse Score 2021 | Underlying Situation Score 2021 | Global Score 2021 | Global Score 2020 | Global Score 2019 | alpha_2 | alpha_3 | ccode | country | leader | year | government | gov_democracy | Continent_Name | Change |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Norway | 0.0 | 6.72 | 6.72 | 7.84 | 7.82 | NO | NOR | 385.0 | Norway | Solberg | 2021.0 | Parliamentary Democracy | 1.0 | Europe | -1.1 |
| 1 | Finland | 0.0 | 6.99 | 6.99 | 7.93 | 7.9 | FI | FIN | 375.0 | Finland | Sanna Marin | 2021.0 | Parliamentary Democracy | 1.0 | Europe | -0.91 |
| 2 | Sweden | 0.0 | 7.24 | 7.24 | 9.25 | 8.31 | SE | SWE | 380.0 | Sweden | Lofven | 2021.0 | Parliamentary Democracy | 1.0 | Europe | -1.07 |
| 3 | Denmark | 0.0 | 8.57 | 8.57 | 8.13 | 9.87 | DK | DNK | 390.0 | Denmark | Mette Frederiksen | 2021.0 | Parliamentary Democracy | 1.0 | Europe | -1.3 |
| 4 | Costa Rica | 10.99 | 8.21 | 8.76 | 10.53 | 12.24 | CR | CRI | 94.0 | Costa Rica | Carlos Alvarado Quesada | 2021.0 | Presidential Democracy | 1.0 | North America | -3.48 |
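The REIGN step keeps only the most recent row per country. The minimal sketch below shows that selection on toy data with hypothetical countries, leaders and dates. It sorts chronologically, then uses `groupby(...).tail(1)` to keep the last row of each group in one vectorized step. The toy frame has a `month` column to break ties within a year; whether the real REIGN file has such a column, and what it is called, should be checked before relying on it.

```python
import pandas as pd

# Toy stand-in for a leader panel like REIGN: several rows per country (hypothetical values)
reign = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "year":    [2020, 2021, 2021, 2019, 2020],
    "month":   [12, 1, 2, 6, 3],
    "leader":  ["x", "x", "y", "p", "q"],
})

# Sort chronologically, then keep the last row of each country
latest = (reign.sort_values(["year", "month"])
               .groupby("country")
               .tail(1)
               .sort_values("country")
               .reset_index(drop=True))
print(latest)
```

Country A ends with leader `y` (2021, month 2) and country B with leader `q` (2020, month 3).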

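Both merges in the cell above are left joins. A country that fails to match, or a key that appears twice in the right-hand frame, therefore passes silently. pandas can check for both. `validate='many_to_one'` raises `MergeError` if the right-hand keys are not unique, and `indicator=True` adds a `_merge` column recording which rows found a match. The sketch below uses hypothetical toy frames. Whether the real continents file repeats any codes (for example, countries listed under two continents) is worth checking this way; which row to keep is an analysis decision.

```python
import pandas as pd

# Hypothetical toy frames: "BB" is listed under two continents, "CC" has no continent
scores = pd.DataFrame({"alpha_2": ["AA", "BB", "CC"], "score": [10.0, 20.0, 30.0]})
continents = pd.DataFrame({"alpha_2": ["AA", "BB", "BB"],
                           "Continent_Name": ["Europe", "Europe", "Asia"]})

# validate="many_to_one" raises MergeError because "BB" repeats on the right-hand side
dup_error = None
try:
    pd.merge(scores, continents, how="left", on="alpha_2", validate="many_to_one")
except pd.errors.MergeError as err:
    dup_error = err
print("duplicate keys detected:", dup_error is not None)

# Keep one continent per code, then indicator=True flags rows that found no match
unique_continents = continents.drop_duplicates(subset="alpha_2", keep="first")
merged = pd.merge(scores, unique_continents, how="left", on="alpha_2",
                  validate="many_to_one", indicator=True)
print(merged.loc[merged["_merge"] == "left_only", "alpha_2"].tolist())  # ['CC']
```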

## Basic information


In [5]:
# Average 2021 Global Score across all countries
print(press_clean['Global Score 2021'].mean())

34.83251366120217


## Freedom of press across the world

In [6]:
HTML('''<div class="flourish-embed flourish-map" data-src="visualisation/6552777"><script src="https://public.flourish.studio/resources/embed.js"></script></div>''')

In [7]:
HTML("""<div class="flourish-embed flourish-bar-chart-race" data-src="visualisation/6581484"><script src="https://public.flourish.studio/resources/embed.js"></script></div>""")

TODO: What is abuse score again
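A hedged note on this TODO: in the five rows printed earlier, Global Score 2021 equals the larger of two values. The first is the Underlying Situation Score. The second is a blend of 80% Underlying Situation Score and 20% Abuse Score. For Costa Rica, 0.8 × 8.21 + 0.2 × 10.99 ≈ 8.77 against a reported 8.76, plausibly a rounding gap. This relationship is inferred from those rows only and should be checked against RSF's own methodology before it is stated as fact. A minimal check:

```python
def global_score(underlying: float, abuse: float) -> float:
    """Assumed relation between the RSF sub-scores, inferred from the sample rows only."""
    blended = 0.8 * underlying + 0.2 * abuse  # 80% underlying situation, 20% abuses
    return max(underlying, blended)           # abuses can raise the score, never lower it


# (country, abuse, underlying, reported global) copied from the combined_with_change.head() output
rows = [
    ("Norway", 0.0, 6.72, 6.72),
    ("Finland", 0.0, 6.99, 6.99),
    ("Sweden", 0.0, 7.24, 7.24),
    ("Denmark", 0.0, 8.57, 8.57),
    ("Costa Rica", 10.99, 8.21, 8.76),
]
for name, abuse, underlying, reported in rows:
    print(f"{name}: predicted {global_score(underlying, abuse):.2f}, reported {reported:.2f}")
```

If the relation holds across the full table, the abuse score only changes the global score when it exceeds the underlying situation score. A country with an abuse score of 0 keeps its underlying score.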

In [8]:
HTML('''<div class="flourish-embed flourish-map" data-src="visualisation/6554015"><script src="https://public.flourish.studio/resources/embed.js"></script></div>''')

### Freedom of press by continent

In [9]:
# Average 2021 Global Score by continent
print(combined.groupby('Continent_Name')['Global Score 2021'].mean())

Continent_Name
Africa           38.101887
Asia             45.635000
Europe           25.261200
North America    28.881905
Oceania          20.657500
South America    31.411667
Name: Global Score 2021, dtype: float64
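Continent averages hide how many countries each figure rests on. The sketch below adds a count and a median next to the mean using `groupby().agg`. For illustration only, the toy frame reuses the five rows printed earlier (names, continents and 2021 scores).

```python
import pandas as pd

# Five rows copied from the combined_with_change.head() output above
combined = pd.DataFrame({
    "Country Name": ["Norway", "Finland", "Sweden", "Denmark", "Costa Rica"],
    "Continent_Name": ["Europe", "Europe", "Europe", "Europe", "North America"],
    "Global Score 2021": [6.72, 6.99, 7.24, 8.57, 8.76],
})

# Count, mean and median per continent, freest (lowest mean) first
summary = (combined.groupby("Continent_Name")["Global Score 2021"]
                   .agg(["count", "mean", "median"])
                   .sort_values("mean"))
print(summary)
```

On the full `combined` frame, the `count` column shows how many countries feed each continent's average.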


In [69]:
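# Shared color scale for the continent bar charts: light (low score, freer press) to dark red (high score, less free)
color_scale=['#fff0ad',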
             '#ffc485',
             '#d13f3f',
             '#ad0034']

#### Asia

In [10]:
HTML('''<div class="flourish-embed flourish-map" data-src="visualisation/6554053" data-height="500px"><script src="https://public.flourish.studio/resources/embed.js"></script></div>''')

In [87]:
asia_df=combined.loc[combined['Continent_Name']=='Asia',:].sort_values(by='Global Score 2021',ascending=False)

# 5 highest scores (least free) followed by the 5 lowest (most free)
asia_df=pd.concat([asia_df.head(5),asia_df.tail(5)])
asia_df.loc[asia_df['Country Name']=='Korea, Democratic People\'s Republic of','Country Name']='North Korea'
asia_df.loc[asia_df['Country Name']=='Korea, Republic of','Country Name']='South Korea'

fig=px.bar(y=asia_df['Country Name'],
       x=asia_df['Global Score 2021'],
       orientation='h',
       text=asia_df['Global Score 2021'],
       color=asia_df['Global Score 2021'],
       color_continuous_scale=color_scale,
       range_color=[0,100])
fig.update_layout(
    title="Top and bottom 5 countries in Asia",
    xaxis_title="Global Score 2021",
    yaxis_title="Country",
    coloraxis_showscale=False
)
fig.add_hline(y=4.5, line_width=1, line_dash="dash")
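The same select-and-plot steps repeat for every continent. The sketch below defines a hypothetical helper, `top_bottom`, that returns the n highest- and n lowest-scoring countries of one continent. When a continent has 2n countries or fewer, it returns them all so that no row is duplicated. It runs on toy data only.

```python
import pandas as pd


def top_bottom(df: pd.DataFrame, continent: str, n: int = 5,
               score: str = "Global Score 2021") -> pd.DataFrame:
    """Return the n highest- and n lowest-scoring countries of one continent (hypothetical helper)."""
    subset = (df[df["Continent_Name"] == continent]
              .sort_values(by=score, ascending=False))
    if len(subset) <= 2 * n:
        # Too few countries for disjoint head/tail slices: return them all
        return subset
    return pd.concat([subset.head(n), subset.tail(n)])


# Toy data: 12 hypothetical countries in one continent with scores 10, 15, ..., 65
toy = pd.DataFrame({
    "Country Name": [f"Country {i}" for i in range(12)],
    "Continent_Name": ["Asia"] * 12,
    "Global Score 2021": [10.0 + 5 * i for i in range(12)],
})
picked = top_bottom(toy, "Asia", n=2)
print(picked[["Country Name", "Global Score 2021"]])
```

In the notebook, a call such as `df = top_bottom(combined, 'Africa')` could replace each continent's selection lines, with the `px.bar` call unchanged.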

#### Africa

In [11]:
HTML('''<div class="flourish-embed flourish-map" data-src="visualisation/6554077" data-height="500px"><script src="https://public.flourish.studio/resources/embed.js"></script></div>''')

In [88]:
df=combined.loc[combined['Continent_Name']=='Africa',:].sort_values(by='Global Score 2021',ascending=False)

# 5 highest scores (least free) followed by the 5 lowest (most free)
df=pd.concat([df.head(5),df.tail(5)])

fig=px.bar(y=df['Country Name'],
       x=df['Global Score 2021'],
       orientation='h',
       text=df['Global Score 2021'],
       color=df['Global Score 2021'],
       color_continuous_scale=color_scale,
       range_color=[0,100])
fig.update_layout(
    title="Top and bottom 5 countries in Africa",
    xaxis_title="Global Score 2021",
    yaxis_title="Country",
    coloraxis_showscale=False
)
fig.add_hline(y=4.5, line_width=1, line_dash="dash")

#### Europe

In [12]:
HTML('''<div class="flourish-embed flourish-map" data-src="visualisation/6554210" data-height="500px"><script src="https://public.flourish.studio/resources/embed.js"></script></div>''')

In [86]:
df=combined.loc[combined['Continent_Name']=='Europe',:].sort_values(by='Global Score 2021',ascending=False)

# 5 highest scores (least free) followed by the 5 lowest (most free)
df=pd.concat([df.head(5),df.tail(5)])

fig=px.bar(y=df['Country Name'],
       x=df['Global Score 2021'],
       orientation='h',
       text=df['Global Score 2021'],
       color=df['Global Score 2021'],
       color_continuous_scale=color_scale,
       range_color=[0,100])
fig.update_layout(
    title="Top and bottom 5 countries in Europe",
    xaxis_title="Global Score 2021",
    yaxis_title="Country",
    coloraxis_showscale=False
)
fig.add_hline(y=4.5, line_width=1, line_dash="dash")

#### North America

In [13]:
HTML('''<div class="flourish-embed flourish-map" data-src="visualisation/6554108" data-height="500px"><script src="https://public.flourish.studio/resources/embed.js"></script></div>''')

In [93]:
df=combined.loc[combined['Continent_Name']=='North America',:].sort_values(by='Global Score 2021',ascending=False)

# 5 highest scores (least free) followed by the 5 lowest (most free)
df=pd.concat([df.head(5),df.tail(5)])

fig=px.bar(y=df['Country Name'],
       x=df['Global Score 2021'],
       orientation='h',
       text=df['Global Score 2021'],
       color=df['Global Score 2021'],
       color_continuous_scale=color_scale,
       range_color=[0,100])
fig.update_layout(
    title="Top and bottom 5 countries in North America",
    xaxis_title="Global Score 2021",
    yaxis_title="Country",
    coloraxis_showscale=False
)
fig.add_hline(y=4.5, line_width=1, line_dash="dash")

#### South America

In [14]:
HTML('''<div class="flourish-embed flourish-map" data-src="visualisation/6554130" data-height="500px"><script src="https://public.flourish.studio/resources/embed.js"></script></div>''')

In [89]:
df=combined.loc[combined['Continent_Name']=='South America',:].sort_values(by='Global Score 2021',ascending=False)

# 5 highest scores (least free) followed by the 5 lowest (most free)
df=pd.concat([df.head(5),df.tail(5)])

fig=px.bar(y=df['Country Name'],
       x=df['Global Score 2021'],
       orientation='h',
       text=df['Global Score 2021'],
       color=df['Global Score 2021'],
       color_continuous_scale=color_scale,
       range_color=[0,100])
fig.update_layout(
    title="Top and bottom 5 countries in South America",
    xaxis_title="Global Score 2021",
    yaxis_title="Country",
    coloraxis_showscale=False
)
fig.add_hline(y=4.5, line_width=1, line_dash="dash")

#### Oceania

In [15]:
HTML('''<div class="flourish-embed flourish-map" data-src="visualisation/6554273" data-height="500px"><script src="https://public.flourish.studio/resources/embed.js"></script></div>''')

In [92]:
df=combined.loc[combined['Continent_Name']=='Oceania',:].sort_values(by='Global Score 2021',ascending=False)

# Oceania is plotted in full rather than as top and bottom 5

fig=px.bar(y=df['Country Name'],
       x=df['Global Score 2021'],
       orientation='h',
       text=df['Global Score 2021'],
       color=df['Global Score 2021'],
       color_continuous_scale=color_scale,
       range_color=[0,100])
fig.update_layout(
    title="Countries in Oceania",
    xaxis_title="Global Score 2021",
    yaxis_title="Country",
    coloraxis_showscale=False
)

TODO: Here put per-continent maps along with avg. scores per continent and best, worst countries per continent

### Change across the years

TODO: Here put the global change map and biggest +/- changes ranking

In [16]:
HTML('''<div class="flourish-embed flourish-map" data-src="visualisation/6554682"><script src="https://public.flourish.studio/resources/embed.js"></script></div>''')
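For the ranking of biggest changes mentioned in the TODO above, note the sign convention. A lower score means a freer press, so the most negative `Change` values are the largest improvements between 2019 and 2021, and the most positive are the largest declines. The sketch below applies `nsmallest`/`nlargest` to the five rows printed earlier. Every score in those rows fell, so `nlargest` only returns the smallest improvements there.

```python
import pandas as pd

# Rows copied from the combined_with_change.head() output above
df = pd.DataFrame({
    "Country Name": ["Norway", "Finland", "Sweden", "Denmark", "Costa Rica"],
    "Global Score 2021": [6.72, 6.99, 7.24, 8.57, 8.76],
    "Global Score 2019": [7.82, 7.90, 8.31, 9.87, 12.24],
})
df["Change"] = (df["Global Score 2021"] - df["Global Score 2019"]).round(2)

improved = df.nsmallest(2, "Change")  # most negative change = biggest improvement
declined = df.nlargest(2, "Change")   # most positive change = biggest decline
print(improved[["Country Name", "Change"]])
print(declined[["Country Name", "Change"]])
```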

## Government forms and freedom of press

TODO: What is democracy vs. autocracy, some cool sounding hypothesis

TODO: Pie chart of gov forms in the world

TODO: Avg. scores per regime
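For the pie chart and per-regime averages in the TODOs above, `value_counts` gives the slice sizes and `groupby` gives the mean score per government form. The sketch below uses hypothetical rows. The two democracy labels appear in the REIGN output above. 'Personal Dictatorship' is an assumed placeholder label, and all countries and scores are made up.

```python
import pandas as pd

# Hypothetical rows: country names and scores are made up for illustration
df = pd.DataFrame({
    "Country Name": ["P", "Q", "R", "S", "T"],
    "government": ["Parliamentary Democracy", "Parliamentary Democracy",
                   "Presidential Democracy", "Personal Dictatorship", "Personal Dictatorship"],
    "gov_democracy": [1.0, 1.0, 1.0, 0.0, 0.0],
    "Global Score 2021": [10.0, 20.0, 30.0, 60.0, 80.0],
})

# Slice sizes for a pie chart of government forms
gov_counts = df["government"].value_counts()
# Mean score per government form and per democracy flag (lower = freer press)
by_form = df.groupby("government")["Global Score 2021"].mean().sort_values()
by_flag = df.groupby("gov_democracy")["Global Score 2021"].mean()
print(gov_counts, by_form, by_flag, sep="\n\n")
```

`gov_counts` could then feed `px.pie(values=gov_counts.values, names=gov_counts.index)`.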

### Governments of the world

TODO: Map of govs (try to format nicely with choropleth so that democratic countries are different shades of one color and dictatorships are different shades of another)

In [17]:
HTML('''<div class="flourish-embed flourish-map" data-src="visualisation/6554762"><script src="https://public.flourish.studio/resources/embed.js"></script></div>''')

### Freedom of press in democratic and autocratic regimes (TODO: Change header text)

TODO: best and worst democratic and autocratic countries
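For the best and worst countries within each regime type, one approach is to sort once and take `head`/`tail` per group. The sketch below uses hypothetical rows grouped by the `gov_democracy` flag. Democracies carry 1.0 in the output above; 0.0 for other regimes is an assumption here.

```python
import pandas as pd

# Hypothetical rows; gov_democracy mirrors the REIGN column (1.0 = democracy)
df = pd.DataFrame({
    "Country Name": ["P", "Q", "R", "S", "T", "U"],
    "gov_democracy": [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    "Global Score 2021": [10.0, 20.0, 30.0, 50.0, 60.0, 80.0],
})

ranked = df.sort_values("Global Score 2021")  # freest first
grouped = ranked.groupby("gov_democracy")
best = grouped.head(1)   # lowest score in each regime group
worst = grouped.tail(1)  # highest score in each regime group
print(best, worst, sep="\n\n")
```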

TODO: Cards of top and bottom countries with leader photos and flags
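For the flags on the cards, one lightweight option (an assumption, not something the notebook uses yet) is to build emoji flags directly from the `alpha_2` codes. Each letter maps to a Unicode regional indicator symbol, and most platforms render a pair of them as the country's flag. The helper below is hypothetical:

```python
def flag_emoji(alpha_2: str) -> str:
    """Turn an ISO 3166-1 alpha-2 code such as 'NO' into a flag emoji (hypothetical helper)."""
    code = alpha_2.upper()
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError(f"expected a two-letter code, got {alpha_2!r}")
    # Regional indicator symbols A..Z occupy U+1F1E6..U+1F1FF
    return "".join(chr(0x1F1E6 + ord(ch) - ord("A")) for ch in code)


print(flag_emoji("NO"), flag_emoji("fi"), flag_emoji("CR"))
```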