# Global Internet accessibility EDA

## How much of the world has access to the Internet?
In this project I'll explore internet accessibility at a global scale.
<br>
<br>
Guidelines:

1. What share of people are online?
* Share of the population using the internet by country
* Share of the population using the internet by income group
* Share of the population using the internet by region
2. How many Internet users does each country and region have?
* Total number of people using the internet by country
* Total number of people using the internet by region
* Top 20 countries with the highest internet use by population share
* Top 10 countries with the highest internet use by population share over time (2000 to present)
<br>
<br>
<br>
<br>
The following visualizations illustrate the internet access of the population globally.  
Generally, in developed nations, more than two-thirds of the population are connected to the internet.  
In underdeveloped countries, the usage rate is lower, but it is growing at a steady pace.
<br>
<br>
The global access to the internet has grown rapidly since 1989, when the World Wide Web was created.  
In 1990, only 0.5% of the world population had access to the internet. However, by 2000, nearly half of the population in the US was using the internet.  
Meanwhile, most of the world still had limited access, with 93% in the East Asia and Pacific region and 99% in South Asia and Sub-Saharan Africa being offline.  
By 2016, 76% of people in the US were online, and many other countries had also caught up, with Iceland having the highest percentage of 98% of the population online.  
However, there are still many countries where little to no progress has been made, with fewer than 5% of people in very poor countries having access to the internet.  
The trend globally is that more people are getting online every year, with half the world population having internet access in 2017.  
It is important to note that as of 2018, close to half of the world population still does not have access to the internet.
<br>
<br>
<br>
<br>
* Note:  
Internet users are individuals who have used the Internet (from any location) in the last 3 months.  
The Internet can be used via a computer, mobile phone, personal digital assistant, games machine, digital TV etc.


# Setup

In [75]:
# Installs
# %pip install geopandas
# %pip install folium  
# %pip install plotly==5.11.0
# %pip install dash
# %pip install plotly.express

In [76]:
# Imports
import pandas as pd
import streamlit as st
import numpy as np
import matplotlib.pyplot as plt

import chart_studio.plotly as py
import plotly.express as px
import plotly.offline as po
import plotly.graph_objs as pg
import plotly.graph_objects as go
import plotly.io as pio


In [77]:
# Pandas options
pd.set_option('display.max_rows', 400) # Display 400 rows
pd.set_option('display.float_format', lambda x: '%.5f' % x) # Suppress scientific notation in Pandas

In [78]:
# Plotly theme
pio.templates.default = "plotly_white"

# Colors
colors_uniform = 'Mint'
colors_contrasting = px.colors.sequential.Viridis # To-do: choose a palette with more contrast
colors_monochromatic = px.colors.sequential.Mint
colors_monochromatic_reverse = px.colors.sequential.Mint_r
colors_diverging = px.colors.diverging.PRGn

# Data

## Import data

In [79]:
# Data source: World Bank

# Population = SP.POP.TOTL
# https://data.worldbank.org/indicator/SP.POP.TOTL
df_population_raw = pd.read_csv('Data Population/API_SP.POP.TOTL_DS2_en_csv_v2_4770387.csv', skiprows=4)

# Individuals using the Internet (% of population) = IT.NET.USER.ZS
# https://data.worldbank.org/indicator/IT.NET.USER.ZS
df_users_raw = pd.read_csv('Data internet/individuals_using_the_Internet_percentage_of_population.csv', skiprows=4)
df_metadata_country_raw = pd.read_csv('Data internet/Metadata_Country_individuals_using_the_Internet_percentage_of_population.csv')
df_metadata_indicator_raw= pd.read_csv('Data internet/Metadata_Indicator_individuals_using_the_Internet_percentage_of_population.csv')

In [80]:
df_users_1 = df_users_raw.copy()
df_metadata_country = df_metadata_country_raw.copy()
df_metadata_indicator = df_metadata_indicator_raw.copy()
df_population = df_population_raw.copy()

## Clean data

### df_population

In [81]:
# Drop unnecessary columns
df_population.drop(columns = ['Indicator Name', 'Indicator Code', 'Unnamed: 66'], inplace = True)

# Wide to long
# Get columns containing years
years_pop = df_population.columns[2:]

# Use melt to unpivot the DataFrame
df_population = df_population.melt(id_vars=['Country Name', 'Country Code'], value_vars=years_pop, var_name='Year', value_name='SP.POP.TOTL')

# Sort by country and year
df_population.sort_values(by=['Country Name', 'Year'], inplace=True)

# Rename columns
df_population.rename(columns={"SP.POP.TOTL": "Population"}, inplace = True)


In [141]:
# Convert 'Year' to integer.
# A plain cast is safe to re-run; pd.to_datetime() on an already-integer column
# reads the values as nanoseconds since 1970 and returns 1970 for every row.
df_population['Year'] = df_population['Year'].astype(int)


In [139]:
df_population.head(5)

Unnamed: 0,Country Name,Country Code,Year,Population
2,Afghanistan,AFG,1960,8622466.0
268,Afghanistan,AFG,1961,8790140.0
534,Afghanistan,AFG,1962,8969047.0
800,Afghanistan,AFG,1963,9157465.0
1066,Afghanistan,AFG,1964,9355514.0


In [140]:
df_population.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 16492 entries, 2 to 16491
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Country Name  16492 non-null  object 
 1   Country Code  16492 non-null  object 
 2   Year          16492 non-null  int64  
 3   Population    16400 non-null  float64
dtypes: float64(1), int64(1), object(2)
memory usage: 644.2+ KB
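A note on the `Year` conversion: `pd.to_datetime` treats integers as nanoseconds since 1970-01-01 unless a format is given, so applying it to a column that is already an integer (for example when a cell is re-run) gives 1970 for every row. The small sketch below uses two made-up year values to show the difference; the variable names are only for illustration.

```python
import pandas as pd

years_as_text = pd.Series(['1960', '2020'])

# Parsing year strings with an explicit format gives the expected years
parsed_from_text = pd.to_datetime(years_as_text, format='%Y').dt.year

# Integers without a format are read as nanoseconds since 1970-01-01
years_as_int = years_as_text.astype(int)
parsed_from_int = pd.to_datetime(years_as_int).dt.year

# A plain integer cast gives the same result no matter how often it is re-run
cast_once = years_as_text.astype(int)
cast_twice = cast_once.astype(int)

print(parsed_from_text.tolist())  # [1960, 2020]
print(parsed_from_int.tolist())   # [1970, 1970]
print(cast_twice.tolist())        # [1960, 2020]
```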


### df_users_1

In [85]:
# Drop unnecessary columns
df_users_1.drop(columns = ['Indicator Name', 'Indicator Code', 'Unnamed: 66'], inplace = True)

# Wide to long
# Get years columns
years_usrs = df_users_1.columns[2:]

# Use melt to unpivot the DataFrame
df_users_1 = df_users_1.melt(id_vars=['Country Name', 'Country Code'], value_vars=years_usrs, var_name='Year', value_name='IT.NET.USER.ZS')

# Sort by country and year
df_users_1.sort_values(by=['Country Name', 'Year'], inplace=True)

# Rename columns
df_users_1.rename(columns={"IT.NET.USER.ZS": "Users percentage"}, inplace = True)
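Both cleaning cells (population and users) follow the same steps: drop the extra columns, melt the year columns, sort. As a possible refactor, the steps could live in one helper. This is only a sketch: `world_bank_wide_to_long` is a hypothetical name, the toy table is made up, and unlike the cells above it resets the index and casts `Year` to integer in the same step.

```python
import pandas as pd

def world_bank_wide_to_long(df_wide, value_name):
    """Reshape a World Bank wide table (one column per year) into long format."""
    id_cols = ['Country Name', 'Country Code']
    # Keep only columns whose name is a four-digit year; this also skips
    # 'Indicator Name', 'Indicator Code' and trailing 'Unnamed: N' columns.
    year_cols = [c for c in df_wide.columns if str(c).isdigit() and len(str(c)) == 4]
    df_long = df_wide.melt(id_vars=id_cols, value_vars=year_cols,
                           var_name='Year', value_name=value_name)
    df_long['Year'] = df_long['Year'].astype(int)
    return df_long.sort_values(['Country Name', 'Year']).reset_index(drop=True)

# Toy input (made-up values) mimicking the layout of the World Bank CSV files
toy_wide = pd.DataFrame({
    'Country Name': ['Bravo', 'Alpha'],
    'Country Code': ['BBB', 'AAA'],
    'Indicator Name': ['x', 'x'],
    'Indicator Code': ['X.Y', 'X.Y'],
    '1960': [1.0, 3.0],
    '1961': [2.0, 4.0],
    'Unnamed: 66': [None, None],
})

toy_long = world_bank_wide_to_long(toy_wide, 'Population')
print(toy_long)
```

With the real data the calls would look like `world_bank_wide_to_long(df_population_raw, 'Population')` and `world_bank_wide_to_long(df_users_raw, 'Users percentage')`.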


In [136]:
# Convert 'Year' to integer.
# A plain cast is safe to re-run; pd.to_datetime() on an already-integer column
# reads the values as nanoseconds since 1970 and returns 1970 for every row.
df_users_1['Year'] = df_users_1['Year'].astype(int)

df_users_1


Unnamed: 0,Country Name,Country Code,Year,Users percentage
2,Afghanistan,AFG,1960,
268,Afghanistan,AFG,1961,
534,Afghanistan,AFG,1962,
800,Afghanistan,AFG,1963,
1066,Afghanistan,AFG,1964,
...,...,...,...,...
15427,Zimbabwe,ZWE,2017,24.40000
15693,Zimbabwe,ZWE,2018,25.00000
15959,Zimbabwe,ZWE,2019,25.10000
16225,Zimbabwe,ZWE,2020,29.29857


In [137]:
df_users_1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 16492 entries, 2 to 16491
Data columns (total 4 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Country Name      16492 non-null  object 
 1   Country Code      16492 non-null  object 
 2   Year              16492 non-null  int64  
 3   Users percentage  7749 non-null   float64
dtypes: float64(1), int64(1), object(2)
memory usage: 644.2+ KB


In [88]:
df_users_1

Unnamed: 0,Country Name,Country Code,Year,Users percentage
2,Afghanistan,AFG,1960,
268,Afghanistan,AFG,1961,
534,Afghanistan,AFG,1962,
800,Afghanistan,AFG,1963,
1066,Afghanistan,AFG,1964,
...,...,...,...,...
15427,Zimbabwe,ZWE,2017,24.40000
15693,Zimbabwe,ZWE,2018,25.00000
15959,Zimbabwe,ZWE,2019,25.10000
16225,Zimbabwe,ZWE,2020,29.29857


### df_metadata_country

In [89]:
df_metadata_country.head(3)

Unnamed: 0,Country Code,Region,IncomeGroup,SpecialNotes,TableName,Unnamed: 5
0,ABW,Latin America & Caribbean,High income,,Aruba,
1,AFE,,,"26 countries, stretching from the Red Sea in t...",Africa Eastern and Southern,
2,AFG,South Asia,Low income,The reporting period for national accounts dat...,Afghanistan,


In [90]:
# Drop unnecessary columns
drop_columns_metadata_country = ['TableName', 'Unnamed: 5']
df_metadata_country.drop(columns = drop_columns_metadata_country, inplace=True)


### df_metadata_indicator

In [91]:
df_metadata_indicator.head()

Unnamed: 0,INDICATOR_CODE,INDICATOR_NAME,SOURCE_NOTE,SOURCE_ORGANIZATION,Unnamed: 4
0,IT.NET.USER.ZS,Individuals using the Internet (% of population),Internet users are individuals who have used t...,International Telecommunication Union (ITU) Wo...,


### Merge df_users_1 and df_metadata_country into users_metadata

In [92]:
# Merge users and metadata
users_metadata = df_users_1.merge(df_metadata_country, how='left', on='Country Code')
users_metadata.head(3)


Unnamed: 0,Country Name,Country Code,Year,Users percentage,Region,IncomeGroup,SpecialNotes
0,Afghanistan,AFG,1960,,South Asia,Low income,The reporting period for national accounts dat...
1,Afghanistan,AFG,1961,,South Asia,Low income,The reporting period for national accounts dat...
2,Afghanistan,AFG,1962,,South Asia,Low income,The reporting period for national accounts dat...
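A left merge silently keeps rows that find no match in the metadata. One optional sanity check, sketched here on made-up data, is to pass `validate` and `indicator` to `merge`: `validate='many_to_one'` raises an error if a country code appears more than once in the metadata, and `indicator=True` adds a `_merge` column that marks unmatched rows. The frame names `users`, `metadata` and `checked` are placeholders.

```python
import pandas as pd

# Made-up data: 'ZZZ' has no entry in the metadata table
users = pd.DataFrame({
    'Country Code': ['AAA', 'AAA', 'BBB', 'ZZZ'],
    'Year': [2016, 2017, 2017, 2017],
    'Users percentage': [10.0, 12.0, 50.0, 30.0],
})
metadata = pd.DataFrame({
    'Country Code': ['AAA', 'BBB'],
    'Region': ['Region 1', 'Region 2'],
})

# validate: fail early if metadata has duplicate country codes
# indicator: add a '_merge' column with 'both' or 'left_only'
checked = users.merge(metadata, how='left', on='Country Code',
                      validate='many_to_one', indicator=True)

unmatched_codes = (checked.loc[checked['_merge'] == 'left_only', 'Country Code']
                   .unique()
                   .tolist())
print(unmatched_codes)  # ['ZZZ']
```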


In [93]:
users_metadata[users_metadata['Country Code'] == 'WLD'].iloc[:3,:]

Unnamed: 0,Country Name,Country Code,Year,Users percentage,Region,IncomeGroup,SpecialNotes
16244,World,WLD,1960,,,,World aggregate.
16245,World,WLD,1961,,,,World aggregate.
16246,World,WLD,1962,,,,World aggregate.


In [94]:
# Set 'Region' to 'World' for the World aggregate (it is NaN after the merge) so it can be used in the VIZ
users_metadata.loc[users_metadata['Country Name'] == 'World', 'Region'] = 'World'

In [95]:
users_metadata[users_metadata['Country Code'] == 'WLD'].iloc[:3,:]

Unnamed: 0,Country Name,Country Code,Year,Users percentage,Region,IncomeGroup,SpecialNotes
16244,World,WLD,1960,,World,,World aggregate.
16245,World,WLD,1961,,World,,World aggregate.
16246,World,WLD,1962,,World,,World aggregate.


In [96]:
# Isolate the group aggregates other than World (rows without a 'Region'); they are dropped in the next cell
df_group_aggregates = users_metadata[users_metadata['Region'].isna()]
group_aggregates = df_group_aggregates['Country Name'].unique()

print(group_aggregates)
df_group_aggregates.head(5)


['Africa Eastern and Southern' 'Africa Western and Central' 'Arab World'
 'Caribbean small states' 'Central Europe and the Baltics'
 'Early-demographic dividend' 'East Asia & Pacific'
 'East Asia & Pacific (IDA & IBRD countries)'
 'East Asia & Pacific (excluding high income)' 'Euro area'
 'Europe & Central Asia' 'Europe & Central Asia (IDA & IBRD countries)'
 'Europe & Central Asia (excluding high income)' 'European Union'
 'Fragile and conflict affected situations'
 'Heavily indebted poor countries (HIPC)' 'High income' 'IBRD only'
 'IDA & IBRD total' 'IDA blend' 'IDA only' 'IDA total'
 'Late-demographic dividend' 'Latin America & Caribbean'
 'Latin America & Caribbean (excluding high income)'
 'Latin America & the Caribbean (IDA & IBRD countries)'
 'Least developed countries: UN classification' 'Low & middle income'
 'Low income' 'Lower middle income' 'Middle East & North Africa'
 'Middle East & North Africa (IDA & IBRD countries)'
 'Middle East & North Africa (excluding high income)

Unnamed: 0,Country Name,Country Code,Year,Users percentage,Region,IncomeGroup,SpecialNotes
62,Africa Eastern and Southern,AFE,1960,,,,"26 countries, stretching from the Red Sea in t..."
63,Africa Eastern and Southern,AFE,1961,,,,"26 countries, stretching from the Red Sea in t..."
64,Africa Eastern and Southern,AFE,1962,,,,"26 countries, stretching from the Red Sea in t..."
65,Africa Eastern and Southern,AFE,1963,,,,"26 countries, stretching from the Red Sea in t..."
66,Africa Eastern and Southern,AFE,1964,,,,"26 countries, stretching from the Red Sea in t..."


In [97]:
# Eliminate group aggregates
users_metadata = users_metadata[~users_metadata["Country Name"].isin(group_aggregates)]

users_metadata.head(3)

Unnamed: 0,Country Name,Country Code,Year,Users percentage,Region,IncomeGroup,SpecialNotes
0,Afghanistan,AFG,1960,,South Asia,Low income,The reporting period for national accounts dat...
1,Afghanistan,AFG,1961,,South Asia,Low income,The reporting period for national accounts dat...
2,Afghanistan,AFG,1962,,South Asia,Low income,The reporting period for national accounts dat...
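The filtering logic of the last cells can be seen on a tiny made-up table: rows without a `Region` are treated as group aggregates, `World` is kept because its `Region` was filled in beforehand, and the remaining rows are countries. The names `toy`, `aggregate_names` and `countries_only` are just for this sketch.

```python
import pandas as pd

# Made-up rows: one country, one group aggregate, and the World aggregate
toy = pd.DataFrame({
    'Country Name': ['Afghanistan', 'Arab World', 'World', 'Afghanistan'],
    'Year': [2016, 2016, 2016, 2017],
    'Region': ['South Asia', None, 'World', 'South Asia'],
})

# Aggregates are the names that have no region assigned
aggregate_names = toy.loc[toy['Region'].isna(), 'Country Name'].unique()

# '~' negates the boolean mask, keeping everything that is not an aggregate
countries_only = toy[~toy['Country Name'].isin(aggregate_names)]
print(countries_only['Country Name'].tolist())  # ['Afghanistan', 'World', 'Afghanistan']
```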


### Decide on time span to analyze

In [98]:
# Decide on time span to analyze: count how many countries report data in each year
which_years_to_keep = users_metadata.groupby('Year')['Users percentage'].count().reset_index()

# Rename columns
which_years_to_keep.rename(columns={"Users percentage": "Nb of countries reporting data"}, inplace = True)


# VIZ Year vs IT.NET.USER.ZS
fig = px.line(which_years_to_keep, x="Year", y="Nb of countries reporting data", title="Nb of countries reporting data")
fig.show()

In [99]:
which_years_to_keep

Unnamed: 0,Year,Nb of countries reporting data
0,1960,7
1,1961,0
2,1962,0
3,1963,0
4,1964,0
5,1965,7
6,1966,0
7,1967,0
8,1968,0
9,1969,0


In [100]:
# I'll keep data from 1990 up to the second-to-last year in the dataset
# I'll analyze the data until 2020 or 2017, depending on the graph

year_max = users_metadata['Year'].max() - 1

users_metadata = users_metadata[users_metadata['Year'].between(1990, year_max, inclusive='both')]
users_metadata


Unnamed: 0,Country Name,Country Code,Year,Users percentage,Region,IncomeGroup,SpecialNotes
30,Afghanistan,AFG,1990,0.00000,South Asia,Low income,The reporting period for national accounts dat...
31,Afghanistan,AFG,1991,0.00000,South Asia,Low income,The reporting period for national accounts dat...
32,Afghanistan,AFG,1992,0.00000,South Asia,Low income,The reporting period for national accounts dat...
33,Afghanistan,AFG,1993,0.00000,South Asia,Low income,The reporting period for national accounts dat...
34,Afghanistan,AFG,1994,0.00000,South Asia,Low income,The reporting period for national accounts dat...
...,...,...,...,...,...,...,...
16486,Zimbabwe,ZWE,2016,23.11999,Sub-Saharan Africa,Lower middle income,National Accounts data are reported in Zimbabw...
16487,Zimbabwe,ZWE,2017,24.40000,Sub-Saharan Africa,Lower middle income,National Accounts data are reported in Zimbabw...
16488,Zimbabwe,ZWE,2018,25.00000,Sub-Saharan Africa,Lower middle income,National Accounts data are reported in Zimbabw...
16489,Zimbabwe,ZWE,2019,25.10000,Sub-Saharan Africa,Lower middle income,National Accounts data are reported in Zimbabw...


### Merge users_metadata and df_population into df_users (the DataFrame to analyze)

In [101]:
# Merge users_metadata and population
users_population = users_metadata.merge(df_population, how='left', on=['Country Name', 'Country Code', 'Year'])

In [147]:
# Add column 'Users Total': total number of people using the internet
# The total number of people using the internet is calculated by multiplying the 
# % of population using the Internet['Users percentage']  
# with the population estimate ['Population']

users_population['Users Total'] = (users_population['Users percentage'] * users_population['Population']) / 100
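As a quick check of the formula, here is the same calculation for a single row in plain Python, using the United States 2020 values that appear in the `df_users` output of this notebook (90.9% and a population of 331501080). The function name `users_total` is only for this example.

```python
def users_total(users_percentage, population):
    """Number of internet users from a percentage share and a population estimate."""
    return users_percentage * population / 100

# United States, 2020 (values taken from the df_users output in this notebook)
usa_2020 = users_total(90.9, 331501080.0)

# Approximately 301334481.72, the 'Users Total' value shown for that row
print(round(usa_2020, 2))
```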


In [148]:
# Save to file and re-import
users_population.to_csv('users.csv', index=False)

# Usable dataframe
df_users = pd.read_csv('users.csv')

In [149]:
df_users[df_users['Country Code'] == 'USA'].tail()

Unnamed: 0,Country Name,Country Code,Year,Users percentage,Region,IncomeGroup,SpecialNotes,Population,Users Total
6412,United States,USA,2016,85.54442,North America,High income,,323071755.0,276369863.1662
6413,United States,USA,2017,87.27489,North America,High income,,325122128.0,283749976.87915
6414,United States,USA,2018,88.4989,North America,High income,,326838199.0,289248221.25558
6415,United States,USA,2019,89.43028,North America,High income,,328329953.0,293626412.2486
6416,United States,USA,2020,90.9,North America,High income,,331501080.0,301334481.72


In [150]:
df_users.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6758 entries, 0 to 6757
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Country Name      6758 non-null   object 
 1   Country Code      6758 non-null   object 
 2   Year              6758 non-null   int64  
 3   Users percentage  6100 non-null   float64
 4   Region            6758 non-null   object 
 5   IncomeGroup       6696 non-null   object 
 6   SpecialNotes      2790 non-null   object 
 7   Population        6758 non-null   float64
 8   Users Total       6100 non-null   float64
dtypes: float64(3), int64(1), object(5)
memory usage: 475.3+ KB


### Create df_regions and df_income_group

In [106]:
# Create df_regions
df_regions = df_users.groupby(['Region', 'Year']).agg({'Users percentage':'mean', 'Users Total': 'sum'}).reset_index()

In [107]:
# Create df_income_group
df_income_group = df_users.groupby(['IncomeGroup', 'Year']).agg({'Users percentage':'mean', 'Users Total': 'sum'}).reset_index()
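Note that `'Users percentage': 'mean'` is a simple average over countries, so a small country counts as much as a large one. A possible cross-check is a population-weighted share, computed as total users divided by total population of the countries that report data. The sketch below uses made-up numbers, and the column names `simple_mean`, `users_total`, `population` and `weighted_share` are assumptions for illustration.

```python
import pandas as pd

# Made-up region with a small, highly connected country, a large, poorly
# connected one, and a country that reports no data
toy = pd.DataFrame({
    'Region': ['R1', 'R1', 'R1'],
    'Year': [2017, 2017, 2017],
    'Users percentage': [90.0, 10.0, None],
    'Population': [1_000_000.0, 9_000_000.0, 5_000_000.0],
})
toy['Users Total'] = toy['Users percentage'] * toy['Population'] / 100

# Only count the population of countries that actually report a share
reported = toy.dropna(subset=['Users percentage'])

grouped = reported.groupby(['Region', 'Year']).agg(
    simple_mean=('Users percentage', 'mean'),
    users_total=('Users Total', 'sum'),
    population=('Population', 'sum'),
).reset_index()

grouped['weighted_share'] = grouped['users_total'] / grouped['population'] * 100
print(grouped[['Region', 'Year', 'simple_mean', 'weighted_share']])
```

In this toy case the simple mean is 50% while the weighted share is 18%, which shows how far the two can drift apart.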

# Analysis and VIZ

## 1. What share of people are online?

In [108]:
# VIZ text
title_share = df_metadata_indicator['INDICATOR_NAME'][0]
note = df_metadata_indicator['SOURCE_NOTE'][0]
source = df_metadata_indicator['SOURCE_ORGANIZATION'][0]
source_link = 'https://data.worldbank.org/indicator/IT.NET.USER.ZS'

### Share of the population using the internet by country

In [109]:
# Share of the population using the internet by country

fig = px.choropleth(df_users,
                    locations="Country Code",
                    color="Users percentage",
                    hover_name="Country Name", # column to add to hover information
                    hover_data=['Year','Users percentage', 'Region','IncomeGroup'],
                    color_continuous_scale=colors_monochromatic,
                    animation_frame="Year",
                    animation_group="Country Name",
                    range_color=(0, 100),
                    )

fig.update_layout(
    title_text=title_share,
    # coloraxis_colorbar_x=-0.1, # Bar to the left
    # coloraxis_colorbar_tickprefix = '%',
    coloraxis_colorbar_title = '% of population',    
    # margin={"r":0,"t":10,"l":0,"b":0}, # Map margins
    height=700, # Map height
    annotations = [dict(
        x=0.9,
        y=0,
        xref='paper',
        yref='paper',
        text=f'Source: <a href={source_link}>{source}</a>',
        showarrow = False
    )]
)

fig.update_geos(showframe=False,
                showcoastlines=False,
                # coastlinecolor="RebeccaPurple",
                showland=True, 
                landcolor="LightGrey",
                projection_type='equirectangular',
                # lataxis_showgrid=True,
                # lonaxis_showgrid=True
)

fig.update_traces(marker_line_color='white',
                  marker_line_width=0.5
                  )

fig.show()

### Share of the population using the internet by Income Group

In [152]:
# Share of the population using the internet by Income Group

fig = px.line(df_income_group,
              x="Year",
              y="Users percentage",
              color='IncomeGroup',
              color_discrete_sequence=colors_contrasting,
#              text=df_users['Year']
            )


fig.update_traces(mode="markers+lines", hovertemplate=None)

fig.update_layout(
    hovermode="x unified",
    hoverlabel = dict(namelength = -1), # Displays full text on hover unified                
    title_text=title_share + (' by income group'),
    height=600, # Map height
    annotations = [dict(
        x=0.99,
        y=0,
        xref='paper',
        yref='paper',
        text=f'Source: <a href={source_link}>{source}</a>',
        showarrow = False
    )]
)

fig.show()

### Share of the population using the internet by Region

In [111]:
# Share of the population using the internet by Region

fig = px.line(df_regions,
              x="Year",
              y="Users percentage",
              color='Region',
              color_discrete_sequence=colors_contrasting,
#              text=df_users['Year']
            )


fig.update_traces(mode="markers+lines", hovertemplate=None)

fig.update_layout(
    hovermode="x unified",
    hoverlabel = dict(namelength = -1), # Displays full text on hover unified                
    title_text=title_share + (' by region'),
    height=600, # Map height
)




fig.show()


## 2. How many Internet users does each country and region have?

### Total number of people using the internet by country in 2017, the latest year with the most data completion

In [112]:
df_users_2017 = df_users[df_users['Year'] == 2017]

# Eliminating the World rows to have a correct result and colour scale
df_users_2017 = df_users_2017[df_users_2017['Region'] != 'World'] 

# Check if it's China
max_users = df_users_2017['Users Total'].max()
df_users[df_users['Users Total'] == max_users]

Unnamed: 0,Country Name,Country Code,Year,Users percentage,Region,IncomeGroup,SpecialNotes,Population,Users Total
1298,China,CHN,2017,54.3,East Asia & Pacific,Upper middle income,On 1 July 1997 China resumed its exercise of s...,1396215000.0,758144745.0


In [113]:
# Total number of people using the internet by country in 2017
fig = px.choropleth(df_users_2017,
                    locations="Country Code",
                    color="Users Total",
                    hover_name="Country Name", # column to add to hover information
                    hover_data=['Users Total', 'Region','IncomeGroup'],
                    color_continuous_scale=colors_monochromatic,
                    range_color=(0, max_users),
                    )

fig.update_layout(
    title_text='Total number of people using the internet by country in 2017',
    # coloraxis_colorbar_x=-0.1, # Bar to the left
    # coloraxis_colorbar_tickprefix = '%',
    coloraxis_colorbar_title = 'nb of users',    
    # margin={"r":0,"t":10,"l":0,"b":0}, # Map margins
    height=700, # Map height
    annotations = [dict(
        x=0.9,
        y=0,
        xref='paper',
        yref='paper',
        text=f'Source: <a href={source_link}>{source}</a>',
        showarrow = False
    )]
)

fig.update_geos(showframe=False,
                showcoastlines=False,
                # coastlinecolor="RebeccaPurple",
                showland=True, 
                landcolor="LightGrey",
                projection_type='equirectangular',
                # lataxis_showgrid=True,
                # lonaxis_showgrid=True
)

fig.update_traces(marker_line_color='white',
                  marker_line_width=0.5
                  )

fig.show()

### Total number of people using the internet by region

In [114]:
# Total number of people using the internet by region

df_viz_regions = df_regions[df_regions['Region'] != 'World'] # Cumulative World numbers throw the viz out of scale

fig = px.line(df_viz_regions,
              x="Year",
              y="Users Total",
              color='Region',
              color_discrete_sequence=colors_contrasting,
#              text=df_users['Year']
            )

fig.update_traces(mode="markers+lines", hovertemplate=None)

fig.update_layout(
    hovermode="x unified",
    hoverlabel = dict(namelength = -1), # Displays full text on hover unified                
    title_text=('Total number of people using the internet by region'),
    height=600, # Map height
)



fig.show()

## Top 20

### Top 20 countries with the most internet users in 2017

In [115]:
#  Top 20 countries with the most internet users in 2017

# Data
df_top_20_users = df_users_2017.groupby(['Country Name', 'Country Code','Year' ])['Users Total'].sum().to_frame().reset_index()
df_top_20_users = df_top_20_users.sort_values(by=['Year', 'Users Total'], ascending=False)[:20]

# VIZ

fig = px.bar(df_top_20_users,
             y='Users Total',
             x='Country Name',
             text_auto='.2s',
             title="Top 20 countries with the most internet users in 2017",
            #  color='Users Total',
             color_discrete_sequence=colors_monochromatic_reverse,
            #  color_continuous_midpoint=0.5,
             height=400)

fig.update_traces(textfont_size=12, 
                  textangle=0, 
                  textposition="outside", 
                  cliponaxis=False)

fig.show()

### Top 20 countries with the highest internet use (by population share) in 2017

#### DATA

In [116]:
# Top 20 countries with the highest internet use by population share in 2017

df_top_20_share = df_users_2017.groupby(['Country Name', 'Country Code','Year' ])['Users percentage'].sum().to_frame().reset_index()
df_top_20_share = df_top_20_share.sort_values(by=['Year', 'Users percentage'], ascending=False)[:20]
df_top_20_share

Unnamed: 0,Country Name,Country Code,Year,Users percentage
114,Liechtenstein,LIE,2017,99.54661
21,Bermuda,BMU,2017,98.37
88,Iceland,ISL,2017,98.2552
106,Kuwait,KWT,2017,97.99999
65,Faroe Islands,FRO,2017,97.58196
159,Qatar,QAT,2017,97.38885
116,Luxembourg,LUX,2017,97.36296
9,Aruba,ABW,2017,97.17
53,Denmark,DNK,2017,97.09936
130,Monaco,MCO,2017,97.05298


In [117]:
# VIZ

fig = px.bar(df_top_20_share,
             y='Users percentage',
             x='Country Name',
             text_auto='.2s',
             title="Top 20 countries with the highest internet use by population share in 2017",
             color_discrete_sequence=colors_monochromatic_reverse)

fig.update_traces(textfont_size=12, 
                  textangle=0, 
                  textposition="outside", 
                  cliponaxis=False)

fig.show()

### Top 10 countries with the highest internet use by population share (over time) RETHINK THIS PART OF THE EDA

In [118]:
# Top 10 countries with the highest internet use by population share over time

def top_10(my_df, col_year):
    """Return the 10 rows with the highest 'Users percentage' for each year."""
    frames = []

    for year in my_df[col_year].unique():
        ds = my_df[my_df[col_year] == year]  # use col_year instead of a hard-coded 'Year'
        ds = ds.groupby(['Country Name', 'Country Code', col_year])['Users percentage'].sum().to_frame().reset_index()
        ds = ds.sort_values(by=['Users percentage'], ascending=False)[:10]
        frames.append(ds)

    # Concatenate once at the end instead of growing an empty DataFrame in the loop
    return pd.concat(frames)



In [119]:
df_top_10 = top_10(df_users, 'Year')
df_top_10 = df_top_10[df_top_10['Year'] >=  2000]
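While rethinking this part, one alternative to the loop is to sort once and take the first rows of each year with `groupby(...).head(n)`. This is a sketch on made-up data, not a drop-in replacement: `top_n_per_year` is a hypothetical helper, and it drops rows with no reported share before ranking.

```python
import pandas as pd

# Made-up shares for three countries over two years
toy = pd.DataFrame({
    'Country Name': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Year': [2000, 2000, 2000, 2001, 2001, 2001],
    'Users percentage': [10.0, 30.0, 20.0, 15.0, None, 25.0],
})

def top_n_per_year(df, n, col_year='Year', col_value='Users percentage'):
    """Return the n rows with the highest col_value within each year."""
    # Sort years ascending and values descending, ignoring missing values
    ranked = (df.dropna(subset=[col_value])
                .sort_values([col_year, col_value], ascending=[True, False]))
    # head(n) keeps the first n rows of every year in the sorted order
    return ranked.groupby(col_year).head(n).reset_index(drop=True)

top_2 = top_n_per_year(toy, 2)
print(top_2)
```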

In [120]:
df_top_10.columns

Index(['Country Name', 'Country Code', 'Year', 'Users percentage'], dtype='object')

In [121]:
df_top_10['Users percentage'].min()

43.98435137

In [122]:


fig = px.choropleth(df_top_10,
                    locations="Country Code",
                    color="Users percentage",
                    hover_name="Country Name", # column to add to hover information
                    hover_data=['Year','Users percentage'],
                    color_continuous_scale=colors_monochromatic,
                    animation_frame="Year",
                    animation_group="Country Name",
                    range_color=(df_top_10['Users percentage'].min(), 100),
                    )

fig.update_layout(
    title_text="Top 10 countries with the highest internet use by population share (over time)",
    # coloraxis_colorbar_x=-0.1, # Bar to the left
    # coloraxis_colorbar_tickprefix = '%',
    coloraxis_colorbar_title = '% of population',    
    # margin={"r":0,"t":10,"l":0,"b":0}, # Map margins
    height=700, # Map height
    annotations = [dict(
        x=0.9,
        y=0,
        xref='paper',
        yref='paper',
        text=f'Source: <a href={source_link}>{source}</a>',
        showarrow = False
    )]
)

fig.update_geos(showframe=False,
                showcoastlines=False,
                # coastlinecolor="RebeccaPurple",
                showland=True, 
                landcolor="LightGrey",
                projection_type='equirectangular',
                # lataxis_showgrid=True,
                # lonaxis_showgrid=True
)

fig.update_traces(marker_line_color='white',
                  marker_line_width=0.5
                  )

fig.show()