# How Much of the World Has Access to the Internet?

## 📖 Background
The first-ever "XP Accelerator Competition" is now open for entries.

This competition is part of the Free Week + XP challenge at DataCamp. Winners of this competition will earn 50,000 XP. Learn your way up our XP leaderboard!

In [2]:
# Import pandas
import pandas as pd

# Read the data
broadband = pd.read_csv('data/broadband.csv')

# Take a look at the first rows
broadband

Unnamed: 0,Entity,Code,Year,Broadband_Subscriptions
0,Afghanistan,AFG,2004,0.000809
1,Afghanistan,AFG,2005,0.000858
2,Afghanistan,AFG,2006,0.001892
3,Afghanistan,AFG,2007,0.001845
4,Afghanistan,AFG,2008,0.001804
...,...,...,...,...
3883,Zimbabwe,ZWE,2016,1.217633
3884,Zimbabwe,ZWE,2017,1.315694
3885,Zimbabwe,ZWE,2018,1.406322
3886,Zimbabwe,ZWE,2019,1.395818


### Data analysis example:

Find the number of broadband subscriptions (per 100 people) for the European Union in 2018. 

We can use bracket notation to filter for `Entity` equal to 'European Union' and the `Year` equal to 2018. 

In [3]:
selection = (broadband['Entity'] == 'European Union') & (broadband['Year'] == 2018)
broadband[selection]

Unnamed: 0,Entity,Code,Year,Broadband_Subscriptions
1136,European Union,,2018,34.732712


### Data science notebooks & visualizations
Visualizations are very helpful in summarizing data and gaining insights. A well-crafted chart often conveys information much better than a table.

It is very straightforward to include plots in a data science notebook. For example, let's look at how broadband subscriptions have changed over time in Latin America and the Caribbean.

First, we filter our data for 'Latin America and Caribbean' and save that to a new data frame called `latam`:

In [4]:
selection = broadband['Entity'] == 'Latin America and Caribbean'
latam = broadband[selection]
latam

Unnamed: 0,Entity,Code,Year,Broadband_Subscriptions
1860,Latin America and Caribbean,,2000,0.035413
1861,Latin America and Caribbean,,2001,0.143387
1862,Latin America and Caribbean,,2002,0.320632
1863,Latin America and Caribbean,,2003,0.496933
1864,Latin America and Caribbean,,2004,1.211335
1865,Latin America and Caribbean,,2005,1.587966
1866,Latin America and Caribbean,,2006,2.430825
1867,Latin America and Caribbean,,2007,3.621901
1868,Latin America and Caribbean,,2008,4.974736
1869,Latin America and Caribbean,,2009,5.960544


Workspace has built-in chart cells (create one by clicking on **Add Chart**). We use one to build the chart using the `latam` table we created in the cell above.

In [5]:
# This is a chart, switch to the DataCamp editor to view and configure it.

You can also use other visualization libraries like Matplotlib or Seaborn by running the cell below to import them into this workspace.
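
As a minimal sketch of the Matplotlib route, the block below rebuilds the `latam` rows shown above by hand (an assumption made so it runs without `data/broadband.csv`) and draws the same trend as a line chart. The axis labels and title are illustrative choices, not part of the original workspace.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Values copied from the `latam` output above (2000-2009)
latam = pd.DataFrame({
    "Year": list(range(2000, 2010)),
    "Broadband_Subscriptions": [0.035413, 0.143387, 0.320632, 0.496933, 1.211335,
                                1.587966, 2.430825, 3.621901, 4.974736, 5.960544],
})

# One line, one marker per year
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(latam["Year"], latam["Broadband_Subscriptions"], marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Broadband subscriptions (per 100 people)")
ax.set_title("Latin America and Caribbean")
fig.tight_layout()
```

In a notebook the figure renders when the cell finishes; in a plain script, call `plt.show()` or `fig.savefig(...)` afterwards.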

## How Much of the World Has Access to the Internet?

Now let's move on to the competition and challenge.

## 📖 Background
You work for a policy consulting firm. One of the firm's principals is preparing to give a presentation on the state of internet access in the world. She needs your help answering some questions about internet accessibility across the world.

## 💾 The data

#### The research team compiled the following tables ([source](https://ourworldindata.org/internet)):

#### internet
- "Entity" - The name of the country, region, or group.
- "Code" - Unique id for the country (null for other entities).
- "Year" - Year from 1990 to 2019.
- "Internet_Usage" - The share of the entity's population who have used the internet in the last three months.

#### people
- "Entity" - The name of the country, region, or group.
- "Code" - Unique id for the country (null for other entities).
- "Year" - Year from 1990 to 2020.
- "Users" - The number of people who have used the internet in the last three months for that country, region, or group.

#### broadband
- "Entity" - The name of the country, region, or group.
- "Code" - Unique id for the country (null for other entities).
- "Year" - Year from 1998 to 2020.
- "Broadband_Subscriptions" - The number of fixed subscriptions (per 100 people) to high-speed internet at downstream speeds >= 256 kbit/s for that country, region, or group.

_**Acknowledgments**: Max Roser, Hannah Ritchie, and Esteban Ortiz-Ospina (2015) - "Internet." OurWorldInData.org._

In [165]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

### Loading and assessing broadband table

In [7]:
# Read the broadband table
broadband = pd.read_csv('data/broadband.csv')

# table preview
broadband

Unnamed: 0,Entity,Code,Year,Broadband_Subscriptions
0,Afghanistan,AFG,2004,0.000809
1,Afghanistan,AFG,2005,0.000858
2,Afghanistan,AFG,2006,0.001892
3,Afghanistan,AFG,2007,0.001845
4,Afghanistan,AFG,2008,0.001804
...,...,...,...,...
3883,Zimbabwe,ZWE,2016,1.217633
3884,Zimbabwe,ZWE,2017,1.315694
3885,Zimbabwe,ZWE,2018,1.406322
3886,Zimbabwe,ZWE,2019,1.395818


In [8]:
# Checking for null values across columns
broadband.isnull().sum()

Entity                       0
Code                       271
Year                         0
Broadband_Subscriptions      0
dtype: int64
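
The data description says `Code` is null for entities that are not countries, so the null count above probably reflects regional or group rows rather than missing data. The sketch below shows one way to check that, using a four-row sample copied from outputs in this notebook instead of the full file (an assumption made to keep it self-contained).

```python
import pandas as pd

# Sample rows taken from outputs shown in this notebook;
# aggregates such as 'European Union' carry no country code.
sample = pd.DataFrame({
    "Entity": ["Afghanistan", "European Union", "Latin America and Caribbean", "Zimbabwe"],
    "Code": ["AFG", None, None, "ZWE"],
    "Year": [2008, 2018, 2009, 2018],
    "Broadband_Subscriptions": [0.001804, 34.732712, 5.960544, 1.406322],
})

missing_code = sample["Code"].isnull()

# Entities without a code: expected to be regions or groups
aggregates = sample.loc[missing_code, "Entity"].unique().tolist()

# Country-level rows only, useful for "top countries" questions
countries_only = sample[~missing_code]

print(aggregates)
```

Running the same two lines on `broadband`, `internet`, and `people` would list the aggregate entities in each table before any country ranking is attempted.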

In [9]:
# Counting rows with zero (0) broadband subscriptions
(broadband.Broadband_Subscriptions == 0).sum()

10

In [10]:
# Preview rows with zero broadband subscriptions
broadband[broadband.Broadband_Subscriptions == 0]

Unnamed: 0,Entity,Code,Year,Broadband_Subscriptions
651,Chad,TCD,2020,0.0
856,Democratic Republic of Congo,COD,2010,0.0
1454,Haiti,HTI,2012,0.0
1455,Haiti,HTI,2013,0.0
1456,Haiti,HTI,2014,0.0
1902,Lebanon,LBN,2002,0.0
1903,Lebanon,LBN,2003,0.0
1904,Lebanon,LBN,2004,0.0
1905,Lebanon,LBN,2005,0.0
1906,Lebanon,LBN,2006,0.0


In [31]:
# Counting unique entities in the broadband table
len(broadband.Entity.unique())

222

In [32]:
# Counting unique codes in the broadband table
len(broadband.Code.unique())

209
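
Because `Code` contains nulls, `len(Series.unique())` counts `NaN` as one of the distinct values, whereas `Series.nunique()` skips it by default. The toy column below (hypothetical values) illustrates the difference; that the count above includes a null entry is an inference from the null check earlier, not something verified against the file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column: two real codes plus missing values
codes = pd.Series(["AFG", "AFG", "ZWE", np.nan, np.nan], name="Code")

with_nan = len(codes.unique())               # NaN is counted as a distinct value
without_nan = codes.nunique()                # NaN is ignored (dropna=True by default)
also_with_nan = codes.nunique(dropna=False)  # NaN counted again

print(with_nan, without_nan, also_with_nan)
```

When the goal is "how many countries are in the table", `nunique()` on `Code` is likely the safer count.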

In [33]:
# Counting unique years in the broadband table
len(broadband.Year.unique())

23

In [11]:
# Checking general info including data type
broadband.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3888 entries, 0 to 3887
Data columns (total 4 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Entity                   3888 non-null   object 
 1   Code                     3617 non-null   object 
 2   Year                     3888 non-null   int64  
 3   Broadband_Subscriptions  3888 non-null   float64
dtypes: float64(1), int64(1), object(2)
memory usage: 121.6+ KB


### Loading and assessing internet table

In [12]:
# Read the internet table
internet = pd.read_csv('data/internet.csv')

# Take a look at the first rows
internet

Unnamed: 0,Entity,Code,Year,Internet_Usage
0,Afghanistan,AFG,1990,0.000000
1,Afghanistan,AFG,1991,0.000000
2,Afghanistan,AFG,1992,0.000000
3,Afghanistan,AFG,1993,0.000000
4,Afghanistan,AFG,1994,0.000000
...,...,...,...,...
7084,Zimbabwe,ZWE,2013,15.500000
7085,Zimbabwe,ZWE,2014,16.364740
7086,Zimbabwe,ZWE,2015,22.742818
7087,Zimbabwe,ZWE,2016,23.119989


In [13]:
# Checking for null values across columns
internet.isnull().sum()

Entity               0
Code              1328
Year                 0
Internet_Usage       0
dtype: int64

In [14]:
# Counting rows with zero (0) internet usage
(internet['Internet_Usage'] == 0).sum()

994

In [43]:
# Count of zero-usage rows per year
internet[internet['Internet_Usage'] == 0]['Year'].value_counts()

1990    226
1991    195
1992    178
1993    160
1994    128
1995     81
1996      5
2004      2
2003      2
2002      2
1997      2
2000      1
2001      1
1999      1
1998      1
2005      1
2006      1
2007      1
2008      1
2009      1
2010      1
2011      1
2012      1
2013      1
Name: Year, dtype: int64

In [42]:
# Previewing the 20 entities with the most zero-usage rows
internet[internet['Internet_Usage'] == 0]['Entity'].value_counts()[:20]

North Korea              23
Timor                     9
Syria                     7
Sudan                     7
Eritrea                   7
Oman                      7
Comoros                   7
Gabon                     7
Guinea-Bissau             6
Papua New Guinea          6
Paraguay                  6
Rwanda                    6
Saint Kitts and Nevis     6
Samoa                     6
Haiti                     6
Guyana                    6
Gibraltar                 6
Sao Tome and Principe     6
Grenada                   6
Palestine                 6
Name: Entity, dtype: int64

In [34]:
# Counting unique entities in the internet table
len(internet.Entity.unique())

261

In [35]:
# Counting unique codes in the internet table
len(internet.Code.unique())

215

In [36]:
# Counting unique years in the internet table
len(internet.Year.unique())

30

In [16]:
# Checking general info including data type
internet.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7089 entries, 0 to 7088
Data columns (total 4 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Entity          7089 non-null   object 
 1   Code            5761 non-null   object 
 2   Year            7089 non-null   int64  
 3   Internet_Usage  7089 non-null   float64
dtypes: float64(1), int64(1), object(2)
memory usage: 221.7+ KB


### Loading and assessing people table

In [17]:
# Read the people table
people = pd.read_csv('data/people.csv')

# table preview
people

Unnamed: 0,Entity,Code,Year,Users
0,Afghanistan,AFG,1990,0
1,Afghanistan,AFG,1991,0
2,Afghanistan,AFG,1992,0
3,Afghanistan,AFG,1993,0
4,Afghanistan,AFG,1994,0
...,...,...,...,...
6374,Zimbabwe,ZWE,2016,3341464
6375,Zimbabwe,ZWE,2017,3599269
6376,Zimbabwe,ZWE,2018,3763048
6377,Zimbabwe,ZWE,2019,3854006


In [19]:
# Checking for null values across columns
people.isnull().sum()

Entity      0
Code      307
Year        0
Users       0
dtype: int64

In [21]:
# Counting rows where no one used the internet
(people.Users == 0).sum()

922

In [27]:
# Count of zero-user rows per year
people[people.Users == 0]['Year'].value_counts()

1990    193
1991    178
1992    165
1993    153
1994    127
1995     80
1996      5
2004      2
2003      2
2002      2
1997      2
2000      1
2001      1
1999      1
1998      1
2005      1
2006      1
2007      1
2008      1
2009      1
2010      1
2011      1
2012      1
2013      1
Name: Year, dtype: int64

In [45]:
# Previewing the 20 entities with the most zero-user rows
people[people.Users == 0]['Entity'].value_counts()[:20]

North Korea         23
Timor                9
Oman                 7
Gabon                7
Eritrea              7
Comoros              7
Sudan                7
Syria                7
Madagascar           6
Malawi               6
Afghanistan          6
Maldives             6
Liechtenstein        6
Mali                 6
Marshall Islands     6
Mauritania           6
Mauritius            6
Lithuania            6
Latvia               6
Libya                6
Name: Entity, dtype: int64

In [38]:
# Counting unique entities in the people table
len(people.Entity.unique())

223

In [39]:
# Counting unique codes in the people table
len(people.Code.unique())

214

In [26]:
# Counting unique years in the people table
len(people.Year.unique())

31

In [37]:
# Checking general info and data type
people.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6379 entries, 0 to 6378
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Entity  6379 non-null   object
 1   Code    6072 non-null   object
 2   Year    6379 non-null   int64 
 3   Users   6379 non-null   int64 
dtypes: int64(2), object(2)
memory usage: 199.5+ KB


## Data issues

### Tidiness issues
`broadband table`
* column headers are capitalized

`internet table`
* column headers are capitalized

`people table`
* column headers are capitalized

## Cleaning data
### Making copy of original data

In [46]:
# copies of original data
broad_clean = broadband.copy()
int_clean = internet.copy()
ppl_clean = people.copy()

### Tidiness
* Issue #1: Column headers are capitalized in the broadband table
* Issue #2: Column headers are capitalized in the internet table
* Issue #3: Column headers are capitalized in the people table

**Define**: Column headers should be in lower case for easy access. Change the headers to lower case.

### Code

In [68]:
# Changing column headers for each table
broad_clean.columns = broad_clean.columns.str.lower()
int_clean.columns = int_clean.columns.str.lower()
ppl_clean.columns = ppl_clean.columns.str.lower()

### Test

In [69]:
print(broad_clean.columns)
print(int_clean.columns)
print(ppl_clean.columns)

Index(['entity', 'code', 'year', 'broadband_subscriptions'], dtype='object')
Index(['entity', 'code', 'year', 'internet_usage'], dtype='object')
Index(['entity', 'code', 'year', 'users'], dtype='object')


In [72]:
broad_clean

Unnamed: 0,entity,code,year,broadband_subscriptions
0,Afghanistan,AFG,2004,0.000809
1,Afghanistan,AFG,2005,0.000858
2,Afghanistan,AFG,2006,0.001892
3,Afghanistan,AFG,2007,0.001845
4,Afghanistan,AFG,2008,0.001804
...,...,...,...,...
3883,Zimbabwe,ZWE,2016,1.217633
3884,Zimbabwe,ZWE,2017,1.315694
3885,Zimbabwe,ZWE,2018,1.406322
3886,Zimbabwe,ZWE,2019,1.395818


In [73]:
int_clean

Unnamed: 0,entity,code,year,internet_usage
6,Afghanistan,AFG,2001,0.004723
7,Afghanistan,AFG,2002,0.004561
8,Afghanistan,AFG,2003,0.087891
9,Afghanistan,AFG,2004,0.105809
10,Afghanistan,AFG,2005,1.224148
...,...,...,...,...
7084,Zimbabwe,ZWE,2013,15.500000
7085,Zimbabwe,ZWE,2014,16.364740
7086,Zimbabwe,ZWE,2015,22.742818
7087,Zimbabwe,ZWE,2016,23.119989


In [74]:
ppl_clean

Unnamed: 0,entity,code,year,users
6,Afghanistan,AFG,2001,930
7,Afghanistan,AFG,2002,958
8,Afghanistan,AFG,2003,19903
9,Afghanistan,AFG,2004,24922
10,Afghanistan,AFG,2005,298829
...,...,...,...,...
6374,Zimbabwe,ZWE,2016,3341464
6375,Zimbabwe,ZWE,2017,3599269
6376,Zimbabwe,ZWE,2018,3763048
6377,Zimbabwe,ZWE,2019,3854006


### Merging Data

In [77]:
all_table = pd.merge(broad_clean, int_clean, how='outer', on=['entity', 'code', 'year'])\
    .merge(ppl_clean, how='outer', on=['entity', 'code', 'year'])

In [78]:
all_table

Unnamed: 0,entity,code,year,broadband_subscriptions,internet_usage,users
0,Afghanistan,AFG,2004,0.000809,0.105809,2.492200e+04
1,Afghanistan,AFG,2005,0.000858,1.224148,2.988290e+05
2,Afghanistan,AFG,2006,0.001892,2.107124,5.361140e+05
3,Afghanistan,AFG,2007,0.001845,1.900000,4.921630e+05
4,Afghanistan,AFG,2008,0.001804,1.840000,4.862610e+05
...,...,...,...,...,...,...
7002,Upper-middle-income countries,,2016,,,1.387884e+09
7003,Upper-middle-income countries,,2017,,,1.463646e+09
7004,Upper-middle-income countries,,2018,,,1.577537e+09
7005,Upper-middle-income countries,,2019,,,1.689000e+09
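
An outer merge keeps every key from every table, which is why some rows above have `NaN` in `broadband_subscriptions` and `internet_usage`. One way to see where each row came from is the `indicator=True` option of `pd.merge`. This is a sketch on two small frames built from Afghanistan rows shown earlier, not the full merge.

```python
import pandas as pd

# Two small frames with partially overlapping years
left = pd.DataFrame({
    "entity": ["Afghanistan", "Afghanistan"],
    "code": ["AFG", "AFG"],
    "year": [2004, 2005],
    "broadband_subscriptions": [0.000809, 0.000858],
})
right = pd.DataFrame({
    "entity": ["Afghanistan", "Afghanistan"],
    "code": ["AFG", "AFG"],
    "year": [2003, 2004],
    "internet_usage": [0.087891, 0.105809],
})

# indicator=True adds a `_merge` column: 'left_only', 'right_only' or 'both'
merged = pd.merge(left, right, how="outer", on=["entity", "code", "year"], indicator=True)
counts = merged["_merge"].value_counts()

print(merged)
print(counts)
```

A large share of `left_only` or `right_only` rows would suggest that entity names or year ranges differ between tables, which is worth knowing before comparing columns.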


### Saving Data

In [79]:
all_table.to_csv('all_table.csv')

### Answering Questions

#### 1. What are the top 5 countries with the highest internet use (by population share)?

In [86]:
top5Country = all_table.groupby('entity')[['internet_usage']].sum().sort_values('internet_usage', ascending=False)[:5]
top5Country

entity,internet_usage
Norway,1855.540181
Iceland,1806.51291
Denmark,1753.80729
Netherlands,1735.916862
Sweden,1718.79937


The top 5 countries with the highest internet use (by population share, summed across all years in the data) are Norway, Iceland, Denmark, the Netherlands, and Sweden.
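
Summing `internet_usage` over every year favours countries with long, consistently high series. One alternative reading of the question, sketched here as an assumption rather than as the intended method, is to rank countries by their most recent reported share and to drop aggregate entities first. The entities and values in the toy frame are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: entity names and values are illustrative only
toy = pd.DataFrame({
    "entity": ["A", "A", "B", "B", "C", "C"],
    "code": ["AAA", "AAA", "BBB", "BBB", None, None],  # C stands in for an aggregate
    "year": [2018, 2019, 2018, 2019, 2018, 2019],
    "internet_usage": [90.0, 95.0, 97.0, 96.0, 99.0, 99.5],
})

# Keep country rows that actually report a share
countries = toy.dropna(subset=["code", "internet_usage"])

# Most recent available row per country
latest = countries.sort_values("year").groupby("entity").tail(1)

# Rank by that latest share
ranking = latest.sort_values("internet_usage", ascending=False).head(5)
top = ranking["entity"].tolist()

print(ranking[["entity", "year", "internet_usage"]])
```

On the real table the same three steps would run on `all_table`; the two rankings may or may not agree, so it is worth stating in the report which definition was used.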

#### 2. How many people had internet access in those countries in 2019?

In [128]:
country_list = top5Country.index.tolist()
top5Country_users_2019 = all_table[(all_table['year'] == 2019) & (all_table['entity'].isin(country_list))]
top5Country_users_2019

Unnamed: 0,entity,code,year,broadband_subscriptions,internet_usage,users
882,Denmark,DNK,2019,43.945988,98.046435,5682653.0
1554,Iceland,ISL,2019,41.070286,,357179.0
2473,Netherlands,NLD,2019,43.624691,93.288591,16197940.0
2622,Norway,NOR,2019,42.027611,98.000004,5241320.0
3360,Sweden,SWE,2019,40.240856,94.493443,9702513.0


In [115]:
# Sum the 2019 user counts of the five countries and display the result
total_users = int(top5Country_users_2019.users.sum())
total_users

37181605

In [135]:
AfricaEastern_andSouthern = ['Angola', 'Botswana', 'Lesotho', 'Mozambique', 'Namibia', 'South Africa', 'Zambia', 'Zimbabwe', 
                             'Burundi', 'Comoros', 'Djibouti', 'Eritrea', 'Ethiopia', 'Kenya', 'Madagascar', 'Malawi', 
                             'Mauritius', 'Rwanda', 'Seychelles', 'Somalia', 'South Sudan', 'Eswatini', 'Tanzania', 'Uganda']

AfricaWestern_andCentral = ['Benin', 'Burkina Faso', 'Cape Verde', "Cote d'Ivoire", 'Gambia', 'Ghana', 'Guinea', 'Guinea-Bissau',
                            'Liberia', 'Mali', 'Mauritania', 'Niger', 'Nigeria', 'Senegal', 'Sierra Leone', 'Togo', 'Cameroon', 
                            'Central African Republic', 'Chad', 'Congo', 'Democratic Republic of Congo', 'Equatorial Guinea', 
                            'Gabon', 'Sao Tome and Principe']

LatinAmerica_andCaribbean = ['Brazil', 'Mexico', 'Colombia', 'Argentina', 'Peru', 'Venezuela', 'Chile', 'Guatemala', 'Ecuador', 'Bolivia',
                             'Haiti', 'Cuba', 'Dominican Republic', 'Honduras', 'Paraguay', 'Nicaragua', 'El Salvador', 'Costa Rica', 
                             'Panama', 'Uruguay', 'Guyana', 'Suriname']

EastAsia_andPacific = ['Australia', 'Brunei', 'Cambodia', 'China', 'Fiji', 'French Polynesia', 'Guam', 'Hong Kong', 'Indonesia', 
                       'Japan', 'Kiribati', 'Laos', 'Macao', 'Malaysia', 'Marshall Islands', 'Micronesia (country)', 'Mongolia', 'Myanmar', 
                       'Nauru', 'New Caledonia', 'New Zealand', 'Palau', 'Papua New Guinea', 'Philippines', 'Samoa', 'Singapore', 'Solomon Islands', 
                       'Thailand', 'Timor', 'Tonga', 'Tuvalu', 'Vanuatu', 'Vietnam']

SouthAsia = ['Afghanistan', 'Bangladesh', 'Bhutan', 'India', 'Maldives', 'Nepal', 'Pakistan', 'Sri Lanka']

NorthAmerica = ['Antigua and Barbuda', 'Aruba', 'Bahamas', 'Barbados', 'Belize', 'Bermuda', 'British Virgin Islands', 'Canada', 'Cayman Islands', 
                'Costa Rica', 'Cuba', 'Dominica', 'Dominican Republic', 'El Salvador', 'Greenland', 'Grenada', 'Guatemala', 'Haiti', 'Honduras', 
                'Jamaica', 'Mexico', 'Montserrat', 'Nicaragua', 'Panama', 'Puerto Rico', 'Saint Kitts and Nevis', 'Saint Lucia', 
                'Saint Vincent and the Grenadines', 'Trinidad and Tobago', 'United States', 'United States Virgin Islands']

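# NOTE: despite its name, this list also includes European countries and territories that are
# not EU member states (e.g. Albania, Andorra, Iceland, Norway); trim it if strict EU membership is needed.
EuropeanUnion = ['Albania', 'Andorra', 'Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Cyprus', 'Czechia', 'Denmark', 'Estonia', 'Faeroe Islands', 'Finland', 'France', 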
                 'Germany', 'Greece', 'Hungary', 'Ireland', 'Italy', 'Latvia', 'Lithuania', 'Luxembourg', 'Malta', 'Netherlands', 'Poland', 
                 'Portugal', 'Romania', 'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'Bosnia and Herzegovina', 'Curacao', 'Gibraltar', 'Iceland', 'Liechtenstein', 
                 'Norway'] 

In [195]:
conds = [all_table['entity'].isin(AfricaEastern_andSouthern), all_table['entity'].isin(AfricaWestern_andCentral), all_table['entity'].isin(LatinAmerica_andCaribbean),
         all_table['entity'].isin(EastAsia_andPacific), all_table['entity'].isin(SouthAsia), all_table['entity'].isin(NorthAmerica), all_table['entity'].isin(EuropeanUnion)]
choices = ['Africa Eastern and Southern', 'Africa Western and Central', 'Latin America & Caribbean', 'East Asia & Pacific', 'South Asia', 'North America', 'European Union']

In [196]:
all_table['region'] = np.select(conds, choices, 'Other Region')
all_table

Unnamed: 0,entity,code,year,broadband_subscriptions,internet_usage,users,region
0,Afghanistan,AFG,2004,0.000809,0.105809,2.492200e+04,South Asia
1,Afghanistan,AFG,2005,0.000858,1.224148,2.988290e+05,South Asia
2,Afghanistan,AFG,2006,0.001892,2.107124,5.361140e+05,South Asia
3,Afghanistan,AFG,2007,0.001845,1.900000,4.921630e+05,South Asia
4,Afghanistan,AFG,2008,0.001804,1.840000,4.862610e+05,South Asia
...,...,...,...,...,...,...,...
7002,Upper-middle-income countries,,2016,,,1.387884e+09,Other Region
7003,Upper-middle-income countries,,2017,,,1.463646e+09,Other Region
7004,Upper-middle-income countries,,2018,,,1.577537e+09,Other Region
7005,Upper-middle-income countries,,2019,,,1.689000e+09,Other Region
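
`np.select` assigns the choice of the first condition that is true, so the order of `conds` matters whenever region lists overlap. In the lists above, 'Mexico' and 'Cuba' (among others) appear in both `LatinAmerica_andCaribbean` and `NorthAmerica`. The sketch below uses shortened versions of those two lists (an assumption made for brevity) to show which label wins.

```python
import numpy as np
import pandas as pd

# Shortened, overlapping lists modelled on the ones defined above
latin_america = ["Mexico", "Cuba"]
north_america = ["Mexico", "Cuba", "Canada"]

df = pd.DataFrame({"entity": ["Mexico", "Canada", "World"]})

# Conditions are evaluated in order; the first match decides the label
conds = [df["entity"].isin(latin_america), df["entity"].isin(north_america)]
choices = ["Latin America & Caribbean", "North America"]

df["region"] = np.select(conds, choices, default="Other Region")
regions = df["region"].tolist()

print(df)
```

Here 'Mexico' is labelled 'Latin America & Caribbean' because that condition comes first; swapping the order of `conds` and `choices` would label it 'North America' instead.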


In [198]:
all_table[all_table['entity'].isin(SouthAsia)]

Unnamed: 0,entity,code,year,broadband_subscriptions,internet_usage,users,region
0,Afghanistan,AFG,2004,0.000809,0.105809,24922.0,South Asia
1,Afghanistan,AFG,2005,0.000858,1.224148,298829.0,South Asia
2,Afghanistan,AFG,2006,0.001892,2.107124,536114.0,South Asia
3,Afghanistan,AFG,2007,0.001845,1.900000,492163.0,South Asia
4,Afghanistan,AFG,2008,0.001804,1.840000,486261.0,South Asia
...,...,...,...,...,...,...,...
6387,Sri Lanka,LKA,1996,,0.054464,9979.0,South Asia
6388,Sri Lanka,LKA,1997,,0.162465,29973.0,South Asia
6389,Sri Lanka,LKA,1998,,0.296351,55005.0,South Asia
6390,Sri Lanka,LKA,1999,,0.348414,65050.0,South Asia
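
With a `region` column in place, the per-region ranking asked for in the challenge can be sketched with a sort followed by `groupby(...).head(n)`. The frame below is hypothetical toy data, and ranking on a single year's share is an assumption about how "highest internet use" should be measured.

```python
import pandas as pd

# Hypothetical toy data: entity names and values are illustrative only
toy = pd.DataFrame({
    "entity": ["A", "B", "C", "D"],
    "region": ["South Asia", "South Asia", "North America", "North America"],
    "year": [2019, 2019, 2019, 2019],
    "internet_usage": [10.0, 30.0, 90.0, 85.0],
})

n = 1  # use 5 on the real table; 1 keeps the toy output short

# Sort once by share, then keep the first n rows of every region
top_per_region = (
    toy.sort_values("internet_usage", ascending=False)
       .groupby("region")
       .head(n)
       .sort_values(["region", "internet_usage"], ascending=[True, False])
)
winners = dict(zip(top_per_region["region"], top_per_region["entity"]))

print(top_per_region)
```

On `all_table`, rows labelled 'Other Region' and rows with a null `code` would likely need to be filtered out first so that only countries in the seven named regions are ranked.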


In [None]:
# NorthAfrica = ['Algeria', 'Tunisia', 'Egypt', 'Libya', 'Morocco', 'Sudan']
# Caucasus_region = ['Armenia', 'Azerbaijan']
# PersianGulf = ['Bahrain', 'Oman']
# WesternAsia = ['Iran', 'Turkey', 'Yemen', 'Iraq', 'Turkey', 'Syria', 'Palestine', 'Qatar', 'Saudi Arabia']
# MiddleEast = ['Israel', 'Jordan', 'Lebanon', 'United Arab Emirates']
# CentralAsia = ['Kazakhstan', 'Uzbekistan']
# Europe = ['Monaco', 'Moldova', 'Montenegro', 'North Macedonia', 'San Marino', 
#           'Belarus', 'Switzerland', 'Ukraine']


## 💪 Challenge
Create a report to answer the principal's questions. Include:

1. What are the top 5 countries with the highest internet use (by population share)?
2. How many people had internet access in those countries in 2019?
3. What are the top 5 countries with the highest internet use for each of the following regions: 'Africa Eastern and Southern', 'Africa Western and Central', 'Latin America & Caribbean', 'East Asia & Pacific', 'South Asia', 'North America', 'European Union'?
4. Create a visualization for those regions' internet usage over time.
5. What are the 5 countries with the most internet users?
6. What is the correlation between internet usage (population share) and broadband subscriptions for 2019?
7. Summarize your findings.
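
For the correlation question, a minimal sketch is to filter to 2019 and call `Series.corr`, which computes the Pearson coefficient by default and ignores pairs where either value is missing (as happens for Iceland's 2019 `internet_usage` in the merged table). The numbers below are hypothetical toy values chosen so the result is easy to verify by eye.

```python
import pandas as pd

# Hypothetical toy data: values are illustrative only
toy = pd.DataFrame({
    "year": [2019, 2019, 2019, 2019, 2018],
    "internet_usage": [10.0, 20.0, 30.0, None, 50.0],
    "broadband_subscriptions": [1.0, 2.0, 3.0, 4.0, 0.5],
})

# Keep only the 2019 rows
in_2019 = toy[toy["year"] == 2019]

# Pearson correlation; the row with a missing share is dropped automatically
r = in_2019["internet_usage"].corr(in_2019["broadband_subscriptions"])

print(round(r, 3))
```

On the real data it may also be worth restricting the 2019 rows to countries (non-null `code`) so that regional aggregates do not enter the calculation twice.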

## 🧑‍⚖️ Judging criteria  

| CATEGORY | WEIGHTING | DETAILS                                                              |
|:---------|:----------|:---------------------------------------------------------------------|
| **Response quality** | 85%       | <ul><li> Accuracy (30%) - The response must be representative of the original data and free from errors.</li><li> Clarity (25%) - The response must be easy to understand and clearly expressed.</li><li> Completeness (30%) - The response must be a full report that responds to the question posed.</li></ul>       |
| **Presentation** | 15% | <ul><li>How legible/understandable the response is.</li><li>How well-formatted the response is.</li><li>Spelling and grammar.</li></ul> |

In the event of a tie, earlier submission time will be used as a tie-breaker. 

## 📘 Rules
To be eligible to win, you must:
* Submit your response to this problem before the deadline. 

All responses must be submitted in English.

Entrants must be:
* 18+ years old.
* Allowed to take part in a skill-based competition from their country.

Entrants cannot:
* Be in a country currently sanctioned by the U.S. government.

**XP will be awarded at the end of the competition. Therefore competition XP will not count towards any daily prizes.**

## ✅ Checklist before submitting your workspace
- Rename your workspace to make it descriptive of your work. N.B., you should leave the notebook name as notebook.ipynb.
- **Remove redundant cells** like the introduction to data science notebooks, so the workbook is focused on your story.
- Check that all the cells run without error.

## ⌛️ Time is ticking. Good luck!