# Exercise 03.02 | Internet users around the world

Your company wants to invest in countries with potential for internet user growth.
That is, we want to find the countries with the lowest ratio of internet users to population.
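Concretely, for a country $c$ with $U_c$ internet users and a population of $P_c$, the ratio is the share of the population that is online, and the countries of interest are those with the smallest $r_c$:

$$
r_c = \frac{U_c}{P_c}
$$

For example, with the figures for China used later in this notebook, $r = 730{,}723{,}960 / 1{,}394{,}015{,}977 \approx 0.524$, i.e. roughly half of the population is online.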

Your task is to find the following information on the internet:

* A table showing the number of internet users within a country.
* A table showing the population per country.

Once you have found a webpage, load the appropriate table into a DataFrame.

----

# Discussion

Several websites contain this information, but let's go with the CIA World
Factbook, as its tables are easy to parse (the information is also available in
other formats).

* Internet users: [https://www.cia.gov/library/publications/the-world-factbook/fields/204rank.html](https://www.cia.gov/library/publications/the-world-factbook/fields/204rank.html)
* Population: [https://www.cia.gov/library/publications/the-world-factbook/fields/335rank.html](https://www.cia.gov/library/publications/the-world-factbook/fields/335rank.html)


Import pandas and load the data. `pd.read_html` returns a list of DataFrames, one for each table it finds on the page.

In [1]:
import pandas as pd

In [2]:
f204s = pd.read_html("https://www.cia.gov/library/publications/the-world-factbook/fields/204rank.html")
f335s = pd.read_html("https://www.cia.gov/library/publications/the-world-factbook/fields/335rank.html")

Confirm that each page contains exactly one table (if there are more, you need to find out which one contains the relevant information), then assign the DataFrames to new variables.
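If a page does contain several tables, selecting by position (as `f204s[0]` does below) breaks silently when the page layout changes. `pd.read_html` accepts a `match` argument that keeps only tables containing matching text; another option is to select by column names after loading. The sketch below takes the second route and assumes the relevant table can be recognized by its headers. `pick_table` is a hypothetical helper, not part of pandas, and the two toy tables stand in for a downloaded page.

```python
import pandas as pd


def pick_table(tables, required_columns):
    """Return the single DataFrame whose columns include all required_columns.

    `tables` is a list of DataFrames, e.g. the result of pd.read_html(url).
    Raises ValueError if no table, or more than one table, matches.
    """
    required = set(required_columns)
    candidates = [t for t in tables if required.issubset(t.columns)]
    if len(candidates) != 1:
        raise ValueError(
            f"expected exactly one matching table, found {len(candidates)}"
        )
    return candidates[0]


# Toy stand-in for a page with two tables: navigation links and the data table.
nav = pd.DataFrame({"Link": ["Home", "About"]})
data = pd.DataFrame({
    "Country": ["China", "India"],
    "Internet users": [730723960, 374328160],
})

users = pick_table([nav, data], ["Country", "Internet users"])
print(users.shape)  # (2, 2)
```

Raising an error instead of returning the first hit keeps the same spirit as the `assert` cells: an unexpected page structure should stop the notebook rather than load the wrong table.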

In [3]:
assert len(f204s) == 1
assert len(f335s) == 1

In [4]:
users = f204s[0]
population = f335s[0]

In [5]:
users.head()

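|   | Rank | Country | Internet users | Date of Information |
|---|------|---------|----------------|---------------------|
| 0 | 1 | China | 730723960 | est. |
| 1 | 2 | India | 374328160 | est. |
| 2 | 3 | United States | 246809221 | est. |
| 3 | 4 | Brazil | 122841218 | est. |
| 4 | 5 | Japan | 116565962 | est. |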


In [6]:
population.head()

|   | Rank | Country | Population | Date of Information |
|---|------|---------|------------|---------------------|
| 0 | 1 | China | 1394015977 | July 2020 est. |
| 1 | 2 | India | 1326093247 | July 2020 est. |
| 2 | 3 | United States | 332639102 | July 2020 est. |
| 3 | 4 | Indonesia | 267026366 | July 2020 est. |
| 4 | 5 | Pakistan | 233500636 | July 2020 est. |


Excellent, you found the relevant information quickly!

Optional task: Determine the countries with the lowest ratio of internet users to population.

For that, we need to merge the two tables on the "Country" column, keeping the three relevant columns: "Country", "Internet users" and "Population".
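Merging on "Country" with the default inner join keeps only names that appear in both tables, so a country spelled differently in the two sources disappears without warning. One way to spot such mismatches is an outer merge with `indicator=True`, which adds a `_merge` column whose values are `left_only`, `right_only` or `both`. This is a minimal sketch on toy frames; the "Country X" / "Country-X" rows and their numbers are hypothetical.

```python
import pandas as pd

population = pd.DataFrame({
    "Country": ["China", "India", "Country X"],
    "Population": [1394015977, 1326093247, 1000],
})
users = pd.DataFrame({
    "Country": ["China", "India", "Country-X"],  # hypothetical spelling mismatch
    "Internet users": [730723960, 374328160, 100],
})

# Outer join keeps every key from both sides; indicator=True records the origin.
check = population.merge(users, on="Country", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["Country", "_merge"]]
print(unmatched)

# The default inner join drops both mismatched rows.
inner = population.merge(users, on="Country")
print(len(inner))  # 2
```

Rows flagged `left_only` or `right_only` are candidates for renaming before running the real merge.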

In [7]:
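# Inner join on the shared "Country" column; countries missing from either table are dropped.
mdf = population[["Country", "Population"]].merge(users[["Country", "Internet users"]], on="Country")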

In [8]:
mdf.head()

|   | Country | Population | Internet users |
|---|---------|------------|----------------|
| 0 | China | 1394015977 | 730723960 |
| 1 | India | 1326093247 | 374328160 |
| 2 | United States | 332639102 | 246809221 |
| 3 | Indonesia | 267026366 | 65525226 |
| 4 | Pakistan | 233500636 | 31338715 |


Calculate the ratio, put it into a new column, then sort by the ratio.
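pandas also provides `DataFrame.nsmallest(n, columns)`, which returns the `n` rows with the smallest values in a column, an alternative to sorting and then taking the head. A sketch using a few rows copied from the merged table in this notebook:

```python
import pandas as pd

# A handful of rows taken from the merged table shown in this notebook.
mdf = pd.DataFrame({
    "Country": ["China", "Eritrea", "Somalia", "Chad"],
    "Population": [1394015977, 6081196, 11757124, 16877357],
    "Internet users": [730723960, 69095, 203366, 592623],
})
mdf["Ratio"] = mdf["Internet users"] / mdf["Population"]

# Same rows as mdf.sort_values(by="Ratio").head(2), in the same order.
lowest = mdf.nsmallest(2, "Ratio")
print(lowest["Country"].tolist())  # ['Eritrea', 'Somalia']
```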

In [9]:
mdf["Ratio"] = mdf["Internet users"] / mdf["Population"]

In [10]:
mdf.sort_values(by="Ratio").head(10)

|   | Country | Population | Internet users | Ratio |
|---|---------|------------|----------------|-------|
| 108 | Eritrea | 6081196 | 69095 | 0.011362 |
| 75 | Somalia | 11757124 | 203366 | 0.017297 |
| 14 | Congo, Democratic Republic of the | 101780263 | 3016000 | 0.029632 |
| 208 | Saint Martin | 32556 | 1100 | 0.033788 |
| 146 | Guinea-Bissau | 1927104 | 66169 | 0.034336 |
| 68 | Chad | 16877357 | 592623 | 0.035113 |
| 55 | Niger | 22772361 | 805702 | 0.035381 |
| 109 | Central African Republic | 5990855 | 246000 | 0.041063 |
| 51 | Madagascar | 26955737 | 1151563 | 0.042721 |
| 74 | Burundi | 11865821 | 574236 | 0.048394 |


According to these statistics, Eritrea has the lowest ratio of internet users to population: about 1.1% of its population is online.