# Countries of the World - Data Analysis with Python
-----
![](https://www.nationalgeographic.com/content/dam/ngdotcom/rights-exempt/maps/world-classic-2018-banner-clip-72.adapt.1900.1.jpg)

In this notebook, we shall be analysing data of **227 countries and territories** spread across **11 different regions**.

To know more about the dataset, please click [here](https://www.kaggle.com/fernandol/countries-of-the-world).

This notebook is roughly divided into two parts - **Data Wrangling** & **Data Visualization**

The aim here is to perform the necessary data preprocessing and dive into building visualizations to better understand the world we live in 🌏

*So let's get started!*

# Data Wrangling
------

Import all the required **libraries**.

In [None]:
#Import all libraries 
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import missingno as msno

Import the dataset.

In [None]:
#Import the dataset
world = pd.read_csv("../input/countries-of-the-world/countries of the world.csv")

View the **head** of the imported dataset.

In [None]:
#View head of the dataset
world.head()

View the **non-null counts** and **datatypes** of all the columns.

In [None]:
#View non-null count and data types of all columns
world.info()

Visualize the distribution of **missing values** in the dataset.

In [None]:
#Visualize missing values
msno.matrix(world)
plt.show()
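
The matrix gives a visual overview; a per-column count can serve as a numeric companion. Below is a minimal sketch on a small made-up frame (the `toy` name and all values are illustrative assumptions, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for `world`; values are made up
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Literacy (%)": [99.0, np.nan, 85.5, np.nan],
    "Climate": [1.0, 2.0, np.nan, 3.0],
})

# Number of missing values per column, largest first
missing_counts = toy.isna().sum().sort_values(ascending=False)
# Share of missing values per column, in percent
missing_percent = toy.isna().mean() * 100
print(missing_counts)
print(missing_percent)
```

The same two expressions should work on `world` directly, since they only rely on `isna()`.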

The **Country** and **Region** column values contain unwanted surrounding **whitespace**, so we strip it here.

In [None]:
#Trim 'Country' column values of whitespaces
world['Country'] = world['Country'].str.strip()
#Trim 'Region' column values of whitespaces
world['Region'] = world['Region'].str.strip()
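
To see why this step matters, here is a small sketch with made-up labels (the `regions` series is a hypothetical sample, not the real column): labels that look identical are treated as different values until the padding is removed, which would split groups in every later `groupby`.

```python
import pandas as pd

# Hypothetical region labels with stray trailing spaces (made-up sample)
regions = pd.Series(["OCEANIA ", "OCEANIA", "BALTICS  ", "BALTICS"])

# Before stripping, visually identical labels count as different values
unique_before = regions.nunique()
# After stripping, they collapse into the intended categories
unique_after = regions.str.strip().nunique()
print(unique_before, unique_after)
```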

Look at all the **unique countries** in the dataset.

In [None]:
#All the unique countries in the dataset
world['Country'].unique()

Here we look at all the **unique regions** in the dataset.

In [None]:
#All the unique regions in the dataset
world['Region'].unique()

Replace **','** with **'.'** in the necessary columns and then convert them to **float64** datatype.

In [None]:
#Wrangle columns to replace ',' by '.' and convert to 'float64' datatype
cols = ['Pop. Density (per sq. mi.)','Coastline (coast/area ratio)','Net migration','Infant mortality (per 1000 births)','Literacy (%)','Phones (per 1000)','Arable (%)','Crops (%)','Other (%)','Climate','Birthrate','Deathrate','Agriculture','Industry','Service']
for i in cols:
    world[i] = world[i].str.replace(",",".")
    world[i] = world[i].astype('float64')
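
As a possible alternative to the loop above, `pd.read_csv` accepts a `decimal` parameter that handles decimal commas at load time. The sketch below uses a made-up CSV snippet (the `csv_text` contents are assumptions for illustration) and presumes every numeric field uses a comma as its decimal separator:

```python
import io
import pandas as pd

# Made-up CSV snippet mimicking quoted decimal-comma numbers
csv_text = 'Country,Literacy (%),Birthrate\nA,"36,0","46,6"\nB,"86,5","15,11"\n'

# decimal="," makes the parser treat the comma as the decimal separator
toy = pd.read_csv(io.StringIO(csv_text), decimal=",")
print(toy.dtypes)
```

With this approach the affected columns arrive as floats already, so no per-column replacement is needed afterwards.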

# Data Visualization
------

Horizontal bar plot showing the number of countries by region
---

In [None]:
#Number of countries by region
#Name both columns explicitly so this works regardless of how the pandas version labels value_counts() output
#value_counts() already returns the counts in descending order
region_counts = world["Region"].value_counts().rename_axis("Region").reset_index(name="Number of countries")
sns.barplot(data=region_counts,y="Region",x="Number of countries",palette="RdBu_r")
plt.title("Number of countries (Region-wise)")
plt.show()

*List of countries in each region:*
___
**Asia (Ex. Near East)**: Afghanistan, Bangladesh, Bhutan, Brunei, Burma, Cambodia, China, East Timor, Hong Kong, India, Indonesia, Iran, Japan, Korea North, Korea South, Laos, Macau, Malaysia, Maldives, Mongolia, Nepal, Pakistan, Philippines, Singapore, Sri Lanka, Taiwan, Thailand and Vietnam
___
**Eastern Europe**: Albania, Bosnia & Herzegovina, Bulgaria, Croatia, Czech Republic, Hungary, Macedonia, Poland, Romania, Serbia, Slovakia and Slovenia
___
**Northern Africa**: Algeria, Egypt, Libya, Morocco, Tunisia and Western Sahara
___
**Oceania**: American Samoa, Australia, Cook Islands, Fiji, French Polynesia, Guam, Kiribati, Marshall Islands, Micronesia Fed. St., Nauru, New Caledonia, New Zealand, N. Mariana Islands, Palau, Papua New Guinea, Samoa, Solomon Islands, Tonga, Tuvalu, Vanuatu and Wallis and Futuna
___
**Western Europe**: Andorra, Austria, Belgium, Denmark, Faroe Islands, Finland, France, Germany, Gibraltar, Greece, Guernsey, Iceland, Ireland, Isle of Man, Italy, Jersey, Liechtenstein, Luxembourg, Malta, Monaco, Netherlands, Norway, Portugal, San Marino, Spain, Sweden, Switzerland and United Kingdom
___
**Sub-Saharan Africa**: Angola, Benin, Botswana, Burkina Faso, Burundi, Cameroon, Cape Verde, Central African Rep., Chad, Comoros, Congo Dem. Rep., Congo Repub. of the, Cote d'Ivoire, Djibouti, Equatorial Guinea, Eritrea, Ethiopia, Gabon, Gambia The, Ghana, Guinea, Guinea-Bissau, Kenya, Lesotho, Liberia, Madagascar, Malawi, Mali, Mauritania, Mauritius, Mayotte, Mozambique, Namibia, Niger, Nigeria, Reunion, Rwanda, Saint Helena, Sao Tome & Principe, Senegal, Seychelles, Sierra Leone, Somalia, South Africa, Sudan, Swaziland, Tanzania, Togo, Uganda, Zambia and Zimbabwe
___
**Latin America & Caribbean**: Anguilla, Antigua & Barbuda, Argentina, Aruba, Bahamas The, Barbados, Belize, Bolivia, Brazil, British Virgin Is., Cayman Islands, Chile, Colombia, Costa Rica, Cuba, Dominica, Dominican Republic, Ecuador, El Salvador, French Guiana, Grenada, Guadeloupe, Guatemala, Guyana, Haiti, Honduras, Jamaica, Martinique, Mexico, Montserrat, Netherlands Antilles, Nicaragua, Panama, Paraguay, Peru, Puerto Rico, Saint Kitts & Nevis, Saint Lucia, Saint Vincent and the Grenadines, Suriname, Trinidad & Tobago, Turks & Caicos Is, Uruguay, Venezuela and Virgin Islands
___
**C.W. of Ind. States**: Armenia, Azerbaijan, Belarus, Georgia, Kazakhstan, Kyrgyzstan, Moldova, Russia, Tajikistan, Turkmenistan, Ukraine and Uzbekistan
___
**Near East**: Bahrain, Cyprus, Gaza Strip, Iraq, Israel, Jordan, Kuwait, Lebanon, Oman, Qatar, Saudi Arabia, Syria, Turkey, United Arab Emirates, West Bank and Yemen
___
**Northern America**: Bermuda, Canada, Greenland, St Pierre & Miquelon and United States
___
**Baltics**: Estonia, Latvia and Lithuania

Boxplot showing the population distribution in countries across different regions (log scale)
---

In [None]:
#Boxplot of population of countries by region
sort_index_viz_2 = world.groupby("Region")["Population"].median().sort_values(ascending=False).index
viz_2 = sns.catplot(data=world,y="Region",x="Population",kind="box",color="#DECBE4",height=5,aspect=3,order=sort_index_viz_2)
viz_2.set(xscale="log")
plt.title("Population of countries by region")
plt.show()

**Asia** appears to have the highest median population while **Northern America** has the lowest. The outliers in the Asian boxplot are **China**, **India** and **Indonesia**, which are among the most populous countries in the world.

Table of total population % by region
---

In [None]:
#Total population % by region
pop_percent = world.groupby("Region")["Population"].sum().sort_values(ascending=False)/(world.Population.sum())
pop_percent = (round(pop_percent,2))*100
pop_percent

An interesting thing to notice here is that even though **Sub-Saharan Africa** accounts for **11%** of the total world population (second largest), it has a smaller median than **Northern Africa**. This is likely due to the large number of countries in the Sub-Saharan region (~50 vs 6).
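
The effect can be reproduced with a toy example. In the sketch below, the region names and populations are made up: a region with many small countries ends up with the larger total but the smaller median.

```python
import pandas as pd

# Made-up populations: one region with many small countries, one with few larger ones
toy = pd.DataFrame({
    "Region": ["Many"] * 5 + ["Few"] * 2,
    "Population": [4, 5, 6, 7, 8, 10, 12],
})

# Total and median population per region, plus the number of countries
summary = toy.groupby("Region")["Population"].agg(["sum", "median", "count"])
print(summary)
```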

Table of Top 5 countries with highest population
---

In [None]:
#Top 5 countries in terms of population
world[["Country","Region","Population"]].sort_values(by="Population",ascending=False).head(5).set_index("Country")

Boxplot showing the distribution of area in countries across different regions (log scale)
---

In [None]:
#Boxplot of area of countries by region
sort_index_viz_3 = world.groupby('Region')['Area (sq. mi.)'].median().sort_values(ascending=False).index
viz_3 = sns.catplot(data=world,y="Region",x="Area (sq. mi.)",kind="box",color="skyblue",height=5,aspect=3,order=sort_index_viz_3)
viz_3.set(xscale="log")
plt.title("Area of countries by region")
plt.xlabel("Area (Square Miles)")
plt.show()

Here, **Northern America** has the highest median while **Oceania** has the lowest. This can be attributed to the fact that Northern America is home to two of the world's largest countries, **Canada** and the **USA**, while **Oceania**, except for **Australia, New Zealand and Papua New Guinea**, is home to relatively small island nations.

Table of Top 5 countries by largest area
---

In [None]:
#Top 5 countries in terms of area
world[["Country","Region","Area (sq. mi.)"]].sort_values(by="Area (sq. mi.)",ascending=False).head(5).set_index("Country")

Boxplot showing the population density distribution in countries across different regions (log scale)
---
[Population density](https://en.wikipedia.org/wiki/Population_density) of a country equals its total population divided by its total area.

In [None]:
#Boxplot of population densities of countries by region
sort_index_viz_4 = world.groupby('Region')['Pop. Density (per sq. mi.)'].median().sort_values(ascending=False).index
viz_4 = sns.catplot(data=world,y="Region",x="Pop. Density (per sq. mi.)",kind="box",color="#CCEBC5",height=5,aspect=3,order=sort_index_viz_4)
viz_4.set(xscale="log")
plt.title("Population densities of countries by region")
plt.xlabel("Population density (per square mile)")
plt.show()

**Asia** has the **highest median** with three of its countries in the **top 5** - **Macau**, **Singapore** and **Hong Kong**. On the other hand, **Northern America** has the **lowest median** because of large land sizes and relatively small populations. The best example is **Canada**, which has a population density of roughly **3 people per square mile**!
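
The definition of population density (population divided by area) can be applied directly as a sanity check. This is a minimal sketch with made-up figures for two hypothetical countries; the `Density (computed)` column name is an assumption for illustration:

```python
import pandas as pd

# Made-up figures for two hypothetical countries
toy = pd.DataFrame({
    "Country": ["Large and sparse", "Small and crowded"],
    "Population": [30_000_000, 6_000_000],
    "Area (sq. mi.)": [10_000_000, 1_000],
})

# Density = population / area, rounded to one decimal
toy["Density (computed)"] = (toy["Population"] / toy["Area (sq. mi.)"]).round(1)
print(toy)
```

Comparing such a computed column against the dataset's own density column would be one way to spot inconsistent rows.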

Table of Top 5 countries with highest population density
---

In [None]:
#Top 5 countries in terms of population density
world[["Country","Region","Pop. Density (per sq. mi.)"]].sort_values(by="Pop. Density (per sq. mi.)",ascending=False).head(5).set_index("Country")

Boxplot showing the coastline (coast/area ratio) distribution in countries across different regions (log scale)
---

In [None]:
#Boxplot of Coastline (coast/area ratio) of countries by region
sort_index_viz_5 = world.groupby('Region')['Coastline (coast/area ratio)'].median().sort_values(ascending=False).index
viz_5 = sns.catplot(data=world,y="Region",x="Coastline (coast/area ratio)",kind="box",color="#FFFFCC",height=5,aspect=3,order=sort_index_viz_5)
viz_5.set(xscale="log")
plt.title("Coastline (coast/area) of countries by region")
plt.xlabel("Coastline (coast/area ratio)")
plt.show()

As **Oceania** consists mostly of island nations, it appears to have the **highest** median coastline ratio, while the **C.W. of Ind. States** has the lowest. This is mainly because, out of the **12 countries** in this region, nearly **half** are [**landlocked countries**](https://en.wikipedia.org/wiki/Landlocked_country).

Table of Top 5 countries with largest coastline (coast/area ratio)
---

In [None]:
#Top 5 countries in terms of coastline
world[["Country","Region","Coastline (coast/area ratio)"]].sort_values(by='Coastline (coast/area ratio)',ascending=False).head(5).set_index("Country")

Boxplot showing the distribution of net migration rate in countries across different regions
---
[Net migration rate](https://en.wikipedia.org/wiki/Net_migration_rate) of a country is the difference between the number of immigrants (people coming into the country) and the number of emigrants (people leaving the country) throughout the year, usually expressed per 1,000 inhabitants. A positive net migration rate indicates that more people are entering than leaving a country, and a negative rate indicates the opposite.
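
As a worked example, the linked definition expresses the rate per 1,000 inhabitants. The helper name `net_migration_rate` and all numbers below are made up for illustration:

```python
def net_migration_rate(immigrants, emigrants, population):
    """Net migrants per 1,000 inhabitants over the period."""
    return (immigrants - emigrants) * 1000 / population

# Made-up numbers: 50,000 arrivals, 20,000 departures, 5 million inhabitants
rate_in = net_migration_rate(50_000, 20_000, 5_000_000)
# Reversing the flows flips the sign
rate_out = net_migration_rate(20_000, 50_000, 5_000_000)
print(rate_in, rate_out)
```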

In [None]:
#Boxplot of Net migration of countries by region
sort_index_mig = world.groupby('Region')['Net migration'].median().sort_values(ascending=False).index
mig = sns.catplot(data=world,y="Region",x="Net migration",kind="box",color="green",height=5,aspect=3,order=sort_index_mig)
plt.title("Net migration rate of countries by region")
plt.xlabel("Net migration rate")
plt.show()

**Notable insights**:
* **Northern America** and **Western Europe** have higher medians as compared to the rest of the regions. This can be explained by the fact that more and more people are settling in first-world nations like Canada, USA, Germany, France, etc. which are situated in these regions.
* **Near East** also has a median > 0 with some of the largest outliers, which reflects the migration of people to countries like Qatar and Kuwait.
* **Sub-Saharan Africa** has more outliers with a net migration rate < 0, which tells us that more and more people from the underdeveloped countries in this region are moving out to seek a better life.
* **Asia** has multiple outliers having net migration rate > 0 indicating that more people are moving to developed countries like **Singapore**, **Hong Kong** and **Macau**.

Boxplot showing the distribution of infant mortality rate in countries across different regions
---
[Infant mortality rate](https://en.wikipedia.org/wiki/Infant_mortality) is the number of deaths per 1,000 live births of children under one year of age.

In [None]:
#Boxplot of Infant mortality rate of countries by region
sort_index_viz_6 = world.groupby('Region')['Infant mortality (per 1000 births)'].median().sort_values(ascending=False).index
viz_6 = sns.catplot(data=world,y="Region",x="Infant mortality (per 1000 births)",kind="box",color="#FDDAEC",height=5,aspect=3,order=sort_index_viz_6)
plt.title("Infant mortality rate of countries by region")
plt.xlabel("Infant mortality (per 1000 births)")
plt.show()

**Sub-Saharan Africa** has the **highest median** of infant mortality rate. Some of the causes for this: **Neonatal causes, child pneumonia, malaria, diarrhoea, HIV/AIDS, measles and accidents**. You can read more about it [here](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5345500/). On the other hand, developed countries in **Northern America** and **Western Europe** seem to have the lowest infant mortality rates.

Table of Top 5 countries with highest infant mortality
---

In [None]:
#Top 5 countries in terms of infant mortality rate
world[["Country","Region","Infant mortality (per 1000 births)"]].sort_values(by='Infant mortality (per 1000 births)',ascending=False).head(5).set_index("Country")

Boxplot showing the GDP per capita ($) distribution in countries across different regions
---
[GDP per capita](https://www.thebalance.com/gdp-per-capita-formula-u-s-compared-to-highest-and-lowest-3305848) is a measure of a country's economic output that accounts for its number of people. It divides the country's gross domestic product by its total population.

In [None]:
#Boxplot of GDP per capita of countries by region
sort_index_viz_7 = world.groupby('Region')['GDP ($ per capita)'].median().sort_values(ascending=False).index
viz_7 = sns.catplot(data=world,y="Region",x="GDP ($ per capita)",kind="box",color="#E5D8BD",height=5,aspect=3,order=sort_index_viz_7)
plt.title("GDP (per capita) of countries by region")
plt.xlabel("GDP ($ Per Capita)")
plt.show()

**Northern America** has the highest median and it is closely followed by **Western Europe**. On the other hand, underdeveloped nations in **Sub-Saharan Africa** account for the **lowest** GDP per capita as compared to the rest of the world.

Table of Top 5 countries with highest GDP per capita ($)
---

In [None]:
#Top 5 countries in terms of GDP per capita
world[["Country","Region","GDP ($ per capita)"]].sort_values(by='GDP ($ per capita)',ascending=False).head(5).set_index("Country")

Boxplot showing the literacy % distribution in countries across different regions
---
Literacy is popularly understood as the ability to read, write and use numeracy in at least one method of writing, an understanding reflected by mainstream dictionary and handbook definitions. This definition is not universal, however, as it varies across different parts of the world. You can learn more about literacy [here](https://en.wikipedia.org/wiki/Literacy).

In [None]:
#Boxplot of Literacy (%) of countries by region
sort_index_viz_8 = world.groupby('Region')['Literacy (%)'].median().sort_values(ascending=False).index
viz_8 = sns.catplot(data=world,y="Region",x="Literacy (%)",kind="box",color="#FBB4AE",height=5,aspect=3,order=sort_index_viz_8)
plt.title("Literacy (%) of countries by region")
plt.xlabel("Literacy (%)")
plt.show()

Except for the **African** regions, all the regions in the world appear to have a **median literacy rate > 80%**. It is also worth mentioning that **Asia and Sub-Saharan Africa** have the **largest IQRs**, which tells us that there are major variations in the literacy rates of countries in these regions.
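
The IQR (interquartile range) is the width of the box in a boxplot: the 75th percentile minus the 25th. It can also be computed per region rather than read off the plot. A minimal sketch with made-up values for two hypothetical regions:

```python
import pandas as pd

# Made-up literacy values for two hypothetical regions
toy = pd.DataFrame({
    "Region": ["Tight"] * 4 + ["Spread"] * 4,
    "Literacy (%)": [96.0, 97.0, 98.0, 99.0, 40.0, 60.0, 80.0, 100.0],
})

# IQR = 75th percentile minus 25th percentile, computed per region
grouped = toy.groupby("Region")["Literacy (%)"]
iqr = grouped.quantile(0.75) - grouped.quantile(0.25)
print(iqr.sort_values(ascending=False))
```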

Table of Top 5 countries with lowest literacy rate (non-nan values)
---

In [None]:
#Bottom 5 countries in terms of Literacy (%) - With non-nan values
lit_table = world[["Country","Region","Literacy (%)"]].dropna(subset=["Literacy (%)"]).set_index("Country")
lit_table.sort_values(by='Literacy (%)').head(5)

Boxplot showing the phones (per 1000 people) distribution in countries across different regions
---
Number of phones used per 1000 people in a country's population.

In [None]:
#Boxplot of Phones (per 1000) of countries by region
sort_index_viz_9 = world.groupby('Region')['Phones (per 1000)'].median().sort_values(ascending=False).index
viz_9 = sns.catplot(data=world,y="Region",x="Phones (per 1000)",kind="box",color="#FED9A6",height=5,aspect=3,order=sort_index_viz_9)
plt.title("Phones (per 1000) of countries by region")
plt.xlabel("Phones (per 1000)")
plt.show()

All countries in **Northern America** and **Western Europe** have **phones/1000 people > 400** which is greater than the **medians** of all the other regions in the world!

Table of Top 5 countries with lowest number of phones per 1000 people (non-nan values)
---

In [None]:
#Bottom 5 countries in terms of phones (per 1000) - With non-nan values
phone_table = world[["Country","Region","Phones (per 1000)"]].dropna(subset=["Phones (per 1000)"]).set_index("Country")
phone_table.sort_values(by='Phones (per 1000)').head(5)

Boxplot showing the distribution of arable land % in countries across different regions
---
[**Arable land**](https://en.wikipedia.org/wiki/Arable_land) is any land capable of being ploughed and used to grow crops.

In [None]:
#Boxplot of Arable land (%) countries by region
sort_index_viz_10 = world.groupby('Region')['Arable (%)'].median().sort_values(ascending=False).index
viz_10 = sns.catplot(data=world,y="Region",x="Arable (%)",kind="box",color="violet",height=5,aspect=3,order=sort_index_viz_10)
plt.title("Arable land (%) of countries by region")
plt.xlabel("Arable land (%)")
plt.show()

**Near East & Northern Africa** have the lowest median of arable land % as compared to the other regions in the world because of the presence of large **desert** areas.

Table of Top 5 countries by highest arable land %
---

In [None]:
#Top 5 countries in terms of arable land %
world[["Country","Region","Arable (%)"]].sort_values(by='Arable (%)',ascending=False).head(5).set_index("Country")

Boxplot showing the distribution of birthrate in countries across different regions
---
[Birth rate](https://en.wikipedia.org/wiki/Birth_rate) in a period is the total number of live births per 1,000 population divided by the length of the period in years.

In [None]:
#Boxplot of Birthrate of countries by region
sort_index_viz_11 = world.groupby('Region')['Birthrate'].median().sort_values(ascending=False).index
viz_11 = sns.catplot(data=world,y="Region",x="Birthrate",kind="box",color="grey",height=5,aspect=3,order=sort_index_viz_11)
plt.title("Birthrate of countries by region")
plt.xlabel("Birthrate")
plt.show()

**Sub-Saharan** countries have the highest median birthrate among the regions of the world. This is mainly because in developing and/or underdeveloped countries children are needed as a labour force and to provide care for their parents in old age. In these countries, fertility rates are higher due to the lack of access to contraceptives and generally lower levels of female education. On the other hand, developed countries tend to have a lower fertility rate due to lifestyle choices associated with economic affluence: mortality rates are low, birth control is easily accessible and children can often become an economic drain because of housing, education and other costs involved in bringing them up. You can read more about it [here](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4255510/).

Table of Top 5 countries with highest birthrate
---

In [None]:
#Top 5 countries in terms of birthrate
world[["Country","Region","Birthrate"]].sort_values(by='Birthrate',ascending=False).head(5).set_index("Country")

Boxplot showing the distribution of deathrate in countries across different regions
---
[**Death rate**](https://en.wikipedia.org/wiki/Mortality_rate) is a measure of the number of deaths (in general, or due to a specific cause) in a particular population, scaled to the size of that population, per unit of time.

In [None]:
#Boxplot of Deathrate of countries by region
sort_index_viz_12 = world.groupby('Region')['Deathrate'].median().sort_values(ascending=False).index
viz_12 = sns.catplot(data=world,y="Region",x="Deathrate",kind="box",color="yellow",height=5,aspect=3,order=sort_index_viz_12)
plt.title("Deathrate of countries by region")
plt.xlabel("Deathrate")
plt.show()

**Sub-Saharan Africa** has the highest median deathrate, which can be attributed to low standards of living. A low standard of living leads to poor hygiene, sanitation and often malnutrition. As a result, there is increased exposure to diseases and, due to a lack of access to proper medical facilities, more people die, resulting in an increased deathrate for the country. It is also important to note that this metric mainly takes into account deaths due to medical reasons and road accidents. Hence, we see that **Near East** has a low median deathrate in spite of multiple armed conflicts in this region.

Table of Top 5 countries with highest deathrate
---

In [None]:
#Top 5 countries in terms of deathrate
world[["Country","Region","Deathrate"]].sort_values(by='Deathrate',ascending=False).head(5).set_index("Country")

Correlation matrix of all numeric columns
---

In [None]:
#Correlation matrix of numeric columns
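#Select numeric columns explicitly; newer pandas versions no longer drop text columns such as 'Country' silently
corr_matrix = world.select_dtypes(include="number").corr()
sns.heatmap(corr_matrix,cmap='PuOr')
plt.show()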

From the correlation matrix, we can see that certain variables show significant correlations - **GDP, Literacy %, Phones, Birthrate, Deathrate & Infant Mortality**. Let's build a correlogram to observe their correlations at a closer level.

Correlogram of the following columns - GDP, Literacy %, Phones, Birthrate, Deathrate & Infant Mortality
---

In [None]:
#Correlogram with regression
first_cor = world[["GDP ($ per capita)","Literacy (%)","Phones (per 1000)","Birthrate","Deathrate","Infant mortality (per 1000 births)"]]
sns.pairplot(first_cor,kind="reg")

**Notable insights**:
* **Birthrate vs Infant mortality** - The higher the infant mortality in a country, the higher the birthrate. This is quite prevalent in Sub-Saharan countries.
* **GDP vs Phones** - The higher the GDP per capita, the better the economic well-being of a country and the higher the standard of living for its citizens. Hence, more people are able to afford phones and other commodities.
* **Literacy vs Infant mortality** - There appears to be a strong negative correlation between these two variables.
* **Birthrate vs Literacy** - There appears to be a strong negative correlation between these two variables.
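
The direction and strength of such relationships can also be read as numbers rather than from the plots. The sketch below uses made-up values for four hypothetical countries, so the coefficients it prints say nothing about the real dataset; it only illustrates the technique:

```python
import pandas as pd

# Made-up values for four hypothetical countries
toy = pd.DataFrame({
    "Literacy (%)": [40.0, 60.0, 80.0, 99.0],
    "Birthrate": [45.0, 35.0, 20.0, 10.0],
    "Infant mortality (per 1000 births)": [100.0, 70.0, 30.0, 5.0],
})

# Pearson correlation between every pair of columns
corr = toy.corr()
# Correlation of each variable with literacy, excluding literacy itself
with_literacy = corr["Literacy (%)"].drop("Literacy (%)").sort_values()
print(with_literacy)
```

A coefficient near -1 corresponds to the downward-sloping regression lines in the correlogram, and one near +1 to the upward-sloping ones.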

BRICS countries comparison
---
[BRICS](https://en.wikipedia.org/wiki/BRICS) is the acronym coined for an association of five major emerging national economies: Brazil, Russia, India, China and South Africa.

In [None]:
#Filter data for BRICS countries
brics = world.Country.isin(["Brazil","Russia","India","China","South Africa"])
brics = world[brics]
#Define function to make plots for BRICS countries
def brics_function(y,title):
    palette = {"Brazil":"#009C3B","Russia":"#0033A0","India":"#FF9933","China":"#DE2910","South Africa":"#000000"}
    sns.barplot(data=brics,x="Country",y=y,palette=palette,order=["Brazil","Russia","India","China","South Africa"])
    plt.ylabel("")
    plt.xlabel("")
    plt.title(title)
    plt.show()
#Generate multiple plots using for loop
brics_dict = {"Population":"#1 BRICS Population (In Billion)","Area (sq. mi.)":"#2 BRICS Area (In Square Miles)","Pop. Density (per sq. mi.)":"#3 BRICS Population Density (Per Square Mile)","Coastline (coast/area ratio)":"#4 BRICS Coastline (Coast/Area ratio)","Net migration":"#5 BRICS Net Migration Rate",'Infant mortality (per 1000 births)':"#6 BRICS Infant Mortality (Per 1000 births)", 'GDP ($ per capita)':"#7 BRICS GDP ($ per capita)",'Literacy (%)':"#8 BRICS Literacy",'Phones (per 1000)':"#9 BRICS Number of Phones (Per 1000)",'Arable (%)':"#10 BRICS Arable Land %",'Crops (%)':"#11 BRICS Crops %",'Birthrate':"#12 BRICS Birthrate",'Deathrate':"#13 BRICS Deathrate"}
for key,value in brics_dict.items():
    brics_function(key,value)    

Comparison among BRICS countries (Quick facts):
* **China** is the most populated (BRICS countries represent **~40%** of the world population)
* **Russia** is the largest in terms of area
* **India** is the most densely populated
* Coastline (coast/area ratio) is highest for **South Africa**
* **Russia** has a migration rate **~1** while China has **~(-0.4)**
* **South Africa** has the highest infant mortality rate of **~60** and is closely followed by India
* **South Africa** has the highest GDP per capita ($) while **India** has the lowest
* All countries except **India** have a **literacy rate > 80**
* All countries except **India and South Africa** have **phones per 1000 people > 200**
* **India** has the highest arable land % of close to **~50%**
* Consequently, **India** also has the highest crop %
* All countries except **Russia** have a **birthrate > 10**
* **South Africa** has the highest death rate
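
Quick facts like these can be cross-checked programmatically instead of being read off bar charts. The sketch below uses made-up values for three hypothetical countries; with the notebook's data, the same idea would presumably be applied to the numeric columns of `brics.set_index("Country")`, which is an assumption and not something run here.

```python
import pandas as pd

# Made-up values for three hypothetical countries, indexed by name
toy = pd.DataFrame(
    {
        "Population": [200, 1300, 45],
        "Literacy (%)": [86.0, 91.0, 60.0],
        "Deathrate": [6.0, 7.0, 22.0],
    },
    index=pd.Index(["X", "Y", "Z"], name="Country"),
)

# For each column, the country holding the maximum and the minimum value
leaders = pd.DataFrame({"highest": toy.idxmax(), "lowest": toy.idxmin()})
print(leaders)
```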

G7 (Group of Seven) countries comparison
---
The [**Group of Seven**](https://en.wikipedia.org/wiki/Group_of_Seven) (G7) is an international intergovernmental economic organization consisting of seven major developed countries: Canada, France, Germany, Italy, Japan, the United Kingdom and the United States, which are some of the largest IMF-advanced economies in the world.

In [None]:
#Filter data for G7 countries
group_seven = world.Country.isin(["Canada","France","Germany","Italy","Japan","United Kingdom","United States"])
group_seven = world[group_seven]
#Define function to make plots for G7 countries
def group_seven_function(y,title):
    palette = {"Canada":"#FF0000","France":"#0055A4","Germany":"#FFCE00","Italy":"#008C45","Japan":"#BC002D","United Kingdom":"#00247D","United States":"#3C3B6E"}
    sns.barplot(data=group_seven,y="Country",x=y,palette=palette,order=["Canada","France","Germany","Italy","Japan","United Kingdom","United States"])
    plt.ylabel("")
    plt.xlabel("")
    plt.title(title)
    plt.show()
#Generate multiple plots using for loop for G7 countries
group_seven_dict = {"Population":"#1 G7 Population (x 100 Million)","Area (sq. mi.)":"#2 G7 Area (In Square Miles)","Pop. Density (per sq. mi.)":"#3 G7 Population Density (Per Square Mile)","Coastline (coast/area ratio)":"#4 G7 Coastline (Coast/Area ratio)","Net migration":"#5 G7 Net Migration Rate",'Infant mortality (per 1000 births)':"#6 G7 Infant Mortality (Per 1000 births)", 'GDP ($ per capita)':"#7 G7 GDP ($ per capita)",'Literacy (%)':"#8 G7 Literacy",'Phones (per 1000)':"#9 G7 Number of Phones (Per 1000)",'Arable (%)':"#10 G7 Arable Land %",'Crops (%)':"#11 G7 Crops %",'Birthrate':"#12 G7 Birthrate",'Deathrate':"#13 G7 Deathrate"}
for key,value in group_seven_dict.items():
    group_seven_function(key,value)

Comparison among G7 countries (Quick facts):
* **USA** is the most populated
* **Canada** is the largest in terms of area and is closely followed by the **USA**
* **Japan** has the highest population density **(>300)** while **Canada** has the lowest **(<15)**
* **Japan** has the highest coast/area ratio
* Canada has the highest migration rate **(~+6)**
* **Japan** has the lowest infant mortality rate and it is also one of the lowest in the world
* **USA** has the highest GDP per capita. Together, the **G7 countries** account for more than 46% of the global gross domestic product (GDP) based on nominal values, and more than 32% of the global GDP based on purchasing power parity (source: Wikipedia)
* All countries have a high literacy rate **(>90)**
* **USA** has the maximum number of phones per 1000 people while **Italy** has the lowest
* **Canada** has the lowest arable land % while **Germany and France** have the highest
* **Italy** has the highest crops % **(>9%)**
* Birthrate is highest for **USA** while lowest for **Germany**
* Deathrate is highest for **Germany**

# Conclusion

This brings us to the end of the notebook.

As you saw, we performed the necessary data preprocessing and then built visualizations to understand regions in terms of population, infant mortality, etc., correlations among the numeric columns, and comparisons among the BRICS and G7 countries.

I hope you enjoyed the notebook and learned a bunch of new things (as much as I did while working on it!).

Since I'm a beginner, I would love to have your valuable feedback and suggestions so that I can keep on improving.

If you liked my work, please consider upvoting this notebook. It would mean a lot to me!

Thank you 😄

![](https://www.hoopoequotes.com/media/k2/items/cache/c01223eb48b3d42923a9b5c7e1edaf08_XL.jpg)