
In [None]:
# IMPORTANT: RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES,
# THEN FEEL FREE TO DELETE THIS CELL.
# NOTE: THIS NOTEBOOK ENVIRONMENT DIFFERS FROM KAGGLE'S PYTHON
# ENVIRONMENT SO THERE MAY BE MISSING LIBRARIES USED BY YOUR
# NOTEBOOK.
import kagglehub
iamsouravbanerjee_world_population_dataset_path = kagglehub.dataset_download('iamsouravbanerjee/world-population-dataset')

print('Data source import complete.')


# <p style="padding:10px;background-color:#006837;margin:0;color:#e9c46a;font-family:newtimeroman;font-size:150%;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">World Population Analysis</p>

<p style="text-align:center; ">
<img src="https://thumbs.dreamstime.com/b/world-population-13912340.jpg" style='width: 350px; height: 350px;'>
</p>

<p style="text-align:justify; ">
The US Census Bureau's June 2019 estimate puts the global population at 7,577,130,400 people, well above the 7.2 billion recorded in 2015. Our own estimate based on UN data shows the world's population surpassing 7.7 billion.
<br><br>
China is the most populous country in the world, with a population exceeding 1.4 billion. It is one of just two countries with more than 1 billion people, India being the other. As of 2018, India has a population of over 1.355 billion, and its growth is expected to continue through at least 2050. By 2030, India is expected to become the most populous country in the world, because its population will keep growing while China's is projected to decline.
<br><br>
The next 11 most populous countries each have more than 100 million people: the United States, Indonesia, Brazil, Pakistan, Nigeria, Bangladesh, Russia, Mexico, Japan, Ethiopia, and the Philippines. All of them are expected to keep growing except Russia and Japan, whose populations are projected to drop by 2030 and fall significantly further by 2050.
<br><br>
Many other nations have populations of at least one million, while some have only a few thousand. The smallest population in the world is that of Vatican City, where only 801 people reside.
</p>    

<a id='top'></a>
<div class="list-group" id="list-tab" role="tablist">
<p style="padding:10px;background-color:#006837;margin:0;color:#e9c46a;font-family:newtimeroman;font-size:130%;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">Table Of Contents</p>   
    
     
   
|No  | Contents |No  | Contents  |
|:---| :---     |:---| :----     |
|1   | [<font color="#006837"> Importing Libraries</font>](#1)                   |7   | [<font color="#006837"> Population Density</font>](#7)       |
|2   | [<font color="#006837"> About Dataset</font>](#2)                         |8   | [<font color="#006837"> Population Growth Rate</font>](#8)   |
|3   | [<font color="#006837"> Basic Exploration</font>](#3)                     |9   | [<font color="#006837"> Country Rank</font>](#9)             |
|4   | [<font color="#006837"> Dataset Summary</font>](#4)                       |10  | [<font color="#006837"> Correlation Map</font>](#10)         |
|5   | [<font color="#006837"> Custom Palette For Visualization</font>](#5)      |11  | [<font color="#006837"> Thank You</font>](#11)               |
|6   | [<font color="#006837"> Population</font>](#6)                            |    |                                                              |
   
   

<a id="1"></a>
# <p style="padding:10px;background-color:#006837;margin:0;color:#e9c46a;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">Importing Libraries</p>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from sklearn.preprocessing import LabelEncoder

import warnings
warnings.filterwarnings('ignore')


from plotly.offline import init_notebook_mode
init_notebook_mode(connected=True)

<a id="2"></a>
# <p style="padding:10px;background-color:#006837;margin:0;color:#e9c46a;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">About Dataset</p>

* **Rank:** Rank by population
* **CCA3:** 3-letter country/territory code
* **Country/Territory:** Name of the country/territory
* **Capital:** Name of the capital
* **Continent:** Name of the continent
* **2022 Population:** Population of the country/territory in 2022
* **2020 Population:** Population of the country/territory in 2020
* **2015 Population:** Population of the country/territory in 2015
* **2010 Population:** Population of the country/territory in 2010
* **2000 Population:** Population of the country/territory in 2000
* **1990 Population:** Population of the country/territory in 1990
* **1980 Population:** Population of the country/territory in 1980
* **1970 Population:** Population of the country/territory in 1970
* **Area (km²):** Area of the country/territory in square kilometers
* **Density (per km²):** Population density per square kilometer
* **Growth Rate:** Population growth rate of the country/territory
* **World Population Percentage:** Share of the world population held by the country/territory
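
The derived columns in this list can be recomputed from the raw ones. Here is a minimal sketch, assuming that density is population divided by area and that the world share is each row's population over the column total. The two rows are invented toy values, not figures from the dataset:

```python
import pandas as pd

# Toy rows (hypothetical values) that reuse the dataset's column names.
toy = pd.DataFrame({
    "Country/Territory": ["A", "B"],
    "2022 Population": [3_000_000, 1_000_000],
    "Area (km²)": [30_000, 5_000],
})

# Density: people per square kilometer.
toy["Density (per km²)"] = toy["2022 Population"] / toy["Area (km²)"]

# Share of the total population, in percent.
toy["World Population Percentage"] = (
    100 * toy["2022 Population"] / toy["2022 Population"].sum()
)

print(toy)
```

Running the same two formulas on the real columns is a quick way to check whether the published density and percentage values match the population and area figures.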

In [None]:
# Load the CSV from the folder returned by kagglehub.dataset_download above.
data = pd.read_csv(f"{iamsouravbanerjee_world_population_dataset_path}/world_population.csv")

<a id="3"></a>
# <p style="padding:10px;background-color:#006837;margin:0;color:#e9c46a;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">Basic Exploration</p>


**Let's have a glimpse of the dataset.**

In [None]:
print(f"Shape Of The Dataset : {data.shape}")
print(f"\nGlimpse Of The Dataset :")
data.head().style.set_properties(**{"background-color": "#006837","color":"#e9c46a","border": "1.5px solid black"})

In [None]:
print("Information About The Dataset :\n")
data.info()  # info() prints its own report and returns None, so no print() wrapper

<a id="4"></a>
# <p style="padding:10px;background-color:#006837;margin:0;color:#e9c46a;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">Dataset Summary</p>

In [None]:
print(f"Summary Of The Dataset :")
data.describe().style.set_properties(**{"background-color": "#006837","color":"#e9c46a","border": "1.5px solid black"})

In [None]:
data.describe(include=object).T.style.set_properties(**{"background-color": "#006837","color":"#e9c46a","border": "1.5px solid black"})

In [None]:
print(f"Null values of the Dataset :")
data.isna().sum().to_frame().T.style.set_properties(**{"background-color": "#006837","color":"#e9c46a","border": "1.5px solid black"})

**Insights:**

* There are no missing values in this dataset.
* We will encode the categorical features into numerical form later.
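
The encoding step mentioned above could look like the following. This is only a sketch: it uses pandas categorical codes rather than the `LabelEncoder` imported earlier, and the `Continent_code` column and the toy rows are assumptions made for illustration:

```python
import pandas as pd

# Toy frame with one categorical column, named as in the dataset.
toy = pd.DataFrame({"Continent": ["Asia", "Africa", "Europe", "Asia"]})

# Categories are sorted alphabetically, so Africa -> 0, Asia -> 1, Europe -> 2.
toy["Continent_code"] = toy["Continent"].astype("category").cat.codes

print(toy)
```

Integer codes like these impose an arbitrary order on the continents, which is acceptable for a correlation overview but worth keeping in mind when interpreting the numbers.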


<a id="5"></a>
# <p style="padding:10px;background-color:#006837;margin:0;color:#e9c46a;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">Custom Palette For Visualization</p>

In [None]:
sns.set_style("white")
sns.set(rc={"axes.facecolor":"#D5CE98","figure.facecolor":"#D5CE98"})
sns.set_context("poster",font_scale = .7)

palette = ["#006837","#1A9850","#66BD63","#A6D96A","#D9EF8B","#FFFFBF","#FEE08B","#FDAE61","#F46D43","#D73027","#A50026"]
palette_cmap = ["#A50026","#D73027","#F46D43","#FDAE61","#FEE08B","#FFFFBF","#D9EF8B","#A6D96A","#66BD63","#1A9850","#006837"]

# sns.palplot(sns.color_palette(palette))
# sns.palplot(sns.color_palette(palette_cmap))
# plt.show()

<a id="6"></a>
# <p style="padding:10px;background-color:#006837;margin:0;color:#e9c46a;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">Population</p>

In [None]:
print("Let's have a look at the population :")
_, axs = plt.subplots(2,1,figsize=(20,16))
plt.tight_layout(pad=7.0)

sns.barplot(x=data["Country/Territory"],y=data["2022 Population"],order=data.sort_values("2022 Population",ascending=True)["Country/Territory"][:11],ax=axs[0],palette=palette, saturation=1,edgecolor = "#1c1c1c", linewidth = 4)
axs[0].set_yscale("linear")
axs[0].set_title("Least Populated Countries In 2022",fontsize=25)
axs[0].set_xlabel("\nCountry",fontsize=20)
axs[0].set_ylabel("Population",fontsize=20)
axs[0].set_xticklabels(axs[0].get_xticklabels(),rotation = 12)
for container in axs[0].containers:
    axs[0].bar_label(container,label_type="center",padding=6,size=18,color="black",rotation=0,
    bbox={"boxstyle": "round", "pad": 0.4, "facecolor": "orange", "edgecolor": "#1c1c1c", "linewidth" : 3, "alpha": 1})


sns.barplot(x=data["Country/Territory"],y=data["2022 Population"],order=data.sort_values("2022 Population",ascending=False)["Country/Territory"][:11],ax=axs[1],palette=palette, saturation=1,edgecolor = "#1c1c1c", linewidth = 4)
axs[1].set_yscale("log")
axs[1].set_title("Top Populated Countries In 2022",fontsize=25)
axs[1].set_xlabel("\nCountry",fontsize=20)
axs[1].set_ylabel("Population",fontsize=20)
axs[1].set_xticklabels(axs[1].get_xticklabels(),rotation = 0)
for container in axs[1].containers:
    axs[1].bar_label(container,label_type="edge",padding=6,size=18,color="black",rotation=0,
    bbox={"boxstyle": "round", "pad": 0.4, "facecolor": "orange", "edgecolor": "#1c1c1c", "linewidth" : 3, "alpha": 1})


sns.despine(left=True, bottom=True)
plt.show()

**Insights:**

* China is the most populous country with 1.4B people, followed by India, the United States and others.
* Vatican City is the least populous country with 510 people, followed by Tokelau, Niue and others.
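
The rankings behind these insights can also be read off without a chart. Here is a small sketch on invented toy rows (the names and numbers are hypothetical), using `nlargest` and `nsmallest`:

```python
import pandas as pd

# Toy rows (hypothetical values) that reuse the dataset's column names.
toy = pd.DataFrame({
    "Country/Territory": ["A", "B", "C", "D"],
    "2022 Population": [500, 40_000, 1_200, 9_000_000],
})

# Two most and two least populous rows.
top = toy.nlargest(2, "2022 Population")["Country/Territory"].tolist()
bottom = toy.nsmallest(2, "2022 Population")["Country/Territory"].tolist()

print(top)
print(bottom)
```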

In [None]:
data_pop = data.copy()
# Select several columns with a list (double brackets); tuple-style selection fails on pandas 2.x.
data_pop = pd.DataFrame(data_pop.groupby(["Continent"])[["1970 Population","1980 Population","1990 Population","2000 Population","2010 Population","2015 Population","2020 Population","2022 Population"]].sum())

print("Let's have a look at the continent-wise population in 2022 :")
plt.subplots(figsize=(20,8))
p=sns.barplot(x=data_pop.index, y=data_pop["2022 Population"],order=data_pop.sort_values("2022 Population",ascending=False).index,palette=palette[0:11:2], saturation=1,edgecolor = "#1c1c1c", linewidth = 4)
p.set_yscale("log")
p.set_title("Continent-wise Population [2022]",fontsize=25)
p.set_xlabel("\nContinent",fontsize=20)
p.set_ylabel("Population",fontsize=20)
p.set_xticklabels(p.get_xticklabels(),rotation = 0)
for container in p.containers:
    p.bar_label(container,label_type="edge",padding=6,size=18,color="black",rotation=0,
    bbox={"boxstyle": "round", "pad": 0.4, "facecolor": "orange", "edgecolor": "#1c1c1c", "linewidth" : 5, "alpha": 1})

sns.despine(left=True, bottom=True)
plt.show()

**Insights:**

* Asia is the most populous continent with 4.7B people, followed by Africa, Europe and others.


In [None]:
print("Let's have a look at the ratios of continent-wise population in 2022 :")
plt.subplots(figsize=(12, 12))

# Sort once and reuse the result; the labels follow the sorted order.
pop_2022 = data_pop["2022 Population"].sort_values(ascending=False)
labels = pop_2022.index.tolist()
size = 0.5

wedges, texts, autotexts = plt.pie(pop_2022.values,
                                    explode = [0] * len(pop_2022),
                                    textprops=dict(size= 20, color= "white"),
                                    autopct="%.2f%%",
                                    pctdistance = 0.72,
                                    radius=.9,
                                    colors = palette[0:11:2],
                                    shadow = True,
                                    wedgeprops=dict(width = size, edgecolor = "black",
                                    linewidth = 4),
                                    startangle = -15)

plt.legend(wedges, labels, title="Continent",loc="center left",bbox_to_anchor=(1, 0, 0.5, 1), edgecolor = "black")
plt.title("\nContinent-wise Population Ratio [2022]",fontsize=25)
plt.show()

**Insights:**

* Of the world population, 59.21% lives in Asia, 17.89% in Africa, 9.32% in Europe, 7.53% in North America, 5.48% in South America and 0.56% in Oceania.
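
The percentages that the pie chart prints can be computed directly. Here is a minimal sketch on toy numbers (hypothetical, chosen so the shares are easy to verify by hand):

```python
import pandas as pd

# Toy rows (hypothetical values); two rows share a continent on purpose.
toy = pd.DataFrame({
    "Continent": ["Asia", "Asia", "Africa", "Europe"],
    "2022 Population": [50, 10, 30, 10],
})

# Sum per continent, then express each total as a percentage of the grand total.
totals = toy.groupby("Continent")["2022 Population"].sum()
share = (100 * totals / totals.sum()).sort_values(ascending=False)

print(share.round(2))
```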


In [None]:
print("Population in Asia in 2022 :")

fig = px.choropleth(data_frame = data,
                    locations="Country/Territory",locationmode="country names", color="2022 Population",
                    color_continuous_scale=palette[:10],height= 600,scope="asia",
                    labels={"2022 Population":"Population"})


fig.update_layout(title=dict(text= "Population In Asia [2022]",
                             y=0.95,x=0.5,xanchor= "center",yanchor= "top",font_color="black"),
                  margin=dict(l=0, r=0, b=0, t=0),
                  geo_bgcolor="#D5CE98",
                  paper_bgcolor="#D5CE98")

fig.show()

In [None]:
print("Population in Asia in 2022 :")
plt.subplots(figsize=(20,8))
p=sns.barplot(data=data[data["Continent"]=="Asia"],x="Country/Territory", y="2022 Population",order=data[data["Continent"]=="Asia"].sort_values("2022 Population",ascending=False)["Country/Territory"][:11],palette=palette[0:11:2], saturation=1,edgecolor = "#1c1c1c", linewidth = 4)
p.set_yscale("log")
p.set_title("Population In Asia [2022]",fontsize=25)
p.set_xlabel("\nCountry",fontsize=20)
p.set_ylabel("Population",fontsize=20)
p.set_xticklabels(p.get_xticklabels(),rotation = 0)
for container in p.containers:
    p.bar_label(container,label_type="edge",padding=6,size=18,color="black",rotation=0,
    bbox={"boxstyle": "round", "pad": 0.4, "facecolor": "orange", "edgecolor": "#1c1c1c", "linewidth" : 5, "alpha": 1})

sns.despine(left=True, bottom=True)
plt.show()

**Insights:**

* China is leading in Asia with 1.42B people followed by India, Indonesia and other countries.

In [None]:
print("Population in Africa in 2022 :")

fig = px.choropleth(data_frame = data,
                    locations="Country/Territory",locationmode="country names", color="2022 Population",
                    color_continuous_scale=palette[:10],height= 600,scope="africa",
                    labels={"2022 Population":"Population"})


fig.update_layout(title=dict(text= "Population In Africa [2022]",
                             y=0.95,x=0.5,xanchor= "center",yanchor= "top",font_color="black"),
                  margin=dict(l=0, r=0, b=0, t=0),
                  geo_bgcolor="#D5CE98",
                  paper_bgcolor="#D5CE98")

fig.show()

In [None]:
print("Population in Africa in 2022 :")
plt.subplots(figsize=(20,8))
p=sns.barplot(data=data[data["Continent"]=="Africa"],x="Country/Territory", y="2022 Population",order=data[data["Continent"]=="Africa"].sort_values("2022 Population",ascending=False)["Country/Territory"][:11],palette=palette[0:11:2], saturation=1,edgecolor = "#1c1c1c", linewidth = 4)
p.set_yscale("log")
p.set_title("Population In Africa [2022]",fontsize=25)
p.set_xlabel("\nCountry",fontsize=20)
p.set_ylabel("Population",fontsize=20)
p.set_xticklabels(p.get_xticklabels(),rotation = 0)
for container in p.containers:
    p.bar_label(container,label_type="edge",padding=6,size=18,color="black",rotation=0,
    bbox={"boxstyle": "round", "pad": 0.4, "facecolor": "orange", "edgecolor": "#1c1c1c", "linewidth" : 5, "alpha": 1})

sns.despine(left=True, bottom=True)
plt.show()

**Insights:**

* Nigeria is leading in Africa with 218.5M people followed by Ethiopia, Egypt and other countries.

In [None]:
print("Population in Europe in 2022 :")

fig = px.choropleth(data_frame = data,
                    locations="Country/Territory",locationmode="country names", color="2022 Population",
                    color_continuous_scale=palette[:10],height= 600,scope="europe",
                    labels={"2022 Population":"Population"})


fig.update_layout(title=dict(text= "Population In Europe [2022]",
                             y=0.95,x=0.5,xanchor= "center",yanchor= "top",font_color="black"),
                  margin=dict(l=0, r=0, b=0, t=0),
                  geo_bgcolor="#D5CE98",
                  paper_bgcolor="#D5CE98")

fig.show()

In [None]:
print("Population in Europe in 2022 :")
plt.subplots(figsize=(20,8))
p=sns.barplot(data=data[data["Continent"]=="Europe"],x="Country/Territory", y="2022 Population",order=data[data["Continent"]=="Europe"].sort_values("2022 Population",ascending=False)["Country/Territory"][:11],palette=palette[0:11:2], saturation=1,edgecolor = "#1c1c1c", linewidth = 4)
p.set_yscale("log")
p.set_title("Population In Europe [2022]",fontsize=25)
p.set_xlabel("\nCountry",fontsize=20)
p.set_ylabel("Population",fontsize=20)
p.set_xticklabels(p.get_xticklabels(),rotation = 0)
for container in p.containers:
    p.bar_label(container,label_type="edge",padding=6,size=18,color="black",rotation=0,
    bbox={"boxstyle": "round", "pad": 0.4, "facecolor": "orange", "edgecolor": "#1c1c1c", "linewidth" : 5, "alpha": 1})

sns.despine(left=True, bottom=True)
plt.show()

**Insights:**

* Russia is leading in Europe with 144.7M people followed by Germany, United Kingdom and other countries.

In [None]:
print("Population in North America in 2022 :")

fig = px.choropleth(data_frame = data,
                    locations="Country/Territory",locationmode="country names", color="2022 Population",
                    color_continuous_scale=palette[:10],height= 600,scope="north america",
                    labels={"2022 Population":"Population"})


fig.update_layout(title=dict(text= "Population In North America [2022]",
                             y=0.95,x=0.5,xanchor= "center",yanchor= "top",font_color="black"),
                  margin=dict(l=0, r=0, b=0, t=0),
                  geo_bgcolor="#D5CE98",
                  paper_bgcolor="#D5CE98")

fig.show()

In [None]:
print("Population in North America in 2022 :")
plt.subplots(figsize=(20,8))
p=sns.barplot(data=data[data["Continent"]=="North America"],x="Country/Territory", y="2022 Population",order=data[data["Continent"]=="North America"].sort_values("2022 Population",ascending=False)["Country/Territory"][:11],palette=palette[0:11:2], saturation=1,edgecolor = "#1c1c1c", linewidth = 4)
p.set_yscale("log")
p.set_title("Population In North America [2022]",fontsize=25)
p.set_xlabel("\nCountry",fontsize=20)
p.set_ylabel("Population",fontsize=20)
p.set_xticklabels(p.get_xticklabels(),rotation = 0)
for container in p.containers:
    p.bar_label(container,label_type="edge",padding=6,size=18,color="black",rotation=0,
    bbox={"boxstyle": "round", "pad": 0.4, "facecolor": "orange", "edgecolor": "#1c1c1c", "linewidth" : 5, "alpha": 1})

sns.despine(left=True, bottom=True)
plt.show()

**Insights:**

* The United States is leading in North America with 338.3M people, followed by Mexico, Canada and other countries.

In [None]:
print("Population in South America in 2022 :")

fig = px.choropleth(data_frame = data,
                    locations="Country/Territory",locationmode="country names", color="2022 Population",
                    color_continuous_scale=palette[:10],height= 600,scope="south america",
                    labels={"2022 Population":"Population"})


fig.update_layout(title=dict(text= "Population In South America [2022]",
                             y=0.95,x=0.5,xanchor= "center",yanchor= "top",font_color="black"),
                  margin=dict(l=0, r=0, b=0, t=0),
                  geo_bgcolor="#D5CE98",
                  paper_bgcolor="#D5CE98")

fig.show()

In [None]:
print("Population in South America in 2022 :")
plt.subplots(figsize=(20,8))
p=sns.barplot(data=data[data["Continent"]=="South America"],x="Country/Territory", y="2022 Population",order=data[data["Continent"]=="South America"].sort_values("2022 Population",ascending=False)["Country/Territory"][:11],palette=palette[0:11:2], saturation=1,edgecolor = "#1c1c1c", linewidth = 4)
p.set_yscale("log")
p.set_title("Population In South America [2022]",fontsize=25)
p.set_xlabel("\nCountry",fontsize=20)
p.set_ylabel("Population",fontsize=20)
p.set_xticklabels(p.get_xticklabels(),rotation = 0)
for container in p.containers:
    p.bar_label(container,label_type="edge",padding=6,size=18,color="black",rotation=0,
    bbox={"boxstyle": "round", "pad": 0.4, "facecolor": "orange", "edgecolor": "#1c1c1c", "linewidth" : 5, "alpha": 1})

sns.despine(left=True, bottom=True)
plt.show()

**Insights:**

* Brazil is leading in South America with 215.3M people followed by Colombia, Argentina and other countries.

In [None]:
print("Let's have a look at the timeline of continent-wise population :")
_, axs = plt.subplots(figsize=(20,10))

sns.lineplot(x=data_pop.T.index ,y=data_pop.T["Asia"],data=data_pop.T,ax=axs,color="#006837",marker="o",linewidth=5,markersize=20)
sns.lineplot(x=data_pop.T.index ,y=data_pop.T["Africa"],data=data_pop.T,ax=axs,color="#66BD63",marker="o",linewidth=5,markersize=20)
sns.lineplot(x=data_pop.T.index ,y=data_pop.T["Europe"],data=data_pop.T,ax=axs,color="#D9EF8B",marker="o",linewidth=5,markersize=20)
sns.lineplot(x=data_pop.T.index ,y=data_pop.T["North America"],data=data_pop.T,ax=axs,color="#FEE08B",marker="o",linewidth=5,markersize=20)
sns.lineplot(x=data_pop.T.index ,y=data_pop.T["South America"],data=data_pop.T,ax=axs,color="#F46D43",marker="o",linewidth=5,markersize=20)
sns.lineplot(x=data_pop.T.index ,y=data_pop.T["Oceania"],data=data_pop.T,ax=axs,color="#A50026",marker="o",linewidth=5,markersize=20)

axs.set_title("Continent-wise Population Timeline\n",fontsize=25)
axs.set_xlabel("\nYear",fontsize=20)
axs.set_ylabel("Population",fontsize=20)
axs.legend(["Asia","Africa","Europe","North America","South America","Oceania"],title="Continent", edgecolor = "#1c1c1c")
# axs.set_xticks([],minor=False)

sns.despine(left=True, bottom=True)
plt.show()

**Insights:**

* In every continent, the population increases over time.
* Asia's population increases the most, followed by Africa's.
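
Whether a continent is "increasing the most" depends on the measure used. Here is a sketch with hypothetical totals (not the dataset's figures) that contrasts the absolute gain with the multiplicative growth factor over the period; the column names `absolute_change` and `growth_factor` are introduced here for illustration:

```python
import pandas as pd

# Hypothetical continent totals for the first and last year of the timeline.
toy = pd.DataFrame(
    {"1970 Population": [200, 40], "2022 Population": [450, 140]},
    index=["X", "Y"],
)

# Absolute gain and multiplicative growth factor over the period.
toy["absolute_change"] = toy["2022 Population"] - toy["1970 Population"]
toy["growth_factor"] = toy["2022 Population"] / toy["1970 Population"]

print(toy)
```

In this toy case X gains more people, while Y grows faster relative to its starting size, so the two measures rank the continents differently.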


In [None]:
plt.subplots(figsize=(20,10))
p=sns.scatterplot(x=data["Density (per km²)"], y=data["Area (km²)"], hue=data["Continent"],size=data["2022 Population"],palette=palette[0:11:2], edgecolor = "#1c1c1c", linewidth = 2,sizes=(100, 9000),alpha=1)
p.set_xscale("symlog")
p.set_yscale("linear")
p.set_title("Continent-wise Population Distribution [2022]",fontsize=25)
p.set_xlabel("Density/km²",fontsize=20)
p.set_ylabel("\nArea/km²",fontsize=20)

sns.despine(left=True, bottom=True)
plt.show()

**Insights:**

* Almost every Asian country combines a large population with a high population density (per km²).

<a id="7"></a>
# <p style="padding:10px;background-color:#006837;margin:0;color:#e9c46a;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">Population Density</p>

In [None]:
print("Let's have a look at the distribution of population density :")
plt.subplots(figsize=(20, 8))
p = sns.histplot(data["Density (per km²)"],color="#A50026",kde=True,bins=60,alpha=1,fill=True,edgecolor="black",linewidth=3)
p.axes.lines[0].set_color("#e9c46a")
p.axes.set_yscale("symlog")
p.axes.set_title("\nPopulation Density/km² Distribution\n",fontsize=25)
plt.ylabel("Count",fontsize=20)
plt.xlabel("\nDensity/km²",fontsize=20)
sns.despine(left=True, bottom=True)

plt.show()

**Insights:**

* Almost all countries have a population density between 0 and 2000 people per km².

In [None]:
print("Let's have a look at the population density :")
_, axs = plt.subplots(2,1,figsize=(20,16))
plt.tight_layout(pad=6.0)

sns.barplot(x=data["Country/Territory"],y=data["Density (per km²)"],order=data.sort_values("Density (per km²)",ascending=False)["Country/Territory"][:11],ax=axs[0],palette=palette, saturation=1,edgecolor = "#1c1c1c", linewidth = 4)
axs[0].set_yscale("linear")
axs[0].set_title("Most Densely Populated Countries",fontsize=25)
axs[0].set_xlabel("\nCountry",fontsize=20)
axs[0].set_ylabel("Density/km²",fontsize=20)
axs[0].set_xticklabels(axs[0].get_xticklabels(),rotation = 0)
for container in axs[0].containers:
    axs[0].bar_label(container,label_type="center",padding=6,size=18,color="black",rotation=0,
    bbox={"boxstyle": "round", "pad": 0.4, "facecolor": "orange", "edgecolor": "#1c1c1c", "linewidth" : 3, "alpha": 1})

sns.barplot(x=data["Country/Territory"],y=data["Density (per km²)"],order=data.sort_values("Density (per km²)",ascending=True)["Country/Territory"][:11],ax=axs[1],palette=palette, saturation=1,edgecolor = "#1c1c1c", linewidth = 4)
axs[1].set_yscale("linear")
axs[1].set_title("Least Densely Populated Countries",fontsize=25)
axs[1].set_xlabel("\nCountry",fontsize=20)
axs[1].set_ylabel("Density/km²",fontsize=20)
axs[1].set_xticklabels(axs[1].get_xticklabels(),rotation = 15)
for container in axs[1].containers:
    axs[1].bar_label(container,label_type="center",padding=6,size=18,color="black",rotation=0,
    bbox={"boxstyle": "round", "pad": 0.4, "facecolor": "orange", "edgecolor": "#1c1c1c", "linewidth" : 3, "alpha": 1})

sns.despine(left=True, bottom=True)
plt.show()

**Insights:**

* Macau is the most densely populated country, with more than 23172 people per square kilometer, followed by Monaco, Singapore and others.
* Greenland is the least densely populated country, with 0.0261 people per square kilometer, followed by the Falkland Islands, Western Sahara and others.

In [None]:
data_den = data.copy()
# Select several columns with a list (double brackets); tuple-style selection fails on pandas 2.x.
data_den = pd.DataFrame(data_den.groupby(["Continent"])[["1970 Population","1980 Population","1990 Population","2000 Population","2010 Population","2015 Population","2020 Population","2022 Population","Area (km²)"]].sum())
col = ["1970 Population","1980 Population","1990 Population","2000 Population","2010 Population","2015 Population","2020 Population","2022 Population"]
for i in col:
    data_den[i] = data_den[i]/data_den["Area (km²)"]
data_den.rename(columns={"1970 Population":"1970 Density","1980 Population":"1980 Density","1990 Population":"1990 Density","2000 Population":"2000 Density","2010 Population":"2010 Density","2015 Population":"2015 Density","2020 Population":"2020 Density","2022 Population":"2022 Density"},inplace=True)
data_den.drop(columns="Area (km²)",inplace=True)


print("Let's have a look at the timeline of continent-wise population density :")
_, axs = plt.subplots(figsize=(20,10))

sns.lineplot(x=data_den.T.index ,y=data_den.T["Asia"],data=data_den.T,ax=axs,color="#006837",marker="o",linewidth=5,markersize=20)
sns.lineplot(x=data_den.T.index ,y=data_den.T["Africa"],data=data_den.T,ax=axs,color="#66BD63",marker="o",linewidth=5,markersize=20)
sns.lineplot(x=data_den.T.index ,y=data_den.T["Europe"],data=data_den.T,ax=axs,color="#D9EF8B",marker="o",linewidth=5,markersize=20)
sns.lineplot(x=data_den.T.index ,y=data_den.T["North America"],data=data_den.T,ax=axs,color="#FEE08B",marker="o",linewidth=5,markersize=20)
sns.lineplot(x=data_den.T.index ,y=data_den.T["South America"],data=data_den.T,ax=axs,color="#F46D43",marker="o",linewidth=5,markersize=20)
sns.lineplot(x=data_den.T.index ,y=data_den.T["Oceania"],data=data_den.T,ax=axs,color="#A50026",marker="o",linewidth=5,markersize=20)

axs.set_title("Continent-wise Population Density Timeline\n",fontsize=25)
axs.set_xlabel("\nYear",fontsize=20)
axs.set_ylabel("Population Density",fontsize=20)
axs.legend(["Asia","Africa","Europe","North America","South America","Oceania"],title="Continent", edgecolor = "#1c1c1c")
# axs.set_xticks([],minor=False)

sns.despine(left=True, bottom=True)
plt.show()

**Insights:**

* In every continent, population density (per km²) increases over time.
* Asia's population density increases the most, followed by Africa's.
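
The continent densities above are total population divided by total area, which is not the same as averaging the per-country densities. Here is a sketch on two invented rows (hypothetical values) that shows how far the two can diverge when a small, dense territory sits next to a large, sparse one:

```python
import pandas as pd

# Toy rows (hypothetical values): one small dense territory, one large sparse one.
toy = pd.DataFrame({
    "Continent": ["X", "X"],
    "2022 Population": [1_000, 9_000],
    "Area (km²)": [10, 990],
})

# Aggregate density: sum of population over sum of area.
sums = toy.groupby("Continent")[["2022 Population", "Area (km²)"]].sum()
aggregate_density = sums["2022 Population"] / sums["Area (km²)"]

# Alternative: unweighted mean of the per-country densities.
mean_of_densities = (
    (toy["2022 Population"] / toy["Area (km²)"])
    .groupby(toy["Continent"])
    .mean()
)

print(aggregate_density)
print(mean_of_densities)
```

The aggregate figure answers "how crowded is the continent as a whole", whereas the unweighted mean is dominated by small, dense territories.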

<a id="8"></a>
# <p style="padding:10px;background-color:#006837;margin:0;color:#e9c46a;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">Population Growth Rate</p>

In [None]:
print("Let's have a look at the population growth rate :")
_, axs = plt.subplots(2,1,figsize=(20,16))
plt.tight_layout(pad=6.0)

sns.barplot(x=data["Country/Territory"],y=data["Growth Rate"],order=data.sort_values("Growth Rate",ascending=False)["Country/Territory"][:11],ax=axs[0],palette=palette, saturation=1,edgecolor = "#1c1c1c", linewidth = 4)
axs[0].set_yscale("linear")
axs[0].set_title("Highest Population Growth Rate Countries",fontsize=25)
axs[0].set_xlabel("\nCountry",fontsize=20)
axs[0].set_ylabel("Growth Rate",fontsize=20)
axs[0].set_xticklabels(axs[0].get_xticklabels(),rotation = 0)
for container in axs[0].containers:
    axs[0].bar_label(container,label_type="center",padding=6,size=18,color="black",rotation=0,
    bbox={"boxstyle": "round", "pad": 0.4, "facecolor": "orange", "edgecolor": "#1c1c1c", "linewidth" : 3, "alpha": 1})

sns.barplot(x=data["Country/Territory"],y=data["Growth Rate"],order=data.sort_values("Growth Rate",ascending=True)["Country/Territory"][:11],ax=axs[1],palette=palette, saturation=1,edgecolor = "#1c1c1c", linewidth = 4)
axs[1].set_yscale("linear")
axs[1].set_title("Lowest Population Growth Rate Countries",fontsize=25)
axs[1].set_xlabel("\nCountry",fontsize=20)
axs[1].set_ylabel("Growth Rate",fontsize=20)
axs[1].set_xticklabels(axs[1].get_xticklabels(),rotation = 30)
for container in axs[1].containers:
    axs[1].bar_label(container,label_type="center",padding=6,size=18,color="black",rotation=0,
    bbox={"boxstyle": "round", "pad": 0.4, "facecolor": "orange", "edgecolor": "#1c1c1c", "linewidth" : 3, "alpha": 1})

sns.despine(left=True, bottom=True)
plt.show()

**Insights:**

* Moldova has the highest population growth rate, 1.0691, followed by Poland, Niger and others.
* Ukraine has the lowest population growth rate, 0.912, followed by Lebanon, American Samoa and others.
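
The values quoted above sit close to 1, which suggests the `Growth Rate` column is a ratio in which 1 means no change. Under that assumption (it is not stated in the dataset description), the ratios convert to percentage changes as follows:

```python
# Growth Rate values quoted in the insights above.
rates = {"Moldova": 1.0691, "Ukraine": 0.912}

# Assumption: a ratio of 1.0 means no change, so (ratio - 1) * 100 is the percent change.
pct_change = {name: round((r - 1) * 100, 2) for name, r in rates.items()}

print(pct_change)
```

Read this way, Moldova's figure corresponds to an increase of about 6.91% and Ukraine's to a decrease of about 8.8%.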

In [None]:
print("Population Growth Rate in Asia :")

fig = px.choropleth(data_frame = data,
                    locations="Country/Territory",locationmode="country names", color="Growth Rate",
                    color_continuous_scale=palette[0:5],height= 600,scope="asia",
                    labels={"Growth Rate":"Growth Rate"})


fig.update_layout(title=dict(text= "Population Growth Rate In Asia",
                             y=0.95,x=0.5,xanchor= "center",yanchor= "top",font_color="black"),
                  margin=dict(l=0, r=0, b=0, t=0),
                  geo_bgcolor="#D5CE98",
                  paper_bgcolor="#D5CE98")

fig.show()

**Insights:**

* Syria is leading in Asia with a population growth rate of 1.0376 followed by Afghanistan, Palestine and other countries.

In [None]:
print("Population Growth Rate in Africa :")

fig = px.choropleth(data_frame = data,
                    locations="Country/Territory",locationmode="country names", color="Growth Rate",
                    color_continuous_scale=palette[0:5],height= 600,scope="africa",
                    labels={"Growth Rate":"Growth Rate"})


fig.update_layout(title=dict(text= "Population Growth Rate In Africa",
                             y=0.95,x=0.5,xanchor= "center",yanchor= "top",font_color="black"),
                  margin=dict(l=0, r=0, b=0, t=0),
                  geo_bgcolor="#D5CE98",
                  paper_bgcolor="#D5CE98")

fig.show()

**Insights:**

* Niger is leading in Africa with a population growth rate of 1.0378 followed by DR Congo, Mayotte and other countries.

In [None]:
print("Population Growth Rate in Europe :")

fig = px.choropleth(data_frame = data,
                    locations="Country/Territory",locationmode="country names", color="Growth Rate",
                    color_continuous_scale=palette[0:5],height= 600,scope="europe",
                    labels={"Growth Rate":"Growth Rate"})


fig.update_layout(title=dict(text= "Population Growth Rate In Europe",
                             y=0.95,x=0.5,xanchor= "center",yanchor= "top",font_color="black"),
                  margin=dict(l=0, r=0, b=0, t=0),
                  geo_bgcolor="#D5CE98",
                  paper_bgcolor="#D5CE98")

fig.show()

**Insights:**

* Moldova leads Europe with a population growth rate of 1.0691, followed by Poland, Slovakia and other countries.

In [None]:
print("Population Growth Rate in North America :")

fig = px.choropleth(data_frame = data,
                    locations="Country/Territory",locationmode="country names", color="Growth Rate",
                    color_continuous_scale=palette[0:5],height= 600,scope="north america",
                    labels={"Growth Rate":"Growth Rate"})


fig.update_layout(title=dict(text= "Population Growth Rate In North America",
                             y=0.95,x=0.5,xanchor= "center",yanchor= "top",font_color="black"),
                  margin=dict(l=0, r=0, b=0, t=0),
                  geo_bgcolor="#D5CE98",
                  paper_bgcolor="#D5CE98")

fig.show()

**Insights:**

* Honduras leads North America with a population growth rate of 1.015, followed by Nicaragua, Guatemala and other countries.

In [None]:
print("Population Growth Rate in South America :")

fig = px.choropleth(data_frame = data,
                    locations="Country/Territory",locationmode="country names", color="Growth Rate",
                    color_continuous_scale=palette[0:5],height= 600,scope="south america",
                    labels={"Growth Rate":"Growth Rate"})


fig.update_layout(title=dict(text= "Population Growth Rate In South America",
                             y=0.95,x=0.5,xanchor= "center",yanchor= "top",font_color="black"),
                  margin=dict(l=0, r=0, b=0, t=0),
                  geo_bgcolor="#D5CE98",
                  paper_bgcolor="#D5CE98")

fig.show()

**Insights:**

* French Guiana leads South America with a population growth rate of 1.0239, followed by Bolivia, Paraguay and other countries.
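
The per-continent leaders quoted in the insights above can also be looked up in one pass rather than one map at a time. This is a rough sketch, assuming the `Continent`, `Country/Territory` and `Growth Rate` columns used throughout the notebook. The `sample` frame and the helper `continent_leaders` are hypothetical, and the values are placeholders.

```python
import pandas as pd

# Hypothetical rows; continents and values are placeholders.
sample = pd.DataFrame({
    "Country/Territory": ["A", "B", "C", "D"],
    "Continent": ["Asia", "Asia", "Africa", "Africa"],
    "Growth Rate": [1.010, 1.030, 1.020, 1.005],
})

def continent_leaders(df):
    """Return the row with the highest 'Growth Rate' in each continent."""
    # idxmax gives, per continent, the index label of the maximum growth rate.
    idx = df.groupby("Continent")["Growth Rate"].idxmax()
    cols = ["Continent", "Country/Territory", "Growth Rate"]
    return df.loc[idx, cols].reset_index(drop=True)

leaders = continent_leaders(sample)
print(leaders)
```

On the real frame this would need to run before the Correlation Map cell, which overwrites `Continent` and `Country/Territory` with integer codes.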

In [None]:
data_gr = data.copy()
data_gr = pd.DataFrame(data_gr.groupby(["Continent"])["Growth Rate"].mean())

print("Let's have a look at the continent-wise average population growth rate :")
plt.subplots(figsize=(20,8))
p=sns.barplot(x=data_gr.index, y=data_gr["Growth Rate"],order=data_gr.sort_values("Growth Rate",ascending=False).index,palette=palette[0:11:2], saturation=1,edgecolor = "#1c1c1c", linewidth = 4)
p.set_yscale("log")
p.set_title("Continent-wise Average Population Growth Rate",fontsize=25)
p.set_xlabel("\nContinent",fontsize=20)
p.set_ylabel("Average Growth Rate",fontsize=20)
p.set_xticklabels(p.get_xticklabels(),rotation = 0)
for container in p.containers:
    p.bar_label(container,label_type="edge",padding=6,size=18,color="black",rotation=0,
    bbox={"boxstyle": "round", "pad": 0.4, "facecolor": "orange", "edgecolor": "#1c1c1c", "linewidth" : 4, "alpha": 1})

sns.despine(left=True, bottom=True)
plt.show()

**Insights:**

* Africa has the highest average population growth rate at 1.02124, followed by Asia, South America and others.
* Europe has the lowest average population growth rate at 1.00226.
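
The averages above are easier to compare as percentages. The sketch below assumes that `Growth Rate` is a multiplicative factor, so that 1.0 means no change. The notebook does not define the column, so treat this reading as an assumption. The helper `ratio_to_percent` is hypothetical.

```python
def ratio_to_percent(ratio):
    """Convert a growth factor (e.g. 1.02124) to a percentage change.

    Assumes the factor is new_population / old_population.
    """
    return (ratio - 1.0) * 100.0

# Continent averages quoted in the insights above.
for name, ratio in [("Africa", 1.02124), ("Europe", 1.00226)]:
    print(f"{name}: {ratio_to_percent(ratio):.3f}%")
```

Under that assumption, Africa's average corresponds to roughly 2.1% growth and Europe's to roughly 0.2%.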

<a id="9"></a>
# <p style="padding:10px;background-color:#006837;margin:0;color:#e9c46a;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">Country Rank</p>

In [None]:
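def rank(feature,color):
    # Filter both axes to the same continent so x and y describe the same rows.
    subset = data[data["Continent"]==feature]

    _, axes = plt.subplots(figsize=(20,8))
    # kdeplot is an axes-level function, so it draws on the axes created above.
    sns.kdeplot(x=subset["Rank"], y=subset["2022 Population"],fill=True,color=color,ax=axes)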
    axes.set_title(f"\nCountry Rank Distribution [{feature}]\n",fontsize=25)
    axes.set_ylabel("Population",fontsize=20)
    axes.set_xlabel(f"\nCountry Rank [{feature} Continent]",fontsize=20)

    sns.despine(left=True, bottom=True)
    plt.show()

In [None]:
print("Let's have a look at the distribution of Asian country ranks :")
rank("Asia",palette[0])

**Insights:**

* Most Asian countries rank between 0 and 75 or between 100 and 130.


In [None]:
print("Let's have a look at the distribution of African country ranks :")
rank("Africa",palette[0])

**Insights:**

* Most African countries rank between 6 and 85; the rest are scattered.

In [None]:
print("Let's have a look at the distribution of European country ranks :")
rank("Europe",palette[0])

**Insights:**

* Most European countries rank between 60 and 160; the rest are scattered.

In [None]:
print("Let's have a look at the distribution of North American country ranks :")
rank("North America",palette[0])

**Insights:**

* Most North American countries rank between 170 and 230; the rest are scattered.

In [None]:
print("Let's have a look at the distribution of South American country ranks :")
rank("South America",palette[0])

**Insights:**

* Most South American countries rank between 20 and 110; the rest are scattered.

In [None]:
print("Let's have a look at the distribution of Oceanian country ranks :")
rank("Oceania",palette[0])

**Insights:**

* Most Oceanian countries rank between 180 and 230; the rest are scattered.
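
The rank ranges above were read off the density plots by eye. One way to put numbers on them is to compute per-continent quantiles of `Rank`. The sketch below assumes the `Continent` and `Rank` columns from `data`. The `sample` values, the helper `rank_range`, and the 10%/90% cut-offs are all hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical ranks; not taken from the dataset.
sample = pd.DataFrame({
    "Continent": ["Asia"] * 5 + ["Oceania"] * 5,
    "Rank": [1, 2, 40, 75, 120, 55, 180, 200, 215, 230],
})

def rank_range(df, lower=0.1, upper=0.9):
    """Per-continent Rank quantiles bracketing the central share of countries."""
    # quantile() with a list returns one row per (continent, quantile) pair;
    # unstack() turns the quantile level into columns.
    q = df.groupby("Continent")["Rank"].quantile([lower, upper]).unstack()
    q.columns = ["low", "high"]
    return q

ranges = rank_range(sample)
print(ranges)
```

A quantile band will not reproduce a two-cluster description such as the one given for Asia, so it complements the plots rather than replacing them.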

<a id="10"></a>
# <p style="padding:10px;background-color:#006837;margin:0;color:#e9c46a;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">Correlation Map</p>

In [None]:
catcol = ["CCA3","Country/Territory","Capital","Continent"]
le = LabelEncoder()
# Replace each categorical column with integer codes so it appears in the correlation matrix.
for col in catcol:
    data[col] = le.fit_transform(data[col])


plt.subplots(figsize =(20, 20))

sns.heatmap(data.corr(), cmap = palette_cmap, square=True, cbar_kws=dict(shrink =.82),
            annot=True, vmin=-1, vmax=1, linewidths=3,linecolor='black',annot_kws=dict(fontsize =12))
plt.title("Pearson Correlation Of Features\n", fontsize=25)
plt.xticks(rotation=90)
plt.show()

**Insights:**

* High correlation between population and world population percentage, and, as expected, between country and CCA3 code.
* Moderate correlation between area and population, and between area and world population percentage.
* Moderate negative correlation between growth rate and continent, and between rank and each of world population percentage, area and population.
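
`LabelEncoder` assigns integer codes in sorted order of the labels, so correlations involving encoded columns such as `Continent` or `CCA3` reflect that ordering rather than a measured quantity. As a cross-check, the Pearson matrix can be restricted to columns that are numeric to begin with. The sketch below uses a hypothetical `sample` frame and helper `numeric_corr`. The column names mirror the notebook's, but the values are made up.

```python
import pandas as pd

# Hypothetical rows; values chosen so the relationships are exactly linear.
sample = pd.DataFrame({
    "Country/Territory": ["A", "B", "C", "D"],
    "2022 Population": [100, 200, 300, 400],
    "World Population Percentage": [1.0, 2.0, 3.0, 4.0],
    "Rank": [4, 3, 2, 1],
})

def numeric_corr(df):
    """Pearson correlation over the columns that are already numeric."""
    return df.select_dtypes(include="number").corr(method="pearson")

corr = numeric_corr(sample)
print(corr.round(2))
```

To apply this to the real frame, call `numeric_corr` on `data` before the label-encoding loop, or on a copy taken beforehand.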

<a id="11"></a>
# <p style="padding:10px;background-color:#006837;margin:0;color:#e9c46a;font-family:newtimeroman;font-size:100%;text-align:center;border-radius: 15px 50px;overflow:hidden;font-weight:500">Thank You</p>



<p>
<h3><font color="#006837">If you liked this notebook please upvote. Your feedback will be highly appreciated.</font></h3>

<br>

<h4><b>Author :</b></h4>

<h3>Hasib Al Muzdadid</h3>

<b>👉Shoot me mails :</b> muzdadid@gmail.com<br>
<b>👉Connect on LinkedIn :</b> https://www.linkedin.com/in/hasibalmuzdadid <br>
<b>👉Explore Github :</b> https://github.com/HasibAlMuzdadid    
    
</p>