## **World Population EDA + Map Visualization 🐹**

<img src="https://as2.ftcdn.net/v2/jpg/02/26/50/29/1000_F_226502970_RTIR8YVHPn6pvPlcpatBsd4nC2PAhqoy.jpg" width="700" height="350">


### **Import Packages & Data** 🐯

In [1]:
!pip install pycountry_convert --quiet

In [2]:
!pip install plotly --quiet

In [3]:
!pip install missingno --quiet

#### **Packages** 🚀

In [4]:
# Main Packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Special Library
import pycountry_convert as pc
import missingno as msno

# Seaborn Style
sns.set(color_codes = True)
sns.set_style("white")

#### **Import Dataset** 🗺️

In [5]:
df = pd.read_csv("../input/countries-in-the-world-by-population-2022/world_population.csv")
df.head()

### **Data Exploration & Cleaning** 🍺

#### **Data Shape & Structure** 💎

In [6]:
df.shape

* Row count is 201
* Column count is 11

#### **Data Inspection 🐬**

In [7]:
# Let's inspect the missing values 🐢
data_info= pd.DataFrame()
data_info['Column Names']= df.columns
data_info['Datatype'] = df.dtypes.to_list()
data_info['num_NA']= data_info['Column Names'].apply(lambda x: df[x].isna().sum())
data_info['%_NA']= data_info['Column Names'].apply(lambda x: df[x].isna().mean() * 100)   # share of missing values, in percent
data_info

In [8]:
# Missing value matrix ⚔️
msno.matrix(df,color=(0.1, 0.9, 0.7))
plt.show()

* There are 7 missing values in Urban Population % 🐐
* Datatypes are appropriate 🐪
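
The same missing-value summary can be built without per-column lambdas. The snippet below is a minimal sketch on a made-up toy frame (the `toy` data and the `pct_NA` column name are assumptions, not part of the dataset):

```python
import pandas as pd

# Toy frame standing in for the real dataset (values are made up).
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Urban Pop %": [60.0, None, 35.0, None],
    "Population (2020)": [10, 20, 30, 40],
})

# One row per column: dtype, number of missing values, percentage missing.
missing_summary = pd.DataFrame({
    "Datatype": toy.dtypes,
    "num_NA": toy.isna().sum(),
    "pct_NA": toy.isna().mean() * 100,
})
print(missing_summary)
```

Because all three Series share the column names as their index, pandas aligns them automatically.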

In [9]:
# Let's inspect if there are any duplicate values 💣
data_info= pd.DataFrame()
data_info['Column Names']= df.columns
data_info['Datatype'] = df.dtypes.to_list()
data_info['Duplicate']= data_info['Column Names'].apply(lambda x: df[x].duplicated().sum())
data_info

* No country is repeated 🦏.
* Fortunately, we have no duplicate values to deal with 🦘.
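
A per-column duplicate count can look alarming for columns where repeats are expected (several countries can share a value). A minimal sketch on a made-up toy frame, separating full-row duplicates from repeats in the key column:

```python
import pandas as pd

# Toy frame (made up); repeated continent values are expected and harmless.
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Continent": ["AS", "AS", "EU"],
})

# Rows that are identical across every column.
full_row_duplicates = int(toy.duplicated().sum())

# Repeats in the column that should be unique.
repeated_countries = int(toy["Country"].duplicated().sum())

# Per-column counts, equivalent in spirit to the cell above.
per_column = toy.apply(lambda col: col.duplicated().sum())

print(full_row_duplicates, repeated_countries)
print(per_column)
```

Only the `Country` count answers the question "is any country repeated?"; the other columns are shown for comparison.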

#### **New Column Names 🦀** 

In [10]:
# Changing column names 🌟
df.rename(columns = {'Country/Other':'Country'}, inplace = True)

### **Adding New Columns 🦂**

In [11]:
# Adding a new column with country code ⚡
def countryCode (country_name):
    try:
        return pc.country_name_to_country_alpha2(country_name)
    except Exception:                # name not recognised by pycountry_convert
        return None                  # None keyword adds a null value 🐹

if __name__ == "__main__":
    df['Country code']= df['Country'].apply(countryCode)
    

In [12]:
# Adding a column with continent 🌈
def continent(country_code):
    try:
        return pc.country_alpha2_to_continent_code(country_code)
    except Exception:                # missing or unmapped country code
        return None                  # None keyword adds a null value 🐹
    
if __name__ == "__main__":
    df['Continent']= df["Country code"].apply(continent)

* List of Continent Code:

    * NA: North America 🐔
    * SA: South America 🦉
    * EU: Europe 🐸
    * AF: Africa 🐋
    * AS: Asia 🐌
    * OC: Oceania 🐧
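
For chart labels it can help to translate these codes back into full names. A minimal sketch with a hand-written lookup table (`CONTINENT_NAMES` and `continent_name` are hypothetical helpers, not part of `pycountry_convert`):

```python
# Hypothetical lookup table built from the code list above.
CONTINENT_NAMES = {
    "NA": "North America",
    "SA": "South America",
    "EU": "Europe",
    "AF": "Africa",
    "AS": "Asia",
    "OC": "Oceania",
}

def continent_name(code):
    """Return the full continent name, or None for an unknown or missing code."""
    return CONTINENT_NAMES.get(code)

print(continent_name("EU"))   # Europe
```

One caveat: if the frame is written to CSV and read back, `pd.read_csv` parses the string `"NA"` as a missing value by default, so North America would need `keep_default_na=False` or a different code.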

#### **Null Value Inspection** 🐍

In [13]:
# To drop null values 🦩
df.dropna(inplace = True)
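
`dropna` removes every row that has a null in any column, including countries whose name could not be mapped to a code. A minimal sketch, on a made-up toy frame, of inspecting those rows before dropping them:

```python
import pandas as pd

# Toy frame (made up): "B" lacks an urban share, "C" lacks a country code.
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Urban Pop %": [60.0, None, 35.0],
    "Country code": ["AA", "BB", None],
})

# Rows with at least one null value; these are the rows dropna() will remove.
rows_with_nulls = toy[toy.isna().any(axis=1)]
print(rows_with_nulls)

cleaned = toy.dropna()
print(len(toy) - len(cleaned), "rows dropped")
```

Looking at `rows_with_nulls` first shows which countries disappear from the later maps and charts.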

### **EDA - Exploratory Data Analysis 🐙**

#### **Data Correlations** 🔮

In [14]:
# Correlation in Dataset 🐋
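# Only numeric columns can be correlated; text columns such as Country are excluded
sns.heatmap(df.select_dtypes(include="number").corr(), cmap='magma',linewidths=2)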
plt.savefig('Correlation_Heatmap.png')
plt.show()

#### **Observations 🦕**

* ##### Population is inversely related to the yearly change in population ⚒️.
* ##### There is a strong direct relation between country land area and population 🗡️.
* ##### The fertility rate tends to be lower in highly populated countries 🛡️.
* ##### In highly populated countries, a smaller share of people live in urban areas 💡.
* ##### Population is directly related to median age 🔥.
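
Reading relationships off a heatmap by eye is error-prone, so it can help to list the correlation pairs sorted by strength. A minimal sketch on made-up numbers (the `toy` frame and its column names are assumptions, not the real dataset):

```python
import numpy as np
import pandas as pd

# Toy numeric data (made up) plus one text column that must be ignored.
toy = pd.DataFrame({
    "population": [1.0, 2.0, 3.0, 4.0, 5.0],
    "land_area": [2.0, 4.1, 5.9, 8.2, 9.9],
    "yearly_change": [5.0, 4.2, 2.9, 2.1, 1.0],
    "name": list("abcde"),
})

corr = toy.select_dtypes(include="number").corr()

# Keep each pair once: mask everything except the upper triangle.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()

# Order the pairs by absolute correlation, strongest first.
pairs = pairs.reindex(pairs.abs().sort_values(ascending=False).index)
print(pairs)
```

A negative value marks an inverse relation and a positive value a direct one; correlation alone does not show which variable drives the other.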

#### **Population Maps 🐶**

In [15]:
# Dataframe copy 🐩
sample = df.copy()
# Standardizing Country Names 🦊
sample["StNames"] = sample["Country"].apply(lambda x : pc.country_name_to_country_alpha3(x))

In [16]:
# To display population over world & continent maps 🦅
def populationMaps(scope,title,color):
    
    choroplethList = []
    for i in range(len(scope)):
        choroplethList.append(px.choropleth(sample, locations = "StNames", color = "Population (2020)",
                    scope = scope[i], title = title[i], color_continuous_scale= color[i]))
    
    return choroplethList

if __name__ == "__main__":
    scopeList = ["world","asia","north america","south america","africa","europe"]
    titleList = [
        "World Population Map (2020)",
        "Asia Population Map (2020)",
        "North America Population Map (2020)",
        "South America Population Map (2020)",
        "Africa Population Map (2020)",
        "Europe Population Map (2020)"
    ]
    colorList = [
        "portland",
        "viridis",
        "portland",
        "viridis",
        "viridis",
        "portland"
    ]
    
    r = populationMaps(scopeList,titleList,colorList)
    
    for i in range(len(scopeList)):
        r[i].show()

#### **Observations 🦕**
* ##### The most populous region of the world is South Asia ⚔️.
* ##### Asia is far more populous than the other continents 🚬.
* ##### Population counts are low across Europe 🪓.

#### **Population Bar Plots** 🦚

In [17]:
# To display a population bar plot 🐧.
def barPlots(continents, dataframe, title):
    
    barList = []
    
    for i in range(len(continents)):
        # Top 10 most populated countries of the current continent
        data = (dataframe[dataframe["Continent"] == continents[i]]
                .sort_values("Population (2020)", ascending=False).head(10))
        
        # Bars are coloured by median age; labels map column names to axis titles
        fig = px.bar(data, x = "Country", y = "Population (2020)", color = "Med. Age",
                     labels={'Country':'Countries', 'Population (2020)':'Population', 'Med. Age':'Median Age'},
                     title = title[i], text_auto = '.2s', template="plotly_white")
        fig.update_traces(textfont_size=12, textangle=0, textposition="outside", cliponaxis=False)
        barList.append(fig)
        
    return barList
    
    
if __name__ == "__main__":
    
    continents = ["AS","EU","AF","SA","NA","OC"]
    titleList = [
        "Asia Population Bar Chart (2020)",
        "Europe Population Bar Chart (2020)",
        "Africa Population Bar Chart (2020)",
        "South America Population Bar Chart (2020)",
        "North America Population Bar Chart (2020)",
        "Oceania Population Bar Chart (2020)"
    ]
        
    r = barPlots(continents, sample, titleList)
    
    for i in range(len(continents)):
        r[i].show()
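
The "top 10 countries per continent" selection behind these bar charts can also be done in one pass with `groupby`. A minimal sketch on a made-up toy frame, using the top 2 instead of the top 10 to keep the output short:

```python
import pandas as pd

# Toy frame (made up) with the same column names as the notebook.
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "Continent": ["AS", "AS", "AS", "EU", "EU"],
    "Population (2020)": [30, 10, 20, 5, 7],
})

# Sort once by population, then keep the first n rows of every continent.
top2 = (toy.sort_values("Population (2020)", ascending=False)
           .groupby("Continent")
           .head(2))
print(top2)
```

The result is a single frame that can be filtered per continent or passed to a faceted plot, instead of building one list per continent.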
    

#### **Fertility Rate Analysis** 🐿️

In [18]:
# Analyzing countries with high fertility rate & their population 🦇
def highFerRate(df):
    fig = px.bar(df, x = df["Country"], y = df["Fert. Rate"], 
                 color = df["Population (2020)"], text_auto = '0.2s',
                 template="plotly_white",hover_data=["Continent","Land Area (Km²)"],   # list keeps hover order stable
                title = "High Fertility Rate Countries")
    
    return fig.show()
    
if __name__ == "__main__":
    data = df.sort_values("Fert. Rate", ascending=False).head(20)
    highFerRate(data)

#### **Observations 🦕**
* ##### All of the top 20 countries by fertility rate are located in Africa ⚔️.
* ##### Excluding Nigeria, these countries pair a high fertility rate with a comparatively low population of around 50M 🚬.
* ##### Nigeria has a large population and its fertility rate remains high 🪓.

In [19]:
# Analyzing countries with low fertility rate & their population 🦇
def lowFerRate(df):
    fig = px.bar(df, x = df["Country"], y = df["Fert. Rate"], 
                 color = df["Population (2020)"], text_auto = '0.2s',
                 template="plotly_white",hover_data=["Continent","Land Area (Km²)"],   # list keeps hover order stable
                title = "Low Fertility Rate Countries")
    
    return fig.show()
    
if __name__ == "__main__":
    data = df.sort_values("Fert. Rate", ascending=True).head(20)
    lowFerRate(data)

#### **Observations 🦕**
* ##### A pattern emerges: countries with a smaller land area (km²) tend to have a low fertility rate ⚔️.
* ##### Japan stands out as a highly populated country with a low fertility rate 🚬.
* ##### A fertility rate close to 1 means that women in these countries have, on average, only about one child 🪓.

#### **Population Share Visualization** 🦘

In [50]:
import plotly.graph_objects as go

def populationShare(df,continents):
    
    ratio = []
    for i in range(len(continents)):
        ratio.append(df.get_group(continents[i])["World Share"].sum())
        
    colors = ["#F10086",'#FFE61B','#533E85', '#488FB1', '#4FD3C4', '#8A39E1']

    fig = go.Figure(data=[go.Pie(labels=continents, values=ratio, hole=.3,)])
    fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=20,
                  marker=dict(colors=colors, line=dict(color='white', width=2)))
    
    fig.update_layout(
    title_text="World Population Share (2020)",)
    
    return fig.show()
    
    
if __name__ == "__main__":
    continents = ["AS","EU","AF","SA","NA","OC"]
    data = df.groupby("Continent")
    
    populationShare(data,continents)

#### **Observations 🦕**
* ##### Asia holds almost 60% of the world population ⚔️.
* ##### The Oceania region holds less than 1% of the world population 🚬.
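
The shares in the pie chart are only meaningful if `World Share` is numeric. A minimal sketch, assuming (hypothetically) that the column arrives as text such as `"18.47 %"`; the toy values are made up:

```python
import pandas as pd

# Toy frame (made up); the string format of "World Share" is an assumption.
toy = pd.DataFrame({
    "Continent": ["AS", "AS", "EU", "AF"],
    "World Share": ["18.47 %", "17.70 %", "1.07 %", "2.64 %"],
})

# Strip the trailing " %" and convert to float; unparsable values become NaN.
share = pd.to_numeric(toy["World Share"].str.rstrip(" %"), errors="coerce")

# Total share per continent, largest first.
by_continent = share.groupby(toy["Continent"]).sum().sort_values(ascending=False)
print(by_continent)
```

If the column is already numeric, the conversion step can be skipped and the `groupby` line applied directly.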