<a href="https://www.kaggle.com/code/sanphats/eda-sea-plotly?scriptVersionId=107235972" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# **Population analysis in Southeast Asia** ⭐
![Southeast Asia](http://www.indochinatravelpackages.com/wp-content/uploads/2022/06/9.jpg)
**Image reference:** [indochinatravelpackages.com](https://www.indochinatravelpackages.com/best-southeast-asia-tours/)

### **About the dataset**

This dataset contains historical population data for every country and territory in the world. Each row also includes the area, continent, capital, population density, growth rate, population rank and share of the world population.

### **Dataset Glossary (Column-Wise)**
* **Rank**: Rank by population.
* **CCA3**: Three-letter country/territory code (ISO 3166-1 alpha-3, e.g. `THA`).
* **Country**: Name of the country/territory.
* **Capital**: Name of the capital.
* **Continent**: Name of the continent.
* **2022 Population**: Population of the country/territory in 2022.
* **2020 Population**: Population of the country/territory in 2020.
* **2015 Population**: Population of the country/territory in 2015.
* **2010 Population**: Population of the country/territory in 2010.
* **2000 Population**: Population of the country/territory in 2000.
* **1990 Population**: Population of the country/territory in 1990.
* **1980 Population**: Population of the country/territory in 1980.
* **1970 Population**: Population of the country/territory in 1970.
* **Area (km²)**: Area of the country/territory in square kilometres.
* **Density (per km²)**: Population density, in people per square kilometre.
* **Growth Rate**: Population growth rate, stored as a ratio. Values above 1 mean the population is growing (e.g. 1.0147 for the Philippines).
* **World Population Percentage**: The country/territory's share of the world population (%).
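Because **Growth Rate** is stored as a ratio, it is often easier to read as a percentage change. The sketch below assumes the ratio compares consecutive years, so `(ratio - 1) * 100` is the yearly change in %. That assumption is not stated in the dataset description. The three values are copied from the tables in this notebook.

```python
import pandas as pd

# Growth Rate values copied from this notebook's tables
growth = pd.Series({"Philippines": 1.0147, "Thailand": 1.0013, "Estonia": 0.998})

# Assumption: the ratio is year-over-year, so (ratio - 1) * 100 is the % change per year
growth_pct = ((growth - 1) * 100).round(2)
print(growth_pct)
```

A value below 1, such as Estonia's 0.998, indicates a shrinking population.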

### **Outline**
- [Load and preprocess data](#1)
- [Exploratory data analysis](#2)
    - [World population percentage and growth rate in SEA countries](#2-1)
    - [Population growth over time in SEA countries](#2-2)
    - [2022 population and density bar plots](#2-3)
    - [Interactive choropleth of population from 1970 to 2022](#2-4)
    - [Summary](#2-5)

<a id='1'></a>
# **Load and preprocess data**

```python
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

%matplotlib inline
sns.set(style='white')

import warnings
warnings.filterwarnings('ignore')
```

```python
# load the CSV
df = pd.read_csv('../input/world-population-dataset/world_population.csv')
df.sample(5)
```

| | Rank | CCA3 | Country | Capital | Continent | 2022 Population | 2020 Population | 2015 Population | 2010 Population | 2000 Population | 1990 Population | 1980 Population | 1970 Population | Area (km²) | Density (per km²) | Growth Rate | World Population Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 119 | 167 | MAC | Macau | Concelho de Macau | Asia | 695168 | 676283 | 615239 | 557297 | 431896 | 350227 | 245332 | 247284 | 30 | 23172.2667 | 1.0125 | 0.01 |
| 61 | 156 | EST | Estonia | Tallinn | Europe | 1326062 | 1329444 | 1314657 | 1331535 | 1396877 | 1570674 | 1476983 | 1361999 | 45227 | 29.3201 | 0.998 | 0.02 |
| 117 | 141 | LTU | Lithuania | Vilnius | Europe | 2750055 | 2820267 | 2963765 | 3139019 | 3599637 | 3785847 | 3521206 | 3210147 | 65300 | 42.1142 | 0.9869 | 0.03 |
| 46 | 130 | HRV | Croatia | Zagreb | Europe | 4030358 | 4096868 | 4254815 | 4368682 | 4548434 | 4873707 | 4680144 | 4492638 | 56594 | 71.2153 | 0.9927 | 0.05 |
| 18 | 96 | BLR | Belarus | Minsk | Europe | 9534954 | 9633740 | 9700609 | 9731427 | 10256483 | 10428525 | 9817257 | 9170786 | 207600 | 45.9295 | 0.9955 | 0.12 |


```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 234 entries, 0 to 233
Data columns (total 17 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Rank                         234 non-null    int64  
 1   CCA3                         234 non-null    object 
 2   Country                      234 non-null    object 
 3   Capital                      234 non-null    object 
 4   Continent                    234 non-null    object 
 5   2022 Population              234 non-null    int64  
 6   2020 Population              234 non-null    int64  
 7   2015 Population              234 non-null    int64  
 8   2010 Population              234 non-null    int64  
 9   2000 Population              234 non-null    int64  
 10  1990 Population              234 non-null    int64  
 11  1980 Population              234 non-null    int64  
 12  1970 Population              234 non-null    int64  
 13  Area (km²)
```

```python
# No null values; now check for fully duplicated rows
print("Duplicated rows:", df.duplicated().sum())
```

```
Duplicated rows: 0
```
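`df.duplicated()` only flags rows that are identical in every column. A complementary check looks for repeated keys such as `CCA3`, which would catch the same country entered twice with slightly different figures. The sketch below uses a small hypothetical frame, not the real data.

```python
import pandas as pd

# Hypothetical mini-frame: the second 'EST' row differs only in its population
toy = pd.DataFrame({
    "CCA3": ["EST", "LTU", "EST"],
    "2022 Population": [1326062, 2750055, 1326000],
})

full_row_dupes = toy.duplicated().sum()             # identical rows only
key_dupes = toy.duplicated(subset=["CCA3"]).sum()   # repeated country codes
null_counts = toy.isna().sum()                      # missing values per column
print(full_row_dupes, key_dupes)
print(null_counts)
```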


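```python
# countries used for the SEA subset
# note: 'cambodia' is lower-case, so the exact-match isin() below will not select a 'Cambodia' row;
# the list also includes India (South Asia) and omits Timor-Leste
sea = ['Brunei', 'cambodia', 'India', 'Indonesia', 'Laos', 'Malaysia',
       'Myanmar', 'Philippines', 'Singapore', 'Thailand', 'Vietnam']  # 11 names
# .copy() makes df_sea independent of df (avoids SettingWithCopyWarning)
df_sea = df[df.Country.isin(sea)].copy()

# replace spaces in column names with underscores
df_sea.columns = df_sea.columns.str.replace(' ', '_')

# round Density_(per_km²) to whole people per km²
df_sea['Density_(per_km²)'] = df_sea['Density_(per_km²)'].round().astype('int64')

# descending rank puts the most populous country at the top of the horizontal bar charts
df_sea = df_sea.sort_values('Rank', ascending=False)
df_sea
```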

| | Rank | CCA3 | Country | Capital | Continent | 2022_Population | 2020_Population | 2015_Population | 2010_Population | 2000_Population | 1990_Population | 1980_Population | 1970_Population | Area_(km²) | Density_(per_km²) | Growth_Rate | World_Population_Percentage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29 | 175 | BRN | Brunei | Bandar Seri Begawan | Asia | 449002 | 441725 | 421437 | 396053 | 333926 | 261928 | 187921 | 133343 | 5765 | 78 | 1.0081 | 0.01 |
| 187 | 113 | SGP | Singapore | Singapore | Asia | 5975689 | 5909869 | 5650018 | 5163590 | 4053602 | 3022209 | 2400729 | 2061831 | 710 | 8416 | 1.0058 | 0.07 |
| 110 | 103 | LAO | Laos | Vientiane | Asia | 7529475 | 7319399 | 6787419 | 6323418 | 5430853 | 4314443 | 3297519 | 2675283 | 236800 | 32 | 1.0141 | 0.09 |
| 122 | 45 | MYS | Malaysia | Kuala Lumpur | Asia | 33938221 | 33199993 | 31068833 | 28717731 | 22945150 | 17517054 | 13215707 | 10306508 | 330803 | 103 | 1.0109 | 0.43 |
| 140 | 26 | MMR | Myanmar | Nay Pyi Taw | Asia | 54179306 | 53423198 | 51483949 | 49390988 | 45538332 | 40099553 | 33465781 | 27284112 | 676578 | 80 | 1.0071 | 0.68 |
| 206 | 20 | THA | Thailand | Bangkok | Asia | 71697030 | 71475664 | 70294397 | 68270489 | 63066603 | 55228410 | 45737753 | 35791728 | 513120 | 140 | 1.0013 | 0.9 |
| 228 | 16 | VNM | Vietnam | Hanoi | Asia | 98186856 | 96648685 | 92191398 | 87411012 | 79001142 | 66912613 | 52968270 | 41928849 | 331212 | 296 | 1.0074 | 1.23 |
| 163 | 13 | PHL | Philippines | Manila | Asia | 115559009 | 112190977 | 103031365 | 94636700 | 77958223 | 61558898 | 48419546 | 37435586 | 342353 | 338 | 1.0147 | 1.45 |
| 93 | 4 | IDN | Indonesia | Jakarta | Asia | 275501339 | 271857970 | 259091970 | 244016173 | 214072421 | 182159874 | 148177096 | 115228394 | 1904569 | 145 | 1.0064 | 3.45 |
| 92 | 2 | IND | India | New Delhi | Asia | 1417173173 | 1396387127 | 1322866505 | 1240613620 | 1059633675 | 870452165 | 696828385 | 557501301 | 3287590 | 431 | 1.0068 | 17.77 |


```python
# melt the dataframe into long format (one row per country and year) for the choropleth
df_melt = pd.melt(df_sea, id_vars=['Country', 'CCA3'],
                  value_vars=['2022_Population', '2020_Population', '2015_Population', '2010_Population',
                              '2000_Population', '1990_Population', '1980_Population', '1970_Population'],
                  var_name='Year', value_name='Population')

# keep only the year number, e.g. '2022_Population' -> '2022' (raw string avoids an invalid-escape warning)
df_melt['Year'] = df_melt['Year'].str.extract(r'(\d+)', expand=False)
df_melt.sort_values('Year', ascending=True, inplace=True)
```
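The example below applies the same `melt` and year extraction to a two-country, two-year slice, so the reshape is easy to see. The numbers are Brunei's and Laos's 1970 and 2022 populations from the table above.

```python
import pandas as pd

wide = pd.DataFrame({
    "Country": ["Brunei", "Laos"],
    "CCA3": ["BRN", "LAO"],
    "2022_Population": [449002, 7529475],
    "1970_Population": [133343, 2675283],
})

# wide -> long: one row per (country, year)
long = pd.melt(wide, id_vars=["Country", "CCA3"],
               value_vars=["2022_Population", "1970_Population"],
               var_name="Year", value_name="Population")

# '2022_Population' -> '2022'; expand=False returns a Series rather than a one-column DataFrame
long["Year"] = long["Year"].str.extract(r"(\d+)", expand=False)
long = long.sort_values("Year")
print(long)
```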

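#### ❗ **Cambodia does not appear in `df_sea`:** the list above spells it `'cambodia'`, and `isin` matches strings exactly (case-sensitively), so a row spelled `Cambodia` would not be selected.
#### ✅ Only ten countries are therefore analysed. Note that India is geographically part of South Asia, and that Timor-Leste, a Southeast Asian country, is not in the list.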
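A more defensive way to build the country subset is to report which requested names were not found and to compare names case-insensitively. The sketch below runs on a hypothetical miniature of the `Country` column, not on the real dataset.

```python
import pandas as pd

# Hypothetical miniature of the dataset's Country column
countries = pd.DataFrame({"Country": ["Cambodia", "Laos", "Thailand", "France"]})
wanted = ["cambodia", "Laos", "Thailand", "Brunei"]

# names in the list that have no exact match in the data
missing = sorted(set(wanted) - set(countries["Country"]))

# case-insensitive selection: compare casefolded strings on both sides
wanted_folded = {name.casefold() for name in wanted}
subset = countries[countries["Country"].str.casefold().isin(wanted_folded)]

print(missing)                      # names to double-check for typos
print(subset["Country"].tolist())
```

Here `missing` flags both the mis-cased `'cambodia'` and `'Brunei'`, which is absent from the toy data. The case-insensitive filter still picks up `Cambodia`.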

<a id='2'></a>
# **Exploratory data analysis**

```python
df_sea[['Rank', 'Country', '2022_Population', 'Density_(per_km²)', 'Growth_Rate', 'World_Population_Percentage']].sort_values(['Rank'])
```

| | Rank | Country | 2022_Population | Density_(per_km²) | Growth_Rate | World_Population_Percentage |
|---|---|---|---|---|---|---|
| 92 | 2 | India | 1417173173 | 431 | 1.0068 | 17.77 |
| 93 | 4 | Indonesia | 275501339 | 145 | 1.0064 | 3.45 |
| 163 | 13 | Philippines | 115559009 | 338 | 1.0147 | 1.45 |
| 228 | 16 | Vietnam | 98186856 | 296 | 1.0074 | 1.23 |
| 206 | 20 | Thailand | 71697030 | 140 | 1.0013 | 0.9 |
| 140 | 26 | Myanmar | 54179306 | 80 | 1.0071 | 0.68 |
| 122 | 45 | Malaysia | 33938221 | 103 | 1.0109 | 0.43 |
| 110 | 103 | Laos | 7529475 | 32 | 1.0141 | 0.09 |
| 187 | 113 | Singapore | 5975689 | 8416 | 1.0058 | 0.07 |
| 29 | 175 | Brunei | 449002 | 78 | 1.0081 | 0.01 |
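The **Density** column appears to be the 2022 population divided by the area. The quick check below tests that assumption on three rows copied from the tables above.

```python
import pandas as pd

rows = pd.DataFrame({
    "Country": ["India", "Singapore", "Brunei"],
    "2022_Population": [1417173173, 5975689, 449002],
    "Area_(km²)": [3287590, 710, 5765],
})

# assumption: density = 2022 population / area, rounded to whole people per km²
rows["density_check"] = (rows["2022_Population"] / rows["Area_(km²)"]).round().astype("int64")
print(rows[["Country", "density_check"]])
```

The recomputed values (431, 8416 and 78 people per km²) match the `Density_(per_km²)` column for these rows.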


<a id='2-1'></a>
## **World population percentage and growth rate in SEA countries**

```python
# two horizontal bar charts side by side, sharing the country axis
fig = make_subplots(rows=1, cols=2,
                    specs=[[{"secondary_y": False}, {"secondary_y": True}]],
                    horizontal_spacing=0.1,
                    shared_yaxes=True)

# left: share of the world population (%)
fig.add_trace(go.Bar(orientation='h',
                     x=df_sea['World_Population_Percentage'],
                     y=df_sea['Country'],
                     name='world_pop',
                     text=[f'{t}%' for t in df_sea['World_Population_Percentage']],
                     textposition='outside',
                     textfont_size=11,
                     cliponaxis=False,
                     marker_color=px.colors.sequential.dense), 1, 1)

# right: growth rate (ratio), drawn on the secondary y-axis (y3)
fig.add_trace(go.Bar(orientation='h',
                     x=df_sea['Growth_Rate'],
                     y=df_sea['Country'],
                     name='growth_rate',
                     text=[f'{t}' for t in df_sea['Growth_Rate']],
                     textposition='outside',
                     cliponaxis=False,
                     marker_color='indigo'), 1, 2, secondary_y=True)

fig.update_layout(
    title='<b>World population percentage and growth rate in SEA</b>',
    font=dict(size=12, color="black"),
    showlegend=False, template="simple_white",
    width=1000, height=500,
    xaxis_showticklabels=True,
    xaxis_title='World population percentage (%)',
    xaxis2_autorange='reversed',   # right-hand bars grow leftwards, mirroring the left panel
    xaxis2_title='Growth rate (ratio)',
    yaxis3_showticklabels=False)   # hide duplicate country labels on the secondary axis

fig.show()
```



<a id='2-2'></a>
## **Population growth over time in SEA countries**

```python
# one small line chart per country; Plotly Express facets share the same y-axis scale by default
fig = px.line(df_melt, x='Year', y='Population', color='Country', facet_col='Country',
              facet_col_wrap=2, template='simple_white', width=850, height=700, facet_col_spacing=0.05)

# drop the per-facet y-axis titles and add a single shared label instead
fig.for_each_yaxis(lambda y: y.update(title=''))
fig.add_annotation(x=-0.07, y=0.5,
                   text='<b>Population</b>', textangle=-90,
                   xref='paper', yref='paper', showarrow=False)
fig.update_layout(
    title='<b>Population over time in SEA countries</b>',
    xaxis_title='Year',
    showlegend=False)

fig.show()
```
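Because the facets share one y-axis, India's curve hides the relative growth of the smaller countries. Calling `fig.update_yaxes(matches=None, showticklabels=True)` on the figure would give each facet its own scale. A scale-free alternative, sketched here as an addition rather than part of the original analysis, is the compound average annual growth between 1970 and 2022, `(P2022 / P1970) ** (1 / 52) - 1`. The figures for three countries are taken from the table above.

```python
import pandas as pd

pop = pd.DataFrame({
    "1970": [557501301, 2675283, 133343],
    "2022": [1417173173, 7529475, 449002],
}, index=["India", "Laos", "Brunei"])

years = 2022 - 1970
# how many times larger the 2022 population is than in 1970
pop["multiple"] = pop["2022"] / pop["1970"]
# compound average annual growth rate over the period, in percent
pop["cagr_pct"] = (pop["multiple"] ** (1 / years) - 1) * 100
print(pop.round(2))
```

Measured this way, India's population grew more slowly than Laos's or Brunei's over 1970 to 2022, even though its absolute increase is far larger.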

<a id='2-3'></a>
## **2022 population and density bar plots**

```python
fig = make_subplots(rows=1, cols=2,
                    specs=[[{"secondary_y": False}, {"secondary_y": True}]],
                    horizontal_spacing=0.1,
                    shared_yaxes=True)

# left: 2022 population, converted to billions so the axis matches its title
pop_billion = df_sea['2022_Population'] / 1e9
fig.add_trace(go.Bar(orientation='h',
                     x=pop_billion,
                     y=df_sea['Country'],
                     name='2022 population',
                     text=[f'{t}' for t in pop_billion.round(2)],
                     textposition='outside',
                     textfont_size=11,
                     cliponaxis=False,
                     marker_color=px.colors.sequential.dense), 1, 1)

# right: population density (already rounded to whole people per km²)
fig.add_trace(go.Bar(orientation='h',
                     x=df_sea['Density_(per_km²)'],
                     y=df_sea['Country'],
                     name='density',
                     text=[f'{t}' for t in df_sea['Density_(per_km²)']],
                     textposition='outside',
                     cliponaxis=False,
                     marker_color=px.colors.sequential.Plotly3), 1, 2, secondary_y=True)

fig.update_layout(
    title='<b>2022 population and density in SEA</b>',
    font=dict(size=12, color="black"),
    showlegend=False, template="simple_white",
    width=1000, height=500,
    xaxis_showticklabels=True,
    xaxis_title='Population (billion)',
    xaxis2_autorange='reversed',
    xaxis2_title='Density (per km²)',
    yaxis3_showticklabels=False)

fig.show()
```
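When the labels are in billions, Singapore and Brunei round to `0.01` and `0.0`. One option is a hypothetical helper, `format_population`, that picks a unit for each value. It is sketched below.

```python
def format_population(n: int) -> str:
    """Hypothetical helper: abbreviate a head count as B (billion), M (million) or K (thousand)."""
    if n >= 1_000_000_000:
        return f"{n / 1_000_000_000:.2f}B"
    if n >= 1_000_000:
        return f"{n / 1_000_000:.2f}M"
    return f"{n / 1_000:.0f}K"

# 2022 populations of India, Singapore and Brunei from the table above
labels = [format_population(n) for n in (1417173173, 5975689, 449002)]
print(labels)
```

A list built this way for every country could be passed as `text=` to the left-hand `go.Bar` trace.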



<a id='2-4'></a>
## **Interactive choropleth of population from 1970 to 2022**

```python
# animated choropleth: one frame per year, countries located by their ISO alpha-3 codes
fig = px.choropleth(df_melt, locations='CCA3',
                    color='Population',
                    hover_name='Country',
                    color_continuous_scale=px.colors.sequential.dense,
                    range_color=(0, df_melt['Population'].max()),  # same colour scale in every frame
                    animation_frame='Year',
                    width=900, height=600)

fig.update_geos(fitbounds='locations', visible=True)
fig.update_layout(margin={'r': 0, 't': 0, 'l': 0, 'b': 0})
fig.show()
```
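With a linear colour scale running from 0 to India's population, every other country lands near the bottom of the scale. One workaround is to colour by the base-10 logarithm of the population. This is a suggested alternative, not the method used above. The resulting column could be passed as `color=` to `px.choropleth`.

```python
import numpy as np
import pandas as pd

# 2022 populations from the table above
pops = pd.DataFrame({
    "Country": ["India", "Singapore", "Brunei"],
    "Population": [1417173173, 5975689, 449002],
})

# log10 compresses a roughly 3,000-fold spread into about 3.5 units
pops["log10_population"] = np.log10(pops["Population"])
print(pops.round(2))
```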

<a id='2-5'></a>
# 🔥 **Summary**
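### **Population growth over time**
* Because all panels share one y-axis, **India's huge absolute increase in population** stands out, while most other countries' curves look almost flat at that scale.
* Indonesia shows a visible but much smaller increase than India.
* In relative terms the picture differs: India's 2022 growth rate (1.0068) is below those of the Philippines (1.0147) and Laos (1.0141).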

### **Population and density**
* 🔥 **India** has by far the largest population of the countries analysed, with around **1.42 billion people** in 2022.
    * This makes India **the second most populous country in the world (around 17.8% of the world population)**.
    * Despite its size, India's **population density is 431 people per km²**. That is the highest in the group after Singapore, but of the same order as the Philippines (338) and Vietnam (296).
* **Indonesia** comes next:
    * **4th** in the world ranking **(around 3.45% of the world population)**.
    * Its 2022 population is around **0.28 billion**, roughly **one-fifth of India's**.
    * Its **population density is 145 people per km²**.
* **Philippines, Vietnam, Thailand, Myanmar and Malaysia** follow:
    * 2022 populations of about **115.6, 98.2, 71.7, 54.2 and 33.9 million**, respectively.
    * World ranks **13th, 16th, 20th, 26th and 45th** (**around 1.45%, 1.23%, 0.9%, 0.68% and 0.43%** of the world population).
    * **Population densities of 338, 296, 140, 80 and 103 people per km²**.
* The **three least populous countries** in the group are **Laos, Singapore and Brunei**:
    * **Laos** has **7.53 million people** and the **lowest population density (only 32 people per km²)**.
    * **Singapore**, a small island state, has **5.98 million people but by far the highest population density, 8,416 people per km²** 🔥🔥🔥. That is **almost 20 times India's density, with around 240 times fewer people than India**.
    * **Brunei** has the smallest population, only **0.45 million people (around 0.01% of the world population)**, with a density of around **78 people per km²**.
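The comparisons in this summary can be checked directly against the 2022 figures in the table:

```python
india_pop, indonesia_pop, singapore_pop = 1417173173, 275501339, 5975689
india_density, singapore_density = 431, 8416

pop_india_vs_indonesia = india_pop / indonesia_pop              # how many Indonesias fit in India
density_singapore_vs_india = singapore_density / india_density  # Singapore's density relative to India's
pop_india_vs_singapore = india_pop / singapore_pop              # how many Singapores fit in India

print(round(pop_india_vs_indonesia, 1))
print(round(density_singapore_vs_india, 1))
print(round(pop_india_vs_singapore))
```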
    

---
**Thank you for reading**  👍

SANPHAT S.
