# 🌎 World Population Analysis
*Date: 7th of April, 2025*

## 🎯 Objective

This analysis examines country-level data to uncover insights into population size, population rank, growth rate, each country's share of the world population, and other key indicators. We'll answer questions like:

- What are the top 10 most populated countries in 2022 and other years?
  
- Which countries have the highest population density for each year?

- Which countries have the highest/lowest growth rates?

- Does area (km²) affect growth rate?

- Does density (per km²) affect growth rate?

This analysis can help policymakers, educators, and researchers better understand global population patterns.


## 📄 Dataset Description

**Source:** [Kaggle - World Population Dataset](https://www.kaggle.com/datasets/iamsouravbanerjee/world-population-dataset)  
**Rows:** 234 countries/territories  
**Columns:** 17 features, including:

- `Rank`: Rank by population.
- `CCA3`: 3-letter country/territory code.
- `Country`: Name of the country/territory.
- `Capital`: Name of the capital.
- `Continent`: Name of the continent.
- `2022 Population`: Population of the country/territory in 2022.
- `2020 Population`: Population of the country/territory in 2020.
- `2015 Population`: Population of the country/territory in 2015.
- `2010 Population`: Population of the country/territory in 2010.
- `2000 Population`: Population of the country/territory in 2000.
- `1990 Population`: Population of the country/territory in 1990.
- `1980 Population`: Population of the country/territory in 1980.
- `1970 Population`: Population of the country/territory in 1970.
- `Area (km²)`: Area of the country/territory in square kilometers.
- `Density (per km²)`: Population density per square kilometer.
- `Growth Rate`: Population growth rate of the country/territory.
- `World Population Percentage`: Share of the world population held by each country/territory.

Note: Some columns contain missing or inconsistent values that will require cleaning.


In [1]:
# importing libraries
import pandas as pd

In [2]:
df = pd.read_csv(r"C:\Users\XANDER\Videos\TECH\PROJECTS\Data Analysis Projects\World Population\world_population.csv")
df.head(10)

Unnamed: 0,Rank,CCA3,Country,Capital,Continent,2022 Population,2020 Population,2015 Population,2010 Population,2000 Population,1990 Population,1980 Population,1970 Population,Area (km²),Density (per km²),Growth Rate,World Population Percentage
0,36,AFG,Afghanistan,Kabul,Asia,41128771.0,38972230.0,33753499.0,28189672.0,19542982.0,10694796.0,12486631.0,10752971.0,652230.0,63.0587,1.0257,0.52
1,138,ALB,Albania,Tirana,Europe,2842321.0,2866849.0,2882481.0,2913399.0,3182021.0,3295066.0,2941651.0,2324731.0,28748.0,98.8702,0.9957,0.04
2,34,DZA,Algeria,Algiers,Africa,44903225.0,43451666.0,39543154.0,35856344.0,30774621.0,25518074.0,18739378.0,13795915.0,2381741.0,18.8531,1.0164,0.56
3,213,ASM,American Samoa,Pago Pago,Oceania,44273.0,46189.0,51368.0,54849.0,58230.0,47818.0,32886.0,27075.0,199.0,222.4774,0.9831,0.0
4,203,AND,Andorra,Andorra la Vella,Europe,79824.0,77700.0,71746.0,71519.0,66097.0,53569.0,35611.0,19860.0,468.0,170.5641,1.01,0.0
5,42,AGO,Angola,Luanda,Africa,35588987.0,33428485.0,28127721.0,23364185.0,16394062.0,11828638.0,8330047.0,6029700.0,1246700.0,28.5466,1.0315,0.45
6,224,AIA,Anguilla,The Valley,North America,15857.0,15585.0,14525.0,13172.0,11047.0,8316.0,6560.0,6283.0,91.0,174.2527,1.0066,0.0
7,201,ATG,Antigua and Barbuda,Saint John’s,North America,93763.0,92664.0,89941.0,85695.0,75055.0,63328.0,64888.0,64516.0,442.0,212.1335,1.0058,0.0
8,33,ARG,Argentina,Buenos Aires,South America,45510318.0,45036032.0,43257065.0,41100123.0,37070774.0,32637657.0,28024803.0,23842803.0,2780400.0,16.3683,1.0052,0.57
9,140,ARM,Armenia,Yerevan,Asia,2780469.0,2805608.0,2878595.0,2946293.0,3168523.0,3556539.0,3135123.0,2534377.0,29743.0,93.4831,0.9962,0.03


In [3]:
df.shape

(234, 17)

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 234 entries, 0 to 233
Data columns (total 17 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Rank                         234 non-null    int64  
 1   CCA3                         234 non-null    object 
 2   Country                      234 non-null    object 
 3   Capital                      234 non-null    object 
 4   Continent                    234 non-null    object 
 5   2022 Population              230 non-null    float64
 6   2020 Population              233 non-null    float64
 7   2015 Population              230 non-null    float64
 8   2010 Population              227 non-null    float64
 9   2000 Population              227 non-null    float64
 10  1990 Population              229 non-null    float64
 11  1980 Population              229 non-null    float64
 12  1970 Population              230 non-null    float64
 13  Area (km²)          

## 🧹 Data Cleaning

In [5]:
# checking for duplicates
df.duplicated().any()

np.False_

In [6]:
# checking for null values
df.isnull().sum().any()

np.True_

In [7]:
# checking columns that contain missing values
df.columns[df.isnull().any()]

Index(['2022 Population', '2020 Population', '2015 Population',
       '2010 Population', '2000 Population', '1990 Population',
       '1980 Population', '1970 Population', 'Area (km²)', 'Density (per km²)',
       'Growth Rate'],
      dtype='object')

In [8]:
df.isnull().sum()

Rank                           0
CCA3                           0
Country                        0
Capital                        0
Continent                      0
2022 Population                4
2020 Population                1
2015 Population                4
2010 Population                7
2000 Population                7
1990 Population                5
1980 Population                5
1970 Population                4
Area (km²)                     2
Density (per km²)              4
Growth Rate                    2
World Population Percentage    0
dtype: int64

In [9]:
# rows with missing values in the "2022 Population" column
df[df["2022 Population"].isnull()]

Unnamed: 0,Rank,CCA3,Country,Capital,Continent,2022 Population,2020 Population,2015 Population,2010 Population,2000 Population,1990 Population,1980 Population,1970 Population,Area (km²),Density (per km²),Growth Rate,World Population Percentage
62,159,SWZ,Eswatini,Mbabane,Africa,,1180655.0,1133936.0,1099920.0,1030496.0,854011.0,598564.0,442865.0,17364.0,69.2047,1.0079,0.02
154,120,NOR,Norway,Oslo,Europe,,5379839.0,,4889741.0,4491202.0,4241636.0,4085776.0,3875546.0,323802.0,16.7828,1.0058,0.07
157,222,PLW,Palau,Ngerulmud,Oceania,,17972.0,17794.0,18540.0,19726.0,15293.0,12252.0,11366.0,459.0,39.3355,1.0017,0.0
207,155,TLS,Timor-Leste,Dili,Asia,,1299995.0,1205813.0,1088486.0,,758106.0,642224.0,554021.0,14874.0,90.1772,1.0154,0.02


In [10]:
# rows with missing values in the dataframe
df[df.isnull().any(axis = 1)]

Unnamed: 0,Rank,CCA3,Country,Capital,Continent,2022 Population,2020 Population,2015 Population,2010 Population,2000 Population,1990 Population,1980 Population,1970 Population,Area (km²),Density (per km²),Growth Rate,World Population Percentage
13,91,AZE,Azerbaijan,Baku,Asia,10358070.0,,9863480.0,9237202.0,8190337.0,7427836.0,6383060.0,5425317.0,86600.0,119.6082,1.0044,0.13
42,28,COL,Colombia,Bogota,South America,51874020.0,50930660.0,47119730.0,,39215140.0,32601393.0,26176195.0,20905254.0,1141748.0,45.4339,1.0069,0.65
59,152,GNQ,Equatorial Guinea,Malabo,Africa,1674908.0,1596049.0,,1094524.0,684977.0,465549.0,282509.0,316955.0,28051.0,59.7094,1.0247,0.02
62,159,SWZ,Eswatini,Mbabane,Africa,,1180655.0,1133936.0,1099920.0,1030496.0,854011.0,598564.0,442865.0,17364.0,69.2047,1.0079,0.02
72,142,GMB,Gambia,Banjul,Africa,2705992.0,2573995.0,2253133.0,,,,,,10689.0,253.1567,1.025,0.03
73,131,GEO,Georgia,Tbilisi,Asia,3744385.0,3765912.0,3771132.0,,,,,,69700.0,53.7214,0.9964,0.05
90,94,HUN,Hungary,Budapest,Europe,9967308.0,9750573.0,9844246.0,9986825.0,10202060.0,,,10315366.0,93028.0,107.1431,1.0265,0.12
91,179,ISL,Iceland,Reykjavík,Europe,372899.0,366669.0,331060.0,318333.0,281462.0,,,204468.0,103000.0,3.6204,1.0069,0.0
92,2,IND,India,New Delhi,Asia,1417173000.0,1396387000.0,1322867000.0,1240614000.0,1059634000.0,,,557501301.0,3287590.0,431.0675,1.0068,17.77
100,52,CIV,Ivory Coast,Yamoussoukro,Africa,28160540.0,26811790.0,23596740.0,21120040.0,16799670.0,11910540.0,8303809.0,5477086.0,322463.0,,,0.35


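- Missing values are scattered across many columns, so dropping every row that contains one would discard entire countries. Instead, we'll fill the gaps in the population columns by interpolating linearly across each country's own yearly values (row-wise), so each estimate comes from the neighbouring years of the same country. Values missing at either end of a country's series are filled with the nearest available year.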
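The cost of simply dropping incomplete rows can be checked before choosing a fill strategy. The sketch below uses a small made-up frame (the `toy` data and its values are hypothetical, not taken from the dataset) to show how a few scattered gaps can remove most rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: every gap sits in a different row and column.
toy = pd.DataFrame(
    {
        "Country": ["A", "B", "C", "D"],
        "2022 Population": [np.nan, 120.0, 50.0, 10.0],
        "2020 Population": [100.0, np.nan, 48.0, 9.0],
        "Area (km²)": [10.0, 20.0, np.nan, 5.0],
    }
)

missing_cells = int(toy.isnull().sum().sum())          # individual gaps
rows_with_gaps = int(toy.isnull().any(axis=1).sum())   # rows dropna() would remove
rows_kept = len(toy.dropna())                          # rows that survive dropna()

print(f"{missing_cells} missing cells, {rows_with_gaps} rows affected, "
      f"{rows_kept} of {len(toy)} rows kept")
```

Running the same three expressions on `df` would show how many of the 234 rows a blanket `dropna()` removes.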

In [11]:
population_columns = ['2022 Population', '2020 Population', '2015 Population',
       '2010 Population', '2000 Population', '1990 Population',
       '1980 Population', '1970 Population']

In [12]:
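# fill gaps row-wise: linear interpolation across each country's yearly columns;
# limit_direction="both" also fills gaps at the start or end of the series
df[population_columns] = df[population_columns].interpolate(axis = 1, limit_direction = "both")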
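To make the effect of this call easier to see, here is a minimal sketch on a made-up two-row frame (the `toy` values are hypothetical). It assumes the default `method="linear"`, which treats the columns as equally spaced and ignores the actual gap in years between them.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, columns ordered newest to oldest as in the dataset.
toy = pd.DataFrame(
    {
        "2022 Population": [np.nan, 120.0],
        "2020 Population": [100.0, 110.0],
        "2015 Population": [90.0, np.nan],
        "2010 Population": [80.0, 90.0],
    },
    index=["A", "B"],
)

# Same call as in the notebook: interpolate along each row.
filled = toy.interpolate(axis=1, limit_direction="both")

# Row A: the gap is at the edge, so it takes the nearest valid value (2020).
# Row B: the gap is interior, so it becomes the midpoint of its two neighbours.
print(filled)
```

Because the columns are treated as equally spaced, an interior gap is always the midpoint of its neighbours. That is a reasonable estimate between 2010 and 1990, but a rougher one for 2020, which sits much closer to 2022 than to 2015. Edge gaps are copied rather than extrapolated, so a country missing every year before 2015 ends up with a flat series for those years.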

In [14]:
# checking to see the results
df.isnull().sum()

Rank                           0
CCA3                           0
Country                        0
Capital                        0
Continent                      0
2022 Population                0
2020 Population                0
2015 Population                0
2010 Population                0
2000 Population                0
1990 Population                0
1980 Population                0
1970 Population                0
Area (km²)                     2
Density (per km²)              4
Growth Rate                    2
World Population Percentage    0
dtype: int64

In [15]:
# rows that still contain missing values
df[df.isnull().any(axis = 1)]

Unnamed: 0,Rank,CCA3,Country,Capital,Continent,2022 Population,2020 Population,2015 Population,2010 Population,2000 Population,1990 Population,1980 Population,1970 Population,Area (km²),Density (per km²),Growth Rate,World Population Percentage
100,52,CIV,Ivory Coast,Yamoussoukro,Africa,28160542.0,26811790.0,23596741.0,21120042.0,16799670.0,11910540.0,8303809.0,5477086.0,322463.0,,,0.35
101,139,JAM,Jamaica,Kingston,North America,2827377.0,2820436.0,2794445.0,2733896.0,2612205.0,2392030.0,2135546.0,1859091.0,10991.0,,,0.04
183,72,SEN,Senegal,Dakar,Africa,17316449.0,16436119.0,14356181.0,12530121.0,9704287.0,7536001.0,5703869.0,4367744.0,,,1.0261,0.22
211,153,TTO,Trinidad and Tobago,Port-of-Spain,North America,1531044.0,1518147.0,1460177.0,1410296.0,1332203.0,1266518.0,1127852.0,988890.0,5130.0,,1.0035,0.02
227,51,VEN,Venezuela,Caracas,South America,28301696.0,28490453.0,30529716.0,28715022.0,24232800.5,19750579.0,15210443.0,11355475.0,,30.882,1.0036,0.35
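The remaining gaps are in `Area (km²)`, `Density (per km²)` and `Growth Rate`. In the rows shown earlier, density matches `2022 Population` divided by `Area (km²)` (for Afghanistan, 41128771 / 652230 ≈ 63.0587), so one possible way to fill some of these gaps is to derive each value from the other two. The sketch below is only an illustration of that idea on three rows copied from the outputs above; it assumes the relationship holds for every row and is not necessarily the approach taken in the rest of the analysis.

```python
import numpy as np
import pandas as pd

# Three rows copied from the notebook outputs; NaN marks the missing cells.
sample = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Ivory Coast", "Venezuela"],
        "2022 Population": [41128771.0, 28160542.0, 28301696.0],
        "Area (km²)": [652230.0, 322463.0, np.nan],
        "Density (per km²)": [63.0587, np.nan, 30.882],
    }
)

pop, area, dens = "2022 Population", "Area (km²)", "Density (per km²)"

# Assumption: density = 2022 population / area.
# Fill missing density first, then recover missing area from density.
sample[dens] = sample[dens].fillna(sample[pop] / sample[area])
sample[area] = sample[area].fillna(sample[pop] / sample[dens])

print(sample)
```

This only works when one of the two values is present. A row missing both area and density (Senegal, in the output above) stays incomplete, and `Growth Rate` cannot be recovered this way.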


In [16]:
df.columns

Index(['Rank', 'CCA3', 'Country', 'Capital', 'Continent', '2022 Population',
       '2020 Population', '2015 Population', '2010 Population',
       '2000 Population', '1990 Population', '1980 Population',
       '1970 Population', 'Area (km²)', 'Density (per km²)', 'Growth Rate',
       'World Population Percentage'],
      dtype='object')