# Global Countries Socio-Economic Analysis

### Exploratory Data Analysis Using Pandas

This project explores global demographic, political, and governance indicators 
across 194 countries using Python and Pandas.

## Project Objective

The goal of this analysis is to explore global country-level data and answer key 
questions related to:

- Population distribution
- Political leadership
- Democracy scores
- Regional distribution
- Naming conventions of countries

The project focuses on performing structured exploratory data analysis (EDA).

---

In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.read_csv("../data/Countries.csv")
df.head()

Unnamed: 0,country,country_long,currency,capital_city,region,continent,demonym,latitude,longitude,agricultural_land,...,population,women_parliament_seats_pct,rural_population,urban_population,press,democracy_score,democracy_type,median_age,political_leader,title
0,Afghanistan,Islamic State of Afghanistan,Afghan afghani,Kabul,Southern Asia,Asia,Afghan,33.0,65.0,383560.0,...,41128771,27.0161,30181937,10946834,2.14,2.97,Authoritarian,12.9,Ashraf Ghani,President
1,Albania,Republic of Albania,Albanian lek,Tirana,Southern Europe,Europe,Albanian,41.0,20.0,11655.5,...,2775634,35.7143,1004807,1770827,2.62,5.98,Hybrid regime,33.7,Edi Rama,Prime Minister
2,Algeria,People's Democratic Republic of Algeria,Algerian dinar,Algiers,Northern Africa,Africa,Algerian,28.0,3.0,413588.0,...,44903225,8.10811,11328186,33575039,1.71,3.5,Authoritarian,24.0,Abdelmadjid Tebboune,President
3,Andorra,Principality of Andorra,Euro,Andorra la Vella,Southern Europe,Europe,Andorran,42.5,1.5,187.2,...,79824,46.4286,9730,70094,3.17,0.0,Unknown,38.9,Xavier Espot Zamora,Head of Government
4,Angola,People's Republic of Angola,Angolan kwanza,Luanda,Middle Africa,Africa,Angolan,-12.5,18.5,569525.0,...,35588987,33.6364,11359649,24229338,2.24,3.62,Authoritarian,12.4,João Lourenço,President


In [3]:
df.shape

(194, 64)

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 194 entries, 0 to 193
Data columns (total 64 columns):
 #   Column                                   Non-Null Count  Dtype  
---  ------                                   --------------  -----  
 0   country                                  194 non-null    object 
 1   country_long                             194 non-null    object 
 2   currency                                 194 non-null    object 
 3   capital_city                             194 non-null    object 
 4   region                                   194 non-null    object 
 5   continent                                194 non-null    object 
 6   demonym                                  194 non-null    object 
 7   latitude                                 194 non-null    float64
 8   longitude                                194 non-null    float64
 9   agricultural_land                        193 non-null    float64
 10  forest_area                              194 non-n

In [5]:
df.describe()


Unnamed: 0,latitude,longitude,agricultural_land,forest_area,land_area,rural_land,urban_land,central_government_debt_pct_gdp,expense_pct_gdp,gdp,...,net_migration,population_female,population_male,population,women_parliament_seats_pct,rural_population,urban_population,press,democracy_score,median_age
count,194.0,194.0,193.0,194.0,194.0,194.0,194.0,120.0,156.0,193.0,...,194.0,194.0,194.0,194.0,193.0,194.0,194.0,194.0,194.0,194.0
mean,18.975601,22.027491,245455.1,208678.4,667508.7,656371.1,9777.116531,66.759366,30.051403,514485100000.0,...,-51.407216,20283160.0,20505720.0,40788880.0,25.022994,17633220.0,23155660.0,2.53933,4.644536,25.661856
std,23.876225,66.396389,635626.8,782492.6,1837107.0,1811169.0,42301.458421,71.806247,26.74088,2307148000000.0,...,94525.968598,72589410.0,76071580.0,148647000.0,12.671044,76641870.0,79403930.0,1.800128,2.818297,9.415569
min,-41.0,-175.0,4.0,0.0,2.027,0.0349545,0.0,0.0,0.000267,60349400.0,...,-525116.0,5513.0,5799.0,11312.0,0.0,0.0,5717.0,0.0,0.0,10.5
25%,4.0,-5.0,6464.0,3331.775,23552.5,21865.62,359.61825,31.9513,18.371875,11813900000.0,...,-12242.25,1036218.0,1044902.0,2106358.0,15.3846,589664.0,1222244.0,1.525,2.7225,16.95
50%,16.583333,21.5,38727.8,25289.25,120375.0,115994.5,1645.17,55.42685,27.33775,41153900000.0,...,-970.0,4502713.0,4450049.0,9125614.0,25.2525,2512382.0,4508837.0,2.4,5.05,24.95
75%,40.0,50.1625,215000.0,123673.5,523700.0,491150.5,4054.0225,79.5393,35.0835,251945000000.0,...,2904.25,15266060.0,14788720.0,30313610.0,33.6364,11333540.0,16213550.0,2.925,6.9675,34.05
max,65.0,178.0,5285080.0,8153120.0,16376900.0,16224200.0,522345.0,687.994,310.443,25462700000000.0,...,561580.0,691528500.0,731180500.0,1417173000.0,61.25,908804800.0,897578400.0,10.0,9.87,50.5


In [6]:
df.isnull().sum()

country             0
country_long        0
currency            0
capital_city        0
region              0
                   ..
democracy_score     0
democracy_type      0
median_age          0
political_leader    7
title               7
Length: 64, dtype: int64

### Observation:
- The dataset contains 194 rows (one per country) and 64 columns.
- A few columns contain missing values: `political_leader` and `title` each have 7, and `agricultural_land` has 1.
- Data includes demographic, political, economic, and regional indicators.
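
The `df.isnull().sum()` output above is truncated to its first and last rows, so most of the 64 columns are hidden. One way to list only the columns that actually have gaps is sketched below. It runs on a small hypothetical frame (`sample`) rather than `Countries.csv`, and the helper name `missing_summary` is an assumption, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset; values are illustrative only.
sample = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "gdp": [1.0, np.nan, 3.0, 4.0],
    "political_leader": ["X", None, None, "Y"],
})

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values for columns that have any."""
    counts = frame.isna().sum()
    counts = counts[counts > 0].sort_values(ascending=False)
    return pd.DataFrame({
        "missing": counts,
        "missing_pct": (counts / len(frame) * 100).round(1),
    })

summary = missing_summary(sample)
print(summary)
```

Calling `missing_summary(df)` on the full dataset should give the same kind of table for all 64 columns.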

---

## Section 1: Demographic Analysis

### Country with the highest population

In [7]:
highest_population = df[df["population"] == df["population"].max()]
highest_population[["country", "population", "capital_city"]]

Unnamed: 0,country,population,capital_city
75,India,1417173173,New Delhi


### Country with the lowest population

In [8]:
lowest_population = df[df["population"] == df["population"].min()]
lowest_population[["country", "population", "capital_city"]]

Unnamed: 0,country,population,capital_city
179,Tuvalu,11312,Funafuti


### Second Most Populous Country and Its Political Leader

In [9]:
# Sort by population, largest first; position 1 is the second most populous country.
by_population = df.sort_values("population", ascending=False)
by_population[["country", "population", "political_leader"]].iloc[1]

country                  China
population          1412175000
political_leader    Xi Jinping
Name: 34, dtype: object

### Africa: Most Populous Country

In [10]:
africa = df[df["continent"] == "Africa"]
africa[africa["population"] == africa["population"].max()]

Unnamed: 0,country,country_long,currency,capital_city,region,continent,demonym,latitude,longitude,agricultural_land,...,population,women_parliament_seats_pct,rural_population,urban_population,press,democracy_score,democracy_type,median_age,political_leader,title
125,Nigeria,Federal Republic of Nigeria,Nigerian naira,Abuja,Western Africa,Africa,Nigerian,10.0,8.0,694500.0,...,218541212,3.61111,101575770,116965442,2.14,4.44,Hybrid regime,13.2,Muhammadu Buhari,President


---

## Section 2: Political & Governance Analysis

### Top 5 Countries by Democracy Score

In [11]:
democratic_score = df.sort_values("democracy_score", ascending=False)
democratic_score.head(5)

Unnamed: 0,country,country_long,currency,capital_city,region,continent,demonym,latitude,longitude,agricultural_land,...,population,women_parliament_seats_pct,rural_population,urban_population,press,democracy_score,democracy_type,median_age,political_leader,title
127,Norway,Kingdom of Norway,Norwegian krone,Oslo,Northern Europe,Europe,Norwegian,62.0,10.0,9859.62,...,5457127,44.9704,891476,4565651,10.0,9.87,Full democracy,35.6,Erna Solberg,Prime Minister
74,Iceland,Republic of Iceland,Iceland krona,Reykjavík,Northern Europe,Europe,Icelander,65.0,-18.0,18720.0,...,381900,47.619,22945,358955,5.32,9.58,Full democracy,32.1,Katrín Jakobsdóttir,Prime Minister
164,Sweden,Kingdom of Sweden,Swedish krona,Stockholm,Northern Europe,Europe,Swedish,62.0,15.0,30055.4,...,10486941,46.4183,1206837,9280104,9.41,9.39,Full democracy,35.6,Stefan Löfven,Prime Minister
122,New Zealand,New Zealand,New Zealand dollar,Wellington,Australia and New Zealand,Oceania,New Zealander,-41.0,174.0,101540.0,...,5124100,50.4202,672077,4452023,7.27,9.26,Full democracy,32.8,Jacinda Ardern,Prime Minister
46,Denmark,Kingdom of Denmark,Danish krone,Copenhagen,Northern Europe,Europe,Danish,56.0,10.0,26199.9,...,5903037,43.5754,686700,5216337,7.92,9.22,Full democracy,37.2,Mette Frederiksen,Prime Minister
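
The ranking above is not affected, but the earlier output shows Andorra with a `democracy_score` of 0.0 and a `democracy_type` of `Unknown`, which suggests that 0.0 may be a placeholder for a missing score rather than a real one. The sketch below assumes that reading and shows how such rows could be masked before computing averages. The frame `sample` reuses three rows from the outputs above, and the name `score_clean` is hypothetical.

```python
import pandas as pd

# Three rows taken from the outputs above (Andorra, Norway, Nigeria).
sample = pd.DataFrame({
    "country": ["Andorra", "Norway", "Nigeria"],
    "democracy_score": [0.0, 9.87, 4.44],
    "democracy_type": ["Unknown", "Full democracy", "Hybrid regime"],
})

# Assumption: a score is really missing wherever the type is "Unknown".
score_clean = sample["democracy_score"].mask(sample["democracy_type"] == "Unknown")

print("mean with placeholder zeros:", sample["democracy_score"].mean())
print("mean with zeros masked:     ", score_clean.mean())
```

If the assumption holds for the full dataset, summary statistics such as the mean reported by `df.describe()` are pulled downward by these placeholder zeros.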


### Countries with Unknown Political Leaders

In [12]:
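# Count rows where the leader is missing. Reusing double quotes inside an
# f-string is a syntax error before Python 3.12, so compute the value first.
unknown_leaders = df["political_leader"].isna().sum()
print(f"There are {unknown_leaders} countries with an unknown political leader.")

There are 7 countries with an unknown political leader.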


### Countries with “Republic” in Their Name

In [13]:
# Count long-form names that contain the literal substring "Republic".
republic_count = df["country_long"].str.contains("Republic", regex=False).sum()
print(f'There are {republic_count} countries that have "Republic" in their name.')

There are 125 countries that have "Republic" in their name.


---

## Section 3: Regional Analysis

### Total Number of Regions

In [14]:
df["region"].nunique()

22
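
The count of 22 says how many regions exist but not how the countries are spread across them. A minimal sketch of that breakdown using `value_counts` follows. The frame `sample` is a hypothetical four-row stand-in for the real `df`, and `countries_per_region` is an illustrative name.

```python
import pandas as pd

# Hypothetical rows; the notebook itself would use df loaded from Countries.csv.
sample = pd.DataFrame({
    "country": ["Poland", "Hungary", "Norway", "Nigeria"],
    "region": ["Eastern Europe", "Eastern Europe", "Northern Europe", "Western Africa"],
})

# Number of countries in each region, largest group first.
countries_per_region = sample["region"].value_counts()
print(countries_per_region)
```

On the full dataset, `df["region"].value_counts()` should return 22 rows, one per region.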

### Countries in Eastern Europe

In [15]:
df[df["region"] == "Eastern Europe"]

Unnamed: 0,country,country_long,currency,capital_city,region,continent,demonym,latitude,longitude,agricultural_land,...,population,women_parliament_seats_pct,rural_population,urban_population,press,democracy_score,democracy_type,median_age,political_leader,title
14,Belarus,Republic of Belarus,Belarusian rubel,Minsk,Eastern Europe,Europe,Belarusian,53.0,28.0,82810.0,...,9208701,40.0,1811720,7396981,1.51,3.13,Authoritarian,36.5,Alexander Lukashenko,President
24,Bulgaria,Republic of Bulgaria,Bulgarian lev,Sofia,Eastern Europe,Europe,Bulgarian,43.0,25.0,50470.0,...,6465097,24.1667,1528155,4936942,2.23,7.03,Flawed democracy,40.7,Boyko Borisov,Prime Minister
43,Czech Republic,Czech Republic,Czech koruna,Prague,Eastern Europe,Europe,Czech,49.75,15.5,35238.7,...,10526073,25.5,2697096,7828977,3.14,7.69,Flawed democracy,38.8,Andrej Babiš,Prime Minister
73,Hungary,Hungary,Hungarian forint,Budapest,Eastern Europe,Europe,Hungarian,47.0,20.0,49030.0,...,9683505,14.0704,2657928,7025577,2.57,6.63,Flawed democracy,38.1,Viktor Orbán,Prime Minister
111,Moldova,Republic of Moldova,Moldovan leu,Chișinău,Eastern Europe,Europe,Moldovan,47.0,29.0,22646.0,...,2592477,40.5941,1473227,1119250,2.51,5.85,Hybrid regime,31.2,Ion Chicu,Prime Minister
136,Poland,Republic of Poland,Polish zloty,Warsaw,Eastern Europe,Europe,Polish,52.0,20.0,144610.0,...,37561599,28.2609,14974307,22587292,2.71,6.67,Flawed democracy,36.3,Mateusz Morawiecki,Prime Minister
139,Romania,Romania,New Romanian leu,Bucharest,Eastern Europe,Europe,Romanian,46.0,25.0,135910.0,...,18956666,19.0909,8627368,10329298,3.05,6.38,Flawed democracy,37.4,Klaus Iohannis,President
140,Russia,Russian Federation,Russian ruble,Moscow,Eastern Europe,Europe,Russian,60.0,100.0,2154940.0,...,143555736,16.2222,35708054,107847682,1.55,2.94,Authoritarian,35.0,Vladimir Putin,President
151,Slovak Republic,Slovak Republic,Euro,Bratislava,Eastern Europe,Europe,Slovak,48.666667,19.5,18830.0,...,5431752,21.3333,2503549,2928203,3.32,7.1,Flawed democracy,36.0,,
181,Ukraine,Ukraine,Ukrainian hryvnia,Kiev,Eastern Europe,Europe,Ukrainian,49.0,32.0,413110.0,...,38000000,20.331,11430780,26569220,2.41,5.69,Hybrid regime,38.9,Volodymyr Zelensky,President


---

## Section 4: Feature Engineering

### Population Density

In [16]:
df["population_density"] = df["population"] / df["land_area"]
df[["country","population","land_area", "population_density"]].head(10)

Unnamed: 0,country,population,land_area,population_density
0,Afghanistan,41128771,652230.0,63.058692
1,Albania,2775634,27400.0,101.300511
2,Algeria,44903225,2381740.0,18.853118
3,Andorra,79824,470.0,169.838298
4,Angola,35588987,1246700.0,28.546552
5,Antigua and Barbuda,93763,440.0,213.097727
6,Argentina,46234830,2736690.0,16.894435
7,Armenia,2780469,28470.0,97.663119
8,Australia,25978935,7692020.0,3.377388
9,Austria,9042528,82520.0,109.579835


### GDP per capita

In [17]:
df["gdp_per_capita"] = df["gdp"] / df["population"]
df[["country", "gdp", "population", "gdp_per_capita"]].head(10)

Unnamed: 0,country,gdp,population,gdp_per_capita
0,Afghanistan,14583100000.0,41128771,354.571742
1,Albania,18882100000.0,2775634,6802.806134
2,Algeria,191913000000.0,44903225,4273.924646
3,Andorra,3352030000.0,79824,41992.75907
4,Angola,106714000000.0,35588987,2998.511871
5,Antigua and Barbuda,1757600000.0,93763,18745.134008
6,Argentina,632770000000.0,46234830,13686.002522
7,Armenia,19502800000.0,2780469,7014.212351
8,Australia,1675420000000.0,25978935,64491.481271
9,Austria,471400000000.0,9042528,52131.43935


### CO₂ Emissions per Capita

In [18]:
df["co2_per_capita"] = df["co2_emissions"] / df["population"]
df[["country", "co2_emissions", "population", "co2_per_capita"]].head(10)

Unnamed: 0,country,co2_emissions,population,co2_per_capita
0,Afghanistan,8709.47,41128771,0.000212
1,Albania,4383.2,2775634,0.001579
2,Algeria,161563.0,44903225,0.003598
3,Andorra,448.884,79824,0.005623
4,Angola,19814.5,35588987,0.000557
5,Antigua and Barbuda,474.6,93763,0.005062
6,Argentina,154536.0,46234830,0.003342
7,Armenia,6746.6,2780469,0.002426
8,Australia,378997.0,25978935,0.014589
9,Austria,59142.4,9042528,0.00654
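
The per-capita values above are very small because they inherit the unit of `co2_emissions`, which this notebook does not state. If that unit is kilotonnes (an assumption to verify against the data source), multiplying by 1,000 gives the more familiar tonnes per person. The sketch below uses the Afghanistan and Australia figures from the output above. The column name `co2_tonnes_per_capita` and the constant `KT_TO_TONNES` are illustrative.

```python
import pandas as pd

# Figures copied from the Afghanistan and Australia rows above.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Australia"],
    "co2_emissions": [8709.47, 378997.0],  # assumed to be kilotonnes
    "population": [41128771, 25978935],
})

KT_TO_TONNES = 1_000  # 1 kilotonne = 1,000 tonnes

sample["co2_tonnes_per_capita"] = (
    sample["co2_emissions"] * KT_TO_TONNES / sample["population"]
)
print(sample[["country", "co2_tonnes_per_capita"]].round(2))
```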


### Renewable Energy Gap

In [19]:
df["renewable_gap"] = df["renewable_energy_consumption_pct"] - df["fossil_energy_consumption_pct"]
df[["country", "renewable_energy_consumption_pct", "fossil_energy_consumption_pct", "renewable_gap"]].head(10)

Unnamed: 0,country,renewable_energy_consumption_pct,fossil_energy_consumption_pct,renewable_gap
0,Afghanistan,17.86,,
1,Albania,44.58,61.4218,-16.8418
2,Algeria,0.15,99.9779,-99.8279
3,Andorra,20.59,,
4,Angola,61.02,48.3056,12.7144
5,Antigua and Barbuda,0.72,0.0,0.72
6,Argentina,9.84,87.7224,-77.8824
7,Armenia,8.38,74.5619,-66.1819
8,Australia,10.89,89.6256,-78.7356
9,Austria,35.77,65.6618,-29.8918
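
The gap is `NaN` wherever `fossil_energy_consumption_pct` is missing, as for Afghanistan and Andorra above, because arithmetic with `NaN` propagates. The sketch below shows one way to flag those rows and to pick out countries where renewables exceed fossil fuels. It reuses three rows from the output above, and the names `gap_missing` and `renewable_led` are hypothetical.

```python
import numpy as np
import pandas as pd

# Values copied from the first rows of the output above.
sample = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Angola"],
    "renewable_energy_consumption_pct": [17.86, 44.58, 61.02],
    "fossil_energy_consumption_pct": [np.nan, 61.4218, 48.3056],
})

sample["renewable_gap"] = (
    sample["renewable_energy_consumption_pct"]
    - sample["fossil_energy_consumption_pct"]
)

# Rows where the gap could not be computed because an input is missing.
gap_missing = sample["renewable_gap"].isna()
print(sample.loc[gap_missing, "country"].tolist())

# Countries where renewables exceed fossil fuels; NaN rows compare as False.
renewable_led = sample.loc[sample["renewable_gap"] > 0, "country"].tolist()
print(renewable_led)
```

Because comparisons with `NaN` evaluate to `False`, rows with a missing gap drop out of filters like this one without being reported, so it helps to count them separately.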


## Saving Processed Dataset

In [20]:
df.to_csv("../data/Countries_processed.csv", index=False)