About Dataset
This dataset contains simulated values for a set of key economic, social, and environmental indicators for 20 countries from 2010 to 2019.
The values are synthetic: they are designed to resemble typical World Bank metrics, which are used for analysis, policy-making, and forecasting. The dataset includes the following variables:

Country Name: The country for which the data is recorded.
Year: The specific year of the observation (from 2010 to 2019).
GDP (USD): Gross Domestic Product in US dollars, indicating the economic output of a country.
Population: The total population of the country (number of people).
Life Expectancy (in years): The average life expectancy at birth for the country’s population.
Unemployment Rate (%): The percentage of the total labor force that is unemployed but actively seeking employment.
CO2 Emissions (metric tons per capita): The per capita carbon dioxide emissions, reflecting environmental impact.
Access to Electricity (% of population): The percentage of the population with access to electricity, representing infrastructure development.
Country:

Description: Name of the country for which the data is recorded.
Data Type: String
Example: "United States", "India", "Brazil"
Year:

Description: The year in which the data is observed.
Data Type: Integer
Range: 2010 to 2019
Example: 2012, 2015
GDP (USD):

Description: The Gross Domestic Product of the country in US dollars, indicating the economic output.
Data Type: Float (USD)
Example: 1493220000000.0 (about 1,493.22 billion USD)
Population:

Description: The total population of the country, in number of people.
Data Type: Float (number of people)
Example: 829020000.0 (about 829.02 million people)
Life Expectancy (in years):

Description: The average number of years a newborn is expected to live, assuming that current mortality rates remain constant throughout their life.
Data Type: Float (years)
Range: Typically between 50 and 85 years
Example: 78.5 years
Unemployment Rate (%):

Description: The percentage of the total labor force that is unemployed but actively seeking employment.
Data Type: Float (percentage)
Range: Typically between 2% and 25%
Example: 6.25%
CO2 Emissions (metric tons per capita):

Description: The amount of carbon dioxide emissions per person in the country, measured in metric tons.
Data Type: Float (metric tons)
Range: Typically between 0.5 and 20 metric tons per capita
Example: 4.32 metric tons per capita
Access to Electricity (% of population):

Description: The percentage of the population with access to electricity.
Data Type: Float (percentage)
Range: Typically between 50% and 100%
Example: 95.7%
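
The ranges listed in the data dictionary can be turned into a simple sanity check. The following is a minimal sketch, not part of the original notebook: the helper `out_of_range`, the `EXPECTED_RANGES` mapping, and the snake_case column names are assumptions made for illustration, and the second row of `sample` is made up on purpose to violate two ranges.

```python
import pandas as pd

# "Typical" ranges taken from the data dictionary above.
# Column names are assumed snake_case names, not the CSV's own headers.
EXPECTED_RANGES = {
    "year": (2010, 2019),
    "life_expectancy": (50, 85),
    "unemployment_rate": (2, 25),
    "co2": (0.5, 20),
    "access_electricity": (50, 100),
}


def out_of_range(df, ranges=EXPECTED_RANGES):
    """Count, per column, the rows that fall outside the documented range."""
    counts = {}
    for col, (low, high) in ranges.items():
        # between() is inclusive on both ends; missing values count as out of range.
        counts[col] = int((~df[col].between(low, high)).sum())
    return counts


# Row 1 uses the example values from the data dictionary.
# Row 2 is invented to break the life expectancy and unemployment ranges.
sample = pd.DataFrame({
    "year": [2012, 2015],
    "life_expectancy": [78.5, 45.0],
    "unemployment_rate": [6.25, 30.0],
    "co2": [4.32, 4.32],
    "access_electricity": [95.7, 95.7],
})

report = out_of_range(sample)
print(report)
```

Because the ranges are described as "typical", a non-zero count is a prompt to look at the rows in question rather than proof of an error.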

In [1]:
import pandas as pd
import plotly.express as px

# Data import

In [13]:
# Short snake_case column names; header=0 tells pandas to discard the file's own header row.
colnames = ['country', 'year', 'gdp', 'population', 'life_expectancy', 'unemployment_rate', 'co2', 'access_electricity']
data_word_bank = pd.read_csv("../data/world_bank_dataset.csv", names=colnames, header=0)

data_word_bank.head()

   country  year               gdp   population  life_expectancy  unemployment_rate    co2  access_electricity
0   Brazil  2010   1493220000000.0  829020000.0             66.7               3.81  10.79               76.76
1    Japan  2011  17562700000000.0  897010000.0             61.4              17.98  15.67               67.86
2    India  2012  16426880000000.0  669850000.0             69.1              16.02   2.08               81.08
3   Mexico  2013  11890010000000.0  113800000.0             80.1               6.26  19.13               53.46
4    India  2014   2673020000000.0   29710000.0             62.7                3.1  15.66               82.17
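
The `gdp` and `population` values printed above are on a raw scale (US dollars and head count). The sketch below shows one way to rescale them to billions of USD and millions of people for easier reading. The column names `gdp_billion_usd` and `population_million` are assumptions made for this example, and the three rows are copied by hand from the output above.

```python
import pandas as pd

# First three rows of the head() output, copied by hand for illustration.
head = pd.DataFrame({
    "country": ["Brazil", "Japan", "India"],
    "year": [2010, 2011, 2012],
    "gdp": [1493220000000.0, 17562700000000.0, 16426880000000.0],
    "population": [829020000.0, 897010000.0, 669850000.0],
})

# assign() returns a new frame and leaves the raw columns untouched.
scaled = head.assign(
    gdp_billion_usd=head["gdp"] / 1e9,
    population_million=head["population"] / 1e6,
)

print(scaled[["country", "year", "gdp_billion_usd", "population_million"]])
```

Keeping the raw columns next to the rescaled ones makes it easy to check which unit a later calculation is using.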


In [16]:
data_word_bank.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   country             200 non-null    object 
 1   year                200 non-null    int64  
 2   gdp                 200 non-null    float64
 3   population          200 non-null    float64
 4   life_expectancy     200 non-null    float64
 5   unemployment_rate   200 non-null    float64
 6   co2                 200 non-null    float64
 7   access_electricity  200 non-null    float64
dtypes: float64(6), int64(1), object(1)
memory usage: 12.6+ KB
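
The `info()` output shows 200 rows and no missing values, which is consistent with 20 countries over 10 years, but the row count alone does not confirm that each country-year pair appears exactly once. A minimal sketch of that check follows, run on a small hand-copied sample rather than the full file; the variable names are assumptions for this example.

```python
import pandas as pd

# Rows copied by hand from the head() output; the real frame has 200 rows.
panel = pd.DataFrame({
    "country": ["Brazil", "Japan", "India", "Mexico", "India"],
    "year": [2010, 2011, 2012, 2013, 2014],
    "life_expectancy": [66.7, 61.4, 69.1, 80.1, 62.7],
})

# Total number of missing cells across all columns.
n_missing = int(panel.isna().sum().sum())

# Rows that repeat an earlier (country, year) pair.
n_duplicates = int(panel.duplicated(subset=["country", "year"]).sum())

# Number of distinct years observed for each country.
years_per_country = panel.groupby("country")["year"].nunique()

print(n_missing, n_duplicates)
print(years_per_country)
```

On the full dataset, a balanced panel would show the same number of distinct years for every country and no duplicated pairs.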


In [15]:
data_word_bank.describe()

           year               gdp    population  life_expectancy  unemployment_rate       co2  access_electricity
count     200.0             200.0         200.0            200.0              200.0     200.0               200.0
mean     2014.5  10568670000000.0   738790800.0          66.3245           13.27165   10.0582            72.87675
std    2.879489   5547703000000.0   438995600.0         9.818859           6.804166  5.712125           14.791291
min      2010.0   1011720000000.0     9970000.0             50.5               2.27      0.81               50.12
25%      2012.0   5774120000000.0   353377500.0           57.775               6.81    4.8825              60.315
50%      2014.5  10506150000000.0   721670000.0             64.9              13.47     9.745               70.28
75%      2017.0  15034510000000.0  1090860000.0             74.5            18.5425   15.6225             86.2975
max      2019.0  19983770000000.0  1498060000.0             84.9              24.79     19.84               99.76
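
`describe()` pools all countries and years into one summary. A per-country view is often a useful next step; the sketch below shows one possible way to get it with `groupby` and `agg`. It is run on five rows copied by hand from the `head()` output, and the choice of columns and statistics is an assumption made for illustration.

```python
import pandas as pd

# Five rows copied by hand from the head() output shown earlier.
sample = pd.DataFrame({
    "country": ["Brazil", "Japan", "India", "Mexico", "India"],
    "life_expectancy": [66.7, 61.4, 69.1, 80.1, 62.7],
    "unemployment_rate": [3.81, 17.98, 16.02, 6.26, 3.1],
})

# One row per country; columns become a (variable, statistic) MultiIndex.
per_country = (
    sample.groupby("country")[["life_expectancy", "unemployment_rate"]]
    .agg(["mean", "min", "max"])
)

print(per_country)
```

A single cell can then be read with a tuple column key, for example `per_country.loc["India", ("life_expectancy", "mean")]`.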
