# **World Development Indicators Analysis**

## **Introduction**

This project analyzes a wide range of socioeconomic, demographic, environmental, and development-related statistics for countries around the world, using the World Development Indicators dataset compiled by the World Bank. The dataset covers many aspects of development, including economic growth, poverty, education, health, environment, and governance. The analysis aims to reveal global trends, regional disparities, and the progress of individual countries. The dataset is static and covers the years 1960 to 2015.

The dataset is available in two file formats, CSV and SQLite, but this analysis uses only the SQLite file.

### **Data source**

To download and organize the World Development Indicators dataset, run the "get_dataset.py" script.

In [1]:
# Libraries

import sqlite3
import pandas as pd

In [2]:
# Creates a connection to the World Development Indicators database.

wdi_db_path = r"C:\Users\DSL\Documents\python\W.D.I. Analysis\W.D.I. Dataset\indicators.sqlite"
db_connection = sqlite3.connect(wdi_db_path)
cursor = db_connection.cursor()

def connect_db():
    """Test the connection by listing the tables; return an empty list on failure."""
    try:
        cursor.execute("SELECT * FROM sqlite_master WHERE type='table';")
        tables = cursor.fetchall()
        print("Database connection successful...")
        return tables
    except sqlite3.Error as error:
        print('Error occurred - ', error)
        return []

all_tables = connect_db()
# Remember to close the connection with db_connection.close() once the analysis is finished.

Database connection successful...
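
The reminder to close the connection can be made automatic. The following is a minimal sketch, assuming a helper named `list_tables` (hypothetical, not part of this notebook) and an in-memory database standing in for the real "indicators.sqlite" path. It wraps the connection in `contextlib.closing` so that it is closed even when the query fails.

```python
import sqlite3
from contextlib import closing

def list_tables(db_path):
    """Return the names of all tables in the SQLite database at db_path."""
    # closing() guarantees connection.close() runs even if the query raises.
    with closing(sqlite3.connect(db_path)) as connection:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;"
        ).fetchall()
    return [name for (name,) in rows]

# ":memory:" is a stand-in; pass the real database path in the notebook.
print(list_tables(":memory:"))
```

Note that `sqlite3.connect` creates an empty database file when the path does not exist, so an empty table list is a sign that the path may be wrong.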


In [3]:
# Prints the total number of rows and columns for each table in the database.

def db_totals():
    print(f"Table\t\t\t Total Rows \t\t\tTotal Columns\n{'-' * 75}")
    for tables in all_tables:
        table_name = tables[1]
        total_rows_query = f"SELECT COUNT(*) from {table_name};"
        cursor.execute(total_rows_query)
        total_rows = cursor.fetchone()[0]
        cursor.execute(f"PRAGMA table_info({table_name})")
        columns = cursor.fetchall()
        num_columns = len(columns)
        print(f"{table_name:<20} {total_rows:>20} {num_columns:>20}")

db_totals()


Table			 Total Rows 			Total Columns
---------------------------------------------------------------------------
Country                               247                   31
CountryNotes                         4857                    3
Series                               1345                   20
Indicators                        5656458                    6
SeriesNotes                           369                    3
Footnotes                          532415                    4
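
The same totals can also be returned as data instead of printed, which makes them easier to reuse later in the analysis. The sketch below assumes a hypothetical helper `table_totals` and a toy in-memory table in place of the real database. It also quotes each table name as an identifier before placing it in the SQL string.

```python
import sqlite3

def table_totals(connection):
    """Return (table name, row count, column count) for every table."""
    names = [
        name for (name,) in connection.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;"
        )
    ]
    totals = []
    for name in names:
        # Double quotes mark the name as an identifier; embedded quotes are doubled.
        quoted = '"' + name.replace('"', '""') + '"'
        row_count = connection.execute(f"SELECT COUNT(*) FROM {quoted};").fetchone()[0]
        column_count = len(connection.execute(f"PRAGMA table_info({quoted});").fetchall())
        totals.append((name, row_count, column_count))
    return totals

# Toy in-memory database standing in for indicators.sqlite (illustrative data only).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Country (CountryCode TEXT, ShortName TEXT)")
demo.executemany("INSERT INTO Country VALUES (?, ?)", [("KEN", "Kenya"), ("PER", "Peru")])
for name, row_count, column_count in table_totals(demo):
    print(f"{name:<20} {row_count:>20} {column_count:>20}")
demo.close()
```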


In [4]:
# Prompts for a table name and displays the column information for the relevant columns of that table.
selected_tables = ["Country", "CountryNotes", "Indicators", "Series"]
tables_columns = [
        {
            "Country": ["CountryCode", "ShortName", "Region", "IncomeGroup"],
            "CountryNotes": ["Countrycode", "Seriescode", "Description"],
            "Indicators": ["CountryName", "CountryCode", "IndicatorName", "IndicatorCode", "Year", "Value"],
            "Series": ["SeriesCode", "Topic", "IndicatorName", "LongDefinition"]
        }
    ]
print(selected_tables, "\n")

def view_relevant_data():
    table_name = input("Table name: ").strip()
    
    if table_name in selected_tables:
        cursor.execute(f"PRAGMA table_info({table_name});")
        result = cursor.fetchall()

        print(f"{table_name} Table")
        for column in result:
            if column[1] in tables_columns[0][table_name]:
                print(column)
    else:
        print(f"Unknown table: {table_name!r}. Choose one of {selected_tables}.")

view_relevant_data()


['Country', 'CountryNotes', 'Indicators', 'Series'] 

Indicators Table
(0, 'CountryName', 'TEXT', 0, None, 0)
(1, 'CountryCode', 'TEXT', 0, None, 0)
(2, 'IndicatorName', 'TEXT', 0, None, 0)
(3, 'IndicatorCode', 'TEXT', 0, None, 0)
(4, 'Year', 'INTEGER', 0, None, 0)
(5, 'Value', 'NUMERIC', 0, None, 0)
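
Because the table name comes from user input, a small typo or a difference in letter case causes the lookup to fail. One possible way to handle this is sketched below, assuming a hypothetical helper `resolve_table_name` that maps the input to a known table name and rejects anything else.

```python
def resolve_table_name(user_input, allowed_tables):
    """Map user input to a known table name, ignoring case and surrounding spaces."""
    lookup = {name.lower(): name for name in allowed_tables}
    key = user_input.strip().lower()
    if key not in lookup:
        raise ValueError(
            f"Unknown table {user_input!r}; expected one of {sorted(lookup.values())}"
        )
    return lookup[key]

selected_tables = ["Country", "CountryNotes", "Indicators", "Series"]
print(resolve_table_name("  indicators ", selected_tables))  # prints: Indicators
```

Only the canonical name returned by the helper would then be used in a query, so arbitrary input never reaches the SQL string.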


In [5]:
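table_name = input("Table name: ").strip()

# Only allow known tables, since the name is interpolated into the SQL string.
if table_name not in selected_tables:
    raise ValueError(f"Unknown table: {table_name!r}")

available_columns = tables_columns[0][table_name]
query = f"SELECT {', '.join(available_columns)} FROM {table_name};"
df = pd.read_sql_query(query, db_connection)
random_sample = df.sample(n=10) # Random sample of 10 rows from the DataFrame
random_sample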



|  | SeriesCode | Topic | IndicatorName | LongDefinition |
| --- | --- | --- | --- | --- |
| 1022 | TX.VAL.MRCH.R1.ZS | Private Sector & Trade: Exports | Merchandise exports to developing economies in... | Merchandise exports to developing economies in... |
| 1128 | GC.BAL.CASH.CN | Public Sector: Government finance: Deficit & f... | Cash surplus/deficit (current LCU) | Cash surplus or deficit is revenue (including ... |
| 173 | DT.DIS.BLTC.CD | Economic Policy & Debt: External debt: Disburs... | PPG, bilateral concessional (DIS, current US$) | Bilateral debt includes loans from governments... |
| 271 | NY.ADJ.DCO2.GN.ZS | Economic Policy & Debt: National accounts: Adj... | Adjusted savings: carbon dioxide damage (% of ... | Carbon dioxide damage is estimated to be $20 p... |
| 757 | NY.GDP.DEFL.ZS | Financial Sector: Exchange rates & prices | GDP deflator (base year varies by country) | The GDP implicit deflator is the ratio of GDP ... |
| 370 | NE.CON.GOVT.ZS | Economic Policy & Debt: National accounts: Sha... | General government final consumption expenditu... | General government final consumption expenditu... |
| 834 | SP.DYN.TO65.MA.ZS | Health: Mortality | Survival to age 65, male (% of cohort) | Survival to age 65 refers to the percentage of... |
| 679 | EG.NSF.ACCS.UR.ZS | Environment: Energy production & use | Access to non-solid fuel, urban (% of urban po... | Access to non-solid fuel, urban is the percent... |
| 526 | SE.SEC.REPT.ZS | Education: Efficiency | Repeaters, secondary, total (% of total enroll... | Repeaters in secondary school are the number o... |
| 376 | NE.DAB.DEFL.ZS | Economic Policy & Debt: National accounts: Sha... | Gross national expenditure deflator (base year... | Gross national expenditure (formerly domestic ... |
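
Loading a whole table into a DataFrame only to keep 10 rows can be expensive for the Indicators table, which holds more than 5.6 million rows. As an alternative, the sampling can be done inside SQLite. The sketch below assumes a hypothetical helper `sample_rows` and a toy in-memory table with made-up values. It uses `ORDER BY RANDOM() LIMIT ?`, which still scans the table but returns only the sampled rows to Python.

```python
import sqlite3

def sample_rows(connection, table_name, columns, n=10):
    """Return up to n randomly chosen rows, sampled inside SQLite."""
    def quote(identifier):
        # Quote as an SQL identifier; embedded double quotes are doubled.
        return '"' + identifier.replace('"', '""') + '"'

    column_list = ", ".join(quote(column) for column in columns)
    query = f"SELECT {column_list} FROM {quote(table_name)} ORDER BY RANDOM() LIMIT ?;"
    return connection.execute(query, (n,)).fetchall()

# Toy in-memory table standing in for the real Indicators table (made-up values).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Indicators (CountryCode TEXT, Year INTEGER, Value NUMERIC)")
demo.executemany(
    "INSERT INTO Indicators VALUES (?, ?, ?)",
    [("KEN", 1960 + i, float(i)) for i in range(50)],
)
sample = sample_rows(demo, "Indicators", ["CountryCode", "Year"], n=10)
print(len(sample))
demo.close()
```

The table and column names should still come from a fixed list such as `selected_tables` and `tables_columns`, since quoting alone does not check that they exist.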
