# Project 1

Before code:

* The dataset I use is "The Global Findex Database", downloaded from https://www.worldbank.org/en/publication/globalfindex/download-data

* The column I concentrate on is "account_t_d", defined as "the percentage of respondents who report having an account at a bank or similar financial institution or report personally using a mobile money service in the past year." In this file the values are stored as fractions between 0 and 1, so 0.61 means 61%.


1. Using Pandas

a. Read in the Data

In [None]:
# import library
import pandas as pd

# read in the data
df = pd.read_csv("GlobalFindexDatabase2025.csv")
df.info()

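# clean the data: read_csv already parses "NA" as NaN by default,
# so drop the missing values explicitly instead of comparing to the string "NA"
df = df.dropna(subset=["account_t_d"])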

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8564 entries, 0 to 8563
Columns: 437 entries, countrynewwb to con32h_s
dtypes: float64(430), int64(1), object(6)
memory usage: 28.6+ MB
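
One point worth checking about the cleaning step: with its default settings, `pd.read_csv` converts the literal string `NA` to `NaN` while reading, so the column arrives as `float64` and a comparison against the string `"NA"` filters nothing. The sketch below shows this on a made-up three-row CSV (the values are illustrative, not taken from the Findex file).

```python
import io
import pandas as pd

# Tiny made-up CSV standing in for the real file (illustrative values only)
sample_csv = io.StringIO("year,account_t_d\n2011,0.5\n2014,NA\n2017,0.7\n")
sample = pd.read_csv(sample_csv)

# "NA" was parsed as NaN, so the column is numeric rather than text
print(sample["account_t_d"].dtype)
print(sample["account_t_d"].isna().sum())

# dropna removes the rows with a missing value in this column
cleaned = sample.dropna(subset=["account_t_d"])
print(len(cleaned))
```

Because `mean()`, `median()`, and `mode()` skip `NaN` by default, the statistics come out the same either way. Dropping the rows simply makes the cleaning explicit.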



b. Compute the mean, the median, and the mode (using pandas)

In [41]:
# Calculate

mean = df["account_t_d"].mean()
median = df["account_t_d"].median()
mode = df["account_t_d"].mode()

# Check
print(f"mean: {mean:.4f}")
print(f"median: {median:.4f}")
print(f"mode: {mode.tolist()}")

mean: 0.6085
median: 0.6189
mode: [1.0]
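
`Series.mode()` returns a Series rather than a single number because several values can tie for most frequent, which is why the result above is printed with `.tolist()`. A small sketch on made-up numbers:

```python
import pandas as pd

# Made-up values in which 1 and 2 each appear twice
toy = pd.Series([1, 1, 2, 2, 3])

# mode() keeps every value that reaches the highest count, in sorted order
toy_modes = toy.mode().tolist()
print(toy_modes)  # [1, 2]
```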


2. The hard way

a. Read in the Data

In [None]:
import csv

# information setup
filename = "GlobalFindexDatabase2025.csv"
column_name = "account_t_d"
values = []

# read csv
with open(filename, newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        # clean data
        if row[column_name] != "NA":
            values.append(row)

cleaned_column = [row[column_name] for row in values]
# str to float
cleaned_column = [float(x) for x in cleaned_column]
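
The loop above assumes every missing value is spelled exactly `NA`. If a file also contains empty cells or stray text, `float(x)` raises a `ValueError`. The following is a minimal sketch of a more forgiving conversion; `to_float_or_none` is a hypothetical helper name and the CSV text is made up for illustration.

```python
import csv
import io


def to_float_or_none(text):
    """Return float(text), or None when the cell is empty, "NA", or not numeric."""
    text = text.strip()
    if text in ("", "NA"):
        return None
    try:
        return float(text)
    except ValueError:
        return None


# Made-up CSV with one "NA" cell and one empty cell
sample_csv = io.StringIO("year,account_t_d\n2011,0.5\n2014,NA\n2017,\n2021,0.7\n")
reader = csv.DictReader(sample_csv)

parsed = [to_float_or_none(row["account_t_d"]) for row in reader]
sample_values = [v for v in parsed if v is not None]
print(sample_values)  # [0.5, 0.7]
```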


b. Compute the mean, the median, and the mode

In [None]:
# mean
def calculate_mean(data):
    mean = sum(data) / len(data)
    return mean


# median
def calculate_median(data):
    sorted_data = sorted(data)
    n = len(sorted_data)

    if n % 2 == 1:
        # odd
        return sorted_data[n // 2]
    else:
        # even
        mid1 = sorted_data[n // 2 - 1]
        mid2 = sorted_data[n // 2]
        return (mid1 + mid2) / 2


# mode
def calculate_mode(data):
    # calculate freq
    frequency = {}
    for value in data:
        frequency[value] = frequency.get(value, 0) + 1

    max_freq = max(frequency.values())

    # in case more than 1 mode
    modes = [value for value, freq in frequency.items() if freq == max_freq]

    return modes, max_freq


mean = calculate_mean(cleaned_column)
median = calculate_median(cleaned_column)
modes, freq = calculate_mode(cleaned_column)

print(f"mean: {mean:.4f}")
print(f"median: {median:.4f}")
print(f"mode: {modes} (freq: {freq})")

mean: 0.6085
median: 0.6189
mode: [1.0] (freq: 186)
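
As a sanity check that needs no pandas, the same three statistics are available in Python's standard `statistics` module. The sketch below runs on a short made-up list rather than the Findex column. `statistics.multimode` returns every value tied for most frequent, like `calculate_mode` above.

```python
import statistics

# Made-up sample; 0.5 and 1.0 each appear twice
sample = [0.25, 0.5, 0.5, 1.0, 1.0, 0.75]

sample_mean = statistics.mean(sample)
sample_median = statistics.median(sample)      # average of the two middle values
sample_modes = statistics.multimode(sample)    # all values with the highest count

print(f"mean: {sample_mean:.4f}")
print(f"median: {sample_median:.4f}")
print(f"mode: {sample_modes}")
```

Passing `cleaned_column` to these functions in place of `sample` should reproduce the figures printed above.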


3. Data Visualization 💳

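To see how account ownership changes over time, I first compute the mean of "account_t_d" for each survey year, then print a text bar chart in which each card symbol stands for 5 percentage points (rounded down). The mean is unweighted and taken over all rows available for a year, so it depends on which economies were surveyed that year.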

In [51]:
# group by year and take the mean of account_t_d
sum_dict = df.groupby("year")["account_t_d"].mean().to_dict()
sum_dict

{2011: 0.4937317067288967,
 2014: 0.5550598447226154,
 2017: 0.6044045269927819,
 2021: 0.6985940248406766,
 2022: 0.38043651768750003,
 2024: 0.6980778228617868}
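
Since each yearly figure is a plain mean over the rows present for that year, it helps to see how many rows stand behind each value. A minimal sketch on a made-up frame follows. The column names match the notebook, but the numbers are illustrative only.

```python
import pandas as pd

# Made-up data: three rows for 2011, a single row for 2022
toy = pd.DataFrame({
    "year": [2011, 2011, 2011, 2022],
    "account_t_d": [0.4, 0.5, 0.6, 0.3],
})

# mean and number of non-missing observations per year, side by side
summary = toy.groupby("year")["account_t_d"].agg(["mean", "count"])
print(summary)
```

Running the same `agg(["mean", "count"])` on `df` would show whether a year that looks unusual is based on noticeably fewer rows.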

In [55]:
print("World Account Percentage, in 5%")

for year, total in sorted(sum_dict.items()):
    cards = "💳" * int(total // 0.05)
    print(f"{year}: {cards}")

World Account Percentage, in 5%
2011: 💳💳💳💳💳💳💳💳💳
2014: 💳💳💳💳💳💳💳💳💳💳💳
2017: 💳💳💳💳💳💳💳💳💳💳💳💳
2021: 💳💳💳💳💳💳💳💳💳💳💳💳💳
2022: 💳💳💳💳💳💳💳
2024: 💳💳💳💳💳💳💳💳💳💳💳💳💳
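
The bar length comes from floor division: each symbol stands for one full 0.05 step, and any remainder is dropped. The same logic can be wrapped in a small function, sketched below. `card_bar` is a hypothetical helper name, not part of the notebook.

```python
def card_bar(share, step=0.05, symbol="💳"):
    """Return one symbol per full `step` contained in `share` (rounded down)."""
    return symbol * int(share // step)


# Two of the yearly means computed earlier
bar_2011 = card_bar(0.4937317067288967)
bar_2022 = card_bar(0.38043651768750003)

print(f"2011: {bar_2011}")
print(f"2022: {bar_2022}")
```

Because the division is done in floating point, a share that sits exactly on a step boundary may come out one symbol short. Rounding the share before dividing is one way to avoid that.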
