
Financial inclusion remains one of the main obstacles to economic and human development in Africa. For example, across Kenya, Rwanda, Tanzania, and Uganda only 9.1 million adults (or 14% of adults) have access to or use a commercial bank account.

Traditionally, access to bank accounts has been regarded as an indicator of financial inclusion. Despite the proliferation of mobile money in Africa and the growth of innovative fintech solutions, banks still play a pivotal role in facilitating access to financial services. Access to bank accounts enables households to save and make payments while also helping businesses build up their creditworthiness and improve their access to loans, insurance, and related services. Therefore, access to bank accounts is an essential contributor to long-term economic growth.

The objective of this competition is to create a machine learning model to predict which individuals are most likely to have or use a bank account. The models and solutions developed can provide an indication of the state of financial inclusion in Kenya, Rwanda, Tanzania, and Uganda, while providing insights into some of the key factors driving individuals’ financial security.

Evaluation
The evaluation metric for this challenge is Mean Absolute Error (MAE) between the predicted and actual labels, where 1 indicates that the individual has a bank account and 0 indicates that they do not.
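
Because both the actual labels and the submitted predictions are 0 or 1, MAE here reduces to the fraction of individuals that are misclassified, i.e. 1 minus accuracy. A minimal sketch with made-up labels (the two lists below are illustrative assumptions, not competition data) shows the equivalence; `sklearn.metrics.mean_absolute_error`, as used in the notebook code, computes the same average:

```python
# Illustrative labels only (not competition data): 1 = has a bank account, 0 = does not.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]

n = len(y_true)

# Mean Absolute Error: average of |actual - predicted|.
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

# Accuracy: share of predictions that match the actual label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

print(mae, 1 - accuracy)  # both equal 0.25: 2 of the 8 predictions are wrong
```

Lower is therefore better, and a score of 0 would mean every individual was classified correctly.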

Your submission file should look like:

unique_id                   bank_account
uniqueid_1 x Kenya              1
uniqueid_2 x Kenya              0
uniqueid_3 x Kenya              1  
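
A quick layout check before uploading can catch the most common formatting mistakes. The sketch below is only an assumption about what is worth verifying (column order, unique IDs, 0/1 labels); `check_submission` is a hypothetical helper, not part of the competition tooling, and the ID column name is passed in as a parameter because it should match whatever the sample submission file uses:

```python
import pandas as pd


def check_submission(df: pd.DataFrame, id_col: str, target_col: str = "bank_account") -> None:
    """Hypothetical helper: raise ValueError if df breaks the expected layout."""
    expected = [id_col, target_col]
    if list(df.columns) != expected:
        raise ValueError(f"expected columns {expected}, got {list(df.columns)}")
    if df[id_col].duplicated().any():
        raise ValueError("duplicate IDs found")
    if not df[target_col].isin([0, 1]).all():
        raise ValueError(f"{target_col} must contain only 0 or 1")


# Toy rows copied from the layout shown above.
example = pd.DataFrame({
    "unique_id": ["uniqueid_1 x Kenya", "uniqueid_2 x Kenya", "uniqueid_3 x Kenya"],
    "bank_account": [1, 0, 1],
})

check_submission(example, id_col="unique_id")
print("submission layout OK")
```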

In [61]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import mean_absolute_error

# Load the data files
train = pd.read_csv("Train (1).csv")
test = pd.read_csv("Test.csv")
sample = pd.read_csv("SampleSubmission.csv")

# Encode the target as 1 ("Yes") / 0 ("No"), then separate features from the label
train["bank_account"] = train["bank_account"].map({"Yes": 1, "No": 0})
id_col = "uniqueid"
X = train.drop(columns=["bank_account", id_col])
y = train["bank_account"]

# Identify column types
cat_cols = X.select_dtypes(include=["object"]).columns.tolist()
num_cols = X.select_dtypes(exclude=["object"]).columns.tolist()

# Preprocessing: median-impute numeric columns; mode-impute and one-hot encode categorical columns
num_pipe = Pipeline([("imputer", SimpleImputer(strategy="median"))])
cat_pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("encoder", OneHotEncoder(handle_unknown="ignore"))
])

preprocessor = ColumnTransformer([
    ("num", num_pipe, num_cols),
    ("cat", cat_pipe, cat_cols)
])

# The model pipeline
model = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", RandomForestClassifier(n_estimators=300, random_state=42))
])

# Split the data into training and validation sets, stratified on the target
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model.fit(X_train, y_train)

# Evaluate on the validation set
y_pred = model.predict(X_valid)
mae = mean_absolute_error(y_valid, y_pred)
print("Validation MAE:", mae)


# Predict on the unseen test set
X_test = test.drop(columns=[id_col])

test_preds = model.predict(X_test)

# Submission IDs take the form "<uniqueid> x <country>"
test["uniqueid_full"] = test["uniqueid"].astype(str) + " x " + test["country"]

submission = pd.DataFrame({
    "uniqueid": test["uniqueid_full"],
    "bank_account": test_preds
})

submission.to_csv("submission.csv", index=False)
print(submission.head())

Validation MAE: 0.13560042507970244
                uniqueid  bank_account
0  uniqueid_6056 x Kenya             1
1  uniqueid_6060 x Kenya             1
2  uniqueid_6065 x Kenya             0
3  uniqueid_6072 x Kenya             0
4  uniqueid_6073 x Kenya             0
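
To put a validation MAE in context, it helps to compare it with a trivial baseline that predicts "no bank account" for everyone. For 0/1 labels, that baseline's MAE equals the share of positive labels. The sketch below uses hypothetical labels with 14% positives, mirroring the 14% figure quoted in the introduction; this is an assumption, and the actual class balance in the training data may differ:

```python
# Hypothetical labels: 14 positives out of 100, mirroring the 14% figure quoted
# in the introduction. The real share in the training data may differ.
y_true = [1] * 14 + [0] * 86

# Trivial baseline: predict "no bank account" (0) for everyone.
baseline_pred = [0] * len(y_true)

# MAE of the all-zero baseline is simply the fraction of positive labels.
baseline_mae = sum(abs(t - p) for t, p in zip(y_true, baseline_pred)) / len(y_true)
print(f"All-zero baseline MAE: {baseline_mae:.2f}")
```

In the notebook, `y_valid.mean()` would give the corresponding baseline MAE for the real validation split; the model adds value only to the extent that its validation MAE falls below that number.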


In [62]:
from google.colab import files

files.download("submission.csv")
