# 📊 04_clean_BNPL_intention_to_use.ipynb

## 🎯 Purpose

This notebook prepares the dataset `BNPL Intention to use.xlsx` 
from the Indian Institute of Management Lucknow. The dataset contains structured survey responses from 226 young shoppers in India, collected to measure their intention to adopt Buy Now, Pay Later (BNPL) services. The data consists of responses to Likert-scale items representing key behavioral constructs such as Financial Literacy, Performance Expectancy, Effort Expectancy, Perceived Usefulness, Attitude, and Intention to Use.

## Relevance to Our Research

Our project investigates how demographic and financial 
behavior features influence default risk across BNPL and 
traditional loans.

### Limitations of this Source

This dataset is **not usable for modeling** default risk 
because:

- It has **no outcome labels** (such as default vs. non-default).
- It has **no demographic features** (such as age, gender, or income).
- It lacks **financial behavior features** (such as credit score, existing debts, or repayment history).
- The dataset is **cross-sectional**, not longitudinal: it captures intentions at a single point in time, not actual adoption behavior over time.
- Coded headers (e.g., FL1, PE3) require a **codebook** or the original questionnaire to interpret meaningfully.

We can still use this dataset for **exploring attitudes and intentions** toward BNPL, but not for modeling default risk.

If we gain access to more data or can collect follow-up information, this dataset could become much more actionable for our research question.

## Output

Cleaned version saved in:
- `../1_datasets/reference/BNPL_intention_to_use_cleaned.csv`

In [None]:
# Uncomment on first run: pandas needs openpyxl to read .xlsx files
# %pip install openpyxl

# Import pandas for data handling
import pandas as pd

In [9]:
# Load the dataset
df = pd.read_excel("../1_datasets/raw_data/BNPL Intention to use.xlsx")

# Preview the first few rows
df.head()

Unnamed: 0,FL1,FL2,FL3,PE1,PE2,PE3,EE1,EE2,EE3,EE4,...,PU1,PU2,PU3,IA1,IA2,IA3,AT1,AT2,AT3,AT4
0,6,4,4,5,5,5,6,5,5,5,...,5,5,3,4,4,4,6,6,6,6
1,5,5,5,4,5,5,2,3,4,4,...,6,6,6,2,2,2,5,5,4,5
2,6,6,6,5,3,6,6,6,6,6,...,3,6,2,4,3,3,6,7,6,5
3,3,5,5,2,5,2,2,3,5,5,...,3,5,5,2,2,2,6,6,3,5
4,5,5,5,6,4,6,6,4,4,6,...,5,5,6,3,3,4,4,3,2,2


In [6]:
print(df.columns.tolist())

['FL1', 'FL2', 'FL3', 'PE1', 'PE2', 'PE3', 'EE1', 'EE2', 'EE3', 'EE4', 'PA1', 'PA2', 'PA3', 'PC1', 'PC2', 'PC3', 'PRE1', 'PRE3', 'PU1', 'PU2', 'PU3', 'IA1', 'IA2', 'IA3', 'AT1', 'AT2', 'AT3', 'AT4']


## 🔎 Variable Description

| Acronym | Description                              |
|---------|------------------------------------------|
| FL      | Financial Literacy                       |
| PE      | Performance Expectancy                   |
| EE      | Effort Expectancy                        |
| PA      | Perceived Advantage                      |
| PC      | Perceived Compatibility                  |
| PRE     | Perceived Risk                           |
| PU      | Perceived Usefulness                     |
| IA      | Intention to Adopt/Use                   |
| AT      | Attitude Toward (BNPL, technology, etc.) |
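
For exploration, the item columns can be grouped by the acronyms in the table above and averaged into one score per construct. The following is a minimal sketch, not part of the original notebook: the helper names `construct_of` and `construct_means` are hypothetical, the demo frame is made up, and averaging items is only one common way to summarize a Likert construct.

```python
import re

import pandas as pd


def construct_of(column: str) -> str:
    """Return the alphabetic prefix of an item code, e.g. 'PRE1' -> 'PRE'."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", column)
    if match is None:
        raise ValueError(f"Unexpected column name: {column!r}")
    return match.group(1)


def construct_means(frame: pd.DataFrame) -> pd.DataFrame:
    """Average the items of each construct for every respondent."""
    groups = {}
    for column in frame.columns:
        groups.setdefault(construct_of(column), []).append(column)
    # One output column per construct, holding the row-wise mean of its items.
    return pd.DataFrame(
        {name: frame[cols].mean(axis=1) for name, cols in groups.items()}
    )


# Tiny made-up frame for illustration only; these are not rows from the dataset.
demo = pd.DataFrame({"FL1": [6, 5], "FL2": [4, 5], "PE1": [5, 4], "PE2": [5, 2]})
scores = construct_means(demo)
print(scores)
```

In the notebook, `construct_means(df)` would give one column per acronym. Whether a simple mean is appropriate depends on the original questionnaire, for example on whether any items are reverse-coded.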


In [10]:
# Identify rows with missing data
missing = df[df.isnull().any(axis=1)]
print("Number of rows with missing values:", len(missing))

Number of rows with missing values: 0
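
Beyond missing values, two further sanity checks may be worth running: that the row count matches the 226 respondents stated in the Purpose section, and that every response falls inside the Likert range. The sketch below is an illustration only. The helper `check_likert` is hypothetical, and the 1 to 7 bounds are an assumption (the preview shows values up to 7, but the scale is not documented here).

```python
import pandas as pd


def check_likert(frame, low=1, high=7, expected_rows=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if expected_rows is not None and len(frame) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(frame)}")
    for column in frame.columns:
        # Series.between is inclusive on both ends by default.
        if not frame[column].between(low, high).all():
            problems.append(f"{column}: values outside {low}-{high}")
    return problems


# Made-up data with one deliberately invalid response (8) in FL1.
demo = pd.DataFrame({"FL1": [6, 5, 8], "PE1": [5, 4, 3]})
print(check_likert(demo, expected_rows=3))
```

In the notebook this could be called as `check_likert(df, expected_rows=226)`, adjusting `low` and `high` once the questionnaire confirms the scale.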


In [8]:
# Check for data types
data_types = df.dtypes
print("Data types of each column:\n", data_types)

Data types of each column:
 FL1     int64
FL2     int64
FL3     int64
PE1     int64
PE2     int64
PE3     int64
EE1     int64
EE2     int64
EE3     int64
EE4     int64
PA1     int64
PA2     int64
PA3     int64
PC1     int64
PC2     int64
PC3     int64
PRE1    int64
PRE3    int64
PU1     int64
PU2     int64
PU3     int64
IA1     int64
IA2     int64
IA3     int64
AT1     int64
AT2     int64
AT3     int64
AT4     int64
dtype: object


## Save Cleaned Dataset

Although this dataset is not used for modeling, we preserve it 
for potential exploratory analysis or later comparison.

We store it in the `1_datasets/reference/` folder to distinguish 
it from modeling datasets.
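
`DataFrame.to_csv` does not create missing folders, so the save can fail on a fresh clone if the reference folder does not exist yet. A minimal sketch of a safer save follows. The helper `save_reference` is hypothetical, and the demo writes to a temporary directory, not to the project tree.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_reference(frame, path):
    """Write frame to path as CSV, creating parent folders if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    frame.to_csv(target, index=False)
    return target


# Made-up frame; the real notebook would pass df and the reference path.
demo = pd.DataFrame({"FL1": [6, 5], "PE1": [5, 4]})
with tempfile.TemporaryDirectory() as tmp:
    written = save_reference(demo, Path(tmp) / "reference" / "demo_cleaned.csv")
    # Read the file back to confirm the round trip preserves the data.
    round_trip = pd.read_csv(written)
```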

In [13]:
# Save to a reference subfolder
df.to_csv("../1_datasets/reference/BNPL_intention_to_use_cleaned.csv", index=False)

print("Saved cleaned reference dataset.")

Saved cleaned reference dataset.
