## 1. BUSINESS UNDERSTANDING

#### 1.1 Business Context
PaySwift Ghana is a growing fintech company providing:
- Mobile payment solutions for individuals and small businesses,
- Micro-loans and credit scoring services, and
- Personal finance management tools, such as budgeting and expense tracking.

Despite its innovative offerings, PaySwift operates in a highly competitive and cost-sensitive Ghanaian fintech market, where customer trust, user experience, and repayment discipline strongly determine long-term success. The company’s growth strategy depends on using data-driven insights to optimize product design, manage financial risk, and improve customer retention.

#### 1.2 Business Problem 
PaySwift faces three challenges that threaten its long-term growth. First, customer engagement with new digital features such as savings goals and budgeting tools is lower than expected. Second, the loan default rate among micro-borrowers is rising, which puts profitability and operational sustainability at risk. Third, a significant proportion of users become inactive after initial registration, so customer retention remains a concern. Together, these challenges call for data-driven insight into customer behavior, credit risk, and service adoption.

#### 1.3 Business Objectives
- Increase Feature Adoption – Identify behavioral patterns that drive or hinder the use of underutilized features and design interventions to boost engagement.
- Reduce Loan Default Rates – Develop data-based risk profiles to identify high-risk borrowers early and lower the default rate.
- Improve Customer Retention – Detect churn signals early and personalize retention and re-engagement strategies.

#### 1.4 Key Stakeholders
- **Product Managers** – Responsible for feature development and user experience optimization.
- **Credit & Risk Management Team** – Oversees credit risk assessment and loan portfolio health.
- **Marketing Team** – Focuses on customer acquisition, retention, and engagement strategies.
- **Data Science Team** – Analyzes data to generate insights and build predictive models.
- **Executive Leadership** – Makes strategic decisions based on data-driven insights to guide company growth.
- **Customers** – End-users whose behavior and satisfaction directly impact business success.

#### 1.5 Business Questions
- Which customer segments show the highest and lowest adoption of PaySwift’s key features?
- How does feature usage relate to loan repayment performance?
- What patterns exist between transaction activity and customer churn?
- What demographic and behavioral factors most influence loan default rates?
- Which combination of features, age, and transaction level drives long-term customer retention?
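Each of these questions maps onto a grouped aggregation over the dataset. As a minimal sketch for the churn question, the block below rebuilds only the five rows that `payswift_data.head()` displays in this notebook and compares transaction activity between churned and retained users. The names `sample` and `median_tx_by_churn` are illustrative, and five rows are far too few to support a conclusion; the point is the shape of the query.

```python
import pandas as pd

# The five rows displayed by payswift_data.head(), restricted to the
# columns this question needs. `sample` is an illustrative name.
sample = pd.DataFrame({
    "User ID": [1001, 1002, 1003, 1004, 1005],
    "Monthly Transactions": [12, 25, 5, 18, 9],
    "Churned?": ["No", "No", "Yes", "No", "Yes"],
})

# Median monthly transactions for retained ("No") and churned ("Yes") users.
median_tx_by_churn = sample.groupby("Churned?")["Monthly Transactions"].median()
print(median_tx_by_churn)
```

The same pattern, with a different grouping column and target, would apply to the feature adoption and repayment questions.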


## 2. DATA UNDERSTANDING

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
#Load the dataset
payswift_data = pd.read_csv('../Data/PaySwift.csv')

payswift_data.head()

Unnamed: 0,User ID,Age,Loan Amount (GHS),Repayment Status,Feature Used Most,Monthly Transactions,Churned?,Default Risk (1-5)
0,1001,28,2000,Late Payment,Mobile Payments,12,No,4
1,1002,35,5000,On-Time,Budgeting Tool,25,No,1
2,1003,22,1000,Defaulted,Loan Services,5,Yes,5
3,1004,40,3500,On-Time,Savings Tracker,18,No,2
4,1005,30,4000,Late Payment,Investment Advice,9,Yes,3


In [7]:
#Inspect the data types and missing values
print("The number of rows and columns:", payswift_data.shape)

payswift_data.info()


The number of rows and columns: (100, 8)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 8 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   User ID               100 non-null    int64 
 1   Age                   100 non-null    int64 
 2   Loan Amount (GHS)     100 non-null    int64 
 3   Repayment Status      99 non-null     object
 4   Feature Used Most     84 non-null     object
 5   Monthly Transactions  100 non-null    int64 
 6   Churned?              100 non-null    object
 7   Default Risk (1-5)    100 non-null    int64 
dtypes: int64(5), object(3)
memory usage: 6.4+ KB


In [10]:
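# Check for duplicate rows and columns
duplicated_rows = payswift_data.duplicated().sum()
print("Number of duplicate rows:", duplicated_rows)

# Transpose first so that columns are compared as rows; calling .T on the
# Series returned by duplicated() would only count duplicate rows again
duplicated_columns = payswift_data.T.duplicated().sum()
print("Number of duplicate columns:", duplicated_columns)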



Number of duplicate rows: 0
Number of duplicate columns: 0
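Row duplicates and column duplicates are counted differently in pandas, and the two are easy to confuse. The sketch below uses a hypothetical frame named `toy`, in which column `b` is an exact copy of column `a` and no row repeats, to show which expression detects what.

```python
import pandas as pd

# Hypothetical data: column "b" copies column "a"; all rows are distinct.
toy = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3], "c": [7, 8, 9]})

# duplicated() flags every row that repeats an earlier row.
duplicate_rows = int(toy.duplicated().sum())

# Transposing the frame turns columns into rows, so the same check
# now flags every column that repeats an earlier column.
duplicate_columns = int(toy.T.duplicated().sum())

# Transposing the boolean Series returned by duplicated() changes nothing,
# so this expression still counts duplicate rows, not columns.
transposed_series_count = int(toy.duplicated().T.sum())

print(duplicate_rows, duplicate_columns, transposed_series_count)
```

Transposing a frame with mixed dtypes upcasts the values to `object`, which is acceptable for a frame of this size but can be slow on large data.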


In [9]:
#check for missing values
missing_values = payswift_data.isnull().sum()
print("Missing values in each column:\n", missing_values)

Missing values in each column:
 User ID                  0
Age                      0
Loan Amount (GHS)        0
Repayment Status         1
Feature Used Most       16
Monthly Transactions     0
Churned?                 0
Default Risk (1-5)       0
dtype: int64
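With 100 rows, the counts above correspond to 16% missing in `Feature Used Most` and 1% in `Repayment Status`. On datasets whose row count is less convenient, a count and percentage side by side can be easier to read. The following is a minimal sketch on a hypothetical four-row frame; `toy` and `missing_summary` are illustrative names.

```python
import pandas as pd

# Hypothetical data with gaps in two of the three columns.
toy = pd.DataFrame({
    "Repayment Status": ["On-Time", None, "Defaulted", "Late Payment"],
    "Feature Used Most": ["Loan Services", None, None, "Budgeting Tool"],
    "Monthly Transactions": [12, 5, 9, 20],
})

# isna().sum() counts gaps per column; isna().mean() gives the share of rows.
missing_summary = pd.DataFrame({
    "missing_count": toy.isna().sum(),
    "missing_pct": (toy.isna().mean() * 100).round(1),
})

# Keep only the columns that actually have missing values.
missing_summary = missing_summary[missing_summary["missing_count"] > 0]
print(missing_summary)
```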


In [14]:
# Convert User ID to string: it is an identifier, not a numeric quantity
payswift_data['User ID'] = payswift_data['User ID'].astype(str)

In [15]:
payswift_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 8 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   User ID               100 non-null    object
 1   Age                   100 non-null    int64 
 2   Loan Amount (GHS)     100 non-null    int64 
 3   Repayment Status      99 non-null     object
 4   Feature Used Most     84 non-null     object
 5   Monthly Transactions  100 non-null    int64 
 6   Churned?              100 non-null    object
 7   Default Risk (1-5)    100 non-null    int64 
dtypes: int64(4), object(4)
memory usage: 6.4+ KB


In [16]:
#initial data exploration
payswift_data.describe().round(2)

Unnamed: 0,Age,Loan Amount (GHS),Monthly Transactions,Default Risk (1-5)
count,100.0,100.0,100.0,100.0
mean,33.17,3432.5,12.52,3.28
std,8.66,2048.7,8.2,1.48
min,19.0,800.0,1.0,1.0
25%,26.0,1537.5,5.75,2.0
50%,32.5,3150.0,11.0,3.0
75%,40.0,4825.0,19.0,5.0
max,50.0,8000.0,30.0,5.0


In [17]:
# Get summary statistics for categorical columns
categorical_stats = payswift_data.describe(include='object')
print("Categorical Column Statistics:\n")
print(categorical_stats)

# Get value counts for each categorical column
print("\nDetailed value counts for each categorical column:\n")
for column in payswift_data.select_dtypes(include=['object']).columns:
    print(f"\nValue counts for {column}:")
    print(payswift_data[column].value_counts())
    print(f"Unique values: {payswift_data[column].nunique()}")
    print("-" * 50)

Categorical Column Statistics:

       User ID Repayment Status Feature Used Most Churned?
count      100               99                84      100
unique     100                3                 5        2
top       1001          On-Time   Mobile Payments       No
freq         1               34                17       66

Detailed value counts for each categorical column:


Value counts for User ID:
User ID
1001    1
1002    1
1003    1
1004    1
1005    1
       ..
1096    1
1097    1
1098    1
1099    1
1100    1
Name: count, Length: 100, dtype: int64
Unique values: 100
--------------------------------------------------

Value counts for Repayment Status:
Repayment Status
On-Time         34
Late Payment    33
Defaulted       32
Name: count, dtype: int64
Unique values: 3
--------------------------------------------------

Value counts for Feature Used Most:
Feature Used Most
Mobile Payments      17
Budgeting Tool       17
Loan Services        17
Savings Tracker      17
Investment 

In [18]:
# Handle missing values using conditional imputation
cond1 = (payswift_data['Feature Used Most'].isna()) & (payswift_data['Default Risk (1-5)'] >= 4)
cond2 = (payswift_data['Feature Used Most'].isna()) & (payswift_data['Default Risk (1-5)'] <= 2) & (payswift_data['Monthly Transactions'] >= 15)
cond3 = (payswift_data['Feature Used Most'].isna()) & (payswift_data['Default Risk (1-5)'] <= 2) & (payswift_data['Monthly Transactions'] < 10)
cond4 = (payswift_data['Feature Used Most'].isna()) & (payswift_data['Default Risk (1-5)'] == 3) & (payswift_data['Churned?'] == 'Yes')

payswift_data.loc[cond1, 'Feature Used Most'] = 'Loan Services'
payswift_data.loc[cond2, 'Feature Used Most'] = 'Mobile Payments'
payswift_data.loc[cond3, 'Feature Used Most'] = 'Budgeting Tool'
payswift_data.loc[cond4, 'Feature Used Most'] = 'Savings Tracker'
payswift_data['Feature Used Most'] = payswift_data['Feature Used Most'].fillna('Not Specified')
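The four rules above are mutually exclusive, so they can also be written as a single `np.select` call, which keeps the rule table and its fallback in one place. This is an alternative sketch on a hypothetical six-row frame, not the cell's actual implementation; `toy`, `conditions`, `choices`, and `imputed` are illustrative names. Rows that match no rule, such as risk 3 without churn, fall through to `'Not Specified'`, just as they do with the final `fillna` above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: five rows with a missing feature, one already filled.
toy = pd.DataFrame({
    "Feature Used Most": [None, None, None, None, None, "Budgeting Tool"],
    "Default Risk (1-5)": [5, 1, 2, 3, 3, 4],
    "Monthly Transactions": [4, 20, 6, 9, 9, 12],
    "Churned?": ["Yes", "No", "No", "Yes", "No", "No"],
})

risk = toy["Default Risk (1-5)"]
transactions = toy["Monthly Transactions"]
churned = toy["Churned?"] == "Yes"

# Same rules as the cell above; np.select takes the first matching condition.
conditions = [
    risk >= 4,
    (risk <= 2) & (transactions >= 15),
    (risk <= 2) & (transactions < 10),
    (risk == 3) & churned,
]
choices = ["Loan Services", "Mobile Payments", "Budgeting Tool", "Savings Tracker"]

imputed = pd.Series(
    np.select(conditions, choices, default="Not Specified"),
    index=toy.index,
)

# fillna only touches missing entries, so observed values are preserved.
toy["Feature Used Most"] = toy["Feature Used Most"].fillna(imputed)
print(toy["Feature Used Most"].tolist())
```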

In [21]:
payswift_data.head(50)

Unnamed: 0,User ID,Age,Loan Amount (GHS),Repayment Status,Feature Used Most,Monthly Transactions,Churned?,Default Risk (1-5)
0,1001,28,2000,Late Payment,Mobile Payments,12,No,4
1,1002,35,5000,On-Time,Budgeting Tool,25,No,1
2,1003,22,1000,Defaulted,Loan Services,5,Yes,5
3,1004,40,3500,On-Time,Savings Tracker,18,No,2
4,1005,30,4000,Late Payment,Investment Advice,9,Yes,3
5,1006,27,2500,Defaulted,Loan Services,6,Yes,5
6,1007,45,6000,On-Time,Mobile Payments,30,No,1
7,1008,33,1500,Late Payment,Budgeting Tool,8,No,3
8,1009,29,3000,Defaulted,Loan Services,4,Yes,5
9,1010,50,7500,On-Time,Savings Tracker,22,No,2


In [22]:
# Handling missing values in repayment status
payswift_data.loc[payswift_data['Repayment Status'].isna() & (payswift_data['Default Risk (1-5)'] >= 4), 'Repayment Status'] = 'Defaulted'
payswift_data.loc[payswift_data['Repayment Status'].isna() & (payswift_data['Default Risk (1-5)'] == 3) & (payswift_data['Churned?'] == 'Yes'), 'Repayment Status'] = 'Late Payment'
payswift_data.loc[payswift_data['Repayment Status'].isna() & (payswift_data['Default Risk (1-5)'] <= 2), 'Repayment Status'] = 'On-Time'
payswift_data['Repayment Status'] = payswift_data['Repayment Status'].fillna('Unknown')
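Because the imputed labels are derived from `Default Risk (1-5)` and `Churned?`, later analysis of default risk against repayment status could partly reflect the imputation rule itself. One way to keep that visible is to record which rows were filled before filling them. The sketch below does this on a hypothetical four-row frame; the flag column `Repayment Status Imputed` is an assumed name and is not part of the dataset.

```python
import pandas as pd

# Hypothetical data: two rows with a missing repayment status.
toy = pd.DataFrame({
    "Repayment Status": ["On-Time", None, None, "Defaulted"],
    "Default Risk (1-5)": [1, 5, 3, 5],
    "Churned?": ["No", "Yes", "No", "Yes"],
})

# Record the gaps before any value is filled in.
missing = toy["Repayment Status"].isna()
toy["Repayment Status Imputed"] = missing

# Same rules as the cell above, applied only to originally missing rows.
risk = toy["Default Risk (1-5)"]
churned = toy["Churned?"] == "Yes"
toy.loc[missing & (risk >= 4), "Repayment Status"] = "Defaulted"
toy.loc[missing & (risk == 3) & churned, "Repayment Status"] = "Late Payment"
toy.loc[missing & (risk <= 2), "Repayment Status"] = "On-Time"
toy["Repayment Status"] = toy["Repayment Status"].fillna("Unknown")

# How the imputed rows were labelled.
imputed_counts = toy.loc[toy["Repayment Status Imputed"], "Repayment Status"].value_counts()
print(imputed_counts)
```

With the flag in place, any later comparison can be rerun on `toy[~toy["Repayment Status Imputed"]]` to check that a result does not depend on the filled rows.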

In [25]:
payswift_data.head(11)

Unnamed: 0,User ID,Age,Loan Amount (GHS),Repayment Status,Feature Used Most,Monthly Transactions,Churned?,Default Risk (1-5)
0,1001,28,2000,Late Payment,Mobile Payments,12,No,4
1,1002,35,5000,On-Time,Budgeting Tool,25,No,1
2,1003,22,1000,Defaulted,Loan Services,5,Yes,5
3,1004,40,3500,On-Time,Savings Tracker,18,No,2
4,1005,30,4000,Late Payment,Investment Advice,9,Yes,3
5,1006,27,2500,Defaulted,Loan Services,6,Yes,5
6,1007,45,6000,On-Time,Mobile Payments,30,No,1
7,1008,33,1500,Late Payment,Budgeting Tool,8,No,3
8,1009,29,3000,Defaulted,Loan Services,4,Yes,5
9,1010,50,7500,On-Time,Savings Tracker,22,No,2


### Initial Inspection of the Data 