# FinTech User Analytics - Business Intelligence & Statistical Analysis

## Overview

This notebook conducts statistical analysis and creates foundational visualisations for the cleaned FinTech user dataset. The analysis focuses on churn patterns, product adoption insights, and user engagement trends to prepare data insights for interactive Tableau dashboard development.

**Key Objectives:**
 * Validate business hypotheses using statistical tests
 * Identify key patterns in churn, rewards, referrals, and platform usage
 * Create foundational visualisations to guide dashboard design
 * Generate statistical summaries for business stakeholders

## 1. Environment Setup & Data Loading

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
from scipy.stats import chi2_contingency, mannwhitneyu, ttest_ind

In [3]:
df = pd.read_csv('../Data/Cleaned/cleaned_data.csv')
print("Dataset Overview:")
print(f"Shape: {df.shape}")
print(f"Churn Rate: {df['churn'].mean():.1%}")

Dataset Overview:
Shape: (26542, 32)
Churn Rate: 42.1%


## 2. Data Overview

In [4]:
# Key metrics overview
key_metrics = {
    'Total Users': len(df),
    'Churned Users': df['churn'].sum(),
    'Churn Rate': f"{df['churn'].mean():.1%}",
    'Avg Age': f"{df['age'].mean():.1f}",
    'Avg Credit Score': f"{df['credit_score'].mean():.0f}",
    'App Users': df['app_downloaded'].sum(),
    'Web Users': df['web_user'].sum(),
    'Referred Users': df['is_referred'].sum()
}

for key, value in key_metrics.items():
    print(f"{key}: {value}")

Total Users: 26542
Churned Users: 11174
Churn Rate: 42.1%
Avg Age: 32.2
Avg Credit Score: 543
App Users: 25292
Web Users: 16100
Referred Users: 8454


## 3. Hypothesis Testing

### Hypothesis 1: Users with higher rewards are less likely to churn

In [5]:
# Compare rewards between churned and retained users
churned_rewards = df[df['churn'] == 1]['rewards_earned']
retained_rewards = df[df['churn'] == 0]['rewards_earned']

# Statistical test
stat, p_value = mannwhitneyu(retained_rewards, churned_rewards, alternative='greater')

print(f"Median rewards - Churned: {churned_rewards.median():.2f}")
print(f"Median rewards - Retained: {retained_rewards.median():.2f}")
print(f"Mann-Whitney U test p-value: {p_value:.4f}")
print(f"Result: {'✅ SUPPORTED' if p_value < 0.05 else '❌ NOT SUPPORTED'} (α=0.05)")

Median rewards - Churned: 15.00
Median rewards - Retained: 26.00
Mann-Whitney U test p-value: 0.0000
Result: ✅ SUPPORTED (α=0.05)


### Summary

**Method:**  
We compared `rewards_earned` between churned and retained users using the **Mann-Whitney U test**, a non-parametric alternative to the t-test that does not assume normally distributed data. The test was one-sided (`alternative='greater'`), asking whether retained users tend to earn more rewards than churned users.

**Results:**
- **Median rewards (Churned):** 15.00
- **Median rewards (Retained):** 26.00
- **Mann-Whitney U test p-value:** < 0.0001 (printed as 0.0000 at four decimal places)

**Interpretation:**
Since the p-value is far below the significance level (α = 0.05), we **reject the null hypothesis**.  
This indicates that **retained users tend to earn significantly more rewards** than users who churn.

**Conclusion:**  
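The hypothesis is **supported**. Reward earnings are positively associated with user retention. This is an observational association, and users who stay longer also have more time to accumulate rewards, so the direction of the effect is not established here. Incentive-based programs may still help reduce churn on the platform.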
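
With more than 26,000 users, even a small difference produces a p-value near zero, so an effect size helps show how large the gap in rewards actually is. The sketch below computes the rank-biserial correlation from the Mann-Whitney U statistic. The helper `rank_biserial` and the synthetic `retained_demo` and `churned_demo` arrays are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
from scipy.stats import rankdata


def rank_biserial(x, y):
    """Rank-biserial correlation for two independent samples.

    Returns a value in [-1, 1]. Positive values mean observations in `x`
    tend to exceed observations in `y`.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_x, n_y = len(x), len(y)
    # Rank the pooled sample (ties receive average ranks)
    ranks = rankdata(np.concatenate([x, y]))
    # Mann-Whitney U statistic for x, derived from its rank sum
    u_x = ranks[:n_x].sum() - n_x * (n_x + 1) / 2
    return 2.0 * u_x / (n_x * n_y) - 1.0


# Synthetic data for illustration only (not the notebook's dataset)
rng = np.random.default_rng(0)
retained_demo = rng.poisson(26, size=500)
churned_demo = rng.poisson(15, size=500)

effect = rank_biserial(retained_demo, churned_demo)
print(f"Rank-biserial correlation: {effect:.2f}")
```

On the real data the call would be `rank_biserial(retained_rewards, churned_rewards)`. Values near 0 indicate heavily overlapping distributions, while values near 1 indicate that retained users almost always earn more than churned users.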


### Hypothesis 2: Referred users have better retention and product adoption

We compare churn rates between referred and non-referred users using a Chi-Square Test of Independence, which assesses whether referral status and churn are statistically related.
We also compare credit card adoption (`cc_taken`) between the two groups as a measure of financial product adoption.

In [7]:
# Churn rates by referral status
referral_churn = df.groupby('is_referred')['churn'].agg(['count', 'sum', 'mean'])
referral_churn['churn_rate'] = referral_churn['mean']

print("Churn rates:")
print(f"Non-referred: {referral_churn.loc[0, 'churn_rate']:.1%}")
print(f"Referred: {referral_churn.loc[1, 'churn_rate']:.1%}")

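# Chi-square test for independence. The test is non-directional, so also
# confirm that referred users churn less before reporting support.
contingency = pd.crosstab(df['is_referred'], df['churn'])
chi2, p_val, dof, expected = chi2_contingency(contingency)
lower_churn = referral_churn.loc[1, 'churn_rate'] < referral_churn.loc[0, 'churn_rate']
print(f"Chi-square test p-value: {p_val:.4f}")
print(f"Result: {'SUPPORTED' if p_val < 0.05 and lower_churn else 'NOT SUPPORTED'} (α=0.05)")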

# Credit card adoption by referral
cc_by_referral = df.groupby('is_referred')['cc_taken'].mean()
print("\nCredit card adoption:")
print(f"Non-referred: {cc_by_referral[0]:.1%}")
print(f"Referred: {cc_by_referral[1]:.1%}")

Churn rates:
Non-referred: 45.0%
Referred: 35.9%
Chi-square test p-value: 0.0000
Result: SUPPORTED (α=0.05)

Credit card adoption:
Non-referred: 8.2%
Referred: 5.8%


### Summary

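* Referral status is significantly associated with retention: referred users churn at 35.9% versus 45.0% for non-referred users (chi-square p < 0.0001). This is consistent with referral programs supporting long-term user engagement.
* Referral status does not boost product adoption. Credit card adoption is lower among referred users (5.8%) than among non-referred users (8.2%), and this difference was not tested for significance. The hypothesis is therefore only partially supported: retention is better, product adoption is not.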
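
The credit card comparison in this section reports adoption rates without a significance test. A minimal sketch of how the same chi-square test could be applied to `cc_taken`, while also reporting the direction of the difference, is shown below. The helper `compare_binary_rate` and the small `demo` DataFrame are hypothetical and use made-up counts, not the notebook's data.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def compare_binary_rate(data, group_col, outcome_col):
    """Compare a binary outcome between two groups coded 0 and 1.

    Returns the per-group rates, their difference (group 1 minus group 0),
    and the p-value of a chi-square test of independence.
    """
    rates = data.groupby(group_col)[outcome_col].mean()
    table = pd.crosstab(data[group_col], data[outcome_col])
    chi2, p_val, dof, _ = chi2_contingency(table)
    return {
        'rate_group_0': float(rates.loc[0]),
        'rate_group_1': float(rates.loc[1]),
        'diff': float(rates.loc[1] - rates.loc[0]),
        'p_value': float(p_val),
    }


# Made-up counts for illustration only
demo = pd.DataFrame({
    'is_referred': [0] * 1000 + [1] * 1000,
    'cc_taken': [1] * 80 + [0] * 920 + [1] * 60 + [0] * 940,
})

result = compare_binary_rate(demo, 'is_referred', 'cc_taken')
print(f"Non-referred: {result['rate_group_0']:.1%}")
print(f"Referred: {result['rate_group_1']:.1%}")
print(f"Difference (referred - non-referred): {result['diff']:+.1%}")
print(f"Chi-square test p-value: {result['p_value']:.4f}")
```

On the real data the call would be `compare_binary_rate(df, 'is_referred', 'cc_taken')`. Reporting the signed difference alongside the p-value matters because the chi-square test only detects that the groups differ, not which one is higher.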

### Hypothesis 3: App users show higher engagement than web-only users

An engagement score is computed as the sum of purchases, partner purchases, deposits, and credit card adoption, plus rewards earned divided by 10 to scale it down. We then compare engagement between:

 * App users – users who downloaded the app
 * Web-only users – users who accessed via web but did not download the app

A **two-sample t-test** is used to assess whether the mean engagement of app users differs significantly from that of web-only users. As called here, `ttest_ind` runs a two-sided test that assumes equal variances in both groups.

In [8]:
# Create engagement score
df['engagement_score'] = (
    df['purchases'] + df['purchases_partners'] + 
    df['deposits'] + df['cc_taken'] + df['rewards_earned']/10
)

# Compare app vs web-only users
app_users = df[df['app_downloaded'] == 1]['engagement_score']
web_only = df[(df['web_user'] == 1) & (df['app_downloaded'] == 0)]['engagement_score']

print(f"Mean engagement - App users: {app_users.mean():.2f}")
print(f"Mean engagement - Web only: {web_only.mean():.2f}")

# Statistical test
stat, p_val = ttest_ind(app_users, web_only)
print(f"T-test p-value: {p_val:.4f}")
print(f"Result: {'SUPPORTED' if p_val < 0.05 else 'NOT SUPPORTED'} (α=0.05)")

Mean engagement - App users: 39.10
Mean engagement - Web only: 0.01
T-test p-value: 0.0000
Result: SUPPORTED (α=0.05)


### Summary

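App users are significantly more engaged than web-only users (mean engagement score 39.10 versus 0.01, p < 0.0001), so the hypothesis is supported. Two points qualify the result. First, 25,292 of the 26,542 users downloaded the app, so the web-only group contains at most 1,250 users. Second, a mean score of 0.01 suggests web-only users have almost no recorded activity. Encouraging app downloads could be a key driver of user activity, although this comparison cannot separate the effect of the app from differences in the users who choose to install it. Mobile channels appear to promote deeper user interaction and should be prioritised in user acquisition and retention strategies.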
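
Because the two groups differ greatly in size and probably in spread, Welch's t-test, which does not assume equal variances, is a reasonable robustness check on this result. The sketch below is an illustration under assumptions: `welch_comparison` is a hypothetical helper, and the two arrays are synthetic stand-ins for `app_users` and `web_only`.

```python
import numpy as np
from scipy.stats import ttest_ind


def welch_comparison(a, b):
    """Welch's two-sample t-test with group sizes and means for context."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # equal_var=False selects Welch's test instead of Student's t-test
    t_stat, p_val = ttest_ind(a, b, equal_var=False)
    return {
        'n_a': len(a),
        'n_b': len(b),
        'mean_a': float(a.mean()),
        'mean_b': float(b.mean()),
        't_stat': float(t_stat),
        'p_value': float(p_val),
    }


# Synthetic, deliberately unbalanced groups for illustration only
rng = np.random.default_rng(42)
app_demo = rng.gamma(shape=2.0, scale=20.0, size=2000)
web_demo = rng.gamma(shape=1.0, scale=0.01, size=50)

res = welch_comparison(app_demo, web_demo)
print(f"Group sizes: {res['n_a']} vs {res['n_b']}")
print(f"Means: {res['mean_a']:.2f} vs {res['mean_b']:.2f}")
print(f"Welch t-test p-value: {res['p_value']:.4f}")
```

On the real data the call would be `welch_comparison(app_users, web_only)`. Printing the group sizes next to the p-value makes it easier to judge how much weight the smaller group can bear.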