# FinTech User Analytics - Business Intelligence & Statistical Analysis

## Overview

This notebook runs statistical analyses and builds foundational visualisations on the cleaned FinTech user dataset. The analysis focuses on churn patterns, product adoption, and user engagement trends, and its findings feed the design of an interactive Tableau dashboard.

**Key Objectives:**
 * Validate business hypotheses using statistical tests
 * Identify key patterns in churn, rewards, referrals, and platform usage
 * Create foundational visualisations to guide dashboard design
 * Generate statistical summaries for business stakeholders

## 1. Environment Setup & Data Loading

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
from scipy.stats import chi2_contingency, mannwhitneyu, ttest_ind

In [3]:
# Load the cleaned dataset and report its size and overall churn rate
df = pd.read_csv('../Data/Cleaned/cleaned_data.csv')
print("Dataset Overview:")
print(f"Shape: {df.shape}")
print(f"Churn Rate: {df['churn'].mean():.1%}")

Dataset Overview:
Shape: (26542, 32)
Churn Rate: 42.1%


## 2. Data Overview

In [4]:
# Key metrics overview
key_metrics = {
    'Total Users': len(df),
    'Churned Users': df['churn'].sum(),
    'Churn Rate': f"{df['churn'].mean():.1%}",
    'Avg Age': f"{df['age'].mean():.1f}",
    'Avg Credit Score': f"{df['credit_score'].mean():.0f}",
    'App Users': df['app_downloaded'].sum(),
    'Web Users': df['web_user'].sum(),
    'Referred Users': df['is_referred'].sum()
}

for key, value in key_metrics.items():
    print(f"{key}: {value}")

Total Users: 26542
Churned Users: 11174
Churn Rate: 42.1%
Avg Age: 32.2
Avg Credit Score: 543
App Users: 25292
Web Users: 16100
Referred Users: 8454


## 3. Hypothesis Testing

### Hypothesis 1: Users with higher rewards are less likely to churn

In [5]:
# Compare rewards between churned and retained users
churned_rewards = df[df['churn'] == 1]['rewards_earned']
retained_rewards = df[df['churn'] == 0]['rewards_earned']

# One-sided Mann-Whitney U test
# H1: rewards of retained users tend to be greater than rewards of churned users
stat, p_value = mannwhitneyu(retained_rewards, churned_rewards, alternative='greater')

print(f"Median rewards - Churned: {churned_rewards.median():.2f}")
print(f"Median rewards - Retained: {retained_rewards.median():.2f}")
print(f"Mann-Whitney U test p-value: {p_value:.4f}")
print(f"Result: {'✅ SUPPORTED' if p_value < 0.05 else '❌ NOT SUPPORTED'} (α=0.05)")

Median rewards - Churned: 15.00
Median rewards - Retained: 26.00
Mann-Whitney U test p-value: 0.0000
Result: ✅ SUPPORTED (α=0.05)


### Summary

**Method:**  
We compared `rewards_earned` between churned and retained users using a one-sided **Mann-Whitney U test**, a non-parametric alternative to the t-test that is suitable when the data are not normally distributed. The alternative hypothesis was that retained users earn more rewards than churned users.
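
The choice of a non-parametric test rests on `rewards_earned` not being normally distributed. One quick way to check that assumption is the sample skewness, which is close to 0 for symmetric data and clearly positive for a long right tail. The snippet below is a minimal sketch using only the standard library; the helper `sample_skewness` and the toy lists are hypothetical and are not taken from the dataset.

```python
from statistics import fmean


def sample_skewness(values):
    """Biased (population) sample skewness: m3 / m2**1.5."""
    values = list(values)
    n = len(values)
    mean = fmean(values)
    # Second and third central moments
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical toy values for illustration only
toy_symmetric = [1, 2, 3, 4, 5]
toy_right_skewed = [1, 1, 1, 1, 10]

print(f"Skewness (symmetric): {sample_skewness(toy_symmetric):.2f}")
print(f"Skewness (right-skewed): {sample_skewness(toy_right_skewed):.2f}")
```

In the notebook, the same helper could be called as `sample_skewness(df['rewards_earned'])`. `scipy.stats.skew` with its default settings computes the same quantity.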

**Results:**
- **Median rewards (Churned):** 15.00
- **Median rewards (Retained):** 26.00
- **Mann-Whitney U test p-value:** < 0.0001 (printed as 0.0000 at four decimal places)

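**Interpretation:**  
Since the p-value is far below the significance level (α = 0.05), we **reject the null hypothesis** that retained users do not earn more rewards than churned users.  
This indicates that **retained users tend to earn more rewards** than users who churn (median 26.00 vs. 15.00), and the difference is statistically significant.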

**Conclusion:**  
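The hypothesis is **supported**. Reward earnings are positively associated with user retention. The comparison is observational, so it does not establish that rewards cause retention, but it is consistent with incentive-based programs helping to reduce churn on the platform.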
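
With more than 26,000 users, even a small difference can produce a very small p-value, so it can help to report an effect size alongside the test. One common choice for the Mann-Whitney U test is the probability of superiority, U / (n₁ · n₂): the estimated chance that a randomly chosen retained user has earned more rewards than a randomly chosen churned user, with ties counted as one half. The sketch below is an illustration only; the helper `probability_of_superiority` and the toy lists are hypothetical and do not reproduce the notebook's data.

```python
from bisect import bisect_left, bisect_right


def probability_of_superiority(x, y):
    """Estimate P(X > Y) + 0.5 * P(X == Y) over all (x, y) pairs.

    This equals U_x / (n_x * n_y), where U_x is the Mann-Whitney U
    statistic of sample x.
    """
    y_sorted = sorted(y)
    wins = 0
    ties = 0
    for value in x:
        below = bisect_left(y_sorted, value)        # y values strictly below value
        at_or_below = bisect_right(y_sorted, value)  # y values <= value
        wins += below
        ties += at_or_below - below
    return (wins + 0.5 * ties) / (len(x) * len(y_sorted))


# Hypothetical toy values for illustration only
toy_retained = [30, 26, 18, 41, 22]
toy_churned = [15, 9, 26, 12]

ps = probability_of_superiority(toy_retained, toy_churned)
rank_biserial = 2 * ps - 1  # rescales to [-1, 1]; 0 means no tendency either way

print(f"Probability of superiority: {ps:.3f}")
print(f"Rank-biserial correlation: {rank_biserial:.3f}")
```

In the notebook, the call would be `probability_of_superiority(retained_rewards, churned_rewards)`. A value near 0.5 would mean the two groups are practically indistinguishable despite a significant p-value, while values closer to 1 indicate a stronger separation.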


### Hypothesis 2: Referred users have better retention and product adoption