## Initial Inspection of FinTech Users Data

### 1. Dataset Overview
Source: [FinTech Users Data on Kaggle](https://www.kaggle.com/datasets/niketdheeryan/fintech-users-data). The file has 27,000 rows and 31 columns, including a binary `churn` column.

### 2. Data Loading

```python
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
# Load the dataset
df = pd.read_csv('../Data/RAW/Fintech_user.csv')
df.head()  # Display the first five rows of the dataset
```

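| | user | churn | age | housing | credit_score | deposits | withdrawal | purchases_partners | purchases | cc_taken | ... | waiting_4_loan | cancelled_loan | received_loan | rejected_loan | zodiac_sign | left_for_two_month_plus | left_for_one_month | rewards_earned | reward_rate | is_referred |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 55409 | 0 | 37.0 | na | NaN | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | Leo | 1 | 0 | NaN | 0.0 | 0 |
| 1 | 23547 | 0 | 28.0 | R | 486.0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | Leo | 0 | 0 | 44.0 | 1.47 | 1 |
| 2 | 58313 | 0 | 35.0 | R | 561.0 | 47 | 2 | 86 | 47 | 0 | ... | 0 | 0 | 0 | 0 | Capricorn | 1 | 0 | 65.0 | 2.17 | 0 |
| 3 | 8095 | 0 | 26.0 | R | 567.0 | 26 | 3 | 38 | 25 | 0 | ... | 0 | 0 | 0 | 0 | Capricorn | 0 | 0 | 33.0 | 1.1 | 1 |
| 4 | 61353 | 1 | 27.0 | na | NaN | 0 | 0 | 2 | 0 | 0 | ... | 0 | 0 | 0 | 0 | Aries | 1 | 0 | 1.0 | 0.03 | 0 |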
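
The `...` column in the middle of the `df.head()` output is pandas hiding columns to fit its display limit, not missing data. A minimal sketch, assuming the notebook runs with pandas' default display options: raising the `display.max_columns` option shows all 31 columns.

```python
import pandas as pd

# Show every column instead of collapsing the middle ones into "..."
pd.set_option("display.max_columns", None)

# Optional: allow wider text output so printed frames wrap less often
pd.set_option("display.width", 200)
```

Re-running `df.head()` afterwards lists every column. To change the setting for one cell only, the call can instead be wrapped in `with pd.option_context("display.max_columns", None):`.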


### 3. Data Structure
- Number of rows and columns.
- Data types of each column.

```python
print("Dataset Shape:", df.shape)
```

```text
Dataset Shape: (27000, 31)
```


```python
# Check the data type of each column
print(df.dtypes)
```

```text
user                         int64
churn                        int64
age                        float64
housing                     object
credit_score               float64
deposits                     int64
withdrawal                   int64
purchases_partners           int64
purchases                    int64
cc_taken                     int64
cc_recommended               int64
cc_disliked                  int64
cc_liked                     int64
cc_application_begin         int64
app_downloaded               int64
web_user                     int64
app_web_user                 int64
ios_user                     int64
android_user                 int64
registered_phones            int64
payment_type                object
waiting_4_loan               int64
cancelled_loan               int64
received_loan                int64
rejected_loan                int64
zodiac_sign                 object
left_for_two_month_plus      int64
left_for_one_month           int64
rewards_earned             ...
```
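
Most columns are stored as `int64`, but judging by the `df.head()` rows, many of them (the `cc_*` columns, `web_user`, `ios_user`, and so on) may be 0/1 flags. A sketch that puts dtype, non-null count and number of distinct values side by side, and lists columns whose values are only 0 and 1. The helpers `schema_overview` and `binary_columns` are hypothetical names, and the stand-in frame uses a few columns from the five rows shown above rather than the full file.

```python
import pandas as pd

def schema_overview(frame: pd.DataFrame) -> pd.DataFrame:
    """One row per column: stored dtype, non-null count and number of distinct values."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "non_null": frame.notna().sum(),
        "n_unique": frame.nunique(),  # NaN is not counted as a value
    })

def binary_columns(frame: pd.DataFrame) -> list:
    """Columns whose non-missing values are all 0 or 1 (candidate flag columns)."""
    return [c for c in frame.columns if frame[c].dropna().isin([0, 1]).all()]

# Stand-in: a few columns from the df.head() rows shown above
sample = pd.DataFrame({
    "user": [55409, 23547, 58313, 8095, 61353],
    "churn": [0, 0, 0, 0, 1],
    "age": [37.0, 28.0, 35.0, 26.0, 27.0],
    "housing": ["na", "R", "R", "R", "na"],
    "is_referred": [0, 1, 0, 1, 0],
})

print(schema_overview(sample))
print(binary_columns(sample))  # on this sample: ['churn', 'is_referred']
```

On the loaded data, `binary_columns(df)` would show which `int64` columns could be treated as booleans or categories instead of counts.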

### 4. Missing Values and Duplicates
- Check for missing values.
- Check for duplicates.

```python
# Count missing values per column and show only columns that have any
missing_values = df.isnull().sum()
print("Missing Values:\n", missing_values[missing_values > 0])
```

```text
Missing Values:
 age                  4
credit_score      8031
rewards_earned    3227
dtype: int64
```
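
Raw counts are easier to judge as a share of the 27,000 rows: `credit_score` is missing for roughly 30% of users and `rewards_earned` for about 12%, while `age` is missing in only 4 rows. The arithmetic below uses the counts printed above; on the loaded frame, `df.isnull().mean() * 100` gives the same percentages directly.

```python
import pandas as pd

# Counts copied from the missing-value output above; row count from df.shape
n_rows = 27000
missing_counts = pd.Series({"age": 4, "credit_score": 8031, "rewards_earned": 3227})

# Percentage of rows missing each column, rounded to two decimals
missing_pct = (missing_counts / n_rows * 100).round(2)
print(missing_pct)  # values: age 0.01, credit_score 29.74, rewards_earned 11.95
```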


```python
# Count rows that exactly duplicate an earlier row
duplicates = df.duplicated().sum()
print("Number of Duplicates:", duplicates)
```

```text
Number of Duplicates: 458
```
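
`df.duplicated()` flags a row only when all 31 columns match an earlier row, and with the default `keep="first"` it does not mark the first copy. Before dropping the 458 repeats it can help to view all copies together and to check whether `user`, which appears to be a user identifier (an assumption), repeats on its own. A sketch on a stand-in frame built from the `df.head()` rows, with one row deliberately repeated:

```python
import pandas as pd

# Stand-in: three rows from df.head() above, with the second row repeated once
sample = pd.DataFrame({
    "user": [55409, 23547, 58313, 23547],
    "churn": [0, 0, 0, 0],
    "age": [37.0, 28.0, 35.0, 28.0],
    "housing": ["na", "R", "R", "R"],
})

# Full-row duplicates: every column identical (the same check as above)
n_full_dups = sample.duplicated().sum()

# keep=False marks every copy, so duplicated groups can be inspected side by side
all_copies = sample[sample.duplicated(keep=False)].sort_values("user")

# Repeated IDs can exist even where other columns differ
n_user_dups = sample.duplicated(subset=["user"]).sum()

# Keep the first copy of each fully duplicated row
deduped = sample.drop_duplicates().reset_index(drop=True)
print(n_full_dups, n_user_dups, len(deduped))  # prints: 1 1 3
```

If `user` repeats more often than full rows do, the same user may have several differing records, which is a different problem from exact duplicates.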


### 5. Summary Statistics
- Numerical columns summary.
- Categorical columns summary.

```python
# Summary statistics for numerical columns
numerical_summary = df.describe()
print("Numerical Summary:\n", numerical_summary)
```

```text
Numerical Summary:
                user         churn           age  credit_score      deposits  \
count  27000.000000  27000.000000  26996.000000  18969.000000  27000.000000   
mean   35422.702519      0.413852     32.219921    542.944225      3.341556   
std    20321.006678      0.492532      9.964838     61.059315      9.131406   
min        1.000000      0.000000     17.000000      2.000000      0.000000   
25%    17810.500000      0.000000     25.000000    507.000000      0.000000   
50%    35749.000000      0.000000     30.000000    542.000000      0.000000   
75%    53244.250000      1.000000     37.000000    578.000000      1.000000   
max    69658.000000      1.000000     91.000000    838.000000     65.000000   

         withdrawal  purchases_partners     purchases      cc_taken  \
count  27000.000000        27000.000000  27000.000000  27000.000000   
mean       0.307000           28.062519      3.273481      0.073778   
std        1.055416           42.219686      8.953077  
...
```
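
The printed `describe()` output wraps the 28 numeric columns into blocks of a few columns each, which is hard to scan. Transposing it gives one row per column. The `churn` mean of 0.413852 also shows that about 41% of users churned versus about 59% who did not, so the two classes are of broadly similar size; `value_counts(normalize=True)` reports that split directly. A sketch on a stand-in frame of `df.head()` columns:

```python
import pandas as pd

# Stand-in: numeric columns from the df.head() rows shown above
sample = pd.DataFrame({
    "churn": [0, 0, 0, 0, 1],
    "age": [37.0, 28.0, 35.0, 26.0, 27.0],
    "deposits": [0, 0, 47, 26, 0],
    "reward_rate": [0.0, 1.47, 2.17, 1.1, 0.03],
})

# One row per column; statistics (count, mean, std, min, quartiles, max) as columns
summary = sample.describe().T
print(summary.round(2))

# Class balance of the target: share of rows for each churn value
churn_share = sample["churn"].value_counts(normalize=True)
print(churn_share)
```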

```python
# Summary statistics for categorical (object) columns
categorical_summary = df.describe(include=['object'])
print("Categorical Summary:\n", categorical_summary)
```

```text
Categorical Summary:
        housing payment_type zodiac_sign
count    27000        27000       27000
unique       3            5          13
top         na    Bi-Weekly      Cancer
freq     13860        12716        2424
```
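
The categorical summary exposes a gap in the missing-value check: the most common `housing` value is the string `na` (13,860 of 27,000 rows, about 51%), which `isnull()` does not count because it is text, not `NaN`. `zodiac_sign` also has 13 distinct values for 12 signs, so it may hold a similar placeholder; `value_counts()` would confirm. A sketch of two ways to turn `na` into a real missing value, using a small CSV built from the `df.head()` rows in place of the real file:

```python
import io

import numpy as np
import pandas as pd

# Stand-in CSV with a few columns from the df.head() rows shown above
csv_text = """user,housing,credit_score,zodiac_sign
55409,na,,Leo
23547,R,486.0,Leo
58313,R,561.0,Capricorn
61353,na,,Aries
"""

# Default read: lowercase "na" is not one of read_csv's built-in NA strings
raw = pd.read_csv(io.StringIO(csv_text))

# Option 1: declare "na" as missing when loading
loaded = pd.read_csv(io.StringIO(csv_text), na_values=["na"])

# Option 2: convert exact "na" strings in an already-loaded frame
converted = raw.replace("na", np.nan)

print(raw["housing"].isna().sum(), loaded["housing"].isna().sum(), converted["housing"].isna().sum())  # prints: 0 2 2
```

`na_values` also accepts a dict such as `{"housing": ["na"]}` to limit the conversion to specific columns.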


### 6. Data Distribution and Visualisations
- Histograms, boxplots for numerical features.
- Bar plots for categorical features.
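
The plots listed above can be drafted with plain matplotlib. A minimal sketch, not the notebook's own plotting code: `plot_overview` is a hypothetical helper, the column choices are illustrative, and seaborn's `histplot`, `boxplot` and `countplot` would serve equally well. Missing values are dropped before plotting because 8,031 `credit_score` values are `NaN`.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_overview(frame, numeric_cols, categorical_cols):
    """Histogram and boxplot per numeric column, bar plot of counts per categorical column."""
    n_rows = len(numeric_cols) + len(categorical_cols)
    fig, axes = plt.subplots(n_rows, 2, figsize=(10, 3 * n_rows), squeeze=False)
    for i, col in enumerate(numeric_cols):
        values = frame[col].dropna().to_numpy()  # drop NaN before plotting
        axes[i, 0].hist(values, bins=30)
        axes[i, 0].set_title(f"{col}: histogram")
        axes[i, 1].boxplot(values)
        axes[i, 1].set_title(f"{col}: boxplot")
    for j, col in enumerate(categorical_cols, start=len(numeric_cols)):
        counts = frame[col].value_counts(dropna=False)  # keep NaN as its own bar
        axes[j, 0].bar(counts.index.astype(str), counts.to_numpy())
        axes[j, 0].set_title(f"{col}: counts")
        axes[j, 0].tick_params(axis="x", labelrotation=45)
        axes[j, 1].axis("off")  # categorical rows use only the left panel
    fig.tight_layout()
    return fig

# Stand-in: a few columns from the df.head() rows shown above
sample = pd.DataFrame({
    "age": [37.0, 28.0, 35.0, 26.0, 27.0],
    "credit_score": [np.nan, 486.0, 561.0, 567.0, np.nan],
    "housing": ["na", "R", "R", "R", "na"],
})
fig = plot_overview(sample, ["age", "credit_score"], ["housing"])
```

In the notebook, a call such as `plot_overview(df, ["age", "credit_score", "deposits"], ["housing", "payment_type"])` followed by `plt.show()` would display the grid.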

### 7. Correlations and Relationships
- Correlation matrix.
- Scatter plots or pair plots for interesting pairs.
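
Only numeric columns can enter `DataFrame.corr()`, and an identifier such as `user` is better left out. Ranking each feature by the absolute value of its correlation with `churn` gives a quick first look at candidate predictors. With a 0/1 target, Pearson correlation equals the point-biserial correlation, and it captures linear association only. A sketch with a hypothetical helper `churn_correlations` and a stand-in frame from the `df.head()` rows:

```python
import pandas as pd

def churn_correlations(frame, target="churn", exclude=("user",)):
    """Pearson correlation matrix of numeric columns, plus correlations with the target ranked by magnitude."""
    numeric = frame.select_dtypes(include="number").drop(columns=list(exclude), errors="ignore")
    corr = numeric.corr()  # NaNs are excluded pair by pair
    with_target = corr[target].drop(target)
    ranked = with_target.reindex(with_target.abs().sort_values(ascending=False).index)
    return corr, ranked

# Stand-in: numeric columns from the df.head() rows shown above
sample = pd.DataFrame({
    "user": [55409, 23547, 58313, 8095, 61353],
    "churn": [0, 0, 0, 0, 1],
    "age": [37.0, 28.0, 35.0, 26.0, 27.0],
    "deposits": [0, 0, 47, 26, 0],
    "purchases": [0, 0, 47, 25, 0],
    "reward_rate": [0.0, 1.47, 2.17, 1.1, 0.03],
})
corr, ranked = churn_correlations(sample)
print(ranked)
```

The matrix `corr` can be drawn with seaborn's `sns.heatmap(corr, cmap="coolwarm", center=0)`, since the notebook already imports seaborn. Near-identical pairs such as `deposits` and `purchases` in this sample are worth a scatter plot before either is used as a feature.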

### 8. Initial Insights and Next Steps
- Summary of findings.
- Recommendations for further analysis or cleaning.
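
One way to turn the checks above into a first cleaning pass. This is a sketch of suggested steps, not a fixed recipe: `basic_clean` is a hypothetical helper, and whether to drop the 458 duplicates, or to keep missingness indicators rather than impute `credit_score` and `rewards_earned`, should be confirmed during further analysis.

```python
import numpy as np
import pandas as pd

def basic_clean(frame: pd.DataFrame) -> pd.DataFrame:
    """First-pass cleaning based on the checks in this notebook; each step is a suggestion."""
    out = frame.drop_duplicates().copy()             # remove exact full-row repeats
    out = out.replace("na", np.nan)                  # "na" placeholder strings -> real NaN
    for col in ["credit_score", "rewards_earned"]:   # the heavily missing numeric columns
        out[f"{col}_missing"] = out[col].isna().astype(int)  # keep missingness as a feature
    return out.reset_index(drop=True)

# Stand-in: rows from df.head() above, with one row repeated
sample = pd.DataFrame({
    "user": [55409, 23547, 58313, 23547],
    "housing": ["na", "R", "R", "R"],
    "credit_score": [np.nan, 486.0, 561.0, 486.0],
    "rewards_earned": [np.nan, 44.0, 65.0, 44.0],
})
clean = basic_clean(sample)
print(clean)
```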