### Bank Transaction System (Data Analysis)

##### Description
The Bank Transaction System for Data Analysis is a data-driven application that records and analyzes banking transactions such as deposits, withdrawals, and transfers. It enables efficient data management, visualization of customer behavior, detection of suspicious activities, and financial trend analysis using tools like dashboards, charts, and statistical methods. The system supports decision-making by providing meaningful insights into transaction patterns and banking performance.

##### 1. Data Cleaning
##### a. Load data

In [2]:
import pandas as pd

# Load the dataset into a DataFrame
df = pd.read_csv('Dataset.csv')

# Display the number of missing values per column
print("Missing values per column:\n", df.isnull().sum())


Missing values per column:
 transaction_id                 0
customer_id                    0
account_type                 193
branch_code                    0
transaction_type             245
transaction_amount             0
transaction_date               0
channel                      250
balance_after_transaction      0
merchant_category_code         0
geo_location                   0
dtype: int64
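
Raw counts are easier to judge next to the share of rows they represent. The following is a minimal sketch of such a summary, not part of the original notebook. The `missing_summary` helper and the toy DataFrame are hypothetical stand-ins, since `Dataset.csv` itself is not reproduced here.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset (values are illustrative only)
toy = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4],
    "account_type": ["Savings", np.nan, "Current", np.nan],
    "channel": ["ATM", "Internet", np.nan, "QR"],
})

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages, largest first."""
    counts = frame.isnull().sum()
    percent = (counts / len(frame) * 100).round(2)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    # Keep only the columns that actually have gaps
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

summary = missing_summary(toy)
print(summary)
```

On the real data the same call would be `missing_summary(df)`.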


##### b. Clean data

In [5]:
import numpy as np

# Convert 'account_type' to string and normalize the values
df['account_type'] = df['account_type'].astype(str).str.strip()

# Replace placeholder strings such as 'nan' or 'None' with actual np.nan
df['account_type'] = df['account_type'].replace(['nan', 'NaN', 'None', ''], np.nan)

# Convert the column to the 'category' dtype
df['account_type'] = df['account_type'].astype('category')

# Calculate the mode
mode_account_type = df['account_type'].mode()[0]

# Replace true NaN values with the mode
df['account_type'] = df['account_type'].fillna(mode_account_type)

# Check the result
print("Missing values in 'account_type' after filling:", df['account_type'].isnull().sum())


Missing values in 'account_type' after filling: 0
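
pandas warns that calling a method with `inplace=True` on a column selection such as `df[col]` will stop working in pandas 3.0, and suggests two alternatives. The sketch below shows both on a toy column. The data is made up for illustration and is not taken from `Dataset.csv`.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"account_type": ["Savings", "nan", "Current", "Savings", ""]})

# Form 1: assign the result back to the column
toy["account_type"] = toy["account_type"].replace(["nan", "NaN", "None", ""], np.nan)

# Form 2: call the method on the DataFrame with a {column: value} mapping
mode_value = toy["account_type"].mode()[0]
toy = toy.fillna({"account_type": mode_value})

print(toy["account_type"].tolist())
```

Either form operates on the DataFrame itself rather than on a temporary copy of the column, which is the problem the warning describes.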



In [10]:
import pandas as pd
import numpy as np

# Load dataset
df = pd.read_csv('Dataset.csv')

# Convert all values in 'account_type' to lowercase and strip spaces
df['account_type'] = df['account_type'].str.strip().str.lower()

# Replace string 'nan' with actual np.nan
df['account_type'] = df['account_type'].replace('nan', np.nan)

# Now fill missing values with the mode
mode_account_type = df['account_type'].mode()[0]
df['account_type'] = df['account_type'].fillna(mode_account_type)

# Optional: Capitalize first letter for uniform display
df['account_type'] = df['account_type'].str.capitalize()

# Final check
print("Unique values after cleaning:", df['account_type'].unique())
print("Missing values after cleaning:", df['account_type'].isnull().sum())


Unique values after cleaning: ['Business' 'Current' 'Savings']
Missing values after cleaning: 0



In [14]:
# Convert to lowercase for uniform comparison
df['transaction_type'] = df['transaction_type'].astype(str).str.strip().str.lower()

# Replace typo 'dep0sit' with correct 'deposit'
df['transaction_type'] = df['transaction_type'].replace('dep0sit', 'deposit')

# Replace string 'nan' with np.nan (if present)
df['transaction_type'] = df['transaction_type'].replace('nan', np.nan)

# Fill real NaNs with the mode
mode_transaction_type = df['transaction_type'].mode()[0]
df['transaction_type'] = df['transaction_type'].fillna(mode_transaction_type)

# Optional: Capitalize each word (e.g., 'online payment' → 'Online Payment')
df['transaction_type'] = df['transaction_type'].str.title()

# Final check
print("Unique values after cleaning:", df['transaction_type'].unique())
print("Missing values in 'transaction_type':", df['transaction_type'].isnull().sum())


Unique values after cleaning: ['Deposit' 'Transfer' 'Online Payment' 'Withdrawal']
Missing values in 'transaction_type': 0
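
A single hard-coded replacement handles `'dep0sit'`, but other misspellings would pass through unnoticed. One possible extension, sketched here under assumptions, keeps the corrections in a dictionary and reports any value outside the expected set. `CORRECTIONS`, `ALLOWED`, and the toy values (including the misspelling `'withdrawl'`) are hypothetical. Only the `'dep0sit'` rule and the four category names come from the notebook.

```python
import pandas as pd

# Hypothetical correction table; only 'dep0sit' comes from the notebook
CORRECTIONS = {"dep0sit": "deposit"}

# Expected categories, taken from the cleaned output above (lowercased)
ALLOWED = {"deposit", "transfer", "online payment", "withdrawal"}

toy = pd.Series(["Deposit", " dep0sit", "TRANSFER", "withdrawl", "online payment"])

# Normalize whitespace and case, then apply the known corrections
normalized = toy.str.strip().str.lower().replace(CORRECTIONS)

# Anything outside the allowed set is a candidate for a new correction rule
unexpected = sorted(set(normalized.dropna()) - ALLOWED)
print(unexpected)
```

Running this check before filling with the mode makes it harder for an unknown spelling to survive as its own category.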



In [20]:
print(df['channel'].unique())


['Mobile Banking' 'Atm' 'Qr' 'Counter' 'Internet']


In [18]:
# Normalize case: Convert to lowercase first
df['channel'] = df['channel'].astype(str).str.strip().str.lower()

# Replace string 'nan' with np.nan
df['channel'] = df['channel'].replace('nan', np.nan)

# Fill actual NaN values with the mode
mode_channel = df['channel'].mode()[0]
df['channel'] = df['channel'].fillna(mode_channel)

# Format properly (e.g., 'mobile banking' → 'Mobile Banking')
df['channel'] = df['channel'].str.title()

# Final check
print("Unique values in 'channel':", df['channel'].unique())
print("Missing values in 'channel':", df['channel'].isnull().sum())


Unique values in 'channel': ['Mobile Banking' 'Atm' 'Qr' 'Counter' 'Internet']
Missing values in 'channel': 0
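
The cleaning steps for `account_type`, `transaction_type`, and `channel` follow the same pattern: normalize, mark placeholders as missing, fill with the mode, and format for display. A minimal sketch of how that pattern might be factored into one function follows. `clean_categorical` and its `placeholders` parameter are hypothetical names, and the toy data is illustrative only.

```python
import numpy as np
import pandas as pd

def clean_categorical(frame, column, placeholders=("nan", "none", "")):
    """Normalize a text column and fill gaps with its most frequent value."""
    cleaned = frame[column].str.strip().str.lower()
    # Treat placeholder strings as missing values
    cleaned = cleaned.replace(list(placeholders), np.nan)
    # Fill gaps with the mode; this raises IndexError if every value is missing
    cleaned = cleaned.fillna(cleaned.mode()[0])
    # Title-case for display, e.g. 'mobile banking' becomes 'Mobile Banking'
    return cleaned.str.title()

toy = pd.DataFrame({
    "transaction_type": ["deposit", " Deposit", np.nan, "transfer"],
    "channel": ["ATM", "atm ", "nan", "Internet"],
})

for col in ["transaction_type", "channel"]:
    toy[col] = clean_categorical(toy, col)

print(toy)
```

The function returns a new Series and the caller assigns it back, so no in-place call on a column selection is needed.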



In [21]:
# Remove leading and trailing spaces from 'branch_code'
df['branch_code'] = df['branch_code'].astype(str).str.strip()

# Spot-check the result by showing the first few rows
print(df['branch_code'].head())


0    BR-002
1    BR-002
2    BR-014
3    BR-013
4    BR-002
Name: branch_code, dtype: object
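
Printing the first rows only confirms a handful of values. A pattern check across the whole column is one way to catch malformed codes. The sketch below assumes that every code is `BR-` followed by exactly three digits, which is inferred from the sample output and not confirmed by the dataset documentation. The toy values are made up.

```python
import pandas as pd

toy = pd.Series([" BR-002", "BR-014 ", "br-13", "BR-0130"])
codes = toy.astype(str).str.strip()

# Assumed pattern: 'BR-' followed by exactly three digits
is_valid = codes.str.fullmatch(r"BR-\d{3}", na=False)

# List the codes that do not match the assumed pattern
print(codes[~is_valid].tolist())
```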


In [22]:
# Save the cleaned DataFrame to a new CSV file
df.to_csv('cleaned.csv', index=False)
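
After saving, a quick round-trip check can confirm that the file reloads with the same shape and columns and with no missing values. This is a minimal sketch on toy data written to a temporary directory, so it does not touch the real `cleaned.csv`. The variable names are illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

toy = pd.DataFrame({
    "transaction_id": [1, 2, 3],
    "account_type": ["Savings", "Current", "Savings"],
    "branch_code": ["BR-002", "BR-014", "BR-013"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cleaned.csv"
    toy.to_csv(path, index=False)
    # Reload the file while the temporary directory still exists
    reloaded = pd.read_csv(path)

# Confirm that shape, column order, and completeness survived the round trip
same_shape = reloaded.shape == toy.shape
same_columns = list(reloaded.columns) == list(toy.columns)
remaining_missing = int(reloaded.isnull().sum().sum())
print(same_shape, same_columns, remaining_missing)
```

Note that `read_csv` infers dtypes again on load, so a column stored as `category` comes back as plain text unless it is converted again.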