# Analyzing Banking Trends: Customer Transactions and Regional Impact
Understanding customer behavior and how transactions vary by region supports decision-making and strategic planning in banking. This project, titled "Analyzing Banking Trends: Customer Transactions and Regional Impact," explores transaction data to identify customer behavior patterns and how they differ across world regions.

**Objective**: The primary objective of this project is to examine customer transactions and identify trends that may affect regional economies and financial systems. The data is first cleaned in Python and then queried with SQL across a set of related tables, to show how customer transactions vary across regions and what that may imply for the banking sector.

**Data Sources**: The project uses three tables:

1. **world_regions table**: This table contains data on various world regions and their corresponding codes and names. It serves as a reference to categorize customers based on their regional affiliation.

2. **user_nodes table**: This table records which banking node each consumer is attached to. It holds the consumer ID, region ID, node ID, start date, and end date, which lets us identify the nodes a customer has been connected to and how long each association lasted.

3. **user_transactions table**: This table records customer transactions: consumer ID, transaction date, transaction type, and transaction amount. Analyzing it allows us to uncover patterns in customer spending and financial behavior.
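
To make the relationship between the three tables concrete, here is a minimal sketch on toy data. It assumes the cleaned column names used later in this notebook (`consumer_id`, `region_id`, `node_id`, `transaction_type`, `transaction_amount`). The `world_regions` columns (`region_id`, `region_name`) and the region names are hypothetical, since that table is only described as holding "codes and names".

```python
import pandas as pd

# Hypothetical miniature versions of the three tables.
# world_regions column names and values are assumptions for illustration.
world_regions = pd.DataFrame(
    {"region_id": [3, 5], "region_name": ["Region A", "Region B"]}
)
user_nodes = pd.DataFrame(
    {"consumer_id": [1, 2, 3], "region_id": [3, 3, 5], "node_id": [4, 5, 4]}
)
user_transactions = pd.DataFrame(
    {
        "consumer_id": [1, 1, 2, 3],
        "transaction_type": ["deposit", "withdrawal", "deposit", "deposit"],
        "transaction_amount": [485, 100, 706, 601],
    }
)

# Build a consumer -> region lookup (one row per consumer in this toy data).
consumer_region = user_nodes[["consumer_id", "region_id"]].drop_duplicates()

# Attach the region to every transaction, then the region name.
joined = user_transactions.merge(consumer_region, on="consumer_id", how="left")
joined = joined.merge(world_regions, on="region_id", how="left")

# Total transaction amount per region.
totals = joined.groupby("region_name")["transaction_amount"].sum()
print(totals)
```

In the real data a consumer may appear in `user_nodes` more than once (one row per node assignment), so the lookup step matters: merging transactions directly against `user_nodes` would repeat each transaction once per node row.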

In [1]:
## Imports
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings

warnings.filterwarnings("ignore")
%matplotlib inline
%load_ext lab_black

## 1 Load the Dataset

### Working with user_nodes dataset.

In [2]:
user_nodes = pd.read_csv("user_nodes.csv")
user_nodes.head()

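| | id_ | area_id_ | node_id_ | act_date | deact_date | has_loan | is_act |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | 4 | 02-01-2020 | 03-01-2020 | 1 | 0 |
| 1 | 2 | 3 | 5 | 03-01-2020 | 17-01-2020 | 0 | 1 |
| 2 | 3 | 5 | 4 | 27-01-2020 | 18-02-2020 | 0 | 0 |
| 3 | 4 | 5 | 4 | 07-01-2020 | 19-01-2020 | 1 | 1 |
| 4 | 5 | 3 | 3 | 15-01-2020 | 23-01-2020 | 0 | 1 |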


In [3]:
# summary
user_nodes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3643 entries, 0 to 3642
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   id_         3643 non-null   int64 
 1   area_id_    3643 non-null   int64 
 2   node_id_    3643 non-null   int64 
 3   act_date    3643 non-null   object
 4   deact_date  3643 non-null   object
 5   has_loan    3643 non-null   int64 
 6   is_act      3643 non-null   int64 
dtypes: int64(5), object(2)
memory usage: 199.4+ KB


In [4]:
# shape
user_nodes.shape

(3643, 7)

In [5]:
# check null values
user_nodes.isna().sum()

id_           0
area_id_      0
node_id_      0
act_date      0
deact_date    0
has_loan      0
is_act        0
dtype: int64

In [10]:
# number of duplicate rows in user_nodes
user_nodes.duplicated().sum()

143

In [11]:
# dropping the duplicate rows
user_nodes = user_nodes.drop_duplicates()
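
As a quick illustration of what this step does, the sketch below uses a toy frame (not the project data). With default arguments, `duplicated()` flags every row that is identical to an earlier row, and `drop_duplicates()` removes exactly those rows, so the row count should fall by the number reported above (here, 3643 - 143 = 3500 for `user_nodes`).

```python
import pandas as pd

# Toy frame with one exact duplicate: row 2 repeats row 0.
df = pd.DataFrame(
    {
        "id_": [1, 2, 1],
        "node_id_": [4, 5, 4],
        "act_date": ["02-01-2020", "03-01-2020", "02-01-2020"],
    }
)

# Count rows that are identical to an earlier row (keep="first" is the default).
n_dupes = int(df.duplicated().sum())

# Drop them and rebuild a clean 0..n-1 index.
deduped = df.drop_duplicates().reset_index(drop=True)

# The row count should fall by exactly the number of flagged duplicates.
assert len(deduped) == len(df) - n_dupes
print(n_dupes, deduped.shape)
```

Resetting the index is optional; it only matters if later code relies on positional row labels.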

In [14]:
# dropping the 'has_loan' and 'is_act' columns
user_nodes = user_nodes.drop(columns=["has_loan", "is_act"])

In [19]:
# renaming the columns for better readability

user_nodes = user_nodes.rename(
    columns={
        "id_": "consumer_id",
        "area_id_": "region_id",
        "node_id_": "node_id",
        "act_date": "start_date",
        "deact_date": "end_date",
    }
)

In [35]:
user_nodes.to_csv("user_nodes_cleaned.csv", index=False)
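
The `info()` summary above shows that the two date columns are stored as strings (`object` dtype), and the sample rows suggest a day-first `DD-MM-YYYY` layout. If later analysis needs date arithmetic, one option is to parse them explicitly before saving. The following is a minimal sketch on two sample rows, assuming that format holds for every row; `days_on_node` is a hypothetical column name used only for illustration.

```python
import pandas as pd

# Two sample rows in the cleaned user_nodes layout (dates still strings).
sample = pd.DataFrame(
    {
        "consumer_id": [1, 2],
        "start_date": ["02-01-2020", "03-01-2020"],
        "end_date": ["03-01-2020", "17-01-2020"],
    }
)

# Parse with an explicit day-first format so 02-01-2020 is read as 2 January,
# not 1 February. A value that does not match the format raises an error.
for col in ["start_date", "end_date"]:
    sample[col] = pd.to_datetime(sample[col], format="%d-%m-%Y")

# Hypothetical derived column: number of days spent on each node.
sample["days_on_node"] = (sample["end_date"] - sample["start_date"]).dt.days
print(sample)
```

Passing `format` explicitly avoids relying on pandas' format inference, which can misread ambiguous day/month values.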

### Working with user_transactions dataset.


In [21]:
user_transactions = pd.read_csv("user_transactions.csv")
user_transactions.head()

| | id_ | t_date | t_type | t_amt | has_credit_card | account_type |
|---|---|---|---|---|---|---|
| 0 | 312 | 20-01-2020 | deposit | 485 | Yes | Savings |
| 1 | 376 | 03-01-2020 | deposit | 706 | No | Current |
| 2 | 188 | 13-01-2020 | deposit | 601 | No | Savings |
| 3 | 138 | 11-01-2020 | deposit | 520 | No | Salary |
| 4 | 373 | 18-01-2020 | deposit | 596 | No | Salary |


In [23]:
# summary
user_transactions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5947 entries, 0 to 5946
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   id_              5947 non-null   int64 
 1   t_date           5947 non-null   object
 2   t_type           5947 non-null   object
 3   t_amt            5947 non-null   int64 
 4   has_credit_card  5947 non-null   object
 5   account_type     5947 non-null   object
dtypes: int64(2), object(4)
memory usage: 278.9+ KB


In [24]:
# shape
user_transactions.shape

(5947, 6)

In [25]:
# null values
user_transactions.isna().sum()

id_                0
t_date             0
t_type             0
t_amt              0
has_credit_card    0
account_type       0
dtype: int64

In [28]:
# number of duplicate rows in user_transactions
user_transactions.duplicated().sum()

79

In [29]:
# dropping the duplicate rows
user_transactions = user_transactions.drop_duplicates()

In [31]:
# dropping the columns 'has_credit_card' and 'account_type'
user_transactions = user_transactions.drop(columns=["has_credit_card", "account_type"])

In [33]:
# rename the columns
user_transactions = user_transactions.rename(
    columns={
        "id_": "consumer_id",
        "t_date": "transaction_date",
        "t_type": "transaction_type",
        "t_amt": "transaction_amount",
    }
)

In [34]:
user_transactions.to_csv("user_transactions_cleaned.csv", index=False)
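
The cleaned tables are intended for SQL analysis. As a rough sketch of what that stage might look like, the example below loads a few rows into an in-memory SQLite database and runs a simple aggregate. SQLite is used here only as a stand-in and is an assumption; the project's actual database engine and queries are not shown in this section. The first three rows come from the sample output above, and the withdrawal row is hypothetical.

```python
import sqlite3

import pandas as pd

# A few rows in the cleaned user_transactions layout.
# The last (withdrawal) row is made up for illustration.
tx = pd.DataFrame(
    {
        "consumer_id": [312, 376, 188, 312],
        "transaction_date": ["20-01-2020", "03-01-2020", "13-01-2020", "25-01-2020"],
        "transaction_type": ["deposit", "deposit", "deposit", "withdrawal"],
        "transaction_amount": [485, 706, 601, 200],
    }
)

# In-memory SQLite database as a stand-in for the real SQL backend.
conn = sqlite3.connect(":memory:")
tx.to_sql("user_transactions", conn, index=False)

# Count and total amount per transaction type.
query = """
SELECT transaction_type,
       COUNT(*) AS n_transactions,
       SUM(transaction_amount) AS total_amount
FROM user_transactions
GROUP BY transaction_type
ORDER BY transaction_type
"""
summary = pd.read_sql_query(query, conn)
conn.close()
print(summary)
```

The same query text should carry over to most SQL engines, since it uses only standard `GROUP BY` aggregation.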

In [None]:
Host: localhost | Database: b5ef35c5

Username: b5ef35c5 | Password: Cab#22se