## Exploratory Data Analysis (EDA) in SQL

* This notebook demonstrates the versatility of SQL for Exploratory Data Analysis.
* The purpose is to discover insights from the data.

* Results of all the queries in this notebook were saved as CSV files.
* The CSV files will be ingested by visualization tools: Power BI, Tableau, MS Excel, and Google Sheets.

In [1]:
import pandas as pd
import pyodbc
import warnings
warnings.filterwarnings('ignore')

In [2]:
server = 'JAK-PC\\SQLEXPRESS'
database = 'BankDB'
driver = '{ODBC Driver 18 for SQL Server}'

# Build the connection string from adjacent literals so no stray whitespace is embedded
conn_string = (
    f'DRIVER={driver};SERVER={server};DATABASE={database};'
    'Trusted_Connection=yes;Encrypt=no;TrustServerCertificate=yes'
)

try:
    conn = pyodbc.connect(conn_string)
    cursor = conn.cursor()
except pyodbc.Error as ex:
    print("Connection error:", ex)

In [3]:
def fetch_data(query_string):
    """
    fetch_data consumes query_string (a SQL query statement) and
    produces df, a pandas DataFrame that contains the result of the SQL query.
    If the query fails, the error is printed and an empty DataFrame is returned.
    """
    df = pd.DataFrame()

    try:
        df = pd.read_sql_query(query_string, conn)
    except pyodbc.Error as ex:
        print("Query error:", ex)

    blank_row_index = [''] * len(df)
    df.index = blank_row_index
    
    return df
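
The introduction says each query result was saved as a CSV file, but the notebook does not show that step. Below is a minimal sketch of how it could be done with `DataFrame.to_csv`. The helper name `save_result`, the default `output` folder, and the sample data are assumptions for illustration and are not part of the original notebook.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_result(df, name, out_dir="output"):
    """Write df to <out_dir>/<name>.csv and return the path (hypothetical helper)."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    file_path = out_path / f"{name}.csv"
    # index=False leaves out the blank row labels that fetch_data assigns
    df.to_csv(file_path, index=False)
    return file_path


# Illustrative data only; in the notebook, df would come from fetch_data(query_string)
sample = pd.DataFrame({"Segment": [1, 2], "AverageAmount": [1576.11, 2570.23]})

# A temporary folder keeps the demo from touching the working directory
saved_path = save_result(sample, "customer_segments", out_dir=tempfile.mkdtemp())
```

Passing `index=False` matters here because `fetch_data` replaces the index with empty strings, which would otherwise be written as an extra unnamed column.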

### Customer Segmentation

Identify **5** customer segments based on each account's average transaction amount.

In [4]:
query_string = """
    WITH CustomerSegments AS (
        SELECT 
            AccountID,
            AVG(Amount) AS AvgTransactionAmount,
            NTILE(5) OVER (ORDER BY AVG(Amount)) AS Segment
        FROM 
            BankTransaction
        GROUP BY 
            AccountID
    )
    SELECT 
        Segment,
        AVG(AvgTransactionAmount) AS AverageAmount,
        STRING_AGG(AccountID, ', ') AS AccountIDsInSegment
    FROM 
        CustomerSegments
    GROUP BY 
        Segment
    ORDER BY 
        Segment;
"""

df = fetch_data(query_string)
print("*** 5 Customer Segments - Based on Average Transaction Amount ***")
df.head() 

*** 5 Customer Segments - Based on Average Transaction Amount ***


Unnamed: 0,Segment,AverageAmount,AccountIDsInSegment
,1,1576.111632,"1086, 2523, 609, 3087, 3252, 358, 1316, 333, 2..."
,2,2570.228588,"2373, 3184, 842, 320, 2173, 3470, 2539, 3355, ..."
,3,4754.334396,"1956, 1292, 1702, 3834, 1232, 1063, 1007, 2701..."
,4,7655.605422,"2189, 2705, 90, 750, 3672, 1528, 1634, 2463, 9..."
,5,12232.170242,"2141, 485, 7, 270, 683, 9883, 3167, 2954, 2927..."
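
`NTILE(5)` sorts the accounts by average amount and splits them into five groups of nearly equal size; when the row count is not divisible by the number of groups, the earlier groups each receive one extra row. The sketch below reproduces that rule in plain Python on made-up values. The function name `ntile` is illustrative.

```python
def ntile(values, n_buckets):
    """Return a 1-based bucket number for each value, following the NTILE rule."""
    # Positions of the values in ascending order (stable for ties)
    order = sorted(range(len(values)), key=lambda i: values[i])
    base, extra = divmod(len(values), n_buckets)

    buckets = [0] * len(values)
    pos = 0
    for bucket in range(1, n_buckets + 1):
        # The first `extra` buckets get one more row than the rest
        size = base + (1 if bucket <= extra else 0)
        for i in order[pos:pos + size]:
            buckets[i] = bucket
        pos += size
    return buckets


# Seven made-up average amounts split into three groups: sizes 3, 2, 2
segments = ntile([10, 20, 30, 40, 50, 60, 70], 3)
print(segments)  # [1, 1, 1, 2, 2, 3, 3]
```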


### Customer Segmentation

Districts with the most and least accounts and average number of accounts per district.

In [5]:
query_string = """
    WITH DistrictAccountCounts AS (
        SELECT DistrictID,
            COUNT(DISTINCT AccountID) AS NumAccounts
        FROM Account
        GROUP BY DistrictID
    )
    SELECT
        (SELECT TOP 1 DistrictID 
         FROM DistrictAccountCounts 
         ORDER BY NumAccounts) AS DistrictWithMinAccounts,
        (SELECT MIN(NumAccounts)
         FROM DistrictAccountCounts) AS MinNoOfAccounts,
        (SELECT TOP 1 DistrictID 
         FROM DistrictAccountCounts 
         ORDER BY NumAccounts DESC) AS DistrictWithMaxAccounts,
        (SELECT MAX(NumAccounts) 
         FROM DistrictAccountCounts) AS MaxNoOfAccounts,
        (SELECT AVG(NumAccounts * 1.0) 
         FROM DistrictAccountCounts) AS AvgNoOfAccounts;
    """

df = fetch_data(query_string)
print("*** Districts with the most and least accounts and average number of accounts per district ***")
df.head()     

*** Districts with the most and least accounts and average number of accounts per district ***


Unnamed: 0,DistrictWithMinAccounts,MinNoOfAccounts,DistrictWithMaxAccounts,MaxNoOfAccounts,AvgNoOfAccounts
,58,32,1,554,58.441558
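
`TOP 1 ... ORDER BY NumAccounts` returns a single district even when several districts share the same count, and without a further ordering column SQL Server does not guarantee which of the tied rows comes back. The sketch below computes the same summary in plain Python and lists every tied district. The sample rows are invented.

```python
from collections import Counter

# (AccountID, DistrictID) pairs; invented sample data
accounts = [(1, "A"), (2, "A"), (3, "B"), (4, "C"), (5, "C"), (6, "D")]

# Number of accounts per district
counts = Counter(district for _, district in accounts)
min_count = min(counts.values())
max_count = max(counts.values())

# Keep every district that ties for the minimum or maximum
districts_with_min = sorted(d for d, c in counts.items() if c == min_count)
districts_with_max = sorted(d for d, c in counts.items() if c == max_count)
avg_count = sum(counts.values()) / len(counts)
```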


### Customer Segmentation

Districts with the highest and lowest loan payments and average loan payments per district.

In [6]:
query_string = """
    WITH DistrictLoanPayments AS (
        SELECT
            A.DistrictID,
            SUM(L.Payments) AS TotalLoanPayments
        FROM
            Account A
        JOIN
            Loan L ON A.AccountID = L.AccountID
        GROUP BY
            A.DistrictID
    )
    SELECT
        -- District with the lowest loan payments
        (SELECT TOP 1 DistrictID 
         FROM DistrictLoanPayments 
         ORDER BY TotalLoanPayments) AS DistrictWithMinLoanPayments,

        -- Minimum loan payments in any district
        (SELECT MIN(TotalLoanPayments) 
         FROM DistrictLoanPayments) AS MinLoanPayments,

        -- District with the highest loan payments
        (SELECT TOP 1 DistrictID 
         FROM DistrictLoanPayments 
         ORDER BY TotalLoanPayments DESC) AS DistrictWithMaxLoanPayments,

        -- Maximum loan payments in any district
        (SELECT MAX(TotalLoanPayments) 
         FROM DistrictLoanPayments) AS MaxLoanPayments,

        -- Average loan payments per district
        (SELECT AVG(TotalLoanPayments * 1.0) 
         FROM DistrictLoanPayments) AS AvgLoanPayments;
    """

df = fetch_data(query_string)
print("*** Districts with the highest and lowest loan payments and average loan payments per district ***")
df.head()    

*** Districts with the highest and lowest loan payments and average loan payments per district ***


Unnamed: 0,DistrictWithMinLoanPayments,MinLoanPayments,DistrictWithMaxLoanPayments,MaxLoanPayments,AvgLoanPayments
,35,2650.0,1,364472.0,37117.311688


### Customer Segmentation

Accounts with the highest and lowest transaction amounts, and the average of each account's highest transaction amount.

In [7]:
query_string = """
    WITH AccountTransactionAmounts AS (
        SELECT
            AccountID,
            MAX(Amount) AS HighestTransactionAmount,
            MIN(Amount) AS LowestTransactionAmount
        FROM
            BankTransaction
        GROUP BY
            AccountID
    )
    SELECT
        -- Account with the highest transaction amount
        (
            SELECT TOP 1 AccountID 
            FROM AccountTransactionAmounts 
            ORDER BY HighestTransactionAmount DESC) AS AccountWithMaxTransactionAmount,

        -- Highest transaction amount in any account
        (
            SELECT MAX(HighestTransactionAmount) 
            FROM AccountTransactionAmounts) AS MaxTransactionAmount,

        -- Account with the lowest transaction amount
        (
            SELECT TOP 1 AccountID 
            FROM AccountTransactionAmounts 
            ORDER BY LowestTransactionAmount) AS AccountWithMinTransactionAmount,

        -- Lowest transaction amount in any account
        (
            SELECT MIN(LowestTransactionAmount) 
            FROM AccountTransactionAmounts) AS MinTransactionAmount,

        -- Average of each account's highest transaction amount
        (
            SELECT AVG(HighestTransactionAmount * 1.0) 
            FROM AccountTransactionAmounts) AS AvgTransactionAmount;
    """

df = fetch_data(query_string)
print("*** Accounts with the highest, lowest, and average transaction amounts. ***")
df.head() 

*** Accounts with the highest, lowest, and average transaction amounts. ***


Unnamed: 0,AccountWithMaxTransactionAmount,MaxTransactionAmount,AccountWithMinTransactionAmount,MinTransactionAmount,AvgTransactionAmount
,998,87400.0,5129,0.0,29619.047555
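
The `AvgTransactionAmount` column in this query averages each account's highest transaction, which is a different figure from the average over all transactions. A short sketch with invented amounts shows how far apart the two can be.

```python
from statistics import mean

# AccountID -> transaction amounts; invented sample data
transactions = {
    101: [100.0, 200.0, 900.0],
    102: [50.0, 150.0],
    103: [10.0, 20.0, 30.0, 1000.0],
}

# What the query computes: the mean of the per-account maxima
avg_of_account_max = mean(max(amounts) for amounts in transactions.values())

# The mean over every individual transaction
all_amounts = [a for amounts in transactions.values() for a in amounts]
avg_of_all = mean(all_amounts)
```

With these numbers the mean of the maxima is well above the overall mean, because the many small transactions are ignored by the first calculation.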


### Customer Segmentation

Accounts with the highest, lowest, and average transaction frequencies.

In [8]:
query_string = """
    WITH AccountTransactionFrequency AS (
        SELECT
            AccountID,
            COUNT(*) AS TransactionFrequency
        FROM
            BankTransaction
        GROUP BY
            AccountID
    )
    SELECT
        -- Account with the highest transaction frequency
        (SELECT TOP 1 AccountID 
         FROM AccountTransactionFrequency 
         ORDER BY TransactionFrequency DESC) AS AccountWithMaxTransactionFrequency,

        -- Highest transaction frequency in any account
        (SELECT MAX(TransactionFrequency) 
         FROM AccountTransactionFrequency) AS MaxTransactionFrequency,

        -- Account with the lowest transaction frequency
        (SELECT TOP 1 AccountID 
         FROM AccountTransactionFrequency 
         ORDER BY TransactionFrequency) AS AccountWithMinTransactionFrequency,

        -- Lowest transaction frequency in any account
        (SELECT MIN(TransactionFrequency) 
         FROM AccountTransactionFrequency) AS MinTransactionFrequency,

        -- Average transaction frequency per account
        (SELECT AVG(TransactionFrequency * 1.0) 
         FROM AccountTransactionFrequency) AS AvgTransactionFrequency;
    """

df = fetch_data(query_string)
print("*** Accounts with the highest, lowest, and average transaction frequencies. ***")
df.head() 

*** Accounts with the highest, lowest, and average transaction frequencies. ***


Unnamed: 0,AccountWithMaxTransactionFrequency,MaxTransactionFrequency,AccountWithMinTransactionFrequency,MinTransactionFrequency,AvgTransactionFrequency
,8261,675,182,9,234.737777


### Customer Segmentation

#### High-Value Clients: 
* Who are our high-value customers, and what are their characteristics?
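* Here, an account is treated as high-value when its total amount for a product exceeds the average amount for that product.
* High-value status can be defined by several criteria; the queries below use loans, transactions, and bank orders.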

#### Purpose:

* This query aims to identify and analyze high-value clients based on their financial activities across different banking products.
* It provides insights into which clients have significant transactions in terms of bank orders, loans, or general transactions.
* This information can be valuable for targeted marketing campaigns, personalized financial services, or risk management strategies.

### High-Valued Accounts Based on Loan Amount

* An account qualifies as high-value when its total loan amount exceeds the average of all loan amounts.

In [9]:
query_string = """
    WITH AverageLoanAmount AS (
        SELECT AVG(L.Amount) AS AverageLoan
        FROM Loan L
    ),
    HighValueClientsByLoan AS (
        SELECT D.ClientID, A.AccountID, SUM(L.Amount) AS TotalLoanAmount
        FROM Loan L
        JOIN Account A ON L.AccountID = A.AccountID
        JOIN Disposition D ON A.AccountID = D.AccountID
        GROUP BY D.ClientID, A.AccountID
    )
    SELECT 
        HVL.ClientID, 
        HVL.AccountID, 
        HVL.TotalLoanAmount
    FROM 
        HighValueClientsByLoan HVL
    JOIN 
        AverageLoanAmount AL ON HVL.TotalLoanAmount > AL.AverageLoan
    ORDER BY 
        HVL.TotalLoanAmount DESC;
"""

df = fetch_data(query_string)
print("*** High-Value Clients by Loan Amount ***")
df.head() 

*** High-Value Clients by Loan Amount ***


Unnamed: 0,ClientID,AccountID,TotalLoanAmount
,9340,7542,590820.0
,10997,8926,566640.0
,10998,8926,566640.0
,2823,2335,541200.0
,981,817,538500.0
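
The high-value pattern can be tried on a few invented rows. This sketch runs a close variant of the query against an in-memory SQLite database, used here as a stand-in for SQL Server so that the example is self-contained (an assumption, not the notebook's setup). It also shows why an account with two dispositions, such as account 8926 above, appears once per client.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Loan (LoanID INTEGER, AccountID INTEGER, Amount REAL);
    CREATE TABLE Disposition (ClientID INTEGER, AccountID INTEGER);
    INSERT INTO Loan VALUES (1, 10, 500.0), (2, 20, 100.0), (3, 30, 300.0);
    -- Account 10 is shared by two clients
    INSERT INTO Disposition VALUES (1, 10), (2, 10), (3, 20), (4, 30);
""")

query = """
    WITH AverageLoanAmount AS (
        SELECT AVG(Amount) AS AverageLoan FROM Loan
    ),
    ClientLoanTotals AS (
        SELECT D.ClientID, L.AccountID, SUM(L.Amount) AS TotalLoanAmount
        FROM Loan L
        JOIN Disposition D ON L.AccountID = D.AccountID
        GROUP BY D.ClientID, L.AccountID
    )
    SELECT T.ClientID, T.AccountID, T.TotalLoanAmount
    FROM ClientLoanTotals T
    JOIN AverageLoanAmount AL ON T.TotalLoanAmount > AL.AverageLoan
    ORDER BY T.TotalLoanAmount DESC, T.ClientID;
"""

# The average loan is 300.0, so only account 10 (500.0) qualifies,
# and it is listed once for each of its two clients
rows = conn.execute(query).fetchall()
conn.close()
```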


### High-Valued Accounts Based on Transaction Amount

* An account qualifies as high-value when its total transaction amount exceeds the average amount of a single transaction.

In [10]:
query_string = """
    WITH AverageTransactionAmount AS (
        SELECT AVG(BT.Amount) AS AverageTransaction
        FROM BankTransaction BT
    ),
    HighValueClientsByTransaction AS (
        SELECT D.ClientID, A.AccountID, SUM(BT.Amount) AS TotalTransactionAmount
        FROM BankTransaction BT
        JOIN Account A ON BT.AccountID = A.AccountID
        JOIN Disposition D ON A.AccountID = D.AccountID
        GROUP BY D.ClientID, A.AccountID
    )
    SELECT
        HVBT.ClientID,
        HVBT.AccountID,
        HVBT.TotalTransactionAmount
    FROM 
        HighValueClientsByTransaction HVBT
    JOIN AverageTransactionAmount ATA 
        ON HVBT.TotalTransactionAmount > ATA.AverageTransaction
    ORDER BY 
        HVBT.TotalTransactionAmount DESC;

"""

df = fetch_data(query_string)
print("*** High-Value Clients by Transaction Amount ***")
df.head() 

*** High-Value Clients by Transaction Amount ***


Unnamed: 0,ClientID,AccountID,TotalTransactionAmount
,255,212,7619102.4
,4251,3521,7401229.2
,4252,3521,7401229.2
,3335,2762,7399357.6
,1359,1132,7386440.3


### High-Valued Accounts Based on Order Amount

* An account qualifies as high-value when its total order amount exceeds the average amount of a single order.

In [11]:
query_string = """
    WITH AverageBankOrderAmount AS (
        SELECT AVG(BO.Amount) AS AverageBankOrder
        FROM BankOrder BO
    ),
    HighValueClientsByBankOrder AS (
        SELECT D.ClientID, A.AccountID, SUM(BO.Amount) AS TotalBankOrderAmount
        FROM BankOrder BO
        JOIN Account A ON BO.AccountID = A.AccountID
        JOIN Disposition D ON A.AccountID = D.AccountID
        GROUP BY D.ClientID, A.AccountID
    )
    SELECT
        HVBO.ClientID,
        HVBO.AccountID,
        HVBO.TotalBankOrderAmount
    FROM 
        HighValueClientsByBankOrder HVBO
    JOIN AverageBankOrderAmount ABA 
        ON HVBO.TotalBankOrderAmount > ABA.AverageBankOrder
    ORDER BY 
        HVBO.TotalBankOrderAmount DESC;
"""

df = fetch_data(query_string)
print("*** High-Value Clients by Order Amount ***")
df.head() 

*** High-Value Clients by Order Amount ***


Unnamed: 0,ClientID,AccountID,TotalBankOrderAmount
,3629,3005,22704.3
,2866,2371,21785.3
,2865,2371,21785.3
,3517,2910,21725.3
,2083,1718,21634.0


### Loan Stats

Grouping loans based on loan status.

In [12]:
query_string = """
    SELECT 
        StatusID, 
        COUNT(*) AS LoanCount, 
        SUM(Amount) AS TotalAmount, 
        AVG(Amount) AS AverageAmount, 
        MIN(Amount) AS MinAmount, 
        MAX(Amount) AS MaxAmount
    FROM 
        Loan
    GROUP BY 
        StatusID;
"""

df = fetch_data(query_string)
print("*** Grouping loans based on loan status. ***")
df.head() 

*** Grouping loans based on loan status. ***


Unnamed: 0,StatusID,LoanCount,TotalAmount,AverageAmount,MinAmount,MaxAmount
,A,203,18603216.0,91641.458128,4980.0,323472.0
,B,31,4362348.0,140720.903225,29448.0,464520.0
,C,403,69078372.0,171410.352357,5148.0,590820.0
,D,45,11217804.0,249284.533333,36204.0,541200.0


### Loan Stats

Grouping loans (Per Year) based on loan status.

In [13]:
query_string = """
    SELECT
        StatusID,
        YEAR(EntryDate) AS EntryYear,
        COUNT(*) AS LoanCount,
        SUM(Amount) AS TotalAmount,
        AVG(Amount) AS AverageAmount,
        MIN(Amount) AS MinAmount,
        MAX(Amount) AS MaxAmount
    FROM
        Loan
    GROUP BY
        StatusID,
        YEAR(EntryDate);
"""

df = fetch_data(query_string)
print("*** Grouping loans (Per Year) based on loan status. ***")
df.head() 

*** Grouping loans (Per Year) based on loan status. ***


Unnamed: 0,StatusID,EntryYear,LoanCount,TotalAmount,AverageAmount,MinAmount,MaxAmount
,A,1993,16,1807992.0,112999.5,21924.0,274740.0
,B,1993,4,811284.0,202821.0,75624.0,464520.0
,A,1994,73,7537632.0,103255.232876,4980.0,323472.0
,B,1994,12,2163972.0,180331.0,49320.0,299088.0
,C,1994,14,2943300.0,210235.714285,50460.0,398640.0


### Loan Time Series Data

Grouping loans (Per Month and Year) based on loan status.

In [14]:
query_string = """
    SELECT
        StatusID,
        YEAR(EntryDate) AS EntryYear,
        MONTH(EntryDate) AS EntryMonth,
        COUNT(*) AS LoanCount,
        SUM(Amount) AS TotalAmount,
        AVG(Amount) AS AverageAmount,
        MIN(Amount) AS MinAmount,
        MAX(Amount) AS MaxAmount
    FROM 
        Loan
    GROUP BY
        StatusID,
        YEAR(EntryDate),
        MONTH(EntryDate);
"""

df = fetch_data(query_string)
print("*** Loan Time Series Data: Grouping loans (Per Month and Year) based on loan status. ***")
df.head() 

*** Loan Time Series Data: Grouping loans (Per Month and Year) based on loan status. ***


Unnamed: 0,StatusID,EntryYear,EntryMonth,LoanCount,TotalAmount,AverageAmount,MinAmount,MaxAmount
,A,1993,7,2,293040.0,146520.0,127080.0,165960.0
,A,1993,8,1,105804.0,105804.0,105804.0,105804.0
,A,1993,9,3,415368.0,138456.0,52788.0,274740.0
,A,1993,10,1,154416.0,154416.0,154416.0,154416.0
,A,1993,11,3,218556.0,72852.0,21924.0,117024.0
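
Visualization tools usually plot a time series more easily from a single date column than from separate year and month columns. The sketch below shows one way to combine `EntryYear` and `EntryMonth` in pandas. The `EntryPeriod` column name is an assumption, and the sample rows are the first rows of the output above, re-typed by hand.

```python
import pandas as pd

# First rows of the result above, re-typed by hand for illustration
df = pd.DataFrame({
    "StatusID": ["A", "A", "A"],
    "EntryYear": [1993, 1993, 1993],
    "EntryMonth": [7, 8, 9],
    "LoanCount": [2, 1, 3],
})

# pd.to_datetime can assemble dates from columns named year, month, and day
parts = pd.DataFrame({
    "year": df["EntryYear"],
    "month": df["EntryMonth"],
    "day": 1,  # the first day of the month stands in for the whole month
})
df["EntryPeriod"] = pd.to_datetime(parts)
```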


### Loan: Effect of Loan Duration on Loan Status

In [15]:
query_string = """
    SELECT
        Duration,
        StatusID,
        COUNT(*) AS LoanCount,
        ROUND(MAX(Amount), 2) AS MaxAmount,
        ROUND(MIN(Amount), 2) AS MinAmount,
        ROUND(SUM(Amount), 2) AS TotalAmount,
        ROUND(AVG(Amount), 2) AS AverageAmount
    FROM
        Loan
    GROUP BY
        Duration,
        StatusID
    ORDER BY
        Duration ASC,
        StatusID ASC;
"""

df = fetch_data(query_string)
print("*** Effect of Loan Duration on Loan Status ***")
df.head() 

*** Effect of Loan Duration on Loan Status ***


Unnamed: 0,Duration,StatusID,LoanCount,MaxAmount,MinAmount,TotalAmount,AverageAmount
,12,A,93,116832.0,4980.0,5136444.0,55230.58
,12,B,10,96396.0,29448.0,584256.0,58425.6
,12,C,27,109344.0,5148.0,1269348.0,47012.89
,12,D,1,36204.0,36204.0,36204.0,36204.0
,24,A,64,198240.0,7656.0,5966688.0,93229.5


### Time Series: Effect of Loan Duration on Loan Status (Per Month and Year)

In [16]:
query_string = """
    SELECT
        YEAR(EntryDate) AS LoanYear,
        MONTH(EntryDate) AS LoanMonth,
        Duration,
        StatusID,
        COUNT(*) AS LoanCount,
        ROUND(MAX(Amount), 2) AS MaxAmount,
        ROUND(MIN(Amount), 2) AS MinAmount,
        ROUND(SUM(Amount), 2) AS TotalAmount,
        ROUND(AVG(Amount), 2) AS AverageAmount
    FROM
        Loan
    GROUP BY
        YEAR(EntryDate),
        MONTH(EntryDate),
        Duration,
        StatusID
    ORDER BY
        LoanYear ASC,
        LoanMonth ASC,
        Duration ASC,
        StatusID ASC;
"""

df = fetch_data(query_string)
print("*** Time Series: Effect of Loan Duration on Loan Status (Per Month and Year) ***")
df.head() 

*** Time Series: Effect of Loan Duration on Loan Status (Per Month and Year) ***


Unnamed: 0,LoanYear,LoanMonth,Duration,StatusID,LoanCount,MaxAmount,MinAmount,TotalAmount,AverageAmount
,1993,7,12,B,1,96396.0,96396.0,96396.0,96396.0
,1993,7,36,A,1,165960.0,165960.0,165960.0,165960.0
,1993,7,60,A,1,127080.0,127080.0,127080.0,127080.0
,1993,8,36,A,1,105804.0,105804.0,105804.0,105804.0
,1993,9,12,A,1,52788.0,52788.0,52788.0,52788.0


### Time Series: Effect of Duration and District on Loans (Per Month and Year)

In [17]:
query_string = """
    SELECT
        YEAR(L.EntryDate) AS LoanYear,
        MONTH(L.EntryDate) AS LoanMonth,
        D.DistrictID,
        Duration,
        StatusID,
        COUNT(*) AS LoanCount,
        ROUND(MAX(L.Amount), 2) AS MaxAmount,
        ROUND(MIN(L.Amount), 2) AS MinAmount,
        ROUND(SUM(L.Amount), 2) AS TotalAmount,
        ROUND(AVG(L.Amount), 2) AS AverageAmount
    FROM
        Loan L
    JOIN
        Account A ON L.AccountID = A.AccountID
    JOIN
        District D ON A.DistrictID = D.DistrictID
    GROUP BY
        YEAR(L.EntryDate),
        MONTH(L.EntryDate),
        D.DistrictID,
        Duration,
        StatusID
    ORDER BY
        LoanYear ASC,
        LoanMonth ASC,
        D.DistrictID ASC,
        Duration ASC,
        StatusID ASC;
"""

df = fetch_data(query_string)
print("*** Time Series: Effect of Duration and District on Loans (Per Month and Year) ***")
df.head() 

*** Time Series: Effect of Duration and District on Loans (Per Month and Year) ***


Unnamed: 0,LoanYear,LoanMonth,DistrictID,Duration,StatusID,LoanCount,MaxAmount,MinAmount,TotalAmount,AverageAmount
,1993,7,30,12,B,1,96396.0,96396.0,96396.0,96396.0
,1993,7,45,60,A,1,127080.0,127080.0,127080.0,127080.0
,1993,7,46,36,A,1,165960.0,165960.0,165960.0,165960.0
,1993,8,12,36,A,1,105804.0,105804.0,105804.0,105804.0
,1993,9,1,60,A,1,274740.0,274740.0,274740.0,274740.0


### Time Series (Sparse Table): Effect of Duration and District on Loans (Per Month and Year)

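* The CROSS JOINs build every year, month, district, and status combination, with the aim of including months that have no loans.
* COALESCE fills the amount columns with 0 where no loan matches, and the result is sorted.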

In [18]:
query_string = """
    WITH YearList AS (
        SELECT DISTINCT YEAR(EntryDate) AS Year FROM Loan
    ),
    MonthList AS (
        SELECT DISTINCT MONTH(EntryDate) AS Month FROM Loan
    )
    SELECT
        Y.Year,
        M.Month,
        D.DistrictID,
        L.Duration,
        LS.StatusID,
        COUNT(L.LoanID) AS LoanCount,
        COALESCE(ROUND(MAX(L.Amount), 2), 0) AS MaxAmount,
        COALESCE(ROUND(MIN(L.Amount), 2), 0) AS MinAmount,
        COALESCE(ROUND(SUM(L.Amount), 2), 0) AS TotalAmount,
        COALESCE(ROUND(AVG(L.Amount), 2), 0) AS AverageAmount
    FROM
        YearList Y
    CROSS JOIN
        MonthList M
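    -- D is not linked to the loan's own district (DD below),
    -- so every loan is reported under every district
    CROSS JOIN
        District D
    CROSS JOIN
        LoanStatus LS
    LEFT JOIN
        Loan L ON Y.Year = YEAR(L.EntryDate) AND 
                  M.Month = MONTH(L.EntryDate) AND 
                  LS.StatusID = L.StatusID
    LEFT JOIN
        Account A ON L.AccountID = A.AccountID
    LEFT JOIN
        District DD ON A.DistrictID = DD.DistrictID
    WHERE
        -- Removes rows with no matching loan, so empty months are dropped
        L.Duration IS NOT NULL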
    GROUP BY
        Y.Year,
        M.Month,
        D.DistrictID,
        L.Duration,
        LS.StatusID
    ORDER BY
        Y.Year ASC,
        M.Month ASC,
        D.DistrictID ASC,
        L.Duration ASC,
        LS.StatusID ASC;
"""

df = fetch_data(query_string)
print("*** Time Series (Sparse Table): Effect of Duration and District on Loans (Per Month and Year) ***")
df.head() 

*** Time Series (Sparse Table): Effect of Duration and District on Loans (Per Month and Year) ***


Unnamed: 0,Year,Month,DistrictID,Duration,StatusID,LoanCount,MaxAmount,MinAmount,TotalAmount,AverageAmount
,1993,7,1,12,B,1,96396.0,96396.0,96396.0,96396.0
,1993,7,1,36,A,1,165960.0,165960.0,165960.0,165960.0
,1993,7,1,60,A,1,127080.0,127080.0,127080.0,127080.0
,1993,7,10,12,B,1,96396.0,96396.0,96396.0,96396.0
,1993,7,10,36,A,1,165960.0,165960.0,165960.0,165960.0
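
The goal stated in the bullets, a row for every combination with zeros where no loans exist, can also be reached after the query by reindexing in pandas. The sketch below uses `MultiIndex.from_product` and `reindex(fill_value=0)` on invented data with only two key columns; it illustrates the technique and is not a rewrite of the query above.

```python
import pandas as pd

# Sparse monthly totals: only months that had loans; invented data
sparse = pd.DataFrame({
    "Year": [1993, 1993],
    "Month": [7, 9],
    "LoanCount": [3, 4],
    "TotalAmount": [389436.0, 415368.0],
})

# Every (Year, Month) combination that should appear in the dense table
full_index = pd.MultiIndex.from_product(
    [[1993], range(7, 10)], names=["Year", "Month"]
)

dense = (
    sparse.set_index(["Year", "Month"])
    .reindex(full_index, fill_value=0)  # months without loans become 0
    .reset_index()
)
```

The same idea extends to more key columns (district, duration, status) by adding their value lists to `from_product`.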


### Effect of Seasons on Account Opening.

* Are there seasonal trends in client account opening habits?

In [19]:
query_string = """
    SELECT
        YEAR(EntryDate) AS Year,
        MONTH(EntryDate) AS Month,
        COUNT(DISTINCT AccountID) AS NumAccountsOpened
    FROM
        Account
    GROUP BY
        YEAR(EntryDate),
        MONTH(EntryDate)
    ORDER BY
        YEAR(EntryDate),
        MONTH(EntryDate);

    -- Seasonal trends in client account opening habits per district.
    SELECT
        YEAR(EntryDate) AS Year,
        MONTH(EntryDate) AS Month,
        DistrictID,
        COUNT(DISTINCT AccountID) AS NumAccountsOpened
    FROM
        Account
    GROUP BY
        YEAR(EntryDate),
        MONTH(EntryDate),
        DistrictID
    ORDER BY
        YEAR(EntryDate),
        MONTH(EntryDate),
        DistrictID;
"""

df = fetch_data(query_string)
print("*** Effect of Seasons on Account Opening. ***")
df.head() 

*** Effect of Seasons on Account Opening. ***


Unnamed: 0,Year,Month,NumAccountsOpened
,1993,1,96
,1993,2,98
,1993,3,104
,1993,4,77
,1993,5,91
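
The query string in this cell holds two `SELECT` statements, yet the output shows only the first result set (it has no `DistrictID` column). One way to capture both is to run each statement on its own. The sketch below rests on several assumptions: an in-memory SQLite database stands in for SQL Server, so `strftime` replaces `YEAR()` and `MONTH()`, and the table rows and the `run_queries` helper are invented.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (AccountID INTEGER, DistrictID INTEGER, EntryDate TEXT);
    INSERT INTO Account VALUES
        (1, 1, '1993-01-05'), (2, 1, '1993-01-20'), (3, 2, '1993-02-11');
""")

# One entry per statement, so each result set gets its own DataFrame
queries = {
    "accounts_per_month": """
        SELECT strftime('%Y', EntryDate) AS Year,
               strftime('%m', EntryDate) AS Month,
               COUNT(DISTINCT AccountID) AS NumAccountsOpened
        FROM Account
        GROUP BY Year, Month
        ORDER BY Year, Month;
    """,
    "accounts_per_month_and_district": """
        SELECT strftime('%Y', EntryDate) AS Year,
               strftime('%m', EntryDate) AS Month,
               DistrictID,
               COUNT(DISTINCT AccountID) AS NumAccountsOpened
        FROM Account
        GROUP BY Year, Month, DistrictID
        ORDER BY Year, Month, DistrictID;
    """,
}


def run_queries(named_queries, connection):
    """Run each statement separately and return {name: DataFrame}."""
    return {
        name: pd.read_sql_query(sql, connection)
        for name, sql in named_queries.items()
    }


results = run_queries(queries, conn)
conn.close()
```

In the notebook, the same effect could be had by calling `fetch_data` once per statement.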


### Effect of Seasons on Loan Payments.

* Are there seasonal trends in client loan payment habits?

In [20]:
query_string = """
    SELECT
        YEAR(EntryDate) AS Year,
        MONTH(EntryDate) AS Month,
        COUNT(DISTINCT LoanID) AS NumLoans,
        COUNT(*) AS TotalPayments,
        SUM(Payments) AS TotalPaymentAmount,
        AVG(Payments * 1.0) AS AvgPaymentAmount,
        MAX(Payments) AS MaxPaymentAmount,
        MIN(Payments) AS MinPaymentAmount
    FROM
        Loan
    GROUP BY
        YEAR(EntryDate),
        MONTH(EntryDate)
    ORDER BY
        YEAR(EntryDate),
        MONTH(EntryDate);

    -- Seasonal trends in client loan payment habits per district.
    SELECT
        YEAR(L.EntryDate) AS Year,
        MONTH(L.EntryDate) AS Month,
        A.DistrictID,
        COUNT(DISTINCT L.LoanID) AS NumLoans,
        COUNT(*) AS TotalPayments,
        SUM(L.Payments) AS TotalPaymentAmount,
        AVG(L.Payments * 1.0) AS AvgPaymentAmount,
        MAX(L.Payments) AS MaxPaymentAmount,
        MIN(L.Payments) AS MinPaymentAmount
    FROM
        Loan L
    JOIN
        Account A ON L.AccountID = A.AccountID
    GROUP BY
        YEAR(L.EntryDate),
        MONTH(L.EntryDate),
        A.DistrictID
    ORDER BY
        YEAR(L.EntryDate),
        MONTH(L.EntryDate),
        A.DistrictID;
"""

df = fetch_data(query_string)
print("*** Effect of Seasons on Loan Payments ***")
df.head() 

*** Effect of Seasons on Loan Payments ***


Unnamed: 0,Year,Month,NumLoans,TotalPayments,TotalPaymentAmount,AvgPaymentAmount,MaxPaymentAmount,MinPaymentAmount
,1993,7,3,3,14761.0,4920.333333,8033.0,2118.0
,1993,8,1,1,2939.0,2939.0,2939.0,2939.0
,1993,9,4,4,19919.0,4979.75,7281.0,3660.0
,1993,10,1,1,3217.0,3217.0,3217.0,3217.0
,1993,11,3,3,8802.0,2934.0,4876.0,609.0


### Effect of Seasons on Clients Loan Status.

* Are there seasonal trends in the status of clients' loans?

In [21]:
query_string = """
    SELECT
        Y.Year,
        M.Month,
        LS.StatusID,
        COUNT(DISTINCT L.LoanID) AS NumLoans  -- COUNT already yields 0 when nothing matches
    FROM
        (SELECT DISTINCT YEAR(EntryDate) AS Year FROM Loan) Y
    CROSS JOIN
        (SELECT DISTINCT MONTH(EntryDate) AS Month FROM Loan) M
    CROSS JOIN
        LoanStatus LS
    LEFT JOIN
        Loan L ON Y.Year = YEAR(L.EntryDate) AND 
                  M.Month = MONTH(L.EntryDate) AND 
                  LS.StatusID = L.StatusID
    GROUP BY
        Y.Year,
        M.Month,
        LS.StatusID
    ORDER BY
        Y.Year,
        M.Month,
        LS.StatusID;
"""

df = fetch_data(query_string)
print("*** Effect of Seasons on Clients Loan Status ***")
df.head() 

*** Effect of Seasons on Clients Loan Status ***


Unnamed: 0,Year,Month,StatusID,NumLoans
,1993,1,A,0
,1993,1,B,0
,1993,1,C,0
,1993,1,D,0
,1993,2,A,0
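
The zero rows above arise because `COUNT(DISTINCT L.LoanID)` counts only non-NULL values, so combinations left unmatched by the `LEFT JOIN` count as 0. The sketch below shows this on invented data, again using in-memory SQLite as a stand-in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LoanStatus (StatusID TEXT);
    CREATE TABLE Loan (LoanID INTEGER, StatusID TEXT);
    INSERT INTO LoanStatus VALUES ('A'), ('B'), ('C');
    -- No loan has status B
    INSERT INTO Loan VALUES (1, 'A'), (2, 'A'), (3, 'C');
""")

rows = conn.execute("""
    SELECT LS.StatusID, COUNT(DISTINCT L.LoanID) AS NumLoans
    FROM LoanStatus LS
    LEFT JOIN Loan L ON LS.StatusID = L.StatusID
    GROUP BY LS.StatusID
    ORDER BY LS.StatusID;
""").fetchall()
conn.close()

print(rows)  # [('A', 2), ('B', 0), ('C', 1)]
```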


### Time Series: Seasonal Trends in Client Loan Status Habits Per District.

In [22]:
query_string = """
    SELECT
        Y.Year,
        M.Month,
        A.DistrictID,
        LS.StatusID,
        COUNT(DISTINCT L.LoanID) AS NumLoans  -- COUNT already yields 0 when nothing matches
    FROM
        (SELECT DISTINCT YEAR(EntryDate) AS Year FROM Loan) Y
    CROSS JOIN
        (SELECT DISTINCT MONTH(EntryDate) AS Month FROM Loan) M
    CROSS JOIN
        LoanStatus LS
    CROSS JOIN
        Account A
    LEFT JOIN
        Loan L ON Y.Year = YEAR(L.EntryDate) AND 
                  M.Month = MONTH(L.EntryDate) AND 
                  LS.StatusID = L.StatusID
        AND A.AccountID = L.AccountID
    GROUP BY
        Y.Year,
        M.Month,
        A.DistrictID,
        LS.StatusID
    ORDER BY
        Y.Year,
        M.Month,
        A.DistrictID,
        LS.StatusID;
"""

df = fetch_data(query_string)
print("*** Time Series: Seasonal Trends in Client Loan Status Habits Per District. ***")
df.head() 

*** Time Series: Seasonal Trends in Client Loan Status Habits Per District. ***


Unnamed: 0,Year,Month,DistrictID,StatusID,NumLoans
,1993,1,1,A,0
,1993,1,1,B,0
,1993,1,1,C,0
,1993,1,1,D,0
,1993,1,10,A,0


### Time Series: Seasonal Trends on Clients Habits Towards Bank Transactions.

In [23]:
query_string = """
    SELECT
        YEAR(EntryDate) AS Year,
        MONTH(EntryDate) AS Month,
        MIN(Amount) AS MinAmount,
        MAX(Amount) AS MaxAmount,
        SUM(Amount) AS TotalAmount,
        AVG(Amount * 1.0) AS AvgAmount,
        COUNT(*) AS NumTransactions
    FROM
        BankTransaction
    GROUP BY
        YEAR(EntryDate),
        MONTH(EntryDate)
    ORDER BY
        YEAR(EntryDate),
        MONTH(EntryDate);
"""

df = fetch_data(query_string)
print("*** Time Series: Seasonal Trends on Clients Habits Towards Bank Transactions. ***")
df.head() 

*** Time Series: Seasonal Trends on Clients Habits Towards Bank Transactions. ***


Unnamed: 0,Year,Month,MinAmount,MaxAmount,TotalAmount,AvgAmount,NumTransactions
,1993,1,0.8,49752.0,702157.6,3966.99209,177
,1993,2,2.9,62100.0,2726925.3,6903.608354,395
,1993,3,1.7,51700.0,4730318.6,6997.512721,676
,1993,4,2.5,60500.0,7378367.8,8081.454326,913
,1993,5,3.1,60000.0,11680753.4,8943.915313,1306


### Time Series: Seasonal Trends of Clients' Bank Transaction Habits Per District.

In [24]:
query_string = """
    SELECT
        YEAR(BT.EntryDate) AS Year,
        MONTH(BT.EntryDate) AS Month,
        A.DistrictID,
        MIN(BT.Amount) AS MinAmount,
        MAX(BT.Amount) AS MaxAmount,
        SUM(BT.Amount) AS TotalAmount,
        AVG(BT.Amount * 1.0) AS AvgAmount,
        COUNT(*) AS NumTransactions
    FROM
        BankTransaction BT
    JOIN
        Account A ON BT.AccountID = A.AccountID
    GROUP BY
        YEAR(BT.EntryDate),
        MONTH(BT.EntryDate),
        A.DistrictID
    ORDER BY
        YEAR(BT.EntryDate),
        MONTH(BT.EntryDate),
        A.DistrictID;
"""

df = fetch_data(query_string)
print("*** Time Series: Seasonal Trends of Clients' Bank Transaction Habits Per District. ***")
df.head() 

*** Time Series: Seasonal Trends of Clients' Bank Transaction Habits Per District. ***


Unnamed: 0,Year,Month,DistrictID,MinAmount,MaxAmount,TotalAmount,AvgAmount,NumTransactions
,1993,1,1,15.9,6446.0,20099.9,1435.707142,14
,1993,1,10,28.2,10737.0,11665.2,3888.4,3
,1993,1,12,17.0,19961.0,22078.0,5519.5,4
,1993,1,13,1100.0,1100.0,1100.0,1100.0,1
,1993,1,14,800.0,800.0,800.0,800.0,1


### What Banks Do Our Clients Transact With?

In [25]:
query_string = """
    SELECT
        Bank,
        COUNT(*) AS NumTransactions,
        SUM(Amount) AS TotalAmount,
        AVG(Amount * 1.0) AS AvgAmount,
        MIN(Amount) AS MinAmount,
        MAX(Amount) AS MaxAmount
    FROM
        BankTransaction
    WHERE
        Bank IS NOT NULL AND Bank <> ''
    GROUP BY
        Bank
    ORDER BY
        Bank;
"""

df = fetch_data(query_string)
print("*** What Banks Do Our Clients Transact With? ***")
df.head() 

*** What Banks Do Our Clients Transact With? ***


Unnamed: 0,Bank,NumTransactions,TotalAmount,AvgAmount,MinAmount,MaxAmount
,AB,21720,108354898.8,4988.715414,5.0,72966.0
,CD,19597,104137717.9,5313.962234,15.0,74176.0
,EF,21293,108391703.6,5090.485305,3.0,73970.0
,GH,21499,125956293.3,5858.704744,10.0,74648.0
,IJ,20525,111914481.7,5452.593505,2.0,74522.0


### What type of transactions do our clients do with other banks?

In [26]:
query_string = """
    SELECT
        Bank,
        Type,
        COUNT(*) AS NumTransactions,
        SUM(Amount) AS TotalAmount,
        AVG(Amount * 1.0) AS AvgAmount,
        MIN(Amount) AS MinAmount,
        MAX(Amount) AS MaxAmount
    FROM
        BankTransaction
    WHERE
        Bank IS NOT NULL AND Bank <> '' AND Type IS NOT NULL AND Type <> ''
    GROUP BY
        Bank, Type
    ORDER BY
        Bank, Type;
"""

df = fetch_data(query_string)
print("*** What type of transactions do our clients do with other banks? ***")
df.head() 

*** What type of transactions do our clients do with other banks? ***


Unnamed: 0,Bank,Type,NumTransactions,TotalAmount,AvgAmount,MinAmount,MaxAmount
,AB,Deposit,4807,54138718.0,11262.47514,2904.0,72966.0
,AB,Withdraw,16913,54216180.8,3205.592195,5.0,14707.0
,CD,Deposit,4984,57608541.0,11558.696027,2904.0,74176.0
,CD,Withdraw,14613,46529176.9,3184.094771,15.0,13461.0
,EF,Deposit,4880,50959607.0,10442.542418,2942.0,73970.0
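
For charting, the long Bank/Type result can be reshaped so that each transaction type becomes its own column. The sketch below uses `DataFrame.pivot` on the first four rows of the output above, re-typed by hand; the variable names are illustrative.

```python
import pandas as pd

# First four rows of the result above, limited to one measure
long_df = pd.DataFrame({
    "Bank": ["AB", "AB", "CD", "CD"],
    "Type": ["Deposit", "Withdraw", "Deposit", "Withdraw"],
    "NumTransactions": [4807, 16913, 4984, 14613],
})

# One row per bank, one column per transaction type
wide_df = long_df.pivot(index="Bank", columns="Type", values="NumTransactions")
```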


### Close the Database Connection

In [27]:
try:
    cursor.close()
    conn.close()
except pyodbc.Error as ex:
    print("Connection error:", ex)
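
If an earlier cell fails midway, the close cell may never run. `contextlib.closing` from the standard library calls `close()` on exit even when an exception is raised. The sketch below uses an in-memory SQLite connection as a stand-in for the pyodbc one (an assumption made so the example is self-contained).

```python
import sqlite3
from contextlib import closing

with closing(sqlite3.connect(":memory:")) as conn:
    with closing(conn.cursor()) as cursor:
        cursor.execute("SELECT 1 + 1")
        result = cursor.fetchone()[0]

# Both the cursor and the connection are closed at this point,
# and they would be closed even if the query had raised
```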

### End of Exploratory Data Analysis in SQL

* The next step is to load the generated tabular data into Power BI, Tableau, Excel, or Google Sheets to create visualizations and derive insights.