## Exploratory Data Analysis (EDA) in SQL

* This notebook demonstrates the versatility of SQL for Exploratory Data Analysis.
* The purpose is to discover insights from the data.

* The query results in this notebook are saved as CSV files.
* The CSV files can then be ingested by visualization tools such as Power BI, Tableau, MS Excel, and Google Sheets.

In [2]:
import pandas as pd
import pyodbc
import warnings
warnings.filterwarnings('ignore')

In [3]:
server = 'JAK-PC\\SQLEXPRESS'
database = 'BankDB'
driver = '{ODBC Driver 18 for SQL Server}'

conn_string = (
    f'DRIVER={driver};SERVER={server};DATABASE={database};'
    'Trusted_Connection=yes;Encrypt=no;TrustServerCertificate=yes'
)

try:
    conn = pyodbc.connect(conn_string)
    cursor = conn.cursor()
except pyodbc.Error as ex:
    print("Connection error:", ex)

In [4]:
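def fetch_data(query_string):
    """
    fetch_data consumes query_string (a SQL query statement) and
    produces df, a pandas DataFrame that contains the result of the SQL query.
    An empty DataFrame is returned if the query fails.
    """
    df = pd.DataFrame()

    try:
        df = pd.read_sql_query(query_string, conn)
    except (pyodbc.Error, pd.errors.DatabaseError) as ex:
        # pandas may wrap driver errors in pandas.errors.DatabaseError
        print("Query error:", ex)

    # Replace the default integer index with blank labels for cleaner display
    blank_row_index = [''] * len(df)
    df.index = blank_row_index

    return df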
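
The helper pattern can be tried without a SQL Server instance. The following is only a sketch under stated assumptions: it swaps `pyodbc` for Python's built-in `sqlite3` with an in-memory database, uses a hypothetical `fetch_data_from` variant that takes the connection as a parameter, and queries a toy `Account` table that is not the real BankDB data.

```python
import sqlite3

import pandas as pd


def fetch_data_from(connection, query_string):
    """Hypothetical variant of fetch_data that receives the connection explicitly."""
    try:
        df = pd.read_sql_query(query_string, connection)
    except (sqlite3.Error, pd.errors.DatabaseError) as ex:
        # A failed query yields an empty DataFrame instead of raising
        print("Query error:", ex)
        df = pd.DataFrame()
    # Blank row labels, as in the notebook's helper
    df.index = [''] * len(df)
    return df


# Toy stand-in for the Account table (assumed columns, made-up rows)
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE Account (AccountID INTEGER, DistrictID INTEGER)")
demo_conn.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)])

demo_df = fetch_data_from(
    demo_conn,
    "SELECT DistrictID, COUNT(*) AS CountOfDistrict "
    "FROM Account GROUP BY DistrictID ORDER BY CountOfDistrict DESC",
)
bad_df = fetch_data_from(demo_conn, "SELECT * FROM NoSuchTable")  # prints an error
demo_conn.close()
```

Passing the connection as an argument is a design choice of this sketch only; the notebook's own `fetch_data` relies on the global `conn`.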

### Customer Segmentation

Identify **5** different segments of customers based on their transaction behavior.

In [None]:
query_string = """
    WITH CustomerSegments AS (
        SELECT 
            AccountID,
            AVG(Amount) AS AvgTransactionAmount,
            NTILE(5) OVER (ORDER BY AVG(Amount)) AS Segment
        FROM BankTransaction
        GROUP BY AccountID
    )
    SELECT 
        Segment,
        AVG(AvgTransactionAmount) AS AverageAmount,
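        -- Cast to VARCHAR(MAX) so long ID lists do not hit STRING_AGG's 8000-byte limit
        STRING_AGG(CAST(AccountID AS VARCHAR(MAX)), ', ') AS AccountIDsInSegment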
    FROM CustomerSegments
    GROUP BY Segment
    ORDER BY Segment;
"""

df = fetch_data(query_string)
print("*** 5 Customer Segments - Based on Average Transaction Amount ***")
df.to_csv('../results-csv-data-going-to-powerbi/segment-5-buckets-on-transaction-amount.csv', index=False)
df.head() 
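
To illustrate how `NTILE` assigns buckets, here is a minimal pandas sketch. It assumes the standard `NTILE(n)` rule (rows are ordered, split into `n` groups, and the first `len % n` groups receive one extra row). The `ntile` function and the toy averages are hypothetical and only mirror the idea of the query above.

```python
import pandas as pd


def ntile(values: pd.Series, n: int) -> pd.Series:
    """Assign bucket numbers 1..n to values, mimicking SQL NTILE(n) ordered ascending."""
    # Row labels in ascending order of value (stable sort keeps tie order)
    order = values.sort_values(kind="stable").index
    base, extra = divmod(len(values), n)
    buckets = []
    for bucket in range(1, n + 1):
        # The first `extra` buckets get one additional row
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    # Map bucket numbers back to the original row order
    return pd.Series(buckets, index=order).reindex(values.index)


# Made-up average transaction amounts, indexed by made-up account IDs
avg_amounts = pd.Series(
    [50, 10, 40, 20, 70, 30, 60],
    index=[101, 102, 103, 104, 105, 106, 107],
)
segments = ntile(avg_amounts, 3)  # 7 rows into 3 buckets: sizes 3, 2, 2
```

With ties in the ordering column, SQL Server may place equal values in different buckets, so bucket boundaries are not guaranteed to be reproducible for tied averages.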

### Account Volume by District

In [None]:
query_string = """
    SELECT DistrictID, 
           COUNT(*) AS CountOfDistrict
    FROM Account
    GROUP BY DistrictID
    ORDER BY CountOfDistrict DESC
"""

df = fetch_data(query_string)
print("*** Account Volume by District ***")
df.to_csv('../results-csv-data-going-to-powerbi/account-volume-by-district.csv', index=False)
df.head()

### Loan Payments by District

In [None]:
query_string = """
    SELECT D.DistrictID, 
        SUM(L.Payments) AS SumOfPayments, 
        AVG(L.Payments) AS AvgOfPayments, 
        COUNT(L.Payments) AS CountOfPayments
    FROM District D
    LEFT JOIN Account A ON D.DistrictID = A.DistrictID
    LEFT JOIN Loan L ON A.AccountID = L.AccountID
    GROUP BY D.DistrictID
"""

df = fetch_data(query_string)
print("*** Loan Payments by District ***")
df.to_csv('../results-csv-data-going-to-powerbi/loan-payments-by-district.csv', index=False)
df.head()    
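
The `LEFT JOIN` chain in the query above keeps districts that have no loans, and `COUNT(L.Payments)` counts only non-NULL payments. A small pandas sketch of the same behavior follows; the three toy tables are made-up data with assumed columns, not the BankDB contents.

```python
import pandas as pd

# Toy tables: district 3 has an account but no loan
districts = pd.DataFrame({"DistrictID": [1, 2, 3]})
accounts = pd.DataFrame({"AccountID": [10, 11, 12, 13], "DistrictID": [1, 1, 2, 3]})
loans = pd.DataFrame({"AccountID": [10, 11, 12], "Payments": [300.0, 500.0, 200.0]})

# Two left joins: every district survives, even without a matching loan
joined = (
    districts.merge(accounts, on="DistrictID", how="left")
    .merge(loans, on="AccountID", how="left")
)

# "count" skips missing values, like COUNT(L.Payments) skips NULLs
summary = (
    joined.groupby("DistrictID")["Payments"]
    .agg(SumOfPayments="sum", AvgOfPayments="mean", CountOfPayments="count")
    .reset_index()
)
```

One difference worth noting: for a district with no loans, pandas reports a sum of 0.0, whereas SQL `SUM` returns NULL.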

### Bank Transaction by District

In [None]:
query_string = """
    SELECT D.DistrictID, 
        SUM(BT.Amount) AS SumOfAmounts, 
        AVG(BT.Amount) AS AvgOfAmounts, 
        COUNT(BT.Amount) AS CountOfAmounts
    FROM District D
    LEFT JOIN Account A ON D.DistrictID = A.DistrictID
    LEFT JOIN BankTransaction BT ON A.AccountID = BT.AccountID
    GROUP BY D.DistrictID
    ORDER BY SumOfAmounts
    """

df = fetch_data(query_string)
print("*** Bank Transactions by District ***")
df.to_csv('../results-csv-data-going-to-powerbi/transaction-amount-by-district.csv', index=False)
df.head() 

### Customer Segmentation: High-Value Clients

#### High-Value Clients: 
* Who are our high-value customers, and what are their characteristics?
* A high-value account is one whose amount is greater than the average of all amounts.
* High-value clients are identified by several criteria: loan amount, transaction amount, and bank order amount.

#### Purpose:

* This query aims to identify and analyze high-value clients based on their financial activities across different banking products.
* It provides insights into which clients have significant transactions in terms of bank orders, loans, or general transactions.
* This information can be valuable for targeted marketing campaigns, personalized financial services, or risk management strategies.

### High-Valued Accounts Based on Loan Amount

To qualify as a high-valued account, the loan amount must:
* be higher than the average loan amount, and
* fall within the top 20% of all the loans given out.

In [None]:
query_string = """
    SELECT TOP (CAST(0.2 * (SELECT COUNT(*) FROM Loan) AS INT)) Loan.*,
        DistrictID
    FROM 
        Loan
    LEFT JOIN 
        Account A ON A.AccountID = Loan.AccountID
    WHERE 
        Amount > (SELECT AVG(Amount) FROM Loan)
    ORDER BY 
        Amount DESC;
"""

df = fetch_data(query_string)
print("*** High-Value Clients by Loan Amount ***")
df.to_csv('../results-csv-data-going-to-powerbi/high-value-account-by-loan-amount.csv', index=False)
df.head() 
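
The two conditions combined in the query above (above the average, and within the top 20% of all loans by count) can be sketched in pandas as follows. The ten toy loans are made-up values used only to make the arithmetic easy to follow.

```python
import pandas as pd

# Ten made-up loans with amounts 100, 200, ..., 1000
loans = pd.DataFrame({
    "LoanID": list(range(1, 11)),
    "Amount": [100.0 * i for i in range(1, 11)],
})

# TOP (CAST(0.2 * COUNT(*) AS INT)): 20% of the total number of loans
top_n = int(0.2 * len(loans))

# WHERE Amount > (SELECT AVG(Amount) FROM Loan)
above_average = loans[loans["Amount"] > loans["Amount"].mean()]

# ORDER BY Amount DESC, then keep the first top_n rows
high_value = above_average.sort_values("Amount", ascending=False).head(top_n)
```

Here the average is 550, five loans exceed it, and `top_n` is 2, so only the two largest loans remain.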

### High-Valued Accounts Based on Transaction Amount

* To qualify as a high-valued account, the account's total transaction amount must exceed the average amount of all transactions.

In [None]:
query_string = """
    WITH AverageTransactionAmount AS (
        SELECT AVG(BT.Amount) AS AverageTransaction
        FROM BankTransaction BT
    ),
    HighValueClientsByTransaction AS (
        SELECT D.ClientID, A.AccountID, SUM(BT.Amount) AS TotalTransactionAmount
        FROM BankTransaction BT
        JOIN Account A ON BT.AccountID = A.AccountID
        JOIN Disposition D ON A.AccountID = D.AccountID
        GROUP BY D.ClientID, A.AccountID
    )
    SELECT
        HVBT.ClientID,
        HVBT.AccountID,
        HVBT.TotalTransactionAmount
    FROM 
        HighValueClientsByTransaction HVBT
    JOIN AverageTransactionAmount ATA 
        ON HVBT.TotalTransactionAmount > ATA.AverageTransaction
    ORDER BY 
        HVBT.TotalTransactionAmount DESC;

"""

df = fetch_data(query_string)
print("*** High-Value Clients by Transaction Amount ***")
df.to_csv('../load-to-powerbi/high-value-account-by-transaction-amount.csv', index=False)
df.head() 

### High-Valued Accounts Based on Bank Order Amount

* To qualify as a high-valued account, the account's total bank order amount must exceed the average amount of all bank orders.

In [None]:
query_string = """
    WITH AverageBankOrderAmount AS (
        SELECT AVG(BO.Amount) AS AverageBankOrder
        FROM BankOrder BO
    ),
    HighValueClientsByBankOrder AS (
        SELECT D.ClientID, A.AccountID, SUM(BO.Amount) AS TotalBankOrderAmount
        FROM BankOrder BO
        JOIN Account A ON BO.AccountID = A.AccountID
        JOIN Disposition D ON A.AccountID = D.AccountID
        GROUP BY D.ClientID, A.AccountID
    )
    SELECT
        HVBO.ClientID,
        HVBO.AccountID,
        HVBO.TotalBankOrderAmount
    FROM 
        HighValueClientsByBankOrder HVBO
    JOIN AverageBankOrderAmount ABA 
        ON HVBO.TotalBankOrderAmount > ABA.AverageBankOrder
    ORDER BY 
        HVBO.TotalBankOrderAmount DESC;
"""

df = fetch_data(query_string)
print("*** High-Value Clients by Order Amount ***")
df.to_csv('../load-to-powerbi/high-value-account-by-order-amount.csv', index=False)
df.head() 

### Loan Stats

Grouping loans based on loan status.

In [None]:
query_string = """
    SELECT 
        StatusID, 
        COUNT(*) AS LoanCount, 
        SUM(Amount) AS TotalAmount, 
        AVG(Amount) AS AverageAmount, 
        MIN(Amount) AS MinAmount, 
        MAX(Amount) AS MaxAmount
    FROM 
        Loan
    GROUP BY 
        StatusID;
"""

df = fetch_data(query_string)
print("*** Grouping loans based on loan status. ***")
df.to_csv('../load-to-powerbi/loan-stats-by-status.csv', index=False)
df.head() 

### Loan Stats (Per Year)

Grouping loans (Per Year) based on loan status.

In [None]:
query_string = """
    SELECT
        StatusID,
        YEAR(EntryDate) AS EntryYear,
        COUNT(*) AS LoanCount,
        SUM(Amount) AS TotalAmount,
        AVG(Amount) AS AverageAmount,
        MIN(Amount) AS MinAmount,
        MAX(Amount) AS MaxAmount
    FROM
        Loan
    GROUP BY
        StatusID,
        YEAR(EntryDate);
"""

df = fetch_data(query_string)
print("*** Grouping loans (Per Year) based on loan status. ***")
df.to_csv('../load-to-powerbi/loan-stats-by-status-per-year-time-series.csv', index=False)
df.head() 

### Loan Time Series Data

Grouping loans (Per Month and Year) based on loan status.

In [None]:
query_string = """
    SELECT
        StatusID,
        YEAR(EntryDate) AS EntryYear,
        MONTH(EntryDate) AS EntryMonth,
        COUNT(*) AS LoanCount,
        SUM(Amount) AS TotalAmount,
        AVG(Amount) AS AverageAmount,
        MIN(Amount) AS MinAmount,
        MAX(Amount) AS MaxAmount
    FROM 
        Loan
    GROUP BY
        StatusID,
        YEAR(EntryDate),
        MONTH(EntryDate);
"""

df = fetch_data(query_string)
print("*** Loan Time Series Data: Grouping loans (Per Month and Year) based on loan status. ***")
df.to_csv('../load-to-powerbi/loan-stats-by-status-per-month-time-series.csv', index=False)
df.head() 

### Loan: Effect of Loan Duration on Loan Status

In [None]:
query_string = """
    SELECT
        Duration,
        StatusID,
        COUNT(*) AS LoanCount,
        ROUND(MAX(Amount), 2) AS MaxAmount,
        ROUND(MIN(Amount), 2) AS MinAmount,
        ROUND(SUM(Amount), 2) AS TotalAmount,
        ROUND(AVG(Amount), 2) AS AverageAmount
    FROM
        Loan
    GROUP BY
        Duration,
        StatusID
    ORDER BY
        Duration ASC,
        StatusID ASC;
"""

df = fetch_data(query_string)
print("*** Effect of Loan Duration on Loan Status ***")
df.to_csv('../load-to-powerbi/effect-of-duration-on-loan-status.csv', index=False)
df.head() 

### Time Series: Effect of Loan Duration on Loan Status (Per Month and Year)

In [None]:
query_string = """
    SELECT
        YEAR(EntryDate) AS LoanYear,
        MONTH(EntryDate) AS LoanMonth,
        Duration,
        StatusID,
        COUNT(*) AS LoanCount,
        ROUND(MAX(Amount), 2) AS MaxAmount,
        ROUND(MIN(Amount), 2) AS MinAmount,
        ROUND(SUM(Amount), 2) AS TotalAmount,
        ROUND(AVG(Amount), 2) AS AverageAmount
    FROM
        Loan
    GROUP BY
        YEAR(EntryDate),
        MONTH(EntryDate),
        Duration,
        StatusID
    ORDER BY
        LoanYear ASC,
        LoanMonth ASC,
        Duration ASC,
        StatusID ASC;
"""

df = fetch_data(query_string)
print("*** Time Series: Effect of Loan Duration on Loan Status (Per Month and Year) ***")
df.to_csv('../load-to-powerbi/effect-of-duration-on-loan-status-monthly-time-series.csv', index=False)
df.head() 

### Time Series: Effect of Duration and District on Loans (Per Month and Year)

In [None]:
query_string = """
    SELECT
        YEAR(L.EntryDate) AS LoanYear,
        MONTH(L.EntryDate) AS LoanMonth,
        D.DistrictID,
        Duration,
        StatusID,
        COUNT(*) AS LoanCount,
        ROUND(MAX(L.Amount), 2) AS MaxAmount,
        ROUND(MIN(L.Amount), 2) AS MinAmount,
        ROUND(SUM(L.Amount), 2) AS TotalAmount,
        ROUND(AVG(L.Amount), 2) AS AverageAmount
    FROM
        Loan L
    JOIN
        Account A ON L.AccountID = A.AccountID
    JOIN
        District D ON A.DistrictID = D.DistrictID
    GROUP BY
        YEAR(L.EntryDate),
        MONTH(L.EntryDate),
        D.DistrictID,
        Duration,
        StatusID
    ORDER BY
        LoanYear ASC,
        LoanMonth ASC,
        D.DistrictID ASC,
        Duration ASC,
        StatusID ASC;
"""

df = fetch_data(query_string)
print("*** Time Series: Effect of Duration and District on Loans (Per Month and Year) ***")
df.to_csv('../load-to-powerbi/effect-of-duration-and-district-on-loan-status-monthly-time-series.csv', index=False)
df.head() 

### Time Series (Sparse Table): Effect of Duration and District on Loans (Per Month and Year)

* CROSS JOIN is used to include months in which no loan amount was entered.
* Amounts for those months are filled in as 0, and the result is sorted.

In [None]:
query_string = """
    WITH YearList AS (
        SELECT DISTINCT YEAR(EntryDate) AS Year FROM Loan
    ),
    MonthList AS (
        SELECT DISTINCT MONTH(EntryDate) AS Month FROM Loan
    ),
    DurationList AS (
        SELECT DISTINCT Duration FROM Loan
    ),
    LoanByDistrict AS (
        -- Attach each loan to the district of its account
        SELECT L.LoanID, L.EntryDate, L.Duration, L.StatusID, L.Amount, A.DistrictID
        FROM Loan L
        JOIN Account A ON L.AccountID = A.AccountID
    )
    SELECT
        Y.Year,
        M.Month,
        D.DistrictID,
        DL.Duration,
        LS.StatusID,
        COUNT(L.LoanID) AS LoanCount,
        COALESCE(ROUND(MAX(L.Amount), 2), 0) AS MaxAmount,
        COALESCE(ROUND(MIN(L.Amount), 2), 0) AS MinAmount,
        COALESCE(ROUND(SUM(L.Amount), 2), 0) AS TotalAmount,
        COALESCE(ROUND(AVG(L.Amount), 2), 0) AS AverageAmount
    FROM
        YearList Y
    CROSS JOIN
        MonthList M
    CROSS JOIN
        District D
    CROSS JOIN
        DurationList DL
    CROSS JOIN
        LoanStatus LS
    -- Match loans to every grid dimension, including the district,
    -- and keep unmatched combinations (no WHERE filter) so they show as 0
    LEFT JOIN
        LoanByDistrict L ON Y.Year = YEAR(L.EntryDate) AND 
                  M.Month = MONTH(L.EntryDate) AND 
                  D.DistrictID = L.DistrictID AND 
                  DL.Duration = L.Duration AND 
                  LS.StatusID = L.StatusID
    GROUP BY
        Y.Year,
        M.Month,
        D.DistrictID,
        DL.Duration,
        LS.StatusID
    ORDER BY
        Y.Year ASC,
        M.Month ASC,
        D.DistrictID ASC,
        DL.Duration ASC,
        LS.StatusID ASC;
"""

df = fetch_data(query_string)
print("*** Time Series (Sparse Table): Effect of Duration and District on Loans (Per Month and Year) -  ***")
df.to_csv('../load-to-powerbi/effect-of-duration-and-district-on-loan-status-monthly-time-series-sparse-data.csv', index=False)
df.head() 
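
The idea of cross-joining all dimension values and filling empty combinations with 0 can be sketched in pandas with a full `MultiIndex` grid. This is only an illustration on a made-up three-row loan table with assumed column names; it uses fewer dimensions than the query to stay short.

```python
import pandas as pd

# Made-up loans: only 3 of the 8 possible (Year, Month, StatusID) combinations occur
loans = pd.DataFrame({
    "Year": [1997, 1997, 1998],
    "Month": [1, 1, 2],
    "StatusID": ["A", "B", "A"],
    "Amount": [100.0, 250.0, 80.0],
})

# Aggregate the combinations that actually occur
observed = loans.groupby(["Year", "Month", "StatusID"])["Amount"].agg(
    LoanCount="count", TotalAmount="sum"
)

# Equivalent of the CROSS JOINs: every combination of the distinct values
grid = pd.MultiIndex.from_product(
    [
        sorted(loans["Year"].unique()),
        sorted(loans["Month"].unique()),
        sorted(loans["StatusID"].unique()),
    ],
    names=["Year", "Month", "StatusID"],
)

# Equivalent of LEFT JOIN + COALESCE(..., 0): missing combinations become 0
dense = observed.reindex(grid, fill_value=0).reset_index()
```

The resulting frame has one row per grid cell, which is convenient for time-series visuals that would otherwise skip empty months.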

### Effect of Seasons on Account Opening.

* Are there seasonal trends in client account opening habits?

In [None]:
query_string = """
    SELECT
        YEAR(EntryDate) AS Year,
        MONTH(EntryDate) AS Month,
        COUNT(DISTINCT AccountID) AS NumAccountsOpened
    FROM
        Account
    GROUP BY
        YEAR(EntryDate),
        MONTH(EntryDate)
    ORDER BY
        YEAR(EntryDate),
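        MONTH(EntryDate);
"""

df = fetch_data(query_string)
print("*** Effect of Seasons on Account Opening. ***")
df.to_csv('../load-to-powerbi/effect-of-seasons-on-account-opening-monthly-time-series.csv', index=False)
df.head() 

In [None]:
# Seasonal trends in client account opening habits per district.
# Run as its own query: pd.read_sql_query returns only the first
# result set of a multi-statement batch.
query_string = """
    SELECT
        YEAR(EntryDate) AS Year,
        MONTH(EntryDate) AS Month,
        DistrictID,
        COUNT(DISTINCT AccountID) AS NumAccountsOpened
    FROM
        Account
    GROUP BY
        YEAR(EntryDate),
        MONTH(EntryDate),
        DistrictID
    ORDER BY
        YEAR(EntryDate),
        MONTH(EntryDate),
        DistrictID;
"""

df = fetch_data(query_string)
print("*** Effect of Seasons on Account Opening Per District ***")
# NOTE: this file name is an assumption that follows the notebook's naming pattern.
df.to_csv('../load-to-powerbi/effect-of-seasons-on-account-opening-per-district-monthly-time-series.csv', index=False)
df.head() 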

### Effect of Seasons on Loan Payments.

* Are there seasonal trends in client loan payment habits?

In [None]:
query_string = """
    SELECT
        YEAR(EntryDate) AS Year,
        MONTH(EntryDate) AS Month,
        COUNT(DISTINCT LoanID) AS NumLoans,
        COUNT(*) AS TotalPayments,
        SUM(Payments) AS TotalPaymentAmount,
        AVG(Payments * 1.0) AS AvgPaymentAmount,
        MAX(Payments) AS MaxPaymentAmount,
        MIN(Payments) AS MinPaymentAmount
    FROM
        Loan
    GROUP BY
        YEAR(EntryDate),
        MONTH(EntryDate)
    ORDER BY
        YEAR(EntryDate),
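        MONTH(EntryDate);
"""

df = fetch_data(query_string)
print("*** Effect of Seasons on Loan Payments ***")
df.to_csv('../load-to-powerbi/effect-of-seasons-on-loan-payment-monthly-time-series.csv', index=False)
df.head() 

In [None]:
# Seasonal trends in client loan payment habits per district.
# Run as its own query: pd.read_sql_query returns only the first
# result set of a multi-statement batch.
query_string = """
    SELECT
        YEAR(L.EntryDate) AS Year,
        MONTH(L.EntryDate) AS Month,
        A.DistrictID,
        COUNT(DISTINCT L.LoanID) AS NumLoans,
        COUNT(*) AS TotalPayments,
        SUM(L.Payments) AS TotalPaymentAmount,
        AVG(L.Payments * 1.0) AS AvgPaymentAmount,
        MAX(L.Payments) AS MaxPaymentAmount,
        MIN(L.Payments) AS MinPaymentAmount
    FROM
        Loan L
    JOIN
        Account A ON L.AccountID = A.AccountID
    GROUP BY
        YEAR(L.EntryDate),
        MONTH(L.EntryDate),
        A.DistrictID
    ORDER BY
        YEAR(L.EntryDate),
        MONTH(L.EntryDate),
        A.DistrictID;
"""

df = fetch_data(query_string)
print("*** Effect of Seasons on Loan Payments Per District ***")
# NOTE: this file name is an assumption that follows the notebook's naming pattern.
df.to_csv('../load-to-powerbi/effect-of-seasons-on-loan-payment-per-district-monthly-time-series.csv', index=False)
df.head() 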

### Effect of Seasons on Clients Loan Status.

* Are there seasonal trends in client loan status habits?

In [None]:
query_string = """
    SELECT
        Y.Year,
        M.Month,
        LS.StatusID,
        COALESCE(COUNT(DISTINCT L.LoanID), 0) AS NumLoans
    FROM
        (SELECT DISTINCT YEAR(EntryDate) AS Year FROM Loan) Y
    CROSS JOIN
        (SELECT DISTINCT MONTH(EntryDate) AS Month FROM Loan) M
    CROSS JOIN
        LoanStatus LS
    LEFT JOIN
        Loan L ON Y.Year = YEAR(L.EntryDate) AND 
                  M.Month = MONTH(L.EntryDate) AND 
                  LS.StatusID = L.StatusID
    GROUP BY
        Y.Year,
        M.Month,
        LS.StatusID
    ORDER BY
        Y.Year,
        M.Month,
        LS.StatusID;
"""

df = fetch_data(query_string)
print("*** Effect of Seasons on Clients Loan Status ***")
df.to_csv('../load-to-powerbi/effect-of-seasons-on-loan-status-monthly-time-series.csv', index=False)
df.head() 

### Time Series: Seasonal Trends in Client Loan Status Habits Per District.

In [None]:
query_string = """
    SELECT
        Y.Year,
        M.Month,
        A.DistrictID,
        LS.StatusID,
        COALESCE(COUNT(DISTINCT L.LoanID), 0) AS NumLoans
    FROM
        (SELECT DISTINCT YEAR(EntryDate) AS Year FROM Loan) Y
    CROSS JOIN
        (SELECT DISTINCT MONTH(EntryDate) AS Month FROM Loan) M
    CROSS JOIN
        LoanStatus LS
    CROSS JOIN
        Account A
    LEFT JOIN
        Loan L ON Y.Year = YEAR(L.EntryDate) AND 
                  M.Month = MONTH(L.EntryDate) AND 
                  LS.StatusID = L.StatusID
        AND A.AccountID = L.AccountID
    GROUP BY
        Y.Year,
        M.Month,
        A.DistrictID,
        LS.StatusID
    ORDER BY
        Y.Year,
        M.Month,
        A.DistrictID,
        LS.StatusID;
"""

df = fetch_data(query_string)
print("*** Time Series: Seasonal Trends in Client Loan Status Habits Per District. ***")
df.to_csv('../load-to-powerbi/effect-of-seasons-on-loan-status-per-district-monthly-time-series.csv', index=False)
df.head() 

### Time Series: Seasonal Trends in Clients' Bank Transaction Habits.

In [None]:
query_string = """
    SELECT
        YEAR(EntryDate) AS Year,
        MONTH(EntryDate) AS Month,
        MIN(Amount) AS MinAmount,
        MAX(Amount) AS MaxAmount,
        SUM(Amount) AS TotalAmount,
        AVG(Amount * 1.0) AS AvgAmount,
        COUNT(*) AS NumTransactions
    FROM
        BankTransaction
    GROUP BY
        YEAR(EntryDate),
        MONTH(EntryDate)
    ORDER BY
        YEAR(EntryDate),
        MONTH(EntryDate);
"""

df = fetch_data(query_string)
print("*** Time Series: Seasonal Trends in Clients' Bank Transaction Habits. ***")
df.to_csv('../load-to-powerbi/effect-of-seasons-on-bank-transactions-monthly-time-series.csv', index=False)
df.head() 

### Time Series: Seasonal Trends of Clients' Bank Transaction Habits Per District.

In [None]:
query_string = """
    SELECT
        YEAR(BT.EntryDate) AS Year,
        MONTH(BT.EntryDate) AS Month,
        A.DistrictID,
        MIN(BT.Amount) AS MinAmount,
        MAX(BT.Amount) AS MaxAmount,
        SUM(BT.Amount) AS TotalAmount,
        AVG(BT.Amount * 1.0) AS AvgAmount,
        COUNT(*) AS NumTransactions
    FROM
        BankTransaction BT
    JOIN
        Account A ON BT.AccountID = A.AccountID
    GROUP BY
        YEAR(BT.EntryDate),
        MONTH(BT.EntryDate),
        A.DistrictID
    ORDER BY
        YEAR(BT.EntryDate),
        MONTH(BT.EntryDate),
        A.DistrictID;
"""

df = fetch_data(query_string)
print("*** Time Series: Seasonal Trends of Clients' Bank Transaction Habits Per District. ***")
df.to_csv('../load-to-powerbi/effect-of-seasons-on-bank-transactions-per-district-monthly-time-series.csv', index=False)
df.head() 

### What Banks Do Our Clients Transact With?

In [5]:
query_string = """
    SELECT
        Bank,
        COUNT(*) AS NumTransactions,
        SUM(Amount) AS TotalAmount,
        AVG(Amount * 1.0) AS AvgAmount,
        MIN(Amount) AS MinAmount,
        MAX(Amount) AS MaxAmount
    FROM
        BankTransaction
    WHERE
        Bank IS NOT NULL AND Bank <> ''
    GROUP BY
        Bank
    ORDER BY
        Bank;
"""

df = fetch_data(query_string)
print("*** What Banks Do Our Clients Transact With? ***")
#df.to_csv('../load-to-powerbi/other-banks-clients-use-for-service.csv', index=False)
df.head() 

*** What Banks Do Our Clients Transact With? ***


| Bank | NumTransactions | TotalAmount | AvgAmount | MinAmount | MaxAmount |
|---|---|---|---|---|---|
| AB | 21720 | 108354898.8 | 4988.715414 | 5.0 | 72966.0 |
| CD | 19597 | 104137717.9 | 5313.962234 | 15.0 | 74176.0 |
| EF | 21293 | 108391703.6 | 5090.485305 | 3.0 | 73970.0 |
| GH | 21499 | 125956293.3 | 5858.704744 | 10.0 | 74648.0 |
| IJ | 20525 | 111914481.7 | 5452.593505 | 2.0 | 74522.0 |


### What type of transactions do our clients do with other banks?

In [6]:
query_string = """
    SELECT
        Bank,
        Type,
        COUNT(*) AS NumTransactions,
        SUM(Amount) AS TotalAmount,
        AVG(Amount * 1.0) AS AvgAmount,
        MIN(Amount) AS MinAmount,
        MAX(Amount) AS MaxAmount
    FROM
        BankTransaction
    WHERE
        Bank IS NOT NULL AND Bank <> '' AND Type IS NOT NULL AND Type <> ''
    GROUP BY
        Bank, Type
    ORDER BY
        Bank, Type;
"""

df = fetch_data(query_string)
print("*** What type of transactions do our clients do with other banks? ***")
#df.to_csv('../load-to-powerbi/transactions-clients-do-with-other-banks.csv', index=False)
df.head() 

*** What type of transactions do our clients do with other banks? ***


| Bank | Type | NumTransactions | TotalAmount | AvgAmount | MinAmount | MaxAmount |
|---|---|---|---|---|---|---|
| AB | Deposit | 4807 | 54138718.0 | 11262.47514 | 2904.0 | 72966.0 |
| AB | Withdraw | 16913 | 54216180.8 | 3205.592195 | 5.0 | 14707.0 |
| CD | Deposit | 4984 | 57608541.0 | 11558.696027 | 2904.0 | 74176.0 |
| CD | Withdraw | 14613 | 46529176.9 | 3184.094771 | 15.0 | 13461.0 |
| EF | Deposit | 4880 | 50959607.0 | 10442.542418 | 2942.0 | 73970.0 |


### Close the Database Connection

In [None]:
try:
    cursor.close()
    conn.close()
except pyodbc.Error as ex:
    print("Connection error:", ex)

### End of Exploratory Data Analysis in SQL

* The next step is to load the generated tabular data into Power BI, Tableau, Excel, or Google Sheets to create visualizations and derive insights.

In [None]:
server = 'JAK-PC\\SQLEXPRESS'
database = 'BankDB'
driver = '{ODBC Driver 18 for SQL Server}'

conn_string = (
    f'DRIVER={driver};SERVER={server};DATABASE={database};'
    'Trusted_Connection=yes;Encrypt=no;TrustServerCertificate=yes'
)

try:
    conn = pyodbc.connect(conn_string)
    cursor = conn.cursor()
except pyodbc.Error as ex:
    print("Connection error:", ex)


In [None]:
query_string = """
    SELECT CAST(DistrictID AS INTEGER) AS DistrictID, COUNT(*) AS CountOfDistrict
    FROM Account
    GROUP BY DistrictID
    ORDER BY CAST(DistrictID AS INTEGER)
"""

df = fetch_data(query_string)
print("*** Account Count per District ***")
df.to_csv('../results-csv-data-going-to-powerbi/account-count-per-district-stats.csv', index=False)
df.head()

In [None]:
try:
    cursor.close()
    conn.close()
except pyodbc.Error as ex:
    print("Connection error:", ex)