## Exploratory Data Analysis (EDA) in SQL - Part 3

### Introduction:

In this notebook, we leverage the power of SQL to conduct Exploratory Data Analysis (EDA). The primary objective is to unearth valuable insights from the dataset, paving the way for informed decision-making and strategic planning.

### Purpose:

The purpose of this EDA in SQL is to:
- Explore the dataset comprehensively to understand its characteristics, distributions, and relationships.
- Identify key patterns, trends, and anomalies within the data.
- Generate actionable insights to drive business decisions and strategies.

### Key Features:

- **Versatility of SQL:** Showcase the flexibility and effectiveness of SQL for exploratory analysis.
- **Insight Generation:** Utilize SQL queries to extract meaningful insights and derive actionable conclusions.
- **Data Preparation for Visualization:** Save the results of SQL queries as CSV files for ingestion into visualization tools such as Power BI, Tableau, MS Excel, and Google Sheets.

### Next Steps:

1. **Query Execution:** Execute SQL queries to explore various aspects of the dataset, including summary statistics, distributions, and relationships between variables.
  
2. **Insight Generation:** Analyze the results of SQL queries to identify patterns, trends, and correlations within the data.

3. **Export to CSV:** Save the results of SQL queries as CSV files to facilitate ingestion into visualization tools for further analysis and visualization.

By leveraging SQL for exploratory data analysis, we aim to extract actionable insights and unlock the full potential of the dataset to drive business success.

In [2]:
import pandas as pd
import pyodbc
import warnings
warnings.filterwarnings('ignore')

### Establish connection to the SQL Server Database

In [3]:
server = 'JAK-PC\\SQLEXPRESS'
database = 'BankDB'
driver = '{ODBC Driver 18 for SQL Server}'

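# Adjacent string literals are joined at compile time, so no indentation leaks into the
# connection string (a backslash continuation inside a literal keeps the next line's spaces).
conn_string = (
    f'DRIVER={driver};SERVER={server};DATABASE={database};'
    'Trusted_Connection=yes;Encrypt=no;TrustServerCertificate=yes'
)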

try:
    conn = pyodbc.connect(conn_string)
    cursor = conn.cursor()
except pyodbc.Error as ex:
    print("Connection error:", ex)
    raise  # stop here: every later cell needs a working connection
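As an alternative way to assemble the connection string, the key/value pairs can live in a dictionary and be joined with semicolons, which avoids any stray whitespace from line continuations. The helper name `build_conn_string` is hypothetical (it is not part of pyodbc); the keys and values are the ones used in the cell above.

```python
def build_conn_string(settings):
    """Join ODBC key=value pairs with ';' (hypothetical helper, not a pyodbc API)."""
    return ";".join(f"{key}={value}" for key, value in settings.items())


# Same settings as the connection cell above; dicts preserve insertion order.
settings = {
    "DRIVER": "{ODBC Driver 18 for SQL Server}",
    "SERVER": r"JAK-PC\SQLEXPRESS",
    "DATABASE": "BankDB",
    "Trusted_Connection": "yes",
    "Encrypt": "no",
    "TrustServerCertificate": "yes",
}

conn_string = build_conn_string(settings)
print(conn_string)
# The result can then be passed to pyodbc.connect(conn_string).
```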

### Function to fetch the data from the SQL Server Database (Python and Pandas)

In [4]:
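def fetch_data(query_string):
    """
    fetch_data consumes query_string (a SQL query statement) and
    produces df, a pandas DataFrame that contains the result of the SQL query.
    If the query fails, the error is printed and an empty DataFrame is returned.
    """
    df = pd.DataFrame()

    try:
        df = pd.read_sql_query(query_string, conn)
    except Exception as ex:
        # With a raw DBAPI connection such as pyodbc, pandas re-raises query
        # failures as its own DatabaseError, so pyodbc.Error alone would miss them.
        print("Query error:", ex)

    # Replace the numeric index with blanks for a cleaner notebook display.
    blank_row_index = [''] * len(df)
    df.index = blank_row_index

    return df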

### Customer Segmentation

Split accounts into **5** roughly equal-sized segments (quintiles) by their average transaction amount, using `NTILE(5)`.

In [None]:
query_string = """
    WITH CustomerSegments AS (
        SELECT 
            AccountID,
            AVG(Amount) AS AvgTransactionAmount,
            NTILE(5) OVER (ORDER BY AVG(Amount)) AS Segment
        FROM BankTransaction
        GROUP BY AccountID
    )
    SELECT 
        Segment,
        AVG(AvgTransactionAmount) AS AverageAmount,
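        -- Cast to VARCHAR(MAX): without a (max) input type, STRING_AGG caps its result at 8000 bytes.
        STRING_AGG(CAST(AccountID AS VARCHAR(MAX)), ', ') AS AccountIDsInSegment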
    FROM CustomerSegments
    GROUP BY Segment
    ORDER BY Segment;
"""

df = fetch_data(query_string)
print("*** 5 Customer Segments - Based on Average Transaction Amount ***")
df.to_csv('../results-csv-data-going-to-powerbi/segment-5-buckets-on-transaction-amount.csv', index=False)
df.head() 
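To make the bucketing rule concrete, here is a pure-Python sketch of what `NTILE(5) OVER (ORDER BY AVG(Amount))` does in the query above: average the amounts per account, sort the accounts by that average, and deal them into 5 ordered buckets whose sizes differ by at most one, with the larger buckets first. The function name `ntile_segments` and the sample transactions are made up for illustration; ties are broken by account ID here, whereas SQL Server does not guarantee an order among tied rows.

```python
from collections import defaultdict
from statistics import mean


def ntile_segments(transactions, buckets=5):
    """Mimic NTILE(buckets) OVER (ORDER BY average amount) on (account_id, amount) pairs.

    Hypothetical helper for illustration; returns {segment: [(avg, account_id), ...]}.
    """
    # Step 1: average transaction amount per account (the CTE's GROUP BY AccountID).
    amounts = defaultdict(list)
    for account_id, amount in transactions:
        amounts[account_id].append(amount)
    # Sort by average; ties fall back to account ID.
    averages = sorted((mean(values), account_id) for account_id, values in amounts.items())

    # Step 2: NTILE sizing. Each bucket gets n // buckets rows, and the first
    # n % buckets buckets get one extra row.
    base, extra = divmod(len(averages), buckets)
    segments = {}
    start = 0
    for segment in range(1, buckets + 1):
        size = base + (1 if segment <= extra else 0)
        if size == 0:
            break  # fewer accounts than buckets: NTILE just produces fewer groups
        segments[segment] = averages[start:start + size]
        start += size
    return segments


sample = [(1, 100), (1, 300), (2, 50), (3, 900), (4, 400), (5, 700), (6, 20), (7, 650)]
for segment, members in ntile_segments(sample).items():
    segment_avg = mean(avg for avg, _ in members)             # like AVG(AvgTransactionAmount)
    account_ids = ", ".join(str(acct) for _, acct in members)  # like STRING_AGG
    print(segment, round(segment_avg, 2), account_ids)
```

With 7 accounts and 5 buckets, the first two segments hold two accounts each and the rest hold one, which is the same size pattern NTILE produces.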

### Account Volume by District

In [None]:
query_string = """
    SELECT DistrictID, 
           COUNT(*) AS CountOfDistrict
    FROM Account
    GROUP BY DistrictID
    ORDER BY CountOfDistrict DESC
"""

df = fetch_data(query_string)
print("*** Account Volume by District ***")
df.to_csv('../results-csv-data-going-to-powerbi/account-volume-by-district.csv', index=False)
df.head()

### Loan Payments by District

In [None]:
query_string = """
    SELECT D.DistrictID, 
        SUM(L.Payments) AS SumOfPayments, 
        AVG(L.Payments) AS AvgOfPayments, 
        COUNT(L.Payments) AS CountOfPayments
    FROM District D
    LEFT JOIN Account A ON D.DistrictID = A.DistrictID
    LEFT JOIN Loan L ON A.AccountID = L.AccountID
    GROUP BY D.DistrictID
"""

df = fetch_data(query_string)
print("*** Loan Payments by District ***")
df.to_csv('../results-csv-data-going-to-powerbi/loan-payments-by-district.csv', index=False)
df.head()    
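Because the query starts from `District` and uses `LEFT JOIN`s, districts without any loans still appear: `COUNT(L.Payments)` returns 0 for them because `COUNT(column)` skips NULLs, while `SUM` and `AVG` return NULL. A self-contained sketch with Python's built-in `sqlite3` module and a few made-up rows shows the same behavior; SQLite stands in for SQL Server here, and the tables are cut down to the columns the query uses. The connection is named `demo_conn` so it does not overwrite the notebook's `conn`.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
    CREATE TABLE District (DistrictID INTEGER PRIMARY KEY);
    CREATE TABLE Account (AccountID INTEGER PRIMARY KEY, DistrictID INTEGER);
    CREATE TABLE Loan (LoanID INTEGER PRIMARY KEY, AccountID INTEGER, Payments REAL);

    INSERT INTO District VALUES (1), (2), (3);
    INSERT INTO Account VALUES (10, 1), (11, 1), (20, 2);        -- district 3 has no accounts
    INSERT INTO Loan VALUES (100, 10, 500.0), (101, 11, 1500.0);  -- district 2 has no loans
""")

rows = demo_conn.execute("""
    SELECT D.DistrictID,
           SUM(L.Payments)   AS SumOfPayments,
           AVG(L.Payments)   AS AvgOfPayments,
           COUNT(L.Payments) AS CountOfPayments
    FROM District D
    LEFT JOIN Account A ON D.DistrictID = A.DistrictID
    LEFT JOIN Loan L ON A.AccountID = L.AccountID
    GROUP BY D.DistrictID
    ORDER BY D.DistrictID
""").fetchall()

for row in rows:
    print(row)  # NULL sums and averages come back as None
demo_conn.close()
```

In the exported CSV, such districts show empty `SumOfPayments` and `AvgOfPayments` cells, which a visualization tool may need to treat as 0.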

### Bank Transactions by District

In [None]:
query_string = """
    SELECT D.DistrictID, 
        SUM(BT.Amount) AS SumOfAmounts, 
        AVG(BT.Amount) AS AvgOfAmounts, 
        COUNT(BT.Amount) AS CountOfAmounts
    FROM District D
    LEFT JOIN Account A ON D.DistrictID = A.DistrictID
    LEFT JOIN BankTransaction BT ON A.AccountID = BT.AccountID
    GROUP BY D.DistrictID
    ORDER BY SumOfAmounts
    """

df = fetch_data(query_string)
print("*** Bank Transactions by District ***")
df.to_csv('../results-csv-data-going-to-powerbi/transaction-amount-by-district.csv', index=False)
df.head() 

### Customer Segmentation: Identifying High-Value Clients

#### High-Value Clients:
- **Definition:** High-value clients are customers whose financial activity exceeds a chosen threshold, indicating significant engagement with the bank.
- **Characteristics:** A high-value account exceeds a threshold on one or more measures, such as account balance, loan amount, transaction volume, or loan payments, which indicates substantial financial involvement.
- **Varied Criteria:** The measure and the threshold depend on the business question, so an account can be high-value under one criterion and not under another.

#### Purpose:
This analysis identifies and assesses high-value clients based on their activity across banking services. By examining loans, bank transactions, and bank orders, the queries in this section highlight customers with notable transactional patterns. Such insights support targeted marketing, tailored financial offerings, and risk management.


### Top *N* High-Value Account Criteria

An account qualifies as high-value when it meets one or more of the following criteria:
- **Loan Amount:** Accounts with loan amounts above a threshold.
- **Loan Payment Volume:** Accounts with loan payments above a threshold.
- **Bank Account Balance:** Accounts with a higher-than-average balance.
- **Deposit Volume:** Accounts with deposit volumes above a threshold.

This approach helps identify the bank's high-value clients.

In [5]:
query_string = """
    SELECT TOP (CAST(0.2 * (SELECT COUNT(*) FROM Loan) AS INT)) Loan.*,
        DistrictID
    FROM 
        Loan
    LEFT JOIN 
        Account A ON A.AccountID = Loan.AccountID
    WHERE 
        Amount > (SELECT AVG(Amount) FROM Loan)
    ORDER BY 
        Amount DESC;
"""

df = fetch_data(query_string)
print("*** High-Value Clients by Loan Amount ***")
df.to_csv('../results-csv-data-going-to-powerbi/high-value-account-by-loan-amount.csv', index=False)
df.head() 

*** High-Value Clients by Loan Amount ***


| LoanID | AccountID | EntryDate | Amount | Duration | Payments | StatusID | DistrictID |
|---|---|---|---|---|---|---|---|
| 6534 | 7542 | 1997-10-19 | 590820.0 | 60 | 9847.0 | C | 54 |
| 6791 | 8926 | 1998-01-23 | 566640.0 | 60 | 9444.0 | C | 1 |
| 5447 | 2335 | 1997-11-12 | 541200.0 | 60 | 9020.0 | D | 70 |
| 5132 | 817 | 1995-02-17 | 538500.0 | 60 | 8975.0 | C | 5 |
| 5569 | 2936 | 1998-01-20 | 504000.0 | 60 | 8400.0 | C | 3 |
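The `TOP (CAST(0.2 * (SELECT COUNT(*) FROM Loan) AS INT))` clause sizes the result from *all* loans, while the `WHERE` filter keeps only loans above the average amount, so the query returns at most 20% of all loans, and fewer if fewer loans beat the average. A plain-Python sketch of the same rule, with made-up amounts and a hypothetical function name:

```python
def top_share_above_average(amounts, share=0.2):
    """Largest amounts above the mean, capped at int(share * total count)."""
    # CAST(... AS INT) in SQL Server truncates toward zero; int() does the same.
    limit = int(share * len(amounts))
    average = sum(amounts) / len(amounts)
    above_average = sorted((a for a in amounts if a > average), reverse=True)
    return above_average[:limit]


loans = [80_000, 95_000, 120_000, 150_000, 210_000, 320_000, 450_000, 590_820, 60_000, 40_000]
print(top_share_above_average(loans))  # cap is 2 of the 3 loans above the mean

# When only one loan beats the mean, the cap of 2 is not reached.
print(top_share_above_average([1] * 9 + [100]))
```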


### High-Value Accounts Based on Transaction Amount

* An account qualifies as high-value when its total transaction amount exceeds the average total transaction amount across all accounts.

In [None]:
query_string = """
    -- Total transaction amount per account.
    WITH AccountTotals AS (
        SELECT BT.AccountID, SUM(BT.Amount) AS TotalTransactionAmount
        FROM BankTransaction BT
        GROUP BY BT.AccountID
    ),
    -- Baseline: the average of the per-account totals (not of single transactions).
    AverageAccountTotal AS (
        SELECT AVG(TotalTransactionAmount) AS AverageTotal
        FROM AccountTotals
    )
    SELECT
        D.ClientID,
        T.AccountID,
        T.TotalTransactionAmount
    FROM 
        AccountTotals T
    JOIN Disposition D
        ON T.AccountID = D.AccountID
    CROSS JOIN AverageAccountTotal AAT
    WHERE
        T.TotalTransactionAmount > AAT.AverageTotal
    ORDER BY 
        T.TotalTransactionAmount DESC;
"""

df = fetch_data(query_string)
print("*** High-Value Clients by Transaction Amount ***")
df.to_csv('../load-to-powerbi/high-value-account-by-transaction-amount.csv', index=False)
df.head() 
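The baseline an account is compared against changes how many accounts qualify. An account's *total* transaction amount is a sum over many transactions, so it will usually exceed the average of *single* transactions; comparing totals with the average of per-account totals is a like-for-like test. The sketch below uses made-up transactions and a hypothetical function name to contrast the two baselines.

```python
from collections import defaultdict


def high_value_accounts(transactions):
    """Return accounts above each baseline, given (account_id, amount) pairs."""
    totals = defaultdict(float)
    for account_id, amount in transactions:
        totals[account_id] += amount

    # Baseline A: mean of individual transactions.
    per_transaction_avg = sum(a for _, a in transactions) / len(transactions)
    # Baseline B: mean of per-account totals.
    per_account_avg = sum(totals.values()) / len(totals)

    above_a = sorted(acct for acct, t in totals.items() if t > per_transaction_avg)
    above_b = sorted(acct for acct, t in totals.items() if t > per_account_avg)
    return above_a, above_b


sample = [(1, 100), (1, 120), (1, 80), (2, 500), (2, 40), (3, 60), (3, 70), (3, 90), (4, 1000)]
vs_transactions, vs_accounts = high_value_accounts(sample)
print("Total > mean transaction:", vs_transactions)
print("Total > mean account total:", vs_accounts)
```

Here three of four accounts beat the per-transaction mean, while only two beat the mean account total.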

### High-Value Accounts Based on Bank Order Amount

* An account qualifies as high-value when its total bank order amount exceeds the average total bank order amount across all accounts.

In [None]:
query_string = """
    -- Total bank order amount per account.
    WITH AccountOrderTotals AS (
        SELECT BO.AccountID, SUM(BO.Amount) AS TotalBankOrderAmount
        FROM BankOrder BO
        GROUP BY BO.AccountID
    ),
    -- Baseline: the average of the per-account totals (not of single orders).
    AverageAccountOrderTotal AS (
        SELECT AVG(TotalBankOrderAmount) AS AverageTotal
        FROM AccountOrderTotals
    )
    SELECT
        D.ClientID,
        T.AccountID,
        T.TotalBankOrderAmount
    FROM 
        AccountOrderTotals T
    JOIN Disposition D
        ON T.AccountID = D.AccountID
    CROSS JOIN AverageAccountOrderTotal AAOT
    WHERE
        T.TotalBankOrderAmount > AAOT.AverageTotal
    ORDER BY 
        T.TotalBankOrderAmount DESC;
"""

df = fetch_data(query_string)
print("*** High-Value Clients by Order Amount ***")
df.to_csv('../load-to-powerbi/high-value-account-by-order-amount.csv', index=False)
df.head() 
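Both high-value queries join on `Disposition`, which links clients to accounts, so if several clients are linked to the same account, that account appears once per client. When counting or summing high-value *accounts* rather than clients, deduplicate on `AccountID` first. A small sketch with made-up rows shaped like the query output:

```python
# Made-up rows shaped like the query output: (ClientID, AccountID, TotalBankOrderAmount).
rows = [
    (101, 1, 25_000.0),
    (102, 1, 25_000.0),  # a second client linked to the same account
    (201, 2, 18_500.0),
    (301, 3, 12_000.0),
]

client_rows = len(rows)
distinct_accounts = len({account_id for _, account_id, _ in rows})

# Summing the totals row by row counts account 1 twice.
naive_total = sum(total for _, _, total in rows)
# Keying by AccountID keeps one total per account.
dedup_total = sum({account_id: total for _, account_id, total in rows}.values())

print(client_rows, distinct_accounts, naive_total, dedup_total)
```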

### Loan Stats

Grouping loans based on loan status.

In [6]:
query_string = """
    SELECT 
        StatusID, 
        COUNT(*) AS LoanCount, 
        SUM(Amount) AS TotalAmount, 
        AVG(Amount) AS AverageAmount, 
        MIN(Amount) AS MinAmount, 
        MAX(Amount) AS MaxAmount
    FROM 
        Loan
    GROUP BY 
        StatusID;
"""

df = fetch_data(query_string)
print("*** Grouping loans based on loan status. ***")
df.to_csv('../load-to-powerbi/loan-stats-by-status.csv', index=False)
df.head() 

*** Grouping loans based on loan status. ***


| StatusID | LoanCount | TotalAmount | AverageAmount | MinAmount | MaxAmount |
|---|---|---|---|---|---|
| A | 203 | 18603216.0 | 91641.458128 | 4980.0 | 323472.0 |
| B | 31 | 4362348.0 | 140720.903225 | 29448.0 | 464520.0 |
| C | 403 | 69078372.0 | 171410.352357 | 5148.0 | 590820.0 |
| D | 45 | 11217804.0 | 249284.533333 | 36204.0 | 541200.0 |
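The summary table is internally consistent: `AverageAmount` equals `TotalAmount / LoanCount` for each status. A short check with the figures shown above, which also computes each status's share of all loans in this output:

```python
# Figures copied from the output table above: StatusID -> (LoanCount, TotalAmount, AverageAmount).
loan_stats = {
    "A": (203, 18603216.0, 91641.458128),
    "B": (31, 4362348.0, 140720.903225),
    "C": (403, 69078372.0, 171410.352357),
    "D": (45, 11217804.0, 249284.533333),
}

total_loans = sum(count for count, _, _ in loan_stats.values())
for status, (count, total, average) in loan_stats.items():
    # The displayed averages are cut to 6 decimals, so compare with a tolerance.
    assert abs(total / count - average) < 1e-5
    print(f"{status}: {count / total_loans:.1%} of loans, mean {total / count:,.2f}")

print("Total loans:", total_loans)
```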


Grouping loans (Per Year) based on loan status.

In [7]:
query_string = """
    SELECT
        StatusID,
        YEAR(EntryDate) AS EntryYear,
        COUNT(*) AS LoanCount,
        SUM(Amount) AS TotalAmount,
        AVG(Amount) AS AverageAmount,
        MIN(Amount) AS MinAmount,
        MAX(Amount) AS MaxAmount
    FROM
        Loan
    GROUP BY
        StatusID,
        YEAR(EntryDate)
    ORDER BY
        EntryYear,
        StatusID;
"""

df = fetch_data(query_string)
print("*** Grouping loans (Per Year) based on loan status. ***")
df.to_csv('../load-to-powerbi/loan-stats-by-status-per-year-time-series.csv', index=False)
df.head() 

*** Grouping loans (Per Year) based on loan status. ***


| StatusID | EntryYear | LoanCount | TotalAmount | AverageAmount | MinAmount | MaxAmount |
|---|---|---|---|---|---|---|
| A | 1993 | 16 | 1807992.0 | 112999.5 | 21924.0 | 274740.0 |
| B | 1993 | 4 | 811284.0 | 202821.0 | 75624.0 | 464520.0 |
| A | 1994 | 73 | 7537632.0 | 103255.232876 | 4980.0 | 323472.0 |
| B | 1994 | 12 | 2163972.0 | 180331.0 | 49320.0 | 299088.0 |
| C | 1994 | 14 | 2943300.0 | 210235.714285 | 50460.0 | 398640.0 |
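For charting, the long `(StatusID, EntryYear)` rows are often easier to read as a year-by-status grid. A pure-Python sketch that pivots the five rows shown above into `LoanCount` per year and status, filling any combination absent from the input with 0 (the full result contains more rows, which would pivot the same way):

```python
# (StatusID, EntryYear, LoanCount) rows copied from the output above.
rows = [("A", 1993, 16), ("B", 1993, 4), ("A", 1994, 73), ("B", 1994, 12), ("C", 1994, 14)]

years = sorted({year for _, year, _ in rows})
statuses = sorted({status for status, _, _ in rows})
counts = {(status, year): n for status, year, n in rows}

# One row per year, one column per status; combinations missing from the input become 0.
grid = {year: [counts.get((status, year), 0) for status in statuses] for year in years}

print("Year", *statuses)
for year, values in grid.items():
    print(year, *values)
```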


### Loan Time Series Data

Grouping loans (Per Month and Year) based on loan status.

In [None]:
query_string = """
    SELECT
        StatusID,
        YEAR(EntryDate) AS EntryYear,
        MONTH(EntryDate) AS EntryMonth,
        COUNT(*) AS LoanCount,
        SUM(Amount) AS TotalAmount,
        AVG(Amount) AS AverageAmount,
        MIN(Amount) AS MinAmount,
        MAX(Amount) AS MaxAmount
    FROM 
        Loan
    GROUP BY
        StatusID,
        YEAR(EntryDate),
        MONTH(EntryDate)
    ORDER BY
        EntryYear,
        EntryMonth,
        StatusID;
"""

df = fetch_data(query_string)
print("*** Loan Time Series Data: Grouping loans (Per Month and Year) based on loan status. ***")
df.to_csv('../load-to-powerbi/loan-stats-by-status-per-month-time-series.csv', index=False)
df.head() 

### Loan: Effect of Loan Duration on Loan Status

In [None]:
query_string = """
    SELECT
        Duration,
        StatusID,
        COUNT(*) AS LoanCount,
        ROUND(MAX(Amount), 2) AS MaxAmount,
        ROUND(MIN(Amount), 2) AS MinAmount,
        ROUND(SUM(Amount), 2) AS TotalAmount,
        ROUND(AVG(Amount), 2) AS AverageAmount
    FROM
        Loan
    GROUP BY
        Duration,
        StatusID
    ORDER BY
        Duration ASC,
        StatusID ASC;
"""

df = fetch_data(query_string)
print("*** Effect of Loan Duration on Loan Status ***")
df.to_csv('../load-to-powerbi/effect-of-duration-on-loan-status.csv', index=False)
df.head() 
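Durations with more loans show larger counts for every status, so comparing raw `LoanCount` values across durations can mislead. Converting each count into a share within its duration makes the status mix comparable. A sketch with made-up counts (not taken from the dataset):

```python
from collections import defaultdict

# Hypothetical (Duration, StatusID, LoanCount) rows shaped like the query output.
rows = [(12, "A", 40), (12, "C", 50), (12, "D", 10), (60, "A", 10), (60, "C", 60), (60, "D", 30)]

loans_per_duration = defaultdict(int)
for duration, _, count in rows:
    loans_per_duration[duration] += count

# Share of each status within its duration (the shares for one duration sum to 1).
shares = {
    (duration, status): count / loans_per_duration[duration]
    for duration, status, count in rows
}

for (duration, status), share in shares.items():
    print(f"Duration {duration}, status {status}: {share:.0%}")
```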

### Time Series: Effect of Loan Duration on Loan Status (Per Month and Year)

In [None]:
query_string = """
    SELECT
        YEAR(EntryDate) AS LoanYear,
        MONTH(EntryDate) AS LoanMonth,
        Duration,
        StatusID,
        COUNT(*) AS LoanCount,
        ROUND(MAX(Amount), 2) AS MaxAmount,
        ROUND(MIN(Amount), 2) AS MinAmount,
        ROUND(SUM(Amount), 2) AS TotalAmount,
        ROUND(AVG(Amount), 2) AS AverageAmount
    FROM
        Loan
    GROUP BY
        YEAR(EntryDate),
        MONTH(EntryDate),
        Duration,
        StatusID
    ORDER BY
        LoanYear ASC,
        LoanMonth ASC,
        Duration ASC,
        StatusID ASC;
"""

df = fetch_data(query_string)
print("*** Time Series: Effect of Loan Duration on Loan Status (Per Month and Year) ***")
df.to_csv('../load-to-powerbi/effect-of-duration-on-loan-status-monthly-time-series.csv', index=False)
df.head() 

### Time Series: Effect of Duration and District on Loans (Per Month and Year)

In [None]:
query_string = """
    SELECT
        YEAR(L.EntryDate) AS LoanYear,
        MONTH(L.EntryDate) AS LoanMonth,
        D.DistrictID,
        Duration,
        StatusID,
        COUNT(*) AS LoanCount,
        ROUND(MAX(L.Amount), 2) AS MaxAmount,
        ROUND(MIN(L.Amount), 2) AS MinAmount,
        ROUND(SUM(L.Amount), 2) AS TotalAmount,
        ROUND(AVG(L.Amount), 2) AS AverageAmount
    FROM
        Loan L
    JOIN
        Account A ON L.AccountID = A.AccountID
    JOIN
        District D ON A.DistrictID = D.DistrictID
    GROUP BY
        YEAR(L.EntryDate),
        MONTH(L.EntryDate),
        D.DistrictID,
        Duration,
        StatusID
    ORDER BY
        LoanYear ASC,
        LoanMonth ASC,
        D.DistrictID ASC,
        Duration ASC,
        StatusID ASC;
"""

df = fetch_data(query_string)
print("*** Time Series: Effect of Duration and District on Loans (Per Month and Year) ***")
df.to_csv('../load-to-powerbi/effect-of-duration-and-district-on-loan-status-monthly-time-series.csv', index=False)
df.head() 

### Effect of Seasons on Account Opening.

* Are there seasonal trends in client account opening habits?

In [None]:
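from IPython.display import display

# pd.read_sql_query returns only the first result set of a multi-statement batch,
# so the overall and per-district queries run as separate statements.
query_string = """
    SELECT
        YEAR(EntryDate) AS Year,
        MONTH(EntryDate) AS Month,
        COUNT(DISTINCT AccountID) AS NumAccountsOpened
    FROM
        Account
    GROUP BY
        YEAR(EntryDate),
        MONTH(EntryDate)
    ORDER BY
        YEAR(EntryDate),
        MONTH(EntryDate);
"""

df = fetch_data(query_string)
print("*** Effect of Seasons on Account Opening. ***")
df.to_csv('../load-to-powerbi/effect-of-seasons-on-account-opening-monthly-time-series.csv', index=False)
display(df.head())

# Seasonal trends in client account opening habits per district.
query_string = """
    SELECT
        YEAR(EntryDate) AS Year,
        MONTH(EntryDate) AS Month,
        DistrictID,
        COUNT(DISTINCT AccountID) AS NumAccountsOpened
    FROM
        Account
    GROUP BY
        YEAR(EntryDate),
        MONTH(EntryDate),
        DistrictID
    ORDER BY
        YEAR(EntryDate),
        MONTH(EntryDate),
        DistrictID;
"""

df = fetch_data(query_string)
print("*** Effect of Seasons on Account Opening per District ***")
df.to_csv('../load-to-powerbi/effect-of-seasons-on-account-opening-per-district-monthly-time-series.csv', index=False)
df.head()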
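The account-opening query returns one row per year and month. To look for a seasonal pattern, one option is to average each calendar month across years, so that a month that is busy every year stands out. A sketch with made-up `(Year, Month, NumAccountsOpened)` rows and a hypothetical function name:

```python
from collections import defaultdict


def monthly_profile(rows):
    """Average NumAccountsOpened per calendar month across years."""
    by_month = defaultdict(list)
    for _year, month, opened in rows:
        by_month[month].append(opened)
    # A month with no accounts in some year has no row at all, so it is
    # averaged over fewer years unless those gaps are filled with zeros first.
    return {month: sum(v) / len(v) for month, v in sorted(by_month.items())}


rows = [(1996, 1, 30), (1996, 2, 20), (1997, 1, 50), (1997, 2, 24), (1997, 3, 12)]
for month, avg_opened in monthly_profile(rows).items():
    print(month, avg_opened)
```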

### Effect of Seasons on Loan Payments.

* Are there seasonal trends in client loan payment habits?

In [None]:
from IPython.display import display

# pd.read_sql_query returns only the first result set of a multi-statement batch,
# so the overall and per-district queries run as separate statements.
# Rows are grouped by the loan's EntryDate, and COUNT(*) counts Loan rows in each group.
query_string = """
    SELECT
        YEAR(EntryDate) AS Year,
        MONTH(EntryDate) AS Month,
        COUNT(DISTINCT LoanID) AS NumLoans,
        COUNT(*) AS TotalPayments,
        SUM(Payments) AS TotalPaymentAmount,
        AVG(Payments * 1.0) AS AvgPaymentAmount,
        MAX(Payments) AS MaxPaymentAmount,
        MIN(Payments) AS MinPaymentAmount
    FROM
        Loan
    GROUP BY
        YEAR(EntryDate),
        MONTH(EntryDate)
    ORDER BY
        YEAR(EntryDate),
        MONTH(EntryDate);
"""

df = fetch_data(query_string)
print("*** Effect of Seasons on Loan Payments ***")
df.to_csv('../load-to-powerbi/effect-of-seasons-on-loan-payment-monthly-time-series.csv', index=False)
display(df.head())

# Seasonal trends in client loan payment habits per district.
query_string = """
    SELECT
        YEAR(L.EntryDate) AS Year,
        MONTH(L.EntryDate) AS Month,
        A.DistrictID,
        COUNT(DISTINCT L.LoanID) AS NumLoans,
        COUNT(*) AS TotalPayments,
        SUM(L.Payments) AS TotalPaymentAmount,
        AVG(L.Payments * 1.0) AS AvgPaymentAmount,
        MAX(L.Payments) AS MaxPaymentAmount,
        MIN(L.Payments) AS MinPaymentAmount
    FROM
        Loan L
    JOIN
        Account A ON L.AccountID = A.AccountID
    GROUP BY
        YEAR(L.EntryDate),
        MONTH(L.EntryDate),
        A.DistrictID
    ORDER BY
        YEAR(L.EntryDate),
        MONTH(L.EntryDate),
        A.DistrictID;
"""

df = fetch_data(query_string)
print("*** Effect of Seasons on Loan Payments per District ***")
df.to_csv('../load-to-powerbi/effect-of-seasons-on-loan-payment-per-district-monthly-time-series.csv', index=False)
df.head()
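The five high-value loans shown earlier suggest that `Payments` holds the regular installment, `Amount / Duration` (loan 6534: 590,820 / 60 = 9,847). If that holds across the table, the payment statistics in this query describe installments of loans *taken out* in each month rather than repayments *received* in that month. A quick check against those five rows:

```python
# (LoanID, Amount, Duration, Payments) copied from the high-value loan output above.
loans = [
    (6534, 590820.0, 60, 9847.0),
    (6791, 566640.0, 60, 9444.0),
    (5447, 541200.0, 60, 9020.0),
    (5132, 538500.0, 60, 8975.0),
    (5569, 504000.0, 60, 8400.0),
]

for loan_id, amount, duration, payments in loans:
    installment = amount / duration
    print(loan_id, installment, installment == payments)
```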

### Effect of Seasons on Clients' Loan Status.

* Are there seasonal trends in client loan status habits?

In [None]:
query_string = """
    SELECT
        Y.Year,
        M.Month,
        LS.StatusID,
        COALESCE(COUNT(DISTINCT L.LoanID), 0) AS NumLoans
    FROM
        (SELECT DISTINCT YEAR(EntryDate) AS Year FROM Loan) Y
    CROSS JOIN
        (SELECT DISTINCT MONTH(EntryDate) AS Month FROM Loan) M
    CROSS JOIN
        LoanStatus LS
    LEFT JOIN
        Loan L ON Y.Year = YEAR(L.EntryDate) AND 
                  M.Month = MONTH(L.EntryDate) AND 
                  LS.StatusID = L.StatusID
    GROUP BY
        Y.Year,
        M.Month,
        LS.StatusID
    ORDER BY
        Y.Year,
        M.Month,
        LS.StatusID;
"""

df = fetch_data(query_string)
print("*** Effect of Seasons on Clients' Loan Status ***")
df.to_csv('../load-to-powerbi/effect-of-seasons-on-loan-status-monthly-time-series.csv', index=False)
df.head() 
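The `CROSS JOIN` of distinct years, distinct months, and `LoanStatus` builds every (year, month, status) combination first, and the `LEFT JOIN` then attaches loans, so combinations without loans still appear with `NumLoans = 0`. A plain-Python equivalent with `itertools.product`, using made-up loans and a hard-coded status list standing in for the `LoanStatus` table. Like the SQL, it only uses months that occur somewhere in the data, so a month with no loans in any year would still be missing.

```python
from collections import Counter
from itertools import product

# Hypothetical loans as (year, month, status) tuples.
loans = [(1996, 1, "A"), (1996, 1, "C"), (1996, 3, "C"), (1997, 1, "D")]
statuses = ["A", "B", "C", "D"]  # stands in for the LoanStatus table

# Like the subqueries: distinct years and months that occur in the data.
years = sorted({year for year, _, _ in loans})
months = sorted({month for _, month, _ in loans})

counts = Counter(loans)  # loans per (year, month, status)
# Every combination from the cross product, with 0 where no loan matched (the LEFT JOIN).
grid = [(y, m, s, counts[(y, m, s)]) for y, m, s in product(years, months, statuses)]

for row in grid:
    print(row)
```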

### Time Series: Seasonal Trends in Client Loan Status Habits Per District.

In [None]:
query_string = """
    SELECT
        Y.Year,
        M.Month,
        A.DistrictID,
        LS.StatusID,
        COALESCE(COUNT(DISTINCT L.LoanID), 0) AS NumLoans
    FROM
        (SELECT DISTINCT YEAR(EntryDate) AS Year FROM Loan) Y
    CROSS JOIN
        (SELECT DISTINCT MONTH(EntryDate) AS Month FROM Loan) M
    CROSS JOIN
        LoanStatus LS
    CROSS JOIN
        Account A
    LEFT JOIN
        Loan L ON Y.Year = YEAR(L.EntryDate) AND 
                  M.Month = MONTH(L.EntryDate) AND 
                  LS.StatusID = L.StatusID
        AND A.AccountID = L.AccountID
    GROUP BY
        Y.Year,
        M.Month,
        A.DistrictID,
        LS.StatusID
    ORDER BY
        Y.Year,
        M.Month,
        A.DistrictID,
        LS.StatusID;
"""

df = fetch_data(query_string)
print("*** Time Series: Seasonal Trends in Client Loan Status Habits Per District. ***")
df.to_csv('../load-to-powerbi/effect-of-seasons-on-loan-status-per-district-monthly-time-series.csv', index=False)
df.head() 

### Time Series: Seasonal Trends in Clients' Bank Transaction Habits.

In [None]:
query_string = """
    SELECT
        YEAR(EntryDate) AS Year,
        MONTH(EntryDate) AS Month,
        MIN(Amount) AS MinAmount,
        MAX(Amount) AS MaxAmount,
        SUM(Amount) AS TotalAmount,
        AVG(Amount * 1.0) AS AvgAmount,
        COUNT(*) AS NumTransactions
    FROM
        BankTransaction
    GROUP BY
        YEAR(EntryDate),
        MONTH(EntryDate)
    ORDER BY
        YEAR(EntryDate),
        MONTH(EntryDate);
"""

df = fetch_data(query_string)
print("*** Time Series: Seasonal Trends in Clients' Bank Transaction Habits. ***")
df.to_csv('../load-to-powerbi/effect-of-seasons-on-bank-transactions-monthly-time-series.csv', index=False)
df.head() 
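The `AVG(Amount * 1.0)` pattern matters for integer columns: in SQL Server, `AVG` over an `int` column returns an `int` and drops the fractional part, while multiplying by `1.0` first promotes the values to a decimal type. For an `Amount` column that already holds decimals it changes nothing. A Python sketch of the two behaviors with made-up non-negative integer amounts:

```python
amounts = [5, 10, 12, 4]  # made-up non-negative integer amounts

# AVG(Amount) on an int column: SQL Server returns an int, dropping the fraction.
# For non-negative values, Python's floor division gives the same result.
int_average = sum(amounts) // len(amounts)

# AVG(Amount * 1.0): the multiplication promotes to a decimal type first.
decimal_average = sum(amounts) / len(amounts)

print(int_average, decimal_average)
```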

### Time Series: Seasonal Trends of Clients' Bank Transaction Habits Per District.

In [None]:
query_string = """
    SELECT
        YEAR(BT.EntryDate) AS Year,
        MONTH(BT.EntryDate) AS Month,
        A.DistrictID,
        MIN(BT.Amount) AS MinAmount,
        MAX(BT.Amount) AS MaxAmount,
        SUM(BT.Amount) AS TotalAmount,
        AVG(BT.Amount * 1.0) AS AvgAmount,
        COUNT(*) AS NumTransactions
    FROM
        BankTransaction BT
    JOIN
        Account A ON BT.AccountID = A.AccountID
    GROUP BY
        YEAR(BT.EntryDate),
        MONTH(BT.EntryDate),
        A.DistrictID
    ORDER BY
        YEAR(BT.EntryDate),
        MONTH(BT.EntryDate),
        A.DistrictID;
"""

df = fetch_data(query_string)
print("*** Time Series: Seasonal Trends of Clients' Bank Transaction Habits Per District. ***")
df.to_csv('../load-to-powerbi/effect-of-seasons-on-bank-transactions-per-district-monthly-time-series.csv', index=False)
df.head() 

### What Banks Do Our Clients Transact With?

In [5]:
query_string = """
    SELECT
        Bank,
        COUNT(*) AS NumTransactions,
        SUM(Amount) AS TotalAmount,
        AVG(Amount * 1.0) AS AvgAmount,
        MIN(Amount) AS MinAmount,
        MAX(Amount) AS MaxAmount
    FROM
        BankTransaction
    WHERE
        Bank IS NOT NULL AND Bank <> ''
    GROUP BY
        Bank
    ORDER BY
        Bank;
"""

df = fetch_data(query_string)
print("*** What Banks Do Our Clients Transact With? ***")
#df.to_csv('../load-to-powerbi/other-banks-clients-use-for-service.csv', index=False)
df.head() 

*** What Banks Do Our Clients Transact With? ***


| Bank | NumTransactions | TotalAmount | AvgAmount | MinAmount | MaxAmount |
|---|---|---|---|---|---|
| AB | 21720 | 108354898.8 | 4988.715414 | 5.0 | 72966.0 |
| CD | 19597 | 104137717.9 | 5313.962234 | 15.0 | 74176.0 |
| EF | 21293 | 108391703.6 | 5090.485305 | 3.0 | 73970.0 |
| GH | 21499 | 125956293.3 | 5858.704744 | 10.0 | 74648.0 |
| IJ | 20525 | 111914481.7 | 5452.593505 | 2.0 | 74522.0 |


### What Types of Transactions Do Our Clients Make with Other Banks?

In [6]:
query_string = """
    SELECT
        Bank,
        Type,
        COUNT(*) AS NumTransactions,
        SUM(Amount) AS TotalAmount,
        AVG(Amount * 1.0) AS AvgAmount,
        MIN(Amount) AS MinAmount,
        MAX(Amount) AS MaxAmount
    FROM
        BankTransaction
    WHERE
        Bank IS NOT NULL AND Bank <> '' AND Type IS NOT NULL AND Type <> ''
    GROUP BY
        Bank, Type
    ORDER BY
        Bank, Type;
"""

df = fetch_data(query_string)
print("*** What type of transactions do our clients do with other banks? ***")
#df.to_csv('../load-to-powerbi/transactions-clients-do-with-other-banks.csv', index=False)
df.head() 

*** What type of transactions do our clients do with other banks? ***


| Bank | Type | NumTransactions | TotalAmount | AvgAmount | MinAmount | MaxAmount |
|---|---|---|---|---|---|---|
| AB | Deposit | 4807 | 54138718.0 | 11262.47514 | 2904.0 | 72966.0 |
| AB | Withdraw | 16913 | 54216180.8 | 3205.592195 | 5.0 | 14707.0 |
| CD | Deposit | 4984 | 57608541.0 | 11558.696027 | 2904.0 | 74176.0 |
| CD | Withdraw | 14613 | 46529176.9 | 3184.094771 | 15.0 | 13461.0 |
| EF | Deposit | 4880 | 50959607.0 | 10442.542418 | 2942.0 | 73970.0 |
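The two bank outputs agree with each other: for banks AB and CD, the deposit and withdrawal rows add up to that bank's totals in the "What Banks Do Our Clients Transact With?" output. A short check using the figures shown for those two banks, which also computes the share of each bank's transactions that are deposits:

```python
import math

# Per-bank totals from the earlier bank output: Bank -> (NumTransactions, TotalAmount).
bank_totals = {"AB": (21720, 108354898.8), "CD": (19597, 104137717.9)}

# Per-type rows from the output above: (Bank, Type) -> (NumTransactions, TotalAmount).
by_type = {
    ("AB", "Deposit"): (4807, 54138718.0),
    ("AB", "Withdraw"): (16913, 54216180.8),
    ("CD", "Deposit"): (4984, 57608541.0),
    ("CD", "Withdraw"): (14613, 46529176.9),
}

for bank, (count, amount) in bank_totals.items():
    deposits = by_type[(bank, "Deposit")]
    withdrawals = by_type[(bank, "Withdraw")]
    assert deposits[0] + withdrawals[0] == count
    assert math.isclose(deposits[1] + withdrawals[1], amount)
    print(f"{bank}: {deposits[0] / count:.1%} of transactions are deposits")
```

For both banks, deposits are roughly a quarter of the transactions but about half of the amount, consistent with the much larger average deposit.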


### Close the Database Connection

In [None]:
try:
    cursor.close()
    conn.close()
except pyodbc.Error as ex:
    print("Connection error:", ex)
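A `with conn:` block is not a substitute for `close()`: both `sqlite3` and `pyodbc` connections use the context manager for transaction handling and leave the connection open afterwards. `contextlib.closing` guarantees that `close()` runs even if a query fails. A sketch using the built-in `sqlite3` module as a stand-in for the SQL Server connection (`demo_conn` and `demo_cursor` are illustrative names):

```python
import sqlite3
from contextlib import closing

# closing() calls demo_conn.close() when the block exits, even after an exception.
with closing(sqlite3.connect(":memory:")) as demo_conn:
    # The same pattern works for cursors.
    with closing(demo_conn.cursor()) as demo_cursor:
        demo_cursor.execute("SELECT 1")
        print(demo_cursor.fetchone())

# Any further use of the closed connection raises sqlite3.ProgrammingError.
try:
    demo_conn.execute("SELECT 1")
except sqlite3.ProgrammingError as ex:
    print("Closed:", ex)
```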

### End of Exploratory Data Analysis in SQL

* The next step is to load the generated tabular data into Power BI, Tableau, Excel, or Google Sheets to create visualizations and derive insights.