# SQL and Data Viz

1. Identify the best month in terms of loan issuance. What was the quantity and amount lent in each month?
2. Which batch had the best overall adherence?
3. Do different interest rates lead to different loan outcomes in terms of default rate?
4. Rank the 10 best and 10 worst clients. Explain your methodology for constructing this ranking.
5. What is the default rate by month and batch?
6. Assess the profitability of this operation. Provide an analysis of the operation's timeline.

> adherence: clients that got loans\
> season: loan issuing month\
> default rate: defaulted/issued loans

## Importing Libraries and Establishing Database Connection

In [1]:
import pandas as pd
from sqlalchemy import create_engine
from dotenv import load_dotenv
import os

In [2]:
# Load environment variables from .env file
load_dotenv()

True

In [3]:
# Build the engine once and reuse it, so every query shares the same connection pool
engine = create_engine(
    f"postgresql+psycopg2://{os.getenv('DB_USER')}:{os.getenv('DB_PASSWORD')}"
    f"@{os.getenv('DB_HOST')}:{os.getenv('DB_PORT')}/{os.getenv('DB_NAME')}"
)

def execute_query(query):
    """Execute a SQL query and return the result as a pandas DataFrame."""
    with engine.connect() as connection:
        return pd.read_sql_query(query, connection)

## Exploratory Data Analysis

In [26]:
query_five_first_on_loans = """
SELECT *
FROM loans
LIMIT 5;
"""

execute_query(query_five_first_on_loans)

| | user_id | loan_id | created_at | due_at | paid_at | status | loan_amount | tax | due_amount | amount_paid |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 46937 | 1 | 2020-01-06 08:58:24 | 2020-04-05 08:58:24 | 2020-02-21 08:58:24 | paid | 16638.0 | 186.01 | 18071.86 | 18071.86 |
| 1 | 29211 | 2 | 2020-01-07 05:12:59 | 2020-04-06 05:12:59 | 2020-03-09 05:12:59 | paid | 1886.0 | 21.09 | 2331.44 | 2331.44 |
| 2 | 62030 | 3 | 2020-01-12 02:06:18 | 2020-04-11 02:06:18 | NaT | default | 39802.0 | 444.99 | 42237.09 | 4147.27 |
| 3 | 14500 | 4 | 2020-01-14 18:09:12 | 2020-04-13 18:09:12 | 2020-01-28 18:09:12 | paid | 5114.0 | 57.17 | 5554.72 | 5554.72 |
| 4 | 73480 | 5 | 2020-01-15 17:28:24 | 2020-04-14 17:28:24 | 2020-03-14 17:28:24 | paid | 22153.0 | 247.67 | 27385.1 | 27385.1 |


In [27]:
query_five_first_on_clients = """
SELECT *
FROM clients
LIMIT 5;
"""

execute_query(query_five_first_on_clients)

| | user_id | created_at | status | batch | credit_limit | interest_rate | denied_reason | denied_at |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2023-09-18 16:05:36 | approved | 1 | 47500 | 30 | | NaT |
| 1 | 2 | 2020-07-05 07:00:37 | denied | 1 | 59750 | 20 | money_loundry | 2023-07-29 02:48:33 |
| 2 | 3 | 2023-07-25 03:39:55 | approved | 1 | 73000 | 30 | | NaT |
| 3 | 4 | 2022-07-01 01:28:58 | approved | 1 | 14250 | 20 | | NaT |
| 4 | 5 | 2023-06-23 20:17:40 | approved | 1 | 23750 | 20 | | NaT |


## Analysis - Identifying the Best Month for Loan Issuance

In [4]:
query_best_month = '''
SELECT 
    DATE_TRUNC('month', created_at) AS month,
    COUNT(loan_id) AS total_quantity,
    SUM(loan_amount) AS total_amount
FROM 
    loans
GROUP BY 
    DATE_TRUNC('month', created_at)
ORDER BY 
    total_amount DESC
LIMIT 1;
'''

In [5]:
execute_query(query_best_month)

| | month | total_quantity | total_amount |
|---|---|---|---|
| 0 | 2023-12-01 | 17351 | 442464966.0 |



December 2023 was the best month for loan issuance when ranked by amount lent: 17,351 loans were issued for a total of $442,464,966.00. The query orders months by `total_amount` and keeps only the top row, so a ranking by loan count would need `ORDER BY total_quantity DESC` instead.

In [12]:
# Quantity and amount lent in each month
query_monthly = '''
SELECT 
    DATE_TRUNC('month', created_at) AS month,
    COUNT(loan_id) AS total_quantity,
    SUM(loan_amount) AS total_amount
FROM
    loans
GROUP BY    
    DATE_TRUNC('month', created_at)
ORDER BY
    month;
'''

In [13]:
execute_query(query_monthly)

| | month | total_quantity | total_amount |
|---|---|---|---|
| 0 | 2020-01-01 | 16 | 348731.0 |
| 1 | 2020-02-01 | 59 | 1723978.0 |
| 2 | 2020-03-01 | 107 | 2460062.0 |
| 3 | 2020-04-01 | 145 | 3465180.0 |
| 4 | 2020-05-01 | 161 | 4323270.0 |
| 5 | 2020-06-01 | 224 | 5918356.0 |
| 6 | 2020-07-01 | 274 | 7086345.0 |
| 7 | 2020-08-01 | 314 | 7998350.0 |
| 8 | 2020-09-01 | 343 | 8852936.0 |
| 9 | 2020-10-01 | 464 | 11477276.0 |


## Analysis - Identifying the Batch with the Best Overall Adherence

In [43]:
# Batch with the best overall Adherence
query_best_adherence = '''
SELECT 
    c.batch AS batch_id,
    COUNT(l.user_id) AS total_loans,
    SUM(CASE WHEN l.status = 'paid' THEN 1 ELSE 0 END) AS paid_loans,
    SUM(CASE WHEN l.status = 'paid' THEN 1.0 ELSE 0 END) / COUNT(l.user_id) AS adherence
FROM 
    loans l
JOIN 
    clients c ON l.user_id = c.user_id
GROUP BY 
    c.batch
ORDER BY 
    adherence DESC;
'''

execute_query(query_best_adherence)

| | batch_id | total_loans | paid_loans | adherence |
|---|---|---|---|---|
| 0 | 2 | 37415 | 22558 | 0.602913 |
| 1 | 3 | 8958 | 5350 | 0.597232 |
| 2 | 1 | 98364 | 58248 | 0.592168 |
| 3 | 4 | 5971 | 3439 | 0.57595 |


The result indicates that batch 2 had the highest share of repaid loans: about 60.29% of the loans taken by its clients have status `paid`, ahead of batch 3 (59.72%), batch 1 (59.22%), and batch 4 (57.60%). Note that the query counts loans, not clients, so it measures repayment behavior per batch rather than adherence as defined at the top of the notebook (clients that got loans).
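Since the notebook defines adherence as clients that got loans, a complementary measure is the share of each batch's clients with at least one loan. The sketch below illustrates that calculation on a small in-memory SQLite database so it runs without the project database. The toy rows and the `ADHERENCE_BY_CLIENT` name are assumptions made for illustration, and whether denied clients belong in the denominator is left open.

```python
import sqlite3

# Hypothetical query: share of clients in each batch with at least one loan.
# LEFT JOIN keeps clients who never took a loan, so they count in the denominator.
ADHERENCE_BY_CLIENT = """
SELECT
    c.batch AS batch_id,
    COUNT(DISTINCT c.user_id) AS total_clients,
    COUNT(DISTINCT l.user_id) AS clients_with_loans,
    COUNT(DISTINCT l.user_id) * 1.0 / COUNT(DISTINCT c.user_id) AS adherence
FROM
    clients c
LEFT JOIN
    loans l ON l.user_id = c.user_id
GROUP BY
    c.batch
ORDER BY
    adherence DESC;
"""

# Toy data for illustration only (not taken from the real tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (user_id INTEGER, batch INTEGER);
CREATE TABLE loans (loan_id INTEGER, user_id INTEGER);
INSERT INTO clients VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 2), (6, 2);
INSERT INTO loans VALUES (10, 1), (11, 1), (12, 2), (13, 3);
""")

adherence_rows = conn.execute(ADHERENCE_BY_CLIENT).fetchall()
for row in adherence_rows:
    print(row)
```

The same query text should also be accepted by PostgreSQL and could be passed to `execute_query`, though that has not been checked against the real tables here.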

## Analysis - Examining the Relationship Between Interest Rates and Loan Outcomes

In [46]:
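# Note: this query groups by batch, so it reports the default rate per batch, not per interest rate
query_interest_rates_loan_outcomes = '''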
SELECT 
    c.batch,
    COUNT(*) AS total_loans,
    SUM(CASE WHEN l.status = 'default' THEN 1 ELSE 0 END) AS defaulted_loans,
    SUM(CASE WHEN l.status = 'default' THEN 1.0 ELSE 0 END) / COUNT(*) AS default_rate
FROM 
    loans l
JOIN 
    clients c ON l.user_id = c.user_id
GROUP BY 
    c.batch;
'''

execute_query(query_interest_rates_loan_outcomes)

| | batch | total_loans | defaulted_loans | default_rate |
|---|---|---|---|---|
| 0 | 1 | 98364 | 7906 | 0.080375 |
| 1 | 2 | 37415 | 3226 | 0.086222 |
| 2 | 3 | 8958 | 718 | 0.080152 |
| 3 | 4 | 5971 | 491 | 0.082231 |
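The query above groups by `c.batch`, so its output compares batches rather than interest rates. To address the interest-rate question directly, one option is to group by the `interest_rate` column of `clients` instead. The following is a minimal sketch of that variant, run against an in-memory SQLite database with made-up rows; the `DEFAULT_RATE_BY_INTEREST` name and the toy data are assumptions, and no result from the real database is implied.

```python
import sqlite3

# Hypothetical variant of the notebook query, grouped by interest rate.
DEFAULT_RATE_BY_INTEREST = """
SELECT
    c.interest_rate,
    COUNT(*) AS total_loans,
    SUM(CASE WHEN l.status = 'default' THEN 1 ELSE 0 END) AS defaulted_loans,
    SUM(CASE WHEN l.status = 'default' THEN 1.0 ELSE 0 END) / COUNT(*) AS default_rate
FROM
    loans l
JOIN
    clients c ON l.user_id = c.user_id
GROUP BY
    c.interest_rate
ORDER BY
    c.interest_rate;
"""

# Toy data for illustration only (not taken from the real tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (user_id INTEGER, interest_rate INTEGER);
CREATE TABLE loans (loan_id INTEGER, user_id INTEGER, status TEXT);
INSERT INTO clients VALUES (1, 20), (2, 20), (3, 30);
INSERT INTO loans VALUES
    (10, 1, 'paid'), (11, 1, 'default'), (12, 2, 'paid'), (13, 2, 'paid'),
    (14, 3, 'default'), (15, 3, 'paid');
""")

interest_rows = conn.execute(DEFAULT_RATE_BY_INTEREST).fetchall()
for row in interest_rows:
    print(row)
```

Adding `c.batch` to both the `SELECT` list and the `GROUP BY` clause would show whether any difference between rates holds within each batch.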


## Analysis - Ranking the Best and Worst Clients
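This section has no query yet. One possible methodology, offered only as an assumption, is to rank clients by their net result: the total they paid back (`amount_paid`) minus the total they were lent (`loan_amount`). The sketch below applies that idea to made-up rows in an in-memory SQLite database; `CLIENT_NET_RESULT` and `rank_clients` are hypothetical names, and other criteria (repayment speed, number of defaults) could reasonably be preferred.

```python
import sqlite3

# Hypothetical ranking query: net result per client = amount paid back - amount lent.
# Ties are broken by user_id so the ordering is deterministic.
CLIENT_NET_RESULT = """
SELECT
    user_id,
    COUNT(*) AS total_loans,
    SUM(COALESCE(amount_paid, 0)) - SUM(loan_amount) AS net_result
FROM
    loans
GROUP BY
    user_id
ORDER BY
    net_result {direction}, user_id
LIMIT ?;
"""

def rank_clients(conn, n=10, best=True):
    """Return the n best (or worst) clients by net result."""
    direction = "DESC" if best else "ASC"
    query = CLIENT_NET_RESULT.format(direction=direction)
    return conn.execute(query, (n,)).fetchall()

# Toy data for illustration only (not taken from the real tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE loans (loan_id INTEGER, user_id INTEGER, loan_amount REAL, amount_paid REAL);
INSERT INTO loans VALUES
    (1, 1, 1000.0, 1200.0),
    (2, 2, 2000.0, 500.0),
    (3, 3, 500.0, 550.0),
    (4, 3, 500.0, 600.0),
    (5, 4, 300.0, 0.0);
""")

best_clients = rank_clients(conn, n=2, best=True)
worst_clients = rank_clients(conn, n=2, best=False)
print(best_clients)
print(worst_clients)
```

The `?` placeholder is SQLite's parameter style; with the notebook's `execute_query` helper the limit would be written into the PostgreSQL query text instead.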

## Analysis - Determining Default Rate by Month and Batch
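This section has no query yet. Following the definition at the top of the notebook (default rate: defaulted/issued loans), a minimal sketch could group loans by issuance month and client batch. The example below uses an in-memory SQLite database with made-up rows, so it uses SQLite's `strftime` where the notebook's PostgreSQL queries use `DATE_TRUNC('month', created_at)`. The `DEFAULT_RATE_BY_MONTH_BATCH` name and the toy data are assumptions.

```python
import sqlite3

# Hypothetical query: default rate per issuance month and client batch.
DEFAULT_RATE_BY_MONTH_BATCH = """
SELECT
    strftime('%Y-%m', l.created_at) AS month,
    c.batch,
    COUNT(*) AS total_loans,
    SUM(CASE WHEN l.status = 'default' THEN 1.0 ELSE 0.0 END) / COUNT(*) AS default_rate
FROM
    loans l
JOIN
    clients c ON l.user_id = c.user_id
GROUP BY
    month, c.batch
ORDER BY
    month, c.batch;
"""

# Toy data for illustration only (not taken from the real tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (user_id INTEGER, batch INTEGER);
CREATE TABLE loans (loan_id INTEGER, user_id INTEGER, created_at TEXT, status TEXT);
INSERT INTO clients VALUES (1, 1), (2, 2);
INSERT INTO loans VALUES
    (10, 1, '2020-01-06 08:58:24', 'paid'),
    (11, 1, '2020-01-20 10:00:00', 'default'),
    (12, 2, '2020-01-15 09:00:00', 'paid'),
    (13, 2, '2020-02-03 12:00:00', 'default'),
    (14, 1, '2020-02-10 12:00:00', 'paid');
""")

month_batch_rows = conn.execute(DEFAULT_RATE_BY_MONTH_BATCH).fetchall()
for row in month_batch_rows:
    print(row)
```

In the batch tables earlier in the notebook, paid and defaulted loans do not add up to the batch totals, so some loans carry another status. Recent months are likely to contain more of those unresolved loans, which would pull their default rate down under this definition.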

## Analysis - Assessing the Profitability of the Operation
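This section has no query yet. As a starting point, one simple view of profitability compares, for each issuance month, the amount lent (`loan_amount`) with the amount received back (`amount_paid`), and accumulates the difference over time. The sketch below does this on made-up rows in an in-memory SQLite database. The `MONTHLY_NET` name, the toy data, and the choice to attribute repayments to the month the loan was issued (rather than to `paid_at`) are all assumptions; costs such as funding or the `tax` column are not considered.

```python
import sqlite3
from itertools import accumulate

# Hypothetical query: amount lent vs. amount received, per issuance month.
MONTHLY_NET = """
SELECT
    strftime('%Y-%m', created_at) AS month,
    SUM(loan_amount) AS total_lent,
    SUM(COALESCE(amount_paid, 0)) AS total_received,
    SUM(COALESCE(amount_paid, 0)) - SUM(loan_amount) AS net_result
FROM
    loans
GROUP BY
    month
ORDER BY
    month;
"""

# Toy data for illustration only (not taken from the real tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE loans (loan_id INTEGER, created_at TEXT, loan_amount REAL, amount_paid REAL);
INSERT INTO loans VALUES
    (1, '2020-01-06 08:58:24', 1000.0, 1100.0),
    (2, '2020-01-12 02:06:18', 2000.0, 200.0),
    (3, '2020-02-01 00:00:00', 500.0, 600.0);
""")

monthly_rows = conn.execute(MONTHLY_NET).fetchall()

# Running total of the monthly net result gives the timeline of the operation.
cumulative_net = list(accumulate(row[3] for row in monthly_rows))
for (month, lent, received, net), running in zip(monthly_rows, cumulative_net):
    print(month, lent, received, net, running)
```

On PostgreSQL the month expression would be `DATE_TRUNC('month', created_at)`, as in the earlier queries, and the running total could also be computed in pandas with `cumsum()` on the returned DataFrame.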