- Developed by [Algoritma](https://algorit.ma)'s product division and instructors team

# Introduction

This notebook lets us prototype the visualizations for our analysis before we integrate them into the Streamlit dashboard. Working here first, we can build preliminary charts and explore the data interactively.

For our visualizations, we will use `plotly.express`, a powerful library we studied earlier. 

In this notebook, we will:
1. **Prepare and Clean Data**: Load and preprocess the data to ensure it's ready for visualization.
2. **Generate Visualizations**: Use `plotly.express` to create various plots and charts, exploring the data and uncovering insights.
3. **Evaluate Visualizations**: Assess the effectiveness of the visualizations in conveying the desired information and make adjustments as needed.

Once the visualizations are finalized and validated in this notebook, we can move them into our Streamlit dashboard.



# Libraries

In [1]:
import pandas as pd
import plotly.express as px

# Import Data

We will use the `loan_clean` dataset, which we have used previously. First, we load it from the pickle file:

In [2]:
loan = pd.read_pickle('data_input/loan_clean')

The dataset consists of 10,307 entries and includes the following columns:

- **id**: Unique identifier for each loan (integer).
- **issue_date**: The date when the loan was issued (datetime).
- **employment_length**: Length of employment of the borrower (float).
- **home_ownership**: Type of home ownership (category).
- **income_category**: Category of the borrower's income (category).
- **annual_income**: Annual income of the borrower (integer).
- **loan_amount**: Amount of the loan (integer).
- **term**: Term of the loan (category).
- **purpose**: Purpose of the loan (category).
- **interest_payments**: Type of interest payments (category).
- **loan_condition**: Current condition of the loan (category).
- **interest_rate**: Interest rate of the loan (float).
- **grade**: Grade assigned to the loan (category).
- **dti**: Debt-to-income ratio (float).
- **total_payment**: Total payment amount (float).
- **installment**: Monthly installment amount (float).
- **issue_weekday**: Day of the week when the loan was issued (object).
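
The column types listed above can be checked directly with pandas. The following is a minimal sketch on a made-up stand-in frame (`toy_loan` and its values are hypothetical, and only a few of the columns are reproduced); on the real data the same calls would be applied to `loan`.

```python
import pandas as pd

# Made-up stand-in rows; the real data comes from 'data_input/loan_clean'.
toy_loan = pd.DataFrame({
    'id': [1, 2, 3],
    'issue_date': pd.to_datetime(['2016-01-01', '2016-02-01', '2016-02-01']),
    'home_ownership': pd.Categorical(['RENT', 'OWN', 'RENT']),  # hypothetical labels
    'loan_amount': [5000, 12000, 8000],
    'interest_rate': [10.5, 13.2, 15.0],
})

# info() prints the row count, column names, dtypes, and non-null counts
toy_loan.info()

# dtypes as plain strings, convenient for a quick comparison against the list above
dtypes = toy_loan.dtypes.astype(str).to_dict()
print(dtypes)
```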

# Dashboard Analysis Features

For our Streamlit dashboard, we will provide an analysis of the loan data that includes:

- **Overview**: A summary of key metrics and general statistics about the loan dataset.
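- **Time-Based Analysis**: Trends in the number of loans issued and the total loan amount over time, plus the distribution of loans by day of the week.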
- **Loan Performance**: Analysis of how different loans are performing, including factors like loan conditions and grades.
- **Financial Analysis**: Insights into financial aspects such as income categories, loan amounts, interest rates, and total payments.

This structured approach will help us present a comprehensive view of the loan data and provide valuable insights through the dashboard.


## Overview

In the overview section of our Streamlit dashboard, we will display key metrics to provide a snapshot of the loan data. Here are the metrics we will calculate and display:

**1. Total Loans**

   - This metric shows the total number of loans in the dataset.

In [12]:
print(f'Total Loans: {loan.shape[0]:,}')

# Same metric as a bare formatted string
f"{loan['id'].count():,.0f}"
# Variant with '.' as the thousands separator:
# f"{loan['id'].count():,.0f}".replace(',', '.')

Total Loans: 10,307


'10,307'

**2. Total Loan Amount**

   - This metric provides the sum of all loan amounts in the dataset.

In [19]:
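# Double quotes outside, single quotes inside: reusing the same quote type
# inside an f-string is a SyntaxError before Python 3.12
print(f"Total Loan Amount: ${loan['loan_amount'].sum():,.2f}")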

f"${loan['loan_amount'].sum():,.0f}"

Total Loan Amount: $158,391,575.00


'$158,391,575'

**3. Average Interest Rate**

   - This metric calculates the average interest rate across all loans.

In [22]:
# Double quotes outside so the f-string also parses on Python < 3.12
print(f"Average Interest Rate: {loan['interest_rate'].mean():.0f}%")

f"{loan['interest_rate'].mean():,.0f}"

Average Interest Rate: 13%


'13'

**4. Average Loan Amount**

   - This metric shows the average amount of the loans.

In [6]:
# Double quotes outside so the f-string also parses on Python < 3.12
print(f"Average Loan Amount: ${loan['loan_amount'].mean():,.0f}")

Average Loan Amount: $15,367
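
Since all four metrics will be shown together, it may be convenient to compute them in one place. The sketch below is one possible way to do that; `overview_metrics` is a hypothetical helper name and `toy_loan` is made-up stand-in data, not the real dataset.

```python
import pandas as pd

def overview_metrics(df):
    """Return the four overview metrics as display-ready strings."""
    return {
        'Total Loans': f"{len(df):,}",
        'Total Loan Amount': f"${df['loan_amount'].sum():,.0f}",
        'Average Interest Rate': f"{df['interest_rate'].mean():.0f}%",
        'Average Loan Amount': f"${df['loan_amount'].mean():,.0f}",
    }

# Made-up stand-in rows for illustration only
toy_loan = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'loan_amount': [5000, 12000, 8000, 15000],
    'interest_rate': [10.0, 12.0, 14.0, 16.0],
})

metrics = overview_metrics(toy_loan)
for name, value in metrics.items():
    print(f'{name}: {value}')
```

On the real data, the call would be `overview_metrics(loan)`.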


## Time-Based Analysis

In this section, we will analyze how loan data trends over time. This includes examining the number of loans issued, the total loan amount over time, and the distribution of loans based on the day of the week.

**1. Loans Issued Over Time**

We will create a line chart to show the number of loans issued over time. This will help us identify trends, such as periods with high or low loan issuance.

In [7]:
# Data wrangling: Aggregate number of loans issued per date
data_agg = loan.groupby('issue_date')['id'].count().reset_index()

px.line(
    data_agg,
    x='issue_date',
    y='id',
    markers=True,
    title='Number of Loans Issued Over Time',
    labels={
        'issue_date': 'Issue Date',
        'id': 'Number of Loans'
    },
    template="seaborn"
)

**2. Loan Amount Over Time**

We will create a line chart to display the total loan amount issued over time. This visualization will help us understand the trends in the total loan amount and identify any significant changes over different periods.


In [8]:
# Data wrangling: Aggregate total loan amount per date
data_amount_agg = loan.groupby(['issue_date'])['loan_amount'].sum().reset_index()

# Create a line chart for the total loan amount over time
px.line(
    data_amount_agg,
    x='issue_date',
    y='loan_amount',
    markers=True,
    title='Total Loan Amount Issued Over Time',
    labels={
        'issue_date': 'Issue Date',
        'loan_amount': 'Total Loan Amount'
    },
    template="seaborn"
)
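
The loan count and the total loan amount are both grouped by `issue_date`, so they could also be computed in a single pass with pandas named aggregation. The sketch below uses made-up stand-in data; the names `per_date`, `n_loans`, and `total_amount` are hypothetical.

```python
import pandas as pd

# Made-up stand-in rows for illustration only
toy_loan = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'issue_date': pd.to_datetime(
        ['2016-01-01', '2016-01-01', '2016-02-01', '2016-03-01']
    ),
    'loan_amount': [5000, 12000, 8000, 15000],
})

# One groupby producing both series: number of loans and total amount per date
per_date = (
    toy_loan.groupby('issue_date')
    .agg(n_loans=('id', 'count'), total_amount=('loan_amount', 'sum'))
    .reset_index()
)
print(per_date)
```

Either column of `per_date` can then be passed as `y` to `px.line`.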

**3. Issue Date Analysis**

We will create a bar chart to show the distribution of loans based on the day of the week. This will help us analyze if there are any specific days when loans are more frequently issued.

In [9]:
# Data wrangling: Count the number of loans issued per weekday
# (sort_index orders the labels alphabetically, because the column has object dtype)
weekday_counts = loan['issue_weekday'].value_counts().sort_index()

# Create a bar chart for loan distribution by day of the week
px.bar(
    weekday_counts,
    x=weekday_counts.index,
    y=weekday_counts.values,
    labels={
        'issue_weekday': 'Day of the Week',
        'y': 'Number of Loans'
    },
    title='Distribution of Loans by Day of the Week',
    template="seaborn"
)
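
Since `issue_weekday` is stored as an object column, sorting its labels gives alphabetical rather than calendar order. The sketch below shows one way to put the counts in calendar order, assuming the column holds full English day names such as `'Monday'` (an assumption; the source does not show the actual values). The data and the names `weekday_order` and `n_loans` are hypothetical.

```python
import pandas as pd

# Assumed label set: full English day names
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']

# Made-up stand-in rows for illustration only
toy_loan = pd.DataFrame({
    'issue_weekday': ['Friday', 'Monday', 'Monday', 'Wednesday', 'Friday', 'Friday'],
})

# Count per weekday, then reorder to calendar order; days with no loans become 0
weekday_counts = (
    toy_loan['issue_weekday']
    .value_counts()
    .reindex(weekday_order, fill_value=0)
)

# Tidy frame with explicit column names, ready to pass to px.bar
weekday_df = weekday_counts.rename_axis('issue_weekday').reset_index(name='n_loans')
print(weekday_df)
```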


## Loan Performance

In the "Loan Performance" section, we will analyze and visualize the performance of loans based on various criteria. This section will include:

**1. Loan Condition Analysis**

We will display the distribution of loans based on their condition using a pie chart. This visualization will help us understand the proportion of loans in different conditions.


In [10]:
# Calculate the distribution of loan conditions
loan_condition_counts = loan['loan_condition'].value_counts()

# Create a pie chart for loan condition analysis
px.pie(
    loan_condition_counts,
    names=loan_condition_counts.index,
    values=loan_condition_counts.values,
    hole=0.4,
    labels={
        'loan_condition': 'Loan Condition',
        'value': 'Number of Loans'
    },
    title='Distribution of Loans by Condition',
    template="seaborn",
)
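
The pie chart displays each condition's share of the total. To inspect those shares as numbers, `value_counts(normalize=True)` can be used, as in this sketch. The data is made up, and the `'Bad Loan'` label is an assumption (the source only confirms `'Good Loan'`).

```python
import pandas as pd

# Made-up stand-in values; 'Bad Loan' is an assumed label
toy_conditions = pd.Series(
    ['Good Loan', 'Good Loan', 'Good Loan', 'Bad Loan'],
    name='loan_condition',
)

# Fraction of loans in each condition (sums to 1)
shares = toy_conditions.value_counts(normalize=True)

# Same information as percentages, rounded to one decimal place
percent = (shares * 100).round(1)
print(percent)
```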


**2. Grade Distribution**

Next, we'll create a bar chart to show the distribution of loans based on their grade. This visualization provides insights into how loans are categorized by grade and their frequencies.

In [11]:
# Calculate the distribution of loan grades
grade_counts = loan['grade'].value_counts(sort=False)

# Create a bar chart for grade distribution
px.bar(
    grade_counts,
    x=grade_counts.index,
    y=grade_counts.values,
    labels={
        'grade': 'Grade',
        'y': 'Number of Loans'
    },
    title='Distribution of Loans by Grade',
    template="seaborn",
)
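
The `sort=False` argument matters here because `grade` is a categorical column: the counts then follow the category order instead of being sorted by frequency. A small sketch on made-up data (the grade labels `'A'` to `'D'` are assumed for illustration):

```python
import pandas as pd

# Made-up stand-in values with an explicit category order
toy_grades = pd.Series(
    pd.Categorical(['B', 'A', 'C', 'B', 'B'], categories=['A', 'B', 'C', 'D']),
    name='grade',
)

# sort=False keeps the category order; unobserved categories appear with count 0
in_category_order = toy_grades.value_counts(sort=False)

# The default sorts by frequency, most common first
by_frequency = toy_grades.value_counts()

print(in_category_order)
print(by_frequency)
```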

## Financial Analysis

In this section, we will focus on analyzing the financial aspects of loans classified as "Good Loan". This will help us understand the distribution of loan amounts and their variation across different purposes.



In [12]:
# Filter data for 'Good Loan' condition
condition = loan[loan['loan_condition'] == 'Good Loan']

**1. Loan Amount Distribution**

We will create a histogram to display the distribution of loan amounts for loans classified as "Good Loan". This visualization will help us understand the range and frequency of loan amounts within this category.

In [13]:
px.histogram(
    condition,
    x='loan_amount',
    nbins=30,  # Number of bins in the histogram
    color='term',
    title='Loan Amount Distribution by Condition',
    template='seaborn',
    labels={
        'loan_amount': 'Loan Amount',
        'term': 'Loan Term'
    }
)

**2. Loan Amount Distribution by Purpose**

We will use a box plot to show the distribution of loan amounts for each purpose among loans classified as "Good Loan". This will help us see how loan amounts vary across different purposes and whether there are any significant differences.

In [14]:
px.box(
    condition,
    x='purpose',
    y='loan_amount',
    color='term',
    title='Loan Amount Distribution by Purpose',
    template='seaborn',
    labels={
        'loan_amount': 'Loan Amount',
        'term': 'Loan Term',
        'purpose': 'Loan Purpose'
    }
)
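
To read off the numbers behind the box plot, the filtered data can be summarized per purpose. The following sketch uses made-up stand-in data; the purpose labels and the `'Bad Loan'` value are assumptions, and `summary` is a hypothetical name.

```python
import pandas as pd

# Made-up stand-in rows for illustration only
toy_loan = pd.DataFrame({
    'loan_condition': ['Good Loan', 'Good Loan', 'Good Loan', 'Bad Loan', 'Good Loan'],
    'purpose': pd.Categorical(['car', 'car', 'medical', 'car', 'medical']),
    'loan_amount': [4000, 6000, 10000, 50000, 14000],
})

# Same filter as in the notebook: keep only 'Good Loan' rows
good = toy_loan[toy_loan['loan_condition'] == 'Good Loan']

# Count, median, and range of loan amounts per purpose;
# observed=True skips categories that do not occur in the filtered data
summary = (
    good.groupby('purpose', observed=True)['loan_amount']
    .agg(['count', 'median', 'min', 'max'])
)
print(summary)
```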