# Statistical Loan Analysis for Risk Mitigation and Client Solvency.
In the dynamic landscape of financial services, the pursuit of ensuring responsible lending practices and minimizing risk is a perpetual mission. Financial institutions navigate a labyrinth of challenges, but within these challenges lies an exceptional opportunity – the power of statistical analysis.

Statistical analysis serves as the compass guiding organizations towards this opportunity. It's the art of dissecting and interpreting data, employing rigorous quantitative methods to unearth valuable insights and patterns within the vast realm of financial information. Among the plethora of metrics used for risk assessment, one critical indicator stands out - the likelihood of loan repayment.

Predicting a client's creditworthiness is akin to wielding a key that unlocks a world of possibilities. It empowers financial institutions to tailor loan terms with precision, a measure that not only ensures loans are made on proper terms but also minimizes default rates, benefiting both clients and the institution. In essence, it's a pathway to not just optimizing lending practices but revolutionizing financial risk management as a whole.

In this journey towards responsible lending, you play a pivotal role. You are the statistical virtuoso, armed with cutting-edge tools and techniques in statistical analysis. Your mission is to transform raw data into meaningful insights that illuminate the intricate web of client creditworthiness. Through the lens of data analysis, you decode the mysteries of loan repayment, revealing trends and patterns that hold the key to more prudent and sustainable lending decisions.

Working closely with the financial team, you craft compelling data visualizations that bring these insights to life. Your data-driven creations become the guiding stars, steering financial professionals towards sound decision-making, enhanced risk assessment, and more responsible lending practices. While the intricacies of your work may often go unnoticed, its impact reverberates throughout the financial institution.

In the realm of statistical analysis for responsible lending, you are the unsung hero, the one who helps unveil the extraordinary stories of prudent financial decisions and sustainable risk management. Your dedication to data and your ability to transform it into enlightening insights contribute to the ongoing saga of financial excellence, making every client's journey towards financial security that much more extraordinary.

### Module 1
#### Task 1: Analyzing Loan Data.
In the bustling office of the loan department, our dedicated team gathers around a mission: to analyze the loans data, uncovering the stories hidden within it. We embark on this task because we know that responsible lending is the cornerstone of financial stability. By diving into this dataset, we aim to ensure that every loan we extend is based on thorough statistical evaluation, reducing risk and ensuring client success. With pandas and Python by our side, we're equipped to turn this raw data into actionable insights, contributing to a brighter financial future for our clients and our institution.


#### Module 1 - Task 1
-- Description Solution - Load the data.
- Import Pandas and alias it as 'pd'.
- Read the CSV file movies Loans.csv into a Pandas DataFrame named 'df'.
- To import the 'Loans.csv' file, which is located in the root path of your project, you should use the following path: './Loans.csv'.
- Inspect the data by calling the variable 'df'.

In [3]:
# import pandas
import pandas as pd

# Load the dataset
df = pd.read_csv("C:/Users/Abhishek/Downloads/Loans.csv")

#Inspect data
df

Unnamed: 0,ListingNumber,Term,LoanStatus,BorrowerRate,EstimatedEffectiveYield,EstimatedLoss,EstimatedReturn,ProsperRating (Alpha),Occupation,EmploymentStatus,IsBorrowerHomeowner,LoanOriginalAmount,MonthlyLoanPayment,Investors
0,193129,36,Completed,0.1580,,,,,Other,Self-employed,True,9425,330.43,258
1,1209647,36,Current,0.0920,0.07960,0.0249,0.05470,A,Professional,Employed,False,10000,318.93,1
2,81716,36,Completed,0.2750,,,,,Other,Not available,False,3001,123.32,41
3,658116,36,Current,0.0974,0.08490,0.0249,0.06000,A,Skilled Labor,Employed,True,10000,321.45,158
4,909464,36,Current,0.2085,0.18316,0.0925,0.09066,D,Executive,Employed,True,15000,563.97,20
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
113932,753087,36,Current,0.1864,0.16490,0.0699,0.09500,C,Food Service Management,Employed,True,10000,364.74,1
113933,537216,36,FinalPaymentInProgress,0.1110,0.10070,0.0200,0.08070,A,Professional,Employed,True,2000,65.57,22
113934,1069178,60,Current,0.2150,0.18828,0.1025,0.08578,D,Other,Employed,True,10000,273.35,119
113935,539056,60,Completed,0.2605,0.24450,0.0850,0.15950,C,Food Service,Full-time,True,15000,449.55,274


#### Task 2: Pursuing Data Purity.
In our quest for data integrity, we tackle the task of identifying and eliminating duplicates within our loan records. We undertake this mission to ensure the accuracy and reliability of our financial data. By pinpointing and removing duplicates, we create a clean and uncluttered dataset, reducing the risk of erroneous decisions and enhancing our ability to provide clients with transparent and trustworthy financial services. With each duplicate we eradicate, we pave the way for more informed and responsible lending practices, safeguarding the financial futures of our valued clients.

#### Module 1 - Task 2 Description Solution

Finding Duplicates.

- Calculate the number of duplicate rows in the DataFrame 'df' using the duplicated() method and then sum them up using the sum() method..
- Display the total number of duplicate rows, which is stored in the variable 'duplicates'.

In [4]:
# Finding Duplicates
duplicates = df.duplicated().sum()

# Displaying the total number of duplicate rows
print(f'Total number of duplicate rows: {duplicates}')
duplicates

Total number of duplicate rows: 871


871

#### Task 3: Removing Duplicates for Precision Lending.¶
In our relentless pursuit of data excellence, we're on a mission to remove duplicate entries from our loan dataset. We embark on this task to enhance the efficiency and accuracy of our lending operations. By eliminating duplicates, we aim to create a streamlined and error-free database that underpins our commitment to responsible lending. Each duplicate erased ensures that our clients' financial journeys are free from confusion and ambiguity, ultimately contributing to a more seamless and secure lending experience.

Module 1 - Task 3
Description
Solution

Removing Duplicate Rows.
- Apply the drop_duplicates method to 'df' to remove duplicate rows.
- The inplace=True argument is used to modify 'df' in place, which means it will remove duplicates directly from 'df' without the need to assign the result to a new variable.
- After executing this code, 'df' will be updated with the duplicate rows removed.

In [5]:
# Remove duplicates rows
df.drop_duplicates()

# Display the updated DataFrame without duplicate rows
print("Updated DataFrame after removing duplicates:")
print(df)

Updated DataFrame after removing duplicates:
        ListingNumber  Term              LoanStatus  BorrowerRate  \
0              193129    36               Completed        0.1580   
1             1209647    36                 Current        0.0920   
2               81716    36               Completed        0.2750   
3              658116    36                 Current        0.0974   
4              909464    36                 Current        0.2085   
...               ...   ...                     ...           ...   
113932         753087    36                 Current        0.1864   
113933         537216    36  FinalPaymentInProgress        0.1110   
113934        1069178    60                 Current        0.2150   
113935         539056    60               Completed        0.2605   
113936        1140093    36                 Current        0.1039   

        EstimatedEffectiveYield  EstimatedLoss  EstimatedReturn  \
0                           NaN            NaN             

#### Task 4: Addressing Null Values.
In our relentless pursuit of data accuracy, we focus on identifying and rectifying null values within our loan dataset. We undertake this task to ensure that our financial records are complete and reliable. By addressing null values, we aim to provide a comprehensive and trustworthy dataset, enabling more precise lending decisions. Each null value resolved is a step towards greater transparency and accountability in our lending operations, ultimately enhancing the financial well-being of our clients.


Module 1 - Task 4
Description
Solution

Counting Null Values.
- Apply the .isnull() method to 'df' to identify and mark null values, returning a DataFrame with True/False values.
- Use the .sum() method on the result to count the number of null values in each column.
- Store the count of null values in the variable 'null_values'.

In [6]:
# Identify and mark null values
null_values_marked = df.isnull()

# Count the number of null values in each column
null_values = null_values_marked.sum()

# Display the count of null values in each column
print("Null Values in Each Column:")
print(null_values)

Null Values in Each Column:
ListingNumber                  0
Term                           0
LoanStatus                     0
BorrowerRate                   0
EstimatedEffectiveYield    29084
EstimatedLoss              29084
EstimatedReturn            29084
ProsperRating (Alpha)      29084
Occupation                  3588
EmploymentStatus            2255
IsBorrowerHomeowner            0
LoanOriginalAmount             0
MonthlyLoanPayment             0
Investors                      0
dtype: int64


#### Task 5: Ensuring Data Completeness.
In our unwavering commitment to data quality, we're focused on the task of eliminating null values from our loan dataset. We embark on this mission to guarantee that our financial records are robust and complete. By removing null values, we aim to provide a dataset that's reliable and comprehensive, thereby facilitating well-informed lending decisions. Every null value addressed brings us one step closer to a more accurate and dependable foundation for our lending operations, ultimately fortifying the financial stability of our clients.

Module 1 - Task 5
Description
Solution

Removing Rows with Null Values.
- Apply the dropna method to 'df' to remove rows containing null values.
- Use the inplace=True argument to modify 'df' directly by removing rows with null values within 'df' itself.
- After execution, 'df' will be updated with the null value-containing rows removed.

In [7]:
# Removing Rows with Null Values
df.dropna(inplace=True)

# Display the updated DataFrame after the removing rows with null values
print("DataFrame after removing rows with null values:")
print(df)

DataFrame after removing rows with null values:
        ListingNumber  Term              LoanStatus  BorrowerRate  \
1             1209647    36                 Current        0.0920   
3              658116    36                 Current        0.0974   
4              909464    36                 Current        0.2085   
5             1074836    60                 Current        0.1314   
6              750899    36                 Current        0.2712   
...               ...   ...                     ...           ...   
113932         753087    36                 Current        0.1864   
113933         537216    36  FinalPaymentInProgress        0.1110   
113934        1069178    60                 Current        0.2150   
113935         539056    60               Completed        0.2605   
113936        1140093    36                 Current        0.1039   

        EstimatedEffectiveYield  EstimatedLoss  EstimatedReturn  \
1                       0.07960         0.0249          

### Module 2
#### Task 1: Renaming Columns for Clarity.

In our ongoing quest for data clarity, we've embarked on a task to rename columns within our loan dataset. We undertake this mission to enhance the readability and understanding of our data. By assigning more intuitive and informative column names, we aim to facilitate smoother data analysis and interpretation, ultimately enabling us to make more precise lending decisions. Each column renamed is a step towards a dataset that speaks clearly and concisely, guiding us towards a more informed and efficient lending process for the benefit of our clients.

Module 2 - Task 1
Description
Solution

Renaming Columns.
- Create a dictionary named 'namer' that defines the mapping of old column names to new column names.
- Apply the rename method to 'df' with the 'columns' parameter set to 'namer' to rename columns based on the dictionary mapping.
- Use the inplace=True argument to modify 'df' directly, updating the column names as specified in the 'namer' dictionary.
- After execution, 'df' will have its column names changed according to the 'namer' dictionary.

In [8]:
# Renaming Columns
namer = {
    "ListingNumber": "id",
    "Term": "duration",
    "LoanStatus": "status",
    "BorrowerRate": "rate",
    "EstimatedEffectiveYield": "yield",
    "EstimatedLoss": "loss",
    "EstimatedReturn": "return",
    "ProsperRating (Alpha)": "prosper",
    "Occupation": "occupation",
    "EmploymentStatus": "employment",
    "IsBorrowerHomeowner": 'home_owner',
    "LoanOriginalAmount": "loan_amount",
    "MonthlyLoanPayment": "payment",
    "Investors": "investors"
}

# Applying the rename method to 'df' with inplace=True
df.rename(columns=namer, inplace=True)

# Display the DataFrame with renamed columns
print("DataFrame with Renamed Columns:")
print(df)

DataFrame with Renamed Columns:
             id  duration                  status    rate    yield    loss  \
1       1209647        36                 Current  0.0920  0.07960  0.0249   
3        658116        36                 Current  0.0974  0.08490  0.0249   
4        909464        36                 Current  0.2085  0.18316  0.0925   
5       1074836        60                 Current  0.1314  0.11567  0.0449   
6        750899        36                 Current  0.2712  0.23820  0.1275   
...         ...       ...                     ...     ...      ...     ...   
113932   753087        36                 Current  0.1864  0.16490  0.0699   
113933   537216        36  FinalPaymentInProgress  0.1110  0.10070  0.0200   
113934  1069178        60                 Current  0.2150  0.18828  0.1025   
113935   539056        60               Completed  0.2605  0.24450  0.0850   
113936  1140093        36                 Current  0.1039  0.09071  0.0299   

         return prosper        

#### Task 2: Categorizing for Efficiency.
In our quest for streamlined data management, we're working on categorizing specific columns within our loan dataset. We've taken on this task to optimize data storage and speed up data processing. By converting selected columns into categorical data types, we aim to reduce memory usage and accelerate data analysis, ultimately contributing to more efficient lending practices. Each column categorized represents a stride towards a more agile and responsive dataset, equipping us to make quicker and more data-informed lending decisions for the benefit of our clients.

Module 2 - Task 2
Description
Solution

Converting Columns to Categorical Data Type.
- Define a list called 'categories' that contains the column names to be converted to the categorical data type.
- Loop through the 'categories' list to process each column specified in the list.
- Within the loop, use the .astype('category') method to change the data type of each specified column to categorical.
- The loop iterates over each column in 'categories' and converts them to the categorical data type.
- After execution, 'df' will have the specified columns with the categorical data type.

In [9]:
# Converting Columns to Categorical Data Type

# list of columns to be converted to categorical data type
categories = ['status', 'prosper', 'occupation', 'employment']

# Loop through the 'catgories' list to process each column
for column in categories:
    df[column] = df[column].astype('category')
    
# Display the DataFrame with converted categorical columns
print("DataFrame with Catgorical Columns:")
print(df)

DataFrame with Catgorical Columns:
             id  duration                  status    rate    yield    loss  \
1       1209647        36                 Current  0.0920  0.07960  0.0249   
3        658116        36                 Current  0.0974  0.08490  0.0249   
4        909464        36                 Current  0.2085  0.18316  0.0925   
5       1074836        60                 Current  0.1314  0.11567  0.0449   
6        750899        36                 Current  0.2712  0.23820  0.1275   
...         ...       ...                     ...     ...      ...     ...   
113932   753087        36                 Current  0.1864  0.16490  0.0699   
113933   537216        36  FinalPaymentInProgress  0.1110  0.10070  0.0200   
113934  1069178        60                 Current  0.2150  0.18828  0.1025   
113935   539056        60               Completed  0.2605  0.24450  0.0850   
113936  1140093        36                 Current  0.1039  0.09071  0.0299   

         return prosper     

#### Task 3: Archiving the Insights.
In our pursuit of data preservation and accessibility, we're executing a task to save our loan dataset in a CSV file. We've taken on this mission to create a well-organized data archive for future reference and analysis. By exporting the dataset to a CSV file, we ensure that our valuable insights and lending history are securely stored, facilitating easy retrieval and analysis. Each dataset saved represents a proactive step towards a more data-resilient and informative lending operation, helping us maintain a historical record of our financial journey for the benefit of our clients and institution.

Module 2 - Task 3
Description
Solution

Exporting a Pandas DataFrame to a CSV File.
- Prepare to save the Pandas DataFrame 'df' to a CSV file
- Use the to_csv method with the file name 'loans_data.csv' as the argument to specify the target file.
- Set index=False to exclude writing the DataFrame's index to the CSV file.
- The to_csv method will export the contents of 'df' to the specified CSV file, 'loans_data.csv'.
- After execution, the data in 'df' will be saved to a CSV file without the index.
- The CSV file, 'loans_data.csv', is created in the current working directory.

In [10]:
# Exporting a Pandas DataFrame to a CSV File
csv_filename = 'loans_data.csv'

# Save the DataFrame to a CSV file
df.to_csv(csv_filename, index=False)

print(f"DataFrame exported succesfully to {csv_filename}.")

DataFrame exported succesfully to loans_data.csv.


#### Task 4: Data Download, Import, and Database Connection.

Module 2 - Task 4
Description
Solution

Data Download, Import, and Database Connection.
- Download the dataset loans_data.csv which is exported in Module 2 - Task 3.
- Create the table on MYSQL using your credentials provided here
- Use the provided login information to access the database by clicking the ""localhost"" link located on the Database Info tab. - Once there, you need to upload the required datasets in the specific database mentioned in the database info tab. Rename the table to 'loans_data' using the Operations tab within the database interface and then click on ""Run test"" to complete the task.
- Use the %load_ext sql command to load the SQL extension in your Jupyter Notebook environment. This extension allows you to run - SQL commands directly within your notebook.
- Use the %sql magic command to specify the connection string for your MySQL database. Replace <user>, <password>, and <db_name> with your actual database credentials and details.

In [11]:
import pymysql

# Database credentials
username = 'root'
password = 'A2908@bhi'
host = 'localhost'
port = 3306
database = 'sql_python_eda'

try:
    # Create a connection to the MySQL server
    connection = pymysql.connect(
        host=host,
        user=username,
        password=password,
        port=port,
        database=database,
        cursorclass=pymysql.cursors.DictCursor
    )

    # Create a cursor object
    cursor = connection.cursor()

    # Test the connection by running a simple query
    cursor.execute("SELECT 1")

    # Fetch the result
    result = cursor.fetchone()

    # Print the result
    print(result)

except Exception as e:
    print(f"Error: {e}")

finally:
    # Close the cursor and connection
    if cursor:
        cursor.close()
    if connection:
        connection.close()


{'1': 1}


### Module 3
#### Task 1: A Glimpse into the World of Loans.
In our endeavor to understand the depth of our loan dataset, we execute a simple yet essential task – counting the records. We undertake this mission to gain insights into the scale of our lending operations and the volume of data at our disposal. By counting the records, we obtain a clear picture of our dataset's size and potential. Each count we perform contributes to a better understanding of our data, paving the way for more informed decision-making and strategic planning in the realm of lending.

Module 3 - Task 1
Description
Solution

Counting Rows in a Database Table.
- Execute a SQL query using the 'SELECT' statement.
- Use 'COUNT(*)' as the expression within the 'SELECT' statement.
- The query aims to count the total number of rows in a 'loans_data'.

In [15]:
import pymysql

# Database credentials
username = 'root'
password = 'A2908@bhi'
host = 'localhost'
port = 3306
database = 'sql_python_eda'

try:
    # Create a connection to the MySQL server
    connection = pymysql.connect(
        host=host,
        user=username,
        password=password,
        port=port,
        database=database,
        cursorclass=pymysql.cursors.DictCursor
    )

    # Create a cursor object
    cursor = connection.cursor()

    # SQL statement to count rows in the 'loans_data' table
    count_rows_query = "SELECT COUNT(*) FROM loans_data"

    # Execute the query
    cursor.execute(count_rows_query)

    # Fetch the result
    result = cursor.fetchone()

    # Print the count
    print(f"Number of rows in the 'loans_data' table: {result['COUNT(*)']}")

except Exception as e:
    print(f"Error: {e}")


Number of rows in the 'loans_data' table: 83520


#### Task 2: Profiling Loan Data.
We perform this mission to gain a deeper understanding of interest rates and loan amounts. By selecting, counting, and aggregating specific data points within a defined interest rate range, we extract valuable statistics. Each calculated metric sheds light on the lending landscape, offering us a comprehensive view of interest rate dynamics and loan amount variations. This in-depth analysis equips us to make more informed decisions, set competitive interest rates, and tailor loan amounts effectively for our clients' financial success.

Module 3 - Task 2
Description
Solution

Calculate Loan Data Statistics.
- Use the 'SELECT' statement to retrieve specific statistics from a database.
- Calculate and display the following statistics: 'loan_count', 'average_interest_rate', 'min_interest_rate', 'max_interest_rate', 'average_loan_amount', 'min_loan_amount', 'max_loan_amount'
- These calculations are performed on data from the 'loans_data' table.
- Apply a 'WHERE' clause to filter data where the 'rate' column is between 0.06 and 0.26.

In [17]:
try:
    # Create a cursor object
    cursor = connection.cursor()
    
    # SQL statement to calculate loan data statistics
    statistics_query = """
    SELECT
        COUNT(*) AS loan_count,
        AVG(rate) AS average_interest_rate,
        MIN(rate) AS min_interest_rate,
        MAX(rate) AS max_interest_rate,
        AVG(loan_amount) AS average_loan_amount,
        MIN(loan_amount) AS min_loan_amount,
        MAX(loan_amount) AS max_loan_amount
    FROM
        loans_data
    WHERE
        rate BETWEEN 0.06 AND 0.26
    """

    # Execute the query
    cursor.execute(statistics_query)
    
    # Fetch the result
    result = cursor.fetchone()
    
    # print the statistics
    print(f"Loan Count: {result['loan_count']}")
    print(f"Average Interest Rate: {result['average_interest_rate']:.2%}")
    print(f"Min Interest Rate: {result['min_interest_rate']:.2%}")
    print(f"Max Interest Rate: {result['max_interest_rate']:.2%}")
    print(f"Average Loan Amount: ${result['average_loan_amount']:.2f}")
    print(f"Min Loan Amount: ${result['min_loan_amount']:.2f}")
    print(f"Max Loan Amount: ${result['max_loan_amount']:.2f}")
    
except Exception as e:
    print(f"Error: {e}")

Loan Count: 63574
Average Interest Rate: 16.44%
Min Interest Rate: 6.00%
Max Interest Rate: 26.00%
Average Loan Amount: $10497.44
Min Loan Amount: $1000.00
Max Loan Amount: $35000.00


#### Task 3: Navigating Loan Metrics.
Our mission is to explore the intricate details of our loan data through a structured query. We aim to extract valuable insights into the interest rates and loan amounts within a specific range. By calculating statistics like loan count, average interest rate, and loan amount extremes, we gain a comprehensive understanding of our lending practices. This analysis empowers us to make data-driven decisions, ensuring competitive interest rates and optimized loan amounts for our clients, enhancing their financial experience and ensuring sound lending practices.

Module 3 - Task 3
Description
Solution

Aggregating Loan Amounts by Employment Status.
- Use the 'SELECT' statement to retrieve information regarding employment and the total loan amount for each employment status.
- Calculate the total loan amount for each employment category using the 'SUM' function and alias it as 'total_loan'.
- Group the results by the 'employment' column using the 'GROUP BY' clause.
- Arrange the results in ascending order based on the 'employment' column using the 'ORDER BY' clause.

In [20]:
try:
    # Create a cursor object
    cursor = connection.cursor()
    
    # SQL Statement to aggregate loan amounts by employment status
    loan_amount_query = """
    SELECT
        employment,
        SUM(loan_amount) AS total_loan
    FROM
        loans_data
    GROUP BY
        employment
    ORDER BY
        employment ASC
    """
    
    # Execute the query
    cursor.execute(loan_amount_query)
    
    # Fetch the result
    result = cursor.fetchall()
    
    # Print the aggregated loan amounts by employment status
    print("Employment Status | Total Loan Amount")
    print("-------------------------------------")
    for row in result:
        print(f"{row['employment']} | ${row['total_loan']:.2f}")
    
except Exception as e:
    print(f"Error:{e}")
    

Employment Status | Total Loan Amount
-------------------------------------
Employed | $659279092.00
Full-time | $43436751.00
Not employed | $3421628.00
Other | $15713206.00
Part-time | $861148.00
Retired | $1661185.00
Self-employed | $35971641.00


#### Task 4: Charting the Loan Landscape.
In our mission to gain a comprehensive view of our lending practices, we embark on a task to group and count loans based on their duration and status. By employing SQL to organize this data, we uncover patterns and trends within our loan portfolio. The grouped data allows us to better understand the distribution of loans across different durations and statuses. This insight supports more informed decision-making and strategic planning, helping us tailor our lending offerings to align with the needs of our clients.

Module 3 - Task 4
Description
Solution

Counting Loans by Duration and Status.
- Use the 'SELECT' statement to retrieve specific columns: 'duration', 'status', and count of loans, aliased as 'loan_count'.
- The 'GROUP BY' clause organizes the results by the 'duration' and 'status' columns.
- Apply the 'ORDER BY' clause to sort the results in ascending order first by 'duration' and then by 'status'.

In [21]:
try:
    # Create a cursor object
    cursor = connection.cursor()
    
    # SQL statement to count loans by duration and status
    loan_count_query = """
    SELECT
        duration,
        status,
        COUNT(*) AS loan_count
    FROM
        loans_data
    GROUP BY
        duration, status
    ORDER BY
        duration ASC,
        status ASC
    """
    
    # Execute the Query
    cursor.execute(loan_count_query)
    
    # Fetch the result
    result = cursor.fetchall()
    
    # Print the loan count by duration and status
    print("Duration | Status | Loan Count")
    print("------------------------------")
    for row in result:
        print(f"{row['duration']} | {row['status']} | {row['loan_count']}")
       
    
except Exception as e:
    print(f"Error:{e}")
    

Duration | Status | Loan Count
------------------------------
12 | Chargedoff | 72
12 | Completed | 1449
12 | Current | 62
12 | Defaulted | 10
12 | FinalPaymentInProgress | 10
12 | Past Due (1-15 days) | 3
12 | Past Due (16-30 days) | 3
12 | Past Due (31-60 days) | 1
12 | Past Due (61-90 days) | 2
12 | Past Due (91-120 days) | 1
36 | Chargedoff | 4178
36 | Completed | 15778
36 | Current | 35340
36 | Defaulted | 809
36 | FinalPaymentInProgress | 155
36 | Past Due (>120 days) | 9
36 | Past Due (1-15 days) | 547
36 | Past Due (16-30 days) | 173
36 | Past Due (31-60 days) | 234
36 | Past Due (61-90 days) | 205
36 | Past Due (91-120 days) | 193
60 | Chargedoff | 1086
60 | Completed | 2424
60 | Current | 19870
60 | Defaulted | 186
60 | FinalPaymentInProgress | 38
60 | Past Due (>120 days) | 7
60 | Past Due (1-15 days) | 248
60 | Past Due (16-30 days) | 88
60 | Past Due (31-60 days) | 123
60 | Past Due (61-90 days) | 106
60 | Past Due (91-120 days) | 110


#### Task 5: Employability and Interest Rates.¶
Our task revolves around examining the relationship between employment status and interest rates within our loan dataset. We perform this task to identify how employment influences borrowing costs and to understand the distribution of loans among different employment categories. Through SQL's grouping and aggregation, we gain insights into average interest rates across various employment statuses. This analysis equips us to tailor our lending practices, offer competitive rates, and serve clients effectively based on their employment situation, ultimately contributing to a more financially inclusive and responsive lending approach.

Module 3 - Task 5
Description
Solution

Analyzing Loans by Employment Status.
- Use the 'SELECT' statement to retrieve columns 'employment', average of the 'rate' column (aliased as 'average_interest_rate'), and count of loans (aliased as 'loan_count').
- The 'GROUP BY' clause organizes the results by the 'employment' column.
- Apply the 'ORDER BY' clause to sort the results in ascending order based on the 'employment' column.

In [22]:
try:
    # Create a cursor object
    cursor = connection.cursor()
    
    # SQL statement to analyze loans by employment status
    employment_analysis_query = """
    SELECT 
        employment,
        AVG(rate) AS average_interest_rate,
        COUNT(*) AS loan_count
    FROM 
        loans_data
    GROUP BY
        employment
    ORDER BY 
        employment ASC
    """
    
    # Execute the query
    cursor.execute(employment_analysis_query)
    
    # Fetch the result
    result = cursor.fetchall()
    
    # Print the analysis results
    print("Employment Status | Average Interest Rate | Loan Count")
    print("------------------------------------------------------")
    for row in result:
        print(f"{row['employment']} | {row['average_interest_rate']} | {row['loan_count']}")
        

except Exception as e:
    print(f"Error:{e}")

Employment Status | Average Interest Rate | Loan Count
------------------------------------------------------
Employed | 0.19279335611349158 | 67310
Full-time | 0.19949605097148723 | 7926
Not employed | 0.26135161787365224 | 649
Other | 0.23114971705739634 | 2474
Part-time | 0.21275976562500007 | 256
Retired | 0.2167269754768392 | 367
Self-employed | 0.21106284706919154 | 4538


#### Task 6: Homeownership and Interest Rates.
Our task is to delve into the connection between homeownership and interest rates within our loan dataset. We undertake this mission to discern how owning a home impacts borrowing costs and to assess the distribution of loans among homeowners and non-homeowners. By utilizing SQL to group and aggregate this data, we gain insights into average interest rates for these distinct groups. This analysis enables us to refine our lending strategies, offering competitive rates tailored to clients' homeownership status, fostering a more inclusive and client-centric approach to financial services.

Module 3 - Task 6
Description
Solution

Analyzing Loans by Home Ownership.
- Use the 'SELECT' statement to retrieve columns 'home_owner', average of the 'rate' column (aliased as 'average_interest_rate'), and count of loans (aliased as 'loan_count').
- The 'GROUP BY' clause organizes the results by the 'home_owner' column.
- Apply the 'ORDER BY' clause to sort the results in ascending order based on the 'home_owner' column.

In [28]:
# Create a cursor object
cursor = connection.cursor()

# SQL statement to analyze loans by home ownership
homeownership_analysis_query = """
SELECT
    home_owner,
    AVG(rate) AS average_interest_rate,
    COUNT(*) AS loan_count
FROM
    loans_data
GROUP BY
    home_owner
ORDER BY
    home_owner ASC
"""

# Execute the query
cursor.execute(homeownership_analysis_query)

# Fetch the result
result = cursor.fetchall()

# Print the analysis results
print("Homeownership Status | Average Interest Rate | Loan Count")
print("---------------------------------------------------------")
for row in result:
    homeownership_status = 'Homeowner' if row['home_owner'] == 1 else 'Non-Homeowner'
    print(f"{homeownership_status} | {row['average_interest_rate']} | {row['loan_count']}")

Homeownership Status | Average Interest Rate | Loan Count
---------------------------------------------------------
Non-Homeowner | 0.206214786150706 | 39280
Non-Homeowner | 0.18741636075948645 | 44240


#### Task 7: Unpacking Prosper Ratings.
Our mission is to dissect the impact of Prosper ratings on interest rates within our loan dataset. We embark on this task to uncover how creditworthiness influences borrowing costs and to understand the loan distribution across different Prosper rating categories. Through SQL's grouping and aggregation, we extract insights into average interest rates for each rating category. This analysis equips us to fine-tune our lending practices, providing competitive rates tailored to clients' credit profiles, enhancing financial accessibility, and ensuring a more personalized approach to lending.

Module 3 - Task 7
Description
Solution

Analyzing Loans by Prosper Rating.
- Use the 'SELECT' statement to retrieve columns 'prosper', average of the rate column (aliased as 'average_interest_rate'), and count of loans (aliased as 'loan_count').
- The 'GROUP BY' clause organizes the results by the 'prosper' rating.
- Apply the 'ORDER BY' clause to sort the results in ascending order based on the 'prosper' rating.

In [30]:
# Create a cursor object
cursor = connection.cursor()

# SQL Statement to analyze loans by Prosper rating
prosper_rating_analysis_query = """
SELECT
    prosper,
    AVG(rate) AS average_interest_rate,
    COUNT(*) AS loan_count
FROM
    loans_data
GROUP BY
    prosper
ORDER BY
    prosper ASC
"""

# Execute the query
cursor.execute(prosper_rating_analysis_query)

# Fetch the result
result = cursor.fetchall()

# Print the analysis results
print("Prosper Rating | Average Interest Rate | Loan Count")
print("---------------------------------------------------")
for row in result:
    print(f"{row['prosper']} | {row['average_interest_rate']} | {row['loan_count']}")

Prosper Rating | Average Interest Rate | Loan Count
---------------------------------------------------
A | 0.11296206319313615 | 14337
AA | 0.0791682211357666 | 5318
B | 0.15464272946702576 | 15329
C | 0.19474814546670072 | 17956
D | 0.24676870250692226 | 14081
E | 0.2936861033156601 | 9621
HR | 0.3174095231171868 | 6878


#### Task 8: Loan Amounts and Monthly Payments.
Our task revolves around investigating the relationship between loan amounts and monthly payments within our loan dataset. We embark on this mission to comprehend how the size of loans impacts the monthly financial commitment for borrowers. By using SQL's grouping and aggregation, we gain insights into the average monthly payments for various loan amount categories. This analysis empowers us to align our lending strategies, ensuring that loan terms correspond with clients' financial capabilities, ultimately fostering responsible lending and financial well-being.

Module 3 - Task 8
Description
Solution

Analyzing Loans by Loan Amount.
- Use the 'SELECT' statement to retrieve columns 'loan_amount', average of the payment column (aliased as 'average_payment'), and count of loans (aliased as 'loan_count').
- The 'GROUP BY' clause organizes the results by the 'loan_amount' column.
- Apply the 'ORDER BY' clause to sort the results in ascending order based on the 'loan_amount' column.

In [32]:
# Create a cursor object
cursor = connection.cursor()

# SQL statements to analyze loans by loan amount
loan_amount_analysis_query = """
SELECT
    loan_amount,
    AVG(payment) AS average_payment,
    COUNT(*) AS loan_count
FROM
    loans_data
GROUP BY
    loan_amount
ORDER BY
    loan_amount ASC
""" 

# Exectue the query
cursor.execute(loan_amount_analysis_query)

# Fetch the result
result = cursor.fetchall()

# Print the analysis results
print("Loan Amount | Average Payment | Loan Count")
print("------------------------------------------")
for row in result:
    print(f"{row['loan_amount']} | {row['average_payment']} | {row['loan_count']}")

Loan Amount | Average Payment | Loan Count
------------------------------------------
1000 | 35.279500657030304 | 761
1050 | 30.627142857142854 | 7
1080 | 26.09 | 1
1099 | 38.13 | 1
1100 | 37.367560975609756 | 41
1112 | 34.39 | 1
1125 | 42.17 | 2
1150 | 46.0925 | 4
1175 | 42.48 | 1
1190 | 44.76 | 1
1200 | 42.41845454545454 | 110
1215 | 48.95 | 1
1234 | 37.46 | 1
1250 | 49.955238095238094 | 21
1275 | 40.68 | 1
1300 | 45.07421052631578 | 38
1315 | 48.49 | 1
1336 | 54.9 | 1
1350 | 48.044 | 5
1364 | 60.09 | 1
1390 | 62.88 | 1
1395 | 52.56 | 1
1400 | 51.5564 | 25
1422 | 53.57 | 1
1424 | 0.0 | 1
1425 | 0.0 | 1
1450 | 55.695 | 4
1475 | 61.144999999999996 | 2
1477 | 58.73 | 1
1484 | 59.7 | 1
1500 | 55.605410526315666 | 475
1550 | 42.0175 | 4
1575 | 58.45 | 1
1577 | 62.13 | 1
1600 | 53.935714285714305 | 35
1650 | 61.686666666666675 | 6
1666 | 51.52 | 1
1667 | 55.14 | 1
1675 | 75.77 | 1
1680 | 51.68 | 1
1699 | 65.045 | 2
1700 | 64.74206896551723 | 29
1725 | 55.34 | 1
1750 | 69.84909090909092 | 2

#### Task 9: Examining Interest Rates and Lending Dynamics.
Our task centers on exploring the connection between the number of investors and interest rates within our loan dataset. We undertake this mission to unravel how investor involvement influences borrowing costs and to gain a better understanding of the loan distribution across various investor scenarios. By employing SQL's grouping and aggregation, we extract insights into the average interest rates associated with different levels of investor participation. This analysis empowers us to fine-tune our lending strategies, ensuring competitive rates that cater to the preferences of investors, ultimately enhancing our lending practices and fostering a thriving financial ecosystem.

Module 3 - Task 9
Description
Solution

Analyzing Loans by Number of Investors.
- Use the 'SELECT' statement to retrieve columns 'investors', average of the rate column (aliased as 'average_interest_rate'), and count of loans (aliased as 'loan_count').
- The 'GROUP BY' clause organizes the results based on the number of investors.
- Apply the 'ORDER BY' clause to sort the results in ascending order according to the number of investors.

In [37]:
# Create a cursor object
cursor = connection.cursor()

 # SQL statement to examine the connection between the number of investors and interest rates
investor_analysis_query = """
SELECT
    investors,
    AVG(rate) AS average_interest_rate,
    COUNT(*) AS loan_count
FROM
    loans_data
GROUP BY
    investors
ORDER BY
    investors ASC
"""

# Execute the query
cursor.execute(investor_analysis_query)

# Fetch the result
result = cursor.fetchall()

# Print the Analysis results
print("Number of Investors | Average Interest Rate | Loan Count")
print("--------------------------------------------------------")
for row in result:
    print(f"{row['investors']} | {row['average_interest_rate']} | {row['loan_count']}")    


Number of Investors | Average Interest Rate | Loan Count
--------------------------------------------------------
1 | 0.17278270719275193 | 26485
2 | 0.22564207879295925 | 1193
3 | 0.23990162790697672 | 860
4 | 0.24437481804949054 | 687
5 | 0.25135049019607875 | 612
6 | 0.2550584369449381 | 563
7 | 0.25425490909090903 | 550
8 | 0.2574528548123985 | 613
9 | 0.25694011090573016 | 541
10 | 0.25823295019157133 | 522
11 | 0.2533039370078745 | 508
12 | 0.2520052391799545 | 439
13 | 0.2519378318584077 | 452
14 | 0.2445738386308069 | 409
15 | 0.2449839901477836 | 406
16 | 0.2471727722772278 | 404
17 | 0.24513677884615384 | 416
18 | 0.23976902173913067 | 368
19 | 0.24287980535279832 | 411
20 | 0.24075906432748534 | 342
21 | 0.23709077669902953 | 412
22 | 0.24265768261964768 | 397
23 | 0.2448400503778341 | 397
24 | 0.24530256410256424 | 390
25 | 0.2408594724220626 | 417
26 | 0.24385814479638046 | 442
27 | 0.2446089086859691 | 449
28 | 0.244797777777778 | 405
29 | 0.2476200968523004 | 413
30 | 0.


#### Task 10: Loan Durations and Return Rates.
Our mission is to delve into the correlation between loan durations and return rates within our loan dataset. We embark on this task to comprehend how the duration of loans impacts the returns for investors, and to assess the distribution of loans across different timeframes. Through SQL's grouping and aggregation, we gain insights into the average return rates for loans of varying durations. This analysis equips us to refine our lending and investment strategies, offering attractive opportunities that align with investors' preferences, ultimately contributing to a more informed and rewarding financial environment.

Module 3 - Task 10
Description
Solution

Analyzing Loans by Duration and Return Rate.
- Use the 'SELECT' statement to retrieve columns 'duration', average of the 'return' column (aliased as 'average_return_rate'), and count of loans (aliased as 'loan_count').
- The 'GROUP BY' clause organizes the results based on the 'duration' column.
- Apply the 'ORDER BY' clause to sort the results in ascending order according to the 'duration'.

In [41]:
# Create a cursor object
cursor = connection.cursor()

# SQL Statements to analyze loans by duration and return rate
duration_return_analysis_query = """
SELECT
    duration,
    AVG(`return`) AS average_return_rate,
    COUNT(*) AS loan_count
FROM 
    loans_data
GROUP BY
    duration
ORDER BY
    duration ASC
"""

# Execute the query
cursor.execute(duration_return_analysis_query)

# Fetch the result
result = cursor.fetchall()

# Print the analysis results
print("Loan Duration | Average Return Rate | Loan Count")
print("------------------------------------------------")
for row in result:
    print(f"{row['duration']} | {row['average_return_rate']} | {row['loan_count']}")

Loan Duration | Average Return Rate | Loan Count
------------------------------------------------
12 | 0.0606798512089276 | 1613
36 | 0.09501932125440683 | 57621
60 | 0.1017076830272576 | 24286


#### Task 11: Prosper Ratings and Return Rates.
Our mission is to decipher the relationship between Prosper ratings and return rates within our loan dataset. We undertake this task to understand how creditworthiness influences the investment returns for our stakeholders and to evaluate the distribution of loans across different Prosper rating categories. Utilizing SQL's grouping and aggregation, we extract insights into the average return rates for each rating category. This analysis empowers us to tailor our investment strategies, offering attractive opportunities that match investors' risk preferences, fostering a more rewarding and informed investment landscape.

Module 3 - Task 11
Description
Solution

Analyzing Loans by Prosper Rating and Return Rate.
- Use the 'SELECT' statement to retrieve columns 'prosper,' average of the return column (aliased as 'average_return_rate'), and count of loans (aliased as 'loan_count').
- The 'GROUP BY' clause organizes the results based on the 'prosper' rating.
- Apply the 'ORDER BY' clause to sort the results in ascending order according to the 'prosper' rating.

In [42]:
# Create a cursor object
cursor = connection.cursor()

# SQL Statements to analyze loans by Prosper rating and return 
prosper_return_analysis_query = """
SELECT
    prosper,
    AVG(`return`) AS average_return_rate,
    COUNT(*) AS loan_count
FROM 
    loans_data
GROUP BY
    prosper
ORDER BY
    prosper ASC
"""

# Execute the query
cursor.execute(prosper_return_analysis_query)

# Fetch the result
result = cursor.fetchall()

# Print the analysis result
print("Prosper Rating | Average Return Rate | Loan Count")
print("-------------------------------------------------")
for row in result:
    print(f"{row['prosper']} | {row['average_return_rate']} | {row['loan_count']}")

Prosper Rating | Average Return Rate | Loan Count
-------------------------------------------------
A | 0.06971953546767058 | 14337
AA | 0.05405663971417852 | 5318
B | 0.08647714071368288 | 15329
C | 0.09845004121185354 | 17956
D | 0.11912750798948674 | 14081
E | 0.12509241658871276 | 9621
HR | 0.11365075603372882 | 6878
