In [1]:
# Import necessary libraries
import pandas as pd
from pathlib import Path
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report

---

## Split the Data into Training and Testing Sets

### Step 1: Read the `lending_data.csv` data from the `Resources` folder into a Pandas DataFrame.

In [2]:
# Path to the CSV file
csv_file_path = Path("/mnt/data/lending_data.csv")

# Load data into a DataFrame
lending_data = pd.read_csv(csv_file_path)

# Display the first few rows of the DataFrame
print(lending_data.head())

### Step 2: Create the labels set (`y`) from the `loan_status` column, and then create the features (`X`) DataFrame from the remaining columns.

In [3]:
# Define the target variable (labels) and features
target = lending_data["loan_status"]
features = lending_data.drop("loan_status", axis=1)

In [4]:
# Show the first few rows of the target variable
print(target.head())

In [5]:
# Show the first few rows of the features
print(features.head())

### Step 3: Split the data into training and testing datasets by using `train_test_split`.

In [7]:
# Split the dataset into training and testing sets
features_train, features_test, target_train, target_test = train_test_split(
    features, target, random_state=1
)
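
The call above relies on the defaults of `train_test_split`: when neither `test_size` nor `train_size` is given, 25% of the rows go to the test set. The sketch below illustrates this on hypothetical toy data (the `toy_*` names and the 90/10 class mix are made up for illustration). It also passes `stratify`, which the notebook itself does not use; it is shown only as an option for keeping the class ratio similar in both splits.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows, 90 labelled 0 and 10 labelled 1
toy_features = [[i] for i in range(100)]
toy_target = [0] * 90 + [1] * 10

# Same call shape as the notebook, plus an optional stratify argument
# (an assumption for illustration, not part of the original split)
X_train, X_test, y_train, y_test = train_test_split(
    toy_features, toy_target, random_state=1, stratify=toy_target
)

# With the default settings, 75% of rows are used for training and 25% for testing
print(len(X_train), len(X_test))
# Number of class-1 rows that ended up in each split
print(sum(y_train), sum(y_test))
```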

---

## Create a Logistic Regression Model with the Original Data

### Step 1: Fit a logistic regression model by using the training data (`X_train` and `y_train`).

In [8]:
# Initialize and train the Logistic Regression model
logistic_model = LogisticRegression(random_state=1)
logistic_model.fit(features_train, target_train)

### Step 2: Save the predictions on the testing data labels by using the testing feature data (`X_test`) and the fitted model.

In [9]:
# Predictions using the test set
predictions = logistic_model.predict(features_test)

### Step 3: Evaluate the model’s performance by doing the following:

* Generate a confusion matrix.

* Print the classification report.

In [11]:
# Generate and display the confusion matrix
confusion_mat = confusion_matrix(target_test, predictions)
print(confusion_mat)
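
To read the printed matrix, it helps to know its layout: `confusion_matrix` puts the actual labels on the rows and the predicted labels on the columns, with labels in sorted order. The sketch below shows this on a small set of hypothetical labels (the `toy_*` values are made up and are not taken from the lending data).

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels, for illustration only
toy_actual = [0, 0, 0, 1, 1]
toy_predicted = [0, 0, 1, 1, 0]

toy_matrix = confusion_matrix(toy_actual, toy_predicted)
print(toy_matrix)

# For a binary problem, flattening the matrix row by row gives the four
# counts with class 1 treated as the positive class:
# true negatives, false positives, false negatives, true positives
tn, fp, fn, tp = toy_matrix.ravel()
print(tn, fp, fn, tp)
```

Here the first row counts the loans that are actually `0` and the second row the loans that are actually `1`, so the off-diagonal entries are the misclassifications.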

In [12]:
# Print the classification report
classification_rep = classification_report(target_test, predictions)
print(classification_rep)

### Step 4: Answer the following question.

**Question:** How well does the logistic regression model predict both the `0` (healthy loan) and `1` (high-risk loan) labels?

**Answer:** The model predicts healthy loans (class `0`) almost perfectly. Precision rounds to 1.00: of the 18719 loans predicted healthy, 18663 really were healthy (18663 true positives and 56 false positives). Recall is 0.99: of the 18765 loans that were actually healthy, the model identified 18663 (18663 true positives and 102 false negatives).

Performance on high-risk loans (class `1`) is weaker. Precision is 0.85: of the 665 loans predicted high-risk, 563 really were high-risk (563 true positives and 102 false positives), so about 15% of flagged loans are actually healthy. Recall is 0.91: of the 619 loans that were actually high-risk, the model caught 563 and missed 56 (false negatives). Overall, the model is very reliable on healthy loans and reasonably good, but less precise, on high-risk loans, which form a much smaller class in the test set (619 loans versus 18765). The high-risk class is therefore where further tuning would help most.
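
As a check on the arithmetic, the per-class precision and recall can be recomputed from the four counts quoted in the answer. This is a minimal sketch: the variable names and the `precision_recall` helper are hypothetical, and the assignment of counts to cells assumes the usual layout (rows are actual labels, columns are predicted labels).

```python
# Counts quoted in the answer; which cell each one occupies is an assumption
healthy_predicted_healthy = 18663
healthy_predicted_risky = 102
risky_predicted_healthy = 56
risky_predicted_risky = 563


def precision_recall(tp, fp, fn):
    """Return (precision, recall) for one class from its raw counts."""
    return tp / (tp + fp), tp / (tp + fn)


# Class 0 (healthy): a false positive is a high-risk loan predicted healthy
p0, r0 = precision_recall(
    tp=healthy_predicted_healthy,
    fp=risky_predicted_healthy,
    fn=healthy_predicted_risky,
)

# Class 1 (high-risk): a false positive is a healthy loan predicted high-risk
p1, r1 = precision_recall(
    tp=risky_predicted_risky,
    fp=healthy_predicted_risky,
    fn=risky_predicted_healthy,
)

print(f"class 0: precision={p0:.2f}, recall={r0:.2f}")  # class 0: precision=1.00, recall=0.99
print(f"class 1: precision={p1:.2f}, recall={r1:.2f}")  # class 1: precision=0.85, recall=0.91
```

Note that the same two off-diagonal counts (102 and 56) swap roles between the classes: a false positive for one class is a false negative for the other.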

---