

<div>
  <h1 style="font-size: 24px; font-weight: bold;">Credit risk analysis</h1>

<p>To build a predictive model that estimates the probability of default (PD) from customer characteristics, we use logistic regression, a standard algorithm for binary classification. Here the target is whether a customer defaults on their loan (1) or not (0).
</p>
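<p>As a minimal sketch of how logistic regression turns customer characteristics into a PD: it computes a weighted sum of the features and passes it through the logistic (sigmoid) function, which maps any score to a value between 0 and 1. The function name <code>probability_of_default</code>, the coefficients, and the two example features are hypothetical and chosen only for illustration; they are not values fitted to the loan data.</p>

```python
import math

def probability_of_default(features, weights, intercept):
    """Map a linear score to a probability with the logistic (sigmoid) function."""
    score = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients for two illustrative features
# (credit lines outstanding, FICO score); not fitted values.
weights = [0.8, -0.01]
intercept = 1.0
features = [5, 572]

pd_estimate = probability_of_default(features, weights, intercept)
print(round(pd_estimate, 4))
```

<p>A score of zero corresponds to a PD of exactly 0.5, which is also the default cut-off scikit-learn uses when converting probabilities into class labels.</p>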



In [4]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score, confusion_matrix

# Load the loan data from GitHub
raw_url="https://raw.githubusercontent.com/NaimiMariem/QR_JP_Morgan_Chase/main/Task%203%20and%204_Loan_Data%20(1).csv"
loan_data = pd.read_csv(raw_url)

loan_data.head()



Unnamed: 0,customer_id,credit_lines_outstanding,loan_amt_outstanding,total_debt_outstanding,income,years_employed,fico_score,default
0,8153374,0,5221.545193,3915.471226,78039.38546,5,605,0
1,7442532,5,1958.928726,8228.75252,26648.43525,2,572,1
2,2256073,0,3363.009259,2027.83085,65866.71246,4,602,0
3,4885975,0,4766.648001,2501.730397,74356.88347,5,612,0
4,4700614,1,1345.827718,1768.826187,23448.32631,6,631,0


In [3]:
# Separate features (X) and target variable (y)
X = loan_data.drop(columns=['customer_id', 'default'])
y = loan_data['default']

# Split the data into training and testing sets (80% for training, 20% for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the Logistic Regression model
log_reg_model = LogisticRegression(random_state=42)

# Train the model on the training data
log_reg_model.fit(X_train, y_train)

# Predict the probability of default on the test set
y_pred_prob = log_reg_model.predict_proba(X_test)[:, 1]  # Probability of default (class 1)

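# Assuming a recovery rate of 10%, calculate the expected loss per unit of exposure (PD x LGD);
# multiply by the loan amount outstanding to express it as a currency amount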
recovery_rate = 0.1
expected_loss = (1 - recovery_rate) * y_pred_prob

# Evaluation metrics on the held-out test set
y_pred = log_reg_model.predict(X_test)  # Class labels at the default 0.5 threshold
accuracy = accuracy_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_pred_prob)
confusion = confusion_matrix(y_test, y_pred)  # Rows: actual class, columns: predicted class

# Print the evaluation metrics
print("Accuracy: {:.2f}%".format(accuracy * 100))
print("ROC AUC Score: {:.4f}".format(roc_auc))
print("Confusion Matrix:")
print(confusion)


Accuracy: 98.95%
ROC AUC Score: 0.9992
Confusion Matrix:
[[1647    5]
 [  16  332]]
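
<p>The expected-loss step in the cell above multiplies the predicted PD by the loss given default (1 minus the recovery rate). A rough sketch of how that rate could be scaled to a currency amount for a single loan is shown here, using the common decomposition expected loss = PD x LGD x exposure. The helper <code>expected_loss</code> and the example figures are hypothetical, and treating the outstanding loan amount as the exposure is an assumption.</p>

```python
def expected_loss(prob_default, exposure, recovery_rate=0.1):
    """Expected loss = PD x LGD x exposure, with LGD = 1 - recovery rate."""
    loss_given_default = 1.0 - recovery_rate
    return prob_default * loss_given_default * exposure

# Hypothetical loan: 20% estimated PD on 5,000 outstanding, 10% recovery rate.
loss = expected_loss(prob_default=0.20, exposure=5000.0)
print(f"Expected loss: {loss:.2f}")
```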




<p>The evaluation metrics assess the model on the held-out test set. Accuracy is the share of test loans whose default status is predicted correctly (98.95% here). The ROC AUC score measures how well the predicted probabilities rank defaulting customers above non-defaulting ones, independent of any classification threshold. The confusion matrix counts true negatives (1,647), false positives (5), false negatives (16), and true positives (332).</p>
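
<p>To make the link between the confusion matrix and the headline metrics concrete, the sketch below recomputes accuracy from the four printed counts and derives precision and recall for the default class. It assumes the scikit-learn layout, in which rows are actual classes and columns are predicted classes; precision and recall are not reported in the notebook and are added here only as an illustration.</p>

```python
# Counts taken from the confusion matrix printed above
# (scikit-learn layout: rows are actual classes, columns are predicted classes).
tn, fp = 1647, 5
fn, tp = 16, 332

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)  # Share of predicted defaults that were real defaults
recall = tp / (tp + fn)     # Share of real defaults the model caught

print(f"Accuracy:  {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
```

<p>Because the 16 missed defaults outnumber the 5 false alarms, recall for the default class is lower than precision, which matters when missed defaults are the more costly error.</p>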