#### PROJECT OVERVIEW
Credit Risk Prediction Platform — Tech Reyal Ltd
Tech Reyal Ltd is building a production-ready Credit Risk Prediction Application for fintechs, lenders, credit unions, and BNPL providers to make faster, safer, and more consistent lending decisions. The platform combines probability-of-default modelling, scorecard-style risk bands, policy decisioning, and model explainability to deliver underwriting outcomes that are interpretable, auditable, and operationally usable.
At its core, the solution takes applicant/account data (e.g., affordability, credit behaviour, utilisation, repayment performance, exposure attributes) and produces a compact set of decision outputs for both individual decisions and portfolio-level risk management. The design is modular, allowing institutions to plug in their own data sources, policies, and risk appetite while maintaining transparent governance for model risk and regulatory compliance.

What the Platform Produces Per Applicant / Account
1) Probability of Default (PD)
The model estimates the likelihood that an applicant will default within a defined horizon (e.g., 12 months).
Example output: PD = 3.2%
Used for: approve/decline decisions, risk-based pricing, credit limit setting, and portfolio risk estimation.
2) Risk Score / Scorecard Band
Applicants are mapped to an easily interpretable score or banding system.
Example output: Score = 620, or Band = C on an A–E scale
Used for: consistent underwriting, segmentation, and policy rules (e.g., minimum band thresholds per product).
3) Decision Recommendation (Approve / Refer / Decline)
A policy engine combines PD and risk bands with affordability and business rules to recommend an action.
Example output: Decision = Refer
Used for: reducing manual review workload, improving speed and consistency, and ensuring standardised decisioning across channels.
4) Expected Loss (EL) and Related Metrics
The platform computes risk metrics aligned with credit risk frameworks such as IFRS 9:
EL = PD × LGD × EAD, where LGD is the loss given default and EAD is the exposure at default (optionally with scenario-based stress adjustments).
Used for: provisioning, capital planning, and portfolio strategy.
5) Pricing Inputs (Risk-Based Pricing)
The service generates pricing guidance from PD/EL, such as a risk premium or recommended APR range.
Example output: Recommended APR uplift = +2.1 percentage points
Used for: balancing growth and profitability, reducing adverse selection, and maintaining competitive pricing.
6) Early Warning Signals / Monitoring Flags
For existing accounts, the system identifies deterioration trends and triggers watchlist flags.
Example outputs: “Risk rising”, “Payment stress”, “Utilisation spike”
Used for: proactive collections, targeted customer support, and loss prevention.
7) Explainability / Reason Codes
Each prediction and decision is accompanied by interpretable “reason codes” describing key drivers.
Example outputs: “High utilisation”, “Thin credit file”, “Recent missed payments”
Used for: transparency, regulatory compliance, customer communication, and appeals handling.
8) Portfolio Insights & Stress Testing
Beyond single decisions, the platform aggregates results to show risk concentration and scenario projections.
Example outputs: risk distribution by segment, projected defaults under adverse scenarios, changes in EL under stress.
Used for: risk appetite setting, downturn planning, and strategic portfolio adjustments.
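
As a rough illustration of how outputs 2 to 4 relate to a single PD estimate, the sketch below maps a PD to a band, applies a toy policy rule, and computes EL = PD × LGD × EAD. The band cut-offs, the policy logic, and the function names (`pd_to_band`, `recommend_decision`, `expected_loss`) are assumptions made for this example only; they are not the platform's actual thresholds or API.

```python
# Hypothetical band cut-offs (upper PD bound per band). A real deployment
# would calibrate these to its own portfolio and risk appetite.
BAND_UPPER_BOUNDS = [("A", 0.01), ("B", 0.03), ("C", 0.07), ("D", 0.15)]


def pd_to_band(pd_value: float) -> str:
    """Map a probability of default to a band from A (lowest PD) to E."""
    if not 0.0 <= pd_value <= 1.0:
        raise ValueError("PD must lie in [0, 1]")
    for band, upper in BAND_UPPER_BOUNDS:
        if pd_value <= upper:
            return band
    return "E"


def recommend_decision(band: str, affordable: bool) -> str:
    """Toy policy: decline on failed affordability or band E,
    refer band D for manual review, approve everything else."""
    if not affordable or band == "E":
        return "Decline"
    if band == "D":
        return "Refer"
    return "Approve"


def expected_loss(pd_value: float, lgd: float, ead: float) -> float:
    """Expected loss as the product PD x LGD x EAD."""
    return pd_value * lgd * ead


if __name__ == "__main__":
    # Illustrative inputs only: PD from the overview example (3.2%),
    # with an assumed LGD of 45% and an assumed exposure of 10,000.
    pd_value, lgd, ead = 0.032, 0.45, 10_000
    band = pd_to_band(pd_value)
    print(band, recommend_decision(band, affordable=True))
    print(round(expected_loss(pd_value, lgd, ead), 2))
```

With these assumed cut-offs, a PD of 3.2% falls into band C and the expected loss on an exposure of 10,000 at 45% LGD is about 144 currency units. Keeping the banding, policy, and EL steps as separate functions mirrors the modular design described above, since each institution can swap in its own thresholds and rules.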


SECTION A: HOME CREDIT DEFAULT RISK SOLUTION 

In [2]:
# Import dependencies
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


In [12]:
# Output location for generated artefacts
DATA_DIR = "dataset_selected"
OUTPUT_DIR = os.path.join(DATA_DIR, "output")  # subfolder for generated outputs
os.makedirs(OUTPUT_DIR, exist_ok=True)  # create the folder if it does not exist



# Load the main application table (training set) and preview the first rows
train_df = pd.read_csv("../home_credit_data/application_train.csv")
train_df.head()

Unnamed: 0,SK_ID_CURR,TARGET,NAME_CONTRACT_TYPE,CODE_GENDER,FLAG_OWN_CAR,FLAG_OWN_REALTY,CNT_CHILDREN,AMT_INCOME_TOTAL,AMT_CREDIT,AMT_ANNUITY,...,FLAG_DOCUMENT_18,FLAG_DOCUMENT_19,FLAG_DOCUMENT_20,FLAG_DOCUMENT_21,AMT_REQ_CREDIT_BUREAU_HOUR,AMT_REQ_CREDIT_BUREAU_DAY,AMT_REQ_CREDIT_BUREAU_WEEK,AMT_REQ_CREDIT_BUREAU_MON,AMT_REQ_CREDIT_BUREAU_QRT,AMT_REQ_CREDIT_BUREAU_YEAR
0,100002,1,Cash loans,M,N,Y,0,202500.0,406597.5,24700.5,...,0,0,0,0,0.0,0.0,0.0,0.0,0.0,1.0
1,100003,0,Cash loans,F,N,N,0,270000.0,1293502.5,35698.5,...,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0
2,100004,0,Revolving loans,M,Y,Y,0,67500.0,135000.0,6750.0,...,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0
3,100006,0,Cash loans,F,N,Y,0,135000.0,312682.5,29686.5,...,0,0,0,0,,,,,,
4,100007,0,Cash loans,M,N,Y,0,121500.0,513000.0,21865.5,...,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0


In [14]:
# Load the remaining Home Credit files
# Base folder containing the Home Credit CSV files
BASE_PATH = "../home_credit_data"

# Load all Home Credit datasets
application_train_df     = pd.read_csv(f"{BASE_PATH}/application_train.csv")
application_test_df      = pd.read_csv(f"{BASE_PATH}/application_test.csv")
bureau_df                = pd.read_csv(f"{BASE_PATH}/bureau.csv")
bureau_balance_df        = pd.read_csv(f"{BASE_PATH}/bureau_balance.csv")
credit_card_balance_df   = pd.read_csv(f"{BASE_PATH}/credit_card_balance.csv")
#homecredit_desc_df       = pd.read_csv(f"{BASE_PATH}/HomeCredit_columns_description.csv")
installments_payments_df = pd.read_csv(f"{BASE_PATH}/installments_payments.csv")
pos_cash_balance_df      = pd.read_csv(f"{BASE_PATH}/POS_CASH_balance.csv")
previous_application_df  = pd.read_csv(f"{BASE_PATH}/previous_application.csv")
sample_submission_df     = pd.read_csv(f"{BASE_PATH}/sample_submission.csv")
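
Because the cell above reads nine CSV files in sequence, a single missing file only surfaces partway through a slow load. A minimal sketch of a pre-flight check follows; the helper `find_missing_files` and the `EXPECTED_FILES` list are hypothetical names introduced here, with the file names copied from the loading cell.

```python
from pathlib import Path

# File names taken from the loading cell; the helper itself is a
# hypothetical convenience, not part of the original notebook.
EXPECTED_FILES = [
    "application_train.csv",
    "application_test.csv",
    "bureau.csv",
    "bureau_balance.csv",
    "credit_card_balance.csv",
    "installments_payments.csv",
    "POS_CASH_balance.csv",
    "previous_application.csv",
    "sample_submission.csv",
]


def find_missing_files(base_path, expected=EXPECTED_FILES):
    """Return the expected file names that do not exist under base_path."""
    base = Path(base_path)
    return [name for name in expected if not (base / name).is_file()]


if __name__ == "__main__":
    # Assumes the same relative folder as BASE_PATH in the cell above.
    missing = find_missing_files("../home_credit_data")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```

Running the check before the `pd.read_csv` calls gives one clear message listing every absent file, instead of a `FileNotFoundError` for the first one only.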