#### PROJECT OVERVIEW
Credit Risk Prediction Platform — Tech Reyal Ltd
Tech Reyal Ltd is building a production-ready Credit Risk Prediction Application for fintechs, lenders, credit unions, and BNPL providers to make faster, safer, and more consistent lending decisions. The platform combines probability-of-default modelling, scorecard-style risk bands, policy decisioning, and model explainability to deliver underwriting outcomes that are interpretable, auditable, and operationally usable.
At its core, the solution takes applicant/account data (e.g., affordability, credit behaviour, utilisation, repayment performance, exposure attributes) and produces a compact set of decision outputs for both individual decisions and portfolio-level risk management. The design is modular, allowing institutions to plug in their own data sources, policies, and risk appetite while maintaining transparent governance for model risk and regulatory compliance.

What the Platform Produces Per Applicant / Account
1) Probability of Default (PD)
The model estimates the likelihood that an applicant will default within a defined horizon (e.g., 12 months).
Example output: PD = 3.2%
Used for: approve/decline decisions, risk-based pricing, credit limit setting, and portfolio risk estimation.
2) Risk Score / Scorecard Band
Applicants are mapped to an easily interpretable score or banding system.
Example output: Score = 620, or a band from A to E
Used for: consistent underwriting, segmentation, and policy rules (e.g., minimum band thresholds per product).
3) Decision Recommendation (Approve / Refer / Decline)
A policy engine combines PD and risk bands with affordability and business rules to recommend an action.
Example output: Decision = Refer
Used for: reducing manual review workload, improving speed and consistency, and ensuring standardised decisioning across channels.
4) Expected Loss (EL) and related metrics
The platform computes risk metrics aligned to credit risk frameworks such as IFRS 9:
EL = PD × LGD × EAD (optionally supporting scenario-based stress adjustments).
Used for: provisioning, capital planning, and portfolio strategy.
5) Pricing Inputs (Risk-based pricing)
The service generates pricing guidance from PD/EL, such as a risk premium or recommended APR range.
Example output: Recommended APR uplift = +2.1%
Used for: balancing growth and profitability, reducing adverse selection, and maintaining competitive pricing.
6) Early Warning Signals / Monitoring Flags
For existing accounts, the system identifies deterioration trends and triggers watchlist flags.
Example outputs: “Risk rising”, “Payment stress”, “Utilisation spike”
Used for: proactive collections, targeted customer support, and loss prevention.
7) Explainability / Reason Codes
Each prediction and decision is accompanied by interpretable “reason codes” describing key drivers.
Example outputs: “High utilisation”, “Thin credit file”, “Recent missed payments”
Used for: transparency, regulatory compliance, customer communication, and appeals handling.
8) Portfolio Insights & Stress Testing
Beyond single decisions, the platform aggregates results to show risk concentration and scenario projections.
Example outputs: risk distribution by segment, projected defaults under adverse scenarios, changes in EL under stress.
Used for: risk appetite setting, downturn planning, and strategic portfolio adjustments.
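
The per-applicant outputs above can be illustrated with a small sketch that chains PD, band, decision, and EL = PD × LGD × EAD. Everything specific here is an assumption made for illustration: the band cut-offs, the toy policy rules, the LGD and EAD values, and the names `BAND_UPPER_PD`, `pd_to_band`, `recommend`, and `assess` are hypothetical and are not the platform's actual configuration or API.

```python
from dataclasses import dataclass

# Hypothetical band cut-offs (upper PD bound per band). Real cut-offs would
# come from scorecard calibration and risk appetite, not from this sketch.
BAND_UPPER_PD = [("A", 0.01), ("B", 0.03), ("C", 0.06), ("D", 0.12), ("E", 1.00)]


@dataclass
class ApplicantResult:
    pd_12m: float         # probability of default over a 12-month horizon
    band: str             # risk band A (lowest risk) to E (highest risk)
    decision: str         # Approve / Refer / Decline
    expected_loss: float  # EL = PD x LGD x EAD, in currency units


def pd_to_band(pd_12m: float) -> str:
    """Return the first band whose upper PD bound is not exceeded."""
    for band, upper in BAND_UPPER_PD:
        if pd_12m <= upper:
            return band
    raise ValueError("PD must lie in [0, 1]")


def recommend(band: str, affordable: bool) -> str:
    """Toy policy: decline on failed affordability or band E, refer band D."""
    if not affordable or band == "E":
        return "Decline"
    if band == "D":
        return "Refer"
    return "Approve"


def assess(pd_12m: float, lgd: float, ead: float, affordable: bool) -> ApplicantResult:
    """Combine the individual outputs into one result record."""
    if not 0.0 <= pd_12m <= 1.0:
        raise ValueError("PD must lie in [0, 1]")
    band = pd_to_band(pd_12m)
    return ApplicantResult(
        pd_12m=pd_12m,
        band=band,
        decision=recommend(band, affordable),
        expected_loss=pd_12m * lgd * ead,
    )


# PD = 3.2% as in the example output above; LGD and EAD are made-up inputs.
result = assess(pd_12m=0.032, lgd=0.45, ead=10_000.0, affordable=True)
print(result)
```

With these illustrative inputs the applicant lands in band C, is approved by the toy policy, and carries an expected loss of roughly 144 currency units (0.032 × 0.45 × 10,000).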


SECTION A: HOME CREDIT DEFAULT RISK SOLUTION 

In [3]:
# Import dependencies
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


In [4]:
# Outputs (dataset directories)
DATA_DIR = "dataset_selected" 
OUTPUT_DIR = os.path.join(DATA_DIR, "output")  # a subfolder for generated outputs
os.makedirs(OUTPUT_DIR, exist_ok=True)  # create folder if not already present



# Load the main application table (training set) and preview the first rows
train_df = pd.read_csv("../home_credit_data/application_train.csv")
train_df.head()

Unnamed: 0,SK_ID_CURR,TARGET,NAME_CONTRACT_TYPE,CODE_GENDER,FLAG_OWN_CAR,FLAG_OWN_REALTY,CNT_CHILDREN,AMT_INCOME_TOTAL,AMT_CREDIT,AMT_ANNUITY,AMT_GOODS_PRICE,NAME_TYPE_SUITE,NAME_INCOME_TYPE,NAME_EDUCATION_TYPE,NAME_FAMILY_STATUS,NAME_HOUSING_TYPE,REGION_POPULATION_RELATIVE,DAYS_BIRTH,DAYS_EMPLOYED,DAYS_REGISTRATION,DAYS_ID_PUBLISH,OWN_CAR_AGE,FLAG_MOBIL,FLAG_EMP_PHONE,FLAG_WORK_PHONE,FLAG_CONT_MOBILE,FLAG_PHONE,FLAG_EMAIL,OCCUPATION_TYPE,CNT_FAM_MEMBERS,REGION_RATING_CLIENT,REGION_RATING_CLIENT_W_CITY,WEEKDAY_APPR_PROCESS_START,HOUR_APPR_PROCESS_START,REG_REGION_NOT_LIVE_REGION,REG_REGION_NOT_WORK_REGION,LIVE_REGION_NOT_WORK_REGION,REG_CITY_NOT_LIVE_CITY,REG_CITY_NOT_WORK_CITY,LIVE_CITY_NOT_WORK_CITY,...,LIVINGAPARTMENTS_MEDI,LIVINGAREA_MEDI,NONLIVINGAPARTMENTS_MEDI,NONLIVINGAREA_MEDI,FONDKAPREMONT_MODE,HOUSETYPE_MODE,TOTALAREA_MODE,WALLSMATERIAL_MODE,EMERGENCYSTATE_MODE,OBS_30_CNT_SOCIAL_CIRCLE,DEF_30_CNT_SOCIAL_CIRCLE,OBS_60_CNT_SOCIAL_CIRCLE,DEF_60_CNT_SOCIAL_CIRCLE,DAYS_LAST_PHONE_CHANGE,FLAG_DOCUMENT_2,FLAG_DOCUMENT_3,FLAG_DOCUMENT_4,FLAG_DOCUMENT_5,FLAG_DOCUMENT_6,FLAG_DOCUMENT_7,FLAG_DOCUMENT_8,FLAG_DOCUMENT_9,FLAG_DOCUMENT_10,FLAG_DOCUMENT_11,FLAG_DOCUMENT_12,FLAG_DOCUMENT_13,FLAG_DOCUMENT_14,FLAG_DOCUMENT_15,FLAG_DOCUMENT_16,FLAG_DOCUMENT_17,FLAG_DOCUMENT_18,FLAG_DOCUMENT_19,FLAG_DOCUMENT_20,FLAG_DOCUMENT_21,AMT_REQ_CREDIT_BUREAU_HOUR,AMT_REQ_CREDIT_BUREAU_DAY,AMT_REQ_CREDIT_BUREAU_WEEK,AMT_REQ_CREDIT_BUREAU_MON,AMT_REQ_CREDIT_BUREAU_QRT,AMT_REQ_CREDIT_BUREAU_YEAR
0,100002,1,Cash loans,M,N,Y,0,202500.0,406597.5,24700.5,351000.0,Unaccompanied,Working,Secondary / secondary special,Single / not married,House / apartment,0.018801,-9461,-637,-3648.0,-2120,,1,1,0,1,1,0,Laborers,1.0,2,2,WEDNESDAY,10,0,0,0,0,0,0,...,0.0205,0.0193,0.0,0.0,reg oper account,block of flats,0.0149,"Stone, brick",No,2.0,2.0,2.0,2.0,-1134.0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,1.0
1,100003,0,Cash loans,F,N,N,0,270000.0,1293502.5,35698.5,1129500.0,Family,State servant,Higher education,Married,House / apartment,0.003541,-16765,-1188,-1186.0,-291,,1,1,0,1,1,0,Core staff,2.0,1,1,MONDAY,11,0,0,0,0,0,0,...,0.0787,0.0558,0.0039,0.01,reg oper account,block of flats,0.0714,Block,No,1.0,0.0,1.0,0.0,-828.0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0
2,100004,0,Revolving loans,M,Y,Y,0,67500.0,135000.0,6750.0,135000.0,Unaccompanied,Working,Secondary / secondary special,Single / not married,House / apartment,0.010032,-19046,-225,-4260.0,-2531,26.0,1,1,1,1,1,0,Laborers,1.0,2,2,MONDAY,9,0,0,0,0,0,0,...,,,,,,,,,,0.0,0.0,0.0,0.0,-815.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0
3,100006,0,Cash loans,F,N,Y,0,135000.0,312682.5,29686.5,297000.0,Unaccompanied,Working,Secondary / secondary special,Civil marriage,House / apartment,0.008019,-19005,-3039,-9833.0,-2437,,1,1,0,1,0,0,Laborers,2.0,2,2,WEDNESDAY,17,0,0,0,0,0,0,...,,,,,,,,,,2.0,0.0,2.0,0.0,-617.0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,,,,,
4,100007,0,Cash loans,M,N,Y,0,121500.0,513000.0,21865.5,513000.0,Unaccompanied,Working,Secondary / secondary special,Single / not married,House / apartment,0.028663,-19932,-3038,-4311.0,-3458,,1,1,0,1,0,0,Core staff,1.0,2,2,THURSDAY,11,0,0,0,0,1,1,...,,,,,,,,,,0.0,0.0,0.0,0.0,-1106.0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0
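
A note on reading the preview: in the Home Credit data dictionary the `DAYS_*` columns count days relative to the application date, which is why the values are negative. The sketch below converts them to positive years for the first three preview rows. The `DAYS_PER_YEAR` constant and the derived column names `AGE_YEARS` and `YEARS_EMPLOYED` are assumptions introduced here for illustration, not columns of the dataset.

```python
import pandas as pd

# Values copied from the first three preview rows above.
preview = pd.DataFrame({
    "SK_ID_CURR": [100002, 100003, 100004],
    "DAYS_BIRTH": [-9461, -16765, -19046],
    "DAYS_EMPLOYED": [-637, -1188, -225],
})

DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years

# Flip the sign so that "days before application" become positive durations.
preview["AGE_YEARS"] = (-preview["DAYS_BIRTH"] / DAYS_PER_YEAR).round(1)
preview["YEARS_EMPLOYED"] = (-preview["DAYS_EMPLOYED"] / DAYS_PER_YEAR).round(1)

print(preview[["SK_ID_CURR", "AGE_YEARS", "YEARS_EMPLOYED"]])
```

For example, `DAYS_BIRTH = -9461` corresponds to an applicant aged about 25.9 years at application.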


In [5]:
# Home Credit dataset cell: load the remaining tables
# Base folder that holds the raw Home Credit CSV files
BASE_PATH = "../home_credit_data"

# Load all Home Credit datasets
application_train_df     = pd.read_csv(f"{BASE_PATH}/application_train.csv")
application_test_df      = pd.read_csv(f"{BASE_PATH}/application_test.csv")
bureau_df                = pd.read_csv(f"{BASE_PATH}/bureau.csv")
bureau_balance_df        = pd.read_csv(f"{BASE_PATH}/bureau_balance.csv")
credit_card_balance_df   = pd.read_csv(f"{BASE_PATH}/credit_card_balance.csv")
#homecredit_desc_df       = pd.read_csv(f"{BASE_PATH}/HomeCredit_columns_description.csv")
installments_payments_df = pd.read_csv(f"{BASE_PATH}/installments_payments.csv")
pos_cash_balance_df      = pd.read_csv(f"{BASE_PATH}/POS_CASH_balance.csv")
previous_application_df  = pd.read_csv(f"{BASE_PATH}/previous_application.csv")
sample_submission_df     = pd.read_csv(f"{BASE_PATH}/sample_submission.csv")

In [1]:
from ydata_profiling import ProfileReport

In [8]:
# Profile a random sample of the training data to keep report generation fast.
# Use the full dataset instead if run time is not a constraint.
sample_df = application_train_df.sample(n=1000, random_state=42)  # 1,000 rows for profiling
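
A plain random sample of 1,000 rows may shift the share of defaults slightly when the positive class is rare. As an alternative, the sketch below samples each `TARGET` class proportionally so the sample keeps the original default rate. The helper name `stratified_sample` and the toy data are hypothetical; this is one possible approach under those assumptions, not the method used in the cell above.

```python
import pandas as pd


def stratified_sample(df: pd.DataFrame, label_col: str, n: int, random_state: int = 42) -> pd.DataFrame:
    """Draw about n rows while keeping the class proportions of label_col."""
    frac = n / len(df)
    if frac >= 1:
        return df.copy()
    parts = [
        # Sample each class separately, keeping at least one row per class.
        group.sample(n=max(1, round(len(group) * frac)), random_state=random_state)
        for _, group in df.groupby(label_col)
    ]
    # Shuffle so the classes are not stored in contiguous blocks.
    return pd.concat(parts).sample(frac=1.0, random_state=random_state)


# Toy stand-in for application_train_df: 1,000 rows with an 8% default rate.
toy = pd.DataFrame({
    "SK_ID_CURR": range(1000),
    "TARGET": [1] * 80 + [0] * 920,
})
sample = stratified_sample(toy, "TARGET", n=100)
print(len(sample), sample["TARGET"].mean())
```

On the real data this would be called as `stratified_sample(application_train_df, "TARGET", n=1000)`. Because of per-class rounding, the result can differ from `n` by a row or two.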

In [None]:
# For a faster report, disable the more time-consuming analyses. These settings favour speed while keeping the basic per-column statistics.
profile_sample_train = ProfileReport(
    sample_df,
    title="Application Train Profile (Fast)",
    explorative=False,   # skip the extra exploratory analyses
    minimal=True,        # turn off the most expensive computations
    correlations={
        "pearson": {"calculate": False},
        "spearman": {"calculate": False},
        "kendall": {"calculate": False},
        "phi_k": {"calculate": False},
        "cramers": {"calculate": False},
    },
    interactions={"continuous": False},
    duplicates={"head": 0},
    missing_diagrams={"matrix": False, "heatmap": False, "dendrogram": False},
    progress_bar=True
)

profile_sample_train.to_notebook_iframe()




In [None]:

# Run ydata profiling on the full datasets (slow; not recommended for large tables)
# Profile report for the application train dataset
profile = ProfileReport(application_train_df, title="Application Train Data Profiling Report", explorative=True)
profile.to_file(os.path.join(OUTPUT_DIR, "application_train_profile_report.html"))
profile.to_notebook_iframe()

# Profile reports for the remaining Home Credit tables loaded above
# Profile report for the application test dataset
profile_test = ProfileReport(application_test_df, title="Application Test Data Profiling Report", explorative=True)
profile_test.to_file(os.path.join(OUTPUT_DIR, "application_test_profile_report.html"))

# Profile report for the bureau dataset
profile_bureau = ProfileReport(bureau_df, title="Bureau Data Profiling Report", explorative=True)
profile_bureau.to_file(os.path.join(OUTPUT_DIR, "bureau_profile_report.html"))

# Profile report for the bureau balance dataset
profile_bureau_balance = ProfileReport(bureau_balance_df, title="Bureau Balance Data Profiling Report", explorative=True)
profile_bureau_balance.to_file(os.path.join(OUTPUT_DIR, "bureau_balance_profile_report.html"))

# Profile report for the credit card balance dataset
profile_credit_card_balance = ProfileReport(credit_card_balance_df, title="Credit Card Balance Data Profiling Report", explorative=True)
profile_credit_card_balance.to_file(os.path.join(OUTPUT_DIR, "credit_card_balance_profile_report.html"))

# Profile report for the installments payments dataset
profile_installments_payments = ProfileReport(installments_payments_df, title="Installments Payments Data Profiling Report", explorative=True)
profile_installments_payments.to_file(os.path.join(OUTPUT_DIR, "installments_payments_profile_report.html"))

# Profile report for the POS cash balance dataset
profile_pos_cash_balance = ProfileReport(pos_cash_balance_df, title="POS Cash Balance Data Profiling Report", explorative=True)
profile_pos_cash_balance.to_file(os.path.join(OUTPUT_DIR, "pos_cash_balance_profile_report.html"))

# Profile report for the previous application dataset
profile_previous_application = ProfileReport(previous_application_df, title="Previous Application Data Profiling Report", explorative=True)
profile_previous_application.to_file(os.path.join(OUTPUT_DIR, "previous_application_profile_report.html"))
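
Full explorative reports on the larger tables can take a long time to build. As a lighter first pass, a pandas-only summary of dtype, missing share, and cardinality per column can be produced before deciding which tables deserve a full report. This is a swapped-in technique, not a ydata profiling feature, and the helper name `quick_profile` is hypothetical. The toy values are taken from the first four rows of the preview above.

```python
import pandas as pd


def quick_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, percentage of missing values, and distinct-value count."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing_pct": (df.isna().mean() * 100).round(2),
        "n_unique": df.nunique(dropna=True),
    })
    # Columns with the most missing data first, as they usually need attention.
    return summary.sort_values("missing_pct", ascending=False)


# Toy frame built from values in the application_train preview.
toy = pd.DataFrame({
    "SK_ID_CURR": [100002, 100003, 100004, 100006],
    "OWN_CAR_AGE": [None, None, 26.0, None],
    "CODE_GENDER": ["M", "F", "M", "F"],
})
report = quick_profile(toy)
print(report)
```

On the real data the same helper could be applied to each loaded frame, for example `quick_profile(bureau_df)`.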

      


