#### PD Model Monitoring
- A year has passed since I built the Probability of Default (PD), Loss Given Default (LGD) and Exposure at Default (EAD) models, estimated the Expected Loss (EL) of the loans and designed the credit policy. Thus, it is necessary to apply model monitoring.

- **PD Model Monitoring:**
    - Over that year, the pool of loan applicants may have changed. Although a large shift is unlikely, the people applying for loans now might differ markedly from those we used to train our PD model, so we need to reassess whether the model still works well.
    - If the population of the new applicants is too different from the population we used to build the model, the results may be disastrous. In such cases, we need to redevelop the model.
- **Model Maintenance:**
    - The process of assessing the model in light of new data.
    - We do this every six months or every year, for example.
    - Alternatively, we can use the number of new applicants as the trigger. For example, we can redevelop our model after 50,000 or 100,000 new observations.
- **Population Stability Index (PSI):**
    - PSI is used to identify if the characteristics of the new data significantly differ from the original data, potentially indicating the need for model reevaluation or redevelopment. 
    - First population: The original population we used to train our model.
    - Second population: All the new data we get.
    - The idea is to slice a feature (continuous or discrete) into categories (fine classing or coarse classing), then compare how the two populations are distributed across these categories. Here, the **original** population is called **actual**, while the **new** data is called **expected.** Some references assign these labels the other way round; because the formula is symmetric in the two populations, the PSI value is the same either way.
    - The formula for PSI is defined as: $$ {PSI} = \sum_{i=1}^{k} (\% \text{Actual}_{i} - \% \text{Expected}_{i}) \times \ln\left(\frac{\% \text{Actual}_{i}}{\% \text{Expected}_{i}}\right) $$
        - $ \% \text{Actual}_{i} $ is the proportion of observations in the original population (the one used to train the model) that fall into category $ i $ of the variable in question. The index $ i $ runs from 1 to $ k $, the number of categories obtained when the variable is sliced (e.g., by fine classing or coarse classing).
        - $ \% \text{Expected}_{i} $ is the proportion of observations in the new population (the one whose stability we are assessing) that fall into the same category $ i $.
    - We interpret the PSI values as follows:
        - PSI = 0: The actual (original data) and expected (new data) populations have identical distributions.
        - PSI < 0.1: Little to no difference between the actual (original data) and expected (new data) populations.
        - 0.1 ≤ PSI < 0.25: There is a slight difference between the actual (original data) and expected (new data) populations. No action is taken.
        - PSI ≥ 0.25: There is a substantial difference between the actual (original data) and expected (new data) populations. Action is taken.
        - PSI is not capped at 1: it has no upper bound and keeps growing as the two distributions diverge. If a category is empty in one population but not in the other, the logarithm is undefined and the PSI becomes infinite; a common workaround is to merge such a category with a neighbor or to assign it a small floor proportion.
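    - As a quick sanity check of the formula, consider a hypothetical feature with three categories (illustrative numbers, not taken from the loan data): the original population is split 50% / 30% / 20% and the new population 40% / 30% / 30%. Then: $$ {PSI} = (0.5 - 0.4) \times \ln\left(\frac{0.5}{0.4}\right) + (0.3 - 0.3) \times \ln\left(\frac{0.3}{0.3}\right) + (0.2 - 0.3) \times \ln\left(\frac{0.2}{0.3}\right) \approx 0.0223 + 0 + 0.0405 \approx 0.063 $$
        - Both non-zero terms are positive, because the difference and the logarithm always share the same sign. A shift in either direction therefore adds to the PSI.
        - The result is below 0.1, so this hypothetical shift would count as little to no difference.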

- With this in mind, **I will now load new data**, specifically **loans from 2015 onward**, and **compare** it with the **original data** by computing the **PSI.** For each independent and dependent variable, I compare how the actual (original data) and expected (new data) populations are **distributed across its categories.** If the **PSI** indicates a **significant difference** between these **populations**, suggesting a shift in applicant characteristics, it will signal the **need** to **redevelop the PD Model.**
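
The per-variable comparison described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this project: the function names (`category_shares`, `psi`, `psi_report`), the toy columns (`grade`, `home_ownership`) and their values are all hypothetical, the features are assumed to be already binned into categories, and the small floor `eps` for empty categories is an added convention that the PSI definition itself does not prescribe.

```python
import math
from collections import Counter


def category_shares(values):
    """Return {category: share of observations} for one binned feature."""
    counts = Counter(values)
    total = len(values)
    return {category: n / total for category, n in counts.items()}


def psi(original_values, new_values, eps=1e-6):
    """Population Stability Index of one already-binned feature."""
    actual = category_shares(original_values)   # original (training) data
    expected = category_shares(new_values)      # new data
    total = 0.0
    # Walk over every category seen in either population.
    for category in sorted(set(actual) | set(expected)):
        # Assumption: floor empty categories at eps so the log stays finite.
        a = max(actual.get(category, 0.0), eps)
        e = max(expected.get(category, 0.0), eps)
        total += (a - e) * math.log(a / e)
    return total


def psi_report(original, new):
    """PSI for every feature present in both datasets (dicts of name -> list)."""
    return {name: psi(original[name], new[name]) for name in original if name in new}


# Hypothetical toy data standing in for the training loans and the newer loans.
original = {
    "grade": ["A"] * 50 + ["B"] * 30 + ["C"] * 20,
    "home_ownership": ["RENT"] * 60 + ["OWN"] * 40,
}
new = {
    "grade": ["A"] * 40 + ["B"] * 30 + ["C"] * 30,
    "home_ownership": ["RENT"] * 60 + ["OWN"] * 40,
}

report = psi_report(original, new)
for name, value in report.items():
    print(f"{name}: PSI = {value:.4f}")
# grade: PSI = 0.0629
# home_ownership: PSI = 0.0000
```

In this toy case the `grade` distribution has shifted slightly but stays below 0.1, while `home_ownership` is unchanged. Reporting one PSI per feature, rather than a single pooled number, makes it easier to see which characteristics of the applicant population have moved.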