https://medium.com/@juanmi.gutierrez/quantile-mapping-bias-correction-63ed01d5a618

In [9]:
import numpy as np
from scipy.stats import norm, gamma, erlang, expon, percentileofscore

In [10]:
# Data
u1, s1 = 100, 20   # mean and standard deviation of the reference distribution
bias = 20          # mean offset subtracted from the true quantiles to simulate model bias
bias_d = 2         # shift of the mean in the future distribution
n = 100            # size of the reference sample
q_ = 20            # range(1, q_) yields 19 quantile levels
dist1 = norm(loc=u1, scale=s1)
dist2 = norm(loc=u1+bias_d, scale=s1)

# Quantile levels 0.05, 0.10, ..., 0.95 at which the distributions are compared
quantiles = [round(x*0.05,2) for x in range(1,q_)]

# Theoretical quantile values of dist1 and dist2 at those levels
q_dist1 = [dist1.ppf(x*0.05) for x in range(1,q_)]
q_dist2 = [dist2.ppf(x*0.05) for x in range(1,q_)]

# Reference ("observed") sample drawn from the true distribution
ref_dataset = np.random.normal(u1,s1,n).round(2)
# Present-day model: true quantiles minus a noisy offset of about `bias`,
# so the model systematically underestimates the true quantile values
model_present = (q_dist1 - np.random.normal(bias,2,q_-1)).round(2)
# Future model: shifted distribution with the same bias
model_future = (q_dist2 - np.random.normal(bias,2,q_-1)).round(2)

The eQM delta-proportion method assumes that the proportional difference between the modelled (downscaled) value and the observed value over the period with observations is a systematic bias, and that the same bias applies to the future period.
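
One way to write this assumption in symbols (the notation is introduced here for illustration and does not come from the original post): let $O(p)$ be the observed value at percentile $p$, and $M_{\text{pres}}(p)$ and $M_{\text{fut}}(p)$ the model values for the present and future periods at the same percentile. Assuming $M_{\text{pres}}(p) \neq 0$:

$$
\delta(p) = \frac{O(p) - M_{\text{pres}}(p)}{M_{\text{pres}}(p)}, \qquad
\hat{M}_{\text{fut}}(p) = M_{\text{fut}}(p)\,\bigl(1 + \delta(p)\bigr) = M_{\text{fut}}(p)\,\frac{O(p)}{M_{\text{pres}}(p)}
$$

Applying the same factor to the present value gives $M_{\text{pres}}(p)\,(1 + \delta(p)) = O(p)$, so the corrected present-period values coincide with the observed quantiles by construction.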

In [11]:
def eQM_porcentual_delta(ref_dataset, model_present, model_future):
    """
    Remove the bias at each quantile value. The relative difference between
    ref_dataset and model_present at each percentile is treated as a systematic
    bias (delta) and applied as a scaling factor to model_present and to
    model_future at the same index.

    Assumes model_present and model_future have the same length and are
    aligned by quantile, and that model_present contains no zeros.

    returns: bias-corrected model_present and model_future
    """

    model_present_corrected = np.zeros(model_present.size)
    model_future_corrected = np.zeros(model_future.size)

    for ival, model_value in enumerate(model_present):  # index and value of each item in model_present
        # Percentile rank (0-100) of model_value within model_present
        percentile = percentileofscore(model_present, model_value)
        # Value of ref_dataset at that same percentile
        percentile_ref = np.percentile(ref_dataset, percentile)
        # Relative error (bias) of the model at this percentile
        dif = (percentile_ref - model_value)/model_value
        model_present_corrected[ival] = model_value*(1+dif)
        # dif < 0 scales the future value down; dif > 0 scales it up
        model_future_corrected[ival] = model_future[ival]*(1+dif)

    return model_present_corrected, model_future_corrected

In [12]:
model_present_corrected, model_future_corrected = eQM_porcentual_delta(ref_dataset, model_present, model_future)

To judge how well `eQM_porcentual_delta` (empirical quantile mapping with a percentage delta correction) works, and to follow the analysis in the cells below, it helps to spell out what the function aims to achieve and how the bias correction is evaluated:

### Objective of eQM_porcentual_delta

The primary goal of `eQM_porcentual_delta` is to correct biases in predictive models by aligning their predictions closer to the actual observed data. It does this by:
- Identifying the systematic bias at each quantile between the model predictions and the actual data.
- Correcting this bias for each quantile prediction in both the present and future model predictions.

### How Bias Correction is Performed

For each value in `model_present`, the function finds its percentile rank within `model_present`, looks up the value of `ref_dataset` at that same percentile, and computes the relative (percentage) difference between the two. This difference is treated as a systematic bias. The function then rescales both the present value and the future value at the same index by one plus that difference.
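
As a worked example of that arithmetic at a single percentile, here is a minimal sketch. The three input numbers are hypothetical and chosen only to keep the calculation easy to follow; they are not taken from the notebook's data.

```python
# Hypothetical values at one percentile (illustration only).
ref_value = 100.0      # observed value at this percentile
present_value = 80.0   # model value for the present period
future_value = 82.0    # model value for the future period

# Relative bias of the model: (observed - modelled) / modelled
dif = (ref_value - present_value) / present_value

# The same scaling factor (1 + dif) is applied to both periods
present_corrected = present_value * (1 + dif)
future_corrected = future_value * (1 + dif)

print(dif, present_corrected, future_corrected)  # 0.25 100.0 102.5
```

The corrected present value lands exactly on the observed value, while the future value keeps its own signal (it was 2 units above the present value) scaled by the same factor.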

### Evaluation of Effectiveness

The correction is evaluated by comparing, before and after correction, the theoretical quantile levels that the model values are meant to represent with the empirical quantiles those values actually occupy in `ref_dataset`. The steps include:

1. **Calculating Empirical Quantiles:**
   - For each predicted quantile in the model (both original and corrected), count how many values in the `ref_dataset` are less than or equal to this predicted value.
   - The empirical quantile is the proportion of the reference dataset that falls at or below the predicted quantile value.

2. **Comparing Empirical and Theoretical Quantiles:**
   - Theoretical quantiles are the expected positions of each quantile based on the model. For example, if the model predicts the 20th percentile, the theoretical quantile is 0.2.
   - The empirical quantile is derived from the actual data (how data aligns with the model's prediction).
   - Differences between theoretical and empirical quantiles are calculated to assess how well the model predictions align with the actual data.

3. **Analyzing the Differences:**
   - These differences are analyzed before and after correction to see if the bias correction method effectively reduces the discrepancy between the model's predictions and the actual data.
   - The differences (theoretical minus empirical) are split into those greater than 0, where the model value sits lower in the reference data than its nominal quantile (underestimation), and those less than or equal to 0, where it sits at or above its nominal quantile (overestimation), to assess the correction's impact on each case separately.
   - The mean of these differences indicates the average bias in quantile position; a smaller absolute value post-correction suggests improved alignment with the actual data.
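
The steps above can be sketched as two small helpers. This is an illustrative version, not the notebook's own code: the function names `empirical_quantiles` and `split_mean_difference` and the toy data are assumptions made for the example, and an empty group is reported as NaN by choice.

```python
import numpy as np

def empirical_quantiles(ref, values):
    """Share of `ref` that is less than or equal to each entry of `values`."""
    ref = np.asarray(ref, dtype=float)
    values = np.asarray(values, dtype=float)
    # Broadcasting compares every reference value with every model value.
    return (ref[:, None] <= values[None, :]).mean(axis=0)

def split_mean_difference(theoretical, empirical):
    """Mean of (theoretical - empirical) over entries <= 0 and over entries > 0.

    A group with no entries is reported as NaN.
    """
    diff = np.asarray(theoretical, dtype=float) - np.asarray(empirical, dtype=float)
    non_positive, positive = diff[diff <= 0], diff[diff > 0]
    mean_non_positive = non_positive.mean() if non_positive.size else np.nan
    mean_positive = positive.mean() if positive.size else np.nan
    return mean_non_positive, mean_positive

# Toy data (hypothetical): ten reference values and three model values
ref = np.arange(1, 11)                # 1, 2, ..., 10
values = np.array([2.0, 5.0, 9.5])
theoretical = [0.25, 0.5, 0.75]       # quantile levels the model values claim to represent

emp = empirical_quantiles(ref, values)            # shares 2/10, 5/10, 9/10
mean_neg, mean_pos = split_mean_difference(theoretical, emp)
print(emp, mean_neg, mean_pos)
```

In the toy data the first model value sits too low (positive difference) and the third too high (negative difference), so both groups are populated.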

### Conclusion

The effectiveness of the `eQM_porcentual_delta` method is determined by its ability to minimize these differences, which indicates a closer match between the model's predictions and the actual observed quantiles. If, after applying the correction, the differences between the theoretical and empirical quantiles are reduced, this suggests that the model's systematic bias at each quantile has been effectively corrected, leading to more accurate predictions. This method is particularly useful for predictive models where accurate quantile estimates are crucial, such as financial risk models or forecasts of weather-related events.
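
To see this criterion at work without random noise, the following is a small deterministic sketch. It is a vectorized variant written for illustration, not the notebook's function: the percentile rank is computed with plain NumPy instead of `percentileofscore` (the two agree only when `present` has no repeated values), and the names `proportional_delta_correction` and `mean_abs_quantile_gap`, as well as the evenly spaced stand-in data, are assumptions.

```python
import numpy as np

def proportional_delta_correction(ref, present, future):
    """Sketch of a proportional-delta correction (illustrative names).

    Assumes `present` and `future` are aligned index by index, that
    `present` is strictly positive and has no repeated values.
    """
    ref = np.asarray(ref, dtype=float)
    present = np.asarray(present, dtype=float)
    future = np.asarray(future, dtype=float)
    # Percentile rank (0-100] of each present value within `present` itself.
    ranks = (present[None, :] <= present[:, None]).sum(axis=1) / present.size * 100
    ref_at_rank = np.percentile(ref, ranks)   # observed value at the same rank
    factor = ref_at_rank / present            # equals 1 + relative bias
    return present * factor, future * factor

# Deterministic stand-in data (hypothetical)
probs = np.arange(1, 20) * 0.05               # quantile levels 0.05 ... 0.95
ref = np.linspace(60.0, 140.0, 101)           # evenly spaced "observations"
present = np.quantile(ref, probs) - 20.0      # model sitting 20 units too low
future = present + 2.0                        # same bias, shifted by 2

present_corr, future_corr = proportional_delta_correction(ref, present, future)

def mean_abs_quantile_gap(values):
    """Mean absolute gap between nominal and empirical quantile levels."""
    empirical = (ref[:, None] <= values[None, :]).mean(axis=0)
    return np.abs(probs - empirical).mean()

gap_before = mean_abs_quantile_gap(present)
gap_after = mean_abs_quantile_gap(present_corr)
print(gap_before, gap_after)
```

With this stand-in data the gap shrinks after correction. A residual gap remains because 19 model values can only take percentile ranks in steps of 100/19, which do not line up exactly with the nominal 5% steps.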

In [15]:
list_qp = []
list_qp_corrected = []

# Empirical quantile of each model value: share of ref_dataset at or below it
for q_p, q_p_corrected in zip(model_present, model_present_corrected):
    count = int(np.sum(ref_dataset <= q_p))
    count_corrected = int(np.sum(ref_dataset <= q_p_corrected))

    list_qp.append(count/n)
    list_qp_corrected.append(count_corrected/n)

# Theoretical minus empirical quantiles before correction, averaged separately
# over the non-positive and the positive entries
# (np.mean of an empty selection returns nan)
dif_before = np.array(quantiles) - np.array(list_qp)
dif_a = np.mean(dif_before[dif_before<=0]), np.mean(dif_before[dif_before>0])

# Same split for the corrected values
dif_after = np.array(quantiles) - np.array(list_qp_corrected)
dif_corrected = np.mean(dif_after[dif_after<=0]), np.mean(dif_after[dif_after>0])

print("EQM Delta proportion")
print("Previous Difference", dif_a)
print("Before    ", list_qp)
print("Theoretical Quantile  ", quantiles)
print("Corrected",list_qp_corrected)
print("Difference with correction",dif_corrected)

EQM Delta proportion
Previous Difference (nan, 0.2905263157894737)
Before     [0.0, 0.0, 0.02, 0.03, 0.06, 0.1, 0.07, 0.1, 0.12, 0.17, 0.15, 0.17, 0.29, 0.23, 0.35, 0.39, 0.51, 0.57, 0.65]
Theoretical Quantile   [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]
Corrected [0.06, 0.11, 0.16, 0.21, 0.27, 0.42, 0.32, 0.37, 0.47, 0.58, 0.53, 0.63, 0.73, 0.68, 0.79, 0.84, 0.89, 0.94, 1.0]
Difference with correction (-0.039999999999999994, 0.02499999999999998)


In [16]:
list_qpf = []
list_qpf_corrected = []

# Empirical quantile of each future model value: share of ref_dataset at or below it
for q_f, q_f2 in zip(model_future, model_future_corrected):
    count = int(np.sum(ref_dataset <= q_f))
    count_corrected = int(np.sum(ref_dataset <= q_f2))

    list_qpf.append(count/n)
    list_qpf_corrected.append(count_corrected/n)

# Theoretical minus empirical quantiles before correction, averaged separately
# over the non-positive and the positive entries
# (np.mean of an empty selection returns nan)
dif_before = np.array(quantiles) - np.array(list_qpf)
dif_a = np.mean(dif_before[dif_before<=0]), np.mean(dif_before[dif_before>0])

# Same split for the corrected future values
dif_after = np.array(quantiles) - np.array(list_qpf_corrected)
dif_corrected = np.mean(dif_after[dif_after<=0]), np.mean(dif_after[dif_after>0])

print("EQM Delta proportion in FUTURE DATA")
print("Previous Difference", dif_a)
print("Before    ", list_qpf)
print("Theoretical Quantile  ", quantiles)
print("Corrected",list_qpf_corrected)
print("Difference with correction",dif_corrected)

EQM Delta proportion in FUTURE DATA
Previous Difference (nan, 0.25789473684210523)
Theoretical Quantile   [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]
Difference with correction (-0.07578947368421053, nan)
