In [1]:
import pandas as pd
import numpy as np

In [2]:
# replace NaN values (e.g. a missing reject_code) with 0; note that fillna(0) applies to every column
claims_df = pd.read_csv('../data/processed/dim_claims_train.csv').fillna(0)
# cast reject_code to int for nicer printing (plain column assignment so the new dtype is kept)
claims_df['reject_code'] = claims_df['reject_code'].astype(int)

date_df = pd.read_csv('../data/processed/dim_date_train.csv')
pa_df = pd.read_csv('../data/processed/dim_pa_train.csv')
bridge_df = pd.read_csv('../data/processed/bridge_train.csv')

## What is the average rate of approved prior authorizations?

In [3]:
# pa_approved is 1 for an approval and 0 for a rejection, so its mean is the approval rate
print(f"{round(100 * pa_df['pa_approved'].mean(), 1)}% of PAs ({pa_df['pa_approved'].count()} records) are approved.")

73.5% of PAs (444682 records) are approved.


## What is the aggregate rate of prior authorization approval, segmented by `correct_diagnosis`, `tried_and_failed`, and `contraindication`?
- A correct diagnosis for the drug prescribed **increases** the approval rate by 3.8 percentage points.
- Trying and failing a generic alternative **increases** the approval rate by 11.1 percentage points.
- A drug with a contraindication **decreases** the approval rate by 24.6 percentage points.

In [4]:
for name in ['correct_diagnosis', 'tried_and_failed', 'contraindication']:
    outcomes = {}
    for outcome, view in pa_df.groupby(name):
        rate = view['pa_approved'].mean()
        print(f"{round(100 * rate, 1)}% of PAs ({view['pa_approved'].count()} records) are approved if the {name} is {bool(outcome)}.")
        outcomes[outcome] = rate
    delta = round(100 * (outcomes[1] - outcomes[0]), 1)
    print(f"If {name} is True, the approval rate changes by {delta}%.")
    print('')

70.4% of PAs (89040 records) are approved if the correct_diagnosis is False.
74.3% of PAs (355642 records) are approved if the correct_diagnosis is True.
If correct_diagnosis is True, the approval rate changes by 3.8%.

67.9% of PAs (221940 records) are approved if the tried_and_failed is False.
79.0% of PAs (222742 records) are approved if the tried_and_failed is True.
If tried_and_failed is True, the approval rate changes by 11.1%.

78.4% of PAs (355368 records) are approved if the contraindication is False.
53.9% of PAs (89314 records) are approved if the contraindication is True.
If contraindication is True, the approval rate changes by -24.6%.
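
The differences printed above are differences between two percentages, i.e. percentage points, which is not the same as a relative change. The sketch below contrasts the two, using the rounded rates from the printed output as inputs. Because those inputs are already rounded, the percentage-point values can differ by 0.1 from the ones the notebook computes on unrounded rates. The `compare` helper and the `rates` dictionary are illustrative names, not part of the original notebook.

```python
# Approval rates (in %) copied from the printed output above.
rates = {
    "correct_diagnosis": {"false": 70.4, "true": 74.3},
    "tried_and_failed": {"false": 67.9, "true": 79.0},
    "contraindication": {"false": 78.4, "true": 53.9},
}


def compare(rate_false, rate_true):
    """Return (percentage-point difference, relative change in %)."""
    points = rate_true - rate_false
    relative = 100 * points / rate_false
    return round(points, 1), round(relative, 1)


summary = {name: compare(r["false"], r["true"]) for name, r in rates.items()}
for name, (points, relative) in summary.items():
    print(f"{name}: {points:+} percentage points, {relative:+}% relative")
```

The relative change is always larger in magnitude here because the reference rate is below 100%.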



## How does the payer (`bin`) influence the average rate of prior authorization approval?
* Payer `999001` approves 90.7% of PAs, the highest rate of any payer, but has the fewest PAs (51344).

In [5]:
merged = bridge_df.merge(claims_df, on='dim_claim_id').merge(pa_df, on='dim_pa_id')

for payer, view in merged.groupby('bin'):
    rate = view['pa_approved'].mean()
    print(f"{round(100 * rate, 1)}% of PAs ({view['pa_approved'].count()} records) are approved with payer of {payer}.")

78.7% of PAs (106801 records) are approved with payer of 417380.
71.1% of PAs (193722 records) are approved with payer of 417614.
62.9% of PAs (92815 records) are approved with payer of 417740.
90.7% of PAs (51344 records) are approved with payer of 999001.


## How does the drug (`drug`) influence the average rate of prior authorization approval?

In [6]:
for drug, view in merged.groupby('drug'):
    rate = view['pa_approved'].mean()
    print(f"{round(100 * rate, 1)}% of PAs ({view['pa_approved'].count()} records) are approved with drug of {drug}.")

76.3% of PAs (230732 records) are approved with drug of A.
75.9% of PAs (123482 records) are approved with drug of B.
63.1% of PAs (90468 records) are approved with drug of C.
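
The per-payer and per-drug loops above follow the same pattern: group, take the mean of `pa_approved`, and count the records. A minimal sketch of a reusable helper is shown below. The function name `approval_table` and the toy data are assumptions made for illustration; only the column names `bin`, `drug`, and `pa_approved` come from the notebook.

```python
import pandas as pd


def approval_table(df, keys):
    """Approval rate (%) and record count for each value (or combination) of `keys`."""
    table = df.groupby(keys)["pa_approved"].agg(["mean", "count"])
    table["mean"] = (100 * table["mean"]).round(1)
    table = table.rename(columns={"mean": "PA Approval %", "count": "# of PA Records"})
    return table.sort_values("PA Approval %", ascending=False)


# Hypothetical toy data standing in for `merged`.
toy = pd.DataFrame({
    "bin": [417380, 417380, 417614, 417614, 417614, 999001],
    "drug": ["A", "B", "A", "A", "C", "C"],
    "pa_approved": [1, 0, 1, 1, 0, 1],
})

by_payer = approval_table(toy, "bin")
by_drug = approval_table(toy, "drug")
by_drug_and_payer = approval_table(toy, ["drug", "bin"])
print(by_payer)
print(by_drug)
```

Passing a list of column names yields the multi-key breakdowns used in the later sections.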


## How does the reject code (`reject_code`) influence the average rate of prior authorization approval?
- 50% of PAs with reject code 70 (drug not covered by plan and not on formulary) are approved.
- 94.8% of PAs with reject code 75 (drug is on formulary but does not have preferred status) are approved.
- 88.4% of PAs with reject code 76 (drug is covered, but plan limits have been exceeded) are approved.

In [7]:
for reject_code, view in merged.groupby('reject_code'):
    rate = view['pa_approved'].mean()
    print(f"{round(100 * rate, 1)}% of PAs ({view['pa_approved'].count()} records) are approved with reject code of {reject_code}.")

50.0% of PAs (201599 records) are approved with reject code of 70.
94.8% of PAs (173935 records) are approved with reject code of 75.
88.4% of PAs (69148 records) are approved with reject code of 76.
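
To keep the meaning of each code next to its numbers, the descriptions from the bullets above can be attached to the summary. This is a small sketch: the rates and counts are copied from the printed output, and `REJECT_CODE_LABELS` and `by_code` are illustrative names rather than objects from the notebook.

```python
import pandas as pd

# Descriptions paraphrased from the bullets above.
REJECT_CODE_LABELS = {
    70: "drug not covered by plan and not on formulary",
    75: "drug on formulary but without preferred status",
    76: "drug covered, but plan limits exceeded",
}

# Approval rates and record counts copied from the printed output above.
by_code = pd.DataFrame(
    {"approval_pct": [50.0, 94.8, 88.4], "n_records": [201599, 173935, 69148]},
    index=pd.Index([70, 75, 76], name="reject_code"),
)
by_code["description"] = by_code.index.map(REJECT_CODE_LABELS)
print(by_code.to_string())
```

The three record counts add up to 444682, the total number of PA records reported at the top of the notebook.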


## How does the combination of drug (`drug`), payer (`bin`), and reject code (`reject_code`) influence the average rate of prior authorization approval?
- Payer `999001` always uses reject code 76 and has a higher approval rate than the other payer that uses reject code 76 for the same drug.
- For each drug, each payer uses a single reject code. Every payer *except* `999001` uses each code for only one drug.
- Approval rates vary across drug-payer combinations.

In [8]:
split = (pd.DataFrame(merged.groupby(['drug', 'bin', 'reject_code'])['pa_approved'].mean()) * 100).round(1)
split.loc[:, '# of PA Records'] = merged.groupby(['drug', 'bin', 'reject_code'])['pa_approved'].count()
split = split.rename(columns={'pa_approved': 'PA Approval %'})
display(split.sort_index(level=[0, 1, 2]))

drug,bin,reject_code,PA Approval %,# of PA Records
A,417380,75,99.0,70844
A,417614,70,58.3,125179
A,417740,76,90.2,8630
A,999001,76,96.3,26079
B,417380,76,90.4,3503
B,417614,75,97.4,62872
B,417740,70,38.9,43966
B,999001,76,92.9,13141
C,417380,70,33.1,32454
C,417614,76,63.6,5671
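
The claims about which payer uses which reject code can be checked programmatically. The sketch below rebuilds the (drug, payer, reject code) combinations exactly as printed in the table above and counts distinct codes per pair and per payer. The variable names are illustrative; on the full data the same checks could be run on `merged` directly.

```python
import pandas as pd

# (drug, bin, reject_code) combinations copied from the table above.
combos = pd.DataFrame(
    [
        ("A", 417380, 75), ("A", 417614, 70), ("A", 417740, 76), ("A", 999001, 76),
        ("B", 417380, 76), ("B", 417614, 75), ("B", 417740, 70), ("B", 999001, 76),
        ("C", 417380, 70), ("C", 417614, 76),
    ],
    columns=["drug", "bin", "reject_code"],
)

# Number of distinct reject codes for each (drug, payer) pair; 1 means a single code.
codes_per_pair = combos.groupby(["drug", "bin"])["reject_code"].nunique()
single_code_per_pair = bool((codes_per_pair == 1).all())

# Distinct reject codes each payer uses across all drugs.
codes_per_payer = combos.groupby("bin")["reject_code"].unique()

print(single_code_per_pair)
print(codes_per_payer)
```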


## How does the combination of contraindication (`contraindication`), generic failure (`tried_and_failed`), and correct diagnosis (`correct_diagnosis`) influence the average rate of prior authorization approval?
- The magnitude of influence of each feature is consistent with the averages above.
- Satisfying all three beneficial conditions (no contraindication, trying and failing a generic alternative, and a correct diagnosis) raises the approval rate by 40.5 percentage points over the baseline in which none is satisfied (43.9% to 84.4%). This is smaller than the 43.3-point sum of the individual increases over that same baseline:
    - **correct diagnosis** results in an **increase of +4.4** points
    - **tried and failed generic alternative** results in an **increase of +12.9** points
    - **not having a contraindication** results in an **increase of +26** points
    
This relationship indicates diminishing returns when all conditions are satisfied.

In [9]:
split = (pd.DataFrame(merged.groupby(['contraindication', 'tried_and_failed', 'correct_diagnosis'])['pa_approved'].mean()) * 100).round(1)
split.loc[:, '# of PA Records'] = merged.groupby(['contraindication', 'tried_and_failed', 'correct_diagnosis'])['pa_approved'].count()
split = split.rename(columns={'pa_approved': 'PA Approval %'})

display(split.sort_values('PA Approval %', ascending=False))

contraindication,tried_and_failed,correct_diagnosis,PA Approval %,# of PA Records
0,1,1,84.4,142304
0,1,0,81.1,35687
0,0,1,73.9,141959
0,0,0,69.9,35418
1,1,1,61.1,35799
1,1,0,56.8,8952
1,0,1,48.3,35580
1,0,0,43.9,8983
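
The diminishing-returns observation can be reproduced from the table above with a little arithmetic. The sketch below copies the eight approval rates, takes the `1,0,0` row as the baseline, and compares the combined gain with the sum of the three single-condition gains. All variable names are illustrative.

```python
# Approval rates (in %) copied from the table above, keyed by
# (contraindication, tried_and_failed, correct_diagnosis).
rates = {
    (0, 1, 1): 84.4, (0, 1, 0): 81.1, (0, 0, 1): 73.9, (0, 0, 0): 69.9,
    (1, 1, 1): 61.1, (1, 1, 0): 56.8, (1, 0, 1): 48.3, (1, 0, 0): 43.9,
}

baseline = rates[(1, 0, 0)]  # contraindication, no generic failure, incorrect diagnosis
best = rates[(0, 1, 1)]      # all three beneficial conditions satisfied

# Gain in percentage points from satisfying exactly one condition.
single_gains = {
    "correct_diagnosis": round(rates[(1, 0, 1)] - baseline, 1),
    "tried_and_failed": round(rates[(1, 1, 0)] - baseline, 1),
    "no_contraindication": round(rates[(0, 0, 0)] - baseline, 1),
}
combined_gain = round(best - baseline, 1)
sum_of_single_gains = round(sum(single_gains.values()), 1)

print(single_gains)
print(f"combined: {combined_gain}, sum of single gains: {sum_of_single_gains}")
```

The combined gain falls short of the sum of the single gains by about 2.8 points, which is the gap the text describes as diminishing returns.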


## For each drug (`drug`) and payer (`bin`), how does the contraindication (`contraindication`), generic failure (`tried_and_failed`), and correct diagnosis (`correct_diagnosis`) influence the average rate of prior authorization approval?
- The baseline approval rate (contraindication present, no tried-and-failed generic, and incorrect diagnosis; the `1,0,0` row, which is the bottom row in all but one table) varies widely across drugs and payers, from 3.0% to 92.8%.
- The increase in approval rate from satisfying the beneficial conditions also varies substantially across drug-payer combinations.

In [10]:
for drug in merged['drug'].unique():
    print(f'-- Drug {drug} --')
    for payer in merged['bin'].unique():
        section = merged.loc[(merged['drug'] == drug) & (merged['bin'] == payer)]
        split = (pd.DataFrame(section.groupby(['contraindication', 'tried_and_failed', 'correct_diagnosis'])['pa_approved'].mean()) * 100).round(1)
        split.loc[:, '# of PA Records'] = section.groupby(['contraindication', 'tried_and_failed', 'correct_diagnosis'])['pa_approved'].count()
        split = split.rename(columns={'pa_approved': 'PA Approval %'})
        
        reject_code = section['reject_code'].unique()[0]
        print(f'PA approval rate for payer {payer} (reject_code {reject_code}) and drug {drug}')
        display(split.sort_values('PA Approval %', ascending=False))

-- Drug A --
PA approval rate for payer 417380 (reject_code 75) and drug A


contraindication,tried_and_failed,correct_diagnosis,PA Approval %,# of PA Records
0,1,0,99.9,5650
0,1,1,99.9,22500
0,0,1,99.6,22667
0,0,0,99.5,5771
1,1,1,98.2,5700
1,1,0,97.1,1450
1,0,1,94.7,5662
1,0,0,92.8,1444


PA approval rate for payer 417740 (reject_code 76) and drug A


contraindication,tried_and_failed,correct_diagnosis,PA Approval %,# of PA Records
0,1,1,97.3,2738
0,1,0,96.3,672
0,0,1,93.1,2759
0,0,0,89.9,682
1,1,1,82.9,707
1,1,0,78.8,193
1,0,1,65.7,717
1,0,0,53.1,162


PA approval rate for payer 999001 (reject_code 76) and drug A


contraindication,tried_and_failed,correct_diagnosis,PA Approval %,# of PA Records
0,1,1,99.4,8276
0,1,0,99.2,2096
0,0,1,97.8,8389
0,0,0,97.6,2070
1,1,1,93.0,2076
1,1,0,90.1,517
1,0,1,83.0,2152
1,0,0,80.9,503


PA approval rate for payer 417614 (reject_code 70) and drug A


contraindication,tried_and_failed,correct_diagnosis,PA Approval %,# of PA Records
0,1,1,77.1,40049
0,1,0,71.4,10008
0,0,1,57.9,39923
0,0,0,50.5,9955
1,1,1,37.6,10147
1,1,0,30.7,2497
1,0,1,18.6,10057
1,0,0,14.0,2543


-- Drug B --
PA approval rate for payer 417380 (reject_code 76) and drug B


contraindication,tried_and_failed,correct_diagnosis,PA Approval %,# of PA Records
0,1,1,98.1,1089
0,1,0,96.2,260
0,0,1,92.8,1110
0,0,0,90.6,297
1,1,1,82.3,300
1,1,0,74.1,81
1,0,0,66.3,86
1,0,1,66.1,280


PA approval rate for payer 417740 (reject_code 70) and drug B


contraindication,tried_and_failed,correct_diagnosis,PA Approval %,# of PA Records
0,1,1,56.8,14063
0,1,0,50.6,3636
0,0,1,37.0,14037
0,0,0,29.1,3414
1,1,1,19.5,3570
1,1,0,14.0,872
1,0,1,6.8,3459
1,0,0,4.6,915


PA approval rate for payer 999001 (reject_code 76) and drug B


contraindication,tried_and_failed,correct_diagnosis,PA Approval %,# of PA Records
0,1,1,98.5,4221
0,1,0,97.9,1093
0,0,1,95.4,4228
0,0,0,95.1,1008
1,1,1,85.2,1083
1,1,0,80.4,275
1,0,1,70.0,981
1,0,0,63.1,252


PA approval rate for payer 417614 (reject_code 75) and drug B


contraindication,tried_and_failed,correct_diagnosis,PA Approval %,# of PA Records
0,1,1,99.7,20237
0,1,0,99.4,5049
0,0,1,98.9,19999
0,0,0,98.1,5033
1,1,1,94.4,4965
1,1,0,92.2,1266
1,0,1,87.1,5055
1,0,0,83.0,1268


-- Drug C --
PA approval rate for payer 417380 (reject_code 70) and drug C


contraindication,tried_and_failed,correct_diagnosis,PA Approval %,# of PA Records
0,1,1,50.6,10494
0,1,0,41.1,2584
0,0,1,30.7,10201
0,0,0,23.0,2522
1,1,1,15.4,2637
1,1,0,10.3,687
1,0,1,5.7,2657
1,0,0,3.0,672


PA approval rate for payer 417740 (reject_code 75) and drug C


contraindication,tried_and_failed,correct_diagnosis,PA Approval %,# of PA Records
0,1,1,94.3,12838
0,1,0,92.2,3282
0,0,1,85.3,12977
0,0,0,81.8,3239
1,1,1,70.0,3182
1,1,0,64.8,772
1,0,1,50.3,3167
1,0,0,43.0,762


PA approval rate for payer 999001 (reject_code 76) and drug C


contraindication,tried_and_failed,correct_diagnosis,PA Approval %,# of PA Records
0,1,1,90.0,4033
0,1,0,88.1,928
0,0,1,77.6,3837
0,0,0,71.8,951
1,1,1,58.2,987
1,1,0,48.9,225
1,0,1,38.1,919
1,0,0,34.0,244


PA approval rate for payer 417614 (reject_code 76) and drug C


contraindication,tried_and_failed,correct_diagnosis,PA Approval %,# of PA Records
0,1,1,83.8,1766
0,1,0,79.3,429
0,0,1,64.0,1832
0,0,0,58.6,476
1,1,1,44.0,445
1,1,0,33.3,117
1,0,1,18.6,474
1,0,0,10.6,132
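
To compare the twelve tables at a glance, the baseline (`1,0,0`) and best-case (`0,1,1`) rows can be collected into one summary. The sketch below copies those two rates from each table above and computes the gain in percentage points. The column names `baseline_pct`, `best_pct`, and `gain_points` are illustrative choices, not names from the notebook.

```python
import pandas as pd

# (drug, bin, baseline %, best-case %) copied from the 1,0,0 and 0,1,1 rows of the tables above.
rows = [
    ("A", 417380, 92.8, 99.9), ("A", 417740, 53.1, 97.3),
    ("A", 999001, 80.9, 99.4), ("A", 417614, 14.0, 77.1),
    ("B", 417380, 66.3, 98.1), ("B", 417740, 4.6, 56.8),
    ("B", 999001, 63.1, 98.5), ("B", 417614, 83.0, 99.7),
    ("C", 417380, 3.0, 50.6), ("C", 417740, 43.0, 94.3),
    ("C", 999001, 34.0, 90.0), ("C", 417614, 10.6, 83.8),
]
spread = pd.DataFrame(
    rows, columns=["drug", "bin", "baseline_pct", "best_pct"]
).set_index(["drug", "bin"])

# Gain in percentage points from satisfying all three beneficial conditions.
spread["gain_points"] = (spread["best_pct"] - spread["baseline_pct"]).round(1)

print(spread.sort_values("gain_points", ascending=False))
```

Sorting by `gain_points` makes it easy to see that combinations with an already high baseline have little room to improve, while low-baseline combinations gain the most.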
