In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# replace NaN reject_code with 0 and cast to int for nicer printing
# (assign the column directly: `.loc[:, col] = ...` can keep the old float dtype in newer pandas)
claims_df = pd.read_csv('../data/processed/dim_claims_train.csv')
claims_df['reject_code'] = claims_df['reject_code'].fillna(0).astype(int)

date_df = pd.read_csv('../data/processed/dim_date_train.csv')
pa_df = pd.read_csv('../data/processed/dim_pa_train.csv')
bridge_df = pd.read_csv('../data/processed/bridge_train.csv')

## What is the overall claim approval rate?

In [3]:
print(f"{round(100 * claims_df['pharmacy_claim_approved'].mean(), 1)}% of claims ({claims_df['pharmacy_claim_approved'].count()} records) are approved.")

58.4% of claims (1068460 records) are approved.


## How does the payer (`bin`) influence the average rate of claim approval?
- Payer `999001` approved 90% of claims and has the largest number of claims (512k).

In [4]:
for payer, view in claims_df.groupby('bin'):
    print(f"{round(100 * view['pharmacy_claim_approved'].mean(), 1)}% of claims ({view['pharmacy_claim_approved'].count()} records) are approved with payer of {payer}.")

23.1% of claims (138919 records) are approved with payer of 417380.
21.2% of claims (245819 records) are approved with payer of 417614.
45.8% of claims (171359 records) are approved with payer of 417740.
90.0% of claims (512363 records) are approved with payer of 999001.
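
The per-payer loop above can also be written as a single aggregation that returns a table instead of printed lines. This is a minimal sketch on a small made-up frame: the `toy` rows and the output column names (`approval_rate`, `n_claims`, `approval_pct`) are invented for illustration and are not part of the claims data.

```python
import pandas as pd

# Hypothetical toy data; column names mirror claims_df, values are invented.
toy = pd.DataFrame({
    "bin": [417380, 417380, 999001, 999001, 999001],
    "pharmacy_claim_approved": [0, 1, 1, 1, 0],
})

# One pass: approval rate (mean of the approval flag) and claim count per payer.
payer_summary = (
    toy.groupby("bin")["pharmacy_claim_approved"]
       .agg(approval_rate="mean", n_claims="count")
)
# Express the rate as a percentage rounded to one decimal place.
payer_summary["approval_pct"] = (100 * payer_summary["approval_rate"]).round(1)
print(payer_summary)
```

Swapping `"bin"` for `"drug"` would give the per-drug version of the same summary.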


## How does the drug (`drug`) influence the average rate of claim approval?

In [5]:
for drug, view in claims_df.groupby('drug'):
    print(f"{round(100 * view['pharmacy_claim_approved'].mean(), 1)}% of claims ({view['pharmacy_claim_approved'].count()} records) are approved with drug of {drug}.")

57.5% of claims (543381 records) are approved with drug of A.
54.9% of claims (274076 records) are approved with drug of B.
64.0% of claims (251003 records) are approved with drug of C.


## How does the combination of payer (`bin`) and drug (`drug`) influence the average rate of claim approval?
- Payer `999001` approves ~90% of claims, regardless of drug.
- Each of the remaining payers approves **only** a single drug, at a rate of ~90%, and rejects every claim for the other two drugs.

In [6]:
# per (payer, drug): mean of the approval flag (a fraction, not a percentage) and number of claims
split = claims_df.groupby(['bin', 'drug'])['pharmacy_claim_approved'].agg(
    pharmacy_claim_approved_percent='mean',
    pharmacy_claim_approved_count='count',
)
display(split.sort_index(level=[0, 1]))

bin,drug,pharmacy_claim_approved_percent,pharmacy_claim_approved_count
417380,A,0.0,70844
417380,B,0.901659,35621
417380,C,0.0,32454
417614,A,0.0,125179
417614,B,0.0,62872
417614,C,0.901831,57768
417740,A,0.901003,87174
417740,B,0.0,43966
417740,C,0.0,40219
999001,A,0.899767,260184
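
The payer-by-drug rates can also be laid out as a wide grid, which makes the "one approved drug per payer" pattern easier to see at a glance. This is a sketch using `pivot_table` on invented data: the `toy` frame and the name `rate_grid` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data; column names mirror claims_df, values are invented.
toy = pd.DataFrame({
    "bin":  [417380, 417380, 417380, 999001, 999001, 999001],
    "drug": ["A", "B", "B", "A", "A", "B"],
    "pharmacy_claim_approved": [0, 1, 1, 1, 0, 1],
})

# Rows are payers, columns are drugs, cells are the mean approval flag (0 to 1).
rate_grid = toy.pivot_table(
    index="bin",
    columns="drug",
    values="pharmacy_claim_approved",
    aggfunc="mean",
)
print(rate_grid)
```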


## How does each payer (`bin`) administer reject codes (`reject_code`) for each drug (`drug`)?
- If a payer accepts a drug without PA, a claim may still be rejected with code 76 (drug covered but limit exceeded).
- If a payer does not accept a drug without PA, then the reject code is either 70 (drug not covered) or 75 (drug on formulary but requires PA).
- Payer `999001` accepts all drugs and only administers reject code 76.
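
The reject-code rules in the bullets above can be made easier to read by tabulating codes per payer and drug with descriptive labels. This is a sketch on invented data: the `toy` frame, the `REJECT_CODE_MEANING` dictionary name, and the label for code 0 (the fill value this notebook uses for claims with no reject code) are assumptions for illustration.

```python
import pandas as pd

# Reject code meanings as described in the bullets above; 0 is the
# placeholder used in this notebook for claims with no reject code.
REJECT_CODE_MEANING = {
    0: "no reject code",
    70: "drug not covered",
    75: "drug on formulary but requires PA",
    76: "drug covered but limit exceeded",
}

# Hypothetical toy data; column names mirror claims_df, values are invented.
toy = pd.DataFrame({
    "bin": [417380, 417380, 417380, 417380, 999001, 999001],
    "drug": ["A", "B", "B", "C", "A", "A"],
    "reject_code": [75, 0, 76, 70, 0, 76],
})

# Count claims per (payer, drug) for each reject code; unseen pairs become 0.
code_counts = pd.crosstab([toy["bin"], toy["drug"]], toy["reject_code"])
# Replace numeric codes in the column labels with their descriptions.
code_counts = code_counts.rename(columns=REJECT_CODE_MEANING)
print(code_counts)
```

A wide layout like this puts the zero cells next to the non-zero ones, so it is clear which codes a payer never issues for a given drug.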

In [7]:
# count claims for each (payer, drug, reject code) combination
split = claims_df.groupby(['bin', 'drug', 'reject_code'])['pharmacy_claim_approved'].count()
split = split.to_frame('claim_count')
display(split.sort_index(level=[0, 1]))

bin,drug,reject_code,claim_count
417380,A,75,70844
417380,B,0,32118
417380,B,76,3503
417380,C,70,32454
417614,A,70,125179
417614,B,75,62872
417614,C,0,52097
417614,C,76,5671
417740,A,0,78544
417740,A,76,8630
