# Data Warehouse Medicare Texas QA - Member Enrollment Monthly

Performing QA on the member_enrollment_monthly table in dw_staging before moving it to the data_warehouse schema.

## Initialization

Loading the packages that will be used and initializing the connection to the GP DB.

In [6]:
import pandas as pd
import sys
import psycopg2
sys.path.append('H:/uth_helpers')
from db_utils import get_dsn

In [12]:
connection = psycopg2.connect(get_dsn())
connection.autocommit = True

## Table Information

This table contains enrollment information at a monthly level. How easily that information can be extracted depends on the data source, since not every source stores enrollment by month.

Data Sources:

* Optum Zip/Optum DoD: Enrollment information is not at a monthly level. Each enrollment record has a begin date and an end date, which may span more than a month. Enrollment tables are **mbr_enroll** and **mbr_co_enroll**
* Truven: Enrollment table, **t**, contains monthly level enrollment data
* Medicaid: Enrollment tables (**enrl**, **chip_uth**, **htw_enrl**) are at a monthly level, usually identified by the **elig_month/elig_date** column
* Medicare: Enrollment table (**mbsf_abcd_summary**) is at a yearly level; to get monthly enrollment, you need to look at the **mdcr_status_code_** columns
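
As a rough illustration of the Medicare case, the sketch below expands one yearly record into monthly rows. It is a minimal pure-Python sketch, not the DW load logic: the function name `monthly_enrollment` and the sample record are made up, and the set of "enrolled" status codes is copied from the filter used by the queries in this notebook.

```python
# Status codes treated as "enrolled", copied from the queries in this notebook.
ENROLLED_STATUS_CODES = {"10", "11", "20", "21", "31"}


def monthly_enrollment(record):
    """Expand one yearly mbsf_abcd_summary-style record into monthly rows.

    `record` is a dict with `bene_id`, `year`, and the columns
    `mdcr_status_code_01` .. `mdcr_status_code_12`. Returns a list of
    (bene_id, month_year_id) tuples, one per month whose status code is
    in ENROLLED_STATUS_CODES.
    """
    rows = []
    for month in range(1, 13):
        status = record.get(f"mdcr_status_code_{month:02d}")
        if status in ENROLLED_STATUS_CODES:
            rows.append((record["bene_id"], f"{record['year']}{month:02d}"))
    return rows


# Made-up record: enrolled January through March, not enrolled in April,
# and no status recorded for the remaining months.
example = {
    "bene_id": "X1",
    "year": 2014,
    "mdcr_status_code_01": "10",
    "mdcr_status_code_02": "10",
    "mdcr_status_code_03": "20",
    "mdcr_status_code_04": "00",
}
monthly_rows = monthly_enrollment(example)
```

Each yearly record yields at most twelve monthly rows, which is why the monthly row count is much larger than the member count for the same year.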


Ideally we should have counts of enrollment tables from raw sources. These counts are included with the rest of the raw data tables counts for the given data sources.

* Optum Zip: **qa_reporting.optum_zip_counts**
* Optum DoD: **qa_reporting.optum_dod_counts**
* Medicaid: **qa_reporting.mdcd_enrollment_counts_[cy/fy]**
* Truven: **qa_reporting.truven_counts**
* Medicare: **qa_reporting.medicare_national_counts** and **qa_reporting.medicare_texas_counts**

## Row Counts and Enrollment Counts

In [8]:
query = ''' drop table if exists qa_reporting.dw_mcrt_mbr_enrl_monthly;
create table qa_reporting.dw_mcrt_mbr_enrl_monthly
(
    data_source text,
    calendar_year int,
    table_src text,
    dw_row_count int,
    src_row_count int,
    row_count_diff int,
    row_count_diff_percentage float,
    dw_uth_mbr_id_count int,
    dw_src_mbr_id_count int,
    src_mbr_count int,
    mbr_count_diff int,
    mbr_count_percentage float,
    date_generated date
);
'''

with connection.cursor() as cursor:
    cursor.execute(query)

In [9]:
with connection.cursor() as cursor:
      query = '''
insert into qa_reporting.dw_mcrt_mbr_enrl_monthly
(data_source, calendar_year, table_src, dw_row_count, dw_uth_mbr_id_count, dw_src_mbr_id_count, date_generated)
select data_source, 
        year, 
        table_id_src, 
        count(*),
        count(distinct uth_member_id),
        count(distinct member_id_src),
        current_date
  from dw_staging.mcrt_member_enrollment_monthly
 group by 1,2,3;
      '''

      cursor.execute(query)

      query = '''
update qa_reporting.dw_mcrt_mbr_enrl_monthly a
set src_mbr_count = b.pat_count,
    mbr_count_diff = a.dw_src_mbr_id_count - b.pat_count,
    mbr_count_percentage = 100. * abs(a.dw_src_mbr_id_count - b.pat_count) / b.pat_count
from qa_reporting.medicare_texas_counts b
where data_source = 'mcrt'
and a.calendar_year = b.year
and a.table_src = 'medicare_texas.' || b.table_name
;
      '''

      cursor.execute(query)


      query = '''
      with mcr_month_enrollment as (
            select year, bene_id, t.month_year_id
            from medicare_texas.mbsf_abcd_summary a
            cross join lateral (values (a.year || '01', a.mdcr_status_code_01), (a.year || '02', a.mdcr_status_code_02),
                              (a.year || '03', a.mdcr_status_code_03), (a.year || '04', a.mdcr_status_code_04), (a.year || '05', a.mdcr_status_code_05),
                              (a.year || '06', a.mdcr_status_code_06), (a.year || '07', a.mdcr_status_code_07), (a.year || '08', a.mdcr_status_code_08),
                              (a.year || '09', a.mdcr_status_code_09), (a.year || '10', a.mdcr_status_code_10), (a.year || '11', a.mdcr_status_code_11),
                              (a.year || '12', a.mdcr_status_code_12))
            t(month_year_id, enrollment_status)
            where t.enrollment_status in ('10','11','20','21','31')
      ),
      mcr_month_enrl_count as (
            select year::int, count(*) as row_count
            from mcr_month_enrollment 
            group by 1
      )
      update qa_reporting.dw_mcrt_mbr_enrl_monthly a
      set src_row_count = b.row_count,
            row_count_diff = a.dw_row_count - b.row_count,
            row_count_diff_percentage = 100. * abs(a.dw_row_count - b.row_count) / b.row_count
      from mcr_month_enrl_count b
      where a.calendar_year = b.year
      '''
      
      cursor.execute(query)

After inserting the counts from the dw_staging schema, let's see if there are any years where the counts do not match with the raw tables.

In [10]:
query = '''
select * 
from qa_reporting.dw_mcrt_mbr_enrl_monthly
order by calendar_year
;'''
member_monthly_df = pd.read_sql(query, con=connection)
member_monthly_df




Unnamed: 0,data_source,calendar_year,table_src,dw_row_count,src_row_count,row_count_diff,row_count_diff_percentage,dw_uth_mbr_id_count,dw_src_mbr_id_count,src_mbr_count,mbr_count_diff,mbr_count_percentage,date_generated
0,mcrt,2014,medicare_texas.mbsf_abcd_summary,43529167,43529167,0,0.0,3822495,3822495,3822796,-301,0.007874,2023-06-26
1,mcrt,2015,medicare_texas.mbsf_abcd_summary,45003185,45003185,0,0.0,3948967,3948967,3949215,-248,0.00628,2023-06-26
2,mcrt,2016,medicare_texas.mbsf_abcd_summary,46448004,46448004,0,0.0,4068903,4068903,4069556,-653,0.016046,2023-06-26
3,mcrt,2017,medicare_texas.mbsf_abcd_summary,47866081,47866081,0,0.0,4194036,4194036,4194289,-253,0.006032,2023-06-26
4,mcrt,2018,medicare_texas.mbsf_abcd_summary,48871117,48871117,0,0.0,4284273,4284273,4284529,-256,0.005975,2023-06-26
5,mcrt,2019,medicare_texas.mbsf_abcd_summary,50396299,50396299,0,0.0,4411218,4411218,4411405,-187,0.004239,2023-06-26
6,mcrt,2020,medicare_texas.mbsf_abcd_summary,51868937,51868937,0,0.0,4538294,4538294,4538440,-146,0.003217,2023-06-26


In [25]:
member_monthly_df[(member_monthly_df['row_count_diff_percentage'] > 1.) | (member_monthly_df['mbr_count_percentage'] > 1.)]

Unnamed: 0,data_source,calendar_year,table_src,dw_row_count,src_row_count,row_count_diff,row_count_diff_percentage,dw_uth_mbr_id_count,dw_src_mbr_id_count,src_mbr_count,mbr_count_diff,mbr_count_percentage,date_generated


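If the filtered **member_monthly_df** above does not have any rows, no year differs from the raw tables by more than 1% in either row count or member count. Here the row counts match exactly and the member counts differ by less than 0.02% in every year.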
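
The same threshold check can be written without pandas. The sketch below is only illustrative: `pct_diff` and `flag_years` are hypothetical helper names, and the 1% threshold mirrors the filter above. One deliberate difference, which is an assumption about what is useful rather than what the notebook does: a missing source count is flagged here, whereas the pandas comparison `NaN > 1.` evaluates to False and drops such rows from the filtered output.

```python
def pct_diff(dw_count, src_count):
    """Absolute difference as a percentage of the source count."""
    if not src_count:  # missing or zero source count: percentage undefined
        return None
    return 100.0 * abs(dw_count - src_count) / src_count


def flag_years(rows, threshold=1.0):
    """Return the years whose percentage difference exceeds `threshold`.

    Rows are (calendar_year, dw_count, src_count). A row with no usable
    source count is flagged as well, so it cannot pass unnoticed.
    """
    flagged = []
    for year, dw_count, src_count in rows:
        pct = pct_diff(dw_count, src_count)
        if pct is None or pct > threshold:
            flagged.append(year)
    return flagged


# Member counts for 2014-2016, copied from the output above:
# (calendar_year, dw_src_mbr_id_count, src_mbr_count)
member_counts = [
    (2014, 3822495, 3822796),
    (2015, 3948967, 3949215),
    (2016, 4068903, 4069556),
]
flagged_years = flag_years(member_counts)
```

With these figures `flagged_years` is empty, matching the empty filtered DataFrame.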

## Gender Count

Now that we have verified that most if not all of the rows from the raw table, medicare_texas.mbsf_abcd_summary, have been added to the member_enrollment_monthly table, we will check that the counts for other columns such as gender have been correctly added to the DW table.

In this case we won't separate the counts by source table, just by calendar year.

In [13]:
query = '''with mcrn_gen_cd as (
    select year::int, bene_id, t.month_year_id, sex_ident_cd
            from medicare_texas.mbsf_abcd_summary a
            cross join lateral (values (a.year || '01', a.mdcr_status_code_01), (a.year || '02', a.mdcr_status_code_02),
                              (a.year || '03', a.mdcr_status_code_03), (a.year || '04', a.mdcr_status_code_04), (a.year || '05', a.mdcr_status_code_05),
                              (a.year || '06', a.mdcr_status_code_06), (a.year || '07', a.mdcr_status_code_07), (a.year || '08', a.mdcr_status_code_08),
                              (a.year || '09', a.mdcr_status_code_09), (a.year || '10', a.mdcr_status_code_10), (a.year || '11', a.mdcr_status_code_11),
                              (a.year || '12', a.mdcr_status_code_12))
            t(month_year_id, enrollment_status)
            where t.enrollment_status in ('10','11','20','21','31')
),
mcrn_gen as (
    select year, c.gender_cd, count(*) gender_count
    from mcrn_gen_cd m
    left join reference_tables.ref_gender c
    on c.data_source = 'mcr'
   and c.gender_cd_src = m.sex_ident_cd
    group by 1,2
), dw_gen as (
    select year, gender_cd, count(*) gender_count
    from dw_staging.mcrt_member_enrollment_monthly
    group by 1,2
)
select a.year, a.gender_cd, a.gender_count as dw_gender_count, b.gender_count as src_gender_count, 
        a.gender_count - b.gender_count as gender_count_diff, 
        100. * abs(a.gender_count - b.gender_count) / b.gender_count as gender_count_diff_percentage
from mcrn_gen b
full outer join dw_gen a
on a.year = b.year
and a.gender_cd = b.gender_cd;
'''
 
df = pd.read_sql(query,  con=connection)
df.sort_values(['year', 'gender_cd'])

Unnamed: 0,year,gender_cd,dw_gender_count,src_gender_count,gender_count_diff,gender_count_diff_percentage
4,2014,F,23606390,23606390,0,0.0
13,2014,M,19922765,19922765,0,0.0
10,2014,U,12,12,0,0.0
8,2015,F,24379314,24379314,0,0.0
0,2015,M,20623859,20623859,0,0.0
19,2015,U,12,12,0,0.0
6,2016,F,25138720,25138720,0,0.0
14,2016,M,21309262,21309262,0,0.0
3,2016,U,22,22,0,0.0
2,2017,F,25887373,25887373,0,0.0


## Plan Type Counts

In [14]:
# Including enrollments where the plan_type column is NULL. Treating it as if unknown.
query = '''with mcrt_enroll as (
    select year::int, bene_id, ent.plan_type
    from medicare_texas.mbsf_abcd_summary a
    cross join lateral (values (01, a.mdcr_entlmt_buyin_ind_01, a.mdcr_status_code_01), (02, a.mdcr_entlmt_buyin_ind_02, a.mdcr_status_code_02),
                        (03, a.mdcr_entlmt_buyin_ind_03, a.mdcr_status_code_03), (04, a.mdcr_entlmt_buyin_ind_04, a.mdcr_status_code_04), (05, a.mdcr_entlmt_buyin_ind_05, a.mdcr_status_code_05),
                        (06, a.mdcr_entlmt_buyin_ind_06, a.mdcr_status_code_06), (07, a.mdcr_entlmt_buyin_ind_07, a.mdcr_status_code_07), (08, a.mdcr_entlmt_buyin_ind_08, a.mdcr_status_code_08),
                        (09, a.mdcr_entlmt_buyin_ind_09, a.mdcr_status_code_09), (10, a.mdcr_entlmt_buyin_ind_10, a.mdcr_status_code_10), (11, a.mdcr_entlmt_buyin_ind_11, a.mdcr_status_code_11),
                        (12, a.mdcr_entlmt_buyin_ind_12, a.mdcr_status_code_12))
    t(month_year_id, mcdcr_enrlmt, enrollment_status)
    join reference_tables.ref_medicare_entlmt_buyin ent 
    on ent.buyin_cd = t.mcdcr_enrlmt
    where t.enrollment_status in ('10','11','20','21','31')
  ),
mcrt_plans as (          
    select year, case when plan_type is null then 'UNK' else plan_type end as plan_type, count(*) plan_count
    from mcrt_enroll a
    group by 1,2
),
dw_plans as (
    select year, case when plan_type is null then 'UNK' else plan_type end as plan_type,
            count(*) plan_count
    from dw_staging.mcrt_member_enrollment_monthly
    group by 1,2
)
select a.year, a.plan_type, a.plan_count as dw_plan_count, b.plan_count as src_plan_count, 
        a.plan_count - b.plan_count as plan_count_diff, 
        100. * abs(a.plan_count - b.plan_count) / b.plan_count as plan_count_diff_percentage
from mcrt_plans b
full outer join dw_plans a
on a.year = b.year
and a.plan_type = b.plan_type
order by year;
'''

plan_count_df = pd.read_sql(query,  con=connection)
plan_count_df.sort_values(['year', 'plan_type'])



Unnamed: 0,year,plan_type,dw_plan_count,src_plan_count,plan_count_diff,plan_count_diff_percentage
1,2014,A,3528334,3528504.0,-170.0,0.004818
3,2014,AB,27205682,39874693.0,-12669011.0,31.772059
0,2014,B,124890,125970.0,-1080.0,0.857347
2,2014,C,12670261,,,
4,2015,A,3679191,3679329.0,-138.0,0.003751
7,2015,AB,27025043,41189583.0,-14164540.0,34.388646
5,2015,B,133224,134273.0,-1049.0,0.781244
6,2015,C,14165727,,,
10,2016,A,3834805,3835098.0,-293.0,0.00764
8,2016,AB,27129679,42468700.0,-15339021.0,36.118414


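Looks like the greatest differences in the plan counts are for members with plan type AB (Medicare Parts A and B). The DW table also has a plan type C that has no counterpart on the source side, and for 2014 the combined shortfall in A, B, and AB (170 + 1,080 + 12,669,011) equals the C count of 12,670,261. This is consistent with the DW assigning plan type C to rows that the source-side query counts under A, B, or AB. The difference, however, is not significant when looking at the overall yearly difference.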
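
To see why large per-plan differences can coexist with matching yearly totals, the arithmetic for a single year can be checked by hand. The sketch below uses the 2014 figures from the output above; the dictionary names are only for illustration.

```python
# 2014 plan counts copied from the output above.
dw_2014 = {"A": 3528334, "AB": 27205682, "B": 124890, "C": 12670261}
src_2014 = {"A": 3528504, "AB": 39874693, "B": 125970}  # no "C" on the source side

# How far the DW falls short of the source across the plan types both sides share.
shortfall = sum(src_2014[plan] - dw_2014[plan] for plan in src_2014)

# Yearly totals on each side.
dw_total = sum(dw_2014.values())
src_total = sum(src_2014.values())

print(shortfall, dw_2014["C"])  # both are 12670261
print(dw_total, src_total)      # both are 43529167
```

The shortfall in A, B, and AB is exactly the DW count for C, so the yearly totals agree even though the AB row alone differs by more than 30%.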

In [15]:
plan_count_df[plan_count_df['plan_count_diff_percentage'] > 1.0]

Unnamed: 0,year,plan_type,dw_plan_count,src_plan_count,plan_count_diff,plan_count_diff_percentage
3,2014,AB,27205682,39874693.0,-12669011.0,31.772059
7,2015,AB,27025043,41189583.0,-14164540.0,34.388646
8,2016,AB,27129679,42468700.0,-15339021.0,36.118414
13,2017,AB,26995863,43708892.0,-16713029.0,38.237137
16,2018,AB,25960398,44551495.0,-18591097.0,41.729457
19,2018,B,64082,133284.0,-69202.0,51.920711
22,2019,B,59288,130138.0,-70850.0,54.442208
23,2019,AB,25776631,45852253.0,-20075622.0,43.783284
24,2020,AB,25215474,47276547.0,-22061073.0,46.663884
26,2020,B,55142,127013.0,-71871.0,56.585546


In [16]:
yearly_plan_count_df = plan_count_df.groupby('year')[['dw_plan_count', 'src_plan_count']].sum()
yearly_plan_count_df['plan_count_diff'] = yearly_plan_count_df['dw_plan_count'] - yearly_plan_count_df['src_plan_count']
yearly_plan_count_df['plan_count_diff_percentage'] = 100.* abs(yearly_plan_count_df['plan_count_diff'] / yearly_plan_count_df['src_plan_count'])
yearly_plan_count_df



year,dw_plan_count,src_plan_count,plan_count_diff,plan_count_diff_percentage
2014,43529167,43529167.0,0.0,0.0
2015,45003185,45003185.0,0.0,0.0
2016,46448004,46448004.0,0.0,0.0
2017,47866081,47866081.0,0.0,0.0
2018,48871117,48871117.0,0.0,0.0
2019,50396299,50396299.0,0.0,0.0
2020,51868937,51868937.0,0.0,0.0


In [17]:
year_row_count = member_monthly_df.groupby(['calendar_year'])[['dw_row_count', 'dw_uth_mbr_id_count']].sum()
year_row_count

calendar_year,dw_row_count,dw_uth_mbr_id_count
2014,43529167,3822495
2015,45003185,3948967
2016,46448004,4068903
2017,47866081,4194036
2018,48871117,4284273
2019,50396299,4411218
2020,51868937,4538294
