***
# CORE Cartridge Notebook::[enrich_patient_journey_hierarchy]
![CORE Logo](assets/coreLogo.png) 

---
## Keep in Mind
Good Transforms Are...
- **singular in purpose:** good transforms do one and only one thing, and handle all known cases for that thing. 
- **repeatable:** transforms should be written so that they can be run against the same dataset any number of times and produce the same result every time. 
- **easy to read:** 99 times out of 100, readable, clear code that runs a little slower is more valuable than a mess that runs quickly. 
- **No 'magic numbers':** if a variable or function is not instantly obvious as to what it is or does, without context, maybe consider renaming it.
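
As a minimal sketch of what "repeatable" and "no magic numbers" can look like in pandas: the function, the constant, and the flag column below are hypothetical and not part of CORE; only the `day_supply` column name is borrowed from the dataset shown later in this notebook.

```python
import pandas as pd

# A named constant instead of a bare 30 buried in the logic (hypothetical threshold).
MAX_DAYS_SUPPLY = 30


def flag_long_supply(df: pd.DataFrame, col: str = "day_supply") -> pd.DataFrame:
    """Return a new frame with a boolean flag column; the input frame is not mutated."""
    out = df.copy()
    out["is_long_supply"] = out[col] > MAX_DAYS_SUPPLY
    return out


raw = pd.DataFrame({"day_supply": [28, 30, 90]})
once = flag_long_supply(raw)
twice = flag_long_supply(once)  # running again on the output changes nothing
```

Because the flag is recomputed from the source column rather than accumulated, a second run yields the same frame as the first.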

## Workflow - how to use this notebook to make science
#### Data Science
1. **Document your transform.** Fill out the _description_ cell below describing what this transform does; this will appear in the configuration application where Ops will create, configure and update pipelines. 
2. **Define your config object.** Fill out the _configuration_ cell below the commented-out guide to define the variables you want Ops to set in the configuration application (these will populate here for every pipeline). 
3. **Build your transformation logic.** Use the transformation cell to do that magic that you do. 
![caution](assets/cautionTape.png)

### Description
What does this transformation do? Be specific.

![what does your transform do](assets/what.gif)

## Planned

We map status/sub-status combinations to a patient journey model in order to provide actionable insights to our customers.

Definition of Done:

- Map each status/sub-status combination to a patient journey bucket defined in a bridge table.
- Provide a process to kick out any status/sub-status combinations that are NOT mapped. 

**Transform assumes MASTER patient status and MASTER substatus transforms at a minimum have been run.**
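
The two Definition of Done items can be sketched with a left merge against a bridge table, using the merge indicator to separate mapped rows from unmapped ones. This is an illustration under assumptions: the frames below are tiny hypothetical samples, and the column names follow the development values used later in this notebook.

```python
import pandas as pd

# Hypothetical bridge table: one row per status/sub-status combination.
bridge = pd.DataFrame(
    {
        "status_code": ["CANCELLED", "CANCELLED", "PENDING"],
        "sub_status": ["OTHER", "INSURANCE OON", "NEW"],
        "patient_journey_hierarchy": ["PROVIDER", "PAYER", "INTAKE"],
    }
)

# Hypothetical incoming records; CANCELLED/PA has no bridge row.
records = pd.DataFrame(
    {
        "status_code": ["CANCELLED", "PENDING", "CANCELLED"],
        "sub_status": ["OTHER", "NEW", "PA"],
    }
)

# validate="m:1" raises if the bridge holds duplicate keys;
# indicator=True adds a _merge column saying where each row was found.
merged = records.merge(
    bridge,
    how="left",
    on=["status_code", "sub_status"],
    validate="m:1",
    indicator=True,
)

mapped = merged[merged["_merge"] == "both"].drop(columns="_merge")
unmapped = merged[merged["_merge"] == "left_only"].drop(
    columns=["_merge", "patient_journey_hierarchy"]
)
```

Rows in `unmapped` are the kick-outs: combinations that need either a new bridge row or a correction upstream.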

<a id="CELL1"></a>
## CELL 1 
<font color=orange>
last time touched for 'dev'  Thursday, August 29, 2019 12:27:56 PM GMT-04:00 DST  <br>
last time touched for 'productionalize'     
</font>

In [17]:
"""CELL 1
'stays for 'dev' and 'productionalize''
builds and returns a database session
local assumes a psql instance in a local docker container
only postgres database is supported for configuration_application at this time
"""
"""
gets env-based configuration secret
returns a session to the configuration db
for dev env it pre-populates the database with helper and seed data
"""
from core.helpers.session_helper import SessionHelper
session = SessionHelper().session

2019-08-29 16:53:01,569 - core.helpers.session_helper.SessionHelper - INFO - Creating session for dev environment...
2019-08-29 16:53:01,604 - core.helpers.configuration_mocker.ConfigurationMocker - DEBUG - Generating administrator mocks.
2019-08-29 16:53:01,609 - core.helpers.configuration_mocker.ConfigurationMocker - DEBUG - Done generating administrator mocks.
2019-08-29 16:53:01,612 - core.helpers.configuration_mocker.ConfigurationMocker - DEBUG - Generating pharmaceutical company mocks.
2019-08-29 16:53:01,617 - core.helpers.configuration_mocker.ConfigurationMocker - DEBUG - Done generating pharmaceutical company mocks.
2019-08-29 16:53:01,619 - core.helpers.configuration_mocker.ConfigurationMocker - DEBUG - Generating brand mocks.
2019-08-29 16:53:01,624 - core.helpers.configuration_mocker.ConfigurationMocker - DEBUG - Done generating brand mocks.
2019-08-29 16:53:01,628 - core.helpers.configuration_mocker.ConfigurationMocker - DEBUG - Generating segment mocks.
2019-08-29 16:53:0

In [18]:
import logging
logging.getLogger().setLevel(logging.DEBUG)
log = logging.getLogger()

## CONFIGURATION - PLEASE TOUCH
### <font color=pink>This cell will be off in production as configurations will come from the configuration postgres DB</font>

In [19]:
"""
************ CONFIGURATION - PLEASE TOUCH **************
Pipeline Builder configuration: creates configurations from variables specified here!!
This cell will be off in production as configurations will come from the configuration postgres DB.
"""
"""
PIPELINE STATE:

raw-->ingest-->master-->enhance-->enrich-->metrics-->dimensional

"""
# config vars: this dataset
config_pharma = "sun" # the pharmaceutical company which owns {brand}
config_brand = "ilumya" # the brand this pipeline operates on
config_state = "enrich" # the state this transform runs in
config_name = "enrich_patient_journey_hierarchy" # the name of this transform, which is the name of this notebook without .ipynb

# input vars: dataset to fetch. 
# Recall that a contract published to S3 has a key format branch/pharma/brand/state/name
input_branch = "sun-extract-validation"
# if None, input_branch is automagically set to your working branch
input_pharma = "sun"
input_brand = "ilumya"
input_state = "ingest"
input_name = "symphony_health_association_ingest_column_mapping"

#This contract defines the base of the output structure of data into S3.
#
#contract structure in s3: 
#s3:// {ENV} / {BRANCH} / {PARENT} / {CHILD} / {STATE} / {name of input}
#
#ENV - environment Must be one of development, uat, production.
#Prefixed with integrichain- due to global unique requirement
#BRANCH - the software branch for development this will be the working pull request (eg pr-225)
#in uat this will be edge, in production this will be master
#PARENT - The top level source identifier
#this is generally the customer (and it is aliased as such) but can be IntegriChain for internal sources,
#or another aggregator for future-proofing
#CHILD - The sub level source identifier, generally the brand (and is aliased as such)
#STATE - One of: raw, ingest, master, enhance, enrich, metrics
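
To make the key format above concrete, here is a small sketch that joins the contract parts into an S3 URI. The helper `contract_s3_path` is hypothetical and is not a CORE function; `DatasetContract` builds the real path internally. The sample values are the ones used for the input dataset in this notebook.

```python
def contract_s3_path(env_bucket, branch, parent, child, state, dataset):
    """Join the contract parts into an S3 URI (illustrative helper only)."""
    parts = [env_bucket, branch, parent, child, state, dataset]
    if not all(parts):
        raise ValueError("every contract part must be a non-empty string")
    return "s3://" + "/".join(parts)


path = contract_s3_path(
    "ichain-dev",
    "sun-extract-validation",
    "sun",
    "ilumya",
    "ingest",
    "symphony_health_association_ingest_column_mapping",
)
```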


##### <font color=orange>SETUP - DON'T TOUCH </font> ####
### <font color=pink>This cell will be turned off in production</font>
Populating config mocker based on config parameters...

In [20]:
"""
************ SETUP - DON'T TOUCH **************
Populating config mocker based on config parameters...
"""
import core.helpers.pipeline_builder as builder

ids = builder.build(config_pharma, config_brand, config_state, config_name, session)
"""
RETURNS: A list of 2 items: [transformation_id, run_id] where transformation_id corresponds
to the configuration created/found for {transformation} and run_id is a randomly generated 6 digit
number (to avoid publishing to the same place with the same dataset)
"""
transform_id = ids[0]
run_id = ids[1]

2019-08-29 16:53:14,069 - core.logging - DEBUG - Adding/getting mocks for specified configurations...
2019-08-29 16:53:14,109 - core.logging - DEBUG - Done. Creating mock run event and committing results to configuration mocker.


In [21]:
"""************ SETUP - DON'T TOUCH **************
This section imports data from the configuration database
and should not need to be altered or otherwise messed with. 
~~These are not the droids you are looking for~~
"""
from core.constants import BRANCH_NAME, ENV_BUCKET, BATCH_JOB_QUEUE
from core.helpers.session_helper import SessionHelper
from core.models.configuration import Transformation
from dataclasses import dataclass
from core.dataset_contract import DatasetContract
#import logging
#logging.getLogger().setLevel(logging.DEBUG)
#log = logging.getLogger()
#session = SessionHelper().session

db_transform = session.query(Transformation).filter(Transformation.id == transform_id).one()

@dataclass
class DbTransform:
    id: int = db_transform.id ## the instance id of the transform in the config app
    name: str = db_transform.transformation_template.name ## the transform name in the config app
    state: str = db_transform.pipeline_state.pipeline_state_type.name ## the pipeline state, one of raw, ingest, master, enhance, enrich, metrics, dimensional
    branch:str = BRANCH_NAME ## the git branch for this execution 
    brand: str = db_transform.pipeline_state.pipeline.brand.name ## the pharma brand name
    pharmaceutical_company: str = db_transform.pipeline_state.pipeline.brand.pharmaceutical_company.name # the pharma company name
    publish_contract: DatasetContract = DatasetContract(branch=BRANCH_NAME,
                            state=db_transform.pipeline_state.pipeline_state_type.name,
                            parent=db_transform.pipeline_state.pipeline.brand.pharmaceutical_company.name,
                            child=db_transform.pipeline_state.pipeline.brand.name,
                            dataset=db_transform.transformation_template.name)
    

In [22]:
log.debug(f'Transform Id:{transform_id} Run Id:{run_id}')
log.debug(f'Branch name:{BRANCH_NAME} Env Bucket:{ENV_BUCKET} Batch Job Queue:{BATCH_JOB_QUEUE}')

2019-08-29 16:53:35,332 - root - DEBUG - Transform Id:6 Run Id:220468
2019-08-29 16:53:35,334 - root - DEBUG - Branch name:DC-676_Merge_to_Patient_Journey_Bucket_Map_for_Alkermes Env Bucket:ichain-dev Batch Job Queue:dev-core



### CONFIGURATION - VARIABLES - PLEASE TOUCH
# TRANSFORM

In [115]:
""" 
CONFIGURATION ********* VARIABLES - PLEASE TOUCH ********* 
This section defines what you expect to get from the configuration application 
in a single "transform" object. Define the vars you need here, and comment inline to the right of them 
for all-in-one documentation. 
Engineering will build a production "transform" object for every pipeline that matches what you define here.

@@@ FORMAT OF THE DATA CLASS IS: @@@ 

<variable_name>: <data_type> #<comment explaining what the value is to future us>
e.g.
class Transform(DbTransform):
    some_ratio: float
    site_name: str

~~These ARE the droids you are looking for~~
"""
"""
imports
"""
import pandas as pd
from core.logging import get_logger
 
class Transform(DbTransform):
    '''
    YOUR properties go here!!
    Variable properties should be assigned to the exact name of
    the transformation as it appears in the Jupyter notebook filename.
    ''' 
    # PROD
    '''
    col_status: str = db_transform.variables.col_status # Column containing the status
    col_substatus: str = db_transform.variables.col_substatus # Column containing the substatus
    customer_name: str = db_transform.variables.customer_name # Name of pharmaceutical company
    input_transform: str = db_transform.variables.input_transform # The name of the dataset to pull from
    '''    
    # DEV  

    col_status: str 
    col_substatus: str  
    customer_name: str 
    input_transform: str   
        
    def enrich_status_substatus():
        # called as Transform.enrich_status_substatus(); reads the module-level `transform` instance
        customer_name = transform.customer_name
        try:
            if customer_name=='sun':
                enrich_patient_journey_hierarchy_dict = Transform.enrich_patient_journey_hierarchy_sun()
            elif customer_name=='bi':
                enrich_patient_journey_hierarchy_dict = Transform.enrich_patient_journey_hierarchy_bi()
            elif customer_name=='alkermes':
                enrich_patient_journey_hierarchy_dict = Transform.enrich_patient_journey_hierarchy_alkermes()
            else:
                # logger.error, not logger.exception: no exception is being handled at this point
                logger.error('expecting customer name as sun, bi or alkermes')
                raise Exception('expecting customer name as sun, bi or alkermes') 
            # build the mapping dataframe once, whichever customer was selected
            df_enrich_patient_journey_hierarchy = pd.DataFrame.from_dict(enrich_patient_journey_hierarchy_dict)
        except Exception as e:
            logger.exception(f'exception:{e}')
            raise Exception(f'raise exception:{e}')    
        return df_enrich_patient_journey_hierarchy   
    
    def patient_journey_hierarchy_sun():
        # need to input/ define for ic-gold mapping
        # temporary until future User story defines
        # IC - GOLD persistence solution
        # kept for consistency with the transforms built
        # for prior sprint
        # This is for documentation at the moment!
        patient_journey_hierarchy_dict = {}
        patient_journey_hierarchy_dict[1]='BV/PA'
        patient_journey_hierarchy_dict[2]='FINANCIAL'
        patient_journey_hierarchy_dict[3]='FULFILLMENT'
        patient_journey_hierarchy_dict[4]='INTAKE'
        patient_journey_hierarchy_dict[5]='PATIENT'
        patient_journey_hierarchy_dict[6]='PAYER'
        patient_journey_hierarchy_dict[7]='PROVIDER'
        patient_journey_hierarchy_dict[8]='TRANSFERRED'     
        return patient_journey_hierarchy_dict 
    
    def patient_journey_hierarchy_alkermes():
        # need to input/ define for ic-gold mapping
        # temporary until future User story defines
        # IC - GOLD persistence solution
        # kept for consistency with the transforms built
        # for prior sprint
        # This is for documentation at the moment!
        patient_journey_hierarchy_dict = {}
        patient_journey_hierarchy_dict[1]='BV/PA'
        patient_journey_hierarchy_dict[2]='FINANCIAL'
        patient_journey_hierarchy_dict[3]='FULFILLMENT'
        patient_journey_hierarchy_dict[4]='INTAKE'
        patient_journey_hierarchy_dict[5]='PATIENT'
        patient_journey_hierarchy_dict[6]='PAYER'
        patient_journey_hierarchy_dict[7]='PBM'
        patient_journey_hierarchy_dict[8]='PROVIDER'
        patient_journey_hierarchy_dict[9]='TRANSFERRED'     
        return patient_journey_hierarchy_dict 
    
    def patient_journey_hierarchy_bi():
        # need to input/ define for ic-gold mapping
        # temporary until future User story defines
        # IC - GOLD persistence solution
        # kept for consistency with the transforms built
        # for prior sprint
        # This is for documentation at the moment!
        patient_journey_hierarchy_dict = {}
        return patient_journey_hierarchy_dict 
    
    
    def enrich_patient_journey_hierarchy_sun():
        enrich_patient_journey_hierarchy_dict = {
            transform.col_status:['ACTIVE','ACTIVE','ACTIVE','ACTIVE','ACTIVE','ACTIVE','ACTIVE',
                      'ACTIVE','ACTIVE','ACTIVE','CANCELLED','CANCELLED','CANCELLED',
                      'CANCELLED','CANCELLED','CANCELLED','CANCELLED','CANCELLED','CANCELLED',
                      'CANCELLED','CANCELLED','CANCELLED','CANCELLED','DENIED','DENIED','DENIED',
                      'DENIED','DENIED','DENIED','DISCONTINUED','DISCONTINUED','DISCONTINUED',
                      'DISCONTINUED','DISCONTINUED','DISCONTINUED','DISCONTINUED','DISCONTINUED',
                      'DISCONTINUED','DISCONTINUED','DISCONTINUED','DISCONTINUED','DISCONTINUED',
                      'DISCONTINUED','DISCONTINUED','DISCONTINUED','PENDING','PENDING','PENDING',
                      'PENDING','PENDING','PENDING','PENDING','PENDING','PENDING','PENDING','PENDING',
                      'PENDING','PENDING','PENDING','PENDING','PENDING','PENDING'],
            transform.col_substatus:['HOLD OTHER ','HOLD RTS','INSURANCE HOLD','MATERIAL','PATENT RESPONSE','PRESCRIBER','PT HOLD','READY','SHIPMENT','TREATMENT DELAY','ALT THERAPY','INSURANCE COPAY','INSURANCE DENIED','INSURANCE OON','INSURANCE OTHER','OTHER','PATIENT DECEASED','PATIENT END','PATIENT FINANCIAL','PATIENT RESPONSE','PRESCRIBER END','TRANSFER HUB','TRANSFER SP','DOSAGE','FORMULARY','OTHER','PA','QUANTITY','STEP EDIT','ALT THERAPY','INS OON ','INS OTHER','INSURANCE COPAY','INSURANCE DENIED','OTHER','PATIENT DECEASED','PATIENT END','PATIENT FINANCIAL','PATIENT RESPONSE','PRESCRIBER END','SERVICES END','THERAPY COMPLETE','THERAPY END','TRANSFER HUB','TRANSFER SP','APPEAL','BENEFITS','COPAY ASSISTANCE','DELAY','FOUNDATION','INFORMATION','INVENTORY HOLD','NEW','OTHER','PA','PATIENT  RESPONSE','PATIENT CONTACT','PATIENT FINANCIAL','PATIENT HOLD','PRESCRIBER','PRESCRIBER  HOLD','THERAPY HOLD'],'patient_journey_hierarchy':['FULFILLMENT','FULFILLMENT','PAYER','FULFILLMENT','PATIENT','PROVIDER','PATIENT','FULFILLMENT','FULFILLMENT','FULFILLMENT','PROVIDER','PAYER','PAYER','PAYER','PAYER','PROVIDER','PATIENT','PATIENT','PATIENT','PATIENT','PROVIDER','TRANSFERRED','TRANSFERRED','PAYER','PAYER','PAYER','PAYER','PAYER','PAYER','PROVIDER','PAYER','PAYER','PAYER','PAYER','PROVIDER','PATIENT','PATIENT','PATIENT','PATIENT','PROVIDER','PROVIDER','PATIENT','PROVIDER','TRANSFERRED','TRANSFERRED','BV/PA','BV/PA','FINANCIAL','FULFILLMENT','FINANCIAL','INTAKE','FULFILLMENT','INTAKE','FULFILLMENT','BV/PA','FULFILLMENT','FULFILLMENT','FINANCIAL','FULFILLMENT','FULFILLMENT','FULFILLMENT','FULFILLMENT']}
        return enrich_patient_journey_hierarchy_dict

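    def enrich_patient_journey_hierarchy_alkermes():
        # mapping not defined yet; the empty bucket column keeps the merged output schema consistent with sun
        enrich_patient_journey_hierarchy_dict = {
            transform.col_status:[],
            transform.col_substatus:[],
            'patient_journey_hierarchy':[]}
        return enrich_patient_journey_hierarchy_dict
    
    def enrich_patient_journey_hierarchy_bi():
        # mapping not defined yet; the empty bucket column keeps the merged output schema consistent with sun
        enrich_patient_journey_hierarchy_dict = {
            transform.col_status:[],
            transform.col_substatus:[],
            'patient_journey_hierarchy':[]}
        return enrich_patient_journey_hierarchy_dict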
    
    
    def enrich_patient_journey_hierarchy(self,df):
        try:        
            enriched_col_name = 'patient_journey_hierarchy'
            go = False # assume things are not working YET.
           
            dffail = pd.DataFrame() # initialize df for fails
            
            logger.info('try:')      
            
            # log metadata           
            # df in
            dfShape = df.shape
            logger.info(f'df in  shape: {dfShape[0]} {dfShape[1]}')
            logger.info(f'df in {df.head()}') 
            
            # are we expecting certain column names? YES 
            statusColNameExpected = transform.col_status
            substatusColNameExpected = transform.col_substatus
            
            logger.info(f'expecting column name  status   as:{statusColNameExpected}')
            logger.info(f'expecting column name  substatus as:{substatusColNameExpected}')
            columnNamesArr = df.columns.values.tolist()
            logger.info(f'df column names:{columnNamesArr}')
            
            if statusColNameExpected in columnNamesArr and substatusColNameExpected in columnNamesArr:
 
                # normalize the columns of interest: upper-case, strip, then clean up substatus
                # isinstance(x, str) skips both None and NaN, neither of which has string methods
                df[statusColNameExpected]= df[statusColNameExpected].apply(lambda x: x.upper().strip() if isinstance(x, str) else x)
                df[substatusColNameExpected]= df[substatusColNameExpected].apply(lambda x: x.upper().strip() if isinstance(x, str) else x)
                # str.replace matches literally, so r'\w' removes a backslash followed by "w", not a regex word character
                df[substatusColNameExpected]= df[substatusColNameExpected].apply(lambda x: x.replace('_',' ').replace('\r', '').replace('\t', '').replace(r'\w', '') if isinstance(x, str) else x)
                
                # enrich mapping
                df_enrich_patient_journey_hierarchy = Transform.enrich_status_substatus()
                # strip the mapping keys too: entries such as 'HOLD OTHER ' would otherwise never match the stripped input
                for key_col in (statusColNameExpected, substatusColNameExpected):
                    df_enrich_patient_journey_hierarchy[key_col] = df_enrich_patient_journey_hierarchy[key_col].apply(lambda x: x.strip() if isinstance(x, str) else x)
                
                # Merge 
                # apply enrich selection for the columns of interest
                status_col = transform.col_status
                substatus_col = transform.col_substatus
                try:
                    df = pd.merge(df,df_enrich_patient_journey_hierarchy
                                  ,how='left',left_on=[status_col,substatus_col]
                                  ,right_on=[status_col,substatus_col],validate='m:1',indicator=True)
                except pd.errors.MergeError as e:
                    go = False
                    logger.exception(f'try merge exception:{e}')
                    raise Exception(str(e))
                
                #df.drop([status_col,substatus_col], axis=1,inplace = True)
                
                # create pass and fail dataframes                              
                # what fails             
                # _merge
                dffail = df.loc[df['_merge'] == 'left_only']
                #dffail.drop(['_merge','patient_journey_hierarchy'], axis=1,inplace = True)

                # what passes
                df = df.loc[df['_merge'] == 'both']
                #df.drop(['_merge'], axis=1,inplace = True)
                                              
                # meta data log for what comes out of the function pass and fail df
                dfOutSize = df.size
                dfOutShape = df.shape
                dffailSize = dffail.size
                dffailShape = dffail.shape
                logger.info(f'df in   shape: {dfShape[0]} {dfShape[1]}')                
                logger.info(f'df pass shape: {dfOutShape[0]} {dfOutShape[1]}')
                logger.info(f'df fail shape: {dffailShape[0]} {dffailShape[1]}')
                logger.info(f'df pass {df.head()}')
                logger.info(f'df fail {dffail.head()}')  
                go = True
            else:
                go = False  
                # logger.error, not logger.exception: there is no active exception to attach here
                logger.error('expected status and substatus column names were not found in the dataframe')
                raise Exception('expected status and substatus column names were not found in the dataframe')              
        except Exception as e:
            go = False  
            logger.exception(f'exception:{e}')
            # chain the original exception so its traceback is preserved
            raise Exception(str(e)) from e
        return df.copy(),dffail.copy(),go
                

transform = Transform()
logger = get_logger(f'core.transforms.{transform.state}.{transform.name}')
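
The parallel lists in `enrich_patient_journey_hierarchy_sun` must stay the same length and in the same order, which is hard to verify by eye. One alternative, sketched here with only a handful of rows taken from that mapping, is to declare each mapping as a single tuple and build the dataframe from the rows. `MAPPING_ROWS` and `build_mapping` are hypothetical names, not part of this transform.

```python
import pandas as pd

# (status, substatus, bucket) rows; a small illustrative subset, not the full mapping.
MAPPING_ROWS = [
    ("ACTIVE", "READY", "FULFILLMENT"),
    ("CANCELLED", "TRANSFER HUB", "TRANSFERRED"),
    ("DENIED", "PA", "PAYER"),
    ("PENDING", "NEW", "INTAKE"),
]


def build_mapping(rows, col_status, col_substatus):
    """Build the bridge frame and fail fast on duplicate status/substatus keys."""
    frame = pd.DataFrame(
        rows, columns=[col_status, col_substatus, "patient_journey_hierarchy"]
    )
    dupes = frame[frame.duplicated(subset=[col_status, col_substatus], keep=False)]
    if not dupes.empty:
        raise ValueError(f"duplicate mapping keys:\n{dupes}")
    return frame


mapping = build_mapping(MAPPING_ROWS, "status_code", "sub_status")
```

Keeping one combination per line makes a diff of the mapping reviewable, and the duplicate check surfaces a bad row before the `m:1` merge does.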

### *Please place your value assignments for development below!*
### <font color=pink>This cell will be turned off in production; Engineering will set it to pull from the configuration application</font>

In [108]:
## Please place your value assignments for development here!!
## This cell will be turned off in production and Engineering will set to pull from the configuration application instead
## For the last example, this could look like...
## transform.some_ratio = 0.6
## transform.site_name = "WALGREENS"

transform.customer_name = 'alkermes'
transform.col_status = 'status_code'
transform.col_substatus = 'sub_status'
transform.input_transform = '' # for DEV NA

#### FETCH DATA - TOUCH, CAREFULLY ####
#### <font color=pink>This cell will be turned off in production, as the input_contract will be handled by the pipeline</font> ####

In [104]:
logger.info("FETCH DATA CELL - TOUCH - This cell will be turned off in production, as the input_contract will be handled by the pipeline. ")

# for testing / development only
run_id = 3

if not input_branch:
    input_branch = BRANCH_NAME
input_contract = DatasetContract(branch=input_branch,
                                 state=input_state, 
                                 parent=input_pharma, 
                                 child=input_brand, 
                                 dataset=input_name)
run_filter = []
run_filter.append(dict(partition="__metadata_run_id", comparison="==", values=[run_id]))
# IF YOU HAVE PUBLISHED DATA MULTIPLE TIMES, change run_id above to the run you want to fetch.
# Without this filter, you will have duplicate values in your fetched dataset!

# bypass/comment out when unit testing individual parquet files
df = input_contract.fetch(filters=run_filter)



2019-08-29 18:15:31,815 - core.transforms.enrich.enrich_patient_journey_hierarchy - INFO - FETCH DATA CELL - TOUCH - This cell will be turned off in production, as the input_contract will be handled by the pipeline. 
2019-08-29 18:15:31,830 - core.dataset_contract.DatasetContract - INFO - Fetching dataframe from s3 location s3://ichain-dev/sun-extract-validation/sun/ilumya/ingest/symphony_health_association_ingest_column_mapping.
2019-08-29 18:15:31,850 - urllib3.util.retry - DEBUG - Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0, status=None)
2019-08-29 18:15:31,851 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): ichain-dev.s3.amazonaws.com:443
2019-08-29 18:15:32,009 - urllib3.connectionpool - DEBUG - https://ichain-dev.s3.amazonaws.com:443 "GET /?prefix=sun-extract-validation%2Fsun%2Filumya%2Fingest%2Fsymphony_health_association_ingest_column_mapping&encoding-type=url HTTP/1.1" 200 None
2019-08-29 18:15:32,035 - urllib3
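
Why the run filter in the fetch cell matters can be shown with plain pandas. This is only an illustration with made-up rows; it does not reproduce how `DatasetContract.fetch` applies its filters.

```python
import pandas as pd

# Two publishes of the same two records, distinguished only by the run id partition.
published = pd.DataFrame(
    {
        "pharm_transaction_id": ["A1", "A2", "A1", "A2"],
        "__metadata_run_id": [2, 2, 3, 3],
    }
)

run_id = 3
# Keeping a single run removes the repeated records from the earlier publish.
one_run = published[published["__metadata_run_id"] == run_id]
```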

## <font color=grey>*unit test development only*</font>
<font color=grey>*The next **5** cells will be deleted in production.*</font>

In [72]:
# unit test/development only
# before shot unit testing only
dfSize = df.size
dfShape = df.shape
print('shape: {} {}'.format(dfShape[0],dfShape[1])) 

shape: 24457 72


In [73]:
# unit test/development only
# needed to see the col(s) of interest
pd.set_option('display.max_columns', 72)

In [95]:
## unit test/development only
df.head()

Unnamed: 0,rec_date,pharm_code,pharm_npi,transtype,pharm_transaction_id,trans_seq,ref_source,ref_date,program_id,pharmacy_id,pat_last_name,pat_first_name,pat_dob,pat_gender,pat_addr1,pat_addr2,pat_city,pat_state,pat_zip,dx1_code,dx2_code,status_date,status_code,sub_status,pres_last_name,pres_first_name,pres_addr1,pres_addr2,pres_city,pres_state,pres_zip,pres_phone,pres_npi,pres_dea,facility_name,rxdate,rxnumber,rxrefills,rxfill,refill_remaining,prev_disp,rx_ndc_number,medication,quantity,day_supply,ship_date,ship_carrier,shiptracking_num,ship_location,ship_address,ship_city,ship_state,ship_zip,has_medical,primary_coverage_type,primary_payer_name,primary_payer_type,secondary_coverage_type,secondary_payer_name,secondary_payer_type,plan_paid_amt,pat_copay,copay_assist_amount,oth_payer_amt,xfer_pharmname,msa_patient_id,msa_patient_bmap,__metadata_run_timestamp,__metadata_app_version,__metadata_output_contract,__metadata_transform_timestamp,__metadata_run_id
0,20181024115959,ACCREDO,1346208949,COM,279133432018102401,0,DIRECT,20181019120000,,27913343,,,,,,,,,0,L40.0,,20181024115959,CANCELLED,OTHER,,,,,,,99999,,,,,,,,,,,,ILUMYA,,,,,,,,,,,Y,MEDICAL,GENERAL DIRECT,COMMERCIAL,,,,,,,,,,NNNNV,2019-07-01 13:25:07,0.0.11,s3://ichain-dev/sun-extract-validation/sun/ilu...,2019-07-01 13:35:22,3
1,20181025115959,ACCREDO,1346208949,COM,278370982018102502,0,DIRECT,20181022120000,,27837098,,,,F,,,,,0,L40.0,,20181025115959,CANCELLED,INSURANCE OON,GREENBERG,ROBERT,5201 NORRIS CANYON RD,,SAN RAMON,CA,94583,9252771300.0,1639195316.0,BG0616043,,,,,,,,,ILUMYA,,,,,,,,,,,Y,MEDICAL,BROWN & TOLAND MEDICAL GRP,COMMERCIAL,,,,,,,,,,NNNVV,2019-07-01 13:25:07,0.0.11,s3://ichain-dev/sun-extract-validation/sun/ilu...,2019-07-01 13:35:22,3
2,20181029115959,ACCREDO,1346208949,COM,279181482018102903,0,DIRECT,20181024120000,,27918148,,,,M,,,,,0,L40.0,,20181029115959,CANCELLED,OTHER,SCIURBA,SALVATORE,111 WEST WATER ST,,TOMS RIVER,NJ,8753,7322444700.0,1093765307.0,,,,,,,,,,ILUMYA,,,,,,,,,,,Y,MEDICAL,GENERAL HORIZON BCBS NJ,COMMERCIAL,,,,,,,,,,NNNVV,2019-07-01 13:25:07,0.0.11,s3://ichain-dev/sun-extract-validation/sun/ilu...,2019-07-01 13:35:22,3
3,20181102115959,ACCREDO,1346208949,COM,267244982018110204,0,DIRECT,20181030120000,,26724498,,,,F,,,,,0,Q84,L40.0,20181102115959,CANCELLED,INSURANCE OON,KNUCKLES,MELISSA,1101 EAST MASTER STREET,,CORBIN,KY,40701,6065282881.0,1821074360.0,BK0531562,,,,,,,,,ILUMYA,,,,,,,,,,,Y,MEDICAL,ANTHEM BCBS OF KENTUCKY,MEDICARE,,,,,,,,,,NNNVV,2019-07-01 13:25:07,0.0.11,s3://ichain-dev/sun-extract-validation/sun/ilu...,2019-07-01 13:35:22,3
4,20181106115959,ACCREDO,1346208949,COM,160618142018110605,0,DIRECT,20181102120000,,16061814,,,,F,,,,,0,696.1,,20181106115959,CANCELLED,OTHER,KORY,MARK,16216 BAXTER ROAD,SUITE 200,CHESTERFIELD,MO,63017,6365321000.0,1326034489.0,BK1220045,,,,,,,,,ILUMYA,,,,,,,,,,,Y,MEDICAL,EXPRESS SCRIPTS,COMMERCIAL,,,,,,,,,,NNNVV,2019-07-01 13:25:07,0.0.11,s3://ichain-dev/sun-extract-validation/sun/ilu...,2019-07-01 13:35:22,3


#### FETCH DATA - TOUCH only if necessary, BUT CAREFULLY ####
#### <font color=red>This cell will be turned ON in production, as the input_contract will be handled by the pipeline</font> ####

In [None]:
### Retrieve current dataset from contract
from core.dataset_diff import DatasetDiff

diff = DatasetDiff(db_transform.id)
df = diff.get_diff(transform_name=transform.input_transform, values=[run_id])

# <font color=red>**CALL**</font> THE TRANSFORM


In [109]:
### Use the variables above to execute your transformation.
### the final output needs to be a variable named final_dataframe
logger.info("CALL THE TRANSFORM - execute your transformation - the final output needs to be a variable named final_dataframe")

final_dataframe, final_fail, go = transform.enrich_patient_journey_hierarchy(df)

if go==True:
    logger.info("CALL THE TRANSFORM -  go no go = GO")
elif go==False:
    logger.info("CALL THE TRANSFORM -  go no go = NO go")
else:
    go=False
    logger.info("CALL THE TRANSFORM -  go no go = unknown make it NO go")
        

2019-08-29 18:43:29,120 - core.transforms.enrich.enrich_patient_journey_hierarchy - INFO - CALL THE TRANSFORM - execute your transformation - the final output needs to be a variable named final_dataframe
2019-08-29 18:43:29,126 - core.transforms.enrich.enrich_patient_journey_hierarchy - INFO - try:
2019-08-29 18:43:29,134 - core.transforms.enrich.enrich_patient_journey_hierarchy - INFO - df in  shape: 24457 72
2019-08-29 18:43:29,189 - core.transforms.enrich.enrich_patient_journey_hierarchy - INFO - df in          rec_date pharm_code   pharm_npi transtype pharm_transaction_id  \
0  20181024115959    ACCREDO  1346208949       COM   279133432018102401   
1  20181025115959    ACCREDO  1346208949       COM   278370982018102502   
2  20181029115959    ACCREDO  1346208949       COM   279181482018102903   
3  20181102115959    ACCREDO  1346208949       COM   267244982018110204   
4  20181106115959    ACCREDO  1346208949       COM   160618142018110605   

  trans_seq ref_source        ref_date

### <font color=grey>*unittest python*</font>

In [85]:
# unit test/development only: look at the fails
final_fail.head()

Unnamed: 0,rec_date,pharm_code,pharm_npi,transtype,pharm_transaction_id,trans_seq,ref_source,ref_date,program_id,pharmacy_id,pat_last_name,pat_first_name,pat_dob,pat_gender,pat_addr1,pat_addr2,pat_city,pat_state,pat_zip,dx1_code,dx2_code,status_date,status_code,sub_status,pres_last_name,pres_first_name,pres_addr1,pres_addr2,pres_city,pres_state,pres_zip,pres_phone,pres_npi,pres_dea,facility_name,rxdate,...,rxfill,refill_remaining,prev_disp,rx_ndc_number,medication,quantity,day_supply,ship_date,ship_carrier,shiptracking_num,ship_location,ship_address,ship_city,ship_state,ship_zip,has_medical,primary_coverage_type,primary_payer_name,primary_payer_type,secondary_coverage_type,secondary_payer_name,secondary_payer_type,plan_paid_amt,pat_copay,copay_assist_amount,oth_payer_amt,xfer_pharmname,msa_patient_id,msa_patient_bmap,__metadata_run_timestamp,__metadata_app_version,__metadata_output_contract,__metadata_transform_timestamp,__metadata_run_id,patient_journey_hierarchy,_merge
18,20181015115959,ACCREDO,1043309735,COM,2766827620181015162,0,DIRECT,20180419120000,,27668276,,,,M,,,,,0,C44.91,,20181015115959,DISCONTINUED,PATIENT RESPOSNE,RANA,FAUZIA,653 WEST 8TH ST,"3RD FL, FACULTY CLINIC",JACKSONVILLE,FL,32209,9043831021,1639138100,BR3610119,,20180329.0,...,,,,,ODOMZO,,,,,,,,,,,Y,MEDICAL,MOLINA HEALTHCARE OF FL - CAREMARK,MEDICAID,,,,,0.0,,,,,NNNVV,2019-07-01 13:25:07,0.0.11,s3://ichain-dev/sun-extract-validation/sun/ilu...,2019-07-01 13:35:22,3,,left_only
32,20181227115959,ACCREDO,1346208949,COM,280008962018122784,0,HUB,20181224120000,,28000896,,,,F,,,,,0,L40.9,,20181227115959,CANCELLED,PA,ITKIN,.ALEKSAND,7565 MISSION VALLEY,,SAN DIEGO,CA,92108,8587845767,1447344023,BI7460811,,,...,,,,,ILUMYA,,,,,,,,,,,Y,MEDICAL,PRIME THERAPEUTICS,COMMERCIAL,,,,,,,,,,NNNVV,2019-07-01 13:25:07,0.0.11,s3://ichain-dev/sun-extract-validation/sun/ilu...,2019-07-01 13:35:22,3,,left_only
38,20190122115959,ACCREDO,1346208949,COM,2800089620190122137,0,HUB,20181228120000,,28000896,,,,F,,,,,0,L40.9,,20190122115959,CANCELLED,PA,ITKIN,.ALEKSAND,7565 MISSION VALLEY,,SAN DIEGO,CA,92108,8587845767,1447344023,BI7460811,,,...,,,,,ILUMYA,,,,,,,,,,,Y,MEDICAL,PRIME THERAPEUTICS,COMMERCIAL,,,,,,,,,,NNNVV,2019-07-01 13:25:07,0.0.11,s3://ichain-dev/sun-extract-validation/sun/ilu...,2019-07-01 13:35:22,3,,left_only
90,20181217115959,ACCREDO,1346208949,COM,275741432018121758,0,HUB,20181205120000,,27574143,,,,M,,,,,0,,,20181217115959,CANCELLED,PA,WAAGE,RYANNE,201 S LLOYD ST STE E,# 206,ABERDEEN,SD,57401,6052260560,1871896431,MB2329678,,,...,,,,,ILUMYA,,,,,,,,,,,Y,MEDICAL,SANFORD HEALTH - ESI,COMMERCIAL,MEDICAL,ILUMYA COPAY ASSIST-RELAYHEALTH,CASH,,,,,,,NNNVV,2019-07-01 13:25:07,0.0.11,s3://ichain-dev/sun-extract-validation/sun/ilu...,2019-07-01 13:35:22,3,,left_only
136,20181216115959,ACCREDO,1346208949,COM,2741310020181216245,0,DIRECT,20180221120000,,27413100,,,,F,,,,,0,R69,,20181216115959,DISCONTINUED,PATIENT RESPOSNE,RANDALL,JOHN,124 SAGAMORE PKWY W,,WEST LAFAYETTE,IN,47906,7654636722,1326076530,BR1531385,,20180226.0,...,,,,,ODOMZO,,,,,,,,,,,N,PHARMACY,MEDCO HOME DELIVERY,COMMERCIAL,,,,,0.0,,,,,NNNVV,2019-07-01 13:25:07,0.0.11,s3://ichain-dev/sun-extract-validation/sun/ilu...,2019-07-01 13:35:22,3,,left_only
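
The `_merge` column with the value `left_only` in the output above is what pandas emits for a left merge run with `indicator=True`. A minimal sketch of how unmapped status/sub-status rows could be kicked out that way is shown here. The two small dataframes, the bucket label, and the names `statuses`, `bridge`, `final_fail_sketch` and `final_pass_sketch` are hypothetical illustrations, and the join keys `status_code` and `sub_status` are an assumption based on the column names in the output; the actual transform cell may differ.

```python
import pandas as pd

# Hypothetical sample of status rows; column names follow the output above.
statuses = pd.DataFrame({
    "status_code": ["ACTIVE", "CANCELLED", "DISCONTINUED"],
    "sub_status": ["SHIPPED", "PA", "PATIENT RESPOSNE"],
})

# Hypothetical bridge table: one row per mapped status/sub-status combination.
bridge = pd.DataFrame({
    "status_code": ["ACTIVE"],
    "sub_status": ["SHIPPED"],
    "patient_journey_hierarchy": ["Active - Shipped"],
})

# A left merge keeps every status row; indicator=True adds the _merge column.
merged = statuses.merge(
    bridge,
    how="left",
    on=["status_code", "sub_status"],
    indicator=True,
)

# Rows with no match in the bridge table are the "fails" to kick out.
final_fail_sketch = merged[merged["_merge"] == "left_only"]

# Rows found in both tables carry a patient journey bucket.
final_pass_sketch = merged[merged["_merge"] == "both"].drop(columns="_merge")
```

In this sketch the unmapped rows keep a null `patient_journey_hierarchy`, which matches the empty value in that column for the failed rows shown above.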


# **publish**
### Writing to S3
Invoke the `publish()` command to write to a given contract. Some things to know:
- To invoke publish, a contract must be at the grain of a dataset, because file names are set by the dataframe=\>parquet conversion.
- publish only accepts a pandas dataframe.
- publish does not allow for timedelta data types at this time (this is missing functionality in pyarrow).
- publish handles partitioning the data as per the contract, creating file paths, and creating the binary parquet files in S3, as well as the needed metadata.
- **by default, all datasets include a single partition, \_\_metadata\_run\_id, the RunEvent ID of an executed pipeline**
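
Two of the constraints above (pandas dataframe only, no timedelta columns) can be checked before calling publish. The following is a rough sketch of such a check, not part of the CORE API: the helper name `prepare_for_publish`, the choice to convert timedeltas to seconds, and the `time_to_fill` column are all assumptions made for illustration.

```python
import pandas as pd


def prepare_for_publish(df):
    """Hypothetical pre-publish check; not part of the CORE API."""
    # publish only accepts a pandas dataframe, so fail early on anything else.
    if not isinstance(df, pd.DataFrame):
        raise TypeError("publish only accepts a pandas dataframe")

    out = df.copy()

    # Timedelta columns are not allowed, so convert them to float seconds.
    # (Converting to seconds is an assumption; dropping or casting to string
    # may suit a given contract better.)
    timedelta_cols = out.select_dtypes(include=["timedelta"]).columns
    for col in timedelta_cols:
        out[col] = out[col].dt.total_seconds()

    return out


# Hypothetical example data; time_to_fill is not a column in this dataset.
sample = pd.DataFrame({
    "pharm_code": ["ACCREDO", "ACCREDO"],
    "time_to_fill": pd.to_timedelta(["1 days", "36 hours"]),
})
ready = prepare_for_publish(sample)
```

The helper returns a copy, so the dataframe passed in keeps its original dtypes.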

In [92]:

# Publish only on an explicit go; any other value is treated as a no-go.
if go==True:
    logger.info("PUBLISH - that's it - its a GO - just provide the final dataframe to the var final_dataframe and we take it from there")
    transform.publish_contract.publish(final_dataframe, run_id, session)
elif go==False:
    logger.info("PUBLISH -  go no go = NO go -  so DONT publish")
else:
    # go is neither True nor False: force it to a no-go.
    go=False
    logger.info("PUBLISH -  go no go = unknown make it NO go - so DONT publish")
# Close the session whether or not anything was published.
session.close()

2019-08-29 18:02:20,854 - core.transforms.enrich.enrich_patient_journey_hierarchy - INFO - PUBLISH - that's it - its a GO - just provide the final dataframe to the var final_dataframe and we take it from there
2019-08-29 18:02:20,860 - core.dataset_contract.DatasetContract - INFO - Publishing dataframe to s3 location s3://ichain-dev/dc-676_merge_to_patient_journey_bucket_map_for_alkermes/sun/ilumya/enrich/enrich_patient_journey_hierarchy with run ID 3.
2019-08-29 18:02:20,865 - core.dataset_contract.DatasetContract - INFO - Setting environment variables to Core sandbox service account...
2019-08-29 18:02:20,944 - urllib3.util.retry - DEBUG - Converted retries value: False -> Retry(total=False, connect=None, read=None, redirect=0, status=None)
2019-08-29 18:02:20,946 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): secretsmanager.us-east-1.amazonaws.com:443
2019-08-29 18:02:21,077 - urllib3.connectionpool - DEBUG - https://secretsmanager.us-east-1.amazonaws.com:443 


2019-08-29 18:02:21,780 - s3parq.publish_parq - DEBUG - Schema data_core created. Creating table sun_ilumya_enrich_patient_journey_hierarchy...
2019-08-29 18:02:21,816 - s3parq.publish_redshift - DEBUG - Determining write metadata for publish...
2019-08-29 18:02:21,820 - s3parq.publish_redshift - DEBUG - Determining write metadata for publish...
2019-08-29 18:02:22,224 - s3parq.publish_redshift - INFO - Running query to create table: CREATE EXTERNAL TABLE data_core.sun_ilumya_enrich_patient_journey_hierarchy (rec_date VARCHAR, pharm_code VARCHAR, pharm_npi VARCHAR, transtype VARCHAR, pharm_transaction_id VARCHAR, trans_seq VARCHAR, ref_source VARCHAR, ref_date VARCHAR, program_id VARCHAR, pharmacy_id VARCHAR, pat_last_name VARCHAR, pat_first_name VARCHAR, pat_dob VARCHAR, pat_gender VARCHAR, pat_addr1 VARCHAR, pat_addr2 VARCHAR, pat_city VARCHAR, pat_state VARCHAR, pat_zip VARCHAR, dx1_code VARCHAR, dx2_code VARCHAR, status_date VARCHAR, status_code VARCHAR, sub_status VARCHAR, pres_la

***