# CORE Cartridge Notebook::[transform name here]
![CORE Logo](assets/coreLogo.png) 

---
## Keep in Mind
Good Transforms Are...
- **singular in purpose:** good transforms do one and only one thing, and handle all known cases for that thing. 
- **repeatable:** transforms should be written so that they can be run against the same dataset an infinite number of times and produce the same result every time. 
- **easy to read:** 99 times out of 100, readable, clear code that runs a little slower is more valuable than a mess that runs quickly. 
- **no 'magic numbers':** give unexplained literal values a descriptive name, and if a variable or function is not instantly obvious as to what it is or does without context, consider renaming it.
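As a minimal sketch of the "repeatable" and "no magic numbers" points, assuming a pandas frame with the `days_supply` column seen in the sample data later in this notebook: the function below never modifies its input and names its constant, so re-running the cell cannot compound the change. `DAYS_PER_FILL` and `fill_count` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Hypothetical named constant: replaces a bare 28 buried in the logic.
DAYS_PER_FILL = 28


def add_fill_count(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of raw with a fill_count column; raw itself is never modified.

    A non-repeatable version would overwrite the input in place, e.g.
    raw["days_supply"] = raw["days_supply"] / 28, so every re-run of the cell
    would divide the already-divided values again.
    """
    out = raw.copy()
    out["fill_count"] = out["days_supply"] // DAYS_PER_FILL
    return out


raw = pd.DataFrame({"days_supply": [28, 84]})
first_run = add_fill_count(raw)
second_run = add_fill_count(raw)  # same input, run again
```

Because the input is left untouched, `first_run` and `second_run` are identical no matter how many times the cell runs.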

## Workflow - how to use this notebook to make science
#### Data Science
1. **Document your transform.** Fill out the _Description_ cell below to explain what this transform does; the description appears in the configuration application, where Ops create, configure and update pipelines. 
2. **Define your config object.** Below the format guide in the _Configuration_ section, define the variables you want Ops to set in the configuration application (their values will populate here for every pipeline). 
3. **Build your transformation logic.** Use the transformation cell to do that magic that you do. 
![caution](assets/cautionTape.png)

### Description
What does this transformation do? Be specific.

![what does your transform do](assets/what.gif)

(clear out and replace with your description)

### Configuration

In [3]:
from core.helpers.session_helper import SessionHelper
session = SessionHelper().session

2019-09-04 20:47:16,593 - core.helpers.session_helper.SessionHelper - INFO - Creating session for dev environment...
2019-09-04 20:47:16,637 - core.helpers.configuration_mocker.ConfigurationMocker - DEBUG - Generating administrator mocks.
2019-09-04 20:47:16,717 - core.helpers.configuration_mocker.ConfigurationMocker - DEBUG - Done generating administrator mocks.
2019-09-04 20:47:16,718 - core.helpers.configuration_mocker.ConfigurationMocker - DEBUG - Generating pharmaceutical company mocks.
2019-09-04 20:47:16,724 - core.helpers.configuration_mocker.ConfigurationMocker - DEBUG - Done generating pharmaceutical company mocks.
2019-09-04 20:47:16,726 - core.helpers.configuration_mocker.ConfigurationMocker - DEBUG - Generating brand mocks.
2019-09-04 20:47:16,730 - core.helpers.configuration_mocker.ConfigurationMocker - DEBUG - Done generating brand mocks.
2019-09-04 20:47:16,733 - core.helpers.configuration_mocker.ConfigurationMocker - DEBUG - Generating segment mocks.

In [15]:
"""
************ CONFIGURATION - PLEASE TOUCH **************
Pipeline Builder configuration: creates configurations from variables specified here!!
This cell will be off in production as configurations will come from the configuration postgres DB.
"""
# config vars: this dataset
config_pharma = "pharma" # the pharmaceutical company which owns {brand}
config_brand = "brand" # the brand this pipeline operates on
config_state = "raw" # the state this transform runs in
config_name = "Template" # the name of this transform, which is the name of this notebook without .ipynb

# input vars: dataset to fetch. Recall that a contract published to S3 has a key format branch/pharma/brand/state/name
#input_pharma = "alkermes"
#input_brand = "vivitrol"
#input_state = "ingest"
#input_name = "patient_status_standardize_numbers"
#input_branch = "ds-321" # if None, input_branch is automagically set to your working branch

input_pharma = "sun"
input_brand = "ilumya"
input_state = "ingest"
input_name = "patient_status_ingest_column_mapping"
input_branch = "longitudal-id" # if None, input_branch is automagically set to your working branch
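To make the key format concrete, here is a minimal sketch that composes a key from `input_*` values like the ones above. `contract_key` and its `working_branch` argument are hypothetical; this is not how `DatasetContract` builds its location internally. For the values set in this cell, the fetch log further down reports `s3://ichain-dev/longitudal-id/sun/ilumya/ingest/patient_status_ingest_column_mapping`, i.e. the bucket followed by a key of this shape.

```python
from typing import Optional


def contract_key(branch: Optional[str], pharma: str, brand: str, state: str,
                 name: str, working_branch: str) -> str:
    """Compose a branch/pharma/brand/state/name key (hypothetical helper).

    Mirrors the notebook's fallback: a falsy branch means "use my working branch".
    """
    branch = branch or working_branch
    return "/".join([branch, pharma, brand, state, name])


key = contract_key("longitudal-id", "sun", "ilumya", "ingest",
                   "patient_status_ingest_column_mapping", working_branch="ds-321")
fallback_key = contract_key(None, "sun", "ilumya", "ingest",
                            "patient_status_ingest_column_mapping", working_branch="ds-321")
```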

In [16]:
"""
************ SETUP - DON'T TOUCH **************
Populating config mocker based on config parameters...
"""
import core.helpers.pipeline_builder as builder

ids = builder.build(config_pharma, config_brand, config_state, config_name, session)
transform_id = ids[0]
run_id = ids[1]

2019-09-04 20:56:22,048 - core.logging - DEBUG - Adding/getting mocks for specified configurations...
2019-09-04 20:56:22,075 - core.logging - DEBUG - Done. Creating mock run event and committing results to configuration mocker.


In [17]:
"""
************ SETUP - DON'T TOUCH **************
This section imports data from the configuration database
and should not need to be altered or otherwise messed with. 
~~These are not the droids you are looking for~~
"""
from core.constants import BRANCH_NAME, ENV_BUCKET
from core.helpers.session_helper import SessionHelper
from core.models.configuration import Transformation
from dataclasses import dataclass
from core.dataset_contract import DatasetContract

db_transform = session.query(Transformation).filter(Transformation.id == transform_id).one()

@dataclass
class DbTransform:
    id: int = db_transform.id ## the instance id of the transform in the config app
    name: str = db_transform.transformation_template.name ## the transform name in the config app
    state: str = db_transform.pipeline_state.pipeline_state_type.name ## the pipeline state, one of raw, ingest, master, enhance, enrich, metrics, dimensional
    branch: str = BRANCH_NAME ## the git branch for this execution 
    brand: str = db_transform.pipeline_state.pipeline.brand.name ## the pharma brand name
    pharmaceutical_company: str = db_transform.pipeline_state.pipeline.brand.pharmaceutical_company.name ## the pharma company name
    publish_contract: DatasetContract = DatasetContract(branch=BRANCH_NAME,
                            state=db_transform.pipeline_state.pipeline_state_type.name,
                            parent=db_transform.pipeline_state.pipeline.brand.pharmaceutical_company.name,
                            child=db_transform.pipeline_state.pipeline.brand.name,
                            dataset=db_transform.transformation_template.name)
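One consequence of this pattern: dataclass defaults are evaluated once, when the class statement runs, so every `DbTransform` carries the values `db_transform` had at that moment. If you rebuild with different `config_*` values, re-run this cell too. The standalone sketch below shows the behavior, plus `field(default_factory=...)` as a per-instance alternative; `source`, `CapturedOnce` and `EvaluatedPerInstance` are illustrative names, and the notebook itself uses plain defaults.

```python
from dataclasses import dataclass, field

# Stand-in for the db_transform row queried from the configuration database.
source = {"brand": "ilumya"}


@dataclass
class CapturedOnce:
    brand: str = source["brand"]  # evaluated once, when the class is defined


@dataclass
class EvaluatedPerInstance:
    # default_factory is called each time an instance is created
    brand: str = field(default_factory=lambda: source["brand"])


source["brand"] = "other-brand"  # the underlying record changes after the class exists
captured = CapturedOnce()
fresh = EvaluatedPerInstance()
```

`captured.brand` is still `"ilumya"`, while `fresh.brand` picks up `"other-brand"`.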


In [18]:
""" 
********* VARIABLES - PLEASE TOUCH ********* 
This section defines what you expect to get from the configuration application 
in a single "transform" object. Define the vars you need here, and comment inline to the right of them 
for all-in-one documentation. 
Engineering will build a production "transform" object for every pipeline that matches what you define here.

@@@ FORMAT OF THE DATA CLASS IS: @@@ 

<variable_name>: <data_type> #<comment explaining what the value is to future us>

e.g.

class Transform(DbTransform):
    some_ratio: float
    site_name: str

~~These ARE the droids you are looking for~~
"""

class Transform(DbTransform):
    '''
    YOUR properties go here!!
    Variable properties should be assigned to the exact name of
    the transformation as it appears in the Jupyter notebook filename.
    '''

transform = Transform()
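Because `Transform` is not itself decorated with `@dataclass`, a bare annotation such as `some_ratio: float` only records a type hint: it is not a constructor argument and creates no attribute, which is why the next cell assigns development values by hand. A minimal sketch with a hypothetical stand-in base class:

```python
from dataclasses import dataclass


@dataclass
class DbTransformStub:  # hypothetical stand-in for DbTransform
    name: str = "Template"
    state: str = "raw"


class Transform(DbTransformStub):
    some_ratio: float  # annotation only: no value exists yet
    site_name: str  # annotation only: no value exists yet


transform = Transform()  # inherited __init__ only knows name and state
had_ratio_before = hasattr(transform, "some_ratio")  # False: nothing assigned yet
transform.some_ratio = 0.6
transform.site_name = "WALGREENS"
```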

In [19]:
## Please place your value assignments for development here!!
## This cell will be turned off in production, and Engineering will set it to pull values from the configuration application instead
## For the example above, this could look like...
## transform.some_ratio = 0.6
## transform.site_name = "WALGREENS"

### Transformation

In [20]:
"""
************ FETCH DATA - TOUCH, BUT CAREFULLY **************
This cell will be turned off in production, as the input_contract will be handled by the pipeline.
"""

if not input_branch:
    input_branch = BRANCH_NAME
input_contract = DatasetContract(branch=input_branch, state=input_state, parent=input_pharma, child=input_brand, dataset=input_name)
run_filter = []
# run_filter.append(dict(partition="run_id", comparison="==", values=[1]))
# IF YOU HAVE PUBLISHED DATA MULTIPLE TIMES, uncomment the above line and change the int to the run_id to fetch.
# Otherwise, you will have duplicate values in your fetched dataset!
sun = input_contract.fetch(filters=run_filter)
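Filtering on `run_id` at fetch time is the approach this cell sets up. If you have already fetched a frame containing several runs, one pandas fallback is to keep only the rows with the highest `__metadata_run_id` (a column visible in the sample output below). This sketch assumes a higher run id means a more recent publish; `latest_run_only` is a hypothetical helper.

```python
import pandas as pd


def latest_run_only(df: pd.DataFrame, run_col: str = "__metadata_run_id") -> pd.DataFrame:
    """Keep only rows written by the highest run id (assumed to be the latest publish)."""
    return df[df[run_col] == df[run_col].max()].copy()


fetched = pd.DataFrame({
    "__metadata_run_id": [1, 1, 2, 2],
    "rx_number": ["a", "b", "a", "b"],
})
deduped = latest_run_only(fetched)
```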

2019-09-04 20:56:25,293 - core.dataset_contract.DatasetContract - INFO - Fetching dataframe from s3 location s3://ichain-dev/longitudal-id/sun/ilumya/ingest/patient_status_ingest_column_mapping.


of pandas will change to not sort by default.
To accept the future behavior, pass 'sort=False'.
  return pd.concat(temp_frames)


In [21]:
import numpy as np
import pandas as pd

In [22]:
pd.set_option('display.max_columns', 150)
pd.set_option('display.max_rows', 500)

In [23]:
sun.head()

Unnamed: 0,__metadata_app_version,__metadata_output_contract,__metadata_run_id,__metadata_run_timestamp,__metadata_transform_timestamp,aggregator_transaction_id,brand,bridge_patient,bridge_quantity_dispensed,bridge_quantity_dispensed_2,copay_as_amount,customer_status,customer_status_description,customer_substatus,days_supply,dose_count,dose_exchange_count,dose_exchange_flag,dose_titration_count,dose_titration_quantity,dx_1,dx_2,enroll_received_date,fitness_for_duty_request_flag,fitness_for_duty_ship_date,has_medical_coverage_flag,hcp_address_1,hcp_address_2,hcp_city,hcp_dea_number,hcp_facility,hcp_first_name,hcp_last_name,hcp_middle_name,hcp_npi,hcp_phone,hcp_specialty,hcp_state,hcp_state_license_number,hcp_suffix,hcp_zip,hub_patient,hub_patient_id,longitudinal_patient_id,medication,ndc,other_payer_amount,oxygen_flag,patient_consent_date,patient_dob,patient_gender,patient_oop_program_name,patient_state,patient_support_1,patient_support_2,patient_zip,pharmacy_address_1,pharmacy_address_2,pharmacy_city,pharmacy_code,pharmacy_dea_number,pharmacy_hin,pharmacy_name,pharmacy_ncpdp,pharmacy_npi,pharmacy_parent_name,pharmacy_patient_id,pharmacy_state,pharmacy_transaction_id,pharmacy_zip,prev_dispensed,primary_coins,primary_copay,primary_cost_amount,primary_cost_type,primary_coverage_type,primary_deductible,primary_patient_responsibility,primary_payer,primary_payer_bin,primary_payer_group,primary_payer_iin,primary_payer_pcn,primary_payer_subtype,primary_payer_type,primary_pbm_name,primary_plan,primary_plan_paid,primary_plan_type,primary_prior_auth_expiration_date,primary_prior_auth_required_flag,prior_therapy_name,quantity_dispensed,referral_date,referral_number,referral_source,restatement_flag,rx_date,rx_fill_number,rx_fills,rx_number,rx_refills_remaining,secondary_coins,secondary_copay,secondary_coverage_type,secondary_deductible,secondary_patient_responsibility,secondary_payer,secondary_payer_bin,secondary_payer_flag,secondary_payer_group,secondary_payer_iin,secondary_payer_pcn,secondary_payer_subtype,secondary_payer_type,secondary_plan,secondary_plan_paid,secondary_plan_type,ship_address_1,ship_address_2,ship_carrier,ship_city,ship_date,ship_location,ship_state,ship_tracking_id,ship_zip,status,status_date,substatus,transaction_date,transaction_sequence,transaction_type,transfer_pharmacy,triage_date,uom_dispensed
0,0.0.11,s3://ichain-dev/longitudal-id/sun/ilumya/inges...,1,2019-08-19 19:48:40,2019-08-19 19:52:24,,,,,,,CANCELLED,Patient End,,28,,,,,,L40.0,,,,,N,9015 US HIGHWAY 301 N,,PARRISH,,,BRETT,BLAKE,,1376772707,9417761577,,FL,,,34219,,,,ILUMYA 100MG/ML PFS INJ,47335017795,,,,,F,,,,,34,,,,,,,BRV,,1083045140,,406800804,,BRIOVARX_20190819_144786231,,,,,,,PHARMACY,,,UHC C AND S,,,,,,MEDICARE D,,,,,,,,1,2019080609:35:15,,DIRECT,,20190422,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2019081821:52:26,,2019081907:06:51,0,COM,,,
1,0.0.11,s3://ichain-dev/longitudal-id/sun/ilumya/inges...,1,2019-08-19 19:48:40,2019-08-19 19:52:24,,,,,,,CANCELLED,Patient End,,84,,,,,,L40.0,,,,,N,9015 US HIGHWAY 301 N,,PARRISH,,,BRETT,BLAKE,,1376772707,9417761577,,FL,,,34219,,,,ILUMYA 100MG/ML PFS INJ,47335017795,,,,,F,,,,,34,,,,,,,BRV,,1083045140,,406800804,,BRIOVARX_20190819_144786233,,,,,,,PHARMACY,,,UHC C AND S,,,,,,MEDICARE D,,,,,,,,1,2019080609:35:15,,DIRECT,,20190422,0,5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2019081821:52:26,,2019081907:06:51,0,COM,,,
2,0.0.11,s3://ichain-dev/longitudal-id/sun/ilumya/inges...,1,2019-08-19 19:48:40,2019-08-19 19:52:24,,,,,,,CANCELLED,Patient End,,84,,,,,,L40.0,,,,,N,9015 US HIGHWAY 301 N,,PARRISH,,,BRETT,BLAKE,,1376772707,9417761577,,FL,,,34219,,,,ILUMYA 100MG/ML PFS INJ,47335017795,,,,,F,,,,,34,,,,,,,BRV,,1083045140,,406800804,,BRIOVARX_20190819_144786232,,,,,,,PHARMACY,,,UHC C AND S,,,,,,MEDICARE D,,,,,,,,1,2019080609:35:15,,DIRECT,,20190422,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2019081821:52:26,,2019081907:06:51,0,COM,,,
3,0.0.11,s3://ichain-dev/longitudal-id/sun/ilumya/inges...,1,2019-08-19 19:48:40,2019-08-19 19:52:24,,,,,,,CANCELLED,Patient End,,28,,,,,,L40.0,,,,,N,9015 US HIGHWAY 301 N,,PARRISH,,,BRETT,BLAKE,,1376772707,9417761577,,FL,,,34219,,,,ILUMYA 100MG/ML PFS INJ,47335017795,,,,,F,,,,,34,,,,,,,BRV,,1083045140,,406800804,,BRIOVARX_20190819_144786231,,,,,,,PHARMACY,,,UHC C AND S,,,,,,MEDICARE D,,,,,,,,1,2019080609:35:15,,DIRECT,,20190422,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2019081821:52:26,,2019081907:06:51,0,COM,,,
4,0.0.11,s3://ichain-dev/longitudal-id/sun/ilumya/inges...,1,2019-08-19 19:48:40,2019-08-19 19:52:24,,,,,,,CANCELLED,Patient End,,84,,,,,,L40.0,,,,,N,9015 US HIGHWAY 301 N,,PARRISH,,,BRETT,BLAKE,,1376772707,9417761577,,FL,,,34219,,,,ILUMYA 100MG/ML PFS INJ,47335017795,,,,,F,,,,,34,,,,,,,BRV,,1083045140,,406800804,,BRIOVARX_20190819_144786233,,,,,,,PHARMACY,,,UHC C AND S,,,,,,MEDICARE D,,,,,,,,1,2019080609:35:15,,DIRECT,,20190422,0,5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2019081821:52:26,,2019081907:06:51,0,COM,,,
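The frame is wide, and many columns are empty in these rows. A quick way to orient yourself is to separate the `__metadata_*` bookkeeping columns from the source columns and rank columns by their share of missing values. A sketch using only standard pandas calls; the tiny frame stands in for `sun`, and the helper names are hypothetical.

```python
import pandas as pd

METADATA_PREFIX = "__metadata_"  # prefix shared by the bookkeeping columns


def split_metadata_columns(df: pd.DataFrame):
    """Return (metadata_columns, source_columns) in their original order."""
    meta = [c for c in df.columns if c.startswith(METADATA_PREFIX)]
    source = [c for c in df.columns if not c.startswith(METADATA_PREFIX)]
    return meta, source


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, emptiest first."""
    return df.isna().mean().sort_values(ascending=False)


sample = pd.DataFrame({
    "__metadata_run_id": [1, 1],
    "brand": [None, None],
    "days_supply": [28, 84],
})
meta_cols, source_cols = split_metadata_columns(sample)
shares = missing_share(sample)
```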


In [None]:
### Use the variables above to execute your transformation. The final output needs to be a variable named final_dataframe
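As an illustration of a single-purpose, repeatable transform: in the sample rows above, `status_date`, `transaction_date` and `referral_date` appear to hold compact timestamps such as `2019081821:52:26`, which look like a `YYYYMMDDHH:MM:SS` layout. The sketch below parses such columns into real datetimes on a copy of the input. The format string is inferred from the sample, `parse_compact_timestamps` is a hypothetical helper rather than part of the CORE library, and unparseable values become `NaT` instead of raising.

```python
import pandas as pd

# Inferred from sample values such as "2019081821:52:26".
COMPACT_TIMESTAMP_FORMAT = "%Y%m%d%H:%M:%S"


def parse_compact_timestamps(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy of df with the given columns parsed to datetimes (bad values -> NaT)."""
    out = df.copy()
    for col in columns:
        out[col] = pd.to_datetime(out[col], format=COMPACT_TIMESTAMP_FORMAT, errors="coerce")
    return out


sample = pd.DataFrame({"status_date": ["2019081821:52:26", None]})
final_dataframe = parse_compact_timestamps(sample, ["status_date"])
```

Against the fetched data this might read `final_dataframe = parse_compact_timestamps(sun, ["status_date", "transaction_date", "referral_date"])`, assuming those columns arrive as strings.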

### Publish

In [None]:
## That's it: assign your final dataframe to the variable final_dataframe and we take it from there
transform.publish_contract.publish(final_dataframe, run_id, session)
session.close()
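Since publishing is the last step, a cheap guard against publishing the wrong object can sit just before the `publish` call. A minimal sketch; `check_publishable` and the checks it performs are hypothetical and not part of `DatasetContract`.

```python
import pandas as pd


def check_publishable(df) -> None:
    """Raise ValueError unless df is a non-empty DataFrame with unique column names."""
    if not isinstance(df, pd.DataFrame):
        raise ValueError(f"final_dataframe must be a pandas DataFrame, got {type(df).__name__}")
    if df.empty:
        raise ValueError("final_dataframe is empty")
    duplicated = df.columns[df.columns.duplicated()].tolist()
    if duplicated:
        raise ValueError(f"duplicate column names: {duplicated}")


final_dataframe = pd.DataFrame({"status_date": ["2019081821:52:26"]})
check_publishable(final_dataframe)  # passes silently
```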