# CORE Cartridge Notebook :: Dispense Ingest Column Mapping
![CORE Logo](assets/coreLogo.png) 

---
## Keep in Mind
Good Transforms Are...
- **singular in purpose:** good transforms do one and only one thing, and handle all known cases for that thing. 
- **repeatable:** transforms should be written so that they can be run against the same dataset any number of times and produce the same result every time. 
- **easy to read:** 99 times out of 100, readable, clear code that runs a little slower is more valuable than a mess that runs quickly. 
- **free of 'magic numbers':** if it is not instantly obvious, without context, what a variable holds or what a function does, consider renaming it.
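
The **repeatable** rule is what engineers call idempotence: running a step twice must leave the data exactly as running it once. The sketch below contrasts the two behaviours using in-memory stand-ins for a dataset. The function names and the list/dict "stores" are hypothetical; keying output by run id only mirrors the `run_id` partition filter this notebook uses when it fetches data.

```python
def publish_by_appending(store: list, rows: list) -> None:
    # NOT repeatable: every re-run adds the same rows again
    store.extend(rows)


def publish_by_run(store: dict, run_id: int, rows: list) -> None:
    # Repeatable: a re-run replaces whatever was previously written for this run_id
    store[run_id] = list(rows)


rows = [{"rx_number": "A1"}, {"rx_number": "B2"}]

appended: list = []
publish_by_appending(appended, rows)
publish_by_appending(appended, rows)  # simulate re-running the pipeline

by_run: dict = {}
publish_by_run(by_run, 42, rows)
publish_by_run(by_run, 42, rows)  # the same re-run

print(len(appended))    # 4: duplicated rows
print(len(by_run[42]))  # 2: same result as a single run
```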

## Workflow - how to use this notebook to make science
#### Data Science
1. **Document your transform.** Fill out the _description_ cell below describing what this transform does; it will appear in the configuration application, where Ops will create, configure and update pipelines. 
2. **Define your config object.** Fill out the _configuration_ cell below (following the commented-out guide) to define the variables you want Ops to set in the configuration application; these will populate here for every pipeline. 
3. **Build your transformation logic.** Use the transformation cell to do that magic that you do. 
![caution](assets/cautionTape.png)

### Description
What does this transformation do? Be specific.

![what does your transform do](assets/what.gif)

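This transform reads dispense data from the initial ingest location, renames its columns to our predetermined dispense schema using the column mapping configured below, skips any file that is missing required columns, and publishes the combined result.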

### Configuration

In [None]:
from core.helpers.session_helper import SessionHelper
session = SessionHelper().session

In [None]:
"""
************ SETUP - DON'T TOUCH **************
This section imports data from the configuration database
and should not need to be altered or otherwise messed with. 
~~These are not the droids you are looking for~~
"""
from core.constants import BRANCH_NAME, ENV_BUCKET
from core.helpers.session_helper import SessionHelper
from core.models.configuration import Transformation
from dataclasses import dataclass
from core.dataset_contract import DatasetContract

db_transform = session.query(Transformation).filter(Transformation.id == transform_id).one()

@dataclass
class DbTransform:
    id: int = db_transform.id ## the instance id of the transform in the config app
    name: str = db_transform.transformation_template.name ## the transform name in the config app
    state: str = db_transform.pipeline_state.pipeline_state_type.name ## the pipeline state, one of raw, ingest, master, enhance, enrich, metrics, dimensional
    branch: str = BRANCH_NAME ## the git branch for this execution
    brand: str = db_transform.pipeline_state.pipeline.brand.name ## the pharma brand name
    pharmaceutical_company: str = db_transform.pipeline_state.pipeline.brand.pharmaceutical_company.name # the pharma company name
    publish_contract: DatasetContract = DatasetContract(branch=BRANCH_NAME,
                            state=db_transform.pipeline_state.pipeline_state_type.name,
                            parent=db_transform.pipeline_state.pipeline.brand.pharmaceutical_company.name,
                            child=db_transform.pipeline_state.pipeline.brand.name,
                            dataset=db_transform.transformation_template.name)
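
Every default in `DbTransform` (and in `Transform` below) is read from `db_transform` once, when the class statement executes, not each time an instance is created. If you re-run this cell for a different `transform_id`, re-run the VARIABLES cell as well, or `Transform` keeps the values it captured the first time. A minimal, self-contained illustration; the `config` dict and `Example` class are hypothetical stand-ins for `db_transform` and `DbTransform`:

```python
from dataclasses import dataclass

# Hypothetical stand-in for the db_transform row loaded from the configuration database
config = {"name": "dispense_ingest"}


@dataclass
class Example:
    # The default is evaluated once, when this class body runs
    name: str = config["name"]


config["name"] = "changed_later"  # e.g. the source row changes after the class was defined

print(Example().name)  # dispense_ingest: the class still holds the old value
```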


In [None]:
""" 
********* VARIABLES - PLEASE TOUCH ********* 
This section defines what you expect to get from the configuration application 
in a single "transform" object. Define the vars you need here, and comment inline to the right of them 
for all-in-one documentation. 
Engineering will build a production "transform" object for every pipeline that matches what you define here.

@@@ FORMAT OF THE DATA CLASS IS: @@@ 

<variable_name>: <data_type> #<comment explaining what the value is to future us>

e.g.

class Transform(DbTransform):
    some_ratio: float
    site_name: str

~~These ARE the droids you are looking for~~
"""

class Transform(DbTransform):
    '''
    YOUR properties go here!!
    Variable properties should be assigned to the exact name of
    the transformation as it appears in the Jupyter notebook filename.
    '''
    input_transform: str = db_transform.variables.input_transform # The name of the dataset to pull from
    input_source_file_prefix: str = db_transform.variables.input_source_file_prefix # If from initial ingest, the file prefix name
    transaction_date: str = db_transform.variables.transaction_date # Date and time record was generated in SP system
    pharmacy_code: str = db_transform.variables.pharmacy_code # Unique identifier for submitting pharmacy.
    pharmacy_npi: str = db_transform.variables.pharmacy_npi # NPI of reporting/dispensing pharmacy
    pharmacy_hin: str = db_transform.variables.pharmacy_hin # HIN of reporting pharmacy
    pharmacy_name: str = db_transform.variables.pharmacy_name # Name of reporting pharmacy
    pharmacy_ncpdp: str = db_transform.variables.pharmacy_ncpdp # NCPDP of reporting pharmacy
    pharmacy_address_1: str = db_transform.variables.pharmacy_address_1 # Address of reporting pharmacy
    pharmacy_address_2: str = db_transform.variables.pharmacy_address_2 # Address of reporting pharmacy
    pharmacy_city: str = db_transform.variables.pharmacy_city # City of reporting pharmacy
    pharmacy_state: str = db_transform.variables.pharmacy_state # State of reporting pharmacy
    pharmacy_zip: str = db_transform.variables.pharmacy_zip # Zip code of reporting pharmacy
    transaction_type: str = db_transform.variables.transaction_type # Code set indicating activity category for transaction within SP system.
    pharmacy_transaction_id: str = db_transform.variables.pharmacy_transaction_id # Value that can uniquely identify the transaction in the SP's system.
    transaction_id: str = db_transform.variables.transaction_id # Internal transaction ID 
    transaction_sequence: str = db_transform.variables.transaction_sequence # If the transaction was previously reported, increment by one for each restatement.
    referral_source: str = db_transform.variables.referral_source # Source of where the patient was referred to SP
    referral_date: str = db_transform.variables.referral_date # Date and time the patient referral was received at SP and entered into system.
    longitudinal_patient_id: str = db_transform.variables.longitudinal_patient_id # Unique tokenized ID used to identify patient across all datasets
    pharmacy_patient_id: str = db_transform.variables.pharmacy_patient_id # Patient's encrypted internal ID at reporting/dispensing pharmacy
    patient_dob: str = db_transform.variables.patient_dob # Patient Date of Birth
    hub_patient_id: str = db_transform.variables.hub_patient_id # Patient's ID set by HUB program
    bridge_patient: str = db_transform.variables.bridge_patient # Flag to indicate if patient was referred to Bridge
    hub_patient: str = db_transform.variables.hub_patient # Flag to indicate if patient was referred to Hub
    patient_state: str = db_transform.variables.patient_state # Patient State
    patient_zip: str = db_transform.variables.patient_zip # Patient Zip Code
    patient_gender: str = db_transform.variables.patient_gender # Patient Gender
    dx_1: str = db_transform.variables.dx_1 # Primary ICD-10 Diagnostic Code
    dx_2: str = db_transform.variables.dx_2 # Secondary ICD-10 Dx Code
    status_date: str = db_transform.variables.status_date # Date and time that patient was assigned status / substatus combination
    status: str = db_transform.variables.status # Status Code
    substatus: str = db_transform.variables.substatus # Sub-Status Code
    customer_status: str = db_transform.variables.customer_status # Status provided by customer
    customer_substatus: str = db_transform.variables.customer_substatus # Sub Status provided by customer
    customer_status_description: str = db_transform.variables.customer_status_description # Description of Customer Sub Status
    hcp_last_name: str = db_transform.variables.hcp_last_name # Prescriber Last Name
    hcp_first_name: str = db_transform.variables.hcp_first_name # Prescriber First Name
    hcp_address_1: str = db_transform.variables.hcp_address_1 # Prescriber Address Line 1
    hcp_address_2: str = db_transform.variables.hcp_address_2 # Prescriber Address Line 2
    hcp_city: str = db_transform.variables.hcp_city # Prescriber City
    hcp_state: str = db_transform.variables.hcp_state # Prescriber State
    hcp_zip: str = db_transform.variables.hcp_zip # Prescriber Zip Code
    hcp_phone: str = db_transform.variables.hcp_phone # Prescriber Phone Number
    hcp_specialty: str = db_transform.variables.hcp_specialty # Prescriber Specialty
    hcp_npi: str = db_transform.variables.hcp_npi # Prescriber NPI
    hcp_dea_number: str = db_transform.variables.hcp_dea_number # Prescriber DEA
    hcp_facility: str = db_transform.variables.hcp_facility # Prescriber Clinic / Practice Name
    rx_date: str = db_transform.variables.rx_date # Date Rx was written by prescriber
    rx_number: str = db_transform.variables.rx_number # Encrypted Rx Number assigned by pharmacy
    rx_fills: str = db_transform.variables.rx_fills # Number of refills allowed on original Rx
    rx_fill_number: str = db_transform.variables.rx_fill_number # Fill Number for Rx
    rx_refills_remaining: str = db_transform.variables.rx_refills_remaining # Number of refills remaining on Rx
    prev_dispensed: str = db_transform.variables.prev_dispensed # Number of prior dispenses on brand
    ndc: str = db_transform.variables.ndc # NDC-11 Number on original RX
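    brand: str = db_transform.variables.brand # Product brand represented by NDC. NOTE: read this as Transform.brand; DbTransform's __init__ sets transform.brand (the instance attribute) to the pharma brand name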
    medication: str = db_transform.variables.medication # Product name represented by NDC
    quantity_dispensed: str = db_transform.variables.quantity_dispensed # Amount of product authorized or dispensed
    uom_dispensed: str = db_transform.variables.uom_dispensed # Unit of Measure for quantity of product dispensed
    days_supply: str = db_transform.variables.days_supply # Number of therapy days in product supply dispensed.
    ship_date: str = db_transform.variables.ship_date # Date and time product or educational materials were shipped to patient
    ship_carrier: str = db_transform.variables.ship_carrier # Name of delivery courier.
    ship_tracking_id: str = db_transform.variables.ship_tracking_id # Tracking Number of shipment
    ship_location: str = db_transform.variables.ship_location # Indicator of delivery address.
    ship_address_1: str = db_transform.variables.ship_address_1 # Delivery Address Line 1
    ship_address_2: str = db_transform.variables.ship_address_2 # Delivery Address Line 2
    ship_city: str = db_transform.variables.ship_city # Delivery City
    ship_state: str = db_transform.variables.ship_state # Delivery State
    ship_zip: str = db_transform.variables.ship_zip # Delivery Zip Code
    has_medical_coverage_flag: str = db_transform.variables.has_medical_coverage_flag # Flag indicating if patient has medical coverage.
    primary_coverage_type: str = db_transform.variables.primary_coverage_type # Type of primary benefit coverage
    primary_payer: str = db_transform.variables.primary_payer # Name of primary payer for patient
    primary_payer_type: str = db_transform.variables.primary_payer_type # Primary Payer Type
    primary_payer_subtype: str = db_transform.variables.primary_payer_subtype # Primary Payer Sub-Type
    primary_payer_group: str = db_transform.variables.primary_payer_group # Primary Payer Group Identifier
    primary_payer_bin: str = db_transform.variables.primary_payer_bin # Primary Payer Bank Identification Number
    primary_payer_iin: str = db_transform.variables.primary_payer_iin # Primary Payer Issuer Identification Number
    primary_payer_pcn: str = db_transform.variables.primary_payer_pcn # Primary Payer Processor Control Number
    primary_plan: str = db_transform.variables.primary_plan # Primary Plan Name
    primary_plan_type: str = db_transform.variables.primary_plan_type # Primary Plan Type
    secondary_coverage_type: str = db_transform.variables.secondary_coverage_type # Type of secondary benefit coverage
    secondary_payer_name: str = db_transform.variables.secondary_payer_name # Name of secondary payer for patient
    secondary_payer_type: str = db_transform.variables.secondary_payer_type # Secondary Payer Type
    secondary_payer_subtype: str = db_transform.variables.secondary_payer_subtype # Secondary Payer Sub-Type
    secondary_payer_group: str = db_transform.variables.secondary_payer_group # Secondary Payer Group Identifier
    secondary_payer_bin: str = db_transform.variables.secondary_payer_bin # Secondary Payer Bank Identification Number
    secondary_payer_iin: str = db_transform.variables.secondary_payer_iin # Secondary Payer Issuer Identification Number
    secondary_payer_pcn: str = db_transform.variables.secondary_payer_pcn # Secondary Payer Processor Control Number
    secondary_plan: str = db_transform.variables.secondary_plan # Secondary Payer Plan Name
    secondary_plan_type: str = db_transform.variables.secondary_plan_type # Secondary Payer Plan Type
    primary_plan_paid: str = db_transform.variables.primary_plan_paid # Amount paid by Primary Payer / PBM to Pharmacy
    secondary_plan_paid: str = db_transform.variables.secondary_plan_paid # Amount paid by Secondary Payer / PBM to Pharmacy
    primary_copay: str = db_transform.variables.primary_copay # Patient copay related to primary payer
    primary_coins: str = db_transform.variables.primary_coins # Patient coinsurance related to primary payer
    primary_deductible: str = db_transform.variables.primary_deductible # Patient deductible related to primary payer
    primary_patient_responsibility: str = db_transform.variables.primary_patient_responsibility # Total patient responsibility dollar amount (Copay, Coinsurance, and/or Deductible) set by Primary Payer
    secondary_copay: str = db_transform.variables.secondary_copay # Patient copay related to secondary payer
    secondary_coins: str = db_transform.variables.secondary_coins # Patient coinsurance related to secondary payer
    secondary_deductible: str = db_transform.variables.secondary_deductible # Patient deductible related to secondary payer
    secondary_patient_responsibility: str = db_transform.variables.secondary_patient_responsibility # Total patient responsibility dollar amount (Copay, Coinsurance, and/or Deductible) set by Secondary Payer
    copay_as_amount: str = db_transform.variables.copay_as_amount # Total amount received by pharmacy from copay assistance program.
    other_payer_amount: str = db_transform.variables.other_payer_amount # Total amount paid by additional payers including amounts listed in CopayAssistAmount
    primary_pbm: str = db_transform.variables.primary_pbm # Name of Primary PBM e.g. Cigna, Express Scripts, etc. Pharmacy benefit manager (PBM) is a third-party administrator of prescription drug programs for commercial health plans, self-insured employer plans, Medicare Part D plans, the Federal Employees Health Benefits Program, and state government employee plans.
    hcp_middle_name: str = db_transform.variables.hcp_middle_name # Prescriber middle name
    hcp_suffix: str = db_transform.variables.hcp_suffix # Prescriber suffix, e.g. Jr., Sr., etc.
    pharmacy_dea_number: str = db_transform.variables.pharmacy_dea_number # Pharmacy DEA number/ID
    aggregator_ship_id: str = db_transform.variables.aggregator_ship_id # Aggregator (e.g. liquidhub, mckesson, symphony, etc.) shipment ID
    referral_number: str = db_transform.variables.referral_number # Referral number / ID
    primary_cost_type: str = db_transform.variables.primary_cost_type # Primary benefit cost share type
    primary_cost_amount: str = db_transform.variables.primary_cost_amount # Primary benefit cost share amount
    primary_payer_prior_auth_expiration_date: str = db_transform.variables.primary_payer_prior_auth_expiration_date # Primary Payer Prior Authorization (PA) exp. date
    copay_card_used_flag: str = db_transform.variables.copay_card_used_flag # Alkermes - this is a Y/N flag for copay card used

transform = Transform()
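
`Transform` declares its own `brand` attribute (the source column holding the product brand), but `Transform` is not itself decorated with `@dataclass`, so `Transform()` runs the `__init__` generated for `DbTransform`. That `__init__` assigns every `DbTransform` field, including `brand`, as an instance attribute, and instance attributes take precedence over class attributes. As a result `transform.brand` returns the pharma brand name, not the configured column, while `Transform.brand` still holds the configured value. The stand-in classes below (hypothetical names and values) reproduce the behaviour:

```python
from dataclasses import dataclass


@dataclass
class Base:
    # Plays the role of DbTransform.brand (the pharma brand name)
    brand: str = "SomePharmaBrand"


class Child(Base):
    # Plays the role of Transform.brand (a configured source column name).
    # Child is not decorated with @dataclass, so Child() runs Base.__init__.
    brand: str = "BRAND_COL"


child = Child()
print(child.brand)  # SomePharmaBrand: Base.__init__ set an instance attribute
print(Child.brand)  # BRAND_COL: the class attribute is still there
print(vars(child))  # {'brand': 'SomePharmaBrand'}
```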

In [None]:
column_renames = {
    'transaction_date' : transform.transaction_date,
    'pharmacy_code' : transform.pharmacy_code,
    'pharmacy_npi' : transform.pharmacy_npi,
    'pharmacy_hin' : transform.pharmacy_hin,
    'pharmacy_name' : transform.pharmacy_name,
    'pharmacy_ncpdp' : transform.pharmacy_ncpdp,
    'pharmacy_address_1' : transform.pharmacy_address_1,
    'pharmacy_address_2' : transform.pharmacy_address_2,
    'pharmacy_city' : transform.pharmacy_city,
    'pharmacy_state' : transform.pharmacy_state,
    'pharmacy_zip' : transform.pharmacy_zip,
    'transaction_type' : transform.transaction_type,
    'pharmacy_transaction_id' : transform.pharmacy_transaction_id,
    'transaction_id' : transform.transaction_id,
    'transaction_sequence' : transform.transaction_sequence,
    'referral_source' : transform.referral_source,
    'referral_date' : transform.referral_date,
    'longitudinal_patient_id' : transform.longitudinal_patient_id,
    'pharmacy_patient_id' : transform.pharmacy_patient_id,
    'patient_dob' : transform.patient_dob,
    'hub_patient_id' : transform.hub_patient_id,
    'bridge_patient' : transform.bridge_patient,
    'hub_patient' : transform.hub_patient,
    'patient_state' : transform.patient_state,
    'patient_zip' : transform.patient_zip,
    'patient_gender' : transform.patient_gender,
    'dx_1' : transform.dx_1,
    'dx_2' : transform.dx_2,
    'status_date' : transform.status_date,
    'status' : transform.status,
    'substatus' : transform.substatus,
    'customer_status' : transform.customer_status,
    'customer_substatus' : transform.customer_substatus,
    'customer_status_description' : transform.customer_status_description,
    'hcp_last_name' : transform.hcp_last_name,
    'hcp_first_name' : transform.hcp_first_name,
    'hcp_address_1' : transform.hcp_address_1,
    'hcp_address_2' : transform.hcp_address_2,
    'hcp_city' : transform.hcp_city,
    'hcp_state' : transform.hcp_state,
    'hcp_zip' : transform.hcp_zip,
    'hcp_phone' : transform.hcp_phone,
    'hcp_specialty' : transform.hcp_specialty,
    'hcp_npi' : transform.hcp_npi,
    'hcp_dea_number' : transform.hcp_dea_number,
    'hcp_facility' : transform.hcp_facility,
    'rx_date' : transform.rx_date,
    'rx_number' : transform.rx_number,
    'rx_fills' : transform.rx_fills,
    'rx_fill_number' : transform.rx_fill_number,
    'rx_refills_remaining' : transform.rx_refills_remaining,
    'prev_dispensed' : transform.prev_dispensed,
    'ndc' : transform.ndc,
    'brand' : Transform.brand, # class attribute on purpose: DbTransform.__init__ sets transform.brand to the pharma brand name
    'medication' : transform.medication,
    'quantity_dispensed' : transform.quantity_dispensed,
    'uom_dispensed' : transform.uom_dispensed,
    'days_supply' : transform.days_supply,
    'ship_date' : transform.ship_date,
    'ship_carrier' : transform.ship_carrier,
    'ship_tracking_id' : transform.ship_tracking_id,
    'ship_location' : transform.ship_location,
    'ship_address_1' : transform.ship_address_1,
    'ship_address_2' : transform.ship_address_2,
    'ship_city' : transform.ship_city,
    'ship_state' : transform.ship_state,
    'ship_zip' : transform.ship_zip,
    'has_medical_coverage_flag' : transform.has_medical_coverage_flag,
    'primary_coverage_type' : transform.primary_coverage_type,
    'primary_payer' : transform.primary_payer,
    'primary_payer_type' : transform.primary_payer_type,
    'primary_payer_subtype' : transform.primary_payer_subtype,
    'primary_payer_group' : transform.primary_payer_group,
    'primary_payer_bin' : transform.primary_payer_bin,
    'primary_payer_iin' : transform.primary_payer_iin,
    'primary_payer_pcn' : transform.primary_payer_pcn,
    'primary_plan' : transform.primary_plan,
    'primary_plan_type' : transform.primary_plan_type,
    'secondary_coverage_type' : transform.secondary_coverage_type,
    'secondary_payer_name' : transform.secondary_payer_name,
    'secondary_payer_type' : transform.secondary_payer_type,
    'secondary_payer_subtype' : transform.secondary_payer_subtype,
    'secondary_payer_group' : transform.secondary_payer_group,
    'secondary_payer_bin' : transform.secondary_payer_bin,
    'secondary_payer_iin' : transform.secondary_payer_iin,
    'secondary_payer_pcn' : transform.secondary_payer_pcn,
    'secondary_plan' : transform.secondary_plan,
    'secondary_plan_type' : transform.secondary_plan_type,
    'primary_plan_paid' : transform.primary_plan_paid,
    'secondary_plan_paid' : transform.secondary_plan_paid,
    'primary_copay' : transform.primary_copay,
    'primary_coins' : transform.primary_coins,
    'primary_deductible' : transform.primary_deductible,
    'primary_patient_responsibility' : transform.primary_patient_responsibility,
    'secondary_copay' : transform.secondary_copay,
    'secondary_coins' : transform.secondary_coins,
    'secondary_deductible' : transform.secondary_deductible,
    'secondary_patient_responsibility' : transform.secondary_patient_responsibility,
    'copay_as_amount' : transform.copay_as_amount,
    'other_payer_amount' : transform.other_payer_amount,
    'primary_pbm' : transform.primary_pbm,
    'hcp_middle_name' : transform.hcp_middle_name,
    'hcp_suffix' : transform.hcp_suffix,
    'pharmacy_dea_number' : transform.pharmacy_dea_number,
    'aggregator_ship_id' : transform.aggregator_ship_id,
    'referral_number' : transform.referral_number,
    'primary_cost_type' : transform.primary_cost_type,
    'primary_cost_amount' : transform.primary_cost_amount,
    'primary_payer_prior_auth_expiration_date' : transform.primary_payer_prior_auth_expiration_date,
    'copay_card_used_flag' : transform.copay_card_used_flag
}

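required_columns = [] # columns every input file must supply; files failing check_required_columns are skipped during ingest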
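
Every key in `column_renames` is identical to the name of the `Transform` attribute that holds its source column, so the dictionary could be generated rather than maintained by hand, which keeps the two lists from drifting apart. A minimal sketch, assuming the input settings (`input_transform`, `input_source_file_prefix`) are the only non-column attributes; `SourceConfig` is a hypothetical stand-in for `Transform`. `Transform.__annotations__` holds only the fields declared in `Transform` itself, and reading values from the class rather than the instance matters for `brand`, which `DbTransform.__init__` also sets on the instance. The inverted `{source: schema}` form is the shape pandas `DataFrame.rename(columns=...)` expects; how `rename_and_correct_shape` actually consumes the mapping is not shown in this notebook.

```python
class SourceConfig:
    # Hypothetical stand-in for Transform: schema column -> source column configured by Ops
    input_transform: str = "raw_dispense"
    pharmacy_npi: str = "PHARM_NPI"
    ndc: str = "NDC11"
    hcp_suffix: str = ""  # not supplied by this source


# Settings that describe the input rather than a column mapping
NON_COLUMN_FIELDS = {"input_transform", "input_source_file_prefix"}

column_renames = {
    name: getattr(SourceConfig, name)  # read from the class, not an instance
    for name in SourceConfig.__annotations__
    if name not in NON_COLUMN_FIELDS
}

# Invert to {source_name: schema_name}, skipping columns the source does not supply
source_to_schema = {source: target for target, source in column_renames.items() if source}

print(column_renames)    # {'pharmacy_npi': 'PHARM_NPI', 'ndc': 'NDC11', 'hcp_suffix': ''}
print(source_to_schema)  # {'PHARM_NPI': 'pharmacy_npi', 'NDC11': 'ndc'}
```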

In [None]:
from core.logging import get_logger
logger = get_logger(f"core.transforms.{transform.state}.{transform.name}")
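
The logger name is a dotted path (`core.transforms.<state>.<name>`). Assuming `core.logging.get_logger` wraps the standard library's `logging.getLogger` (its implementation is not shown here), that naming places every transform's logger under a shared `core` parent, so verbosity can be raised for all transforms, or for a single pipeline state, from one place. A stdlib-only sketch with a hypothetical transform name:

```python
import logging

# Configure the shared parent once
logging.getLogger("core").setLevel(logging.DEBUG)

# A per-transform logger named the same way as in the cell above (state and name are hypothetical)
transform_logger = logging.getLogger("core.transforms.ingest.dispense_ingest_column_mapping")

# The child sets no level of its own, so it inherits DEBUG from "core"
print(transform_logger.level)                                 # 0 (NOTSET)
print(transform_logger.getEffectiveLevel() == logging.DEBUG)  # True
```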

In [None]:
## Please place your value assignments for development here!!
## This cell will be turned off in production, and Engineering will set these values from the configuration application instead.
## For the example in the VARIABLES cell above, this could look like...
## transform.some_ratio = 0.6
## transform.site_name = "WALGREENS"

### Transformation

In [None]:
import pandas as pd
from s3parq import fetch

from core.column_mapping import (
    check_required_columns,
    rename_and_correct_shape,
    retrieve_initial_ingest_file_names
)

In [None]:
input_contract = DatasetContract(parent=transform.publish_contract.parent,
                                 child=transform.publish_contract.child,
                                 state="ingest",
                                 dataset=transform.input_transform)

ingest_prefix = f"{input_contract.key}/{transform.input_source_file_prefix}"
bucket = transform.publish_contract.env
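
The prefix is built by joining `input_contract.key` and `input_source_file_prefix` with a `/`. Because `input_source_file_prefix` only applies to initial-ingest inputs, it may be empty for other sources, which would leave a trailing slash on `ingest_prefix` and a double slash in every file key built from it. Whether the helpers tolerate that is not shown here; a small joiner like the hypothetical `join_key` below sidesteps the question by dropping empty segments and stray slashes (the example paths are made up).

```python
def join_key(*parts: str) -> str:
    """Join S3 key segments with single slashes, skipping empty segments."""
    cleaned = (part.strip("/") for part in parts if part)
    return "/".join(segment for segment in cleaned if segment)


print(join_key("path/to/dataset", ""))             # path/to/dataset
print(join_key("path/to/dataset/", "/file_a"))     # path/to/dataset/file_a
```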

In [None]:
file_names = retrieve_initial_ingest_file_names(bucket=bucket, file_path_prefix=ingest_prefix)

In [None]:
frames = []  # per-file dataframes, concatenated once after the loop
run_filter = [{"partition": "__metadata_run_id", "comparison": "==", "values": [run_id]}]

logger.debug(f"Retrieving data from path : {ingest_prefix}")
for file_name in file_names:
    logger.debug(f"Ingesting data under file name : {file_name} , with run_id : {run_id}")

    # Fetch with parallel=False: for small datasets, parallel fetching is much slower than a serial fetch
    file_df = fetch(bucket=bucket, key=f"{ingest_prefix}/{file_name}", filters=run_filter, parallel=False)

    logger.debug(f"File data fetched, fetched dataframe shape : {file_df.shape}")

    # Check that the file fulfills the base requirements.
    # NOTE: MissingRequiredColumnError is not imported in this notebook; import it from the module that
    # defines it, otherwise this except clause raises NameError whenever check_required_columns raises.
    try:
        check_required_columns(df=file_df, column_renames=column_renames, required_columns=required_columns)
    except MissingRequiredColumnError:
        # TODO: this needs to send a notification! That work is tracked in a separate story.
        logger.info(f"File :   {file_name}   : is missing required columns and is being skipped.")
        continue

    logger.debug("File meets requirements.")

    file_df = rename_and_correct_shape(df=file_df, column_renames=column_renames)
    frames.append(file_df)
    logger.debug("File successfully appended.")

# DataFrame.append was removed in pandas 2.0, so concatenate once here instead.
# pd.concat raises ValueError on an empty list, so fall back to an empty frame if every file was skipped.
final_dataframe = pd.concat(frames) if frames else pd.DataFrame()
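
`check_required_columns` comes from `core.column_mapping`, and its implementation is not shown in this notebook. The sketch below is a hypothetical re-implementation, assuming `required_columns` lists schema names (keys of `column_renames`) and a file fails when the mapped source column for any of them is absent. Under that assumption, the empty `required_columns = []` configured above means no file is ever skipped.

```python
class MissingRequiredColumnError(Exception):
    """Hypothetical stand-in for the exception caught in the ingest loop."""


def check_required(source_columns, column_renames, required_columns):
    """Raise if any required schema column has no matching source column in the file."""
    present = set(source_columns)
    missing = [
        schema_name
        for schema_name in required_columns
        if column_renames.get(schema_name) not in present
    ]
    if missing:
        raise MissingRequiredColumnError(f"missing required columns: {missing}")


renames = {"ndc": "NDC11", "pharmacy_npi": "PHARM_NPI"}
check_required(["NDC11", "PHARM_NPI"], renames, ["ndc"])  # passes silently
check_required(["PHARM_NPI"], renames, [])                # nothing required, so it passes too

try:
    check_required(["PHARM_NPI"], renames, ["ndc"])
except MissingRequiredColumnError as err:
    print(err)  # missing required columns: ['ndc']
```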

### Publish

In [None]:
## that's it - just provide the final dataframe to the var final_dataframe and we take it from there
try:
    transform.publish_contract.publish(final_dataframe, run_id, session)
finally:
    # Release the configuration-database session even if publishing fails
    session.close()