# CORE Cartridge Notebook::Publish to FTP
![CORE Logo](assets/coreLogo.png) 

---
## Keep in Mind
Good Transforms Are...
- **singular in purpose:** good transforms do one and only one thing, and handle all known cases for that thing. 
- **repeatable:** transforms should be written so that they can be run against the same dataset any number of times and produce the same result every time. 
- **easy to read:** 99 times out of 100, readable, clear code that runs a little slower is more valuable than a mess that runs quickly. 
- **free of 'magic numbers':** if it is not instantly obvious, without context, what a variable or function is or does, consider renaming it.

## Workflow - how to use this notebook to make science
#### Data Science
1. **Document your transform.** Fill out the _description_ cell below describing what this transform does; this will appear in the configuration application where Ops will create, configure and update pipelines. 
2. **Define your config object.** Fill out the _configuration_ cell below the commented-out guide to define the variables you want Ops to set in the configuration application (these will populate here for every pipeline). 
3. **Build your transformation logic.** Use the transformation cell to do that magic that you do. 
![caution](assets/cautionTape.png)

### Description
What does this transformation do? Be specific.

![what does your transform do](assets/what.gif)

This transformation publishes a dataset in S3 to an external FTP server. The credentials for the FTP server should be stored securely in an AWS Secret, with the `secret_name` and `secret_type_of` of that secret provided to the transformation through its configuration.
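
For orientation, the upload step can be sketched with an `ftplib`-style connection. This is a minimal sketch under assumptions: `upload_file` is a hypothetical helper, and the internals of `core.helpers.file_mover` and `core.secret.Secret` are not shown in this notebook, so the real implementation may differ (for example, it may use SFTP rather than plain FTP).

```python
import os


def upload_file(ftp, local_path, remote_dir):
    """Upload local_path into remote_dir over an already-authenticated connection.

    `ftp` is assumed to behave like ftplib.FTP (it needs cwd() and storbinary()).
    This helper is hypothetical and is not part of the core library.
    """
    remote_name = os.path.basename(local_path)
    ftp.cwd(remote_dir)  # move to the target directory on the server
    with open(local_path, "rb") as handle:
        ftp.storbinary(f"STOR {remote_name}", handle)  # binary-mode upload
    return remote_name


# Possible usage with the standard library (host and credentials are placeholders):
# from ftplib import FTP
# with FTP("ftp.example.com") as ftp:
#     ftp.login("user", "password")
#     upload_file(ftp, "/tmp/export.csv", "/outbound")
```

Passing the connection in, rather than creating it inside the helper, keeps credential handling separate from the upload and makes the helper easy to exercise with a stand-in object.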

### Configuration

In [None]:
from core.helpers.session_helper import SessionHelper
session = SessionHelper().session

In [None]:
"""
************ SETUP - DON'T TOUCH **************
This section imports data from the configuration database
and should not need to be altered or otherwise messed with. 
~~These are not the droids you are looking for~~
"""
from core.constants import BRANCH_NAME, ENV_BUCKET
from core.models.configuration import Transformation
from dataclasses import dataclass
from core.dataset_contract import DatasetContract
import pandas as pd

db_transform = session.query(Transformation).filter(Transformation.id == transform_id).one()

@dataclass
class DbTransform:
    id: int = db_transform.id  # the instance id of the transform in the config app
    name: str = db_transform.transformation_template.name  # the transform name in the config app
    state: str = db_transform.pipeline_state.pipeline_state_type.name  # the pipeline state, one of raw, ingest, master, enhance, enrich, metrics, dimensional
    branch: str = BRANCH_NAME  # the git branch for this execution
    brand: str = db_transform.pipeline_state.pipeline.brand.name  # the pharma brand name
    pharmaceutical_company: str = db_transform.pipeline_state.pipeline.brand.pharmaceutical_company.name  # the pharma company name
    publish_contract: DatasetContract = DatasetContract(branch=BRANCH_NAME,
                            state=db_transform.pipeline_state.pipeline_state_type.name,
                            parent=db_transform.pipeline_state.pipeline.brand.pharmaceutical_company.name,
                            child=db_transform.pipeline_state.pipeline.brand.name,
                            dataset=db_transform.transformation_template.name)

In [None]:
""" 
********* CONFIGURATION - PLEASE TOUCH ********* 
This section defines what you expect to get from the configuration application 
in a single "transform" object. Define the vars you need here, and comment inline to the right of them 
for all-in-one documentation. 
Engineering will build a production "transform" object for every pipeline that matches what you define here.

@@@ FORMAT OF THE DATA CLASS IS: @@@ 

<value_name>: <data_type> #<comment explaining what the value is to future us>

~~These ARE the droids you are looking for~~
"""

class Transform(DbTransform):
    input_transform: str = db_transform.variables.input_transform # name of transformation to pull dataset from
    prefix: str = db_transform.variables.prefix # file prefix to publish to ftp
    suffix: str = db_transform.variables.suffix # file suffix to publish to ftp
    filetype: str = db_transform.variables.filetype # filetype to publish to ftp (DO NOT INCLUDE . IN FILETYPE)
    separator: str = db_transform.variables.separator # single character separator for output file
    compression: bool = db_transform.variables.compression # if true, published file will be compressed as gzip
    date_format: str = db_transform.variables.date_format # string formatting for datetime
    remote_path: str = db_transform.variables.remote_path # path to publish to on FTP server
    secret_name: str = db_transform.variables.secret_name # AWS secret name containing FTP credentials
    secret_type_of: str = db_transform.variables.secret_type_of # AWS secret type of, should almost always be "FTP"

In [None]:
## Please place your value assignments for development here!!
## This cell will be turned off in production and Engineering will set it to pull from the configuration application instead

transform = Transform()
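
While developing without the configuration database, one way to try out values is a stand-in object with the same fields as `Transform`. The sketch below is an assumption-laden illustration: `DevTransform` and every value in it are hypothetical placeholders, not settings from any real pipeline.

```python
from dataclasses import dataclass


@dataclass
class DevTransform:
    """Stand-in for Transform with placeholder development values (all hypothetical)."""
    input_transform: str = "upstream_transform"  # hypothetical upstream transform name
    prefix: str = "weekly"
    suffix: str = ""              # empty string means no suffix in the filename
    filetype: str = "txt"         # no leading "." here
    separator: str = "|"          # must be a single character
    compression: bool = False
    date_format: str = "%Y%m%d"
    remote_path: str = "/outbound"
    secret_name: str = "dev-ftp"  # hypothetical secret name
    secret_type_of: str = "FTP"


# Override only the values under test; the rest keep their defaults.
dev_transform = DevTransform(compression=True)
```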

### Transformation

In [None]:
### Retrieve current dataset from contract
from core.dataset_diff import DatasetDiff

diff = DatasetDiff(db_transform.id)
df = diff.get_diff(transform_name=transform.input_transform, values=[run_id])

In [None]:
from core.helpers import file_mover
from core.secret import Secret
import core.helpers.drop_metadata as dropper
import tempfile, datetime

if len(transform.separator) != 1:
    raise ValueError("Error: Separator must be a single character.")

if "." in transform.filetype:
    raise ValueError("Error: Filetype should not contain '.'")

prefix = "/" + transform.prefix + "_" + transform.brand.upper()
suffix = transform.suffix
filetype = "." + transform.filetype.lower()

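# Timestamp for the filename, formatted per the configured date_format
timestamp = datetime.datetime.now().strftime(transform.date_format)

# Only include the suffix in the filename when one is configured
name_parts = [prefix, timestamp] if suffix == "" else [prefix, timestamp, suffix]
filename = '_'.join(name_parts) + filetype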
    
if transform.compression:
    filename += '.gz'

df = dropper.drop_metadata(df)

with tempfile.TemporaryDirectory() as temp_dir:
    filename = temp_dir + filename
    
    # Replace NaN with empty strings so missing values are written as blank fields
    df.fillna('')\
        .to_csv(filename,
                sep=transform.separator,
                header=True,
                index=False,
                compression='gzip' if transform.compression else 'infer'
               )

    ftp_secret = Secret(name=transform.secret_name, type_of=transform.secret_type_of, mode="write")
    file_mover.publish_file(local_path=filename, remote_path=transform.remote_path, secret=ftp_secret)
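
The filename rules used in the cell above can be pulled into a small pure function, which makes them easy to check in isolation. This is a sketch, not part of the notebook: `build_filename` and its `now` parameter are hypothetical additions for illustration, and the leading `/` used to join onto the temporary directory is left out.

```python
import datetime


def build_filename(prefix, brand, date_format, suffix="", filetype="csv",
                   compression=False, now=None):
    """Return PREFIX_BRAND_<timestamp>[_SUFFIX].<filetype>[.gz] (hypothetical helper)."""
    if "." in filetype:
        raise ValueError("Filetype should not contain '.'")
    # `now` can be injected so the result is reproducible in tests
    now = now or datetime.datetime.now()
    parts = [prefix, brand.upper(), now.strftime(date_format)]
    if suffix:
        parts.append(suffix)  # the suffix is optional
    name = "_".join(parts) + "." + filetype.lower()
    if compression:
        name += ".gz"  # mirrors the gzip extension added in the notebook
    return name


example = build_filename(
    prefix="weekly", brand="acme", date_format="%Y%m%d",
    suffix="final", filetype="TXT", compression=True,
    now=datetime.datetime(2020, 1, 31),
)
print(example)  # weekly_ACME_20200131_final.txt.gz
```

Injecting the timestamp instead of calling `datetime.datetime.now()` inside the logic keeps the function repeatable, in line with the guidance at the top of this notebook.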

### Publish

In [None]:
## Files are published to FTP in this transformation. This transformation does not publish to a contract in S3.