# Helper Function to Automatically Deploy AutoAI Models to Db2 as Python UDF

This notebook loads the helper function that automatically deploys an AutoAI model as a Db2 Python UDF. It must be run in an AutoAI **model notebook**, not in an experiment notebook, because the function reads the `pipeline` and `experiment_metadata` variables from the notebook's global scope.

## Output Handling Functions

These two functions format the success and error messages displayed by the main function.

In [1]:
# Display a success message in a green box
def success_msg(message):
    from IPython.display import HTML, display
    html = '<p style="border:2px; border-style:solid; border-color:#00FF00; background-color:#F0FFF0; padding: 1em;">'
    display(HTML(html + message + "</p>"))

In [2]:
# Display an error message in a red box
def errormsg(message):
    from IPython.display import HTML, display
    html = '<p style="border:2px; border-style:solid; border-color:#FF0000; background-color:#ffe6e6; padding: 1em;">'
    display(HTML(html + message + "</p>"))
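The two helpers differ only in their colors, and any message text that contains `<` or `>` has to be escaped before it is embedded in the HTML. As a minimal sketch of that idea, assuming a hypothetical helper named `message_box_html` that is not part of this notebook, the markup can be built by one pure function and the placeholders escaped with the standard library's `html.escape`:

```python
import html

def message_box_html(message, border_color, background_color):
    """Return the HTML for a bordered message box (hypothetical helper)."""
    style = (
        "border:2px; border-style:solid; "
        f"border-color:{border_color}; background-color:{background_color}; "
        "padding: 1em;"
    )
    return f'<p style="{style}">{message}</p>'

# Escape placeholder text so the browser shows the angle brackets literally
placeholder = html.escape("<INPUT_TABLE>")
box = message_box_html("Replace " + placeholder, "#00FF00", "#F0FFF0")
print(box)
```

Passing the returned string to `display(HTML(...))` would render it in the same way as the functions above.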

## Helper Function

In [3]:
def deploy_autoai_model_as_db2pyudf(udf_source_filename, model_filename, jupyterpod_path, db2pod_path,
                                    id_col_index, create_function=False, instance_url="", function_name=""):
    """
    Deploy an IBM AutoAI Model to Db2 as a Python UDF. This is done in the following steps:
    1. Save the AutoAI model as a joblib file on the shared filesystem between the Jupyter and Db2 pods
    2. Write the Python UDF source file on the shared filesystem between the Jupyter and Db2 pods
    3. Change permissions of UDF source file and joblib file to be accessible by the Db2 fenced process
    4. (Optional) Register the UDF with Db2 through a CREATE OR REPLACE FUNCTION statement
    
    Parameters
    ----------
    udf_source_filename : str
        The filename to save the UDF source file under. Do not include the full path.
        Example: 'myudf.py'
    model_filename : str
        The filename to save the AutoAI model under. Do not include the full path. Must be a joblib file.
        Example: 'myautoaimodel.joblib'
    jupyterpod_path : str
        The path in the Jupyter pod to save the model and UDF source file to.
        This should be a path that is shared between the Jupyter pod and the Db2 pod.
        Include the trailing slash, because the filename is appended to this string directly.
        Example: '/mnts/jupyterfiles/adrian/'
    db2pod_path : str
        The path in the Db2 pod where the Db2 fenced process can access the UDF source file and AutoAI model.
        This should be a path that is shared between the Jupyter pod and the Db2 pod.
        Include the trailing slash, because the filename is appended to this string directly.
        Example: '/mnt/blumeta0/adrian/'
    id_col_index : int
        The index (starting at 0) of the column of the input table that contains the unique row id.
        Used to map the output prediction to the input row.
        Example: If the input to the UDF is MY_UDF((SELECT COUNT(*) FROM T1),i.C1,i.C2,i.ID,i.C4), then id_col_index=2
    create_function : bool, optional (default is False)
        A flag to indicate whether the function should automatically register the UDF with Db2 through a CREATE OR REPLACE FUNCTION statement. This overwrites any existing function with the same name.
        If set to True, the arguments instance_url and function_name must be provided.
    instance_url : str, optional (default is "")
        The home URL of your Cloud Pak for Data instance.
        Example: "https://cpd-cpd-instance.apps.db2ai.cp.fyre.ibm.com"
    function_name : str, optional (default is "")
        The function name to register with Db2 in the CREATE OR REPLACE FUNCTION statement.
        Example: 'MY_UDF'
    
    Example Use
    ----------
    deploy_autoai_model_as_db2pyudf(udf_source_filename='myudf.py',
                                    model_filename='myautoaimodel.joblib',
                                    jupyterpod_path='/mnts/jupyterfiles/adrian/',
                                    db2pod_path='/mnt/blumeta0/adrian/',
                                    id_col_index=0,
                                    create_function=True,
                                    instance_url="https://cpd-cpd-instance.apps.db2ai.cp.fyre.ibm.com",
                                    function_name='FLIGHT_PREDICTER')
    
    """
    import ibm_db
    import ibm_db_dbi
    from joblib import dump
    import os
    import pandas as pd
    from ibm_watson_machine_learning.experiment import AutoAI
    from ibm_watson_machine_learning import APIClient
    
    jupyter_model_path = jupyterpod_path+model_filename
    db2_model_path = db2pod_path+model_filename
    jupyter_udf_path = jupyterpod_path+udf_source_filename
    db2_udf_path = db2pod_path+udf_source_filename
    
    
    ############################
    # 1. Save the AutoAI model #
    ############################
    print('Saving the AutoAI model...')

    try:
        # `pipeline` is read from the notebook's global scope (the fitted AutoAI pipeline)
        dump(pipeline, jupyter_model_path)
    except Exception as e:
        errormsg("ERROR: Unable to save AutoAI model as joblib file")
        print(e)
        return
    else:
        print("Successfully saved the AutoAI model to path:", jupyter_model_path)
        print("")


    ###########################
    # 2. Write the Python UDF #
    ###########################
    
    # Import statements
    udf_import = '''
###############
### IMPORTS ###
###############
import nzae

import numpy as np
from joblib import load

class full_pipeline(nzae.Ae):
    def _runUdtf(self):
        ######################
        ### INITIALIZATION ###
        ######################
    '''
    
    # Load the model from the filesystem
    udf_loadmodel = '''
        trained_pipeline = load('{}')
        '''.format(db2_model_path)

    # UDF body - row batching, model scoring, and output
    udf_body = '''
        #######################
        ### DATA COLLECTION ###
        #######################
        # Collect rows into a single batch
        batchsize = 0
        rownum = 0
        row_list = []
        for row in self:
            if (rownum==0):
                # Grab batchsize from first element value (select count (*))
                batchsize=row[0] 
            
            # Collect everything but first element (which is select count(*))
            row_list.append(row[1:])
            rownum = rownum+1

            if rownum==batchsize:
                ##############################
                ### MODEL SCORING & OUTPUT ###
                ##############################
                
                # Collect data into a numpy array for scoring
                data=np.array(row_list)
                
                # Collect row IDs - TODO can probably just do this in the output step!
                ids=data[:,{}]
                
                # Call our trained pipeline to transform the data and make predictions
                predictions = trained_pipeline.predict(data)

                # Output the row id and the corresponding prediction
                for x in range(predictions.shape[0]):
                    self.output(int(ids[x]),int(predictions[x]))
                
                #Reset rownum and row_list for next batch
                row_list=[]
                rownum=0
        self.done()
full_pipeline.run()
    '''.format(id_col_index)
    # Write the PyUDF file
    print('Writing Python UDF source file...')
    try:
        # Write the three parts of the UDF source file in order
        with open(jupyter_udf_path, mode='w') as file:
            file.write(udf_import)
            file.write(udf_loadmodel)
            file.write(udf_body)
    except Exception as e:
        errormsg("ERROR: Unable to write Python UDF source file")
        print(e)
        return
    else:
        print("Successfully saved the Python UDF source file to path:", jupyter_udf_path)
        print("")
    
    ##############################
    # 3. Change file permissions #
    ##############################
    # Change permissions of UDF source file and joblib file to be accessible by the Db2 fenced process

    print('Changing file permissions...')
    try:
        os.chmod(jupyter_model_path, 0o777)
        os.chmod(jupyter_udf_path, 0o777)
    except OSError as e:
        errormsg("ERROR: Unable to change file permissions")
        print(e)
        return
    else:
        print("Successfully changed file permissions")
        print("")
    
    ##########################
    # 4. Create UDF function #
    ##########################
    
    if create_function:
        print('Automatically registering UDF function...')
        print("")
        
        # First check required optional arguments are specified
        if instance_url == "":
            errormsg("ERROR: Cloud Pak for Data instance URL not provided!")
            return
        if function_name == "":
            errormsg("ERROR: Function name not provided!")
            return
        
        # Connect to Db2
        print('Attempting to make a connection to Db2...')
        try:
            # Get the Db2 credentials from WML (uses the first connection listed for the project)
            url = instance_url
            wml_credentials = {
                "instance_id": "openshift",
                "token": os.environ.get("USER_ACCESS_TOKEN"),
                "url": url,
                "version": "4.0"
            }
            client = APIClient(wml_credentials)
            client.set.default_project(experiment_metadata['project_id'])
            Db2_credentials = client.connections.get_details()['resources'][0]['entity']['properties']
            
            # Make a connection to Db2
            Db2_dsn = 'DATABASE={};HOSTNAME={};PORT={};PROTOCOL=TCPIP;UID={uid};PWD={pwd}'.format(
                Db2_credentials['database'],
                Db2_credentials['host'],
                Db2_credentials['port'],
                uid=Db2_credentials['username'],
                pwd=Db2_credentials['password']
            )
            Db2_connection = ibm_db.connect(Db2_dsn,"","")
            dbi_connection = ibm_db_dbi.Connection(Db2_connection)
        except Exception as e:
            errormsg("ERROR: Connect to Db2 failed")
            print(e)
            return
        else:
            print('Connection successful!')
            print("")
            
        # Determine input column datatypes
        print('Attempting to determine input column datatypes...')
        try:
            # Get input table name (SCHEMA.TABLE) from experiment metadata
            input_table = experiment_metadata['excel_sheet']
            table_parts = input_table.split('.')
            sql = '''SELECT NAME, COLTYPE, LENGTH FROM SYSIBM.SYSCOLUMNS 
            WHERE TBCREATOR='{}' AND TBNAME='{}' AND NAME!='{}' ORDER BY COLNO 
            '''.format(table_parts[0], table_parts[1], experiment_metadata['prediction_column'])
            dtypes_df = pd.read_sql(sql, dbi_connection)
            # Build the datatype list used in the CREATE FUNCTION statement.
            # VARCHAR columns also need their length, e.g. VARCHAR(8)
            mapping = []
            for coltype, length in zip(dtypes_df['COLTYPE'], dtypes_df['LENGTH']):
                coltype = str(coltype).strip()
                mapping.append('{}({})'.format(coltype, length) if coltype == 'VARCHAR' else coltype)
            input_dtypes_string = ', '.join(mapping)
        except Exception as e:
            errormsg("ERROR: Error determining input datatypes")
            print(e)
            return
        else:
            print('Successfully determined input column datatypes!')
            print("")
            
        # Automatically execute CREATE FUNCTION statement
        print('Attempting to execute CREATE FUNCTION statement...')
        try:
            sql='''
CREATE OR REPLACE FUNCTION 
{}(INTEGER,{}) 
RETURNS TABLE (ID INTEGER,PREDICTION SMALLINT)
LANGUAGE PYTHON PARAMETER STYLE NPSGENERIC  FENCED  NOT THREADSAFE  NO FINAL CALL  DISALLOW PARALLEL  NO DBINFO  
DETERMINISTIC NO EXTERNAL ACTION CALLED ON NULL INPUT  
NO SQL EXTERNAL NAME '{}'
        '''.format(function_name,input_dtypes_string,db2_udf_path)

            print(sql)
            stmt = ibm_db.prepare(Db2_connection, sql)
            ibm_db.execute(stmt)
        except Exception as e:
            errormsg("ERROR: Unable to execute CREATE FUNCTION statement!")
            print(e)
            return
        else:
            print('UDF registered with Db2!')
        
        
        # Show how to call the UDF
        msg = """Execute the following SQL statement to call your UDF to make predictions on input data:<br>
        <code style="background-color:#F0FFF0">
        SELECT f.* FROM &lt;INPUT_TABLE&gt; i,
        TABLE({}((SELECT COUNT(*) FROM &lt;INPUT_TABLE&gt;),i.C1,i.C2, ...)) f</code><br>
        <br>
        Replace &lt;INPUT_TABLE&gt; with the name of the table that contains the raw data to be scored (e.g., FLIGHTS.DATA)<br>
        Replace i.C1, i.C2, ... with the input columns (e.g., i.DAY, i.ORIGIN, ...)<br>
        You may choose to replace the first argument (SELECT COUNT(*) FROM &lt;INPUT_TABLE&gt;) with a custom batch size.<br>
        Note that the batch size must divide the number of rows in the input table evenly. E.g., for a table of 10 rows, you may choose a batch size of 1, 2, 5, or 10.""".format(function_name)
        success_msg(msg)
    
    # If Create Function argument not provided, provide steps for manual function registration
    else:
        print('Steps to manually create your UDF function:')
        
        # How to write the CREATE FUNCTION statement
        msg= '''
        Execute the following SQL statement to create your UDF:<br>
        <code style="background-color:#F0FFF0">
        CREATE OR REPLACE FUNCTION 
        &lt;UDF_NAME&gt;(INTEGER,&lt;C1 DATATYPE&gt;,&lt;C2 DATATYPE&gt;,...) 
        RETURNS TABLE (ID INTEGER,PREDICTION SMALLINT)
        LANGUAGE PYTHON PARAMETER STYLE NPSGENERIC  FENCED  NOT THREADSAFE  NO FINAL CALL  
        DISALLOW PARALLEL  NO DBINFO DETERMINISTIC NO EXTERNAL ACTION CALLED ON NULL INPUT  
        NO SQL EXTERNAL NAME '{}'</code><br>
        <br>
        Replace &lt;UDF_NAME&gt; with a function name (e.g., MY_UDF)<br>
        Replace &lt;Cn DATATYPE&gt; with the datatype of the nth input column (e.g., VARCHAR(8))
                '''.format(db2_udf_path)
        success_msg(msg)
        
        # How to call the UDF
        msg='''
        Execute the following SQL statement to call your UDF to make predictions on input data:<br>
        <code style="background-color:#F0FFF0">
        SELECT f.* FROM &lt;INPUT_TABLE&gt; i,
        TABLE(&lt;UDF_NAME&gt;((SELECT COUNT(*) FROM &lt;INPUT_TABLE&gt;),i.C1,i.C2, ...)) f</code><br>
        <br>
        Replace &lt;UDF_NAME&gt; with the name of your UDF (e.g., MY_UDF)<br>
        Replace &lt;INPUT_TABLE&gt; with the name of the table that contains the raw data to be scored (e.g., FLIGHTS.DATA)<br>
        Replace i.C1, i.C2, ... with the input columns (e.g., i.DAY, i.ORIGIN, ...)<br>
        You may choose to replace the first argument (SELECT COUNT(*) FROM &lt;INPUT_TABLE&gt;) with a custom batch size.<br>
        Note that the batch size must divide the number of rows in the input table evenly. E.g., for a table of 10 rows, you may choose a batch size of 1, 2, 5, or 10.
        '''
        success_msg(msg)
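
The row batching that the generated UDF performs can be hard to follow inside a string template. The sketch below mimics the same loop on plain Python tuples so it can be run without Db2. It is an illustration only: `score_in_batches` and `toy_predict` are hypothetical names, and `toy_predict` stands in for `trained_pipeline.predict`.

```python
def score_in_batches(rows, id_col_index, predict):
    """Mimic the generated UDF loop on plain Python tuples.

    rows: iterable of tuples whose first field is the batch size.
    predict: stand-in for trained_pipeline.predict (an assumption for this sketch).
    Returns the (id, prediction) pairs and the number of rows left unscored.
    """
    results = []
    batch = []
    batchsize = 0
    for row in rows:
        if not batch:
            # The first row of each batch supplies the batch size
            batchsize = row[0]
        # Keep everything except the leading batch-size field
        batch.append(row[1:])
        if len(batch) == batchsize:
            predictions = predict(batch)
            for features, prediction in zip(batch, predictions):
                results.append((int(features[id_col_index]), int(prediction)))
            batch = []
    return results, len(batch)


def toy_predict(batch):
    # Hypothetical model: predicts 1 when the first feature exceeds 5
    return [1 if features[0] > 5 else 0 for features in batch]


features = [(7, "a", 101), (3, "b", 102), (9, "c", 103), (1, "d", 104)]

# Batch size 2 divides the 4 rows evenly, so every row is scored
even, even_left = score_in_batches([(2,) + f for f in features], 2, toy_predict)
print(even, even_left)

# Batch size 3 does not, so the last row never fills a batch
uneven, uneven_left = score_in_batches([(3,) + f for f in features], 2, toy_predict)
print(uneven, uneven_left)
```

The second call shows why the batch size has to divide the row count evenly: rows that do not fill a complete batch are never passed to the model.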

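When `create_function=True`, the function derives the UDF's parameter types from the Db2 catalog, and only VARCHAR columns get their length appended. The following sketch reproduces that mapping on hard-coded rows so the resulting signature can be inspected without a database connection. The helper name `build_type_list` and the sample catalog rows are assumptions made for illustration.

```python
def build_type_list(columns):
    """Build the parameter type list from (name, coltype, length) catalog rows."""
    parts = []
    for _name, coltype, length in columns:
        coltype = coltype.strip()  # strip padding, as the function above does
        if coltype == "VARCHAR":
            parts.append(f"{coltype}({length})")
        else:
            parts.append(coltype)
    return ", ".join(parts)


# Hypothetical catalog rows for illustration
catalog_rows = [("DAY", "INTEGER ", 4), ("ORIGIN", "VARCHAR ", 8), ("DELAY", "DOUBLE  ", 8)]
type_list = build_type_list(catalog_rows)
print(f"MY_UDF(INTEGER,{type_list})")
```

As in the function above, other parameterized types (for example CHAR or DECIMAL) are passed through without a length, so tables that use them may need the manual CREATE FUNCTION route.
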
In [4]:
print('--------------')
success_msg('''Function deploy_autoai_model_as_db2pyudf successfully loaded!<br>
Run <code style="background-color:#F0FFF0">help(deploy_autoai_model_as_db2pyudf)</code> to get function information''')

--------------
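
The messages printed by the function note that a custom batch size must divide the number of rows in the input table evenly. A small sketch for listing the acceptable values for a given row count (the helper name `valid_batch_sizes` is hypothetical):

```python
def valid_batch_sizes(row_count):
    """Return every batch size that divides row_count evenly."""
    return [size for size in range(1, row_count + 1) if row_count % size == 0]


sizes = valid_batch_sizes(10)
print(sizes)  # [1, 2, 5, 10]
```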
