# Project Objective

Using Snowpark Python, develop a process in Snowflake that lets a user assess whether the personal and professional objectives set by an employee correlate with the objectives of the firm and of the department they work in.

Given the objectives in a DataFrame, the model outputs an evaluation between 0 and 5; a fully relevant objective should ideally score 5.

### Import Libraries 

In [1]:
# Snowpark for Python
from snowflake.snowpark.session import Session
from snowflake.snowpark.types import Variant
from snowflake.snowpark.functions import udf,sum,col,array_construct,month,year,call_udf,lit
from snowflake.snowpark.version import VERSION
# Misc
import json
import logging 
logger = logging.getLogger("snowflake.snowpark.session")
logger.setLevel(logging.ERROR)

### Establish Secure Connection to Snowflake

In [2]:
# Create Snowflake Session object
# Read the connection parameters and close the file as soon as it has been parsed
with open('connection.json') as f:
    connection_parameters = json.load(f)
session = Session.builder.configs(connection_parameters).create()
session.sql_simplifier_enabled = True

snowflake_environment = session.sql('select current_user(), current_version()').collect()
snowpark_version = VERSION
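
The session above is built from whatever `connection.json` contains, so a missing entry only surfaces when the connection is attempted. A minimal sketch of an up-front check follows. The helper names `check_connection_parameters` and `load_connection_parameters` are hypothetical, and the set of required keys is an assumption; which parameters an account actually needs depends on its setup.

```python
import json

# Assumed set of keys; adjust to what your account and authentication method need.
REQUIRED_KEYS = {"account", "user", "password", "role", "warehouse", "database", "schema"}


def check_connection_parameters(params):
    """Return params unchanged, or raise KeyError naming the missing keys."""
    missing = REQUIRED_KEYS - set(params)
    if missing:
        raise KeyError(f"connection parameters are missing: {sorted(missing)}")
    return params


def load_connection_parameters(path):
    """Read a JSON connection file and validate it before use."""
    with open(path) as f:
        return check_connection_parameters(json.load(f))
```

The returned dictionary could then be passed to `Session.builder.configs(...)` exactly as in the cell above.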

### Create Scalar User-Defined Function (UDF) for inference

Create and register a Snowpark Python UDF and add the trained model as a dependency.

In [None]:
session.clear_imports()
session.clear_packages()

# Add trained model and Python packages from Snowflake Anaconda channel available on the server-side as UDF dependencies
session.add_import('@dash_models/<the_model>')
session.add_packages('pandas','joblib','numpy','scikit-learn==1.1.1')

@udf(name='assess_objective_relevance', session=session,replace=True,is_permanent=True,stage_location='@dash_udfs')
def assess_objective_relevance(objective: list) -> int:
    import sys
    import pandas as pd
    from joblib import load
    import sklearn

    IMPORT_DIRECTORY_NAME = "snowflake_import_directory"
    import_dir = sys._xoptions[IMPORT_DIRECTORY_NAME]
    
    model_file = import_dir + '<the_model>'
    model = load(model_file)
            
    features = ['OBJECTIVE']
    df = pd.DataFrame([objective], columns=features)
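    # model.predict returns a NumPy value; convert it to a plain Python int to match the declared return type
    estimated_relevance = int(abs(model.predict(df)[0]))
    return estimated_relevance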
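
The UDF above derives its result from the absolute value of the model's prediction, which by itself does not guarantee a value in the 0 to 5 range described in the project objective. A minimal sketch of one way to enforce that range is shown below, assuming the model emits a real-valued prediction. The helper name `to_relevance_score` is hypothetical and not part of the notebook.

```python
def to_relevance_score(raw_prediction, low=0, high=5):
    """Round a raw model output to the nearest integer and clamp it to [low, high]."""
    score = int(round(abs(float(raw_prediction))))
    return max(low, min(high, score))
```

If this behaviour is wanted, the helper would have to be defined inside the UDF body (or shipped as an import) so that it is available on the server side.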

### Create Vectorized User-Defined Function (UDF) using Batch API for inference

Here we leverage the Python UDF Batch API to create a **vectorized** UDF that takes a pandas DataFrame as input. Each call to the UDF receives a batch of rows, whereas a scalar UDF receives one row per call.
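
To make the difference in call pattern concrete, here is a small local sketch that needs neither Snowflake nor a trained model. `toy_score` is a stand-in for the model, `CallCounter` only counts invocations, and the explicit `batch_size` is an illustrative assumption; in Snowflake the batch size is chosen by the engine.

```python
class CallCounter:
    """Wrap a function and count how many times it is invoked."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, arg):
        self.calls += 1
        return self.fn(arg)


def toy_score(text):
    # Stand-in for model.predict on a single row; not a real relevance model.
    return len(text) % 6


score_one = CallCounter(toy_score)
score_batch = CallCounter(lambda batch: [toy_score(t) for t in batch])


def run_scalar(rows):
    # Scalar style: one function call per row.
    return [score_one(row) for row in rows]


def run_batched(rows, batch_size):
    # Vectorized style: one function call per batch of rows.
    out = []
    for start in range(0, len(rows), batch_size):
        out.extend(score_batch(rows[start:start + batch_size]))
    return out


rows = ["grow revenue", "learn SQL", "mentor juniors", "cut costs", "ship v2"]
scalar_result = run_scalar(rows)
batched_result = run_batched(rows, batch_size=2)
```

Both paths produce the same scores, but the batched path invokes the scoring function once per batch rather than once per row, which is where the per-call overhead is saved.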

In [None]:
session.clear_imports()
session.clear_packages()

import cachetools
from snowflake.snowpark.types import PandasSeries, PandasDataFrame

# Add trained model and Python packages from Snowflake Anaconda channel available on the server-side as UDF dependencies
session.add_import('@dash_models/<the_model>')
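# Pin scikit-learn to the same version as the scalar UDF so the serialized model loads consistently
session.add_packages('pandas','joblib','numpy','scikit-learn==1.1.1','cachetools')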

@cachetools.cached(cache={})
def load_model(filename):
    import joblib
    import sys
    import os

    IMPORT_DIRECTORY_NAME = "snowflake_import_directory"
    import_dir = sys._xoptions[IMPORT_DIRECTORY_NAME]

    if import_dir:
        with open(os.path.join(import_dir, filename), 'rb') as file:
            m = joblib.load(file)
            return m

@udf(name='batch_assess_objective_relevance',session=session,replace=True,is_permanent=True,stage_location='@dash_udfs')
def batch_assess_objective_relevance(objectives: PandasDataFrame[str]) -> PandasSeries[int]:
    import sklearn
    # Single input column, named to match the feature used by the scalar UDF
    objectives.columns = ['OBJECTIVE']
    model = load_model('<the_model>')
    return abs(model.predict(objectives))
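
The `load_model` helper above is cached so that the model file is deserialized once rather than on every batch. The sketch below illustrates the same idea locally, swapping in the standard library's `functools.lru_cache` for `cachetools.cached`. The loader name `load_model_once`, the file name `model.joblib`, and the dictionary standing in for the model are all hypothetical.

```python
import functools

load_count = 0  # counts how many times the expensive load actually runs


@functools.lru_cache(maxsize=None)
def load_model_once(filename):
    """Pretend to deserialize a model; the result is cached per filename."""
    global load_count
    load_count += 1
    return {"source": filename}  # placeholder for the deserialized model


first = load_model_once("model.joblib")
second = load_model_once("model.joblib")
```

The second call returns the very same object without running the function body again, which is the behaviour the vectorized UDF relies on across batches.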