## Introduction

**Prediction example:**  
___
In this example we will show how to:
- Set up the required environment for accessing the ecosystem prediction server.
- Upload data to the ecosystem prediction server.
- Load data into a feature store and parse it into a frame.
- Build and test a prediction model for prism scores.

## Setup

**Setting up import path:**  
___
Add path of ecosystem notebook wrappers.
- **notebook_path:** Path to notebook repository.

In [None]:
notebook_path = "/path of ecosystem server python wrappers"

In [None]:
# ---- Uneditable ----
import sys
sys.path.append(notebook_path)
# ---- Uneditable ----
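
A wrong `notebook_path` only shows up later as an `ImportError`, so it can help to validate the path before appending it. The following is a minimal sketch, not part of the ecosystem wrappers; the helper name `add_import_path` and its `search_path` argument are hypothetical.

```python
import os
import sys


def add_import_path(path, search_path=None):
    """Append `path` to the import search path if it is an existing directory.

    Returns True when the path was added and False when it was already
    present. Raises FileNotFoundError when the directory does not exist.
    """
    if search_path is None:
        search_path = sys.path
    resolved = os.path.abspath(path)
    if not os.path.isdir(resolved):
        raise FileNotFoundError("notebook_path is not a directory: " + resolved)
    if resolved in search_path:
        return False
    search_path.append(resolved)
    return True
```

Calling `add_import_path(notebook_path)` would then replace the plain `sys.path.append` call and fail early with a readable message.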

**Import required packages:**  
___
Import and load all packages required for the following use case.

In [3]:
# ---- Uneditable ----
import pymongo
from bson.son import SON
import pprint
import pandas as pd
import json
import numpy
import operator
import datetime
import time
import os

from prediction import jwt_access
from prediction.apis import functions
from prediction.apis import data_munging_engine
from prediction.apis import worker_h2o
from prediction.apis import prediction_engine
from prediction.apis import worker_file_service
# ---- Uneditable ----

**Setup prediction server access:**  
___
Create access token for prediction server.
- **url:** URL of the prediction server to access.
- **username:** Username for prediction server.
- **password:** Password for prediction server.

In [None]:
url = "http://demo.ecosystem.ai:3001/api"
username = "user@ecosystem.ai"
password = "cd486be3-9955-4364-8ccc-a9ab3ffbc168"

In [None]:
# ---- Uneditable ----
auth = jwt_access.Authenticate(url, username, password)
# ---- Uneditable ----
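
Credentials typed directly into a notebook cell are easy to commit by accident. One possible alternative is to read them from environment variables, as in the sketch below. The variable names (`ECOSYSTEM_URL`, `ECOSYSTEM_USERNAME`, `ECOSYSTEM_PASSWORD`) and the helper `load_credentials` are assumptions made for illustration and are not defined by the wrappers.

```python
import os


def load_credentials(env=None):
    """Read url, username and password from environment variables.

    The variable names are hypothetical; adjust them to your own setup.
    Raises KeyError listing every variable that is missing or empty.
    """
    if env is None:
        env = os.environ
    names = {
        "url": "ECOSYSTEM_URL",
        "username": "ECOSYSTEM_USERNAME",
        "password": "ECOSYSTEM_PASSWORD",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}
```

The returned values could then be passed to `jwt_access.Authenticate(url, username, password)` exactly as in the cell above.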

## Upload Data

**List uploaded files:**  
___
List all files already uploaded.

In [None]:
# ---- Uneditable ----
files = worker_file_service.get_files(auth, path="./", user=username)
files = files["item"]
for file in files:
    print(file["name"])
# ---- Uneditable ----

**List uploadable files:**  
___
List all files in path ready for upload to prediction server.

In [None]:
# ---- Uneditable ----
path = "../example_data/"
upload_files = [f for f in os.listdir(path) if os.path.isfile(os.path.join(path, f))]
print(upload_files)
# ---- Uneditable ----

**Upload file:**  
___
Select file to upload to prediction server.
- **file_name:** Name of the file to upload to the prediction server. See the list of files available for upload.

In [None]:
file_name = "output.csv"

In [None]:
# ---- Uneditable ----
worker_file_service.upload_file(auth, path + file_name, "/data/")
# ---- Uneditable ----

**List uploaded files:**  
___
List all uploaded files again and compare with the previous list to confirm that the file was uploaded correctly.

In [None]:
# ---- Uneditable ----
files = worker_file_service.get_files(auth, path="./", user=username)
files = files["item"]
for file in files:
    print(file["name"])
# ---- Uneditable ----
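
Comparing the two listings by eye gets tedious once many files are present. A minimal sketch of the comparison as a set difference follows; `newly_uploaded` is a hypothetical helper, and the name lists are assumed to be collected with something like `[file["name"] for file in files]` before and after the upload.

```python
def newly_uploaded(names_before, names_after):
    """Return the file names present after the upload but not before, sorted."""
    return sorted(set(names_after) - set(names_before))


# Illustrative file names only.
before = ["bank.csv"]
after = ["bank.csv", "output.csv"]
print(newly_uploaded(before, after))  # prints ['output.csv']
```

Note that a file that already existed on the server before a re-upload would not appear in the result, so an empty list does not necessarily mean the upload failed.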

## File to Featurestore

**Load file into feature store:**  
___
Load selected file into a feature store and parse the data into a frame.
- **file_name:** Name of the uploaded file to load into a feature store.
- **featurestore_name:** Name of the feature store to load the data into.

In [None]:
file_name = "output.csv"
featurestore_name = "test_featurestore"

In [None]:
# ---- Uneditable ----
hexframename = functions.save_file_as_userframe(auth, file_name, featurestore_name, username)
# ---- Uneditable ----

## Build Model

**Train Model:**
___
Set training parameters for model and train.
- **predict_id:** Id for the prediction (for logging). 
- **description:** Description of model (for logging).
- **model_id:** Id for the model (for logging).
- **model_type:** Type of model to build (for logging). 
- **frame_name:** Name of frame used (for logging).
- **frame_name_desc:** Description of frame used (for logging).
- **model_purpose:** Purpose of model (for logging).
- **version:** Model version (for logging).

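The following parameters depend on the algorithm selected in the algo parameter (descriptions of the early-stopping and cross-validation options follow the H2O AutoML parameters of the same names).

- **algo:** Algorithm to use to train the model. (Available algorithms: "H2O-AUTOML")
- **training_frame:** Data frame to use for training the model.
- **validation_frame:** Data frame to use for validating the model.
- **max_models:** Maximum number of models to build.
- **stopping_tolerance:** Relative tolerance for metric-based early stopping; training stops when the stopping metric improves by less than this amount.
- **max_runtime_secs:** Maximum number of seconds to spend on training.
- **stopping_rounds:** Number of scoring rounds without improvement in the stopping metric after which training stops.
- **sort_metric:** Metric used to rank the generated models.
- **stopping_metric:** Metric used for early stopping ("AUTO" lets H2O choose one based on the problem type).
- **nfolds:** Number of folds for k-fold cross-validation; 0 disables cross-validation.
- **response_column:** The column or field in the dataset to predict.
- **ignored_columns:** List of columns to exclude from model training.
- **hidden:** (TODO)
- **exclude_algos:** Algorithms to exclude from the AutoML run.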

In [None]:
version = "1010"
model_id = featurestore_name + version
hexframename = "bank_full_1.hex"
model_purpose = "Prediction of whether nonbehavioural prism model is correct"
description = "Automated features store generated for " + featurestore_name
model_params = { 
        "predict_id": featurestore_name,
        "description": description,
        "model_id": model_id,
        "model_type": "AUTOML",
        "frame_name": hexframename,
        "frame_name_desc": description,
        "model_purpose": model_purpose,
        "version": version,
        "model_parms": {
            "algo": "H2O-AUTOML",
            "training_frame": hexframename,
            "validation_frame": hexframename,
            "max_models": 10,
            "stopping_tolerance": 0.005,
            "note_stop": "stopping_tolerance of 0.001 for 1m rows and 0.004 for 100k rows",
            "max_runtime_secs": 3600,
            "stopping_rounds": 10,
            "sort_metric": "logloss",
            "stopping_metric": "AUTO",
            "nfolds": 0,
            "note_folds": "nfolds=0 will disable the stacked ensemble creation process",
            "response_column": "job",
            "ignored_columns": [            
                "default",
                "balance",
                "contact",
                "day",
                "month",
                "duration",
                "campaign",
                "pdays",
                "previous",
                "poutcome",
                "y"
            ],
            "hidden": [
                "1"
            ],
            "exclude_algos": [
                "StackedEnsemble",
            ]
        }
    }
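
Since a training run can take up to `max_runtime_secs`, a quick sanity check of the parameter dictionary before submitting it may save time. The sketch below is illustrative only: `check_model_parms` is a hypothetical helper, and the choice of which keys count as required is an assumption, not something the prediction server is known to enforce.

```python
def check_model_parms(parms):
    """Return a list of human-readable problems found in `parms`.

    The set of required keys is an assumption made for this sketch.
    """
    problems = []
    for key in ("algo", "training_frame", "response_column"):
        if not parms.get(key):
            problems.append("missing required key: " + key)
    # The column being predicted must not also be excluded from training.
    response = parms.get("response_column")
    if response and response in parms.get("ignored_columns", []):
        problems.append("response_column is listed in ignored_columns: " + response)
    for key in ("max_models", "max_runtime_secs"):
        if key in parms and parms[key] <= 0:
            problems.append(key + " must be positive")
    return problems


example_parms = {
    "algo": "H2O-AUTOML",
    "training_frame": "bank_full_1.hex",
    "response_column": "job",
    "ignored_columns": ["default", "balance", "y"],
    "max_models": 10,
    "max_runtime_secs": 3600,
}
print(check_model_parms(example_parms))  # prints []
```

In the notebook, the same check could be run as `check_model_parms(model_params["model_parms"])` before calling `train_model`.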

In [None]:
# ---- Uneditable ----
worker_h2o.train_model(auth, model_id, "automl", json.dumps(model_params["model_parms"]))
# ---- Uneditable ----

**View Model:**
___
View the AutoML model to see which of the generated models perform best.

In [None]:
# ---- Uneditable ----
model_data = worker_h2o.get_train_model(auth, model_id, "AUTOML")
print(model_data)
# ---- Uneditable ----
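
Picking the best generated model amounts to taking the entry with the lowest value of the sort metric (here `logloss`, where lower is better). The sketch below assumes a hypothetical leaderboard shaped as a list of dictionaries with `model_id` and `logloss` keys; the actual structure of `model_data` may differ and would need to be adapted. The rows are placeholders invented for illustration, not real results.

```python
def best_model(leaderboard, metric="logloss"):
    """Return the row with the lowest value of `metric`.

    `leaderboard` is a hypothetical list of dicts; the real response of
    get_train_model may be shaped differently.
    """
    rows = [row for row in leaderboard if metric in row]
    if not rows:
        raise ValueError("no leaderboard rows contain metric: " + metric)
    return min(rows, key=lambda row: row[metric])


# Placeholder rows invented purely for illustration, not real results.
example_leaderboard = [
    {"model_id": "model_a", "logloss": 0.9},
    {"model_id": "model_b", "logloss": 0.7},
    {"model_id": "model_c", "logloss": 0.8},
]
print(best_model(example_leaderboard)["model_id"])  # prints model_b
```

For metrics where higher is better, such as AUC, `max` would take the place of `min`.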

**Save Model:**
___
Save model for prediction.
- **h2o_name:** Name of the H2O model to save, as listed in the AutoML output above.

In [None]:
h2o_name = "GLM_1_AutoML_20210722_145224"
zip_name = h2o_name + ".zip"
# Download the MOJO of the selected model.
worker_h2o.download_model_mojo(auth, h2o_name)
# Fetch the model details and use the first entry as the record to save.
high_level_mojo = worker_h2o.get_train_model(auth, h2o_name, "user")
model_to_save = high_level_mojo["models"][0]
model_to_save["model_identity"] = h2o_name
model_to_save["userid"] = "user"
model_to_save["timestamp"] = "time_stamp"
prediction_engine.save_model(auth, model_to_save)
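
The cell above stores the literal string `"time_stamp"` as the timestamp. If a real timestamp is wanted, it could be generated as sketched below. The ISO-8601 UTC format and the helper `make_timestamp` are assumptions; the format the prediction server expects is not stated in this notebook.

```python
import datetime


def make_timestamp(now=None):
    """Format a UTC datetime as an ISO-8601 string such as 2021-07-22T14:52:24Z.

    `now` should be a datetime in UTC; the current UTC time is used when omitted.
    The output format is an assumption, not a documented server requirement.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%SZ")
```

With this helper, the assignment would read `model_to_save["timestamp"] = make_timestamp()`.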

**View Model Stats:**
___
View stats of saved model.

In [None]:
user_model = prediction_engine.get_user_model(auth, h2o_name)
print(user_model)
stats = worker_h2o.get_model_stats(auth, h2o_name, "ecosystem", "variable_importances")
print(stats)