FaaS - API modeling

minimal R version License: MPL 2.0

faas4i

Repository for running scale modeling and model update on FaaS

Scale modeling performs an exhaustive search for best models in time series data, providing information about the fit of the best models, their cross-validation accuracy measures and many other outputs that are usually of interest.

A brief description of scale modeling is presented below. The full description of its parameters can be found on the Wiki of the repository.

Installation

Before you start the installation, make sure the package remotes is installed on your machine:

install.packages("remotes")

faas4i currently requires version 5.2.1 of the package curl. If you have a different version installed, remove it and then install 5.2.1:

remove.packages("curl")
remotes::install_version("curl", version = "5.2.1", force = TRUE, repos = "http://cran.r-project.org")
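Before removing anything, it can help to check which version of curl is already installed. The snippet below is a minimal sketch using only base R; the variable names are illustrative and not part of faas4i.

```r
# Version that this README asks for.
required_curl <- "5.2.1"

# system.file() returns "" when the package is not installed.
# Neither call below loads the curl namespace.
curl_installed <- nzchar(system.file(package = "curl"))
curl_ok <- curl_installed &&
  utils::packageVersion("curl") == required_curl

if (curl_ok) {
  message("curl ", required_curl, " is already installed.")
} else {
  message("curl ", required_curl, " not found; install it as shown above.")
}
```

If `curl_ok` is `TRUE`, the remove-and-reinstall step can be skipped.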

Then you can install faas4i:

remotes::install_github("4intelligence/faas4i", force = TRUE, upgrade = FALSE)

Don't forget to load the library and you are all set to start using the package!

library("faas4i")

Authentication

Each user needs to set up authentication using the function login. It displays a URI where you will be asked for your 4CastHub email and password, as well as the two-factor authentication code.

faas4i::login()
## Once the URL is printed, copy and paste it into your browser and complete the authentication

By default, the login function waits 90 seconds for authentication. To adjust the wait time, pass a numeric value (in seconds) to the sleep_time parameter.
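For instance, if the two-factor step tends to take longer, the wait can be extended. The value 180 below is an arbitrary illustration, and the guard only keeps the snippet from failing in sessions where the package is missing or no browser interaction is possible.

```r
# Illustrative wait time in seconds (the default is 90).
wait_seconds <- 180

# login() needs the package installed and a user who can open the printed URL.
if (interactive() && requireNamespace("faas4i", quietly = TRUE)) {
  faas4i::login(sleep_time = wait_seconds)
}
```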

Basic Usage

I) Scale Modeling

Scale modeling requests are sent via the ‘run_models’ function. It takes several arguments; this example walks through each of them and then calls the API.

1) Data List [‘data_list’]

  • A list of datasets to model, where each list element must be named after its Y variable. A data_list cannot contain more than one Y (dependent) variable with the same name.
  • The user should also define the date_variable and its format (date_format).

Let us see two examples of data lists, one with a single Y and the other with multiple Ys.

Example 1 data_list [single Y]:
# Load the dataset with our data
# Since this dataset is stored in the package, we can load it directly
dataset_1 <- faas4i::dataset_1

# But you will need to do something similar to the line below when loading 
# your own dataset locally
# dataset_1 <- readxl::read_excel("./inputs/dataset_1.xlsx")

# Put it inside a list (therefore, a 'data list')
# and name the list element with the name of the target variable
data_list <-  list(dataset_1)
names(data_list) <- c("fs_pim")

# Also, specify the date variable and its format 
date_variable <- "DATE_VARIABLE"
date_format <- '%Y-%m-%d'

Example 2 data_list [multiple Ys]:
# Load a data frame with our data
# Again, we can load the data directly from the package for this example
dataset_1 <- faas4i::dataset_1
dataset_2 <- faas4i::dataset_2
dataset_3 <- faas4i::dataset_3

# But you will need to read it when you are loading your own data locally
# dataset_1 <- readxl::read_excel("./inputs/dataset_1.xlsx")
# dataset_2 <- readxl::read_excel("./inputs/dataset_2.xlsx")
# dataset_3 <- readxl::read_excel("./inputs/dataset_3.xlsx")

# Put it inside a list (therefore, a 'data list')
# and name every list element with the name of the target variable
data_list <-  list(dataset_1, dataset_2, dataset_3)
names(data_list) <- c("fs_pim", "fs_pmc", "fs_pib")

# Also, specify the date variable and its format 
# (must have the same name in all datasets)
date_variable <- "DATE_VARIABLE"
date_format <- '%Y-%m-%d'
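The requirements on a data list (named elements, unique Y names, a shared date column that parses with date_format) can be checked locally before sending anything. The helper below is a hypothetical sketch in base R, not a faas4i function. It also assumes that each dataset contains a column named after its Y variable, and `validate_models` may apply further rules.

```r
# Hypothetical helper (not part of faas4i): basic local checks on a data list.
check_data_list <- function(data_list, date_variable, date_format) {
  y_names <- names(data_list)

  # Every element must be named, and the names must be unique.
  if (is.null(y_names) || anyNA(y_names) || any(y_names == "")) {
    stop("Every element of data_list must be named after its Y variable.")
  }
  if (anyDuplicated(y_names) > 0) {
    stop("Duplicated Y names: ",
         paste(unique(y_names[duplicated(y_names)]), collapse = ", "))
  }

  for (y in y_names) {
    df <- data_list[[y]]
    # Assumption: the Y variable is a column of its own dataset.
    if (!y %in% names(df)) {
      stop("Dataset '", y, "' has no column named '", y, "'.")
    }
    if (!date_variable %in% names(df)) {
      stop("Dataset '", y, "' has no column named '", date_variable, "'.")
    }
    # Dates that do not match the format come back as NA.
    parsed <- as.Date(as.character(df[[date_variable]]), format = date_format)
    if (anyNA(parsed)) {
      stop("Dataset '", y, "' has dates that do not match '", date_format, "'.")
    }
  }
  invisible(TRUE)
}

# Toy dataset with made-up values, used only to exercise the helper.
toy <- data.frame(
  DATE_VARIABLE = c("2020-01-01", "2020-02-01", "2020-03-01"),
  fs_pim = c(100.2, 101.5, 99.8),
  stringsAsFactors = FALSE
)
toy_list <- list(fs_pim = toy)
check_data_list(toy_list, "DATE_VARIABLE", "%Y-%m-%d")
```

The helper stops with a descriptive error on the first problem it finds and returns `TRUE` invisibly otherwise.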

2) Model Specifications [‘model_spec’]

The model specifications should be provided in a list format as:

## Default settings
model_spec <- list(n_steps = <input>,
                   n_windows = <input>,
                   log = TRUE,
                   seas.d = TRUE,
                   n_best = 20,
                   accuracy_crit = "MAPE",
                   exclusions = list(),
                   golden_variables = c(),
                   fill_forecast = FALSE,
                   cv_summary = 'mean',
                   selection_methods = list(
                     lasso = TRUE,
                     rf = TRUE,
                     corr = TRUE,
                     apply.collinear = TRUE),
                   lags = list(),
                   allowdrift = TRUE,
                   user_model = list())

The only required inputs are the cross-validation settings (n_steps and n_windows). Any argument you do not provide takes its default value, as listed above. You can find a full description of these arguments on the Wiki of the repository.

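As a concrete illustration, a minimal specification only needs the two CV settings. The values 1 and 12 below are arbitrary examples, not recommendations; the Wiki describes the accepted ranges. The second half of the sketch merges the minimal list with the defaults listed above purely to show what the effective specification would look like. It is a local illustration, not a description of how the package fills in defaults internally.

```r
# Minimal specification: only the required CV settings (illustrative values).
model_spec <- list(n_steps = 1, n_windows = 12)

# Defaults copied from the listing above.
default_spec <- list(
  log = TRUE,
  seas.d = TRUE,
  n_best = 20,
  accuracy_crit = "MAPE",
  exclusions = list(),
  golden_variables = c(),
  fill_forecast = FALSE,
  cv_summary = "mean",
  selection_methods = list(
    lasso = TRUE,
    rf = TRUE,
    corr = TRUE,
    apply.collinear = TRUE),
  lags = list(),
  allowdrift = TRUE,
  user_model = list()
)

# Local preview of the full specification: user values on top of defaults.
full_spec <- utils::modifyList(default_spec, model_spec)

# Fail early if a required CV setting is missing.
stopifnot(all(c("n_steps", "n_windows") %in% names(full_spec)))
str(full_spec)
```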
3) Project Name [‘project_name’]

Define a project name: a string of letters and/or digits, at most 50 characters long. Special characters will be removed.

project_name <- "example_project"
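To preview how a name might look after cleaning, the hypothetical helper below strips characters and enforces the 50-character limit locally. It is not part of faas4i, and it assumes that "special characters" means anything other than letters, digits, underscores and hyphens; the rule applied by the API may differ.

```r
# Hypothetical helper (not part of faas4i): preview a cleaned project name.
# Assumption: only letters, digits, "_" and "-" are kept.
clean_project_name <- function(name, max_chars = 50) {
  cleaned <- gsub("[^[:alnum:]_-]", "", name)
  if (nchar(cleaned) > max_chars) {
    stop("Project name has ", nchar(cleaned),
         " characters; the limit is ", max_chars, ".")
  }
  cleaned
}

clean_project_name("example_project")    # unchanged: "example_project"
clean_project_name("sales forecast #1")  # becomes "salesforecast1"
```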

4) Send job request

Want to make sure everything is all right? Though not necessary, you can validate your request beforehand by using the following function:

faas4i::validate_models(data_list = data_list, date_variable = date_variable, 
                        date_format = date_format, model_spec = model_spec,
                        project_name = project_name) 

It returns a message indicating whether your specifications are correctly defined and points out any arguments that need adjustment.

Or you can simply send your FaaS API request. run_models runs validate_models for you and lets you know if something needs your attention before it can proceed. If everything is correct, the request is sent automatically, and a message with the status of your request appears in your console.

faas4i::run_models(data_list = data_list, date_variable = date_variable, 
                   date_format = date_format, model_spec = model_spec,
                   project_name = project_name) 

II) Other Functionalities

1) List projects

You can list all the projects you have on the platform directly in R using the following function:

my_projects <- list_projects()

This function returns a list with information about each project: project id, project name, status, creation date, time of last update, etc.

2) Download forecast pack

You can also download the project output (forecast pack) in RDS format directly in R using the function:

download_zip(project_id = "project_id",
             path = "path",
             filename = "file_name")

To download the forecast pack, you need the project_id, which is available in the output of the list_projects function. You also need to provide the path of the directory where the forecast pack should be saved and a filename.
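A small sketch of preparing the destination before downloading is shown below. The directory name, `project_id` value and `file_name` are placeholders chosen for illustration, and whether `download_zip` creates missing directories itself is not stated here, so the directory is created up front. The call is left commented out because it needs the package and an authenticated session.

```r
# Hypothetical output location; any writable directory works.
out_dir <- file.path(tempdir(), "forecast_packs")
if (!dir.exists(out_dir)) {
  dir.create(out_dir, recursive = TRUE)
}

# Placeholders: replace project_id with an id taken from list_projects() output.
project_id <- "project_id"
file_name <- "file_name"

# Requires faas4i and a prior faas4i::login():
# faas4i::download_zip(project_id = project_id,
#                      path = out_dir,
#                      filename = file_name)
```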