## How-to guide for the Real-Time Forecasting use case on the Abacus.AI platform
This notebook provides you with a hands-on environment to build a real-time forecasting model using the Abacus.AI Python Client Library.

We'll be using the [Household Electricity Usage Dataset](https://s3.amazonaws.com/realityengines.exampledatasets/rtforecasting/household_electricity_usage.csv), which contains data about electricity usage in a specified household.

1. Install the Abacus.AI library.

In [None]:
!pip install abacusai

We'll also import pandas and pprint tools for neat visualization in this notebook.

In [1]:
import pandas as pd # A tool we'll use to download and preview CSV files
import pprint # A tool to pretty print dictionary outputs
pp = pprint.PrettyPrinter(indent=2)

2. Add your Abacus.AI [API Key](https://abacus.ai/app/profile/apikey) generated using the API dashboard as follows:

In [2]:
#@title Abacus.AI API Key

api_key = '2fdecde877dc45fab937eff82b70eff0'  #@param {type: "string"}
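
Hard-coding a key in a shared notebook makes it easy to leak. As a minimal sketch of one alternative, the key can be read from an environment variable. The variable name `ABACUS_API_KEY` and the helper `load_api_key` are hypothetical names chosen for this example; they are not defined by the Abacus.AI library.

```python
import os


def load_api_key(env_var="ABACUS_API_KEY"):
    """Return the API key stored in `env_var`, or raise if it is missing.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


# Example (hypothetical): api_key = load_api_key()
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the first API call.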

3. Import the Abacus.AI library and instantiate a client.

In [3]:
from abacusai import ApiClient
client = ApiClient(api_key)

## 1. Create a Project

Abacus.AI projects are containers that have datasets and trained models. By specifying a business **Use Case**, Abacus.AI tailors the deep learning algorithms to produce the best performing model possible for your data.

We'll call the `list_use_cases` method to retrieve the Use Cases currently available on the Abacus.AI platform.

In [4]:
client.list_use_cases()

[UseCase(use_case='UCPLUGANDPLAY',
   pretty_name='Plug & Play Your Tensorflow Model',
   description='Upload your already trained model and leverage our model serving infrastructure.. Host your models on our infrastructure and get a JSON api with auto scaling and more!'),
 UseCase(use_case='EMBEDDINGS_ONLY',
   pretty_name='Vector Matching Engine',
   description='Upload embeddings and leverage our similarity search infrastructure.. Scale to high traffic, update your index in near realtime'),
 UseCase(use_case='MODEL_WITH_EMBEDDINGS',
   pretty_name='Tensorflow Model With Vector Matching Engine',
   description='Upload your already trained model and leverage our model serving infrastructure.. Host your models on our infrastructure and get a JSON api with auto scaling and more!'),
 UseCase(use_case='TORCH_MODEL_WITH_EMBEDDINGS',
   pretty_name='PyTorch Model With Vector Matching Engine',
   description='Upload your already trained model and leverage our model serving infrastructure.. H

In this notebook, we're going to create a real-time forecasting model using the Household Electricity Usage dataset. The 'ENERGY' use case is best tailored for this situation.

In [5]:
#@title Abacus.AI Use Case

use_case = 'ENERGY'  #@param {type: "string"}

By calling the `describe_use_case_requirements` method, we can view which datasets this use case requires.

In [6]:
for requirement in client.describe_use_case_requirements(use_case):
  pp.pprint(requirement.to_dict())

{ 'allowed_feature_mappings': { 'DATE': { 'allowed_feature_types': [ 'TIMESTAMP'],
                                          'description': 'Date (day, year or '
                                                         'month) that '
                                                         'corresponds to the '
                                                         'target value.',
                                          'required': True},
                                'FUTURE': { 'description': 'Known values ahead '
                                                           'of time (e.g., '
                                                           'State Holidays, '
                                                           'National Holidays '
                                                           'etc.) that can be '
                                                           'easily included in '
                                                           'the training '


Finally, let's create the project.

In [7]:
real_time_project = client.create_project(name='Electricity Usage Forecasting', use_case=use_case)
real_time_project.to_dict()

{'project_id': '156aa80118',
 'name': 'Electricity Usage Forecasting',
 'use_case': 'ENERGY',
 'created_at': '2021-11-23T19:21:55+00:00',
 'feature_groups_enabled': True}

**Note: When `feature_groups_enabled` is True, the use case supports feature groups (collections of ML features). Feature groups are created at the organization level and can be attached to a project to train ML models.**

## 2. Add Datasets to your Project

Abacus.AI can read datasets directly from `AWS S3` or `Google Cloud Storage` buckets; alternatively, you can upload and store your datasets with Abacus.AI. For this notebook, we will have Abacus.AI read the dataset directly from a public S3 bucket.

We are using one dataset for this notebook. We'll tell Abacus.AI how the dataset should be used when creating it by tagging the dataset with a special Abacus.AI **Dataset Type**.
- [Household Electricity Usage Dataset](https://s3.amazonaws.com/realityengines.exampledatasets/rtforecasting/household_electricity_usage.csv) (**TIMESERIES**): 
This dataset contains information about electricity usage in specified households over a period of time.

### Add the dataset to Abacus.AI

First we'll use Pandas to preview the file, then add it to Abacus.AI.

In [8]:
pd.read_csv('https://s3.amazonaws.com/realityengines.exampledatasets/rtforecasting/household_electricity_usage.csv')

Unnamed: 0,id,time,value
0,MT_294,2011-01-01 00:00:00,378.11935
1,MT_294,2011-01-01 01:00:00,373.61487
2,MT_294,2011-01-01 02:00:00,360.93750
3,MT_294,2011-01-01 03:00:00,363.46283
4,MT_294,2011-01-01 04:00:00,371.08110
...,...,...,...
12973675,MT_369,2011-12-31 19:00:00,0.00000
12973676,MT_369,2011-12-31 20:00:00,0.00000
12973677,MT_369,2011-12-31 21:00:00,0.00000
12973678,MT_369,2011-12-31 22:00:00,0.00000


Using the Create Dataset API, we can tell Abacus.AI the public S3 URI where the dataset lives. We will also give the dataset a Refresh Schedule, a cron expression that tells Abacus.AI when to refresh the dataset (fetch the latest copy).

If you're unfamiliar with Cron Syntax, Crontab Guru can help translate the syntax back into natural language: [https://crontab.guru/#0_12_\*_\*_\*](https://crontab.guru/#0_12_*_*_*)

**Note: This cron string is evaluated in the UTC time zone.**
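
To make the schedule string easier to read, here is a small sketch that splits a standard five-field cron expression into named fields. The helper `describe_cron` and the field labels are illustrative names for this notebook only; Abacus.AI parses the string itself, so this is purely a reading aid.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Split a five-field cron expression into a dict keyed by field name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


schedule = describe_cron("0 12 * * *")
print(schedule)
# → {'minute': '0', 'hour': '12', 'day_of_month': '*', 'month': '*', 'day_of_week': '*'}
```

Minute `0` of hour `12` with wildcards everywhere else means the refresh runs once a day at 12:00 UTC.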

In [20]:
real_time_dataset = client.create_dataset_from_file_connector(
    name='Household Electricity Usage',
    table_name='Household_Electricity_Usage',
    location='s3://realityengines.exampledatasets/rtforecasting/household_electricity_usage.csv',
    refresh_schedule='0 12 * * *')  # refresh every day at 12:00 UTC
datasets = [real_time_dataset]

## 3. Create Feature Groups and add them to your Project

Datasets are created at the organization level and can be used to create feature groups as follows:

In [23]:
feature_group = client.create_feature_group(table_name='real_time_forecasting', sql='SELECT * FROM Household_Electricity_Usage')

Adding Feature Group to the project:

In [24]:
client.add_feature_group_to_project(feature_group_id=feature_group.feature_group_id, project_id=real_time_project.project_id)

Setting the Feature Group type according to the use case requirements:

In [25]:
client.set_feature_group_type(feature_group_id=feature_group.feature_group_id, project_id=real_time_project.project_id, feature_group_type='TIMESERIES')

Check current Feature Group schema:

In [26]:
client.get_feature_group_schema(feature_group_id=feature_group.feature_group_id)

[Feature(name='id',
   select_clause=None,
   feature_mapping=None,
   source_table='Household_Electricity_Usage99',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='time',
   select_clause=None,
   feature_mapping=None,
   source_table='Household_Electricity_Usage99',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='TIMESTAMP',
   data_type='DATETIME',
   columns=None,
   point_in_time_info=None),
 Feature(name='value',
   select_clause=None,
   feature_mapping=None,
   source_table='Household_Electricity_Usage99',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='NUMERICAL',
   data_type='FLOAT',
   columns=None,
   point_in_time_info=None)]

For each **Use Case**, there are special **Column Mappings** that must be applied to columns to fulfill the use case requirements. We can list the available **Column Mappings** by calling the *Describe Use Case Requirements* API:

In [27]:
client.describe_use_case_requirements(use_case)[0].allowed_feature_mappings

{'ITEM_ID': {'description': 'The unique identifier of the item whose target value you are forecasting.',
  'allowed_feature_types': ['CATEGORICAL'],
  'required': True},
 'TARGET': {'description': 'The target value you are forecasting. (e.g. energy, electricity usage).',
  'allowed_feature_types': ['NUMERICAL'],
  'required': True},
 'DATE': {'description': 'Date (day, year or month) that corresponds to the target value.',
  'allowed_feature_types': ['TIMESTAMP'],
  'required': True},
 'FUTURE': {'description': 'Known values ahead of time (e.g., State Holidays, National Holidays etc.) that can be easily included in the training dataset.',
  'multiple': True,
  'required': False},
 'IGNORE': {'description': 'Ignore this column in training',
  'multiple': True,
  'required': False}}
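
Before assigning mappings, it can help to confirm that every required mapping will be covered. The sketch below assumes the requirements are a plain dictionary shaped like the output above; `missing_required_mappings` and the `planned` dictionary are hypothetical helpers written for this notebook, not part of the Abacus.AI client.

```python
def missing_required_mappings(allowed_feature_mappings, assigned):
    """Return required mapping names that no column has been assigned to.

    allowed_feature_mappings: dict shaped like the output above.
    assigned: dict of column name -> mapping name (our planned assignment).
    """
    required = {
        name
        for name, spec in allowed_feature_mappings.items()
        if spec.get("required")
    }
    return sorted(required - set(assigned.values()))


# Trimmed copy of the requirements printed above (descriptions omitted)
allowed = {
    "ITEM_ID": {"allowed_feature_types": ["CATEGORICAL"], "required": True},
    "TARGET": {"allowed_feature_types": ["NUMERICAL"], "required": True},
    "DATE": {"allowed_feature_types": ["TIMESTAMP"], "required": True},
    "FUTURE": {"multiple": True, "required": False},
    "IGNORE": {"multiple": True, "required": False},
}
planned = {"id": "ITEM_ID", "time": "DATE", "value": "TARGET"}
print(missing_required_mappings(allowed, planned))  # → []
```

An empty list means the three columns in our dataset cover every required mapping for this use case.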

In [28]:
client.set_feature_mapping(project_id=real_time_project.project_id, feature_group_id=feature_group.feature_group_id, feature_name='value', feature_mapping='TARGET')
client.set_feature_mapping(project_id=real_time_project.project_id, feature_group_id=feature_group.feature_group_id, feature_name='time', feature_mapping='DATE')
client.set_feature_mapping(project_id=real_time_project.project_id, feature_group_id=feature_group.feature_group_id, feature_name='id', feature_mapping='ITEM_ID')

[Feature(name='id',
   select_clause=None,
   feature_mapping='ITEM_ID',
   source_table='Household_Electricity_Usage99',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='CATEGORICAL',
   data_type='STRING',
   columns=None,
   point_in_time_info=None),
 Feature(name='time',
   select_clause=None,
   feature_mapping='DATE',
   source_table='Household_Electricity_Usage99',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='TIMESTAMP',
   data_type='DATETIME',
   columns=None,
   point_in_time_info=None),
 Feature(name='value',
   select_clause=None,
   feature_mapping='TARGET',
   source_table='Household_Electricity_Usage99',
   original_name=None,
   using_clause=None,
   order_clause=None,
   where_clause=None,
   feature_type='NUMERICAL',
   data_type='FLOAT',
   columns=None,
   point_in_time_info=None)]

For each required Feature Group Type within the use case, you must assign the feature group to be used for training the model:

In [None]:
client.use_feature_group_for_training(project_id=real_time_project.project_id, feature_group_id=feature_group.feature_group_id)

Now that our feature group is assigned, we're almost ready to train a model!

To be sure that our project is ready to go, let's call `validate` on the project to confirm that all the project requirements have been met:

In [None]:
real_time_project.validate()

## 4. Train a Model

For each **Use Case**, Abacus.AI offers a number of training options. We can call the *Get Training Config Options* API to see the available options.

In [None]:
real_time_project.get_training_config_options()

In this notebook, we'll just train with the default options, but definitely feel free to experiment, especially if you have familiarity with Machine Learning.

In [None]:
real_time_model = real_time_project.train_model(training_config={})
real_time_model.to_dict()

After we start training the model, we can make this blocking call, which periodically checks the model's status until it is trained and evaluated.

In [None]:
real_time_model.wait_for_evaluation()

**Note: Model training can take anywhere from a few minutes to several hours, depending on the size of the datasets, the complexity of the models being trained, and a variety of other factors.**
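
To illustrate what a blocking wait does in general, here is a generic polling loop. It is a sketch only and not how `wait_for_evaluation` is implemented; the helper `poll_until`, its parameters, and the status strings in the example are placeholders invented for this illustration.

```python
import time


def poll_until(get_status, done_states, interval_seconds=30.0, timeout_seconds=3600.0):
    """Call get_status() repeatedly until it returns a value in done_states.

    Raises TimeoutError if timeout_seconds elapse first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_seconds} seconds")
        time.sleep(interval_seconds)


# Simulated status source; these status strings are placeholders, not Abacus.AI values
statuses = iter(["PENDING", "TRAINING", "COMPLETE"])
final = poll_until(lambda: next(statuses), {"COMPLETE", "FAILED"}, interval_seconds=0.0)
print(final)  # → COMPLETE
```

The timeout guards against waiting forever if a job never reaches a terminal state, which is worth keeping in mind for long training runs.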

## **Checkpoint** [Optional]
Because model training can take hours to complete, your page might time out or you might accidentally refresh it. This section helps you restore your progress:

In [None]:
!pip install abacusai
import pandas as pd
import pprint
pp = pprint.PrettyPrinter(indent=2)
api_key = ''  #@param {type: "string"}
from abacusai import ApiClient
client = ApiClient(api_key)
real_time_project = next(project for project in client.list_projects() if project.name == 'Electricity Usage Forecasting')
real_time_model = real_time_project.list_models()[-1]
real_time_model.wait_for_evaluation()
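
The `next(...)` lookup above raises a bare `StopIteration` if no project has the expected name. A slightly more defensive sketch is shown below; `find_by_name` is a hypothetical helper, and the `SimpleNamespace` objects merely stand in for the project objects that `client.list_projects()` would return.

```python
from types import SimpleNamespace


def find_by_name(items, name):
    """Return the first item whose .name equals name, or None if absent."""
    return next((item for item in items if item.name == name), None)


# Stand-in objects; in the notebook this list would come from client.list_projects()
projects = [
    SimpleNamespace(name="Demo"),
    SimpleNamespace(name="Electricity Usage Forecasting"),
]
match = find_by_name(projects, "Electricity Usage Forecasting")
if match is None:
    raise LookupError("project not found; check the project name and the API key")
```

Returning `None` and raising a descriptive error makes a misspelled project name obvious when restoring a session.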

## Evaluate your Model Metrics

After your model is done training, you can inspect its quality by reviewing its metrics:

In [None]:
pp.pprint(real_time_model.get_metrics().to_dict())

To get a better understanding of what these metrics mean, visit our [documentation](https://abacus.ai/app/help/useCases/ENERGY/training) page.

## 5. Deploy Model

After the model has been trained, we need to deploy it before we can start making predictions. Deploying a model reserves cloud resources to host the model for real-time and/or batch predictions.

In [None]:
real_time_deployment = client.create_deployment(name='Electricity Usage Deployment',description='Electricity Usage Deployment',model_id=real_time_model.model_id)
real_time_deployment.wait_for_deployment()

After the model is deployed, we need to create a deployment token for authenticating prediction requests. This token is only authorized to predict on deployments in this project, so it's safe to embed this token inside of a user-facing application or website.

In [None]:
deployment_token = real_time_project.create_deployment_token().deployment_token
deployment_token

## 6. Predict


Now that you have an active deployment and a deployment token to authenticate requests, you can make the `get_forecast` API call below.

This command will return a forecast under each percentile for the specified ITEM_ID. The forecast will be performed based on attributes specified in the dataset.

In [None]:
ApiClient().get_forecast(deployment_token=deployment_token, 
               deployment_id=real_time_deployment.deployment_id, 
               query_data={"id":"MT_001"})
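
To request forecasts for several items, the same call can be wrapped in a loop. The sketch below uses only the keyword arguments shown in the cell above; `forecast_many` and its `id_column` parameter are hypothetical names, and the shape of each returned forecast is left untouched because it is not shown in this notebook.

```python
def forecast_many(client, deployment_token, deployment_id, item_ids, id_column="id"):
    """Call get_forecast once per item ID and collect the results in a dict.

    Uses only the keyword arguments shown in the cell above.
    """
    results = {}
    for item_id in item_ids:
        results[item_id] = client.get_forecast(
            deployment_token=deployment_token,
            deployment_id=deployment_id,
            query_data={id_column: item_id},
        )
    return results


# Example (assumes the variables defined earlier in this notebook):
# forecasts = forecast_many(ApiClient(), deployment_token,
#                           real_time_deployment.deployment_id, ["MT_294", "MT_369"])
```

Passing the client in as an argument keeps the helper easy to try out with a stand-in object before pointing it at a live deployment.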