# Building a test dataset

This notebook demonstrates how to create a dataset for testing.

## Set configuration

The methods and modules in this example read their settings from the `configure.json` file, located in the `doc/` folder. Fill in the fields of that file before running the example. They include the model type, the number of nodes, the number of training epochs, the test set size (in days), the data window size, a flag indicating whether the data contains accumulated values, and a flag for applying the moving average.

In [1]:
import json

# Load the configuration file and pretty-print its contents.
with open('../doc/configure.json', 'r') as configure_json:
    print(json.dumps(json.load(configure_json), indent=4))

{
    "ncovid": "ML COVID-19 configure file",
    "author": "NatalNet NCovid",
    "published_at": 2021,
    "folder_configs": {
        "docs_path": "../doc/",
        "data_path": "../dbs/",
        "model_path": "fitted_model/",
        "model_path_remote": "https://",
        "glossary_file": "glossary.json"
    },
    "model_configs": {
        "type_used": "Artificial",
        "is_predicting": "False",
        "Artificial": {
            "model": "lstm",
            "nodes": 300,
            "epochs": 100,
            "dropout": 0.1,
            "batch_size": 64,
            "earlystop": 30,
            "is_output_in_input": "True",
            "data_configs": {
                "is_accumulated_values": "False",
                "is_apply_moving_average": "True",
                "window_size": 7,
                "data_test_size_in_days": 35,
                "type_norm": ""
            },
            "Autoregressive": {
                "model": "arima",
                "p": 1,
    

### Importing the configurations

In [13]:
import sys
sys.path.append("../src")

import configs_manner

print("Data window size: \n", configs_manner.model_infos["data_window_size"])
print("If data is accumulated: \n", configs_manner.model_infos["data_is_accumulated_values"])
print("If is to apply the moving average: \n", configs_manner.model_infos["data_is_apply_moving_average"])

Data window size: 
 7
If data is accumulated: 
 False
If is to apply the moving average: 
 True


To change any parameter, edit the `configure.json` file.
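
If you prefer to change a value from code rather than by hand, the sketch below shows one way to do it with the standard `json` module. It works on a temporary stand-in file that contains only keys visible in the listing above, so the real `configure.json` is not touched. The helper name `set_data_config` is hypothetical and is not part of the Ncovid code.

```python
import json
import os
import tempfile


def set_data_config(config_path, key, value, model_type="Artificial"):
    """Update one entry of model_configs[model_type]["data_configs"] on disk."""
    with open(config_path, "r") as f:
        config = json.load(f)
    data_configs = config["model_configs"][model_type]["data_configs"]
    if key not in data_configs:
        raise KeyError(f"unknown data config: {key}")
    data_configs[key] = value
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config


# Minimal stand-in for configure.json (only the keys used here).
sample = {
    "model_configs": {
        "type_used": "Artificial",
        "Artificial": {
            "data_configs": {
                "is_accumulated_values": "False",
                "is_apply_moving_average": "True",
                "window_size": 7,
                "data_test_size_in_days": 35,
            }
        },
    }
}

tmp_dir = tempfile.mkdtemp()
tmp_path = os.path.join(tmp_dir, "configure.json")
with open(tmp_path, "w") as f:
    json.dump(sample, f, indent=4)

updated = set_data_config(tmp_path, "window_size", 14)
print(updated["model_configs"]["Artificial"]["data_configs"]["window_size"])
```

Note that the boolean flags in the file are stored as the strings `"True"` and `"False"`, so passing strings for those keys keeps the file consistent with its current layout.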

## Remote data request

Before requesting data from a local path, a database, or a remote web source, you need to set a few request parameters.

In [3]:
# specific code of the remote data repository.
repo = "p971074907"
# country and state acronyms, separated by ":"
path = "brl:rn"
# columns (features) to extract from the database, separated by ":"
feature = "date:newDeaths:newCases:"
# start date of the data request.
begin = "2020-05-01"
# end date of the data request.
end = "2021-07-01"
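
The request parameters are plain colon-separated strings and ISO-formatted dates. How `collect_dataframe` parses them internally is not shown in this notebook, so the following is only a sketch of how such values could be split and sanity-checked before sending a request. The helper `split_fields` is hypothetical; it drops empty items, so the trailing `:` in `feature` does not produce an empty column name.

```python
from datetime import date


def split_fields(value, sep=":"):
    """Split a separator-delimited string, ignoring empty items."""
    return [item for item in value.split(sep) if item]


path = "brl:rn"
feature = "date:newDeaths:newCases:"
begin = "2020-05-01"
end = "2021-07-01"

country, state = split_fields(path)
columns = split_fields(feature)

# Check that the dates parse and that the range is not empty.
n_days = (date.fromisoformat(end) - date.fromisoformat(begin)).days
if n_days <= 0:
    raise ValueError("`end` must be later than `begin`")

print(country, state)  # brl rn
print(columns)         # ['date', 'newDeaths', 'newCases']
```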

### Load data

Before requesting data, create a data constructor, which collects and prepares the data for Ncovid.

Instantiate `DataConstructor` and call `.collect_dataframe()`. See the [Loading remote data](loading_remote_data.ipynb) notebook for details.

In [4]:
# import the data_manner module (src/ was added to sys.path above)
import data_manner

# create the DataConstructor instance
data_constructor = data_manner.DataConstructor()
# collect data from the remote repository.
collected_data = data_constructor.collect_dataframe(path, repo, feature, begin, end)

Behind the scenes, several data transformations are applied according to the configuration, such as the moving average or data differencing.
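
To make these two transformations concrete, here is a minimal pure-Python sketch. It is not the code used by `DataConstructor`: the function names, the trailing-window definition of the moving average, and the reading of the second transformation as converting accumulated totals into per-day values are all assumptions made for illustration.

```python
def difference(accumulated):
    """Turn accumulated totals into per-step values (first value kept as is)."""
    if not accumulated:
        return []
    return [accumulated[0]] + [
        curr - prev for prev, curr in zip(accumulated, accumulated[1:])
    ]


def moving_average(values, window_size):
    """Moving average over full windows only.

    Returns len(values) - window_size + 1 points.
    """
    if window_size < 1 or window_size > len(values):
        raise ValueError("window_size must be between 1 and len(values)")
    return [
        sum(values[i:i + window_size]) / window_size
        for i in range(len(values) - window_size + 1)
    ]


accumulated = [1, 3, 6, 10, 15, 21, 28, 36]
daily = difference(accumulated)
smoothed = moving_average(daily, window_size=7)

print(daily)     # [1, 2, 3, 4, 5, 6, 7, 8]
print(smoothed)  # [4.0, 5.0]
```

With a 7-day window, each smoothed point averages one full week, which dampens weekly reporting patterns at the cost of losing the first six points of the series.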

### Build test

Now use the `.build_test()` method to create the test set.

In [5]:
test = data_constructor.build_test(collected_data)

print("Test X and Test target shapes: ", test.x.shape, test.y.shape)

Test X and Test target shapes:  (60, 7, 2) (60, 7, 1)
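
The shapes above can be read as (samples, window size, features): each sample holds 7 time steps, with 2 features in `x` and 1 in `y`. The sketch below shows one common way to obtain data with this layout, by sliding a window over a time series and pairing each input window with the window that follows it. This illustrates the general technique under stated assumptions and is not the actual `build_test` implementation; `make_windows`, `shape`, the stride of one step, and the choice of target column are hypothetical.

```python
def make_windows(rows, window_size, target_index=0):
    """Slice rows (time steps x features) into input/target window pairs.

    Each input window holds `window_size` consecutive rows with all features;
    its target is the next `window_size` rows, restricted to one feature.
    """
    x, y = [], []
    last_start = len(rows) - 2 * window_size
    for start in range(last_start + 1):
        middle = start + window_size
        x.append([list(row) for row in rows[start:middle]])
        y.append([[row[target_index]] for row in rows[middle:middle + window_size]])
    return x, y


def shape(nested):
    """Return the shape of a regular nested list, similar to an array's .shape."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(dims)


# Toy series: 20 time steps, 2 features per step.
rows = [[day, 10 * day] for day in range(20)]
x, y = make_windows(rows, window_size=7)

print(shape(x), shape(y))  # (7, 7, 2) (7, 7, 1)
```

In this toy example the first input window covers steps 0 to 6 and its target covers steps 7 to 13 of the first feature only.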
