# Loading Local Data

This notebook demonstrates how to load data from a local `.csv` file.

## Set configuration

The methods and modules in this example use the settings stored in the `configure.json` file, located in the `doc` folder (opened below as `../doc/configure.json`). Fill in the blank fields of that file. They include the model type, the number of nodes, the number of training epochs, the size of the test data (in days), the window size, whether the data contains accumulated values, and a flag to apply a moving average.
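These fields sit under `model_configs` in the file printed below, where `type_used` names the block that holds the active model's settings (`"Artificial"`). The following is a minimal sketch, assuming that key layout, of how to read them with the standard `json` module. `summarize_config` and the trimmed `SAMPLE_CONFIG` string are illustrative names, not part of the project.

```python
import json

# Trimmed copy of configure.json (assumption: only the keys used here are kept).
SAMPLE_CONFIG = """
{
    "model_configs": {
        "type_used": "Artificial",
        "Artificial": {
            "model": "lstm",
            "nodes": 300,
            "epochs": 100,
            "data_configs": {
                "is_accumulated_values": "False",
                "is_apply_moving_average": "True",
                "window_size": 7,
                "data_test_size_in_days": 35
            }
        }
    }
}
"""


def summarize_config(config):
    """Pick out the fields described above from a parsed configure.json."""
    model_configs = config["model_configs"]
    # "type_used" names the block holding the active model's settings.
    model_type = model_configs["type_used"]
    active = model_configs[model_type]
    data = active["data_configs"]
    return {
        "model_type": model_type,
        "model": active["model"],
        "nodes": active["nodes"],
        "epochs": active["epochs"],
        "test_size_in_days": data["data_test_size_in_days"],
        "window_size": data["window_size"],
        # Flags are stored as the strings "True"/"False", not JSON booleans.
        "is_accumulated_values": data["is_accumulated_values"],
        "is_apply_moving_average": data["is_apply_moving_average"],
    }


# With the real file, a context manager closes the handle automatically:
#     with open("../doc/configure.json") as f:
#         config = json.load(f)
config = json.loads(SAMPLE_CONFIG)
summary = summarize_config(config)
print(summary)
```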

```python
import json

# Load the configuration file and pretty-print it; "with" closes the file afterwards
with open('../doc/configure.json', 'r') as configure_json:
    print(json.dumps(json.load(configure_json), indent=4))
```

```
{
    "ncovid": "ML COVID-19 configure file",
    "author": "NatalNet NCovid",
    "published_at": 2021,
    "folder_configs": {
        "docs_path": "../doc/",
        "data_path": "../dbs/",
        "model_path": "fitted_model/",
        "model_path_remote": "https://",
        "glossary_file": "glossary.json"
    },
    "model_configs": {
        "type_used": "Artificial",
        "is_predicting": "False",
        "Artificial": {
            "model": "lstm",
            "nodes": 300,
            "epochs": 100,
            "dropout": 0.1,
            "batch_size": 64,
            "earlystop": 30,
            "is_output_in_input": "True",
            "data_configs": {
                "is_accumulated_values": "False",
                "is_apply_moving_average": "True",
                "window_size": 7,
                "data_test_size_in_days": 35,
                "type_norm": ""
            },
            "Autoregressive": {
                "model": "arima",
                "p": 1,
```

### Importing the configurations

```python
import sys

# Make the project modules in ../src importable
sys.path.append("../src")

import configs_manner

print("Model infos: \n", configs_manner.model_infos)
print("\n")
print("Models path: \n", configs_manner.model_path)
print("\n")
print("Data path: \n", configs_manner.data_path)
```

```
Model infos:
 {'model_nodes': 300, 'model_epochs': 100, 'model_dropout': 0.1, 'model_batch_size': 64, 'model_earlystop': 30, 'model_is_output_in_input': True, 'data_is_accumulated_values': False, 'data_is_apply_moving_average': True, 'data_window_size': 7, 'data_test_size_in_days': 35, 'data_type_norm': ''}


Models path:
 ../dbs/fitted_model/


Data path:
 ../dbs/
```
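The printed `model_infos` looks like a flattened view of the active model block: top-level settings gain a `model_` prefix, `data_configs` entries gain a `data_` prefix (unless they already start with it), and the string flags `"True"`/`"False"` become booleans. Likewise, the models path is the data path followed by `model_path`. The sketch below reproduces that mapping from the printed `configure.json`. It is inferred only from the outputs above, so `flatten_model_infos` and `to_bool` are hypothetical helpers, and the real `configs_manner` may work differently.

```python
import json

# Trimmed copy of the printed configure.json (assumption: only the keys
# needed to rebuild model_infos and the paths are kept).
CONFIG = json.loads("""
{
    "folder_configs": {"data_path": "../dbs/", "model_path": "fitted_model/"},
    "model_configs": {
        "type_used": "Artificial",
        "Artificial": {
            "model": "lstm",
            "nodes": 300,
            "epochs": 100,
            "dropout": 0.1,
            "batch_size": 64,
            "earlystop": 30,
            "is_output_in_input": "True",
            "data_configs": {
                "is_accumulated_values": "False",
                "is_apply_moving_average": "True",
                "window_size": 7,
                "data_test_size_in_days": 35,
                "type_norm": ""
            }
        }
    }
}
""")


def to_bool(value):
    """Turn the strings "True"/"False" into booleans; leave other values alone."""
    if value in ("True", "False"):
        return value == "True"
    return value


def flatten_model_infos(config):
    """Hypothetical: flatten the active model block into model_/data_ keys."""
    model_configs = config["model_configs"]
    block = model_configs[model_configs["type_used"]]
    infos = {}
    for key, value in block.items():
        # Skip the model name and nested sections such as data_configs.
        if key == "model" or isinstance(value, dict):
            continue
        infos["model_" + key] = to_bool(value)
    for key, value in block["data_configs"].items():
        name = key if key.startswith("data_") else "data_" + key
        infos[name] = to_bool(value)
    return infos


folders = CONFIG["folder_configs"]
model_infos = flatten_model_infos(CONFIG)
model_path = folders["data_path"] + folders["model_path"]
print(model_infos)
print(model_path)  # ../dbs/fitted_model/
```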


To change any parameter, edit the `configure.json` file.
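Editing the file by hand is enough. If you prefer to change a value from code, here is a minimal sketch using only the standard `json` module; `set_config_value` is a hypothetical helper, and the demo writes to a temporary file rather than the real `../doc/configure.json`. Since `configs_manner` exposes values it has already loaded (as in the `model_infos` printout), you may need to re-import it or restart the kernel before a change takes effect; this is an assumption about how the module reads the file.

```python
import json
import os
import tempfile


def set_config_value(path, keys, value):
    """Set a nested key in a JSON config file and write the file back."""
    with open(path) as f:
        config = json.load(f)
    node = config
    for key in keys[:-1]:
        node = node[key]  # walk down to the parent of the target key
    node[keys[-1]] = value
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
    return config


# Demo on a temporary file instead of the real configure.json.
demo = {"model_configs": {"type_used": "Artificial", "Artificial": {"epochs": 100}}}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(demo, tmp)
    tmp_path = tmp.name

updated = set_config_value(tmp_path, ["model_configs", "Artificial", "epochs"], 50)
with open(tmp_path) as f:
    reloaded = json.load(f)
os.remove(tmp_path)
print(reloaded["model_configs"]["Artificial"]["epochs"])  # 50

# Flags in configure.json are strings, so keep that form when setting them, e.g.
#     set_config_value(path, ["model_configs", "Artificial", "data_configs",
#                             "is_apply_moving_average"], "False")
```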

## Create the data constructor

Before requesting data, you need to create a data constructor, which collects and prepares the data for Ncovid.

Create a `DataConstructor` instance and call its `.collect_dataframe()` method. See also the [Loading remote data](loading_remote_data.ipynb) notebook.

```python
# Import the data_manner module (importable because ../src was added to sys.path above)
import data_manner

# Create the DataConstructor instance
data_constructor = data_manner.DataConstructor()
```

## Collect local data

Once the data constructor has been created, call its `collect_dataframe()` method with the path to the local `.csv` file.

```python
# Collect data from the local .csv file
collected_data = data_constructor.collect_dataframe(configs_manner.data_path + "df_araraquara.csv")
```
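The path above is built by string concatenation, which works only because `data_path` ends with `/` (`../dbs/`). A slightly more defensive sketch uses `os.path.join` and checks that the file exists before it is handed to `collect_dataframe()`. `local_csv_path` and `require_file` are hypothetical helpers, not part of `data_manner`.

```python
import os


def local_csv_path(data_path, filename):
    """Join a data folder and a file name, with or without a trailing slash."""
    return os.path.join(data_path, filename)


def require_file(path):
    """Raise a clear error if the CSV is missing, before trying to collect it."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"CSV not found: {path}")
    return path


path_with_slash = local_csv_path("../dbs/", "df_araraquara.csv")
path_without_slash = local_csv_path("../dbs", "df_araraquara.csv")
print(path_with_slash, path_without_slash)

# In the notebook this could wrap the call above:
#     csv_path = require_file(local_csv_path(configs_manner.data_path, "df_araraquara.csv"))
#     collected_data = data_constructor.collect_dataframe(csv_path)
```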

The collected data is a time series with n features, indexed by feature: `collected_data[i]` holds the series for feature `i`.
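To make that shape concrete, the sketch below reads a tiny CSV with the standard `csv` module into one list per feature, so that `features[0]` and `features[1]` play the role of `collected_data[0]` and `collected_data[1]`. The column names `day`, `cases` and `deaths` and all values are made up for illustration; the real layout of `df_araraquara.csv` and the internals of `collect_dataframe()` are not shown in this notebook.

```python
import csv
import io

# Made-up CSV: column names and values are illustrative only.
SAMPLE_CSV = """day,cases,deaths
1,1,0
2,3,0
3,2,1
"""


def read_features(csv_file, feature_columns):
    """Return one list per feature column: result[i] is the series for feature i."""
    reader = csv.DictReader(csv_file)
    features = [[] for _ in feature_columns]
    for row in reader:
        for i, column in enumerate(feature_columns):
            features[i].append(float(row[column]))
    return features


features = read_features(io.StringIO(SAMPLE_CSV), ["cases", "deaths"])
print(features[0])  # [1.0, 3.0, 2.0]
print(features[1])  # [0.0, 0.0, 1.0]
```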

In this example, the data contains two features, represented by columns of daily COVID-19 counts:
- 0: confirmed cases
- 1: confirmed deaths
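These are daily counts, which is consistent with `"is_accumulated_values": "False"` in `configure.json`. If a source reported running totals instead, daily values could be recovered by differencing consecutive entries. Below is a minimal sketch of that conversion with a hypothetical `to_daily` helper; whether `data_manner` performs exactly this step when the flag is `"True"` is an assumption.

```python
def to_daily(accumulated):
    """Convert a running total into per-day counts by differencing."""
    if not accumulated:
        return []
    # The first day keeps its value; each later day is the increase over the previous one.
    return [accumulated[0]] + [
        current - previous for previous, current in zip(accumulated, accumulated[1:])
    ]


cumulative_cases = [1, 4, 6, 6, 10]
print(to_daily(cumulative_cases))  # [1, 3, 2, 0, 4]
```

Summing the daily series gives back the last running total, which is a quick sanity check on the conversion.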

```python
print("Feature 0 (confirmed cases): length ", len(collected_data[0]))
print("Feature 1 (confirmed deaths): length ", len(collected_data[1]))
```

```
Feature 0 (confirmed cases): length  430
Feature 1 (confirmed deaths): length  430
```
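Both features have the same length (430 days), so they can be handled as a single two-feature series. As an illustration only, assuming the last `data_test_size_in_days` (35) days are held out for testing, a split would leave 430 - 35 = 395 days for training. `split_train_test` is a hypothetical helper, and the project's own split may differ.

```python
def split_train_test(features, test_size_in_days):
    """Hold out the last test_size_in_days samples of every feature."""
    lengths = {len(series) for series in features}
    if len(lengths) != 1:
        raise ValueError(f"features have different lengths: {sorted(lengths)}")
    n_samples = lengths.pop()
    if not 0 < test_size_in_days < n_samples:
        raise ValueError("test size must be between 1 and the series length - 1")
    cut = n_samples - test_size_in_days
    train = [series[:cut] for series in features]
    test = [series[cut:] for series in features]
    return train, test


# Stand-in data with the same shape as collected_data: 2 features x 430 days.
features = [list(range(430)), list(range(430))]
train, test = split_train_test(features, test_size_in_days=35)
print(len(train[0]), len(test[0]))  # 395 35
```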
