# NNodely Documentation - Load Data

This page lists the ways to load data into the nnodely framework.
There are three ways to load a dataset:
1. From a directory: each file represents one simulation, and its lines are consecutive in time.
2. From a dictionary: each element of the dictionary represents a variable.
3. From a pandas DataFrame.

In [1]:
# uncomment the command below to install the nnodely package
#!pip install nnodely

from nnodely import *

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>-- nnodely_v1.4.0 --<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<


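The following cell creates the network used in the rest of this page: a Fir relation over a 0.05 time window of `in1`, minimized against the last sample of `target`, and neuralized with a sample time of 0.01.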

In [2]:
in1 = Input('in1')
target = Input('target')
relation = Fir(in1.tw(0.05))
output = Output('out', relation)

model = Modely(visualizer=TextVisualizer())
model.addMinimize('out', output, target.last())
model.neuralizeModel(0.01)

{'Constants': {},
 'Functions': {},
 'Info': {'SampleTime': 0.01,
          'nnodely_version': '1.4.0',
          'ns': [5, 0],
          'ntot': 5,
          'num_parameters': 5},
 'Inputs': {'in1': {'dim': 1, 'ns': [5, 0], 'ntot': 5, 'tw': [-0.05, 0]},
            'target': {'dim': 1, 'ns': [1, 0], 'ntot': 1, 'sw': [-1, 0]}},
 'Minimizers': {'out': {'A': 'Fir2', 'B': 'SamplePart4', 'loss': 'mse'}},
 'Outputs': {'out': 'Fir2'},
 'Parameters': {'PFir3W': {'dim': 1,
                           'tw': 0.05,
                           'values': [[0.8369134068489075],
                                      [0.7161089181900024],
                                      [0.04796421527862549],
                                      [0.12465757131576538],
                                      [0.12058025598526001]]}},
 'Relations': {'Fir2': ['Fir', ['TimePart1'], 'PFir3W', None, 0],
               'SamplePart4': ['SamplePart', ['target'], -1, [-1, 0]],
               'TimePart1': ['TimePart', ['
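
The `'ns': [5, 0]` and `'ntot': 5` entries reported for `in1` follow from the time window and the sample time. A minimal sketch of that arithmetic, assuming the number of past samples is simply the time window divided by the sample time; the helper name `samples_in_window` is hypothetical and not part of nnodely:

```python
def samples_in_window(tw, sample_time):
    """Number of samples covered by a time window of length tw."""
    # round() guards against floating-point error in the division
    return round(tw / sample_time)

# in1.tw(0.05) with neuralizeModel(0.01)
n_in1 = samples_in_window(0.05, 0.01)
print(n_in1)  # 5
```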

## Load a dataset via directory

Load a dataset inside the framework using a directory.

You must specify a name for the dataset, the folder path, and the structure of the data, so that the framework knows which column feeds each input of the network.

In [3]:
train_folder = 'data'
data_struct = ['in1', '', 'target']
model.loadData(name='dataset', source=train_folder, format=data_struct)

Dataset Name:                 dataset
Number of files:              1
Total number of samples:      28
Shape of target:              (28, 1, 1)
Shape of in1:                 (28, 5, 1)
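
To make the role of `format` concrete, here is a small standalone sketch of how a list such as `['in1', '', 'target']` could map file columns to input names. It assumes that an empty string means "ignore this column"; the file content and the helper `split_columns` are hypothetical, and this is not nnodely's own parser:

```python
import csv
import io

data_struct = ['in1', '', 'target']

# Hypothetical file content: three comma-separated columns per line
raw = "0.1,9.9,1.0\n0.2,9.8,2.0\n0.3,9.7,3.0\n"

def split_columns(text, fmt, delimiter=','):
    """Collect each named column of `text`; columns named '' are skipped."""
    columns = {name: [] for name in fmt if name}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        for name, value in zip(fmt, row):
            if name:
                columns[name].append(float(value))
    return columns

columns = split_columns(raw, data_struct)
print(columns)
```

The first column ends up under `in1`, the third under `target`, and the middle column is dropped.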


You can also specify parameters such as the number of lines to skip, the delimiter used between values, and whether the file includes a header.

In [4]:
model.loadData(name='dataset_2', source=train_folder, format=data_struct, skiplines=4, delimiter='\t', header=None)

Dataset Name:                 dataset_2
Number of files:              1
Total number of samples:      24
Shape of target:              (24, 1, 1)
Shape of in1:                 (24, 5, 1)
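
The sample count drops from 28 to 24 here, which matches the four skipped lines. As a rough illustration of what skipping lines and changing the delimiter do to the rows that remain, consider this sketch; the helper `read_rows` and the file content are hypothetical and only mimic the idea, not nnodely's implementation:

```python
import csv

def read_rows(text, skiplines=0, delimiter=','):
    """Parse delimited text into rows of floats, dropping the first `skiplines` lines."""
    lines = text.splitlines()[skiplines:]
    return [[float(v) for v in row] for row in csv.reader(lines, delimiter=delimiter)]

# Hypothetical tab-separated content: 4 lines to skip, then 3 data lines
raw = "a\nb\nc\nd\n" + "1\t0\t2\n2\t0\t4\n3\t0\t6\n"
rows = read_rows(raw, skiplines=4, delimiter='\t')
print(len(rows))  # 3
```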


## Load a dataset from a custom dictionary

You can build your own dataset as a dictionary containing all the inputs the network needs and pass it to the 'source' argument.

In [5]:
import numpy as np
data_x = np.array(range(10))
data_a = 2
data_b = -3
dataset = {'in1': data_x, 'target': (data_a*data_x) + data_b}

model.loadData(name='dataset_3', source=dataset)

Dataset Name:                 dataset_3
Number of files:              1
Total number of samples:      6
Shape of target:              (6, 1, 1)
Shape of in1:                 (6, 5, 1)
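
Ten input values yield only 6 samples because each sample of `in1` needs a full window of 5 consecutive values. A minimal sketch of that counting, assuming a plain sliding window with stride 1 (the helper `sliding_windows` is hypothetical, not a nnodely function):

```python
def sliding_windows(values, window):
    """Return every run of `window` consecutive values."""
    return [values[i:i + window] for i in range(len(values) - window + 1)]

in1 = list(range(10))              # same values as data_x above
windows = sliding_windows(in1, 5)
print(len(windows))                # 6, matching 'Total number of samples'
print(windows[0], windows[-1])     # [0, 1, 2, 3, 4] [5, 6, 7, 8, 9]
```

The same rule gives 100 - 5 + 1 = 96 samples for a 100-row source.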


## Load a dataset from a pandas DataFrame

You can also use a pandas DataFrame as the source when loading a dataset into the nnodely framework.

In [6]:
import pandas as pd
# Create a DataFrame with linearly spaced values for each input
df = pd.DataFrame({
    'in1': np.linspace(1,100,100, dtype=np.float32),
    'target': np.linspace(1,100,100, dtype=np.float32)})

model.loadData(name='dataset_4', source=df)

Dataset Name:                 dataset_4
Number of files:              1
Total number of samples:      96
Shape of target:              (96, 1, 1)
Shape of in1:                 (96, 5, 1)


## Resampling a pandas DataFrame

If the DataFrame has a column representing time, you can use those values to resample the dataset at the sample time of the neuralized network.

In [7]:
df = pd.DataFrame({
    'time': np.array([1.0,1.5,2.0,4.0,4.5,5.0,7.0,7.5,8.0,8.5], dtype=np.float32),
    'in1': np.linspace(1,10,10, dtype=np.float32),
    'target': np.linspace(1,10,10, dtype=np.float32)})

model.loadData(name='dataset_resampled', source=df, resampling=True)

Dataset Name:                 dataset_resampled
Number of files:              1
Total number of samples:      747
Shape of target:              (747, 1, 1)
Shape of in1:                 (747, 5, 1)
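
The 747 samples can be accounted for as follows: the time column spans 1.0 to 8.5, which at a sample time of 0.01 gives 751 grid points, and a window of 5 leaves 751 - 5 + 1 = 747 samples. The sketch below reproduces that count with a standalone resampler. It assumes linear interpolation between the original rows, which is an assumption of this example and may differ from what nnodely does internally; `resample_linear` is a hypothetical helper:

```python
from bisect import bisect_right

def resample_linear(times, values, dt):
    """Linearly interpolate (times, values) onto a uniform grid with step dt."""
    n = round((times[-1] - times[0]) / dt) + 1
    grid = [times[0] + k * dt for k in range(n)]
    out = []
    for t in grid:
        # index of the segment [times[i], times[i + 1]] that contains t
        i = min(max(bisect_right(times, t) - 1, 0), len(times) - 2)
        frac = (t - times[i]) / (times[i + 1] - times[i])
        out.append(values[i] + frac * (values[i + 1] - values[i]))
    return grid, out

times = [1.0, 1.5, 2.0, 4.0, 4.5, 5.0, 7.0, 7.5, 8.0, 8.5]
in1 = [float(v) for v in range(1, 11)]
grid, resampled = resample_linear(times, in1, 0.01)
print(len(grid))          # 751 points on the uniform grid
print(len(grid) - 5 + 1)  # 747 windows of 5 samples
```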


## Get Samples from the Dataset

Once a dataset is loaded, you can draw random samples from it. Set the 'window' argument to choose how many samples to get from the given dataset, and 'index' to select a specific time instant.

In [8]:
sample = model.getSamples(dataset='dataset_4', window=5)
model(sample, sampled=True)

{'out': [164.1360626220703,
  165.98228454589844,
  167.82852172851562,
  169.67474365234375,
  171.52096557617188]}
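
These numbers can be checked by hand. The sketch below assumes the Fir output is the dot product between the 5-sample window (oldest sample first) and the `PFir3W` values printed when the model was neuralized. The starting value 88 is inferred from the printed output, since the sample was drawn at random and nnodely does not report it here:

```python
# Fir weights copied from the 'PFir3W' values printed by neuralizeModel above
weights = [0.8369134068489075, 0.7161089181900024, 0.04796421527862549,
           0.12465757131576538, 0.12058025598526001]

def fir(window, weights):
    """Weighted sum of one input window (oldest sample first)."""
    return sum(w * x for w, x in zip(weights, window))

# Five consecutive windows of in1: [88..92], [89..93], ..., [92..96]
outputs = [fir([float(start + k) for k in range(5)], weights)
           for start in range(88, 93)]
print(outputs)
```

The results agree with the printed `out` values to within float32 rounding, and consecutive outputs differ by the sum of the weights (about 1.846), as expected when every value in the window increases by 1.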