# Using d3d_loader to access D3D data

The `d3d_loader` class implements a [PyTorch Dataset](https://pytorch.org/docs/stable/data.html#map-style-datasets) for loading D3D data, stored on traverse, for machine learning.


From a user perspective, the `d3d_loader` lines up groups of `1d` and `2d` signals over the same time interval, using a common sampling frequency. As a dataset, it can be used to iterate over this group of signals, as is done in routine machine learning tasks.
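To make the idea of a common time base concrete, here is a minimal sketch that resamples two synthetic signals, recorded at different rates, onto one shared grid. It uses plain linear interpolation with NumPy. The helper `resample_to_common_grid` and the synthetic signals are hypothetical; the resampling method that `d3d_loader` uses internally is not described here and may differ.

```python
import numpy as np

def resample_to_common_grid(signals, t_start, t_end, t_sample):
    """Linearly interpolate each (time, values) pair onto one shared time grid.

    Hypothetical helper for illustration only, not part of d3d_loaders.
    """
    # Shared time base: t_start, t_start + t_sample, ... (t_end excluded)
    t_common = np.arange(t_start, t_end, t_sample)
    resampled = {
        name: np.interp(t_common, t, values)
        for name, (t, values) in signals.items()
    }
    return t_common, resampled

# Two synthetic signals recorded at different rates
t_fast = np.linspace(0.0, 10.0, 101)
t_slow = np.linspace(0.0, 10.0, 11)
signals = {
    "fast": (t_fast, np.sin(t_fast)),
    "slow": (t_slow, 2.0 * t_slow),
}

t_common, resampled = resample_to_common_grid(signals, 0.0, 10.0, 0.5)
# After resampling, both signals have the same number of samples
print(t_common.shape, resampled["fast"].shape, resampled["slow"].shape)
```

Once all signals share a time base, the sample at index `i` refers to the same instant in every signal, which is what makes it possible to stack them into one feature vector.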

The example below illustrates how to instantiate the `d3d_loader` and access multiple signals.



In [1]:
import sys
sys.path.append("/home/rkube/repos/d3d_loaders")
from d3d_loaders.d3d_loaders import D3D_dataset
from os.path import join

import numpy as np
import torch
from torch.utils.data import DataLoader

import logging
logging.basicConfig(filename="d3d_loader.log", level=logging.INFO)

## Setup

To set up the dataset, we need to define a shot of interest and a time interval. We also need to define the sampling time to which all signals will be sub-sampled. Additionally, we can load the data directly onto the GPU by specifying a device.

In [2]:
shotnr = 169113
# Define interval for the signals
t0 = 0.001      
t1 = 4000.0
# Define a sampling time. This must be smaller than the sampling frequency on which the data was collected.
t_sample = 1.0 
# Define the GPU as the device on which to store the data
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Store variable in dictionary to pass to loading functions
t_params = {
    "tstart" : t0,
    "tend"   : t1,
    "tsample": t_sample
}

Next we want to specify a list of predictor and target signals. Typically these signals will be used as
`model(predictor) = target`. For prediction tasks, some time series should be shifted into the future to satisfy causality. Time shifts can be defined for each predictor and target signal individually using the `shift_target` dictionary. The keys correspond to signal names and the values correspond to a time shift in milliseconds.
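As a rough illustration of what a time shift means on a uniformly sampled series, the sketch below pairs each predictor sample with the target sample that lies a fixed time later. The helper `shift_pairs` is hypothetical and only shows the general idea; how `shift_target` is applied inside `d3d_loader` is not shown in this document and may differ.

```python
import torch

def shift_pairs(predictor, target, shift_ms, t_sample_ms):
    """Pair predictor[i] with target[i + k], where k = shift_ms / t_sample_ms.

    Hypothetical helper for illustration only, not part of d3d_loaders.
    """
    k = int(round(shift_ms / t_sample_ms))
    if k == 0:
        return predictor, target
    # Drop the last k predictor rows and the first k target rows,
    # so that both tensors keep the same length
    return predictor[:-k], target[k:]

# Toy series with values 0, 1, ..., 9 sampled every 1 ms, shape (10, 1)
series = torch.arange(10, dtype=torch.float32).unsqueeze(1)
x_shifted, y_shifted = shift_pairs(series, series, shift_ms=3.0, t_sample_ms=1.0)

print(x_shifted.shape, y_shifted.shape)
# The first predictor sample is now paired with the target 3 ms later
print(x_shifted[0].item(), y_shifted[0].item())
```

Shifting shortens the usable series by `k` samples, because the last `k` predictor samples have no target that far in the future.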


Here we use `pinj`, `neut`, and `ae_prob` as predictors. The first two signals are just the data loaded from `hdf5` files. The `ae_prob` signal is constructed from the ECE data through the RCN model.

In [3]:
ds = D3D_dataset(shotnr, t_params,
                 predictors=["pinj", "neut", "ae_prob"],
                 targets=["ae_prob_delta"],
                 shift_target={"ae_prob_delta": 100.0},
                 device=device)

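Once the dataset is instantiated, the predictor and target signals are available through the `predictors` and `targets` attributes. Each is a dictionary that maps a signal name to a signal object wrapping a [`torch.tensor`](https://pytorch.org/docs/stable/tensors.html).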

In [4]:
ds.predictors

{'pinj': signal_1d(tensor([[-1.7243],
         [-1.7243],
         [-1.7243],
         ...,
         [-0.6110],
         [-0.5756],
         [-0.5817]], device='cuda:0'),
 'neut': signal_1d(tensor([[-1.1213],
         [-0.9802],
         [-0.6800],
         ...,
         [-0.9932],
         [-0.6994],
         [-0.9071]], device='cuda:0'),
 'ae_prob': signal_1d(tensor([[-0.5361, -0.4713, -0.5580, -0.5282, -0.4688],
         [-0.5552, -0.5045, -0.5606, -0.5465, -0.5136],
         [-0.5714, -0.5365, -0.5582, -0.5660, -0.5408],
         ...,
         [-0.5808, -0.3281, -0.2933,  0.1483, -0.2929],
         [-0.5810, -0.3023, -0.2971,  0.1564, -0.2804],
         [-0.5828, -0.2705, -0.2981,  0.1742, -0.2535]], device='cuda:0')}

In [5]:
ds.targets

{'ae_prob_delta': signal_1d(tensor([[-0.0339, -0.0120, -0.0196, -0.0228, -0.0230],
         [-0.0497, -0.0067, -0.0298, -0.0250, -0.0404],
         [-0.0613, -0.0042, -0.0491, -0.0298, -0.0720],
         ...,
         [-0.0173, -0.2163, -0.1291, -0.1429,  0.2643],
         [-0.0111, -0.2856, -0.1328, -0.1607,  0.2149],
         [-0.0033, -0.3649, -0.1427, -0.2090,  0.1258]], device='cuda:0')}

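Accessing predictor and target data follows PyTorch conventions. Indexing the dataset returns a `(predictor, target)` tuple for a single sample. In the output below, the predictor has 7 entries, which matches the channel counts of `pinj` (1), `neut` (1), and `ae_prob` (5) shown above.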

In [6]:
pred, target = ds[0]
print(pred.shape, target.shape)

torch.Size([7]) torch.Size([5])
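The shapes above are consistent with the predictor signals being concatenated along the channel dimension (1 + 1 + 5 = 7). The sketch below shows how a map-style dataset with that behavior could look. `ConcatSignalDataset` is a hypothetical stand-in filled with random data; it is not the actual `D3D_dataset` implementation.

```python
import torch
from torch.utils.data import Dataset

class ConcatSignalDataset(Dataset):
    """Hypothetical stand-in: one sample holds all channels at one time index."""

    def __init__(self, predictors, targets):
        # Each dict value has shape (n_samples, n_channels).
        # Concatenate along the channel dimension.
        self.predictors = torch.cat(list(predictors.values()), dim=1)
        self.targets = torch.cat(list(targets.values()), dim=1)

    def __len__(self):
        return self.predictors.shape[0]

    def __getitem__(self, idx):
        return self.predictors[idx], self.targets[idx]

n_samples = 100
toy_ds = ConcatSignalDataset(
    predictors={
        "pinj": torch.randn(n_samples, 1),
        "neut": torch.randn(n_samples, 1),
        "ae_prob": torch.randn(n_samples, 5),
    },
    targets={"ae_prob_delta": torch.randn(n_samples, 5)},
)

x, y = toy_ds[0]
print(len(toy_ds), x.shape, y.shape)
```

Implementing `__len__` and `__getitem__` is all that a map-style dataset needs in order to work with the standard PyTorch data loading utilities.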


## DataLoaders
As a dataset, the `d3d_loader` can be used directly with a [`DataLoader`](https://pytorch.org/docs/stable/data.html#module-torch.utils.data). The code below illustrates usage.
Features such as shuffling, batching, and multi-process loading are supported through this interface.


In [7]:
# Instantiate a dataloader which loads 37 samples per batch
loader = DataLoader(ds, batch_size=37)

In [8]:
# Get the first batch and print the sizes
pred, target = next(iter(loader))
print(pred.shape, target.shape)

torch.Size([37, 7]) torch.Size([37, 5])
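For completeness, here is a minimal sketch of how batches of this shape could feed a training loop. It uses a `TensorDataset` with random data as a stand-in for `ds`, and a single linear layer as a placeholder model. The model, optimizer, and hyperparameters are assumptions chosen only to keep the example short.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in for `ds`: 256 samples with 7 predictor and 5 target channels
toy_data = TensorDataset(torch.randn(256, 7), torch.randn(256, 5))
toy_loader = DataLoader(toy_data, batch_size=37, shuffle=True)

# Placeholder model mapping 7 predictor channels to 5 target channels
model = nn.Linear(7, 5)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

n_batches = 0
for x_batch, y_batch in toy_loader:
    optimizer.zero_grad()
    loss = loss_fn(model(x_batch), y_batch)
    loss.backward()
    optimizer.step()
    n_batches += 1

# 256 samples at 37 per batch give 6 full batches plus one partial batch
print(n_batches, loss.item())
```

The last batch is smaller than `batch_size` unless `drop_last=True` is passed to the `DataLoader`.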
