# Using d3d_loader to access D3D data

The `d3d_loader` implements a [PyTorch Dataset](https://pytorch.org/docs/stable/data.html#map-style-datasets), exposed as the `D3D_dataset` class, that can be used to load D3D data, stored on traverse, for machine learning.


From a user perspective, the `d3d_loader` aligns groups of `1d` and `2d` signals over the same time interval, using a common sampling frequency. As a dataset, it can be used to iterate over this group of signals, as is done in routine machine learning tasks.
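
To make the idea of a common time base concrete, here is a minimal sketch that resamples two signals recorded at different rates onto the same grid. The signals, the variable names, and the use of linear interpolation are assumptions made for illustration; this text does not state how the `d3d_loader` resamples internally.

```python
import numpy as np

# Two hypothetical signals recorded on different time bases (times in ms).
t_fast = np.arange(0.0, 10.0, 0.5)   # sampled every 0.5 ms
t_slow = np.arange(0.0, 10.0, 2.0)   # sampled every 2.0 ms
sig_fast = 2.0 * t_fast              # a linear ramp
sig_slow = t_slow + 1.0              # another linear ramp

# Common time base: from t0 up to (not including) t1 in steps of t_sample.
t0, t1, t_sample = 0.0, 8.0, 1.0
t_common = np.arange(t0, t1, t_sample)

# Linear interpolation onto the common time base (an assumption of this sketch).
fast_rs = np.interp(t_common, t_fast, sig_fast)
slow_rs = np.interp(t_common, t_slow, sig_slow)

# Stack the aligned signals into one array of shape (n_samples, n_signals).
aligned = np.stack([fast_rs, slow_rs], axis=1)
print(aligned.shape)
```

After this step every row of `aligned` refers to the same instant in time, which is what allows a dataset to return one sample per index.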

The example below illustrates how to instantiate the `d3d_loader` and access multiple signals.



In [None]:
import sys
sys.path.append("/home/rkube/repos/d3d_loaders")
from d3d_loaders.d3d_loaders import D3D_dataset
from os.path import join

import numpy as np
import torch
from torch.utils.data import DataLoader

import logging
logging.basicConfig(filename="d3d_loader.log", level=logging.INFO)

## Setup

To set up the dataloader we need to define a shot of interest and the time interval. We also need to define the desired sampling time to which all signals will be sub-sampled. Additionally, we can load the data directly onto the GPU by specifying a device.

In [None]:
shotnr = 169113
# Define interval for the signals
t0 = 0.001      
t1 = 4000.0
# Define a sampling time. This must be smaller than the sampling frequency on which the data was collected.
t_sample = 1.0 
# Define the GPU as the device on which to store the data
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Next, we specify a list of predictor and target signals. Typically these signals will be used as `model(predictor) = target`. For prediction tasks, some time series should be shifted into the future to satisfy causality. Time shifts can be defined for each predictor and target signal individually using the `shift_target` dictionary. The keys correspond to signal names and the values correspond to a time shift in milliseconds.
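
The effect of such a time shift can be sketched with plain arrays. The snippet below is an illustration only: it assumes the shift pairs a predictor at time `t` with a target at time `t + shift`, and it converts the shift from milliseconds to a number of samples. The signal and all names are hypothetical, and this is not the `d3d_loader` implementation.

```python
import numpy as np

t_sample = 1.0    # sampling time in ms
shift = 100.0     # illustrative look-ahead in ms

# Express the shift as a number of samples on the common time base.
n_shift = int(round(shift / t_sample))

# Hypothetical signal on the common time base: one value per sample.
signal = np.arange(500, dtype=float)

# Pair the predictor at time t with the target at time t + shift.
predictor = signal[:-n_shift]
target = signal[n_shift:]

print(predictor.shape, target.shape)
```

Note that shifting shortens the usable series by `n_shift` samples, since the last predictors have no future target to pair with.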


Here we use `pinj`, `neut`, and `ae_prob` as predictors. The first two signals are just the data loaded from `hdf5` files. The `ae_prob` signal is constructed from the ECE data through the RCN model.

In [None]:
ds = D3D_dataset(shotnr, t0, t1, t_sample,
                 predictors=["pinj", "neut", "ae_prob"],
                 targets=["ae_prob_delta"],
                 shift_target={"ae_prob_delta": 100.0},
                 device=device)

Once the dataset is instantiated, the `predictors` and `targets` signals are available as [`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html) objects.

In [None]:
ds.predictors

In [None]:
ds.targets

Accessing predictor and target data follows PyTorch conventions. We can index the dataset to get a `(predictor, target)` tuple for a single sample.

In [None]:
pred, target = ds[0]
print(pred.shape, target.shape)
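
Indexing like this relies on the map-style dataset protocol: a class that defines `__len__` and `__getitem__`. The sketch below shows a minimal dataset of that kind. `ToyDataset` and its random data are hypothetical stand-ins for illustration, not the `D3D_dataset` code.

```python
import torch
from torch.utils.data import Dataset


class ToyDataset(Dataset):
    """Hypothetical map-style dataset; not the D3D_dataset implementation."""

    def __init__(self, n_samples, n_predictors, n_targets):
        # Random data in place of real signals, shape (n_samples, n_channels).
        self.predictors = torch.randn(n_samples, n_predictors)
        self.targets = torch.randn(n_samples, n_targets)

    def __len__(self):
        # Number of samples along the time axis.
        return self.predictors.shape[0]

    def __getitem__(self, idx):
        # Return one (predictor, target) pair for the given sample index.
        return self.predictors[idx], self.targets[idx]


toy = ToyDataset(n_samples=10, n_predictors=3, n_targets=1)
toy_pred, toy_target = toy[0]
print(len(toy), toy_pred.shape, toy_target.shape)
```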

## DataLoaders
As a dataset, the `d3d_loader` can easily be used with a [`DataLoader`](https://pytorch.org/docs/stable/data.html#module-torch.utils.data). The code below illustrates usage.
Features such as shuffling, batching, and multi-process data loading are supported through this interface.


In [None]:
# Instantiate a dataloader which loads 37 samples per call
loader = DataLoader(ds, batch_size=37)

In [None]:
# Get the first item and print the sizes
pred, target = next(iter(loader))
print(pred.shape, target.shape)
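
When the batch size does not divide the number of samples, the final batch is smaller than the rest. The sketch below shows this on a hypothetical `TensorDataset` with 100 samples, used here in place of the real dataset, together with the `shuffle` and `drop_last` options of `DataLoader`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: 100 samples, 3 predictor channels, 1 target channel.
toy_ds = TensorDataset(torch.zeros(100, 3), torch.zeros(100, 1))

# batch_size=37 does not divide 100, so the last batch holds the remainder.
toy_loader = DataLoader(toy_ds, batch_size=37, shuffle=True)
batch_sizes = [p.shape[0] for p, _ in toy_loader]
print(batch_sizes)

# drop_last=True discards the incomplete final batch.
full_only = DataLoader(toy_ds, batch_size=37, drop_last=True)
print(len(full_only))
```

Dropping the last batch can be convenient when a model or loss expects a fixed batch dimension, at the cost of skipping a few samples per epoch.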