# Load dataset (AirfRANS use case)

This notebook downloads the AirfRANS data and loads it with the dedicated dataset module in LIPS. For more details about the data, see the [PyTorch Geometric documentation for AirfRANS](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.datasets.AirfRANS.html).

## Import required packages

In [None]:
import time
from lips.dataset.airfransDataSet import download_data, AirfRANSDataSet

## First step: download the dataset

In [None]:
import os

DIRECTORY_NAME = 'Dataset'

# Download only if the target directory does not exist yet.
if not os.path.isdir(DIRECTORY_NAME):
    download_data(root_path=".", directory_name=DIRECTORY_NAME)
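
The `os.path.isdir` guard above skips the download whenever the directory exists, even if an earlier download was interrupted and left it empty. A minimal sketch of a slightly stricter guard follows, under the assumption that a non-empty directory means the data is present. `ensure_downloaded` and its `fetch` parameter are hypothetical names introduced for illustration; they are not part of LIPS.

```python
import os
from typing import Callable


def ensure_downloaded(directory: str, fetch: Callable[[], None]) -> bool:
    """Call `fetch` only if `directory` is missing or empty.

    Returns True if `fetch` was called, False if data was already present.
    Hypothetical helper: "non-empty directory" is only an assumed proxy
    for "download completed".
    """
    if os.path.isdir(directory) and os.listdir(directory):
        return False
    fetch()
    return True
```

In this notebook it could be called as `ensure_downloaded(DIRECTORY_NAME, lambda: download_data(root_path=".", directory_name=DIRECTORY_NAME))`.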

## Second step: load the dataset

From the data, we select the quantities of interest for this use case:

In [None]:
attr_names = (
    'x-position',
    'y-position',
    'x-inlet_velocity',
    'y-inlet_velocity',
    'distance_function',
    'x-normals',
    'y-normals',
    'x-velocity',
    'y-velocity',
    'pressure',
    'turbulent_viscosity',
)

Next, we separate the inputs (the first seven attributes) from the outputs (the remaining four):

In [None]:
attr_x = attr_names[:7]
attr_y = attr_names[7:]
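
Because the split relies on positional slicing, it silently changes if `attr_names` is reordered. The sketch below repeats the tuple from this notebook and adds two sanity checks to confirm that the slices partition it without overlap. The checks are an addition for illustration, not something LIPS requires.

```python
attr_names = (
    "x-position", "y-position",
    "x-inlet_velocity", "y-inlet_velocity",
    "distance_function",
    "x-normals", "y-normals",
    "x-velocity", "y-velocity",
    "pressure", "turbulent_viscosity",
)

# The first seven names are inputs, the remaining four are outputs.
attr_x = attr_names[:7]
attr_y = attr_names[7:]

# The two slices should rebuild attr_names exactly and share no name.
assert attr_x + attr_y == attr_names
assert not set(attr_x) & set(attr_y)

print("inputs :", attr_x)
print("outputs:", attr_y)
```

With this ordering, the outputs are the two velocity components, the pressure, and the turbulent viscosity.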

We are now in a position to instantiate the dataset. All that remains is to define the required arguments:

In [None]:
configuration_file = None  # Convenient alternative, but not required at this point
dataset_name = "my_dataset"
usecase_task = "scarce"  # Four tasks are supported: 'full', 'scarce', 'reynolds', 'aoa'
usecase_split = "training"  # Which data subset of the task to use; the other option is 'testing'
log_path = "dataset_log"
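
A typo in the task or split string is easy to make. The sketch below checks both values before the dataset is instantiated, using only the options named in the comments above. `check_usecase` and the two constants are hypothetical names introduced here, and no assumption is made about whether LIPS performs a similar check itself.

```python
# Values taken from the comments in the cell above.
SUPPORTED_TASKS = ("full", "scarce", "reynolds", "aoa")
SUPPORTED_SPLITS = ("training", "testing")


def check_usecase(task: str, split: str) -> None:
    """Raise ValueError if task or split is not one of the listed values."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"Unknown task {task!r}; expected one of {SUPPORTED_TASKS}")
    if split not in SUPPORTED_SPLITS:
        raise ValueError(f"Unknown split {split!r}; expected one of {SUPPORTED_SPLITS}")


check_usecase("scarce", "training")  # valid combination, returns silently
```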

In [None]:
my_dataset = AirfRANSDataSet(config=configuration_file,
                             name=dataset_name,
                             task=usecase_task,
                             split=usecase_split,
                             attr_names=attr_names,
                             attr_x=attr_x,
                             attr_y=attr_y,
                             log_path=log_path)

Finally, we load the dataset and time the operation:

In [None]:
start_time = time.time()
my_dataset.load(path=DIRECTORY_NAME)
elapsed_time = time.time() - start_time  # duration of the load, in seconds
print("Loaded in %.2E s" % elapsed_time)
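
The same start/stop timing pattern is used again when the dataset is reloaded. One way to avoid repeating it is a small context manager, sketched below with the standard library only. `timed` is a hypothetical helper, and `time.perf_counter` is substituted for `time.time` because it is intended for measuring intervals.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str):
    """Print the wall-clock time spent inside the `with` block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label} in {elapsed:.2E} s")


# Stand-in workload; any statement can go inside the block.
with timed("Slept"):
    time.sleep(0.01)
```

In this notebook, the loading step would then read `with timed("Loaded"): my_dataset.load(path=DIRECTORY_NAME)`.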

The content of the dataset can be inspected by printing it:

In [None]:
print(my_dataset)

The dataset can also be saved and reloaded. Reloading a saved dataset is faster than loading it the first time.

In [None]:
from lips.dataset.airfransDataSet import save_internal, reload_dataset

In [None]:
save_internal(dataset=my_dataset, path_out="AirfRANSDataset")

In [None]:
start_time = time.time()
reloaded_dataset = reload_dataset(path_in="AirfRANSDataset",
                                  name=dataset_name,
                                  task=usecase_task,
                                  split=usecase_split,
                                  attr_x=attr_x,
                                  attr_y=attr_y)
elapsed_time = time.time() - start_time  # duration of the reload, in seconds

print(reloaded_dataset)
print("Reloaded in %.2E s" % elapsed_time)

In [None]:
assert my_dataset == reloaded_dataset, "Datasets should be the same!"