# Import the Airfoil Design Dataset

The aim of this notebook is to show how the challenge datasets can be downloaded and imported using LIPS features.

### Prerequisites

Install the LIPS framework if it is not already installed. For more information, see the LIPS framework [GitHub repository](https://github.com/IRT-SystemX/LIPS).

In [None]:
# !pip install -r requirements.txt
# or 
# !pip install -U .

Install the AirfRANS package

In [None]:
# !pip install airfrans

In [None]:
# Import required packages
import os
from lips import get_root_path
from lips.dataset.airfransDataSet import download_data
from lips.benchmark.airfransBenchmark import AirfRANSBenchmark

In [None]:
# indicate required paths
LIPS_PATH = get_root_path()
DIRECTORY_NAME = 'Dataset'
BENCHMARK_NAME = "Case1"
LOG_PATH = os.path.join(LIPS_PATH, "lips_logs.log")  # log file stored in the LIPS root directory

Define the path to the benchmark configuration file.

In [None]:
BENCH_CONFIG_PATH = os.path.join("airfoilConfigurations", "benchmarks", "confAirfoil.ini")  # configuration file for the benchmark

Download the data if the target directory does not already exist.

In [None]:
if not os.path.isdir(DIRECTORY_NAME):
    download_data(root_path=".", directory_name=DIRECTORY_NAME)

To load the data stored on disk, we rely on the `load` method of the dedicated benchmark class. One could also load each dataset individually if required.

Note, however, that in the context of this competition, the datasets are loaded through the airfrans dataset with some modifications, namely:

- Train dataset: 'scarce' task, training split, filtered to keep the simulations whose Reynolds number is between 3e6 and 5e6
- Test dataset: 'full' task, testing split
- OOD dataset: 'reynolds' task, testing split
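To make the Reynolds filter on the training set concrete, here is a minimal sketch that does not use the LIPS API. It computes the Reynolds number as Re = U·L/ν and keeps the simulations that fall in the 3e6 to 5e6 range. The inlet velocities, chord length, and viscosity below are illustrative assumptions rather than values read from the dataset, and treating the bounds as inclusive is also an assumption.

```python
import numpy as np

# Hypothetical per-simulation inlet velocities (m/s); NOT taken from the dataset.
inlet_velocity = np.array([35.0, 50.0, 62.0, 75.0, 90.0])

CHORD = 1.0   # assumed chord length (m)
NU = 1.56e-5  # assumed kinematic viscosity of air (m^2/s)

# Reynolds number of each simulation: Re = U * L / nu
reynolds = inlet_velocity * CHORD / NU

# Keep only simulations whose Reynolds number lies in [3e6, 5e6]
# (inclusive bounds are an assumption of this sketch).
keep = (reynolds >= 3e6) & (reynolds <= 5e6)
kept_indices = np.flatnonzero(keep)

print("Reynolds numbers:", reynolds)
print("Indices of kept simulations:", kept_indices)
```

In the benchmark itself this filtering is handled when the datasets are loaded, so the sketch only illustrates the selection rule.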

In [None]:
benchmark = AirfRANSBenchmark(benchmark_path=DIRECTORY_NAME,
                              config_path=BENCH_CONFIG_PATH,
                              benchmark_name=BENCHMARK_NAME,
                              log_path=LOG_PATH)
benchmark.load(path=DIRECTORY_NAME)

We can also inspect the loaded datasets (all the features and labels).

In [None]:
print("train dataset: ", benchmark.train_dataset)
print("test dataset: ", benchmark._test_dataset)
print("test ood dataset: ", benchmark._test_ood_dataset)

For each dataset, the number of samples is the total number of nodes across all simulations in the dataset. Each variable in the dataset is therefore the concatenation of the corresponding nodal quantities from every simulation.

It can thus also be interesting to retrieve the number of physical simulations in each dataset. This can be done with the `get_simulations_sizes` method of the `Dataset` class.

In [None]:
datasets = [benchmark.train_dataset, benchmark._test_dataset, benchmark._test_ood_dataset]
for dataset_name, dataset in zip(["Train", "Test", "OOD"], datasets):
    print("%s dataset: " % dataset_name)
    print("\t Number of simulations:", len(dataset.get_simulations_sizes()))
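The node-wise concatenation described above can be illustrated with plain NumPy. The sketch below builds a flat array from three hypothetical simulations with different mesh sizes, then splits it back using the per-simulation sizes. It assumes that `get_simulations_sizes` returns one node count per simulation; the `sizes` array here is a hand-built stand-in, not the output of the LIPS method.

```python
import numpy as np

# Hypothetical nodal values for three simulations with different mesh sizes.
per_simulation = [
    np.arange(4, dtype=float),  # simulation 0: 4 nodes
    np.arange(2, dtype=float),  # simulation 1: 2 nodes
    np.arange(3, dtype=float),  # simulation 2: 3 nodes
]

# A dataset variable holds one entry per node, over all simulations.
flat = np.concatenate(per_simulation)

# Stand-in for the per-simulation node counts (assumed shape of
# what get_simulations_sizes() provides).
sizes = np.array([len(values) for values in per_simulation])

# Recover the per-simulation arrays by splitting at the cumulative sizes.
boundaries = np.cumsum(sizes)[:-1]
recovered = np.split(flat, boundaries)

print("Number of samples (nodes):", flat.shape[0])
print("Number of simulations:", len(recovered))
```

The same splitting pattern can be useful when a metric has to be computed per simulation rather than over all nodes at once.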