# Packed Ensemble Application to the AirfRANS dataset

## Colab setup

Set `colab` to `True` to run this notebook on Google Colab.

In [1]:
colab = False
if colab:
    from google.colab import drive

    drive.mount("/content/drive")
    !source /content/drive/MyDrive/my_colab_env/bin/activate
    import sys
    import os

    sys.path.append("/content/drive/MyDrive/my_colab_env/lib/python3.10/site-packages")
    os.chdir("/content/drive/MyDrive/ml4science/ml4physim_startingkit")

    sys.path.append(os.getcwd())
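
As an alternative to setting the flag by hand, the environment can be detected at runtime. This is a minimal sketch, assuming that a Colab kernel has already loaded the `google.colab` module when the notebook starts (the usual behaviour, but not something this notebook verifies). The helper name `running_in_colab` is hypothetical.

```python
import sys


def running_in_colab() -> bool:
    """Return True when the `google.colab` module is already loaded (assumed Colab marker)."""
    return "google.colab" in sys.modules


colab = running_in_colab()
print(f"Running in Colab: {colab}")
```

On a local machine this leaves `colab` set to `False`, so the Drive-mounting branch is skipped.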

## Installation

Install the LIPS framework if it is not already installed. For more information, see the LIPS framework [GitHub repository](https://github.com/IRT-SystemX/LIPS).

In [2]:
# !pip install -r requirements.txt
# or
# !pip install -U .


Install the AirfRANS package

In [3]:
# !pip install airfrans

## Imports

In [4]:
import numpy as np
import pandas as pd
import os
import pickle
import torch
import torch.nn.functional as F

from lips import get_root_path
from lips.dataset.scaler.standard_scaler import StandardScaler
from lips.benchmark.airfransBenchmark import AirfRANSBenchmark
from lips.dataset.airfransDataSet import download_data

from my_packed_ensemble import *
from my_packed_cv import *

## Generic Step (Load the required data) <a id='generic_step'></a>

In [5]:
# indicate required paths
LIPS_PATH = get_root_path()
DIRECTORY_NAME = '../src/Dataset'
BENCHMARK_NAME = "Case1"
LOG_PATH = LIPS_PATH + "lips_logs.log"

Define the paths to the configuration files, which describe the specific characteristics of the use case and of the augmented simulator.

In [6]:
BENCH_CONFIG_PATH = os.path.join("airfoilConfigurations", "benchmarks",
                                 "confAirfoil.ini")  #Configuration file related to the benchmark
SIM_CONFIG_PATH = os.path.join("airfoilConfigurations", "simulators", "torch_fc.ini")  #Configuration file re

Download the data

In [7]:
if not os.path.isdir(DIRECTORY_NAME):
    download_data(root_path=".", directory_name=DIRECTORY_NAME)

Loading the dataset through the dedicated class of the LIPS platform has two advantages:

1. It eases the import of datasets.
1. It provides a set of functions to organize the `inputs` and `outputs` required by augmented simulators.


In [8]:
# Load the benchmark from the pickle cache if it exists; otherwise build it and pickle it for future use
try:
    with open('benchmark.pkl', 'rb') as f:
        benchmark = pickle.load(f)
except (OSError, EOFError, pickle.UnpicklingError):
    # Cache missing or unreadable: load the benchmark from the raw data
    benchmark = AirfRANSBenchmark(benchmark_path=DIRECTORY_NAME,
                                  config_path=BENCH_CONFIG_PATH,
                                  benchmark_name=BENCHMARK_NAME,
                                  log_path=LOG_PATH)
    benchmark.load(path=DIRECTORY_NAME)
    with open('benchmark.pkl', 'wb') as f:
        pickle.dump(benchmark, f)
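
The load-or-build caching pattern used in this cell can be factored into a small helper. The sketch below is illustrative only: `load_or_build` and `build_toy` are hypothetical names, and a plain dictionary stands in for the benchmark object so that the example runs without LIPS or the dataset.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build_fn):
    """Return the object pickled at cache_path, building and caching it on a miss."""
    try:
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    except (OSError, EOFError, pickle.UnpicklingError):
        obj = build_fn()
        with open(cache_path, "wb") as f:
            pickle.dump(obj, f)
        return obj


# Toy usage: count how many times the (expensive) builder is actually called.
calls = []


def build_toy():
    calls.append(1)
    return {"name": "Case1"}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.pkl")
    first = load_or_build(path, build_toy)   # cache miss: builds and writes the file
    second = load_or_build(path, build_toy)  # cache hit: reads the file back

print(first == second, len(calls))
```

Catching only file and unpickling errors means that an unrelated failure, such as an interrupted run, is not silently treated as a cache miss.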

In [9]:
# Named `config` rather than `dict` to avoid shadowing the Python built-in
config = {
    "evaluateonly": False,
    "scoringonly": False,
    "simulator_config": {
        "simulator_type": "custom",
        "simulator_file": "my_packed_ensemble_mk2",
        "name": "MyAugmentedSimulator",
        "model": "AugmentedSimulator",
        "scaler_type": "None"
    },
    "simulator_extra_parameters": {
        "input_size": 7,
        "output_size": 4,
        "hidden_sizes": [64, 64, 8, 64, 64, 64, 8, 64, 64],
        "M": 8,
        "alpha": 4,
        "gamma": 1,
        "device": "cuda",
        "batch_size": 1,
        "nb_epochs": 600,
        "lr": 0.001,
        "subsampling": 32000
    },
    "training_config": {}
}

In [10]:
import new_submission.my_packed_ensemble_mk2 as my_packed_ensemble_mk2

simulator = my_packed_ensemble_mk2.AugmentedSimulator(benchmark, **config["simulator_extra_parameters"])

Using GPU


In [32]:
simulator.train(benchmark.train_dataset)

Normalize train data
Transform done
Start training


  0%|          | 0/600 [00:00<?, ?it/s]

Epoch:  1

  0%|          | 1/600 [00:09<1:36:06,  9.63s/it]

[progress output for epochs 2 to 27 omitted; each epoch takes roughly 9 s]

Epoch:  28

  4%|▍         | 27/600 [04:13<1:29:49,  9.41s/it]


KeyboardInterrupt: 

## Model selection (Cross validation)

The necessary dependencies and the `packed_ensemble` methods were imported in the Imports section above.

Run cross-validation over the model hyperparameters defined in `param_grid`.

In [9]:
param_grid = {
    'hidden_sizes': [(48, 128, 48), (128, 256, 128)],
    'dropout': [True, False],
    "alpha": [2, 4],
    "gamma": [2, 4],
    "M": [4],
    'lr': [1e-2, 1e-3]
}
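
To see how many configurations this grid contains, it can be expanded into its Cartesian product. This is a sketch under the assumption that the search is exhaustive over `param_grid`; how `hyperparameters_tuning` actually enumerates the grid is not shown in this notebook, and `expand_grid` is a hypothetical helper.

```python
from itertools import product

param_grid = {
    'hidden_sizes': [(48, 128, 48), (128, 256, 128)],
    'dropout': [True, False],
    "alpha": [2, 4],
    "gamma": [2, 4],
    "M": [4],
    'lr': [1e-2, 1e-3]
}


def expand_grid(grid):
    """Return one dict per combination of the values listed in grid."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


combinations = expand_grid(param_grid)
print(len(combinations))  # 2 * 2 * 2 * 2 * 1 * 2 = 32
print(combinations[0])
```

With 4 folds per configuration, an exhaustive search over this grid would train 128 models, which is why the work is worth spreading across machines.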

The `param_grid` is divided into 3 partitions, each executed on a different machine.

- Anton - partition 0
- Anthony - partition 1
- Alexi - partition 2

Set `partition` in the cell below.

In [None]:
partition = 1
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hyperparameter tuning using cross-validation
hyperparameters_tuning(benchmark=benchmark, param_grid=param_grid, k_folds=4, num_epochs=100,
                       batch_size=500000, shuffle=True, n_workers=6, scaler=StandardScaler(),
                       partition=partition, verbose=True, size_scale=0.3, device=device)
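
One way to divide the search between three machines is to cut the list of configurations into contiguous chunks of nearly equal size. The sketch below is only an illustration: the way `hyperparameters_tuning` maps `partition` to a subset of the grid is not shown in this notebook, `split_into_partitions` is a hypothetical helper, and integers stand in for the 32 configurations of `param_grid`.

```python
def split_into_partitions(items, n_partitions):
    """Split items into n_partitions contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_partitions)
    chunks, start = [], 0
    for i in range(n_partitions):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more item
        chunks.append(items[start:start + size])
        start += size
    return chunks


configs = list(range(32))  # stand-ins for the 32 hyperparameter combinations
partitions = split_into_partitions(configs, 3)
print([len(p) for p in partitions])  # [11, 11, 10]

partition = 1
my_configs = partitions[partition]  # the configurations this machine would run
```

Every configuration lands in exactly one chunk, so the three machines together cover the grid without overlap.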


---

## Model training

Define input and output sizes of the model

In [None]:
input_size, output_size = infer_input_output_size(benchmark.train_dataset)

Create a Packed MLP model

In [None]:
# device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = PackedMLP(input_size=input_size,
                  output_size=output_size,
                  hidden_sizes=(50, 100, 50),
                  activation=F.relu,
                  device=device,
                  dropout=True,
                  )
model.to(device)
print(model.device)
print(model)
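
The idea behind a packed ensemble can be illustrated with grouped linear layers: the `M` members share one forward pass, but each member only sees its own slice of the features. The NumPy sketch below is a conceptual illustration, not the `PackedMLP` implementation, whose internals are not shown in this notebook. It assumes that `alpha` scales the total hidden width shared by the members and it ignores `gamma` (no sub-groups inside a member); `grouped_linear` and all sizes are hypothetical.

```python
import numpy as np


def grouped_linear(x, weights):
    """Apply one independent linear map per group.

    x:       (batch, groups * in_per_group)
    weights: (groups, in_per_group, out_per_group)
    returns: (batch, groups * out_per_group)
    """
    groups, in_per_group, out_per_group = weights.shape
    batch = x.shape[0]
    x_grouped = x.reshape(batch, groups, in_per_group)
    y_grouped = np.einsum("bgi,gio->bgo", x_grouped, weights)
    return y_grouped.reshape(batch, groups * out_per_group)


rng = np.random.default_rng(0)
M, alpha = 4, 2                      # number of members, width multiplier (assumed meaning)
in_features, hidden, out_features = 7, 48, 4
width_per_member = alpha * hidden // M  # 96 hidden units in total, 24 per member

x = rng.normal(size=(5, in_features))
x_rep = np.tile(x, (1, M))           # every member receives a copy of the input

w1 = rng.normal(size=(M, in_features, width_per_member))
w2 = rng.normal(size=(M, width_per_member, out_features))

h = np.maximum(grouped_linear(x_rep, w1), 0.0)               # ReLU on the packed hidden layer
y_members = grouped_linear(h, w2).reshape(5, M, out_features)  # one prediction per member
y_mean = y_members.mean(axis=1)                               # ensemble prediction

print(y_members.shape, y_mean.shape)
```

Because the groups never mix, each member behaves like a small independent MLP, while the whole ensemble is evaluated with a single batched operation per layer.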

Create the `train_loader`

In [None]:
train_loader = model.process_dataset(benchmark.train_dataset, training=True, n_workers=6)

Train the model

In [None]:
model, train_losses, _ = train(model, train_loader, epochs=1, device=device, lr=3e-4)

##### Prediction on `test_dataset`
This dataset has the same distribution as the training set.

In [None]:
predictions, observations = predict(model, benchmark._test_dataset, device=device)

In [None]:
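print("Prediction dimensions: ", predictions["x-velocity"].shape, predictions["y-velocity"].shape,
      predictions["pressure"].shape, predictions["turbulent_viscosity"].shape)
print("Observation dimensions:", observations["x-velocity"].shape, observations["y-velocity"].shape,
      observations["pressure"].shape, observations["turbulent_viscosity"].shape)
# Check that predictions and observations actually match before reporting success
for field in ("x-velocity", "y-velocity", "pressure", "turbulent_viscosity"):
    assert predictions[field].shape == observations[field].shape, f"Shape mismatch for {field}"
print("We have good dimensions!")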

In [None]:
from lips.evaluation.airfrans_evaluation import AirfRANSEvaluation

evaluator = AirfRANSEvaluation(config_path=BENCH_CONFIG_PATH,
                               scenario=BENCHMARK_NAME,
                               data_path=DIRECTORY_NAME,
                               log_path=LOG_PATH)

observation_metadata = benchmark._test_dataset.extra_data
metrics = evaluator.evaluate(observations=observations,
                             predictions=predictions,
                             observation_metadata=observation_metadata)
print(metrics)

##### Prediction on `test_ood_dataset`
This dataset has a different distribution in comparison to the training set.

In [None]:
predictions, observations = predict(model, benchmark._test_ood_dataset, device=device)
evaluator = AirfRANSEvaluation(config_path=BENCH_CONFIG_PATH,
                               scenario=BENCHMARK_NAME,
                               data_path=DIRECTORY_NAME,
                               log_path=LOG_PATH)

# Use the metadata of the OOD dataset, not that of the in-distribution test set
observation_metadata = benchmark._test_ood_dataset.extra_data
metrics = evaluator.evaluate(observations=observations,
                             predictions=predictions,
                             observation_metadata=observation_metadata)
print(metrics)