![CC](https://i.creativecommons.org/l/by/4.0/88x31.png)

This work is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

# End-to-end ML project with OpenFOAM and PyTorch

**Note: this exercise still needs to be updated for the winter term 2022/2023.**

## Overview

In this exercise, we generate and work with the same data as in lecture 3. Many of the tasks can be completed by copying and modifying code snippets from the lecture notebook. Beyond the steps covered in the lecture, we will simplify the ML problem by leveraging a coordinate transformation coming from turbulence modeling.

## Generating the data

To generate the data, copy the script *parameter_variation_1d.py* into the exercise folder:
```
# assuming you are at the top-level of the lecture repository
cp test_cases/parameter_variation_1d.py exercises/
```
Now open the script and inspect the implemented functions. Try answering the following questions:

1. Where is the OpenFOAM base simulation located?
2. How many simulations are performed in total?
3. How many simulations are performed at the same time?
4. Which parameter(s) in which file(s) of the base setup is/are modified?

Your workstation or laptop might be equipped with fewer compute cores than the script assumes. Running more simulations at the same time than there are cores slows down the computations unnecessarily. To determine the number of CPU cores available on your machine, run the command `lscpu` and look for the line *Core(s) per socket ...* in the output; on machines with more than one socket, multiply this value by the number given in the line *Socket(s) ...*. Each simulation runs on a single core, so do not run more simulations simultaneously than there are cores available. Modify the script accordingly, and divide the parameter space into 10 to 30 sections (this number determines how many simulations are performed). To start the parameter study, make sure the script is executable and run it:
```
cd exercises
chmod +x parameter_variation_1d.py
./parameter_variation_1d.py
```
Depending on the available resources and the overall number of simulations, this computation should take about 10 to 30 minutes.
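
If you prefer to derive these settings inside Python rather than hard-coding them, the following minimal sketch may help. The function names `parameter_values` and `max_parallel_runs` are hypothetical and not part of *parameter_variation_1d.py*, and the sketch assumes that one simulation is run per parameter value. Note that `os.cpu_count()` reports logical CPUs, which can exceed the number of physical cores shown by `lscpu`.

```python
import os


def parameter_values(lower, upper, n_values):
    """Return n_values equally spaced values between lower and upper (inclusive)."""
    step = (upper - lower) / (n_values - 1)
    return [lower + i * step for i in range(n_values)]


def max_parallel_runs(n_simulations, n_cores=None):
    """Limit the number of simultaneous single-core runs to the available cores."""
    # os.cpu_count() counts logical CPUs; pass n_cores explicitly to use
    # the physical core count reported by lscpu instead
    n_cores = n_cores or os.cpu_count() or 1
    return max(1, min(n_cores, n_simulations))
```

For example, `max_parallel_runs(20, n_cores=4)` limits a study with 20 simulations to 4 simultaneous runs.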

## Direct learning approach

Following the lecture notebook:

- load and visualize the data
- compare the velocity profiles against Spalding's function
- reshape, split, and normalize the data
- train a baseline model and evaluate the $L_2$ and $L_\infty$ norms on the test data
- compare the predictions of the best model against the original data
- visualize the prediction errors by an additional method of your choice
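
The splitting, normalization, and error evaluation steps above can be sketched as follows. This is a minimal illustration and not the lecture notebook's code: the names `MinMaxScaler`, `split_indices`, and `error_norms` are hypothetical, the split is a simple train/test split (a validation set can be split off in the same way), and the $L_2$ norm is taken here as the root mean squared error, which may differ from the definition used in the lecture.

```python
import torch


class MinMaxScaler:
    """Scale each column to [-1, 1] based on the data passed to fit."""

    def fit(self, data):
        # compute the statistics on the training data only
        self.min = data.min(dim=0).values
        self.max = data.max(dim=0).values
        return self

    def scale(self, data):
        return 2.0 * (data - self.min) / (self.max - self.min) - 1.0

    def rescale(self, data):
        return (data + 1.0) * 0.5 * (self.max - self.min) + self.min


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Return shuffled, non-overlapping index tensors for training and testing."""
    generator = torch.Generator().manual_seed(seed)
    perm = torch.randperm(n_samples, generator=generator)
    n_train = int(train_fraction * n_samples)
    return perm[:n_train], perm[n_train:]


def error_norms(prediction, label):
    """Return the L2 norm (as RMSE) and the L_inf norm (maximum absolute error)."""
    diff = prediction - label
    l2 = diff.square().mean().sqrt()
    l_inf = diff.abs().max()
    return l2.item(), l_inf.item()
```

Fitting the scaler on the training portion only avoids leaking information from the test data into the training process.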

## Hyperparameter tuning

In the next step, we try to tune the ML model. Vary the following hyperparameters and try to minimize the prediction error:

- number of training epochs
- learning rate
- number of neurons per layer
- number of hidden layers
- activation function
- repeated training runs

Take the best model you found, compare the prediction against the original data, and visualize the prediction error.
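
To vary the network architecture systematically, it can help to build the model from a few arguments. The sketch below is one possible way to do this; the function name `create_model` and its arguments are hypothetical, and the input/output sizes in the usage note are assumptions about the feature and label layout.

```python
import torch


def create_model(n_inputs, n_outputs, n_layers, n_neurons, activation=torch.nn.ReLU):
    """Build a fully connected network with n_layers hidden layers."""
    # first hidden layer maps the inputs to n_neurons
    layers = [torch.nn.Linear(n_inputs, n_neurons), activation()]
    # remaining hidden layers keep the width constant
    for _ in range(n_layers - 1):
        layers += [torch.nn.Linear(n_neurons, n_neurons), activation()]
    # linear output layer without activation, as is common for regression
    layers.append(torch.nn.Linear(n_neurons, n_outputs))
    return torch.nn.Sequential(*layers)
```

Assuming two input features and one label, `create_model(2, 1, n_layers=2, n_neurons=10, activation=torch.nn.Tanh)` creates a network with two hidden layers of 10 neurons each. For repeated training runs, calling `torch.manual_seed` with a different seed before each call to `create_model` yields different initial weights.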

## Leveraging Spalding's function

The good agreement of our data with Spalding's function might already have suggested that we should somehow be able to use this relation. It might not yet be clear exactly how to leverage this knowledge from turbulence modeling, but the following steps will guide you there:

- for each Reynolds number, extract the friction velocity $u_\tau$ and plot $u_\tau$ against $Re$
- create a model of your choice for the relation $u_\tau = f(Re)$; if you think that a simpler model than a neural network will do the task, use a simpler model
- transform the original data using the $u_\tau$ model as follows:  
  - transform $U_x$ to $u^+$
  - transform the distance $y$ to $\tilde{y} = \mathrm{log}(y^+)$  
  - normalize the new data  
- create a new model for the relation $u^+ = f(\tilde{y}, Re)$
- make a prediction $\hat{u}^+$ based on the test data
- transform $\hat{u}^+$ into $\hat{U}_x$ using the $u_\tau$ model
- compare the model performance to the best model obtained with the direct training approach
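
The transformation steps above can be sketched with a few NumPy helpers, using the standard definitions $u^+ = U_x / u_\tau$ and $y^+ = y u_\tau / \nu$ with the kinematic viscosity $\nu$. All function names are hypothetical, the constants `KAPPA` and `E_WALL` are assumed values that may differ from those in the lecture, and the power law $u_\tau = a\,Re^b$ is only one candidate for a simple model; check it against your own plot of $u_\tau$ over $Re$ before relying on it.

```python
import numpy as np

KAPPA = 0.41   # von Karman constant (assumed value)
E_WALL = 9.8   # smooth-wall constant in Spalding's function (assumed value)


def spalding_y_plus(u_plus, kappa=KAPPA, e_wall=E_WALL):
    """Spalding's function, which yields y+ explicitly for a given u+."""
    ku = kappa * u_plus
    return u_plus + (np.exp(ku) - 1.0 - ku - ku**2 / 2.0 - ku**3 / 6.0) / e_wall


def fit_u_tau(Re, u_tau):
    """Fit u_tau = a * Re**b by linear regression in log-log space."""
    b, log_a = np.polyfit(np.log(Re), np.log(u_tau), deg=1)
    return lambda Re_new: np.exp(log_a) * np.asarray(Re_new) ** b


def to_wall_units(U_x, y, u_tau, nu):
    """Transform U_x to u+ and the wall distance y (y > 0) to log(y+)."""
    u_plus = U_x / u_tau
    y_tilde = np.log(y * u_tau / nu)
    return u_plus, y_tilde


def from_wall_units(u_plus, u_tau):
    """Transform a (predicted) u+ back to the streamwise velocity U_x."""
    return u_plus * u_tau
```

When transforming predictions back to $\hat{U}_x$, use the $u_\tau$ predicted by the fitted model rather than the extracted values, so that the comparison with the direct training approach accounts for the error of both models.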

In one of the next lectures, you will learn how to hide the composition of multiple models in a single top-level model.

**Congratulations! This completes the third exercise session.**