# Take Home Coding Challenge

## Objective:
Prepare a dataset for training an ML model on the airfRANS data.

The airfRANS dataset is a collection of airfoils and their RANS simulation solutions. The [documentation](https://airfrans.readthedocs.io/en/latest/notes/dataset.html) describes the dataset in detail, along with the Python library built to work with it.

Ask:
* Create a dataset for training an ML model using the airfRANS dataset.
  * The dataset should provide a sequence of points with their SDF (distance from the airfoil) value as the input `(x, y, sdf)` and the velocity `(x, y, v_x, v_y)` as the target. Package the data such that it can be quickly loaded for training a model.
* Provide some dataset statistics to help users understand the data.
* Create a simple model to train on the dataset.
  * Note: Don't worry about model performance, just get something that runs and trains.
* Document your design decisions.

### Note:
This is an intentionally open-ended challenge. The primary objective is to see how you write code. Think of this as an evaluation of your ability to write code for a production environment.

## Time
This challenge is designed to take ~2-3 hours. If it's taking much longer than that, feel free to stop and document what your next steps would have been.

## Deliverables
Your choice. You can send us a GitHub repo, a Jupyter notebook, or just raw files. Do whatever you think will best demonstrate your software engineering skills.

In [26]:
import airfrans as af
from pathlib import Path

In [2]:
directory_name = Path('airfrans/')
file_name = 'Dataset'
# Download only if the target directory is missing or empty
if not directory_name.exists() or not any(directory_name.iterdir()):
    af.dataset.download(root=str(directory_name), file_name=file_name, unzip=True, OpenFOAM=False)

Downloading AirfRANS: 9.34GB [17:31, 9.54MB/s]                                


Extracting Dataset.zip at airfrans...


In [27]:
directory_name = Path('airfrans/')
file_name = 'Dataset'
dataset_list, dataset_name = af.dataset.load(root=str(directory_name / file_name), task='reynolds', train=True)

NameError: name 'directory_name' is not defined

In [24]:
import importlib
import sys
sys.path.append('./data_manip')
import manip_utils as manip
importlib.reload(manip)

filtered_dataframes = manip.convert_and_filter_dataframes(dataset_list)
# Verify the first DataFrame
print(filtered_dataframes[0].head())

NameError: name 'dataset_list' is not defined

In [14]:
# Save the filtered DataFrames to the default directory './processed_data/'
manip.save_dataframes_as_bytes(filtered_dataframes)

Total DataFrames saved: 504


In [21]:
import importlib
import sys
sys.path.append('./data_manip')
import manip_utils as manip
importlib.reload(manip)
# Load the saved DataFrames in batches and collect them into a single list
loaded_dataframes = manip.load_dataframes_in_batches_and_collect(batch_size=10)

# Verify the number of DataFrames loaded
print(f"Loaded {len(loaded_dataframes)} DataFrames.")

Total DataFrames loaded: 504
Loaded 504 DataFrames.


In [4]:
print(loaded_dataframes[0].head())

          x         y       sdf       v_x       v_y
0  4.216870 -0.199916  3.223076 -3.353434  2.491901
1  4.216889 -0.199935  3.223097 -3.353404  2.491885
2  3.991839 -0.185931  2.997611 -3.343191  2.857927
3  3.991858 -0.185950  2.997631 -3.343161  2.857906
4  3.782508 -0.172922  2.787876 -3.332824  3.269879


In [22]:
# Use the first five simulations as a small training subset
train_dfs = loaded_dataframes[:5]

train_X, train_Y = manip.package_dataframes_for_training(train_dfs)
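
The implementation of `manip.package_dataframes_for_training` is not shown in this notebook. As a rough illustration of what such a packaging step could look like, the sketch below stacks per-simulation DataFrames into one input array and one target array. The helper name `package_frames`, the toy data, and the choice of target columns (taken from the challenge statement) are assumptions, not the actual code in `manip_utils`.

```python
import numpy as np
import pandas as pd

# Column layout assumed from the challenge statement; the real helper may differ.
INPUT_COLS = ["x", "y", "sdf"]
TARGET_COLS = ["x", "y", "v_x", "v_y"]


def package_frames(frames):
    """Hypothetical helper: stack per-simulation frames into two float32 arrays."""
    inputs = np.concatenate([f[INPUT_COLS].to_numpy(dtype=np.float32) for f in frames])
    targets = np.concatenate([f[TARGET_COLS].to_numpy(dtype=np.float32) for f in frames])
    return inputs, targets


def toy_frame(n_points):
    """Build a made-up simulation with n_points rows, purely for demonstration."""
    values = np.arange(n_points * 5, dtype=float).reshape(n_points, 5)
    return pd.DataFrame(values, columns=["x", "y", "sdf", "v_x", "v_y"])


# Two tiny fake simulations with 2 and 3 points respectively.
demo_X, demo_Y = package_frames([toy_frame(2), toy_frame(3)])
print(demo_X.shape, demo_Y.shape)
```

Concatenating all points into flat arrays discards the simulation boundaries; if the model needs one sequence per airfoil, keeping a list of per-simulation arrays would be the alternative.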


In [12]:
import viz
importlib.reload(viz)
viz.generate_heatmaps_for_simulations(loaded_dataframes, selected_indices=[0, 1])

Heatmaps for simulation 0 saved to ./heatmaps/.
Heatmaps for simulation 1 saved to ./heatmaps/.


In [13]:
# Print statistics for the first simulation
viz.print_simulation_statistics(loaded_dataframes[0])

# Create distribution plots for the first simulation and save them
viz.plot_data_distributions(loaded_dataframes[0], simulation_index=0)


             x         y       sdf        v_x          v_y
min  -2.161836 -1.619144  0.000000 -72.970451 -2688.514893
max   4.225655  1.615449  3.512383  18.782475  1732.500000
mean  0.487830  0.001489  0.237908  -8.261889     2.031050
std   0.724264  0.331377  0.483629  15.898418   690.991214
Saved x distribution plot to ./data_distribution_plots/simulation_0_x_distribution.png.
Saved y distribution plot to ./data_distribution_plots/simulation_0_y_distribution.png.
Saved v_x distribution plot to ./data_distribution_plots/simulation_0_v_x_distribution.png.
Saved v_y distribution plot to ./data_distribution_plots/simulation_0_v_y_distribution.png.
Saved sdf distribution plot to ./data_distribution_plots/simulation_0_sdf_distribution.png.
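
The code behind `viz.print_simulation_statistics` is not included here. A table with the same rows as the one above (min, max, mean, std per column) can be produced directly with pandas; the sketch below shows one way to do it. The function name `summarize` and the toy values are illustrative assumptions.

```python
import pandas as pd


def summarize(df):
    """Return min, max, mean and (sample) std for every column of df."""
    return df.agg(["min", "max", "mean", "std"])


# Made-up values, only to show the shape of the result.
demo_df = pd.DataFrame({"x": [0.0, 1.0, 2.0], "v_y": [-1.0, 0.0, 1.0]})
stats = summarize(demo_df)
print(stats)
```

Note that pandas computes the sample standard deviation (`ddof=1`) by default, which may differ slightly from a population standard deviation on small inputs.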


In [23]:
sys.path.append('./modeling')
import AirFNN
importlib.reload(AirFNN)
model = AirFNN.AirFNN()
AirFNN.train_model(train_X, train_Y, model)

Using device: cpu
Batch complete
...

After training, the model can be used to make predictions on new data. It should also be evaluated on held-out test data: if the test loss is close to the training loss, the model is probably not overfitting, whereas a test loss well above the training loss suggests that it is.
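
The train-versus-test comparison described above can be sketched as follows. Because the `AirFNN` code is not shown, this example substitutes a plain least-squares linear model fitted with NumPy on synthetic data; the data, the split size, and all variable names are assumptions made for illustration only.

```python
import numpy as np

# Synthetic stand-in data: 3 input features, 2 targets, small Gaussian noise.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 3))
true_weights = rng.normal(size=(3, 2))
targets = features @ true_weights + 0.1 * rng.normal(size=(200, 2))

# Hold out the last 40 samples as a test set.
split = 160
X_train, X_test = features[:split], features[split:]
Y_train, Y_test = targets[:split], targets[split:]

# Fit a linear model on the training portion only.
weights, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)


def mse(X, Y, W):
    """Mean squared error of the linear prediction X @ W against Y."""
    return float(np.mean((X @ W - Y) ** 2))


train_loss = mse(X_train, Y_train, weights)
test_loss = mse(X_test, Y_test, weights)
gap = test_loss - train_loss
print(f"train={train_loss:.4f} test={test_loss:.4f} gap={gap:.4f}")
```

A small gap between the two losses is what one would hope to see; the same comparison applies to a neural network by computing its loss on both splits after training.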