New ML online training tutorial #176

al-rigazzi · 2022-03-20T23:21:58Z

This PR replaces the old ML training tutorials with a new one.

The major features of this tutorial:

- It is set to run with the local launcher, so it will also run on the docs page.
- It trains a real surrogate model.
- It does not require Horovod or MPI4PY to run.
- It runs in 3 minutes on a standard CPU.

The whole training is performed in Keras; the PyTorch version will be ready soon.

codecov-commenter · 2022-03-20T23:31:02Z

Codecov Report

Merging #176 (b78c03e) into develop (d59cd2e) will increase coverage by 0.23%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop     #176      +/-   ##
===========================================
+ Coverage    81.20%   81.43%   +0.23%     
===========================================
  Files           57       57              
  Lines         2910     2968      +58     
===========================================
+ Hits          2363     2417      +54     
- Misses         547      551       +4

Impacted Files	Coverage Δ
smartsim/ml/tf/__init__.py	`100.00% <100.00%> (ø)`
smartsim/ml/tf/utils.py	`95.83% <100.00%> (+2.50%)`	⬆️
smartsim/_core/generation/modelwriter.py	`84.93% <0.00%> (-4.78%)`	⬇️
smartsim/settings/base.py	`94.11% <0.00%> (+2.81%)`	⬆️

Spartee

One tiny comment, but I'm approving this. I think we could open one more ticket about getting this into the smartsim-tutorials container, but I feel like that's outside the scope of this ticket.

Spartee · 2022-03-25T22:53:48Z

smartsim/ml/tf/utils.py

+    input_names = [x.name.split(":")[0] for x in frozen_func.inputs]
+    output_names = [x.name.split(":")[0] for x in frozen_func.outputs]
+
+    model_serialized = frozen_func.graph.as_graph_def().SerializeToString(deterministic=True)


Should this be an option? I'm guessing no, but I want to be sure.
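For readers following the hunk above, here is a minimal sketch of what the name handling does. TensorFlow tensor names have the form `<op_name>:<output_index>`, and the two list comprehensions keep only the op name. `FakeTensor` and `strip_output_index` are hypothetical stand-ins introduced for illustration; they are not part of `smartsim/ml/tf/utils.py`, and the real code operates on `frozen_func.inputs` and `frozen_func.outputs`.

```python
from collections import namedtuple

# Hypothetical stand-in for a TensorFlow tensor: only the `name`
# attribute matters for this illustration.
FakeTensor = namedtuple("FakeTensor", ["name"])


def strip_output_index(tensors):
    """Return each tensor's op name with the ':<index>' suffix removed."""
    return [t.name.split(":")[0] for t in tensors]


# Example names in the '<op_name>:<output_index>' form (assumed values).
inputs = [FakeTensor("x:0")]
outputs = [FakeTensor("Identity:0")]

input_names = strip_output_index(inputs)
output_names = strip_output_index(outputs)
print(input_names, output_names)
```

On the `deterministic=True` argument: in the protobuf Python API this requests serialization with a predictable ordering of map keys, so repeated serializations of the same graph definition are more likely to produce identical bytes. Whether it should be exposed as an option is the question raised in this review comment.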

al-rigazzi · 2022-03-30T14:52:51Z

Added a test for the new function; coverage is stable (small increase).

Add surrogate notebook, remove old notebooks

c15993f

al-rigazzi requested review from ashao and Spartee March 20, 2022 23:21

al-rigazzi added 3 commits March 21, 2022 16:45

Fix online training doc

6a3d0b5

Change ml notebook and dockerfiles related

8e8198f

Fix typo in surrogate training notebook

b6ec9bb

Spartee approved these changes Mar 25, 2022

Add tests for new serialize_model function

b78c03e

al-rigazzi merged commit 958877b into CrayLabs:develop Mar 30, 2022

al-rigazzi deleted the new-ml-tutorial branch March 30, 2022 14:54
