User can request managerial info, like trial string identifiers, from cli #69

tsirif · 2018-05-07T06:15:27Z

Why:
I need to way to identify saved stuff in local disk from the
execution of my script given a particular trial. This way I can have
a reproducible deterministic string identifier that I intend to pass
appropriately to a command line argument (if specified from orion hunt) of my executable. This will disambiguate saved states,
snapshots, or logs (i.e. from tensorboard), among different trials of
the same executable.

The actual implementation of all save/load is left to the executable
itself, as well as configuring the absolute path where the stuff will
be saved to. Most (responsible) ML programs I have personally encountered
have a cli way to specify some naming for the experiment to run.

bouthilx

We can leave seed support for latter. I'll open an issue. Please fix the docstring for flake8 and for first person singular (comment here). Rest is fine. 🙂

bouthilx · 2018-05-09T00:22:44Z

src/orion/core/io/space_builder.py

+           2. Prefix ``'$'`` is given to substitutions from exposed properties.
+           3. Prefix ``'_'`` is given to arguments which do not interact with.
+              Suffix is a unique integer.
+           4. ``'config'`` is given to user's script configuration file template


There should be an example. The descriptions alone are not clear enough.

bouthilx · 2018-05-09T00:26:15Z

src/orion/core/io/space_builder.py

+        argument, if not found elsewhere. Only one is allowed.
+
+        .. note:: Currently positional arguments cannot define a parameter
+           dimension. And I don't think they will; it wouldn't be good for


We should not use first person singular in docstrings. We can't use positional arguments because they don't have a name assigned to. I think that's enough justification.

bouthilx · 2018-05-09T00:41:32Z

src/orion/core/io/space_builder.py

+    USERARGS_NAMESPACE = r'\W*([a-zA-Z0-9_-]+)'
+    USERARGS_CONFIG = '--config='
+
+    EXPOSED_PROPERTIES = ['trial.hash_name', 'trial.full_name', 'exp.name']


If I understand properly, if there is some $trial.hash_name in the command line of the user, this will be replaced by the actual trial.hash_name provided by orion. Is that right?

bouthilx · 2018-05-09T00:44:26Z

src/orion/core/worker/trial.py

-        ret = "Trial(experiment={0}, status={1}, params.value={2})".format(
-            repr(self.experiment), repr(self._status), [p.value for p in self.params])
+        ret = "Trial(experiment={0}, status={1},\n      params={2})".format(
+            repr(self.experiment), repr(self._status), self.params_repr(sep=('\n' + ' ' * 13)))


What about using pprint if we want some fancy repr?

of course, but later

bouthilx · 2018-05-09T00:48:22Z

src/orion/core/worker/trial.py

+    def hash_name(self):
+        """Generate a unique name with an md5sum hash for this `Trial`.
+
+        .. note:: Two trials that have the same `params` must have the same `hash_name`.


I disagree. Two trials with identical parameters could have different results. We can't assume the user's script is deterministic. We could, however, add a seed to the attributes of the Trial and pass it through an environment variable to the user's script. The user could seed the script using it, making it deterministic for a given pair of (params, seed).

Is it the case that we may request from the script to evaluate a particular set of parameters twice? Then, that's a design issue that we did not take care about.

only if the params are the same, hash_name is going to be, not the results.

#71 is this the one that you opened because of this observation?

Yes I opened #71 for this.

If two trials with identical parameters give different results then they should be considered different, hence have different hash names.

For hyper-parameter optimization, it would not make sense to run twice the exact same hyper-parameters unless they are all categoricals or very small interval of integers. If any is dimension is real or large interval of integers, than the optimization algorithm is better to poke around than running twice identical sets.

However, someone could want to run multiple trials with identical params to compute the variance of the performance. This is something we could support.

To make it clear, what I propose is to keep it like you did for now, our first algorithms won't use seed information anyway. But I believe we should add support for seeds later on, and include the seed information in what we use to compute the hash name of a trial. A change like that should not break anything so it's ok to postpone it. (it's possible to keep backward compatibility for prior trials after we add support for seeds)

hash_name is supposed to be generated and given as an input to user's script so that it is able to disambiguate local directories that it may use (i.e. for snapshots or logs) among different hyperparameter trial sets. It is supposed to be generated by trials whose results are not yet known, as is the use case above. So this hash_name corresponds only to trial.params only. To make it able to run multiple experiments for the same set of hyperparameters:

yes we should provide the seed to the user's script. I predict a ~trial.seed kind of command in the likes of what I've built here. So, seed is going to be an attribute of a Trial as well.

hash_name will be generated as a function of (trial.params, trial.seed) to ensure uniqueness of results and reproducibility.

bouthilx · 2018-05-09T00:50:34Z

tests/functional/demo/orion_config.yaml

@@ -6,6 +6,7 @@ max_trials: 100
 algorithms:
  gradient_descent:
    learning_rate: 0.1
+    # dx_tolerance: 1e-7


Why comment out? Should we remove it?

To show that it is the default, but not used here

ok, that's fine

bouthilx · 2018-05-09T00:53:19Z

tests/functional/demo/test_demo.py

+                                "./dir_per_trial.py",
+                                "--dir={}".format(str(tmpdir)),
+                                "--other-name~exp.name",
+                                "--name~trial.hash_name",


We use ~ for both exposed variables and distributions? Wasn't it supposed to be $ in this case?

bouthilx · 2018-05-09T00:55:59Z

tests/functional/demo/test_demo.py

+    # (notice that an appropriate learning rate was selected in this stress
+    # test to create underdamped behaviour), that it begins to suggest same
+    # things from the past. This is intended to be shown with the assertions
+    # in the for-loop below.


Why not used random search if gradient descent causes side-effects you need to verify to make sure the problem is not md5 of bad hash creation.

For the reason that even in the case where Δx_t is the minimum possible in the finite float representation scenario, which can be reached by gradient descent for example, the demand for diff hash_names => diff parameters is true. By random search I wouldn't be able to have that stressed behaviour.

bouthilx · 2018-05-09T00:57:57Z

tests/unittests/core/test_space_builder.py

-        assert spacebuilder.userargs_tmpl['/yolo'] == '-yolo='
-        assert spacebuilder.userargs_tmpl['/arch2'] == '--arch2='
+        assert list(spacebuilder.userargs_tmpl.keys()) == ['_0', '/yolo', '_1', '/arch2',
+                                                           '$2', '_3', '_4', '_5']


Ah, now I understand what the $ was for.

- Also, explicitly fill `trial._id` with `Trial.NoID` during init. - Minor renaming from self._type to self.type for getters in `Trial.Value`. - Change representation of a `Trial` to better print out contained parameters. Why: I need to way to identify saved stuff in local disk from the execution of my script given a particular trial. This way I can have a reproducible deterministic string identifier that I intend to pass appropriately to a command line argument (if specified from ``orion hunt``) of my executable. This will disambiguate saved states, snapshots, or logs (i.e. from tensorboard), among different trials of the same executable. The actual implementation of all save/load is left to the executable itself, as well as configuring the absolute path where the stuff will be saved to. Most (responsible) ML programs I have personally encountered have a cli way to specify some naming for the experiment to run.

- Replace name property - Planning to match all flavours of experiment naming

Copy pasting from documentation of `SpaceBuilder.build_from`: This method is also responsible for parsing semantics which allow information from the `Experiment` or a `Trial` object to be passed to user's script. For example, a command line option like ``--name~trial.hash_name`` will be parsed to mean that a unique hash identifier of the trial that defines an execution shall be passed to the option ``--name``. Usage of these properties are not obligatory, but it helps integrating with solutions which help experiment reproducibility and resumability. See :attr:`EXPOSED_PROPERTIES` to check which properties are supported.

- `SpaceBuilder.userargs_tmpl` is now an OrderedDict. - Built command line args respect **the order** and **the way** in which they where parsed. - Less acrobatics regarding configuration file path. - Tackles the cases where ``~`` in a cli arg means home folder as in a shell path name. - Logs warning wherever implicit arguable decisions are taken. - TODO: First positional may not be a config file => it's a bad restriction. - TODO: Import parsing regex elements perhaps from a orion config file Copying from documentation of `SpaceBuilder._build_from_args`: Build templates from arguments found in the original cli. Rules for namespacing: 1. Prefix ``'/'`` is given to parameter dimensions. 2. Prefix ``'$'`` is given to substitutions from exposed properties. 3. Prefix ``'_'`` is given to arguments which do not interact with. Suffix is a unique integer. 4. ``'config'`` is given to user's script configuration file template User script configuration file argument is treated specially, trying to recognise a ``--config`` option or by checking the first positional argument, if not found elsewhere. Only one is allowed. .. note:: Currently positional arguments cannot define a parameter dimension. And I don't think they will; it wouldn't be good for your health down the line. Please be responsible and give a good descriptive name to your inputs. .. note:: Templates preserve given argument order.

Reason: MongoDB version changes, message in exception changes

I got greedy. Support it with a functional test. Remove also dead code from ``SpaceBuilder._build_from_args``. Add a test to show how it fails when it does.

Reason is that '/' means special thing in UNIX path names

codecov-io · 2018-05-09T23:52:57Z

Codecov Report

Merging #69 into master will increase coverage by 0.05%.
The diff coverage is 99.38%.

@@            Coverage Diff             @@
##           master      #69      +/-   ##
==========================================
+ Coverage   97.31%   97.37%   +0.05%     
==========================================
  Files          34       34              
  Lines        2982     3087     +105     
  Branches      247      258      +11     
==========================================
+ Hits         2902     3006     +104     
  Misses         49       49              
- Partials       31       32       +1

Impacted Files	Coverage Δ
tests/unittests/core/mongodb_test.py	`99.53% <ø> (-0.01%)`	⬇️
src/orion/core/worker/consumer.py	`78.12% <100%> (ø)`	⬆️
tests/unittests/core/test_space_builder.py	`100% <100%> (ø)`	⬆️
tests/functional/demo/test_demo.py	`98.19% <100%> (+0.26%)`	⬆️
src/orion/core/worker/trial.py	`100% <100%> (ø)`	⬆️
tests/unittests/core/test_trial.py	`100% <100%> (ø)`	⬆️
src/orion/core/io/space_builder.py	`97.68% <98.5%> (-0.18%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9002feb...54d1090. Read the comment docs.

bouthilx · 2018-05-10T00:07:58Z

src/orion/core/io/space_builder.py

-           your health down the line. Please be responsible and give a good
-           descriptive name to your inputs.
+        .. note:: Positional arguments cannot define a parameter
+           dimension, because no name, but an arbitrary (semantically) one, can


I understand what you mean but it seems to me that it could be explained in more simple words. Just dropping "but an arbitrary (semantically) one" would make it much simpler. I don't think the note needs to justify why we do not assign random strings as hyper-parameter names when none are explicitly given.

We could have assigned _5 as a name. Let's keep like this, please

Well that would be terrible. It would be very hard to understand what is _5 when the user try to analyze the results latter on.

The documentation needs to be clear and concise. I'm sorry but this is not.

Positional arguments cannot define a parameter dimension, because no **meaningful** name can be assigned ?

tsirif · 2018-05-10T00:16:23Z

We could have assigned `_5` as a name. Let's keep like this, please.

…

On Wed, May 9, 2018 at 8:07 PM Xavier Bouthillier ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In src/orion/core/io/space_builder.py <#69 (comment)>: > @@ -337,10 +337,9 @@ def _build_from_args(self, cmd_args): recognise a ``--config`` option or by checking the first positional argument, if not found elsewhere. Only one is allowed. - .. note:: Currently positional arguments cannot define a parameter - dimension. And I don't think they will; it wouldn't be good for - your health down the line. Please be responsible and give a good - descriptive name to your inputs. + .. note:: Positional arguments cannot define a parameter + dimension, because no name, but an arbitrary (semantically) one, can I understand what you mean but it seems to me that it could be explained in more simple words. Just dropping "but an arbitrary (semantically) one" would make it more simpler. I don't think the note needs to justify why we do not assign random strings as hyper-parameter names when none are explicitly given. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#69 (review)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AD5VAe35OJoeH99Oog-GW1vnnqbnvWO3ks5tw4TfgaJpZM4T0c9R> .

bouthilx approved these changes May 9, 2018

View reviewed changes

bouthilx mentioned this pull request May 9, 2018

Support seeds for trials #71

Open

tsirif added 10 commits May 9, 2018 19:43

Add properties hash_name and full_name to Trial

113cb49

- Replace name property - Planning to match all flavours of experiment naming

Add integration test for script that require unique dir per trial

66a18aa

Fix test: Remove expectation msg in native mongo connection error

6ce1e14

Reason: MongoDB version changes, message in exception changes

Add exp.name as exported property

6b77c8b

I got greedy. Support it with a functional test. Remove also dead code from ``SpaceBuilder._build_from_args``. Add a test to show how it fails when it does.

Add another test case to show that paths with ~ are ok

ed23ca3

Add a stress test using underdamped GD for trials' hash_name

ed294ee

Replace '/' with '.' in Trial.full_name

fcb01f2

Reason is that '/' means special thing in UNIX path names

tsirif force-pushed the feature/trial_name branch from 0677da7 to 4e86544 Compare May 9, 2018 23:47

bouthilx reviewed May 10, 2018

View reviewed changes

Fix flake8 and docstring

54d1090

tsirif force-pushed the feature/trial_name branch from 4e86544 to 54d1090 Compare May 10, 2018 00:59

bouthilx merged commit 573d997 into Epistimio:master May 10, 2018

tsirif deleted the feature/trial_name branch May 10, 2018 01:13

tsirif mentioned this pull request May 18, 2018

Trial.Param should be hashable #89

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

User can request managerial info, like trial string identifiers, from cli #69

User can request managerial info, like trial string identifiers, from cli #69

tsirif commented May 7, 2018

bouthilx left a comment •

edited

Loading

bouthilx May 9, 2018

bouthilx May 9, 2018

bouthilx May 9, 2018

bouthilx May 9, 2018

bouthilx May 9, 2018

tsirif May 9, 2018

bouthilx May 9, 2018

tsirif May 9, 2018

bouthilx May 9, 2018

tsirif May 10, 2018

bouthilx May 9, 2018

tsirif May 9, 2018

bouthilx May 10, 2018

bouthilx May 9, 2018

bouthilx May 9, 2018

bouthilx May 9, 2018

tsirif May 9, 2018

bouthilx May 9, 2018

codecov-io commented May 9, 2018 •

edited

Loading

bouthilx May 10, 2018 •

edited

Loading

bouthilx May 10, 2018

tsirif May 10, 2018

bouthilx May 10, 2018

tsirif commented May 10, 2018 via email

User can request managerial info, like trial string identifiers, from cli #69

User can request managerial info, like trial string identifiers, from cli #69

Conversation

tsirif commented May 7, 2018

bouthilx left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-io commented May 9, 2018 • edited Loading

Codecov Report

bouthilx May 10, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tsirif commented May 10, 2018 via email

bouthilx left a comment •

edited

Loading

codecov-io commented May 9, 2018 •

edited

Loading

bouthilx May 10, 2018 •

edited

Loading