
Feature: Algorithms, Dimensions, Space and Embeddings #36

Merged
merged 68 commits into master from feature/algorithm
Mar 19, 2018

Conversation

tsirif
Member

@tsirif tsirif commented Dec 19, 2017

No description provided.

- Missing transformers, but it should be functional without them for now.
  Transformers will later be compositor objects. Check the module
  documentation. In general, I think this refactoring will be able to support
  arbitrary transformations and distributions on dimensions.
- Fix shape to use scipy internal function.
  Also, silence a new 'wrong-import-order' error from pylint since it
  conflicts with the current 'google' style from the flake8-import-order plugin.
- Change how Categorical is repr-ed so that probabilities are printed with .2f precision.
@codecov-io

codecov-io commented Dec 22, 2017

Codecov Report

Merging #36 into master will increase coverage by 10.59%.
The diff coverage is 97.27%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master      #36       +/-   ##
===========================================
+ Coverage   86.95%   97.54%   +10.59%     
===========================================
  Files          16       34       +18     
  Lines        1012     2691     +1679     
  Branches      102      223      +121     
===========================================
+ Hits          880     2625     +1745     
+ Misses        132       42       -90     
- Partials        0       24       +24
Impacted Files Coverage Δ
src/metaopt/core/__init__.py 100% <ø> (ø) ⬆️
tests/unittests/core/mongodb_test.py 100% <ø> (ø)
src/metaopt/core/worker/trial.py 100% <100%> (ø) ⬆️
src/metaopt/core/io/database/__init__.py 100% <100%> (ø) ⬆️
src/metaopt/algo/space.py 100% <100%> (ø)
src/metaopt/algo/random.py 100% <100%> (ø)
src/metaopt/core/io/convert.py 100% <100%> (ø)
tests/unittests/core/test_trial.py 100% <100%> (ø)
tests/unittests/client/test_client.py 100% <100%> (ø)
src/metaopt/core/io/database/mongodb.py 100% <100%> (ø) ⬆️
... and 44 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f4db650...00496fc.

@bouthilx
Member

bouthilx commented Feb 4, 2018

🎉 😄

We are rushing for ICML, so I might not have enough time to review it, unfortunately. Worst-case scenario would be February 12th. I'll do my best; I can hardly wait to review and have this PR merged. 🙂

@bouthilx bouthilx self-requested a review February 12, 2018 18:02
Member

@bouthilx bouthilx left a comment

Review commits up to 55 / 55

a4414af - Space: Add _Discrete class and ban interval from Categorical

@@ -283,6 +283,9 @@ def configure(self, config):
if self._init_done:
raise RuntimeError("Configuration is done; cannot reset an Experiment.")

# Sanitize and replace some sections with objects
candidate_algorithms = Experiment._sanitize_and_instantiate_config(config)
Member

That is weird. Why would _sanitize_and_instantiate_config() return optimization algorithms? I understand the convenience, but it's weird. It smells like there is bad coupling.

I would at least split that line and fetch candidate_algorithms from config after instantiation.

Member Author

Ok this is a little bit nasty, but I will explain:

What we 'want':

  1. The most recent metaopt configuration has priority over the one existing in the database. In addition, having the same experiment name means that we run the same algorithms.
  2. Configurations should match except for allowed exceptions. Currently those are: non_forking_attrs = ('status', 'pool_size', 'max_trials')
  3. A user can specify an algorithm to run with its default parameters just by giving the algo's name, so they will not be forced to write, for instance,
    algorithms: random: {} instead of algorithms: random in moptconfig. Note that in the former case the value of algorithms is a dictionary, while in the latter case it is a string.
  4. We would like to save in the database the complete specification of an algorithm, with all of its parameters specified. That's a measure to ensure consistency; consider the case where the default values for an algorithm's parameters change.

What's the problem and solution:
Comparing the configurations requires them to have a similar form. For our convenience, we can proactively instantiate the algorithms from the new configuration (so that (3) can be a feature) and sanitize it by executing (code from Experiment._sanitize_and_instantiate_config):

        algorithms = PrimaryAlgo(space, config['algorithms'])
        # Sanitize 'algorithms' section in given configuration
        config['algorithms'] = algorithms.configuration

since BaseAlgorithm.configuration always returns the complete specification.
So now we can compare them safely, because we didn't touch the non-forking attributes (2) and we brought the configuration to a consistent form (4). We are able to proactively instantiate the algorithms because of (1).
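
For illustration only, the normalization in (3)/(4) boils down to something like the following toy helper (hypothetical code, not the actual PrimaryAlgo):

# Hypothetical sketch: normalize the 'algorithms' section so that the shorthand
# string form and the explicit dict form compare equal.
def normalize_algorithms(section):
    """Map 'random' -> {'random': {}} and leave dict specifications untouched."""
    if isinstance(section, str):
        return {section: {}}
    return dict(section)

assert normalize_algorithms('random') == normalize_algorithms({'random': {}})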

Member Author

Can you elaborate on:

I would at least split that line and fetch candidate_algorithms from config after instantiation.

I did not quite get it.

Member

As I said, I understand the convenience, but thanks for the explanation anyway. 🙂

Is Experiment._sanitize_and_instantiate_config() expected to affect the state of the experiment object at some point? I know it does not in the current implementation but maybe you have some plan in mind where this would happen? If it's not the case, then I would make the function edit a copy of config and return the modified version, leaving the passed one intact.

instantiated_config = Experiment._sanitize_and_instantiate_config(config)
candidate_algorithms = instantiated_config['algorithms']

Or, if we keep the current in-place implementation, my two-line split proposition would look like this.

Experiment._sanitize_and_instantiate_config(config)
candidate_algorithms = config['algorithms']

I'd prefer the former one.

Member

@tsirif Can we settle on this one too?

Member Author

I don't think it can be changed this way, or at least I am not able to see how. I am going to go through the code with a particular example in mind (I will remove some code to focus on what's necessary):

    def configure(self, config):
        # ``config['algorithms'] == 'random'`` ; this was set from user's configuration
        candidate_algorithms = Experiment._sanitize_and_instantiate_config(config)
        # after this call variable config will be: ``config['algorithms'] == {'random': {}}``

        if self.status is None:  # Suppose that experiment with same name already exists
            is_new = True
        else:  # So we will be here
            # Now we have to check against what already exists in database
            # If we were not to do `_sanitize_and_instantiate_config` as it is,
            # then ``config['algorithms'] == 'random'``.
            is_new = self._is_different_from(config)
            # ``is_new is True``, if ``config['algorithms'] == 'random'``.
            # But ``is_new is False``, if ``config['algorithms'] == {'random': {}}``
            # That is because in database, a complete (not partial) specification of the experiment
            # and its algorithms is being saved.
            if is_new:
                self._fork_config(config)

        for section, value in config.items():
            setattr(self, section, value)

        self._init_done = True
        # In any case, current config from user has the top precedence in defining
        # which algorithms are going to be run.
        self.algorithms = candidate_algorithms
        self.status = 'pending'
        # Get complete specification of experiment
        final_config = self.configuration

        # Push it to the database
        if is_new:
            self._db.write('experiments', final_config)
            self._id = final_config['_id']
        else:
            self._db.write('experiments', final_config, {'_id': self._id})
    def _sanitize_and_instantiate_config(config):
        space = SpaceBuilder().build_from(config['metadata']['user_args'])
        if not space:
            raise ValueError("Parameter space is empty. There is nothing to optimize.")
        # Instantiate algorithms
        algorithms = PrimaryAlgo(space, config['algorithms'])
        # Get back a complete specification of algorithms configuration!
        # Basically this step converts, ``config['algorithms'] == 'random'`` to
        # ``config['algorithms'] == {'random': {}}``.
        config['algorithms'] = algorithms.configuration
        return algorithms
    def _is_different_from(self, config):
        is_diff = False
        for section, value in config.items():
            item = getattr(self, section)
            # See below in the comparison that no special case is considered for
            # ``section == 'algorithms'``. Because of this, ``is_new is True`` would be returned
            # falsely if the previous steps in ``_sanitize_and_instantiate_config`` did not happen.
            # Special treatment of ``section == 'algorithms'`` would effectively do twice what
            # `BaseAlgorithm.__init__` does through `PrimaryAlgo`.
            # It breaks delegation of responsibility.
            if item != value:
                is_diff = True
                break

What is being proposed above does not resolve this case. The reason is that config['algorithms'] is being replaced by an actual PrimaryAlgo instance, which makes this key not immediately comparable with what (possibly) already exists in the database.

I am not very fond of this particular part of the current implementation, but I think it is nonetheless a sufficient measure.

Member

@bouthilx bouthilx Mar 18, 2018

OK! There was an important detail I missed. The method _sanitize_and_instantiate_config() will

  1. instantiate the space and algorithm(s) and will set the algorithm's config inside the experiment's config (sanitize),
  2. but it will return the instantiated algorithm, not its config (instantiate)

This is because

  1. We want a unique form of config to compare (ex: algo: random and algo: {random: {}} map to the same config {random: {}}) (sanitize)
  2. We want to instantiate the actual algorithm to set it as an attribute of the experiment (instantiate)

The problem is that we need the instantiation in order to get the sanitized config; that's why both operations are currently coupled in a single method.

I think we could get rid of the sanitize part by doing the tests for differences and forking on a temporary copy of the experiment. This way we would still avoid changes on the experiment itself during the tests. Something like this:

def configure(self, config):
    experiment = copy.deepcopy(self)
    experiment._instantiate_config(config)

    if self.status is None:
        is_new = True
    else:
        is_new = self._is_different_from(experiment.configuration)
        if is_new:
            experiment._fork_config(config)

    self._instantiate_config(experiment.configuration)

    self._init_done = True
    self.status = 'pending'

    # ...

The method _instantiate_config() would setup the experiment with instantiated objects of the config.

def _instantiate_config(self, config):
    space = SpaceBuilder().build_from(config['metadata']['user_args'])
    if not space:
        raise ValueError("Parameter space is empty. There is nothing to optimize.")

    # Instantiate algorithms
    self.algorithms = PrimaryAlgo(space, config['algorithms'])

    for section, value in config.items():
        setattr(self, section, value)

Member Author

Nice idea! Much cleaner!!

algorithms = PrimaryAlgo(space, config['algorithms'])
# Sanitize 'algorithms' section in given configuration
config['algorithms'] = algorithms.configuration
return algorithms
Member

See previous comment

@@ -90,8 +90,8 @@ def suggest(self, num=1):

@abstractmethod
def observe(self, points, results):
"""Observe evaluation `results` corresponding to list of `points` in
space.
"""Observe the `results` of the evaluation of the `points` in the
Member

I would add a note similar to the one for judge and make the difference between the two methods clear.


"""
return self.categories
raise RuntimeError("Categories are not ordered. Use ``self.categories`` instead")
Member

I find ordered confusing in this context. I would simply say that Categories have no interval.

Also, elaborate more on its usage in docs.
Also, use 'norm' distributions instead of 'alpha' in tests as they are more popular.
Use the trial for debugging the reason why it failed to execute the script.

- Add explanatory comment on ``point`` argument on `BaseAlgorithm.judge`.
- Rename some Consumer's attributes to facilitate understanding the code.
@bouthilx
Member

bouthilx commented Mar 7, 2018

Good! Let me recap this tomorrow morning. It seems we are close to a merge. 🎉

@bouthilx
Member

bouthilx commented Mar 8, 2018

@tsirif Sorry I spent too much time on the survey and I'm lacking time now for the PR. I'll recap tomorrow afternoon (Friday).

@bouthilx
Member

bouthilx commented Mar 8, 2018

@tsirif By the way, it would be nice if you could make the change for random() -> choices() before tomorrow so the PR might be ready for merge at that point.

Member

@bouthilx bouthilx left a comment

Good overall. I opened issues for things I believe we can discuss and fix outside this PR. There are three things I'd like us to settle on before the PR is merged.

  1. The _Discrete class
  2. The way Categorical samples points (I find it overcomplicated)
  3. Whether Experiment._sanitize_and_instantiate_config should be in-place or not.

return size


class _Discrete(Dimension):
Member

I believe the _Discrete class is useless. The sample method could reside solely inside the Integer class. Categorical's sample method is straightforward using numpy.random.choice. Also, the interval method is overridden anyway by Categorical's to raise RuntimeError.

Although I can easily see how Integer and Categorical are related in theory, combining them gives no advantage in our context. It complicates Categorical without giving anything in return.

Member Author

😓

Member Author

I am not able to somehow pass a separate "sandboxed" seed without affecting the global numpy RNG; numpy.random.choice does not offer such an option. Temporarily saving the random state like this:

orig_state = numpy.random.get_state()
numpy.random.seed(seed)
# Do stuff with `numpy.random` here, e.g.:
samples = numpy.random.choice(self.categories, size=n_samples, p=self._probs)
numpy.random.set_state(orig_state)
return samples

I am not sure whether affecting the global numpy RNG is safe, if this is not your intention.

Member

What about setting a self.rng = numpy.random.RandomState() object inside the Dimension? We could then use rng.choice().
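
For illustration, a minimal sketch of this suggestion (a toy Categorical reusing the categories/_probs attributes shown in the diff; names here are assumptions, not the PR's actual implementation):

import numpy

class Categorical(object):
    """Toy stand-in for the real Categorical dimension (illustrative only)."""

    def __init__(self, categories, seed=None):
        self.categories = tuple(categories)
        self._probs = tuple(numpy.tile(1. / len(categories), len(categories)))
        # Per-dimension RNG: sampling never touches numpy's global random state.
        self.rng = numpy.random.RandomState(seed)

    def sample(self, n_samples=1):
        return list(self.rng.choice(self.categories, size=n_samples, p=self._probs))

Seeding the RandomState once per dimension keeps sampling reproducible without mutating the global RNG.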

Member Author

I will use it only inside Categorical. The rest of the Dimension classes use scipy.stats, which does this internally.

Member Author

I patched Categorical to use numpy.random.RandomState, via the check_random_state helper function.
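
For reference, a minimal sketch of what such a helper typically does (modeled on scikit-learn's check_random_state; the PR's actual helper may differ):

import numpy

def check_random_state(seed):
    """Return a numpy.random.RandomState built from `seed`.

    Accepts None (fresh RNG), an integer seed, or an existing RandomState,
    so callers can thread a seed through without touching the global RNG.
    """
    if isinstance(seed, numpy.random.RandomState):
        return seed
    return numpy.random.RandomState(seed)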

However, I found a difficulty in removing the _Discrete subclass. Remember that the _Discrete class was introduced as an alternative that removes the discrete keyword argument from Dimension. So the programming flow is currently as follows, and it takes advantage of Python's mechanics regarding class inheritance:

Currently:
Suppose a call to Integer(...).sample(...). In general, calling a method on an Integer object will first search within Integer, and super calls from there will look for the corresponding methods first in Real, then _Discrete, and then Dimension.

class Integer(Real, _Discrete):
    def interval(self, alpha=1.0):
        low, high = super(Integer, self).interval(alpha)  # Call to Real.interval
        # make them integers
        return int_low, int_high

    def __contains__(self, point):
        # Check if they are integers...
        return super(Integer, self).__contains__(point)  # Call to Dimension.__contains__


    # calling Integer(...).sample(...) will enter first Real.sample;
    # due to the order of subclasses
class Real(Dimension):
    def interval(self, alpha=1.0):
        prior_low, prior_high = super(Real, self).interval(alpha)  # Call to Dimension.interval
        return (max(prior_low, self._low), min(prior_high, self._high))  # Add constraints to interval  

    def sample(self, n_samples=1, seed=None):
        # Checks if constraint are met!
        samples = []
        for _ in range(n_samples):
            for _ in range(4):
                sample = super(Real, self).sample(1, seed)  # Calls _Discrete.sample
                if sample[0] not in self:  # Calls Integer.__contains__
                    # ...
            # ...
        # ...

        return samples
class _Discrete(Dimension):
    def sample(self, n_samples=1, seed=None):
        samples = super(_Discrete, self).sample(n_samples, seed)  # Calls Dimension.sample
        # Making discrete by ourselves because scipy does not use **floor**
        return list(map(lambda x: numpy.floor(x).astype(int), samples))  # Converts to integers

So the sequence of calls goes like this:

  • [Integer.sample (redirects)]
  • Real.sample (fulfills extra constraints)
    • _Discrete.sample (converts to integers)
      • Dimension.sample (samples from distribution)
    • Integer.__contains__ (checks if they are integers)
      • Dimension.__contains__ (check if they are within bounds)
        • Integer.interval (makes it int)
          • Real.interval (extra constraints)
            • Dimension.interval (asks the distribution for its support)

Member Author

If we remove _Discrete and move all of its methods into class Integer(Real), the flow will become:

So the sequence of calls goes like this:

  • Integer.sample (converts to integers)
    • Real.sample (fulfills extra constraints)
      • Dimension.sample (samples from distribution)
      • Integer.__contains__ (checks if they are integers)
        • Dimension.__contains__ (check if they are within bounds)
          • Integer.interval (makes it int)
            • Real.interval (extra constraints)
              • Dimension.interval (asks the distribution for its support)

Poor Dimension.sample will return whatever the scipy distribution returns, and the check in Integer.__contains__ will fail if the underlying scipy distribution corresponds to a real one... However, we should allow (I believe) discretizing real distributions, so that, for instance, someone could have approximately normally distributed integer stochastic variables. What do you think about this?
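
To make the discretization idea concrete, here is a self-contained sketch (plain scipy.stats, not the PR's code) of flooring samples from a real distribution to get approximately normally distributed integers:

import numpy
from scipy.stats import distributions

# Draw from a real-valued prior, then floor to integers, as _Discrete.sample does.
rng = numpy.random.RandomState(42)
real_samples = distributions.norm.rvs(loc=10, scale=2, size=5, random_state=rng)
int_samples = list(map(lambda x: numpy.floor(x).astype(int), real_samples))
print(int_samples)  # integer samples clustered around loc=10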

Member

I agree with the idea of discretizing Real distributions. However, I don't see the problem here: can't we just move _Discrete.sample() inside Integer.sample() rather than completely removing this method and relying solely on Real.sample()?

Member Author

Sorry for the confusion. I meant if we removed class _Discrete(Dimension) and moved all of its methods to class Integer(Real) (including sample), then:

... the sequence of calls goes like this:

  • Integer.sample (converts to integers)
    • Real.sample (fulfills extra constraints)
      • Dimension.sample (samples from distribution) -> this will possibly return samples in real numbers
      • Integer.__contains__ (checks if they are integers) -> it will fail because it checks if they are integers
        • Dimension.__contains__ (check if they are within bounds)
          • Integer.interval (makes it int)
            • Real.interval (extra constraints)
              • Dimension.interval (asks the distribution for its support)

Member

Oh yes indeed, I was confused. 😅

My proposition, then, would be to replace this confusing inheritance path with a simple separation of sample() and _sample(), where sample() performs the verifications over _sample(), which samples blindly.

So this line:

sample = super(Real, self).sample(1, seed)

becomes

sample = self._sample(1, seed)

where

class Real(Dimension):
    # ...
    def _sample(self, n_samples, seed=None):
        return super(Real, self)._sample(n_samples, seed)  # Calls Dimension._sample
    # ...

class Integer(Real):
    # ...
    def _sample(self, n_samples, seed=None):
        samples = super(Integer, self)._sample(n_samples, seed)  # Calls Real._sample
        return list(map(lambda x: numpy.floor(x).astype(int), samples))
    # ...

I think it is easier this way to track the path of execution in order to understand, modify and maintain the code.

We would now have:

  • Integer.sample (converts to integers)
    • Real.sample (fulfills extra constraints)
      • Integer._sample
        • Real._sample
          • Dimension._sample (samples from distribution)
      • Integer.__contains__
        • Dimension.__contains__ (check if they are within bounds)
          • Integer.interval (makes it int)
            • Real.interval (extra constraints)
              • Dimension.interval (asks the distribution for its support)

self.categories = tuple(categories)
self._probs = tuple(numpy.tile(1. / len(categories), len(categories)))

prior = distributions.rv_discrete(values=(list(range(len(self.categories))),
Member

I stand by my point here. I don't see the advantage of going through rv_discrete. Using numpy.random.choice would be simpler, without a need for any _inverse and _check.
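
A rough side-by-side of the two approaches (hypothetical snippet, reusing the categories/_probs construction from the diff above): rv_discrete only samples integer indices, which then have to be mapped back to category objects (presumably what the _inverse mentioned above is for), whereas RandomState.choice samples the categories directly.

import numpy
from scipy.stats import distributions

categories = ('adam', 'sgd', 'rmsprop')
probs = numpy.tile(1. / len(categories), len(categories))
rng = numpy.random.RandomState(0)

# Via rv_discrete: sample indices, then map them back to the category objects.
prior = distributions.rv_discrete(values=(list(range(len(categories))), probs))
indices = prior.rvs(size=4, random_state=rng)
samples_from_rv_discrete = [categories[i] for i in indices]

# Via RandomState.choice: sample the category objects directly, no index mapping.
samples_from_choice = list(rng.choice(categories, size=4, p=probs))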

@@ -283,6 +283,9 @@ def configure(self, config):
if self._init_done:
raise RuntimeError("Configuration is done; cannot reset an Experiment.")

# Sanitize and replace some sections with objects
candidate_algorithms = Experiment._sanitize_and_instantiate_config(config)
Member

@tsirif Can we settle on this one too?

Member

@bouthilx bouthilx left a comment

Good! I think we can remove commit 54e9c54 given that we will keep _Discrete. I'd rather leave both sample() and interval() inside it, then.


points = space.sample(2, seed=seed)
print(points)
Member

Print?

(points1[1], points2[1], points3[1])]
test_points = [(points1[0], points2[0], points3[0]),
(points1[1], points2[1], points3[1])]
print(test_points)
Member

Print?

@tsirif
Member Author

tsirif commented Mar 19, 2018

@bouthilx All is ok!

def __contains__(self, point):
"""Check if constraints hold for this `point` of `Dimension`.

:param x: a parameter corresponding to this `Dimension`.
Member

Missed those points

@bouthilx bouthilx merged commit e6e62d6 into Epistimio:master Mar 19, 2018
@tsirif tsirif deleted the feature/algorithm branch March 20, 2018 16:37