Add indexation to databases and fix race condition for experiment configuration #55

bouthilx · 2018-03-18T18:47:42Z

Related to #10, #14 and #41 .
Needs #36 and #53 to be merged.

Why:

Our main and only database back end is a NoSQL one and only _id is indexed by default. That means every read operation (including reads necessary in updates with queries) needs to parse the entire collection unless the query specifies the _id. This is highly inefficient. Adding indexes on keys which are often used in queries will significantly improve the database operations.

Additionally, we want to force some keys to be unique, namely the tuple (experiment name, user name). To do so we need a method to specify unique indexes.

Experiments are believed to be mostly queried by (name, user) or status, so those keys are indexed.
Trials are mostly queried by experiment, status, results, start_time or end_time.

How:

Add `ensure_index` method to `AbstractDB`.

Keys can be specified by name and optionally by sort order too.
Keys can be forced to be unique.
Generic sort orders are added to AbstractDB.
MongoDB backend uses dbcollection.create_index()

Add generic `DuplicateKeyError` and a decorator for `MongoDB` methods.

If the wrapped method raises pymongo.errors.DuplicateKeyError, it is converted to generic DuplicateKeyError.
If the wrapped method raises pymongo.errors.BulkWriteError containing at least one "duplicate key error", it is converted to generic DuplicateKeyError.

Note: pymongo.errors.BulkWriteError can contain different type of errors. For conversion simplicity DuplicateKeyError` will have precedence over the others.

Add index configuration during experiment initialization.

Add to `cli` handling of configuration failures due to race condition.

Note:

Unittest configuration contained two experiments with the same tuple
(name, user). The addition of enforced uniqueness on this tuple makes
many unittest crash.

codecov-io · 2018-03-20T03:19:03Z

Codecov Report

Merging #55 into master will decrease coverage by 0.21%.
The diff coverage is 96.13%.

@@            Coverage Diff             @@
##           master      #55      +/-   ##
==========================================
- Coverage   97.47%   97.25%   -0.22%     
==========================================
  Files          34       34              
  Lines        2772     2918     +146     
  Branches      234      245      +11     
==========================================
+ Hits         2702     2838     +136     
- Misses         44       49       +5     
- Partials       26       31       +5

Impacted Files	Coverage Δ
src/metaopt/core/io/database/__init__.py	`100% <100%> (ø)`	⬆️
src/metaopt/core/worker/experiment.py	`94.59% <100%> (-0.84%)`	⬇️
src/metaopt/core/cli.py	`88.37% <75%> (-3.52%)`	⬇️
src/metaopt/core/io/database/mongodb.py	`95% <89.13%> (-5%)`	⬇️
tests/unittests/core/mongodb_test.py	`99.53% <98.64%> (-0.47%)`	⬇️
tests/unittests/core/test_experiment.py	`98.88% <98.87%> (+0.1%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e2e1098...c60a82e. Read the comment docs.

bouthilx · 2018-03-20T03:22:47Z

@tsirif Problems fixed, ready for review. :)

tsirif · 2018-03-20T17:31:57Z

This will resolve #41 also!

tsirif

Super cool! An extra concern that I want to raise is #57. What do you think?

tsirif · 2018-03-20T18:15:14Z

src/metaopt/core/io/database/mongodb.py

+    AbstractDB, DatabaseError, DuplicateKeyError)
+
+
+def mongodb_exception_wrapper(method):


That's a pretty good idea! Shall we also enhance the method to handle & reraise ConnectionFailure and OperationFailure errors handled within initiate_connection? Perhaps removing the logger too from AbstractDB and initiate_connection, while appending logger's message to the error.

tsirif · 2018-03-20T18:16:42Z

src/metaopt/core/io/database/mongodb.py

+
+        return converted_keys
+
+    def _convert_sort_order(self, sort_order):


I am not quite sure what's up with the 'ascending', 'descending' business. Does it have to do with how are the documents gonna be stored in a dbcollection?

Yes, exactly!

tsirif · 2018-03-20T18:42:59Z

tests/unittests/core/test_experiment.py

+        assert type(a_list) == list
+        assert len(a_list) >= 1
+        # Part of `TestReserveTrial.test_reserve_success2`
+        print(a_list)


There is a print here

😱 Thanks! 😆

tsirif · 2018-03-20T18:52:17Z

tests/unittests/core/mongodb_test.py

+
+        query = {'_id': exp_config[0][1]['_id']}
+
+        with pytest.raises(pymongo.errors.OperationFailure):


Moving OperationFailure handling and reraising with a wrapped exception from within the decorator, makes even more sense due to this observation/test! What do you think?

The thing is OperationFailure is a pretty broad exception and we don't want to convert exception causes that we don't know. If there is a specific OperationFailure we know and need to convert, then we should verify its message before converting. My main concern with this method for conversion is that pymongo's error messages might vary from versions to others. I'd say we try like it is and extent the catches if necessary later on.

@bouthilx I understand your point. So let the reraise happen exactly where the exception is raised; where the reason for OperationFailure cannot mean multiple stuff.

Well, I tried something at 1a46674. What do you think of it?

tsirif · 2018-03-20T18:53:03Z

src/metaopt/core/io/database/mongodb.py

+
+
+def mongodb_exception_wrapper(method):
+    """Convert pymongo exceptions to generic exception types defined in src.core.io.database.


metaopt.core.io.database instead

tsirif · 2018-03-20T19:07:16Z

tests/unittests/core/test_experiment.py

        assert 'cannot reset' in str(exc_info.value)

+    def test_try_set_after_race_condition(self, exp_config, new_config):


So this test demonstrates that if two Experiment perceive that is_new is True, then only one of them will actually configure and that the other one will raise a DuplicateError. The exception will get caught by metaopt.cli.create_experiment and creation will be retries. All second attempts for creation will succeed (assuming no forking conditions) because the second time is_new is False! Is this correct?

Yes! Quite accurate. If I may say, all experiments racing will configure, but only one will be able to write its configuration to DB, while the others will hit a DuplicateKeyError. My intention was to test the retry inside the cli unittests, but maybe we should also test it at a lower level on Experiment directly, what do you think?

and mark it xfail? why not? it will help polishing UI details later 😛

But it won't fail. I've added it in 4bde5c9

@bouthilx last reply was meant for #57...

I think that even if the first one will only write a new configuration with unique (exp_name, user_name), all the rest will eventually enter the branch where is_new is False and they will still call _db.write(...) but they are going to update it to be exactly the same thing.

As these two are the only options:

if is_new: self._db.write('experiments', final_config) # XXX: Reminder for future DB implementations: # MongoDB, updates an inserted dict with _id, so should you :P self._id = final_config['_id'] else: self._db.write('experiments', final_config, {'_id': self._id})

It could verified by a mock at _db.write and testing against the order of the calls to that function and which arguments they call it with. Perhaps, it could be an informative test. I am not sure..

The else-clause will not update the document to exactly the same thing. Some changes do not trigger a fork, max_trials for instance. I would verify such modifications. Don't we have a unit test doing that?

bouthilx · 2018-03-21T02:40:18Z

@tsirif Done. Please don't merge before I squash 127a0e6 and 63e9522. I let them there to ease the flow of feedback and discussion until we consider it ready for merge.

Why: Our main and only database back end is a NoSQL one and only `_id` is indexed by default. That means every `read` operation (including reads necessary in updates with queries) needs to parse the entire collection unless the query specifies the _id. This is highly inefficient. Adding indexes on keys which are often used in queries will significantly improve the database operations. Additionally, we want to force some keys to be unique, namely the tuple (experiment name, user name). To do so we need a method to specify unique indexes. How: Add `ensure_index` method to `AbstractDB`. - Keys can be specified by name and optionally by sort order too. - Keys can be forced to be unique. - Generic sort orders are added to `AbstractDB`. - `MongoDB` backend uses `dbcollection.create_index()`

Why: `MongoDB` operations raise exceptions specific to `MongoDB`. Code outside the back end needs generic exceptions to handle database exceptions transparently from one back end to another. How: Add generic `DuplicateKeyError`. Add a decorator for `MongoDB` methods. - If the wrapped method raises `pymongo.errors.DuplicateKeyError`, it is converted to generic `DuplicateKeyError`. - If the wrapped method raises `pymongo.errors.BulkWriteError` containing at least one "duplicate key error", it is converted to generic `DuplicateKeyError`. Note: `pymongo.errors.BulkWriteError` can contain different type of errors. For conversion simplicity `DuplicateKeyError` will have precedence over the others.

Note: See commit 6e7ba18 for more information about index creation in databases. Why: As explained in 6e7ba18, efficient database operations need indexes. - Experiments are believed to be mostly queried by (name, user) or status, so those keys are indexed. - Trials are mostly queried by experiment, status, results, start_time or end_time. How: Make every `ensure_index()` call during Experiment initialization. The method is idempotent (only applies change if necessary) so we do not need to verify if the index exists or not a priori. Writing the final config to an already existing experiment raises a DuplicatKeyError. To avoid this `final_config["name"]` and `final_config["metadata"]["user"]` are now popped out before `db.write()`.

Why: Unittest configuration contained two experiments with the same tuple (name, user). The addition of enforced uniqueness on this tuple makes many unittest crash. How: Change the name of the second experiment with duplicate keys and adapt unittest to either test one or the other. For unit tests requiring many instances, change the query to (name, ) instead of (name, user) so that the output is still more than one experiment.

Why: The configuration of the experiment is not atomic and race conditions are likely to happen when many workers are submitted at once to start optimization. How: After all configuration sources are merged, experiment configuration attempts to write it to DB. If write fails because of a race condition, `cli` will rebuild the experiment from scratch. Doing this will trigger a fresh read from DB and load the configuration saved by the _winning_ worker.

tsirif · 2018-03-21T03:38:36Z

src/metaopt/core/io/database/mongodb.py

-    pymongo.errors.BulkWriteError[DuplicateKeyError] -> DuplicateKeyError
+    pymongo.errors.BulkWriteError[DUPLICATE_KEY_MESSAGES] -> DuplicateKeyError
+    pymongo.errors.ConnectionFailure -> DatabaseError
+    pymongo.errors.OperationFailure(AUTH_FAILED_MESSAGES) -> DatabaseError


OCD here... above it is with square brackets, here it is with paretheses... sorry for that 🤓

@bouthilx this one

I would agree, but [] vs () has a meaning. BulkWriteError contains a list of errors, from which we convert the one with DUPLICATE_KEY_MESSAGES. OperationFailure has only a message.

Why: `MongoDB.initiate_connection` have two database specific exceptions that it can raise; `OperationFailure("auth failed")` and `ConnectionFailure`. We now have a `mongodb_exception_wrapper` to automatically convert such exceptions outside `MongoDB`'s methods. How: Wrap the method `initiate_connection()` and catch `ConnectionFailure` or `OperationFailure` inside the wrapper. Note: `OperationFailure` is a generic exception for `MongoDB` so we need to match the error message to "auth failed", otherwise we let the exception go up as is.

bouthilx requested a review from tsirif March 18, 2018 18:47

bouthilx changed the title ~~Feature/db/index~~ Add indexation to databases and fix race condition for experiment configuration Mar 18, 2018

bouthilx force-pushed the feature/db/index branch 3 times, most recently from 9180cc5 to 152f68f Compare March 20, 2018 02:58

tsirif mentioned this pull request Mar 20, 2018

What should we do with multiple expconfigs found in db by Experiment.__init__? #57

Closed

tsirif requested changes Mar 20, 2018

View reviewed changes

bouthilx force-pushed the feature/db/index branch from d633061 to 54bced3 Compare March 21, 2018 02:51

bouthilx added 2 commits March 20, 2018 23:00

bouthilx force-pushed the feature/db/index branch from 54bced3 to 5548f1f Compare March 21, 2018 03:03

bouthilx added 3 commits March 20, 2018 23:10

bouthilx force-pushed the feature/db/index branch from 5548f1f to 1a46674 Compare March 21, 2018 03:11

tsirif reviewed Mar 21, 2018

View reviewed changes

bouthilx force-pushed the feature/db/index branch from 1a46674 to c60a82e Compare March 21, 2018 03:43

tsirif approved these changes Mar 21, 2018

View reviewed changes

tsirif merged commit dff43f1 into Epistimio:master Mar 21, 2018

mgermain mentioned this pull request Aug 30, 2021

Failed to sample in interval ... #647

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add indexation to databases and fix race condition for experiment configuration #55

Add indexation to databases and fix race condition for experiment configuration #55

bouthilx commented Mar 18, 2018 •

edited

Loading

codecov-io commented Mar 20, 2018 •

edited

Loading

bouthilx commented Mar 20, 2018

tsirif commented Mar 20, 2018

tsirif left a comment

tsirif Mar 20, 2018

tsirif Mar 20, 2018

bouthilx Mar 21, 2018

tsirif Mar 20, 2018

bouthilx Mar 21, 2018

tsirif Mar 20, 2018

bouthilx Mar 21, 2018

tsirif Mar 21, 2018

bouthilx Mar 21, 2018

tsirif Mar 20, 2018

tsirif Mar 20, 2018

bouthilx Mar 21, 2018 •

edited

Loading

tsirif Mar 21, 2018

bouthilx Mar 21, 2018

tsirif Mar 21, 2018

bouthilx Mar 21, 2018

bouthilx commented Mar 21, 2018

tsirif Mar 21, 2018

tsirif Mar 21, 2018

bouthilx Mar 21, 2018

		AbstractDB, DatabaseError, DuplicateKeyError)


		def mongodb_exception_wrapper(method):


		return converted_keys

		def _convert_sort_order(self, sort_order):


		query = {'_id': exp_config[0][1]['_id']}

		with pytest.raises(pymongo.errors.OperationFailure):



		def mongodb_exception_wrapper(method):
		"""Convert pymongo exceptions to generic exception types defined in src.core.io.database.

		assert 'cannot reset' in str(exc_info.value)

		def test_try_set_after_race_condition(self, exp_config, new_config):

Add indexation to databases and fix race condition for experiment configuration #55

Add indexation to databases and fix race condition for experiment configuration #55

Conversation

bouthilx commented Mar 18, 2018 • edited Loading

Why:

How:

Add ensure_index method to AbstractDB.

Add generic DuplicateKeyError and a decorator for MongoDB methods.

Add index configuration during experiment initialization.

Add to cli handling of configuration failures due to race condition.

Note:

codecov-io commented Mar 20, 2018 • edited Loading

Codecov Report

bouthilx commented Mar 20, 2018

tsirif commented Mar 20, 2018

tsirif left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bouthilx Mar 21, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bouthilx commented Mar 21, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bouthilx commented Mar 18, 2018 •

edited

Loading

Add `ensure_index` method to `AbstractDB`.

Add generic `DuplicateKeyError` and a decorator for `MongoDB` methods.

Add to `cli` handling of configuration failures due to race condition.

codecov-io commented Mar 20, 2018 •

edited

Loading

bouthilx Mar 21, 2018 •

edited

Loading