Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add indexation to databases and fix race condition for experiment configuration #55

Merged
merged 6 commits into from
Mar 21, 2018

Conversation

bouthilx
Copy link
Member

@bouthilx bouthilx commented Mar 18, 2018

Related to #10, #14 and #41 .
Needs #36 and #53 to be merged.

Why:

Our main and only database back end is a NoSQL one and only _id is indexed by default. That means every read operation (including reads necessary in updates with queries) needs to parse the entire collection unless the query specifies the _id. This is highly inefficient. Adding indexes on keys which are often used in queries will significantly improve the database operations.

Additionally, we want to force some keys to be unique, namely the tuple (experiment name, user name). To do so we need a method to specify unique indexes.

  • Experiments are believed to be mostly queried by (name, user) or status, so those keys are indexed.
  • Trials are mostly queried by experiment, status, results, start_time or end_time.

How:

Add ensure_index method to AbstractDB.

  • Keys can be specified by name and optionally by sort order too.
  • Keys can be forced to be unique.
  • Generic sort orders are added to AbstractDB.
  • MongoDB backend uses dbcollection.create_index()

Add generic DuplicateKeyError and a decorator for MongoDB methods.

  • If the wrapped method raises pymongo.errors.DuplicateKeyError, it is converted to generic DuplicateKeyError.
  • If the wrapped method raises pymongo.errors.BulkWriteError containing at least one "duplicate key error", it is converted to generic DuplicateKeyError.

Note: pymongo.errors.BulkWriteError can contain different type of errors. For conversion simplicity DuplicateKeyError` will have precedence over the others.

Add index configuration during experiment initialization.

Add to cli handling of configuration failures due to race condition.


Note:

Unittest configuration contained two experiments with the same tuple
(name, user). The addition of enforced uniqueness on this tuple makes
many unittest crash.

@bouthilx bouthilx requested a review from tsirif March 18, 2018 18:47
@bouthilx bouthilx changed the title Feature/db/index Add indexation to databases and fix race condition for experiment configuration Mar 18, 2018
@bouthilx bouthilx force-pushed the feature/db/index branch 3 times, most recently from 9180cc5 to 152f68f Compare March 20, 2018 02:58
@codecov-io
Copy link

codecov-io commented Mar 20, 2018

Codecov Report

Merging #55 into master will decrease coverage by 0.21%.
The diff coverage is 96.13%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #55      +/-   ##
==========================================
- Coverage   97.47%   97.25%   -0.22%     
==========================================
  Files          34       34              
  Lines        2772     2918     +146     
  Branches      234      245      +11     
==========================================
+ Hits         2702     2838     +136     
- Misses         44       49       +5     
- Partials       26       31       +5
Impacted Files Coverage Δ
src/metaopt/core/io/database/__init__.py 100% <100%> (ø) ⬆️
src/metaopt/core/worker/experiment.py 94.59% <100%> (-0.84%) ⬇️
src/metaopt/core/cli.py 88.37% <75%> (-3.52%) ⬇️
src/metaopt/core/io/database/mongodb.py 95% <89.13%> (-5%) ⬇️
tests/unittests/core/mongodb_test.py 99.53% <98.64%> (-0.47%) ⬇️
tests/unittests/core/test_experiment.py 98.88% <98.87%> (+0.1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e2e1098...c60a82e. Read the comment docs.

@bouthilx
Copy link
Member Author

@tsirif Problems fixed, ready for review. :)

@tsirif
Copy link
Member

tsirif commented Mar 20, 2018

This will resolve #41 also!

Copy link
Member

@tsirif tsirif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super cool! An extra concern that I want to raise is #57. What do you think?

AbstractDB, DatabaseError, DuplicateKeyError)


def mongodb_exception_wrapper(method):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a pretty good idea! Shall we also enhance the method to handle & reraise ConnectionFailure and OperationFailure errors handled within initiate_connection? Perhaps removing the logger too from AbstractDB and initiate_connection, while appending logger's message to the error.


return converted_keys

def _convert_sort_order(self, sort_order):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not quite sure what's up with the 'ascending', 'descending' business. Does it have to do with how are the documents gonna be stored in a dbcollection?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly!

assert type(a_list) == list
assert len(a_list) >= 1
# Part of `TestReserveTrial.test_reserve_success2`
print(a_list)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a print here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😱 Thanks! 😆


query = {'_id': exp_config[0][1]['_id']}

with pytest.raises(pymongo.errors.OperationFailure):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving OperationFailure handling and reraising with a wrapped exception from within the decorator, makes even more sense due to this observation/test! What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is OperationFailure is a pretty broad exception and we don't want to convert exception causes that we don't know. If there is a specific OperationFailure we know and need to convert, then we should verify its message before converting. My main concern with this method for conversion is that pymongo's error messages might vary from versions to others. I'd say we try like it is and extent the catches if necessary later on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bouthilx I understand your point. So let the reraise happen exactly where the exception is raised; where the reason for OperationFailure cannot mean multiple stuff.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I tried something at 1a46674. What do you think of it?



def mongodb_exception_wrapper(method):
"""Convert pymongo exceptions to generic exception types defined in src.core.io.database.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metaopt.core.io.database instead

assert 'cannot reset' in str(exc_info.value)

def test_try_set_after_race_condition(self, exp_config, new_config):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this test demonstrates that if two Experiment perceive that is_new is True, then only one of them will actually configure and that the other one will raise a DuplicateError. The exception will get caught by metaopt.cli.create_experiment and creation will be retries. All second attempts for creation will succeed (assuming no forking conditions) because the second time is_new is False! Is this correct?

Copy link
Member Author

@bouthilx bouthilx Mar 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Quite accurate. If I may say, all experiments racing will configure, but only one will be able to write its configuration to DB, while the others will hit a DuplicateKeyError. My intention was to test the retry inside the cli unittests, but maybe we should also test it at a lower level on Experiment directly, what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and mark it xfail? why not? it will help polishing UI details later 😛

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it won't fail. I've added it in 4bde5c9

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bouthilx last reply was meant for #57...

I think that even if the first one will only write a new configuration with unique (exp_name, user_name), all the rest will eventually enter the branch where is_new is False and they will still call _db.write(...) but they are going to update it to be exactly the same thing.

As these two are the only options:

        if is_new:
            self._db.write('experiments', final_config)
            # XXX: Reminder for future DB implementations:
            # MongoDB, updates an inserted dict with _id, so should you :P
            self._id = final_config['_id']
        else:
            self._db.write('experiments', final_config, {'_id': self._id})

It could verified by a mock at _db.write and testing against the order of the calls to that function and which arguments they call it with. Perhaps, it could be an informative test. I am not sure..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The else-clause will not update the document to exactly the same thing. Some changes do not trigger a fork, max_trials for instance. I would verify such modifications. Don't we have a unit test doing that?

@bouthilx
Copy link
Member Author

@tsirif Done. Please don't merge before I squash 127a0e6 and 63e9522. I let them there to ease the flow of feedback and discussion until we consider it ready for merge.

Why:

Our main and only database back end is a NoSQL one and only `_id` is
indexed by default. That means every `read` operation (including reads
necessary in updates with queries) needs to parse the entire collection
unless the query specifies the _id. This is highly inefficient. Adding
indexes on keys which are often used in queries will significantly
improve the database operations.

Additionally, we want to force some keys to be unique, namely the
tuple (experiment name, user name). To do so we need a method to specify
unique indexes.

How:

Add `ensure_index` method to `AbstractDB`.

- Keys can be specified by name and optionally by sort order too.
- Keys can be forced to be unique.
- Generic sort orders are added to `AbstractDB`.
- `MongoDB` backend uses `dbcollection.create_index()`
Why:

`MongoDB` operations raise exceptions specific to `MongoDB`. Code
outside the back end needs generic exceptions to handle database
exceptions transparently from one back end to another.

How:

Add generic `DuplicateKeyError`.

Add a decorator for `MongoDB` methods.
- If the wrapped method raises `pymongo.errors.DuplicateKeyError`, it is
  converted to generic `DuplicateKeyError`.
- If the wrapped method raises `pymongo.errors.BulkWriteError`
  containing at least one "duplicate key error", it is converted to
  generic `DuplicateKeyError`.

Note: `pymongo.errors.BulkWriteError` can contain different type of
      errors. For conversion simplicity `DuplicateKeyError` will have
      precedence over the others.
Note: See commit 6e7ba18 for more information about index creation in
databases.

Why:

As explained in 6e7ba18, efficient database operations need indexes.
- Experiments are believed to be mostly queried by (name, user) or
  status, so those keys are indexed.
- Trials are mostly queried by experiment, status, results, start_time
  or end_time.

How:

Make every `ensure_index()` call during Experiment initialization. The
method is idempotent (only applies change if necessary) so we do not
need to verify if the index exists or not a priori.

Writing the final config to an already existing experiment raises a
DuplicatKeyError. To avoid this `final_config["name"]` and
`final_config["metadata"]["user"]` are now popped out before
`db.write()`.
Why:

Unittest configuration contained two experiments with the same tuple
(name, user). The addition of enforced uniqueness on this tuple makes
many unittest crash.

How:

Change the name of the second experiment with duplicate keys and adapt
unittest to either test one or the other. For unit tests requiring many
instances, change the query to (name, ) instead of (name, user) so that
the output is still more than one experiment.
Why:

The configuration of the experiment is not atomic and race conditions
are likely to happen when many workers are submitted at once to start
optimization.

How:

After all configuration sources are merged, experiment configuration
attempts to write it to DB. If write fails because of a race condition,
`cli` will rebuild the experiment from scratch. Doing this will trigger
a fresh read from DB and load the configuration saved by the _winning_
worker.
pymongo.errors.BulkWriteError[DuplicateKeyError] -> DuplicateKeyError
pymongo.errors.BulkWriteError[DUPLICATE_KEY_MESSAGES] -> DuplicateKeyError
pymongo.errors.ConnectionFailure -> DatabaseError
pymongo.errors.OperationFailure(AUTH_FAILED_MESSAGES) -> DatabaseError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OCD here... above it is with square brackets, here it is with paretheses... sorry for that 🤓

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bouthilx this one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree, but [] vs () has a meaning. BulkWriteError contains a list of errors, from which we convert the one with DUPLICATE_KEY_MESSAGES. OperationFailure has only a message.

Why:

`MongoDB.initiate_connection` have two database specific exceptions that
it can raise; `OperationFailure("auth failed")` and `ConnectionFailure`.
We now have a `mongodb_exception_wrapper` to automatically convert such
exceptions outside `MongoDB`'s methods.

How:

Wrap the method `initiate_connection()` and catch `ConnectionFailure`
or `OperationFailure` inside the wrapper.

Note:
`OperationFailure` is a generic exception for `MongoDB` so we need to
match the error message to "auth failed", otherwise we let the exception
go up as is.
@tsirif tsirif merged commit dff43f1 into Epistimio:master Mar 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants