Add indexation to databases and fix race condition for experiment configuration #55
Conversation
Force-pushed 9180cc5 to 152f68f
Codecov Report
@@            Coverage Diff             @@
##           master      #55      +/-  ##
==========================================
- Coverage   97.47%   97.25%   -0.22%
==========================================
  Files          34       34
  Lines        2772     2918    +146
  Branches      234      245     +11
==========================================
+ Hits         2702     2838    +136
- Misses         44       49      +5
- Partials       26       31      +5
Continue to review full report at Codecov.
@tsirif Problems fixed, ready for review. :)
This will resolve #41 also!
Super cool! An extra concern that I want to raise is #57. What do you think?
AbstractDB, DatabaseError, DuplicateKeyError)

def mongodb_exception_wrapper(method):
That's a pretty good idea! Shall we also enhance the method to handle and reraise the ConnectionFailure and OperationFailure errors handled within initiate_connection? Perhaps removing the logger too from AbstractDB and initiate_connection, while appending the logger's message to the error.
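The conversion idea discussed here can be sketched as a decorator. This is a minimal illustration, not the PR's actual code: the generic error names come from the conversation, but `BackendDuplicateKeyError` is a local stand-in for `pymongo.errors.DuplicateKeyError` so the sketch runs without pymongo installed.

```python
import functools

# Generic errors (names from the PR; definitions here are simplified stand-ins)
class DatabaseError(Exception):
    """Generic database error."""

class DuplicateKeyError(DatabaseError):
    """Generic duplicate-key error."""

# Local stand-in for a backend-specific exception such as
# pymongo.errors.DuplicateKeyError
class BackendDuplicateKeyError(Exception):
    pass

def exception_wrapper(method):
    """Convert backend-specific exceptions to the generic types above."""
    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        try:
            return method(*args, **kwargs)
        except BackendDuplicateKeyError as exc:
            # Re-raise as the generic error, keeping the original message
            raise DuplicateKeyError(str(exc)) from exc
    return wrapper

@exception_wrapper
def write(document):
    """Pretend DB write that always collides on a unique key."""
    raise BackendDuplicateKeyError("E11000 duplicate key error")
```

Wrapping at the method level keeps callers backend-agnostic: code outside the back end only ever sees the generic exceptions.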
return converted_keys

def _convert_sort_order(self, sort_order):
I am not quite sure what's up with the 'ascending', 'descending' business. Does it have to do with how the documents are going to be stored in a db collection?
Yes, exactly!
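The ascending/descending names map to the backend's index-direction constants (pymongo uses `ASCENDING == 1` and `DESCENDING == -1`). A hypothetical converter might look like this; the constants are mirrored locally so the sketch runs without pymongo:

```python
# pymongo's index-direction constants, mirrored locally
ASCENDING = 1
DESCENDING = -1

def convert_sort_order(order):
    """Map a generic sort-order name to the backend constant.

    Hypothetical helper mirroring what a `_convert_sort_order` method
    could do; the real signature in the PR may differ.
    """
    orders = {'ascending': ASCENDING, 'descending': DESCENDING}
    try:
        return orders[order]
    except KeyError:
        raise ValueError("Invalid sort order: %r" % (order,)) from None
```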
assert type(a_list) == list
assert len(a_list) >= 1
# Part of `TestReserveTrial.test_reserve_success2`
print(a_list)
There is a print here.
😱 Thanks! 😆
query = {'_id': exp_config[0][1]['_id']}

with pytest.raises(pymongo.errors.OperationFailure):
Moving the OperationFailure handling, and reraising with a wrapped exception from within the decorator, makes even more sense given this observation/test! What do you think?
The thing is, OperationFailure is a pretty broad exception and we don't want to convert exception causes that we don't know. If there is a specific OperationFailure we know and need to convert, then we should verify its message before converting. My main concern with this method of conversion is that pymongo's error messages might vary from one version to another. I'd say we try it like it is and extend the catches later on if necessary.
@bouthilx I understand your point. So let the reraise happen exactly where the exception is raised, where the reason for OperationFailure cannot mean multiple things.
Well, I tried something at 1a46674. What do you think of it?
def mongodb_exception_wrapper(method):
    """Convert pymongo exceptions to generic exception types defined in src.core.io.database.
`metaopt.core.io.database` instead
assert 'cannot reset' in str(exc_info.value)

def test_try_set_after_race_condition(self, exp_config, new_config):
So this test demonstrates that if two Experiments perceive that is_new is True, then only one of them will actually configure, and the other one will raise a DuplicateKeyError. The exception will get caught by metaopt.cli.create_experiment and creation will be retried. All second attempts at creation will succeed (assuming no forking conditions), because the second time is_new is False! Is this correct?
Yes! Quite accurate. If I may say, all experiments racing will configure, but only one will be able to write its configuration to the DB, while the others will hit a DuplicateKeyError. My intention was to test the retry inside the cli unit tests, but maybe we should also test it at a lower level on Experiment directly. What do you think?
And mark it xfail? Why not? It will help polish UI details later 😛
But it won't fail. I've added it in 4bde5c9
@bouthilx last reply was meant for #57...
I think that even if the first one will only write a new configuration with a unique (exp_name, user_name), all the rest will eventually enter the branch where is_new is False and they will still call _db.write(...), but they are going to update it to be exactly the same thing. As these two are the only options:

    if is_new:
        self._db.write('experiments', final_config)
        # XXX: Reminder for future DB implementations:
        #      MongoDB updates an inserted dict with _id, so should you :P
        self._id = final_config['_id']
    else:
        self._db.write('experiments', final_config, {'_id': self._id})

It could be verified by a mock at _db.write, testing against the order of the calls to that function and which arguments they are called with. Perhaps it could be an informative test. I am not sure...
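The mock-based verification suggested here could look roughly like the following. The `Experiment` class below is a minimal stand-in built around the two-branch snippet quoted in the comment, not the project's real class:

```python
from unittest import mock

class Experiment:
    """Minimal stand-in for the configuration logic quoted above."""
    def __init__(self, db):
        self._db = db
        self._id = None

    def configure(self, final_config, is_new):
        if is_new:
            self._db.write('experiments', final_config)
            self._id = final_config['_id']  # MongoDB fills in _id on insert
        else:
            self._db.write('experiments', final_config, {'_id': self._id})

db = mock.Mock()
exp = Experiment(db)
config = {'_id': 1, 'name': 'demo'}
exp.configure(config, is_new=True)
exp.configure(config, is_new=False)

# Verify the order and the arguments of each call to _db.write
assert db.write.call_args_list == [
    mock.call('experiments', config),
    mock.call('experiments', config, {'_id': 1}),
]
```

`call_args_list` preserves call order, so the test can check both which branch ran first and which arguments each call received.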
The else-clause will not update the document to exactly the same thing. Some changes do not trigger a fork, max_trials for instance. I would verify such modifications. Don't we have a unit test doing that?
Force-pushed d633061 to 54bced3
Why: Our main and only database back end is a NoSQL one, and only `_id` is indexed by default. That means every `read` operation (including reads necessary in updates with queries) needs to parse the entire collection unless the query specifies the `_id`. This is highly inefficient. Adding indexes on keys which are often used in queries will significantly improve the database operations. Additionally, we want to force some keys to be unique, namely the tuple (experiment name, user name). To do so we need a method to specify unique indexes.

How: Add an `ensure_index` method to `AbstractDB`.
- Keys can be specified by name and optionally by sort order too.
- Keys can be forced to be unique.
- Generic sort orders are added to `AbstractDB`.
- `MongoDB` back end uses `dbcollection.create_index()`.
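The `ensure_index` contract described in this commit can be illustrated with a toy in-memory model. Everything below is a sketch: the class, signature, and index-name scheme are assumptions for illustration, not the PR's actual code (which delegates to `dbcollection.create_index()` in the `MongoDB` back end):

```python
class DatabaseSketch:
    """Toy in-memory model of the ensure_index contract (not the real class)."""
    ASCENDING = 1
    DESCENDING = -1

    def __init__(self):
        self._indexes = {}

    def ensure_index(self, collection, keys, unique=False):
        """Register an index; calling it twice with the same keys is a no-op."""
        if isinstance(keys, str):
            keys = [(keys, self.ASCENDING)]  # single key name, default order
        # Name the index after its keys and orders, mongo-style: "name_1"
        name = '_'.join('%s_%d' % key for key in keys)
        self._indexes.setdefault(collection, {})[name] = unique

db = DatabaseSketch()
# Enforce uniqueness of the (experiment name, user name) tuple
db.ensure_index('experiments',
                [('name', DatabaseSketch.ASCENDING),
                 ('metadata.user', DatabaseSketch.ASCENDING)],
                unique=True)
db.ensure_index('experiments', 'status')  # plain index for frequent queries
```

Because registering the same index twice is a no-op, callers can invoke `ensure_index` unconditionally at startup without checking whether the index already exists.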
Why: `MongoDB` operations raise exceptions specific to `MongoDB`. Code outside the back end needs generic exceptions to handle database exceptions transparently from one back end to another.

How: Add a generic `DuplicateKeyError`. Add a decorator for `MongoDB` methods.
- If the wrapped method raises `pymongo.errors.DuplicateKeyError`, it is converted to the generic `DuplicateKeyError`.
- If the wrapped method raises `pymongo.errors.BulkWriteError` containing at least one "duplicate key error", it is converted to the generic `DuplicateKeyError`.

Note: `pymongo.errors.BulkWriteError` can contain different types of errors. For conversion simplicity, `DuplicateKeyError` will have precedence over the others.
Force-pushed 54bced3 to 5548f1f
Note: See commit 6e7ba18 for more information about index creation in databases.

Why: As explained in 6e7ba18, efficient database operations need indexes.
- Experiments are believed to be mostly queried by (name, user) or status, so those keys are indexed.
- Trials are mostly queried by experiment, status, results, start_time or end_time.

How: Make every `ensure_index()` call during Experiment initialization. The method is idempotent (it only applies a change if necessary), so we do not need to verify a priori whether the index exists. Writing the final config to an already existing experiment raises a `DuplicateKeyError`. To avoid this, `final_config["name"]` and `final_config["metadata"]["user"]` are now popped out before `db.write()`.
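The pop-before-write fix from this commit can be sketched as a small helper. `sanitize_for_update` is a hypothetical name for illustration; the real change pops the keys inline:

```python
def sanitize_for_update(final_config):
    """Drop the unique-index keys before updating an existing experiment.

    Hypothetical helper illustrating the fix described above: an update that
    still carries the unique (name, user) pair would trip the unique index,
    so those keys are popped out before db.write().
    """
    config = dict(final_config)  # shallow copy; do not mutate the caller's dict
    config.pop('name', None)
    if 'metadata' in config:
        config['metadata'] = {key: value
                              for key, value in config['metadata'].items()
                              if key != 'user'}
    return config
```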
Why: Unittest configuration contained two experiments with the same tuple (name, user). The addition of enforced uniqueness on this tuple makes many unit tests crash.

How: Change the name of the second experiment with duplicate keys and adapt the unit tests to test either one or the other. For unit tests requiring many instances, change the query to (name, ) instead of (name, user) so that the output still contains more than one experiment.
Why: The configuration of the experiment is not atomic, and race conditions are likely to happen when many workers are submitted at once to start optimization.

How: After all configuration sources are merged, the experiment configuration attempts to write itself to the DB. If the write fails because of a race condition, `cli` will rebuild the experiment from scratch. Doing this triggers a fresh read from the DB and loads the configuration saved by the _winning_ worker.
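The retry described in this commit can be sketched as follows. This is a simplified stand-in for the real `cli` code: `build` represents the whole configure-and-write step of one worker, and the exception class is redefined locally:

```python
class DuplicateKeyError(Exception):
    """Generic error raised when another worker wins the configuration race."""

def create_experiment(build):
    """Sketch of the retry described above (the real cli code differs).

    `build` stands for the whole configure-and-write step of one worker.
    """
    try:
        return build()
    except DuplicateKeyError:
        # Losing worker: rebuild from scratch. The fresh DB read now returns
        # the winning worker's configuration, so is_new is False and the
        # second attempt succeeds.
        return build()

# Minimal demonstration: the first attempt loses the race, the second succeeds
attempts = []
def build():
    attempts.append(1)
    if len(attempts) == 1:
        raise DuplicateKeyError("experiment already written")
    return "configured"
```

A single retry suffices here because, absent forking conditions, the second attempt takes the `is_new is False` branch and cannot collide on the unique index again.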
Force-pushed 5548f1f to 1a46674
pymongo.errors.BulkWriteError[DuplicateKeyError] -> DuplicateKeyError
pymongo.errors.BulkWriteError[DUPLICATE_KEY_MESSAGES] -> DuplicateKeyError
pymongo.errors.ConnectionFailure -> DatabaseError
pymongo.errors.OperationFailure(AUTH_FAILED_MESSAGES) -> DatabaseError
OCD here... above it is with square brackets, here it is with parentheses... sorry for that 🤓
@bouthilx this one
I would agree, but [] vs () has a meaning. BulkWriteError contains a list of errors, from which we convert the one matching DUPLICATE_KEY_MESSAGES. OperationFailure has only a message.
Why: `MongoDB.initiate_connection` has two database-specific exceptions that it can raise: `OperationFailure("auth failed")` and `ConnectionFailure`. We now have a `mongodb_exception_wrapper` to automatically convert such exceptions outside `MongoDB`'s methods.

How: Wrap the method `initiate_connection()` and catch `ConnectionFailure` or `OperationFailure` inside the wrapper.

Note: `OperationFailure` is a generic exception for `MongoDB`, so we need to match the error message to "auth failed"; otherwise we let the exception go up as is.
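The message-matching conversion from this commit can be sketched as below. The pymongo exception classes are replaced by local stand-ins so the sketch runs without pymongo, and `AUTH_FAILED_MESSAGES` is assumed to be a tuple of known message fragments (the real constant may differ):

```python
import functools

class DatabaseError(Exception):
    """Generic error exposed outside the MongoDB back end."""

# Local stand-ins for pymongo.errors.ConnectionFailure / OperationFailure
class ConnectionFailure(Exception):
    pass

class OperationFailure(Exception):
    pass

# Message fragments we know how to convert (assumed constant)
AUTH_FAILED_MESSAGES = ("auth failed",)

def mongodb_exception_wrapper(method):
    """Convert known connection/authentication failures to DatabaseError."""
    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        try:
            return method(*args, **kwargs)
        except ConnectionFailure as exc:
            raise DatabaseError(str(exc)) from exc
        except OperationFailure as exc:
            # OperationFailure is broad: convert only messages we recognize,
            # otherwise let the exception go up as is
            if any(msg in str(exc) for msg in AUTH_FAILED_MESSAGES):
                raise DatabaseError(str(exc)) from exc
            raise
    return wrapper

@mongodb_exception_wrapper
def initiate_connection(error=None):
    """Pretend connection attempt that fails with the given error, if any."""
    if error is not None:
        raise error
    return "connected"
```

An unrecognized `OperationFailure` (say, a failed index build) passes through unconverted, which is exactly the cautious behaviour argued for earlier in this thread.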
Force-pushed 1a46674 to c60a82e
Related to #10, #14 and #41 .
Needs #36 and #53 to be merged.
Why:
Our main and only database back end is a NoSQL one, and only `_id` is indexed by default. That means every `read` operation (including reads necessary in updates with queries) needs to parse the entire collection unless the query specifies the `_id`. This is highly inefficient. Adding indexes on keys which are often used in queries will significantly improve the database operations.
Additionally, we want to force some keys to be unique, namely the tuple (experiment name, user name). To do so we need a method to specify unique indexes.

How:
Add an `ensure_index` method to `AbstractDB`.
- Keys can be specified by name and optionally by sort order too.
- Keys can be forced to be unique.
- Generic sort orders are added to `AbstractDB`.
- `MongoDB` back end uses `dbcollection.create_index()`.

Add a generic `DuplicateKeyError` and a decorator for `MongoDB` methods.
- If the wrapped method raises `pymongo.errors.DuplicateKeyError`, it is converted to the generic `DuplicateKeyError`.
- If the wrapped method raises `pymongo.errors.BulkWriteError` containing at least one "duplicate key error", it is converted to the generic `DuplicateKeyError`.

Note:
`pymongo.errors.BulkWriteError` can contain different types of errors. For conversion simplicity, `DuplicateKeyError` will have precedence over the others.

Add index configuration during experiment initialization.
Add to `cli` handling of configuration failures due to race condition.

Note:
Unittest configuration contained two experiments with the same tuple (name, user). The addition of enforced uniqueness on this tuple makes many unit tests crash.