
Change MongoDB Connection's interface to implement a .run() #975

Closed
sohkai opened this issue Dec 20, 2016 · 8 comments
sohkai (Contributor) commented Dec 20, 2016

From #890 (review).

pymongo's interface is very different from rethinkdb's: queries are run directly through the MongoDB connection object rather than through a .run() method. We should discuss whether Connections should still be expected to have a .run() method.

sohkai added this to the MongoDB integration milestone Dec 20, 2016
r-marques (Contributor) commented
I like what we do with rethinkdb because we have a single place to handle exceptions and deal with connection errors.

I have been looking at the pymongo documentation and I think it would be possible to do the same thing we do with rethinkdb.

pymongo's operations allow us to construct queries without executing them; we can then use bulk_write to execute them against the database.
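A rough sketch of the "construct now, execute later" pattern described above. Stand-in `InsertOne`/`UpdateOne` classes and a fake collection are used here so the example runs without pymongo or a live MongoDB server; in real code these would be `pymongo.InsertOne` / `pymongo.UpdateOne` passed to `Collection.bulk_write()`.

```python
class InsertOne:
    """Stand-in for pymongo.InsertOne: records a document to insert."""
    def __init__(self, document):
        self.document = document

class UpdateOne:
    """Stand-in for pymongo.UpdateOne: records a filter and an update."""
    def __init__(self, filter, update):
        self.filter = filter
        self.update = update

class FakeCollection:
    """Executes queued operations in one place, mirroring bulk_write()."""
    def __init__(self):
        self.docs = []

    def bulk_write(self, requests):
        # Single choke point: connection errors for *all* queued
        # operations could be handled here, much like rethinkdb's .run().
        for op in requests:
            if isinstance(op, InsertOne):
                self.docs.append(dict(op.document))
            elif isinstance(op, UpdateOne):
                for doc in self.docs:
                    if all(doc.get(k) == v for k, v in op.filter.items()):
                        doc.update(op.update["$set"])
        return len(requests)

# Build the operations first, without touching the database...
requests = [
    InsertOne({"_id": 1, "status": "new"}),
    UpdateOne({"_id": 1}, {"$set": {"status": "done"}}),
]
# ...then execute them all through one call.
coll = FakeCollection()
coll.bulk_write(requests)
```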

sohkai (Contributor, Author) commented Dec 20, 2016

From #977, when this is done, we should also port some of the RethinkDB connection tests to MongoDB's connection.

sohkai changed the title from "Revisit Connection.run()" to "Change MongoDB Connection's interface to implement a .run()" Dec 20, 2016
r-marques (Contributor) commented
I moved this issue to a different milestone.
At this point I don't know exactly what needs to be done; pymongo handles database connections very differently from the rethinkdb driver.

pymongo tries to reconnect by itself, so we don't need to handle that ourselves as we do with rethinkdb.
Another big difference is that with rethinkdb you connect to a specific node in a cluster. With mongodb you connect to the cluster itself, and the driver takes care of connecting to a new node when the current one goes down, etc.

This raises a lot of new questions that should be addressed when we actually run tests against a cluster.

r-marques (Contributor) commented
A note that we got from MongoDB regarding our connection code:

> Retry loop not needed, as the driver will either block until a connection is available or time out.

From the pymongo documentation:

> We get an AutoReconnect exception. This means that the driver was not able to connect to the old primary (which makes sense, as we killed the server), but that it will attempt to automatically reconnect on subsequent operations. When this exception is raised our application code needs to decide whether to retry the operation or to simply continue, accepting the fact that the operation might have failed.
>
> On subsequent attempts to run the query we might continue to see this exception. Eventually, however, the replica set will failover and elect a new primary (this should take no more than a couple of seconds in general). At that point the driver will connect to the new primary and the operation will succeed.

So the driver automatically reconnects if the connection goes down or the primary changes, but it still raises an AutoReconnect exception if we run a query while it is reconnecting.

> accepting the fact that the operation might have failed

This means that the operation may or may not have gone through, so for inserts it's important that we catch a DuplicateKeyError when retrying.
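A minimal sketch of the retry logic described above. Stand-in exception classes are defined in place of `pymongo.errors.AutoReconnect` and `pymongo.errors.DuplicateKeyError` so the example runs without a live replica set; the `run_with_retry` helper and the simulated insert are hypothetical.

```python
class AutoReconnect(Exception):
    """Stand-in: raised while the driver is reconnecting to a new primary."""

class DuplicateKeyError(Exception):
    """Stand-in: raised when an insert's _id already exists."""

def run_with_retry(operation, retries=5):
    """Retry an operation through a failover (hypothetical helper)."""
    for attempt in range(retries):
        try:
            return operation()
        except AutoReconnect:
            # Failover in progress; the previous attempt MAY have landed
            # before the connection dropped, so just try again.
            continue
        except DuplicateKeyError:
            # For inserts this means an earlier attempt actually went
            # through: treat it as success rather than an error.
            return None
    raise AutoReconnect("gave up after %d attempts" % retries)

# Simulate a failover: the first attempt drops mid-insert (but the write
# landed), so the second attempt sees the duplicate _id.
calls = []
def flaky_insert():
    calls.append(1)
    if len(calls) == 1:
        raise AutoReconnect()
    raise DuplicateKeyError()

run_with_retry(flaky_insert)  # completes without raising
```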

vrde (Contributor) commented Jan 23, 2017

> pymongo's operations allow us to construct queries without executing them; we can then use bulk_write to execute them against the database.

That's pretty interesting, but unfortunately there is no Find, FindOne, or Aggregate operation 😿.

vrde (Contributor) commented Jan 25, 2017

I've explored the bulk API, and it's too limited to support our use case: it only supports find and insert as a first operator, while we also use count, find_one, and aggregate.

vrde (Contributor) commented Jan 27, 2017

Here we go! Two PRs for this issue. I want to give some more context on the two solutions.

#1100 implements a new object called Lazy that can be used to describe a behavior and execute it later. The basic idea is to emulate the RethinkDB interface for querying the DB. Lazy objects allow composition, so complex queries can be built up across several lines.

#1109's approach is simpler: it just uses lambda functions to achieve similar behavior.
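For readers following along, the Lazy idea could be sketched roughly like this. This is a hypothetical, much-simplified version, not the actual #1100 implementation: it records attribute accesses, calls, and indexing without executing them, then replays the chain on a concrete object via `run()`.

```python
class Lazy:
    """Record a chain of operations now, replay it on a target later."""
    def __init__(self, steps=None):
        self.steps = steps or []

    def __getattr__(self, name):
        # Attribute access is recorded, not performed.
        return Lazy(self.steps + [("getattr", name, None)])

    def __getitem__(self, key):
        return Lazy(self.steps + [("getitem", key, None)])

    def __call__(self, *args, **kwargs):
        return Lazy(self.steps + [("call", args, kwargs)])

    def run(self, target):
        # Replay the recorded chain against a concrete object.
        for kind, a, b in self.steps:
            if kind == "getattr":
                target = getattr(target, a)
            elif kind == "getitem":
                target = target[a]
            else:
                target = target(*a, **b)
        return target

# Describe the query without executing it...
query = Lazy().split(',')[1].strip()
# ...then execute it later against real data.
result = query.run("a, b ,c")
```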

#1100 is one order of magnitude slower than #1109, but that's not a deal-breaker IMO:

```python
In [10]: def func1(): return Lazy().split(',')[1].split(' ').pop(1).strip()

In [11]: def func2(): return lambda x: x.split(',')[1].split(' ').pop(1).strip()

In [12]: %timeit func1()
The slowest run took 7.23 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.73 µs per loop

In [13]: %timeit func2()
The slowest run took 27.28 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 138 ns per loop
```

#1100 uses less code (and this is a plus), while #1109 is more ergonomic. Ideas?

sohkai (Contributor, Author) commented Feb 10, 2017

Closing as #1100 implements this.

sohkai closed this as completed Feb 10, 2017