Add support for asynchronous client #39

turu · 2016-01-15T16:27:05Z

In a nutshell: This pull request adds, in a fully backward compatible way, a new optional, asynchronous client to the library.

Why this matters?
Druid can be very useful in many scenarios different than ad-hoc analysis of online data - f.e it can be successfully used to create flexible and interactive analytics dashboards or monitors. Applying asynchronous programming style is often a go-to architectural choice, when designing such systems - especially, when they are implemented in single threaded languages like python. The lack of asynchronous client can therefore be a serious problem. This pull request fixes it.

What has been done?

PyDruid client has been refactored to facilitate addition of a new client
- extracted Query and QueryBuilder abstractions (tests migrated as well)
- extracted common BaseDruidClient class
AsyncPyDruid client implemented, which mirrors functionality of synchronous PyDruid, but works in an asynchronous way
- the client uses an asynchronous http client
- it is optional in the same fashion as support for pandas export
Added minimal set of integration tests for both clients.
setup.py enhanced to use pip extras. As a result, both pandas export and async support are explicitly declared as optional features (see updated readme for more info)
All new (and old as well) classes are documented. Examples of usage added both to docs and readme.

(accidentally ;)) Fixed issues:

PyDruid not extending object #29
fa3c366 fixes the non-deterministic behavior of test_export_tsv. The issue originated from the fact that python dictionaries are unordered.

A note on implementation of AsyncPyDruid:
Asynchronous client uses elements of Tornado framework. This was a conscious choice based on the fact that Tornado is stable and mature, and integrates very well with python’s 3.5 native async support (async, await keywords) as well as other asynchronous frameworks. As a result one is not locked into using Tornado for the whole of his/her app.
See: http://www.tornadoweb.org/en/stable/guide/coroutines.html#python-3-5-async-and-await

A note on backward compatibility:
As a result of refactoring, clients are reusable. This is achieved through clients’ query methods returning Query objects filled with results, rather than result dictionary. However, this change is fully backward compatible - clients still expose the same (now deprecated) interface allowing one to access details of the most recently executed query. Query objects on the other hand, act as wrappers around result dictionary, giving users the same dictionary interface they had before, enriched to include methods for exporting results and accessing query details.

A note on python 2.x <-> 3.x compatibility:
The changes have been tested and should work reliably both in python 2.x and 3.x

Additional note on the authors:
Changes introduced by this pull request have been originally developed in https://github.com/turu/pydruid by me (github.com/turu), and github.com/l-j. Since then, they were forked to our organization (github.com/Dreamlab) and they are already used in production. We both signed the CLA.

Best regards,
Piotr Turek

fjy · 2016-01-21T20:08:29Z

@turu Thanks for this contrib. Do you mind filling out the CLA here: druid.io/community/cla.html

turu · 2016-01-23T13:44:32Z

@fjy I already did. Can you verify that you have my CLA registered?

drcrallen · 2016-01-23T15:39:45Z

Note there's a commit from https://github.com/l-j in here

drcrallen · 2016-01-23T15:40:59Z

and there's a cla on file for @l-j and @turu

turu · 2016-01-30T18:29:13Z

@fjy @drcrallen Is there any update on the status of this pr?

fjy · 2016-02-04T05:13:46Z

@turu apologies for delay, will ask people to finish reviewing this

@drcrallen @xvrl please review

fjy · 2016-02-20T06:47:51Z

@turu can we resolve the merge conflicts and I will merge this

fjy · 2016-02-20T06:48:17Z

pydruid/async_client.py

@@ -0,0 +1,157 @@
+#


please use the new license header

I changed the year in license header, from 2013 to 2016. Is that ok? If there is a completely new header somewhere, which is supposed to be used, could you point me to it?

turu · 2016-03-07T22:33:15Z

@fjy I resolved merge conflicts. To the best of my knowledge, everything should be ok. All tests are passing.

xvrl · 2016-03-10T18:15:49Z

@turu Thanks for the comprehensive description, a few questions:

you mentioned something about backwards compatibility of Query objects, can you think of a case where someone's existing code might break because of that change? It might be worthwhile to include it in the release notes once we publish a new version
you've been running those changes in production. Are you still using the synchronous client in production as well? Just asking since that may reduce the chance of regressions with the synchronous client.
lastly, would you mind squashing commits before we can merge this, we like to keep commit history relatively clean to make it easier for everyone to track change history

xvrl · 2016-03-10T18:21:41Z

Otherwise I'm 👍 on this one, assuming someone has tested that the existing client still works well in practice.

* refactored and tested client abstraction * improved setup.py enabling pip extras * fixed python3 compatibility * fixed issues with test_export_tsv testcase * provided backward compatibility Fixes druid-io#29

turu · 2016-03-13T00:09:17Z

@xvrl Regarding the points you mentioned:

from current user's perspective, the main change is the fact that instead of returning a plain dictionary from client's methods, we are now returning a Query object. However, this object extends collections.MutableSequence and acts as a wrapper around the dictionary. As a result, the change should be completely transparent to the end user. All in all, I think I would recommend the same kind of caution as with every upgrade of external dependency - no less, no more.
we have been using the synchronous client, during development of our system and continue to do so, when experimenting with stuff. It's more convenient to use than the asynchronous in such scenarios. Actually, I've just made a small experiment, by running one of our services on the side, replacing async client with the synchronous one and leaving all the logic intact. It worked as expected.
should be better now

Add support for asynchronous client

fjy reviewed Feb 20, 2016
View reviewed changes

dakra mentioned this pull request Mar 11, 2016

fix python3 compatibility: utf-8 decode bytestream from urlopen #44

Closed

turu force-pushed the async_support branch from 38ba549 to 189bc34 Compare March 12, 2016 12:08

* added support for asynchronous client

166983e

* refactored and tested client abstraction * improved setup.py enabling pip extras * fixed python3 compatibility * fixed issues with test_export_tsv testcase * provided backward compatibility Fixes druid-io#29

turu force-pushed the async_support branch from 189bc34 to 166983e Compare March 12, 2016 12:26

fjy added a commit that referenced this pull request Mar 18, 2016

Merge pull request #39 from DreamLab/async_support

9373190

Add support for asynchronous client

fjy merged commit 9373190 into druid-io:master Mar 18, 2016

dakra mentioned this pull request May 21, 2016

Adding support for regex filter #52

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for asynchronous client #39

Add support for asynchronous client #39

turu commented Jan 15, 2016

fjy commented Jan 21, 2016

turu commented Jan 23, 2016

drcrallen commented Jan 23, 2016

drcrallen commented Jan 23, 2016

turu commented Jan 30, 2016

fjy commented Feb 4, 2016

fjy commented Feb 20, 2016

fjy Feb 20, 2016

turu Mar 7, 2016

turu commented Mar 7, 2016

xvrl commented Mar 10, 2016

xvrl commented Mar 10, 2016

turu commented Mar 13, 2016

Add support for asynchronous client #39

Add support for asynchronous client #39

Conversation

turu commented Jan 15, 2016

fjy commented Jan 21, 2016

turu commented Jan 23, 2016

drcrallen commented Jan 23, 2016

drcrallen commented Jan 23, 2016

turu commented Jan 30, 2016

fjy commented Feb 4, 2016

fjy commented Feb 20, 2016

fjy Feb 20, 2016

Choose a reason for hiding this comment

turu Mar 7, 2016

Choose a reason for hiding this comment

turu commented Mar 7, 2016

xvrl commented Mar 10, 2016

xvrl commented Mar 10, 2016

turu commented Mar 13, 2016