Attempt to harden and make aw-client more resilient #28

Merged
merged 11 commits into from Mar 17, 2018

Conversation

@ErikBjare
Copy link
Member

ErikBjare commented Jan 30, 2018

  • Switched to using persist-queue instead of home-rolled variant

TODO

  • Ensure that invalid events aren't added to the queue (causing queue-blockage), as @ahnlabb learned the hard way.
    • Done, just did a simple assert isinstance(data, dict) for now (should only happen to people who develop new watchers anyway)
  • Use persist-queue properly (notably the behavior of task_done())

@wafflebot wafflebot bot added the review label Jan 30, 2018

@ErikBjare ErikBjare referenced this pull request Jan 30, 2018

Closed

Harden aw-client #170

@ErikBjare

Member

ErikBjare commented Jan 30, 2018

Seems to work great, only one issue:

2018-01-30 12:40:22 [WARNING]: auto_commit=False is still experimental,only use it with care.  (persistqueue.sqlbase:100)

Not sure how experimental it is, perhaps we should reach out to the developer to find out.

@ErikBjare

Member

ErikBjare commented Jan 30, 2018

I should add that I'm not certain this behaves the intended way in some circumstances, namely:

  • When I do a queue.get(), the request fails, and queue.task_done() is never called, will the same object be returned on the next get()? Probably not.

There's also an additional issue for future plans (but no deal-breaker, since it would have added significant, hard-to-maintain complexity to the preexisting solution as well):

  • Pushing to the front of the queue, or modifying the last pushed event (as heartbeat_merge would need), is not supported. Although it might not be that hard to modify the library to support it.
@ErikBjare

Member

ErikBjare commented Jan 30, 2018

I think I found another option: shelve in the standard library.

Edit: shelve could work, but might use a lot of memory and eat a significant chunk of CPU if the queue grows large. The best option still seems to be persist-queue, but it'd take some modifications to work as we want it to.

Edit 2: the cons of shelve might not be that bad if we also implement client-side heartbeat_merge since that makes the queue unlikely to grow very large.

@johan-bjareholt

Member

johan-bjareholt commented Jan 30, 2018

Not sure how experimental it is, perhaps we should reach out to the developer to find out.

Yeah, probably. auto_commit=True is set in all the examples in the README, so it does seem to be the encouraged mode there at least, which is odd.

@peter-wangxu


peter-wangxu commented Feb 19, 2018

@ErikBjare I came across the fact that you were using persist-queue, and just want to answer some of your doubts here:

Yeah probably. auto_commit=True is set in all the examples in the README so it does seem encouraged there atleast which is odd.

The warning has been removed in v0.3.5, as auto_commit=False is now tested and performing very well.

Thanks
Peter

@ErikBjare

Member

ErikBjare commented Feb 19, 2018

@peter-wangxu Awesome! Thanks for reaching out!

ErikBjare added some commits Feb 25, 2018

@ErikBjare

Member

ErikBjare commented Mar 8, 2018

@johan-bjareholt This is ready for review, please review meticulously. I'll test it on my machine over the coming days.

@johan-bjareholt

Member

johan-bjareholt commented Mar 8, 2018

Currently testing locally, seems to work. Will have a further look at the code itself later.

Still seems to recover incredibly slowly though, same as before. We really need to add the ability to send batches of heartbeats, because sending heartbeats and events synchronously over HTTP is incredibly slow even locally. If I queue up heartbeats for 2 min, it takes another 2 min to recover those events.

Maybe that's why it is delayed on Windows: local HTTP requests are even slower there and don't have time to catch up again after a bit of queueing? Just a guess.
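The batching idea above could be sketched roughly like this; note that `send_batch` and any batch endpoint behind it are hypothetical, since aw-server had no such API at the time:

```python
import queue

def dispatch_batched(q: "queue.Queue", send_batch, max_batch: int = 50) -> None:
    """Drain up to max_batch queued requests and hand them to one call,
    instead of making one synchronous HTTP request per heartbeat.
    `send_batch` is a hypothetical callable standing in for a batch endpoint."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    if batch:
        send_batch(batch)  # one round-trip for the whole batch
        for _ in batch:
            q.task_done()
```

With per-request latency dominating, recovery time drops roughly by the batch size instead of staying 1:1 with the time spent queueing.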

@ErikBjare

Member

ErikBjare commented Mar 8, 2018

@johan-bjareholt Yeah, batching or premerging would be two options. I started working on a premerge thingy based on this PR, but it quickly got out of hand. Still worth taking a look at, just to see all the crazy edge cases when you need to take things like pulsetime into account (not to mention how it breaks many of the guarantees given by persist-queue): #29

@johan-bjareholt
Member

johan-bjareholt left a comment

Not done reviewing yet, might be some more.
Seems good though from a first glance.

@@ -255,65 +255,44 @@ def __init__(self, client, dispatch_interval=0):
# Buckets that will have events queued to them, will be created if they don't exist
self._registered_buckets = [] # type: List[Bucket]

self._queue = queue.Queue()
self._attempt_reconnect_interval = 10

@johan-bjareholt

johan-bjareholt Mar 9, 2018

Member

Maybe add this to the config?

@ErikBjare

ErikBjare Mar 9, 2018

Member

Maybe, but I think it would be a bit overkill.

def test_pdb():
pdb.set_trace()
"""

@johan-bjareholt

johan-bjareholt Mar 9, 2018

Member

What's this?

@ErikBjare

ErikBjare Mar 16, 2018

Member

pdb is pretty handy when testing. You should try it sometime. Removed it now though.

def add_request(self, endpoint: str, data: dict) -> None:
"""
Add a request to the queue.
NOTE: Only supports heartbeats

@johan-bjareholt

johan-bjareholt Mar 9, 2018

Member

We used to support both events and heartbeats, what happened to that?

@ErikBjare

ErikBjare Mar 9, 2018

Member

We never really did in a good way; supporting it properly would add a lot of extra complexity.


def _create_buckets(self):
persistqueue_path = os.path.join(queued_dir, self.client.name + ".v{}.persistqueue.sql".format(self.VERSION))

@johan-bjareholt

johan-bjareholt Mar 9, 2018

Member

.sql usually implies a file containing SQL statements rather than a SQLite database, so it should probably be just .db

@ErikBjare

ErikBjare Mar 9, 2018

Member

Good point, will fix.

@@ -243,7 +243,7 @@ class RequestQueue(threading.Thread):

VERSION = 1 # update this whenever the queue-file format changes

def __init__(self, client, dispatch_interval=0):
def __init__(self, client: ActivityWatchClient, dispatch_interval: float=0) -> None:

@johan-bjareholt

johan-bjareholt Mar 9, 2018

Member

Remove the default value for the dispatch_interval parameter? If it's only used for testing aw-client, you could just as well hardcode it.

@ErikBjare

ErikBjare Mar 16, 2018

Member

dispatch_interval was never used in the class, removed.

def _try_connect(self) -> bool:
try: # Try to connect
self._create_buckets()
self.connected = True
logger.info("Connection to aw-server established")
logger.info("Connection to aw-server established by {}".format(self.client.client_name))

@johan-bjareholt

johan-bjareholt Mar 9, 2018

Member

Why was this change useful? Seems unnecessary to have it print the client name here?

@ErikBjare

ErikBjare Mar 9, 2018

Member

In the aw-qt log you couldn't tell which client had connected to the server.

@johan-bjareholt

johan-bjareholt Mar 10, 2018

Member

Hm, that sucks. It's a shame that Python doesn't include the logger's name in the log message by default, it is often very useful.
GStreamer does logging very well with the logger's name; you can then filter loggers with simple env variables such as GST_DEBUG=mylogger:debug,mylogger2:warn etc. Very neat.
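Something similar in Python could be sketched like this (the LOG_FILTERS variable name is made up for illustration; this is not aw-core's setup_logging):

```python
import logging
import os

def apply_log_filters(env_var: str = "LOG_FILTERS") -> None:
    """Parse e.g. LOG_FILTERS="aw.client:DEBUG,urllib3:WARNING" and apply
    per-logger levels, GStreamer-style."""
    spec = os.environ.get(env_var, "")
    for entry in filter(None, spec.split(",")):
        name, _, level = entry.partition(":")
        logging.getLogger(name).setLevel(getattr(logging, level.upper()))

# Including %(name)s in the format string makes every message show which
# logger (and thus which client) produced it:
logging.basicConfig(format="%(asctime)s [%(levelname)s] %(name)s: %(message)s")
```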

@ErikBjare

ErikBjare Mar 16, 2018

Member

I'm pretty sure we could set something similar up in aw-core's setup_logging.

ErikBjare added some commits Mar 16, 2018

@ErikBjare

Member

ErikBjare commented Mar 16, 2018

@johan-bjareholt Fixed a few things you mentioned, merge?

@johan-bjareholt johan-bjareholt merged commit 03c1f14 into master Mar 17, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed

@wafflebot wafflebot bot removed the review label Mar 17, 2018

@ErikBjare ErikBjare deleted the dev/harden branch Mar 18, 2018
