Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to harden and make aw-client more resilient #28

Merged
merged 11 commits into from Mar 17, 2018

Conversation

ErikBjare
Copy link
Member

@ErikBjare ErikBjare commented Jan 30, 2018

  • Switched to using persist-queue instead of home-rolled variant

TODO

  • Ensure that invalid events aren't added to the queue (causing queue-blockage), as @ahnlabb learned the hard way.
    • Done, just did a simple assert isinstance(data, dict) for now (should only happen to people who develop new watchers anyway)
  • Use persist-queue properly (notably the behavior of task_done())

@ErikBjare
Copy link
Member Author

@ErikBjare ErikBjare commented Jan 30, 2018

Seems to work great, only one issue:

2018-01-30 12:40:22 [WARNING]: auto_commit=False is still experimental,only use it with care.  (persistqueue.sqlbase:100)

Not sure how experimental it is, perhaps we should reach out to the developer to find out.

@ErikBjare
Copy link
Member Author

@ErikBjare ErikBjare commented Jan 30, 2018

I should add that I'm not certain this behaves the intended way in some circumstances, namely:

  • When I do a queue.get(), the request fails, and queue.task_done() is not called, will the same object be returned on the second get()? Probably not.

There's also an additional issue for future plans (but no deal breaker since it would add significant, hard-to-maintain, complexity to the preexisting solution as well):

  • If we want to push to the front of the queue or modify the last pushed event (such as heartbeat_merge) that is not supported. Although it might not be that hard to modify the library to do this.

@ErikBjare
Copy link
Member Author

@ErikBjare ErikBjare commented Jan 30, 2018

I think I found another option: shelve in the standard library.

Edit: shelve could work, but might use a lot of memory and eat a significant chunk of CPU if the queue grows large. The best option still seems to be persist-queue, but it'd take some modifications to work as we want it to.

Edit 2: the cons of shelve might not be that bad if we also implement client-side heartbeat_merge since that makes the queue unlikely to grow very large.

@johan-bjareholt
Copy link
Member

@johan-bjareholt johan-bjareholt commented Jan 30, 2018

Not sure how experimental it is, perhaps we should reach out to the developer to find out.

Yeah probably. auto_commit=True is set in all the examples in the README so it does seem encouraged there atleast which is odd.

@peter-wangxu
Copy link

@peter-wangxu peter-wangxu commented Feb 19, 2018

@ErikBjare I came across that your were using persist-queue, just want to answer some your doubt here:

Yeah probably. auto_commit=True is set in all the examples in the README so it does seem encouraged there atleast which is odd.

The warning has been removed in v0.3.5 now as the auto_commit=False is tested and performing very well.

Thanks
Peter

@ErikBjare
Copy link
Member Author

@ErikBjare ErikBjare commented Feb 19, 2018

@peter-wangxu Awesome! Thanks for reaching out!

@ErikBjare
Copy link
Member Author

@ErikBjare ErikBjare commented Mar 8, 2018

@johan-bjareholt This is ready for review, please review meticulously. I'll test it on my machine for the coming days.

@johan-bjareholt
Copy link
Member

@johan-bjareholt johan-bjareholt commented Mar 8, 2018

Currently testing locally, seems to work. Will have a further look at the code itself later.

Still seems to recover incredibly slowly though same as previously, we really need to add the ability to send batches of heartbeats because sending heartbeats and events syncronously over HTTP is incredibly slow even locally. If i queue up heartbeats for 2min it takes another 2min for it to recover those events.

Maybe that's the issue why it is delayed on windows, because HTTP requests locally are even slower and doesn't have time to get ahead again after a bit of queueing? Just a guess.

@ErikBjare
Copy link
Member Author

@ErikBjare ErikBjare commented Mar 8, 2018

@johan-bjareholt Yeah, batching or premerging would be two options. I started working on a premerge thingy based on this PR, but it quickly got out of hand. Still worth taking a look at just to see all the crazy edgecases when you need to take things like pulsetime into account (not to mention how it breaks many of the guarantees given by persist-queue): #29

Copy link
Member

@johan-bjareholt johan-bjareholt left a comment

Not done reviewing yet, might be some more.
Seems good though from a first glance.

@@ -255,65 +255,44 @@ def __init__(self, client, dispatch_interval=0):
# Buckets that will have events queued to them, will be created if they don't exist
self._registered_buckets = [] # type: List[Bucket]

self._queue = queue.Queue()
self._attempt_reconnect_interval = 10
Copy link
Member

@johan-bjareholt johan-bjareholt Mar 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add this to the config?

Copy link
Member Author

@ErikBjare ErikBjare Mar 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but I think it would be a bit overkill.


def test_pdb():
pdb.set_trace()
"""
Copy link
Member

@johan-bjareholt johan-bjareholt Mar 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this?

Copy link
Member Author

@ErikBjare ErikBjare Mar 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pdb is pretty handy when testing. You should try it sometime. Removed it now though.

def add_request(self, endpoint: str, data: dict) -> None:
"""
Add a request to the queue.
NOTE: Only supports heartbeats
Copy link
Member

@johan-bjareholt johan-bjareholt Mar 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used to support both events and heartbeats, what happened to that?

Copy link
Member Author

@ErikBjare ErikBjare Mar 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never really did in a good way, if we would do so properly it would cause a lot of additional complexity.


def _create_buckets(self):
persistqueue_path = os.path.join(queued_dir, self.client.name + ".v{}.persistqueue.sql".format(self.VERSION))
Copy link
Member

@johan-bjareholt johan-bjareholt Mar 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.sql usually means that it contains a sql command rather than a sqlite db, should probably be just .db

Copy link
Member Author

@ErikBjare ErikBjare Mar 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will fix.

@@ -243,7 +243,7 @@ class RequestQueue(threading.Thread):

VERSION = 1 # update this whenever the queue-file format changes

def __init__(self, client, dispatch_interval=0):
def __init__(self, client: ActivityWatchClient, dispatch_interval: float=0) -> None:
Copy link
Member

@johan-bjareholt johan-bjareholt Mar 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove default dispatch_interval from class parameter? If it's only for testing aw-client you could just as well just hardcode it.

Copy link
Member Author

@ErikBjare ErikBjare Mar 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dispatch_interval was never used in the class, removed.

def _try_connect(self) -> bool:
try: # Try to connect
self._create_buckets()
self.connected = True
logger.info("Connection to aw-server established")
logger.info("Connection to aw-server established by {}".format(self.client.client_name))
Copy link
Member

@johan-bjareholt johan-bjareholt Mar 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this change useful? Seems unnecessary to have it print the client name here?

Copy link
Member Author

@ErikBjare ErikBjare Mar 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the aw-qt log you couldn't tell which client had connected to the server.

Copy link
Member

@johan-bjareholt johan-bjareholt Mar 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that sucks. It's a shame that python doesn't print the loggers name in the logmessage by default, it it often very useful.
GStreamer does logging very well with the loggers name, then you can filter them with simple env variables such as GST_DEBUG=mylogger:debug,mylogger2:warn etc.. Very neat.

Copy link
Member Author

@ErikBjare ErikBjare Mar 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure we could set something similar up in aw-core's setup_logging.

@ErikBjare
Copy link
Member Author

@ErikBjare ErikBjare commented Mar 16, 2018

@johan-bjareholt Fixed a few things you mentioned, merge?

@johan-bjareholt johan-bjareholt merged commit 03c1f14 into master Mar 17, 2018
2 checks passed
@ghost ghost removed the review label Mar 17, 2018
@ErikBjare ErikBjare deleted the dev/harden branch Mar 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants