
Conversation

@lcmcninch (Contributor) commented Feb 26, 2021

This PR resolves #108, at least partially. The end result is that the data method can use the limit query parameter and the link header to return any number of data points (or all data points) of a feed. The current behavior returns all data up to 1000 points and provides no option to either decrease or increase that limit.

Scope of change:

  • The _get method of the Client class has a new keyword argument, params, which is passed directly to requests.get. The default value is None, which is also requests.get's default, so this should be completely backward compatible.
  • client.py now imports re and urllib.parse. I'm not an expert on the various versions of Python, but I don't think this will cause any compatibility issues.
  • An attribute _last_response is added to the Client class to store the response of the last request. It is updated on each _get, _post and _delete call. This isn't ideal, but I thought it would be the least intrusive way of allowing access to properties of the response other than the JSON body (such as the 'link' header). I'm open to other ideas.
  • A get_next_link method is added which parses the 'link' header from the _last_response. This method is based on this example which is referenced in the API documentation here. However, as discussed in this forum post, there is a bug in the API. If and when this bug is fixed, the get_next_link method would break and need to be updated.
  • Finally, a keyword argument max_results is added to the data method of Client. If max_results is less than the API page limit of 1000, it simply applies the lower limit. If it is larger than 1000, the pagination link in the GET response is used to retrieve more data until max_results (or the end of the data) is reached. If max_results is None, all of the data for the feed is retrieved. This should be backwards compatible: if max_results isn't given, it defaults to 1000, matching the current implementation, which always returns all data (up to 1000 points).
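The flow described above can be sketched roughly as follows. This is illustrative only: get_next_link and _last_response follow the names used in this PR, but the exact signatures and the link-header regex are my assumptions (and, per the note above, the real parser has to account for the API's link-header bug). The sketch also assumes max_results is numeric; the PR resolves None to a concrete bound first.

```python
import re

API_PAGE_LIMIT = 1000  # maximum data points the API returns per page


def get_next_link(last_response):
    # Parse the next-page URL out of the response's 'link' header.
    # Assumed header shape: <https://io.adafruit.com/...>; rel="next"
    link_header = last_response.headers.get("link", "")
    match = re.search(r'<([^>]+)>;\s*rel="next"', link_header)
    return match.group(1) if match else None


def fetch_feed_data(client, feed, max_results=API_PAGE_LIMIT):
    # Accumulate pages until max_results points are collected or the feed
    # runs out of data. `client` is assumed to expose _get (accepting a
    # params keyword) and _last_response, as described above.
    results = []
    params = {"limit": min(max_results, API_PAGE_LIMIT)}
    path = "feeds/{0}/data".format(feed)
    while len(results) < max_results:
        results.extend(client._get(path, params=params))
        next_link = get_next_link(client._last_response)
        if next_link is None:  # no more pages
            break
        path = next_link
    return results[:max_results]
```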

Limitations:

  • Ideally, we would also implement the start_time, end_time and include parameters of the API, but I wanted to get this pull request in and discuss it before taking it any further. There is also the issue with start_time mentioned in this post, which would need to be addressed.
  • I did not add the new functionality to the examples. Because this could move a lot of data, I'm not sure how best to implement it in the examples. Let me know if you have thoughts on how you'd want an example implemented.

… retrieval of more than 1000 data points.

This implementation depends on an existing bug in the API's pagination link header and will break if and when that bug is fixed.
@brentru (Member) left a comment:


Hi - thank you for the PR, asking for a few changes.

```python
params = {'limit': max_results} if max_results else None
data = []
path = "feeds/{0}/data".format(feed)
while True:
```
@brentru (Member) commented:

Could you change this from while True to a non-blocking loop?

@lcmcninch (Contributor Author) replied:

I understand your distaste for while True. I implemented it this way for the case where the user wants all data points: it then runs until there is no more data to retrieve. To remove the while True, I see at least two options. Either way would require an additional query to get the number of data points on a feed, using the feeds/{feed_key}/details query.

  1. Disallow None for max_results. If we do this, we should expose the number of data points in a feed so that the user can request all data without having to guess how many points there are. Right now the Feed object doesn't include this.
  2. Allow None for max_results, but internally query the feed details to get the total number of data points on the feed.

I can see pros/cons either way. Let me know if you have a preference.
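For reference, either option would let the loop be bounded up front: once the total point count (or a required max_results) is known, so is the number of pages. A rough sketch of what a bounded alternative to while True could look like (the names here are hypothetical, not this PR's actual code):

```python
import math

API_PAGE_LIMIT = 1000  # maximum data points the API returns per page


def fetch_bounded(get_page, max_results):
    # Bounded alternative to `while True`: iterate over at most the number
    # of pages needed to cover max_results, stopping early if a page comes
    # back short. `get_page` is a hypothetical callable returning one page
    # of up to `limit` data points.
    results = []
    for _ in range(math.ceil(max_results / API_PAGE_LIMIT)):
        page = get_page(limit=min(API_PAGE_LIMIT, max_results - len(results)))
        results.extend(page)
        if len(page) < API_PAGE_LIMIT:  # feed exhausted early
            break
    return results[:max_results]
```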

@lcmcninch (Contributor Author) added:

As a side note, it might be worth thinking about implementing more details for Feed anyway. Maybe I'll open another issue for discussing this.

```diff
 return Data.from_dict(self._get(path))

-def data(self, feed, data_id=None):
+def data(self, feed, data_id=None, max_results=API_PAGE_LIMIT):
```
@brentru (Member) commented:

Does a call to this function return the same value as the master branch?

@lcmcninch (Contributor Author) replied:

Yes, this is backwards compatible. If max_results isn't provided (legacy code wouldn't provide it), up to 1000 points are returned, as before.

```python
from .errors import RequestError, ThrottlingError
from .model import Data, Feed, Group

API_PAGE_LIMIT = 1000
```
@brentru (Member) commented:

Could you comment what this is?

@lcmcninch (Contributor Author) replied:

Currently, the web API returns a maximum of 1000 data points per page. I could have hard-coded this into the function definition for data, but I prefer to have constants like this defined up front. That way, if the API were to change for whatever reason, the Python library would only need to be updated in this one location.
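As an illustration of how a module-level constant like this gets used when building the request (a sketch; the helper name here is made up, not the PR's code):

```python
API_PAGE_LIMIT = 1000  # maximum data points the web API returns per page


def request_params(max_results):
    # Clamp the requested limit to the API's page size. None means
    # "no explicit params", which matches requests.get's default.
    if max_results is None:
        return None
    return {"limit": min(max_results, API_PAGE_LIMIT)}
```

If the API's page size ever changes, only the constant needs updating.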

@lcmcninch requested a review from @brentru, April 16, 2021 02:17
@lcmcninch (Contributor Author) commented:

I made the requested change, based on option two from my response: I query the feed count and use that as max_results if it is given as None. I'm not 100% sure that query will always work, but it was the only way I could find to get the number of records in a feed. If there's a cleaner way, please let me know.
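The details query mentioned above, roughly. This is a sketch only: the payload shape, with the count nested under details/data, is my assumption from inspecting responses rather than a documented API contract.

```python
def feed_point_count(client, feed_key):
    # Resolve max_results=None to a concrete bound by asking the feed
    # details endpoint how many data points the feed holds.
    # ASSUMPTION: the count lives at details["details"]["data"]["count"].
    details = client._get("feeds/{0}/details".format(feed_key))
    return details["details"]["data"]["count"]
```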

@lcmcninch closed this Nov 30, 2021
@lcmcninch deleted the get-pages branch November 30, 2021 00:21
@lcmcninch restored the get-pages branch November 30, 2021 00:57
@lcmcninch (Contributor Author) commented:

Apologies, I inadvertently deleted this branch, which closed the PR. I restored the branch and I am re-opening.

@lcmcninch reopened this Nov 30, 2021
@brentru (Member) commented Nov 30, 2021:

@lcmcninch Hi - could you resolve the conflicting files and I'll look at merging this in.

```
# Conflicts:
#	tests/test_client.py
```
@lcmcninch (Contributor Author) replied:

> @lcmcninch Hi - could you resolve the conflicting files and I'll look at merging this in.

@brentru No problem, conflict resolved. Thanks for looking at this!

@jwcooper (Member) commented Dec 22, 2021:

I think we should drop the default from 1000 records to something much lower. In fact, I think we may change the server to default to a lower limit as well. I'd like to think 100 records is a better default limit.

@lcmcninch (Contributor Author) replied:

@jwcooper I'm happy to make the change, do you want me to reduce it to 100?

@lcmcninch (Contributor Author) commented:

I lowered the default limit to 100.

@unlimitedbacon commented:

Any update on getting this merged?

@brentru merged commit 06df42d into adafruit:master Mar 10, 2022
Successfully merging this pull request may close these issues.

Add ability to fetch more than 1000 rows of data
4 participants