
Conversation

@lcmcninch (Contributor) commented Feb 26, 2021

This PR resolves #108, at least partially. The end result is that the data method can use the limit query parameter and the link header to return any number of data points (or all data points) of a feed. The current behavior returns all data up to 1000 points and provides no option to either decrease or increase that limit.

Scope of change:

  • The _get method of the Client class has a new keyword argument, params, which is passed directly to requests.get. The default value is None, which is also requests.get's default, so this should be completely backward compatible.
  • client.py now imports re and urllib.parse. I'm not an expert on the various versions of Python, but I don't think this will cause any compatibility issues.
  • An attribute _last_response is added to the Client class to store the response of the last request. It is updated on each _get, _post and _delete call. This isn't ideal, but I thought it would be the least intrusive way of allowing access to properties of the response other than the JSON body (such as the 'link' header). I'm open to other ideas.
  • A get_next_link method is added which parses the 'link' header from the _last_response. This method is based on this example which is referenced in the API documentation here. However, as discussed in this forum post, there is a bug in the API. If and when this bug is fixed, the get_next_link method would break and need to be updated.
  • Finally, a keyword argument max_results is added to the data method of Client. If max_results is less than the API page limit of 1000, it simply applies the lower limit. If it is larger than 1000, the pagination link in the GET response is used to retrieve more data until max_results (or the end of the data) is reached. If max_results is None, all of the data for the feed is retrieved. This should be backwards compatible: if max_results isn't given, it defaults to 1000, matching the current implementation, which always returns all data (up to 1000 points).
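The flow described above can be sketched roughly as follows. This is illustrative only: get_next_link and _last_response follow the names used in this PR, but the exact signatures and the link-header regex are my assumptions (and, per the note above, the real parser has to account for the API's link-header bug). The sketch also assumes max_results is numeric; the PR resolves None to a concrete bound first.

```python
import re

API_PAGE_LIMIT = 1000  # maximum data points the API returns per page


def get_next_link(last_response):
    # Parse the next-page URL out of the response's 'link' header.
    # Assumed header shape: <https://io.adafruit.com/...>; rel="next"
    link_header = last_response.headers.get("link", "")
    match = re.search(r'<([^>]+)>;\s*rel="next"', link_header)
    return match.group(1) if match else None


def fetch_feed_data(client, feed, max_results=API_PAGE_LIMIT):
    # Accumulate pages until max_results points are collected or the feed
    # runs out of data. `client` is assumed to expose _get (accepting a
    # params keyword) and _last_response, as described above.
    results = []
    params = {"limit": min(max_results, API_PAGE_LIMIT)}
    path = "feeds/{0}/data".format(feed)
    while len(results) < max_results:
        results.extend(client._get(path, params=params))
        next_link = get_next_link(client._last_response)
        if next_link is None:  # no more pages
            break
        path = next_link
    return results[:max_results]
```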

Limitations:

  • Ideally, we would also implement the start_time, end_time and include parameters of the API, but I wanted to get this pull request in and discuss it before taking it any further. There is also the issue with start_time mentioned in this post, which would need to be addressed.
  • I did not add the new functionality to the examples. Because this could move a lot of data, I'm not sure how best to implement it in the examples. Let me know if you have thoughts on how you'd want an example implemented.

… retrieval of more than 1000 data points.

This implementation depends on an existing bug in the API's pagination link header and will break if and when that bug is fixed.
@brentru (Member) left a comment:


Hi - thank you for the PR, asking for a few changes.

```python
params = {'limit': max_results} if max_results else None
data = []
path = "feeds/{0}/data".format(feed)
while True:
```
@brentru (Member) commented:

Could you change this from while True to a non-blocking loop?

@lcmcninch (Contributor Author) replied:

I understand your distaste for while True. I implemented it this way for the case where the user wants all data points: it then runs until there is no more data to retrieve. To remove the while True, I see at least two options. Either way would require an additional query to get the number of data points on a feed, using the feeds/{feed_key}/details query.

  1. Disallow None for max_results. If we do this, we should expose the number of data points in a feed so that the user can request all data without having to guess how many points there are. Right now the Feed object doesn't include this.
  2. Allow None for max_results, but internally query the feed details to get the total number of data points on the feed.

I can see pros/cons either way. Let me know if you have a preference.
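For reference, either option would let the loop be bounded up front: once the total point count (or a required max_results) is known, so is the number of pages. A rough sketch of what a bounded alternative to while True could look like (the names here are hypothetical, not this PR's actual code):

```python
import math

API_PAGE_LIMIT = 1000  # maximum data points the API returns per page


def fetch_bounded(get_page, max_results):
    # Bounded alternative to `while True`: iterate over at most the number
    # of pages needed to cover max_results, stopping early if a page comes
    # back short. `get_page` is a hypothetical callable returning one page
    # of up to `limit` data points.
    results = []
    for _ in range(math.ceil(max_results / API_PAGE_LIMIT)):
        page = get_page(limit=min(API_PAGE_LIMIT, max_results - len(results)))
        results.extend(page)
        if len(page) < API_PAGE_LIMIT:  # feed exhausted early
            break
    return results[:max_results]
```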

@lcmcninch (Contributor Author) added:

As a side note, it might be worth thinking about implementing more details for Feed anyway. Maybe I'll open another issue for discussing this.

```diff
 return Data.from_dict(self._get(path))

-def data(self, feed, data_id=None):
+def data(self, feed, data_id=None, max_results=API_PAGE_LIMIT):
```
@brentru (Member) commented:

Does a call to this function return the same value as the master branch?

@lcmcninch (Contributor Author) replied:

Yes, this is backwards compatible. If max_results isn't provided (legacy code wouldn't provide it), up to 1000 points are returned, as before.

```python
from .errors import RequestError, ThrottlingError
from .model import Data, Feed, Group

API_PAGE_LIMIT = 1000
```
@brentru (Member) commented:

Could you comment what this is?

@lcmcninch (Contributor Author) replied:

Currently, the web API returns a maximum of 1000 data points per page. I could have hard-coded this into the function definition for data, but I prefer to have constants like this defined up front. That way, if the API were to change for whatever reason, the Python library would only need to be updated in this one location.
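As an illustration of how a module-level constant like this gets used when building the request (a sketch; the helper name here is made up, not the PR's code):

```python
API_PAGE_LIMIT = 1000  # maximum data points the web API returns per page


def request_params(max_results):
    # Clamp the requested limit to the API's page size. None means
    # "no explicit params", which matches requests.get's default.
    if max_results is None:
        return None
    return {"limit": min(max_results, API_PAGE_LIMIT)}
```

If the API's page size ever changes, only the constant needs updating.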

@lcmcninch requested a review from @brentru, April 16, 2021 02:17
@lcmcninch (Contributor Author) commented:

I made the requested change, based on option two from my response: I query the feed count and use that as max_results if it is given as None. I'm not 100% sure that query will always work, but it was the only way I could find to get the number of records in a feed. If there's a cleaner way, please let me know.
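The details query mentioned above, roughly. This is a sketch only: the payload shape, with the count nested under details/data, is my assumption from inspecting responses rather than a documented API contract.

```python
def feed_point_count(client, feed_key):
    # Resolve max_results=None to a concrete bound by asking the feed
    # details endpoint how many data points the feed holds.
    # ASSUMPTION: the count lives at details["details"]["data"]["count"].
    details = client._get("feeds/{0}/details".format(feed_key))
    return details["details"]["data"]["count"]
```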

@lcmcninch closed this Nov 30, 2021
@lcmcninch deleted the get-pages branch November 30, 2021 00:21
@lcmcninch restored the get-pages branch November 30, 2021 00:57
@lcmcninch (Contributor Author) commented:

Apologies, I inadvertently deleted this branch, which closed the PR. I restored the branch and I am re-opening.

@lcmcninch reopened this Nov 30, 2021
@brentru (Member) commented Nov 30, 2021:

@lcmcninch Hi - could you resolve the conflicting files and I'll look at merging this in.

```
# Conflicts:
#	tests/test_client.py
```
@lcmcninch (Contributor Author) replied:

> @lcmcninch Hi - could you resolve the conflicting files and I'll look at merging this in.

@brentru No problem, conflict resolved. Thanks for looking at this!

@jwcooper (Member) commented Dec 22, 2021:

I think we should drop the default from 1000 records to something much lower. In fact, I think we may change the server to default to a lower limit as well. I'd like to think 100 records is a better default limit.

@lcmcninch (Contributor Author) replied:

@jwcooper I'm happy to make the change, do you want me to reduce it to 100?

@lcmcninch (Contributor Author) commented:

I lowered the default limit to 100.

@unlimitedbacon commented:

Any update on getting this merged?

@brentru merged commit 06df42d into adafruit:master Mar 10, 2022
Successfully merging this pull request may close these issues.

Add ability to fetch more than 1000 rows of data
4 participants