tl;dr: a `rest_api` resource with `selected: False` that has downstream transformations which yield tables does not receive the limit when it is applied at the top level of the source.
While developing a pipeline with the `rest_api` verified source, I encountered a situation where `source.add_limit(#)` does not apply as expected. The API endpoint I was consuming returns many subjects in a single JSON response. Instead of materializing a table from the top-level object, I have a transform function that pulls out various nested objects and arrays and yields records for separate tables. For reference, the response looks something like:
Support_Ticket_base_object:
-- Customer Details
-- Ticket Summary details
-- Subarray of Tags
-- Subarray of Messages
-- Subarray of actions taken by support on the ticket
Because I was manually controlling the base ticket object and didn't want flattened or JSON fields for tags etc., I marked the top-level resource as `selected: False`.
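To make the shape of that transform concrete, here is a minimal, library-free sketch. It is not the code from my pipeline: the field names (`id`, `summary`, `customer`, `tags`, `messages`, `actions`), the table names, and the `(table_name, record)` tuple convention are all hypothetical stand-ins for illustration.

```python
from typing import Any, Dict, Iterator, Tuple

Row = Tuple[str, Dict[str, Any]]


def split_ticket(ticket: Dict[str, Any]) -> Iterator[Row]:
    """Yield (table_name, record) pairs for one ticket. All names are hypothetical."""
    ticket_id = ticket["id"]
    # Base row: scalar summary fields only, no flattened or JSON columns.
    yield "tickets", {"id": ticket_id, "summary": ticket.get("summary")}
    # Customer details go to their own table, keyed back to the ticket.
    customer = ticket.get("customer")
    if customer:
        yield "customers", {"ticket_id": ticket_id, **customer}
    # Each sub-array becomes a child table with one row per element.
    for tag in ticket.get("tags", []):
        yield "ticket_tags", {"ticket_id": ticket_id, "tag": tag}
    for message in ticket.get("messages", []):
        yield "ticket_messages", {"ticket_id": ticket_id, **message}
    for action in ticket.get("actions", []):
        yield "ticket_actions", {"ticket_id": ticket_id, **action}


example = {
    "id": 1,
    "summary": "Login fails",
    "customer": {"name": "Ada"},
    "tags": ["auth", "urgent"],
    "messages": [{"body": "Help"}],
    "actions": [],
}
rows = list(split_ticket(example))
```

In the real pipeline this logic sits in a transformer attached to the unselected parent resource; the tuple here only stands in for however the table name is attached to each record.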
Expected behavior
I expect the resource to receive the limit applied to the source, and downstream transformations to receive only the limit's number of batches from their respective resource. In this scenario, if the main resource returns batches of 100, I would expect the transform to receive 4 batches.
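The expectation can be modeled with plain generators. This is only a sketch of the semantics I have in mind, not dlt's implementation: `ticket_pages`, `with_limit`, and `count_transform` are hypothetical names, and the limit of 4 is assumed from the 4 batches mentioned above. The limit counts batches yielded by the parent, not individual items.

```python
from itertools import islice
from typing import Dict, Iterator, List

Batch = List[Dict[str, int]]


def ticket_pages(page_size: int = 100, total_pages: int = 10) -> Iterator[Batch]:
    """Stand-in for the unselected parent resource: one batch per API call."""
    for page in range(total_pages):
        start = page * page_size
        yield [{"id": start + i} for i in range(page_size)]


def with_limit(batches: Iterator[Batch], max_batches: int) -> Iterator[Batch]:
    """Stop the parent after max_batches yields, regardless of batch size."""
    return islice(batches, max_batches)


def count_transform(batches: Iterator[Batch]) -> Dict[str, int]:
    """Stand-in for the downstream transform: counts what reaches it."""
    seen = {"batches": 0, "tickets": 0}
    for batch in batches:
        seen["batches"] += 1
        seen["tickets"] += len(batch)
    return seen


# Assumed limit of 4 on batches of 100: the transform should see 4 batches.
result = count_transform(with_limit(ticket_pages(), max_batches=4))
```

With these assumptions the transform sees 4 batches and 400 tickets, while the remaining 6 pages are never requested. The behavior I am reporting is that the transform keeps receiving batches beyond the limit when the parent is not selected.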
@FridayPush I have a question about your report. The limit does not cap the number of items but the number of calls the resource makes to the API. Have you taken that into account? If every call to a collection endpoint yields 10 items, a limit of 5 will yield 50 items in total.
dlt version
0.5.1
Steps to reproduce
Operating system
macOS
Runtime environment
Local
Python version
3.11
dlt data source
rest_api
dlt destination
DuckDB
Other deployment details
No response
Additional information
Workaround is to materialize the table with `selected: True` while in development.