Limits are not applied to non-selected Resources #1597

Closed
FridayPush opened this issue Jul 16, 2024 · 1 comment
Labels
community This issue came from slack community workspace

Comments

@FridayPush

dlt version

0.5.1

Describe the problem

tldr: a rest_api resource with selected: False whose downstream transformers yield tables does not receive the limit applied at the top level of the source.

While developing a pipeline with the rest_api verified source, I found that source.add_limit(n) does not behave as expected. The API endpoint I was consuming returns many entities in a single JSON response. Instead of materializing a table from the top-level object, I use a transformer that pulls out various nested objects and arrays and yields records for separate tables. The structure looks roughly like this:

Support_Ticket_base_object:
-- Customer Details
-- Ticket Summary details
-- Subarray of Tags
-- Subarray of Messages
-- Subarray of actions taken by support on the ticket
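The splitting described above can be sketched in plain Python. This is a minimal illustration, not the reporter's transformer: the field names (customer, tags, messages, actions) and the output table names are hypothetical stand-ins for the real Gorgias payload. Each output row is paired with its target table name, mirroring what dlt.mark.with_table_name does.

```python
from typing import Any, Dict, Iterator, List, Tuple

Row = Dict[str, Any]
NESTED = ("customer", "tags", "messages", "actions")  # hypothetical field names


def split_ticket(ticket: Row) -> Iterator[Tuple[str, Row]]:
    """Yield (table_name, row) pairs for one nested ticket object."""
    ticket_id = ticket["id"]
    # Flat ticket summary: drop nested fields, keep only the customer's id.
    summary = {k: v for k, v in ticket.items() if k not in NESTED}
    summary["customer_id"] = ticket["customer"]["id"]
    yield "tickets", summary
    yield "customers", ticket["customer"]
    # Each sub-array becomes its own child table keyed by ticket_id.
    for table in ("tags", "messages", "actions"):
        for item in ticket.get(table, []):
            yield f"ticket_{table}", {"ticket_id": ticket_id, **item}


example = {
    "id": 1,
    "subject": "Refund request",
    "customer": {"id": 42, "email": "a@example.com"},
    "tags": [{"name": "refund"}, {"name": "vip"}],
    "messages": [{"id": 7, "body": "Hi"}],
    "actions": [],
}

rows: List[Tuple[str, Row]] = list(split_ticket(example))
for table, row in rows:
    print(table, row)
```

Because the base object is fully decomposed by this step, there is no need to load it as its own table, which is why the parent resource ends up non-selected.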

Because I was building the base ticket table myself and did not want the tags and other sub-arrays stored as flattened or JSON fields, I marked the top-level resource as selected: False.

Expected behavior

I expected the resource to receive the limit applied to the source, so that downstream transformers would receive only as many batches from their parent resource as the limit allows. In this scenario, if the main resource returns batches of 100, I would expect the transformer to receive 4 batches (400 tickets).
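A toy model of the expected behavior, written in plain Python rather than dlt: the base resource is a generator that yields one batch per API call, the limit stops it after a fixed number of yields, and the transformer only sees the batches that get through. The fake_pages generator, its page counts, and the helper names are assumptions standing in for the paginated tickets endpoint; this is not how dlt implements add_limit.

```python
from itertools import islice
from typing import Dict, Iterator, List

Batch = List[Dict[str, int]]


def fake_pages(total_pages: int, page_size: int) -> Iterator[Batch]:
    """Hypothetical stand-in for the paginated tickets endpoint: one batch per call."""
    for page in range(total_pages):
        yield [{"id": page * page_size + i} for i in range(page_size)]


def limited(pages: Iterator[Batch], max_pages: int) -> Iterator[Batch]:
    """Stop the base resource after max_pages yields (the expected limit effect)."""
    return islice(pages, max_pages)


def transformer(pages: Iterator[Batch]) -> Iterator[Batch]:
    """Downstream step: receives each batch the base resource yields."""
    for batch in pages:
        yield batch


batches = list(transformer(limited(fake_pages(total_pages=50, page_size=100), max_pages=4)))
print(len(batches), sum(len(b) for b in batches))
```

Running it prints 4 400: the limit sits on the base generator, so the transformer inherits it without needing its own limit.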

Steps to reproduce

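import dlt
from dlt.sources.helpers.rest_client.paginators import JSONResponseCursorPaginator
from rest_api import rest_api_source  # rest_api verified source (dlt 0.5.x layout)

# auth: credentials object for the Gorgias API, defined elsewhere (not shown in the report)
source = rest_api_source({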
    "client": {
        "base_url": "https://mydomain.gorgias.com/api/",
        "auth": auth,
        "paginator": JSONResponseCursorPaginator(cursor_path="meta.next_cursor"),
    },
    "resource_defaults": {
        "primary_key": "id",
        "write_disposition": "merge",
        "endpoint": {
            "params": {
                "limit": 100,
            },
        },
    },
    "resources": [
        {
            "name": "teams",
            "endpoint": {
                "path": "teams",
            },
            "selected": False,
        },
        {
            "name": "tickets",
            "endpoint": {
                "path": "tickets",
                "params": {
                    "order_by": "updated_datetime:desc",
                    "limit": 100,
                    "view_id": 123
                }
            },
            "selected": False,
        }
    ],
}, max_table_nesting=0)

@dlt.transformer
def process_tickets(ticket_items):
    # ...code that takes tickets and plucks fields and nested dicts' 'id' field to the top level,
    # e.g. user{"id": ...} becomes user_id (elided in the report)
    yield dlt.mark.with_table_name(resulting_dict_list, "tickets")

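pipeline = dlt.pipeline(pipeline_name="gorgias", destination="duckdb")  # pipeline name is illustrative
load_info = pipeline.run(source.add_limit(4))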

Operating system

macOS

Runtime environment

Local

Python version

3.11

dlt data source

rest_api

dlt destination

DuckDB

Other deployment details

No response

Additional information

The workaround is to set selected: True on the resource while in development, which also materializes its table.
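The reported behavior and the workaround can be pictured with another toy model. It assumes, as the issue title states, that a source-level limit only reaches resources that are selected; the Resource class, apply_source_limit, and the page counts are hypothetical and do not reproduce dlt's internals.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator, List, Optional


@dataclass
class Resource:
    name: str
    total_pages: int
    selected: bool
    max_pages: Optional[int] = None  # None means unlimited

    def pages(self) -> Iterator[int]:
        gen: Iterable[int] = range(self.total_pages)
        if self.max_pages is not None:
            gen = islice(gen, self.max_pages)
        return iter(gen)


def apply_source_limit(resources: List[Resource], n: int) -> None:
    # Reported behavior (assumed): only selected resources receive the limit.
    for r in resources:
        if r.selected:
            r.max_pages = n


def pages_seen_by_transformer(base_selected: bool) -> int:
    tickets = Resource("tickets", total_pages=50, selected=base_selected)
    apply_source_limit([tickets], 4)
    # The transformer consumes every page its parent resource yields.
    return sum(1 for _ in tickets.pages())


print(pages_seen_by_transformer(base_selected=False))  # limit not applied
print(pages_seen_by_transformer(base_selected=True))   # workaround: limit applied
```

With the parent non-selected the transformer sees all 50 pages; selecting it brings that down to 4, at the cost of loading the parent table too.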

@VioletM added the community label on Jul 22, 2024
@sh-rp
Collaborator

sh-rp commented Jul 31, 2024

@FridayPush I have a question about your report. The limit does not cap the number of items but the number of calls the resource makes to the API. Have you taken that into account? If every call to a collection endpoint yields 10 items, a limit of 5 yields 50 items in total.
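The arithmetic in this reply can be written out as a short sketch: the limit caps the number of pages (one per API call here), so the item count is limit × page size as long as every page is full. The items_loaded helper is hypothetical, and it assumes only the endpoint's last page can be short.

```python
def items_loaded(limit: int, page_size: int, total_items: int) -> int:
    """Items yielded when a limit caps the number of pages, not items."""
    full_pages, remainder = divmod(total_items, page_size)
    available_pages = full_pages + (1 if remainder else 0)
    pages = min(limit, available_pages)
    # Every page is full except possibly the endpoint's last one.
    return min(pages * page_size, total_items)


print(items_loaded(limit=5, page_size=10, total_items=1000))   # the example above
print(items_loaded(limit=4, page_size=100, total_items=1000))  # the reporter's setup
print(items_loaded(limit=5, page_size=10, total_items=23))     # endpoint runs out first
```

The three calls return 50, 400, and 23 respectively.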

Projects
Status: Done