[Connector-builder server] Allow client to specify record limit and enforce max of 1000 #20575

Merged: clnoll merged 6 commits into master from connector-builder-server-limit-records-read on Jan 11, 2023

Contributor

@clnoll clnoll commented Dec 16, 2022

What

In order to improve performance of test read requests in the connector builder UI, we want to avoid fetching more data from the source than is necessary to test the connection.

This update lets a connector builder user specify a record_limit that caps the number of records returned to the client when they are testing the connector. It also enforces a maximum limit of 1000 records.

This is the server portion of #19500.

How

  • Updates the stream/read endpoint to accept an optional record_limit key in the request body.
  • Adds a MAX_RECORD_LIMIT whose value is 1000; if the limit input by the user is not set or exceeds 1000, MAX_RECORD_LIMIT will be enforced.
  • Stops iteration over the messages generator (which fetches records from the source) once <limit> records have been seen.
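
The three steps above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: resolve_limit, read_records, and fake_source are hypothetical names, and the fallback rule is taken from the description (an unset or too-large limit falls back to the maximum). The real endpoint also handles non-record messages, which this sketch ignores.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional

MAX_RECORD_LIMIT = 1000  # value stated in the PR description


def resolve_limit(requested: Optional[int], max_limit: int = MAX_RECORD_LIMIT) -> int:
    """Fall back to max_limit when no limit is given or the request exceeds it."""
    if requested is None or requested > max_limit:
        return max_limit
    return requested


def read_records(
    messages: Iterable[dict],
    requested: Optional[int] = None,
    max_limit: int = MAX_RECORD_LIMIT,
) -> List[dict]:
    """Consume the lazy message stream only until the limit is reached."""
    limit = resolve_limit(requested, max_limit)
    # islice stops pulling from the generator once `limit` items were yielded.
    return list(islice(messages, limit))


def fake_source(fetched: List[int]) -> Iterator[dict]:
    """Hypothetical endless source that logs every record it fetches."""
    n = 0
    while True:
        fetched.append(n)
        yield {"id": n}
        n += 1
```

Because the generator is consumed lazily, calling read_records(fake_source(log), requested=5) leaves only five entries in log; nothing beyond the limit is fetched from the source.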

Recommended reading order

  1. airbyte-connector-builder-server/connector_builder/impl/default_api.py
  2. airbyte-connector-builder-server/src/main/openapi/openapi.yaml

🚨 User Impact 🚨

No breaking changes. The user will see at most <limit> records when they click Test to test the connector, rather than all messages that would have been fetched before this change.


@clnoll clnoll force-pushed the connector-builder-server-limit-records-read branch from ea87097 to 4285b70 Compare December 16, 2022 15:35
Contributor

@sherifnada sherifnada left a comment

Overall LGTM but just a question on test/code structure

@@ -144,6 +144,9 @@ components:
type: object
description: The AirbyteStateMessage object to use as the starting state for this read
# $ref: "#/components/schemas/AirbyteProtocol/definitions/AirbyteStateMessage"
record_limit:
Contributor

@lmossman do we need to regenerate any TypeScript code? (If yes, we should make it more obvious/automatic for anyone touching this code.)

Contributor

The generated TypeScript code is not committed to GitHub, and since this is just adding a new optional parameter it won't break any of our code that consumes that generated TypeScript code. So this shouldn't require any frontend changes.

else:
record_limit = min(request_record_limit, max_record_limit)

with patch.object(DefaultApiImpl, "MAX_RECORD_LIMIT", max_record_limit):
Contributor

feels like having to patch in this way breaks encapsulation. Does it make sense to make this an optional param on DefaultApiImpl?

Contributor Author

@clnoll clnoll Dec 16, 2022

I agree with you, it doesn't quite feel right to have MAX_RECORD_LIMIT on DefaultApiImpl. WDYT about moving it to a constant outside of the class? We could make it an optional param, but this specific variable doesn't feel to me like it's part of DefaultApiImpl, which gives me a slight preference for making it a constant. But I could be convinced either way.

Contributor

I just meant it would be better to patch it using dependency injection, i.e. make this an init variable for DefaultApiImpl.

Contributor Author

Okay cool, I think that could be nice and tidy. Will make that change.
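
For readers following the thread, the dependency-injection idea agreed on here might look something like the sketch below. ApiImpl is a hypothetical stand-in for DefaultApiImpl with everything but the limit handling stripped out; the default of 1000 comes from the PR description, the rest is assumed for illustration.

```python
from itertools import islice
from typing import Iterable, List, Optional


class ApiImpl:
    """Hypothetical stand-in for DefaultApiImpl; only the limit handling is shown."""

    def __init__(self, max_record_limit: int = 1000):
        # The maximum is an instance setting, so a test can pass in a small value.
        self.max_record_limit = max_record_limit

    def read(self, records: Iterable[dict], record_limit: Optional[int] = None) -> List[dict]:
        limit = self.max_record_limit
        if record_limit is not None:
            limit = min(record_limit, self.max_record_limit)
        return list(islice(records, limit))


def test_request_above_max_is_capped() -> None:
    # No patch.object on a class attribute is needed here.
    api = ApiImpl(max_record_limit=3)
    records = [{"id": i} for i in range(10)]
    assert len(api.read(records, record_limit=5)) == 3
    assert len(api.read(records, record_limit=2)) == 2
```

The test configures the object through its constructor, so it no longer depends on where the constant happens to live.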

])
request_record_limit = 0

with patch.object(DefaultApiImpl, "_create_low_code_adapter", return_value=mock_source_adapter):
Contributor

same question, is this telling us that our code structure needs to change?

Contributor Author

Is there a particular structural change you had in mind? I imagine this function was patched out of convenience (it's used throughout the test_default_api.py file) - looks to me like there are multiple places downstream that we could patch instead if we wanted but that might be a little trickier. As a simple alternative, _create_low_code_adapter is a staticmethod and doesn't need to be a method on the class - it could be moved to its own function if we don't want to surface the details of DefaultApiImpl in tests.

Contributor Author

I think one restructuring that could make sense would be to move some implementation out of DefaultApiImpl, and have a separate class doing the business logic of fetching records. That separate class could take an adapter object, which could be injected during tests. Thoughts?

Contributor Author

I made a lighter-weight change than moving all of the record-fetching out of DefaultApiImpl, injecting the adapter class into DefaultApiImpl.__init__ instead. This cleans up the patches.
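
A rough sketch of what injecting the adapter class can look like in a test, under stated assumptions: FakeAdapter and ApiImpl are hypothetical names, the read_stream signature is invented for illustration, and the real endpoint is async while this one is synchronous for brevity.

```python
from itertools import islice
from typing import Any, Dict, Iterator, List


class FakeAdapter:
    """Hypothetical test double, constructed from a manifest."""

    def __init__(self, manifest: Dict[str, Any]):
        self.manifest = manifest

    def read_stream(self, stream: str, config: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
        for i in range(3):
            yield {"stream": stream, "id": i}


class ApiImpl:
    """Hypothetical stand-in for DefaultApiImpl."""

    def __init__(self, adapter_cls: type, max_record_limit: int = 1000):
        self.adapter_cls = adapter_cls
        self.max_record_limit = max_record_limit

    def read_stream(
        self, manifest: Dict[str, Any], config: Dict[str, Any], stream: str
    ) -> List[Dict[str, Any]]:
        # The adapter is built per request, once a manifest is available.
        adapter = self.adapter_cls(manifest)
        return list(islice(adapter.read_stream(stream, config), self.max_record_limit))
```

A test can then construct ApiImpl(FakeAdapter, max_record_limit=2) and call read_stream directly, with no patch.object on a private method.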

@clnoll clnoll force-pushed the connector-builder-server-limit-records-read branch from a78e01e to 3e19fc7 Compare December 16, 2022 23:04


class DefaultApiImpl(DefaultApi):
    logger = logging.getLogger("airbyte.connector-builder")

    def __init__(self, adapter_cls: Type, max_record_limit: int = 1000):
Contributor

why are we passing the class instead of the object?

Contributor Author

@clnoll clnoll Dec 19, 2022

The adapter requires a manifest, which we don't have when DefaultApiImpl is instantiated.

@@ -105,15 +109,22 @@ async def read_stream(self, stream_read_request_body: StreamReadRequestBody = Bo
Using the provided manifest and config, invokes a sync for the specified stream and returns groups of Airbyte messages
that are produced during the read operation
:param stream_read_request_body: Input parameters to trigger the read operation for a stream
:param limit: The maximum number of records requested by the client
Contributor

should we explicitly define the range as [1, max_record_limit]?

Contributor Author

Yes, good call. Added.

@@ -144,6 +144,9 @@ components:
type: object
description: The AirbyteStateMessage object to use as the starting state for this read
# $ref: "#/components/schemas/AirbyteProtocol/definitions/AirbyteStateMessage"
record_limit:
type: integer
description: Number of records that will be returned to the client from the connector builder (max of 1000)
Contributor

the minimum and maximum could be defined here so the FE doesn't have to redefine it https://swagger.io/docs/specification/data-models/data-types/#numbers

Contributor Author

Done.
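
For context, OpenAPI's minimum and maximum keywords are inclusive by default, so a range of [1, 1000] accepts both endpoints. The sketch below shows the equivalent check in plain Python. validate_record_limit is a hypothetical helper, and whether the server rejects or clamps an out-of-range value is not shown in this thread, so rejection here is an assumption.

```python
def validate_record_limit(value: int, minimum: int = 1, maximum: int = 1000) -> int:
    """Reject values outside the inclusive range [minimum, maximum]."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("record_limit must be an integer")
    if not minimum <= value <= maximum:
        raise ValueError(
            f"record_limit must be between {minimum} and {maximum}, got {value}"
        )
    return value
```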

Contributor Author

clnoll commented Dec 19, 2022

@girarda I've addressed your comments if you want to take another look.

@@ -27,7 +27,8 @@
 class DefaultApiImpl(DefaultApi):
     logger = logging.getLogger("airbyte.connector-builder")

-    def __init__(self, max_record_limit: int = 1000):
+    def __init__(self, adapter_cls: Type, max_record_limit: int = 1000):
Contributor

I think there are two problems with this type signature:

  1. Type is not very informative (is it the correct type? it seems imported from airbyte_cdk.models which doesn't seem right?)
  2. as a result it's not clear what methods can be called on that type and with what params (e.g: how does the reader know the adapter_cls can be called with a manifest argument?)

Example of a signature which solves these might be:

class CdkAdapter(abc):
  <..describe methods we expect this class to have..>

class DefaultApiImpl(..):
  def __init__(self, adapter_cls: Callable[[], CdkAdapter], ...

Contributor Author

Ah, good call @sherifnada - that was supposed to be lowercase type. I like your suggestion to use an interface to be clearer about the type and went ahead and made that change.
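
One way the interface suggestion could be written out is sketched below. The method name read_stream, its parameters, and InMemoryAdapter are assumptions made for illustration; the thread does not show the final interface. The factory type here takes a manifest argument, since the discussion notes that the adapter is constructed from one.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterator


class CdkAdapter(ABC):
    """Interface the API implementation relies on; the method shown is an assumption."""

    @abstractmethod
    def read_stream(self, stream: str, config: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
        """Yield messages for the given stream."""


# Anything that builds an adapter from a manifest, including the class itself.
AdapterFactory = Callable[[Dict[str, Any]], CdkAdapter]


class InMemoryAdapter(CdkAdapter):
    """Hypothetical implementation used only to exercise the interface."""

    def __init__(self, manifest: Dict[str, Any]):
        self.manifest = manifest

    def read_stream(self, stream: str, config: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
        yield {"stream": stream, "id": 0}


def build_adapter(factory: AdapterFactory, manifest: Dict[str, Any]) -> CdkAdapter:
    """Construct the adapter once the manifest is known."""
    return factory(manifest)
```

With an abstract base class, a subclass that forgets to implement read_stream cannot be instantiated, and the signature tells the reader exactly what the injected callable must accept and return.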

@clnoll clnoll force-pushed the connector-builder-server-limit-records-read branch from 5468b97 to 58893e1 Compare January 9, 2023 21:10
@clnoll clnoll force-pushed the connector-builder-server-limit-records-read branch from f90f75d to 51a1c08 Compare January 10, 2023 15:47
@clnoll clnoll merged commit 8ef2872 into master Jan 11, 2023
@clnoll clnoll deleted the connector-builder-server-limit-records-read branch January 11, 2023 15:22
@clnoll clnoll restored the connector-builder-server-limit-records-read branch January 11, 2023 21:18
jbfbell pushed a commit that referenced this pull request Jan 13, 2023
@clnoll clnoll deleted the connector-builder-server-limit-records-read branch January 20, 2023 13:01