Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CDK: private configuration option _limit and _page_size #5617

Merged
merged 3 commits into from
Aug 31, 2021

Conversation

avida
Copy link
Contributor

@avida avida commented Aug 25, 2021

What

Resolves #4985 issue

How

Describe the solution

Recommended reading order

  1. x.java
  2. y.python

Pre-merge Checklist

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • docs/SUMMARY.md
    • docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
    • docs/integrations/README.md
    • airbyte-integrations/builds.md
  • PR name follows PR naming conventions
  • Connector added to connector index like described here

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions
  • Connector version bumped like described here

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here

Connector Generator

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed.

@github-actions github-actions bot added the CDK Connector Development Kit label Aug 25, 2021
@avida avida force-pushed the drezchykov/4985-add-SAT-keys branch 3 times, most recently from a780d0f to bdebac1 Compare August 25, 2021 11:37
Copy link
Contributor

@Zirochkaa Zirochkaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just small comment

@@ -135,6 +147,9 @@ def _read_stream(
for record in record_iterator:
if record.type == MessageType.RECORD:
record_counter += 1
if internal_config.limit and record_counter > internal_config.limit:
logger.info("Reached limit defined by internal config, stop reading")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add number, like so:

Suggested change
logger.info("Reached limit defined by internal config, stop reading")
logger.info(f"Reached limit defined by internal config ({internal_config.limit}), stop reading")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@avida avida force-pushed the drezchykov/4985-add-SAT-keys branch from bdebac1 to e36793f Compare August 27, 2021 12:41
@avida avida requested a review from sherifnada August 27, 2021 12:41
@sherifnada sherifnada requested review from Phlair and removed request for sherifnada August 27, 2021 13:39
Copy link
Contributor

@Phlair Phlair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this idea. The limit keyword makes total sense and could be used by any cdk stream off the bat with this change, nice!

For page_size, iiuc, it needs to be individually implemented per stream for this setting to have any effect? I think right now it's not entirely clear what page_size actually is... is it a limit to number of pages to paginate through? Is it limit for number of records per page? Could either / both of those settings be useful depending on the connector?
Ultimately, I'm wondering if just implementing the ability to create any _internal_setting would provide more flexibility, including allowing implementing a _page_size if needs be, but since there's no high-level logic changes for it, it could fit into the generic box too.



class InternalConfig(BaseModel):
KEYWORDS: ClassVar[set] = {"_limit", "_page_size"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see why we'd be explicit here to avoid accidentally putting actual config values in the internal config... however, in adding this feature does it perhaps make sense to enable adding any internal flags without the need to update cdk code each time (e.g. taking any keywords that start with _)?
This is more of an open question than a suggestion, so wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my overall review comment which links to this :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For page_size, iiuc, it needs to be individually implemented per stream for this setting to have any effect? I think right now it's not entirely clear what page_size actually is... is it a limit to number of pages to paginate through? Is it limit for number of records per page? Could either / both of those settings be useful depending on the connector?

Its number of record per page. Right now its not a standard property but some streams use page_size name to configure number of records per page. It works only for instances of HttpStream class.

Ultimately, I'm wondering if just implementing the ability to create any _internal_setting would provide more flexibility, including allowing implementing a _page_size if needs be, but since there's no high-level logic changes for it, it could fit into the generic box too.

Do you mean filtering out records that starts with _ ? In this case we need somehow distinguish internal config for cdk and connector itself. We don't want to filter out connector config and also we don't want to pass cdk config to the connector.

I can see why we'd be explicit here to avoid accidentally putting actual config values in the internal config... however, in adding this feature does it perhaps make sense to enable adding any internal flags without the need to update cdk code each time (e.g. taking any keywords that start with _)?
This is more of an open question than a suggestion, so wdyt?

Its good suggestion but there is some connectors that using its own internal config. From the connector side we can set additionalproperties flag to true for spec jsonschema and we are sure that any config with any internal properties will pass checking against spec schema. But for cdk we need to filter out cdk specific configs to make this check pass, that's why there is split_config function. It still allows adding connector specific internal parameters but the should not be same as ones that as used for cdk. If we going to make it too generic it would be a mess with cdk and connector paramters

@avida avida requested a review from Phlair August 27, 2021 16:01
Copy link
Contributor

@Phlair Phlair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for detailed responses on previous review. Understand and happy with approach based on that.

One thing I think is needed for clarity in adding these is detail around the availability of these flags, descriptions and how to use (for example, in the CDK tutorial in the docs maybe add a section on it?)

Copy link
Contributor

@Phlair Phlair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meant to approve with comment above ^

@avida
Copy link
Contributor Author

avida commented Aug 31, 2021

/publish-cdk dry-run=false

🕑 https://github.com/airbytehq/airbyte/actions/runs/1185502392
https://github.com/airbytehq/airbyte/actions/runs/1185502392

@avida avida merged commit 7584440 into master Aug 31, 2021
@avida avida deleted the drezchykov/4985-add-SAT-keys branch August 31, 2021 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CDK: add private configuration option _limit and _page_size
3 participants