Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

[airbyte-cdk] add print parameter to flush the print buffer after each invocation #37000

Merged
merged 1 commit into from
Apr 11, 2024

Conversation

brianjlai
Copy link
Contributor

@brianjlai brianjlai commented Apr 11, 2024

Fixes https://github.com/airbytehq/airbyte-internal-issues/issues/7087

Identifying the fix

For a small number of syncs to source-salesforce and source-stripe we were seeing a few records not make it to the platform. After validating that both source and platform counts were accurate and there was a legitimate functional mismatch we determined there's an issue with how we use print() in the entrypoint.

What the fix is

There is a Python article related to print buffers: https://realpython.com/python-flush-print-output/#set-the-flush-parameter-if-youve-disabled-newlines

Note: Remember that changing flush to True when printing output to a terminal only has an effect if you don鈥檛 use the default value for end. If you kept the default value for end, which is the newline character, then the data buffer would already flush automatically after each call to print(), even without the explicit flush=True.

In an earlier PR to fix concurrent CDK printing, we made a small change that replaced the default end value from \n to "", this had the side effect of disabling automatically flushing the print buffer. Adding back flush=True, gets us back to the previous behavior while still retaining the original fix that was made.

This does give me a slight pause because always flushing can have performance implications, but I think our biggest performance bottleneck is still like API request/response and this fix just aligns back to how things were acting before. I wish it were super clear why print() at high volume isn't working, but right now we need to fix the dropped records


Ellipsis 馃殌 This PR description was created by Ellipsis for commit 565847e.

Summary:

This PR fixes an issue with dropped records during syncs by adding flush=True to the print() function in the launch() function in /airbyte-cdk/python/airbyte_cdk/entrypoint.py, restoring automatic flushing of the print buffer.

Key points:

  • Identified an issue with the use of print() in the entrypoint causing dropped records during syncs to source-salesforce and source-stripe.
  • Added flush=True to the print() function in the launch() function in /airbyte-cdk/python/airbyte_cdk/entrypoint.py to restore automatic flushing of the print buffer.
  • The change is expected to fix the issue of dropped records without significantly impacting performance.

Generated with 鉂わ笍 by ellipsis.dev

Copy link

vercel bot commented Apr 11, 2024

The latest updates on your projects. Learn more about Vercel for Git 鈫楋笌

1 Ignored Deployment
Name Status Preview Updated (UTC)
airbyte-docs 猬滐笍 Ignored (Inspect) Apr 11, 2024 5:16pm

@octavia-squidington-iii octavia-squidington-iii added the CDK Connector Development Kit label Apr 11, 2024
@brianjlai brianjlai marked this pull request as ready for review April 11, 2024 18:39
@brianjlai brianjlai requested a review from a team as a code owner April 11, 2024 18:39
Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

馃憤 Looks good to me!

  • Reviewed the entire pull request up to 565847e
  • Looked at 14 lines of code in 1 files
  • Took 1 minute and 22 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 1 additional comments because they didn't meet confidence threshold of 50%.
1. airbyte-cdk/python/airbyte_cdk/entrypoint.py:238:
  • Assessed confidence : 0%
  • Comment:
    The change made here seems to be in line with the PR description and should help prevent records from being dropped due to the print buffer not being flushed. The potential performance implications have been acknowledged and deemed acceptable given the current performance bottlenecks.
  • Reasoning:
    The change made in the PR seems to be in line with the description provided. The author has added the 'flush=True' parameter to the print function to ensure that the print buffer is flushed after each invocation. This is done to prevent any records from being dropped due to the print buffer not being flushed. The author has also provided a valid explanation for why this change is necessary and has acknowledged the potential performance implications. However, they have also noted that the main performance bottleneck is likely the API request/response, not the print function. Therefore, this change should not significantly impact performance. I don't see any clear violations of best practices, logical bugs, performance bugs, or security bugs in this change.

Workflow ID: wflow_3FsNRqpQ8scCdzis


Not what you expected? You can customize the content of the reviews using rules. Learn more here.

Copy link
Contributor

@girarda girarda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@brianjlai brianjlai merged commit 822d76c into master Apr 11, 2024
32 checks passed
@brianjlai brianjlai deleted the brian/fix_cdk_message_printing branch April 11, 2024 19:11
Copy link

sentry-io bot commented Apr 17, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • 鈥硷笍 requests.exceptions.InvalidURL: Invalid URL ParseResult(scheme='https', netloc='wioga%20basely.myshopify.com', path='/admin/api/2... /usr/local/lib/python3.9/site-packages/airbyte_... View Issue

Did you find this useful? React with a 馃憤 or 馃憥

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants