Source-Postgres state messages in log is way too verbose #18765

Closed
ChristopheDuong opened this issue Nov 1, 2022 · 1 comment · Fixed by #20124
Labels: autoteam, connectors/source/postgres, needs-triage, releaseStage/ga, team/db-dw-sources (Backlog for Database and Data Warehouse Sources team), type/bug (Something isn't working)

Comments

ChristopheDuong commented Nov 1, 2022

Environment

  • Airbyte version: 0.40.17
  • OS Version / Instance: macOS
  • Deployment: Docker
  • Source Connector and version: source-postgres:1.0.22
  • Destination Connector and version: destination-bigquery:1.2.5
  • Step where error happened: Sync job

Current Behavior

The connector seems to emit far too many state messages, or at least it prints the cursor value far too many times in the logs:

2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230107.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230108.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230109.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230110.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230111.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230112.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230113.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230114.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230115.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230116.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230117.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230118.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230119.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230120.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230121.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230122.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230123.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230124.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230125.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230126.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230127.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230128.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230129.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230130.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230131.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230132.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230133.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230134.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230135.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230136.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230137.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230138.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230139.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230140.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230141.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230142.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230143.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230144.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230145.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230146.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230147.0 (count 1)
2022-11-01 09:37:57 �[44msource�[0m > State report for stream AirbyteStreamNameNamespacePair{name='users', namespace='huge_dataset_test'} - original: null = null (count 0) -> latest: id = 6230148.0 (count 1)

This results in a log file of 200-300 MB or more, made up mostly of useless repeated lines. The repetition buries useful log information in noise and unnecessarily slows down the API and the UI when viewing logs.

Expected Behavior

The connector should not crowd the log output with this many state-report lines.

Logs

990a3933_e574_44ae_a392_e77dc3e1bb5a_logs_12_txt.zip

Are you willing to submit a PR?

No

bleonard commented Nov 30, 2022

For non-CDC syncs: let's investigate whether this logging (in StateDecoratingIterator) is valuable at all. If so, can it be rate limited (for example, mod 20)?
It was introduced recently. Why? If we can't figure that out, take it out.

Also make sure this case didn't come up as a side effect of a bug, in which case it shouldn't be logging that much in the first place.
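
For illustration, the "mod 20" rate-limiting idea could look roughly like the sketch below. This is not Airbyte's code and not the fix that was merged: `SampledStateLogger`, its constructor parameters, and the shortened message text are all hypothetical names chosen for this example. It only assumes that the state report is produced once per record and that a counter can be kept next to the logging call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not part of Airbyte): forwards only every Nth
 * state report to a log sink instead of logging each one.
 */
public class SampledStateLogger {

  private final int interval;          // log one report out of every `interval`
  private final Consumer<String> sink; // where sampled lines go, e.g. a logger's info method
  private long seen = 0;               // total reports offered so far

  public SampledStateLogger(final int interval, final Consumer<String> sink) {
    if (interval < 1) {
      throw new IllegalArgumentException("interval must be >= 1");
    }
    this.interval = interval;
    this.sink = sink;
  }

  /** Counts the report and forwards it only when the count is a multiple of the interval. */
  public void report(final String message) {
    seen++;
    if (seen % interval == 0) {
      sink.accept(message);
    }
  }

  public long seen() {
    return seen;
  }

  public static void main(final String[] args) {
    final List<String> logged = new ArrayList<>();
    final SampledStateLogger logger = new SampledStateLogger(20, logged::add);

    // Replay the 42 cursor values from the log excerpt in this issue.
    for (int id = 6230107; id <= 6230148; id++) {
      logger.report("State report for stream users - latest: id = " + id);
    }

    // prints "2 of 42 reports logged"
    System.out.println(logged.size() + " of " + logger.seen() + " reports logged");
  }
}
```

With an interval of 20, the 42 lines in the excerpt would shrink to 2. A count-based sample like this can skip the last report of a sync, so a real change would probably still want to log the final state unconditionally.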
