Source Zendesk Supports: fails to sync over 100k organization records. #14253

Closed
emilmar opened this issue Jun 29, 2022 · 23 comments

@emilmar commented Jun 29, 2022

First off - thanks for making a great product which has solved 90% of my headaches!

Environment

  • Airbyte version: 0.39.26-alpha
  • OS Version / Instance: Debian GNU/Linux 11 (bullseye), amd64 built on 20220621, GCP n2.medium
  • Deployment: GCP compute engine
  • Source Connector and version: Zendesk Support 0.2.11
  • Destination Connector and version: BigQuery 1.1.11
  • Step where error happened: Sync job

Current Behavior

Syncing more than 100k organization records fails with a 429 rate-limit response:
Response Code: 429, Response Text: {"detail":"Rate limited by Zendesk edge protection","status":"429","title":"Automated response"}

Expected Behavior

When reaching the rate limit, it should back off/slow down the requests and continue to sync the full dataset.

Logs

logs-37.txt

Steps to Reproduce

  1. Set up a connection between Zendesk Support and BigQuery
  2. Select only the organizations stream, choose any sync mode, and start the sync

Are you willing to submit a PR?

No

@marcosmarxm (Member) commented Jun 29, 2022

Similar to: #14253

Similar to #12591

@emilmar (Author) commented Jun 30, 2022

@marcosmarxm I think you may have mistyped the issue number; the link you provided goes to this issue.

@marcosmarxm (Member)

Thanks, linked to the correct issue.

@BenoitHugonnard (Contributor)

How can we fix this? I'm willing to look into it.
At first glance it seems like Zendesk doesn't fill the "Retry-After" header in this specific case, causing the connector to retry very quickly and then hit the max retry count. I have the same issue; I had to downgrade to 0.2.8 to make it work.

@emilmar (Author) commented Nov 16, 2022

Hi @BenoitHugonnard
Thanks for informing me about the downgrade possibility to work around it!

I'm not competent enough to fix this, but I'm your biggest supporter in making it work. The probable cause you describe is interesting; could it perhaps be reported to Zendesk too?

@BenoitHugonnard (Contributor)

Found a very quick fix: the `_retry` was never called because the response was always False when catching a 429.
I replaced `if response` with `if response or response.header` and it worked (very long sync, but that's expected, as pagination after 1000 pages is limited to 10 calls/min).
I'm not very familiar with how to submit a PR, though, but I'd gladly do it.
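
For context, here is a minimal, hypothetical sketch of the failure mode and the kind of guard described above. It is not the connector's actual code; the 60-second fallback and the helper names are assumptions. requests marks any 4xx/5xx response as falsy, which is why a bare `if response` check never reaches the backoff path on a 429, and Zendesk's edge protection may omit the Retry-After header entirely.

```python
# Hypothetical sketch, not the actual source-zendesk-support code.
# It illustrates why `if response:` misbehaves on a 429 and how a
# fallback wait can be used when the Retry-After header is missing.
import time

import requests

DEFAULT_BACKOFF_SECONDS = 60  # assumed fallback when Zendesk omits Retry-After


def backoff_seconds(response: requests.Response) -> float:
    """How long to wait before retrying a rate-limited request."""
    retry_after = response.headers.get("Retry-After")
    return float(retry_after) if retry_after is not None else DEFAULT_BACKOFF_SECONDS


def get_with_retry(session: requests.Session, url: str, max_retries: int = 10) -> requests.Response:
    for _ in range(max_retries):
        response = session.get(url)
        # requests.Response.__bool__ returns False for any 4xx/5xx status,
        # so `if response:` is False on a 429 and a naive truthiness check
        # would skip this backoff path entirely.
        if response.status_code == 429:
            time.sleep(backoff_seconds(response))
            continue
        response.raise_for_status()
        return response
    raise RuntimeError(f"Still rate limited after {max_retries} retries: {url}")
```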

@emilmar (Author) commented Nov 17, 2022

Oh my! Awesome, so response is False when it's 429? I guess I could change it locally and it would work...
Yes that's expected, but on the other hand after the first successful sync it never needs to read the whole thing again (I hope)

I'm just surprised nobody else has encountered this issue so far. Is it due to us not having extended API limits? I mean, 100k organizations is not that rare... right? ;)

Seems like the steps are few... probably a lot of boilerplate needed though!
https://airbytehq.github.io/connector-development/

@BenoitHugonnard (Contributor)

Yep, response was False on 429.
The issue is that we have to go through all the organizations each time, because the API doesn't allow querying by updated_at, so Airbyte downloads everything locally but only pushes the changes.
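
As a rough illustration of that behavior (a hedged sketch, not the connector's real implementation; it assumes records carry an ISO-8601 updated_at field), the whole stream is downloaded but only records newer than the saved cursor are emitted:

```python
# Hedged sketch of client-side incremental filtering, not the connector's code.
# Assumes records carry an ISO-8601 "updated_at" field such as "2022-12-14T12:08:59Z".
from datetime import datetime
from typing import Dict, Iterable, Iterator


def parse_ts(value: str) -> datetime:
    # fromisoformat() in Python < 3.11 does not accept a trailing "Z"
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def newer_than_cursor(records: Iterable[Dict], cursor: str) -> Iterator[Dict]:
    """Yield only records updated after the stored cursor value."""
    cursor_ts = parse_ts(cursor)
    for record in records:        # every record was downloaded regardless
        if parse_ts(record["updated_at"]) > cursor_ts:
            yield record          # only the changed ones are pushed downstream
```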

@emilmar (Author) commented Nov 17, 2022

So every time the sync runs, even with this fix, it will be slow?

@BenoitHugonnard (Contributor)

Yes, but at least it will work and not exit! The slowness is on Zendesk's side; we are limited to 10 queries per minute when we query pages higher than 1000 (100k organizations).

@marcosmarxm (Member)

@BenoitHugonnard can you send me a message in Slack so I can help you submit a PR?

marcosmarxm changed the title from "Zendesk Supports fails to sync over 100k organization records." to "Source Zendesk Supports: fails to sync over 100k organization records." on Nov 30, 2022
@emilmar (Author) commented Dec 14, 2022

@BenoitHugonnard Thanks for submitting a PR. I'm running it now to confirm the change works as expected; is it working on your end?

Progress on my end: the logs in Airbyte have not updated for 40 minutes, stuck at 70k records, but I can see in the Zendesk API monitor that Airbyte is making requests.

EDIT: It finished after 1 hour! The Airbyte logging froze for 80 minutes, and it did not sync a single organization record above 100k. My next sync is already freezing too.

Not so sure I'm ready to close this issue as fixed.

@emilmar (Author) commented Dec 14, 2022

I'm pausing the sync for now; it caused trouble. Curious to hear your experience with it.

@marcosmarxm (Member)

@BenoitHugonnard it would be nice to hear your feedback here.

@BenoitHugonnard (Contributor)

On my side it did indeed freeze, but it worked 🤔

```
2022-12-15 08:08:33 source > Syncing stream: organizations
2022-12-15 08:08:34 destination > Starting a new buffer for stream organizations (current state: 405 KB in 4 buffers)
2022-12-15 08:08:35 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):369 - Records read: 2000 (2 MB)
2022-12-15 08:08:36 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):369 - Records read: 3000 (3 MB)
...
2022-12-15 08:09:53 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):369 - Records read: 101000 (78 MB)
2022-12-15 08:09:54 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):369 - Records read: 102000 (78 MB)
2022-12-15 08:51:00 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):369 - Records read: 103000 (79 MB)
2022-12-15 08:51:00 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):369 - Records read: 104000 (80 MB)
...
2022-12-15 08:51:13 INFO i.a.w.g.DefaultReplicationWorker(lambda$getReplicationRunnable$6):369 - Records read: 122000 (94 MB)
2022-12-15 08:51:13 source > Read 121000 records from organizations stream
2022-12-15 08:51:13 source > Finished syncing organizations
```

@BenoitHugonnard (Contributor)

The thing is, after 100k organizations (1000 pages of 100 organizations) Zendesk switches from 700 requests/min to 10 requests/min.

Airbyte chooses to retry after 2x the time Zendesk asks for.

So I have 120k organizations: very fast for the first 100k, then 1 min per 1k organizations, which would be 20 minutes, but since Airbyte waits twice as long as recommended, 40 minutes. Exactly what the logs said.

There is something about logs in Airbyte that I don't quite understand yet: they can freeze for a long time before showing, but the end result is the same.
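
A quick back-of-the-envelope check of that math (the page size, the 10 requests/min limit past page 1000, and the 2x backoff multiplier are taken from this thread, not measured independently):

```python
# Rough sanity check of the timing above; the page size, rate limit and
# backoff multiplier are assumptions taken from this thread, not measured.
total_records = 120_000
fast_records = 100_000                               # first 1000 pages at ~700 requests/min
slow_pages = (total_records - fast_records) / 100    # 200 pages of 100 records
minutes_at_limit = slow_pages / 10                   # 10 requests/min -> 20 minutes
minutes_with_double_backoff = 2 * minutes_at_limit   # Airbyte waits twice as long -> 40 minutes
print(minutes_with_double_backoff)                   # 40.0
```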

@emilmar (Author) commented Dec 15, 2022

Happy to hear it worked out for you! Here's a shortened version of my logs

```
2022-12-14 12:08:59 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$6):321 - Records read: 65000 (35 MB)
2022-12-14 12:08:59 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$6):321 - Records read: 66000 (36 MB)
2022-12-14 12:09:00 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$6):321 - Records read: 67000 (37 MB)
2022-12-14 12:09:01 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$6):321 - Records read: 68000 (37 MB)
2022-12-14 12:09:01 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$6):321 - Records read: 69000 (38 MB)
2022-12-14 13:22:04 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$6):321 - Records read: 70000 (38 MB)
2022-12-14 13:22:04 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$6):321 - Records read: 71000 (39 MB)
2022-12-14 13:22:05 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$6):321 - Records read: 72000 (39 MB)
...
2022-12-14 13:22:23 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$6):321 - Records read: 97000 (53 MB)
2022-12-14 13:22:23 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$6):321 - Records read: 98000 (54 MB)
2022-12-14 13:22:24 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$6):321 - Records read: 99000 (54 MB)
2022-12-14 13:22:25 INFO i.a.w.g.DefaultReplicationWorker(lambda$readFromSrcAndWriteToDstRunnable$6):321 - Records read: 100000 (55 MB)
2022-12-14 13:22:25 source > Read 100000 records from organizations stream
2022-12-14 13:22:25 source > Finished syncing organizations
2022-12-14 13:22:25 source > SourceZendeskSupport runtimes:
Syncing stream group_memberships 0:00:01.468627
Syncing stream groups 0:00:00.500665
```

As can be seen, the logs froze at 12:09:01, and did not update for over an hour. It also stopped the stream completely after 100k records were synced (see bottom rows), and I have 89k left.

> There is something about logs in Airbyte that I don't quite understand yet: they can freeze for a long time before showing, but the end result is the same.

I experienced the same thing!

@emilmar (Author) commented Dec 15, 2022

My next sync started the second after my first one finished, but it froze for over an hour before I cancelled it; I assume this is because I hit the organizations API limit in my previous sync. The logs never updated for over an hour either:

```
2022-12-14 13:24:28 destination > Storage Object airbyte-elt-bigquery-dataloading/zendesk/zendesk_organizations/2022/12/14/13/d5716793-e394-4c58-b59d-3197222937bd/ does not exist in bucket; creating...
2022-12-14 13:24:28 destination > Storage Object airbyte-elt-bigquery-dataloading/zendesk/zendesk_organizations/2022/12/14/13/d5716793-e394-4c58-b59d-3197222937bd/ has been created in bucket.
2022-12-14 13:24:28 destination > Preparing tmp tables in destination completed.
2022-12-14 14:45:59 INFO i.a.w.t.TemporalAttemptExecution(lambda$getCancellationChecker$6):231 - Running sync worker cancellation...
```

@BenoitHugonnard (Contributor)

Are you using full refresh or incremental refresh for organizations?

The issue I've been seeing here would be the use of the count endpoint. Airbyte asks Zendesk how many organizations there are, and Zendesk's answer can sometimes be inexact (in your example, Zendesk told Airbyte that there are only 100k organizations, so Airbyte stops after that). I haven't spent time understanding how we could fix this part 😞
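
As a hypothetical illustration of that failure mode (not the connector's real pagination code; fetch_page and PAGE_SIZE are stand-ins), compare pagination bounded by an up-front count with pagination that simply stops when a page comes back empty:

```python
# Hypothetical sketch: how an inexact up-front count truncates a stream.
# `fetch_page` stands in for whatever actually calls the Zendesk API.
from typing import Callable, Dict, Iterator, List

PAGE_SIZE = 100  # assumed page size


def read_bounded_by_count(fetch_page: Callable[[int], List[Dict]], reported_count: int) -> Iterator[Dict]:
    # If Zendesk reports 100,000 during a background recount, reading stops
    # at page 1000 even though more organizations exist.
    last_page = (reported_count + PAGE_SIZE - 1) // PAGE_SIZE
    for page in range(1, last_page + 1):
        yield from fetch_page(page)


def read_until_empty(fetch_page: Callable[[int], List[Dict]]) -> Iterator[Dict]:
    # Alternative: ignore the count and stop only on an empty page.
    page = 1
    while True:
        records = fetch_page(page)
        if not records:
            return
        yield from records
        page += 1
```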

For the initial problem of freezing at the beginning, I encourage you to upgrade your Airbyte version. I used to have the same issues with the previous scheduler, and they changed it during the summer.

@emilmar (Author) commented Dec 15, 2022

I am using incremental sync. Are you using full refresh?

Ah, you've gone deeper into the code than I have; I had no idea it started by asking for the count of records. According to the Zendesk API docs, it may occasionally return exactly 100k whenever there is a recount going on in the background; perhaps that was the case here. The Airbyte UI says it's using updated_at, but that's not true then (ref an earlier comment by you).

I am using Airbyte version 0.40.25.

So a workaround here is perhaps to have a separate daily sync that does a full refresh?

@BenoitHugonnard (Contributor)

I'm using incremental sync as well and am not experiencing problems as of now, but I haven't been using the new release for long enough.

@marcosmarxm (Member)

@emilmar can we close the issue or do you want to investigate further?

@emilmar (Author) commented Dec 15, 2022

@marcosmarxm The original problem description is no longer valid, but I do consider the behavior of freezing logs for an hour at a time and failing to sync more records at random intervals a bug.

We could close this issue and I'll make a new one if the workaround of having a separate sync for organizations does not work appropriately. I have yet to make it work in the same sync.

I know too little about Python development to investigate the issue further.
