Source Zendesk Supports: fails to sync over 100k organization records. #14253
Comments
@marcosmarxm I think you may have mistyped another issue, the link you provided goes to this issue. |
thanks, linked to correct issue. |
How can we fix this? I'm willing to look into it. |
Hi @BenoitHugonnard, I'm not competent enough to fix this, but I'm your biggest supporter in making it work. What you describe as the probable cause is interesting. Could it perhaps be reported to Zendesk too? |
Found a very quick fix: the _retry was never called because the response always evaluated to False when catching a 429. |
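One plausible reading of that finding, assuming the connector handles `requests.Response` objects: `requests` defines the truthiness of a response as its `ok` property, so any 4xx or 5xx response is falsy, and a check written as `if response:` skips the retry branch on a 429. The sketch below uses a hypothetical stand-in class, `FakeResponse`, that mimics this truthiness rule so it runs without the library. The names `should_retry_buggy` and `should_retry_fixed` are illustrative only and are not the connector's code.

```python
class FakeResponse:
    """Stand-in mimicking how requests.Response defines truthiness."""

    def __init__(self, status_code):
        self.status_code = status_code

    @property
    def ok(self):
        # requests treats any status below 400 as "ok".
        return self.status_code < 400

    def __bool__(self):
        # requests.Response.__bool__ returns self.ok.
        return self.ok


def should_retry_buggy(response):
    # Truthiness check: a 429 response is falsy, so this never fires.
    return bool(response and response.status_code == 429)


def should_retry_fixed(response):
    # Identity check: only a missing response is skipped.
    return response is not None and response.status_code == 429


rate_limited = FakeResponse(429)
print(should_retry_buggy(rate_limited))  # False
print(should_retry_fixed(rate_limited))  # True
```

Comparing against `None` rather than relying on truthiness is the usual way to tell "no response at all" apart from "an error response that should be retried".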
Oh my! Awesome, so response is False when it's 429? I guess I could change it locally and it would work... I'm just surprised nobody else has encountered this issue so far. Is it due to us not having extended API limits? I mean, 100k organizations is not that rare... right? ;) Seems like the steps are few... probably a lot of boilerplate needed though! |
Yep response was false on 429. |
So every time the sync runs, even with this fix, it will be slow? |
Yes, but at least it will work and not exit! The slowness is on Zendesk's side: we are limited to 10 requests per minute when we query pages beyond 1000 (100k orgs). |
@BenoitHugonnard can you send me a message in Slack so I can help you submit a PR? |
@BenoitHugonnard Thanks for releasing a PR. I'm running it now to confirm the change is working as expected. Is it working on your end? So far the Airbyte logs have not updated for 40 minutes, stuck at 70k records, but I can see in the Zendesk API monitor that Airbyte is making requests. EDIT: It finished after 1 hour! The Airbyte logging froze for 80 minutes, and it did not sync a single organization record above 100k. My next sync is already freezing too. Not so sure I'm ready to close this issue as fixed. |
I'm pausing the sync for now, it caused trouble. Curious to hear your experience with it. |
@BenoitHugonnard it would be nice to hear your feedback here. |
On my side it did freeze indeed but it worked 🤔
|
The thing is, after 100k organizations (1000 pages of 100 organizations) Zendesk switches from 700 requests/min to 10 requests/min, and Airbyte chooses to wait twice as long as Zendesk asks before retrying. I have 120k organizations, so the first 100k are very fast and the rest take 1 minute per 1k organizations. That would be 20 minutes, but since Airbyte waits twice as long as recommended, it takes 40 minutes, exactly what the logs said. There is something about the logs in Airbyte that I don't quite understand yet: they can freeze for a long time before showing, but the end result is the same. |
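The arithmetic in the comment above can be checked with a small sketch. It assumes the figures quoted in this thread (100 records per page, 700 requests/min for the first 1000 pages, 10 requests/min afterwards, and a wait twice as long as requested). The function name `estimate_sync_minutes` and its parameters are made up for illustration.

```python
import math


def estimate_sync_minutes(total_records, page_size=100, fast_pages=1000,
                          fast_rate=700, slow_rate=10, wait_multiplier=2.0):
    """Return (fast_minutes, slow_minutes) for a paginated sync."""
    total_pages = math.ceil(total_records / page_size)
    pages_fast = min(total_pages, fast_pages)
    pages_slow = total_pages - pages_fast
    # Pages within the fast window are limited only by the normal rate.
    fast_minutes = pages_fast / fast_rate
    # Throttled pages take longer still because of the doubled wait.
    slow_minutes = pages_slow / slow_rate * wait_multiplier
    return fast_minutes, slow_minutes


fast, slow = estimate_sync_minutes(120_000)
print(f"fast phase: {fast:.1f} min, throttled phase: {slow:.1f} min")
# fast phase: 1.4 min, throttled phase: 40.0 min
```

With 120k organizations, the last 200 pages account for almost the entire run time, which matches the 40 minutes reported in the logs.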
Happy to hear it worked out for you! Here's a shortened version of my logs
As can be seen, the logs froze at 12:09:01, and did not update for over an hour. It also stopped the stream completely after 100k records were synced (see bottom rows), and I have 89k left.
I experienced the same thing! |
My next sync started the second after my first one finished, but it froze for over an hour before I cancelled it, presumably because I had reached the API limit for organizations in my previous sync. The logs did not update for over an hour either.
|
Are you using full refresh or incremental refresh for organizations? The problems I've been seeing around this issue seem to come from the use of the count endpoint. Airbyte asks Zendesk how many organizations there are, and Zendesk answers with a number that is sometimes not exact (in your example, Zendesk told Airbyte that there are only 100k organizations, so Airbyte stops after that). I haven't spent time understanding how we could fix this part 😞 For the initial problem of freezing at the beginning, I encourage you to upgrade your Airbyte version. I used to have the same issues with the previous scheduler, and they changed it during the summer. |
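To illustrate the failure mode described above, here is a minimal sketch that contrasts a reader bounded by a reported count with one that keeps paging until an empty page comes back. Everything in it is hypothetical (`make_pages`, `read_by_count`, `read_until_empty`, and the fake in-memory API). It does not reflect the connector's actual code and is not a proposed fix.

```python
def make_pages(total_records, page_size=100):
    """Build a fake paged API; page numbers start at 1."""
    def get_page(page):
        start = (page - 1) * page_size
        end = min(start + page_size, total_records)
        return list(range(start, end))
    return get_page


def read_by_count(get_page, reported_count):
    # Stops once the reported count is reached, even if more records exist.
    records = []
    page = 1
    while len(records) < reported_count:
        batch = get_page(page)
        if not batch:
            break
        records.extend(batch)
        page += 1
    return records


def read_until_empty(get_page):
    # Ignores the count and stops only when a page comes back empty.
    records = []
    page = 1
    while True:
        batch = get_page(page)
        if not batch:
            return records
        records.extend(batch)
        page += 1


get_page = make_pages(120_000)
print(len(read_by_count(get_page, 100_000)))  # 100000
print(len(read_until_empty(get_page)))        # 120000
```

If the count endpoint reports 100k while 120k records exist, the count-bounded reader silently drops the last 20k, which is consistent with the sync stopping at exactly 100k.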
I am using incremental sync. Are you using the full refresh? Ah, you've gone deeper into the code than I have; I had no idea it started by asking for a count of records. According to the Zendesk API docs, it may occasionally return exactly 100k whenever a recount is going on in the background, so perhaps that was the case here. The Airbyte UI says it's using update_at, but that's not true then (ref an earlier comment by you). I am using Airbyte version 0.40.25. So perhaps a workaround here is to have a separate daily sync that is a full refresh? |
I'm using an incremental sync as well and not experiencing problems as of now but I haven't been using the new release for long enough. |
@emilmar can we close the issue or do you want to investigate further? |
@marcosmarxm The original problem description is no longer valid, but I do consider the behavior of freezing logs for an hour at a time and failing to sync more records at random intervals a bug. We could close this issue, and I'll make a new one if the workaround of having a separate sync for organizations does not work appropriately. I have yet to make it work in the same sync. I know too little about Python development to investigate the issue further. |
First off - thanks for making a great product which has solved 90% of my headaches!
Environment
Current Behavior
Syncing more than 100k organization records triggers a 429 (rate limit) response:
Response Code: 429, Response Text: {"detail":"Rate limited by Zendesk edge protection","status":"429","title":"Automated response"}
Expected Behavior
When reaching the rate limit, it should back off/slow down the requests and continue to sync the full dataset.
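A minimal sketch of the expected behavior, assuming the server signals how long to wait through a `Retry-After` header expressed in seconds. The callable `fetch_page` and the function `fetch_with_backoff` are hypothetical and stand in for whatever HTTP layer the connector uses. The simulated responses at the bottom exist only so the example runs without network access.

```python
import time


def fetch_with_backoff(fetch_page, max_retries=5, default_wait=60.0,
                       sleep=time.sleep):
    """Call fetch_page() until it stops returning 429 or retries run out.

    fetch_page is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(max_retries + 1):
        status, headers, body = fetch_page()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # Assumption: Retry-After holds a number of seconds when present.
        wait = float(headers.get("Retry-After", default_wait))
        sleep(wait)
    raise RuntimeError("still rate limited after retries")


# Simulated server: two rate-limited responses, then success.
responses = iter([
    (429, {"Retry-After": "2"}, None),
    (429, {}, None),
    (200, {}, {"organizations": []}),
])
waits = []
status, body = fetch_with_backoff(lambda: next(responses), sleep=waits.append)
print(status, waits)  # 200 [2.0, 60.0]
```

Passing `sleep` as a parameter keeps the retry logic testable without real delays. In production the default `time.sleep` would be used.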
Logs
logs-37.txt
Steps to Reproduce
Are you willing to submit a PR?
No