Source Mixpanel: Sync failure with high volume #8427
Comments
Another user reported this bug on Slack.
I could not reproduce this bug because our Mixpanel test account has too little data. I also reduced the RAM available to WSL (Windows Subsystem for Linux) on my laptop and ran the sync again, with the same result. The next step is to create or import as much data as possible into the Mixpanel account, ideally more than 1 million records for a single stream, and then try to reproduce the issue again.
To reproduce the issue, 1.2 million records were imported into User Profiles (the engage stream) from a CSV file.
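Building a file of this size by hand is tedious, so a small script can help when trying to reproduce the issue at high volume. Below is a minimal Python sketch that writes synthetic user profiles to a CSV file. The function name `write_fake_profiles` and the column names (`distinct_id`, `name`, `email`, `plan`) are hypothetical placeholders, not Mixpanel's required import format; check Mixpanel's profile-import documentation for the headers it actually expects.

```python
import csv
import uuid


def write_fake_profiles(path: str, count: int) -> None:
    """Write `count` synthetic user-profile rows to a CSV file at `path`.

    The header names are hypothetical placeholders, not a confirmed
    Mixpanel import schema.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["distinct_id", "name", "email", "plan"])
        for i in range(count):
            # Rows are written one at a time, so memory use stays flat
            # no matter how large `count` is.
            writer.writerow([
                str(uuid.uuid4()),              # random unique profile ID
                f"User {i}",
                f"user{i}@example.com",
                "free" if i % 2 == 0 else "paid",  # alternate plans
            ])
```

Calling `write_fake_profiles("profiles.csv", 1_200_000)` would produce a file with the same number of records as the reproduction attempt above.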
@pranavhegde4 based on the latest comment, it sounds like this hasn't been reproducible. Can you confirm that the issue still persists?
@pranavhegde4 are there any updates? Can you confirm the problem still persists?
We had to limit the Mixpanel ingestion so that it queries only 1 day's worth of data at a time. This has worked so far because there is relatively little data in a single day. But I suspect the issue still persists (unless a code change has been made), and we will hit it if we try to ingest more than 1 day's worth of data at a time (maybe around 5 to 10 days).
Update: the issue is occurring again, since even 1 day's worth of data has now grown in volume, and we aren't able to ingest any data at all.
@pranavhegde4 can you please provide the latest logs?
This one:
@pranavhegde4 I could not find a memory bottleneck to fix. But regarding your issue, I've added
Environment
Current Behavior
When syncing a large number of rows from Mixpanel to BigQuery, the sync fails partway through.
Reducing the date window size can help alleviate this problem. However, I've noticed that even with a date window size of 2,
if the number of rows in that window exceeds 3 million, the same issue occurs.
I suspect it's due to OOM (out of memory), as there is a brief spike in the Python process's memory usage before it's killed.
However, the machine has 16 GB of RAM, and on average only around 6 GB was being used before the spike.
Another concern is that the option to configure the date window size was removed in Source Mixpanel 0.1.6, and it now defaults to 30 days.
Until now, the only workaround was to reduce the date window size, which is no longer possible.
As a result, OOM is bound to happen, since data will be synced 30 days at a time.
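The workaround discussed in this thread amounts to splitting the sync range into smaller date windows. Here is a minimal Python sketch of that slicing, assuming windows include both end dates and each window is fetched and processed separately. `date_windows` and `window_days` are hypothetical names used for illustration, not the connector's actual code or configuration key.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_windows(start: date, end: date, window_days: int) -> Iterator[Tuple[date, date]]:
    """Yield inclusive (window_start, window_end) pairs covering start..end.

    Each window spans at most `window_days` calendar days; the last window
    is truncated so it never extends past `end`.
    """
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=window_days - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)  # next window starts the day after
```

With `window_days=30`, a 30-day range becomes a single slice; with `window_days=1`, it becomes 30 smaller slices, which is the 1-day workaround described above.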
Expected Behavior
Sync should succeed regardless of the number of rows being synced in the date window.
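One way to make memory use independent of the row count is to stream records in fixed-size batches instead of accumulating a whole date window before writing. The Python sketch below assumes the data arrives as newline-delimited JSON that can be read line by line (for example from a file or a streamed HTTP response). `iter_batches` and `batch_size` are hypothetical names; this is an illustration, not the connector's actual implementation.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def iter_batches(lines: Iterable[str], batch_size: int = 1000) -> Iterator[List[Dict[str, Any]]]:
    """Parse newline-delimited JSON and yield lists of at most `batch_size` records.

    The function itself holds at most one batch of parsed records at a time,
    so its memory use depends on batch_size rather than on the total row count.
    """
    batch: List[Dict[str, Any]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        batch.append(json.loads(line))
        if len(batch) >= batch_size:
            yield batch
            batch = []  # start a fresh batch; the old one can be freed
    if batch:
        yield batch  # flush the final, possibly partial batch
```

The trade-off is more, smaller writes downstream; the caller must also avoid collecting all batches into one list, or the memory benefit is lost.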
Logs
LOG
Steps to Reproduce
Are you willing to submit a PR?
Yes, but it might take time as I am new to this.