
Trying to read a 14GB table fails without reading any rows. It seems to time out after 5 minutes while waiting for query results #8890

Closed
tuliren opened this issue Dec 18, 2021 · 28 comments · Fixed by #17236

Comments

@tuliren
Contributor

tuliren commented Dec 18, 2021

Environment

  • Is this your first time deploying Airbyte: No
  • OS Version / Instance: macOS 11.6.1
  • Memory / Disk: 16GB / 500GB SSD
  • Deployment: Docker
  • Airbyte Version: 0.34.0-alpha
  • Source name/version: MySQL 0.4.13 (0.5.0 and 0.5.1 fail the connection test)
  • Destination name/version: S3 0.1.16
  • Step: On sync
  • Description: Extracting MySQL tables to S3 as Parquet files with Snappy compression has worked for tables as large as 6GB so far, but a 14GB table fails without reading any rows; it seems to time out after 5 minutes while waiting for query results.

Problems

There are two issues:

  1. When using MySQL 0.5.x, the connection check fails:

    javax.net.ssl.SSLHandshakeException: No appropriate protocol (protocol is disabled or cipher suites are inappropriate)

  2. When using MySQL 0.4.13, the connection check passes. However, when the database is relatively large, data sync fails before any row is read.

Expected Behavior

Tell us what should happen.

Logs

Slack thread.

@tuliren tuliren added type/bug Something isn't working needs-triage area/connectors Connector related issues area/databases labels Dec 18, 2021
@tuliren tuliren self-assigned this Dec 18, 2021
@grishick grishick changed the title MySQL source cannot connect (0.5.x) or sync (0.4.13) data 🐛MySQL source cannot connect (0.5.x) or sync (0.4.13) data May 3, 2022
@grishick grishick changed the title 🐛MySQL source cannot connect (0.5.x) or sync (0.4.13) data Trying to read a 14GB table fails without reading any rows. It seems to time out after 5 minutes while waiting for query results Aug 26, 2022
@grishick
Contributor

Assigning to GL to test using the new e2e testing tool.

  1. Please create a data set (or a script that generates a data set, e.g. TPC-DI or TPC-DS)
  2. create a test scenario
  3. report on test results

@DoNotPanicUA
Contributor

The Simple sync scenario is enough to reproduce the issue.
We only need credentials for a MySQL source with a 14GB dataset and the S3 destination.

@grishick
Contributor

grishick commented Sep 7, 2022

@DoNotPanicUA @davinchia @evantahler I think this is a perfect use case for a perf test that should be run periodically to catch regressions. I think the right approach here would be to:

  1. Save a data set in S3/GCP as a db dump that can be loaded into MySQL
  2. create a GH action that spins up a MySQL DB, loads the dataset into it, runs the E2E tool, saves the results in GCS or S3 and emails them to DBS & DWs team

I am open to suggestions though, and if you think we should have a quick call to lay out the solution - let's do that.

@DoNotPanicUA
Contributor

DoNotPanicUA commented Sep 7, 2022

@grishick
small input from my side.
We are currently developing a docker version of the E2E tool and the integration with GH actions. So, right now, we can focus on the dataset preparation, etc.
Then we can easily configure a GH action with the latest version.

P.S. We have plans on the roadmap for the E2E tool to handle dataset management, but there is a lot of stuff that needs to be done before that :)

@davinchia
Contributor

@grishick Approach sounds sane to me.

Some thoughts:

  1. Instead of emailing, we can do Slack.
  2. We can eventually reuse this dump for more than just MySQL.

Sounds like we should wait for the E2E testing tool to be ready as per @DoNotPanicUA's comment to leverage that work.

@grishick
Contributor

grishick commented Sep 7, 2022

@DoNotPanicUA that's great.

@davinchia I agree that Slack is better.

Yes, I think we should let the GL team make the E2E tool into a Docker container, create a GH action, figure out how to spin up external resources (such as a source or a destination database instance), and figure out how to load test datasets; then we can have repeatable runs for data sets of various sizes and complexities.

@davinchia
Contributor

Great! Very excited!

@supertopher @git-phu @cpdeethree are around for any infra/tooling requests!

@suhomud
Contributor

suhomud commented Sep 15, 2022

@DoNotPanicUA @grishick, I think I'll open sub-tickets for the part related to the testing tool.

Regarding the issue itself, I'm on it. I found that if I use a local MySQL DB with a 16GB table, the sync passes successfully, so the issue happens only with the Amazon RDS instance. My assumption is that it has something to do with RDS configs. Investigating...

@grishick
Contributor

> @DoNotPanicUA @grishick, I think I'll open sub-tickets for the part related to the testing tool.
>
> Regarding the issue itself, I'm on it. I found that if I use a local MySQL DB with a 16GB table, the sync passes successfully, so the issue happens only with the Amazon RDS instance. My assumption is that it has something to do with RDS configs. Investigating...

Makes sense. There are also networking and RDS-specific configs involved.

@suhomud
Contributor

suhomud commented Sep 16, 2022

Some new input from my side. I was able to get the sync to pass with RDS MySQL and S3 by changing the PreparedStatement fetch size:
preparedStatement.setFetchSize(Integer.MIN_VALUE);
This allows replication to pass, but it means we are fetching one row at a time, which affects the sync time significantly. A finding from the MySQL docs:
ResultSet

By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate and, due to the design of the MySQL network protocol, is easier to implement. If you are working with ResultSets that have a large number of rows or large values and cannot allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.

To enable this functionality, create a Statement instance in the following manner:


stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,  java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
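
As a rough, self-contained illustration of this streaming pattern (not the connector's actual code; the JDBC URL, credentials, and table name below are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StreamingReadSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder connection details for a MySQL instance.
    try (Connection conn = DriverManager.getConnection(
            "jdbc:mysql://<host>:3306/<database>", "<user>", "<password>");
         PreparedStatement stmt = conn.prepareStatement(
            "SELECT * FROM <large_table>",
            ResultSet.TYPE_FORWARD_ONLY,
            ResultSet.CONCUR_READ_ONLY)) {
      // Integer.MIN_VALUE tells MySQL Connector/J to stream rows one at a time
      // instead of buffering the whole result set in client memory.
      stmt.setFetchSize(Integer.MIN_VALUE);
      try (ResultSet rs = stmt.executeQuery()) {
        long rows = 0;
        while (rs.next()) {
          rows++; // emit each record downstream here
        }
        System.out.println("Read " + rows + " rows");
      }
    }
  }
}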

I have an idea: check the local sync with resources similar to Airbyte Cloud. If it is a resource issue, then we can solve it by increasing the connector resources.

@grishick
Contributor

@tuliren @suhomud my understanding is that fetch size is set dynamically based on an estimate of the average row size. So, what I wonder is:

  • Why doesn't dynamic fetch size work in this scenario, and how can we make it work?
  • How large is a single row, and what is the distribution of row sizes in this table?
  • How does the behavior of this scenario change when we increase the memory allocation for the source container to 1GB, 1.5GB, 2GB?
  • Is the data set in the local test exactly the same as the data set in the RDS test?
  • If the problem is that moving a result set over the network takes more than 5 minutes, is there a timeout setting we can adjust?

@suhomud
Contributor

suhomud commented Sep 19, 2022

@grishick
I made a local DB with a table of one million records and a size of 14344 MB, so each record is roughly 14 kB. I set the same resource quota as we have in the Airbyte prod env:

JOB_MAIN_CONTAINER_CPU_REQUEST=2
JOB_MAIN_CONTAINER_CPU_LIMIT=2
JOB_MAIN_CONTAINER_MEMORY_REQUEST="200Mi"
JOB_MAIN_CONTAINER_MEMORY_LIMIT="200Mi"

NORMALIZATION_JOB_MAIN_CONTAINER_MEMORY_LIMIT: "1.5Gi"
NORMALIZATION_JOB_MAIN_CONTAINER_MEMORY_REQUEST: "1.5Gi"

Are these the right values for the prod env? Do we have a separate memory quota for MySQL?

The local sync failed. It looks like the fetch size has been ignored and the select query tries to fetch everything into memory.
There are two things I will test:

  • stmt.setFetchSize(Integer.MIN_VALUE) disables client-side caching. What I'm going to check is the sync performance impact
  • Also, I'll test increasing the resource quota. I assume this will also work, but it can be an issue with tables larger than 14GB

@suhomud
Contributor

suhomud commented Sep 19, 2022

@grishick @tuliren ,
Let me summarise what I found.

MySQL JDBC driver does not respect fetch size ⛔️

This part is ignored by the MySQL driver:
preparedStatement.setFetchSize(FetchSizeConstants.INITIAL_SAMPLE_SIZE);
https://github.com/airbytehq/airbyte/blob/master/airbyte-db/db-lib/src/main/java/io/airbyte/db/jdbc/streaming/AdaptiveStreamingQueryConfig.java#L36
The MySQL connector tries to fetch all table records into memory.

JOB_MAIN_CONTAINER_MEMORY_REQUEST="500Mi"
JOB_MAIN_CONTAINER_MEMORY_LIMIT="500Mi"
fetchSize = 10

fetch_size_10_500MB.txt

Increasing MySQL connector memory does not solve the issue ⛔️
I tried increasing the connector memory to 1 GB, but it only increased the waiting time before the sync failed. My guess is that because we are using the JDBC param ?useCursorFetch=true here, we stress the RDS MySQL DB, since it tells the server to use server-side cursors; however, the fetch size is still ignored 🙉

JOB_MAIN_CONTAINER_MEMORY_REQUEST="1Gi"
JOB_MAIN_CONTAINER_MEMORY_LIMIT="1Gi"
fetchSize = 10

fetch_size_10_1GB.txt
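
For reference, ?useCursorFetch=true is a JDBC URL parameter; with placeholder host and database it looks roughly like:

jdbc:mysql://<host>:3306/<database>?useCursorFetch=true

When it is set, the driver asks the server for a server-side cursor, which can materialize the result set on the server side - consistent with the RDS disk usage observed later in this thread.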

The destination S3 connector fails with OutOfMemoryError ⛔️

I changed the fetch size to preparedStatement.setFetchSize(Integer.MIN_VALUE). This solved the mysql-source memory issue, but destination-S3 failed with an OutOfMemoryError while consuming 200 MB of data

JOB_MAIN_CONTAINER_MEMORY_REQUEST="500Mi"
JOB_MAIN_CONTAINER_MEMORY_LIMIT="500Mi"
fetchSize = Integer.MIN_VALUE

fetch_size_Integer.MIN_VALUE_500MB.txt

Working sync ✅🥇
The sync was successful with the following changes:

JOB_MAIN_CONTAINER_MEMORY_REQUEST="700Mi"
JOB_MAIN_CONTAINER_MEMORY_LIMIT="700Mi"
fetchSize = Integer.MIN_VALUE

fetch_size_Integer.MIN_VALUE_700MB.txt

My proposal would be:

  • Add a custom MySqlStreamingQueryConfig that extends AdaptiveStreamingQueryConfig and overrides the initialize method to use preparedStatement.setFetchSize(Integer.MIN_VALUE) for MySQL only (see the sketch after this list).
  • Set a 700Mi resource quota for the MySQL source and S3 destination.
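
A minimal sketch of what the first bullet could look like, assuming the parent class exposes an initialize hook that receives the statement (the exact class layout and signature in the Airbyte codebase may differ):

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// MySQL-only streaming config: always stream rows one at a time instead of
// adapting the fetch size, since the driver was observed to ignore positive fetch sizes.
public class MySqlStreamingQueryConfig extends AdaptiveStreamingQueryConfig {

  @Override
  public void initialize(final Connection connection, final Statement statement) throws SQLException {
    // Integer.MIN_VALUE switches MySQL Connector/J into row-by-row streaming mode.
    statement.setFetchSize(Integer.MIN_VALUE);
  }
}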

Please let me know your thoughts/ideas 🤔

@DoNotPanicUA
Contributor

Please note that preparedStatement.setFetchSize(Integer.MIN_VALUE) will affect the source performance.

@suhomud
Contributor

suhomud commented Sep 19, 2022

> Please note that preparedStatement.setFetchSize(Integer.MIN_VALUE) will affect the source performance.

@DoNotPanicUA I'm not sure about previous metrics. The one I found is this.
In my case the sync took about 1.3 hours for a 14GB table with 8 columns.
One finding from Stack Overflow:
setFetchSize(Integer.MIN_VALUE) is not as bad as the docs might imply. It actually just disables client-side caching of the entire response and gives you responses as they arrive; there is no round-trip per row required.

@DoNotPanicUA
Contributor

If we stop using caching somewhere, it will work less efficiently. The question is how dramatic it will be.
To figure that out, we need to compare two runs with different FetchSize values. May I ask you to run them, as you already did before?

@suhomud
Contributor

suhomud commented Sep 19, 2022

> If we stop using caching somewhere, it will work less efficiently. The question is how dramatic it will be.
> To figure that out, we need to compare two runs with different FetchSize values. May I ask you to run them, as you already did before?

I ran it first without any connector limitations, so it was the Docker limit in my case, 8 GB for each connector. Also, the DB was local, where the CPU is much better than on RDS. I noticed that when I use the default fetch size (10), it stresses the CPU a lot because of ?useCursorFetch=true. Let me try to find the connector resources that will be enough for the sync; the next step will be changing RDS resources to handle select * from table with ?useCursorFetch=true.

@suhomud
Contributor

suhomud commented Sep 20, 2022

Benchmark

  1. PreparedStatement FetchSize Integer.MIN_VALUE Local MySql DB instance

Database: Local MySql DB instance
Connectors: Mysql Source -> S3 Destination 700MB memory limit
Table: 12 - 14 GB, 1 000 000 records
PreparedStatement FetchSize: Integer.MIN_VALUE
Result: 32m
Screenshot 2022-09-20 at 14 45 30

Note: This approach does not load the DB CPU or disk

  2. PreparedStatement FetchSize 10 Local Mysql DB instance
    Database: Local MySql DB instance
    Connectors: Mysql Source -> S3 Destination 1GB memory limit
    Table: 12 - 14 GB, 1 000 000 records
    PreparedStatement FetchSize: 10
    Result: 37m

local_mysql_db_fetchSize_10

Note: this approach loads the CPU and disk of the local MySQL DB

  3. PreparedStatement FetchSize Integer.MIN_VALUE RDS MySql instance

Database: MySql RDS instance
Connectors: Mysql Source -> S3 Destination 700MB memory limit
Table: 12 - 14 GB, 1 000 000 records
PreparedStatement FetchSize: Integer.MIN_VALUE
Result: 1 h 33m
mysql_RDS_DB

  4. PreparedStatement FetchSize 10 RDS Mysql DB
    Database: MySql RDS instance
    Connectors: Mysql Source -> S3 Destination 1GB memory limit
    Table: 12 - 14 GB, 1 000 000 records
    PreparedStatement FetchSize: 10
    Result: The sync failed with "The last packet successfully received from the server was 300,800 milliseconds ago"
    I tried with two RDS DB types:

rsd_instance1

and:

Screenshot 2022-09-20 at 12 43 04

I also changed the default RDS MySQL configs with a custom Parameter Group to override different timeout params like wait_timeout and net_read_timeout, but was not able to complete the sync successfully.

Summary

  • The FetchSize Integer.MIN_VALUE sync was faster than FetchSize=10 (32m vs 37m) with the local MySQL DB instance.
  • With FetchSize 10, clients must deal with DB configuration to be able to make a successful sync, because this approach uses a DB cursor (?useCursorFetch=true) that consumes DB disk space; in my case it was nearly 12 GB for a 12 GB table

RDS_DB_select_query_affect_memory

I think we should definitely go with FetchSize Integer.MIN_VALUE because performance did not change and maybe even improved.
@DoNotPanicUA @grishick, please let me know your opinion on this

@DoNotPanicUA
Contributor

@suhomud, thank you for such a stunning report 👍
It looks like the cache stores the result data locally to send it in chunks. When you run the source locally (near Airbyte), we get a boost because network traffic is minimal and we skip storing data in cache files.
So, we can use FetchSize Integer.MIN_VALUE safely if we consider that user sources are not far from their Airbyte instances.

@grishick
Contributor

These are interesting findings. Most of the time Airbyte will not be colocated with the database, and it is never colocated in the case of Airbyte Cloud, so the defaults should optimize for a remote DB. I would like to understand more about the failure in the 4th scenario (remote DB and FetchSize = 10).

  • Could you capture the CPU and memory utilization graph from RDS during the sync job?
  • Could you post the entire stack trace with "The last packet successfully received from the server was 300,800 milliseconds ago" error?
  • Could you also document the values for wait_timeout and net_read_timeout?

cc @davinchia @evantahler @bleonard tagging you here, because you expressed interest in the project

@suhomud
Contributor

suhomud commented Sep 26, 2022

@grishick,

  1. The resources.

Before the sync:

Screenshot 2022-09-26 at 12.29.29.png

After the sync:

Screenshot 2022-09-26 at 12.49.35.png

CPU:
Screenshot 2022-09-26 at 12.38.48.png
Memory RAM:
Screenshot 2022-09-26 at 12.41.27.png
NOTE: After the sync failed and was canceled, the memory was not released; only rebooting helped
Memory HD:
Screenshot 2022-09-26 at 12.52.35.png

  2. Log file

  3. Here is the list of params that I changed for RDS:

connect_timeout = 1200
innodb_lock_wait_timeout = 1200
interactive_timeout = 1200
net_read_timeout = 1200
wait_timeout = 1200

And RDS Instance is:
Screenshot 2022-09-26 at 12.48.24.png

Do you have an idea what else can be changed for RDS?

cc @DoNotPanicUA @evantahler @bleonard

@suhomud
Contributor

suhomud commented Sep 26, 2022

Update: I was able to start the sync successfully with fetch size 10 by increasing the RDS instance disk from 50 GB to 65 GB. Let me finish the sync and I'll post the result.

@suhomud
Contributor

suhomud commented Sep 26, 2022

So it seems like the extra disk space partially solved the issue. The sync is still in progress: in 5 hours it has fetched only 2 GB.
With FetchSize Integer.MIN_VALUE the full sync was done in 1.5 hours

@grishick
Contributor

@suhomud AFAIK Integer.MIN_VALUE is a negative number - what does it mean for a MySQL JDBC driver to set fetch size as a negative number? My guess is that this setting is equivalent to not setting fetch size at all or telling MySQL to automatically set it. @tuliren do you have any insight on this?

@grishick
Contributor

@suhomud never mind my question - I see you have already answered it previously with a quote from the MySQL docs

@grishick
Contributor

Looking at the test results above, I am leaning towards agreeing with the conclusion that we have to set the fetch size to Integer.MIN_VALUE for MySQL and not attempt to set it dynamically.

@suhomud
Contributor

suhomud commented Sep 27, 2022

@grishick I made a draft PR with the required change: #17236
I also asked @alexandr-shegeda to run a test sync with this change, since I'm located in the Asia region now and maybe that affects Amazon availability for me.

@suhomud
Contributor

suhomud commented Sep 28, 2022

Hi @grishick, the PR is ready for review.
We ran additional tests. I did not notice a big difference in fetch time between the two approaches. However, fetch size 10 was really slow to start the sync and loads RDS disk and memory. With Integer.MIN_VALUE the sync starts immediately for big tables and does not affect the DB.
