
Incomplete data transfer when pulling over a large table #111

Open
munro opened this issue Jun 23, 2016 · 8 comments

Comments


munro commented Jun 23, 2016

create table foo as select * from mysql_remote.foo;

Ran this, and it returned successfully, but the pull was incomplete: it pulled down 1,488,082 rows from the original 7,958,064-row table. Just for context, this table stores message data, around 3 KiB in each column. MySQL says the table is 32.6 GiB (not sure if that's with indexes).

Here's the PG table size:
[screenshot: PG table size, 2016-06-23 10:43 am]

@ahsanhadi

Can you please share your test case, including your DDL and perhaps some sample data, to help us reproduce the issue? Can you also do a select count(*) on both the remote and local tables after you have run the following statement?

create table foo as select * from mysql_remote.foo;


munro commented Jul 22, 2016

@ahsanhadi Sure! Yeah, it's really on me to make a reproducible failing test, isn't it? 😝 It may very well just be the MySQL version I'm using.

@ahsanhadi

@munro Yes, if you can share a sample test case, it will help us narrow down and investigate the issue.


munro commented Oct 10, 2016

Here's the structure of the MySQL table [1]; it's running on AWS RDS, MySQL InnoDB 5.6.21. I know this still leaves you in the dark. My next step to try to reproduce this is to make a table with lots of rows and a longtext field containing lots of data (similar to the statistics in that screenshot), and see whether mysql_fdw successfully pulls it all in. The sheer number of rows and the amount of data in the longtext are what I assume to be causing the failure.

[1]

CREATE TABLE `foo` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `created_on` datetime NOT NULL,
  `updated_on` datetime NOT NULL,
  `uuid` char(32) NOT NULL,
  `foo_type_id` int(11) NOT NULL,
  `from_email` varchar(255) NOT NULL,
  `subject` varchar(255) NOT NULL,
  `raw_email` varchar(200) NOT NULL,
  `plaintext` longtext NOT NULL,
  `status` int(11) NOT NULL,
  `test_group` varchar(25) DEFAULT NULL,
  `unique_id` varchar(255) NOT NULL,
  `from_name` varchar(70) NOT NULL,
  `group_id` int(11),
  `message_id` varchar(150),
  `campaign_id` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uuid` (`uuid`),
  UNIQUE KEY `message_id` (`message_id`)
) ENGINE=InnoDB AUTO_INCREMENT=11370857 DEFAULT CHARSET=utf8;

@wwindcloud

I've had a similar problem lately and found out that if a column in MySQL contains a huge text value, mysql_fdw simply ignores that row and the rows that follow it. After some tracing, the problem can be remedied by changing MAXDATALEN in mysql_fdw.h to a larger value.
Obviously this is not a nice solution; I wonder whether it could be solved generally by dynamically increasing MAXDATALEN and checking whether MYSQL_DATA_TRUNCATED is returned.
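To illustrate what I mean, here's a minimal sketch of that dynamic approach. This is not a patch against mysql_fdw, just the plain libmysqlclient prepared-statement API, and the helper name and the bind/num_fields arguments are hypothetical. The idea: when mysql_stmt_fetch() reports truncation, each bound column still carries its real length, so the buffer can be grown and the column re-read with mysql_stmt_fetch_column() instead of silently losing the row.

#include <stdlib.h>
#include <mysql.h>

/*
 * Hypothetical helper: call after mysql_stmt_fetch() returns
 * MYSQL_DATA_TRUNCATED. Assumes each bind[i].length was pointed at a
 * per-column length variable before mysql_stmt_bind_result().
 */
static int
refetch_truncated_columns(MYSQL_STMT *stmt, MYSQL_BIND *bind,
                          unsigned int num_fields)
{
    unsigned int i;

    for (i = 0; i < num_fields; i++)
    {
        /* *bind[i].length is the full value size reported by the server,
         * even when the fetch truncated the value to buffer_length */
        if (*bind[i].length > bind[i].buffer_length)
        {
            unsigned long needed = *bind[i].length;
            void *buf = realloc(bind[i].buffer, needed);

            if (buf == NULL)
                return -1;

            bind[i].buffer = buf;
            bind[i].buffer_length = needed;

            /* re-read just this column into the enlarged buffer */
            if (mysql_stmt_fetch_column(stmt, &bind[i], i, 0) != 0)
                return -1;
        }
    }
    return 0;
}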


eMerzh commented Jul 12, 2018

Yes, I got the same issue with mediumtext, where rows get forgotten.
Really annoying behavior, because it's silent: I retrieved 6,000 rows instead of 80,000.
Is there at least a way to replace the dropped row with an error or something?


lifeboy commented Mar 5, 2019

Ah! I thought I was alone with this problem, but it seems not :-)
wwindcloud, how were you able to trace this?

@wwindcloud

> Ah! I thought I was alone with this problem, but it seems not :-)
> wwindcloud, how were you able to trace this?

Yeah, it's really annoying. I fixed it by increasing MAXDATALEN; not nice, but workable. I dug through the source code and noticed a comment in mysqlIterateForeignScan in mysql_fdw.c: it recognizes MYSQL_DATA_TRUNCATED but doesn't really handle it properly. The comment says this error will be reported via the MYSQL_REPORT_DATA_TRUNCATION option, but I'm not sure how that works.
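For reference, that option is set through mysql_options() in the plain C client API. Here's a sketch (not mysql_fdw's actual code; the wrapper function is just illustrative). Note it only makes the client library report truncation through the return value of mysql_stmt_fetch(); the caller still has to handle it.

#include <mysql.h>

static MYSQL *
connect_with_truncation_reporting(void)
{
    MYSQL *conn = mysql_init(NULL);

    /* my_bool on pre-8.0 client libraries, plain bool on 8.0+ */
    my_bool report_truncation = 1;

    if (conn == NULL)
        return NULL;

    /* with this enabled, mysql_stmt_fetch() returns MYSQL_DATA_TRUNCATED
     * whenever a bound result buffer is too small for the value */
    mysql_options(conn, MYSQL_REPORT_DATA_TRUNCATION, &report_truncation);

    /* ... mysql_real_connect(conn, ...) would follow here ... */
    return conn;
}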

Hopefully someone with knowledge of both the MySQL and PostgreSQL libraries will submit a pull request for this.
