This repository has been archived by the owner on Oct 17, 2023. It is now read-only.

Adaptor: Add MySQL #515

Merged
merged 101 commits on May 26, 2022

Conversation

@atomicules (Collaborator) commented Apr 21, 2022

I think this is at the point now where it's ready for review even if we don't merge yet.

There is obviously a LOT to review, so I'll go over the main things worth looking at and considering:

  • This MySQL adaptor was built using the Postgresql one as a basis.
  • You can follow the commit history to see how it was developed. Basically I copied the basic files over from Postgresql (client.go, postgresql.go, reader.go, session.go, writer.go, with postgresql.go obviously renamed to mysql.go) and then added a test file at a time, using the tests to help me develop each file. I started with client first as that was easiest.
  • Does it make sense that we have all this special handling in reader_test.go whereas Postgresql doesn't? I'm undecided. On the one hand I think this shouldn't be required, as reader.go should do whatever is necessary; on the other hand we don't necessarily specify how we are storing things at the Transporter level (i.e. the compatibility layer between all adaptors, Golang native types). Maybe that should be in casifyValue instead? This could do with another pair of eyes. In writing that comment I talked myself into changing things (see c3fabd2), but I still welcome someone else's opinion on this.
  • It's worth looking through all the comments to see if they give further things to think about. There are a lot of comments in tailer.go.
  • Is any of the resumemap stuff relevant? To Postgresql either?
  • Uses go-mysql-org/go-mysql as the driver over go-sql-driver/mysql because the former supports replication. It currently references a specific commit rather than a release tag, as TLS support for the driver has only just been merged in.
  • There are some TODOs left in the code, but I don't think these are things that need fixing right now; they could be followed up later. But you might disagree. Most are in tailer.go.
  • Performance is probably similar to Postgresql, i.e. we aren't doing any bulk inserts yet.
  • It has been tested with this test dataset, copying between two DBaaS providers' MySQL instances using TLS connections, and with tailing.
  • It runs SET FOREIGN_KEY_CHECKS=0; at the start of a session on the sink (see the sketch after this list).
  • It requires the connecting user to be able to read the binlog for tailing. Some DBaaS providers might not allow this right now.
  • It's only been tested with a MySQL sink and source; I'm unsure how it functions with other DB types. I don't think that should stop merging, but I could perhaps add a caveat to the mysql README about that.
  • Regarding handling binlog rotation, testing has shown that the replication/binlogsyncer handles this.
  • Per Postgresql (and MongoDB?) you need to create the destination database and table structure first. We could handle this in Transporter, but per the other adaptors it probably makes sense to handle that somewhere else, e.g. code that references Transporter.
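For illustration, here is a minimal sketch of the sink session setup mentioned above, assuming a plain database/sql handle (the adaptor's actual session code may differ):

```
import "database/sql"

// Sketch only: disable foreign key checks for this session on the sink
// so rows can be copied in any order without tripping constraints.
func prepareSinkSession(db *sql.DB) error {
	_, err := db.Exec("SET FOREIGN_KEY_CHECKS=0;")
	return err
}
```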

Fixes: #76

Required for all PRs:

  • CHANGELOG.md updated (feel free to wait until changes have been reviewed by a maintainer)
  • README.md updated (if needed)

- Copy files, don't bother with tests except client for now
- Change names from Postgresql, pq, etc to MySQL, etc
- Use github.com/go-sql-driver/mysql
- Don't bother with tailing for now
- Move the `casifyValue` function from tailer.go to reader.go since it seems it's needed even without tailing.

It "builds", but the tests currently fail with the below:

```
$ go117 test
--- FAIL: TestConnect (0.03s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x11078d3]

goroutine 7 [running]:
testing.tRunner.func1.2({0x123b0a0, 0x143e550})
        /opt/pkg/go117/src/testing/testing.go:1209 +0x24e
testing.tRunner.func1()
        /opt/pkg/go117/src/testing/testing.go:1212 +0x218
panic({0x123b0a0, 0x143e550})
        /opt/pkg/go117/src/runtime/panic.go:1038 +0x215
database/sql.(*DB).conn(0x0, {0x12ca330, 0xc00001e098}, 0x1)
        /opt/pkg/go117/src/database/sql/sql.go:1260 +0x53
database/sql.(*DB).PingContext(0x1278b3b, {0x12ca330, 0xc00001e098})
        /opt/pkg/go117/src/database/sql/sql.go:853 +0x7a
database/sql.(*DB).Ping(...)
        /opt/pkg/go117/src/database/sql/sql.go:874
github.com/compose/transporter/adaptor/mysql.(*Client).Connect(0x143f8e0)
        /Users/simon/Code/go/src/github.com/compose/transporter/adaptor/mysql/client.go:76 +0xdc
github.com/compose/transporter/adaptor/mysql.TestConnect(0xc000111380)
        /Users/simon/Code/go/src/github.com/compose/transporter/adaptor/mysql/client_test.go:76 +0xfe
testing.tRunner(0xc000111380, 0x1285c58)
        /opt/pkg/go117/src/testing/testing.go:1259 +0x102
created by testing.(*T).Run
        /opt/pkg/go117/src/testing/testing.go:1306 +0x35a
exit status 2
FAIL    github.com/compose/transporter/adaptor/mysql    0.832s
```

However... that is to be expected. I haven't altered anything else to be MySQL specific and don't have a local instance of MySQL running.
- Use the `test` db instead of `mysql`. Perhaps just temporarily, but the
default local install I have has `test` available and accessible to the
standard user, whereas `mysql` is protected
- Don't completely ignore the error with `sql.Open`, especially whilst
developing
- Use a minimal default URI of `/`
- Since `mysql://` is not valid in a DSN, add this in at the
`url.Parse` phase.
Just `/` worked for the client test, but with the reader tests we'll
run into issues like this:

```
panic: default addr for network '127.0.0.1' unknown
```

That happens when using 127.0.0.1 or `localhost` without specifying the
protocol. And if we are specifying it for the reader tests we might as
well do it for the default.
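A minimal sketch of the idea (a hypothetical helper, not the adaptor's actual code), assuming the go-sql-driver/mysql DSN format:

```
import (
	"fmt"
	"net/url"
)

// Hypothetical helper: url.Parse needs the mysql:// scheme, while the
// driver wants a bare DSN with the protocol wrapped around the address;
// without tcp(...) the driver panics with:
//   default addr for network '127.0.0.1' unknown
func buildDSN(raw string) (string, error) {
	u, err := url.Parse(raw) // e.g. mysql://user:pass@127.0.0.1:3306/test
	if err != nil {
		return "", err
	}
	user := ""
	if u.User != nil {
		user = u.User.String() + "@"
	}
	// -> user:pass@tcp(127.0.0.1:3306)/test
	return fmt.Sprintf("%stcp(%s)%s", user, u.Host, u.Path), nil
}
```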

---

Since everything else in the world assumes a prefix, we can strip off
the prefix if one is supplied in the URI. This allows better consistency between Postgresql and MySQL.
- Add reader and adaptor tests copied from Postgresql
- Comment out writer and tailing stuff for now
- Update README to include some notes on Postgresql element types (from
the point of view of MySQL)
- Update the `iterateTable` query to suit MySQL
- Other minor tweaks in reader.go to change from Postgresql to MySQL

Now that the tests run and fail, the next step is to tweak things to suit
MySQL - the existing SQL and data types are still for Postgresql.

E.g.:

```
ERRO[0000] unexpected Insert error, Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'at time zone 'utc' -- coltimestamp TIMESTAMP,
                        )' at line 4
```

and

```
ERRO[0000] unexpected Insert error, Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'at time zone 'utc', -- coltimestamp TIMESTAMP,

                                                        '{1, 2, 3, 4}',          ' at line 4
 ```
Getting closer. Worked through this list:

https://dev.mysql.com/doc/refman/5.7/en/data-types.html

And defined all the column types in order at the top of
`adaptor_test.go` and then in the same order at the bottom of the file.
It's now progressing further as it doesn't complain about inserting data,
but it is failing to read the row count for some reason - although the
data is there.

That probably means `reader.go` needs altering if it's not pulling data.

Still need to update `reader_test.go` similarly.

Some types I've had to drop for now as I realised I'm only running 5.6
locally.
I still know I need to update `reader_test.go` per the changes to
`adaptor_test.go`. Figured out why it wasn't reading the rows in: it
was still including a `public` prefix from Postgresql. For MySQL we need
the database name instead, so changed that - it was actually just in the
test file, not `reader.go`.

Also realised I should filter out `performanceSchema`.

I have some other changes after this commit, but they're not tidied up enough yet.
- Use Errorf instead of Fatalf so we can see all the tests that are failing (see the sketch after the failure list below)
- Update TestReadComplex per the changes in adaptor_test (it was still Postgresql based)
- Remove the Array stuff from casifyValue since it's N/A
- Make some changes / additions in casifyValue to suit MySQL. Some stuff still isn't right (bit, for example)
- Can't get the newline in TEXT to work yet so removed that from the test
- Fix some column name typos
- Add some notes on TestComplexRead to the README.

Here are the current failures:

```
    reader_test.go:121: Expected coltimestamp of row to equal 2021-12-13 16:53:31.501985 +0000 GMT m=+0.091384396 (time.Time), but was 2021-12-13 16:53:31 +0000 UTC (time.Time)
    reader_test.go:121: Expected colpolygon of row to equal (0 0,10 0,10 10,0 10,0 0),(5 5,7 5,7 7,5 7, 5 5) (string), but was $@$@$@$@@@@@@@@@@@ (string)
    reader_test.go:121: Expected colbit of row to equal 000101 (string), but was  (string)
    reader_test.go:121: Expected colgeometrycollection of row to equal POINT(1 1),LINESTRING(0 0,1 1,2 2,3 3,4 4) (string), but was ????????@@@@ (string)
    reader_test.go:121: Expected colbinary of row to equal 3735928559 (int), but was ޭ?? (string)
    reader_test.go:121: Expected id of row to equal 0 (int64), but was 1 (int64)
    reader_test.go:121: Expected colblob of row to equal 3735928559 (int), but was ޭ?? (string)
    reader_test.go:121: Expected colpoint of row to equal (15 15) (string), but was .@.@ (string)
    reader_test.go:121: Expected collinestring of row to equal (0 0,1 1,2 2) (string), but was ????@@ (string)
```
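The Errorf-over-Fatalf change is what lets a single run surface every mismatch above at once. A minimal sketch of the pattern (hypothetical helper and variable names):

```
import "testing"

// Errorf marks the test as failed but keeps executing, so one run
// reports every mismatched column; Fatalf would stop at the first.
func checkColumn(t *testing.T, col string, want, got interface{}) {
	if got != want {
		t.Errorf("Expected %s of row to equal %v (%T), but was %v (%T)",
			col, want, want, got, got)
	}
}
```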
I had not paid attention, but autoincrement wasn't working properly
because we were overriding it by inserting a value. Insert NULL instead
and move the `%d` to the colinteger column.
Postgresql doesn't do it and we can't really do it comparing with `now()`.
Reads a PNG image and stores it in MySQL. Nice as a proper test, but maybe
a bit much for the long term. Will keep whilst developing though.

I wanted to read the file into Go as a string and then store that
"string" in MySQL, but I couldn't get it to work that way. I had to use
MySQL's `LOAD_FILE` instead.
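For reference, a minimal sketch of the LOAD_FILE approach (hypothetical table and path; LOAD_FILE reads the file on the server side, subject to the FILE privilege and secure_file_priv):

```
import "database/sql"

// Sketch only: the server reads the file itself, so the binary contents
// never have to be bound from Go as a string.
func insertPNG(db *sql.DB) error {
	_, err := db.Exec("INSERT INTO reader_complex_test.complex_test_table (colblob) VALUES (LOAD_FILE('/tmp/test.png'))")
	return err
}
```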

Copy/pasted the `io.Copy` code from https://opensource.com/article/18/6/copying-files-go
io.Copy seems to have stopped working for some reason - it just copies 
0 bytes. Quite happily without error.
Figure something out for this. I'm honestly not sure how best to handle
this (and similar bit and spatial stuff).

I.e. should we actually do some decoding in the reader? Or just do it per
this commit and manipulate stuff in testing just so we can compare values?

Will probably come back and look at this again later on.
This "works", but is not nice, so hopefully I will clean this up;
committing whilst it works.
Doesn't work, I get:

```
    reader_test.go:143: Expected colpoint of row to equal POINT(15 15) (string), but was [7.291122019556398e-304 2.5124430165029e-310] (orb.Point)
```

But committing for reference as one way to try to use paulmach/orb.
There are other options.

Maybe for the spatial stuff we do need to process it in reader.go though,
and not in the test. That's going to take me a bit to think about.
Same difference really. This gives:

```
    reader_test.go:144: Expected colpoint of row to equal POINT(15 15) (string), but was &{{1 2 [7.291122019556398e-304 2.5124430165029e-310] 0}} (*geom.Point)
```

It's interesting that both are recognising it's a point, but the value
is way off.

Maybe it's close to working though?
It works and passes... but needs a lot of tidying up. Committing before
I change and accidentally break it (and then forget how I got it
working).

One advantage of go-geom over orb is that it has wkbhex functions...

If we get the hex string from MySQL then we get it exactly as [per this
doc][1]. We can then use wkbhex to decode it to the go-geom geometry
format, and a further function to change it to WKT.

Obviously there are various ways we could approach this. Instead of
changing the value from MySQL to WKT we could leave it as go-geom and
then make sure the value we are testing against is also go-geom. At
some point we probably want to store/reference spatial stuff internally
using go-geom (or orb - personally I like the look of orb more, but
go-geom is working) as that'll probably help with Postgresql (and with
transferring spatial data between Postgresql and MySQL).

Made some minor spacing changes to suit how go-geom converts stuff to 
WKT.


[1]: https://dev.mysql.com/doc/refman/5.6/en/gis-data-formats.html
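A minimal sketch of that decode path (my own example values, not necessarily the adaptor's exact code). Per the doc above, MySQL's internal value is 4 bytes of SRID followed by standard WKB, so the SRID prefix gets stripped before decoding:

```
package main

import (
	"fmt"

	"github.com/twpayne/go-geom/encoding/wkbhex"
	"github.com/twpayne/go-geom/encoding/wkt"
)

func main() {
	// HEX(col) output for a hypothetical POINT(15 15) with SRID 0:
	// 8 hex chars of SRID, then standard little-endian WKB.
	mysqlHex := "0000000001010000000000000000002E400000000000002E40"
	g, err := wkbhex.Decode(mysqlHex[8:]) // drop the SRID prefix
	if err != nil {
		panic(err)
	}
	s, err := wkt.Marshal(g)
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // POINT (15 15) - note go-geom's spacing
}
```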
Loooooooooots of tidying up and future changes will be needed, but it 
feels like we are getting somewhere. Can definitely move on to the 
write tests soon.
Copied from Postgresql, making the obvious changes (ugh... I really
should have done commits before changing anything... oh well):

- Changing the NewClient urls
- Changing pqSession to mysqlSession
- Changing from "public.table" to "database.table"

Still lots here that is Postgresql specific though.

Working on just one writer test at a time so have uncommented
writerTestData in adaptor_test for now.

This is where things are at the moment:

```
INFO[0000] no function registered for operation, command
INFO[0000] no function registered for operation, command
--- FAIL: TestInsert (0.11s)
    writer_test.go:62: unexpected Insert error, Error 1054: Unknown column '$1' in 'field list'
[...]
    writer_test.go:93: Error on test query: sql: no rows in result set
```
Two main things needed changing:

1. With Postgresql we do this:

        INSERT INTO writer_insert_test.simple_test_table (id, colvar, coltimestamp) VALUES ($1, $2, $3)

    But with MySQL it needs to be a bit different:

        INSERT INTO writer_insert_test.simple_test_table (id, colvar, coltimestamp) VALUES (?, ?, ?)

    I thought ordering might be an issue, but seems not.

2. We need to use `parseTime=true` in the DSN to get the Timestamp 
automagically recognised. [See docs][1].

[1]: https://github.com/go-sql-driver/mysql#timetime-support
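A minimal sketch combining both fixes (hypothetical credentials and hard-coded table; the adaptor generates its SQL, and this assumes the go-sql-driver/mysql driver registration name):

```
import (
	"database/sql"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

// Sketch only: MySQL wants ? placeholders rather than Postgres-style
// $1, $2, ..., and parseTime=true makes TIMESTAMP columns scan into
// time.Time instead of []uint8.
func insertSimple(id int64, colvar string, ts time.Time) error {
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/test?parseTime=true")
	if err != nil {
		return err
	}
	defer db.Close()
	_, err = db.Exec(
		"INSERT INTO writer_insert_test.simple_test_table (id, colvar, coltimestamp) VALUES (?, ?, ?)",
		id, colvar, ts,
	)
	return err
}
```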
Missed these before moving on to the Writer stuff.
I think I'm mostly struggling with the Spatial / Geometry stuff -
although since it randomises the insert order I could still have other
bits wrong (binary, blob).

I don't know how to get the geometry into a form that MySQL expects via
`sql.Exec`.

I had initially tried providing like this:

```
"colpoint":              "ST_GeomFromText('POINT (15 15)')"
```

But of course that gets interpreted as a string.

I then thought I needed to do the reverse of what I do in reader_test.go
and get it in byte format (per this commit), but that's not right
either.

I think I need to actually handle this in writer.go? Not sure...

```
_, err := s.Exec(query, data...)
```

How do we get it in the correct form in `data`?
Honestly... a 50 character limit on commit titles is shit. The above is 
a bit misleading. This is better:

Store geometry values in a Go geometry format so we can inspect their 
type in writer.go and wrap in `ST_GeomFromText` for the INSERT.

This is currently failing with:

```
--- FAIL: TestComplexInsert (0.02s)
    writer_test.go:171: unexpected Insert error, Error 1416: Cannot get geometry object from data you send to the GEOMETRY field
```

But it seems like it should work. I've verified that a manual:

```
INSERT INTO writer_complex_insert_test.complex_test_table (...) VALUES (...actual values...)
```

using the text dumped out does work - all the `ST_GeomFromText` bits 
insert ok. So I'm not really sure where it's going wrong at the 
moment. I guess I need to dig into the underlying driver and understand 
what it's actually doing.
Seeing this error:

```
    writer_test.go:171: unexpected Insert error, Error 1406: Data too long for column 'colbinary' at row 1
```
Figured it out. Rather than do/generate this:

```
INSERT INTO writer_complex_insert_test.complex_test_table (colpoint) VALUES (?)
```

And send `data` like so:

```
ST_GeomFromText('POINT (15 15)')
```

Do this approach instead:

```
INSERT INTO writer_complex_insert_test.complex_test_table (colpoint) VALUES (ST_GeomFromText(?))
```

with `data` like:

```
"POINT (15 15)"
```

This approach (wrapping the placeholders) works a treat.
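A minimal sketch of the working approach (hypothetical helper with a hard-coded table; the adaptor generates the SQL instead):

```
import "database/sql"

// Sketch only: wrap the placeholder in ST_GeomFromText in the generated
// SQL and bind the WKT string as ordinary data.
func insertPoint(db *sql.DB, wktPoint string) error {
	_, err := db.Exec(
		"INSERT INTO writer_complex_insert_test.complex_test_table (colpoint) VALUES (ST_GeomFromText(?))",
		wktPoint, // e.g. "POINT (15 15)"
	)
	return err
}
```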

The tests still aren't passing, but they aren't failing where they were 
and are getting to this point:

```
--- FAIL: TestComplexInsert (0.03s)
    writer_test.go:182: Error on test query: sql: no rows in result set
```
Whoop!

1. Add back in the 10 total inserts so the checks based on these pass
2. Need the `parseTime=true` otherwise get:

		writer_test.go:182: Error on test query: sql: Scan error on column index 2, name "coltimestamp": unsupported Scan, storing driver.Value type []uint8 into type *time.Time
@atomicules (Collaborator, Author):

Bah. gofmted everything, but still need to make some linting changes it seems.

Hopefully a fix; I am having issues running the linter locally.
Or hopefully, as I can't lint locally.
This is just a hack to work around the linter, but I genuinely don't 
see the benefit in what the linter is saying.
@atomicules (Collaborator, Author):

🎉 Checks passing.

Added TODOs where I've missed error handling and we could probably do 
with fixing that. At some point.
@atomicules (Collaborator, Author):

Checks are now failing. GitHub Actions switched from focal to jammy and, weirdly, that caused a failure. Fix here: #516

The upstream PR was merged in; until a new release of that is cut, 
reference it by commit ID.

Needs a slight change to the use of `SetCustomTLSConfig`, as part of the 
PR review required that function to change to return an error.
MySQL doesn't support Json arrays. Well, at least not in the same way
as Postgresql. This was copied over from Postgresql (see reference
below).

However... I realised I _do_ need to add Json support; I'd skipped it
initially as I was developing against 5.6, which doesn't have Json support.

References: #315
I don't really care about handling/checking errors in the test files, 
because the tests will fail if it goes wrong. That's my reasoning 
anyway.
This reverts commit 9b6a8c0.

This wasn't for Json arrays. This is what handles items like:

```
"coljson":            map[string]interface{}{"name": "batman"},
```

and we might as well keep compatibility with how Postgresql works even
though writing and reading Json strings on MySQL is actually
straightforward.
With MySQL, Json [seems pretty easy][1] and we can insert strings like
so:

```
"{\"name\": \"batman\", \"sidekick\": \"robin\"}"
```

But for compatibility with the Postgresql adaptor I added back in the
stuff that lets us insert data like this:

```
map[string]interface{}{"name": "batman"},
```

Add a comment explaining what:

```
case map[string]interface{}, mejson.M, []map[string]interface{}, mejson.S:
```

Now I actually know what this is - it's not handling array data like I
thought.

This also helps tidy up a lot of the TODO error handling.
[1]: https://dev.mysql.com/doc/refman/5.7/en/json.html
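A minimal sketch of that compatibility shim (a hypothetical helper; the adaptor's actual case line above also matches the mejson types):

```
import "encoding/json"

// Sketch only: documents arriving as maps (the Postgresql-style form)
// are marshalled to a JSON string before being bound as the column
// value, since MySQL happily accepts JSON as a string.
func jsonValue(v interface{}) (interface{}, error) {
	switch t := v.(type) {
	case map[string]interface{}, []map[string]interface{}:
		b, err := json.Marshal(t)
		if err != nil {
			return nil, err
		}
		return string(b), nil
	}
	return v, nil
}
```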
After playing about and looking at this more, I _think_ that code (which
we copied from Postgresql) was for handling data like:

```
"colarrayint":        []interface{}{1, 2, 3, 4},
```

Which definitely isn't applicable to MySQL as it doesn't have array
support like Postgresql.

We can always revert this at a later point if it turns out we do need
it. But it doesn't stop the tests passing right now.
I.e. don't just "make the tests pass" - do it properly.

In writing a comment on the PR I talked myself round to doing this. I'd 
written this:

> Does it make sense that we have all this [special handling in 
> `reader_test.go`][1] whereas Postgresql doesn't? I'm undecided. On 
> the one hand I think this shouldn't be required as reader.go should 
> do whatever is necessary, but on the other hand we don't necessarily
> specify how we are storing things at the Transporter level (i.e.
> compatibility layer between all adaptors, Golang native types). Maybe
> that should be in [`casifyValue` instead][2]? Could do with another
> pair of eyes, etc on this.

And then decided to handle things properly as much as possible. I.e. it 
does make sense for `casifyValue` to do this as we'd ultimately want 
the adaptor to pass that off to another adaptor, etc.

I think what I have here is correct. Think.

There is one "hack" left in `reader_test.go` for the `colbinary` value 
of `0xDEADBEEF` and my reasons for that are explained in the comments. 
It doesn't stop binary data from working properly / as intended since 
the colblob test passes.

[1]: https://github.com/compose/transporter/blob/9d6d829ed2c2c71688c4d7e427147d83c228ac1b/adaptor/mysql/reader_test.go#L134-L167
[2]: https://github.com/compose/transporter/blob/9d6d829ed2c2c71688c4d7e427147d83c228ac1b/adaptor/mysql/reader.go#L201-L231
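To illustrate the direction (a loose sketch of the idea only, not the adaptor's actual signature or cases): normalise driver-level values into native Go types inside `casifyValue`, so that reader.go hands the rest of Transporter something adaptor-agnostic.

```
// Sketch only, hypothetical signature: convert raw driver values
// ([]uint8 and friends) into native Go types based on the column type,
// so tests (and other adaptors) see plain Go values.
func casifyValue(value interface{}, columnType string) interface{} {
	switch columnType {
	case "text", "varchar":
		if b, ok := value.([]uint8); ok {
			return string(b)
		}
	}
	return value
}
```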
@pete-may (Contributor) left a comment:


LGTM!!!

  • Tested that ./transporter init mysql mysql works
  • Tested that moving a couple of rows between two Compose MySQLs works
  • Tested that tailing between two MySQLs works

The code looks really clean and you seem to have covered more things than I would ever have imagined as a first pass at getting this working (handling spatial objects). So I say let's merge this monster. We can always fix bugs as they come up.

One suggestion would be to move your development notes from the mysql README into a DEVELOPMENT_NOTES.md file.

Now the PR has been reviewed. It feels a bit odd saying thanks to myself, but
I'll go with it (well done me).
@atomicules (Collaborator, Author):

Since we use squashed commits I'm keeping the branch, but will protect it, as otherwise all the useful commit history will be lost (i.e. if anyone else ever wants to use it as a basis for adding another adaptor).
