This repository has been archived by the owner on Oct 17, 2023. It is now read-only.

Adaptor: Add MySQL #515

Merged
merged 101 commits on May 26, 2022

Conversation

@atomicules (Collaborator) commented Apr 21, 2022

I think this is at the point now where it's ready for review even if we don't merge yet.

There is obviously a LOT to review, so I'll go over the main things worth looking at and considering:

  • This MySQL adaptor was built using the Postgresql one as a basis.
  • You can follow the commit history to see how it was developed. Basically I copied the basic files over from Postgresql (client.go, postgresql.go, reader.go, session.go, writer.go, with postgresql.go obviously renamed to mysql.go) and then added a test file at a time, using the tests to help me develop each file. I started with client first as that was easiest.
  • Does it make sense that we have all this special handling in reader_test.go whereas Postgresql doesn't? I'm undecided. On the one hand I think this shouldn't be required, as reader.go should do whatever is necessary; on the other hand we don't necessarily specify how we are storing things at the Transporter level (i.e. the compatibility layer between all adaptors, Golang native types). Maybe that should be in casifyValue instead? This could do with another pair of eyes. In writing that comment I talked myself into changing things (see c3fabd2), but I still welcome someone else's opinion on this.
  • It's worth looking through all the comments to see if they give further things to think about. There are a lot of comments in tailer.go.
  • Is any of the resumemap stuff relevant? To Postgresql either?
  • Uses go-mysql-org/go-mysql as the driver over go-sql-driver/mysql because the former supports replication. It currently references a specific commit rather than a release tag, as TLS support for the driver has only just been merged in.
  • There are some TODOs left in the code, but I don't think these are things that need fixing right now; they could be followed up later. But you might disagree. Most are in tailer.go.
  • Performance is probably similar to Postgresql, i.e. we aren't doing any bulk inserts yet.
  • It has been tested with this test dataset, copying between two DBaaS providers' MySQL instances using TLS connections, and with tailing.
  • It runs SET FOREIGN_KEY_CHECKS=0; at the start of a session on the sink (see the sketch after this list).
  • It requires the connecting user to be able to read the binlog for tailing. Some DBaaS providers might not allow this right now.
  • It's only been tested with a MySQL sink and source; I'm unsure how it functions with other DB types. I don't think that should stop merging, but I could perhaps add a caveat to the mysql README about that.
  • Regarding handling binlog rotation, testing has shown that the replication/binlogsyncer handles this.
  • Per Postgresql (and MongoDB?) you need to create the destination database and table structure first. We could handle this in Transporter, but per the other adaptors it probably makes sense to handle that somewhere else, e.g. code that references Transporter.
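For illustration, here is a minimal sketch of the sink session setup mentioned above, assuming a plain database/sql handle (the adaptor's actual session code may differ):

```
import "database/sql"

// Sketch only: disable foreign key checks for this session on the sink
// so rows can be copied in any order without tripping constraints.
func prepareSinkSession(db *sql.DB) error {
	_, err := db.Exec("SET FOREIGN_KEY_CHECKS=0;")
	return err
}
```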

Fixes: #76

Required for all PRs:

  • CHANGELOG.md updated (feel free to wait until changes have been reviewed by a maintainer)
  • README.md updated (if needed)

- Copy files, don't bother with tests except client for now
- Change names from Postgresql, pq, etc to MySQL, etc
- Use github.com/go-sql-driver/mysql
- Don't bother with tailing for now
- Move the `casifyValue` function from tailer.go to reader.go since it seems it's needed even without tailing.

It "builds", but the tests currently fail with the below:

```
$ go117 test
--- FAIL: TestConnect (0.03s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x11078d3]

goroutine 7 [running]:
testing.tRunner.func1.2({0x123b0a0, 0x143e550})
        /opt/pkg/go117/src/testing/testing.go:1209 +0x24e
testing.tRunner.func1()
        /opt/pkg/go117/src/testing/testing.go:1212 +0x218
panic({0x123b0a0, 0x143e550})
        /opt/pkg/go117/src/runtime/panic.go:1038 +0x215
database/sql.(*DB).conn(0x0, {0x12ca330, 0xc00001e098}, 0x1)
        /opt/pkg/go117/src/database/sql/sql.go:1260 +0x53
database/sql.(*DB).PingContext(0x1278b3b, {0x12ca330, 0xc00001e098})
        /opt/pkg/go117/src/database/sql/sql.go:853 +0x7a
database/sql.(*DB).Ping(...)
        /opt/pkg/go117/src/database/sql/sql.go:874
github.com/compose/transporter/adaptor/mysql.(*Client).Connect(0x143f8e0)
        /Users/simon/Code/go/src/github.com/compose/transporter/adaptor/mysql/client.go:76 +0xdc
github.com/compose/transporter/adaptor/mysql.TestConnect(0xc000111380)
        /Users/simon/Code/go/src/github.com/compose/transporter/adaptor/mysql/client_test.go:76 +0xfe
testing.tRunner(0xc000111380, 0x1285c58)
        /opt/pkg/go117/src/testing/testing.go:1259 +0x102
created by testing.(*T).Run
        /opt/pkg/go117/src/testing/testing.go:1306 +0x35a
exit status 2
FAIL    github.com/compose/transporter/adaptor/mysql    0.832s
```

However... that is to be expected. I haven't altered anything else to be MySQL specific and don't have a local instance of MySQL running.
- Use the `test` db instead of `mysql`. Perhaps just temporarily, but the
default local install I have has `test` available and accessible to the
standard user, whereas `mysql` is protected
- Don't completely ignore the error with `sql.Open`, especially whilst
developing
- Use a minimal default URI of `/`
- Since `mysql://` is not valid in a DSN, add this in at the
`url.Parse` phase.
Just `/` worked for the client test, but with the reader tests we'll
run into issues like this:

```
panic: default addr for network '127.0.0.1' unknown
```

That happens when using 127.0.0.1 or `localhost` without specifying the
protocol. And if we are specifying it for the reader tests we might as
well do it for the default.
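A minimal sketch of the idea (a hypothetical helper, not the adaptor's actual code), assuming the go-sql-driver/mysql DSN format:

```
import (
	"fmt"
	"net/url"
)

// Hypothetical helper: url.Parse needs the mysql:// scheme, while the
// driver wants a bare DSN with the protocol wrapped around the address;
// without tcp(...) the driver panics with:
//   default addr for network '127.0.0.1' unknown
func buildDSN(raw string) (string, error) {
	u, err := url.Parse(raw) // e.g. mysql://user:pass@127.0.0.1:3306/test
	if err != nil {
		return "", err
	}
	user := ""
	if u.User != nil {
		user = u.User.String() + "@"
	}
	// -> user:pass@tcp(127.0.0.1:3306)/test
	return fmt.Sprintf("%stcp(%s)%s", user, u.Host, u.Path), nil
}
```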

---

Since everything else in the world assumes a prefix, we can strip off
the prefix if one is supplied in the URI. This allows better consistency between Postgresql and MySQL.
- Add reader and adaptor tests copied from Postgresql
- Comment out writer and tailing stuff for now
- Update README to include some notes on Postgresql element types (from
the point of view of MySQL)
- Update the `iterateTable` query to suit MySQL
- Other minor tweaks in reader.go to change from Postgresql to MySQL

Now that the tests run and fail, the next step is to tweak things to suit
MySQL - the existing SQL and data types are still for Postgresql.

E.g.:

```
ERRO[0000] unexpected Insert error, Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'at time zone 'utc' -- coltimestamp TIMESTAMP,
                        )' at line 4
```

and

```
ERRO[0000] unexpected Insert error, Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'at time zone 'utc', -- coltimestamp TIMESTAMP,

                                                        '{1, 2, 3, 4}',          ' at line 4
 ```
Getting closer. Worked through this list:

https://dev.mysql.com/doc/refman/5.7/en/data-types.html

And defined all the column types in order at the top of
`adaptor_test.go` and then in the same order at the bottom of the file.
It's now progressing further as it doesn't complain about inserting data,
but it is failing to read the row count for some reason - although the
data is there.

That probably means `reader.go` needs altering if it's not pulling data.

Still need to update `reader_test.go` similarly.

Some types I've had to drop for now as I realised I'm only running 5.6
locally.
I still know I need to update `reader_test.go` per the changes to
`adaptor_test.go`. Figured out why it wasn't reading the rows in: it
was still including a `public` prefix from Postgresql. For MySQL we need
the database name instead, so changed that - it was actually just in the
test file, not `reader.go`.

Also realised I should filter out `performanceSchema`.

I have some other changes after this commit, but they're not tidied up enough yet.
- Use Errorf instead of Fatalf so we can see all the tests that are failing (see the sketch after the failure list below)
- Update TestReadComplex per the changes in adaptor_test (it was still Postgresql based)
- Remove the Array stuff from casifyValue since it's N/A
- Make some changes / additions in casifyValue to suit MySQL. Some stuff still isn't right (bit, for example)
- Can't get the newline in TEXT to work yet so removed that from the test
- Fix some column name typos
- Add some notes on TestComplexRead to the README.

Here are the current failures:

```
    reader_test.go:121: Expected coltimestamp of row to equal 2021-12-13 16:53:31.501985 +0000 GMT m=+0.091384396 (time.Time), but was 2021-12-13 16:53:31 +0000 UTC (time.Time)
    reader_test.go:121: Expected colpolygon of row to equal (0 0,10 0,10 10,0 10,0 0),(5 5,7 5,7 7,5 7, 5 5) (string), but was $@$@$@$@@@@@@@@@@@ (string)
    reader_test.go:121: Expected colbit of row to equal 000101 (string), but was  (string)
    reader_test.go:121: Expected colgeometrycollection of row to equal POINT(1 1),LINESTRING(0 0,1 1,2 2,3 3,4 4) (string), but was ????????@@@@ (string)
    reader_test.go:121: Expected colbinary of row to equal 3735928559 (int), but was ޭ?? (string)
    reader_test.go:121: Expected id of row to equal 0 (int64), but was 1 (int64)
    reader_test.go:121: Expected colblob of row to equal 3735928559 (int), but was ޭ?? (string)
    reader_test.go:121: Expected colpoint of row to equal (15 15) (string), but was .@.@ (string)
    reader_test.go:121: Expected collinestring of row to equal (0 0,1 1,2 2) (string), but was ????@@ (string)
```
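The Errorf-over-Fatalf change is what lets a single run surface every mismatch above at once. A minimal sketch of the pattern (hypothetical helper and variable names):

```
import "testing"

// Errorf marks the test as failed but keeps executing, so one run
// reports every mismatched column; Fatalf would stop at the first.
func checkColumn(t *testing.T, col string, want, got interface{}) {
	if got != want {
		t.Errorf("Expected %s of row to equal %v (%T), but was %v (%T)",
			col, want, want, got, got)
	}
}
```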
I had not paid attention, but autoincrement wasn't working properly
because we were overriding it by inserting a value. Insert NULL instead
and move the `%d` to the colinteger column.
Postgresql doesn't do it and we can't really do it comparing with `now()`.
Reads a PNG image and stores it in MySQL. Nice as a proper test, but maybe
a bit much for the long term. Will keep whilst developing though.

I wanted to read the file into Go as a string and then store that
"string" in MySQL, but I couldn't get it to work that way. I had to use
MySQL's `LOAD_FILE` instead.
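For reference, a minimal sketch of the LOAD_FILE approach (hypothetical table and path; LOAD_FILE reads the file on the server side, subject to the FILE privilege and secure_file_priv):

```
import "database/sql"

// Sketch only: the server reads the file itself, so the binary contents
// never have to be bound from Go as a string.
func insertPNG(db *sql.DB) error {
	_, err := db.Exec("INSERT INTO reader_complex_test.complex_test_table (colblob) VALUES (LOAD_FILE('/tmp/test.png'))")
	return err
}
```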

Copy/pasted the `io.Copy` code from https://opensource.com/article/18/6/copying-files-go
io.Copy seems to have stopped working for some reason - it just copies 
0 bytes. Quite happily without error.
Figure something out for this. I'm honestly not sure how best to handle
this (and similar bit and spatial stuff).

I.e. should we actually do some decoding in the reader? Or just do it per
this commit and manipulate stuff in testing just so we can compare values?

Will probably come back and look at this again later on.
This "works", but is not nice, so hopefully I will clean this up;
committing whilst it works.
Doesn't work, I get:

```
    reader_test.go:143: Expected colpoint of row to equal POINT(15 15) (string), but was [7.291122019556398e-304 2.5124430165029e-310] (orb.Point)
```

But committing for reference as one way to try to use paulmach/orb.
There are other options.

Maybe for the spatial stuff we do need to process it in reader.go though,
and not in the test. That's going to take me a bit to think about.
Same difference really. This gives:

```
    reader_test.go:144: Expected colpoint of row to equal POINT(15 15) (string), but was &{{1 2 [7.291122019556398e-304 2.5124430165029e-310] 0}} (*geom.Point)
```

It's interesting that both are recognising it's a point, but the value
is way off.

Maybe it's close to working though?
It works and passes... but needs a lot of tidying up. Committing before
I change and accidentally break it (and then forget how I got it
working).

One advantage of go-geom over orb is that it has wkbhex functions...

If we get the hex string from MySQL then we get it exactly as [per this
doc][1]. We can then use wkbhex to decode it to the go-geom geometry
format, and a further function to change it to WKT.

Obviously there are various ways we could approach this. Instead of
changing the value from MySQL to WKT we could leave it as go-geom and
then make sure the value we are testing against is also go-geom. At
some point we probably want to store/reference spatial stuff internally
using go-geom (or orb - personally I like the look of orb more, but
go-geom is working) as that'll probably help with Postgresql (and with
transferring spatial data between Postgresql and MySQL).

Made some minor spacing changes to suit how go-geom converts stuff to 
WKT.


[1]: https://dev.mysql.com/doc/refman/5.6/en/gis-data-formats.html
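A minimal sketch of that decode path (my own example values, not necessarily the adaptor's exact code). Per the doc above, MySQL's internal value is 4 bytes of SRID followed by standard WKB, so the SRID prefix gets stripped before decoding:

```
package main

import (
	"fmt"

	"github.com/twpayne/go-geom/encoding/wkbhex"
	"github.com/twpayne/go-geom/encoding/wkt"
)

func main() {
	// HEX(col) output for a hypothetical POINT(15 15) with SRID 0:
	// 8 hex chars of SRID, then standard little-endian WKB.
	mysqlHex := "0000000001010000000000000000002E400000000000002E40"
	g, err := wkbhex.Decode(mysqlHex[8:]) // drop the SRID prefix
	if err != nil {
		panic(err)
	}
	s, err := wkt.Marshal(g)
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // POINT (15 15) - note go-geom's spacing
}
```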
Loooooooooots of tidying up and future changes will be needed, but it 
feels like we are getting somewhere. Can definitely move on to the 
write tests soon.
Copied from Postgresql, making the obvious changes (ugh... I really
should have done commits before changing anything... oh well):

- Changing the NewClient urls
- Changing pqSession to mysqlSession
- Changing from "public.table" to "database.table"

Still lots here that is Postgresql specific though.

Working on just one writer test at a time so have uncommented
writerTestData in adaptor_test for now.

This is where things are at the moment:

```
INFO[0000] no function registered for operation, command
INFO[0000] no function registered for operation, command
--- FAIL: TestInsert (0.11s)
    writer_test.go:62: unexpected Insert error, Error 1054: Unknown column '$1' in 'field list'
[...]
    writer_test.go:93: Error on test query: sql: no rows in result set
```
Two main things needed changing:

1. With Postgresql we do this:

        INSERT INTO writer_insert_test.simple_test_table (id, colvar, coltimestamp) VALUES ($1, $2, $3)

    But with MySQL it needs to be a bit different:

        INSERT INTO writer_insert_test.simple_test_table (id, colvar, coltimestamp) VALUES (?, ?, ?)

    I thought ordering might be an issue, but seems not.

2. We need to use `parseTime=true` in the DSN to get the Timestamp 
automagically recognised. [See docs][1].

[1]: https://github.com/go-sql-driver/mysql#timetime-support
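A minimal sketch combining both fixes (hypothetical credentials and hard-coded table; the adaptor generates its SQL, and this assumes the go-sql-driver/mysql driver registration name):

```
import (
	"database/sql"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

// Sketch only: MySQL wants ? placeholders rather than Postgres-style
// $1, $2, ..., and parseTime=true makes TIMESTAMP columns scan into
// time.Time instead of []uint8.
func insertSimple(id int64, colvar string, ts time.Time) error {
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/test?parseTime=true")
	if err != nil {
		return err
	}
	defer db.Close()
	_, err = db.Exec(
		"INSERT INTO writer_insert_test.simple_test_table (id, colvar, coltimestamp) VALUES (?, ?, ?)",
		id, colvar, ts,
	)
	return err
}
```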
Missed these before moving on to the Writer stuff.
I think I'm mostly struggling with the Spatial / Geometry stuff -
although since it randomises the insert order I could still have other
bits wrong (binary, blob).

I don't know how to get the geometry into a form that MySQL expects via
`sql.Exec`.

I had initially tried providing like this:

```
"colpoint":              "ST_GeomFromText('POINT (15 15)')"
```

But of course that gets interpreted as a string.

I then thought I needed to do the reverse of what I do in reader_test.go
and get it in byte format (per this commit), but that's not right
either.

I think I need to actually handle this in writer.go? Not sure...

```
_, err := s.Exec(query, data...)
```

How do we get it in the correct form in `data`?
Honestly... a 50 character limit on commit titles is shit. The above is 
a bit misleading. This is better:

Store geometry values in a Go geometry format so we can inspect their 
type in writer.go and wrap in `ST_GeomFromText` for the INSERT.

This is currently failing with:

```
--- FAIL: TestComplexInsert (0.02s)
    writer_test.go:171: unexpected Insert error, Error 1416: Cannot get geometry object from data you send to the GEOMETRY field
```

But it seems like it should work. I've verified that a manual:

```
INSERT INTO writer_complex_insert_test.complex_test_table (...) VALUES (...actual values...)
```

using the text dumped out does work - all the `ST_GeomFromText` bits 
insert ok. So I'm not really sure where it's going wrong at the 
moment. I guess I need to dig into the underlying driver and understand 
what it's actually doing.
Seeing this error:

```
    writer_test.go:171: unexpected Insert error, Error 1406: Data too long for column 'colbinary' at row 1
```
Figured it out. Rather than do/generate this:

```
INSERT INTO writer_complex_insert_test.complex_test_table (colpoint) VALUES (?)
```

And send `data` like so:

```
ST_GeomFromText('POINT (15 15)')
```

Do this approach instead:

```
INSERT INTO writer_complex_insert_test.complex_test_table (colpoint) VALUES (ST_GeomFromText(?))
```

with `data` like:

```
"POINT (15 15)"
```

This approach (wrapping the placeholders) works a treat.
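A minimal sketch of the working approach (hypothetical helper with a hard-coded table; the adaptor generates the SQL instead):

```
import "database/sql"

// Sketch only: wrap the placeholder in ST_GeomFromText in the generated
// SQL and bind the WKT string as ordinary data.
func insertPoint(db *sql.DB, wktPoint string) error {
	_, err := db.Exec(
		"INSERT INTO writer_complex_insert_test.complex_test_table (colpoint) VALUES (ST_GeomFromText(?))",
		wktPoint, // e.g. "POINT (15 15)"
	)
	return err
}
```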

The tests still aren't passing, but they aren't failing where they were 
and are getting to this point:

```
--- FAIL: TestComplexInsert (0.03s)
    writer_test.go:182: Error on test query: sql: no rows in result set
```
Whoop!

1. Add back in the 10 total inserts so the checks based on these pass
2. Need the `parseTime=true` otherwise get:

		writer_test.go:182: Error on test query: sql: Scan error on column index 2, name "coltimestamp": unsupported Scan, storing driver.Value type []uint8 into type *time.Time
@atomicules (Collaborator, Author):

Bah. gofmted everything, but still need to make some linting changes it seems.

Hopefully a fix; I am having issues running the linter locally.
Or hopefully, as I can't lint locally.
This is just a hack to work around the linter, but I genuinely don't 
see the benefit in what the linter is saying.
@atomicules (Collaborator, Author):

🎉 Checks passing.

Added TODOs where I've missed error handling and we could probably do 
with fixing that. At some point.
@atomicules (Collaborator, Author):

Checks are now failing. GitHub Actions switched from focal to jammy and, weirdly, that caused a failure. Fix here: #516

The upstream PR was merged in; until a new release of that is cut, 
reference it by commit ID.

Needs a slight change to the use of `SetCustomTLSConfig`, as part of the 
PR review required that function to change to return an error.
MySQL doesn't support Json arrays. Well, at least not in the same way
as Postgresql. This was copied over from Postgresql (see reference
below).

However... I realised I _do_ need to add Json support; I'd skipped it
initially as I was developing against 5.6, which doesn't have Json support.

References: #315
I don't really care about handling/checking errors in the test files, 
because the tests will fail if it goes wrong. That's my reasoning 
anyway.
This reverts commit 9b6a8c0.

This wasn't for Json arrays. This is what handles items like:

```
"coljson":            map[string]interface{}{"name": "batman"},
```

and we might as well keep compatibility with how Postgresql works even
though writing and reading Json strings on MySQL is actually
straightforward.
With MySQL, Json [seems pretty easy][1] and we can insert strings like
so:

```
"{\"name\": \"batman\", \"sidekick\": \"robin\"}"
```

But for compatibility with the Postgresql adaptor I added back in the
stuff that lets us insert data like this:

```
map[string]interface{}{"name": "batman"},
```

Add a comment explaining what:

```
case map[string]interface{}, mejson.M, []map[string]interface{}, mejson.S:
```

Now I actually know what this is - it's not handling array data like I
thought.

This also helps tidy up a lot of the TODO error handling.
[1]: https://dev.mysql.com/doc/refman/5.7/en/json.html
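A minimal sketch of that compatibility shim (a hypothetical helper; the adaptor's actual case line above also matches the mejson types):

```
import "encoding/json"

// Sketch only: documents arriving as maps (the Postgresql-style form)
// are marshalled to a JSON string before being bound as the column
// value, since MySQL happily accepts JSON as a string.
func jsonValue(v interface{}) (interface{}, error) {
	switch t := v.(type) {
	case map[string]interface{}, []map[string]interface{}:
		b, err := json.Marshal(t)
		if err != nil {
			return nil, err
		}
		return string(b), nil
	}
	return v, nil
}
```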
After playing about and looking at this more, I _think_ that code (which
we copied from Postgresql) was for handling data like:

```
"colarrayint":        []interface{}{1, 2, 3, 4},
```

Which definitely isn't applicable to MySQL as it doesn't have array
support like Postgresql.

We can always revert this at a later point if it turns out we do need
it. But it doesn't stop the tests passing right now.
I.e. don't just "make the tests pass" - do it properly.

In writing a comment on the PR I talked myself round to doing this. I'd 
written this:

> Does it make sense that we have all this [special handling in 
> `reader_test.go`][1] whereas Postgresql doesn't? I'm undecided. On 
> the one hand I think this shouldn't be required as reader.go should 
> do whatever is necessary, but on the other hand we don't necessarily
> specify how we are storing things at the Transporter level (i.e.
> compatibility layer between all adaptors, Golang native types). Maybe
> that should be in [`casifyValue` instead][2]? Could do with another
> pair of eyes, etc on this.

And then decided to handle things properly as much as possible. I.e. it 
does make sense for `casifyValue` to do this as we'd ultimately want 
the adaptor to pass that off to another adaptor, etc.

I think what I have here is correct. Think.

There is one "hack" left in `reader_test.go` for the `colbinary` value 
of `0xDEADBEEF` and my reasons for that are explained in the comments. 
It doesn't stop binary data from working properly / as intended since 
the colblob test passes.

[1]: https://github.com/compose/transporter/blob/9d6d829ed2c2c71688c4d7e427147d83c228ac1b/adaptor/mysql/reader_test.go#L134-L167
[2]: https://github.com/compose/transporter/blob/9d6d829ed2c2c71688c4d7e427147d83c228ac1b/adaptor/mysql/reader.go#L201-L231
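To illustrate the direction (a loose sketch of the idea only, not the adaptor's actual signature or cases): normalise driver-level values into native Go types inside `casifyValue`, so that reader.go hands the rest of Transporter something adaptor-agnostic.

```
// Sketch only, hypothetical signature: convert raw driver values
// ([]uint8 and friends) into native Go types based on the column type,
// so tests (and other adaptors) see plain Go values.
func casifyValue(value interface{}, columnType string) interface{} {
	switch columnType {
	case "text", "varchar":
		if b, ok := value.([]uint8); ok {
			return string(b)
		}
	}
	return value
}
```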
@pete-may (Contributor) left a comment:


LGTM!!!

  • Tested that ./transporter init mysql mysql works
  • Tested that moving a couple of rows between two Compose MySQLs works
  • Tested that tailing between two MySQLs works

The code looks really clean and you seem to have covered more things than I would ever have imagined as a first pass at getting this working (handling spatial objects). So I say let's merge this monster. We can always fix bugs as they come up.

One suggestion would be to move your development notes from the mysql README into a DEVELOPMENT_NOTES.md file.

Now the PR has been reviewed. It feels a bit odd saying thanks to myself, but
I'll go with it (well done me).
@atomicules (Collaborator, Author):

Since we use squashed commits I'm keeping the branch, but will protect it, as otherwise all the useful commit history will be lost (i.e. if anyone else ever wants to use it as a basis for adding another adaptor).
