Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration tests by compiling and calling Ghostferry from Ruby #64

Merged
merged 6 commits into from
Jan 21, 2019

Conversation

shuhaowu
Copy link
Contributor

@shuhaowu shuhaowu commented Sep 17, 2018

Instead of running integration tests directly in Go, this PR enables us to run integration tests by calling Ghostferry from Ruby as a subprocess. This is needed because it is not possible for the the Go code to interrupt and resume itself.

Given that the Ruby integration test code is also much cleaner than the Go equivalent and suffer less issues such as goroutine leakage, we should prefer writing integration tests in Ruby from now on.

The Ghostferry subprocess right now is a separate binary from copydb/sharding. This decision was made because the tests needs to perform some actions (such as interrupting Ghostferry or injecting some data) in the middle of the run at strategic and reproducible times(1) and the base Ghostferry framework doesn't support it for now. In order to accomplish this, bidirectional communication needs to exists between the Ruby and Go code and this is done via a WEBrick::HTTPServer on the local machine.

Using this server, Ghostferry can send a message and pause until a reply is sent from the server. The pause allows race injection and other test code to execute.

All of this is wrapped up in the class GhostferryHelper::Ghostferry. To use it in tests, one can extend from GhostferryTestCase, where other helpers like the database connection, a parallel data writer, also exists.

A test will look something like the following:

  def test_copy_data_with_writes_to_source
    seed_simple_database_with_single_table

    datawriter = new_source_datawriter
    ghostferry = new_ghostferry(MINIMAL_GHOSTFERRY)

    start_datawriter_with_ghostferry(datawriter, ghostferry)
    stop_datawriter_during_cutover(datawriter, ghostferry)

    ghostferry.run
    assert_test_table_is_identical
  end

@shuhaowu shuhaowu changed the title Integration tests by compiling and calling Ghostferry from Ruby [WIP] Integration tests by compiling and calling Ghostferry from Ruby Sep 25, 2018
@shuhaowu shuhaowu force-pushed the ruby-integration-test branch 2 times, most recently from 68f4f0a to 23a602d Compare October 30, 2018 19:03
@shuhaowu shuhaowu force-pushed the ruby-integration-test branch 5 times, most recently from a456cd5 to 2b25a1d Compare November 20, 2018 03:16
@shuhaowu shuhaowu force-pushed the ruby-integration-test branch 2 times, most recently from 3f45878 to e4c88ce Compare November 20, 2018 19:12
@shuhaowu shuhaowu changed the title [WIP] Integration tests by compiling and calling Ghostferry from Ruby Integration tests by compiling and calling Ghostferry from Ruby Nov 20, 2018
@shuhaowu
Copy link
Contributor Author

shuhaowu commented Jan 8, 2019

It would be great if I can get some reviews on this PR. It's a fairly independent piece that we'll need regardless what approach we take to implement the rest of #75 as we'll need a mechanism to test.

At least half of this PR is reasonably trivial: direct translation from Go (DataWriter, TrivialIntegrationTest, TestCase), glue code to work with database (DbManager). The other half is a simple implementation of a Ghostferry with unix socket for status reporting.

For future note: the integrationferry.Ferry.Main could be a method we'll eventually port back into ghostferry.Ferry, to simplify how Ghostferry application can be run as per an idea @pushrax had sometime ago.

Copy link
Contributor

@fjordan fjordan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the overall approach of running integration tests using a separate (ruby) process. Questioning the necessity of the socket communication and being event-driven as it seems to introduce a good amount of complexity, but apart from that this looks good to me. I can't speak to the conventions or idioms of ruby testing as much as others, so I'll defer that to the other reviewers.

StatusDone string = "DONE"

// Could be sent by multiple goroutines in parallel
StatusBeforeRowCopy string = "BEFORE_ROW_COPY"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all of these statuses actually needed and used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They will be useful. For example, if we want to terminate Ghostferry in a test after copy has begun (in a later PR):

@ghostferry.on_status(Ghostferry::Status::AFTER_ROW_COPY) do
  batches_written += 1
  if batches_written >= 2
    @ghostferry.send_signal("TERM")
  end
end

Furthermore it is the intention to eventually port all existing Ghostferry integration tests into this framework. Lots of those Go tests uses the same statuses.

*ghostferry.Ferry
}

// =========================================
Copy link
Contributor

@fjordan fjordan Jan 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work setting all of this up. It clearly works and does what it needs to do. However, I'm not sure we really need to setup and manage a socket to communicate between ghostferry and the ruby master process. Could the same thing not have been achieved using signals (SIGSTOP, SIGCONTv, and SIGUSR1/2), ENV vars or even files? This just seems like significant overhead and complexity that could be simplified using other methods.

Also, I'm not sure we really need to make it event-driven (do we really need to know when each row is written?) as much as simply communicating overall state, which I again feel could be communicated using the other methods mentioned above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't understand the need for this custom communication protocol.

Another option is to use the already existing HTTPCallback facility, similar to how the sharding package uses it in sharding/testhelpers/unit_test_suite.go and how the shop mover integration works (see ghostferry.rb in core).

Copy link
Contributor

@hkdsun hkdsun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on slack, this ruby testing setup/structure is super weird to me. Can we follow some existing patterns instead of hacking together a bunch of binaries? I understand it works but please let's refactor to something that our colleagues can understand quickly.

What I expect:

  • bundle exec rake test in the root of the repo runs all ruby tests (see Rakefile in nginx-routing-modules)
  • require 'test_helper' instead of require 'ghostferry_integration'
  • All ruby test files follow the pattern *_test.rb and live under test/integration or test/unit
  • Avoid implicit assumptions in tests such as @dbs and @ghostferry. It's fine to have them in one file (the base test case) but not all over the rest of the test cases.

I can provide a more detailed review when some of these concerns are taken care of

@hkdsun
Copy link
Contributor

hkdsun commented Jan 9, 2019

By the way, you may want to check out and possibly port some of the shop mover tests from core

@shuhaowu shuhaowu force-pushed the ruby-integration-test branch 3 times, most recently from 15c373e to 3901412 Compare January 15, 2019 15:03
Makes room to make the tests more like ruby tests, which will be
employed heavily.
Instead of running integration tests directly in Go, we run it in Ruby
by calling Ghostferry as a subprocess. The idea is that we will be able
to test the interrupt/resume work more easily as well as making
integration tests easier to write.

In order to allow the ruby tests to inject race conditions for testing
purposes, we need a way for the ruby code to pause and restart the
execution of Go code at strategic locations. This is done with a local
Unix socket. At locations of interests, the Go code will send some
string via the Unix socket to the Ruby code. The ruby code will perform
some sort of callback (such as injecting data, or locking a row) and
then send a command back to the Go code via the same Unix socket,
allowing it to continue executing. Note that each time we send some
status to the ruby server, we do it in a new connection. This is to
avoid race conditions if one shared connection was used between all the
goroutines we employ inside Ghostferry.

Stdout and stderr are captured and they can be asserted
against/examined, more helper methods can be created later to give
better structures for the stdout/stderr output such as being able to
recognize a panic, or recognize particular stages of execution from
within the logs.

Lastly, the intention is to replace all integration tests with this
framework in the future as it would likely be more robust and cause less
issues such as goroutine leaks during the go test run. However, for the
time being we will have both integration tests in Go and Ruby.
The DataWriterHelper is a straight port from data_writer.go, which is a
multithreaded data writer that writes random data to the database.

The DbHelper helps connects to the database, seed it, and verify that
the source matches the target so that tests don't have to implement it
over and over again.
Added some trivial integration tests and some interrupt/resume tests.

Note the interrupt and resume works on master as of this commit without
reconciler because the schema didn't change.
@shuhaowu
Copy link
Contributor Author

I've updated this PR to follow more of a Ruby convention:

  • Uses bundle exec rake test
  • require "test_helper"
  • Removed GhostferryIntegration module and moved functionality into *Helper modules that are included in GhostferryTestCase.
  • Avoided implicit instance variables like @ghostferry and force tests to create their own instances of ghostferry via ghostferry = new_ghostferry(minimal); ghostferry.run.
  • Avoided ambiguously named functions like use_datawriter in favour of start_datawriter_with_ghostferry(datawriter, ghostferry); stop_datawriter_during_cutover(datawriter, ghostferry)
  • Used WEBrick::HTTPServer instead of UNIX socket
  • Moved all go test code from test/*.go into test/go/*.go

end
end

class DataWriter
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much of this is translated directly from data_writer.go. The goal is eventually we can use Ruby tests only.

[source_metrics, target_metrics]
end

def table_metric(conn, table, sample_id: nil)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big fan of this name, any ideas?

Copy link
Contributor

@hkdsun hkdsun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So much better 👍 I found the Ghostferry helper very hard to read still but that can be massaged as we go along maybe.

Looks good to go functionally though!

assert_equal(
source[DEFAULT_FULL_TABLE_NAME][:sample_row],
target[DEFAULT_FULL_TABLE_NAME][:sample_row],
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the checksum assertion sufficient? I'm having trouble understanding where you're going with the sample rows

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The checksum is likely sufficient and the row count (especially since you can have a checksum of 0 if a table is empty), I'm just being defensive. I'll remove this line.

refute dumped_state["LastSuccessfulPrimaryKeys"].nil?
refute dumped_state["CompletedTables"].nil?
refute dumped_state["LastWrittenBinlogPosition"].nil?
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit odd to have as a global helper and could easily go out of date. You could make it a private method where it's being used right now for the first concern. For the going out of date... maybe we need to write some negative tests? i.e. "if i remove field X, then we can't resume correctly?" or something like that?

if err != nil {
panic(err)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe keep IntegrationFerry in the same file for now? Or did you have a purpose in mind for having a separate IntegrationFerry abstraction ?

end

# When using this method, you need to call it within the block of
# GhostferryIntegration::TestCase#with_state_cleanup to ensure that the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with_state_cleanup doesn't exist?

end

dumped_state = ghostferry.run_expecting_interrupt
assert_basic_fields_exist_in_dumped_state(dumped_state)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we assert more than that at this point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to assert the state of the database? It's difficult to know exactly what's going on due to the semi-randomness of the data writer and binlog streamer.

I'll try to add a test without the datawriter where we can better check the state of the target database.


start_datawriter_with_ghostferry(datawriter, ghostferry)
stop_datawriter_during_cutover(datawriter, ghostferry)
ghostferry.run(dumped_state)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert_test_table_is_identical ?


class InterruptResumeTest < GhostferryTestCase
def test_interrupt_resume_with_writes_to_source
seed_simple_database_with_single_table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def setup?

// These should be kept in sync with ghostferry.rb
portEnvName string = "GHOSTFERRY_INTEGRATION_PORT"
timeout time.Duration = 30 * time.Second
maxMessageSize int = 256
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is unused


# Keep these in sync with integrationferry.go
ENV_KEY_PORT = "GHOSTFERRY_INTEGRATION_PORT"
MAX_MESSAGE_SIZE = 256
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused as well

StatusRowCopyCompleted string = "ROW_COPY_COMPLETED"
StatusDone string = "DONE"

// Could be sent by multiple goroutines in parallel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this distinction matter?

@shuhaowu
Copy link
Contributor Author

I've addressed all the feedback as well as slightly reorganized the ghostferry_helper module with some comments to make it less all over the place. Hopefully that will help with some of the readability. We can improve it incrementally as we go forward.

@shuhaowu shuhaowu merged commit 07831b1 into Shopify:master Jan 21, 2019
@shuhaowu shuhaowu deleted the ruby-integration-test branch January 21, 2019 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants