[Jepsen] Concurrent create-table calls can result in "Table Not Found" #798

Closed
aphyr opened this Issue Jan 24, 2019 · 4 comments

Comments

4 participants
@aphyr
commented Jan 24, 2019

On Yugabyte 1.1.10.0, when two or more threads create a keyspace, create a table in that keyspace, then insert a record into that table concurrently, inserts can throw Table Not Found errors, even though the call to create that table completed successfully just prior. For instance, this code from the Jepsen bank test reliably throws when creating accounts, unless execution is restricted to a single thread at a time:

      ;; Create the keyspace unless it already exists.
      (cql/create-keyspace conn keyspace
                           (if-not-exists)
                           (with
                             {:replication
                              {"class"              "SimpleStrategy"
                               "replication_factor" 3}}))
      (info "Creating table")
      ;; Create a transactional table in that keyspace.
      (cassandra/execute conn (str "CREATE TABLE IF NOT EXISTS " keyspace "." table-name
                                   " (id INT PRIMARY KEY, balance BIGINT)"
                                   " WITH transactions = { 'enabled' : true }"))
      ;; Insert one row per account; these inserts can throw Table Not Found.
      (dotimes [i n]
        (info "Creating account" i)
        (cql/insert-with-ks conn keyspace table-name {:id i :balance starting-balance}))

@kmuthukk suggests that this might be a case where a concurrent call to create-table observes another create-table in progress and returns immediately, without blocking until the table is completely ready.
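
The hypothesis above can be illustrated with a small deterministic toy model. This is only a sketch, not YugaByte code: `ToyMaster`, `TableNotFound`, and the explicit `finish_tablet_setup` step are hypothetical names invented for illustration, and the model assumes the table entry becomes visible to other callers before its tablets are ready.

```python
class TableNotFound(Exception):
    """Raised when an insert reaches a table whose tablets are not ready."""


class ToyMaster:
    """Hypothetical stand-in for a master; not the real YugaByte API."""

    def __init__(self):
        self.tables = {}  # table name -> True once its tablets are running

    def create_table_if_not_exists(self, name):
        if name in self.tables:
            # Another creator already registered the table: return at once,
            # without waiting for its tablets (the suspected bug).
            return "already_present"
        self.tables[name] = False  # registered, tablets still starting
        return "created"

    def finish_tablet_setup(self, name):
        self.tables[name] = True

    def insert(self, name, row):
        if not self.tables.get(name, False):
            raise TableNotFound(name)
        return row


def replay_race():
    """Replay one interleaving of two clients, step by step."""
    master = ToyMaster()
    first = master.create_table_if_not_exists("accounts")   # client A
    second = master.create_table_if_not_exists("accounts")  # client B
    try:
        master.insert("accounts", {"id": 0, "balance": 10})  # client B
        early_insert = "ok"
    except TableNotFound:
        early_insert = "table_not_found"
    master.finish_tablet_setup("accounts")  # client A's creation completes
    late_insert = master.insert("accounts", {"id": 0, "balance": 10})
    return first, second, early_insert, late_insert
```

In this interleaving, client B's create call returns while client A's table is still being set up, so B's first insert fails even though its own create call reported no error.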

@aphyr aphyr changed the title from "Concurrent CREATE-TABLE calls can result in "Table Not Found"" to "Concurrent create-table calls can result in "Table Not Found"" Jan 24, 2019

@kmuthukk kmuthukk added the bug label Jan 24, 2019

@kmuthukk kmuthukk added this to To Do in YBase features via automation Jan 24, 2019

@mbautin mbautin changed the title from "Concurrent create-table calls can result in "Table Not Found"" to "[Jepsen] Concurrent create-table calls can result in "Table Not Found"" Feb 7, 2019

@ravimurthy ravimurthy assigned hectorgcr and unassigned ravimurthy Feb 11, 2019

@hectorgcr

Contributor

commented Feb 13, 2019

With the YugaByte Cassandra Java driver (https://github.com/YugaByte/cassandra-java-driver), this issue doesn't happen. Before an insert, our driver issues an RPC to the master to get all the tablet locations, along with the key ranges they serve, so that it can send the request to the yb-tserver that is the leader of the tablet containing the key for our insert statement. If one of the tablets is not ready yet, the master returns an error, and our driver retries after a few seconds. In other words, even though the CREATE TABLE IF NOT EXISTS returns before the tablets are ready for writes, the master will not return the list of tablet servers until all the tablets are ready, so the INSERT will be delayed until then.

Here is part of the log immediately after an insert statement and before the tablets are ready:

W0213 00:54:00.757200 145428480 table-internal.cc:169] Error getting table locations: Service unavailable (yb/master/catalog_manager.cc:6245): Tablet not running, retrying.
2019-02-13 00:54:01,793 (cluster1-reconnection-0) [ERROR - com.datastax.driver.core.ControlConnection$1.onConnectionException(ControlConnection.java:176)] [Control connection] Cannot connect to any host, scheduling retry in 8000 milliseconds

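
The retry behavior described in this comment could look roughly like the sketch below. Everything in it is hypothetical: `ServiceUnavailable`, `fetch_locations`, the fake server names, and the retry parameters are illustrative and are not the YugaByte driver's actual API or defaults.

```python
import time


class ServiceUnavailable(Exception):
    """Hypothetical error standing in for 'Tablet not running'."""


def get_locations_with_retry(fetch_locations, max_attempts=5, delay_s=1.0,
                             sleep=time.sleep):
    """Call fetch_locations until it succeeds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_locations()
        except ServiceUnavailable:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(delay_s)  # wait before asking the master again


def make_flaky_fetch(failures):
    """Build a fake lookup that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def fetch():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ServiceUnavailable("Tablet not running")
        return ["tserver-1", "tserver-2", "tserver-3"]

    return fetch, state
```

With a lookup that fails twice, `get_locations_with_retry` returns the locations on the third call; the insert is simply delayed rather than rejected.
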
@aphyr

Author

commented Feb 13, 2019

I can observe this reliably with com.yugabyte/cassandra-driver-core, version 3.2.0-yb-19, via Yugabyte's fork of Cassaforte. Is it possible that there's a different version of Yugabyte's Cassandra Java driver which works differently? Or that Cassaforte is bypassing some safety code you've added?

@hectorgcr

Contributor

commented Feb 13, 2019

@aphyr never mind. It was a timing issue. I've been able to reproduce it in a unit test. I'll be submitting a fix soon.

@aphyr

Author

commented Feb 13, 2019

Glad to hear it! :)

yugabyte-ci pushed a commit that referenced this issue Feb 15, 2019

#798: Concurrent create-table calls can result in "Table Not Found"
Summary:
Currently, if a `CREATE TABLE IF NOT EXISTS` is sent concurrently with a `CREATE TABLE` request, the first statement can return successfully because the second statement has already created an entry in the master's memory. That does not mean the table is ready to receive requests, so an `INSERT` statement following the first statement can be rejected with a "Table Not Found" error.

Implementation-wise, this is what happens:
`Executor::ExecPTNode` creates a `YBTableCreator` to send the request. If the returned status for a `CREATE TABLE IF NOT EXISTS` request is `IsAlreadyPresent`, `Executor::ExecPTNode` returns `Status::OK`.

When the master leader returns `IsAlreadyPresent`, it only means that the table has been inserted in its memory. In order for the table to start accepting requests, all of its tablets have to be in the `RUNNING` state. In the case of a successful `CREATE TABLE` request, this is ensured by calling `YBClient::Data::WaitForCreateTableToFinish`, but we never wait when the error returned is `IsAlreadyPresent`.

This diff makes the following changes:
- `YBTableCreator::Create` ensures that `YBClient::Data::WaitForCreateTableToFinish` gets called when a `CREATE TABLE` succeeds, or when the returned error is `IsAlreadyPresent`.
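
The change can be sketched as follows. This is a simplified Python model, not the actual C++ from the diff: `FakeMaster`, `create_table_if_not_exists`, and `wait_for_create_table_to_finish` are hypothetical stand-ins for the master, `YBTableCreator::Create`, and `YBClient::Data::WaitForCreateTableToFinish`, and the polling loop with a deadline is an assumption about how such a wait might work.

```python
import time

CREATED, ALREADY_PRESENT = "created", "already_present"


class FakeMaster:
    """Hypothetical master whose tablets need a few polls to start."""

    def __init__(self, polls_until_running=3):
        self.polls_left = {}  # table name -> polls remaining until RUNNING
        self.polls_until_running = polls_until_running

    def create_table(self, name):
        if name in self.polls_left:
            return ALREADY_PRESENT
        self.polls_left[name] = self.polls_until_running
        return CREATED

    def is_create_done(self, name):
        # Each poll moves the toy tablets one step closer to RUNNING.
        if self.polls_left[name] > 0:
            self.polls_left[name] -= 1
        return self.polls_left[name] == 0


def wait_for_create_table_to_finish(master, name, timeout_s=5.0, poll_s=0.0):
    """Poll the master until the table's tablets run or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while not master.is_create_done(name):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"table {name} not ready")
        time.sleep(poll_s)


def create_table_if_not_exists(master, name):
    status = master.create_table(name)
    # Wait on both outcomes, so the table is usable whenever this returns.
    wait_for_create_table_to_finish(master, name)
    return status
```

The key point is that the wait no longer depends on which caller actually created the table: a caller that sees `ALREADY_PRESENT` blocks just like the one that saw `CREATED`.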

Test Plan: New unit tests

Reviewers: ravi, bogdan, mikhail, robert, amitanand

Reviewed By: amitanand

Subscribers: neha, kannan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D6156

@hectorgcr hectorgcr closed this Feb 16, 2019

YBase features automation moved this from To Do to Done Feb 16, 2019
