Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dist-txns] About 6K logs per hr of the form [transaction_participant.cc:1005] <tablet-id>: Transaction not found: <txn-id> #479

Closed
kmuthukk opened this issue Sep 24, 2018 · 2 comments
Assignees
Labels
kind/enhancement This is an enhancement of an existing feature

Comments

@kmuthukk
Copy link
Collaborator

On a test cluster (3-node, 16-cpu machines), running the UniqueSecondaryIndex workload:

% java -jar yb-sample-apps.jar -workload CassandraUniqueSecondaryIndex --nodes $CIP_ADDR -num_threads_read 1 -num_threads_write 128 --num_unique_keys 1000000000 --num_writes 1000000000 --nouuid

Seeing some internal logging. Not seeing any external issues per se, but perhaps the internal retry logic is logging too unnecessarily? Or is this is an unexpected case that we should optimize further for.

centos@ip-172-151-28-173 logs]$ zgrep "0923 22:" yb-tserver.ip-172-151-28-173.us-west-2.compute.internal.yugabyte.log.INFO.20180923-040859.20748.gz | grep "transaction_partic" | head -10
W0923 22:00:01.118551 39933 transaction_participant.cc:263] Tablet application did not catch up in: 10.000s
I0923 22:00:01.119123 39933 transaction_participant.cc:290] Cleanup Aborts Task finished.
I0923 22:00:49.733860 39983 transaction_participant.cc:995] c96da8521b1e433882f6ca735cda13fc: Loading transaction: f9764372-fd6b-42d4-9254-da65ada29eb4, for: status
W0923 22:00:49.737884 39983 transaction_participant.cc:1005] c96da8521b1e433882f6ca735cda13fc: Transaction not found: f9764372-fd6b-42d4-9254-da65ada29eb4
I0923 22:00:49.738711 39983 transaction_participant.cc:995] c96da8521b1e433882f6ca735cda13fc: Loading transaction: 06000000-0000-0000-ffff-ffff117f0000, for: status
W0923 22:00:49.878545 39983 transaction_participant.cc:1005] c96da8521b1e433882f6ca735cda13fc: Transaction not found: 885d5c30-0000-0000-80ea-22e000000000
I0923 22:00:49.887105 39983 transaction_participant.cc:995] c96da8521b1e433882f6ca735cda13fc: Loading transaction: e8000000-0000-0000-8305-010000000000, for: status
W0923 22:00:49.887171 39983 transaction_participant.cc:1005] c96da8521b1e433882f6ca735cda13fc: Transaction not found: 03000000-0000-0000-0000-000000000000
I0923 22:00:49.887203 39983 transaction_participant.cc:995] c96da8521b1e433882f6ca735cda13fc: Loading transaction: e8000000-0000-0000-4022-62db00000000, for: status
W0923 22:00:49.887238 39983 transaction_participant.cc:1005] c96da8521b1e433882f6ca735cda13fc: Transaction not found: e8000000-0000-0000-4022-62db00000000

In terms of count, just grepping for "0923 22" (i.e. Sep/23 10pm log lines), we see about 6.9K occurrences each hour of each of the above two log lines.

$ zgrep "0923 22:" yb-tserver.ip-172-151-28-173.us-west-2.compute.internal.yugabyte.log.INFO.20180923-040859.20748.gz | grep "Transaction not found" | wc -l
6928
$ zgrep "0923 22:" yb-tserver.ip-172-151-28-173.us-west-2.compute.internal.yugabyte.log.INFO.20180923-040859.20748.gz | grep "Loading transaction" | wc -l
6928
@kmuthukk kmuthukk added this to To Do in Distributed transactions via automation Sep 24, 2018
@kmuthukk kmuthukk added the kind/enhancement This is an enhancement of an existing feature label Sep 24, 2018
@kmuthukk
Copy link
Collaborator Author

Similar count during previous "hour" windows...

$ zgrep "0923 21:" yb-tserver.ip-172-151-28-173.us-west-2.compute.internal.yugabyte.log.INFO.20180923-040859.20748.gz | grep "Transaction not found" | wc -l
6968
$ zgrep "0923 20:" yb-tserver.ip-172-151-28-173.us-west-2.compute.internal.yugabyte.log.INFO.20180923-040859.20748.gz | grep "Transaction not found" | wc -l
7701
$ zgrep "0923 19:" yb-tserver.ip-172-151-28-173.us-west-2.compute.internal.yugabyte.log.INFO.20180923-040859.20748.gz | grep "Transaction not found" | wc -l
8247
$ zgrep "0923 18:" yb-tserver.ip-172-151-28-173.us-west-2.compute.internal.yugabyte.log.INFO.20180923-040859.20748.gz | grep "Transaction not found" | wc -l
8293

yugabyte-ci pushed a commit that referenced this issue Sep 26, 2018
… cleaned.

Summary:
If transaction was committed locally it should be removed of list of transactions for cleanup. But we also iterate over this list concurrently, so we could end up with invalid iterator.

Also we remove transaction from this list if remove end responded that it is not in aborted state.
This could end up with invalid iteration also, because we iterate w/o holding required mutex.

To address those issues I've introduced list of erased transactions, and just make original list cleanup when
statuses for all transactions are known.

Also fixed rendering collection of uuids to string.

Test Plan:
Run local cluster.
Launch the following workload with it.
java -jar java/yb-sample-apps.jar -workload CassandraUniqueSecondaryIndex --nodes 127.0.0.1:9042,127.0.0.2:9042,127.0.0.3:9042 -num_threads_read 1 -num_threads_write 16 --num_unique_keys 1000000000 --num_writes 1000000000 --nouuid

Wait for 45 minutes, check that we don't have invalid transaction ids in logs, i.e. ids with many zeroes.

Reviewers: mikhail, timur, robert, venkatesh

Reviewed By: venkatesh

Subscribers: kannan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D5506
@kmuthukk
Copy link
Collaborator Author

Fixed in 42a4bc6

Distributed transactions automation moved this from To Do to Done Sep 28, 2018
yugabyte-ci pushed a commit that referenced this issue Oct 15, 2018
Summary:
1) Added new log utility to use different severity depending on
   number of messages in specified period of time.
   Used it in TransactionCoordinator.

2) Use different severities in tablet_rpc, depending on status error type.

3) Change severity from WARNING to INFO in Tunnel.

Test Plan: Jenkins

Reviewers: mikhail, robert, timur

Reviewed By: timur

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D5569
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement This is an enhancement of an existing feature
Projects
Development

No branches or pull requests

2 participants