Skip to content

Conversation

@xinhaoz
Copy link
Member

@xinhaoz xinhaoz commented Jul 25, 2023

…og line

The TestDockerCLI_test_exec_log test had a section where it waits for the last executed stmt to appear in the exec_log file before proceeding to the next section. This is done by grep'ing the query string of the last stmt. This works fine most of the time, however, the 2nd last and last stmt executed happen to have the same query string, so it's possible that we are only seeing the 2nd to last stmt in the log. This can cause the test to fail as in the next section, we expect all query logs to be present in the file.

This commit fixes this by issuing a distinct query as the last query so that we can grep for a distinct query string to verify all stmts should have been written.

A new proc is also added to common.tcl: flush_and_sync_logs is a proc that takes 2 arguments, filename and grep_text. It will trigger a signal hang up to trigger a log flush and then attempt to find the grep_text in the specified file.

Epic: none
Fixes: #106367

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@xinhaoz xinhaoz marked this pull request as ready for review July 25, 2023 19:33
@xinhaoz xinhaoz requested a review from a team July 25, 2023 20:08
@maryliag maryliag requested a review from abarganier July 25, 2023 20:17
Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @xinhaoz)


pkg/cli/interactive_tests/test_exec_log.tcl line 117 at r1 (raw file):

# succeeds.
system "for i in `seq 1 3`; do
  grep 'SELECT ..*111..*' $logfile && exit 0;

Here's my mental model for this test:

  1. Issue some queries that will be logged to the exec log.
  2. Those query logs are buffered in CRDB, waiting to be flushed at some unpredictable time to the exec log.
  3. Trigger flush_server_logs, which returns once cockroach.log is flushed.
  4. Use the same loop logic to make assertions against the exec log, but this time look for a different query from the same file.

Don't we need to synchronize on the flush of the exec log to ensure that this race condition doesn't appear again? Maybe there's something about adding an additional query to the buffer of things waiting to be flushed to the exec log that I'm not seeing, let me know if so!

Based on what flush_server_logs is doing, what if we did something similar that's a bit more configurable? e.g.

proc flush_and_sync_file {filename, grep_text} {
    report "BEGIN FLUSH LOGS"
    system "kill -HUP `cat server_pid` 2>/dev/null"
    # Wait for flush to occur.
    system "for i in `seq 1 3`; do
              grep '$grep_text' $filename && exit 0;
              echo still waiting
              sleep 1
            done
            echo 'server failed to flush logs?'
            exit 1"
    report "END FLUSH LOGS"
}

# Then call in our test... NB: I'm not sure the exact filename
flush_and_sync_file "logs/db/logs/exec.log" "SELECT ..*550..* +"

I'm no expert when it comes to these TCL tests of course, just an idea. Lmk what you think!

…og line

The `TestDockerCLI_test_exec_log` test had a section where it
waits for the last executed stmt to appear in the exec_log file
before proceeding to the next section. This is done by grep'ing
the query string of the last stmt. This works fine most of the time,
 however, the 2nd last and last stmt executed happen to have the same
query string, so it's possible that we are only seeing the 2nd to last
stmt in the log. This can cause the test to fail as in the next section,
we expect all query logs to be present in the file.

This commit fixes this by issuing a distinct query as the last query
so that we can grep for a distinct query string to verify all stmts
should have been written.

A new proc is also added to `common.tcl`:
flush_and_sync_logs is a proc that takes 2 arguments, `filename` and
`grep_text`. It will trigger a signal hang up to trigger a log flush
and then attempt to find the `grep_text` in the specified file.

Epic: none
Fixes: cockroachdb#106367
@xinhaoz
Copy link
Member Author

xinhaoz commented Jul 27, 2023

pkg/cli/interactive_tests/test_exec_log.tcl line 117 at r1 (raw file):

Previously, abarganier (Alex Barganier) wrote…

Here's my mental model for this test:

  1. Issue some queries that will be logged to the exec log.
  2. Those query logs are buffered in CRDB, waiting to be flushed at some unpredictable time to the exec log.
  3. Trigger flush_server_logs, which returns once cockroach.log is flushed.
  4. Use the same loop logic to make assertions against the exec log, but this time look for a different query from the same file.

Don't we need to synchronize on the flush of the exec log to ensure that this race condition doesn't appear again? Maybe there's something about adding an additional query to the buffer of things waiting to be flushed to the exec log that I'm seeing, let me know if so!

Based on what flush_server_logs is doing, what if we did something similar that's a bit more configurable? e.g.

proc flush_and_sync_file {filename, grep_text} {
    report "BEGIN FLUSH LOGS"
    system "kill -HUP `cat server_pid` 2>/dev/null"
    # Wait for flush to occur.
    system "for i in `seq 1 3`; do
              grep '$grep_text' $filename && exit 0;
              echo still waiting
              sleep 1
            done
            echo 'server failed to flush logs?'
            exit 1"
    report "END FLUSH LOGS"
}

# Then call in our test... NB: I'm not sure the exact filename
flush_and_sync_file "logs/db/logs/exec.log" "SELECT ..*550..* +"

I'm no expert when it comes to these TCL tests of course, just an idea. Lmk what you think!

Sure, I like the idea of making it more generic 👍 I kept the grep text as the newly added line to make we wait for all logs to appear and increased the max attempts to 5. I'm not familiar with the log flushing process at that step so it might not be necessary so lmk!

@xinhaoz xinhaoz requested a review from abarganier July 27, 2023 20:59
Copy link
Contributor

@abarganier abarganier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong: - nice work & thank you!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @xinhaoz)


pkg/cli/interactive_tests/test_exec_log.tcl line 117 at r1 (raw file):

Previously, xinhaoz (Xin Hao Zhang) wrote…

Sure, I like the idea of making it more generic 👍 I kept the grep text as the newly added line to make we wait for all logs to appear and increased the max attempts to 5. I'm not familiar with the log flushing process at that step so it might not be necessary so lmk!

I have no problem with increasing the # of attempts to 5. The most brittle piece of this is assuming that the flush will take under X seconds, which isn't guaranteed (although it is likely). Increasing from 3->5 should help with that!

@xinhaoz
Copy link
Member Author

xinhaoz commented Jul 28, 2023

TFTR!
bors r=abarganier

@craig
Copy link
Contributor

craig bot commented Jul 28, 2023

Build succeeded:

@craig craig bot merged commit 7ba8059 into cockroachdb:master Jul 28, 2023
@xinhaoz xinhaoz deleted the fix-exec-log branch April 1, 2024 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

acceptance: TestDockerCLI_test_exec_log failed

3 participants