Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server,etc: Add detailed tracing to the startup process #8712

Merged
merged 1 commit into from Aug 21, 2016

Conversation

bdarnell
Copy link
Member

@bdarnell bdarnell commented Aug 20, 2016

This will allow us to test the hypothesis in #8694 that our startup
time is resulting in a 20-second timeout in GRPC's TLS handshake.


This change is Reviewable

@bdarnell
Copy link
Member Author

@cockroachdb/stability

@bdarnell bdarnell added the S-1-stability Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting label Aug 20, 2016
@RaduBerinde
Copy link
Member

Hi Ben,

The change LGTM, however it's going to conflict on develop with #8625 which plumbs a "base" context through to Store (CCing @andreimatei as well)


Review status: 0 of 6 files reviewed at latest revision, all discussions resolved, some commit checks failed.


Comments from Reviewable

@bdarnell
Copy link
Member Author

Are there changes that could be made to this PR that would minimize the conflicts, or are you volunteering for the next master -> develop merge?

@RaduBerinde
Copy link
Member

I can work on the merge. I'm not entirely sure how to best implement it under the model we've been working on, but we'll figure it out. Can you leave some basic instructions in the commit message on how you tested the change?

@tamird: I think the best strategy would be to merge everything up to this commit, then I will work on a PR for this, and once that's merged the normal process can resume.


Review status: 0 of 6 files reviewed at latest revision, all discussions resolved, some commit checks failed.


Comments from Reviewable

@petermattis
Copy link
Collaborator

:lgtm:

I made some comments below about adding more detail to the trace, but as I think about it now, that probably isn't necessary if you're primarily looking at this for timing information.

Was there a strong reason to use tracing to gather this information? Seems like you could have also used standard log messages.


Review status: 0 of 10 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


cli/start.go, line 356 [r1] (raw file):

  if err := serverCtx.InitStores(stopper); err != nil {
      return fmt.Errorf("failed to initialize stores: %s", err)
  }

log.Trace(startCtx, "initialized engines")?


server/node.go, line 402 [r1] (raw file):

  for _, e := range engines {
      s := storage.NewStore(n.ctx, e, &n.Descriptor)
      log.Trace(ctx, "created store object")

Can/should we output the store and engine?

log.Tracef(ctx, "created store: %s [%s] ", s, e)

server/node.go, line 462 [r1] (raw file):

      n.initNodeID(0)
      n.initialBoot = true
      log.Trace(ctx, "allocated node ID")

Output node ID?


server/server.go, line 329 [r1] (raw file):

      return err
  }
  log.Tracef(ctx, "bound to port %s", s.ctx.Addr)

Well, we're listening too, right? s/bound to port/listening on/g


server/server.go, line 468 [r1] (raw file):

      netutil.FatalIfUnexpected(m.Serve())
  })
  log.Trace(ctx, "starting to accept connections")

s/starting to accept/accepting/g


Comments from Reviewable

@bdarnell
Copy link
Member Author

Re: tracing vs logging: traces are much more accessible than logging. The beta cluster is currently retaining less than a day's worth of logging history (which is a problem in its own right...) and even if it were there we'd have to grep through a lot of rotated log files to find the one that contained the server start. The trace makes it network-accessible and retains it for the life of the server (it would be even better if we refactored to let us use the net/trace "family" parameter; for now we're relying on the fact that this trace is a duration outlier). I also don't think we'd be happy with this level of verbosity in log files, but it's fine in a trace.

To test this, I've been applying this patch to basictracer-go/events/event_nettrace.go:

diff --git a/events/event_nettrace.go b/events/event_nettrace.go
index 08dcc81..e3d540a 100644
--- a/events/event_nettrace.go
+++ b/events/event_nettrace.go
@@ -13,6 +13,7 @@ var NetTraceIntegrator = func() func(basictracer.SpanEvent) {
        switch t := e.(type) {
        case basictracer.EventCreate:
            tr = trace.New("tracing", t.OperationName)
+           tr.SetMaxEvents(100)
        case basictracer.EventFinish:
            tr.Finish()
        case basictracer.EventLog:

and viewing the results in the UI at http://localhost:8080/debug/requests?fam=tracing&b=4&exp=1.


Review status: 0 of 10 files reviewed at latest revision, 5 unresolved discussions, some commit checks pending.


cli/start.go, line 356 [r1] (raw file):

Previously, petermattis (Peter Mattis) wrote…

log.Trace(startCtx, "initialized engines")?

The method name InitStores is misleading; it's just parsing the store-related command-line flags here.

server/node.go, line 402 [r1] (raw file):

Previously, petermattis (Peter Mattis) wrote…

Can/should we output the store and engine?

log.Tracef(ctx, "created store: %s [%s] ", s, e)
The store isn't initialized enough to have a useful string representation, but added the engine.

server/node.go, line 462 [r1] (raw file):

Previously, petermattis (Peter Mattis) wrote…

Output node ID?

Done.

server/server.go, line 329 [r1] (raw file):

Previously, petermattis (Peter Mattis) wrote…

Well, we're listening too, right? s/bound to port/listening on/g

Done.

Comments from Reviewable

This will allow us to test the hypothesis in cockroachdb#8694 that our startup
time is resulting in a 20-second timeout in GRPC's TLS handshake.
@tamird
Copy link
Contributor

tamird commented Aug 21, 2016

Reviewed 4 of 10 files at r1, 5 of 6 files at r2.
Review status: 9 of 10 files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


server/server.go, line 329 [r1] (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Done.

s.ctx.Addr is more than just a port

Comments from Reviewable

@bdarnell bdarnell merged commit 6c01f47 into cockroachdb:master Aug 21, 2016
@bdarnell bdarnell deleted the startup-tracing branch August 21, 2016 13:55
knz added a commit to knz/cockroach that referenced this pull request Oct 20, 2022
Prior to this patch, the code in `runStartSQL` was using
a different sequence of steps than in `runStart`, even
for those steps that are beneficial to run in both cases.

This commit fixes that. In particular it adds the following
missing bits to `cockroach mt start-sql`:

- it fixes support for the (test-only)
  `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from cockroachdb#4754)

- it adds a tracing span for the startup code. (from cockroachdb#8712)

- it properly supports `--listening-url-file`. (from cockroachdb#15468)

- it properly does sanitization of `--external-io-dir`. (from cockroachdb#19725)

- it sets the proper log severity level for gRPC. (from cockroachdb#20308)

- it reports the command-line and env config to logs. (from cockroachdb#21344)

- it prevents startup if there is a `_CRITICAL_ALERT.txt` file
  in the store directory. (from cockroachdb#42401)

- sets the umask for newly created file to remove "other" permission
  bits. This was a security team request originally. (from cockroachdb#44043)

- it sets `GOMAXPROCS` from current cgroup limits. (from cockroachdb#57390)

- it stops the server early if the storage disk is full. (from cockroachdb#66893)

- it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env
  var. (from cockroachdb#73876)

To review this commit, it is useful to open the files
`start.go` and `mt_start_sql.go` side-by-side, which
clarifies how much closer they have become to each other.
Looking at the `mt_start_sql.go` changes without
that context likely makes the review more difficult.

A later commit will actually merge the two code paths together so
there is just one thing to maintain.

Release note: None
knz added a commit to knz/cockroach that referenced this pull request Oct 28, 2022
Prior to this patch, the code in `runStartSQL` was using
a different sequence of steps than in `runStart`, even
for those steps that are beneficial to run in both cases.

This commit fixes that. In particular it adds the following
missing bits to `cockroach mt start-sql`:

- it fixes support for the (test-only)
  `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from cockroachdb#4754)

- it adds a tracing span for the startup code. (from cockroachdb#8712)

- it properly supports `--listening-url-file`. (from cockroachdb#15468)

- it properly does sanitization of `--external-io-dir`. (from cockroachdb#19725)

- it sets the proper log severity level for gRPC. (from cockroachdb#20308)

- it reports the command-line and env config to logs. (from cockroachdb#21344)

- it prevents startup if there is a `_CRITICAL_ALERT.txt` file
  in the store directory. (from cockroachdb#42401)

- sets the umask for newly created file to remove "other" permission
  bits. This was a security team request originally. (from cockroachdb#44043)

- it sets `GOMAXPROCS` from current cgroup limits. (from cockroachdb#57390)

- it stops the server early if the storage disk is full. (from cockroachdb#66893)

- it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env
  var. (from cockroachdb#73876)

To review this commit, it is useful to open the files
`start.go` and `mt_start_sql.go` side-by-side, which
clarifies how much closer they have become to each other.
Looking at the `mt_start_sql.go` changes without
that context likely makes the review more difficult.

A later commit will actually merge the two code paths together so
there is just one thing to maintain.

Release note: None
knz added a commit to knz/cockroach that referenced this pull request Oct 28, 2022
Prior to this patch, the code in `runStartSQL` was using
a different sequence of steps than in `runStart`, even
for those steps that are beneficial to run in both cases.

This commit fixes that. In particular it adds the following
missing bits to `cockroach mt start-sql`:

- it fixes support for the (test-only)
  `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from cockroachdb#4754)

- it adds a tracing span for the startup code. (from cockroachdb#8712)

- it properly supports `--listening-url-file`. (from cockroachdb#15468)

- it properly does sanitization of `--external-io-dir`. (from cockroachdb#19725)

- it sets the proper log severity level for gRPC. (from cockroachdb#20308)

- it reports the command-line and env config to logs. (from cockroachdb#21344)

- it prevents startup if there is a `_CRITICAL_ALERT.txt` file
  in the store directory. (from cockroachdb#42401)

- sets the umask for newly created file to remove "other" permission
  bits. This was a security team request originally. (from cockroachdb#44043)

- it sets `GOMAXPROCS` from current cgroup limits. (from cockroachdb#57390)

- it stops the server early if the storage disk is full. (from cockroachdb#66893)

- it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env
  var. (from cockroachdb#73876)

To review this commit, it is useful to open the files
`start.go` and `mt_start_sql.go` side-by-side, which
clarifies how much closer they have become to each other.
Looking at the `mt_start_sql.go` changes without
that context likely makes the review more difficult.

A later commit will actually merge the two code paths together so
there is just one thing to maintain.

Release note: None
knz added a commit to knz/cockroach that referenced this pull request Oct 31, 2022
Prior to this patch, the code in `runStartSQL` was using
a different sequence of steps than in `runStart`, even
for those steps that are beneficial to run in both cases.

This commit fixes that. In particular it adds the following
missing bits to `cockroach mt start-sql`:

- it fixes support for the (test-only)
  `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from cockroachdb#4754)

- it adds a tracing span for the startup code. (from cockroachdb#8712)

- it properly supports `--listening-url-file`. (from cockroachdb#15468)

- it properly does sanitization of `--external-io-dir`. (from cockroachdb#19725)

- it sets the proper log severity level for gRPC. (from cockroachdb#20308)

- it reports the command-line and env config to logs. (from cockroachdb#21344)

- it prevents startup if there is a `_CRITICAL_ALERT.txt` file
  in the store directory. (from cockroachdb#42401)

- sets the umask for newly created file to remove "other" permission
  bits. This was a security team request originally. (from cockroachdb#44043)

- it sets `GOMAXPROCS` from current cgroup limits. (from cockroachdb#57390)

- it stops the server early if the storage disk is full. (from cockroachdb#66893)

- it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env
  var. (from cockroachdb#73876)

To review this commit, it is useful to open the files
`start.go` and `mt_start_sql.go` side-by-side, which
clarifies how much closer they have become to each other.
Looking at the `mt_start_sql.go` changes without
that context likely makes the review more difficult.

A later commit will actually merge the two code paths together so
there is just one thing to maintain.

Release note: None
craig bot pushed a commit that referenced this pull request Oct 31, 2022
90176: cli/start: unify code between `cockroach start` and `cockroach mt start-sql` r=andreimatei a=knz

Fixes #89974.
Fixes #90831.
Fixes #90524.

This PR merges the server initialization code between `cockroach start` and `cockroach mt start-sql`.

In doing so, it brings `cockroach mt start-sql` closer to what we expect from proper CockroachDB server processes:

- it fixes support for the (test-only)
  `COCKROACH_EXPERIMENTAL_LINEARIZABLE` env var. (from #4754)

- it adds a tracing span for the startup code. (from #8712!!)

- it properly supports `--listening-url-file`. (from #15468)

- it properly does sanitization of `--external-io-dir`. (from #19725)

- it sets the proper log severity level for gRPC. (from #20308)

- it reports the command-line and env config to logs. (from #21344)

- it prevents startup if there is a `_CRITICAL_ALERT.txt` file
  in the store directory. (from #42401)

- sets the umask for newly created file to remove "other" permission
  bits. This was a security team request originally. (from #44043)

- it recovers support for `DISABLE_STARTING_BACKGROUND_JOBS`. (from #44786)

- it sets `GOMAXPROCS` from current cgroup limits. (from #57390)

- it stops the server early if the storage disk is full. (from #66893)

- it fixes support for the `COCKROACH_DISABLE_SPAN_CONFIGS` config env
  var. (from #73876)

See the individual commit for details.



90660: sql: add contention_events to cluster_execution_insights r=j82w a=j82w

The original contention column will remain
to make query operations faster. The events
are being put into a json column because it's
possible there could be multiple blocking events
for a single statement. The json column avoids the complexity of adding another table and keeping it
in sync with the insights table.

The table can be joined with index_columns and tables to get the database name, table name, and index name that contention occurred on. This does not contain the blocking statement information, and the blocking fingerprint id.

closes: #88561

Release note (sql change): Adds contention_events
to cluster_execution_insights. This is used
to see which transaction is blocking the specific
statement.

90719: opgen: added a bool field in struct opgen.transition r=Xiang-Gu a=Xiang-Gu

This PR adds a bool field in struct opgen.transition that indicates whether it results from a `equiv(xx)` transition spec in the opgen file. It will be useful for a test where we need to find the inital status on a adding/dropping path. Without such a change, it can be problematic if we have a `equiv(xx)` spec as the first transition. E.g.
```
  ToAbsent(
    PUBLIC,
    equiv(VALIDATED),
    to(WRITE_ONLY),
    to(ABSENT),
  )
```
Without this change, the inital status will confusingly be `VALIDATED`, and the next status will be `PUBLIC`.
With this change, the initial status will be `PUBLIC`, and the next status will be `WRITE_ONLY`.

We also added some comments when we make transitions from the specs.

Epic: None
Release note: None

90865: sql: use bare name string of new pk  to compare with pk name when altering primary key r=chengxiong-ruan a=chengxiong-ruan

Fixes #90836

Release note (sql change): previously, the `DROP CONSTRAINT, ADD CONSTRAINT` in one trick to have a new primary key without moving old primary key to be a secondary index didn't work if the primary key name is a reserved SQL keyword. A `constraint already exists` error was returned. This patch fixed the bug, the trick now also works with primary key named as reserved keywords.

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: j82w <jwilley@cockroachlabs.com>
Co-authored-by: Xiang Gu <xiang@cockroachlabs.com>
Co-authored-by: Chengxiong Ruan <chengxiongruan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-1-stability Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants