Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

teamcity: excessive number of full builds during each CI run #64263

Open
knz opened this issue Apr 27, 2021 · 3 comments
Open

teamcity: excessive number of full builds during each CI run #64263

knz opened this issue Apr 27, 2021 · 3 comments
Labels
A-build-system C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-productivity Severe issues that impede the productivity of CockroachDB developers. T-dev-inf

Comments

@knz
Copy link
Contributor

knz commented Apr 27, 2021

I just noticed that every CI runs compiles the entire crdb binary, including all web assets etc, a minimum of 7 times:

  • The acceptance CI target runs teamcity-acceptance.sh which in turn runs pkg/acceptance/prepare.sh which in turn runs builder.sh mkrelease linux-gnu

    In other words, the acceptance CI target builds its own crdb executable.

  • Meanwhile, the Build test binary CI target runs teamcity-build-test-binary.sh which in turn runs builder.sh mkrelease linux-gnu

    In other words, the "build test binary" CI target builds its own linux crdb executable.

  • Meanwhile, the roachtest CI target runs teamcity-local-roachtest.sh which in turn runs builder.sh make build

    In other words, the roachtest CI target builds its own linux crdb executable.

  • Meanwhile, the Compile builds CI target runs teamcity-compile-builds.sh which in turn runs pkg/cmd/compile-builds which in turn runs release.MakeReleases with "all targets" as arguments, which in turns run builder.sh mkrelease on each supported target.

    In other words, the "Compile builds" CI target builds 3+ different crdb executables, including the linux-gnu one.

  • Meanwhile, the verify archive CI target runs teamcity-verify-archive.sh which in turns create a tar archive of the source directory, then extracts it, then checks that make clean && make install works on it.

    In other words, the "verify archive" CI target builds its own linux crdb executable.

We could shave some time on the entire CI run (as well as disk space, time collecting artifacts etc) by only building the linux executable just one time as pre-requisite for other CI targets.

I would recommend using verify archive as a starting point, since we also want to assert that the code can be built from a downloadable source archive.

cc @jlinder for triage

Epic CRDB-10439

Jira issue: CRDB-6973

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-productivity Severe issues that impede the productivity of CockroachDB developers. A-build-system T-dev-inf labels Apr 27, 2021
@knz
Copy link
Contributor Author

knz commented Apr 27, 2021

Another reason why it would be desirable to have a build target as prereq for the other targets: in case the source doesn't build, we would only hog 1 agent to report that error and abort the entire CI pipeline, instead of spawning 8 agents upfront.

@knz
Copy link
Contributor Author

knz commented Apr 27, 2021

Here's a sketch of a dep tree:

  1. new variant of "verify archive" which extracts the source then runs builder.sh mkrelease linux-gnu (instead of make install), then stores the produced binary somewhere where the other targets can find it. Possibly also runs make install separately to ensure the binary produced by mkrelease can be installed.
  2. depending on (1): "compile builds" but exclude the linux binary, since it was already built in (1)
  3. depending on (1): roachtest, and remove the "make build" step from the script
  4. depending on (1): acceptance, and remove the "mkrelease" step from the script

Then we can remove the "Build test binary" step altogether.

Since the CI config change and the script updates are not atomic, we'll need to have a "transition phase" where the scripts work in both configurations. For this we can modify the scripts to expect a pre-built executable in a fixed path, and use if ! test -e path/to/executable; then ...build...; fi to build it on demand if it's not there yet.

erikgrinaker added a commit to erikgrinaker/cockroach that referenced this issue Jul 26, 2021
The TeamCity nightly tests depend on the "publish bleeding edge" build,
which pulls in a ton of additional builds (e.g. for multiple OSes) that
we don't really need to run the nightlies. This can cause unrelated
build failures to fail the nightly tests.

This patch explicitly builds a CRDB binary in
`teamcity-nightly-roachtest.sh` such that the build can be decoupled
from "publish bleeding edge". This is a temporary mitigation until the
TeamCity builds and dependencies are cleaned up.

Touches https://cockroachlabs.atlassian.net/browse/DEVINF-16.
Touches cockroachdb#64263.

Release note: None
erikgrinaker added a commit to erikgrinaker/cockroach that referenced this issue Jul 26, 2021
The TeamCity nightly tests depend on the "publish bleeding edge" build,
which pulls in a ton of additional builds (e.g. for multiple OSes) that
we don't really need to run the nightlies. This can cause unrelated
build failures to fail the nightly tests.

This patch explicitly builds a CRDB binary in
`teamcity-nightly-roachtest.sh` such that the build can be decoupled
from "publish bleeding edge". This is a temporary mitigation until the
TeamCity builds and dependencies are cleaned up.

Touches https://cockroachlabs.atlassian.net/browse/DEVINF-16.
Touches cockroachdb#64263.

Release note: None
erikgrinaker added a commit to erikgrinaker/cockroach that referenced this issue Jul 28, 2021
The TeamCity nightly tests depend on the "publish bleeding edge" build,
which pulls in a ton of additional builds (e.g. for multiple OSes) that
we don't really need to run the nightlies. This can cause unrelated
build failures to fail the nightly tests.

This patch explicitly builds a CRDB binary in
`teamcity-nightly-roachtest.sh` such that the build can be decoupled
from "publish bleeding edge". This is a temporary mitigation until the
TeamCity builds and dependencies are cleaned up.

Touches https://cockroachlabs.atlassian.net/browse/DEVINF-16.
Touches cockroachdb#64263.

Release note: None
erikgrinaker added a commit to erikgrinaker/cockroach that referenced this issue Jul 28, 2021
The TeamCity nightly tests depend on the "publish bleeding edge" build,
which pulls in a ton of additional builds (e.g. for multiple OSes) that
we don't really need to run the nightlies. This can cause unrelated
build failures to fail the nightly tests.

This patch explicitly builds a CRDB binary in
`teamcity-nightly-roachtest.sh` such that the build can be decoupled
from "publish bleeding edge". This is a temporary mitigation until the
TeamCity builds and dependencies are cleaned up.

Touches https://cockroachlabs.atlassian.net/browse/DEVINF-16.
Touches cockroachdb#64263.

Release note: None
erikgrinaker added a commit to erikgrinaker/cockroach that referenced this issue Jul 29, 2021
The TeamCity nightly tests depend on the "publish bleeding edge" build,
which pulls in a ton of additional builds (e.g. for multiple OSes) that
we don't really need to run the nightlies. This can cause unrelated
build failures to fail the nightly tests.

This patch explicitly builds a CRDB binary in
`teamcity-nightly-roachtest.sh` such that the build can be decoupled
from "publish bleeding edge". This is a temporary mitigation until the
TeamCity builds and dependencies are cleaned up.

Touches https://cockroachlabs.atlassian.net/browse/DEVINF-16.
Touches cockroachdb#64263.

Release note: None
craig bot pushed a commit that referenced this issue Jul 29, 2021
67325: build: add TeamCity build script to stress roachtests r=tbg a=erikgrinaker

### roachtest: handle multiple runs of same test on TeamCity

The TeamCity markers output via `--teamcity` currently do not handle
multiple runs of the same test via `--count`, since they use the same
flow ID for all runs, making TeamCity unable to distinguish between
them. A similar problem exists with artifacts, where each run would
publish the root `artifacts/` directory rather than the `run_`
subdirectories.

This patch fixes the above issues by including run identifiers where
appropriate.

Release note: None

### roachtest: add roachtest/noop test

This adds a `roachtest/noop` test that instantly succeeds without
spinning up any servers, and a `roachtest/noop-maybefail` test that
randomly fails with 20% probability. These can be useful when testing
e.g. CI builds or other test infrastructure. They are disabled by
default and require the `roachtest` tag.

Release note: None

### roachtest: add --disable-issue flag to disable GitHub issues

Release note: None

### build: add TeamCity build script to stress roachtests

This adds a build script that can be used to stress roachtests on
TeamCity, in order to reproduce failures. The corresponding TeamCity
build is `Cockroach_Nightlies_RoachtestStress`.

Release note: None

### build: explicitly build binary for TeamCity nightly tests

The TeamCity nightly tests depend on the "publish bleeding edge" build,
which pulls in a ton of additional builds (e.g. for multiple OSes) that
we don't really need to run the nightlies. This can cause unrelated
build failures to fail the nightly tests.

This patch explicitly builds a CRDB binary in
`teamcity-nightly-roachtest.sh` such that the build can be decoupled
from "publish bleeding edge". This is a temporary mitigation until the
TeamCity builds and dependencies are cleaned up.

Touches https://cockroachlabs.atlassian.net/browse/DEVINF-16.
Touches #64263.

Release note: None

---

This is still a work-in-progress, but I suggest we merge the script itself
which makes it easier to generalize the TeamCity build and work on e.g.
branch specification.

There's a bit of work remaining on the actual build, which will be addressed
separately:

* Notifications: TeamCity does not expose the email address of the person
  who started the build, so for now we'll send notifications to the Slack
  channel `#roachtest-stress-ops`.

* Running clusters: We need to let the user know about any test clusters
  that are still running, so that these can be inspected and shut down as
  appropriate. We'll probably also need to change how roachtest extends
  cluster lifetimes, since we want to reuse clusters for tests but make sure
  every test invocation will extend the lifetime to 36 hours from the start
  of the test (not the cluster creation, which is currently the case).

* Trigger link: We should include a link on test failures to a TeamCity form
  for starting a stress build (if possible), or at least instructions for how
  to start one.

* Branch/commit selection: We need to make sure it's easy for users to pick
  which branch or commit to build.

68134: tenantcostserver: add TenantID to TokenBucketRequest r=RaduBerinde a=RaduBerinde

This should address the "error issuing TokenBucket RPC" loop observed in #65830.

Informs #65830.

#### tenantcostserver: add Error field

This change adds an Error field to TokenBucketResponse, allowing the
Connector to differentiate between RPC errors and logical errors
(similar to RangeLookup).

This fixes an infinite retry loop caused by a logical issue.

Release note: None

#### tenantcostserver: add TenantID to TokenBucketRequest

We are using TenantFromContext to identify the tenant from which a
TenantBucketRequest is coming from (similarly to other connector
APIs). Unfortunately, this does not work when the connections are
insecure (as is the case in some tests and possibly when testing
manually during development).

This commit adds a TenantID field. It is cross-checked against the
tenant when running in secure mode.

Release note: None

Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
@github-actions
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-build-system C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-productivity Severe issues that impede the productivity of CockroachDB developers. T-dev-inf
Projects
None yet
Development

No branches or pull requests

1 participant