cdc/bank roachtest pull 260MB off a 3rd party vendor upon every CI run, and fails if upstream unavailable #51543

Open
knz opened this issue Jul 17, 2020 · 9 comments
Assignees
Labels
A-cdc Change Data Capture A-roachprod A-testing Testing tools and infrastructure branch-master Failures on the master branch. S-3-productivity Severe issues that impede the productivity of CockroachDB developers. T-cdc

Comments

@knz
Contributor

knz commented Jul 17, 2020

Describe the problem

The cdc/bank roachtest runs the following command every time it runs:

 curl -s https://packages.confluent.io/archive/4.0/confluent-oss-4.0.0-2.11.tar.gz | tar -xz -C /tmp/confluent

I went and checked and that is a 262MB archive to download (compressed).

The archive is not cached, unlike the builder image, so that's a mandatory ingress cost on every CI run.

Moreover, today the upstream HTTP server is saying "no", which is causing all the CI runs to fail.

Expected behavior

The archive should be embedded in the builder image, and/or the fetch should use a cached copy if it was already downloaded earlier on the TC agent.

(At the very least we should be fetching from a proxy cache inside the CRL infra so that the CI downloads are internal to GCP).
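A minimal sketch of the "use a cached copy if it was already downloaded" idea, in shell since that is what the test runs today. This is not what the roachtest does; `fetch_cached` and the cache directory are hypothetical names, and it assumes the TC agent has a directory that survives between runs.

```shell
# Hypothetical helper: download URL into CACHE_DIR only if no non-empty
# copy is already there, then print the path of the cached archive.
fetch_cached() {
  local url="$1" cache_dir="$2"
  local dest
  dest="${cache_dir}/$(basename "$url")"
  mkdir -p "$cache_dir"
  if [ ! -s "$dest" ]; then
    # Download to a temp file first so an interrupted fetch never
    # leaves a truncated archive in the cache.
    local tmp
    tmp="$(mktemp "${dest}.XXXXXX")"
    if ! curl -fsSL --retry 3 -o "$tmp" "$url"; then
      rm -f "$tmp"
      return 1
    fi
    mv "$tmp" "$dest"
  fi
  printf '%s\n' "$dest"
}

# Usage (the cache path below is an assumption, not an existing setting):
#   archive="$(fetch_cached \
#     https://packages.confluent.io/archive/4.0/confluent-oss-4.0.0-2.11.tar.gz \
#     /tmp/confluent-cache)"
#   mkdir -p /tmp/confluent && tar -xzf "$archive" -C /tmp/confluent
```

With something like this, only the first run on a given agent pays for the 262MB download, and later runs keep working while upstream is unavailable, as long as the cached file is still present.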

cc @jlinder @tbg for triage.

Epic DEVINF-109

Jira issue: CRDB-4033

@knz knz added C-test-failure Broken test (automatically or manually discovered). S-3-productivity Severe issues that impede the productivity of CockroachDB developers. A-testing Testing tools and infrastructure A-cdc Change Data Capture A-roachprod labels Jul 17, 2020
@knz knz added this to To do in KV 20.2 via automation Jul 17, 2020
@knz knz changed the title cdc/bank roachtest pull 260MB off a 3rd party vendor upon every CI run, and fails if upstrream unavailable cdc/bank roachtest pull 260MB off a 3rd party vendor upon every CI run, and fails if upstream unavailable Jul 17, 2020
@knz
Contributor Author

knz commented Jul 17, 2020

I have marked the 3 roachtests that use this facility as skipped.

knz added a commit to knz/cockroach that referenced this issue Jul 17, 2020
@jlinder jlinder added this to Backlog in Test Infrastructure Team Backlog via automation Jul 20, 2020
@knz
Contributor Author

knz commented Jul 21, 2020

@mwang1026 @dt the KV team meeting concluded that since Bulk I/O owns the CDC product area, the Bulk I/O team is responsible for enhancing the testing infrastructure for CDC tests. So we're pushing this to your plate.

Note that the test is currently skipped, which means we have disabled test coverage for CDC. Addressing this is therefore on the critical path to the next release.

@knz knz added this to Triage in Disaster Recovery Backlog via automation Jul 21, 2020
@knz knz added this to Triage in [DEPRECATED] CDC via automation Jul 21, 2020
@knz knz removed this from To do in KV 20.2 Jul 21, 2020
@dt
Member

dt commented Jul 21, 2020

Thanks @knz.

@mwang1026 we should potentially re-enable this for now -- while it'd be nice to have it cached, 262MB once a night is a pretty minimal cost (compared to, say, the VMs), and while I hate flakes due to non-reproducible builds depending on external infra, not testing at all is worse.

@jlinder
Collaborator

jlinder commented Jul 21, 2020

It turns out that cdc/bank is one of the roachtests run on every PR build too.

https://github.com/cockroachdb/cockroach/blob/master/build/teamcity-local-roachtest.sh#L37

@dt dt moved this from Triage to Testing in Disaster Recovery Backlog Jul 21, 2020
@knz
Contributor Author

knz commented Jul 22, 2020

Yes, in fact on every CI run there are three (not one) tests that do this. So the archive gets downloaded and extracted 3 times.

It's not just our network ingress $$ that this impacts; the upstream server probably blocked us because we were incurring outrageous egress $$ on their side.

@blathers-crl

blathers-crl bot commented Aug 16, 2023

cc @cockroachdb/cdc


@kenliu-crl
Contributor

reassigning this to CDC team as this has to do with the implementation of the roachtest.

