
qa: reduce fs:workload use of random selection #44486

Merged: 1 commit into ceph:master on May 25, 2022

Conversation

@batrick (Member) commented Jan 6, 2022

It's more appropriate to use --subset to reduce the scheduling size. It
was previously laid out this way because we wanted to link to the common
qa/cephfs/mount directory so that ceph-fuse mounts are not needlessly
multiplied. We should just organize it correctly so that is not an
issue.

Signed-off-by: Patrick Donnelly pdonnell@redhat.com
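For context on the `--subset` option: teuthology expands a suite's facet directories into a Cartesian product of jobs, and `--subset i/n` schedules only about a 1/n slice of that product instead of the whole matrix. A toy sketch of the idea (facet names are invented; this is not teuthology's implementation):

```python
# Toy sketch of how a "--subset i/n" style option thins a combinatorial job
# matrix: expand every combination of facet choices, then keep a 1/n slice.
# Facet names and values are made up for illustration.
from itertools import product

facets = {
    "mount":    ["kclient", "fuse"],
    "distro":   ["centos_8", "ubuntu_20.04", "rhel_8"],
    "msgr":     ["async", "async-v1only"],
    "workload": ["blogbench", "dbench", "ffsb", "fsstress"],
}

def build_matrix():
    keys = list(facets)
    return [dict(zip(keys, combo)) for combo in product(*facets.values())]

def subset(jobs, i, n):
    """Keep the i-th of n interleaved slices of the full matrix."""
    return [job for idx, job in enumerate(jobs) if idx % n == i]

jobs = build_matrix()
print(len(jobs), "jobs in the full matrix")               # 2*3*2*4 = 48
print(len(subset(jobs, 0, 8)), "jobs with --subset 0/8")  # 6
```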

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@batrick added the cephfs (Ceph File System) and needs-review labels Jan 6, 2022
@gregsfortytwo (Member) left a comment

LGTM

@batrick added the cleanup label Jan 6, 2022

@batrick (Member Author) commented Jan 6, 2022

jenkins test docs

@batrick (Member Author) commented Jan 6, 2022

https://shaman.ceph.com/builds/ceph/wip-pdonnell-testing-20220106.155636/

before:

(virtualenv) pdonnell@teuthology ~$ teuthology-suite --machine-type smithi --email pdonnell@redhat.com -p 99 --suite fs:workload --force-priority --ceph master --ceph-repo https://github.com/ceph/ceph.git --suite-repo https://github.com/ceph/ceph.git --dry-run |& tail -n3
2022-01-06 20:01:40,062.062 WARNING:teuthology.suite.run:Scheduled 2688/2688 jobs that are missing packages!
2022-01-06 20:01:40,085.085 INFO:teuthology.suite.util:Results: /home/pdonnell/teuthology/virtualenv/bin/teuthology-schedule --name pdonnell-2022-01-06_19:57:04-fs:workload-master-distro-basic-smithi --worker smithi --dry-run --priority 99 --last-in-suite --email pdonnell@redhat.com --timeout 43200
2022-01-06 20:01:40,085.085 INFO:teuthology.suite.run:Test results viewable at http://pulpito.front.sepia.ceph.com:80/pdonnell-2022-01-06_19:57:04-fs:workload-master-distro-basic-smithi/
after:

(virtualenv) pdonnell@teuthology ~$ teuthology-suite --machine-type smithi --email pdonnell@redhat.com -p 99 --suite fs:workload --force-priority --ceph wip-pdonnell-testing-20220106.155636 --dry-run |& tail
...
2022-01-06 19:55:44,430.430 INFO:teuthology.suite.run:Suite fs:workload in /home/pdonnell/src/git.ceph.com_ceph-c_wip-pdonnell-testing-20220106.155636/qa/suites/fs/workload scheduled 12768 jobs.
2022-01-06 19:55:44,430.430 INFO:teuthology.suite.run:0/12768 jobs were filtered out.

looked through some of the jobs and the yaml constructions look correct.
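The jump from 2688 to 12768 fs:workload jobs is the matrix being enumerated in full once the previously random facet choices become regular facets; the growth factor from the two dry-runs above:

```python
# Growth of the fs:workload matrix, using the job counts from the dry-runs above.
before = 2688    # fs:workload scheduled from master
after = 12768    # fs:workload scheduled from the wip branch
print(f"{after / before:.2f}x")   # 4.75x
```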

@batrick (Member Author) commented Jan 10, 2022

The one thing I'm worried about with merging this is that because of this combinatorial explosion, we'll need to use a smaller subset which will simultaneously reduce coverage on other sub-suites. It'd be nice if we could indicate to teuthology a subset for each sub-suite.

@gregsfortytwo (Member) commented:

> The one thing I'm worried about with merging this is that because of this combinatorial explosion, we'll need to use a smaller subset which will simultaneously reduce coverage on other sub-suites. It'd be nice if we could indicate to teuthology a subset for each sub-suite.

We could re-separate the suites and have @yuriw schedule them separately.

You could also check what the shrinkage in other suites actually compresses down to — it may not be much!

@batrick (Member Author) commented Jan 11, 2022

With master and --subset X/32:

(virtualenv) pdonnell@teuthology ~$ teuthology-suite --machine-type smithi --email pdonnell@redhat.com -p 99 --suite fs --force-priority --ceph master --ceph-repo https://github.com/ceph/ceph.git --suite-repo https://github.com/ceph/ceph.git --dry-run --subset $((RANDOM % 32))/32 |& tail -n3
2022-01-11 02:33:36,864.864 WARNING:teuthology.suite.run:Scheduled 327/327 jobs that are missing packages!
2022-01-11 02:33:36,865.865 INFO:teuthology.suite.util:Results: /home/pdonnell/teuthology/virtualenv/bin/teuthology-schedule --name pdonnell-2022-01-11_02:33:03-fs-master-distro-basic-smithi --worker smithi --dry-run --priority 99 --last-in-suite --email pdonnell@redhat.com --timeout 43200
2022-01-11 02:33:36,865.865 INFO:teuthology.suite.run:Test results viewable at http://pulpito.front.sepia.ceph.com:80/pdonnell-2022-01-11_02:33:03-fs-master-distro-basic-smithi/
$ teuthology-suite --machine-type smithi --email pdonnell@redhat.com -p 99 --suite fs --force-priority --ceph master --ceph-repo https://github.com/ceph/ceph.git --suite-repo https://github.com/ceph/ceph.git --dry-run --subset $((RANDOM % 32))/32 |& grep Scheduling | grep fs/workload | wc -l
84

So ~25% of jobs were fs:workload.

For this PR:

(virtualenv) pdonnell@teuthology ~$ teuthology-suite --machine-type smithi --email pdonnell@redhat.com -p 99 --suite fs --force-priority --ceph wip-pdonnell-testing-20220106.155636 --dry-run --subset $((RANDOM % 32))/32 |& tail -n3
2022-01-11 02:36:21,382.382 INFO:teuthology.suite.run:0/642 jobs were filtered out.
2022-01-11 02:36:21,387.387 INFO:teuthology.suite.util:Results: /home/pdonnell/teuthology/virtualenv/bin/teuthology-schedule --name pdonnell-2022-01-11_02:36:01-fs-wip-pdonnell-testing-20220106.155636-distro-basic-smithi --worker smithi --dry-run --priority 99 --last-in-suite --email pdonnell@redhat.com --timeout 43200
2022-01-11 02:36:21,387.387 INFO:teuthology.suite.run:Test results viewable at http://pulpito.front.sepia.ceph.com:80/pdonnell-2022-01-11_02:36:01-fs-wip-pdonnell-testing-20220106.155636-distro-basic-smithi/
(virtualenv) pdonnell@teuthology ~$ teuthology-suite --machine-type smithi --email pdonnell@redhat.com -p 99 --suite fs --force-priority --ceph wip-pdonnell-testing-20220106.155636 --dry-run --subset $((RANDOM % 32))/32 |& grep Scheduling | grep fs/workload | wc -l
401

A jump to 62%!

I think either we need to split the fs suite (yuck) or teuthology needs a way to (multiplicatively) subset only some sub-suites.
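On the second option: a per-sub-suite ("nested") subset would thin an over-represented sub-suite by an extra factor on top of the global `--subset` slice. A toy sketch of that idea, not teuthology's actual implementation (names and divisors are made up):

```python
# Toy model of "multiplicatively subsetting only some sub-suites": apply the
# global --subset i/n slice first, then an extra thinning factor to jobs from
# selected sub-suites.  Illustrative only.
def schedule(jobs, i, n, nested=None):
    """jobs: list of (sub_suite, job) tuples; nested: {sub_suite: extra_divisor}."""
    nested = nested or {}
    kept = []
    for idx, (sub_suite, job) in enumerate(jobs):
        if idx % n != i:
            continue                                    # global --subset i/n
        if (idx // n) % nested.get(sub_suite, 1) == 0:  # extra per-sub-suite divisor
            kept.append((sub_suite, job))
    return kept

# Example: keep 1/32 of the whole fs suite, plus only every 8th fs:workload job
# from that slice.
jobs = [("fs:workload", k) for k in range(8736)] + [("fs:other", k) for k in range(1008)]
kept = schedule(jobs, i=0, n=32, nested={"fs:workload": 8})
print(sum(1 for s, _ in kept if s == "fs:workload"), "fs:workload jobs kept")
print(sum(1 for s, _ in kept if s == "fs:other"), "other jobs kept")
```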

@batrick (Member Author) commented Jan 11, 2022

Switching this to draft; we shouldn't merge this as-is.

@batrick marked this pull request as draft January 11, 2022 02:39
@gregsfortytwo (Member) commented Jan 11, 2022

If you go up to 64 I get:

gregf@teuthology:~$ teuthology-suite --machine-type smithi --email pdonnell@redhat.com -p 99 --suite fs --force-priority --ceph wip-pdonnell-testing-20220106.155636 --dry-run --subset $((RANDOM % 64))/64 |& tail -n3
2022-01-11 15:45:51,706.706 INFO:teuthology.suite.run:0/388 jobs were filtered out.

versus

gregf@teuthology:~$ teuthology-suite --machine-type smithi --email pdonnell@redhat.com -p 99 --suite fs --force-priority --ceph wip-pdonnell-testing-20220106.155636 --dry-run --subset $((RANDOM % 64))/64 |& grep Scheduling | grep fs/workload | wc -l
198

So we're running (388-198=)190 instead of (327-84=)243 non-workload jobs. I guess that's a noticeable loss.

Anyway, I don't think splitting out the suite would be so bad given the improvements in scheduling, but that's just me.

@batrick (Member Author) commented Jan 14, 2022

With ceph/teuthology#1704

master:

(virtualenv) pdonnell@teuthology ~$ teuthology-suite --machine-type smithi --email pdonnell@redhat.com -p 99 --suite fs --force-priority --ceph master --ceph-repo https://github.com/ceph/ceph.git --suite-repo https://github.com/ceph/ceph.git --dry-run --subset $((RANDOM % 32))/32 
2022-01-14 21:11:21,198.198 INFO:teuthology.suite:Using random seed=9257
2022-01-14 21:11:21,199.199 INFO:teuthology.suite.run:kernel sha1: distro
2022-01-14 21:11:21,835.835 INFO:teuthology.suite.run:ceph sha1: 581e10f9e8667bf38741e8a8abcd9a3c861d8ce1
2022-01-14 21:11:21,835.835 INFO:teuthology.suite.util:container build centos/8, checking for build_complete
2022-01-14 21:11:52,125.125 INFO:teuthology.suite.util:build not complete
2022-01-14 21:11:52,126.126 INFO:teuthology.suite.run:ceph version: None
2022-01-14 21:11:53,124.124 INFO:teuthology.suite.run:ceph branch: master 581e10f9e8667bf38741e8a8abcd9a3c861d8ce1
2022-01-14 21:11:53,130.130 INFO:teuthology.repo_utils:Fetching master from origin
2022-01-14 21:11:54,226.226 INFO:teuthology.repo_utils:Resetting repo at /home/pdonnell/src/github.com_ceph_ceph_master to origin/master
2022-01-14 21:11:54,611.611 INFO:teuthology.suite.run:teuthology branch: master 6fc2011361437a9dfe4e45b50de224392eed8abc
2022-01-14 21:11:54,624.624 INFO:teuthology.suite.build_matrix:Subset=9/32
2022-01-14 21:11:54,764.764 INFO:teuthology.suite.run:Suite fs in /home/pdonnell/src/github.com_ceph_ceph_master/qa/suites/fs generated 327 jobs (not yet filtered)

(virtualenv) pdonnell@teuthology ~$ teuthology-suite --machine-type smithi --email pdonnell@redhat.com -p 99 --suite fs --force-priority --ceph master --ceph-repo https://github.com/ceph/ceph.git --suite-repo https://github.com/ceph/ceph.git --dry-run --subset $((RANDOM % 32))/32 |& grep Scheduling | grep fs/workload | wc -l
84

this PR:

(virtualenv) pdonnell@teuthology ~$ teuthology-suite --machine-type smithi --email pdonnell@redhat.com -p 99 --suite fs --force-priority --ceph master --ceph-repo https://github.com/ceph/ceph.git --suite-repo https://github.com/batrick/ceph.git --suite-branch fs-workload-kclient-switches --dry-run --subset $((RANDOM % 32))/32 |& head -n 15
2022-01-14 21:00:55,400.400 INFO:teuthology.suite:Using random seed=5841
2022-01-14 21:00:55,401.401 INFO:teuthology.suite.run:kernel sha1: distro
2022-01-14 21:00:55,790.790 INFO:teuthology.suite.run:ceph sha1: 581e10f9e8667bf38741e8a8abcd9a3c861d8ce1
2022-01-14 21:00:55,790.790 INFO:teuthology.suite.util:container build centos/8, checking for build_complete
2022-01-14 21:00:56,044.044 INFO:teuthology.suite.util:build not complete
2022-01-14 21:00:56,044.044 INFO:teuthology.suite.run:ceph version: None
2022-01-14 21:00:56,581.581 INFO:teuthology.suite.run:ceph branch: fs-workload-kclient-switches 4893ab780e64c5577d2281fdc8b45c50219865f1
2022-01-14 21:00:56,582.582 INFO:teuthology.repo_utils:/home/pdonnell/src/github.com_batrick_ceph_fs-workload-kclient-switches was just updated or references a specific commit; assuming it is current
2022-01-14 21:00:56,583.583 INFO:teuthology.repo_utils:Resetting repo at /home/pdonnell/src/github.com_batrick_ceph_fs-workload-kclient-switches to origin/fs-workload-kclient-switches
2022-01-14 21:00:57,037.037 INFO:teuthology.suite.run:teuthology branch: master 6fc2011361437a9dfe4e45b50de224392eed8abc
2022-01-14 21:00:57,060.060 INFO:teuthology.suite.build_matrix:Subset=29/32
2022-01-14 21:00:57,226.226 INFO:teuthology.suite.run:Suite fs in /home/pdonnell/src/github.com_batrick_ceph_fs-workload-kclient-switches/qa/suites/fs generated 306 jobs (not yet filtered)

(virtualenv) pdonnell@teuthology ~$ teuthology-suite --machine-type smithi --email pdonnell@redhat.com -p 99 --suite fs --force-priority --ceph master --ceph-repo https://github.com/ceph/ceph.git --suite-repo https://github.com/batrick/ceph.git --suite-branch fs-workload-kclient-switches --dry-run --subset $((RANDOM % 32))/32 |& grep Scheduling | grep fs/workload | wc -l
63

(should probably reduce nested subset from 8 to 4)

Edit: corrected an erroneous command

@batrick force-pushed the fs-workload-kclient-switches branch 2 times, most recently from 3c924fb to 574eb10 on February 14, 2022 at 18:23
@batrick (Member Author) commented Feb 14, 2022

With recent updates to ceph/teuthology#1704

| Configuration | Total jobs | fs:workload jobs | fs:workload share |
| --- | --- | --- | --- |
| master `--suite fs` | 7728 | 2016 | ~26% |
| master `--suite fs --subset 0/32` | 373 | 63 | ~17% |
| master `--suite fs --subset 0/16` | 594 | 126 | ~21% |
| this branch `--suite fs --no-nested-subset` | 14448 | 8736 | ~60% |
| this branch `--suite fs --no-nested-subset --subset 0/32` | 583 | 272 | ~46% |
| this branch `--suite fs` | 8400 | 2688 | ~32% |
| this branch `--suite fs --subset 0/32` | 394 | 84 | ~21% |
| this branch `--suite fs --subset 0/16` | 636 | 168 | ~26% |

Take-away is that for a typical subset --suite fs --subset ?/32, we go from 373 to 394 jobs and 17% to 21% fs:workload. This feels about right.
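One way to read the take-away: at `--subset ?/32` the non-fs:workload coverage is unchanged while fs:workload's share grows slightly; checking with the counts listed above:

```python
# fs:workload share and non-workload job count at --subset ?/32,
# using the job counts quoted in the comparison above.
runs = {
    "master  --subset ?/32": (373, 63),
    "this PR --subset ?/32": (394, 84),
}
for name, (total, workload) in runs.items():
    print(f"{name}: {workload / total:.0%} fs:workload, "
          f"{total - workload} non-workload jobs")
# master  --subset ?/32: 17% fs:workload, 310 non-workload jobs
# this PR --subset ?/32: 21% fs:workload, 310 non-workload jobs
```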

@batrick marked this pull request as ready for review February 14, 2022 20:42
@vshankar (Contributor) commented:

looks good. waiting for ceph/teuthology#1704 to be merged.

@batrick (Member Author) commented May 24, 2022

@vshankar this is good to merge now, I think.

@vshankar (Contributor) commented:

> @vshankar this is good to merge now, I think.

Of course.

@vshankar merged commit b76b6ea into ceph:master May 25, 2022
@batrick deleted the fs-workload-kclient-switches branch May 25, 2022 12:06