Optimize snapshot and report generation threads in `create_snapshots_reports_scorecard.py` #106

dav3r · 2024-01-26T16:05:49Z

🗣 Description

This PR distributes work to threads more efficiently when creating snapshots and reports in `extras/create_snapshots_reports_scorecard.py`.

Note that this does not include any changes to the way third-party snapshots and reports are generated. That should be done in a later PR (see issue #60).

I also took this opportunity to improve a few logging statements that did not include the thread name. This will make the log output easier to understand.
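As a rough illustration of the logging change, here is a minimal sketch of prefixing a message with the current thread's name, matching the `[Thread-1]` style in the sample output below. The helper `tagged` and the `worker` function are hypothetical names, not taken from the script, and the sketch is written for Python 3.

```python
import logging
import threading

logging.basicConfig(
    level=logging.DEBUG, format="%(asctime)s %(levelname)s - %(message)s"
)


def tagged(message):
    """Prefix a message with the name of the thread that is logging it."""
    return "[{}] {}".format(threading.current_thread().name, message)


def worker(org_id, seen):
    """Hypothetical worker that logs one tagged line and records it."""
    line = tagged("Starting snapshot for: {}".format(org_id))
    logging.info(line)
    seen.append(line)


seen = []
t = threading.Thread(target=worker, args=("AAA", seen), name="Thread-1")
t.start()
t.join()
```

An alternative is to put `%(threadName)s` in the logging format string, which would tag every line, including those from the main thread.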

💭 Motivation and context

The current code splits the list of snapshots/reports to be generated into a group of sub-lists and assigns each thread one of those sub-lists to work on. This PR improves on that by having each thread pull one snapshot or report (to be generated) from a single shared list and generate it, repeating the process until the list is empty, at which point the thread exits. This is a more efficient way to utilize the threads. It should result in a faster overall processing time by preventing some threads from having to generate multiple long-running snapshots/reports while other threads sit idle because they have already completed their work.
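The pull-from-a-shared-list pattern described above can be sketched roughly as follows. This is a simplified Python 3 illustration, not the script's actual code: `snapshots_to_generate` is named in this PR, but `snapshots_lock`, `generate_snapshot`, `snapshot_worker`, and `run_workers` are assumed names, and the real work is replaced by a trivial stand-in.

```python
import threading

# Shared work list plus a lock guarding it (lock name is an assumption).
snapshots_to_generate = ["AAA", "BBB", "CCC", "DDD", "EEE"]
snapshots_lock = threading.Lock()
completed = []
completed_lock = threading.Lock()


def generate_snapshot(org_id):
    """Stand-in for the real work (the script runs an external command)."""
    return org_id.lower()


def snapshot_worker():
    """Pull one org ID at a time until the shared list is empty, then exit."""
    while True:
        # The emptiness check and the pop happen under one lock so two
        # threads cannot both see a non-empty list and race for the last item.
        with snapshots_lock:
            if not snapshots_to_generate:
                return
            org_id = snapshots_to_generate.pop(0)
        # The slow work runs outside the lock so other threads can proceed.
        result = generate_snapshot(org_id)
        with completed_lock:
            completed.append(result)


def run_workers(thread_count):
    """Start the workers and wait for all of them to finish."""
    threads = [
        threading.Thread(target=snapshot_worker) for _ in range(thread_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


run_workers(3)
```

Because no thread owns a pre-assigned sub-list, a thread that finishes a short job immediately picks up the next pending item instead of idling.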

This PR resolves #62. Since I was making related changes here, I also included changes to resolve #59.

🧪 Testing

I tested this by commenting out or not executing (via switches) parts of the script that were not relevant (e.g., pausing/resuming the commander, creating the sample report, creating third-party snapshots/reports, etc.). Then I temporarily modified the script so that it ran a simple `sleep` command instead of `cyhy-snapshot` and `cyhy-report`. This allowed me to run the entire script (e.g., `./create_snapshots_reports_scorecard.py --no-dock --no-log --no-pause cyhy-read-only scan-read-only`) and confirm that the threading and list consumption worked as intended.
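One way to make that kind of substitution without hand-editing each call site is sketched below. This is only an illustration in Python 3: `build_command` and its `use_stub` parameter are hypothetical and do not exist in the script, and the real command is passed in as an opaque list rather than spelled out, since its arguments are not shown here.

```python
import subprocess
import sys
import time


def build_command(real_command, use_stub, stub_seconds=0.2):
    """Return a short sleep command in place of the real one when stubbing."""
    if use_stub:
        # Sleep in a child Python process; this avoids depending on a
        # platform-specific `sleep` binary.
        code = "import time; time.sleep({})".format(stub_seconds)
        return [sys.executable, "-c", code]
    return real_command


start = time.time()
# The real command list is a placeholder and is never executed here.
return_code = subprocess.call(build_command(["cyhy-snapshot"], use_stub=True))
elapsed = time.time() - start
```

With the stub in place, the threading and list-consumption logic runs end to end while each "job" takes only a fraction of a second.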

Sample test run output (sanitized and snipped for clarity; note that I also limited the number of reports to generate to 100 for a quicker test run):

2024-01-26 10:29:15,075 INFO - BEGIN
2024-01-26 10:29:15,075 INFO - Building list of reports to generate...
2024-01-26 10:29:15,085 INFO - Building list of snapshots to generate...
2024-01-26 10:33:53,529 DEBUG - 100 snapshots to generate: [u'AAA', u'BBB', u'CCC', ...]
2024-01-26 10:33:53,529 DEBUG - [Thread-1] 100 snapshot(s) left to generate
2024-01-26 10:33:53,529 INFO - [Thread-1] Starting snapshot for: AAA
2024-01-26 10:33:53,562 DEBUG - [Thread-2] 99 snapshot(s) left to generate
2024-01-26 10:33:53,563 INFO - [Thread-2] Starting snapshot for: BBB
2024-01-26 10:33:53,563 DEBUG - [Thread-3] 98 snapshot(s) left to generate
2024-01-26 10:33:53,564 INFO - [Thread-3] Starting snapshot for: CCC
...
2024-01-26 10:33:54,603 INFO - [Thread-1] Successful snapshot: AAA (1.06 s)
2024-01-26 10:33:54,604 DEBUG - [Thread-1] 68 snapshot(s) left to generate
2024-01-26 10:33:54,605 INFO - [Thread-1] Starting snapshot for: YYY
2024-01-26 10:33:54,645 INFO - [Thread-3] Successful snapshot: CCC (1.04 s)
2024-01-26 10:33:54,699 INFO - [Thread-2] Successful snapshot: BBB (1.03 s)
...
2024-01-26 10:33:59,814 INFO - [Thread-1] Successful snapshot: ZZZ (2.04 s)
2024-01-26 10:33:59,814 DEBUG - [Thread-1] 0 snapshot(s) left to generate
2024-01-26 10:33:59,814 INFO - [Thread-1] No snapshots left to generate - thread exiting
...
2024-01-26 10:34:00,023 INFO - Time to complete snapshots: 4.75 minutes
2024-01-26 10:34:00,024 INFO - Longest Snapshots:
2024-01-26 10:34:00,024 INFO - GGG: 2.1 seconds
2024-01-26 10:34:00,025 INFO - HHH: 2.1 seconds
2024-01-26 10:34:00,025 INFO - MMM: 2.1 seconds
...
2024-01-26 10:34:00,026 DEBUG - 100 reports to generate: [u'AAA', u'BBB', u'CCC', ...]
2024-01-26 10:34:00,027 DEBUG - [Thread-33] 100 reports left to generate
2024-01-26 10:34:00,027 INFO - [Thread-33] Starting report for: AAA
2024-01-26 10:34:00,533 DEBUG - [Thread-34] 99 reports left to generate
2024-01-26 10:34:00,534 INFO - [Thread-34] Starting report for: BBB
2024-01-26 10:34:01,037 DEBUG - [Thread-35] 98 reports left to generate
2024-01-26 10:34:01,038 INFO - [Thread-35] Starting report for: CCC
2024-01-26 10:34:01,071 INFO - [Thread-33] Successful report generated: AAA (1.04 s)
2024-01-26 10:34:01,071 DEBUG - [Thread-33] 97 reports left to generate
2024-01-26 10:34:01,071 INFO - [Thread-33] Starting report for: FFF
...
2024-01-26 10:34:02,108 INFO - [Thread-33] Successful report generated: FFF (1.04 s)
2024-01-26 10:34:02,109 DEBUG - [Thread-33] 94 reports left to generate
2024-01-26 10:34:02,109 INFO - [Thread-33] Starting report for: JJJ
2024-01-26 10:34:02,582 INFO - [Thread-34] Successful report generated: BBB (2.05 s)
2024-01-26 10:34:02,583 DEBUG - [Thread-34] 92 reports left to generate
2024-01-26 10:34:02,583 INFO - [Thread-34] Starting report for: NNN
...
2024-01-26 10:34:03,086 INFO - [Thread-35] Successful report generated: CCC (2.05 s)
2024-01-26 10:34:03,087 DEBUG - [Thread-35] 89 reports left to generate
2024-01-26 10:34:03,087 INFO - [Thread-35] Starting report for: PPP
...
2024-01-26 10:34:14,273 INFO - [Thread-47] Successful report generated: ZZZ (2.03 s)
2024-01-26 10:34:14,273 DEBUG - [Thread-47] 0 reports left to generate
2024-01-26 10:34:14,273 INFO - [Thread-47] No reports left to generate - exiting
2024-01-26 10:34:14,274 INFO - Time to complete reports: 0.24 minutes
2024-01-26 10:34:14,274 INFO - Longest Reports:
2024-01-26 10:34:14,274 INFO - QQQ: 2.1 seconds
2024-01-26 10:34:14,274 INFO - TTT: 2.1 seconds
2024-01-26 10:34:14,274 INFO - VVV: 2.1 seconds
...
2024-01-26 10:34:14,275 INFO - Time to complete reports: 0.24 minutes
2024-01-26 10:34:14,275 INFO - Number of snapshots generated: 115
2024-01-26 10:34:14,276 INFO -   Third-party snapshots generated: 0
2024-01-26 10:34:14,276 INFO - Number of snapshots failed: 0
2024-01-26 10:34:14,276 INFO -   Third-party snapshots failed: 0
2024-01-26 10:34:14,276 INFO - Number of reports generated: 100
2024-01-26 10:34:14,276 INFO -   Third-party reports generated: 0
2024-01-26 10:34:14,277 INFO - Number of reports failed: 0
2024-01-26 10:34:14,277 INFO -   Third-party reports failed: 0
2024-01-26 10:34:14,277 INFO - Total time: 4.99 minutes
2024-01-26 10:34:14,277 INFO - END

I also ran a test against the entire Production set of orgs and confirmed that everything worked fine (from a thread and work consumption standpoint) with a larger set of snapshots and reports.

✅ Pre-approval checklist

This PR has an informative and human-readable title.
Changes are limited to a single goal - eschew scope creep!
All relevant type-of-change labels have been added.
I have read the CONTRIBUTING document.
These code changes follow cisagov code standards.
All new and existing tests pass.

✅ Post-merge checklist

Deploy the updated script to Production.
Validate that the updated script runs successfully in Production.

We now pull individual org IDs from the global list of snapshots_to_generate and pass them into the create_snapshot function instead of passing in the entire list of orgs. Each thread checks to see if the global list is empty; if it isn't, it grabs an org ID and generates a snapshot for it. If the global list is empty, the thread exits.

jsf9k

Requesting a few minor changes.

It's probably also worth at least creating an issue for more gracefully handling snapshot or report generation failures. Right now I think the corresponding thread will crash and exit its thread loop and we will lose that bit of concurrency. At a minimum we should do some error handling to ensure that the thread does not exit, and eventually we probably want to put the item back onto the queue for another chance before abandoning all hope.

dav3r · 2024-01-26T18:51:58Z

> It's probably also worth at least creating an issue for more gracefully handling snapshot or report generation failures. Right now I think the corresponding thread will crash and exit its thread loop and we will lose that bit of concurrency. At a minimum we should do some error handling to ensure that the thread does not exit, and eventually we probably want to put the item back onto the queue for another chance before abandoning all hope.

I don't believe we will lose the thread if a failure occurs during snapshot or report generation because those commands (cyhy-snapshot and cyhy-report) are run in a sub-process, whose return code is checked by the thread. If I'm misunderstanding you or you think I'm wrong, please let me know.
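A minimal Python 3 sketch of why a failing command need not take the thread down with it: `subprocess.call` returns the child's exit status rather than raising when the child exits non-zero. The names `run_command`, `worker`, and `results` are illustrative, and the failing and succeeding commands are simulated with short Python one-liners, not the real tools.

```python
import subprocess
import sys
import threading

results = {}
results_lock = threading.Lock()


def run_command(name, command):
    """Run a command and record success or failure from its return code."""
    # A non-zero exit status is returned, not raised, so the caller can
    # record the failure and keep going.
    return_code = subprocess.call(command)
    with results_lock:
        results[name] = "ok" if return_code == 0 else "failed"


def worker(jobs):
    """Process every job in order, regardless of individual failures."""
    for name, command in jobs:
        run_command(name, command)


jobs = [
    ("AAA", [sys.executable, "-c", "import sys; sys.exit(1)"]),  # simulated failure
    ("BBB", [sys.executable, "-c", "pass"]),  # simulated success
]
t = threading.Thread(target=worker, args=(jobs,))
t.start()
t.join()
```

Note that `subprocess.call` can still raise (for example `OSError` if the executable cannot be found), so this only covers the case where the command starts and then exits with a failure status.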

I think that if a failure occurs, we will end up in the same position as we are now (before this PR). Namely, we will see in the output that there was one or more failed snapshots/reports and we will have to manually attempt to generate them later. I will still create an issue to document this where we can discuss options for automatically re-attempting the failed snapshots/reports before giving up.
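For discussion purposes, one possible shape for automatic re-attempts is sketched below in Python 3. It is not what this PR implements: it swaps the global list and lock for a `queue.Queue`, and `MAX_ATTEMPTS`, `generate`, and the simulated failures are all assumptions made for illustration.

```python
import queue
import threading

MAX_ATTEMPTS = 2  # hypothetical limit on tries per item

# Each queue entry is (org_id, attempt_number).
work = queue.Queue()
for org_id in ["AAA", "BBB", "CCC"]:
    work.put((org_id, 1))

succeeded = []
abandoned = []
outcome_lock = threading.Lock()


def generate(org_id, attempt):
    """Stand-in for real generation: BBB fails once, CCC always fails."""
    if org_id == "BBB":
        return attempt > 1
    return org_id != "CCC"


def worker():
    """Pull items until the queue is empty, re-queueing failures once."""
    while True:
        try:
            org_id, attempt = work.get_nowait()
        except queue.Empty:
            return
        if generate(org_id, attempt):
            with outcome_lock:
                succeeded.append(org_id)
        elif attempt < MAX_ATTEMPTS:
            # Put the item back for another chance.
            work.put((org_id, attempt + 1))
        else:
            with outcome_lock:
                abandoned.append(org_id)


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A thread that re-queues an item loops again before exiting, so every re-queued item is still picked up even if the other threads have already finished.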

jsf9k · 2024-01-26T19:41:47Z

> > It's probably also worth at least creating an issue for more gracefully handling snapshot or report generation failures. Right now I think the corresponding thread will crash and exit its thread loop and we will lose that bit of concurrency. At a minimum we should do some error handling to ensure that the thread does not exit, and eventually we probably want to put the item back onto the queue for another chance before abandoning all hope.
>
> I don't believe we will lose the thread if a failure occurs during snapshot or report generation because those commands (cyhy-snapshot and cyhy-report) are run in a sub-process, whose return code is checked by the thread. If I'm misunderstanding you or you think I'm wrong, please let me know.
>
> I think that if a failure occurs, we will end up in the same position as we are now (before this PR). Namely, we will see in the output that there was one or more failed snapshots/reports and we will have to manually attempt to generate them later. I will still create an issue to document this where we can discuss options for automatically re-attempting the failed snapshots/reports before giving up.

Nope, I think you are correct. IGNORE ME!!!

jsf9k

Nice work!

dav3r · 2024-01-26T19:48:54Z

> I will still create an issue to document this where we can discuss options for automatically re-attempting the failed snapshots/reports before giving up.

See #107 for this.

mcdonnnj

I'm going to do one last pass on the logic but I wanted to throw out this feedback instead of sitting on it until I finish my review. Mostly around manual blackening and formatting with some questions/requests interspersed.

jsf9k

One minor formatting thang.

This also makes them named and structured similarly to the snapshot functions. While I was here, I tidied up the code that formulates the report generation command since it was a bit bulky before.

Co-authored-by: Nick <50747025+mcdonnnj@users.noreply.github.com>
Co-authored-by: Shane Frasier <jeremy.frasier@gwe.cisa.dhs.gov>

mcdonnnj

Thanks for clearing up my feedback. This LGTM with one minor quibble around a docstring.

dav3r added 8 commits January 25, 2024 10:32

Add global lists and thread locks for snapshots and reports

b91dade

Populate global snapshots_to_generate list

5761ad6

Call updated snapshot creation function and remove outdated code

4c88978

Also, add a new debug log statement.

Add thread name to snapshot logging statements

ffa989f

This was overlooked in the past so I figured I'd fix it while I'm here.

Modify report generation to match snapshot generation

2fe75a1

This means using a single global list of reports_to_generate that each thread pulls from, rather than assigning a pre-determined list of reports to each thread.

Remove a function that is no longer needed

034bbe5

Populate the global reports_to_generate list

a460469

dav3r added the `improvement` label ("This issue or pull request will add new or improve existing functionality") Jan 26, 2024

dav3r self-assigned this Jan 26, 2024

dav3r requested review from felddy, jsf9k, jasonodoom, mcdonnnj and a team January 26, 2024 16:06

jsf9k requested changes Jan 26, 2024

dav3r and others added 2 commits January 26, 2024 13:30

Add a space to preserve nested output formatting

67a471d

Co-authored-by: Shane Frasier <jeremy.frasier@gwe.cisa.dhs.gov>

Add comments explaining why we don't need thread locks in some spots

c2a99d6

Co-authored-by: Shane Frasier <jeremy.frasier@gwe.cisa.dhs.gov>

jsf9k approved these changes Jan 26, 2024

dav3r mentioned this pull request Jan 26, 2024

Automatically re-attempt failed snapshots and reports #107

Closed

2 tasks

mcdonnnj reviewed Jan 30, 2024

dav3r and others added 5 commits January 30, 2024 11:13

Clean up code formatting

ef3fd91

Co-authored-by: Nick <50747025+mcdonnnj@users.noreply.github.com>

Move global list declarations outside of loop

2143a3e

This only needs to be done once before the loop starts, not on every iteration of the loop. Co-authored-by: Nick <50747025+mcdonnnj@users.noreply.github.com>

Update function names, docstrings, and comments to be more meaningful

e969421

Co-authored-by: Nick <50747025+mcdonnnj@users.noreply.github.com>

Add missing thread locks to protect global list modifications

c13a86b

Co-authored-by: Nick <50747025+mcdonnnj@users.noreply.github.com>

Remove import statement for a library that is no longer used

1bae72b

jsf9k requested changes Jan 31, 2024

dav3r and others added 2 commits January 31, 2024 15:59

Use a more standard format for a pydoc string

e613d5b

Co-authored-by: Shane Frasier <jeremy.frasier@gwe.cisa.dhs.gov>

dav3r requested review from jsf9k and mcdonnnj January 31, 2024 22:47

jsf9k approved these changes Feb 1, 2024

mcdonnnj approved these changes Feb 2, 2024

Improve pydoc string formatting

aceb303

Co-authored-by: Nick <50747025+mcdonnnj@users.noreply.github.com> Co-authored-by: Shane Frasier <jeremy.frasier@gwe.cisa.dhs.gov>

dav3r merged commit b5c896b into develop Feb 2, 2024
1 check passed

dav3r deleted the improvement/optimize-snaps-and-report-generation branch February 2, 2024 18:23

dav3r mentioned this pull request Mar 4, 2024

Remove duplicate logging #113

Merged

5 tasks
