Optimize snapshot and report generation threads in create_snapshots_reports_scorecard.py #106

Merged
dav3r merged 18 commits into develop from improvement/optimize-snaps-and-report-generation on Feb 2, 2024

Conversation

dav3r
Member

@dav3r dav3r commented Jan 26, 2024

🗣 Description

This PR gives work to threads more efficiently when creating snapshots and reports in extras/create_snapshots_reports_scorecard.py.

Note that this does not include any changes to the way third-party snapshots and reports are generated. That should be done in a later PR (see issue #60).

I also took this opportunity to improve a few logging statements that did not include the thread name. This will make the log output easier to understand.
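
As an illustration of thread-tagged log lines like the ones in the sample output below, here is a minimal sketch that lets the `logging` module add the thread name through the standard `%(threadName)s` attribute. The format string and the helper names (`make_logger`, `log_from_thread`) are assumptions for this example; the script may instead interpolate the thread name into each message by hand.

```python
import logging
import threading

# Hypothetical format string; the script's real logging setup may differ.
LOG_FORMAT = "%(asctime)s %(levelname)s - [%(threadName)s] %(message)s"


def make_logger(stream):
    """Return a logger that writes thread-tagged records to ``stream``."""
    logger = logging.getLogger("thread_name_demo")
    logger.setLevel(logging.DEBUG)
    logger.handlers = []  # start clean if this is called more than once
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output out of the root logger
    return logger


def log_from_thread(logger, name, message):
    """Emit one INFO record from a thread called ``name`` and wait for it."""
    worker = threading.Thread(target=logger.info, args=(message,), name=name)
    worker.start()
    worker.join()
```

Calling `log_from_thread(make_logger(sys.stderr), "Thread-1", "Starting snapshot for: AAA")` (after `import sys`) would produce a line ending in `INFO - [Thread-1] Starting snapshot for: AAA`.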

💭 Motivation and context

The current code splits the list of snapshots/reports to be generated into sub-lists and assigns one sub-list to each thread. This PR instead has each thread pull one snapshot or report from a single shared list, generate it, and repeat until the list is empty, at which point the thread exits. This uses the threads more efficiently: it should shorten the overall processing time by avoiding the case where some threads are stuck with several long-running snapshots/reports while other threads sit idle because they have already finished their work.
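
To make the efficiency argument concrete, the following sketch compares the overall finish time of the two strategies for a given list of per-item durations. It is a simulation only, not code from the script: the function names are hypothetical, and the round-robin split in `static_makespan` is an assumption, since the description does not say how the old code divided the list.

```python
import heapq


def static_makespan(durations, num_threads):
    """Finish time when the list is pre-split into one sub-list per thread."""
    # Assumed round-robin split: thread i gets items i, i + n, i + 2n, ...
    return max(sum(durations[i::num_threads]) for i in range(num_threads))


def dynamic_makespan(durations, num_threads):
    """Finish time when each idle thread pulls the next item from one list."""
    # Min-heap holding the time at which each thread next becomes free.
    free_at = [0.0] * num_threads
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # the earliest-free thread takes the item
        heapq.heappush(free_at, start + duration)
    return max(free_at)
```

For example, with durations `[10, 1, 10, 1, 1, 1]` and two threads, the pre-split version gives one thread both long items (10 + 10 + 1 = 21) while the other finishes after 3; pulling from a single list finishes everything at 12.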

This PR resolves #62. Since I was making related changes here, I also included changes to resolve #59.

🧪 Testing

I tested this by commenting out or not executing (via switches) parts of the script that were not relevant (e.g. pausing/resuming the commander, creating the sample report, creating third-party snapshots/reports, etc). Then I temporarily modified the script so that it ran a simple sleep command instead of cyhy-snapshot and cyhy-report. This allowed me to run the entire script (e.g. ./create_snapshots_reports_scorecard.py --no-dock --no-log --no-pause cyhy-read-only scan-read-only) and I confirmed that the threading and list consumption worked as intended.

Sample test run output (sanitized and snipped for clarity; note that I also limited the number of reports to generate to 100 for a quicker test run):

2024-01-26 10:29:15,075 INFO - BEGIN
2024-01-26 10:29:15,075 INFO - Building list of reports to generate...
2024-01-26 10:29:15,085 INFO - Building list of snapshots to generate...
2024-01-26 10:33:53,529 DEBUG - 100 snapshots to generate: [u'AAA', u'BBB', u'CCC', ...]
2024-01-26 10:33:53,529 DEBUG - [Thread-1] 100 snapshot(s) left to generate
2024-01-26 10:33:53,529 INFO - [Thread-1] Starting snapshot for: AAA
2024-01-26 10:33:53,562 DEBUG - [Thread-2] 99 snapshot(s) left to generate
2024-01-26 10:33:53,563 INFO - [Thread-2] Starting snapshot for: BBB
2024-01-26 10:33:53,563 DEBUG - [Thread-3] 98 snapshot(s) left to generate
2024-01-26 10:33:53,564 INFO - [Thread-3] Starting snapshot for: CCC
...
2024-01-26 10:33:54,603 INFO - [Thread-1] Successful snapshot: AAA (1.06 s)
2024-01-26 10:33:54,604 DEBUG - [Thread-1] 68 snapshot(s) left to generate
2024-01-26 10:33:54,605 INFO - [Thread-1] Starting snapshot for: YYY
2024-01-26 10:33:54,645 INFO - [Thread-3] Successful snapshot: CCC (1.04 s)
2024-01-26 10:33:54,699 INFO - [Thread-2] Successful snapshot: BBB (1.03 s)
...
2024-01-26 10:33:59,814 INFO - [Thread-1] Successful snapshot: ZZZ (2.04 s)
2024-01-26 10:33:59,814 DEBUG - [Thread-1] 0 snapshot(s) left to generate
2024-01-26 10:33:59,814 INFO - [Thread-1] No snapshots left to generate - thread exiting
...
2024-01-26 10:34:00,023 INFO - Time to complete snapshots: 4.75 minutes
2024-01-26 10:34:00,024 INFO - Longest Snapshots:
2024-01-26 10:34:00,024 INFO - GGG: 2.1 seconds
2024-01-26 10:34:00,025 INFO - HHH: 2.1 seconds
2024-01-26 10:34:00,025 INFO - MMM: 2.1 seconds
...
2024-01-26 10:34:00,026 DEBUG - 100 reports to generate: [u'AAA', u'BBB', u'CCC', ...]
2024-01-26 10:34:00,027 DEBUG - [Thread-33] 100 reports left to generate
2024-01-26 10:34:00,027 INFO - [Thread-33] Starting report for: AAA
2024-01-26 10:34:00,533 DEBUG - [Thread-34] 99 reports left to generate
2024-01-26 10:34:00,534 INFO - [Thread-34] Starting report for: BBB
2024-01-26 10:34:01,037 DEBUG - [Thread-35] 98 reports left to generate
2024-01-26 10:34:01,038 INFO - [Thread-35] Starting report for: CCC
2024-01-26 10:34:01,071 INFO - [Thread-33] Successful report generated: AAA (1.04 s)
2024-01-26 10:34:01,071 DEBUG - [Thread-33] 97 reports left to generate
2024-01-26 10:34:01,071 INFO - [Thread-33] Starting report for: FFF
...
2024-01-26 10:34:02,108 INFO - [Thread-33] Successful report generated: FFF (1.04 s)
2024-01-26 10:34:02,109 DEBUG - [Thread-33] 94 reports left to generate
2024-01-26 10:34:02,109 INFO - [Thread-33] Starting report for: JJJ
2024-01-26 10:34:02,582 INFO - [Thread-34] Successful report generated: BBB (2.05 s)
2024-01-26 10:34:02,583 DEBUG - [Thread-34] 92 reports left to generate
2024-01-26 10:34:02,583 INFO - [Thread-34] Starting report for: NNN
...
2024-01-26 10:34:03,086 INFO - [Thread-35] Successful report generated: CCC (2.05 s)
2024-01-26 10:34:03,087 DEBUG - [Thread-35] 89 reports left to generate
2024-01-26 10:34:03,087 INFO - [Thread-35] Starting report for: PPP
...
2024-01-26 10:34:14,273 INFO - [Thread-47] Successful report generated: ZZZ (2.03 s)
2024-01-26 10:34:14,273 DEBUG - [Thread-47] 0 reports left to generate
2024-01-26 10:34:14,273 INFO - [Thread-47] No reports left to generate - exiting
2024-01-26 10:34:14,274 INFO - Time to complete reports: 0.24 minutes
2024-01-26 10:34:14,274 INFO - Longest Reports:
2024-01-26 10:34:14,274 INFO - QQQ: 2.1 seconds
2024-01-26 10:34:14,274 INFO - TTT: 2.1 seconds
2024-01-26 10:34:14,274 INFO - VVV: 2.1 seconds
...
2024-01-26 10:34:14,275 INFO - Time to complete reports: 0.24 minutes
2024-01-26 10:34:14,275 INFO - Number of snapshots generated: 115
2024-01-26 10:34:14,276 INFO -   Third-party snapshots generated: 0
2024-01-26 10:34:14,276 INFO - Number of snapshots failed: 0
2024-01-26 10:34:14,276 INFO -   Third-party snapshots failed: 0
2024-01-26 10:34:14,276 INFO - Number of reports generated: 100
2024-01-26 10:34:14,276 INFO -   Third-party reports generated: 0
2024-01-26 10:34:14,277 INFO - Number of reports failed: 0
2024-01-26 10:34:14,277 INFO -   Third-party reports failed: 0
2024-01-26 10:34:14,277 INFO - Total time: 4.99 minutes
2024-01-26 10:34:14,277 INFO - END

I also ran a test against the entire Production set of orgs and confirmed that everything worked fine (from a thread and work consumption standpoint) with a larger set of snapshots and reports.

✅ Pre-approval checklist

  • This PR has an informative and human-readable title.
  • Changes are limited to a single goal - eschew scope creep!
  • All relevant type-of-change labels have been added.
  • I have read the CONTRIBUTING document.
  • These code changes follow cisagov code standards.
  • All new and existing tests pass.

✅ Post-merge checklist

  • Deploy the updated script to Production.
  • Validate that the updated script runs successfully in Production.

We now pull individual org IDs from the global list of snapshots_to_generate and pass them into the create_snapshot function instead of passing in the entire list of orgs.

Each thread checks to see if the global list is empty; if it isn't, it grabs an org ID and generates a snapshot for it. If the global list is empty, the thread exits.
This was overlooked in the past so I figured I'd fix it while I'm here.
This means using a single global list of reports_to_generate that each thread pulls from, rather than assigning a pre-determined list of reports to each thread.
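
The pull-until-empty loop described in these commit messages can be sketched as follows. This is a simplified illustration, not the script's code: `list_lock`, `generated`, `generate_snapshot`, `snapshot_worker`, and `run_workers` are hypothetical names, and guarding the shared list with a lock is an assumption about how the check-then-take step is kept safe.

```python
import threading

# Hypothetical stand-ins for the script's global list and a lock guarding it.
snapshots_to_generate = ["AAA", "BBB", "CCC", "DDD", "EEE"]
list_lock = threading.Lock()
generated = []  # (thread name, org ID) pairs, appended under the lock


def generate_snapshot(org_id):
    """Placeholder for the real work (the script runs cyhy-snapshot)."""
    return True


def snapshot_worker():
    """Pull org IDs one at a time until the shared list is empty."""
    name = threading.current_thread().name
    while True:
        with list_lock:
            # Check and pop under one lock so two threads never take the same ID.
            if not snapshots_to_generate:
                return  # nothing left to generate: the thread exits
            org_id = snapshots_to_generate.pop(0)
        # The slow work happens outside the lock so other threads can proceed.
        if generate_snapshot(org_id):
            with list_lock:
                generated.append((name, org_id))


def run_workers(num_threads):
    """Start ``num_threads`` workers and wait for all of them to exit."""
    threads = [threading.Thread(target=snapshot_worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

A report worker would presumably have the same shape, pulling from `reports_to_generate` instead.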
@dav3r dav3r added the improvement This issue or pull request will add new or improve existing functionality label Jan 26, 2024
@dav3r dav3r self-assigned this Jan 26, 2024
@dav3r dav3r requested review from felddy, jsf9k, jasonodoom, mcdonnnj and a team January 26, 2024 16:06
Member

@jsf9k jsf9k left a comment

Requesting a few minor changes.

It's probably also worth at least creating an issue for more gracefully handling snapshot or report generation failures. Right now I think the corresponding thread will crash and exit its thread loop and we will lose that bit of concurrency. At a minimum we should do some error handling to ensure that the thread does not exit, and eventually we probably want to put the item back onto the queue for another chance before abandoning all hope.

dav3r and others added 2 commits January 26, 2024 13:30
Co-authored-by: Shane Frasier <jeremy.frasier@gwe.cisa.dhs.gov>
Co-authored-by: Shane Frasier <jeremy.frasier@gwe.cisa.dhs.gov>
Member Author

dav3r commented Jan 26, 2024

It's probably also worth at least creating an issue for more gracefully handling snapshot or report generation failures. Right now I think the corresponding thread will crash and exit its thread loop and we will lose that bit of concurrency. At a minimum we should do some error handling to ensure that the thread does not exit, and eventually we probably want to put the item back onto the queue for another chance before abandoning all hope.

I don't believe we will lose the thread if a failure occurs during snapshot or report generation because those commands (cyhy-snapshot and cyhy-report) are run in a sub-process, whose return code is checked by the thread. If I'm misunderstanding you or you think I'm wrong, please let me know.

I think that if a failure occurs, we will end up in the same position as we are now (before this PR). Namely, we will see in the output that there was one or more failed snapshots/reports and we will have to manually attempt to generate them later. I will still create an issue to document this where we can discuss options for automatically re-attempting the failed snapshots/reports before giving up.
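
A small sketch of the point made above: when a command is run with `subprocess.call`, a non-zero exit status comes back as a return value rather than an exception (unlike `subprocess.check_call`), so the worker thread can record the failure and move on. The names below are hypothetical, and the child commands are stand-ins built from the current Python interpreter rather than real `cyhy-snapshot` or `cyhy-report` invocations. A missing executable would still raise `OSError`, which this sketch does not handle.

```python
import subprocess
import sys
import threading

succeeded = []
failed = []
results_lock = threading.Lock()


def run_command(org_id, command):
    """Run ``command`` in a child process and record the outcome."""
    return_code = subprocess.call(command)  # non-zero exit is returned, not raised
    with results_lock:
        (succeeded if return_code == 0 else failed).append(org_id)
    return return_code == 0


def worker(jobs):
    """Process every (org_id, command) pair, even after a failure."""
    for org_id, command in jobs:
        run_command(org_id, command)


def demo():
    """Run one failing and one succeeding stand-in command in a thread."""
    bad_cmd = [sys.executable, "-c", "raise SystemExit(1)"]
    ok_cmd = [sys.executable, "-c", "raise SystemExit(0)"]
    thread = threading.Thread(target=worker, args=([("AAA", bad_cmd), ("BBB", ok_cmd)],))
    thread.start()
    thread.join()
    return succeeded, failed
```

In `demo()`, the thread still processes `BBB` after `AAA` fails, and `AAA` ends up in `failed` for a later manual attempt.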

Member

jsf9k commented Jan 26, 2024

It's probably also worth at least creating an issue for more gracefully handling snapshot or report generation failures. Right now I think the corresponding thread will crash and exit its thread loop and we will lose that bit of concurrency. At a minimum we should do some error handling to ensure that the thread does not exit, and eventually we probably want to put the item back onto the queue for another chance before abandoning all hope.

I don't believe we will lose the thread if a failure occurs during snapshot or report generation because those commands (cyhy-snapshot and cyhy-report) are run in a sub-process, whose return code is checked by the thread. If I'm misunderstanding you or you think I'm wrong, please let me know.

I think that if a failure occurs, we will end up in the same position as we are now (before this PR). Namely, we will see in the output that there was one or more failed snapshots/reports and we will have to manually attempt to generate them later. I will still create an issue to document this where we can discuss options for automatically re-attempting the failed snapshots/reports before giving up.

Nope, I think you are correct. IGNORE ME!!!

Member

@jsf9k jsf9k left a comment

Nice work!

Member Author

dav3r commented Jan 26, 2024

I will still create an issue to document this where we can discuss options for automatically re-attempting the failed snapshots/reports before giving up.

See #107 for this.
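
For discussion purposes, one possible shape for the re-attempt idea is to put a failed item back on the queue with an attempt counter and give up after a fixed number of tries. This is only a sketch of one option, not what #107 proposes or decides: `process_with_retries`, `MAX_ATTEMPTS`, and the `generate` callback are hypothetical, and the loop is single-threaded for brevity.

```python
import collections

MAX_ATTEMPTS = 2  # hypothetical limit; nothing in this thread fixes a number


def process_with_retries(org_ids, generate, max_attempts=MAX_ATTEMPTS):
    """Call generate(org_id) for each ID, re-queueing failures up to a limit."""
    queue = collections.deque((org_id, 1) for org_id in org_ids)
    succeeded, abandoned = [], []
    while queue:
        org_id, attempt = queue.popleft()
        if generate(org_id):
            succeeded.append(org_id)
        elif attempt < max_attempts:
            # Back of the queue, so other items are not held up by the retry.
            queue.append((org_id, attempt + 1))
        else:
            abandoned.append(org_id)  # out of attempts: report and move on
    return succeeded, abandoned
```

Here `generate` is any callable that returns `True` on success; in a threaded version, the queue operations would need the same kind of protection as the shared list.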

Member

@mcdonnnj mcdonnnj left a comment

I'm going to do one last pass on the logic but I wanted to throw out this feedback instead of sitting on it until I finish my review. Mostly around manual blackening and formatting with some questions/requests interspersed.

dav3r and others added 5 commits January 30, 2024 11:13
Co-authored-by: Nick <50747025+mcdonnnj@users.noreply.github.com>
This only needs to be done once before the loop starts, not on every iteration of the loop.

Co-authored-by: Nick <50747025+mcdonnnj@users.noreply.github.com>
Co-authored-by: Nick <50747025+mcdonnnj@users.noreply.github.com>
Co-authored-by: Nick <50747025+mcdonnnj@users.noreply.github.com>
Member

@jsf9k jsf9k left a comment

One minor formatting thang.

dav3r and others added 2 commits January 31, 2024 15:59
Co-authored-by: Shane Frasier <jeremy.frasier@gwe.cisa.dhs.gov>
…ningful

This also makes them named and structured similarly to the snapshot functions.

While I was here, I tidied up the code that formulates the report generation command since it was a bit bulky before.

Co-authored-by: Nick <50747025+mcdonnnj@users.noreply.github.com>
Co-authored-by: Shane Frasier <jeremy.frasier@gwe.cisa.dhs.gov>
@dav3r dav3r requested review from jsf9k and mcdonnnj January 31, 2024 22:47
Member

@mcdonnnj mcdonnnj left a comment

Thanks for clearing up my feedback. This LGTM with one minor quibble around a docstring.

Co-authored-by: Nick <50747025+mcdonnnj@users.noreply.github.com>
Co-authored-by: Shane Frasier <jeremy.frasier@gwe.cisa.dhs.gov>
@dav3r dav3r merged commit b5c896b into develop Feb 2, 2024
1 check passed
@dav3r dav3r deleted the improvement/optimize-snaps-and-report-generation branch February 2, 2024 18:23
@dav3r dav3r mentioned this pull request Mar 4, 2024
5 tasks
Labels
improvement This issue or pull request will add new or improve existing functionality
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Improve parallelization
Make weekly report generation code consistent with snapshot code
3 participants