-
Notifications
You must be signed in to change notification settings - Fork 0
Build framework for creating reports #121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Why these changes are being introduced: * The app requires a framework for creating reports that summarize results from DSC workflow runs. This framework should support creating different reports for the three main DSC CLI commands: reconcile, submit, and finalize. How this addresses that need: * Add boolean flag option 'create-and-send-report' to main CLI command * Apply 'create-and-send-report' to 'finalize' CLI command * Add report module for creating email data (i.e., subject heading, file attachments, message body) * Add HTML and plain-text templates (rendered by Jinja) * Update SESClient to render message body in emails and accept multiple attachments * Create WorkflowEvents dataclass to capture useful report data during execution of Workflow methods (replaces 'report_data' attribute) * Add test module for reporting framework * Update tests as needed Side effects of this change: * As of this writing, the existing report templates are quite simple, with the goal of getting to an minimum viable product. It would be good to include in coming documentation what email recipients can expect in the emailed report. Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/IN-1157
ehanson8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really great, a number of questions and comments but this was easy to read and understand how it integrates into the app. Fantastic work
dsc/cli.py
Outdated
| ctx: click.Context, | ||
| workflow_name: str, | ||
| batch_id: str, | ||
| create_and_send_report: bool, # noqa: FBT001 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to put this on main rather than the CLI command itself? Then there would be no need to add it to the context object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #121 (comment).
dsc/cli.py
Outdated
| workflow.process_results() | ||
| workflow.report_results(email_recipients.split(",")) | ||
|
|
||
| if ctx.obj["create_and_send_report"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this boolean check is better suited to submit, I have a hard time imagining a workflow where the app wouldn't report out the results of the DSS run.
Also, if the create_and_send_report param is moved to submit, it could look this:
if create_and_send_report:
workflow.send_report(
report_class=FinalizeReport, email_recipients=email_recipients.split(",")
)
And it could probably also be simplified to just report
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I guess the idea behind the current structure is: giving the user an option to "send reports" (i.e., emails) that summarize the outcome of each DSC CLI command: reconcile, submit, finalize, and related to comment above, #121 (comment), this would mean adding the boolean flag option to all CLI commands.
I'm okay with renaming create_and_send_report to report, but perhaps we can discuss this in our meeting this afternoon!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a first pass, some questions and requests for discussion.
Overall, think it looks great! I had the opportunity to discuss and see sketches of this via a huddle, so felt very easy to reconigze how it would work. But I think that would be true even without pre-exposure.
For my own thinking, I break it down into these big blocks:
- Workflows always capture noteworthy events that happened while they worked, for reconcile, submit, or finalize
- Reports can optionally be created after a workflow completes, by using data from the complete workflow object (name, events, etc.)
- Reports can serialize themselves into plain text or HTML
- Workflows have the ability to "report out", so it's their method
send_reportthat is used
As I type all this... I wonder if Workflow's send_report() should actually be unaware of a Report object? Adding as a final comment in my review, only surfaced from this typing here.
But, overall, looking real good.
dsc/report.py
Outdated
| def to_plain_text(self) -> str: | ||
| template = self.env.get_template(f"{self.template_name}.txt") | ||
| return template.render( | ||
| workflow_name=self.workflow_name, | ||
| batch_id=self.batch_id, | ||
| report_date=self.report_date, | ||
| status=self.status, | ||
| processed_items=self.events.processed_items, | ||
| errors=self.events.errors, | ||
| ) | ||
|
|
||
| def to_rich_text(self) -> str: | ||
| template = self.env.get_template(f"{self.template_name}.html") | ||
| return template.render( | ||
| workflow_name=self.workflow_name, | ||
| batch_id=self.batch_id, | ||
| report_date=self.report_date, | ||
| status=self.status, | ||
| processed_items=self.events.processed_items, | ||
| errors=self.events.errors, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to the comment above about the base class having abstract methods for these to_plain_text() and to_rich_text(), perhaps if these are defined on each class they could just hardcode the precise template file they want to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking another parallel from the workflow classes, could self.env.get_template("hard_coded_template_name}.txt") be a property that is set in the subclasses but called here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that pattern @ehanson8. And maybe could extend one hop further. Maybe the properties are the actual template filename, which allows leaving self.env.get_template(...) the same for all classes, and therefore set as a property on the base Report class?
Like:
class Report:
@property
@abstractmethod
def jinja_template_plain_text_filename(self): ...
@property
@abstractmethod
def jinja_template_html_filename(self): ...
@property
def jinja_template_plain_text(self):
return self.env.get_template(self.jinja_template_plain_text_filename)
@property
def jinja_template_html(self):
return self.env.get_template(self.jinja_template_html_filename)
class FinalReport(Report):
@property
def jinja_template_plain_text_filename(self):
return "final.txt"
@property
def jinja_template_html_filename(self):
return "final.html"There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that extension a lot!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See changes in commit 84652e9
dsc/workflows/base/__init__.py
Outdated
| items.append( | ||
| {"item_identifier": item_identifier, "message_body": message_body} | ||
| ) | ||
| self.workflow_events.processed_items.append( | ||
| { | ||
| "item_identifier": item_identifier, | ||
| "doi": message_body["ItemHandle"], | ||
| } | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two objects look very similar. This also goes back to an interesting discussion I recall having with @ehanson8 about passing data vs reporting.
For the WorkflowEvents, do we want to accumlate objects like this? Or might we want more human-friendly strings?
Argument for objects: this gives the report the ability to change what it retrieves from the object later, and how it displays that.
Argument for strings: the workflow is deciding how to "spin" or describe the event, and it's a static string that it provides.
I don't think there is a "correct" answer, just an opinionated one. But I do think answering that question has bearing on how these two objects look very similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the reasons shared above, I think I prefer objects. It seems like it would be easier to either create a custom report class or update a template vs. changing a string in WorkflowEvents (and in Workflow method calls where WorkflowEvents is populated) whenever a rewording is needed. Similarly, adding a data point to WorkflowEvents if something is missing and then updating the report template feels more flexible...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonavellecuerdo - objects make sense to me! I guess my followup regarding their near similarity: why not use the same object for both then? It looks like the reporting object just is a subset of the SQS message body, but if you pass the whole thing, then reporting has more options on what to selectively report on.
* Update CLI help description for 'create-and-send-report' * Rename 'Report.env' -> 'Report.jinja_env' * Keep original Workflow.workflow_name (undo capitalization for Report.workflow_name) * Rename 'Report.to_rich_text' -> 'Report.to_html'
* Create 'reports' subfolder * Add property, abstract methods for Jinja template filenames to 'Report' classes * Add property method for loading templates
Pull Request Test Coverage Report for Build 13244853659Details
💛 - Coveralls |
* Remove boolean CLI option and make reporting default, add stub methods for 'reconcile' and 'submit' commands * Create 'reports' subfolder * Update Report.to_html and Report.to_plain_text to abstract methods * Remove Report.status property method * Simplify report templates * Reorganize logic in Workflow.process_sqs_queue, capturing all processed items and differentiate between 'processing errors' and 'non-ingested items'
ehanson8
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fantastic work and I would fully approve but there's one variable rename that I don't think is accurate unless I'm missing something. Otherwise, it looks great!
dsc/workflows/base/__init__.py
Outdated
| for sqs_message in sqs_client.receive(): | ||
| try: | ||
| item_identifier, message_body = sqs_client.process_result_message( | ||
| item_identifier, result_message = sqs_client.process_result_message( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is technically just the message body that's returned by process_result_message, the result message contains the message attributes as well (where the only thing we care about currently is the item_identifier which we're already extracting)
ghukill
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!! Great discussions last week. Feel good about where it landed here to move forward with.
| # capture all processed items, whether ingested or not | ||
| item_data = { | ||
| "item_identifier": item_identifier, | ||
| "result_message": result_message, | ||
| "ingested": result_message["ResultType"] == "success", | ||
| } | ||
| self.workflow_events.processed_items.append(item_data) | ||
| items.append(item_data) | ||
|
|
||
| if not item_data["ingested"]: | ||
| message = ( | ||
| f"Item '{item_identifier}' did not ingest successfully: {sqs_message}" | ||
| ) | ||
| logger.info(message) | ||
| self.workflow_events.errors.append(message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After our discussions last week, I had to do a double/triple take here, but this feels right.
What I like about this flow:
- we always put the processed item into the
WorkflowEvents.processed_itemslist, whether ingested or not - we save human-friendly message strings to
WorkflowEvents.errors
Perhaps as reports grow in the future, we might want to lean even heavier into that and call it something like WorkflowEvents.error_messages or something, but maybe not required.
What we should be careful about -- and maybe this even suggests a docstring sometime -- is not treating Workflow.WorkflowEvents as data to feed into more work with the items. My understanding is that it's 100% about reporting... they just so happen to be structured objects.
Purpose and background context
The app requires a framework for creating reports that summarize results from DSC workflow runs. This framework should support creating different reports for the three main DSC CLI commands:
reconcile,submit, andfinalize. To demonstrate the construction of a report, this PR adds aFinalizeReport, which represents the email that is sent when thefinalizeCLI command is executed. TheFinalizeReportproduces an email with the following:item_identifieranddoi) and a text file for captured error messages (from processing DSS results)This PR touches several files, so here are some notes for reviewers:
Makefileis to fix thelint-applysyntax (to run multipleMakecommands, it seems they must be on the same line.dsc/templatescontain 4 very simply formatted HTML and plain-text templates. There is a 'base' template that should be extended by templates for 'child' reports (i.e., report templates forreconcile,submit, andfinalize).clithen dig 🤓 ). See order of changes described in commit message.How can a reviewer manually see the effects of these changes?
test_report.pytest_cli.pydemonstrating use of new boolean flag optiontest_ses.pydemonstrating changes to include email message body + support new format ofattachmentsargNote: I contemplated whether to add unit tests to verify the rendering of the report templates, but I would prefer to review the resulting email once we have the AWS infrastructure in place to send and review test emails. Once that is in place, I plan on including screenshots to this PR/or the ticket for reference. However, this shouldn't block the PR from being merged!
Includes new or updated dependencies?
YES - installs
jinja2; installsfreezegunas 'dev-package'.Changes expectations for external applications?
YES - As of this writing, the existing report templates are quite simple,
with the goal of getting to an minimum viable product. It would be good
to include in coming documentation what email recipients can expect
in the emailed report.
What are the relevant tickets?
https://mitlibraries.atlassian.net/browse/IN-1157
Developer
Code Reviewer(s)