Build framework for creating reports #121

jonavellecuerdo · 2025-02-06T14:05:28Z

Purpose and background context

The app requires a framework for creating reports that summarize results from DSC workflow runs. This framework should support creating different reports for the three main DSC CLI commands: reconcile, submit, and finalize. To demonstrate the construction of a report, this PR adds a FinalizeReport, which represents the email that is sent when the finalize CLI command is executed. The FinalizeReport produces an email with the following:

Subject heading: "DSpace Submission Results - <workflow_name>, batch='<batch_id>'"
File attachments: a CSV file describing successfully deposited items (columns for item_identifier and doi) and a text file for captured error messages (from processing DSS results)
Message body: See text in 'finalize' HTML and plain-text templates.
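
For readers who want a concrete picture of the email shape listed above, here is a minimal sketch using only Python's standard library. The function name `build_finalize_email`, the attachment filenames, and the body text are assumptions for illustration; the PR itself builds and sends the message through the app's SESClient and Jinja templates.

```python
import csv
import io
from email.message import EmailMessage


def build_finalize_email(workflow_name, batch_id, ingested_items, errors):
    """Assemble an email with a plain-text + HTML body and two attachments."""
    message = EmailMessage()
    message["Subject"] = (
        f"DSpace Submission Results - {workflow_name}, batch='{batch_id}'"
    )

    # Plain-text body first, then an HTML alternative for capable mail clients.
    message.set_content(f"Results for batch '{batch_id}'.")
    message.add_alternative(
        f"<p>Results for batch '{batch_id}'.</p>", subtype="html"
    )

    # CSV attachment: one row per successfully deposited item.
    csv_buffer = io.StringIO()
    writer = csv.DictWriter(csv_buffer, fieldnames=["item_identifier", "doi"])
    writer.writeheader()
    writer.writerows(ingested_items)
    message.add_attachment(
        csv_buffer.getvalue().encode(),
        maintype="text",
        subtype="csv",
        filename="ingested_items.csv",  # hypothetical filename
    )

    # Text attachment: captured error messages, one per line.
    message.add_attachment(
        "\n".join(errors).encode(),
        maintype="text",
        subtype="plain",
        filename="errors.txt",  # hypothetical filename
    )
    return message


email_message = build_finalize_email(
    workflow_name="example-workflow",
    batch_id="batch-001",
    ingested_items=[{"item_identifier": "item-1", "doi": "example-handle"}],
    errors=["Item 'item-2' did not ingest successfully"],
)
```

Keeping both a plain-text and an HTML body lets the mail client pick whichever it can display.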

This PR touches several files, so here are some notes for reviewers:

One small change in the Makefile fixes the lint-apply syntax (to run multiple Make commands, it seems they must be on the same line).
The new dsc/templates contain 4 very simply formatted HTML and plain-text templates. There is a 'base' template that should be extended by templates for 'child' reports (i.e., report templates for reconcile, submit, and finalize).
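
To illustrate the 'base' and 'child' template relationship, here is a small sketch of Jinja template inheritance. The template contents and the in-memory `DictLoader` are assumptions for illustration (the PR's templates live as files under dsc/templates and may be laid out differently).

```python
from jinja2 import DictLoader, Environment

# In-memory stand-ins for template files; the contents are illustrative only.
templates = {
    "base.txt": (
        "Workflow: {{ workflow_name }}\n"
        "Batch: {{ batch_id }}\n"
        "{% block content %}{% endblock %}"
    ),
    "finalize.txt": (
        '{% extends "base.txt" %}'
        "{% block content %}"
        "Items processed: {{ processed_items | length }}"
        "{% endblock %}"
    ),
}

jinja_env = Environment(loader=DictLoader(templates))

# The child template fills the 'content' block defined by the base template.
rendered = jinja_env.get_template("finalize.txt").render(
    workflow_name="example-workflow",
    batch_id="batch-001",
    processed_items=[{"item_identifier": "item-1"}],
)
print(rendered)
```

With this layout, shared header lines live in one place and each report template only supplies its own block.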
Recommend reviewing from top-level to low-level (i.e., start with the cli then dig 🤓 ). See order of changes described in commit message.

How can a reviewer manually see the effects of these changes?

Review unit tests in test_report.py
Review updated unit test in test_cli.py demonstrating use of new boolean flag option
Review updated unit tests in test_ses.py demonstrating changes to include email message body + support new format of attachments arg

Note: I considered adding unit tests to verify the rendering of the report templates, but I would prefer to review the resulting email once we have the AWS infrastructure in place to send and review test emails. Once that is in place, I plan on adding screenshots to this PR or the ticket for reference. However, this shouldn't block the PR from being merged!

Includes new or updated dependencies?

YES - installs jinja2; installs freezegun as 'dev-package'.

Changes expectations for external applications?

YES - As of this writing, the existing report templates are quite simple,
with the goal of getting to a minimum viable product. It would be good
to include in upcoming documentation what email recipients can expect
in the emailed report.

What are the relevant tickets?

https://mitlibraries.atlassian.net/browse/IN-1157

Developer

All new ENV is documented in README
All new ENV has been added to staging and production environments
All related Jira tickets are linked in commit message(s)
Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

The commit message is clear and follows our guidelines (not just this PR message)
There are appropriate tests covering any new functionality
The provided documentation is sufficient for understanding any new functionality introduced
Any manual tests have been performed or provided examples verified
New dependencies are appropriate or there were no changes

Why these changes are being introduced:
* The app requires a framework for creating reports that summarize results from DSC workflow runs. This framework should support creating different reports for the three main DSC CLI commands: reconcile, submit, and finalize.

How this addresses that need:
* Add boolean flag option 'create-and-send-report' to main CLI command
* Apply 'create-and-send-report' to 'finalize' CLI command
* Add report module for creating email data (i.e., subject heading, file attachments, message body)
* Add HTML and plain-text templates (rendered by Jinja)
* Update SESClient to render message body in emails and accept multiple attachments
* Create WorkflowEvents dataclass to capture useful report data during execution of Workflow methods (replaces 'report_data' attribute)
* Add test module for reporting framework
* Update tests as needed

Side effects of this change:
* As of this writing, the existing report templates are quite simple, with the goal of getting to a minimum viable product. It would be good to include in upcoming documentation what email recipients can expect in the emailed report.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1157

ehanson8

This is really great. I have a number of questions and comments, but this was easy to read and it was clear how it integrates into the app. Fantastic work!

ehanson8 · 2025-02-06T16:20:22Z

dsc/cli.py

    ctx: click.Context,
    workflow_name: str,
    batch_id: str,
+    create_and_send_report: bool,  # noqa: FBT001


Is there a reason to put this on main rather than the CLI command itself? Then there would be no need to add it to the context object

See #121 (comment).

dsc/cli.py

ehanson8 · 2025-02-06T16:30:17Z

dsc/cli.py

    workflow.process_results()
-    workflow.report_results(email_recipients.split(","))
+
+    if ctx.obj["create_and_send_report"]:


I think this boolean check is better suited to submit, I have a hard time imagining a workflow where the app wouldn't report out the results of the DSS run.

Also, if the create_and_send_report param is moved to submit, it could look like this:

    if create_and_send_report:
        workflow.send_report(
            report_class=FinalizeReport,
            email_recipients=email_recipients.split(","),
        )

And it could probably also be simplified to just report

Hmm, I guess the idea behind the current structure is: giving the user an option to "send reports" (i.e., emails) that summarize the outcome of each DSC CLI command: reconcile, submit, finalize, and related to comment above, #121 (comment), this would mean adding the boolean flag option to all CLI commands.

I'm okay with renaming create_and_send_report to report, but perhaps we can discuss this in our meeting this afternoon!

dsc/report.py

dsc/workflows/base/__init__.py

dsc/report.py

dsc/workflows/base/__init__.py

tests/test_report.py

ghukill

As a first pass, some questions and requests for discussion.

Overall, I think it looks great! I had the opportunity to discuss and see sketches of this via a huddle, so it felt very easy to recognize how it would work. But I think that would be true even without pre-exposure.

For my own thinking, I break it down into these big blocks:

Workflows always capture noteworthy events that happened while they worked, for reconcile, submit, or finalize
Reports can optionally be created after a workflow completes, by using data from the complete workflow object (name, events, etc.)
Reports can serialize themselves into plain text or HTML
Workflows have the ability to "report out", so it's their method send_report that is used
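
The breakdown above can be sketched roughly as follows. This is a simplified illustration rather than the PR's code: the constructor signatures are assumptions, and f-strings stand in for the Jinja templates so the example runs with the standard library alone.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowEvents:
    """Noteworthy events captured while a workflow runs."""

    processed_items: list[dict] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


class Workflow:
    def __init__(self, workflow_name: str, batch_id: str) -> None:
        self.workflow_name = workflow_name
        self.batch_id = batch_id
        self.workflow_events = WorkflowEvents()


class FinalizeReport:
    """Reads from a completed workflow; f-strings stand in for Jinja templates."""

    def __init__(self, workflow: Workflow) -> None:
        self.workflow_name = workflow.workflow_name
        self.batch_id = workflow.batch_id
        self.events = workflow.workflow_events

    @property
    def subject(self) -> str:
        return (
            f"DSpace Submission Results - {self.workflow_name}, "
            f"batch='{self.batch_id}'"
        )

    def to_plain_text(self) -> str:
        return (
            f"Processed items: {len(self.events.processed_items)}\n"
            f"Errors: {len(self.events.errors)}"
        )

    def to_html(self) -> str:
        return (
            f"<p>Processed items: {len(self.events.processed_items)}</p>"
            f"<p>Errors: {len(self.events.errors)}</p>"
        )


# The workflow records events as it works; the report is built afterwards.
workflow = Workflow(workflow_name="example-workflow", batch_id="batch-001")
workflow.workflow_events.processed_items.append({"item_identifier": "item-1"})
workflow.workflow_events.errors.append("Item 'item-2' did not ingest successfully")

report = FinalizeReport(workflow)
```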

As I type all this... I wonder if Workflow's send_report() should actually be unaware of a Report object? Adding as a final comment in my review, only surfaced from this typing here.

But, overall, looking real good.

dsc/report.py

ghukill · 2025-02-06T18:54:29Z

dsc/report.py

+    def to_plain_text(self) -> str:
+        template = self.env.get_template(f"{self.template_name}.txt")
+        return template.render(
+            workflow_name=self.workflow_name,
+            batch_id=self.batch_id,
+            report_date=self.report_date,
+            status=self.status,
+            processed_items=self.events.processed_items,
+            errors=self.events.errors,
+        )
+
+    def to_rich_text(self) -> str:
+        template = self.env.get_template(f"{self.template_name}.html")
+        return template.render(
+            workflow_name=self.workflow_name,
+            batch_id=self.batch_id,
+            report_date=self.report_date,
+            status=self.status,
+            processed_items=self.events.processed_items,
+            errors=self.events.errors,
+        )


Related to the comment above about the base class having abstract methods for these to_plain_text() and to_rich_text(), perhaps if these are defined on each class they could just hardcode the precise template file they want to use?

Taking another parallel from the workflow classes, could self.env.get_template("hard_coded_template_name.txt") be a property that is set in the subclasses but called here?

I like that pattern @ehanson8. And maybe could extend one hop further. Maybe the properties are the actual template filename, which allows leaving self.env.get_template(...) the same for all classes, and therefore set as a property on the base Report class?

Like:

    class Report:
        @property
        @abstractmethod
        def jinja_template_plain_text_filename(self): ...

        @property
        @abstractmethod
        def jinja_template_html_filename(self): ...

        @property
        def jinja_template_plain_text(self):
            return self.env.get_template(self.jinja_template_plain_text_filename)

        @property
        def jinja_template_html(self):
            return self.env.get_template(self.jinja_template_html_filename)

    class FinalReport(Report):
        @property
        def jinja_template_plain_text_filename(self):
            return "final.txt"

        @property
        def jinja_template_html_filename(self):
            return "final.html"

I like that extension a lot!

See changes in commit 84652e9
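
One detail worth noting about the abstract-property pattern discussed in this thread: `@abstractmethod` is only enforced when the base class uses `ABCMeta` (for example by inheriting from `ABC`). The sketch below assumes that setup; `IncompleteReport` and the filenames are hypothetical and only show what happens when a subclass omits one of the required properties.

```python
from abc import ABC, abstractmethod


class Report(ABC):
    @property
    @abstractmethod
    def jinja_template_plain_text_filename(self) -> str: ...

    @property
    @abstractmethod
    def jinja_template_html_filename(self) -> str: ...


class FinalizeReport(Report):
    @property
    def jinja_template_plain_text_filename(self) -> str:
        return "finalize.txt"

    @property
    def jinja_template_html_filename(self) -> str:
        return "finalize.html"


class IncompleteReport(Report):
    # Hypothetical subclass that defines only one of the two required properties.
    @property
    def jinja_template_plain_text_filename(self) -> str:
        return "incomplete.txt"


report = FinalizeReport()

# Instantiating a subclass with a missing abstract property raises TypeError.
try:
    IncompleteReport()
    instantiation_error = None
except TypeError as exc:
    instantiation_error = exc
```

This way a new report class that forgets a template filename fails at construction time rather than when the email is rendered.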

dsc/report.py

ghukill · 2025-02-06T19:07:46Z

dsc/workflows/base/__init__.py

+                items.append(
+                    {"item_identifier": item_identifier, "message_body": message_body}
+                )
+                self.workflow_events.processed_items.append(
+                    {
+                        "item_identifier": item_identifier,
+                        "doi": message_body["ItemHandle"],
+                    }
+                )


These two objects look very similar. This also goes back to an interesting discussion I recall having with @ehanson8 about passing data vs reporting.

For the WorkflowEvents, do we want to accumulate objects like this? Or might we want more human-friendly strings?

Argument for objects: this gives the report the ability to change what it retrieves from the object later, and how it displays that.

Argument for strings: the workflow is deciding how to "spin" or describe the event, and it's a static string that it provides.

I don't think there is a "correct" answer, just an opinionated one. But I do think answering that question has bearing on how these two objects look very similar.

For the reasons shared above, I think I prefer objects. It seems like it would be easier to either create a custom report class or update a template vs. changing a string in WorkflowEvents (and in Workflow method calls where WorkflowEvents is populated) whenever a rewording is needed. Similarly, adding a data point to WorkflowEvents if something is missing and then updating the report template feels more flexible...

What do you think, @ehanson8 and @ghukill ? 🤔

@jonavellecuerdo - objects make sense to me! I guess my followup regarding their near similarity: why not use the same object for both then? It looks like the reporting object just is a subset of the SQS message body, but if you pass the whole thing, then reporting has more options on what to selectively report on.
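
As a rough sketch of the "pass the whole object, let the report choose" idea: the report below keeps the full message body on each processed item and selects only what the CSV needs. The helper name `ingested_items_csv` and the `ErrorInfo` field are assumptions for illustration; `ResultType` and `ItemHandle` are the keys that appear in this PR's diff.

```python
import csv
import io

# Hypothetical processed items that keep the whole result message body.
processed_items = [
    {
        "item_identifier": "item-1",
        "message_body": {"ResultType": "success", "ItemHandle": "example-handle-1"},
    },
    {
        "item_identifier": "item-2",
        "message_body": {"ResultType": "error", "ErrorInfo": "example error"},
    },
]


def ingested_items_csv(items: list[dict]) -> str:
    """Select only the fields the report needs from each structured event."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["item_identifier", "doi"])
    for item in items:
        body = item["message_body"]
        if body.get("ResultType") == "success":
            writer.writerow([item["item_identifier"], body["ItemHandle"]])
    return buffer.getvalue()


csv_text = ingested_items_csv(processed_items)
print(csv_text)
```

If the report later needs another field, only the report changes; the workflow keeps appending the same object.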

dsc/cli.py

* Update CLI help description for 'create-and-send-report'
* Rename 'Report.env' -> 'Report.jinja_env'
* Keep original Workflow.workflow_name (undo capitalization for Report.workflow_name)
* Rename 'Report.to_rich_text' -> 'Report.to_html'

* Create 'reports' subfolder
* Add property, abstract methods for Jinja template filenames to 'Report' classes
* Add property method for loading templates

coveralls · 2025-02-07T16:57:52Z

Pull Request Test Coverage Report for Build 13244853659

Details

107 of 120 (89.17%) changed or added relevant lines in 7 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage decreased (-2.0%) to 96.552%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
dsc/reports/base.py	37	38	97.37%
dsc/reports/finalize.py	29	41	70.73%

Totals
Change from base Build 13163819468:	-2.0%
Covered Lines:	560
Relevant Lines:	580

💛 - Coveralls

* Remove boolean CLI option and make reporting default, add stub methods for 'reconcile' and 'submit' commands
* Create 'reports' subfolder
* Update Report.to_html and Report.to_plain_text to abstract methods
* Remove Report.status property method
* Simplify report templates
* Reorganize logic in Workflow.process_sqs_queue, capturing all processed items and differentiate between 'processing errors' and 'non-ingested items'

ehanson8

This is fantastic work and I would fully approve but there's one variable rename that I don't think is accurate unless I'm missing something. Otherwise, it looks great!

dsc/reports/finalize.py

ehanson8 · 2025-02-10T14:46:21Z

dsc/workflows/base/__init__.py

        for sqs_message in sqs_client.receive():
            try:
-                item_identifier, message_body = sqs_client.process_result_message(
+                item_identifier, result_message = sqs_client.process_result_message(


This is technically just the message body that's returned by process_result_message, the result message contains the message attributes as well (where the only thing we care about currently is the item_identifier which we're already extracting)
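
To make the distinction concrete, here is a small sketch of a message shaped like one entry of a boto3 SQS `receive_message` response, where the attributes and the body are separate parts. The attribute name `PackageID`, the helper `split_result_message`, and the sample values are assumptions for illustration, not the app's actual implementation of process_result_message.

```python
import json

# Shaped like one entry of a boto3 SQS receive_message response; the
# attribute name "PackageID" and the body fields are illustrative.
sqs_message = {
    "MessageId": "example-message-id",
    "MessageAttributes": {
        "PackageID": {"DataType": "String", "StringValue": "item-1"}
    },
    "Body": json.dumps({"ResultType": "success", "ItemHandle": "example-handle"}),
}


def split_result_message(message: dict) -> tuple[str, dict]:
    """Return the identifier (from attributes) and the parsed body (from Body)."""
    item_identifier = message["MessageAttributes"]["PackageID"]["StringValue"]
    result_message_body = json.loads(message["Body"])
    return item_identifier, result_message_body


item_identifier, result_message_body = split_result_message(sqs_message)
```

Under these assumptions the second return value holds only the parsed body, so a name like `result_message_body` describes it more accurately than `result_message`.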

ghukill

Nice work!! Great discussions last week. Feel good about where it landed here to move forward with.

dsc/reports/finalize.py

ghukill · 2025-02-10T14:54:30Z

dsc/workflows/base/__init__.py

+            # capture all processed items, whether ingested or not
+            item_data = {
+                "item_identifier": item_identifier,
+                "result_message": result_message,
+                "ingested": result_message["ResultType"] == "success",
+            }
+            self.workflow_events.processed_items.append(item_data)
+            items.append(item_data)
+
+            if not item_data["ingested"]:
+                message = (
+                    f"Item '{item_identifier}' did not ingest successfully: {sqs_message}"
+                )
+                logger.info(message)
+                self.workflow_events.errors.append(message)


After our discussions last week, I had to do a double/triple take here, but this feels right.

What I like about this flow:

we always put the processed item into the WorkflowEvents.processed_items list, whether ingested or not

we save human-friendly message strings to WorkflowEvents.errors

Perhaps as reports grow in the future, we might want to lean even heavier into that and call it something like WorkflowEvents.error_messages or something, but maybe not required.

What we should be careful about -- and maybe this even suggests a docstring sometime -- is not treating Workflow.WorkflowEvents as data to feed into more work with the items. My understanding is that it's 100% about reporting... they just so happen to be structured objects.
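
A simplified sketch of this flow, including the reporting-only docstring suggested above, might look like the following. The function `record_results`, its input shape, and the message wording are assumptions for illustration; the real logic lives in the workflow's SQS processing method.

```python
import json
from dataclasses import dataclass, field


@dataclass
class WorkflowEvents:
    """Reporting-only record of what happened; not an input to further work."""

    processed_items: list[dict] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


def record_results(raw_results: list[tuple[str, str]], events: WorkflowEvents) -> None:
    for item_identifier, raw_body in raw_results:
        try:
            result_message_body = json.loads(raw_body)
        except json.JSONDecodeError:
            # Processing error: the result could not be read at all.
            events.errors.append(
                f"Failed to process result for item '{item_identifier}'"
            )
            continue

        # Every readable result is recorded, whether ingested or not.
        ingested = result_message_body.get("ResultType") == "success"
        events.processed_items.append(
            {"item_identifier": item_identifier, "ingested": ingested}
        )
        if not ingested:
            # Non-ingested item: a human-friendly string for the report.
            events.errors.append(
                f"Item '{item_identifier}' did not ingest successfully"
            )


events = WorkflowEvents()
record_results(
    [
        ("item-1", '{"ResultType": "success"}'),
        ("item-2", '{"ResultType": "error"}'),
        ("item-3", "not valid json"),
    ],
    events,
)
```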

jonavellecuerdo marked this pull request as ready for review February 6, 2025 14:37

jonavellecuerdo requested review from a team, ehanson8 and ghukill February 6, 2025 14:37

ehanson8 reviewed Feb 6, 2025

ghukill requested changes Feb 6, 2025

jonavellecuerdo added 2 commits February 7, 2025 11:53

Address comments in PR #121

dbc967a

* Update CLI help description for 'create-and-send-report'
* Rename 'Report.env' -> 'Report.jinja_env'
* Keep original Workflow.workflow_name (undo capitalization for Report.workflow_name)
* Rename 'Report.to_rich_text' -> 'Report.to_html'

Address comments in PR #121

84652e9

* Create 'reports' subfolder
* Add property, abstract methods for Jinja template filenames to 'Report' classes
* Add property method for loading templates

jonavellecuerdo requested review from ehanson8 and ghukill February 7, 2025 17:08

ehanson8 reviewed Feb 10, 2025

ghukill approved these changes Feb 10, 2025

Rename 'result_message' -> 'result_message_body'

4b3f41b

ehanson8 approved these changes Feb 10, 2025

jonavellecuerdo merged commit 64ab68d into main Feb 10, 2025
2 checks passed

jonavellecuerdo deleted the IN-1157-build-report-system branch February 10, 2025 16:42
