Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Activities tab rest optimization #1389

Merged
merged 4 commits into from
May 24, 2023

Conversation

jcabrerizo
Copy link
Contributor

It avoids to return the task-summary and the output of each step inside a workflow.
The automatic refresh of the view was pulling those, sometimes, huge objects few times per minute make the UI work slower than expected

@jcabrerizo jcabrerizo marked this pull request as ready for review April 24, 2023 08:56
@ahgittin
Copy link
Contributor

@jcabrerizo this is effective but a few things seem wrong to me:

(1) Sanitizer.suppress methods are to replace sensitive fields eg password with a hashcode so they are hidden. It feels like the wrong place to put logic for excluding content which is unwanted for other reasons eg optimizing size.

(2) getValueForDisplay doing a global exclusion of fields anywhere, to hide output and stderr in workflow and tasks, also feels wrong, and likely to exclude those fields in other places where we might want them. Can this logic live elsewhere so exclusion be more targeted?

The TaskTransformer feels like it might be the right place for both the above to live.

Where Workflow objects are stored as sensors it is trickier, as probably when listing sensors we normally don't usually want a huge amount of data back, but also we don't have the same context as elsewhere. I'm uncomfortable just removing any field named output, seems arbitrary. Not sure what to do here.

@jcabrerizo
Copy link
Contributor Author

@ahgittin I didn't find a better way to move out the output without modifying so much the fromTasks logic. My feeling is this a correct balance: it removes any value from a task which key is one of "output", "stdout" or "stderr" only when you getWorkflows is invoked, for the activities summary, not for the details. The output is not rendered currently in the UI

asfgit pushed a commit that referenced this pull request May 24, 2023
Some tweaks:
* tidy name of methods and params - call it filterOutputFields instead of suppressOutput (to me, suppress hints at security)
* expose the capability on the generic resolver (used for some sensors, below)
* when doing the filter, don't bother if it's short/trivial, and if it's not a string include the type for info
* change implementation of filter to run even if suppress secrets is false
* (in UI, remove one redundant activities call)

And some additional restrictions:
* exclude detail from tasks when doing a recursive/children lookup
* filter output fields in tasks when detail is excluded
* filter output fields from children and recursive task requests by default
* filter output fields for selected sensors (name starts with "internal") when doing a batch read of sensors (this affects the workflow cache which contains all output)

Impact for a simple test is:
* 3 med size workflows - call to list - reduced from 126kb to 2kb
* recursive task call for the above - from 25k to 5k
* sensor batch read - from 120k to 2k
* details of 1 workflow - 42kb to access -> unchanged
@asfgit asfgit merged commit 4fafb37 into apache:master May 24, 2023
@ahgittin
Copy link
Contributor

@jcabrerizo You are right, this makes a huge difference, and I agree with you it's the correct balance where we do it.

In fact I've extended it a bit, and tidied, in 2d6e8cd, as follows:

Use this in a few more places:

  • exclude detail from tasks when doing a recursive/children lookup
  • filter output fields in tasks when detail is excluded
  • filter output fields from children and recursive task requests by default
  • filter output fields for selected sensors (name starts with "internal") when doing a batch read of sensors (this affects the workflow cache which contains all output)
  • also filter various content... fields used by the http workflow step

And some tidying/tweaks:

  • change name of methods and params - call it filterOutputFields instead of suppressOutput (to me, suppress hints at security)
  • expose the capability on the generic resolver (used for the sensors above)
  • when doing the filter, don't bother if it's short/trivial, and if it's not a string include the type for info
  • change implementation of filter to run even if suppress secrets is false
  • (in UI, remove one redundant activities call)

I did a simple test by adding these lines to the SimpleBlueprintTest then attaching a UI and looking at network size:

        WorkflowYamlTest.addWorkflowTypes(mgmt);
        Application app1 = runTestOnBlueprint("services: [ { type: " + WorkflowSoftwareProcess.class.getName() + ", location: localhost } ]");
        WorkflowBasicTest.runWorkflow(app1.getChildren().iterator().next(), Strings.lines("steps:",
                "  - ssh aws cloudformation describe-stacks"), "cfn");
        WorkflowBasicTest.runWorkflow(app1.getChildren().iterator().next(), Strings.lines("steps:",
                "  - ssh aws cloudformation describe-stacks"), "cfn2");
        WorkflowBasicTest.runWorkflow(app1.getChildren().iterator().next(), Strings.lines("steps:",
                "  - ssh aws cloudformation describe-stacks"), "cfn3");

The impact is huge:

  • 3 med size workflows - call to list - reduced from 126kb to 2kb
  • recursive task call for the above - from 25k to 5k
  • sensor batch read - from 120k to 2k
  • details of 1 workflow - 42kb to access (unchanged)

It doesn't address the huge size of persistence but I will add a section to the docs on workflow for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants