Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: temporarily ignore sanity checks when creating traceflow via UI #1097

Merged
merged 1 commit into from
Aug 21, 2020

Conversation

ZhangYW18
Copy link
Contributor

@ZhangYW18 ZhangYW18 commented Aug 17, 2020

The plugin contains multiple sanity checks to the user input: whether the IP can be converted to a standard IPv4 IP, whether the namespace/pod name meets the requirement of Kubernetes naming rule, etc. Currently, when one of those checks fails with a specific user's input submitted, the plugin work as usual (jump back to the home page of traceflow plugin) and users cannot be notified due to the restrictions of Octant. This may confuse the users a lot.

The patch is a temporary change to allow users to create Traceflow even though their inputs do not meet Traceflow criteria.
Users' requests for creating traceflow would be submitted to Antrea despite the results of sanity checks, and if some error occurs, the phase of created traceflow would be marked as "Failed" and the detailed error can be found in the "Reason" field of traceflow CRD. Since Octant has just supported alerts, we will show alerts and prevent these kinds of Traceflow creation later after upgrading Octant.

@antrea-bot
Copy link
Collaborator

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-whole-conformance: to trigger all conformance tests on linux.
  • /skip-whole-conformance: to skip all conformance tests on linux.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-windows-conformance: to trigger windows conformance tests.
  • /skip-windows-conformance: to skip windows conformance tests.
  • /test-windows-networkpolicy: to trigger windows networkpolicy tests.
  • /skip-windows-networkpolicy: to skip windows networkpolicy tests.
  • /test-hw-offload: to trigger ovs hardware offload test.
  • /skip-hw-offload: to skip ovs hardware offload test.
  • /test-all: to trigger all tests (except whole conformance).
  • /skip-all: to skip all tests (except whole conformance).

@ZhangYW18
Copy link
Contributor Author

/test-all

2 similar comments
@ZhangYW18
Copy link
Contributor Author

/test-all

@lzhecheng
Copy link
Contributor

/test-all

@mengdie-song
Copy link
Contributor

The change LGTM, however, the description is a little confusing. In order to avoid misunderstanding, the patch is a temporary change to allow users to create Traceflow even though their inputs do not meet Traceflow criteria. Since octant has just supported alerts, we will show alerts and prevent these kinds of Traceflow creation later.

@lzhecheng
Copy link
Contributor

/test-hw-offload

@tnqn
Copy link
Member

tnqn commented Aug 17, 2020

allow users to create Traceflow even though their inputs do not meet Traceflow criteria

Then what would users get if the inputs are invalid with this patch? I think the CRD creation would fail.

@gran-vmv
Copy link
Contributor

allow users to create Traceflow even though their inputs do not meet Traceflow criteria

Then what would users get if the inputs are invalid with this patch? I think the CRD creation would fail.

I prefer to report error to User in UI if apiserver responded error or CRD status.phase is "Failed".

@ZhangYW18
Copy link
Contributor Author

ZhangYW18 commented Aug 17, 2020

allow users to create Traceflow even though their inputs do not meet Traceflow criteria

Then what would users get if the inputs are invalid with this patch? I think the CRD creation would fail.

I prefer to report error to User in UI if apiserver responded error or CRD status.phase is "Failed".

Currently we cannot send alerts to users in Octant plugins if we do not update Octant.

@ZhangYW18
Copy link
Contributor Author

allow users to create Traceflow even though their inputs do not meet Traceflow criteria

Then what would users get if the inputs are invalid with this patch? I think the CRD creation would fail.

I prefer to report error to User in UI if apiserver responded error or CRD status.phase is "Failed".

More details can be seen in this issue: vmware-archive/octant#902.

@mengdie-song
Copy link
Contributor

@tnqn @gran-vmv It has been reported that invalid Traceflows can not be created by UI, however, it can be created via kubectl and yaml file, e.g. source port to be empty. Users thought it was an UI issue, but it was actually because UI blocked invalid creation without showing enough information. The best solution should be both UI and Kubectl can provide more information for users to identify invalid input.

@mengdie-song
Copy link
Contributor

@ZhangYW18 If the alert change is not big and we plan to enhance input check, I think we will not need this temporary change since it will bring more confusions.

@ZhangYW18
Copy link
Contributor Author

@tnqn @gran-vmv It has been reported that invalid Traceflows can not be created by UI, however, it can be created via kubectl and yaml file, e.g. source port to be empty. Users thought it was an UI issue, but it was actually because UI blocked invalid creation without showing enough information. The best solution should be both UI and Kubectl can provide more information for users to identify invalid input.

Thanks for Mengdie's explanation. One thing I have to add: Sometimes it is allowed to input an empty string as the source port, and the traceflow CRD would be generated successfully with Phase "Success" via kubectl and yaml file under such circumstance.

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still confused that what change users would expect after applying patch. I think the CRD validation will still fail the creation and how users get the failure information before the octant change you mentioned is included.
And this patch seems removing some necessary "return nil" wrongly.

plugins/octant/cmd/antrea-octant-plugin/traceflow.go Outdated Show resolved Hide resolved
@ZhangYW18
Copy link
Contributor Author

I'm still confused that what change users would expect after applying patch. I think the CRD validation will still fail the creation and how users get the failure information before the octant change you mentioned is included.
And this patch seems removing some necessary "return nil" wrongly.

Before this change, with the illegal input submitted, users will see the UI work as usual (jump back to the home page of traceflow plugin), while the backend does not submit requests to Antrea indeed. After the change, the request for creating traceflow would be submitted to Antrea, and if some error occurs, the phase of created traceflow would be marked as "Failed" and the detailed error can be found in the "Reason" field of traceflow CRD.

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of agree that this is a slight improvement over the current behavior, until we can upgrade Octant to a version that supports alerts. At least the behavior will be more consistent across the UI and creating the CRD directly. So I am personally ok with merging this.

Please consider doing the following though:

  • add a TODO in the code to express that this code will be improved when we upgrade Octant to a version that supports alerts (upcoming v0.16).
  • add a more detailed commit message, to match the PR description
  • in the PR description, "judges" doesn't make sense for what you are trying to express. Try: "The plugin contains multiple sanity checks for the user input."

I also agree with @tnqn that the return nil needs to be preserved for the case where CRD creation fails.

I am afraid that the current log messages for the sanity checks may also be confusing since they imply a failure, yet we continue with CRD creation. Maybe instead of something like this Failed to get srcNamespace as string: ..., we should wrap the error so that the final log message looks like this: Invalid user input, CRD creation or Traceflow request may fail: failed to get srcNamespace as string: ...

@ZhangYW18 ZhangYW18 force-pushed the tf-ui-fix branch 2 times, most recently from f693da2 to 1576d8f Compare August 18, 2020 02:22
@ZhangYW18 ZhangYW18 changed the title Fix Antrea Octant plugin no notification bug Fix: temporarily ignore sanity checks when creating traceflow via UI Aug 18, 2020
@ZhangYW18
Copy link
Contributor Author

I kind of agree that this is a slight improvement over the current behavior, until we can upgrade Octant to a version that supports alerts. At least the behavior will be more consistent across the UI and creating the CRD directly. So I am personally ok with merging this.

Please consider doing the following though:

  • add a TODO in the code to express that this code will be improved when we upgrade Octant to a version that supports alerts (upcoming v0.16).
  • add a more detailed commit message, to match the PR description
  • in the PR description, "judges" doesn't make sense for what you are trying to express. Try: "The plugin contains multiple sanity checks for the user input."

I also agree with @tnqn that the return nil needs to be preserved for the case where CRD creation fails.

I am afraid that the current log messages for the sanity checks may also be confusing since they imply a failure, yet we continue with CRD creation. Maybe instead of something like this Failed to get srcNamespace as string: ..., we should wrap the error so that the final log message looks like this: Invalid user input, CRD creation or Traceflow request may fail: failed to get srcNamespace as string: ...

Thanks for Antonin's inspiring suggestions. I have updated the logs, the commit message, etc.

@ZhangYW18 ZhangYW18 force-pushed the tf-ui-fix branch 2 times, most recently from f69fe64 to acf4205 Compare August 18, 2020 03:53
@ZhangYW18
Copy link
Contributor Author

/test-all

@@ -281,8 +282,8 @@ func (p *antreaOctantPlugin) actionHandler(request *service.ActionRequest) error
case showGraphAction:
name, err := request.Payload.String(traceNameCol)
if err != nil {
log.Printf("Failed to get name at string : %w", err)
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it fails to get the trace name, should it return? otherwise the graph would be wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't matter whether we return or not here. If the name is empty, then the plugin gets a traceflow CRD with all fields being empty value, and then generate an empty graph.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even the result is same, the readability and maintainability is different, returning here ensures the following code has a valid traceflow name, otherwise people would need to worry what's the behavior if getting traceflow with an empty name, and what would the graph look like with an empty string.
I think this patch shouldn't make the change to worse the readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return nil added.

log.Printf("Failed to get name at string : %w", err)
return nil
log.Printf("Invalid user input, CRD creation or Traceflow request may fail: "+
"failed to get name at string : %s", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"failed to get name at string : %s", err)
"failed to get name at string: %s", err)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra space removed.

@ZhangYW18 ZhangYW18 force-pushed the tf-ui-fix branch 2 times, most recently from 92c2b04 to 731eed1 Compare August 19, 2020 01:02
@ZhangYW18
Copy link
Contributor Author

/test-all

@abdallahyas
Copy link
Contributor

/test-hw-offload

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

log.Printf("Duplicate traceflow \"%s\": same source pod and destination pod in less than one second"+
": %+v. ", tfName, tfOld)
return nil
log.Printf("Invalid user input, CRD creation or Traceflow request may fail: "+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When iterating this feature, I suggest adding a suffix of a random string to save the step, or maybe we could just set "generateName" instead of "name" to ask apiserver to create a unique name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I fix it in this patch or fix it later?

@tnqn
Copy link
Member

tnqn commented Aug 19, 2020

/skip-conformance
/skip-networkpolicy
/skip-windows-conformance
/skip-windows-networkpolicy
/test-e2e

@tnqn
Copy link
Member

tnqn commented Aug 20, 2020

@mengdie-song @antoninbas do you have more comments? will merge it if not.

Copy link
Contributor

@mengdie-song mengdie-song left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants