Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected E2E test failure #2425

Closed
wcs1only opened this issue Nov 11, 2020 · 4 comments · Fixed by #2430
Closed

Unexpected E2E test failure #2425

wcs1only opened this issue Nov 11, 2020 · 4 comments · Fixed by #2430
Assignees
Labels
kind/bug Something isn't working P0

Comments

@wcs1only
Copy link
Contributor

wcs1only commented Nov 11, 2020

In what area(s)?

/area test-and-release

What version of Dapr?

edge

Expected Behavior

Test should get 403 for disallowed secrets. Expected log

2020/11/10 20:34:57 Fetching state from http://localhost:3500/v1.0/secrets/kubernetes/daprsecret2?metadata.namespace=dapr-tests
2020/11/10 20:34:57 Non 200 StatusCode: 403
2020/11/10 20:34:57 Error status code 403: got err response for key daprsecret2 from Dapr: {"errorCode":"ERR_PERMISSION_DENIED","message":"access denied by policy to get \"daprsecret2\" from \"kubernetes\""}
2020/11/10 20:34:57 Processing request for /test/get
2020/11/10 20:34:57 Processing get request for 1 states.

Actual Behavior

At least once, we've observed the test able to access what should be a forbidden secret. This occurred on windows, though we do not yet know if this is a platform specific issue:
https://github.com/dapr/dapr/runs/1381813018?check_suite_focus=true


TimeGenerated | LogEntry
-- | --
2020-11-10T20:57:10Z | 2020/11/10 20:57:10 Processing get request for daprsecret2.
2020-11-10T20:57:10Z | 2020/11/10 20:57:10 Fetching state from http://localhost:3500/v1.0/secrets/kubernetes/daprsecret2?metadata.namespace=dapr-tests
2020-11-10T20:57:10Z | 2020/11/10 20:57:10 Found secret for key daprsecret2: {"username":"admin"}
2020-11-10T20:57:10Z | 2020/11/10 20:57:10 Result for get request for key daprsecret2: &map[username:admin]
2020-11-10T20:57:10Z | 2020/11/10 20:57:10 Result for get request for 1 secrets: [{daprsecret2 kubernetes 0xc0000a01a8}]
2020-11-10T20:57:10Z | 2020/11/10 20:57:10 Processing get request for nonexistentsecret.
2020-11-10T20:57:10Z | 2020/11/10 20:57:10 Fetching state from 

The logs also indicate that the secreteapp config was applied correctly:
https://github.com/dapr/dapr/runs/1381813018?check_suite_focus=true#step:15:33

Steps to Reproduce the Problem

No reproduction yet. At the very least, we must have an understanding on why this happened before going to release.

Release Note

RELEASE NOTE: Fix Fail Dapr sidecar if specified configuration cannot be loaded properly.

@wcs1only wcs1only added the kind/bug Something isn't working label Nov 11, 2020
@wcs1only wcs1only added this to To do in 1.0.0 Milestone 1 via automation Nov 11, 2020
@wcs1only wcs1only added this to To do in 1.0.0 Milestone via automation Nov 11, 2020
@wcs1only wcs1only self-assigned this Nov 11, 2020
@artursouza artursouza moved this from To do to In progress in 1.0.0 Milestone 1 Nov 11, 2020
@artursouza artursouza moved this from To do to In progress in 1.0.0 Milestone Nov 11, 2020
@mukundansundar
Copy link
Contributor

mukundansundar commented Nov 11, 2020

cc @wcs1only @yaron2
Based on comments #1782 (comment) and #1782 (comment), the default behavior is to allow all secrets if no config is present.

Though I am able to see that the secretappconfig is created

kubectl apply -f ./tests/config/kubernetes_secret_config.yaml --namespace dapr-tests
configuration.dapr.io/secretappconfig created

and the logs do say

2020/11/10 20:55:00 Adding app {secretapp 0  true e2e-secretapp:dapr-aks-e2e-07 *** 1 true  secretappconfig 4.0 0.1 800Mi 250Mi 4.0 0.1 512Mi 250Mi}

So the app should contain the secretappconfig.

Additional to the logs in the description, I am only able to see 2 logs form daprd container:

11/10/2020, 8:57:12.000 PMtime="2020-11-10T20:57:12.5703613Z" level=info msg="dapr shutting down. Waiting 5 seconds to finish outstanding operations" app_id=secretapp instance=secretapp-7cf96d859c-jnl5h scope=dapr.runtime type=log ver=edge

11/10/2020, 8:57:12.000 PMtime="2020-11-10T20:57:12.5703613Z" level=info msg="dapr shutting down. Waiting 5 seconds to finish outstanding operations" app_id=secretapp instance=secretapp-7cf96d859c-jnl5h scope=dapr.runtime type=log ver=edge
``` @wcs1only also confirmed this on querying log analytics. 

@wcs1only
Copy link
Contributor Author

wcs1only commented Nov 11, 2020

So, I have been running both the windows and linux secreteapp tests in a tight loop. Each is at 10+ iterations. Working hypotheses:

  1. Configuration was not applied correctly, despite the kubectl apply succeeding.
  2. The pod was not created with the configuration correctly linked.
  3. A bug in daprd causes us to fail to read the configuration.
  4. A bug in daprd causes us to not respect the configuration.

@wcs1only
Copy link
Contributor Author

28x runs in a tight loop on both windows and linux. No repro yet. This doesn't eliminate the possibility of some configuration application failure.

@wcs1only
Copy link
Contributor Author

wcs1only commented Nov 11, 2020

Ok, overnight we got a total of
38 runs of the full test suite on windows
168 runs of the full test suite on linux
10 normal runs of the test.

https://github.com/dapr/dapr/runs/1386112166?check_suite_focus=true#step:20:232
We got 1 test failure (in a different test than secretapp). This time we have logs:

time="2020-11-11T16:56:31.2019687Z" level=info msg="metrics server started on :9090/" app_id=disabledmetric instance=disabledmetric-5866c54-gk9xv scope=dapr.metrics type=log ver=edge
time="2020-11-11T16:56:36.2929002Z" level=warning msg="error loading configuration: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: lookup dapr-api.dapr-tests.svc.cluster.local: no such host\"" app_id=disabledmetric instance=disabledmetric-5866c54-gk9xv scope=dapr.runtime type=log ver=edge
time="2020-11-11T16:56:36.2929002Z" level=info msg="loading default configuration" app_id=disabledmetric instance=disabledmetric-5866c54-gk9xv scope=dapr.runtime type=log ver=edge

This is good news, and a hot lead. A failure of exactly this type would cause the behavior we observed in the first failed test.

wcs1only added a commit to wcs1only/dapr that referenced this issue Nov 11, 2020
@wcs1only wcs1only added the P0 label Nov 11, 2020
@wcs1only wcs1only changed the title [Release Blocker] Unexpected E2E test failure Unexpected E2E test failure Nov 11, 2020
1.0.0 Milestone 1 automation moved this from In progress to Review Nov 11, 2020
1.0.0 Milestone automation moved this from In progress to Done Nov 11, 2020
@wcs1only wcs1only moved this from Review to Done in 1.0.0 Milestone 1 Nov 13, 2020
@artursouza artursouza moved this from Done to Release in 1.0.0 Milestone 1 Nov 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working P0
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

2 participants