New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mgr/dashboard: introduce grafana frontend e2e testing #45811
Conversation
jenkins test dashboard |
jenkins test dashboard cephadm |
9517018
to
276e979
Compare
const dashboardArr: Input[] = [ | ||
{ | ||
id: 'GRAFANA_API_URL', | ||
newValue: 'https://192.168.100.100:3000', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@epuertat @aaSharma14 recently some changes went into cephadm which basically means fetching the fqdn for some config urls. And our kcli configuration causes the fqdn to be something like https://ceph-node-00.ceph-dashboard:3000
and this doesn't work. So for now I had to manually change the grafana-api-url.
5cc5c41
to
e79bd58
Compare
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
jenkins test make check |
jenkins test api |
@@ -11,7 +11,7 @@ mkdir -p /etc/ceph | |||
mon_ip=$(ifconfig eth0 | grep 'inet ' | awk '{ print $2}') | |||
|
|||
bootstrap_extra_options='--allow-fqdn-hostname --dashboard-password-noupdate' | |||
bootstrap_extra_options_not_expanded='--skip-monitoring-stack' | |||
bootstrap_extra_options_not_expanded='' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
Can we instead remove all these lines (14-17)? As anyways we don't need extra options or we can just comment these lines maybe with a comment that they are temporarily commented, if any extra options needs to be added in future can be done here. Wdyt @nizamial09 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! @avanthakkar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice job @nizamial09 ! Just some suggestions & comments over there, but overall looks good to me.
@@ -11,7 +11,7 @@ mkdir -p /etc/ceph | |||
mon_ip=$(ifconfig eth0 | grep 'inet ' | awk '{ print $2}') | |||
|
|||
bootstrap_extra_options='--allow-fqdn-hostname --dashboard-password-noupdate' | |||
bootstrap_extra_options_not_expanded='--skip-monitoring-stack' | |||
bootstrap_extra_options_not_expanded='' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didnt' realize we weren't running this... So technically our cephadm CI didn't cover the Grafana image PR. Well, it's good to have this enabled. Are we somehow testing the the monitoring stack is healthy? Grafana, prometheus, node-exporter,... are up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disabled it previously because of the mgr restart that occurs when prometheus is started which interrupts the dashboard e2e. Now I disabled it found a workaround for it. Also I introduced a check to check whether all the monitoring stacks are up and running in this PR.
# otherwise creating the prometheus service cause | ||
# mgr restart and cause the dashboard to not accessible | ||
# briefly. | ||
sleep 120 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know I'm not a fan of unconditional waits: if this finishes in 60 seconds, we'll waste 1 minute waiting; if it takes longer, it'll fail. If this starts failing, we'll simply increase the timer to 3 minutes, 4 minutes, and so on.
Instead, what about actively checking whether all required services are ok? Having pre-conditions or assertions (or waits based on those) prior to start a test is a good practice that stabilizes complex (integration, e2e) testings. Conditionally waiting up to... 5 or 10 minutes (or the average wait + 3 times the std dev, which is exactly the 3-σ or >99% for a "process following a normal distribution") is ok, as long as it's conditional (most of the times is less than that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. and I've made some changes here. Can you take a look and see if its okay?
@@ -18,5 +18,6 @@ | |||
"mochaFile": "cypress/reports/results-[hash].xml" | |||
} | |||
}, | |||
"retries": 1 | |||
"retries": 1, | |||
"chromeWebSecurity": false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it needed (I guess iframe or cookie-related?) and... are there any consequences of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its because of the iframe yes and we need to bypass that issue. I went through the cypress official doc (workarounds are not really applicable for the iframe). Also I searched for a while for some serious consequences and I couldn't find one. Still, I removed the chromeWebSecurity
from the cypress.json file and limited it to only use on the cephadm e2e tests.
src/pybind/mgr/dashboard/frontend/cypress/integration/orchestrator/grafana/grafana.feature
Outdated
Show resolved
Hide resolved
log({ message, optional }) { | ||
optional ? console.log(message, optional) : console.log(message); | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the grafana-e2e commands like e2e.components.Panels.Panel.title...
were failing because it couldn't find a command called log
. So I had to manually add it on our code. Then it started passing.
ea6284e
to
3fad102
Compare
jenkins test dashboard cephadm |
d3ddd9e
to
49c1d25
Compare
# commenting the below lines. Uncomment it when any extra options are | ||
# needed for the bootstrap. | ||
# bootstrap_extra_options_not_expanded='' | ||
# {% if expanded_cluster is not defined %} | ||
# bootstrap_extra_options+=" ${bootstrap_extra_options_not_expanded}" | ||
# {% endif %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra_options
are just commented for now since we are not passing any extra options. It might be useful for later uses if we decides to pass some extra options so I thought I'll leave it here. Also, Avan pointed out to comment it out too. Maybe its useful in local too if we want to start a cluster and we don't want monitoring stacks, we can just add that skip option here.
9b6bbe3
to
cf998b3
Compare
Used the https://www.npmjs.com/package/@grafana/e2e npm packages and followed https://github.com/grafana/grafana/blob/main/contribute/style-guides/e2e.md to understand the style of the grafana e2e testing. In this PR I introduces the tests for the Hosts Overall Performance and also RGW per Daemon and Overall Performance Fixes: https://tracker.ceph.com/issues/54356 Signed-off-by: Nizamudeen A <nia@redhat.com>
After we increase/decrease the count of the node-exporter, we get a 500 - Internal server error from api/prometheus/rules endpoint. On further debugging its caused by the jsonDecodder, because I guess the expected input for the json.loads() is not a json formatted input. So to fix that issue I can either do an error handling on the json.loads() or I can move the json.loads() on the already existing try block. I went for the second approach here. Fixes: https://tracker.ceph.com/issues/54356 Signed-off-by: Nizamudeen A <nia@redhat.com>
jenkins test dashboard |
Used the https://www.npmjs.com/package/@grafana/e2e npm packages and
followed
https://github.com/grafana/grafana/blob/main/contribute/style-guides/e2e.md
to understand the style of the grafana e2e testing.
In this PR I introduces the tests for the Hosts Overall
Performance and also RGW per Daemon and Overall Performance
Description for a prometheus internal server error
After we increase/decrease the count of the node-exporter, we get a 500
Internal server error from api/prometheus/rules endpoint. On further
debugging its caused by the jsonDecodder, because I guess the expected
input for the json.loads() is not a json formatted input. So to fix
that issue I can either do an error handling on the json.loads() or I
can move the json.loads() on the already existing try block. I went for
the second approach here.
ERROR TRACEBACK:
Fixes: https://tracker.ceph.com/issues/54356
Signed-off-by: Nizamudeen A nia@redhat.com
Checklist
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows