Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mgr/dashboard: introduce grafana frontend e2e testing #45811

Merged
merged 2 commits into from Apr 27, 2022

Conversation

nizamial09
Copy link
Member

@nizamial09 nizamial09 commented Apr 7, 2022

Used the https://www.npmjs.com/package/@grafana/e2e npm packages and
followed
https://github.com/grafana/grafana/blob/main/contribute/style-guides/e2e.md
to understand the style of the grafana e2e testing.

In this PR I introduces the tests for the Hosts Overall
Performance and also RGW per Daemon and Overall Performance

Description for a prometheus internal server error
After we increase/decrease the count of the node-exporter, we get a 500
Internal server error from api/prometheus/rules endpoint. On further
debugging its caused by the jsonDecodder, because I guess the expected
input for the json.loads() is not a json formatted input. So to fix
that issue I can either do an error handling on the json.loads() or I
can move the json.loads() on the already existing try block. I went for
the second approach here.

ERROR TRACEBACK:

Apr 26 08:32:51 ceph-node-00 ceph-mgr[11429]: [dashboard ERROR exception] Internal Server Error
                                              Traceback (most recent call last):
                                                File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 47, in dashboard_exception_handler
                                                  return handler(*args, **kwargs)
                                                File "/lib/python3.6/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
                                                  return self.callable(*self.args, **self.kwargs)
                                                File "/usr/share/ceph/mgr/dashboard/controllers/_base_controller.py", line 263, in inner
                                                  ret = func(*args, **kwargs)
                                                File "/usr/share/ceph/mgr/dashboard/controllers/_rest_controller.py", line 191, in wrapper
                                                  return func(*vpath, **params)
                                                File "/usr/share/ceph/mgr/dashboard/controllers/prometheus.py", line 79, in rules
                                                  return self.prometheus_proxy('GET', '/rules', params)
                                                File "/usr/share/ceph/mgr/dashboard/controllers/prometheus.py", line 34, in prometheus_proxy
                                                  verify=Settings.PROMETHEUS_API_SSL_VERIFY)
                                                File "/usr/share/ceph/mgr/dashboard/controllers/prometheus.py", line 58, in _proxy
                                                  content = json.loads(response.content)
                                                File "/lib64/python3.6/json/__init__.py", line 354, in loads
                                                  return _default_decoder.decode(s)
                                                File "/lib64/python3.6/json/decoder.py", line 339, in decode
                                                  obj, end = self.raw_decode(s, idx=_w(s, 0).end())
                                                File "/lib64/python3.6/json/decoder.py", line 357, in raw_decode
                                                  raise JSONDecodeError("Expecting value", s, err.value) from None
                                              json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Apr 26 08:32:51 ceph-node-00 ceph-mgr[11429]: [dashboard INFO request] [::ffff:192.168.100.1:55554] [GET] [200] [0.032s] [admin] [583.0B] /api/prometheus
Apr 26 08:32:51 ceph-node-00 ceph-mgr[11429]: [dashboard ERROR request] [::ffff:192.168.100.1:55528] [GET] [500] [0.043s] [admin] [513.0B] /api/prometheus/rules
Apr 26 08:32:51 ceph-node-00 ceph-mgr[11429]: [dashboard ERROR request] [b'{"status": "500 Internal Server Error", "detail": "The server encountered an unexpected condition which prevented it from fulfilling the request.", "request_id": "797162d6-0f9b-4d94-89c1-cdc919a6fd3b"}

Fixes: https://tracker.ceph.com/issues/54356
Signed-off-by: Nizamudeen A nia@redhat.com

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@nizamial09 nizamial09 requested a review from a team as a code owner April 7, 2022 13:42
@nizamial09 nizamial09 requested review from avanthakkar and pereman2 and removed request for a team April 7, 2022 13:42
@nizamial09 nizamial09 changed the title mgr/dashboard: introduce grafana frontend e2e testing mgr/dashboard: introduce grafana frontend e2e testing Apr 7, 2022
@nizamial09
Copy link
Member Author

jenkins test dashboard

@nizamial09
Copy link
Member Author

jenkins test dashboard cephadm

@github-actions github-actions bot added this to In progress in Dashboard Apr 7, 2022
@nizamial09 nizamial09 force-pushed the grafana-e2e branch 7 times, most recently from 9517018 to 276e979 Compare April 8, 2022 11:11
const dashboardArr: Input[] = [
{
id: 'GRAFANA_API_URL',
newValue: 'https://192.168.100.100:3000',
Copy link
Member Author

@nizamial09 nizamial09 Apr 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@epuertat @aaSharma14 recently some changes went into cephadm which basically means fetching the fqdn for some config urls. And our kcli configuration causes the fqdn to be something like https://ceph-node-00.ceph-dashboard:3000 and this doesn't work. So for now I had to manually change the grafana-api-url.

@github-actions
Copy link

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@nizamial09
Copy link
Member Author

jenkins test make check

@avanthakkar
Copy link
Contributor

jenkins test api

Dashboard automation moved this from In progress to Reviewer approved Apr 19, 2022
@@ -11,7 +11,7 @@ mkdir -p /etc/ceph
mon_ip=$(ifconfig eth0 | grep 'inet ' | awk '{ print $2}')

bootstrap_extra_options='--allow-fqdn-hostname --dashboard-password-noupdate'
bootstrap_extra_options_not_expanded='--skip-monitoring-stack'
bootstrap_extra_options_not_expanded=''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!
Can we instead remove all these lines (14-17)? As anyways we don't need extra options or we can just comment these lines maybe with a comment that they are temporarily commented, if any extra options needs to be added in future can be done here. Wdyt @nizamial09 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! @avanthakkar

Copy link
Member

@epuertat epuertat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice job @nizamial09 ! Just some suggestions & comments over there, but overall looks good to me.

@@ -11,7 +11,7 @@ mkdir -p /etc/ceph
mon_ip=$(ifconfig eth0 | grep 'inet ' | awk '{ print $2}')

bootstrap_extra_options='--allow-fqdn-hostname --dashboard-password-noupdate'
bootstrap_extra_options_not_expanded='--skip-monitoring-stack'
bootstrap_extra_options_not_expanded=''
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didnt' realize we weren't running this... So technically our cephadm CI didn't cover the Grafana image PR. Well, it's good to have this enabled. Are we somehow testing the the monitoring stack is healthy? Grafana, prometheus, node-exporter,... are up?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disabled it previously because of the mgr restart that occurs when prometheus is started which interrupts the dashboard e2e. Now I disabled it found a workaround for it. Also I introduced a check to check whether all the monitoring stacks are up and running in this PR.

# otherwise creating the prometheus service cause
# mgr restart and cause the dashboard to not accessible
# briefly.
sleep 120
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know I'm not a fan of unconditional waits: if this finishes in 60 seconds, we'll waste 1 minute waiting; if it takes longer, it'll fail. If this starts failing, we'll simply increase the timer to 3 minutes, 4 minutes, and so on.

Instead, what about actively checking whether all required services are ok? Having pre-conditions or assertions (or waits based on those) prior to start a test is a good practice that stabilizes complex (integration, e2e) testings. Conditionally waiting up to... 5 or 10 minutes (or the average wait + 3 times the std dev, which is exactly the 3-σ or >99% for a "process following a normal distribution") is ok, as long as it's conditional (most of the times is less than that).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. and I've made some changes here. Can you take a look and see if its okay?

@@ -18,5 +18,6 @@
"mochaFile": "cypress/reports/results-[hash].xml"
}
},
"retries": 1
"retries": 1,
"chromeWebSecurity": false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it needed (I guess iframe or cookie-related?) and... are there any consequences of this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its because of the iframe yes and we need to bypass that issue. I went through the cypress official doc (workarounds are not really applicable for the iframe). Also I searched for a while for some serious consequences and I couldn't find one. Still, I removed the chromeWebSecurity from the cypress.json file and limited it to only use on the cephadm e2e tests.

Comment on lines +19 to +21
log({ message, optional }) {
optional ? console.log(message, optional) : console.log(message);
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the grafana-e2e commands like e2e.components.Panels.Panel.title... were failing because it couldn't find a command called log. So I had to manually add it on our code. Then it started passing.

@nizamial09 nizamial09 force-pushed the grafana-e2e branch 2 times, most recently from ea6284e to 3fad102 Compare April 25, 2022 06:39
@nizamial09
Copy link
Member Author

jenkins test dashboard cephadm

@nizamial09 nizamial09 force-pushed the grafana-e2e branch 7 times, most recently from d3ddd9e to 49c1d25 Compare April 26, 2022 10:55
src/pybind/mgr/dashboard/controllers/prometheus.py Outdated Show resolved Hide resolved
Comment on lines +15 to +20
# commenting the below lines. Uncomment it when any extra options are
# needed for the bootstrap.
# bootstrap_extra_options_not_expanded=''
# {% if expanded_cluster is not defined %}
# bootstrap_extra_options+=" ${bootstrap_extra_options_not_expanded}"
# {% endif %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra_options are just commented for now since we are not passing any extra options. It might be useful for later uses if we decides to pass some extra options so I thought I'll leave it here. Also, Avan pointed out to comment it out too. Maybe its useful in local too if we want to start a cluster and we don't want monitoring stacks, we can just add that skip option here.

@nizamial09 nizamial09 force-pushed the grafana-e2e branch 4 times, most recently from 9b6bbe3 to cf998b3 Compare April 26, 2022 15:48
Used the https://www.npmjs.com/package/@grafana/e2e npm packages and
followed
https://github.com/grafana/grafana/blob/main/contribute/style-guides/e2e.md
to understand the style of the grafana e2e testing.

In this PR I introduces the tests for the Hosts Overall
Performance and also RGW per Daemon and Overall Performance

Fixes: https://tracker.ceph.com/issues/54356
Signed-off-by: Nizamudeen A <nia@redhat.com>
After we increase/decrease the count of the node-exporter, we get a 500
- Internal server error from api/prometheus/rules endpoint. On further
  debugging its caused by the jsonDecodder, because I guess the expected
input for the json.loads() is not a json formatted input. So to fix
that issue I can either do an error handling on the json.loads() or I
can move the json.loads() on the already existing try block. I went for
the second approach here.

Fixes: https://tracker.ceph.com/issues/54356
Signed-off-by: Nizamudeen A <nia@redhat.com>
@nizamial09
Copy link
Member Author

jenkins test dashboard

@nizamial09 nizamial09 moved this from Reviewer approved to Ready-to-merge in Dashboard Apr 27, 2022
@epuertat epuertat merged commit 88622d2 into ceph:master Apr 27, 2022
Dashboard automation moved this from Ready-to-merge to Done Apr 27, 2022
@epuertat epuertat deleted the grafana-e2e branch April 27, 2022 12:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Dashboard
  
Done
3 participants