Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dpdk_telemetry plugin: add plugin for DPDK metrics via DPDK Telemetry library #3273

Merged
merged 8 commits into from Feb 24, 2020

Conversation

reshmapattan
Copy link
Contributor

ChangeLog: dpdk_telemetry plugin: New plugin to fetch DPDK metrics.

This patch introduces a new plugin for collectd, which consumes DPDK metrics
via the dpdk_telemetry library. The collectd plugin here provides an
easy way to use the DPDK telemetry API to query ethernet device metrics.

The collectd plugin retrieves metrics from a DPDK packet forwarding
application by sending a JSON formatted message via a UNIX domain
socket. The DPDK telemetry component will respond with a JSON formatted
reply, delivering the requested metrics. The dpdk_telemetry collectd
plugin parses the JSON data, and publishes the metric values to collectd
for further use.

This plugin has a dependency on the DPDK Telemetry library, as it must be
"in sync" with the DPDK Telemetry implementation.

Signed-off-by: Emma Kenny emma.kenny@intel.com
Signed-off-by: Brian Archbold brian.archbold@intel.com
Signed-off-by: Reshma Pattan reshma.pattan@intel.com

@reshmapattan
Copy link
Contributor Author

Hi ,

I am a new contributor to the collectd and I have pushed new plugin and the build seems failing
https://travis-ci.org/collectd/collectd/jobs/583765756
Not clear with the reason of the build failure, please let me know what is the issue and how to fix it.
Appreciate the help.

Thanks,
Reshma

@rpv-tomsk
Copy link
Contributor

This is typical for Collectd infrastructure to have continuosly failing CI checks.

Nobody cares.

I am a new contributor to the collectd

Don't think things happen fast here. Some guys waits for years before his PR gets merged (Personally, I have been waiting for months).

This is a harsh truth, sorry for bad news.

@rpv-tomsk
Copy link
Contributor

rpv-tomsk commented Sep 11, 2019

Fortunately, you have passed "clang-format" check. That is serious success.

@sunkuranganath
Copy link
Member

Thanks @reshmapattan for your contributions. Looking in to the CI issues now.

@rpv-tomsk
Copy link
Contributor

@sunkuranganath, you is responsible person for Collectd CI?
You have appropriate permissions and technical accesses? When do you received them?

If you is responsible for Collectd CI, will you solve issue wide, or you will help only Intel people, leaving other away (example: #3264)?

@rpv-tomsk
Copy link
Contributor

From my POV, this plugin implementation is strange and needs to be reworked.
There should be no while(1) in read_callback and init_callback.
Plugin should not fail if there is no connection on init. That is not critical state, connection it can be restored later. There should not be race conditions on daemons starting.
And many other remarks, which I don't want to waste time.

@rpv-tomsk
Copy link
Contributor

But Trusted Maintainers can merge this as is, of course.

@sunkuranganath
Copy link
Member

@rpv-tomsk I do not have permissions/technical accesses to the Collectd CI nor that I am responsible for it. However as part of my learning, I am trying to ramp up as much as I can to see what I can do. Some sort of response is better than nothing, especially in open source community. While I am buried with my own deadlines, doing my best.

As per solving the issues, I do believe in helping as many as I can! As we make progress in one, we could help others with similar issues. :)

@reshmapattan
Copy link
Contributor Author

reshmapattan commented Sep 13, 2019

From my POV, this plugin implementation is strange and needs to be reworked.
There should be no while(1) in read_callback and init_callback.
Plugin should not fail if there is no connection on init. That is not critical state, connection it can be restored later. There should not be race conditions on daemons starting.

I will remove the while(1) in read_callback. Also I am planning to separate out while(1) from the init _callback to pthread like below. Any comments?

static void *init_socket(__attribute__((unused)) void *dummy) {
  INFO(PLUGIN_NAME ": %s:%d", __FUNCTION__, __LINE__);
  cdtime_t prev = 0;

  prev = cdtime();
  while (1) {
    /* If server socket is already not running, try connecting
     * for period of plugin interval to provide resiliency.
     * If server doesn't shows up even after plugin interval, mark the
     * initialization as failure.
     */
    if ((cdtime() - prev) > dpdk_tel_interval) {
      ERROR(PLUGIN_NAME ": Cannot retry socket initialization after the "
                        "plugin interval specified");
      return NULL;
    }
    if (dpdk_telemetry_socket_init() < 0) {
      ERROR(PLUGIN_NAME ": Socket initialization failed. Retrying..");
      continue;
    }
    break;
  }
  plugin_register_complex_read(NULL, PLUGIN_NAME, dpdk_telemetry_read,
                               dpdk_tel_interval, NULL);

  return NULL;
}

static int dpdk_telemetry_init(void) {

  int status;

  status = plugin_thread_create(&socket_thread, NULL, init_socket, (void *)0,
                                "dpdk-tel-socket");

  if (status != 0) {
    ERROR(PLUGIN_NAME ": pthread_create failed: %s", STRERRNO);
    return -1;
  }

  return 0;
}

@rpv-tomsk
Copy link
Contributor

rpv-tomsk commented Sep 13, 2019

Some sort of response is better than nothing, especially in open source community.

That's why my #3159 left opened w/o any comments for 4 months. Ok....

@rpv-tomsk
Copy link
Contributor

I will ... and I am planning to ..., any comments.

If you format your comments to be more readable, you will get more responses, I think.

@rpv-tomsk
Copy link
Contributor

Looking in to the CI issues now.

So, you simply can't do anything with this.
Do not reassure others and do not create illusions about community health.
No such problems has been solved over last year, at least.

@reshmapattan
Copy link
Contributor Author

From my POV, this plugin implementation is strange and needs to be reworked.
There should be no while(1) in read_callback and init_callback.
Plugin should not fail if there is no connection on init. That is not critical state, connection it can be restored later. There should not be race conditions on daemons starting.
And many other remarks, which I don't want to waste time.

I have pushed the new changes by removing the while(1) loops. Please provide your feedback.

@mrunge
Copy link
Member

mrunge commented Sep 13, 2019

The CI job failed due to network (or ubuntu mirror issues):
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/s/systemd/udev_229-4ubuntu21.22_amd64.deb Could not connect to apt.cache.travis-ci.com:80 (34.96.81.152), connection timed out

E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/s/systemd/libudev1_229-4ubuntu21.22_amd64.deb Unable to connect to apt.cache.travis-ci.com:http:

E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/liba/libatasmart/libatasmart4_0.19-3_amd64.deb Unable to connect to apt.cache.travis-ci.com:http:

@mrunge
Copy link
Member

mrunge commented Sep 13, 2019

Would it be possible to use something like yajl instead of another json library?

@kkepka
Copy link
Contributor

kkepka commented Sep 13, 2019

yajl seems to be not active (last commit is from 2015), that's why jansson has been chosen instead.

@mrunge
Copy link
Member

mrunge commented Sep 16, 2019

yajl seems to be not active (last commit is from 2015), that's why jansson has been chosen instead.

Oh, I didn't realize that. Good point!

@mrunge
Copy link
Member

mrunge commented Sep 16, 2019

So, CI failed due to networking issues, not because of the submission itself.

@@ -6802,6 +6852,7 @@ AC_PLUGIN([dns], [$with_libpcap], [DNS traffic analy
AC_PLUGIN([dpdkevents], [$plugin_dpdkevents], [Events from DPDK])
AC_PLUGIN([dpdkstat], [$plugin_dpdkstat], [Stats from DPDK])
AC_PLUGIN([drbd], [$plugin_drbd], [DRBD statistics])
AC_PLUGIN([dpdk_telemetry], [$plugin_dpdk_telemetry], [Metrics from DPDK Telemetry])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, keep list sorted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed and available in latest commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpv-tomsk , Can you please approve the pull request , if no more comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm untrusted contributor and have no such permission.

Copy link
Member

@sunkuranganath sunkuranganath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. +1

anaudx
anaudx previously approved these changes Sep 17, 2019
src/dpdk_telemetry.c Outdated Show resolved Hide resolved
src/dpdk_telemetry.c Outdated Show resolved Hide resolved
src/dpdk_telemetry.c Outdated Show resolved Hide resolved
src/dpdk_telemetry.c Outdated Show resolved Hide resolved
src/dpdk_telemetry.c Outdated Show resolved Hide resolved
configure.ac Outdated Show resolved Hide resolved
@reshmapattan
Copy link
Contributor Author

@anaudx , @kkepka , @sunkuranganath
I pushed the one more commit. Might need re approval.

@mrunge mrunge added this to the 5.11.0 milestone Oct 11, 2019
… library

This patch introduces a new plugin for collectd, which consumes DPDK metrics
via the dpdk_telemetry library. The collectd plugin here provides an
easy way to use the DPDK telemetry API to query ethernet device metrics.

The collectd plugin retrieves metrics from a DPDK packet forwarding
application by sending a JSON formatted message via a UNIX domain
socket. The DPDK telemetry component will respond with a JSON formatted
reply, delivering the requested metrics. The dpdk_telemetry collectd
plugin parses the JSON data, and publishes the metric values to collectd
for further use.

This plugin has a dependency on the DPDK Telemetry library, as it must be
"in sync" with the DPDK Telemetry implementation.

Change-Id: If3343aae4c5473f0574465fab0395b7672fa2488
Signed-off-by: Emma Kenny <emma.kenny@intel.com>
Signed-off-by: Brian Archbold <brian.archbold@intel.com>
Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Remove socket reconnect logic from init callbacks.
Read callbacks will try to connect to the socket if
not connected.

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Place dpdk_telemetry plugin in sorted order in the list.

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Some places missing the lib prefix in
"with_libjansson", so add the missing prefix.

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Use "json_loads" symbol instead of "json_is_object"
in configure.ac .

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Fix some of the logs level to be debug instead of info.

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Move read loop callback registration to module_register().

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Instead of closing each socket during failures, just call
clean up function to be more consistent across the file.

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Copy link
Contributor

@mkobyli mkobyli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me. I have tested with DPDK 19.08 (testpmd app) and pktgen powered by DPDK.

@reshmapattan
Copy link
Contributor Author

@mrunge : can this be mergered to master now.

@mrunge
Copy link
Member

mrunge commented Nov 19, 2019

I am unfortunately utterly busy with other things currently. This will have to wait (or until someone else approves this PR).

Self approving won't help you. In any case, you could help by reviewing other PRs.

@reshmapattan
Copy link
Contributor Author

reshmapattan commented Nov 19, 2019

I am unfortunately utterly busy with other things currently. This will have to wait (or until someone else approves this PR).

Self approving won't help you. In any case, you could help by reviewing other PRs.

Ok , if you are busy I can wait. In case if you missed it is this already aprroved by mkobyli .

@kkepka
Copy link
Contributor

kkepka commented Nov 19, 2019

I believe @reshmapattan just wanted to friendly ping / bump this up here, as 5.10 is out and window for new features for 5.11 is open now.
I wouldn't see it as self-approved, more like feedback for maintainers and others. Sorry if you see it that way, maybe in future such feedback could be provided via comment only (without approval checkbox?)

@sunkuranganath
Copy link
Member

Given there arent any more comments/review, i hope we can approve this to make in to 5.11 release.

@mrunge
Copy link
Member

mrunge commented Feb 13, 2020

Given there arent any more comments/review, i hope we can approve this to make in to 5.11 release.

There hasn't been any serious reviews for now....

@dago dago merged commit 8f22018 into collectd:master Feb 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants