This repository has been archived by the owner on May 12, 2021. It is now read-only.

METRON-822 Improve Fastcapa Performance #509

Closed
wants to merge 6 commits into from



@nickwallen nickwallen commented Apr 4, 2017

This PR contains significant improvements to the performance and scalability of Fastcapa.

  • Previously the 'distributor' framework was used. This did not scale well and has been replaced with a burst-oriented design.
  • Receive and transmission functions have been separated to allow each to scale independently.
  • Additional parameters have been added to allow the process to be tuned easily.
  • Output provides basic transparency into the current state of processing.
  • If the probe is overwhelmed with more packets than can be handled, it will continue processing the packets that it can without crashing.
  • The fact that packets are being dropped is very clear.
  • A great deal of documentation has been added to the README.
  • Support for the latest release of DPDK, v16.11.1.
  • Support for the latest release of RdKafka, v0.9.4.
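
To illustrate the burst-oriented design with separate receive and transmit sides, here is a minimal sketch in plain C. It is not the Fastcapa code and does not use the DPDK API; the names `ring`, `ring_enqueue_burst`, `ring_dequeue_burst` and the `RING_SIZE` value are all hypothetical. The receive side hands packets to a fixed-size ring in bursts, the transmit side drains it in bursts, and anything that does not fit is counted as dropped instead of stopping the process.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the Fastcapa or DPDK API. */
#define RING_SIZE 8u /* assumed capacity of the hand-off ring; a power of two */

struct ring {
    uint32_t slots[RING_SIZE]; /* stand-in for packet pointers */
    size_t head;               /* total number of packets dequeued so far */
    size_t tail;               /* total number of packets enqueued so far */
    uint64_t dropped;          /* packets rejected because the ring was full */
};

/* Receive side: enqueue up to n packets in one burst.
 * Packets that do not fit are counted as dropped; processing continues.
 * Returns the number of packets accepted. */
size_t ring_enqueue_burst(struct ring *r, const uint32_t *pkts, size_t n)
{
    size_t free_slots = RING_SIZE - (r->tail - r->head);
    size_t accepted = n < free_slots ? n : free_slots;

    for (size_t i = 0; i < accepted; i++)
        r->slots[(r->tail + i) % RING_SIZE] = pkts[i];

    r->tail += accepted;
    r->dropped += n - accepted;
    return accepted;
}

/* Transmit side: dequeue up to n packets in one burst.
 * Returns the number of packets copied into out. */
size_t ring_dequeue_burst(struct ring *r, uint32_t *out, size_t n)
{
    size_t used = r->tail - r->head;
    size_t taken = n < used ? n : used;

    for (size_t i = 0; i < taken; i++)
        out[i] = r->slots[(r->head + i) % RING_SIZE];

    r->head += taken;
    return taken;
}
```

Because the two sides only share the ring, each can be given its own workers and its own burst size. The `dropped` counter is what would make packet loss visible in the output.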

This change has been tested on Cisco UCS hardware with a 10G Cisco VNIC. The probe was able to capture 1 Gbps before packets started to drop. Additional performance tuning would push this ceiling much higher, but for my purposes I just needed to reach 1 Gbps. Further work will follow to find its true performance ceiling.

To test the change yourself, spin up the virtualized test environment, which deploys Fastcapa and validates that it can land packets in Kafka correctly.

cd metron-deployment/vagrant/fastcapa-test-environment
vagrant up

Pull Request Checklist

For all changes:

  • Is there a JIRA ticket associated with this PR?
  • Does your PR title start with METRON-XXXX?
  • Has your PR been rebased against the latest commit within the target branch?

For code changes:

  • Have you included steps to reproduce the behavior or problem that is being changed or addressed?
  • Have you included steps or a guide to how the change may be verified and tested manually?
  • Have you ensured that the full suite of tests and checks have been executed in the root incubating-metron folder via:
  • Have you written or updated unit tests and/or integration tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

@nickwallen

The additional commits are showing up because GitHub is out of sync with Apache. That should clear up once they are back in sync.


@cestella cestella left a comment


This looks really great; a few minor nits (and I might be talked out of some of them too if you feel strongly the other way). I'm +1 on this; this really enables the pcap ingestion functionality for us.

// update queue depth of this kafka connection
kaf_conn_stats[conn_id].depth = rd_kafka_outq_len(rk);

// TODO this should be handled by a logging lib that can handle faults and rolling the output file

Member

Can we have a JIRA created on this?

Contributor Author

Sure

Contributor Author

Created METRON-828 to track this.

printf("[ -b BURST_SIZE ] defined as %d \n", app.burst_size);

if(app.burst_size < 1 || app.burst_size > MAX_BURST_SIZE) {
printf("Invalid burst size; burst=%u must be in [1, %u]. \n", app.burst_size, MAX_BURST_SIZE);

Member

Can we send this to stderr?

Member

Actually, same comment on the error comments below.

Contributor Author

Sure, makes sense.
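
For reference, the change requested here could look roughly like the sketch below: the same range check, but with the error written through `fprintf` to a caller-supplied stream (normally `stderr`) instead of `printf`. The function name `validate_burst_size` and the value given to `MAX_BURST_SIZE` are assumptions for illustration; the real definition is not shown in this excerpt, and this is not necessarily what was committed.

```c
#include <stdio.h>

/* Assumed limit for illustration; the real MAX_BURST_SIZE value is not shown here. */
#define MAX_BURST_SIZE 1024u

/* Hypothetical helper: checks that burst_size is in [1, MAX_BURST_SIZE].
 * On failure, writes the message to the given stream (pass stderr so that
 * errors do not mix with normal output) and returns -1. Returns 0 if valid. */
int validate_burst_size(unsigned int burst_size, FILE *err)
{
    if (burst_size < 1 || burst_size > MAX_BURST_SIZE) {
        fprintf(err, "Invalid burst size; burst=%u must be in [1, %u].\n",
                burst_size, MAX_BURST_SIZE);
        return -1;
    }
    return 0;
}
```

A caller would use `validate_burst_size(app.burst_size, stderr)` and exit on a non-zero result. Keeping errors on `stderr` lets the informational output on `stdout` be redirected or parsed separately.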

@nickwallen

As a comment on the JIRA itself, I added some details of a performance test showing the probe operating successfully at ~1.1 Gbps.

@asfgit asfgit closed this in 81677fd Apr 6, 2017
lucesape pushed a commit to repairnator/repairnator-experiments that referenced this pull request Apr 12, 2017
@nickwallen nickwallen deleted the METRON-822 branch June 5, 2017 19:04