
METRON-1467: Replace guava caches in places where the keyspace might be large #947

Closed
wants to merge 30 commits

Conversation

@cestella (Member) commented Mar 2, 2018

Contributor Comments

Based on the performance tuning exercise done as part of METRON-1460, Guava has difficulty with cache sizes over 10k entries. Unfortunately, we are quite demanding of Guava in this regard, so we should transition a few uses of Guava to Caffeine (see the sketch after this list):

  • Stellar processor cache
  • The JoinBolt cache
  • The Enrichment Bolt Cache
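
As a rough sketch of the shape of the change (illustrative names, not the exact Metron code), the migration is largely mechanical, since Caffeine deliberately mirrors Guava's builder API:

    import com.github.benmanes.caffeine.cache.Caffeine;
    import com.github.benmanes.caffeine.cache.LoadingCache;

    public class CaffeineMigrationSketch {
      // Illustrative stand-in for the real sizing config.
      private static final long MAX_CACHE_SIZE = 100_000;

      // Guava equivalent, for comparison:
      //   CacheBuilder.newBuilder()
      //       .maximumSize(MAX_CACHE_SIZE)
      //       .build(new CacheLoader<String, Object>() {
      //         @Override
      //         public Object load(String key) { return expensiveCompute(key); }
      //       });
      public static LoadingCache<String, Object> build() {
        return Caffeine.newBuilder()
            .maximumSize(MAX_CACHE_SIZE)
            .build(CaffeineMigrationSketch::expensiveCompute); // CacheLoader as a method ref
      }

      private static Object expensiveCompute(String key) {
        return key.toUpperCase(); // placeholder for the real loader logic
      }
    }

Call sites stay nearly identical, though Caffeine's LoadingCache.get does not throw a checked ExecutionException the way Guava's does.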

Test plan:
First, spin up full-dev and ensure things continue to work.

For my own testing, I created a new sensor type called dummy and added some Stellar enrichments, Stellar threat triage enrichments, and a triage rule to ensure things continued to work with Stellar (exercising the Stellar processor cache and the enrichment bolt cache).

Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.

In order to streamline the review of the contribution, we ask that you follow these guidelines and double-check the following:

For all changes:

  • Is there a JIRA ticket associated with this PR? If not, one needs to be created at the Metron JIRA.
  • Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
  • Has your PR been rebased against the latest commit within the target branch (typically master)?

For code changes:

  • Have you included steps to reproduce the behavior or problem that is being changed or addressed?

  • Have you included steps or a guide to how the change may be verified and tested manually?

  • Have you ensured that the full suite of tests and checks has been executed in the root metron folder via:

    mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh 
    
  • Have you written or updated unit tests and/or integration tests to verify your changes?

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible with inclusion under ASF 2.0?

  • Have you verified the basic functionality of the build by building and running locally with the Vagrant full-dev environment or the equivalent?

For documentation related changes:

  • Have you ensured that the format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not, run the following commands and then verify the changes via site-book/target/site/index.html:

    cd site-book
    mvn site
    

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci be set up for your personal repository so that your branches are built there before you submit a pull request.

@cestella changed the title METRON-1467: Replace guava caches in places where the keyspace might be large → METRON-1467: Replace guava caches in places where the keyspace might be large (NOTE: Review after METRON-1460) Mar 2, 2018
@cestella closed this Mar 5, 2018
@cestella reopened this Mar 5, 2018
@cestella (Member, Author) commented Mar 5, 2018

I ran this up with Vagrant and ensured that:

  • Normal Stellar still works in field transformations as well as in enrichments
  • New enrichments can be swapped in and out live
  • New threat intel can be swapped in and out live

@cestella changed the title METRON-1467: Replace guava caches in places where the keyspace might be large (NOTE: Review after METRON-1460) → METRON-1467: Replace guava caches in places where the keyspace might be large Mar 7, 2018
@nickwallen (Contributor) left a comment
Everything looks good. Just one comment on the "remove listener".

We know how important the performance of these caches is, so it is important to sweat the small stuff here.

    loader = s -> new HashMap<>();
    cache = Caffeine.newBuilder().maximumSize(maxCacheSize)
        .expireAfterWrite(maxTimeRetain, TimeUnit.MINUTES)
        .removalListener(new JoinRemoveListener())
        .build(loader); // likely terminal call, restored to complete the assignment; only the lines above were quoted in the review
@nickwallen (Contributor) commented Mar 7, 2018

It seems like we only want to be notified of a full cache when ERROR logging is set. Is that the case?

In the JoinRemoveListener we end up doing some work that we probably don't need to do unless ERROR logging is set. One easy fix would be to add the "remove listener" only if LOG.isDebugEnabled() (see the sketch below).
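
A minimal sketch of that suggestion, assuming an SLF4J LOG and a listener that only logs; the guard level (error vs. debug) should match whatever level JoinRemoveListener actually logs at:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.TimeUnit;
    import com.github.benmanes.caffeine.cache.Caffeine;
    import com.github.benmanes.caffeine.cache.LoadingCache;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class ConditionalListenerSketch {
      private static final Logger LOG = LoggerFactory.getLogger(ConditionalListenerSketch.class);

      public static LoadingCache<String, Map<String, Object>> build(long maxCacheSize, long maxTimeRetain) {
        Caffeine<Object, Object> builder = Caffeine.newBuilder()
            .maximumSize(maxCacheSize)
            .expireAfterWrite(maxTimeRetain, TimeUnit.MINUTES);
        if (LOG.isErrorEnabled()) {
          // Pay for the removal callback only when its output would actually be emitted.
          builder = builder.removalListener((key, value, cause) ->
              LOG.error("Removal from join cache: key={} cause={}", key, cause));
        }
        return builder.build(s -> new HashMap<>());
      }
    }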

@cestella (Member, Author) commented

So, I believe this was intentional before this PR (I just migrated it to the new caching strategy). The idea is that if a removal happens from the join cache under specific circumstances, we want to know about it, because a message could be getting dropped when the cache is overwhelmed. @merrimanr Can you chime in here on the rationale?

A contributor commented

Yes, it is pre-existing. We can address it at a later time.

I remember now: maxing out this cache causes the Split/Join topology to fail, which is a major problem. And this cache here is only for the Split/Join topology, not the Unified topology.

We should probably look at adding similar logging (only when ERROR is enabled) for the other places where we use the cache, or just some mechanism to periodically log cache stats (a sketch follows below). Anyhow, that is down the road.
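
On the "periodically log cache stats" idea: Caffeine can already record hit/miss/eviction counters via recordStats(). A rough sketch of what periodic logging could look like (hypothetical wiring, not part of this PR):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import com.github.benmanes.caffeine.cache.Cache;
    import com.github.benmanes.caffeine.cache.Caffeine;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class CacheStatsLoggerSketch {
      private static final Logger LOG = LoggerFactory.getLogger(CacheStatsLoggerSketch.class);

      public static void main(String[] args) {
        // recordStats() must be enabled at build time, or stats() stays at zero.
        Cache<String, String> cache = Caffeine.newBuilder()
            .maximumSize(10_000)
            .recordStats()
            .build();

        // CacheStats#toString includes hit rate, miss count, and eviction count.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> LOG.info("join cache stats: {}", cache.stats()),
            1, 1, TimeUnit.MINUTES);
      }
    }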

@nickwallen (Contributor) commented

+1 LGTM
