METRON-706: Add Stellar transformations and filters to enrichment and threat intel loaders #445
… flatfile load script
…x trampled commit for ExtractorHandler
…est for custom extractor definition
I like this; ultimately, very clear and clean, @mmiklavc
A few minor nits, but I'm +1 shortly thereafter.
@@ -35,6 +35,14 @@ protected ConvertUtilsBean initialValue() {
  }
};

public static <T> T convertOrFail(Object o, Class<T> clazz) {
  if (clazz.isInstance(o)) {
Any reason why this isn't just clazz.cast(o) and called cast?
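A minimal sketch of the reviewer's suggestion, assuming the helper only needs to return the object typed as T or fail: Class.cast already throws ClassCastException when a non-null object is not an instance of the class, and passes null through unchanged. The class name CastHelper and the config keys are hypothetical, not the code that was merged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a convertOrFail-style method reduced to Class.cast.
// Not the code from the PR; names and keys are illustrative only.
class CastHelper {

  // Returns o typed as T. Throws ClassCastException if o is non-null and not
  // an instance of clazz; a null input is returned as null.
  static <T> T cast(Object o, Class<T> clazz) {
    return clazz.cast(o);
  }

  public static void main(String[] args) {
    Map<String, Object> config = new HashMap<>();
    config.put("separator", ",");   // hypothetical key
    config.put("batchSize", 128);   // hypothetical key

    String sep = cast(config.get("separator"), String.class);
    System.out.println("separator = " + sep);

    try {
      cast(config.get("batchSize"), String.class); // an Integer is not a String
    } catch (ClassCastException e) {
      System.out.println("failed as expected: " + e.getMessage());
    }
  }
}
```

As the reply below notes, a plain in-place cast gives the same fail-fast behavior without a helper at all.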
Good point. I'm removing this entirely. I'll just use casting in place, e.g.
Example 1
Map<String, Object> map = new HashMap<>();
map.put("foo", "bar");
String val = (String) map.get("foo"); // throws ClassCastException on failure, which is what we want
Example 2
Map<Object, Object> a = new HashMap<>();
a.put("hello", "world");
a.put(1, 2);
Map<String, Object> b = new HashMap<>();
b.put("a", a);
Map<Object, Object> c = (Map<Object, Object>) b.get("a"); // throws ClassCastException if not a Map
Map<String, String> d = new HashMap<>();
for (Map.Entry<Object, Object> entry : c.entrySet()) {
  d.put((String) entry.getKey(), (String) entry.getValue()); // throws ClassCastException on the Integer entry (1 -> 2), also what we want
}
Agreed
Context.Builder builder = new Context.Builder();
if (zkClient.isPresent()) {
  builder.with(Context.Capabilities.ZOOKEEPER_CLIENT, zkClient::get)
         .with(Context.Capabilities.GLOBAL_CONFIG, () -> globalConfig);
You might want an empty global config even if the zkClient isn't present. Not sure, just a thought.
By empty, do you mean null or "{}"? Does Stellar handle that differently from choosing not to add the capability at all?
I mean {}
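A minimal sketch of the "empty global config" suggestion, assuming capabilities are registered as suppliers keyed by an enum. This is a hypothetical stand-in, not Metron's Context.Builder: when no ZooKeeper client is available, it still registers GLOBAL_CONFIG, backed by an empty map ({}), so a lookup finds "no properties" rather than a missing capability.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical model of a capability-based context; NOT Metron's Context API.
class CapabilitiesSketch {

  enum Capability { ZOOKEEPER_CLIENT, GLOBAL_CONFIG }

  static Map<Capability, Supplier<Object>> build(Optional<Object> zkClient,
                                                 Map<String, Object> globalConfigFromZk) {
    Map<Capability, Supplier<Object>> caps = new HashMap<>();
    if (zkClient.isPresent()) {
      caps.put(Capability.ZOOKEEPER_CLIENT, zkClient::get);
      caps.put(Capability.GLOBAL_CONFIG, () -> globalConfigFromZk);
    } else {
      // Suggested fallback: register {} instead of omitting the capability.
      Map<String, Object> empty = Collections.emptyMap();
      caps.put(Capability.GLOBAL_CONFIG, () -> empty);
    }
    return caps;
  }

  @SuppressWarnings("unchecked")
  static Map<String, Object> globalConfig(Map<Capability, Supplier<Object>> caps) {
    return (Map<String, Object>) caps.get(Capability.GLOBAL_CONFIG).get();
  }

  public static void main(String[] args) {
    Map<String, Object> zkGlobals = new HashMap<>();
    zkGlobals.put("example.key", "value"); // illustrative property only
    System.out.println(globalConfig(build(Optional.of(new Object()), zkGlobals))); // {example.key=value}
    System.out.println(globalConfig(build(Optional.empty(), zkGlobals)));          // {}
  }
}
```

The design trade-off is that callers never have to distinguish "capability absent" from "config empty"; whether Stellar treats the two cases differently is the open question in the thread above.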
@@ -200,6 +295,7 @@ public static void teardown() throws Exception {
multilineZipFile.delete();
lineByLineExtractorConfigFile.delete();
wholeFileExtractorConfigFile.delete();
stellarExtractorConfigFile.delete();
Did you forget customLineByLineExtractorConfigFile.delete();?
Good catch, fixing
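One way to make this kind of omission impossible is to register each temp file when it is created and delete them all in teardown. A minimal sketch, assuming the test can create its config files through a small helper; TempFileRegistry and its methods are hypothetical, not part of the Metron test class.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical fixture helper: every file created through it is remembered,
// so teardown cannot forget one (e.g. customLineByLineExtractorConfigFile).
class TempFileRegistry {
  private final List<File> files = new ArrayList<>();

  // Creates a temp file (prefix must be at least 3 characters) and tracks it.
  File create(String prefix, String suffix) throws IOException {
    File f = File.createTempFile(prefix, suffix);
    files.add(f);
    return f;
  }

  // Deletes every tracked file that still exists; returns the number of failures.
  int deleteAll() {
    int failures = 0;
    for (File f : files) {
      if (f.exists() && !f.delete()) {
        failures++;
      }
    }
    files.clear();
    return failures;
  }

  int size() {
    return files.size();
  }

  public static void main(String[] args) throws IOException {
    TempFileRegistry registry = new TempFileRegistry();
    File stellarConfig = registry.create("stellarExtractorConfig", ".json");
    File customConfig = registry.create("customLineByLineExtractorConfig", ".json");
    System.out.println(stellarConfig.exists() && customConfig.exists()); // true
    System.out.println(registry.deleteAll());                            // 0
    System.out.println(stellarConfig.exists() || customConfig.exists()); // false
  }
}
```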
Note: Per the recent issue in master with Ansible, I tested the following as well
+1 by inspection
Adding these timing notes about the import for reference:
- No filter, local load, multiple threads (5), batch 128
- With filters, multiple threads (5), batch 128 (1 fewer record)
- MapReduce mode
This PR completes work in https://issues.apache.org/jira/browse/METRON-706
(Note: there are commits from @cestella that I merged while working on this. They are squashed in master but still appear here, in the commit history only, not in the diff.)
The motivation for this PR is to expand where we expose Stellar capabilities. This work enables transformations and filtering in the enrichment and threat intel extractors. The user can now specify transformation expressions on the column values and, separately, filter records based on a provided predicate. The same can be done independently for the indicator value used as part of the HBase key. In addition, a new configuration property lets the user specify a ZooKeeper quorum so that expressions can reference properties from the global config.
See the updated README for documentation details on the new properties.
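To make the order of operations concrete, here is a conceptual sketch in which plain Java lambdas stand in for the configured Stellar expressions. The class and method names are hypothetical, and the exact ordering (indicator taken from the raw record column, value filter applied before the indicator filter) is an assumption; the README is authoritative for the real properties and semantics.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Conceptual model only: lambdas stand in for Stellar expressions.
// Names and ordering are assumptions, not Metron's implementation.
class ExtractorPipelineSketch {

  static final class Result {
    final String indicator;
    final Map<String, Object> value;

    Result(String indicator, Map<String, Object> value) {
      this.indicator = indicator;
      this.value = value;
    }
  }

  static List<Result> extract(List<Map<String, Object>> records,
                              String indicatorColumn,
                              Function<Map<String, Object>, Map<String, Object>> valueTransform,
                              Predicate<Map<String, Object>> valueFilter,
                              Function<String, String> indicatorTransform,
                              Predicate<String> indicatorFilter) {
    List<Result> out = new ArrayList<>();
    for (Map<String, Object> record : records) {
      // 1. Transform the column values (on a copy), then filter on the result.
      Map<String, Object> value = valueTransform.apply(new HashMap<>(record));
      if (!valueFilter.test(value)) {
        continue;
      }
      // 2. Independently transform and filter the indicator used in the HBase key.
      String indicator = indicatorTransform.apply(String.valueOf(record.get(indicatorColumn)));
      if (!indicatorFilter.test(indicator)) {
        continue;
      }
      out.add(new Result(indicator, value));
    }
    return out;
  }

  static Map<String, Object> row(String domain, String rank) {
    Map<String, Object> m = new HashMap<>();
    m.put("domain", domain);
    m.put("rank", rank);
    return m;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> records = new ArrayList<>();
    records.add(row("Google.com", "1"));
    records.add(row("example.org", "2"));
    records.add(row("localhost", "3"));

    List<Result> results = extract(
        records,
        "domain",
        v -> { v.put("domain", ((String) v.get("domain")).toLowerCase()); return v; },
        v -> !"2".equals(v.get("rank")),  // value filter: drop rank 2
        String::toLowerCase,               // indicator transform: normalize the key
        ind -> ind.contains("."));         // indicator filter: drop keys without a dot
    for (Result r : results) {
      System.out.println(r.indicator + " -> " + r.value.get("rank")); // google.com -> 1
    }
  }
}
```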
Testing
Testing closely follows the methods defined in #432.
The "port" property/variable here references "es.port" from the global config.
You should see 9275 records in HBase (fewer than the 10k you might expect).
You should get 9 values as below:
Once again, we get fewer than the original dataset size. This is because multiple records are mapping to the same resulting keys in HBase.
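A small illustration of why the row count drops, assuming rows are keyed by enrichment type plus indicator. In HBase a second put to the same row adds a newer version rather than a new row, so a default read returns the last write and the row count does not grow. The "type:indicator" string key below is a hypothetical simplification, not Metron's actual key encoding.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplification: a map stands in for the HBase table and a
// "type:indicator" string stands in for the real row key encoding.
class KeyCollisionSketch {

  // Last write wins for a repeated key, mirroring what a default HBase read returns.
  static Map<String, String> load(String type, List<String[]> records) {
    Map<String, String> table = new LinkedHashMap<>();
    for (String[] r : records) {
      String indicator = r[0];
      String value = r[1];
      table.put(type + ":" + indicator, value);
    }
    return table;
  }

  public static void main(String[] args) {
    List<String[]> records = Arrays.asList(
        new String[] {"google.com", "1"},
        new String[] {"google.com", "7"},  // same indicator, so same key
        new String[] {"example.org", "2"});
    Map<String, String> table = load("top_domains", records);
    System.out.println(records.size() + " records -> " + table.size() + " rows"); // 3 records -> 2 rows
    System.out.println(table.get("top_domains:google.com"));                     // 7
  }
}
```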