METRON-817: Customise output file path patterns for HDFS indexing #505
…SourceFileNameFormat to ensure additions work
…in SourceAwareMoveAction
}

StellarCompiler.Expression expression = sourceTypeExpressionMap.computeIfAbsent(stellarFunction, s -> stellarProcessor.compile(stellarFunction));
VariableResolver resolver = new MapVariableResolver(message);
We should be able to find out from the function metadata/annotation the return type, without doing all this work shouldn't we?
This makes me think of the UI case. We configure the index configuration but have no way of validation before they save and deploy.
Unfortunately, I don't think we can, unless we want to do more work to actually look up the function and validate. On top of it, things like MAP_GET essentially return Object anyway, so we'd still want to check if it's a String afterwards.
So, is there a reason why this isn't just:
// processor is a StellarProcessor
VariableResolver resolver = new MapVariableResolver(message);
Object objResult = processor.parse(stellarFunction, resolver, StellarFunctions.FUNCTION_RESOLVER(), Context.EMPTY_CONTEXT());
if (!(objResult instanceof String)) {
  throw new IllegalArgumentException("Stellar Function <" + stellarFunction + "> did not return a String value. Returned: " + objResult);
}
return objResult == null ? "" : (String) objResult;
@cestella I'm mostly concerned about the performance of function compile on every single message that comes through indexing.
If we keep the current approach, I'd be interested in whether there's a way to make things a little cleaner.
In retrospect, I think this should be an LRU cache, so that we don't keep around a given parse forever. Any thoughts on that, assuming performance would be enough of a concern to not just use your proposal?
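For illustration only, here is a minimal sketch of the LRU idea raised above, built on `java.util.LinkedHashMap` in access-order mode. The class name `LruExpressionCache` and its methods are hypothetical and are not part of Metron or Stellar; the "compile" step is stood in for by an arbitrary function.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper (not a Metron class): bounded cache that evicts the
// least-recently-used entry once maxEntries is exceeded.
class LruExpressionCache<K, V> {
    private final Map<K, V> entries;

    LruExpressionCache(final int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently accessed.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached value, or builds it with the supplied function on a miss.
    synchronized V get(K key, Function<K, V> compiler) {
        V value = entries.get(key);
        if (value == null) {
            value = compiler.apply(key);
            entries.put(key, value);
        }
        return value;
    }

    synchronized boolean contains(K key) {
        return entries.containsKey(key);
    }

    synchronized int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        LruExpressionCache<String, Integer> cache = new LruExpressionCache<>(2);
        cache.get("a", String::length);
        cache.get("bb", String::length);
        cache.get("a", String::length);   // touch "a" so "bb" becomes the eldest
        cache.get("ccc", String::length); // exceeds the bound and evicts the eldest
        System.out.println("contains bb: " + cache.contains("bb"));
    }
}
```

The bound trades memory for the occasional recompile of a rarely used expression; whether that matters depends on how many distinct expressions a deployment actually configures.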
Yeah, it's a good concern. We do actually have a cache in the StellarProcessor so that compilations happen once and are cached afterwards. As long as StellarProcessor is a transient member variable, I think you're good to do what I suggested.
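As a rough sketch of the pattern described in this comment (one long-lived processor that caches compilations, held in a transient field), the following uses stand-in classes rather than the real Stellar API. `CachingProcessor` and `PathFormatter` are hypothetical names, and "compiling" an expression here just means building a lookup of a message field.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical stand-in for a processor that caches compiled expressions internally.
class CachingProcessor {
    private final Map<String, Function<Map<String, Object>, Object>> compiled = new ConcurrentHashMap<>();
    private final AtomicInteger compileCount = new AtomicInteger();

    Object parse(String expression, Map<String, Object> message) {
        // Compile only on the first sighting of an expression; reuse it afterwards.
        return compiled.computeIfAbsent(expression, this::compile).apply(message);
    }

    private Function<Map<String, Object>, Object> compile(String expression) {
        compileCount.incrementAndGet();
        // Assumption for the sketch: an "expression" is simply a field name.
        return message -> message.get(expression);
    }

    int compileCount() {
        return compileCount.get();
    }
}

// Hypothetical serializable component that reuses one processor per instance.
class PathFormatter implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String expression;
    // transient: not serialized, so it is rebuilt lazily after deserialization.
    private transient CachingProcessor processor;

    PathFormatter(String expression) {
        this.expression = expression;
    }

    Object apply(Map<String, Object> message) {
        if (processor == null) {
            processor = new CachingProcessor();
        }
        return processor.parse(expression, message);
    }

    int compileCount() {
        return processor == null ? 0 : processor.compileCount();
    }

    public static void main(String[] args) {
        PathFormatter formatter = new PathFormatter("source.type");
        Map<String, Object> message = new HashMap<>();
        message.put("source.type", "bro");
        formatter.apply(message);
        formatter.apply(message);
        System.out.println("compilations: " + formatter.compileCount());
    }
}
```

The point of the sketch is only that the per-message cost drops to a map lookup when the processor instance outlives a single message.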
@cestella Made that change. I did make the check if (objResult != null && !(objResult instanceof String)), to avoid falling into the IAE when objResult is null.
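A small self-contained sketch of the null-safe check described here, separated from any Stellar call so it can be run on its own. The class and method names (`StringResults`, `requireStringOrEmpty`) are hypothetical; only the condition and the error message come from the thread.

```java
// Hypothetical helper (not a Metron class) isolating the result validation.
class StringResults {
    // null maps to the empty string, a String passes through, anything else is rejected.
    static String requireStringOrEmpty(String expression, Object result) {
        if (result != null && !(result instanceof String)) {
            throw new IllegalArgumentException(
                "Stellar Function <" + expression + "> did not return a String value. Returned: " + result);
        }
        return result == null ? "" : (String) result;
    }

    public static void main(String[] args) {
        System.out.println("[" + requireStringOrEmpty("expr", null) + "]");
        System.out.println("[" + requireStringOrEmpty("expr", "bro") + "]");
    }
}
```

Without the null guard, `instanceof` is false for null, so a null result would be reported as a type error instead of falling through to the empty-string default.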
After looking at this a bit further, while reusing the StellarProcessor is the right answer, it is apparent that we don't practice that everywhere...in fact, we practice it almost literally nowhere. I have created a follow-on PR ( #508 ) to address that problem, which is a substantial performance issue, in fact.
+1 by inspection
Contributor Comments
Primarily this affects HdfsWriter by changing the output path from a set path (/apps/metron/.../<sensor>) to one that can be defined via a Stellar function. Specifically, the base path is still defined the same way (the /apps/metron/.../ portion), but the <sensor> portion is dropped and can now be defined by a Stellar function. By default, the original behavior of <sensor> is used. This is defined in the <sensor>.json file as indicated in the new README.md for metron-writer.
Notes
Testing
Unit tests are added to pretty much cover HdfsWriter, and this can be spun up in a dev environment.
To test in dev
/usr/metron/0.3.1/config/zookeeper/indexing/bro.json
/usr/metron/0.3.1/bin/zk_load_configs.sh -z node1:2181 -m PUSH -i /usr/metron/0.3.1/config/zookeeper/
Pull Request Checklist
Thank you for submitting a contribution to Apache Metron (Incubating).
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.
In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:
For all changes:
For code changes:
Have you included steps to reproduce the behavior or problem that is being changed or addressed?
Have you included steps or a guide to how the change may be verified and tested manually?
Have you ensured that the full suite of tests and checks have been executed in the root incubating-metron folder via:
Have you written or updated unit tests and or integration tests to verify your changes?
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?
For documentation related changes:
Have you ensured that the format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not, then run the following commands and then verify the changes via site-book/target/site/index.html:
Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.