-
Notifications
You must be signed in to change notification settings - Fork 507
METRON-2005: Batch Writer writes 0-byte files to HDFS on rotation #1338
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix here @justinleet. The logging additions are a huge improvement as well.
metron-platform/metron-writer/src/main/java/org/apache/metron/writer/hdfs/SourceHandler.java
Show resolved
Hide resolved
import org.junit.rules.TemporaryFolder; | ||
import static org.mockito.Mockito.*; | ||
|
||
public class SourceHandlerTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff with the logging additions.
SourceHandler handler = getSourceHandler(sourceType, path, configurations); | ||
handler.handle(message, sourceType, configurations, syncPolicyCreator); | ||
} | ||
} catch (Exception e) { | ||
LOG.error("HdfsWriter encountered error writing", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This log statement is useful. Can you add the sourceType, number of messages, and output path to the log statement? That context is going to be helpful when debugging an issue.
SourceHandler removed = sourceHandlerMap.remove(key); | ||
removed.close(); // If it's getting removed, we want to close it to ensure things like Timers are ended. | ||
LOG.debug("Removed {} -> {}", key, removed); | ||
LOG.debug("Current state of sourceHandlerMap: {}", sourceHandlerMap); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to just push this into a single log statement? I'm worried that out-of-order logging would make it difficult to connect the dots that after I removed X, the current state is Y. If its one statement, you don't have that potential problem.
@@ -29,7 +34,10 @@ | |||
} | |||
|
|||
public void removeKey() { | |||
sourceHandlerMap.remove(key); | |||
SourceHandler removed = sourceHandlerMap.remove(key); | |||
removed.close(); // If it's getting removed, we want to close it to ensure things like Timers are ended. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do a null check on removed
just for our own sanity?
@@ -154,13 +170,16 @@ synchronized SourceHandler getSourceHandler(String sourceType, String stellarRes | |||
SourceHandler ret = sourceHandlerMap.get(key); | |||
if(ret == null) { | |||
if(sourceHandlerMap.size() >= maxOpenFiles) { | |||
throw new IllegalStateException("Too many HDFS files open!"); | |||
String errorMsg = "Too many HDFS files open! Maximum number of open files is: " + maxOpenFiles; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another useful addition to our logging. Can you also add the number of files that are open too?
I know it would normally be close to, if not equal to the max, but weirder things have happened and that's what good logging is there to help with.
Can you clarify a but more what file rotation should look like with and without Stellar functions modifying the output path? As pertains to the rotation number "0" I'm wondering why we don't just remove it altogether. |
@mmiklavc The description of what it looks like is pretty much https://github.com/apache/metron/tree/master/metron-platform/metron-writer. I'll add a note about file rotation. The reason it's still there is because we can't remove it. It's part of the FileNameFormat. We can't remove it without building something out ourselves. |
@nickwallen The various changes you suggested are implemented. @mmiklavc I updated the README slightly and added a comment inline re: rotationNum hardcoded to 0. |
That looks good @justinleet, thanks for the updates. As far as I'm concerned, this looks good. I'm +1 by inspection pending Travis and any other community requests. |
SourceHandler handler = getSourceHandler(sourceType, path, configurations); | ||
handler.handle(message, sourceType, configurations, syncPolicyCreator); | ||
} catch (Exception e) { | ||
LOG.error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the {}
substitution works with LOG.error. I think you have to build the error string first like so.
String err = String.format("HdfsWriter encountered.. %s... blah blah.", sourceType, messages.size(), path);
LOG.error(err, e);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of slf4j 1.6.0 (we use 1.7.7), it works and the last arg has to be the exception (check out https://www.slf4j.org/faq.html#paramException). The example they give is
String s = "Hello world";
try {
Integer i = Integer.valueOf(s);
} catch (NumberFormatException e) {
logger.error("Failed to format {}", s, e);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's awesome. I've been doing it the hard way!
+1 Thanks for the fix and the lesson on slf4j. |
Nick's comments have been addressed and he's +1, so I'm dismissing the review.
Contributor Comments
https://issues.apache.org/jira/browse/METRON-2005 has more info, but the short story is that file rotations were incorrectly creating extra 0-byte files.
This adds a bunch of logging used while debugging the issue, along with a couple minor changes / cleanups (e.g. making sure to close the SourceHandlers that were removed, etc.).
To reproduce the problem without this PR, in the Ambari config for Metron go to Indexing and set
HDFS Rotation Policy Units
andHDFS Rotation Policy Count
to something small (e.g. a couple minutes). You should see 0 byte files being produced. The format of a file name isenrichment-hdfsIndexingBolt-3-0-1550181963674.json
. The-0-
is the rotation number. All nonzero rotation numbers will be 0-byte files, while all zero rotation numbers will contain data.With the PR, all files should have a zero rotation number (with further explanation in the JIRA, as noted before). The file should change as expected still, e.g. for a 2 minute rotation, you'll see something like
Note that the latest two dates may be the same (e.g. in the example, there are two
22:54
. This is because one file is closed and done, and the newer one was just opened and hasn't been completed yet. Once closed the second one will be22:56
).The main change to tests was just to remove a unit test that showed handling of a now incorrect state. I added a fairly basic test to ensure rotation actions and the callback are called, but I'm definitely open to new tests if anyone has any suggestions on what they'd like to see.
I also had to add stellar-common as a dependency. I hadn't changed anything that would have broken that, but still was failing tests locally. I thought it was a transitive dependency, but I'm surprised it broke now. I'm open to thoughts about it.
Pull Request Checklist
Thank you for submitting a contribution to Apache Metron.
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.
In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:
For all changes:
For code changes:
Have you included steps to reproduce the behavior or problem that is being changed or addressed?
Have you included steps or a guide to how the change may be verified and tested manually?
Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
Have you written or updated unit tests and or integration tests to verify your changes?
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?
For documentation related changes:
Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and the verify changes via
site-book/target/site/index.html
:Note:
Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.