Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

METRON-2005: Batch Writer writes 0-byte files to HDFS on rotation #1338

Closed
wants to merge 8 commits into from

Conversation

justinleet
Copy link
Contributor

@justinleet justinleet commented Feb 15, 2019

Contributor Comments

https://issues.apache.org/jira/browse/METRON-2005 has more info, but the short story is that file rotations were incorrectly creating extra 0-byte files.

This adds a bunch of logging used while debugging the issue, along with a couple minor changes / cleanups (e.g. making sure to close the SourceHandlers that were removed, etc.).

To reproduce the problem without this PR, in the Ambari config for Metron go to Indexing and set HDFS Rotation Policy Units and HDFS Rotation Policy Count to something small (e.g. a couple minutes). You should see 0 byte files being produced. The format of a file name is enrichment-hdfsIndexingBolt-3-0-1550181963674.json. The -0- is the rotation number. All nonzero rotation numbers will be 0-byte files, while all zero rotation numbers will contain data.

With the PR, all files should have a zero rotation number (with further explanation in the JIRA, as noted before). The file should change as expected still, e.g. for a 2 minute rotation, you'll see something like

[metron@node1 ~]$ hdfs dfs -ls /apps/metron/indexing/indexed/*
Found 5 items
-rw-r--r--   1 storm hadoop    5852029 2019-02-14 22:40 /apps/metron/indexing/indexed/bro/enrichment-hdfsIndexingBolt-3-0-1550181964099.json
-rw-r--r--   1 storm hadoop    2950408 2019-02-14 22:50 /apps/metron/indexing/indexed/bro/enrichment-hdfsIndexingBolt-3-0-1550184487351.json
-rw-r--r--   1 storm hadoop     736045 2019-02-14 22:52 /apps/metron/indexing/indexed/bro/enrichment-hdfsIndexingBolt-3-0-1550184609455.json
-rw-r--r--   1 storm hadoop     765852 2019-02-14 22:54 /apps/metron/indexing/indexed/bro/enrichment-hdfsIndexingBolt-3-0-1550184732852.json
-rw-r--r--   1 storm hadoop     588054 2019-02-14 22:54 /apps/metron/indexing/indexed/bro/enrichment-hdfsIndexingBolt-3-0-1550184855769.json
...

Note that the latest two dates may be the same (e.g. in the example, there are two 22:54. This is because one file is closed and done, and the newer one was just opened and hasn't been completed yet. Once closed the second one will be 22:56).

The main change to tests was just to remove a unit test that showed handling of a now incorrect state. I added a fairly basic test to ensure rotation actions and the callback are called, but I'm definitely open to new tests if anyone has any suggestions on what they'd like to see.

I also had to add stellar-common as a dependency. I hadn't changed anything that would have broken that, but still was failing tests locally. I thought it was a transitive dependency, but I'm surprised it broke now. I'm open to thoughts about it.

Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.

In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:

For all changes:

  • Is there a JIRA ticket associated with this PR? If not one needs to be created at Metron Jira.
  • Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
  • Has your PR been rebased against the latest commit within the target branch (typically master)?

For code changes:

  • Have you included steps to reproduce the behavior or problem that is being changed or addressed?

  • Have you included steps or a guide to how the change may be verified and tested manually?

  • Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:

    mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh 
    
  • Have you written or updated unit tests and or integration tests to verify your changes?

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

  • Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and the verify changes via site-book/target/site/index.html:

    cd site-book
    mvn site
    

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.

Copy link
Contributor

@mmiklavc mmiklavc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix here @justinleet. The logging additions are a huge improvement as well.

import org.junit.rules.TemporaryFolder;
import static org.mockito.Mockito.*;

public class SourceHandlerTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this!

Copy link
Contributor

@nickwallen nickwallen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff with the logging additions.

SourceHandler handler = getSourceHandler(sourceType, path, configurations);
handler.handle(message, sourceType, configurations, syncPolicyCreator);
}
} catch (Exception e) {
LOG.error("HdfsWriter encountered error writing", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log statement is useful. Can you add the sourceType, number of messages, and output path to the log statement? That context is going to be helpful when debugging an issue.

SourceHandler removed = sourceHandlerMap.remove(key);
removed.close(); // If it's getting removed, we want to close it to ensure things like Timers are ended.
LOG.debug("Removed {} -> {}", key, removed);
LOG.debug("Current state of sourceHandlerMap: {}", sourceHandlerMap);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to just push this into a single log statement? I'm worried that out-of-order logging would make it difficult to connect the dots that after I removed X, the current state is Y. If its one statement, you don't have that potential problem.

@@ -29,7 +34,10 @@
}

public void removeKey() {
sourceHandlerMap.remove(key);
SourceHandler removed = sourceHandlerMap.remove(key);
removed.close(); // If it's getting removed, we want to close it to ensure things like Timers are ended.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do a null check on removed just for our own sanity?

@@ -154,13 +170,16 @@ synchronized SourceHandler getSourceHandler(String sourceType, String stellarRes
SourceHandler ret = sourceHandlerMap.get(key);
if(ret == null) {
if(sourceHandlerMap.size() >= maxOpenFiles) {
throw new IllegalStateException("Too many HDFS files open!");
String errorMsg = "Too many HDFS files open! Maximum number of open files is: " + maxOpenFiles;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another useful addition to our logging. Can you also add the number of files that are open too?

I know it would normally be close to, if not equal to the max, but weirder things have happened and that's what good logging is there to help with.

@mmiklavc
Copy link
Contributor

Can you clarify a but more what file rotation should look like with and without Stellar functions modifying the output path? As pertains to the rotation number "0" I'm wondering why we don't just remove it altogether.

@justinleet
Copy link
Contributor Author

@mmiklavc The description of what it looks like is pretty much https://github.com/apache/metron/tree/master/metron-platform/metron-writer. I'll add a note about file rotation.

The reason it's still there is because we can't remove it. It's part of the FileNameFormat. We can't remove it without building something out ourselves.

@justinleet
Copy link
Contributor Author

@nickwallen The various changes you suggested are implemented.

@mmiklavc I updated the README slightly and added a comment inline re: rotationNum hardcoded to 0.

@mmiklavc
Copy link
Contributor

That looks good @justinleet, thanks for the updates. As far as I'm concerned, this looks good. I'm +1 by inspection pending Travis and any other community requests.

SourceHandler handler = getSourceHandler(sourceType, path, configurations);
handler.handle(message, sourceType, configurations, syncPolicyCreator);
} catch (Exception e) {
LOG.error(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the {} substitution works with LOG.error. I think you have to build the error string first like so.

String err = String.format("HdfsWriter encountered.. %s... blah blah.", sourceType, messages.size(), path);
LOG.error(err, e);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of slf4j 1.6.0 (we use 1.7.7), it works and the last arg has to be the exception (check out https://www.slf4j.org/faq.html#paramException). The example they give is

String s = "Hello world";
try {
  Integer i = Integer.valueOf(s);
} catch (NumberFormatException e) {
  logger.error("Failed to format {}", s, e);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's awesome. I've been doing it the hard way!

@nickwallen
Copy link
Contributor

+1 Thanks for the fix and the lesson on slf4j.

@justinleet justinleet dismissed nickwallen’s stale review February 19, 2019 14:34

Nick's comments have been addressed and he's +1, so I'm dismissing the review.

@asfgit asfgit closed this in 5e9cf70 Feb 19, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants