
Skip empty files for local, hdfs, and cloud input sources #9450

Merged (10 commits) on Mar 4, 2020


jihoonson
Contributor

Description

This PR modifies the input sources, except for the HTTP input source, to skip empty files. It additionally fixes the following two bugs:


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths.
  • added integration tests.
  • been tested in a test Druid cluster.
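To illustrate the core behavior described above for a local directory, here is a minimal sketch that drops zero-length files from a listing. It is not the actual local input source code: the class and the `nonEmptyFiles` helper are hypothetical, and the sketch only scans the top level of `baseDir` rather than applying whatever filtering or recursion the real input source supports.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SkipEmptyFilesExample
{
  /**
   * Hypothetical helper: returns the regular files directly under baseDir,
   * sorted by path, skipping any file whose length is zero bytes.
   */
  static List<File> nonEmptyFiles(File baseDir)
  {
    final File[] children = baseDir.listFiles();
    if (children == null) {
      // baseDir is not a directory, or it could not be listed.
      return new ArrayList<>();
    }
    return Arrays.stream(children)
                 .filter(File::isFile)
                 .filter(file -> file.length() > 0) // skip empty files
                 .sorted()
                 .collect(Collectors.toList());
  }

  public static void main(String[] args) throws IOException
  {
    final File dir = Files.createTempDirectory("skip-empty").toFile();
    Files.write(new File(dir, "empty.data").toPath(), new byte[0]);
    Files.write(new File(dir, "full.data").toPath(), "hello".getBytes(StandardCharsets.UTF_8));

    for (File file : nonEmptyFiles(dir)) {
      System.out.println(file.getName()); // prints full.data
    }
  }
}
```

Filtering on `length() > 0` happens before any splitting, so downstream code never has to special-case an empty input.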

@@ -40,6 +42,7 @@
public class MaxSizeSplitHintSpec implements SplitHintSpec
{
public static final String TYPE = "maxSize";
private static final Logger LOG = new Logger(MaxSizeSplitHintSpec.class);

Contributor

Unused?

Contributor Author

Oops. Removed.
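`MaxSizeSplitHintSpec` (type `maxSize`) groups input files into splits bounded by size. The sketch below shows one way such grouping could also drop empty files before they reach a split. It is an illustration under assumptions, not Druid's implementation: `MaxSizeSplitSketch`, `split`, and the boundary rule (a split may reach but not exceed `maxSplitSize`, and a single oversized input gets a split of its own) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.ToLongFunction;

public class MaxSizeSplitSketch
{
  /**
   * Hypothetical sketch: groups inputs into splits whose total size stays at or
   * below maxSplitSize, except that an input larger than the limit is placed in
   * a split by itself. Inputs of size 0 are dropped before grouping.
   */
  static <T> Iterator<List<T>> split(Iterator<T> inputs, ToLongFunction<T> sizeOf, long maxSplitSize)
  {
    return new Iterator<List<T>>()
    {
      private T peeked; // next non-empty input not yet placed in a split

      private void advance()
      {
        while (peeked == null && inputs.hasNext()) {
          final T candidate = inputs.next();
          if (sizeOf.applyAsLong(candidate) > 0) { // skip empty inputs
            peeked = candidate;
          }
        }
      }

      @Override
      public boolean hasNext()
      {
        advance();
        return peeked != null;
      }

      @Override
      public List<T> next()
      {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        final List<T> current = new ArrayList<>();
        long total = 0;
        while (hasNext()) {
          final long size = sizeOf.applyAsLong(peeked);
          // Always take the first input; otherwise stop before exceeding the limit.
          if (!current.isEmpty() && total + size > maxSplitSize) {
            break;
          }
          current.add(peeked);
          total += size;
          peeked = null;
        }
        return current;
      }
    };
  }

  public static void main(String[] args)
  {
    final List<Long> sizes = Arrays.asList(40L, 0L, 50L, 30L, 200L, 0L, 10L);
    final Iterator<List<Long>> splits = split(sizes.iterator(), Long::longValue, 100);
    while (splits.hasNext()) {
      System.out.println(splits.next()); // [40, 50], then [30], [200], [10]
    }
  }
}
```

Because empty inputs are filtered inside `advance()`, a split can never consist solely of zero-byte files.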

@@ -21,6 +21,7 @@

import com.fasterxml.jackson.databind.ObjectMapper;
import nl.jqno.equalsverifier.EqualsVerifier;
import org.apache.commons.compress.utils.Lists;

Contributor

Should this be the one from Guava instead? (Same for MaxSizeSplitHintSpecTest.)

Contributor Author

Oops. Fixed.

Comment on lines +121 to +122
prepareNextRequest();
fetchNextBatch();

Contributor

This branch is not covered by unit tests

Contributor Author

Added a test.
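The lines under review (`prepareNextRequest(); fetchNextBatch();`) appear to handle the case where a page of a cloud object listing yields no usable object once empty objects are filtered out, so the iterator has to request the next page. A self-contained sketch of that pattern follows, with an in-memory list of pages standing in for a real paginated listing API. `PagedNonEmptyIterator`, `ObjectSummary`, and this `fetchNextBatch` are hypothetical; the sketch folds request preparation and fetching into a single step.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class PagedNonEmptyIterator implements Iterator<PagedNonEmptyIterator.ObjectSummary>
{
  /** Hypothetical object metadata: a key and a size in bytes. */
  public static final class ObjectSummary
  {
    final String key;
    final long size;

    public ObjectSummary(String key, long size)
    {
      this.key = key;
      this.size = size;
    }
  }

  private final List<List<ObjectSummary>> pages; // stands in for a paginated listing API
  private int nextPage = 0;
  private Iterator<ObjectSummary> batch = Collections.emptyIterator();
  private ObjectSummary current; // next non-empty object, or null when exhausted

  public PagedNonEmptyIterator(List<List<ObjectSummary>> pages)
  {
    this.pages = pages;
    advance();
  }

  private boolean hasMorePages()
  {
    return nextPage < pages.size();
  }

  private void fetchNextBatch()
  {
    batch = pages.get(nextPage++).iterator();
  }

  private void advance()
  {
    current = null;
    while (batch.hasNext() || hasMorePages()) {
      while (batch.hasNext()) {
        final ObjectSummary candidate = batch.next();
        if (candidate.size > 0) { // skip empty objects
          current = candidate;
          return;
        }
      }
      // The batch ran out without a non-empty object: fetch the next page, if any.
      if (hasMorePages()) {
        fetchNextBatch();
      }
    }
  }

  @Override
  public boolean hasNext()
  {
    return current != null;
  }

  @Override
  public ObjectSummary next()
  {
    if (current == null) {
      throw new NoSuchElementException();
    }
    final ObjectSummary result = current;
    advance();
    return result;
  }

  public static void main(String[] args)
  {
    final PagedNonEmptyIterator it = new PagedNonEmptyIterator(Arrays.asList(
        Arrays.asList(new ObjectSummary("a", 0), new ObjectSummary("b", 0)),
        Arrays.asList(new ObjectSummary("c", 5)),
        Collections.<ObjectSummary>emptyList(),
        Arrays.asList(new ObjectSummary("d", 0)),
        Arrays.asList(new ObjectSummary("e", 7))
    ));
    while (it.hasNext()) {
      System.out.println(it.next().key); // prints c, then e
    }
  }
}
```

A unit test for this branch needs a page that contains only empty objects followed by a page with a non-empty one, as in the first two pages above.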

@@ -98,11 +99,19 @@ public void testGetFileIteratorWithBothBaseDirAndDuplicateFilesIteratingFilesOnl
File baseDir = temporaryFolder.newFolder();
List<File> filesInBaseDir = new ArrayList<>();
for (int i = 0; i < 10; i++) {
filesInBaseDir.add(File.createTempFile("local-input-source", ".data", baseDir));
final File file = File.createTempFile("local-input-source", ".data", baseDir);
try (FileWriter writer = new FileWriter(file)) {

Contributor

The forbidden-apis check is flagging this: java.io.FileWriter [Uses default charset]

Contributor Author

Fixed.
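One way to satisfy the forbidden-apis check is to pass the charset explicitly instead of using `java.io.FileWriter(File)`, which falls back to the platform default. The sketch below uses `Files.newBufferedWriter(Path, Charset)` with UTF-8; this thread does not show which explicit-charset writer the actual fix used, and `createNonEmptyTempFiles` and the `"test"` payload are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class ExplicitCharsetWriteExample
{
  /**
   * Hypothetical helper: creates `count` temp files under baseDir and writes a
   * short payload to each, encoding it as UTF-8 rather than the platform default.
   */
  static List<File> createNonEmptyTempFiles(File baseDir, int count) throws IOException
  {
    final List<File> files = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      final File file = File.createTempFile("local-input-source", ".data", baseDir);
      // The charset is passed explicitly, so nothing depends on the default charset.
      try (Writer writer = Files.newBufferedWriter(file.toPath(), StandardCharsets.UTF_8)) {
        writer.write("test");
      }
      files.add(file);
    }
    return files;
  }

  public static void main(String[] args) throws IOException
  {
    final File baseDir = Files.createTempDirectory("explicit-charset").toFile();
    for (File file : createNonEmptyTempFiles(baseDir, 3)) {
      System.out.println(file.getName() + " " + file.length() + " bytes"); // 4 bytes each
    }
  }
}
```

Writing a payload also matters for the test itself: once empty files are skipped, a fixture made of freshly created (zero-byte) temp files would produce no input at all.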

@@ -19,6 +19,7 @@

package org.apache.druid.storage.azure;

import com.google.api.client.util.Lists;

Contributor

You'll need to update the pom to add this dependency:

<dependency>
  <groupId>com.google.http-client</groupId>
  <artifactId>google-http-client</artifactId>
  <scope>test</scope>
</dependency>

https://travis-ci.org/apache/druid/jobs/657595721#L2090

Contributor Author

Ah, this was a mistake. I'm not sure why IntelliJ keeps adding the wrong one. Fixed it now.

jihoonson merged commit 9466ac7 into apache:master on Mar 4, 2020
jihoonson added a commit to jihoonson/druid that referenced this pull request Mar 4, 2020
* Skip empty files for local, hdfs, and cloud input sources

* split hint spec doc

* doc for skipping empty files

* fix typo; adjust tests

* unnecessary fluent iterable

* address comments

* fix test

* use the right lists

* fix test

* fix test
jihoonson added this to the 0.18.0 milestone on Mar 26, 2020