HADOOP-18321.Fix when to read an additional record from a BZip2 text file split #4521

Merged: 3 commits into apache:trunk on Jul 6, 2022

Conversation

ashutoshcipher
Contributor

Description of PR

Fix when to read an additional record from a BZip2 text file split

JIRA - HADOOP-18321
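
The patch itself is not shown in this conversation. As background only, here is a minimal sketch of the convention that readers of uncompressed text splits generally follow: a split that does not start at offset 0 discards its first (possibly partial) line, and every split keeps reading while the next line starts at or before its end, so it may read one additional record past the split boundary. `SplitLineSketch` and `readSplit` are hypothetical names for illustration, not Hadoop APIs and not the code changed by this PR. For BZip2 input the same decision presumably has to be made against compressed block boundaries, which is the case the title refers to.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitLineSketch {

  /** Returns the lines owned by the split covering bytes start..end of data. */
  static List<String> readSplit(byte[] data, int start, int end) {
    int pos = start;
    if (start != 0) {
      // The first (possibly partial) line belongs to the previous split.
      while (pos < data.length && data[pos] != '\n') {
        pos++;
      }
      pos++; // step past the newline
    }
    List<String> records = new ArrayList<>();
    // A line is owned by this split if it STARTS at or before end, so the
    // last iteration may read one record that extends past the split.
    while (pos <= end && pos < data.length) {
      int lineEnd = pos;
      while (lineEnd < data.length && data[lineEnd] != '\n') {
        lineEnd++;
      }
      records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
      pos = lineEnd + 1;
    }
    return records;
  }
}
```

Calling `readSplit` over consecutive ranges of the same buffer (for example 0..5, 5..10, 10..14 on a 14-byte input) and concatenating the results should return every line exactly once, with no record dropped or duplicated at a boundary.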

How was this patch tested?

Added unit tests.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue ID (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 41s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 9 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 40s Maven dependency ordering for branch
+1 💚 mvninstall 25m 28s trunk passed
+1 💚 compile 24m 33s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 compile 21m 6s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 3m 50s trunk passed
+1 💚 mvnsite 2m 36s trunk passed
+1 💚 javadoc 2m 0s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 1m 28s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 4m 21s trunk passed
+1 💚 shadedclient 21m 11s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 29s Maven dependency ordering for patch
+1 💚 mvninstall 1m 30s the patch passed
+1 💚 compile 22m 28s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javac 22m 28s the patch passed
+1 💚 compile 21m 10s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 21m 10s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 3m 35s /results-checkstyle-root.txt root: The patch generated 13 new + 167 unchanged - 1 fixed = 180 total (was 168)
+1 💚 mvnsite 2m 35s the patch passed
+1 💚 javadoc 1m 49s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 1m 30s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 4m 26s the patch passed
+1 💚 shadedclient 21m 23s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 18m 18s hadoop-common in the patch passed.
+1 💚 unit 7m 10s hadoop-mapreduce-client-core in the patch passed.
+1 💚 asflicense 0m 59s The patch does not generate ASF License warnings.
232m 29s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4521/1/artifact/out/Dockerfile
GITHUB PR #4521
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 411a69ccdf05 4.15.0-169-generic #177-Ubuntu SMP Thu Feb 3 10:50:38 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 9261ada
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4521/1/testReport/
Max. process+thread count 3153 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4521/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 46s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 9 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 35s Maven dependency ordering for branch
+1 💚 mvninstall 26m 24s trunk passed
+1 💚 compile 24m 12s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 compile 21m 21s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 4m 8s trunk passed
+1 💚 mvnsite 2m 53s trunk passed
+1 💚 javadoc 1m 56s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 1m 37s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 4m 28s trunk passed
+1 💚 shadedclient 21m 35s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 23s Maven dependency ordering for patch
+1 💚 mvninstall 1m 43s the patch passed
+1 💚 compile 22m 0s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javac 22m 0s the patch passed
+1 💚 compile 19m 56s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 19m 56s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 5m 59s root: The patch generated 0 new + 167 unchanged - 1 fixed = 167 total (was 168)
+1 💚 mvnsite 2m 41s the patch passed
+1 💚 javadoc 1m 50s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 1m 34s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 4m 24s the patch passed
+1 💚 shadedclient 21m 12s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 17m 59s hadoop-common in the patch passed.
+1 💚 unit 6m 48s hadoop-mapreduce-client-core in the patch passed.
+1 💚 asflicense 1m 6s The patch does not generate ASF License warnings.
234m 42s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4521/2/artifact/out/Dockerfile
GITHUB PR #4521
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 58bc6c9ef44a 4.15.0-169-generic #177-Ubuntu SMP Thu Feb 3 10:50:38 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 7a6efbb
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4521/2/testReport/
Max. process+thread count 3153 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4521/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Comment on lines 25 to 28
    @Override
    public void setUp() throws Exception {
      super.setUp();
    }
Member

I think the lines are not required for the checkstyle fix.

Contributor Author

Thanks @aajisaka - I have addressed the above comment.

@aajisaka
Member

aajisaka commented Jul 5, 2022

Other than the above comment, I'm +1 for this change.
Background: this fix was merged internally and has been running for several months without any related failures.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 13m 6s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 9 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 36s Maven dependency ordering for branch
+1 💚 mvninstall 24m 49s trunk passed
+1 💚 compile 23m 1s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 compile 20m 34s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 4m 25s trunk passed
+1 💚 mvnsite 3m 46s trunk passed
+1 💚 javadoc 3m 1s trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 2m 36s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 5m 23s trunk passed
+1 💚 shadedclient 22m 25s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 28s Maven dependency ordering for patch
+1 💚 mvninstall 1m 47s the patch passed
+1 💚 compile 22m 18s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javac 22m 18s the patch passed
+1 💚 compile 20m 33s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 20m 33s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 4m 12s root: The patch generated 0 new + 167 unchanged - 1 fixed = 167 total (was 168)
+1 💚 mvnsite 3m 44s the patch passed
+1 💚 javadoc 2m 55s the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚 javadoc 2m 36s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 5m 29s the patch passed
+1 💚 shadedclient 22m 47s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 18m 52s hadoop-common in the patch passed.
+1 💚 unit 7m 22s hadoop-mapreduce-client-core in the patch passed.
+1 💚 asflicense 1m 37s The patch does not generate ASF License warnings.
256m 57s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4521/3/artifact/out/Dockerfile
GITHUB PR #4521
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux cd47bfeba05f 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / a0f755e
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4521/3/testReport/
Max. process+thread count 2710 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4521/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@PrabhuJoseph
Contributor

Thanks @ashutoshcipher for the patch and @aajisaka for the review.

@PrabhuJoseph PrabhuJoseph merged commit a432925 into apache:trunk Jul 6, 2022
@aajisaka
Member

aajisaka commented Jul 6, 2022

My late +1. Thank you @PrabhuJoseph and @ashutoshcipher

@ashutoshcipher
Contributor Author

Thanks @aajisaka @PrabhuJoseph and @saswata-dutta :)

import static org.apache.hadoop.io.compress.bzip2.BZip2TextFileWriter.BLOCK_SIZE;
import static org.junit.Assert.assertEquals;

import java.io.ByteArrayInputStream;
Contributor

A bit late, but the imports are completely out of sync with the normal Hadoop rules. Check your IDE settings.

Contributor Author

Contributor

That file puts statics at the bottom. At least it should; if it doesn't, that's a bug.
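
For illustration, a minimal sketch of the ordering this comment asks for, with the static import placed after the regular imports. It uses only JDK classes so that it compiles on its own. `ImportOrderSketch` and `roundTrip` are hypothetical names, and the blank line between the two groups is an assumption rather than a documented rule.

```java
import java.io.ByteArrayInputStream;

import static java.nio.charset.StandardCharsets.UTF_8;

public class ImportOrderSketch {

  // Copies the text through an in-memory stream and decodes it again.
  static String roundTrip(String text) {
    byte[] bytes = text.getBytes(UTF_8);
    ByteArrayInputStream in = new ByteArrayInputStream(bytes);
    byte[] buf = new byte[bytes.length];
    int n = in.read(buf, 0, buf.length); // -1 when the stream is empty
    return new String(buf, 0, Math.max(n, 0), UTF_8);
  }
}
```

In the test file under review, the same change would move the two `import static` lines below `import java.io.ByteArrayInputStream;` and the other regular imports.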

Contributor Author

@steveloughran - I will file a JIRA and fix the imports.

Contributor

thx.

Contributor Author

@steveloughran - Created PR to sync imports - #4694

Sorry for being a little late; I was busy with some other stuff. Thanks.

Contributor

No worries. If you aren't behind on lots of things, then you aren't a full-time software engineer...

aajisaka pushed a commit that referenced this pull request Aug 7, 2022
…file split (#4521)

* HADOOP-18321.Fix when to read an additional record from a BZip2 text file split

Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> and Reviewed by Akira Ajisaka.
(cherry picked from commit a432925)
@ashutoshcipher ashutoshcipher deleted the HADOOP-18321 branch August 7, 2022 22:18
HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
…file split (apache#4521)

* HADOOP-18321.Fix when to read an additional record from a BZip2 text file split

Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> and Reviewed by Akira Ajisaka.