HADOOP-18321.Fix when to read an additional record from a BZip2 text file split #4521

ashutoshcipher · 2022-06-30T16:18:53Z

Description of PR

Fix when to read an additional record from a BZip2 text file split

JIRA - HADOOP-18321
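
For context, the sketch below is a simplified model of the "additional record" convention for plain, uncompressed text splits: every split except the first discards its first line, and every split keeps reading as long as a line starts at or before the split end. It is not the code from this patch. The class `SplitLineDemo` and its methods are hypothetical names used only for illustration, and the BZip2-specific position handling that this PR adjusts is not modelled here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of split-based line reading over a byte array. */
public class SplitLineDemo {

  /** Returns the lines owned by the split [start, end) of data. */
  public static List<String> readSplit(byte[] data, int start, int end) {
    int pos = start;
    // Every split but the first discards its first (possibly partial) line;
    // the previous split is responsible for reading it.
    if (start != 0) {
      pos = skipLine(data, pos);
    }
    List<String> lines = new ArrayList<>();
    // "<=" is the additional record: a line that starts exactly at 'end'
    // belongs to this split, because the next split will skip it.
    while (pos <= end && pos < data.length) {
      int next = skipLine(data, pos);
      int lineEnd = (data[next - 1] == '\n') ? next - 1 : next;
      lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
      pos = next;
    }
    return lines;
  }

  /** Returns the offset just past the next '\n' at or after pos, or data.length. */
  public static int skipLine(byte[] data, int pos) {
    while (pos < data.length && data[pos] != '\n') {
      pos++;
    }
    return Math.min(pos + 1, data.length);
  }

  /** Reads the data as consecutive splits of splitSize bytes and concatenates the results. */
  public static List<String> readAllSplits(byte[] data, int splitSize) {
    List<String> all = new ArrayList<>();
    for (int start = 0; start < data.length; start += splitSize) {
      all.addAll(readSplit(data, start, Math.min(start + splitSize, data.length)));
    }
    return all;
  }

  public static void main(String[] args) {
    byte[] data = "aa\nbbb\nc\ndddd\n".getBytes(StandardCharsets.UTF_8);
    System.out.println(readAllSplits(data, 3)); // prints [aa, bbb, c, dddd]
  }
}
```

In this model each line is returned exactly once regardless of the split size. Getting the equivalent decision right when positions come from a compressed stream is the subject of the JIRA.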

How was this patch tested?

Added unit tests.

For code changes:

Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

hadoop-yetus · 2022-06-30T20:12:52Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 41s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 1s		codespell was not available.
+0 🆗	detsecrets	0m 1s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	test4tests	0m 0s		The patch appears to include 9 new or modified test files.
			_ trunk Compile Tests _
+0 🆗	mvndep	14m 40s		Maven dependency ordering for branch
+1 💚	mvninstall	25m 28s		trunk passed
+1 💚	compile	24m 33s		trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚	compile	21m 6s		trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚	checkstyle	3m 50s		trunk passed
+1 💚	mvnsite	2m 36s		trunk passed
+1 💚	javadoc	2m 0s		trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚	javadoc	1m 28s		trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚	spotbugs	4m 21s		trunk passed
+1 💚	shadedclient	21m 11s		branch has no errors when building and testing our client artifacts.
			_ Patch Compile Tests _
+0 🆗	mvndep	0m 29s		Maven dependency ordering for patch
+1 💚	mvninstall	1m 30s		the patch passed
+1 💚	compile	22m 28s		the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚	javac	22m 28s		the patch passed
+1 💚	compile	21m 10s		the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚	javac	21m 10s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
-0 ⚠️	checkstyle	3m 35s	/results-checkstyle-root.txt	root: The patch generated 13 new + 167 unchanged - 1 fixed = 180 total (was 168)
+1 💚	mvnsite	2m 35s		the patch passed
+1 💚	javadoc	1m 49s		the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚	javadoc	1m 30s		the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚	spotbugs	4m 26s		the patch passed
+1 💚	shadedclient	21m 23s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
+1 💚	unit	18m 18s		hadoop-common in the patch passed.
+1 💚	unit	7m 10s		hadoop-mapreduce-client-core in the patch passed.
+1 💚	asflicense	0m 59s		The patch does not generate ASF License warnings.
		232m 29s

Subsystem	Report/Notes
Docker	ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4521/1/artifact/out/Dockerfile
GITHUB PR	#4521
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname	Linux 411a69ccdf05 4.15.0-169-generic #177-Ubuntu SMP Thu Feb 3 10:50:38 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `9261ada`
Default Java	Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4521/1/testReport/
Max. process+thread count	3153 (vs. ulimit of 5500)
modules	C: hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: .
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4521/1/console
versions	git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by	Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

hadoop-yetus · 2022-07-01T01:08:40Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	0m 46s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 1s		No case conflicting files found.
+0 🆗	codespell	0m 0s		codespell was not available.
+0 🆗	detsecrets	0m 0s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	test4tests	0m 0s		The patch appears to include 9 new or modified test files.
			_ trunk Compile Tests _
+0 🆗	mvndep	14m 35s		Maven dependency ordering for branch
+1 💚	mvninstall	26m 24s		trunk passed
+1 💚	compile	24m 12s		trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚	compile	21m 21s		trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚	checkstyle	4m 8s		trunk passed
+1 💚	mvnsite	2m 53s		trunk passed
+1 💚	javadoc	1m 56s		trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚	javadoc	1m 37s		trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚	spotbugs	4m 28s		trunk passed
+1 💚	shadedclient	21m 35s		branch has no errors when building and testing our client artifacts.
			_ Patch Compile Tests _
+0 🆗	mvndep	0m 23s		Maven dependency ordering for patch
+1 💚	mvninstall	1m 43s		the patch passed
+1 💚	compile	22m 0s		the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚	javac	22m 0s		the patch passed
+1 💚	compile	19m 56s		the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚	javac	19m 56s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	5m 59s		root: The patch generated 0 new + 167 unchanged - 1 fixed = 167 total (was 168)
+1 💚	mvnsite	2m 41s		the patch passed
+1 💚	javadoc	1m 50s		the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚	javadoc	1m 34s		the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚	spotbugs	4m 24s		the patch passed
+1 💚	shadedclient	21m 12s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
+1 💚	unit	17m 59s		hadoop-common in the patch passed.
+1 💚	unit	6m 48s		hadoop-mapreduce-client-core in the patch passed.
+1 💚	asflicense	1m 6s		The patch does not generate ASF License warnings.
		234m 42s

Subsystem	Report/Notes
Docker	ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4521/2/artifact/out/Dockerfile
GITHUB PR	#4521
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname	Linux 58bc6c9ef44a 4.15.0-169-generic #177-Ubuntu SMP Thu Feb 3 10:50:38 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `7a6efbb`
Default Java	Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4521/2/testReport/
Max. process+thread count	3153 (vs. ulimit of 5500)
modules	C: hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: .
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4521/2/console
versions	git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by	Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

aajisaka · 2022-07-05T09:06:40Z

...ient-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReaderBZip2.java

+  @Override
+  public void setUp() throws Exception {
+    super.setUp();
+  }


I think the lines are not required for the checkstyle fix.

Thanks @aajisaka - I have addressed the above comment.

aajisaka · 2022-07-05T09:09:44Z

Other than the above comment, I'm +1 for this change.
Background: This fix was merged internally and has been running for several months without any failures related to it.

hadoop-yetus · 2022-07-05T13:34:55Z

🎊 +1 overall

Vote	Subsystem	Runtime	Logfile	Comment
+0 🆗	reexec	13m 6s		Docker mode activated.
			_ Prechecks _
+1 💚	dupname	0m 0s		No case conflicting files found.
+0 🆗	codespell	0m 1s		codespell was not available.
+0 🆗	detsecrets	0m 1s		detect-secrets was not available.
+1 💚	@author	0m 0s		The patch does not contain any @author tags.
+1 💚	test4tests	0m 0s		The patch appears to include 9 new or modified test files.
			_ trunk Compile Tests _
+0 🆗	mvndep	14m 36s		Maven dependency ordering for branch
+1 💚	mvninstall	24m 49s		trunk passed
+1 💚	compile	23m 1s		trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚	compile	20m 34s		trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚	checkstyle	4m 25s		trunk passed
+1 💚	mvnsite	3m 46s		trunk passed
+1 💚	javadoc	3m 1s		trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚	javadoc	2m 36s		trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚	spotbugs	5m 23s		trunk passed
+1 💚	shadedclient	22m 25s		branch has no errors when building and testing our client artifacts.
			_ Patch Compile Tests _
+0 🆗	mvndep	0m 28s		Maven dependency ordering for patch
+1 💚	mvninstall	1m 47s		the patch passed
+1 💚	compile	22m 18s		the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚	javac	22m 18s		the patch passed
+1 💚	compile	20m 33s		the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚	javac	20m 33s		the patch passed
+1 💚	blanks	0m 0s		The patch has no blanks issues.
+1 💚	checkstyle	4m 12s		root: The patch generated 0 new + 167 unchanged - 1 fixed = 167 total (was 168)
+1 💚	mvnsite	3m 44s		the patch passed
+1 💚	javadoc	2m 55s		the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1
+1 💚	javadoc	2m 36s		the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚	spotbugs	5m 29s		the patch passed
+1 💚	shadedclient	22m 47s		patch has no errors when building and testing our client artifacts.
			_ Other Tests _
+1 💚	unit	18m 52s		hadoop-common in the patch passed.
+1 💚	unit	7m 22s		hadoop-mapreduce-client-core in the patch passed.
+1 💚	asflicense	1m 37s		The patch does not generate ASF License warnings.
		256m 57s

Subsystem	Report/Notes
Docker	ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4521/3/artifact/out/Dockerfile
GITHUB PR	#4521
Optional Tests	dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname	Linux cd47bfeba05f 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool	maven
Personality	dev-support/bin/hadoop.sh
git revision	trunk / `a0f755e`
Default Java	Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions	/usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4521/3/testReport/
Max. process+thread count	2710 (vs. ulimit of 5500)
modules	C: hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: .
Console output	https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4521/3/console
versions	git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by	Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

PrabhuJoseph · 2022-07-06T04:29:47Z

Thanks @ashutoshcipher for the patch and @aajisaka for the review.

aajisaka · 2022-07-06T06:33:31Z

My late +1. Thank you @PrabhuJoseph and @ashutoshcipher

ashutoshcipher · 2022-07-06T10:25:18Z

Thanks @aajisaka @PrabhuJoseph and @saswata-dutta :)

steveloughran · 2022-07-07T16:52:53Z

...hadoop-common/src/test/java/org/apache/hadoop/io/compress/bzip2/TestBZip2TextFileWriter.java

+import static org.apache.hadoop.io.compress.bzip2.BZip2TextFileWriter.BLOCK_SIZE;
+import static org.junit.Assert.assertEquals;
+
+import java.io.ByteArrayInputStream;


bit late, but the imports are completely out of sync with the normal hadoop rules. check your ide settings.

Thanks @steveloughran for pointing it out. I am using this for code formatting - https://github.com/apache/hadoop/blob/trunk/dev-support/code-formatter/hadoop_idea_formatter.xml

that file puts statics at the bottom. at least it should. if it doesn't that's a bug

@steveloughran - I will file a JIRA and fix the imports.

@steveloughran - Created PR to sync imports - #4694

Sorry for being a little late. I was busy with some other stuff. Thanks.

no worries. if you aren't behind on lots of things then you aren't a full time software engineer....
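
For reference, a minimal sketch of the layout discussed in this thread, with regular imports first and static imports at the bottom. Only the "statics at the bottom" rule comes from the comments above. The class `ImportOrderDemo` is hypothetical and uses JDK-only imports so that it compiles on its own.

```java
import java.io.ByteArrayInputStream;

import static java.nio.charset.StandardCharsets.UTF_8;

/** Hypothetical example: regular imports first, static imports last. */
public class ImportOrderDemo {

  /** Counts the bytes of the UTF-8 encoding of text by reading them from a stream. */
  public static int countBytes(String text) {
    ByteArrayInputStream in = new ByteArrayInputStream(text.getBytes(UTF_8));
    int count = 0;
    while (in.read() != -1) {
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    System.out.println(countBytes("bzip2")); // prints 5
  }
}
```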

…file split (#4521) * HADOOP-18321.Fix when to read an additional record from a BZip2 text file split Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> and Reviewed by Akira Ajisaka. (cherry picked from commit a432925)

…file split (apache#4521) * HADOOP-18321.Fix when to read an additional record from a BZip2 text file split Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> and Reviewed by Akira Ajisaka.

HADOOP-18321.Fix when to read an additional record from a BZip2 text file split

9261ada

Fixed Stylecheck

7a6efbb

saswata-dutta approved these changes Jul 1, 2022

aajisaka reviewed Jul 5, 2022

removing extra method

a0f755e

ashutoshcipher requested a review from aajisaka July 5, 2022 23:21

PrabhuJoseph merged commit a432925 into apache:trunk Jul 6, 2022

steveloughran reviewed Jul 7, 2022

ashutoshcipher deleted the HADOOP-18321 branch August 7, 2022 22:18
