[CARBONDATA-3300] Fixed ClassNotFoundException when using UDF in spark-shell #3132
Conversation
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2597/
LGTM, please make sure any other place we have used
@ravipesala ok
Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10856/
Force-pushed from 9bca36f to b25d8f2
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2827/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2598/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2828/
Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10857/
@@ -261,7 +262,8 @@ private void writeChunkInfoForOlderVersions(DataOutput output) throws IOException

   private DataChunk deserializeDataChunk(byte[] bytes) throws IOException {
     ByteArrayInputStream stream = new ByteArrayInputStream(bytes);
-    ObjectInputStream inputStream = new ObjectInputStream(stream);
+    ObjectInputStream inputStream =
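The second added line of this hunk is cut off on the page. A minimal sketch of what the patched method plausibly looks like, assuming the ClassLoaderObjectInputStream from Apache Commons IO and the thread context class loader (the exact loader argument is not visible in the hunk; DataChunk is the CarbonData type from the surrounding code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;

import org.apache.commons.io.input.ClassLoaderObjectInputStream;

  // Sketch of the patched method, not the verbatim merged code.
  private DataChunk deserializeDataChunk(byte[] bytes) throws IOException {
    ByteArrayInputStream stream = new ByteArrayInputStream(bytes);
    // Resolve classes against the thread context class loader
    // (TranslatingClassLoader in spark-shell) rather than the loader
    // returned by sun.misc.VM.latestUserDefinedLoader().
    ObjectInputStream inputStream =
        new ClassLoaderObjectInputStream(
            Thread.currentThread().getContextClassLoader(), stream);
    try {
      return (DataChunk) inputStream.readObject();
    } catch (ClassNotFoundException e) {
      throw new IOException(e);
    }
  }
```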
It seems this method does not call "inputStream.close();" in a finally block; can you add that protection in this PR?
It does not open a file that must be closed; the stream is backed by the byte[], so closing it in a finally block is not needed.
@@ -1536,7 +1537,8 @@ public static ValueEncoderMeta deserializeEncoderMetaV2(byte[] encoderMeta) {
     ValueEncoderMeta meta = null;
     try {
       aos = new ByteArrayInputStream(encoderMeta);
-      objStream = new ObjectInputStream(aos);
+      objStream =
"CarbonUtil.closeStreams(objStream);" cann't be called when not IOException
No other exception handling is needed because the stream throws only ClassNotFoundException/IOException.
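For reference, a hedged sketch of the whole method with the reviewer's suggestion applied, i.e. the stream closed in a finally block so that CarbonUtil.closeStreams runs on the success path as well as the exception path. ValueEncoderMeta, CarbonUtil.closeStreams, and ClassLoaderObjectInputStream are taken from the surrounding discussion; the exact patched body is not visible on this page and the exception handling here is simplified:

```java
public static ValueEncoderMeta deserializeEncoderMetaV2(byte[] encoderMeta) {
  ByteArrayInputStream aos = null;
  ObjectInputStream objStream = null;
  ValueEncoderMeta meta = null;
  try {
    aos = new ByteArrayInputStream(encoderMeta);
    // Same fix as in the first hunk: resolve classes via the
    // thread context class loader.
    objStream = new ClassLoaderObjectInputStream(
        Thread.currentThread().getContextClassLoader(), aos);
    meta = (ValueEncoderMeta) objStream.readObject();
  } catch (ClassNotFoundException | IOException e) {
    throw new RuntimeException(e);
  } finally {
    // finally guarantees the close runs whether or not readObject threw.
    CarbonUtil.closeStreams(objStream);
  }
  return meta;
}
```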
retest this please
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2658/
Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10918/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2888/
@ravipesala Please review and merge
retest this please
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2680/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2908/
Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10939/
LGTM
[CARBONDATA-3300] Fixed ClassNotFoundException when using UDF in spark-shell

Analysis: When spark-shell is run, a Scala interpreter session is started, which is the main thread for that shell. This session uses TranslatingClassLoader, so the UDF that is defined (in the stack trace) is loaded into TranslatingClassLoader. When deserialization happens, an ObjectInputStream is created and the application tries to read the object; ObjectInputStream uses a native method call (sun.misc.VM.latestUserDefinedLoader()) to determine the ClassLoader that will be used to load the class. This native method returns URLClassLoader, which is the parent of the TranslatingClassLoader where the class was loaded. Because of this, ClassNotFoundException is thrown.

Class Loader Hierarchy: ExtClassLoader (head) -> AppClassLoader -> URLClassLoader -> TranslatingClassLoader

This looks like a bug in the Java ObjectInputStream implementation, as suggested by the following post: https://stackoverflow.com/questions/1771679/difference-between-threads-context-class-loader-and-normal-classloader

Operation   | Thread   | Thread ClassLoader | ClassLoader
Register    | Main     | Translating        | Translating
Serialize   | Main     | Translating        | Translating
Deserialize | Thread-1 | Translating        | URLClassLoader

Solution: Use ClassLoaderObjectInputStream to specify the class loader that should be used to load the class.

This closes #3132
Analysis:
When spark-shell is run, a Scala interpreter session is started, which is the main thread for that shell. This session uses TranslatingClassLoader, so the UDF that is defined ($anonfun$1 in the stack trace) is loaded into TranslatingClassLoader.
When deserialization happens, an ObjectInputStream is created and the application tries to read the object; ObjectInputStream uses a native method call (sun.misc.VM.latestUserDefinedLoader()) to determine the ClassLoader that will be used to load the class. This native method returns URLClassLoader, which is the parent of the TranslatingClassLoader where the class was loaded.
Because of this, ClassNotFoundException is thrown.
Class Loader Hierarchy
ExtClassLoader(head) -> AppClassLoader -> URLClassLoader -> TranslatingClassLoader
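This hierarchy can be checked empirically by walking the parent chain of the current thread's context class loader; a small standalone snippet (the loader class names in the output depend on the JVM and on whether the loop is pasted, in Scala form, into spark-shell):

```java
public class LoaderChain {
  public static void main(String[] args) {
    // Walk the chain child-first. Inside spark-shell the first entry is
    // the REPL's TranslatingClassLoader, and a URLClassLoader appears
    // among its parents.
    ClassLoader cl = Thread.currentThread().getContextClassLoader();
    while (cl != null) {
      System.out.println(cl.getClass().getName());
      cl = cl.getParent();
    }
  }
}
```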
This looks like a bug in the Java ObjectInputStream implementation, as suggested by the following post:
https://stackoverflow.com/questions/1771679/difference-between-threads-context-class-loader-and-normal-classloader
Solution:
Use ClassLoaderObjectInputStream to specify the class loader that should be used to load the class.
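Assuming the Apache Commons IO ClassLoaderObjectInputStream (the class name matches), the mechanism behind the fix is an override of ObjectInputStream.resolveClass, so that class lookup goes through an explicitly supplied loader instead of the one chosen by latestUserDefinedLoader(). A simplified, hypothetical sketch of the idea:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

// Simplified illustration; the real Commons IO class has more features.
public class ExplicitLoaderObjectInputStream extends ObjectInputStream {
  private final ClassLoader classLoader;

  public ExplicitLoaderObjectInputStream(ClassLoader classLoader, InputStream in)
      throws IOException {
    super(in);
    this.classLoader = classLoader;
  }

  @Override
  protected Class<?> resolveClass(ObjectStreamClass desc)
      throws IOException, ClassNotFoundException {
    try {
      // Try the supplied loader first (e.g. the spark-shell
      // TranslatingClassLoader that holds the UDF class) ...
      return Class.forName(desc.getName(), false, classLoader);
    } catch (ClassNotFoundException e) {
      // ... then fall back to the default JDK resolution.
      return super.resolveClass(desc);
    }
  }
}
```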
Be sure to complete all items in the following checklist to help us incorporate
your contribution quickly and easily:
Any interfaces changed?
Any backward compatibility impacted?
Document update required?
Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.