
[CARBONDATA-3300] Fixed ClassNotFoundException when using UDF in spark-shell #3132

Closed
wants to merge 1 commit

Conversation

kunal642
Contributor

kunal642 commented Feb 22, 2019

Analysis:
When spark-shell is run, a Scala interpreter session is started on the main thread of the shell. This session uses a TranslatingClassLoader, so any UDF defined in the shell ($anonfun$1 in the stack trace) is loaded by the TranslatingClassLoader.

When deserialization happens, an ObjectInputStream is created and the application tries to read the object. To determine the ClassLoader that will be used to load the class, ObjectInputStream uses a native method call (sun.misc.VM.latestUserDefinedLoader()). This native method returns the URLClassLoader, which is the parent of the TranslatingClassLoader that actually loaded the class. Because of this, a ClassNotFoundException is thrown.
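
To illustrate the mechanism, a minimal sketch (not CarbonData code, the class name is made up): the default ObjectInputStream.resolveClass() resolves classes via the latest user-defined loader on the call stack, whereas a subclass that is handed an explicit loader can resolve against that loader first.

import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

// Sketch only: an ObjectInputStream that resolves classes against an explicit
// ClassLoader instead of the loader picked by the default resolveClass(),
// which falls back to sun.misc.VM.latestUserDefinedLoader().
public class PinnedLoaderObjectInputStream extends ObjectInputStream {

  private final ClassLoader loader;

  public PinnedLoaderObjectInputStream(ClassLoader loader, InputStream in) throws IOException {
    super(in);
    this.loader = loader;
  }

  @Override
  protected Class<?> resolveClass(ObjectStreamClass desc)
      throws IOException, ClassNotFoundException {
    // Look the class up in the supplied loader first (e.g. the spark-shell
    // TranslatingClassLoader), then fall back to the default behaviour.
    try {
      return Class.forName(desc.getName(), false, loader);
    } catch (ClassNotFoundException e) {
      return super.resolveClass(desc);
    }
  }
}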

Class Loader Hierarchy

ExtClassLoader(head) -> AppClassLoader -> URLClassLoader -> TranslatingClassLoader
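
This chain can be confirmed with a throwaway diagnostic that walks getParent() from the loader of any class defined in the session (illustrative only, not part of this patch):

// Print the parent chain of the loader that defined this class.
public class LoaderChain {
  public static void main(String[] args) {
    ClassLoader loader = LoaderChain.class.getClassLoader();
    while (loader != null) {
      System.out.println(loader.getClass().getName());
      loader = loader.getParent();
    }
    System.out.println("<bootstrap>");
  }
}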

This looks like a bug in the Java ObjectInputStream implementation, as suggested by the following post:
https://stackoverflow.com/questions/1771679/difference-between-threads-context-class-loader-and-normal-classloader

Operation     Thread      Thread ClassLoader        ClassLoader used to load the class
Register      Main        TranslatingClassLoader    TranslatingClassLoader
Serialize     Main        TranslatingClassLoader    TranslatingClassLoader
Deserialize   Thread-1    TranslatingClassLoader    URLClassLoader

Solution:
Use ClassLoaderObjectInputStream to specify the class loader that should be used to load the class.
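
A sketch of the intended pattern, assuming ClassLoaderObjectInputStream from Apache Commons IO and that the current thread's context class loader is the interpreter's TranslatingClassLoader when running under spark-shell:

import java.io.ByteArrayInputStream;
import java.io.IOException;

import org.apache.commons.io.input.ClassLoaderObjectInputStream;

public final class DeserializeWithLoader {

  // Sketch: deserialize while resolving classes against the current thread's
  // context class loader instead of the loader ObjectInputStream picks itself.
  public static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
    try (ClassLoaderObjectInputStream in = new ClassLoaderObjectInputStream(
        Thread.currentThread().getContextClassLoader(), new ByteArrayInputStream(bytes))) {
      return in.readObject();
    }
  }
}

In spark-shell the context class loader is the TranslatingClassLoader that loaded the UDF, which is why the lookup then succeeds.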

Be sure to complete all of the following checklist items to help us incorporate your contribution quickly and easily:

  • Any interfaces changed?

  • Any backward compatibility impacted?

  • Document update required?

  • Testing done
    Please provide details on
    - Whether new unit test cases have been added or why no new tests are required?
    - How it is tested? Please attach test report.
    - Is it a performance related change? Please attach the performance test report.
    - Any additional information to help reviewers in testing this change.

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2597/

@ravipesala
Contributor

LGTM. Please check whether we have used ObjectInputStream in any other place apart from this one, and change those places in the same way.

@kunal642
Contributor Author

@ravipesala ok

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10856/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2827/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2598/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2828/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10857/

@@ -261,7 +262,8 @@ private void writeChunkInfoForOlderVersions(DataOutput output) throws IOException
   private DataChunk deserializeDataChunk(byte[] bytes) throws IOException {
     ByteArrayInputStream stream = new ByteArrayInputStream(bytes);
-    ObjectInputStream inputStream = new ObjectInputStream(stream);
+    ObjectInputStream inputStream =
Contributor

It seems this method does not call "inputStream.close();" inside a finally block; can you add that protection in this PR?

Contributor

It does not open a file that must be closed; the stream just opens on the byte[], so there is no need to close it in a finally block.
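
For context, roughly what the patched method looks like when only the stream construction changes (an approximation, not the exact committed code; imports are omitted, DataChunk is CarbonData's class, ClassLoaderObjectInputStream is from Apache Commons IO, and the exception handling here is simplified):

private DataChunk deserializeDataChunk(byte[] bytes) throws IOException {
  ByteArrayInputStream stream = new ByteArrayInputStream(bytes);
  // Resolve classes against the thread context class loader (the spark-shell
  // TranslatingClassLoader) instead of the loader ObjectInputStream would pick.
  ObjectInputStream inputStream =
      new ClassLoaderObjectInputStream(Thread.currentThread().getContextClassLoader(), stream);
  DataChunk dataChunk;
  try {
    dataChunk = (DataChunk) inputStream.readObject();
  } catch (ClassNotFoundException e) {
    throw new IOException(e);
  }
  // Byte-array backed stream, so close() is a no-op; see the comment thread above.
  inputStream.close();
  return dataChunk;
}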

@@ -1536,7 +1537,8 @@ public static ValueEncoderMeta deserializeEncoderMetaV2(byte[] encoderMeta) {
     ValueEncoderMeta meta = null;
     try {
       aos = new ByteArrayInputStream(encoderMeta);
-      objStream = new ObjectInputStream(aos);
+      objStream =
Contributor

"CarbonUtil.closeStreams(objStream);" cann't be called when not IOException

Contributor Author

No need for other exception handling, because the stream throws only ClassNotFoundException/IOException.
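
For reference, a sketch of the finally-based variant being discussed (not the committed code; it assumes CarbonUtil.closeStreams tolerates null streams, omits imports, and simplifies the original exception handling):

public static ValueEncoderMeta deserializeEncoderMetaV2(byte[] encoderMeta) {
  ByteArrayInputStream aos = null;
  ObjectInputStream objStream = null;
  ValueEncoderMeta meta = null;
  try {
    aos = new ByteArrayInputStream(encoderMeta);
    objStream =
        new ClassLoaderObjectInputStream(Thread.currentThread().getContextClassLoader(), aos);
    meta = (ValueEncoderMeta) objStream.readObject();
  } catch (ClassNotFoundException | IOException e) {
    // Collapsed for brevity; the real method handles these separately.
    throw new RuntimeException(e);
  } finally {
    // Closes the byte[]-backed streams on every path, not only on IOException.
    CarbonUtil.closeStreams(objStream, aos);
  }
  return meta;
}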

@kunal642
Contributor Author

kunal642 commented Mar 7, 2019

retest this please

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2658/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10918/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2888/

@kunal642
Contributor Author

kunal642 commented Mar 8, 2019

@ravipesala Please review and merge

@ravipesala
Contributor

ravipesala commented Mar 12, 2019

retest this please

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2680/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2908/

@CarbonDataQA

Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10939/

@ravipesala
Contributor

LGTM

@asfgit asfgit closed this in dda9c4d Mar 12, 2019
asfgit pushed a commit that referenced this pull request Apr 2, 2019
…k-shell

This closes #3132
qiuchenjian pushed a commit to qiuchenjian/carbondata that referenced this pull request Jun 14, 2019
…k-shell

This closes apache#3132