Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Scala jar contains the scala standard library #13528

Closed
larroy opened this issue Dec 4, 2018 · 10 comments
Closed

Scala jar contains the scala standard library #13528

larroy opened this issue Dec 4, 2018 · 10 comments
Labels

Comments

@larroy
Copy link
Contributor

larroy commented Dec 4, 2018

Description

Created scala package with make && make scalapkg

Following command shows that it contains the scala standard library in addition to mxnet classes.

jar tvf ./scala-package/assembly/osx-x86_64-cpu/target/mxnet-full_2.11-osx-x86_64-cpu-1.4.0-SNAPSHOT.jar

Environment info (Required)

f2dcd7c

@larroy
Copy link
Contributor Author

larroy commented Dec 4, 2018

@nswamy @lanking520

@lanking520
Copy link
Member

@larroy Hi Pedro, indeed. It contains all Scala needs. You can run your code by simply starting with

java -cp

@larroy
Copy link
Contributor Author

larroy commented Dec 4, 2018

This doesn't sound like a good idea. Libraries shouldn't include the scala standard library. You should use scala -cp to run , and not java -cp.

@lanking520
Copy link
Member

In fact, all Scala code are generated to .class file when we build the jar and I think it is JVM code. In that case, java -cp is consider as a super class of scala -cp. What do you think?

@larroy
Copy link
Contributor Author

larroy commented Dec 5, 2018

While I think is ok to provide a fat jar for people trying MXNet in the console as you point out, we should provide a standard version with just MXNet and the binary dynamic libraries inside, without standard scala library or other dependencies that can be pulled from maven.

Some of the comments I got from other engineers when I explained this issue were:

Definitely not. Maven artifacts should not be fat jars. The jar should only include classes and resources that are part of the enclosing project.
maven has a concept of "assembly" builds and repo can have those types of artifacts published, but that obvioulsy should be clearly documented or has designated classifier to warn consumers
In any case the build should not replace the default artifact with the output of an assembly build that pulls all dependencies, especially if this jar is then published to a Maven repository and can be imported by other Maven projects.

it may cause classpath issues and provided scala runtime, especially if they are different version

I don't want to irritate anyone with this issue because you guys have done great work putting the Scala package forward. But my feedback about this topic stands as it is if there's no new information provided.

I asked other senior engineers in Amazon about this issue and their feedback was the same as mine.
They also have extensive experience running big java / Scala projects. I have maintained Spark myself inside amazon for a few releases (it's a huge project) and worked in big Scala projects for Amazon which are now in production (SageMaker) and other companies like Nokia, so I think I know one thing or two about Scala and dependencies.

I suggest you guys do your due diligence about this topic and find out what's the best practice if my arguments are not convincing. If you need people to contact to gather more opinions I can suggest the Scala lists inside Amazon, external sources like stackoverflow or I can provide some names.

If you need more help or feedback from my side, let me know. Happy to help.

@lanking520
Copy link
Member

@larroy I think your explanation does make sense in this case. For Scala users essentially this is ok since they should already have Scala in their system. However, it would not be friendly to Java/Clojure users as it is the dependencies for them. Wrapping them up into a single Jar would reduce the dependencies they need as they can just import the package and it works.

Currently, the assembly wrapping the following things

  • libmxnet.so/jnilib Dependencies from the MXNet backend (80MB - 700 MB depends on CPU/GPU)
  • org.apache.mxnet compiled class files (11.7MB)
  • slf4j (1.6KB)
  • scala language (96.8MB)

After packing up to jar file:

  • CPU: 61MB
  • GPU: 479MB

@larroy
Copy link
Contributor Author

larroy commented Dec 5, 2018

That's why maven / SBT etc. have the concept of transitive dependencies:

https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Transitive_Dependencies

slf4j and scala language should not be in the package. Only libmxnet.so/jnilib Dependencies from the MXNet backend and libmxnet.so/jnilib Dependencies from the MXNet backend.

What you describe is considered a bad practice.

@lanking520
Copy link
Member

@larroy This problem are fixed now. We include them into the pom file instead of binding together. Please feel free to close this issue

@piyushghai
Copy link
Contributor

@larroy I'm suggesting to close this issue, since we've dealt with it now.
Please feel free to re-open it if closed in error.

@lanking520 Can you close this issue ?

@lanking520
Copy link
Member

Close it for now, please feel free to reopen if you are still seeing the same problems in the package

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants