-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-8852] [flume] Trim dependencies in flume assembly. #7247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Also, add support for the *-provided profiles. This avoids repackaging things that are already in the Spark assembly, or, in the case of the *-provided profiles, are provided by the distribution. The flume-ng-auth dependency was also excluded since it's not really used by Spark.
|
Assembly came down to ~ 2.5 MB from ~ 80 MB. @harishreedharan tells me the Also tested with the |
|
Test build #36621 has finished for PR 7247 at commit
|
pom.xml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really disagree with this, since there is only one usage of flume in the project now. But the exclusions are specific to the flume-assembly module's needs, not to consumers of Flume in general in the project right? can this be managed down in the module alongside the other changes for the same purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, mostly they're for the flume-assembly build. I don't mind moving them out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this exclusion belong in the child POM too? I had actually thought potentially all of them should go, unless we systematically want to exclude some deps from all uses of Flume across the project, of which there's really only one now anyway. That is, if the reason for the exclusion is specific to one child module, they can live there only. It's up to your better judgment IMHO so LGTM either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to Hari this dependency is not used in Spark, so it sounds better to exclude it everywhere so that if something is added that uses it, things break (instead of just generating a broken assembly without needed classes).
|
Test build #36693 has finished for PR 7247 at commit
|
|
Ping? |
|
LGTM, any objections? |
|
What is the flume provided field for? If flume is already provided at the runtime environment, then the only thing that needs to be add is spark-streaming-flume, which can be directly added as its own JAR. Why have a separate profile in assembly? |
Because that would mean that depending on the profiles you enable, the pydocs in the code need to be changed, which is ugly. This way, while you have some redundancy, at least the user interface (or at least the user docs) remain consistent. EDIT: I'm referring to this in flume.py: |
|
Oh its easy to add it in the pydoc that |
|
That maven profile already exists. We use it when packaging CDH. That's what all "*-provided" profiles are for - for distributions to use when they already provide the dependencies. |
On top of my previous comment, that's also not enough. You also need |
|
I understand it now. All right LGTM. |
Also, add support for the *-provided profiles. This avoids repackaging
things that are already in the Spark assembly, or, in the case of the
*-provided profiles, are provided by the distribution.
The flume-ng-auth dependency was also excluded since it's not really
used by Spark.