
Conversation

@vanzin (Contributor) commented Jul 6, 2015

Also, add support for the *-provided profiles. This avoids repackaging
things that are already in the Spark assembly, or, in the case of the
*-provided profiles, are provided by the distribution.

The flume-ng-auth dependency was also excluded since it's not really
used by Spark.

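For illustration, a minimal sketch of what such an exclusion looks like in a Maven POM. Which Flume artifact carries the exclusion, and the version property, are assumptions based on this discussion, not a copy of the actual patch:

    <!-- Sketch only: exclude flume-ng-auth so it never lands in the assembly.
         Anchoring it on flume-ng-sdk and using ${flume.version} are assumptions. -->
    <dependency>
      <groupId>org.apache.flume</groupId>
      <artifactId>flume-ng-sdk</artifactId>
      <version>${flume.version}</version>
      <exclusions>
        <exclusion>
          <groupId>org.apache.flume</groupId>
          <artifactId>flume-ng-auth</artifactId>
        </exclusion>
      </exclusions>
    </dependency>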
@vanzin (Contributor Author) commented Jul 6, 2015

The assembly came down from ~80 MB to ~2.5 MB. @harishreedharan tells me the flume-ng-auth dependency is not needed, so I excluded it. I also fixed some indentation issues in the assembly pom.

Also tested with the flume-provided profile enabled, in which case the assembly is ~170 kB.
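A *-provided profile of this kind is typically implemented with a scope property that the profile overrides. A minimal sketch, assuming a property named flume.deps.scope (the property name is an assumption, modeled on how such profiles are commonly wired up):

    <!-- Sketch: Flume dependencies reference a scope property, and the
         flume-provided profile flips it from compile to provided.
         "flume.deps.scope" is an assumed property name. -->
    <properties>
      <flume.deps.scope>compile</flume.deps.scope>
    </properties>

    <profiles>
      <profile>
        <id>flume-provided</id>
        <properties>
          <flume.deps.scope>provided</flume.deps.scope>
        </properties>
      </profile>
    </profiles>

    <!-- Each Flume dependency then declares <scope>${flume.deps.scope}</scope>,
         so enabling the profile keeps those jars out of the assembly. -->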

@SparkQA commented Jul 7, 2015

Test build #36621 has finished for PR 7247 at commit c962082.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member, review comment on pom.xml (outdated):

I don't really disagree with this, since there is only one usage of Flume in the project now. But the exclusions are specific to the flume-assembly module's needs, not to consumers of Flume in general across the project, right? Can this be managed down in the module, alongside the other changes for the same purpose?

Contributor Author:

Yes, mostly they're for the flume-assembly build. I don't mind moving them out.

Member:

Does this exclusion belong in the child POM too? I had actually thought all of them should potentially go, unless we systematically want to exclude some deps from all uses of Flume across the project, of which there's really only one now anyway. That is, if the reason for the exclusion is specific to one child module, they can live there only. It's up to your better judgment, IMHO, so LGTM either way.

Contributor Author:

According to Hari this dependency is not used in Spark, so it seems better to exclude it everywhere: if something that uses it is ever added, the build breaks (instead of silently generating a broken assembly without the needed classes).

@SparkQA commented Jul 7, 2015

Test build #36693 has finished for PR 7247 at commit 298a7d5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin (Contributor Author) commented Jul 9, 2015

Ping?

@srowen (Member) commented Jul 9, 2015

LGTM, any objections?

@tdas (Contributor) commented Jul 9, 2015

What is the flume-provided profile for? If Flume is already provided by the runtime environment, then the only thing that needs to be added is spark-streaming-flume, which can be included directly as its own JAR. Why have a separate profile in the assembly?

@vanzin (Contributor Author) commented Jul 9, 2015

Why have a separate profile in the assembly?

Because that would mean the pydocs in the code would need to change depending on which profiles you enable, which is ugly. This way, while there is some redundancy, at least the user interface (or at least the user docs) remains consistent.

EDIT: I'm referring to this in flume.py:

2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
   Group Id = org.apache.spark, Artifact Id = spark-streaming-flume-assembly, Version = %s.
   Then, include the jar in the spark-submit command as

   $ bin/spark-submit --jars <spark-streaming-flume-assembly.jar> ...

@tdas (Contributor) commented Jul 9, 2015

Oh, it's easy to add to the pydoc that if Flume is already present you can just download spark-streaming-flume.jar. That is much better than having another Maven profile to manage and reason about. And I don't even know who would use that profile to compile. If someone is smart enough to use that profile for some purpose, then it's fair to assume that he/she will be knowledgeable enough to know that just including spark-streaming-flume.jar is sufficient, instead of including the spark-streaming-flume-assembly.jar.

@vanzin (Contributor Author) commented Jul 9, 2015

That Maven profile already exists; we use it when packaging CDH. That's what all the *-provided profiles are for: distributions use them when they already provide the dependencies.
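For example, a distribution build might enable the profile like this. This is a sketch: the module path and the build/mvn wrapper are assumptions, though -Pflume-provided is the profile named in this discussion:

    # Sketch: build the Flume assembly with Flume jars treated as provided,
    # so they are left out of the resulting assembly jar.
    build/mvn -Pflume-provided -pl external/flume-assembly -am package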

@vanzin (Contributor Author) commented Jul 9, 2015

if Flume is already present you can just download spark-streaming-flume.jar

On top of my previous comment, that's also not enough. You also need spark-streaming-flume-sink.jar, which is included in the assembly... which is why having yet another set of instructions is confusing to users.
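In other words, without the assembly the submit command would need both artifacts, along these lines (the Scala suffix and versions are illustrative assumptions):

    # Sketch: what users would have to pass without the single assembly jar.
    $ bin/spark-submit \
        --jars spark-streaming-flume_2.10-1.5.0.jar,spark-streaming-flume-sink_2.10-1.5.0.jar \
        ...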

@tdas (Contributor) commented Jul 10, 2015

I understand it now. All right, LGTM.

@asfgit asfgit closed this in 0e78e40 Jul 10, 2015
@vanzin vanzin deleted the SPARK-8852 branch July 30, 2015 00:07