This repository has been archived by the owner on Feb 16, 2024. It is now read-only.

[BAHIR-38] clean Ivy cache during Maven install phase #14



@ckadner ckadner commented Jul 27, 2016

BAHIR-38: Spark-submit does not use latest locally installed Bahir packages

When we install the org.apache.bahir jars into the local Maven repository, we also need to clean the previous jar files from the Ivy cache (~/.ivy2/cache/org.apache.bahir/*) so that spark-submit --packages ... picks up the new version from the local Maven repository.

pom.xml:

  <build>
    <plugins>
      ...
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-clean-plugin</artifactId>
        <executions>
          <execution>
            <id>cleanup-ivy-cache</id>
            <phase>install</phase>
            <goals>
              <goal>clean</goal>
            </goals>
            <configuration>
              <followSymLinks>false</followSymLinks>
              <excludeDefaultDirectories>true</excludeDefaultDirectories>
              <filesets>
                <fileset>
                  <directory>${user.home}/.ivy2/cache/${project.groupId}/${project.artifactId}</directory>
                  <includes>
                    <include>*-${project.version}.*</include>
                    <include>jars/${project.artifactId}-${project.version}.jar</include>
                  </includes>
                </fileset>
              </filesets>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  ...
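
For readers who want to reproduce the cleanup outside of Maven, the fileset above can be approximated by a small shell function. This is only a sketch under stated assumptions: the function name clean_ivy_cache and its argument order are hypothetical and not part of this PR, and it assumes bash plus a find that supports -maxdepth (GNU and BSD find both do).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR) that mirrors the <fileset> above.
# Arguments: <cache-root> <groupId> <artifactId> <version>
clean_ivy_cache() {
  local cache_root="$1" group_id="$2" artifact_id="$3" version="$4"
  local dir="${cache_root}/${group_id}/${artifact_id}"

  # Nothing to do if the artifact was never resolved through Ivy.
  [ -d "$dir" ] || return 0

  # Counterpart of <include>*-${project.version}.*</include>:
  # Ivy descriptors and metadata directly inside the artifact directory.
  find "$dir" -maxdepth 1 -type f -name "*-${version}.*" -exec rm -f {} +

  # Counterpart of <include>jars/${project.artifactId}-${project.version}.jar</include>.
  rm -f "${dir}/jars/${artifact_id}-${version}.jar"
}
```

A call such as clean_ivy_cache "$HOME/.ivy2/cache" org.apache.bahir spark-streaming-akka_2.11 2.0.0-SNAPSHOT would target the same files as the plugin execution. Like the fileset, it leaves the jars directory itself and files of other versions in place.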

Test

(1) Start with an empty Ivy cache:

[bahir]$ rm -rf ~/.ivy2/cache/org.apache.bahir/*
[bahir]$ tree ~/.ivy2/cache/org.apache.bahir/ 

0 directories, 0 files

(2) Run an install build:

[bahir]$ mvn clean install -DskipTests  2>&1  | grep -E "ERROR|SUCCESS|FAIL"

[INFO] Apache Bahir - Parent POM .......................... SUCCESS [  3.472 s]
[INFO] Apache Bahir - Spark Streaming Akka ................ SUCCESS [ 11.806 s]
[INFO] Apache Bahir - Spark Streaming MQTT ................ SUCCESS [ 13.265 s]
[INFO] Apache Bahir - Spark Streaming Twitter ............. SUCCESS [  7.532 s]
[INFO] Apache Bahir - Spark Streaming ZeroMQ .............. SUCCESS [  7.366 s]
[INFO] BUILD SUCCESS

(3) Run spark-submit with a Bahir package (e.g. streaming-akka with the MQTT wordcount example; this is a no-op, so the ImportError at the end is expected):

[bahir]$ ${SPARK_HOME}/bin/spark-submit \
    --packages org.apache.bahir:spark-streaming-akka_2.11:2.0.0-SNAPSHOT \
    streaming-mqtt/examples/src/main/python/streaming/mqtt_wordcount.py

Ivy Default Cache set to: ~/.ivy2/cache
The jars for the packages stored in: ~/.ivy2/jars
...
org.apache.bahir#spark-streaming-akka_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    ...
    found org.apache.bahir#spark-streaming-akka_2.11;2.0.0-SNAPSHOT in local-m2-cache
🔴downloading file:~/.m2/repository/org/apache/bahir/spark-streaming-akka_2.11/2.0.0-SNAPSHOT/spark-streaming-akka_2.11-2.0.0-SNAPSHOT.jar ...
    [SUCCESSFUL ] org.apache.bahir#spark-streaming-akka_2.11;2.0.0-SNAPSHOT!spark-streaming-akka_2.11.jar (2ms)
...

ImportError: No module named mqtt

(4) Check the Ivy cache:

[bahir]$ tree ~/.ivy2/cache/org.apache.bahir/

~/.ivy2/cache/org.apache.bahir/
├── bahir-parent_2.11
│   ├── ivy-2.0.0-SNAPSHOT.xml
│   ├── ivy-2.0.0-SNAPSHOT.xml.original
│   └── ivydata-2.0.0-SNAPSHOT.properties
└── spark-streaming-akka_2.11
    ├── ivy-2.0.0-SNAPSHOT.xml
    ├── ivy-2.0.0-SNAPSHOT.xml.original
    ├── ivydata-2.0.0-SNAPSHOT.properties
    └── jars
        └── spark-streaming-akka_2.11-2.0.0-SNAPSHOT.jar🔴

3 directories, 7 files

(5) Run an install build for the streaming-akka:

[bahir]$ mvn install -DskipTests -pl streaming-akka  2>&1 | \
    grep -E "ERROR|SUCCESS|FAIL|Deleting"

[INFO] 🔴 Deleting ~/.ivy2/cache/org.apache.bahir/spark-streaming-akka_2.11 (includes = [*-2.0.0-SNAPSHOT.*, jars/spark-streaming-akka_2.11-2.0.0-SNAPSHOT.jar], excludes = [])
[INFO] BUILD SUCCESS

(6) Check the Ivy cache again. All files related to streaming-akka are gone; only the empty jars directory remains:

[bahir]$ tree ~/.ivy2/cache/org.apache.bahir/

~/.ivy2/cache/org.apache.bahir/
├── bahir-parent_2.11
│   ├── ivy-2.0.0-SNAPSHOT.xml
│   ├── ivy-2.0.0-SNAPSHOT.xml.original
│   └── ivydata-2.0.0-SNAPSHOT.properties
└── spark-streaming-akka_2.11
    └── jars 🔴

3 directories, 3 files
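
The manual tree inspection in steps (4) and (6) could also be scripted. The following is a minimal sketch, assuming bash; the function name count_cached_files is hypothetical and not part of this PR. It prints how many regular files in the Ivy cache belong to a given artifact version.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of this PR): count the files that Ivy has
# cached for one artifact version.
# Arguments: <cache-root> <groupId> <artifactId> <version>
count_cached_files() {
  local cache_root="$1" group_id="$2" artifact_id="$3" version="$4"
  local dir="${cache_root}/${group_id}/${artifact_id}"

  # An artifact directory that does not exist holds no cached files.
  [ -d "$dir" ] || { echo 0; return 0; }

  # Count regular files whose name contains the version; strip the padding
  # that some wc implementations add.
  find "$dir" -type f -name "*${version}*" | wc -l | tr -d ' '
}
```

Going by the tree listings above, count_cached_files "$HOME/.ivy2/cache" org.apache.bahir spark-streaming-akka_2.11 2.0.0-SNAPSHOT should report 4 after step (4) and 0 after step (6).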

@ckadner ckadner force-pushed the BAHIR-38_clean_Ivy_cache_during_mvn_install branch from e21e5d3 to 27ccaec Compare July 27, 2016 07:21
@deroneriksson
Member

LGTM.
Verified behavior on OS X. The spark-submit command with --packages places the jar and related metadata files in ~/.ivy2/cache/org.apache.bahir/spark-streaming-akka_2.11. Subsequently running a Maven project install deletes the jar and related metadata files from the Ivy cache. Note that this assumes .ivy2 is at ${user.home}, which should be good at least 99% of the time. Since this involves both Maven and Ivy, this is probably the best, cleanest solution.

@jodersky
Member

Agree with Deron, LGTM

@ckadner
Member Author

ckadner commented Jul 27, 2016

@deroneriksson @jodersky - Thanks for your review!

@lresende - Any comments, suggestions, objections?

@jodersky
Member

IIRC, @lresende is on vacation this week. Nevertheless, I think you can go ahead and merge this quite safely: nothing major is being changed, and the worst-case scenario, messing up the Ivy cache, is not that bad either.

@lresende
Member

lresende commented Aug 1, 2016

LGTM, merging soon

@asfgit asfgit closed this in 5e07303 Aug 1, 2016