Update Apache Spark to 2.3.0; resolves #218 #219

ruebot · 2018-05-12T23:44:53Z

GitHub issue(s): #218

What does this Pull Request do?

- Update tests to use workaround for SPARK-2243
- Comment out ExtractGraph test as per https://github.com/archivesunleashed/aut/pull/204/files#diff-4541b9834513985c360b64093fd45073
- Align Hadoop version with Apache Spark pom.xml https://github.com/apache/spark/blob/branch-2.3/pom.xml#L120

How should this be tested?

TravisCI should take care of things, but a smoke test with a directory of WARCs and some basic tweet analysis would be good.

@lintool @TitusAn

codecov · 2018-05-12T23:59:04Z

Codecov Report

Merging #219 into master will decrease coverage by 4.96%.
The diff coverage is n/a.

@@            Coverage Diff            @@
##           master    #219      +/-   ##
=========================================
- Coverage   66.16%   61.2%   -4.97%     
=========================================
  Files          34      34              
  Lines         665     665              
  Branches      124     124              
=========================================
- Hits          440     407      -33     
- Misses        184     217      +33     
  Partials       41      41

Impacted Files	Coverage Δ
.../scala/io/archivesunleashed/app/ExtractGraph.scala	`0% <0%> (-94.29%)`	⬇️

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b8a8a97...b6eb72c. Read the comment docs.

ruebot · 2018-05-13T00:31:01Z

io.archivesunleashed.WarcTest: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:(..)

That's the error you were asking about at the datathon. It requires this hack, conf.set("spark.driver.allowMultipleContexts", "true");, in all the tests. There's probably a much better way to take care of it, which we can implement later.
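
For reference, a minimal sketch of how that workaround might look in a test's setup, assuming Spark 2.3 (spark-core) is on the classpath. The object name, `testConf` helper, app name, and `local[2]` master are hypothetical and only illustrate the idea; they are not taken from the aut test suite.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch, not the actual aut test code.
object MultipleContextsSketch {
  // Builds a SparkConf for a local test run. The app name and master URL
  // are illustrative assumptions.
  def testConf(appName: String): SparkConf =
    new SparkConf()
      .setMaster("local[2]")
      .setAppName(appName)
      // Workaround for SPARK-2243: tolerate a second SparkContext in the same JVM.
      .set("spark.driver.allowMultipleContexts", "true")

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(testConf("example-test"))
    try {
      // Trivial job to confirm the context is usable.
      val total = sc.parallelize(1 to 10).sum()
      println(total)
    } finally {
      sc.stop() // Release the context so later tests start clean.
    }
  }
}
```

Stopping the context in a `finally` block may reduce how often the flag is needed at all, since the error only appears while an earlier context is still running.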

lintool

lgtm

ianmilligan1

Tested on a tranche of about 200GB WARCs and a 17GB Twitter JSON file - all worked nicely.

ruebot · 2018-05-14T16:35:09Z

@ianmilligan1 thanks!! We should be good to go after TravisCI turns green one last time.

ruebot requested a review from lintool May 12, 2018 23:44

lintool approved these changes May 13, 2018

ianmilligan1 approved these changes May 14, 2018

Merge branch 'master' of github.com:archivesunleashed/aut into issue-218

b6eb72c

ianmilligan1 merged commit fc8f4bf into master May 14, 2018

ianmilligan1 deleted the issue-218 branch May 14, 2018 16:49

ruebot mentioned this pull request May 15, 2018

Extract Image Links DF API + Test #221

Merged
