Update Apache Spark to 2.3.0; resolves #218 #219
- Update tests to use workaround for SPARK-2243
- Comment out ExtractGraph test as per https://github.com/archivesunleashed/aut/pull/204/files#diff-4541b9834513985c360b64093fd45073
- Align Hadoop version with Apache Spark pom.xml https://github.com/apache/spark/blob/branch-2.3/pom.xml#L120
Codecov Report
@@            Coverage Diff            @@
##           master    #219     +/-   ##
=========================================
- Coverage   66.16%   61.2%   -4.97%
=========================================
  Files          34      34
  Lines         665     665
  Branches      124     124
=========================================
- Hits          440     407     -33
- Misses        184     217     +33
  Partials       41      41
That's the error you were asking about at the datathon, the one that requires this hack.
lgtm
Tested on a tranche of about 200GB WARCs and a 17GB Twitter JSON file - all worked nicely.
@ianmilligan1 thanks!! We should be good to go after TravisCI turns green one last time.
GitHub issue(s): #218
What does this Pull Request do?
How should this be tested?
TravisCI should take care of things, but a smoke test with a directory of WARCs and some basic tweet analysis would be good.
@lintool @TitusAn