ruebot released this Aug 21, 2019

aut-0.18.0 (2019-08-21)

Full Changelog

Implemented enhancements:

  • Add method for unknown extensions in binary extractions #343
  • Use Tika's detected MIME type instead of ArchiveRecord getMimeType? #342
  • Add filter/keep by http status to RecordLoader class #315
  • Audio binary object extraction #307
  • Video binary object extraction #306
  • Powerpoint binary object extraction #305
  • Doc binary object extraction #304
  • Spreadsheet binary object extraction #303
  • PDF binary object extraction #302
  • Test aut with Apache Spark 2.4.0 #295
  • Replace hashing of unique ids with .zipWithUniqueId() #243
  • Integration of neural network models for image analysis #240
  • More complete Twitter Ingestion #194
  • Image Search Functionality #165
  • feature request: log when loadArchives opens and closes warc files in a dir #156

Fixed bugs:

  • DataFrame commands throwing java.lang.NullPointerException on example data #320
  • Class issues when using aut-0.17.0-fatjar.jar #313
  • Image extraction does not scale with number of WARCs #298
  • ExtractDomain mistakenly checks source first then url #277
  • Improve ExtractDomain to Better Isolate Domains #269

Closed issues:

  • Inconsistency in ArchiveRecord.getContentBytes #334
  • Rationalize computeHash and ComputeMD5 #333
  • Test additional Java versions with TravisCI #324
  • Remove Twitter/tweet analysis #322
  • Trouble testing s3 connectivity #319
  • Depfu Error: No dependency files found #309
  • Strategy to deal with conflict between application and Spark distribution dependencies #308
  • SaveImageTest.scala should delete saved image file #299
  • Remove Deprecated ExtractGraph.scala file for next release. #291
  • DetectLanguage.scala: class LanguageIdentifier in package language is deprecated #286
  • CVE-2017-7525 -- com.fasterxml.jackson.core:jackson-databind #279
  • Maven build warning during release #273
  • Improve DataFrameLoader.scala test coverage #265
  • Improve package.scala test coverage #263
  • Discussion: Idiom for loading DataFrames #231
  • DataFrame field names: open thread #229
  • DataFrame performance comparison: Scala vs. Python #215
  • TweetUtilsTest.scala doesn't test Spark, only underlying json4s library #206
  • feature request: ArchiveRecord.archiveFile #164
  • feature request: possibility to query about the progress #162
  • Update to Apache Tika 1.19.1; security vulnerabilities in 1.12 #131
  • Create tests for ExtractGraph.scala #49
  • Setup Victims #5

Merged pull requests:
