Add .getHttpStatus and .getFilename to ArchiveRecordImpl class #198 & #164 #292

greebie · 2018-11-22T20:12:05Z

GitHub issue(s): #198, #164

What does this Pull Request do?

This PR includes three changes to the ArchiveRecord class:

- The status code header for the crawl (via .getHttpStatus)
- The read-streamed filename (via .getFilename)
- Explicit testing of the ArchiveRecord class (this will likely not change coverage by much, if at all)

.getHttpStatus could be useful for studies that compare HTTP status codes across crawls.

Example:

  import io.archivesunleashed._
  import io.archivesunleashed.matchbox._
  import io.archivesunleashed.util._

  val links = RecordLoader.loadArchives("/Users/ryandeschamps/warcs/*gz", sc)
    .keepValidPages()
    .keepContent(Set("apple".r))
    .map(r => (r.getHttpStatus, (ExtractLinks(r.getUrl, r.getContentString))))
    .flatMap(r => r._2.map(f => (r._1, ExtractDomain(f._1).replaceAll("^\\s*www\\.", ""), 
      ExtractDomain(f._2).replaceAll("^\\s*www\\.", ""))))
    .filter(r => r._2 != "" && r._3 != "")
    .countItems()
    .filter(r => r._2 > 5).take(10)

Should produce something like

Array(((200,nanaimodailynews.com,nanaimodailynews.com),445785), ((200,nanaimodailynews.com,blackpress.ca),188676), ((200,nanaimodailynews.com,bclocalnews.com),111400), ... )

It might be more interesting to test this script without .keepValidPages() since most valid pages would likely return a 200. Where an ArchiveRecord fails to get a status code (via null or empty string) the record will return 000. This default value can be changed.
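The fallback behaviour described above can be sketched in plain Scala. This is a minimal illustration, not the code in this PR: the object `HttpStatusSketch`, the helper `statusFrom`, and its `default` parameter are hypothetical names, and the sketch assumes the raw HTTP header bytes are already in hand.

```scala
import java.nio.charset.StandardCharsets

object HttpStatusSketch {
  // Hypothetical default for records without a usable status line.
  val DefaultStatus = "000"

  /** Pulls the three-digit status code out of raw HTTP header bytes,
    * returning `default` when the bytes are null, empty, or malformed. */
  def statusFrom(headerBytes: Array[Byte], default: String = DefaultStatus): String = {
    if (headerBytes == null || headerBytes.isEmpty) {
      default
    } else {
      // Decode as US-ASCII and keep only the first line,
      // e.g. "HTTP/1.1 404 Not Found".
      val firstLine = new String(headerBytes, StandardCharsets.US_ASCII)
        .split("\r?\n", 2).head
      val parts = firstLine.trim.split("\\s+")
      val looksValid =
        parts.length >= 2 && parts(0).startsWith("HTTP/") && parts(1).matches("\\d{3}")
      if (looksValid) parts(1) else default
    }
  }
}
```

Passing a different `default` changes the placeholder, which mirrors the note that the 000 value can be changed.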

.getFilename returns the full path (which may be a URL) of the WARC that was consumed.

  import io.archivesunleashed._
  import io.archivesunleashed.matchbox._
  import io.archivesunleashed.util._
  import org.apache.commons.io.FilenameUtils

  val links = RecordLoader.loadArchives("/Users/ryandeschamps/warcs/*gz", sc)
        .keepValidPages()
        .keepContent(Set("apple".r))
        .map(r => (r.getFilename, (ExtractLinks(r.getUrl, r.getContentString))))
        .flatMap(r => r._2.map(f => (r._1, ExtractDomain(f._1).replaceAll("^\\s*www\\.", ""), 
          ExtractDomain(f._2).replaceAll("^\\s*www\\.", ""))))
        .filter(r => r._2 != "" && r._3 != "")
        .countItems()
        .filter(r => r._2 > 5).take(10)

should produce something like

links: Array[((String, String, String), Int)] = Array(((file:/Users/ryandeschamps/warcs/ARCHIVEIT-4656-CRAWL_SELECTED_SEEDS-JOB193391-20160127222913427-00000.warc.gz,nanaimodailynews.com,nanaimodailynews.com),439503), 
((file:/Users/ryandeschamps/warcs/ARCHIVEIT-4656-CRAWL_SELECTED_SEEDS-JOB193391-20160127222913427-00000.warc.gz,nanaimodailynews.com,blackpress.ca),186028), 
((file:/Users/ryandeschamps/warcs/ARCHIVEIT-4656-CRAWL_SELECTED_SEEDS-JOB193391-20160127222913427-00000.warc.gz,nanaimodailynews.com,bclocalnews.com),106107), 
((file:/Users/ryandeschamps/warcs/ARCHIVEIT-4656-CRAWL_SELECTED_SEEDS-JOB193391-20160127222913427-00000.warc.gz,nanaimodailynews.com,drivewaycanada.ca),53040),
...

To get just the filename, you could use FilenameUtils.getName(x.getFilename) (I have included the FilenameUtils import in the code above). I have not looked into whether FilenameUtils will get the filename from a URL, which is why I stuck with the full path. Also, I am open to feedback on whether .getFilename is the right name for this. IIPC calls it .getReaderIdentifier().
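For the URL case, one option that avoids depending on FilenameUtils is to go through java.net.URI. The following is only a sketch under that assumption: `ArchiveNameSketch` and `baseName` are hypothetical names, not part of this PR or of the toolkit.

```scala
import java.net.URI
import scala.util.Try

object ArchiveNameSketch {
  /** Returns the last path segment of a local path or a URL-style identifier. */
  def baseName(identifier: String): String = {
    // Prefer the URI path so that query strings and fragments are dropped;
    // fall back to the raw string when it does not parse as a URI
    // (for example, a local path containing spaces) or has no path part.
    val path = Try(new URI(identifier).getPath).toOption
      .flatMap(Option(_))
      .getOrElse(identifier)
    path.substring(path.lastIndexOf('/') + 1)
  }
}
```

Called as `ArchiveNameSketch.baseName(r.getFilename)`, this would handle `file:/...` identifiers like those in the output above as well as `http://` ones.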

I can think of a number of use cases for this, but it would be great as a way to detect problems with warcs (e.g. with wonky data). It might also be helpful for error responses down the road.

How should this be tested?

The above scripts should have expected outputs.
Both functions should work for both arc and warc formats.
Travis should pass.

Additional Notes:

I used some of @dportabella's ideas to produce the code here, in particular for getting the HttpStatus from a WARC file. For ARC and the WARC filename, I tried to use the WARCRecord and ARCRecord classes instead.

I did not include the full header responses in this PR. It seems like the ArcRecord and WarcRecord responses are quite different and I haven't been able to produce an effective testing mechanism.

Interested parties

@ianmilligan1 @ruebot

Thanks in advance for your help with the Archives Unleashed Toolkit!

codecov-io · 2018-11-22T20:26:10Z

Codecov Report

Merging #292 into master will increase coverage by 0.06%.
The diff coverage is 81.25%.

@@            Coverage Diff             @@
##           master     #292      +/-   ##
==========================================
+ Coverage   73.33%   73.39%   +0.06%     
==========================================
  Files          42       42              
  Lines        1170     1184      +14     
  Branches      205      210       +5     
==========================================
+ Hits          858      869      +11     
  Misses        244      244              
- Partials       68       71       +3

Impacted Files	Coverage Δ
...ain/scala/io/archivesunleashed/ArchiveRecord.scala	`82.14% <81.25%> (-1.2%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 80b9e2b...b722d1d.

ianmilligan1 · 2018-11-22T20:33:20Z

Thanks @greebie – I can test. Is this PR complete or is there anything else you're hoping to add to it before we start the review process?

greebie · 2018-11-22T20:34:10Z

Thanks. I'm pretty sure I learned from the last PR and did a very good job testing. Fingers crossed, I think it's ready to go.

ianmilligan1 · 2018-11-22T20:34:58Z

OK sg. I'll take first round testing, realistically probably tomorrow morning.

ruebot · 2018-11-23T13:41:08Z

src/test/scala/io/archivesunleashed/ArchiveRecordTest.scala

@@ -37,17 +38,76 @@ class ArchiveRecordTest extends FunSuite with BeforeAndAfter {
      .setAppName(appName)
    conf.set("spark.driver.allowMultipleContexts", "true");
    sc = new SparkContext(conf)
+


Remove blank line.

ruebot · 2018-11-23T13:42:15Z

src/main/scala/io/archivesunleashed/ArchiveRecord.scala


 /** Trait for a record in a web archive. */
 trait ArchiveRecord extends Serializable {
+  /** Returns the filename containing the Archive Records */
+  def getFilename: String


Let's use Resourcename instead of Filename.

Done. Just to confirm: Resourcename (to mimic Filename) and not ResourceName (because "resource name" is two separate words)? I'm okay with either.

Current commit uses Resourcename. I think that's fine - I'm probably just overthinking it.

I think Resourcename works FWIW

Resourcename is fine.

ruebot · 2018-11-23T13:42:22Z

src/main/scala/io/archivesunleashed/ArchiveRecord.scala


 /** Trait for a record in a web archive. */
 trait ArchiveRecord extends Serializable {
+  /** Returns the filename containing the Archive Records */


Missing fullstop.

ruebot · 2018-11-23T13:43:42Z

src/main/scala/io/archivesunleashed/ArchiveRecord.scala

@@ -64,6 +74,7 @@ class ArchiveRecordImpl(r: SerializableWritable[ArchiveRecordWritable]) extends
  var arcRecord: ARCRecord = null
  var warcRecord: WARCRecord = null
  // scalastyle:on null
+  var headerResponseFormat: String = "US-ASCII"


What's the rationale for US-ASCII?

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.

I suppose it's this? https://tools.ietf.org/html/rfc7230#section-3.2.4

Yes - this part mimics @dportabella's suggestion. It seemed weird to me at first as well.
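To make the charset choice concrete, here is a small sketch of what decoding header bytes as US-ASCII does on the JVM. `HeaderCharsetSketch` and `decode` are hypothetical names used only for illustration; the only library call is the standard `String(bytes, charset)` constructor.

```scala
import java.nio.charset.StandardCharsets

object HeaderCharsetSketch {
  /** Decodes raw header bytes as US-ASCII. Octets outside the ASCII range
    * are not rejected; the JVM substitutes the replacement character U+FFFD. */
  def decode(bytes: Array[Byte]): String =
    new String(bytes, StandardCharsets.US_ASCII)
}
```

Since the status line itself is plain ASCII, a stray non-ASCII octet elsewhere in the header should not stop the status code from being read.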

ianmilligan1

Tested with variations on

import io.archivesunleashed._
import io.archivesunleashed.matchbox._

RecordLoader.loadArchives("/tuna1/scratch/i2milligan/warcs.archive-it.org/cgi-bin/getarcs.pl/*.gz", sc)
  .keepValidPages()
  .map(r => (r.getFilename, r.getHttpStatus, r.getCrawlDate, r.getDomain, r.getUrl, RemoveHTML(r.getContentString)))
  .saveAsTextFile("/tuna1/scratch/i2milligan/results/plain-text-http")

WARC file and response codes all check out, and work in a variety of scripts. Good to go from my POV (and @ruebot thanks for your review too!).

greebie · 2018-11-23T16:08:06Z

I would like to run one more test on this minus the .keepValidPages(). Seems like there's potential for that to break.

ianmilligan1 · 2018-11-23T16:09:57Z

So just i.e.

import io.archivesunleashed._
import io.archivesunleashed.matchbox._

RecordLoader.loadArchives("/tuna1/scratch/i2milligan/warcs.archive-it.org/cgi-bin/getarcs.pl/*.gz", sc)
  .map(r => (r.getFilename, r.getHttpStatus, r.getCrawlDate, r.getDomain, r.getUrl, RemoveHTML(r.getContentString)))
  .saveAsTextFile("/tuna1/scratch/i2milligan/results/plain-text-http")

I have a large body of WARCs to test on, so if that's what you mean I can test it here too @greebie

greebie · 2018-11-23T16:11:27Z

Yes. Would like to see some statuses besides 200. :) Also if the header is munged, I'd like to be able to cover for it. Really appreciate it if you could.

ianmilligan1 · 2018-11-23T16:12:06Z

OK. Let me run it at scale minus the plain text, so we can easily look around at status codes too.

ianmilligan1 · 2018-11-23T16:32:20Z

Tested it without .keepValidPages() and all seems to work.

Am seeing other response codes, e.g. 404s, 301s, etc. For example:

(file:/tuna1/scratch/i2milligan/warcs.archive-it.org/cgi-bin/getarcs.pl/ARCHIVEIT-5421-MONTHLY-3224-20150514172923428-00000-wbgrp-crawl064.us.archive.org-6442.warc.gz,301,20150514,www.ontario.ca,http://www.ontario.ca/dgr)
(file:/tuna1/scratch/i2milligan/warcs.archive-it.org/cgi-bin/getarcs.pl/ARCHIVEIT-5421-MONTHLY-3224-20150514172923428-00000-wbgrp-crawl064.us.archive.org-6442.warc.gz,200,20150514,www1.toronto.ca,http://www1.toronto.ca/wps/portal/contentonly?vgnextoid=68a618b06a1aa410VgnVCM10000071d60f89RCRD)
(file:/tuna1/scratch/i2milligan/warcs.archive-it.org/cgi-bin/getarcs.pl/ARCHIVEIT-5421-MONTHLY-3224-20150514172923428-00000-wbgrp-crawl064.us.archive.org-6442.warc.gz,200,20150514,www.ontario.ca,http://www.ontario.ca/travel-and-recreation/about-ontario?utm_source=shortlinks&utm_medium=web&utm_campaign=dgr)
(file:/tuna1/scratch/i2milligan/warcs.archive-it.org/cgi-bin/getarcs.pl/ARCHIVEIT-5421-MONTHLY-3224-20150514172923428-00000-wbgrp-crawl064.us.archive.org-6442.warc.gz,404,20150514,www.ontario.ca,http://www.ontario.ca/travel-and-recreation/Msxml2.XMLHTTP)
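For a quick look at the mix of codes in output like this, the status column can be tallied. The sketch below works on an in-memory sequence of status strings; `StatusTallySketch` and `tally` are hypothetical names, and on a real RDD the `.countItems()` call used earlier in this PR serves the same purpose.

```scala
object StatusTallySketch {
  /** Counts how often each status code occurs, most frequent first,
    * with ties broken by the code itself. */
  def tally(statuses: Seq[String]): Seq[(String, Int)] =
    statuses
      .groupBy(identity)
      .map { case (code, hits) => (code, hits.size) }
      .toSeq
      .sortBy { case (code, count) => (-count, code) }
}
```

Applied to the four records above, `StatusTallySketch.tally(Seq("301", "200", "200", "404"))` puts 200 first with a count of 2.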

greebie · 2018-11-23T16:33:18Z

Awesome! I think doing a status code visualisation would make a decent Medium post. That might be my December project.

greebie · 2018-11-23T17:34:23Z

More relevant to #260, but related to why coverage shrank in this PR: I think we need test cases for ArchiveRecords that are neither WARC nor ARC format. The Java code takes care of the error handling, I think, but codecov notices that we did not test for that here in ArchiveRecord for some reason. I think this is something for a different PR, however.

ruebot · 2018-11-27T18:26:35Z

I think we have different interpretations of what Resource name is. This appears to be putting out the filename of the ARC/WARC. My understanding was that it was the resource name of the resource being parsed, i.e. foo.jpg or index.html. If we're resolving #164 with this (my bad, I should have read that more closely when first reviewing), then it should not be Resourcename. It should be ArchiveFilename or something similar that is more explicit about what it is.

ruebot · 2018-11-27T18:28:31Z

We also need to decide if it is just the ARC/WARC filename itself that the method returns, or the full path.

ruebot · 2018-11-27T18:31:48Z

src/test/scala/io/archivesunleashed/ArchiveRecordTest.scala

+    assert(textSampleWarc.deep == Array("20080430", "20080430", "20080430").deep)
+  }
+
+  test("Domains") {


There are a few extra test methods here. Are they in scope for the issues posted in the original comment? Or do they cover other tickets?

No other tickets. Basically, last PR I had good test coverage, but failed to test particular cases and my code passed Travis with bugs. I decided to include these additional tests in case that happened again (it was unlikely, but I wanted this PR to go more smoothly). Since ArchiveRecord is used widely across the tests, I did not expect the tests to improve coverage (as per #260).

ruebot · 2018-11-27T18:57:46Z

Other than the above comments, functionality-wise, looks good to me:

import io.archivesunleashed._
import io.archivesunleashed.matchbox._

RecordLoader.loadArchives("/home/nruest/tmp/test-warcs/5467/*.gz", sc)
  .map(r => (r.getResourcename, r.getHttpStatus))
  .saveAsTextFile("/home/nruest/tmp/5467_output")

(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-NONE-560-20150313151809333-00004-wbgrp-crawl060.us.archive.org-6443.warc.gz,404)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-NONE-560-20150313151809333-00004-wbgrp-crawl060.us.archive.org-6443.warc.gz,404)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-NONE-560-20150313151809333-00004-wbgrp-crawl060.us.archive.org-6443.warc.gz,404)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-NONE-560-20150313151809333-00004-wbgrp-crawl060.us.archive.org-6443.warc.gz,404)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-NONE-560-20150313151809333-00004-wbgrp-crawl060.us.archive.org-6443.warc.gz,404)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-NONE-560-20150313151809333-00004-wbgrp-crawl060.us.archive.org-6443.warc.gz,404)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,200)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,200)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,200)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,200)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,204)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,403)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,200)
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB265607-20170206183529615-00000.warc.gz,000)

Nice work @greebie 😄

greebie · 2018-11-27T19:44:49Z

I think we have different interpretations of what Resource name is. This appears to be putting out the filename of the ARC/WARC. My understanding was that it was the resource name of the resource being parsed, i.e. foo.jpg or index.html. If we're resolving #164 with this (my bad, I should have read that more closely when first reviewing), then it should not be Resourcename. It should be ArchiveFilename or something similar that is more explicit about what it is.

I think archiveFilename works. The issue is that because the filename comes from the ArcRecord and WarcRecord classes, it technically could be a URL. IIPC calls it "getReaderIdentifier".

greebie · 2018-11-27T19:52:11Z

We also need to decide if it is just the ARC/WARC filename itself that the method returns, or the full path.

I suggest we keep the full paths but provide information on accessing the filename in the docs. There's an example to extract just the filename in the tests. Another option is to include a util.

ruebot · 2018-11-27T19:52:42Z

Ok, let's go with archiveFilename, and update doc comments as necessary too.

ruebot · 2018-11-27T19:54:11Z

I suggest we keep the full paths but provide information on accessing the filename in the docs. There's an example to extract just the filename in the tests.

Ok. Update this ticket as necessary after this gets merged.

ruebot · 2018-11-28T00:11:45Z

Good to go!

Script:

import io.archivesunleashed._
import io.archivesunleashed.matchbox._
import org.apache.commons.io.FilenameUtils

RecordLoader.loadArchives("/home/nruest/tmp/test-warcs/5467/*.gz", sc)
  .map(r => (r.getArchiveFilename, r.getHttpStatus, FilenameUtils.getName(r.getArchiveFilename)))
  .saveAsTextFile("/home/nruest/tmp/292_final_test")

Sample output:

(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,200,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,000,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,000,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,204,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,403,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,200,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,000,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz,200,ARCHIVEIT-5467-WEEKLY-JOB220510-20160620183529180-00000.warc.gz)                                                                                                               
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)                                                       
(file:/home/nruest/tmp/test-warcs/5467/ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz,000,ARCHIVEIT-5467-DAILY-28665-20150527153956545-00000-aidata404-bu.us.archive.org-6441.warc.gz)

greebie added 4 commits November 20, 2018 22:28

Add httpStatus to ArchiveRecord class & trait

82314ad

- add .httpStatus to potential outputs
- add tests for .httpStatus calls
- improve ArchiveRecord testing overall.

Add .fileName feature to ArchiveRecordImpl.

9bca66e

- add filename to trait
- add filename for ArchiveRecordImpl
- add tests for filename.

Fix getFileName to getFilename for consistency.

37ae8e0

Merge branch 'master' into issue-198

46b32a2

ruebot requested changes Nov 23, 2018

ianmilligan1 approved these changes Nov 23, 2018

Updates based on PR.

739e06c

- change .getFilename to .getResourcename
- Other code style fixes.

ruebot mentioned this pull request Nov 23, 2018

Update aut documentation for https://github.com/archivesunleashed/aut/pull/292 archivesunleashed/archivesunleashed.org#74

Closed

ruebot reviewed Nov 27, 2018

Change .getResourcename to .getArchiveFile

b722d1d

- include changes to tests.

ruebot approved these changes Nov 28, 2018

ruebot merged commit 7731b6d into master Nov 28, 2018

ruebot deleted the issue-198 branch November 28, 2018 00:15
