Commits on Nov 15, 2012
  1. run GC when looping through images

    committed Nov 15, 2012
Commits on Nov 8, 2012
Commits on Nov 6, 2012
  1. min image height

    committed Nov 6, 2012
Commits on Oct 23, 2012
  1. improve title extraction by looking for longest common substring in candidate @titles, add tests and update corpus

    committed Oct 23, 2012
  2. remove time logs

    committed Oct 23, 2012
  3. bugfixes for image extractor

    committed Oct 23, 2012
  4. allow min-height for image scraper

    committed Oct 23, 2012
Commits on Oct 17, 2012
  1. bugfix

    committed Oct 17, 2012
Commits on Oct 12, 2012
  1. merge peterc and fix tests

    committed Oct 12, 2012
  2. add rake to gemspec

    committed Oct 12, 2012
Commits on Aug 15, 2012
  1. Merge pull request #19 from adamcrown/tags_attribute

    New Pismo::Document#tags method.
    peterc committed Aug 15, 2012
  2. New Pismo::Document#tags method.

    Support extracting tags from documents.
    adamcrown committed Aug 15, 2012
  3. Resolving

    peterc committed Aug 15, 2012
  4. Merge pull request #18 from adamcrown/datetimes_with_punctuation

    Support punctuation in dates.
    peterc committed Aug 15, 2012
Commits on Aug 14, 2012
  1. Support punctuation in dates.

    Such as periods after an abbreviated month or day of week and commas before the year.
    adamcrown committed Aug 14, 2012
  2. Set explicit timezone during testing.

    The metadata_expected.yaml includes datetimes with a +01:00 time zone, which
    means the test will fail for anybody running the test in a different time zone.
    Explicitly setting the time zone to UTC in the helper solves this problem.
    adamcrown committed Aug 14, 2012
Commits on Apr 7, 2012
  1. * Rewrote image_extractor to use more idiomatic Ruby

    * Rewrote huge parts of internal_document to be more DRY and produce less garbage
    * Integrated the htmlentities gem for generalized HTML entity decoding
    * Fixed HTML entity decoding so that it happens when content is extracted, rather than doing it on the source document, which can break parsing
    * Stubbed out the network calls in the test suite, resulting in dramatically faster tests
    * General garbage, speed, and style tweaks
    * Removed trailing whitespace from many files
    * Make the ImageExtractor logger customizable, or pass false for no logger
    * In the same vein, use default options and pass them along down to the various pieces of the parser
    cheald committed Apr 7, 2012
Commits on Mar 17, 2012
Commits on Feb 29, 2012
  1. Further fixes for encoding bugs

    dparis committed Feb 29, 2012
  2. Fixed bug where raw_html content was being processed before the encoding was enforced, leading to an invalid UTF-8 encoding exception

    dparis committed Feb 29, 2012
  3. Merge branch 'bborn_bugfixes'

    dparis committed Feb 29, 2012
  4. Merge branch 'ashleyw_phrasie'

    dparis committed Feb 29, 2012
Commits on Jan 3, 2012
  1. Merge pull request #10 from stipple/master

    shoulda should be a development dependency
    peterc committed Jan 3, 2012
Commits on Nov 29, 2011
Commits on Aug 11, 2011
  1. Merge pull request #9 from SleepTillSeven/patch-1

    removing duplicate summary
    peterc committed Aug 11, 2011
Commits on Aug 10, 2011
Commits on Apr 28, 2011
  1. bugfix recursive image searching

    committed Apr 28, 2011
  2. there

    committed Apr 28, 2011
  3. another logging bugfix

    committed Apr 28, 2011
  4. bugfix

    committed Apr 28, 2011
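
The Oct 23, 2012 commit mentions improving title extraction by looking for the longest common substring in candidate titles. The log does not show how that was implemented, so the following is only a minimal sketch of the general idea. The method name `longest_common_substring` and the sample titles are hypothetical and are not taken from Pismo's code.

```ruby
# Hypothetical helper, not Pismo's actual implementation.
# Returns the longest substring shared by every string in `candidates`,
# or an empty string when there is none.
def longest_common_substring(candidates)
  return "" if candidates.empty?

  # Only substrings of the shortest candidate can appear in all of them.
  shortest = candidates.min_by(&:length)

  # Try the longest windows first so the first match is the answer.
  shortest.length.downto(1) do |len|
    (0..shortest.length - len).each do |start|
      piece = shortest[start, len]
      return piece if candidates.all? { |c| c.include?(piece) }
    end
  end

  ""
end

# Illustrative candidate titles (assumed, e.g. from <title>, <h1>, og:title).
titles = [
  "Example Post Title | Example Blog",
  "Example Post Title",
  "Example Blog: Example Post Title"
]

puts longest_common_substring(titles)
```

This brute-force search is fine for a handful of short title strings. A suffix-based approach would scale better for long inputs.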