Commits on Jan 3, 2016
Commits on Sep 30, 2015
Commits on Sep 25, 2015
  1. Add lots of documentation on config file options (new -c arg).
    Make a single VersionString live in feedmeparser module.
    Bump version to 1.0b1.
Commits on Sep 14, 2015
  1. Add a special .EOF. sequence to the end of MANIFEST
    so the fetcher can tell when it's fully ready to download.
    Output plaintext, not html, from urlrss.
Commits on Sep 10, 2015
  1. Remove a too-verbose message
Commits on Sep 9, 2015
  1. If nocache is set, always save an index file
    even if we didn't fetch any content,
    so feeds like Xtraurls will warn us of failed URLs.
Commits on Aug 27, 2015
Commits on Aug 9, 2015
  1. Print page_start/end info to stderr, not stdout,
    so it gets saved in the log in debug mode.
Commits on Aug 4, 2015
  1. Handle gzipped http
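
One way gzipped HTTP bodies can be handled, as a rough sketch. The helper name `decode_body` is hypothetical, and the code uses Python 3's `gzip.decompress`; the project at this point used Python 2's urllib2, where the same idea would go through `gzip.GzipFile`.

```python
import gzip


def decode_body(raw_bytes, content_encoding):
    """Decompress a response body if the Content-Encoding header says gzip."""
    if content_encoding and content_encoding.lower() == "gzip":
        return gzip.decompress(raw_bytes)
    return raw_bytes
```

The caller would pass the raw bytes it read from the response together with the value of the response's Content-Encoding header, if any.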
Commits on Jul 31, 2015
Commits on Jul 26, 2015
  1. Allow for cookies in the request: some sites, notably nytimes.com,
    degrade to an infinite redirect loop if cookies aren't enabled.
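
A sketch of enabling cookies with the standard library, assuming Python 3 (`urllib.request` and `http.cookiejar`; in Python 2 these were `urllib2` and `cookielib`). The function name `build_cookie_opener` is an illustration, not taken from the project.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return an opener that stores cookies and resends them on redirects."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

Requests made through `opener.open(url)` then carry any cookies the site set on an earlier response, which is what a cookie-checking redirect chain needs in order to terminate.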
Commits on Jun 2, 2015
  1. Minor changes
Commits on Jun 1, 2015
  1. Save a MANIFEST file with a list of all filenames written,
    to make it easier for a downloader to fetch them.
Commits on May 26, 2015
  1. Remove entire <head> section -- which means we'll
    miss meta tags, but the important thing is that we'll
    skip foreign style tags that mess up our rendering and
    cause a lot of unwanted data downloads. Try to notice
    any meta charset tags inside the old head and save them.
Commits on May 21, 2015
  1. Add Los Alamos Daily Post
Commits on May 1, 2015
  1. Try to guard against errors from bad unicode characters in URLs.
    Those seem to be showing up on Longreads, in particular.
Commits on Mar 5, 2015
Commits on Feb 28, 2015
  1. Obey base href on pages that have it, for image rewriting.
    This may solve a lot of the images that weren't downloading.
    Guard against images that don't download; don't bomb out with
    errors, and rewrite the URL to an absolute one so the image
    will at least show if there's a live net connection.
  2. Strip RSS content before deciding it's blank.
    Los Alamos Daily Post has whitespace-only content.
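
The base href handling described above could look roughly like this. The helper name `absolute_image_url` and its parameters are assumptions; only `urljoin` is a real standard-library call (Python 3 `urllib.parse`; Python 2 had it in `urlparse`).

```python
from urllib.parse import urljoin


def absolute_image_url(page_url, img_src, base_href=None):
    """Resolve img_src against the page's <base href> if present, else the page URL."""
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, img_src)
```

For example, `absolute_image_url("http://example.com/a/b.html", "img/x.png")` gives `http://example.com/a/img/x.png`, while passing `base_href="http://cdn.example.com/media/"` gives `http://cdn.example.com/media/img/x.png`. The same absolute URL can serve as the fallback src when the local download fails.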
Commits on Feb 27, 2015
  1. Rewrite img tags in the indexstr to use locally fetched images.
    Include a note in the indexstr for pages with blank content.
Commits on Feb 26, 2015
  1. Add User-Agent to every urllib2.Request we make,
    including images and referred requests.
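
A small sketch of routing every request through one helper so the User-Agent is never forgotten. It is written against Python 3's `urllib.request.Request` (the commit refers to Python 2's `urllib2.Request`, which has the same `add_header` method). The `USER_AGENT` value, the `make_request` name, and the use of a Referer header for referred requests are all assumptions.

```python
import urllib.request

USER_AGENT = "ExampleFetcher/0.1"   # placeholder, not the project's real string


def make_request(url, referrer=None):
    """Build a Request that always carries the User-Agent header."""
    req = urllib.request.Request(url)
    req.add_header("User-Agent", USER_AGENT)
    if referrer:
        # Assumed handling for requests that originate from another page.
        req.add_header("Referer", referrer)
    return req
```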
Commits on Feb 25, 2015
  1. Set the User-Agent
Commits on Nov 18, 2014
Commits on Oct 31, 2014
  1. Set a default socket timeout of 100 seconds.
    This is the only way to set a feedparser timeout for the RSS URLs.
  2. Add new continue_on_timeout config parameter
    to control whether a timeout means we skip to the next site,
    or to the next story (true means stay on site, skip to next story).
  3. Add a timeout of 100 seconds on stories.
    If the timeout is exceeded, we'll note that in the log
    and skip the rest of the site, assuming that the site is broken.
    (Hmm, but this may be wrong with Xtraurls or sites like
    Longreads where stories come from different sources.)
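
The timeout behaviour in these three commits can be sketched as follows. `socket.setdefaulttimeout` is the real standard-library call; the `fetch_site` function and its caller-supplied `fetch` callback are hypothetical stand-ins for the project's own story-fetching code.

```python
import socket

socket.setdefaulttimeout(100)   # seconds; applies to sockets created afterwards


def fetch_site(story_urls, fetch, continue_on_timeout=False):
    """Fetch each story; on timeout, skip the story or abandon the site.

    `fetch` is a caller-supplied function that may raise socket.timeout.
    """
    fetched = []
    for url in story_urls:
        try:
            fetched.append(fetch(url))
        except socket.timeout:
            if continue_on_timeout:
                continue    # stay on this site, move to the next story
            break           # assume the site is broken; skip the rest
    return fetched
```

Setting `continue_on_timeout=True` suits aggregator-style feeds where each story comes from a different server, so one slow host says little about the others.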
Commits on Sep 26, 2014
  1. Make shell=False explicit in subprocess calls.
    (It was already the default, but let's be sure.)
Commits on Jul 15, 2014
Commits on Jul 14, 2014
  1. Eliminate href links in RSS that only span images we're removing,
    and links that only span spaces. (E.g. Slashdot RSS.)
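
A regex-based sketch of dropping links left empty once images are removed. This is only an illustration of the idea; the project may well do this with an HTML parser instead, and the names `IMG_TAG`, `EMPTY_LINK`, and `strip_images_and_empty_links` are invented here.

```python
import re

IMG_TAG = re.compile(r'<img\b[^>]*>', re.IGNORECASE)
# An <a> element whose body is nothing but whitespace or &nbsp; entities.
EMPTY_LINK = re.compile(r'<a\b[^>]*>(?:\s|&nbsp;)*</a>', re.IGNORECASE)


def strip_images_and_empty_links(html):
    """Remove img tags, then any links that no longer span visible content."""
    without_images = IMG_TAG.sub('', html)
    return EMPTY_LINK.sub('', without_images)
```

Links that still wrap text are left untouched; only anchors whose content is empty or whitespace after image removal are dropped.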