Commits on Jan 3, 2016
Commits on Sep 30, 2015
Commits on Sep 25, 2015
  1. Add lots of documentation on config file options (new -c arg).

    Make a single VersionString live in feedmeparser module.
    Bump version to 1.0b1.
Commits on Sep 14, 2015
  1. Add a special .EOF. sequence to the end of MANIFEST
    so the fetcher can tell when it's fully ready to download.
    Output plaintext, not html, from urlrss.
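
A minimal sketch of how an end-of-file sentinel in MANIFEST could work, written in Python 3. The helper names `write_manifest` and `read_manifest` are hypothetical and the one-filename-per-line layout is an assumption; only the `MANIFEST` filename and the `.EOF.` marker come from the commit messages.

```python
from pathlib import Path

EOF_MARKER = ".EOF."  # sentinel named in the commit message


def write_manifest(feed_dir, filenames):
    """Write one filename per line, then the sentinel as the final line."""
    manifest = Path(feed_dir) / "MANIFEST"
    lines = list(filenames) + [EOF_MARKER]
    manifest.write_text("\n".join(lines) + "\n")
    return manifest


def read_manifest(manifest_path):
    """Return the list of filenames, or None if the sentinel is missing.

    A missing sentinel suggests the writer has not finished yet,
    so a downloader should wait and retry instead of fetching.
    """
    lines = Path(manifest_path).read_text().splitlines()
    if not lines or lines[-1] != EOF_MARKER:
        return None
    return lines[:-1]
```

A downloader built this way would poll `read_manifest` until it returns a list, then fetch each listed file.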
Commits on Sep 10, 2015
  1. Remove a too-verbose message

Commits on Sep 9, 2015
  1. If nocache is set, always save an index file
    even if we didn't fetch any content,
    so feeds like Xtraurls will warn us of failed URLs.
Commits on Aug 27, 2015
Commits on Aug 9, 2015
  1. Print page_start/end info to stderr, not stdout,
    so it gets saved in the log in debug mode.
Commits on Aug 4, 2015
  1. Handle gzipped http
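
One way gzipped HTTP bodies might be handled, as a Python 3 sketch. The function name `maybe_gunzip` is hypothetical, and the sketch assumes the caller passes in the response's `Content-Encoding` header value; the commit itself does not say how the project detects compression.

```python
import gzip


def maybe_gunzip(body, content_encoding=""):
    """Decompress the body if the server labelled it as gzip.

    body is the raw bytes read from the response; content_encoding is
    the value of the Content-Encoding header ("" if the header is absent).
    """
    if content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body
```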

Commits on Jul 31, 2015
Commits on Jul 26, 2015
  1. Allow for cookies in the request: some sites, notably,

    degrade to an infinite redirect loop if cookies aren't enabled.
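
A sketch of cookie-enabled requests using the Python 3 standard library (the project at this point used urllib2, whose Python 3 equivalent is `urllib.request`). The function name `make_cookie_opener` is hypothetical.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Build an opener that stores cookies and sends them back.

    Sites that set a cookie and then redirect expect the cookie on the
    next request; without a cookie jar the redirect can repeat forever.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar
```

Calling `opener.open(url)` in place of `urllib.request.urlopen(url)` would then carry cookies across redirects within the same opener.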
Commits on Jun 2, 2015
  1. Minor changes

Commits on Jun 1, 2015
  1. Save a MANIFEST file with a list of all filenames written,
    to make it easier for a downloader to fetch them.
Commits on May 26, 2015
  1. Remove entire <head> section -- which means we'll
    miss meta tags, but the important thing is that we'll
    skip foreign style tags that mess up our rendering and
    cause a lot of unwanted data downloads. Try to notice
    any meta charset tags inside the old head and save them.
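
The idea of dropping the head while keeping its charset can be sketched as follows. This is a regex-based illustration in Python 3, not the project's actual parser code; `strip_head` is a hypothetical name, and regexes are only an approximation of real HTML parsing.

```python
import re

# Match the whole <head>...</head> element, case-insensitively, across lines.
HEAD_RE = re.compile(r"<head\b.*?</head\s*>", re.IGNORECASE | re.DOTALL)
# Match charset=... inside a <meta> tag, in either the short form
# (<meta charset="utf-8">) or the http-equiv form (content="...; charset=...").
CHARSET_RE = re.compile(
    r"<meta\b[^>]*charset\s*=\s*[\"']?([\w-]+)", re.IGNORECASE
)


def strip_head(html):
    """Return (html_without_head, charset_or_None)."""
    head = HEAD_RE.search(html)
    if head is None:
        return html, None
    found = CHARSET_RE.search(head.group(0))
    charset = found.group(1) if found else None
    return html[:head.start()] + html[head.end():], charset
```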
Commits on May 21, 2015
  1. Add Los Alamos Daily Post

Commits on May 1, 2015
  1. Try to guard against errors from bad unicode characters in URLs.

    Those seem to be showing up on Longreads, in particular.
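
One possible guard against non-ASCII characters in URLs is to percent-encode them before making the request. The sketch below uses Python 3's `urllib.parse`; the name `ascii_safe_url` and the choice of "safe" characters are assumptions, and the host name is left untouched (internationalized host names would need separate handling).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def ascii_safe_url(url):
    """Percent-encode non-ASCII characters in path, query and fragment.

    Existing %XX escapes are preserved because '%' is listed as safe.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%:@,;=+")
    query = quote(parts.query, safe="=&%+:@,;/")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))
```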
Commits on Mar 5, 2015
Commits on Feb 28, 2015
  1. Obey base href on pages that have it, for image rewriting.

    This may solve a lot of the images that weren't downloading.
    Guard against images that don't download; don't bomb out with
    errors, and rewrite the URL to an absolute one so the image
    will at least show if there's a live net connection.
  2. Strip RSS content before deciding it's blank.

    Los Alamos Daily Post has whitespace-only content.
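
The base href handling in the first item above can be sketched as follows, in Python 3. `absolute_image_url` is a hypothetical helper, and the regex lookup of the `<base>` tag is an assumption made to keep the example self-contained; the resolution itself relies on `urllib.parse.urljoin`.

```python
import re
from urllib.parse import urljoin

# Find the href of a <base> tag, if the page has one.
BASE_RE = re.compile(
    r"<base\b[^>]*\bhref\s*=\s*[\"']([^\"']+)[\"']", re.IGNORECASE
)


def absolute_image_url(page_url, html, img_src):
    """Resolve an image src against the page's base href, if any.

    Falls back to the page URL when there is no <base> tag. The result
    is absolute, so the image can still load over a live connection
    even if the local download failed.
    """
    found = BASE_RE.search(html)
    base = urljoin(page_url, found.group(1)) if found else page_url
    return urljoin(base, img_src)
```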
Commits on Feb 27, 2015
  1. Rewrite img tags in the indexstr to use locally fetched images.

    Include a note in the indexstr for pages with blank content.
Commits on Feb 26, 2015
  1. Add User-Agent to every urllib2.Request we make,
    including images and referred requests.
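
Routing every request through one helper is a simple way to make sure the header is never forgotten. The sketch below uses Python 3's `urllib.request` (the successor to urllib2); the helper name `make_request` and the `USER_AGENT` value are hypothetical placeholders, not the project's actual string.

```python
import urllib.request

USER_AGENT = "ExampleFetcher/0.1"  # hypothetical value, for illustration only


def make_request(url, referrer=None):
    """Build a Request that always carries the User-Agent header.

    Use this for pages, images and referred requests alike, so that
    no request goes out with the library's default agent string.
    """
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    if referrer:
        req.add_header("Referer", referrer)
    return req
```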
Commits on Feb 25, 2015
  1. Set the User-Agent

Commits on Nov 18, 2014
Commits on Oct 31, 2014
  1. Set a default socket timeout of 100 seconds.

    This is the only way to set a feedparser timeout for the RSS URLs.
  2. Add new continue_on_timeout config parameter
    to control whether a timeout means we skip to the next site,
    or to the next story (true means stay on site, skip to next story).
  3. Add a timeout of 100 seconds on stories.

    If the timeout is exceeded, we'll note that in the log
    and skip the rest of the site, assuming that the site is broken.
    (Hmm, but this may be wrong with Xtraurls or sites like
    Longreads where stories come from different sources.)
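
The three timeout changes above can be sketched together in Python 3. Only the 100-second value and the `continue_on_timeout` parameter name come from the commit messages; `set_default_timeout`, `fetch_site` and the `fetch_story` callback are hypothetical names used for illustration.

```python
import socket


def set_default_timeout(seconds=100):
    """Set a process-wide socket timeout.

    Libraries that open their own sockets and offer no timeout
    argument inherit this default.
    """
    socket.setdefaulttimeout(seconds)


def fetch_site(stories, fetch_story, continue_on_timeout=False):
    """Fetch each story; decide what a timeout means for the site.

    continue_on_timeout=True: stay on the site, skip to the next story.
    continue_on_timeout=False: assume the site is broken, skip the rest.
    Returns (fetched_results, stories_that_timed_out).
    """
    fetched, timed_out = [], []
    for story in stories:
        try:
            fetched.append(fetch_story(story))
        except socket.timeout:
            timed_out.append(story)  # a real fetcher would log this
            if not continue_on_timeout:
                break
    return fetched, timed_out
```

Setting `continue_on_timeout` to true suits aggregator-style feeds where stories come from different hosts, since one slow host says little about the others.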
Commits on Sep 26, 2014
  1. Make shell=False explicit in subprocess calls.

    (It was already the default, but let's be sure.)
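
For illustration, an explicit `shell=False` call looks like this in Python 3. The wrapper name `run_helper` is hypothetical; with `shell=False` the command is passed as a list of arguments and is never interpreted by a shell, so special characters in URLs or filenames cannot be executed as shell syntax.

```python
import subprocess
import sys


def run_helper(args):
    """Run a command given as an argument list, without a shell."""
    return subprocess.run(
        args, shell=False, capture_output=True, text=True, check=False
    )


if __name__ == "__main__":
    # The semicolon is passed through as literal text, not run as a command.
    result = run_helper([sys.executable, "-c", "print('a; b')"])
    print(result.stdout.strip())
```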
Commits on Jul 15, 2014
Commits on Jul 14, 2014
  1. Eliminate href links in RSS that only span images we're removing,
    and links that only span spaces. (E.g. Slashdot RSS.)
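
A regex-based sketch of this cleanup, in Python 3. It assumes all images are being removed, so a link whose only content is images and whitespace is dropped whole; `drop_empty_links` is a hypothetical name, and the project's own parser-based approach may differ.

```python
import re

# An <a> element whose content is nothing but whitespace, &nbsp;
# entities, and <img> tags.
EMPTY_LINK_RE = re.compile(
    r"<a\b[^>]*>(?:\s|&nbsp;|<img\b[^>]*>)*</a\s*>", re.IGNORECASE
)


def drop_empty_links(html):
    """Remove links that would have no visible text once images are gone."""
    return EMPTY_LINK_RE.sub("", html)
```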