To be released.
- Python 3.2 is no longer supported, since even pip 8.0.0 dropped its support for Python 3.2.
- Parsing RSS 1.0 feeds is now available. [57]
- Refactored the ~libearth.parser package. [54]
  - Every single element parser can be specified using ~libearth.parser.base.ParserBase and its decorator. When the root element parser is called, the child elements are also parsed in hierarchical order.
  - Basic parsing information is stored in ~libearth.parser.base.SessionBase and passed from the parent parser to its children parsers.
  - Added ~libearth.parser.base.get_element_id. It returns a string consisting of an XML namespace and an element tag that xml.etree.ElementTree can recognize when finding child elements.
  - Added support for Atom feeds whose ~libearth.feed.Text has the xhtml type.
- Introduced the new libearth.defaults module. This module provides small utilities and default data to fill the initial state of Earth Reader apps.
- The HTML sanitizer now rebases all links in the given document on the base URI. The ~libearth.feed.Text.get_sanitized_html() method was added to the ~libearth.feed.Text type. The ~libearth.sanitizer.sanitize_html() function now additionally requires a base_uri parameter.
- Added ~libearth.session.Session.get_default_name() for the default session name.
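The string described for get_element_id above is presumably the {namespace}tag form that xml.etree.ElementTree uses for qualified element names. The following is a minimal sketch under that assumption; the helper name qualified_name is hypothetical and is not libearth's implementation.

```python
import xml.etree.ElementTree as etree

ATOM_NS = 'http://www.w3.org/2005/Atom'


def qualified_name(namespace, tag):
    """Hypothetical helper: build the '{namespace}tag' string that
    ElementTree uses to identify a namespaced element."""
    if namespace:
        return '{' + namespace + '}' + tag
    return tag


root = etree.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title></feed>'
)
# find() only matches the child when the namespace is part of the name.
title = root.find(qualified_name(ATOM_NS, 'title'))
print(title.text)
```

Looking up the plain tag 'title' on the same tree would return None, which is why a parser has to build the qualified form before searching for children.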
Released on November 6, 2014.
- Fixed a bug that ~libearth.schema.complete() never terminated for documents read by ~libearth.schema.read() from a single chunk.
Released on November 5, 2014.
- Fixed a bug that ~libearth.subscribe.SubscriptionList objects having ~libearth.subscribe.Outline objects without their created_at <libearth.subscribe.Outline.created_at> attribute failed to be merged on Python 3. [65]
- Fixed a bug that a ~libearth.schema.DocumentElement in streamed read mode was not properly marked as complete in some cases, even when it had been completed by the ~libearth.schema.complete() function.
Released on July 20, 2014.
- Fixed two backward compatibility breakages:
  - A bug that subcategory changes hadn't been detected when ~libearth.subscribe.SubscriptionList objects are merged.
  - A bug that all child outlines were wiped when a category was deleted.
Released on July 12, 2014.
- Root ~libearth.session.MergeableDocumentElement objects' ~libearth.session.MergeableDocumentElement.__merge_entities__() methods are not ignored anymore. Responsibility for merging two documents is now moved from the Session.merge() <libearth.session.Session.merge> method to the MergeableDocumentElement.__merge_entities__() <libearth.session.MergeableDocumentElement.__merge_entities__> method.
- ~libearth.crawler.crawl() now returns a set of ~libearth.crawler.CrawlResult objects instead of tuple objects.
- The feeds parameter of the ~libearth.crawler.crawl() function was renamed to feed_urls.
- Added the feed_uri parameter and the corresponding feed_uri <libearth.crawler.CrawlError.feed_uri> attribute to the ~libearth.crawler.CrawlError exception.
- A timeout option was added to the crawler.
  - Added an optional timeout parameter to ~libearth.crawler.crawl().
  - Added an optional timeout parameter to ~libearth.crawler.get_feed().
  - Added the ~libearth.crawler.DEFAULT_TIMEOUT constant, which is 10 seconds.
- Added the LinkList.favicon <libearth.feed.LinkList.favicon> property. [49]
- The Link.relation <libearth.feed.Link.relation> attribute, which had been optional, is now required.
- The AutoDiscovery.find_feed_url() <libearth.parser.autodiscovery.AutoDiscovery.find_feed_url> method (that returned feed links) is gone. Instead, the AutoDiscovery.find() <libearth.parser.autodiscovery.AutoDiscovery.find> method (that returns a pair of feed links and favicon links) was introduced. [49]
- The Subscription.icon_uri <libearth.subscribe.Subscription.icon_uri> attribute was introduced. [49]
- Added an optional icon_uri parameter to the SubscriptionSet.subscribe() <libearth.subscribe.SubscriptionSet.subscribe> method. [49]
- Added the ~libearth.parser.util.normalize_xml_encoding() function to work around the xml.etree.ElementTree module's encoding detection bug. [41]
- Added the ~libearth.tz.guess_tzinfo_by_locale() function. [41]
- Added the microseconds option to the ~libearth.codecs.Rfc822 codec.
- Fixed incorrect merging of subscription/category deletion.
  - Subscriptions are now archived rather than deleted.
  - ~libearth.subscribe.Outline (which is a common superclass of ~libearth.subscribe.Subscription and ~libearth.subscribe.Category) now has the ~libearth.subscribe.Outline.deleted_at attribute and the ~libearth.subscribe.Outline.deleted property.
- Fixed several ~libearth.parser.rss2 parser bugs.
  - The parser now accepts several malformed <pubDate> and <lastBuildDate> elements.
  - It now guesses the time zone according to its <language> and the ccTLD (if applicable) when the date time doesn't give any explicit time zone (which is also malformed). [41]
  - It had ignored <category> elements other than the last one; now it accepts as many as there are.
  - It had ignored <comments> links entirely; now these are parsed into ~libearth.feed.Link objects with relation='discussion'.
  - Some RSS 2 feeds put a URI into <generator>, so the parser now treats it as ~libearth.feed.Generator.uri rather than ~libearth.feed.Generator.value in such cases.
  - <enclosure> links had been parsed as ~libearth.feed.Link objects without the ~libearth.feed.Link.relation attribute, but now the attribute is properly set to 'enclosure'.
  - Mixed <link> elements with the Atom namespace are now parsed well too.
- Fixed several ~libearth.parser.atom parser bugs.
  - It now accepts the obsolete PURL Atom namespace.
  - Since some broken Atom feeds (e.g. Naver Blog) provide date time in RFC 822 format, which is incorrect according to RFC 4287 (section 3.3), the parser now accepts RFC 822 format as well.
  - Some broken Atom feeds (e.g. Naver Blog) use <modified>, which is not standard, instead of <updated>, which is standard, so the parser now treats <modified> as equivalent to <updated>.
  - <content> and <summary> can have text/plain and text/html in addition to text and html.
  - <author>/<contributor> is now ignored if it has none of <name>, <uri>, or <email>.
  - Fixed a parser bug that hadn't interpreted omission of the link[rel] <libearth.feed.Link.relation> attribute as 'alternate'.
- Fixed the parser to work well even if there are any file separator characters (FS, '\x1c').
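The link[rel] fix in the Atom parser entries above matches RFC 4287, under which a <link> element without a rel attribute is interpreted as 'alternate'. The following is a minimal sketch of that defaulting rule using only xml.etree.ElementTree; the function name link_relations and the sample document are hypothetical, and this is not libearth's parser code.

```python
import xml.etree.ElementTree as etree

ATOM_NS = 'http://www.w3.org/2005/Atom'

# Sample entry: the first link omits rel, the second sets it explicitly.
XML = '''<entry xmlns="http://www.w3.org/2005/Atom">
  <link href="http://example.com/post"/>
  <link rel="enclosure" href="http://example.com/audio.mp3"/>
</entry>'''


def link_relations(entry):
    """Return (relation, href) pairs; a missing rel means 'alternate'."""
    links = entry.findall('{%s}link' % ATOM_NS)
    return [(link.get('rel', 'alternate'), link.get('href'))
            for link in links]


relations = link_relations(etree.fromstring(XML))
print(relations)
```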
Released on July 12, 2014.
- Fixed a ~libearth.parser.rss2 parsing error when any empty element occurs.
- Fixed a bug that the ~libearth.schema.validate() function errored when any subelement has a ~libearth.schema.Text descriptor.
Released on April 22, 2014.
- Session files in the .sessions/ directory are now touched only once per transaction. [43]
- Added the SubscriptionSet.contains() <libearth.subscribe.SubscriptionSet.contains> method, which provides a recursively=True option. It's useful for determining whether a subcategory or subscription is anywhere in the whole tree.
- The Attribute.default <libearth.schema.Attribute.default> option now accepts only callable objects. Below 0.2.0, ~libearth.schema.Attribute.default was not a function but a value that was simply used as it is.
- The libearth.parser.heuristic module is gone, and the get_format() function in that module was moved to the libearth.parser.autodiscovery module: ~libearth.parser.autodiscovery.get_format().
- Added the Link.html <libearth.feed.Link.html> property.
- Added the LinkList.permalink <libearth.feed.LinkList.permalink> property.
- Fixed a ~libearth.repository.FileSystemRepository bug that caused conflicting read buffers and emitted broken mixed bytes when there were simultaneous reads and writes to the same key.
- Fixed broken functions related to repository URLs on Windows.
- Fixed the libearth.compat.parallel.cpu_count() function not to raise NotImplementedError in some cases.
- Fixed ~libearth.codecs.Rfc822 to work properly on non-English locales as well, e.g. ko_KR.
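The Attribute.default change above moves from a plain value to a callable. The following is a minimal sketch of what a callable-only default can look like in a descriptor. The class CallableDefaultAttribute is a hypothetical stand-in rather than libearth's Attribute, and passing the owning element to the callable is an assumption made for illustration.

```python
class CallableDefaultAttribute(object):
    """Hypothetical descriptor whose default must be a callable.

    The callable is assumed to receive the owning element and to
    return the value used when the attribute was never set.
    """

    def __init__(self, name, default=None):
        if default is not None and not callable(default):
            raise TypeError('default must be callable, not ' + repr(default))
        self.name = name
        self.default = default

    def __get__(self, obj, cls=None):
        if obj is None:
            return self
        try:
            return obj.__dict__[self.name]
        except KeyError:
            # Nothing stored yet: compute the default on demand.
            return None if self.default is None else self.default(obj)

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value


class Link(object):
    relation = CallableDefaultAttribute(
        'rel', default=lambda element: 'alternate'
    )


link = Link()
print(link.relation)
```

One plausible benefit of this design is that the default is computed per element, so it can depend on the element's other state and is never shared between instances.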
Released on January 19, 2014.
- XML elements in data files are written in canonical order. For example, the <title> element of the feed was at the back before, but is now at the front.
- write() <libearth.schema.write> now stores length hints of children that are ~libearth.schema.Child.multiple, and ~libearth.schema.read() is now aware of the hints. When hints are read, len() for the ~libearth.schema.ElementList is O(1).
- Fixed a bug that ~libearth.parser.autodiscovery raises AttributeError when the given HTML contains <link> to both application/atom+xml and application/rss+xml. [40]
- Fill <title> with the <description> if there's no <title> (~libearth.parser.rss2).
- Fill <id> with the feed URL if there's no <id> (~libearth.parser.atom).
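The length-hint entry above can be pictured with a lazily read list that answers len() from a stored hint instead of consuming its source. This is only an illustrative sketch: the class HintedList and its parameters are hypothetical, and it does not reflect how ~libearth.schema.ElementList is actually implemented.

```python
class HintedList(object):
    """Hypothetical lazily-read list that trusts a stored length hint."""

    def __init__(self, items, length_hint=None):
        self._iterator = iter(items)
        self._loaded = []
        self._length_hint = length_hint

    def _load_all(self):
        # Consume whatever is left of the underlying source.
        self._loaded.extend(self._iterator)

    def __len__(self):
        if self._length_hint is not None:
            # O(1): no element has to be read from the source.
            return self._length_hint
        # Without a hint the whole source must be read to count it.
        self._load_all()
        return len(self._loaded)


consumed = []


def entries():
    """Generator that records which items were actually read."""
    for i in range(3):
        consumed.append(i)
        yield 'entry-%d' % i


hinted = HintedList(entries(), length_hint=3)
hinted_length = len(hinted)
consumed_after_hinted = list(consumed)

unhinted = HintedList(entries())
unhinted_length = len(unhinted)
print(hinted_length, consumed_after_hinted, unhinted_length, consumed)
```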
Released on January 2, 2014.
- Added a workaround for the thread unsafety of time.strftime() on CPython. See also http://bugs.python.org/issue7980. [32]
- Fixed UnicodeDecodeError which was raised when a feed title contains any non-ASCII characters. [34 by Jae-Myoung Yu]
- libearth.parser.rss2 now fills Entry.updated_at <libearth.feed.Metadata.updated_at> if it's not given. [35]
- Fixed TypeError which was raised when any ~libearth.schema.DocumentElement with multiple ~libearth.schema.Child elements is passed to the ~libearth.schema.validate() function.
- Fixed the race condition of two FileSystemRepository <libearth.repository.FileSystemRepository> objects creating the same directory. [36 by klutzy]
- ~libearth.compat.parallel.parallel_map() now raises exceptions at the end, if any errored. [38]
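The directory-creation race mentioned above is a common pattern: two creators both see that a directory is missing and both try to make it. The following is a minimal sketch of one usual way to tolerate that. The function name ensure_directory is hypothetical, and this is not necessarily how FileSystemRepository fixed it.

```python
import errno
import os
import tempfile
import threading


def ensure_directory(path):
    """Create path if needed, tolerating a concurrent creator."""
    try:
        os.makedirs(path)
    except OSError as e:
        # Another thread or process may have created it first; only
        # re-raise when the failure is something other than that.
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise


base = tempfile.mkdtemp()
target = os.path.join(base, 'repo', 'subdir')
threads = [threading.Thread(target=ensure_directory, args=(target,))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(os.path.isdir(target))
```

Attempting the creation and handling the "already exists" error avoids the gap between checking for the directory and creating it.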
Released on December 13, 2013. Initial alpha version.