media:content is added to enclosures array #45

Closed
rdbcci opened this Issue Jan 25, 2013 · 6 comments

rdbcci commented Jan 25, 2013

in the FeedParser.prototype.handleItem switch statement, media:content is treated as an enclosure. this does not seem correct. for instance, if the same url is in both places, it ends up twice in the enclosures array. anyway, since i don't know why you did it, i am confused.

Owner

danmactough commented Jan 25, 2013

The enclosures array, like the other default properties, is intended to give you access to feed elements regardless of the feed format. Some feeds use media:content for enclosures -- keep in mind that not all enclosures need to be audio files, whereas media:content may include audio files.

As to whether or not they belong in enclosures in a particular case, I'm not sure I want to go down the road of applying a filter -- it is very mistake-prone, and there are other reasons not to do it.

Might be able to dedupe, but again I hesitate to do that because it's impossible to know which element among the duplicates is the "preferred" one that should be in the enclosures array and which are not.
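For illustration only, a dedupe that sidesteps the "preferred" question might merge duplicates instead of choosing between them. This is a minimal sketch, not feedparser code: `dedupeEnclosures` is a hypothetical helper, and it assumes each entry is a plain object with `url`, `type`, and `length` properties like the ones in the enclosures array.

```javascript
// Hypothetical helper (not part of feedparser): collapse enclosures that
// share a URL, keeping the first occurrence's position in the array.
function dedupeEnclosures(enclosures) {
  const result = [];
  const byUrl = new Map();

  for (const enc of enclosures) {
    // Entries without a URL cannot be matched, so they are always kept.
    const seen = enc.url ? byUrl.get(enc.url) : undefined;

    if (!seen) {
      const copy = Object.assign({}, enc); // do not mutate the caller's objects
      result.push(copy);
      if (enc.url) byUrl.set(enc.url, copy);
      continue;
    }

    // Duplicate URL: fill in only the fields the first entry is missing
    // (null or undefined), so no information from either element is lost.
    for (const key of Object.keys(enc)) {
      if (seen[key] == null && enc[key] != null) seen[key] = enc[key];
    }
  }

  return result;
}
```

Merging means neither element has to be declared the winner, but it is still an assumption that two elements with the same URL describe the same resource.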

Open to suggestions and pull requests.

rdbcci commented Jan 26, 2013

my thought is that if the provider provides dupes in the same element, they should be recognized. otherwise feedparser, as the aggregator, should not duplicate them across separate elements, e.g. media:content and enclosure. i say this because providers may duplicate media:content and enclosure; for instance, view the source of http://www.dailymail.co.uk/ushome/index.rss.

Owner

danmactough commented Jan 26, 2013

Sorry, I don't understand. What do you expect to see when you parse that feed?

rdbcci commented Jan 26, 2013

unless i am mistaken, the article's enclosures array contains 2 entries. both entries are the same. one comes from:

<enclosure url="http://i.mol.im/i/pix/2013/01/26/article-2268707-172CC8D7000005DC-388_154x115.jpg" length="6026" type="image/jpeg" />

and the other comes from:

<media:content type="image/jpeg" url="http://i.mol.im/i/pix/2013/01/26/article-2268707-172CC8D7000005DC-388_154x115.jpg" />

the above comes from a particular item from http://www.dailymail.co.uk/ushome/index.rss
and feedparser parsed to:

enclosures:
  [ { url: 'http://i.mol.im/i/pix/2013/01/26/article-2268707-172CC8D7000005DC-388_154x115.jpg',
      type: 'image/jpeg',
      length: null },
    { url: 'http://i.mol.im/i/pix/2013/01/26/article-2268707-172CC8D7000005DC-388_154x115.jpg',
      type: 'image/jpeg',
      length: null } ],

to reproduce: run http://www.dailymail.co.uk/ushome/index.rss through feedparser and look at the enclosures array, then look at the enclosure and media:content tags for a particular item in the feed source.

Owner

danmactough commented Jan 28, 2013

Yeah, that makes no sense. Thanks for clarifying.

rdbcci commented Feb 2, 2013

let's make this feature (aggregating media:content into enclosures) optional. i vote that the default is not to aggregate. but since it already works the other way, i can see why you might not like that default.
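a rough sketch of what such an option could look like. everything here is hypothetical: the option name `aggregateMediaContent`, the `collectEnclosures` function, and the input shape (an object holding the already-parsed `enclosure` and `mediaContent` entries) are assumptions for illustration, not feedparser's API. the default is set to `true` only to keep the current behavior; flipping it would give the default voted for above.

```javascript
// Hypothetical sketch: option name and input shape are assumptions,
// not taken from feedparser.
function collectEnclosures(parsed, options) {
  // Default keeps today's behavior (media:content is aggregated).
  const opts = Object.assign({ aggregateMediaContent: true }, options);

  // Entries from <enclosure> elements are always included.
  const enclosures = (parsed.enclosure || []).slice();

  // Entries from <media:content> are included only when the option is on.
  if (opts.aggregateMediaContent) {
    enclosures.push(...(parsed.mediaContent || []));
  }

  return enclosures;
}

// Usage: opt out of aggregation for feeds that repeat the same URL
// in both elements.
const item = {
  enclosure: [{ url: 'http://example.com/a.jpg', type: 'image/jpeg', length: '6026' }],
  mediaContent: [{ url: 'http://example.com/a.jpg', type: 'image/jpeg', length: null }]
};
const onlyEnclosures = collectEnclosures(item, { aggregateMediaContent: false });
```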
