This repository has been archived by the owner on Apr 19, 2021. It is now read-only.

Duplicate items in feed #4

Closed
dsl101 opened this issue Jul 27, 2016 · 9 comments

Comments

@dsl101

dsl101 commented Jul 27, 2016

I've been having real trouble pinning down the source of a problem with a feed aggregator I'm building, and I've boiled it down to such a simple example that I can't see what I'm doing wrong.

I've captured a real feed and stored the XML locally so that I can be sure I'm getting the same stuff every time, but FeedParserPromised is returning a variable number of items, and almost always more than the feed contains, resulting in duplicates being emitted by my code. Can you tell me where I'm going wrong - or is it a problem with FPP? Note the code below uses the original source URL, but I get the same behaviour if I capture that and serve it locally as a static XML file. The feed itself has 10 items currently, but the code below outputs between 14 and 19 items:

var request = require('request');
var FeedParser = require('feedparser-promised');

var req = request({
    uri: 'http://www.europlanet-eu.org/feed/',
    timeout: 3000
});

FeedParser.parse(req).then(function (items) {   
    console.log(items.length);
});

Note I did recode this to use the underlying feedparser library, and always got 10 items returned, so I'm assuming it's something to do with the promisification... I'm running this on Debian 7 with node 1.8.1 if that makes any difference.

@dsl101
Author

dsl101 commented Jul 27, 2016

Just to check, I upgraded node to v6 and I still see the issue. Example variable output below:

[root@git: /node/classifier] $ nodejs --version
v6.3.1
[root@git: /node/classifier] $ nodejs duplicates.js
19
[root@git: /node/classifier] $ nodejs duplicates.js
18

@alabeduarte
Owner

alabeduarte commented Jul 27, 2016

Hi @dsl101

Thank you for bringing up this issue. I was able to reproduce this weird behaviour as well.

I will take a look at this and should get back to you shortly.

Best!

@alabeduarte
Owner

alabeduarte commented Jul 27, 2016

Hey @dsl101

At the moment, feedparser-promised can't handle a request object directly. Since there is no type checking, JavaScript will fire the request twice, adding duplicated responses during parsing.

Please try the code below and see if it works for now:

var FeedParser = require('feedparser-promised');

FeedParser.parse('http://www.europlanet-eu.org/feed/').then(function (items) {
  console.log(items.length);
});

In the meantime, I'll add support for the request object (or an object that responds to uri and timeout). Sounds good?

Best
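
The missing type check mentioned above could be sketched roughly as follows. This is only an illustration and not the code in feedparser-promised: `normalizeFeedInput` is a hypothetical helper, and detecting a live request by the presence of a `pipe` method is an assumption about how a stream-like object can be recognised.

```javascript
// Hypothetical helper, NOT part of feedparser-promised.
// Accepts a URI string or a plain options object and rejects anything
// that looks like an already-started request (a stream with .pipe()).
function normalizeFeedInput(input) {
  if (typeof input === 'string') {
    return { uri: input };
  }
  if (input !== null && typeof input === 'object') {
    // Assumption: a live request object is stream-like and exposes pipe().
    if (typeof input.pipe === 'function') {
      throw new TypeError('Pass a URI string or an options object, not a live request');
    }
    if (typeof input.uri === 'string') {
      // Copy so the caller's object is never mutated.
      return Object.assign({}, input);
    }
  }
  throw new TypeError('Expected a URI string or an object with a "uri" property');
}

console.log(normalizeFeedInput('http://www.europlanet-eu.org/feed/'));
console.log(normalizeFeedInput({ uri: 'http://www.europlanet-eu.org/feed/', timeout: 3000 }));
```

With a guard like this, a caller who passes the result of `request(...)` gets an immediate error instead of silently duplicated items.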

@dsl101
Author

dsl101 commented Jul 27, 2016

Ah - that's good news. Yes, I need to be able to specify the timeout which is why I switched to a request object. I thought I'd seen that in the docs somewhere, but maybe I just imagined it :). I will test without the timeout on Friday.

@alabeduarte
Owner

@dsl101

Since you need to specify the timeout, I was thinking of an API like this:

var FeedParser = require('feedparser-promised');

var options = { timeout: 3000 };

FeedParser.parse('http://www.europlanet-eu.org/feed/', options).then(function (items) {
  console.log(items.length);
});

What do you think? I would like to avoid an explicit dependency on the request package, but I would like to know your thoughts on that.

Best

@dsl101
Author

dsl101 commented Jul 28, 2016

My only reason for putting it in the request object was the admittedly edge case of wanting different options for different URLs. It seemed to make sense to keep the URL and its options together in the same object. But I'd much rather have what you proposed than the duplicates :)


@alabeduarte
Owner

alabeduarte commented Jul 28, 2016

@dsl101

I just added some extra unit tests, but it's also possible to pass request options directly, like:

var FeedParser = require('./lib/feedParserPromised');

const tryout = (options) => {
  FeedParser.parse(options).then( (items) => {
    console.log(items.length);

    const allTitles = items.map( (item) => { return item.title; });
    console.log(allTitles);
  }).catch( (err) => {
    console.log('feedParserPromised error caught: ', err);
  });
};

const uri = 'http://www.europlanet-eu.org/feed/';

tryout({ uri: uri, timeout: 1 });
/*
expected output:
feedParserPromised error caught:  { [Error: ETIMEDOUT] code: 'ETIMEDOUT', connect: true }
*/

tryout({ uri: uri, timeout: 3000 });
/*
expected output:
10
[ 'ESOF 2016 – What do you think a comet smells like?',
  'Juno public event: Tuesday, 5 July 2016, Athens',
  'Liquid water in Ceres’s past',
  'Counting Down to Jupiter',
  'Mission Juno : University of Liège goes into orbit at Jupiter',
  'European involvement in the Juno mission',
  'Scientists come to Schloss Seggau to discuss Rosetta’s comet',
  'Coming soon: Jupiter and its Icy Moons',
  'DPS-EPSC Joint Meeting, 16-21 October 2016',
  'Jupiter blasted by 6.5 fireball impacts per year on average' ]
*/


tryout({ uri: uri });
/*
expected output:
10
[ 'ESOF 2016 – What do you think a comet smells like?',
  'Juno public event: Tuesday, 5 July 2016, Athens',
  'Liquid water in Ceres’s past',
  'Counting Down to Jupiter',
  'Mission Juno : University of Liège goes into orbit at Jupiter',
  'European involvement in the Juno mission',
  'Scientists come to Schloss Seggau to discuss Rosetta’s comet',
  'Coming soon: Jupiter and its Icy Moons',
  'DPS-EPSC Joint Meeting, 16-21 October 2016',
  'Jupiter blasted by 6.5 fireball impacts per year on average' ]
*/

tryout(uri);
/*
expected output:
10
[ 'ESOF 2016 – What do you think a comet smells like?',
  'Juno public event: Tuesday, 5 July 2016, Athens',
  'Liquid water in Ceres’s past',
  'Counting Down to Jupiter',
  'Mission Juno : University of Liège goes into orbit at Jupiter',
  'European involvement in the Juno mission',
  'Scientists come to Schloss Seggau to discuss Rosetta’s comet',
  'Coming soon: Jupiter and its Icy Moons',
  'DPS-EPSC Joint Meeting, 16-21 October 2016',
  'Jupiter blasted by 6.5 fireball impacts per year on average' ]
*/

Commit changes: 553be3a

@dsl101
Author

dsl101 commented Jul 29, 2016

So, just because I'm curious to understand... It looks like I should have just passed the object directly to feedparser-promised all along, rather than creating the request object myself - is that right? Even though the parameter was named uri, it would have worked? Or did I miss some other changes you made?

@alabeduarte
Owner

Yeah, you're right. You should have just passed the object directly (without calling the request function). The only change I made was to the tests. I'll update the README as well with that.
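
For the case raised earlier in the thread of wanting different options for different URLs, a minimal sketch might look like the following. `buildFeedOptions` is a hypothetical helper, the second feed URL is a placeholder, and the commented-out `parse` call assumes only what the thread shows: that `parse` accepts a plain object with `uri` and `timeout`.

```javascript
// Hypothetical helper: merges shared defaults into each per-feed options
// object, so every URL travels together with its own settings.
function buildFeedOptions(feeds, defaults) {
  return feeds.map(function (feed) {
    // Per-feed values override the defaults; inputs are not mutated.
    return Object.assign({}, defaults, feed);
  });
}

var feedOptions = buildFeedOptions(
  [
    { uri: 'http://www.europlanet-eu.org/feed/' },
    { uri: 'http://example.com/slow-feed/', timeout: 10000 } // placeholder URL
  ],
  { timeout: 3000 }
);

console.log(feedOptions);

// With feedparser-promised installed, each plain object could then be
// passed directly to parse(), without calling request() first:
//
// var FeedParser = require('feedparser-promised');
// Promise.all(feedOptions.map(function (options) {
//   return FeedParser.parse(options);
// })).then(function (results) {
//   results.forEach(function (items) { console.log(items.length); });
// });
```

Keeping everything as plain objects means no request is started until `parse` itself runs, which avoids the double-request problem described above.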
