Fixes #16: basic feed parser #37
Tested with both types of feed: I wrote a simple test.js file using the code below, which you can use to play around.
At the moment feed-parser.js calls console.log(items), but it can also track metadata, which you can check out with console.log(meta). The list is pretty exhaustive and can be found at the bottom of this document here
src/parser.js has the same code as src/feed-parser.js.
As you mention in your comment, I'm assuming the code that is supposed to be in src/parser.js is:
const feedParser = require('./feed-parser.js');
const feed = feedParser('https://c3ho.blogspot.com/feeds/posts/default/-/open-source');
console.log(feed);
Also, there should be a newline at the end of each JavaScript file; as @humphd mentioned in class, a missing one can cause problems when building the project in different environments/systems.
Overall great work 👍! I did some testing and the code works fine on my end; once the fixes above are applied, this should be a good starting point when merged. I also wanted to test htmlparser2, since it seems to have pretty good performance/coverage, but that is for another issue.
Looking good. Let's rework this into an async/Promise-based function so we can use it in our pipeline. If you need help with this, let us know and we can help guide you on making the changes.
package.json
Outdated
"bull": "^3.11.0"
"bull": "^3.11.0",
"feedparser": "^2.2.9",
"lodash.assign": "^4.2.0",
It doesn't look like any of these lodash.* deps are being used below, let's remove them. Same for mri and readable-stream.
package.json
Outdated
"lodash.uniq": "^4.5.0",
"mri": "^1.1.4",
"readable-stream": "^3.4.0",
"request": "^2.88.0"
The request module isn't maintained anymore, so I've switched to using bent above. You can remove this.
src/feed-parser.js
Outdated
@@ -0,0 +1,36 @@
const FeedParser = require('feedparser');
const request = require('request'); // for fetching the feed
You can swap in bent for this; see the docs at https://github.com/mikeal/bent. Basically, you can use bent(feed) instead of request(feed).
src/feed-parser.js
Outdated
const FeedParser = require('feedparser');
const request = require('request'); // for fetching the feed

module.exports = function(feed){
Let's make this function async and have it return a Promise, since we need to do work that will go beyond the lifetime of this call stack.
src/feed-parser.js
Outdated
const req = request(feed);

req.on('error', function (error) {
  // handle any request errors
We can call reject(error) on our promise here.
src/feed-parser.js
Outdated
const stream = this; // `this` is `feedparser`, which is a stream
const meta = this.meta; // **NOTE** the "meta" is always available in the context of the feedparser instance
let item = stream.read();
console.log(item);
Here you can resolve(item) and give back the result for our Promise.
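The review comments on src/feed-parser.js (return a Promise, reject on errors, resolve with the parsed result) could be combined roughly as in the sketch below. This is not the PR's code: collectItems is a hypothetical helper, a plain Node Readable stands in for the feedparser stream, and resolving with every item on 'end' (rather than with a single item) is an assumption about what the pipeline needs.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: wraps any object-mode readable stream
// (a feedparser instance is one) in a Promise.
function collectItems(stream) {
  return new Promise((resolve, reject) => {
    const items = [];

    // Any stream error rejects the Promise.
    stream.on('error', reject);

    // Drain everything currently buffered each time the stream is readable.
    stream.on('readable', () => {
      let item;
      while ((item = stream.read()) !== null) {
        items.push(item);
      }
    });

    // Resolve once the stream has no more items to give.
    stream.on('end', () => resolve(items));
  });
}

// Usage with a stand-in stream instead of a real feed.
const fakeFeed = Readable.from([{ title: 'First post' }, { title: 'Second post' }]);
collectItems(fakeFeed).then((items) => {
  items.forEach((item) => console.log(item.title));
});
```

Resolving inside the 'readable' handler would settle the Promise after the first item, which is why this sketch waits for 'end' instead.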
package.json
Outdated
@@ -13,7 +13,16 @@
},
"homepage": "https://github.com/Seneca-CDOT/telescope#readme",
"dependencies": {
"addressparser": "^1.0.1",
This also seems unused, so you can remove it as well.
This is still outstanding, please remove.
@c3ho If you need any help, you can delegate some issues to me 👍.
Just a note, but some issues can be alleviated and the code simplified by using a promise-wrapped version of feedparser, found as feedparser-promised on npm. Might be worth looking into, as it'll do a lot of the work for us here.
@lucacataldo, we could use it, but if you look at the code (https://github.com/alabeduarte/feedparser-promised/blob/master/src/feedParserPromised.js), it's essentially what I'm suggesting above, and this way we don't add another dep. @c3ho let's move this PR forward ASAP, as it's blocking the rest of the work we need to do. Maybe we can discuss in class today.
@humphd Since I'm not using request anymore and am using bent now, I have to change a bit of the code. Mainly, I'm having an issue piping the data to feedparser (or rather, I don't know how). Any tips? Here's what I have so far
@c3ho this needs to be commits on the PR, not a comment. Please submit this the appropriate way so that we can work through the code together. RE: bent vs. request, please see the docs, which discuss how to get a stream like request does:
const bent = require('bent')
const getStream = bent('http://site.com')
let stream = await getStream('/json.api')
Regarding the way you're doing the
return new Promise((resolve, reject) => {
});
The
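On the piping question, a rough sketch only: a response stream can be piped into a parser stream, with the result wrapped in a Promise. Every name here is a stand-in rather than the PR's code: Readable.from plays the role of the stream bent returns, and makeLineParser is a hypothetical substitute for feedparser that treats each line of text as one item.

```javascript
const { Readable, Transform } = require('stream');

// Hypothetical stand-in for feedparser: bytes in, item objects out.
function makeLineParser() {
  let buffered = '';
  return new Transform({
    readableObjectMode: true,
    transform(chunk, _encoding, callback) {
      buffered += chunk.toString();
      const lines = buffered.split('\n');
      buffered = lines.pop(); // keep the trailing partial line for the next chunk
      lines.filter(Boolean).forEach((line) => this.push({ title: line }));
      callback();
    },
    flush(callback) {
      // Emit whatever is left when the source ends without a final newline.
      if (buffered) this.push({ title: buffered });
      callback();
    },
  });
}

// Pipes `source` (standing in for a bent response stream) into the parser.
function parseStream(source) {
  return new Promise((resolve, reject) => {
    const parser = makeLineParser();
    const items = [];

    // pipe() does not forward errors, so each stream gets its own handler.
    source.on('error', reject);
    parser.on('error', reject);

    parser.on('data', (item) => items.push(item));
    parser.on('end', () => resolve(items));

    source.pipe(parser);
  });
}

// Usage: chunks deliberately split mid-line, as network data often is.
parseStream(Readable.from(['Post one\nPo', 'st two\n'])).then((items) => {
  console.log(items);
});
```

With the real libraries, the source would be the stream obtained from bent and the parser a feedparser instance; the error handling on both sides is the part worth carrying over.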
Let's prioritize getting this finished, since it's blocking other things. I'm happy to discuss today on slack or here if you want.
ef1b23a to 0b90289
@lucacataldo In the end we used feedparser-promised, as you suggested. Thank you!
I tested this locally with the following modifications to the current feed-worker.js code:
const feedQueue = require('./feed-queue');
const feedParser = require('./feed-parser');

exports.start = function () {
  // Start processing jobs from the feed queue...
  feedQueue.process(async (job) => {
    const { url } = job.data;
    console.log(`Processing job - ${url}`);
    try {
      const feed = await feedParser(url);
      feed.forEach((item) => {
        // Grab a few things off the feed for display
        const { title, description, date } = item;
        console.log(`${title} (${date})`);
        console.log(`${description}\n\n`);
      });
    } catch (err) {
      console.error(`Error parsing feed=${url}: ${err.message}`);
    }
  });
};
This works nicely. Great work @c3ho, and thanks to @lucacataldo for finding this promise-wrapped version of the library, which turned out to be the right move after all. Apologies for delaying us from using it earlier.
NOTE: before this lands, we need the package.json conflict fixed. My approval assumes that change is coming.
Nice.
Fixes #16
Hi, this is an extremely basic feed parser that just prints all the posts from the provided feed to the console.