Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #16: basic feed parser #37

Merged
merged 1 commit into from Nov 9, 2019
Merged

Fixes #16: basic feed parser #37

merged 1 commit into from Nov 9, 2019

Conversation

c3ho
Copy link
Contributor

@c3ho c3ho commented Nov 5, 2019

Fixes #16

Hi, this is a extremely basic feed-parser that will just spit out all the posts from the feed provided to console.

@c3ho c3ho mentioned this pull request Nov 5, 2019
@c3ho
Copy link
Contributor Author

c3ho commented Nov 5, 2019

Tested with both types of feed:
Atom 1.0: https://blogname.blogspot.com/feeds/posts/default
RSS 2.0: https://blogname.blogspot.com/feeds/posts/default?alt=rss

I wrote a simple test.js file using the code below which you can use to play around.

const feedParser = require('./feed-parser.js');
const feed = feedParser('https://c3ho.blogspot.com/feeds/posts/default/-/open-source');
console.log(feed);

At the moment in the feed-parser.js file I put down console.log(items) , but it can also tracks metadata which you can check out with console.log(meta). The list is pretty exhaustive and can be found at the bottom of this document here

Copy link
Contributor

@Immutablevoid Immutablevoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


src/parser.js has the same code as src/feed-parser.js.

Like you mention in your comment, the code I'm assuming is suppose to be in src/parser.js is:
const feedParser = require('./feed-parser.js');
const feed = feedParser('https://c3ho.blogspot.com/feeds/posts/default/-/open-source');
console.log(feed);


Also there should be a newline at the end of each JavaScript file, as @humphd mention in class can cause problems when building the project in different environments/systems.


Overall great work 👍! I did some testing and the code works fine on my end, just have to apply fixes and this should be a good starting point when merged. I also wanted to test htmlparser2 since it seems to have pretty good performance/coverage but this is for another issue.

Copy link
Contributor

@humphd humphd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Let's rework this into an async/Promise based function so we can use it in our pipeline. If you need help with this, let use know and we can help guide you on making the changes.

package.json Outdated
"bull": "^3.11.0"
"bull": "^3.11.0",
"feedparser": "^2.2.9",
"lodash.assign": "^4.2.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like any of these lodash.* deps are being used below, let's remove them. Same for mri and readable-stream.

package.json Outdated
"lodash.uniq": "^4.5.0",
"mri": "^1.1.4",
"readable-stream": "^3.4.0",
"request": "^2.88.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The request module isn't maintained anymore, so I've switched to using bent above. You can remove this.

@@ -0,0 +1,36 @@
const FeedParser = require('feedparser');
const request = require('request'); // for fetching the feed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can swap out bent for this, see docs here https://github.com/mikeal/bent. Basically you can use bent(feed) instead of request(feed).

const FeedParser = require('feedparser');
const request = require('request'); // for fetching the feed

module.exports = function(feed){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this function async and have it return a Promise, since we need to do work that will go beyond the lifetime of this call stack.

const req = request(feed);

req.on('error', function (error) {
// handle any request errors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can call reject(error) on our promise here.

const stream = this; // `this` is `feedparser`, which is a stream
const meta = this.meta; // **NOTE** the "meta" is always available in the context of the feedparser instance
let item = stream.read();
console.log(item);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you can resolve(item) and give back the result for our Promise.

package.json Outdated
@@ -13,7 +13,16 @@
},
"homepage": "https://github.com/Seneca-CDOT/telescope#readme",
"dependencies": {
"addressparser": "^1.0.1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also seems unused, so you can remove it as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still outstanding, please remove.

@Immutablevoid
Copy link
Contributor

@c3ho If you need any help, you can delegate some issues to me 👍 .

@Reza-Rajabi Reza-Rajabi added this to Review in progress in Main Nov 6, 2019
@lucacataldo
Copy link
Contributor

Just a note but some issues can be alleviated and code simplified by using a promise-wrapped version of feedparser found at feedparser-promised on npm. Might be worth looking into as it'll do a lot of the work for us here.

@humphd
Copy link
Contributor

humphd commented Nov 7, 2019

@lucacataldo, we could use it, but if you look at the code, it's essentially what I'm suggesting above, and then we don't add another dep. https://github.com/alabeduarte/feedparser-promised/blob/master/src/feedParserPromised.js.

@c3ho let's move this PR forward ASAP, as it's blocking the rest of the work we need to do. Maybe we can discuss in class today.

@birtony birtony changed the title basic feed parser Fixes #16: basic feed parser Nov 7, 2019
@c3ho
Copy link
Contributor Author

c3ho commented Nov 7, 2019

@humphd Since I'm not using request anymore and am using bent now I have to change a bit of the code. Mainly, I'm having an issue piping the data to feedparser(or rather I don't know how). Any tips?

Here's what I have so far

const FeedParser = require('feedparser');
const request = require('bent'); // for fetching the feed

module.exports = async function(feed){
    const feedparser = new FeedParser();

    // The two req functions here don't work, because we don't use the request package anymore... I'm not even sure we need this
    const req = await request(feed);
    req.on('error', (error)=>{
        Promise.reject(error)
    });
    req.on('response', function (res) {
        const stream = this; // `this` is `req`, which is a stream

        if (res.statusCode !== 200) {
            this.emit('error', new Error('Bad status code'));
        }
        else {
            stream.pipe(feedparser);
        }
    });

    // Alternative implementation of the lines 8-21
    const getStream = bent(feed, 'GET', 'string', 200);
    const response = await getStream().catch((error)=>{
        Promise.reject(error);
    })

    // Lost on how to get the stream into parser... I'm actually just kind of confused on how to use the feedparser in general without piping through request.

    feedparser.on('error', (error)=>{
        Promise.reject(error);
    });

    feedparser.on('readable', function () {
        // This is where the action is!
        const stream = this; // `this` is `feedparser`, which is a stream
        const meta = this.meta; // **NOTE** the "meta" is always available in the context of the feedparser instance
        let item = stream.read();
        
        // I think we need to combine both meta and item to be combined and sent back
        Promise.resolve(item);
    });
}

@humphd
Copy link
Contributor

humphd commented Nov 7, 2019

@c3ho this needs to be commits on the PR, not a comment. Please submit this the appropriate way so that we can work through the code together.

RE: bent vs. request, please see the docs, which discuss how to get a stream like request does:

const bent = require('bent')
const getStream = bent('http://site.com')
let stream = await getStream('/json.api')

Regarding the way you're doing the Promise code, this won't work. Your function has to return a single new Promise:

return new Promise((resolve, reject) => {
 
});

The resolve and reject function args are what you use vs. Promise.resolve() and Promise.reject(). If you look at the link I provided above, it shows exactly how to do this https://github.com/alabeduarte/feedparser-promised/blob/master/src/feedParserPromised.js.

Copy link
Contributor

@humphd humphd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's prioritize getting this finished, since it's blocking other things. I'm happy to discuss today on slack or here if you want.

package.json Outdated Show resolved Hide resolved
package.json Outdated
@@ -13,7 +13,16 @@
},
"homepage": "https://github.com/Seneca-CDOT/telescope#readme",
"dependencies": {
"addressparser": "^1.0.1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still outstanding, please remove.

src/feed-parser.js Outdated Show resolved Hide resolved
src/feed-parser.js Outdated Show resolved Hide resolved
src/feed-parser.js Outdated Show resolved Hide resolved
src/feed-parser.js Outdated Show resolved Hide resolved
package.json Outdated Show resolved Hide resolved
src/feed-parser.js Outdated Show resolved Hide resolved
@c3ho c3ho requested a review from humphd November 8, 2019 21:37
@c3ho c3ho force-pushed the issue-16 branch 2 times, most recently from ef1b23a to 0b90289 Compare November 9, 2019 00:14
@c3ho
Copy link
Contributor Author

c3ho commented Nov 9, 2019

@lucacataldo In the end we ended up using feedparser-promised as you suggested. Thank you!

humphd
humphd previously approved these changes Nov 9, 2019
Copy link
Contributor

@humphd humphd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this locally with the following modifications to the current feed-worker.js code:

const feedQueue = require('./feed-queue');
const feedParser = require('./feed-parser');

exports.start = function () {
  // Start processing jobs from the feed queue...
  feedQueue.process(async (job) => {
    const { url } = job.data;
    console.log(`Processing job - ${url}`);

    try {
      const feed = await feedParser(url);
      feed.forEach((item) => {
        // Grab a few things off the feed for display
        const { title, description, date } = item;
        console.log(`${title} (${date})`);
        console.log(`${description}\n\n`);
      });
    } catch (err) {
      console.error(`Error parsing feed=${url}: ${err.message}`);
    }
  });
};

This works nicely. Great work @c3ho, and thanks to @lucacataldo for finding this promise wrapped version of the library, which turned out to be the right move after all. Apologies for delaying us from using it earlier.

NOTE: before this lands, we need the package.json conflict fixed. My approval is pending and assuming that change is coming.

Copy link
Contributor

@humphd humphd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

@humphd humphd merged commit 8e16281 into Seneca-CDOT:master Nov 9, 2019
Main automation moved this from In progress/Review to Done Nov 9, 2019
@c3ho c3ho deleted the issue-16 branch November 19, 2020 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Main
Done
Development

Successfully merging this pull request may close these issues.

Develop a worker to parse contents from the blog's url
5 participants