Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #16: basic feed parser #37

Merged
merged 1 commit into from Nov 9, 2019
Merged

Fixes #16: basic feed parser #37

merged 1 commit into from Nov 9, 2019

Conversation

@c3ho
Copy link
Collaborator

c3ho commented Nov 5, 2019

Fixes #16

Hi, this is a extremely basic feed-parser that will just spit out all the posts from the feed provided to console.

@c3ho c3ho mentioned this pull request Nov 5, 2019
@c3ho

This comment has been minimized.

Copy link
Collaborator Author

c3ho commented Nov 5, 2019

Tested with both types of feed:
Atom 1.0: https://blogname.blogspot.com/feeds/posts/default
RSS 2.0: https://blogname.blogspot.com/feeds/posts/default?alt=rss

I wrote a simple test.js file using the code below which you can use to play around.

const feedParser = require('./feed-parser.js');
const feed = feedParser('https://c3ho.blogspot.com/feeds/posts/default/-/open-source');
console.log(feed);

At the moment in the feed-parser.js file I put down console.log(items) , but it can also tracks metadata which you can check out with console.log(meta). The list is pretty exhaustive and can be found at the bottom of this document here

Copy link
Collaborator

ImmutableBox left a comment


src/parser.js has the same code as src/feed-parser.js.

Like you mention in your comment, the code I'm assuming is suppose to be in src/parser.js is:
const feedParser = require('./feed-parser.js');
const feed = feedParser('https://c3ho.blogspot.com/feeds/posts/default/-/open-source');
console.log(feed);


Also there should be a newline at the end of each JavaScript file, as @humphd mention in class can cause problems when building the project in different environments/systems.


Overall great work 👍! I did some testing and the code works fine on my end, just have to apply fixes and this should be a good starting point when merged. I also wanted to test htmlparser2 since it seems to have pretty good performance/coverage but this is for another issue.

Copy link
Contributor

humphd left a comment

Looking good. Let's rework this into an async/Promise based function so we can use it in our pipeline. If you need help with this, let use know and we can help guide you on making the changes.

"bull": "^3.11.0"
"bull": "^3.11.0",
"feedparser": "^2.2.9",
"lodash.assign": "^4.2.0",

This comment has been minimized.

Copy link
@humphd

humphd Nov 5, 2019

Contributor

It doesn't look like any of these lodash.* deps are being used below, let's remove them. Same for mri and readable-stream.

"lodash.uniq": "^4.5.0",
"mri": "^1.1.4",
"readable-stream": "^3.4.0",
"request": "^2.88.0"

This comment has been minimized.

Copy link
@humphd

humphd Nov 5, 2019

Contributor

The request module isn't maintained anymore, so I've switched to using bent above. You can remove this.

@@ -0,0 +1,36 @@
const FeedParser = require('feedparser');
const request = require('request'); // for fetching the feed

This comment has been minimized.

Copy link
@humphd

humphd Nov 5, 2019

Contributor

You can swap out bent for this, see docs here https://github.com/mikeal/bent. Basically you can use bent(feed) instead of request(feed).

const FeedParser = require('feedparser');
const request = require('request'); // for fetching the feed

module.exports = function(feed){

This comment has been minimized.

Copy link
@humphd

humphd Nov 5, 2019

Contributor

Let's make this function async and have it return a Promise, since we need to do work that will go beyond the lifetime of this call stack.

const req = request(feed);

req.on('error', function (error) {
// handle any request errors

This comment has been minimized.

Copy link
@humphd

humphd Nov 5, 2019

Contributor

We can call reject(error) on our promise here.

const stream = this; // `this` is `feedparser`, which is a stream
const meta = this.meta; // **NOTE** the "meta" is always available in the context of the feedparser instance
let item = stream.read();
console.log(item);

This comment has been minimized.

Copy link
@humphd

humphd Nov 5, 2019

Contributor

Here you can resolve(item) and give back the result for our Promise.

@@ -13,7 +13,16 @@
},
"homepage": "https://github.com/Seneca-CDOT/telescope#readme",
"dependencies": {
"addressparser": "^1.0.1",

This comment has been minimized.

Copy link
@humphd

humphd Nov 5, 2019

Contributor

This also seems unused, so you can remove it as well.

This comment has been minimized.

Copy link
@humphd

humphd Nov 8, 2019

Contributor

This is still outstanding, please remove.

@ImmutableBox

This comment has been minimized.

Copy link
Collaborator

ImmutableBox commented Nov 5, 2019

@c3ho If you need any help, you can delegate some issues to me 👍 .

@Reza-Rajabi Reza-Rajabi added this to Review in progress in Main Nov 6, 2019
@lucacataldo

This comment has been minimized.

Copy link
Collaborator

lucacataldo commented Nov 6, 2019

Just a note but some issues can be alleviated and code simplified by using a promise-wrapped version of feedparser found at feedparser-promised on npm. Might be worth looking into as it'll do a lot of the work for us here.

@humphd

This comment has been minimized.

Copy link
Contributor

humphd commented Nov 7, 2019

@lucacataldo, we could use it, but if you look at the code, it's essentially what I'm suggesting above, and then we don't add another dep. https://github.com/alabeduarte/feedparser-promised/blob/master/src/feedParserPromised.js.

@c3ho let's move this PR forward ASAP, as it's blocking the rest of the work we need to do. Maybe we can discuss in class today.

@birtony birtony changed the title basic feed parser Fixes #16: basic feed parser Nov 7, 2019
@c3ho

This comment has been minimized.

Copy link
Collaborator Author

c3ho commented Nov 7, 2019

@humphd Since I'm not using request anymore and am using bent now I have to change a bit of the code. Mainly, I'm having an issue piping the data to feedparser(or rather I don't know how). Any tips?

Here's what I have so far

const FeedParser = require('feedparser');
const request = require('bent'); // for fetching the feed

module.exports = async function(feed){
    const feedparser = new FeedParser();

    // The two req functions here don't work, because we don't use the request package anymore... I'm not even sure we need this
    const req = await request(feed);
    req.on('error', (error)=>{
        Promise.reject(error)
    });
    req.on('response', function (res) {
        const stream = this; // `this` is `req`, which is a stream

        if (res.statusCode !== 200) {
            this.emit('error', new Error('Bad status code'));
        }
        else {
            stream.pipe(feedparser);
        }
    });

    // Alternative implementation of the lines 8-21
    const getStream = bent(feed, 'GET', 'string', 200);
    const response = await getStream().catch((error)=>{
        Promise.reject(error);
    })

    // Lost on how to get the stream into parser... I'm actually just kind of confused on how to use the feedparser in general without piping through request.

    feedparser.on('error', (error)=>{
        Promise.reject(error);
    });

    feedparser.on('readable', function () {
        // This is where the action is!
        const stream = this; // `this` is `feedparser`, which is a stream
        const meta = this.meta; // **NOTE** the "meta" is always available in the context of the feedparser instance
        let item = stream.read();
        
        // I think we need to combine both meta and item to be combined and sent back
        Promise.resolve(item);
    });
}
@humphd

This comment has been minimized.

Copy link
Contributor

humphd commented Nov 7, 2019

@c3ho this needs to be commits on the PR, not a comment. Please submit this the appropriate way so that we can work through the code together.

RE: bent vs. request, please see the docs, which discuss how to get a stream like request does:

const bent = require('bent')
const getStream = bent('http://site.com')
let stream = await getStream('/json.api')

Regarding the way you're doing the Promise code, this won't work. Your function has to return a single new Promise:

return new Promise((resolve, reject) => {
 
});

The resolve and reject function args are what you use vs. Promise.resolve() and Promise.reject(). If you look at the link I provided above, it shows exactly how to do this https://github.com/alabeduarte/feedparser-promised/blob/master/src/feedParserPromised.js.

Copy link
Contributor

humphd left a comment

Let's prioritize getting this finished, since it's blocking other things. I'm happy to discuss today on slack or here if you want.

package.json Outdated Show resolved Hide resolved
@@ -13,7 +13,16 @@
},
"homepage": "https://github.com/Seneca-CDOT/telescope#readme",
"dependencies": {
"addressparser": "^1.0.1",

This comment has been minimized.

Copy link
@humphd

humphd Nov 8, 2019

Contributor

This is still outstanding, please remove.

src/feed-parser.js Outdated Show resolved Hide resolved
src/feed-parser.js Outdated Show resolved Hide resolved
src/feed-parser.js Outdated Show resolved Hide resolved
src/feed-parser.js Outdated Show resolved Hide resolved
package.json Outdated Show resolved Hide resolved
src/feed-parser.js Outdated Show resolved Hide resolved
@c3ho c3ho requested a review from humphd Nov 8, 2019
@c3ho c3ho force-pushed the c3ho:issue-16 branch 2 times, most recently from ef1b23a to 0b90289 Nov 8, 2019
@c3ho

This comment has been minimized.

Copy link
Collaborator Author

c3ho commented Nov 9, 2019

@lucacataldo In the end we ended up using feedparser-promised as you suggested. Thank you!

Copy link
Contributor

humphd left a comment

I tested this locally with the following modifications to the current feed-worker.js code:

const feedQueue = require('./feed-queue');
const feedParser = require('./feed-parser');

exports.start = function () {
  // Start processing jobs from the feed queue...
  feedQueue.process(async (job) => {
    const { url } = job.data;
    console.log(`Processing job - ${url}`);

    try {
      const feed = await feedParser(url);
      feed.forEach((item) => {
        // Grab a few things off the feed for display
        const { title, description, date } = item;
        console.log(`${title} (${date})`);
        console.log(`${description}\n\n`);
      });
    } catch (err) {
      console.error(`Error parsing feed=${url}: ${err.message}`);
    }
  });
};

This works nicely. Great work @c3ho, and thanks to @lucacataldo for finding this promise wrapped version of the library, which turned out to be the right move after all. Apologies for delaying us from using it earlier.

NOTE: before this lands, we need the package.json conflict fixed. My approval is pending and assuming that change is coming.

@c3ho c3ho force-pushed the c3ho:issue-16 branch from 0b90289 to f468a4c Nov 9, 2019
@c3ho c3ho force-pushed the c3ho:issue-16 branch from f468a4c to a43464c Nov 9, 2019
@humphd
humphd approved these changes Nov 9, 2019
Copy link
Contributor

humphd left a comment

Nice.

@humphd humphd merged commit 8e16281 into Seneca-CDOT:master Nov 9, 2019
Main automation moved this from In progress/Review to Done Nov 9, 2019
@Reza-Rajabi Reza-Rajabi added the on-board label Nov 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Main
Done
5 participants
You can’t perform that action at this time.