
Figure out why 304 got marked as invalid #2604

Closed
tpmai22 opened this issue Dec 13, 2021 · 9 comments
Assignees
Labels
area: back-end type: bug Something isn't working
Milestone

Comments

@tpmai22
Contributor

tpmai22 commented Dec 13, 2021

What happened:
Refer to #2569

The 304 shows up as invalid even though it states that the content has not been modified, in which case we set shouldDownload to false:

info.shouldDownload = false;

After my debugging attempt, it seems like the 304 gets into the parser:
https://github.com/rbren/rss-parser/blob/33a9a4281d9c7aae051de148067223dcbdec5ad5/lib/parser.js#L87-L89

With my current understanding of the project, I will need more time to research and try different debugging techniques to figure it out.
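The intended behaviour described above can be sketched as follows. This is a minimal illustration, not Telescope's actual getFeedInfo code, and shouldDownloadFeed is a made-up helper name:

```javascript
// Sketch (assumption, not Telescope's actual getFeedInfo): decide whether to
// download a feed based on the HTTP status of a conditional request.
function shouldDownloadFeed(status) {
  // 2xx: the server has new content for us, so download it.
  if (status >= 200 && status < 300) {
    return true;
  }
  // 304 (and error statuses like 404/410/429/5xx): nothing to download.
  // Crucially, 304 is not an error; it just means the cached copy is fresh,
  // so the feed must not be marked invalid.
  return false;
}

console.log(shouldDownloadFeed(200)); // true
console.log(shouldDownloadFeed(304)); // false
```

The key point of the bug report is the second case: skipping the download on 304 is correct, but somewhere later the feed is still being treated as invalid.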

What should have happened:
A 304 status code should not mark the feed as invalid.

@tpmai22 tpmai22 added the type: bug Something isn't working label Dec 13, 2021
@tpmai22 tpmai22 self-assigned this Dec 13, 2021
@humphd
Contributor

humphd commented Dec 13, 2021

@tpmai22 I did a whole debugging session for you on Slack to look at why this is; can you include that info here for follow-up?

@tpmai22
Contributor Author

tpmai22 commented Dec 13, 2021

Debugging session and advice:

Run it locally or in Gitpod, change the feed list URL to point to a page with only that one feed, and set a breakpoint to watch it happen; the 304 should be caught earlier and allowed.

Set a breakpoint at https://github.com/Seneca-CDOT/telescope/blob/master/src/backend/feed/processor.js#L143 to see why the 304 doesn't stop the invalid feed from going to the parser.

We might need to add redirect: 'follow' to the fetch() at

response = await fetch(feed.url, addHeaders({ method: 'HEAD' }, feed));
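The suggested change can be sketched like this. buildHeadOptions and the feed.etag / feed.lastModified field names are assumptions for illustration, not Telescope's actual addHeaders helper:

```javascript
// Hypothetical sketch of the fetch options being discussed; not Telescope's
// real addHeaders helper, and the feed field names are assumed.
function buildHeadOptions(feed) {
  const headers = {};
  // Conditional headers are what allow the server to answer 304 Not Modified.
  if (feed.etag) headers['If-None-Match'] = feed.etag;
  if (feed.lastModified) headers['If-Modified-Since'] = feed.lastModified;
  return {
    method: 'HEAD',
    // Follow 3xx redirects so a redirect in front of the feed doesn't
    // mask the eventual 200/304 from the real feed URL.
    redirect: 'follow',
    headers,
  };
}

const opts = buildHeadOptions({ etag: '"abc123"' });
console.log(opts.redirect); // 'follow'
console.log(opts.headers['If-None-Match']); // '"abc123"'
```

Without redirect: 'follow', some fetch implementations surface the 3xx itself as the final response, which could explain a status the processor doesn't expect.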

@TDDR TDDR self-assigned this Jan 18, 2022
@humphd humphd added this to the 2.6 Release milestone Jan 30, 2022
@humphd
Contributor

humphd commented Jan 30, 2022

Bumping this, as it's causing blogs from Tim, Anatoliy and others to not show on Telescope. We need to prioritize getting this solved ASAP and ship a fix in 2.6.

@TDDR
Contributor

TDDR commented Jan 30, 2022

I have spent some time on this with little progress. Here is a thread in Slack for some more context.

I'm not the best at using the Visual Studio Code debugger, so I would be interested in maybe doing a LiveShare session, or just an old-fashioned screen-share session, if someone could help figure it out.

@humphd
Contributor

humphd commented Jan 30, 2022

> I have spent some time on this with little progress. Here is a thread in slack for some more context.
>
> I'm not the best at using the Visual Studio Code debugger so I would be interested in maybe doing a LiveShare session, or just a old fashion screen share session if someone could help figure it out.

cc'ing back-end folks @TueeNguyen, @nguyenhung15913, @tcvan0707, @AmasiaNalbandian, @joelazwar, @dbelokon, @JerryHue + sheriffs @sirinoks, @aserputov.

Here's how I'd approach this debugging.

We have 2 blogs that we know are not going to Telescope:

[https://dps909tddr.tech.blog/category/opensource/feed/]
name=Tim Roberts

[https://medium.com/feed/@aserputov]
name=Anatoliy Serputov
  1. I'd create a hacked-up local version of https://wiki.cdot.senecacollege.ca/wiki/Planet_CDOT_Feed_List#Feeds that only includes those two feeds, and change your FEED_URL env variable to point to this page. OR, you could just hack

     /*
      * Returns a Promise { <pending> }
      * It downloads the feed data from the Wiki, then processes it into an
      * Array of feed Objects of the following form:
      *
      * {
      *   name: "name of user",
      *   url: "feed url of user"
      * }
      */
     module.exports = async function () {

     to always return an Array of Objects with those two feeds. Either way, create a backend that only looks at these two feeds.

  2. Either using the debugger, logger, or console.log(), add a bunch of logging to

     /**
      * The processor for the feed queue receives feed jobs, where
      * the job to process is an Object with the `id` of the feed.
      * We expect the Feed to already exist in the system at this point.
      */
     module.exports = async function processor(job) {

     I think the bug is that we somehow don't return from

     try {
       info = await getFeedInfo(feed);
       // If we get no new version info, there's nothing left to do.
       if (!info.shouldDownload) {
         // Log some common cases we see, with a general message if none of these
         switch (info.status) {
           case 304:
             logger.info(`${info.status} Feed is up-to-date: ${feed.url}`);
             break;
           case 404:
             logger.warn(`${info.status} Feed not found: ${feed.url}`);
             break;
           case 410:
             logger.warn(`${info.status} Feed no longer available: ${feed.url}`);
             break;
           case 429:
             logger.warn(`${info.status} Feed requested too many times, setting delay: ${feed.url}`);
             await feed.setDelayed(process.env.FEED_PROCESSING_DELAY_SEC || 3600);
             break;
           case 500:
           case 599:
             logger.warn(`${info.status} Feed server error: ${feed.url}`);
             break;
           default:
             logger.info(`${info.status} Feed not downloaded: ${feed.url}`);
             break;
         }
         // No posts were processed.
         return;
       }

     and end up in

     const parser = new Parser(
       addHeaders(
         {
           // ms to wait for a connection to be assumed to have failed
           timeout: 20 * 1000,
           gzip: true,
           customFields: {
             item: [
               ['pubDate', 'pubdate'],
               ['creator', 'author'],
               ['content:encoded', 'contentEncoded'],
             ],
           },
         },
         feed
       )
     );

     which treats a 304 as an error.

  3. Run the backend with npm start and watch it fail.

  4. Get on a Teams call early this week with a few people and try to reproduce this bug in GitPod or locally. I love @TDDR's suggestion above about trying to work on it as a small group, since this isn't super easy to recreate and fix.

  5. Let me know how you make out, and how I can help.

@TueeNguyen TueeNguyen self-assigned this Feb 1, 2022
@humphd
Contributor

humphd commented Feb 1, 2022

I can't reproduce this bug locally. I tried changing src/backend/utils/wiki-feed-parser.js to do this instead of downloading and reading the planet feed list:

module.exports = /* async */ function () {
  return Promise.resolve([
    {
      author: 'Tim',
      url: 'https://dps909tddr.tech.blog/category/opensource/feed/',
    },
  ]);
};

Then I set breakpoints in src/backend/feed/processor.js.

I deleted all the Redis data files in ./redis-data/*

Then I ran Elasticsearch and Redis locally:

pnpm services:start redis elasticsearch

Next, I started the backend via VSCode's debugger, see docs.

But I always get a 200.

@TueeNguyen
Contributor

I've been trying to debug this, but all I've got so far is 200 and occasionally 503.

I let the backend run with Tim's feed as the only feed, to put Tim's feed into Redis. This response should have returned status 304, since the etag and lastModified were not null (pulled from Redis), but it didn't:

response = await fetch(feed.url, addHeaders({ method: 'HEAD' }, feed));
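For context on why a 200 can show up where a 304 was expected: a server only replies 304 when the validator we send still matches its current one. A sketch, where expectedStatus is a made-up helper (not Telescope code) comparing a stored ETag against the server's current one:

```javascript
// Illustrative sketch: a conditional request only yields 304 when the stored
// validator still matches the server's current one. expectedStatus is a
// hypothetical helper, not part of Telescope.
function expectedStatus(storedEtag, serverEtag) {
  if (!storedEtag) return 200; // no validator in Redis yet: full response
  return storedEtag === serverEtag ? 304 : 200; // stale validator: full 200 again
}

console.log(expectedStatus('"v1"', '"v1"')); // 304
console.log(expectedStatus('"v1"', '"v2"')); // 200
console.log(expectedStatus(null, '"v2"')); // 200
```

So consistently getting 200 here could mean the stored validators are missing or stale, the feed content genuinely changed, or the server ignores conditional requests entirely.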

@humphd
Copy link
Contributor

humphd commented Feb 2, 2022

Same for me (mostly 200s, a few 503s).

I'm going to close this as "not reproducible any more." Thanks for looking into it with me.

I'll file another issue to wipe our Redis cache on staging/production, and free @TDDR's blog from the invalid queue.

@humphd humphd closed this as completed Feb 2, 2022
@humphd
Contributor

humphd commented Feb 2, 2022

Filed #2800.
