Bypass Cloudflare DDoS protection when fetching feeds
Sometimes, when a page behind Cloudflare is receiving too much traffic, Cloudflare responds with a specially crafted page instead of the requested content. If the feed fetch URL is behind Cloudflare, this can happen when fetching feeds too. The response has an HTTP 503 status code, which RestClient and other simple (non-JavaScript-enabled) HTTP clients treat as an unrecoverable error, so they never get the feed. However, the returned page contains JavaScript that, after a delay, performs some validations (I'm not sure about the details) and finally redirects the browser to the page that was actually requested.

So, to fetch pages that have this DDoS protection active, we must:

- detect that FeedBunch is getting a 503 error but the page contains Cloudflare code
- in this case, try fetching the feed again with a full-featured browser (chrome-headless) so that the JavaScript code can run

The page fetched with chrome-headless is then treated the same as a successful RestClient response.

We use chrome-headless only when FeedBunch thinks a page is behind Cloudflare DDoS protection, because it is slower and more resource-intensive than a simple HTTP client like RestClient.
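The detect-then-fall-back flow described above can be sketched in Ruby, the language RestClient belongs to. This is a minimal illustration under stated assumptions, not FeedBunch's actual implementation: `FetchResult`, `CLOUDFLARE_MARKERS`, `cloudflare_challenge?` and `fetch_feed` are hypothetical names, and the marker strings are guesses at what a Cloudflare challenge page might contain. The two fetchers are passed in as callables standing in for RestClient and chrome-headless. Real RestClient raises an exception on a 503 instead of returning it, so a real wrapper would rescue that exception and inspect its response.

```ruby
# Minimal sketch, NOT FeedBunch's actual code. All names below and the
# marker strings are hypothetical illustrations.

# Hypothetical result of a simple HTTP fetch: status code and body.
FetchResult = Struct.new(:status, :body)

# Substrings assumed (not verified) to appear in a Cloudflare challenge page.
CLOUDFLARE_MARKERS = ['cloudflare', 'cf-browser-verification', 'jschl'].freeze

# True if the response looks like Cloudflare DDoS protection:
# an HTTP 503 whose body mentions at least one Cloudflare marker.
def cloudflare_challenge?(result)
  return false unless result.status == 503
  body = result.body.to_s.downcase
  CLOUDFLARE_MARKERS.any? { |marker| body.include?(marker) }
end

# Fetches a feed with the cheap client first. Only when the response looks
# like a Cloudflare challenge does it fall back to the slower headless
# browser, whose rendered page is returned like a normal successful body.
#   simple_fetch:  callable url -> FetchResult (stand-in for RestClient)
#   browser_fetch: callable url -> String body (stand-in for chrome-headless)
def fetch_feed(url, simple_fetch:, browser_fetch:)
  result = simple_fetch.call(url)
  return result.body if (200..299).cover?(result.status)
  return browser_fetch.call(url) if cloudflare_challenge?(result)

  raise "Could not fetch #{url}: HTTP #{result.status}"
end

# Example with stubbed fetchers (no network access needed):
simple  = ->(_url) { FetchResult.new(503, '<title>Just a moment...</title> Cloudflare') }
browser = ->(_url) { '<rss version="2.0"></rss>' }
puts fetch_feed('https://example.com/feed', simple_fetch: simple, browser_fetch: browser)
# prints <rss version="2.0"></rss>
```

Injecting the fetchers keeps the decision logic testable without a network or a browser, and makes explicit that the expensive headless fetch only runs on the Cloudflare path; a plain 503 without Cloudflare markers is still treated as an error.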