Recently unable to parse Reddit's web feeds (RSS) #14

Closed
why-not-try-calmer opened this issue Oct 14, 2021 · 2 comments

why-not-try-calmer commented Oct 14, 2021

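import requests
import atoma

# Fetch the subreddit's feed; replace the placeholder with a subreddit name.
response = requests.get("https://www.reddit.com/r/<insert subreddit here>.rss")
decoded = response.content  # raw bytes of the response body
parsed = atoma.parse_atom_bytes(decoded)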

This now fails with:
raise FeedXMLError('Not a valid XML document')
It used to work flawlessly. I'll gather whatever details I can while debugging and update this issue accordingly.

@NicolasLM
Owner

Hi, you should check the status code of the response. Reddit seems to start returning 429 (Too Many Requests) errors quickly, together with an HTML body, and that body cannot be parsed as a feed.
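A minimal sketch of the status check suggested above, assuming the same `requests` and `atoma` calls as the original snippet. `FeedFetchError`, `check_feed_response`, and `fetch_and_parse` are hypothetical names introduced for illustration, and the Content-Type test is an assumed heuristic, not something atoma requires.

```python
class FeedFetchError(Exception):
    """Hypothetical error for responses that should not reach the parser."""


def check_feed_response(status_code: int, content_type: str, body: bytes) -> bytes:
    """Return the body only if the response plausibly carries an XML feed."""
    if status_code != 200:
        # For example 429 (Too Many Requests), which may come with an HTML body.
        raise FeedFetchError(f"unexpected HTTP status {status_code}")
    if "xml" not in content_type.lower():
        # Assumed heuristic: feeds are served with an XML media type.
        raise FeedFetchError(f"unexpected Content-Type {content_type!r}")
    if not body.strip():
        raise FeedFetchError("empty response body")
    return body


def fetch_and_parse(url: str):
    """Fetch a feed and parse it with atoma once the response passes the checks."""
    # Imported here so the checker above stays usable without these packages.
    import atoma
    import requests

    response = requests.get(url, timeout=10)
    body = check_feed_response(
        response.status_code,
        response.headers.get("Content-Type", ""),
        response.content,
    )
    return atoma.parse_atom_bytes(body)
```

With this in place, a rate-limited request fails with a message naming the HTTP status instead of `FeedXMLError('Not a valid XML document')`.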

@why-not-try-calmer
Author

Hi Nicolas, thank you for keeping a vigilant eye on your issues! I was getting 200 status codes, but the content was probably truncated because of the way I was sending requests. Your library is very likely not the culprit. I will get back to you when I have time to pin down exactly what went wrong. Take care!
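One way to tell a truncated body from a complete one, even when the status is 200, is to run it through a strict XML parser before handing it to atoma. The sketch below uses the standard library's `xml.etree.ElementTree`; `is_well_formed_xml` is a hypothetical helper meant as a debugging aid, and it does not reproduce how atoma itself parses feeds.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(body: bytes) -> bool:
    """Return True if the bytes parse as a complete, well-formed XML document."""
    try:
        ET.fromstring(body)
    except ET.ParseError:
        # Raised for unclosed tags, an empty body, or other malformed input.
        return False
    return True
```

Calling `is_well_formed_xml(response.content)` on a suspect response shows whether the document was cut off before its closing tags.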
