Getting older articles #245
Comments
@tehnar can you provide a specific example?
@yprez Also, there's another problem with this site: if I try some other blog (for example, http://blog.jetbrains.com/pycharm/), I get 0 articles. I managed to fix it by manually deleting the contents of ~/.newspaper/feed_category_cache, but that's a strange hack.
Did you try disabling the cache? I'm getting 127 articles from http://blog.jetbrains.com/ruby/, though I'm not sure whether that's all of them.
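For reference, newspaper's source-building step caches discovered feed/category URLs under ~/.newspaper, which is why deleting feed_category_cache changed the result above. Below is a minimal sketch of clearing that cache programmatically rather than by hand; the helper name is made up here, and the cache path is taken from the comment above, not from newspaper's documentation:

```python
import shutil
from pathlib import Path

def clear_newspaper_cache(cache_root: Path) -> int:
    """Delete every entry under the given cache directory.

    Pass e.g. Path.home() / ".newspaper" / "feed_category_cache".
    Returns the number of top-level entries removed.
    """
    if not cache_root.is_dir():
        return 0
    removed = 0
    for entry in cache_root.iterdir():
        if entry.is_dir():
            shutil.rmtree(entry)  # remove nested cache directories
        else:
            entry.unlink()        # remove plain cache files
        removed += 1
    return removed
```

Alternatively, `newspaper.build(url, memoize_articles=False)` asks newspaper not to skip previously seen articles, which is usually what "disable caching" means in practice.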
@yprez
Also, publish dates aren't extracted correctly for all of the articles.
Funnily enough, the date-parsing bug is actually in the second article, the one where extraction succeeds... The date is parsed from the URL (I couldn't find any meta date attributes in these articles), so the 2015/12/20 segment is parsed as a date, and even that result is wrong (20/05/2015 instead of the 22nd).
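The URL-based fallback described above can be sketched roughly as follows; this is an illustration of the technique, not newspaper's actual implementation, and the regex and function name are assumptions:

```python
import re
from datetime import date
from typing import Optional

# Matches /YYYY/MM/DD/ or /YYYY/MM/ segments in a URL path.
_URL_DATE = re.compile(r"/(\d{4})/(\d{2})(?:/(\d{2}))?/")

def date_from_url(url: str) -> Optional[date]:
    """Best-effort publish date from a URL; day defaults to 1 if absent."""
    m = _URL_DATE.search(url)
    if not m:
        return None
    year, month, day = m.group(1), m.group(2), m.group(3) or "1"
    try:
        return date(int(year), int(month), int(day))
    except ValueError:
        return None  # e.g. /2015/13/ is not a real date
```

Note that with only /YYYY/MM/ in the URL the day is unrecoverable, which matches the point made in the next comment.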
@yprez
There was a ticket about trying to find dates with a regex within the article (#168), and a closed pull request somewhere. The date can't really be recovered from the URL if it only contains the year and month. Regarding the number of articles: newspaper combines several strategies to build the list, e.g. links from the page, categories, RSS feeds, etc., but it doesn't walk through the pages of paginated results. To get all the articles you would need to paginate over every result page, and I don't think that's currently possible with newspaper...
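One possible workaround, given that newspaper won't follow paginated archives itself, is to generate each archive page's URL and build a source per page. This is only a sketch: the /page/N/ scheme is a WordPress convention assumed here, and you'd need to know or probe the page count yourself:

```python
from typing import Iterator

def archive_page_urls(base_url: str, pages: int) -> Iterator[str]:
    """Yield paginated archive URLs: base, base/page/2/, base/page/3/, ...

    Assumes the WordPress-style /page/N/ scheme; adjust for other blogs.
    """
    base = base_url.rstrip("/")
    yield base + "/"
    for n in range(2, pages + 1):
        yield f"{base}/page/{n}/"
```

Each generated URL could then be passed to `newspaper.build(url, memoize_articles=False)` and the per-page article lists merged, though duplicates across pages would need de-duplicating by URL.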
Ok, I've got it. Thanks for the clarification.
Hello, is there any way to get more articles from a particular website? I only get the latest news (the amount and starting date depend on the site), not all of them. Caching is disabled, but that doesn't help.