Is it possible to get the entire article (as rendered by Pocket), not just the 'excerpt'? #6
Comments
Article Content API - Unfortunately, Pocket does not provide extracted article content to API users without partner privileges. I'm open to other ideas, though. Maybe use a custom extraction method, e.g. via BeautifulSoup?
Thanks for your attention to this detail! But I'm still confused, because there seem to be other ways to get the 'whole article' text directly from Pocket. For example, calibre (http://calibre-ebook.com) has a Python 'news recipe' script called 'readitlater.recipe' (1). I'm no Python expert; I can barely code some shell scripts and grasp a little bit of Python. When I used it, I got either a text file or just a bunch of HTML files (1).
Interesting. I'll check this out and think of a possible lightweight implementation. Do you have the time to work on this, by any chance?
Cool! Thanks for your interest.
I have the time, yes, but unfortunately not the skills to do it.
Newspaper seems to provide Pocket-like functionality. If this seems like a good enough alternative, I'm willing to integrate it. Thoughts? EDIT: Actually, the PyPI distribution of [...]. Instead, a better alternative seems to be readability-lxml. It is significantly lighter and simpler to use.
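For readers who want to try the readability-lxml route locally, here is a minimal sketch. It is not how pockyt does it (see #7 for that). It assumes readability-lxml is installed (`pip install readability-lxml`) and that you already have the page's raw HTML as a string. The helper names `html_to_text` and `extract_article` are made up for this example; only `Document`, `title()` and `summary()` come from the library.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def html_to_text(html):
    """Crude HTML-to-text conversion: join all visible text nodes with spaces."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.parts)


def extract_article(html):
    """Return (title, plain_text) for a raw HTML page.

    Hypothetical helper: readability-lxml picks the main content block,
    then html_to_text() strips the remaining markup.
    """
    # Imported lazily so html_to_text() stays usable without the package.
    from readability import Document

    doc = Document(html)
    return doc.title(), html_to_text(doc.summary())
```

Note that this still downloads and extracts on your own machine, which is exactly what the original question hoped to avoid; without partner access to Pocket's Article Content API there does not seem to be a way around that.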
I'm hacking away on this right now. Let's see how it goes. EDIT: See #7.
Oops, almost forgot. The reason I'm not considering the scripts you linked to is:
However, if this solution isn't good enough, I might reconsider.
Is there any hack for this? I am going back through a historic collection of articles, and what I have found is that the articles have been taken down by the news sites... I would think that even saving the HTML response of the article at that time and storing it in the DB would help tremendously.
@billlyzhaoyh - Good use case. I had a bit of time to hack on this today and managed to get HTML archiving working in 1.4.0. E.g., get all favorited items and save offline copies of them: Let me know if it works for you.
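For anyone who wants the "save the HTML response into a DB" idea outside of pockyt, here is a rough standard-library sketch. It is not pockyt's 1.4.0 implementation. The table layout and the function names (`fetch_html`, `open_archive`, `archive_links`) are assumptions made up for this example, and the links are assumed to be available as a plain list, for instance read from a file written with a `{link}` format string.

```python
import sqlite3
import urllib.request


def fetch_html(url, timeout=10):
    """Download a page and return its body as text (assumes UTF-8)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def open_archive(path=":memory:"):
    """Open (or create) the archive database; the schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS archive ("
        "link TEXT PRIMARY KEY, "
        "html TEXT NOT NULL, "
        "saved_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def archive_links(conn, links, fetch=fetch_html):
    """Store the HTML of every link not archived yet.

    Existing rows are never overwritten, so a copy saved while the article
    was still online survives a later takedown. Returns the links that
    could not be fetched.
    """
    failed = []
    for link in links:
        already = conn.execute(
            "SELECT 1 FROM archive WHERE link = ?", (link,)
        ).fetchone()
        if already:
            continue
        try:
            html = fetch(link)
        except (OSError, ValueError):  # network errors, HTTP errors, bad URLs
            failed.append(link)
            continue
        conn.execute(
            "INSERT INTO archive (link, html) VALUES (?, ?)", (link, html)
        )
    conn.commit()
    return failed
```

Passing the fetch function as a parameter keeps the storage logic testable without network access; pass a file path to `open_archive` to keep the archive between runs.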
Closing as stale.
I know that, for example,
to get the latest 5 items' links and excerpts and save them to a file:
pockyt get -n 5 -f '{link} - {excerpt}' -o readlater.txt
works.
Is it also possible to get the entire article, as it is displayed and rendered on the Pocket website?
I mean just the extracted text, as stored by Pocket.
I don't want to download the page from the original server and extract the text on my computer again.