Is it possible to get the entire article (as rendered by Pocket), not just the 'excerpt'? #6

m040601 · 2015-10-09T00:53:54Z

I know that, for example, this works to get the latest 5 items' links & excerpts and save them to a file:
pockyt get -n 5 -f '{link} - {excerpt}' -o readlater.txt
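As a rough illustration of what the `-f` template does, here is a sketch that assumes it behaves like Python's `str.format`, with each item's fields passed as keyword arguments. This is not pockyt's actual code. `render_items` and `save_lines` are hypothetical helper names, and the `limit` default mirrors `-n 5`.

```python
# Sketch only: assumes '{field}' placeholders are filled like str.format.
from pathlib import Path

TEMPLATE = "{link} - {excerpt}"

def render_items(items, template=TEMPLATE, limit=5):
    """Format up to `limit` items (dicts of fields), one line per item."""
    return [template.format(**item) for item in items[:limit]]

def save_lines(lines, path):
    """Write one rendered line per item to `path` (like -o readlater.txt)."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Usage with made-up items: `save_lines(render_items([{"link": "https://example.com/a", "excerpt": "..."}]), "readlater.txt")`.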

Is it also possible to get the entire article, as it is displayed and rendered on the Pocket website?
I mean just the extracted text stored on Pocket.
I don't want to download it from the original server and extract the text on my computer again.

achembarpu · 2015-10-09T09:45:36Z

Article Content API: unfortunately, Pocket does not provide extracted article content to API users without partner privileges.

I'm open to other ideas, though. Maybe a custom extraction method, via BeautifulSoup or something similar?
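A custom extraction along those lines could start very simply. The sketch below swaps BeautifulSoup for the standard library's `html.parser` so it has no dependencies, and uses a deliberately naive heuristic: keep the text of `<p>` elements and ignore `<script>`/`<style>`. It is an illustration only and far cruder than Pocket's own parser; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the text of <p> elements: a naive article-text heuristic."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes &amp; etc.
        self.paragraphs = []
        self._buf = None    # text pieces of the currently open <p>, or None
        self._skip = 0      # nesting depth inside <script>/<style>

    def flush_pending(self):
        """Finish the open paragraph, normalising whitespace."""
        if self._buf is not None:
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
            self._buf = None

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "p":
            self.flush_pending()  # HTML allows <p> to be closed implicitly
            self._buf = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "p":
            self.flush_pending()

    def handle_data(self, data):
        if self._buf is not None and not self._skip:
            self._buf.append(data)

def extract_text(html):
    """Return the page's paragraph text, separated by blank lines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser.flush_pending()  # handle a trailing unclosed <p>
    return "\n\n".join(parser.paragraphs)
```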

m040601 · 2015-10-12T03:41:31Z

Thanks for your attention to this detail !
I see what you mean with the api issue, it makes sense.

But I'm still confused about how there seem to be other ways to get the 'whole article' text directly from Pocket. For example, with calibre, http://calibre-ebook.com , and its Python 'news recipe' script called 'readitlater.recipe' (1).

I'm no python expert, I can barely code some shell scripts and grasp a little bit of python.
So I was wondering how it is that, using that script and calibre's command-line tool 'ebook-convert' (http://manual.calibre-ebook.com/cli/ebook-convert.html), I do get the entire text of my Pocket articles.

For example, when I use:
ebook-convert ./readitlater.recipe outputfile.txt --username my-pocket@username.com --password my-pocket-account-password
or
ebook-convert ./readitlater.recipe output.OEB --username my-pocket@username.com --password my-pocket-account-password

I get either a text file or a bunch of HTML files, with all my articles exactly as they are rendered by Pocket.
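To script that same conversion from Python, here is a minimal sketch, assuming `ebook-convert` is on your `PATH` and the recipe file is in the current directory. It passes exactly the arguments used in the commands above (`--username`, `--password`). The helper names and the `POCKET_USERNAME`/`POCKET_PASSWORD` environment variables are hypothetical choices that keep the password out of your shell history.

```python
import os
import shutil
import subprocess

def build_command(recipe, output, username, password):
    """Assemble the ebook-convert argument list used in the commands above."""
    return ["ebook-convert", recipe, output,
            "--username", username, "--password", password]

def convert(recipe="./readitlater.recipe", output="outputfile.txt"):
    """Run ebook-convert; it infers the output format from the file extension."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("ebook-convert not found on PATH")
    cmd = build_command(recipe, output,
                        os.environ["POCKET_USERNAME"],
                        os.environ["POCKET_PASSWORD"])
    # A list with no shell avoids quoting problems with special characters.
    subprocess.run(cmd, check=True)
```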

(1)
a. The version distributed with calibre (this one works for me):
https://gist.github.com/m040601/a4258870759f9ad8a6ee
b. Another fork of the same script (this one did not work for me), tbunnyman/ReadItLater-Calibre-Plugin:
https://github.com/tbunnyman/ReadItLater-Calibre-Plugin
"This is an updated & modified version of the official Calibre plugin for Pocket (Formerly ReadItLater)"

achembarpu · 2015-10-16T12:44:54Z

Interesting. I'll check this out and think of a possible lightweight implementation.

Do you have the time to work on this, by any chance?

m040601 · 2015-10-17T11:35:39Z

Cool ! Thanks for your interest.

> Do you have the time to work on this, by any chance?

Time, yes; unfortunately, not the skills to do it.
The only thing I can contribute is research and feedback, as I like to thoroughly investigate and compare all the available solutions and implementations (Python and others) for this problem.

achembarpu · 2015-10-25T13:25:42Z

Newspaper seems to provide Pocket-like functionality.

If this seems like a good enough alternative, I'm willing to integrate it. Thoughts?

EDIT: Actually, the PyPI distribution of newspaper is outdated and depends on a lot of heavy libraries (see its requirements).

Instead, a better alternative seems to be readability-lxml. It's significantly lighter and simpler to use.
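For reference, basic readability-lxml usage looks roughly like this. It is a sketch that assumes the package is installed (`pip install readability-lxml`) and that you already have the page HTML. `Document.summary()` returns the cleaned main-content HTML and `Document.title()` returns the page title. The wrapper name `readable` is hypothetical.

```python
def readable(raw_html):
    """Return (title, main-content HTML) for a page using readability-lxml."""
    # Imported lazily so this module still loads when the package is absent.
    from readability import Document
    doc = Document(raw_html)
    return doc.title(), doc.summary()
```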

achembarpu · 2015-10-25T15:30:27Z

I'm hacking away on this right now. Let's see how it goes.

EDIT: See #7.

achembarpu · 2015-10-25T15:56:43Z

Oops, almost forgot. The reason I'm not considering the scripts you linked to is:

They scrape getpocket.com directly, which is forbidden by their ToS.
Since it's a scrape, the moment Pocket changes their html, it will fail.

However, if this solution isn't good enough, I might reconsider.

billlyzhaoyh · 2020-03-25T17:30:21Z

Is there any workaround for this? I'm going back through a historic collection of articles and have found that many of them have been taken down by the news sites. I think even saving the HTML response of each article at the time it was saved, and storing it in the DB, would help tremendously.
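The idea of keeping each article's HTML response at save time can be sketched with SQLite from the standard library. This is a hypothetical schema (a `snapshots` table keyed by Pocket item id), not something pockyt or Pocket provides. Fetching the page is left to the caller. It uses `INSERT OR IGNORE` so the first capture is kept, because a later fetch might only return an error page once the article is gone.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS snapshots (
    item_id    TEXT PRIMARY KEY,  -- Pocket item id (hypothetical key choice)
    url        TEXT NOT NULL,
    html       TEXT NOT NULL,     -- raw HTML as fetched at save time
    fetched_at TEXT NOT NULL      -- UTC timestamp, ISO 8601
)
"""

def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def save_snapshot(conn, item_id, url, html):
    """Store the first snapshot for an item; later calls leave it unchanged."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO snapshots VALUES (?, ?, ?, ?)",
            (item_id, url, html, datetime.now(timezone.utc).isoformat()),
        )

def load_snapshot(conn, item_id):
    """Return the stored HTML for item_id, or None if it was never archived."""
    row = conn.execute(
        "SELECT html FROM snapshots WHERE item_id = ?", (item_id,)
    ).fetchone()
    return row[0] if row else None
```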

achembarpu · 2020-03-27T21:47:17Z

@billlyzhaoyh - Good use-case, I had a bit of time to hack on this today. Managed to get HTML archiving working in 1.4.0.

E.g., get all favorited items and save offline copies of them:
pockyt get -v 1 -a ./pocket
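For anyone curious how such an offline archive might work, here is a rough sketch. It is not pockyt's actual implementation. It downloads each item's URL with the standard library and writes it to `<archive_dir>/<item_id>.html`. The file-naming scheme and the injectable `fetch` parameter are assumptions made for illustration and for testing without network access.

```python
import urllib.request
from pathlib import Path

def fetch_html(url, timeout=30):
    """Download a page and decode it (charset from headers, else UTF-8)."""
    req = urllib.request.Request(url, headers={"User-Agent": "archive-sketch"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")

def archive_items(items, archive_dir, fetch=fetch_html):
    """Save each (item_id, url) pair as <archive_dir>/<item_id>.html.

    Already-archived items are skipped, and failures are collected instead
    of aborting the whole run. Returns a list of (item_id, error) pairs.
    """
    out = Path(archive_dir)
    out.mkdir(parents=True, exist_ok=True)
    failures = []
    for item_id, url in items:
        target = out / f"{item_id}.html"
        if target.exists():
            continue  # keep the first capture
        try:
            target.write_text(fetch(url), encoding="utf-8")
        except Exception as exc:  # network, HTTP, or decoding errors
            failures.append((item_id, exc))
    return failures
```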

Let me know if it works for you.

achembarpu · 2021-05-02T14:04:42Z

Closing as stale.

achembarpu added the enhancement label Oct 9, 2015

achembarpu mentioned this issue Oct 25, 2015: Get article content #7 (Closed)

achembarpu self-assigned this Oct 25, 2015

achembarpu added Epic and removed Epic labels Jun 22, 2017

achembarpu added the Epic label Jun 26, 2017

achembarpu removed the Epic label Jul 18, 2017

achembarpu removed their assignment Nov 22, 2019

achembarpu closed this as completed May 2, 2021
