PyPocketExplore - Unofficial API to Pocket Explore data

PyPocketExplore is a CLI-based and web-based API to access Pocket Explore data. It can be used to collect data about the most popular Pocket items for different topics.

An example use case would be crawling the data and using it as a training set to predict the number of Pocket saves for a web page.


The easiest way to install the package is through PyPI. This should get you up and running pretty quickly.

$ pip install PyPocketExplore

The CLI has two modes: topic and batch.

With the first one (pypocketexplore topic) you can download items from specific topics and output them to a nicely formatted JSON file.

Usage: pypocketexplore topic [OPTIONS] [LABEL]...

  Download items for specific topics

  --limit INTEGER  Limit items to download
  --out TEXT       JSON output filepath
  --nlp            If set, also downloads the page and applies NLP (through

For example, this command

$ pypocketexplore topic python data sex books --nlp --out life_topics.json

will go through the corresponding Pocket Explore page for each topic, one by one, and then:

  • scrape and extract the immediately available data for each item (item_id, title, save count, excerpt and url)
  • run each item url through the awesome Newspaper library (in parallel)
  • apply NLP to each item's text
  • save the results to life_topics.json
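
The "in parallel" step above can be sketched roughly as follows. This is only an illustration, not the package's actual code: `enrich_items` and the `fetch_text` callable are hypothetical names, and `fetch_text` stands in for whatever downloads a page and extracts its text (for instance a wrapper around Newspaper).

```python
from concurrent.futures import ThreadPoolExecutor


def enrich_items(items, fetch_text, max_workers=8):
    """Return copies of `items` with a "text" field filled in by `fetch_text(url)`.

    `fetch_text` is a hypothetical callable supplied by the caller; it is an
    assumption of this sketch, not part of PyPocketExplore.
    """
    def enrich(item):
        enriched = dict(item)  # copy, so the input items are left untouched
        try:
            enriched["text"] = fetch_text(item["url"])
        except Exception as exc:  # one bad page should not abort the whole batch
            enriched["text"] = None
            enriched["error"] = str(exc)
        return enriched

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input items
        return list(pool.map(enrich, items))
```

Threads suit this step because the work is dominated by waiting on network responses rather than by CPU time.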

In the end you'll have a rich dataset full of text to play with and of course a popularity metric - pretty cool to experiment with. You can check it out here
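
As a first experiment with the popularity metric, one could rank the downloaded items by save count. The sketch below assumes the output file holds a JSON array of item objects with a `saves_count` field, as in the web API example in this README; the real file layout may differ, and `load_items` and `top_saved` are hypothetical helper names.

```python
import json


def load_items(path):
    """Load items from a JSON file (assumed here to contain a list of objects)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def top_saved(items, n=5):
    """Return the n items with the highest saves_count (missing counts as 0)."""
    return sorted(items, key=lambda item: item.get("saves_count", 0), reverse=True)[:n]
```

For example, `top_saved(load_items("life_topics.json"), n=10)` would give the ten most-saved items, provided the file matches the assumed layout.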

For each topic on Pocket Explore, there is a set of related topics, which one can crawl through pretty easily in a recursive way. For example, after scraping one topic, one can then scrape its related topics: programming, javascript, google, windows, java, linux, data science, python 3, developer.

This essentially means that one can crawl the whole graph of topics by following the related topics as edges. To do this, one of course needs a set of seed topics to start the crawl. To get these seeds, the pypocketexplore batch mode fetches the taxonomy labels provided by IBM Watson and then walks through the graph. (I guess Pocket uses IBM Watson to label its items, so this kind of reverse-engineering makes sense. Sorry, Pocket guys.)
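
The graph walk can be illustrated with a breadth-first traversal. This is a minimal sketch under stated assumptions, not the batch mode's real implementation: `crawl_topics` and the `get_related` callable (which returns the related topics of a given topic) are hypothetical.

```python
from collections import deque


def crawl_topics(seeds, get_related, max_topics=100):
    """Walk the topic graph breadth-first and return topics in visit order.

    `get_related(topic)` is a hypothetical callable returning an iterable of
    related topic labels; `max_topics` caps how many topics are visited.
    """
    seen = set()
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)

    visited = []
    while queue and len(visited) < max_topics:
        topic = queue.popleft()
        visited.append(topic)
        for related in get_related(topic):
            if related not in seen:  # each topic is queued at most once
                seen.add(related)
                queue.append(related)
    return visited
```

The `seen` set is what keeps the crawl from looping forever, since related-topic links usually point back at each other.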

Usage: pypocketexplore batch [OPTIONS]

  Download items for all topics recursively.  USE WITH CAUTION!

  --n INTEGER      Max number of total items to download
  --limit INTEGER  Limit items to download per topic
  --out TEXT       JSON output filepath
  --nlp            If set, also downloads the page and applies NLP (through
  --mongo TEXT     Mongo DB URI to save items
  --help           Show this message and exit.

CAUTION: This mode with all goodies enabled will take a few days to run and will collect around 300k unique items across 8k topics. I have tried to space out the requests to Pocket's servers and to handle rate-limit errors, but one can never be sure with such things.
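
To give an idea of what spacing requests and handling rate-limit errors can look like, here is a small sketch using a fixed pause plus exponential backoff. It is an assumption about one reasonable approach, not a description of what the package does internally; `RateLimited`, `fetch_politely` and the `fetch` callable are hypothetical names.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised by `fetch` when the server asks us to slow down."""


def fetch_politely(fetch, url, delay=1.0, max_retries=3, sleep=time.sleep):
    """Call `fetch(url)`, pausing before every attempt and backing off on rate limits.

    `fetch` is a hypothetical callable that raises RateLimited when throttled.
    `sleep` is injectable so the waiting can be replaced in tests.
    """
    wait = delay
    for attempt in range(max_retries + 1):
        sleep(wait)  # space out every request, including the first one
        try:
            return fetch(url)
        except RateLimited:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            wait *= 2  # double the pause before trying again
```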


To have access to a standalone web API you need to clone the repo locally first.

$ git clone
$ cd PyPocketExplore
$ pip install -r requirements.txt

To run this API application, use the flask command, as described in the Flask Quickstart:

$ cd PyPocketExplore
$ export FLASK_APP=./PyPocketExplore/pypocketexplore/api/
$ export FLASK_DEBUG=1 ## if you run in debug mode.
$ flask run
 * Running on http://localhost:5000/

Web API Documentation


  • GET /api/topic/{topic} - Get topic data

Example topics: python, finance, business and more

Example GET /api/topic/python


    {
        "excerpt": "For part 1, see here. All the software written for this project is in Python. I’m not an expert python programmer, far from it but the huge number of available libraries and the fact that I can make some sense of it all without having spent a lifetime in Python made this a fairly obvious choice.",
        "image": "",
        "item_id": "1731527024",
        "saves_count": 223,
        "title": "Sorting 2 Tons of Lego, The software Side · Jacques Mattheij",
        "topic": "python",
        "url": ""
    },
    {
        "excerpt": "There are lots of free resources for learning Python available now. I wrote about some of them way back in 2013, but there’s even more now then there was then! In this article, I want to share these resources with you.",
        "image": "",
        "item_id": "1727350036",
        "saves_count": 59,
        "title": "Free Python Resources",
        "topic": "python",
        "url": ""
    },
    {
        "excerpt": "A surprisingly versatile Swiss Army knife — with very long blades!TL;DRWe (an investment bank in the Eurozone) are deploying Jupyter and the Python scientific stack in a corporate environment to provide employees and contractors with an interactive computing environment with to help them leve",
        "image": "",
        "item_id": "1726489646",
        "saves_count": 41,
        "title": "Jupyter & Python in the corporate LAN",
        "topic": "python",
        "url": ""
    }
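
A small client for the endpoint above could look like the following sketch. It assumes the Flask app is running locally on port 5000, as in the `flask run` output, and that the endpoint returns JSON; `topic_url` and `get_topic` are hypothetical helper names, not part of the package.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the API is served locally, as in the `flask run` output above.
BASE_URL = "http://localhost:5000"


def topic_url(topic, base_url=BASE_URL):
    """Build the GET /api/topic/{topic} URL, percent-encoding the topic label."""
    return f"{base_url.rstrip('/')}/api/topic/{quote(topic, safe='')}"


def get_topic(topic, base_url=BASE_URL, timeout=10):
    """Fetch and decode the JSON response for one topic (needs a running server)."""
    with urlopen(topic_url(topic, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Percent-encoding matters for multi-word topics such as "data science", which would otherwise produce an invalid URL.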


Copyright (c) 2017 Florents Tselai Licensed under the MIT license.

