# Instapaper Downloader

Want to know how many articles you've read and still have to read in Instapaper? Want to collect your highlighted passages? Get your Bookmarks, Articles, and Highlights Data from Instapaper!

This code integrates with [Instapaper's API](https://www.instapaper.com/api). See [Instapaper's API Terms](https://www.instapaper.com/api/terms) for the terms of usage. It's part of [QS Ledger](https://github.com/markwk/qs_ledger).

-----

In [13]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


In [22]:
!ls /content/drive/'My Drive/Colab Notebooks/instapaper'

credentials.json  instapaper_downloader.ipynb


## Requirements:

* Pandas. Install with command: `$ pip install pandas`
* [PyInstapaper](https://github.com/mdorn/pyinstapaper): Install with command: `$ pip install pyinstapaper`

In [0]:
!pip install pandas



In [0]:
!pip install pyinstapaper

Collecting pyinstapaper
  Downloading https://files.pythonhosted.org/packages/5e/0b/0883ada9692b8398faf35cb20b8c3ca79cde71ec43f693beb1c453166462/pyinstapaper-0.2.2-py2.py3-none-any.whl
Collecting oauth2<2,>=1.9
  Downloading https://files.pythonhosted.org/packages/a0/6f/86db603912ecd04109af952c38bc08928886cf0e34c723481fa7db98b4b5/oauth2-1.9.0.post1-py2.py3-none-any.whl
Collecting lxml<=4,>=3.4
  Downloading https://files.pythonhosted.org/packages/a0/b5/4c6995f8f259f0858f79460e6d277888f8498ce1c1a466dfbb24f06ba83f/lxml-4.0.0-cp36-cp36m-manylinux1_x86_64.whl (5.3MB)
Installing collected packages: oauth2, lxml, pyinstapaper
  Found existing installation: lxml 4.2.6
    Uninstalling lxml-4.2.6:
      Successfully uninstalled lxml-4.2.6
Successfully installed lxml-4.0.0 oauth2-1.9.0.post1 pyinstapaper-0.2.2


In [0]:
# dependencies
import pandas as pd
from pyinstapaper.instapaper import Instapaper, Folder

-----

## Instapaper Developer Setup and Authentication

Note: Once you get your app approved by Instapaper, this actual setup should only take a few minutes. 

### Step 1: Request Developer Access 

* Create an app and request an OAuth consumer token on Instapaper's Developer API: https://www.instapaper.com/main/request_oauth_consumer_token.
* NOTE: Approval may take a day or more, since requests go through human review.

### Step 2: Add Credentials to credentials.json

* Clone sample-credentials.json and save as credentials.json
* Copy your Consumer ID and Consumer Secret to credentials.json
* Copy your login and password to credentials.json.
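
The loading cell that follows reads four keys nested under an `instapaper` entry, so credentials.json presumably has that shape. Here is a minimal sketch that writes a placeholder file and checks it for missing values. The helper names `make_template` and `missing_keys` are illustrative assumptions, not part of sample-credentials.json or PyInstapaper.

```python
import json
import os
import tempfile

# Keys read by the notebook's credential-loading cell.
REQUIRED_KEYS = ("CONSUMERID", "CONSUMERSECRET", "LOGIN", "PASSWORD")


def make_template():
    """Return a credentials dict with empty placeholder values (hypothetical helper)."""
    return {"instapaper": {key: "" for key in REQUIRED_KEYS}}


def missing_keys(credentials):
    """List the required keys that are absent or empty (hypothetical helper)."""
    section = credentials.get("instapaper", {})
    return [key for key in REQUIRED_KEYS if not section.get(key)]


# Write the template to a temporary folder rather than overwriting a real file.
template_path = os.path.join(tempfile.mkdtemp(), "credentials.json")
with open(template_path, "w") as f:
    json.dump(make_template(), f, indent=2)

with open(template_path) as f:
    loaded = json.load(f)

# A fresh template has no values yet, so every key is reported.
print(missing_keys(loaded))
```

Running a check like this before logging in gives a clearer message than a `KeyError` from the loading cell.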

In [0]:
# get stored credentials
import json

with open("/content/drive/My Drive/Colab Notebooks/instapaper/credentials.json", "r") as file:
   credentials = json.load(file)
   instapaper_cr = credentials['instapaper']
   CONSUMERID = instapaper_cr['CONSUMERID']  # consumer key from step 1
   CONSUMERSECRET = instapaper_cr['CONSUMERSECRET']  # consumer secret from step 1
   INSTAPAPER_LOGIN = instapaper_cr['LOGIN']
   INSTAPAPER_PASSWORD = instapaper_cr['PASSWORD']

In [0]:
# api login 
instapaper = Instapaper(CONSUMERID, CONSUMERSECRET)
instapaper.login(INSTAPAPER_LOGIN, INSTAPAPER_PASSWORD)

-----

## Get Unread Articles 

In [27]:
# get unread
print("Getting unread bookmarks from Instapaper")
unread = instapaper.get_bookmarks('unread', limit=500)

unread_list = []
for i in unread:
    unread_dict = {
        'bookmark_id': i.bookmark_id,
        'title': i.title,
        'url': i.url,
        'progress_timestamp': i.progress_timestamp,
        'time': i.time,
        'progress': i.progress,
        'starred': i.starred,
        'type': i.type,
        'private_source': i.private_source,
        'read_status': 'unread'
    }
    unread_list.append(unread_dict)

# total unread
print("{} unread articles in Instapaper".format(len(unread_list)))

# create df and export 
unread_df = pd.DataFrame(unread_list)
unread_df.to_csv("/content/drive/My Drive/Colab Notebooks/instapaper/data/instapaper_unread.csv", index=False)
print("Exported to CSV")

Getting unread bookmarks from Instapaper
500 unread articles in Instapaper
Exported to CSV
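
The output above reports exactly 500 unread articles, which matches the `limit=500` passed to `get_bookmarks`. Instapaper's API documentation lists 500 as the maximum for this parameter, so the real backlog may be larger than the export. Here is a minimal sketch of a check that flags this case. `may_be_truncated` is a hypothetical helper, not part of PyInstapaper.

```python
# Assumption: 500 is the per-request cap documented for Instapaper's bookmark listing.
API_MAX_LIMIT = 500


def may_be_truncated(row_count, limit=API_MAX_LIMIT):
    """Return True when the number of rows reaches the requested limit."""
    return row_count >= limit


# Example with the count printed by the cell above.
if may_be_truncated(500):
    print("Result count equals the limit; older bookmarks may be missing.")
```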


-----

## Get Read Articles 

In [28]:
# get read
print("Getting read bookmarks from Instapaper")
read = instapaper.get_bookmarks('archive', limit=500)

read_list = []
for i in read:
    read_dict = {
        'bookmark_id': i.bookmark_id,
        'title': i.title,
        'url': i.url,
        'progress_timestamp': i.progress_timestamp,
        'time': i.time,
        'progress': i.progress,
        'starred': i.starred,
        'type': i.type, 
        'private_source': i.private_source,
        'read_status': 'read'
    }
    read_list.append(read_dict)

# create df and export 
read_df = pd.DataFrame(read_list)
read_df.to_csv("/content/drive/My Drive/Colab Notebooks/instapaper/data/instapaper_read.csv", index=False)
print("Exported {} read articles from Instapaper".format(len(read_list)))

Getting read bookmarks from Instapaper
Exported 500 read articles from Instapaper
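
The unread and archive cells build the same dictionary and differ only in the `read_status` value. Here is a minimal sketch of a shared helper, with stand-in objects in place of real bookmarks so it runs without an Instapaper account. `bookmarks_to_rows` and `fake_bookmark` are illustrative names, and the sample values are made up.

```python
from collections import Counter
from types import SimpleNamespace

# Attributes copied from each bookmark in the cells above.
BOOKMARK_FIELDS = (
    "bookmark_id", "title", "url", "progress_timestamp", "time",
    "progress", "starred", "type", "private_source",
)


def bookmarks_to_rows(bookmarks, read_status):
    """Turn bookmark objects into a list of dicts tagged with read_status."""
    rows = []
    for bm in bookmarks:
        row = {field: getattr(bm, field) for field in BOOKMARK_FIELDS}
        row["read_status"] = read_status
        rows.append(row)
    return rows


def fake_bookmark(bookmark_id, title):
    """Stand-in for a bookmark object (hypothetical, for illustration only)."""
    return SimpleNamespace(
        bookmark_id=bookmark_id, title=title,
        url="https://example.com/{}".format(bookmark_id),
        progress_timestamp=0, time=0, progress=0.0,
        starred="0", type="bookmark", private_source="",
    )


unread_sample = [fake_bookmark(1, "First"), fake_bookmark(2, "Second")]
read_sample = [fake_bookmark(3, "Third")]

all_rows = (bookmarks_to_rows(unread_sample, "unread")
            + bookmarks_to_rows(read_sample, "read"))

# Count how many articles fall under each status.
status_counts = Counter(row["read_status"] for row in all_rows)
print(dict(status_counts))
```

Passing `all_rows` to `pd.DataFrame` would give one combined table of read and unread articles, which makes the read-versus-unread totals a single `value_counts()` call on the `read_status` column.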


-----

## Get Highlights

In [0]:
# NOTE: Unfortunately there is no direct method to get just the highlights 
# So we are looping through each archived bookmark item and running get_highlights()

print("Checking for Highlights from each bookmark...")
print("This might take some time.")

highlights_list = []

for bm in read:
    highlights = bm.get_highlights()
    for i in highlights:
        highlight_dict = {
            'highlight_id': i.highlight_id,
            'text': i.text,
            'note': i.note,
            'time': i.time,
            'position': i.position,
            'bookmark_id': i.bookmark_id,
            'type': i.type,
            'slug': i.slug,
        }
        highlights_list.append(highlight_dict)

highlights_df = pd.DataFrame(highlights_list)
highlights_df.to_csv("data/instapaper_highlights.csv", index=False)
print("Exported {} Highlights from Instapaper".format(len(highlights_df)))

Checking for Highlights from each bookmark...
This might take some time.
Exported 285 Highlights from Instapaper
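
Each highlight row carries a `bookmark_id` but not the article title. Here is a minimal sketch, using plain dictionaries shaped like the rows built above, of attaching the title to each highlight. `add_titles` and the sample rows are made up for illustration.

```python
def add_titles(highlights, bookmarks):
    """Return copies of the highlight rows with the matching bookmark title added."""
    titles = {bm["bookmark_id"]: bm["title"] for bm in bookmarks}
    # Highlights whose bookmark is unknown get a title of None.
    return [dict(h, title=titles.get(h["bookmark_id"])) for h in highlights]


sample_bookmarks = [
    {"bookmark_id": 10, "title": "An article"},
    {"bookmark_id": 11, "title": "Another article"},
]
sample_highlights = [
    {"highlight_id": 1, "bookmark_id": 10, "text": "first passage"},
    {"highlight_id": 2, "bookmark_id": 10, "text": "second passage"},
    {"highlight_id": 3, "bookmark_id": 99, "text": "orphaned passage"},
]

for row in add_titles(sample_highlights, sample_bookmarks):
    print(row["title"], "|", row["text"])
```

The same join can be done in pandas with `DataFrame.merge` on `bookmark_id` once both tables are loaded.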
