# Pocket Article Downloader

Want to know how many articles you've read, and how many you still have to read, in Pocket?

This notebook walks through setup and authorization for Pocket's API, then pulls your read and unread articles and exports them to CSV.

For more info and additional configuration, see the [Pocket API Documentation](https://getpocket.com/developer/docs/overview).

-----

# Authentication and Pocket Developer Setup

Note: This setup may take a few minutes. The code is indebted to [What’s in your Pocket? Visualizing your Reading List with Python](https://www.twilio.com/blog/2017/09/whats-in-your-pocket-visualizing-your-reading-list-with-python.html). If you run into any issues, refer to that article for screenshots and more details on setup.

### Step 1: Initial Developer Setup

* Create an app on Pocket's Developer API Portal: https://getpocket.com/developer/apps/new
* Make sure you add the Retrieve permission
* Copy your Consumer Key and store it using one of the two options below:
* Option 1 (easiest, but less secure): Copy your keys and store them in the notebook
* Option 2 (more secure, since keys are not stored in the notebook): Copy sample-credentials.json to credentials.json and add your keys there
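
The layout of the credentials file is not shown here. Judging from the keys the notebook reads later (`credentials['pocket']['CONSUMERKEY']` and so on), a minimal sketch for creating one could look like the following. The helper name `write_credentials_template` and the placeholder values are hypothetical, and the real sample-credentials.json may contain other sections.

```python
import json
import os
import tempfile

def write_credentials_template(path):
    """Write a placeholder credentials file with the keys the notebook reads."""
    template = {
        "pocket": {
            "CONSUMERKEY": "add code here",  # filled in after Step 1
            "REQUESTCODE": "add code here",  # filled in after Step 2
            "ACCESSTOKEN": "add code here",  # filled in after Step 4
        }
    }
    with open(path, "w") as file:
        json.dump(template, file, indent=2)
    return template

# Demo in a temporary directory so a real credentials.json is never overwritten
demo_path = os.path.join(tempfile.mkdtemp(), "credentials.json")
write_credentials_template(demo_path)

# Read it back the same way the notebook does
with open(demo_path, "r") as file:
    loaded = json.load(file)
```

Keeping credentials.json out of version control (for example via .gitignore) is what makes this option safer than pasting keys into the notebook.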

In [1]:
# Option 1 (Easiest but less secure):  
# Copy your keys here after each step

# CONSUMERKEY = 'add code here'
# REQUESTCODE = 'add code here'
# ACCESSTOKEN = 'add code here'

In [2]:
# Option 2 (More Secure since not stored in notebook):
# Copy sample-credentials.json to credentials.json
# Uncomment the lines below and update credentials.json after each step

# import json

# with open("credentials.json", "r") as file:
#    credentials = json.load(file)
#    pocket_cr = credentials['pocket']
#    CONSUMERKEY = pocket_cr['CONSUMERKEY'] # step 1 your consumer key
#    REQUESTCODE = pocket_cr['REQUESTCODE'] # step 2 your request token
#    ACCESSTOKEN = pocket_cr['ACCESSTOKEN'] # step 4 your access token

In [3]:
# Step 2: Ask Pocket for a request code using your consumer key

# import requests
# pocket_api = requests.post('https://getpocket.com/v3/oauth/request', 
#                           data = {'consumer_key': CONSUMERKEY, 
#                                   'redirect_uri':'https://google.com'})

# uncomment line below to see your request code
# pocket_api.text
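
Rather than copying the value by hand, the response text can be parsed. This is a minimal sketch that assumes the response is form-encoded (for example `code=...`), which is Pocket's default unless JSON is requested. `parse_pocket_response` is a hypothetical helper, and the sample strings are made-up placeholders, not real codes.

```python
from urllib.parse import parse_qs

def parse_pocket_response(text):
    """Turn a form-encoded response body into a flat dict of its fields."""
    fields = parse_qs(text)
    # parse_qs returns a list per key; each key is expected once here
    return {key: values[0] for key, values in fields.items()}

# Placeholder bodies shaped like the Step 2 and Step 4 responses
request_fields = parse_pocket_response("code=example-request-code")
access_fields = parse_pocket_response("access_token=example-token&username=example-user")
```

With a real response, `parse_pocket_response(pocket_api.text)['code']` would give the value to store as REQUESTCODE.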

In [4]:
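# Step 3: Authorize the app in your browser

# Replace [Your-Request-Code] in the URL below with your request code from Step 2, then visit it:
# https://getpocket.com/auth/authorize?request_token=[Your-Request-Code]&redirect_uri=https://google.com
# Keep your request code; Step 4 uses it as REQUESTCODE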
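
To avoid editing the URL by hand, it can be assembled in code. A minimal sketch using the standard library; `build_authorize_url` is a hypothetical helper and the request code shown is a placeholder. Note that `urlencode` percent-encodes the redirect URI, which is the conventional form for a query string.

```python
from urllib.parse import urlencode

AUTHORIZE_URL = "https://getpocket.com/auth/authorize"

def build_authorize_url(request_code, redirect_uri="https://google.com"):
    """Build the browser URL used to approve the app for a request code."""
    query = urlencode({"request_token": request_code, "redirect_uri": redirect_uri})
    return f"{AUTHORIZE_URL}?{query}"

# Placeholder request code, for illustration only
url = build_authorize_url("example-request-code")
print(url)
```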

In [5]:
# Step 4: Exchange the authorized request code for an access token

# import requests
# pocket_auth = requests.post('https://getpocket.com/v3/oauth/authorize',
#                             data = {'consumer_key': CONSUMERKEY,
#                                     'code': REQUESTCODE})

# uncomment line below to see your access token
# pocket_auth.text

------

# Get and Export Current Unread Articles

In [6]:
from pocket import Pocket, PocketException
import json
import pandas as pd

In [7]:
# If this is your first time running the notebook, read "Authentication and Pocket Developer Setup"
# and follow the steps above to set your keys and tokens.
# This cell reads credentials.json (Option 2); skip it if you stored the keys in the notebook (Option 1).

with open("credentials.json", "r") as file:
    credentials = json.load(file)
    pocket_cr = credentials['pocket']
    CONSUMERKEY = pocket_cr['CONSUMERKEY'] # step 1 your consumer key
    REQUESTCODE = pocket_cr['REQUESTCODE'] # step 2 your request token
    ACCESSTOKEN = pocket_cr['ACCESSTOKEN'] # step 4 your access token

In [8]:
# Set up the Pocket client

p = Pocket(
    consumer_key=CONSUMERKEY,
    access_token=ACCESSTOKEN
)

In [9]:
# Retrieve all unread articles
articles_dict = {}

# Fetch up to 5000 unread articles in a single request (this cell does not page)
lis = p.get(state="unread", count=5000)
articles_dict.update(lis[0]['list'])

# One row per article, indexed by item id
unread_articles = pd.DataFrame.from_dict(articles_dict, orient='index')

# Convert Unix time to datetime
unread_articles['time_added'] = pd.to_datetime(unread_articles['time_added'], unit='s')
unread_articles['time_updated'] = pd.to_datetime(unread_articles['time_updated'], unit='s')
# unread_articles['time_read'] = pd.to_datetime(unread_articles['time_read'], unit='s')

In [10]:
# unread count
len(unread_articles)
# unread_articles.head()

477

In [11]:
# export to csv
unread_articles.to_csv('data/pocket_unread_articles.csv')

----

# Get and Export Read Articles

In [12]:
# Get the date your oldest archived article was added

oldest_date = ''
oldest_art = p.get(state="archive", count=1, sort='oldest')
oldest_article = oldest_art[0]['list']
# The response holds at most one item, keyed by its item id
for item_id in oldest_article:
    oldest_date = oldest_article[item_id]['time_added']

# print(oldest_date) 
# oldest_article

In [13]:
# Retrieve all archived (read) articles since the oldest date, 5000 per request

articles_dict = {}
offset = 0

while True:
    lis = p.get(since=oldest_date, state="archive", count=5000, sort='oldest', offset=offset)
    batch = lis[0]['list']
    # An empty list means there are no more articles to fetch
    if not batch:
        break
    articles_dict.update(batch)
    offset += 5000
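
The paging logic above can be tried out without calling the API. The sketch below is a generic version of the same offset-based loop, run against a stand-in data source; `fetch_all`, `fake_fetch`, and `FAKE_ITEMS` are hypothetical names for illustration, and the fake source only mimics the "empty list when nothing is left" behaviour the loop relies on.

```python
def fetch_all(fetch_page, page_size):
    """Collect items from a paged source until an empty page comes back."""
    items = {}
    offset = 0
    while True:
        page = fetch_page(offset=offset, count=page_size)
        if not page:  # an empty list or dict signals the end
            break
        items.update(page)
        offset += page_size
    return items

# Stand-in for the API: seven fake items keyed by id
FAKE_ITEMS = {str(i): {"item_id": str(i)} for i in range(7)}

def fake_fetch(offset, count):
    """Return one slice of FAKE_ITEMS, or an empty list past the end."""
    keys = list(FAKE_ITEMS)[offset:offset + count]
    return {key: FAKE_ITEMS[key] for key in keys} if keys else []

all_items = fetch_all(fake_fetch, page_size=3)
print(len(all_items))
```

Testing the loop against a fake source like this makes it easy to check the stop condition before spending real API requests.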

In [14]:
# create dataframe
read_articles = pd.DataFrame.from_dict(articles_dict, orient='index')

# convert unix time to datetime
read_articles['time_added'] = pd.to_datetime(read_articles['time_added'], unit='s')
read_articles['time_updated'] = pd.to_datetime(read_articles['time_updated'], unit='s')
read_articles['time_favorited'] = pd.to_datetime(read_articles['time_favorited'], unit='s')
read_articles['time_read'] = pd.to_datetime(read_articles['time_read'], unit='s')
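
One thing to watch with these conversions: a timestamp of 0 converts to 1970-01-01 rather than to a missing value. The sketch below assumes Pocket reports unset times (such as `time_favorited` for an article that was never favorited) as 0, which is worth confirming against your own data. `to_datetime_or_nat` is a hypothetical helper and the sample values are made up.

```python
import pandas as pd

def to_datetime_or_nat(series):
    """Convert Unix-second timestamps to datetimes; 0 or unparsable values become NaT."""
    seconds = pd.to_numeric(series, errors="coerce")
    # Keep only positive timestamps; 0 and NaN are left as NaN
    seconds = seconds.where(seconds > 0)
    return pd.to_datetime(seconds, unit="s")

# Made-up sample: a real timestamp, an unset value, and a missing value
sample = pd.Series(["1500000000", "0", None])
converted = to_datetime_or_nat(sample)
print(converted)
```

Applied to the notebook, this would be used as `read_articles['time_favorited'] = to_datetime_or_nat(read_articles['time_favorited'])`.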

In [15]:
# read_articles.columns
# read_articles.info()

In [16]:
# total read
len(articles_dict)

6080

In [17]:
# save to csv
read_articles.to_csv("data/pocket_read_articles.csv")