# APIs

The idea of an application programming interface (API) is to enable (or make more efficient) communication between separate pieces of software, often running on different machines.

For data science purposes, you may find yourself on a website wondering how to access its data. You may or may not be able to, but if you can, you'll probably do it through an API.

Reddit's is pretty easy to use. Let's check it out!

In [1]:
import requests
import json
import pandas as pd

In [2]:
URL = 'http://www.reddit.com/hot.json'

In [3]:
requests.get(URL).status_code

429

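Well, not quite that easy. Every request that .get() sends includes a "User-Agent" header identifying the client. The default one from requests is a generic "python-requests/<version>" string, and Reddit answers requests carrying it with status code 429 ("Too Many Requests"). So we'll set our own user-agent.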
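To see what the default user-agent actually is, you can build a request without sending it and inspect its headers. This is a minimal sketch: it relies on `requests.Session.prepare_request`, so no network call is made, and the bot name is simply the example value used in the next cell.

```python
import requests

URL = 'http://www.reddit.com/hot.json'
session = requests.Session()

# Prepare (but don't send) one request with the default headers and one with a custom User-agent
default_req = session.prepare_request(requests.Request('GET', URL))
custom_req = session.prepare_request(
    requests.Request('GET', URL, headers={'User-agent': 'gadamico Bot 0.1'})
)

# Header lookups are case-insensitive, so 'User-Agent' also finds the 'User-agent' key
print(default_req.headers['User-Agent'])  # e.g. python-requests/2.x.y
print(custom_req.headers['User-Agent'])   # gadamico Bot 0.1
```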

In [4]:
# Add in a parameter called 'headers' to the .get() call below and set it equal to
# a dictionary with 'User-agent' as key and '<YOUR NAME> Bot 0.1' as value.

res = requests.get(URL, headers={'User-agent': 'gadamico Bot 0.1'})

res.status_code

200

In [5]:
reddits = res.json()
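`res.json()` parses the body of the response as JSON; it is roughly the same as calling `json.loads(res.text)` with the `json` module imported at the top. The sketch below uses a small, made-up response body (not real Reddit output) so it runs without a network call.

```python
import json

# Hypothetical, heavily trimmed response body shaped like Reddit's listing JSON
raw_body = '{"kind": "Listing", "data": {"after": "t3_ezsdk5", "before": null, "children": []}}'

parsed = json.loads(raw_body)    # what res.json() does with the real body
print(type(parsed))              # <class 'dict'>
print(parsed['data']['before'])  # None: JSON null becomes Python None
```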

Let's explore the structure of the Reddit data:

In [6]:
type(reddits)

dict

In [7]:
reddits.keys()

dict_keys(['kind', 'data'])

Let's try the 'data' key:

In [8]:
type(reddits['data'])

dict

Another dictionary! Let's explore this further:

In [9]:
reddits['data'].keys()

dict_keys(['modhash', 'dist', 'children', 'after', 'before'])

'before' and 'after' are used for pagination: they mark where this batch of posts starts and ends, so you can ask for the previous or next page. Let's look under 'children':
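Reddit's listing endpoints accept an `after` query parameter, so the `'after'` value can be fed back into the next request to fetch the following page. Below is a minimal sketch that only builds the next URL; the helper name `next_page_url` and the `limit` of 25 are illustrative choices, and with requests you could pass the same values through the `params=` argument of `requests.get()` instead.

```python
from urllib.parse import urlencode

def next_page_url(base_url, listing, limit=25):
    """Return the URL for the page after `listing`, or None when there are no more pages."""
    after = listing['data'].get('after')
    if after is None:
        return None
    # 'after' holds the name (e.g. 't3_...') of the last post on the current page
    query = urlencode({'after': after, 'limit': limit})
    return base_url + '?' + query

# Usage with a hypothetical, trimmed listing
sample = {'data': {'after': 't3_ezsdk5', 'before': None, 'children': []}}
print(next_page_url('http://www.reddit.com/hot.json', sample))
# http://www.reddit.com/hot.json?after=t3_ezsdk5&limit=25
```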

In [10]:
type(reddits['data']['children'])

list
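Drilling down one key at a time works, but a small recursive helper can print the whole outline at once. This is a hypothetical helper (`outline` is not part of any library); it assumes lists are homogeneous, so it only describes the first element, and it stops after `max_depth` levels.

```python
def outline(obj, path='', depth=0, max_depth=2):
    """Return one line per nested value: its access path and its type name."""
    lines = [f"{path or '<root>'}: {type(obj).__name__}"]
    if depth >= max_depth:
        return lines
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines += outline(value, f"{path}['{key}']", depth + 1, max_depth)
    elif isinstance(obj, list) and obj:
        # Assume the list is homogeneous and describe only its first element
        lines += outline(obj[0], f"{path}[0]", depth + 1, max_depth)
    return lines

# Hypothetical, trimmed stand-in for the `reddits` dictionary
sample = {'kind': 'Listing', 'data': {'after': 't3_ezsdk5', 'children': [{'kind': 't3', 'data': {}}]}}
for line in outline(sample):
    print(line)
```

On the real data you would call `outline(reddits)`, raising `max_depth` to see further down.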

In [11]:
reddits['data']['children'][0]

{'kind': 't3',
 'data': {'approved_at_utc': None,
  'subreddit': 'worldnews',
  'selftext': '',
  'author_fullname': 't2_4mhxmloq',
  'saved': False,
  'mod_reason_title': None,
  'gilded': 1,
  'clicked': False,
  'title': 'An anti-Putin blogger was murdered in a French hotel, and the killing has the hallmarks of the Russian hit squad causing chaos in Europe',
  'link_flair_richtext': [],
  'subreddit_name_prefixed': 'r/worldnews',
  'hidden': False,
  'pwls': 6,
  'link_flair_css_class': 'russia',
  'downs': 0,
  'thumbnail_height': 70,
  'hide_score': False,
  'name': 't3_ezsdk5',
  'quarantine': False,
  'link_flair_text_color': 'dark',
  'author_flair_background_color': None,
  'subreddit_type': 'public',
  'ups': 45302,
  'total_awards_received': 1,
  'media_embed': {},
  'thumbnail_width': 140,
  'author_flair_template_id': None,
  'is_original_content': False,
  'user_reports': [],
  'secure_media': None,
  'is_reddit_media_domain': False,
  'is_meta': False,
  'category': None

There's a dictionary in here (also, confusingly, called 'data') that looks important. Let's explore that. (Now we're really getting deep!)

In [12]:
reddits['data']['children'][0]['data']

{'approved_at_utc': None,
 'subreddit': 'worldnews',
 'selftext': '',
 'author_fullname': 't2_4mhxmloq',
 'saved': False,
 'mod_reason_title': None,
 'gilded': 1,
 'clicked': False,
 'title': 'An anti-Putin blogger was murdered in a French hotel, and the killing has the hallmarks of the Russian hit squad causing chaos in Europe',
 'link_flair_richtext': [],
 'subreddit_name_prefixed': 'r/worldnews',
 'hidden': False,
 'pwls': 6,
 'link_flair_css_class': 'russia',
 'downs': 0,
 'thumbnail_height': 70,
 'hide_score': False,
 'name': 't3_ezsdk5',
 'quarantine': False,
 'link_flair_text_color': 'dark',
 'author_flair_background_color': None,
 'subreddit_type': 'public',
 'ups': 45302,
 'total_awards_received': 1,
 'media_embed': {},
 'thumbnail_width': 140,
 'author_flair_template_id': None,
 'is_original_content': False,
 'user_reports': [],
 'secure_media': None,
 'is_reddit_media_domain': False,
 'is_meta': False,
 'category': None,
 'secure_media_embed': {},
 'link_flair_text': 'Russia
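Since every post's fields live one level down, under its own `'data'` key, pulling a single field out of all posts is a list comprehension. A minimal sketch on made-up posts (the titles and upvote counts are placeholders, not real Reddit data):

```python
# Hypothetical, trimmed stand-in for reddits['data']['children']
children = [
    {'kind': 't3', 'data': {'subreddit': 'worldnews', 'title': 'Post A', 'ups': 10}},
    {'kind': 't3', 'data': {'subreddit': 'movies', 'title': 'Post B', 'ups': 20}},
]

# Every post's fields live under child['data']
subreddits = [child['data']['subreddit'] for child in children]
most_upvoted = max(children, key=lambda child: child['data']['ups'])['data']['title']

print(subreddits)    # ['worldnews', 'movies']
print(most_upvoted)  # Post B
```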

Suppose we want to put all these posts into a DataFrame, with one row per post:

In [13]:
# Your code below!

posts = []

for post in reddits['data']['children']:
    posts.append(post['data'])

reddits_df = pd.DataFrame(posts)

reddits_df.head(2)

|   | approved_at_utc | subreddit | selftext | author_fullname | saved | mod_reason_title | gilded | clicked | title | link_flair_richtext | ... | permalink | parent_whitelist_status | stickied | url | subreddit_subscribers | created_utc | num_crossposts | media | is_video | link_flair_template_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | worldnews |  | t2_4mhxmloq | False |  | 1 | False | An anti-Putin blogger was murdered in a French... | [] | ... | /r/worldnews/comments/ezsdk5/an_antiputin_blog... | all_ads | False | https://www.businessinsider.com/anti-putin-blo... | 23068763 | 1580996000.0 | 12 |  | False |  |
| 1 |  | movies | I watched that scene, I Need a Hero recently. ... | t2_46uhanzr | False |  | 2 | False | Shrek 2 has the best climax ever. | [] | ... | /r/movies/comments/eztqbm/shrek_2_has_the_best... | all_ads | False | https://www.reddit.com/r/movies/comments/eztqb... | 22240311 | 1581002000.0 | 4 |  | False |  |
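The loop above keeps only each post's inner 'data' dictionary. A list comprehension does the same thing in one line, and `pd.json_normalize` (top-level in pandas 1.0 and later) is another option that also flattens nested dictionaries into dotted column names. A minimal sketch on made-up posts:

```python
import pandas as pd

# Hypothetical stand-in for reddits['data']['children']
children = [
    {'kind': 't3', 'data': {'subreddit': 'worldnews', 'title': 'Post A', 'ups': 10}},
    {'kind': 't3', 'data': {'subreddit': 'movies', 'title': 'Post B', 'ups': 20}},
]

# Same result as the loop: one row per post, one column per key in child['data']
posts_df = pd.DataFrame([child['data'] for child in children])

# json_normalize keeps the outer 'kind' key and prefixes nested keys with 'data.'
flat_df = pd.json_normalize(children)

print(list(posts_df.columns))  # ['subreddit', 'title', 'ups']
print(list(flat_df.columns))   # ['kind', 'data.subreddit', 'data.title', 'data.ups']
```

The comprehension matches the original loop exactly; `json_normalize` is handy when you also want fields from outside the inner 'data' dictionary.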


In [14]:
# What columns do we have? Transposing the first two rows lists one column per row, which is easier to scan.

reddits_df.head(2).T

|   | 0 | 1 |
|---|---|---|
| approved_at_utc |  |  |
| subreddit | worldnews | movies |
| selftext |  | I watched that scene, I Need a Hero recently. ... |
| author_fullname | t2_4mhxmloq | t2_46uhanzr |
| saved | False | False |
| ... | ... | ... |
| created_utc | 1.581e+09 | 1.581e+09 |
| num_crossposts | 12 | 4 |
| media |  |  |
| is_video | False | False |
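The transposed view shows which columns exist, while `.shape` gives the count directly. The `created_utc` values (displayed as 1.581e+09) are Unix timestamps in seconds, which `pd.to_datetime(..., unit='s')` turns into readable dates. A sketch on a hypothetical one-row frame, using the rounded timestamp from the first row above:

```python
import pandas as pd

# Hypothetical stand-in for reddits_df with just two columns
df = pd.DataFrame({'subreddit': ['worldnews'], 'created_utc': [1580996000.0]})

n_rows, n_cols = df.shape
print(n_cols)  # 2

# created_utc counts seconds since 1970-01-01 00:00:00 UTC
df['created'] = pd.to_datetime(df['created_utc'], unit='s')
print(df['created'].iloc[0])  # 2020-02-06 13:33:20
```

On the real data, `reddits_df.shape[1]` answers the column-count question.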


For more on working with APIs in Python, check out [this page](https://www.pythonforbeginners.com/api/).

Often when you interact with a piece of software through an API, you'll need an API key. Foursquare's API is a case in point. Let's turn to that now.
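The details differ from one API to the next, but a common pattern is to keep the key out of your code (for example, in an environment variable) and send it with each request, often as a query parameter or a header. In the sketch below, the endpoint `https://api.example.com/v1/search`, the environment variable `MY_API_KEY`, the fallback `'demo-key'`, and the parameter names `api_key` and `query` are all hypothetical placeholders, not Foursquare's actual API; the provider's documentation gives the real names.

```python
import os
import requests

# Hypothetical: read the key from an environment variable instead of hard-coding it
api_key = os.environ.get('MY_API_KEY', 'demo-key')

# Prepare (but don't send) a request that passes the key as a query parameter
req = requests.Request(
    'GET',
    'https://api.example.com/v1/search',  # hypothetical endpoint
    params={'api_key': api_key, 'query': 'coffee'},
).prepare()

print(req.url)
```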