# Getting data from Web APIs

Many websites and applications nowadays provide access to their data and functionality through the REST architectural style. The essential feature of REST is that it is accessed over the HTTP protocol, i.e. you invoke operations by requesting their URLs.
While a call to a REST operation can be used either to read data or to create or modify it, here we focus only on reading data.

NOTE: Data in this Notebook is downloaded live, so some of the statements may fail if the jobs they refer to have been removed from AngelList or have become outdated.

## A simple example

To use a REST API, we first need to learn which endpoints (or objects) it offers. As an example, let's look at the RESTful API of AngelList: https://angel.co/api

On the left of that page we can see the different endpoints: Startups, Jobs, Comments, Follows, Search, etc.
Let's focus first on Jobs.

There are four job-related operations in the API:

    GET /jobs
    GET /jobs/:id
    GET /startups/:startup_id/jobs
    GET /tags/:tag_id/jobs


The first one gets all the jobs available (in pages, see below), the second gets the details for a particular job offering. The third gets the jobs offered by a particular startup. The fourth allows us to get the jobs that are assigned to particular tags. This is a typical structure of a REST API.
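
As a minimal sketch, the four operations map onto concrete URLs by filling in the `:id`-style placeholders. The base URL `https://api.angel.co/1` and the ids are the ones that appear elsewhere in this notebook; the helper name `job_urls` is only illustrative.

```python
BASE = "https://api.angel.co/1"  # base URL as used in the requests in this notebook


def job_urls(job_id, startup_id, tag_id):
    """Return the four job-related endpoint URLs with placeholders filled in."""
    return {
        "all_jobs": "{}/jobs".format(BASE),
        "one_job": "{}/jobs/{}".format(BASE, job_id),
        "startup_jobs": "{}/startups/{}/jobs".format(BASE, startup_id),
        "tag_jobs": "{}/tags/{}/jobs".format(BASE, tag_id),
    }


urls = job_urls(job_id=51171, startup_id=6702, tag_id=15731)
for name in sorted(urls):
    print("{}: {}".format(name, urls[name]))
```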

## Getting a piece of data

Getting data is as simple as requesting one of these URIs.

In [12]:
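import requests as rq

# Request a single startup by its id.
r = rq.get('https://api.angel.co/1/startups/6702')
print(r.status_code)              # HTTP status code (200 means success)
print(r.headers['content-type'])  # media type of the response body
print(r.text)                     # raw response body as text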

200
application/json; charset=utf-8
{"id":6702,"hidden":false,"community_profile":false,"name":"AngelList","angellist_url":"https://angel.co/angellist","logo_url":"https://d1qb2nb5cznatu.cloudfront.net/startups/i/6702-766d1ce00c99ce9a5cbc19d0c87a436e-medium_jpg.jpg?buster=1367604615","thumb_url":"https://d1qb2nb5cznatu.cloudfront.net/startups/i/6702-766d1ce00c99ce9a5cbc19d0c87a436e-thumb_jpg.jpg?buster=1367604615","quality":10,"product_desc":"AngelList is a platform for startups to meet investors, talent and incubators. \n\n- Meet investors http://angel.co/people/investors and raise money online http://angel.co/invest\n\n- Meet candidates http://angel.co/jobs\n\n- Apply to incubators http://angel.co/incubators/apply\n\n- We also have an API http://angel.co/api.\n\nAccredited investors can invest in syndicates alongside some of the best angel investors in the world. In 2014, we've invested more than $100m in over 240 startups.\n\n- Meet lead investors: angel.co/syndicates \n\n- Find gre
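
The request above assumes that the call succeeds. A slightly more defensive sketch checks the status before decoding the body. The helper name `fetch_json` and its `get` parameter are assumptions made for illustration (the parameter allows the function to be tried without network access); `raise_for_status()` and `json()` are standard methods of a `requests` response.

```python
def fetch_json(url, get=None, timeout=10):
    """Fetch a URL and return the decoded JSON body, raising on HTTP errors.

    `get` defaults to requests.get; passing another callable with the same
    interface is a hypothetical hook for offline experiments.
    """
    if get is None:
        import requests  # imported lazily so the helper can be tried offline
        get = requests.get
    response = get(url, timeout=timeout)
    response.raise_for_status()  # raises an exception for 4xx/5xx responses
    return response.json()       # decodes the JSON body into Python objects
```

For example, `fetch_json('https://api.angel.co/1/startups/6702')` would return the startup as a dict, provided the endpoint is still available.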

## Parsing JSON

JavaScript Object Notation (JSON) is a format for encoding data hierarchically, in a way similar to XML. Many REST APIs return JSON objects, as seen in the example above. We can parse JSON objects from inside Python.

In [13]:
r = rq.get('https://api.angel.co/1/jobs/51171?access_token=ff0c51459704ba520fefe32bd86ba86b6d837ae6f5d4d93c') # Replace the id if not found.
print(r.text)

{"id":51171,"title":"Data Analyst","description":"We are hiring!\n\n\n\n\nData Analyst in Athens, GA\n\n\n\n\n\n\n\n\nIterative is looking for a Data Analyst to join our Business Analytics team.\u00a0\n\n\n\n\nHere at Iterative, we help brick-and-mortar retailers improve their operations and in-store experience to ultimately drive revenue.\u00a0\n\n\n\n\nResponsibilities\u00a0\n\n\n\n\nIterative is looking for Data Analyst to validate insights from retail data. As part of a growing and dynamic organisation, you will have a unique opportunity to work in an innovative product for the retail industry. \u00a0\n\n\n\n\nPosition Description:\u00a0\n\n\n\n\n\u2022\tValidate analysis of large data sets for key internal and external stakeholders.\u00a0\n\u2022\tPrepare customer-facing presentations for a variety of retail customers.\u00a0\n\u2022\tCollaborate with other analysts as well as other functional groups within the organization on existing and new data product development.\u00a0\n\u202

In [14]:
import json
job = json.loads(r.text)           # parse the JSON text into Python objects
print(json.dumps(job, indent=2))   # pretty-print with 2-space indentation
print(type(job))

{
  "startup": {
    "community_profile": false, 
    "thumb_url": "https://d1qb2nb5cznatu.cloudfront.net/startups/i/359901-686be18becf13d1b4af60fc3ba708c90-thumb_jpg.jpg?buster=1394710511", 
    "name": "Iterative", 
    "product_desc": "Our product is a retail analytics platform for the brick mortar store. We are manager of managers and we turn complex unstructured pools of data into simple actions. Our data collection engine aggregates data from video sources, merchandise sensors, beacons, network traffic, data bases, etc. \n\nOur output is automated and specific to each store location. We provide the store managers (staff) with simple actions so each location archives maximum profitability.", 
    "angellist_url": "https://angel.co/iterative", 
    "high_concept": "Retail analytics platform to turn data into simple automated actions", 
    "created_at": "2014-03-13T11:35:18Z", 
    "updated_at": "2015-04-27T14:25:53Z", 
    "company_url": "http://www.it-erative.com", 
    "follower

## Dictionaries in Python

JSON documents are largely built from objects, i.e. maps from keys (strings) to values, much like dictionaries. A value can itself be another object. For example, in the JSON above, the key "title" has a string as its value, whereas the key "startup" is associated with another dictionary whose key-value pairs describe the startup posting the job.

Python has a dictionary type that is very similar to JSON structures.

In [15]:
myoffer = {"title": "Data scientist", "job_type": "full-time"}
print(myoffer["job_type"])       # look up a value by its key
myoffer["salary_max"] = 200000   # add a new key-value pair
print(len(myoffer))              # number of keys in the dictionary

full-time
3
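
Nested dictionaries are accessed by chaining keys, and `dict.get` returns a default instead of raising `KeyError` when a key is absent. A small sketch with made-up values (the `offer` dictionary is illustrative, not API output):

```python
offer = {
    "title": "Data scientist",
    "startup": {"name": "Example Startup", "angellist_url": None},
}

# Chained keys reach into the nested dictionary.
print(offer["startup"]["name"])

# offer["salary_max"] would raise KeyError; get() returns a default instead.
print(offer.get("salary_max", 0))

# Iterating over a dictionary yields its keys.
for key in sorted(offer):
    print(key)
```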


## Converting a JSON document to a Dictionary

The <code>json.loads</code> function converts JSON documents into Python data types using the conventions documented at:
    https://docs.python.org/3/library/json.html#json.loads
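
In short, per the conversion table in that documentation, JSON objects become `dict`, arrays become `list`, strings become `str` (`unicode` in Python 2), numbers become `int` or `float`, `true` and `false` become `True` and `False`, and `null` becomes `None`. This can be checked on a small made-up document:

```python
import json

# A made-up document containing one value of each JSON type.
doc = ('{"id": 1, "ratio": 0.5, "tags": ["a", "b"], '
       '"open": true, "salary": null, "startup": {"name": "X"}}')
parsed = json.loads(doc)

# Print the Python type that each JSON value was converted to.
for key in sorted(parsed):
    print("{}: {}".format(key, type(parsed[key]).__name__))
```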

The single job above is a JSON object, so it is converted directly to a dict.

In [16]:
job["tags"]

[{u'angellist_url': u'https://angel.co/data-analysis',
  u'display_name': u'Data Analysis',
  u'id': 15731,
  u'name': u'data analysis',
  u'tag_type': u'SkillTag'},
 {u'angellist_url': u'https://angel.co/financial-modeling',
  u'display_name': u'Financial Modeling',
  u'id': 26875,
  u'name': u'financial modeling',
  u'tag_type': u'SkillTag'},
 {u'angellist_url': u'https://angel.co/financial-analysis',
  u'display_name': u'Financial Analysis',
  u'id': 29625,
  u'name': u'financial analysis',
  u'tag_type': u'SkillTag'},
 {u'angellist_url': u'https://angel.co/excel',
  u'display_name': u'Excel',
  u'id': 61036,
  u'name': u'excel',
  u'tag_type': u'SkillTag'},
 {u'angellist_url': u'https://angel.co/athens-ga',
  u'display_name': u'Athens, GA',
  u'id': 109037,
  u'name': u'athens, ga',
  u'tag_type': u'LocationTag'},
 {u'angellist_url': u'https://angel.co/devops-3',
  u'display_name': u'DevOps',
  u'id': 150979,
  u'name': u'devops',
  u'tag_type': u'RoleTag'}]
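
Since `job["tags"]` is a list of dictionaries, ordinary list comprehensions and loops can filter it. The sketch below uses a shortened copy of the output above, so that it runs without a network call:

```python
# Shortened copy of some of the tags shown above.
tags = [
    {"display_name": "Data Analysis", "tag_type": "SkillTag"},
    {"display_name": "Excel", "tag_type": "SkillTag"},
    {"display_name": "Athens, GA", "tag_type": "LocationTag"},
    {"display_name": "DevOps", "tag_type": "RoleTag"},
]

# Keep only the display names of the skill tags.
skills = [t["display_name"] for t in tags if t["tag_type"] == "SkillTag"]
print(skills)

# Group all display names by tag type.
by_type = {}
for t in tags:
    by_type.setdefault(t["tag_type"], []).append(t["display_name"])
print(by_type["LocationTag"])
```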

Let's now work with a page of job results:

In [21]:
r = rq.get('https://api.angel.co/1/jobs?access_token=ff0c51459704ba520fefe32bd86ba86b6d837ae6f5d4d93c')
jobs = json.loads(r.text)
# print(json.dumps(jobs, indent=2))
print(type(jobs))
print(type(jobs["jobs"]))
print(list(jobs.keys()))
print(jobs["per_page"])
print(jobs["total"])
print(jobs["last_page"])

<type 'dict'>
<type 'list'>
[u'per_page', u'last_page', u'total', u'jobs', u'page']
50
21603
433


The <code>per_page</code>, <code>last_page</code>, <code>total</code> and <code>page</code> fields allow us to retrieve all the job offerings using several GET calls.
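
Retrieving every page could be sketched as follows. The sketch assumes that a given page can be requested (for instance through a `page` query parameter, which is an assumption not shown in this notebook) and wraps that request in a caller-supplied function `fetch_page`; both that name and `fetch_all_jobs` are hypothetical.

```python
def fetch_all_jobs(fetch_page):
    """Collect the jobs from every page.

    fetch_page(n) is a hypothetical callable returning the parsed JSON of
    page n, i.e. a dict with at least the keys 'jobs' and 'last_page'.
    """
    all_jobs = []
    page = 1
    while True:
        result = fetch_page(page)
        all_jobs.extend(result["jobs"])
        if page >= result["last_page"]:
            break
        page += 1
    return all_jobs


# Offline stand-in for the API: three pages with two jobs each.
def fake_fetch_page(n):
    first_id = (n - 1) * 2
    return {"jobs": [{"id": first_id}, {"id": first_id + 1}],
            "page": n, "last_page": 3, "per_page": 2, "total": 6}


print(len(fetch_all_jobs(fake_fetch_page)))
```

With `requests`, `fetch_page` might wrap something like `rq.get(url, params={'page': n}).json()`, assuming the API accepts a `page` parameter.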

## Transforming data into DataFrames

We can build data frames from JSON data by first creating the columns (Series) and then populating one row per JSON fragment that is retrieved.

In [22]:
import pandas as pd

jobsframe = pd.DataFrame(columns=["Name", "Type", "MaxSalary", "MinSalary", "Currency", "MaxEquity"])


def add_row(df, joboffer):
    """Add one job offer to df as a row indexed by the job id."""
    name = joboffer["startup"]["name"]
    typ = joboffer["job_type"]
    ms = joboffer["salary_max"]
    mis = joboffer["salary_min"]
    c = joboffer["currency_code"]
    me = joboffer["equity_max"]
    # Treat a missing equity value as 0; otherwise convert it to a float.
    if me is None:
        me = 0
    else:
        me = float(me)
    df.loc[joboffer["id"]] = [name, typ, ms, mis, c, me]
    
for row in jobs["jobs"]:
    add_row(jobsframe, row)

# 65312 was the id of the first job returned when this cell was run.
print(type(jobsframe["MaxEquity"][65312]))
print(type(jobsframe["MaxSalary"][65312]))


jobsframe.head(10)


<type 'numpy.float64'>
<type 'numpy.float64'>


| | Name | Type | MaxSalary | MinSalary | Currency | MaxEquity |
|---|---|---|---|---|---|---|
| 65312 | WEPUL | full-time | 60000.0 | 40000.0 | GBP | 0.0 |
| 65311 | Box8 | full-time | 3000000.0 | 1000000.0 | INR | 0.0 |
| 65310 | Graduate The Globe | cofounder | 0.0 | 0.0 | USD | 20.0 |
| 65309 | Finerd | internship | 0.0 | 0.0 | USD | 0.0 |
| 65308 | Finerd | internship | 0.0 | 0.0 | USD | 0.0 |
| 65307 | Box8 | full-time | 300000.0 | 1000000.0 | INR | 0.0 |
| 65306 | Acomodeo | full-time | 5000.0 | 5000.0 | EUR | 0.0 |
| 65305 | Monefy | cofounder | | | USD | 2.0 |
| 65304 | TACS | contract | 20000.0 | 15000.0 | USD | 0.0 |
| 65303 | Omnisphyr Mobiquity | full-time | 1200000.0 | 300000.0 | INR | 0.5 |
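
Assigning to `df.loc` one row at a time works, but an alternative is to flatten each job into a plain dictionary first and build the frame in a single call. The sketch below does the flattening in pure Python; the helper name `job_to_row` and the two sample jobs are made up, using the same field names as `add_row`.

```python
def job_to_row(joboffer):
    """Flatten one job dictionary into the columns used in the frame above."""
    equity = joboffer["equity_max"]
    return {
        "id": joboffer["id"],
        "Name": joboffer["startup"]["name"],
        "Type": joboffer["job_type"],
        "MaxSalary": joboffer["salary_max"],
        "MinSalary": joboffer["salary_min"],
        "Currency": joboffer["currency_code"],
        # A missing equity value is treated as 0, as in add_row.
        "MaxEquity": 0.0 if equity is None else float(equity),
    }


# Made-up jobs with the same structure as the API results.
sample_jobs = [
    {"id": 1, "startup": {"name": "A"}, "job_type": "full-time",
     "salary_max": 60000, "salary_min": 40000, "currency_code": "GBP",
     "equity_max": None},
    {"id": 2, "startup": {"name": "B"}, "job_type": "cofounder",
     "salary_max": 0, "salary_min": 0, "currency_code": "USD",
     "equity_max": "20.0"},
]

rows = [job_to_row(j) for j in sample_jobs]
print(rows[1]["MaxEquity"])
```

The resulting list can then be turned into a frame with `pd.DataFrame(rows).set_index("id")`.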


In [23]:
print(len(jobsframe))

50


In [24]:
print(len(jobsframe[jobsframe["MaxSalary"] != 0]))

41
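
Note that a missing salary (the empty cells in the table above) is presumably stored as NaN in a float column, and NaN compares as different from 0, so such rows are included in this count. A pure-Python sketch of the same comparison, using the ten `MaxSalary` values displayed above:

```python
import math

nan = float("nan")
# MaxSalary values of the ten rows shown above; nan stands for the empty cell.
max_salaries = [60000.0, 3000000.0, 0.0, 0.0, 0.0,
                300000.0, 5000.0, nan, 20000.0, 1200000.0]

# NaN != 0 is True, so the missing value is counted here.
nonzero = [s for s in max_salaries if s != 0]
print(len(nonzero))

# Excluding NaN explicitly leaves only the known, non-zero salaries.
known_nonzero = [s for s in nonzero if not math.isnan(s)]
print(len(known_nonzero))
```

In pandas, the analogous filter would combine the comparison with `notnull()`, e.g. `jobsframe[(jobsframe["MaxSalary"] != 0) & jobsframe["MaxSalary"].notnull()]`.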
