# Web Scraping with APIs

### What is an API?
An **Application Programming Interface (API)** lets developers build sophisticated software with minimal code by reusing functionality that someone else has already written. APIs act as prepackaged functionality that developers can drop into their code. For example, consider apps that use map-based location. Almost none of these apps built their own mapping technology, because that would be a costly endeavor for any small company. Instead, they most likely used a pre-built map API, such as Google's. Using an existing API also spares them the overhead of building and maintaining that functionality themselves.

### How does an API work?
Basically, an API acts as an intermediary between two pieces of software. Using an API in code requires two steps:

1. The script sends a **GET** request, including any parameters, to the API.
2. After the request is sent, the API returns a response, typically encoded as JSON.

Note: **JSON** (JavaScript Object Notation) is the most common format for exchanging data with APIs, and most API servers send their responses as JSON.
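To make the second step concrete, here is a minimal sketch of how Python's built-in `json` module maps JSON types onto Python types. The response body is a small made-up example, not real output from any service: JSON objects become dicts, arrays become lists, `true` becomes `True`, and `null` becomes `None`.

```python
import json

# A small, made-up API response body (not from a real service)
raw = '{"code": "OK", "total": 1, "items": [{"title": "Sample", "in_stock": true, "color": null}]}'

# Parse the JSON text into Python objects
data = json.loads(raw)

print(type(data))                    # JSON object -> dict
print(type(data["items"]))           # JSON array  -> list
print(data["items"][0]["in_stock"])  # JSON true   -> True
print(data["items"][0]["color"])     # JSON null   -> None
```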

### Why use an API?
Since APIs give us direct access to data from the web server, using one is usually a better option than building a web scraper from scratch. A web scraper is the fallback for extracting data when the web server does not provide an API.

## Part I - Creating API Requests

In this section, we explore the first step of using an API: creating the request. To practice, we use the free API from [upcitemdb.com](https://devs.upcitemdb.com/). Select the "Explorer FREE" plan, which requires no sign-up.
<br>
<br>
<div>
    <img src="images/upcitemdb.png"/>
</div>
<br>

This service translates barcodes into a host of product information, including the product's name and brand. To see what an API call should look like, use their online GUI. For this demo, we look up the barcode of a Crosley Furniture product.

#### Sending API Request:
<br>
<div>
    <img src="images/upcitemdb_request.png"/>
</div>
<br>

#### Returned JSON Response:
<div>
    <img src="images/upcitemdb_get.png"/>
</div>
<br>

This is a lot of information to break down, so for now let's focus on the request URL line at the top. This line contains two parts: the base URL (everything before the question mark) and the parameters of the API call (everything after the question mark). The base URL is independent of our parameters and stays the same for any barcode we look up. The parameters, on the other hand, are specific to this barcode search. This may sound complex, but it translates very simply into code.

Let's start by constructing the base URL for the API call!

```python
# Import the dependencies
import requests
```

```python
# Assign the base URL
baseURL = 'https://api.upcitemdb.com/prod/trial/lookup'
```

```python
# Construct the parameters for the API call
# The parameters are a Python dictionary
parameters = {'upc': '710244229739'}
```

```python
# Send the GET request containing the base URL and the parameters
response = requests.get(baseURL, params=parameters)
```

```python
# Print the URL that requests actually called
print(response.url)
```

```
https://api.upcitemdb.com/prod/trial/lookup?upc=710244229739
```
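`requests` URL-encodes the `params` dictionary and appends it to the base URL after a `?`. The sketch below reproduces that idea with the standard library's `urllib.parse` for this simple one-parameter case. It is an illustration of the URL structure, not how `requests` is actually implemented, and it sends no request.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

baseURL = "https://api.upcitemdb.com/prod/trial/lookup"
parameters = {"upc": "710244229739"}

# Join the base URL and the URL-encoded parameters with a "?"
full_url = baseURL + "?" + urlencode(parameters)
print(full_url)

# Going the other way: split a request URL back into its two parts
parts = urlsplit(full_url)
print(parts.scheme + "://" + parts.netloc + parts.path)  # the base URL
print(parse_qs(parts.query))                             # {'upc': ['710244229739']}
```

Note that `parse_qs` returns a list for every key, because a query string may legally repeat a parameter.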


## Part II - Parsing through JSON

In the previous section, we built the API request to the UPCitemdb interface; now it's time to work with the API's response. Recall that APIs transfer information as JSON documents, which Python does not treat as a native data structure. To make the response easier to navigate, we use Python's built-in **json** module. Its **loads** function converts a JSON document into Python objects (here, a dictionary), making the API's response much more Python-friendly.

```python
# Import the json module from the standard library
import json
```

```python
# Get the raw body of the response (a bytes object)
content = response.content
```

```python
# Print the content
print(content)
```

```
b'{"code":"OK","total":1,"offset":0,"items":[{"ean":"0710244229739","title":"Crosley Outdoor End Table Furniture Cover","description":"Protects furniture from sun, rain, snow, dirt, sap and more; Cover is waterproof; Made from heavy gauge, reinforced vinyl; Gray vinyl; Puncture resistant; Scratch resistant lining; Sewn-in drawstrings secure covers to furniture; Fully assembled","upc":"710244229739","brand":"Crosley Furniture","model":"CO7504-GY","color":"Gray","size":"","dimension":"","weight":"1.33lb","category":"Furniture > Outdoor Furniture > Outdoor Ottomans","currency":"","lowest_recorded_price":23.15,"highest_recorded_price":119.84,"images":["https://images.homedepot-static.com/productImages/910c3957-2a41-475a-a2f5-804812da423c/svn/crosley-furniture-patio-table-covers-co7504-gy-64_1000.jpg","https://i5.walmartimages.com/asr/03116700-8443-495a-a449-d648981d0d7d_1.8937966bee9f7e5ab6cd924844e702c6.jpeg?odnHeight=450&odnWidth=450&odnBg=ffffff","http://site.unbeatablesale.com/MDMC3356
```

```python
# Convert the JSON document into a Python dictionary
info = json.loads(content)
```

```python
# Print the type of info and its contents
print(type(info))
print(info)
```

```
<class 'dict'>
{'code': 'OK', 'total': 1, 'offset': 0, 'items': [{'ean': '0710244229739', 'title': 'Crosley Outdoor End Table Furniture Cover', 'description': 'Protects furniture from sun, rain, snow, dirt, sap and more; Cover is waterproof; Made from heavy gauge, reinforced vinyl; Gray vinyl; Puncture resistant; Scratch resistant lining; Sewn-in drawstrings secure covers to furniture; Fully assembled', 'upc': '710244229739', 'brand': 'Crosley Furniture', 'model': 'CO7504-GY', 'color': 'Gray', 'size': '', 'dimension': '', 'weight': '1.33lb', 'category': 'Furniture > Outdoor Furniture > Outdoor Ottomans', 'currency': '', 'lowest_recorded_price': 23.15, 'highest_recorded_price': 119.84, 'images': ['https://images.homedepot-static.com/productImages/910c3957-2a41-475a-a2f5-804812da423c/svn/crosley-furniture-patio-table-covers-co7504-gy-64_1000.jpg', 'https://i5.walmartimages.com/asr/03116700-8443-495a-a449-d648981d0d7d_1.8937966bee9f7e5ab6cd924844e702c6.jpeg?odnHeight=450&odnWidth=450&odnB
```
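Note that `response.content` is a `bytes` object, hence the leading `b'` in the printed output. The sketch below, using a shortened made-up stand-in for the real response body, shows that `json.loads` accepts bytes directly (Python 3.6 and later) and gives the same result as decoding to a string first. `requests` also offers `response.text` (the body decoded to a string) and `response.json()`, which parses the JSON body for you, so `info = response.json()` is a common shortcut.

```python
import json

# Shortened, made-up stand-in for response.content (raw bytes)
content = b'{"code":"OK","total":1,"items":[{"title":"Crosley Outdoor End Table Furniture Cover"}]}'

# Option 1: pass the bytes straight to json.loads (supported since Python 3.6)
info_from_bytes = json.loads(content)

# Option 2: decode the bytes to a str first, then parse
info_from_text = json.loads(content.decode("utf-8"))

print(info_from_bytes == info_from_text)  # True
```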

Now that we have a Python dictionary, we can extract the data we need by key. We want the item's title, brand name, highest price, and lowest price. All of these fields live inside the first element of the list stored under the "items" key, so we begin by extracting that list from the dictionary.

```python
# Extract the list of matching items from the dictionary
items = info['items']
```

```python
# Take the first (and here, only) item in the list
itemInfo = items[0]
```

```python
# Extract the title of the item
title = itemInfo['title']
```

```python
# Extract the brand name of the item
brand = itemInfo['brand']
```

```python
# Extract the highest recorded price
high = itemInfo['highest_recorded_price']
```

```python
# Extract the lowest recorded price
low = itemInfo['lowest_recorded_price']
```

```python
# Print the title, brand, and prices
print("Item Title: ", title)
print("Item Brand: ", brand)
print("Highest Price: ", high)
print("Lowest Price: ", low)
```

```
Item Title:  Crosley Outdoor End Table Furniture Cover
Item Brand:  Crosley Furniture
Highest Price:  119.84
Lowest Price:  23.15
```
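Indexing with `info['items'][0]` raises an exception if the lookup finds nothing. A slightly more defensive version, sketched below as a hypothetical helper `summarize_first_item`, uses `dict.get` and checks for an empty list first. We assume here that a lookup with no match comes back with an empty `items` list; check UPCitemdb's documentation for the exact behavior. The sample dictionary is a trimmed copy of the fields printed above.

```python
def summarize_first_item(info):
    """Return selected fields of the first item, or None if nothing matched.

    Hypothetical helper. dict.get returns None for a missing key
    instead of raising KeyError.
    """
    items = info.get("items", [])
    if not items:  # assumption: no match -> empty (or missing) "items" list
        return None
    first = items[0]
    return {
        "title": first.get("title"),
        "brand": first.get("brand"),
        "high": first.get("highest_recorded_price"),
        "low": first.get("lowest_recorded_price"),
    }

# Trimmed copy of the fields printed above
info = {
    "code": "OK",
    "total": 1,
    "items": [{
        "title": "Crosley Outdoor End Table Furniture Cover",
        "brand": "Crosley Furniture",
        "highest_recorded_price": 119.84,
        "lowest_recorded_price": 23.15,
    }],
}

print(summarize_first_item(info))
print(summarize_first_item({"code": "OK", "total": 0, "items": []}))  # None
```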


## Part III - Using API Keys

Most API services require an **API key** for access. Requiring a key lets the API provider see who is calling their software and monitor how many calls each client makes. This matters because the provider has ongoing costs to maintain and update the interface, and by monitoring their clients' calls they can price plans appropriately for each client's needs. As a result, some APIs are locked behind accounts or paywalls, so we need to create an account with the organization hosting the API to obtain a key.

In this demo, we use OpenWeatherMap's API. You can create a free account [here](https://home.openweathermap.org/users/sign_up), which gives you access to the base features. Once you complete the registration process, they will send you a unique API key for your account, which you must include when constructing requests.
<br>
<br>
<div>
    <img src="images/OpenWeather.png"/>
</div>
<br>

If we head to their documentation for the 5-day / 3-hour forecast, we can see how the API is called under the "API call" header.
<br>
<br>
<div>
    <img src="images/OpenWeather_doc.png"/>
</div>
<br>

We follow the same logic as before: create the base URL and construct the parameters for the API call. In this example, the API call needs a city name and a country code; we choose Seattle with the country code US.

### Attach the API Key:
To make sure the interface responds to us, we need to attach the API key to our GET request. To do this, we store the API key in a separate Python file (config.py) and import it directly from there. Insert your own key into this file to follow along with this demo.

We keep the API key in a separate file so that it never appears in code we share, which could expose the private key to others. Exposing an API key publicly is risky because many of these services are linked to your payment account.
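A common alternative to `config.py` is reading the key from an environment variable, so the key never lives in any project file. The sketch below assumes a hypothetical variable name, `OWM_API_KEY`, and the helper name `load_api_key` is likewise our own. If you stick with `config.py` and use git, adding `config.py` to your `.gitignore` keeps it out of your repository.

```python
import os

def load_api_key(var_name="OWM_API_KEY"):
    """Read an API key from an environment variable.

    OWM_API_KEY is a hypothetical variable name chosen for this sketch.
    Raises RuntimeError if the variable is unset or empty.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key")
    return key
```

Usage, after running `export OWM_API_KEY=your_key_here` in a macOS/Linux shell: `parameters = {'APPID': load_api_key(), 'q': 'Seattle,US'}`.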



```python
# Import the dependencies and the API key
import requests
import json
from config import api_key
```

```python
# Assign the base URL
# Remember to include the protocol (https://) at the front of the URL
baseURL = "https://api.openweathermap.org/data/2.5/forecast"
```

```python
# Construct the parameters, including the API key, for the request
parameters = {'APPID': api_key, 'q': 'Seattle,US'}
```

```python
# Send the GET request
response = requests.get(baseURL, params=parameters)
```

```python
# Print the raw content of the response
print(response.content)
```

```
b'{"cod":"200","message":0,"cnt":40,"list":[{"dt":1596078000,"main":{"temp":296.78,"feels_like":294.59,"temp_min":295.52,"temp_max":296.78,"pressure":1013,"sea_level":1013,"grnd_level":1004,"humidity":40,"temp_kf":1.26},"weather":[{"id":800,"main":"Clear","description":"clear sky","icon":"01d"}],"clouds":{"all":1},"wind":{"speed":2.9,"deg":11},"visibility":10000,"pop":0,"sys":{"pod":"d"},"dt_txt":"2020-07-30 03:00:00"},{"dt":1596088800,"main":{"temp":291.98,"feels_like":290.21,"temp_min":290.32,"temp_max":291.98,"pressure":1015,"sea_level":1015,"grnd_level":1006,"humidity":53,"temp_kf":1.66},"weather":[{"id":800,"main":"Clear","description":"clear sky","icon":"01n"}],"clouds":{"all":0},"wind":{"speed":2.23,"deg":23},"visibility":10000,"pop":0,"sys":{"pod":"n"},"dt_txt":"2020-07-30 06:00:00"},{"dt":1596099600,"main":{"temp":289.48,"feels_like":287.83,"temp_min":288.9,"temp_max":289.48,"pressure":1015,"sea_level":1015,"grnd_level":1006,"humidity":56,"temp_kf":0.58},"weather":[{"id":800,"
```

```python
# Convert the JSON document into a dictionary
info = json.loads(response.content)
```

```python
# Print the dictionary
print(info)
```

```
{'cod': '200', 'message': 0, 'cnt': 40, 'list': [{'dt': 1596078000, 'main': {'temp': 296.78, 'feels_like': 294.59, 'temp_min': 295.52, 'temp_max': 296.78, 'pressure': 1013, 'sea_level': 1013, 'grnd_level': 1004, 'humidity': 40, 'temp_kf': 1.26}, 'weather': [{'id': 800, 'main': 'Clear', 'description': 'clear sky', 'icon': '01d'}], 'clouds': {'all': 1}, 'wind': {'speed': 2.9, 'deg': 11}, 'visibility': 10000, 'pop': 0, 'sys': {'pod': 'd'}, 'dt_txt': '2020-07-30 03:00:00'}, {'dt': 1596088800, 'main': {'temp': 291.98, 'feels_like': 290.21, 'temp_min': 290.32, 'temp_max': 291.98, 'pressure': 1015, 'sea_level': 1015, 'grnd_level': 1006, 'humidity': 53, 'temp_kf': 1.66}, 'weather': [{'id': 800, 'main': 'Clear', 'description': 'clear sky', 'icon': '01n'}], 'clouds': {'all': 0}, 'wind': {'speed': 2.23, 'deg': 23}, 'visibility': 10000, 'pop': 0, 'sys': {'pod': 'n'}, 'dt_txt': '2020-07-30 06:00:00'}, {'dt': 1596099600, 'main': {'temp': 289.48, 'feels_like': 287.83, 'temp_min': 288.9, 'temp_max': 2
```
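The forecast comes back as a list of 40 entries (five days of three-hour slots), with temperatures in Kelvin: 296.78 K is about 23.6 °C. Here is a minimal sketch that walks the `list` key and converts each temperature to Celsius, using two entries copied from the output above with most fields trimmed. The helper `kelvin_to_celsius` is our own. OpenWeatherMap also documents a `units` parameter (for example `'units': 'metric'`) that asks the API for Celsius directly; check their documentation before relying on it.

```python
# Two entries copied from the forecast output above (most fields trimmed)
info = {
    "cod": "200",
    "cnt": 40,
    "list": [
        {"dt_txt": "2020-07-30 03:00:00", "main": {"temp": 296.78},
         "weather": [{"description": "clear sky"}]},
        {"dt_txt": "2020-07-30 06:00:00", "main": {"temp": 291.98},
         "weather": [{"description": "clear sky"}]},
    ],
}

def kelvin_to_celsius(kelvin):
    """Convert Kelvin to degrees Celsius, rounded to two decimals."""
    return round(kelvin - 273.15, 2)

# Each forecast slot has a timestamp, a "main" block, and a list of weather conditions
for entry in info["list"]:
    temp_c = kelvin_to_celsius(entry["main"]["temp"])
    description = entry["weather"][0]["description"]
    print(entry["dt_txt"], f"{temp_c} °C", description)
```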

## Part IV - Linking API Calls

Now that we have seen how to make a single API call, let's briefly talk about linking API calls. This strategy chains information between APIs to build complex processes with minimal code. With the countless APIs on the market today, creating such software is more feasible than ever. Here we have pulled up [RapidAPI.com](https://rapidapi.com/), a marketplace that contains hundreds of free APIs for software developers.
<br>
<br>
<div>
    <img src="images/RapidAPI.png"/>
</div>
<br>

This can be a creative playground for cool projects. There are APIs for text messaging, weather information, recipes, text analysis, and facial recognition. For example, by linking APIs we could build a program that receives text messages containing a food article through Twilio's API, reduces the article to just the key ingredient names with a text-analysis API, and finally generates a recipe based on the original article using a recipe API. This complex process needs just three API calls, each of which, as we have seen, is quite simple to make. Armed with this context, you can venture forth and build your own complex processes using multiple APIs.
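The chaining idea can be sketched with three stub functions. All three are hypothetical stand-ins: they make no network calls, and the sample text, ingredient list, and recipe title are made up. In a real program, each function would call its API (Twilio, a text-analysis API, a recipe API) and parse the JSON response, but the shape of the program stays the same: each call's output becomes the next call's input.

```python
# Hypothetical stand-ins for three real API calls. In a real program each
# function would send a request (e.g. with requests.get) and parse the JSON reply.

def receive_text_message():
    """Stub for an SMS API such as Twilio: returns the body of an incoming text."""
    return "Try this pasta with garlic, basil, tomatoes and parmesan."

def extract_ingredients(article_text):
    """Stub for a text-analysis API: picks known ingredient names out of the text."""
    known = ["garlic", "basil", "tomatoes", "parmesan", "chicken"]
    lowered = article_text.lower()
    return [word for word in known if word in lowered]

def find_recipe(ingredients):
    """Stub for a recipe API: returns a made-up recipe title built from the ingredients."""
    return "Recipe using " + ", ".join(ingredients)

# Chaining: each call's output becomes the next call's input
message = receive_text_message()
ingredients = extract_ingredients(message)
recipe = find_recipe(ingredients)
print(recipe)
```

Keeping each API behind its own small function also makes it easy to swap one provider for another without touching the rest of the chain.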