### Simple APIs
Application Programming Interfaces, or APIs, let two pieces of software communicate with each other.

In [1]:
import pandas as pd

d = {
    'a':[11,21,31],
    'b':[12,22,32]
}
df = pd.DataFrame(d)

We use the pandas API to process the data: our code communicates with the other software components through it.

When you create a dictionary and then pass it to the DataFrame constructor, you create a pandas object. In API lingo, this is an instance.

The data in the dictionary is passed along to the pandas API.

You then use the DataFrame to communicate with the API.

In [2]:
df.head()

    a   b
0  11  12
1  21  22
2  31  32


When you call the method <code>head()</code>, the DataFrame communicates with the API and displays the first few rows of the DataFrame.

##### REST APIs
REST (Representational State Transfer) APIs allow you to communicate over the internet, letting you take advantage of resources such as storage, additional data, artificial intelligence, algorithms, and much more.

You or your code are often referred to as the **client**. The web service is referred to as a **resource**.

The client sends a request to the resource, and the resource sends a response back to the client.
- We tell the REST API what to do by sending a request.
    - The request is usually communicated via an HTTP message, which usually contains a JSON file.
- The request contains instructions for the operation we would like the service to perform.
    - This operation is transmitted to the web service via the internet.
- The service performs the operation.
- The web service returns a response via an HTTP message, where information is usually returned via a JSON file.
    - This is transmitted back to the client.
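The request and response flow above can be sketched without any network traffic. In the minimal simulation below, the function `service` and the `operation`/`numbers` fields are hypothetical names chosen for illustration; a real REST API defines its own operations and message format, and the function call stands in for the HTTP round trip.

```python
import json

def service(raw_request: str) -> str:
    """Hypothetical web service: parse the JSON request, run the operation, reply in JSON."""
    request = json.loads(raw_request)
    if request["operation"] == "add":
        response = {"status": 200, "result": sum(request["numbers"])}
    else:
        response = {"status": 400, "error": "unknown operation"}
    return json.dumps(response)

# Client side: describe the desired operation as JSON and "send" it.
raw_request = json.dumps({"operation": "add", "numbers": [11, 21, 31]})
raw_response = service(raw_request)  # stands in for the HTTP round trip

# Client side: decode the JSON response.
response = json.loads(raw_response)
print(response)
```

Both directions carry plain JSON text, which is why the client and the service can be written in different languages.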

As an example, we will use the **Py-Coin-Gecko Python Client**, or **Wrapper**, for the Coin Gecko API, which Coin-Gecko updates every minute.

In [4]:
%pip install pycoingecko
from pycoingecko import CoinGeckoAPI
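# Instantiate the client. Without the parentheses, cg would be the class itself
# and the call below would raise a TypeError about the missing 'self' argument.
cg = CoinGeckoAPI()
# Request Bitcoin market data in US dollars for the last 30 days
bitcoin_data = cg.get_coin_market_chart_by_id(id='bitcoin',vs_currency='usd',days=30)

Note: you may need to restart the kernel to use updated packages.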

### API Keys and Endpoints
An API key is a unique set of characters that the API uses to identify and authorize you. This allows you access to the API.
- With many APIs, you may be charged for each call made with your key.

An endpoint is simply the location of the service. It's used to find the API on the internet, like a web address.
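As a rough illustration of how a key and an endpoint come together, the sketch below builds (but does not send) a request with the standard library. The endpoint `https://api.example.com/v1/prices`, the environment variable `EXAMPLE_API_KEY`, and the `Authorization: Bearer` header are all assumptions; each real API documents where its key belongs.

```python
import os
import urllib.request

# Hypothetical endpoint and key; real services document their own values.
endpoint = "https://api.example.com/v1/prices"
api_key = os.environ.get("EXAMPLE_API_KEY", "demo-key")

# Build the request object without sending it.
request = urllib.request.Request(endpoint)
# Assumed scheme: many APIs accept the key in an Authorization header.
request.add_header("Authorization", f"Bearer {api_key}")

print(request.full_url)
print(request.has_header("Authorization"))
```

Reading the key from an environment variable keeps it out of the notebook itself, which matters when each call may be billed.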

### REST APIs & HTTP Requests
When the client uses a web page, the browser sends an HTTP request to the server where the page is hosted. The server tries to find the desired resource.

A Uniform Resource Locator (URL) is the most popular way to find resources on the web. We can break the URL into three parts:
- Scheme: the protocol
    - example: http://
- Internet address or Base URL: used to find the location
    - example: www.ibm.com
- Route: location on the web server
    - example: /images/IDSNlogo.png
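The three parts can also be recovered programmatically. This short sketch uses `urllib.parse.urlsplit` from the standard library on a URL assembled from the examples above; note that `urlsplit` reports the scheme without the trailing `://`.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.ibm.com/images/IDSNlogo.png")

print(parts.scheme)  # scheme (protocol)
print(parts.netloc)  # internet address or base URL
print(parts.path)    # route on the web server
```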

##### Request and Response Process
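Status code classes (grouped by the first digit):
- 1xx (for example 100) - informational; indicates that everything is okay so far
- 2xx (for example 200) - successful responses
- 4xx (for example 400 Bad Request, 401 Unauthorized) - client errors
- 5xx (for example 500) - server errors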
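Status codes and their standard phrases can be looked up with `http.HTTPStatus` from the standard library. In this small sketch, the helper `describe` and the `CLASSES` mapping are illustrative names, not part of any library; the class of a code is taken from its first digit.

```python
from http import HTTPStatus

# The first digit of a status code determines its class.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe(code: int) -> str:
    """Return the code, its standard phrase, and its class."""
    status = HTTPStatus(code)
    return f"{code} {status.phrase} ({CLASSES[code // 100]})"

for code in (100, 200, 400, 401, 500):
    print(describe(code))
```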

##### Requests library
Requests is a Python library that allows you to send HTTP/1.1 requests easily.

In [3]:
import requests
url = 'https://www.ibm.com/'
r = requests.get(url)   # The response object, which has information about the request
r.status_code

200

The status code 200 means the operation was successful.

We can also get the request headers:

In [4]:
r.request.headers

{'User-Agent': 'python-requests/2.26.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive', 'Cookie': '_abck=E6F33CFD705AAB3946DD27C781E4FE4E~-1~YAAQp93dWK/ASrN9AQAA9orONAeN3HFSsOIqqEjvtp8rAd1kfBcWbzgJNpg1wTLIzi9tFdxeFE4x5G3T298Ey1LxasdvmYg4LKuWOb70ZkjwBMY7eRiLJRWFA1SYZCbUgb9KIlhOdrcAdRafhXwW7sMVGh/dgnPPZ9kmb7SKUytPeZzP6ug6FXXuceeFvx/vN2fER22Lv/GwLUoH+zofLpnBADbSCtk/9l4jYzAsQHqsV5AXGqCzgB6qBV0i0HICOPbGqZG+9tTODMQcOxArj9I49WDkAGRMqv4X1CalkgUzs4FjWIeEtIBgMfWx3dzR3PLRJFzeUJPipYwVw6UAmUKsWIPWOnXOOfYxLyUXPGI5q79/6fU=~-1~-1~-1; bm_sz=1A47C250D8C3869F81911A5A63D8A2AC~YAAQp93dWLDASrN9AQAA9orONA6TRqyKQhA1NVooVazpby0TyCT3mdzxhqtu2tv4EY6lM7+whvTO/WU8mDNAzIJTJdnW0/v8L01yq88GIvs0EQ58UZOoQAcw40Ddz9yTmeNo2NxFc9FPj11N6NVFrORJERNHxYh2seNxm7LBTqr6qWM2jliq0kvFuRPqvuPzgN7UnQX0Yu0WFwV7oYMQVteV9UqsB9OwuSDaIv+AZEcW9rXBbeAQPDcc7cymtJVmEdc8zHBF+SKibZgBNO0xHURIP7pd8bv2O92pnv55fsA=~3682628~4539186'}

We can view the request body in the following line. Since a GET request has no body, the attribute is <code>None</code> and the cell displays nothing.

In [6]:
r.request.body

We can view the HTTP response headers. This returns a dictionary of HTTP response headers.

In [7]:
header = r.headers
header

{'Server': 'Apache', 'x-drupal-dynamic-cache': 'UNCACHEABLE', 'Link': '<https://www.ibm.com/de-de>; rel="canonical", <https://www.ibm.com/de-de>; rel="revision", <//1.cms.s81c.com>; rel=preconnect; crossorigin, <//1.cms.s81c.com>; rel=dns-prefetch', 'x-ua-compatible': 'IE=edge', 'Content-Language': 'de-de', 'permissions-policy': 'interest-cohort=()', 'x-generator': 'Drupal 9 (https://www.drupal.org)', 'x-dns-prefetch-control': 'on', 'x-drupal-cache': 'MISS', 'Last-Modified': 'Fri, 07 Jan 2022 05:55:07 GMT', 'ETag': '"1641534907"', 'Content-Type': 'text/html; charset=UTF-8', 'x-acquia-host': 'www.ibm.com', 'x-acquia-path': '/de-de', 'x-acquia-site': '', 'x-acquia-purge-tags': '', 'x-varnish': '814844159', 'x-age': '0', 'Accept-Ranges': 'bytes', 'Content-Encoding': 'gzip', 'Cache-Control': 'public, max-age=300', 'Expires': 'Fri, 07 Jan 2022 13:55:58 GMT', 'X-Akamai-Transformed': '9 11793 0 pmb=mTOE,1', 'Date': 'Fri, 07 Jan 2022 13:50:58 GMT', 'Content-Length': '11863', 'Connection': 'kee

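We can look at the dictionary values. Header names are case-insensitive in this dictionary, so <code>header['date']</code> and <code>header['Date']</code> return the same value: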

In [8]:
header['date']

'Fri, 07 Jan 2022 13:50:58 GMT'

In [9]:
header['Content-Type']

'text/html; charset=UTF-8'

Using the response object, 'r', we can also check the encoding:

In [10]:
r.encoding

'UTF-8'

Since the Content-Type is text/html, we can use the attribute <code>text</code> to display the HTML in the body:

In [12]:
r.text[:100]

'<!DOCTYPE html>\n<html lang="de-de" dir="ltr">\n  <head>\n    <meta charset="utf-8" />\n<script>digitalD'

In [14]:
# The base URL with the route /get appended to the end
url_get = 'http://httpbin.org/get'
# To create a query string, we use the dictionary payload.
# The keys are the parameter names and the values are the parameter values.
payload = {
    'name':'Joseph',
    'ID':'123'
}
# Pass the dictionary to the params parameter of the get() function
r = requests.get(url_get,params=payload)

In [20]:
print(f'{r.url}\n{r.request.body}\n{r.status_code}')

http://httpbin.org/get?name=Joseph&ID=123
None
200
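To see how the dictionary becomes the query string in the URL above, here is a minimal sketch using only the standard library, with no request sent. It is an illustration of the encoding, not a description of how Requests builds the URL internally.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

payload = {"name": "Joseph", "ID": "123"}

# Encode the dictionary as key=value pairs joined by '&'.
query = urlencode(payload)
url = f"http://httpbin.org/get?{query}"
print(url)

# Going the other way: recover the parameters from the URL.
# parse_qs returns a list per key because a key may repeat.
recovered = parse_qs(urlsplit(url).query)
print(recovered)
```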


In [22]:
print(r.text)
print(r.headers['Content-Type'])

{
  "args": {
    "ID": "123", 
    "name": "Joseph"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate, br", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.26.0", 
    "X-Amzn-Trace-Id": "Root=1-61d8485d-20595e4c3ffd538b0f96bb18"
  }, 
  "origin": "92.206.237.56", 
  "url": "http://httpbin.org/get?name=Joseph&ID=123"
}

application/json


Since the content is JSON, we can parse it using the method <code>.json()</code>:

In [23]:
r.json()

{'args': {'ID': '123', 'name': 'Joseph'},
 'headers': {'Accept': '*/*',
  'Accept-Encoding': 'gzip, deflate, br',
  'Host': 'httpbin.org',
  'User-Agent': 'python-requests/2.26.0',
  'X-Amzn-Trace-Id': 'Root=1-61d8485d-20595e4c3ffd538b0f96bb18'},
 'origin': '92.206.237.56',
 'url': 'http://httpbin.org/get?name=Joseph&ID=123'}

It returns a Python dict. The key 'args' holds the names and values of the query string parameters.

In [24]:
r.json()['args']

{'ID': '123', 'name': 'Joseph'}
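The conversion from JSON text to a Python dict can be reproduced with the standard `json` module. The sketch below parses a shortened copy of the response text shown earlier; it illustrates the idea rather than the internals of the Requests method.

```python
import json

# A shortened copy of the JSON text returned by the service.
text = '{"args": {"ID": "123", "name": "Joseph"}, "url": "http://httpbin.org/get?name=Joseph&ID=123"}'

data = json.loads(text)  # JSON object -> Python dict

print(type(data).__name__)
print(data["args"]["name"])
```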

##### Post Request
A POST request is also used to send data to a server, but it sends the data in the request body, not in the URL.

To send a POST request, we change the route in the URL to /post.

In [27]:
url_post = 'http://httpbin.org/post'
payload ={
    'name':'Joseph',
    'ID':'123'
}
r_post = requests.post(url_post,data=payload)

To make a post request, we use the <code>post()</code> function.

The variable <code>payload</code> is passed to the parameter data.

In [28]:
print(f'POST request URL: {r_post.url}')
print(f'GET request URL: {r.url}')

POST request URL: http://httpbin.org/post
GET request URL: http://httpbin.org/get?name=Joseph&ID=123
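The difference between the two request types can be inspected without contacting the server. This sketch builds both requests with the standard library's `urllib` (an assumption made here to avoid network access; the notes themselves use Requests) and prints where the data ends up.

```python
from urllib.parse import urlencode
from urllib.request import Request

payload = {"name": "Joseph", "ID": "123"}

# POST: the encoded payload travels in the request body.
body = urlencode(payload).encode("utf-8")
post_request = Request("http://httpbin.org/post", data=body)

# GET: the encoded payload is appended to the URL as a query string.
get_request = Request("http://httpbin.org/get?" + urlencode(payload))

print(post_request.get_method(), post_request.full_url, post_request.data)
print(get_request.get_method(), get_request.full_url, get_request.data)
```

Neither request is sent here; the objects only describe what would go over the wire.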


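In [1]:
print(f'POST request body: {r_post.request.body}')
print(f'GET request body: {r.request.body}')

NameError: name 'r_post' is not defined

The NameError appears because this cell was run in a fresh kernel (note the cell number), where <code>r_post</code> no longer exists. Re-run the cells above first; the POST request body then holds the form-encoded payload, while the GET request body is None.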

### HTML for Webscraping
A web page is made up of HTML. It consists of text surrounded by tags, which are elements enclosed in angle brackets (often shown in blue). The tags tell the browser how to display the content.

The first portion contains the "DOCTYPE html" declaration, which declares the document's type.

Next we have the body. This is what is displayed on the web page.
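To make the structure concrete, the sketch below walks a small, made-up HTML page with the standard library's `html.parser` and records the document type declaration and the opening tags in order. The page content and the class name `TagCollector` are illustrative assumptions.

```python
from html.parser import HTMLParser

# A small made-up page: DOCTYPE declaration, head, and body.
html = """<!DOCTYPE html>
<html>
  <head><title>Page Title</title></head>
  <body>
    <h3>Heading</h3>
    <p>Some <b>bold</b> text.</p>
  </body>
</html>"""

class TagCollector(HTMLParser):
    """Record the DOCTYPE declaration and every opening tag."""

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.tags = []

    def handle_decl(self, decl):
        self.doctype = decl

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed(html)
print(collector.doctype)
print(collector.tags)
```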

#### Webscraping

In [2]:
from bs4 import BeautifulSoup

We can store the webpage HTML as a string.

To parse the document, pass it into the BeautifulSoup constructor. We get the BeautifulSoup object which represents the document as a nested data structure.

BeautifulSoup represents HTML as a set of tree-like objects with methods used to parse the HTML.