# Data is the new oil

<div>
<img src="images/info.png" width="500"/>
</div>

[Image source](https://www.stadafa.com/2021/10/largest-companies-2010-vs-2021.html)


Clive Humby, a British mathematician, is credited with coining the phrase "data is the new oil."

Michael Palmer expanded on this quote, asserting that data is "valuable, but if unrefined it cannot really be used. Oil has to be changed into gas, plastic, chemicals, etc to create a valuable entity that drives profitable activity; so, data must be broken down and analysed for it to have value."


# Public data APIs

How do we access amazing data? The answer is data APIs.

## What is an API?
API stands for Application Programming Interface. It serves as an intermediary that lets two different pieces of software exchange data. The term has historically been used to describe any sort of connectivity interface to an application.

Today, the term API typically takes on an additional meaning:
- Modern APIs adhere to certain standards (typically HTTP and REST) 
- They are treated like products
- They often adhere to high security standards

## Types of APIs

### By availability
- **Open and public APIs** are available to everyone. Open APIs are completely open: they give access to the complete architecture and all features of their code. Public APIs, by contrast, give restricted access to the code and datasets.

<img src="images/open-apis.png" width=400 height=400 />

- **Internal APIs** are used in-house by developers.

- **Partner APIs** are a form of open API where access is granted under certain conditions determined by the publisher.

### By structure
APIs also differ in architecture. The most popular API architectures are:
- **JSON-RPC and XML-RPC**: RPC stands for Remote Procedure Call, a protocol in which the client asks the server to run a named procedure and the data is transmitted in JSON or XML format.
- **REST**: REST stands for Representational State Transfer. It is a software architectural style that provides a set of recommendations for web development.
- **SOAP**: SOAP stands for Simple Object Access Protocol and is a definition of API protocols and standards.
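
To make the difference in structure concrete, here is a minimal sketch of a JSON-RPC 2.0 request body next to a REST-style call for the same information. The method name `get_quote`, the parameter `author` and the `example.com` URL are hypothetical and only serve as illustration.

```python
import json

# JSON-RPC: every call goes to one endpoint and the body names the procedure.
# "get_quote" and its parameters are made-up names for illustration.
rpc_request = {
    "jsonrpc": "2.0",        # protocol version, required by JSON-RPC 2.0
    "method": "get_quote",   # name of the remote procedure to run
    "params": {"author": "Clive Humby"},
    "id": 1,                 # lets the client match the response to this request
}
rpc_body = json.dumps(rpc_request)

# REST: the resource is identified by the URL and the action by the HTTP method.
rest_method = "GET"
rest_url = "https://example.com/quotes?author=Clive+Humby"  # hypothetical URL

print(rpc_body)
print(rest_method, rest_url)
```

In the RPC style the action lives in the request body, whereas in the REST style it is expressed by the combination of HTTP method and URL.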

## Accessing a Public Data API with Python

In order to retrieve data from an API we will combine the ``requests`` library and the ``json`` library.

When we want to receive data from the API we first make a *request*.

To do so in Python, we need to install the ``requests`` library.

Open the virtual environment you are using for this course and run:

In [None]:
! pip install requests 
# Recall: the leading exclamation mark runs this command in the associated terminal instead of in Python

If you are using conda you can write ``conda install requests``.

Once it is installed, we can import it:

In [None]:
import requests

### Example: NASA's APOD API

Let us test out [NASA's open API](https://api.nasa.gov/):

In [None]:
demo_key = "DEMO_KEY"

url = "https://api.nasa.gov/planetary/apod?api_key=" + demo_key

response = requests.get(url)
response

In [None]:
from IPython.display import Image

# Parse the JSON body of the response into a Python dict
data = response.json()

# Print the description and display the picture of the day
print(data['explanation'])
Image(url=data['url'], width=600)
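
The request URL above was built by string concatenation. With more than one parameter it is usually easier to keep the parameters in a dictionary and let a library encode them. The sketch below uses the standard library's `urlencode` so that it runs without network access. The `date` parameter is an assumption based on the APOD documentation and should be checked there.

```python
from urllib.parse import urlencode

base_url = "https://api.nasa.gov/planetary/apod"

# Query parameters as a dictionary ("date" is assumed from the APOD docs)
params = {"api_key": "DEMO_KEY", "date": "2021-10-01"}

# urlencode escapes the values and joins the key=value pairs with "&"
url_with_params = base_url + "?" + urlencode(params)
print(url_with_params)
```

`requests.get(base_url, params=params)` performs the same encoding and sends the request in one step.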

## Key concepts:

**Get and post**

The two most common requests we make are ``get`` and ``post``:
- ``Get`` is used for retrieving data (without changing anything on the server)
- ``Post`` is used for sending data to the server, possibly changing something there

**Response codes**

We can check the response code to see if our request was successful.

In [None]:
if response.status_code == 200:
    print('Request succeeded!')
else: 
    print(f'Uhoh, we got response code {response.status_code}...')

Here are some common status codes (defined by HTTP):

| Code | Status | Description |
| --- | --- | --- |
| 200 | OK | The request was successfully completed   |
| 400 | Bad request| The request was invalid. |
| 401 | Unauthorized | The request did not include an authentication token or the authentication token was expired. |
| 403 | Forbidden | The client did not have permission to access the requested resource. |
| 404 | Not Found | The requested resource was not found. |
| 405 | Method Not Allowed | The HTTP method used in the request is not supported for the requested resource. |
| 500 | Internal Server Error | The request was not completed due to an internal error on the server side. |
| 503 | Service unavailable | The server was unavailable. |
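
Python's standard library ships these codes in `http.HTTPStatus`, which is handy for turning a numeric `response.status_code` into a readable label. A minimal sketch, where `describe_status` is a helper name introduced here for illustration:

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable label for an HTTP status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # The code is not one of the registered HTTP status codes
        return f"{code} (unknown status code)"
    return f"{status.value} {status.phrase}"


for code in (200, 404, 405, 503):
    print(describe_status(code))
```

With a response object at hand, the same helper can be called as `describe_status(response.status_code)`.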

**Headers**

We can communicate metadata (like who we are) to the API by means of a *header*.

The API also communicates back to us with a header:


In [None]:
response.headers

**Etiquette** 

When you access a public data API you are expected to follow the rules of the API. Therefore you should always start by checking the documentation on the website for the API.

<div>
<img src="images/rtfm.png" width="250"/>
</div>

In programming this is often referred to as RTFM (read the freaking manual).

[xkcd: RTFM](https://xkcd.com/293/)

### Exercise

1. Make a get request to the Kanye Rest API and parse the data (it does not have any specific rules):

In [None]:
url = 'https://api.kanye.rest/'

r = requests.get( ... )

data = r. .... 

In [None]:
requests.get?

-----------
## Data formats


The actual data is typically returned as a ``JSON``, ``XML`` or ``CSV`` file.


### CSV

A Comma-Separated Values file is a delimited text file that uses a comma to separate values. It typically stores *tabular data*.

<div>
<img src="images/covid-nums.png" width="300"/>
</div>

*Example*: FHI provides [data](https://www.fhi.no/sv/smittsomme-sykdommer/corona/dags--og-ukerapporter/dags--og-ukerapporter-om-koronavirus/) about the number of confirmed COVID cases in Norway as a .csv file.
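
As a minimal sketch of how tabular CSV data can be read in Python, the example below parses a small CSV string with the standard `csv` module. The rows are made-up toy values for illustration only. They are not FHI figures, and the column names are assumptions.

```python
import csv
import io

# Made-up toy rows for illustration only; these are not real FHI figures.
csv_text = """date,region,cases
2021-01-01,A,10
2021-01-01,B,5
2021-01-02,A,12
"""

# DictReader uses the first row as column names and yields one dict per row
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

# All values are read as strings, so convert before doing arithmetic
total_cases = sum(int(row["cases"]) for row in rows)

print(rows[0])
print(total_cases)
```

For a file on disk, `io.StringIO(csv_text)` would be replaced by an opened file object.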



### JSON
JSON is the most common way of sending data back and forth in APIs. 

A JavaScript Object Notation (JSON) file encodes *data structures* so that they are easy to read for machines and somewhat easy to read for humans.

JSON is text (a file or a string) that follows the JavaScript object syntax. Most programming languages can read (parse) and generate JSON.

The ``json`` library in Python has two main functions:
- `json.dumps()` takes in a Python object and converts it (dumps it) to a JSON string.
- `json.loads()` takes in a JSON string and converts it (loads it) to a Python object.

They convert between the following types:

| JSON | Python | 
| --- | --- | 
| object | dict | 
| array  | list | 
| string | str  | 
| number (int) |  int | 
| number (real) | float | 
| true | True | 
| false | False | 
| null | None | 
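
The conversions in the table can be checked with a small round trip. The sketch below uses a hand-written dictionary (its keys and values are invented for illustration) that contains one value of each type.

```python
import json

# A Python object that uses each type from the table (invented values)
record = {
    "title": "Example",
    "tags": ["space", "photo"],
    "views": 3,
    "rating": 4.5,
    "public": True,
    "copyright": None,
}

text = json.dumps(record)      # Python object -> JSON string
restored = json.loads(text)    # JSON string -> Python object

print(text)
print(restored == record)
```

Note how `True` and `None` appear as `true` and `null` in the JSON string and are converted back when the string is loaded.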


Let's look at what NASA gave us


In [None]:
import json

# Show the documentation for response.json (the trailing ? is IPython help syntax)
response.json?

In [None]:
data = response.json()

print(f'response.json() returns a {type(data)}')
print('Printing it yields: ')

data

-------

### Exercise

2. Make a get request to the [NASA Mars Rover API](https://api.nasa.gov/) to get pictures from Mars.
3. Specify the camera viewpoint in the parameters.

In [None]:

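# Get photos taken by the Curiosity rover on Martian day (sol) 1000
parameters = { 
    'sol': 1000,
}

url = 'https://api.nasa.gov/mars-photos/api/v1/rovers/curiosity/photos?page=1&api_key=DEMO_KEY'
# Pass the dictionary as query parameters; requests appends them to the URL
r = requests.get(url, params=parameters)
r.json()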

## Entur API

The NASA API is a good 'hello world' example of APIs, but it does not quite showcase the usefulness of APIs.

Entur is an example of an open API provider: it offers both open source code and APIs for stops, real-time data, mobility trends, etc.

The following snippet is adapted from [ruterstop](https://github.com/stigok/ruterstop) by [stigok](https://github.com/stigok):

In [None]:
stop_id = 5926

__version__ = "0.5.1"

ENTUR_CLIENT_ID = __version__

ENTUR_GRAPHQL_ENDPOINT = "https://api.entur.io/journey-planner/v2/graphql"

ENTUR_GRAPHQL_QUERY = """
{
  stopPlace(id: "NSR:StopPlace:%(stop_id)s") {
    name
    estimatedCalls(timeRange: 72100, numberOfDepartures: 20) {
      expectedArrivalTime
      realtime
      destinationDisplay {
        frontText
      }
      serviceJourney {
        directionType
        line {
          publicCode
        }
      }
    }
  }
}
"""

headers = {
        "Accept": "application/json",
        "ET-Client-Name": "UIO:IN3110 - ingeborggjerde",
        "ET-Client-Id": ENTUR_CLIENT_ID,
    }

qry = ENTUR_GRAPHQL_QUERY % dict(stop_id=stop_id)
res = requests.post(
    ENTUR_GRAPHQL_ENDPOINT,
    headers=headers,
    timeout=5,
    json=dict(query=qry, variables={}),
)

In [None]:
res.json()
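
The JSON that comes back mirrors the shape of the GraphQL query, nested under a top-level `data` key. The sketch below shows one way to flatten it into a list of departures. `list_departures` is a helper name introduced here, and the sample payload is hand-written with invented values so that the code runs without network access. A failed query would carry an `errors` key instead, which this sketch does not handle.

```python
def list_departures(payload):
    """Return (line, destination, expected time) tuples from a stopPlace response."""
    stop_place = payload["data"]["stopPlace"]
    departures = []
    for call in stop_place["estimatedCalls"]:
        line = call["serviceJourney"]["line"]["publicCode"]
        destination = call["destinationDisplay"]["frontText"]
        departures.append((line, destination, call["expectedArrivalTime"]))
    return departures


# Hand-written sample with the same shape as the query; all values are invented.
sample = {
    "data": {
        "stopPlace": {
            "name": "Example stop",
            "estimatedCalls": [
                {
                    "expectedArrivalTime": "2021-10-01T12:00:00+0200",
                    "realtime": True,
                    "destinationDisplay": {"frontText": "Example destination"},
                    "serviceJourney": {
                        "directionType": "outbound",
                        "line": {"publicCode": "1"},
                    },
                }
            ],
        }
    }
}

for line, destination, when in list_departures(sample):
    print(line, destination, when)
```

On the live response the helper would be called as `list_departures(res.json())`.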