In [None]:
from cred import GUARDIAN_KEY
import requests
import pandas as pd

## Understanding APIs
When we visit websites, we provide an address (a URL). The address specifies which data we want the server to send back for us to view in our browser.

A simple way of thinking about an API is as a website that sends us data, where the data varies depending on the address we provide. The address has to be a little more complicated than simply www.google.com, but as long as we build it correctly, we will get what we ask for.
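
To make the idea of an address concrete, here is a small sketch using Python's standard `urllib.parse` module to split an API address into its root and its parameters. The address is an illustrative example, and `test` is only a placeholder standing in for a real key.

```python
from urllib.parse import urlsplit, parse_qs

# An illustrative API address: a root (scheme + host + path) plus query parameters.
# 'test' is a placeholder key, not a real credential.
address = "https://content.guardianapis.com/search?q=crime&page-size=5&api-key=test"

parts = urlsplit(address)
root = f"{parts.scheme}://{parts.netloc}{parts.path}"  # where the request goes
options = parse_qs(parts.query)                          # what we are asking for

print(root)     # https://content.guardianapis.com/search
print(options)  # {'q': ['crime'], 'page-size': ['5'], 'api-key': ['test']}
```

Changing any of the values after the `?` changes what we are asking the server for, while the root stays the same.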

### Guardian API - Interactive Exploration
The Guardian provides a helpful tool for exploring how the API address is built and what results we can get back. It is also useful for showing what options we have when requesting data.

[Explore the Guardian API](https://open-platform.theguardian.com/explore/)

## Communicating with the API in Python
So we can see how the web address is built using the API explorer, but how do we build that address using Python, and communicate with the API server so that it sends us back the data?

### A Very Basic Example
To begin, we will make the simplest possible query: contacting the API with our credential key so that it sends back *something*.

First we define the endpoint, which is essentially the root of the address we are going to build on. The Guardian API has a [few different endpoints](https://open-platform.theguardian.com/documentation/), but for our purposes the *content* endpoint is the one we need.
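
Each endpoint is a different path under the same host. As a sketch, assuming the paths listed on the documentation page (`/search` for content, plus `/tags`, `/sections` and `/editions`), the root addresses could be assembled like this. Treat the path table as an assumption to check against the documentation.

```python
# Host shared by the Guardian API endpoints.
BASE_URL = "https://content.guardianapis.com"

# Endpoint paths as we understand them from the documentation page;
# verify against the docs before relying on them.
ENDPOINT_PATHS = {
    "content": "/search",
    "tags": "/tags",
    "sections": "/sections",
    "editions": "/editions",
}

def endpoint_url(name):
    """Return the full root address for a named endpoint."""
    return BASE_URL + ENDPOINT_PATHS[name]

print(endpoint_url("content"))  # https://content.guardianapis.com/search
```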

In [None]:
API_ENDPOINT = 'https://content.guardianapis.com/search'

In [None]:
# We first create a dictionary with a parameter named api-key, whose value is our key.
parameters = {'api-key':GUARDIAN_KEY}

Now we're going to communicate with the Guardian API using `requests`. We pass in the address we want to contact, `API_ENDPOINT`, along with a dictionary of parameters; `requests` uses the dictionary to build the rest of the address before making its request for data.
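
The standard-library sketch below approximates the step where the parameter dictionary is URL-encoded and attached after a `?`, so you can see the kind of address that results. It is not how `requests` is implemented internally, and `'test'` is a placeholder key; the names are prefixed with `example_` so they do not overwrite the notebook's own variables.

```python
from urllib.parse import urlencode

example_endpoint = 'https://content.guardianapis.com/search'
example_parameters = {'api-key': 'test', 'q': 'crime'}  # 'test' is a placeholder key

# Encode each key=value pair, join them with '&', then attach them to the endpoint.
example_url = example_endpoint + '?' + urlencode(example_parameters)
print(example_url)  # https://content.guardianapis.com/search?api-key=test&q=crime
```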

In [None]:
response = requests.get(API_ENDPOINT, params=parameters)

`requests` has now communicated with the server, and whatever the server sent back has been packaged up in a `Response` object.

In [None]:
# We can see the type of object
type(response)

In [None]:
# Displaying the object itself tells us little: just the HTTP status code, e.g. <Response [200]> for success.
response

In [None]:
# One useful check is to see the URL that requests built for us
response.url

Finally, we can see the data the server sent back by asking the response object to decode it from JSON format.
JSON is a text format for structured data; in Python it becomes a set of nested dictionaries and lists.
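
As a sketch of what that decoding produces, the standard `json` module turns a JSON string into Python objects, which is roughly the equivalent of what `response.json()` does with the response text. The payload below is made up and heavily trimmed, shaped loosely like a Guardian search response; it is not real API output.

```python
import json

# A made-up, trimmed payload shaped loosely like a Guardian search response.
raw = '''
{"response": {"status": "ok", "total": 2, "currentPage": 1, "pages": 1,
  "results": [
    {"webTitle": "First example headline", "sectionId": "uk-news"},
    {"webTitle": "Second example headline", "sectionId": "world"}
  ]}}
'''

sample = json.loads(raw)                          # JSON objects -> dicts, arrays -> lists
sample_results = sample["response"]["results"]    # a list of dictionaries
print(type(sample).__name__, type(sample_results).__name__)  # dict list
print(sample_results[0]["webTitle"])              # First example headline
```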

In [None]:
response.json()

In [None]:
# The top level dictionary just has one key called 'response' which contains all the other information.
guardian_data = response.json()['response']
guardian_data

In [None]:
# The dictionary under 'response' is what matters; it has a few keys with associated values.
guardian_data.keys()

Whilst the other keys hold useful information for later (such as how many pages of results there are), for now `results` is the key that contains the news articles we want.
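
Among those other keys, `total` and `pageSize` tell us how many requests we would need to collect every matching article. A minimal sketch of that arithmetic follows; the numbers are illustrative, not real query results.

```python
import math

def pages_needed(total, page_size):
    """Number of pages required to cover `total` results at `page_size` per page."""
    return math.ceil(total / page_size)

# Illustrative numbers only.
print(pages_needed(95, 10))   # 10
print(pages_needed(200, 50))  # 4
```

Rounding up matters: 95 results at 10 per page needs a tenth, partly filled page.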

In [None]:
guardian_data['results']

In [None]:
# Because the results are a list of dictionaries, pandas can restructure them into a table
results = pd.DataFrame(guardian_data['results'])
results

In [None]:
# To summarise the process
parameters = {'api-key':GUARDIAN_KEY}
response = requests.get(API_ENDPOINT, params=parameters)
guardian_data = response.json()['response']
results = pd.DataFrame(guardian_data['results'])

results

## Customising your request with parameters
To customise our query we simply need to add to or adjust the parameters we pass to our request.
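
Because the parameters are just a dictionary, one way to avoid retyping the key each time is to keep a base dictionary and merge extra options into a copy of it. This is a sketch; `with_options` is a hypothetical helper and `'test'` a placeholder key. A dictionary argument is used instead of keyword arguments because names like `page-size` contain hyphens.

```python
# Base parameters shared by every request ('test' is a placeholder key).
BASE_PARAMETERS = {'api-key': 'test'}

def with_options(options):
    """Hypothetical helper: return a new dict of the base parameters plus `options`."""
    return {**BASE_PARAMETERS, **options}

crime = with_options({'q': 'crime', 'page-size': 10})
print(crime)  # {'api-key': 'test', 'q': 'crime', 'page-size': 10}
```

The base dictionary is left unchanged, so it can be reused for the next query.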

### Query
The search query is the primary way to filter our results.

In [None]:
parameters = {'api-key':GUARDIAN_KEY,
              'q':'crime'}

response = requests.get(API_ENDPOINT, params=parameters)
guardian_data = response.json()['response']
results = pd.DataFrame(guardian_data['results'])

results

In [None]:
response.url

Queries can be more than one word. The Guardian API documentation explains a number of ways you can adjust your query:
- 'Crime AND Prison' - Search for articles that use both the terms 'crime' and 'prison'.
- 'Crime OR Prison' - Search for articles that use either 'crime' or 'prison'.
- '"Criminal justice"' - Use double quotes to search for an exact phrase.
- 'debate AND NOT immigration' - Search for articles that use the term 'debate' but not the term 'immigration'.

See the [Guardian API documentation](https://open-platform.theguardian.com/documentation/) for more options.
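
The operators above can be combined programmatically. Below is a sketch with hypothetical helpers, `phrase` and `build_query`, that produce the query strings from the list. Exclusions are only allowed with `AND` here, to sidestep questions of operator precedence when mixing `OR` and `NOT`; check the documentation if you need that.

```python
def phrase(term):
    """Wrap a multi-word term in double quotes so it is searched as a phrase."""
    return f'"{term}"' if ' ' in term else term

def build_query(include, operator='AND', exclude=()):
    """Hypothetical helper: join `include` terms with AND or OR, then add AND NOT terms."""
    if operator not in ('AND', 'OR'):
        raise ValueError("operator must be 'AND' or 'OR'")
    if exclude and operator == 'OR':
        raise ValueError("exclusions are only supported with AND in this sketch")
    query = f' {operator} '.join(phrase(t) for t in include)
    for term in exclude:
        query += f' AND NOT {phrase(term)}'
    return query

print(build_query(['crime', 'prison']))                # crime AND prison
print(build_query(['crime', 'prison'], 'OR'))          # crime OR prison
print(build_query(['criminal justice']))               # "criminal justice"
print(build_query(['debate'], exclude=['immigration']))  # debate AND NOT immigration
```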

In [None]:
# Phrases need an extra pair of double quotes inside the Python string; requests then URL-encodes them for us.

parameters = {'api-key':GUARDIAN_KEY,
              'q':'"human rights"'}

response = requests.get(API_ENDPOINT, params=parameters)
guardian_data = response.json()['response']
results = pd.DataFrame(guardian_data['results'])
results
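
To see what happens to those quotes on their way into the address, the standard-library `urlencode` function shows a similar encoding to the one `requests` applies to its `params`. This is a sketch for illustration, not the exact `requests` code path.

```python
from urllib.parse import urlencode

# The double quotes are part of the query text; in the address they are
# percent-encoded as %22, and the space becomes '+'.
encoded = urlencode({'q': '"human rights"'})
print(encoded)  # q=%22human+rights%22
```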

### Additional Filters
Other filters can be useful for narrowing down a search:
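
The `from-date` and `to-date` filters take dates written as `YYYY-MM-DD` strings. Rather than typing them by hand, a sketch like the one below builds them from Python `date` objects; `date_range_params` is a hypothetical helper.

```python
from datetime import date, timedelta

def date_range_params(end, days):
    """Hypothetical helper: from-date / to-date values covering `days` days up to `end`."""
    start = end - timedelta(days=days)
    return {'from-date': start.isoformat(), 'to-date': end.isoformat()}

print(date_range_params(date(2023, 1, 30), 10))
# {'from-date': '2023-01-20', 'to-date': '2023-01-30'}
```

The result can be merged into a parameter dictionary with `{**parameters, **date_range_params(...)}`.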

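In [None]:
parameters = {'api-key':GUARDIAN_KEY,
              'q':'"human rights"',
              'page-size':10,            # how many results you get per request - max 200
              'production-office':'uk',  # only content produced by the Guardian's UK office
              'lang':'en',               # only content in this language (ISO code)
              'from-date':'2023-01-20',  # earliest publication date (YYYY-MM-DD)
              'to-date':'2023-01-30',    # latest publication date (YYYY-MM-DD)
              'show-tags':'keyword'      # include each article's keyword tags (inspected below)
              }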

response = requests.get(API_ENDPOINT, params=parameters)
guardian_data = response.json()['response']
results = pd.DataFrame(guardian_data['results'])
results

In [None]:
guardian_data

In [None]:
results.loc[0,'tags']

In [None]:
query = 'royal family'  # search terms
page_size = 50          # results per page
section = 'uk-news'     # restrict results to a single section

In [None]:
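params = {'api-key':GUARDIAN_KEY,
          'q':query,
          'page-size':page_size,
          'show-tags':'keyword',                  # include keyword tags for each result
          'tag':'type/article',                   # only articles (not galleries, videos, etc.)
          'order-by':'newest',                    # most recent first
          'page':1,                               # which page of results to return
          'section':section,                      # only this section
          'show-fields':'wordcount,body,byline',  # extra fields, returned in a nested 'fields' dict
          'from-date':'2023-02-10'                # published on or after this date
          }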
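
The `page` parameter only returns one page at a time. A sketch of collecting several pages follows: `fetch_all_results` is a hypothetical helper that takes any function mapping a page number to the decoded `'response'` dictionary, and stops when it reaches the `pages` value the API reports or a page limit. An offline stand-in with made-up results is used so the example runs without a key.

```python
def fetch_all_results(fetch_page, max_pages=5):
    """Collect results across pages, stopping at the last page or at `max_pages`.

    `fetch_page(page)` must return a dict with 'results' and 'pages' keys.
    A real version might be (not run here):
        lambda page: requests.get(API_ENDPOINT, params={**params, 'page': page}).json()['response']
    """
    collected = []
    page = 1
    while page <= max_pages:
        data = fetch_page(page)
        collected.extend(data['results'])
        if page >= data['pages']:  # no more pages available
            break
        page += 1
    return collected

# Offline stand-in for the API: 3 pages of 2 made-up results each.
def fake_fetch(page):
    return {'pages': 3, 'results': [f'article {page}-{i}' for i in (1, 2)]}

print(len(fetch_all_results(fake_fetch)))  # 6
```

The `max_pages` cap is a safeguard so a broad query does not use up your request allowance.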

In [None]:
res = requests.get(API_ENDPOINT, params=params)
res.url

In [None]:
res.json()

In [None]:
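def flatten_nested_dicts(df):
    """Expand nested dictionary columns (e.g. 'fields') into columns such as 'fields.wordcount'."""
    dicts = df.to_dict(orient='records')  # one dictionary per row
    flattened = pd.json_normalize(dicts)  # nested dicts become dotted column names
    return flattened

flatten_nested_dicts(pd.DataFrame(res.json()['response']['results']))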
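
To show what the flattening does without pandas, here is a plain-Python sketch that turns nested dictionaries into dotted keys, roughly mirroring how `json_normalize` names its columns. Like `json_normalize` with default settings, it leaves lists such as `tags` untouched. The record is made up for illustration.

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dictionaries into dotted keys; lists are left as they are."""
    flat = {}
    for key, value in record.items():
        new_key = f'{parent_key}{sep}{key}' if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat

# A made-up result with the kind of nested 'fields' dict that show-fields adds.
record = {'webTitle': 'Example', 'fields': {'wordcount': '512', 'byline': 'A Writer'}, 'tags': []}
print(flatten(record))
# {'webTitle': 'Example', 'fields.wordcount': '512', 'fields.byline': 'A Writer', 'tags': []}
```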

##### API Details
So that people know how to query an API (without a helpful university class), APIs usually have documentation, and the Guardian API is no different. If you want to understand what different parameters do, or what other options you have for exploring the API, the documentation is the best place to start.

[Guardian API Documentation](https://open-platform.theguardian.com/documentation/)
