In [1]:
from cred import GUARDIAN_KEY
import requests
import pandas as pd

## Understanding APIs
When we visit websites online, we provide an address. The address specifies what website data we want to be sent back from the server for us to view in our browsers.

A simple way to think about an API is as a website that sends us data, where the data varies depending on the address we provide. The address has to be a little more complicated than simply www.google.com, but as long as we build that address correctly, we will get what we ask for.
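
To make the idea of an address concrete, the sketch below splits an address into its parts using Python's standard library. The URL `https://www.example.com/search?q=crime&page=2` is made up purely for illustration and is not a real API.

```python
from urllib.parse import urlsplit, parse_qs

# A hypothetical address, used only to illustrate the parts of a URL.
url = 'https://www.example.com/search?q=crime&page=2'

parts = urlsplit(url)
print(parts.scheme)   # how to connect: 'https'
print(parts.netloc)   # which server to contact: 'www.example.com'
print(parts.path)     # which resource on that server: '/search'

# Everything after the '?' is a set of name=value options.
options = parse_qs(parts.query)
print(options)        # {'q': ['crime'], 'page': ['2']}
```

The part after the `?` is where an API address usually gets "a little more complicated": each extra option narrows down or changes what the server sends back.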

### Guardian API - Interactive Exploration
The Guardian API provides a helpful tool for exploring how the address is built and what results we can get back. It is also useful for showing what options we have when requesting data.

[Explore the Guardian API](https://open-platform.theguardian.com/explore/)

## Communicating with the API in Python
So we can see how the web address is built using the API explorer, but how do we build that address using Python, and communicate with the API server so that it sends us back the data?

### A Very Basic Example
Initially we will make the simplest query we can: contacting the API with our credential key to get it to send back *something*.

First we define the endpoint, which is essentially the root of the address we are going to start with. The Guardian API has a [few different endpoints](https://open-platform.theguardian.com/documentation/), but for our purposes the *content* endpoint is the one we need.

In [2]:
API_ENDPOINT = 'http://content.guardianapis.com/search'

In [3]:
# We first create a dictionary that has a parameter name of api-key, and then our key as its value.
parameters = {'api-key':GUARDIAN_KEY}

Now we're going to communicate with the Guardian API using `requests`. We pass in the address we want to contact, `API_ENDPOINT`, along with a dictionary of parameters. `requests` uses the dictionary to build the rest of the address for us before making its request for data.

In [4]:
response = requests.get(API_ENDPOINT, params=parameters)

`requests` has now communicated with the server, and whatever the server sent back has been packaged up in a special `Response` object.

In [5]:
# We can see the type of object
type(response)

requests.models.Response

In [11]:
# and if we look at the object itself it doesn't tell us much.
response

<Response [200]>

In [12]:
# One useful check is to see how requests built the url for us...

response.url

'http://content.guardianapis.com/search?api-key=8e4e2cda-63eb-4538-b50a-c1e1d8192e5e'

Finally, we can see the data that was sent back by asking the response object to parse its JSON content.
JSON is a text format, and `.json()` converts it into nested Python dictionaries and lists.

In [13]:
response.json()

{'response': {'status': 'ok',
  'userTier': 'developer',
  'total': 2463295,
  'startIndex': 1,
  'pageSize': 10,
  'currentPage': 1,
  'pages': 246330,
  'orderBy': 'newest',
  'results': [{'id': 'sport/live/2023/nov/08/england-v-netherlands-cricket-world-cup-2023-live-score-updates',
    'type': 'liveblog',
    'sectionId': 'sport',
    'sectionName': 'Sport',
    'webPublicationDate': '2023-11-08T15:29:42Z',
    'webTitle': 'England v Netherlands: Cricket World Cup 2023 – live',
    'webUrl': 'https://www.theguardian.com/sport/live/2023/nov/08/england-v-netherlands-cricket-world-cup-2023-live-score-updates',
    'apiUrl': 'https://content.guardianapis.com/sport/live/2023/nov/08/england-v-netherlands-cricket-world-cup-2023-live-score-updates',
    'isHosted': False,
    'pillarId': 'pillar/sport',
    'pillarName': 'Sport'},
   {'id': 'world/live/2023/nov/08/russia-ukraine-war-kyiv-avdiivka-zelenskiy-putin',
    'type': 'liveblog',
    'sectionId': 'world',
    'sectionName': 'World 

In [14]:
# The top level dictionary just has one key called 'response' which contains all the other information.
guardian_data = response.json()['response']
guardian_data

{'status': 'ok',
 'userTier': 'developer',
 'total': 2463295,
 'startIndex': 1,
 'pageSize': 10,
 'currentPage': 1,
 'pages': 246330,
 'orderBy': 'newest',
 'results': [{'id': 'sport/live/2023/nov/08/england-v-netherlands-cricket-world-cup-2023-live-score-updates',
   'type': 'liveblog',
   'sectionId': 'sport',
   'sectionName': 'Sport',
   'webPublicationDate': '2023-11-08T15:29:42Z',
   'webTitle': 'England v Netherlands: Cricket World Cup 2023 – live',
   'webUrl': 'https://www.theguardian.com/sport/live/2023/nov/08/england-v-netherlands-cricket-world-cup-2023-live-score-updates',
   'apiUrl': 'https://content.guardianapis.com/sport/live/2023/nov/08/england-v-netherlands-cricket-world-cup-2023-live-score-updates',
   'isHosted': False,
   'pillarId': 'pillar/sport',
   'pillarName': 'Sport'},
  {'id': 'world/live/2023/nov/08/russia-ukraine-war-kyiv-avdiivka-zelenskiy-putin',
   'type': 'liveblog',
   'sectionId': 'world',
   'sectionName': 'World news',
   'webPublicationDate': '20

In [15]:
# The dictionary under 'response' is what matters, and it has a few keys with associated values.
guardian_data.keys()

dict_keys(['status', 'userTier', 'total', 'startIndex', 'pageSize', 'currentPage', 'pages', 'orderBy', 'results'])
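
Before relying on `results`, it can be worth checking the `status` value first. The helper below is a minimal sketch: `get_results` is a hypothetical name, the sample dictionary is a cut-down stand-in for `guardian_data` with an invented `id`, and the shape of a failed response (a `message` key) is an assumption rather than something shown above.

```python
def get_results(data):
    """Return the list under 'results', or raise if the status is not 'ok'."""
    if data.get('status') != 'ok':
        # Assumption: a failed request carries some explanation under 'message'.
        raise ValueError(
            f"Unexpected API status: {data.get('status')!r} {data.get('message', '')}"
        )
    return data.get('results', [])

# Cut-down stand-in for guardian_data, with only the keys used here.
sample = {'status': 'ok',
          'total': 1,
          'results': [{'id': 'sport/example', 'type': 'liveblog'}]}

print(len(get_results(sample)))
```

In the notebook this would be called as `get_results(guardian_data)`, so a problem with the request surfaces as a clear error rather than a confusing `KeyError` later on.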

Whilst the other keys have useful information for later, for now `results` is the key that contains the news results we want...

In [16]:
guardian_data['results']

[{'id': 'sport/live/2023/nov/08/england-v-netherlands-cricket-world-cup-2023-live-score-updates',
  'type': 'liveblog',
  'sectionId': 'sport',
  'sectionName': 'Sport',
  'webPublicationDate': '2023-11-08T15:29:42Z',
  'webTitle': 'England v Netherlands: Cricket World Cup 2023 – live',
  'webUrl': 'https://www.theguardian.com/sport/live/2023/nov/08/england-v-netherlands-cricket-world-cup-2023-live-score-updates',
  'apiUrl': 'https://content.guardianapis.com/sport/live/2023/nov/08/england-v-netherlands-cricket-world-cup-2023-live-score-updates',
  'isHosted': False,
  'pillarId': 'pillar/sport',
  'pillarName': 'Sport'},
 {'id': 'world/live/2023/nov/08/russia-ukraine-war-kyiv-avdiivka-zelenskiy-putin',
  'type': 'liveblog',
  'sectionId': 'world',
  'sectionName': 'World news',
  'webPublicationDate': '2023-11-08T15:26:15Z',
  'webTitle': 'Russia-Ukraine war live: Ukraine spy agency says it killed Russia-installed lawmaker with car bomb',
  'webUrl': 'https://www.theguardian.com/world

In [17]:
# Because the results are a list of dictionaries, pandas can restructure them into a table

results = pd.DataFrame(guardian_data['results'])
results

Unnamed: 0,id,type,sectionId,sectionName,webPublicationDate,webTitle,webUrl,apiUrl,isHosted,pillarId,pillarName
0,sport/live/2023/nov/08/england-v-netherlands-c...,liveblog,sport,Sport,2023-11-08T15:29:42Z,England v Netherlands: Cricket World Cup 2023 ...,https://www.theguardian.com/sport/live/2023/no...,https://content.guardianapis.com/sport/live/20...,False,pillar/sport,Sport
1,world/live/2023/nov/08/russia-ukraine-war-kyiv...,liveblog,world,World news,2023-11-08T15:26:15Z,Russia-Ukraine war live: Ukraine spy agency sa...,https://www.theguardian.com/world/live/2023/no...,https://content.guardianapis.com/world/live/20...,False,pillar/news,News
2,world/live/2023/nov/08/israel-hamas-war-live-u...,liveblog,world,World news,2023-11-08T15:25:32Z,Israel-Hamas war live: Israel says it has dest...,https://www.theguardian.com/world/live/2023/no...,https://content.guardianapis.com/world/live/20...,False,pillar/news,News
3,business/2023/nov/08/half-of-australia-left-wi...,article,business,Business,2023-11-08T15:23:46Z,Half of Australia left without internet or pho...,https://www.theguardian.com/business/2023/nov/...,https://content.guardianapis.com/business/2023...,False,pillar/news,News
4,business/2023/nov/08/post-office-horizon-inqui...,article,business,Business,2023-11-08T15:22:06Z,"Horizon bugs were kept from me, ex-Post Office...",https://www.theguardian.com/business/2023/nov/...,https://content.guardianapis.com/business/2023...,False,pillar/news,News
5,football/2023/nov/08/manor-solomons-instagram-...,article,football,Football,2023-11-08T15:21:35Z,Manor Solomon’s Instagram account ‘removed by ...,https://www.theguardian.com/football/2023/nov/...,https://content.guardianapis.com/football/2023...,False,pillar/sport,Sport
6,artanddesign/2023/nov/08/picasso-masterpiece-l...,article,artanddesign,Art and design,2023-11-08T15:20:40Z,Picasso masterpiece kicks off auction season f...,https://www.theguardian.com/artanddesign/2023/...,https://content.guardianapis.com/artanddesign/...,False,pillar/arts,Arts
7,politics/live/2023/nov/08/labour-keir-starmer-...,liveblog,politics,Politics,2023-11-08T15:20:12Z,Ex-civil service head thought No 10 team durin...,https://www.theguardian.com/politics/live/2023...,https://content.guardianapis.com/politics/live...,False,pillar/news,News
8,us-news/live/2023/nov/08/ivanka-trump-testimon...,liveblog,us-news,US news,2023-11-08T15:20:00Z,Ivanka Trump takes the stand to testify at fam...,https://www.theguardian.com/us-news/live/2023/...,https://content.guardianapis.com/us-news/live/...,False,pillar/news,News
9,uk-news/2023/nov/08/organiser-of-armistice-day...,article,uk-news,UK news,2023-11-08T15:18:17Z,Organiser of Armistice Day event at Cenotaph h...,https://www.theguardian.com/uk-news/2023/nov/0...,https://content.guardianapis.com/uk-news/2023/...,False,pillar/news,News


In [18]:
# To summarise the process
parameters = {'api-key':GUARDIAN_KEY}
response = requests.get(API_ENDPOINT, params=parameters)
guardian_data = response.json()['response']
results = pd.DataFrame(guardian_data['results'])

results

Unnamed: 0,id,type,sectionId,sectionName,webPublicationDate,webTitle,webUrl,apiUrl,isHosted,pillarId,pillarName
0,sport/live/2023/nov/08/england-v-netherlands-c...,liveblog,sport,Sport,2023-11-08T15:33:26Z,England v Netherlands: Cricket World Cup 2023 ...,https://www.theguardian.com/sport/live/2023/no...,https://content.guardianapis.com/sport/live/20...,False,pillar/sport,Sport
1,us-news/live/2023/nov/08/election-results-abor...,liveblog,us-news,US news,2023-11-08T15:32:59Z,Abortion rights fight brings key victories for...,https://www.theguardian.com/us-news/live/2023/...,https://content.guardianapis.com/us-news/live/...,False,pillar/news,News
2,us-news/live/2023/nov/08/ivanka-trump-testimon...,liveblog,us-news,US news,2023-11-08T15:32:04Z,Ivanka Trump takes the stand to testify at fam...,https://www.theguardian.com/us-news/live/2023/...,https://content.guardianapis.com/us-news/live/...,False,pillar/news,News
3,uk-news/2023/nov/08/derbyshire-pc-guilty-over-...,article,uk-news,UK news,2023-11-08T15:31:51Z,Derbyshire PC guilty over sexual encounter wit...,https://www.theguardian.com/uk-news/2023/nov/0...,https://content.guardianapis.com/uk-news/2023/...,False,pillar/news,News
4,world/live/2023/nov/08/eu-ukraine-moldova-koso...,liveblog,world,World news,2023-11-08T15:30:34Z,European Commission recommends opening EU acce...,https://www.theguardian.com/world/live/2023/no...,https://content.guardianapis.com/world/live/20...,False,pillar/news,News
5,world/live/2023/nov/08/russia-ukraine-war-kyiv...,liveblog,world,World news,2023-11-08T15:26:15Z,Russia-Ukraine war live: Ukraine spy agency sa...,https://www.theguardian.com/world/live/2023/no...,https://content.guardianapis.com/world/live/20...,False,pillar/news,News
6,world/live/2023/nov/08/israel-hamas-war-live-u...,liveblog,world,World news,2023-11-08T15:25:32Z,Israel-Hamas war live: Israel says it has dest...,https://www.theguardian.com/world/live/2023/no...,https://content.guardianapis.com/world/live/20...,False,pillar/news,News
7,business/2023/nov/08/half-of-australia-left-wi...,article,business,Business,2023-11-08T15:23:46Z,Half of Australia left without internet or pho...,https://www.theguardian.com/business/2023/nov/...,https://content.guardianapis.com/business/2023...,False,pillar/news,News
8,business/2023/nov/08/post-office-horizon-inqui...,article,business,Business,2023-11-08T15:22:06Z,"Horizon bugs were kept from me, ex-Post Office...",https://www.theguardian.com/business/2023/nov/...,https://content.guardianapis.com/business/2023...,False,pillar/news,News
9,football/2023/nov/08/manor-solomons-instagram-...,article,football,Football,2023-11-08T15:21:35Z,Manor Solomon’s Instagram account ‘removed by ...,https://www.theguardian.com/football/2023/nov/...,https://content.guardianapis.com/football/2023...,False,pillar/sport,Sport
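
The `webPublicationDate` column holds text such as `2023-11-08T15:29:42Z`, where the trailing `Z` marks the time as UTC. If real dates are needed for sorting or grouping, the text can be parsed with the standard library. This is a minimal sketch, and `parse_guardian_date` is a hypothetical helper name.

```python
from datetime import datetime, timezone

def parse_guardian_date(text):
    # The format string mirrors the values shown above, e.g. '2023-11-08T15:29:42Z'.
    parsed = datetime.strptime(text, '%Y-%m-%dT%H:%M:%SZ')
    # Attach the UTC timezone that the trailing 'Z' stands for.
    return parsed.replace(tzinfo=timezone.utc)

published = parse_guardian_date('2023-11-08T15:29:42Z')
print(published.date())   # 2023-11-08
print(published.hour)     # 15
```

For a whole column, `pd.to_datetime(results['webPublicationDate'])` should do the equivalent job in one step.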


## Customising your request with parameters
To customise our query we simply need to add to or adjust the parameters we pass to our request.

### Query
The search query is the primary way to filter our results.

In [None]:
parameters = {'api-key':GUARDIAN_KEY,
              'q':'crime'}

response = requests.get(API_ENDPOINT, params=parameters)
guardian_data = response.json()['response']
results = pd.DataFrame(guardian_data['results'])

results

In [None]:
response.url

Queries can be more than one word. The Guardian API documentation explains a number of ways you might adjust your query.
- 'Crime AND Prison' - Search for articles where both the terms 'crime' and 'prison' are used.
- 'Crime OR Prison' - Search for articles where either 'crime' or 'prison' is used.
- '"Criminal justice"' - Using quote marks to search for a phrase.
- 'debate AND NOT immigration' - Search for articles that use the term debate, but not the term immigration.

See the [Guardian API documentation](https://open-platform.theguardian.com/documentation/) for more options.
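
When a query combines several terms, it can be easier to assemble the string in code than to type it by hand. The sketch below uses only the `AND`, `AND NOT` and quoted-phrase forms listed above; `quote_if_phrase` and `build_query` are hypothetical helper names, not part of any library.

```python
def quote_if_phrase(term):
    # Terms containing a space are wrapped in double quotes so they are searched as a phrase.
    return f'"{term}"' if ' ' in term else term

def build_query(required, excluded=()):
    """Join required terms with AND and append AND NOT for each excluded term.

    Assumes at least one required term is given.
    """
    query = ' AND '.join(quote_if_phrase(term) for term in required)
    for term in excluded:
        query += f' AND NOT {quote_if_phrase(term)}'
    return query

print(build_query(['crime', 'criminal justice']))         # crime AND "criminal justice"
print(build_query(['debate'], excluded=['immigration']))  # debate AND NOT immigration
```

The returned string can then be used as the value of `'q'` in the parameters dictionary.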

In [19]:
# Phrases need double quotes inside the Python string; requests then encodes them in the URL (as %22).

parameters = {'api-key':GUARDIAN_KEY,
              'q':'"human rights"'}

response = requests.get(API_ENDPOINT, params=parameters)
guardian_data = response.json()['response']
results = pd.DataFrame(guardian_data['results'])
results

Unnamed: 0,id,type,sectionId,sectionName,webPublicationDate,webTitle,webUrl,apiUrl,isHosted,pillarId,pillarName
0,us-news/2023/oct/15/biden-pro-palestine-activi...,article,us-news,US news,2023-10-15T12:51:02Z,Biden interrupted by pro-Palestine activist at...,https://www.theguardian.com/us-news/2023/oct/1...,https://content.guardianapis.com/us-news/2023/...,False,pillar/news,News
1,environment/2023/oct/12/human-rights-experts-w...,article,environment,Environment,2023-10-12T11:00:08Z,Human rights experts warn against European cra...,https://www.theguardian.com/environment/2023/o...,https://content.guardianapis.com/environment/2...,False,pillar/news,News
2,us-news/2023/nov/03/israel-aid-democrats-leahy...,article,us-news,US news,2023-11-03T21:19:59Z,Leftist Democrats invoke human rights law in s...,https://www.theguardian.com/us-news/2023/nov/0...,https://content.guardianapis.com/us-news/2023/...,False,pillar/news,News
3,world/2023/sep/04/twitter-saudi-arabia-human-r...,article,world,World news,2023-09-04T10:51:22Z,Twitter accused of helping Saudi Arabia commit...,https://www.theguardian.com/world/2023/sep/04/...,https://content.guardianapis.com/world/2023/se...,False,pillar/news,News
4,politics/2023/sep/26/smirking-suella-trashes-7...,article,politics,Politics,2023-09-26T18:27:30Z,Smirking Suella trashes 70 years of human righ...,https://www.theguardian.com/politics/2023/sep/...,https://content.guardianapis.com/politics/2023...,False,pillar/news,News
5,uk-news/2023/nov/01/amnesty-calls-for-prevent-...,article,uk-news,UK news,2023-11-01T18:03:48Z,Amnesty calls for Prevent strategy to be aboli...,https://www.theguardian.com/uk-news/2023/nov/0...,https://content.guardianapis.com/uk-news/2023/...,False,pillar/news,News
6,law/2023/sep/24/suella-braverman-makes-fresh-a...,article,law,Law,2023-09-24T06:00:16Z,Suella Braverman makes fresh attack on Europea...,https://www.theguardian.com/law/2023/sep/24/su...,https://content.guardianapis.com/law/2023/sep/...,False,pillar/news,News
7,us-news/2023/aug/02/us-mexico-border-human-rig...,article,us-news,US news,2023-08-02T16:50:05Z,US border agents habitually abuse human rights...,https://www.theguardian.com/us-news/2023/aug/0...,https://content.guardianapis.com/us-news/2023/...,False,pillar/news,News
8,world/2023/oct/08/campaigners-aim-to-lower-sup...,article,world,World news,2023-10-08T13:27:10Z,Campaigners aim to lower support for China on ...,https://www.theguardian.com/world/2023/oct/08/...,https://content.guardianapis.com/world/2023/oc...,False,pillar/news,News
9,law/2023/sep/15/bahraini-human-rights-defender...,article,law,Law,2023-09-15T14:43:14Z,Bahraini human rights defender denied travel t...,https://www.theguardian.com/law/2023/sep/15/ba...,https://content.guardianapis.com/law/2023/sep/...,False,pillar/news,News


In [20]:
response.url

'http://content.guardianapis.com/search?api-key=8e4e2cda-63eb-4538-b50a-c1e1d8192e5e&q=%22human+rights%22'
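
The `%22` and `+` in the address are URL encoding for the double quotes and the space. The standard library's `urlencode` produces the same encoding for this query, which makes it a convenient way to see what is going on. `requests` does its own encoding internally, so treat this as an illustration rather than a description of how `requests` is implemented. The key `'test'` is a placeholder.

```python
from urllib.parse import urlencode, unquote_plus

# 'test' is a placeholder key for illustration only.
parameters = {'api-key': 'test', 'q': '"human rights"'}

# Encode the dictionary into the part of the address that follows the '?'.
query_string = urlencode(parameters)
print(query_string)                        # api-key=test&q=%22human+rights%22

# Decoding goes the other way and recovers the original phrase.
print(unquote_plus('%22human+rights%22'))  # "human rights"
```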

### Additional Filters
Other filters can be useful when narrowing down your search...

See the [Guardian API documentation - Filters](https://open-platform.theguardian.com/documentation/search) for more options.


In [21]:


parameters = {'api-key':GUARDIAN_KEY,
              'q':'"human rights"',
              'page-size':10, # controls how many results you get per request - max 200
              'production-office':'uk', # filter based on where the article was produced
              'lang':'en', # language
              'from-date':'2023-01-20', # only published from a specific date
              'to-date':'2023-01-30', # only published on or before a specific date
              'order-by':'oldest' # options - oldest, newest, relevance
              }

response = requests.get(API_ENDPOINT, params=parameters)
guardian_data = response.json()['response']
results = pd.DataFrame(guardian_data['results'])
results

Unnamed: 0,id,type,sectionId,sectionName,webPublicationDate,webTitle,webUrl,apiUrl,isHosted,pillarId,pillarName
0,uk-news/2023/jan/20/northern-irish-man-british...,article,uk-news,UK news,2023-01-20T10:56:45Z,‘Dirty wee torturers’: Northern Irish man tell...,https://www.theguardian.com/uk-news/2023/jan/2...,https://content.guardianapis.com/uk-news/2023/...,False,pillar/news,News
1,global-development/2023/jan/20/survivor-brisa-...,article,global-development,Global development,2023-01-20T13:45:40Z,Rape survivor wins case against ‘cruel and inh...,https://www.theguardian.com/global-development...,https://content.guardianapis.com/global-develo...,False,pillar/news,News
2,world/2023/jan/20/uk-mp-and-peer-on-kazakhstan...,article,world,World news,2023-01-20T13:59:18Z,UK MP and peer on Kazakhstan visit denied acce...,https://www.theguardian.com/world/2023/jan/20/...,https://content.guardianapis.com/world/2023/ja...,False,pillar/news,News
3,world/2023/jan/20/world-uyghur-congress-loses-...,article,world,World news,2023-01-20T15:38:15Z,World Uyghur Congress loses legal challenge ag...,https://www.theguardian.com/world/2023/jan/20/...,https://content.guardianapis.com/world/2023/ja...,False,pillar/news,News
4,technology/2023/jan/20/davos-elite-say-gen-z-w...,article,technology,Technology,2023-01-20T16:00:42Z,"‘They’re 25, they don’t do emails’: is instant...",https://www.theguardian.com/technology/2023/ja...,https://content.guardianapis.com/technology/20...,False,pillar/news,News
5,environment/2023/jan/20/new-carbon-offset-stan...,article,environment,Environment,2023-01-20T17:45:58Z,New carbon offset standards ‘should bring grea...,https://www.theguardian.com/environment/2023/j...,https://content.guardianapis.com/environment/2...,False,pillar/news,News
6,global-development/2023/jan/20/iran-fears-grow...,article,global-development,Global development,2023-01-20T18:10:51Z,Iran: fears grow of security crackdown in Zahe...,https://www.theguardian.com/global-development...,https://content.guardianapis.com/global-develo...,False,pillar/news,News
7,world/2023/jan/21/attack-on-freedom-israel-mov...,article,world,World news,2023-01-21T07:00:00Z,‘Attack on freedom’: Israel moves to claw back...,https://www.theguardian.com/world/2023/jan/21/...,https://content.guardianapis.com/world/2023/ja...,False,pillar/news,News
8,commentisfree/2023/jan/21/tory-migrant-policy-...,article,commentisfree,Opinion,2023-01-21T12:00:06Z,If you ever doubt the hateful effects of Tory ...,https://www.theguardian.com/commentisfree/2023...,https://content.guardianapis.com/commentisfree...,False,pillar/opinion,Opinion
9,commentisfree/2023/jan/21/stoking-a-culture-wa...,article,commentisfree,Opinion,2023-01-21T19:00:15Z,"Stoking a culture war? No, Nicola Sturgeon, th...",https://www.theguardian.com/commentisfree/2023...,https://content.guardianapis.com/commentisfree...,False,pillar/opinion,Opinion


In [22]:
response.url

'http://content.guardianapis.com/search?api-key=8e4e2cda-63eb-4538-b50a-c1e1d8192e5e&q=%22human+rights%22&page-size=10&production-office=uk&lang=en&from-date=2023-01-20&to-date=2023-01-30&order-by=oldest'
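
The `from-date` and `to-date` values are plain `YYYY-MM-DD` strings, so they can be generated rather than typed. The sketch below builds a date window ending on a chosen day; `date_window` is a hypothetical helper name.

```python
from datetime import date, timedelta

def date_window(end, days):
    """Return from-date/to-date parameters covering `days` days up to and including `end`."""
    start = end - timedelta(days=days - 1)
    return {'from-date': start.isoformat(), 'to-date': end.isoformat()}

window = date_window(date(2023, 1, 30), days=11)
print(window)   # {'from-date': '2023-01-20', 'to-date': '2023-01-30'}
```

The result can be merged into an existing parameters dictionary with `parameters.update(window)`.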

#### Should I use all of these?
No, these are options rather than requirements, and should be used to refine your data request depending on the question you are studying. However, for most projects about news reporting you will probably want to at least specify that the type of content should be an article.

#### Exercise
Examine the documentation for the Search section of the Guardian API. Can you find the filter that returns only results from the `"society"` section of the Guardian? Add the filter to the parameters dictionary below and run the cell to see what gets returned.

[Guardian API Documentation - Search](https://open-platform.theguardian.com/documentation/search)

In [23]:
# adjust the parameters dictionary
parameters = {'api-key':GUARDIAN_KEY,
              'q':'crime',
              'page-size':10,
              'production-office':'uk',
              'lang':'en',
              'section':'society',
              }

response = requests.get(API_ENDPOINT, params=parameters)
guardian_data = response.json()['response']
results = pd.DataFrame(guardian_data['results'])
results

Unnamed: 0,id,type,sectionId,sectionName,webPublicationDate,webTitle,webUrl,apiUrl,isHosted,pillarId,pillarName
0,society/2023/sep/20/children-worry-more-about-...,article,society,Society,2023-09-20T04:00:08Z,Children worry more about rising prices than e...,https://www.theguardian.com/society/2023/sep/2...,https://content.guardianapis.com/society/2023/...,False,pillar/news,News
1,society/2023/jul/18/ai-could-worsen-epidemic-o...,article,society,Society,2023-07-17T23:01:40Z,AI could worsen epidemic of child sexual abuse...,https://www.theguardian.com/society/2023/jul/1...,https://content.guardianapis.com/society/2023/...,False,pillar/news,News
2,news/2023/oct/20/antisemitic-hate-crimes-in-lo...,article,society,Society,2023-10-20T16:27:08Z,"Antisemitic hate crimes in London up 1,350%, M...",https://www.theguardian.com/news/2023/oct/20/a...,https://content.guardianapis.com/news/2023/oct...,False,pillar/news,News
3,society/2023/oct/05/record-rise-hate-crimes-tr...,article,society,Society,2023-10-05T17:49:18Z,Hate crimes against transgender people hit rec...,https://www.theguardian.com/society/2023/oct/0...,https://content.guardianapis.com/society/2023/...,False,pillar/news,News
4,society/2023/jan/23/assisted-dying-should-be-r...,article,society,Society,2023-01-23T16:19:32Z,Assisted dying should be a right – not a crime...,https://www.theguardian.com/society/2023/jan/2...,https://content.guardianapis.com/society/2023/...,False,pillar/news,News
5,society/2023/mar/08/campaign-calls-for-gender-...,article,society,Society,2023-03-08T08:00:20Z,Campaign calls for gender apartheid to be crim...,https://www.theguardian.com/society/2023/mar/0...,https://content.guardianapis.com/society/2023/...,False,pillar/news,News
6,society/2023/oct/05/government-adviser-leaves-...,article,society,Society,2023-10-05T21:33:23Z,Government rape adviser leaves role over ‘lack...,https://www.theguardian.com/society/2023/oct/0...,https://content.guardianapis.com/society/2023/...,False,pillar/news,News
7,society/2022/oct/16/crime-gangs-raking-in-mill...,article,society,Society,2022-10-16T15:30:39Z,Crime gangs raking in millions through support...,https://www.theguardian.com/society/2022/oct/1...,https://content.guardianapis.com/society/2022/...,False,pillar/news,News
8,society/2023/jan/01/police-in-england-and-wale...,article,society,Society,2023-01-01T07:00:41Z,Police in England and Wales to screen suspects...,https://www.theguardian.com/society/2023/jan/0...,https://content.guardianapis.com/society/2023/...,False,pillar/news,News
9,society/2023/oct/06/over-100000-children-in-en...,article,society,Society,2023-10-06T04:00:32Z,"Over 100,000 children in England and Wales ha...",https://www.theguardian.com/society/2023/oct/0...,https://content.guardianapis.com/society/2023/...,False,pillar/news,News


### Getting Additional Content
By default the API provides a limited range of information. Dates, titles, section categories, etc. can be useful as analysable data, but we may want additional content such as...
- Keyword tags - Human provided classification of articles, useful for a range of analysis techniques including network analysis.
- Content body - The actual article text, useful for text analysis.
- Article word counts

Again, the procedure is the same, we just need to adjust our parameters.

In [24]:
parameters = {'api-key':GUARDIAN_KEY,
              'q':'crime',
              'page-size':200,
              'production-office':'uk',
              'lang':'en',
              'section':'news',
              'show-tags':'keyword',
              'show-fields':'body,byline,wordcount', # comma-separated, no spaces
              }

response = requests.get(API_ENDPOINT, params=parameters)
guardian_data = response.json()['response']
results = pd.DataFrame(guardian_data['results'])
results

Unnamed: 0,id,type,sectionId,sectionName,webPublicationDate,webTitle,webUrl,apiUrl,fields,tags,isHosted,pillarId,pillarName
0,news/2022/dec/07/inside-guardian-weekly-9-dece...,article,news,News,2022-12-07T09:00:36Z,Crime scene: Inside the 9 December Guardian We...,https://www.theguardian.com/news/2022/dec/07/i...,https://content.guardianapis.com/news/2022/dec...,{'body': '<p>Six months have passed since the ...,"[{'id': 'environment/activism', 'type': 'keywo...",False,pillar/news,News
1,news/commentisfree/2023/mar/28/how-our-founder...,article,news,News,2023-03-28T14:00:02Z,How our founders' links to slavery change the ...,https://www.theguardian.com/news/commentisfree...,https://content.guardianapis.com/news/commenti...,{'body': '<p>I remember the moment. We were me...,"[{'id': 'world/slavery', 'type': 'keyword', 's...",False,pillar/news,News
2,news/audio/2023/jan/13/jailed-for-life-for-ste...,audio,news,News,2023-01-13T03:00:03Z,Jailed for life for stealing $14 - podcast,https://www.theguardian.com/news/audio/2023/ja...,https://content.guardianapis.com/news/audio/20...,"{'body': '<p>In California, if a person commit...","[{'id': 'us-news/us-justice-system', 'type': '...",False,pillar/news,News
3,news/2023/apr/12/three-alleged-assault-victims...,article,news,News,2023-04-12T22:59:26Z,Three alleged assault victims launch UK civil ...,https://www.theguardian.com/news/2023/apr/12/t...,https://content.guardianapis.com/news/2023/apr...,{'body': '<p>Three alleged victims of sexual a...,"[{'id': 'news/andrew-tate', 'type': 'keyword',...",False,pillar/news,News
4,news/2023/oct/08/corrections-and-clarifications1,article,news,News,2023-10-08T20:00:50Z,Corrections and clarifications,https://www.theguardian.com/news/2023/oct/08/c...,https://content.guardianapis.com/news/2023/oct...,"{'body': '<p>• On 7 October 2023, a live blog ...",[],False,pillar/news,News
...,...,...,...,...,...,...,...,...,...,...,...,...,...
195,news/2017/jun/29/corrections-and-clarifications,article,news,News,2017-06-29T20:00:19Z,Corrections and clarifications,https://www.theguardian.com/news/2017/jun/29/c...,https://content.guardianapis.com/news/2017/jun...,{'body': '<p>• A pullquote on an article about...,[],False,pillar/news,News
196,news/2017/may/24/corrections-and-clarifications,article,news,News,2017-05-24T20:00:08Z,Corrections and clarifications,https://www.theguardian.com/news/2017/may/24/c...,https://content.guardianapis.com/news/2017/may...,{'body': '<p>• An editorial that discussed the...,[],False,pillar/news,News
197,news/2017/sep/08/lauri-love-british-hacker-ano...,article,news,News,2017-09-08T05:00:30Z,Keyboard warrior: the British hacker fighting ...,https://www.theguardian.com/news/2017/sep/08/l...,https://content.guardianapis.com/news/2017/sep...,"{'body': '<p>In October 2013, Lauri Love was d...","[{'id': 'technology/hacking', 'type': 'keyword...",False,pillar/news,News
198,news/2017/jun/20/buried-alive-the-old-men-stuc...,article,news,News,2017-06-20T05:00:01Z,'Buried alive': the old men stuck in Britain’s...,https://www.theguardian.com/news/2017/jun/20/b...,https://content.guardianapis.com/news/2017/jun...,{'body': '<p>Dave was 13 when he got his first...,"[{'id': 'society/older-people', 'type': 'keywo...",False,pillar/news,News
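
In the table above, `fields` is a dictionary and `tags` is a list of dictionaries, so each sits inside a single cell. One way to make them easier to analyse is to flatten each result first. This is a minimal sketch: `flatten_result` is a hypothetical helper, the sample record is abbreviated and partly invented, and the conversion of `wordcount` assumes it may arrive as text.

```python
def flatten_result(item):
    """Pull the nested 'fields' and 'tags' values up into a flat dictionary."""
    fields = item.get('fields', {})
    return {
        'id': item['id'],
        'webTitle': item['webTitle'],
        'byline': fields.get('byline'),
        # Assumption: wordcount may arrive as text, so convert it when present.
        'wordcount': int(fields['wordcount']) if 'wordcount' in fields else None,
        'body': fields.get('body'),
        # Keep just the tag ids, e.g. 'society/drugs'.
        'tag_ids': [tag['id'] for tag in item.get('tags', [])],
    }

# Abbreviated, partly invented record in the shape shown above.
sample = {
    'id': 'news/example',
    'webTitle': 'Example title',
    'fields': {'body': '<p>Example body</p>', 'byline': 'A Writer', 'wordcount': '3'},
    'tags': [{'id': 'society/drugs', 'type': 'keyword'}],
}

flat = flatten_result(sample)
print(flat['wordcount'], flat['tag_ids'])
```

A table of flattened results can then be built as before, for example with `pd.DataFrame([flatten_result(r) for r in guardian_data['results']])`.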


Later we will look at exploring tags, which may lead you to want to focus on a specific tag. If we find a tag or two we want to focus on, we can add them to our query...

In [25]:
parameters = {'api-key':GUARDIAN_KEY,
              'q':'crime',
              'page-size':200,
              'production-office':'uk',
              'lang':'en',
              'section':'news',
              'show-tags':'keyword',
              'show-fields':'body,byline,wordcount', # comma-separated, no spaces
              'tag':'society/drugs'
              }

response = requests.get(API_ENDPOINT, params=parameters)
guardian_data = response.json()['response']
results = pd.DataFrame(guardian_data['results'])
results

Unnamed: 0,id,type,sectionId,sectionName,webPublicationDate,webTitle,webUrl,apiUrl,fields,tags,isHosted,pillarId,pillarName
0,news/2022/oct/06/cartel-journalist-gangland-ki...,article,news,News,2022-10-06T05:00:28Z,"The cartel, the journalist and the gangland ki...",https://www.theguardian.com/news/2022/oct/06/c...,https://content.guardianapis.com/news/2022/oct...,"{'body': '<p>Towards the end of 2016, four yea...","[{'id': 'world/netherlands', 'type': 'keyword'...",False,pillar/news,News
1,news/2017/sep/28/the-father-who-went-undercove...,article,news,News,2017-09-28T05:00:29Z,The father who went undercover to find his son...,https://www.theguardian.com/news/2017/sep/28/t...,https://content.guardianapis.com/news/2017/sep...,"{'body': '<p>At 4.30am on 22 November 1995, ta...","[{'id': 'world/spain', 'type': 'keyword', 'sec...",False,pillar/news,News
2,news/2018/jul/31/an-unsolved-at-italys-most-no...,article,news,News,2018-07-31T05:00:45Z,An unsolved murder at Italy’s most notorious t...,https://www.theguardian.com/news/2018/jul/31/a...,https://content.guardianapis.com/news/2018/jul...,{'body': '<p>It was raining heavily on 28 Marc...,"[{'id': 'world/migration', 'type': 'keyword', ...",False,pillar/news,News
3,news/datablog/2014/nov/12/opium-harvest-afghan...,article,news,News,2014-11-12T12:33:00Z,Opium harvest in Afghanistan reaches record le...,https://www.theguardian.com/news/datablog/2014...,https://content.guardianapis.com/news/datablog...,{'body': '<p>The cultivation of opium in Afgha...,"[{'id': 'world/world', 'type': 'keyword', 'sec...",False,pillar/news,News
4,news/datablog/2014/aug/01/gay-men-have-sex-whe...,article,news,News,2014-08-01T14:09:33Z,Gay men have sex when drunk. <em>And</em … ?,https://www.theguardian.com/news/datablog/2014...,https://content.guardianapis.com/news/datablog...,{'body': '<p>Hold the front page. BBC Newsbeat...,"[{'id': 'world/lgbt-rights', 'type': 'keyword'...",False,pillar/news,News
5,news/datablog/2014/jul/24/drug-usage-in-englan...,article,news,News,2014-07-24T12:04:36Z,Drug usage in England and Wales 2013/14 - get ...,https://www.theguardian.com/news/datablog/2014...,https://content.guardianapis.com/news/datablog...,"{'body': '<p>There were <a href=""http://www.th...","[{'id': 'politics/drugspolicy', 'type': 'keywo...",False,pillar/news,News
6,news/datablog/2013/jul/25/britons-illegal-drug...,article,news,News,2013-07-25T14:26:00Z,How many Britons have taken illegal drugs and ...,https://www.theguardian.com/news/datablog/2013...,https://content.guardianapis.com/news/datablog...,{'body': '<p>The figures on UK drug use <a hre...,"[{'id': 'uk/uk', 'type': 'keyword', 'sectionId...",False,pillar/news,News
7,news/datablog/2012/jul/26/drug-use-england-wal...,article,news,News,2012-07-26T15:14:00Z,Drug use in England and Wales: is it under con...,https://www.theguardian.com/news/datablog/2012...,https://content.guardianapis.com/news/datablog...,"{'body': '<p>Drug use in England and Wales is,...","[{'id': 'society/drugs', 'type': 'keyword', 's...",False,pillar/news,News
8,news/datablog/2012/dec/28/top-data-stories-2012,article,news,News,2012-12-28T10:00:00Z,"2012: the year in data, journalism (and charts)",https://www.theguardian.com/news/datablog/2012...,https://content.guardianapis.com/news/datablog...,{'body': '<p>What were the headline figures of...,"[{'id': 'uk/uk', 'type': 'keyword', 'sectionId...",False,pillar/news,News
9,theobserver/2010/aug/15/decriminalising-drugs-...,article,news,News,2010-08-14T23:05:46Z,It's time to end the propaganda. Prohibition s...,https://www.theguardian.com/theobserver/2010/a...,https://content.guardianapis.com/theobserver/2...,{'body': '<p>The catastrophic global effects o...,"[{'id': 'society/drugs', 'type': 'keyword', 's...",False,pillar/news,News


### Collecting more than 200 items
The maximum number of items sent back in a single call to the API is 200. This can be quite a large number for some projects, but what if we wanted to get a larger sample so we could...
- Do an exhaustive search of all content on a specific topic
- See trends over time. If the topic is frequently discussed, 200 results may only cover a very short period.
- See large scale patterns across topics.

In this instance we need to make multiple calls to the API and add each set of results to our locally held data. However, we need to make sure the API always sends us data that we don't already have. This is where we work with some of the extra information in the response that isn't the results themselves.

In [26]:
parameters = {'api-key':GUARDIAN_KEY,
              'q':'crime',
              'page-size':200}

response = requests.get(API_ENDPOINT,params=parameters)
guardian_data = response.json()['response']
guardian_data

{'status': 'ok',
 'userTier': 'developer',
 'total': 118137,
 'startIndex': 1,
 'pageSize': 200,
 'currentPage': 1,
 'pages': 591,
 'orderBy': 'relevance',
 'results': [{'id': 'books/2023/oct/30/and-thrillers-of-the-month-reviews-janice-hallett-christmas-appeal-jean-kwok-leftover-woman-richard-armitage-geneva-yrsa-sigurdardottir-the-prey',
   'type': 'article',
   'sectionId': 'books',
   'sectionName': 'Books',
   'webPublicationDate': '2023-10-30T12:30:13Z',
   'webTitle': 'Crime and thrillers of the month – reviews',
   'webUrl': 'https://www.theguardian.com/books/2023/oct/30/and-thrillers-of-the-month-reviews-janice-hallett-christmas-appeal-jean-kwok-leftover-woman-richard-armitage-geneva-yrsa-sigurdardottir-the-prey',
   'apiUrl': 'https://content.guardianapis.com/books/2023/oct/30/and-thrillers-of-the-month-reviews-janice-hallett-christmas-appeal-jean-kwok-leftover-woman-richard-armitage-geneva-yrsa-sigurdardottir-the-prey',
   'isHosted': False,
   'pillarId': 'pillar/arts',
   

The key information here is `total`, `pages` and `currentPage`.
- `total` tells us how many records match our parameters.
- `pages` tells us how many pages of results are available to us, given that each page holds `page-size` results.
- `currentPage` tells us what page of results we've just received.
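
These values are related by simple arithmetic, which is useful for sanity-checking a response. The sketch below uses the numbers from the output above; the helper names `expected_pages` and `start_index` are our own and are not part of the Guardian API.

```python
import math

def expected_pages(total, page_size):
    """Number of pages needed to hold `total` results at `page_size` per page."""
    return math.ceil(total / page_size)

def start_index(page, page_size):
    """1-based position of the first result on a given page."""
    return (page - 1) * page_size + 1

print(expected_pages(118137, 200))  # 591, matching `pages` in the response
print(start_index(1, 200))          # 1, matching `startIndex` in the response
```

The last page is usually only partly full, which is why the division is rounded up.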

We can ask the API for a specific page of results using the `page` parameter.

In [27]:
parameters = {'api-key':GUARDIAN_KEY,
              'q':'crime',
              'page-size':200,
              'page':2}

response = requests.get(API_ENDPOINT,params=parameters)
guardian_data = response.json()['response']
guardian_data

{'status': 'ok',
 'userTier': 'developer',
 'total': 118137,
 'startIndex': 201,
 'pageSize': 200,
 'currentPage': 2,
 'pages': 591,
 'orderBy': 'relevance',
 'results': [{'id': 'books/2023/jun/06/the-speculations-of-country-people-by-majella-kelly-review-crimes-and-punishment',
   'type': 'article',
   'sectionId': 'books',
   'sectionName': 'Books',
   'webPublicationDate': '2023-06-06T08:00:04Z',
   'webTitle': 'The Speculations of Country People by Majella Kelly review – crimes and punishment',
   'webUrl': 'https://www.theguardian.com/books/2023/jun/06/the-speculations-of-country-people-by-majella-kelly-review-crimes-and-punishment',
   'apiUrl': 'https://content.guardianapis.com/books/2023/jun/06/the-speculations-of-country-people-by-majella-kelly-review-crimes-and-punishment',
   'isHosted': False,
   'pillarId': 'pillar/arts',
   'pillarName': 'Arts'},
  {'id': 'australia-news/2023/nov/01/sydney-wahroonga-boy-10-dies-st-lucys-school-for-students-with-disabilities-trapped-lift',

The most direct way to gather multiple pages of data then is to...
- Make a call to the API
- Store the results in a list.
- Increment the value of `page` by 1
- Repeat...
- Eventually hit a maximum number of pages we set, or run out of data.

You will first need to make one request to the API to see how much data is available, and then base your maximum number of pages on that information.


In [28]:
# Let's just discuss the logic of how we handle the data collection here before we actually implement the real collection
from time import sleep


current_page = 1 # The page number we're requesting from the API. We start with page 1.
available_pages = 1 # We don't necessarily know how many pages the API call will be providing until we make our first call.

failsafe_pages = 5 # However many pages are available, we'll set our absolute limit to 5



# here we use a while loop that runs the code over and over until the expression is false

while (current_page <= available_pages) and (current_page <= failsafe_pages):
    parameters['page'] = current_page

    # We would do our data collection here. For now we just print the parameters,
    # leaving out the API key so it never ends up in saved output.
    print({key: value for key, value in parameters.items() if key != 'api-key'})

    # Here we pretend the API told us there were 124 pages available to us.
    available_pages = 124

    # We increment the value of current_page by 1
    current_page += 1

    # sleep stops our script for 1 second - we do this so we don't overload the Guardian's servers
    sleep(1)

{'q': 'crime', 'page-size': 200, 'page': 1}
{'q': 'crime', 'page-size': 200, 'page': 2}
{'q': 'crime', 'page-size': 200, 'page': 3}
{'q': 'crime', 'page-size': 200, 'page': 4}
{'q': 'crime', 'page-size': 200, 'page': 5}


In [51]:
from time import sleep

parameters = {'api-key':GUARDIAN_KEY,
              'q': 'AI',
              'page-size':200,
              'show-tags':'keyword',
              'show-fields':'body,byline,wordcount'} # No spaces between the comma-separated field names

current_page = 1
available_pages = 1

failsafe_pages = 10

all_results = []

while (current_page <= available_pages) and (current_page <= failsafe_pages):
    parameters['page'] = current_page

    response = requests.get(API_ENDPOINT, params=parameters)
    guardian_data = response.json()['response']
    results = guardian_data['results']
    all_results += results

    available_pages = guardian_data['pages']
    print(f'Collected page {current_page} of {available_pages}')
    current_page += 1
    sleep(1)

Collected page 1 of 39
Collected page 2 of 39
Collected page 3 of 39
Collected page 4 of 39
Collected page 5 of 39
Collected page 6 of 39
Collected page 7 of 39
Collected page 8 of 39
Collected page 9 of 39
Collected page 10 of 39
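
If you expect to reuse this loop, one option is to wrap it in a function. The following is only a sketch: `collect_results`, `fetch_page` and `fake_fetch` are hypothetical names introduced here, and `fetch_page(page)` is assumed to return the dictionary found under `'response'` in the API's JSON. A stand-in fetcher is used so the logic can be tried without calling the real API.

```python
from time import sleep

def collect_results(fetch_page, failsafe_pages=10, delay=1.0):
    """Collect results page by page until pages run out or the failsafe is hit.

    `fetch_page(page)` is assumed to return a dictionary with
    'results' and 'pages' keys, like the API's 'response' object.
    """
    all_results = []
    current_page = 1
    available_pages = 1  # updated once the first page has been fetched

    while current_page <= available_pages and current_page <= failsafe_pages:
        data = fetch_page(current_page)
        all_results.extend(data['results'])
        available_pages = data['pages']
        current_page += 1
        sleep(delay)  # pause between calls so we don't overload the server

    return all_results


# A stand-in for the real API: 3 pages of 2 made-up items each.
def fake_fetch(page):
    return {'pages': 3, 'results': [f'item-{page}-{i}' for i in range(2)]}

results = collect_results(fake_fetch, failsafe_pages=5, delay=0)
print(len(results))  # 6
```

With the real API, `fetch_page` could be a small function that sets `parameters['page']` and returns `requests.get(API_ENDPOINT, params=parameters).json()['response']`.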


In [52]:
df = pd.DataFrame(all_results)
df

Unnamed: 0,id,type,sectionId,sectionName,webPublicationDate,webTitle,webUrl,apiUrl,fields,tags,isHosted,pillarId,pillarName
0,technology/2023/oct/31/educators-teachers-ai-l...,article,technology,Technology,2023-10-31T10:00:39Z,‘Is this an appropriate use of AI or not?’: te...,https://www.theguardian.com/technology/2023/oc...,https://content.guardianapis.com/technology/20...,"{'byline': 'Johana Bhuiyan', 'body': '<p>In <a...","[{'id': 'technology/technology', 'type': 'keyw...",False,pillar/news,News
1,technology/ng-interactive/2023/oct/25/a-day-in...,interactive,technology,Technology,2023-10-25T13:38:11Z,A day in the life of AI,https://www.theguardian.com/technology/ng-inte...,https://content.guardianapis.com/technology/ng...,{'byline': 'Hannah Devlin Science Corresponden...,"[{'id': 'technology/artificialintelligenceai',...",False,pillar/news,News
2,technology/2023/oct/24/alphabet-q3-earnings-go...,article,technology,Technology,2023-10-24T22:07:37Z,Google Cloud revenue misses expectations despi...,https://www.theguardian.com/technology/2023/oc...,https://content.guardianapis.com/technology/20...,"{'byline': 'Kari Paul', 'body': '<p>Google is ...","[{'id': 'technology/alphabet', 'type': 'keywor...",False,pillar/news,News
3,stage/2023/sep/19/anthropology-review-hampstea...,article,stage,Stage,2023-09-19T12:02:55Z,Anthropology review – clever AI missing-person...,https://www.theguardian.com/stage/2023/sep/19/...,https://content.guardianapis.com/stage/2023/se...,"{'byline': 'Mark Lawson', 'body': '<p>While sc...","[{'id': 'stage/stage', 'type': 'keyword', 'sec...",False,pillar/arts,Arts
4,film/2023/aug/20/tim-review-clunky-ai-paranoia...,article,film,Film,2023-08-20T10:30:44Z,TIM review – clunky AI paranoia thriller,https://www.theguardian.com/film/2023/aug/20/t...,https://content.guardianapis.com/film/2023/aug...,"{'byline': 'Wendy Ide', 'body': '<p>This styli...","[{'id': 'film/thriller', 'type': 'keyword', 's...",False,pillar/arts,Arts
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,world/2020/dec/07/mohsen-fakhrizadeh-iran-says...,article,world,World news,2020-12-07T15:28:50Z,Iran says AI and ‘satellite-controlled’ gun us...,https://www.theguardian.com/world/2020/dec/07/...,https://content.guardianapis.com/world/2020/de...,{'byline': 'Patrick Wintour Diplomatic editor'...,"[{'id': 'world/iran', 'type': 'keyword', 'sect...",False,pillar/news,News
1996,world/2023/jun/16/first-edition-boris-johnson-...,article,world,World news,2023-06-16T05:36:59Z,Friday briefing: The 106-page report that says...,https://www.theguardian.com/world/2023/jun/16/...,https://content.guardianapis.com/world/2023/ju...,"{'byline': 'Nimo Omer', 'body': '<p>Good morni...",[],False,pillar/news,News
1997,world/2023/jul/27/thursday-briefing-coutts-nat...,article,world,World news,2023-07-27T05:42:18Z,Thursday briefing: What the Coutts/Farage row ...,https://www.theguardian.com/world/2023/jul/27/...,https://content.guardianapis.com/world/2023/ju...,"{'byline': 'Nimo Omer', 'body': '<p>Good morni...","[{'id': 'politics/nigel-farage', 'type': 'keyw...",False,pillar/news,News
1998,stage/2020/dec/04/a-festival-of-korean-dance-2...,article,stage,Stage,2020-12-04T20:30:21Z,AI and K-pop at the festival of Korean movers ...,https://www.theguardian.com/stage/2020/dec/04/...,https://content.guardianapis.com/stage/2020/de...,"{'byline': 'Lyndsey Winship', 'body': '<p>One ...","[{'id': 'stage/dance', 'type': 'keyword', 'sec...",False,pillar/arts,Arts
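
Note that the `fields` column holds a dictionary per row and the `tags` column holds a list of dictionaries. One way to pull individual values out is to flatten each item before building the DataFrame. This is a minimal sketch in plain Python: the records in `example_results` are made up to mirror the shape of the rows above, and the `flatten` helper is our own.

```python
# Made-up records shaped like the items in all_results (values are illustrative only).
example_results = [
    {'id': 'example/1',
     'webTitle': 'An example article',
     'fields': {'byline': 'A. Writer', 'wordcount': '850', 'body': '<p>...</p>'},
     'tags': [{'id': 'technology/technology', 'type': 'keyword'},
              {'id': 'world/world', 'type': 'keyword'}]},
    # Some items may come back without fields or with no tags.
    {'id': 'example/2', 'webTitle': 'No fields or tags', 'tags': []},
]

def flatten(item):
    """Turn one nested result into a flat dictionary of the values we care about."""
    fields = item.get('fields', {})
    return {
        'id': item['id'],
        'webTitle': item['webTitle'],
        'byline': fields.get('byline'),        # None if the field is missing
        'wordcount': fields.get('wordcount'),
        'tag_ids': [tag['id'] for tag in item.get('tags', [])],
    }

flat_results = [flatten(item) for item in example_results]
print(flat_results[0]['tag_ids'])  # ['technology/technology', 'world/world']
```

Passing `flat_results` to `pd.DataFrame` would then give one column per extracted value.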


In [53]:
df.to_json('my_guardian_articles.json')