### Guardian API access

To use the API fully from Jupyter, you need to register for a free developer key:

[Guardian Open Platform - Getting started](https://open-platform.theguardian.com/access/)

You can explore what is possible with the API here:

[Guardian Open Platform - explore](https://open-platform.theguardian.com/explore/)

In [28]:
# Import required libraries
import requests
import json
import re
import time

In [29]:
# Load your personal API key
with open('private/guardian_key.txt', 'r') as file:
    key = file.read().strip()
len(key)

36
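
The key-loading step above can be made a little more forgiving. The following is a minimal sketch, not part of the original notebook: the helper name `load_api_key` and the environment variable name `GUARDIAN_API_KEY` are assumptions chosen for illustration. It checks an environment variable first and falls back to the key file used above.

```python
import os
from pathlib import Path


def load_api_key(path="private/guardian_key.txt", env_var="GUARDIAN_API_KEY"):
    """Return the API key from an environment variable, else from a file.

    Both the function name and the default env_var are hypothetical.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(f"No key in ${env_var} and no file at {path}")
    # strip() removes the trailing newline most editors add to the file
    return key_file.read_text(encoding="utf-8").strip()
```

Keeping the key in a separate file or environment variable, as the notebook already does, avoids committing it alongside the code.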

In [30]:
# Build a search URL
base_url = 'https://content.guardianapis.com/'
# Alternative search strings (uncomment one to use it instead)
#search_string = '"Advance Queensland"'
#search_string = '(Queensland OR QLD) AND (Rural OR Regional)'
#search_string = 'Queensland OR QLD'
search_string = '"Queensland Government" OR "QLD Government" OR "QLD Gov"'
type_string = "article"
production_office = "aus"
from_date = "2017-01-01"

# Variant without the type filter:
#full_url = base_url+f"search?q={search_string}&production-office={production_office}&from-date={from_date}&show-fields=body&api-key={key}"
# Restrict results to type=article to keep out the "as it happens" live blogs, which cover multiple topics.
full_url = base_url+f"search?q={search_string}&type={type_string}&production-office={production_office}&from-date={from_date}&show-fields=body&api-key={key}"

print(full_url[:120])

https://content.guardianapis.com/search?q="Queensland Government" OR "QLD Government" OR "QLD Gov"&type=article&producti
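
The printed URL contains raw spaces and double quotes. `requests` generally tolerates this, but the query can also be encoded explicitly. Below is a minimal sketch using the standard library's `urlencode`; the helper name `build_search_url` and the placeholder key `"YOUR_KEY"` are assumptions for illustration, while the parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode


def build_search_url(query, api_key,
                     base_url="https://content.guardianapis.com/",
                     content_type="article",
                     production_office="aus",
                     from_date="2017-01-01"):
    """Build a search URL with every parameter percent-encoded.

    Hypothetical helper; parameter names mirror the notebook's f-string.
    """
    params = {
        "q": query,
        "type": content_type,
        "production-office": production_office,
        "from-date": from_date,
        "show-fields": "body",
        "api-key": api_key,
    }
    # urlencode turns spaces into '+' and quotes into %22
    return base_url + "search?" + urlencode(params)


example_url = build_search_url('"Queensland Government" OR "QLD Gov"', "YOUR_KEY")
print(example_url)
```

An equivalent option is to pass the same dictionary to `requests.get(url, params=params)`, which performs the encoding itself.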


In [31]:
# get data from server
server_response = requests.get(full_url)
server_data = server_response.json()
resp_data = server_data.get('response','')
if resp_data == '':
    print("ERROR obtaining results:",server_data)
else:
    print("SUCCESS!")
    print(f"{resp_data['total']} results found available in {resp_data['pages']} pages")
    print(f"{resp_data['pageSize']} results per page")
    results = resp_data.get('results',[])
    

SUCCESS!
1063 results found available in 107 pages
10 results per page
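
The check above only tests whether a `response` key exists. A slightly stricter sketch is shown here, assuming nothing about the payload beyond the keys the notebook already reads (`total`, `pages`, `pageSize`, `results`); the function name `extract_response` is hypothetical.

```python
REQUIRED_KEYS = ("total", "pages", "pageSize", "results")


def extract_response(server_data):
    """Return the 'response' dict from a decoded JSON payload, or raise.

    Hypothetical helper: it validates only the keys used in this notebook.
    """
    resp = server_data.get("response")
    if not isinstance(resp, dict):
        raise ValueError(f"No 'response' object in payload: {server_data}")
    missing = [k for k in REQUIRED_KEYS if k not in resp]
    if missing:
        raise ValueError(f"Response is missing keys: {missing}")
    return resp


# Usage with a hand-made payload shaped like the output above
sample = {"response": {"total": 1063, "pages": 107, "pageSize": 10, "results": []}}
resp = extract_response(sample)
print(f"{resp['total']} results in {resp['pages']} pages")
```

Raising an exception instead of printing makes a failed request stop the notebook at the point of failure rather than several cells later.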


In [32]:
results[1]

{'id': 'australia-news/article/2024/jun/15/queensland-government-accused-of-cowing-to-christian-lobby-on-anti-discrimination-bill',
 'type': 'article',
 'sectionId': 'australia-news',
 'sectionName': 'Australia news',
 'webPublicationDate': '2024-06-14T15:00:56Z',
 'webTitle': 'Queensland government accused of cowing to Christian Lobby on anti-discrimination bill',
 'webUrl': 'https://www.theguardian.com/australia-news/article/2024/jun/15/queensland-government-accused-of-cowing-to-christian-lobby-on-anti-discrimination-bill',
 'apiUrl': 'https://content.guardianapis.com/australia-news/article/2024/jun/15/queensland-government-accused-of-cowing-to-christian-lobby-on-anti-discrimination-bill',
 'fields': {'body': '<p>Queensland’s human rights commissioner, Scott McDougall, has said he is “deeply disappointed” and “at a loss to understand why” the state Labor government reneged on its promise to overhaul the state’s Anti-Discrimination Act.</p> <p>As <a href="https://www.theguardian.com/a

In [33]:
num_pages = resp_data['pages']
num_pages

107

In [34]:
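def articles_from_page_results(page_results):
    """Map 'title [publication date]' to the plain text of each article on a page."""
    articles = {}
    for result in page_results:
        article_date = result['webPublicationDate']
        # Append the date so that articles with identical titles get distinct keys
        article_title = result['webTitle']+f" [{article_date}]"
        # Fall back to an empty string if a result has no body field
        article_html = result.get('fields',{}).get('body','')
        # Crude tag stripping: removes anything between '<' and '>'
        article_text = re.sub(r'<.*?>','',article_html)
        articles[article_title] = article_text
    return articles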
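
The regular expression above removes tags but leaves HTML entities such as `&amp;` untouched. As an alternative, here is a minimal sketch built on the standard library's `html.parser`; the names `_TextExtractor` and `html_to_text` are assumptions for illustration and this is not the method the notebook uses.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html_fragment):
    """Return the visible text of html_fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html_fragment)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


print(html_to_text('<p>Fish &amp; chips</p> <p>A <a href="x">link</a></p>'))
```

To try it in the function above, `re.sub(r'<.*?>','',article_html)` could be swapped for `html_to_text(article_html)`.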

In [35]:
def get_all_articles_for_response(response_json,full_url):
    """Collect articles from every results page, reusing the already-fetched first page."""
    total_pages = response_json['pages']
    total_articles = response_json['total']
    print(f"Fetching {total_articles} articles from {total_pages} pages...")
    all_articles = {}
    page1_articles = articles_from_page_results(response_json['results'])
    all_articles.update(page1_articles)
    print("Added articles for page: 1")
    
    for page in range(2,total_pages+1):
        print("Getting articles from API for page:",page)
        page_response = requests.get(full_url+f"&page={page}")
        page_data = page_response.json().get('response',{})
        # An error payload (for example when a rate limit is hit) may lack these keys,
        # which would otherwise raise a KeyError on the lines below
        if 'results' not in page_data:
            print("ERROR obtaining results for page:",page,"HTTP status:",page_response.status_code)
            time.sleep(1)
            continue
        print("Processing results for page:",page_data['currentPage'])
        page_articles = articles_from_page_results(page_data['results'])
        print(f"Fetched {len(page_articles)} articles.")
        all_articles.update(page_articles)
        print("Added articles for page:",page)
        print(f"Status: {len(all_articles)} articles.")
        time.sleep(1) # make sure we're not hitting the API too hard
    
    print(f"FINISHED: Fetched {len(all_articles)} articles.")
    return all_articles
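
Occasional failures partway through a long paging run are often transient, so retrying a page can be more useful than stopping. The sketch below shows one way to do that. It is an illustration under assumptions, not the notebook's method: `fetch_pages` and its parameters are hypothetical names, and the network call is passed in as `fetch_page` so the logic can be tried without hitting the API.

```python
import time


def fetch_pages(fetch_page, total_pages, first_page=2,
                max_retries=3, delay=1.0, sleep=time.sleep):
    """Collect results from first_page..total_pages, retrying bad payloads.

    fetch_page(page) must return the decoded JSON payload for that page.
    """
    all_results = []
    for page in range(first_page, total_pages + 1):
        for attempt in range(1, max_retries + 1):
            payload = fetch_page(page)
            response = payload.get("response", {})
            if "results" in response:
                all_results.extend(response["results"])
                break
            # Wait a little longer after each failed attempt
            sleep(delay * attempt)
        else:
            raise RuntimeError(f"Page {page} failed after {max_retries} attempts")
        sleep(delay)  # pause between pages to stay gentle on the API
    return all_results


# In the notebook, fetch_page could wrap the existing request, e.g.:
# fetch_page = lambda page: requests.get(full_url + f"&page={page}").json()
```

Passing `sleep` as a parameter is purely a testing convenience; in normal use the default `time.sleep` applies.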


In [37]:
my_articles = get_all_articles_for_response(resp_data,full_url)

Fetching 1063 articles from 107 pages...
Added articles for page: 1
Getting articles from API for page: 2
Processing results for page: 2
Fetched 10 articles.
Added articles for page: 2
Status: 20 articles.
Getting articles from API for page: 3
Processing results for page: 3
Fetched 10 articles.
Added articles for page: 3
Status: 30 articles.
Getting articles from API for page: 4
Processing results for page: 4
Fetched 10 articles.
Added articles for page: 4
Status: 40 articles.
Getting articles from API for page: 5
Processing results for page: 5
Fetched 10 articles.
Added articles for page: 5
Status: 50 articles.
Getting articles from API for page: 6
Processing results for page: 6
Fetched 10 articles.
Added articles for page: 6
Status: 60 articles.
Getting articles from API for page: 7
Processing results for page: 7
Fetched 10 articles.
Added articles for page: 7
Status: 70 articles.
Getting articles from API for page: 8
Processing results for page: 8
Fetched 10 articles.
Added articles

In [38]:
print("Total Articles:",len(my_articles))
for title in my_articles:
    print(title)

Total Articles: 1063
‘Harrowing’ footage sparks calls for Queensland government to remove children from police watch houses [2024-07-18T15:00:15Z]
Queensland government accused of cowing to Christian Lobby on anti-discrimination bill [2024-06-14T15:00:56Z]
Queensland government hoses down suggestions it is considering bailout for Bonza [2024-05-10T08:26:48Z]
Queensland government accused of failing to provide adequate schooling to locked up children [2024-04-11T15:00:28Z]
‘There has to be a way’: Queensland government working to reunite Molly the magpie with family, premier says [2024-03-28T06:26:13Z]
Queensland government to moving to establish peak youth justice body as crime issues dominate [2023-11-16T07:34:45Z]
Queensland government urged to intervene after police staff in racist recordings go unpunished [2023-09-12T15:07:02Z]
Townsville mayor Troy Thompson’s military service claims under investigation by watchdog [2024-04-29T07:22:54Z]
Queensland government tells landlords to nam

In [40]:
file_path = "data/"
file_name = "qld_gov_articles.json"

# Write the title -> text dictionary to disk as JSON
with open(f"{file_path}{file_name}",'w', encoding='utf-8') as fp:
    json.dump(my_articles, fp)
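
Reading the file back is a quick way to confirm that nothing was lost when saving. The following sketch assumes the same title-to-text dictionary shape as `my_articles`; the helper names `save_articles` and `load_articles` are hypothetical, and the use of `ensure_ascii=False` and `indent=2` is a readability choice rather than something the notebook requires.

```python
import json
from pathlib import Path


def save_articles(articles, path):
    """Write the articles dictionary to path as UTF-8 JSON."""
    path = Path(path)
    # Create the target folder (e.g. data/) if it does not exist yet
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fp:
        # ensure_ascii=False keeps curly quotes readable in the file
        json.dump(articles, fp, ensure_ascii=False, indent=2)


def load_articles(path):
    """Read an articles dictionary previously written by save_articles."""
    with Path(path).open("r", encoding="utf-8") as fp:
        return json.load(fp)
```

For example, `save_articles(my_articles, "data/qld_gov_articles.json")` followed by `load_articles("data/qld_gov_articles.json") == my_articles` should evaluate to `True`.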