# Load data about articles from the Guardian Media Group API

## 1. Extract information using the Guardian Media Group API

More information about using the Guardian Media Group API can be found in the [documentation](https://open-platform.theguardian.com/documentation/).

To access the API, you need to [sign up for an API key](https://open-platform.theguardian.com/access/), which must be sent with every request.

I generated an API key and saved it in the `api.cfg` file.

Import required libraries:

In [1]:
import configparser
import json
from datetime import datetime, timedelta, date

In [2]:
import requests

In [3]:
import pandas as pd

Import API key:

In [4]:
config = configparser.ConfigParser()
config.read('api.cfg')

['api.cfg']
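
The lookup `config['API']['KEY']` used later in the query implies that `api.cfg` contains an `[API]` section with a `KEY` entry. A minimal sketch of that layout, parsed from a string instead of a file; the key value is a placeholder, not a real credential, and the `sample_` names are only for illustration:

```python
import configparser

# Assumed layout of api.cfg; the value is a placeholder, not a real key.
sample_cfg = """
[API]
KEY = your-api-key-here
"""

sample_config = configparser.ConfigParser()
sample_config.read_string(sample_cfg)

# Option names are case-insensitive in configparser, so 'KEY' and 'key' both work.
api_key = sample_config['API']['KEY']
print(api_key)
```

Keeping the key in a separate file like this means it does not have to appear in the notebook itself.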

Create the query for extracting data from the Guardian Media Group API. The query covers the period from 2022-07-01 to 2022-11-10:

In [5]:
from_date = '2022-07-01'
to_date = '2022-11-10'
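
If the upper bound should follow the current day rather than a fixed date, the `date` class imported at the top can produce a string in the same `YYYY-MM-DD` format. A small sketch; the `_dynamic` variable names and the 30-day window are illustrative assumptions, not part of the original notebook:

```python
from datetime import date, timedelta

# Hypothetical alternative to the fixed dates: a 30-day window ending today.
today = date.today()
to_date_dynamic = today.isoformat()
from_date_dynamic = (today - timedelta(days=30)).isoformat()

print(from_date_dynamic, to_date_dynamic)
```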

In [6]:
url_querry = 'https://content.guardianapis.com/search?' \
    + 'api-key=' + config['API']['KEY'] + '&' \
    + 'from-date=' + from_date + '&' \
    + 'to-date=' + to_date + '&' \
    + 'type=' + 'article' + '&' \
    + 'show-tags=keyword' + '&' \
    + 'order-by=' + 'oldest'
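
The same query string can also be assembled from a dictionary with `urllib.parse.urlencode`, which handles the `&` separators and escapes any special characters. This is a sketch of an alternative, not the notebook's method; the function name `build_search_url` and the key value `'test'` are placeholders:

```python
from urllib.parse import urlencode


def build_search_url(api_key, from_date, to_date):
    """Build the search URL with the same parameters as the concatenated string."""
    params = {
        'api-key': api_key,
        'from-date': from_date,
        'to-date': to_date,
        'type': 'article',
        'show-tags': 'keyword',
        'order-by': 'oldest',
    }
    return 'https://content.guardianapis.com/search?' + urlencode(params)


# 'test' is a placeholder key used only to show the resulting URL.
example_url = build_search_url('test', '2022-07-01', '2022-11-10')
print(example_url)
```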

Get the response from the API and parse its JSON body into a Python dictionary:

In [7]:
response = requests.get(url_querry)
data_json = response.json()

Get the number of result pages for the query:

In [8]:
data_json['response']['pages']

2766
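
The page count follows from two other fields of the response, `total` and `pageSize`: for this query the API reports 27651 results at 10 per page. A quick check of that arithmetic (the variable names are illustrative):

```python
import math

total = 27651    # 'total' reported by the API for this query
page_size = 10   # 'pageSize' reported by the API for this query

# The last page may be only partly filled, hence the ceiling.
expected_pages = math.ceil(total / page_size)
print(expected_pages)
```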

In [9]:
data_json['response']

{'status': 'ok',
 'userTier': 'developer',
 'total': 27651,
 'startIndex': 1,
 'pageSize': 10,
 'currentPage': 1,
 'pages': 2766,
 'orderBy': 'oldest',
 'results': [{'id': 'crosswords/2022/jul/01/azed-slip-2612',
   'type': 'article',
   'sectionId': 'crosswords',
   'sectionName': 'Crosswords',
   'webPublicationDate': '2022-07-01T00:00:00Z',
   'webTitle': 'Azed slip No 2,612',
   'webUrl': 'https://www.theguardian.com/crosswords/2022/jul/01/azed-slip-2612',
   'apiUrl': 'https://content.guardianapis.com/crosswords/2022/jul/01/azed-slip-2612',
   'tags': [{'id': 'crosswords/crosswords',
     'type': 'keyword',
     'sectionId': 'crosswords',
     'sectionName': 'Crosswords',
     'webTitle': 'Crosswords',
     'webUrl': 'https://www.theguardian.com/crosswords/crosswords',
     'apiUrl': 'https://content.guardianapis.com/crosswords/crosswords',
     'references': [],
     'description': 'Play the latest Guardian Crosswords online. Free quick and cryptic crossword puzzles daily for you

Convert the first page of the response to the dataframe `df_response`:

In [10]:
df_response = pd.DataFrame(data_json['response']['results'])

In [11]:
df_response.head()

Unnamed: 0,id,type,sectionId,sectionName,webPublicationDate,webTitle,webUrl,apiUrl,tags,isHosted,pillarId,pillarName
0,crosswords/2022/jul/01/azed-slip-2612,article,crosswords,Crosswords,2022-07-01T00:00:00Z,"Azed slip No 2,612",https://www.theguardian.com/crosswords/2022/ju...,https://content.guardianapis.com/crosswords/20...,"[{'id': 'crosswords/crosswords', 'type': 'keyw...",False,pillar/lifestyle,Lifestyle
1,seek-the-new-world-of-work-/2022/jul/01/should...,article,seek-the-new-world-of-work-,Seek: The new world of work,2022-07-01T00:35:35Z,Should you accept a counter-offer?,https://www.theguardian.com/seek-the-new-world...,https://content.guardianapis.com/seek-the-new-...,[],False,,
2,us-news/2022/jun/30/california-single-use-plas...,article,us-news,US news,2022-07-01T00:45:44Z,California passes first sweeping US law to red...,https://www.theguardian.com/us-news/2022/jun/3...,https://content.guardianapis.com/us-news/2022/...,"[{'id': 'us-news/us-news', 'type': 'keyword', ...",False,pillar/news,News
3,media/commentisfree/2022/jul/01/qantas-ditches...,article,media,Media,2022-07-01T02:04:54Z,Qantas ditches Sky News from airport lounges a...,https://www.theguardian.com/media/commentisfre...,https://content.guardianapis.com/media/comment...,"[{'id': 'media/australia-media', 'type': 'keyw...",False,pillar/news,News
4,media/2022/jul/01/how-the-abc-bid-farewell-to-...,article,media,Media,2022-07-01T02:57:57Z,How the ABC bid farewell to ‘famously unflappa...,https://www.theguardian.com/media/2022/jul/01/...,https://content.guardianapis.com/media/2022/ju...,[{'id': 'media/australian-broadcasting-corpora...,False,pillar/news,News


Request every remaining page, from the second to the last, and append each page's results to `df_response`:

In [12]:
count_of_pages = data_json['response']['pages']
for page in range(2, count_of_pages + 1):
    # Request the next page of the same query
    response = requests.get(url_querry + '&page=' + str(page))
    data_json = response.json()
    df_page = pd.DataFrame(data_json['response']['results'])
    # Append this page's rows to the accumulated dataframe
    df_response = pd.concat([df_response, df_page])
    print('Processed {}/{} - {:2.2%}'.format(page, count_of_pages, page / count_of_pages))

Processed 2/2766 - 0.07%
Processed 3/2766 - 0.11%
Processed 4/2766 - 0.14%
...
Processed 2629/2766 - 95.05%
Processed 2630/2766 - 95.08%
...

KeyError: 'results'

In [13]:
data_json

{'response': {'status': 'error',
  'message': 'requested page is beyond the number of available pages'}}
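
The `KeyError` arises because the error payload has no `results` key: the loop indexes `data_json['response']['results']` without checking `status` first. One way to let the loop stop cleanly is sketched below. The names `collect_pages` and `fetch_page` are hypothetical; `fetch_page` stands for any callable that returns the parsed JSON for a page number (for example, a wrapper around `requests.get(...).json()`), and here it is replaced by a stub so that the sketch runs without network access.

```python
def collect_pages(fetch_page, count_of_pages):
    """Collect 'results' from pages 1..count_of_pages, stopping at the first error payload."""
    records = []
    for page in range(1, count_of_pages + 1):
        payload = fetch_page(page)['response']
        if payload.get('status') != 'ok':
            # e.g. 'requested page is beyond the number of available pages'
            print('Stopped at page {}: {}'.format(page, payload.get('message')))
            break
        records.extend(payload['results'])
    return records


def fake_fetch_page(page):
    """Stub standing in for the real API call: two valid pages, then an error."""
    if page <= 2:
        return {'response': {'status': 'ok',
                             'results': [{'id': 'article-{}'.format(page)}]}}
    return {'response': {'status': 'error',
                         'message': 'requested page is beyond the number of available pages'}}


records = collect_pages(fake_fetch_page, 5)
print(len(records))
```

The resulting list of dictionaries can be passed to `pd.DataFrame(records)` once at the end, which also avoids concatenating a growing dataframe on every iteration.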

In [14]:
df_response = df_response.reset_index()

In [15]:
df_response = df_response.drop(columns=['index'])

In [16]:
df_response['id_num'] = range(1, len(df_response) + 1)
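
Concatenating the per-page dataframes keeps each page's own index, so index labels repeat across pages; that is why the index is rebuilt before `id_num` is assigned. The reset and drop steps can also be combined with `reset_index(drop=True)`, sketched here on a toy frame (the sample data is made up for illustration):

```python
import pandas as pd

# Two tiny "pages", each with its own 0-based index.
page_1 = pd.DataFrame({'id': ['a', 'b']})
page_2 = pd.DataFrame({'id': ['c']})

combined = pd.concat([page_1, page_2])       # index labels are 0, 1, 0
combined = combined.reset_index(drop=True)   # index labels become 0, 1, 2

# Sequential article number starting at 1, as in the notebook.
combined['id_num'] = range(1, len(combined) + 1)
print(combined)
```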

## 2. Save general information about articles

In [17]:
df_article = df_response[['id_num',
               'id', 
               'type', 
               'sectionId',
               'sectionName',
               'webPublicationDate',
               'webTitle',
               'isHosted',
               'pillarId',
               'pillarName'
              ]]

In [18]:
df_article.head()

Unnamed: 0,id_num,id,type,sectionId,sectionName,webPublicationDate,webTitle,isHosted,pillarId,pillarName
0,1,crosswords/2022/jul/01/azed-slip-2612,article,crosswords,Crosswords,2022-07-01T00:00:00Z,"Azed slip No 2,612",False,pillar/lifestyle,Lifestyle
1,2,seek-the-new-world-of-work-/2022/jul/01/should...,article,seek-the-new-world-of-work-,Seek: The new world of work,2022-07-01T00:35:35Z,Should you accept a counter-offer?,False,,
2,3,us-news/2022/jun/30/california-single-use-plas...,article,us-news,US news,2022-07-01T00:45:44Z,California passes first sweeping US law to red...,False,pillar/news,News
3,4,media/commentisfree/2022/jul/01/qantas-ditches...,article,media,Media,2022-07-01T02:04:54Z,Qantas ditches Sky News from airport lounges a...,False,pillar/news,News
4,5,media/2022/jul/01/how-the-abc-bid-farewell-to-...,article,media,Media,2022-07-01T02:57:57Z,How the ABC bid farewell to ‘famously unflappa...,False,pillar/news,News


In [19]:
name_csv = 'guardian-article-data_start-date-' + from_date + '_end-date-' + to_date + '_general.csv'

In [20]:
df_article.to_csv(name_csv,index=False,sep=';')
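
Because the file is written with `sep=';'`, it has to be read back with the same separator. A short round-trip sketch using an in-memory buffer instead of a file; the one-row sample reuses values from the first article shown earlier, and parsing `webPublicationDate` as a date is an optional extra, not something the notebook does:

```python
import io

import pandas as pd

sample = pd.DataFrame({
    'id_num': [1],
    'webPublicationDate': ['2022-07-01T00:00:00Z'],
    'webTitle': ['Azed slip No 2,612'],
})

# Write with the same options as the notebook, but into a buffer.
buffer = io.StringIO()
sample.to_csv(buffer, index=False, sep=';')
buffer.seek(0)

# Read back with the matching separator and parse the timestamp column.
restored = pd.read_csv(buffer, sep=';', parse_dates=['webPublicationDate'])
print(restored.dtypes)
```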

## 3. Save tag information

In [21]:
df_tags = df_response[['id_num',
               'id',
               'tags'
              ]]

In [22]:
df_tags.head()

Unnamed: 0,id_num,id,tags
0,1,crosswords/2022/jul/01/azed-slip-2612,"[{'id': 'crosswords/crosswords', 'type': 'keyw..."
1,2,seek-the-new-world-of-work-/2022/jul/01/should...,[]
2,3,us-news/2022/jun/30/california-single-use-plas...,"[{'id': 'us-news/us-news', 'type': 'keyword', ..."
3,4,media/commentisfree/2022/jul/01/qantas-ditches...,"[{'id': 'media/australia-media', 'type': 'keyw..."
4,5,media/2022/jul/01/how-the-abc-bid-farewell-to-...,[{'id': 'media/australian-broadcasting-corpora...


In [23]:
pd.DataFrame(df_tags.iloc[3]['tags'])

Unnamed: 0,id,type,sectionId,sectionName,webTitle,webUrl,apiUrl,references,description
0,media/australia-media,keyword,media,Media,Australian media,https://www.theguardian.com/media/australia-media,https://content.guardianapis.com/media/austral...,[],Latest news and analysis on Australian media
1,australia-news/australia-news,keyword,australia-news,Australia news,Australia news,https://www.theguardian.com/australia-news/aus...,https://content.guardianapis.com/australia-new...,[],
2,media/news-corporation,keyword,media,Media,News Corporation,https://www.theguardian.com/media/news-corpora...,https://content.guardianapis.com/media/news-co...,[],
3,media/sky-news-australia,keyword,media,Media,Sky News Australia,https://www.theguardian.com/media/sky-news-aus...,https://content.guardianapis.com/media/sky-new...,[],
4,business/qantas,keyword,business,Business,Qantas,https://www.theguardian.com/business/qantas,https://content.guardianapis.com/business/qantas,[],Latest news about airline company Qantas from ...
5,media/australian-broadcasting-corporation,keyword,media,Media,Australian Broadcasting Corporation,https://www.theguardian.com/media/australian-b...,https://content.guardianapis.com/media/austral...,[],Latest news and analysis on the Australian Bro...
6,media/media,keyword,media,Media,Media,https://www.theguardian.com/media/media,https://content.guardianapis.com/media/media,[],
7,media/channel-nine,keyword,media,Media,Nine Entertainment,https://www.theguardian.com/media/channel-nine,https://content.guardianapis.com/media/channel...,[],
8,media/channel-ten,keyword,media,Media,Channel Ten,https://www.theguardian.com/media/channel-ten,https://content.guardianapis.com/media/channel...,[],
9,tv-and-radio/neighbours,keyword,tv-and-radio,Television & radio,Neighbours,https://www.theguardian.com/tv-and-radio/neigh...,https://content.guardianapis.com/tv-and-radio/...,[],


In [24]:
name_json = 'guardian-article-data_start-date-' + from_date + '_end-date-' + to_date + '_pre-tags.json'

In [25]:
df_tags.to_json(name_json,orient="records")

In [26]:
with open(name_json, 'r') as f:
    data = json.load(f)

In [27]:
df_tag = pd.json_normalize(data, 
                           record_path =['tags'], 
                           meta =['id_num','id'],
                           record_prefix='tag_')
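
To make the effect of `record_path`, `meta` and `record_prefix` concrete, here is the same call on a small hand-made list of records (the sample values are invented for illustration). Each tag becomes one row, the article's `id_num` and `id` are repeated alongside it, and articles with an empty `tags` list produce no rows:

```python
import pandas as pd

sample_records = [
    {'id_num': 1, 'id': 'article-a', 'tags': [{'id': 'tag-1', 'webTitle': 'Tag one'}]},
    {'id_num': 2, 'id': 'article-b', 'tags': []},
    {'id_num': 3, 'id': 'article-c', 'tags': [{'id': 'tag-2', 'webTitle': 'Tag two'},
                                               {'id': 'tag-3', 'webTitle': 'Tag three'}]},
]

# The prefix keeps the tag's own 'id' (tag_id) apart from the article's 'id'.
flat = pd.json_normalize(sample_records,
                         record_path=['tags'],
                         meta=['id_num', 'id'],
                         record_prefix='tag_')
print(flat)
```

The notebook obtains its list of records by writing the dataframe to a JSON file and reading it back; `df_tags.to_dict(orient='records')` would likely give an equivalent in-memory list, though that variant was not run here.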

In [28]:
df_tag = df_tag[['id_num',
               'id',
               'tag_id',
               'tag_sectionName',
               'tag_webTitle',
               'tag_description'
              ]]

In [29]:
df_tag.head()

Unnamed: 0,id_num,id,tag_id,tag_sectionName,tag_webTitle,tag_description
0,1,crosswords/2022/jul/01/azed-slip-2612,crosswords/crosswords,Crosswords,Crosswords,Play the latest Guardian Crosswords online. Fr...
1,3,us-news/2022/jun/30/california-single-use-plas...,us-news/us-news,US news,US news,
2,3,us-news/2022/jun/30/california-single-use-plas...,us-news/california,US news,California,
3,3,us-news/2022/jun/30/california-single-use-plas...,environment/plastic,Environment,Plastics,
4,3,us-news/2022/jun/30/california-single-use-plas...,environment/plastic-bags,Environment,Plastic bags,


In [30]:
name_csv = 'guardian-article-data_start-date-' + from_date + '_end-date-' + to_date + '_tags.csv'

In [31]:
df_tag.to_csv(name_csv,index=False,sep=';')

In [32]:
df_tag[df_tag['tag_webTitle'] == 'Justin Trudeau']

Unnamed: 0,id_num,id,tag_id,tag_sectionName,tag_webTitle,tag_description
61252,10145,world/2022/aug/19/world-leaders-dancefloor-vid...,world/justin-trudeau,World news,Justin Trudeau,The latest news and comment on Justin Trudeau
61645,10209,world/2022/aug/19/indigenous-woman-canada-supr...,world/justin-trudeau,World news,Justin Trudeau,The latest news and comment on Justin Trudeau
65113,10779,world/2022/aug/22/germany-chancellor-visits-ca...,world/justin-trudeau,World news,Justin Trudeau,The latest news and comment on Justin Trudeau
86337,14303,world/2022/sep/08/canada-queen-elizabeth-death...,world/justin-trudeau,World news,Justin Trudeau,The latest news and comment on Justin Trudeau
86353,14306,uk-news/2022/sep/08/world-leaders-pay-tribute-...,world/justin-trudeau,World news,Justin Trudeau,The latest news and comment on Justin Trudeau
93008,15448,world/2022/sep/14/canada-queen-elizabeth-feder...,world/justin-trudeau,World news,Justin Trudeau,The latest news and comment on Justin Trudeau
95213,15837,politics/2022/sep/16/liz-truss-to-meet-joe-bid...,world/justin-trudeau,World news,Justin Trudeau,The latest news and comment on Justin Trudeau
99764,16630,politics/2022/sep/20/justin-trudeau-team-defen...,world/justin-trudeau,World news,Justin Trudeau,The latest news and comment on Justin Trudeau
121887,20183,world/2022/oct/06/trudeau-ice-hockey-canada-se...,world/justin-trudeau,World news,Justin Trudeau,The latest news and comment on Justin Trudeau
157467,25994,world/2022/nov/03/ontario-doug-ford-strike-fin...,world/justin-trudeau,World news,Justin Trudeau,The latest news and comment on Justin Trudeau
