# New York Times API
## Teng, Lance Ricco L.

This notebook uses the `pynytimes` library to query the New York Times API and collect all articles published on March 11 and 12, 2021.

### Install libraries

In [1]:
%pip install pynytimes
%pip install python-dotenv

Note: you may need to restart the kernel to use updated packages.


### Import libraries

In [2]:
import os # To read the API key from the environment

from dotenv import load_dotenv # To load the values found in the .env file
load_dotenv()

from pynytimes import NYTAPI
from datetime import datetime
import pandas as pd

### Initialize API

In [3]:
nyt = NYTAPI(os.environ['API_KEY'], parse_dates=True) # Initialize the API with the key found in the .env file
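
If `API_KEY` is missing from the `.env` file, `os.environ['API_KEY']` fails with a bare `KeyError`. The sketch below shows one way to fail with a clearer message. The helper name `require_env` is hypothetical and is not part of `pynytimes` or `python-dotenv`.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise a descriptive error."""
    value = os.environ.get(name)
    if not value:
        # Covers both a missing variable and one set to an empty string.
        raise RuntimeError(
            f"Environment variable {name!r} is not set; add it to the .env file."
        )
    return value
```

With this helper, the key could be read as `require_env('API_KEY')` after `load_dotenv()` has run.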

Get all articles from the month of March 2021. The metadata returned by the archive function includes the following fields:
- date (`pub_date`)
- title (`headline`)
- author (`byline`)
- lead paragraph (`lead_paragraph`)

With that, we can get all the data needed to output to the `.json` file.


In [4]:
archive = nyt.archive_metadata(
    date = datetime(2021, 3, 11) 
    # While we include 11 as the date, the API gets articles from the entire month of March 2021. This operation will subsequently take a while to execute.
)

In [5]:
archive_df = pd.DataFrame(archive) # Convert the resulting list of records to a DataFrame for easier access.

In [6]:
archive_df.head()

Unnamed: 0,abstract,web_url,snippet,lead_paragraph,source,multimedia,headline,keywords,pub_date,document_type,news_desk,section_name,byline,type_of_material,_id,word_count,uri,print_section,print_page,subsection_name
0,A quick guide to watching the ceremony.,https://www.nytimes.com/2021/02/28/movies/how-...,A quick guide to watching the ceremony.,When the 78th annual Golden Globes are handed ...,The New York Times,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",{'main': 'How to watch the Golden Globes on TV...,[],2021-03-01 00:01:56+00:00,article,Culture,Movies,"{'original': 'By Sarah Bahr', 'person': [{'fir...",News,nyt://article/f1320567-f5d3-53f1-a2ee-ab273a7c...,120,nyt://article/f1320567-f5d3-53f1-a2ee-ab273a7c...,,,
1,"The Hollywood Foreign Press Association, which...",https://www.nytimes.com/2021/02/28/movies/what...,,"The Hollywood Foreign Press Association, which...",The New York Times,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",{'main': 'What will win tonight? Here are our ...,[],2021-03-01 00:01:57+00:00,article,Culture,Movies,"{'original': '', 'person': [], 'organization':...",News,nyt://article/3fa9372d-b7fa-5167-9fa9-2256ded1...,268,nyt://article/3fa9372d-b7fa-5167-9fa9-2256ded1...,,,
2,Daniel Kaluuya won for his performance as the ...,https://www.nytimes.com/2021/02/28/movies/dani...,Daniel Kaluuya won for his performance as the ...,As soon as nominations were announced on Feb. ...,The New York Times,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",{'main': 'Daniel Kaluuya and Chadwick Boseman ...,"[{'name': 'subject', 'value': 'Black People', ...",2021-03-01 00:02:02+00:00,article,Culture,Movies,{'original': 'By Brooks Barnes and Nicole Sper...,News,nyt://article/e42fc97e-9363-5d70-b0d3-b3263980...,408,nyt://article/e42fc97e-9363-5d70-b0d3-b3263980...,,,
3,Gov. Andrew Cuomo issued a statement on Sunday...,https://www.nytimes.com/2021/02/28/nyregion/fu...,Gov. Andrew Cuomo issued a statement on Sunday...,STATEMENT FROM GOVERNOR ANDREW M. CUOMO,The New York Times,[],{'main': 'Full Text of Cuomo’s Statement in Re...,"[{'name': 'persons', 'value': 'Cuomo, Andrew M...",2021-03-01 00:07:43+00:00,article,Metro,New York,"{'original': '', 'person': [], 'organization':...",News,nyt://article/77a499f5-ce4a-5eeb-9ca2-c9ad7981...,279,nyt://article/77a499f5-ce4a-5eeb-9ca2-c9ad7981...,A,17.0,
4,Discovery’s new app has taken off largely beca...,https://www.nytimes.com/2021/02/28/business/me...,Discovery’s new app has taken off largely beca...,"“Ninety Day Fiancé” is, on some Sunday nights,...",The New York Times,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",{'main': 'Forget ‘Succession.’ You Can Watch ‘...,"[{'name': 'subject', 'value': 'Mobile Applicat...",2021-03-01 00:15:07+00:00,article,Business,Business Day,"{'original': 'By Ben Smith', 'person': [{'firs...",News,nyt://article/c3d48cdf-ca57-5904-b421-dc7dabf3...,1795,nyt://article/c3d48cdf-ca57-5904-b421-dc7dabf3...,B,1.0,Media


Now we can filter the dataset to include *all articles from March 11, 00:00 to March 12, 23:59*, as shown below.

In [7]:
filtered_df = archive_df[(archive_df['pub_date'] >= '2021-03-11') & (archive_df['pub_date'] < '2021-03-13') & (archive_df['document_type'] == 'article')][['pub_date', 'lead_paragraph', 'headline', 'byline']]

The above code also filters the columns to only include the specified attributes.
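
The date window is easiest to reason about as a half-open interval: include everything from the start of March 11 and stop just before the start of March 13. The sketch below illustrates this on a small hand-made DataFrame. The rows are made-up samples rather than API output, and the explicit UTC `Timestamp` bounds are an assumption made for clarity, not what the notebook itself uses.

```python
import pandas as pd

# Hypothetical sample rows that straddle both edges of the window.
sample_df = pd.DataFrame({
    "pub_date": pd.to_datetime(
        [
            "2021-03-10 23:59:59+00:00",  # just before the window
            "2021-03-11 00:00:00+00:00",  # first instant of the window
            "2021-03-12 12:00:00+00:00",  # inside the window, but not an article
            "2021-03-12 23:59:59+00:00",  # last second of the window
            "2021-03-13 00:00:00+00:00",  # first instant after the window
        ],
        utc=True,
    ),
    "document_type": ["article", "article", "multimedia", "article", "article"],
})

start = pd.Timestamp("2021-03-11", tz="UTC")  # inclusive lower bound
end = pd.Timestamp("2021-03-13", tz="UTC")    # exclusive upper bound

mask = (
    (sample_df["pub_date"] >= start)
    & (sample_df["pub_date"] < end)
    & (sample_df["document_type"] == "article")
)
window_df = sample_df.loc[mask]
```

Only the rows at index 1 and 3 survive: the two boundary rows outside the window and the non-article row are dropped.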

In [8]:
filtered_df

Unnamed: 0,pub_date,lead_paragraph,headline,byline
1577,2021-03-11 00:16:03+00:00,"Although first published in 2017 in Poland, Ol...","{'main': 'A Child’s-Eye View, Both Haunted and...","{'original': 'By Hillary Chute', 'person': [{'..."
1579,2021-03-11 00:20:47+00:00,WASHINGTON — The Biden administration publishe...,{'main': 'U.S. Releases New Covid-19 Guidance ...,"{'original': 'By Noah Weiland', 'person': [{'f..."
1580,2021-03-11 00:34:58+00:00,WASHINGTON — The Biden administration took ste...,"{'main': 'Facing Pressure, Biden Administratio...",{'original': 'By Zolan Kanno-Youngs and Catie ...
1581,2021-03-11 00:36:15+00:00,"After an absence of 16 years, the National Hoc...",{'main': 'N.H.L. Returns to ESPN in a 7-Year D...,"{'original': 'By Kevin Draper', 'person': [{'f..."
1582,2021-03-11 00:50:52+00:00,The first episode of “The Test Kitchen” was wi...,{'main': 'What Really Happened at ‘Reply All’?...,{'original': 'By Katherine Rosman and Reggie U...
...,...,...,...,...
1964,2021-03-12 23:09:10+00:00,WASHINGTON — A last-minute change in the $1.9 ...,{'main': 'A Last-Minute Add to Stimulus Bill C...,"{'original': 'By Alan Rappeport', 'person': [{..."
1965,2021-03-12 23:12:19+00:00,So much of this newsletter can be doom and glo...,{'main': 'Coronavirus Briefing: What Happened ...,"{'original': 'By Jonathan Wolfe', 'person': [{..."
1966,2021-03-12 23:25:01+00:00,(Want to get this newsletter in your inbox? He...,"{'main': 'Andrew Cuomo, Global Vaccines, Dayli...","{'original': 'By Whet Moser, Amelia Nierenberg..."
1967,2021-03-12 23:26:22+00:00,"Bolivia’s former interim president, Jeanine Añ...",{'main': 'Former Bolivian Leader Is Arrested f...,"{'original': 'By Julie Turkewitz', 'person': [..."


In [50]:
test = filtered_df['byline'][1577]['original'][3:]
test

'Hillary Chute'

### Format data into the object structure given in the specifications

In [59]:
nyt_json = []

In [60]:
for index, row in filtered_df.iterrows():
    article = {}
    article['date'] = str(row['pub_date'])
    article['lead_paragraph'] = row['lead_paragraph']
    article['title'] = row['headline']['main']
    article['authors'] = row['byline']['original'][3:]
    # Some of the data in `person` is missing, so instead `original` is used to get the author
    # We begin from character index 3 to remove "By " from the byline    
    nyt_json.append(article)
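
Slicing with `[3:]` assumes every byline starts with "By ". Some rows in the archive have an empty `original` value, and a missing value would raise an error. A more defensive variant is sketched below. The helper name `author_from_byline` is hypothetical, and treating a missing byline as an empty string is an assumption about the desired output.

```python
def author_from_byline(byline):
    """Extract the author string from a byline dict, tolerating missing values."""
    # Treat a missing dict or a missing/None 'original' entry as an empty string.
    original = ((byline or {}).get("original") or "").strip()
    # Remove the leading "By " only when it is actually present.
    if original.lower().startswith("by "):
        original = original[3:]
    return original
```

Inside the loop, this could replace the slice as `article['authors'] = author_from_byline(row['byline'])`.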

In [61]:
nyt_json

0',
  'lead_paragraph': 'JOHANNESBURG — Goodwill Zwelithini ka Bhekuzulu, the king of South Africa’s Zulu nation, who shepherded his people from the apartheid era into a modern democratic society, died on Friday in the eastern coastal city of Durban. He was 72.',
  'title': 'Goodwill Zwelithini ka Bhekuzulu, King of the Zulu Nation, Dies at 72',
  'authors': 'Lynsey Chutel'},
 {'date': '2021-03-12 18:23:13+00:00',
  'lead_paragraph': 'For all its challenges to mental health, the past year also put psychological science to the test, and in particular one of its most consoling truths: that age and emotional well-being tend to increase together, as a rule, even as mental acuity and physical health taper off.',
  'title': 'In the last year, older adults tended to be more positive than younger ones, surveys show.',
  'authors': 'Benedict Carey'},
 {'date': '2021-03-12 18:28:02+00:00',
  'lead_paragraph': 'This article is part of the On Tech newsletter. You can sign up here to receive it wee

### Save as `.json` file

In [62]:
import json
import os

filepath = '' # Output directory; an empty string saves to the current working directory

In [63]:
with open(os.path.join(filepath, 'nyt.json'), 'w', encoding='utf-8') as f:
    json.dump(nyt_json, f, ensure_ascii=False, indent=4)
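
To check that the file was written as intended, it can be read back and compared with the in-memory list. The sketch below does this with a temporary directory and a single made-up record, so it does not touch the real `nyt.json`. The record contents and variable names are illustrative assumptions.

```python
import json
import os
import tempfile

# Hypothetical record with a non-ASCII character, similar to NYT headlines.
sample_records = [
    {
        "date": "2021-03-11 00:16:03+00:00",
        "lead_paragraph": "Sample lead paragraph.",
        "title": "Sample headline with a curly ’ apostrophe",
        "authors": "Hillary Chute",
    }
]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "nyt.json")

    # Same dump settings as the notebook cell above.
    with open(sample_path, "w", encoding="utf-8") as f:
        json.dump(sample_records, f, ensure_ascii=False, indent=4)

    # Read the raw text and the parsed records back for comparison.
    with open(sample_path, encoding="utf-8") as f:
        raw_text = f.read()
    loaded_records = json.loads(raw_text)
```

Because `ensure_ascii=False` is set, the curly apostrophe is stored as-is in the file instead of as a `\u2019` escape, which keeps the output readable.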