# New York Times API Scraping Python Code
## How the media viewed Jewish institutions' responses to COVID-19

From Daniel A.:
"A colleague and I are trying to study how the media viewed Jewish institutions' responses to COVID19. We're interested in articles that use words like Jewish, minyan, COVID19, and/or institution. We've been doing interviews with the leaders of a large number of synagogues and are also putting together a quantitative dataset on changes in their policies during COVID19, and we hope that media sources will tell us about how society has viewed their choices."

In [1]:
# This code block imports the libraries we need for querying the API, working with
# the results, and outputting data.

import requests # HTTP library for calling the API
import json # The API returns JSON, and this module lets us parse it
import math # Standard math functions (e.g. math.ceil for rounding up)
import matplotlib.pyplot as plt # Data visualization package
import time # Lets us pause between requests so we stay under the NYT API rate limit
import pandas # pandas DataFrames work much like spreadsheets in Python
import csv # Lets us output data as a .csv file

In [2]:
import os
key = os.environ['NYT_API_KEY'] # Our NYT API key, read from the NYT_API_KEY environment variable so it is never stored in the notebook. Do not distribute it.

In [3]:
# We're using the NYT Article Search API; documentation here:
# https://developer.nytimes.com/docs/articlesearch-product/1/overview

base_url = "https://api.nytimes.com/svc/search/v2/articlesearch.json"
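Because `requests.get(base_url, params=...)` turns the parameter dictionary into a query string, it can help to see that URL before spending a request. The sketch below builds it offline with the standard library's `urllib.parse.urlencode`, which `requests` uses internally for dictionaries, so the result should match closely. `build_search_url` is a hypothetical helper and `'YOUR_KEY'` is a placeholder, not a real key.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"

def build_search_url(params):
    """Return the full GET URL for a dict of Article Search parameters (hypothetical helper)."""
    # urlencode percent-encodes special characters and joins key=value pairs with '&'
    return BASE_URL + "?" + urlencode(params)

url = build_search_url({'q': 'jewish',
                        'api-key': 'YOUR_KEY',  # placeholder, not a real key
                        'begin_date': '20200301',
                        'fq': 'headline:("covid")'})
print(url)
```

Printing the URL is a cheap way to check that quotes and parentheses in `fq` are encoded as expected.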

In [173]:
# These are the parameters we're using for our searches

search_params = {'q': 'jewish', # The keyword we're searching for
                 'api-key': key, # The API key above as a variable
                 'begin_date': '20200301', # The beginning date of our query (YYYYMMDD)
                 'fq': 'headline : ("covid")'} # Filter query: only articles whose headline mentions "covid"
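The `fq` value is a filter query; the NYT documentation describes it as standard Lucene syntax, with examples written as `field:("value")`. A pair of small helpers can build these strings consistently instead of typing them by hand. `field_filter` and `all_of` are hypothetical names, and the sketch assumes the field names used in this notebook (`headline`, `body`) are valid filter fields.

```python
def field_filter(field, value):
    """Build a Lucene-style filter such as headline:("covid") (hypothetical helper)."""
    return f'{field}:("{value}")'

def all_of(*filters):
    """Require every filter to match by joining them with AND (hypothetical helper)."""
    return " AND ".join(filters)

headline_covid = field_filter("headline", "covid")
institution_and_covid = all_of(field_filter("body", "institution"),
                               field_filter("body", "covid"))
print(headline_covid)
print(institution_and_covid)
```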

In [174]:
# Send the search request; requests encodes search_params into the URL's query string.

r = requests.get(base_url, params=search_params)

In [175]:
# Parse the search results and show the total number of hits.

response_text = r.text # The raw body of the response, as a string
data = json.loads(response_text) # Parse the JSON text into a Python dictionary
data['response']['meta']['hits'] # Total number of matching articles

20
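The indexing `data['response']['meta']['hits']` mirrors how the response is nested. The example below parses a made-up, heavily trimmed response that contains only the keys this notebook reads (`response`, `meta`, `hits`, `docs`, `_id`, `headline`); a real response carries many more fields.

```python
import json

# A made-up, heavily trimmed response with the same nesting the notebook reads:
# response -> meta -> hits, and response -> docs -> list of articles
sample_text = '''{"response": {"meta": {"hits": 20},
                               "docs": [{"_id": "nyt://article/example",
                                         "headline": {"main": "Example headline"}}]}}'''

parsed = json.loads(sample_text)            # same result as r.json() on a live response
hits = parsed['response']['meta']['hits']   # total matches across all pages
docs = parsed['response']['docs']           # the articles on this one page
print(hits, len(docs), docs[0]['headline']['main'])
```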

In [176]:
# Calculate the number of pages (the NYT API returns 10 results per page), rounding up
# so that a final partial page is not skipped.

maxpages = math.ceil(data['response']['meta']['hits'] / 10)
maxpages

2
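The page count has to round up: 21 hits need three pages (10 + 10 + 1). A formula like `round((hits + 0.5) / 10)` rounds to the nearest integer instead, so it drops the last partial page whenever the final digit of `hits` is 1 through 4. The sketch compares the two; both function names are hypothetical.

```python
import math

def pages_round(hits):
    """Nearest-integer formula: undercounts when hits ends in 1-4."""
    return round((hits + 0.5) / 10)

def pages_ceil(hits):
    """Pages needed at 10 results per page: divide and round up."""
    return math.ceil(hits / 10)

for hits in (0, 9, 20, 21, 25):
    print(hits, pages_round(hits), pages_ceil(hits))
```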

In [177]:
# Next we download the articles themselves.
# This code block automates the download by requesting each page in turn.

data = [] # An empty list to hold the parsed results for each page
for page in range(maxpages): # Pages are numbered from 0
    search_params = {'q': 'jewish',
                     'api-key': key,
                     'begin_date': '20200301',
                     'page': page,
                     'fq': 'headline : ("covid")'} # Same search parameters as above, plus the page number
    r = requests.get(base_url, params=search_params) # Send the request
    r.raise_for_status() # Stop with an error if the request failed (e.g. rate limiting) rather than saving the error
    print("Page Number:", page, "OK!") # Printed only after a successful request
    data.append(r.json()) # Parse the JSON response and add it to our list
    time.sleep(6) # NYT allows 10 requests per minute, so wait 6 seconds between calls

Page Number: 0 OK!
Page Number: 1 OK!
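When requests arrive too quickly, the API rejects them. Rather than failing outright, a request can be retried after a pause. Below is a minimal retry sketch, assuming the API signals rate limiting with HTTP status 429 (Too Many Requests), the standard code for this. `get_with_retry` is a hypothetical helper; it takes the request as a zero-argument function so it can be demonstrated without network access.

```python
import time
from types import SimpleNamespace

def get_with_retry(send, max_tries=3, wait=6.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or max_tries is reached (hypothetical helper).

    send is any zero-argument function returning an object with a .status_code attribute.
    """
    for attempt in range(1, max_tries + 1):
        response = send()
        if response.status_code != 429 or attempt == max_tries:
            return response          # success, another error, or out of tries: let the caller decide
        sleep(wait * attempt)        # back off a little longer after each rejection

# Offline demonstration with fake responses: two 429s, then success
codes = iter([429, 429, 200])
waits = []
demo = get_with_retry(lambda: SimpleNamespace(status_code=next(codes)), sleep=waits.append)
print(demo.status_code, waits)
```

In the loop above it would wrap the request as `r = get_with_retry(lambda: requests.get(base_url, params=search_params))`.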


In [178]:
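# Now we flatten the pages into one list of articles and load it into a pandas DataFrame to view the results.

data2 = [] # A new empty list for the individual articles
for i in data:
    data2.extend(i['response']['docs']) # Each page holds up to 10 articles under ['response']['docs']
jc = pandas.DataFrame(data2)
jc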

Unnamed: 0,_id,abstract,byline,document_type,headline,keywords,lead_paragraph,multimedia,news_desk,print_page,print_section,pub_date,section_name,snippet,source,subsection_name,type_of_material,uri,web_url,word_count
0,nyt://article/bd3c3b53-61d4-55c6-92a6-ab79590a...,The city needs to do a better job at getting i...,"{'original': 'By Ginia Bellafante', 'person': ...",article,{'main': 'When Covid Flared Again in Orthodox ...,"[{'name': 'subject', 'value': 'Coronavirus (20...",“DO NOT test your child for Covid.”,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",Metropolitan,3.0,MB,2020-10-05T16:54:24+0000,New York,The city needs to do a better job at getting i...,The New York Times,,News,nyt://article/bd3c3b53-61d4-55c6-92a6-ab79590a...,https://www.nytimes.com/2020/10/05/nyregion/or...,1461
1,nyt://article/adb4aaf4-702b-58d4-aa8b-2d4c29f6...,"“This was amazingly irresponsible,” Mayor Bill...","{'original': 'By Liam Stack', 'person': [{'fir...",article,"{'main': '$15,000 Fine After Secret Hasidic We...","[{'name': 'subject', 'value': 'Hasidism', 'ran...",Thousands of ultra-Orthodox Jews gathered to c...,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",Metro,8.0,A,2020-11-24T18:08:42+0000,New York,"“This was amazingly irresponsible,” Mayor Bill...",The New York Times,,News,nyt://article/adb4aaf4-702b-58d4-aa8b-2d4c29f6...,https://www.nytimes.com/2020/11/24/nyregion/wi...,871
2,nyt://article/ca2d0898-59ad-59e5-b922-4d7da5ce...,An octopus guiding a ship. Trump perched on a ...,"{'original': 'By Amanda Rosa', 'person': [{'fi...",article,"{'main': 'Unpacking Cuomo’s Covid-19 Poster', ...","[{'name': 'glocations', 'value': 'New York Cit...",[Want to get New York Today by email? Here’s ...,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",Metro,,,2020-07-16T09:41:54+0000,New York,An octopus guiding a ship. Trump perched on a ...,The New York Times,,briefing,nyt://article/ca2d0898-59ad-59e5-b922-4d7da5ce...,https://www.nytimes.com/2020/07/16/nyregion/an...,1231
3,nyt://article/821f2759-8123-5e83-8f9d-373428d1...,President Trump acknowledged that an authorize...,"{'original': None, 'person': [], 'organization...",article,"{'main': 'After Criticism, C.D.C. Reverses Cov...","[{'name': 'subject', 'value': 'Coronavirus (20...",[Live Trump tests positive for Covid-19 update...,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",Foreign,,,2020-09-18T09:07:46+0000,World,President Trump acknowledged that an authorize...,The New York Times,,News,nyt://article/821f2759-8123-5e83-8f9d-373428d1...,https://www.nytimes.com/2020/09/18/world/covid...,3590
4,nyt://article/869824ac-e7c3-5d76-8e35-a8b14c98...,How much freedom should front-line clinicians ...,"{'original': 'By Susan Dominus', 'person': [{'...",article,{'main': 'The Covid Drug Wars That Pitted Doct...,"[{'name': 'subject', 'value': 'Clinical Trials...",To hear more audio stories from publishers lik...,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",Magazine,20.0,MM,2020-08-05T09:00:14+0000,Magazine,How much freedom should front-line clinicians ...,The New York Times,,News,nyt://article/869824ac-e7c3-5d76-8e35-a8b14c98...,https://www.nytimes.com/2020/08/05/magazine/co...,6403
5,nyt://article/33ecbeaa-1d77-5112-82bb-3fe4864a...,At least six U.S. states reported records for ...,"{'original': None, 'person': [], 'organization...",article,{'main': 'Under 10 Percent of Americans Have C...,"[{'name': 'subject', 'value': 'Coronavirus (20...",[Live Trump tests positive for Covid-19 update...,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",Foreign,,,2020-09-25T09:31:31+0000,World,At least six U.S. states reported records for ...,The New York Times,,News,nyt://article/33ecbeaa-1d77-5112-82bb-3fe4864a...,https://www.nytimes.com/2020/09/25/world/covid...,5089
6,nyt://article/4cea833d-9d7b-5b06-82e9-4912288a...,New York is trying something novel for an Amer...,"{'original': 'By J. David Goodman', 'person': ...",article,{'main': 'How a New Virus Lockdown Turned New ...,"[{'name': 'subject', 'value': 'Coronavirus Reo...","At the Rego Center, a small mall in Queens, ha...","[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",Metro,1.0,A,2020-10-16T07:00:08+0000,New York,New York is trying something novel for an Amer...,The New York Times,,News,nyt://article/4cea833d-9d7b-5b06-82e9-4912288a...,https://www.nytimes.com/2020/10/16/nyregion/ne...,1509
7,nyt://article/a88151e3-79d1-596b-afdd-22b3bb80...,"When virus patients died, New York hospitals w...",{'original': 'By Andrea Salcedo and Christophe...,article,{'main': 'These Are the Heartbreaking Belongin...,"[{'name': 'subject', 'value': 'Coronavirus (20...","Rafael Eli, 68, stopped breathing in the early...",[],Metro,6.0,A,2020-06-29T09:00:15+0000,New York,"When virus patients died, New York hospitals w...",The New York Times,,News,nyt://article/a88151e3-79d1-596b-afdd-22b3bb80...,https://www.nytimes.com/2020/06/29/nyregion/co...,1286
8,nyt://article/d0285f64-2dfd-5690-a7f5-e500b043...,Health experts say the treatment offered by a ...,"{'original': 'By Isabel Kershner', 'person': [...",article,"{'main': 'Badly Ill With Coronavirus, Some Ult...","[{'name': 'subject', 'value': 'Coronavirus (20...",JERUSALEM — When the elderly father of an ultr...,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",Foreign,5.0,A,2020-10-16T13:05:18+0000,World,Health experts say the treatment offered by a ...,The New York Times,Middle East,News,nyt://article/d0285f64-2dfd-5690-a7f5-e500b043...,https://www.nytimes.com/2020/10/16/world/middl...,1356
9,nyt://article/5dffe949-2c46-5ce4-9173-067027fe...,"In two early studies, researchers said some pa...","{'original': 'By Lina Zeldovich', 'person': [{...",article,{'main': 'Some Signs of Recovery From Severe C...,"[{'name': 'subject', 'value': 'Ventilators (Me...",When Annie Coissieux tried to stand up for the...,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",Science,8.0,D,2020-10-18T09:00:13+0000,Health,"In two early studies, researchers said some pa...",The New York Times,,News,nyt://article/5dffe949-2c46-5ce4-9173-067027fe...,https://www.nytimes.com/2020/10/18/health/Covi...,1467
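The `headline` column above holds whole dictionaries such as `{'main': ...}`. `pandas.json_normalize` flattens nested keys into their own columns (for example `headline.main`), and `pandas.to_datetime` turns `pub_date` strings into timestamps that can be sorted or plotted. The two documents below are made up for illustration.

```python
import pandas

# Two made-up documents with the nested 'headline' dictionary seen in the output above
docs = [{'_id': 'nyt://article/1', 'headline': {'main': 'First headline'},
         'pub_date': '2020-10-05T16:54:24+0000'},
        {'_id': 'nyt://article/2', 'headline': {'main': 'Second headline'},
         'pub_date': '2020-11-24T18:08:42+0000'}]

flat = pandas.json_normalize(docs)                       # nested keys become 'headline.main', etc.
flat['pub_date'] = pandas.to_datetime(flat['pub_date'])  # parse timestamps for sorting and plotting
print(flat[['_id', 'headline.main', 'pub_date']])
```

On the real data the same call would be `pandas.json_normalize(data2)`.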


In [179]:
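# Save the DataFrame to a .csv in the Output folder (the folder must already exist).
# The file is named after the query (q = jewish, headline filter = covid), matching the naming used below.

jc.to_csv('Output/Jewish-CovidHeadline.csv')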
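On a fresh machine `to_csv` will not create a missing `Output` folder. A small sketch with `pathlib` creates it first and builds file paths with `/`, which works on Windows, macOS and Linux alike; the folder name follows the notebook's, and the file name is just an example.

```python
from pathlib import Path

output_dir = Path('Output')                   # relative to the notebook's working directory
output_dir.mkdir(exist_ok=True)               # no error if the folder already exists
csv_path = output_dir / 'UniqueStories.csv'   # '/' joins path parts on every OS
print(csv_path)
```

A DataFrame can then be saved with `jc.to_csv(csv_path)`.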

### Automated Query for Search Term Combinations

In [181]:
# This is a list of our search terms - we can easily add more.

searchterms = [['covid', 'body : ("jewish")'],
               ['covid', 'headline : ("jewish")'],
               ['covid', 'body : ("minyan")'],
               ['jewish', 'body : ("minyan")'],
               ['jewish', 'body : ("institution")'],
               ['minyan', 'body : ("covid")'],
               ['jewish', 'body : ("covid")'],
               ['jewish', 'headline : ("covid")'],
               ['jewish', 'body : ("institution") AND body : ("covid")']]
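Each entry above pairs a keyword with a hand-written filter string, and the output file names later repeat the same information. One hypothetical alternative stores `(query, field, value)` triples and derives both the `fq` string and a label in the notebook's naming style from them. The compound filter in the last entry does not fit this shape and would still need to be written by hand.

```python
# Hypothetical restructuring: (query, field, filter value) triples instead of raw fq strings
specs = [('covid', 'body', 'jewish'),
         ('covid', 'headline', 'jewish'),
         ('jewish', 'headline', 'covid')]

def to_request(q, field, value):
    """Return the q value, the fq string, and a file label such as 'Covid-JewishBody'."""
    fq = f'{field}:("{value}")'
    label = f'{q.capitalize()}-{value.capitalize()}{field.capitalize()}'
    return q, fq, label

for spec in specs:
    print(to_request(*spec))
```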

In [182]:
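# This code block automates our search through each search-term pair above.

datamaster = [] # One entry per search-term pair
data = [] # Parsed JSON responses for the current pair
for x in searchterms:
    # Request the first page (page 0), which also tells us the total number of hits
    page = 0
    search_params = {'q': x[0],
                     'api-key': key,
                     'begin_date': '20200301',
                     'page': page,
                     'fq': x[1]}
    r = requests.get(base_url, params=search_params)
    r.raise_for_status() # Stop on HTTP errors (e.g. rate limiting) instead of storing them
    datatemp = r.json()
    data.append(datatemp)
    maxpages = math.ceil(datatemp['response']['meta']['hits'] / 10) # 10 results per page, rounded up
    print(x, "Page Number:", page, "Ok!")
    page += 1
    time.sleep(6) # Wait between calls to stay under the per-minute rate limit
    # Request the remaining pages; pages start at 0, so the last one is maxpages - 1
    while page < maxpages:
        search_params['page'] = page # Same parameters, next page number
        r = requests.get(base_url, params=search_params)
        r.raise_for_status()
        datatemp = r.json()
        data.append(datatemp)
        print(x, "Page #:", page, "Ok!")
        page += 1
        time.sleep(6)
    datamaster.append([data]) # Wrapped in a list because extender() below reads y[0]
    data = [] # Reset for the next search-term pair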

['covid', 'body : ("jewish")'] Page Number: 0 Ok!
['covid', 'body : ("jewish")'] Page #: 1 Ok!
['covid', 'body : ("jewish")'] Page #: 2 Ok!
['covid', 'body : ("jewish")'] Page #: 3 Ok!
['covid', 'body : ("jewish")'] Page #: 4 Ok!
['covid', 'body : ("jewish")'] Page #: 5 Ok!
['covid', 'body : ("jewish")'] Page #: 6 Ok!
['covid', 'body : ("jewish")'] Page #: 7 Ok!
['covid', 'body : ("jewish")'] Page #: 8 Ok!
['covid', 'body : ("jewish")'] Page #: 9 Ok!
['covid', 'body : ("jewish")'] Page #: 10 Ok!
['covid', 'body : ("jewish")'] Page #: 11 Ok!
['covid', 'body : ("jewish")'] Page #: 12 Ok!
['covid', 'body : ("jewish")'] Page #: 13 Ok!
['covid', 'body : ("jewish")'] Page #: 14 Ok!
['covid', 'body : ("jewish")'] Page #: 15 Ok!
['covid', 'body : ("jewish")'] Page #: 16 Ok!
['covid', 'body : ("jewish")'] Page #: 17 Ok!
['covid', 'body : ("jewish")'] Page #: 18 Ok!
['covid', 'body : ("jewish")'] Page #: 19 Ok!
['covid', 'body : ("jewish")'] Page #: 20 Ok!
['covid', 'body : ("jewish")'] Page #: 
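The same pagination can be written as a reusable function that takes the request as a parameter, which makes it testable without calling the API. `collect_pages` is a hypothetical helper; it assumes each page is already parsed into a dictionary with the `['response']['meta']['hits']` layout used above and 10 results per page.

```python
import math
import time

def collect_pages(fetch_page, delay=6.0, per_page=10, sleep=time.sleep):
    """Fetch every page for one query and return the list of parsed responses (hypothetical helper).

    fetch_page(page) is any function returning the parsed JSON dict for that page number.
    """
    pages = [fetch_page(0)]   # page 0 also reports the total hit count
    maxpages = math.ceil(pages[0]['response']['meta']['hits'] / per_page)
    for page in range(1, maxpages):   # pages are zero-indexed: 0 .. maxpages - 1
        sleep(delay)                  # pause between requests to respect the rate limit
        pages.append(fetch_page(page))
    return pages

# Offline demonstration: a fake query with 23 hits should need pages 0, 1 and 2
requested = []
def fake_fetch(page):
    """Stand-in for an API call that records which page was asked for."""
    requested.append(page)
    return {'response': {'meta': {'hits': 23}, 'docs': []}}

pages = collect_pages(fake_fetch, sleep=lambda seconds: None)  # skip the real pauses
print(len(pages), requested)
```

With the real API, `fetch_page` could be `lambda p: requests.get(base_url, params={**search_params, 'page': p}).json()`.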

In [183]:
# This converts our search results into an individual variable for each search term pair.

cbj = datamaster[0]
chj = datamaster[1]
cbm = datamaster[2]
jbm = datamaster[3]
jbi = datamaster[4]
mbc = datamaster[5]
jbc = datamaster[6]
jhc = datamaster[7]
jbibc = datamaster[8]
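Nine separate variables have to be kept in step with the order of `searchterms` by hand. A dictionary keyed by a readable label does the same job in one line; the labels and stand-in data below are hypothetical.

```python
# Stand-ins for the notebook's objects: one label per search-term pair, in the
# same order as searchterms, and a datamaster-like list with one entry per pair
labels = ['Covid-JewishBody', 'Covid-JewishHeadline', 'Covid-MinyanBody']
datamaster_standin = [[['pages for pair 0']], [['pages for pair 1']], [['pages for pair 2']]]

results = dict(zip(labels, datamaster_standin))   # label -> that pair's stored pages
print(results['Covid-MinyanBody'])
```

With the real data, `dict(zip(labels, datamaster))` needs one label per entry in `searchterms`; on Python 3.10 and later, passing `strict=True` to `zip` raises an error if the two lists differ in length instead of silently dropping entries.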

In [184]:
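# This function flattens all pages for one search-term pair into a single pandas DataFrame.

def extender(y):
    """Return one DataFrame containing the articles from every page stored in y[0]."""
    z = []
    for i in y[0]: # y is [list_of_pages], the shape stored in datamaster
        z.extend(i['response']['docs']) # Up to 10 articles per page
    dfz = pandas.DataFrame(z)
    return dfz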

In [185]:
# This code block runs the function above for each search term pair.

cbj2 = extender(cbj)
chj2 = extender(chj)
cbm2 = extender(cbm)
jbm2 = extender(jbm)
jbi2 = extender(jbi)
mbc2 = extender(mbc)
jbc2 = extender(jbc)
jhc2 = extender(jhc)
jbibc2 = extender(jbibc)

In [188]:
# This code block saves each search-term pair's results to a .csv in the Output folder.

cbj2.to_csv('Output/Covid-JewishBody.csv')
chj2.to_csv('Output/Covid-JewishHeadline.csv')
cbm2.to_csv('Output/Covid-MinyanBody.csv')
jbm2.to_csv('Output/Jewish-MinyanBody.csv')
jbi2.to_csv('Output/Jewish-InstitutionBody.csv')
mbc2.to_csv('Output/Minyan-CovidBody.csv')
jbc2.to_csv('Output/Jewish-CovidBody.csv')
jhc2.to_csv('Output/Jewish-CovidHeadline.csv')
jbibc2.to_csv('Output/Jewish-InstitutionANDCovidBody.csv')
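The nine `to_csv` calls can likewise become a loop over a dictionary of DataFrames. Passing `index=False` skips the row index; when a file written with its index is read back with `pandas.read_csv`, that index shows up as an extra column named `Unnamed: 0`. The frames below are stand-ins.

```python
from pathlib import Path
import pandas

# Stand-in DataFrames keyed by labels in the notebook's file-naming style
frames = {'Covid-JewishBody': pandas.DataFrame({'_id': ['a', 'b']}),
          'Covid-MinyanBody': pandas.DataFrame({'_id': ['b', 'c']})}

output_dir = Path('Output')
output_dir.mkdir(exist_ok=True)
for label, frame in frames.items():
    frame.to_csv(output_dir / f'{label}.csv', index=False)   # one file per search-term pair
```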

In [232]:
# Finally, we concatenate all of the DataFrames and drop duplicates by article ID, leaving only unique stories.

master = pandas.concat([cbj2, chj2, cbm2, jbm2, jbi2, mbc2, jbc2, jhc2, jbibc2], ignore_index=True, sort=False)
uniquestories = master.drop_duplicates(subset='_id')
uniquestories.to_csv('Output/UniqueStories.csv')
uniquestories

Unnamed: 0,_id,abstract,byline,document_type,headline,keywords,lead_paragraph,multimedia,news_desk,print_page,print_section,pub_date,section_name,snippet,source,subsection_name,type_of_material,uri,web_url,word_count
0,nyt://article/adb4aaf4-702b-58d4-aa8b-2d4c29f6...,"“This was amazingly irresponsible,” Mayor Bill...","{'original': 'By Liam Stack', 'person': [{'fir...",article,"{'main': '$15,000 Fine After Secret Hasidic We...","[{'name': 'subject', 'value': 'Hasidism', 'ran...",Thousands of ultra-Orthodox Jews gathered to c...,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",Metro,8,A,2020-11-24T18:08:42+0000,New York,"“This was amazingly irresponsible,” Mayor Bill...",The New York Times,,News,nyt://article/adb4aaf4-702b-58d4-aa8b-2d4c29f6...,https://www.nytimes.com/2020/11/24/nyregion/wi...,871
1,nyt://interactive/391ce2dd-6ac6-5c9a-ac6b-078a...,In a world where life changed seemingly overni...,"{'original': 'By The New York Times', 'person'...",multimedia,{'main': 'A Year Like No Other: 2020 in Pictur...,"[{'name': 'subject', 'value': 'Photography', '...",In a world where life changed seemingly overni...,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",World,,,2020-12-09T03:45:12+0000,World,In a world where life changed seemingly overni...,The New York Times,,Interactive Feature,nyt://interactive/391ce2dd-6ac6-5c9a-ac6b-078a...,https://www.nytimes.com/interactive/2020/world...,0
2,nyt://article/b0faff56-4a8f-5343-95a9-3814546d...,"Ana Bowens and Benjamin Schenk, who met at Dar...","{'original': 'By Tammy La Gorce', 'person': [{...",article,{'main': 'That New York Vibe and a Life in San...,"[{'name': 'subject', 'value': 'Weddings and En...","By now, 2020 wedding guests have read the boil...","[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",Styles,,,2020-12-11T10:00:22+0000,Fashion & Style,"Ana Bowens and Benjamin Schenk, who met at Dar...",The New York Times,Weddings,News,nyt://article/b0faff56-4a8f-5343-95a9-3814546d...,https://www.nytimes.com/2020/12/11/fashion/wed...,1969
3,nyt://article/5b4efbee-0dc9-557a-9ca9-1248d416...,A pandemic relief program allows no forgivenes...,"{'original': 'By Gillian Friedman', 'person': ...",article,"{'main': 'Jobless Benefits Saved Them, Until S...","[{'name': 'subject', 'value': 'Unemployment In...",Unemployment payments that looked like a lifel...,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",Business,,,2020-12-11T10:00:22+0000,Business Day,A pandemic relief program allows no forgivenes...,The New York Times,Economy,News,nyt://article/5b4efbee-0dc9-557a-9ca9-1248d416...,https://www.nytimes.com/2020/12/11/business/ec...,1409
4,nyt://article/4cea079e-c16c-5b64-938d-36290745...,Religious Muslims in many nations are finding ...,"{'original': 'By George Yancy', 'person': [{'f...",article,"{'main': 'Of Death and Consequences', 'kicker'...","[{'name': 'subject', 'value': 'Religion and Be...",This month’s conversation in our series on how...,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",OpEd,,,2020-12-08T10:00:12+0000,Opinion,Religious Muslims in many nations are finding ...,The New York Times,,Op-Ed,nyt://article/4cea079e-c16c-5b64-938d-36290745...,https://www.nytimes.com/2020/12/08/opinion/isl...,2435
5,nyt://article/445a8ca1-131d-5a88-b2a6-0c3692c1...,"Checking in with the blues bassist, the homele...","{'original': 'By Ginia Bellafante', 'person': ...",article,{'main': 'We Met Them in the Depths of the Pan...,"[{'name': 'subject', 'value': 'Coronavirus (20...","At the outset of the pandemic, I met many New ...","[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",Metropolitan,3,MB,2020-12-04T10:00:20+0000,New York,"Checking in with the blues bassist, the homele...",The New York Times,,News,nyt://article/445a8ca1-131d-5a88-b2a6-0c3692c1...,https://www.nytimes.com/2020/12/04/nyregion/co...,1385
6,nyt://article/23b9268f-3da6-525e-bd01-d6ef5c45...,Will she join the Supreme Court’s grievance co...,"{'original': 'By Linda Greenhouse', 'person': ...",article,"{'main': 'Justice Amy Coney Barrett’s Choice',...","[{'name': 'organizations', 'value': 'Supreme C...",Justice Amy Coney Barrett had a choice.,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",OpEd,,,2020-12-03T10:00:13+0000,Opinion,Will she join the Supreme Court’s grievance co...,The New York Times,,Op-Ed,nyt://article/23b9268f-3da6-525e-bd01-d6ef5c45...,https://www.nytimes.com/2020/12/03/opinion/amy...,1459
7,nyt://article/e087eaf8-c052-531b-ac35-3e844b94...,The location of the New York group’s annual ev...,"{'original': 'By Dana Rubinstein', 'person': [...",article,"{'main': 'Young Republicans Stage Secret Gala,...","[{'name': 'subject', 'value': 'Parties (Social...","On Thursday evening, the New York Young Republ...","[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",Metro,22,A,2020-12-03T23:44:04+0000,New York,The location of the New York group’s annual ev...,The New York Times,,News,nyt://article/e087eaf8-c052-531b-ac35-3e844b94...,https://www.nytimes.com/2020/12/03/nyregion/yo...,1168
8,nyt://article/a5444668-1aef-5566-bea4-1da77d13...,The balance between Covid-19 precautions and c...,{'original': 'By Michael W. McConnell and Max ...,article,{'main': 'The Supreme Court Was Right to Block...,"[{'name': 'subject', 'value': 'Coronavirus (20...",The Supreme Court last week made a major move ...,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",OpEd,,,2020-12-01T23:31:54+0000,Opinion,The balance between Covid-19 precautions and c...,The New York Times,,Op-Ed,nyt://article/a5444668-1aef-5566-bea4-1da77d13...,https://www.nytimes.com/2020/12/01/opinion/sup...,1147
9,nyt://article/f67439fc-3ff5-5d1d-9f80-6279a28a...,Newly released data gives the most detailed ac...,"{'original': 'By Stacy Cowley and Ella Koeze',...",article,{'main': '1 Percent of P.P.P. Borrowers Got Ov...,"[{'name': 'subject', 'value': 'Small Business'...",The Paycheck Protection Program was the center...,"[{'rank': 0, 'subtype': 'xlarge', 'caption': N...",Business,1,A,2020-12-02T18:01:12+0000,Business Day,Newly released data gives the most detailed ac...,The New York Times,,News,nyt://article/f67439fc-3ff5-5d1d-9f80-6279a28a...,https://www.nytimes.com/2020/12/02/business/pa...,1639
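`drop_duplicates` keeps one row per story but discards which searches found it, which may matter when comparing search terms. A sketch that first tags each row with its search label and then groups by `_id` keeps that information; the labels and data are made up.

```python
import pandas

# Two made-up result sets that share one article ('b')
results = {'Covid-JewishBody': pandas.DataFrame({'_id': ['a', 'b']}),
           'Jewish-CovidHeadline': pandas.DataFrame({'_id': ['b', 'c']})}

# Tag each row with the search that found it, then stack everything
tagged = pandas.concat([df.assign(query=label) for label, df in results.items()],
                       ignore_index=True)

# One entry per unique story, listing every search that returned it
matches = tagged.groupby('_id')['query'].agg(list)
print(matches)
```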
