## Fletcher Collect Data

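Data was pulled from the NY Times Archive API using the `requests` library.

Each month's data came down as a JSON document.

The JSON was parsed with jq (through the `pyjq` bindings) to pull the relevant fields from each article: the snippet, the main headline, and the publication date.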
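
To make the extraction step concrete, here is a minimal sketch of the same field selection in plain Python, with no jq dependency. It assumes the response has the shape `{"response": {"docs": [...]}}` implied by the jq filter used in the download loop in this notebook. The helper name `extract_fields` and the sample document are hypothetical and are not real API output.

```python
def extract_fields(summary_data):
    """Plain-Python counterpart of the jq filter
    '.response .docs [] | {snippet: .snippet, headline: .headline .main, date: .pub_date}'.
    Missing keys become None, mirroring jq's null."""
    records = []
    for doc in summary_data["response"]["docs"]:
        records.append({
            "snippet": doc.get("snippet"),
            "headline": (doc.get("headline") or {}).get("main"),
            "date": doc.get("pub_date"),
        })
    return records


# Hypothetical sample shaped like the archive response (not real data)
sample = {
    "response": {
        "docs": [
            {
                "snippet": "Example snippet.",
                "headline": {"main": "Example headline"},
                "pub_date": "2015-01-01T00:00:00+0000",
                "web_url": "https://example.com/a",
            },
            {"snippet": "No headline here.", "pub_date": "2015-01-02T00:00:00+0000"},
        ]
    }
}

records = extract_fields(sample)
for rec in records:
    print(rec)
```

Only the three selected fields survive; everything else in each document (such as `web_url` in the sample) is dropped.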

In [3]:
import requests
import pyjq
import datetime
from dateutil.rrule import rrule, MONTHLY
import pickle

In [9]:
# create a list of (year,month) pairs for the data

start_dt = datetime.date(2015,1,1)
end_dt = datetime.date(2018,11,10)

dates = [(dt.year,dt.month) for dt in rrule(MONTHLY, dtstart=start_dt, until=end_dt)]
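
As a cross-check on the `rrule` call, the same list of pairs can be built with the standard library alone. This is a sketch, not the notebook's method. The helper name `month_pairs` is hypothetical, and it assumes the start date falls on the first of the month, as it does here, so that every month up to and including the end date's month is wanted.

```python
import datetime


def month_pairs(start, end):
    """Return (year, month) tuples from start's month through end's month, inclusive."""
    pairs = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        pairs.append((year, month))
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return pairs


pairs = month_pairs(datetime.date(2015, 1, 1), datetime.date(2018, 11, 10))
print(len(pairs), pairs[0], pairs[-1])
# prints: 47 (2015, 1) (2018, 11)
```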


In [10]:
# loop through the (year,month) pairs, pull the json file from NY Times, extract the desired data
all_output = []
for year,month in dates:
    #print(year, month)
    this_url = f'https://api.nytimes.com/svc/archive/v1/{year}/{month}.json?api-key=6a6c043c723548a98eafc5b624317cac'
    r = requests.get(this_url)
    summary_data = r.json()
    
    length = pyjq.all('.response .docs | length',summary_data)[0]
    print(f'For month {month} in {year} there were {length} articles')

    # select only the fields we need from each article
    jq_query = '.response .docs [] | {snippet: .snippet, headline: .headline .main, date: .pub_date}'
    output = pyjq.all(jq_query,summary_data)
    print(f'{year}{month:02}')
    #all_output.append([str(year)+str(month),output])
    all_output.append([f'{year}{month:02}',output])

For month 1 in 2015 there were 8252 articles
201501
For month 2 in 2015 there were 7561 articles
201502
For month 3 in 2015 there were 8397 articles
201503
For month 4 in 2015 there were 7768 articles
201504
For month 5 in 2015 there were 7973 articles
201505
For month 6 in 2015 there were 7888 articles
201506
For month 7 in 2015 there were 7893 articles
201507
For month 8 in 2015 there were 7435 articles
201508
For month 9 in 2015 there were 8273 articles
201509
For month 10 in 2015 there were 8499 articles
201510
For month 11 in 2015 there were 7777 articles
201511
For month 12 in 2015 there were 7582 articles
201512
For month 1 in 2016 there were 7644 articles
201601
For month 2 in 2016 there were 7297 articles
201602
For month 3 in 2016 there were 7502 articles
201603
For month 4 in 2016 there were 6699 articles
201604
For month 5 in 2016 there were 6802 articles
201605
For month 6 in 2016 there were 6640 articles
201606
For month 7 in 2016 there were 6120 articles
201607
For month
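
The download loop above embeds the API key in the URL string. One possible variation, sketched here under stated assumptions, builds the URL and query parameters separately and reads the key from an environment variable. The variable name `NYT_API_KEY` and the helper `archive_request` are hypothetical. The sketch makes no network call itself.

```python
import os

ARCHIVE_URL = "https://api.nytimes.com/svc/archive/v1/{year}/{month}.json"


def archive_request(year, month, api_key=None):
    """Return (url, params) for one month of the archive.

    The key comes from the argument or, failing that, from the
    NYT_API_KEY environment variable (a hypothetical name).
    """
    key = api_key or os.environ.get("NYT_API_KEY")
    if not key:
        raise ValueError("no API key supplied and NYT_API_KEY is not set")
    return ARCHIVE_URL.format(year=year, month=month), {"api-key": key}


url, params = archive_request(2015, 1, api_key="demo-key")
print(url)

# With requests installed, the download step would then look like:
#   r = requests.get(url, params=params, timeout=30)
#   r.raise_for_status()   # fail loudly on an HTTP error instead of parsing an error body
#   summary_data = r.json()
```

Passing the key through `params` lets `requests` handle the query-string encoding, and `raise_for_status()` stops the loop on a failed request rather than handing an error payload to jq.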

In [8]:
print(type(all_output))

<class 'list'>


In [11]:
# write data to pickle file
with open('all_output.pkl', 'wb') as fp:
    pickle.dump(all_output, fp)

In [4]:
# read data from pickle file
with open('all_output.pkl', 'rb') as fp:
    all_output = pickle.load(fp)

In [7]:
all_output

[['201610',
  [OrderedDict([('snippet',
                 'You probably had internships or summer office jobs, but this is different. It’s the start of your career. Start by making a good first impression, and don’t be afraid to ask questions. Pretty soon, you’ll be sitting in the boss’s chair, right?'),
                ('headline', 'How to win your first three months on the job'),
                ('date', '2018-10-01T02:37:47+0000')]),
   OrderedDict([('snippet',
                 'Ms. Fortune, an Indiana philanthropist, found a late-in-life purpose in restoring Renaissance art by women, earning her the nickname “Indiana Jane.”'),
                ('headline',
                 'Jane Fortune, Champion of Florence’s Female Artists, Dies at 76'),
                ('date', '2018-10-02T21:41:50+0000')]),
   OrderedDict([('snippet',
                 'Donna Strickland did pioneering work with lasers and shared the award with two men on Tuesday. She is the first woman to receive the award in 55 y

In [16]:
print(f'Data exists for {len(all_output)} months')
for x in all_output:
    print(f'For month {x[0][4:]} in {x[0][:4]} there were {len(x[1])} articles')

Data exists for 25 months
For month 10 in 2016 there were 5792 articles
For month 11 in 2016 there were 5418 articles
For month 12 in 2016 there were 5073 articles
For month 01 in 2017 there were 5205 articles
For month 02 in 2017 there were 5024 articles
For month 03 in 2017 there were 5650 articles
For month 04 in 2017 there were 4979 articles
For month 05 in 2017 there were 5337 articles
For month 06 in 2017 there were 5356 articles
For month 07 in 2017 there were 4828 articles
For month 08 in 2017 there were 4935 articles
For month 09 in 2017 there were 5094 articles
For month 10 in 2017 there were 5248 articles
For month 11 in 2017 there were 4895 articles
For month 12 in 2017 there were 2791 articles
For month 01 in 2018 there were 4808 articles
For month 02 in 2018 there were 4577 articles
For month 03 in 2018 there were 5053 articles
For month 04 in 2018 there were 4671 articles
For month 05 in 2018 there were 5132 articles
For month 06 in 2018 there were 5045 articles
For mont