# NYT API Calls

----------
## Section 1. API documentation

#### Article comments
With the Community API, you can get readers' article comments.
https://api.nytimes.com/svc/community/v3/user-content/url.json?api-key={your-key}&offset=0&url=https%3A%2F%2Fwww.nytimes.com%2F2019%2F06%2F21%2Fscience%2Fgiant-squid-cephalopod-video.html
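The request URL above can be assembled in code rather than encoded by hand. This is a minimal sketch using only the standard library; the helper name `build_comments_url` and the placeholder key `YOUR_KEY` are illustrative and not part of the API.

```python
from urllib.parse import urlencode

BASE_URI = "https://api.nytimes.com/svc/community/v3"


def build_comments_url(article_url, api_key, offset=0):
    """Return the url.json request URL for one page of comments.

    Hypothetical helper: urlencode percent-encodes the article URL
    (':' -> %3A, '/' -> %2F), so no manual escaping is needed.
    """
    query = urlencode({"api-key": api_key, "offset": offset, "url": article_url})
    return f"{BASE_URI}/user-content/url.json?{query}"


article = "https://www.nytimes.com/2019/06/21/science/giant-squid-cephalopod-video.html"
print(build_comments_url(article, "YOUR_KEY"))
```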

#### Article replies
You can also get the replies to those comments.
https://api.nytimes.com/svc/community/v3/user-content/replies.json?api-key={your-key}&url=https%3A%2F%2

#### Base URI
https://api.nytimes.com/svc/community/v3

#### Scope
NYTimes.com user-generated content, currently comments on articles.

#### HTTP method
GET

#### Response formats
JSON

To use the Community API, you must sign up for an API key. **Usage is limited to 4,000 requests per day and 10 requests per minute (rate limits are subject to change).** Please read and agree to the API Terms of Use and the Attribution Guidelines before you proceed.
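One way to stay under the per-minute limit is to space requests at least 60 / 10 = 6 seconds apart. The class below is a sketch of that idea only: `MinIntervalThrottle` is a hypothetical name, it does not track the daily limit, and the clock and sleep functions are injectable so the logic can be checked without waiting.

```python
import time


class MinIntervalThrottle:
    """Space calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` immediately before each `requests.get(...)` would keep a loop of requests within 10 per minute.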

#### Pagination
**Use the offset query parameter to paginate through the results, 25 comments at a time. Use offset=0 to get the first 25 comments, offset=25 to get the next 25, and so on.**

The url.json endpoint returns top-level comments and the first three replies to each. The totalParentCommentsFound field holds the total number of top-level comments; use it to determine how many pages you need to request.
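The list of offset values follows directly from the total and the page size of 25. A small sketch, where `page_offsets` is an illustrative helper name:

```python
PAGE_SIZE = 25  # comments returned per request, per the documentation above


def page_offsets(total_parent_comments, page_size=PAGE_SIZE):
    """Return every offset value needed to cover all top-level comments."""
    return list(range(0, total_parent_comments, page_size))


print(page_offsets(60))  # [0, 25, 50]
```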

In each comment node, replyCount indicates how many replies that top-level comment has. If there are more than three, use the replies.json endpoint with the comment sequence and offset query parameters to paginate through the replies, 25 at a time.
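The rule above can be sketched as a function that decides whether extra replies.json requests are needed for one comment. The helper name `reply_offsets` is hypothetical, and the sketch assumes replies.json offsets start at 0 and cover all replies, including the three already embedded in url.json.

```python
INLINE_REPLIES = 3  # replies already embedded in each url.json comment
PAGE_SIZE = 25      # replies returned per replies.json request


def reply_offsets(reply_count, page_size=PAGE_SIZE):
    """Offsets to request from replies.json, or [] if url.json has every reply."""
    if reply_count <= INLINE_REPLIES:
        return []
    # Assumption: paging restarts at offset 0 and includes the first three replies.
    return list(range(0, reply_count, page_size))
```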

You can sort the comment list by **newest first, oldest first, or comments with most reader recommendations first (sort=newest, oldest, or reader).**
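A short sketch of passing the sort order along with the offset. `comment_params` is an illustrative helper; it only validates the three values listed above and returns a dict that could be handed to `requests.get(..., params=...)`.

```python
VALID_SORTS = ("newest", "oldest", "reader")


def comment_params(offset, sort):
    """Build the offset/sort query parameters, rejecting unknown sort values."""
    if sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {VALID_SORTS}, got {sort!r}")
    return {"offset": offset, "sort": sort}
```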

#### Responses
The Community API is RESTful. It uses response codes to indicate the API status (200 - OK, 401 - invalid key, 429 - rate limit reached, ...).
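A client can branch on these codes before touching the response body. The mapping below is one possible policy, not something the API prescribes; the function name and the returned labels are made up for illustration.

```python
def next_action(status_code):
    """Map a response code to what a client might do next (illustrative policy)."""
    if status_code == 200:
        return "parse"       # OK: read the JSON body
    if status_code == 401:
        return "fix_key"     # invalid key: retrying will not help
    if status_code == 429:
        return "wait_retry"  # rate limit reached: pause, then retry
    return "fail"            # anything else: surface the error
```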

#### Data Returned
Date fields are Unix timestamps (seconds since the epoch, UTC).
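Converting such a field to a timezone-aware datetime takes one standard-library call. In this sketch `to_utc_datetime` is an illustrative name, and it accepts either an integer or a numeric string. Note that `datetime.fromtimestamp` without a `tz` argument returns local time instead.

```python
from datetime import datetime, timezone


def to_utc_datetime(unix_seconds):
    """Convert a Unix timestamp (int or numeric string) to an aware UTC datetime."""
    return datetime.fromtimestamp(int(unix_seconds), tz=timezone.utc)


print(to_utc_datetime("86400").isoformat())  # 1970-01-02T00:00:00+00:00
```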

----------
## Section 2. Function for calling comments

In [3]:
nyt_key = ''
nyt_secret_key = ''

In [10]:
## Function that returns up to 125 comments (5 pages of 25, sorted by reader recommendations) from a NYTimes article as a pandas dataframe

def comment_df_clean(url):
    
#### Import packages
    import requests
    import numpy as np
    import pandas as pd
    import re
    from datetime import datetime
    
     
#### 1) Generate API urls
    
    base_url = 'https://api.nytimes.com/svc/community/v3/'
    offset = ['0', '25', '50', '75', '100']
    
    # url_list to store generated urls
    url_list = []
    
    # percent-encode the whole article URL (':' -> %3A, '/' -> %2F, plus characters
    # such as '?' or '&' that the two regex substitutions did not handle)
    from urllib.parse import quote
    article_encoded = quote(url, safe='')
    
    # for loop to generate the urls
    for a in range(0, 5):
        comment_api = 'user-content/url.json?api-key=' + nyt_key + '&offset=' + offset[a] + '&url='
        final_url = base_url + comment_api + article_encoded
        
        url_list.append(final_url)

#### 2) GET request to API, store JSON objects

    # json_list to store generated json objects    
    json_list = []
    
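    # for loop to generate the json objects
    for a in range(0, 5):
        response = requests.get(url_list[a], params = {'sort' : 'reader'})
        # fail early on 401 (invalid key) or 429 (rate limit) instead of parsing an error body
        response.raise_for_status()
        json_obj = response.json()
        
        json_list.append(json_obj)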
        
#### 3) Parse JSON objects
      
    # result list to store one row (a list of strings) per comment
    result = []
    
    # for loop to parse through json object
    for k in range(0, 5):
        for i in json_list[k]['results']['comments']:
            # build each row as a list instead of a comma-joined string, so a comma
            # in a display name or comment body cannot shift the columns
            result.append([str(i['userDisplayName']), str(i['userDisplayName']).split(' ')[0],
                           str(i['commentBody']), str(i['createDate']),
                           str(i['approveDate']), str(i['recommendations']), str(i['replyCount']),
                           str(i['editorsSelection']), str(i['recommendedFlag'])])

    
    # save it in a pandas dataframe
    result_df = pd.DataFrame(result,
                            columns = ['userName', 'splitName', 'comment', 'createDate', 'approveDate', 'n_recommend', 'n_reply', 'nyt_select', 'recommendflag']
                            )
    result_df['url'] = np.repeat(url, len(result_df))
    result_df['ttlCommentNum'] = np.repeat(json_list[0]['results']['totalCommentsFound'], len(result_df))
    
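#### 4) clean the dataframe, add gender and time
    
    # add the gender for name (requires the NLTK 'names' corpus: nltk.download('names'))
    from nltk.corpus import names
    # sets make the membership tests below fast
    male_names = set(names.words('male.txt'))
    female_names = set(names.words('female.txt'))
    
    # gender codes: 0 = name in male list, 1 = name in female list, 2 = no match
    for v in range(0, len(result_df)):
        # Clean the name structure: keep ASCII letters only
        # ('[^A-z]' would also keep the characters [ \ ] ^ _ `)
        first_name = re.sub('[^A-Za-z]', '', result_df.loc[v, 'splitName'])
        # .loc avoids chained assignment, which may write to a temporary copy
        result_df.loc[v, 'splitName'] = first_name
        if first_name in male_names:
            result_df.loc[v, 'gender'] = 0
        elif first_name in female_names:
            result_df.loc[v, 'gender'] = 1
        else:
            result_df.loc[v, 'gender'] = 2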
        
        
    # transform data types
    result_df['n_recommend'] = result_df['n_recommend'].astype(int)
    result_df['n_reply'] = result_df['n_reply'].astype(int)
    result_df['createDate'] = result_df['createDate'].astype(int)
    result_df['approveDate'] = result_df['approveDate'].astype(int)
    result_df['splitName'] = result_df['splitName'].astype(str)
    result_df['comment'] = result_df['comment'].astype(str)
    result_df['gender'] = result_df['gender'].astype('category')
    
    
    # Add time
    result_df['time'] = np.repeat(None, len(result_df))

    for v in range(0, len(result_df)):
        # .loc avoids chained assignment; fromtimestamp returns local time
        result_df.loc[v, 'time'] = datetime.fromtimestamp(result_df.loc[v, 'approveDate'])
        
    # Add time_order index
    result_df = result_df.sort_values(by = 'time').reset_index(drop = True)
    result_df['time_order'] = np.arange(1, len(result_df)+1)
    
    result_df = result_df.drop(['createDate','approveDate'], axis = 1)
    
    return result_df
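The name-cleaning and gender-coding step in part 4 can also be written as a standalone function, which makes it easy to check without calling the API. This is a sketch rather than the notebook's code: `first_name_gender` is a hypothetical helper, and the name sets are passed in (small made-up sets in the example) instead of being loaded from the NLTK `names` corpus.

```python
import re


def first_name_gender(display_name, male_names, female_names):
    """Return (cleaned first name, gender code) for a display name.

    Codes follow comment_df_clean: 0 = in male list, 1 = in female list,
    2 = no match. The male list is checked first, as in the loop above.
    """
    first = re.sub(r"[^A-Za-z]", "", display_name.split(" ")[0])
    if first in male_names:
        return first, 0
    if first in female_names:
        return first, 1
    return first, 2


# Made-up name sets for illustration only
print(first_name_gender("Mark T.", {"Mark"}, {"Anna"}))  # ('Mark', 0)
```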


In [11]:
# example call
ex = comment_df_clean('https://www.nytimes.com/2020/03/09/business/stock-market-today.html')

In [12]:
ex.head()

| | userName | splitName | comment | n_recommend | n_reply | nyt_select | recommendflag | url | ttlCommentNum | gender | time | time_order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AB | AB | This is when it hurts us not having a competen... | 1046 | 7 | False | 0 | https://www.nytimes.com/2020/03/09/business/st... | 1496 | 2.0 | 2020-03-09 06:46:01 | 1 |
| 1 | Mark | Mark | Our illustrious commander-in-chief was all too... | 1183 | 17 | False | 0 | https://www.nytimes.com/2020/03/09/business/st... | 1496 | 0.0 | 2020-03-09 06:58:13 | 2 |
| 2 | Nomind7 | Nomind | This is when it hurts us not having universal ... | 1046 | 33 | False | 0 | https://www.nytimes.com/2020/03/09/business/st... | 1496 | 2.0 | 2020-03-09 06:59:10 | 3 |
| 3 | Michael | Michael | Oil prices falling is a symptom of demand fall... | 136 | 4 | False | 0 | https://www.nytimes.com/2020/03/09/business/st... | 1496 | 0.0 | 2020-03-09 07:19:59 | 4 |
| 4 | Larry | Larry | Virus at my daughters school in Boston. I have... | 68 | 2 | False | 0 | https://www.nytimes.com/2020/03/09/business/st... | 1496 | 0.0 | 2020-03-09 07:29:30 | 5 |


In [15]:
ex.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 125 entries, 0 to 124
Data columns (total 12 columns):
userName         125 non-null object
splitName        125 non-null object
comment          125 non-null object
n_recommend      125 non-null int32
n_reply          125 non-null int32
nyt_select       125 non-null object
recommendflag    125 non-null object
url              125 non-null object
ttlCommentNum    125 non-null int32
gender           125 non-null category
time             125 non-null object
time_order       125 non-null int32
dtypes: category(1), int32(4), object(7)
memory usage: 9.1+ KB
