# Fetching New York Times Article Comments and Saving to CSV

This Jupyter Notebook demonstrates how to read article URLs from a CSV file, fetch comments using the `nytimes_scraper` library, and save the comments to a CSV file.

## Step 1: Importing Necessary Libraries

In [8]:
# We'll be using pandas to handle CSV files and the nytimes_scraper to fetch article comments.
import pandas as pd
from nytimes_scraper.nyt_api import NytApi
from nytimes_scraper.comments import fetch_comments_by_article, comments_to_df

## Step 2: Initialize the NYTimes API

In [9]:

# You need an API key from the New York Times Developer Portal.
# Replace '<your_api_key>' with your actual API key.
api = NytApi('<your_api_key>')
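
A key typed directly into a cell is easy to leak when the notebook is shared. As a minimal sketch of one alternative, the key can be read from an environment variable. The variable name `NYT_API_KEY` and the helper `load_api_key` are assumptions made for this example; they are not part of `nytimes_scraper`.

```python
import os


def load_api_key(env_var="NYT_API_KEY"):
    """Return the API key stored in the given environment variable.

    Both this helper and the default variable name are hypothetical;
    nytimes_scraper does not define or require them.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your NYT API key."
        )
    return key
```

The result could then be passed to the constructor used above, for example `api = NytApi(load_api_key())`.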

## Step 3: Read URLs from a CSV file

In [10]:

# This function reads a CSV file containing article URLs and returns them as a list.
# We assume the CSV file has a column named 'web_url' which contains the URLs.
def read_urls_from_csv(file_path):
    """
    Reads a CSV file and extracts article URLs.

    Parameters:
    file_path (str): Path to the CSV file containing article URLs.

    Returns:
    List[str]: A list of article URLs.
    """
    # Read the CSV file into a DataFrame
    df = pd.read_csv(file_path)
    
    # Check and strip spaces from column names in case they exist
    df.columns = df.columns.str.strip()
    
    # Print the column names for debugging in case there's a different column name
    print("Columns in the CSV file:", df.columns)
    
    # Extract URLs from the 'web_url' column (adjust the column name if necessary)
    urls = df['web_url'].tolist()
    
    return urls
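
The function above returns every value in the URL column, including blanks and repeats. The sketch below shows one way to check that the column exists and to drop missing or duplicate URLs before fetching. `extract_urls` is a hypothetical helper, and the in-memory CSV only stands in for the real articles file.

```python
import io

import pandas as pd


def extract_urls(df, column="web_url"):
    """Return unique, non-empty URLs from `column`, preserving their order.

    Hypothetical helper for illustration; not part of the notebook's code.
    """
    # Strip stray whitespace from the column names, as the notebook does.
    df = df.rename(columns=lambda name: str(name).strip())
    if column not in df.columns:
        raise KeyError(f"Column {column!r} not found; available: {list(df.columns)}")
    urls = df[column].dropna().astype(str).str.strip()
    # Drop empty strings, then repeated URLs (the first occurrence is kept).
    urls = urls[urls != ""].drop_duplicates()
    return urls.tolist()


# Small in-memory CSV standing in for the articles file.
sample = io.StringIO(
    " web_url ,headline\n"
    "https://example.com/a,First\n"
    "https://example.com/a,Duplicate\n"
    ",Missing\n"
    "https://example.com/b,Second\n"
)
urls = extract_urls(pd.read_csv(sample))
print(urls)
```

Removing duplicates up front avoids fetching the same article's comments twice.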


## Step 4: Fetch Comments from the URLs

In [12]:

# This function fetches comments for a list of article URLs using the NYTimes API.
def fetch_comments_for_urls(api, urls):
    """
    Fetch comments for a list of article URLs.

    Parameters:
    api (NytApi): An instance of the NytApi class.
    urls (List[str]): A list of article URLs.

    Returns:
    List[dict]: A list of comment dictionaries fetched from the articles.
    """
    all_comments = []
    
    # Iterate through the list of URLs and fetch comments for each
    for url in urls:
        try:
            print(f"Fetching comments for: {url}")
            comments = fetch_comments_by_article(api, url)
            all_comments.extend(comments)
        except Exception as e:
            print(f"Error fetching comments for {url}: {e}")
    
    return all_comments
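
The loop above logs a failed URL and moves on. If some failures turn out to be transient (a dropped connection, for instance), retrying with a growing delay may recover them. The sketch below wraps any fetch callable, so it does not depend on the internals of `nytimes_scraper`. `fetch_with_retries` and its parameters are hypothetical names, and the fetcher in the demo is a stand-in rather than a real API call.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on any exception.

    `fetch` is any callable that takes a URL and returns a list of comments.
    The last failure is re-raised once `attempts` calls have been made.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:
            if attempt == attempts:
                raise
            # Delay doubles after each failed attempt: base, 2*base, 4*base, ...
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed for {url}: {exc}; retrying in {delay}s")
            sleep(delay)


# Stand-in fetcher that fails twice before succeeding (demonstration only).
calls = {"count": 0}


def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return [{"text": "example"}]


comments = fetch_with_retries(
    flaky_fetch, "https://example.com/article", base_delay=0.01
)
```

Inside the loop above, the call could become `fetch_with_retries(lambda u: fetch_comments_by_article(api, u), url)`, with the existing `try`/`except` still catching URLs that fail on every attempt.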


## Step 5: Save Comments to a CSV File

In [13]:

# This function takes a list of comments and saves them to a CSV file.
def save_comments_to_csv(comments, output_file):
    """
    Save comments to a CSV file.

    Parameters:
    comments (List[dict]): A list of comment dictionaries.
    output_file (str): Path to the output CSV file.
    """
    if comments:
        # Convert the comments to a DataFrame
        comment_df = comments_to_df(comments)
        
        # Save the DataFrame to a CSV file
        comment_df.to_csv(output_file, index=False)
        print(f"Comments saved to {output_file}")
    else:
        print("No comments found!")
# TODO: Change this so the output keeps the article ID (from the current articles tab);
# keep at least the URL, plus adx_keywords and title.
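
The note above asks for each comment to stay linked to its article. One way to sketch this, assuming each fetched comment is a plain dictionary as the docstrings state, is to stamp every comment with its source URL before the per-article lists are combined. The `article_url` key and the `tag_comments` helper are hypothetical names, and whether `comments_to_df` keeps an extra key is not verified here.

```python
def tag_comments(comments, url, key="article_url"):
    """Return copies of the comment dicts with the source URL added under `key`.

    Hypothetical helper; the input dicts are left unmodified.
    """
    return [{**comment, key: url} for comment in comments]


# Stand-in comments for demonstration only.
fetched = [{"text": "first"}, {"text": "second"}]
tagged = tag_comments(fetched, "https://example.com/article")
print(tagged)
```

In the fetch loop, this would replace `all_comments.extend(comments)` with `all_comments.extend(tag_comments(comments, url))`. With the URL on each row, the saved comments could later be joined to the articles file on its `web_url` column to bring in other article fields.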

## Step 6: Main Function to Orchestrate the Process

In [14]:

# Main function to read article URLs, fetch comments, and save them to a CSV.
def main(input_csv, output_csv):
    """
    Main function to read article URLs, fetch comments, and save them to a CSV.

    Parameters:
    input_csv (str): Path to the input CSV file containing article URLs.
    output_csv (str): Path to the output CSV file to save comments.
    """
    # Step 6.1: Read article URLs from the input CSV
    urls = read_urls_from_csv(input_csv)
    
    # Step 6.2: Fetch comments for the list of URLs
    comments = fetch_comments_for_urls(api, urls)
    
    # Step 6.3: Save the fetched comments to the output CSV file
    save_comments_to_csv(comments, output_csv)


## Step 7: Running the Script

In [15]:

# Provide the path to the input CSV file containing article URLs and the output file to save the comments.
input_csv = "currentarticles.csv"  # Example input file containing article URLs
output_csv = "nytimes_comments.csv"     # Example output file to save fetched comments

# Run the main function
main(input_csv, output_csv)

Columns in the CSV file: Index(['Unnamed: 0', 'abstract', 'web_url', 'snippet', 'lead_paragraph',
       'print_section', 'print_page', 'source', 'multimedia', 'headline',
       'keywords', 'pub_date', 'document_type', 'news_desk', 'section_name',
       'byline', 'type_of_material', '_id', 'word_count', 'uri',
       'subsection_name'],
      dtype='object')
Fetching comments for: https://www.nytimes.com/2025/02/14/opinion/trump-tariffs-china-mexico.html
Fetching comments for: https://www.nytimes.com/2025/02/13/us/politics/trump-tariffs.html
Fetching comments for: https://www.nytimes.com/2025/02/01/us/politics/canada-mexico-china-trump-tariffs.html
Fetching comments for: https://www.nytimes.com/2025/01/17/world/canada/canada-trump-tariffs.html
Fetching comments for: https://www.nytimes.com/2025/01/20/us/politics/trump-tariffs-executive-order.html
Fetching comments for: https://www.nytimes.com/2025/02/14/business/economy/whiskey-tariffs.html
Fetching comments for: https://www.nytimes.

  df[col] = pd.to_datetime(df[col], unit='s')
