## API exploration and limitations
- Topic: Environment and Climate Change

### NYT API

In [None]:
import requests
import os
url = 'https://api.nytimes.com/svc/search/v2/articlesearch.json'
headers = {
    "Accept" : "application/json"
}
parameters = {
    "api-key" : os.getenv("NYT_API_KEY"),
    "q" : "climate change"
}

r = requests.get(
    url = url,
    params = parameters,
    headers = headers
)
data = r.json()


#### exploring the response

In [8]:
data.keys()

dict_keys(['status', 'copyright', 'response'])

In [12]:
data['response'].keys()

dict_keys(['docs', 'metadata'])

how many articles does it produce?

In [14]:
num_articles = len(data['response']['docs'])
print(f"the number of articles in the response is {num_articles}")

the number of articles in the response is 10


what information does each article contain?

In [15]:
articles = data['response']['docs']
print(f"the type of the articles is {type(articles)}")

the type of the articles is <class 'list'>


In [19]:
sample_article = articles[0]
for k in sample_article.keys():
    print(f"'{k}' : {sample_article[k]}")

'abstract' : Extreme weather events — deadly heat waves, floods, fires and hurricanes — are the consequences of a warming planet, scientists say.
'byline' : {'original': 'By David Gelles and Austyn Gaffney'}
'document_type' : article
'headline' : {'main': '‘We’re in a New Era’: How Climate Change Is Supercharging Disasters', 'kicker': '', 'print_headline': 'Fires in Los Angeles Area Are Grim Look Into Future'}
'_id' : nyt://article/a09b3cd3-63b9-5df7-8563-a88bde75361f
'keywords' : [{'name': 'Subject', 'value': 'Wildfires', 'rank': 1}, {'name': 'Subject', 'value': 'Global Warming', 'rank': 2}, {'name': 'Subject', 'value': 'Fires and Firefighters', 'rank': 3}, {'name': 'Subject', 'value': 'Southern California Wildfires (Jan 2025)', 'rank': 4}, {'name': 'Location', 'value': 'Los Angeles (Calif)', 'rank': 5}, {'name': 'Subject', 'value': 'Hurricanes and Tropical Storms', 'rank': 6}, {'name': 'Subject', 'value': 'Floods', 'rank': 7}, {'name': 'Subject', 'value': 'Heat and Heat Waves', 'rank

relevant fields from articles
- abstract
- byline (authors)
- headline (main, print_headline)
- pub_date
- source
- web_url
- word_count
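
A minimal sketch of how the fields listed above could be pulled out of one article record. The helper name `extract_fields` and the flattened key names are assumptions made for illustration; the sample values are copied from the article printed earlier, and missing keys simply come back as `None`.

```python
def extract_fields(doc: dict) -> dict:
    """Keep only the fields of interest from one article record."""
    headline = doc.get("headline") or {}
    byline = doc.get("byline") or {}
    return {
        "abstract": doc.get("abstract"),
        "authors": byline.get("original"),
        "headline_main": headline.get("main"),
        "print_headline": headline.get("print_headline"),
        "pub_date": doc.get("pub_date"),
        "source": doc.get("source"),
        "web_url": doc.get("web_url"),
        "word_count": doc.get("word_count"),
    }


# partial record, copied from the sample article shown above
sample = {
    "byline": {"original": "By David Gelles and Austyn Gaffney"},
    "headline": {
        "main": "‘We’re in a New Era’: How Climate Change Is Supercharging Disasters",
        "kicker": "",
        "print_headline": "Fires in Los Angeles Area Are Grim Look Into Future",
    },
}

row = extract_fields(sample)
print(row["authors"])
print(row["print_headline"])
```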

In [21]:
data['response']['metadata']

{'hits': 10000, 'offset': 0, 'time': 12}
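
The metadata reports 10000 hits, while `docs` only holds 10 articles, so collecting more results means sending several requests. The sketch below only does the arithmetic and builds the parameter dictionaries. It assumes a page size of 10 (the number observed above) and a 0-based `page` query parameter; the parameter name should be checked against the Article Search documentation before relying on it.

```python
import math

PAGE_SIZE = 10  # observed above: 10 docs per response


def pages_needed(hits: int, wanted: int, page_size: int = PAGE_SIZE) -> int:
    """Number of requests needed to collect `wanted` articles out of `hits`."""
    available = min(hits, wanted)
    return math.ceil(available / page_size)


def page_params(base: dict, n_pages: int) -> list[dict]:
    """One parameter dict per request; 'page' is an assumed parameter name."""
    return [{**base, "page": p} for p in range(n_pages)]


n = pages_needed(hits=10000, wanted=35)
params = page_params({"q": "climate change"}, n)
print(n)
print(params[-1])
```

Each dictionary in `params` could then be merged with the API key and passed to `requests.get`, ideally with a pause between calls to stay within the rate limit.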

### refining the query

In [None]:
import requests
import os
url = 'https://api.nytimes.com/svc/search/v2/articlesearch.json'
headers = {
    "Accept" : "application/json"
}
parameters = {
    "api-key" : os.getenv("NYT_API_KEY"),
    "q" : "climate change", # what are the articles about?
    "sort" : "newest", # available options: best (default), newest, oldest, relevance
    "begin_date" : "20200101", # format (YYYYMMDD)
    "end_date" : "20250401", # format (YYYYMMDD)
    "fq" : 'desk:("Climate", "Foreign") AND section.name:("Climate", "Science") AND type:("Article")',
    # note the fq syntax: each field name is followed by its values, in double quotes, inside parentheses;
    # the available filter fields are listed at https://developer.nytimes.com/docs/articlesearch-product/1/overview
}

r = requests.get(
    url = url,
    params = parameters,
    headers = headers
)

r.json()

{'status': 'OK',
 'copyright': 'Copyright (c) 2025 The New York Times Company. All Rights Reserved.',
 'response': {'docs': [{'abstract': 'The exhibits were dedicated to the agency’s history. Mr. Zeldin said closing the collection would save $600,000 annually.',
    'byline': {'original': 'By Lisa Friedman'},
    'document_type': 'article',
    'headline': {'main': 'Lee Zeldin, E.P.A. Head, Shuts National Environmental Museum',
     'kicker': '',
     'print_headline': ''},
    '_id': 'nyt://article/a83470c0-6d26-57c7-8f60-024dc28a5522',
    'keywords': [{'name': 'Subject', 'value': 'Global Warming', 'rank': 1},
     {'name': 'Subject', 'value': 'Greenhouse Gas Emissions', 'rank': 2},
     {'name': 'Subject', 'value': 'Museums', 'rank': 3},
     {'name': 'Subject', 'value': 'Environment', 'rank': 4},
     {'name': 'Subject', 'value': 'Presidential Election of 2024', 'rank': 5},
     {'name': 'Organization',
      'value': 'Environmental Protection Agency',
      'rank': 6},
     {'name
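
Writing the `fq` string by hand is error-prone because of the nested quotes. A small sketch of a builder follows; `build_fq` is a hypothetical helper, not part of any library, and it only covers the `field:("a", "b") AND ...` pattern used in the query above.

```python
def build_fq(filters: dict[str, list[str]]) -> str:
    """Join field filters into a single fq string, AND-ing the fields together."""
    clauses = []
    for field, values in filters.items():
        quoted = ", ".join(f'"{v}"' for v in values)
        clauses.append(f"{field}:({quoted})")
    return " AND ".join(clauses)


fq = build_fq(
    {
        "desk": ["Climate", "Foreign"],
        "section.name": ["Climate", "Science"],
        "type": ["Article"],
    }
)
print(fq)
```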

### creating the class object

In [19]:
import os
import requests

class NYTnews:
    """
    This class consumes news, articles, and other media from the New York Times API
    Future improvements:
        - hide the api key attribute
    """
    def __init__(
        self,
        api_key : str,
        query : str,
        sort : str = 'newest',
        begin_date : str = '20200101',
        end_date : str = '20250401'
    ):
        # query parameters
        self.api_key, self.query, self.sort, self.begin_date, self.end_date = api_key, query, sort, begin_date, end_date
        # url endpoint
        self.endpoint = 'https://api.nytimes.com/svc/search/v2/articlesearch.json'
        # headers
        self.headers = {
            "Accept" : "application/json"
        }
        # request parameters sent to the endpoint
        self.parameters = {
            "api-key" : self.api_key,
            "q" : self.query, # what are the articles about?
            "sort" : self.sort, # available options: best (default), newest, oldest, relevance
            "begin_date" : self.begin_date, # format (YYYYMMDD)
            "end_date" : self.end_date, # format (YYYYMMDD)
            "fq" : 'type:("Article")' # special parameter that allows granular filters
            # types of fields can be found in https://developer.nytimes.com/docs/articlesearch-product/1/overview
        }
        # placeholders for future attributes
        self.news_list = None
        self.news_urls = None
        self.news_authors = None
        self.summary_and_wordcount = None
    # class methods
    def consume_endpoint(self):
        """
        Generates the list of news according to the query parameters
        """
        try:
            response = requests.get(
                url = self.endpoint,
                params = self.parameters,
                headers = self.headers
            )
            if response.status_code == 400:
                raise Exception('Invalid query parameters')
            if response.status_code == 401:
                raise Exception('Invalid API Key!')
            if response.status_code == 429:
                raise Exception('Daily limit reached')
            self.news_list = response.json()['response']['docs']
        except Exception as e:
            print(e)
    def get_total_news(self):
        """
        total number of news from the query
        """
        if self.news_list is not None:
            return len(self.news_list)
        else:
            print("Endpoint needs to be consumed first")
    def get_news_urls(self):
        """
        creates the attribute where all the url links can be listed
        """
        try:
            if self.news_list is None:
                raise Exception("Endpoint needs to be consumed first")
            if len(self.news_list) == 0:
                raise Exception("There are no news for this query")
            self.news_urls = [d['web_url'] for d in self.news_list]
            return self.news_urls
        except Exception as e:
            print(e)
    def get_news_authors(self):
        """
        creates the dictionary of headline and authors
        """
        try:
            if self.news_list is None:
                raise Exception("Endpoint needs to be consumed first")
            if len(self.news_list) == 0:
                raise Exception("There are no news for this query")
            self.news_authors = {
                d['headline']['print_headline'] : d['byline']['original'] for d in self.news_list
            }
            return self.news_authors
        except Exception as e:
            print(e)
    def get_snippet_word_count(self):
        """
        return a dictionary of the news snippet and the wordcount
        """
        try:
            if self.news_list is None:
                raise Exception("Endpoint needs to be consumed first")
            if len(self.news_list) == 0:
                raise Exception("There are no news for this query")
            self.summary_and_wordcount = {
                d['snippet'] : d['word_count'] for d in self.news_list
            }
            return self.summary_and_wordcount
        except Exception as e:
            print(e)

            
news = NYTnews(
    api_key = os.getenv('NYT_API_KEY'),
    query = 'Technology'
)
news.consume_endpoint()
for key, value in news.get_news_authors().items():
    print(f"headline: '{key}' : author: '{value}'")


headline: '' : author: 'By Benjamin Mueller'
headline: 'OpenAI Completes Deal That Values Company at $300 Billion' : author: 'By Cade Metz'
headline: 'Building a Farmhouse With a Pole-Barn Vibe' : author: 'By Tim McKeough'
headline: 'Experts See Science Cuts As a Big Risk' : author: 'By Ben Casselman'
headline: 'Pitch on Tariffs  Is That People  Can Take Pain' : author: 'By Alan Rappeport'
headline: 'Google’s A.I. Drug Design Lab, Isomorphic, Raises $600 Million' : author: 'By Michael J. de la Merced'
headline: 'A Chinese Truck Maker Wants the Green Light' : author: 'By Daisuke Wakabayashi'
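
Only 7 pairs are printed here and one headline is empty. One likely reason is that `print_headline` can be an empty string and dictionary keys are unique, so articles that share the same (empty) key overwrite each other. A possible workaround, sketched below with made-up records, is to fall back to the `main` headline and return a list of pairs instead of a dictionary. `headline_author_pairs` is a hypothetical helper name.

```python
def headline_author_pairs(docs: list[dict]) -> list[tuple[str, str]]:
    """Return one (headline, author) pair per article, keeping duplicates."""
    pairs = []
    for d in docs:
        headline = d.get("headline") or {}
        # prefer the print headline, fall back to the main one when it is empty
        title = headline.get("print_headline") or headline.get("main") or ""
        author = (d.get("byline") or {}).get("original") or ""
        pairs.append((title, author))
    return pairs


# made-up records that only mimic the shape of the API response
docs = [
    {"headline": {"main": "Main A", "print_headline": ""}, "byline": {"original": "By A"}},
    {"headline": {"main": "Main B", "print_headline": ""}, "byline": {"original": "By B"}},
    {"headline": {"main": "Main C", "print_headline": "Print C"}, "byline": {"original": "By C"}},
]

pairs = headline_author_pairs(docs)
for title, author in pairs:
    print(f"headline: '{title}' : author: '{author}'")
```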


next steps:
1. create the object
   1. attributes: the parameters of the query
   2. methods: get the information from the articles
2. implement error handling with the responses
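
For the error-handling step, one option is to raise a dedicated exception instead of printing the message, so the caller can decide what to do. This is only a sketch: `NYTAPIError` and `check_status` are hypothetical names, and the messages are the ones already used in the class above.

```python
class NYTAPIError(Exception):
    """Raised when the API answers with a non-200 status code."""

    def __init__(self, status_code: int, message: str):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


# status codes already handled in the class above
KNOWN_ERRORS = {
    400: "Invalid query parameters",
    401: "Invalid API Key!",
    429: "Daily limit reached",
}


def check_status(status_code: int) -> None:
    """Do nothing on 200, raise NYTAPIError for any other status code."""
    if status_code == 200:
        return
    message = KNOWN_ERRORS.get(status_code, "Unexpected response")
    raise NYTAPIError(status_code, message)


try:
    check_status(429)
except NYTAPIError as err:
    caught = err
    print(err)
```

Inside a client method this would be called as `check_status(response.status_code)` before reading `response.json()`.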

### trying out the custom classes

In [5]:
import entities.news_article 
import aggregator.api_client 
import entities.user_input
from importlib import reload
reload(entities.news_article)
reload(aggregator.api_client)
reload(entities.user_input)

<module 'entities.user_input' from '/Users/santiagocardenas/Documents/MDSI/202501/python programming/InfoAggregatorFinalProject/entities/user_input.py'>

In [8]:
from entities.news_article import NYTArticle
from aggregator.api_client import NYTNewsArticles
from entities.user_input import UserInput
from os import getenv

user_input = UserInput(
    category = 'Culture',
    source = 'The New York Times'
)

test = NYTNewsArticles(
    api_key = getenv("NYT_API_KEY"),
    base_url = 'https://api.nytimes.com/svc/search/v2/articlesearch.json'
)

articles = test.fetch_articles(
    user_input = user_input
)

for a in articles:
    print(f"title '{a.title}' : author: '{a.author}'")

title '' : author: 'By Adam Nossiter'
title '' : author: 'By Ivan Nechepurenko'
title '' : author: 'By David French'
title 'Across the Great Streamer Divide' : author: 'By Jack Crosbie'
title '' : author: 'By David Allen and Chet Strange'
title '' : author: 'By Jason Horowitz'
title 'Corrections' : author: ''
title '' : author: 'By Jason Horowitz'
title 'Trump’s  White House Is Black-Pilled' : author: 'By Ross Douthat'
title '' : author: 'By John Jeremiah Sullivan'


In [9]:
import requests
for a in articles:
    print(f"status code '{requests.get(a.url).status_code}' ")

status code '403' 
status code '403' 
status code '403' 
status code '403' 
status code '403' 
status code '403' 
status code '403' 
status code '403' 
status code '403' 
status code '403' 


In [5]:
import requests
import os
endpoint = "https://serpapi.com/search"
parameters = {
    "api_key" : os.getenv('GOOGLE_SEARCH_API_KEY'),
    "q" : articles[0].title,
    "location" : "Austin, Texas, United States"
}
response = requests.get(
    url = endpoint,
    params = parameters
)
data = response.json()
data

{'search_metadata': {'id': '680b5e9643957cd638faa5a0',
  'status': 'Success',
  'json_endpoint': 'https://serpapi.com/searches/8f21402941149367/680b5e9643957cd638faa5a0.json',
  'created_at': '2025-04-25 10:06:14 UTC',
  'processed_at': '2025-04-25 10:06:14 UTC',
  'google_url': 'https://www.google.com/search?q=The+Technology+That+Could+Fuel++Trump%E2%80%99s+Immigration+Offensive&oq=The+Technology+That+Could+Fuel++Trump%E2%80%99s+Immigration+Offensive&uule=w+CAIQICIaQXVzdGluLFRleGFzLFVuaXRlZCBTdGF0ZXM&sourceid=chrome&ie=UTF-8',
  'raw_html_file': 'https://serpapi.com/searches/8f21402941149367/680b5e9643957cd638faa5a0.html',
  'total_time_taken': 1.45},
 'search_parameters': {'engine': 'google',
  'q': 'The Technology That Could Fuel  Trump’s Immigration Offensive',
  'location_requested': 'Austin, Texas, United States',
  'location_used': 'Austin,Texas,United States',
  'google_domain': 'google.com',
  'device': 'desktop'},
 'search_information': {'query_displayed': 'The Technology Tha

In [7]:
google_results = data['organic_results']
filtered_results = list(filter(lambda r: requests.get(r.get('link')).status_code == 200, google_results))
filtered_results

[{'position': 3,
  'title': "How Technology Could Aid Trump's Immigration Crackdown",
  'link': 'https://borderlessmag.org/2025/03/04/donald-trump-ai-surveillance-deportation-immigration-technology/',
  'redirect_link': 'https://www.google.com/url?sa=t&source=web&rct=j&opi=89978449&url=https://borderlessmag.org/2025/03/04/donald-trump-ai-surveillance-deportation-immigration-technology/&ved=2ahUKEwjllNfg9_KMAxVuMtAFHSj0JHEQFnoECBoQAQ',
  'displayed_link': 'https://borderlessmag.org › 2025/03/04 › donald-trump...',
  'favicon': 'https://serpapi.com/searches/680b5e9643957cd638faa5a0/images/c9724441b497778407bd9453ae222d2c2db59d0605f42668079131e09f4d89d6.png',
  'date': 'Mar 4, 2025',
  'snippet': 'From ankle monitors to biometric data collection, the Trump administration could use AI technology in its immigration enforcement actions.',
  'snippet_highlighted_words': ['Trump', 'could', 'technology', 'immigration'],
  'missing': ['Fuel', 'Offensive'],
  'source': 'Borderless Magazine NFP'},
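
The filter above sends one request per result and stops with an exception if a result has no `link` or a site does not respond. A more defensive sketch follows. `reachable_results` is a hypothetical helper, and the status lookup is passed in as a function so that the example runs without network access; the stand-in `fake_status` and its URLs are made up.

```python
from typing import Callable


def reachable_results(results: list[dict], get_status: Callable[[str], int]) -> list[dict]:
    """Keep the results whose link answers with status code 200."""
    kept = []
    for result in results:
        link = result.get("link")
        if not link:
            continue  # nothing to check
        try:
            if get_status(link) == 200:
                kept.append(result)
        except Exception:
            # broad on purpose in this sketch: any failure means "not reachable"
            continue
    return kept


def fake_status(url: str) -> int:
    """Stand-in for a real HTTP call, with made-up URLs."""
    if "down" in url:
        raise ConnectionError("no answer")
    return 403 if "blocked" in url else 200


results = [
    {"position": 1, "link": "https://example.com/blocked"},
    {"position": 2},
    {"position": 3, "link": "https://example.com/ok"},
    {"position": 4, "link": "https://example.com/down"},
]
open_results = reachable_results(results, fake_status)
print(open_results)
```

With `requests`, the lookup could be `lambda url: requests.get(url, timeout=10).status_code`, so a slow site cannot block the loop indefinitely.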

In [10]:
from bs4 import BeautifulSoup
import requests
content_url = filtered_results[0].get('link')
response = requests.get(content_url)
soup = BeautifulSoup(
    markup = response.content,
    features = 'html.parser'
)
print(soup.prettify())

<!DOCTYPE html>
<html class="no-js" lang="en-US">
 <head>
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1, maximum-scale=5" name="viewport">
   <meta content="index, follow, max-image-preview:large, max-snippet:-1, max-video-preview:-1" name="robots"/>
   <style>
   </style>
   <!-- This site is optimized with the Yoast SEO Premium plugin v24.6 (Yoast SEO v24.6) - https://yoast.com/wordpress/plugins/seo/ -->
   <title>
    How Technology Could Aid Trump's Immigration Crackdown – Borderless Magazine NFP
   </title>
   <link as="font" crossorigin="" data-rocket-preload="" href="https://borderlessmag.org/wordpress/wp-content/plugins/powerkit/assets/fonts/powerkit-icons.woff" rel="preload"/>
   <link as="font" crossorigin="" data-rocket-preload="" href="https://borderlessmag.org/wordpress/wp-content/plugins/wp-accessibility/toolbar/fonts/css/a11y.woff2" rel="preload"/>
   <link as="font" crossorigin="" data-rocket-preload="" href="https://borderlessmag.org/wo

In [11]:
content_url

'https://borderlessmag.org/2025/03/04/donald-trump-ai-surveillance-deportation-immigration-technology/'

In [23]:
paragraph_text = set([p.text.strip() for p in soup.find_all(name = 'p')])
print('\n'.join(paragraph_text))


Many of the tools that the Trump administration is using for surveillance have long been in place, and we’ve seen both Democrats and Republicans carry out policies that would allow for deeper surveillance and data sharing with government agencies.
The federal budget proposed for FY2025 includes more than $3 billion toward AI tech investment and application across agencies, a $1.2 billion increase from 2023 funding.
“Now more than ever, we’re seeing that this is not just an abstract kind of exercise, but rather technology replicates the power differentials in society, and it’s always the people who are on the margins and who are the most vulnerable that are affected by technology the most,” said Molnar.
Share
Our work is made possible thanks to donations from people like you. Support high-quality reporting by making a tax-deductible donation today.
Want to receive stories like this in your inbox every week?
Yes, there are laws in some jurisdictions, like Illinois, around things like th
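
Because `set` does not keep insertion order, the paragraphs above are printed in an arbitrary order. If the reading order matters, duplicates can be dropped with `dict.fromkeys`, which preserves the first occurrence of each item. A small sketch with made-up strings; `unique_in_order` is a hypothetical helper name.

```python
def unique_in_order(texts) -> list[str]:
    """Strip each text, drop empty ones, and remove duplicates keeping first-seen order."""
    cleaned = (t.strip() for t in texts)
    return list(dict.fromkeys(t for t in cleaned if t))


paragraphs = ["First paragraph", "  Share  ", "First paragraph", "", "Second paragraph"]
ordered = unique_in_order(paragraphs)
print("\n".join(ordered))
```

With the parsed page this would be called as `unique_in_order(p.text for p in soup.find_all(name = 'p'))`.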

In [None]:
random_art = articles[2]
search_fields = ['title', 'main', 'kicker', 'summary']
next((getattr(random_art, field) for field in search_fields if getattr(random_art, field) is not None), None)

'Next Big Leap for A.I. Tech? Instant Videos on Command.'
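
One caveat with the `is not None` check: an empty string passes it, and some titles printed earlier were empty. A sketch of a variant that skips both `None` and empty values is shown below. `first_non_empty` is a hypothetical helper, and the `SimpleNamespace` object only mimics an article with the attributes named in `search_fields`.

```python
from types import SimpleNamespace


def first_non_empty(obj, fields, default=None):
    """Return the first attribute in `fields` that is neither None nor empty."""
    for field in fields:
        value = getattr(obj, field, None)
        if value:
            return value
    return default


# made-up article object, only for illustration
article = SimpleNamespace(title="", main="Main headline", kicker=None, summary="Short summary")
search_fields = ["title", "main", "kicker", "summary"]

best_title = first_non_empty(article, search_fields)
print(best_title)
```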

In [39]:
sample_string = '   my name is santiago    '
print(sample_string.title().strip())

My Name Is Santiago
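
A caveat when applying `title()` to headlines: it starts a new word after every non-letter character, so the letter following an apostrophe or a hyphen is capitalized as well. `string.capwords` from the standard library splits on whitespace only, and it also trims the ends and collapses repeated spaces. The headline below is adapted from one of the titles printed earlier.

```python
import string

headline = "  trump’s  white house is black-pilled  "

with_title = headline.title().strip()
with_capwords = string.capwords(headline)

print(with_title)
print(with_capwords)
```

Which behaviour is preferable depends on the display rules chosen for the project; neither call restores the original capitalization of acronyms such as "E.P.A.".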
