# 📰 NY Daily News

## 📌 Instructions

1. Enter your **search term** by changing the `'s'` parameter in `params` (e.g., `"economy"`, `"sports"`, `"technology"`, `"elections"`).  
2. Optionally, set a **date range** using `'sp[f]'` (from) and `'sp[t]'` (to) in `params`, with dates in `YYYY-MM-DD` format.  
   - Example: `'sp[f]': '2020-03-01', 'sp[t]': '2020-03-30'`.  
3. Define the **page range** to scrape in the `for` loop (e.g., `range(1, 5)` scrapes pages 1 through 4).  
4. The script retrieves:  
   - Title  
   - Date  
   - Link  
   - Full article content
5. The results are stored in a **pandas DataFrame** and can be exported to CSV:

```python
nydn_df.to_csv("data_nydn_df.csv", index=False)
```
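
To make steps 1 to 3 concrete, the following is a minimal sketch of how the search term, date range, and page number end up in a single URL. The helper name `build_search_url` is hypothetical, the empty parameters from `params` are left out for brevity, and the URL layout (site root for page 1, `/page/N/` afterwards) is assumed from the script in this notebook rather than from any documented API.

```python
from urllib.parse import urlencode

BASE = "https://www.nydailynews.com/"

def build_search_url(term, date_from, date_to, page=1):
    """Return the search URL for one results page (hypothetical helper)."""
    params = {
        "s": term,            # search term
        "orderby": "relevance",
        "order": "desc",
        "sp[f]": date_from,   # start of the date range, YYYY-MM-DD
        "sp[t]": date_to,     # end of the date range, YYYY-MM-DD
    }
    # Page 1 uses the site root; later pages use /page/N/ (assumed layout)
    path = BASE if page == 1 else f"{BASE}page/{page}/"
    # urlencode percent-encodes the square brackets in 'sp[f]' and 'sp[t]'
    return f"{path}?{urlencode(params)}"

print(build_search_url("economy", "2020-03-01", "2020-03-30", page=2))
```

Printing the URL before scraping is a quick way to paste it into a browser and check that the term and date range return the expected results.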

In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import time

base_url = "https://www.nydailynews.com/page/{page}/"

params = {
    's': 'economy', # change to "sports", "technology", etc.
    'post_type': '',
    'category_name': '',
    'orderby': 'relevance',
    'order': 'desc',
    'sp[f]': '2020-03-01',
    'sp[t]': '2020-03-30',
    'obit__spotlight': '',
    'obit__site_name': ''
}

titles, dates, links = [], [], []

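# Widen the range to scrape more result pages, e.g. range(1, 5) covers pages 1 to 4
for page in range(1, 2):
    if page == 1:
        # Page 1 is served from the site root; later pages use /page/{page}/
        url = "https://www.nydailynews.com/"
    else:
        url = base_url.format(page=page)

    # The search parameters travel in the query string; the timeout avoids hanging on a stalled request
    response = requests.get(url, params=params, timeout=30)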
    soup = BeautifulSoup(response.content, 'html.parser')

    articles = soup.find_all('article')

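    for article in articles:
        # A multi-class string only matches when the class attribute is exactly this value
        title_tag = article.find('span', class_='dfm-title metered')
        date_tag = article.find('time', datetime=True)
        link_tag = article.find('a', class_='article-title')

        # Missing tags yield None so the three lists stay the same length
        title = title_tag.text.strip() if title_tag else None
        # Keep only the date part of the datetime attribute
        date = date_tag['datetime'].split()[0] if date_tag else None
        link = link_tag['href'] if link_tag and 'href' in link_tag.attrs else None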

        titles.append(title)
        dates.append(date)
        links.append(link)

nydn_df = pd.DataFrame({
    'Title': titles,
    'Date': dates,
    'Link': links
})

def extract_article_content(url):
    """Return the article body text for `url`, or None if it cannot be retrieved."""
    try:
        response = requests.get(url, timeout=30)
        if response.status_code != 200:
            return None
        soup = BeautifulSoup(response.content, 'html.parser')

        # Look for the main article container, falling back to a second known class
        content_div = soup.find('div', class_='p402_premium')
        if not content_div:
            content_div = soup.find('div', class_='article-content')
        if not content_div:
            return None

        # Join the paragraph texts into a single string
        paragraphs = content_div.find_all('p')
        article_text = ' '.join(p.get_text(strip=True) for p in paragraphs)
        return article_text
    except Exception as e:
        print(f"Error processing {url}: {e}")
        return None

nydn_df['Content'] = nydn_df['Link'].apply(lambda url: extract_article_content(url) if url else None)
nydn_df.head()

| | Title | Date | Link | Content |
|---|---|---|---|---|
| 0 | German finance minister commits suicide as cor... | 2020-03-29 | https://www.nydailynews.com/2020/03/29/german-... | A German finance minister is believed to have ... |
| 1 | Service workers, hung out to dry: Coronavirus ... | 2020-03-20 | https://www.nydailynews.com/2020/03/20/service... | Just before restaurants were forcibly shuttere... |
| 2 | How to slow the bleeding: The action the coron... | 2020-03-17 | https://www.nydailynews.com/2020/03/17/how-to-... | It may be that our leaders are starting to wak... |
| 3 | Readers sound off on staying calm during coron... | 2020-03-12 | https://www.nydailynews.com/2020/03/13/readers... | Coronavirus will not last forever Ozone Park: ... |
| 4 | Viral president: Trump’s own misstatements hel... | 2020-03-07 | https://www.nydailynews.com/2020/03/07/viral-p... | A better-than-expected jobs report couldn’t st... |
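
The `Content` column is filled by calling `extract_article_content` once per link, with no pause between requests, even though the notebook imports `time`. As a hedged sketch of one way to space those requests out, the hypothetical helper `fetch_with_delay` below takes any fetch function and sleeps between real calls. The one-second default is an assumption, not a documented limit of the site, and the demonstration uses a stand-in fetcher so it runs offline.

```python
import time

def fetch_with_delay(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL in order, sleeping between real requests.

    Missing links (None or empty strings) become None without a request.
    """
    results = []
    requested_before = False
    for url in urls:
        if not url:
            results.append(None)
            continue
        if requested_before:
            time.sleep(delay_seconds)  # pause only between actual requests
        results.append(fetch(url))
        requested_before = True
    return results

# Offline demonstration with a stand-in fetcher instead of a network call
def fake_fetch(url):
    return f"content of {url}"

demo = fetch_with_delay(
    ["https://example.com/a", None, "https://example.com/b"],
    fake_fetch,
    delay_seconds=0.1,
)
print(demo)
```

In the notebook this could replace the `apply` call, for example `nydn_df['Content'] = fetch_with_delay(nydn_df['Link'], extract_article_content)`.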


In [None]:
nydn_df.to_csv('data_nydn_df.csv', index=False)
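
As a quick check on the export, the sketch below round-trips a one-row DataFrame through CSV in memory. The sample row is illustrative only, not scraped data. It shows that `index=False` keeps the four columns unchanged on reload, that a missing `Content` value comes back as `NaN`, and that the default `index=True` adds an `Unnamed: 0` column when the file is read back without `index_col`.

```python
import io
import pandas as pd

# Illustrative row with the same columns as nydn_df (not real scraped data)
sample = pd.DataFrame({
    "Title": ["Example headline"],
    "Date": ["2020-03-29"],
    "Link": ["https://example.com/article"],
    "Content": [None],
})

# Write without the index, as the notebook does, then read it back
buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
print(list(restored.columns))

# For comparison: the default index=True writes the index as an unnamed column
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)
print(cols_with_index)
```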