# NewsAPI Data Ingestion

This notebook demonstrates how live financial news data
is ingested into the Market Mood & Moves pipeline using NewsAPI.

The goal is to illustrate:
- secure API key handling
- controlled data acquisition
- conversion from raw API responses to structured data

Only a small sample of articles is fetched to stay within
NewsAPI rate limits and keep runs quick. The results are live,
so the exact articles differ between runs.


In [4]:
import os
from dotenv import load_dotenv

load_dotenv()

NEWSAPI_KEY = os.getenv("NEWSAPI_KEY")

# Treat both a missing and an empty value as "not configured".
if not NEWSAPI_KEY:
    raise ValueError("NEWSAPI_KEY not found. Please set it in your .env file.")

print("NewsAPI key loaded successfully.")


NewsAPI key loaded successfully.
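
The key check above can be generalized into a small helper. The sketch below is one hypothetical way to do it: `require_env` is an illustrative name, not part of `python-dotenv`, and it accepts the environment as a mapping so it can be exercised without touching real secrets. It rejects blank values and never echoes the secret itself.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the stripped value of `name`, or raise if it is missing or blank.

    Hypothetical helper for illustration only.
    """
    env = os.environ if environ is None else environ
    value = (env.get(name) or "").strip()
    if not value:
        # The message names the variable but never includes its value.
        raise ValueError(f"{name} not found. Please set it in your .env file.")
    return value


# Usage with a stand-in mapping instead of the real environment.
demo_key = require_env("NEWSAPI_KEY", {"NEWSAPI_KEY": "  placeholder-key  "})
```

In the notebook itself, the call would be `require_env("NEWSAPI_KEY")` after `load_dotenv()`.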


In [5]:
from newsapi import NewsApiClient

newsapi = NewsApiClient(api_key=NEWSAPI_KEY)


In [6]:
response = newsapi.get_everything(
    q="Apple",  # broad keyword match: also returns non-financial articles that mention "apple"
    language="en",
    sort_by="publishedAt",
    page_size=10  # intentionally small
)

print(f"Articles fetched: {len(response['articles'])}")


Articles fetched: 10
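
Before the articles are converted, it can help to check the payload shape. The sketch below assumes the response is a dict with `status` and `articles` keys, as in the call above. `extract_articles` and `max_articles` are hypothetical names introduced here, and the sample payload is a placeholder, not real API output.

```python
def extract_articles(response: dict, max_articles: int = 10) -> list:
    """Return at most `max_articles` articles from a NewsAPI-style payload.

    Hypothetical helper: raises if the payload does not report status "ok".
    """
    status = response.get("status")
    if status != "ok":
        raise RuntimeError(f"Unexpected NewsAPI status: {status!r}")
    # A missing or null "articles" field is treated as an empty result.
    articles = response.get("articles") or []
    return articles[:max_articles]


# Placeholder payload with the same top-level keys as a real response.
sample_response = {
    "status": "ok",
    "totalResults": 3,
    "articles": [{"title": "A"}, {"title": "B"}, {"title": "C"}],
}
capped = extract_articles(sample_response, max_articles=2)
```

The cap is applied client-side here as a second guard. `page_size` in the request remains the primary way to limit how much is fetched.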


In [7]:
import pandas as pd

news_records = []

for article in response["articles"]:
    news_records.append({
        "source": article["source"]["name"],
        "headline": article["title"],
        "description": article["description"],
        "published_at": article["publishedAt"],
        "url": article["url"]
    })

df_news_api = pd.DataFrame(news_records)
df_news_api


| | source | headline | description | published_at | url |
|---|---|---|---|---|---|
| 0 | Gizmodo.com | 2026 Is Poised to Be the Year of the Tech IPO.... | SpaceX, OpenAI, Anthropic are all allegedly pl... | 2026-01-02T15:35:00Z | https://gizmodo.com/2026-is-poised-to-be-the-y... |
| 1 | 9to5Mac | M5 Vision Pro launch likely made minimal sales... | Apple launched a brand new M5 Vision Pro updat... | 2026-01-02T15:33:31Z | https://9to5mac.com/2026/01/02/m5-vision-pro-l... |
| 2 | Notebookcheck.net | Apple MacBook Air reportedly stops artillery s... | A Ukrainian soldier has posted photos and vide... | 2026-01-02T15:30:00Z | https://www.notebookcheck.net/Apple-MacBook-Ai... |
| 3 | Variety | French Box Office Drops by 13% to Roughly $1.1... | After a bullish year bolstered by local blockb... | 2026-01-02T15:28:15Z | https://variety.com/2026/film/box-office/frenc... |
| 4 | Dansdeals.com | Lowest Price Ever! 4 Pack Of Method Apple Orch... | 4 Pack Of Method Apple Orchard Daily Granite C... | 2026-01-02T15:25:33Z | https://www.dansdeals.com/shopping-deals/amazo... |
| 5 | redmondpie.com | Save $70 Off This 220W Anker Power Bank Deal A... | We all rely on our technology more than ever, ... | 2026-01-02T15:22:47Z | https://www.redmondpie.com/save-70-off-this-22... |
| 6 | Github.com | Show HN: A standalone server for probabilistic... | Article URL: https://github.com/benitolopez/pd... | 2026-01-02T15:22:17Z | https://github.com/benitolopez/pds |
| 7 | Observer | Care and Craft: Inside the Making of Seoul’s C... | Inside Seoul’s cocktail culture, where hospita... | 2026-01-02T15:21:39Z | https://observer.com/2026/01/inside-seoul-cock... |
| 8 | Bringatrailer.com | 25k-Mile 2010 Mercedes-Benz G55 AMG | This 2010 Mercedes-Benz G55 AMG was purchased ... | 2026-01-02T15:20:11Z | https://bringatrailer.com/listing/2010-mercede... |
| 9 | CNET | Pebble's Bringing Its Round Watch Back, This T... | The Pebble Round is back for a decade-later se... | 2026-01-02T15:12:00Z | https://www.cnet.com/tech/mobile/pebbles-bring... |
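
The conversion loop indexes every field directly, which raises `KeyError` or `TypeError` if an article lacks a field or carries a null `source`. A more defensive variant might look like the sketch below. `flatten_article` is a hypothetical name, and the sample articles are placeholders that assume the same keys as the loop above.

```python
def flatten_article(article: dict) -> dict:
    """Map one NewsAPI-style article to the flat record used in this notebook.

    Hypothetical helper: missing or null fields become None instead of raising.
    """
    source = article.get("source") or {}
    return {
        "source": source.get("name"),
        "headline": article.get("title"),
        "description": article.get("description"),
        "published_at": article.get("publishedAt"),
        "url": article.get("url"),
    }


# Placeholder articles: the second one has a null source and no description.
sample_articles = [
    {
        "source": {"id": None, "name": "Example Source"},
        "title": "Example headline",
        "description": "Example description",
        "publishedAt": "2026-01-02T15:35:00Z",
        "url": "https://example.com/a",
    },
    {
        "source": None,
        "title": "Headline only",
        "publishedAt": "2026-01-02T15:12:00Z",
        "url": "https://example.com/b",
    },
]
sample_records = [flatten_article(a) for a in sample_articles]
```

The resulting list of dicts can be passed to `pd.DataFrame(...)` exactly as `news_records` is above.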


In [8]:
df_news_api.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   source        10 non-null     object
 1   headline      10 non-null     object
 2   description   10 non-null     object
 3   published_at  10 non-null     object
 4   url           10 non-null     object
dtypes: object(5)
memory usage: 528.0+ bytes
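
`info()` reports `published_at` as `object`, meaning the timestamps are still plain strings. If a downstream step needs real datetimes, a conversion along these lines is one option. It is a sketch on a stand-in DataFrame (`df_sample` is a hypothetical name), not something this notebook performs.

```python
import pandas as pd

# Stand-in frame reusing two of the timestamp strings shown in the output above.
df_sample = pd.DataFrame(
    {"published_at": ["2026-01-02T15:35:00Z", "2026-01-02T15:12:00Z"]}
)

# utc=True yields timezone-aware timestamps;
# errors="coerce" turns unparseable strings into NaT instead of raising.
df_sample["published_at"] = pd.to_datetime(
    df_sample["published_at"], utc=True, errors="coerce"
)
```

Keeping this conversion out of the ingestion step leaves the stored data identical to what the API returned.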


## Design Notes

- This notebook focuses only on data acquisition.
- No filtering, sentiment analysis, or alignment is performed here.
- The output of this notebook is raw news data,
  which can be stored or processed downstream.

Separating ingestion from processing keeps the pipeline modular:
downstream steps can be rerun and debugged against the stored raw data
without calling the API again.


In [None]:
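import sqlite3
from contextlib import closing

# closing() releases the connection even if to_sql raises.
with closing(sqlite3.connect("news_data.db")) as conn:
    df_news_api.to_sql(
        "raw_news_api",
        conn,
        if_exists="replace",  # overwrites any previously stored sample
        index=False
    )

print("Raw NewsAPI data stored successfully.")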
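
A downstream step can confirm the write by reading a few rows back. The sketch below assumes the `raw_news_api` table has the same five columns as `df_news_api`. `preview_raw_news` is a hypothetical helper, and the demo runs against an in-memory database with placeholder rows rather than `news_data.db`.

```python
import sqlite3


def preview_raw_news(conn: sqlite3.Connection, limit: int = 3) -> list:
    """Return the newest `limit` rows as (source, headline, published_at) tuples."""
    cursor = conn.execute(
        "SELECT source, headline, published_at FROM raw_news_api "
        "ORDER BY published_at DESC LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()


# Demo: in-memory table mirroring the columns written by to_sql above.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE raw_news_api "
    "(source TEXT, headline TEXT, description TEXT, published_at TEXT, url TEXT)"
)
demo_conn.executemany(
    "INSERT INTO raw_news_api VALUES (?, ?, ?, ?, ?)",
    [
        ("Example Source", "Older headline", "placeholder",
         "2026-01-01T10:00:00Z", "https://example.com/a"),
        ("Example Source", "Newer headline", "placeholder",
         "2026-01-02T10:00:00Z", "https://example.com/b"),
    ],
)
rows = preview_raw_news(demo_conn, limit=1)
demo_conn.close()
```

Sorting on the text column works here because the timestamps share one fixed-width ISO 8601 UTC format, so lexicographic order matches chronological order.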
