## Challenge 1: Professional Data Storage

CSV files are a poor fit for production-grade data pipelines: they carry no schema or
type information, incremental updates mean rewriting or blindly appending to a file,
and they cannot be queried without first loading them into memory.

This notebook demonstrates storing financial news data using **SQLite**, enabling
structured storage, incremental inserts, and SQL-based access.

In [1]:
import pandas as pd
import sqlite3
from datetime import datetime

# Sample news data (simulating NewsAPI output)
news_df = pd.DataFrame({
    "headline": [
        "Apple reports strong quarterly earnings",
        "Markets fall amid global uncertainty",
        "Tesla announces new battery technology"
    ],
    "source": ["Reuters", "Bloomberg", "CNBC"],
    "published_at": [
        datetime(2025, 1, 1, 9, 30),
        datetime(2025, 1, 1, 14, 15),
        datetime(2025, 1, 2, 10, 45)
    ]
})

# Create SQLite database and store data
conn = sqlite3.connect("news_data.db")
news_df.to_sql("news_articles", conn, if_exists="replace", index=False)

# Append new data (incremental ingestion)
new_entry = pd.DataFrame({
    "headline": ["Federal Reserve hints at rate cuts"],
    "source": ["Wall Street Journal"],
    "published_at": [datetime(2025, 1, 2, 16, 10)]
})
new_entry.to_sql("news_articles", conn, if_exists="append", index=False)


1
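
To see what schema was actually written, the table definition can be inspected with SQLite's `PRAGMA table_info`. The following is a minimal sketch, assuming an in-memory database and a one-row sample so that it runs on its own; the `column_types` name is illustrative and not part of the notebook.

```python
import sqlite3
from datetime import datetime

import pandas as pd

# Assumption: an in-memory database stands in for news_data.db
conn = sqlite3.connect(":memory:")

pd.DataFrame({
    "headline": ["Apple reports strong quarterly earnings"],
    "source": ["Reuters"],
    "published_at": [datetime(2025, 1, 1, 9, 30)],
}).to_sql("news_articles", conn, index=False)

# Each row of PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk)
schema = conn.execute("PRAGMA table_info(news_articles)").fetchall()
column_types = {name: declared_type for _, name, declared_type, *_ in schema}
conn.close()

print(column_types)
```

The declared types are chosen by pandas from the DataFrame dtypes, so they may differ between pandas versions. For a long-lived pipeline it is usually safer to create the table explicitly and only append to it.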

In [2]:
# Read data back from SQLite to verify persistence
stored_df = pd.read_sql("SELECT * FROM news_articles", conn)
conn.close()

stored_df


| | headline | source | published_at |
|---|---|---|---|
| 0 | Apple reports strong quarterly earnings | Reuters | 2025-01-01 09:30:00 |
| 1 | Markets fall amid global uncertainty | Bloomberg | 2025-01-01 14:15:00 |
| 2 | Tesla announces new battery technology | CNBC | 2025-01-02 10:45:00 |
| 3 | Federal Reserve hints at rate cuts | Wall Street Journal | 2025-01-02 16:10:00 |
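
Reading the whole table back is fine for verification, but the main benefit of SQL access is filtering inside the database. The sketch below is one possible way to do that, assuming an in-memory database with two of the sample rows; the `query` and `recent_df` names are illustrative. It uses a parameterized query and `parse_dates`, since SQLite has no native datetime type and returns timestamps as text.

```python
import sqlite3
from datetime import datetime

import pandas as pd

# Assumption: an in-memory database stands in for news_data.db
conn = sqlite3.connect(":memory:")

articles = pd.DataFrame({
    "headline": [
        "Apple reports strong quarterly earnings",
        "Tesla announces new battery technology",
    ],
    "source": ["Reuters", "CNBC"],
    "published_at": [datetime(2025, 1, 1, 9, 30), datetime(2025, 1, 2, 10, 45)],
})
articles.to_sql("news_articles", conn, if_exists="replace", index=False)

# Timestamps are stored as ISO-formatted text, so string comparison
# against an ISO date prefix orders them chronologically.
query = """
    SELECT headline, source, published_at
    FROM news_articles
    WHERE published_at >= ?
    ORDER BY published_at
"""
recent_df = pd.read_sql(
    query,
    conn,
    params=("2025-01-02",),        # bound parameter, not string formatting
    parse_dates=["published_at"],  # convert the text column back to datetime64
)
conn.close()

print(recent_df)
```

Binding the cutoff date as a parameter avoids building SQL by string concatenation, which matters once the filter value comes from user input or a config file.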


### Key Learnings

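- SQLite stores a declared schema alongside the data, which CSV cannot. SQLite has no native datetime type, though, so timestamps are stored as text and should be parsed on read (for example with `parse_dates` in `pd.read_sql`)
- Incremental ingestion is a single `to_sql(..., if_exists="append")` call. Appending does not deduplicate, so re-running an ingest inserts the same rows again unless the table enforces uniqueness
- SQL queries enable flexible data access: filtering, sorting, and aggregation happen in the database rather than after loading everything into memory
- This approach scales better than rewriting CSV files as the archive of financial news grows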
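
One way to make repeated ingests idempotent is a `UNIQUE` constraint combined with `INSERT OR IGNORE`. The following is a minimal sketch using only the standard-library `sqlite3` module; the `ingest` helper, the choice of uniqueness key, and the in-memory database are assumptions made for illustration, not part of the notebook above.

```python
import sqlite3

# Assumption: an in-memory database stands in for news_data.db
conn = sqlite3.connect(":memory:")

# Assumption: an article is identified by headline + source + timestamp
conn.execute("""
    CREATE TABLE IF NOT EXISTS news_articles (
        headline     TEXT NOT NULL,
        source       TEXT NOT NULL,
        published_at TEXT NOT NULL,
        UNIQUE (headline, source, published_at)
    )
""")


def count_rows(conn):
    """Return the current number of stored articles."""
    return conn.execute("SELECT COUNT(*) FROM news_articles").fetchone()[0]


def ingest(conn, rows):
    """Insert rows, silently skipping duplicates; return how many were new."""
    before = count_rows(conn)
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT OR IGNORE INTO news_articles VALUES (?, ?, ?)", rows
        )
    return count_rows(conn) - before


batch = [(
    "Federal Reserve hints at rate cuts",
    "Wall Street Journal",
    "2025-01-02 16:10:00",
)]

first_run = ingest(conn, batch)   # the row is new
second_run = ingest(conn, batch)  # the same row is ignored
total_rows = count_rows(conn)

print(first_run, second_run, total_rows)
```

Enforcing uniqueness in the table rather than in Python means the guarantee holds no matter which script writes to the database.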
