What You’ll Learn Here:

The News API JSON nests some fields inside objects (source.id, source.name) and sometimes leaves others empty (author is null in the Reuters article).

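You again use .get("key", default) and .get("nested", {}), but the JSON shape differs from the weather data. One catch: a .get default only applies when the key is absent. Here author is present but null, so the code uses article.get("author") or "Unknown" rather than article.get("author", "Unknown").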

The same pattern applies: flatten → handle missing values → enrich with extraction time → save to DB.
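To see why the code uses or for author, here is a minimal check against a trimmed copy of the Reuters article from the sample response. The variable names reuters, broken, with_default, with_or, and safe_name are just for illustration.

```python
# Trimmed copy of the second (Reuters) article from the sample response
reuters = {"source": {"id": None, "name": "Reuters"}, "author": None}

# The .get default is ignored because the "author" key exists (its value is None)
with_default = reuters.get("author", "Unknown")

# "or" falls back whenever the value is falsy (None, "", etc.)
with_or = reuters.get("author") or "Unknown"

# Same issue one level down: a null "source" would make
# .get("source", {}).get("name") fail, so guard it with "or {}"
broken = {"source": None}
safe_name = (broken.get("source") or {}).get("name")

print(with_default)  # None
print(with_or)       # Unknown
print(safe_name)     # None
```

Because or falls back on any falsy value, an empty-string author would also become "Unknown", which is usually what you want for a display field like this.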

In [1]:
import pandas as pd
from datetime import datetime

# 1. Simulated News API JSON response
sample_news_api_response = {
    "status": "ok",
    "totalResults": 2,
    "articles": [
        {
            "source": {"id": "bbc-news", "name": "BBC News"},
            "author": "BBC Reporter",
            "title": "Global markets rally as tech stocks surge",
            "description": "Tech giants lead global stock markets to a strong rally.",
            "url": "https://www.bbc.com/news/markets",
            "publishedAt": "2023-09-17T12:34:56Z",
            "content": "Full article content here..."
        },
        {
            "source": {"id": None, "name": "Reuters"},
            "author": None,  # simulate a null author (key present, value None)
            "title": "Oil prices climb amid supply concerns",
            "description": "Crude oil prices rose due to tightening global supply.",
            "url": "https://www.reuters.com/business/energy",
            "publishedAt": "2023-09-17T14:00:00Z",
            "content": "Full article content here..."
        }
    ]
}

# 2. Transform: Flatten JSON into DataFrame
extracted_at = datetime.now()  # one naive local timestamp for the whole run
records = []
for article in sample_news_api_response.get("articles", []):
    # "or {}" also covers a present-but-null "source", which .get("source", {}) would not
    source = article.get("source") or {}
    record = {
        "source_id": source.get("id"),
        "source_name": source.get("name"),
        "author": article.get("author") or "Unknown",  # key is present but None, so a .get default alone is not enough
        "title": article.get("title"),
        "description": article.get("description"),
        "url": article.get("url"),
        "published_at": pd.to_datetime(article.get("publishedAt")),  # ISO 8601 string -> tz-aware UTC Timestamp
        "extracted_at": extracted_at  # when we processed it
    }
    records.append(record)

df = pd.DataFrame(records)

print("Transformed DataFrame:")
print(df)

# 3. (Optional) Load: Save to SQLite
from sqlalchemy import create_engine

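engine = create_engine("sqlite:///news_practice.db")  # creates news_practice.db in the working directory
df.to_sql("news_articles", con=engine, if_exists="replace", index=False)  # "replace" drops and recreates the table on each run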

print("Data loaded into DB successfully!")


Transformed DataFrame:
  source_id source_name        author  \
0  bbc-news    BBC News  BBC Reporter   
1      None     Reuters       Unknown   

                                       title  \
0  Global markets rally as tech stocks surge   
1      Oil prices climb amid supply concerns   

                                         description  \
0  Tech giants lead global stock markets to a str...   
1  Crude oil prices rose due to tightening global...   

                                       url              published_at  \
0         https://www.bbc.com/news/markets 2023-09-17 12:34:56+00:00   
1  https://www.reuters.com/business/energy 2023-09-17 14:00:00+00:00   

                extracted_at  
0 2025-09-18 21:49:32.451116  
1 2025-09-18 21:49:32.451116  
Data loaded into DB successfully!
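For comparison, pandas can do the flattening step for you with pd.json_normalize, which turns nested keys into dotted column names such as source.id. The sketch below is an alternative to the explicit loop above, not what this notebook uses. It runs the same four steps on a trimmed copy of the sample response and loads into an in-memory SQLite database through the standard-library sqlite3 module (instead of news_practice.db via SQLAlchemy), so it can read the rows back and confirm the load actually worked. The names response and check are hypothetical.

```python
import sqlite3
from datetime import datetime

import pandas as pd

# Trimmed copy of the sample response (only the fields used here)
response = {
    "articles": [
        {"source": {"id": "bbc-news", "name": "BBC News"}, "author": "BBC Reporter",
         "title": "Global markets rally as tech stocks surge",
         "publishedAt": "2023-09-17T12:34:56Z"},
        {"source": {"id": None, "name": "Reuters"}, "author": None,
         "title": "Oil prices climb amid supply concerns",
         "publishedAt": "2023-09-17T14:00:00Z"},
    ]
}

# 1. Flatten: nested keys become dotted columns ("source.id", "source.name")
df = pd.json_normalize(response.get("articles", []))
df = df.rename(columns={"source.id": "source_id",
                        "source.name": "source_name",
                        "publishedAt": "published_at"})

# 2. Handle missing: replace null authors
df["author"] = df["author"].fillna("Unknown")

# 3. Enrich: parse timestamps and stamp the extraction time
df["published_at"] = pd.to_datetime(df["published_at"])
df["extracted_at"] = datetime.now()

# 4. Load into an in-memory SQLite DB, then read back to confirm the rows landed
conn = sqlite3.connect(":memory:")
df.to_sql("news_articles", con=conn, if_exists="replace", index=False)
check = pd.read_sql(
    "SELECT source_name, author FROM news_articles ORDER BY rowid", conn
)
conn.close()

print(check)
```

The trade-off: json_normalize needs less code but gives you less control, so you still rename the dotted columns and handle nulls afterwards, while the loop lets you pick, rename, and default each field in one place.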
