
# DATA 304 — Module 4: Importing Data II
## Session 2 Demo — APIs and HTTP (Step-by-step)

**What you'll practice**
1) Hello API (simple GET) → check status, headers, text vs JSON  
2) Parse JSON safely  
3) Query parameters with a real API  
4) Normalize response to a pandas DataFrame  
5) Basic error handling with 404  
6) Gentle pagination loop  
7) Minimal auth headers (User-Agent and Bearer token pattern)  
8) Export results


In [None]:
import requests, pandas as pd, json, time
from pathlib import Path
OUT = Path('./data/outputs')
OUT.mkdir(parents=True, exist_ok=True)  # also creates ./data if it is missing
print('Setup OK')

## 1) Hello API — simple GET

In [None]:

url = "https://api.github.com/"
r = requests.get(url, timeout=10)
print("Status:", r.status_code)
print("Content-Type:", r.headers.get("Content-Type"))  # hints whether the body is JSON
print("First 200 chars of text:")
print(r.text[:200])
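
The `Content-Type` response header is the usual hint for whether `.json()` will succeed. requests exposes `r.headers` as a case-insensitive dict, so `r.headers.get("Content-Type")` works regardless of capitalization. A minimal stdlib sketch of that check; the helper name `is_json_content_type` is hypothetical and not part of requests:

```python
def is_json_content_type(content_type):
    """Return True if a Content-Type header value denotes JSON."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and normalize case
    mime = content_type.split(";", 1)[0].strip().lower()
    # Plain application/json, or a "+json" suffix such as application/problem+json
    return mime == "application/json" or mime.endswith("+json")


for value in ("application/json; charset=utf-8", "text/html; charset=utf-8", None):
    print(repr(value), "->", is_json_content_type(value))
```

In the notebook you would pass `r.headers.get("Content-Type")` to decide between `.json()` and `.text`.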

## 2) Parsing JSON — `.json()` vs `.text`

In [None]:

# Safer: handle non-JSON responses
data = None
try:
    data = r.json()
    print("Top-level keys:", list(data.keys())[:10])
except ValueError:
    print("Response is not JSON; falling back to text")
    data = {"raw_text": r.text}
data
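
The `try/except ValueError` pattern above can be packaged as a small function. `json.JSONDecodeError` is a subclass of `ValueError`, which is why catching `ValueError` works. This sketch uses the standard-library `json` module on made-up body strings so it runs offline, and adds a guard the cell above lacks: a JSON array at the top level has no `.keys()`. Both helper names are hypothetical.

```python
import json


def parse_json_or_text(body):
    """Parse a body as JSON; fall back to {'raw_text': body} like the cell above."""
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return {"raw_text": body}


def top_level_keys(data, limit=10):
    """Return up to `limit` keys for a JSON object; [] for arrays or scalars."""
    if isinstance(data, dict):
        return list(data.keys())[:limit]
    return []


for body in ('{"status": "ok"}', "[1, 2, 3]", "<html>Not JSON</html>"):
    parsed = parse_json_or_text(body)
    print(type(parsed).__name__, top_level_keys(parsed))
```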

## 3) Query parameters — Open-Meteo (no key)

In [None]:

params = {"latitude": 35.96, "longitude": -83.92, "hourly": "temperature_2m"}
r_met = requests.get("https://api.open-meteo.com/v1/forecast", params=params, timeout=15)
r_met.raise_for_status()
payload_met = r_met.json()
list(payload_met.keys())
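
requests encodes the `params=` dict into the query string for you, and `r_met.url` shows the URL it actually requested. To make that encoding visible offline, this sketch builds the same URL with the standard library's `urllib.parse.urlencode` and parses it back. requests may encode some edge cases (lists, special characters) differently, so treat this as an illustration rather than its exact algorithm.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

met_base = "https://api.open-meteo.com/v1/forecast"
demo_params = {"latitude": 35.96, "longitude": -83.92, "hourly": "temperature_2m"}

# key=value pairs joined by '&', appended after '?'
demo_url = f"{met_base}?{urlencode(demo_params)}"
print(demo_url)

# Round trip: every parameter survives, but values come back as lists of strings
print(parse_qs(urlsplit(demo_url).query))
```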

In [None]:
# Nonexistent endpoint: GitHub answers 404 Not Found
try:
    bad = requests.get("https://api.github.com/does/not/exist", timeout=10)
    bad.raise_for_status()  # raises HTTPError because status = 404
except requests.HTTPError as e:
    print("Error:", e)


## 4) From JSON to DataFrame

In [None]:

hourly = payload_met.get("hourly", {})
df_met = pd.DataFrame({"time": hourly.get("time", []),
                       "temperature_2m": hourly.get("temperature_2m", [])})
df_met['time'] = pd.to_datetime(df_met['time'], utc=True, errors='coerce')
df_met.head()
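
Open-Meteo returns the hourly block as parallel arrays (one list per variable), which is why the cell above builds the DataFrame column by column. A sketch of the same step as a reusable function, run against a made-up payload with the same shape; `hourly_to_frame` and `sample_payload` are hypothetical. pandas already raises `ValueError` when arrays differ in length, but the explicit check names the offending columns.

```python
import pandas as pd

# Made-up payload shaped like Open-Meteo's "hourly" block
sample_payload = {
    "hourly": {
        "time": ["2024-01-01T00:00", "2024-01-01T01:00", "2024-01-01T02:00"],
        "temperature_2m": [3.1, 2.8, 2.5],
    }
}


def hourly_to_frame(payload, columns=("time", "temperature_2m")):
    """Turn parallel arrays under payload['hourly'] into a tidy DataFrame."""
    hourly = payload.get("hourly", {})
    data = {col: hourly.get(col, []) for col in columns}
    lengths = {col: len(values) for col, values in data.items()}
    # Parallel arrays must line up row for row; fail loudly if they do not
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    df = pd.DataFrame(data)
    # Unparseable timestamps become NaT instead of raising
    df["time"] = pd.to_datetime(df["time"], utc=True, errors="coerce")
    return df


df_sample = hourly_to_frame(sample_payload)
print(df_sample)
```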

## 5) Basic error handling — demonstrate 404

In [None]:

try:
    bad = requests.get("https://api.weather.gov/this/endpoint/does/not/exist", timeout=10)
    print("Status:", bad.status_code)
    bad.raise_for_status()  # will raise for 4xx/5xx
except requests.HTTPError as e:
    print("Handled HTTP error:", e)
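
`raise_for_status()` raises for 4xx (client) and 5xx (server) codes and does nothing for 2xx/3xx. Note that `requests.get` follows redirects by default, so a 3xx usually becomes the final page's status before you see it. A stdlib-only sketch that mirrors the 4xx/5xx rule offline, using `http.HTTPStatus` for the reason phrase; `check_status` and `HTTPStatusError` are hypothetical names, and in real code you would keep catching `requests.HTTPError`.

```python
from http import HTTPStatus


class HTTPStatusError(Exception):
    """Hypothetical error type for this sketch (requests raises requests.HTTPError)."""


def check_status(code, url=""):
    """Raise for 4xx/5xx codes; otherwise return the standard reason phrase."""
    try:
        phrase = HTTPStatus(code).phrase  # e.g. 404 -> "Not Found"
    except ValueError:  # code not defined in http.HTTPStatus
        phrase = "Unknown status"
    if 400 <= code < 500:
        raise HTTPStatusError(f"{code} Client Error: {phrase} for url: {url}")
    if 500 <= code < 600:
        raise HTTPStatusError(f"{code} Server Error: {phrase} for url: {url}")
    return phrase


for code in (200, 301, 404, 503):
    try:
        print(code, "passes:", check_status(code, "https://example.com/x"))
    except HTTPStatusError as e:
        print("Handled:", e)
```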

## 6) Gentle pagination

In [None]:
import requests, pandas as pd, time

headers = {"User-Agent": "DATA304-Demo/1.0"}
base = "https://api.github.com/search/repositories"

all_items = []
for page in (1, 2):  # simple two-page demo
    params = {"q": "data", "per_page": 10, "page": page}
    r = requests.get(base, params=params, headers=headers, timeout=15)
    r.raise_for_status()
    payload = r.json()
    all_items.extend(payload.get("items", []))
    time.sleep(1)  # be polite
    print("Items collected so far:", len(all_items))

df = pd.json_normalize(all_items)
df = df[["full_name", "stargazers_count", "forks_count", "language", "html_url"]]
print(len(df), "rows from 2 pages")
df.head()
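
The GitHub loop above stops after a fixed two pages. A more general pattern keeps requesting until a page comes back shorter than `per_page` (or empty), with a hard cap so a bug cannot loop forever. This sketch swaps the network call for a hypothetical in-memory `fake_fetch_page`; for GitHub it would wrap the `requests.get` call and return `payload.get("items", [])`. The stop rule assumes the API fills every page except the last, which is common but not guaranteed.

```python
import time


def fetch_all(fetch_page, per_page=10, max_pages=5, pause=0.0):
    """Collect items page by page until a short/empty page or max_pages is reached."""
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, per_page)
        items.extend(batch)
        # A page shorter than per_page means there is nothing more to fetch
        if len(batch) < per_page:
            break
        time.sleep(pause)  # be polite between real requests
    return items


# Hypothetical stand-in for an API: 23 fake records served per_page at a time
FAKE_RECORDS = [{"id": i} for i in range(23)]


def fake_fetch_page(page, per_page):
    start = (page - 1) * per_page
    return FAKE_RECORDS[start:start + per_page]


results = fetch_all(fake_fetch_page, per_page=10, max_pages=5)
print(len(results), "records collected")
```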


## 7) Minimal auth headers — User-Agent and Bearer token pattern

In [None]:

# NOAA NWS requires a descriptive User-Agent. Use your org/course contact.
headers = {"User-Agent": "DATA304/Module4 (contact: instructor@example.edu)"}
check = requests.get("https://api.weather.gov/points/35.96,-83.92", headers=headers, timeout=15)
print("weather.gov status:", check.status_code)

# Bearer token pattern (example only; do not hard-code real keys)
fake_headers = {"Authorization": "Bearer YOUR_TOKEN_HERE"}
print("Example Authorization header prepared:", fake_headers)
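
Rather than pasting a token into a cell, a common pattern is to read it from an environment variable at runtime, so the secret never appears in the notebook file. A minimal sketch; the variable name `API_TOKEN` and the helper `build_headers` are assumptions, so use whatever name your API's docs or your own setup specify.

```python
import os


def build_headers(user_agent, token_env_var="API_TOKEN"):
    """Build request headers; add a Bearer token only if the env var is set."""
    headers = {"User-Agent": user_agent}
    token = os.environ.get(token_env_var)  # None when the variable is unset
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


demo_headers = build_headers("DATA304-Demo/1.0")
# Print only the header names so a real token never lands in notebook output
print(sorted(demo_headers))
```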

## 8) Export results

In [None]:

df_met.to_csv(OUT / "open_meteo_hourly.csv", index=False)
df.to_csv(OUT / "github_repos_small.csv", index=False)
sorted(p.name for p in OUT.iterdir())
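
CSV is plain text, so column types are not stored alongside the data. A quick round-trip check on a made-up two-row frame (standing in for `df_met`) shows how to reload an export with `read_csv(..., parse_dates=...)`; it writes to a temporary directory so `OUT` is left untouched.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up frame standing in for df_met
frame = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01T00:00", "2024-01-01T01:00"], utc=True),
    "temperature_2m": [3.1, 2.8],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "roundtrip.csv"
    frame.to_csv(path, index=False)  # index=False: no extra unnamed column
    # CSV stores text, so ask read_csv to rebuild the datetime column
    back = pd.read_csv(path, parse_dates=["time"])

print(back.shape, list(back.columns))
```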


**Summary**
- Request → status + headers → `.json()` parse
- Query parameters control what the server returns
- Convert JSON → DataFrame for analysis
- Handle errors explicitly with `raise_for_status()` and `try`/`except`
- Many APIs paginate; loop over pages and pause between requests
- Add headers when required; keep secrets out of notebooks
