
# DATA 304 — Module 4: Importing Data II
## Session 2 Demo — APIs and HTTP (Step-by-step)

**What you'll practice**
1) Hello API (simple GET) → check status, headers, text vs JSON  
2) Parse JSON safely  
3) Query parameters with a real API  
4) Normalize response to a pandas DataFrame  
5) Basic error handling with 404  
6) Gentle pagination loop  
7) Minimal auth headers (User-Agent and Bearer token pattern)  
8) Export results


In [1]:

import requests, pandas as pd, json, time
from pathlib import Path
# parents=True also creates ./data when it does not exist yet
OUT = Path('./data/outputs'); OUT.mkdir(parents=True, exist_ok=True)
print('Setup OK')

Setup OK


## 1) Hello API — simple GET

In [2]:

url = "https://api.github.com/"
r = requests.get(url, timeout=10)
print("Status:", r.status_code)
print("First 200 chars of text:")
print(r.text[:200])

Status: 200
First 200 chars of text:
{
  "current_user_url": "https://api.github.com/user",
  "current_user_authorizations_html_url": "https://github.com/settings/connections/applications{/client_id}",
  "authorizations_url": "https://ap

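Before choosing between `.text` and `.json()`, it can help to look at the response headers, in particular `Content-Type`. The sketch below is a small helper that decides whether a set of headers announces a JSON body. The function name `is_json_content_type` and both sample header sets are made up for illustration; they were not captured from a real response.

```python
def is_json_content_type(headers):
    """Return True when the Content-Type header announces a JSON body."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    lowered = {k.lower(): v for k, v in headers.items()}
    content_type = lowered.get("content-type", "")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type == "application/json" or media_type.endswith("+json")

# Made-up header sets for illustration (not from a real response)
json_headers = {"Content-Type": "application/json; charset=utf-8"}
html_headers = {"content-type": "text/html"}

print(is_json_content_type(json_headers))  # True
print(is_json_content_type(html_headers))  # False
```

With `requests`, the headers of a response are available as `r.headers`, so `is_json_content_type(r.headers)` should work on the response from the cell above.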

## 2) Parsing JSON — `.json()` vs `.text`

In [3]:

# Safer: handle non-JSON responses
data = None
try:
    data = r.json()
    print("Top-level keys:", list(data.keys())[:10])
except ValueError:
    print("Response is not JSON; falling back to text")
    data = {"raw_text": r.text}
data

Top-level keys: ['current_user_url', 'current_user_authorizations_html_url', 'authorizations_url', 'code_search_url', 'commit_search_url', 'emails_url', 'emojis_url', 'events_url', 'feeds_url', 'followers_url']


{'current_user_url': 'https://api.github.com/user',
 'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}',
 'authorizations_url': 'https://api.github.com/authorizations',
 'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}',
 'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}',
 'emails_url': 'https://api.github.com/user/emails',
 'emojis_url': 'https://api.github.com/emojis',
 'events_url': 'https://api.github.com/events',
 'feeds_url': 'https://api.github.com/feeds',
 'followers_url': 'https://api.github.com/user/followers',
 'following_url': 'https://api.github.com/user/following{/target}',
 'gists_url': 'https://api.github.com/gists{/gist_id}',
 'hub_url': 'https://api.github.com/hub',
 'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}',
 'issues_url': 'https://api.github.com/issues',
 'keys_url': '
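
The same try/except pattern can be tried without any network call. The sketch below uses the standard library's `json.loads` on two hand-written strings; `parse_json_or_text` is a hypothetical helper name, and the second string is an invented example of a non-JSON body.

```python
import json

def parse_json_or_text(body):
    """Return (parsed, True) for valid JSON, else ({'raw_text': body}, False)."""
    try:
        return json.loads(body), True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return {"raw_text": body}, False

good, ok_good = parse_json_or_text('{"emojis_url": "https://api.github.com/emojis"}')
bad, ok_bad = parse_json_or_text("<html>Service unavailable</html>")

print(ok_good, list(good.keys()))  # True ['emojis_url']
print(ok_bad, list(bad.keys()))    # False ['raw_text']
```

Returning a flag alongside the data lets later cells tell a parsed payload apart from the text fallback without inspecting its keys.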

## 3) Query parameters — Open-Meteo (no key)

In [4]:

params = {"latitude": 35.96, "longitude": -83.92, "hourly": "temperature_2m"}
r_met = requests.get("https://api.open-meteo.com/v1/forecast", params=params, timeout=15)
r_met.raise_for_status()
payload_met = r_met.json()
list(payload_met.keys())

['latitude',
 'longitude',
 'generationtime_ms',
 'utc_offset_seconds',
 'timezone',
 'timezone_abbreviation',
 'elevation',
 'hourly_units',
 'hourly']
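
To see what the `params` dict turns into, the sketch below builds the query string by hand with the standard library. It illustrates the encoding step only and sends no request; for simple values like these, `requests` should produce an equivalent URL, which you can check with `r_met.url`.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base = "https://api.open-meteo.com/v1/forecast"
params = {"latitude": 35.96, "longitude": -83.92, "hourly": "temperature_2m"}

# Encode the dict as key=value pairs joined by "&" and attach it after "?"
full_url = f"{base}?{urlencode(params)}"
print(full_url)

# Going the other way: recover the parameters from a URL
recovered = parse_qs(urlsplit(full_url).query)
print(recovered["hourly"])  # ['temperature_2m']
```

Passing a dict through `params=` is usually preferable to pasting values into the URL yourself, because special characters get escaped for you.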

In [5]:

# Bad request: this endpoint does not exist, so the server answers 404
bad = requests.get("https://api.github.com/does/not/exist", timeout=10)
bad.raise_for_status()  # raises HTTPError because the status is 404


HTTPError: 404 Client Error: Not Found for url: https://api.github.com/does/not/exist

## 4) From JSON to DataFrame

In [6]:

hourly = payload_met.get("hourly", {})
df_met = pd.DataFrame({"time": hourly.get("time", []),
                       "temperature_2m": hourly.get("temperature_2m", [])})
df_met['time'] = pd.to_datetime(df_met['time'], utc=True, errors='coerce')
df_met.head()

| | time | temperature_2m |
|---|---|---|
| 0 | 2025-09-11 00:00:00+00:00 | 22.6 |
| 1 | 2025-09-11 01:00:00+00:00 | 20.3 |
| 2 | 2025-09-11 02:00:00+00:00 | 18.4 |
| 3 | 2025-09-11 03:00:00+00:00 | 17.4 |
| 4 | 2025-09-11 04:00:00+00:00 | 16.7 |
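
The `hourly` block stores its data as parallel arrays, one list per field, and the DataFrame pairs them up by position. The sketch below shows that pairing in plain Python and adds a length check before zipping. `hourly_to_records` is a hypothetical helper, and the sample values are made up and only assumed to resemble the real payload.

```python
def hourly_to_records(hourly, value_key):
    """Zip the parallel 'time' and value arrays into one dict per hour."""
    times = hourly.get("time", [])
    values = hourly.get(value_key, [])
    # Parallel arrays only make sense when they have the same length
    if len(times) != len(values):
        raise ValueError(
            f"time has {len(times)} entries but {value_key} has {len(values)}"
        )
    return [{"time": t, value_key: v} for t, v in zip(times, values)]

# Hand-written sample shaped like the 'hourly' block (values are made up)
sample_hourly = {
    "time": ["2025-01-01T00:00", "2025-01-01T01:00"],
    "temperature_2m": [3.1, 2.8],
}

records = hourly_to_records(sample_hourly, "temperature_2m")
print(records[0])  # {'time': '2025-01-01T00:00', 'temperature_2m': 3.1}
```

A list of records like this can be passed to `pd.DataFrame(records)` to get the same two columns as in the cell above.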


## 5) Basic error handling — demonstrate 404

In [7]:

try:
    bad = requests.get("https://api.weather.gov/this/endpoint/does/not/exist", timeout=10)
    print("Status:", bad.status_code)
    bad.raise_for_status()  # will raise for 4xx/5xx
except requests.HTTPError as e:
    print("Handled HTTP error:", e)

Status: 404
Handled HTTP error: 404 Client Error: Not Found for url: https://api.weather.gov/this/endpoint/does/not/exist
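
Not every error status calls for the same reaction: a 404 will not fix itself, while a 503 may succeed on a later attempt. The sketch below labels status codes accordingly. The `RETRYABLE` set is an assumption based on common practice, not a rule taken from any of the APIs used here, and `classify_status` is a hypothetical helper.

```python
from http import HTTPStatus

# Assumption: a commonly used set of "try again later" codes
RETRYABLE = {429, 500, 502, 503, 504}

def classify_status(code):
    """Label a status code as 'ok', 'retry', or 'fail'."""
    if code < 400:
        return "ok"      # raise_for_status() does not raise below 400
    if code in RETRYABLE:
        return "retry"   # temporary problem; waiting and retrying may help
    return "fail"        # other 4xx/5xx; repeating the same request rarely helps

for code in (200, 404, 503):
    print(code, HTTPStatus(code).phrase, "->", classify_status(code))
```

In a notebook you could call `classify_status(bad.status_code)` inside the `except` block to decide whether to retry or stop.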


## 6) Gentle pagination

In [8]:
import requests, pandas as pd, time

headers = {"User-Agent": "DATA304-Demo/1.0"}
base = "https://api.github.com/search/repositories"

all_items = []
for page in (1, 2):  # simple two-page demo
    params = {"q": "data", "per_page": 10, "page": page}
    r = requests.get(base, params=params, headers=headers, timeout=15)
    r.raise_for_status()
    payload = r.json()
    all_items.extend(payload.get("items", []))
    time.sleep(1)  # be polite
    print(len(all_items))

df = pd.json_normalize(all_items)
df = df[["full_name", "stargazers_count", "forks_count", "language", "html_url"]]
print(len(df), "rows from 2 pages")
df.head()


10
20
20 rows from 2 pages


| | full_name | stargazers_count | forks_count | language | html_url |
|---|---|---|---|---|---|
| 0 | fivethirtyeight/data | 17178 | 11138 | Jupyter Notebook | https://github.com/fivethirtyeight/data |
| 1 | GoogleTrends/data | 4728 | 453 | JavaScript | https://github.com/GoogleTrends/data |
| 2 | GSA/data | 2193 | 277 | HTML | https://github.com/GSA/data |
| 3 | aptnotes/data | 1746 | 286 | | https://github.com/aptnotes/data |
| 4 | binlist/data | 640 | 361 | | https://github.com/binlist/data |
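
The demo loop fetches a fixed two pages. When the number of pages is not known in advance, one common approach is to stop as soon as a page comes back shorter than `per_page`. The sketch below shows that stopping rule with a hypothetical `fetch_page` callable; here it is replaced by a fake that slices a local list, so nothing is sent over the network.

```python
def fetch_all(fetch_page, per_page, max_pages):
    """Collect items page by page until a short or empty page, or max_pages."""
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, per_page)
        items.extend(batch)
        if len(batch) < per_page:  # a short page means there is nothing left
            break
    return items

# Hypothetical stand-in for the HTTP call: slices a local list of 25 fake records
FAKE_RESULTS = [{"id": i} for i in range(25)]

def fake_fetch_page(page, per_page):
    start = (page - 1) * per_page
    return FAKE_RESULTS[start:start + per_page]

rows = fetch_all(fake_fetch_page, per_page=10, max_pages=5)
print(len(rows))  # 25
```

In a real loop, `fetch_page` would wrap the `requests.get` call and the `time.sleep(1)` pause from the cell above. The `max_pages` cap keeps the loop from running away if the server never returns a short page.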


## 7) Minimal auth headers — User-Agent and Bearer token pattern

In [9]:

# NOAA NWS requires a descriptive User-Agent. Use your org/course contact.
headers = {"User-Agent": "DATA304/Module4 (contact: instructor@example.edu)"}
check = requests.get("https://api.weather.gov/points/35.96,-83.92", headers=headers, timeout=15)
print("weather.gov status:", check.status_code)

# Bearer token pattern (example only; do not hard-code real keys)
fake_headers = {"Authorization": "Bearer YOUR_TOKEN_HERE"}
print("Example Authorization header prepared:", fake_headers)

weather.gov status: 200
Example Authorization header prepared: {'Authorization': 'Bearer YOUR_TOKEN_HERE'}
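
One way to keep a real token out of the notebook is to read it from an environment variable when the cell runs. The sketch below assumes a variable called `DEMO_API_TOKEN`; that name, like the helpers `build_auth_headers` and `redact`, is made up for this example.

```python
import os

def build_auth_headers(env_var="DEMO_API_TOKEN", user_agent="DATA304-Demo/1.0"):
    """Build request headers, adding a Bearer token only if the variable is set."""
    headers = {"User-Agent": user_agent}
    token = os.environ.get(env_var)
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers

def redact(headers):
    """Return a copy that is safe to print: the token value is masked."""
    return {k: ("Bearer ***" if k == "Authorization" else v) for k, v in headers.items()}

# Set here only so the demo runs; normally the variable is set outside the notebook
os.environ["DEMO_API_TOKEN"] = "not-a-real-token"

print(redact(build_auth_headers()))
```

Printing the redacted copy rather than the headers themselves keeps the token out of saved cell outputs.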


## 8) Export results

In [10]:

df_met.to_csv(OUT / "open_meteo_hourly.csv", index=False)
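# Note: df holds the GitHub repository search results from section 6, so the
# "usgs_quakes_small.csv" name does not describe its content; consider renaming it.
df.to_csv(OUT / "usgs_quakes_small.csv", index=False)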
sorted(p.name for p in OUT.iterdir())

['events.csv',
 'flat.csv',
 'open_meteo_hourly.csv',
 'posts_from_json.csv',
 'records_harmonized.csv',
 'users_posts_from_json.csv',
 'usgs_quakes_small.csv',
 'xml_users_posts.csv',
 'xml_users_posts_exploded.csv']
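
The export step depends on the output folder already existing. The sketch below repeats the idea in a temporary directory, using only the standard library: it creates a nested folder, writes one CSV, and lists only the `.csv` files. The file name `sample.csv` and the row are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path

rows = [{"time": "2025-01-01T00:00", "temperature_2m": 3.1}]  # made-up sample row

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "data" / "outputs"
    # parents=True creates the missing 'data' directory as well
    out_dir.mkdir(parents=True, exist_ok=True)

    with open(out_dir / "sample.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["time", "temperature_2m"])
        writer.writeheader()
        writer.writerows(rows)

    # List only CSV files, ignoring anything else in the folder
    csv_names = sorted(p.name for p in out_dir.glob("*.csv"))

print(csv_names)  # ['sample.csv']
```

Filtering with `glob("*.csv")` instead of `iterdir()` keeps checkpoint folders and other stray files out of the listing.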


**Summary**
- Request → status + headers → `.json()` parse
- Params drive what the server returns
- Convert JSON → DataFrame for analysis
- Handle errors explicitly
- Small pagination loops are common
- Add headers when required; keep secrets out of notebooks
