# Lab 7: Automated Batch Pipeline (Cloud Functions 2nd Gen + Cloud Scheduler)
Date generated: 2025-08-21

**Objective:** Build a serverless, automated data pipeline that fetches API data and loads it to BigQuery on a schedule.  
**Stack:** Cloud Functions (2nd gen), Secret Manager, BigQuery, Cloud Scheduler.

**Deliverables:** (a) This notebook with outputs, (b) scheduler screenshot, (c) BigQuery table screenshot.

In [3]:
# Optional (Colab): install gcloud and libs
# !apt -y -qq install google-cloud-sdk
!pip -q install google-cloud-bigquery google-cloud-secret-manager requests
PROJECT_ID = "imposing-coast-442802-a7"
REGION     = "us-central1"
DATASET    = "superstore_data"
TABLE      = "realtime_weather"
CITY       = "Lafayette,IN,US"
print(PROJECT_ID, REGION, DATASET, TABLE, CITY)

imposing-coast-442802-a7 us-central1 superstore_data realtime_weather Lafayette,IN,US


### Enable required APIs (run once per project)

In [9]:
!gcloud services enable run.googleapis.com cloudfunctions.googleapis.com bigquery.googleapis.com secretmanager.googleapis.com cloudscheduler.googleapis.com logging.googleapis.com --project $PROJECT_ID

Operation "operations/acf.p2-1022300460213-74eb6cd9-c417-480d-84f1-3746cfc252ca" finished successfully.


### Authenticate gcloud

Authenticate your Google Cloud account so that the subsequent `gcloud` commands can run against your project.

In [7]:
!gcloud auth login

Go to the following link in your browser, and complete the sign-in prompts:

    https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=32555940559.apps.googleusercontent.com&redirect_uri=https%3A%2F%2Fsdk.cloud.google.com%2Fauthcode.html&scope=openid+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fappengine.admin+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fsqlservice.login+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcompute+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Faccounts.reauth&state=H95xdVvvSdfU6uIFUaWhZQu46mhivk&prompt=consent&token_usage=remote&access_type=offline&code_challenge=bUs14u3gydVyUidadtEIzUOqGM7vjhYP57P326A2RSQ&code_challenge_method=S256

Once finished, enter the verification code provided in your browser: 4/0Ab32j91_hS-t-JVzjtud8Eavlu8NfuFipPigJJMydPS4xi5Amcqyq30ta6eplKrCaFpFJQ

You are now logged in as [cchris2004@gmail.com].
Your current project i

In [11]:
from google.colab import auth
auth.authenticate_user()

In [8]:
!gcloud config set project $PROJECT_ID

Updated property [core/project].


### Create/verify BigQuery dataset & table

In [13]:
from google.cloud import bigquery
client = bigquery.Client(project=PROJECT_ID)
# dataset
try:
    client.get_dataset(f"{PROJECT_ID}.{DATASET}")
except Exception:
    client.create_dataset(bigquery.Dataset(f"{PROJECT_ID}.{DATASET}"), exists_ok=True)
# table schema
schema = [
    bigquery.SchemaField("ingest_ts", "TIMESTAMP"),
    bigquery.SchemaField("city", "STRING"),
    bigquery.SchemaField("weather", "STRING"),
    bigquery.SchemaField("temp_c", "FLOAT"),
    bigquery.SchemaField("humidity", "FLOAT"),
    bigquery.SchemaField("wind_mps", "FLOAT"),
    bigquery.SchemaField("raw", "JSON"),
]
table_ref = bigquery.Table(f"{PROJECT_ID}.{DATASET}.{TABLE}", schema=schema)
client.create_table(table_ref, exists_ok=True)
print("Ready:", f"{PROJECT_ID}.{DATASET}.{TABLE}")

Ready: imposing-coast-442802-a7.superstore_data.realtime_weather


### Secret Manager: create `OWM_API_KEY` secret and add your API key

In [14]:
from google.api_core.exceptions import NotFound
from google.cloud import secretmanager

sm = secretmanager.SecretManagerServiceClient()
parent = f"projects/{PROJECT_ID}"
secret_id = "OWM_API_KEY"
secret_name = f"{parent}/secrets/{secret_id}"
# Create the secret container only if it does not exist yet
try:
    sm.get_secret(request={"name": secret_name})
except NotFound:
    sm.create_secret(request={"parent": parent, "secret_id": secret_id, "secret": {"replication": {"automatic": {}}}})
# Replace the placeholder with your real key before running; each run adds a new secret version
sm.add_secret_version(request={"parent": secret_name, "payload": {"data": b"YOUR_OPENWEATHERMAP_API_KEY"}})
print("Secret ready:", secret_id)

Secret ready: OWM_API_KEY
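
The cell above uploads a placeholder string as the secret payload. One way to keep the real key out of the notebook is to read it from an environment variable and refuse to upload the placeholder. This is only a sketch: the variable name `OWM_API_KEY_VALUE` and the helper `secret_payload` are hypothetical names introduced here, not part of the lab.

```python
import os

# Placeholder text used in the Secret Manager cell of this lab
PLACEHOLDER = "YOUR_OPENWEATHERMAP_API_KEY"


def secret_payload(env_var: str = "OWM_API_KEY_VALUE") -> bytes:
    """Return the API key from the environment as bytes.

    `env_var` is a hypothetical variable name chosen for this sketch.
    Raises ValueError if the variable is unset, empty, or still the placeholder.
    """
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise ValueError(f"{env_var} is not set")
    if value == PLACEHOLDER:
        raise ValueError("replace the placeholder with a real API key")
    return value.encode("utf-8")
```

The returned bytes could then be passed as `"payload": {"data": secret_payload()}` in the `add_secret_version` request.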


### Cloud Function (Python): copy this into `main.py` when deploying
> Create `requirements.txt` with:  
`google-cloud-bigquery>=3.25.0`  
`google-cloud-secret-manager>=2.20.2`  
`requests>=2.31.0`

In [15]:
# --- BEGIN: main.py template ---
import os, json, logging, datetime, requests
from google.cloud import bigquery, secretmanager

PROJECT_ID = os.environ.get("PROJECT_ID")
DATASET    = os.environ.get("DATASET")
TABLE      = os.environ.get("TABLE")
CITY       = os.environ.get("CITY", "Lafayette,IN,US")
SECRET_ID  = os.environ.get("OWM_SECRET_ID", "OWM_API_KEY")

_bq = bigquery.Client()
_sm = secretmanager.SecretManagerServiceClient()
_api_key_cache = None

def _api_key():
    """Fetch the OpenWeatherMap key from Secret Manager once, then reuse it."""
    global _api_key_cache
    if _api_key_cache:
        return _api_key_cache
    name = f"projects/{PROJECT_ID}/secrets/{SECRET_ID}/versions/latest"
    resp = _sm.access_secret_version(request={"name": name})
    _api_key_cache = resp.payload.data.decode()
    return _api_key_cache

def weather_ingest(request):
    try:
        params = {"q": CITY, "appid": _api_key(), "units": "metric"}
        r = requests.get("https://api.openweathermap.org/data/2.5/weather", params=params, timeout=15)
        r.raise_for_status()
        data = r.json()
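        row = {
            # utcnow() is deprecated in Python 3.12; use a timezone-aware UTC timestamp
            "ingest_ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "city": CITY,
            "weather": (data.get("weather") or [{}])[0].get("main"),
            "temp_c": (data.get("main") or {}).get("temp"),
            "humidity": (data.get("main") or {}).get("humidity"),
            "wind_mps": (data.get("wind") or {}).get("speed"),
            # Streaming inserts into a JSON column expect a JSON-encoded string, not a dict
            "raw": json.dumps(data),
        }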
        table_id = f"{PROJECT_ID}.{DATASET}.{TABLE}"
        errors = _bq.insert_rows_json(table_id, [row])
        if errors:
            logging.error(errors)
            return (json.dumps({"ok": False, "errors": errors}), 500, {"Content-Type":"application/json"})
        return (json.dumps({"ok": True, "rows": 1}), 200, {"Content-Type":"application/json"})
    except Exception as e:
        logging.exception("error")
        return (json.dumps({"ok": False, "error": str(e)}), 500, {"Content-Type":"application/json"})
# --- END: main.py template ---
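
The row-building step in `weather_ingest` can be checked without calling the API or BigQuery by pulling it into a pure function. The sketch below assumes a hypothetical helper name `build_row`, a made-up sample payload (not real API output), and that the `raw` JSON column is sent as a JSON-encoded string.

```python
import datetime
import json


def build_row(data: dict, city: str, now: datetime.datetime) -> dict:
    """Map an OpenWeatherMap-style payload to the table's columns.

    `now` must be timezone-aware; it is converted to UTC for `ingest_ts`.
    Missing sections in `data` yield None for the matching column.
    """
    main = data.get("main") or {}
    wind = data.get("wind") or {}
    weather = (data.get("weather") or [{}])[0]
    return {
        "ingest_ts": now.astimezone(datetime.timezone.utc).isoformat(),
        "city": city,
        "weather": weather.get("main"),
        "temp_c": main.get("temp"),
        "humidity": main.get("humidity"),
        "wind_mps": wind.get("speed"),
        "raw": json.dumps(data),  # assumption: JSON column receives a JSON string
    }


# Made-up payload for illustration only
sample = {
    "weather": [{"main": "Clouds"}],
    "main": {"temp": 21.5, "humidity": 60},
    "wind": {"speed": 3.1},
}
now = datetime.datetime(2025, 8, 21, 12, 0, tzinfo=datetime.timezone.utc)
row = build_row(sample, "Lafayette,IN,US", now)
print(row["ingest_ts"], row["weather"], row["temp_c"])
# prints: 2025-08-21T12:00:00+00:00 Clouds 21.5
```

Keeping the mapping separate from the HTTP and BigQuery calls makes it possible to test edge cases, such as an empty payload, before deploying.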

### Deploy (2nd gen) and set env vars

In [None]:
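# CITY contains commas, which gcloud treats as separators in --set-env-vars,
# so the value below uses gcloud's ^DELIM^ escaping with "@" as the delimiter.
# !gcloud functions deploy weather_ingest --gen2 --region=$REGION --runtime=python312 --source=. \
#    --entry-point=weather_ingest --trigger-http --no-allow-unauthenticated \
#    --set-env-vars="^@^PROJECT_ID=$PROJECT_ID@DATASET=$DATASET@TABLE=$TABLE@CITY=$CITY@OWM_SECRET_ID=OWM_API_KEY"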
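
`gcloud` splits the `--set-env-vars` value on commas, and the `CITY` value `Lafayette,IN,US` itself contains commas. `gcloud topic escaping` describes a `^DELIM^` prefix that switches the separator. The sketch below builds such a value; `format_env_vars` is a hypothetical helper name, and `@` is an arbitrary delimiter choice.

```python
def format_env_vars(env: dict, delim: str = "@") -> str:
    """Build a --set-env-vars value that uses `delim` instead of commas.

    The result starts with ^delim^, the prefix gcloud uses to select an
    alternate list delimiter. Raises ValueError if the delimiter occurs
    in any key or value, since that would split the pair incorrectly.
    """
    for key, value in env.items():
        if delim in key or delim in str(value):
            raise ValueError(f"delimiter {delim!r} appears in {key!r}")
    pairs = delim.join(f"{key}={value}" for key, value in env.items())
    return f"^{delim}^{pairs}"


print(format_env_vars({"TABLE": "realtime_weather", "CITY": "Lafayette,IN,US"}))
# prints: ^@^TABLE=realtime_weather@CITY=Lafayette,IN,US
```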

### Grant Secret Accessor to the function's service account and schedule hourly with Cloud Scheduler (OIDC)

In [None]:
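# Run these in a shell (for example a %%bash cell) with PROJECT_ID and REGION set as shell variables.
# 1) Let the function's runtime service account read the secret
# FUNCTION_SA=$(gcloud functions describe weather_ingest --gen2 --region=$REGION --format="value(serviceConfig.serviceAccountEmail)")
# gcloud secrets add-iam-policy-binding OWM_API_KEY --member="serviceAccount:${FUNCTION_SA}" --role="roles/secretmanager.secretAccessor"
# 2) Look up the function URL and the name of its underlying Cloud Run service (the URL host is not the service name)
# URL=$(gcloud functions describe weather_ingest --gen2 --region=$REGION --format="value(serviceConfig.uri)")
# SERVICE=$(basename "$(gcloud functions describe weather_ingest --gen2 --region=$REGION --format="value(serviceConfig.service)")")
# 3) Create an invoker service account and allow it to call the service
# gcloud iam service-accounts create scheduler-invoker --display-name="Scheduler Invoker" --project $PROJECT_ID
# gcloud run services add-iam-policy-binding "$SERVICE" --region=$REGION --member="serviceAccount:scheduler-invoker@$PROJECT_ID.iam.gserviceaccount.com" --role="roles/run.invoker"
# 4) Schedule an authenticated POST at minute 0 of every hour
# gcloud scheduler jobs create http weather-ingest-hourly --location=$REGION --schedule="0 * * * *" --uri="$URL" --http-method=POST --oidc-service-account-email="scheduler-invoker@$PROJECT_ID.iam.gserviceaccount.com" --oidc-token-audience="$URL"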
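
The cron expression `0 * * * *` fires at minute 0 of every hour. As a rough illustration of when to expect new rows, the sketch below lists the next few firing times after a given moment. `next_hourly_runs` is a hypothetical helper, and the times are in whatever time zone the input datetime (and the scheduler job) uses.

```python
import datetime


def next_hourly_runs(after: datetime.datetime, count: int = 3) -> list:
    """Return the next `count` top-of-the-hour times strictly after `after`."""
    # Truncate to the current hour, then step forward one hour at a time
    first = after.replace(minute=0, second=0, microsecond=0) + datetime.timedelta(hours=1)
    return [first + datetime.timedelta(hours=i) for i in range(count)]


start = datetime.datetime(2025, 8, 21, 12, 17, tzinfo=datetime.timezone.utc)
for run in next_hourly_runs(start):
    print(run.isoformat())
# prints:
# 2025-08-21T13:00:00+00:00
# 2025-08-21T14:00:00+00:00
# 2025-08-21T15:00:00+00:00
```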

### Validate recent rows in BigQuery

In [18]:
PROJECT_ID = 'imposing-coast-442802-a7'
DATASET    = "superstore_data"
TABLE      = "realtime_weather"
from google.cloud import bigquery
client = bigquery.Client(project=PROJECT_ID)
for row in client.query(f"SELECT ingest_ts, city, weather, temp_c FROM `{PROJECT_ID}.{DATASET}.{TABLE}` ORDER BY ingest_ts DESC LIMIT 5"):
    print(dict(row))
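
Beyond printing recent rows, it can help to check that the newest `ingest_ts` is recent enough for an hourly schedule. The sketch below assumes a hypothetical helper `is_fresh` and an arbitrary 90-minute tolerance; it expects timezone-aware datetimes, which is how the BigQuery Python client returns TIMESTAMP values.

```python
import datetime
from typing import Optional


def is_fresh(
    latest_ingest_ts: Optional[datetime.datetime],
    now: datetime.datetime,
    max_age: datetime.timedelta = datetime.timedelta(minutes=90),
) -> bool:
    """Return True if the newest row is no older than `max_age`.

    Both datetimes must be timezone-aware. None (an empty table) is not fresh.
    """
    if latest_ingest_ts is None:
        return False
    return now - latest_ingest_ts <= max_age


now = datetime.datetime(2025, 8, 21, 14, 30, tzinfo=datetime.timezone.utc)
latest = datetime.datetime(2025, 8, 21, 14, 0, tzinfo=datetime.timezone.utc)
print(is_fresh(latest, now))  # prints: True
```

In the notebook, `latest` would come from the first row of the validation query, for example `row["ingest_ts"]`.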

### Challenge (author a Gemini prompt)
Write a prompt to add try/except for `requests.exceptions.RequestException` and structured logging with `google-cloud-logging`. Place it here: