# Gauge rain data — DWD / WNH / LANUK

This notebook shows how to call the `/gauges` endpoint of the Heavyrain Data API
and preview raw rain gauge measurements from three data sources (channels):
**DWD**, **WNH**, and **LANUK**.

Every example follows the same steps:

- call the same API helper function, `fetch_gauges(...)`
- convert the JSON response to a pandas `DataFrame`
- display only the first rows with `df.head()` (no extra processing or plots)


In [26]:
import warnings
warnings.filterwarnings("ignore")

import os, sys
import pandas as pd

# Allow imports when running from /notebooks
sys.path.append("..")
sys.path.append(".")

# Load environment variables (.env in repo root or config/.env)
from dotenv import load_dotenv
loaded = (
    load_dotenv(os.path.join("..", "config", ".env"))
    or load_dotenv(os.path.join("..", ".env"))
)

print("Loaded .env:", loaded)

# Import API helper for gauges
from utils.iot_client import fetch_gauges

print("Setup complete")



Loaded .env: True
Setup complete


In [27]:
def to_df(rows):
    """
    Convert list[dict] -> pandas DataFrame.
    No extra parsing or transformations.
    """
    if not rows:
        return pd.DataFrame()
    return pd.DataFrame(rows)
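
To make the behaviour of `to_df` concrete, here is a minimal sketch that runs without the API. The `sample_rows` list is a hypothetical, hand-written input that copies a few fields from the DWD preview shown in this notebook; a real response carries more columns.

```python
import pandas as pd


def to_df(rows):
    """Convert list[dict] -> pandas DataFrame without further parsing."""
    if not rows:
        return pd.DataFrame()
    return pd.DataFrame(rows)


# Hypothetical sample shaped like the /gauges preview (subset of columns)
sample_rows = [
    {"measurement_date": "2025-11-13T12:55:00", "precipitation_height": 0.0,
     "station_name": "Lübeck_Blankensee", "channel": "DWD"},
    {"measurement_date": "2025-11-13T12:53:00", "precipitation_height": 0.0,
     "station_name": "Bochum", "channel": "DWD"},
]

df_sample = to_df(sample_rows)   # one row per dict, one column per key
df_empty = to_df([])             # empty input yields an empty DataFrame

print(df_sample.shape)   # (2, 4)
print(df_empty.empty)    # True
```

Because `to_df` checks `if not rows`, a `None` response is treated the same way as an empty list.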


In [28]:
# Simple pagination settings
PAGE = 500      # rows per request
CAP  = 20_000   # safety cap on total rows


def paginate_gauges(
    *,
    channel: str,
    hours: int = 24*365,
    only_with_known_location: bool = False,
    max_rows: int = CAP,
) -> pd.DataFrame:
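    """
    Fetch gauge data in pages using limit/offset and return a single DataFrame.
    No extra transformations, just concatenation of pages.
    Stops when a page comes back empty or the max_rows cap is reached.
    """

    frames = []
    offset = 0
    total = 0

    while total < max_rows:
        # Never request more rows than the cap still allows
        rows = fetch_gauges(
            channel=channel,
            hours=hours,
            only_with_known_location=only_with_known_location,
            limit=min(PAGE, max_rows - total),
            offset=offset,
        )

        df_page = to_df(rows)
        if df_page.empty:
            break  # no more data in the window

        frames.append(df_page)
        total += len(df_page)
        offset += len(df_page)  # advance by the rows actually received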

    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


## Example 1 — DWD gauge data (no pagination)

**Purpose**

Fetch a small sample of rain gauge measurements from the **DWD** channel and
preview the first rows.

**What this example does**

1. Calls `fetch_gauges(...)` with:
   - `channel="DWD"`
   - a 1-year lookback (`hours=24*365`)
   - `only_with_known_location=False` so that rows without a known location are not filtered out
2. Converts the JSON response to a DataFrame using `to_df(rows_dwd)`.
3. Prints how many rows were returned.
4. Shows `df_dwd.head()` to preview the columns and a few records.


In [29]:
rows_dwd = fetch_gauges(
    channel="DWD",
    hours=24*365,                 # larger window so dev DB returns data
    only_with_known_location=False,
    limit=50
)

df_dwd = to_df(rows_dwd)

print(f"Rows returned (DWD): {len(df_dwd)}")
df_dwd.head()


Rows returned (DWD): 50


| | measurement_date | precipitation_height | city | station_name | channel | latitude | longitude | station_id | station_code | original_value | original_state |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-11-13T12:55:00 | 0.0 | Lübeck | Lübeck_Blankensee | DWD | 53.8025 | 10.6989 | 3 | 3086 | 0.0 | 0 |
| 1 | 2025-11-13T12:55:00 | 0.0 | Lüdenscheid | Lüdenscheid | DWD | 51.2452 | 7.6425 | 2 | 3098 | 0.0 | 0 |
| 2 | 2025-11-13T12:54:00 | 0.0 | Lübeck | Lübeck_Blankensee | DWD | 53.8025 | 10.6989 | 3 | 3086 | 0.0 | 0 |
| 3 | 2025-11-13T12:54:00 | 0.0 | Lüdenscheid | Lüdenscheid | DWD | 51.2452 | 7.6425 | 2 | 3098 | 0.0 | 0 |
| 4 | 2025-11-13T12:53:00 | 0.0 | Bochum | Bochum | DWD | 51.5026 | 7.2289 | 1 | 555 | 0.0 | -999 |


## Example 2 — DWD gauge data with pagination
### What is pagination?
APIs often return data in small chunks instead of all at once.  
Instead of asking for *everything*, we ask for:

- **page 1** (first 500 rows)  
- **page 2** (next 500 rows)  
- **page 3**, and so on…

This process is called **pagination**.

### Why do we use it?
Some datasets are too large for a single API call.  
Pagination lets us:

- fetch data in manageable pieces, using `limit` and `offset` to request one page at a time
- avoid API timeouts  
- combine all pages into one final DataFrame  

### What the example below does
1. Calls the `/gauges` endpoint repeatedly using `limit` and `offset`.  
2. Fetches multiple “pages” of DWD gauge data for a 1-year window.  
3. Combines all pages into one DataFrame.  
4. Shows how many total rows were collected and displays the first few rows.

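Pagination affects **how** data is fetched, not the final DataFrame structure.
Note that `paginate_gauges` stops at `CAP` (20,000 rows by default), so a result of
exactly 20,000 rows means the window may contain more data than was fetched.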
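
The limit/offset loop can be tried without the API. The sketch below is a simplified illustration, not the notebook's helper: `fake_fetch` is a hypothetical in-memory stand-in for `fetch_gauges`, and it assumes the endpoint returns an empty list once `offset` runs past the last row.

```python
import pandas as pd

# Hypothetical in-memory "database" of 1,234 fake rows
FAKE_ROWS = [{"row_id": i} for i in range(1234)]


def fake_fetch(*, limit, offset):
    """Return one page of FAKE_ROWS (assumed limit/offset semantics)."""
    return FAKE_ROWS[offset:offset + limit]


def paginate(fetch, *, page_size=500, max_rows=20_000):
    """Collect pages until the source is exhausted or max_rows is reached.

    Returns the combined DataFrame and a flag telling whether the cap was hit.
    """
    frames, offset, total = [], 0, 0
    while total < max_rows:
        # Ask only for as many rows as the cap still allows
        rows = fetch(limit=min(page_size, max_rows - total), offset=offset)
        if not rows:
            break  # an empty page means there is nothing left
        frames.append(pd.DataFrame(rows))
        total += len(rows)
        offset += len(rows)
    df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return df, total >= max_rows


df_all, capped_all = paginate(fake_fetch)                 # pages of 500, 500, 234
df_cut, capped_cut = paginate(fake_fetch, max_rows=1000)  # stops at the cap

print(len(df_all), capped_all)   # 1234 False
print(len(df_cut), capped_cut)   # 1000 True
```

When the flag is `True`, the row count only says that the cap was reached; the source may hold more rows than were fetched.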


In [31]:
# Example 2 — DWD gauge data with pagination

df_dwd_paginated = paginate_gauges(
    channel="DWD",
    hours=24*365,
    only_with_known_location=False,
)

print(f"Rows in window (DWD, paginated): {len(df_dwd_paginated)}")
df_dwd_paginated.head()


Rows in window (DWD, paginated): 20000


| | measurement_date | precipitation_height | city | station_name | channel | latitude | longitude | station_id | station_code | original_value | original_state |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-11-13T12:55:00 | 0.0 | Lübeck | Lübeck_Blankensee | DWD | 53.8025 | 10.6989 | 3 | 3086 | 0.0 | 0 |
| 1 | 2025-11-13T12:55:00 | 0.0 | Lüdenscheid | Lüdenscheid | DWD | 51.2452 | 7.6425 | 2 | 3098 | 0.0 | 0 |
| 2 | 2025-11-13T12:54:00 | 0.0 | Lübeck | Lübeck_Blankensee | DWD | 53.8025 | 10.6989 | 3 | 3086 | 0.0 | 0 |
| 3 | 2025-11-13T12:54:00 | 0.0 | Lüdenscheid | Lüdenscheid | DWD | 51.2452 | 7.6425 | 2 | 3098 | 0.0 | 0 |
| 4 | 2025-11-13T12:53:00 | 0.0 | Bochum | Bochum | DWD | 51.5026 | 7.2289 | 1 | 555 | 0.0 | -999 |


## Example 3 — WNH gauge data (no pagination)

**Purpose**

Use the same query pattern, but filter to the **WNH** channel.

**Notes**

- The code is identical to Example 1 except for `channel="WNH"`.
- In this database there may currently be **no WNH data**,
  so the result can be an empty DataFrame (0 rows).
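
An empty result is harmless for the preview step, as the short sketch below illustrates. It uses a plain empty `pd.DataFrame()`, which is what `to_df` returns for an empty response; `preview` is a hypothetical convenience helper and not part of the notebook.

```python
import pandas as pd

df_none = pd.DataFrame()   # stand-in for a channel that returned no rows

print(len(df_none))            # 0
print(df_none.head().empty)    # True: head() works on an empty frame
print(list(df_none.columns))   # []: there are no columns to select


def preview(df, n=5):
    """Hypothetical helper: first n rows, or None when there is no data."""
    return None if df.empty else df.head(n)
```

Since an empty frame has no columns, any later code that selects a column by name (for example `df["city"]`) should first check `df.empty`.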


In [33]:
rows_wnh = fetch_gauges(
    channel="WNH",
    hours=24*365,
    only_with_known_location=False,
    limit=50
)

df_wnh = to_df(rows_wnh)

print(f"Rows returned (WNH): {len(df_wnh)}")
df_wnh.head()


Rows returned (WNH): 0


## Example 4 — LANUK gauge data (no pagination)

**Purpose**

Reuse the same query once more, this time for the **LANUK** channel.

**Notes**

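- Only the `channel` argument changes (`channel="LANUK"`).
- As with WNH, this database may currently hold **no LANUK data**,
  so the result can be an empty DataFrame (0 rows).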

In [34]:
rows_lanuk = fetch_gauges(
    channel="LANUK",
    hours=24*365,
    only_with_known_location=False,
    limit=50
)

df_lanuk = to_df(rows_lanuk)

print(f"Rows returned (LANUK): {len(df_lanuk)}")
df_lanuk.head()


Rows returned (LANUK): 0
