# 🌦 IoT Rain Data Examples: City-wise & Device-wise Analysis

This notebook demonstrates how to fetch and process IoT rainfall data using the `fetch_iot()` API and a few helper functions.  
Each example focuses on a specific use case: Example 1 previews one batch of raw data, and Example 2 shows how to handle pagination.

---

### ⚙️ **Setup**

Before running the examples:
1. Make sure `fetch_iot()` can be imported from `utils/iot_client.py` and that `to_df()` is defined (helper cell below).  
2. Verify that your `.env` file and API credentials load correctly.  
3. Run the setup and helper cells first, in order.


In [11]:
import warnings
warnings.filterwarnings(
    "ignore",
    message="The behavior of DataFrame concatenation with empty or all-NA entries is deprecated"
)

# Setup & imports
import os, sys
import pandas as pd
import matplotlib.pyplot as plt

# allow imports when running from /notebooks
sys.path.append("..")
sys.path.append(".")

# 1) Load .env first (if present), in case the API client reads credentials at import time
from dotenv import load_dotenv
load_dotenv(os.path.join("..", "config", ".env")) or load_dotenv(os.path.join("config", ".env"))

# 2) Load API client
from utils.iot_client import fetch_iot

print("Ready")


Ready
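
Step 2 of the checklist can be made concrete by checking the expected environment variables after `load_dotenv` has run. This is a minimal sketch: the variable names in `REQUIRED_ENV_VARS` are placeholders (this notebook doesn't say which ones `fetch_iot()` reads), and `missing_env_vars` is a hypothetical helper.

```python
import os

# Placeholder names (assumption): replace with whatever utils/iot_client.py actually reads.
REQUIRED_ENV_VARS = ["IOT_API_URL", "IOT_API_KEY"]


def missing_env_vars(names, environ=os.environ):
    """Return the names that are unset or empty in the given environment mapping."""
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("All required environment variables are set")
```

Passing `environ` explicitly keeps the check testable without touching the real process environment.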


In [12]:
def to_df(rows):
    """
    Convert list[dict] -> DataFrame with light parsing.
    - parse minute_ts as UTC datetime
    - coerce rain_value to numeric
    - pre-sort by time if present
    """
    if not rows:
        return pd.DataFrame()

    df = pd.DataFrame(rows)

    if "minute_ts" in df.columns:
        df["minute_ts"] = pd.to_datetime(df["minute_ts"], utc=True, errors="coerce")
    if "rain_value" in df.columns:
        df["rain_value"] = pd.to_numeric(df["rain_value"], errors="coerce")
    if "minute_ts" in df.columns:
        df = df.sort_values("minute_ts")

    return df


PAGE = 500      # rows per request
CAP = 20_000    # safety cap on total rows

def paginate_iot(
    city,
    from_ts,
    to_ts,
    only_with_known_location=True,
    source="auto",
    order="asc",
    page=PAGE,
    max_rows=CAP,
):
    """
    Fetch IoT data in pages and return a single DataFrame.
    Simple, partner-friendly version.
    """
    assert "fetch_iot" in globals() and "to_df" in globals(), \
        "Run the setup cell first (fetch_iot / to_df must be defined)."

    frames = []
    offset = 0
    total = 0

    while total < max_rows:
        rows = fetch_iot(
            city=city,
            from_ts=from_ts,
            to_ts=to_ts,
            only_with_known_location=only_with_known_location,
            source=source,
            order=order,
            limit=page,
            offset=offset,
        )

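        df_page = to_df(rows)
        if df_page.empty:
            break  # no more rows in the window

        frames.append(df_page)
        total += len(df_page)
        offset += page

        if len(df_page) < page:
            break  # short page: nothing left to fetch

    if not frames:
        return pd.DataFrame()
    # Trim in case the last page pushed the total past max_rows
    return pd.concat(frames, ignore_index=True).head(max_rows)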
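
The limit/offset loop in `paginate_iot` can be exercised offline with a stand-in fetcher, which makes the stopping rules easy to check without calling the API. In this sketch, `make_fake_fetch` and `paginate` are hypothetical illustrations, not part of the project's client: the loop stops on an empty page, on a short page, or at the row cap, and steps `offset` by the page size as `paginate_iot` does.

```python
# Hypothetical stand-in for fetch_iot(): serves an in-memory list via limit/offset
# and records the offset of every request it receives.
def make_fake_fetch(n_rows):
    data = [{"row_id": i} for i in range(n_rows)]
    calls = []

    def fake_fetch(limit, offset):
        calls.append(offset)
        return data[offset:offset + limit]

    fake_fetch.calls = calls
    return fake_fetch


def paginate(fetch, page, max_rows):
    """Collect pages until an empty page, a short page, or the max_rows cap."""
    rows, offset = [], 0
    while len(rows) < max_rows:
        batch = fetch(limit=page, offset=offset)
        if not batch:
            break  # nothing at this offset
        rows.extend(batch)
        offset += page
        if len(batch) < page:
            break  # short page: the source is exhausted
    return rows[:max_rows]  # never hand back more than the cap


fetch = make_fake_fetch(1_234)
print(len(paginate(fetch, page=500, max_rows=20_000)), fetch.calls)  # → 1234 [0, 500, 1000]
print(len(paginate(make_fake_fetch(1_234), page=500, max_rows=700)))  # → 700
```

Because this notebook calls `fetch_iot()` with `limit=` and `offset=` keywords, a `functools.partial(fetch_iot, city="Bochum", from_ts=..., to_ts=...)` should be able to stand in for `fetch` here, assuming `fetch_iot()` returns a list of rows.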


## Example 1: Single-Device View (raw data)

### Purpose
Provide a data preview for a single IoT rain sensor.

### What this example does
1. Fetch one time window of IoT rain data for **Bochum**  
2. Convert the API response into a pandas DataFrame  
3. Clean timestamp and rain columns  
4. Automatically pick **one device** (the device with the most rows)  
5. Display the first rows for that device as a plain table  

### Output
A DataFrame showing:
- `minute_ts` – timestamp (UTC)  
- `dev_eui` – device ID  
- `rain_value` – rain measurement



### Fetch one batch of Bochum data (no pagination)

In [13]:
# Fetch one window of data for Bochum
rows1 = fetch_iot(
    city="Bochum",
    from_ts="2025-10-01T00:00:00Z",
    to_ts="2025-10-07T23:59:59Z",
    only_with_known_location=True,
    source="auto",
    order="asc",
    limit=1000,
    offset=0,
)

# Convert API rows -> DataFrame
df = to_df(rows1)

print(f"Rows returned: {len(df)}")
display(df.head())

Rows returned: 1000


|  | dev_eui | minute_ts | rain_value | quality_flag | status | dev_name | longitude | latitude | city | sensor_site_name |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0080E115004E327C | 2025-10-01 00:00:00+00:00 |  | 9 |  |  | 7.2137 | 51.4401 | Bochum | 0080E115004E327C_KemnaderStraße_WeitmarMark |
| 28 | 0080E115004E7C64 | 2025-10-01 00:00:00+00:00 | 0.0 | 1 |  |  | 7.241546 | 51.469333 | Bochum | 0080E115004E7C64_SportplatzPappelbusch_Altenbo... |
| 27 | 0080E115004E6993 | 2025-10-01 00:00:00+00:00 | 0.0 | 1 |  |  | 7.152601 | 51.479614 | Bochum | 0080E115004E6993_LudwigSteilHaus_Wattenscheid |
| 26 | 0080E115004E611F | 2025-10-01 00:00:00+00:00 | 0.0 | 1 |  |  | 7.268676 | 51.491582 | Bochum | 0080E115004E611F_Kornharpen_KornhapenerStraße |
| 25 | 0080E115004E6106 | 2025-10-01 00:00:00+00:00 | 0.0 | 1 |  |  | 7.142429 | 51.5004 | Bochum | 0080E115004E4707_Osterfeldstraße_FFGünnigfeld |
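
Steps 4 and 5 of the list at the top of this example (pick the device with the most rows, then show its first rows) are not part of the cell above. A minimal sketch, assuming `df` has the `dev_eui`, `minute_ts` and `rain_value` columns shown in the preview; `pick_busiest_device` is a hypothetical helper name.

```python
import pandas as pd


def pick_busiest_device(df, n=10):
    """Return (device_id, first n rows) for the device with the most rows."""
    if df.empty or "dev_eui" not in df.columns:
        return None, pd.DataFrame()
    # value_counts() counts rows per device; idxmax() picks the device with the highest count
    device = df["dev_eui"].value_counts().idxmax()
    one_device = (
        df.loc[df["dev_eui"] == device, ["minute_ts", "dev_eui", "rain_value"]]
        .sort_values("minute_ts")
        .head(n)
        .reset_index(drop=True)
    )
    return device, one_device


# Usage in the notebook (df from the cell above):
# device, df_one = pick_busiest_device(df)
# print(f"Device with most rows: {device}")
# display(df_one)
```

With only the first 1,000 rows of the window loaded, "most rows" refers to that batch, not to the whole week.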


## Example 2: Bochum (Oct 1–7) with pagination

**Pagination**

Fetch the same Oct 1–7 window of IoT rain data for **Bochum** as in Example 1, but page through it
with `limit` and `offset` (pagination) instead of stopping at the first batch, and then see which
devices recorded the highest rain values.

**What this example does**

1. Fetches Bochum rain data for **2025-10-01 to 2025-10-07** using pagination.  
2. Combines all pages into a single pandas DataFrame.  
3. Shows a small preview of the paginated data.  

**Outputs**

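- A DataFrame **preview** (`df_paginated.head()`) for Bochum (Oct 1–7).  
- The total row count. If it equals `CAP` (20,000), `paginate_iot()` stopped at the safety cap and the window may hold more rows than were fetched.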
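
Before using a paginated result, it can help to check whether the cap was hit and whether any (device, timestamp) pair appears twice. Many devices share each `minute_ts`, so if the API orders rows only by timestamp, rows near a page boundary could in principle be returned twice or skipped between requests; whether that happens depends on the server, which this notebook doesn't describe. `check_pagination` is a hypothetical helper, sketched under those assumptions.

```python
import pandas as pd


def check_pagination(df, cap, key=("dev_eui", "minute_ts")):
    """Summarize a paginated result: size, cap hit, duplicate keys, device count."""
    key = list(key)
    return {
        "rows": len(df),
        "hit_cap": len(df) >= cap,  # True means the window was probably truncated
        # Rows whose (dev_eui, minute_ts) pair already appeared earlier in the frame
        "duplicate_keys": int(df.duplicated(subset=key).sum()) if not df.empty else 0,
        "devices": df["dev_eui"].nunique() if "dev_eui" in df.columns else 0,
    }


# Usage in the notebook:
# print(check_pagination(df_paginated, cap=CAP))
```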



In [14]:
from_ts = "2025-10-01T00:00:00Z"
to_ts   = "2025-10-07T23:59:59Z"
CITY    = "Bochum"

df_paginated = paginate_iot(
    city=CITY,
    from_ts=from_ts,
    to_ts=to_ts,
    only_with_known_location=True,
    source="auto",
    order="asc",
)

print(f"Rows in window: {len(df_paginated)}")
display(df_paginated.head())


Rows in window: 20000


|  | dev_eui | minute_ts | rain_value | quality_flag | status | dev_name | longitude | latitude | city | sensor_site_name |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0080E115004E327C | 2025-10-01 00:00:00+00:00 |  | 9 |  |  | 7.2137 | 51.4401 | Bochum | 0080E115004E327C_KemnaderStraße_WeitmarMark |
| 1 | 0080E115004E7C64 | 2025-10-01 00:00:00+00:00 | 0.0 | 1 |  |  | 7.241546 | 51.469333 | Bochum | 0080E115004E7C64_SportplatzPappelbusch_Altenbo... |
| 2 | 0080E115004E6993 | 2025-10-01 00:00:00+00:00 | 0.0 | 1 |  |  | 7.152601 | 51.479614 | Bochum | 0080E115004E6993_LudwigSteilHaus_Wattenscheid |
| 3 | 0080E115004E611F | 2025-10-01 00:00:00+00:00 | 0.0 | 1 |  |  | 7.268676 | 51.491582 | Bochum | 0080E115004E611F_Kornharpen_KornhapenerStraße |
| 4 | 0080E115004E6106 | 2025-10-01 00:00:00+00:00 | 0.0 | 1 |  |  | 7.142429 | 51.5004 | Bochum | 0080E115004E4707_Osterfeldstraße_FFGünnigfeld |
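
The goal stated at the top of this example, seeing which devices recorded the highest rain values, isn't covered by the cell above. A minimal sketch, assuming `rain_value` is a per-reading measurement where a larger value means more rain; `top_rain_devices` is a hypothetical helper, and rows without a measurement are ignored.

```python
import pandas as pd


def top_rain_devices(df, n=5):
    """Rank devices by their highest single rain_value in the fetched data."""
    valid = df.dropna(subset=["rain_value"])  # ignore rows without a measurement
    return (
        valid.groupby("dev_eui")["rain_value"]
        .agg(max_rain="max", readings="count")  # named aggregation on a SeriesGroupBy
        .sort_values("max_rain", ascending=False)
        .head(n)
        .reset_index()
    )


# Usage in the notebook:
# display(top_rain_devices(df_paginated))
```

Since `df_paginated` is fetched in ascending time order and capped at 20,000 rows, this ranking only covers the part of the week that was actually fetched.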
