### Writing Pandas DataFrames to TimeDB

This notebook demonstrates how to insert pandas DataFrames directly into TimeDB using the high-level SDK.

#### What you'll learn:
- Using the TimeDB SDK (`import timedb as td`)
- Inserting point-in-time values from DataFrames
- Inserting interval values from DataFrames
- Handling multiple value keys (e.g., mean, quantiles) with automatic conversion
- Working with timezone-aware datetimes

**Note:** The SDK automatically handles DataFrame-to-TimeDB format conversion, so you can work directly with pandas DataFrames!


In [1]:
import uuid
import pandas as pd
from datetime import datetime, timezone, timedelta
from dotenv import load_dotenv

import timedb as td

load_dotenv()

# Create schema (uses TIMEDB_DSN or DATABASE_URL from environment)
td.create()


Creating database schema...
✓ Schema created successfully


## Example 1: Simple Time Series DataFrame

Let's start with a simple DataFrame containing a time series with a single value column. The SDK will automatically convert it to TimeDB format.


In [2]:
# Create a simple time series DataFrame
base_time = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
dates = [base_time + timedelta(hours=i) for i in range(24)]
df = pd.DataFrame({
    'valid_time': dates,
    'value': [100.0 + i * 0.5 for i in range(24)]
})

print("Sample DataFrame:")
print(df.head())
print(f"\nShape: {df.shape}")
print(f"Time range: {df['valid_time'].min()} to {df['valid_time'].max()}")


Sample DataFrame:
                 valid_time  value
0 2025-01-01 00:00:00+00:00  100.0
1 2025-01-01 01:00:00+00:00  100.5
2 2025-01-01 02:00:00+00:00  101.0
3 2025-01-01 03:00:00+00:00  101.5
4 2025-01-01 04:00:00+00:00  102.0

Shape: (24, 2)
Time range: 2025-01-01 00:00:00+00:00 to 2025-01-01 23:00:00+00:00


With the SDK, we can insert the DataFrame directly! The SDK automatically:
- Converts the DataFrame to TimeDB format internally
- Fills in any identifiers and run metadata you don't provide (`run_id`, `workflow_id`, `run_start_time`, `entity_id`, `tenant_id`)
- Uses the column name as `value_key` (e.g., if column is "value", value_key becomes "value")
- For single-tenant installations, uses a default zeros UUID for `tenant_id`

**Under the hood:** The SDK converts each row to `(tenant_id, valid_time, entity_id, value_key, value)` format for point-in-time values.
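The tuple layout described above can be sketched in plain pandas. This is only an illustration of the row format, not the SDK's actual code: `to_point_rows` is a hypothetical helper, and the entity ID is a random placeholder.

```python
import uuid
from datetime import datetime, timedelta, timezone

import pandas as pd

# Assumption: the all-zeros UUID mirrors the single-tenant default described above.
DEFAULT_TENANT_ID = uuid.UUID(int=0)


def to_point_rows(df, entity_id, tenant_id=DEFAULT_TENANT_ID, time_col="valid_time"):
    """Hypothetical helper: flatten a DataFrame into point-in-time tuples.

    Each value column contributes one tuple per row, and the column name
    is used as the value_key.
    """
    value_cols = [c for c in df.columns if c != time_col]
    rows = []
    for value_key in value_cols:
        for valid_time, value in zip(df[time_col], df[value_key]):
            rows.append((tenant_id, valid_time, entity_id, value_key, float(value)))
    return rows


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
sample = pd.DataFrame({
    "valid_time": [start + timedelta(hours=i) for i in range(3)],
    "value": [100.0, 100.5, 101.0],
})

sample_rows = to_point_rows(sample, entity_id=uuid.uuid4())
for row in sample_rows:
    print(row)
```

Looping over columns rather than calling `DataFrame.melt` is a deliberate choice here: `melt` refuses a `value_name` that matches an existing column, and this sample already has a column named `value`.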

Let's insert with minimal parameters (only DataFrame required!):


In [3]:
print(f"Prepared to insert DataFrame with {len(df)} rows")
print(f"DataFrame columns: {list(df.columns)}")
print("Sample data:")
print(df.head(3))
print("\nNote: All IDs (run_id, workflow_id, run_start_time, entity_id, tenant_id) will be auto-generated")


Prepared to insert DataFrame with 24 rows
DataFrame columns: ['valid_time', 'value']
Sample data:
                 valid_time  value
0 2025-01-01 00:00:00+00:00  100.0
1 2025-01-01 01:00:00+00:00  100.5
2 2025-01-01 02:00:00+00:00  101.0

Note: All IDs (run_id, workflow_id, run_start_time, entity_id, tenant_id) will be auto-generated


Now let's insert the data into TimeDB using the SDK:


In [4]:

# Only df is required:
# - tenant_id defaults to the zeros UUID
# - run_id, workflow_id, run_start_time, and entity_id are filled in by the SDK
# - value_key is taken from the column name ("value")
# The function returns the IDs that were used, including the filled-in ones

result = td.insert_run(df=df)

print(f"✓ Successfully inserted run {result.run_id}")
print(f"✓ Tenant ID: {result.tenant_id} (default zeros UUID for single-tenant)")
print(f"✓ Workflow ID: {result.workflow_id}")
print(f"✓ Entity ID: {result.entity_id} (save this if you want to update this entity later)")
print(f"✓ Inserted {len(df)} values from DataFrame")
print("\nThe value_key used was: 'value' (from the DataFrame column name)")

Data values inserted successfully.
✓ Successfully inserted run 21b40dce-154e-4a3c-9be6-64dfebd6eb16
✓ Tenant ID: 00000000-0000-0000-0000-000000000000 (default zeros UUID for single-tenant)
✓ Workflow ID: sdk-insert
✓ Entity ID: 804f8efe-3f68-4ce7-ba5d-3de870184a9d (save this if you want to update this entity later)
✓ Inserted 24 values from DataFrame

The value_key used was: 'value' (from the DataFrame column name)


## Example 2: DataFrame with Multiple Value Keys

Often you'll have multiple value types in the same DataFrame (e.g., mean, quantiles, min, max). The SDK automatically handles this by melting the DataFrame internally!


In [5]:
# Create a DataFrame with multiple value columns
base_time = datetime(2025, 1, 2, 0, 0, tzinfo=timezone.utc)
dates = [base_time + timedelta(hours=i) for i in range(12)]

df_multi = pd.DataFrame({
    'valid_time': dates,
    'mean': [100.0 + i * 0.5 for i in range(12)],
    'quantile:0.1': [95.0 + i * 0.4 for i in range(12)],
    'quantile:0.9': [105.0 + i * 0.6 for i in range(12)],
    'min': [90.0 + i * 0.3 for i in range(12)],
    'max': [110.0 + i * 0.7 for i in range(12)],
})

print("DataFrame with multiple value columns:")
print(df_multi.head())


DataFrame with multiple value columns:
                 valid_time   mean  quantile:0.1  quantile:0.9   min    max
0 2025-01-02 00:00:00+00:00  100.0          95.0         105.0  90.0  110.0
1 2025-01-02 01:00:00+00:00  100.5          95.4         105.6  90.3  110.7
2 2025-01-02 02:00:00+00:00  101.0          95.8         106.2  90.6  111.4
3 2025-01-02 03:00:00+00:00  101.5          96.2         106.8  90.9  112.1
4 2025-01-02 04:00:00+00:00  102.0          96.6         107.4  91.2  112.8


The SDK automatically melts the DataFrame internally, so each value type becomes a separate row. You can either:
- Let the SDK auto-detect all value columns (all columns except `valid_time`)
- Explicitly specify which columns to use with the `value_columns` parameter

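If auto-detection really does take every column other than `valid_time`, a stray text column would be picked up as well. The sketch below shows one way to preview what would be detected before inserting. Both helpers are hypothetical and are not part of the SDK, and the optional `time_end_col` argument is an assumption for interval DataFrames.

```python
from datetime import datetime, timezone

import pandas as pd


def detect_value_columns(df, time_col="valid_time", time_end_col=None):
    """Hypothetical helper: every column that is not a time column."""
    reserved = {time_col, time_end_col}
    return [c for c in df.columns if c not in reserved]


def non_numeric_columns(df, columns):
    """Return the subset of columns whose dtype is not numeric."""
    return [c for c in columns if not pd.api.types.is_numeric_dtype(df[c])]


frame = pd.DataFrame({
    "valid_time": [datetime(2025, 1, 2, h, tzinfo=timezone.utc) for h in range(3)],
    "mean": [100.0, 100.5, 101.0],
    "quantile:0.9": [105.0, 105.6, 106.2],
    "note": ["ok", "ok", "check"],  # stray label column
})

detected = detect_value_columns(frame)
suspicious = non_numeric_columns(frame, detected)
print("Detected:", detected)
print("Non-numeric:", suspicious)
```

When the second list is non-empty, passing an explicit `value_columns` list is the safer option.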
Let's insert it:


In [6]:
# Option 1: Let SDK auto-detect value columns (all except valid_time)
# Option 2: Explicitly specify value columns
# We'll use option 2 for clarity

result_multi = td.insert_run(
    df=df_multi,
    value_columns=['mean', 'quantile:0.1', 'quantile:0.9', 'min', 'max'],  # SDK will auto-melt these
    # Column names become value_keys: 'mean', 'quantile:0.1', etc.
)

print(f"✓ Successfully inserted run {result_multi.run_id}")
print(f"✓ Entity ID: {result_multi.entity_id}")
print(f"✓ Inserted {len(df_multi) * 5} values (12 rows × 5 value keys)")
print(f"✓ Value keys used: {['mean', 'quantile:0.1', 'quantile:0.9', 'min', 'max']}")


Data values inserted successfully.
✓ Successfully inserted run 83331c2d-3ded-4194-b66f-ca160971ba60
✓ Entity ID: acebfaef-8d95-446b-8c73-c8ebc7ec3dc3
✓ Inserted 60 values (12 rows × 5 value keys)
✓ Value keys used: ['mean', 'quantile:0.1', 'quantile:0.9', 'min', 'max']


In [7]:
# Alternative: Let SDK auto-detect (uses all columns except valid_time)
result_multi_auto = td.insert_run(
    df=df_multi,
    # No value_columns specified - SDK auto-detects all value columns
    # Column names automatically become value_keys
)

print(f"✓ Successfully inserted run {result_multi_auto.run_id} (auto-detected columns)")
print(f"✓ Entity ID: {result_multi_auto.entity_id}")
print(f"✓ SDK auto-detected value columns: {[c for c in df_multi.columns if c != 'valid_time']}")


Data values inserted successfully.
✓ Successfully inserted run 8f94b941-b95c-4b2f-8dd5-88582d9f4ee3 (auto-detected columns)
✓ Entity ID: 1b1e565e-6d0d-4a3e-95a1-f72ab5d47a05
✓ SDK auto-detected value columns: ['mean', 'quantile:0.1', 'quantile:0.9', 'min', 'max']


## Example 3: Interval Values

TimeDB also supports interval values (values that are valid over a time range). The SDK handles this automatically when you specify the `valid_time_end_col` parameter.


In [None]:
# Create a DataFrame with intervals
base_time = datetime(2025, 1, 3, 0, 0, tzinfo=timezone.utc)

df_intervals = pd.DataFrame({
    'valid_time': [base_time + timedelta(hours=i*3) for i in range(8)],
    'valid_time_end': [base_time + timedelta(hours=(i+1)*3) for i in range(8)],
    'value': [50.0 + i * 2.0 for i in range(8)],
})

print("Interval DataFrame:")
print(df_intervals)


For interval values, the SDK automatically converts to:
- `(tenant_id, valid_time, valid_time_end, entity_id, value_key, value)`
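Before inserting intervals, it can be worth checking that each one ends after it starts and that none overlap. The sketch below is a hypothetical pre-insert check in plain pandas. Whether TimeDB itself enforces either rule is not stated here, so treat both as assumptions about what well-formed input looks like.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd


def check_intervals(df, start_col="valid_time", end_col="valid_time_end"):
    """Hypothetical pre-insert check; returns a list of problem descriptions."""
    problems = []

    # Every interval should end strictly after it starts.
    empty_or_reversed = int((df[end_col] <= df[start_col]).sum())
    if empty_or_reversed:
        problems.append(f"{empty_or_reversed} interval(s) end at or before their start")

    # After sorting by start, an interval overlaps its successor if the
    # successor starts before this one ends. The last row compares against
    # NaT, which evaluates to False.
    ordered = df.sort_values(start_col)
    overlapping = int((ordered[start_col].shift(-1) < ordered[end_col]).sum())
    if overlapping:
        problems.append(f"{overlapping} interval(s) overlap the next one")

    return problems


t0 = datetime(2025, 1, 3, tzinfo=timezone.utc)
clean = pd.DataFrame({
    "valid_time": [t0 + timedelta(hours=3 * i) for i in range(4)],
    "valid_time_end": [t0 + timedelta(hours=3 * (i + 1)) for i in range(4)],
    "value": [50.0, 52.0, 54.0, 56.0],
})

print(check_intervals(clean))
```

Back-to-back intervals, where one ends exactly when the next begins, pass the check because the overlap comparison is strict.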

Just specify the `valid_time_end_col` parameter:


In [None]:
# Insert interval values - SDK handles conversion automatically!
result_intervals = td.insert_run(
    df=df_intervals,
    valid_time_end_col='valid_time_end',  # Specify the interval end column
    # value_key will be "value" (from column name) if not specified
    # All IDs auto-generated (tenant_id defaults to zeros UUID)
)

print(f"✓ Successfully inserted run {result_intervals.run_id}")
print(f"✓ Entity ID: {result_intervals.entity_id}")
print(f"✓ Inserted {len(df_intervals)} interval values")
print("✓ Value key used: 'value' (from DataFrame column name)")


## Understanding the SDK

The SDK handles DataFrame conversion automatically, but it helps to see what happens under the hood:


In [None]:
# The SDK internally converts DataFrames to this format:
# Point-in-time: (tenant_id, valid_time, entity_id, value_key, value)
# Interval: (tenant_id, valid_time, valid_time_end, entity_id, value_key, value)

# For multiple value columns, the SDK automatically:
# 1. Melts the DataFrame (converts wide to long format)
# 2. Creates rows for each (valid_time, value_key, value) combination
# 3. Converts to TimeDB tuple format

# Example: What the SDK does internally for df_multi
print("Original DataFrame (wide format):")
print(df_multi.head(3))
print(f"\nShape: {df_multi.shape}")

# SDK melts it internally (shown here for demonstration)
df_melted_demo = df_multi.melt(
    id_vars=['valid_time'],
    value_vars=['mean', 'quantile:0.1', 'quantile:0.9', 'min', 'max'],
    var_name='value_key',
    value_name='value'
)
print("\nAfter melting (long format - what SDK uses internally):")
print(df_melted_demo.head(6))
print(f"\nShape: {df_melted_demo.shape}")
print("\nEach row becomes a TimeDB value row with the value_key from the column name!")


## SDK Features Summary

The SDK provides a simple interface while handling all the complexity:


In [None]:
print("SDK Features:")
print("=" * 60)
print("✓ Automatic DataFrame conversion to TimeDB format")
print("✓ Handles single value columns (with value_key parameter)")
print("✓ Handles multiple value columns (auto-melts, uses column names as value_keys)")
print("✓ Supports point-in-time values (default)")
print("✓ Supports interval values (with valid_time_end_col parameter)")
print("✓ Auto-detects value columns if not specified")
print("✓ Validates timezone-aware datetimes")
print("✓ Atomic inserts (all-or-nothing)")
print("\nSimple API:")
print("  td.create()  # Create schema")
print("  td.insert_run(df=df)  # Insert DataFrame")
print("\nNo manual conversion needed - just pass your DataFrame!")


## Summary

You've learned how to:
1. ✅ Use the TimeDB SDK (`import timedb as td`)
2. ✅ Insert simple time series DataFrames (single value column)
3. ✅ Insert DataFrames with multiple value keys (automatic conversion)
4. ✅ Insert interval values (with `valid_time_end_col` parameter)

**Key Points:**
- Use `td.create()` to set up the database schema
- Use `td.insert_run()` with your DataFrame - conversion is automatic!
- **Minimal parameters**: Only `df` is required - everything else is filled in for you!
  - `tenant_id` defaults to the zeros UUID (00000000-0000-0000-0000-000000000000) for single-tenant
  - `run_id`, `run_start_time`, and `entity_id` are auto-generated; `workflow_id` falls back to a default (`sdk-insert` in the run above)
- **Return value**: The function returns an `InsertResult` with all IDs used (save `entity_id` for updates!)
- **Value keys**: Automatically use column names (e.g., column "value" → value_key "value")
- Specify `value_columns` list for multiple value columns (or let SDK auto-detect)
- Specify `valid_time_end_col` for interval values
- All datetimes must be timezone-aware (SDK validates this)
- For multi-tenant installations, provide `tenant_id` explicitly
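Because datetimes must be timezone-aware, a DataFrame with naive timestamps needs fixing before it is inserted. Here is a minimal sketch using standard pandas calls. `ensure_tz_aware` is a hypothetical helper, and treating naive values as UTC is an assumption you should only make if you know the data was recorded in UTC.

```python
from datetime import datetime

import pandas as pd


def ensure_tz_aware(df, time_cols=("valid_time",), assume_tz="UTC"):
    """Hypothetical helper: return a copy whose time columns are tz-aware.

    Naive columns are localized to assume_tz; columns that already carry
    a timezone are left unchanged.
    """
    out = df.copy()
    for col in time_cols:
        series = pd.to_datetime(out[col])
        if series.dt.tz is None:
            series = series.dt.tz_localize(assume_tz)
        out[col] = series
    return out


naive = pd.DataFrame({
    "valid_time": [datetime(2025, 1, 1, h) for h in range(3)],  # no tzinfo
    "value": [1.0, 2.0, 3.0],
})

fixed = ensure_tz_aware(naive)
print(naive["valid_time"].dt.tz)
print(fixed["valid_time"].dt.tz)
```

For interval data, pass both columns, e.g. `time_cols=("valid_time", "valid_time_end")`.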

**Entity ID for Updates:**
- If you plan to update a time series later, either:
  - Provide `entity_id` when inserting, OR
  - Save `result.entity_id` from the return value
- Without the `entity_id`, you won't be able to identify which entity to update
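One way to avoid storing returned IDs is to derive a stable `entity_id` from a name you already know. The sketch below uses only the standard library. The namespace string and series names are made up for illustration, and whether `insert_run` takes `entity_id` as a `uuid.UUID` or a string is an assumption to check against the SDK.

```python
import uuid

# Assumption: a private namespace chosen once per project (made-up domain).
SERIES_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "timeseries.example.com")


def entity_id_for(series_name: str) -> uuid.UUID:
    """Hypothetical helper: the same name always maps to the same UUID."""
    return uuid.uuid5(SERIES_NAMESPACE, series_name)


entity_id = entity_id_for("site-a/temperature")
print(entity_id)

# With the SDK available, the ID would be supplied at insert time, e.g.:
# td.insert_run(df=df, entity_id=entity_id)
```

Saving `result.entity_id` works just as well. The derived ID only removes the need to store it somewhere.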

**Under the hood:**
- Point-in-time: `(tenant_id, valid_time, entity_id, value_key, value)`
- Intervals: `(tenant_id, valid_time, valid_time_end, entity_id, value_key, value)`
- Multiple value columns are automatically melted by the SDK
- Column names become value_keys automatically

Next: See `notebook_02_read_dataframe.ipynb` to learn how to read data back into pandas DataFrames!
