# Data Backup - Current Non-Historic Data

**Purpose:** Back up all current non-historic data from the database before the refactoring.

**Date:** October 1, 2025

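This notebook will:
1. Connect to the database
2. Fetch the current data: all sensors, clean_measurements and grid_runs from the last 30 days, and current-source raw_measurements from the last 7 days
3. Store them as JSON files for recovery
4. Create a manifest file with metadata and a README with recovery instructions
5. Verify the backup files and zip the backup folder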

---

## 1. Setup & Configuration

In [17]:
# Import required libraries
import os
import json
import pandas as pd
import sqlalchemy as sa
from datetime import datetime, timezone
from pathlib import Path
from dotenv import load_dotenv

print("Libraries imported successfully")

Libraries imported successfully


In [18]:
# Load environment variables
load_dotenv(Path(".env"), override=False)

DATABASE_URL = os.getenv("DATABASE_URL")
if not DATABASE_URL:
    raise ValueError("DATABASE_URL not found in environment variables")

print("✓ Database URL loaded")
print(f"  Database: {DATABASE_URL.split('@')[1] if '@' in DATABASE_URL else 'configured'}")

✓ Database URL loaded
  Database: db.shizuku.02labs.me/d6ijt5s230gvd1?sslmode=require
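The cell above prints everything after the first `@`, which includes the database name and the query string; if a password ever contained an unescaped `@`, part of it would be printed too. A minimal sketch using only the standard library's `urllib.parse.urlsplit`, which splits the credentials off at the *last* `@`. The helper name `describe_db_url` and the example URLs are hypothetical:

```python
from urllib.parse import urlsplit


def describe_db_url(url: str) -> str:
    """Return 'host[:port]/dbname' for a database URL, never the credentials or query."""
    parts = urlsplit(url)
    host = parts.hostname or "unknown-host"
    port = f":{parts.port}" if parts.port else ""
    dbname = parts.path.lstrip("/")
    return f"{host}{port}/{dbname}"


# Hypothetical URL; the notebook would pass DATABASE_URL instead
print(describe_db_url("postgresql://user:s3cr%40t@db.example.com:5432/mydb?sslmode=require"))
```

In the notebook this would read `print(f"  Database: {describe_db_url(DATABASE_URL)}")`.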


In [19]:
# Create backup directory
BACKUP_DIR = Path("data_backup")
BACKUP_DIR.mkdir(exist_ok=True)

# Create timestamped subdirectory
timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
BACKUP_SUBDIR = BACKUP_DIR / f"backup_{timestamp}"
BACKUP_SUBDIR.mkdir(exist_ok=True)

print(f"✓ Backup directory created: {BACKUP_SUBDIR}")

✓ Backup directory created: data_backup/backup_20251001_121447


In [20]:
# Create database engine
engine = sa.create_engine(DATABASE_URL, pool_pre_ping=True)

# Test connection
with engine.connect() as conn:
    result = conn.execute(sa.text("SELECT version();"))
    version = result.scalar()
    print("✓ Database connection successful")
    print(f"  PostgreSQL version: {version.split(',')[0]}")

✓ Database connection successful
  PostgreSQL version: PostgreSQL 17.4 on aarch64-unknown-linux-gnu


## 2. Fetch Sensors Data

In [21]:
# Fetch all sensors
query = """
SELECT 
    id,
    name,
    provider_id,
    lat,
    lon,
    elevation_m,
    city,
    subbasin,
    barrio,
    metadata,
    created_at,
    updated_at
FROM sensors
ORDER BY id;
"""

sensors_df = pd.read_sql(query, engine)

print(f"✓ Fetched {len(sensors_df)} sensors")
print(f"  First sensor: {sensors_df['id'].iloc[0] if len(sensors_df) > 0 else 'N/A'}")
print(f"  Last sensor: {sensors_df['id'].iloc[-1] if len(sensors_df) > 0 else 'N/A'}")

sensors_df.head()

✓ Fetched 239 sensors
  First sensor: pluvio_1
  Last sensor: vaisala_83


|   | id | name | provider_id | lat | lon | elevation_m | city | subbasin | barrio | metadata | created_at | updated_at |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | pluvio_1 | Casa de Gobierno Altavista | 1 | 6.2226 | -75.6282 |  | Medellin | Q. Altavista | Altavista - Sector central | {'barrio': 'Altavista - Sector central', 'comu... | 2025-09-29 02:56:01.458825+00:00 | 2025-10-01 12:10:50.464704+00:00 |
| 1 | pluvio_10 | Escuela Rural El Boqueron | 10 | 6.315334 | -75.657627 |  | Medellin | Q. La Iguana | Boqueron | {'barrio': 'Boqueron', 'comuna': '', 'source':... | 2025-09-29 02:56:01.458825+00:00 | 2025-10-01 12:10:50.464704+00:00 |
| 2 | pluvio_1019 | Torre SIATA - Thies | 1019 | 6.259215 | -75.58864 |  | Medellin | Q. La Hueso |  | {'barrio': '', 'comuna': '', 'source': 'curren... | 2025-09-29 02:56:01.458825+00:00 | 2025-10-01 12:10:50.464704+00:00 |
| 3 | pluvio_105 | Parque 3 Aguas | 105 | 6.09628 | -75.63536 |  | Caldas | R. Aburra-Medellin | La Miel | {'barrio': 'La Miel', 'comuna': '', 'source': ... | 2025-09-29 02:56:01.458825+00:00 | 2025-10-01 12:10:50.464704+00:00 |
| 4 | pluvio_11 | Escuela Rural Fabio Zuluaga | 11 | 6.273131 | -75.651637 |  | Medellin | Q. La Iguana | La Palma | {'barrio': 'La Palma', 'comuna': '', 'source':... | 2025-09-29 02:56:01.458825+00:00 | 2025-10-01 12:10:50.464704+00:00 |


In [22]:
# Save sensors to JSON
sensors_file = BACKUP_SUBDIR / "sensors.json"

# Convert timestamps to ISO format
sensors_export = sensors_df.copy()
sensors_export['created_at'] = sensors_export['created_at'].astype(str)
sensors_export['updated_at'] = sensors_export['updated_at'].astype(str)

# Convert to dict and save
sensors_data = sensors_export.to_dict(orient='records')
with open(sensors_file, 'w') as f:
    json.dump(sensors_data, f, indent=2, default=str)

print(f"✓ Saved sensors to: {sensors_file}")
print(f"  File size: {sensors_file.stat().st_size / 1024:.2f} KB")

✓ Saved sensors to: data_backup/backup_20251001_121447/sensors.json
  File size: 122.76 KB
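The export cells turn timestamp columns into strings with `astype(str)`, which yields space-separated values such as `2025-09-29 02:56:01.458825+00:00`. Some consumers (for example JavaScript's `Date.parse`) only guarantee support for the `T`-separated ISO 8601 form. An alternative sketch, not what this notebook does, is to leave the columns as timestamps and give `json.dump` a `default=` hook; because `pd.Timestamp` subclasses `datetime`, one hook covers both. The name `iso_default` is hypothetical:

```python
import json
from datetime import date, datetime, timezone


def iso_default(obj):
    """json.dump fallback: ISO 8601 for dates/datetimes, str() for anything else."""
    if isinstance(obj, date):  # datetime and pd.Timestamp both subclass date
        return obj.isoformat()  # e.g. 2025-10-01T12:01:15+00:00
    return str(obj)


record = {
    "sensor_id": "pluvio_1",
    "ts": datetime(2025, 10, 1, 12, 1, 15, tzinfo=timezone.utc),
    "value_mm": 0.0,
}
print(json.dumps(record, default=iso_default))
```

On the restore side, `datetime.fromisoformat` parses both the `T` form and the space-separated form the notebook currently writes.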


## 3. Fetch Clean Measurements (Non-Historic)

We'll fetch clean measurements from the last 30 days to capture current operational data.

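**Note:** Clean measurements may include imputed values using:
- **Primary method:** ARIMA (AutoRegressive Integrated Moving Average)
- **Fallback method:** 0 (zero) when ARIMA cannot be applied

Existing rows can also carry other methods (the sample output below includes `global_median`), so check the `imputation_method` column instead of assuming every imputed value came from ARIMA.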

In [23]:
# Fetch clean measurements from last 30 days
query = """
SELECT 
    id,
    sensor_id,
    ts,
    value_mm,
    qc_flags,
    imputation_method,
    version,
    created_at,
    updated_at
FROM clean_measurements
WHERE ts >= NOW() - INTERVAL '30 days'
ORDER BY ts DESC, sensor_id;
"""

clean_measurements_df = pd.read_sql(query, engine)

print(f"✓ Fetched {len(clean_measurements_df):,} clean measurements")
if len(clean_measurements_df) > 0:
    print(f"  Date range: {clean_measurements_df['ts'].min()} to {clean_measurements_df['ts'].max()}")
    print(f"  Unique sensors: {clean_measurements_df['sensor_id'].nunique()}")
    print(f"  Total measurements per sensor (avg): {len(clean_measurements_df) / clean_measurements_df['sensor_id'].nunique():.1f}")

clean_measurements_df.head()

✓ Fetched 74,806 clean measurements
  Date range: 2025-09-29 00:50:51+00:00 to 2025-10-01 12:01:15+00:00
  Unique sensors: 226
  Total measurements per sensor (avg): 331.0


|   | id | sensor_id | ts | value_mm | qc_flags | imputation_method | version | created_at | updated_at |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 636194 | pluvio_1 | 2025-10-01 12:01:15+00:00 | 0.0 | 0 |  | 1 | 2025-10-01 12:01:22.025362+00:00 | 2025-10-01 12:01:22.025362+00:00 |
| 1 | 636195 | pluvio_10 | 2025-10-01 12:01:15+00:00 | 0.0 | 2 | global_median | 1 | 2025-10-01 12:01:22.025362+00:00 | 2025-10-01 12:01:22.025362+00:00 |
| 2 | 636196 | pluvio_1019 | 2025-10-01 12:01:15+00:00 | 0.0 | 0 |  | 1 | 2025-10-01 12:01:22.025362+00:00 | 2025-10-01 12:01:22.025362+00:00 |
| 3 | 636197 | pluvio_105 | 2025-10-01 12:01:15+00:00 | 0.0 | 0 |  | 1 | 2025-10-01 12:01:22.025362+00:00 | 2025-10-01 12:01:22.025362+00:00 |
| 4 | 636198 | pluvio_11 | 2025-10-01 12:01:15+00:00 | 0.0 | 0 |  | 1 | 2025-10-01 12:01:22.025362+00:00 | 2025-10-01 12:01:22.025362+00:00 |
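Because the sample rows include `global_median` alongside the ARIMA/zero strategy described above, it may be worth tallying which `imputation_method` values actually appear before relying on that description. A small sketch over records shaped like `clean_measurements_df.to_dict(orient='records')`; the sample records and the name `summarize_imputation` are illustrative:

```python
from collections import Counter


def summarize_imputation(records):
    """Count rows per imputation_method; rows with no method (NULL/None) count as 'none'."""
    return Counter(r.get("imputation_method") or "none" for r in records)


sample = [
    {"sensor_id": "pluvio_1", "imputation_method": None},
    {"sensor_id": "pluvio_10", "imputation_method": "global_median"},
    {"sensor_id": "pluvio_1019", "imputation_method": None},
]
print(summarize_imputation(sample))
```

On the live data the pandas equivalent is `clean_measurements_df['imputation_method'].value_counts(dropna=False)`.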


In [24]:
# Save clean measurements to JSON (chunked if large)
clean_file = BACKUP_SUBDIR / "clean_measurements.json"

# Convert timestamps to ISO format
clean_export = clean_measurements_df.copy()
clean_export['ts'] = clean_export['ts'].astype(str)
clean_export['created_at'] = clean_export['created_at'].astype(str)
clean_export['updated_at'] = clean_export['updated_at'].astype(str)

# Convert to dict and save
clean_data = clean_export.to_dict(orient='records')

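# If the serialized JSON is too large (>50 MB), split it into chunks.
# Measure the UTF-8 bytes of the same indent=2 text that is written to disk.
import sys
data_size = len(json.dumps(clean_data, indent=2, default=str).encode("utf-8"))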

if data_size > 50 * 1024 * 1024:  # 50 MB
    print(f"  Data size: {data_size / (1024*1024):.2f} MB - splitting into chunks")
    chunk_size = 10000
    for i in range(0, len(clean_data), chunk_size):
        chunk = clean_data[i:i+chunk_size]
        chunk_file = BACKUP_SUBDIR / f"clean_measurements_chunk_{i//chunk_size + 1}.json"
        with open(chunk_file, 'w') as f:
            json.dump(chunk, f, indent=2, default=str)
        print(f"  ✓ Saved chunk {i//chunk_size + 1} ({len(chunk)} records)")
else:
    with open(clean_file, 'w') as f:
        json.dump(clean_data, f, indent=2, default=str)
    print(f"✓ Saved clean measurements to: {clean_file}")
    print(f"  File size: {clean_file.stat().st_size / 1024:.2f} KB")

✓ Saved clean measurements to: data_backup/backup_20251001_121447/clean_measurements.json
  File size: 21530.34 KB
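The clean and raw export cells repeat the same size-check-and-split logic. A sketch of one helper both could call, assuming the notebook's 50 MB threshold, 10,000-record chunks, and `indent=2` formatting; `write_json_maybe_chunked` is a hypothetical name. It measures the UTF-8 bytes of the exact text it writes, so the threshold applies to the real file size:

```python
import json
import tempfile
from pathlib import Path

MAX_BYTES = 50 * 1024 * 1024  # same 50 MB threshold as the notebook
CHUNK_SIZE = 10_000           # records per chunk file


def write_json_maybe_chunked(records, out_dir, stem, max_bytes=MAX_BYTES, chunk_size=CHUNK_SIZE):
    """Write records to <stem>.json, or to <stem>_chunk_N.json files if too large.

    Returns the list of paths written.
    """
    out_dir = Path(out_dir)
    text = json.dumps(records, indent=2, default=str)
    if len(text.encode("utf-8")) <= max_bytes:
        path = out_dir / f"{stem}.json"
        path.write_text(text, encoding="utf-8")
        return [path]

    written = []
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        path = out_dir / f"{stem}_chunk_{start // chunk_size + 1}.json"
        path.write_text(json.dumps(chunk, indent=2, default=str), encoding="utf-8")
        written.append(path)
    return written


# Usage with a tiny threshold to force chunking (illustrative data)
with tempfile.TemporaryDirectory() as tmp:
    rows = [{"id": i, "value_mm": 0.0} for i in range(25)]
    paths = write_json_maybe_chunked(rows, tmp, "clean_measurements", max_bytes=100, chunk_size=10)
    print([p.name for p in paths])
```

With it, the clean-measurements cell would reduce to `write_json_maybe_chunked(clean_data, BACKUP_SUBDIR, "clean_measurements")`, and the raw cell to the same call with `"raw_measurements_current"`. The whole document is serialized once in memory, which is unproblematic at the ~20-30 MB sizes seen here.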


## 4. Fetch Grid Runs Data

In [25]:
# Fetch grid runs from last 30 days
query = """
SELECT 
    id,
    ts,
    res_m,
    bbox,
    crs,
    blob_url_json,
    blob_url_npz,
    blob_url_contours,
    status,
    message,
    created_at,
    updated_at
FROM grid_runs
WHERE ts >= NOW() - INTERVAL '30 days'
ORDER BY ts DESC;
"""

grid_runs_df = pd.read_sql(query, engine)

print(f"✓ Fetched {len(grid_runs_df)} grid runs")
if len(grid_runs_df) > 0:
    print(f"  Date range: {grid_runs_df['ts'].min()} to {grid_runs_df['ts'].max()}")
    print(f"  Status breakdown:")
    print(grid_runs_df['status'].value_counts().to_string())

grid_runs_df.head()

✓ Fetched 277 grid runs
  Date range: 2025-09-29 00:00:00+00:00 to 2025-10-01 11:00:00+00:00
  Status breakdown:
status
done      273
failed      4


|   | id | ts | res_m | bbox | crs | blob_url_json | blob_url_npz | blob_url_contours | status | message | created_at | updated_at |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 318 | 2025-10-01 11:00:00+00:00 | 500 | [-8533035.968636995, 604561.383004428, -830705... | EPSG:3857 | https://nt9pzjxsvf6ahuq3.public.blob.vercel-st... | https://nt9pzjxsvf6ahuq3.public.blob.vercel-st... | https://nt9pzjxsvf6ahuq3.public.blob.vercel-st... | done |  | 2025-10-01 12:00:59.192186+00:00 | 2025-10-01 12:01:01.936731+00:00 |
| 1 | 317 | 2025-10-01 10:00:00+00:00 | 500 | [-8533035.968636995, 604561.383004428, -830705... | EPSG:3857 | https://nt9pzjxsvf6ahuq3.public.blob.vercel-st... | https://nt9pzjxsvf6ahuq3.public.blob.vercel-st... | https://nt9pzjxsvf6ahuq3.public.blob.vercel-st... | done |  | 2025-10-01 11:00:41.891752+00:00 | 2025-10-01 11:00:44.845568+00:00 |
| 2 | 316 | 2025-10-01 09:00:00+00:00 | 500 | [-8533035.968636995, 604561.383004428, -830705... | EPSG:3857 | https://nt9pzjxsvf6ahuq3.public.blob.vercel-st... | https://nt9pzjxsvf6ahuq3.public.blob.vercel-st... | https://nt9pzjxsvf6ahuq3.public.blob.vercel-st... | done |  | 2025-10-01 10:00:44.118530+00:00 | 2025-10-01 10:00:46.387564+00:00 |
| 3 | 315 | 2025-10-01 08:00:00+00:00 | 500 | [-8533035.968636995, 604561.383004428, -830705... | EPSG:3857 | https://nt9pzjxsvf6ahuq3.public.blob.vercel-st... | https://nt9pzjxsvf6ahuq3.public.blob.vercel-st... | https://nt9pzjxsvf6ahuq3.public.blob.vercel-st... | done |  | 2025-10-01 09:00:36.753405+00:00 | 2025-10-01 09:00:39.334860+00:00 |
| 4 | 314 | 2025-10-01 07:00:00+00:00 | 500 | [-8533035.968636995, 604561.383004428, -830705... | EPSG:3857 | https://nt9pzjxsvf6ahuq3.public.blob.vercel-st... | https://nt9pzjxsvf6ahuq3.public.blob.vercel-st... | https://nt9pzjxsvf6ahuq3.public.blob.vercel-st... | done |  | 2025-10-01 08:00:39.877302+00:00 | 2025-10-01 08:00:42.088216+00:00 |


In [26]:
# Save grid runs to JSON
grid_file = BACKUP_SUBDIR / "grid_runs.json"

# Convert timestamps to ISO format
grid_export = grid_runs_df.copy()
grid_export['ts'] = grid_export['ts'].astype(str)
grid_export['created_at'] = grid_export['created_at'].astype(str)
grid_export['updated_at'] = grid_export['updated_at'].astype(str)
# Convert bbox JSONB to string
grid_export['bbox'] = grid_export['bbox'].astype(str)

# Convert to dict and save
grid_data = grid_export.to_dict(orient='records')
with open(grid_file, 'w') as f:
    json.dump(grid_data, f, indent=2, default=str)

print(f"✓ Saved grid runs to: {grid_file}")
print(f"  File size: {grid_file.stat().st_size / 1024:.2f} KB")

✓ Saved grid runs to: data_backup/backup_20251001_121447/grid_runs.json
  File size: 188.90 KB
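The grid-runs export stores `bbox` with `astype(str)`, so in `grid_runs.json` it is a string holding the Python list repr rather than a JSON array. For a list of finite floats that repr happens to be valid JSON, so a restore script could parse it back with `json.loads`. A sketch with a made-up bbox value; the helper `parse_bbox` and the four-element `[minx, miny, maxx, maxy]` layout are assumptions:

```python
import json


def parse_bbox(raw):
    """Turn a stringified bbox back into a list of floats.

    Accepts the string written by astype(str) or an already-parsed list.
    Assumes four numbers, [minx, miny, maxx, maxy], in the run's CRS (EPSG:3857 here).
    """
    values = json.loads(raw) if isinstance(raw, str) else raw
    bbox = [float(v) for v in values]
    if len(bbox) != 4:
        raise ValueError(f"expected 4 bbox values, got {len(bbox)}")
    return bbox


# Illustrative value, not taken from the backup
stored = str([-8500000.0, 600000.0, -8300000.0, 720000.0])
print(parse_bbox(stored))
```

Dropping the `astype(str)` line for `bbox` would instead let `json.dump` write the list as a native JSON array, since the standard `json` module serializes lists directly.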


## 5. Fetch Raw Measurements (Last 7 days only)

In [27]:
# Fetch raw measurements from last 7 days (non-historic)
query = """
SELECT 
    id,
    sensor_id,
    ts,
    value_mm,
    quality,
    variable,
    source,
    ingested_at,
    created_at,
    updated_at
FROM raw_measurements
WHERE ts >= NOW() - INTERVAL '7 days'
  AND source = 'current'
ORDER BY ts DESC, sensor_id;
"""

raw_measurements_df = pd.read_sql(query, engine)

print(f"✓ Fetched {len(raw_measurements_df):,} raw measurements (current, last 7 days)")
if len(raw_measurements_df) > 0:
    print(f"  Date range: {raw_measurements_df['ts'].min()} to {raw_measurements_df['ts'].max()}")
    print(f"  Unique sensors: {raw_measurements_df['sensor_id'].nunique()}")

raw_measurements_df.head()

✓ Fetched 75,032 raw measurements (current, last 7 days)
  Date range: 2025-09-29 00:50:51+00:00 to 2025-10-01 12:10:49+00:00
  Unique sensors: 226


|   | id | sensor_id | ts | value_mm | quality | variable | source | ingested_at | created_at | updated_at |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 636434 | pluvio_1 | 2025-10-01 12:10:49+00:00 | 0.0 |  | precipitacion | current | 2025-10-01 12:10:50.592896+00:00 | 2025-10-01 12:10:50.592896+00:00 | 2025-10-01 12:10:50.592896+00:00 |
| 1 | 636422 | pluvio_10 | 2025-10-01 12:10:49+00:00 |  |  | precipitacion | current | 2025-10-01 12:10:50.592896+00:00 | 2025-10-01 12:10:50.592896+00:00 | 2025-10-01 12:10:50.592896+00:00 |
| 2 | 636629 | pluvio_1019 | 2025-10-01 12:10:49+00:00 | 0.0 |  | precipitacion | current | 2025-10-01 12:10:50.592896+00:00 | 2025-10-01 12:10:50.592896+00:00 | 2025-10-01 12:10:50.592896+00:00 |
| 3 | 636500 | pluvio_105 | 2025-10-01 12:10:49+00:00 | 0.0 |  | precipitacion | current | 2025-10-01 12:10:50.592896+00:00 | 2025-10-01 12:10:50.592896+00:00 | 2025-10-01 12:10:50.592896+00:00 |
| 4 | 636425 | pluvio_11 | 2025-10-01 12:10:49+00:00 | 0.0 |  | precipitacion | current | 2025-10-01 12:10:50.592896+00:00 | 2025-10-01 12:10:50.592896+00:00 | 2025-10-01 12:10:50.592896+00:00 |


In [28]:
# Save raw measurements to JSON
raw_file = BACKUP_SUBDIR / "raw_measurements_current.json"

# Convert timestamps to ISO format
raw_export = raw_measurements_df.copy()
raw_export['ts'] = raw_export['ts'].astype(str)
raw_export['ingested_at'] = raw_export['ingested_at'].astype(str)
raw_export['created_at'] = raw_export['created_at'].astype(str)
raw_export['updated_at'] = raw_export['updated_at'].astype(str)

# Convert to dict and save
raw_data = raw_export.to_dict(orient='records')

# Check the on-disk size (UTF-8 bytes of the indent=2 text) and split if needed
data_size = len(json.dumps(raw_data, indent=2, default=str).encode("utf-8"))

if data_size > 50 * 1024 * 1024:  # 50 MB
    print(f"  Data size: {data_size / (1024*1024):.2f} MB - splitting into chunks")
    chunk_size = 10000
    for i in range(0, len(raw_data), chunk_size):
        chunk = raw_data[i:i+chunk_size]
        chunk_file = BACKUP_SUBDIR / f"raw_measurements_current_chunk_{i//chunk_size + 1}.json"
        with open(chunk_file, 'w') as f:
            json.dump(chunk, f, indent=2, default=str)
        print(f"  ✓ Saved chunk {i//chunk_size + 1} ({len(chunk)} records)")
else:
    with open(raw_file, 'w') as f:
        json.dump(raw_data, f, indent=2, default=str)
    print(f"✓ Saved raw measurements to: {raw_file}")
    print(f"  File size: {raw_file.stat().st_size / 1024:.2f} KB")

✓ Saved raw measurements to: data_backup/backup_20251001_121447/raw_measurements_current.json
  File size: 26200.34 KB
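The raw sample above has a row with an empty `value_mm` (`pluvio_10`), which `to_dict` most likely returns as `float('nan')`. Python's `json.dump` writes such values as the bare token `NaN` by default; `json.load` reads it back, but it is not valid JSON under RFC 8259, and strict parsers such as JavaScript's `JSON.parse` reject it. A sketch that maps non-finite floats to `None` (written as `null`); the name `nan_to_none` is hypothetical, and it also catches NumPy `float64` values because they subclass `float`:

```python
import json
import math


def nan_to_none(records):
    """Replace float NaN/inf values with None so the output is strict JSON (null)."""
    return [
        {k: (None if isinstance(v, float) and not math.isfinite(v) else v) for k, v in r.items()}
        for r in records
    ]


rows = [
    {"sensor_id": "pluvio_1", "value_mm": 0.0},
    {"sensor_id": "pluvio_10", "value_mm": float("nan")},
]
print(json.dumps(rows))                                # contains the non-standard token NaN
print(json.dumps(nan_to_none(rows), allow_nan=False))  # null; allow_nan=False raises if any NaN remains
```

In the export cells this would be `raw_data = nan_to_none(raw_data)` before dumping; passing `allow_nan=False` to `json.dump` then turns any missed NaN into a `ValueError` rather than a silently non-standard file.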


## 6. Create Backup Manifest

In [29]:
# Create manifest with backup metadata
manifest = {
    "backup_timestamp": datetime.now(timezone.utc).isoformat(),
    "backup_type": "current_non_historic_data",
    "description": "Backup of current operational data before refactoring",
    "data_range": {
        "clean_measurements": "Last 30 days",
        "raw_measurements": "Last 7 days (current source only)",
        "grid_runs": "Last 30 days"
    },
    "counts": {
        "sensors": len(sensors_df),
        "clean_measurements": len(clean_measurements_df),
        "raw_measurements": len(raw_measurements_df),
        "grid_runs": len(grid_runs_df)
    },
    "imputation_strategy": {
        "primary_method": "ARIMA",
        "fallback_method": "0 (zero)",
        "notes": "Clean measurements may contain imputed values using ARIMA or 0 as fallback"
    },
    "files": [
        str(f.name) for f in BACKUP_SUBDIR.glob("*.json")
    ],
    "database_info": {
        "url_host": DATABASE_URL.split('@')[1].split('/')[0] if '@' in DATABASE_URL else 'configured',
        "backup_date": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    },
    "recovery_notes": [
        "This backup contains current operational data only (non-historic).",
        "To restore: use the recovery notebook or import JSON files manually.",
        "Files are in JSON format for easy inspection and recovery.",
        "Large files may be split into chunks.",
        "Clean measurements may include ARIMA-imputed values (with 0 as fallback)."
    ]
}

manifest_file = BACKUP_SUBDIR / "MANIFEST.json"
with open(manifest_file, 'w') as f:
    json.dump(manifest, f, indent=2)

print("✓ Backup manifest created")
print("\nBackup Summary:")
print(f"  Sensors: {manifest['counts']['sensors']:,}")
print(f"  Clean Measurements: {manifest['counts']['clean_measurements']:,}")
print(f"  Raw Measurements: {manifest['counts']['raw_measurements']:,}")
print(f"  Grid Runs: {manifest['counts']['grid_runs']:,}")
print(f"\n  Imputation: {manifest['imputation_strategy']['primary_method']} (fallback: {manifest['imputation_strategy']['fallback_method']})")
print(f"\n  Files created: {len(manifest['files'])}")
print(f"  Location: {BACKUP_SUBDIR}")

✓ Backup manifest created

Backup Summary:
  Sensors: 239
  Clean Measurements: 74,806
  Raw Measurements: 75,032
  Grid Runs: 277

  Imputation: ARIMA (fallback: 0 (zero))

  Files created: 4
  Location: data_backup/backup_20251001_121447
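The manifest records file names and row counts but nothing about file contents. A possible extension, not part of this notebook, is a SHA-256 digest per file so a copied or restored backup can be checked byte-for-byte. A sketch using `hashlib`; the helpers `sha256_file` and `checksums_for` and a `"checksums"` manifest key are hypothetical:

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path, block_size=1 << 20):
    """Hex SHA-256 of a file, read in 1 MiB blocks so large exports need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def checksums_for(directory, pattern="*.json"):
    """Map file name -> SHA-256 for every matching file in directory."""
    return {p.name: sha256_file(p) for p in sorted(Path(directory).glob(pattern))}


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sample.json").write_bytes(b"abc")
    sums = checksums_for(tmp)
    print(sums)
```

In the manifest cell this could sit next to `"files"` as `"checksums": checksums_for(BACKUP_SUBDIR)`; since that dict is built before `MANIFEST.json` is written, the manifest never tries to hash itself.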


## 7. Create README for Recovery

In [30]:
# Create README with recovery instructions
readme_content = f"""# Data Backup - {timestamp}

## Backup Information

- **Date:** {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M:%S UTC')}
- **Type:** Current Non-Historic Data
- **Purpose:** Pre-refactoring backup

## Contents

- `sensors.json` - All sensor metadata ({manifest['counts']['sensors']} sensors)
- `clean_measurements*.json` - Clean measurements from last 30 days ({manifest['counts']['clean_measurements']:,} records)
- `raw_measurements_current*.json` - Raw measurements from last 7 days ({manifest['counts']['raw_measurements']:,} records)
- `grid_runs.json` - Grid runs from last 30 days ({manifest['counts']['grid_runs']} runs)
- `MANIFEST.json` - Backup metadata and file list

## Data Ranges

- **Clean Measurements:** Last 30 days
- **Raw Measurements:** Last 7 days (current source only, non-historic)
- **Grid Runs:** Last 30 days
- **Sensors:** All sensors

## Data Quality & Imputation

Clean measurements may include imputed values:
- **Primary Method:** ARIMA (AutoRegressive Integrated Moving Average)
- **Fallback Method:** 0 (zero) when ARIMA cannot be applied
- Check the `imputation_method` field in clean_measurements to identify imputed values

## Recovery Instructions

### Option 1: Using Recovery Notebook

1. Open `notebooks/00_restore_backup.ipynb`
2. Update the backup path to point to this directory
3. Run all cells

### Option 2: Manual Recovery (Python)

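```python
import json
import os

import pandas as pd
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import JSONB

# Target database connection string (the same variable the backup notebook reads)
DATABASE_URL = os.environ["DATABASE_URL"]

# Load data
with open('sensors.json') as f:
    sensors = json.load(f)

# Create DataFrame
sensors_df = pd.DataFrame(sensors)

# Insert into database; declaring `metadata` as JSONB lets SQLAlchemy serialize its dicts
engine = sa.create_engine(DATABASE_URL)
sensors_df.to_sql('sensors', engine, if_exists='append', index=False,
                  dtype={{'metadata': JSONB}})
```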

### Option 3: Direct SQL Import

This backup contains JSON files only, no SQL dump. For automated restoration, use the `00_restore_backup.ipynb` notebook (Option 1).

## Notes

- All timestamps are in UTC
- Measurement exports larger than 50 MB are split into 10,000-record chunk files (`*_chunk_*.json`)
- JSON files use standard formatting for easy inspection
- This backup contains operational data only (excludes historic data)
- Clean measurements may contain ARIMA-imputed values (check `imputation_method` field)

## Verification

After recovery, verify counts match:
- Sensors: {manifest['counts']['sensors']:,}
- Clean Measurements: {manifest['counts']['clean_measurements']:,}
- Raw Measurements: {manifest['counts']['raw_measurements']:,}
- Grid Runs: {manifest['counts']['grid_runs']:,}

## Support

If you encounter issues during recovery, check:
1. Database connection is working
2. Required tables exist
3. No data conflicts (duplicate IDs)
4. Sufficient disk space

---

**Generated by:** 00_backup_current_data.ipynb
"""

readme_file = BACKUP_SUBDIR / "README.md"
with open(readme_file, 'w') as f:
    f.write(readme_content)

print(f"✓ README created: {readme_file}")

✓ README created: data_backup/backup_20251001_121447/README.md
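The README's manual-recovery example reads a single `sensors.json`, but the measurement tables may be split into `*_chunk_N.json` files. A sketch of a loader that accepts either layout; it orders chunks by their numeric suffix, because a plain lexical sort would put `chunk_10` before `chunk_2`. The name `load_backup_records` is hypothetical:

```python
import json
import re
import tempfile
from pathlib import Path


def load_backup_records(backup_dir, stem):
    """Load <stem>.json, or concatenate <stem>_chunk_N.json files in numeric order."""
    backup_dir = Path(backup_dir)
    single = backup_dir / f"{stem}.json"
    if single.exists():
        return json.loads(single.read_text(encoding="utf-8"))

    pattern = re.compile(rf"^{re.escape(stem)}_chunk_(\d+)\.json$")
    chunks = []
    for path in backup_dir.iterdir():
        match = pattern.match(path.name)
        if match:
            chunks.append((int(match.group(1)), path))
    if not chunks:
        raise FileNotFoundError(f"no {stem}.json or {stem}_chunk_*.json in {backup_dir}")

    records = []
    for _, path in sorted(chunks):
        records.extend(json.loads(path.read_text(encoding="utf-8")))
    return records


# Usage on a throwaway directory with 11 one-record chunks
with tempfile.TemporaryDirectory() as tmp:
    for n in range(1, 12):
        Path(tmp, f"clean_measurements_chunk_{n}.json").write_text(json.dumps([{"chunk": n}]))
    loaded = load_backup_records(tmp, "clean_measurements")
    print([r["chunk"] for r in loaded])
```

The result can go straight into `pd.DataFrame(...)` as in the README example.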




## 8. Verify Backup Integrity

In [31]:
# Verify all files exist and can be read
print("Verifying backup integrity...\n")

verification_results = []

for json_file in BACKUP_SUBDIR.glob("*.json"):
    try:
        with open(json_file, 'r') as f:
            data = json.load(f)
        
        if json_file.name == "MANIFEST.json":
            status = "✓ Valid"
            record_count = "N/A (manifest)"
        else:
            record_count = len(data)
            status = "✓ Valid" if record_count > 0 else "⚠ Empty"
        
        verification_results.append({
            'file': json_file.name,
            'size_kb': json_file.stat().st_size / 1024,
            'records': record_count,
            'status': status
        })
        
    except Exception as e:
        verification_results.append({
            'file': json_file.name,
            'size_kb': json_file.stat().st_size / 1024,
            'records': 'Error',
            'status': f"✗ Failed: {str(e)}"
        })

verification_df = pd.DataFrame(verification_results)
print(verification_df.to_string(index=False))

# Calculate total size
total_size = sum(f.stat().st_size for f in BACKUP_SUBDIR.glob("*.json"))
print(f"\nTotal backup size: {total_size / (1024*1024):.2f} MB")

# Check if all verifications passed
all_valid = all('✓' in result['status'] for result in verification_results)
if all_valid:
    print("\n✅ All files verified successfully!")
else:
    print("\n⚠️  Some files failed verification - please check the results above")

Verifying backup integrity...



                         file      size_kb        records  status
raw_measurements_current.json 26200.342773          75032 ✓ Valid
               grid_runs.json   188.895508            277 ✓ Valid
                 sensors.json   122.762695            239 ✓ Valid
      clean_measurements.json 21530.336914          74806 ✓ Valid
                MANIFEST.json     1.226562 N/A (manifest) ✓ Valid

Total backup size: 46.92 MB

✅ All files verified successfully!
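The integrity check above confirms each file parses and is non-empty, but not that it holds the expected number of rows. A sketch that sums records per table (including chunk files) and compares them with the `counts` in `MANIFEST.json`; it assumes the file-naming scheme used by the export cells, and `verify_counts` and `TABLE_STEMS` are hypothetical names:

```python
import json
import tempfile
from pathlib import Path

# manifest "counts" key -> file stem written by the export cells
TABLE_STEMS = {
    "sensors": "sensors",
    "clean_measurements": "clean_measurements",
    "raw_measurements": "raw_measurements_current",
    "grid_runs": "grid_runs",
}


def verify_counts(backup_dir):
    """Return {table: (expected, found)} for tables whose record totals differ."""
    backup_dir = Path(backup_dir)
    expected = json.loads((backup_dir / "MANIFEST.json").read_text())["counts"]
    mismatches = {}
    for table, stem in TABLE_STEMS.items():
        files = [backup_dir / f"{stem}.json", *backup_dir.glob(f"{stem}_chunk_*.json")]
        found = sum(len(json.loads(p.read_text())) for p in files if p.exists())
        if found != expected.get(table):
            mismatches[table] = (expected.get(table), found)
    return mismatches


# Usage on a throwaway directory; grid_runs is short by one on purpose
with tempfile.TemporaryDirectory() as tmp:
    counts = {"sensors": 2, "clean_measurements": 3, "raw_measurements": 0, "grid_runs": 1}
    Path(tmp, "MANIFEST.json").write_text(json.dumps({"counts": counts}))
    Path(tmp, "sensors.json").write_text(json.dumps([{}, {}]))
    Path(tmp, "clean_measurements_chunk_1.json").write_text(json.dumps([{}, {}]))
    Path(tmp, "clean_measurements_chunk_2.json").write_text(json.dumps([{}]))
    Path(tmp, "raw_measurements_current.json").write_text("[]")
    Path(tmp, "grid_runs.json").write_text("[]")
    result = verify_counts(tmp)
    print(result)
```

Run as `verify_counts(BACKUP_SUBDIR)`, an empty dict means every table's file contents match the manifest.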


## 9. Summary & Next Steps

In [32]:
# Display summary
print("=" * 70)
print("BACKUP COMPLETED SUCCESSFULLY")
print("=" * 70)
print(f"\nBackup Location: {BACKUP_SUBDIR.absolute()}")
print(f"\nData Summary:")
print(f"  • Sensors: {manifest['counts']['sensors']:,}")
print(f"  • Clean Measurements: {manifest['counts']['clean_measurements']:,} (last 30 days)")
print(f"  • Raw Measurements: {manifest['counts']['raw_measurements']:,} (last 7 days, non-historic)")
print(f"  • Grid Runs: {manifest['counts']['grid_runs']:,} (last 30 days)")
print(f"\nTotal Files: {len(list(BACKUP_SUBDIR.glob('*')))}")
print(f"Total Size: {total_size / (1024*1024):.2f} MB")

print("\n" + "=" * 70)
print("NEXT STEPS")
print("=" * 70)
print("\n1. Verify backup files are accessible")
print("2. Review the refactoring plan in docs/REFACTORING_PLAN.md")
print("3. Proceed with database schema changes (Phase 1)")
print("4. Keep this backup until refactoring is complete and verified")
print("\n📝 Note: This backup contains ONLY current operational data (non-historic)")
print("   Historic data remains in the database and is not affected by the refactoring.")
print("\n✅ You can now safely proceed with the refactoring!")

BACKUP COMPLETED SUCCESSFULLY

Backup Location: /workspaces/Siata-Contamination-Viewer/notebooks/data_backup/backup_20251001_121447

Data Summary:
  • Sensors: 239
  • Clean Measurements: 74,806 (last 30 days)
  • Raw Measurements: 75,032 (last 7 days, non-historic)
  • Grid Runs: 277 (last 30 days)

Total Files: 6
Total Size: 46.92 MB

NEXT STEPS

1. Verify backup files are accessible
2. Review the refactoring plan in docs/REFACTORING_PLAN.md
3. Proceed with database schema changes (Phase 1)
4. Keep this backup until refactoring is complete and verified

📝 Note: This backup contains ONLY current operational data (non-historic)
   Historic data remains in the database and is not affected by the refactoring.

✅ You can now safely proceed with the refactoring!


---

## Backup Complete!

The backup has been created successfully. All current non-historic data has been exported to JSON format and is ready for recovery if needed.

**Important Notes:**
- This backup contains all sensors plus recent operational data only: the last 30 days of clean measurements and grid runs, and the last 7 days of current-source raw measurements
- Historic data remains untouched in the database
- The backup is stored in: `data_backup/backup_TIMESTAMP/`
- Use the recovery notebook if you need to restore this data

**Before proceeding with refactoring:**
1. ✅ Backup complete
2. ⏳ Review refactoring plan
3. ⏳ Create database schema changes
4. ⏳ Update ETL service
5. ⏳ Update API service

---

In [33]:
import zipfile

zip_path = BACKUP_DIR / f"{BACKUP_SUBDIR.name}.zip"

with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
    for file in BACKUP_SUBDIR.iterdir():
        zipf.write(file, arcname=file.name)

print(f"✓ Backup folder zipped: {zip_path} ({zip_path.stat().st_size / (1024*1024):.2f} MB)")

✓ Backup folder zipped: data_backup/backup_20251001_121447.zip (1.07 MB)
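The archive is the copy most likely to be moved off this machine, so it may be worth checking before relying on it. `zipfile.ZipFile.testzip()` reads every member and returns the name of the first one with a bad CRC or header, or `None` if all are intact. A sketch that also compares the member list against the source folder; `verify_zip` is a hypothetical helper:

```python
import tempfile
import zipfile
from pathlib import Path


def verify_zip(zip_file, source_dir):
    """Return a list of problems; an empty list means the archive looks complete."""
    problems = []
    with zipfile.ZipFile(zip_file) as zf:
        bad = zf.testzip()  # first corrupt member name, or None
        if bad is not None:
            problems.append(f"CRC check failed for {bad}")
        archived = set(zf.namelist())
    expected = {p.name for p in Path(source_dir).iterdir() if p.is_file()}
    for name in sorted(expected - archived):
        problems.append(f"missing from archive: {name}")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "backup_example")
    src.mkdir()
    (src / "sensors.json").write_text("[]")
    (src / "README.md").write_text("# Data Backup")
    demo_zip = Path(tmp, "backup_example.zip")
    with zipfile.ZipFile(demo_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(src / "sensors.json", arcname="sensors.json")  # README deliberately left out
    result = verify_zip(demo_zip, src)
    print(result)
```

After the cell above, `verify_zip(zip_path, BACKUP_SUBDIR)` returning `[]` would indicate that every backup file made it into the archive intact.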
