fix: prevent data loss during S3 outages with WAL-based recovery#162

Merged
xe-nvdk merged 2 commits into Basekick-Labs:main from khalid244:fix/data-durability-s3-outage on Jan 25, 2026

Conversation


@khalid244 khalid244 commented Jan 25, 2026

Summary

Fixes critical data loss bugs when S3 becomes unavailable. Data is preserved in WAL and automatically recovered when S3 is healthy again.

Key Changes

WAL Recovery - Data stays in WAL if flush fails, recovered on restart or via periodic recovery
No Memory Growth - Buffer cleared on flush failure (data safe in WAL)
Periodic Recovery - Background goroutine replays WAL every 5 minutes
WAL File Deletion - Files deleted after successful recovery (not renamed to .recovered)

Flow During S3 Outage

Data In -> WAL (disk) -> Buffer (memory) -> Flush -> S3 FAIL
                                                        |
Buffer cleared (memory freed) <-------------------------+
Data preserved in WAL
Periodic recovery replays when S3 recovers
WAL file deleted after successful recovery
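
As an illustration of the periodic recovery loop described above (WAL replayed every 5 minutes, files kept when replay fails), here is a minimal sketch; the package, function, and parameter names are hypothetical, not the code in this PR:

```go
package recovery

import (
	"context"
	"log"
	"time"
)

// runPeriodicWALRecovery replays pending WAL files on a fixed interval; if a
// replay fails (for example, S3 is still down), the files stay on disk and
// are retried on the next tick.
func runPeriodicWALRecovery(ctx context.Context, interval time.Duration, recoverFn func(context.Context) error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := recoverFn(ctx); err != nil {
				log.Printf("periodic WAL recovery failed, will retry: %v", err)
			}
		}
	}
}
```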

Changes by File

cmd/arc/main.go

| Scenario | Before | After |
| --- | --- | --- |
| WAL Recovery Timing | Runs before ArrowBuffer exists | Runs after ArrowBuffer is ready |
| Recovery Callback | Just logs, doesn't replay data | Actually replays to ArrowBuffer |
| S3 Outage Recovery | Requires restart | Periodic recovery every 5 minutes |

internal/ingest/arrow_writer.go

| Scenario | Before | After |
| --- | --- | --- |
| Flush Queue Full | dropping task - unclear fate | preserved in WAL - data safe |
| Async Flush Failure | Silent error, no record count | Logs record count, WAL safety note |
| Sync Flush Failure | No logging at all | Warns with record count |
| Memory During Outage | Could grow indefinitely | Bounded - data stays in WAL only |

internal/wal/recovery.go

| Scenario | Before | After |
| --- | --- | --- |
| Entry Failure | `continue` - processes remaining | `break` - stops, retries later |
| After Recovery | Rename to .recovered | Delete file |
| Partial Failure | File renamed anyway (data lost) | File kept for retry |
| Stats Tracking | Always counts as recovered | Only counts if ALL succeed |
| Logging | No entry count | Shows entry count |

End-to-End Scenarios

| Scenario | Before | After |
| --- | --- | --- |
| S3 goes down during flush | Data lost or memory grows | Data safe in WAL, memory bounded |
| S3 recovers | Must restart to recover | Auto-recovers within 5 minutes |
| Recovery succeeds | .recovered files pile up | Files deleted, disk freed |
| Recovery partially fails | Lost entries, file archived | File kept, retry on next cycle |
| Startup recovery | Logs only, data lost | Actually replays all data |

Test Plan

  • Deploy to Kubernetes (arc namespace)
  • Ingest data during an S3 outage (503 errors)
  • Verify the "data preserved in WAL" log message
  • Restart the pod and verify WAL recovery runs
  • Verify the "WAL file recovered and deleted" log message
  • Confirm the WAL directory contains only a new, empty file

Member

@xe-nvdk xe-nvdk left a comment


Review: PR #162 - WAL-based S3 Recovery

Thanks for identifying this issue - you're correct that our WAL recovery callback was essentially a no-op since ArrowBuffer isn't available at that point. This is a legitimate bug that needs fixing.

However, I have some concerns about the implementation that need to be addressed before we can merge.


1. Code Duplication (~50 lines repeated)

The recovery callback logic is copy-pasted twice in main.go:

  • Lines 309-356 (startup recovery)
  • Lines 371-402 (periodic recovery)

Request: Extract this into a shared function, something like:

```go
func createWALRecoveryCallback(arrowBuffer *ingest.ArrowBuffer, logger zerolog.Logger) wal.RecoveryCallback {
    return func(ctx context.Context, records []map[string]interface{}) error {
        // ... shared logic
    }
}
```
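
For illustration, one way the shared callback body could look, assuming ArrowBuffer exposes a write method along the lines of WriteColumnarDirect (the interface and signature here are guessed, not taken from the codebase):

```go
package recovery

import "context"

// RecordWriter stands in for whatever interface ArrowBuffer satisfies; the
// real WriteColumnarDirect signature in internal/ingest may differ.
type RecordWriter interface {
	WriteColumnarDirect(ctx context.Context, records []map[string]interface{}) error
}

// createWALRecoveryCallback returns a callback shared by startup and periodic
// recovery: it replays recovered records into the buffer and returns any
// error so the WAL file is kept for a later retry.
func createWALRecoveryCallback(buf RecordWriter, logf func(format string, args ...interface{})) func(context.Context, []map[string]interface{}) error {
	return func(ctx context.Context, records []map[string]interface{}) error {
		if len(records) == 0 {
			return nil
		}
		if err := buf.WriteColumnarDirect(ctx, records); err != nil {
			logf("WAL replay of %d records failed: %v", len(records), err)
			return err
		}
		logf("replayed %d records from WAL", len(records))
		return nil
	}
}
```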

2. Hardcoded 5-Minute Interval

The periodic recovery interval is hardcoded:

```go
ticker := time.NewTicker(5 * time.Minute)
```

Request: Make this configurable via cfg.WAL.RecoveryInterval with a sensible default (5 minutes is fine). Add it to the config struct and documentation.


3. Missing Tests

This PR adds critical durability behavior but no tests. Please add:

  • Unit test for the recovery callback (mock WriteColumnarDirect)
  • Integration test simulating S3 failure → WAL accumulation → recovery
  • Test for partial recovery failure (ensure file is kept for retry)

4. Recovery During Active WAL Write

When periodic recovery runs, what happens if the WAL writer is actively appending to the current file? findWALFiles() will return the active file, and we might try to recover entries that haven't been synced yet.

Request: Either:

  • Skip the current/active WAL file during recovery (check by filename timestamp vs current WAL)
  • Or rotate the WAL before recovery to ensure clean boundaries
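
For the first option, a minimal sketch of filtering out the active file before replay (the helper and package names here are hypothetical, mirroring CurrentFile/findWALFiles mentioned in this thread):

```go
package wal

// pendingWALFiles drops the file currently being appended to, so periodic
// recovery never replays entries that may not have been synced yet.
func pendingWALFiles(allFiles []string, activeFile string) []string {
	pending := make([]string, 0, len(allFiles))
	for _, f := range allFiles {
		if f == activeFile {
			continue // still open for writes; pick it up on a later cycle
		}
		pending = append(pending, f)
	}
	return pending
}
```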

5. No Backpressure During Mass Recovery

If S3 is down for hours, WAL files accumulate. When S3 recovers, periodic recovery tries to replay everything at once. This could:

  • Overwhelm the ArrowBuffer
  • Cause memory pressure
  • Create a thundering herd of flushes

Request: Add rate limiting or batching to recovery replay.


Minor Comments

arrow_writer.go - The new log messages are helpful. Consider adding a metric counter for "records preserved in WAL for recovery" so operators can track this.

recovery.go - The change from continue to break on entry failure is correct. Good catch on the partial failure handling.


Summary

The core idea is sound and addresses a real bug. Please address the above concerns, especially:

  1. Extract shared recovery callback
  2. Make interval configurable
  3. Add tests
  4. Handle active WAL file during periodic recovery

Happy to discuss any of these points!

@khalid244
Author

Addressed All Review Feedback

Thanks for the detailed review! I've implemented all the requested changes:

1. ✅ Extracted Shared Recovery Callback

Created createWALRecoveryCallback() function in main.go:981 that's used by both startup and periodic recovery, eliminating ~50 lines of duplication.

2. ✅ Configurable Recovery Interval

Added two new config options to WALConfig:

  • RecoveryIntervalSeconds (default: 300 = 5 minutes)
  • RecoveryBatchSize (default: 10000 records)

Can be configured via:

```toml
[wal]
recovery_interval_seconds = 300
recovery_batch_size = 10000
```

Or environment variables:

```
ARC_WAL_RECOVERY_INTERVAL_SECONDS=600
ARC_WAL_RECOVERY_BATCH_SIZE=5000
```
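
A rough sketch of how these fields could sit in WALConfig (field and tag names are illustrative; the defaults match the values above):

```go
package config

// WALConfig sketch: only the two new options are shown.
type WALConfig struct {
	// ... existing WAL options ...
	RecoveryIntervalSeconds int `toml:"recovery_interval_seconds"` // ARC_WAL_RECOVERY_INTERVAL_SECONDS, default 300
	RecoveryBatchSize       int `toml:"recovery_batch_size"`       // ARC_WAL_RECOVERY_BATCH_SIZE, default 10000
}

// applyDefaults fills in the documented defaults when a value is unset.
func (c *WALConfig) applyDefaults() {
	if c.RecoveryIntervalSeconds <= 0 {
		c.RecoveryIntervalSeconds = 300
	}
	if c.RecoveryBatchSize <= 0 {
		c.RecoveryBatchSize = 10000
	}
}
```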

3. ✅ Added Tests

New tests in wal_test.go:

  • TestRecovery_SkipActiveFile - Verifies active file is skipped
  • TestRecovery_BatchSize - Verifies rate limiting splits large batches
  • TestRecovery_PartialFailure_KeepsFileForRetry - Verifies file retention on failure
  • TestRecovery_BatchSize_PartialBatchFailure - Verifies partial batch failure handling

New tests in config_test.go:

  • TestWALConfig_Defaults - Verifies default values
  • TestWALConfig_EnvOverride - Verifies env var overrides

4. ✅ Skip Active WAL File During Periodic Recovery

Added RecoveryOptions struct with SkipActiveFile field. Periodic recovery now calls walWriter.CurrentFile() and passes it to skip the file being actively written.

5. ✅ Backpressure During Mass Recovery

Added BatchSize option to RecoveryOptions. Large WAL entries are split into smaller batches (default 10,000 records) to prevent overwhelming ArrowBuffer during mass recovery after prolonged outages.
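
A minimal sketch of how RecoveryOptions and batched replay could fit together (the actual structs and helpers in internal/wal may be organized differently):

```go
package wal

import "context"

// RecoveryOptions sketch; the real struct may carry more fields.
type RecoveryOptions struct {
	SkipActiveFile string // path returned by walWriter.CurrentFile()
	BatchSize      int    // max records per replay call, default 10000
}

// replayInBatches splits a large recovered entry so a long outage's backlog
// does not hit the ArrowBuffer in one giant write. A failed batch aborts the
// replay, and the WAL file is kept for the next recovery cycle.
func replayInBatches(ctx context.Context, records []map[string]interface{}, opts RecoveryOptions,
	replay func(context.Context, []map[string]interface{}) error) error {
	batch := opts.BatchSize
	if batch <= 0 {
		batch = 10000
	}
	for start := 0; start < len(records); start += batch {
		end := start + batch
		if end > len(records) {
			end = len(records)
		}
		if err := replay(ctx, records[start:end]); err != nil {
			return err
		}
	}
	return nil
}
```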

6. ✅ Added WAL Metrics

New Prometheus metrics in metrics.go:

  • arc_wal_records_preserved_total - Records preserved in WAL for recovery (flush failures)
  • arc_wal_recovery_total - Successful WAL recovery operations
  • arc_wal_recovery_records_total - Total records recovered from WAL
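
A sketch of how these counters might be declared with prometheus/client_golang (the actual registration in metrics.go may use a custom registry or labels):

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// Records preserved in WAL after a failed flush, pending recovery.
	walRecordsPreservedTotal = promauto.NewCounter(prometheus.CounterOpts{
		Name: "arc_wal_records_preserved_total",
		Help: "Records preserved in WAL for recovery after a failed flush",
	})
	// Successful WAL recovery operations.
	walRecoveryTotal = promauto.NewCounter(prometheus.CounterOpts{
		Name: "arc_wal_recovery_total",
		Help: "Successful WAL recovery operations",
	})
	// Total records recovered from WAL.
	walRecoveryRecordsTotal = promauto.NewCounter(prometheus.CounterOpts{
		Name: "arc_wal_recovery_records_total",
		Help: "Total records recovered from WAL",
	})
)
```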

Test Results:

  • All WAL tests pass (21 tests, 83.1% coverage)
  • All config tests pass (34 tests, 94.2% coverage)
  • go vet clean

This commit fixes critical data loss bugs when S3 becomes unavailable.
Instead of growing memory indefinitely, data is preserved in WAL and
recovered when S3 is healthy again.

Key changes:

1. WAL Recovery - Data written to WAL at ingest time stays there if
   flush fails. Recovery happens on restart AND via periodic recovery
   (every 5 minutes) without requiring restart.

2. No Memory Growth - When flush fails, data is NOT restored to buffer.
   Since it's already in WAL, memory stays bounded during prolonged
   S3 outages.

3. Periodic WAL Recovery - Background goroutine checks for pending WAL
   files every 5 minutes and replays them through ArrowBuffer when S3
   is healthy.

4. WAL File Deletion - WAL files are now deleted immediately after
   successful recovery instead of being renamed to .recovered. Files
   are only deleted if ALL entries are successfully replayed.

Flow during S3 outage:
  Data In -> WAL (disk) -> Buffer (memory) -> Flush -> S3 FAIL
                                                         |
  Buffer cleared (memory freed) <------------------------+
  Data preserved in WAL
  Periodic recovery replays when S3 recovers
  WAL file deleted after successful recovery

Changes:
- internal/ingest/arrow_writer.go: Remove restore-to-buffer logic,
  rely on WAL for durability during S3 outages
- cmd/arc/main.go: Add periodic WAL recovery goroutine that runs
  every 5 minutes to recover data without restart
- internal/wal/recovery.go: Delete WAL files after successful recovery,
  keep files for retry if any entry fails
Member

@xe-nvdk xe-nvdk left a comment


Code looks good. All feedback addressed:

  • Extracted shared callback - createWALRecoveryCallback() eliminates duplication
  • Configurable recovery interval - recovery_interval_seconds (default 300s)
  • Configurable batch size - recovery_batch_size (default 10000) for backpressure
  • Skip active WAL file - Periodic recovery skips file being written
  • WAL metrics - wal_records_preserved_total, wal_recovery_total, wal_recovery_records_total
  • Comprehensive tests - 4 new test cases

PR has merge conflicts with main. Please rebase and resolve, then we can merge.

- Extract shared recovery callback to eliminate code duplication
- Make recovery interval configurable via wal.recovery_interval_seconds
- Add rate limiting via wal.recovery_batch_size config
- Skip active WAL file during periodic recovery
- Add WAL metrics (records_preserved, recovery_total, recovery_records)
- Add comprehensive tests for recovery options
@xe-nvdk xe-nvdk merged commit babb589 into Basekick-Labs:main Jan 25, 2026
xe-nvdk added a commit that referenced this pull request Jan 25, 2026
Documents the WAL recovery fix for preventing data loss during S3
outages, including new configuration options and metrics.
xe-nvdk added a commit that referenced this pull request Feb 1, 2026
## New Features

### InfluxDB Client Compatibility
Arc's Line Protocol endpoints now use the same paths as InfluxDB, enabling drop-in compatibility with all official InfluxDB client libraries (Go, Python, JavaScript, Java, C#, PHP, Ruby, Telegraf, Node-RED).

- `/api/v1/write` → `/write` (InfluxDB 1.x clients, Telegraf)
- `/api/v1/write/influxdb` → `/api/v2/write` (InfluxDB 2.x clients)
- Arc-native endpoint `/api/v1/write/line-protocol` preserved
- Supports Bearer token, Token header, API key header, and query parameter authentication

### MQTT Ingestion Support
Native MQTT subscription for IoT and edge data ingestion. Connect directly to MQTT brokers without requiring additional infrastructure.

- Subscribe to multiple MQTT topics with wildcard support (`+`, `#`)
- Dynamic subscription management via REST API
- TLS/SSL connections with certificate validation
- Authentication via username/password or client certificates
- Connection auto-reconnect with exponential backoff
- Per-subscription statistics and monitoring
- Passwords encrypted at rest
- Auto-start subscriptions on server restart

### S3 File Caching via cache_httpfs Extension (PR #149)
Optional in-memory caching of S3 Parquet files via DuckDB's `cache_httpfs` community extension. 5-10x query performance improvement for workloads with repeated file access (CTEs, subqueries, Grafana dashboards).

- In-memory only — no disk caching, preserves stateless compute philosophy
- Opt-in, configurable cache size and TTL
- Graceful degradation if extension fails to load

*Contributed by @khalid244*

### Relative Time Expression Support in Partition Pruning
Queries using `NOW() - INTERVAL` now benefit from partition pruning. Supports seconds, minutes, hours, days, weeks, months.

## Bug Fixes

- **Control Characters in Measurement Names Break S3 (Issue #122)** — Added strict validation for measurement names across all ingestion endpoints (Line Protocol, MsgPack, Continuous Queries). Names must start with a letter, contain only alphanumeric/underscore/hyphen, max 128 chars.
- **Partition Pruner Fails on Non-Existent S3 Partitions (Issue #125, #144, PR #145)** — Fixed "No files found" errors when time range includes non-existent S3 partitions. Extended `filterExistingPaths()` for S3/Azure storage. Also fixed `filepath.Join()` mangling S3 URLs. *(Day-level fix by @khalid244)*
- **Server Timeout Config Values Ignored (Issue #126)** — `server.read_timeout` and `server.write_timeout` now correctly use configuration values instead of hardcoded 30s.
- **Large Payload Ingestion (413 Request Entity Too Large)** — `MaxPayloadSize` config now correctly passed to Fiber's `BodyLimit`.
- **Query Results Timestamp Timezone Inconsistency** — All timestamps in query results now normalized to UTC.
- **Azure Blob Storage Query SSL Certificate Errors on Linux (PR #92)** — Fixed by using system curl for SSL on Linux. *(Contributed by @schotime)*
- **UTC Consistency for Compaction Filenames (PR #132)** — Compacted filenames now use UTC timestamps consistently. *(Contributed by @schotime)*
- **S3 Subprocess Configuration Issues (Issue #131)** — Fixed missing credentials and SSL config forwarding to compaction subprocess.
- **Query Failures with Non-UTF8 Data (Issue #136)** — Added automatic UTF-8 sanitization during ingestion with optimized fast-path (~6-25ns overhead).
- **Nanosecond Timestamp Support for MessagePack Ingestion** — Extended timestamp detection to handle 19-digit nanosecond timestamps (important for InfluxDB migrations).
- **WHERE Clause Regex Fails to Match Multi-Line Queries (Issue #146, PR #148)** — Fixed partition pruner failing on multi-line SQL queries. *(Contributed by @khalid244)*
- **String Literals Containing SQL Keywords Break Partition Pruning** — Added string literal masking before regex matching.
- **Buffer Age-Based Flush Timing Under High Load (Issue #142)** — Ticker now fires at half the configured interval for more consistent flush timing (~25% improvement).
- **Arrow Writer Panic During High-Concurrency Writes (Issue #130)** — Fixed out-of-bounds panic during schema evolution with concurrent writes.
- **Empty Directories Not Cleaned Up After Daily Compaction** — Automatic cleanup of empty hour-level partition directories after compaction.
- **Compactor OOM and Segfaults with Large Datasets (Issue #102)** — Streaming I/O, memory limit passthrough, file batching (1000 max), and adaptive batch sizing on failure.
- **Orphaned Compaction Temp Directories (Issue #164, PR #165)** — Two-layer cleanup: startup cleanup + parent-side cleanup after subprocess completion.
- **Compaction Data Duplication on Crash (Issue #157, PR #163)** — Manifest-based tracking in S3 prevents re-compaction after crash. *(Contributed by @khalid244)*
- **WAL-Based S3 Recovery (Issue #159, PR #162)** — Fixed data loss during S3 outages with startup recovery, periodic recovery, and backpressure handling. *(Contributed by @khalid244)*
- **Tiered Storage Query Routing (Issue #166, PR #167)** — Fixed queries not routing to cold tier data and database listing not showing cold-only databases.
- **Retention Policies for S3/Azure Storage Backends (Issue #169, PR #170)** — Retention policies now work with all storage backends.
- **Retention Policy Empty Directory Cleanup (Issue #171)** — Empty directories cleaned up after retention policy file deletion.
- **Query Timeout for S3 Disconnection (Issue #151, PR #152)** — Configurable query timeout prevents indefinite hangs. Returns HTTP 504 when exceeded. *(Contributed by @khalid244)*

## Improvements

- **Configurable Server Idle and Shutdown Timeouts** — `server.idle_timeout` and `server.shutdown_timeout` now configurable.
- **Automatic Time Function Query Optimization** — `time_bucket()` and `date_trunc()` automatically rewritten to epoch arithmetic (2-2.5x faster GROUP BY queries).
- **Parallel Partition Scanning** — Queries spanning 3+ partitions now execute concurrently (2-4x speedup).
- **Two-Stage Distributed Aggregation (Enterprise)** — 5-20x speedup for cross-shard aggregations via scatter/gather.
- **DuckDB Query Engine Optimizations** — Parquet metadata caching, prefetching, pool-wide `SET GLOBAL` consistency (18-24% faster aggregations). *(SET GLOBAL fix by @khalid244)*
- **Automatic Regex-to-String Function Optimization** — URL domain extraction patterns rewritten to native string functions (2x+ faster).
- **Database Header for Query Optimization** — `x-arc-database` header skips regex parsing (5-17% faster queries).
- **MQTT Client Auto-Generated Client ID** — Prevents client ID collisions across multiple Arc instances.
- **MQTT Restart Endpoint** — `/api/v1/restart` to apply MQTT config changes without server restart.

## Security

- **Token Hashing Security Model** — New tokens hashed with bcrypt (cost 10). SHA256 prefixes for O(1) lookup. Legacy tokens supported.

## Breaking Changes

- `/api/v1/write` → `/write` and `/api/v1/write/influxdb` → `/api/v2/write`. Update client config if using old Arc-specific paths. InfluxDB client libraries need no changes.

## Contributors

Thanks to @schotime (Adam Schroder) and @khalid244 for their contributions to this release.