Columnar time series file format optimized for numeric data. Ideal for financial tick data, IoT sensors, and infrastructure metrics.
Combines ALP, Gorilla, and Pongo compression for doubles, delta-varint for timestamps, and ZSTD/gzip for binary data — with per-column codec selection in a single .lastra file.
- Per-column codecs: ALP/Gorilla/Pongo for numeric data, delta-varint for timestamps, ZSTD/gzip for binary
- Two sections: regular time series (series) + sparse timestamped events (events)
- Column metadata: optional key-value metadata per column (e.g., indicator parameters, sensor config)
- Selective column access: footer offsets enable reading specific columns without decompressing others
- Little-endian throughout: JS/TS readers can use Float64Array zero-copy on decoded data
- Zero Hadoop/Parquet dependency: pure Java + alp-java + zstd-jni; Gorilla and Pongo codecs have zero external deps
- Java 11+
```
HEADER (22 bytes, LE):
  "LAST" magic | version (1) | flags | seriesRowCount | seriesColCount
  eventsRowCount | eventsColCount

COLUMN DESCRIPTORS (series, then events):
  codec | dataType | flags | name
  optional: metadata (JSON, gzip-compressed)

SERIES DATA:
  Without row groups: per column: [4 bytes length] [compressed data]
  With row groups:    per RG: per column: [4 bytes length] [compressed data]

EVENTS DATA: per column: [4 bytes length] [compressed data]

FOOTER:
  Without row groups: [column offsets] + [column CRC32s] + "LAS!" magic
  With row groups:    [rgCount] + [per-RG stats] + [per-RG CRC32s]
                      + [event offsets] + [event CRC32s] + "LAS!" magic
```
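The fixed-width header lends itself to a plain `ByteBuffer` parse. A minimal sketch, assuming the 22 bytes break down as a 4-byte magic, 1-byte version, 1-byte flags, and four 4-byte little-endian counts; the exact field widths are our assumption, only the 22-byte total and field order are stated above:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class HeaderSketch {
    // Build a fake header: "LAST" + version 1 + flags + four int counts = 22 bytes.
    static byte[] build(int flags, int sRows, int sCols, int eRows, int eCols) {
        ByteBuffer out = ByteBuffer.allocate(22).order(ByteOrder.LITTLE_ENDIAN);
        out.put("LAST".getBytes(StandardCharsets.US_ASCII));
        out.put((byte) 1).put((byte) flags);
        out.putInt(sRows).putInt(sCols).putInt(eRows).putInt(eCols);
        return out.array();
    }

    // Returns [version, flags, seriesRows, seriesCols, eventsRows, eventsCols].
    static int[] parse(byte[] header) {
        ByteBuffer in = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        byte[] magic = new byte[4];
        in.get(magic);
        if (!new String(magic, StandardCharsets.US_ASCII).equals("LAST"))
            throw new IllegalArgumentException("bad magic");
        return new int[] { in.get() & 0xFF, in.get() & 0xFF,
                           in.getInt(), in.getInt(), in.getInt(), in.getInt() };
    }

    public static void main(String[] args) {
        int[] h = parse(build(0b0001, 1000, 6, 42, 3)); // FLAG_HAS_EVENTS set
        System.out.println(h[2]); // 1000 series rows
    }
}
```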
All event columns share a single eventsRowCount. When columns have
different logical lengths, use the highest count and pad shorter columns
with zero/empty values. Use a type column (VARLEN) to identify event
categories and filter on read.
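A minimal sketch of the padding rule above, using plain Java arrays for the event columns; the `padBinary` helper is illustrative, not part of the library API:

```java
import java.util.Arrays;

public class EventPadding {
    // Pad a binary event column up to the shared eventsRowCount with
    // empty values, per the guidance above.
    static byte[][] padBinary(byte[][] col, int eventsRowCount) {
        byte[][] padded = Arrays.copyOf(col, eventsRowCount); // extra slots are null
        for (int i = col.length; i < eventsRowCount; i++) padded[i] = new byte[0];
        return padded;
    }

    public static void main(String[] args) {
        long[] ts = { 1000L, 2000L, 3000L };                      // 3 events
        byte[][] types = { "BUY".getBytes(), "SELL".getBytes() }; // only 2 labels
        int eventsRowCount = Math.max(ts.length, types.length);   // highest count wins
        byte[][] padded = padBinary(types, eventsRowCount);
        System.out.println(padded.length);    // 3
        System.out.println(padded[2].length); // 0 (empty padding)
    }
}
```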
| Flag | Bit | Description |
|---|---|---|
| FLAG_HAS_EVENTS | 0 | File contains an events section |
| FLAG_HAS_FOOTER | 1 | Footer with column offsets is present |
| FLAG_HAS_CHECKSUMS | 2 | Per-column CRC32 checksums in footer |
| FLAG_HAS_ROW_GROUPS | 3 | Multiple row groups with per-group statistics |
When FLAG_HAS_ROW_GROUPS is set, series data is partitioned into row groups. Each row group contains all series columns for a subset of rows, with codecs reset per group (enabling independent decoding).
The footer stores per-RG metadata: byte offset, row count, tsMin, tsMax. Readers can skip row groups whose timestamp range doesn't overlap the query window — enabling efficient HTTP range requests against remote files.
Row group size is configurable via setRowGroupSize() (default: 4096 rows). Files with fewer rows than the group size are written as a single implicit row group without the flag.
When FLAG_HAS_CHECKSUMS is set, the footer contains one CRC32 (IEEE 802.3) per column per row group, computed over the compressed data bytes. The reader verifies each column on access — a corrupted column throws an exception identifying which column failed, while intact columns remain readable.
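The verification step can be sketched with `java.util.zip.CRC32`, which implements the same IEEE polynomial. The corruption scenario below is illustrative:

```java
import java.util.zip.CRC32;

public class ChecksumSketch {
    // CRC32 over the compressed bytes of one column, as the footer stores it.
    static long crc32(byte[] compressed) {
        CRC32 c = new CRC32();
        c.update(compressed, 0, compressed.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] column = { 12, 34, 56, 78 };
        long stored = crc32(column); // what the writer puts in the footer
        column[1] ^= 0x01;           // simulate a single-bit corruption
        boolean ok = crc32(column) == stored;
        System.out.println(ok);      // false: the reader flags this column
    }
}
```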
| Codec | ID | DataType | Use case |
|---|---|---|---|
| DELTA_VARINT | 1 | LONG | Timestamps (~1 byte/value for regular intervals) |
| ALP | 2 | DOUBLE | Decimal doubles: prices, temperatures, measurements (~3-4 bits/value for 2dp) |
| GORILLA | 6 | DOUBLE | XOR compression (Facebook VLDB 2015). Best for volatile metrics: CPU%, latency, network |
| PONGO | 7 | DOUBLE | Decimal-aware erasure + Gorilla XOR. Best for decimal-native data: prices, sensor readings (~18 bits/value on 2dp) |
| VARLEN | 3 | BINARY | Short strings, event types, labels |
| VARLEN_ZSTD | 4 | BINARY | JSON payloads, bulk binary data |
| VARLEN_GZIP | 5 | BINARY | Metadata, small text (browser DecompressionStream compatible) |
| RAW | 0 | LONG/DOUBLE | Uncompressed fallback |
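To illustrate why DELTA_VARINT approaches one byte per value on regular intervals, here is a minimal delta + zigzag + LEB128 sketch. The actual wire format is not specified here, so treat this as an assumption about the general technique, not the codec's exact layout:

```java
import java.io.ByteArrayOutputStream;

public class DeltaVarintSketch {
    // Delta against the previous value, zigzag-map so small signed deltas
    // become small unsigned codes, then LEB128 varint (7 bits per byte).
    static byte[] encode(long[] ts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long prev = 0;
        for (long t : ts) {
            long z = ((t - prev) << 1) ^ ((t - prev) >> 63); // zigzag
            prev = t;
            while ((z & ~0x7FL) != 0) { out.write((int) (z & 0x7F) | 0x80); z >>>= 7; }
            out.write((int) z);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long[] ts = new long[1000];
        for (int i = 0; i < ts.length; i++) ts[i] = 1_700_000_000_000L + i; // 1 ms ticks
        byte[] packed = encode(ts);
        // First absolute value costs 6 bytes; each regular delta then costs 1 byte.
        System.out.println(packed.length); // 1005 bytes for 1000 timestamps
    }
}
```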
```java
try (LastraWriter w = new LastraWriter(outputStream)) {
    w.addSeriesColumn("ts", DataType.LONG, Codec.DELTA_VARINT);
    w.addSeriesColumn("open", DataType.DOUBLE, Codec.ALP);
    w.addSeriesColumn("high", DataType.DOUBLE, Codec.ALP);
    w.addSeriesColumn("low", DataType.DOUBLE, Codec.ALP);
    w.addSeriesColumn("close", DataType.DOUBLE, Codec.ALP);
    w.addSeriesColumn("volume", DataType.DOUBLE, Codec.ALP);
    w.writeSeries(rowCount, ts, open, high, low, close, volume);
}
```

```java
try (LastraWriter w = new LastraWriter(outputStream)) {
    // Series: sensor readings with metadata
    w.addSeriesColumn("ts", DataType.LONG, Codec.DELTA_VARINT);
    w.addSeriesColumn("temperature", DataType.DOUBLE, Codec.PONGO,
            Map.of("unit", "celsius", "sensor", "dht22"));
    w.addSeriesColumn("humidity", DataType.DOUBLE, Codec.ALP,
            Map.of("unit", "%", "sensor", "dht22"));
    w.addSeriesColumn("pressure", DataType.DOUBLE, Codec.GORILLA);

    // Events: alerts with their own timestamps
    w.addEventColumn("ts", DataType.LONG, Codec.DELTA_VARINT);
    w.addEventColumn("type", DataType.BINARY, Codec.VARLEN);
    w.addEventColumn("data", DataType.BINARY, Codec.VARLEN_ZSTD);

    w.writeSeries(sampleCount, ts, temp, humidity, pressure);
    w.writeEvents(alertCount, alertTs, alertTypes, alertData);
}
```

```java
try (LastraWriter w = new LastraWriter(outputStream)) {
    w.addSeriesColumn("ts", DataType.LONG, Codec.DELTA_VARINT);
    w.addSeriesColumn("close", DataType.DOUBLE, Codec.ALP);
    w.addSeriesColumn("ema1", DataType.DOUBLE, Codec.ALP,
            Map.of("indicator", "ema", "periods", "10"));
    w.addSeriesColumn("rsi1", DataType.DOUBLE, Codec.ALP,
            Map.of("indicator", "rsi", "periods", "14"));
    w.addEventColumn("ts", DataType.LONG, Codec.DELTA_VARINT);
    w.addEventColumn("type", DataType.BINARY, Codec.VARLEN);
    w.addEventColumn("data", DataType.BINARY, Codec.VARLEN_ZSTD);
    w.writeSeries(tickCount, ts, close, ema, rsi);
    w.writeEvents(signalCount, signalTs, signalTypes, signalData);
}
```

```java
try (LastraWriter w = new LastraWriter(outputStream)) {
    w.setRowGroupSize(3600); // 1 hour of 1s ticks per row group
    w.addSeriesColumn("ts", DataType.LONG, Codec.DELTA_VARINT);
    w.addSeriesColumn("close", DataType.DOUBLE, Codec.ALP);
    // writeSeries auto-partitions into row groups
    w.writeSeries(86400, dayTimestamps, dayCloses); // 24 RGs for a full day
}
```

```java
LastraReader r = LastraReader.from(inputStream);

// Read only what you need — other columns are not decompressed
long[] ts = r.readSeriesLong("ts");
double[] close = r.readSeriesDouble("close");

// Column metadata
Map<String, String> meta = r.getSeriesColumn("ema1").metadata();
// {"indicator": "ema", "periods": "10"}

// Events (independent timestamps)
long[] signalTs = r.readEventLong("ts");
byte[][] signalData = r.readEventBinary("data");
```

```java
LastraReader r = LastraReader.from(inputStream);

// Skip row groups outside the query window
long queryFrom = Instant.parse("2026-04-07T14:00:00Z").toEpochMilli();
long queryTo = Instant.parse("2026-04-07T16:00:00Z").toEpochMilli();

for (int i = 0; i < r.rowGroupCount(); i++) {
    RowGroupStats stats = r.rowGroupStats(i);
    if (stats.tsMax() < queryFrom || stats.tsMin() > queryTo) continue;
    // Only decode row groups that overlap the query range
    long[] ts = r.readRowGroupLong(i, "ts");
    double[] close = r.readRowGroupDouble(i, "close");
}
```

OHLCV ticker data (1000 rows, 2 decimal places):
| Format | Size | Ratio |
|---|---|---|
| Raw | 48 KB | 1x |
| Lastra | ~3.5 KB | ~13x |
| Apache Parquet (SNAPPY) | ~8 KB | ~6x |
| Aspect | Apache Parquet | Lastra |
|---|---|---|
| Timestamps (int64) | DELTA_BINARY_PACKED (~1-2 bytes/value) | DELTA_VARINT (~1 byte/value) |
| Doubles (numeric) | PLAIN + block compression (SNAPPY/ZSTD) | ALP (~3-4 bits/value), Pongo (~18 bits/value), Gorilla (XOR) |
| Strings / binary | DICTIONARY + RLE, DELTA_BYTE_ARRAY | VARLEN, VARLEN_ZSTD, VARLEN_GZIP |
| Block compression | SNAPPY, GZIP, ZSTD, LZ4, BROTLI | No (compression integrated per codec) |
| Per-column codec | Same codec per file or row group | Different codec per column |
| Optimized for | General purpose, big data | Numeric time series (financial, IoT, infra) |
| Format | Size | Ratio vs Parquet |
|---|---|---|
| Apache Parquet (SNAPPY) | 118 KB | 1x |
| Lastra (ALP default) | 82 KB | 1.4x smaller |
| Lastra (mixed codecs via lastra-convert --best) | 73 KB | 1.6x smaller |
Apache Parquet stores doubles as raw 8 bytes (PLAIN encoding) then applies generic block compression (SNAPPY/ZSTD). Lastra applies semantic compression per column:
- ALP: understands that 65007.28 has 2 decimal places → ~3-4 bits/value
- Pongo: detects decimal patterns and erases mantissa noise before XOR → ~18 bits/value
- Gorilla: XOR between consecutive similar values → good for volatile data
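The XOR step at the heart of Gorilla (and Pongo's second stage) can be shown in a few lines of plain Java: identical consecutive values XOR to zero, and similar ones leave only a narrow band of meaningful bits to store.

```java
public class GorillaXorSketch {
    public static void main(String[] args) {
        long prev = Double.doubleToLongBits(99.82);
        long same = Double.doubleToLongBits(99.82);
        long next = Double.doubleToLongBits(99.83);

        // A repeated value XORs to zero; Gorilla stores it as a single control bit.
        System.out.println(prev ^ same); // 0

        // A nearby value shares sign, exponent, and high mantissa bits, so the
        // XOR has long runs of leading/trailing zeros around a short payload.
        long xor = prev ^ next;
        int meaningful = 64 - Long.numberOfLeadingZeros(xor)
                            - Long.numberOfTrailingZeros(xor);
        System.out.println(meaningful); // far fewer than 64 bits to encode
    }
}
```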
Apache Parquet has a much larger ecosystem (Spark, DuckDB, Arrow, Pandas) and advanced features: predicate pushdown, bloom filters, column statistics, nested types, and modular encryption.
```xml
<repositories>
  <repository>
    <id>jitpack.io</id>
    <url>https://jitpack.io</url>
  </repository>
</repositories>

<dependency>
  <groupId>com.github.qtsurfer</groupId>
  <artifactId>lastra-java</artifactId>
  <version>0.8.0</version>
</dependency>
```

```groovy
repositories {
    maven { url 'https://jitpack.io' }
}

dependencies {
    implementation 'com.github.qtsurfer:lastra-java:0.8.0'
}
```

Copyright 2026 Wualabs LTD. Apache License 2.0 — see LICENSE.