Write Opt. by ColinLeeo · Pull Request #749 · apache/tsfile

ColinLeeo · 2026-03-22T09:35:49Z

No description provided.

jt2594838 · 2026-03-24T09:54:37Z

        page_size_ = buf_len;
+        page_mask_ = buf_len - 1;


Is it ensured that page_size_ must be the power of 2?

jt2594838 · 2026-03-24T10:06:36Z

+                auto* sc = new StringColumn();
+                sc->init(max_row_num_, max_row_num_ * 32);
+                value_matrix_[c].string_col = sc;


Where does the 32 come from?

Use mem_alloc?

jt2594838 · 2026-03-24T10:12:59Z

+int Tablet::set_timestamps(const int64_t* timestamps, uint32_t count) {
+    if (err_code_ != E_OK) return err_code_;
+    ASSERT(timestamps_ != NULL);
+    if (UNLIKELY(count > static_cast<uint32_t>(max_row_num_))) {
+        return E_OUT_OF_RANGE;
+    }
+    std::memcpy(timestamps_, timestamps, count * sizeof(int64_t));
+    cur_row_size_ = std::max(count, cur_row_size_);
+    return E_OK;
+}
+
+int Tablet::set_column_values(uint32_t schema_index, const void* data,
+                              const uint8_t* bitmap, uint32_t count) {
+    if (err_code_ != E_OK) return err_code_;
+    if (UNLIKELY(schema_index >= schema_vec_->size())) return E_OUT_OF_RANGE;
+    if (UNLIKELY(count > static_cast<uint32_t>(max_row_num_)))
+        return E_OUT_OF_RANGE;
+


Duplicated with #747

Refer to the comments in that PR and discuss with @hongzhi-gao to decide where to put the changes.

closed #747

jt2594838 · 2026-03-24T10:29:05Z

+    for (auto col_idx : id_column_indexes_) {
+        const StringColumn& sc = *value_matrix_[col_idx].string_col;
+        const uint32_t* off = sc.offsets;
+        const char* buf = sc.buffer;
+        for (uint32_t i = 1; i < row_count; i++) {
+            if (boundary[i >> 6] & (1ULL << (i & 63))) continue;
+            uint32_t len_a = off[i] - off[i - 1];
+            uint32_t len_b = off[i + 1] - off[i];
+            if (len_a != len_b ||
+                (len_a > 0 &&
+                 memcmp(buf + off[i - 1], buf + off[i], len_a) != 0)) {
+                boundary[i >> 6] |= (1ULL << (i & 63));
+            }
+        }
+    }


id_column_indexes_ -> tag_column_indexes.

May scan the indexes in a reversed order, because in general, the later tag is more likely to change often.
E.g., devices are more likely to have orders like:
beijing.haidian.wf0001.wt0001
beijing.haidian.wf0001.wt0002
beijing.haidian.wf0001.wt0003
beijing.haidian.wf0002.wt0001
beijing.haidian.wf0002.wt0002
beijing.haidian.wf0002.wt0003

And, may break fast when the number of boundaries set equals row_count.

Will we benefit from it if we maintain the boundaries during insertions of tag columns?

jt2594838 · 2026-03-24T10:40:45Z

+    std::vector<uint32_t> result;
+    for (uint32_t w = 0; w < nwords; w++) {
+        uint64_t bits = boundary[w];
+        while (bits) {
+            uint32_t bit = __builtin_ctzll(bits);
+            uint32_t idx = w * 64 + bit;
+            if (idx > 0 && idx < row_count) {
+                result.push_back(idx);
+            }
+            bits &= bits - 1;  // clear lowest set bit
+        }
+    }
+    return result;


Add some comments, better to give some examples.

jt2594838 · 2026-03-24T11:11:50Z

+            // Collect column tasks for parallel execution
+            struct ColTask {
+                ValueChunkWriter* writer;
+                uint32_t col_idx;
+            };
+            std::vector<ColTask> tasks;


This may not be desirable when resources are limited.
Better to add a compilation option or configuration for this.

jt2594838 · 2026-03-24T11:13:40Z

+            if (tasks.size() >= 2) {
+                // Launch time column + value columns in parallel via thread pool
+                auto time_future = thread_pool_.submit(
+                    [this, time_chunk_writer, &tablet, si, ei]() {
+                        return time_write_column_batch(time_chunk_writer,
+                                                       tablet, si, ei);
+                    });
+
+                std::vector<std::future<int>> val_futures;
+                for (size_t t = 0; t < tasks.size(); t++) {
+                    auto& task = tasks[t];
+                    val_futures.push_back(thread_pool_.submit(
+                        [this, &task, &tablet, si, ei]() {
+                            return value_write_column_batch(
+                                task.writer, tablet, task.col_idx, si, ei);
+                        }));
+                }
+
+                // Wait for all and check errors
+                ret = time_future.get();
+                if (ret != E_OK) return ret;
+                for (auto& f : val_futures) {
+                    int r = f.get();
+                    if (r != E_OK && ret == E_OK) ret = r;
+                }
+                if (ret != E_OK) return ret;
+            } else {


May always encode the time column in the local thread to reduce overhead.

jt2594838 · 2026-03-24T11:17:21Z

+
+    uint32_t seg_start = 0;
+    for (uint32_t b : boundaries) {
+        std::shared_ptr<IDeviceID> dev_id(tablet.get_device_id(seg_start));


May add an IDeviceID implementation that directly points to a row in a tablet.

jt2594838 · 2026-03-25T01:38:58Z

+        if (col_notnull_bitmap.test(r)) {
+            has_null = true;


col_notnull_bitmap -> col_null_bitmap

check other places

jt2594838 · 2026-03-25T01:42:36Z

    bool write_file_created_;
    bool io_writer_owned_;  // false when init(RestorableTsFileIOWriter*)
    bool table_aligned_ = true;
+    common::ThreadPool thread_pool_{6};


Add an item in the global configuration and use it

Brings together batch decode infrastructure, multi-value aligned read, parallel page decode, columnar tablet write, and SIMD micro-optimizations from the long-lived `final` branch into a single review-ready change. This change is a code snapshot, not a replay of `final` commit history -- the upstream history was a long sequence of WIP commits that wasn't fit for review. Supersedes #749, #754, #774. Read path - Decoder base gains batch APIs (read_batch_int32/int64/float/double, skip_*); PLAIN, TS2DIFF, Gorilla decoders implement them. TS2DIFF has block-level peeking so time filters can skip blocks without decoding. Gorilla adds a raw-pointer GorillaBitReader that bypasses ByteStream overhead. - ChunkReader / AlignedChunkReader add *_DECODE_TV_BATCH methods that decode time + value into a TsBlock in one pass, applying batch time filters before append. - AlignedChunkReader supports a multi-value mode: one time chunk + N value chunks decoded in a single pass, sharing the decoded timestamps and filter mask. SingleDeviceTsBlockReader auto-detects same-device measurements via VectorMeasurementColumnContext. - Optional page-level parallel decompression via a DecodeThreadPool + BlockingQueue when ENABLE_THREADS is set. Page-plan classification (SKIP / FULL_PASS / BOUNDARY) lets a scatter-free memcpy fast path fire when every row passes and no column has nulls. Write path - ValuePageWriter gains write_batch / write_string_batch that take timestamp+value+nullness arrays directly, removing the per-value append loop. Tablet exposes set_timestamps / set_column_values / set_column_string_repeated / reset for bulk reuse and switches StringColumn to an Arrow-compatible offset+buffer layout. - TS2DIFFEncoder::flush now packs all deltas with a single pack_bits_msb + write_buf instead of per-value write_bits, falling back to the scalar path for the rare bit_width > 56 case. - Int64Statistic::update_batch (NEON-accelerated min/max/sum). Encoding / SIMD - TS2DIFF batch decode adds AVX2 helpers via SIMDe (already on develop) for both i32 and i64; scalar fallback unchanged. - PLAIN byte-swap path uses ARM NEON (vrev64q_u8 / vrev32q_u8) when available, falling back to __builtin_bswap. - CMakeLists adds ENABLE_SIMD and turns on -O3 -march=native -flto in Release builds. Allocator / ByteStream - ByteStream caches page_mask_ (= page_size - 1) so the hot path uses a bitmask instead of modulo; wrap_from rounds buffer sizes up to a power of two so the mask remains correct. total_size_ widened to uint64_t to support files > 4GB. - UncompressedCompressor now copies its output instead of aliasing caller buffers, letting callers free input safely. C wrapper / Arrow - Trimmed unused metadata-export surface (TsFileStatisticBase, TimeseriesMetadata, DeviceTimeseriesMetadataEntry, tag-filter handles) out of the public C API. Internal tag filtering is unaffected. - arrow_c.cc simplified: per-row offset handling for sliced variable-length arrays in place of the InvertArrowBitmap copy. Tests / benchmarks - New tsfile_reader_table_batch_test.cc covers the TsBlock batch read path. gorilla_codec_test.cc adds Int32/Int64/Float batch decode tests. examples/cpp_examples adds bench_read.cpp/.h and an examples/read_perf_compare/ target. - Removed cwrapper_metadata_test.cc and common/path.cc (Path bodies inlined into path.h; the C metadata API they covered is gone). Compatibility - All new C++ methods are additions; no existing C++ API was removed. - C wrapper headers lost the metadata export / tag filter symbols listed above -- downstream callers (Python wrapper in particular) will want a sanity check before merge. - cpp/third_party/ intentionally left at develop's state so the recent MSVC compatibility fixes (WITH_STATIC_CRT OFF, CMP0054 NEW, CMAKE_POLICY_VERSION_MINIMUM=3.5, _MSC_VER guards) are preserved. Verification - cmake configure + make -j on macOS arm64 (AppleClang, C++11) builds cleanly: libtsfile.2.2.1.dev.dylib and TsFile_Test both link, zero errors, only unused-lambda-capture warnings in pre-existing tests. - Full TsFile_Test run and downstream Python binding load are left as pre-merge checks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ColinLeeo added 4 commits March 20, 2026 00:36

refine tabet writing.

3322a03

fix some code.

0297161

fix format.

a79a4e7

experiment.

344ee5d

jt2594838 reviewed Mar 25, 2026

View reviewed changes

ColinLeeo changed the title ~~Experiment~~ Write Opt. Mar 27, 2026

ColinLeeo mentioned this pull request May 26, 2026

TsFile C++ batch read/write optimization #823

Open

5 tasks

Conversation

ColinLeeo commented Mar 22, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants