GH-49958: Do not merge [C++][Dataset] DEBUG -> blocked on upstream gcc-16 fix #49961

Draft: tadeja wants to merge 4 commits into apache:main from tadeja:49958-dataset-bad_weak_ptr

tadeja commented May 11, 2026

Rationale for this change

Debug #49958

What changes are included in this PR?

Only debug logging for now.

Are these changes tested?

CI debug

Are there any user-facing changes?

No

@github-actions

⚠️ GitHub issue #49958 has been automatically assigned in GitHub to the PR creator.


tadeja commented May 12, 2026

Both failing tests throw std::bad_weak_ptr at the same line, dataset_writer.cc:212, in DatasetWriterFileQueue::ScheduleBatch:

  void ScheduleBatch(std::shared_ptr<RecordBatch> batch) {
    file_tasks_->AddSimpleTask(
        [self = SfwDbg(__LINE__) /* was shared_from_this() */, batch = std::move(batch)]() {
          return self->WriteNext(std::move(batch));
        },
        "DatasetWriter::WriteBatch"sv);
  }

https://github.com/apache/arrow/actions/runs/25690826182/job/75426470835#step:12:614

[ RUN      ] DatasetWriterTestFixture.BatchWriteConcurrent
GH-49958: bad_weak_ptr DatasetWriterFileQueue dataset_writer.cc:212
terminate called after throwing an instance of 'std::bad_weak_ptr'
  what():  bad_weak_ptr

      Start 62: arrow-dataset-dataset-writer-test
    Test #62: arrow-dataset-dataset-writer-test ............Exit code 0xc0000374***Exception:   0.34 sec

https://github.com/apache/arrow/actions/runs/25690826182/job/75426470835#step:12:794

[ RUN      ] TestFileSystemDataset.MultiThreadedWritePersistsOrder
GH-49958: bad_weak_ptr DatasetWriterFileQueue dataset_writer.cc:212
terminate called after throwing an instance of 'std::bad_weak_ptr'
  what():  bad_weak_ptr

      Start 65: arrow-dataset-file-test
    Test #65: arrow-dataset-file-test ......................***Exception: SegFault  0.20 sec
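The implementation of the SfwDbg(__LINE__) debug helper isn't shown in this thread. A minimal sketch of a checked shared_from_this() in the same spirit might look like the following, assuming C++17, where calling shared_from_this() on an object that no shared_ptr owns is guaranteed to throw std::bad_weak_ptr. FileQueue and CheckedSharedFromThis are hypothetical names, not Arrow code, and the log format is only loosely modeled on the output above.

```cpp
#include <cassert>
#include <cstdio>
#include <memory>

// Hypothetical stand-in for DatasetWriterFileQueue (not the real Arrow class).
class FileQueue : public std::enable_shared_from_this<FileQueue> {
 public:
  // Hypothetical debug wrapper around shared_from_this(). On failure it logs
  // the caller's line, the object address and the current use_count, then
  // rethrows so the caller still sees std::bad_weak_ptr.
  std::shared_ptr<FileQueue> CheckedSharedFromThis(int line) {
    try {
      return shared_from_this();
    } catch (const std::bad_weak_ptr&) {
      std::fprintf(stderr, "bad_weak_ptr FileQueue line %d this=%p use_count=%ld\n",
                   line, static_cast<void*>(this), weak_from_this().use_count());
      throw;  // preserve the original failure mode
    }
  }
};
```

With a helper like this, the capture in ScheduleBatch would read self = CheckedSharedFromThis(__LINE__). Called on a make_shared-owned object, it returns an additional owner; called on an object no shared_ptr owns, it logs use_count=0 and rethrows.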


tadeja commented May 12, 2026

Diagnosed: the make_shared change did not fix this, and the new CI log shows:
https://github.com/apache/arrow/actions/runs/25731999852/job/75559307182?pr=49961#step:12:870

[ RUN      ] DatasetWriterTestFixture.BatchWriteConcurrent
GH-49958: ScheduleBatch this=000002f3e2fda080 EXPIRED use_count=0
GH-49958: bad_weak_ptr DatasetWriterFileQueue dataset_writer.cc:218
terminate called after throwing an instance of 'std::bad_weak_ptr'
  what():  bad_weak_ptr

(dataset_writer.cc:218 is the same line as the previous :212; the added diagnostics shifted it down by 6 lines.)

(1) It's not the reset() vs make_shared wiring.
(2) By the time ScheduleBatch runs, the file queue's control block already reports use_count=0.
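The use_count=0 reading matches how shared_from_this() is specified in C++17: it is equivalent to constructing a shared_ptr from the object's internal weak_ptr, and that constructor throws std::bad_weak_ptr once the control block's use count has reached zero. The sketch below shows that same conversion on a plain weak_ptr; Queue and PromotionThrows are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical payload type standing in for the file queue.
struct Queue {};

// Returns true if promoting `weak` with the throwing shared_ptr constructor
// fails. This is the conversion shared_from_this() performs on its internal
// weak pointer in C++17.
bool PromotionThrows(const std::weak_ptr<Queue>& weak) {
  try {
    std::shared_ptr<Queue> strong(weak);  // throws std::bad_weak_ptr if expired
    (void)strong;
    return false;
  } catch (const std::bad_weak_ptr&) {
    return true;
  }
}
```

Unlike the constructor, weak.lock() returns an empty shared_ptr rather than throwing. This is also why a use_count of 0 observed from inside a member function is suspicious when the caller is expected to still hold an owning pointer to the queue.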


tadeja commented May 12, 2026

Diagnosed further: this isn't a dataset_writer bug. It looks like lower-level memory corruption, likely a gcc-16/MinGW regression. Keeping the #49931 MSYS2 pin as a workaround for now; #49945 stays on hold, tracked under #49948.

    Test #62: arrow-dataset-dataset-writer-test ............Exit code 0xc0000374***Exception:   0.30 sec
...
    Test #62: arrow-dataset-dataset-writer-test ............   Passed    4.85 sec
...
The following tests FAILED:
	 41 - arrow-async-utility-test (Exit code 0xc0000374)   arrow-tests unittest
	 44 - arrow-threading-utility-test (Timeout)            arrow-tests unittest
	 65 - arrow-dataset-file-test (Failed)                  arrow_dataset unittest
	 78 - arrow-flight-test (Failed)                        arrow_flight unittest
Error: Process completed with exit code 8.
• Independent reproducers that don't involve dataset_writer: arrow-async-utility-test always dies at PushGenerator.Stress, and arrow-threading-utility-test always times out. Both pass on CLANG64/libc++ and on gcc 15.2; the failures appear only after moving from gcc 15.2 to gcc 16.1.

tadeja changed the title from "GH-49958: [C++][Dataset] DEBUG: log shared_from_this() callsite throwing bad_weak_ptr" to "GH-49958: Do not merge [C++][Dataset] DEBUG -> blocked on upstream gcc-16 fix" on May 12, 2026