feat(c-api): add nullable setter and per-doc write results #234
chinaux merged 1 commit into alibaba:feat/add_c_api from …
```cpp
const std::string pk = i < pks.size() ? pks[i] : std::string();
const std::string message = statuses[i].message();
(*results)[i].pk = copy_string(pk);
(*results)[i].message = copy_string(message);
(*results)[i].code = status_to_error_code(statuses[i]);
```
Null PK in results when doc has empty primary key
`copy_string` returns `nullptr` for empty strings (line 795: `if (str.empty()) return nullptr;`). This means `results[i].pk` will silently be `nullptr` whenever a document was written without a PK (e.g. when `docs[i]` was null in `collect_doc_pks`, which pushes `""`). The struct documentation, `const char *pk; /**< Primary key (allocated by API) */`, implies the field is always non-null.
Callers that rely on `results[i].pk` being non-null for every entry will hit an unexpected null-pointer dereference. Consider returning an empty string instead of null so the result is consistently non-null:
Suggested change:
```cpp
// Allocate an empty string for a blank pk/message so callers always get a non-null pointer.
// Returns nullptr only if malloc itself fails.
auto alloc_string = [](const std::string &s) -> char * {
  char *copy = static_cast<char *>(malloc(s.length() + 1));
  if (!copy) return nullptr;
  memcpy(copy, s.c_str(), s.length() + 1);  // copies the terminating '\0' too
  return copy;
};
for (size_t i = 0; i < *result_count; ++i) {
  const std::string pk = i < pks.size() ? pks[i] : std::string();
  const std::string message = statuses[i].message();
  (*results)[i].pk = alloc_string(pk);
  (*results)[i].message = alloc_string(message);
  (*results)[i].code = status_to_error_code(statuses[i]);
}
```
This also fixes the related copy_string malloc-failure issue described below.
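To make the difference concrete, here is a minimal sketch contrasting the two behaviors. Both function names are hypothetical; `copy_string_nullable` is a reconstruction of the empty-string branch described above (with a malloc check added), not the verbatim zvec source.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical reconstruction of the current behaviour: an empty input yields nullptr.
char *copy_string_nullable(const std::string &str) {
  if (str.empty()) return nullptr;  // the branch that produces a null pk
  char *copy = static_cast<char *>(std::malloc(str.length() + 1));
  if (copy == nullptr) return nullptr;
  std::memcpy(copy, str.c_str(), str.length() + 1);  // includes the '\0'
  return copy;
}

// Suggested behaviour: always allocate, so an empty pk becomes "".
// Returns nullptr only when malloc itself fails.
char *copy_string_non_null(const std::string &str) {
  char *copy = static_cast<char *>(std::malloc(str.length() + 1));
  if (copy == nullptr) return nullptr;
  std::memcpy(copy, str.c_str(), str.length() + 1);
  return copy;
}
```

With the second form, a caller can pass `results[i].pk` straight to `strcmp` or `printf("%s", ...)` without a null check, and a null pointer signals only an allocation failure, which the result builder can turn into an error.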
```cpp
auto doc_map = result.value();
auto schema_result = (*coll_ptr)->Schema();
if (schema_result.has_value()) {
  normalize_nullable_fields_for_fetch(schema_result.value(), doc_map);
}
```
Silent schema-fetch failure during nullable normalization
If `Schema()` returns an error (e.g. an I/O failure or corrupted metadata), nullable-field normalization is quietly skipped:
```cpp
if (schema_result.has_value()) {
  normalize_nullable_fields_for_fetch(schema_result.value(), doc_map);
}
// no else: silently falls through with un-normalized docs
```
The caller then receives documents in which absent nullable fields are not marked null, which directly contradicts the intent of this feature. At minimum, consider logging a warning or propagating the error so callers know the returned data may be inconsistent:
Suggested change:
```cpp
auto doc_map = result.value();
auto schema_result = (*coll_ptr)->Schema();
if (schema_result.has_value()) {
  normalize_nullable_fields_for_fetch(schema_result.value(), doc_map);
} else {
  set_last_error("Failed to fetch schema for nullable normalization: " +
                 schema_result.error().message());
  // Proceed with un-normalized data rather than failing the fetch
}
return convert_fetched_document_results(doc_map, results, doc_count);
```
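To pin down what the normalization is meant to guarantee, here is a minimal sketch using hypothetical stand-in types: `FieldSchema`, `Doc`, `DocMap`, `normalize_nullable_fields`, and `normalize_or_report` are illustrative, not zvec's real classes or functions. It follows the behavior described in this PR (an omitted nullable field gets an explicit null marker) and reports a missing schema instead of silently skipping.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one schema field.
struct FieldSchema {
  std::string name;
  bool nullable;
};

// A present key holding std::nullopt means "explicitly null";
// a missing key means "field omitted" (how the C++ side currently represents null).
using Doc = std::map<std::string, std::optional<std::string>>;
using DocMap = std::map<std::string, Doc>;  // keyed by primary key

// For each nullable field that a document omits, insert an explicit null marker.
void normalize_nullable_fields(const std::vector<FieldSchema> &schema, DocMap &docs) {
  for (auto &entry : docs) {
    Doc &doc = entry.second;
    for (const auto &field : schema) {
      if (field.nullable && doc.find(field.name) == doc.end()) {
        doc.emplace(field.name, std::nullopt);
      }
    }
  }
}

// Returns false (leaving docs untouched) when no schema is available, after
// recording an error, so the failure is visible rather than silent.
bool normalize_or_report(const std::optional<std::vector<FieldSchema>> &schema,
                         DocMap &docs, std::string &last_error) {
  if (!schema) {
    last_error = "Failed to fetch schema for nullable normalization";
    return false;
  }
  normalize_nullable_fields(*schema, docs);
  return true;
}
```

Whether the failure path should abort the fetch or, as in the suggestion above, return the data with an error recorded is a policy choice; the sketch only makes the two outcomes distinguishable.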
```c
void *buffer = malloc(64);
TEST_ASSERT(buffer != NULL);
zvec_free_ptr(buffer);
```
Test for `zvec_free_ptr` doesn't reflect actual API usage
The test allocates memory with its own `malloc` and then frees it via `zvec_free_ptr`. The documented purpose of `zvec_free_ptr` is to free memory allocated by the zvec C API, avoiding allocator mismatch across DLL boundaries. Freeing a buffer you `malloc`'d yourself works here only because the test and the library share the same allocator in this build, so it doesn't exercise the intended cross-DLL safety story.
A better test would call a zvec API that returns a heap-allocated buffer (e.g. a string returned from zvec_get_last_error_message or similar), and free it with zvec_free_ptr. This would validate the allocator symmetry the API is designed to guarantee.
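The suggested test shape can be sketched in a single file. The `demo_*` functions below are hypothetical stand-ins for a zvec call that returns heap memory and for `zvec_free_ptr`; they are not zvec's API. A single translation unit cannot reproduce a real DLL boundary, so this only shows the ownership pattern: memory that comes from the library goes back to the library's own free function.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// --- "library side" (hypothetical) ---
// Allocation and release live in the same module, so both go through the
// same C runtime even if the caller links a different one.
extern "C" char *demo_copy_last_error(void) {
  static const std::string last_error = "schema not found";  // placeholder message
  char *out = static_cast<char *>(std::malloc(last_error.size() + 1));
  if (out != nullptr) std::memcpy(out, last_error.c_str(), last_error.size() + 1);
  return out;
}

extern "C" void demo_free_ptr(void *ptr) { std::free(ptr); }

// --- "caller side": the shape of test the review suggests ---
bool test_free_ptr_round_trip() {
  char *msg = demo_copy_last_error();  // buffer owned by the library's allocator
  if (msg == nullptr) return false;
  const bool ok = std::strcmp(msg, "schema not found") == 0;
  demo_free_ptr(msg);  // release through the library, not the caller's free()
  return ok;
}
```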
```cpp
auto doc_map = result.value();
auto schema_result = (*coll_ptr)->Schema();
if (schema_result.has_value()) {
  normalize_nullable_fields_for_fetch(schema_result.value(), doc_map);
```
Currently, the C++ implementation omits the key from the document if the field is null. However, if you require the document to include all fields regardless, we can certainly accommodate that.
Summary
- `zvec_doc_set_field_null` for explicit nullable writes
- …`zvec_free_ptr`) and tests

Tests
```shell
cmake --build build-capi --target c_api_test -j4
ctest --output-on-failure -R c_api_test
```

Greptile Summary
This PR extends the C API with four new `_with_results` DML variants (insert/update/upsert/delete) that return per-document `ZVecWriteResult` structs, a `zvec_doc_set_field_null` setter for explicit nullable writes, a generic `zvec_free_ptr` helper, and fetch-side normalization of nullable fields. The overall design and API shape are sound and consistent with the existing C API patterns.

Key concerns:
- `copy_string`: The pre-existing `copy_string` helper does not check whether `malloc` returns null before calling `strcpy`, causing a null-pointer dereference. It is now called 2×N times per batch write in `build_write_results`, making this realistically reachable under memory pressure.
- `pk` in `ZVecWriteResult`: `copy_string` returns `nullptr` for empty strings, so documents without a PK silently produce a `nullptr` in `results[i].pk`. The struct documentation does not call this out, risking unexpected null dereferences in caller code.
- If `Schema()` fails while normalizing nullable fields on fetch, the function silently skips normalization and returns data without null markers, directly undermining the feature's correctness guarantee.
- `zvec_free_ptr` test doesn't exercise cross-DLL safety: The test allocates with the caller's `malloc` rather than with a zvec API, so it does not validate the allocator symmetry the function is designed to provide.

Confidence Score: 2/5
Important Files Changed
`copy_string` crashes on malloc failure (null deref via `strcpy`), and empty PKs silently produce null entries in result arrays.

Sequence Diagram
```mermaid
sequenceDiagram
    participant Caller as C Caller
    participant API as C API (c_api.cc)
    participant Coll as zvec::Collection

    Note over Caller,Coll: _with_results DML flow (insert/update/upsert)
    Caller->>API: zvec_collection_upsert_with_results(collection, docs, N, &results, &count)
    API->>API: collect_doc_pks(docs, N) → pks[]
    API->>API: convert_zvec_docs_to_internal(docs, N) → internal_docs[]
    API->>Coll: Upsert(internal_docs)
    Coll-->>API: Expected<vector<Status>>
    API->>API: handle_expected_result(result)
    alt operation failed
        API-->>Caller: ZVEC_ERROR_* (results=nullptr, count=0)
    else operation succeeded
        API->>API: build_write_results(statuses, pks, &results, &count)
        Note over API: calloc(N, sizeof(ZVecWriteResult))<br/>loop: copy_string(pk), copy_string(message)
        API-->>Caller: ZVEC_OK, results[N], count=N
        Caller->>API: zvec_write_results_free(results, count)
        API->>API: free_write_results_internal(results, count)
    end

    Note over Caller,Coll: fetch + nullable normalization flow
    Caller->>API: zvec_collection_fetch(collection, pks, N, &docs, &count)
    API->>Coll: Fetch(pk_vector)
    Coll-->>API: Expected<DocPtrMap>
    API->>Coll: Schema()
    Coll-->>API: Expected<CollectionSchema>
    alt schema available
        API->>API: normalize_nullable_fields_for_fetch(schema, doc_map)
        Note over API: For each nullable field absent from a doc,<br/>call doc->set_null(field_name)
    else schema fetch failed
        API->>API: (silently skip normalization)
    end
    API->>API: convert_fetched_document_results(doc_map, &docs, &count)
    API-->>Caller: ZVEC_OK, docs[N]
```

Comments Outside Diff (1)
src/c_api/c_api.cc, lines 793-799
`copy_string`

When `malloc` returns `nullptr` (OOM), `strcpy(copy, str.c_str())` is called with a null destination pointer, causing a segfault or other undefined behavior instead of returning an error. This is a pre-existing bug, but `build_write_results` now calls `copy_string` in a tight loop (2×N times for a batch of N documents), making this path reachable under memory pressure in production.

Fix by checking the result of `malloc` before writing through it.
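A minimal sketch of that check, together with the cleanup-on-failure a `build_write_results`-style loop would need once a copy can fail. Everything here is hypothetical: `ZVecWriteResultSketch`, `checked_copy`, `free_results`, `build_write_results_sketch`, and the `SKETCH_*` codes stand in for zvec's real definitions, and the per-document statuses are simplified to parallel vectors of messages and codes.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical mirror of the C result struct; field types are assumed.
struct ZVecWriteResultSketch {
  const char *pk;
  const char *message;
  int code;
};

enum { SKETCH_OK = 0, SKETCH_ERROR_INTERNAL = 1 };

// Always returns a heap copy (empty input -> ""), or nullptr only on OOM.
static char *checked_copy(const std::string &s) {
  char *copy = static_cast<char *>(std::malloc(s.size() + 1));
  if (copy == nullptr) return nullptr;  // check before writing through the pointer
  std::memcpy(copy, s.c_str(), s.size() + 1);
  return copy;
}

static void free_results(ZVecWriteResultSketch *results, std::size_t count) {
  if (results == nullptr) return;
  for (std::size_t i = 0; i < count; ++i) {
    std::free(const_cast<char *>(results[i].pk));  // free(nullptr) is a no-op
    std::free(const_cast<char *>(results[i].message));
  }
  std::free(results);
}

// Builds one result per code; on any allocation failure, frees everything built
// so far and reports an internal error instead of returning a partial array.
int build_write_results_sketch(const std::vector<std::string> &pks,
                               const std::vector<std::string> &messages,
                               const std::vector<int> &codes,
                               ZVecWriteResultSketch **out, std::size_t *out_count) {
  *out = nullptr;
  *out_count = 0;
  const std::size_t n = codes.size();
  if (n == 0) return SKETCH_OK;  // avoid calloc(0, ...), which may return nullptr
  auto *results = static_cast<ZVecWriteResultSketch *>(
      std::calloc(n, sizeof(ZVecWriteResultSketch)));  // unset pointers start as null
  if (results == nullptr) return SKETCH_ERROR_INTERNAL;
  for (std::size_t i = 0; i < n; ++i) {
    const std::string pk = i < pks.size() ? pks[i] : std::string();
    const std::string msg = i < messages.size() ? messages[i] : std::string();
    results[i].pk = checked_copy(pk);
    results[i].message = checked_copy(msg);
    results[i].code = codes[i];
    if (results[i].pk == nullptr || results[i].message == nullptr) {
      free_results(results, n);  // roll back the partially built array
      return SKETCH_ERROR_INTERNAL;
    }
  }
  *out = results;
  *out_count = n;
  return SKETCH_OK;
}
```

Because every string is always allocated, a null pointer in the loop can only mean OOM, so the rollback condition stays simple.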
Additionally, `build_write_results` should then detect a null `pk`/`message` from a non-empty input and clean up the partially built array before returning `ZVEC_ERROR_INTERNAL_ERROR`.

Last reviewed commit: f137932