Add CurrentDomain API support. (#5041)
This PR adds the foundation for the CurrentDomain API:
- format change in array schema and disk serialization
- plumbing so you can create/open an array with current_domain data
using sm APIs
- array schema dump extension
- test coverage

To be done: C and C++ APIs that wrap the APIs defined here.

[sc-42489]

---
TYPE: FEATURE
DESC: Add CurrentDomain API support.

---------

Co-authored-by: Luc Rancourt <lucrancourt@gmail.com>
Co-authored-by: KiterLuc <67824247+KiterLuc@users.noreply.github.com>
3 people committed Jun 7, 2024
1 parent 1326ed4 commit 9116d3c
Showing 27 changed files with 1,870 additions and 38 deletions.
2 changes: 1 addition & 1 deletion format_spec/FORMAT_SPEC.md
@@ -4,7 +4,7 @@ title: Format Specification

**Notes:**

-* The current TileDB format version number is **21** (`uint32_t`).
+* The current TileDB format version number is **22** (`uint32_t`).
* Data written by TileDB and referenced in this document is **little-endian**
with the following exceptions:

1 change: 1 addition & 0 deletions format_spec/array_schema.md
@@ -54,6 +54,7 @@ The array schema file consists of a single [generic tile](./generic_tile.md), wi
| Label 1 | [Dimension Label](#dimension_label) | First dimension label |
||||
| Label N | [Dimension Label](#dimension_label) | Nth dimension label |
| CurrentDomain | [CurrentDomain](./current_domain.md) | The array current domain |

## Domain

15 changes: 15 additions & 0 deletions format_spec/current_domain.md
@@ -0,0 +1,15 @@
---
title: Current Domain
---

## Main Structure

The array current domain is stored within the [Array Schema](./array_schema.md#array-schema-file)
file. If a current domain is empty, only the version number and the empty flag are serialized to storage.

| **Field** | **Type** | **Description** |
| :--- | :--- | :--- |
| Version number | `uint32_t` | Current domain version number |
| Empty | `uint8_t` | Whether the current domain has a representation (e.g. NDRectangle) set or not |
| Type | `uint8_t` | The type of current domain stored in this file |
| NDRectangle | [MBR](./fragment.md#mbr) | A hyperrectangle defined using [1DRange](./fragment.md#mbr) items for each dimension |
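
The layout in the table above can be illustrated with a minimal round-trip sketch. It is not the TileDB implementation: `CurrentDomainSketch`, `put`, `get`, `serialize_sketch`, and `deserialize_sketch` are hypothetical names, every dimension is assumed to be `int64_t` with a fixed-size `[low, high]` range, the type value `0` is assumed to mean NDRectangle, and the byte copies match the little-endian format only on a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical in-memory model of a current domain (not a TileDB type).
struct CurrentDomainSketch {
  uint32_t version = 0;
  bool empty = true;
  uint8_t type = 0;  // assumption: 0 stands for the NDRectangle type
  std::vector<std::pair<int64_t, int64_t>> ranges;  // one [low, high] per dimension
};

// Append the host byte representation of v (little-endian hosts only).
template <class T>
void put(std::vector<uint8_t>& out, T v) {
  uint8_t raw[sizeof(T)];
  std::memcpy(raw, &v, sizeof(T));
  out.insert(out.end(), raw, raw + sizeof(T));
}

// Read a T at pos and advance pos; throws if the buffer is too short.
template <class T>
T get(const std::vector<uint8_t>& in, size_t& pos) {
  if (pos + sizeof(T) > in.size()) {
    throw std::runtime_error("truncated current domain");
  }
  T v;
  std::memcpy(&v, in.data() + pos, sizeof(T));
  pos += sizeof(T);
  return v;
}

std::vector<uint8_t> serialize_sketch(const CurrentDomainSketch& cd) {
  std::vector<uint8_t> out;
  put<uint32_t>(out, cd.version);
  put<uint8_t>(out, cd.empty ? 1 : 0);
  if (cd.empty) {
    return out;  // only the version number and the empty flag are stored
  }
  put<uint8_t>(out, cd.type);
  for (const auto& r : cd.ranges) {
    put<int64_t>(out, r.first);
    put<int64_t>(out, r.second);
  }
  return out;
}

CurrentDomainSketch deserialize_sketch(
    const std::vector<uint8_t>& in, size_t dim_num) {
  size_t pos = 0;
  CurrentDomainSketch cd;
  cd.version = get<uint32_t>(in, pos);
  cd.empty = get<uint8_t>(in, pos) != 0;
  if (cd.empty) {
    return cd;  // no type or ranges follow an empty current domain
  }
  cd.type = get<uint8_t>(in, pos);
  for (size_t d = 0; d < dim_num; ++d) {
    int64_t low = get<int64_t>(in, pos);
    int64_t high = get<int64_t>(in, pos);
    cd.ranges.emplace_back(low, high);
  }
  return cd;
}
```

Under these assumptions an empty current domain occupies 5 bytes (4 for the version, 1 for the flag), while a two-dimensional one occupies 4 + 1 + 1 + 2 × 16 = 38 bytes.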
6 changes: 5 additions & 1 deletion test/src/unit-capi-array_schema.cc
@@ -921,7 +921,11 @@ void ArraySchemaFx::load_and_check_array_schema(const std::string& path) {
"- Cell val num: " + CELL_VAL_NUM_STR + "\n" + "- Filters: 2\n" +
" > BZIP2: COMPRESSION_LEVEL=5\n" +
" > BitWidthReduction: BIT_WIDTH_MAX_WINDOW=1000\n" +
-"- Fill value: " + FILL_VALUE_STR + "\n";
+"- Fill value: " + FILL_VALUE_STR + "\n" + "### Current domain ###\n" +
+"- Version: " +
+std::to_string(tiledb::sm::constants::current_domain_version) + "\n" +
+"- Empty: 1" + "\n";

FILE* gold_fout = fopen("gold_fout.txt", "w");
const char* dump = dump_str.c_str();
fwrite(dump, sizeof(char), strlen(dump), gold_fout);
2 changes: 2 additions & 0 deletions tiledb/CMakeLists.txt
@@ -154,6 +154,8 @@ set(TILEDB_CORE_SOURCES
${TILEDB_CORE_INCLUDE_DIR}/tiledb/sm/array_schema/dimension_label.cc
${TILEDB_CORE_INCLUDE_DIR}/tiledb/sm/array_schema/domain.cc
${TILEDB_CORE_INCLUDE_DIR}/tiledb/sm/array_schema/enumeration.cc
${TILEDB_CORE_INCLUDE_DIR}/tiledb/sm/array_schema/ndrectangle.cc
${TILEDB_CORE_INCLUDE_DIR}/tiledb/sm/array_schema/current_domain.cc
${TILEDB_CORE_INCLUDE_DIR}/tiledb/sm/buffer/buffer.cc
${TILEDB_CORE_INCLUDE_DIR}/tiledb/sm/buffer/buffer_list.cc
${TILEDB_CORE_INCLUDE_DIR}/tiledb/sm/c_api/api_argument_validator.cc
7 changes: 7 additions & 0 deletions tiledb/sm/array/array.cc
@@ -40,6 +40,7 @@
#include "tiledb/sm/array_schema/array_schema_evolution.h"
#include "tiledb/sm/array_schema/attribute.h"
#include "tiledb/sm/array_schema/auxiliary.h"
#include "tiledb/sm/array_schema/current_domain.h"
#include "tiledb/sm/array_schema/dimension.h"
#include "tiledb/sm/array_schema/domain.h"
#include "tiledb/sm/crypto/crypto.h"
@@ -264,6 +265,12 @@ Status Array::create(
array_schema->generate_uri();
array_schema->check(resources.config());

// Check current domain is specified correctly if set
if (!array_schema->get_current_domain()->empty()) {
array_schema->get_current_domain()->check_schema_sanity(
array_schema->shared_domain());
}

// Create array directory
throw_if_not_ok(resources.vfs().create_dir(array_uri));

18 changes: 17 additions & 1 deletion tiledb/sm/array_schema/CMakeLists.txt
@@ -65,13 +65,29 @@ commence(object_library enumeration)
this_target_object_libraries(buffer constants seedable_global_PRNG)
conclude(object_library)

#
# `ndrectangle` object library
#
commence(object_library ndrectangle)
this_target_sources(ndrectangle.cc)
this_target_object_libraries(constants domain)
conclude(object_library)

#
# `current_domain` object library
#
commence(object_library current_domain)
this_target_sources(current_domain.cc)
this_target_object_libraries(ndrectangle constants)
conclude(object_library)

#
# `array_schema` object library
#
commence(object_library array_schema)
this_target_sources(array_schema.cc dimension_label.cc)
this_target_object_libraries(
-attribute domain enumeration fragment time uri_format vfs)
+attribute domain enumeration fragment current_domain time uri_format vfs)
conclude(object_library)

add_test_subdirectory()
63 changes: 61 additions & 2 deletions tiledb/sm/array_schema/array_schema.cc
@@ -37,6 +37,7 @@
#include "tiledb/common/logger.h"
#include "tiledb/common/memory_tracker.h"
#include "tiledb/sm/array_schema/attribute.h"
#include "tiledb/sm/array_schema/current_domain.h"
#include "tiledb/sm/array_schema/dimension.h"
#include "tiledb/sm/array_schema/dimension_label.h"
#include "tiledb/sm/array_schema/domain.h"
@@ -104,7 +105,9 @@ ArraySchema::ArraySchema(
memory_tracker_->get_resource(MemoryType::DIMENSION_LABELS))
, enumeration_map_(memory_tracker_->get_resource(MemoryType::ENUMERATION))
, enumeration_path_map_(
-memory_tracker_->get_resource(MemoryType::ENUMERATION_PATHS)) {
+memory_tracker_->get_resource(MemoryType::ENUMERATION_PATHS))
+, current_domain_(make_shared<CurrentDomain>(
+memory_tracker, constants::current_domain_version)) {
// Set up default filter pipelines for coords, offsets, and validity values.
coords_filters_.add_filter(CompressionFilter(
constants::coords_compression,
@@ -141,6 +144,7 @@ ArraySchema::ArraySchema(
FilterPipeline cell_var_offsets_filters,
FilterPipeline cell_validity_filters,
FilterPipeline coords_filters,
shared_ptr<const CurrentDomain> current_domain,
shared_ptr<MemoryTracker> memory_tracker)
: memory_tracker_(memory_tracker)
, uri_(uri)
@@ -165,7 +169,8 @@ ArraySchema::ArraySchema(
memory_tracker_->get_resource(MemoryType::ENUMERATION_PATHS))
, cell_var_offsets_filters_(cell_var_offsets_filters)
, cell_validity_filters_(cell_validity_filters)
-, coords_filters_(coords_filters) {
+, coords_filters_(coords_filters)
+, current_domain_(current_domain) {
for (auto atr : attributes) {
attributes_.push_back(atr);
}
@@ -256,6 +261,7 @@ ArraySchema::ArraySchema(const ArraySchema& array_schema)
, cell_var_offsets_filters_{array_schema.cell_var_offsets_filters_}
, cell_validity_filters_{array_schema.cell_validity_filters_}
, coords_filters_{array_schema.coords_filters_}
, current_domain_(array_schema.current_domain_)
, mtx_{} {
throw_if_not_ok(set_domain(array_schema.domain_));

@@ -737,6 +743,9 @@ void ArraySchema::dump(FILE* out) const {
fprintf(out, "\n");
label->dump(out);
}

// Print out array current domain
current_domain_->dump(out);
}

Status ArraySchema::has_attribute(
@@ -804,6 +813,7 @@ bool ArraySchema::is_nullable(const std::string& name) const {
// dimension_label #1
// dimension_label #2
// ...
// current_domain
void ArraySchema::serialize(Serializer& serializer) const {
// Write version, which is always the current version. Despite
// the in-memory `version_`, we will serialize every array schema
@@ -871,6 +881,9 @@ void ArraySchema::serialize(Serializer& serializer) const {
serializer.write<uint32_t>(enmr_uri_size);
serializer.write(enmr_uri.data(), enmr_uri_size);
}

// Serialize array current domain information
current_domain_->serialize(serializer);
}

Layout ArraySchema::tile_order() const {
@@ -1422,6 +1435,15 @@ shared_ptr<ArraySchema> ArraySchema::deserialize(
}
}

// Load the array current domain. An older array gets an empty current
// domain object by default.
auto current_domain = make_shared<const CurrentDomain>(
memory_tracker, constants::current_domain_version);
if (version >= constants::current_domain_min_format_version) {
current_domain =
CurrentDomain::deserialize(deserializer, memory_tracker, domain);
}

// Validate
if (cell_order == Layout::HILBERT &&
domain->dim_num() > Hilbert::HC_MAX_DIM) {
@@ -1471,6 +1493,7 @@ shared_ptr<ArraySchema> ArraySchema::deserialize(
FilterPipeline(
coords_filters,
version < 5 ? domain->dimension_ptr(0)->type() : Datatype::UINT64),
current_domain,
memory_tracker);
}
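
The version check in `ArraySchema::deserialize` keeps older arrays readable: a schema written before the current domain existed has no trailing current-domain bytes, so it receives an empty object instead of a read. A minimal sketch of that gating pattern follows. All names are hypothetical stand-ins, and the threshold of 22 is an assumption based on the format version this commit bumps to; the actual value of `constants::current_domain_min_format_version` is not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Assumption: the real constant's value is not visible in this diff.
constexpr uint32_t kCurrentDomainMinFormatVersion = 22;

// Hypothetical stand-in for CurrentDomain; default-constructed means empty.
struct CurrentDomainSketch {
  bool empty = true;
};

// Hypothetical stand-in for CurrentDomain::deserialize; pretends the stored
// current domain was non-empty.
std::shared_ptr<const CurrentDomainSketch> read_stored_current_domain() {
  auto cd = std::make_shared<CurrentDomainSketch>();
  cd->empty = false;
  return cd;
}

std::shared_ptr<const CurrentDomainSketch> load_current_domain(
    uint32_t schema_version) {
  // Start from an empty current domain so older schemas still get a valid object.
  std::shared_ptr<const CurrentDomainSketch> cd =
      std::make_shared<const CurrentDomainSketch>();
  // Read stored bytes only when the format version guarantees they exist.
  if (schema_version >= kCurrentDomainMinFormatVersion) {
    cd = read_stored_current_domain();
  }
  return cd;
}
```

Defaulting to an empty object rather than a null pointer means callers such as `dump` and `serialize` can dereference the member without a null check.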

@@ -1791,4 +1814,40 @@ void ArraySchema::generate_uri(
array_uri_.join_path(constants::array_schema_dir_name).join_path(name_);
}

void ArraySchema::expand_current_domain(
shared_ptr<const CurrentDomain> new_current_domain) {
if (new_current_domain == nullptr) {
throw ArraySchemaException(
"The argument specified for current domain expansion is nullptr.");
}

// Check that the new current domain expands the existing one rather than
// shrinking it. Every current domain covers an empty current domain.
if (!current_domain_->empty() &&
!current_domain_->covered(new_current_domain)) {
throw ArraySchemaException(
"The current domain of an array can only be expanded, please adjust "
"your new current domain object.");
}

new_current_domain->check_schema_sanity(this->shared_domain());

current_domain_ = new_current_domain;
}

shared_ptr<const CurrentDomain> ArraySchema::get_current_domain() const {
return current_domain_;
}

void ArraySchema::set_current_domain(
shared_ptr<const CurrentDomain> current_domain) {
if (current_domain == nullptr) {
throw ArraySchemaException(
"The argument specified for setting the current domain on the "
"schema is nullptr.");
}

current_domain_ = current_domain;
}

} // namespace tiledb::sm
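
The expand-only rule enforced by `ArraySchema::expand_current_domain` can be sketched with plain integer ranges. This is an illustration under stated assumptions, not the TileDB code: `Rect`, `covered_by`, and `SchemaSketch` are hypothetical, an empty vector stands for an empty current domain, all dimensions are `int64_t`, and the `check_schema_sanity` step against the array domain is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for an NDRectangle: one [low, high] pair per
// dimension. An empty vector plays the role of an empty current domain.
using Rect = std::vector<std::pair<int64_t, int64_t>>;

// True when every range of `inner` lies within the matching range of `outer`.
bool covered_by(const Rect& inner, const Rect& outer) {
  if (inner.size() != outer.size()) {
    return false;
  }
  for (size_t d = 0; d < inner.size(); ++d) {
    if (outer[d].first > inner[d].first ||
        outer[d].second < inner[d].second) {
      return false;
    }
  }
  return true;
}

struct SchemaSketch {
  Rect current_domain;  // starts empty

  void expand_current_domain(const Rect& next) {
    // An empty current domain is covered by any new one, so skip the check.
    if (!current_domain.empty() && !covered_by(current_domain, next)) {
      throw std::invalid_argument(
          "The current domain of an array can only be expanded");
    }
    current_domain = next;
  }
};
```

Growing `[0, 9]` to `[0, 19]` succeeds, while a later request for `[5, 19]` throws and leaves the stored rectangle unchanged, because the check runs before the assignment.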
24 changes: 24 additions & 0 deletions tiledb/sm/array_schema/array_schema.h
@@ -56,6 +56,7 @@ class DimensionLabel;
class Domain;
class Enumeration;
class MemoryTracker;
class CurrentDomain;

enum class ArrayType : uint8_t;
enum class Compressor : uint8_t;
@@ -120,6 +121,7 @@ class ArraySchema {
* @param cell_validity_filters
* The filter pipeline run on validity tiles for nullable attributes.
* @param coords_filters The filter pipeline run on coordinate tiles.
* @param current_domain The array current domain object
* @param memory_tracker The memory tracker of the array this fragment
* metadata corresponds to.
**/
@@ -141,6 +143,7 @@ class ArraySchema {
FilterPipeline cell_var_offsets_filters,
FilterPipeline cell_validity_filters,
FilterPipeline coords_filters,
shared_ptr<const CurrentDomain> current_domain,
shared_ptr<MemoryTracker> memory_tracker);

/**
@@ -585,6 +588,24 @@ class ArraySchema {
std::optional<std::pair<uint64_t, uint64_t>> timestamp_range =
std::nullopt);

/**
* Expand the array current domain
*
* @param new_current_domain The new array current domain we want to expand to
*/
void expand_current_domain(
shared_ptr<const CurrentDomain> new_current_domain);

/**
* Set the array current domain on the schema
*
* @param current_domain The array current domain we want to set on the schema
*/
void set_current_domain(shared_ptr<const CurrentDomain> current_domain);

/** Array current domain accessor */
shared_ptr<const CurrentDomain> get_current_domain() const;

private:
/* ********************************* */
/* PRIVATE ATTRIBUTES */
@@ -700,6 +721,9 @@ class ArraySchema {
/** The filter pipeline run on coordinate tiles. */
FilterPipeline coords_filters_;

/** The array current domain */
shared_ptr<const CurrentDomain> current_domain_;

/** Mutex for thread-safety. */
mutable std::mutex mtx_;

35 changes: 34 additions & 1 deletion tiledb/sm/array_schema/array_schema_evolution.cc
@@ -39,6 +39,7 @@
#include "tiledb/common/status.h"
#include "tiledb/sm/array_schema/array_schema.h"
#include "tiledb/sm/array_schema/attribute.h"
#include "tiledb/sm/array_schema/current_domain.h"
#include "tiledb/sm/array_schema/dimension.h"
#include "tiledb/sm/array_schema/domain.h"
#include "tiledb/sm/array_schema/enumeration.h"
@@ -91,6 +92,7 @@ ArraySchemaEvolution::ArraySchemaEvolution(
enmrs_to_extend,
std::unordered_set<std::string> enmrs_to_drop,
std::pair<uint64_t, uint64_t> timestamp_range,
shared_ptr<const CurrentDomain> current_domain,
shared_ptr<MemoryTracker> memory_tracker)
: memory_tracker_(memory_tracker)
, attributes_to_add_map_(
@@ -101,7 +103,8 @@ ArraySchemaEvolution::ArraySchemaEvolution(
, enumerations_to_extend_map_(
memory_tracker_->get_resource(MemoryType::ENUMERATION))
, enumerations_to_drop_(enmrs_to_drop)
-, timestamp_range_(timestamp_range) {
+, timestamp_range_(timestamp_range)
+, current_domain_to_expand_(current_domain) {
for (auto& elem : attrs_to_add) {
attributes_to_add_map_.insert(elem);
}
@@ -175,6 +178,11 @@ shared_ptr<ArraySchema> ArraySchemaEvolution::evolve_schema(
schema->generate_uri();
}

// Get expanded current domain
if (current_domain_to_expand_) {
schema->expand_current_domain(current_domain_to_expand_);
}

return schema;
}

@@ -370,6 +378,30 @@ std::pair<uint64_t, uint64_t> ArraySchemaEvolution::timestamp_range() const {
timestamp_range_.first, timestamp_range_.second);
}

void ArraySchemaEvolution::expand_current_domain(
shared_ptr<const CurrentDomain> current_domain) {
if (current_domain == nullptr) {
throw ArraySchemaEvolutionException(
"Cannot expand the array current domain; Input current domain is null");
}

if (current_domain->empty()) {
throw ArraySchemaEvolutionException(
"Unable to expand the array current domain, the new current domain "
"specified is empty");
}

std::lock_guard<std::mutex> lock(mtx_);
current_domain_to_expand_ = current_domain;
}

shared_ptr<const CurrentDomain> ArraySchemaEvolution::current_domain_to_expand()
const {
std::lock_guard<std::mutex> lock(mtx_);

return current_domain_to_expand_;
}

/* ****************************** */
/* PRIVATE METHODS */
/* ****************************** */
@@ -380,6 +412,7 @@ void ArraySchemaEvolution::clear() {
enumerations_to_add_map_.clear();
enumerations_to_drop_.clear();
timestamp_range_ = {0, 0};
current_domain_to_expand_ = nullptr;
}

} // namespace tiledb::sm
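
The evolution changes in this file follow a stage-then-apply pattern: `expand_current_domain` validates and stores the requested domain under a lock, `evolve_schema` applies it only when one was staged, and `clear` resets it. A minimal sketch of that flow is below. `DomainSketch` and `EvolutionSketch` are hypothetical, and the sketch returns the staged domain directly instead of calling `ArraySchema::expand_current_domain`, so the covered check and schema sanity check are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <utility>

// Hypothetical one-dimensional stand-in for CurrentDomain: [0, upper].
struct DomainSketch {
  bool empty;
  int64_t upper;
};

class EvolutionSketch {
 public:
  // Stage a new current domain; null and empty inputs are rejected up front.
  void expand_current_domain(std::shared_ptr<const DomainSketch> cd) {
    if (cd == nullptr) {
      throw std::invalid_argument("Input current domain is null");
    }
    if (cd->empty) {
      throw std::invalid_argument("The new current domain is empty");
    }
    std::lock_guard<std::mutex> lock(mtx_);
    staged_ = std::move(cd);
  }

  // Returns the domain the evolved schema would carry: the staged one if
  // present, otherwise the existing one unchanged.
  std::shared_ptr<const DomainSketch> evolve(
      std::shared_ptr<const DomainSketch> existing) const {
    std::lock_guard<std::mutex> lock(mtx_);
    if (staged_) {
      return staged_;
    }
    return existing;
  }

  // Drop any staged domain, mirroring the reset to nullptr in clear().
  void clear() {
    std::lock_guard<std::mutex> lock(mtx_);
    staged_ = nullptr;
  }

 private:
  mutable std::mutex mtx_;
  std::shared_ptr<const DomainSketch> staged_;
};
```

Rejecting an empty domain at staging time keeps the "nothing staged" state (a null pointer) distinct from "expand to empty", which would otherwise be ambiguous when the evolution is applied.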