Background
Both RecordBatch/Schema and Field can have metadata. In both cases it is encoded as a HashMap<String, String>.
One downside is that cloning the metadata is slow: it requires a deep clone and many allocations. This contrasts with almost everything else in arrow-rs, which uses an Arc for fast cloning.
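To make the cost difference concrete, here is a minimal sketch using only the standard library. It does not use arrow-rs types, and the helper names (`sample_metadata`, `deep_clone`, `shared_clone`) are hypothetical, chosen for illustration only.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical helper: builds a small map shaped like field metadata.
fn sample_metadata() -> HashMap<String, String> {
    HashMap::from([
        ("unit".to_string(), "ms".to_string()),
        ("origin".to_string(), "sensor-a".to_string()),
    ])
}

/// Deep clone: allocates a new table plus a new `String` for every key and value.
fn deep_clone(meta: &HashMap<String, String>) -> HashMap<String, String> {
    meta.clone()
}

/// Shared clone: only increments the reference count; no map or string is copied.
fn shared_clone(meta: &Arc<HashMap<String, String>>) -> Arc<HashMap<String, String>> {
    Arc::clone(meta)
}
```

With the plain map, every clone owns separate string buffers; with the `Arc`, both handles point at the same allocation.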
Proposal outline
#[derive(Clone, Default, …)]
pub struct Metadata(
    // Use `Option` to avoid allocation in case of empty metadata
    Option<Arc<BTreeMap<String, String>>>,
);
impl Metadata {
    pub fn get(&self, key: &str) -> Option<&String> { … }
    /// Does deep clone if (and only if) this `Metadata` is shared
    pub fn insert(&mut self, key: impl Into<String>, value: impl Into<String>) {
        Arc::make_mut(self.0.get_or_insert_default()).insert(key.into(), value.into());
    }
    …
}
impl Index<…> for Metadata …
impl From<HashMap<String, String>> for Metadata { … }
impl From<BTreeMap<String, String>> for Metadata { … }
// `From` in this direction also provides the matching `Into` impl
impl From<Metadata> for HashMap<String, String> { … }
impl From<Metadata> for BTreeMap<String, String> { … }
impl IntoIterator, FromIterator, …
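The outline above elides most method bodies. The following is one possible way to fill in a few of them, as a sketch rather than the proposed implementation: the bodies of `get` and `iter`, the extra derives, and the `ptr_eq` helper are assumptions added for illustration. It uses `get_or_insert_with` instead of `get_or_insert_default` only so that it also compiles on older toolchains.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct Metadata(Option<Arc<BTreeMap<String, String>>>);

impl Metadata {
    /// Assumed body: empty metadata (`None`) simply has no keys.
    pub fn get(&self, key: &str) -> Option<&String> {
        self.0.as_ref()?.get(key)
    }

    /// Copy-on-write insert: `Arc::make_mut` clones the map only if it is shared.
    pub fn insert(&mut self, key: impl Into<String>, value: impl Into<String>) {
        let map = self.0.get_or_insert_with(Default::default);
        Arc::make_mut(map).insert(key.into(), value.into());
    }

    /// Assumed helper: iterates entries in sorted key order (a `BTreeMap` property).
    pub fn iter(&self) -> impl Iterator<Item = (&String, &String)> {
        self.0.iter().flat_map(|map| map.iter())
    }

    /// Hypothetical helper for demos: true if both values share one allocation.
    pub fn ptr_eq(&self, other: &Self) -> bool {
        match (&self.0, &other.0) {
            (Some(a), Some(b)) => Arc::ptr_eq(a, b),
            (None, None) => true,
            _ => false,
        }
    }
}
```

In this sketch a clone shares storage with the original until one side is mutated; the first `insert` on a shared value pays for the deep clone, and later inserts on the now-unique value do not.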
PRO/CON vs status quo (HashMap<String, String>)
- PRO: Fast cloning of the whole Metadata
- PRO: Deterministic iteration order (thanks to BTreeMap) - good for IPC/FFI encoding, test stability, hashing, …
- NEUTRAL: Can still add/remove Metadata fields without extra cost
- CON: New type; more complexity
Alternatives
Instead of storing String, we could store Arc<str>. That would make it efficient to share the same keys across many metadata tables.
The downside is added complexity.
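As a rough illustration of this alternative, the sketch below stores `Arc<str>` for both keys and values. The `SharedMap` alias and the `with_shared_key` helper are hypothetical and not part of the proposal; they only show that one key allocation can back entries in several maps.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Assumed shape of the alternative: reference-counted keys and values.
type SharedMap = BTreeMap<Arc<str>, Arc<str>>;

/// Hypothetical helper: builds a map that reuses an already allocated key.
fn with_shared_key(key: &Arc<str>, value: &str) -> SharedMap {
    let mut map = SharedMap::new();
    // Cloning the `Arc<str>` bumps a reference count instead of copying the text.
    map.insert(Arc::clone(key), Arc::from(value));
    map
}
```

Lookups by `&str` still work because `Arc<str>` implements `Borrow<str>`, so callers would not need to construct an `Arc` just to call `get`.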