Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add user defined metadata #5915

Merged
merged 10 commits into from
Jul 2, 2024
17 changes: 13 additions & 4 deletions object_store/src/attributes.rs
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,10 @@ pub enum Attribute {
///
/// See [Cache-Control](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control)
CacheControl,
/// Specifies a user-defined metadata field for the object
///
/// The String is a user-defined key
Metadata(String),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's likely a silliness that will make no measurable performance difference, but given these will often be static strings, I wonder what you think of making this Cow<'static, str>

}

/// The value of an [`Attribute`]
Expand Down Expand Up @@ -194,10 +198,11 @@ mod tests {
(Attribute::ContentLanguage, "en-US"),
(Attribute::ContentType, "test"),
(Attribute::CacheControl, "control"),
(Attribute::Metadata("key1".to_string()), "value1"),
]);

assert!(!attributes.is_empty());
assert_eq!(attributes.len(), 5);
assert_eq!(attributes.len(), 6);

assert_eq!(
attributes.get(&Attribute::ContentType),
Expand All @@ -210,18 +215,18 @@ mod tests {
attributes.insert(Attribute::CacheControl, "v1".into()),
Some(metav)
);
assert_eq!(attributes.len(), 5);
assert_eq!(attributes.len(), 6);

assert_eq!(
attributes.remove(&Attribute::CacheControl).unwrap(),
"v1".into()
);
assert_eq!(attributes.len(), 4);
assert_eq!(attributes.len(), 5);

let metav: AttributeValue = "v2".into();
attributes.insert(Attribute::CacheControl, metav.clone());
assert_eq!(attributes.get(&Attribute::CacheControl), Some(&metav));
assert_eq!(attributes.len(), 5);
assert_eq!(attributes.len(), 6);

assert_eq!(
attributes.get(&Attribute::ContentDisposition),
Expand All @@ -235,5 +240,9 @@ mod tests {
attributes.get(&Attribute::ContentLanguage),
Some(&"en-US".into())
);
assert_eq!(
attributes.get(&Attribute::Metadata("key1".to_string())),
Some(&"value1".into())
);
}
}
6 changes: 6 additions & 0 deletions object_store/src/aws/client.rs
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,7 @@ use std::sync::Arc;

const VERSION_HEADER: &str = "x-amz-version-id";
const SHA256_CHECKSUM: &str = "x-amz-checksum-sha256";
const USER_DEFINED_METADATA_HEADER_PREFIX: &str = "x-amz-meta-";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe METADATA_HEADER_PREFIX is clear enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to rename to whatever. I chose to be quite verbose because I feel there's a lot of confusion about "metadata". You can see this play out in #4754, where the conversation muddles user-defined metadata and tags. Add in standard (non-user-defined) metadata and it gets even more murky.

Y'all let me know what to rename to, though. :)


/// A specialized `Error` for object store-related errors
#[derive(Debug, Snafu)]
Expand Down Expand Up @@ -326,6 +327,10 @@ impl<'a> Request<'a> {
has_content_type = true;
builder.header(CONTENT_TYPE, v.as_ref())
}
Attribute::Metadata(k_suffix) => builder.header(
&format!("{}{}", USER_DEFINED_METADATA_HEADER_PREFIX, k_suffix),
criccomini marked this conversation as resolved.
Show resolved Hide resolved
v.as_ref(),
),
};
}

Expand Down Expand Up @@ -642,6 +647,7 @@ impl GetClient for S3Client {
etag_required: false,
last_modified_required: false,
version_header: Some(VERSION_HEADER),
user_defined_metadata_prefix: Some(USER_DEFINED_METADATA_HEADER_PREFIX),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: how about using metadata_header_prefix?

};

/// Make an S3 GET request <https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html>
Expand Down
6 changes: 6 additions & 0 deletions object_store/src/azure/client.rs
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ use std::time::Duration;
use url::Url;

const VERSION_HEADER: &str = "x-ms-version-id";
const USER_DEFINED_METADATA_HEADER_PREFIX: &str = "x-ms-meta-";
static MS_CACHE_CONTROL: HeaderName = HeaderName::from_static("x-ms-blob-cache-control");
static MS_CONTENT_TYPE: HeaderName = HeaderName::from_static("x-ms-blob-content-type");
static MS_CONTENT_DISPOSITION: HeaderName =
Expand Down Expand Up @@ -208,6 +209,10 @@ impl<'a> PutRequest<'a> {
has_content_type = true;
builder.header(&MS_CONTENT_TYPE, v.as_ref())
}
Attribute::Metadata(k_suffix) => builder.header(
&format!("{}{}", USER_DEFINED_METADATA_HEADER_PREFIX, k_suffix),
v.as_ref(),
),
};
}

Expand Down Expand Up @@ -499,6 +504,7 @@ impl GetClient for AzureClient {
etag_required: true,
last_modified_required: true,
version_header: Some(VERSION_HEADER),
user_defined_metadata_prefix: Some(USER_DEFINED_METADATA_HEADER_PREFIX),
};

/// Make an Azure GET request
Expand Down
54 changes: 51 additions & 3 deletions object_store/src/client/get.rs
Original file line number Diff line number Diff line change
Expand Up @@ -221,6 +221,23 @@ fn get_result<T: GetClient>(
)
);

// Add attributes that match the user-defined metadata prefix (e.g. x-amz-meta-)
let attributes = if let Some(prefix) = T::HEADER_CONFIG.user_defined_metadata_prefix {
let mut attributes = attributes.clone();
criccomini marked this conversation as resolved.
Show resolved Hide resolved
for (key, val) in response.headers() {
criccomini marked this conversation as resolved.
Show resolved Hide resolved
if key.as_str().starts_with(prefix) {
let suffix = key.as_str()[prefix.len()..].to_string();
attributes.insert(
Attribute::Metadata(suffix),
val.to_str().unwrap().to_string().into(),
);
}
}
attributes
} else {
attributes
};

let stream = response
.bytes_stream()
.map_err(|source| crate::Error::Generic {
Expand Down Expand Up @@ -253,6 +270,7 @@ mod tests {
etag_required: false,
last_modified_required: false,
version_header: None,
user_defined_metadata_prefix: Some("x-test-meta-"),
};

async fn get_request(&self, _: &Path, _: GetOptions) -> Result<Response> {
Expand All @@ -265,6 +283,7 @@ mod tests {
range: Option<Range<usize>>,
status: StatusCode,
content_range: Option<&str>,
headers: Option<Vec<(&str, &str)>>,
) -> Response {
let mut builder = http::Response::builder();
if let Some(range) = content_range {
Expand All @@ -276,6 +295,12 @@ mod tests {
None => vec![0_u8; object_size],
};

if let Some(headers) = headers {
for (key, value) in headers {
builder = builder.header(key, value);
}
}

builder
.status(status)
.header(CONTENT_LENGTH, object_size)
Expand All @@ -288,7 +313,7 @@ mod tests {
async fn test_get_result() {
let path = Path::from("test");

let resp = make_response(12, None, StatusCode::OK, None);
let resp = make_response(12, None, StatusCode::OK, None, None);
let res = get_result::<TestClient>(&path, None, resp).unwrap();
assert_eq!(res.meta.size, 12);
assert_eq!(res.range, 0..12);
Expand All @@ -302,14 +327,15 @@ mod tests {
Some(2..3),
StatusCode::PARTIAL_CONTENT,
Some("bytes 2-2/12"),
None,
);
let res = get_result::<TestClient>(&path, Some(get_range.clone()), resp).unwrap();
assert_eq!(res.meta.size, 12);
assert_eq!(res.range, 2..3);
let bytes = res.bytes().await.unwrap();
assert_eq!(bytes.len(), 1);

let resp = make_response(12, Some(2..3), StatusCode::OK, None);
let resp = make_response(12, Some(2..3), StatusCode::OK, None, None);
let err = get_result::<TestClient>(&path, Some(get_range.clone()), resp).unwrap_err();
assert_eq!(
err.to_string(),
Expand All @@ -321,6 +347,7 @@ mod tests {
Some(2..3),
StatusCode::PARTIAL_CONTENT,
Some("bytes 2-3/12"),
None,
);
let err = get_result::<TestClient>(&path, Some(get_range.clone()), resp).unwrap_err();
assert_eq!(err.to_string(), "Requested 2..3, got 2..4");
Expand All @@ -330,14 +357,15 @@ mod tests {
Some(2..3),
StatusCode::PARTIAL_CONTENT,
Some("bytes 2-2/*"),
None,
);
let err = get_result::<TestClient>(&path, Some(get_range.clone()), resp).unwrap_err();
assert_eq!(
err.to_string(),
"Failed to parse value for CONTENT_RANGE header: \"bytes 2-2/*\""
);

let resp = make_response(12, Some(2..3), StatusCode::PARTIAL_CONTENT, None);
let resp = make_response(12, Some(2..3), StatusCode::PARTIAL_CONTENT, None, None);
let err = get_result::<TestClient>(&path, Some(get_range.clone()), resp).unwrap_err();
assert_eq!(
err.to_string(),
Expand All @@ -349,6 +377,7 @@ mod tests {
Some(2..3),
StatusCode::PARTIAL_CONTENT,
Some("bytes 2-3/2"),
None,
);
let err = get_result::<TestClient>(&path, Some(get_range.clone()), resp).unwrap_err();
assert_eq!(
Expand All @@ -361,6 +390,7 @@ mod tests {
Some(2..6),
StatusCode::PARTIAL_CONTENT,
Some("bytes 2-5/6"),
None,
);
let res = get_result::<TestClient>(&path, Some(GetRange::Suffix(4)), resp).unwrap();
assert_eq!(res.meta.size, 6);
Expand All @@ -373,8 +403,26 @@ mod tests {
Some(2..6),
StatusCode::PARTIAL_CONTENT,
Some("bytes 2-3/6"),
None,
);
let err = get_result::<TestClient>(&path, Some(GetRange::Suffix(4)), resp).unwrap_err();
assert_eq!(err.to_string(), "Requested 2..6, got 2..4");

let resp = make_response(
12,
None,
StatusCode::OK,
None,
Some(vec![("x-test-meta-foo", "bar")]),
);
let res = get_result::<TestClient>(&path, None, resp).unwrap();
assert_eq!(res.meta.size, 12);
assert_eq!(res.range, 0..12);
assert_eq!(
res.attributes.get(&Attribute::Metadata("foo".to_string())),
Some(&"bar".into())
);
let bytes = res.bytes().await.unwrap();
assert_eq!(bytes.len(), 12);
}
}
4 changes: 4 additions & 0 deletions object_store/src/client/header.rs
Original file line number Diff line number Diff line change
Expand Up @@ -31,13 +31,17 @@ pub struct HeaderConfig {
///
/// Defaults to `true`
pub etag_required: bool,

/// Whether to require a Last-Modified header when extracting [`ObjectMeta`] from headers.
///
/// Defaults to `true`
pub last_modified_required: bool,

/// The version header name if any
pub version_header: Option<&'static str>,

/// The user defined metadata prefix if any
pub user_defined_metadata_prefix: Option<&'static str>,
}

#[derive(Debug, Snafu)]
Expand Down
6 changes: 6 additions & 0 deletions object_store/src/gcp/client.rs
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@ use std::sync::Arc;

const VERSION_HEADER: &str = "x-goog-generation";
const DEFAULT_CONTENT_TYPE: &str = "application/octet-stream";
const USER_DEFINED_METADATA_HEADER_PREFIX: &str = "x-goog-meta-";

static VERSION_MATCH: HeaderName = HeaderName::from_static("x-goog-if-generation-match");

Expand Down Expand Up @@ -199,6 +200,10 @@ impl<'a> Request<'a> {
has_content_type = true;
builder.header(CONTENT_TYPE, v.as_ref())
}
Attribute::Metadata(k_suffix) => builder.header(
&format!("{}{}", USER_DEFINED_METADATA_HEADER_PREFIX, k_suffix),
v.as_ref(),
),
};
}

Expand Down Expand Up @@ -567,6 +572,7 @@ impl GetClient for GoogleCloudStorageClient {
etag_required: true,
last_modified_required: true,
version_header: Some(VERSION_HEADER),
user_defined_metadata_prefix: Some(USER_DEFINED_METADATA_HEADER_PREFIX),
};

/// Perform a get request <https://cloud.google.com/storage/docs/xml-api/get-object-download>
Expand Down
3 changes: 3 additions & 0 deletions object_store/src/http/client.rs
Original file line number Diff line number Diff line change
Expand Up @@ -183,6 +183,8 @@ impl Client {
has_content_type = true;
builder.header(CONTENT_TYPE, v.as_ref())
}
// Ignore metadata attributes
Attribute::Metadata(_) => builder,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WebDAV handles custom metadata in prop, which differs from other object storage services. We can create an issue to track this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

};
}

Expand Down Expand Up @@ -319,6 +321,7 @@ impl GetClient for Client {
etag_required: false,
last_modified_required: false,
version_header: None,
user_defined_metadata_prefix: None,
};

async fn get_request(&self, path: &Path, options: GetOptions) -> Result<Response> {
Expand Down
1 change: 1 addition & 0 deletions object_store/src/integration.rs
Original file line number Diff line number Diff line change
Expand Up @@ -480,6 +480,7 @@ pub async fn put_get_attributes(integration: &dyn ObjectStore) {
(Attribute::ContentEncoding, "gzip"),
(Attribute::ContentLanguage, "en-US"),
(Attribute::ContentType, "text/html; charset=utf-8"),
(Attribute::Metadata("test_key".to_string()), "test_value"),
]);

let path = Path::from("attributes");
Expand Down
Loading