rgw/s3vector: implement the VectorBucket API#66327
rgw/s3vector: implement the VectorBucket API#66327yuvalif wants to merge 3 commits intoceph:wip-s3vectorfrom
Conversation
according to: https://docs.aws.amazon.com/AmazonS3/latest/API/API_Operations_Amazon_S3_Vectors.html Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
do some fixes to message validation based on the tesst Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
726f536 to
7e88e92
Compare
TODO:
that's all that metadata sync should require, other than these mdlog entries |
done here: 340d931 |
src/rgw/driver/rados/rgw_bucket.cc
Outdated
| class RGWVectorBucketInstanceMetadataHandler : public RGWBucketInstanceMetadataHandler { | ||
| protected: | ||
| int put_prepare(const DoutPrefixProvider* dpp, optional_yield y, | ||
| const std::string& entry, RGWBucketCompleteInfo& bci, | ||
| const std::optional<RGWBucketCompleteInfo>& old_bci, | ||
| const RGWObjVersionTracker& objv_tracker, | ||
| bool from_remote_zone) override { return 0;} |
There was a problem hiding this comment.
for multisite, RGWBucketInstanceMetadataHandler::put_prepare() is what calls init_default_bucket_layout() to initialize the local bucket layout
since you're overriding put_prepare(), you can just force indexless here:
// vector buckets are indexless
bci.info.layout.current_index.layout.type = rgw::BucketIndexType::Indexless;
return 0;| rgw::sal::VectorBucket::CreateParams createparams; | ||
| createparams.owner = s->user->get_id(); | ||
| createparams.zonegroup_id = zonegroup.id; |
There was a problem hiding this comment.
then on creation, you can request indexless with:
// vector buckets are indexless
createparams.index_type = rgw::BucketIndexType::Indexless;|
jenkins test docs |
should be merged after ceph#66327 Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
should be merged after ceph#66327 Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
not implemented in this commit: * VectorBucket APIs (done in ceph#66327) * caching information as VectorBucket attributes and fetching them gtom there (after the VectoBucket PR is merged): schema, distance type * VectorBucket policy APIs * metadata support Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
| const CreateParams& params, | ||
| optional_yield y) = 0; | ||
|
|
||
| /** Get the cached attributes associated with this vector bucket */ |
There was a problem hiding this comment.
Could we avoid using optional_yield since this is a new interface? Since we're trying to do async between rust and C++, I'd really rather avoid having the back-end have to deal with the possibility of blocking, and instead have that handled at the caller. (If the functions we call use take optional_yield they should be callable with the straight yield_context?
There was a problem hiding this comment.
the vector bucket creation does not involve calling into lancedb, this is veriy similar to the flow of S3 bucket creation.
since optional_yield is used there, and since I'm using the same (or similar) calls, I think that we are "stuck" with it for now...
src/rgw/rgw_sal.h
Outdated
| /** Get the cached placement rule of this vector bucket */ | ||
| virtual rgw_placement_rule& get_placement_rule() = 0; | ||
| /** Get the cached creation time of this vector bucket */ | ||
| virtual ceph::real_time& get_creation_time() = 0; |
There was a problem hiding this comment.
Returning a non-const reference here doesn't seem right. SInce it's a 64-bit integer, could we just return it by value?
There was a problem hiding this comment.
i copied that from sal::Bucket, but i completly agree that it should be by value
src/rgw/rgw_sal.h
Outdated
| /** Get the cached ID of this vector bucket */ | ||
| virtual const std::string& get_bucket_id() const = 0; | ||
| /** Get the cached placement rule of this vector bucket */ | ||
| virtual rgw_placement_rule& get_placement_rule() = 0; |
There was a problem hiding this comment.
This being a mutable reference doesn't seem right?
src/rgw/rgw_sal.h
Outdated
| /** Get the key for this vector bucket */ | ||
| virtual rgw_bucket& get_key() = 0; | ||
| /** Get the info for this vector bucket */ | ||
| virtual RGWBucketInfo& get_info() = 0; |
src/rgw/driver/rados/rgw_rados.cc
Outdated
| ldpp_dout(dpp, 20) << "s3vector --- RGWRados::create_vector_bucke called" << dendl; | ||
| int ret = 0; | ||
|
|
||
| #define MAX_CREATE_RETRIES 20 /* need to bound retries */ |
There was a problem hiding this comment.
Please don't do this. Use a constexpr.
src/rgw/driver/rados/rgw_rados.cc
Outdated
|
|
||
| int RGWRados::get_raw_obj_ref(const DoutPrefixProvider *dpp, rgw_raw_obj obj, rgw_rados_ref* ref) | ||
| { | ||
| ldpp_dout(dpp, 1) << "INFO: s3vector -- called RGWRados::get_raw_obj_ref()" << dendl; |
There was a problem hiding this comment.
This prints s3vector whether s3vector is involved or not?
There was a problem hiding this comment.
will remove all of the debug log used to help with the call flow
src/rgw/driver/rados/rgw_rados.cc
Outdated
| RGWBucketEntryPoint ep; | ||
| r = ctl.vector_bucket->read_bucket_entrypoint_info(bucket_info.bucket, | ||
| &ep, | ||
| null_yield, |
There was a problem hiding this comment.
Why are we passing null_yield here?
There was a problem hiding this comment.
copied from the s3 bucket code. will fix here, but i'm not sure why null_yield is used there? @cbodley ?
src/rgw/rgw_sal.h
Outdated
| /** Store the cached bucket info into the backing store */ | ||
| //virtual int put_info(const DoutPrefixProvider* dpp, bool exclusive, ceph::real_time mtime, optional_yield y) = 0; |
There was a problem hiding this comment.
you'll probably need this to implement things like PutVectorBucketPolicy to update existing bucket instance metadata objects
There was a problem hiding this comment.
will add that. though the bucket policy part will be done in subsequent work
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
e69ecd4 to
7e36888
Compare
|
cherry-picked into #66066 |
VectorBucketsandBucketsare with the prefixes of the RADOS object names. However, to make the code change less intrusive, I mainly used copy&paste, and inheritence to implement thatbucket_sobjrefactoringRGWBucketCtlrefactoringsal::VectorBucketrefactoringRGWRadosrefactoringVectorBucketsCurrently the tests that verify combination of buckets and vector buckets are failing. if a bucket is created with the same name as a vector bucket, the bucket creation fails (and vice versa)Checklist
Show available Jenkins commands
jenkins test classic perfJenkins Job | Jenkins Job Definitionjenkins test crimson perfJenkins Job | Jenkins Job Definitionjenkins test signedJenkins Job | Jenkins Job Definitionjenkins test make checkJenkins Job | Jenkins Job Definitionjenkins test make check arm64Jenkins Job | Jenkins Job Definitionjenkins test submodulesJenkins Job | Jenkins Job Definitionjenkins test dashboardJenkins Job | Jenkins Job Definitionjenkins test dashboard cephadmJenkins Job | Jenkins Job Definitionjenkins test apiJenkins Job | Jenkins Job Definitionjenkins test docsReadTheDocs | Github Workflow Definitionjenkins test ceph-volume allJenkins Jobs | Jenkins Jobs Definitionjenkins test windowsJenkins Job | Jenkins Job Definitionjenkins test rook e2eJenkins Job | Jenkins Job DefinitionYou must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.