Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RGW - Add POSIX Driver #52933

Merged
merged 1 commit into from
Sep 15, 2023
Merged

RGW - Add POSIX Driver #52933

merged 1 commit into from
Sep 15, 2023

Conversation

dang
Copy link
Contributor

@dang dang commented Aug 10, 2023

This is the MVP for a driver for RGW that operates on top of a POSIX filesystem. It supports get, put, list, copy, multipart, external access via the filesystem itself, and ordered bucket listings via an LRU-based cache.

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

Copy link
Contributor

@cbodley cbodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a lot to review, but looks wonderful overall

src/rgw/driver/posix/lmdb-safe.cc Show resolved Hide resolved
src/rgw/CMakeLists.txt Outdated Show resolved Hide resolved
Comment on lines +154 to +160
#if 1
// depends on safe_link
if (! name_hook.is_linked()) {
// this should not happen!
abort();
}
#endif
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the #if seems unnecessary. can we replace that with an assertion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, what I was doing here was a) preferring to use normal_link; but b) have this to print info

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's an #if iow because I'd like this normally to have normal_link; I can templatize this or something later. I don't want asserts or ceph_asserts in the path. if we're going to change this now, make it normal_link in the decl and just remove this block

src/rgw/rgw_sal_filter.h Outdated Show resolved Hide resolved
src/rgw/rgw_op.cc Outdated Show resolved Hide resolved
Comment on lines +277 to +273
int get_shadow_bucket(const DoutPrefixProvider* dpp, optional_yield y,
const std::string& ns, const std::string& tenant,
const std::string& name, bool create,
std::unique_ptr<POSIXBucket>* shadow);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i take it that 'shadow bucket' is a subdirectory that stores the parts for a given multipart upload? sal::Bucket seems like a very wide interface to represent this. i think we should avoid using the sal interfaces to represent things that aren't s3 buckets or objects. what would a minimal 'shadow directory' interface need, if we created an internal class for it instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it this way to avoid code duplication.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

understood, but you can share code without sharing the interface. i'm just picturing a class POSIXBucket and class MultipartDirectory that both call helper functions that operate on file descriptors and paths

with this separation, it would be much easier to make changes or bugfixes to one without introducing weird regressions in the other

src/rgw/driver/posix/rgw_sal_posix.cc Outdated Show resolved Hide resolved
src/rgw/driver/posix/rgw_sal_posix.h Outdated Show resolved Hide resolved
src/rgw/driver/posix/rgw_sal_posix.h Outdated Show resolved Hide resolved
src/rgw/driver/posix/rgw_sal_posix.cc Outdated Show resolved Hide resolved
@@ -3618,6 +3618,7 @@ options:
- none
- base
- d4n
- posix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

part of rgw_config_store?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the question? In the future (work Ali is working on) all this will be replaced with JSON stored in the zonegroup, but for now it has to live in ceph.conf.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think it's necessary to have a posix-file backend for rgw::sal::ConfigStore. rgw_config_store=dbstore already works on a local filesystem

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is "JSON stored in the zonegroup" equivalent to creating a posix-file backend for rgw::sal::ConfigStore? I don't follow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is "JSON stored in the zonegroup" equivalent to creating a posix-file backend for rgw::sal::ConfigStore? I don't follow

No, it describes the driver stack, so it's a replacement for rgw_backend_store and rgw_filter currently in ceph.conf. The actual zonegroup is stored in the ConfigStore, so whichever ConfigStore you use will store the JSON that describes the driver stack. I agree with Casey, we'll probably use DBStore's ConfigStore for POSIXDriver.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

part of rgw_config_store?

Ah, was the original question asking if this should be in rgw_config_store instead of rgw_filter? If so, then yes, eventually. Currently, POSIXDriver is a filter, since it doesn't implement any User APIs, but instead depends on DBUser. So it has to be a filter stacked on top of DBStore until we finish the User APIs.

@dang dang force-pushed the wip-dang-posix-driver branch 2 times, most recently from f38b739 to 8e44ec3 Compare August 24, 2023 15:34
@dang
Copy link
Contributor Author

dang commented Aug 29, 2023

jenkins test api

@dang
Copy link
Contributor Author

dang commented Aug 29, 2023

https://pulpito.ceph.com/dang-2023-08-25_09:21:58-rgw-wip-dang-posix-driver-distro-default-smithi/

This isn't a great run, but it's all valgind and health warns, nothing that could be caused by the (small) refactors that aren't in posixdriver.

@dang
Copy link
Contributor Author

dang commented Sep 7, 2023

https://pulpito.ceph.com/dang-2023-09-06_10:06:10-rgw-wip-dang-posix-driver-distro-default-smithi/

Finally a clean run. Are we good? Or are there other changes to make?

@cbodley
Copy link
Contributor

cbodley commented Sep 12, 2023

jenkins test windows

Comment on lines 319 to 328
# unittest_posix_bucket_cache
add_executable(unittest_posix_bucket_cache
test_posix_bucket_cache.cc)
target_compile_definitions(unittest_posix_bucket_cache PUBLIC LMDB_SAFE_NO_CPP_UTILITIES)
target_include_directories(unittest_posix_bucket_cache
SYSTEM PRIVATE "${CMAKE_SOURCE_DIR}/src/rgw"
SYSTEM PRIVATE "${CMAKE_SOURCE_DIR}/src/rgw/driver/posix")
target_link_libraries(unittest_posix_bucket_cache ${UNITTEST_LIBS}
${rgw_libs} ${LMDB_LIBRARIES})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there should be a add_ceph_unittest() to make this run during 'make check'

Copy link
Contributor

@cbodley cbodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good to merge once we see unittest_posix_bucket_cache passing in 'make check'

@cbodley
Copy link
Contributor

cbodley commented Sep 13, 2023

195/286 Test #232: unittest_posix_bucket_cache ...............***Failed 17.59 sec
...
[==========] 23 tests from 1 test suite ran. (17392 ms total)
[ PASSED ] 20 tests.
[ FAILED ] 3 tests, listed below:
[ FAILED ] BucketCache.ListMarker1
[ FAILED ] BucketCache.ListInotify1
[ FAILED ] BucketCache.List2Inotify1

@dang
Copy link
Contributor Author

dang commented Sep 14, 2023

195/286 Test #232: unittest_posix_bucket_cache ...............***Failed 17.59 sec
...
[==========] 23 tests from 1 test suite ran. (17392 ms total)
[ PASSED ] 20 tests.
[ FAILED ] 3 tests, listed below:
[ FAILED ] BucketCache.ListMarker1
[ FAILED ] BucketCache.ListInotify1
[ FAILED ] BucketCache.List2Inotify1

Yep, I'm looking at it.

@dang dang force-pushed the wip-dang-posix-driver branch 2 times, most recently from 265527c to 3ff66c7 Compare September 14, 2023 15:21
This is the MVP for a driver for RGW that operates on top of a POSIX
filesystem.  It supports get, put, list, copy, multipart, external
access via the filesystem itself, and ordered bucket listings via an
LRU-based cache.

Note that this is currently a Filter, indended to run on top of dbstore.
This is because it currently doesn't have any User implementation, so it
depends on dbstore's User.  Everything else is implemented in
POSIXDriver.  Once there is a User implementation, this will become a
Store, instead of a Filter.

Commit messages from bucket listing cache:

  rgw/posixdriver: recycle lmdb database handles as required

    While LMDB workflows often do not close/return database handles,
    ours continually reuses them.  This requires us to close each
    handle (atomically) when a cache entry is recycled.

  rgw/posixdriver: don't instantiate bucket cache entries from notify events

  rgw/posixdriver: incorporate lmdb-safe for now

    The current inclusion is based on https://github.com/Martchus/lmdb-safe,
    which is actively maintained but currently has some packaging issues the
    author has agreed to accept fixes for.

    For now, skip the submodule to save time and remove an external dependency.

  rgw/posixdriver: fix listing of cached, empty bucket

    * check lmdb enumeration result in all cases and w/better style
    * add unit test for enumeration of an empty cached directory

  rgw/posixdriver: nest lmdbs in a directory under the dbroot path to avoid cleanup issues

  rgw/posixdriver: refactor for posix integration

    * Derive BucketCache types as templates on a SAL driver and SAL
      bucket pair.

    * Integrate cache fills as callbacks into SAL layer (or mock, for
      tests)

    * Renaming and cleanups

  rgw/posixdriver: add bucket cache implementation and tests

    Adds free-standing cache of buckets and object names, with
    bucket names (and listing attributes, upcoming) managed in
    a hashed set of lmdb databases, which provides ordering and
    a high-performance listing cache.

    An framework for notification on new object creation (e.g.,
    outside S3 workflow) is provided, and a Linux implementation
    using inotify.

    FindLMDB.cmake taken with attribution and license.

  rgw/posixdriver: add zpp_bits serialization (FAST)

Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
Signed-off-by: Ali Maredia <amaredia@redhat.com>
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
@dang
Copy link
Contributor Author

dang commented Sep 14, 2023

jenkins test make check

Copy link
Contributor

@mattbenjamin mattbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm (mint_listing_entry)

POSIXObject *pobj;
int ret;

ret = get_bucket(nullptr, nullptr, std::string(), bname, &b, null_yield);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@dang dang merged commit a4b1913 into ceph:main Sep 15, 2023
10 of 11 checks passed
@mattbenjamin
Copy link
Contributor

yay

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants