[DNM]Wip rgw d4n next #56336

Draft
wants to merge 32 commits into main

Conversation

@pritha-srivastava (Contributor) commented Mar 20, 2024

This PR contains the next set of changes for the d4n filter driver.

So far, this PR achieves a working single-node write-back cache for non-multipart (small) objects, both versioned and non-versioned.

To test the various scenarios, bring up a vstart cluster as follows:
MON=1 OSD=1 RGW=1 MGR=0 MDS=0 ../src/vstart.sh -n -d -o rgw_d4n_l1_datacache_persistent_path=/home/prsrivas/ceph/build/rgw_d4n_datacache/ -o rgw_d4n_l1_datacache_size=5368709120 -o rgw_filter=d4n -o d4n_writecache_enabled=true -o rgw_d4n_cache_cleaning_interval=600
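
Before running the steps below, it may help to confirm that the gateway is up and the cache directory starts out empty; a minimal check, assuming the endpoint and cache path from the vstart command above:
    curl -s http://localhost:8000    # any S3 XML response means the gateway is answering
    ls -l rgw_d4n_datacache/         # should be empty before the first upload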

The following steps are for uploading and downloading an object to/from a non-versioned bucket:

  1. Create a bucket:
    aws s3 mb s3://my-new-bucket --endpoint-url http://localhost:8000 --region us-east-1

  2. Upload a small (non-multipart) object:
    aws s3 cp ./1M s3://my-new-bucket --endpoint-url http://localhost:8000 --region us-east-1

  3. Check the d4n datacache contents - you will see one entry for the head object and other entries for the object's data:
    ls -l rgw_d4n_datacache/
    -rw-r--r--. 1 prsrivas prsrivas 0 May 2 12:43 D_my-new-bucket_09v6V6FcjyJ17hNJmFgysOjxtLAAX2n_1M
    -rw-r--r--. 1 prsrivas prsrivas 1048576 May 2 12:43 D_my-new-bucket_09v6V6FcjyJ17hNJmFgysOjxtLAAX2n_1M_0_1048576

  4. Check if get-object works:
    aws s3api get-object --bucket my-new-bucket --key 1M --endpoint-url http://localhost:8000 --region us-east-1 ./1M-out

  5. Check md5 of the objects, to ensure that the contents are as expected:
    md5sum ./1M ./1M-out

  6. Now wait for the cleaning process to kick in (roughly after rgw_d4n_cache_cleaning_interval, which is 600 seconds here; see the sketch after this list)

  7. Check the d4n datacache contents - all dirty entries have now been marked clean (the D_ prefix is removed), which means they have been written to the backend store:
    -rw-r--r--. 1 prsrivas prsrivas 0 May 2 12:43 my-new-bucket_09v6V6FcjyJ17hNJmFgysOjxtLAAX2n_1M
    -rw-r--r--. 1 prsrivas prsrivas 1048576 May 2 12:43 my-new-bucket_09v6V6FcjyJ17hNJmFgysOjxtLAAX2n_1M_0_1048576

  8. Check get-object now
    aws s3api get-object --bucket my-new-bucket --key 1M --endpoint-url http://localhost:8000 --region us-east-1 ./1M-cache

  9. Check md5 of the objects, to ensure that the contents are as expected:
    md5sum ./1M ./1M-cache
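
The wait in step 6 can be scripted; a minimal sketch covering steps 6-9, assuming the cache path and the 600-second cleaning interval configured above:
    # poll until the cleaner has flushed all dirty (D_-prefixed) entries
    while ls rgw_d4n_datacache/ | grep -q '^D_'; do sleep 30; done
    ls -l rgw_d4n_datacache/
    aws s3api get-object --bucket my-new-bucket --key 1M --endpoint-url http://localhost:8000 --region us-east-1 ./1M-cache
    md5sum ./1M ./1M-cache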

The following steps are for uploading/downloading an object to/from a versioned bucket:

  1. Create a bucket:
    aws s3 mb s3://my-new-bucket --endpoint-url http://localhost:8000 --region us-east-1

  2. Enable versioning on the bucket:
    aws s3api put-bucket-versioning --bucket my-new-bucket --versioning-configuration Status=Enabled --endpoint-url http://localhost:8000 --region us-east-1

  3. Upload an object:
    aws s3 cp ./1M s3://my-new-bucket --endpoint-url http://localhost:8000 --region us-east-1

  4. Check d4n datacache contents for the head and data entries

  5. Check get-object without specifying version-id
    aws s3api get-object --bucket my-new-bucket --key 1M --endpoint-url http://localhost:8000 --region us-east-1 ./1M-out

  6. Check get-object by specifying a version-id (see the sketch after this list for one way to obtain it)
    aws s3api get-object --bucket my-new-bucket --key 1M --version-id "09v6V6FcjyJ17hNJmFgysOjxtLAAX2n" --endpoint-url http://localhost:8000 --region us-east-1 ./1M-out

  7. Now wait for the cleaning process to kick in (roughly after rgw_d4n_cache_cleaning_interval, which is 600 seconds here)

  8. Check the d4n datacache contents - all dirty entries have now been marked clean (the D_ prefix is removed), which means they have been written to the backend store

  9. Now check get-object with and without a version-id, as in steps 5 and 6.
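
Because list-object-versions does not work yet (see "Things that do NOT work" below), one way to obtain the version-id for step 6 is head-object; a sketch, assuming head-object works through the d4n filter (the version-id also appears in the cache entry file names under rgw_d4n_datacache/):
    aws s3api head-object --bucket my-new-bucket --key 1M --endpoint-url http://localhost:8000 --region us-east-1 --query VersionId --output text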

Testing steps for copy-object:
When both source and destination buckets are non-versioned:

  1. aws s3 mb s3://my-new-bucket --endpoint-url http://localhost:8000 --region us-east-1
  2. aws s3 mb s3://my-bucket --endpoint-url http://localhost:8000 --region us-east-1
  3. aws s3 cp ./1M s3://my-new-bucket --endpoint-url http://localhost:8000 --region us-east-1
  4. aws s3api get-object --bucket my-new-bucket --key 1M --endpoint-url http://localhost:8000 --region us-east-1 ./1M-out
  5. aws s3api copy-object --bucket my-bucket --copy-source my-new-bucket/1M --key 1M-copy --endpoint-url http://localhost:8000 --region us-east-1
  6. check cache contents using ls -l rgw_d4n_datacache/
  7. aws s3api get-object --bucket my-bucket --key 1M-copy --endpoint-url http://localhost:8000 --region us-east-1 ./1M-copy-out
  8. compare md5 of both 1M and 1M-copy-out
  9. Wait for cleaning to kick in
  10. Call get-object as in step 7 to check that the object is fetched correctly from the backend store (see the sketch after this list).
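
A minimal verification sketch for steps 8-10, assuming the same cache path and cleaning interval as above (the output file name 1M-copy-backend is arbitrary):
    md5sum ./1M ./1M-copy-out
    # wait for the cleaner to flush the dirty entries, then re-read the copy backed by the store
    while ls rgw_d4n_datacache/ | grep -q '^D_'; do sleep 30; done
    aws s3api get-object --bucket my-bucket --key 1M-copy --endpoint-url http://localhost:8000 --region us-east-1 ./1M-copy-backend
    md5sum ./1M ./1M-copy-backend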

When the source bucket is versioned:

  1. aws s3 mb s3://my-new-bucket --endpoint-url http://localhost:8000 --region us-east-1
  2. aws s3api put-bucket-versioning --bucket my-new-bucket --versioning-configuration Status=Enabled --endpoint-url http://localhost:8000 --region us-east-1
  3. aws s3 cp ./1M s3://my-new-bucket --endpoint-url http://localhost:8000 --region us-east-1
  4. aws s3 mb s3://my-bucket --endpoint-url http://localhost:8000 --region us-east-1
  5. aws s3api copy-object --bucket my-bucket --copy-source my-new-bucket/1M --key 1M-latest --endpoint-url http://localhost:8000 --region us-east-1
  6. aws s3api get-object --bucket my-bucket --key 1M-latest --endpoint-url http://localhost:8000 --region us-east-1 ./1M-latest-out
  7. md5sum ./1M ./1M-latest-out
  8. wait for the cleaning process to kick in
  9. call get-object as in step 6 to check that the object is read correctly from the backend store.
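
The steps above always copy the latest version of the source object. Copying a specific source version is not covered by these steps (and is not verified for the d4n filter), but the AWS CLI accepts a versionId in --copy-source if that scenario needs to be exercised; <version-id> below is a placeholder:
    aws s3api copy-object --bucket my-bucket --copy-source "my-new-bucket/1M?versionId=<version-id>" --key 1M-version --endpoint-url http://localhost:8000 --region us-east-1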

When the destination bucket is versioned:

  1. aws s3 mb s3://my-new-bucket --endpoint-url http://localhost:8000 --region us-east-1
  2. aws s3 mb s3://my-bucket --endpoint-url http://localhost:8000 --region us-east-1
  3. aws s3api put-bucket-versioning --bucket my-bucket --versioning-configuration Status=Enabled --endpoint-url http://localhost:8000 --region us-east-1
  4. aws s3 cp ./1M s3://my-new-bucket --endpoint-url http://localhost:8000 --region us-east-1
  5. aws s3api copy-object --bucket my-bucket --copy-source my-new-bucket/1M --key 1M-latest --endpoint-url http://localhost:8000 --region us-east-1
  6. aws s3api get-object --bucket my-bucket --key 1M-latest --endpoint-url http://localhost:8000 --region us-east-1 ./1M-latest-out
  7. wait for the cleaning process to kick in
  8. call get-object as in step 6 to check that the object is read correctly from the backend store.
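
Since the destination bucket is versioned here, the copy in step 5 creates a new version of 1M-latest; a sketch to fetch it by version-id, assuming head-object works through the d4n filter (1M-latest-vid is an arbitrary output name):
    vid=$(aws s3api head-object --bucket my-bucket --key 1M-latest --endpoint-url http://localhost:8000 --region us-east-1 --query VersionId --output text)
    aws s3api get-object --bucket my-bucket --key 1M-latest --version-id "$vid" --endpoint-url http://localhost:8000 --region us-east-1 ./1M-latest-vid
    md5sum ./1M ./1M-latest-vid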

Things that do NOT work:

  1. list-object-versions
  2. delete object

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e


github-actions bot commented Apr 4, 2024

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

pritha-srivastava and others added 3 commits April 22, 2024 13:14
modifications in ReadOp::prepare() method of the d4n filter driver
to cache the head object.

modification in get_obj_attrs to read from cache or backend store.

Signed-off-by: Pritha Srivastava <prsrivas@redhat.com>
Signed-off-by: Samarah <samarah.uriarte@ibm.com>
Signed-off-by: mosayyebzadeh <mosayyeb@bu.edu>
Signed-off-by: mosayyebzadeh <mosayyeb@bu.edu>
Signed-off-by: mosayyebzadeh <mosayyeb@bu.edu>
Signed-off-by: mosayyebzadeh <mosayyeb@bu.edu>
The read process needs to be updated based on the write process. It needs to check where the data is and whether it is dirty or clean.
If it is in the cache and dirty, we need to prefix the object's oid with D_ before reading it from the cache.
If it is clean, there is nothing to do.

Signed-off-by: mosayyebzadeh <mosayyeb@bu.edu>
Signed-off-by: mosayyebzadeh <mosayyeb@bu.edu>
Signed-off-by: Pritha Srivastava <prsrivas@redhat.com>
process.

Signed-off-by: Pritha Srivastava <prsrivas@redhat.com>
Signed-off-by: Pritha Srivastava <prsrivas@redhat.com>
which has objects ordered by their creation time and the top
element of which is fetched in the cleaning method, processed
and deleted in a loop.

Signed-off-by: Pritha Srivastava <prsrivas@redhat.com>
bucket_name_version_object_name_ofs_len, to avoid checks
for versioned and non-versioned objects.

Signed-off-by: Pritha Srivastava <prsrivas@redhat.com>
and delete_obj_attrs() to check if the head object exists in a cache,
else direct the calls to backend store.

Signed-off-by: Pritha Srivastava <prsrivas@redhat.com>
while writing the object.

Signed-off-by: Pritha Srivastava <prsrivas@redhat.com>
@samarahu closed this May 7, 2024
RGWRados, in case ReadOp::prepare() reads the head object from
the cache.

Signed-off-by: Pritha Srivastava <prsrivas@redhat.com>

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

pritha-srivastava and others added 16 commits May 20, 2024 15:10
1. storing objects in directory using their oid, so that the version
is included.
2. making sure that the head block corresponds to latest
version in the block directory.
3. add a directory entry for head block for every version
in case of a versioned bucket.
4. Populating hostsList correctly for blocks and objects.

Signed-off-by: Pritha Srivastava <prsrivas@redhat.com>
Signed-off-by: Samarah <samarah.uriarte@ibm.com>
Signed-off-by: Samarah <samarah.uriarte@ibm.com>
Signed-off-by: Samarah <samarah.uriarte@ibm.com>
Signed-off-by: Samarah <samarah.uriarte@ibm.com>
Signed-off-by: Samarah <samarah.uriarte@ibm.com>
…cript

Signed-off-by: Samarah <samarah.uriarte@ibm.com>
data handling and faster completion

Signed-off-by: Samarah <samarah.uriarte@ibm.com>
Signed-off-by: Samarah <samarah.uriarte@ibm.com>
Signed-off-by: Samarah <samarah.uriarte@ibm.com>
Signed-off-by: Samarah <samarah.uriarte@ibm.com>
Signed-off-by: Samarah <samarah.uriarte@ibm.com>
Signed-off-by: Samarah <samarah.uriarte@ibm.com>
…sistent values, and fix directory updates in `cleanup` method

Signed-off-by: Samarah <samarah.uriarte@ibm.com>
Signed-off-by: Pritha Srivastava <prsrivas@redhat.com>
… (LFUDA).

Signed-off-by: Pritha Srivastava <prsrivas@redhat.com>