
Cache on the barrelhead #19581

Merged
merged 6 commits into ceph:master on Jan 2, 2018

Conversation

adamemerson
Contributor

RGW Cache improvements

@@ -297,7 +297,7 @@ int rgw_get_user_info_from_index(RGWRados * const store,
   list<rgw_cache_entry_info *> cache_info_entries;
   cache_info_entries.push_back(&cache_info);

-  uinfo_cache.put(store, key, &e, cache_info_entries);
+  uinfo_cache.put(store, key, &e, { &cache_info });
Contributor

it looks like the cache_info_entries list above is no longer used

@adamemerson adamemerson force-pushed the wip-cache-on-the-barrelhead branch 2 times, most recently from 84906b8 to b3249c3 Compare December 19, 2017 17:57
@mattbenjamin
Contributor

@adamemerson promising

@adamemerson
Contributor Author

Jenkins, retest this please.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Do not use std::list for the LRU.

And really don't cons up a std::list just to pass a variable number of
arguments to a function. (Use initializer_list instead.)

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
This should get us better lookup speeds.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Rename it to rgw_cache_expiry_interval, to be specific. It already
controls the user cache, and will expire objects in the cache onto
which those two are chained.

Fixes: http://tracker.ceph.com/issues/22517
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
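For reference, the renamed option can be set like any other RGW tunable in ceph.conf; the value below is illustrative (the option is interpreted in seconds):

```ini
[client.rgw]
rgw_cache_expiry_interval = 900
```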
We had it in the chained caches, but it doesn't do much good if
they just fetch objects out of the object cache.

Fixes: http://tracker.ceph.com/issues/22517
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Now when we force a refetch of bucket info it will actually go to the
OSD rather than simply using the objects in the object cache.

Fixes: http://tracker.ceph.com/issues/22517
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
@mattbenjamin
Contributor

2017-12-21T21:49:25.384 INFO:tasks.rgw_multisite_tests:tasks.rgw_multi.tests.test_bucket_index_log_trim ... ok
2017-12-21T21:49:25.385 INFO:tasks.rgw_multisite_tests:
2017-12-21T21:49:25.385 INFO:tasks.rgw_multisite_tests:======================================================================
2017-12-21T21:49:25.385 INFO:tasks.rgw_multisite_tests:FAIL: tasks.rgw_multi.tests.test_bucket_sync_disable_enable
2017-12-21T21:49:25.385 INFO:tasks.rgw_multisite_tests:----------------------------------------------------------------------
2017-12-21T21:49:25.385 INFO:tasks.rgw_multisite_tests:Traceback (most recent call last):
2017-12-21T21:49:25.385 INFO:tasks.rgw_multisite_tests: File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/virtualenv/local/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
2017-12-21T21:49:25.385 INFO:tasks.rgw_multisite_tests: self.test(*self.arg)
2017-12-21T21:49:25.385 INFO:tasks.rgw_multisite_tests: File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-rgw-22517/qa/tasks/rgw_multi/tests.py", line 1004, in test_bucket_sync_disable_enable
2017-12-21T21:49:25.385 INFO:tasks.rgw_multisite_tests: zonegroup_bucket_checkpoint(zonegroup_conns, bucket_name)
2017-12-21T21:49:25.385 INFO:tasks.rgw_multisite_tests: File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-rgw-22517/qa/tasks/rgw_multi/tests.py", line 398, in zonegroup_bucket_checkpoint
2017-12-21T21:49:25.385 INFO:tasks.rgw_multisite_tests: zone_bucket_checkpoint(target_conn.zone, source_conn.zone, bucket_name)
2017-12-21T21:49:25.386 INFO:tasks.rgw_multisite_tests: File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-rgw-22517/qa/tasks/rgw_multi/tests.py", line 391, in zone_bucket_checkpoint
2017-12-21T21:49:25.386 INFO:tasks.rgw_multisite_tests: (target_zone.name, source_zone.name, bucket_name)
2017-12-21T21:49:25.386 INFO:tasks.rgw_multisite_tests:AssertionError: finished bucket checkpoint for target_zone=test-zone1 source_zone=test-zone2 bucket=wyhalv-53
2017-12-21T21:49:25.386 INFO:tasks.rgw_multisite_tests:
2017-12-21T21:49:25.386 INFO:tasks.rgw_multisite_tests:----------------------------------------------------------------------
2017-12-21T21:49:25.386 INFO:tasks.rgw_multisite_tests:Ran 21 tests in 2924.258s
2017-12-21T21:49:25.386 INFO:tasks.rgw_multisite_tests:
2017-12-21T21:49:25.386 INFO:tasks.rgw_multisite_tests:FAILED (failures=1)
2017-12-21T21:49:25.386 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/run_tasks.py", line 89, in run_tasks
manager.enter()
File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/task/init.py", line 123, in enter
self.begin()
File "/home/teuthworker/src/github.com_ceph_ceph-c_wip-rgw-22517/qa/tasks/rgw_multisite_tests.py", line 68, in begin
raise RuntimeError('rgw multisite test failures')
RuntimeError: rgw multisite test failures

@mattbenjamin
Contributor

@cbodley @oritwas could one of you take a look at the multisite failure (bucket sync enable-disable)--should it block this PR?

@cbodley
Contributor

cbodley commented Jan 2, 2018

@cbodley @oritwas could one of you take a look at the multisite failure (bucket sync enable-disable)--should it block this PR?

does not look like a blocker. that same test case passed in the other multisite job

@mattbenjamin left a comment

lgtm sir

@mattbenjamin
Contributor

@adamemerson I am getting a merge failure; can you please rebase?

@mattbenjamin mattbenjamin merged commit 05bcf0c into ceph:master Jan 2, 2018
@mattbenjamin
Contributor

@adamemerson never mind...

@cbodley
Contributor

cbodley commented Jan 3, 2018

@adamemerson Now that this merged, I'm seeing some crashes in ObjectCache::touch_lru() when it calls lru.erase(lru_iter). My best guess is that we're running afoul of the iterator invalidation rules (which differ between list and deque) when storing std::deque<string>::iterator lru_iter in ObjectCacheEntry.

@cbodley
Contributor

cbodley commented Jan 3, 2018

added http://tracker.ceph.com/issues/22560 with stack trace

@@ -135,8 +134,6 @@ void ObjectCache::put(string& name, ObjectCacheInfo& info, rgw_cache_entry_info
   ObjectCacheEntry& entry = iter->second;
   ObjectCacheInfo& target = entry.info;

-  invalidate_lru(entry);
-
Contributor

separately, it looks like the removal of this call is causing some failures in s3tests - added back in #19768
