Wip temperature based object eviction for cache tiering #4737

Merged: 5 commits into ceph:master on Nov 13, 2015

Conversation

dragonylffly
Contributor

This is an implementation of a temperature-based object eviction policy for cache tiering. The current eviction policy is based only on the latest access time, without considering the access frequency in the history. That policy tends to keep a just-accessed object in the cache pool even if it will never be accessed again, so it will make bad choices in some scenarios. This motivates a temperature-based eviction policy, which makes eviction decisions by considering both the latest access time and the access frequency, while keeping the simplicity of the current algorithm framework and making minimal revisions to it.

The algorithm is simple: associate each hitset with a weight, giving the latest hitset the heaviest weight, say 1000000. Each older hitset gets a lighter weight, decayed by a user-defined percentage; with a decay rate of 50, the next most recent hitset has a weight of 1000000*(100-50)/100 = 500000, and so on. Each object in the cache pool is assigned a total weight according to its appearances across all the hitsets, and the objects with the smallest total weights are chosen as the eviction candidates. With the decay rate set to 100, the algorithm reduces to the current atime-based eviction policy.
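
As a rough illustration of the weighting scheme described above (this is only a sketch, not the submitted patch; the names make_grade_table, object_temperature, appears_in, and the base weight of 1000000 are taken from the description or invented for the example):

#include <cstdint>
#include <vector>

// Precompute per-hitset weights; index 0 is the most recent hitset.
// With decay_rate == 100 every weight after the first is 0, which
// reproduces the current atime-based behaviour.
std::vector<uint64_t> make_grade_table(unsigned hitset_count,
                                       unsigned decay_rate /* 0..100 */) {
  std::vector<uint64_t> table(hitset_count);
  uint64_t weight = 1000000;                     // heaviest weight for the latest hitset
  for (unsigned i = 0; i < hitset_count; ++i) {
    table[i] = weight;
    weight = weight * (100 - decay_rate) / 100;  // decay for each older hitset
  }
  return table;
}

// Sum the weights of the hitsets an object appears in; the objects with
// the smallest totals become the eviction candidates.
uint64_t object_temperature(const std::vector<bool>& appears_in,   // appears_in[i]: in hitset i?
                            const std::vector<uint64_t>& grade_table) {
  uint64_t temp = 0;
  for (size_t i = 0; i < appears_in.size() && i < grade_table.size(); ++i)
    if (appears_in[i])
      temp += grade_table[i];
  return temp;
}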

@yuyuyu101
Member

Looks cool, but why not implement the ARC strategy (https://www.usenix.org/conference/fast-03/arc-self-tuning-low-overhead-replacement-cache), which suits your requirement?

@liewegas
Member

I had a hard time figuring out how to use both atime and temperature, so simply choosing one makes sense. I wonder, though, if this is too restrictive. If we calculate both atime and temp values, we can build distributions for both and upper/lower values for each. Here we just pick which one to look at. Instead, we could make the final value we use to decide a weighted average of the two. Is that too complex?

The other thing I worry about is that the temp calc is more expensive (looks at all hitsets, not just the last few), so we should avoid doing it if the value won't get used.

@dragonylffly
Contributor Author

Looks cool, but why not implement the ARC strategy (https://www.usenix.org/conference/fast-03/arc-self-tuning-low-overhead-replacement-cache), which suits your requirement?

The paper claims the algorithm is very good. However, I would suggest that before we try to introduce an academic result, we should first hear where it has been used, especially in production systems. In any case, the paper was published over ten years ago.

@dragonylffly
Contributor Author

I had a hard time figuring out how to use both atime and temperature, so simply choosing one makes sense. I wonder, though, if this is too restrictive. If we calculate both atime and temp values, we can build distributions for both and upper/lower values for each. Here we just pick which one to look at. Instead, we could make the final value we use to decide a weighted average of the two. Is that too complex?

The other thing I worry about is that the temp calc is more expensive (looks at all hitsets, not just the last few), so we should avoid doing it if the value won't get used.

For atime, it can make mistakes if the object was accessed only once, but it enjoys low complexity. The temperature policy tries to avoid that mistake by looking further back into the history, which of course incurs more complexity. So a straightforward tradeoff would be to judge by the last N appearances: an object's temperature is calculated only from its appearances in the last N hitsets, with a new parameter representing N.
By default, N=1, and the algorithm reduces exactly to the current atime-based policy; with N=hitset_num, the algorithm is equal to the form we submitted. Then we could directly replace the current atime-based policy with this last-N based policy. What do you think?
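
A minimal sketch of that last-N variant (the function and parameter names here are hypothetical, and it assumes the same grade_table weighting as sketched earlier): the temperature loop simply stops after the N most recent hitsets, so N=1 behaves like the atime policy and N=hitset_count behaves like the full-history form.

#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical last-N variant: only the N most recent hitsets contribute.
uint64_t object_temperature_last_n(const std::vector<bool>& appears_in,
                                   const std::vector<uint64_t>& grade_table,
                                   size_t last_n) {
  uint64_t temp = 0;
  size_t limit = std::min(std::min(appears_in.size(), grade_table.size()), last_n);
  for (size_t i = 0; i < limit; ++i)
    if (appears_in[i])
      temp += grade_table[i];
  return temp;   // last_n == 1: atime-like; last_n == hitset count: full history
}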

@liewegas
Member

@dragonylffly The last N thing makes sense to me. What do you think about a blend of atime and temperature? I'm thinking that a good default policy would take both into consideration...

@dragonylffly
Contributor Author

Updated, please review

if (grade_table.size() <= i)
return 0;
return grade_table[i];
}
Member

This seems incomplete. It seems like we need a calc_grade_table() method that is called from the end of decode() and the constructor, and drop the set_grade() callers from random other places (like when the pool is created).

It's a bit dangerous because the parameters are not private... we could change that too so that anybody modifying the parameters necessarily recalculates the table.
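
A sketch of what that could look like (illustrative only, not the actual pg_pool_t code; the struct and field names are simplified stand-ins): the table is derived from the parameters in exactly one place, and both the constructor and the end of decode() would call it.

#include <cstdint>
#include <vector>

struct pool_grades {                         // stand-in for the relevant pg_pool_t pieces
  unsigned hit_set_count = 0;
  unsigned grade_decay_rate = 0;             // percent, 0..100
  std::vector<uint32_t> grade_table;

  pool_grades() { calc_grade_table(); }

  // Single place that derives the table from the parameters; call it from
  // the constructor and from the end of decode() instead of scattering
  // set_grade() calls around.
  void calc_grade_table() {
    grade_table.resize(hit_set_count);
    uint32_t grade = 1000000;
    for (unsigned i = 0; i < hit_set_count; ++i) {
      grade_table[i] = grade;
      grade = grade * (100 - grade_decay_rate) / 100;
    }
  }

  uint32_t get_grade(unsigned i) const {
    if (grade_table.size() <= i)
      return 0;
    return grade_table[i];
  }
};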

@LiumxNL
Contributor

LiumxNL commented Jun 12, 2015

Updated, please review

@liewegas
Member

Aside from my whitespace nit, my only concern here is that the monitor is setting the grade parameters but the table will remain stale. It all works because only the OSD uses the grades and it never modifies the parameters, but as a standalone data type this is asking for trouble.

Can you make those two new fields private, and add get_foo() and set_foo() accessors? Then the set_ ones can call calc_grade_table().

Thanks!
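
Presumably the request is for something like the following (again only a sketch with hypothetical names, revising the earlier pool_grades sketch): the parameters are private, and each setter recalculates the table so it can never get out of sync.

#include <cstdint>
#include <vector>

class pool_grades {
  unsigned hit_set_count = 0;
  unsigned grade_decay_rate = 0;            // percent, 0..100
  std::vector<uint32_t> grade_table;

  void calc_grade_table() {
    grade_table.resize(hit_set_count);
    uint32_t grade = 1000000;
    for (unsigned i = 0; i < hit_set_count; ++i) {
      grade_table[i] = grade;
      grade = grade * (100 - grade_decay_rate) / 100;
    }
  }

public:
  unsigned get_hit_set_count() const { return hit_set_count; }
  unsigned get_grade_decay_rate() const { return grade_decay_rate; }

  // The setters are the only way to change the parameters, and each one
  // recalculates the table, so parameters and table stay consistent.
  void set_hit_set_count(unsigned n) {
    hit_set_count = n;
    calc_grade_table();
  }
  void set_grade_decay_rate(unsigned r) {
    grade_decay_rate = r;
    calc_grade_table();
  }
};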

@LiumxNL
Contributor

LiumxNL commented Jul 13, 2015

@liewegas Sorry, I don't quite get your point. grade_table[] is calculated from 'decay_rate' and 'hitset_count'. When the monitor sets the values of those two variables, it seems we do not need to update grade_table[] immediately; instead, we do it in pg_pool_t::decode() when the OSD parses the updated OSDMap. We could make 'decay_rate' private (while 'hitset_count', which was there long before, is public) and introduce a set() that updates grade_table[]. But we would need to distinguish the case where the monitor sets the value of 'decay_rate': the table should not be updated there, so the update cannot be called unconditionally. In our current implementation, we make grade_table[] private and update it in pg_pool_t::decode(). That seems ok to us; are we missing anything?

@dragonylffly
Contributor Author

@liewegas Rebased on master, please review the updated code

@liewegas
Member

I think this looks pretty reasonable. I have two questions:

  • How confident are we that it's a net improvement? We're shooting blind a bit here.
  • We're talking about an alternative approach of maintaining a full in-memory LRU list. If we go down that path, the hitset approach will probably eventually be dropped... unless we want to keep it around for low-memory environments?

@ghost

ghost commented Sep 1, 2015

@LiumxNL @dragonylffly this needs rebasing

@LiumxNL
Contributor

LiumxNL commented Sep 2, 2015

updated :)

MingXin Liu added 5 commits November 11, 2015 14:52
Signed-off-by: MingXin Liu <mingxinliu@ubuntukylin.com>
Reviewed-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: MingXin Liu <mingxinliu@ubuntukylin.com>
Reviewed-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: MingXin Liu <mingxinliu@ubuntukylin.com>
Reviewed-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: MingXin Liu <mingxinliu@ubuntukylin.com>
Reviewed-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: MingXin Liu <mingxinliu@ubuntukylin.com>
Reviewed-by: Li Wang <liwang@ubuntukylin.com>
liewegas added a commit that referenced this pull request Nov 13, 2015

osd: improve temperature calculation for cache tier agent

Reviewed-by: Sage Weil
@liewegas liewegas merged commit 8d3082d into ceph:master Nov 13, 2015