Hi everyone

Here is a simpler algorithm that you can look into. It is based on a PID feedback mechanism [4].
I have included it in my GSoC proposal for 'ceph-mgr: Smarter reweight-by-utilisation'.

I will demonstrate the idea with an example.

1. Consider five OSDs with assigned weights [10,10,10,10,1].
   Note: we keep the assigned weights as they are and work with a normalised copy of them. So we maintain target_weight_distribution as [(10/41),(10/41),(10/41),(10/41),(1/41)] until the assigned weights change.

2. At each 'step' or 'draw' we maintain the following (the assigned weights are always kept around, so that we know at all times which distribution we are aiming for):

   a) current_load_distribution, which is [0,0,0,0,0] initially.
   b) current_error in the load distribution, which is [(10/41),(10/41),(10/41),(10/41),(1/41)] initially.
   c) derivative_error in the load distribution, which is [0,0,0,0,0] initially.
   d) integral_error in the load distribution, which is [0,0,0,0,0] initially.
   e) current_weight_distribution, which is [(10/41),(10/41),(10/41),(10/41),(1/41)] initially.

3. We have three constants, Kp, Ki and Kd (how to determine them is explained later), which are the coefficients of current_error, integral_error and derivative_error in the expression:

   total_error=(Kp*current_error)+(Ki*integral_error)+(Kd*derivative_error)

4. At each 'step' or 'draw':

   current_error =  current_load_distribution - target_weight_distribution
   derivative_error = current_error - previous_error
   integral_error = integral_error + current_error

   total_error = (Kp*current_error)+(Ki*integral_error)+(Kd*derivative_error)

   previous_error = current_error

   Now, since an imbalance on an OSD with a greater weight is more significant than one on an OSD with a lower weight, we scale the error by the target_weight_distribution.

   So, total_error = total_error . target_weight_distribution
       		   // This is the element-wise (Hadamard) product of total_error and target_weight_distribution.

   Now we re-assign the weights as :
   current_weight_distribution = current_weight_distribution - total_error

   Then we normalise current_weight_distribution and reweight with it.
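
Below is a minimal Python sketch of steps 2-4, following the conventions above (all distributions normalised). The names PIDState and pid_reweight are made up for this sketch and are not part of any Ceph API.

from dataclasses import dataclass, field
from typing import List

@dataclass
class PIDState:
    target: List[float]                               # target_weight_distribution (normalised)
    previous_error: List[float] = field(default_factory=list)
    integral_error: List[float] = field(default_factory=list)

    def __post_init__(self):
        n = len(self.target)
        self.previous_error = self.previous_error or [0.0] * n
        self.integral_error = self.integral_error or [0.0] * n

def pid_reweight(state, current_load, current_weights, kp=0.08, ki=0.01, kd=0.01):
    """One 'draw': return the new, normalised current_weight_distribution."""
    n = len(state.target)
    current_error = [current_load[i] - state.target[i] for i in range(n)]
    derivative_error = [current_error[i] - state.previous_error[i] for i in range(n)]
    state.integral_error = [state.integral_error[i] + current_error[i] for i in range(n)]

    total_error = [kp * current_error[i]
                   + ki * state.integral_error[i]
                   + kd * derivative_error[i]
                   for i in range(n)]

    # Scale element-wise by the target distribution: an imbalance on a heavily
    # weighted OSD matters more than one on a lightly weighted OSD.
    total_error = [total_error[i] * state.target[i] for i in range(n)]

    new_weights = [current_weights[i] - total_error[i] for i in range(n)]
    norm = sum(new_weights)
    state.previous_error = current_error
    return [w / norm for w in new_weights]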

As an example:
   Let us assume that the PID_Tuner function (explained below) returned
   Kp=0.08, Ki=0.01 and Kd=0.01.

   I will henceforth use normalised weights.

   Initially, target_weight_distribution : [0.2439, 0.2439, 0.2439, 0.2439, 0.0244]
   At the first draw :

   Probability of choosing OSD 'a' = 0.2439
   After choosing 'a' :
   current_load_distribution : [1,0,0,0,0]

   current_error : [0.7561, -0.2439, -0.2439, -0.2439, -0.0244]
   integral_error : [0.7561, -0.2439, -0.2439, -0.2439, -0.0244]
   derivative_error: [0.7561, -0.2439, -0.2439, -0.2439, -0.0244]

   total_error=[0.0756, -0.0244, -0.0244, -0.0244, -0.0024]

   total_error= total_error.(target_weight_distribution)
   		= [0.0184,-0.0059,-0.0059,-0.0059,-0.0001]

   current_weight_distribution =  [0.2439, 0.2439, 0.2439, 0.2439, 0.0244] -  [0.0184,-0.0059,-0.0059,-0.0059,-0.0001]
   			       =  [0.2255, 0.2498, 0.2498, 0.2498, 0.0245]
			       =  [0.2256, 0.2500, 0.2500, 0.2500, 0.0245] // After normalisation


   Alternatively, suppose OSD 'e' is chosen at the first draw instead.
   Probability of choosing OSD 'e' = 0.0244
   After choosing 'e' :
   current_load_distribution : [0,0,0,0,1]

   current_error : [-0.2439, -0.2439, -0.2439, -0.2439, 0.9756]
   integral_error : [-0.2439, -0.2439, -0.2439, -0.2439, 0.9756]
   derivative_error: [-0.2439, -0.2439, -0.2439, -0.2439, 0.9756]

   total_error = [-0.0244, -0.0244, -0.0244, -0.0244, 0.0975]

   total_error= total_error.(target_weight_distribution)
   		= [-0.0059,-0.0059,-0.0059,-0.0059,0.0023]

   current_weight_distribution =  [0.2439, 0.2439, 0.2439, 0.2439, 0.0244] - [-0.0059,-0.0059,-0.0059,-0.0059,0.0023]
   			       =  [0.2498, 0.2498, 0.2498, 0.2498, 0.0221]
			       =  [0.2446, 0.2446, 0.2446, 0.2446, 0.0216] // After normalisation

  Now, let us calculate the expected probability over the first two draws of
  getting OSD 'e' = (0.0244) + (0.0236) = 0.0480
  getting OSD 'a' = (0.2439) + (0.2360) + 0.0006 = 0.4805

  We see that the expected 10:1 ratio between 'a' and 'e' is approximately maintained. The same can be shown for higher values of num_rep.
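
As a quick check, the sketch above reproduces the first draw of this example (the case where OSD 'a' is chosen):

weights = [10, 10, 10, 10, 1]
target = [w / sum(weights) for w in weights]       # [0.2439, 0.2439, 0.2439, 0.2439, 0.0244]
state = PIDState(target=target)

# OSD 'a' is chosen at the first draw, so the load distribution is one-hot.
new_weights = pid_reweight(state, current_load=[1, 0, 0, 0, 0],
                           current_weights=list(target))
print([round(w, 4) for w in new_weights])          # [0.2256, 0.25, 0.25, 0.25, 0.0245]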

Significance of each term :
a) current_error :
  This is the prime reason we need to change the weight distribution: with the current current_weight_distribution, we cannot achieve the target load distribution.
b) integral_error :
  This ensures that if a device hasn't been chosen for a long time, this factor raises its probability of being selected in the subsequent draws.
  It accounts for differences in the load distribution in the past.
c) derivative_error :
  This reduces the probability of the same device being selected in quick succession, more frequently than it should be.
  It accounts for likely differences in the load distribution in the future.

Reasons why a PID-based feedback system should work :
This algorithm rests on the principle that the values of Kp, Ki and Kd depend only on the CRUSH algorithm, and hence are system-independent.
This is explained by
" Given a single integer input value x, CRUSH will output
  an ordered list R of n distinct storage targets. CRUSH utilizes
  a strong multi-input integer hash function whose inputs
  include x, making the mapping completely deterministic and
  independently calculable using only the cluster map, placement
  rules, and x. " [2]
  -- CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data by
  Sage A. Weil, Scott A. Brandt, Ethan L. Miller, Carlos Maltzahn (See Reference)
The values of Kp, Ki and Kd in a PID controller depend on the system or, in other words, on "who takes the actions".
Since it is the CRUSH algorithm that is responsible for the selection of devices, and CRUSH is the same on every system, the constants are system-independent.


Now why I vouch for this solution :
    1. This is dynamic, in the sense that it takes into account the decisions CRUSH has made in the past and discourages certain selections in the future.
       So, if a lower-weighted device hasn't been selected for a long time, its integral_error keeps increasing, which raises its chances of being selected in the subsequent trials.
       Similarly, once a lower-weighted device is selected, its weight is significantly reduced for a sufficient number of turns.
       This avoids the highly improbable situations where the distribution is skewed towards lower-weighted devices.

    2. We can convert the normalised distribution back to integer weights (with some negligible error) and vice versa; see the sketch after this list.

    3. All calculations are simple. The distribution of objects across OSDs could instead be modelled as a Gaussian distribution, with mean proportional to the weight of the device and standard deviation proportional to the allowed error in the distribution, but the calculations in that case would be very heavy and time-consuming.

    4. Though it has been explained here for OSDs in a bucket, this can be applied similarly and recursively to tree buckets.
       I am not sure how to apply it to straw buckets, though.

    5. It can account for sudden changes in the distribution.
       For example, if a device goes down, the resulting change in the object distribution is absorbed without a separate rebalance algorithm.
       However, attaining the desired distribution may take some turns, so this is not necessarily a strong point.
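
As mentioned in point 2, a small sketch of the conversion between the normalised distribution and integer weights is shown below; the scale factor is arbitrary and only illustrative.

def to_integer_weights(distribution, scale=100000):
    # Scale and round; the rounding introduces only a negligible error.
    return [max(1, round(p * scale)) for p in distribution]

def to_distribution(integer_weights):
    total = sum(integer_weights)
    return [w / total for w in integer_weights]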


Motivation :

This PID-based feedback algorithm has been used in several areas, especially for linear systems.
This is a linear system, because the current_load_distribution is, theoretically, a linear function of the weight distribution.
My personal experience with such algorithms has been in multiple cases where I had to 'hold on' to a particular value despite errors and noise,
that is, where the task demanded that the system be rebalanced quickly when disturbed.


Now the most important part :

How to find the values of Kp, Ki and Kd :

Though we can find these values manually, for a large system that is very tiresome.

There are existing PID tuners (we could also write one from scratch; I have included this in the proposal too) that require us to define a cost function to minimise. In this case, that would be current_error, i.e. the error in the load distribution.
They use PID tuning algorithms based on machine-learning concepts (more specifically, gradient descent).
We need to provide data (controls: current_weight_distribution and iteration number; output: current_load_distribution) and the model trains on it.
A simple PID tuner was developed by a friend of mine for the Swarm project, an autonomous, distributed robotics project.
He used Matlab, but converting it to C++ should not be much pain. Here [1] is a link to it.
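
A very crude sketch of such a tuner is shown below. It is not the learnPID code from [1], only an illustration of the idea: a simulated weighted-draw loop stands in for the cluster, the cost is the accumulated squared error in the load distribution, and finite-difference gradient descent adjusts (Kp, Ki, Kd). The names simulate_cost and tune are made up for this sketch.

import random

def simulate_cost(kp, ki, kd, weights=(10, 10, 10, 10, 1), draws=2000, seed=0):
    # Accumulated squared error between load and target over a simulated run.
    rng = random.Random(seed)
    target = [w / sum(weights) for w in weights]
    state = PIDState(target=target)
    current = list(target)
    counts = [0] * len(weights)
    cost = 0.0
    for t in range(1, draws + 1):
        chosen = rng.choices(range(len(weights)), weights=current)[0]
        counts[chosen] += 1
        load = [c / t for c in counts]
        cost += sum((load[i] - target[i]) ** 2 for i in range(len(weights)))
        # Guard against tiny negative weights produced by the subtraction.
        current = [max(1e-6, w) for w in pid_reweight(state, load, current, kp, ki, kd)]
    return cost

def tune(kp=0.08, ki=0.01, kd=0.01, lr=0.01, eps=1e-3, steps=20):
    # Finite-difference gradient descent over the three gains.
    gains = [kp, ki, kd]
    for _ in range(steps):
        base = simulate_cost(*gains)
        grad = []
        for i in range(3):
            bumped = list(gains)
            bumped[i] += eps
            grad.append((simulate_cost(*bumped) - base) / eps)
        gains = [max(0.0, g - lr * d) for g, d in zip(gains, grad)]
    return gains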

There are also many PID auto-tuning algorithms, such as Ziegler-Nichols, which can tune the values of Kp, Ki and Kd while the system runs.
There is also a much simpler implementation. [3]
Tuning control algorithms is tricky, however, in that there is no single best solution: you can tune a controller for a faster rise time, less overshoot, or various other objectives.
What we could do is initialise the values from the output of a trained/semi-trained system and then auto-tune at each step, for best results.
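
For reference, the classic closed-loop Ziegler-Nichols rule (the textbook version, nothing Ceph-specific) looks like this, where Ku is the ultimate gain at which the loop oscillates steadily and Tu is the oscillation period:

def ziegler_nichols_pid(ku, tu):
    # Classic Ziegler-Nichols PID rule: Kp = 0.6*Ku, Ti = Tu/2, Td = Tu/8.
    kp = 0.6 * ku
    ki = kp / (tu / 2)      # = 1.2 * Ku / Tu
    kd = kp * (tu / 8)      # = 0.075 * Ku * Tu
    return kp, ki, kd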

In either case we will need data, and the more data the better. For this, I propose to write a test-case generator that simply iterates and writes the data out to a file, using the test cluster.
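
A hypothetical sketch of such a generator is shown below; it reuses the simulated weighted-draw loop from above in place of a real test cluster and simply logs (iteration, weight distribution, load distribution) rows to a CSV file.

import csv
import random

def generate_dataset(path, weights=(10, 10, 10, 10, 1), draws=1000, seed=1):
    rng = random.Random(seed)
    target = [w / sum(weights) for w in weights]
    state = PIDState(target=target)
    current = list(target)
    counts = [0] * len(weights)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["iteration", "current_weight_distribution", "current_load_distribution"])
        for t in range(1, draws + 1):
            chosen = rng.choices(range(len(weights)), weights=current)[0]
            counts[chosen] += 1
            load = [c / t for c in counts]
            writer.writerow([t, current, load])
            current = pid_reweight(state, load, current)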

References :
[1] : https://github.com/ManashRaja/learnPID
[2] : http://www.ssrc.ucsc.edu/papers/weil-sc06.pdf
[3] : https://github.com/br3ttb/Arduino-PID-AutoTune-Library/tree/master/PID_AutoTune_v0
[4] : https://en.wikipedia.org/wiki/PID_controller

P.S. : I have also proposed, in my GSoC proposal, to give a detailed mathematical proof of this, for the sake of record and justification.
