Enable/Disable profiling doesn't work consistently #405

Closed
dahorak opened this issue Sep 11, 2017 · 13 comments

Comments

@dahorak

dahorak commented Sep 11, 2017

If I import a Gluster cluster with two volumes, Volume Profiling is properly enabled (according to the default value of the respective check-box in the import wizard of the Tendrl web UI).
But later, when I switch profiling off and on multiple times (waiting some time between each step), the profiling state in Gluster does not change (it is neither started nor stopped).

I don't have a 100% reliable reproduction scenario yet, but I was able to reproduce it multiple times with the following steps:

  1. Prepare Gluster storage pool (Gluster cluster).
  2. Create two Gluster volumes, enable Volume profiling on one volume.
  3. Install Tendrl server and tendrl-node-agents.
  4. Import Gluster cluster into Tendrl.
  5. Check that Volume Profiling is enabled on both volumes (gluster volume profile <VOLUME_NAME> info).
  6. Disable Profiling (via Tendrl web UI).
  7. Wait a moment and check that Volume Profiling is disabled on both volumes (gluster volume profile <VOLUME_NAME> info).
  8. Enable Profiling (via Tendrl web UI).
  9. Wait a moment and check that Volume Profiling is enabled on both volumes (gluster volume profile <VOLUME_NAME> info).
  10. Repeat steps 6. - 9. multiple times.
  11. Create new Volume.
  12. Wait a moment and check that Volume Profiling is enabled also on the newly created volume (gluster volume profile <VOLUME_NAME> info).
  13. Optionally repeat steps 6. - 9. again multiple times.
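
The checks in steps 5, 7, 9 and 12 can be scripted. Below is a minimal sketch, not part of Tendrl. It assumes that a volume without profiling prints the "is not started" message and that a volume with profiling prints per-brick sections starting with "Brick:". The function names are hypothetical, and any other output is treated as unknown rather than guessed.

```python
import subprocess

NOT_STARTED_MARKER = "is not started"  # printed when profiling is off
BRICK_MARKER = "Brick:"                # per-brick stats appear when profiling is on


def profiling_started(info_output: str) -> bool:
    """Classify the output of `gluster volume profile <VOLUME_NAME> info`."""
    if NOT_STARTED_MARKER in info_output:
        return False
    if BRICK_MARKER in info_output:
        return True
    # Neither marker found (e.g. unknown volume): do not guess.
    raise ValueError("unrecognized profile info output")


def volume_profile_info(volume: str) -> str:
    """Run the gluster CLI for one volume and return stdout plus stderr."""
    result = subprocess.run(
        ["gluster", "volume", "profile", volume, "info"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout + result.stderr


def profiling_states(volumes):
    """Map each volume name to True (started) or False (not started)."""
    return {v: profiling_started(volume_profile_info(v)) for v in volumes}
```

Calling `profiling_states(["vol1", "vol2"])` on a storage node after each toggle gives the real Gluster state to compare with what the web UI shows (the volume names here are placeholders).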

At some point, the volume profiling state shown in the Tendrl web UI for the cluster is correct, but the real state reported by the gluster volume profile command is wrong.

Packages:

tendrl-api-1.5.1-1.el7.centos.noarch
tendrl-api-httpd-1.5.1-1.el7.centos.noarch
tendrl-commons-1.5.1-1.el7.centos.noarch
tendrl-monitoring-integration-1.5.1-1.el7.centos.noarch
tendrl-node-agent-1.5.1-1.el7.centos.noarch
tendrl-ui-1.5.1-1.el7.centos.noarch

tendrl-commons-1.5.1-1.el7.centos.noarch
tendrl-gluster-integration-1.5.1-1.el7.centos.noarch
tendrl-node-agent-1.5.1-1.el7.centos.noarch

I'll try to debug it more and possibly find a more straightforward reproduction scenario.
Where should I look for errors?

@dahorak
Author

dahorak commented Sep 14, 2017

It also seems that if you create a Gluster cluster with one volume that has profiling enabled, and then import the cluster into Tendrl with volume profiling disabled (the Enable Volume profiling check-box on the Import Cluster wizard unchecked), volume profiling is not disabled during cluster import.

@shtripat
Member

@dahorak #427 should take care of the scenario where profiling is already enabled for a few volumes before the cluster is imported. Based on the option selected during cluster import, it should enable or disable profiling for all the underlying volumes.

Regarding back-to-back enabling and disabling of profiling for an imported cluster, we should keep in mind that once the flag is set for the cluster, the next sync cycle of gluster-integration takes care of the underlying volumes. Personally, I feel that toggling enable/disable back to back just for the sake of it might not be a valid scenario. As long as gluster-integration sees the value as enabled or disabled during sync and handles the volumes accordingly, that is good enough.

@Tendrl/tendrl-core comments?
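
The sync behaviour described above can be pictured as a reconciliation step: read the cluster-level flag, then bring every volume in line with it. The sketch below only illustrates that idea and is not the actual tendrl-gluster-integration code. The function name and the shape of `volume_states` are assumptions. The "start" and "stop" actions correspond to the `gluster volume profile <VOLUME_NAME> start` and `stop` subcommands.

```python
def plan_profiling_actions(cluster_flag: str, volume_states: dict) -> list:
    """Return the (action, volume) pairs needed to match the cluster flag.

    cluster_flag  -- "yes" or "no", the cluster-level enable_volume_profiling value
    volume_states -- hypothetical mapping of volume name -> True if profiling runs
    """
    if cluster_flag not in ("yes", "no"):
        raise ValueError("unexpected flag value: %r" % (cluster_flag,))
    want_started = cluster_flag == "yes"
    action = "start" if want_started else "stop"
    # Only volumes whose real state differs from the flag need an action.
    return [
        (action, name)
        for name, started in sorted(volume_states.items())
        if started != want_started
    ]
```

With the flag set to "no" and `{"vol1": True, "vol2": False}` as input, the plan contains a single `("stop", "vol1")` entry, so repeated sync cycles with an unchanged flag do nothing further.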

@r0h4n
Contributor

r0h4n commented Sep 21, 2017

Volume profiling is handled by tendrl-gluster-integration and is not part of import cluster. Import cluster merely communicates the user's choice (which is enable or disable profiling).

I think we can close this issue based on verification of #427

@nnDarshan
Contributor

#427 seems to be merged. I guess @dahorak can verify it, and then this can be closed.
@shtripat Please confirm.

@shtripat
Member

@nnDarshan ack. @dahorak can verify the changes and close this.

@dahorak
Author

dahorak commented Sep 26, 2017

@shtripat what is the default sync cycle interval?
My understanding is that if I switch profiling on or off multiple times and wait between the steps longer than the sync cycle interval, it should be a valid scenario and it should reflect the last selected state, right?

@shtripat
Member

@dahorak yes. As far as I remember, the default sync interval is 180 secs.
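
Given a 180 second sync interval, a test that toggles profiling should poll for the expected state rather than "wait a moment". Below is a minimal sketch of such a helper. The name `wait_for_state`, the default timeout of two sync intervals and the 10 second poll period are assumptions chosen for illustration.

```python
import time

SYNC_INTERVAL_SECS = 180  # default sync interval quoted in this thread


def wait_for_state(check, expected, timeout=2 * SYNC_INTERVAL_SECS,
                   poll=10, sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns `expected` or `timeout` seconds pass.

    `sleep` and `clock` are injectable so the helper can be tested
    without really waiting.
    """
    deadline = clock() + timeout
    while True:
        if check() == expected:
            return True
        if clock() >= deadline:
            return False
        sleep(poll)
```

A caller would pass a zero-argument function that reports the real Gluster state of one volume, for example one that parses `gluster volume profile <VOLUME_NAME> info`, and treat a `False` result as a failed toggle.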

@mkudlej

mkudlej commented Oct 9, 2017

@dahorak Is this verified?

@dahorak
Author

dahorak commented Oct 9, 2017

@shtripat Why is this closed? I haven't had time to verify it thoroughly yet, but I'm afraid that it is still broken. I'll try to check it today and will update this issue.

@dahorak
Author

dahorak commented Oct 9, 2017

@shtripat First quick test: import a Gluster cluster into Tendrl with Volume Profiling disabled.

Once the cluster is imported, profiling should be disabled on all volumes, but it is enabled, so the Enable Volume profiling check-box on the Import Cluster page doesn't work.

Gluster volume profiling status before importing the cluster into Tendrl:

# gluster volume profile volume_usmqe_alpha_distrep_4x2 info
Profile on Volume volume_usmqe_alpha_distrep_4x2 is not started

Import process:
[screenshot: import_cluster-disabled_volume_profiling]

POST data sent by the browser when clicking the Import button (captured by Firefox Developer Tools):

{"Cluster.enable_volume_profiling":"no"}

Gluster volume profiling status after the cluster was imported into Tendrl:

# gluster volume profile volume_usmqe_alpha_distrep_4x2 info
Brick: gl1.example.com:/mnt/brick_usmqe_alpha_distrep_1/1
---------------------------------------------------------------------------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us             12  RELEASEDIR
 
    Duration: 1622 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 2 Stats:
 
    Duration: 9 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Brick: gl1.example.com:/mnt/brick_usmqe_alpha_distrep_2/2
---------------------------------------------------------------------------------------
Cumulative Stats:
 %-latency   Avg-latency   Min-Latency   Max-Latency   No. of calls         Fop
 ---------   -----------   -----------   -----------   ------------        ----
      0.00       0.00 us       0.00 us       0.00 us             12  RELEASEDIR
 
    Duration: 1622 seconds
   Data Read: 0 bytes
Data Written: 0 bytes
 
Interval 2 Stats:
 
    Duration: 9 seconds
   Data Read: 0 bytes
Data Written: 0 bytes

<<truncated>> 

And status in Tendrl:
[screenshot: imported_cluster-volume_profiling_enabled]

I'll also try to test additional scenarios and report potential issues here, so please reopen this issue and leave it open until all the issues are fixed and verified.

@shtripat
Member

shtripat commented Oct 9, 2017

@dahorak can you check the value of enable_volume_profiling set for the cluster in etcd? If that value is set to yes in etcd, gluster-integration would blindly enable profiling for all the volumes of the cluster.

If the cluster-level value is set to yes even though the flag was unset before import, then it's a separate issue with the import cluster flow.

@dahorak
Author

dahorak commented Oct 9, 2017

@shtripat you are right, the value in etcd is incorrect:

# etcdctl --endpoints http://IP:2379 get /clusters/cf700235-9225-4a04-8614-857339852268/enable_volume_profiling
yes

Do I understand correctly that this is an issue for the API?
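
The comparison made here (requested value in the POST body versus the value stored in etcd) can be automated. This is a sketch under stated assumptions. The key layout `/clusters/<cluster id>/enable_volume_profiling` and the `etcdctl --endpoints ... get` invocation are taken from the command above. It also assumes that `etcdctl` prints only the value, as in the output shown. The function names are hypothetical.

```python
import json
import subprocess

FLAG_FIELD = "Cluster.enable_volume_profiling"  # field name from the captured POST data


def requested_flag(post_body: str) -> str:
    """Extract the "yes"/"no" value the browser sent on import."""
    return json.loads(post_body)[FLAG_FIELD]


def etcd_flag_key(cluster_id: str) -> str:
    """Build the etcd key that holds the cluster-level profiling flag."""
    return "/clusters/%s/enable_volume_profiling" % cluster_id


def stored_flag(endpoint: str, cluster_id: str) -> str:
    """Read the flag with etcdctl; assumes the output is just the value."""
    result = subprocess.run(
        ["etcdctl", "--endpoints", endpoint, "get", etcd_flag_key(cluster_id)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def flag_is_consistent(post_body: str, stored: str) -> bool:
    """True if etcd holds the same value that the import request asked for."""
    return requested_flag(post_body) == stored
```

For the case reported in this comment, the request carried "no" while etcd returned "yes", so `flag_is_consistent` would return `False`, which points at the import path rather than at the sync cycle.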

@shtripat
Member

shtripat commented Oct 9, 2017

@dahorak it turned out to be a typo in the UI. @cloudbehl is sending a PR to fix it.

cloudbehl added a commit to Tendrl/ui that referenced this issue Oct 9, 2017
cloudbehl added a commit to Tendrl/ui that referenced this issue Oct 10, 2017
cloudbehl added a commit to Tendrl/ui that referenced this issue Oct 10, 2017
cloudbehl added a commit to Tendrl/ui that referenced this issue Oct 10, 2017