
Added flows for un-manage cluster #798

Merged: 3 commits into Tendrl:master on Feb 15, 2018

Conversation

@shtripat (Member) commented Jan 5, 2018

tendrl-bug-id: #797

Signed-off-by: Shubhendu shtripat@redhat.com

@shtripat force-pushed the unmanage-cluster branch 2 times, most recently from f5532b8 to 5312b2d on January 8, 2018, 04:53
from tendrl.commons.objects.node import Node


class ClearClusterDetails(objects.BaseAtom):
Contributor:

Let's call it Unmanage or Delete cluster.

Member Author:

Will change to DeleteClusterDetails

).load()

try:
    NS._int.client.delete(
Contributor:

Use the NS.tendrl.objects.Cluster

Member Author:

Done

from tendrl.commons.event import Event
from tendrl.commons import objects
from tendrl.commons.message import Message
from tendrl.commons.objects.cluster_tendrl_context \
Contributor:

Don't import objects; use NS.tendrl.objects.ClusterTendrlContext

Member Author:

Done

    recursive=True
)
NS._int.client.delete(
    "/indexes/tags/detected_cluster_id_to_integration_id/%s" %
Contributor:

Please create an object called NS.tendrl.objects.Index and use it instead of raw read/write

Member Author:

There are different kinds of data stored under /indexes/tags, and some are based on node tags as well, e.g. /indexes/tags/tendrl/monitor, /indexes/tags/tendrl/node, etc. How many such objects would we create here?
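
For reference, one shape such a wrapper could take; a minimal, hypothetical sketch (the Index class name, and treating each tag key as a JSON list of node ids, are assumptions — NS here is the tendrl process namespace):

import json


class Index(object):
    # Hypothetical wrapper for an /indexes/tags/<tag> key that stores a
    # JSON list of node ids, hiding raw etcd read/write from callers

    def __init__(self, tag):
        self.key = "/indexes/tags/%s" % tag

    def load(self):
        # Read and decode the list of node ids stored at the tag key
        return json.loads(NS._int.client.read(self.key).value)

    def save(self, node_ids):
        # Encode and write the updated list back to the same key
        NS._int.client.write(self.key, json.dumps(node_ids))

    def delete(self):
        # Remove the tag key entirely
        NS._int.client.delete(self.key, recursive=True)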

node_ids.append(node_id)
nc = Node(node_id=node_id).load()
node_ips[node_id] = nc.fqdn
NS._int.client.delete(
Contributor:

Why do you need to delete /nodes/%node_id?

I thought we were simply unmanaging the cluster here.

Member Author:

So do you mean anything updated by tendrl-node-agent ideally shouldn't be deleted as part of un-manage? If so, later as part of forget-cluster we would need to clear all these details. Comments?

@@ -111,6 +127,36 @@ namespace.tendrl:
list: /clusters
value: clusters/$TendrlContext.integration_id
atoms:
StopNodeServices:
Contributor:

break this down into StopMonitoringServices and StopIntegrationServices

Member Author:

Done

- "tendrl/monitor"
atoms:
- tendrl.objects.Cluster.atoms.StopNodeServices
- tendrl.objects.Cluster.atoms.ClearClusterDetails
Contributor:

You can call this DeleteCluster

Member Author:

DeleteClusterDetails

atoms:
- tendrl.objects.Cluster.atoms.StopNodeServices
- tendrl.objects.Cluster.atoms.ClearClusterDetails
- tendrl.objects.Cluster.atoms.ClearMonitoringData
Contributor:

DeleteMonitoring

Member Author:

DeleteMonitoringDetails

type: Update
uuid: 333c3333-3c33-33c3-333c-c33cc3c5555c
help: Clear cluster details
ClearMonitoringData:
Contributor:

DeleteMonitoring

Member Author:

DeleteMonitoringDetails

type: Update
uuid: 333c3333-3c33-33c3-333c-c33cc3c4444c
help: Stop node services
ClearClusterDetails:
Contributor:

DeleteCluster

Member Author:

DeleteClusterDetails

@shtripat force-pushed the unmanage-cluster branch 2 times, most recently from 39030e2 to bb7ba56 on January 11, 2018, 07:36
@centos-ci (Collaborator):

Can one of the admins verify this patch?

@shtripat (Member Author) commented Feb 7, 2018

Tested the latest change along with Tendrl/monitoring-integration#317 and it works as expected. The cluster can be un-managed and managed back, and across un-manage and manage-back the integration-id for the cluster remains the same. Also, towards the end of un-manage there is a notification about the cluster state change to unmanaged and the location of the archived graphite data for the cluster.

@r0h4n I still need to work on unit tests for these changes. Will work out a separate PR for the same.

@centos-ci (Collaborator):

Can one of the admins verify this patch?

tendrl-bug-id: Tendrl#797
Signed-off-by: Shubhendu <shtripat@redhat.com>
@shtripat changed the title from "WIP - Added flows for un-manage cluster" to "Added flows for un-manage cluster" on Feb 9, 2018
@shtripat force-pushed the unmanage-cluster branch 2 times, most recently from 1af76c2 to e882668 on February 10, 2018, 14:15
tendrl-bug-id: Tendrl#797
Signed-off-by: Shubhendu <shtripat@redhat.com>
_cluster = NS.tendrl.objects.Cluster(
    integration_id=integration_id
).load()
if _cluster.status is not None and \
Contributor:

Check if cluster.is_managed == "yes"

Member Author:

OK, will add a check on cluster.is_managed as well.
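
A minimal sketch of the combined guard (assuming is_managed is a "yes"/"no" string as used elsewhere in Tendrl, and FlowExecutionFailedError from tendrl.commons.flows.exceptions):

from tendrl.commons.flows.exceptions import FlowExecutionFailedError

_cluster = NS.tendrl.objects.Cluster(
    integration_id=integration_id
).load()
# Refuse to unmanage a cluster that is not managed or already busy
if _cluster.is_managed != "yes" or \
        _cluster.status in ["importing", "syncing", "unmanaging"]:
    raise FlowExecutionFailedError(
        "Cluster (%s) is not managed or has another job running" %
        integration_id
    )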

).load()
if _cluster.status is not None and \
        _cluster.status != "" and \
        _cluster.status in \
Contributor:

I would suggest acquiring lock on "cluster" object

Member Author:

OK, will try that.
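
One way to do that with python-etcd's distributed lock; a sketch, assuming NS._int.client is the python-etcd client used elsewhere in this flow and "cluster_%s" is an illustrative lock name:

import etcd

lock = etcd.Lock(NS._int.client, "cluster_%s" % integration_id)
lock.acquire(blocking=True, lock_ttl=60)  # TTL so a dead holder expires
try:
    _cluster = NS.tendrl.objects.Cluster(
        integration_id=integration_id
    ).load()
    # ... check status and update the cluster object safely here ...
finally:
    lock.release()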

# the cluster nodes
try:
    gl_srvr_list = NS._int.client.read(
        "/indexes/tags/gluster/server"
Contributor:

This list includes all gluster nodes from all clusters. We need to change this tag to /indexes/tags/:integration_id/gluster/server

Member Author:

Actually I am not removing the tag here. I am just removing the nodes of the cluster being unmanaged from the list and writing the tag back with the remaining node ids.

Contributor:

I think we still need to improve this. read("/indexes/tags/gluster/server") gives you every gluster node irrespective of cluster id

Member Author:

If we remove the nodes of the cluster currently being unmanaged, it shouldn't matter for other clusters. Where do you see an issue with this? Also, what do you suggest if not this way?

Contributor:

I would suggest we let tendrl-gluster-integration clean up its own entry in /indexes/tags/gluster/server.

So when you shut down the tendrl-gluster-integration service during unmanage, you need to add this cleanup of tags.

The reason I don't want to load /indexes/tags/gluster/server is that it adds the overhead of going through each and every gluster node across all clusters.

Member Author:

So you mean before stopping tendrl-gluster-integration, we should invoke a flow to clean the tags. So in gluster-integration as well we would need to load the list /indexes/tags/gluster/server, remove its node-id from the list, and write it back.

So effectively each node loads this tag from etcd, removes its node-id from the list, and writes it back. In the existing case we load once on the server, all the cluster's node entries are removed from the list, and then it is written back. That is just one read and one write, whereas in the other (suggested) case it would be no. of nodes * (1 read + 1 write).

gl_srvr_list = NS._int.client.read(
    "/indexes/tags/gluster/server"
).value
gl_srvr_list = ast.literal_eval(gl_srvr_list)
Contributor:

Please don't use eval; if this is a JSON array, use the json library.

Member Author:

done
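
The change amounts to swapping the parser; a sketch, assuming the value stored at the key is a JSON-encoded list:

import json

gl_srvr_list = NS._int.client.read(
    "/indexes/tags/gluster/server"
).value
# json.loads accepts only valid JSON; ast.literal_eval would also
# evaluate arbitrary Python literals
gl_srvr_list = json.loads(gl_srvr_list)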

# Create jobs on nodes for stopping services
_job_id = str(uuid.uuid4())
params = {
    "Services[]": ["collectd"]
Contributor:

This needs to be documented, stating that Tendrl will have full control over the "collectd" service on the storage nodes, which includes shutting it down.

@julienlim @mbukatov @nthomas-redhat what's your take on this?

Contributor:

👍
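
For context, how such a stop-services job might be queued per node; a sketch assuming the NS.tendrl.objects.Job shape used in tendrl-commons (the exact payload fields here are illustrative, and node_ids is assumed to be collected earlier in the flow):

import uuid

# Queue one stop-services job per cluster node
for node_id in node_ids:
    _job_id = str(uuid.uuid4())
    payload = {
        "tags": ["tendrl/node_%s" % node_id],  # route to that node-agent
        "run": "tendrl.objects.Node.flows.StopServices",
        "status": "new",
        "parameters": {"Services[]": ["collectd"]},
        "type": "node"
    }
    NS.tendrl.objects.Job(
        job_id=_job_id,
        status="new",
        payload=payload
    ).save()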

time.sleep(5)
continue
_cluster.status = ""
_cluster.current_job['status'] = "done"
Contributor:

We have been using "finished" all over Tendrl; let's stick to that instead of "done".

Member Author:

done

@@ -157,7 +157,8 @@ def run(self):
_cluster = NS.tendrl.objects.Cluster(
integration_id=NS.tendrl_context.integration_id
).load()
_cluster.import_status = "done"
_cluster.status = ""
_cluster.current_job['status'] = "done"
Contributor:

We have been using "finished" all over Tendrl; let's stick to that instead of "done".

Member Author:

done

@codecov-io commented Feb 13, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@160b4d4).
The diff coverage is 88.21%.


@@            Coverage Diff            @@
##             master     #798   +/-   ##
=========================================
  Coverage          ?   77.47%           
=========================================
  Files             ?       91           
  Lines             ?     3481           
  Branches          ?      443           
=========================================
  Hits              ?     2697           
  Misses            ?      718           
  Partials          ?       66
Impacted Files Coverage Δ
tendrl/commons/utils/event_utils.py 0% <ø> (ø)
...cts/cluster/atoms/configure_monitoring/__init__.py 25% <0%> (ø)
...jects/cluster/atoms/is_cluster_managed/__init__.py 100% <100%> (ø)
...ts/cluster/atoms/set_cluster_unmanaged/__init__.py 100% <100%> (ø)
tendrl/commons/objects/cluster/__init__.py 100% <100%> (ø)
...luster/atoms/delete_monitoring_details/__init__.py 72.41% <72.41%> (ø)
tendrl/commons/flows/import_cluster/__init__.py 84.9% <75%> (ø)
...s/cluster/atoms/delete_cluster_details/__init__.py 84.09% <84.09%> (ø)
tendrl/commons/flows/unmanage_cluster/__init__.py 85% <85%> (ø)
...mmons/objects/node/flows/stop_services/__init__.py 86.95% <86.95%> (ø)
... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 160b4d4...b168461.

@shtripat force-pushed the unmanage-cluster branch 3 times, most recently from 95e592c to 26304b6 on February 14, 2018, 04:41
@@ -25,8 +25,7 @@ def emit_event(resource, curr_value, msg, instance,
     plugin_instance=instance,
     message=msg,
     integration_id=integration_id or NS.tendrl_context.integration_id,
-    cluster_name=cluster_name or NS.tendrl_context.cluster_name,
-    sds_name=sds_name or NS.tendrl_context.sds_name,
+    cluster_name=cluster_name or NS.tendrl_context.cluster_name
Contributor:

Why have you removed sds_name?

Let's keep it with a default of None.

Member Author:

As confirmed by @GowthamShanmugam, it was not required; sds_name is internally derived using the integration_id.

    cluster_tendrl_context.cluster_id
)
etcd_keys_to_delete.append(
    "/indexes/tags/provisioner/%s" % integration_id
Contributor:

Let's clean this tag in the integration service.

Member Author:

Done

"/indexes/tags/provisioner/%s" % integration_id
)
etcd_keys_to_delete.append(
"/indexes/tags/tendrl/integration/%s" %
Contributor:

Let's clean this tag in the integration service.

Member Author:

done

@nthomas-redhat (Contributor) left a comment:

Please fix the pep8 issues and the most obvious Codacy issues.

env:
- TOXENV=pep8
- TOXENV=vulture
Contributor:

The vulture env is not set up. Could you please add https://github.com/Tendrl/node-agent/blob/master/tox.ini#L38 to tox.ini?

Member Author:

Done

in ["in_progress", "done", "failed"]:
if (_cluster.status is not None and
        _cluster.status != "" and
        _cluster.status in ["syncing", "importing", "unmanaging"]):
Contributor:

What is the status of a cluster which is already imported? I could see the definition as:

status:
  help: status of the cluster (importing, syncing, unmanaging, unknown)
  type: String

Do we need a state to indicate that no sync and no import is in progress? Also, what will be the value for a freshly detected cluster? And what will happen if someone triggers an import on an already imported cluster (maybe using the API)?

Member Author:

The status field maintains the status of any job running on the cluster. If import is successful, it is set to syncing by GI (gluster-integration). The state of the cluster is maintained under the global details object as healthy/unhealthy. For a freshly detected cluster the status would be empty, and if a user submits an import through the app for a cluster which already has an import running, the back-end would reject the job.

Contributor:

ok

_cluster = NS.tendrl.objects.Cluster(
    integration_id=integration_id
).load()
_cluster.status = ""
Contributor:

Isn't it better to define and set a proper state here?

Member Author:

The state of the cluster is already maintained in the global details object, and the UI uses the same.

Contributor:

ok

# Create jobs on nodes for stopping services
_job_id = str(uuid.uuid4())
params = {
    "Services[]": ["collectd"]
Contributor:

👍

run: tendrl.flows.UnmanageCluster
type: Update
uuid: 2f94a48a-05d7-408c-b400-e27827f4efed
version: 1
Contributor:

Do we need a post_run to make sure the cluster is un-managed and the services are stopped?

Member Author:

Maybe a post task to make sure the cluster is detected back again. Will check this part.

Contributor:

Please create an issue to track

Member Author:

We do sds_sync for a cluster only on node-agent restart, but during un-manage we are not (re)starting node-agent on the storage nodes, so removal of the tag indexes/tags/tendrl/integration/{int-id} should not be done. Also, we cannot depend on this to figure out that an un-managed cluster is now import-ready.

@r0h4n any other suggestions?

Member Author:

I have added a post task for the same; posted below.
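
For illustration, the general shape of such a post task (not the actual code from this PR, which is not shown here); a hypothetical atom that polls until the unmanaged cluster is detected again:

import time

from tendrl.commons import objects


class IsClusterImportReady(objects.BaseAtom):
    # Hypothetical post-run atom: wait until the unmanaged cluster is
    # detected again and hence ready for a fresh import
    def run(self):
        for _ in range(18):  # poll for up to ~3 minutes
            _cluster = NS.tendrl.objects.Cluster(
                integration_id=self.parameters[
                    "TendrlContext.integration_id"
                ]
            ).load()
            if _cluster.is_managed == "no":
                return True
            time.sleep(10)
        return False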

@shtripat force-pushed the unmanage-cluster branch 2 times, most recently from abc300f to e43c568 on February 14, 2018, 13:04
nthomas-redhat previously approved these changes on Feb 14, 2018
tendrl-bug-id: Tendrl#797
Signed-off-by: Shubhendu <shtripat@redhat.com>
@julienlim (Member):

Per discussion with @a2batic, there are a couple of UI changes related to Unmanage cluster. We will be posting updates to the design soon.

@r0h4n merged commit ee5d5fb into Tendrl:master on Feb 15, 2018
@r0h4n added this to the Archive milestone on Mar 22, 2018