Added flows for un-manage cluster #798
Conversation
Force-pushed from f5532b8 to 5312b2d
from tendrl.commons.objects.node import Node
...
class ClearClusterDetails(objects.BaseAtom):
Let's call it Unmanage or Delete cluster
Will change to DeleteClusterDetails
).load()
...
try:
    NS._int.client.delete(
Use the NS.tendrl.objects.Cluster
Done
from tendrl.commons.event import Event
from tendrl.commons import objects
from tendrl.commons.message import Message
from tendrl.commons.objects.cluster_tendrl_context \
Don't import objects; use NS.tendrl.objects.ClusterTendrlContext
Done
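For context, a minimal sketch of the suggested change, assuming ClusterTendrlContext is registered under NS.tendrl.objects and is keyed by integration_id (keyword taken from this thread, not verified here):

    # Instead of importing the class directly:
    #   from tendrl.commons.objects.cluster_tendrl_context import ClusterTendrlContext
    # use the namespace-registered object as suggested in review:
    cluster_tendrl_context = NS.tendrl.objects.ClusterTendrlContext(
        integration_id=integration_id
    ).load()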
    recursive=True
)
NS._int.client.delete(
    "/indexes/tags/detected_cluster_id_to_integration_id/%s" %
Please create an object called NS.tendrl.objects.Index and use it instead of raw read/write
There are different data stored under /indexes/tags, and some of them are based on node tags as well, like /indexes/tags/tendrl/monitor, /indexes/tags/tendrl/node, etc. How many such objects would we need to create here?
node_ids.append(node_id)
nc = Node(node_id=node_id).load()
node_ips[node_id] = nc.fqdn
NS._int.client.delete(
Why do you need to delete /nodes/%node_id?
I thought we are simply unmanaging the cluster here.
So do you mean anything updated by tendrl-node-agent ideally shouldn't be deleted as part of un-manage? If so, later as part of forget cluster, we would need to clear all these details. Comments?
@@ -111,6 +127,36 @@ namespace.tendrl:
list: /clusters
value: clusters/$TendrlContext.integration_id
atoms:
  StopNodeServices:
break this down into StopMonitoringServices and StopIntegrationServices
Done
- "tendrl/monitor"
atoms:
  - tendrl.objects.Cluster.atoms.StopNodeServices
  - tendrl.objects.Cluster.atoms.ClearClusterDetails
You can call this DeleteCluster
DeleteClusterDetails
atoms:
  - tendrl.objects.Cluster.atoms.StopNodeServices
  - tendrl.objects.Cluster.atoms.ClearClusterDetails
  - tendrl.objects.Cluster.atoms.ClearMonitoringData
DeleteMonitoring
DeleteMonitoringDetails
type: Update
uuid: 333c3333-3c33-33c3-333c-c33cc3c5555c
help: Clear cluster details
ClearMonitoringData:
DeleteMonitoring
DeleteMonitoringDetails
type: Update
uuid: 333c3333-3c33-33c3-333c-c33cc3c4444c
help: Stop node services
ClearClusterDetails:
DeleteCluster
DeleteClusterDetails
Force-pushed from 39030e2 to bb7ba56
Can one of the admins verify this patch?
Force-pushed from bb7ba56 to c93b2f2
Tested the latest change along with Tendrl/monitoring-integration#317 and it works as expected. The cluster can be un-managed and managed back. Across un-manage and manage back, the integration-id for the cluster remains the same. Also, while un-managing, towards the end there is a notification about the cluster state change. @r0h4n, still need to work on unit tests for these changes. Will work out a separate PR for the same.
Force-pushed from c93b2f2 to 73fcaba
Can one of the admins verify this patch?
tendrl-bug-id: Tendrl#797 Signed-off-by: Shubhendu <shtripat@redhat.com>
Force-pushed from 73fcaba to 08f91fb
Force-pushed from 1af76c2 to e882668
tendrl-bug-id: Tendrl#797 Signed-off-by: Shubhendu <shtripat@redhat.com>
Force-pushed from e882668 to c3bc4a9
_cluster = NS.tendrl.objects.Cluster(
    integration_id=integration_id
).load()
if _cluster.status is not None and \
Check if cluster.is_managed == "yes"
Ok will add cluster.is_managed as well
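A small sketch of the combined guard being discussed, assuming is_managed is stored as the string "yes"/"no":

    _cluster = NS.tendrl.objects.Cluster(
        integration_id=integration_id
    ).load()
    # Proceed with unmanage only if the cluster is currently managed and
    # no other job (import/sync/unmanage) is already running on it.
    busy = _cluster.status in ["importing", "syncing", "unmanaging"]
    if _cluster.is_managed == "yes" and not busy:
        pass  # safe to start the unmanage flow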
).load()
if _cluster.status is not None and \
        _cluster.status != "" and \
        _cluster.status in \
I would suggest acquiring a lock on the "cluster" object.
ok. will try that
# the cluster nodes
try:
    gl_srvr_list = NS._int.client.read(
        "/indexes/tags/gluster/server"
this list includes all gluster nodes from all clusters. We need to change this tag to /indexes/tags/:integration_id/gluster/server
So I am not actually removing the tag here. I am just removing the nodes of the cluster being unmanaged from the list and writing the tag back with the remaining node ids.
I think we still need to improve this. read("/indexes/tags/gluster/server") gives you every gluster node irrespective of cluster id
If we remove the nodes of the cluster currently being unmanaged, it shouldn't matter for other clusters. How do you see an issue with this? Also, what would you suggest if not this way?
I would suggest we let tendrl-gluster-integration clean up its own entry in /indexes/tags/gluster/server.
So when you shut down the tendrl-gluster-integration service during unmanage, you need to add this clean-up of tags.
The reason I don't want to load /indexes/tags/gluster/server is that it adds the overhead of going through each and every gluster node across all clusters.
So you mean before stopping tendrl-gluster-integration, we should invoke a flow to clean the tags? Then in gluster-integration as well we would need to load the list /indexes/tags/gluster/server, remove its node-id from the list, and write it back.
So effectively each node loads this tag from etcd, removes its node-id from the list, and writes it back. In the existing approach we load the list once on the server, remove all the cluster's node entries, and write it back. That is just one read and one write, whereas the suggested approach would be no. of nodes * (1 read + 1 write).
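To make the trade-off concrete, a sketch of the single server-side read-modify-write described above, assuming the tag value is a JSON-encoded list of node ids and that NS._int.client.write mirrors the read call shown in the diff:

    import json

    # One read: the full list of gluster server node ids across all clusters.
    gl_srvr_list = json.loads(
        NS._int.client.read("/indexes/tags/gluster/server").value
    )
    # Drop only the nodes that belong to the cluster being unmanaged.
    gl_srvr_list = [nid for nid in gl_srvr_list if nid not in node_ids]
    # One write: persist the pruned list back under the same tag.
    NS._int.client.write(
        "/indexes/tags/gluster/server", json.dumps(gl_srvr_list)
    )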
gl_srvr_list = NS._int.client.read(
    "/indexes/tags/gluster/server"
).value
gl_srvr_list = ast.literal_eval(gl_srvr_list)
Please don't use eval; if this is a JSON array, use the json library.
done
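A minimal before/after sketch of that change, assuming the stored value is a valid JSON array:

    import json

    raw = NS._int.client.read("/indexes/tags/gluster/server").value
    # Before: the string was evaluated as a Python literal.
    # gl_srvr_list = ast.literal_eval(raw)
    # After: parse it as JSON instead, as requested in review.
    gl_srvr_list = json.loads(raw)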
# Create jobs on nodes for stopping services
_job_id = str(uuid.uuid4())
params = {
    "Services[]": ["collectd"]
This needs to be documented, stating that Tendrl will have full control over the "collectd" service on the storage nodes, which includes shutting it down.
@julienlim @mbukatov @nthomas-redhat what's your take on this?
👍
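For illustration, a sketch of the per-node job creation pattern; the flow name and payload fields below are assumptions, not confirmed by this PR:

    import uuid

    _job_id = str(uuid.uuid4())
    payload = {
        "tags": ["tendrl/node_%s" % node_id],        # target one storage node
        "run": "node.flows.StopServices",            # assumed flow name
        "status": "new",
        "parameters": {"Services[]": ["collectd"]},  # services Tendrl shuts down
        "type": "node",
    }
    NS.tendrl.objects.Job(
        job_id=_job_id, status="new", payload=payload
    ).save()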
time.sleep(5)
continue
_cluster.status = ""
_cluster.current_job['status'] = "done"
We have been using "finished" all over Tendrl; let's stick to that instead of "done".
done
@@ -157,7 +157,8 @@ def run(self):
_cluster = NS.tendrl.objects.Cluster(
    integration_id=NS.tendrl_context.integration_id
).load()
_cluster.import_status = "done"
_cluster.status = ""
_cluster.current_job['status'] = "done"
We have been using "finished" all over Tendrl; let's stick to that instead of "done".
done
Codecov Report

@@            Coverage Diff            @@
##           master      #798   +/-   ##
=========================================
  Coverage          ?   77.47%
=========================================
  Files             ?       91
  Lines             ?     3481
  Branches          ?      443
=========================================
  Hits              ?     2697
  Misses            ?      718
  Partials          ?       66

Continue to review full report at Codecov.
Force-pushed from 95e592c to 26304b6
@@ -25,8 +25,7 @@ def emit_event(resource, curr_value, msg, instance,
    plugin_instance=instance,
    message=msg,
    integration_id=integration_id or NS.tendrl_context.integration_id,
    cluster_name=cluster_name or NS.tendrl_context.cluster_name,
    sds_name=sds_name or NS.tendrl_context.sds_name,
    cluster_name=cluster_name or NS.tendrl_context.cluster_name
Why have you removed sds_name?
Let's keep it with a default of None.
As confirmed by @GowthamShanmugam it was not required, and sds_name is internally derived from the integration_id.
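For reference, a rough sketch of the trimmed signature after the change; parameters beyond those visible in the diff are placeholders, not the actual definition:

    def emit_event(resource, curr_value, msg, instance,
                   integration_id=None, cluster_name=None, **kwargs):
        # sds_name is no longer accepted here; it is resolved internally
        # from integration_id, per the confirmation above.
        pass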
    cluster_tendrl_context.cluster_id
)
etcd_keys_to_delete.append(
    "/indexes/tags/provisioner/%s" % integration_id
Let's clean this tag in the integration service.
Done
    "/indexes/tags/provisioner/%s" % integration_id
)
etcd_keys_to_delete.append(
    "/indexes/tags/tendrl/integration/%s" %
Let's clean this tag in the integration service as well.
done
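A sketch of what cleaning this tag in the integration service could look like on shutdown, assuming the same etcd client wrapper is available there:

    # Run by the integration service as it stops during unmanage, so the
    # server-side flow no longer needs to delete this key itself.
    try:
        NS._int.client.delete(
            "/indexes/tags/provisioner/%s" % NS.tendrl_context.integration_id
        )
    except Exception:
        # The key may already be absent; ignore during teardown.
        pass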
Please fix the pep8 issues and most obvious codacy issues
env:
  - TOXENV=pep8
  - TOXENV=vulture
The vulture env is not set up. Could you please add https://github.com/Tendrl/node-agent/blob/master/tox.ini#L38 to tox.ini?
Done
        in ["in_progress", "done", "failed"]:
if (_cluster.status is not None and
        _cluster.status != "" and
        _cluster.status in ["syncing", "importing", "unmanaging"]):
What is the status of a cluster which is already imported?
I could see the defs as:
status:
  help: status of the cluster (importing, syncing, unmanaging, unknown)
  type: String
Do we need a state to indicate that no sync and no import is in progress?
Also, what will be the value for a freshly detected cluster?
What will happen if someone triggers an import on an already imported cluster (maybe using the API)?
The status field maintains the status of any job running on the cluster. If import is successful it is set to syncing by GI. The state of the cluster itself is maintained under the global details as healthy/unhealthy. For a freshly detected cluster the status would be empty, and if a user submits an import through the app for a cluster which already has an import running, the back-end would reject the job.
ok
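A sketch of the rejection described above; the exception name is assumed for illustration:

    _cluster = NS.tendrl.objects.Cluster(
        integration_id=integration_id
    ).load()
    # A non-empty status means some job (import/sync/unmanage) is already
    # running on this cluster, so a new import request is rejected.
    if _cluster.status in ["importing", "syncing", "unmanaging"]:
        raise FlowExecutionFailedError(
            "Another job is in progress for cluster %s" % integration_id
        )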
_cluster = NS.tendrl.objects.Cluster(
    integration_id=integration_id
).load()
_cluster.status = ""
Isn't it better to define and set a proper state here?
The state of the cluster is already maintained in the global details object, and the UI uses the same.
ok
# Create jobs on nodes for stopping services
_job_id = str(uuid.uuid4())
params = {
    "Services[]": ["collectd"]
👍
run: tendrl.flows.UnmanageCluster
type: Update
uuid: 2f94a48a-05d7-408c-b400-e27827f4efed
version: 1
Do we need a post_run to make sure the cluster is un_managed and the services are stopped?
Maybe a post task to make sure the cluster is detected back again. Will check this part.
Please create an issue to track
We run sds_sync for the cluster only when node-agent restarts, but in un-manage we are not (re)starting node-agent on the storage nodes, so removal of the tag indexes/tags/tendrl/integration/{int-id} should not be done. Also, we cannot depend on this to figure out that an un-managed cluster is import-ready now.
@r0h4n any other suggestions?
I have added a post task for the same and posted below
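The post task mentioned here could look roughly like the following; the DetectedCluster lookup and retry loop are assumptions for illustration, not the actual atom added in this PR:

    import time

    class IsClusterImportReady(objects.BaseAtom):
        def run(self):
            # After unmanage, wait for node-agents to re-detect the cluster
            # so that it can be imported again.
            for _ in range(18):  # poll every 10s for ~3 minutes
                detected = NS.tendrl.objects.DetectedCluster().load()
                if detected and detected.detected_cluster_id:
                    return True
                time.sleep(10)
            return False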
Force-pushed from abc300f to e43c568
tendrl-bug-id: Tendrl#797 Signed-off-by: Shubhendu <shtripat@redhat.com>
Force-pushed from e43c568 to b168461
Per discussion with @a2batic, there are a couple of UI changes related to Unmanage cluster. We will be posting updates to the design soon.
tendrl-bug-id: #797
Signed-off-by: Shubhendu shtripat@redhat.com