add refresh for {node,job}-exporter, prometheus and watchdog #1865
@@ -20,9 +20,7 @@

pushd $(dirname "$0") > /dev/null

echo "refresh prometheus configuration"
kubectl apply -f prometheus-configmap.yaml || exit $?
I'm wondering why it doesn't work. Do you have any idea?
This only works if someone changed parameters that are allowed to change in the ConfigMap description. It does not restart the Prometheus service, which a user running refresh might expect it to do.
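To make the point above concrete, here is a minimal sketch of a refresh that applies the ConfigMap and then restarts the workload. It is not necessarily what this PR does: the workload name `deployment/prometheus` and the helper `build_refresh_commands` are hypothetical, and `kubectl rollout restart` needs kubectl 1.15 or later.

```python
from typing import List


def build_refresh_commands(configmap_file: str, workload: str) -> List[List[str]]:
    """Return the kubectl invocations for a refresh that also restarts pods.

    `workload` is a kubectl resource reference such as "deployment/prometheus"
    (a hypothetical name; use whatever the cluster actually runs).
    """
    return [
        # Step 1: update the ConfigMap object. On its own this does not
        # restart any running pod.
        ["kubectl", "apply", "-f", configmap_file],
        # Step 2: roll the pods so they start with the new configuration.
        ["kubectl", "rollout", "restart", workload],
    ]


# Print the commands instead of running them, so the sketch has no side effects.
for argv in build_refresh_commands("prometheus-configmap.yaml", "deployment/prometheus"):
    print(" ".join(argv))
```

The commands are returned as argument lists so a caller could pass each one to `subprocess.run(argv, check=True)` and stop at the first failure, mirroring the `|| exit $?` in the script.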
@@ -67,7 +67,7 @@ def get_service_list(self):

     def refresh_all_label(self):
         self.logger.info("Begin to refresh all the nodes' labels")
-        machinelist = self.cluster_object_model['machinelist']
+        machinelist = self.cluster_object_model['machine']['machine-list']
We are going to decouple the static binding of the machine list. Could you query the machine list from the k8s API? Or via kube get pods?
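For reference, a rough sketch of what building the machine list from the k8s API could look like. It assumes the input is the parsed output of `kubectl get nodes -o json`; the function name and the `hostname`/`hostip` keys are illustrative assumptions, not the format paictl.py actually expects.

```python
from typing import Any, Dict, List


def machine_list_from_nodes(node_list: Dict[str, Any]) -> List[Dict[str, str]]:
    """Build a machine list from a parsed Kubernetes NodeList object."""
    machines = []
    for item in node_list.get("items", []):
        addresses = item.get("status", {}).get("addresses", [])
        # Take the first InternalIP address; empty string if the node has none.
        internal_ip = next(
            (a["address"] for a in addresses if a.get("type") == "InternalIP"), ""
        )
        machines.append({"hostname": item["metadata"]["name"], "hostip": internal_ip})
    return machines


# Hand-written sample with the same shape as `kubectl get nodes -o json`.
sample = {
    "items": [
        {
            "metadata": {"name": "node-a"},
            "status": {"addresses": [{"type": "InternalIP", "address": "10.0.0.1"}]},
        },
        {"metadata": {"name": "node-b"}, "status": {"addresses": []}},
    ]
}
print(machine_list_from_nodes(sample))
```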
You should ask @ydye about this. Anyway, this PR focuses on refreshing these services, not on how paictl.py implements it.
@hao1939, maybe machinelist should be kept for paiservice. It is just that, for paiservice, machinelist should be retrieved from the k8s API instead of clusterconfig. Of course, before deploying paiservice, you will need to validate that the k8s cluster state is healthy.
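One possible reading of "validate that the cluster state is healthy" is to check that every node reports a `Ready` condition with status `"True"`. The sketch below does only that, on the parsed output of `kubectl get nodes -o json`; the function name is hypothetical and a real check would probably cover more than node readiness.

```python
from typing import Any, Dict, List


def not_ready_nodes(node_list: Dict[str, Any]) -> List[str]:
    """Return the names of nodes whose Ready condition is not "True"."""
    bad = []
    for item in node_list.get("items", []):
        conditions = item.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
        )
        if not ready:
            bad.append(item["metadata"]["name"])
    return bad


# Hand-written sample: one healthy node, one node that is not Ready.
sample_nodes = {
    "items": [
        {
            "metadata": {"name": "node-a"},
            "status": {"conditions": [{"type": "Ready", "status": "True"}]},
        },
        {
            "metadata": {"name": "node-b"},
            "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]},
        },
    ]
}
print(not_ready_nodes(sample_nodes))
```

A deploy step could refuse to continue while this list is non-empty.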
We'd better remove the static machinelist, since we are working on a scalable system.
We should work on the assumption that the system keeps changing.
My current assumption is that:
- k8s manages the nodes and labels them.
- PAI services bind to labels; for scalability, they should be decoupled from specific nodes when possible.
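The two assumptions above could be sketched roughly as follows: a service names the labels it needs, and the matching nodes are resolved at refresh time instead of being read from a static list. The label key `pai-worker` and the function name are made up for illustration; the input is assumed to be the parsed output of `kubectl get nodes -o json`.

```python
from typing import Any, Dict, List


def nodes_matching_labels(node_list: Dict[str, Any], selector: Dict[str, str]) -> List[str]:
    """Return names of nodes that carry every key/value pair in `selector`."""
    matched = []
    for item in node_list.get("items", []):
        labels = item.get("metadata", {}).get("labels", {})
        if all(labels.get(key) == value for key, value in selector.items()):
            matched.append(item["metadata"]["name"])
    return matched


# Hand-written sample: only node-a carries the (hypothetical) worker label.
labeled_nodes = {
    "items": [
        {"metadata": {"name": "node-a", "labels": {"pai-worker": "true"}}},
        {"metadata": {"name": "node-b", "labels": {"pai-worker": "false"}}},
        {"metadata": {"name": "node-c"}},
    ]
}
print(nodes_matching_labels(labeled_nodes, {"pai-worker": "true"}))
```

The same filtering can be done server-side with `kubectl get nodes -l key=value`; the client-side version is shown only because it is easy to test without a cluster.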
@hao1939, that makes sense. Please present your design once it is ready.
OK, please approve this PR if appropriate. Changing to a dynamic machine list can be addressed in another PR.
ok.
Fixes #1748. I also fixed a bug when calling paictl.py service refresh. I'm not sure whether this is correct; @ydye, please take a look.