add refresh for {node,job}-exporter, prometheus and watchdog #1865

xudifsd · 2018-12-11T05:35:35Z

Fixes #1748. I also fixed a bug that occurs when calling paictl.py service refresh. I'm not sure the fix is correct. @ydye, please take a look.

coveralls · 2018-12-11T05:42:10Z

Coverage increased (+18.4%) to 70.146% when pulling 1d98665 on dixu/add-refresh into 2c08713 on master.

hao1939 · 2018-12-11T06:47:06Z

src/prometheus/deploy/refresh.sh

@@ -20,9 +20,7 @@

 pushd $(dirname "$0") > /dev/null

-
-echo "refresh prometheus configuration"
-kubectl apply -f prometheus-configmap.yaml || exit $?


I'm wondering why it doesn't work. Do you have any idea?

This only works if someone changed parameters that are allowed to change in the description of the configmap. It will not restart the prometheus service, which a user running refresh might expect.
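To make the point above concrete, here is a minimal sketch of a refresh that applies the configmap and then also restarts the pods. It is not what this PR implements. The function names and the `app=prometheus` label selector are hypothetical, and the sketch assumes the pods are managed by a controller (for example a DaemonSet or Deployment) that recreates them after deletion.

```python
import subprocess
from typing import Callable, List


def build_refresh_commands(configmap_file: str, pod_selector: str) -> List[List[str]]:
    """Return the kubectl invocations for a refresh that also restarts pods.

    Both arguments are illustrative; the selector is an assumption and is
    not taken from the PAI deployment files.
    """
    return [
        # Step 1: update the configmap object itself.
        ["kubectl", "apply", "-f", configmap_file],
        # Step 2: delete the matching pods. This only results in a restart
        # if a controller owns the pods and recreates them.
        ["kubectl", "delete", "pod", "-l", pod_selector],
    ]


def refresh(configmap_file: str, pod_selector: str,
            run: Callable[[List[str]], object] = subprocess.check_call) -> None:
    """Run each command in order; `run` is injectable so this can be dry-run."""
    for cmd in build_refresh_commands(configmap_file, pod_selector):
        run(cmd)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    refresh("prometheus-configmap.yaml", "app=prometheus", run=print)
```

Passing `run=print` gives a dry run, which is handy for checking the command sequence without touching a cluster.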

hao1939 · 2018-12-11T06:52:37Z

deployment/paiLibrary/paiService/service_management_refresh.py

@@ -67,7 +67,7 @@ def get_service_list(self):

    def refresh_all_label(self):
        self.logger.info("Begin to refresh all the nodes' labels")
-        machinelist = self.cluster_object_model['machinelist']
+        machinelist = self.cluster_object_model['machine']['machine-list']


We are going to decouple the static binding of the machine list.
Could you query the machine list from the k8s API?
Or via kube get pods?

You should ask @ydye about this. Anyway, this PR focuses on refreshing these services, not on how paictl.py implements it.

@hao1939, maybe machinelist should be kept for paiservice. It is just that, for paiservice, machinelist should be retrieved from the k8s API instead of clusterconfig. Of course, before deploying paiservice, you will need to validate that the k8s cluster state is healthy.

We'd better remove the static machinelist, since we are working on a scalable system.
We should work from the assumption that the system keeps changing.

My current assumption is that:

k8s manages the nodes and labels them.

PAI services bind to labels. For scalability, they should be decoupled from specific nodes when possible.
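As a rough illustration of the label-based approach described above, the sketch below selects nodes by label from a node list shaped like the output of `kubectl get nodes -o json`, instead of reading a static machine list from the cluster configuration. The function name, the sample node names, and the `node-exporter=true` label are hypothetical and are not taken from PAI.

```python
import json
from typing import Any, Dict, List


def nodes_with_label(node_list: Dict[str, Any], key: str, value: str) -> List[str]:
    """Return the names of nodes whose labels contain key=value.

    `node_list` is expected to look like the JSON printed by
    `kubectl get nodes -o json`: {"items": [{"metadata": {...}}, ...]}.
    """
    names = []
    for item in node_list.get("items", []):
        metadata = item.get("metadata", {})
        labels = metadata.get("labels") or {}  # labels may be absent
        if labels.get(key) == value:
            names.append(metadata["name"])
    return names


# Hypothetical sample data standing in for real kubectl output.
SAMPLE = json.loads("""
{
  "items": [
    {"metadata": {"name": "worker-1", "labels": {"node-exporter": "true"}}},
    {"metadata": {"name": "worker-2", "labels": {"node-exporter": "false"}}},
    {"metadata": {"name": "worker-3"}}
  ]
}
""")

if __name__ == "__main__":
    print(nodes_with_label(SAMPLE, "node-exporter", "true"))
```

Because the selection depends only on labels, adding or removing a node would change the result without any edit to a configuration file, which is the decoupling the comment argues for.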

@hao1939, that makes sense. Please present your design once it is ready.

OK, please approve this PR if appropriate. Switching to a dynamic machine list can be addressed in another PR.

add refresh for {node,job}-exporter, prometheus and watchdog

1d98665

xudifsd requested review from hao1939 and ydye December 11, 2018 05:35

hao1939 reviewed Dec 11, 2018

hao1939 approved these changes Dec 11, 2018

xudifsd merged commit 8b922c0 into master Dec 12, 2018

xudifsd deleted the dixu/add-refresh branch December 12, 2018 02:10
