
Conversation

@pinheadmz
Contributor

@pinheadmz pinheadmz commented Aug 2, 2024

The goal of this PR is to enable Warnet users to specify exactly which RPC response data from which tanks to monitor in a Grafana dashboard.

  1. Use a new Prometheus exporter image. We started off using https://github.com/jvstein/bitcoin-prometheus-exporter, but this PR defines a new image (in resources/images/exporter) maintained by us at https://hub.docker.com/r/bitcoindevproject/bitcoin-exporter. This image checks for an environment variable METRICS, which should be a space-separated list of labels, RPC commands, and JSON result keys, e.g. inbounds=getnetworkinfo()["connections_in"] (see the parsing sketch after the log excerpt below). See the updated docs in monitoring.md for more details.
  2. Add a new "metrics" key to the graphml schema and pass its value to the exporter container.
  3. Add a new test, logging_test.py, that runs in CI (after installing helm!). The test starts a network with three nodes: one with default metrics, one with a custom metric, and one with none. After setup, it runs two scenarios and then pulls Prometheus data directly from the Grafana API (just like the web-based dashboard does). The test also runs install_logging.sh and connect_logging.sh; logs look something like this:
2024-08-05 12:58:22 | INFO    | cnct_log | Attempting to start Grafana port forwarding
2024-08-05 12:58:22 | INFO    | cnct_log | Forwarding from 127.0.0.1:3000 -> 3000
2024-08-05 12:58:22 | INFO    | cnct_log | Forwarding from [::1]:3000 -> 3000
...

2024-08-05 12:59:27 | INFO    | cnct_log | Handling connection for 3000
2024-08-05 12:59:27 | INFO    | test     | Got Prometheus data source uid from Grafana: PBFA97CFB590B2093
2024-08-05 12:59:27 | DEBUG   | test     | Waiting for predicate with timeout 300s and interval 5s
2024-08-05 12:59:27 | INFO    | test     | Making Grafana request...
2024-08-05 12:59:27 | INFO    | cnct_log | Handling connection for 3000
2024-08-05 12:59:27 | INFO    | test     | No Grafana data yet for blocks

...
2024-08-05 13:02:34 | INFO    | test     | Making Grafana request...
2024-08-05 13:02:34 | INFO    | cnct_log | Handling connection for 3000
2024-08-05 13:02:34 | INFO    | test     | Grafana data: blocks times:  [1722877290000, 1722877305000, 1722877320000, 1722877335000, 1722877350000]
2024-08-05 13:02:34 | INFO    | test     | Grafana data: blocks values: [323, 323, 329, 329, 335]
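
For reference, the METRICS grammar is label=method(params) followed by optional ["key"] accessors into the JSON-RPC result. Here is a minimal parsing sketch, assuming an AuthServiceProxy-style rpc client; the helper names are hypothetical, and the real logic lives in resources/images/exporter/bitcoin-exporter.py:

```python
import re

# label=method(params) followed by zero or more ["key"] accessors
SPEC_RE = re.compile(r'(?P<label>\w+)=(?P<method>\w+)\((?P<params>[^)]*)\)(?P<keys>.*)')

def parse_spec(spec: str):
    """Split e.g. 'inbounds=getnetworkinfo()["connections_in"]' into parts."""
    m = SPEC_RE.match(spec)
    keys = re.findall(r'\["([^"]+)"\]', m.group("keys"))
    return m.group("label"), m.group("method"), m.group("params"), keys

def collect(rpc, spec: str):
    """Call the RPC and drill into the JSON result with the parsed keys."""
    label, method, params, keys = parse_spec(spec)
    result = getattr(rpc, method)(*[p for p in params.split(",") if p])
    for key in keys:
        result = result[key]
    return label, result
```

Each parsed label then maps onto a prometheus_client Gauge that gets refreshed on a polling loop.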

In addition, a few clean-ups were stuffed into the pull request:

  1. Additional grafana.ini settings were added to disable login entirely for Grafana (see the sketch after this list)
  2. connect_logging.sh now retries with a try/catch, so it runs reliably as a subprocess in a test
  3. The tx_flood scenario was fixed to work with only 3 nodes
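
Those login settings are standard Grafana configuration. A minimal sketch of what "disable login entirely" amounts to, assuming the values are wired through the loki-grafana Helm chart (the exact keys in the PR may differ):

```ini
[auth]
disable_login_form = true

[auth.anonymous]
enabled = true
org_role = Admin
```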

@pinheadmz pinheadmz force-pushed the rpc-gauge branch 4 times, most recently from 228859e to 59e0e63 on August 4, 2024 15:07
@pinheadmz pinheadmz marked this pull request as ready for review August 5, 2024 17:40
Contributor

@willcl-ark willcl-ark left a comment

Seems pretty nice overall!

But I can't actually get it to (appear to) work?

Perhaps likely to be user error -- I really don't click with the whole grafana/loki/promtail/xxx stack for some reason. But I think the docs could mention how to access the data in Grafana (if it's so hidden I can't find it).

      - uses: chartboost/ruff-action@491342200cdd1cf4d5132a30ddc546b3b5bc531b
        with:
-          args: 'format --check'
+          args: 'format --check --config pyproject.toml'
Contributor

@willcl-ark willcl-ark Aug 6, 2024

Surprised this commit is needed at all. ruff should read the config by default, and `extend-exclude` is preferred when adding files to exclude (vs `exclude`).

I wonder if you are overwriting exclude default values ([".bzr", ".direnv", ".eggs", ".git", ".git-rewrite", ".hg", ".mypy_cache", ".nox", ".pants.d", ".pytype", ".ruff_cache", ".svn", ".tox", ".venv", "__pypackages__", "_build", "buck-out", "dist", "node_modules", "venv"]) with the below change?

Re-ordering minikube should not make any difference?
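
For context, extend-exclude in pyproject.toml appends to ruff's built-in exclude list instead of replacing it. A fragment along these lines (the path is illustrative, not from this PR):

```toml
[tool.ruff]
# extend-exclude keeps defaults like .git and venv excluded;
# plain exclude would replace the whole default list.
extend-exclude = ["resources/images"]
```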

Contributor

Ok, I think in this commit we should just remove the line

changed-files: 'true'

from the test.yml workflow file. That will make ruff respect the config properly, i.e. cherry-pick this commit?

willcl-ark@fe2ddd9

Contributor Author

ok, cherry-picked and force-pushed, let's see how CI digests it...

Comment on lines 79 to 93
If you are running docker as a service via systemd you can apply it by adding the following to the service file and restarting the service:

```sh
# Add the following under the [Service] section of the unit file
LimitNOFILE=4096
```

Reload the systemd configuration and restart the unit afterwards:

```sh
sudo systemctl daemon-reload
sudo systemctl restart docker
```

On Ubuntu this file is located at `/lib/systemd/system/docker.service` but you can find it using `sudo systemctl status docker`.
Contributor

I think this is likely still relevant? I run minikube inside docker. This would also apply to podman, portainer, orb, etc.

AFAIK even you guys on k8s via Docker Desktop may still be hit by these limits? (although it's less clear there that you would start docker via systemd)

Contributor Author

I'm gonna revert the docs updates in running.md, we can just address all that later. So the commit will just add metrics docs to monitoring.md.

<node id="0">
    <data key="version">27.0</data>
    <data key="exporter">true</data>
    <data key="metrics">blocks=getblockcount() inbounds=getnetworkinfo()["connections_in"] outbounds=getnetworkinfo()["connections_in"] mempool_size=getmempoolinfo()["size"]</data>
Contributor

Let's add a metrics key to both the default graph and graphs generated by warcli graph create, so that users can easily add this key without getting missing-key (at the graph level) errors.

Contributor

warnet-rpc  | 2024-08-06 09:11:13 | ERROR   | server   | Error bring up warnet: Bad GraphML data: no key metrics
2024-08-06 09:11:13 | ERROR   | warnet.server | jsonrpc error
2024-08-06 09:11:13 | ERROR   | warnet.server | 
Traceback (most recent call last):
warnet-rpc  |   File "/usr/local/lib/python3.12/site-packages/networkx/readwrite/graphml.py", line 966, in decode_data_elements
    data_name = graphml_keys[key]["name"]
warnet-rpc  |                 ~~~~~~~~~~~~^^^^^
KeyError: 'metrics'

Contributor Author

cool done

```
resources/scripts/connect_logging.sh
```

The Grafana dashboard (and API) will be accessible without requiring authentication
Contributor

I know it's probably because I'm too dumb, but where are the actual logs? When I open Grafana on localhost:3000 I don't see any connected logs coming in?

[screenshot]

This is with a patch to the default graph:

diff --git a/resources/graphs/default.graphml b/resources/graphs/default.graphml
index 153bd52..8c276a0 100644
--- a/resources/graphs/default.graphml
+++ b/resources/graphs/default.graphml
@@ -6,12 +6,14 @@
   <key attr.name="exporter" attr.type="boolean" for="node" id="exporter"/>
   <key attr.name="collect_logs" attr.type="boolean" for="node" id="collect_logs"/>
   <key attr.name="image" attr.type="string" for="node" id="image"/>
+  <key attr.name="metrics" attr.type="string" for="node" id="metrics"/>
   <graph edgedefault="directed">
     <node id="0">
         <data key="version">27.0</data>
         <data key="bitcoin_config">-uacomment=w0</data>
         <data key="exporter">true</data>
         <data key="collect_logs">true</data>
+        <data key="metrics">blocks=getblockcount() inbounds=getnetworkinfo()["connections_in"] outbounds=getnetworkinfo()["connections_in"] mempool_size=getmempoolinfo()["size"]</data>
     </node>
     <node id="1">
         <data key="version">27.0</data>

Contributor

Should I see running logging containers? This is all I see:

[screenshot]

The script appeared to run successfully:

will@ubuntu in ~/src/warnet on  rpc-gauge [$!?⇕] is 📦 v0.9.11 : 🐍 (warnet)
₿ just installlogging
resources/scripts/install_logging.sh
"grafana" already exists with the same configuration, skipping
"prometheus-community" already exists with the same configuration, skipping
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "grafana" chart repository
...Successfully got an update from the "prometheus-community" chart repository
Update Complete. ⎈Happy Helming!⎈
Release "loki" does not exist. Installing it now.
NAME: loki
LAST DEPLOYED: Tue Aug  6 10:11:49 2024
NAMESPACE: warnet-logging
STATUS: deployed
REVISION: 1
NOTES:
***********************************************************************
 Welcome to Grafana Loki
 Chart version: 5.47.2
 Loki version: 2.9.6
***********************************************************************

Installed components:
* gateway
* minio
* read
* write
* backend
Release "promtail" does not exist. Installing it now.
NAME: promtail
LAST DEPLOYED: Tue Aug  6 10:12:38 2024
NAMESPACE: warnet-logging
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
***********************************************************************
 Welcome to Grafana Promtail
 Chart version: 6.16.4
 Promtail version: 3.0.0
***********************************************************************

Verify the application is working by running these commands:
* kubectl --namespace warnet-logging port-forward daemonset/promtail 3101
* curl http://127.0.0.1:3101/metrics
Release "prometheus" does not exist. Installing it now.
NAME: prometheus
LAST DEPLOYED: Tue Aug  6 10:12:40 2024
NAMESPACE: warnet-logging
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace warnet-logging get pods -l "release=prometheus"

Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
Release "loki-grafana" does not exist. Installing it now.
NAME: loki-grafana
LAST DEPLOYED: Tue Aug  6 10:12:54 2024
NAMESPACE: warnet-logging
STATUS: deployed
REVISION: 1
NOTES:
1. Get your 'admin' user password by running:

   kubectl get secret --namespace warnet-logging loki-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo


2. The Grafana server can be accessed via port 80 on the following DNS name from within your cluster:

   loki-grafana.warnet-logging.svc.cluster.local

   Get the Grafana URL to visit by running these commands in the same shell:
     export POD_NAME=$(kubectl get pods --namespace warnet-logging -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=loki-grafana" -o jsonpath="{.items[0].metadata.name}")
     kubectl --namespace warnet-logging port-forward $POD_NAME 3000

3. Login with the password from step 1 and the username: admin
#################################################################################
######   WARNING: Persistence is disabled!!! You will lose your data when   #####
######            the Grafana pod is terminated.                            #####
#################################################################################

will@ubuntu in ~/src/warnet on  rpc-gauge [$!?⇕] is 📦 v0.9.11 : 🐍 (warnet) took 1m8s
₿ just connectlogging
resources/scripts/connect_logging.sh
Go to http://localhost:3000
Grafana pod name: loki-grafana-6c855549d4-wsv88
Attempting to start Grafana port forwarding
Forwarding from 127.0.0.1:3000 -> 3000
Forwarding from [::1]:3000 -> 3000
Handling connection for 3000

Contributor

kubectl does show prometheus running:

₿ kubectl --namespace warnet-logging get pods -l "release=prometheus"
NAME                                                   READY   STATUS    RESTARTS   AGE
prometheus-kube-prometheus-operator-6c5998f7dc-hjvwx   1/1     Running   0          9m27s
prometheus-kube-state-metrics-688d66b5b8-8srsw         1/1     Running   0          9m27s
prometheus-prometheus-node-exporter-8tt7q              1/1     Running   0          9m27s

But I don't see any node exporters?

Contributor

server logs don't appear to contain any errors:

warnet-rpc  | 2024-08-06 09:11:34 | DEBUG   | tank     | Parsed graph node: 0 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w0', 'tc_netem=None', 'exporter=True', 'metrics=blocks=getblockcount() inbounds=getnetworkinfo()["connections_in"] outbounds=getnetworkinfo()["connections_in"] mempool_size=getmempoolinfo()["size"]', 'collect_logs=True', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc  | 2024-08-06 09:11:34 | DEBUG   | tank     | Parsed graph node: 1 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w1', 'tc_netem=None', 'exporter=True', 'metrics=None', 'collect_logs=True', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc  | 2024-08-06 09:11:34 | DEBUG   | tank     | Parsed graph node: 2 with attributes: ['version=None', 'image=bitcoindevproject/bitcoin:26.0', 'bitcoin_config=-uacomment=w2 -debug=mempool', 'tc_netem=None', 'exporter=True', 'metrics=None', 'collect_logs=True', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc  | 2024-08-06 09:11:34 | DEBUG   | tank     | Parsed graph node: 3 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w3', 'tc_netem=None', 'exporter=True', 'metrics=None', 'collect_logs=False', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc  | 2024-08-06 09:11:34 | DEBUG   | tank     | Parsed graph node: 4 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w4', 'tc_netem=None', 'exporter=True', 'metrics=None', 'collect_logs=False', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc  | 2024-08-06 09:11:34 | DEBUG   | tank     | Parsed graph node: 5 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w5', 'tc_netem=None', 'exporter=True', 'metrics=None', 'collect_logs=False', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc  | 2024-08-06 09:11:34 | DEBUG   | tank     | Parsed graph node: 6 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w6', 'tc_netem=None', 'exporter=False', 'metrics=None', 'collect_logs=False', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc  | 2024-08-06 09:11:34 | DEBUG   | tank     | Parsed graph node: 7 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w7', 'tc_netem=None', 'exporter=False', 'metrics=None', 'collect_logs=False', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc  | 2024-08-06 09:11:34 | DEBUG   | tank     | Parsed graph node: 8 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w8', 'tc_netem=None', 'exporter=False', 'metrics=None', 'collect_logs=False', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc  | 2024-08-06 09:11:34 | DEBUG   | tank     | Parsed graph node: 9 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w9', 'tc_netem=None', 'exporter=False', 'metrics=None', 'collect_logs=False', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc  | 2024-08-06 09:11:34 | DEBUG   | tank     | Parsed graph node: 10 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w10', 'tc_netem=None', 'exporter=False', 'metrics=None', 'collect_logs=False', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc  | 2024-08-06 09:11:34 | DEBUG   | tank     | Parsed graph node: 11 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w11', 'tc_netem=None', 'exporter=False', 'metrics=None', 'collect_logs=False', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
2024-08-06 09:11:34 | INFO    | warnet   | Imported 12 tanks from graph
warnet-rpc  | 2024-08-06 09:11:34 | INFO    | warnet   | Created Warnet using directory /root/.warnet/warnet/warnet
2024-08-06 09:11:34 | DEBUG   | k8s      | Deploying pods
warnet-rpc  | 2024-08-06 09:11:34 | DEBUG   | k8s      | Creating bitcoind container for tank 0

Contributor

I do see a single running exporter (I think):

✗ kubectl --namespace warnet-logging get pods
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          14m
loki-backend-0                                           2/2     Running   0          16m
loki-backend-1                                           2/2     Running   0          16m
loki-backend-2                                           2/2     Running   0          16m
loki-canary-z7vgh                                        1/1     Running   0          16m
loki-gateway-6b57fdb5dd-ktspk                            1/1     Running   0          16m
loki-grafana-6c855549d4-wsv88                            1/1     Running   0          15m
loki-grafana-agent-operator-b8f4865b9-lq2fc              1/1     Running   0          16m
loki-minio-0                                             1/1     Running   0          16m
loki-read-5d8755d4cf-74zwb                               1/1     Running   0          16m
loki-read-5d8755d4cf-9wctf                               1/1     Running   0          16m
loki-read-5d8755d4cf-gdb6z                               1/1     Running   0          16m
loki-write-0                                             1/1     Running   0          16m
loki-write-1                                             1/1     Running   0          16m
loki-write-2                                             1/1     Running   0          16m
prometheus-kube-prometheus-operator-6c5998f7dc-hjvwx     1/1     Running   0          15m
prometheus-kube-state-metrics-688d66b5b8-8srsw           1/1     Running   0          15m
prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0          14m
prometheus-prometheus-node-exporter-8tt7q                1/1     Running   0          15m
promtail-7h8jk                                           1/1     Running   0          15m

Contributor Author

The exporters are inside the tank pods, next to the bitcoin containers.

[Screenshot 2024-08-06 at 10:27:58 AM]

The stuff in warnet-logging is the Grafana API server and the Prometheus scraper that reads from the individual tank exporters. I know there is a "node exporter" pod in warnet-logging as well; I dunno what that is actually for, and on my system it never works anyway:
[Screenshot 2024-08-06 at 10:26:52 AM]

As far as seeing something in Grafana right away: you're right, I didn't document that. I will push another commit today that hopefully makes a default dashboard easy to load.
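
For anyone following along, here is a rough sketch of pulling tank metrics through the Grafana API once connect_logging.sh is forwarding localhost:3000, approximately what logging_test.py does. The routes are standard Grafana endpoints, but the exact calls the test makes may differ:

```python
import requests

GRAFANA = "http://localhost:3000"  # port-forwarded by connect_logging.sh

# Look up the Prometheus data source, as the test log shows
# ("Got Prometheus data source uid from Grafana: ...").
datasources = requests.get(f"{GRAFANA}/api/datasources").json()
prom = next(ds for ds in datasources if ds["type"] == "prometheus")

# Instant query for the "blocks" gauge via Grafana's datasource proxy.
resp = requests.get(
    f"{GRAFANA}/api/datasources/proxy/{prom['id']}/api/v1/query",
    params={"query": "blocks"},
)
print(resp.json()["data"]["result"])
```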

Collaborator

@willcl-ark logs are in Loki, not Prometheus. There are no additional containers for Loki to get its data, as it collects it via k8s directly, similar to how you can do k logs rpd-0.

pinheadmz and others added 4 commits August 6, 2024 10:17
Previously this was set to changed files only, but this overrides the
ruff config in pyproject.toml.

This was added as a stop-gap while files were steadily formatted.
Remove this setting to have ruff format respect ruff config.
@mplsgrant
Collaborator

mplsgrant commented Aug 6, 2024

I got the Grafana page to open, but when I navigated to Explore -> Metrics, the webpage said "Unable to retrieve metric names".

My steps were: started minikube, ran installlogging and connectlogging, and also added the suggested metrics to the default graphml file before starting warnet.

I did notice kubectl had a number of pods and statefulsets with issues:

NAMESPACE        NAME                                                         READY   STATUS                  RESTARTS        AGE
warnet-logging   pod/alertmanager-prometheus-kube-prometheus-alertmanager-0   0/2     Init:CrashLoopBackOff   6 (4m28s ago)   10m
warnet-logging   pod/prometheus-prometheus-kube-prometheus-prometheus-0       0/2     Init:CrashLoopBackOff   6 (4m53s ago)   10m


NAMESPACE        NAME                                                                    READY   AGE
warnet-logging   statefulset.apps/alertmanager-prometheus-kube-prometheus-alertmanager   0/1     10m
warnet-logging   statefulset.apps/prometheus-prometheus-kube-prometheus-prometheus       0/1     10m

Here's some more context:

$ kubectl describe pod prometheus-prometheus-kube-prometheus-prometheus-0_warnet-logging -n warnet-logging
<snip>
Message:   level=info ts=2024-08-06T15:06:30.497464083Z caller=main.go:142 msg="Starting prometheus-config-reloader" version="(version=0.75.2, branch=refs/tags/v0.75.2, revision=35b0f457d8f705f1ac8282b4e22caa68a19c6c43)"
level=info ts=2024-08-06T15:06:30.497513937Z caller=main.go:143 build_context="(go=go1.22.5, platform=linux/amd64, user=Action-Run-ID-10070178014, date=20240724-04:04:09, tags=unknown)"
level=info ts=2024-08-06T15:06:30.497715636Z caller=cpu.go:28 msg="Leaving GOMAXPROCS=12: CPU quota undefined"
level=info ts=2024-08-06T15:06:30.497916775Z caller=reloader.go:270 msg="reloading via HTTP"
level=error ts=2024-08-06T15:06:30.497940139Z caller=main.go:223 msg="Failed to run" err="add config file /etc/prometheus/config/prometheus.yaml.gz to watcher: create watcher: too many open files"

$ kubectl describe pod/alertmanager-prometheus-kube-prometheus-alertmanager-0 -n warnet-logging
<snip>
Message:   level=info ts=2024-08-06T15:12:13.49196692Z caller=main.go:142 msg="Starting prometheus-config-reloader" version="(version=0.75.2, branch=refs/tags/v0.75.2, revision=35b0f457d8f705f1ac8282b4e22caa68a19c6c43)"
level=info ts=2024-08-06T15:12:13.492003509Z caller=main.go:143 build_context="(go=go1.22.5, platform=linux/amd64, user=Action-Run-ID-10070178014, date=20240724-04:04:09, tags=unknown)"
level=info ts=2024-08-06T15:12:13.492173573Z caller=cpu.go:28 msg="Leaving GOMAXPROCS=12: CPU quota undefined"
level=info ts=2024-08-06T15:12:13.49237159Z caller=reloader.go:270 msg="reloading via HTTP"
level=error ts=2024-08-06T15:12:13.492399994Z caller=main.go:223 msg="Failed to run" err="add config file /etc/alertmanager/config/alertmanager.yaml.gz to watcher: create watcher: too many open files"

@pinheadmz
Contributor Author

@willcl-ark I committed a default dashboard template and documented its use: https://github.com/bitcoin-dev-project/warnet/blob/4fb03b133a0558fc01ae6183d64cd392ac1b10e7/docs/monitoring.md

This will work with the default.graphml graph. A follow-up PR will make that curl command prettier.
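
For reference, loading a dashboard template through the Grafana HTTP API boils down to one POST (the docs wrap this in the curl command mentioned above; the local path here is hypothetical):

```python
import json

import requests

with open("dashboard.json") as f:  # the committed dashboard template
    dashboard = json.load(f)

# Grafana's dashboard API; anonymous admin access is assumed to be enabled.
resp = requests.post(
    "http://localhost:3000/api/dashboards/db",
    json={"dashboard": dashboard, "overwrite": True},
)
print(resp.json())
```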

@willcl-ark
Contributor

Thanks @pinheadmz, will test it out shortly

@pinheadmz
Contributor Author

@mplsgrant too many open files?? 👀

@mplsgrant
Collaborator

mplsgrant commented Aug 6, 2024

@pinheadmz I'm not sure what that's all about. I upped my minikube to 8 cpus and 32 gigs of memory, but I got the same error.

Edit: Wait, actually, I'll try running again after tweaking ulimit.

@mplsgrant
Collaborator

mplsgrant commented Aug 6, 2024

To resolve the "too many open files" issue, I updated my /etc/sysctl.conf as per the docs.

I would say that this is a requirement for Grafana (or at least our use of Grafana).
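
The "create watcher: too many open files" error from the config-reloader is typically the inotify instance limit rather than the file-descriptor limit. A sysctl.conf fragment like the following is the usual fix for Kubernetes-in-Docker setups (values illustrative; apply with sudo sysctl -p):

```
# /etc/sysctl.conf
fs.inotify.max_user_instances = 512
fs.inotify.max_user_watches = 524288
```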

# label=method(params)[return object key][...]
METRICS = os.environ.get(
    "METRICS",
    'blocks=getblockcount() inbounds=getnetworkinfo()["connections_in"] outbounds=getnetworkinfo()["connections_in"] mempool_size=getmempoolinfo()["size"]',
Collaborator

Prepend with "bitcoind."? Assuming period is a valid character in a metric label.
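
One data point on that: the Prometheus data model restricts metric names to [a-zA-Z_:][a-zA-Z0-9_:]*, so a period would not be a valid character; an underscore prefix works. A quick check:

```python
import re

# Metric name charset per the Prometheus data model docs.
METRIC_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")

assert METRIC_NAME_RE.match("bitcoind_blocks") is not None
assert METRIC_NAME_RE.match("bitcoind.blocks") is None
```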

Collaborator

@m3dwards m3dwards left a comment

I love this PR. One nit about metrics naming; with or without it, happy for this to get merged.

Contributor

@willcl-ark willcl-ark left a comment

ACK ec801df

I'd say this is a good enough start to go in as-is. Would be nice to also have all metrics filterable by a bitcoind prefix as @m3dwards suggests, but we can follow that up...

RUN pip install --no-cache-dir prometheus_client

# Prometheus exporter script for bitcoind
COPY bitcoin-exporter.py /
Contributor

What are the `/` for?

Contributor Author

Doesn't that copy the file to the root directory inside the container?

@bitcoin-dev-project bitcoin-dev-project deleted a comment from 256dummy Aug 7, 2024
@bdp-DrahtBot
Collaborator

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #421 (make pip install warnet use prod by willcl-ark)
  • #419 (warcli cluster refactors by willcl-ark)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@m3dwards m3dwards merged commit bf9eff1 into bitcoin-dev-project:main Aug 7, 2024
@willcl-ark
Contributor

nice

1 similar comment
@pinheadmz
Contributor Author

nice

@pinheadmz pinheadmz mentioned this pull request Aug 25, 2024