Flexible RPC exporter for Prometheus #423
Conversation
3 node graph used to result in an error: no values in randrange(1, 1)
Force-pushed 228859e to 59e0e63
willcl-ark left a comment
Seems pretty nice overall!
But I can't actually get it to (appear to) work?
Perhaps likely to be user error -- I really don't click with the whole grafana/loki/promtail/xxx stack for some reason. But I think the docs could mention how to access the data in grafana (if it's so hidden I can't find it).
.github/workflows/test.yml (outdated)

```diff
   - uses: chartboost/ruff-action@491342200cdd1cf4d5132a30ddc546b3b5bc531b
     with:
-      args: 'format --check'
+      args: 'format --check --config pyproject.toml'
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surprised this commit is needed at all. ruff should read the config by default, and extend-exclude is preferred when adding files to exclude (vs exclude).
I wonder if you are overwriting exclude default values ([".bzr", ".direnv", ".eggs", ".git", ".git-rewrite", ".hg", ".mypy_cache", ".nox", ".pants.d", ".pytype", ".ruff_cache", ".svn", ".tox", ".venv", "__pypackages__", "_build", "buck-out", "dist", "node_modules", "venv"]) with the below change?
Re-ordering minikube should not make any difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think in this commit we should just remove the line `changed-files: 'true'` from the test.yml workflow file. This will see ruff respect the config properly. i.e. cherry-pick this commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, cherry-picked and force-pushed, let's see how CI digests...
If you are running docker as a service via systemd you can apply it by adding the following to the service file and restarting the service:

```sh
# Add the following under the [Service] section of the unit file
LimitNOFILE=4096
```

Reload the systemd configuration and restart the unit afterwards:

```
sudo systemctl daemon-reload
sudo systemctl restart docker
```

On Ubuntu this file is located at `/lib/systemd/system/docker.service` but you can find it using `sudo systemctl status docker`.
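As a quick way to sanity-check whether a limit bump took effect, here is a minimal sketch (not part of this PR) using Python's standard `resource` module, which works on Unix-like hosts and inside containers:

```python
import resource

# Report this process's current soft/hard open-file limits.
# A soft limit in the low thousands suggests the LimitNOFILE
# bump described above may be needed.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={soft} hard={hard}")
```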
I think this is likely still relevant? With minikube I run inside docker. This would also apply to podman, portainer, orb etc.
AFAIK even you guys on k8s via Docker Desktop may still be hit by these limits too? (although it's less clear there that you would start docker via systemd)
I'm gonna revert the docs updates in running.md, we can just address all that later. So the commit will just add metrics docs to monitoring.md
```xml
<node id="0">
  <data key="version">27.0</data>
  <data key="exporter">true</data>
  <data key="metrics">blocks=getblockcount() inbounds=getnetworkinfo()["connections_in"] outbounds=getnetworkinfo()["connections_in"] mempool_size=getmempoolinfo()["size"]</data>
```
Let's add a metrics key to both the default graph and graphs generated by `warcli graph create`, so that users can easily add this key without getting missing-key (at the graph level) errors.
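For illustration, a minimal networkx sketch (graph contents hypothetical) of why this helps: declaring the attribute, even with an empty value, on every generated graph makes `write_graphml` emit the corresponding `<key>` element, so readers never hit the missing-key error shown below.

```python
import networkx as nx

G = nx.DiGraph()
# An empty default is enough: networkx declares a <key> element for
# every node attribute it sees, so readers won't raise KeyError: 'metrics'.
G.add_node(0, version="27.0", exporter=True, metrics="")
nx.write_graphml(G, "default.graphml")
```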
```
warnet-rpc | 2024-08-06 09:11:13 | ERROR | server | Error bring up warnet: Bad GraphML data: no key metrics
2024-08-06 09:11:13 | ERROR | warnet.server | jsonrpc error
2024-08-06 09:11:13 | ERROR | warnet.server |
Traceback (most recent call last):
warnet-rpc |   File "/usr/local/lib/python3.12/site-packages/networkx/readwrite/graphml.py", line 966, in decode_data_elements
    data_name = graphml_keys[key]["name"]
warnet-rpc |                 ~~~~~~~~~~~~^^^^^
KeyError: 'metrics'
```
cool done
```
resources/scripts/connect_logging.sh
```

The Grafana dashboard (and API) will be accessible without requiring authentication
I know it's probably because I'm too dumb, but where are the actual logs? When I open grafana on localhost:3000 I don't see any connected logs coming in?
This is with a patch to the default graph:

```diff
diff --git a/resources/graphs/default.graphml b/resources/graphs/default.graphml
index 153bd52..8c276a0 100644
--- a/resources/graphs/default.graphml
+++ b/resources/graphs/default.graphml
@@ -6,12 +6,14 @@
   <key attr.name="exporter" attr.type="boolean" for="node" id="exporter"/>
   <key attr.name="collect_logs" attr.type="boolean" for="node" id="collect_logs"/>
   <key attr.name="image" attr.type="string" for="node" id="image"/>
+  <key attr.name="metrics" attr.type="string" for="node" id="metrics"/>
   <graph edgedefault="directed">
     <node id="0">
       <data key="version">27.0</data>
       <data key="bitcoin_config">-uacomment=w0</data>
       <data key="exporter">true</data>
       <data key="collect_logs">true</data>
+      <data key="metrics">blocks=getblockcount() inbounds=getnetworkinfo()["connections_in"] outbounds=getnetworkinfo()["connections_in"] mempool_size=getmempoolinfo()["size"]</data>
     </node>
     <node id="1">
       <data key="version">27.0</data>
```
Should I see running logging containers? This is all I see:
The script appeared to run successfully:
```
will@ubuntu in ~/src/warnet on rpc-gauge [$!?⇕] is 📦 v0.9.11 : 🐍 (warnet)
₿ just installlogging
resources/scripts/install_logging.sh
"grafana" already exists with the same configuration, skipping
"prometheus-community" already exists with the same configuration, skipping
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "grafana" chart repository
...Successfully got an update from the "prometheus-community" chart repository
Update Complete. ⎈Happy Helming!⎈
Release "loki" does not exist. Installing it now.
NAME: loki
LAST DEPLOYED: Tue Aug 6 10:11:49 2024
NAMESPACE: warnet-logging
STATUS: deployed
REVISION: 1
NOTES:
***********************************************************************
Welcome to Grafana Loki
Chart version: 5.47.2
Loki version: 2.9.6
***********************************************************************
Installed components:
* gateway
* minio
* read
* write
* backend
Release "promtail" does not exist. Installing it now.
NAME: promtail
LAST DEPLOYED: Tue Aug 6 10:12:38 2024
NAMESPACE: warnet-logging
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
***********************************************************************
Welcome to Grafana Promtail
Chart version: 6.16.4
Promtail version: 3.0.0
***********************************************************************
Verify the application is working by running these commands:
* kubectl --namespace warnet-logging port-forward daemonset/promtail 3101
* curl http://127.0.0.1:3101/metrics
Release "prometheus" does not exist. Installing it now.
NAME: prometheus
LAST DEPLOYED: Tue Aug 6 10:12:40 2024
NAMESPACE: warnet-logging
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
kubectl --namespace warnet-logging get pods -l "release=prometheus"
Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
Release "loki-grafana" does not exist. Installing it now.
NAME: loki-grafana
LAST DEPLOYED: Tue Aug 6 10:12:54 2024
NAMESPACE: warnet-logging
STATUS: deployed
REVISION: 1
NOTES:
1. Get your 'admin' user password by running:
kubectl get secret --namespace warnet-logging loki-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
2. The Grafana server can be accessed via port 80 on the following DNS name from within your cluster:
loki-grafana.warnet-logging.svc.cluster.local
Get the Grafana URL to visit by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace warnet-logging -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=loki-grafana" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace warnet-logging port-forward $POD_NAME 3000
3. Login with the password from step 1 and the username: admin
#################################################################################
###### WARNING: Persistence is disabled!!! You will lose your data when #####
###### the Grafana pod is terminated. #####
#################################################################################
will@ubuntu in ~/src/warnet on rpc-gauge [$!?⇕] is 📦 v0.9.11 : 🐍 (warnet) took 1m8s
₿ just connectlogging
resources/scripts/connect_logging.sh
Go to http://localhost:3000
Grafana pod name: loki-grafana-6c855549d4-wsv88
Attempting to start Grafana port forwarding
Forwarding from 127.0.0.1:3000 -> 3000
Forwarding from [::1]:3000 -> 3000
Handling connection for 3000
```
kubectl does show prometheus running:
```
₿ kubectl --namespace warnet-logging get pods -l "release=prometheus"
NAME                                                   READY   STATUS    RESTARTS   AGE
prometheus-kube-prometheus-operator-6c5998f7dc-hjvwx   1/1     Running   0          9m27s
prometheus-kube-state-metrics-688d66b5b8-8srsw         1/1     Running   0          9m27s
prometheus-prometheus-node-exporter-8tt7q              1/1     Running   0          9m27s
```

But I don't see any node exporters?
server logs don't appear to contain any errors:
```
warnet-rpc | 2024-08-06 09:11:34 | DEBUG | tank | Parsed graph node: 0 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w0', 'tc_netem=None', 'exporter=True', 'metrics=blocks=getblockcount() inbounds=getnetworkinfo()["connections_in"] outbounds=getnetworkinfo()["connections_in"] mempool_size=getmempoolinfo()["size"]', 'collect_logs=True', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc | 2024-08-06 09:11:34 | DEBUG | tank | Parsed graph node: 1 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w1', 'tc_netem=None', 'exporter=True', 'metrics=None', 'collect_logs=True', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc | 2024-08-06 09:11:34 | DEBUG | tank | Parsed graph node: 2 with attributes: ['version=None', 'image=bitcoindevproject/bitcoin:26.0', 'bitcoin_config=-uacomment=w2 -debug=mempool', 'tc_netem=None', 'exporter=True', 'metrics=None', 'collect_logs=True', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc | 2024-08-06 09:11:34 | DEBUG | tank | Parsed graph node: 3 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w3', 'tc_netem=None', 'exporter=True', 'metrics=None', 'collect_logs=False', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc | 2024-08-06 09:11:34 | DEBUG | tank | Parsed graph node: 4 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w4', 'tc_netem=None', 'exporter=True', 'metrics=None', 'collect_logs=False', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc | 2024-08-06 09:11:34 | DEBUG | tank | Parsed graph node: 5 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w5', 'tc_netem=None', 'exporter=True', 'metrics=None', 'collect_logs=False', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc | 2024-08-06 09:11:34 | DEBUG | tank | Parsed graph node: 6 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w6', 'tc_netem=None', 'exporter=False', 'metrics=None', 'collect_logs=False', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc | 2024-08-06 09:11:34 | DEBUG | tank | Parsed graph node: 7 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w7', 'tc_netem=None', 'exporter=False', 'metrics=None', 'collect_logs=False', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc | 2024-08-06 09:11:34 | DEBUG | tank | Parsed graph node: 8 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w8', 'tc_netem=None', 'exporter=False', 'metrics=None', 'collect_logs=False', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc | 2024-08-06 09:11:34 | DEBUG | tank | Parsed graph node: 9 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w9', 'tc_netem=None', 'exporter=False', 'metrics=None', 'collect_logs=False', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc | 2024-08-06 09:11:34 | DEBUG | tank | Parsed graph node: 10 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w10', 'tc_netem=None', 'exporter=False', 'metrics=None', 'collect_logs=False', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
warnet-rpc | 2024-08-06 09:11:34 | DEBUG | tank | Parsed graph node: 11 with attributes: ['version=27.0', 'image=None', 'bitcoin_config=-uacomment=w11', 'tc_netem=None', 'exporter=False', 'metrics=None', 'collect_logs=False', 'build_args=', 'ln=None', 'ln_image=None', 'ln_cb_image=None', 'ln_config=None']
2024-08-06 09:11:34 | INFO | warnet | Imported 12 tanks from graph
warnet-rpc | 2024-08-06 09:11:34 | INFO | warnet | Created Warnet using directory /root/.warnet/warnet/warnet
2024-08-06 09:11:34 | DEBUG | k8s | Deploying pods
warnet-rpc | 2024-08-06 09:11:34 | DEBUG | k8s | Creating bitcoind container for tank 0
```
I do see a single running exporter (I think):
```
✗ kubectl --namespace warnet-logging get pods
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          14m
loki-backend-0                                           2/2     Running   0          16m
loki-backend-1                                           2/2     Running   0          16m
loki-backend-2                                           2/2     Running   0          16m
loki-canary-z7vgh                                        1/1     Running   0          16m
loki-gateway-6b57fdb5dd-ktspk                            1/1     Running   0          16m
loki-grafana-6c855549d4-wsv88                            1/1     Running   0          15m
loki-grafana-agent-operator-b8f4865b9-lq2fc              1/1     Running   0          16m
loki-minio-0                                             1/1     Running   0          16m
loki-read-5d8755d4cf-74zwb                               1/1     Running   0          16m
loki-read-5d8755d4cf-9wctf                               1/1     Running   0          16m
loki-read-5d8755d4cf-gdb6z                               1/1     Running   0          16m
loki-write-0                                             1/1     Running   0          16m
loki-write-1                                             1/1     Running   0          16m
loki-write-2                                             1/1     Running   0          16m
prometheus-kube-prometheus-operator-6c5998f7dc-hjvwx     1/1     Running   0          15m
prometheus-kube-state-metrics-688d66b5b8-8srsw           1/1     Running   0          15m
prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0          14m
prometheus-prometheus-node-exporter-8tt7q                1/1     Running   0          15m
promtail-7h8jk                                           1/1     Running   0          15m
```
The exporters are inside the tank pods, next to the bitcoin containers.
The stuff in warnet-logging is the grafana api server and the prometheus scraper that reads from the individual tank exporters. I know there is a "node exporter" pod in warnet-logging as well; I dunno what that is actually for, and on my system it never works anyway.

As far as seeing something in Grafana right away, you're right, I didn't document that. I will push another commit today that hopefully makes a default dashboard easy to load.
@willcl-ark logs are in Loki, not Prometheus. There are no additional containers for Loki to get its data, as it collects it via k8s directly, similar to how you can do `k logs rpd-0`.
Previously this was set to changed files only, but this overrides the ruff config in pyproject.toml. It was added as a stop-gap while files were steadily formatted. Remove this setting to have ruff format respect the ruff config.
I got the Grafana page to open, but when I navigated to Explore -> Metrics, the webpage said it was "Unable to retrieve metric names". My steps were: started minikube, ran installlogging and connectlogging, and I also added the suggested metrics to the default graphml file before starting warnet. I did notice kubectl had a number of pods and statefulsets with issues. Here's some more context:

@willcl-ark I committed a default dashboard template and documented its use: https://github.com/bitcoin-dev-project/warnet/blob/4fb03b133a0558fc01ae6183d64cd392ac1b10e7/docs/monitoring.md. This will work with the default.graphml graph. Follow-up PR will make that…

Thanks @pinheadmz, will test it out shortly

@mplsgrant

@pinheadmz I'm not sure what that's all about. I upped my minikube to 8 cpus and 32 gigs of memory, but I got the same error. Edit: Wait, actually, I'll try running again after tweaking ulimit.

To resolve the "too many files" issue, I updated my ulimit settings. I would say that this is a requirement for Grafana (or at least our use of Grafana).
```python
# label=method(params)[return object key][...]
METRICS = os.environ.get(
    "METRICS",
    'blocks=getblockcount() inbounds=getnetworkinfo()["connections_in"] outbounds=getnetworkinfo()["connections_in"] mempool_size=getmempoolinfo()["size"]',
)
```
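For readers following along, here is a rough sketch of how a spec in this format could be turned into Prometheus gauges. The `rpc_call` helper and the parsing details are assumptions for illustration, not the exporter's actual code:

```python
import re
from prometheus_client import Gauge

def evaluate(expr, rpc_call):
    """Resolve one spec like 'getnetworkinfo()["connections_in"]'."""
    method = re.match(r"(\w+)\(", expr).group(1)
    result = rpc_call(method)  # hypothetical RPC helper; params handling omitted
    for key in re.findall(r'\["([^"]+)"\]', expr):
        result = result[key]  # walk the JSON result keys
    return result

def make_gauges(spec, rpc_call):
    gauges = []
    for item in spec.split():
        label, expr = item.split("=", 1)
        gauge = Gauge(label, f"bitcoind RPC result: {expr}")
        # Bind expr per gauge so each one evaluates its own expression on scrape.
        gauge.set_function(lambda e=expr: float(evaluate(e, rpc_call)))
        gauges.append(gauge)
    return gauges
```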
Prepend with "bitcoind."? Assuming period is a valid character in a metric label.
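For what it's worth, a period is not valid in Prometheus metric names, at least under the classic naming rules where names must match `[a-zA-Z_:][a-zA-Z0-9_:]*`, so a `bitcoind_` prefix would be the safer spelling. A quick check against that pattern:

```python
import re

# Classic Prometheus metric-name pattern; note '.' is absent.
METRIC_NAME_RE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")

print(bool(METRIC_NAME_RE.match("bitcoind.blocks")))  # False
print(bool(METRIC_NAME_RE.match("bitcoind_blocks")))  # True
```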
m3dwards left a comment
I love this PR. One nit about metrics naming. With or without, happy for this to get merged.
willcl-ark left a comment
```dockerfile
RUN pip install --no-cache-dir prometheus_client

# Prometheus exporter script for bitcoind
COPY bitcoin-exporter.py /
```
What are the / for?
Doesn't that copy the file to the root directory inside the container?
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones: If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
nice



The goal of this PR is to enable Warnet users to specify exactly which RPC response data from which tanks to monitor in a Grafana dashboard.
- A new container image (resources/images/exporter) maintained by us at https://hub.docker.com/r/bitcoindevproject/bitcoin-exporter. This image checks for environment variable `METRICS`, which should be a space-separated list of labels, RPC commands, and JSON result keys, e.g. `inbounds=getnetworkinfo()["connections_in"]`. See the updated docs in `monitoring.md` for more details.
- A `"metrics"` key added to the graphml schema, whose value is passed to the exporter container.
- A `logging_test.py` that runs in CI (after installing helm!). This test starts a network with three nodes: one with default metrics, one with a custom metric, and one with none. After setup, it runs two scenarios and then pulls Prometheus data directly from the Grafana API (just like the web-based dashboard does). This test also runs `install_logging.sh` and `connect_logging.sh`; logs look something like this:
In addition, a few clean-ups were stuffed into the pull request:

- `connect_logging.sh` is now more resilient with a try/catch/retry
- the `tx_flood` scenario was fixed to work with only 3 nodes
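Since the test pulls Prometheus data through the Grafana API, here is a hedged sketch of what such a query can look like against the port-forwarded, unauthenticated Grafana described above. The datasource id `1` is an assumption; on a real deployment it would need to be looked up via `/api/datasources`:

```python
import requests

# Query the Prometheus datasource through Grafana's datasource proxy,
# the same API the web dashboard uses under the hood.
resp = requests.get(
    "http://localhost:3000/api/datasources/proxy/1/api/v1/query",
    params={"query": "blocks"},  # 'blocks' label from the default METRICS
)
resp.raise_for_status()
print(resp.json()["data"]["result"])
```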