feat(infra): Z→Q migration — Ansible/Terraform/monitoring rename (PR 2/3) #14
…monitoring Part 2 of the Z→Q testnet migration. Renames the execution-client binary (gzond → gqrl), its Ansible role directory, all gzond_* variables, the systemd service template, cross-role references (qrysm-beacon, security, monitoring), Terraform variables (zond_rpc_url → qrl_rpc_url, ZOND_RPC_URL → QRL_RPC_URL), Prometheus scrape job + alert names (Gzond* → Gqrl*), Grafana dashboard queries, and infrastructure README/ARCHITECTURE comments. Runbooks and standalone documentation are intentionally out of scope and will be covered in PR 3.
Code Review
This pull request performs a comprehensive rename of the execution client from gzond to gqrl across the entire infrastructure codebase, including Ansible roles, Terraform modules, monitoring, and documentation. Feedback was provided to improve the gqrl Ansible role by ensuring the build task correctly handles binary updates, refining the sync status display for clarity, and utilizing the defined log level variable in the systemd service template.
- name: Build gqrl from source
  shell: |
    cd /opt/quantapool/gqrl/source
    make gqrl
    cp build/bin/gqrl {{ gqrl_binary_path }}
  args:
    creates: "{{ gqrl_binary_path }}"
  when: gqrl_clone.changed or not ansible_check_mode
  notify: restart gqrl
The creates argument in this task will prevent the gqrl binary from being updated when the source code changes. Since the binary already exists at gqrl_binary_path after the first deployment, Ansible will skip this task even if gqrl_clone.changed is true. Additionally, the when condition gqrl_clone.changed or not ansible_check_mode is problematic as it will attempt to run the build on every normal execution (since not ansible_check_mode is true), which is inefficient.
Consider removing creates and using a more robust when condition that checks if the source changed or if the binary is missing.
- name: Build gqrl from source
  shell: |
    cd /opt/quantapool/gqrl/source
    make gqrl
    cp build/bin/gqrl {{ gqrl_binary_path }}
  when: gqrl_clone.changed or not ansible_check_mode
  notify: restart gqrl
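Note that the suggested task drops creates but keeps the original when condition, so the build would still run on every non-check execution. A minimal sketch of a condition that matches the review text (rebuild when the source changed or the binary is missing) could look like the following. The register name gqrl_binary_stat is hypothetical and not part of the PR; gqrl_clone and gqrl_binary_path are taken from the reviewed task.

```yaml
# Hypothetical sketch: gqrl_binary_stat is an illustrative register name.
- name: Check whether the gqrl binary is already installed
  stat:
    path: "{{ gqrl_binary_path }}"
  register: gqrl_binary_stat

- name: Build gqrl from source
  shell: |
    cd /opt/quantapool/gqrl/source
    make gqrl
    cp build/bin/gqrl {{ gqrl_binary_path }}
  # Rebuild only when the clone changed or no binary exists yet.
  when: gqrl_clone.changed or not gqrl_binary_stat.stat.exists
  notify: restart gqrl
```

The explicit check-mode test is left out here because Ansible skips shell tasks in check mode by default, while stat still runs and reports the current state.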
- name: Display gqrl sync status
  debug:
    msg: "gqrl sync status: {{ sync_status.json.result | default('starting') }}"
The sync status message is counter-intuitive when the node is fully synced. The eth_syncing RPC method returns false when the node is synced, which results in the message gqrl sync status: False. It would be clearer to explicitly state 'Synced' or 'Syncing'.
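If the 'starting' fallback from the original task should be preserved for the case where no RPC result is available yet, a three-state message is one option. This is a minimal sketch, assuming sync_status is registered from the eth_syncing call as in the reviewed task and that a missing result means the node has not answered yet:

```yaml
- name: Display gqrl sync status
  debug:
    # starting: no RPC result registered yet (assumed meaning)
    # Synced:   eth_syncing returned false
    # Syncing:  eth_syncing returned a progress object
    msg: >-
      gqrl sync status:
      {{ 'starting' if sync_status.json.result is not defined
      else ('Synced' if sync_status.json.result == false else 'Syncing') }}
```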
    msg: "gqrl sync status: {{ 'Synced' if sync_status.json.result == false else 'Syncing' }}"

--metrics.addr {{ gqrl_metrics_addr }} \
--metrics.port {{ gqrl_metrics_port }} \
{% endif %}
--verbosity 3
Summary
- Renames the execution-client binary gzond → gqrl and its Ansible role directory to match the upstream project rename (go-zond → go-qrl).
- Renames gzond_* → gqrl_* variables (defaults, playbooks, inventory, group_vars), cross-role references (qrysm-beacon, security, monitoring), and the systemd service template.
- Renames the Terraform variable zond_rpc_url → qrl_rpc_url and env ZOND_RPC_URL → QRL_RPC_URL across modules (validator-node, backup-node, monitoring-server, testnet env).
- Renames the Prometheus scrape job gzond → gqrl, alert rules GzondDown/LowPeers/Syncing → Gqrl*, Grafana dashboard queries, and Alertmanager inhibit rules.
- Updates infrastructure/README.md, infrastructure/docs/ARCHITECTURE.md, monitoring/README.md, and inline comments.

Scope note
Runbooks and standalone documentation (infrastructure/docs/runbooks/*, DEPLOYMENT.md, validator-integration.md, top-level README.md, docs/architecture.md) are intentionally out of scope and will be covered by PR 3.

Files changed (33)
Test plan
- ansible-playbook -i inventory.ini.template playbooks/deploy-node.yml --check --diff
- terraform -chdir=infrastructure/terraform validate + terraform plan -var-file=environments/testnet/terraform.tfvars
- promtool check rules monitoring/prometheus/rules/validator-alerts.yml
- promtool check config monitoring/prometheus/prometheus.yml
- Search for gzond, Zond, ZOND_ in changed dirs (excluding runbooks/docs); should be empty