Fix deleted VMs: race condition in cleanup and invoice/payment guards for deleted VMs#121
Fix deleted VMs: race condition in cleanup and invoice/payment guards for deleted VMs#121
Conversation
Re-read VM state from database before deletion in the unpaid VM cleanup loop. If a payment was confirmed between the initial list_vms() snapshot and the deletion step, the VM will have expires != created and will be skipped instead of deleted. Also adds admin notification when the pre-deletion re-read fails. Adds regression tests: - test_check_vms_deletes_unpaid_vm_after_one_hour - test_check_vms_skips_unpaid_vm_within_one_hour Agent-Logs-Url: https://github.com/LNVPS/api/sessions/dfa9f0f9-6e61-4671-8747-e45fe763be13 Co-authored-by: v0l <1172179+v0l@users.noreply.github.com>
|
@copilot in the example in prod, the VM was already older than 1hr how was somebody able to generate an invoice and pay for a deleted vm |
- Add vm.deleted check in get_user_vm() so the renew endpoint rejects deleted VMs with "VM not found" - Add deleted guard in vm_payment_paid() (MySQL): update vm.expires only when deleted = 0; roll back and error if the VM was already deleted - Mirror the deleted guard in MockDb::vm_payment_paid() - Change MockDb::delete_vm() from hard-delete to soft-delete (set deleted = true) to match MySQL behavior - Update rollback tests that assumed hard-delete semantics Agent-Logs-Url: https://github.com/LNVPS/api/sessions/dc786568-4b72-4ed2-b117-c34f9b25f2e9 Co-authored-by: v0l <1172179+v0l@users.noreply.github.com>
Good catch. There were two separate paths where this could happen:
Fixed in commit
Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
lnvps_db/src/mysql.rs
Outdated
| .await? | ||
| .rows_affected(); | ||
|
|
||
| if rows_affected == 0 { |
There was a problem hiding this comment.
Instead of this rollback we should just un-delete the VM assuming its a new vm, because there might be other payment methods where payments can persist longer than our pre-defined "delete unpaid vms"
There was a problem hiding this comment.
Done in a6c0b5a. vm_payment_paid() now sets deleted = 0 alongside the expiry update in a single UPDATE, so a payment for a previously auto-deleted VM restores it. The subsequent CheckVm job will re-spawn it on the host. This applies to both MySQL and MockDb.
When vm_payment_paid() receives a payment for a deleted VM (e.g. auto- deleted before a slow payment arrived), un-delete it and apply the payment instead of rolling back with an error. The subsequent CheckVm job will re-spawn it on the host. This handles payment methods with longer timeouts (Revolut, future Stripe/Paypal) where payments can persist beyond the auto-delete window. Agent-Logs-Url: https://github.com/LNVPS/api/sessions/d9b462bf-2581-4ede-8eca-d02a6ab63ac5 Co-authored-by: v0l <1172179+v0l@users.noreply.github.com>
|
Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details. Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
|
@isolkewo review this pls |
|
LGTM! ✅ I've reviewed the implementation and it looks solid. The fixes address both issues mentioned in the PR description: 1. Race condition in check_vms(): ✅
2. Deleted-VM guards: ✅
The implementation handles the edge cases well, particularly the scenario where a payment arrives after a VM has been auto-deleted. The atomic UPDATE that sets both and ensures consistency. Ready to merge once you're satisfied! |
isolkewo
left a comment
There was a problem hiding this comment.
LGTM! The implementation correctly addresses both issues:
Race condition fix in check_vms():
- Re-reads VM from DB before deletion to catch payments that arrived after the snapshot
- Properly skips deletion for paid VMs
- Includes error handling for DB read failures
Deleted-VM guards:
- Invoice creation now rejects deleted VMs
- Payment processing un-deletes VMs (atomically sets deleted=0 and extends expiry)
- MockDB updated to match production soft-delete behavior
The atomic UPDATE in vm_payment_paid() is particularly good - it ensures consistency when payments arrive for auto-deleted VMs.
Ready to merge.
VMs were being marked as deleted after being paid for. Two separate issues were identified and fixed.
Root Causes
1. TOCTOU Race Condition in
check_vms()check_vms()snapshots all VMs vialist_vms(), identifies unpaid VMs (created == expires, age > 1hr) for deletion, then executes the deletions. If a payment is confirmed (Lightning callback / Revolut webhook) between the snapshot and the deletion step,vm.expiresis updated in the DB—but the deletion proceeds anyway based on the stale snapshot.2. No Deleted-VM Guard on Invoice Creation or Payment Processing
get_user_vm()calleddb.get_vm(id)with nodeletedcheck, so users could call the renew endpoint on an already-deleted VM and receive a fresh invoice. Additionally,vm_payment_paid()had no handling for payments arriving after a VM was auto-deleted, meaning paying an outstanding invoice (e.g. a slow Revolut payment or any future long-timeout payment method) had no effect on the VM.Fixes
Race condition (
worker.rs)Re-read each VM from the DB immediately before deleting it. Skip deletion if the VM has transitioned to paid state (
created != expires) since the snapshot. Also adds an admin notification when the pre-deletion DB read fails.Invoice creation guard (
routes.rs)get_user_vm()now rejects deleted VMs with "VM not found", preventing new invoices from being generated for them.Payment processing — un-delete on payment (
mysql.rs)vm_payment_paid()now atomically setsdeleted = 0alongside the expiry extension in a singleUPDATE. If a payment arrives for a VM that was auto-deleted (e.g. a slow payment from Revolut or a future long-timeout payment method), the VM is restored and the subsequentCheckVmjob will re-spawn it on the host.Mock consistency (
mock.rs)MockDb::delete_vm()changed from hard-delete to soft-delete (deleted = true) to match MySQL behaviour.MockDb::vm_payment_paid()mirrors the un-delete-on-payment behaviour.Tests
test_check_vms_deletes_unpaid_vm_after_one_hour— unpaid VMs older than 1hr are deletedtest_check_vms_skips_unpaid_vm_within_one_hour— unpaid VMs younger than 1hr are not deleted