Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot list wallets when using same state directory but different genesis #1292

Closed
piotr-iohk opened this issue Jan 21, 2020 · 5 comments
Closed
Assignees

Comments

@piotr-iohk
Copy link
Contributor

piotr-iohk commented Jan 21, 2020

Context

#1281 (comment)

Information -
Version 2020.1.14 (git revision: 5e69015)
Platform Linux (all?)
Installation From source code

Steps to Reproduce

  • Create a standalone private blockchain (e.g. using the integration config).
  • Turn on the wallet server and wait for a few slots / epochs.
  • Create a wallet
  • Turn the wallet off.
  • Change a setting in the genesis configuration and re-generate the genesis block (effectively creating another blockchain).
  • Restart the server on the same state directory.

In the log there is:

[cardano-wallet.wallet-engine:Error:29] [2020-01-21 11:04:31.30 UTC] Worker has exited unexpectedly: ErrCheckIntegrityDifferentGenesis (Hash {getHash = "\216\163\131S\189W\b]\213+\139U\169Ue\205W\215\158\SI\131}\151ed\ACK\250\186O~\233\&7"}) (Hash {getHash = "<\a\ETX\SO6\191\255\230~..\192\158R\147\211\132c|\210\240\EOT5n\243 \243\254lR\EOT\SUB"})

and then:

Expected behavior

$ cardano-wallet-jormungandr wallet list
Ok.
[]

Actual behavior

$ cardano-wallet-jormungandr wallet list
Something went wrong

Resolution


QA

@piotr-iohk piotr-iohk changed the title Cannot list wallets after rollback to different to different genesis Cannot list wallets after rollback to different genesis Jan 21, 2020
@KtorZ KtorZ changed the title Cannot list wallets after rollback to different genesis Cannot list wallets when using same state directory but different genesis files Jan 21, 2020
@KtorZ KtorZ changed the title Cannot list wallets when using same state directory but different genesis files Cannot list wallets when using same state directory but different genesis Jan 21, 2020
@jonathanknowles jonathanknowles self-assigned this Jan 22, 2020
@jonathanknowles
Copy link
Member

jonathanknowles commented Jan 22, 2020

Confirmed on v2020-01-21 (d188a5f).

@KtorZ
Copy link
Member

KtorZ commented Jan 22, 2020

@jonathanknowles I did some investigation yesterday. The main issue is due to the worker not being unregistered from the registry when it dies.

@jonathanknowles
Copy link
Member

jonathanknowles commented Jan 22, 2020

@KtorZ wrote:

@jonathanknowles I did some investigation yesterday. The main issue is due to the worker not being unregistered from the registry when it dies.

Yes, I found the same thing in my investigation.

But there's something else that's going on. The current implementation of newWorker returns a Worker even if evaluating the before action fails:

https://github.com/input-output-hk/cardano-wallet/blob/d188a5fcf73d26f51391c5aea5030dc8e152741d/lib/core/src/Cardano/Wallet/Registry.hs#L216-L220

Though the comment of the function explains that newWorker should return Nothing if the worker terminates unexpectedly before entering its main action.

One possibility here would be to change this function so that if before fails, we don't return a Worker from newWorker.

Thoughts?

@jonathanknowles
Copy link
Member

The main issue is due to the worker not being unregistered from the registry when it dies.

Just for the record: we discussed this in our team meeting and agreed to adjust the worker life cycle so that newWorker will handle the registration, requiring the caller to pass the registry as an argument. On cleanup (when the thread dies), this function will also handle unregistration from the registry map, preventing the situation where we can perform a successful lookup on a worker whose thread has already died.

iohk-bors bot added a commit that referenced this issue Feb 4, 2020
1301: Simplify worker registration lifecycle. r=jonathanknowles a=jonathanknowles

# Issue Number

#1292 

# Overview

This PR adjusts the worker registration life cycle so that creating and registering a worker is now a single operation, rather than a two-step operation.

Whereas previously callers would need to first call `newWorker` followed by `insert`, they now only need to call `register`.

In addition, we now provide automatic garbage collection of stale worker references from the registry, so that attempting to look up a previously-registered but prematurely-terminated thread will result in `Nothing`. 

Co-authored-by: Jonathan Knowles <jonathan.knowles@iohk.io>
@jonathanknowles
Copy link
Member

We should investigate what would happen if a worker thread dies, w.r.t. Daedalus.

With PR #1301, if a worker thread dies, then its key is removed from the registry. In that case, the wallet will no longer be listed by listWallets, and the wallet will no longer be accessible to Daedalus.

iohk-bors bot added a commit that referenced this issue Mar 24, 2020
1474: Mark wallet as "dead" when underlying workers die r=KtorZ a=KtorZ

# Issue Number

<!-- Put here a reference to the issue this PR relates to and which requirements it tackles -->

#1292 (in particular, last comment: #1292 (comment))

# Overview

<!-- Detail in a few bullet points the work accomplished in this PR -->

- 551a463
  move list databases to enable server calling it from API handlers
  When a wallet worker dies, it is removed from the registry and therefore, is no longer
available for listing. Yet, the file might still exist on disk, so looking at the database
files is more reliable than looking at the registered workers. Should a worker be dead
however, operation on a wallet should be degraded

- 27b19a6
  create a new 'ErrDeadWallet' to capture dead workers
  
- c3ce89a
  list workers through known databases and allow recovering from a dead worker
  With this, we could still return wallet as degraded when listing and fetching them independently. At
least, it indicates client that something went wrong and despite being stuck, wallets remains accessible.

- 4978205
  marks wallets as 'dead' when fetching them despite a worker being dead
 
# Comments

<!-- Additional comments or screenshots to attach if any -->

Kinda hard to test because workers ain't supposed to die in practice 😬 ...
So, I injected a temporary fault in the restoration loop to make the worker dies immediately and it works as expected:

<details>
    <summary>curl http://localhost:8090/v2/byron-wallets | jq</summary>

```json
[
  {
    "passphrase": {
      "last_updated_at": "2020-03-23T11:36:12.582891065Z"
    },
    "state": {
      "status": "dead"
    },
    "balance": {
      "total": {
        "quantity": 0,
        "unit": "lovelace"
      },
      "available": {
        "quantity": 0,
        "unit": "lovelace"
      }
    },
    "name": "MyWallet",
    "id": "adc78398110d92af3132488546c6977f61dc0c7e",
    "tip": {
      "height": {
        "quantity": 0,
        "unit": "block"
      },
      "epoch_number": 0,
      "slot_number": 0
    }
  }
]
```
</details>

<details>
    <summary>curl http://localhost:8090/v2/byron-wallets/{walletId} | jq</summary>

```json
{
  "passphrase": {
    "last_updated_at": "2020-03-23T11:36:12.582891065Z"
  },
  "state": {
    "status": "dead"
  },
  "balance": {
    "total": {
      "quantity": 0,
      "unit": "lovelace"
    },
    "available": {
      "quantity": 0,
      "unit": "lovelace"
    }
  },
  "name": "MyWallet",
  "id": "adc78398110d92af3132488546c6977f61dc0c7e",
  "tip": {
    "height": {
      "quantity": 0,
      "unit": "block"
    },
    "epoch_number": 0,
    "slot_number": 0
  }
}
```
</details>

<details>
    <summary>Anything else</summary>

```json
{
  "code": "wallet_is_dead",
  "message": "That's embarassing. My associated worker for adc78398110d92af3132488546c6977f61dc0c7e is no longer responding. This is not something that is supposed to happen. The worker must have left a trace in the logs of severity 'Error' when it died which might explain the cause. Said differently, this wallet won't be accessible until the server is restarted but there are good chances it'll recover itself upon restart."
}
```
</details>

<!-- 
Don't forget to:

 ✓ Self-review your changes to make sure nothing unexpected slipped through
 ✓ Assign yourself to the PR
 ✓ Assign one or several reviewer(s)
 ✓ Once created, link this PR to its corresponding ticket
 ✓ Assign the PR to a corresponding milestone
 ✓ Acknowledge any changes required to the Wiki
-->


Co-authored-by: KtorZ <matthias.benkort@gmail.com>
@KtorZ KtorZ closed this as completed Mar 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants