Automatically maintain leases #44

Closed · exarkun opened this issue Oct 9, 2019 · 9 comments · Fixed by #73

Comments

exarkun commented Oct 9, 2019

The premise of the plugin is that leases are maintained with a scarce resource (ZKAPs) and shares without leases can be garbage collected to reclaim another scarce resource (storage).

For this to work, something must actually maintain leases on shares which are meant to remain alive. There is an existing UI for this in the Tahoe-LAFS CLI, e.g. tahoe deep-check --add-lease. The user could keep their data alive by using this to manually renew leases.

It should be possible to provide a better experience by renewing leases automatically, though. This removes the need for the user to remember to perform the task and creates the possibility that the task can be performed in a way that leaks less information to observers (a user may fall into patterns that can be identified; automation may be able to avoid doing so).

exarkun commented Oct 9, 2019

Here's a design.

  • Read a cap from <node>/private/rootcap (this happens to be where GridSync keeps its rootcap).
  • Traverse the rootcap and renew the lease on the shares for any capability where the existing lease has less than half its period remaining (lease period is hard-coded to 31 days in Tahoe-LAFS so "half its period" means "15.5 days").
  • Repeat in 7 ± 3 days

This design requires:

  • Storage servers to expose information about when current leases expire
  • The client plugin to remember when it last ran through the process

The considerations that have informed this design:

  • Half lease period is selected based on the hard-coded 31 day lease period.
    • This allows a client to be offline for a little more than 2 weeks without missing a lease renewal on any shares.
    • It avoids excessive traversal by allowing renewal passes to be spaced several days apart. That spacing in turn allows a lot of noise to be inserted into the delay (sketched below), which may reduce the recognizability of the renewal patterns to passive observers.
    • The "lost value" of the half lease period that is thrown away at each renewal can be recovered by pricing ZKAPs appropriately (ie, consider the time component of their value to be 15.5 days instead of 31 days).
  • Traversal from the rootcap is selected based on local state concerns.
    • It avoids the need to locally store a complete list of individual caps along with their expiration time.
    • It avoids desynchronization of state between ZKAPAuthorizer and GridSync.
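
A minimal sketch of the renewal decision and scheduling described above. The names are illustrative only; the constants come from the hard-coded 31-day lease period.

import random
from datetime import datetime, timedelta

# Tahoe-LAFS hard-codes the lease period to 31 days.
LEASE_PERIOD = timedelta(days=31)
# Renew once less than half of the lease period remains (15.5 days).
RENEWAL_THRESHOLD = LEASE_PERIOD / 2

def needs_renewal(lease_expiration, now=None):
    if now is None:
        now = datetime.utcnow()
    # A share with no active lease at all also needs renewal.
    if lease_expiration is None:
        return True
    return lease_expiration - now < RENEWAL_THRESHOLD

def next_run_delay():
    # Repeat in 7 +/- 3 days; the jitter adds noise to the schedule so
    # that renewal patterns are harder for a passive observer to recognize.
    return timedelta(days=7) + timedelta(days=random.uniform(-3, 3))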

exarkun commented Oct 9, 2019

So, regarding the above, how feasible is it to satisfy the requirements that:

  • The client plugin remembers when it last ran through the process
    • This is easy: put it in filesystem state or the SQLite3 database we already have for vouchers and ZKAPs (a sketch follows at the end of this comment). The worst case, if it is lost, is that we perform a check right away and then store a new timestamp.
  • Storage servers expose information about when current leases expire
    (i) The ideal place to receive this information would be in the file entry in the directory object. This seems infeasible because only a party holding the write cap for the directory can update it. This means the server cannot update it, so the client would need to update it separately from the lease update. This is prone to desynchronization and significantly increases the write cost of lease updates for both client and server (at least relative to the current cost; whether it is significant in absolute terms I'm not sure, but it involves re-encrypting and rewriting the whole directory instead of just writing a new plaintext lease expiration to the right place on the storage server).
    (ii) A different solution would be to have a new API that lets the client ask the server for lease expiration information for a storage index (or perhaps a list of storage indexes). With this, the client would traverse the rootcap and collect the caps it cares about. Then it would ask the server for the expiration time of the corresponding storage indexes. Then it would renew whichever shares it needed to. The straightforward implementation of this idea leaks a lot of directory hierarchy/structure metadata to the server, but hardly any more than is already leaked simply by traversing a directory hierarchy. Also, the efficient implementation of this weakens share unlinkability by having the client send all of the storage indexes in one batch (to get expiration times). However, if the client isn't establishing a separate connection (possibly with a different source IP) to the server for each operation on each storage index then this unlinkability doesn't exist in the first place.

(ii) seems feasible, and it is tempting to even say straightforward, but it probably touches a bunch of parts of Tahoe-LAFS I'm only minimally familiar with, so I should investigate further before saying something like that.
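
For the first requirement, a minimal sketch of the last-run bookkeeping, assuming a hypothetical table in the plugin's existing SQLite3 database (the table and column names here are illustrative, not an existing schema):

from datetime import datetime

def get_last_run(cursor):
    cursor.execute("SELECT finished_at FROM lease_maintenance_runs")
    row = cursor.fetchone()
    if row is None:
        # The record is missing (or was lost): the caller should run a
        # pass right away and then store a fresh timestamp.
        return None
    return datetime.fromisoformat(row[0])

def set_last_run(cursor, finished_at):
    cursor.execute("DELETE FROM lease_maintenance_runs")
    cursor.execute(
        "INSERT INTO lease_maintenance_runs (finished_at) VALUES (?)",
        (finished_at.isoformat(),),
    )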

meejah commented Oct 24, 2019

Regarding leases on the storage-servers, my understanding (and I just re-read the code, which seems to confirm this) is that leases are stored in the share-files themselves (for both immutables and mutables). Also maybe I'm completely misunderstanding what (i) means above (what I think it means is: ask the rootcap -- a mutable directory -- when its lease expires).

exarkun commented Oct 24, 2019

For (i) I was imagining having lease information for all of the children of a directory included in the directory itself. This would allow a stat-like call on the directory alone to determine lease state for all of the children of the directory (as well as the directory itself, presumably). This inclusion of information related to the children of the directory is what seems impossible.

With that clarification, do you agree? Sorry about the unclear initial writeup.

meejah commented Oct 25, 2019

Ah, yeah. That makes sense (and yes I agree).

I don't think we need to fret about anonymity / linkability leaks -- because Tahoe doesn't make too many claims about that and I think there are plenty of other places this sort of information is already leaked. And as you already point out, just "traversing the directories" will (probably) reveal it -- unless the client is doing something else, like using Tor. But even then, if a bunch of requests all arrive at once it's pretty likely they're from the same client (except on really busy servers).

exarkun commented Dec 2, 2019

Alright. So at first blush, to implement this, we need an API like this one:

from datetime import datetime
from typing import List, Optional

StorageIndex = bytes

def get_lease_expirations(storage_indexes: List[StorageIndex]) -> List[Optional[datetime]]:
    """
    For each storage index, retrieve the latest lease expiration time, or
    None if there are no active leases.
    """

Or we need two such APIs, if it turns out we can't tell the difference between slots and storage indexes in the server code but need to be able to in order to read the lease information. We already have some code that differentiates between the two for the purposes of calculating sizes, so we can probably do it for leases too. This brings me to the observation that we also need to know the size of the share at the same time, because the number of ZKAPs to spend depends on the size of the data.

There is already an API for retrieving size information about shares: share_sizes. This is used by add/renew_lease to determine exactly what we need here: how many ZKAPs are required to add or renew a lease. However, looking this information up inside add_lease and renew_lease means we need a network round-trip for every single share. Ideally we could have a vectorized version of this API and then use that information for the lease maintenance process. Something like:

def stat_shares(storage_indexes: List[StorageIndex]) -> List[ShareStat]:
    """
    For each storage index, retrieve the size and latest lease expiration
    time (or None for missing information).
    """

This would largely supersede share_sizes and the above proposed get_lease_expirations. Then a lease maintenance job could do something like:

# Gather the storage indexes by traversing from the rootcap.
storage_indexes = collect_storage_indexes(rootcap)
# One batched query for size and lease expiration information.
share_stats = stat_shares(storage_indexes)
# Pick out the shares whose leases need renewal, along with their sizes.
renewals = find_shares_needing_renewal(storage_indexes, share_stats)
renew_leases(renewals)

where find_shares_needing_renewal is something like

def find_shares_needing_renewal(storage_indexes, share_stats):
    for idx, stat in zip(storage_indexes, share_stats):
        if needs_renewal(stat.lease):
            yield idx, stat.size

and renew_leases is something like:

def renew_leases(renewals):
    renew_leases_v([
        (idx, get_passes(size))
        for (idx, size)
        in renewals
    ])

However, though this is efficient on the network, it also puts ZKAPs at some risk if it fails partway through. This approach would extract enough passes to renew everything that needs renewal and then send them over the network. A failure at this point leaves the client unsure of what has been renewed and what hasn't. Without careful retry logic, the client may try to double-spend passes, which creates linkability. Or the client may fail to spend some passes altogether and lose a large amount of value. It would be safer to keep the renewals as separate operations. Something like:

def renew_leases(renewals):
    for idx, size in renewals:
        renew_lease(idx, size)

This implies many network round-trips but it exposes less value/privacy to loss in the case of a failure. The existing renew_lease API can easily be extended to allow, optionally, the size to be passed in as an argument instead of retrieving it individually from the server internally. This would at least avoid two round-trips per share.
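
A rough sketch of that optional size argument (the helpers get_share_size and send_renew_lease are placeholders, not existing APIs; get_passes is the same hypothetical helper used above):

def renew_lease(storage_index, size=None):
    """
    Renew the lease on the shares for ``storage_index``, spending passes
    proportional to ``size``.

    :param size: The total size of the shares, if the caller already knows
        it (for example from a prior stat_shares call).  If None, fall back
        to asking the server, at the cost of the extra round-trip this
        extension is meant to avoid.
    """
    if size is None:
        size = get_share_size(storage_index)
    return send_renew_lease(storage_index, get_passes(size))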

As far as scaling goes, if we assume a renewal takes about a tenth of a second, this limits a client to something over 26 million files which is probably an alright ceiling for the time being. This does suppose the client runs 24/7 renewing leases, though.

Still, to maintain 10,000 files would only require the client to run 1000 seconds (16 minutes) per month.
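
For reference, the arithmetic behind those figures (assuming one renewal per file per month at roughly 0.1 seconds each):

SECONDS_PER_MONTH = 31 * 24 * 60 * 60      # 2,678,400 seconds
SECONDS_PER_RENEWAL = 0.1

# Running continuously for the whole month:
ceiling = SECONDS_PER_MONTH / SECONDS_PER_RENEWAL   # ~26.8 million files

# Maintaining 10,000 files:
busy_time = 10000 * SECONDS_PER_RENEWAL             # 1,000 seconds, about 16.7 minutes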

So perhaps that's fine.

exarkun commented Dec 13, 2019

Somewhere, an accounting must be made of how many storage indexes were renewed and what their sizes were, along with the number of ZKAPs spent on the renewals. This will support reporting to a user so they know what they're spending.

exarkun commented Jan 6, 2020

Somewhere, an accounting must be made of how many storage indexes were renewed and what their sizes were, along with the number of ZKAPs spent on the renewals. This will support reporting to a user so they know what they're spending.

The branch is already somewhat overly large so this should probably be kicked off to a follow-up ticket.

exarkun commented Jan 6, 2020

#74 for the accounting etc.
