userProfile can't grow very large #52

Open
splinterofchaos opened this issue Jun 7, 2015 · 3 comments

Comments

@splinterofchaos
Collaborator

950 bytes for publishing all of one's repositories and branch names with 40-byte shas puts too many constraints on what we can store. A repo with a one-letter name requires 74 bytes ({'repositories':{'X':{'HEAD':'sha'}}}), and each branch requires at least 58 more bytes (including the comma: ,'refs/heads/X':'sha'), which means we can store at most 15 branches (74 + 58x <= 950 => x <= 15). This should be sufficient for most people's local repositories, if they consistently prune branches that have been merged or become moot, but we can't host pull requests this way, which means we're still tied to GitHub. This repository only has 3 open pulls right now, but larger projects can have hundreds.
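
The byte counts above can be checked mechanically. This is a minimal sketch, assuming a 950-byte budget, a 40-character hex sha, and one-letter repository and branch names as in the comment; the variable names are illustrative only, and Python is used here purely for the arithmetic.

```python
SHA_LEN = 40   # hex characters in a SHA-1
LIMIT = 950    # byte budget for userProfile, as stated in the comment

sha = "0" * SHA_LEN  # placeholder sha of the right length

# Smallest possible profile: one repository with a one-letter name, HEAD only.
base = "{'repositories':{'X':{'HEAD':'%s'}}}" % sha

# Each additional one-letter branch, including the leading comma.
branch = ",'refs/heads/X':'%s'" % sha

base_bytes = len(base.encode("utf-8"))
branch_bytes = len(branch.encode("utf-8"))

# Largest x such that base_bytes + x * branch_bytes stays within the limit.
max_branches = (LIMIT - base_bytes) // branch_bytes

print(base_bytes, branch_bytes, max_branches)  # prints: 74 58 15
```

Real branch names are longer than one letter, so the practical ceiling is lower than this best case.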

Just some brainstormed ideas:

We can compress userProfile: Instead of having a single top-level field, repositories, just make userProfile itself the list of repositories. We don't need to store HEAD's sha explicitly, just which branch it points to, unless it's detached. Or we could always report refs/heads/master as HEAD, even when it's not. Every eight hex characters of the sha could be stored in a 32-bit integer, requiring 20 bytes total, though that would mean not using json to encode it.
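
To illustrate the 32-bit packing idea, here is a rough sketch that stores a 40-character hex sha as five big-endian unsigned 32-bit integers (20 bytes). The function names `pack_sha` and `unpack_sha` are made up for this example, and the byte order is an assumption.

```python
import struct

def pack_sha(hex_sha: str) -> bytes:
    """Pack a 40-character hex sha into 20 bytes as five big-endian uint32s."""
    if len(hex_sha) != 40:
        raise ValueError("expected a 40-character hex sha")
    # Every eight hex characters become one 32-bit integer.
    words = [int(hex_sha[i:i + 8], 16) for i in range(0, 40, 8)]
    return struct.pack(">5I", *words)

def unpack_sha(raw: bytes) -> str:
    """Inverse of pack_sha: 20 bytes back to a 40-character hex string."""
    words = struct.unpack(">5I", raw)
    return "".join("%08x" % w for w in words)

sha = "da39a3ee5e6b4b0d3255bfef95601890afd80709"  # SHA-1 of the empty string
packed = pack_sha(sha)
print(len(sha), len(packed))  # prints: 40 20
```

With big-endian words the result is the same 20 bytes that `bytes.fromhex(sha)` would produce, so the gain comes from dropping the hex text encoding rather than from the integer grouping itself.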

Not use json: it's very convenient and simple, but not compact, especially for raw integers. I've worked with msgpack before and it does have a js implementation. I'm not familiar with the js ecosystem, but I'm sure other equally efficient serialization libraries exist. I do think it's important, though, that we use something implemented in multiple languages (ref: #12).

A linked list of mutable keys? We could get around the size limit entirely by allocating another key when we run out of space and having a next or previous field point to it.
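
A sketch of how the linked-list idea might look, assuming each chunk is JSON-encoded and must fit in 950 bytes. The chunk layout (`refs` plus an optional `next`), the function names, and the use of a list index in place of a real DHT key are all hypothetical choices for illustration.

```python
import json

LIMIT = 950  # per-key byte budget, as discussed in this thread

def encoded_size(chunk: dict) -> int:
    """Size in bytes of the compact JSON encoding of one chunk."""
    return len(json.dumps(chunk, separators=(",", ":")).encode("utf-8"))

def split_profile(refs: dict, limit: int = LIMIT) -> list:
    """Greedily pack refs into chunks; 'next' holds the index of the following chunk."""
    chunks = [{"refs": {}}]
    for name, sha in sorted(refs.items()):
        candidate = {**chunks[-1]["refs"], name: sha}
        # Measure with a 'next' pointer included, since this chunk may need one later.
        if encoded_size({"refs": candidate, "next": len(chunks)}) > limit:
            if not chunks[-1]["refs"]:
                raise ValueError("a single ref does not fit within the limit")
            chunks[-1]["next"] = len(chunks)   # link to the chunk about to be added
            chunks.append({"refs": {}})
            candidate = {name: sha}
            if encoded_size({"refs": candidate, "next": len(chunks)}) > limit:
                raise ValueError("a single ref does not fit within the limit")
        chunks[-1]["refs"] = candidate
    return chunks

def join_profile(chunks: list) -> dict:
    """Follow the 'next' pointers from chunk 0 and merge all refs."""
    refs, index = {}, 0
    while index is not None:
        chunk = chunks[index]
        refs.update(chunk["refs"])
        index = chunk.get("next")
    return refs

refs = {"refs/heads/b%03d" % i: "0" * 40 for i in range(100)}
chunks = split_profile(refs)
```

The cost of this approach is one extra lookup per chunk, so a reader needs several round trips to fetch a large profile.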

Let users manually decide which branches to share via git-export-ok. We could put regex patterns in this file for what to include, or what to exclude. The default could look something like this:

include: refs/heads/*
exclude: refs/remotes/*

(exclude: could be used to leave out anything that include: would otherwise match. include: could also just be *)
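
A rough sketch of how such a file could be applied. It treats the patterns as shell-style globs, which is an assumption, since the comment above says regex but the example patterns read like globs. The `include:`/`exclude:` line format follows the example, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase

def parse_rules(text: str):
    """Parse 'include: PATTERN' / 'exclude: PATTERN' lines into two pattern lists."""
    include, exclude = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        kind, _, pattern = line.partition(":")
        pattern = pattern.strip()
        if kind == "include":
            include.append(pattern)
        elif kind == "exclude":
            exclude.append(pattern)
        else:
            raise ValueError("unknown rule: %r" % line)
    return include, exclude

def exported(ref: str, include, exclude) -> bool:
    """A ref is shared if some include pattern matches and no exclude pattern does."""
    return (any(fnmatchcase(ref, p) for p in include)
            and not any(fnmatchcase(ref, p) for p in exclude))

rules = """
include: refs/heads/*
exclude: refs/remotes/*
"""
include, exclude = parse_rules(rules)

all_refs = ["refs/heads/master", "refs/heads/wip", "refs/remotes/origin/master"]
shared = [r for r in all_refs if exported(r, include, exclude)]
print(shared)  # prints: ['refs/heads/master', 'refs/heads/wip']
```

Exclude rules win over include rules here, which matches the parenthetical note about using exclude: to drop things include: would match.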

@cjb
Owner

cjb commented Jun 7, 2015

@splinterofchaos We could actually also just increase the size limit. The mainline bittorrent DHT only guarantees 1000 bytes per key (clients could store more), but we're not using the mainline bittorrent DHT itself, just its protocol.

It would be our first breaking change against mainline DHT, though, so maybe it's worth trying to avoid.

@splinterofchaos
Collaborator Author

> we're not using the mainline bittorrent DHT itself, just its protocol.

That makes sense, but might the libraries implementing the DHT and protocol decide to reject large messages at any time? If that ever happened, though, I suppose we could use a forked 'n patched version.

Still, if we do stop limiting the size, a compressed userProfile would mean transferring the same amount of information using less bandwidth. Though, with network speeds what they are, maybe that's not an issue.

@cjb
Owner

cjb commented Jun 7, 2015

It's the bittorrent-dht module. It currently enforces the 1000 byte limit; we'd just ask the maintainers to add a constructor option to relax that limit (perhaps by 10-20x in our case).
