GuitarString / Electric-Water
- Source
- Commits
- Network (1)
- Issues (5)
- Downloads (0)
- Wiki (1)
- Graphs
-
Branch:
master
-
4 comments Created 2 months ago by BluebieLive voting key listServer Side ArchitecturexI've proposed a N:abcdef sort of list format for the key list, while Guitar String has proposed a server which knows no secrets at all, which would use something resembling a Hash/Dictionary, but implemented on top of the filesystem, in order to stop repeat votes, and which would seem to do nothing to stop people entering made up keys in to the url from sending in multiple bogus votes, which we'd later need to strip out from the logs.
I'd like to share an experience which I feel is relevant, which is that I built a simple web based chatroom app on top of ruby, using the filesystem as a storage, in which every message in the room was a file within the rooms folder. This would work well, until there were about 3,000 files in the folder, and suddenly it became incredibly slow. Unusably slow. As it turned out, this was a limitation of the 'ext' linux filesystem, in that for a certain relatively small quantity of files, it could operate on a folder very very quickly, akin to a hash table, but once you went over that quantity, it would become excruciatingly slow, while still allowing you to store many more files without any technical issues beyond performance. These performance issues remain, even if you never do any directory listings. Even just looking up a named file path to open it for reading or writing has bad performance that makes me wonder if at such large quantities, the filesystem just gives up on efficiency and starts doing a simple top-to-bottom search for the information from a list.
It became enough of an issue that I ended up just deleting one old message every time a new one was added after a few hundred had accumulated, and the only real solutions to the problem seemed to be:
- Use a different, obscure filesystem, like reiserfs.
- Make subfolders to group every thousand or so files, and then when you end up with a thousand or so of those folders, group them as well, getting ever more complex and annoying.
- Just never have too many files.
This is one of the reasons I really like the flat file model. It is possible that we could split the keys up in to chunks of two characters, and look through the filesystem like /ab/0z/42/mr/90.txt and store things in that way, which would allow us to store data within the keys of an arbitrary length, like including the users ballot, but it doesn't feel intuitively nice to me.
The nice aspect is certainly that the server would act even more like the simple hole punch I so love to use to explain this architecture, but the downside of allowing ourselves to be ddosed with fake votes, polluting the logs, and this strange, and annoyingly operating system specific issue with folder performance, all come to make this seem like probably a bad idea unless we can think of some nifty stuff to store inside those keys, and if we did have nifty stuff to store like that, wouldn't it be better to just store it in some sort of complex database software?
The chat thing by the way, is still up, at http://talkie.me/ — it is that crazy and misguided experiment that has taught me an awful lot about the limits of the filesystem, and when it is not a good place to store complex data there. :)
What interesting data could we pair with the keys? It seems to me that their entire purpose is just to exist, and to cease to exist at the required moment, in a strange sort of DRM that replicates the experience of being given just one ballot paper by a voting official to write on... You get just one key to write on. If that is all the state information we need, I think a single flat file is the way to go, for performance, ease of uploading and downloading, ability to reject made up keys, and immediate usefulness in distribution.
Comments
-
14 comments Created 2 months ago by GuitarStringNeed a key generator functionImplementation designxNeeds to output long random numbers...
Comments
Lets just use a Psuedo-random generator for now, and perhaps create another issue to remind ourselves to maybe improve that later. On the assumption that we'd like a 256 bit key, which encodes to 50 alphanumeric characters, using the whole alphabet, here is some code to generate that sort of key in ruby:
"vote-#{rand((2 ** 256) - 1).to_s(36).rjust(50, '0')}"and the output looks like this:
"vote-3nsbfw9a927db340mcsocxeuvi4ohk2zpu2ox50db0cvy6js0o"We can also do the same thing in javascript relatively easy, however it lacks a left padding function equivalent to rjust, so I also monkey patched the string object to do that.
String.prototype.leftPad = function(idealLength, insert) { var str = this; while(str.length < idealLength) { str = insert + str; } return str; } 'vote-' + (Math.random() * Math.pow(2, 256)).toString(36).leftPad(50, '0');These code samples generate keys that are of consistent length, and of course the vote- prefix is optional, and brings the string length up to 55 characters. :)
Though ten digits or so would be much more sane than fifty, keeping in mind the alphabet makes it a base 36 counting system… If we wanted to make a really strong random number, we'd probably want to generate a long string like that in several chunks. And, of course, I have no idea if ruby's rand() or javascript's Math.random() in any particular instance is going to be even trying to use hardware entropy sources.
Ten digits of Base 36 (as above) gives us a numeric range of 0–3,656,158,440,062,975 (0000000000–zzzzzzzzzz), which is way more numbers than we'd ever need, considering the population of australia is only about 22,027,877 right now.
Meanwhile, parsing these strings in ruby and javascript is done with string.to_i(36) and parseInt(string, 36) respectively. 3.6 quadrillion should be plenty really. :)
GuitarString
Sat Oct 24 04:23:51 -0700 2009
| link
Right now its using a psuedorandom key generator;
$RANDOM give a 16bit psuedorandom so I have put a number of them together... this is a cheap standin.
Full on random can be done with:
dd if=/dev/random ibs=64 count=1 iflag=fullblock |xxd -ps dd copies 64 bytes = 512 bits from /dev/random and xxd converts it into hex
...it requires a rng which my eeepc does not have so it takes forever to get input from desktop events lol we could hold a mouse on a treadmill allowing it to swivel a bit...
($RANDOM uses /dev/urandom a psuedorandom number generator)lol, this is the fastest I could get it
1+0 records in
0+1 records out
64 bytes (64 B) copied, 0.837326 s, 0.1 kB/s
5a4ca73b102088a18027901af67e20f91343e1250afd21b7e445ad681550
45dc5253851d21aa56ed79f45eae0f24f4dce8ed0eb193d7bbfec46a208d
2ac68025
(note the '0+1 records out 64 bytes (64 B) copied, 0.837326 s, 0.1 kB/s' goes to stderr and is not counted in the output) Sure we could convert it into a higher base quite easily...
Ok, we need way more keys than there are people in this world... so that its hard to guess..
512 bits gives us 2^512 possible keys:
13407807929942597099574024998205846127479365820592393377723561443721<br/> 76403007354697680187429816690342769003185818648605085375388281194656<br/> 9946433649006084096
Just incase all animals, kids, objects, aliens want to vote too :p
2^(512)=32^x
512ln(2)=xln(32)
x=512ln(2)/ln(32)
x~104 base 36 chars, ok.. thats a bit to much for twitter :p
for only 256 bits we will need 256ln(2)/ln(32)
~52 base 36 chars for only 125 bits we will need
~25
GuitarString
Sat Oct 24 04:25:13 -0700 2009
| link
:s line endings messed up at the end there
GuitarString
Sat Oct 24 04:26:52 -0700 2009
| link
128 bits or maby 52 bits seems good to me
Hex is a waste. We have lots of other wonderful characters we could be using. Characters such as k, y, v, and yes, even x.
Also stuff /dev/random, as there's a very strong chance that it's actually less random than the secure psuedorandom generator at /dev/urandom (even though on many BSD's and OS-X /random just links straight to /urandom) The urandom psuedorandom generator uses entropy when it's available, which makes it pretty much just as safe as the raw entropy, if not better, because someone trying to attack it can't predict when the generator has entropy to consume and when it lacks it, so they cannot predict what it's future numbers are likely to be. We don't know what quality the entropy sources inside a consumer device is.. it could be quite dodgy.
The issue with depending on the unix devices is running it on non-unixes, though python's rand function in it's OS package is cross platform, making use of the entropy sources in windows also, and python is available in titanium... :)
GuitarString
Sat Oct 24 04:46:53 -0700 2009
| link
Yeah I agree with you on 36 base...
lol I didn't know that about bsd...
Well if it had nrg on linux it would be the best thing to use..
We will just have a seperate program that creates keys....As I've said many times now, I'm happy to write these components in a desktop app, and I could quite easily whip up this stuff in the form of Ruby scripts also. Don't feel at all pressured to do difficult things.. :)
I'm just mostly interested in doing GUI stuff as I don't see the command line tools as actually being useful in any realistic manner.
GuitarString
Sat Oct 24 04:51:41 -0700 2009
| link
lol I see gui's as just reporting and managing command line tools
GuitarString
Sat Oct 24 04:52:35 -0700 2009
| link
Just think of command line tools as seperate functions in a big gui program
Why? Splitting the app in such a way only makes it even more tied to the unix architecture, especially if the software is written in something as ungodly as bash. Most GUI tools are not wrappers around command line tools in modern desktop computers. Command line tools and desktop applications just work in such vastly different ways. For a command line tool to integrate properly with a desktop app, it generally has to compromise it so severely that it really ends up just being another chunk of evented desktop code sitting in another process. Great for concurrency! Not so great for the essential spirit of how the terminal works.
`dd bs=4 count=2 if=/dev/urandom`.unpack('Q').first.to_s(36).rjust(13) in ruby makes a nice 13 character (64bit) string, using the computers psuedosecure PNG. :)
To use that in a bash script, you can wrap it up like this:
ruby -e "puts \`dd bs=4 count=2 if=/dev/urandom\`.unpack('Q').first.to_s(36).rjust(13, '0')"Or use print if you don't want the trailing newline. Do be aware that dd seems to output something to the error pipe which shows up in the console.
GuitarString
Mon Oct 26 01:39:53 -0700 2009
| link
Cool, ill just add that in
We should also find some kind of test to make sure it works well
GuitarString
Tue Oct 27 06:20:09 -0700 2009
| link
Ok, this is pretty much sorted for now, I'll close it.





LOL, ok firstly I use ext4 :D and even ext4 only has:
( can't wait for btrfs)
But yeh I was just giving ext as an example of something using a hash table I wasn't suggesting that we use it...
Some (most, I think berkley db uses it) database systems/ general storage systems implement hash tables... and if that doesn't fit I could create something in c (very last resort)
It would actually be pretty simple and there are many hash table libraries available
It would only need to have like 3 interfacing functions
add(key)
keyexists?(key)
export() -> stdout
Just because it supports 4 billion files, does not mean it supports that quantity with a high lookup speed. As I tried to explain, EXT has no issue storing many files, the issue is that there is a severe performance drop after about 3,000.
I don't remember you mentioning ext, but the server this runs on is very likely to be a linux system, and given that, some version of ext is likely what we'll be given. It is my strong impression that the 3000ish files issue applies to a lot of 'modern' filesystems currently though, and that it's only avoided when using specialized filesystems like reiser.
Aside from implementation details, why would we want to use a hash table? what gain does it bring?
Yes as I said before, we are not using ext.
Hash Table Performance
Insertion/Modifying/Viewing O(k) - Pretty much constant lookup time no matter how many keys
This is seriously the only way I see that we can go.
It will be faster and it will allow us to not have a copy of the keyfile on the server which I see as very important.
Or we could not use a hash table. filesystem is likely something we don't control. Dedicated or Virtual Dedicated hosting will likely be provided by netregistry and will likely have some kind of linux already, on some sort of filesystem which is overwhelmingly probable to be EXT of some version. You havent provided what I feel to be a good reason to use a hash table, and there are plenty of performance reasons not to.
I'm going to close this issue now. Feel free to reopen it if you think of a good reason why we would need to invest in a less highly performant and more difficult to setup architecture for the server side key-list. The performance of the initialiser and counting tools aren't too important, we can let those churn, but the server needs to be able to cope with a lot of stress without lagging too far behind.