Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not all base encodings maintain lexicographic-order #7

Closed
4 tasks done
CMCDragonkai opened this issue Oct 21, 2021 · 7 comments · Fixed by #8
Closed
4 tasks done

Not all base encodings maintain lexicographic-order #7

CMCDragonkai opened this issue Oct 21, 2021 · 7 comments · Fixed by #8
Assignees
Labels
bug Something isn't working development Standard development r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy

Comments

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Oct 21, 2021

Specification

It turns out that not all base encodings maintain lexicographic-order. For example base64 and base58 have alphabets that do not preserve order.

After some random testing, I found that the existing test case fails very rarely: encoded multibase strings are lexically sortable.

I believe this applies to the base encoding but not to the binary string encoding.

Therefore that test doesn't make sense, it makes more sense to identify which base encodings are lexicographic order, and create separate tests for those.

In our README.md it will be important to state that some base encodings will lose the lexicographic order, and provide a selection of the encodings that will maintain order.

Additional context

Tasks

  1. - Devise a good test to test lexicographic order
  2. - Identify the base encodings that maintain order
  3. - Fix README.md
  4. - Ensure all PK uses of base encoded IdSortable should be using lexicographic order preserving base encodings if need be
@CMCDragonkai CMCDragonkai added development Standard development bug Something isn't working labels Oct 21, 2021
@CMCDragonkai CMCDragonkai self-assigned this Oct 21, 2021
@CMCDragonkai
Copy link
Member Author

Here's an output of 1 run of the tests that used base58btc that didn't work.

const before = [
  'zkcpMuSTH5Qyms6UNeUzEy', 'zkcpMuSTH5RhYRusw9Aos2', 'zkcpMuSTH5SVXZ7hEnTWYo',
  'zkcpMuSTH5TFYUrYfaW8d5', 'zkcpMuSTH5U5amB67CJuEn', 'zkcpMuSTH5UoX55jgxLYJ6',
  'zkcpMuSTH5VToqnRqZ5TDm', 'zkcpMuSTH5WDBUrciCYNne', 'zkcpMuSTH5WtGFmdCRb3tU',
  'zkcpMuSTH5XhX2HQhaiV1C', 'zkcpMuSTH5YWRR29zmGbbB', 'zkcpMuSUGcAbYFfKACqKSf',
  'zkcpMuSUGcBDZQLbRxVPLi', 'zkcpMuSUGcBz7LGXvvE62f', 'zkcpMuSUGcCpebDz2r93qg',
  'zkcpMuSUGcDUExCi1WSxyD', 'zkcpMuSUGcEHoyquHbbMQh', 'zkcpMuSUGcEx9r2hU6rxgk',
  'zkcpMuSUGcFggSo43U5AYu', 'zkcpMuSUGcGWvB4eoF47q6', 'zkcpMuSUGcHATR4Rb4Qm6D',
  'zkcpMuSUGcHsLmRrJHpxvr', 'zkcpMuSUGcJboerZkgG2cW', 'zkcpMuSUGcKMfPt4PPn55G',
  'zkcpMuSUGcLBcMxrsUa4A4', 'zkcpMuSUGcLxsyJUWqdDJv', 'zkcpMuSUGcMZAYyeXaDwqq',
  'zkcpMuSUGcNPvRMEVmnQwV', 'zkcpMuSUGcP817AAoypmvr', 'zkcpMuSUGcPtGe231pX5yG',
  'zkcpMuSUGcQbTcZGuH86hk', 'zkcpMuSUGcRKNDXS6jhvCc', 'zkcpMuSUGcSAN5wPhTYJVA',
  'zkcpMuSUGcSpud3wKYfMq5', 'zkcpMuSUGcTXp6EZSSquSb', 'zkcpMuSUGcUKQ4wni8wTY1',
  'zkcpMuSUGcV5VDut2nSCcZ', 'zkcpMuSUGcViPPW3knDtGc', 'zkcpMuSUGcWWpbQG3yHbEm',
  'zkcpMuSUGcXErpiTPJZ3Ww', 'zkcpMuSUGcY1emU9UbMtzZ', 'zkcpMuSUGcYkaLTM7oTTiF',
  'zkcpMuSUGcZYScihhBjBHP', 'zkcpMuSUGcaE1UnuPFAUQV', 'zkcpMu8yZEW76wabV3b3hp',
  'zkcpMu8yZEWowitnvSHsT7', 'zkcpMu8yZEXdGUftrimrdn', 'zkcpMu8yZEYEfEBq21arQ4',
  'zkcpMu8yZEZ2rZzHGdenWt', 'zkcpMu8yZEZtYbmLQ8rdDe', 'zkcpMu8yZEacbXxgzL9TkZ',
  'zkcpMu8yZEbLbVt7qUiDAh', 'zkcpMu8yZEc34AvFsqmiPN', 'zkcpMu8yZEcg2rpRkkZNK7',
  'zkcpMu8yZEdRsKBNpcKNLd', 'zkcpMu8yZEeDNG86DD9twf', 'zkcpMu8yZEf3oMtHXjj8q3',
  'zkcpMu8yZEfhKkh272YNGB', 'zkcpMu8yZEgX1wAD1mAxwt', 'zkcpMu8yZEhC9EoSSKvzFy',
  'zkcpMu8yZEhuHZ9zYsFnVv', 'zkcpMu8yZEiekjCnaQhtf8', 'zkcpMu8yZEjRxY1jyeBvEX',
  'zkcpMu8yZEk8Km9v87osnH', 'zkcpMuSVG8v5HZtJMvBqNX', 'zkcpMuSVG8vqZ5HZ2oQ2oA',
  'zkcpMuSVG8wdi2NFvMUKBx', 'zkcpMuSVG8xGK3kPdbKRed', 'zkcpMuSVG8y6BqbjKEHT5r',
  'zkcpMuSVG8yoUT2qFwSy8F', 'zkcpMuSVG8zWzgwo83AP9P', 'zkcpMuSVG91DkfC6tDagfy',
  'zkcpMuSVG925GvnqC1fkdr', 'zkcpMuSVG92fMrnAT72UJb', 'zkcpMuSVG93VYjec6Zu7pH',
  'zkcpMuSVG94AnXqMsdgQKU', 'zkcpMuSVG94vScfTeFbkeG', 'zkcpMuSVG95ibzkYjhdsrH',
  'zkcpMuSVG96PA4v88n6VRn', 'zkcpMuSVG97FNxp3Jx6bpX', 'zkcpMuSVG97wfXQgXmvCcV',
  'zkcpMuSVG98exdYsFqKEbJ', 'zkcpMuSVG99KSPXmR2KJ7w', 'zkcpMuSWFffe3jRbzWLCBx',
  'zkcpMuSWFfgGrnkzyryHgZ', 'zkcpMuSWFfh8uHbZNhKggy', 'zkcpMuSWFfhqzdV4Mre24h',
  'zkcpMuSWFfiY8Ysy6Fk88N', 'zkcpMuSWFfjMy3qN6uzsoz', 'zkcpMuSWFfjz7yUobzjEJ3',
  'zkcpMuSWFfksLia9Rcfzup', 'zkcpMuSWFfmZXuDi6YMSit', 'zkcpMuSWFfnEyHAHESWEWD',
  'zkcpMuSWFfnwRkdJY8uDvK', 'zkcpMuSWFfofz5CUsb2v7D', 'zkcpMuSWFfpUUPUXYboULf',
  'zkcpMuSWFfq8TvJyuSnzqy', 'zkcpMuSWFfqzQt7U5PTUYW', 'zkcpMuSWFfritht74m4uBv',
  'zkcpMuSWFfsMjc1zgLsfoc'
];

const sorted = [
  'zkcpMu8yZEW76wabV3b3hp', 'zkcpMu8yZEWowitnvSHsT7', 'zkcpMu8yZEXdGUftrimrdn',
  'zkcpMu8yZEYEfEBq21arQ4', 'zkcpMu8yZEZ2rZzHGdenWt', 'zkcpMu8yZEZtYbmLQ8rdDe',
  'zkcpMu8yZEacbXxgzL9TkZ', 'zkcpMu8yZEbLbVt7qUiDAh', 'zkcpMu8yZEc34AvFsqmiPN',
  'zkcpMu8yZEcg2rpRkkZNK7', 'zkcpMu8yZEdRsKBNpcKNLd', 'zkcpMu8yZEeDNG86DD9twf',
  'zkcpMu8yZEf3oMtHXjj8q3', 'zkcpMu8yZEfhKkh272YNGB', 'zkcpMu8yZEgX1wAD1mAxwt',
  'zkcpMu8yZEhC9EoSSKvzFy', 'zkcpMu8yZEhuHZ9zYsFnVv', 'zkcpMu8yZEiekjCnaQhtf8',
  'zkcpMu8yZEjRxY1jyeBvEX', 'zkcpMu8yZEk8Km9v87osnH', 'zkcpMuSTH5Qyms6UNeUzEy',
  'zkcpMuSTH5RhYRusw9Aos2', 'zkcpMuSTH5SVXZ7hEnTWYo', 'zkcpMuSTH5TFYUrYfaW8d5',
  'zkcpMuSTH5U5amB67CJuEn', 'zkcpMuSTH5UoX55jgxLYJ6', 'zkcpMuSTH5VToqnRqZ5TDm',
  'zkcpMuSTH5WDBUrciCYNne', 'zkcpMuSTH5WtGFmdCRb3tU', 'zkcpMuSTH5XhX2HQhaiV1C',
  'zkcpMuSTH5YWRR29zmGbbB', 'zkcpMuSUGcAbYFfKACqKSf', 'zkcpMuSUGcBDZQLbRxVPLi',
  'zkcpMuSUGcBz7LGXvvE62f', 'zkcpMuSUGcCpebDz2r93qg', 'zkcpMuSUGcDUExCi1WSxyD',
  'zkcpMuSUGcEHoyquHbbMQh', 'zkcpMuSUGcEx9r2hU6rxgk', 'zkcpMuSUGcFggSo43U5AYu',
  'zkcpMuSUGcGWvB4eoF47q6', 'zkcpMuSUGcHATR4Rb4Qm6D', 'zkcpMuSUGcHsLmRrJHpxvr',
  'zkcpMuSUGcJboerZkgG2cW', 'zkcpMuSUGcKMfPt4PPn55G', 'zkcpMuSUGcLBcMxrsUa4A4',
  'zkcpMuSUGcLxsyJUWqdDJv', 'zkcpMuSUGcMZAYyeXaDwqq', 'zkcpMuSUGcNPvRMEVmnQwV',
  'zkcpMuSUGcP817AAoypmvr', 'zkcpMuSUGcPtGe231pX5yG', 'zkcpMuSUGcQbTcZGuH86hk',
  'zkcpMuSUGcRKNDXS6jhvCc', 'zkcpMuSUGcSAN5wPhTYJVA', 'zkcpMuSUGcSpud3wKYfMq5',
  'zkcpMuSUGcTXp6EZSSquSb', 'zkcpMuSUGcUKQ4wni8wTY1', 'zkcpMuSUGcV5VDut2nSCcZ',
  'zkcpMuSUGcViPPW3knDtGc', 'zkcpMuSUGcWWpbQG3yHbEm', 'zkcpMuSUGcXErpiTPJZ3Ww',
  'zkcpMuSUGcY1emU9UbMtzZ', 'zkcpMuSUGcYkaLTM7oTTiF', 'zkcpMuSUGcZYScihhBjBHP',
  'zkcpMuSUGcaE1UnuPFAUQV', 'zkcpMuSVG8v5HZtJMvBqNX', 'zkcpMuSVG8vqZ5HZ2oQ2oA',
  'zkcpMuSVG8wdi2NFvMUKBx', 'zkcpMuSVG8xGK3kPdbKRed', 'zkcpMuSVG8y6BqbjKEHT5r',
  'zkcpMuSVG8yoUT2qFwSy8F', 'zkcpMuSVG8zWzgwo83AP9P', 'zkcpMuSVG91DkfC6tDagfy',
  'zkcpMuSVG925GvnqC1fkdr', 'zkcpMuSVG92fMrnAT72UJb', 'zkcpMuSVG93VYjec6Zu7pH',
  'zkcpMuSVG94AnXqMsdgQKU', 'zkcpMuSVG94vScfTeFbkeG', 'zkcpMuSVG95ibzkYjhdsrH',
  'zkcpMuSVG96PA4v88n6VRn', 'zkcpMuSVG97FNxp3Jx6bpX', 'zkcpMuSVG97wfXQgXmvCcV',
  'zkcpMuSVG98exdYsFqKEbJ', 'zkcpMuSVG99KSPXmR2KJ7w', 'zkcpMuSWFffe3jRbzWLCBx',
  'zkcpMuSWFfgGrnkzyryHgZ', 'zkcpMuSWFfh8uHbZNhKggy', 'zkcpMuSWFfhqzdV4Mre24h',
  'zkcpMuSWFfiY8Ysy6Fk88N', 'zkcpMuSWFfjMy3qN6uzsoz', 'zkcpMuSWFfjz7yUobzjEJ3',
  'zkcpMuSWFfksLia9Rcfzup', 'zkcpMuSWFfmZXuDi6YMSit', 'zkcpMuSWFfnEyHAHESWEWD',
  'zkcpMuSWFfnwRkdJY8uDvK', 'zkcpMuSWFfofz5CUsb2v7D', 'zkcpMuSWFfpUUPUXYboULf',
  'zkcpMuSWFfq8TvJyuSnzqy', 'zkcpMuSWFfqzQt7U5PTUYW', 'zkcpMuSWFfritht74m4uBv',
  'zkcpMuSWFfsMjc1zgLsfoc'
];

The problem here is that zkcpMu8... is sorted before zkcpMuS.... The number 8 here is before letter S.

The base58btc alphabet indicates that it is in lexicographic order though...

https://github.com/multiformats/js-multiformats/blob/master/src/bases/base58.js#L6

  alphabet: '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'

There might be something I'm forgetting here.

@CMCDragonkai
Copy link
Member Author

I'm a bit confused here, the base58btc alphabet is in lexicographic order, but the above is where the error occurred. It seems there's more to preserving lexicographic order than just having the right alphabet order. The link https://www.codeproject.com/Articles/5165340/Sortable-Base64-Encoding says that there's more to it. It could also be due to JS's sorting algorithm is doing something strange.

I'm going to try to decode those back to ids, to confirm that they are in fact the right order with buffer comparison and see what the difference is between base58btc and doing hex encoding which should preserve order.

@CMCDragonkai
Copy link
Member Author

So after some tests, I've found that:

  1. base58btc seems already lexicographically sorted
  2. the ids being generated are the ones with wrong order... it starts as 1634787496.998 and then reverses at 1634787496 then goes to 1634787497.
  3. the sorted version is actually correctly sorted according to both buffer compare and binary string compare (which sort of means that base58btc is actually preserving order)
  4. why are the ids generated wrong though? Could there be an error extractTs?

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Oct 21, 2021

Ok there's a bug inside IdSortable that is not even about the base58 encoding atm.

Here's a script that after running some time, will eventually find an error:

import { extractTs, extractSeq } from './src/IdSortable';
import { IdSortable, utils } from './src';

const idGen = new IdSortable();

const findTheProblem = () => {
  let count = 100;
  while (count > 0) {
    const ids = [...utils.take(idGen, 100)];
    const ids_ = ids.slice();
    ids_.sort(Buffer.compare);
    for (let i = 0; i < 100; i++) {
      if (ids[i].toString() !== ids_[i].toString()) {
        console.log('FOUND AN INEQUALITY');
        console.log('COUNT', count);
        console.log('INDEX', i);
        console.log('ORIGINAL', ids[i]);
        console.log('SORTED', ids_[i]);
        console.log('ORIGINAL', extractTs(ids[i]), extractSeq(ids[i]));
        console.log('SORTED', extractTs(ids_[i]), extractSeq(ids_[i]));
        return [ids, ids_];
      }
    }
    count--;
  }
  return [[], []];
};


const [ids, ids_] = findTheProblem();

console.log(ids.map((id) => utils.toMultibase(id, 'base58btc')));
console.log(ids_.map((id) => utils.toMultibase(id, 'base58btc')));

In my case I found it after generating more than 900 ids.

The problem looks like this:

FOUND AN INEQUALITY
COUNT 91
INDEX 0
ORIGINAL IdInternal(16) [Uint8Array] [
    6, 23,  15,  31, 143, 248,
  112,  8, 181, 148, 192, 253,
  237, 19, 100, 103
]
SORTED IdInternal(16) [Uint8Array] [
    6, 23,  15, 31, 128,   0,
  112,  0, 173, 24,  21, 244,
  176, 94, 231, 60
]
UNIXTSBITS 000001100001011100001111000111111000
MSECBITS 111111111000
ORIGINAL 1634791928.998 8
UNIXTSBITS 000001100001011100001111000111111000
MSECBITS 000000000000
SORTED 1634791928 0

This shows that at index 0, the id provided by the IdSortable generator is providing a timestamp and millisecond output that is greater than the sorted version. The sorted Id comes from a different position, but it would have to be later than index 0.

As you can see the extracted TS in seconds and subseconds is 1634791928.998 for original, and then for the sorted is 1634791928. Which means the sorted one is actually more correct.

When extracting the TS, we are getting 0 bits. This means the decoding/extraction is correct. Somewhere during the encoding or generation of the ID results in generating an Id that is actually earlier according to its TS.

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Oct 21, 2021

Neither decoding nor encoding is the problem.

The problem is the toFixedPoint call.

See here are 2 ids generated one after another, and see how the encoded msecBits become all zeros.

SNEAKY {
  ts: 1634793748.9987211,
  unixts: 1634793748,
  msec: 4092,
  unixtsBits: '000001100001011100001111100100010100',
  msecBits: '111111111100',
  seqBits: '000000000110'
} {
  ts: 1634793748.9995031,
  unixts: 1634793748,
  msec: 4096,
  unixtsBits: '000001100001011100001111100100010100',
  msecBits: '000000000000',
  seqBits: '000000000000'
}

@CMCDragonkai
Copy link
Member Author

CMCDragonkai commented Oct 21, 2021

An msec of 4096 results in 12 bits of 0. An msec of 4095 would be 12 bits of 1.

The toFixedPoint is returning the largest possible msec value which is 4096 derived from a TS of 1634793748.9995031. And this results in the msec setting back to all zeros.

To fix this, we have to fix the rounding occurring in toFixedPoint, so that 4096 cannot be returned. Or if 4096 is returned, then we would have to adjust the unixts accordingly.

The maximum msec that toFixedPoint should be returning is 4095. Beyond that, the unixts has to increment.

@CMCDragonkai
Copy link
Member Author

This is now fixed. My tests show that base58btc does preserve the lexicographic sort order. So since we are using this in PK, this should be safe @tegefaulkes. Just make sure we don't use base64 encoding when we are using IdSortable in PK.

@joshuakarp this should be reminded to go into our reference documentation.

@CMCDragonkai CMCDragonkai added r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management labels Jul 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working development Standard development r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy
Development

Successfully merging a pull request may close this issue.

1 participant