Url arbitrary #302

Closed
dubzzz opened this issue Feb 5, 2019 · 3 comments · Fixed by #344

Comments

dubzzz commented Feb 5, 2019

No description provided.

dubzzz commented Mar 22, 2019

A running draft of the proposal can be found on https://runkit.com/dubzzz/urls

The final arbitrary will need to let users customize some parts of the URL (and the code will need to be cleaner ^^):

  • authentication (userinfo) or not
  • port or not
  • all schemes or a limited list of schemes
  • host kind: any of domain, IPv4, IPv6 and IPvFuture, or only domain, or only domain with a specific extension
  • query parameters or not
  • fragment or not

Copy of the draft:

const fc = require('fast-check');

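// https://tools.ietf.org/html/rfc3986#section-2.3
// unreserved characters cover the ranges of:
// - ALPHA (%41-%5A and %61-%7A),
// - DIGIT (%30-%39),
// - hyphen (%2D),
// - period (%2E),
// - underscore (%5F),
// - or tilde (%7E)
//
// multiMapArb picks one value among all those described by the builders:
// each builder declares how many values it covers (num) and how to build the v-th one (build)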
const multiMapArb = (...builders) => {
    const numChoices = builders.reduce((acc, b) => acc + b.num, 0);
    return fc.nat(numChoices - 1)
        .map(choice => {
            let idx = -1;
            let numSkips = 0;
            while (choice >= numSkips) {
                numSkips += builders[++idx].num;
            }
            return builders[idx].build(choice - numSkips + builders[idx].num);
        });
};
const multiMapBuilder = (items) => {
    return { num: items.length, build: v => items[v] };
};
const alphaChar = multiMapArb(
    { num: 26, build: v => String.fromCharCode(v + 0x41) },
    { num: 26, build: v => String.fromCharCode(v + 0x61) },
);
//console.log(fc.sample(alphaChar));
const schemeValidCharArb = multiMapArb(
    { num: 26, build: v => String.fromCharCode(v + 0x41) },
    { num: 26, build: v => String.fromCharCode(v + 0x61) },
    { num: 10, build: v => String.fromCharCode(v + 0x30) },
    multiMapBuilder(['+', '-', '.']),
);
//console.log(fc.sample(schemeValidCharArb));

// scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
const schemeArb = fc.tuple(alphaChar, fc.stringOf(schemeValidCharArb)).map(([t, q]) => t + q);
//console.log(fc.sample(schemeArb));

// IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
const ipV4Arb = fc.tuple(fc.nat(255), fc.nat(255), fc.nat(255), fc.nat(255))
    .map(([a, b, c, d]) => `${a}.${b}.${c}.${d}`);
//console.log(fc.sample(ipV4Arb));

// h16 = 1*4HEXDIG
// ls32 = ( h16 ":" h16 ) / IPv4address
//   IPv6address   =                            6( h16 ":" ) ls32
//                 /                       "::" 5( h16 ":" ) ls32
//                 / [               h16 ] "::" 4( h16 ":" ) ls32
//                 / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
//                 / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
//                 / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
//                 / [ *4( h16 ":" ) h16 ] "::"              ls32
//                 / [ *5( h16 ":" ) h16 ] "::"              h16
//                 / [ *6( h16 ":" ) h16 ] "::"
const h16Arb = fc.stringOf(fc.hexa(), 1, 4);
const ls32Arb = fc.oneof(
    fc.tuple(h16Arb, h16Arb).map(([a, b]) => `${a}:${b}`),
    ipV4Arb
);
const ipV6Arb = fc.oneof(
    fc.tuple(fc.array(h16Arb, 6, 6), ls32Arb)
        .map(([eh, l]) => `${eh.join(':')}:${l}`),
    fc.tuple(fc.array(h16Arb, 5, 5), ls32Arb)
        .map(([eh, l]) => `::${eh.join(':')}:${l}`),
    fc.tuple(fc.array(h16Arb, 0, 1), fc.array(h16Arb, 4, 4), ls32Arb)
        .map(([bh, eh, l]) => `${bh.join(':')}::${eh.join(':')}:${l}`),
    fc.tuple(fc.array(h16Arb, 0, 2), fc.array(h16Arb, 3, 3), ls32Arb)
        .map(([bh, eh, l]) => `${bh.join(':')}::${eh.join(':')}:${l}`),
    fc.tuple(fc.array(h16Arb, 0, 3), fc.array(h16Arb, 2, 2), ls32Arb)
        .map(([bh, eh, l]) => `${bh.join(':')}::${eh.join(':')}:${l}`),
    fc.tuple(fc.array(h16Arb, 0, 4), h16Arb, ls32Arb)
        .map(([bh, eh, l]) => `${bh.join(':')}::${eh}:${l}`),
    fc.tuple(fc.array(h16Arb, 0, 5), ls32Arb)
        .map(([bh, l]) => `${bh.join(':')}::${l}`),
    fc.tuple(fc.array(h16Arb, 0, 6), h16Arb)
        .map(([bh, eh]) => `${bh.join(':')}::${eh}`),
    fc.tuple(fc.array(h16Arb, 0, 7))
        .map(([bh]) => `${bh.join(':')}::`),
);
//console.log(fc.sample(ipV6Arb));

// IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
// unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
// sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
const ipFutureValidCharArb = multiMapArb(
    { num: 26, build: v => String.fromCharCode(v + 0x41) },
    { num: 26, build: v => String.fromCharCode(v + 0x61) },
    { num: 10, build: v => String.fromCharCode(v + 0x30) },
    multiMapBuilder(["-", ".", "_", "~", "!", "$", "&", "'", "(", ")", "*", "+", ",", ";", "=", ':']),
);
const ipFutureArb = fc.tuple(fc.hexaString(1, 10), fc.stringOf(ipFutureValidCharArb, 1, 10))
    .map(([a, b]) => `v${a}.${b}`);
//console.log(fc.sample(ipFutureArb));

// reg-name      = *( unreserved / pct-encoded / sub-delims )
const regNameArb = fc.stringOf(
    fc.oneof(
        multiMapArb(
            { num: 26, build: v => String.fromCharCode(v + 0x41) },
            { num: 26, build: v => String.fromCharCode(v + 0x61) },
            { num: 10, build: v => String.fromCharCode(v + 0x30) },
            multiMapBuilder(["-", ".", "_", "~", "!", "$", "&", "'", "(", ")", "*", "+", ",", ";", "="]),
        ),
        fc.hexaString(2, 2).map(v => `%${v}`)
    )
);
//console.log(fc.sample(regNameArb));

// IP-literal    = "[" ( IPv6address / IPvFuture  ) "]"
// host          = IP-literal / IPv4address / reg-name
const hostArb = fc.oneof(
    ipV6Arb.map(ip => `[${ip}]`),
    ipFutureArb.map(ip => `[${ip}]`),
    ipV4Arb,
    regNameArb,
);
//console.log(fc.sample(hostArb));

// userinfo      = *( unreserved / pct-encoded / sub-delims / ":" )
const userInfoArb = fc.stringOf(
    fc.oneof(
        multiMapArb(
            { num: 26, build: v => String.fromCharCode(v + 0x41) },
            { num: 26, build: v => String.fromCharCode(v + 0x61) },
            { num: 10, build: v => String.fromCharCode(v + 0x30) },
            multiMapBuilder(["-", ".", "_", "~", "!", "$", "&", "'", "(", ")", "*", "+", ",", ";", "=", ":"]),
        ),
        fc.hexaString(2, 2).map(v => `%${v}`)
    )
);
//console.log(fc.sample(userInfoArb));

// port        = *DIGIT
const portArb = fc.nat(65535).map(v => String(v));

// authority     = [ userinfo "@" ] host [ ":" port ]
const authorityArb = fc.tuple(
    fc.option(userInfoArb),
    hostArb,
    fc.option(portArb)
).map(([u, h, p]) => (u === null ? '' : `${u}@`) + h + (p === null ? '' : `:${p}`));
//console.log(fc.sample(authorityArb));

// pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
// segment       = *pchar
// segment-nz    = 1*pchar
// path-abempty  = *( "/" segment )
// path-absolute = "/" [ segment-nz *( "/" segment ) ]
// path-rootless = segment-nz *( "/" segment )
// path-empty    = 0<pchar>
// hier-part     = "//" authority path-abempty
//               / path-absolute
//               / path-rootless
//               / path-empty
const pcharArb = fc.oneof(
    multiMapArb(
        { num: 26, build: v => String.fromCharCode(v + 0x41) },
        { num: 26, build: v => String.fromCharCode(v + 0x61) },
        { num: 10, build: v => String.fromCharCode(v + 0x30) },
        multiMapBuilder(["-", ".", "_", "~", "!", "$", "&", "'", "(", ")", "*", "+", ",", ";", "=", ":", "@"]),
    ),
    fc.hexaString(2, 2).map(v => `%${v}`)
);
const pathAbEmptyArb = fc.array(fc.stringOf(pcharArb))
    .map(p => p.map(v => `/${v}`).join(''));
const pathRootLessArb = fc.tuple(
    fc.stringOf(pcharArb, 1, 10),
    fc.array(fc.stringOf(pcharArb)),
).map(([s, p]) => s + p.map(v => `/${v}`).join(''));
const pathAbsoluteArb = fc.oneof(pathRootLessArb.map(p => `/${p}`), fc.constant('/'));
const hierPartArb = fc.oneof(
    fc.tuple(authorityArb, pathAbEmptyArb).map(([a, p]) => `//${a}${p}`),
    pathAbsoluteArb,
    pathRootLessArb,
    fc.constant('')
);
//console.log(fc.sample(hierPartArb));

// query         = *( pchar / "/" / "?" )
// fragment      = *( pchar / "/" / "?" )
const queryFragmentCharArb = fc.oneof(
    multiMapArb(
        { num: 26, build: v => String.fromCharCode(v + 0x41) },
        { num: 26, build: v => String.fromCharCode(v + 0x61) },
        { num: 10, build: v => String.fromCharCode(v + 0x30) },
        multiMapBuilder(["-", ".", "_", "~", "!", "$", "&", "'", "(", ")", "*", "+", ",", ";", "=", ":", "@", "/", "?"]),
    ),
    fc.hexaString(2, 2).map(v => `%${v}`)
);
const queryArb = fc.stringOf(queryFragmentCharArb);
const fragmentArb = fc.stringOf(queryFragmentCharArb);

// absolute-URI  = scheme ":" hier-part [ "?" query ]
// URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
const absoluteUriArb = fc.tuple(schemeArb, hierPartArb, fc.option(queryArb))
    .map(([s, h, q]) => `${s}:${h}` + (q === null ? '' : `?${q}`));
const uriArb = fc.tuple(schemeArb, hierPartArb, fc.option(queryArb), fc.option(fragmentArb))
    .map(([s, h, q, f]) => `${s}:${h}` + (q === null ? '' : `?${q}`) + (f === null ? '' : `#${f}`));
console.log(fc.sample(uriArb));
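
The `multiMapArb` helper in the draft maps a single integer onto one of several character ranges. As a rough illustration of that index arithmetic, here is a standalone sketch that does not depend on fast-check. The names `pickFromBuilders`, `range` and `list` are hypothetical and exist only for this example.

```javascript
// Standalone sketch of the index-to-builder mapping used by multiMapArb.
// pickFromBuilders, range and list are illustrative names, not fast-check APIs.
const range = (start, num) => ({ num, build: v => String.fromCharCode(start + v) });
const list = items => ({ num: items.length, build: v => items[v] });

function pickFromBuilders(builders, choice) {
    let offset = choice;
    for (const b of builders) {
        // offset falls inside the slice covered by this builder
        if (offset < b.num) return b.build(offset);
        // otherwise skip the whole slice and try the next builder
        offset -= b.num;
    }
    throw new RangeError(`choice ${choice} is out of bounds`);
}

// Same layout as schemeValidCharArb: A-Z, a-z, 0-9, then '+', '-', '.'
const schemeChars = [range(0x41, 26), range(0x61, 26), range(0x30, 10), list(['+', '-', '.'])];
console.log(pickFromBuilders(schemeChars, 0));  // first value of the first range
console.log(pickFromBuilders(schemeChars, 26)); // first value of the second range
console.log(pickFromBuilders(schemeChars, 64)); // last value of the explicit list
```

Because every value owns exactly one index, drawing the index uniformly (as `fc.nat(numChoices - 1)` does in the draft) gives each character the same weight, which would not be the case with a plain `fc.oneof` over ranges of different sizes.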

dubzzz commented Mar 26, 2019

Second iteration: https://runkit.com/dubzzz/urls-v2

There is still an open question concerning the parameters that a webUrl generator should offer.

const fc = require('fast-check');

const multiMapArb = (...builders) => {
    const numChoices = builders.reduce((acc, b) => acc + b.num, 0);
    return fc.nat(numChoices - 1)
        .map(choice => {
            let idx = -1;
            let numSkips = 0;
            while (choice >= numSkips) {
                numSkips += builders[++idx].num;
            }
            return builders[idx].build(choice - numSkips + builders[idx].num);
        });
};
const multiMapBuilder = (items) => {
    return { num: items.length, build: v => items[v] };
};

const alphaNumericArb = multiMapArb(
    { num: 26, build: v => String.fromCharCode(v + 0x61) },
    { num: 10, build: v => String.fromCharCode(v + 0x30) },
);
const alphaNumericHyphenArb = multiMapArb(
    { num: 26, build: v => String.fromCharCode(v + 0x61) },
    { num: 10, build: v => String.fromCharCode(v + 0x30) },
    { num: 1, build: v => '-' },
);

function subdomain() {
    return fc.tuple(
        alphaNumericArb,
        fc.option(
            fc.tuple(
                fc.stringOf(alphaNumericHyphenArb),
                alphaNumericArb
            )
        )
    )
    .map(([f, d]) => d === null ? f : `${f}${d[0]}${d[1]}`)
    .filter(d => d.length <= 63);
}

// customPrefix / customSuffix (or extension)
function domain(extensions) {
    const defaultLabels = fc.array(subdomain(), 1, 5);
    const labels = extensions === undefined
        ? defaultLabels
        : fc.tuple(defaultLabels, fc.constantFrom(...extensions)).map(([l, ext]) => l.concat([ext]));
    return labels
        .map(l => l.join('.'))
        .filter(d => d.length <= 255);
}
//console.log(fc.sample(domain()));
//console.log(fc.sample(domain(['com', 'fr'])));

const alphaChar = multiMapArb(
    { num: 26, build: v => String.fromCharCode(v + 0x41) },
    { num: 26, build: v => String.fromCharCode(v + 0x61) },
);

const percentCharArb = fc.fullUnicode()
    .map(c => {
        const encoded = encodeURIComponent(c);
        return c !== encoded
            ? encoded
            : `%${c.charCodeAt(0).toString(16)}`; // always %xy
    });

const userInfoArb = fc.stringOf(
    fc.frequency(
        {
            weight: 10,
            arbitrary: multiMapArb(
                { num: 26, build: v => String.fromCharCode(v + 0x41) },
                { num: 26, build: v => String.fromCharCode(v + 0x61) },
                { num: 10, build: v => String.fromCharCode(v + 0x30) },
                multiMapBuilder(["-", ".", "_", "~", "!", "$", "&", "'", "(", ")", "*", "+", ",", ";", "=", ":"]),
            ),
        },
        {
            weight: 1,
            arbitrary: percentCharArb,
        }
    )
);

// port        = *DIGIT
const portArb = fc.nat(65535).map(v => String(v));

// IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
const ipV4Arb = fc.tuple(fc.nat(255), fc.nat(255), fc.nat(255), fc.nat(255))
    .map(([a, b, c, d]) => `${a}.${b}.${c}.${d}`);

// h16 = 1*4HEXDIG
// ls32 = ( h16 ":" h16 ) / IPv4address
// IPv6address   =                            6( h16 ":" ) ls32
//               /                       "::" 5( h16 ":" ) ls32
//               / [               h16 ] "::" 4( h16 ":" ) ls32
//               / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
//               / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
//               / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
//               / [ *4( h16 ":" ) h16 ] "::"              ls32
//               / [ *5( h16 ":" ) h16 ] "::"              h16
//               / [ *6( h16 ":" ) h16 ] "::"
const h16Arb = fc.stringOf(fc.hexa(), 1, 4);
const ls32Arb = fc.oneof(
    fc.tuple(h16Arb, h16Arb).map(([a, b]) => `${a}:${b}`),
    ipV4Arb
);
const ipV6Arb = fc.oneof(
    fc.tuple(fc.array(h16Arb, 6, 6), ls32Arb)
        .map(([eh, l]) => `${eh.join(':')}:${l}`),
    fc.tuple(fc.array(h16Arb, 5, 5), ls32Arb)
        .map(([eh, l]) => `::${eh.join(':')}:${l}`),
    fc.tuple(fc.array(h16Arb, 0, 1), fc.array(h16Arb, 4, 4), ls32Arb)
        .map(([bh, eh, l]) => `${bh.join(':')}::${eh.join(':')}:${l}`),
    fc.tuple(fc.array(h16Arb, 0, 2), fc.array(h16Arb, 3, 3), ls32Arb)
        .map(([bh, eh, l]) => `${bh.join(':')}::${eh.join(':')}:${l}`),
    fc.tuple(fc.array(h16Arb, 0, 3), fc.array(h16Arb, 2, 2), ls32Arb)
        .map(([bh, eh, l]) => `${bh.join(':')}::${eh.join(':')}:${l}`),
    fc.tuple(fc.array(h16Arb, 0, 4), h16Arb, ls32Arb)
        .map(([bh, eh, l]) => `${bh.join(':')}::${eh}:${l}`),
    fc.tuple(fc.array(h16Arb, 0, 5), ls32Arb)
        .map(([bh, l]) => `${bh.join(':')}::${l}`),
    fc.tuple(fc.array(h16Arb, 0, 6), h16Arb)
        .map(([bh, eh]) => `${bh.join(':')}::${eh}`),
    fc.tuple(fc.array(h16Arb, 0, 7))
        .map(([bh]) => `${bh.join(':')}::`),
);

function authority(settings) {
    // authority = [ userinfo "@" ] host [ ":" port ]
    const hostnameArbs = [domain(settings.withExtensions)]
        .concat(settings.withIPv4 === true ? [ipV4Arb] : [])
        .concat(settings.withIPv6 === true ? [ipV6Arb.map(ip => `[${ip}]`)] : []);
    return fc.tuple(
        settings.withUserInfo === true ? fc.option(userInfoArb) : fc.constant(null),
        fc.oneof(...hostnameArbs),
        settings.withPort === true ? fc.option(portArb) : fc.constant(null),
    ).map(([u, h, p]) => (u === null ? '' : `${u}@`) + h + (p === null ? '' : `:${p}`));
}

// scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
const schemeValidCharArb = multiMapArb(
    { num: 26, build: v => String.fromCharCode(v + 0x41) },
    { num: 26, build: v => String.fromCharCode(v + 0x61) },
    { num: 10, build: v => String.fromCharCode(v + 0x30) },
    multiMapBuilder(['+', '-', '.']),
);
const schemeArb = fc.tuple(alphaChar, fc.stringOf(schemeValidCharArb)).map(([t, q]) => t + q);

// pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
// segment       = *pchar
// path-abempty  = *( "/" segment )
// hier-part     = "//" authority path-abempty
const pcharArb = fc.frequency(
    {
        weight: 10,
        arbitrary: multiMapArb(
            { num: 26, build: v => String.fromCharCode(v + 0x41) },
            { num: 26, build: v => String.fromCharCode(v + 0x61) },
            { num: 10, build: v => String.fromCharCode(v + 0x30) },
            multiMapBuilder(["-", ".", "_", "~", "!", "$", "&", "'", "(", ")", "*", "+", ",", ";", "=", ":", "@"]),
        ),
    },
    {
        weight: 1,
        arbitrary: percentCharArb,
    }
);
const pathAbEmptyArb = fc.array(fc.stringOf(pcharArb))
    .map(p => p.map(v => `/${v}`).join(''));

// query         = *( pchar / "/" / "?" )
// fragment      = *( pchar / "/" / "?" )
const queryFragmentCharArb = fc.frequency(
    {
        weight: 10,
        arbitrary: multiMapArb(
            { num: 26, build: v => String.fromCharCode(v + 0x41) },
            { num: 26, build: v => String.fromCharCode(v + 0x61) },
            { num: 10, build: v => String.fromCharCode(v + 0x30) },
            multiMapBuilder(["-", ".", "_", "~", "!", "$", "&", "'", "(", ")", "*", "+", ",", ";", "=", ":", "@", "/", "?"]),
        ),
    },
    {
        weight: 1,
        arbitrary: percentCharArb,
    }
);
const queryArb = fc.stringOf(queryFragmentCharArb);
const fragmentArb = fc.stringOf(queryFragmentCharArb);

function webUrl(settings) {
    return fc.tuple(
        settings.schemeArbitrary || fc.constantFrom('http', 'https'),
        authority(settings),
        pathAbEmptyArb,
        settings.withQueryParameters === true ? fc.option(queryArb) : fc.constant(null),
        settings.withFragments === true ? fc.option(fragmentArb) : fc.constant(null),
    ).map(([s, a, p, q, f]) => `${s}://${a}${p}${q === null ? '' : `?${q}`}${f === null ? '' : `#${f}`}`);
}
console.log(JSON.stringify(fc.sample(
    webUrl({
        withUserInfo: true,
        withPort: true,
        withIPv4: true,
        withIPv6: true,
        withQueryParameters: true,
        withFragments: true,
        withExtensions: ['com', 'fr'],
    }), 100
)));

// can be tested using decodeURI
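
The closing remark about `decodeURI` could translate into a check like the one below. It is only a sketch: `isDecodable` is a hypothetical helper, and it relies on `decodeURI` throwing a `URIError` when a percent-encoded sequence is not valid UTF-8. That behaviour may also be why this iteration derives percent-encoded characters from `encodeURIComponent` rather than from two arbitrary hexadecimal digits, although the thread does not say so explicitly.

```javascript
// Hypothetical helper: true when every percent-encoded sequence decodes as UTF-8.
function isDecodable(url) {
    try {
        decodeURI(url);
        return true;
    } catch (err) {
        // decodeURI signals malformed sequences with URIError
        if (err instanceof URIError) return false;
        throw err;
    }
}

console.log(isDecodable('http://example.com/%E4%BD%A0')); // complete UTF-8 sequence
console.log(isDecodable('http://example.com/%E4'));       // truncated UTF-8 sequence
```

Assuming the `webUrl` function from the draft above, such a predicate could then be passed to a property, for instance `fc.assert(fc.property(webUrl(settings), isDecodable))`.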

dubzzz added a commit that referenced this issue Apr 1, 2019
dubzzz removed the feature label Apr 4, 2019

dubzzz commented Apr 5, 2019

The implementation of the URL arbitraries is pending updates on tickets:

They do not seem to fully agree with RFC 1123 on the definition of a valid domain.

EDIT

In the remarks of the RFC we can read:

However, a valid host name can never have the dotted-decimal form #.#.#.#, since at least the highest-level component label will be alphabetic.
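
To make that remark concrete, here is a simplified host name check. It is a sketch under stated assumptions rather than a full RFC 1123 validator: `isHostName` is a hypothetical helper, the 63 and 255 character limits mirror the filters used in the second draft, and the top-level label is treated as letters only, which is stricter than what real registries allow.

```javascript
// Simplified host name check inspired by RFC 1123; isHostName is a hypothetical helper.
// Assumption: the top-level label must be letters only (stricter than real registries,
// which also accept punycode labels for instance).
const LABEL = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/i; // 1 to 63 chars, no edge hyphen

function isHostName(name) {
    if (name.length === 0 || name.length > 255) return false;
    const labels = name.split('.');
    if (!labels.every(l => LABEL.test(l))) return false;
    // an alphabetic top-level label rules out the dotted-decimal form #.#.#.#
    return /^[a-z]+$/i.test(labels[labels.length - 1]);
}

console.log(isHostName('www.example.com'));
console.log(isHostName('127.0.0.1'));
console.log(isHostName('-bad-.com'));
```

Note that `domain()` in the second draft, when called without extensions, builds every label from letters and digits, so it can produce values such as `1.2.3.4` that a check like this one would reject.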

dubzzz added a commit that referenced this issue Apr 10, 2019
dubzzz added a commit that referenced this issue Apr 11, 2019
dubzzz added a commit that referenced this issue Apr 11, 2019
dubzzz added a commit that referenced this issue Apr 11, 2019