Claney – regular routes

Claney – regular routes

Claney is a library for compiling a list of routes down to a set of regular expressions. A router can be implemented in ~100 lines of code in the language of your choice.

Claney is currently beta software. There are reasonably comprehensive tests, but it has not yet been used in anger.

Features

Guarantees that no two routes overlap.
Routes can be tagged and output can be limited to routes with a given tag.
Routing requires only two regular expression operations: a find/replace followed by a match.

Opinions

Claney has some opinions about routes:

Valid routes must start with /.
A sequence of two or more / characters is equivalent to a single /.
Query strings and anchors contain supplemental information and aren't used to distinguish between different routes.
No special treatment of #! anchors.

Limitations

Claney has a minimalist take on what a router is. A router is function that maps an HTTP method and path string to either 'not found' or

a name,
a set of named parameters,
a set of tags,
an optional query string and anchor.

In other words, Claney just tells you whether the route exists, which route it is, and which parameters were supplied. The rest is up to you.

Claney does not directly support matching on hostnames. The 'Hosts' section covers some options for dealing with multiple hosts. If your routing involves a complex interaction between hosts and paths, Claney is probably not a good fit.

Installation

Download a binary release:

Latest Claney release

Use ASDF (MacOS amd64 and arm64 or Linux amd64):

https://github.com/addrummond/asdf-claney

Install from source:

go install github.com/addrummond/claney@v0.4.7
$(go env GOPATH)/bin/claney -version

Example usage

claney -input my_route_file -output routes.json

Input format

The following is an example of an input file:

root          /
users         /users                       [users]
  login       [GET,POST] /login
  settings    /:user_id/settings
managers      /managers                    [managers,admin]
  login       /login
  login       [POST] /login
  settings    /:manager_id/settings
  user        /:manager_id/user/:user_id
  delete      [POST] /:manager_id/delete   [api]

This input defines the following routes as valid:

Route name	Method	Path	Parameters	Tags
root	GET or POST	/	`[]`	`[]`
users/login	GET	/users/login	`[]`	`["users"]`
users/settings	GET	/users/:user_id/settings	`["user_id]"`	`["users"]`
managers/login	GET	/managers/login	`[]`	`["admin","managers"]`
managers/login	POST	/managers/login	`[]`	`["admin","managers"]`
managers/settings	GET	/managers/settings	`[]`	`["admin","managers"]`
managers/user	GET	/managers/:manager_id/user/:user_id	`["manager_id", "user_id"]`	`["admin","managers"]`
managers/delete	POST	/managers/:manager_id/delete	`["manager_id", "user_id"]`	`["admin","api", "managers"]`

By default, / is used to join the name of a route to the names of its parent routes. A different separator may be used if desired (see 'Command line operation' below).

General syntax

Tabs or spaces can be used for indentation. It is recommended not to mix tabs and spaces, but if you do, each tab or space character contributes one unit of indentation.

Comments are indicated using # in the usual way. As the sequence of characters :# is used to specify integer parameters (see below), a # is not interpreted as the beginning of a comment if it is immediately preceded by a :.

An input line ending in \ is joined to the subsequent line (with the \ elided) and interpreted as a single logical line.

Special characters can be escaped using \ (including \\ for a literal backslash).

Input files should be UTF-8 encoded. Non-ASCII codepoints may be used as literals in URL patterns; they are are reproduced as-is inside the regular expressions.

The output JSON is UTF-8 encoded.

Route syntax

A route has the following general form:

route_name [METHOD1, METHOD2, ...] /path/pattern/:with/:parameters [list,of,tags]

Both the list of method names and the list of tags may be omitted. If no methods are explicitly specified then GET is added by default. Tags are case-sensitive. Method names are always converted to upper case.

Indentation is significant. If route A is indented under route B then B's path is joined to A's path and B inherits all of A's tags.

Route name uniqueness

You may define multiple routes with the same name, but they must be defined next to each other in the same file. The following example is OK:

posts /users/posts
# ...
# some comments or blank lines may intervene
# ...
posts /users/posts/:id

But an error is reported if another route is added in between the two posts routes:

# BAD
posts          /users/posts
something_else /foo/bar
posts          /users/posts/:id

This system retains the flexibility of allowing multiple routes to map to the same name while making it difficult to accidentally define duplicate route names.

Named parameters

Named parameters are introduced using the : character. Named parameters

never match empty strings or strings consisting only of / characters,
can appear anywhere within a URL pattern, and
are matched greedily (except in the case of rest parameters – see below).

Named string parameters

These can be written :foo, or :{foo bar} to allow whitespace and other special characters.

Named integer parameters

Integer parameters are written :#foo or :#{foo bar}.

Named rest parameters

Rest parameters are written :**foo or :**{foo bar}. Unlike normal parameters, rest parameters can match strings incuding / characters.

A rest parameter is typically the final element in a pattern, but you can also use rest parameters in the middle of a pattern. In this case matching is non-greedy. For example, the pattern /foo/:**rest/bar will match any URL that starts with /foo/ and ends with /bar – so long as a rest can be assigned a suitable value. For example, /foo/amp/bar matches, but /foo/bar does not, and neither does /foo//bar.

Examples of named parameters

The following are some examples of URL patterns containing named parameters:

/users/:email/summary
/users/:#user_id/comments/:#comment_id
/users/:#user_id/comment-:#comment_id
/users/comment:{uuid}-summary
/users/:**rest

Wildcards

The wildcards * and ** can be used in URL patterns. They are respectively equivalent to string parameters and rest parameters, except that they are unnamed and their values are discarded.

Trailing slashes

If a route pattern doesn't end with a / then a trailing / is optional. For example, the pattern /users/:id matches both /users/123 and /users/123/. If a pattern ends with a / then a trailing / is obligatory. You can use the special sequence !/ to disallow trailing slashes. For example, the pattern /users/:id!/ matches /users/123 but not /users/123/.

Multiple slashes

Claney always treats sequences of multiple slashes as equivalent to a single slash. For example, //foo///bar// is equivalent to /foo/bar/.

Command line operation

Claney reads from stdin and writes to stdout by default. An input file or output file may be specified using the -input and -output flags:

claney < input.routes > output.json
claney -input input.routes -output output.json

The -name-separator flag may be used to change the separator used to delimit hierarchical route names. The default is "/".

Claney guarantees that dictionary keys in its JSON output are always serialized in the same order. Output is therefore guaranteed to be identical for identical inputs.

Multiple input files

The -input flag can be passed multiple times to generate output on the basis of multiple input files. The output obtained is essentialy the same as if the input files were concatenated into one. The only difference is that Claney considers the least indented routes in each file to be at the top level, even if the least indented route in one file is more indented than the least indented route in another. In other words, given the following two input files, the router recognizes /foo and /bar, not /foo/bar:

<input file 1>
/foo

<input file 2>
    /bar

Filtering the output

The output can be filtered using the -filter option to include or exclude routes with certain methods or tags. For example:

claney -input input.routes -output output.json -filter 'manager-*|api'

The filter expression manager-*|api includes only routes that have a tag that matches the glob manager-* or that have the api tag. The following operators can be used to contruct filter expressions:

& – and
| - or
! - not

The & and | operators are left-associative and have equal precedence; ! binds tighter. Parentheses may be used. For example, the following expression includes only those routes which have the api tag and which do not have the client or employee tag:

api&!(client|employee)

Methods can be specified by enclosing the name of the method in []. For example, the following expression includes only PUT or POST routes with the api tag:

([PUT]|[POST])&api

In the case of routes with multiple methods, each method is treated independently for filtering. For example, for the route foo [GET,POST] /foo, the option -filter '[GET]' generates a router that recognizes GET /foo but not POST /foo.

A \ can be used to escape spaces, special characters, and * within a glob. Globs cannot be used with methods. All special characters can be surrounded by whitespace, so that e.g. the following two expressions are valid and equivalent:

api&!(client|[GET])
api & ! ( client | [ GET ] )

Hosts

Claney does not directly support matching on hostnames. If your routing involves a complex interaction between hosts and paths, Claney is probably not a good fit.

For simple cases there are three options:

Define a separate router for each host. This makes sense for cases such as foo.com and api.foo.com.
Add the hostname to the beginning of the path before matching (e.g. /foo.com/home).
Tag each route with the host(s) where it is valid.

An example of the third option is the following:

routeA /foo [host:host1.foo.com, host:host2.foo.com]
routeB /bar [host:host1.foo.com]

You can add logic in your router to 404 if the host doesn't match one of the host:* tags. Alternatively, you can use filtering to generate a separate router for each host:

claney -input routes -filter 'host:host1.foo.com' -output just_host1.json
claney -input routes -filter 'host:host2.foo.com' -output just_host2.json
claney -input routes -filter 'host:*' -output all_hosts.json

Case sensitivity

By default Claney operates in case-insensitive mode. Routes must be defined using only lower case characters, and the router implementations normalize URLs to lower case before matching (excluding the query string and anchor).

If you want your router to be case-sensitive, pass the -allow-upper-case flag to claney and ensure that to your router implementation does not normalize URLs before matching. The router constructors in the Go and JavaScript router implementations take a boolean caseSensitive parameter.

Benchmarking suggests that the cost of normalization is trivial (in JavaScript and Go at least) compared to the rest of the computation required to match a route.

Implementation

Routing is a two-step process. The first step is a find/replace using a single 'God' regular expression that matches valid routes. The result of the find/replace is a string containing all of the constant portions of the input route. For example, if the route is something like /manager/10/settings, then the constant portion would be /manager//settings. In most cases, the constant portion uniquely identifies a route; if so, a dictionary lookup is performed to retrieve a regex for matching the route string and extracting its parameter values. If multiple routes share the same constant portion then the matching regex is disjunctive and matches all applicable routes. The route can then be identified by the indices of the groups that have non-empty captures.

In the regexp used for the second step, capture groups are nested according to the scheme of a binary tree. For example, suppose that there are six routes (R1...R6) in a group. The portion of the regexp corresponding to each route is wrapped in a capture group. The smallest complete binary tree that can hold 6 values in its leaf nodes has a depth of 3. The capture groups are therefore nested as follows:

(                            )(                           )
 (            )(            )  (    R5     ) (    R6     )
  ( R1 )( R2 )  ( R3 )( R4 )

The matching route (if any) can be located via binary search. For example, if the leftmost capture group on the first line is empty then we know that either R5 or R6 is the matching route.

There are some edge cases where an invalid route will match the initial 'God' regular expression but then fail to match the second regular expression. Routers should interpret this scenario as a 404.

Performance

Claney generates a single disjunctive regex representing the entire set of valid routes. Regex engines are generally not well optimized for massively disjunctive regular expressions. Even trivial cases such as foo|bar|foobar|baz will often trigger unnecessary backtracking and performance linear in the number of alternatives.

Claney does a couple of things to make its disjunctive regular expressions execute as quickly as possible. First, if routes are specified hierachically in the input file, the common prefixes are factored out in the regular expression. This limits the amount of backtracking required.

The Javascript benchmark in js/router.bench.js gives a rough idea of the performance that can be expected. The input file contains n routes of the form /${m}foo for m=1..n (a fairly pessimal case, given the lack of hierarchy). On an M1 Macbook Air, the following times per routing operation are observed:

10 routes:    0.0003  milliseconds (per routing operation)
100 routes:   0.0004  milliseconds
1000 routes:  0.0019  milliseconds
10000 routes: 0.0164  milliseconds

Decomposing routers

Claney does not provide any special facilty for 'including' one router inside another, but it is easy to use rest parameters to decompose one router into multiple subrouters, as in the following example.

Main routes file:

managers /managers/:**{path} [managers]
# add the line below if you want `/managers` to be a valid route,
# as rest parameters do not match empty strings.
managers /managers          [managers]

clients /clients/:**{path} [clients]

Manager routes file:

foo /foo
bar /bar

Client routes file:

amp /amp
baz /baz

Router code:

const mainRouter = new Router(MAIN_ROUTES_JSON);
const managerRouter = new Router(MANAGER_ROUTES_JSON);
const clientRouter = new Router(CLIENT_ROUTES_JSON);

function route(path) {
  const r = mainRouter.route(path);
  if (r === null)
    return null;
  const subroutePath = r.params.path || '/';

  // you might want to add additional metadata to the return value
  // to indicate which router matched the route.
  if (r.tags.indexOf("managers") !== -1)
    return managerRouter.route(subroutePath);
  if (r.tags.indexOf("clients") !== -1)
    return clientRouter.route(subroutePath);
  return null;
}

Router implementations

Javascript and Go router implementations are provided in js/router.js and router/router.go. There is documentation for the Go implementation here.

There is a simple example of integrating the router into a single-page React app in js/react_example. In this directory, run build.sh and then startserver.sh to start a server on localhost:80001. (CORS issues make it necessary to use a server.) The app uses the library in js/reactmicrorouter.js.

Name

Claney is named after Stephen Cole Kleene (whose last name is pronounced [ˈkleɪni]).

Name		Name	Last commit message	Last commit date
Latest commit History 130 Commits
.github/workflows		.github/workflows
compiler		compiler
examples		examples
glob		glob
js		js
router		router
.gitignore		.gitignore
.tool-versions		.tool-versions
LICENSE		LICENSE
Readme.md		Readme.md
go.mod		go.mod
go.sum		go.sum
main.go		main.go
main_test.go		main_test.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Claney – regular routes

Features

Opinions

Limitations

Installation

Example usage

Input format

General syntax

Route syntax

Route name uniqueness

Named parameters

Named string parameters

Named integer parameters

Named rest parameters

Examples of named parameters

Wildcards

Tags

Trailing slashes

Multiple slashes

Command line operation

Multiple input files

Filtering the output

Hosts

Case sensitivity

Implementation

Performance

Decomposing routers

Router implementations

Name

About

Releases 5

Languages

License

addrummond/claney

Folders and files

Latest commit

History

Repository files navigation

Claney – regular routes

Features

Opinions

Limitations

Installation

Example usage

Input format

General syntax

Route syntax

Route name uniqueness

Named parameters

Named string parameters

Named integer parameters

Named rest parameters

Examples of named parameters

Wildcards

Tags

Trailing slashes

Multiple slashes

Command line operation

Multiple input files

Filtering the output

Hosts

Case sensitivity

Implementation

Performance

Decomposing routers

Router implementations

Name

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 5

Languages