The Great Refactoring™️ #165

kasperisager · 2019-12-17T10:17:18Z

This pull request will chronicle the rather large refactor that I've been working on for the past two months. Many of the changes mentioned here will also warrant a separate ADR; one purpose of this pull request is identifying exactly which changes do so.

Let me start by addressing the most obvious question...

Why on earth?

More than two years ago, Alfa was born on a simple premise as outlined in ADR 2: Work with static data to provide reliable and reproducible accessibility audits. The way this was implemented in practice was by operating on a strict subset of DOM and CSSOM and to ensure that this subset was directly serialisable to JSON. By keeping the core data format simple to produce and pass around, we however had to introduce a bit of complexity on the consuming side:

As a JSON serialisable format meant no circular references, the DOM subset could not include parent pointers. This meant that every function that needed access to parents had to be passed a top level "context" node from which parent pointers could be inferred. Among other things, this lead to fairly complex traversal APIs as evidenced by Turn visitor functions into generators #147.
Many DOM and CSSOM properties are optional and can therefore be set to null and/or undefined. Per ADR 3, this was perfectly "safe" as the TypeScript compiler would ensure that we guarded against dereferencing something nullable, but it littered our code with nullable checks. Even worse, nullable values are so ingrained in the JavaScript ecosystem that the TypeScript compiler will actually not complain about excessive checking.
As the data format only contained the bare minimum needed to represent DOM and CSSOM, a lot of additional functionality had to be provided on top. For the most part this was fine, but we often ran into situations where the functionality would be rather expensive, such as constructing parent pointers, and therefore had to be implemented using a cache.

While these complexities initially amounted to only a small overhead, the overhead quickly increased as the code base grew. After working on #147, it became clear to me that something had to be done. The following sections will outline all the major changes that were made to reduce the overall complexity of Alfa to a point where I can once again consider its future development sustainable. While all of the above points will be addressed, I'll also go into detail about some other long-needed changes that made their way in.

Major changes

Parent pointers and JSON serialisable DOM

In order to provide both parent pointers and JSON serialisable nodes, DOM and CSSOM types are now implemented as classes that all implement the toJSON() method. This method returns a canonical JSON representation of nodes that can also be used to translate back into concrete nodes.

This also means that Alfa no longer works on a strict subset of DOM and CSSOM, which has afforded us the possibility to fix some of the "quirks" in its design. For example, many properties now have shorter names and are implemented as methods, such as node.parent() instead of of node.parentNode, and many of these methods optionally allow specifying traversal parameters, such as node.parent({ flattened: true }).

To actually construct these DOM and CSSOM types, constructor functions are used:

Element.of(
  Option.of(Namespace.HTML), // Namespace
  None,                      // Prefix
  "div",                     // Name
  self => [                  // Attributes
    Attribute.of(
      None,                  // Namespace
      None,                  // Prefix
      "title",               // Name
      "Hello world",         // Value
      Option.of(self)        // Owner
    )
  ],
  self => [                  // Children
    Text.of(
      "Hello world",         // Data
      Option.of(self)        // Parent
    )
  ]
);

When using JSX, each JSX element is translated to the canonical JSON representation of an element, which can then be further translated to an actual element:

Element.fromElement(
  <div title="Hello world">
    Hello world
  </div>
);

With this new design, traversal is also trivial to implement without exposing any additional APIs for doing so. For example, we can satisfy all traversal requirements laid out in #147 using a simple generator function:

function *visit(node: Node): Generator<void, boolean> {
  if (shouldSkipSubtree) {
    return true;
  }

  if (shouldExit) {
    return false;
  }

  // Pre-order code

  for (const child of node.children()) {
    const shouldContinue = yield* visit(child);

    if (!shouldContinue) {
      return false;
    }
  }

  // Post-order code

  return true;
}

Replacing nullables with the `Option<T>` type

In the previous section, you might have noticed the use of Option.of(...) and None. These are examples of the Option<T> type in use, which will eventually replace all uses of nullable values. The Option<T> type represents an optional value T which can be either Some<T> if the value exists and None if it doesn't. You might have already stumbled upon a similar concept in functional programming languages such as F# or Scala, or perhaps even in Rust.

Other than of course making it very explicit that a type T is optional, rather than use "magic" values such as null or undefined, the primary benefit of the Option<T> type is its chained API. For example, given something like this...

const a: A | null = getA();

if (a !== null) {
  const b: B | null = getB(a);

  if (b !== null) {
    const c: C | null = getC(b);
  }
}

...we can instead write the following using the Option<T> type:

const c: Option<C> = getA().flatMap(getB).flatMap(getC);

Put another way, Option<T> is to nullable checking madness what Promise<T> is to callback hell.

Side-effect free questions in non-automated rules

In order to allow callers to provide input to non-automated rules, Alfa employs a question/answer mechanism by which rules pose questions to be answered by the caller. Previously, this mechanism was implemented by means of side-effects; a question() function would be invoked with a question after which an answer to the question would be looked up and, if found, returned from the question() function. If no answer was found, a question would be added to a list of questions to be returned to the caller after finishing the audit. The caller would then need to provide answers to these questions by making a new audit with the answers provided.

However, the above flow often lead to some unfortunate situations when rules posed multiple questions, such as this:

const q1 = question("q1", ...);
const q2 = question("q2", ...);

return {
  1: { holds: q1 === true || q2 === false }
}

The intent of the code above is to first ask q1 and then, if q1 was not answered with "yes", ask q2. If q1 is indeed answered with "yes", q2 would however still end up in the list of questions returned to the caller even though it's not needed. To avoid this, we would have to write the code as such:

const q1 = question("q1", ...);

if (q1 === true) {
  return {
    1: { holds: true }
  };
};

const q2 = question("q2", ...);

return {
  1: { holds: q2 === false }
}

This way, the side-effect of the second question never occurs and the code behaves like it's supposed to. You can however imagine how even slightly more complex cases quickly turn into a tangled mess if we constantly have to keep track of which side-effects should and shouldn't occur and when.

To untangle this questionnaire mechanism, questions are now returned as values and therefore no longer rely on side effects. To illustrate what this means, let's rewrite the previous example:

const q1 = Question.of("q1", ...);
const q2 = Question.of("q2", ...);

return {
  1: q1.map(q1 => q1 === true 
      ? /* pass */
      : q2.map(q2 => q2 === false 
          ? /* pass */ 
          : /* fail */
        )
  )
};

In effect, questions only ever reach callers when they're returned from an expectation, meaning that an answer is actually needed.

Optionally asynchronous audits using the `Future<T>` type

Previously, audits were required to be fully synchronous is order to facilitate integration to projects without support for asynchronicity, such as ESLint. While rules themselves have no use for asynchronous behaviour per ADR 2, other processes, such as user input, do. This restriction lead to an unfortunate user input flow, as outlined in the previous section, and roughly looked like this:

Run an audit.
For all questions returned, provide user input.
Run a new audit with provided user input.
If new questions are returned, go to step 2.

The flow we really want is this:

Run an audit.
When requested, provide user input directly to the running audit.

In order to accomplish this flow, we would need the ability to asynchronously provide user input to a running audit. However, if we wanted to inject any kind of asynchronicity anywhere within the audit flow, we would be forced to make the entire flow asynchronous assuming we had chosen to rely on the standard Promise<T> type. The reason for this is that the Promise<T> type by design cannot be resolved synchronously.

So, to circumvent this design caveat of the standard Promise<T> type, we opted for the simpler Future<T> type, which can optionally be resolved synchronously by means of the Future.now() and Future.delay() constructor functions. As such, the user input flow can default to a synchronous "no answer available" mode, with the ability for callers to also provide an asynchronous user input flow.

While the Promise<T> type provides functionality for dealing with computations that might throw exceptions, the Future<T> type leaves it up to the caller to deal with errors and provides no functionality for handling exceptions. This is very much by design as the Future<T> type is meant as a fairly low-level primitive for working with possibly asynchronous computations. As the next section will make apparent, there are however ways of building error handling on top of the Future<T> type in a more type-safe manner than what is possible with the Promise<T> type.

Typed errors using the `Result<T, E>` type

JavaScript somewhat suffers from the fact that the common way of dealing with even non-exceptional errors is through exceptions. While I won't go into the details of why exceptions are a poor primitive for error handling (go read something like https://www.atlassian.com/blog/archives/exceptions_are_bad for more on that), I'll mention two things that are fairly critical when working with a code base of a non-trivial size: Control flow and type safety. With exceptions, control flow can be difficult to reason about as exceptions break the normal flow of execution, and type safety surrounding errors is non-existent as the TypeScript compiler can provide no type information on errors in catch (err) {} blocks as anything can be thrown as an exception in JavaScript.

Fortunately for us, Alfa does not need to deal with non-exceptional errors in that many places. The most common use of errors are in its parsers and the JSON-LD expansion algorithm. However, these places are also prime candidates for type safe error handling: Why and where did the parsing fail, and why and where did the JSON-LD document not expand correctly? When we rely on exceptions to inform the caller about such errors, we risk two things:

The caller forgetting to handle potential errors altogether by not wrapping their call in a try {} block.
The caller forgetting to handle a specific error properly due to them having to manually narrow the any error caught by a catch (err) {} block.

To avoid these risks, the Result<T, E> type is now used instead. The type parameter T represents the type of a "successful" value whereas the type parameter E represents a "failed" value, referred to as an error. Similar to the Option<T> type, you might already be familiar with the Result<T, E> type from F# or Rust. Also similar to the Option<T> type, the Result<T, E> provides a chained API that allows us to turns this...

try {
  const a: A = getA();
  const b: B = getB(a);
  const c: C = getC(b);
} catch (err) {
  if (err instanceof ErrorA || err instanceof ErrorB || err instanceof ErrorC) {
    throw log(err);
  } else {
    throw err;
  }
}

...into this:

const c: Result<C, ErrorA | ErrorB | ErrorC> = getA().map(getB).map(getC).mapErr(log);

We can also compose the Result<T, E> type and the Future<T> type in order to model asynchronous computations that might fail: Future<Result<T, E>>. Turning this into the standard Promise<T> type which is rejected on errors is then as simple as:

const task: Future<Result<T, E>> = ...;

const promise: Promise<T> = new Promise((resolve, reject) => 
  task.then(result => result.map(resolve).mapErr(reject))
);

Do notice that by doing so we've however lost the error type E.

* master: Remove contributors from `package.json` Implement `Reducer` type Implement `Monad` type Implement `Iterable` type Implement `Generator` type Implement `Functor` type Implement `Mapper` type Remove some comments Implement `Result` type Implement `Predicate` type

* master: Implement `Branched` traversal and sequencing Don't consider empty IDs in SIA-R3

* master: Add missing project reference Remove `flatten()` methods and implement `Equality` for some types Implement `Equality` type

* master: Wrap up some commentary Implement `List#pop` and fix `List#push` Add missing project reference, again Implement `Foldable` type Implement `Equality` for `Branched` type and rename a package Implement `List` type Add additional `Generator` and `Iterable` functions Add missing project reference

* compat: A couple of fixes Remove `Result.Maybe` Rework compatibility package

* master: Various fixes to `Iterable`, `List`, `Option`, and `Predicate` types

* master: Roll back to TypeScript 3.6 to get things building

* master: Implement `Cache#merge()`

* master: Remove crypto package, take 2 Remove crypto package

* master: Update lockfile

* master: Refine `Equality` type

* master: Various little refactors

* master: Make `Option` and `Result` iterable

…hods

* origin/master: Three valued logic (#167)

kasperisager added 30 commits October 24, 2019 20:13

Rework compatibility package

9a089c8

Remove Result.Maybe

e413df2

Merge branch 'master' into compat

4b683a9

* master: Implement `Branched` traversal and sequencing Don't consider empty IDs in SIA-R3

A couple of fixes

2d320b1

Merge branch 'master' into compat

a855d31

* master: Add missing project reference Remove `flatten()` methods and implement `Equality` for some types Implement `Equality` type

Merge branch 'compat' into develop

4e25ea1

* compat: A couple of fixes Remove `Result.Maybe` Rework compatibility package

Progress on DOM, CSS, and ARIA refactors

188a0b2

Introduce top-level Browser and Feature types

24f0e19

Merge branch 'master' into develop

4a81b2e

* master: Various fixes to `Iterable`, `List`, `Option`, and `Predicate` types

Refactor getDigest() and getChildNodes()

499f2f3

Merge branch 'master' into develop

dc4bfa9

* master: Roll back to TypeScript 3.6 to get things building

Fix getElementNamespace()

8ec513b

Merge branch 'master' into develop

a394756

* master: Implement `Cache#merge()`

Merge branch 'master' into develop

912fcbf

* master: Remove crypto package, take 2 Remove crypto package

Merge branch 'master' into develop

677632b

* master: Update lockfile

Merge branch 'master' into develop

fc45256

* master: Refine `Equality` type

Progress on ACT package refactor

efbdc5f

Merge branch 'master' into develop

81b8487

* master: Various little refactors

Progress on JSON-LD package refactor

6ae11d9

Rework some dependencies

09f72c5

Merge branch 'master' into develop

4c55c83

* master: Make `Option` and `Result` iterable

Progress on removing util and collection packages

b996985

Further progress on refactors

c95ccad

Update lockfile

55fa8d8

Body#body should be an ArrayBuffer

5ebd77e

Progress on assert package refactors

cfd68d4

Add subject property to Question

0522c6b

Squash remaining variance issues

b6557b3

kasperisager added 19 commits January 13, 2020 15:51

Make children optional in Document constructor

6afcdd8

Assertion target can now also be Text

8fc4813

Add Document.empty() constructor

fde875d

Rework HTTP, web, and device types

6049a6d

Add missing Question declarations

6b9d7d9

Get integrations compiling again

6485eaf

Add Headers#set() and Headers#delete()

6007e01

Make Headers iterable

8944d29

HTTP method name must be uppercase

44d8d6d

Fix some bugs uncovered during acceptance testing

c5426ad

Initial implementation of SIA-R64

2760b51

Refactor Node internals and add preceding() and following() met…

5faa48d

…hods

More consistent complex selector matching

90afd5d

_ prefix private fields in Style

5f67099

Add missing package references

195a950

Implement strict JSON and EARL serialization

e2bcb80

Make Branched JSON serializable

38458aa

Implement Iterable.count() and Sequence#count()

3b1ede4

Merge remote-tracking branch 'origin/master' into develop

d61d5f6

* origin/master: Three valued logic (#167)

kasperisager merged commit d61d5f6 into master Jan 23, 2020

This was referenced Jan 23, 2020

Set up ESLint configuration #171

Closed

Implement CSS transform parser #168

Closed

Rework the user agent style sheet #169

Closed

kasperisager mentioned this pull request Feb 3, 2020

Explore use cases of including requirements in rules #180

Closed

kasperisager deleted the develop branch April 22, 2020 10:31

kasperisager mentioned this pull request May 5, 2020

Implement pseudo selector parsing #229

Closed

kasperisager mentioned this pull request Jul 3, 2020

Revisit architecture decision reports #291

Closed

12 tasks

kasperisager mentioned this pull request Sep 4, 2020

Implement support for An+B microsyntax #57

Closed

kasperisager mentioned this pull request Mar 30, 2021

Cover all rules in @siteimprove/alfa-rules with test cases #759

Closed

83 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The Great Refactoring™️ #165

The Great Refactoring™️ #165

kasperisager commented Dec 17, 2019 •

edited

Loading

The Great Refactoring™️ #165

The Great Refactoring™️ #165

Conversation

kasperisager commented Dec 17, 2019 • edited Loading

Why on earth?

Major changes

Parent pointers and JSON serialisable DOM

Replacing nullables with the Option<T> type

Side-effect free questions in non-automated rules

Optionally asynchronous audits using the Future<T> type

Typed errors using the Result<T, E> type

kasperisager commented Dec 17, 2019 •

edited

Loading

Replacing nullables with the `Option<T>` type

Optionally asynchronous audits using the `Future<T>` type

Typed errors using the `Result<T, E>` type