Skip to content

Calc Engine

Jahav edited this page Oct 27, 2022 · 3 revisions

This page describes design of a CalcEngine.

Types

Every value in the CalcEngine has a type. There is a limited number of types and they can be type conversions. Some basic details are described in ECMA-376:1, §18.17.2.6, though incomplete.

Base types

Blank

Blank is a value of an empty cell or a unspecified optional (=ISBLANK(IF(TRUE,,)) is true). It can be coerced to a number 0 or empty string.

Logical

Logical is a bool value that can be either TRUE or FALSE.

Number

Number is a double precision number. Number is not represented by int, only by double.

Text

Text is a string value. It can have an arbitrary length, the spec recommends at least 32,767 chars. Text is never null, but it can be empty.

Error

Error is a value representing an error in a formula. Error is a normal value that can be passed from one function to another. New errors can be added to OpenXML (e.g. #SPILL! was added for dynamic arrays).

Reference

Reference is a collection of areas. Each reference must contain at least one area. Area is a rectangular range in a worksheet, though area can be even without worksheet.

Array

Array is a 2D array of scalar values. Every array is at least 1x1.

Whether array can actually contain blank or not is uncertain (the const declaration doesn't allow it), but for each function accepting an array we can substitute array with a reference. Reference can have blank values, so let's assume array can too.

Composite types

ScalarValue

ScalarValue is a single concrete value of one of the following types: blank, logical, number, text or error.

AnyValue

Any value represents a union of any type that can be passed during formula evaluation (blank, logical, number, text, error, reference, array).

Non-types

It is worth mentioning a few terms that are not types in the context of CalcEngine:

  • area - a reference consists of areas, but area itself is not a separate type.
  • int - all numbers are represented by number (=double).

Formula representation

- Not yet finished

Formulas are represented by a stateless tree in the RC form. The reason for statelessness is to share formulas, e.g. imagine a sheet where formula is coped on each row

A B C C in RC
1 Month YTD sales AVG sales/month
2 1 100 =B2/A2 =R[0]C[-1]/R[0]C[-2]
2 2 140 =B3/A3 =R[0]C[-1]/R[0]C[-2]
2 3 375 =B4/A4 =R[0]C[-1]/R[0]C[-2]

Note that RC notation formulas are the same. Formulas can be far more complex, they are syntax trees and it would be very wasteful to have a separate tree for each row. Thanks to statelessness, formulas can be shared across multiple cells and thanks to RC notation it is easier to unify the formulas.

How to develop a new function

Research it

You have to research how does a function works. A good start is the info put out by Microsoft:

  • ECMA-376 Part 1 §18.17.7 has an entry for some functions, but not all and documentation has a lot of errors. You can look at ISO standard too, they are basically the same.
  • MS-OI29500 is an implementation document for Excel that provides a further information about how does Excel implement various features. Each chatper in a document ammends info about a chapter in ECMA-376/ISO.
  • Office Support page (Excel functions (alphabetical)) has often some additional information, though less detailed than ECMA-376.

What is needed is understanding of edge conditions and unusual inputs. Some common questions to consider:

  • Does the culture of Excel change behavior?
  • Unexpected inputs, not described anywhere (errors, empty string, negative values where only positive expected and so on).
  • Does it except ranges? If so, in which arguments.
  • If it accepts reference, what happens if we pass in reference with multiple areas
  • If it accepts an array, what if some elements of an array are errors or a unexpected type

Determine signature

One you have a good understanding of the function, try to determine signature - determine what arguments are accepted. The universal delegate for all functions is this one:

internal delegate AnyValue CalcEngineFunction(CalcContext ctx, Span<AnyValue> args);

but it should be used sparingly. Function shouldn't deal with conversions, it should just have a signature with argument it accepts. For example:

AnyValue Cos(double number)

The conversion from the generic CalcEngineFunction signature to the the more specific is done by adapter function (see later chapter). Common arguments should be

  • bool (representing logical type),
  • double (representing number type),
  • string (representing text type),
  • XLError
  • Reference
  • Array
  • OneOf<Blank, Reference> - for optional arguments
  • OneOf<T1, T2> - when argument can have multiple types
  • ScalarValue
  • AnyValue

The specification often gives hints about correct types, but the suggested types of argument must be verified!

As an example, PROPER likely has a single argument OneOf<string, Reference> image

Write an adapter

Write an adapter from CalcEngineFunction to an actual signature of the type. This step can be skipped,

  • if there already is an adapter for the function signature
  • if the function has a same signature as CalcEngineFunction.

By convention, all methods are called Adapt and differ only by argument types. Note that #VALUE! (XLError.IncompatibleValue) means a typing error.

  • Open 'SignatureAdapter'
  • Try to find an Adapt method with a signature that accepts the function actual signature. If it does, everything is great.
  • Write a new function that takes a reference to a function and returns CalcEngineFunction.

One day it might work with a source generators, but for now everything must be done by hand.

Unit tests

Function must be convered by tests.

Run fuzzer

FAQ

Examples

Real world scenarios

Time Savers

Performance and Memory

Misc

Inserting Data/Tables

Styles

Ranges

Rows

Columns

Page Setup (Print Options)

AutoFilters

Comments

Dev docs

Clone this wiki locally