Calc Engine
This page describes design of a CalcEngine
.
Every value in the CalcEngine has a type. There is a limited number of types and they can be type conversions. Some basic details are described in ECMA-376:1, §18.17.2.6, though incomplete.
Blank is a value of an empty cell or a unspecified optional (=ISBLANK(IF(TRUE,,))
is true
). It can be coerced to a number 0 or empty string.
Logical is a bool value that can be either TRUE or FALSE.
Number is a double precision number. Number is not represented by int
, only by double
.
Text is a string value. It can have an arbitrary length, the spec recommends at least 32,767 chars. Text is never null, but it can be empty.
Error is a value representing an error in a formula. Error is a normal value that can be passed from one function to another. New errors can be added to OpenXML (e.g. #SPILL!
was added for dynamic arrays).
Reference is a collection of areas. Each reference must contain at least one area. Area is a rectangular range in a worksheet, though area can be even without worksheet.
Array is a 2D array of scalar values. Every array is at least 1x1.
Whether array can actually contain blank or not is uncertain (the const declaration doesn't allow it), but for each function accepting an array we can substitute array with a reference. Reference can have blank values, so let's assume array can too.
ScalarValue is a single concrete value of one of the following types: blank
, logical
, number
, text
or error
.
Any value represents a union of any type that can be passed during formula evaluation (blank
, logical
, number
, text
, error
, reference
, array
).
It is worth mentioning a few terms that are not types in the context of CalcEngine:
- area - a reference consists of areas, but area itself is not a separate type.
- int - all numbers are represented by number (=
double
).
- Not yet finished
Formulas are represented by a stateless tree in the RC form. The reason for statelessness is to share formulas, e.g. imagine a sheet where formula is coped on each row
A | B | C | C in RC | |
---|---|---|---|---|
1 | Month | YTD sales | AVG sales/month | |
2 | 1 | 100 | =B2/A2 | =R[0]C[-1]/R[0]C[-2] |
2 | 2 | 140 | =B3/A3 | =R[0]C[-1]/R[0]C[-2] |
2 | 3 | 375 | =B4/A4 | =R[0]C[-1]/R[0]C[-2] |
Note that RC notation formulas are the same. Formulas can be far more complex, they are syntax trees and it would be very wasteful to have a separate tree for each row. Thanks to statelessness, formulas can be shared across multiple cells and thanks to RC notation it is easier to unify the formulas.
You have to research how does a function works. A good start is the info put out by Microsoft:
- ECMA-376 Part 1 §18.17.7 has an entry for some functions, but not all and documentation has a lot of errors. You can look at ISO standard too, they are basically the same.
- MS-OI29500 is an implementation document for Excel that provides a further information about how does Excel implement various features. Each chatper in a document ammends info about a chapter in ECMA-376/ISO.
- Office Support page (Excel functions (alphabetical)) has often some additional information, though less detailed than ECMA-376.
What is needed is understanding of edge conditions and unusual inputs. Some common questions to consider:
- Does the culture of Excel change behavior?
- Unexpected inputs, not described anywhere (errors, empty string, negative values where only positive expected and so on).
- Does it except ranges? If so, in which arguments.
- If it accepts reference, what happens if we pass in reference with multiple areas
- If it accepts an array, what if some elements of an array are errors or a unexpected type
One you have a good understanding of the function, try to determine signature - determine what arguments are accepted. The universal delegate for all functions is this one:
internal delegate AnyValue CalcEngineFunction(CalcContext ctx, Span<AnyValue> args);
but it should be used sparingly. Function shouldn't deal with conversions, it should just have a signature with argument it accepts. For example:
AnyValue Cos(double number)
The conversion from the generic CalcEngineFunction
signature to the the more specific is done by adapter function (see later chapter).
Common arguments should be
-
bool
(representinglogical
type), -
double
(representingnumber
type), -
string
(representingtext
type), XLError
Reference
Array
-
OneOf<Blank, Reference>
- for optional arguments -
OneOf<T1, T2>
- when argument can have multiple types ScalarValue
AnyValue
The specification often gives hints about correct types, but the suggested types of argument must be verified!
As an example, PROPER
likely has a single argument OneOf<string, Reference>
Write an adapter from CalcEngineFunction
to an actual signature of the type. This step can be skipped,
- if there already is an adapter for the function signature
- if the function has a same signature as
CalcEngineFunction
.
By convention, all methods are called Adapt
and differ only by argument types.
Note that #VALUE! (XLError.IncompatibleValue) means a typing error.
- Open 'SignatureAdapter'
- Try to find an
Adapt
method with a signature that accepts the function actual signature. If it does, everything is great. - Write a new function that takes a reference to a function and returns
CalcEngineFunction
.
One day it might work with a source generators, but for now everything must be done by hand.
Function must be convered by tests.
- How do I deliver an Excel file in ASP.NET?
- Does it support Excel 2003 and prior formats (.xls)?
- How can I insert an image?
- Text with numbers are getting converted to numbers, what's up with that?
- How do I get the result of a formula?
- Data Types
- Creating Multiple Worksheets
- Organizing Sheets
- Loading and Modifying Files
- Using Lambda Expressions
- Cell Values
- Workbook Properties
- Using Formulas
- Evaluating Formulas
- Creating Rows And Columns Outlines
- Hide Unhide Rows And Columns
- Freeze Panes
- Copying Worksheets
- Using Hyperlinks
- Data Validation
- Hide Worksheets
- Sheet Protection
- Tab Colors
- Conditional Formatting
- Pivot Table example
- Sparklines
- Copying IEnumerable Collections
- Inserting Data
- Inserting Tables
- Adding DataTable as Worksheet
- Adding DataSet
- Styles - Alignment
- Styles - Border
- Styles - Fill
- Styles - Font
- Styles - NumberFormat
- NumberFormatId Lookup Table
- Style Worksheet
- Style Rows and Columns
- Using Default Styles
- Using Colors
- ClosedXML Predefined Colors
- Excel Indexed Colors
- Using Rich Text
- Using Phonetics
- Defining Ranges
- Merging Cells
- Clearing Ranges
- Deleting Ranges
- Multiple Ranges
- Shifting Ranges
- Transpose Ranges
- Named Ranges
- Accessing Named Ranges
- Copying Ranges
- Using Tables
- Sorting Data
- Selecting Cells and Ranges
- Row Height and Styles
- Selecting Rows
- Inserting Rows
- Inserting and Deleting Rows
- Adjust Row Height and Column Width to Contents
- Row Cells
- Column Width and Styles
- Selecting Columns
- Inserting Columns
- Inserting and Deleting Columns
- Adjust Row Height and Column Width to Contents
- Column Cells
- Pages Tab
- Paper Size Lookup Table
- Margins Tab
- Headers and Footers Tab
- Sheet Tab
- Print Areas and Page Breaks