Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formulas handling redesign #893

Open
Pankraty opened this issue Jun 1, 2018 · 1 comment
Open

Formulas handling redesign #893

Pankraty opened this issue Jun 1, 2018 · 1 comment

Comments

@Pankraty
Copy link
Member

Pankraty commented Jun 1, 2018

After Styles and Ranges handling has been improved it's time to revise formulas.

Goals:

  1. Decrease memory usage by formulas stored in XLCell s by implementing the formula sharing.
  2. Improve the performance by reducing the amount of formula parsing work (Better way to store formulas internally #815).
  3. Unify the approach to storing the formulas among different entities (cells, named ranges, conditional formats, etc.)
  4. Unify the approach to storing formula computation results. Now implemented for cells only.
  5. Unify the approach to formula shifting (Formulas in conditional format rules are not shifted properly #686, Addresses in named ranges do not become invalid when the range itself is deleted #880). This may be achieved if formulas refer to cells and ranges by reference rather than by addresses in string format. Thus, when the related range address changes the formula changes accordingly automatically; no need to shift them one-by-one separately for cells, conditional formats, named ranges, data validations, etc.
  6. Support table column references, e.g. [Value] (Support table column references, e.g. [Value] #271)
  7. Support for GETPIVOTDATA() formula (needs pivot context).

Affected entities

  1. XLCell
  2. XLConditionalFormat
  3. XLNamedRange
  4. XLDataValidation
  5. XLTableField
  6. Charts - somewhere far in the future

Basic concepts

  1. Different contexts require different formulas. E.g. for cell, formula result must be a single value, while for named range it can be either a single value, or a range reference, or a collection of range references. Technically, they may be implemented as different subclasses of a single class, or by using an enum reflecting the formula type.
  2. For the purpose of sharing formulas between entities the new repository has to be introduced (XLExpressionRepository?). It will store expression trees without relation to concrete cells or ranges. For example, two cells having formulas =SUM(A1:A2) and =SUM(D2:E18) will internally refer to the same expression SUM(*range reference nr. 1*); another cell with the formula =SUM(A1:A2)+SUM(D2:E18) will refer to another expression (SUM(*range reference nr. 1*) + SUM(*range reference nr.2*)) having no relation to the first one. (Textual representation of the expressions serves to explanation only; they must be stored in a compiled form)
  3. Concrete instances that may use formulas (see Affected entities) hold separate instances of XLFormula (?). It includes
  • A reference to the shared expression (XLExpression ?)
  • A collection of concrete range references (A1:A2 for the cell 1, D2:E18 for the cell 2, {A1:A2, D2:E18} for the cell 3). Perhaps, the same collection must store cell references and table column references as well.
  • Cached computation results, a stamp of the last computation, flag NeedsRecalculation, etc. - the existing staff implemented in XLCell for caching formula values.
  1. XLFormula must have methods allowing to dynamically construct FormulaA1 and FormulaR1C1 so they won't be stored for each cell, reducing the memory usage.
  2. References to cells and ranges in XLFormula may be relative or absolute. When the formula is copied absolute references remain and the relative references start to refer to new ranges.
  3. When the range is shifted formulas referring to it modify accordingly. Same when the range or entire worksheet is deleted (references become invalid - #REF!). No effort is required.
  4. Public API most likely won't change.

Risks

  1. Cross-workbook operations may become more complicated and require extensive testing.
@Pankraty Pankraty added this to the v0.94 milestone Jun 1, 2018
@igitur
Copy link
Member

igitur commented Jun 1, 2018

I've edited the list above and added an item about GETPIVOTDATA. As we discover more issues relevant to this, let's update the original post.

@Pankraty Pankraty mentioned this issue Jun 8, 2018
2 tasks
@igitur igitur removed this from the v0.94 milestone Oct 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants