Partial Row and Column Ranges #7

alexharri · 2023-01-23T14:28:17Z

What

Add support for A1:A, A:A1, 1:A1, and A1:1 ranges.

How

Match `re_A1COL_PARTIAL` and `re_A1ROW_PARTIAL` before `re_A1RANGE`

re_A1RANGE matches A1, so it can match the prefix of both partial row and column ranges:

A1:A - partial column range
A1:1 - partial row range

This means that we need to match partial row and column ranges before attempting to match normal ranges.

export const tokenHandlersA1 = [
  // ...
  [ RANGE_BEAM,  re_A1COL_PARTIAL, quickVerifyRangeA1 ],
  [ RANGE_BEAM,  re_A1ROW_PARTIAL, quickVerifyRangeA1 ],
  [ RANGE,       re_A1RANGE, quickVerifyRangeA1 ],
  [ RANGE_BEAM,  re_A1COL, quickVerifyRangeA1 ],
  [ RANGE_BEAM,  re_A1ROW, quickVerifyRangeA1 ],
  // ...
];

re_A1COL_PARTIAL and re_A1ROW_PARTIAL regexes both have two capture groups, which can be represented like so (heavily simplified):

re_A1COL_PARTIAL = /(A1:A)|(A:A1)/i
re_A1ROW_PARTIAL = /(A1:1)|(1:A1)/i

I would have liked to do something like so:

re_A1COL_PARTIAL = /A1?:A1?/i
re_A1ROW_PARTIAL = /A?1:A?1/i

But this would match normal ranges such as A1:B2, so we need to match both partial cases explicitly.

The re_A1COL_PARTIAL regex contains a negative lookahead in the A1:A case that disallows the match being followed by:

the digits 0-9, or
(, or
., or
the characters A-Z.

This prevents the following matches:

A1:B2
^^^^ - prevented because '2' matches 0-9

A1:IF()
^^^^^ - prevented because '(' matches '('

A1:F.DIST()
^^^^ - prevented because '(' matches '('

A1:IF()
^^^^ - prevented because 'F' matches A-Z

This is documented in a comment, and captured in tests.

borgar · 2023-01-27T13:45:46Z

This is good effort but there are some things we need to fix here:

This will need to be optional. For use in Excel strict mode we don't want to match these. I'm planning to bump a major version anyway (see dev branch) so I would be happy to have it opt out, but an option needs to exist.
P10:1 is lexed as { type: 'range-beam', value: 'P10:1' } but the range is not a beam, it's a regular range. Sheets interprets this as the regular range P1:XFD10. I have not tested it, but I suspect this may cause problems in translateToRC().
Similar to how partial strings and paths work, partial range tokens will need to be tagged as partial or unterminated to simplify normalizing them to "compliant" ranges. So P10:1 would something like: { type: 'range', value: 'P10:1', partial: true }
There need to be tests covering the translation of partials to RC notation.
This is causing me some head scratching when it comes to the relationship between the most atomic range tokens and "merged ranges". There are two ways I can see this going:
1. We could, when parsing without merged ranges, simply parse A1:1 as we currently do (A1, :, 1) and then when we merge ranges we can recognize that RANGE:NUM is a partial range and deal with it there. Partial ranges always have a simple range on either side of the colon, so the problem is really limiting which RANGE_NAMED or NUMBER are allowed.
2. We can add a new token type, RANGE_PART, that would cover the four variants (A:A1, 1:A1, A1:A, A1:1). This would likely be simpler to do overall, and cover the tagging need.

From the looks of it, how ranges are parsed may need a bigger change. I noted, while exploring this, that Google Sheets has a dynamic range size limit. They won't parse (or perhaps expose?) a range if it exceeds current sheet bounds. So in the default "new sheet", which is sized A1:Z1000, they won't show AA:AA as a range. Insert a column to expand the size to A1:AA1000 and then it will work. This might be a desirable feature and another argument for re-working the lexer a bit to allow it.

alexharri · 2023-01-28T14:13:44Z

@borgar Thanks for the review, really good points! Here are some thoughts:

This will need to be optional. For use in Excel strict mode we don't want to match these.

Good point. I figure we could introduce a mode option (e.g. "excel" | "google"), or we could go for more granular options (e.g. { partialBeams?: boolean }). Do you have a strong opinion here?

P10:1 is lexed as { type: 'range-beam', value: 'P10:1' } but the range is not a beam, it's a regular range. Sheets interprets this as the regular range P1:XFD10.

Hmm, we may need to align on terminology here. To me, a "range beam" is equivalent to "column or row range", and I consider partial col/row ranges such as A:A1, A1:A, 1:A1, A1:1 to be a type of col/row range. This is why I chose to categorize them as range beams.

But I think you're right that we need to distunguish partial col/row ranges from whole col/row ranges. We have a few options:

{ type: "range-beam", partial: true, value: "..." }
{ type: "partial-range-beam", value: "..." }

Google Sheets seems to consider A1:A to be a single token. It does not support doing A1:INDIRECT("A"), but it does support =A1:INDIRECT("A10"), which ties into:

We can add a new token type, RANGE_PART, that would cover the four variants (A:A1, 1:A1, A1:A, A1:1).

Since A1:INDIRECT("A") is not supported, I don't think there is too much value in doing that. Parsing the different parts of partial col/row ranges separately would likely be a source of complexity, so I think we should parse partial col/row ranges as a single token like we currently do for whole col/row ranges.

There need to be tests covering the translation of partials to RC notation.

I had not considered this. I'll have to do some research to figure out how this should work.

I noted, while exploring this, that Google Sheets has a dynamic range size limit. ... This might be a desirable feature and another argument for re-working the lexer a bit to allow it.

We could add the sheet size as an option and use that in quickVerifyRangeA1. That would allow us to keep the lexer structure relatively unchanged, and it wouldn't be a hit to performance.

alexharri added 5 commits January 23, 2023 13:47

add support for partial row and column ranges

c33bf0d

add tests and fix bugs related to partial row and column ranges

30bfc84

flip rows and columns if necessary for partial row and column ranges

066abe6

expand lexer tests for tokenizing row and column ranges

57c2c0f

document re_A1COL_PARTIAL and fix matching A1:IF for A1:IF(...)

8535e58

alexharri marked this pull request as ready for review January 23, 2023 15:18

borgar mentioned this pull request Feb 21, 2023

3.0 update #14

Merged

borgar closed this in 57a21ee Feb 27, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Partial Row and Column Ranges #7

Partial Row and Column Ranges #7

alexharri commented Jan 23, 2023 •

edited

Loading

borgar commented Jan 27, 2023

alexharri commented Jan 28, 2023

Partial Row and Column Ranges #7

Partial Row and Column Ranges #7

Conversation

alexharri commented Jan 23, 2023 • edited Loading

What

How

Match re_A1COL_PARTIAL and re_A1ROW_PARTIAL before re_A1RANGE

borgar commented Jan 27, 2023

alexharri commented Jan 28, 2023

alexharri commented Jan 23, 2023 •

edited

Loading

Match `re_A1COL_PARTIAL` and `re_A1ROW_PARTIAL` before `re_A1RANGE`