Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partial Row and Column Ranges #7

Closed

Conversation

alexharri
Copy link

@alexharri alexharri commented Jan 23, 2023

What

Add support for A1:A, A:A1, 1:A1, and A1:1 ranges.

How

Match re_A1COL_PARTIAL and re_A1ROW_PARTIAL before re_A1RANGE

re_A1RANGE matches A1, so it can match the prefix of both partial row and column ranges:

  • A1:A - partial column range
  • A1:1 - partial row range

This means that we need to match partial row and column ranges before attempting to match normal ranges.

export const tokenHandlersA1 = [
  // ...
  [ RANGE_BEAM,  re_A1COL_PARTIAL, quickVerifyRangeA1 ],
  [ RANGE_BEAM,  re_A1ROW_PARTIAL, quickVerifyRangeA1 ],
  [ RANGE,       re_A1RANGE, quickVerifyRangeA1 ],
  [ RANGE_BEAM,  re_A1COL, quickVerifyRangeA1 ],
  [ RANGE_BEAM,  re_A1ROW, quickVerifyRangeA1 ],
  // ...
];

re_A1COL_PARTIAL and re_A1ROW_PARTIAL regexes both have two capture groups, which can be represented like so (heavily simplified):

re_A1COL_PARTIAL = /(A1:A)|(A:A1)/i
re_A1ROW_PARTIAL = /(A1:1)|(1:A1)/i

I would have liked to do something like so:

re_A1COL_PARTIAL = /A1?:A1?/i
re_A1ROW_PARTIAL = /A?1:A?1/i

But this would match normal ranges such as A1:B2, so we need to match both partial cases explicitly.

The re_A1COL_PARTIAL regex contains a negative lookahead in the A1:A case that disallows the match being followed by:

  • the digits 0-9, or
  • (, or
  • ., or
  • the characters A-Z.

This prevents the following matches:

A1:B2
^^^^ - prevented because '2' matches 0-9

A1:IF()
^^^^^ - prevented because '(' matches '('

A1:F.DIST()
^^^^ - prevented because '(' matches '('

A1:IF()
^^^^ - prevented because 'F' matches A-Z

This is documented in a comment, and captured in tests.

@alexharri alexharri marked this pull request as ready for review January 23, 2023 15:18
@borgar
Copy link
Owner

borgar commented Jan 27, 2023

This is good effort but there are some things we need to fix here:

  • This will need to be optional. For use in Excel strict mode we don't want to match these. I'm planning to bump a major version anyway (see dev branch) so I would be happy to have it opt out, but an option needs to exist.

  • P10:1 is lexed as { type: 'range-beam', value: 'P10:1' } but the range is not a beam, it's a regular range. Sheets interprets this as the regular range P1:XFD10. I have not tested it, but I suspect this may cause problems in translateToRC().

  • Similar to how partial strings and paths work, partial range tokens will need to be tagged as partial or unterminated to simplify normalizing them to "compliant" ranges. So P10:1 would something like: { type: 'range', value: 'P10:1', partial: true }

  • There need to be tests covering the translation of partials to RC notation.

  • This is causing me some head scratching when it comes to the relationship between the most atomic range tokens and "merged ranges". There are two ways I can see this going:

    1. We could, when parsing without merged ranges, simply parse A1:1 as we currently do (A1, :, 1) and then when we merge ranges we can recognize that RANGE:NUM is a partial range and deal with it there. Partial ranges always have a simple range on either side of the colon, so the problem is really limiting which RANGE_NAMED or NUMBER are allowed.

    2. We can add a new token type, RANGE_PART, that would cover the four variants (A:A1, 1:A1, A1:A, A1:1). This would likely be simpler to do overall, and cover the tagging need.

From the looks of it, how ranges are parsed may need a bigger change. I noted, while exploring this, that Google Sheets has a dynamic range size limit. They won't parse (or perhaps expose?) a range if it exceeds current sheet bounds. So in the default "new sheet", which is sized A1:Z1000, they won't show AA:AA as a range. Insert a column to expand the size to A1:AA1000 and then it will work. This might be a desirable feature and another argument for re-working the lexer a bit to allow it.

@alexharri
Copy link
Author

@borgar Thanks for the review, really good points! Here are some thoughts:

This will need to be optional. For use in Excel strict mode we don't want to match these.

Good point. I figure we could introduce a mode option (e.g. "excel" | "google"), or we could go for more granular options (e.g. { partialBeams?: boolean }). Do you have a strong opinion here?

P10:1 is lexed as { type: 'range-beam', value: 'P10:1' } but the range is not a beam, it's a regular range. Sheets interprets this as the regular range P1:XFD10.

Hmm, we may need to align on terminology here. To me, a "range beam" is equivalent to "column or row range", and I consider partial col/row ranges such as A:A1, A1:A, 1:A1, A1:1 to be a type of col/row range. This is why I chose to categorize them as range beams.

But I think you're right that we need to distunguish partial col/row ranges from whole col/row ranges. We have a few options:

{ type: "range-beam", partial: true, value: "..." }
{ type: "partial-range-beam", value: "..." }

Google Sheets seems to consider A1:A to be a single token. It does not support doing A1:INDIRECT("A"), but it does support =A1:INDIRECT("A10"), which ties into:

We can add a new token type, RANGE_PART, that would cover the four variants (A:A1, 1:A1, A1:A, A1:1).

Since A1:INDIRECT("A") is not supported, I don't think there is too much value in doing that. Parsing the different parts of partial col/row ranges separately would likely be a source of complexity, so I think we should parse partial col/row ranges as a single token like we currently do for whole col/row ranges.

There need to be tests covering the translation of partials to RC notation.

I had not considered this. I'll have to do some research to figure out how this should work.

I noted, while exploring this, that Google Sheets has a dynamic range size limit. ... This might be a desirable feature and another argument for re-working the lexer a bit to allow it.

We could add the sheet size as an option and use that in quickVerifyRangeA1. That would allow us to keep the lexer structure relatively unchanged, and it wouldn't be a hit to performance.

@borgar borgar mentioned this pull request Feb 21, 2023
@borgar borgar closed this in 57a21ee Feb 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants