Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tokens: handle ISO 8601 long format cycle points #6123

Merged
merged 1 commit into from
Jun 11, 2024

Conversation

MetRonnie
Copy link
Member

@MetRonnie MetRonnie commented Jun 6, 2024

Closes #6110

Before:

>>> from cylc.flow.id import tokenise

>>> dict(tokenise('//2050-01-01T00:00Z'))
{'cycle': '2050-01-01T00', 'cycle_sel': '00Z'}

After:

>> from cylc.flow.id_cli import cli_tokenise

>>> dict(cli_tokenise('//2050-01-01T00:00Z'))
{'cycle': '2050-01-01T00:00Z'}

>>> dict(cli_tokenise('//2050-01-01T00:fail'))
{'cycle': '2050-01-01T00', 'cycle_sel': 'fail'}

I decided to have a go at supporting the long format rather than disallowing it, seeing as it isn't much diff. Performance-wise this should have negligible impact as the "expensive" part of it is only going to be hit once by users running a CLI command.

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • No dependency changes
  • Tests are included
  • CHANGES.md entry included if this is a change that can affect users
  • No docs needed
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@MetRonnie MetRonnie added small could be better Not exactly a bug, but not ideal. labels Jun 6, 2024
@MetRonnie MetRonnie added this to the 8.3.1 milestone Jun 6, 2024
@MetRonnie MetRonnie self-assigned this Jun 6, 2024
cylc/flow/id.py Outdated Show resolved Hide resolved
Copy link
Member

@dwsutherland dwsutherland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

>>> dict(Tokens('//2050-01-01T00:00Z:failed'))
{'cycle': '2050-01-01T00:00Z', 'cycle_sel': 'failed', 'task': None, 'task_sel': None, 'job': None, 'job_sel': None}

I assume there's no large overhead adding this extra step into tokenise

@MetRonnie
Copy link
Member Author

On master:

In [2]: %timeit Tokens('foo//20500101T0000Z/a')
5.45 µs ± 313 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

On this branch:

In [2]: %timeit Tokens('foo//20500101T0000Z/a')
7.64 µs ± 55.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [3]: %timeit Tokens('foo//2050-01-01T00:00Z/a')
1.13 ms ± 10.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Ok, so I wasn't expecting it to be that much worse. Turns out init'ing the TimePointParser class goes through a process of generating a bunch of regexes? https://github.com/metomi/isodatetime/blob/a4614a2153548ca99e2999ba6f20ae40242b0037/metomi/isodatetime/parsers.py#L140

I don't think this should be a problem for Cylc internals as we don't use the extended datetime format for cycle points, just have to make sure we keep it that way? Or I guess I could move this to id_cli.py so it definitely only applies to the CLI

@dwsutherland
Copy link
Member

I don't think this should be a problem for Cylc internals as we don't use the extended datetime format for cycle points, just have to make sure we keep it that way?

I think the cycle format is used in the ids of the data-store, so it's whether they are being tokenised from string format frequently.. (which they very well may be)

@MetRonnie
Copy link
Member Author

Note the cycle point format set in flow.cylc can't be the extended/long format (as it is used for job dir paths so can't contain :)

@dwsutherland
Copy link
Member

Note the cycle point format set in flow.cylc can't be the extended/long format (as it is used for job dir paths so can't contain :)

Ah yes, and only that could be used internally.. So I think we're safe.

@MetRonnie
Copy link
Member Author

Or I guess I could move this to id_cli.py so it definitely only applies to the CLI

I've gone ahead and done this as it makes sense to me

@hjoliver hjoliver merged commit 1a9c681 into cylc:master Jun 11, 2024
27 checks passed
@hjoliver
Copy link
Member

Post-merge comment - this might warrant an explanation and example in CLI help, otherwise users are unlikely to realize that it's possible.

@MetRonnie MetRonnie modified the milestones: 8.3.1, 8.3.0 Jun 12, 2024
@MetRonnie MetRonnie deleted the tokens branch June 12, 2024 10:55
@MetRonnie
Copy link
Member Author

cylc help id only shows examples with integer cycles for brevity anyway - I think it's fine the way it is. The important thing is that either the short or long format will work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
could be better Not exactly a bug, but not ideal. small
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ID tokeniser bug
3 participants