Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Text/replace builtin #1065

Merged
merged 25 commits into from
Oct 10, 2020

Conversation

alexhumphreys
Copy link
Collaborator

This attempts to standardise the Text/replace builtin.

There's a few parts I wasn't sure on (eg. the B case for the test tests/parser/success/builtinsA.dhall), and there's probably plenty other things I missed so best to look over this carefully.

Also, wasn't sure how best to describe this feature in the prose sections (doc strings in prelude, Language-tour.md, etc.). I feel the need to use the phrase "substring" but Dhall has Text not String so maybe that's misleading. So if anyone has suggestions to make those descriptions more clear I'd appreciate it.

Fixes #1051.

Signed-off-by: Alex Humphreys <alex.humphreys@here.com>
Copy link
Contributor

@Gabriella439 Gabriella439 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great so far!

I think one thing we need to decide is what to do if the substring to replace is "", in order to avoid an infinite loop. Some possibilities I can think of are:

  • Silently perform no replacement if the substring to replace is empty
  • Return an Optional with a None result
  • Require the substring to replace to be statically known at type-checking time and return a type error if empty

Prelude/Text/replace.dhall Show resolved Hide resolved
standard/beta-normalization.md Outdated Show resolved Hide resolved
standard/beta-normalization.md Outdated Show resolved Hide resolved
@alexhumphreys
Copy link
Collaborator Author

alexhumphreys commented Sep 7, 2020

As regards "", I'd vote for the first choice of do nothing. The second choice of returning an optional seems unergonomic. The third choice seems like an odd type error, given that "non-empty string" (or non-empty list for that matter) isn't a type that exists anywhere else in the language.

Just tried ruby to see what it does in this case and it was not what I was expecting:

"foo".gsub "" "bar" == "barfbarobarobar"

... not sure we want to follow that example 😅 Though I can see a case to be made for Text/replace "" "foo" "" === "foo".

I don't feel strongly about this though, whatever ye all prefer is cool.

alexhumphreys and others added 5 commits September 7, 2020 11:53
Co-authored-by: Gabriel Gonzalez <Gabriel439@gmail.com>
Signed-off-by: Alex Humphreys <alex.humphreys@here.com>
Signed-off-by: Alex Humphreys <alex.humphreys@here.com>
Co-authored-by: Gabriel Gonzalez <Gabriel439@gmail.com>
@alexhumphreys alexhumphreys changed the title [WIP] Add Text/replace builtin Add Text/replace builtin Sep 16, 2020
Signed-off-by: Alex Humphreys <alex.humphreys@here.com>
@alexhumphreys
Copy link
Collaborator Author

@Gabriel439 I've committed your suggested change defining normalisation in terms of recursion. I've also added a note to do nothing in the case of "" as the substring to replace, and if we decide on some other behaviour I can update this later.

@Gabriella439
Copy link
Contributor

@alexhumphreys: I'm taking vacation this week, but I'll review again afterwards

@alexhumphreys
Copy link
Collaborator Author

Ah nice, enjoy the vacation! 🌴

Copy link
Contributor

@Gabriella439 Gabriella439 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add a test for replacing an empty string? That way, implementations won't forget to handle that case

Other than that, this looks to me like this proposal is basically ready to move out of Draft status

docs/references/Built-in-types.md Outdated Show resolved Hide resolved
docs/tutorials/Language-Tour.md Show resolved Hide resolved
Comment on lines 685 to 690
If the substring to replace is empty (`""`), then no replacement is performed:


f ⇥ Text/replace "" replacement a ⇥ "foo"
────────────────────────────────────────────
f a ⇥ "foo"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this judgement to be the first or second one? The standard has an implicit "first judgement wins" convention that we've been using when more than one judgement applies, and we want this judgment to take precedence

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

alexhumphreys and others added 4 commits September 20, 2020 09:53
Co-authored-by: Gabriel Gonzalez <Gabriel439@gmail.com>
Signed-off-by: Alex Humphreys <alex.humphreys@here.com>
Signed-off-by: Alex Humphreys <alex.humphreys@here.com>
Signed-off-by: Alex Humphreys <alex.humphreys@here.com>
@alexhumphreys
Copy link
Collaborator Author

alexhumphreys commented Sep 20, 2020

@Gabriel439 I've added that empty string test. I still don't have the B case for tests/parser/success/builtinsA.dhall as I'm not sure what Text/replace gets parsed as, but aside from that it's good for me to move this out of draft status.

@alexhumphreys alexhumphreys marked this pull request as ready for review September 20, 2020 08:09
@sjakobi
Copy link
Collaborator

sjakobi commented Sep 20, 2020

Isn't the normalization still way under-specified? How are composite characters supposed to be handled, for example? (I believe @Profpatsch had some opinions about that in some discussion about Text operations)

@Gabriella439
Copy link
Contributor

@sjakobi: I'd propose doing replacement at the grapheme level, as suggested by this test:

let example2 = assert : replace "👨" "👩" "👨‍👩‍👧‍👦" "👩‍👩‍👧‍👦"

… and for correctly handling composite characters I'd suggest performing the replace if the "needle" and the prefix are both canonically equivalent, meaning that they have the same normalization form, specifically Normalization Form C.

@sjakobi
Copy link
Collaborator

sjakobi commented Sep 21, 2020

@Gabriel439 Sounds good to me!

Signed-off-by: Alex Humphreys <alex.humphreys@here.com>
@alexhumphreys
Copy link
Collaborator Author

@Gabriel439 @sjakobi I updated this with a note on using NFC. Well, I pretty much just took your comment from above. Does this need more detail or would that be clear enough?


let example1 = assert : replace "💣" "💥" "💣💣💣" ≡ "💥💥💥"

let example2 = assert : replace "👨" "👩" "👨‍👩‍👧‍👦" ≡ "👩‍👩‍👧‍👦"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add this as one of the test cases in the standard test suite?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, updated. Should I leave this in the prelude? Just wondering how easy this will be for all implementations, since most users importing the prelude won't need this particular replacement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is also worth keeping in the Prelude since this is useful documentation for the end user. These examples will also show up in the generated docs

@Gabriella439
Copy link
Contributor

@alexhumphreys: I have a PR open with all of my suggested changes in: alexhumphreys#1

I decided to skip handling normalization for now (and adding a test making that explicit). I can standardize support for Unicode normalization in a separate proposal, to keep this pull request simpler.

Gabriella439 added a commit to dhall-lang/dhall-haskell that referenced this pull request Oct 4, 2020
@Gabriella439
Copy link
Contributor

I also have a matching change to the Haskell implementation up here: dhall-lang/dhall-haskell#2063

@alexhumphreys
Copy link
Collaborator Author

@Gabriel439 thanks for the PR, those beta normalisation rules are a lot clearer. It also thought me a bit about string interpolation: didn't know it would get normalised under a lambda binding before the lambda is applied.

Splitting the Unicode handling to a different PR is cool too. Would let example2 = assert : replace "👨" "👩" "👨‍👩‍👧‍👦" ≡ "👩‍👩‍👧‍👦" still remain in the prelude for this PR? I think the output of that wouldn't change between NFC and NFD normalisation, so it should be ok.

Alex Humphreys added 2 commits October 4, 2020 21:46
Signed-off-by: Alex Humphreys <alex.humphreys@here.com>
Signed-off-by: Alex Humphreys <alex.humphreys@here.com>
@Gabriella439
Copy link
Contributor

@alexhumphreys: I think that example would be the same regardless of how we handle normalization

Copy link
Collaborator

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart from one more wibble. Thanks! :)

docs/references/Built-in-types.md Outdated Show resolved Hide resolved
Co-authored-by: Simon Jakobi <simon.jakobi@gmail.com>
@Gabriella439
Copy link
Contributor

@alexhumphreys: You should be clear to merge this

@alexhumphreys
Copy link
Collaborator Author

@Gabriel439 sweet! Any 3 day waiting period or can I just hit the button?

@Gabriella439
Copy link
Contributor

@alexhumphreys: Since you don't have a majority of approvals but you also don't have a majority of disapprovals then the policy is 7 days since the PR was ready (i.e. not a draft PR) and 1 day since last code change. For more details, see: https://github.com/dhall-lang/dhall-lang/blob/master/.github/CONTRIBUTING.md#how-do-changes-get-approved

@alexhumphreys
Copy link
Collaborator Author

Ah cool, so it should be good to be merged. Looks like I don't have write access so maybe one of ye wants to hit the button? Also there's 25 commits in this PR now, do ye have a policy of squashing commits or should I go in and tidy up the history a bit?

@philandstuff
Copy link
Collaborator

This repo enforces squash-on-merge. We should get you the commit bit so you can merge yourself 👍🏻

@philandstuff
Copy link
Collaborator

philandstuff commented Oct 10, 2020

@alexhumphreys I’ve granted you write access now, check your email

@alexhumphreys alexhumphreys merged commit 16160eb into dhall-lang:master Oct 10, 2020
@alexhumphreys
Copy link
Collaborator Author

Sweet! Thanks @philandstuff, and everyone else too for helping on this 🙂

@Gabriella439
Copy link
Contributor

@alexhumphreys: Thank you, too! Also, now that this is merged you can submit an invoice here for $200 and one of us will approve it: https://opencollective.com/dhall/expenses/new

@alexhumphreys
Copy link
Collaborator Author

Thanks, but it's ok! I've been meaning to donate to that Dhall fund so consider this a contribution 🙂

Gabriella439 added a commit to dhall-lang/dhall-haskell that referenced this pull request Oct 13, 2020
@alexhumphreys alexhumphreys deleted the feature/text/replace branch October 15, 2020 06:40
Copy link
Collaborator

@basile-henry basile-henry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm a bit late for a review. I just have a question as I am attempting to implement this in dhall-rust. 😅

{- This test verifies that an implementation correctly permits both the
"replacement" and the "haystack" to be abstract.
-}
λ(x : Text) → λ(y : Text) → Text/replace "a" "-${x}-" "_a_${y}_a_"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand how the haystack can be abstract. Don't we want to replace occurrences of the needle in the abstract section too? i.e. What happens if y is the string "a"?

Copy link
Contributor

@Gabriella439 Gabriella439 Oct 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@basile-henry: Oh, I see what you mean. Allowing the haystack to be abstract means that β-reduction is no longer confluent. For right now, just ignore that test and I can put up a change to require the haystack to be non-abstract before further reduction can occur.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Having a way to apply string/text transformation like replace
7 participants