Support hexadecimal and octal syntax for integers #7578

Open
roberth opened this issue Jan 10, 2023 · 21 comments · May be fixed by #7695
Labels
feature Feature request or proposal language The Nix expression language; parser, interpreter, primops, evaluation, etc

Comments

@roberth
Member

roberth commented Jan 10, 2023

Is your feature request related to a problem? Please describe.

Some configuration formats support hex or octal syntax. It would be nice if those numbers were easily expressed in Nix as well.

Describe the solution you'd like

Extend the parser to parse 0x[[:xdigit:]]+ as an integer converted from base 16.
Currently e.g. 0xff is parsed as 0 xff, which is only meaningful syntax when used inside a list literal; anywhere else it is an application of the integer 0 to xff, which would never return anything. Outside of lists we can therefore assume that all expressions containing a hexadecimal integer are currently invalid expressions, which makes this change an improvement instead of a breaking change there. Inside a list, this seems like a very unlikely sub-expression in practice, as most people have used actual whitespace to separate numbers and identifiers in lists, if they even consider building a heterogeneous list.

Similarly, extend the parser to parse 0o[0-7]+ as an integer converted from base 8.

Similarly, make sure negative integers in these bases work.

I would suggest not to save the base of the integer in its Value representation. This keeps integers simple.

Describe alternatives you've considered

  • Add base 8 or 16 conversion functions to the Nixpkgs lib. The end user syntax would not be great, for example: lib.fromHex "0xff".

  • Saving the base of the integer in its Value representation might be useful for detecting decimal integers used in places where an octal integer is expected. If the user writes an octal integer using decimal syntax, we could detect that mistake. However, I don't think it's currently worth complicating the meaning of integers for this.

Additional context

Inspired by NixOS/nixpkgs#208747 (comment)

Priorities

Add 👍 to issues you find important.

@roberth roberth added feature Feature request or proposal language The Nix expression language; parser, interpreter, primops, evaluation, etc labels Jan 10, 2023
@roberth roberth changed the title Support hexadecimal and octal syntax for numbers Support hexadecimal and octal syntax for integers Jan 10, 2023
@shikanime

shikanime commented Jan 25, 2023

Hello, I would love to work on this ( ˶˃ᆺ˂˶) The implementation of lib.fromHex is quite trivial; was there an issue with it? (🔄 reading the additional context)

@roberth
Member Author

roberth commented Jan 25, 2023

The point of this issue is to add native literals to the language. I'm sure Nixpkgs can implement lib.fromHex quite well, but that'd be a function from string to integer. I don't think it has an issue, but it's just a different thing.

The goal here is to have the following.

# this is a nix file
assert 15 == 0xf;
assert toString 0xf == "15";
assert 0o10 == 8;
# etc

You may notice that github doesn't know how to highlight that; Nix doesn't know how to parse it either, yet.

@shikanime shikanime linked a pull request Jan 25, 2023 that will close this issue
@roberth
Member Author

roberth commented Jan 27, 2023

Oh no! I forgot about the list item separator...

This proposal does steal usable syntax:

nix-repl> let x10 = "x number ten"; in [ 0x10 ]
[ 0 "x number ten" ]

It's still unlikely to cause a problem, but it does mean that this is technically a breaking change.

@ClarkeRemy

Might I recommend that instead of 0x<number in hex> for only hexadecimal,
allow <base in decimal>b<number in base>?
It's more extensible and allows more bases to be leveraged.

All of the following would equal 11 in base 10.

10b011 # Decimal
2b1011 # Binary
8b0013 # Octal
16b00B # Hexadecimal

7b0014 # even base 7
12b0b.0 # even works with floats

This parses easily because it still begins with a numeral.
(The letter 'b' could be bikeshed, consider '_'.)

If this was implemented with case insensitivity, you can go up to base 36.
(10 numeral + 26 ascii letters)
With case sensitivity you can have up to base 62.
(10 numeral + 52 ascii letters)

@eclairevoyant
Contributor

Why? Is there a use case for making base 23 or base 37 easier to write?

@roberth
Member Author

roberth commented Feb 21, 2023

This seems over-engineered, and the base 32 wouldn't match Nix's own base32 encoding; lack of base 64 would be surprising.

I'd like to reserve _ as an ignored digit group separator, similar to Haskell's NumericUnderscores:

million = 1_000_000;

@roberth
Member Author

roberth commented Feb 21, 2023

We might also want to steal syntax such as 64Ki == 65536 to represent the base 2 unit prefixes (Ki, Mi, Gi, etc), although that opens up more questions about complete units, rather than just unit prefixes. Perhaps we could reserve all alphabetic suffixes but only implement unitless unit prefixes that return the usual unitless numbers.
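Until any such suffix syntax exists, unitless binary prefixes can be approximated with ordinary bindings. A minimal sketch, assuming the names Ki, Mi, Gi, bufferSize and cacheSize are chosen purely for illustration (none of them are part of Nix or Nixpkgs):

```nix
let
  # Hypothetical bindings, not new syntax: plain unitless multipliers.
  Ki = 1024;
  Mi = 1024 * Ki;
  Gi = 1024 * Mi;
in {
  bufferSize = 64 * Ki; # 65536, the value 64Ki would denote
  cacheSize = 2 * Gi;   # 2147483648
}
```

The multiplication form keeps the result an ordinary integer, which matches the idea of prefixes that "return the usual unitless numbers".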

@x10an14

x10an14 commented Feb 21, 2023

@eclairevoyant: Why? Is there a use case for making base 23 or base 37 easier to write?

Nope - but it's all just the modulo operator, so why go look for hardcoded chars when we could take the <base> number as input to the modulo operation?

I would argue that it's simpler/easier with integerValue := <inputNumber> % <inputBaseValue>, than hard-coding for specific bases.
What are you gonna do when another base is requested/implemented?
We've already got 4x typical ones (decimal, binary, octal, hexadecimal).

Time (bases 24 & 60) would also be somewhat realistic use-case(s) (since you asked), for a vast majority of us using Nix.

@roberth: This seems over-engineered, and the base32 wouldn't match Nix's own base32 encoding; lack of base 64 would be surprising.

Re. over-engineering; see above.

Re. base64; this suggestion would make the case for bases higher than 36 simpler to write/extend once someone(tm) figures out how to solve the (completely orthogonal to this discussion - at least so far) problem with base64.

@roberth: I'd like to reserve _ as an ignored digit group separator, similar to Haskell's NumericUnderscores:

million = 1_000_000;

Good enough argument for me to move away from _ as a separator.

EDIT: Minor edits for clarity/readability.

@eclairevoyant
Contributor

eclairevoyant commented Feb 21, 2023

@x10an14: What are you gonna do when another base is requested/implemented? We've already got 4x typical ones (decimal, binary, octal, hexadecimal).

Decimal is a given, hex is common, octal is somewhat niche and binary is quite niche. I don't imagine getting requests for other bases in this fashion. These prefixes are also standard across the computing world, no need to have some bespoke integer format.

@x10an14: Time (bases 24 & 60) would also be somewhat realistic use-case(s) (since you asked), for a vast majority of us using Nix.

This is not how you represent time, so that's actually hurting your point.
To me something like time would make more sense to handle in a library function, since that's a mix of bases (and also would need to handle time zones), or even just an external command.

@x10an14

x10an14 commented Feb 21, 2023

What are you gonna do when another base is requested/implemented? We've already got 4x typical ones (decimal, binary, octal, hexadecimal).

@eclairevoyant: Decimal is a given, hex is common, octal is somewhat niche and binary is quite niche. I don't imagine getting requests for other bases in this fashion. These prefixes are also standard across the computing world, no need to have some bespoke integer format.

I am not as confident as you in your (I feel too) strong assertion(s).
But I don't think I've got anything more productive to add to the discussion than I already have at this point, so tapping out here - leaving this for you (who feels strongly enough to write "no need to have some bespoke integer format" (emphasis mine)) and others to decide.

Time (bases 24 & 60) would also be somewhat realistic use-case(s) (since you asked), for a vast majority of us using Nix.

@eclairevoyant: This is not how you represent time, so that's actually hurting your point.

I definitely think your argument is over-reaching here - seems (to me) like you're presuming knowledge of all use-cases.
I (personally) have no problem seeing myself (or others) using something like services.<serviceName>.config.intervalAttribute = 7 * 24n1;.

@ClarkeRemy

@eclairevoyant: Why? Is there a use case for making base 23 or base 37 easier to write?

Cherry picking?

This is already a thing in Bash, but it uses # as the separator.
This prints 120:

$ echo $((23#55))

@roberth: This seems over-engineered, and the base 32 wouldn't match Nix's own base32 encoding; lack of base 64 would be surprising.

I presume you mean the SHA sums for packages. This is semantically different, people are not expected to read hashes as actual numbers, or actually reason about them.

Base 64 has always been a tricky one. I would love for a way to do it for base 64.

@roberth: I'd like to reserve _ as an ignored digit group separator, similar to Haskell's NumericUnderscores:

million = 1_000_000;

I'm fine with the 'b' character. I think 'n' could be good too.
Bikeshed away.

16_0a
16b0a
16n0a

I'm also of the opinion that we could just have a builtin called base that takes a number and a string:

builtins.base 16 "FF"
builtins.base 23 "55"

@eclairevoyant
Contributor

eclairevoyant commented Feb 21, 2023

I have no objection to a builtin for less-used bases. All I'm getting at is I don't think that should block movement on this feature request. Nor do I think we should look to bash for user friendliness.

@roberth
Member Author

roberth commented Feb 21, 2023

Arbitrary base literals are hard to read and understand, and any remaining use cases are sufficiently niche that they're better covered by a library function.

intervalAttribute = 7 * 24n1

This is complete gibberish to me. interval suggests something about time, so perhaps this is better solved by support for units. I'm happy to reserve syntax for units, but I'd like to save the remaining discussion about units for when hex and octal have been implemented.

I do have an objection to a new built-in. Builtins have to be absolutely bug-free and can not be changed. Furthermore they make the implementation of alternate evaluators needlessly hard. Niche functions like arbitrary-base parsing are better implemented in a Nix-language library such as a flake, which does not suffer from these problems, or nixpkgs.lib, which suffers very little from these problems.
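As a rough illustration of the library route, arbitrary-base parsing fits in a few lines of plain Nix. This is only a sketch: fromBase, digitValue and digits are hypothetical names that exist neither in Nix nor in nixpkgs.lib, and it assumes lowercase digits and bases 2 through 36.

```nix
let
  # Hypothetical helpers, not part of Nix or Nixpkgs.
  digits = "0123456789abcdefghijklmnopqrstuvwxyz";

  # Numeric value of one lowercase digit character; throws if the
  # character is not a valid digit for the given base.
  digitValue = base: c:
    let
      go = i:
        if i >= base then
          throw "invalid digit '${c}' for base ${toString base}"
        else if builtins.substring i 1 digits == c then
          i
        else
          go (i + 1);
    in
    go 0;

  # Parse `str` as a non-negative integer written in `base`,
  # accumulating left to right: acc * base + digit.
  fromBase = base: str:
    let
      len = builtins.stringLength str;
      chars = builtins.genList (i: builtins.substring i 1 str) len;
    in
    assert base >= 2 && base <= 36;
    assert len > 0;
    builtins.foldl' (acc: c: acc * base + digitValue base c) 0 chars;
in {
  hex = fromBase 16 "ff";   # 255
  octal = fromBase 8 "755"; # 493
  odd = fromBase 23 "55";   # 120
}
```

Because it is ordinary Nix code, a function like this can be fixed or extended (uppercase digits, a sign, prefixes) without touching the evaluator.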

@x10an14

x10an14 commented Feb 21, 2023

@eclairevoyant: I have no objection to a builtin for less-used bases. All I'm getting at is I don't think that should block movement on this feature request. Nor do I think we should look to bash for user friendliness.

I'm tapping out of this back and forth.

But it's important to me to clarify for anyone who cares that none of my inputs to this thread was meant as @eclairevoyant interpreted them;

block(ing)

@ClarkeRemy

@roberth: Arbitrary base literals are hard to read and understand, and any remaining use cases are sufficiently niche that they're better covered by a library function.

intervalAttribute = 7 * 24n1

This is complete gibberish to me. interval suggests something about time, so perhaps this is better solved by support for units. I'm happy to reserve syntax for units, but I'd like to save the remaining discussion about units for when hex and octal have been implemented.

I do have an objection to a new built-in. Builtins have to be absolutely bug-free and can not be changed. Furthermore they make the implementation of alternate evaluators needlessly hard. Niche functions like arbitrary-base parsing are better implemented in a Nix-language library such as a flake, which does not suffer from these problems, or nixpkgs.lib, which suffers very little from these problems.

fair enough, just thought I would pitch in.

@x10an14

x10an14 commented Feb 21, 2023

@roberth: Arbitrary base literals are hard to read and understand, and any remaining use cases are sufficiently niche that they're better covered by a library function.

intervalAttribute = 7 * 24n1

This is complete gibberish to me. interval suggests something about time, so perhaps this is better solved by support for units. I'm happy to reserve syntax for units, but I'd like to save the remaining discussion about units for when hex and octal have been implemented.

Do you still feel that's the case with the below code? I'm curious.

let
 dayAndAHalf = 12n3;
in {
  intervalAttribute = 3 * dayAndAHalf;
}

@eclairevoyant
Contributor

12n3 is gibberish as well. What does that have to do with a day and a half?

@x10an14: But it's important to me to clarify for anyone who cares that none of my inputs to this thread was meant as @eclairevoyant interpreted them; block(ing)

Of course this discussion is blocking since it directly conflicts with the proposed syntax. If it wasn't blocking then it should be a separate request, yes?

@roberth
Member Author

roberth commented Feb 21, 2023

Did you mean 12n30 instead of 12n3? The fact that you're making this mistake (or the interpretation is ambiguous) is not helping your point, and it will not refute the fact that arbitrary-base syntax is unconventional and surprising to all readers. Most readers only use the language occasionally, so we should try to avoid surprises.
I'm enough of a nerd to enjoy the idea, but professionally we're not going to implement arbitrary-base syntax, and we've exhausted the subject by now. Please stop.

@x10an14

x10an14 commented Feb 21, 2023

@eclairevoyant: 12n3 is gibberish as well. What does that have to do with a day and a half?

Not engaging.

@roberth: I'm enough of a nerd to enjoy the idea, but professionally we're not going to implement arbitrary-base syntax, and we've exhausted the subject by now. Please stop.

I'm on the same page.

@eclairevoyant: Of course this discussion is blocking since it directly conflicts with the proposed syntax. If it wasn't blocking then it should be a separate request, yes?

Then where are discussions to be held?

@mweinelt
Member

mweinelt commented Feb 4, 2024

I needed octal support to represent file permissions in a module today. Converting the octal int from its string representation into a decimal integer was the only safe way I found to transport it through PyYAML (wants octal numbers to be zero-prefixed) into Python's os.chmod (supports all kinds of integer representations).

Anyway, since this was one of the first issues I found on the matter: here is a function that converts a string of octal (base 8) digits into an integer.

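# Assumes `lib` is the Nixpkgs library (e.g. a module argument).
# The input must contain only the digits 0-7; 8 and 9 are not rejected.
toIntBase8 = str:
  lib.pipe str [
    lib.stringToCharacters # "755" -> [ "7" "5" "5" ]
    (map lib.toInt)        # -> [ 7 5 5 ]
    # Accumulate left to right: (7 * 8 + 5) * 8 + 5 == 493
    (lib.foldl (acc: digit: acc * 8 + digit) 0)
  ];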

@chayleaf

chayleaf commented Feb 21, 2024

builtins.fromTOML can be used as a compact way to parse bin/oct/dec/hex numbers in modern Nix versions, like assert (builtins.fromTOML "a = 0b11").a == 3. Of course, this isn't a replacement for adding non-decimal integer literal support into the language.
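The fromTOML trick can be wrapped in a small helper. A minimal sketch, assuming a Nix version whose builtins.fromTOML accepts prefixed integers as described above; parseIntLiteral is a hypothetical name, not an existing function:

```nix
let
  # Hypothetical helper: evaluate a TOML integer literal such as
  # "0xff", "0o755" or "0b11" by embedding it in a one-key document.
  # The literal is spliced in unescaped, so only pass trusted strings.
  parseIntLiteral = lit: (builtins.fromTOML "v = ${lit}").v;
in {
  hex = parseIntLiteral "0xff";    # 255
  octal = parseIntLiteral "0o755"; # 493
  binary = parseIntLiteral "0b11"; # 3
}
```

This still goes from string to integer at evaluation time, so it remains a workaround rather than a substitute for native literals.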

7 participants