New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Arithmetic with Overflow #6223

Open
wants to merge 30 commits into
base: master
from

Conversation

Projects
None yet
@bcardiff
Member

bcardiff commented Jun 19, 2018

This PR adds overflow to integer arithmetics, adds new language constructs to allow user
to choose where overflowing (checked) or wrapping (unchecked) semantic is desired in a given lexical scope.

It also updates the STD where unchecked semantic is required. Or at least, where specs began to fail if run with --overflow-checked.

Updates in the compiler options

There are two compiler options --overflow-checked and --overflow-unchecked to change the
default overflow policy to be used. Currently the default is --overflow-unchecked to keep the current
semantic and allow code to be updated more easily. In future versions I think the default should be --overflow-checked in favor of safety.

After building a crystal compiler you build with --overflow-checked option and run specs as follows.

$ ./bin/crystal foo.cr --overflow-checked
$ ./bin/crystal build foo.cr --overflow-checked
$ ./bin/crystal spec --overflow-checked

Updates in the language

The operations affected by the overflow policy are only +,-,* between integers.

The new language constructs are:

checked { <expr> }
unchecked { <expr> }

The type and value of the whole [un]checked { <expr> } matches the type and value of <expr> or it raise an OverflowError.

When an OverflowError is raised the location advertised by the exception is/should be the location of the operator that generate the overflow.

Lexical Scope

The intention of using a lexical scope is that it would make as easy as possible to know how the operation is performed.

When compiled with --overflow-checked,

def foo(v)
  unchecked { 
    # the * will not raise on overflow
    bar(v) * 2 
  }
end

def bar(v)
  # the + will raise on overflow
  v + 2
end

Regarding blocks the idea is the same, when compiled with --overflow-unchecked,

def bar(v)
  # all operations here would be unchecked due to default policy
  yield v
end

def foo
  checked {
    bar(127_i8) do |v| 
      # Although the block is yielded from bar, the + is
      # performed as checked since it is written lexically
      # inside a checked section
      v + 1_i8
    end
  }
  1
rescue OverflowError
  0
end

foo # => 0

Implementation details

Given that +,-,* for Int are primitives they are always inlined and the codegen stage knows what to do along the way.
It is easy to change the semantic of the operation give some context, in this case the lexical scope information.

The current behaviour and under the unchecked policy the + is translated to add IR.

def inc(v)
  v + 1
end
define i32 @"*inc<Int32>:Int32"(i32 %v) #0 {
entry:
  %0 = add i32 %v, 1
  ret i32 %0
}

Under the checked policy the + is translated to llvm.sadd.with.overflow.* a LLVM with Overflow Intrinsics.

define i32 @"*inc<Int32>:Int32"(i32 %v) #0 {
entry:
  %0 = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %v, i32 1)
  %1 = extractvalue { i32, i1 } %0, 0
  %2 = extractvalue { i32, i1 } %0, 1
  br i1 %2, label %overflow, label %normal

overflow:                                         ; preds = %entry
  %3 = call %OverflowError* @"*OverflowError::new:OverflowError"()
  call void @"*raise<OverflowError>:NoReturn"(%OverflowError* %3)
  unreachable

normal:                                           ; preds = %entry
  ret i32 %1
}

Note: In both situations when the operands do not belong to the same type some expansion, truncation and additional checks for overflow are done. This is because the operation is carried on the "bigger" type, and the result is always translated to the type of the left operand. This is the current semantic.

Limitations of the scopes

While developing the feature I came across two operations that would be nice to overflow along with +,-,*.

  1. Int#**(exponent : Int)
  2. Int#-()

This operation are not primitives as the others, but they are operators. I think it would be nicer if their overflow semantic is changed together with the current overflow scope policy.

Maybe for now, Int#**(exponent : Int) got its counterpart Int#unchecked_pow(exponent : Int) since the former will match the compiler policy that will eventually be checked.

The second one was used in one single place
as -info.value.to_{{method}} and changed to unchecked { 0.to_{{method}} - info.value.to_{{method}} }.

These two cases are methods and they are not inlined, so the mechanism described in "Implementation details" does not longer apply.

If we want them to be affected by the overflow policy the possible paths to go are:

  1. Make them @[AlwaysInline] and make the inline methods be affected by the policy.
  2. Add another annotation to specify the method is affected by the overflow policy. If not inlined then the code gen should make the call to the checked or the unchecked version directly.

Inlining the ** might not be that nice since it's ~10 lines and used it in many places. Adding the annotation seems a bit to much for a first round.

Another usage of forwarding the policy to other methods I can imagine is for example in matrix operations: If * is to be applied between matrices or matrix-scalar the user might have fine-grained control beyond the global policy (--overflow-[un]checked). But I found that either the packages always wrap silently or use floats.

So, all in all, up to here the only real candidate was **, hence the unchecked_pow method.

Standard Library changes

As a migration technique a temporal macro __next_unchecked is added and should be replaced with unchecked on the release after merge.

These changes are marked as Random: Mark required unchecked code blocks or different std-lib sections. All of these change can be changed later specially Time and JSON. They are focused of keeping the current behavior mostly. In the case of Time, there was some manual overflow check.

All of these changes were pushed due to existing specs that failed otherwise.

Pending

  • Add [un]checked do; end notation also. I like that { } fits better to be used inside expressions.
  • Overflow on type casts (eg: 256.to_u8 should raise in checked context)
  • Fix/Test some cases of variables shadowing if they are first declared in a [un]checked scope
  • Fix/Test lexical scope and anonymous functions declaration
  • Formatter (Same rules as method, but with the OverflowCheckScope ast node)
  • Track highlighters to add the new keywords

Feedback is welcome

bcardiff added some commits Jun 13, 2018

Refactor: avoid setting ivar in codegen_binary_op
since it's done in codegen_primitive and others codegen_binary avoids setting it
Refactor: move truncate code to each operator
More linear code for operators
Forward AST call operator location to codegen_binary_op
for exact location information of the exception in the caller context
Add --overflow-checked and --overflow-unchecked compiler options
Make default unchecked to keep current semantics but schedule change on the default policy
Add checked / unchecked keywords
Move OverflowCheck to OverflowCheckScope::Policy
Add overflow check policy to bc_flags
Changing overflow_check should invalidate cache
FloatPrinter: Mark required unchecked code blocks
Removing any of these prevent grisu3_specs to pass
String: Mark required unchecked code blocks
Int#** avoid k *= k in last iteration due to possible overflow
Compiler: Mark required unchecked code blocks
Otherwise generated enum values might overflow (even if they are not used due to manual assignment)
@sempervictus

This comment has been minimized.

sempervictus commented Jul 1, 2018

Basic scientific method states that experiments should occur in a controlled environment, testing for one isolated variable at a time if possible, for maximum accuracy and reduction of white noise. If something as basic as arithmetics work differently in debug and release, debug stops being a viable way to test release code.

@bcardiff

This comment has been minimized.

Member

bcardiff commented Jul 2, 2018

So, I've been changing the code in other branch to use custom operators. The migration is a bit longer because there is a bunch of code to change and there are many specs that use + between Ints or Ints and floats. Changing them to &+ is needed to avoid additional runtime dependencies of the code, but when Float | Int are involved, then &+ needs to be added. Or maybe the specs should change to use other operations since it's not the main concern.

I had 99% of compiler specs running. And I still need to change the std specs to avoid or fix overflowing.

I think explicit operators seems fine and it gives more control (yet sometimes feel a bit annoying to type all the &). Introducing this operators requires some additional releases.

  1. next: support parting &+, &-, &* and &**. With support of &+=, &-=, &*=. Add formatting rules.
  2. next+1: make all the above work as the current operation
  3. next+2: change +, -, *, ** to overflow and fix specs.

The integer division might get into the way since INT_MIN / -1 (or INT_MIN // -1) should actually overflow, meaning &// should exist. But it could come slightly later.

&- collide with &->proc that actually is &(->proc)`. Unless the lexer is changed to add lookahead in that case there is a slightly breaking change here.

This story will continue in different PRs.


My current branch regarding this are: https://github.com/bcardiff/crystal/tree/feature/overflow-ops-phase2 , https://github.com/bcardiff/crystal/tree/feature/overflow-ops-phase1, https://github.com/bcardiff/crystal/tree/feature/overflow-ops-phase0 . But beware because I rebase a lot :-P

@wontruefree

This comment has been minimized.

Contributor

wontruefree commented Jul 2, 2018

Whenever I think of '&'something in Crystal I think Procs. Is there another symbol that would work better like :+, :*, :+=?

@bcardiff

This comment has been minimized.

Member

bcardiff commented Jul 2, 2018

@wontruefree The only symbols that are left besides &... are ~+ ~- ~*, but at least one other language is using &..., we are not alone.

Unless your keyword layout has something I am not seeing, or we go to with heavy signs (they are not easy to type!), the option to go seems to be &...

@asterite

This comment has been minimized.

Contributor

asterite commented Jul 2, 2018

When I first thought about these operators, I thought about +?, -?, etc. It could even maybe be +^, -^, etc... the ^ meaning "overflow" (it goes up :-P). There are plenty of possibilities to choose from (and that symbol doesn't have to come before the operator, it can come after it).

@asterite

This comment has been minimized.

Contributor

asterite commented Jul 2, 2018

(that said, I'm fine with &+, etc.)

@bcardiff

This comment has been minimized.

Member

bcardiff commented Jul 2, 2018

^ is associated with exponents. I know we use ** for that, but still..
? suffix might conflict with some expr.op without argument and the ternary operator. I would prefer to leave some space in the operators to not require ... some space. 🥁

I'm fine with & prefix also.

BTW: I found it confusing that swift names overflowing operators to the one that after an overflow will wrap around &+. And we are calling overflowing when an exception might be raised +. #NamesAreHard

@Sija

This comment has been minimized.

Contributor

Sija commented Jul 2, 2018

BTW: I found it confusing that swift names overflowing operators to the one that after an overflow will wrap around &+. And we are calling overflowing when an exception might be raised +. #NamesAreHard

I'd also assume that overflowing in this case means that's allowed to overflow.

@bew

This comment has been minimized.

Contributor

bew commented Jul 2, 2018

Following up on what @asterite said, it could be a char after the operator:
What about <operator>! for overflow checking disabled overflow checks ? (dangerous, as @straight-shoota said below)
It feels good because it's known that some methods ending with ! can raise (like not_nil!).

So like: +! *!

@wontruefree

This comment has been minimized.

Contributor

wontruefree commented Jul 2, 2018

thank @bcardiff that makes a lot of sense.

@straight-shoota

This comment has been minimized.

Contributor

straight-shoota commented Jul 2, 2018

@bew But the standard operators +, -, * should raise on overflow. Disabling overflow checks is the exception and needs different operators. Still, ! could make sense because it marks these operators as dangerous.

But then, ! is also an unary operator, so maybe not a good choice here.

@bew

This comment has been minimized.

Contributor

bew commented Jul 2, 2018

Oh right thanks @straight-shoota, I've updated my message accordingly.

@jhass

This comment has been minimized.

Member

jhass commented Jul 2, 2018

Yeah, let's avoid something where a wrong space is still valid syntax with a different meaning even if it then would fail the semantic phase, as in 1 +! 2 vs 1 + !2

@Sija

This comment has been minimized.

Contributor

Sija commented Jul 2, 2018

Maybe @? It visually "wraps around" and has no prior meaning.

@funny-falcon

This comment has been minimized.

Contributor

funny-falcon commented Jul 2, 2018

If use suffix instead of prefix, then why not same & character?
+&, -&, *&, >>&, /&, %& .
Will it collide with some existing syntax?

It could be read as "add with and", "sub with and" etc, that is close to logical semantic:
a.to_u8 +& b.to_u8 == ((a + b) & 0xff).to_u8

d &+= a &+ b &* c
vs
d +&= a +& b *& c

Though, prefix looks a bit prettier.

@jhass

This comment has been minimized.

Member

jhass commented Jul 2, 2018

a @+ b could work I guess. a +@ b is bad due to a + @b. (minor: +@ is unary + in Ruby)

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Jul 2, 2018

I think &+ is quite clever. It reads as "wrapping add" or (a + b) & 0xFF, which is exactly what we do in languages without wrapping integers (e.g. Ruby).

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Jul 2, 2018

I.e. no need to bother to be smart or clever. Swift paved the way. Let's follow.

@megatux

This comment has been minimized.

megatux commented Jul 3, 2018

@sempervictus

This comment has been minimized.

sempervictus commented Jul 4, 2018

@megatux: I think the road to malbolge is paved with something like that :)
The Swift way seems the most straightforward (likely to be understood by new developers) and safe approach, reducing the pebcak->equifax scenarios down the line.

@wontruefree

This comment has been minimized.

Contributor

wontruefree commented Sep 4, 2018

I am curious what is the progress on this?

@RX14

This comment has been minimized.

Member

RX14 commented Sep 10, 2018

@bcardiff

This comment has been minimized.

Member

bcardiff commented Sep 10, 2018

The amp ops will have the right semantics in 0.27.
The plain ops will change their semantics in 0.28.

I will extract those changes from old branch probably next week.

I’m glad there was interest in the numerical features in the last days!

@wontruefree

This comment has been minimized.

Contributor

wontruefree commented Oct 10, 2018

I just saw the amp ops PR go threw!
The 0.27 release will not have the amp ops but they will not be implemented right?

@RX14

This comment has been minimized.

Member

RX14 commented Oct 10, 2018

@wontruefree no, 0.26 had the operators but no implementation. 0.27 will have &+ behave the same as +, 0.28 will change + to raise on overflow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment