The next step #1824

Closed
asterite opened this Issue Oct 26, 2015 · 139 comments

@asterite
Contributor

asterite commented Oct 26, 2015

An ever-present worry we have about the language is that it won't be usable for large projects. Since the global type inference algorithm always has to start from scratch, it would be good to find a way to do incremental or partial compilation: some way to reuse a previous compilation's result for the next one. That means the first time you check out a project it might take a few seconds to compile, but subsequent compiles should be fast, provided only small changes are made to the codebase.

We sat down and thought about how this could be done with the current language, and we think (or at least we are strongly convinced) that there's no efficient way to do it. The language might need a change.

We thought about the minimum delta that could make this work. The conclusion is not a happy one, but it's also not the worst one could imagine:

When declaring a type (class, struct) you have to specify its instance variables and their types. The same applies to class variables and global variables.

After some more brainstorming we think the above rule can be relaxed in some cases: the compiler will try, for example, to infer instance variable types from the expressions assigned to them in the initialize methods, if those expressions are simple (literals and new calls). An example of where this simplification can be applied is the CSV::Lexer class from the standard library:

# Current code will compile just fine
class CSV::Lexer
  def initialize
    @token = Token.new         # inferred to be Token
    @buffer = MemoryIO.new     # inferred to be MemoryIO
    @column_number = 1         # inferred to be Int32
    @line_number = 1           # inferred to be Int32
    @last_empty_column = false # inferred to be Bool
  end
end

Another rule is that a type restriction on an initialize argument defines the instance variable's type. For example:

class CSV::Builder
  # @io inferred to be an IO
  def initialize(@io : IO)
    @first_cell_in_row = true # inferred to be Bool
  end
end

But note that if, in the above cases, an instance variable is assigned another type in other methods, it will immediately be an error: if you want a union, declare the instance variable's type as such.
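A minimal sketch of that rule, using the proposed declaration syntax (the class and methods here are hypothetical, not from the standard library):

```crystal
class Cell
  # Declared as a union up front, because @value is assigned
  # both Int32 and String in different methods
  @value :: Int32 | String

  def initialize
    @value = 0
  end

  def label=(text : String)
    @value = text # OK: String is part of the declared union
  end
end
```

Without the explicit union declaration, the assignment in `label=` would be the immediate error described above.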

In other cases you'll inevitably have to write down the types:

class Markdown::Parser
  # Needed because the expression assigned to it
  # in the initialize is not simple (involves calls but it's not a `new` call)
  @lines :: Array(String)

  # @renderer is inferred to be a Markdown::Renderer
  def initialize(text, @renderer : Markdown::Renderer)
    @lines = text.lines.map &.chomp
    @line = 0 # inferred to be an Int32
  end
end

The current compiler works by taking into account all assignments to instance variables in the whole program. This has many (bad) consequences:

  • Because their type might change by mistake, the errors that follow are hard to digest and hard to track down. Also, the rule of "if it's not assigned in the initialize then it's nilable" is not very intuitive and, again, can kick in by mistake. By having a clear rule for what the types of instance variables are, these type-changing errors should happen less often, or at least be easier to understand. Also: accessing a non-declared instance variable will immediately give an error, instead of propagating a nil value.
  • Because their type might change at any point in the program, they can affect the type of any method that uses them. This means that local type inference for a method can't be done: you always have to take into account the possibility of an instance variable changing after that method has been typed. If the compiler knows the types of all "global" data (instance and class variables, and global variables) local type inference becomes possible, leading to a faster compiler but most importantly allowing the possibility of reusing a previous compilation's result (for example by caching the "dependencies" (types, other methods) of each method).

The second point is the most important. If we want Crystal to be used for real, large projects, it's paramount that the compiler can handle such projects and that you don't have to wait 30 or 60 seconds between compiles. We prefer making this change now rather than skipping it and later getting stuck with a language that is only useful for "mostly toy" projects.

Because this change has a huge impact on the overall algorithm of the compiler, we decided to take this opportunity to also rewrite the compiler "from scratch". This means: we'll rewrite it taking into account the current code and specs, but we won't reuse most of the code we have (well, the lexer and parser will be reused, the formatter, as well as some other bits). The current code, even though it's kind of efficient and more or less organized, has some flaws that are hard to fix without a rewrite (for example, we'd like to make an initial pass to gather all types so that we know which types have subclasses and so are "virtual"). And we'll do it taking into account the current bugs (which are about 45 and are more or less related to the same compiler/language flaws).

The good news is that we'll make sure this new compiler is thoroughly commented and documented: you'll be able to jump right into its source code, and all phases and algorithms will be explained (promise!). Right now this is far from being the case :-)

Let's summarise the pros and cons of this change:

Pros:

  • Faster compile times (even without reusing a previous compile's result), and incremental compilation becomes possible.
  • Better error messages when assigning a wrong type to an instance variable, or when reading a non-existent one.
  • No need for a human to infer which instance variables are nilable: if one is not assigned in the initialize, a type annotation will state its type explicitly.
  • We have some ideas on how to make Array(Object), Array(Reference) and, basically, using any type as a generic argument possible. You'll be able to have an Array(Object) and invoke to_s on each element, for example. Because the compiler will know all types before typing the "main" code, virtual method lookups (which involve searching a type's subclasses recursively) will be faster and can be cached, because the type hierarchy will be fixed.

Cons:

  • It might sometimes be tedious to write these types. However, when you program, most of the time is spent defining methods and using types, not defining the types of their instance variables. And you always know their type. A downside is that this might not play well with duck typing, and you might have to define a module to act as an interface. But you can turn that change to your benefit and define abstract methods on that module that tell you what has to be implemented (or you can leave the module empty: both approaches will work). Also: you'll be able to reopen a type and redefine the type of an instance variable, so mocks are still possible.
  • The promise of never having to write types down becomes false. However, it is already false: you have to do it for generic type arguments. And sometimes you do it in method arguments (although that's mostly for overloads, it can also lead to better error messages and better-documented code). And with the relaxation rules this won't have to be done all the time.
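As a sketch of the interface-module idea from the first point (all names here are hypothetical, not from the standard library):

```crystal
# A module acting as an interface: the abstract def documents
# what any implementation has to provide
module Renderer
  abstract def render(text)
end

class HtmlRenderer
  include Renderer

  def render(text)
    "<p>#{text}</p>"
  end
end

class Document
  # @renderer is inferred to be a Renderer; any including type
  # (including a mock defined in specs) can be passed in
  def initialize(@renderer : Renderer)
  end
end
```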

Remember that types won't be mandatory in method arguments or return types, so duck typing still holds. This method:

def add(x, y)
  x + y
end

has no type annotations and works on any type that defines a #+(other) method, so we believe Crystal is still unique and doesn't suddenly become an existing language.

Because the rewrite will take some time (and also because this next month we'll be busy with some specific work/life matters) you'll have to be patient. In the meantime the standard library can be grown and existing issues will be fixed unless they can only be fixed in this new compiler rewrite (some examples of issues we'll fix in the new compiler are #456, #718, #729, #846, #867, #916, #941, #962, #1346). After the rewrite you can use crystal tool hierarchy to port your code to the new version, or we might provide a migration tool.

@asterite asterite added the task label Oct 26, 2015

@jhass

Member

jhass commented Oct 26, 2015

I think that compilation times are still great and in no way a pain point of the current language. A far greater benefit and boost could be achieved by focusing on concurrency and parallelism idioms as well as the threaded scheduler. The longer that's postponed, the more standard library code will need to be checked against thread safety. I think the time you can allocate would be extremely well spent on that issue instead and benefit the current community tremendously more. I'm not sure I can find the words to emphasize this strongly enough really. I'm truly convinced this is the wrong priority here, sorry.

But that's just my two cents; I can only give my opinion and of course can't dictate anything to you.

@Perelandric

Perelandric commented Oct 26, 2015

For my part... 👍 and it has little to do with compile times. I'm a fan of having more explicit type requirements in general.

Because I know going in that Crystal isn't exactly the way I'd want it in some areas, it would be silly of me to complain about it. I like it as a language either way so I'm willing to go with the flow but this would be very welcome from my perspective.

@asterite

Contributor

asterite commented Oct 26, 2015

I forgot to mention that this change isn't only to improve compile times. Consider this:

class Foo
  def foo
    1
  end
end

class Bar < Foo
  def initialize(@foo)
  end

  def foo
    @foo
  end
end

a = [] of Foo
a << Foo.new
puts typeof(a[0].foo) #=> ???

The compiler doesn't know the type of @foo, because Bar was never instantiated, so what's the type of the expression above? Well, right now it's kind of hacky, and it involves tracking which types were instantiated and recalculating types and methods when this happens. A real example of this issue is #1809. If you have to somehow specify the type of @foo, this problem disappears.
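Under the proposal, one hypothetical way out would be to give @foo a type restriction, so Bar#foo can be typed even though Bar is never instantiated:

```crystal
class Bar < Foo
  # The restriction fixes @foo's type, so typeof(a[0].foo) can be
  # computed without tracking which types were instantiated
  def initialize(@foo : Int32)
  end

  def foo
    @foo
  end
end
```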

So it's not just for a faster compiler: it's for a better, more complete and non-broken language.

@jhass As I said, it's not that we'll only be working on this new compiler: the standard library will continue to grow and we'll fix bugs as they appear.

@jhass

Member

jhass commented Oct 26, 2015

And as I said I'm convinced it's wrong prioritization at this point. I'd rather give this issue more time and act quickly on the concurrency issue, than acting on this quickly with "easy" solutions through more explicitness.

@waj

Member

waj commented Oct 26, 2015

I don't see concurrency as a problem right now. It's also a priority for us to have proper concurrency support for the language but the language is still usable. Also, this can be done in parallel, as I don't think adding concurrency support will affect most of the standard library code or even the compiler itself (probably not affected at all).

On the other hand, our concern right now is to make sure that the language can scale to projects of any size. We believe it is our responsibility to make that happen, so we don't hit a big wall in the future and leave this (big) growing community disappointed.

To be honest, at some point we even thought we could leave concurrency support for a future version and make the language robust for 1.0. That's probably not gonna happen after all, but that's how transversal a problem I think it is.

@luislavena

Contributor

luislavena commented Oct 26, 2015

@asterite in your example, I would rather see a compiler error telling me that I'm using @foo uninitialized than be forced to spell out the types of all my instance or class variables.

Perhaps a different approach can be taken? Incremental compilation/analysis? If I'm not mistaken, Rust has this in their RFCs (not yet prioritized, even though they are in the post-1.0 era):

https://github.com/nikomatsakis/rfcs/blob/incremental-compilation/text/0000-incremental-compilation.md

@asterite

Contributor

asterite commented Oct 26, 2015

@luislavena You will get a compile-time error saying something like No type specified for @foo of Bar and to fix the problem you'll have to specify its type. How can you determine the type of Bar#foo without knowing the type of @foo? Or maybe I'm misunderstanding something.

Rust requires type annotations in types and methods, and that's why they'll be able to eventually do incremental compilation. We are aiming for the same goal and the minimum requirement is to have types (and their instance variables) well defined before we start typing the whole program.

@samis

samis commented Oct 26, 2015

I'm not a fan of more explicitness and an increased number of required type annotations. Is it not possible to let the user decide if they want to make the trade-off with regards to compile times? I would not like to see this language become a modified clone of some other statically/dynamically typed language with different paint (i.e. syntax). As for concurrency, I don't have an opinion about that.
(P.S. At this stage, might this be a premature optimization?)

@ysbaddaden

Member

ysbaddaden commented Oct 26, 2015

I'm not a type person, so having to specify more types is daunting... at first. Yet, thinking about it, it may be a good thing to restrict the struct/class variable types as proposed here. I just hope we won't see more and more changes in the language to have to specify types more and more and more 😨

The current compiler is acceptably fast on a decent computer for small applications (e.g. Shards), but I acknowledge it slows down quickly as the number of types grows toward a mid-size application (or with a heavyweight framework). If it could be lightning fast, and we could transparently recompile the app live on each file save, and this only requires specifying the type of hazy struct/class variables... then I guess I can live with it.

I'd love to see parallelism being worked on in parallel (pun intended) as @jhass suggested. But let's have another, dedicated, issue to discuss this.

@kirbyfan64

Contributor

kirbyfan64 commented Oct 26, 2015

I don't think this would help that much. IMO, Crystal still builds itself faster than a C++ application half its size, and is even faster than Nim, so there's not really much to complain about in that field.

@benoist

Contributor

benoist commented Oct 26, 2015

Would this also make it possible to create a REPL?

@asterite

Contributor

asterite commented Oct 26, 2015

@samis There are many issues if we make this optional:

  • You include a shard in your project and your compile times slow down because the author didn't optimize it for compile times.
  • There's still the issue with uninstantiated classes I mentioned above. This change plans to solve all remaining issues with the language.

@ysbaddaden We are pretty sure this is the last change regarding types. After years of developing the compiler and writing code in Crystal the major pain point is the lack of a fixed type for instance vars: methods sometimes can't be typed (like the example above) and you need some hacks here and there.

@kirbyfan64 Last time I compiled Nim it took 2.5s (or 5s?) to compile itself. That's including the codegen. We are at 10s on the same machine, but I think Nim's compiler is much more complex (for example it has a VM in it). We also don't want to use C++ as an excuse, such as in "Oh, yes, it takes 20s to compile, but I'm sure it takes a couple of minutes in C++" :-)

@benoist It will definitely make creating a REPL much easier (but still not trivial). It will also improve times for tools for IDEs, so we could have autocomplete and jump-to-definition tools that are fast.

I forgot to say that we'll develop this in parallel and it will be an experiment: if everything goes well and we see an improvement as a whole, we'll take the change. Otherwise we'll remain with the current language.

@kirbyfan64

Contributor

kirbyfan64 commented Oct 26, 2015

@asterite Huh? Nim caches C source files to avoid re-generating them. Try deleting nimcache.

I don't mind this as an experiment to see how it turns out, but I personally don't really like the idea. There have to be better ways to improve speed... :O

@ozra

Contributor

ozra commented Oct 26, 2015

I've been juggling ideas around in my head about incremental compilation in Crystal, and all of them have ended up with a compiler daemon that keeps the inferred AST in memory. When the compile command is used, it simply signals the daemon for the project dir, which checks which files have changed, lexes, parses and transforms their nodes, updates all nodes that require changes, and re-infers only what's needed. But that's just a dreamy thought - I haven't reasoned about the devilish details yet.

Typing data members sounds like a really good minimum requirement to me. Being sure about data structures' types is foundational imo - and not much work. The types visually go well there since it's already a more formal construct. And being relieved of typing methods is a must-have, which is nice to hear is part of the planned re-write.

The idea of compiling types in two steps sounds good. The Nim compiler has two such phases - though only operating on one "types-collection" at a time, which is a bit limiting imo.

I think prioritizing this before concurrency is good, since it's so fundamental to the progress of the language. Concurrency is an important aspect, but having a correct foundation is imperative.

@waj

Member

waj commented Oct 26, 2015

@ozra we had a similar idea. We even named it internally the "reactive compiler", because it would react to changes and rebuild the minimal difference. The problem with the current design is that the type inference is very tightly coupled to the order of parsing. Another goal of the new design is to detach the semantics as much as possible from the order of declarations. Then the information used for caching could be stored on disk and reloaded each time, or just live in memory within the compiler daemon for the project.

@samis

samis commented Oct 26, 2015

I find @ozra's idea of a compiler daemon interesting, but I disagree with the minimum requirement. I do agree that the issue raised by @asterite with the code example is an issue, but I'm not sure how important of an issue it is.

@waterlink

Contributor

waterlink commented Oct 26, 2015

My 2 coins into this discussion:

What about a formatting (or other, like infer) tool that tries to detect types automatically and inserts annotations into the code? It would only do so for non-annotated instance variables; if you want to change the type or add a type to a union, you would have to do it manually. I think this tool would cover at least 90% of the cases and save writing a lot of annotations by hand.

WDYT?


@asterite

Contributor

asterite commented Oct 27, 2015

I will try to explain why it's impossible to cache a previous compilation's result without fixing the type of instance variables. Take this code:

class Foo
  def initialize
    @x = 1
  end

  def x=(@x)
  end
end

def method(foo)
  foo.x
end

foo = Foo.new
method(foo)

The current compiler (after analyzing the whole program) will type Foo's @x as Int32, and create an instance of method like this (think of all methods as C++ templates):

def method(foo :: Foo) :: Int32
  foo.x
end

I use :: to mean "the type of the instantiated method" instead of "a type restriction".

Let's try to cache this method somehow. We can see it depends on Foo, and on the call Foo#x. If none of these things change, we (supposedly) could reuse the previous compilation's result (meaning: no need to type that method; we can reuse its type information and also the generated LLVM code for it).

In turn, Foo#x depends on the type of Foo's @x: if it changes, we need to recompute method's instantiation.

Now we change the program to this:

# This is the same as before except some added lines at the end
class Foo
  def initialize
    @x = 1
  end

  def x=(@x)
  end
end

def method(foo)
  foo.x
end

foo = Foo.new
method(foo)

# Now comes the added code

def problem(foo)
  foo.x = 'a'
end

problem(foo)

Can we still use the previous compilation's result? Let's see: did Foo#x change? Its source code didn't change. Let's see if Foo's @x changed. And here's the problem: we don't know unless we type the whole program again (we have to start from scratch, by first typing foo = Foo.new, then method(foo), then problem(foo), and find that @x becomes a union, but by the time we get to that point it's already too late). If the type of Foo's @x is written down (or inferred syntactically, without having to do semantic analysis) then the compiler can know whether the method can be reused without analyzing the whole program again, just by looking at which method sources changed, or which types changed (it gets a lot trickier with macros, but we have some ideas).

By knowing these types beforehand the compiler can analyze method instantiations without having to do whole program analysis.

But if you propose a way to do it without these type annotations, we'll definitely keep doing it the way we do right now. But... it seems not to be possible. I don't consider the above a mathematical proof, but I tried to be more or less formal :-)
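The reuse check described above can be sketched roughly like this (a hypothetical illustration, not the planned implementation): a cached method instantiation is reusable only if its own source and the declared instance variable types it depends on are unchanged:

```crystal
# Hypothetical sketch of the cache-reuse decision
record CachedMethod,
  source_hash : UInt64,
  ivar_type_deps : Hash(String, String) # e.g. {"Foo@x" => "Int32"}

def reusable?(cached : CachedMethod,
              current_source_hash : UInt64,
              declared_types : Hash(String, String))
  return false unless cached.source_hash == current_source_hash
  # With declared types, this lookup needs no whole-program analysis
  cached.ivar_type_deps.all? do |ivar, type|
    declared_types[ivar]? == type
  end
end
```

The point is that `declared_types` can be read straight from the source, without typing the whole program first.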

@waterlink

Contributor

waterlink commented Oct 27, 2015

Has anybody looked at how type inference works in GHC (the Glasgow Haskell Compiler)? AFAIK they have full type inference, and they have a REPL and incremental compilation. I might take a closer look later. Maybe it is possible for them because the type system is based on lambda calculus, which has some neat properties that allow doing that, IDK.


@sdogruyol

Member

sdogruyol commented Oct 27, 2015

I also think that current compile times are really OK (excluding --release).

For me the main point of Crystal is:

  • Statically type-checked but without having to specify the type of variables or method arguments.

If we have more required type annotations (we already have them for generics, and that's somehow OK), that'd be a real turn-off for me and for other type-disliking developers.

@ozra

Contributor

ozra commented Oct 27, 2015

@sdogruyol - with release, the bulk of the time is spent in the LLVM optimizer, and we can't make that magically go away, but I think we're both ok with that :)

@waj - that's really interesting to hear! The vague notion in my head was that every AST node should have a flag for which passes have been performed / are needed, and for the entire transformation and typing to be gradual. This would require a bunch of passes over the tree (the idea was to also support "two-way inference") until all nodes are flagged as final. And as soon as code is changed - their references to sources must be compared to see which are dropped, what new nodes come in, which taint others (the dependency graph for type inference would likely have to be extended beyond just that, such as dependency on source file [line diffing might be taking it a bit far] [still - this is already available via name/line/col]), and re-flag for inference, etc. And once again, these are just the ideas bubbling in the back of my head - and as always, when you start to examine the details - there are always devils you didn't want to think about ;-)

Even though I'm all for typing data structures (aka members of classes), and there are strong indications from studies that typed code is both more stable and faster to develop in the long run - I know many want the vibe of "never having to type", and I guess it's also a great selling point - even though it's not the greatest pattern in a big application. But catering to everything from one-off solutions and script-style code all the way to full-size apps is neat. And in the prototypal stage types can really "get in the way", even though they're a must-have in the more complex scenarios.

So I totally understand why many urge for a solution that keeps the possibility of "complete inference" (that sounded a bit strange).

So, usage wise, it could be seen from two perspectives:

  • Big complex app - it's good to use typing no matter what
  • Prototypal / one off / scriptish app - it's very nice to not have to type

But then we come to the technical aspect, which @asterite has elaborated on:

  • There are confusing cases where the compiler (currently) has a hard time figuring out the type (or can't figure it out at all)
    • this can probably be solved by selectively making the compiler less lazy (which requires changing the compiler into more decoupled ordering phases - planned anyway)
  • There are confusing cases where the compiler can figure it out - but it takes tiiiime, requiring it to look through the whole program each time
    • for those who look at their program and say "shrug - compile speed is ok..." - realize that the slowdown when your application grows is far more than linear! And all applications grow. So, as @asterite says - it really must be handled if Crystal is not to be placed in a one-trick pony corner.
    • As also pondered by many here - there just might be a better way to do it!?

The reactive compiler concept (I use the concept name of @waj and @asterite from here on) is really cool. I implemented something akin to it for pre-JS some years ago - it's not fully comparable of course. But the idea is:

  • Iff it is possible to make the AST mutable ("subtracting" and "adding") by diffing previous vs. modified source (associated with the nodes), the reactive-compiler approach is preferred as the primary solution - completely ditching file-caching notions.
  • If other projects are compiled, and the daemon is starting to get heavy memory wise - it dumps the most stale project(s) from cache - they will have to be recompiled from scratch next time "crystal..." is called in their dir.
  • No file-caching is attempted at all, it's the iterating / turn over compilations we want to make faster, and then the state is cached in the daemon. When jumping between a bunch of projects, or leaving it for a while there may be a flush, depending on machine memory. This is not the whole world (you get current compilation speed - which still is acceptable, perhaps a bit slower given some additional house keeping for the incremental tree).

All this being said - it is quite an enormous endeavour.

As it is now, lexing and parsing take basically no time, type inference takes a great chunk, and then, surprisingly, figuring out which IR has to be compiled to objects takes quite some time. And then a great chunk is spent by LLVM of course (a significant chunk of time with --release, of course - but that's an exception).

I spent an entire evening trying to optimize that simple last phase in the compiler, with all kinds of models of CRC'ing/hashing instead of generating LLVM-bitcode-encoded files for comparison, etc., but merely got the time down insignificantly, finally deciding not to waste more time on it while dreaming about the scenario of the reactive compiler. A further pro of the reactive model is that not even IR would have to be regenerated until the branch is "dirty" - iff one manages to correlate 'source code -> dependent nodes -> (other dependent nodes) -> LLVM IR'. Well, it's a big if...

I've also had the run-macros in the back of my head - in the above-mentioned, hypothetical solution, they could be wrapped in boilerplate code making them (distinct) daemons too. This way macros, as long as they're unchanged, could run as daemons receiving and returning mutated code through nanomsg, or any messaging system/model preferred, thereby reducing compilation time when they're used as well, without ever having to waste the time implementing an interpreter for the entire language (AND allowing C libs like BigInt/whatever in run-macros, which is not possible in Nim for instance - where a considerable amount of resources also goes into maintaining the interpreter part of the compiler). Furthermore, taking the concept further, compilation could ultimately be distributed!!

Well - this was a mouthful.

[ed: fixed typo]

@mverzilli

Member

mverzilli commented Oct 27, 2015

@waterlink Haskell forces you to be really explicit with the types of data structures, that's probably one of the reasons they got all those features (plus 30 years of research and development, of course :)).

So in a way what the guys here are proposing is closer to Haskell than the current language specs.

@asterite

Contributor

asterite commented Oct 27, 2015

@waterlink What @mverzilli says is true. Here's a page that shows that.

data Shape = Circle Float Float Float | Rectangle Float Float Float Float  

Then you don't have to specify types in functions, but that's also how our language will work :-)

Putting the compile times aside, there's another thing that will improve: memory usage. Right now, compiling the compiler (40k+ lines of code) takes 930MB of memory (and it's not LLVM's fault: the type inference phase alone reaches that memory). The problem is that because everything needs to be interconnected at all times (because types might change), we can't free that big mesh. So the bigger the program, the more memory it will consume. More memory also means more time spent in the GC traversing all that non-freeable structure.

Imagine a bigger program (at work we have a Rails app with 200k+ lines of code, and that's excluding the gems it uses). If we just multiply the current times we get 50s to compile the app and 5GB of memory. Doesn't look very nice.

With the new approach once we type a method we can discard all that interconnection because types can no longer change.

Back to the reactive compiler, I think with the new approach we could make one. With the current approach a background process consuming gigs of ram doesn't sound very nice or useful.

@ozra

Contributor

ozra commented Oct 27, 2015

One idea for inference cases like #1809 is of course that if a type is never instantiated - it could be dropped from all type unions, since it will not affect the program, and hence its "could-be" extension of method types is a non-issue. That's just one part of it all, though.
Trying to find a way for the "no types" crowd :)

@asterite

Contributor

asterite commented Oct 27, 2015

@ozra The problem is: how do you find out that a type is never instantiated before starting to type the whole program?

@kirbyfan64

Contributor

kirbyfan64 commented Oct 27, 2015

WHAT IF...

the type of a member variable could not change from what it's given in the initialize function? That way, in most cases, you still wouldn't need to specify the type. BUT the right-hand side of an assignment wouldn't be restricted to simple stuff. Mypy does this.

I think there are other ways to achieve speed. Again referencing Mypy, it type-checks itself rather quickly, and it's written in Python, which is definitely NOT the fastest language on earth!

@bcardiff

Member

bcardiff commented Oct 27, 2015

I like to keep the type annotations to a minimum. For a while I thought the compiler service might overcome the growing compilation time, but @asterite already exposed some concerns about why it might not be feasible, at least how things are right now.

A proposal was made for somehow using the existing compiler to make an initial guess for type inference, keep that guess and use it as a hint for a faster compiler phase. But discarding outdated assumptions due to changes in the code would require pretty much the same memory and even a bit more calculation to get it right.

So instead of thinking that all instance variables will need to be annotated, which sounds annoying and not very tempting for sure, I now see this proposal as:

  1. A class's type is determined from its own code, not its usage.

  2. The code inside the class is used to determine the types of ivars, using unions etc.

    2.1 I'm not sure why only the initializer is used rather than all the methods.

    2.2 Initially, as it was explained, only literal assignments, restricted-variable assignments and Class.new calls (with a non-overridden new) will be used.

  3. If there is no type information inferred, then the programmer will need to disambiguate.

I think more rules will probably appear and fewer annotations will be required. But it's a nice starting point to shift to modular compilation, IMO.

The compiler nowadays needs to infer types for all ivars, despite the fact that we don't make them explicit. We will be losing some cases initially, but not all, not a lot.

Things like

class Person
    property name
    property age
end

won't work as they do right now. But that's mainly because the class itself has no behavior other than being a bag of data. It's less of a prototyping experience without that kind of thing, but I would rather lose that than see Crystal as a toy language.
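
Under those rules, a bag-of-data class like Person would need explicit annotations, since nothing in its body reveals the types. A sketch, assuming the ivar-annotation syntax that was later adopted (the names are just illustrative):

```crystal
class Person
  # Nothing in the class body reveals these types,
  # so they must be spelled out:
  property name : String
  property age : Int32

  def initialize(@name : String, @age : Int32)
  end
end

# By contrast, a class whose initialize assigns literals
# still gets its ivar types inferred:
class Counter
  def initialize
    @count = 0 # inferred as Int32
  end
end
```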

@oprypin

Contributor

oprypin commented Oct 27, 2015

Array(Object) has me sold. "using any type as a generic argument" is very important and I would like to see it work everywhere like I expect it.

@ozra

Contributor

ozra commented Oct 27, 2015

@asterite - that idea was based on the assumption of a rewrite, just with different goals, where inference can be handled gradually, solved as needed information becomes available from other nodes, not in a "monolithic" type inference phase. My explanation is bad, there's probably a good term for it, but I don't know it B-)

In any event - I will be perfectly happy with the proposed model.

@asterite

Contributor

asterite commented Jan 17, 2016

An update on this: since the new compiler will take a while to develop, especially because the main algorithm changes and we plan to simplify a lot of things, for now we'll try to introduce these changes in the current compiler (for example, we just introduced a first pass to declare types, which already fixes several bugs). These might not produce a big performance improvement, but once we have all code working under these new rules we won't break any code when we introduce the new optimized compiler. In short, the sooner we finish defining the language and stop breaking backwards compatibility, the better.

Also, we'll continue working on the concurrency model in parallel (no pun intended ^_^), with the goal of running fibers in multiple threads, and maybe allowing user code to create threads at will.

@ylluminate

ylluminate commented Jan 26, 2016

@asterite what's the current condition of concurrent distributed application development with Crystal? I have found myself quite enamored with Julia recently, but am missing Crystal.

@theduke

theduke commented Jan 29, 2016

I recently discovered Crystal, and just as a general remark:

BIG support for this.
The point of using a compiled, type-safe language is not only speed, but the many compile-time guarantees that you lack in a dynamic language.

The examples in the documentation where a class instance variable ends up having potentially multiple types (tuple(A, B)) immediately turned me off. Severely.

I could just use Ruby/Python for that; almost all apps can live perfectly well with the performance, and just substitute C extensions when necessary.

This will make Crystal considerably more appealing to me.

Also: +1 for the 'interface' concept.

@Perelandric

Perelandric commented Jan 29, 2016

I just want to reiterate my support for this.

When I first learned about Crystal about a year ago, I read the docs and liked all of it until it came to the dynamic class members, at which point I dismissed it. Only came back because I was stuck on Nim bugs I couldn't work around.

Crystal is great and I think will have a much broader appeal if it's tightened up just a little bit. Until the new compiler is complete, I'm using Go, which can be aggravating. I've resorted to building code generators to make the language a little more tolerable.

Looking forward to writing most of my code in Crystal and Pony! 😃

@theduke

theduke commented Jan 29, 2016

Haha, @Perelandric:

I'm in a very similar spot.
Used Go for half a year but so many limitations (no generics, no proper inheritance or composition, no enums, just to name a few) made me look for something else.

First stumbled upon Nim too, which is a conceptually AMAZING language, but it's so full of bugs in both the language proper and the stdlib, and badly maintained (not the maintainers fault, it's a huge language and they would just need more people or narrow the scope of the language and the stdlib), that it's not usable for any serious work.

And so I stumbled upon Crystal... got high hopes, we'll see.

Sorry for the OT.

@Perelandric

Perelandric commented Jan 29, 2016

@theduke lol, yep sounds like we're in the same boat!

I especially think proselytizing the Go community will pay off for Crystal big time. Go still has to preach its lack of features as some sort of zen, minimalist philosophy. Works to some degree, but in the end, we just need to get some work done!

@jreinert

Contributor

jreinert commented Feb 23, 2016

@asterite very much in favor of this. Would this make writing and using shared, linked crystal libraries possible? If not, is there any other way to make that possible? Maybe by having something similar to lib but the other way around, to expose methods of your library to the global namespace and reduce crystal types to primitive ones.

@asterite

Contributor

asterite commented Feb 24, 2016

@jreinert Maybe yes, maybe no, the changes don't have that as a goal, but doing it becomes easier, just not trivial.

@dsounded

dsounded commented Apr 12, 2016

I just discovered and started to contribute to the community recently, so I understand my opinion doesn't weigh much, but I beg you to gather data from the community at large about this topic, as it might be a deal breaker for an important number of us (or not)

+ 1_000_000_000

@asterite

Contributor

asterite commented Apr 12, 2016

@elthariel @RX14 @dsounded

As we'll soon merge #2443 and this will finally land, I want to write a few things about why I think this is not such a deal-breaker as many of you think.

First, right now you can't write [] nor {}: the compiler will complain. You have to write [] of Int32, {} of String => Int32, or Set(Int32).new. This is not how you would write it in Ruby; you have to be explicit about types in these cases, yet I don't see anyone complaining about this. Maybe it's because this behaviour was present from the beginning (well, in the first three months of development it wasn't like that, but the language wasn't publicly known), so the behaviour is not annoying or surprising. There's also the fact that you need to write ->(x : Int32) { x + 1 } (you can't just write ->(x) { x + 1 }). And if you capture a block you also need to specify types: def foo(&block : Int32 -> Int32).
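
Collected into one sketch, these already-required explicit types look like this (the method name foo is just illustrative):

```crystal
require "set"

ints = [] of Int32           # an empty array literal needs an element type
map  = {} of String => Int32 # so does an empty hash literal
set  = Set(Int32).new

inc = ->(x : Int32) { x + 1 } # proc literal arguments must be typed

# Capturing a block also requires a type restriction:
def foo(&block : Int32 -> Int32)
  block.call(41)
end
```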

Now we are making a change where you need to specify the types of instance variables, and in most cases, as shown in the diffs I mention here the compiler will be able to guess a correct type from all things assigned to them. This almost always will boil down to adding type restrictions in initialize methods.

If you do some stats you will realize that the number of array and hash literals, proc literals and block arguments where you need an explicit type is much bigger than the number of initialize methods you will have in a program, so this change is really a minor one. If the language worked like this from the beginning I don't think there would have been many complaints.

There's a different problem, though: "Yeah, right, now we need global/class/instance vars to have an explicit or easily guessable type, why shouldn't I think that in the future I will need to add type annotations in every method argument and return type?". For starters, if the language changes like that, I will stop using it because then yes, the language would be much more painful to use. So, required types in method arguments and return types will never happen. I'll explain why I know this is true:

Right now the compiler has to have in memory all the code and the analysis it's making, and can't discard intermediate results, because some assignment to a global/class/instance variable might change its type and that can affect a totally unrelated method that uses those vars. In short, the compiler has a tangle of all the code (through bindings) and can't release it. With this change that won't happen anymore: once you analyze a method, its return type can't change, because nothing external to the method (the arguments it uses, the instance vars, other methods, etc.) can change type anymore. So, we can analyze one method and then free the data structures (bindings) needed for that analysis. I believe this will dramatically improve, at least, memory usage (right now to compile the compiler you need at least 1GB of RAM, which is way too much... imagine a bigger program), and will allow us to do incremental compilation in the future.

But note that the compiler doesn't need to know upfront the types of method arguments and return types, because once they are inferred they can't change. This isn't true if global/class/instance vars types aren't determined upfront.
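
A minimal sketch of this asymmetry - ivar types fixed upfront, method argument and return types still inferred (the class and names are purely illustrative):

```crystal
class Greeter
  def initialize
    @name = "world" # ivar type determined upfront: String
  end

  # No annotations needed: once inferred for a given call,
  # the argument and return types can no longer change.
  def greeting(prefix)
    "#{prefix}, #{@name}!"
  end
end

puts Greeter.new.greeting("Hello")
```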

Does this change mean the language becomes some other language, maybe Java, Go, C#, Nim, D, Rust, OCaml, Swift or Pony? I don't think so. There are many interesting things in Crystal:

  • No need to be super explicit about class/global/instance vars in some cases. If you write class Counter; def initialize; @count = 0; end; end; it's understood that @count is Int32, you don't have to write @count : Int32.
  • No need to write the type of method arguments and return types: this makes it easy to extract pieces of a method to another method, and it's also super useful for private methods where types aren't even needed for API-level users.
  • Open classes: you can add and redefine methods, making things like webmock.cr and timecop.cr possible
  • Multiple dispatch (this is different than just method overloading)
  • Inlined blocks and captured blocks, allowing constructs to be added to the language that feel like regular language constructs (like 5.times { |i| puts i })
  • Type flow analysis, meaning that if you do if var, if var.nil?, if !var, if var.is_a?(T), the variable's type is narrowed inside the then and else branches (I don't know if another language does this. I know in Swift, for example, you have to use a second variable). You can also do early returns like return unless x (in Swift you have to use guards). Coupled with !, && and ||, it makes the language feel like it understands what you mean without having to add additional syntax.
  • Array, Hash, Range and Regex literals (and I find the $~ variable pretty useful for text processing)
  • Structs and tuples (for example Java doesn't have them)
  • Everything looks like an object (you can reopen Int32 and add methods to it)
  • Macros (I think they are simpler and easier to use and understand than in other languages, although probably less powerful)
  • Type safety, and null doesn't implicitly belong to all reference types (as is the case in Java, C# and Go).
  • Batteries-included standard library, including non-blocking IO, CSV/JSON/YAML/XML, HTTP::Client and Server, OAuth, a simple spec library, etc. This is something that's not directly related to the language's syntax and semantic but I do think it's important as it makes all programs nicely interact with each other using common types and idioms.
  • Compile to native code, inline assembly, access to C libraries, raw pointers, etc., so it's both a low-level and high-level language.
  • Ruby-like syntax. Yes, this is something worth mentioning. We love Ruby and how its code looks, and we think this is something really nice to have in a language, especially if many constructs translate directly (like return unless x, if x.is_a?(...), etc.).
  • A bootstrapped compiler, so you can contribute to it if you know how to program in Crystal (no need to code in C/C++)
  • (and probably other things I forget right now)
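
As a small illustration of the flow-analysis point from the list above (a hypothetical method, using String? as shorthand for String | Nil):

```crystal
def shout(s : String?)
  return "" unless s # s is String | Nil before this line
  s.upcase           # here s is narrowed to String, so .upcase is safe
end

shout("hi") # => "HI"
shout(nil)  # => ""
```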

I believe the language will still make sense after this change 😊

@kirbyfan64

Contributor

kirbyfan64 commented Apr 12, 2016

👍

@chocolateboy

Contributor

chocolateboy commented Apr 12, 2016

+1 for this change.

I don't know if another language does this.

Kotlin also does it :-)

@asterite

Contributor

asterite commented Apr 12, 2016

@chocolateboy Thanks for mentioning it. Kotlin is a really nice language that borrows many nice ideas from Ruby too. I think they now have something like blocks where if you return from them you actually return from the method, like in Ruby.

@dsounded

dsounded commented Apr 12, 2016

@asterite open classes are good and bad at the same time, but that's not about this topic. I think static type checking is OK for interfaces, but when you write a class that you use only for yourself and it has no connection to a public interface, static type checking is excessive. I do appreciate your job, fellas; Crystal is the only thing that I have really liked after Ruby. initialize is just another method - the "keyword" method, probably - but still a method. It will be really strange for me; I'd like to see one of two ways: the way it was, or static type checking all around the language (we would need to rewrite a lot), but that would be uniform.

Anyway, it's your decision, and as I've seen you won't change it. All of you do an extremely good job; it's just a situation where we've got different points of view. Basically, IMO you should have done a poll for this.

@elthariel

Contributor

elthariel commented Apr 12, 2016

Hi @asterite and others,

I've been giving this frequent thought since this conversation started.

Because of this change and the uncertainty over the future of the language, I finally decided to write my project in C++. After a few months using this beast again, which I hadn't practiced much for a few years, as well as toying with other new languages, I realized there's still a lot of value in Crystal even with this change (C++ template instantiation traces have this effect on people, I guess).

In the end, this change solves a real problem and consolidates the type declarations in one place, which will improve code readability as well as enabling modular compilation, amongst other things.

However, despite seeing the benefits, I'm still not a big fan of these changes and disagree with some parts of the definition of the problem. Let it be stated that working in Facebook's codebase for more than a year gave me a different perspective on software development, and that this point of view might not be ideal for smaller companies or projects with limited resources.

  • Compilation is using 1GB of RAM: I don't care. My computer is here to make my life easier. I'll add more RAM if I need to, if the value I get out of it is worth it. Last time I checked, Chrome could only be compiled on a 64-bit machine as they need more than 4GB of RAM. IIRC, they build/link everything in one call to allow for further optimizations.
  • I'm wondering if there wasn't another approach to this problem using a daemon that would perform the type inference in the background while you write code and/or cache the results of the type inference. More and more tools at Facebook/Google (and I assume other giants) are built using this design. I think it'll grow beyond these companies.
  • Finally, and I don't remember if I already talked about this, but I would have preferred to first think about what a module is for Crystal, and then enforce specifying types at module boundaries:
    ** You want to have a global: if you want it to be accessed outside the module, mark it as exported or something, then define its type.
    ** You have a class that is exported out of the module: define its public prototype clearly.
    ** You're doing magic that is internal to your module: Crystal will take the time to infer the types.

That being said, I also appreciate that Crystal is not an R&D-only project and that you aim at production usage. This obviously gives you less time for intellectual masturbation and makes you focus on getting shit done, which is another argument in favor of this change.

As a conclusion, I'm probably going to keep thinking about other approaches to this, but I'll definitely give it another spin once this change lands.


@kirbyfan64

Contributor

kirbyfan64 commented Apr 12, 2016

@elthariel I personally feel it's also about faster compile times, which are always important!

@elthariel

Contributor

elthariel commented Apr 12, 2016

Good caching also helps with this.


@kirbyfan64

Contributor

kirbyfan64 commented Apr 12, 2016

Which was one of @asterite's initial points; it's easier to maintain a cache with the types of instance variables known ahead of time.

@asterite

Contributor

asterite commented Apr 12, 2016

We actually thought about a daemon and caching stuff from the previous compilation, but with the current global type inference it's really hard to do and we weren't sure it would be efficient in the end. With the direction we're currently going, code becomes much easier to understand, but the compiler also becomes simpler to implement, which might get more people into developing it - fixing bugs, optimizing it or implementing new features. Getting into a daemon compiler, computing AST deltas and caching stuff is definitely harder to do and understand (well, we might do caching and deltas, but it'll be much easier this way).

I also feel (and see, based on the diffs) that this isn't the big change I was fearing, and it wouldn't have bothered me at all to specify all those types in the initialize methods if it had been like that from the beginning. Remember, it's usually one, or at most a few, initialize methods that will need type annotations, compared to the many other methods a class has. For example, Set has only one initialize in 337 lines of code, and in this case we didn't have to explicitly specify a type. Another beast is the compiler's Lexer, which actually needs just two type annotations (the others there are mostly informative), in a file that has 2473 lines of code.

And some error messages will improve and be easier to understand, because a type won't be able to be instantiated with incorrect arguments. And the compiler will (probably) be faster and consume less memory.

So yes, we had many options to choose, but we think that this is the best of them.

@asterite

Contributor

asterite commented Apr 13, 2016

I'm closing this, as all the logic related to this task is already present in master. The compiler's code will need a cleanup to get rid of old code that is no longer necessary (or at least not necessary in the way it currently is), and of course performance improvements could be applied, but that's a separate issue.

@asterite asterite closed this Apr 13, 2016

@mrkaspa

mrkaspa commented Apr 21, 2016

I would rather specify types in method signatures; the compilation will be faster and the types work as documentation. "Duck typing" could be handled with something similar to Go interfaces, or by allowing abstract methods to be defined in a module, similar to Scala traits.

@pannous

pannous commented Oct 15, 2016

Shall we open a new issue to track the progress of the very important incremental build feature?

@asterite

Contributor

asterite commented Oct 15, 2016

@pannous If you want yes. However, I don't think it will happen anytime soon as it's a really huge task and there are more important things right now (parallelism, std)
