The next step #1824

Closed
asterite opened this Issue Oct 26, 2015 · 139 comments

@asterite
Member

An always present worry that we have with the language is that it won't be useful for large projects. Since the global type inference algorithm always has to start from scratch, it's good if we find a way to do incremental or partial compilation: some way to reuse a previous compilation's result for the next one. That means that the first time you checkout a project it might take a few seconds to compile, but the next compiles should be fast, given small changes are made to the codebase.

We sat down and thought how it can be done with the current language, and we think (or at least we are strongly convinced) that there's no efficient way to do this. The language might need a change.

We thought of the minimum delta that could make this work. The conclusion is not a happy one, but it's also not the worst one could imagine:

When declaring a type (class, struct) you have to specify its instance variables and their types. The same applies to class variables and global variables.
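For example, under this rule a type's state might be declared up front like so (a hypothetical sketch using the same `::` declaration syntax as the examples below; the names are made up):

```crystal
class Logger
  # instance variable types declared explicitly
  @io :: IO
  @level :: Int32

  # class variables follow the same rule
  @@instances :: Int32

  def initialize(@io, @level)
  end
end
```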

After some more brainstorming we think that the above rule can be relaxed in some cases: the compiler will try, for example, to infer instance variable types from the expressions assigned to them in the initialize methods, if those expressions are simple (literals and `new` calls). An example of where this simplification can be applied is the CSV::Lexer class from the standard library:

# Current code will compile just fine
class CSV::Lexer
  def initialize
    @token = Token.new         # inferred to be Token
    @buffer = MemoryIO.new     # inferred to be MemoryIO
    @column_number = 1         # inferred to be Int32
    @line_number = 1           # inferred to be Int32
    @last_empty_column = false # inferred to be Bool
  end
end

Another rule: a type restriction on an initialize argument also determines the instance variable's type. For example:

class CSV::Builder
  # @io inferred to be an IO
  def initialize(@io : IO)
    @first_cell_in_row = true # inferred to be Bool
  end
end

But note that if in the above cases an instance variable is assigned another type in other methods, it will immediately be an error: if you want a union, define the instance variable type as such.
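For instance (a hypothetical sketch of the proposed behavior):

```crystal
class Box
  def initialize
    @value = 1 # inferred to be Int32
  end

  def reset
    @value = nil # error under the proposal: @value is Int32, not nilable
  end
end

# If a union is wanted, declare it explicitly:
class NilableBox
  @value :: Int32?

  def initialize
    @value = 1
  end

  def reset
    @value = nil # OK: @value is Int32 | Nil
  end
end
```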

In other cases you'll inevitably have to write down the types:

class Markdown::Parser
  # Needed because the expression assigned to it in
  # initialize is not simple (it involves calls other than `new`)
  @lines :: Array(String)

  # @renderer is inferred to be a Markdown::Renderer
  def initialize(text, @renderer : Markdown::Renderer)
    @lines = text.lines.map &.chomp
    @line = 0 # inferred to be an Int32
  end
end

The current compiler works by taking into account all assignments to instance variables in the whole program. This has many (bad) consequences:

  • Because an instance variable's type might change by mistake, the errors that follow are hard to digest and hard to track down. Also, the rule "if it's not assigned in initialize then it's nilable" is not very intuitive and, again, can be triggered by mistake. With a clear rule about what the types of instance variables are, these type-changing errors should happen less often, or at least be easier to understand. Also: accessing a non-declared instance variable will immediately give an error, instead of propagating a nil value.
  • Because their type might change at any point in the program, they can affect the type of any method that uses them. This means that local type inference for a method can't be done: you always have to take into account the possibility of an instance variable changing after that method has been typed. If the compiler knows the types of all "global" data (instance and class variables, and global variables) local type inference becomes possible, leading to a faster compiler but most importantly allowing the possibility of reusing a previous compilation's result (for example by caching the "dependencies" (types, other methods) of each method).
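As an illustration of the first point, under the current rules a simple typo silently introduces a new nilable instance variable instead of producing an error (hypothetical example):

```crystal
class Greeter
  def initialize
    @name = "world"
  end

  def rename(name)
    @nmae = name # typo: silently creates a *new* nilable ivar, no error
  end

  def greet
    "Hello, #{@name}"
  end
end
```

Under the proposal, the assignment to the undeclared `@nmae` would be rejected immediately.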

The second point is the most important. If we want Crystal to be used for real, large projects, it's paramount that the compiler can handle such projects and that you don't have to wait 30 or 60 seconds between compiles. We prefer making this change now rather than later getting stuck with a language that is only useful for "mostly toy" projects.

Because this change has a huge impact on the overall algorithm of the compiler, we decided to take this opportunity to also rewrite the compiler "from scratch". This means: we'll rewrite it taking into account the current code and specs, but we won't reuse most of the code we have (well, the lexer and parser will be reused, the formatter, as well as some other bits). The current code, even though it's kind of efficient and more or less organized, has some flaws that are hard to fix without a rewrite (for example, we'd like to make an initial pass to gather all types so that we know which types have subclasses and so are "virtual"). And we'll do it taking into account the current bugs (which are about 45 and are more or less related to the same compiler/language flaws).

The good news is that we'll make sure that this new compiler is thoroughly commented and documented: you'll be able to jump right into its source code and all phases and algorithms will be explained (promise!). Right now that is far from being the case :-)

Let's summarise the cons and pros of this change:

Pros:

  • Faster compile times (even without reusing a previous compilation's result), and incremental compilation becomes possible.
  • Better error messages when assigning a wrong type to an instance variable, or when reading a non-existent one.
  • No need for a human to infer which instance variables are nilable: if an instance variable isn't assigned in initialize, a type annotation will state its type explicitly.
  • We have some ideas on how to make Array(Object), Array(Reference) and, in general, using any type as a generic argument possible. You'll be able to have an Array(Object) and invoke to_s on each of its elements, for example. Because the compiler will know all types before the "main" code, virtual method lookups (which involve searching a type's subclasses recursively) will be faster and can be cached, because the type hierarchy will be fixed.

Cons:

  • It might sometimes be tedious to write these types. However, when you program, most of the time is spent defining methods and using types, not defining the types of their instance variables. And you always know their type. A downside is that this might not go well with duck typing, and you might have to define a module to act as an interface. But you can turn that change to your benefit and define abstract methods on that module that tell you what has to be implemented (or you can leave the module empty: both approaches will work). Also: you'll be able to reopen a type and redefine the type of an instance variable, so mocks are still possible.
  • The promise of never having to write types down becomes false. However, this is already false: you have to do it for generic type arguments. And sometimes you do it in method arguments (although that's mostly for overloads, it can also lead to better error messages and better documented code). And with the relaxation rules this won't have to be done all the time.
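The module-as-interface workaround mentioned above might look like this (a hypothetical sketch; the names are made up):

```crystal
# An interface-like module with abstract methods documenting
# what implementors must provide
module Renderer
  abstract def render(text)
end

class HtmlRenderer
  include Renderer

  def render(text)
    "<p>#{text}</p>"
  end
end

class Document
  # the ivar is typed with the module, so any includer works
  @renderer :: Renderer

  def initialize(@renderer)
  end
end
```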

Remember that types won't be mandatory in method arguments or return types, so duck typing still holds. This method:

def add(x, y)
  x + y
end

has no type annotations and works on any type that defines a #+(other) method, so we believe Crystal is still unique and doesn't suddenly become just another existing language.

Because the rewrite will take some time (and also because this next month we'll be busy with some specific work/life matters) you'll have to be patient. In the meantime the standard library can be grown and existing issues will be fixed unless they can only be fixed in this new compiler rewrite (some examples of issues we'll fix in the new compiler are #456, #718, #729, #846, #867, #916, #941, #962, #1346). After the rewrite you can use crystal tool hierarchy to port your code to the new version, or we might provide a migration tool.

@asterite asterite added the task label Oct 26, 2015
@jhass
Member
jhass commented Oct 26, 2015

I think that compilation times are still great and in no way a pain point of the current language. A far greater benefit and boost could be achieved by focusing on concurrency and parallelism idioms as well as the threaded scheduler. The longer that's postponed, the more standard library code will need to be checked for thread safety. I think the time you can allocate would be extremely well spent on that issue instead, and would benefit the current community tremendously more. I'm not sure I can find the words to emphasize this strongly enough, really. I'm truly convinced this is the wrong priority here, sorry.

But that's just my two cents, I can only give my opinion and don't dictate you anything of course.

@Perelandric

For my part... πŸ‘ and it has little to do with compile times. I'm a fan of having more explicit type requirements in general.

Because I know going in that Crystal isn't exactly the way I'd want it in some areas, it would be silly of me to complain about it. I like it as a language either way so I'm willing to go with the flow but this would be very welcome from my perspective.

@asterite
Member

I forgot to mention that this change isn't only to improve compile times. Consider this:

class Foo
  def foo
    1
  end
end

class Bar < Foo
  def initialize(@foo)
  end

  def foo
    @foo
  end
end

a = [] of Foo
a << Foo.new
puts typeof(a[0].foo) #=> ???

The compiler doesn't know the type of @foo, because Bar was never instantiated, so what's the type of the expression above? Well, right now it's kind of hacky, and it involves tracking which types were instantiated and recalculating types and methods when this happens. A real example of this issue is #1809. If you have to somehow specify the type of @foo, this problem disappears.
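Under the proposal, the snippet above would be rejected until Bar declares the type, e.g. (a sketch using the proposed declaration syntax):

```crystal
class Bar < Foo
  # with the type declared, typeof(a[0].foo) is Int32
  # even though Bar is never instantiated
  @foo :: Int32

  def initialize(@foo)
  end

  def foo
    @foo
  end
end
```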

So it's not just for a faster compiler: it's for a better, more complete and non-broken language.

@jhass As I said, it's not that we'll only be working on this new compiler: the standard library will continue to grow and we'll fix bugs as they appear.

@jhass
Member
jhass commented Oct 26, 2015

And as I said I'm convinced it's wrong prioritization at this point. I'd rather give this issue more time and act quickly on the concurrency issue, than acting on this quickly with "easy" solutions through more explicitness.

@waj
Contributor
waj commented Oct 26, 2015

I don't see concurrency as a problem right now. It's also a priority for us to have proper concurrency support for the language but the language is still usable. Also, this can be done in parallel, as I don't think adding concurrency support will affect most of the standard library code or even the compiler itself (probably not affected at all).

On the other hand, our concern right now is to make sure that the language can scale to projects of any size. We believe it is our responsibility to make that happen, so we don't hit a big wall in the future and leave this (big) growing community disappointed.

To be honest, at some point we even thought we could leave concurrency support for a future version and focus on making the language robust for 1.0. That's probably not gonna happen after all, but it shows how cross-cutting I think this problem is.

@luislavena
Contributor

@asterite in your example, I would rather see a compiler error telling me that I'm using @foo uninitialized than be forced to spell out the types of all my instance and class variables.

Perhaps a different approach can be taken? Incremental compilation/analysis? If I'm not mistaken, Rust has this in their RFCs (not yet prioritized, even though they are in the post-1.0 era):

https://github.com/nikomatsakis/rfcs/blob/incremental-compilation/text/0000-incremental-compilation.md

@asterite
Member

@luislavena You will get a compile-time error saying something like No type specified for @foo of Bar and to fix the problem you'll have to specify its type. How can you determine the type of Bar#foo without knowing the type of @foo? Or maybe I'm misunderstanding something.

Rust requires type annotations in types and methods, and that's why they'll be able to eventually do incremental compilation. We are aiming for the same goal and the minimum requirement is to have types (and their instance variables) well defined before we start typing the whole program.

@samis
samis commented Oct 26, 2015

I'm not a fan of more explicitness and an increased number of required type annotations. Is it not possible to let the user decide if they want to make the trade-off with regards to compile times? I would not like to see this language become a modified clone of some other statically/dynamically typed language with different paint (i.e. syntax). As for concurrency, I don't have an opinion about that.
(P.S. At this stage, might this be a premature optimization?)

@ysbaddaden
Member

I'm not a type person, so having to specify more types is daunting... at first. Yet, thinking about it, it may be a good thing to restrict the struct/class variable types as proposed here. I just hope we won't see more and more changes in the language to have to specify types more and more and more 😨

The current compiler is acceptably fast on a decent computer for small applications (eg: Shards), but I acknowledge it slows down quickly as the number of types grows toward a mid-size application (or with a heavyweight framework). If it could be lightning fast, and we could transparently recompile the app live on each file save, and this only requires specifying the type of hazy struct/class variables... then I guess I can live with it.

I'd love to see parallelism being worked on in parallel (pun intended) as @jhass suggested. But let's have another, dedicated, issue to discuss this.

@kirbyfan64
Contributor

I don't think this would help that much. IMO, Crystal still builds itself faster than a C++ application half its size builds, and is even faster than Nim, so there's not really much to complain about in that field.

@benoist
Contributor
benoist commented Oct 26, 2015

Would this also make it possible to create a REPL?

@asterite
Member

@samis There are many issues if we make this optional:

  • You include a shard in your project and your compile times slow down because the author didn't optimize it for compile times.
  • There's still the issue with uninstantiated classes I mentioned above. This change plans to solve all remaining issues with the language.

@ysbaddaden We are pretty sure this is the last change regarding types. After years of developing the compiler and writing code in Crystal, the major pain point is the lack of a fixed type for instance vars: methods sometimes can't be typed (as in the example above) and you need hacks here and there.

@kirbyfan64 Last time I compiled Nim it took 2.5s (or 5s?) to compile itself. That's including the codegen. We are at 10s on the same machine, but I think Nim's compiler is much more complex (for example it has a VM in it). We also don't want to use C++ as an excuse, such as in "Oh, yes, it takes 20s to compile, but I'm sure it takes a couple of minutes in C++" :-)

@benoist It will definitely make creating a REPL much easier (but still not trivial). It will also improve times for tools for IDEs, so we could have autocomplete and jump-to-definition tools that are fast.

I forgot to say that we'll develop this in parallel and it will be an experiment: if everything goes well and we see an improvement as a whole, we'll take the change. Otherwise we'll remain with the current language.

@kirbyfan64
Contributor

@asterite Huh? Nim caches C source files to avoid re-generating them. Try deleting nimcache.

I don't mind this as an experiment to see how it turns out, but I personally don't really like the idea. There have to be better ways to improve speed... :O

@ozra
Contributor
ozra commented Oct 26, 2015

I've been juggling around ideas in my head about incremental compilation in Crystal, and all my ideas have ended up with a compiler daemon that keeps the inferred AST in memory. When the compile command is used, it simply signals the daemon for the project dir; the daemon checks which files have changed, lexes, parses, and transforms their nodes, updates all nodes that require changes, and re-infers only what's needed. But that's just a dreamy thought - I haven't reasoned about the devilish details yet.

Typing data members sounds like a really good minimum requirement to me. Being sure about data structures' types is foundational imo - and not much work. The types visually go well there since it's already a more formal construct. And being relieved of typing methods is a must-have, which, it's nice to hear, is part of the planned rewrite.

The idea of compiling types in two steps sounds good. The Nim compiler has two such phases - though only operating on one "types-collection" at a time, which is a bit limiting imo.

I think prioritizing this before concurrency is good, since it's so fundamental to the progress of the language. Concurrency is an important aspect, but having a correct foundation is imperative.

@waj
Contributor
waj commented Oct 26, 2015

@ozra we had a similar idea. We even named it internally the "reactive compiler", because it would react to changes and rebuild the minimal difference. The problem with the current design is that type inference is tightly coupled to the order of parsing. Another goal of the new design is to detach the order of declarations as much as possible without affecting the semantics drastically. Then the information used for caching could be stored on disk and reloaded each time, or just live in memory within the compiler daemon for the project.

@samis
samis commented Oct 26, 2015

I find @ozra's idea of a compiler daemon interesting, but I disagree with the minimum requirement. I do agree that the issue raised by @asterite with the code example is an issue, but I'm not sure how important of an issue it is.

@waterlink
Contributor

My two cents in this discussion:

What about a formatting (or other, e.g. infer) tool that tries to detect types automatically and insert annotations into the code? It would do it only for non-annotated instance variables, and if you want to change the type or add a type to a union, you would have to do it manually. I think this tool would cover at least 90% of the cases and spare writing a lot of annotations.

WDYT?


@asterite
Member

I will try to explain why it's impossible to cache a previous compilation's result without fixing the type of instance variables. Take this code:

class Foo
  def initialize
    @x = 1
  end

  def x=(@x)
  end
end

def method(foo)
  foo.x
end

foo = Foo.new
method(foo)

The current compiler (after analyzing the whole program) will type Foo's @x as Int32, and create an instance of method like this (think of all methods as C++ templates):

def method(foo :: Foo) :: Int32
  foo.x
end

I use :: to mean "the type of the instantiated method" instead of "a type restriction".

Let's try to cache this method somehow. We can see it depends on Foo, and on the call Foo#x. If none of these things change we (supposedly) could reuse the previous compilation's result (meaning: no need to type that method, we can reuse its type information and also the generated LLVM code for it).

In turn, Foo#x depends on the type of Foo's @x: if it changes, we need to recompute method's instantiation.

Now we change the program to this:

# This is the same as before except some added lines at the end
class Foo
  def initialize
    @x = 1
  end

  def x=(@x)
  end
end

def method(foo)
  foo.x
end

foo = Foo.new
method(foo)

# Now comes the added code

def problem(foo)
  foo.x = 'a'
end

problem(foo)

Can we still use the previous compilation's result? Let's see: did Foo#x change? Its source code didn't change. Let's see if Foo's @x changed. And here's the problem: we don't know unless we type the whole program again (we have to start from scratch, by first typing foo = Foo.new, then method(foo), then problem(foo), and find that @x becomes a union, but by the time we get to that point it's already too late). If the type of Foo's @x is written down (or inferred syntactically, without having to do semantic analysis) then the compiler can know whether the method can be reused without analyzing the whole program again, just by checking which method sources changed, or which types changed (it gets a lot trickier with macros, but we have some ideas).

By knowing these types beforehand the compiler can analyze method instantiations without having to do whole program analysis.

But if you can propose a way to do it without these type annotations, we'll gladly keep doing it the way we are doing it right now. But... it doesn't seem possible. I don't consider the above a mathematical proof, but I tried to be more or less formal :-)

@waterlink
Contributor

Has anybody looked at how type inference works in GHC (Glasgow Haskell Compiler)? AFAIK they have full type inference, and they have a REPL and incremental compilation. I might take a closer look later. Maybe it's possible for them because the type system is based on lambda calculus, which has some neat properties that allow doing that, IDK.


@sdogruyol
Contributor

I also think that current compile times are really OK (excluding --release).

For me the main point of Crystal is:

  • Statically type-checked but without having to specify the type of variables or method arguments.

If we have more type checks (we already have them for generics, and that's somewhat OK), that'd be really off-putting for me and other type-disliking developers.

@ozra
Contributor
ozra commented Oct 27, 2015

@sdogruyol - with --release, the bulk of the time is spent in the LLVM optimizer, and we can't make that magically go away, but I think we're both ok with that :)

@waj - that's really interesting to hear! The vague notion in my head was that every AST node should carry a flag for which passes it has performed / still needs, and that the entire transformation and typing would be gradual. This would require a bunch of passes over the tree (the idea was to also support "two-way inference") until all nodes are flagged as final. And as soon as code changes, the nodes' references to sources must be compared to see which nodes are dropped, which new nodes come in, and which taint others (the dependency graph for type inference would likely have to be extended beyond just that, e.g. with a dependency on the source file [line diffing might be taking it a bit far] [still, this is already available via name/line/col]), then re-flagged for inference, etc. And once again, these are just ideas bubbling in the back of my head - and as always, when you start to examine the details, there are devils you didn't want to think about ;-)

Even though I'm all for typing data structures (i.e. the members of classes), and there are strong indications from studies that typed code is both more stable and faster to develop in the long run - I know many want the vibe of "never having to type", and I guess it's also a great selling point - even though it's not the greatest pattern in a big application. But catering to everything from one-off solutions and script-style code all the way to full-size apps is neat. And in the prototyping stage types can really "get in the way", even though they're a must-have in more complex scenarios.

So I totally understand why many urge for a solution that keeps the possibility of "complete inference" (that sounded a bit strange).

So, usage wise, it could be seen from two perspectives:

  • Big complex app - it's good to use typing no matter what
  • Prototypal / one off / scriptish app - it's very nice to not have to type

But then we come to the technical aspect, which @asterite has elaborated on:

  • There are confusing cases where the compiler (currently) has a hard time figuring out (or can't figure out) the type
    • this can probably be solved by selectively making the compiler less lazy (which requires changing the compiler into more decoupled ordering phases - planned anyway)
  • There are confusing cases where the compiler can figure it out - but it takes tiiiime, requiring it to look through the whole program each time
    • for those who look at their program and say "shrug - compile speed is OK..." - realize that the slowdown as your application grows is far more than linear! And all applications grow. So, as @asterite says, it really must be handled if Crystal is not to be stuck in a one-trick-pony corner.
    • As also pondered by many here - there just might be a better way to do it!?

The reactive compiler concept (I use the concept name of @waj and @asterite from here on) is really cool. I implemented something akin to it for pre-JS some years ago - it's not fully comparable of course. But the idea is:

  • Iff it is possible to make the AST mutable ("subtracting" and "adding" nodes) by diffing previous vs. modified source (associated with the nodes) - then the reactive-compiler approach is preferred as the primary solution, completely ditching any file-caching notions.
  • If other projects are compiled and the daemon starts getting heavy memory-wise, it dumps the most stale project(s) from the cache - they will have to be recompiled from scratch the next time "crystal ..." is called in their dir.
  • No file caching is attempted at all; it's the iterative, quick-turnaround compilations we want to make faster, and for those the state is cached in the daemon. When jumping between a bunch of projects, or leaving one alone for a while, there may be a flush, depending on machine memory. That's not the end of the world (you get the current compilation speed - which is still acceptable, perhaps a bit slower given some additional housekeeping for the incremental tree).

All this being said - it is quite an enormous endeavour.

As it is now, lexing and parsing take basically no time, type inference takes a great chunk, and then, surprisingly, figuring out which IR has to be compiled to objects takes quite some time. And then a great chunk is spent by LLVM, of course (a significant chunk if --release, of course - but that's an exception).

I spent an entire evening trying to optimize that simple last phase in the compiler, with all kinds of models of CRC'ing/hashing instead of generating LLVM-bitcode files for comparison, etc., but merely got the time down insignificantly, finally deciding not to waste time on it and to think about the dream scenario of the reactive compiler instead. A further pro of the reactive model is that not even IR would have to be regenerated until a branch is "dirty". Iff one manages to correlate 'source code -> dependent nodes -> (other dependent nodes) -> LLVM IR'. Well, it's a big if...

I've also had run-macros in the back of my head - in the above-mentioned hypothetical solution, they could be wrapped in boilerplate code making them (distinct) daemons too. This way macros, as long as they're unchanged, could run as daemons receiving and returning mutated code through nanomsg, or whatever messaging system/model is preferred, thereby also reducing compilation time when they're used, without ever having to waste the time to implement an interpreter for the entire language (AND allowing C libs like BigInt/whatever in run-macros, which is not possible in Nim for instance - where a considerable amount of resources also goes into maintaining the interpreter part of the compiler). Taking the concept even further, compilation could ultimately be distributed!!

Well - this was a mouthful.

[ed: fixed typo]

@mverzilli
Contributor

@waterlink Haskell forces you to be really explicit with the types of data structures, that's probably one of the reasons they got all those features (plus 30 years of research and development, of course :)).

So in a way what the guys here are proposing is closer to Haskell than the current language specs.

@asterite
Member

@waterlink What @mverzilli says is true. Here's a page that shows that.

data Shape = Circle Float Float Float | Rectangle Float Float Float Float  

Then you don't have to specify types in functions, but that's also how our language will work :-)

Putting the compile times aside, there's another issue that will improve: memory usage. Right now, compiling the compiler (40k+ lines of code) takes 930MB of memory (and it's not LLVM's fault: the type inference phase alone reaches that much memory). The problem is that because everything needs to stay interconnected at all times (because types might change), we can't free that big mesh. So the bigger the program, the more memory it consumes. More memory also means more time spent by the GC traversing that non-freeable structure.

Imagine a bigger program (at work we have a Rails app with 200k+ lines of code, and that's excluding the gems it uses). If we just multiply the current times we get 50s to compile the app and 5GB of memory. Doesn't look very nice.

With the new approach once we type a method we can discard all that interconnection because types can no longer change.

Back to the reactive compiler, I think with the new approach we could make one. With the current approach a background process consuming gigs of ram doesn't sound very nice or useful.

@ozra
Contributor
ozra commented Oct 27, 2015

One idea for the inference cases like #1809 is of course that if a type is never instantiated, it could be ditched from all type unions - since it will not affect the program, its "could-be" extension of method types is a non-issue. That's just one part of it all though.
Trying to find a way for the "no types" crowd :)

@asterite
Member

@ozra The problem is: how do you find out that a type is never instantiated before starting to type the whole program?

@kirbyfan64
Contributor

WHAT IF...

the type of a member variable could not change from what it gets in the initialize method? That way, in most cases, you still wouldn't need to specify the type, BUT the right-hand side of an assignment wouldn't be restricted to simple stuff. Mypy does this.
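A hypothetical sketch of that rule (class and method names invented for illustration):

```crystal
class Counter
  def initialize
    @count = 0        # fixes @count's type as Int32
  end

  def reset
    @count = 0        # OK: same type as in initialize
  end

  def broken
    @count = "zero"   # rejected under this rule: @count is already Int32
  end
end
```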

I think there are other ways to achieve speed. Again referencing Mypy, it type-checks itself rather quickly, and it's written in Python, which is definitely NOT the fastest language on earth!

@bcardiff
Member

I like to keep the type annotations to a minimum. For a while I thought the compiler service might overcome the growing compilation time, but @asterite already exposed some concerns about why it might not be feasible, at least as things are right now.

A proposal was made for somehow using the existing compiler to make an initial guess for type inference, keeping that guess and using it as a hint for a faster compiler phase. But discarding outdated assumptions due to changes in the code would require pretty much the same memory and even a bit more computation to get right.

So instead of thinking that all instance variables will need to be annotated, which sounds annoying and not very tempting for sure, I now see this proposal as:

  1. A class's type is determined from its own code, not its usage.

  2. The code inside the class is used to determine the type of ivars, using unions etc.

    2.1 I'm not sure why only the initializer is used rather than all the methods.

    2.2 Initially, as was explained, only literal assignments, restricted variable assignments and Class.new calls (with a non-overridden new) will be used.

  3. If there is no inferred type information, then the programmer will need to disambiguate.

I think more rules will probably appear over time and fewer annotations will be required. But it is a nice starting point for shifting to modular compilation, IMO.

The compiler nowadays needs to infer types for all ivars, even though we don't write them explicitly. We will be losing some cases initially, but not all, and not a lot.

Things like

class Person
    property name
    property age
end

won't work as they do right now. But that's mainly because the class itself has no behavior other than being a bag of data. The prototyping experience is lessened without that kind of thing, but I would prefer to lose it rather than see Crystal as a toy language.
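Under the proposal, a hypothetical annotated version would presumably still compile, at the cost of writing the ivar types once:

```crystal
class Person
  @name :: String
  @age :: Int32

  property name
  property age
end
```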

@oprypin
Contributor
oprypin commented Oct 27, 2015

Array(Object) has me sold. "using any type as a generic argument" is very important and I would like to see it work everywhere like I expect it.

@ozra
Contributor
ozra commented Oct 27, 2015

@asterite - that idea was based on the assumption of a rewrite, just with different goals, where inference can be handled gradually, solved as the needed information becomes available from other nodes, not in a "monolithic" type inference phase. My explanation is bad; there's probably a good term for it, but I don't know it B-)

In any event - I will be perfectly happy with the proposed model.

@lbguilherme
Contributor

I'm +1 for this.

I don't think Java is great as a language, but it is loved because of the awesome tooling it has, and that's what keeps one productive while using a language; I would love to see Crystal there too. TypeScript is going the same way: their type inference is local to each function (you have to type each argument, quite annoying), but they have a compiler daemon that makes auto completion and code linting very fast. Being forced to type only the non-local variables to make that possible in Crystal too is a small price to pay, IMHO.

The other point is error reporting. One logical error somewhere can cause the compiler to report problems somewhere far, far away. Having a huge stack of "what caused X to have type Y" doesn't always help on bigger projects. This is a problem C++ has had ever since templates were added. They are now solving it by adding "Concepts", which is, in short, a way to declare interfaces for template variables - kind of static types with duck typing. Crystal is in good shape here, with simple semantic rules, but it still needs ways to avoid overly long error messages. This change helps here too.


What about doing the "syntactic" type inference not only from initialize, but from every method defined on the class? Such as:

class Foo
  def initialize
    @foo = nil
  end
  def get
    @foo ||= Bar.new
  end
  def set(x)
    @foo = x
  end
  def stuff
    thing = Thing.new
    @foo = thing if rand > 0.5
  end
end

Here @foo would be deduced as Bar | Nil, without any semantic analysis. Calling set(3) would result in a compile-time error because it would be unable to put an Int into a Bar | Nil variable.

It could even get smarter and do a local type analysis inside every function. It would fail to deduce some variables or expressions (and mark them as "Failed"). Then it would be a matter of cleaning up the "Failed"s from the final type union and voilà. If the final types don't fit the global analysis later, it would be an error. With this, the final @foo type would be Bar | Thing | Nil.


@bcardiff We could modify the property macro to also allow taking a type specification, like this:

class Person
  property name :: String
end

Would expand to

class Person
  @name :: String
  def name; @name; end
  def name=(x); @name = x; end
end
@samis
samis commented Oct 27, 2015

I think @waterlink's idea of having a dedicated tool would be useful in a situation like this. If the limitations were removed, you'd gain the benefits of having the type annotations without the programmer actually needing to type them.

@waterlink
Contributor

Oh I see, that is actually a good point - that it is basically the same
thing.

Best Regards,
Alexey Fedorov,
Sr Ruby, Clojure, Crystal, Golang Developer,
Microservices Backend Engineer,
+49 15757 486 476


@bcardiff
Member

@lbguilherme yes, actually the property macro already has that. My concern is the scenario where the programmer doesn't care and thinks "use whatever you need to compile, based upon the usage". That is the thing you lose if any kind of modular, scalable type inference/compilation is wanted.

And regarding the samples, yes, I bet that is the idea, but maybe adding some of the corner-case analysis in the long term. There is no way to know if the programmer made a mistake by assigning another type with respect to the initializer. Either we are less conservative and build unions, or we stick to analyzing just the initialize method.

@waterlink
Contributor

I suppose my concern is: would it be possible to make the typing of ivars optional?

  • Generally, types of ivars are required to be specified or inferred from the body of a class.
  • A special keyword, global_infer @stuff, would force the current class to be re-compiled each time and inferred from its usage.

I would still like to have the 2nd option, since I have quite a few meta-libraries (spec2, mocks, etc.) which will otherwise be entirely broken by this change and no longer fixable, at least not without a non-trivial change to the public interface - because the whole point is that the types are really unknown until the user actually uses the library.


@waterlink
Contributor

AFAIK property, getter and setter already allow type specification.
Am I wrong? (I was actually using it..)


@waterlink
Contributor

"use whatever you need to compile, based upon the usage" - this is my
concern :)


@ozra
Contributor
ozra commented Oct 27, 2015

@waterlink - I don't think types work with the property macro - it lets erroneous code through:

class Foo
  property a :: Float64

  def initialize()
    @a = "Monkey Business"
  end
end

f = Foo.new
p f.a # => "Monkey Business"
@ozra
Contributor
ozra commented Oct 27, 2015

(this works [in that it gives the error it should]:)

class Foo
  @a :: Float64
  property a

  def initialize()
    @a = "Monkey Business"
  end
end

f = Foo.new
p f.a
@waterlink
Contributor

property and friends put the type on the getter and setter methods, not on the ivar itself. See: http://carc.in/#/r/kpk
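A rough sketch of the difference (the exact macro expansion may differ):

```crystal
# `property a :: Float64` only restricts the accessors, roughly:
#
#   def a=(@a : Float64); end   # setter argument is restricted
#   def a; @a; end              # getter just returns the ivar
#
# so a direct `@a = "Monkey Business"` inside the class bypasses it.
# Declaring the ivar itself restricts every assignment:

class Foo
  @a :: Float64
  property a
end
```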

@waterlink
Contributor

^ Which they probably could do at this point.

@asterite
Member

@waterlink Can you give an example of some code that will break (without a solution)?

@waterlink
Contributor

For example this will break: https://github.com/waterlink/mocks.cr/blob/master/src/mocks/registry.cr#L114-L118

class ResultWrapper
  getter result
  def initialize(@result)
  end
end

I just literally can't know its type, since the type is up to the user of the library to decide. This class is used to instantiate a generic Hash({ObjectId, Args}, ResultWrapper).

And I remember having such wrappers to avoid problems with generics elsewhere.

I suppose that with this change, the public interface of the library will have to have some sort of generics, like this:

# now:
allow(something).to receive(action(with, this, arguments)).and_return("stuff")

# after change:
allow(something, String).to receive(action(with, this, arguments)).and_return("stuff")

# or:
allow(something).to receive(String, action(with, this, arguments)).and_return("stuff")

And I am still not sure what would happen under the hood.


EDIT: maybe I am just doing it wrong.. ?

@waterlink
Contributor

Hm, I suppose spec2 is actually in the clear for this change - nice. I thought all the magic I did there would break, but it looks like it is fine.

@waterlink
Contributor

Another meta-library I have is active_record, and apparently it is in the clear too - it actually already has pretty strong typing through macros.

Everything else I have is simple and will be totally fine after the change.

So, I suppose, I am +1 for the change. Though I would love to see:

  • infer tool,
  • inferring from class body automatically.

And it would be very, very nice (not required) to have the option to force global inference on some instance variables as I described in the previous comment, along the lines of global_infer @stuff.

@waterlink
Contributor

About this infer tool I am talking, I see it like this:

  • it runs global inference for all un-annotated, not-inferrable-from-the-class-body ivars and inserts the respective annotations into the head of the class.
  • this allows the developer to not think about types at the beginning.
  • when the developer actually needs to change a type, they can change the annotation, or remove it and run the infer tool again.
@waterlink
Contributor

This tool would actually be pretty helpful for converting the current codebase(s) of Crystal itself and all the libraries out there to the new version, where annotation of ivars is mandatory (or rather, required when not inferrable from the class body).

@waterlink
Contributor

If we all agree to go in this direction, I can actually volunteer to build such a tool while the compiler is being worked on to introduce this change.

@sdogruyol
Contributor

@waterlink 👍

@samis
samis commented Oct 27, 2015

@waterlink perhaps have an option to re-infer existing annotations? Other than that, 👍 for your idea.

@waterlink
Contributor

@samis If it re-infers existing annotations, then it is no different from what the compiler currently does. It will be slow on big codebases.

@asterite
Member

The tool already exists; it's called crystal tool hierarchy: it prints that type information for all types. We'd just need to make it put these annotations into existing code (if we wanted to).

For ResultWrapper you can define it as ResultWrapper(T) and specify that @value :: T. That will make @value be of type Object in the general case; I don't know if that's useful for you.
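A hypothetical sketch of that generic version, with the ivar's type left to the type argument:

```crystal
class ResultWrapper(T)
  getter result

  # the restriction ties @result's type to the generic parameter
  def initialize(@result : T)
  end
end

ResultWrapper.new(42)      # T inferred as Int32
ResultWrapper.new("hi")    # T inferred as String
```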

And no, I don't think we'll have a flag or something like that to omit type annotations and add them later, because the inferring compiler would be broken unless we use the new algorithm (because of the things I explained in previous posts, such as the non-initialized types). Further, having to maintain two different compilers seems like a nightmare.

Come on, it's not such a hard task. The other day we did the exercise of writing some code for the new compiler, adding type annotations as we went, and after finishing writing the algorithm I couldn't even remember when I had put in a type annotation. It takes like 1s of your time.

@waterlink
Contributor

Further, having to maintain two different compilers seems like a nightmare.

This is definitely true.

@waterlink
Contributor

For ResultWrapper you can define it as ResultWrapper(T) and specify that @value :: T. That will make @value to be of type Object in the general case, I don't know if it's useful for you.

I can't have {} of Something => ResultWrapper - it complains that a generic argument can't itself be generic (it should be an instance).

And I can't have {} of Something => ResultWrapper(T) - because T can be different each time someone calls it, and even different within one program (that is the whole point).

@waterlink
Contributor

Come on, it's not such a hard task. The other day we did the exercise of writing some code for the new compiler and adding type annotations as we go, and after finishing writing the algorithm I couldn't even remember when did I put that type annotation. It takes like 1s of your time.

So putting the type in takes 1s of your time when you know what the type should be.

If you don't know, you probably resort to generics, but they usually cause you problems. For example, in spec2 I wanted to implement a generic matcher expect(...).to be .any_method_call?(with, arguments), including expect(...).to be > 42, and I was unable to do it because of the limitations of current generics. Instead I had to change the interface a bit, making it less pleasant: expect(...).to_be .any_method_call?(with, arguments).

You might say the difference between Expectation(T)#to_be and Expectation(T)#to(matcher), where matcher can be anything, is not big. I say it is, because what I was trying to do is late binding, which would let me create the matcher object instance separately from the Expectation - but at this point I can't, because I need the type of @actual, because of how generics work now. So in this case I resorted to changing the public interface and gave up on late binding, just to have access from the matcher to that T from Expectation(T).


A different point: it does take 1s to put the type there. Does it take you 1s to maintain it? I don't think so - it usually becomes a very tedious task to change all the type unions you have there. Tedious, boring and not fun, of course.

@waterlink
Contributor

Adding to the last point: having a way to automate or semi-automate type annotation updates saves you from such boring work. Except for the cases when the tool cannot figure it out - then you of course do it yourself - but that should be rare.

@waterlink
Contributor

@asterite I think I already gave a 👍 on the issue and I am fine with annotating ivars.

Everything else was just my "wants". You can safely ignore them if they don't resonate with you.

If you want we can move a conversation about ResultWrapper somewhere else. Highly possible that it does have a better solution than it is currently.

@technorama
Contributor

Being forced to specify types breaks duck typing, which is one of the main reasons I use Crystal. Strict typing breaks mock objects and testing. Will I be forced to define my types as RealType | MockType, or to use interfaces/abstract types everywhere?

@technorama
Contributor

To speed up the compiler, would caching dependencies help? Rather than keeping the entire AST in memory, recalculate the dependencies and inferred types when a file changes. This is close to the "use a daemon" idea, without the daemon.

@samis
samis commented Oct 27, 2015

I'm not much of a fan of having to specify types, or of figuring out whether the logic is present for them to be inferred automatically. Just saying. Using a tool would address that, though.

@waterlink
Contributor

Another idea for the table: since maintaining 2 compilers is hard (even insane), one could extract the global inference mechanisms into a library and continue maintenance independently. The tool could be based on it, and the tool could be independent and optional. It doesn't need to work in 100% of cases; handling only 9X% of them would already be very useful.

@technorama
Contributor

@waterlink Where would the tool store the type annotations? In the source or somewhere else? Are the type annotations meant to speed up the compiler, or for other uses?

@waterlink
Contributor

Well, the whole conversation is not about the tool, but about making incremental compilation possible. And annotations are only for instance variables, i.e.:

class Example
  # these will be required
  @stuff :: String
  @more_stuff :: Int32

  # ... methods still duck-typed ...
  def do_something(other_stuff)
     # ...
  end
end

The tool I am talking about would just try to guess the types of instance variables and insert the corresponding annotations as seen above. This tool is independent and basically a very big project, since it would mean:

  • having almost a 2nd version of the compiler, with full type inference,
  • maintaining this version to be compatible (not necessarily 100%) with upstream.
@technorama
Contributor

@waterlink Won't putting the annotations in the source mess up version control when using shards, other libraries, git submodules, etc.? Take the ResultWrapper example: I assume the wrapper is in the spec2 shard. Are the annotations added by modifying the shard's source? How does that work with version control? What about git submodule or similar functionality provided by other utilities?

@asterite
Member

Well, maybe the "it takes 1s" was an exaggeration, but it's still a small amount of time. And it's the only place where a variable's type is specified: if it changes, unlike in other statically typed languages, you have to change that info in just one place. Remember: no type annotations are required in method arguments. Duck typing will still be here (please re-read the original text, the add method).

And I'll say this again: you will be able to reopen a class and redefine the type of an instance variable, so mocks will still be possible.
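For instance, a spec could hypothetically reopen the class and widen the ivar's type (HttpClient and MockClient are invented names, and the feature itself is still a proposal):

```crystal
# production code
class Service
  @client :: HttpClient
end

# in a spec helper: reopen and redefine the ivar's type
# so a mock object also fits
class Service
  @client :: HttpClient | MockClient
end
```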

@waterlink Please provide the example that will break, involving the ResultWrapper, but extracted and reduced from spec2 so we can understand the problem. I'm sure together we will find a solution :-) (you can do it in a separate issue if you want).

@waterlink
Contributor

@technorama This tool only helps with your own code in your editor, much like the formatter tool does. It will not touch your dependencies. So it won't work for this ResultWrapper; it just lets you, as a programmer, avoid typing the annotations yourself by automating the process. In cases where you can't have a type at all (read: late binding), I think only generics are left.

@asterite I will reduce an example and create an issue. And it is mocks, not spec2 :)

@luislavena
Contributor

@asterite how this change will affect things like this?

https://github.com/luislavena/crystal-beryl/blob/routing-tree/src/beryl/routing/node.cr

Where payload can be anything (or nothing).

Will be forced to decide what payload can or cannot be?

@asterite
Member

@luislavena If payload can be anything, I guess it's @payload :: Object? Note that this doesn't work right now, but it will if we make these changes. But... how do you use the payload later?

@luislavena
Contributor

@asterite

Either to replace the payload of a node (when using Tree#add

https://github.com/luislavena/crystal-beryl/blob/routing-tree/src/beryl/routing/tree.cr#L150

And then to map it to Action.call, but only if it responds to it.

https://github.com/luislavena/crystal-beryl/blob/routing-tree/src/beryl/router.cr#L67-L69

payload can be a Symbol too.

I know it doesn't work now, but will the suggested changes allow the usage of generics to cover these scenarios?

@asterite
Member

@luislavena

You will say @payload :: Object.

In your code:

  • you pass it around between Nodes so there are no methods to be checked
  • when you want to invoke a method on the payload, you put it inside if payload.responds_to?(:call), so there the compiler will restrict the type of payload to all objects that respond to that method

In both cases your code will compile and work just fine, given you annotate payload as being an Object.

The problem can only happen if you expect payload to respond to some method but don't check it inside a responds_to? or .is_a?, and that doesn't seem to be your case :-)
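A minimal sketch of that pattern, assuming @payload :: Object becomes valid under the proposal (Node and #invoke are invented for illustration):

```crystal
class Node
  @payload :: Object

  def initialize(@payload)
  end

  def invoke
    payload = @payload
    if payload.responds_to?(:call)
      # inside the branch the compiler narrows `payload`
      # to the types that actually define #call
      payload.call
    end
  end
end
```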

@waterlink
Contributor

@asterite

As far as I understand, #responds_to? lets one handle the case of an incompatible type at runtime. At compile time all objects will be allowed. Right?

So the question is: do you see it as a good idea to have responds_to-style type restrictions, so that the user of the library gets the failure at compile time? For example:

@payload :: Object : responds_to(:call)

# or not after type annotation, but on initialize or any other method:
def initialize(@payload : responds_to(:call))

I suppose one could use an implicit interface for this. We have a similar thing - the virtual type - but it is not quite the same. As far as I understand, the user would have to inject this "interface" into the inheritance hierarchy of the object.

So this is not going to work:

# Library code:
abstract class Stuff
  abstract def call(more_stuff)
end

class Node
  @payload :: Stuff
  def initialize(@payload)
  end

  def do_something
    # .. use @payload.call(...) here ..
  end
end

# User code:
class MyStuff
  def call(stuff)
    # .. do smth here ..
  end
end

x = Node.new(MyStuff.new)    # => this already will not work
x.do_something

Here is the example: http://carc.in/#/r/kvv

@waterlink
Contributor

Though that can be made very easy with modules - apparently it is possible to have abstract methods in modules and do this:

module Stuff
  abstract def call(some_stuff)
end

User code will still have to include this module:

class MyStuff
  include Stuff

  def call(stuff)
    # .. do smth here ..
  end
end

Working example: http://carc.in/#/r/kw1

This is totally fine if it is functionality that will be used in the actual runtime of the program, like we have with include Enumerable, or in a web framework where one can say include FancyWebFramework::View.

This would be weird for things that are used only in tests, like mocks. You can of course work around that by re-opening classes in tests.

@asterite
Member

@waterlink I guess you want (or maybe we'd want) something like Go's interfaces: you specify which methods you need, but you don't require an explicit inherit or include in types. Of course, in Go it's much simpler because everything is fully typed. In our case we would need to traverse the object hierarchy for each interface and see which types implement it, so that later we can know the type of an interface method without you writing the type down. For example:

interface Callable
  # We don't know the return type of this method
  def call(stuff)
end

class Foo
  def call(stuff)
    1
  end
end

class Bar
  def call(stuff)
    'a'
  end
end

Foo and Bar define call(stuff) so they implement the interface and we can deduce the type of Callable#call(stuff) to be Int32 | Char. Although maybe for these cases you'd want to specify types in the interface methods (at least for the return type), not sure.

This idea is not crazy at all. I used "interface" instead of "module" because you need to include a module, but interface would be implemented implicitly. But I don't know how expensive that implicit check can be. We could probably cache that info too and recompute that on changes. Or we can require explicit types in an interface's return type and it becomes super easy.

We can start by using modules and having to explicitly include them. Even an empty module used as a marker module can work, there's no need to define abstract methods.
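Applied to the earlier Node example, the marker-module version could look roughly like this (a sketch in the era's `::` ivar syntax, not a verified snippet):

```crystal
# An empty module used purely as a marker:
module Stuff
end

class Node
  @payload :: Stuff
  def initialize(@payload)
  end

  def do_something
    @payload.call(nil) # dispatches over every type that includes Stuff and defines #call
  end
end

class MyStuff
  include Stuff # explicit opt-in is still required

  def call(stuff)
    # .. do smth here ..
  end
end

Node.new(MyStuff.new).do_something
```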

@waterlink
Contributor

That being said, what if I do not even care about the methods of the object I store? For example, the following scenario:

  • A user of my library gives me an object of whatever type
  • My library stores it until a certain point in time in some sort of registry
  • It gives this object back to the user and of course the user should be able to call methods on it, so the type of the thing the user receives can't be just Object

All 3 things can really happen on one line of the user's code. The only workaround I see is to ask the user for the type of the object at some stage, like this:

allow(example).to receive(say("hi", "world") as String).to_return("hi, world")

And do some macro magic in receive to instantiate everything properly.

@waterlink
Contributor

Empty module - that might actually work :)

@asterite
Member

@waterlink You'd still need to explicitly include it in user types so they "belong" to that module hierarchy. Not sure it can be used for your mocks (I still don't fully understand how your mocks work so I can't comment much on that)

@waterlink
Contributor

My next concern is how I can store such things properly:

For example, I have a type Message(T); how can I make a registry of type Hash(String, Message(*)), where Message(*) means I want to leave T as a free generic parameter?

The scenario is the same again: the user gives me something of type T, and each time the type can be different. I do not call any methods on it and just need to return it to the user. But of course I need to give it back as the original type T, not as an object .. (
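To make the constraint concrete (Message is the hypothetical type from this comment; the `*` syntax does not exist in the language):

```crystal
class Message(T)
  def initialize(@value : T)
  end

  def value
    @value
  end
end

# What is wanted, but cannot be written:
# registry = {} of String => Message(*)    # `*` as a free type parameter
registry = {} of String => Message(Int32)  # compiles, but pins T to Int32
registry["answer"] = Message.new(42)
# registry["greeting"] = Message.new("hi") # error: T is already fixed to Int32
```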

@waterlink
Contributor

Actually, now I see that types do implement #cast on them. So I suppose if I can somehow store Message(*), I will be able to ask these messages about their type and do message.own_type.cast(message.recorded_return_value) - which will always succeed, but will actually need a cast at runtime..

Interesting, let me experiment with it a bit later.
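The cast-back idea, roughly sketched with the era's postfix `as` (Message is the hypothetical type from the earlier comment; the cast would raise at runtime if the type doesn't match):

```crystal
class Message(T)
  def initialize(@value : T)
  end

  def value
    @value
  end
end

# Stored behind a union, the exact type is lost statically:
msg = [Message.new(42), Message.new("hi")].first # Message(Int32) | Message(String)

# ...but it can be recovered with a runtime cast:
int_msg = msg as Message(Int32)
puts int_msg.value + 1
```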

@waterlink
Contributor

Next question I have: what happens to ivars defined through method_missing?

For example I have code like this: https://github.com/waterlink/ton/blob/master/src/entity.cr

Simpler example:

class SimplerExample
  macro figure_out_type(name)
    # ....
  end

  macro method_missing(name, args, block)
    @_{{name.id}} ||= Registry(figure_out_type({{name}})).new
  end
end

class Registry(T)
  # ...
end
@waterlink
Contributor

Would it be possible to annotate new ivars from the body of method_missing, since I can figure out their type:

macro method_missing(name, args, block)
  @_{{name.id}} :: Registry(figure_out_type({{name}}))
  # ...
end
@asterite
Member

I don't think ivars will be possible to be introduced via method_missing.

@waterlink
Contributor

Oh, that is unfortunate. This was the only way I could have stored Hash(String, Message(*)) - by splitting it into different ivars by the type of T in the method_missing macro.

I have another idea I'm playing around with in carc.in - we will see if I can make it.

@waterlink
Contributor

So I was able to store literally anything by providing a well-typed module interface and a generic type that implements that interface: http://carc.in/#/r/kxm

But I end up with types that are unions - which is probably bad for an end user of the library.

On the other hand I am not 100% sure mocks.cr returns the exact type; maybe it returns a union type too - then I think it is basically broken already ...

@waterlink
Contributor

And I suppose it is easily fixable as soon as the user, while registering the mock, provides strict types for return values - this way I can put them in the generated mock class in the form of as {{user_provided_return_type}}. Interesting.

I think I now have enough confidence that I will be able to fix mocks.cr after this huge compiler change with very little change to the public interface!

@waterlink
Contributor

And my ton app should be pretty easy to fix, actually, including that big method_missing method. Because:

  • I know the type in the macro
  • I can store Hash(Any, Any) (or, in the future, Hash(Object, Object)) = store everything
  • In generated code in method_missing I can do as {{type_i_figured_out}}

Therefore I don't need to define ivars in method_missing anymore!

@ozra
Contributor
ozra commented Oct 28, 2015

@asterite, @waj - One thing about the rewrite - could it be possible to support macros executed at different stages? "lexical macro", "typed macro", etc., depending on where in the AST phases one intends it to operate (I'm thinking more about the AST-mutating run-macro here, I guess).

@asterite
Member

@ozra We have that problem with macros too: sometimes you want to run them at different points. I don't know, it's something we'll definitely consider. But we wouldn't like to make the language too complex.

@waterlink
Contributor

Sorry for asking yet another question:

What happens to class variables in that scenario? Currently I can not even annotate them with a type: @@name :: String results in a parser error: unexpected token ::

@waterlink
Contributor

@asterite That is a horrible thing I had to do only because I want to care only about equality and hash of the things the user passes in to me: waterlink/mocks.cr@d967ed7

Yes, an implicit interface like in Go would help here, since I could have just defined an interface with 2 methods (== and hash) and spun everything off without wasting time debugging and trying to trick the compiler and generics into accepting my code and doing what a human can see is an obvious thing (the behavior there was really strange, including some segfaults..).

PS: Maybe I am doing something horribly wrong, though.

PPS: of course, after debugging and understanding how to trick the compiler, I can revert to normal names for the interface: waterlink/mocks.cr@259a775

@waterlink
Contributor

Had to use the same pattern just now for another class. It seems pretty useful:

module ExampleInterface
  abstract def downcast
  # .. more abstract methods ..
end

class Example(T)
  include ExampleInterface

  @value :: T
  def initialize(@value : T)
  end

  def downcast
    self
  end

  # .. define more methods from interface ..
end

# To use `@x :: ExampleInterface`, you do `@x.downcast.call_some_method(with, args)`.
@waterlink
Contributor

Maybe it is really cheating and the compiler should forbid it? Or the other way around: should it be made easier?

@asterite
Member

@waterlink You'll have to annotate the types of class vars and globals too. It seems the current compiler doesn't allow that. And I think class vars will be inherited (it will fix #916).
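The kind of annotation being discussed, in the era's `::` declaration syntax (this is the proposed form; per the comments above, the parser doesn't accept it yet):

```crystal
$verbose :: Bool   # global variables would get type annotations too

class Counter
  @@count :: Int32 # class vars declared like instance vars
  @@count = 0
end
```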

I don't quite understand what you are trying to do with generics and stuff. If something doesn't feel like working well it might be a compiler bug (generics have many bugs right now). You could try to isolate the problem and open a separate issue.

@alex-fedorov
Contributor

@asterite +1 for class vars being annotated too.

I will create detailed issue with small example.

@dimparf
dimparf commented Oct 30, 2015

+1 for class vars being annotated too

@PragTob
Contributor
PragTob commented Oct 31, 2015

Thanks everyone for this fruitful discussion :)

I think it's a good change - I'm a bit afraid that I'll have to use too many interfaces/modules in production code just to still be ok with stubbing/mocking (save reopening the class), but we'll see. I like the idea of an interface that is implicitly implemented (like in Go).

I also somewhat agree with @jhass that concurrency is a pressing issue. @asterite and @waj probably know best but good concurrency support is a must have for modern languages imo and is what is still missing for me to see Crystal as a full good evolution from Ruby.

Anyhow, still excited to see where this journey goes and I really appreciate the openness and discussions 👍

@will
Contributor
will commented Nov 3, 2015

This is aside from the topic at hand, but since it was brought up I wanted to say that I concur with @jhass on the importance of parallelism.

To be honest, at some point we even thought we could leave concurrency support for a future version and make the language robust for 1.0.

Please do not do this. From a developer marketing point of view alone, this would be disastrous. Most people will hear about the project then, and if it's a new language in year 2016 (or whenever) without stellar parallelism, people will write it off and never come back.

@omninonsense
Contributor

I also agree with @jhass that the language should have a decent parallelism and concurrency support before the 1.0 release.

@sdogruyol
Contributor

πŸ‘ on concurrency support

@ozra
Contributor
ozra commented Nov 4, 2015

I too agree on a good coherent concurrency model before 1.0, and I also agree on compiler rewrite prioritized over concurrency (how's that for concurrency!? ;-) )

@samis
samis commented Nov 4, 2015

So, now that everything is said and done, perhaps it is time to write a summary of this thread and the resulting decisions? ( 👍 for a good concurrency model, by the way)
I.e. will crystal tool hierarchy or something similar be able to infer types and display the inferred types, or are we just giving up entirely on that front?

@stakach
stakach commented Nov 7, 2015

Not sure if this is the best place to discuss concurrency, however I have been toying around with cross-platform concurrency in Ruby for a while now with https://github.com/cotag/libuv which primarily wraps all IO in promises and supports multiple event loops.

I think there are some really powerful things we could do with IO / threading on crystal.

  1. Dedicated IO threads
    • Spin up a maximum of one per-core
    • Lazy load them with basic thresholds
    • Shared nothing
  2. Have separate threads for processing
    • Internally managed thread pool (can grow and shrink as required)
    • The initial / main thread is the first of these
    • Optionally allow users to execute code on their own dedicated threads
  3. Promises and Futures for coordination
    • So threads are transparent to the users
    • Code is automatically run in parallel
    • Heavy use of fibers

Effective this would allow you to write code like this:

stream = IO.new
stream.write 'hello'   # => Promise
# Writes are performed on an IO thread. The write itself is scheduled on that thread
# We can use Fibers to prevent further execution until the write has been buffered (not completed) so there is no concurrent access of the variable.
# Reference counting may provide a way to avoid a fiber yield here if this is the last reference to the variable or if the data is an object literal or constant etc
stream.write 'world'  # => Promise
resp = stream.recv  # => Promise
output = resp.value
# ---- Call to value is a future. Fiber yields here until response is received
# Once response is received we continue processing on the same thread
puts output

Further to this, if you would like to use multiple threads for processing

stream = IO.new
stream.write('hello').then(:concurrently) do
    # Anything in a callback block could optionally be executed on a different thread
    # (Idle threads watching a work queue)
    # This call to write would only occur after the first write had completed
    stream.write 'world'  # => Promise
end
resp = stream.recv  # => Promise
output = resp.value
# The existing thread 
puts output

Much like Go or .NET async, it would be easy to provide concurrent execution options:

def some_func(arg1)
    'return value'
end

# co keyword, for concurrently, to execute function on another thread
result = co some_func(arg1) # => Promise
puts 'no stopping me'
puts result.value

Just some food for thought.

@sfcgeorge

I'm okay with this.

Terrible code. I like writing magic crazy code; if I understand it I don't care if it's a "bad practice". But I keep that within methods - and they will be unaffected by this! I don't like crazy APIs / interfaces between distinct chunks of code - I can see how the magic within a class works, but if there's magic between classes it gets too much to keep in my head.

Beginners. It would be nice if Crystal was just a fast Ruby, but it isn't. In its current state you already get type errors, and have to deal with unions, and beginners will be put off. I tried porting some Ruby to Crystal and it took hours, and that was all type errors, and I wanted to claw my brain out. So I don't think requiring types at the top of a class will put off beginners any more, because they're already put off ¯\_(ツ)_/¯ This change might actually make it easier to learn by simplifying the rules of where types come from, especially for debugging.

Real world. If I have a Person class with a name it makes sense to restrict name to String. It's a neat bundle of data and its types really shouldn't change throughout the program. If name ends up String | Int32 then I've probably made a mistake and didn't want that. It's also helpful for understanding to have a list of ivars and their types at the top of the class, because inevitably you forget.

Duck tape. It's impressive what the current compiler can do, but it sounds like a complicated maintenance burden with lots of edge cases that could fall apart at any time. Maybe that's hyperbole, but we already have a beautiful complicated mess - it's called Ruby, and look at the bickering it has caused between MRI and JRuby trying to get things to work consistently. I like the idea that Crystal is a simpler, more performant Ruby. A good chunk of the nice stuff, but with a lot fewer quirks.

Libraries. I think this is the real pain point. If libraries like @waterlink's with nice APIs become impossible that will be a great shame, and it sounds like it might lead to types having to be passed to library methods in some cases which would not be fun. But it still seems sensible to go ahead with this; then where it turns out to be too limiting, add a language feature (like interface). I wouldn't worry about overcomplicating the language as it's only complicated libraries that would have to use these extra things anyway, client code can stay simple. Make the complexity someone else's problem ;)

@ozra
Contributor
ozra commented Nov 9, 2015

@stakach - concurrency details definitely should go in its own issue. Could you please add/move your comment to a new issue, to facilitate commenting? +1 for libuv over current libevent2. +1 for share nothing. etc ;-)
You might wanna have a look at #1698. There are further links to other issues of interest there too.

@luislavena
Contributor

@asterite resurfacing this with some questions:

  • This change confirms that it will be possible to have, say, Hash(String, Object?), something that right now is not possible and that I'm struggling to write for a small library.
  • How much effort will be required to pursue this?

It is clear that to answer second question, a review of the Roadmap is required:

  • Both compiler internals and concurrency seem (to pretty much everyone who commented here) important
  • Changes to the compiler will remove the need for several of the enhancements, changes and tricks listed.
  • It appears that concurrency needs to happen pre-1.0; there is a strong interest in seeing it corrected and improved.

I'm not personally biased towards or attached to any of these items in the roadmap, however my interest is in the timing: when will any of those changes be implemented, so libraries can be adapted?

While Crystal continues to evolve, breaking changes will be required less and less, but it is all a matter of coordination. For example, Crystal 0.8 introduced several breaking changes and not all library authors have upgraded their codebases to support it, which leads to usage of forks.

In my personal experience with forked libraries, it has been a real pain to manage over time (and deal with reports).

Looking forward your answers.

Thank you.

@RX14
Contributor
RX14 commented Nov 20, 2015

Does this really need to be mandatory? Can larger projects not get performance benefits by using this without forcing it on smaller projects?

In my opinion this change is ruining one of the major attractions I have to this language, for a minor performance improvement.

@Perelandric

@RX14 It isn't just about performance. See the last paragraph of the original post and also: #1824 (comment)

I think asterite summed it up well with...

"So it's not just for a faster compiler: it's for a better, more complete and non-broken language."

@dragostis

@asterite I'm totally for this change, but I don't really see why the case when

the initialize is not simple (involves calls but it's not a new call)

would not be inferred. When caching all of these inference results from one compilation to the other, I don't really think it's worth giving up on some inferencing while keeping others, i.e. simple ones. Scala has huge compilation times and it still didn't stop people from using it. I strongly believe that this language has a huge potential but that it just needs to stick to its Ruby roots a bit tighter until it catches a pair of wings of its own.

I'm seeking to contribute to the project. What I can do is design, branding, some web development, std library implementation, etc. I would also be interested in doing my thesis on something related to the project that's more consistent.

And, by the way, congrats with the project. Easily one of the most exciting in the last few months.

P.S. We could hook up periodic compilation to Atom for larger projects. The inferencing is done whenever you save a file and gets cached until you're actually ready to run/build the project. When you do, everything works fast.

@RX14
Contributor
RX14 commented Nov 21, 2015

I would be a lot less against this if the types could, for example, be inferred from the constructor as well as from static declarations.

And yes, the reason I use this is because it's very much like compiled ruby, and you almost never need to annotate types, so I'm naturally against annotating types here.

@samis
samis commented Nov 22, 2015

@RX14 In the opening comment @asterite said the following:

After some more brainstorming we think that the above rule can be relaxed in some cases: the compiler will try, for example, to infer instance variables types from the expressions assigned to them in the initialize methods, if these expressions are simple (literals and new calls).

To me, this feels like we're trading away one of the language's strong points and potentially unique elements. I am not personally aware of a language that's so similar to Ruby while compiling to native code. mruby comes close but doesn't go the full way.

@raydf
raydf commented Dec 22, 2015

Are there any thoughts about using dynamically linked libraries in Crystal? In a project of 200K lines maybe it would be better to compile smaller libraries individually for rapid iteration and then link them into a big executable before release. Also, this would open up the benefit of parallel compilation with global type inference.

@elthariel
Contributor

This plan really breaks (for me) the promise of the language and its identity. How is it different from OCaml or other alternatives after that?

It might also be worth noting that compilation time isn't much of an issue. Just have a look at how long it takes to compile https://github.com/facebook/folly and then imagine the time it takes to compile big projects depending on it. Having a file take more than 90s to compile is not uncommon and not a blocker on really big projects.
What's important is having the compiler scale on machines with numerous cores. On my machine, building the compiler doesn't seem to use 100% of all my cores, which is important for bigger projects.

I definitely believe that specifying the types of ivars should be an optimization rather than a requirement. Developer time is much more valuable than compile time. You can throw a bigger machine at it to compile faster; you can't do that for engineers.

What about enforcing this rule on module boundaries? If you import a shard, the shard should have a "public interface" with all its types specified. This way you enforce higher quality APIs on shards, define the shard as the incremental compilation unit, allow for binary distribution and respect the initial promise of the language (fast iterative development).

I just discovered and started to contribute to the community recently, so I understand my opinion doesn't weigh much, but I beg you to gather data from the community at large about this topic, as it might be a deal breaker for a significant number of us (or not)

@RX14
Contributor
RX14 commented Dec 22, 2015

@elthariel this really sums up my problem with this idea, thanks for articulating what I couldn't.

It's about identity. I feel that this change removes one of the major features that the language has, and takes it back a step.

@benoist
Contributor
benoist commented Dec 22, 2015

Developer time is much more valuable than compile time. You can throw a bigger machine to make compile faster, you can't do that for engineers.

That really depends on the complexity; with exponential complexity you can't just use a bigger machine and expect good results. Developer time is indeed more valuable than compile time, but waiting for compilation is even more expensive. To get fast feedback about the code you write, you also need somewhat faster compilation. Programmer happiness to me is not writing the least amount of code, but I do get happier if I feel I'm spending my time efficiently. Thinking about what types to use might not seem efficient, but it might save you from regression issues that take up a lot of time.

I totally agree with the importance of concurrency, however the way it will most likely be implemented wouldn't require a major code change for existing projects, introducing the new compiler will. So therefore I'd rather have the major code changes sooner rather than later so we can keep building more stable tools while better concurrency is introduced.

@elthariel
Contributor

Developer time is indeed more valuable than compile time, but waiting for
compilation is even more expensive.

You can read your mail, write tests and documentation, manage tasks/tickets, etc

Thinking about what types to use might not seem efficient, but it might save you from regression issues that take up a lot of time.

That's a really different debate. You can have the same when debating ruby vs whatever-else

I'd rather have the major code changes sooner rather than later

Agreed. But I think we must explore other directions instead of reverting to what other languages are already doing.

@benoist
Contributor
benoist commented Dec 23, 2015

You can read your mail, write tests and documentation, manage tasks/tickets, etc

Indeed you can, but when I compile my code I just want to know if my tests pass before I write new ones. Reading email and doing other distracting jobs takes away my focus and makes me less efficient. But I suppose that's different for everyone. Having speedy compilation allows you to stay focused, but also allows you to do other tasks.

That's a really different debate. You can have the same when debating ruby vs whatever-else

I think it's exactly this debate but then crystal vs whatever-else

@elthariel
Contributor

But I suppose thats different for everyone

I wasn't talking about when you have the choice, I was talking about when you don't. At companies like Google or Facebook, you often have to wait a long time between your tests. My point was, you can perfectly live with it.

@ozra
Contributor
ozra commented Dec 25, 2015

@elthariel - compilation speed is a major concern of mine - it's one of the stronger reasons I've wanted to get out of C++ for ages, so comparing with a bloated C++ project is not a good argument in my book.
I also belong to the mental configuration that works on one objective and can't switch to mails, tickets and socializing at a whim.

If types on ivars makes it possible to increase compilation speed I'm all for it - I mean, typing types is kind of reasonable to begin with, and catches a lot of bugs too as a bonus.

@asterite
Member

An update on this: since the new compiler will take a while to develop, especially because the main algorithm changes and we plan to simplify a lot of things, for now we'll try to introduce these changes in the current compiler (for example, we just introduced a first pass to declare types, which already fixes several bugs). These might not produce a big performance improvement, but once we have all code working under these new rules we won't be breaking any code when we introduce the new optimized compiler. In short, the sooner we finish defining the language and stop breaking backwards compatibility, the better.

Also, we'll continue working on the concurrency model in parallel (no pun intended ^_^), with the goal of running fibers in multiple threads, and maybe allowing user code to create threads at will.

@ylluminate

@asterite what's the current condition of concurrent distributed application development with Crystal? I have found myself quite enamored with Julia recently, but am missing Crystal.

@theduke
theduke commented Jan 29, 2016

I recently discovered Crystal, and just as a general remark:

BIG support for this.
The point of using a compiled, type-safe language is not only speed, but the many compile-time guarantees that you lack in a dynamic language.

The examples in the documentation where a class instance variable ends up having potentially multiple types (tuple(A, B)) immediately turned me off. Severely.

I could just use Ruby / Python for that; almost all apps can perfectly live with the performance, and just substitute with C extensions when necessary.

This will make Crystal considerably more appealing to me.

Also: +1 for the 'interface' concept.

@Perelandric

I just want to reiterate my support for this.

When I first learned about Crystal about a year ago, I read the docs and liked all of it until it came to the dynamic class members, at which point I dismissed it. I only came back because I was stuck on Nim bugs I couldn't work around.

Crystal is great and I think will have a much broader appeal if it's tightened up just a little bit. Until the new compiler is complete, I'm using Go, which can be aggravating. I've resorted to building code generators to make the language a little more tolerable.

Looking forward to writing most of my code in Crystal and Pony! 😃

@theduke
theduke commented Jan 29, 2016

Haha, @Perelandric:

I'm in a very similar spot.
Used Go for half a year but so many limitations (no generics, no proper inheritance or composition, no enums, just to name a few) made me look for something else.

First stumbled upon Nim too, which is a conceptually AMAZING language, but it's so full of bugs in both the language proper and the stdlib, and badly maintained (not the maintainers fault, it's a huge language and they would just need more people or narrow the scope of the language and the stdlib), that it's not usable for any serious work.

And so I stumbled upon Crystal... got high hopes, we'll see.

Sorry for the OT.

@Perelandric

@theduke lol, yep sounds like we're in the same boat!

I especially think proselytizing the Go community will pay off for Crystal big time. Go still has to preach its lack of features as some sort of zen, minimalist philosophy. Works to some degree, but in the end, we just need to get some work done!

@jreinert
Contributor

@asterite very much in favor of this. Would this make writing and using shared, linked crystal libraries possible? If not, is there any other way to make that possible? Maybe by having something similar to lib but the other way around, to expose methods of your library to the global namespace and reduce crystal types to primitive ones.

@asterite
Member

@jreinert Maybe yes, maybe no; the changes don't have that as a goal, but they make doing it easier, just not trivial.

@dsounded

I just discovered and started to contribute to the community recently, so I understand my opinion doesn't weigh much, but I beg you to gather data from the community at large about this topic, as it might be a deal breaker for a significant number of us (or not)

+ 1_000_000_000

@asterite
Member

@elthariel @RX14 @dsounded

As we'll soon merge #2443 and this will finally land, I want to write a few things about why I think this is not such a deal-breaker as many of you think.

First, right now you can't write [] nor {}, the compiler will complain. You have to write [] of Int32, {} of String => Int32, or Set(Int32).new. This is not how you would write it in Ruby, you have to be explicit about types in these cases, yet I don't see anyone complaining about this. Maybe it's because this behaviour was present from the beginning (well, in the first three months of development it wasn't like that, but the language wasn't publicly known), so the behaviour is not annoying or surprising. There's also the fact that you need to write ->(x : Int32) { x + 1 } (you can't just write ->(x) { x + 1 }). And if you capture a block you also need to specify types: def foo(&block : Int32 -> Int32).
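Collected as code, the cases listed in the previous paragraph (method and variable names are mine, just for illustration):

```crystal
require "set"

a = [] of Int32               # empty array literals need an element type
h = {} of String => Int32     # so do empty hash literals
s = Set(Int32).new

add_one = ->(x : Int32) { x + 1 }  # proc literal arguments must be typed

def twice(&block : Int32 -> Int32) # captured blocks need a type too
  block.call(block.call(1))
end
```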

Now we are making a change where you need to specify the types of instance variables, and in most cases, as shown in the diffs I mention here, the compiler will be able to guess a correct type from all the things assigned to them. This will almost always boil down to adding type restrictions in initialize methods.

If you do some stats you will realize that the number of array and hash literals, proc literals and block arguments where you need an explicit type is much bigger than the number of initialize methods you will have in a program, so this change is really a minor one. If the language worked like this from the beginning I don't think there would have been many complaints.

There's a different problem, though: "Yeah, right, now we need global/class/instance vars to have an explicit or easily guessable type, why shouldn't I think that in the future I will need to add type annotations in every method argument and return type?". For starters, if the language changes like that, I will stop using it because then yes, the language would be much more painful to use. So, required types in method arguments and return types will never happen. I'll explain why I know this is true:

Right now the compiler has to keep in memory all the code and the analysis it's making, and can't discard intermediate results, because some assignment to a global/class/instance variable might change its type and that can affect a totally unrelated method that uses those vars. In short, the compiler has a tangle of all the code (through bindings) and can't release it. With this change this won't happen anymore: once you analyze a method, its return type can't change, because nothing external to the method (the arguments it uses, the instance vars, other methods, etc.) can change type anymore. So, we can analyze one method and then free the data structures (bindings) needed for that analysis. I believe this will dramatically improve, at least, memory usage (right now, to compile the compiler you need at least 1GB of RAM, which is way too much... imagine a bigger program), and will allow us to do incremental compilation in the future.

But note that the compiler still doesn't need to know the types of method arguments and return types upfront, because once they are inferred they can't change. This isn't true for global/class/instance vars unless their types are determined upfront.

Does this change mean the language becomes some other language, maybe Java, Go, C#, Nim, D, Rust, OCaml, Swift or Pony? I don't think so. There are many interesting things in Crystal:

  • No need to be super explicit about class/global/instance vars in some cases. If you write class Counter; def initialize; @count = 0; end; end, it's understood that @count is Int32; you don't have to write @count : Int32.
  • No need to write the type of method arguments and return types: this makes it easy to extract pieces of a method to another method, and it's also super useful for private methods where types aren't even needed for API-level users.
  • Open classes: you can add and redefine methods, so things like webmock.cr and timecop.cr are possible.
  • Multiple dispatch (this is different than just method overloading)
  • Inlined blocks and captured blocks, allowing us to add constructs to the language that feel like regular language constructs (like 5.times { |i| puts i }).
  • Type flow analysis, meaning that if you write if var, if var.nil?, if !var or if var.is_a?(T), the variable's type is narrowed inside the then and else branches (I don't know if another language does this; I know that in Swift, for example, you have to use a second variable). You can also do early returns like return unless x (in Swift you have to use guards). Coupled with !, && and ||, this makes the language feel like it understands what you mean without your having to add extra syntax.
  • Array, Hash, Range and Regex literals (and I find the $~ variable pretty useful for text processing)
  • Structs and tuples (for example Java doesn't have them)
  • Everything looks like an object (you can reopen Int32 and add methods to it)
  • Macros (I think they are simpler and easier to use and understand than in other languages, although probably less powerful)
  • Type safety, and null doesn't implicitly belong to all reference types (as is the case in Java, C# and Go).
  • Batteries-included standard library, including non-blocking IO, CSV/JSON/YAML/XML, HTTP::Client and Server, OAuth, a simple spec library, etc. This is not directly related to the language's syntax and semantics, but I do think it's important, as it makes all programs interact nicely with each other using common types and idioms.
  • Compile to native code, inline assembly, access to C libraries, raw pointers, etc., so it's both a low-level and high-level language.
  • Ruby-like syntax. Yes, this is worth mentioning. We love Ruby and how its code looks, and we think this is something really nice to have in a language, especially when many constructs translate directly (like return unless x, if x.is_a?(...), etc.).
  • A bootstrapped compiler, so you can contribute to it if you know how to program in Crystal (no need to code in C/C++)
  • (and probably other things I forget right now)
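To illustrate the type flow analysis point above: an is_a? check narrows a variable's type in both branches without any extra syntax. (This snippet happens to be valid Ruby as well, which is how it can be run here; in Crystal the compiler additionally narrows the static type of value inside each branch.)

```crystal
def shout(value)
  if value.is_a?(String)
    # in this branch the compiler knows value is a String,
    # so String methods like upcase are allowed
    value.upcase
  else
    # here the String case has been excluded
    "not a string"
  end
end

shout("hi") # => "HI"
shout(42)   # => "not a string"
```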

I believe the language will still make sense after this change 😊

@kirbyfan64
Contributor

πŸ‘

@chocolateboy
Contributor

+1 for this change.

I don't know if another language does this.

Kotlin also does it :-)

@asterite
Member

@chocolateboy Thanks for mentioning it. Kotlin is a really nice language that borrows many good ideas from Ruby too. I think it now has something like blocks where, if you return from them, you actually return from the enclosing method, like in Ruby.

@dsounded

@asterite open classes are good and bad at the same time, but that's not the topic here. I think static type checking is OK for interfaces, but when you write a class that you use only internally and that has no connection to the public interface, static type checking is excessive. I do appreciate your work, fellas; Crystal is the only thing I have really liked after Ruby. initialize is just another method, the "keyword" method perhaps, but still a method. It will be really strange for me; I'd prefer one of two ways: either the way it was, or static type checking throughout the language (we would need to rewrite a lot), but at least that would be uniform.

Anyway, it's your decision, and as I've seen, you won't change it; you all do an extremely good job, it's just a situation where we have different points of view. Basically, IMO you should have run a poll for this.

@elthariel
Contributor

Hi @asterite and others,

I've been giving this frequent thought since this conversation started.

Because of this change and the uncertainty over the future of the language, I finally decided to write my project in C++. After a few months using this beast again, which I hadn't practiced much for a few years, as well as toying with other new languages, I realized there's still a lot of value in Crystal even with this change (C++ template instantiation traces have this effect on people, I guess).

In the end, this change solves a real problem and consolidates type declarations in one place, which will improve code readability as well as enable modular compilation, among other things.

However, despite seeing the benefits, I'm still not a big fan of these changes and disagree with parts of the definition of the problem. Let it be stated that working in Facebook's codebase for more than a year gave me a different perspective on software development, and that this point of view might not be ideal for smaller companies or projects with limited resources.

  • Compilation uses 1GB of RAM: I don't care. My computer is here to make my
    life easier; I'll add more RAM if I need to, if the value I get out of it
    is worth it. Last time I checked, Chrome could only be compiled on 64-bit
    machines, as it needs more than 4GB of RAM. IIRC, they build/link
    everything in one invocation to allow for further optimizations.
  • I wonder whether there wasn't another approach to this problem: a daemon
    that performs the type inference in the background while you write code,
    and/or caches the results of the type inference. More and more tools at
    Facebook/Google (and, I assume, other giants) are built with this design.
    I think it will grow beyond these companies.
  • Finally (and I don't remember if I already talked about this), I would
    have preferred to first think about what a module is for Crystal, and then
    enforce specifying types at module boundaries:
    ** You want a global: if you want it to be accessible outside the module,
    mark it as exported or something, then define its type.
    ** You have a class that is exported out of the module: define its public
    prototype clearly.
    ** You're doing magic that is internal to your module: Crystal will take
    the time to infer the types.

That being said, I also appreciate that Crystal is not an R&D-only project and that you aim at production usage. This obviously gives you less time for intellectual masturbation and makes you focus on getting shit done, which is another argument in favor of this change.

As a conclusion, I'm probably going to keep thinking about other approaches
to this but I'll definitely give it another spin once this change lands.


@kirbyfan64
Contributor

@elthariel For me it's also about faster compile times, which are always important!

@elthariel
Contributor

Good caching would also help with that.


@kirbyfan64
Contributor

Which was one of @asterite's initial points; it's easier to maintain a cache with the types of instance variables known ahead of time.

@asterite
Member

We actually thought about a daemon and about caching results from the previous compilation, but with the current global type inference it's really hard to do, and we weren't sure it would be efficient in the end. With the direction we are currently taking, the code becomes much easier to understand, and the compiler also becomes simpler to implement, which might draw more people into developing it, fixing bugs, optimizing it or implementing new features. Getting into a daemon compiler, computing AST deltas and caching things is definitely harder to do and understand (well, we might still do caching and deltas, but it'll be much easier this way).

I also feel (and see, based on the diffs) that this isn't the big change I was fearing, and it wouldn't have bothered me at all to specify all those types in the initialize methods if it had been like that from the beginning. Remember, it's usually one, or at most a few, initialize methods that will need type annotations, compared to the many other methods a class has. For example, Set has only one initialize in 337 lines of code, and in that case we didn't have to specify a type explicitly. Another beast is the compiler's Lexer, which actually needs just two type annotations (the others there are mostly informative), in a file that has 2473 lines of code.

And some error messages will improve and become easier to understand, because a type won't be instantiable with incorrect arguments. The compiler will also (probably) be faster and consume less memory.

So yes, we had many options to choose from, but we think this is the best of them.

@asterite
Member

I'm closing this, as all the logic related to this task is already present in master. The compiler's code will need a cleanup to get rid of old code that is no longer necessary (or at least not necessary in the way it currently is), and of course performance improvements could be applied, but that's a separate issue.

@asterite asterite closed this Apr 13, 2016
@mrkaspa
mrkaspa commented Apr 21, 2016 edited

I would rather specify types in method signatures: compilation would be faster, and the types would work as documentation. "Duck typing" could be handled with something similar to Go interfaces, or by allowing abstract methods to be defined in modules, similar to Scala traits.

@pannous
pannous commented Oct 15, 2016

Shall we open a new issue to track the progress of the very important incremental build feature?

@asterite
Member

@pannous If you want to, yes. However, I don't think it will happen anytime soon, as it's a really huge task and there are more important things right now (parallelism, the std).
