The next step #1824
I think that compilation times are still great and in no way a pain point of the current language. A far greater benefit and boost could be achieved by focusing on concurrency and parallelism idioms as well as the threaded scheduler. The longer that's postponed, the more standard library code will need to be checked against thread safety. I think the time you can allocate would be extremely well spent on that issue instead and would benefit the current community tremendously more. I'm not sure I can find the words to emphasize this strongly enough, really. I'm truly convinced this is the wrong priority here, sorry. But that's just my two cents; I can only give my opinion, not dictate anything, of course. |
For my part... 👍 and it has little to do with compile times. I'm a fan of having more explicit type requirements in general. Because I know going in that Crystal isn't exactly the way I'd want it in some areas, it would be silly of me to complain about it. I like it as a language either way so I'm willing to go with the flow but this would be very welcome from my perspective. |
I forgot to mention that this change isn't only to improve compile times. Consider this:

```crystal
class Foo
  def foo
    1
  end
end

class Bar < Foo
  def initialize(@foo)
  end

  def foo
    @foo
  end
end

a = [] of Foo
a << Foo.new
puts typeof(a[0].foo) # => ???
```

The compiler doesn't know the type of `a[0].foo`: a `Bar` could also be put in the array, and `Bar#foo` returns `@foo`, whose type depends on whatever is passed to `Bar.new` anywhere in the program. So it's not just for a faster compiler: it's for a better, more complete and non-broken language.

@jhass As I said, it's not that we'll only be working on this new compiler: the standard library will continue to grow and we'll fix bugs as they appear. |
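To make the contrast concrete, here is a sketch (not from the original discussion) of the same example under the proposed rules, with a hypothetical `Int32` restriction on the `initialize` argument:

```crystal
class Foo
  def foo
    1
  end
end

class Bar < Foo
  # The restriction fixes @foo's type, so Bar#foo has a known
  # return type no matter what the rest of the program does.
  def initialize(@foo : Int32)
  end

  def foo
    @foo
  end
end

a = [] of Foo
a << Foo.new
a << Bar.new(2)
puts typeof(a[0].foo) # now well-defined: Int32
```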
And as I said, I'm convinced it's the wrong prioritization at this point. I'd rather give this issue more time and act quickly on the concurrency issue than act on this quickly with "easy" solutions through more explicitness. |
I don't see concurrency as a problem right now. It's also a priority for us to have proper concurrency support for the language, but the language is still usable. Also, this can be done in parallel, as I don't think adding concurrency support will affect most of the standard library code or even the compiler itself (probably not affected at all). On the other hand, our concern right now is to make sure that the language can scale with projects of any size. We believe it is our responsibility to make it happen so we don't hit a big wall in the future and leave this (big) growing community disappointed. To be honest, at some point we even thought we could leave concurrency support for a future version and make the language robust for 1.0. Probably not gonna happen this way after all, but that's how transversal a problem I think it is. |
@asterite in your example, I would rather see a compiler error telling me that I'm using it incorrectly.

Perhaps a different approach can be taken? Incremental compilation/analysis? If I'm not mistaken, Rust has this in their RFCs (not yet prioritized, even though they are in the post-1.0 era) |
@luislavena You will get a compile-time error saying something like Rust requires type annotations in types and methods, and that's why they'll be able to eventually do incremental compilation. We are aiming for the same goal and the minimum requirement is to have types (and their instance variables) well defined before we start typing the whole program. |
I'm not a fan of more explicitness and an increased number of required type annotations. Is it not possible to let the user decide if they want to make the trade-off with regards to compile times? I would not like to see this language become a modified clone of some other static/dynamic type language with different paint (i.e. syntax). As for concurrency, I don't have an opinion about that. |
I'm not a type person, so having to specify more types is daunting... at first. Yet, thinking about it, it may be a good thing to restrict the struct/class variable types as proposed here. I just hope we won't see more and more changes in the language to have to specify types more and more and more 😨 The current compiler is acceptably fast on a decent computer for small applications (eg: Shards), but I acknowledge it slows down quickly as the number of types grows toward a mid-size application (or with a heavyweight framework). If it could be lightning fast, and we could transparently recompile the app live on each file save, and this only requires specifying the type of hazy struct/class variables... then I guess I can live with it. I'd love to see parallelism being worked on in parallel (pun intended) as @jhass suggested. But let's have another, dedicated, issue to discuss this. |
I don't think this would help that much. IMO, Crystal still builds itself faster than a C++ application half its size, and is even faster than Nim, so there's not really much to complain about in that field. |
Would this also make it possible to create a REPL? |
@samis There are many issues if we make this optional:
@ysbaddaden We are pretty sure this is the last change regarding types. After years of developing the compiler and writing code in Crystal the major pain point is the lack of a fixed type for instance vars: methods sometimes can't be typed (like the example above) and you need some hacks here and there.

@kirbyfan64 Last time I compiled Nim it took 2.5s (or 5s?) to compile itself. That's including the codegen. We are at 10s on the same machine, but I think Nim's compiler is much more complex (for example it has a VM in it). We also don't want to use C++ as an excuse, such as in "Oh, yes, it takes 20s to compile, but I'm sure it takes a couple of minutes in C++" :-)

@benoist It will definitely make creating a REPL much easier (but still not trivial). It will also improve times for tools for IDEs, so we could have autocomplete and jump-to-definition tools that are fast.

I forgot to say that we'll develop this in parallel and it will be an experiment: if everything goes well and we see an improvement as a whole, we'll take the change. Otherwise we'll remain with the current language. |
@asterite Huh? Nim caches C source files to avoid re-generating them. Try deleting the cache and timing it again.

I don't mind this as an experiment to see how it turns out, but I personally don't really like the idea. There have to be better ways to improve speed... :O |
I've been juggling ideas around in my head about incremental compilation in Crystal, and all my ideas have ended up with a compiler daemon that keeps the inferred AST in memory. When the compile command is used it simply signals the daemon for the project dir; it checks which files have changed, lexes, parses, transforms their nodes, updates all nodes that require changes, and re-infers only what's needed. But that's just a dreamy thought - haven't reasoned about the devilish details yet.

Typing data members sounds like a really good minimum requirement to me. Being sure about data structures' types is foundational imo - and not much work. The types visually go well there since it's already a more formal construct. And being relieved of typing methods is a must-have, which is nice to hear is part of the planned re-write. The idea of compiling types in two steps sounds good. The Nim compiler has two such phases - though only operating on one "types-collection" at a time, which is a bit limiting imo.

I think prioritizing this before concurrency is good, since it's so fundamental to the progress of the language. Concurrency is an important aspect, but having a correct foundation is imperative. |
@ozra we had a similar idea. We even named it internally the "reactive compiler" because it would react to changes and rebuild the minimal difference. The problem with the current design is that the type inference is tightly coupled to the order of parsing. Another goal of the new design is to make the semantics depend as little as possible on the order of declarations. Then the information used for caching could be stored on disk and reloaded each time, or just live in memory within the compiler daemon for the project. |
My 2 coins in this discussion: what about formatting (or other tooling)? WDYT?

Best Regards, |
I will try to explain why it's impossible to cache a previous compilation's result without fixing the type of instance variables. Take this code:

```crystal
class Foo
  def initialize
    @x = 1
  end

  def x=(@x)
  end
end

def method(foo)
  foo.x
end

foo = Foo.new
method(foo)
```

The current compiler (after analyzing the whole program) will type Foo's `@x` as `Int32`, and create an instance of `method` like this:

```crystal
def method(foo :: Foo) :: Int32
  foo.x
end
```

I use `::` here to denote the inferred types (it's not real syntax). Let's try to cache this method somehow. We can see it depends on `Foo`. In turn, `Foo` depends on the type of `@x`.

Now we change the program to this:

```crystal
# This is the same as before except some added lines at the end
class Foo
  def initialize
    @x = 1
  end

  def x=(@x)
  end
end

def method(foo)
  foo.x
end

foo = Foo.new
method(foo)

# Now comes the added code
def problem(foo)
  foo.x = 'a'
end

problem(foo)
```

Can we still use the previous compilation's result? Let's see: did `Foo` change? None of its methods did, but `problem` assigns a `Char` to `@x`, so `@x` is now `Int32 | Char` and the cached instance of `method`, typed as returning `Int32`, is no longer valid. By knowing these types beforehand the compiler can analyze method instantiations without having to do whole program analysis. But if you propose a way to do it without these type annotations, we'll definitely do it the way we are doing it right now. But... it seems not possible. I don't consider the above a mathematical proof, but I tried to be more or less formal :-) |
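For contrast, a sketch (assuming the ivar-annotation style this proposal describes, with a getter added so the example is self-contained) of how a declared `@x : Int32` turns the later change into a local error instead of silently invalidating the cached `method`:

```crystal
class Foo
  @x : Int32 # type fixed up front

  def initialize
    @x = 1
  end

  def x=(@x)
  end

  def x
    @x
  end
end

def method(foo)
  foo.x
end

foo = Foo.new
method(foo) # safe to cache: foo.x is always Int32

def problem(foo)
  foo.x = 'a' # compile-time error: Char doesn't match the declared Int32
end

problem(foo)
```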
Has anybody looked at how type inference works in GHC (the Glasgow Haskell Compiler)?

Best Regards, |
I also think that current compile times are really OK (excluding `--release`). For me the main point of Crystal is:

If we have more type checks (we already have them for generics, and that's OK somehow) that'd be really off-putting for me and for type-disliking developers. |
@sdogruyol - with `--release`, the bulk of the time is spent in the LLVM optimizer, and we can't make that magically go away, but I think we're both ok with that :)

@waj - that's really interesting to hear! The vague notion in my head was that every AST node should have a flag for which passes it has performed / are needed, and for the entire transformation and typing to be gradual. This would require a bunch of passes over the tree (the idea was to support "two-way inference" also) until all nodes are flagged as final. And as soon as code is changed - their references to sources must be compared to see which are dropped, what new nodes come in, which taint others (the dependency graph for type inference would likely have to be extended beyond just that, such as dependency on source-file [line diffing might be taking it a bit far] [still - this is available already via name/line/col]), and re-flag for inference, etc. And once again, these are just the ideas bubbling in the back of the head - and as always, when starting to examine the details - there are always devils you didn't want to think about ;-)

Even though I'm all for typing data-structures (aka members of classes), and there are strong indications from studies showing that typed code is both more stable and faster to develop in the long run - I know many want the vibe of "never having to type", and I guess it's also a great selling point - even though it's not the greatest pattern in a big application. But catering to one-off solutions and script-style code all the way to full-size apps is neat. And in the prototypal stage types can really "get in the way", even though they're a must-have in the more complex scenarios. So I totally understand why many urge for a solution that keeps the possibility of "complete inference" (that sounded a bit strange). So, usage-wise, it could be seen from two perspectives:
But then we come to the technical aspect, which @asterite has elaborated on:
The reactive compiler concept (I use the concept name of @waj and @asterite from here on) is really cool. I implemented something akin to it for pre-JS some years ago - it's not fully comparable of course. But the idea is:
All this being said - it is quite an enormous endeavour. As it is now, lexing and parsing take basically no time, type inference takes a great chunk, and then, surprisingly, figuring out which IR has to be compiled to objects takes quite some time. And then a great chunk is spent by LLVM of course. I spent an entire evening trying to optimize the simple last phase in the compiler, with all kinds of models of CRC'ing/hashing instead of generating llvm-binary encoded files for comparison, etc., but merely got the time down insignificantly, finally deciding not to waste more time on it and instead think about the dream scenario of the reactive compiler.

A further pro of the reactive model is that not even IR would have to be re-generated until the branch is "dirty" - iff one manages to correlate 'source-code -> dependant nodes -> (other dependant nodes) -> llvm-IR'. Well, it's a big if...

I've also had the run-macros in the back of my head - in the above-mentioned, hypothetical solution, they could be wrapped in formalia code making them (distinct) daemons too. This way macros, as long as they're unchanged, could run as daemons receiving and returning mutated code.

Well - this was a mouthful. [ed: fixed typo] |
@waterlink Haskell forces you to be really explicit with the types of data structures, that's probably one of the reasons they got all those features (plus 30 years of research and development, of course :)). So in a way what the guys here are proposing is closer to Haskell than the current language specs. |
@waterlink What @mverzilli says is true. Here's a page that shows that:

```haskell
data Shape = Circle Float Float Float | Rectangle Float Float Float Float
```

Then you don't have to specify types in functions, but that's also how our language will work :-)

Putting the compile times aside, there's another issue that will improve: memory usage. Right now to compile the compiler (40k+ lines of code) it takes 930MB of memory (and it's not LLVM's fault: just the type inference phase reaches that memory). The problem is that because everything needs to be interconnected at all times (because types might change) we can't free that big mesh. So the bigger the program, the more memory it will consume. More memory also means more time spent in the GC traversing all that non-freeable structure. Imagine a bigger program (at work we have a Rails app with 200k+ lines of code, and that's excluding the gems it uses). If we just multiply the current times we get 50s to compile the app and 5GB of memory. Doesn't look very nice. With the new approach once we type a method we can discard all that interconnection because types can no longer change.

Back to the reactive compiler, I think with the new approach we could make one. With the current approach a background process consuming gigs of ram doesn't sound very nice or useful. |
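A rough Crystal analogue of that Haskell example under the proposed rules, using the `record` macro from later Crystal versions (the shapes and the `area` method are illustrative, not from the thread): the data definitions carry the types, and the function over them stays annotation-free.

```crystal
# Data definitions carry the types, as in the Haskell `data` declaration.
record Circle, x : Float64, y : Float64, radius : Float64
record Rectangle, x1 : Float64, y1 : Float64, x2 : Float64, y2 : Float64

# The method itself needs no annotations; types flow from the data.
def area(shape)
  case shape
  when Circle
    Math::PI * shape.radius ** 2
  when Rectangle
    ((shape.x2 - shape.x1) * (shape.y2 - shape.y1)).abs
  end
end

puts area(Circle.new(0.0, 0.0, 1.0)) # prints π
```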
One idea for inference cases like #1809 is of course that if a type is never instantiated, it could be dropped from all type unions, since it cannot affect the program, and hence its "could-be" extension of method types is a non-matter. That's just one part of it all though. |
@ozra The problem is: how do you find out that a type is never instantiated before starting to type the whole program? |
WHAT IF... the type of a member variable could not change from the one first inferred? I think there are other ways to achieve speed. Again referencing Mypy: it type-checks itself rather quickly, and it's written in Python, which is definitely NOT the fastest language on earth! |
I like to keep the type annotations to a minimum. For a while I thought the compiler service might overcome the growing compilation time, but @asterite already exposed some concerns about why it might not be feasible, at least how things are right now. A proposal was made for somehow using the existing compiler to make an initial guess for type inference, keep that guess and use it as a hint for a faster compiler phase. But discarding outdated assumptions due to changes in the code will require pretty much the same memory and even a bit more computation to get it right. So instead of thinking that all instance variables will need to be annotated, which sounds annoying and not so tempting for sure, I now see this proposal as:
I think that more rules will probably appear and fewer annotations will be required. But it's a nice starting point to shift to modular compilation IMO. The compiler nowadays needs to infer types for all ivars, despite the fact that we don't write them explicitly. We will be losing some cases initially, but not all, not a lot. Things like:

```crystal
class Person
  property name
  property age
end
```

won't work as they are working right now. But mainly because the class itself is no more than a bag of data, with no behavior. It's less of a prototyping experience without that kind of thing, but I would prefer to lose that than to see Crystal as a toy language. |
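Under the new rules the same class would carry its property types explicitly; a sketch using the annotation style the proposal points toward (the `String`/`Int32` choices are illustrative):

```crystal
class Person
  # Each property now declares the type of its backing ivar.
  property name : String
  property age : Int32

  def initialize(@name : String, @age : Int32)
  end
end

person = Person.new("Alice", 30)
puts person.name # => Alice
puts person.age  # => 30
```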
@asterite - that idea was based on the assumption of a rewrite, just with different goals, where inference can be handled gradually, solved as needed information becomes available from other nodes, not in a "monolithic" type inference phase. My explanation is bad, there's probably a good term for it, but I don't know it B-) In any event - I will be perfectly happy with the proposed model. |
An update on this: since the new compiler will take a while to develop, especially because the main algorithm changes and we plan to simplify a lot of things, for now we'll try to introduce these changes in the current compiler (for example, we just introduced a first pass to declare types, which already fixes several bugs). These might not produce a big performance improvement, but once we have all code working under these new rules we won't be breaking any code when we introduce the new optimized compiler. In short, the sooner we finish defining the language and stop breaking backwards compatibility, the better. Also, we'll continue working on the concurrency model in parallel (no pun intended ^_^), with the goal of running fibers in multiple threads, and maybe allowing user code to create threads at will. |
@asterite what's the current condition of concurrent distributed application development with Crystal? I have found myself quite enamored with Julia recently, but am missing Crystal. |
I recently discovered Crystal, and just as a general remark: BIG support for this. The examples in the documentation where a class instance variable ends up having potentially multiple types (tuple(A, B)) immediately turned me off. Severely. I could just use Ruby / Python for that; almost all apps can perfectly live with the performance, and just substitute C extensions when necessary. This will make Crystal considerably more appealing to me. Also: +1 for the 'interface' concept. |
I just want to reiterate my support for this. When I first learned about Crystal about a year ago, I read the docs and liked all of it until it came to the dynamic class members, at which point I dismissed it. Only came back because I was stuck on Nim bugs I couldn't work around. Crystal is great and I think will have a much broader appeal if it's tightened up just a little bit. Until the new compiler is complete, I'm using Go, which can be aggravating. I've resorted to building code generators to make the language a little more tolerable. Looking forward to writing most of my code in Crystal and Pony! 😃 |
Haha, @Perelandric: I'm in a very similar spot. First stumbled upon Nim too, which is a conceptually AMAZING language, but it's so full of bugs in both the language proper and the stdlib, and so badly maintained (not the maintainers' fault; it's a huge language and they would just need more people, or to narrow the scope of the language and the stdlib), that it's not usable for any serious work. And so I stumbled upon Crystal... got high hopes, we'll see. Sorry for the OT. |
@theduke lol, yep sounds like we're in the same boat! I especially think proselytizing the Go community will pay off for Crystal big time. Go still has to preach its lack of features as some sort of zen, minimalist philosophy. Works to some degree, but in the end, we just need to get some work done! |
@asterite very much in favor of this. Would this make writing and using shared, linked crystal libraries possible? If not, is there any other way to make that possible? Maybe by having something similar to |
@jreinert Maybe yes, maybe no, the changes don't have that as a goal, but doing it becomes easier, just not trivial. |
As we'll soon merge #2443 and this will finally land, I want to write a few things about why I think this is not such a deal-breaker as many of you think.

We are making a change where you need to specify the types of instance variables, and in most cases, as shown in the diffs I mention here, the compiler will be able to guess a correct type from all the things assigned to them. This almost always boils down to adding type restrictions in `initialize`. If you do some stats you will realize that the number of array and hash literals, proc literals and block arguments where you need an explicit type is much bigger than the number of instance variables that will need one.

There's a different problem, though: "Yeah, right, now we need global/class/instance vars to have an explicit or easily guessable type, why shouldn't I think that in the future I will need to add type annotations in every method argument and return type?". For starters, if the language changes like that, I will stop using it, because then yes, the language would be much more painful to use. So, required types in method arguments and return types will never happen. I'll explain why I know this is true.

Right now the compiler has to have in memory all the code and the analysis it's making, and can't discard intermediate results, because some assignment to a global/class/instance variable might change its type and that can affect a totally unrelated method that uses those vars. In short, the compiler has a tangle of all the code (through bindings) and can't release it. With this change this won't happen anymore: once you analyze a method, its return type can't change, because nothing external to the method (the arguments it uses, the instance vars, other methods, etc.) can change type anymore. So, we can analyze one method and then free the data structures (bindings) needed for that analysis.
I believe this will dramatically improve, at least, memory usage (right now to compile the compiler you need at least 1GB of ram, which is way too much... imagine a bigger program), and will allow us to do incremental compilation in the future. But note that the compiler doesn't need to know the types of method arguments and return types upfront, because once they are inferred they can't change. This isn't true if global/class/instance var types aren't determined upfront.

Does this change mean the language becomes some other language, maybe Java, Go, C#, Nim, D, Rust, OCaml, Swift or Pony? I don't think so. There are many interesting things in Crystal:
I believe the language will still make sense after this change 😊 |
👍 |
+1 for this change.
Kotlin also does it :-) |
@chocolateboy Thanks for mentioning it. Kotlin is a really nice language that borrows many nice ideas from Ruby too. I think they now have something like blocks where if you |
@asterite Anyway, it's your decision, and as I've seen you won't change it. All of you do an extremely good job; it's just a situation where we've got different points of view. Basically, IMO you should have done a poll for this. |
Hi @asterite and others, I've been giving this frequent thought since this conversation started, because of this change and the uncertainty over the future of the language. In the end, this change solves a real problem and consolidates the types. However, despite seeing the benefits, I'm still not a big fan of those annotations.

That being said, I also appreciate that Crystal is not an R&D-only project. As a conclusion, I'm probably going to keep thinking about other approaches. |
@elthariel I personally feel it's also faster compile times, which are always important! |
Good caching also helps for this. |
Which was one of @asterite's initial points; it's easier to maintain a cache with the types of instance variables known ahead of time. |
We actually thought about a daemon and caching stuff from the previous compilation, but with the current global type inference it's really hard to do and we weren't sure that it would be efficient in the end. The way we are currently going, code becomes much easier to understand, but the compiler also becomes simpler to implement, which might get more people into developing it and fixing bugs, optimizing it or implementing new features. Getting into a daemon compiler, computing AST deltas and caching stuff is definitely harder to do and understand (well, we might do caching and deltas, but it'll be much easier this way).

I also feel (and see, based on the diffs) that this isn't the big change I was fearing, and it wouldn't have bothered me at all to specify all those types. And some error messages will improve and be easier to understand, because a type won't be able to be instantiated with incorrect arguments. And the compiler will (probably) be faster and consume less memory.

So yes, we had many options to choose from, but we think that this is the best of them. |
I'm closing this, as all the logic related to this task is already present in master. The compiler's code will need a cleanup to get rid of old code that is no longer necessary (or at least not necessary in the way it currently is), and of course performance improvements could be applied, but that's a separate issue. |
I would rather specify types in method signatures; the compilation will be faster and the types work as documentation. And handle the "duck typing" using something similar to Go interfaces, or by allowing abstract methods to be defined in modules, something similar to Scala traits. |
Shall we open a new issue to track the progress of the very important incremental build feature? |
@pannous If you want yes. However, I don't think it will happen anytime soon as it's a really huge task and there are more important things right now (parallelism, std) |
An always present worry that we have with the language is that it won't be useful for large projects. Since the global type inference algorithm always has to start from scratch, it's good if we find a way to do incremental or partial compilation: some way to reuse a previous compilation's result for the next one. That means that the first time you checkout a project it might take a few seconds to compile, but the next compiles should be fast, given small changes are made to the codebase.
We sat down and thought how it can be done with the current language, and we think (or at least we are strongly convinced) that there's no efficient way to do this. The language might need a change.
We thought of the minimum delta that could make this work. The conclusion is not a happy one, but it's also not the worst one could imagine:
After some more brainstorming we think that the above rule can be relaxed in some cases: the compiler will try, for example, to infer instance variable types from the expressions assigned to them in the `initialize` methods, if these expressions are simple (literals and `new` calls). An example of where this simplification can be applied is the `CSV::Lexer` class from the standard library.

Another rule is defining a type restriction of an `initialize` argument. For example:
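A sketch of both relaxed rules in one place (the class names and types here are hypothetical, not taken from the `CSV::Lexer` source):

```crystal
class Lexer
  def initialize
    @line = 1                # simple literal: @line is inferred as Int32
    @buffer = IO::Memory.new # `new` call: @buffer is inferred as IO::Memory
  end
end

class Wrapper
  # A type restriction on the initialize argument fixes @value's type.
  def initialize(@value : Int32)
  end
end
```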
But note that if in the above cases an instance variable is assigned another type in other methods, it will immediately be an error: if you want a union, define the instance variable type as such.
In other cases you'll inevitably have to write down the types:
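For instance, when an instance variable needs a union or is only assigned outside `initialize`, a hedged sketch of the explicit form (the `Connection` class is illustrative):

```crystal
require "socket"

class Connection
  # Neither relaxed rule applies here: @socket starts as nil and is
  # later assigned a TCPSocket, so the union must be spelled out.
  @socket : TCPSocket | Nil

  def connect(host : String, port : Int32)
    @socket = TCPSocket.new(host, port)
  end

  def disconnect
    @socket.try &.close
    @socket = nil
  end
end
```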
The current compiler works by taking into account all assignments to instance variables in the whole program. This has many (bad) consequences:
The second point is the most important. If we want Crystal to be used for real, large projects, it's paramount that the compiler can handle such projects and that you don't have to wait 30 or 60 seconds between compiles. We prefer making this change now rather than later getting stuck with a language that is only useful for "mostly toy" projects.
Because this change has a huge impact on the overall algorithm of the compiler, we decided to take this opportunity to also rewrite the compiler "from scratch". This means: we'll rewrite it taking into account the current code and specs, but we won't reuse most of the code we have (well, the lexer and parser will be reused, the formatter, as well as some other bits). The current code, even though it's kind of efficient and more or less organized, has some flaws that are hard to fix without a rewrite (for example, we'd like to make an initial pass to gather all types so that we know which types have subclasses and so are "virtual"). And we'll do it taking into account the current bugs (which are about 45 and are more or less related to the same compiler/language flaws).
The good news is that we'll make sure that this new compiler is perfectly commented and documented: you'll be able to jump right into its source code and all phases and algorithms will be explained (promise!). Right now this is far from this case :-)
Let's summarise the cons and pros of this change:

Pros: calling `to_s` on each of them, for example. Because the compiler will know all types before the "main" code, virtual method lookups (which involve searching a type's subclasses recursively) will be faster and can be cached, because the type hierarchy will be fixed.

Cons:
Remember that types won't be mandatory in method arguments or return types, so duck typing still holds. A method with no type annotations works on any type that defines a `#+(other)` method, so we believe Crystal is still unique and doesn't suddenly become an existing language.

Because the rewrite will take some time (and also because this next month we'll be busy with some specific work/life matters) you'll have to be patient. In the meantime the standard library can be grown and existing issues will be fixed unless they can only be fixed in this new compiler rewrite (some examples of issues we'll fix in the new compiler are #456, #718, #729, #846, #867, #916, #941, #962, #1346). After the rewrite you can use `crystal tool hierarchy` to port your code to the new version, or we might provide a migration tool.
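A minimal sketch of such an annotation-free method (the `add` name is illustrative): it compiles for any pair of arguments whose types define `#+`.

```crystal
def add(x, y)
  x + y
end

puts add(1, 2)         # Int32 addition: 3
puts add(1.5, 2.5)     # Float64 addition: 4.0
puts add("foo", "bar") # String concatenation: foobar
```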