-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Created a test for sorting interleaved inputs. #3817
Conversation
The general heuristic is to greedily add lines, with constraint-checking having highest priority, then derived values, and parsing only when the first two cannot be done. I also chose to sort constraint-checks and derived values based on number of dependencies, so that `a < 10` will always take place before `a < b`, if `a` and `b` are both defined.
It will be a few days at least before I get back to working on The first file I added was my idea of how it should go, based on getExecOrder. It is perhaps a bit overcomplicated, but I personally really like the results. It uses enough heuristics that the code it generates is fairly close to what a human would write. @balacij asked if I could try simplifying the algorithm by putting all of the expressions in one data structure. I wasn't sure how this would work, but I created the second file as a test. The second algorithm is simpler, but because it is straight topological sort based on dependencies, it is pretty haphazard with its ordering. It only cares about producing code that will run, not whether that code is interleaved. Are the more complex data structures in the first example unwarranted? I've been trying to think how we could modify the second algorithm to produce code as idiomatic as that produced by the first, but I don't think we can without being able to distinguish between types of statements (parsing, constraint-checks, derived inputs). |
By "straight topological sorting" you probably mean just using the dependency information - which is indeed sufficient to get something that will compile. I think it is also possible to create a more subtle "less than" relation that takes into account the various kinds of things being expressed, which is what you did in a more ad hoc way in the longer code. So think of the "kind of data" you have (I think you split things into 4 kinds?) as itself a piece of information on which you want to key your ordering. |
I created a new datatype to store the kind of statement, and used it and the number of dependencies to naively sort the statements by the heuristics before running topological sort. Once that is done, we can just repeatedly choose the first element that has all of its dependencies met. This should actually produce slightly better results (though they're the same for these tests), and is hopefully a bit more concise and readable.
Thanks for your guidance @JacquesCarette. It took me a bit to understand what you were getting at, but I think I do now. I updated The main change I made was to naively sort the inputs based on the heuristics first, and then start topological sort. This allows the topological sort to focus only on resolving dependencies rather than the double-duty it previously had of resolving dependencies and applying heuristics. Is this more in line with how it should look? Are there other changes I should make? Let me know what you think 🙂. |
I was originally thinking of doing both of those things at the same time - but there is no need to do that. Your 2-pass approach is simpler. |
I created an idealized example of how I think sorting the input statements should go for interleaved inputs. It takes (1) parsing statements, (2) constraint-checks, and (3) derived values, each with the values they define and/or the values they require (as applicable), and sorts them as described below.
The general heuristic is to greedily add lines, with constraint-checking having highest priority, then derived values, and parsing only when the first two cannot be done. I also chose to sort constraint-checks and derived values based on number of dependencies, so that
a < 10
will always take place beforea < b
, ifa
andb
are both defined.You can run it by opening it in
ghci
and callingrunTests
. I'll post the current output below as well, with some added annotations.Contributes to #3781