Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Created a test for sorting interleaved inputs. #3817

Merged
merged 3 commits into from
Jul 7, 2024
Merged

Conversation

B-rando1
Copy link
Collaborator

@B-rando1 B-rando1 commented Jun 24, 2024

I created an idealized example of how I think sorting the input statements should go for interleaved inputs. It takes (1) parsing statements, (2) constraint-checks, and (3) derived values, each with the values they define and/or the values they require (as applicable), and sorts them as described below.

The general heuristic is to greedily add lines, with constraint-checking having highest priority, then derived values, and parsing only when the first two cannot be done. I also chose to sort constraint-checks and derived values based on number of dependencies, so that a < 10 will always take place before a < b, if a and b are both defined.

You can run it by opening it in ghci and calling runTests. I'll post the current output below as well, with some added annotations.

ghci> runTests
Test:     -- First test
a = 1     -- parsing
a > 0     -- constraint-check
c = a / 2 -- derived value
c < 100   -- constraint-check
b = 2     -- parsing
a > 2 * b -- constraint-check

Test:  -- Second test
a = 1  -- parsing
b = 2  -- parsing
b < 30 -- constraint-check
a > b  -- constraint-check - goes after `b < 30` because it involves both `a` and `b`

Contributes to #3781

The general heuristic is to greedily add lines, with constraint-checking
having highest priority, then derived values, and parsing only when the
first two cannot be done.  I also chose to sort constraint-checks and
derived values based on number of dependencies, so that `a < 10` will
always take place before `a < b`, if `a` and `b` are both defined.
@B-rando1 B-rando1 requested review from smiths and balacij June 24, 2024 14:11
@B-rando1 B-rando1 marked this pull request as draft June 24, 2024 18:38
JacquesCarette
JacquesCarette previously approved these changes Jun 29, 2024
@B-rando1 B-rando1 marked this pull request as ready for review July 2, 2024 13:14
@B-rando1
Copy link
Collaborator Author

B-rando1 commented Jul 2, 2024

It will be a few days at least before I get back to working on Choices, but I will need a bit more direction as to what algorithm I should use when I get there.

The first file I added was my idea of how it should go, based on getExecOrder. It is perhaps a bit overcomplicated, but I personally really like the results. It uses enough heuristics that the code it generates is fairly close to what a human would write.

@balacij asked if I could try simplifying the algorithm by putting all of the expressions in one data structure. I wasn't sure how this would work, but I created the second file as a test. The second algorithm is simpler, but because it is straight topological sort based on dependencies, it is pretty haphazard with its ordering. It only cares about producing code that will run, not whether that code is interleaved.

Are the more complex data structures in the first example unwarranted? I've been trying to think how we could modify the second algorithm to produce code as idiomatic as that produced by the first, but I don't think we can without being able to distinguish between types of statements (parsing, constraint-checks, derived inputs).

@JacquesCarette
Copy link
Owner

By "straight topological sorting" you probably mean just using the dependency information - which is indeed sufficient to get something that will compile.

I think it is also possible to create a more subtle "less than" relation that takes into account the various kinds of things being expressed, which is what you did in a more ad hoc way in the longer code. So think of the "kind of data" you have (I think you split things into 4 kinds?) as itself a piece of information on which you want to key your ordering.

I created a new datatype to store the kind of statement, and used it and
the number of dependencies to naively sort the statements by the
heuristics before running topological sort.  Once that is done, we can
just repeatedly choose the first element that has all of its
dependencies met.  This should actually produce slightly better results
(though they're the same for these tests), and is hopefully a bit more
concise and readable.
@B-rando1
Copy link
Collaborator Author

B-rando1 commented Jul 3, 2024

Thanks for your guidance @JacquesCarette. It took me a bit to understand what you were getting at, but I think I do now. I updated sampleInterleaved.hs, and it should be much more concise now, and likely faster as well.

The main change I made was to naively sort the inputs based on the heuristics first, and then start topological sort. This allows the topological sort to focus only on resolving dependencies rather than the double-duty it previously had of resolving dependencies and applying heuristics.

Is this more in line with how it should look? Are there other changes I should make? Let me know what you think 🙂.

@JacquesCarette
Copy link
Owner

I was originally thinking of doing both of those things at the same time - but there is no need to do that. Your 2-pass approach is simpler.

@JacquesCarette JacquesCarette merged commit 7370fe5 into main Jul 7, 2024
@JacquesCarette JacquesCarette deleted the SampleSorting branch July 7, 2024 06:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants