Some kind of problem with name clashes in defunctionalisation #1174

Closed
athas opened this issue Oct 31, 2020 · 0 comments
athas commented Oct 31, 2020

This program gives the wrong results, because the closure record mixes up the different sizes:

-- ==
-- input { 0i64 }
-- output { [1i64] [2i64, 2i64] }

let delaylength [x] (arr: [x]i32) (y: i64) = length arr

let main x =
  let f = delaylength [x]
  let g = delaylength [x,x]
  let (f', g') = id (f, g)
  in (f' 1, g' 2)

The handling of sizes in the defunctionaliser has been dubious for a long time. I wonder if we can hack around this one, or if we really have to fix it now.
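
For readers unfamiliar with the transformation, here is a hand-written sketch of roughly what a defunctionalised version of the program should behave like, with each partial application of delaylength turned into a closure record that carries its own captured array and size. This is not what the compiler generates. The names closure and apply_delaylength are hypothetical, and the element type is assumed to be i64 so that it matches the i64 input.

```futhark
-- Hand-written sketch, NOT compiler output.  The names `closure` and
-- `apply_delaylength` are hypothetical.

-- Closure record for a partial application of delaylength: it holds the
-- captured array, and the size parameter n records that array's length.
type closure [n] = {arr: [n]i64}

-- First-order replacement for calling the closure.  The argument y is
-- ignored, just as in the original delaylength.
let apply_delaylength [n] (c: closure [n]) (y: i64): i64 = length c.arr

let main (x: i64) =
  let f = {arr = [x]}      -- captures an array of size 1
  let g = {arr = [x, x]}   -- captures an array of size 2
  let (f', g') = id (f, g)
  in (apply_delaylength f' 1, apply_delaylength g' 2)
```

Here the two records keep separate sizes (1 and 2) even after passing through id as a pair, which is the property the closure records in the bug report fail to preserve.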

athas closed this as completed in b4deb90 Nov 1, 2020
athas added a commit that referenced this issue Nov 12, 2020
* More Steve-friendly.

* Fix usage text.

* This is more type-safe.

* Detect more complex invariant loop parameters. (#1111)

Closes #1110.

* Detect bad entry points names early.

* Oops.

* Clearer error message.

* Style fix.

* Fix expected error.

* Fix #1112.

* Handle another obscure tiling case.

* Parallelise tiling across entry points.

* Test that this tiles.

* Remove dead reference.

* Use cache-oblivious transpose on CPU. (#1113)

This is a good bit faster in many cases, but a bit slower for small
arrays.  Maybe we can special case those later.

* This linker flag is needed.

* CUDA backend can now be called from multiple threads. (#1114)

Closes #1077.

* More principled use of phantom-typed PrimExp.

* Clean up implementation.

* Generalise the optimisation of concatenations. (#1116)

* Generalise the optimisation of concatenations.

* Add small hack to avoid (or delay...) code explosion.

* Use const pointers in array creation functions.

* Avoid some unused-parameter warnings.

* Avoid more untyped operations.

* Ignore parameters in a smarter way.

* Style fix.

* Use style-check.sh in precommit hook.

* Introduce typed variables in code generator (#1119)

These will help us keep our types straight, hopefully.

* Fix synchronisation bug.

* delete gitattributes

* Fix this description.

* Remove instance with nonobvious type.

* futhark-benchmarks: bump

* Do not tolerate warnings in benchmarks.

* Permit array indexing with any integer type. (#1123)

Also changes the rules for warning about type defaulting, so that only
types that propagate to the top-level binding are warned about.
Otherwise we would get an ambiguity warning for every instance of
`x[0]`.

Closes #1122.

* Ban unsigned ranges. (#1125)

This has various implications, such as removing u8.iota and similar.

The point is to simplify size handling in preparation for 64-bit
sizes, and the optimisations they will need.

* Better loop simplification for non-i32 loops.

* Fix #1126. (#1127)

* Clarify some type restrictions.

* Actually enforce this restriction.

* Relax these constraints.

* An allocation is a priori considered for hoisting.

* Remove certificates on safe statements.

* Revert "Remove certificates on safe statements."

This reverts commit 98db1fd.

Turns out this broke user-provided assertions.

* Remove certificates on some allocations.

* Build and upload nightly tarball on macOS. (#1130)

* Fix typo.

* Fix the typo again.

* Better opencl commands (#1131)

* Add --list-devices flag to opencl executables

This convenience feature lists the current devices and platforms on the system,
showing what to choose between using the `-p` and `-d` flags.

* Add --help command and usage in C-like compiled futhark programs

* Add --help command to man and usage pages

* Shorter help messages

* Add changelog entry

* Also run these with oclgrind.

* Hack around the local memory problem.

* futhark-benchmarks: bump

* This is 0.17.2.

* Onwards!

* Releases must be on master.

* Revert "Releases must be on master."

This reverts commit 4e02f90.

As usual, CI services have terrible documentation with no semantics.

* Try to restrict releases to master, again.

* Fix #1133.

* Better work queue in pmapIO.

* Fix prelude doc link (#1136)

* Fix NaN comparisons yet again.

* Fix action name and description.

* futhark-benchmarks: bump

* Fix a sum type corner case.

I hoped this would also fix #1139, but it did not.

* Add missing case for sum types.

* Add Bifunctor instance to TypeBase.

* Slightly more information in this internal error message.

* Consistently use the variable-free type here.

* Fix #1139.

* Better context information when type-checking If.

* Fix #1142.

* Fix #1143.

* Remember to zero-initialise this.

* This is 0.17.3.

* Onwards!

* Merge multicore back-end into master (#1146)

* Use strong compare_exchange

* Fixes for CAS SegHist

* don't start new task while in nested case

* Add name id to subtask struct

* Only take time if MCPROFILE is defined

* Make use of CAS swap too

* Use a faster rand num gen

* XXX

* We need direct execution to avoid too much overhead

* Bug fix for segHist

* Optimize code based on number of subtasks created

* Add name to subtask

* Check code body for possible imbalance

* Only generate 1 subtask when no free workers

* Support 64 bit integer CAS seghist

* Refactor some code

* Allocate cached intermediate arrays on stack

* This should not be a pointer

* Pass string for easier identification of generated code

* Choose histogram implementation based on condition

* Use lock-free deque

* Decide on sequential execution based on number of free workers

* This should not be a pointer

* Automatic granularity of dynamic scheduled tasks

* Override tid on steal

* This should be statically scheduling

* Remove now unused Code

* I don't need this anymore

* Uses own tid when subtask is chunkable

* Need this on Linux

* This should be up here

* Optimize segscan when op is on scalar values

* Remove unused code

* Add name identifier to task

* Start work on using timing

* Remove debug prints

* Consistently extract allocations

* Implement heartbeat style timing

* Improve timing

* Pass along the physical thread id

* Need to check for errors

* Use a Global variable for threads to exit

* Don't use the global var anyways

* Add tuning program

* Make use of kappa for dynamic scheduling

* Clean up code a bit

* Only create enough task based minimum task overhead

* Clean up tuning program

* Implement a dynamic scheduling algorithm

* Try to steal from "main" queue first

* Reduce code duplication

* Need to break on successful steal

* More clean-up

* Remove unused stuff

* Remove unused code

* Better comments

* We don't use this in these cases

* I missed one

* More refinement

* Fix potential race condition

* Make tuning use separate threads

* Hack for avoid unsolvable deadlock

* Forgot to commit this

* This estimate is slightly better

* This should be zero

* This should be smaller or less than 1

* Measure timings inside of function bodies

* Remove redundant parenthesis

* Clean up timing program

* Remove redundant initialization

* Hack for avoiding division zero in segreduce-iota

* Revert "Hack for avoiding division zero in segreduce-iota"

This reverts commit c313ca9.

* Merge master into multicore.

* Do not perform flattening in multicore pipeline.

Instead we depend on sequentialisation to generate efficient code.

* Revert "Merge master into multicore."

* Revert "Revert "Merge master into multicore.""

* Use 64-bit for intermediate indices

* Better naming for chunks

* Only use 64-bit in cases we compute product of dimensions

* Make variables more readable

* Update tuning program to follow similar approach to heartbeat

* Lets try this automatically  process

* Use simpler stop condition

* Use a more appropriate default value

* Add a AtomicXchg operation

* I forgot an _n

* Let's try the old queue again

* Implement half work-stealing

* Steal from the front

* Adapt tuning to new queue

* Disable auto tuning

* Change ints to int64_t

* This is not a float

* Let random number be unsigned

* Add some debug flags for later

* This looks prettier

* Measure time properly for nested parallelism

* This needs a fence now

* Measure time

* Modify to tuning again

* Modify tuning program

* This should finally work

* Prettier functions

* Should initialize this

* Oops

* Remove unused field

* This should be int64_t

* Clean up

* Setup for easy switching between queues

* Better error handling

* Fix typo

* Do not naively lift allocations out of loops.

* Call this a parloop

* Renaming

* Clean up

* Use a higher res clock (if available)

* Update tuning program too

* Stolen tasks are executed immediately

* Use threshold to select between segHist versions

* No need to measure sequential runs

* Show in us instead of ns

* Use half-work stealing with chaselev deque too

* Remove debug statements

* This should be int64_t

* Wake up threads when there is work with chaselev

* Need a fence here, just in case

* Steal from queue 0 first, else try random queue

* That was dumb

* Don't try to steal if there is no active work

* Use jobqueue again

* Accidentally swapped these

* This should be based on number of subhistos

* Cast these to int64

* Simplify seghist

* Fix compile error

* try to use local variables when possible (WIP)

* Just hack with shape for now

* Don't run sequentially

* Fix potential deadlock

* Fix for more reduce cases

* Let's try to use nested ops too

* Fix for missing variable declaration for nested op

* Let's just avoid stack allocations

* Ok let's not

Revert "Let's just avoid stack allocations"

This reverts commit 9e80aa3.

* Only wake up threads when using the nested function

* Use the actual number of subtasks created to decide if task should be sequential

* Revert "Use the actual number of subtasks created to decide if task should be sequential"

This reverts commit 6961e61.

* Revert "Only wake up threads when using the nested function"

This reverts commit db8c704.

* Add field to wake up threads

* Need to load this

* Oops

* Clean up

* Use exact same process as in paper

* Use hardware cycle counter

* Remember to convert to ns

* I hope this works on linux

* Remove void

* Not consistent to use cycle counters

* Apply Ormolu.

* Reduce duplication.

* Test multicore on CI.

* Cleanup.

* Restore nice Dev module.

* These tests are now in the attributes/ subdir.

* Run CI on multicore branch, maybe.

* Strangle some warnings.

* Clean up code

* prettier clamping of number of subtasks

* Simplify

* Remove dead code

* Clean up more

* Avoid deadlocking in case of errors

* Add comments

* Simplify

* Clean up and add more comments

* Properly measure time working by each thread

* Only output thread usage if profiling

* I forgot this one

* Remove dead code

* Fix function args for deque destroy for chase-lev

* Only use nested function when we don't have enough work

* Vectorise SegHist operators.

* Also do double-buffering in multicore backend.

* Measure time for sequential execution too

* Give multicore -P option a description.

* Eliminate DeclareStackMem.

* Improve SegRed with vectorised operators.

* Include possible allocation in prebody too

* Remove unused variable.

* Sequentialisation of histograms in multicore pipeline.

* Avoid division zero like this instead

* Small fixes

* rename task_fn -> segop_fn

* Avoid using a shared accumulator array for scan

* Remove unused code

* Rename task to segop

* Use task-local small histograms.

* Clean up code generation a bit

* Remove chase-lev deque from multicore

use multicore-deque instead

* more cleanup

* More clean-up

* No need for this anymore.

* Apply Ormolu.

* futhark-benchmarks: bump

* Strangle some warnings.

* Propagate errors in multicore backend.

* Make the current thread the first worker when entering entry point.

* Wait for all subtasks to finish before propagating error.

* Duc says it is better to free last.

Co-authored-by: Troels Henriksen <athas@sigkill.dk>

* Strangle more warnings.

* Manpage for multicore backend.

* multicore does not work on Windows.

* Remove unused code.

* Make all sizes of type `i64`. (#1124)

This has wide-ranging implications for the types of things in the prelude:

* Functions like `replicate` and `iota` now take `i64` arguments.

* The `from_fraction` function now takes `i64`.

* The `to_i32` function has been removed.

Closes #134.

* futhark-benchmarks: bump

* These are 64-bit.

* futhark-benchmarks: bump

* Move a division out of the histogram kernel.

* Move more 64-bit divisions out of histogram kernels.

* Use $GITHUB_PATH instead of add-path (#1151)

The `add-path` method of adding stuff to $PATH has been deprecated.

More info:

https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/
https://docs.github.com/en/free-pro-team@latest/actions/reference/workflow-commands-for-github-actions#environment-files
https://docs.github.com/en/free-pro-team@latest/actions/reference/workflow-commands-for-github-actions#adding-a-system-path

* Fix input type.

* Ask for binary output when testing.

* Let's do both because apparently Ubuntu is inconsistent. (#1152)

* This is 0.18.1.

* Onwards!

* The macOS build must be done before we can deploy.

* Fix a 32-bit leftover.

* Document overloading subtlety (#1153).

* Clean up README more.

* Also run pyopencl backend on GA. (#1154)

* Report warnings even when type errors occur.

Closes #1153.

Closes #1155.

* Eliminate more 32-bit artifacts from code generation.

Most importantly, this lets the multicore backend handle more than
2**31-1 iterations per task.

* SizeOf is a 64-bit expression.

* Another 32/64-bit fix.

* Add some documentation for Tools (#1161)

* Add some documentation for tools

* Fix rendering issues

* No header guards in embedded RTS code.

* Move scheduler_common.h into scheduler.h.

* One global variable down (#1157).

* Make kappa scheduler-local.

* Rework self-tuning a little bit, does not appear to work.

* adding tests for matrix multiplication with reg tiling, should be run with cuda or opencl versions, but cosmin does not know how to specify that inside the source file

* Combine scheduler.h and scheduler_tune.h and fix kappa-tuning.

* Silence warning about potential uninitialised variable.

* Benchmarks do not belong in the test suite.

* Further cleanup in scheduler implementation.

* These should be static.

* Combine all multicore headers into scheduler.h.

* Centralise scheduler initialisation.

* Stop using mutable global variables in multicore scheduler.

We still use a single thread-local variable to find the worker struct
for a thread, but that is harmless, as it does not prevent multiple
contexts from co-existing (they will have their own threads).

Closes #1157.

* Close #1162.

* Fix error message generation for multicore backend.

* Fix another 32-bit leftover.

* Restore newline after warnings.

* Fix exit code on bugs and limitations.

* Also do variable substitution inside Ops.

* Look properly for variant allocations deep in kernels.

* Look across loops when spelunking for parallelism.

* Implement partial tiling. (#1163)

Closes #1145.

* Print newline after warnings.

* Switch to newer Nixpkgs and GHC. (#1166)

* Fix RST syntax error.

* Use the newest version of 'versions'. (#1165)

* Fix multicore histograms with empty inputs.

* Not a bug.

* More descriptive internal names.

* Fix #1168.

* Fix #1169.

It's a bit ad-hoc that we just lock here.  This should use the
criticalSection abstraction, but that's internal to GenericC.  This is
good enough for now, but if we ever do more complex entry/exit
operations, this will need refactoring.

* futhark dataset now more accepting of piping into something dead.

* Freeing an opaque is a critical section (#1169).

* This does not need arguments.

* Better type error for #1171.

* Fix #1173.

* Fix #1174.

* This was hard, so it deserves a mention.

* Eliminate fishy instances.

* Use dedicated datatype for pattern literals.

* Fix #1134. (#1178)

Our counterexamples for missing matches are now slightly worse, but at
least we detect them properly (I hope!).

* Fix #1177.

* This is 0.18.2.

* Onwards!

* Polish some docs.

* Fix #1180.

* Add error handling for bad file paths (#1181)

* Add error handling for bad file paths

* Catch error, instead of check

* Fix toctou issue

* datacmp: simplify error handling.

* Oops, avoid deadlock.

* This is 0.18.3.

* Onwards!

* Fix style violation

Co-authored-by: Philip Lassen <philiplassen+git@gmail.com>
Co-authored-by: Philip Munksgaard <philip@munksgaard.me>
Co-authored-by: Ryan Huang <NPN@users.noreply.github.com>
Co-authored-by: Minh Duc Tran <minhtran1391@gmail.com>
Co-authored-by: Cosmin Oancea <cosmin.oancea@diku.dk>