Release v3.5.1 #1929

Closed
wants to merge 55 commits into from

mlloreda
Member

v3.5.1

The source code with submodules can be downloaded directly from the following
link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.5.1.tar.bz2

Installer CUDA Version: 8.0 (Required)
Installer OpenCL Version: 1.2 (Minimum)

Improvements

  • Relaxed restrictions on the arguments of af::unwrap(). 1
  • Changed the behavior of `af::array::allocated()` to report the memory allocated. 1
  • Removed the restriction on the number of bins in the CUDA and OpenCL kernels
    for af::histogram(). 1

Performance

  • Improved JIT performance. 1
  • Improved CPU element-wise operation performance. 1
  • Improved regions performance using texture objects. 1

Bug fixes

  • Fixed overflow issues in mean. 1
  • Fixed memory leak when chaining indexing operations. 1
  • Fixed bug in array assignment when using an empty array to index. 1
  • Fixed a bug in af::matmul() that occurred when its RHS argument was an
    indexed vector. 1
  • Fixed a deadlock when a sparse array was used with a JIT array. 1
  • Fixed pixel tests for FAST kernels. 1
  • Fixed af::replace() so that it is now copy-on-write. 1
  • Fixed launch configuration issues in CUDA JIT. 1
  • Fixed segfaults and "Pure Virtual Call" errors when exiting on
    Windows. 1 2
  • Added a workaround for a clEnqueueReadBuffer bug on OSX. 1

Build

  • Fixed issues when compiling with GCC 7.1. 1 2
  • Eliminated unnecessary Boost dependency from CPU and CUDA backends. 1

Misc

  • Updated support links to point to Slack instead of Gitter. 1

[skip arrayfire ci]

9prady9 and others added 30 commits June 26, 2017 13:30
#1864)

* Get the node_map and full_nodes simultaneously in JIT for all backends

* Use std::array instead of std::vector for the children
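The two changes in this commit can be illustrated with a small sketch. `Node`, `kMaxChildren`, and `collectNodes` are hypothetical names, not ArrayFire's JIT classes, and the child limit of 3 is an assumption. The idea is that one post-order walk fills both the node-to-id map and the list of unique nodes, and that a fixed upper bound on children lets each node store them in a `std::array` instead of a heap-allocated `std::vector`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical upper bound on children per JIT node (assumption).
constexpr std::size_t kMaxChildren = 3;

struct Node {
    // Fixed-size storage; unused slots stay nullptr.
    std::array<const Node*, kMaxChildren> children{};
};

// Single post-order traversal that builds the node -> id map and the list of
// unique nodes at the same time. Returns the id assigned to `node`.
int collectNodes(const Node* node,
                 std::unordered_map<const Node*, int>& nodeMap,
                 std::vector<const Node*>& fullNodes) {
    assert(node != nullptr);
    auto it = nodeMap.find(node);
    if (it != nodeMap.end()) return it->second;  // shared subexpression, already visited
    for (const Node* child : node->children) {
        if (child) collectNodes(child, nodeMap, fullNodes);
    }
    int id = static_cast<int>(fullNodes.size());
    nodeMap.emplace(node, id);
    fullNodes.push_back(node);
    return id;
}
```

Because children are numbered before their parent, the resulting ids are already in a valid evaluation order, and a node reachable along two paths is visited once.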
* Fix pthreads linking error when linking with lapacke

* Add pthreads always
* Fix max allowable window size in af_unwrap

As padding is added to both sides of a dimension, the maximum allowable window
size should be dim_size + 2 * padding.
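A minimal sketch of the corrected bound, using hypothetical helpers `maxWindowSize` and `numWindows` (not ArrayFire's implementation). The window count uses the standard sliding-window formula `(dim + 2 * pad - win) / stride + 1`, which is assumed here rather than taken from the commit.

```cpp
#include <cassert>
#include <stdexcept>

// Padding is added on both sides, so the largest window that still fits
// along a dimension is dimSize + 2 * pad, not dimSize.
long maxWindowSize(long dimSize, long pad) {
    assert(dimSize > 0 && pad >= 0);
    return dimSize + 2 * pad;
}

// Number of window positions along one dimension (standard formula).
long numWindows(long dimSize, long win, long stride, long pad) {
    if (win < 1 || win > maxWindowSize(dimSize, pad))
        throw std::invalid_argument("window larger than padded dimension");
    if (stride < 1)
        throw std::invalid_argument("stride must be positive");
    return (dimSize + 2 * pad - win) / stride + 1;
}
```

For a dimension of 10 with padding 2, a window of 14 now yields exactly one position, where a check against the unpadded size would have rejected it.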
Indexing operations leaked when chained (e.g. arr.rows(10, 20).cols(1, 4)).
This was happening because the array_proxy object's member functions created
an array pointer when indexing operations were performed. This array was
not freed when the indexing operation was evaluated on conversion back to
af::array.

* Cleanup and document new variable in array_proxy_impl
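The leak described above comes from a proxy creating an intermediate array through a raw pointer that nothing frees. The sketch below illustrates the general ownership fix with hypothetical `Array` and `Proxy` types (not ArrayFire's `array_proxy` or `array_proxy_impl`): the temporary created by a chained call is held by a smart pointer inside the returned proxy, so it is released with that proxy. `Array::live` is an instance counter added only to make leaks observable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an array; `live` counts instances so leaks show up.
struct Array {
    static int live;
    std::vector<int> data;
    explicit Array(std::vector<int> d) : data(std::move(d)) { ++live; }
    Array(const Array& other) : data(other.data) { ++live; }
    ~Array() { --live; }
};
int Array::live = 0;

// Hypothetical view returned by indexing.
class Proxy {
public:
    Proxy(const Array& src, std::size_t first, std::size_t last)
        : src_(&src), first_(first), last_(last) {}

    // Chained indexing: materialise this view into a temporary array and hand
    // ownership of it to the returned proxy, instead of a raw `new`-ed pointer.
    Proxy sub(std::size_t first, std::size_t last) const {
        auto tmp = std::make_shared<const Array>(eval());
        Proxy next(*tmp, first, last);
        next.owned_ = std::move(tmp);  // freed when the last proxy using it dies
        return next;
    }

    // Copy the inclusive range [first_, last_] of the source into a new Array.
    Array eval() const {
        assert(first_ <= last_ && last_ < src_->data.size());
        auto begin = src_->data.begin() + static_cast<std::ptrdiff_t>(first_);
        auto end = src_->data.begin() + static_cast<std::ptrdiff_t>(last_) + 1;
        return Array(std::vector<int>(begin, end));
    }

private:
    const Array* src_;
    std::size_t first_;
    std::size_t last_;
    std::shared_ptr<const Array> owned_;  // non-null only for chained views
};
```

After an expression such as `Proxy(a, 2, 8).sub(1, 3).eval()` completes, the intermediate array has already been destroyed, so only `a` and the final result remain alive.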
This fixes the issue that occurred when sparse BLAS was called with a JIT'd array.
Windows terminates threads before the queue threads and other resources are
released. This causes deadlocks with the condition_variables in the async_queue
objects. This is a bug in Visual Studio/Windows that is documented here:

https://connect.microsoft.com/VisualStudio/feedback/details/747145

This will leak some resources but these resources will be released by the
operating system on exit.
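A common way to make the trade-off described above ("This will leak some resources") is to allocate the shared object on the heap on first use and never delete it, so its destructor never runs during static destruction after threads have been torn down. `QueueManager` and `queueManager()` are hypothetical names used only for this sketch; it is not ArrayFire's code.

```cpp
#include <cassert>
#include <mutex>
#include <queue>

// Hypothetical stand-in for a per-device work queue.
class QueueManager {
public:
    void push(int job) {
        std::lock_guard<std::mutex> lock(mtx_);
        pending_.push(job);
    }
    int pop() {
        std::lock_guard<std::mutex> lock(mtx_);
        assert(!pending_.empty());
        int job = pending_.front();
        pending_.pop();
        return job;
    }

private:
    std::mutex mtx_;
    std::queue<int> pending_;
};

// Created on first use (thread-safe since C++11) and never deleted, so its
// destructor does not run at process exit, when worker threads may already
// have been terminated. The OS reclaims the memory when the process ends.
QueueManager& queueManager() {
    static QueueManager* instance = new QueueManager();  // deliberately leaked
    return *instance;
}
```

Every caller shares the same instance; the cost is a one-time allocation that is never returned before exit.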
syurkevi and others added 25 commits August 25, 2017 00:15
Remove pre-3.0 compute capability checks, as compute capability 2.0 is no
longer supported
Cleanup mean overflow changes

* Use vectors instead of unique_ptr
* Remove the creation of Param objects. Instead use createArray
* Rename mops.cl -> mean_ops.cl
* Formatting changes
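The mean fix itself lives in the backend kernels, which are not shown here. As an illustration of the general failure mode only, the sketch below compares a hypothetical `meanNaive`, which accumulates in the input's own 8-bit type and wraps around, with `meanWide`, which accumulates in `double`. Widening the accumulator is one common remedy and is assumed here; it is not necessarily the approach ArrayFire's `mean_ops.cl` takes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Accumulator has the same type as the input, so the sum of many 8-bit
// values wraps around modulo 256.
double meanNaive(const std::vector<std::uint8_t>& v) {
    std::uint8_t sum = 0;
    for (std::uint8_t x : v) sum = static_cast<std::uint8_t>(sum + x);
    return static_cast<double>(sum) / static_cast<double>(v.size());
}

// Accumulate in a wider type so the sum of small integer inputs cannot wrap.
double meanWide(const std::vector<std::uint8_t>& v) {
    assert(!v.empty());
    double sum = 0.0;
    for (std::uint8_t x : v) sum += x;
    return sum / static_cast<double>(v.size());
}
```

With 100 elements equal to 200, the true sum of 20000 wraps to 32 in the naive version, giving 0.32 instead of 200.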
This commit implements a workaround for an Apple bug in their Iris OpenCL driver
where clEnqueueWriteBuffer fails when you pass in static C arrays. This change
fixes canny on OSX.
* Refactored some of the tests
* Changed the names of LargeDim to MaxDim to keep in line with other MaxDim tests
for easier filtering
* Added comments about failures on OSX
* Fixed a few warnings
On Windows, the resources that are released after the main function has exited
cause "Pure Virtual Function Called" errors. It seems that Windows releases all
resources when exiting main without calling their destructors. When the
destructors are later called, this error is thrown. This is related to
#1899
- CUDA is still slower than OpenCL on the same device.
- At large sizes, OpenCL is 1.3x faster instead of 2x.
- Some optimizations were not included in OpenCL because they hurt performance.
mlloreda added this to the v3.5.1 milestone on Sep 15, 2017
mlloreda closed this on Sep 15, 2017
pavanky removed this from the v3.5.1 milestone on Sep 16, 2017
pavanky removed the release label on Sep 16, 2017