Release v3.5.1 #1929
Closed
#1864) * Get the node_map and full_nodes simultaneously in JIT for all backends * Use std::array instead of std::vector for the children
* Fix pthreads linking error when linking with lapacke * Add pthreads always
* Fix max allowable window size in af_unwrap. As padding is added on both sides of a dimension, the max allowable window size should be dim_size + 2 * padding
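The bound described in this commit can be sketched in plain C++ as follows. This is only an illustration of the arithmetic, not the actual ArrayFire check: `maxWindowSize` and `windowFits` are hypothetical helper names, and the assumption that a valid window is at least 1 element wide is mine.

```cpp
#include <cassert>

// Hypothetical helper (not part of the ArrayFire API): largest window that
// fits along one dimension when `padding` elements are added on BOTH sides.
inline long long maxWindowSize(long long dimSize, long long padding) {
    assert(dimSize > 0 && padding >= 0);  // assumed preconditions
    return dimSize + 2 * padding;
}

// Hypothetical helper: true when a window of `windowSize` elements fits
// inside the padded dimension. Assumes a window must be at least 1 wide.
inline bool windowFits(long long windowSize, long long dimSize, long long padding) {
    return windowSize >= 1 && windowSize <= maxWindowSize(dimSize, padding);
}
```

For example, with a dimension of 5 elements and a padding of 1, the padded extent is 7, so a window of 7 is accepted and a window of 8 is rejected.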
Indexing operations leaked when chained (e.g. arr.rows(10, 20).cols(1, 4)). This was happening because the array_proxy object's member functions created an array pointer when indexing operations were performed. This array was not freed when the indexing operation was evaluated on conversion back to af::array. * Cleanup and document new variable in array_proxy_impl
This solves the issue when sparse blas is called with a JIT'd array
Windows terminates threads before the queue threads and other resources are released. This causes deadlocks with the condition_variables in the async_queue objects. This is a bug in Visual Studio/Windows that is documented here: https://connect.microsoft.com/VisualStudio/feedback/details/747145 This will leak some resources but these resources will be released by the operating system on exit.
Remove pre-3.0-compute checks as we don't support 2.0 compute capability anymore
Cleanup mean overflow changes * Use vectors instead of unique_ptr * Remove the creation of Param objects. Instead use createArray * Rename mops.cl -> mean_ops.cl * Formatting changes
This commit implements a workaround for an Apple bug in their Iris OpenCL driver where clEnqueueWriteBuffer fails when you pass in static C arrays. This change fixes canny on OSX.
* Refactored some of the tests * Changed the names of LargeDim to MaxDim to keep in line with other MaxDim tests for easier filtering * Added comments about failures on OSX * Fixed a few warnings
On Windows, resources that are released after the main function has exited cause "Pure Virtual Function Called" errors. It seems that Windows releases all resources when exiting main without calling their destructors. When the destructors are later called, this error is thrown. This is related to #1899
- CUDA is still slower than OpenCL on the same device - At large sizes, OpenCL is 1.3x faster instead of 2x. - Some optimizations are not included in OpenCL because they hurt performance.
v3.5.1
The source code with submodules can be downloaded directly from the following
link: http://arrayfire.com/arrayfire_source/arrayfire-full-3.5.1.tar.bz2
Installer CUDA Version: 8.0 (Required)
Installer OpenCL Version: 1.2 (Minimum)
Improvements
af::unwrap()
function's arguments.
af::histogram()
on CUDA and OpenCL kernels.
Performance
Bug fixes
af::matmul()
which occurred when its RHS argument was an indexed vector.
af::replace
so that it is now copy-on-write.
Windows.
clEnqueueReadBuffer
bug on OSX.
Build
Misc
[skip arrayfire ci]