
R119 cuda ops again #7117

Merged
merged 101 commits into r119_cuda_ops from r119_cuda_ops_again on Feb 6, 2019

raver119 commented Feb 6, 2019

Merge master to r119_cuda_ops, with CUDA issues resolved

shyrma and others added some commits Jan 2, 2019

Shyrma fix (#6930)
* provide intrinsic gnu gcc function for casting float -> float16

* f16c support for avx2 builds

* meh

* another var name

* check whether correct functions are called during float <--> float16 casts
Add explicit dependency on libnd4j in nd4j-backend-impls (#6921)
* Add explicit dependency on libnd4j in nd4j-backend-impls

* Move libnd4j dependency into backends themselves and use appropriate classifier
Update config-gpu-cpu.md (#6926)
* Update config-gpu-cpu.md

Added cuda version list

* Update config-gpu-cpu.md
[WIP] No more axis (#6902)
Axis overhaul
Few more fixes (#6944)
* broadcastable pow

* one range test

* few more range tests

* listdiff now accepts all int types for indices + test

* or/xor/and tests
[WIP] SameDiff fixes, exec overhaul, control dependencies etc (#6816)
* More blocking out

* Session updates

* More tweaks

* Split up sessions; SameDiff method deprecations in preparation for changes

* More session implementation

* Misc

* Sessions - better logging, cleanup

* First steps for proper execution in session...

* Fix merge op use outside of import

* Next steps

* Import variable control dependencies

* More fixes

* Start on execution frames

* More frame/iteration support

* A ton more implementation; while loops almost work

* More fixes

* Cleanup + exit op

* While loop now works (other than maybe unsafe output array reuse...)

* Cleanup, javadoc, polish

* Fixes

* All arrays are now allocated in InferenceSession not SameDiff class

* Fix session loop execution: allocate new arrays for loops to enable backprop

* Start cleaning up SameDiff class fields

* Transition more fields to new structure

* More cleanup of SameDiff class fields

* Clean up fields, change some signatures, force dtypes in creator methods, break a bunch of stuff in the process

* Stuff compiles again

* Remove variableNameToArr map

* More cleanup

* Integrate new execution sessions; clean up some old exec methods

* Debugging

* A ton of fixes

* Multiple FlatBuffers mapping fixes

* Add more FlatBuffers fixes

* More fixes

* Change VariableType to VarType to avoid name clash

* Change VariableType to VarType to avoid name clash

* Regenerate with flatc 1.9.0

* Restore old VariableType.h inadvertently overwritten

* Scalar op flatbuffers fixes

* More scalar flatbuffers mapping fixes

* More fixes; reimplement backprop grad fn creation to remove topological order assumption

* Cleanup

* Greedy datatype inference pt1

* First round of output dtype method implementations

* First set of gradient checks finally passing again

* Fixes, and output datatypes for many ops

* Add cast op methods + other fixes

* Next round of fixes

* Multiple fixes

* More datatype methods

* A whole lot more flatbuffers mapping fixes

* More datatype calculation implementations

* Next round of fixes

* Even more fixes...

* Next round of fixes

* More fixes

* More fixes, remove old workspace from samediff (to go into inference session)

* Random op fixes

* More fixes

* A bunch more fixes, conv etc

* More fixes

* More fixes

* More fixes

* Result array datatype fixes for a bunch of BaseNDArray ops

* More flatbuffers mapping fixes

* More fixes

* Another set of fixes

* Fix handling of repeated variable inputs to ops

* Multiple fixes for scalar ops (types, flatbuffers mapping)

* Handful of additional fixes

* Import fixes

* ArgMin/Max fixes; fix skipping of non-imported nodes during import

* More TF import fixes

* Session fixes

* Import fixes, consolidate 2 more fields

* Op type calculation fixes

* Op validation datatype message improvements

* More op dtype fixes, no more ninja array creation in base op getters

* Fixes for reduction op exec with dimension arg

* toVector error message improvements

* Fix import datatype inference

* Op fixes

* Log a more useful exception for native output shape calculation issues

* Reduction shape fix, resolve before exec; fix concat resolving

* Name fix for properties

* More op and import fixes

* Fixes for variables datatypes for TF import

* Update ops.proto; change allowable datatypes for unique with counts

* Fixes for output shape calc

* Clean up and significantly simplify op output shape calculation

* A bunch more fixes

* Fix issue with wrong opnum in NativeOpExecutioner

* More op fixes

* TF import fixes, inference session allow long axis arg, ArrayOptionsHelper fix

* Fix BatchNorm

* Fix NDArrayStrings for String arrays + improve tests

* Import TF String arrays as UTF8 INDArrays

* Assertion and control dependency fixes

* More op fixes

* Minor test update

* Small fixes

* Small fixes post rebase

* Various compilation fixes

* A bunch of post merge fixes

* FlatBuffers mapping fix; fix Nd4j.createFromArray methods

* Reduce3 fixes

* Fix Or/And/Xor execution

* Loss function and other fixes

* Segment op dtype fixes

* Small fix for sum with keep dims
Fix PlayUIServer.detach() to allow sequential attach-detach of StatsStorage (#6950)

* Fix PlayUIServer.detach(), add test for sequentially attaching and detaching StatsStorage

* fix missing ND4J backend for TestPlayUI

* remove unused imports
Fixes for failing DL4J/ND4J tests (#6954)
* Make Reshape.doDiff more robust/reliable

* Multiple dl4j/nd4j fixes

* Fixes for unsorted segment ops

* More test fixes

* More fixes

* Small SameDiff fix
[WIP] Couple of fixes (#6953)
* few casts

* output buffer nullification is optional now

* copypasta fix

* tan fix
FlatBuffers upgrade (#6956)
* small test update

* fb update to 1.10
fix: dl4j RepeatVector (#6945)
* detach after repeat

* typo

* don't detach

* remove redundant line
Shugeo zeta (#6957)
* Added a pair tests with zeta ops.

* one more test for zeta

* Fixed java-related zeta op bug.

* Refactored zeta helper to eliminate assignment and allocations.
Next round of SameDiff/ND4J/TF import fixes (#6959)
* Add LogMatrixDeterminant

* Add LSTM.getHelper()

* Fix output type validation for shapes_of op

* Multiple op fixes

* Remove old/legacy Nd4j.getEnsuredShape

* More SameDiff fixes
Next round of SameDiff/ND4J fixes (#6964)
* Multiple test fixes

* Fix DataType issue in BaseNDArray constructors

* Misc fixes

* More misc fixes

* Variety of fixes; a few temporary ignores to get CI running again

* Fix reverse op
Unify CPU extension parameter selection (#6947)
This drops javacpp.extension in favor of only using libnd4j.extension
to trigger the proper profiles that will set all other required properties.

So far only two possible values are supported: avx2 and avx512.
libnd4j: Install both CPU and CUDA assembly ZIP files on Maven build (#6946)

Also move all CUDA modules into "cuda" profiles and nd4j-native modules into a "cpu" profile, activated by default when libnd4j.chip != cuda.
bug fix: Nd4j - "Invalid opType SHORT" when creating short buffer (#6966)

* Nd4j.create(....SHORT)

* more dtypes

* fix createSame()
Another round of SameDiff and TF import fixes (#6967)
* Fix for SameDiff.outputs() inference for conditional ops

* Variable control dependency execution fix

* Another control dependency import/execution fix

* Fix another control dependencies edge case for execution

* SameDiff.outputs() fix/workaround for bad TF graphs (non-consumed switch outputs)

* Identity output shape calculation fix

* Fix issue with control dependencies for constants combined with loops

* Ignores for a handful of remaining import tests for CI
Next round of SameDiff/import fixes (#6971)
* Check for ragged arrays for Nd4j.createFromArray

* ResizeBilinear and Pad calculateOutputDataTypes methods

* BatchNorm and DepthwiseConv2D calculateOutputDataTypes methods

* ArgMin/ArgMax fixes

* Skip NaN/Inf check for profiler mode for non-FP datatype arrays

* NativeOpExecutioner error message improvements; update ignores based on now passing tests

* Update zoo model ignores for now passing models
[WIP] SameDiff fixes/improvements (#6969)
* var types?

* placeholder shape

* temporary samediff array registration

* - variable existence check
- new exception
- lots of legacy fb files removed

* legacy rng removed

* sort draft

* - castTo during association
- allow placeholder to contain array

* resolved PLACEHOLDER becomes NDARRAY

* strings draft

* strings equality shortcut

* schema update

* flat string array ser/de

* schema update

* schema update

* - fb deserialization draft
- new NDArrayFactory::empty signature

* - byte strings
- bitswap for be/le

* NDArray::asByteVector for strings

* more strings tweaks

raver119 and others added some commits Jan 24, 2019

[WIP] Aggregates gone (#7059)
* initial commit

* initial commit

* sg skeleton

* next step

* fix for #7051

* next step

* next step

* next step

* couple of tests

* back to configurable

* require inplace

* next step

* first test passes

* one more test

* ns test

* cbow

* missed arg

* temp commit

* next step

* small fix

* cbow sg numeric test passes

* cbow ns numeric test passes

* java time

* skipgram java wrapper

* cbow java wrapper

* next step

* idx abs

* CBOW tests pass

* SG tests pass
[WIP] Small ND4J fix, SameDiff graph visualization fixes/improvements (#7065)

* Control dependencies, extra UI info

* No-arg constructors for CbowRound/SkipGramRound for reflections scanning

* More node info and better formatting to help debugging

* Remove bad layout options
[WIP] SameDiff nested while execution fix (#7077)
* Track parent frames during SameDiff execution for nested loop/enter cases

* Check for existing element when adding to abstract session availableForExec

* Working towards a nested while fix

* SameDiff: Fix nested loop java execution

* Cleanup

* Remove debug class

* Small fix
Add missing loss functions (#7042)
* Add gradient for hinge loss

* Add huber loss backprop

* Fix Poisson spelling

* Change input order for log_poisson_loss to match those of other loss functions

* Add Poisson Loss

* Add description on how optimized implementation of MPWSE is derived

* Initial implementation of MPWSE as defined in the paper

* Add gradient calculation for mean pairwise squared error loss
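The "optimized implementation of MPWSE" mentioned in this commit is not shown on this page. Below is a minimal sketch that assumes the TensorFlow-style definition: for one example, take the mean over unordered pairs (i < j) of ((pred_i - pred_j) - (label_i - label_j))^2. Whether libnd4j normalizes in exactly this way is an assumption. With d_i = pred_i - label_i, the pairwise sum reduces to n * sum(d_i^2) - (sum d_i)^2, which turns the O(n^2) nested loop into a single O(n) pass. The class and method names are hypothetical and are not part of ND4J.

```java
// Hypothetical reference sketch, not the libnd4j or ND4J API.
// Assumes the TensorFlow-style definition: mean over unordered pairs (i < j) of
// ((pred_i - pred_j) - (label_i - label_j))^2 for a single example.
final class PairwiseSquaredError {
    private PairwiseSquaredError() {}

    /** Straightforward O(n^2) nested-loop form of the definition. */
    static double nestedLoop(double[] labels, double[] predictions) {
        checkLengths(labels, predictions);
        int n = labels.length;
        if (n < 2) {
            return 0.0; // no pairs to compare
        }
        double sum = 0.0;
        long pairs = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double diff = (predictions[i] - predictions[j]) - (labels[i] - labels[j]);
                sum += diff * diff;
                pairs++;
            }
        }
        return sum / pairs;
    }

    /**
     * O(n) closed form. With d_i = pred_i - label_i:
     * sum_{i<j} (d_i - d_j)^2 = n * sum(d_i^2) - (sum d_i)^2.
     */
    static double closedForm(double[] labels, double[] predictions) {
        checkLengths(labels, predictions);
        int n = labels.length;
        if (n < 2) {
            return 0.0;
        }
        double sum = 0.0;
        double sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double d = predictions[i] - labels[i];
            sum += d;
            sumSq += d * d;
        }
        double pairs = (double) n * (n - 1) / 2.0; // number of unordered pairs
        return (n * sumSq - sum * sum) / pairs;
    }

    private static void checkLengths(double[] labels, double[] predictions) {
        if (labels.length != predictions.length) {
            throw new IllegalArgumentException("labels and predictions must have the same length");
        }
    }
}
```

For labels {1, 2, 3} and predictions {1, 3, 5}, both forms give 2.0. The nested loop is useful as an independent reference when testing the closed form, which is the role the later commits give the nested-loop method in LossOpValidation.java.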
Fix ScalNet test using old API
Use argMax instead of directly calling OP
[WIP] Misc issue fixes (#7085)
* #7084 SameDiff GradCheckUtil mask

* #7074 #7075 Add ROCBinary.getROC(int); Add ROCBinary.stats() AUPRC

* #7064 Fix DL4J UIServer (temporary) memory leak

* #6991 Validate invalid TBPTT + GlobalPooling/LastTimeStep

* #7068 DataType validation (and casts where required) for dropout
fix reduction modes and misc issues (#7082)
* Fix reduction modes for Hinge Loss

* Fix reductions on huber loss

* Fix LogPoissonLoss reduction modes & full mode switching

* Update mean pairwise square error tests in libnd4j to match those
in Java land

Expected values are calculated using the nested loop method, as
implemented in LossOpValidation.java

* Use Gradient Check Mask in Loss OP Validation Tests for MEAN_BY_NONZERO_WEIGHT_COUNT

MEAN_BY_NONZERO_WEIGHT_COUNT is non-differentiable at weight=0, so those
points have to be masked out.

* Add reduction mode support to MPWSE Loss

* fix gradient check numerical issues for softmax losses

* Fix calculation of weights gradient in cases where label smoothing is applied

* All LossOpValidation Tests are passing
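The commit above explains that MEAN_BY_NONZERO_WEIGHT_COUNT is non-differentiable at weight=0. A small numeric example shows why. The sketch assumes the reduction is sum(w_i * l_i) / count(w_i != 0), with l_i the per-element loss; the class is hypothetical and is not ND4J's implementation. When w_k = 0, perturbing w_k by either +eps or -eps raises the non-zero count by one. A central finite difference therefore sees a different denominator than the analytic gradient, and the two disagree, which is why a gradient check has to mask those points.

```java
// Hypothetical sketch, not ND4J's implementation. Assumes the reduction
// sum(w_i * l_i) / count(w_i != 0), where l_i is the per-element loss.
final class NonzeroWeightMean {
    private NonzeroWeightMean() {}

    static double loss(double[] perElementLoss, double[] weights) {
        double weighted = 0.0;
        int nonZero = 0;
        for (int i = 0; i < weights.length; i++) {
            weighted += weights[i] * perElementLoss[i];
            if (weights[i] != 0.0) {
                nonZero++;
            }
        }
        return nonZero == 0 ? 0.0 : weighted / nonZero;
    }

    /** Analytic dLoss/dw_k, treating the non-zero count as a constant. */
    static double analyticWeightGrad(double[] perElementLoss, double[] weights, int k) {
        int nonZero = 0;
        for (double w : weights) {
            if (w != 0.0) {
                nonZero++;
            }
        }
        return nonZero == 0 ? 0.0 : perElementLoss[k] / nonZero;
    }

    /** Central finite difference, as a numerical gradient check would compute it. */
    static double numericWeightGrad(double[] perElementLoss, double[] weights, int k, double eps) {
        double[] plus = weights.clone();
        double[] minus = weights.clone();
        plus[k] += eps;
        minus[k] -= eps;
        return (loss(perElementLoss, plus) - loss(perElementLoss, minus)) / (2.0 * eps);
    }
}
```

With per-element losses {1, 2, 4} and weights {1, 1, 0}, the two gradients agree for a non-zero weight (0.5 at k = 0). At the zero weight (k = 2), the analytic value is 4/2 = 2.0, while the finite difference gives 4/3.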
Infer kernel size from weights if needed (#7098)
* Infer kernel size from weights if needed

If the kernel size isn't set, we can try inferring the proper sizes from the given weights.

Fixes #7008

* Fix expected weight gradient in softmax cross entropy test

Weight gradient calculation was changed in 65f2313.
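The kernel-size inference described in #7098 amounts to a fallback: keep an explicitly configured kernel size, and otherwise read it from the weight array's shape. The sketch below is hypothetical. It is not the DL4J/SameDiff API, it assumes that a kernel dimension of 0 means "not configured", and it assumes 2D convolution weights are laid out as [kH, kW, inChannels, outChannels].

```java
// Hypothetical helper, not the DL4J/SameDiff API. Assumes a kernel dimension of 0
// means "not configured" and 2D conv weights are laid out as [kH, kW, inChannels, outChannels].
final class KernelSizeInference {
    private KernelSizeInference() {}

    /** Returns {kH, kW}, filling in any unset dimension from the weight shape. */
    static int[] resolveKernel(int kH, int kW, long[] weightShape) {
        if (kH > 0 && kW > 0) {
            return new int[] {kH, kW}; // explicit configuration wins
        }
        if (weightShape == null || weightShape.length != 4) {
            throw new IllegalArgumentException("kernel size unset and weights are not rank 4");
        }
        int inferredH = Math.toIntExact(weightShape[0]);
        int inferredW = Math.toIntExact(weightShape[1]);
        return new int[] {kH > 0 ? kH : inferredH, kW > 0 ? kW : inferredW};
    }
}
```

For example, resolveKernel(0, 0, new long[] {3, 5, 16, 32}) yields {3, 5}. An explicit {2, 2} configuration is returned unchanged.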
[WIP] DL4J: L2/Weight decay (#7097)
* Add Regularization interface; add L2 and WeightDecay

* Support schedules in L2/WeightDecay

* L1 regularization; cleanup, JSON etc

* Regularization API change, round 1

* Regularization API change, round 2

* Regularization API change, round 3

* Fixes

* Multiple fixes

* Javadoc and additional builder/layer methods

* Handle passed-in 0.0 value for l1/l2/weight decay

* Improvements for config duplicate regularization checks; remove legacy batch scaled l2

* Legacy JSON format loading for regularization

* alignment fix + test

* order matters

* Test fixes; Add warning when removing L2 on WeightDecay addition and vice-versa

* Fixes + test fixes

* Small transfer learning fix
ND4J Tests (#7060)
* #7054 Set default datatype in Zoo import tests

* Ignores for last remaining tests for CI

* Ignores for last remaining tests for CI

* Small TF import resource loading fix

* Temporarily disable failing test until PR merged

* Clean up test verbose mode config etc

* BaseNDArray.equals fix for compressed arrays
Update cudnn-config.md (#6925)
* Update cudnn-config.md

* Update

The redist artifacts were not released for CUDA 9.0:
http://repo2.maven.org/maven2/org/bytedeco/javacpp-presets/cuda/9.0-7.0-1.4.1/
[WIP] ND4J: a few more test fixes (#7113)
* Workaround for pullrows issue

* GSON fixes

* nd4j-tests-tensorflow fix

* Small dependency fix
Updated serialization/deserialization for NLP (#7072)
* Vocabulary serialization

* Serialization

* Serialization

* Serialization

* test update

* test update

* test update

* In process

* minor tweaks

* minor tweaks

* In process

* In process

* Cleanup

* Cleanup

* Vectors serialization

* Added matrices to serializer

* Intermediate

* Rewritten serialization

* Intermediate

* Intermediate

* Intermediate

* flush

* layerSize fix

* Corrected version

* Cleanup
[WIP] SameDiff TF import fixes (#7088)
* Apply styling to UI

* Cleanup and more styling

* POC for line chart rendering from flatbuffers data

* Non-max suppression op fixes

* TensorArrayRead: get datatype during import

* Various fixes for SSD import

* Handle TensorArrayWrite + enter edge case

* Fix slice op

* InferenceSession shape check

* Cleanup

* Make graph rendering more useful for debugging

* Basic search functionality, v1

* Search works

* Workaround for INDArray.get rank1/2 issue (#7092)

* Fix some of the broken layout issues

* Workaround for logged issues

* #7100 Clarify behaviour of SameDiff.execBackwards

* Another workaround for get issue; remove unnecessary Reshape op resolve properties method

* Fix calculateOutputShape handling of empty arrays

* Small InferenceSession fix; LogFileWriter histogram writing

* Reshape empty array fix

* UI tweaks

* Fix libnd4j gather op for scalar case; clean up test ignores + reenable one test

* Additional info in NativeOpExecutioner errors on failed op exec

* Fix issue with cast op of empty arrays returning scalar arrays

* Gather op: add support for empty indices array

* Gather fix, round 2

* Split op: add support for empty arrays for TF import compatibility

* Misc UI

* Nd4j.create(LongShapeDescriptor, boolean) check for empty shape info

* Small UI fixes

* Broadcastable ops: follow TF conventions for broadcasting with empty input array(s)

* Align concat op empty array handling with TF for import

* Fix LongShapeDescriptor.asDataType for empty arrays

* Gather empty array fix

* Small fixes

* UI dep fix

* Final fixes
Merge branch 'master' into r119_cuda_ops_again
# Conflicts:
#	libnd4j/blas/NDArrayFactory.h
#	libnd4j/blas/cpu/NDArray.cpp
#	libnd4j/blas/cpu/NDArrayFactory.cpp
#	libnd4j/blas/cpu/NativeOpExcutioner.cpp
#	libnd4j/blas/cpu/NativeOps.cpp
#	libnd4j/blas/cuda/NativeOps.cu
#	libnd4j/include/exceptions/graph_exception.h
#	libnd4j/include/graph/exceptions/impl/graph_exception.cpp
#	libnd4j/include/graph/impl/FlatUtils.cpp
#	libnd4j/include/helpers/impl/MmulHelper.cpp
#	libnd4j/include/ops/declarable/generic/list/write_list.cpp
#	libnd4j/include/ops/declarable/generic/loss/cosineDistance.cpp
#	libnd4j/include/ops/declarable/generic/loss/huberLoss.cpp
#	libnd4j/include/ops/declarable/generic/loss/meanPairWsSqErr.cpp
#	libnd4j/include/ops/declarable/generic/parity_ops/zeta.cpp
#	libnd4j/include/ops/declarable/generic/transforms/pad.cpp
#	libnd4j/include/ops/declarable/generic/transforms/reverse.cpp
#	libnd4j/include/ops/declarable/helpers/cpu/image_resize.cpp
#	libnd4j/include/ops/declarable/helpers/cpu/reverse.cpp
#	libnd4j/include/ops/declarable/helpers/cpu/transforms.cpp
#	libnd4j/include/ops/declarable/helpers/cpu/zeta.cpp
#	libnd4j/include/ops/declarable/helpers/image_resize.h
#	libnd4j/include/ops/declarable/helpers/reverse.h
#	libnd4j/include/ops/declarable/helpers/zeta.h
#	libnd4j/include/ops/declarable/impl/BooleanOp.cpp
#	libnd4j/include/ops/declarable/impl/LegacyReduceBoolOp.cpp
#	libnd4j/include/ops/declarable/impl/LegacyReduceFloatOp.cpp
#	libnd4j/include/ops/declarable/impl/LegacyReduceLongOp.cpp
#	libnd4j/include/ops/declarable/impl/LegacyScalarBoolOp.cpp
#	libnd4j/include/ops/declarable/impl/LegacyScalarOp.cpp
#	libnd4j/tests_cpu/layers_tests/CMakeLists.txt
#	libnd4j/tests_cpu/layers_tests/DeclarableOpsTests12.cpp
#	libnd4j/tests_cpu/layers_tests/DeclarableOpsTests9.cpp
#	libnd4j/tests_cpu/layers_tests/EmptyTests.cpp
#	libnd4j/tests_cpu/layers_tests/HelpersTests1.cpp
- disable asan for cuda/linux
- cuda_exception instead of runtime_error

raver119 merged commit c905eee into r119_cuda_ops Feb 6, 2019

0 of 2 checks passed

Codacy/PR Quality Review: review in progress
continuous-integration/jenkins/pr-head: build in progress

raver119 deleted the r119_cuda_ops_again branch Feb 6, 2019
