Dense SNT Wrapper for TorchTensors #51
Conversation
SNT/NativeMemoryManager.cs (Outdated)

```csharp
using System.Threading;
using System;

namespace Torch.SNT
```
Should we be using non-descriptive names like 'SNT' in namespaces? It does not aid in discovery.
I agree. Any suggestions? I can definitely unroll SNT as SystemNumericTensor, but it becomes really long.
```xml
<ItemGroup>
  <PackageReference Include="System.Numerics.Tensors" Version="0.1.0-preview2-181031-3" />
  <PackageReference Include="System.Runtime.CompilerServices.Unsafe" Version="4.5.2" />
```
Since I don't do unsafe C# too often, I keep forgetting -- doesn't this complicate using the package from applications/services that don't allow unsafe code? Is there a way to avoid that?
Mmm, I don't think I can avoid this. I need to map the unmanaged memory from TorchSharp into C#; I don't think I can avoid the usage of pointers.
Unsafe is also already used in TorchSharp when returning the pointer to data.
So, looking at this some more -- TorchSharp.NNTensor.Data takes an IntPtr and casts it to a pointer, which makes it unsafe code. Then your code takes the pointer and casts it back to an IntPtr for storage. If Tensor.Data instead returned the IntPtr it got from native code, you shouldn't have to handle pointers, right?
I can definitely remove some of the unsafe in that way. However, I will still need Unsafe because the Pin method accepts an index as input, and therefore I need to do pointer + index to return the correct slice of memory. That is actually what the Unsafe dependency is used for.
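For context, the pointer-plus-index requirement can be sketched roughly as below. This is a minimal illustration, not the PR's actual code: the class name, field names, and the bound check are all made up here; only the `MemoryManager<T>` members and `Unsafe.Add` are real APIs.

```csharp
using System;
using System.Buffers;
using System.Runtime.CompilerServices;

// Illustrative sketch: a MemoryManager<T> over an unmanaged buffer must
// offset the base pointer by an element index when Pin(elementIndex) is
// called, which is where System.Runtime.CompilerServices.Unsafe comes in.
public sealed unsafe class NativeMemorySketch<T> : MemoryManager<T> where T : unmanaged
{
    private readonly IntPtr memory;   // base address of the unmanaged buffer
    private readonly int length;      // number of T elements

    public NativeMemorySketch(IntPtr memory, int length)
    {
        this.memory = memory;
        this.length = length;
    }

    public override Span<T> GetSpan() => new Span<T>((void*)memory, length);

    public override MemoryHandle Pin(int elementIndex = 0)
    {
        if ((uint)elementIndex > (uint)length)
            throw new ArgumentOutOfRangeException(nameof(elementIndex));
        // pointer + index: advance the base pointer by elementIndex elements of T
        void* pointer = Unsafe.Add<T>((void*)memory, elementIndex);
        return new MemoryHandle(pointer);
    }

    public override void Unpin() { }  // native memory never moves, nothing to do
    protected override void Dispose(bool disposing) { }
}
```

Note that the caller never sees a raw pointer: `GetSpan()` and `Memory` expose the buffer safely, and only the internals need the unsafe context.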
Okay, that's good. We can get rid of the unsafe attribute on TorchSharp, then. I have a PR coming with some of those changes.
```csharp
int product = 1;
for (int i = startIndex; i < dimensions.Length; i++)
{
    if (dimensions[i] < 0)
```
Can we avoid this check by using unsigned integers for the dimensions?
I think this is difficult to do, unless we can convince the SNT folks to use unsigned ints in their API. (Torch actually uses long, BTW.)
SNT/NativeMemoryManager.cs (Outdated)

```csharp
public unsafe NativeMemory(void* memory, int length)
{
    this.memory = (IntPtr)memory;
```
Can this conversion fail? Also, this constructor could call the other one.
I redirected this constructor to the previous one. The conversion is OK: I checked, and it cannot fail, since it simply stores the raw pointer value in an IntPtr struct.
SNT/NativeMemoryManager.cs (Outdated)

```csharp
private IntPtr memory;
private int length;

public NativeMemory(IntPtr memory, int length)
```
This probably should do some defensive parameter checks: `memory` not being null and such, and `length` being greater than 0.
Length could be zero if you want to set up an empty tensor, I presume. I can add a check that length is not negative.
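The validation being discussed might look something like the sketch below (hypothetical class name and messages; the actual constructor body in the PR may differ). The key point is that a zero length stays legal while negative lengths and null pointers over non-empty buffers are rejected up front.

```csharp
using System;

// Sketch of defensive argument checks for a native-memory constructor:
// length == 0 is allowed (an empty tensor), a negative length is not,
// and a null pointer is only acceptable when the buffer is empty.
public sealed class NativeMemoryArgsSketch
{
    public IntPtr Memory { get; }
    public int Length { get; }

    public NativeMemoryArgsSketch(IntPtr memory, int length)
    {
        if (length < 0)
            throw new ArgumentOutOfRangeException(nameof(length), "length must be non-negative");
        if (memory == IntPtr.Zero && length > 0)
            throw new ArgumentNullException(nameof(memory), "a non-empty buffer needs a valid pointer");
        Memory = memory;
        Length = length;
    }
}
```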
SNT/Utils.cs (Outdated)

```csharp
{
    internal class Utils
    {
        public static int GetTotalLength(ReadOnlySpan<int> dimensions, int startIndex = 0)
```
Also, couldn't this be a property on the Tensor object?
There is a Length property on Tensor (which I will map to a call to Torch's nElement). But this is different, because it is used before the tensor is created.
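Piecing together the fragment shown above, the complete helper plausibly looks like the following sketch (the loop and signature match the diff; the exception message and the closing of the method are filled in here and may not match the PR exactly):

```csharp
using System;

internal static class UtilsSketch
{
    // Total number of elements implied by a shape: the product of all
    // dimensions from startIndex onward. An empty range yields 1, which
    // is the conventional identity for a product.
    public static int GetTotalLength(ReadOnlySpan<int> dimensions, int startIndex = 0)
    {
        int product = 1;
        for (int i = startIndex; i < dimensions.Length; i++)
        {
            if (dimensions[i] < 0)
                throw new ArgumentOutOfRangeException(nameof(dimensions), "dimensions must be non-negative");
            product *= dimensions[i];
        }
        return product;
    }
}
```

For example, a shape of `{2, 3, 4}` yields 24 total elements, and starting at index 1 yields 12 (the element count of one outermost slice).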
```diff
@@ -14,6 +14,7 @@
 <ItemGroup>
   <ProjectReference Include="..\TorchSharp\TorchSharp.csproj" />
+  <ProjectReference Include="..\SNT\SNT.csproj" />
```
Maybe we should have our own test project for the SNT wrapper? That way, we keep the two layers completely separate.
I agree. At this point we should address #34 and create a unique test folder (either in source or on its own).
Fixed all comments so far.

Ouch - rerunning the tests still crashes. I suspect that this indicates a real memory corruption bug in this PR.

Yes, there is something going on with the memory cleanup. I have spent the last day trying to figure this out with no luck. Now that we have libtorch running on Windows, I should be able to figure out what is going on. Please don't merge until I am able to fix the bug.

I think I have addressed all comments and fixed the test error (thanks to @NiklasGustafsson).

Should we still hold off on merging this one?

If you want, we can wait until the System.Numerics.Tensors NuGet gets released. Besides that, it can be merged as is, I think.

If this relies on a third-party NuGet that is not released, let us wait until then.

The NuGet is available on MyGet. The .NET team is working on pushing it to NuGet.org soon. I'm OK holding off till that happens.
SNT/TypeGenerator.tt (Outdated)

```csharp
/// <summary>
/// Wrapper class used to surface a Torch <#=tname#>Tensor as a System.Numerics DenseTensor of <#=tname.ToLower()#>
/// </summary>
public sealed class <#=tname#>TorchTensor : DenseTensor<<#=tname.ToLower()#>>
```
Can you summarize briefly why you needed to subclass DenseTensor rather than use it as is? Was it only so that you could make the Clone give you back a tensor with memory owned by torch rather than managed memory?
I think there are two points:

- We want to be able to surface to users the `Tensor<T>` type backed by the Torch tensor implementation.
- It is not only for Clone: we want to use the fast operations over tensors that Torch provides. Eventually we will also have DenseTensors backed by GPUs.

I don't think we can achieve these two points without subclassing (unless I am missing something).
Thank you for this. Did you consider making your derived type generic as well?
I think my initial implementation was generic, but we figured that a non-generic version was simpler. What do you think? Do you think that a generic type will be more "compatible" with the `Tensor<T>` design?
If you have generic types/methods, they enable generic callers; if you don't, that scenario is not possible (unless the caller writes a type switch themselves). Ideally we want to handle the type switching in the libraries rather than force all the callers to do it.

We also try to design generic APIs so that the generic type can be inferred: this is why we have the static creation methods on Tensor (note: not Tensor<T>) rather than forcing folks to call the constructor.
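The inference-friendly pattern described here can be sketched as follows. The names `TensorFactorySketch` and `SimpleTensor<T>` are invented for illustration; the real System.Numerics.Tensors API exposes this pattern through a non-generic `Tensor` static class.

```csharp
// Sketch of the pattern: a non-generic static class hosts the generic
// Create methods, so the compiler infers <T> from the arguments and
// callers never have to spell the type parameter out.
public static class TensorFactorySketch
{
    public static SimpleTensor<T> Create<T>(T[] data) => new SimpleTensor<T>(data);
}

// A stand-in generic tensor type; constructing it directly would force
// the caller to write the type argument explicitly.
public sealed class SimpleTensor<T>
{
    public T[] Data { get; }
    public SimpleTensor(T[] data) => Data = data;
}
```

Usage: `var t = TensorFactorySketch.Create(new[] { 1.0, 2.0 });` infers `T = double`, whereas `new SimpleTensor<double>(...)` would require naming the type.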
FWIW, the libtorch C++ API is not generic - that is, it has a single non-parameterized `at::Tensor` class with methods like `.type()`, `.device()`, `.is_sparse()`, etc. All dispatching to the proper backend and function specialization happens at runtime in the `Type` and `TensorImpl` classes. So if even the libtorch developers were not able to make the `Tensor` class parameterized - not even by the scalar type, and in the less restrictive C++ template system - maybe we should not worry about it too much, either?

P.S. That said, I'd be delighted to see a strongly typed tensor library for C# - ideally something in the spirit of Haskell's HMatrix. The question is how much effort we want to put into it. (Maybe if we have a nice low-level API now, we can build an alternative strongly typed parameterized API on top of it later?) Just thinking out loud.
These are two different concerns, though. I think that the `Tensor<T>` wrapper for TorchSharp should follow the `Tensor<T>` paradigm, i.e., try to use generics. For the main TorchSharp story, I totally agree with you: it is fine to be non-generic and consistent with the Torch API, even if a generic API would be better (Issue #18 is exactly about this).
I was looking into following the approach of System.Numerics.Tensors for dispatching types, but it is not going to work in our case at the moment. The problem is that we don't have a root type for the tensors in TorchSharp (i.e., the `ITensorArithmetic<T>` in https://github.com/dotnet/corefx/blob/master/src/System.Numerics.Tensors/tests/TensorArithmetic.cs), therefore I am not able to create a singleton object for type dispatching.
That's just an internal interface, couldn't you create one?
I have updated the dependency to the NuGet, so this PR can now be merged (once @ericstj agrees).

/cc @tannergooding
…native memory point to a torch storage instead of directly to the data pointer.
I'm concerned with the overall approach in that you now have two tensor types exposed instead of just one. This PR creates Torch.SNT.*TorchTensor, but you still have TorchSharp.*Tensor. Is this PR a step along the way, or do you imagine having both of these tensor types remain publicly exposed?

I think that the idea was to have TorchSharp as a thin layer with minimal dependencies to provide a way to access Torch functionality from C#. Torch.SNT should instead try to back System.Numerics.Tensors with Torch tensors so that, on one side, we can exploit the efficient implementation of tensor operations that Torch provides, and on the other we can converge on "the" tensor type in .NET without providing a new one (TorchSharp.Tensor). I agree that in this way some of the functionality will have duplicates (e.g., NativeMemory vs. Storage, and two tensor types), but @migueldeicaza was clear that he wants to separate the two efforts (am I wrong?). @markusweimer, do you have any additional comment on this?
SNT/TypeGenerator.tt (Outdated)

```csharp
/// <summary>
~<#=tname#>NativeMemory ()
{
    Dispose (true);
```
Shouldn't this be false?
In this version I actually have to dispose the memory. If you look at line 146, I am retaining the storage pointer every time a new native memory is created (or pinned). Therefore I need to call `Dispose (true)` to properly decrement the reference count. I have to fix the comment :)
I don't believe you can safely do that. The GC does not guarantee that `storage` (another managed object) has not yet been finalized at the time the finalizer for this NativeMemory runs. Only one object should be in charge of maintaining the native state, and that object should be the one whose finalizer does the work. (P.S. consider #51 (comment))
There is something I am missing about the C# GC then. NativeMemory maintains a reference to `storage`; why would the GC destroy `storage` if its refcount > 0?
Both references go out of scope at the same time. This is documented behavior:

> The finalizers of two objects are not guaranteed to run in any specific order, even if one object refers to the other. That is, if Object A has a reference to Object B and both have finalizers, Object B might have already been finalized when the finalizer of Object A starts.

I suspect this has to do with GC performance and how it calculates live objects in a single pass and puts them on the finalization queue in the order they are encountered (not necessarily the order they appear in the object graph); it also likely has to do with not needing to deal with potentially deep object cycles. @Maoni0 could explain this better.
I see. This behavior is only related to finalizers, right?

> The exact time when the finalizer executes is undefined. To ensure deterministic release of resources for instances of your class, implement a Close method or provide an IDisposable.Dispose implementation.

Therefore I can do `Dispose (false)` in NativeMemory (Storage already has `Dispose (false)`) to avoid nondeterministic behavior, and use the Dispose method to make sure that NativeMemory objects are disposed before Storage (since they maintain a reference to it).
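The dispose pattern being converged on here is the standard .NET one, sketched below with an invented class name. The key rule from the discussion: the `disposing: false` path (the finalizer path) must not touch other managed objects such as a Storage reference, because finalization order between them is not guaranteed.

```csharp
using System;

// Standard dispose pattern: Dispose() gives deterministic cleanup and may
// release managed references; the finalizer passes disposing: false and
// may only release unmanaged resources owned directly by this object.
public class DisposePatternSketch : IDisposable
{
    private bool disposed;

    public bool IsDisposed => disposed;

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);  // finalizer no longer needed
    }

    ~DisposePatternSketch() => Dispose(false);

    protected virtual void Dispose(bool disposing)
    {
        if (disposed) return;  // Dispose must be safe to call more than once
        if (disposing)
        {
            // Safe to release managed references here (e.g. storage.Dispose()):
            // we know they have not been finalized yet.
        }
        // Release only directly owned unmanaged resources here; other managed
        // objects may already have been finalized on this path.
        disposed = true;
    }
}
```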
That sounds right. Of course in the case that no-one calls Dispose, the finalizer for storage will still clean up the native resources, therefore you can technically remove the finalizer from NativeMemory since it does nothing. You can also remove the public Dispose method unless you need to explicitly call it, since the base type implements IDisposable.
Did you consider unifying the storage and NativeMemory classes? I think that actually might reduce the number of object allocations per tensor and feels like the right conceptual pairing.
> Did you consider unifying the storage and NativeMemory classes? I think that actually might reduce the number of object allocations per tensor and feels like the right conceptual pairing.

Yes, I was looking into that, but it would require adding a dependency on the main TorchSharp project, which I believe @migueldeicaza and @markusweimer want to avoid.
> this will require to add a dependency to the main TorchSharp project

You're talking about a dependency on System.Memory? Seems like that's not such a bad dependency to have, since it would let TorchSharp use Span, but I can see why you may want to avoid it.

I think this is leading to confusion in how this NativeMemory class is implemented. I was taking a closer look, and now I see that Dispose is calling Free, but I don't think this is correct. It looks to me like you're using Free to represent "decrement a reference count", however the reference count may not have been previously incremented, since I don't see a case where this happens during construction. Since NativeMemory itself isn't the owner of the memory at all, I wonder whether it even makes sense to do anything in its Dispose 😕 - in fact, I don't even see where you call it.
In the Tensor/NativeMemory sample we were using the NativeMemory object as the "owner" of the memory. As such, it was the one implementing the reference count and it was the one responsible for eventually freeing that memory in its finalizer/dispose.
In this case, you're using NativeMemory as an adapter for your storage class which is ultimately the owner of the memory. The adapter acts as the mechanism for constructing spans over the storage.
Furthermore, you have added IDisposable to your derived Tensors, and it looks to me like this will dispose the backing torch tensor, which will dispose the backing storage. I would have imagined you'd instead want to decrement a reference count here and dispose the backing storage when the ref count reaches zero. As it's implemented, I think you'll end up freeing the memory behind a shallow-copied reshaped tensor.
Perhaps you can add some test cases to further flesh out how this should behave?
@tannergooding perhaps we could expand our Tensor samples to cover this scenario?
> It looks to me like you're using Free to represent "decrement a reference count" however the reference count may not have been previously incremented since I don't see a case where this is happening during construction.

I have pushed the call to Retain into the NativeMemory constructor. Before, it was outside, where the storage was retrieved from the tensor to create the NativeMemory object.

> Since the NativeMemory itself isn't the owner of the memory at all, I wonder if it even makes sense to do anything in its Dispose 😕: in fact I don't even see where you call this.

Dispose is required because NativeMemory extends MemoryManager.

> Furthermore you have added IDisposable to your derived Tensors and it looks to me like this will dispose the backing torch tensor which will dispose the backing storage. I would have imagined you'd instead want to have decremented a reference count here and dispose the backing storage when the ref count gets to zero. As its implemented I think you'll end up freeing the memory behind a shallow-copied reshaped tensor.

This is exactly the behavior I implemented (I hope :) ). The fact is that NativeMemory is required to dispose the storage, because there are cases where a span is retrieved and used outside a tensor context (e.g., in this test case).
@tannergooding owns S.N.T and can help make sure that the right integration is happening here. It's important to us that S.N.T is valuable for TorchSharp.
Moved Retain after the argument checks. Added some missing cases.
As we discussed face to face today, I think it's OK to proceed with this PR, but we should review it for things that we don't like so that we can revisit them in the design of Tensor.
This closes #3. We will need a different issue for the operators. This PR only extends the methods in the `DenseTensor<T>` class using Torch tensors as backends.