
Dense SNT Wrapper for TorchTensors #51

Merged (32 commits), Dec 3, 2018

Conversation

interesaaat
Contributor

This closes #3. We will need a different issue for the operators. This PR only extends the methods of the DenseTensor<T> class, using Torch tensors as the backend.

using System.Threading;
using System;

namespace Torch.SNT
Contributor

Should we be using non-descriptive names like 'SNT' in namespaces? It does not aid in discovery.

Contributor Author

I agree. Any suggestions? I can definitely unroll SNT as SystemNumericTensor, but it becomes really long.

SNT/NativeMemoryManager.cs (outdated thread, resolved)

<ItemGroup>
<PackageReference Include="System.Numerics.Tensors" Version="0.1.0-preview2-181031-3" />
<PackageReference Include="System.Runtime.CompilerServices.Unsafe" Version="4.5.2" />
Contributor

Since I don't do unsafe C# too often, I keep forgetting -- does it not complicate using the package from applications/services that don't allow unsafe code? Is there a way to avoid that?

Contributor Author

Mmm, I don't think I can avoid this. I need to map the unmanaged memory from TorchSharp into managed C# types, and I don't see how to do that without pointers.

Contributor Author

Unsafe is also already used in TorchSharp when returning the pointer to data.

Contributor

So, looking at this some more -- TorchSharp.NNTensor.Data takes an IntPtr and casts it to a pointer, which makes it unsafe code. Then, your code takes the pointer and casts it back to an IntPtr for storage. If Tensor.Data instead returns the IntPtr it got from native code, you shouldn't have to handle pointers, right?

Contributor Author

I can definitely remove some of the unsafe code that way. However, I will still need Unsafe because the Pin method takes an index as input, so I have to compute pointer + index to return the correct slice of memory. That is actually what the Unsafe dependency is used for.
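For illustration, a minimal sketch of the Pin override being discussed, assuming a MemoryManager<T>-derived NativeMemory that stores the base address as an IntPtr field (memory) together with an element count (length); the names are placeholders, not the PR's exact code:

// Unsafe.Add<T> offsets the base pointer by elementIndex without exposing
// C# pointer types in the public surface; this is the one remaining use of
// the System.Runtime.CompilerServices.Unsafe package once Data returns an
// IntPtr.
public override unsafe MemoryHandle Pin(int elementIndex = 0)
{
    if (elementIndex < 0 || elementIndex >= length)
        throw new ArgumentOutOfRangeException(nameof(elementIndex));

    void* pointer = Unsafe.Add<T>((void*)memory, elementIndex);
    return new MemoryHandle(pointer, default, this);
}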

Contributor

Okay, that's good. We can get rid of the unsafe attribute on TorchSharp, then. I have a PR coming with some of those changes.

SNT/TypeGenerator.tt (outdated thread, resolved)
int product = 1;
for (int i = startIndex; i < dimensions.Length; i++)
{
if (dimensions[i] < 0)
Contributor

Can we avoid this check by using unsigned integers for the dimensions?

Contributor Author

I think this is difficult to do, unless we can convince the SNT folks to use unsigned ints in their API. (Torch actually uses long, BTW.)
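For context, a plausible completion of the helper excerpted above (a sketch; the lines elided from the diff are assumptions):

public static int GetTotalLength(ReadOnlySpan<int> dimensions, int startIndex = 0)
{
    int product = 1;
    for (int i = startIndex; i < dimensions.Length; i++)
    {
        // Negative sizes are invalid; the check stays as long as the
        // System.Numerics.Tensors API keeps dimensions as signed ints.
        if (dimensions[i] < 0)
            throw new ArgumentOutOfRangeException(nameof(dimensions), "Dimensions must be non-negative.");

        // checked() makes an overflowing shape product throw instead of
        // silently wrapping around.
        product = checked(product * dimensions[i]);
    }

    return product;
}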

Test/FloatTorchTensorUnitTest.cs (outdated thread, resolved)
SNT/TypeGenerator.tt (outdated thread, resolved)
SNT/NativeMemoryManager.cs (outdated thread, resolved)

public unsafe NativeMemory(void* memory, int length)
{
this.memory = (IntPtr)memory;
Member

Can this conversion fail? Also, this constructor could call the other one.

Contributor Author

I redirected this constructor to the previous one. The conversion is OK; I checked online and it seems it cannot fail, since it simply stores the pointer value in a managed struct.
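The constructor chaining described here, as a short sketch:

// The unsafe overload just forwards to the IntPtr-based constructor; the
// (IntPtr)memory cast cannot fail, it only records the address value in a
// managed struct.
public unsafe NativeMemory(void* memory, int length)
    : this((IntPtr)memory, length)
{
}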

private IntPtr memory;
private int length;

public NativeMemory(IntPtr memory, int length)
Member

This probably should do some defensive parameter checks of memory not being null and such, and length being greater than 0.

Contributor Author (@interesaaat, Nov 6, 2018)

Length could be zero if you want to set up an empty tensor, I presume. I can add a check that length is not negative.
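A sketch of the agreed checks, keeping zero length legal for empty tensors:

public NativeMemory(IntPtr memory, int length)
{
    if (memory == IntPtr.Zero)
        throw new ArgumentNullException(nameof(memory));
    // Zero is allowed (an empty tensor); negative lengths are not.
    if (length < 0)
        throw new ArgumentOutOfRangeException(nameof(length));

    this.memory = memory;
    this.length = length;
}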

SNT/NativeMemoryManager.cs (outdated thread, resolved)
SNT/NativeMemoryManager.cs (outdated thread, resolved)
SNT/TypeGenerator.cs (outdated thread, resolved)
SNT/Utils.cs (outdated thread, resolved)
SNT/Utils.cs (outdated):
{
internal class Utils
{
public static int GetTotalLength(ReadOnlySpan<int> dimensions, int startIndex = 0)
Member

Also, couldn't this be a property on the Tensor object?

Contributor Author

There is a Length property in Tensor (which I will map to a call to Torch's nElement). But this is different because it is used before the tensor is created.

Test/FloatTorchTensorUnitTest.cs (outdated thread, resolved)
@@ -14,6 +14,7 @@

<ItemGroup>
<ProjectReference Include="..\TorchSharp\TorchSharp.csproj" />
<ProjectReference Include="..\SNT\SNT.csproj" />
Member

Maybe we should have our own test project for the SNT wrapper? That way, we keep the two layers completely separate.

Contributor Author

I agree. At this point we should address #34 and create a single test folder (either in source or on its own).

@interesaaat (Contributor Author)

Fixed all comments so far.

@migueldeicaza (Contributor)

Ouch - rerunning the tests still crashes. I suspect that this indicates a real memory corruption bug on this PR.

@interesaaat (Contributor Author), Nov 9, 2018

Ouch - rerunning the tests still crashes. I suspect that this indicates a real memory corruption bug on this PR.

Yes, there is something going on with the memory cleanup. I have spent the last day trying to figure it out with no luck. Now that we have libtorch running on Windows, I should be able to work out what is going on. Please don't merge until I am able to fix the bug.

@interesaaat (Contributor Author)

I think I have addressed all comments and fixed the test error (thanks to @NiklasGustafsson).

@NiklasGustafsson (Contributor)

Should we still hold off on merging this one?

@interesaaat (Contributor Author)

Should we still hold off on merging this one?

If you want, we can wait until the System.Numerics.Tensors NuGet package gets released. Besides that, I think it can be merged as is.

@migueldeicaza (Contributor)

If this relies on a third party NuGet that is not released, let us wait until then.

@markusweimer (Member)

If this relies on a third party NuGet that is not released, let us wait until then.

The NuGet package is available on MyGet. The .NET team is working on pushing it to NuGet.org soon. I'm OK holding off till that happens.

SNT/NativeMemoryManager.cs (outdated thread, resolved)
SNT/NativeMemoryManager.cs (outdated thread, resolved)
/// <summary>
/// Wrapper class used to surface a Torch <#=tname#>Tensor as a System.Numerics DenseTensor of <#=tname.ToLower()#>
/// </summary>
public sealed class <#=tname#>TorchTensor : DenseTensor<<#=tname.ToLower()#>>
Member

Can you summarize briefly why you needed to subclass DenseTensor rather than use it as is? Was it only so that you could make the Clone give you back a tensor with memory owned by torch rather than managed memory?

Contributor Author

I think there are two points:

  1. We want to be able to surface to users the Tensor<T> type backed by the Torch tensor implementation.
  2. It is not only for Clone: we want to use the fast tensor operations that Torch provides. Eventually we will also have DenseTensors backed by GPUs.

I don't think we can achieve these two points without subclassing (unless I am missing something).
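Condensed, the shape being discussed looks roughly like this (a sketch with assumed member names, shown for the float instantiation of the T4 template that appears later in this thread):

// The wrapper derives from DenseTensor<float> so it is a Tensor<T> to
// callers, while keeping a handle to the TorchSharp tensor so operations
// (and, eventually, GPU-backed storage) can be delegated to Torch.
public sealed class FloatTorchTensor : DenseTensor<float>
{
    internal FloatTensor TorchSharpTensor { get; }

    internal FloatTorchTensor(Memory<float> memory, ReadOnlySpan<int> dimensions, FloatTensor inner)
        : base(memory, dimensions)
    {
        TorchSharpTensor = inner;
    }
}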

Member

Thank you for this. Did you consider making your derived type generic as well?

Contributor Author

I think my initial implementation was generic, but we figured that a non-generic version was simpler. What do you think? Would a generic type be more compatible with the Tensor<T> design?

Member (@ericstj, Nov 16, 2018)

If you have generic types/methods, they enable generic callers; if you don't, that scenario is not possible (unless the caller writes a type switch). Ideally we want to handle the type switching in the libraries rather than force all the callers to do it.

We also try to design generic APIs so that the generic type can be inferred: this is why we have the static creation methods on Tensor (note: not Tensor<T>) rather than forcing folks to call the constructor.
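To illustrate that guideline (hypothetical names, not the actual API):

// The static creation method lives on a non-generic class, so T is
// inferred from the argument instead of being spelled out by the caller.
public static class Tensor
{
    public static Tensor<T> CreateFromArray<T>(T[] data) =>
        new DenseTensor<T>(data, new[] { data.Length });
}

// Usage: T = float is inferred.
var t = Tensor.CreateFromArray(new[] { 1.0f, 2.0f, 3.0f });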

Member

FWIW, the libtorch C++ API is not generic: it has a single non-parameterized at::Tensor class with methods like .type(), .device(), .is_sparse(), etc. All dispatching to the proper backend and function specialization happens at runtime in the Type and TensorImpl classes. So if even the libtorch developers did not parameterize the Tensor class, not even by scalar type and in C++'s less restrictive template system, maybe we should not worry about it too much either?

P.S. That said, I'd be delighted to see a strongly typed tensor library for C# - ideally something in the spirit of Haskell's HMatrix. The question is how much effort we want to put into it. (Maybe if we have a nice low-level API now, we can build an alternative strongly-typed parameterized API on top of it later?) Just thinking out loud.

Contributor Author

These are two different concerns, though. I think the Tensor<T> wrapper for TorchSharp should follow the Tensor<T> paradigm, i.e., try to use generics. For the main TorchSharp story, I totally agree with you: it is fine to be non-generic and consistent with the Torch API, even if a generic API would be better (and issue #18 is exactly for this).

Contributor Author

I was looking into following the approach of System.Numerics.Tensors for dispatching types, but it is not going to work in our case at the moment. The problem is that we don't have a root type for the tensors in TorchSharp (i.e., the equivalent of ITensorArithmetic<T> in https://github.com/dotnet/corefx/blob/master/src/System.Numerics.Tensors/tests/TensorArithmetic.cs), so I am not able to create a singleton object for type dispatching.

Member

That's just an internal interface; couldn't you create one?

Contributor Author

Not in this PR. We have issue #18 tracking this. I already tried to implement something along those lines in WIP #28, but it looks like we don't have agreement yet.

@interesaaat (Contributor Author)

I have updated the dependency to the NuGet package, so this PR can now be merged (once @ericstj agrees).

SNT/NativeMemoryManager.cs (outdated thread, resolved)
@ericstj (Member), Nov 16, 2018

/cc @tannergooding

(commit) …native memory point to a torch storage instead of directly to the data pointer.
@ericstj (Member), Nov 19, 2018

I'm concerned with the overall approach that you now have two tensor types exposed instead of just one. This PR creates Torch.SNT.*TorchTensor, but you still have TorchSharp.*Tensor. Is this PR a step along the way, or do you imagine having both of these tensor types remain publicly exposed?

@interesaaat (Contributor Author), Nov 19, 2018

I'm concerned with the overall approach that you now have two tensor types exposed instead of just one. This PR creates Torch.SNT.*TorchTensor, but you still have TorchSharp.*Tensor. Is this PR a step along the way, or do you imagine having both of these tensor types remain publicly exposed?

I think the idea was to have TorchSharp as a thin layer, with minimal dependencies, that provides access to Torch functionality from C#. Torch.SNT should instead back System.Numerics.Tensors with Torch tensors so that, on one side, we can exploit the efficient implementation of tensor operations that Torch provides, and on the other we can converge on "the" tensor type in .NET without introducing a new one (TorchSharp.Tensor). I agree that this way some functionality will be duplicated (e.g., NativeMemory and Storage, and two tensor types), but @migueldeicaza was clear that he wants to keep the two efforts separate (am I wrong?). @markusweimer, do you have any additional comments on this?

/// <summary>
~<#=tname#>NativeMemory ()
{
Dispose (true);
Member

Shouldn't this be false?

Contributor Author

In this version I actually have to dispose the memory. If you look at line 146, I retain the storage pointer every time a new NativeMemory is created (or pinned). Therefore I need to call Dispose (true) to properly decrement the reference count. I have to fix the comment :)

Member

I don't believe you can safely do that. The GC does not guarantee that storage (another managed object) has not already been finalized by the time the finalizer for this NativeMemory runs. Only one object should be in charge of maintaining the native state, and that object should be the one whose finalizer does the work. (P.S. consider #51 (comment))

Contributor Author

There is something I am missing about the C# GC, then. NativeMemory maintains a reference to storage; why would the GC finalize storage while the refcount is > 0?

Member

Both references go out of scope at the same time. This is documented behavior:

The finalizers of two objects are not guaranteed to run in any specific order, even if one object refers to the other. That is, if Object A has a reference to Object B and both have finalizers, Object B might have already been finalized when the finalizer of Object A starts.

I suspect this has to do with GC performance: it calculates live objects in a single pass and puts them on the finalization queue in the order they are encountered (not necessarily the order they appear in the object graph). It likely also has to do with not needing to deal with potentially deep object cycles. @Maoni0 could explain better.

Contributor Author

I see. This behavior is only related to finalizers, right?

The exact time when the finalizer executes is undefined. To ensure deterministic release of resources for instances of your class, implement a Close method or provide a IDisposable.Dispose implementation.

Therefore I can call Dispose (false) from the NativeMemory finalizer (Storage already has Dispose (false)) to avoid nondeterministic behavior, and use the Dispose method to make sure that NativeMemory objects are disposed before Storage (since they maintain a reference to it).
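As a sketch, the pattern converged on here (the per-type generated names are placeholders):

// The finalizer passes false and must not touch other managed objects such
// as storage, since finalization order is not guaranteed; Dispose(true)
// runs deterministically while storage is still known to be alive.
~FloatNativeMemory()
{
    Dispose(false);
}

protected override void Dispose(bool disposing)
{
    if (disposing)
    {
        storage.Free();   // deterministically decrement the Torch refcount
    }
}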

Member

That sounds right. Of course, in the case that no one calls Dispose, the finalizer for storage will still clean up the native resources; therefore you can technically remove the finalizer from NativeMemory, since it does nothing. You can also remove the public Dispose method unless you need to call it explicitly, since the base type implements IDisposable.

Did you consider unifying the storage and NativeMemory classes? I think that actually might reduce the number of object allocations per tensor and feels like the right conceptual pairing.

Contributor Author

Did you consider unifying the storage and NativeMemory classes? I think that actually might reduce the number of object allocations per tensor and feels like the right conceptual pairing.

Yes, I was looking into that, but it would require adding a dependency to the main TorchSharp project, which I believe @migueldeicaza and @markusweimer want to avoid.

Member (@ericstj, Nov 28, 2018)

this will require to add a dependency to the main TorchSharp project

You're talking about a dependency on System.Memory? Seems like that's not such a bad dependency to have, since it would let TorchSharp use Span, but I can see why you may want to avoid it.

I think this is leading to confusion in how this NativeMemory class is implemented. I was taking a closer look, and now I see that Dispose is calling Free, but I don't think this is correct. It looks to me like you're using Free to represent "decrement a reference count"; however, the reference count may not have been previously incremented, since I don't see this happening during construction. Since the NativeMemory itself isn't the owner of the memory at all, I wonder if it even makes sense to do anything in its Dispose 😕: in fact, I don't even see where you call this.

In the Tensor/NativeMemory sample we were using the NativeMemory object as the "owner" of the memory. As such, it was the one implementing the reference count and it was the one responsible for eventually freeing that memory in its finalizer/dispose.

In this case, you're using NativeMemory as an adapter for your storage class which is ultimately the owner of the memory. The adapter acts as the mechanism for constructing spans over the storage.

Furthermore, you have added IDisposable to your derived Tensors, and it looks to me like this will dispose the backing Torch tensor, which will dispose the backing storage. I would have imagined you'd instead want to decrement a reference count here and dispose the backing storage when the refcount gets to zero. As it's implemented, I think you'll end up freeing the memory behind a shallow-copied reshaped tensor.

Perhaps you can add some test cases to further flesh out how this should behave?

@tannergooding perhaps we could expand our Tensor samples to cover this scenario?

Contributor Author

It looks to me like you're using Free to represent "decrement a reference count" however the reference count may not have been previously incremented since I don't see a case where this is happening during construction.

I have pushed the call to Retain into the NativeMemory constructor. Previously it was outside, at the point where the storage was retrieved from the tensor to create the NativeMemory object.

Since the NativeMemory itself isn't the owner of the memory at all, I wonder if it even makes sense to do anything in its Dispose 😕: in fact I don't even see where you call this.

Dispose is required because NativeMemory extends MemoryManager<T>.

Furthermore you have added IDisposable to your derived Tensors and it looks to me like this will dispose the backing torch tensor which will dispose the backing storage. I would have imagined you'd instead want to have decremented a reference count here and dispose the backing storage when the ref count gets to zero. As its implemented I think you'll end up freeing the memory behind a shallow-copied reshaped tensor.

This is exactly the behavior I implemented (I hope :) ). The fact is that NativeMemory is required to dispose the storage because there are cases where a span is retrieved and used outside a tensor context (e.g., in this test case).
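A sketch of where the Retain call ended up (the type and method names follow the discussion and are assumptions):

// Every NativeMemory instance is paired with exactly one refcount
// increment on construction; the matching Free happens in Dispose, so a
// span handed out by this memory can outlive the tensor that produced it.
public FloatNativeMemory(FloatTensor.FloatStorage storage, int length)
{
    this.storage = storage;
    this.length = length;
    storage.Retain();
}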

@danmoseley (Member)

@tannergooding owns S.N.T and can help make sure that the right integration is happening here. It's important to us that S.N.T is valuable for TorchSharp.

SNT/TypeGenerator.cs (outdated thread, resolved)
@ericstj (Member) left a review comment

As we discussed face to face today, I think it's OK to proceed with this PR, but we should review it for things that we don't like so that we can revisit them in the design of Tensor.

Successfully merging this pull request may close these issues.

Implement System.Numerics.Tensor using libtorch