Dense SNT Wrapper for TorchTensors #51
Conversation
SNT/NativeMemoryManager.cs (Outdated)

```csharp
using System.Threading;
using System;

namespace Torch.SNT
```
Should we be using non-descriptive names like 'SNT' in namespaces? It does not aid in discovery.
I agree. Any suggestions? I can definitely unroll SNT as SystemNumericTensor, but it becomes really long.
```xml
<ItemGroup>
  <PackageReference Include="System.Numerics.Tensors" Version="0.1.0-preview2-181031-3" />
  <PackageReference Include="System.Runtime.CompilerServices.Unsafe" Version="4.5.2" />
```
Since I don't do unsafe C# too often, I keep forgetting -- doesn't this complicate using the package from applications/services that don't allow unsafe code? Is there a way to avoid that?
Mmm, I don't think I can avoid this. I need to map the unmanaged memory from TorchSharp into C#; I don't think I can avoid the usage of pointers.
Unsafe is also already used in TorchSharp when returning the pointer to data.
So, looking at this some more -- TorchSharp.NNTensor.Data takes an IntPtr and casts it to a pointer, which makes it unsafe code. Then your code takes the pointer and casts it back to an IntPtr for storage. If Tensor.Data instead returned the IntPtr it got from native code, you shouldn't have to handle pointers, right?
I can definitely remove some of the unsafe in that way. However, I will still need Unsafe because the Pin method accepts an index as input, and therefore I need to do pointer + index to return the correct slice of memory. That is actually what the Unsafe dependency is used for.
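For context, the pointer-plus-index requirement can be sketched roughly as below. This is a minimal illustration, not the PR's actual code: the class name, field names, and the bound check are all made up here; only the `MemoryManager<T>` members and `Unsafe.Add` are real APIs.

```csharp
using System;
using System.Buffers;
using System.Runtime.CompilerServices;

// Illustrative sketch: a MemoryManager<T> over an unmanaged buffer must
// offset the base pointer by an element index when Pin(elementIndex) is
// called, which is where System.Runtime.CompilerServices.Unsafe comes in.
public sealed unsafe class NativeMemorySketch<T> : MemoryManager<T> where T : unmanaged
{
    private readonly IntPtr memory;   // base address of the unmanaged buffer
    private readonly int length;      // number of T elements

    public NativeMemorySketch(IntPtr memory, int length)
    {
        this.memory = memory;
        this.length = length;
    }

    public override Span<T> GetSpan() => new Span<T>((void*)memory, length);

    public override MemoryHandle Pin(int elementIndex = 0)
    {
        if ((uint)elementIndex > (uint)length)
            throw new ArgumentOutOfRangeException(nameof(elementIndex));
        // pointer + index: advance the base pointer by elementIndex elements of T
        void* pointer = Unsafe.Add<T>((void*)memory, elementIndex);
        return new MemoryHandle(pointer);
    }

    public override void Unpin() { }  // native memory never moves, nothing to do
    protected override void Dispose(bool disposing) { }
}
```

Note that the caller never sees a raw pointer: `GetSpan()` and `Memory` expose the buffer safely, and only the internals need the unsafe context.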
Okay, that's good. We can get rid of the unsafe attribute on TorchSharp, then. I have a PR coming with some of those changes.
```csharp
int product = 1;
for (int i = startIndex; i < dimensions.Length; i++)
{
    if (dimensions[i] < 0)
```
Can we avoid this check by using unsigned integers for the dimensions?
I think this is difficult to do, unless we can convince the SNT folks to use unsigned ints in their API. (Torch actually uses long, BTW.)
SNT/NativeMemoryManager.cs (Outdated)

```csharp
public unsafe NativeMemory(void* memory, int length)
{
    this.memory = (IntPtr)memory;
```
Can this conversion fail? Also, this constructor could call the other one.
I redirected this constructor to the previous one. The conversion is OK: I checked, and it cannot fail, since it simply stores the raw pointer value in an IntPtr struct.
SNT/NativeMemoryManager.cs (Outdated)

```csharp
private IntPtr memory;
private int length;

public NativeMemory(IntPtr memory, int length)
```
This probably should do some defensive parameter checks: `memory` not being null and such, and `length` being greater than 0.
Length could be zero if you want to set up an empty tensor, I presume. I can add a check that length is not negative.
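The validation being discussed might look something like the sketch below (hypothetical class name and messages; the actual constructor body in the PR may differ). The key point is that a zero length stays legal while negative lengths and null pointers over non-empty buffers are rejected up front.

```csharp
using System;

// Sketch of defensive argument checks for a native-memory constructor:
// length == 0 is allowed (an empty tensor), a negative length is not,
// and a null pointer is only acceptable when the buffer is empty.
public sealed class NativeMemoryArgsSketch
{
    public IntPtr Memory { get; }
    public int Length { get; }

    public NativeMemoryArgsSketch(IntPtr memory, int length)
    {
        if (length < 0)
            throw new ArgumentOutOfRangeException(nameof(length), "length must be non-negative");
        if (memory == IntPtr.Zero && length > 0)
            throw new ArgumentNullException(nameof(memory), "a non-empty buffer needs a valid pointer");
        Memory = memory;
        Length = length;
    }
}
```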
SNT/Utils.cs (Outdated)

```csharp
{
    internal class Utils
    {
        public static int GetTotalLength(ReadOnlySpan<int> dimensions, int startIndex = 0)
```
Also, couldn't this be a property on the Tensor object?
There is a Length property on Tensor (which I will map to a call to Torch's nElement). But this is different, because it is used before the tensor is created.
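Piecing together the fragment shown above, the complete helper plausibly looks like the following sketch (the loop and signature match the diff; the exception message and the closing of the method are filled in here and may not match the PR exactly):

```csharp
using System;

internal static class UtilsSketch
{
    // Total number of elements implied by a shape: the product of all
    // dimensions from startIndex onward. An empty range yields 1, which
    // is the conventional identity for a product.
    public static int GetTotalLength(ReadOnlySpan<int> dimensions, int startIndex = 0)
    {
        int product = 1;
        for (int i = startIndex; i < dimensions.Length; i++)
        {
            if (dimensions[i] < 0)
                throw new ArgumentOutOfRangeException(nameof(dimensions), "dimensions must be non-negative");
            product *= dimensions[i];
        }
        return product;
    }
}
```

For example, a shape of `{2, 3, 4}` yields 24 total elements, and starting at index 1 yields 12 (the element count of one outermost slice).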
```diff
@@ -14,6 +14,7 @@
 <ItemGroup>
   <ProjectReference Include="..\TorchSharp\TorchSharp.csproj" />
+  <ProjectReference Include="..\SNT\SNT.csproj" />
```
Maybe we should have our own test project for the SNT wrapper? That way, we keep the two layers completely separate.
I agree. At this point we should address #34 and create a unique test folder (either in source or on its own).
Fixed all comments so far.

Ouch - rerunning the tests still crashes. I suspect that this indicates a real memory corruption bug in this PR.

Yes, there is something going on with the memory cleanup. I have spent the last day trying to figure this out with no luck. Now that we have libtorch running on Windows, I should be able to figure out what is going on. Please don't merge until I am able to fix the bug.

I think I have addressed all comments and fixed the test error (thanks to @NiklasGustafsson).

Should we still hold off on merging this one?

If you want, we can wait until the System.Numerics.Tensors NuGet gets released. Besides that, it can be merged as is, I think.

If this relies on a third-party NuGet that is not released, let us wait until then.

The NuGet is available on MyGet. The .NET team is working on pushing it to NuGet.org soon. I'm OK holding off till that happens.
SNT/TypeGenerator.tt (Outdated)

```csharp
/// <summary>
/// Wrapper class used to surface a Torch <#=tname#>Tensor as a System.Numerics DenseTensor of <#=tname.ToLower()#>
/// </summary>
public sealed class <#=tname#>TorchTensor : DenseTensor<<#=tname.ToLower()#>>
```
Can you summarize briefly why you needed to subclass DenseTensor rather than use it as is? Was it only so that you could make the Clone give you back a tensor with memory owned by torch rather than managed memory?
I think there are two points:

- We want to be able to surface to users the `Tensor<T>` type backed by the Torch tensor implementation.
- It is not only for Clone: we want to use the fast operations over tensors that Torch provides. Eventually we will also have DenseTensors backed by GPUs.

I don't think we can achieve these two points without subclassing (unless I am missing something).
Thank you for this. Did you consider making your derived type generic as well?
I think my initial implementation was generic, but we figured that a non-generic version was simpler. What do you think? Do you think that a generic type will be more "compatible" with the `Tensor<T>` design?
If you have generic types/methods, they enable generic callers; if you don't, that scenario is not possible (unless the caller writes a type switch themselves). Ideally we want to handle the type switching in the libraries rather than force all the callers to do it.

We also try to design generic APIs so that the generic type can be inferred: this is why we have the static creation methods on Tensor (note: not Tensor<T>) rather than forcing folks to call the constructor.
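The inference-friendly pattern described here can be sketched as follows. The names `TensorFactorySketch` and `SimpleTensor<T>` are invented for illustration; the real System.Numerics.Tensors API exposes this pattern through a non-generic `Tensor` static class.

```csharp
// Sketch of the pattern: a non-generic static class hosts the generic
// Create methods, so the compiler infers <T> from the arguments and
// callers never have to spell the type parameter out.
public static class TensorFactorySketch
{
    public static SimpleTensor<T> Create<T>(T[] data) => new SimpleTensor<T>(data);
}

// A stand-in generic tensor type; constructing it directly would force
// the caller to write the type argument explicitly.
public sealed class SimpleTensor<T>
{
    public T[] Data { get; }
    public SimpleTensor(T[] data) => Data = data;
}
```

Usage: `var t = TensorFactorySketch.Create(new[] { 1.0, 2.0 });` infers `T = double`, whereas `new SimpleTensor<double>(...)` would require naming the type.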
FWIW, the libtorch C++ API is not generic - that is, it has a single non-parameterized `at::Tensor` class with methods like `.type()`, `.device()`, `.is_sparse()`, etc. All dispatching to the proper backend and function specialization happens at runtime in the `Type` and `TensorImpl` classes. So if even the libtorch developers were not able to make the `Tensor` class parameterized - not even by the scalar type, and in the less restrictive C++ template system - maybe we should not worry about it too much, either?

P.S. That said, I'd be delighted to see a strongly typed tensor library for C# - ideally something in the spirit of Haskell's HMatrix. The question is how much effort we want to put into it. (Maybe if we have a nice low-level API now, we can build an alternative strongly typed parameterized API on top of it later?) Just thinking out loud.
These are two different concerns, though. I think that the `Tensor<T>` wrapper for TorchSharp should follow the `Tensor<T>` paradigm, i.e., try to use generics. For the main TorchSharp story, I totally agree with you: it is fine to be non-generic and consistent with the Torch API, even if a generic API would be better (Issue #18 is exactly about this).
I was looking into following the approach of System.Numerics.Tensors for dispatching types, but it is not going to work in our case at the moment. The problem is that we don't have a root type for the tensors in TorchSharp (i.e., the `ITensorArithmetic<T>` in https://github.com/dotnet/corefx/blob/master/src/System.Numerics.Tensors/tests/TensorArithmetic.cs), therefore I am not able to create a singleton object for type dispatching.
That's just an internal interface, couldn't you create one?
I have updated the dependency to the NuGet, so this PR can now be merged (once @ericstj agrees).

/cc @tannergooding
…native memory point to a torch storage instead of directly to the data pointer.
I'm concerned with the overall approach in that you now have two tensor types exposed instead of just one. This PR creates Torch.SNT.*TorchTensor, but you still have TorchSharp.*Tensor. Is this PR a step along the way, or do you imagine having both of these tensor types remain publicly exposed?

I think that the idea was to have TorchSharp as a thin layer with minimal dependencies to provide a way to access Torch functionality from C#. Torch.SNT should instead try to back System.Numerics.Tensors with Torch tensors so that, on one side, we can exploit the efficient implementation of tensor operations that Torch provides, and on the other we can converge on "the" tensor type in .NET without providing a new one (TorchSharp.Tensor). I agree that in this way some of the functionality will have duplicates (e.g., NativeMemory vs. Storage, and two tensor types), but @migueldeicaza was clear that he wants to separate the two efforts (am I wrong?). @markusweimer, do you have any additional comment on this?
SNT/TypeGenerator.tt (Outdated)

```csharp
/// <summary>
~<#=tname#>NativeMemory ()
{
    Dispose (true);
```
Shouldn't this be false?
In this version I actually have to dispose the memory. If you look at line 146, I am retaining the storage pointer every time a new native memory is created (or pinned). Therefore I need to call `Dispose (true)` to properly decrement the reference count. I have to fix the comment :)
I don't believe you can safely do that. The GC does not guarantee that `storage` (another managed object) has not yet been finalized at the time the finalizer for this NativeMemory runs. Only one object should be in charge of maintaining the native state, and that object should be the one whose finalizer does the work. (P.S. consider #51 (comment))
There is something I am missing about the C# GC then. NativeMemory maintains a reference to `storage`; why would the GC destroy `storage` if its refcount > 0?
Both references go out of scope at the same time. This is documented behavior:

> The finalizers of two objects are not guaranteed to run in any specific order, even if one object refers to the other. That is, if Object A has a reference to Object B and both have finalizers, Object B might have already been finalized when the finalizer of Object A starts.

I suspect this has to do with GC performance and how it calculates live objects in a single pass and puts them on the finalization queue in the order they are encountered (not necessarily the order they appear in the object graph); it also likely has to do with not needing to deal with potentially deep object cycles. @Maoni0 could explain this better.
I see. This behavior is only related to finalizers, right?

> The exact time when the finalizer executes is undefined. To ensure deterministic release of resources for instances of your class, implement a Close method or provide an IDisposable.Dispose implementation.

Therefore I can do `Dispose (false)` in NativeMemory (Storage already has `Dispose (false)`) to avoid nondeterministic behavior, and use the Dispose method to make sure that NativeMemory objects are disposed before Storage (since they maintain a reference to it).
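The dispose pattern being converged on here is the standard .NET one, sketched below with an invented class name. The key rule from the discussion: the `disposing: false` path (the finalizer path) must not touch other managed objects such as a Storage reference, because finalization order between them is not guaranteed.

```csharp
using System;

// Standard dispose pattern: Dispose() gives deterministic cleanup and may
// release managed references; the finalizer passes disposing: false and
// may only release unmanaged resources owned directly by this object.
public class DisposePatternSketch : IDisposable
{
    private bool disposed;

    public bool IsDisposed => disposed;

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);  // finalizer no longer needed
    }

    ~DisposePatternSketch() => Dispose(false);

    protected virtual void Dispose(bool disposing)
    {
        if (disposed) return;  // Dispose must be safe to call more than once
        if (disposing)
        {
            // Safe to release managed references here (e.g. storage.Dispose()):
            // we know they have not been finalized yet.
        }
        // Release only directly owned unmanaged resources here; other managed
        // objects may already have been finalized on this path.
        disposed = true;
    }
}
```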
That sounds right. Of course in the case that no-one calls Dispose, the finalizer for storage will still clean up the native resources, therefore you can technically remove the finalizer from NativeMemory since it does nothing. You can also remove the public Dispose method unless you need to explicitly call it, since the base type implements IDisposable.
Did you consider unifying the storage and NativeMemory classes? I think that actually might reduce the number of object allocations per tensor and feels like the right conceptual pairing.
> Did you consider unifying the storage and NativeMemory classes? I think that actually might reduce the number of object allocations per tensor and feels like the right conceptual pairing.

Yes, I was looking into that, but it would require adding a dependency on the main TorchSharp project, which I believe @migueldeicaza and @markusweimer want to avoid.
> this will require to add a dependency to the main TorchSharp project

You're talking about a dependency on System.Memory? Seems like that's not such a bad dependency to have, since it would let TorchSharp use Span, but I can see why you may want to avoid it.

I think this is leading to confusion in how this NativeMemory class is implemented. I was taking a closer look, and now I see that Dispose is calling Free, but I don't think this is correct. It looks to me like you're using Free to represent "decrement a reference count", however the reference count may not have been previously incremented, since I don't see a case where this happens during construction. Since NativeMemory itself isn't the owner of the memory at all, I wonder whether it even makes sense to do anything in its Dispose 😕 - in fact, I don't even see where you call it.
In the Tensor/NativeMemory sample we were using the NativeMemory object as the "owner" of the memory. As such, it was the one implementing the reference count and it was the one responsible for eventually freeing that memory in its finalizer/dispose.
In this case, you're using NativeMemory as an adapter for your storage class which is ultimately the owner of the memory. The adapter acts as the mechanism for constructing spans over the storage.
Furthermore, you have added IDisposable to your derived Tensors, and it looks to me like this will dispose the backing torch tensor, which will dispose the backing storage. I would have imagined you'd instead want to decrement a reference count here and dispose the backing storage when the ref count reaches zero. As it's implemented, I think you'll end up freeing the memory behind a shallow-copied reshaped tensor.
Perhaps you can add some test cases to further flesh out how this should behave?
@tannergooding perhaps we could expand our Tensor samples to cover this scenario?
> It looks to me like you're using Free to represent "decrement a reference count" however the reference count may not have been previously incremented since I don't see a case where this is happening during construction.

I have pushed the call to Retain into the NativeMemory constructor. Before, it was outside, where the storage was retrieved from the tensor to create the NativeMemory object.

> Since the NativeMemory itself isn't the owner of the memory at all, I wonder if it even makes sense to do anything in its Dispose 😕: in fact I don't even see where you call this.

Dispose is required because NativeMemory extends MemoryManager.

> Furthermore you have added IDisposable to your derived Tensors and it looks to me like this will dispose the backing torch tensor which will dispose the backing storage. I would have imagined you'd instead want to have decremented a reference count here and dispose the backing storage when the ref count gets to zero. As its implemented I think you'll end up freeing the memory behind a shallow-copied reshaped tensor.

This is exactly the behavior I implemented (I hope :) ). The fact is that NativeMemory is required to dispose the storage, because there are cases where a span is retrieved and used outside a tensor context (e.g., in this test case).
@tannergooding owns S.N.T and can help make sure that the right integration is happening here. It's important to us that S.N.T is valuable for TorchSharp.
Moved Retain after the argument checks. Added some missing cases.
As we discussed face to face today, I think it's OK to proceed with this PR, but we should review it for things that we don't like so that we can revisit them in the design of Tensor.
This closes #3. We will need a different issue for the operators. This PR only extends the methods in the `DenseTensor<T>` class using Torch tensors as backends.