Help with implementing / learning TorchSharp. #980
Extracted comments, continued (tags added for the participants):

NiklasGustafsson Apr 14, 2023
Ah, yes -- something really hard to start out with! :-) PyTorch relies on Python pickling for saving model and optimizer state. That is a magical serialization format, but it is tightly coupled to the Python object model and runtime. There are libraries like https://github.com/irmen/pickle that can handle unpickling in C#, with limitations, but they don't unpickle classes as classes; they restore them as Dictionary<string,object>, which is not sufficient for our needs -- modules need to have their logic restored, too, including all the calls to native code. So, we have had to rely on two separate solutions for sharing module state (weights + buffers) between Python and .NET:
There are two articles under the 'Wiki' header that cover these topics. Hopefully, those are sufficient to get you started. If not, please let me know where the information gaps are. https://github.com/dotnet/TorchSharp/wiki/Sharing-Model-Data-between-PyTorch-and-TorchSharp There's also a discussion of serialization in one of the tutorials: https://github.com/dotnet/TorchSharpExamples/blob/main/tutorials/CSharp/tutorial6.ipynb

michieal Apr 14, 2023
Thank you. I really wondered what the "pickle" tag on the models meant. Also, since I have the Python code for LLaMA, from Meta, wouldn't I have the structures/logic? I also have a C/C++ implementation of it (llama.cpp)... though, that converts the data model into a ggml format. I have a few different pth files that I have downloaded. I'd fully convert the C++ to C#, but as you may know, that is a serious pain... Though, one point that I do like about the llama.cpp implementation: they have a PR that mmaps the file, so that the OS loads it in the background and it uses less memory than reading all of it in. So, instead of a 24 GB memory footprint, it has a ~4 GB memory footprint. Also, I am working through the links that you gave me. If you have further suggestions on this, I'd greatly appreciate them!

NiklasGustafsson Apr 14, 2023
Yes, if you have the source code, then you're golden. From there, it's "just work" to translate it to C#, and then you can load weights and buffers. Please note that you have to be meticulous about the translation -- the fields have to have exactly the same names, the module/submodule hierarchy has to be exactly the same, etc. Also, since you'll be in the business of translating a serious model with serious memory requirements, make sure to read the Wiki article on memory management.

michieal Apr 14, 2023
Yeah, I was thinking that method three is what I am going to have to cut my teeth on. Nothing quite like jumping head first into the deep end of the pool, lmao.
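The pickling constraint described above can be seen with plain, stdlib-only Python (no torch involved; TinyModule is a made-up stand-in for a model class): unpickling an instance works only because Python can re-import the class and reattach its methods, which is exactly what a generic C# unpickler cannot do.

```python
import pickle

class TinyModule:
    """Stand-in for a module: state (w) plus logic (forward)."""
    def __init__(self, w):
        self.w = w

    def forward(self, xs):
        return [self.w * x for x in xs]

m = TinyModule(2.0)
blob = pickle.dumps(m)     # serializes a class *reference* plus instance state
m2 = pickle.loads(blob)    # works: Python re-imports TinyModule to restore the logic
print(m2.forward([1, 2]))  # [2.0, 4.0]
```

A C# unpickler that cannot resolve the class reference is left with only the state dictionary, which is why weights are exchanged through a separate format instead.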
NiklasGustafsson Apr 14, 2023
Yes, that's the only reasonable one to use.

NiklasGustafsson Apr 14, 2023
Also, make sure that you follow the protocol for custom module construction: https://github.com/dotnet/TorchSharp/wiki/Creating-Your-Own-TorchSharp-Modules Without RegisterComponents toward the end, nothing will work. If you want to move a module to GPU, you need to do that after the call to RegisterComponents.

michieal Apr 14, 2023
Okay, I did notice that the code tries to use the GPU... (CUDA, specifically.) If that is not available on the target machine, will TS route it over to the CPU instead? I mean, I have a Radeon, not an NVIDIA card...

NiklasGustafsson Apr 14, 2023
TorchSharp currently only supports CPU and CUDA. We haven't had the resources to test with the Linux ROCm backend. That'll be a great thing for someone to contribute... :-) In TorchSharp, all non-TorchScript modules start out on CPU, and you have to move them explicitly to GPU. That will fail if you don't have a GPU (to be more precise, if you don't have a GPU backend).

michieal Apr 14, 2023
I see... hmm, that makes things difficult. The last release of the Linux ROCm Radeon drivers, well, was a cluster-f***. So, at the moment, I don't have it installed, as it breaks the system. But, once they release a working driver, I'd actually be willing to be a testing guinea pig. The Meta LLaMA code tries to do CUDA processing... so, if I don't move it to the GPU... will it still run on the CPU? It also uses sentencepiece to do some processing; where do I find a C# equivalent for that?

NiklasGustafsson Apr 14, 2023
Yes, it will. Unless you save it using tracing on TorchScript while the weights are on CUDA (this is a TorchScript limitation). Always call torch.cuda.is_available() before trying to move it, unless you happen to know that you have a CUDA GPU.

NiklasGustafsson Apr 14, 2023
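The construction rules above (RegisterComponents at the end of the constructor, then an explicit, guarded move to CUDA) can be sketched like this; MyModule is a hypothetical custom module, not part of TorchSharp:

```csharp
// Hedged sketch -- MyModule is made up; torch.cuda.is_available() and
// Module.to(Device) are the TorchSharp calls referred to in the thread.
var model = new MyModule();     // its constructor must end with RegisterComponents()

// Modules start out on CPU; move explicitly, and only if a CUDA backend exists.
if (torch.cuda.is_available()) {
    model.to(torch.CUDA);
}
```

Without the guard, the move throws on machines that have no CUDA backend installed; with it, the model simply stays on CPU.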
TorchSharp is scoped to providing .NET bindings to libtorch, so any tokenization libraries, especially ones that include native code (which I believe sentencepiece does), are unfortunately outside the scope of this library. @luisquintanilla (ML.NET PM) may have some thoughts on suitable tokenizers to use in place of sentencepiece.

michieal Apr 14, 2023
Understandable. Hopefully, they will know.

@GeorgeS2019 Apr 14, 2023:

NiklasGustafsson Apr 14, 2023
BlingFire may also be useful, and it's on NuGet: https://github.com/microsoft/BlingFire

michieal Apr 14, 2023
Thank you, both of you! This is definitely helpful!

michieal Apr 20, 2023
Okay, so... I didn't die... but wow, the migraine, lol. The GitHub versions of SentencePiece (wrappers) are seriously incomplete, especially for something that wants to know information about the vocab & tokenizer model. So, I had to make my own wrapper, based on the original SentencePiece C++ code. I will probably go through later and expose the training features, but for now I don't think I need them. Also, I am going to copypasta this to a new issue, so that it's not buried. (Edit: hence, this issue and comments) |
So, question: this code:

class RMSNorm(torch.nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def _norm(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

    def forward(self, x):
        output = self._norm(x.float()).type_as(x)
        return output * self.weight

Which I translated to look like this:

class RMSNorm : nn.Module<torch.Tensor, torch.Tensor> {
    public float eps { get; }
    public Parameter weight { get; }

    public RMSNorm(int Dim, float Eps = 1e-6f) : base(nameof(RMSNorm)) {
        this.eps = Eps;
        weight = nn.Parameter(torch.ones(Dim));
        register_parameter("weight", weight);
        RegisterComponents();
    }

    public torch.Tensor Norm(torch.Tensor x) {
        return x * torch.rsqrt(x.pow(2).mean(new long[] {-1}, true) + eps);
    }

    /*
    def forward(self, x):
        output = self._norm(x.float()).type_as(x)
        return output * self.weight
    */
    public override torch.Tensor forward(torch.Tensor x) {
        using (var scope = torch.NewDisposeScope()) {
            // x.float() in Python is a dtype conversion, not ToScalar()
            var output = Norm(x.to_type(torch.float32)).type_as(x);
            // move the result out of the scope so it isn't disposed on return
            return (output * weight).MoveToOuterDisposeScope();
        }
    }
} Note the |
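As a numeric sanity check on what the _norm method computes (stdlib Python only, no torch; rms_norm here is an illustrative re-implementation, not library code): RMSNorm scales each element by the reciprocal root-mean-square of the vector.

```python
import math

def rms_norm(xs, eps=1e-6):
    # x * rsqrt(mean(x^2) + eps), mirroring the _norm method above
    inv_rms = 1.0 / math.sqrt(sum(x * x for x in xs) / len(xs) + eps)
    return [x * inv_rms for x in xs]

out = rms_norm([3.0, 4.0])         # mean(x^2) = 12.5, rms ~= 3.5355
print([round(v, 4) for v in out])  # [0.8485, 1.1314]
```

The learned weight in the module then rescales this normalized output elementwise.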
Properties and parameters don't go well together -- the underlying private field will be treated as the actual parameter by 'RegisterComponents()'. The safest thing is to declare the private field explicitly and then adjust the get and set of the property to use that field. Like C# 1.0 or something... :-) Unless you have a very compelling reason for the parameter to be public, I would just keep it in a private field. |
If you really need the parameter to be public and a property, then it's best to follow the pattern used in this version of Linear, which is from a WIP PR that replaces native code implementations with managed code: |
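A hedged sketch of that field-backed pattern (illustrative only, not the actual Linear source from the PR; MyLinear and its members are made up):

```csharp
// Back the public property with an explicit private field so that
// RegisterComponents() sees a single, predictably-named parameter.
class MyLinear : nn.Module<torch.Tensor, torch.Tensor>
{
    private readonly Parameter weight;   // this is what gets registered

    public Parameter Weight => weight;   // read-only public view

    public MyLinear(long inputSize, long outputSize) : base(nameof(MyLinear))
    {
        weight = nn.Parameter(torch.randn(outputSize, inputSize));
        RegisterComponents();            // after all fields are assigned
    }

    public override torch.Tensor forward(torch.Tensor x)
        => x.matmul(weight.t());
}
```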
@michieal -- just checking in. How's your progress on learning TorchSharp? |
@michieal -- should I close this issue? |
Sorry, I had a confluence of life events cause me some issues. So, I had to step back some. I am still interested in / working on this. So, please keep this open. (Also, thank you for asking before just closing it.) |
@michieal |
@NiklasGustafsson @GeorgeS2019 , |
We need the dotnet version of transformers😐 |
I don't quite understand why the C# version of the code can't obviously be more streamlined. Why did the library designers make it look so different from Python, and so redundant?
@sgf -- thanks for the feedback. It's not clear to me what you are referring to when you mention it looking very different from Python, and very redundant. We have made a lot of effort (and not all .NET developers like it) to make it look as much like the Python APIs as possible. Some examples of what you are thinking about would be helpful. |
@NiklasGustafsson What I mean is the code of the tutorials and examples, like this:
and this code:
|
@sgf -- thanks for those examples. In these examples, what is it that you are wondering about / commenting on? |
Are you still in need of assistance learning TorchSharp? |
I am just now, as in this week, getting somewhere in recovering from a medical issue.
I am in the process of downloading the latest version of Meta's Llama. Which I will need help with. But as to the issue, you can close it, if you are willing to respond to new comments on it.
I apologize for taking so long to get back to you on this. I do plan to be a lot more active on / with TorchSharp.
|
@michieal Hope you are far along in recovery. We now have more users here doing MiniGPT or variants in TorchSharp |
@michieal -- I will close this for now. If you have more questions and/or ideas, please open an issue, or add a topic to the discussions. |
Understood. Sorry for keeping this open and unattended for so long. I do have a question, is there anything on the horizon about this working with the Godot Engine? |
Yes...I tested it using Godot 4 .net 6 |
awesome! |
If you know Godot, start a new project with .NET 6, take any unit test available here, and attach it to a node. Reference TorchSharp. You do need to know Godot. |
Thank you. |
You need to join the Godot community, e.g. the Discord. Just say you are a Unity refugee; people will help you. |
I use Linux, but I have the engine / editor installed. |
Yeah, lol, I'm there. |
I might be joining the conversation a bit late, but I wanted to share my weekend project with everyone here in case it could be beneficial.
|
@LittleLittleCloud Hi Zhang Xiaoyun, thanks for your contribution. We value it! |
Extracted from another issue: (note: some typos may be fixed in the extraction.)
NiklasGustafsson Apr 11, 2023
Last time I checked, both the tutorials and examples at dotnet/TorchSharpExamples built and ran on Linux and MacOS, as well as Windows. If that has regressed, please file a bug in that repo.
TorchSharp is a thin library on top of libtorch, and the API design was done to make it straightforward to build on the plethora of Python-based examples and tutorials that are out there, since we do not have the resources to create our own.
I can (most of the time) copy-and-paste tensor and module expressions from Python into C#, but there are inherent differences that cannot be overcome without programmer involvement:
I'm not sure what the old API you are referring to is -- we switched to Python-like naming conventions, following the SciSharp community in that regard, and the PyTorch scope hierarchy (which forced us to use a lot of static classes everywhere) a very long time ago. The examples and tutorials online at dotnet/TorchSharpExamples do not use the old version of the APIs.
Here are some other resources:
https://github.com/dotnet/TorchSharp/wiki
https://github.com/dotnet/TorchSharpExamples/tree/main/src
https://github.com/dotnet/TorchSharpExamples/tree/main/tutorials/CSharp
michieal Apr 14, 2023
Well, ideally, I would like to use it to make a LLaMA or Alpaca implementation in C#. But the "simple test" that I used to "get to know this" was this code:
I mean, the Python script works; I tested it earlier. I would like to make a C# version of this, so I don't have to have everything hard-coded. But, the other day, I couldn't even do that much.
michieal Apr 14, 2023
You mentioned loading the modules using module(x)... can you tell me more about that?
I think that was one of the main points of failure that I experienced. There's nothing up front that says to do that (that I saw), and then there's also the problem that one has no idea what modules the command can load, or what x should be. Is it a string? Is it...? etc.
NiklasGustafsson Apr 14, 2023
When you pass data into a module in Python, you treat it as a callable object, and the forward method is called:
Since C# has no operator() that can be overloaded, unlike C++, we cannot replicate that syntax in C# without resorting to dynamic, which we don't want to do. Therefore, you have to call call() on the module, which allows hooks to be invoked, or you can call forward() directly.
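The Python side of that difference is just __call__ delegating to forward. Here is a stdlib-only sketch of the mechanism (not PyTorch's actual implementation, which also runs registered hooks):

```python
class ModuleBase:
    """Minimal stand-in for nn.Module's callable behavior."""
    def __call__(self, *args):
        # PyTorch's real __call__ also invokes pre- and post-hooks here
        return self.forward(*args)

class Doubler(ModuleBase):
    def forward(self, x):
        return 2 * x

m = Doubler()
print(m(21))  # 42 -- calling the object invokes forward
```

This is the operator-overload-like syntax that C# cannot express, hence TorchSharp's explicit call()/forward() split.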
michieal Apr 14, 2023
okay, the code has statements like these: class FeedForward(nn.Module): and in it, it defines a forward function... I know that TS does forward functions (I've read at least that much, lol)... Is this where I load a module using module(x), or is this creating a new module?
I guess, I am asking how to interpret some of the python, to know when to use the module command.
NiklasGustafsson Apr 14, 2023
That just means that FeedForward is derived from nn.Module. In C#, you should derive from one of the Module<T...> classes, preferably. The <T...> signature determines the signature of the forward function, which contains the logic of the forward pass. The backward pass is determined via autograd in the native runtime.
So, in Python you would call the forward function directly, if you want, but you usually treat the module class as a callable, i.e. a function-like object. Think of the Python module as having overloaded operator(...) by defining forward(...)
michieal Apr 14, 2023
ahhh okay. I was just checking on that, what about the use of a delegate to call it like a function? (asking because it was suggested.)
But, when I go to build those parts, I define it as a class that derives from nn.Module<type, type>, and fill in the two types specifically from the forward function's two types... correct?
NiklasGustafsson Apr 14, 2023
Technically, it's
Module<T, TResult>
, where TResult is the return type of forward and T is any number of types that form its signature (I think we have it defined up to <T1,...,T6,TResult>). There are some modules that have multiple forward signatures (Python deals with this dynamically), in which case you have to mix in IModule<T,...,TResult> for anything you don't consider mainstream. Specifically, Sequential only accepts IModule<Tensor,Tensor> components, so if your module has that (which most do), then that should be the main one. That said, multiple forwards is an uncommon situation.
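A hedged sketch of what a multi-argument forward looks like under those generics (AddAndScale is made up, not a TorchSharp module):

```csharp
// A module whose forward takes two tensors: nn.Module<T1, T2, TResult>.
class AddAndScale : nn.Module<torch.Tensor, torch.Tensor, torch.Tensor>
{
    private readonly double scale;

    public AddAndScale(double scale) : base(nameof(AddAndScale))
    {
        this.scale = scale;
        RegisterComponents();
    }

    public override torch.Tensor forward(torch.Tensor a, torch.Tensor b)
        => (a + b) * scale;
}

// Note: this module cannot be placed in a Sequential, which only
// accepts IModule<Tensor, Tensor> components.
```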
michieal Apr 14, 2023
the code only has single forward statements per class def. Which, I guess, would be the module definition?
NiklasGustafsson Apr 14, 2023
Unfortunately, you have to look at the logic -- Python doesn't allow overloading of methods, so the forward() will figure out what was passed inside the body. In TorchSharp, we insist on static typing... :-)
michieal Apr 14, 2023
Well, here's the smallest code snippet from the source code. this is in the Model.py file. (I figure that it's small enough to work with here, to get an understanding)
I'm guessing, that I would use TorchScript to construct this? or, am I off there?
Also, I am today years old learning that python has lambda declarations. lol.
NiklasGustafsson Apr 14, 2023
No, you would translate it to C# manually. It would look something like this (I didn't try to compile it):
You can definitely use TorchScript, too, if PyTorch is able to export the model. However, it will then be a black box and you won't be able to modify it or use it to learn the details of how to use TorchSharp. The one benefit is you don't have to mess with translating the code.
michieal Apr 14, 2023
Well, I would rather learn. I'm not a fan of black boxes, especially in regards to code. And, my design goals means that I would need to use LLaMA in conjunction with a pre-filtering AI module to convert the user input into a viable input for the llama section. (So trying to not toss out the word "module" all over, and confuse the subject. lol.)
For that part, I was thinking that a BERT model would work well, as I am trying to (ultimately) make an AI assistant that you can ask questions and get a creative / helpful / mostly factual response.
NiklasGustafsson Apr 14, 2023
Just to give you an idea about the difference between call() and forward(), here's the Module<T,TResult> implementation of call:
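The snippet referenced here was not captured in the extraction; conceptually (this is a sketch, not the actual TorchSharp source), call wraps forward in the hook machinery:

```csharp
// Conceptual sketch only: call runs any registered pre-hooks, then
// forward, then any post-hooks; forward holds just the model logic.
public TResult call(T input)
{
    foreach (var hook in pre_hooks) {
        var modified = hook(this, input);
        if (modified is not null) input = modified;
    }

    var result = forward(input);

    foreach (var hook in post_hooks) {
        var modified = hook(this, input, result);
        if (modified is not null) result = modified;
    }

    return result;
}
```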
You should only implement (i.e. override) forward in your custom module.