Help with implementing / learning TorchSharp. #980

Closed
michieal opened this issue Apr 20, 2023 · 33 comments

@michieal

Extracted from another issue. (Note: some typos may have been fixed in the extraction.)

NiklasGustafsson Apr 11, 2023

Last time I checked, both the tutorials and examples at dotnet/TorchSharpExamples built and ran on Linux and MacOS, as well as Windows. If that has regressed, please file a bug in that repo.

TorchSharp is a thin library on top of libtorch, and the API design was done to make it straightforward to build on the plethora of Python-based examples and tutorials that are out there, since we do not have the resources to create our own.

I can (most of the time) copy-and-paste tensor and module expressions from Python into C#, but there are inherent differences that cannot be overcome without programmer involvement:

Python and .NET memory management are different.
Python's syntax for passing arguments by name is different from C#'s.
We can't mimic the module invocation syntax module(x) without resorting to using dynamic, so (with the latest release) it's module.call(x) in C#.
Python's and C#'s statement/class/method, etc. syntax are obviously very different.
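To make the first point concrete, here is a minimal sketch (assuming recent TorchSharp naming) of the dispose-scope idiom that .NET code uses where Python simply relies on garbage collection:

using System;
using TorchSharp;
using static TorchSharp.torch;

// Tensors wrap native memory and are IDisposable, so temporaries are
// usually tracked by a dispose scope rather than left to the GC.
using (var scope = torch.NewDisposeScope())
{
    var x = randn(100, 100);            // tracked by the scope
    var y = (x * 2).sum();              // intermediate tensors are tracked too
    Console.WriteLine(y.item<float>());
}   // everything created inside the scope is disposed here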

I'm not sure what the old API you are referring to is -- we switched to Python-like naming conventions, following the SciSharp community in that regard, and the PyTorch scope hierarchy (which forced us to use a lot of static classes everywhere) a very long time ago. The examples and tutorials online at dotnet/TorchSharpExamples do not use the old version of the APIs.

Here are some other resources:

https://github.com/dotnet/TorchSharp/wiki
https://github.com/dotnet/TorchSharpExamples/tree/main/src
https://github.com/dotnet/TorchSharpExamples/tree/main/tutorials/CSharp


michieal Apr 14, 2023

Well, ideally, I would like to use it to make a LLaMA or Alpaca implementation in C#. But the "simple test" that I used to get to know it was this code:

import torch

model1_path = "./pytorch_model-00001-of-00003.bin"
model2_path = "./pytorch_model-00002-of-00003.bin"
model3_path = "./pytorch_model-00003-of-00003.bin"
merged_model_path = "./pytorch_model-13B.bin"

model1 = torch.load(model1_path, map_location=torch.device('cpu'))
model2 = torch.load(model2_path, map_location=torch.device('cpu'))
model3 = torch.load(model3_path, map_location=torch.device('cpu'))

# merge the models into a single dictionary
merged_model = {"model1": model1, "model2": model2, "model3": model3}

torch.save(merged_model, merged_model_path)

I mean, the Python script works; I tested it earlier. I would like to make a C# version of this so I don't have to have everything hard-coded. But the other day, I couldn't even do that much.

michieal Apr 14, 2023

You mentioned invoking modules using module(x)... can you tell me more about that?
I think that was one of the main points of failure that I experienced. There's nothing up front that says to do that (that I saw), and one also has no idea what modules the command can load, or what x should be. Is it a string? Is it...? Etc.

NiklasGustafsson Apr 14, 2023

When you pass data into a module in Python, you treat it as a callable object, and the forward method is called:

input = ... 
module = ...
output = module(input)

Since C# has no operator() that can be overloaded, unlike C++, we cannot replicate that syntax in C# without resorting to dynamic, which we don't want to do. Therefore, you have to call call on the module, which allows hooks to be invoked, or you can call forward directly.
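In TorchSharp, that looks something like the following sketch, using a built-in Linear module (shapes are illustrative):

using static TorchSharp.torch;

var module = nn.Linear(10, 5);
var input = randn(2, 10);

var output = module.call(input);     // runs pre-hooks, then forward, then post-hooks
var direct = module.forward(input);  // calls forward directly, bypassing hooks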

michieal Apr 14, 2023

Okay, the code has statements like these: class FeedForward(nn.Module): and in it, it defines a forward function... I know that TorchSharp does forward functions (I've read at least that much, lol)... is this where I load a module using module(x), or is this creating a new module?
I guess I am asking how to interpret some of the Python, to know when to use the module call.

NiklasGustafsson Apr 14, 2023

That just means that FeedForward is derived from nn.Module. In C#, you should derive from one of the Module<T...> classes, preferably. The <T...> signature determines the signature of the forward function, which contains the logic of the forward pass. The backward pass is determined via autograd in the native runtime.

So, in Python you would call the forward function directly, if you want, but you usually treat the module class as a callable, i.e. a function-like object. Think of the Python module as having overloaded operator(...) by defining forward(...)
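To make that concrete, here is a minimal custom-module sketch (the Scale class and its factor are invented for illustration; only the general shape follows TorchSharp's conventions):

using TorchSharp;
using static TorchSharp.torch;

public class Scale : nn.Module<Tensor, Tensor>
{
    private readonly double factor;

    public Scale(double factor) : base(nameof(Scale))
    {
        this.factor = factor;
        RegisterComponents();   // required, as discussed further below
    }

    // The <Tensor, Tensor> type arguments fix this signature.
    public override Tensor forward(Tensor x) => x * factor;
}

// Usage -- the C# stand-in for Python's callable-module syntax:
//   var m = new Scale(2.0);
//   var y = m.call(ones(3));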

michieal Apr 14, 2023

Ahhh, okay. I was just checking on that. What about using a delegate to call it like a function? (Asking because it was suggested.)

But when I go to build those parts, I define it as a class that derives from nn.Module<type, type>, and fill in the two types from the forward function's argument and return types... correct?

NiklasGustafsson Apr 14, 2023

Technically, it's Module<T, TResult>, where TResult is the return type of forward and T is any number of types that form its signature (I think we have it defined up to <T1,...,T6,TResult>).

There are some modules that have multiple forward signatures (Python deals with this dynamically), in which case you have to mix in IModule<T,...TResult> for anything you don't consider mainstream. Specifically, Sequential only accepts IModule<Tensor,Tensor> components, so if your module has that (which most do), then that should be the main one. That said, multiple forwards is an uncommon situation.
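For example, Sequential composes IModule<Tensor,Tensor> components; a sketch with built-in modules (sizes are illustrative):

using static TorchSharp.torch;

var model = nn.Sequential(
    ("lin1", nn.Linear(128, 64)),
    ("relu", nn.ReLU()),
    ("lin2", nn.Linear(64, 10)));

var logits = model.call(randn(32, 128));   // Tensor -> Tensor all the way through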

michieal Apr 14, 2023

The code only has a single forward per class def -- which, I guess, would be the module definition?

NiklasGustafsson Apr 14, 2023

Unfortunately, you have to look at the logic -- Python doesn't allow overloading of methods, so the forward() will figure out what was passed inside the body. In TorchSharp, we insist on static typing... :-)

michieal Apr 14, 2023

Well, here's the smallest code snippet from the source code; this is in the Model.py file. (I figure it's small enough to work with here, to get an understanding.)

class FeedForward(nn.Module):
    def __init__(
        self,
        dim: int,
        hidden_dim: int,
        multiple_of: int,
    ):
        super().__init__()
        hidden_dim = int(2 * hidden_dim / 3)
        hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)

        self.w1 = ColumnParallelLinear(
            dim, hidden_dim, bias=False, gather_output=False, init_method=lambda x: x
        )
        self.w2 = RowParallelLinear(
            hidden_dim, dim, bias=False, input_is_parallel=True, init_method=lambda x: x
        )
        self.w3 = ColumnParallelLinear(
            dim, hidden_dim, bias=False, gather_output=False, init_method=lambda x: x
        )

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

I'm guessing that I would use TorchScript to construct this? Or am I off there?

Also, I am today years old learning that Python has lambda declarations. lol.

NiklasGustafsson Apr 14, 2023

No, you would translate to C# manually. It would look something like (I didn't try to compile it):

public class FeedForward : torch.nn.Module<Tensor,Tensor>
{
    private ColumnParallelLinear w1;
    private RowParallelLinear w2;
    private ColumnParallelLinear w3;

    public FeedForward(int dim, int hidden_dim, int multiple_of) : base(nameof(FeedForward))
    {
        hidden_dim = (int)(2 * hidden_dim / 3.0);
        hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) / multiple_of); // The last division must be integer division.

        w1 = new ColumnParallelLinear(dim, hidden_dim, bias: false, gather_output: false, init_method: x => x);
        w2 = new RowParallelLinear(hidden_dim, dim, bias: false, input_is_parallel: true, init_method: x => x);
        w3 = new ColumnParallelLinear(dim, hidden_dim, bias: false, gather_output: false, init_method: x => x);
        RegisterComponents();
    }

    public override Tensor forward(Tensor x)
    {
        using var _ = torch.NewDisposeScope();
        return w2.forward(functional.silu(w1.forward(x)) * w3.forward(x)).MoveToOuterDisposeScope();
    }
}

You can definitely use TorchScript, too, if PyTorch is able to export the model. However, it will then be a black box and you won't be able to modify it or use it to learn the details of how to use TorchSharp. The one benefit is you don't have to mess with translating the code.
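For reference, loading a TorchScript module in TorchSharp looks roughly like this (a sketch; the file name is illustrative, and it assumes the generic torch.jit.load overload of recent releases):

using static TorchSharp.torch;

// "model.ts" would be produced in Python via torch.jit.trace or
// torch.jit.script, followed by save().
var scripted = jit.load<Tensor, Tensor>("model.ts");
var result = scripted.call(randn(1, 3, 224, 224));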

michieal Apr 14, 2023

Well, I would rather learn. I'm not a fan of black boxes, especially in regards to code. And my design goals mean that I would need to use LLaMA in conjunction with a pre-filtering AI module to convert the user input into a viable input for the LLaMA section. (So I'm trying to not toss out the word "module" all over and confuse the subject. lol.)

For that part, I was thinking that a BERT model would work well, as I am trying to (ultimately) make an AI assistant that you can ask questions and get a creative / helpful / mostly factual response.

NiklasGustafsson Apr 14, 2023

Just to give you an idea about the difference between call() and forward(), here's the Module<T,TResult> implementation of call:

public TResult call(T input)
{
    // Call pre-hooks, if available.

    foreach (var hook in pre_hooks.Values) {
        var modified = hook(this, input);
        if (modified is not null)
            input = modified;
    }

    var result = forward(input);

    // Call post-hooks, if available.

    foreach (var hook in post_hooks.Values) {
        var modified = hook(this, input, result);
        if (modified is not null)
            result = modified;
    }

    return result;
}

You should only implement (i.e. override) forward in your custom module.
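So, for example, a pre-hook is picked up by call() but bypassed by forward(). A sketch, assuming the register_forward_pre_hook API of recent TorchSharp releases:

using static TorchSharp.torch;

var lin = nn.Linear(4, 4);

// The hook runs before forward() whenever call() is used.
var handle = lin.register_forward_pre_hook((module, input) => input * 2);

var viaCall = lin.call(ones(4));        // input is doubled by the hook first
var viaForward = lin.forward(ones(4));  // hook is bypassed

handle.remove();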

@michieal
Author

Extracted comments, continued. (Tags added for the participants.)

@NiklasGustafsson Apr 14, 2023

Ah, yes -- something really hard to start out! :-)

PyTorch relies on Python pickling for saving model and optimizer state. That is a magical serialization format, but it is tightly coupled to the Python object model and runtime. There are libraries like https://github.com/irmen/pickle that can handle unpickling in C#, with limitations, but they don't unpickle classes as classes; they restore them as Dictionary<string,object>, which is not sufficient for our needs -- modules need to have their logic restored, too, including all the calls to native code.

So, we have had to rely on two separate solutions for sharing module state (weights + buffers) between Python and .NET:

  1. TorchScript -- this has no dependence on the Python runtime and has good performance. TorchSharp supports loading and saving ScriptModules first created in Python (traced as well as scripted), but not creating them from scratch. Not all models are supported by TorchScript (this is not a TorchSharp limitation).
    
  2. A custom format for saving the state_dict in Python, then loading in .NET. For this, you have to recreate the exact model definition, and have an instance of it to load the state into. This is a tedious process, but works. We have a Python script to export the state from Python in the source repo (it can also export optimizer state now). 
    

There are two articles under the 'Wiki' header that cover these topics. Hopefully, those are sufficient to get you started. If not, please let me know where the information gaps are.

https://github.com/dotnet/TorchSharp/wiki/Sharing-Model-Data-between-PyTorch-and-TorchSharp
https://github.com/dotnet/TorchSharp/wiki/TorchScript

There's also a discussion of serialization in one of the tutorials:

https://github.com/dotnet/TorchSharpExamples/blob/main/tutorials/CSharp/tutorial6.ipynb
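Concretely, once the state_dict has been written from Python with the repo's exportsd.py script, the .NET side looks roughly like this (a sketch; the file name and constructor arguments are illustrative, and FeedForward is the hand-translated module from above):

// Recreate the exact model definition, then load the exported weights.
// This is TorchSharp's own format, not a pickled .bin file.
var model = new FeedForward(dim: 4096, hidden_dim: 11008, multiple_of: 256);
model.load("model_weights.dat");   // field and submodule names must match exactly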

michieal Apr 14, 2023

Thank you. I really wondered what the "pickle" tag on the models meant. Also, since I have the Python code for LLaMA from Meta, wouldn't I have the structures/logic? I also have a C/C++ implementation of it (llama.cpp)... though that converts the data model into a GGML format. I have a few different .pth files that I have downloaded. I'd fully convert the C++ to C#, but as you may know, that is a serious pain... Though, one point I do like about the llama.cpp implementation: they have a PR that does mmap of the file, so the OS loads it in the background and it uses less memory than reading it all in. So, instead of a 24 GB memory footprint, it has a ~4 GB memory footprint.

Also, working through the links that you gave me. If you have further suggestions on this, I'd greatly appreciate them!

NiklasGustafsson Apr 14, 2023

Yes, if you have the source code, then you're golden. From there, it's "just work" to translate it to C# and then you can load weights and buffers. Please note that you have to be meticulous about the translation -- the fields have to have exactly the same name, the module/submodule hierarchy has to be exactly the same, etc.

Also, since you'll be in the business of translating a serious model with serious memory requirements, make sure to read the Wiki article on memory management.

michieal Apr 14, 2023

Yeah, I was thinking that method three is what I am going to have to cut my teeth on. Nothing quite like jumping head first into the deep end of the pool, lmao.

NiklasGustafsson Apr 14, 2023

Yes, that's the only reasonable one to use.

NiklasGustafsson Apr 14, 2023

Also, make sure that you follow the protocol for custom module construction:

https://github.com/dotnet/TorchSharp/wiki/Creating-Your-Own-TorchSharp-Modules

Without a call to RegisterComponents toward the end, nothing will work. If you want to move a module to the GPU, you need to do that after the call to RegisterComponents.
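In code, that ordering looks like this (a minimal sketch, reusing the FeedForward translation from above):

var model = new FeedForward(4096, 11008, 256);   // ctor ends with RegisterComponents()

// Move to the GPU only after construction, and only if a CUDA backend is present.
if (torch.cuda.is_available())
    model.to(torch.CUDA);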

michieal Apr 14, 2023

Okay, I did notice that the code tries to use the GPU... (CUDA, specifically.) If that is not available on the target machine, will TorchSharp route it over to the CPU instead? I mean, I have a Radeon, not an Nvidia card...

NiklasGustafsson Apr 14, 2023

TorchSharp currently only supports CPU and CUDA. We haven't had the resources to test with the Linux ROCm backend. That'll be a great thing for someone to contribute... :-)

In TorchSharp, all non-TorchScript modules start out on the CPU, and you have to move them explicitly to the GPU. That will fail if you don't have a GPU (to be more precise, if you don't have a GPU backend).

michieal Apr 14, 2023

I see... hmm, that makes things difficult. The last release of the Linux ROCm Radeon drivers, well, was a cluster-f***. So, at the moment, I don't have it installed, as it breaks the system. But once they release a working driver, I'd actually be willing to be a testing guinea pig.

The Meta LLaMA code tries to do CUDA processing... so, if I don't move it to the GPU... will it still run on the CPU?

It also uses SentencePiece to do some processing; where do I find a C# equivalent for that?

NiklasGustafsson Apr 14, 2023

Yes, it will -- unless you save it using tracing on TorchScript while the weights are on CUDA (this is a TorchScript limitation). Always call torch.is_cuda_available() before trying to move it, unless you happen to know that you have a CUDA GPU.

NiklasGustafsson Apr 14, 2023

It also uses SentencePiece to do some processing; where do I find a C# equivalent for that?

TorchSharp is scoped to providing .NET bindings for libtorch, so any tokenization libraries, especially ones that include native code (which I believe SentencePiece does), are unfortunately outside the scope of this library.

@luisquintanilla (ML.NET PM) may have some thoughts on suitable tokenizers to use in place of sentencepiece.

michieal Apr 14, 2023

Understandable. Hopefully, they will know.

@GeorgeS2019 Apr 14, 2023:

@michieal
https://github.com/wang1ang/SentencePieceWrapper

NiklasGustafsson Apr 14, 2023

BlingFire may also be useful, and it's on NuGet:

https://github.com/microsoft/BlingFire

michieal Apr 14, 2023

Thank you, both of you! This is definitely helpful!

michieal Apr 20, 2023

Okay, so... I didn't die... but wow, the migraine. lol.

The GitHub versions of the SentencePiece wrappers are seriously incomplete, especially for something that wants to know information about the vocab and tokenizer model. So I had to make my own wrapper, based on the original SentencePiece C++ code. I will probably go through later and expose the training features, but for now I don't think I need them.

Also, I am going to copy-paste this to a new issue so that it's not buried. (Edit: hence this issue and its comments.)

@michieal
Author

So, question:
when I am making a parameter that is defined as Parameter, do I need to call register_parameter() with the parameter?

code:

class RMSNorm(torch.nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def _norm(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

    def forward(self, x):
        output = self._norm(x.float()).type_as(x)
        return output * self.weight

Which I translated to look like this:

    class RMSNorm : nn.Module<torch.Tensor, torch.Tensor> {
        public float     eps    { get; }
        public Parameter weight { get; }

        public RMSNorm(int Dim, float Eps = 1e-6f) : base(nameof(RMSNorm)) {
            this.eps = Eps;
            weight = nn.Parameter(torch.ones(Dim));
            register_parameter("weight", weight);
            RegisterComponents();
        }

        public torch.Tensor Norm(torch.Tensor x) {
            return x * torch.rsqrt(x.pow(2).mean(new long[] {-1}, true) + eps);
        }

        /*
        def forward(self, x):
            output = self._norm(x.float()).type_as(x)
            return output * self.weight
        */

        public override torch.Tensor forward(torch.Tensor x) {
            using (var scope = torch.NewDisposeScope()) {
                var xFloat = x.to_type(torch.ScalarType.Float32);  // x.float() in Python
                var norm = Norm(xFloat).type_as(x);
                return (norm * weight).MoveToOuterDisposeScope();
            }
        }
    }

Note the self.weight = nn.Parameter(torch.ones(dim)) line in the Python code.

@NiklasGustafsson
Contributor

NiklasGustafsson commented Apr 24, 2023

Properties and parameters don't go well together -- the underlying private field will be treated as the actual parameter by RegisterComponents(). The safest thing is to declare the private field explicitly and then adjust the get and set of the property to use that field. Like C# 1.0 or something... :-)

Unless you have a very compelling reason for the parameter to be public, I would just keep it in a private field.
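A sketch of that pattern (the property name and the simplified forward are illustrative; the Linear example linked in the next comment shows the real wiring):

class RMSNorm : nn.Module<Tensor, Tensor>
{
    // Explicit private field: RegisterComponents() registers it under the
    // field name, which must match the Python state_dict name ("weight").
    private Parameter weight;

    // The property wraps the field instead of owning a hidden backing field.
    public Parameter Weight {
        get => weight;
        set => weight = value;
    }

    public RMSNorm(int dim) : base(nameof(RMSNorm))
    {
        weight = nn.Parameter(torch.ones(dim));
        RegisterComponents();
    }

    // forward simplified for the sketch.
    public override Tensor forward(Tensor x) => x * weight;
}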

@NiklasGustafsson
Contributor

NiklasGustafsson commented Apr 25, 2023

If you really need the parameter to be public and a property, then it's best to follow the pattern used in this version of Linear, which is from a WIP PR that replaces native code implementations with managed code:

https://github.com/dotnet/TorchSharp/blob/1dba6d14effb4fe94982b8dcd7db5ff6cc080972/src/TorchSharp/NN/Linear.cs

@NiklasGustafsson
Contributor

@michieal -- just checking in. How's your progress on learning TorchSharp?

@NiklasGustafsson
Contributor

@michieal -- should I close this issue?

@michieal
Author

@michieal -- should I close this issue?

Sorry, a confluence of life events caused me some issues, so I had to step back some. I am still interested in / working on this, so please keep this open. (Also, thank you for asking before just closing it.)

@GeorgeS2019

@michieal
I hope you will manage your challenges soon and get back to asking more questions. There are now more projects doing deep NLP with TorchSharp waiting for you to check out :-)

@ChengYen-Tang
Contributor

(quotes michieal's extracted comments from earlier in the thread, in full)

@NiklasGustafsson @GeorgeS2019,
I've always had a question: how does Java load a PyTorch model from Python? 🤔

@ChengYen-Tang
Contributor

@michieal I hope you will manage your challenges soon and get back to asking more questions. There are now more projects doing deep NLP with TorchSharp waiting for you to check out :-)

We need the .NET version of transformers 😐

@GeorgeS2019

@sgf

sgf commented Jun 8, 2023

I don't quite understand -- the C# version of the code could obviously be more streamlined. Why did the library designers make it look so different from Python, and so redundant?
Is C#'s expressiveness really that poor? Is that why Python is so popular?

@NiklasGustafsson
Contributor

@sgf -- thanks for the feedback. It's not clear to me what you are referring to when you mention it looking very different from Python, and very redundant. We have made a lot of effort (and not all .NET developers like it) to make it look as much like the Python APIs as possible.

Some examples of what you are thinking about would be helpful.

@sgf

sgf commented Jun 8, 2023

I don't quite understand -- the C# version of the code could obviously be more streamlined. Why did the library designers make it look so different from Python, and so redundant?
Is C#'s expressiveness really that poor? Is that why Python is so popular?

@NiklasGustafsson What I mean is code like this from the tutorials and examples:

class FeedForward(nn.Module):
    def __init__(
        self,
        dim: int,
        hidden_dim: int,
        multiple_of: int,
    ):
        super().__init__()
        hidden_dim = int(2 * hidden_dim / 3)
        hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)

        self.w1 = ColumnParallelLinear(
            dim, hidden_dim, bias=False, gather_output=False, init_method=lambda x: x
        )
        self.w2 = RowParallelLinear(
            hidden_dim, dim, bias=False, input_is_parallel=True, init_method=lambda x: x
        )
        self.w3 = ColumnParallelLinear(
            dim, hidden_dim, bias=False, gather_output=False, init_method=lambda x: x
        )

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
versus the C# translation:

public class FeedForward : torch.nn.Module<Tensor,Tensor>
{
    private ColumnParallelLinear w1;
    private RowParallelLinear w2;
    private ColumnParallelLinear w3;

    public FeedForward(int dim, int hidden_dim, int multiple_of) : base(nameof(FeedForward))
    {
        hidden_dim = (int)(2 * hidden_dim / 3.0);
        hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) / multiple_of); // The last division must be integer division.

        w1 = new ColumnParallelLinear(dim, hidden_dim, bias: false, gather_output: false, init_method: x => x);
        w2 = new RowParallelLinear(hidden_dim, dim, bias: false, input_is_parallel: true, init_method: x => x);
        w3 = new ColumnParallelLinear(dim, hidden_dim, bias: false, gather_output: false, init_method: x => x);
        RegisterComponents();
    }

    public override Tensor forward(Tensor x)
    {
        using var _ = torch.NewDisposeScope();
        return w2.forward(functional.silu(w1.forward(x)) * w3.forward(x)).MoveToOuterDisposeScope();
    }
}

and this code:

class RMSNorm(torch.nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def _norm(self, x):
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

    def forward(self, x):
        output = self._norm(x.float()).type_as(x)
        return output * self.weight
versus the C# translation:

    class RMSNorm : nn.Module<torch.Tensor, torch.Tensor> {
        public float     eps    { get; }
        public Parameter weight { get; }

        public RMSNorm(int Dim, float Eps = 1e-6f) : base(nameof(RMSNorm)) {
            this.eps = Eps;
            weight = nn.Parameter(torch.ones(Dim));
            register_parameter("weight", weight);
            RegisterComponents();
        }

        public torch.Tensor Norm(torch.Tensor x) {
            return x * torch.rsqrt(x.pow(2).mean(new long[] {-1}, true) + eps);
        }

        /*
        def forward(self, x):
            output = self._norm(x.float()).type_as(x)
            return output * self.weight
        */

        public override torch.Tensor forward(torch.Tensor x) {
            using (var scope = torch.NewDisposeScope()) {
                var xFloat = x.to_type(torch.ScalarType.Float32);  // x.float() in Python
                var norm = Norm(xFloat).type_as(x);
                return (norm * weight).MoveToOuterDisposeScope();
            }
        }
    }

@NiklasGustafsson
Contributor

@sgf -- thanks for those examples. In these examples, what is it that you are wondering about / commenting on?

@NiklasGustafsson
Contributor

@michieal, @sgf -- can we close this one?

@sgf

sgf commented Jul 13, 2023

@michieal, @sgf -- can we close this one?

I'm not the opener of this issue.
If the opener agrees, you can close it.

@NiklasGustafsson
Contributor

@michieal -- should I close this issue?

Sorry, a confluence of life events caused me some issues, so I had to step back some. I am still interested in / working on this, so please keep this open. (Also, thank you for asking before just closing it.)

Are you still in need of assistance learning TorchSharp?

@michieal
Author

michieal commented Aug 24, 2023 via email

@GeorgeS2019

@michieal Hope you are far along in recovery.

We now have more users here doing MiniGPT or variants in TorchSharp.

@NiklasGustafsson
Contributor

I am just now, as in this week, getting somewhere in recovering from a medical issue. I am in the process of downloading the latest version of Meta's Llama, which I will need help with. But as to the issue, you can close it if you are willing to respond to new comments on it. I apologize for taking so long to get back to you on this; I do plan to be a lot more active with TorchSharp.

@michieal -- I will close this for now. If you have more questions and/or ideas, please open an issue, or add a topic to the discussions.

@michieal
Author

Understood. Sorry for keeping this open and unattended for so long.

I do have a question: is there anything on the horizon about this working with the Godot Engine?

@GeorgeS2019

Yes... I tested it using Godot 4 with .NET 6.

@michieal
Author

Awesome!
Is there a test project out there?

@GeorgeS2019

If you know Godot: start a new project with .NET 6, take any unit test available here, and attach it to a node. Reference TorchSharp. You need to know Godot.

@GeorgeS2019

#1032 (comment)

@michieal
Author

#1032 (comment)

Thank you.
Um, I'm learning Godot, as I stepped away from Unity due to the recent dumpster fire their management team created.
I kinda think it would be neat to have a visual avatar with a real AI attached to it.

@GeorgeS2019

https://github.com/godotengine/godot-builds/releases/download/4.1.2-stable/Godot_v4.1.2-stable_mono_win64.zip

@GeorgeS2019

You need to join the Godot community, e.g. the Discord. Just say you are a Unity refugee and people will help you.

@michieal
Author

I use Linux, but I have the engine / editor installed.

@michieal
Author

Yeah, lol, I'm there.
I decided to jump feet first into Godot that same week.

@LittleLittleCloud
Contributor

I might be joining the conversation a bit late, but I wanted to share my weekend project with everyone here in case it could be beneficial.

@GeorgeS2019

@LittleLittleCloud Hi Zhang Xiaoyun, thanks for your contribution. We value it!
