Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve CSX execution time #28

Closed
alirezanet opened this issue Feb 5, 2022 · 21 comments
Closed

Improve CSX execution time #28

alirezanet opened this issue Feb 5, 2022 · 21 comments
Labels
enhancement New feature or request

Comments

@alirezanet
Copy link
Owner

Details

Currently, CSX execution is not so efficient it takes about 2 seconds on the first run which is not ideal, I opened this issue to evaluate possible options to solve this problem. the expected behavior is to minimize the Roslyn compiler warming up process.

the minimum execution time for the bellow command currently is about 1-2 seconds.

dotnet husky exec <fileNme.csx>
@alirezanet alirezanet added enhancement New feature or request help wanted Extra attention is needed labels Feb 5, 2022
@alirezanet alirezanet changed the title Improve CSX execution time on the first run Improve CSX execution time Feb 5, 2022
@acesyde
Copy link
Contributor

acesyde commented Feb 26, 2022

Hi @alirezanet

IMO you can try to compile scripts when you detect a change and only execute them when it's required, you can minimize the execution process time by doing this

I don't know if it's possible it's just an idea 😄

@alirezanet
Copy link
Owner Author

Hi @alirezanet

IMO you can try to compile scripts when you detect a change and only execute them when it's required, you can minimize the execution process time by doing this

I don't know if it's possible it's just an idea 😄

Hi, Thank you for the suggestion, I agree. Compiling scripts probably is the best solution that comes to mind but I didn't find any documentation of Roslyn APIs that can be used yet although I didn't spend enough time searching.

@Xemrox
Copy link
Contributor

Xemrox commented Feb 28, 2022

Heya. Joining the discussion.
More concrete solution would involve a hash that relates to the source.
Something like $script.csx -> $script.$hash.dll?
Btw writing the assembly to a file can be done with:
compilation.Emit(...)

@acesyde
Copy link
Contributor

acesyde commented Feb 28, 2022

I checked it quickly

var cachePath = System.IO.Path.Combine(workingDirectory, ".cache");
var ouputDllPath = System.IO.Path.Combine(cachePath, "output.dll");
var outputPdbPath = System.IO.Path.Combine(cachePath, "output.pdb");

if (!Directory.Exists(cachePath))
{
    Directory.CreateDirectory(cachePath);
}

if (File.Exists(ouputDllPath) && File.Exists(outputPdbPath))
{
         using var ilstream = new MemoryStream();
         using var pdbstream = new MemoryStream();
         using var ouputDllPathStream = new FileStream(ouputDllPath, FileMode.Open);
         ouputDllPathStream.CopyTo(ilstream);
         using var outputPdbPathStream = new FileStream(outputPdbPath, FileMode.Open);
         outputPdbPathStream.CopyTo(pdbstream);

         var assembly = Assembly.Load(ilstream.GetBuffer(), pdbstream.GetBuffer());
         var type = assembly.GetType("Submission#0");
         var factory = type.GetMethod("<Factory>");
         var submissionArray = new object[2];
         var task = (Task<object>)factory.Invoke(null, new object[] { submissionArray });
         await task;
         "Command executed.".Log(ConsoleColor.Green);
         return;
}
else
{
         using var ouputDllPathStream = new FileStream(ouputDllPath, FileMode.OpenOrCreate);
         using var outputPdbPathStream = new FileStream(outputPdbPath, FileMode.OpenOrCreate);
         var emitResult = compilation.Emit(ouputDllPathStream, outputPdbPathStream);
}

It seems to be better, but I need to spend more time to benchmark it 😄

image

Todo :

  • Arguments
  • Better code
  • Global cache folder ?
  • File hash

Thks for the idea @Xemrox

@alirezanet
Copy link
Owner Author

alirezanet commented Feb 28, 2022

Interesting ... I think we don't need a global cache folder and we can use .husky\_\bin folder to store the compiled cache files also we can save the files using its hash (crc32 or md5 etc) as fileNames to keep track of changes. but this solution needs a cleanup command.

@Xemrox
Copy link
Contributor

Xemrox commented Feb 28, 2022

@acesyde why do you copy the FileStream to a MemoryStream? You could make use of Assembly.LoadFile(absolutePath) instead.
@alirezanet sha512 is the standard for hashes so go for that instead of md5. My proposal is a $scriptName.hash file containing the hash of the source that lead to the latest assembly. This way if the hash differs you can safely assume the assembly is out of sync and regenerate it

@acesyde
Copy link
Contributor

acesyde commented Mar 1, 2022

@Xemrox When I said fast it was fast, now it's time to clean up and improve 😃

@alirezanet
Copy link
Owner Author

alirezanet commented Mar 1, 2022

@Xemrox what do you mean by SHA512 is standard?
In our case using SHA algorithms is unnecessary because they are slower and more CPU intensive than something like crc32 or md5. the thing is we don't need a secure hash algorithm here. anything with a checksum works fine.
but when we implemented this feature we can do some benchmarking.

image

SHA1 looks good tho

@acesyde
Copy link
Contributor

acesyde commented Mar 1, 2022

@alirezanet with benchmarkdotnet

image

@alirezanet
Copy link
Owner Author

alirezanet commented Mar 1, 2022

Nice. I'm not sure it performs better but can you include also Crc32.NET? because here we just care about performance. It could be faster.

@acesyde
Copy link
Contributor

acesyde commented Mar 1, 2022

Not a huge difference, IMO using the official cryptographic library instead of an external lib may be better for the future

Method Mean Error StdDev Ratio Gen 0 Allocated
TestCRC32 5.605 us 0.2339 us 0.6822 us 1.00 0.0610 386 B
TestMd5 5.992 us 0.1182 us 0.1407 us 1.00 0.0610 426 B
TestSHA1 6.089 us 0.1216 us 0.1623 us 1.00 0.0687 450 B
TestSHA256 6.340 us 0.1250 us 0.2848 us 1.00 0.0763 490 B
TestSHA512 8.064 us 0.1564 us 0.2193 us 1.00 0.2289 1,453 B
// * Warnings *
MultimodalDistribution
  HashBench.TestCRC32: .NET 6.0 -> It seems that the distribution is bimodal (mValue = 3.86)

@Xemrox
Copy link
Contributor

Xemrox commented Mar 1, 2022 via email

@alirezanet
Copy link
Owner Author

alirezanet commented Mar 1, 2022

I get your point but even hashes like SHA-256 are SHA-512 are not collision-free, also in our case the source code (CSX) is available for the attackers and we don't share or commit compiled versions anywhere. so IMO thinking about security is a little bit over-engineering. let's say you wanna somehow attack this system, how it is possible when you don't have any access to other contributors' compiled files. I'm not saying using SHA-512 is a bad choice, on the contrary, I personally have a strong interest in safer algorithms but I don't see any real-world vulnerabilities yet.
But to show my respect to your opinion about this I'll probably use SHA-512 since its overhead is barely noticeable.

@acesyde
Copy link
Contributor

acesyde commented Mar 1, 2022

@alirezanet maybe in the future we can add the capability to consume shared libraries from another repo (like pre-commit) so the SHA-512 can be a solution

@alirezanet
Copy link
Owner Author

You both voted for SHA-512, hands up 🙌 😉. so SHA-512 it is. 😅

@acesyde
Copy link
Contributor

acesyde commented Mar 1, 2022

But in the case of "pre-commit" they use an external database (sqlite) to store all this information

@alirezanet
Copy link
Owner Author

What do you think about the default behavior? should exec command automatically compile the scripts for future executions? or we should add an option like --compile to change its behavior? In the end, we need one of these options probably --raw or --compile

@acesyde
Copy link
Contributor

acesyde commented Mar 1, 2022

Like you say a --raw or --no-cache option seems to be better, compilation must be the default behavior

@Xemrox
Copy link
Contributor

Xemrox commented Mar 1, 2022

I'll go with --no-cache. Exhibit good default behaviour and if anybody needs to recompile every time they have a switch for it

@acesyde
Copy link
Contributor

acesyde commented Mar 1, 2022

So i made a benchmark between compiled vs actual solution and the compiled version is a little bit faster 🤣

Method Mean Error StdDev Ratio Allocated
UseCSharpScript 29,607.01 us 548.193 us 457.766 us 1.000 6,404,152 B
UseCompiledScript 78.44 us 1.516 us 2.269 us 0.003 768 B

The csx file only contains a Console.WriteLine

@acesyde acesyde mentioned this issue Mar 4, 2022
11 tasks
@alirezanet alirezanet removed the help wanted Extra attention is needed label Mar 9, 2022
@alirezanet
Copy link
Owner Author

added in v0.5.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants