Unique id for ephemeral binaries #8340

Open
bkao opened this issue Oct 16, 2019 · 14 comments

Comments

@bkao

bkao commented Oct 16, 2019

Would it be possible to name the ephemeral binaries in ~/.cache with some kind of unique ID? I don't want to compile a permanent binary; I prefer to use Crystal like a scripting language. But when I do this in a MapReduce framework, every instance of the script compiles to the same filename, which causes crashes when two or more instances write the binary simultaneously.

I like to do something like this, where myprog.cr starts with the typical shebang line "#!/usr/bin/env crystal":

cat file.json | myprog.cr --release > out

Which generates this binary:
~/.cache/crystal/crystal-run-myprog.tmp

It would be nice if the binaries were named with unique IDs, like:
~/.cache/crystal/crystal-run-myprog-<someuniqid>.tmp

This way each instance in my MapReduce multi-process framework gets its own file.
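A minimal sketch of the requested behavior as a shell function, assuming a POSIX shell with local, mktemp(1) and a crystal binary on PATH; run_unique is a hypothetical helper, not part of Crystal or crun. Each invocation compiles into its own uniquely named temporary directory, and setting CRYSTAL_CACHE_DIR gives that compilation a private object cache as well, so parallel instances share no files. The price is a full compilation on every run, because nothing is reused.

```bash
# Hypothetical helper (not part of Crystal or crun): compile a script into a
# fresh, uniquely named directory, run it, then delete the directory.
# Assumes a POSIX shell with `local`, mktemp(1), and `crystal` on PATH.
run_unique() {
  local src="$1"; shift
  local tmpdir exe status
  # mktemp -d replaces XXXXXX with random characters and creates the directory,
  # so two concurrent invocations never get the same path.
  tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/crystal-run-XXXXXX") || return
  exe="$tmpdir/$(basename "$src" .cr)"

  # CRYSTAL_CACHE_DIR gives this compilation its own object cache too,
  # at the cost of recompiling everything on every run.
  if CRYSTAL_CACHE_DIR="$tmpdir/cache" crystal build "$src" -o "$exe"; then
    "$exe" "$@"
    status=$?
  else
    status=$?
  fi
  rm -rf "$tmpdir"
  return "$status"
}
```

For example, after sourcing the function: cat file.json | run_unique myprog.cr > out. Compiler flags such as --release would go on the crystal build line.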

@jkthorne
Contributor

jkthorne commented Oct 16, 2019

Is this what you are looking for?

Edit:
https://github.com/Val/crun

@oprypin
Member

oprypin commented Oct 17, 2019

Is what? I don't get it.

Also, is there a bit missing from the original post? Maybe angle brackets after the dash in the filename?

Both of my comments have been resolved by corresponding edits.

@asterite
Member

Yes, sorry, I also don't understand the issue.

But note that right now running the compiler twice in parallel doesn't work well.

@bkao11

bkao11 commented Oct 17, 2019

> But note that right now running the compiler twice in parallel doesn't work well.

Yes, because the filenames clash.

It looks like the crun solution from @wontruefree would do the trick. I just wish it were more tightly integrated. Also, I'm not sure if it's possible to pass the '--release' option to crystal through the shebang line.

bkao closed this as completed Oct 17, 2019
@NIFR91

NIFR91 commented Oct 17, 2019

I recently ran into the same issue. In my case I have a program that processes some text, for example extracting some lines or columns, and I wanted to chain several instances of it in one pipeline, for example:

printf "1 2 3\n4 5 6\n" | ./myprog.cr extract-first-line | ./myprog.cr get-first-col

But sometimes the second compilation clashes with the first and we get these errors:

execvp (/home/nieto/.cache/crystal/crystal-run-writter.tmp): Text file busy: Text file busy (Errno)
execvp (/home/nieto/.cache/crystal/crystal-run-writter.tmp): No such file or directory: No such file or directory (Errno)
  from ???
  ...
  from ???
Error: you've found a bug in the Crystal compiler. Please open an issue, including source code that will allow us to reproduce the bug: https://github.com/crystal-lang/crystal/issues
  from ???
  ...
  from ???
Error: you've found a bug in the Crystal compiler. Please open an issue, including source code that will allow us to reproduce the bug: https://github.com/crystal-lang/crystal/issues

I also think it would be nice to have the Crystal compiler handle these cases, so users won't need to install crun.

Minimal program:

#!/usr/bin/env crystal
# program.cr
while line = gets
  puts line
end

printf "Hello\nWorld\n" | ./program.cr | ./program.cr

@bew
Contributor

bew commented Oct 17, 2019

Simpler:

#!/usr/bin/env crystal
sleep 1

Save it as foo.cr, then run ./foo.cr | ./foo.cr

@asterite
Member

My suggestion: don't use Crystal as a scripting language. Compile the program to a binary. Then it'll be faster (no need to wait for compilation) and you won't have this "compiler is running twice" problem.

@bkao
Author

bkao commented Oct 18, 2019

Yes, compiling would render the problem moot, but there are times when I don't want to deal with separate source and binary files, and not having to is one of the nice features of scripting languages. If Crystal can already behave like a scripting language, why not go all the way and solve this filename collision problem?

crun would work, but it's a workaround, not a genuine solution. If it were more tightly integrated, or simply subsumed into Crystal, we'd basically be going all the way in terms of behaving like a scripting language. At least I think so; I haven't looked into the code path taken when crystal is invoked via the shebang.

Ideally, Crystal as a scripting language would behave like make and only recompile if any of the source files are newer than the executable. That would be an improvement, no? Maybe crun already does that, but as @NIFR91 said, it would be nice to avoid yet another dependency. Plus, I don't know if it's possible to pass compiler arguments like '--release' via crun.
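A minimal sketch of this make-like behavior as a shell function, assuming a shell whose test builtin supports -nt (bash and dash do) and crystal on PATH; run_cached and the crystal-scripts cache directory are hypothetical. Only the modification time of the top-level file is compared, so edits to required files go unnoticed, and on its own this does not stop two instances from compiling at the same time.

```bash
# Hypothetical helper (not part of Crystal or crun): rebuild only when the
# source file is newer than the cached binary, then run the binary.
# Assumes `crystal` on PATH; only the top-level file's mtime is checked.
run_cached() {
  local src="$1"; shift
  local cache_dir="${XDG_CACHE_HOME:-$HOME/.cache}/crystal-scripts"
  local exe
  exe="$cache_dir/$(basename "$src" .cr)"

  mkdir -p "$cache_dir" || return
  # [ a -nt b ] is true if a is newer than b, or if a exists and b does not.
  if [ "$src" -nt "$exe" ]; then
    crystal build "$src" -o "$exe" || return
  fi
  # Replace the shell with the compiled program, passing the remaining args.
  exec "$exe" "$@"
}
```

For example: run_cached myprog.cr < file.json > out. Compiler flags such as --release would go on the crystal build line.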

@NIFR91

NIFR91 commented Oct 18, 2019

I agree with @bkao. We also have crystal run, which is the default behavior; it's very useful and naturally leads to using Crystal for scripting (this is one of the reasons I really like the language).
Making users keep track of the binary when Crystal could have an integrated tool like crun (usable as a replacement for run) makes Crystal feel more like C in this respect. In my opinion, going all the way and making Crystal behave like a scripting language could be beneficial, since it would then be used a little more for simpler programs and scripts. The ease of use of a scripting language combined with the raw performance of Crystal is a very compelling combination, hence this issue and shards like crun.

bkao reopened this Oct 18, 2019
@RX14
Contributor

RX14 commented Oct 29, 2019

We can exclusively lock the compiler cache directory while the compiler is using it, and atomically replace the output file after linking (instead of writing it in place). Crystal already has flock bindings, but they might have to be made available for directory file descriptors.
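A minimal sketch of this idea as a shell function, assuming Linux with util-linux flock(1) and crystal on PATH; build_locked and the .lock file are hypothetical, and the lock file inside the cache directory stands in for locking the directory itself. It holds an exclusive lock for the whole build, links into a temporary name, and then renames that over the final name. Because rename(2) is atomic within one filesystem, nothing ever execs a half-written file, and a process already running the old binary keeps its old inode. This matters because execve(2) fails with ETXTBSY ("Text file busy") while the file is still open for writing, one of the errors reported earlier in this thread.

```bash
# Hypothetical helper (not part of Crystal): build SRC into EXE while holding
# an exclusive lock on the cache directory, then publish it atomically.
# Assumes Linux with util-linux flock(1) and `crystal` on PATH.
build_locked() {
  local src="$1" exe="$2"
  local dir
  dir=$(dirname "$exe")
  mkdir -p "$dir" || return
  (
    # Blocks until no other build_locked holds the lock for this directory.
    flock -x 9 || exit
    # Only the lock holder writes here, so a fixed temporary name is safe;
    # a stale leftover from a crashed build is simply overwritten.
    part="$exe.tmp"
    if crystal build "$src" -o "$part"; then
      mv -f "$part" "$exe"   # rename(2): atomic replace, never written in place
    else
      rm -f "$part"
      exit 1
    fi
  ) 9>"$dir/.lock"   # the lock is released when this subshell exits
}
```

A process that waited for the lock still rebuilds afterwards; checking inside the lock whether EXE is already up to date would avoid that, and locking per output file rather than per directory would let different scripts compile in parallel.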

@rdp
Contributor

rdp commented Mar 9, 2020

Maybe crystal could have a parameter like "--unique-id=x"; then you could use a wrapper script around crystal in your bash shebang. Though... that doesn't feel optimal somehow.
@bkao, the problem arises when both processes try to build it simultaneously (and the files have changed). Alternatively, I guess your script could pre-build the required binaries, something like crystal build foo.cr && ./foo | ./foo?

@bkao
Author

bkao commented Mar 9, 2020

@rdp, I don't think the actual name of the executable is the problem. Simply make one component of the filename a hash of the source code. For example: foo.cr --> .foo.6c1a0 (temp file) --> foo.6c1a0 (final exe)

To prevent race conditions, have Crystal first check for the hidden file, which indicates that compilation is in progress.

Order of operations would be something like:

  1. If final exe exists, run it
  2. If final exe doesn't exist and temp file doesn't exist, initiate compilation
  3. If final exe doesn't exist and temp file does exist, wait until compilation completes
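A minimal sketch of these three steps as a shell function, assuming sha256sum from GNU coreutils, a sleep that accepts fractional seconds, and crystal on PATH; run_hashed, the crystal-scripts directory and the .partial name are hypothetical, and only the top-level file is hashed. Checking for the hidden file and creating it must be one atomic step, otherwise two processes can both see "no temp file" and both compile; the noclobber (set -C) idiom is commonly used for this because shells such as bash then create the file with O_EXCL. Unlike the proposal, the marker and the partially linked output are separate files here, so the compiler never writes into the marker.

```bash
# Hypothetical helper (not part of Crystal): the hash-named scheme above.
# foo.cr -> .foo.<hash> (marker: compilation in progress) -> foo.<hash> (exe).
# Assumes sha256sum (GNU coreutils) and `crystal` on PATH; only the top-level
# source file is hashed, so changes to required files are not detected.
run_hashed() {
  local src="$1"; shift
  local dir="${XDG_CACHE_HOME:-$HOME/.cache}/crystal-scripts"
  local name hash exe marker
  [ -r "$src" ] || return 1
  name=$(basename "$src" .cr)
  hash=$(sha256sum < "$src" | cut -c1-12)   # first 12 hex digits of SHA-256
  exe="$dir/$name.$hash"
  marker="$dir/.$name.$hash"
  mkdir -p "$dir" || return

  if [ ! -x "$exe" ]; then
    # set -C (noclobber) makes ">" fail if the marker already exists, turning
    # "does the temp file exist?" and "create it" into one atomic step.
    if ( set -C; : > "$marker" ) 2>/dev/null; then
      # Step 2: no exe and no marker, so this process compiles.
      if crystal build "$src" -o "$exe.partial"; then
        mv -f "$exe.partial" "$exe"   # publish the exe before dropping the marker
      fi
      rm -f "$marker" "$exe.partial"
    else
      # Step 3: another process holds the marker; wait until it finishes.
      while [ -e "$marker" ] && [ ! -x "$exe" ]; do
        sleep 0.1
      done
    fi
  fi
  # Step 1 (or the outcome of step 2 or 3): run the final executable.
  [ -x "$exe" ] || return 1
  exec "$exe" "$@"
}
```

One weakness of a marker file: if a build is killed, the stale marker makes later runs wait forever. An flock-style lock avoids that, since the kernel releases it when the holding process exits.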

@straight-shoota
Member

Just a general question: How would you determine the uid? I'd assume it should be some kind of hash over the source code?

Considering that programs can contain dynamic data, such as the output of other programs (via the run macro), there are some consequences:

  • The compiler would need to at least generate the entire source code before it can calculate a hash.
  • If there's a highly dynamic component that changes on every build (such as a date or counter), the uid would change for every build, resulting in case no. 2, which leads to concurrent compilation (with the known issues).

I assume you would only want to rebuild the binary when the (actual) source code has changed. That seems like a prototypical use case for a build management tool like make.

@rdp
Contributor

rdp commented Mar 9, 2020

I like @RX14's idea. Maybe each crystal "output filename" could go in its own separate cache folder, and compiling could flock that folder until it finishes... or something like that. :)
