Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: A new script command #2805

Closed
snoyberg opened this issue Nov 24, 2016 · 39 comments
Closed

RFC: A new script command #2805

snoyberg opened this issue Nov 24, 2016 · 39 comments

Comments

@snoyberg
Copy link
Contributor

I've discussed this in some part with various people, so I thought I'd get the thoughts down in writing. Currently, script interpreter usage of Stack has (IMO) three limitations:

  • It's slower than just using runghc directly. Some of that is inherent: it needs to check for the presence of GHC and libraries, for example, which runghc does not. Nonetheless, there is more overhead than necessary. From my understanding, settings file checking is a significant component of this (and also likely something we don't want to impact the behavior of script usage).

  • Scripts violate some reproducibility rules, because:

    • They don't require that a resolver be set
    • They don't require that all packages be listed
    • They can be impacted by other packages installed

    These trade-offs all make sense when considering the runghc feature in general, but are less appetizing when applied to reproducible scripts.

  • There are multiple options that should almost always be set but aren't enforced: --install-ghc, a --resolver as implied above.

Proposal: let's add a new Stack command: stack script. It would have the following semantics:

  • Require that a --resolver be set
  • Hide all packages from the package database except those specified with --package. (Perhaps make an exception for wired in packages, or at the very least base.)
  • Automatically imply --install-ghc
  • Ignore all settings files

As an alternative (or addition) to the above, we could consider:

  • Making these behaviors options available to runghc (e.g., --no-parse-settings-file)
  • Don't actually add a script command, but add support to the script interpreter bits of Stack to recognize a script command and do the right thing.
@silky
Copy link
Contributor

silky commented Nov 24, 2016

i prefer the first proposal; a new command. it will be cleaner indication that it is doing something special.

@chrisdone
Copy link
Member

I think the script command is a good proposal, better than changing runghc.

@harendra-kumar
Copy link
Collaborator

Sounds good to have a packaged command since this is a common enough use case. Users will not have to scratch their head and consult documentation on what all options to specify. Making this part of runghc will not solve the complexity of too many options (we already have tricky ways to specify options, for example see #2486).

I also do not like the second alternative of a magical command only visible to interpreter, it will cause confusion.

How about using stack run or stack runsure instead of stack script? Just a quick thought.

@snoyberg
Copy link
Contributor Author

I think stack run would be confused with cabal run, which (IIRC) builds an executable in a project and runs it.

I agree with the overall consensus that an explicit new command is better than a bunch of new options or a secret command.

snoyberg added a commit that referenced this issue Nov 24, 2016
This adds a basic script command. Most of the work involved has to do
with preventing config files from being loaded. There's at least one
ugly hack involved (adding a dummy bcStackYaml value), and some usages
of error that should probably be converted to proper exception types.

While this addresses reproducibility, it doesn't help with the speed
concerns: script is now about 100ms faster than runghc on my system for
the case where --package is provided, but it's still over a second for
Hello World. The slowdown is inherent right now in checking if the
relevant packages are installed. It would be nice to figure out a way to
optimize the package check.

Also, this should include some integration tests. It should be a simple
matter of a test that includes a bogus stack.yaml and proving that stack
script ignores it.
@snoyberg
Copy link
Contributor Author

I was planning on hacking on this next week, but I was a little over-anxious to hit it, so: #2806. Still needs work, but may be worth poking at to see if the functionality meets what we're all expecting.

@guibou
Copy link

guibou commented Nov 24, 2016

Do you think a shorter way of listing packages can be introduced? Instead of including as many --package as needed, perhaps a coma separated list ? Or a special comment in the haskell file, something such as:

#!/usr/bin/env/stack
-- resolver: lts-...
-- packages: foo bar baz

@snoyberg
Copy link
Contributor Author

@guibou I've actually set this up already such that the following works:

#!/usr/bin/env stack
-- stack script --resolver lts-7.9 --package "stm async"
import Control.Concurrent.Async
import Control.Concurrent.STM

main = putStrLn "Hello World!"

I can see advantages in both directions on your request. The current script approach is very nice, because it's a direct translation of what you'd do on the command line. There's nothing extra that needs to be learnt. On the other hand, a more explicit syntax may be nice.

Given the ability to give a space-separated (or maybe also comma-separated as you suggested? seems unobjectionable to me) list of packages, do you think that reduces the pain enough to keep the syntax as-is?

@guibou
Copy link

guibou commented Nov 25, 2016

@snoyberg The space separated list inside --package(s) is alright.

Another feature request, script needs a way to fallback to ghci once the script is done executing (or if there is no main method). I'm always using ghci as a interface for my scripts when they need some kind of user interaction, mostly to use the auto completion, the type checker and avoid writing a CLI parser.

@harendra-kumar
Copy link
Collaborator

@guibou you can use stack exec ghci command in interpreter comment. See #2788 .

@guibou
Copy link

guibou commented Nov 25, 2016

@harendra-kumar Yes but, as far as I know, it suffers from the same issues about reproducibility discussed in this thread.

@snoyberg
Copy link
Contributor Author

I'm not following the connection between this and ghci. Keep in mind that I almost never use ghci myself, so it's almost certain I'm just missing something very obvious.

@harendra-kumar
Copy link
Collaborator

Even I don't use ghci but it seems quite a few people use it. This is about starting ghci instead of running the script via the interpreter comment. It is already supported by using stack exec ghci command instead of stack runghc or stack script in the comment. But I guess @guibou wants reproducibility with ghci as well.

@simonmichael
Copy link
Contributor

simonmichael commented Nov 25, 2016

Re speed, here are some timings I posted yesterday.

As I mentioned there, and perhaps related to this stack ghci discussion, it would be lovely if stack ghc script.hs also used the stack script options, so that compiling a stack script just works.

@snoyberg
Copy link
Contributor Author

Interesting point on a ghc command... maybe we should just allow a flag to script to switch its behavior to instead compile? It's also possible to try and apply the optimization I made to the exec command, but it makes some assumptions that won't generally apply (e.g., that only packages present in a snapshot can be specified).

I also pushed a commit to the PR as a WIP commit, which begins a major refactoring of the BuildConfig and EnvConfig types to allow for the possibility of no config files and no local databases. It's mostly busywork, but certainly adds some complexity, and I'm not convinced the complexity it worth it. By contrast, we can just go with the original implementation I wrote, and add special logic to the build command that, when given a flag, it will ensure that no local packages are specified.

Any feedback on which of those approaches I should take? Especially CCing @mgsloan @borsboom and @Blaisorblade from an ongoing-maintenance perspective.

@alexanderkjeldaas
Copy link
Contributor

I think a requirement for such a command is that it should be possible to just write

#/usr/bin/env stack script

and make it dwim.

I'd go as far as to ask for a different prelude in this case as well, to cut down on the import boiler plate. It should be possible to write a "hello world" haskell script in <5 lines.

@snoyberg
Copy link
Contributor Author

That's unfortunately impossible, there's no standard behavior across shells for what happens when multiple arguments are provided. That's why we've been using the special syntax until now.

An alternate prelude is almost certainly far too opinionated a decision to make here.

@alexanderkjeldaas
Copy link
Contributor

#!/usr/bin/env stack
-- stack script --resolver lts-7.9 --package "stm async"
import Control.Concurrent.Async
import Control.Concurrent.STM

main = putStrLn "Hello World!"

The above script violates DRY. Some solutions:

  • Imports should be implied by searching for the first package with the first module to export a function (thus utilizing the --package "stm async" part)
  • Packages are allowed to export a "main" module, which will be imported by default for all packages from --package.
  • --package is removed, and import Foo from "base" used.

Stack could have a global default resolver.

#!/usr/bin/env stack
import Turtle from "turtle"

main = putStrLn "Hello World!"

not sure how we can get rid of main, but at least that's slightly more succinct.

@borsboom
Copy link
Contributor

Overall, I like the idea of the refactoring to make it possible to use with no config files in a somewhat general way. It does seem like something that will be useful to many people.

@borsboom
Copy link
Contributor

borsboom commented Nov 29, 2016

@alexanderkjeldaas These features would be nice and be more "scripty", but would force Stack to parse the Haskell first to figure out the packages it's using and also change some of the behaviour of the compiler. We're sticking with "vanilla" GHC, but may be worth making a separate issue to discuss this, separate from the current proposal. BTW there is existing syntax with the PackageImports extension to specify which package to load a module from using import "turtle Turtle.

@alexanderkjeldaas
Copy link
Contributor

@borsboom Stack doesn't actually have to parse the file. Instead if could use https://hackage.haskell.org/package/HFuse to lazily mount a full stackage copy, so all packages are available immediately (windows not supported).

@Blaisorblade
Copy link
Collaborator

But would that solution actually work? I might be missing details, but it's far from clear.

"Windows not supported" probably means core stack wouldn't do it, but you can always write additional commands.

But I guess you'd need to mount (via HTTP? FTP? What else) a compiled snapshot of full Stackage, together with its ghc-pkg package database and more, and then download the binaries. This is assuming they'd work, and some wouldn't, since the build is supposed to customize those to your machine, see e.g. the ghc-paths package (maybe it'd work for a fixed distribution?).
Or you'd have to merge all the sources, except i don't think that would actually work.

Don't misunderstand me: if that works it'd be amazing and you could go around giving talks about it. But the effort and questions involved seem to go go beyond "let's just give stack devs a tip".

@alexanderkjeldaas
Copy link
Contributor

@Blaisorblade sure it might require work :-)

the build is supposed to customize those to your machine

it this is true, then that's an issue in itself. I think packages that don't have deterministic builds should be marked somehow in stackage, to avoid caching. The same applies to nixpkgs or any other distribution system for pre-compiled packages.

But I guess you'd need to mount (via HTTP? FTP? What else) ..

This is what HFuse does - mounts a file system that calls back into a daemon that implements the file system. So no HTTP/FTP, but stack would implement a file system while running the build, yes.

@hvr
Copy link

hvr commented Nov 29, 2016

@alexanderkjeldaas Note that this extends to GHC itselfs whose build-time configure tries to detect various aspects of the environment and select the optimal entry points into libc/libgmp/etc and may affect what becomes inlineable or not at the .hs level. And to make it worse, we've got quite a few autoconf-based (or even Setup.hs-based ones) on Hackage which detect the current cpu/architecture (some of the crypto packages do that) at compile-time (rather than generating code that makes the choice at runtime).

@alexanderkjeldaas
Copy link
Contributor

alexanderkjeldaas commented Nov 29, 2016

@hvr regarding the CPU architecture detection stuff, this is handled by distros by forcing a compatible one. For example we had the i386 -> i586 change years ago in linux distributions - the changes were basically which CPU architecture to compile against.

for libc/libgmp it is hard. I don't know if it is possible to cache these unless at least the major version of the libraries are part of the cache key.

But note that the fuse based solution works partly without any sort of caching as well. stack can block the file system call and compile packages on demand. the package database might be a challenge, but if some packages are marked "can't be cached", but the package database is complete, then it could work.

@Blaisorblade
Copy link
Collaborator

Blaisorblade commented Nov 29, 2016 via email

@hvr
Copy link

hvr commented Nov 29, 2016

@alexanderkjeldaas fwiw, for the new http://matrix.hackage.haskell.org/ prototype I've got a remote shared 2nd level nix-store cache. The new nix-style features in cabal make this quite easy to implement for the case of all nodes using the same hardware & distro (i.e. you have to index the cache by distro/abi in addition to os/arch/ghc-version etc), w/o having to resort to a layer like FUSE.

@snoyberg
Copy link
Contributor Author

snoyberg commented Feb 2, 2017

This is not actually correct: the plan is to disallow packages outside the snapshot. There are many downsides that come with allowing additional packages, such as needing to create .stack-work directories which may conflict with other scripts. If you need non-snapshot packages, you can continue to use the runghc command. This command is intended to solve the common case in the best way possible.

snoyberg added a commit that referenced this issue Feb 8, 2017
This adds a basic script command. Most of the work involved has to do
with preventing config files from being loaded. There's at least one
ugly hack involved (adding a dummy bcStackYaml value), and some usages
of error that should probably be converted to proper exception types.

While this addresses reproducibility, it doesn't help with the speed
concerns: script is now about 100ms faster than runghc on my system for
the case where --package is provided, but it's still over a second for
Hello World. The slowdown is inherent right now in checking if the
relevant packages are installed. It would be nice to figure out a way to
optimize the package check.

Also, this should include some integration tests. It should be a simple
matter of a test that includes a bogus stack.yaml and proving that stack
script ignores it.
snoyberg added a commit that referenced this issue Feb 9, 2017
This adds a basic script command. Most of the work involved has to do
with preventing config files from being loaded. There's at least one
ugly hack involved (adding a dummy bcStackYaml value), and some usages
of error that should probably be converted to proper exception types.

While this addresses reproducibility, it doesn't help with the speed
concerns: script is now about 100ms faster than runghc on my system for
the case where --package is provided, but it's still over a second for
Hello World. The slowdown is inherent right now in checking if the
relevant packages are installed. It would be nice to figure out a way to
optimize the package check.

Also, this should include some integration tests. It should be a simple
matter of a test that includes a bogus stack.yaml and proving that stack
script ignores it.
snoyberg added a commit that referenced this issue Feb 9, 2017
This adds a basic script command. Most of the work involved has to do
with preventing config files from being loaded. Originally we planned on
creating an alternative set of config types for with and without a local
config. That turned out to be prohibitively invasive. Instead, we now
just create a dummy config file instead ~/.stack/script/lts-x.y (or
/nightly-...).

While this addresses reproducibility, it doesn't help with the speed
concerns: script is now about 100ms faster than runghc on my system for
the case where --package is provided, but it's still over a second for
Hello World. The slowdown is inherent right now in checking if the
relevant packages are installed. It would be nice to figure out a way to
optimize the package check.

Also, this should include some integration tests. It should be a simple
matter of a test that includes a bogus stack.yaml and proving that stack
script ignores it.
snoyberg added a commit that referenced this issue Feb 9, 2017
This adds a basic script command. Most of the work involved has to do
with preventing config files from being loaded. Originally we planned on
creating an alternative set of config types for with and without a local
config. That turned out to be prohibitively invasive. Instead, we now
just create a dummy config file instead ~/.stack/script/lts-x.y (or
/nightly-...).

While this addresses reproducibility, it doesn't help with the speed
concerns: script is now about 100ms faster than runghc on my system for
the case where --package is provided, but it's still over a second for
Hello World. The slowdown is inherent right now in checking if the
relevant packages are installed. It would be nice to figure out a way to
optimize the package check.

Also, this should include some integration tests. It should be a simple
matter of a test that includes a bogus stack.yaml and proving that stack
script ignores it.
snoyberg added a commit that referenced this issue Feb 10, 2017
This adds a basic script command. Most of the work involved has to do
with preventing config files from being loaded. Originally we planned on
creating an alternative set of config types for with and without a local
config. That turned out to be prohibitively invasive. Instead, we now
just create a dummy config file instead ~/.stack/script/lts-x.y (or
/nightly-...).

While this addresses reproducibility, it doesn't help with the speed
concerns: script is now about 100ms faster than runghc on my system for
the case where --package is provided, but it's still over a second for
Hello World. The slowdown is inherent right now in checking if the
relevant packages are installed. It would be nice to figure out a way to
optimize the package check.

Also, this should include some integration tests. It should be a simple
matter of a test that includes a bogus stack.yaml and proving that stack
script ignores it.
snoyberg added a commit that referenced this issue Feb 10, 2017
This adds a basic script command. Most of the work involved has to do
with preventing config files from being loaded. Originally we planned on
creating an alternative set of config types for with and without a local
config. That turned out to be prohibitively invasive. Instead, we now
just create a dummy config file instead ~/.stack/script/lts-x.y (or
/nightly-...).

While this addresses reproducibility, it doesn't help with the speed
concerns: script is now about 100ms faster than runghc on my system for
the case where --package is provided, but it's still over a second for
Hello World. The slowdown is inherent right now in checking if the
relevant packages are installed. It would be nice to figure out a way to
optimize the package check.

Also, this should include some integration tests. It should be a simple
matter of a test that includes a bogus stack.yaml and proving that stack
script ignores it.
snoyberg added a commit that referenced this issue Feb 10, 2017
This adds a basic script command. Most of the work involved has to do
with preventing config files from being loaded. Originally we planned on
creating an alternative set of config types for with and without a local
config. That turned out to be prohibitively invasive. Instead, we now
just create a dummy config file instead ~/.stack/script/lts-x.y (or
/nightly-...).

While this addresses reproducibility, it doesn't help with the speed
concerns: script is now about 100ms faster than runghc on my system for
the case where --package is provided, but it's still over a second for
Hello World. The slowdown is inherent right now in checking if the
relevant packages are installed. It would be nice to figure out a way to
optimize the package check.

Also, this should include some integration tests. It should be a simple
matter of a test that includes a bogus stack.yaml and proving that stack
script ignores it.
snoyberg added a commit that referenced this issue Feb 10, 2017
This adds a basic script command. Most of the work involved has to do
with preventing config files from being loaded. Originally we planned on
creating an alternative set of config types for with and without a local
config. That turned out to be prohibitively invasive. Instead, we now
just create a dummy config file instead ~/.stack/script/lts-x.y (or
/nightly-...).

While this addresses reproducibility, it doesn't help with the speed
concerns: script is now about 100ms faster than runghc on my system for
the case where --package is provided, but it's still over a second for
Hello World. The slowdown is inherent right now in checking if the
relevant packages are installed. It would be nice to figure out a way to
optimize the package check.

Also, this should include some integration tests. It should be a simple
matter of a test that includes a bogus stack.yaml and proving that stack
script ignores it.
@snoyberg
Copy link
Contributor Author

This is now implemented on a branch:

2805-new-script-command-no-config-change

For anyone interested, please give this a shot and comment on the PR at:

#2992

If I don't hear any objections, I'll merge later this week.

snoyberg added a commit that referenced this issue Feb 13, 2017
This was originally done to help implementation of #2805, but we ended
up going a different route.
snoyberg added a commit that referenced this issue Feb 13, 2017
This was originally done to help implementation of #2805, but we ended
up going a different route.
snoyberg added a commit that referenced this issue Feb 14, 2017
This was originally done to help implementation of #2805, but we ended
up going a different route.
snoyberg added a commit that referenced this issue Feb 14, 2017
This was originally done to help implementation of #2805, but we ended
up going a different route.
snoyberg added a commit that referenced this issue Feb 15, 2017
This adds a basic script command. Most of the work involved has to do
with preventing config files from being loaded. Originally we planned on
creating an alternative set of config types for with and without a local
config. That turned out to be prohibitively invasive. Instead, we now
just create a dummy config file instead ~/.stack/script/lts-x.y (or
/nightly-...).

While this addresses reproducibility, it doesn't help with the speed
concerns: script is now about 100ms faster than runghc on my system for
the case where --package is provided, but it's still over a second for
Hello World. The slowdown is inherent right now in checking if the
relevant packages are installed. It would be nice to figure out a way to
optimize the package check.

Also, this should include some integration tests. It should be a simple
matter of a test that includes a bogus stack.yaml and proving that stack
script ignores it.
snoyberg added a commit that referenced this issue Feb 15, 2017
This was originally done to help implementation of #2805, but we ended
up going a different route.
snoyberg added a commit that referenced this issue Feb 16, 2017
This adds a basic script command. Most of the work involved has to do
with preventing config files from being loaded. Originally we planned on
creating an alternative set of config types for with and without a local
config. That turned out to be prohibitively invasive. Instead, we now
just create a dummy config file instead ~/.stack/script/lts-x.y (or
/nightly-...).

While this addresses reproducibility, it doesn't help with the speed
concerns: script is now about 100ms faster than runghc on my system for
the case where --package is provided, but it's still over a second for
Hello World. The slowdown is inherent right now in checking if the
relevant packages are installed. It would be nice to figure out a way to
optimize the package check.

Also, this should include some integration tests. It should be a simple
matter of a test that includes a bogus stack.yaml and proving that stack
script ignores it.
snoyberg added a commit that referenced this issue Feb 16, 2017
This was originally done to help implementation of #2805, but we ended
up going a different route.
snoyberg added a commit that referenced this issue Feb 16, 2017
@snoyberg
Copy link
Contributor Author

This has been merged into master, including the compile-to-executable capabilities and the import parsing (though that's experimental). Improvements can continue in separate issues, so closing.

@AlainODea
Copy link

I'm super late to this particular party, but this is an awesome feature @snoyberg! Thank you :)

I've been burned a few times by sending Haskell scripts using Stack that wouldn't run on the destination because I missed packages that I had installed locally.

This removes the "works on my machine" problem very elegantly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests