Dependency hell in Raku #340

Open
atroxaper opened this issue Aug 4, 2022 · 26 comments · May be fixed by rakudo/rakudo#5060
Labels
language Changes to the Raku Programming Language

Comments

@atroxaper

Introduction

Take a look at any module in the Ecosystem that depends on another module. Its META6.json file has a special section for specifying those dependencies. An entry in that section means the module needs the dependency installed in order to work correctly. The module may need a particular version of the dependency. With luck, the module also names the author of the dependency. Unfortunately, the only guarantee this gives is that a dependency with the correct version, author and api will be installed.

If we see use Module::A; in the code, it means 'the compiler will take the CompUnit of Module::A (in other words, the compiled version of the file Module/A.rakumod) with the highest version in the local library'. If you think the compiler will look at META6.json, you are wrong.

Moreover, if we see use CurrentModule::YourFile, the compiler will likewise take the CompUnit of CurrentModule::YourFile with the highest version in the local library, unless we write use lib 'lib'; in the code or pass -Ilib on the command line.

Examples

Let's look at a few vulnerable scenarios:

Scenario 1:

  1. install module A:ver<1>;
  2. install module B, which depends on A:ver<1>;
  3. all B’s tests passed;
  4. install module A:ver<2>;
  5. module B works incorrectly.

This happens either because B's code contains use A::AModule; without a version, or because module A does not specify the version of its submodules in its own code.
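Scenario 1 can be reproduced with a toy model. This is a minimal Python sketch, not Rakudo's resolver: `installed`, `parse_version` and `resolve` are hypothetical names, and the only behaviour modelled is the one described above, where an unpinned use takes the highest installed version.

```python
# Toy model: an unpinned `use` resolves to the highest installed version.
# All names here are hypothetical; this is not how Rakudo is implemented.

def parse_version(text):
    """Turn '1.2.3' into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def resolve(installed, name, ver=None):
    """Return the version chosen for `use name`, optionally pinned to `ver`."""
    candidates = [v for (n, v) in installed if n == name]
    if ver is not None:
        candidates = [v for v in candidates if v == ver]
    if not candidates:
        raise LookupError(f"no installed distribution provides {name}")
    return max(candidates, key=parse_version)

installed = [("A::AModule", "1")]
before = resolve(installed, "A::AModule")            # only ver<1> is installed

installed.append(("A::AModule", "2"))                # step 4 of Scenario 1
after = resolve(installed, "A::AModule")             # unpinned use in B
pinned = resolve(installed, "A::AModule", ver="1")   # use A::AModule:ver<1>
```

In this model B silently switches to the second version after step 4, while the pinned lookup keeps returning the version B was tested with.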

Scenario 2:

  1. install module A:ver<2>;
  2. install module A:ver<1>;
  3. tests of A:ver<1> passed;
  4. module B works incorrectly.

This is because module A does not specify the version of its submodules in its code. Modules are tested with the -Ilib flag during installation, but the program runs without it.

Scenario 3:

  1. upload OpenSSL:ver<33> with malicious code to fez;
  2. install cro;
  3. cro uses malicious code.

This is because the cro authors did not specify the author of OpenSSL in META6.json.

Scenario 4:

  1. upload any module with version 33 and a malicious submodule OpenSSL::SSL (file OpenSSL/SSL.rakumod) to fez;
  2. somehow get the module installed on the target machine;
  3. cro on the target machine uses malicious code.

This is because, even if the cro authors did specify a version and author of OpenSSL in META6.json, the authors of OpenSSL did not specify a version and author of its submodules in the code.
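Scenarios 3 and 4 can be modelled the same way. The Python sketch below assumes, as this comment describes, that a compilation unit name is looked up across every installed distribution that provides it and that the highest version wins. The `Dist` type, the auth strings and the distribution names are made up for illustration.

```python
from typing import FrozenSet, NamedTuple, Optional

class Dist(NamedTuple):
    """A hypothetical installed distribution and the units it provides."""
    name: str
    ver: int
    auth: str
    provides: FrozenSet[str]

def resolve_unit(dists, unit: str, auth: Optional[str] = None) -> Dist:
    """Pick the distribution supplying `unit`: highest version among matches."""
    matches = [
        d for d in dists
        if unit in d.provides and (auth is None or d.auth == auth)
    ]
    if not matches:
        raise LookupError(f"nothing provides {unit}")
    return max(matches, key=lambda d: d.ver)

installed = [
    Dist("OpenSSL", 1, "zef:trusted", frozenset({"OpenSSL", "OpenSSL::SSL"})),
    # Any distribution may ship a file named OpenSSL/SSL.rakumod.
    Dist("Innocent::Looking", 33, "zef:attacker", frozenset({"OpenSSL::SSL"})),
]

unpinned = resolve_unit(installed, "OpenSSL::SSL")
pinned = resolve_unit(installed, "OpenSSL::SSL", auth="zef:trusted")
```

Under these assumptions the unpinned lookup lands on the attacker's distribution, and only a use statement that carries the auth stays with the trusted one.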

We are not alone

Perhaps in some universe we could decide that malicious code is unimportant (as an information security specialist, I disagree). Either way, general dependency hell scenarios are an important part of any programming language. For instance, as I can see in the article, the Perl world solves the problem through local CPAN installations and by fixing foreign modules themselves. The JavaScript world solves it in npm by copying the code of a module's dependencies into the module itself. The Python world has invented and built a huge system around virtual environments, but it does not solve the problem completely. The Java world specifies by hand which dependencies of dependencies to include in or exclude from the classpath, or even renames the packages of dependencies.

What do we have in Raku? A great system with :ver<>:auth<>:api<>, which does absolutely nothing to help avoid the problem. For now, to avoid the problem we would have to oblige every programmer (module author or module consumer) to write at least auth and ver in every use statement in every .rakumod file. An individual programmer can make themselves write such boilerplate (kudos lizmat). Unfortunately, nobody can oblige the whole community to do so. Besides, such boilerplate looks ugly, and it is dangerous repetition.

Why is it important?

There are a few principles that Raku adheres to:

Making Easy Things Easy and Hard Things Possible

In theory, we can avoid dependency hell even in the current state of the language, but should it be such a hard thing in a language meant for the next hundred years? Can the language make it easy?

Try to minimize things that are Less Than Awesome

Let's imagine a documentation page about avoiding dependency hell. It would say that you have to write boilerplate in every use statement, and that even this does not fully help you. Or it would say that you need to stick to the Perl strategy of local servers and fixing foreign modules. I think that would be much less than awesome.

Do What I Mean:

I cannot know what programmers mean when they write versions in their META6.json but not in their use statements. Probably they do not even know that the dependency hell problem exists. Let's analyze some objective data. There were 1447 modules in fez in the past week. In total there are 3k dependencies in META6.json files and 36k use statements. If we exclude lizmat from the statistics, 22.2% of dependencies and 0.12% of use statements have a version; for auth it is 5.4% and 0.2% respectively, and for api 0.6% and 0.2%. In other words, almost nobody specifies anything in a use statement, even when they specify it in META6.json. As far as programmers are concerned, the dependency hell problem does not exist.

It is a huge problem, which we fail to notice only because the Ecosystem is small. Once the Ecosystem becomes large enough, the problem will blow up.

@atroxaper atroxaper added the language Changes to the Raku Programming Language label Aug 4, 2022
@atroxaper
Author

What I propose to solve the problem:

In the case of CURI (CompUnit::Repository::Installation):

  • require a version and auth in the depends section of META6.json;
  • if the author of a module omits the version and auth in a use statement, then use what is specified in META6.json;
  • if a use statement specifies either a version or an auth, then use it;
  • if a use statement specifies Whatever (*) anywhere in :ver:auth:api, then use the highest available version of the module (as it works now).

In the case of CUFS (CompUnit::Repository::FileSystem, i.e. use lib 'lib'), programmers take responsibility themselves for what they write in the code. If lib/../META6.json exists (-I.), then use its dependency section, or else the highest version.
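The CURI rules above can be sketched as a small merge function. This is a Python illustration only: `effective_spec`, the dictionary layout and the `ANY` marker are hypothetical. It also assumes one particular reading of the third rule, namely that a use statement which writes anything at all is taken exactly as written.

```python
ANY = "*"  # stands in for Raku's Whatever in this sketch

def effective_spec(use_spec, meta_depends):
    """Merge one `use` statement with the META6.json entry of the same name.

    use_spec:     {"name": ..., "ver": ..., "auth": ...}; a missing key means
                  the programmer wrote nothing for that field.
    meta_depends: {name: {"ver": ..., "auth": ...}} taken from META6.json.
    A None value in the result means "no constraint, take the highest".
    """
    name = use_spec["name"]
    written = {k: use_spec[k] for k in ("ver", "auth") if k in use_spec}
    if written:
        # Rules 3 and 4: what is written in the code wins; '*' lifts a constraint.
        chosen = written
    else:
        # Rule 2: nothing written, so fall back to META6.json.
        chosen = meta_depends.get(name, {})
    return {
        "name": name,
        "ver": None if chosen.get("ver") in (None, ANY) else chosen["ver"],
        "auth": None if chosen.get("auth") in (None, ANY) else chosen["auth"],
    }

meta = {"A": {"ver": "1", "auth": "zef:alice"}}   # hypothetical depends data
from_meta = effective_spec({"name": "A"}, meta)
from_code = effective_spec({"name": "A", "ver": "2"}, meta)
whatever = effective_spec({"name": "A", "ver": "*"}, meta)
```

A different reading, where each omitted field falls back to META6.json on its own, would need only the `chosen` computation to change.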

@patrickbkr
Member

AFAIK it's difficult to get the "look in META6.json" solution to be performant. More complex solutions like somehow compiling the META6.json might be possible.

Maybe an intermediate step (to buy us some time) could be to have a Raku program integrated into Fez that substitutes all use statements in lib/, bin/, t/ and xt/ with their META6.json equivalents before upload. (Or have a separate tool, which the user has to call manually and have Fez only verify that the versions in META6.json and the source files match.)
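A rough idea of what such a substitution step could look like, written in Python rather than Raku purely for illustration. The regular expression, the `pins` mapping and `pin_use_statements` are assumptions; a real tool would need a proper parser, and this sketch only touches plain `use Name;` lines that carry no adverbs yet.

```python
import re

# Matches `use Some::Name;` with nothing between the name and the semicolon.
USE_RE = re.compile(
    r"^([ \t]*use[ \t]+)([A-Za-z_][\w-]*(?:::[A-Za-z_][\w-]*)*)([ \t]*;)",
    re.MULTILINE,
)

def pin_use_statements(source, pins):
    """Append :ver<>:auth<> to unpinned use statements found in `pins`."""
    def rewrite(match):
        prefix, name, suffix = match.groups()
        pin = pins.get(name)
        if pin is None:
            return match.group(0)  # pragma or unknown module: leave untouched
        return f"{prefix}{name}:ver<{pin['ver']}>:auth<{pin['auth']}>{suffix}"
    return USE_RE.sub(rewrite, source)

source = "use v6;\nuse A::AModule;\nuse B:ver<2>;\nuse lib 'lib';\n"
pins = {
    "A::AModule": {"ver": "1", "auth": "zef:alice"},   # hypothetical values
    "B": {"ver": "1", "auth": "zef:bob"},
}
pinned_source = pin_use_statements(source, pins)
```

Statements that already carry adverbs, such as `use B:ver<2>;`, are left exactly as the author wrote them.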

@lizmat
Collaborator

lizmat commented Aug 4, 2022

FWIW, I have been toying with the following idea:

If you do use pre-production in your code, it will warn about unpinned versions in use statements (use Foo). We can discuss what we consider unpinned, but any use statement without associated :auth<...> I would consider unpinned. It will also refuse to compile if it finds any use lib active (aka, any Repo of CUR::FileSystem type).

If you do use production in your code, your code will fail to compile if there is any unpinned use statement. As well as all the use pre-production features.

I think this will allow people to indicate the strictness they want.

Note that it is possible to load a module such as pre-production as a command-line parameter, so you would not need to change any actual code to force this check.

Now, time to find the hooks and the tuits to do this... :-)
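The two strictness levels can be sketched as a source check. This Python sketch is not the proposed pragma: `unpinned_uses`, `check` and `UnpinnedUse` are hypothetical, "unpinned" is taken to mean "no :auth<...>" as suggested above, lowercase names are assumed to be pragmas, and the use lib rule is left out.

```python
import re

USE_LINE = re.compile(
    r"^[ \t]*use[ \t]+([A-Za-z_][\w-]*(?:::[A-Za-z_][\w-]*)*)([^;\n]*);",
    re.MULTILINE,
)

class UnpinnedUse(Exception):
    """Raised in 'production' mode when a use statement has no auth."""

def unpinned_uses(source):
    """Return module names from use statements that lack :auth<...>."""
    found = []
    for name, rest in USE_LINE.findall(source):
        if name[0].islower():
            continue  # assumption: lowercase names (v6, lib, ...) are pragmas
        if ":auth<" not in rest:
            found.append(name)
    return found

def check(source, mode):
    """Warn in 'pre-production' mode, fail in 'production' mode."""
    names = unpinned_uses(source)
    if mode == "production" and names:
        raise UnpinnedUse(", ".join(names))
    return [f"unpinned: use {name}" for name in names]

source = "use v6.d;\nuse Foo;\nuse Bar::Baz:auth<zef:alice>:ver<1>;\n"
warnings = check(source, "pre-production")
```

The same `source` passed with mode "production" raises instead of returning warnings.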

@niner

niner commented Aug 4, 2022 via email

@Kaiepi

Kaiepi commented Aug 4, 2022

I think https://www.openbsd.org/faq/ports/guide.html might be a worthwhile reference for considerations pertaining to packaging security.

@lizmat
Collaborator

lizmat commented Aug 4, 2022

@niner Yeah, that's what I also had in mind, until I got distracted by App::Rak and the conference :-)

@atroxaper
Author

Maybe an intermediate step (to buy us some time) could be to have a Raku program integrated into Fez that substitutes...

If we can solve the problem through simple substitution of use statements in the sources (which is a hack), then we can also solve it by substituting use statements at compile time (which is reliable and does not require any action from the user).

@atroxaper
Author

I think this will allow people to indicate the strictness they want.

@lizmat It is an interesting idea. I see two disadvantages:

  1. A user will need to write the :auth<>:ver<> boilerplate;
  2. This solution does not cover the transitive case, where a production module wants to use a dependency that itself contains unpinned use statements.

@vrurg
Contributor

vrurg commented Aug 10, 2022

I don't currently remember all the details of module loading sequence, but there should be no hacks necessary.

First of all, META6 is available for the compiler or can be available. All we need is to parse it and keep the dependency info around as defaults for CU::DependencySpecification. This way we can use default auth/ver/api where not explicitly specified by user.

When the module is resolved with all the user provided and default values in mind, it is identified by its hash value. So, loading pre-compiled code should not be a problem either.

What am I missing here?

@lizmat
Collaborator

lizmat commented Aug 10, 2022

Well, to me it was not just a SMOP :-(

@atroxaper atroxaper reopened this Aug 11, 2022
@atroxaper
Author

What I called a hack is the substitution of use statements by some tool at the source-code level.

@vrurg As far as I know, yes, you are right. All comp unit dependencies in CURI are specified as hashes in the first lines of the precompiled file. But I do not know what a precompiled use statement looks like: Raku allows us to use different versions of a module in different scopes. That will probably be the difficult part.

@vrurg
Contributor

vrurg commented Aug 12, 2022

There is a thing which simplifies the task a lot. Basically, it makes it half-resolved. Each module final name in a repository is defined by distribution ID. The latter is computed as a hash over module name, version, api, and auth. All together. We just need to make sure that this exact name gets loaded. To do so the compiler needs access to META6.json data.
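To make the idea concrete, here is a minimal Python sketch of an ID computed as a hash over name, version, api and auth together. The string layout and the choice of SHA-1 are assumptions made for illustration; the thread does not say how Rakudo actually builds the ID, and `distribution_id` is a hypothetical name.

```python
import hashlib

def distribution_id(name, ver, auth, api):
    """Hash the full identity, so any change in it yields a different ID."""
    identity = f"{name}:ver<{ver}>:auth<{auth}>:api<{api}>"  # assumed layout
    return hashlib.sha1(identity.encode("utf-8")).hexdigest().upper()

# Hypothetical identities: same name, different versions.
id_v1 = distribution_id("A", "1", "zef:alice", "1")
id_v2 = distribution_id("A", "2", "zef:alice", "1")
id_v1_again = distribution_id("A", "1", "zef:alice", "1")
```

Because the ID is a pure function of the four fields, defaults taken from META6.json would be enough to name the exact distribution to load.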

@abraxxa

abraxxa commented Aug 17, 2022

I like that idea very much! 👍🏻
As Raku compiles the source code to an intermediate format and stores those files, can't the information from META6.json be added to all use statements in the intermediate files at that step?

@Skarsnik

Skarsnik commented Sep 5, 2022

I don't think there is a good solution for most use cases.

Take my Gumbo module, for example.
Its only Raku dependency is XML.
Should I pin a version/author? This looks annoying, since it means I will have to test every new version to see whether it still works, and update Gumbo just to change the pinned version. Also, it is better if the XML module is allowed to be updated, so Gumbo can become faster or safer because the XML update fixed or improved things.

Leaving it unpinned obviously leads to the problem this issue raises.

I also think we should look at the two main environments:
- The whole neatly packaged environment, with every version known and specified somewhere. It can be a Docker environment or a simple application where you provide all the deps.
- A normal Linux distribution, where you don't control the possible versions of your dependencies.

Both pose different challenges. For example:

  • Packaged app: you can miss a critical security update
  • Distribution: it's harder to ensure that your app will work with all dependency updates

@atroxaper
Author

atroxaper commented Sep 5, 2022

@Skarsnik we have auth, ver and api. Your case requires a dependency like XML:auth<zef:raku-community-modules>:ver<0.+>:api<1>.

There are two problems. Firstly, the XML module does not specify its api number. I think a module's own api number should be obligatory in META6.json. Secondly, 0.+ means any version greater than 0, which covers both 0.3.4 and 15.5.2. There is no way to say 'a version greater than 0, but less than 1'. But we could probably use a range here, something like :ver<v0.3.2..^v1>.

I think such a solution could satisfy you: you have tested your app on XML:ver<0.3.2>, and you expect the XML developers to increase the api number whenever the api changes. Also, we are certainly protected from using a future version 1.
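The range suggested above is a half-open interval: 0.3.2 or higher, but below 1. A short Python sketch of that check follows. `parse` and `in_range` are hypothetical helpers, only purely numeric versions are handled, and Raku's own Version comparison may differ in edge cases.

```python
def parse(version):
    """Turn 'v0.3.2' or '0.3.2' into (0, 3, 2) for numeric comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))

def in_range(version, low, high_exclusive):
    """True when low <= version < high_exclusive, as in v0.3.2..^v1."""
    return parse(low) <= parse(version) < parse(high_exclusive)

tested = in_range("0.3.2", "v0.3.2", "v1")     # the version the app was tested on
newer = in_range("0.3.4", "v0.3.2", "v1")      # a later 0.x release
too_old = in_range("0.3.1", "v0.3.2", "v1")    # below the tested version
future = in_range("15.5.2", "v0.3.2", "v1")    # would pass a bare "0 or more" check
major_one = in_range("1", "v0.3.2", "v1")      # the excluded upper bound
```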

@niner

niner commented Sep 5, 2022 via email

@Skarsnik

Skarsnik commented Sep 5, 2022

@Skarsnik we have auth, ver and api. Your case requires a dependency like XML:auth<zef:raku-community-modules>:ver<0.+>:api<1>.

There are two problems. Firstly, the XML module does not specify its api number. I think a module's own api number should be obligatory in META6.json. Secondly, 0.+ means any version greater than 0, which covers both 0.3.4 and 15.5.2. There is no way to say 'a version greater than 0, but less than 1'. But we could probably use a range here, something like :ver<v0.3.2..^v1>.

I think such a solution could satisfy you: you have tested your app on XML:ver<0.3.2>, and you expect the XML developers to increase the api number whenever the api changes. Also, we are certainly protected from using a future version 1.

I think a first step could be requiring an API number for modules (but not apps). This could be a check in zef, or one made when the module 'database' is generated. That way it becomes easier for other people to use API numbers in META6.json and in use statements. It would solve my Gumbo issue, because I could say use XML:api<1>.

@atroxaper
Author

@Skarsnik It would be only a halfway solution: it solves your problem by forcing you to write boilerplate.

@Kaiepi

Kaiepi commented Sep 5, 2022

I think, self api number should be obligatory in META6.json.

Yes, please; even if just a fez-ism, I have a bad habit of forgetting this, despite always wanting one, because there's no warning or anything if omitted.

@lizmat
Collaborator

lizmat commented Sep 5, 2022

+1 on it being enforced by fez

@nxadm

nxadm commented Sep 7, 2022

FWIW, I have been toying with the following idea:

If you do use pre-production in your code, it will warn about unpinned versions in use statements (use Foo). We can discuss what we consider unpinned, but any use statement without associated :auth<...> I would consider unpinned. It will also refuse to compile if it finds any use lib active (aka, any Repo of CUR::FileSystem type).

If you do use production in your code, your code will fail to compile if there is any unpinned use statement. As well as all the use pre-production features.

I think this will allow people to indicate the strictness they want.

Note that it is possible to load a module such as pre-production as a command-line parameter, so you would not need to change any actual code to force this check.

Now, time to find the hooks and the tuits to do this... :-)

Interesting, but to be workable I would have $future-raku-version default to the "production" mode and add a cli parameter for pre-prod. Being strict about dependencies is a feature.

This may be interesting for deciding how to deal with library updates: https://go.dev/ref/mod#minimal-version-selection. Luckily, Raku has more flexibility than Go in this matter, but the versioning should be set centrally, per library plus once for the project, and not in a zillion source files.

@nxadm

nxadm commented Sep 7, 2022

+1 on it being enforced by fez

Core functionality should not be handled by external tools. Otherwise fez, or something fez-like, should be pulled into core (and probably handled by a raku subcommand).

@atroxaper
Author

Core functionality should not be handled by external tools.

The Ecosystem is not core functionality. Core functionality is how and where Raku looks for PrecompUnits. The community could agree that Fez is the official Ecosystem and that it fully supports Raku's algorithms. If we use Fez/Fez then everything is going to be fine.

@nxadm

nxadm commented Sep 7, 2022

Ecosystem is not a core functionality.

The rules (interfaces) and, in this case, their enforcement are core functionality in my eyes. The contents of the ecosystem are not, of course.

@vrurg
Contributor

vrurg commented Sep 16, 2022

rakudo/rakudo#5060

@tony-o

tony-o commented Aug 29, 2023

FWIW, I have been toying with the following idea:
If you do use pre-production in your code, it will warn about unpinned versions in use statements (use Foo). We can discuss what we consider unpinned, but any use statement without associated :auth<...> I would consider unpinned. It will also refuse to compile if it finds any use lib active (aka, any Repo of CUR::FileSystem type).
If you do use production in your code, your code will fail to compile if there is any unpinned use statement. As well as all the use pre-production features.
I think this will allow people to indicate the strictness they want.
Note that it is possible to load a module such as pre-production as a command-line parameter, so you would not need to change any actual code to force this check.
Now, time to find the hooks and the tuits to do this... :-)

Interesting, but to be workable I would have $future-raku-version default to the "production" mode and add a cli parameter for pre-prod. Being strict about dependencies is a feature.

This may be interesting for deciding how to deal with library updates: https://go.dev/ref/mod#minimal-version-selection. Luckily, Raku has more flexibility than Go in this matter, but the versioning should be set centrally, per library plus once for the project, and not in a zillion source files.

MVS looks like something zef needs to do to install things correctly and ends up putting on a bandaid that works about 60% of the time. Go's module loading leaves a lot to be desired. The real answer to pinning these versions gets better with RakuAST and deployment tooling or, if you're not into tools, writing the boilerplate. Haskell's dependency hell is real but they have the right idea with pinning hard versions in modules.

I agree with your last sentence though: if I pin in META6.json then raku should load that version of the module when it's requested. This makes Nick's job slightly easier (zef) and skarsnik's and my jobs slightly more complex (fez & mi6), but the result is worth it. With RakuAST, pinning becomes possible at the time of packaging:

  1. include every file in the provides
  2. track which versions of modules get loaded
  3. run the test suite
  4. pin versions in META6
  5. package
  6. either leave or revert the META6
  7. deploy

Edit: this can also be done using something like envy to determine what is installed/loaded: https://raku.land/zef:tony-o/envy
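Step 4 of the list above, pinning in META6 what was actually loaded, might look roughly like this. It is a Python sketch under stated assumptions: depends is treated as a flat list of strings, `loaded` is a hypothetical record of the versions seen while the test suite ran, and none of these names come from fez, mi6 or envy.

```python
import copy
import re

NAME_RE = re.compile(r"[A-Za-z_][\w-]*(?:::[A-Za-z_][\w-]*)*")

def pin_one(dep, loaded):
    """Rewrite one depends entry to the exact version that was loaded."""
    name = NAME_RE.match(dep).group(0)
    seen = loaded.get(name)
    if seen is None:
        return dep  # never loaded during the tests: leave as written
    return f"{name}:ver<{seen['ver']}>:auth<{seen['auth']}>"

def pin_meta(meta, loaded):
    """Return a pinned copy of META6; the original stays usable for step 6."""
    pinned = copy.deepcopy(meta)
    pinned["depends"] = [pin_one(dep, loaded) for dep in meta.get("depends", [])]
    return pinned

# Hypothetical data: the versions and auths below are illustrative only.
meta = {"name": "Gumbo", "depends": ["XML", "Some::Helper:ver<0.1+>", "Unused"]}
loaded = {
    "XML": {"ver": "0.3.2", "auth": "zef:example"},
    "Some::Helper": {"ver": "0.1.7", "auth": "zef:example"},
}
pinned = pin_meta(meta, loaded)
```

Returning a copy keeps both options of step 6 open: ship the pinned metadata or go back to the original.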

10 participants