|
| 1 | +=begin pod :kind("Language") :subkind("Language") :category("tutorial") |
| 2 | +
|
| 3 | +=TITLE CompUnits and where to find them |
| 4 | +
|
| 5 | +=SUBTITLE How and when Raku modules are compiled, where they are stored, and how to access them in compiled form. |
| 6 | +
|
| 7 | +=head1 Overview |
| 8 | +
|
| 9 | +Raku, as a member of the Perl language family, tends at the top level to sit nearer the interpreted |
| 10 | +end of the interpreted-compiled spectrum. In this tutorial, an 'interpreted' program means that the source code, |
| 11 | +namely the human-readable text such as C<say 'hello world';>, is immediately processed by the C<Raku> program into code |
| 12 | +that can be executed by the computer, with any intermediate stages being stored in memory. |
| 13 | +
|
| 14 | +A compiled program, by contrast, is one where the human-readable source is first processed into machine-executable code |
| 15 | +and some form of this code is stored 'on disc'. In order to execute the program, the machine-readable version is loaded |
| 16 | +into memory and then run by the computer. |
| 17 | +
|
| 18 | +Both compiled and interpreted forms have advantages. Briefly, interpreted programs can be 'whipped up' quickly and |
| 19 | +the source changed quickly. Compiled programs can be complex and take a significant time to pre-process into machine-readable |
| 20 | +code, but then running them is much faster for a user, who only 'sees' the loading and running time, not the compilation |
| 21 | +time. |
| 22 | +
|
| 23 | +C<Raku> has both paradigms. At the B<top level> a Raku program is interpreted, but code that is separated out into a |
| 24 | +Module is compiled, and the precompiled version is then loaded when necessary. In practice, Modules that have been |
| 25 | +written by the community only need to be precompiled once by a user when they are 'installed', for example by a |
| 26 | +Module manager such as C<zef>. Then they can be C<use>d by a developer in her own program. The effect is to make C<Raku> |
| 27 | +top level programs run quickly. |
| 28 | +
|
| 29 | +One of the great strengths of the C<Perl> family of languages was the ability to integrate a whole ecosystem of modules |
| 30 | +written by competent programmers into a small program. This strength was widely copied and is now the norm for all |
| 31 | +languages. C<Raku> takes integration even further, making it relatively easy for C<Raku> programs to incorporate system |
| 32 | +libraries written in other languages; see L<Native Call|Language/nativecall>. |
| 33 | +
|
| 34 | +The experience from C<Perl> and other languages is that the distributed nature of Modules generates several practical difficulties: |
| 35 | +=item a popular module may go through several iterations as the API gets improved, without a guarantee that there is |
| 36 | +backward compatibility. So, if a program relies on some specific function or return, then there has to be a way to |
| 37 | +specify the B<Version>. |
| 38 | +=item a module may have been written by Bob, a very competent programmer, who moves on in life, leaving the module unmaintained, |
| 39 | +so Alice takes over. This means that the same module, with the same name and the same general API, may have two |
| 40 | +versions in the wild. Alternatively, two developers (e.g., Alice and Bob) who initially cooperated on a module may part company over its |
| 41 | +development. Consequently, it is sometimes necessary to have a way to define the B<Auth> of the module. |
| 42 | +=item a module may be enhanced over time and the maintainer keeps two versions up to date, but with different APIs. So it |
| 43 | +may be necessary to define the B<API> required. |
| 44 | +=item when developing a new program a developer may want to have the modules written by both Alice and Bob installed locally. |
| 45 | +So it is not possible simply to have only one version of a module with a single name installed. |
| 46 | +
|
| 47 | +C<Raku> enables all of these possibilities, allowing for multiple versions, multiple authorities, and multiple APIs to be |
| 48 | +installed and available locally. The way classes and modules can be accessed with specific attributes |
| 49 | +is explained L<elsewhere|Language/typesystem#Versioning_and_authorship>. This tutorial is about how C<Raku> handles these |
| 50 | +possibilities. |
| 51 | +
|
| 52 | +=head1 Introduction |
| 53 | +
|
| 54 | +Before considering the C<Raku> framework, let's have a look at how languages like C<Perl> or C<Python> handle module |
| 55 | +installation and loading. |
| 56 | +
|
| 57 | +=begin code |
| 58 | +ACME::Foo::Bar -> ACME/Foo/Bar.pm |
| 59 | +os.path -> os/path.py |
| 60 | +=end code |
| 61 | +
|
| 62 | +In those languages, module names have a 1:1 relation with file system paths. |
| 63 | +In C<Perl> we simply replace the double colons with slashes and add C<.pm>; in C<Python> the dots become slashes and C<.py> is added. |
| 64 | +
|
| 65 | +Note that these are relative paths. |
| 66 | +Both C<Python> and C<Perl> use a list of include paths to complete these paths. |
| 67 | +In C<Perl> they are available in the global C<@INC> array. |
| 68 | +
|
| 69 | +=begin code |
| 70 | +@INC |
| 71 | +
|
| 72 | +/usr/lib/perl5/site_perl/5.22.1/x86_64-linux-thread-multi |
| 73 | +/usr/lib/perl5/site_perl/5.22.1/ |
| 74 | +/usr/lib/perl5/vendor_perl/5.22.1/x86_64-linux-thread-multi |
| 75 | +/usr/lib/perl5/vendor_perl/5.22.1/ |
| 76 | +/usr/lib/perl5/5.22.1/x86_64-linux-thread-multi |
| 77 | +/usr/lib/perl5/5.22.1/ |
| 78 | +=end code |
| 79 | +
|
| 80 | +Each of these include directories is checked for whether it contains a relative path determined from the module name. |
| 81 | +If the shoe fits, the file is loaded. |
| 82 | +
|
| 83 | +Of course that's a bit of a simplified version. |
| 84 | +Both languages support caching compiled versions of modules. |
| 85 | +So instead of just the C<.pm> file C<Perl> first looks for a C<.pmc> file. |
| 86 | +And C<Python> first looks for C<.pyc> files. |
| 87 | +
|
| 88 | +Module installation in both cases means mostly copying files into locations determined by the same simple mapping. The |
| 89 | +system is easy to explain, easy to understand, simple and robust. |
| 90 | +
|
| 91 | +=head2 Why change? |
| 92 | +
|
| 93 | +Why would C<Raku> need another framework? The reason is there are features that those languages lack, namely: |
| 94 | +=item Unicode module names |
| 95 | +=item Modules published under the same names by different authors |
| 96 | +=item Having multiple versions of a module installed |
| 97 | +
|
| 98 | +The 26 Latin characters are too restrictive for virtually all real modern languages, including English, which |
| 99 | +has diacritics for many loan words. |
| 100 | +
|
| 101 | +With a 1:1 relation between module names and file system paths, you enter a world of pain |
| 102 | +once you try to support Unicode on multiple platforms and file systems. |
| 103 | +
|
| 104 | +Then there's sharing module names between multiple authors. This one may or may not work out well in practice. |
| 105 | +I can imagine using it for example for publishing a module with some fix until the original author includes |
| 106 | +the fix in the "official" version. |
| 107 | +
|
| 108 | +Finally there's multiple versions. Usually people who need certain versions of modules reach for local::lib or |
| 109 | +containers or some home grown workarounds. They all have their own disadvantages. None of them would be necessary |
| 110 | +if applications could just say, hey I need good old, trusty version 2.9 or maybe a bug fix release of that branch. |
| 111 | +
|
| 112 | +If you had any hopes of continuing using the simple name mapping solution, you probably gave up at the |
| 113 | +versioning requirement. Because, how would you find version 3.2 of a module when looking for a 2.9 or higher? |
| 114 | +
|
| 115 | +Popular ideas included collecting information about installed modules in JSON files, but when those turned out to be |
| 116 | +toenail-growing slow, the text files were replaced by putting the meta data into SQLite databases. |
| 117 | +However, these ideas can be easily shot down by introducing another requirement: distribution packages. |
| 118 | +
|
| 119 | +Packages for Linux distributions are mostly just archives containing some files plus some meta data. |
| 120 | +Ideally the process of installing such a package means just unpacking the files and updating the central package database. |
| 121 | +Uninstalling means deleting the files installed this way and again updating the package database. |
| 122 | +Changing existing files on install and uninstall makes packagers' lives much harder, so we really want to avoid that. |
| 123 | +Also, the names of the installed files must not depend on what was previously installed. |
| 124 | +We must know at the time of packaging what the names are going to be. |
| 125 | +
|
| 126 | +=head2 Long names |
| 127 | +
|
| 128 | +=begin code |
| 129 | +Foo::Bar:auth<cpan:nine>:ver<0.3>:api<1> |
| 130 | +=end code |
| 131 | +
|
| 132 | +Step 0 in getting us back out of this mess is to define a long name. |
| 133 | +A full module name in C<Raku> consists of the short-name, auth, version and API. |
| 134 | +
|
| 135 | +At the same time, the thing you install is usually not a single module but a distribution which probably contains one or more modules. |
| 136 | +Distribution names work just the same way as module names. |
| 137 | +Indeed, distributions will often just be named after their main module. |
| 138 | +An important property of distributions is that they are immutable. |
| 139 | +C<V< Foo:auth<nine>:ver<0.3>:api<1> >> will always be the name for exactly the same code. |
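| | +
|
| | +As a minimal sketch (the class below is empty and purely illustrative), the parts of a long name can be attached to a |
| | +package declaration and read back through its metaobject: |
| | +
|
| | +=begin code |
| | +# Hypothetical, empty class declared with a full long name |
| | +class Foo::Bar:auth<cpan:nine>:ver<0.3>:api<1> { } |
| | +
|
| | +say Foo::Bar.^name;   # the short-name |
| | +say Foo::Bar.^auth;   # the auth part |
| | +say Foo::Bar.^ver;    # the version part, a Version object |
| | +say Foo::Bar.^api;    # the api part |
| | +=end code |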
| 140 | +
|
| 141 | +=head2 $*REPO |
| 142 | +In C<Perl> and C<Python> you deal with include paths, pointing to file system directories. |
| 143 | +In C<Raku> we call such directories "repositories" and each of these repositories is governed by an object that does the |
| 144 | +C<CompUnit::Repository> role. |
| 145 | +Instead of an C<B<@INC>> array, there's the C<$*REPO> variable. |
| 146 | +It contains a single repository object. |
| 147 | +This object has a B<next-repo> attribute that may contain another repository. |
| 148 | +In other words: repositories are managed as a I<linked list>. |
| 149 | +The important difference to the traditional array is that when going through the list, each object has a say in whether |
| 150 | +to pass along a request to the next-repo or not. |
| 151 | +C<Raku> sets up a standard set of repositories, i.e. the "perl", "vendor" and "site" repositories, just like you know them from C<Perl>. |
| 152 | +In addition, we set up a "home" repository for the current user. |
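| | +
|
| | +To see what this looks like on a particular installation, the chain can be inspected from a script. This is only a |
| | +sketch: the repositories listed, and their order, depend on how Rakudo was installed and on any C<-I> or C<use lib> in effect. |
| | +
|
| | +=begin code |
| | +# The head of the linked list and the repository it points to |
| | +say $*REPO.^name; |
| | +say $*REPO.next-repo.^name; |
| | +
|
| | +# Every repository reachable from the head, in lookup order |
| | +for $*REPO.repo-chain -> $repo { |
| | +    say $repo.^name; |
| | +} |
| | +=end code |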
| 153 | +
|
| 154 | +Repositories must implement the C<need> method. |
| 155 | +A C<use> or C<require> statement in C<Raku> code is basically translated to a call to C<B<$*REPO>>'s C<need> method. |
| 156 | +This method may in turn delegate the request to the next-repo. |
| 157 | +
|
| 158 | +=begin code |
| 159 | +role CompUnit::Repository { |
| 160 | +    has CompUnit::Repository $.next-repo is rw; |
| 161 | +
|
| 162 | +    method need(CompUnit::DependencySpecification $spec, |
| 163 | +                CompUnit::PrecompilationRepository $precomp, |
| 164 | +                CompUnit::Store :@precomp-stores |
| 165 | +                --> CompUnit:D |
| 166 | +                ) |
| 167 | +        { ... } |
| 168 | +    method loaded( |
| 169 | +                --> Iterable |
| 170 | +                ) |
| 171 | +        { ... } |
| 172 | +
|
| 173 | +    method id( --> Str ) |
| 174 | +        { ... } |
| 175 | +} |
| 176 | +=end code |
| 177 | +
|
| 178 | +=head2 Repositories |
| 179 | +
|
| 180 | +Rakudo comes with several classes that can be used for repositories. |
| 181 | +The most important ones are C<CompUnit::Repository::FileSystem> and C<CompUnit::Repository::Installation>. |
| 182 | +The FileSystem repo is meant to be used during module development and actually works just like C<Perl> when |
| 183 | +looking for a module. |
| 184 | +It doesn't support versions or auths and simply maps the short-name to a file system path. |
| 185 | +
|
| 186 | +The Installation repository is where the real smarts are. When requesting a module, you will usually either do it |
| 187 | +via its exact long name, or you say something along the lines of "give me a module that matches this filter". |
| 188 | +Such a filter is given by way of a C<CompUnit::DependencySpecification> object which has fields for |
| 189 | +=item short-name, |
| 190 | +=item auth-matcher, |
| 191 | +=item version-matcher and |
| 192 | +=item api-matcher. |
| 193 | +
|
| 194 | +When looking through candidates, the Installation repository will smart match a module's long name against this |
| 195 | +DependencySpecification or rather the individual fields against the individual matchers. |
| 196 | +Thus a matcher may be some concrete value, a version range or even a regex (though an arbitrary regex, such as C<.*>, |
| 197 | +would not produce a useful result, whereas something like C<3.20.1+> will only find candidates at version 3.20.1 or higher). |
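| | +
|
| | +The matching described above can be sketched as follows. The version numbers are arbitrary, and C<Test> is used as the |
| | +short-name only on the assumption that it is installed alongside Rakudo; C<resolve> looks a module up without loading it. |
| | +
|
| | +=begin code |
| | +# A trailing + on a version means "this version or higher" |
| | +my $matcher = Version.new('3.20.1+'); |
| | +say Version.new('3.20.2') ~~ $matcher; |
| | +say Version.new('3.20.0') ~~ $matcher; |
| | +
|
| | +# A filter for "any installed module with the short-name Test" |
| | +my $spec = CompUnit::DependencySpecification.new( |
| | +    :short-name<Test>, |
| | +    :version-matcher(True),   # True accepts any version |
| | +); |
| | +
|
| | +# Ask the repository chain for a candidate matching the filter |
| | +with $*REPO.resolve($spec) -> $comp-unit { |
| | +    say $comp-unit.short-name; |
| | +} |
| | +=end code |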
| 198 | +
|
| 199 | +Loading the meta data of all installed distributions would be prohibitively slow. The current implementation of |
| 200 | +the C<Raku> framework uses |
| 201 | +the file system as a kind of database. However, another implementation may use another strategy. The following description |
| 202 | +shows how one implementation works and is included here to illustrate what is happening. |
| 203 | +
|
| 204 | +We store not only a distribution's files but also create indices for speeding up lookups. |
| 205 | +One of these indices comes in the form of directories named after the short-name of installed modules. |
| 206 | +However most of the file systems in common use today cannot handle Unicode names, so we cannot just use |
| 207 | +module names directly. |
| 208 | +This is where the now infamous SHA-1 hashes enter the game. |
| 209 | +The directory names are the ASCII encoded SHA-1 hashes of the UTF-8 encoded module short-names. |
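| | +
|
| | +To get a feeling for what such a name looks like, the hash of a short-name can be computed by hand. This sketch relies |
| | +on the Rakudo-specific C<nqp::sha1> op, which is an implementation detail and not part of the Raku language; whether the |
| | +result matches a directory on disk depends on the Rakudo version in use. |
| | +
|
| | +=begin code |
| | +use nqp;   # Rakudo-specific, unsupported outside of Rakudo internals |
| | +
|
| | +# SHA-1 digest of the short-name, as a hexadecimal ASCII string |
| | +say nqp::sha1('Foo::Bar'); |
| | +=end code |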
| 210 | +
|
| 211 | +In these directories we find one file per distribution that contains a module with a matching short name. |
| 212 | +These files again contain the ID of the dist and the other fields that make up the long name: auth, version and api. |
| 213 | +So by reading these files we have a usually short list of auth-version-api triplets which we can match against our |
| 214 | +DependencySpecification. |
| 215 | +We end up with the winning dist's ID, which we use to look up the meta data, stored in a JSON encoded file. |
| 216 | +This meta data contains the name of the file in the sources/ directory containing the requested module's code. |
| 217 | +This is what we can load. |
| 218 | +
|
| 219 | +Finding names for source files is again a bit tricky, as there's still the Unicode issue and in addition the same |
| 220 | +relative file names may be used by different installed distributions (think versions). |
| 221 | +So for now at least, we use SHA-1 hashes of the long-names. |
| 222 | +
|
| 223 | +=head2 Resources |
| 224 | +
|
| 225 | +=begin code |
| 226 | +%?RESOURCES |
| 227 | +%?RESOURCES<libraries/p5helper> |
| 228 | +%?RESOURCES<icons/foo.png> |
| 229 | +%?RESOURCES<schema.sql> |
| 230 | +
|
| 231 | +Foo |
| 232 | +|___ lib |
| 233 | +|     |____ Foo.rakumod |
| 234 | +| |
| 235 | +|___ resources |
| 236 | +      |___ schema.sql |
| 237 | +      | |
| 238 | +      |___ libraries |
| 239 | +      |     |____ p5helper |
| 240 | +      |     |        |___ |
| 241 | +      |___ icons |
| 242 | +           |___ foo.png |
| 243 | +
|
| 244 | +=end code |
| 245 | +
|
| 246 | +It's not only source files that are stored and found this way. |
| 247 | +Distributions may also contain arbitrary resource files. |
| 248 | +These could be images, language files or shared libraries that are compiled on installation. |
| 249 | +They can be accessed from within the module through the C<%?RESOURCES> hash. |
| 250 | +
|
| 251 | +As long as you stick to the standard layout conventions for distributions, this even works during development |
| 252 | +without installing anything. |
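| | +
|
| | +A sketch of how the module from the layout above might read one of its resources. The sub name C<schema-sql> is made up |
| | +for illustration, and the example assumes that C<schema.sql> is listed in the C<resources> field of the distribution's |
| | +C<META6.json>; it is a module fragment, not a standalone script. |
| | +
|
| | +=begin code :skip-test<needs a distribution> |
| | +# lib/Foo.rakumod (hypothetical) |
| | +unit module Foo; |
| | +
|
| | +# %?RESOURCES is resolved at compile time, relative to the enclosing distribution |
| | +sub schema-sql(--> Str) is export { |
| | +    %?RESOURCES<schema.sql>.slurp |
| | +} |
| | +=end code |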
| 253 | +
|
| 254 | +A nice result of this architecture is that it's fairly easy to create special purpose repositories. |
| 255 | +
|
| 256 | +=head2 Dependencies |
| 257 | +
|
| 258 | +Luckily precompilation at least works quite well in most cases. Yet it comes with its own set of challenges. |
| 259 | +Loading a single module is easy. |
| 260 | +The fun starts when a module has dependencies and those dependencies again have dependencies of their own. |
| 261 | +
|
| 262 | +When loading a precompiled file in C<Raku> we need to load the precompiled files of all its dependencies, too. |
| 263 | +And those dependencies B<must> be precompiled; we cannot load them from source files. |
| 264 | +Even worse, the precomp files of the dependencies B<must> be exactly the same files we used for precompiling our |
| 265 | +module in the first place. |
| 266 | +
|
| 267 | +To top it off, precompiled files work only with the exact C<Raku> binary that was used for compilation. |
| 268 | +
|
| 269 | +All of that would still be quite manageable if it weren't for an additional requirement: as a user you expect a new |
| 270 | +version of a module you just installed to be actually used, don't you? |
| 271 | +
|
| 272 | +In other words: if you upgrade a dependency of a precompiled module, we have to detect this and precompile the module |
| 273 | +again with the new dependency. |
| 274 | +
|
| 275 | +=head2 Precomp stores |
| 276 | +
|
| 277 | +Now remember that while we have a standard repository chain, the user may prepend additional repositories by way of |
| 278 | +C<-I> on the command line or C<use lib> in the code. |
| 279 | +
|
| 280 | +These repositories may contain the dependencies of precompiled modules. |
| 281 | +
|
| 282 | +Our first solution to this riddle was that each repository gets its own precomp store where precompiled files are stored. |
| 283 | +We only ever load precomp files from the precomp store of the very first repository in the chain because this is the |
| 284 | +only repository that has direct or at least indirect access to all the candidates. |
| 285 | +
|
| 286 | +If this repository is a FileSystem repository, we create a precomp store in a C<.precomp> directory. |
| 287 | +
|
| 288 | +While being the safe option, this has the consequence that whenever you use a new repository, we will start out |
| 289 | +without access to precompiled files. |
| 290 | +
|
| 291 | +Instead, we will precompile the modules used when they are first loaded. |
| 292 | +
|
| 293 | +=head2 Credit |
| 294 | +This tutorial is based on a C<niner> L<talk|http://niner.name/talks/A%20look%20behind%20the%20curtains%20-%20module%20loading%20in%20Perl%206/>. |
| 295 | +=end pod |