Skip to content
This repository has been archived by the owner on Aug 3, 2023. It is now read-only.

Support arbitrary javascript bundling tools #1568

Merged
merged 22 commits into from Dec 11, 2020
Merged

Support arbitrary javascript bundling tools #1568

merged 22 commits into from Dec 11, 2020

Conversation

caass
Copy link
Contributor

@caass caass commented Sep 23, 2020

Inspired by (and will hopefully one day close) #1558

Motivation

After searching for why I wasn't able to target ESNext with typescript and webpack, I found out it was because I was using optional chaining. Even though (if I understand correctly) the workers runtime uses V8, which supports optional chaining, the included version of webpack doesn't. I found someone had already brought this up in #1558, and since I religiously read the cloudflare blog posts and have been messing around with rust for the past few months, I thought I might be able to contribute this feature, since Workers is my go-to platform.

The idea behind this PR is that wrangler should be able to transparently sit on top of any arbitrary javascript bundling system, such as rollup or parcel or even swc. While integration with webpack has been awesome and has enabled me to use all sorts of neat things like typescript and whatnot, ultimately it does feel like kind of an arbitrary choice -- and hardcoding in a bunch of webpack-4 specific stuff feels...unfriendly to maintenance?

What I mean to say is that the cracks are starting to show (e.g. with nullish coalescing and optional chaining), and I think now would be a great time to implement this functionality to allow wrangler to decouple itself from any specific javascript bundling tool as Workers continues to become a true competitor to offerings like Lambda and GCF.

I would love to hear feedback on whether or not this is something I should be working on (i.e. something similar is in the works internally), as well as any design decisions to take into consideration.

Problem

Currently, wrangler is coupled with webpack in only a couple ways:

If I'm understanding correctly, wranglerjs basically works by:

  1. Running some sanity checks to make sure...
    a. ...the target is webworker (here)
    b. ...the output filename is "worker.js" (here)
  2. Overwriting the webassembly functionality to...
    a. ...remove the dependency on a filesystem, which workers doesn't support (here in wranglerjs vs here in webpack)
    b. ...disable import mangling and streaming (here)
    • Note: This is probably the most intensively coupled-to-webpack part of this whole process.
  3. Actually executing the config (here)

Because of the way that wranglerjs kind of "wraps around" webpack, it's impossible to use other tools (such as webpack 5).

Solution

Before I start getting into rampant speculation, I would once again like to interject -- y'all are the experts on WASM/JS/etc., and I am just a dev who wants to use optional chaining in her workers. Please, please tell me where I have missed/misunderstood stuff. This whole PR idea might be completely impossible.

My proposed solution to this problem would be to change the functionality of wranglerjs so that it runs after the javascript bundling process, and makes the necessary modifications (sanity checks, wasm imports) then.

Sanity checks

Checking for the "webworker" target is basically just trying to make sure we (the lowly devs) aren't using any APIs not available in the workers runtime. We could use a tool like Esprisma to check to make sure that the generated script doesn't use any APIs other than the ones available in the Workers runtime (e.g. Window).

Output filenames is trivial, wrangler can just rename and warn.

A new check needs to be introduced, which ensures that there's just a single JS file (and an optional single WASM file) since we no longer have control over the build process, which may produce multiple output files depending on its chunk-splitting strategy. I think this should just result in an outright build failure, since the alternative is to bundle the output ourselves, which kind of defeats the whole point of this PR.

Webassembly functionality

I'm actually pretty lost here. I have never successfully used WebAssembly with Workers (which is a me problem, not a Workers problem) so I'm not sure exactly how it's handled. I suspect that the .wasm files themselves are "dumb", meaning they don't need to be modified, and that all of the importing/streaming/mangling stuff occurs on the JS side.

If (fingers crossed) this is the case, this too could be accomplished by modifying the bundled code and doing some AST trickery to change any usages of streaming or filesystem stuff to use whatever logic Workers requires instead.

That's all! I would love to hear any and all thoughts regarding my proposal. Thank you for your time ❤️

literally does nothing except break things :blushing:
@caass caass requested a review from a team as a code owner September 23, 2020 04:03
@caass caass marked this pull request as draft September 23, 2020 04:04
Not sure *exactly* what this does but i basically just copied the
parts i thought were relevant from the site code. We have a src dir,
which is what needs to be watched, and an output dir, which is what
needs to be analyzed, and a build command, which is what needs to be
run when we wanna run the build.
@ispivey
Copy link
Contributor

ispivey commented Sep 24, 2020

@caass this is awesome – thanks so much! None of us have had time for the detailed response that your comments merit, but your analysis is great. We'll try to get back to you tomorrow – but I didn't want to leave you hanging and thinking this was unappreciated 🎉

I think...at least...that that's what this commit does. Def starting
to get into "oh god i hope i don't break anything" territory.
Also, use the nice AsRef<Path> idiom that works so well
@xtuc
Copy link
Member

xtuc commented Sep 28, 2020

The use of a webpack-specific CLI arg to redirect build output

https://github.com/cloudflare/wrangler/blob/ce41671a36db1f81275685362e9fd8f2f92b6f53/wranglerjs/index.js#L153 we use an intermediate representation. this isn't specific to webpack, but would currently require an indirection to support other bundler.

If I'm understanding correctly, wranglerjs basically works by:

sounds correct.

Solution

as you mentioned, wranglerjs should be a step after bundling that lints/transforms the output of any bundlers.

webworker

this target causes many issues for workers since it doesn't map entirely to worker API and many nodejs libraries won't be supported. I believe you described a better solution, using babel we can transform the code to polyfill missing API.

Webassembly functionality

only functionally, provided by wranglerjs, is fixing the webpack internal runtime to support loading wasm module in a worker and wrangler must generate the correct metadata to bind the wasm module to a worker.

@caass
Copy link
Contributor Author

caass commented Sep 28, 2020

@xtuc thanks so much for your thoughts!

The use of a webpack-specific CLI arg to redirect build output

https://github.com/cloudflare/wrangler/blob/ce41671a36db1f81275685362e9fd8f2f92b6f53/wranglerjs/index.js#L153
we use an intermediate representation. this isn't specific to webpack, but would currently require an indirection to support other bundler.

I was actually looking into this for a different reason -- one of the drawbacks of the method of running wrangler after the build process is that there is another round of disk reads/writes that makes the whole process slower. This line clears things up for me and introduces another problem that is worth looking at.

Currently, the process looks like (and again, please correct me if I'm wrong)

  1. Wrangler runs webpack and stores the bundled JS in memory
  2. Wrangler does its voodoo to package in the wasm
  3. The output is stored to disk

The naive approach I've been looking at so far for this process would be something like

  1. JS/WASM distribution bundle is written to disk
  2. Wrangler reads the bundle from disk into memory
  3. Wrangler applies whatever changes are necessary
  4. Wrangler writes the modified files back to disk.

I hadn't thought about intercepting filesystem writes from the bundler -- I was looking into more like, "how can I intercept arbitrary writes to the filesystem and redirect them into memory," to which the answer seemed to be "you can't."

The first solution I thought of to avoid the extra overhead would be to require that the bundler output to stdout and have Wrangler tap in from there (via basically an elaborate pipe -- like webpack --stdout | wrangler build), but it doesn't look like webpack has support for that from my one minute of googling.

I wonder if there's some way to create a monkey-patched version of fs and have webpack or whatever use it. That kinda seems like a rabbit hole though, and not one I want to go down right now 😅

this target causes many issues for workers since it doesn't map entirely to worker API and many nodejs libraries won't be supported. I believe you described a better solution, using babel we can transform the code to polyfill missing API.

I hadn't even thought of polyfilling missing APIs -- I was leaning towards using swc_ecma_parser to run lints from Rust to 1) leverage the performance benefits and 2) keep more of the logic in one place (Rust), instead of passing back and forth too much, but I think if the end goal is to automatically polyfill missing APIs then it's a no-brainer to leverage babel instead.

Webassembly functionality

only functionally, provided by wranglerjs, is fixing the webpack internal runtime to support loading wasm module in a worker and wrangler must generate the correct metadata to bind the wasm module to a worker.

While I've tended to stay away from WASM since it's never really been necessary for my workers use-cases, it's definitely a must-have for this PR to land. With that in mind (plus the LOC you linked earlier), I think the last missing piece of knowledge for me to build out an MVP is an understanding of exactly how worker.js interacts with WebAssembly. I'll do some more digging and post here if I encounter any roadblocks.

Thanks again for your help ❤️

@xtuc
Copy link
Member

xtuc commented Sep 29, 2020

one of the drawbacks of the method of running wrangler after the build process is that there is another round of disk reads/writes that makes the whole process slower

It does already and i don't think slowness is a concern, as long as it's not considerably slower.

Wrangler runs webpack and stores the bundled JS in memory

Stores on disk. see ./dist (webpack's output) and the --output-file option.

I hadn't thought about intercepting filesystem writes from the bundler

webpack can do in-memory only but all bundlers won't be able to do that.


I wanted to do it earlier but i think we need some specification of the work that needs to be done, before changing the code.

I think it makes sense to actually create a `Bundler` struct because
like...eventually in my head it makes sense to have one `Bundler`
instance that you could like pass around in `wrangler dev` or
whatever.

I think most of the codebase is more functional-y but i'm sure someone
will tell me if this is not the preferred style to do this in.

How many more commits can I make before I have to actually parse any
javascript? Let's find out, shall we?
This restructuring is because I think it makes more sense to
have each FileType run its own checks. This also cleanly bridges
between the functions in `js.rs` and `wasm.rs` and the BundlerOutput
struct in `mod.rs`.
Looks like we will have to wait until the next commit to parse any AST
I keep making tons of changes without committing but I think that it
seems bigger than it actually is -- mostly what I've done here is
define a few traits

Lintable: a struct is able to lint itself with a given argument to check against
Parseable: a struct is able to create Self from a given input
Validate: a struct satisfies both Lintable and Parseable

The JS boilerplate is mostly done -- since we actually want to lint expressions,
and not statements, the bulk of the code at the moment is just about
getting to the expressions in each statement. Essentially a glorifed
switch case with a little recursion mixed in.

The webassembly is all todo!(), i just wanted to get typechecking working.

Basically, this is just yet another commit before we actually parse any
AST. But I did write some documentation, which is cool.
I guess...now I just have to...implement Lintable for all these
different types of expressions...yippee...
@nataliescottdavidson
Copy link
Contributor

Hey @caass, this is super cool. It looks like the current build is failing- if you can revert to a stable commit on this branch, I'd be happy to check it out

@caass
Copy link
Contributor Author

caass commented Oct 5, 2020

@nataliescottdavidson sure! i'm pretty sure the build is failing because the linter is catching "unused variable" stuff for unimplemented methods -- i had started prefixing my variables with underscores to pass lint checks but then i wound up forgetting what still needed to be done 😬

Hopefully tonight i'll have the javascript linter closer to completion, which has been the bulk of the effort on this PR, although if you'd like I can make a quick commit to pass CI before you poke around :)

@caass
Copy link
Contributor Author

caass commented Oct 10, 2020

Looking through the code for the durable objects beta chat example, I think I'd like to restructure this to accommodate the new Workers syntax:

// This is a new way of writing Workers that we expect to introduce more broadly in the future. We
// like this syntax because it is *composable*: You can take two workers written this way and
// merge them into one worker, by importing the two Workers' exported handlers yourself, and then
// exporting a new handler that call into the other Workers as appropriate.
//
// This new syntax is required when using Durable Objects, because your Durable Objects are
// implemented by classes, and those classes need to be exported. The new syntax can be used for
// writing regular Workers (without Durable Objects) too, but for now, you must be in the Durable
// Objects beta to be able to use the new syntax, while we work out the quirks.
//
// To see the API for uploading module-based Workers, check out the publish.sh script.

I've been busy recently trying to stabilize IP in the Rust standard library, but once that wraps up I'll have more time to dedicate to this. It seems to me like having composable workers goes hand-in-hand with "arbitrary bundling tools".

But we're going for the restructure. again. round four. this time,
with macros. because i'm sick of defining Lintable for all of these
structs
Everything compiles now, which is nice. We just need to actually
implement Lintable for JavaScript and we should have version one ready
to go.
@caass
Copy link
Contributor Author

caass commented Oct 29, 2020

Hello again everyone! I've done some more work and I feel like we're nearing completion -- or at least, nearing "ready for testing".

I've been doing most of the development here, basically breaking out the code effected by this PR and just running that locally instead of needing to compile and run the entire wrangler tool. It's not linked via git in any way, just me copying and pasting code back and forth, so things might be slightly out of sync. It mostly just makes things easier for me to debug by having a main function 😅

If anyone on the cloudflare side is interested in poking around, here's the basic idea:

  1. Define three traits: Parseable, which means a struct can be parsed from a given input, Lintable, which means a struct can lint itself, and Validate, which combines the parsing and linting steps.
  2. Define JavaScript, WebAssembly, and BundlerOutput, which are in-memory representations of a JS file, a WASM file, and the output of a bundler tool, respectively.
  3. Implement Validate (which requires both Lintable and Parseable) on JavaScript, WebAssembly, and BundlerOutput.
  4. Call BundlerOutput::validate(&output_dir) on the output directory as specified in wrangler.toml.

There's still work to be done, and some things I'm not sure of, and the whole thing is basically untested. That being said, progress is being made! I would love some feedback on this if anyone has some spare time ❤️

Copy link
Contributor

@Electroid Electroid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great work @caass, really excited where this is going! I left some high-level feedback and questions. If you're not already in the beta to test this, you can send an email with your account id to, ashcon at cloudflare, and I'll get you access.

src/build/check/mod.rs Outdated Show resolved Hide resolved
src/settings/toml/bundle.rs Outdated Show resolved Hide resolved
src/build/check/config.rs Outdated Show resolved Hide resolved
match n {
ModuleDecl::Import(import) => {
if import.type_only {
self.error(import, "Typescript not allowed!")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a path, with the current approach, to supporting Typescript in the future? (That would be awesome)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! swc has full Typescript support, and has actually recently landed in Deno if that's a confidence booster on the library's reliability.

src/build/check/js/mod.rs Show resolved Hide resolved
src/build/check/js/parse.rs Outdated Show resolved Hide resolved
src/build/check/mod.rs Outdated Show resolved Hide resolved
src/build/check/mod.rs Show resolved Hide resolved
/// have access to the durable objects beta
/// but it seems as though the format is basically:
/// * one `.mjs` file that serves as the entrypoint to the worker,
/// * any number other arbitrary files that can be imported into the worker.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true, but there is a subtle nuance: what is the content-type of each file?

You might want to import an HTML file as an ArrayBuffer, thus using application/octet-stream in the form data upload; or text/plain which would import it as a String.

From what I gather, there are 3 potential options:

  1. Always upload non-JS and non-WASM files as application/octet-stream and worry about this nuance later
  2. Have users define the content-type of their imports somewhere in the wrangler.toml
  3. Adopt the tc39 proposal for import-assertions, which swc now supports

Moreover, if we decide to adopt the proposal, there are 2 additional considerations:

  1. Since V8 does not support it, each assertion would need to be "removed" prior to upload?
  2. What will each assertion type be called? "arraybuffer"? "text"?

Coincidentally, our very own @xtuc is championing this proposal, so I think he would have some great ideas.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh hmm interesting!

I was able to find the tracking issue for V8, and it seems like they're working on it at the moment. I don't know what the typical timeline is with V8 from "started working" to "shipped", but I'd assume it's...a couple months? Is that overly optimistic? I'd love to hear some more input and hopefully by the time this PR lands it'll become a non-issue.

  1. Always upload non-JS and non-WASM files as application/octet-stream and worry about this nuance later

I choose the option that involves the least work!

  1. Have users define the content-type of their imports somewhere in the wrangler.toml

Speaking from personal experience, the less research and configuring I have to do, the better. Part of the appeal of wrangler (and Workers) is that it "just works", so I'm kind of opposed to this idea but not strongly. Just kind of wary of adding more configuration that for most folks will likely be unnecessary.

  1. Adopt the tc39 proposal for import-assertions, which swc now supports

I'm hesitant to start parsing things that don't map directly to V8 because then we might start to get into more opinionated parsing, like you said, which could lead to unexpected behavior somewhere down the line.

In my opinion the optimal approach for getting this done before V8 adopts import-assertions would be to traverse the AST and see what kinds of methods are called on each import, and whether those are ArrayBuffer type things or String type things, but that seems...overkill?

I think probably the easiest thing to do is to use application/octet-stream and document that that's what we've selected, and that if you want to manipulate your import as a String you should do some conversion first. There's a million StackOverflow questions about that exact thing so I figure that won't be too much of a hassle.

- Check total bundle size, not individual files
- Replace hyphens with underscores in wrangler.toml
- Rename MAX_FILE_SIZE to MAX_BUNDLE_SIZE
- Duplicate comment about swc_common::SourceMap vs sourcemap::SourceMap
- rename entry & entry_file to module & module_path
@xortive
Copy link
Contributor

xortive commented Dec 11, 2020

Hey @caass, thank you for your work on this PR, and our apologies for not responding to you sooner! We're really excited about this PR, and we're going to use it as the base for a feature branch that adds support for durable objects. However, we've decided to try and keep as much validation for worker bundles in the API as possible, as to not duplicate logic between wrangler and the API. The validator work here is very neat though, and if you find it useful I'd suggest looking into packaging it as a standalone linter binary, where it can run as part of your custom build and return a failed status to wrangler if validation fails.

@xortive xortive changed the base branch from master to malonso/modules-do December 11, 2020 18:55
@xortive xortive changed the title [WIP] Support arbitrary javascript bundling tools Support arbitrary javascript bundling tools Dec 11, 2020
@xortive xortive marked this pull request as ready for review December 11, 2020 18:56
@xortive xortive merged commit 1c3b588 into cloudflare:malonso/modules-do Dec 11, 2020
@caass
Copy link
Contributor Author

caass commented Dec 12, 2020

Ok! Glad to hear that the code from this PR will go to good use :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants