📎 Monorepo support #2228

Open
ematipico opened this issue Mar 28, 2024 · 15 comments
Labels
A-CLI Area: CLI A-Core Area: core Fund S-Enhancement Status: Improve an existing feature S-Feature Status: new feature to implement

Comments

@ematipico
Member

ematipico commented Mar 28, 2024

Description

This task is related to #1573 , but it has slightly different requirements and use cases, although we could potentially solve both with the same solution.

Background

Monorepos (package manager workspaces) are very common in the web ecosystem, and they come in different flavours with different expectations.

However, the common denominator is the following: a root configuration file, and each package in the monorepo extends the root configuration.
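
As a rough illustration of "each package extends the root configuration", here is a minimal sketch in which a package overrides only the fields it sets and inherits the rest from the root. The `Settings` struct, its fields, and the `extend` method are hypothetical and much simpler than a real biome.json; this is not how Biome actually merges configuration.

```rust
/// Hypothetical, heavily simplified settings (assumption: a real
/// configuration file has many more fields and nested sections).
#[derive(Debug, Clone, PartialEq, Default)]
struct Settings {
    indent_width: Option<u8>,
    line_width: Option<u16>,
    linter_enabled: Option<bool>,
}

impl Settings {
    /// Returns the effective settings for a package: any field the package
    /// sets wins, and every field it leaves unset falls back to the root.
    fn extend(&self, package: &Settings) -> Settings {
        Settings {
            indent_width: package.indent_width.or(self.indent_width),
            line_width: package.line_width.or(self.line_width),
            linter_enabled: package.linter_enabled.or(self.linter_enabled),
        }
    }
}
```

Calling `root.extend(&package)` with a package that only sets `line_width` would keep the root's `indent_width` and `linter_enabled` and replace the line width.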

Flavours:

  1. one command at the root of the monorepo, then the CLI/LSP does the job of working out which configuration file is closest to the file it is processing, and it applies the changes accordingly. This is very common for lint/format/testing tools.
  2. multiple commands, each command is defined inside a package of the monorepo. Then users usually use other tools to run these commands at once, e.g. pnpm run --filter, turborepo, etc. This is very common for building tools such as bundlers, compilers, and doc generation.

The Biome case

Biome is a particular case here because, even though it is a linter/formatter, in the future Biome will also transform/compile users' code, so it requires awareness of the manifest file and the dependency graph. So while it makes sense to run biome check at the root of the monorepo, what about a future biome compile command?

We will have to untangle this. I am also happy to force users to set up Biome in one way in their monorepo.

CLI vs LSP = Workspace

The solution should lie in the Workspace. The Workspace is what LSP and CLI both share, meaning that both of them hold an instance of it, and they use it to pull data when they need.

CLI

The CLI usually works from top to bottom: it scans and handles the files closest to the working directory first, and eventually handles the files farthest from it. This isn't always true, though, because for each directory AND file we always spawn a thread, which means that eventually all jobs go their own way.

We would need to change the strategy of our CLI here, so that it reads and resolves possible biome.json files in each folder.

This could potentially be solved with a new workspace configuration that would allow Biome to resolve the packages beforehand.

LSP

The LSP has a different problem to solve. Biome must apply the correct configuration for the opened file when jumping from one file to another.

Workspace

The reason why I think the solution lies in the Workspace, is because both CLI and LSP have to do a very similar job: when handling a file, we should apply the configuration that belongs to that file. The Biome Workspace would potentially store all those configurations, and then the CLI and LSP could:

  • signal the Workspace to use the correct configuration, e.g. Workspace::swap_config
  • or the Workspace could do that by checking the path of the file/folder and automatically resolving the configuration to use
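
To make the second option a bit more concrete, here is a minimal sketch of a workspace that stores one configuration per directory and resolves the right one from a file path. Everything in it is an assumption made for illustration: `Settings`, `register`, and `settings_for` are hypothetical names, not Biome's actual Workspace API.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a parsed `biome.json`.
#[derive(Debug, Clone, PartialEq)]
struct Settings {
    line_width: u16,
}

/// Holds every configuration found in the repository, keyed by the
/// directory that contains it.
#[derive(Default)]
struct Workspace {
    configs: HashMap<PathBuf, Settings>,
}

impl Workspace {
    /// Records the configuration that lives in `dir`.
    fn register(&mut self, dir: impl Into<PathBuf>, settings: Settings) {
        self.configs.insert(dir.into(), settings);
    }

    /// Resolves the settings for `file` by walking up its ancestors and
    /// returning the first directory that has a registered configuration.
    fn settings_for(&self, file: &Path) -> Option<&Settings> {
        file.ancestors()
            .skip(1) // the first ancestor is the file itself
            .find_map(|dir| self.configs.get(dir))
    }
}
```

With configurations registered for `/repo` and `/repo/packages/ui`, a file under `/repo/packages/ui/src` would resolve to the package configuration, while a file under `/repo/scripts` would fall back to the root one. Both the CLI and the LSP could call the same lookup, which is the point of putting it in the Workspace.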

@NyanHelsing

NyanHelsing commented Mar 30, 2024

wouldn't it be easier to run biome in each package and run it in parallel with something like a pnpm (or whatever you're using for workspaces) script?

That affords a lot of flexibility, considering that in a monorepo you may not have all packages linted with the same config or even the same tools. E.g. if you have an older package that is still using eslint+prettier, that package can have an npm script format that does it with those tools, while pnpm format can run biome in another package; the package.json at the workspace root can then do https://pnpm.io/cli/run#--recursive--r

To double down on this: unix philosophy says do one thing and do it well. Biome is amazing at fixing style issues and smells in your code. It's not a workspaces tool; there's already a bunch of tools that are specialized in that, e.g. lerna, pnpm workspaces, yarn workspaces, rush... biome should work with these tools, not against them.

@arendjr
Contributor

arendjr commented Apr 2, 2024

@NyanHelsing I think you make a convincing argument, but there’s a few practical issues we keep running into with the current approach (which aligns with your suggestion):

  • Users expect to open a monorepo in their editor and have it “just work” with nested packages that may or may not have customized configs. If we don’t add monorepo support to Biome we would put the burden on our extension developers to fix this issue.
  • Competing tools such as ESLint have created an environment where users expect nested configuration files to just work. This is not limited to monorepo setups, but it’s commonly used there. (Interestingly, ESLint seems to want to migrate away from this with their new flat configs, but IMO it’s a gamble to see how users will receive this change).
  • Not a current concern, but an anticipated one: When we implement type inference or bundling, we may need to pull information from other packages. This typically includes third-party (NPM) packages, but in a monorepo may also include first-party packages stored within the same repositories. When we need to do this, we need to have an understanding of all the packages within the repository and their (inter)dependencies. In other words, we’ll need a holistic understanding of the monorepo anyway.
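
To illustrate what an understanding of the packages and their (inter)dependencies could involve, here is a minimal sketch that orders first-party packages so that each one comes after the packages it depends on. The `dependency_order` function and the shape of its input are assumptions for illustration; nothing here reflects an existing Biome implementation.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Orders packages so that every package comes after the first-party
/// packages it depends on. Returns `None` if the graph has a cycle.
/// `deps` maps a package name to the names of its dependencies; names that
/// are not keys of `deps` (e.g. third-party packages) are ignored.
fn dependency_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<String>> {
    // For each package, keep only the dependencies that live in the repository.
    let mut remaining: BTreeMap<&'a str, BTreeSet<&'a str>> = deps
        .iter()
        .map(|(name, ds)| {
            let first_party: BTreeSet<&'a str> = ds
                .iter()
                .copied()
                .filter(|d| deps.contains_key(*d))
                .collect();
            (*name, first_party)
        })
        .collect();

    let mut order = Vec::new();
    while !remaining.is_empty() {
        // Packages whose in-repo dependencies have all been emitted already.
        let ready: Vec<&'a str> = remaining
            .iter()
            .filter(|(_, ds)| ds.is_empty())
            .map(|(name, _)| *name)
            .collect();
        if ready.is_empty() {
            return None; // every remaining package waits on another one: a cycle
        }
        for name in &ready {
            remaining.remove(name);
            order.push((*name).to_string());
        }
        for ds in remaining.values_mut() {
            for name in &ready {
                ds.remove(name);
            }
        }
    }
    Some(order)
}
```

For a repository where `app` depends on `ui` and `utils`, and `ui` depends on `utils`, this would place `utils` first and `app` last. A per-package invocation has no way to compute such an order on its own, which is the practical reason for wanting a repository-wide view.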

So while I agree with your argument in principle, I’m afraid there’s too many practical downsides for Biome to keep it limited to a per-package scope.

@arendjr
Contributor

arendjr commented Apr 2, 2024

@ematipico When you mention the workspace::swap_config command, what would that look like? I’m afraid such a command would be a source of race conditions if multiple other threads or even processes are concurrently executing commands against the Workspace. But maybe there’s a part of it I’m overlooking.
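
To spell out the concern, here is a minimal sketch that replays, in a single thread, one interleaving of two callers that share a workspace with a swappable active configuration. The `Workspace` type, `swap_config`, `line_width`, and `line_width_for` are hypothetical and only model the idea; the sketch assumes that "swap" means changing one shared, mutable selection.

```rust
use std::collections::HashMap;

/// Hypothetical workspace with a single "active" configuration that callers
/// switch with `swap_config` before issuing a request.
struct Workspace {
    configs: HashMap<&'static str, u16>, // package name -> line width
    active: &'static str,
}

impl Workspace {
    fn swap_config(&mut self, package: &'static str) {
        self.active = package;
    }

    /// Uses whichever configuration happens to be active right now.
    fn line_width(&self) -> Option<u16> {
        self.configs.get(self.active).copied()
    }

    /// Alternative: the caller names the package in the request itself,
    /// so the answer does not depend on shared mutable state.
    fn line_width_for(&self, package: &str) -> Option<u16> {
        self.configs.get(package).copied()
    }
}

/// Replays one possible interleaving of two callers: A works on `ui`,
/// B works on `api`. Returns what A and B each observe.
fn interleaved_swap(ws: &mut Workspace) -> (Option<u16>, Option<u16>) {
    ws.swap_config("ui"); // A selects its configuration
    ws.swap_config("api"); // B runs before A issues its request
    let seen_by_a = ws.line_width(); // A now reads B's configuration
    let seen_by_b = ws.line_width();
    (seen_by_a, seen_by_b)
}
```

In this replay, caller A ends up with the `api` line width even though it asked for `ui`. Passing the package (or file path) with each request, as `line_width_for` does, avoids the problem because no caller depends on what another caller selected last.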

@ematipico
Member Author

@NyanHelsing

I also share your vision, and I wouldn't have created this issue if it wasn't for @arendjr's points in #2228 (comment). The web ecosystem has matured in the last few years, although we have yet to make a standard on managing these big monorepos, which means that users have different flavours.

For example, some projects out there would expect a top-level CLI command called test, which would run all the tests.

I would suggest the same as you did: tell the users to use a package manager to create multiple scripts, one in each monorepo. I just wish there was a better way to do this, maybe by pushing for a proper use of workspaces.


@arendjr

Yeah, workspace::swap_config, in a CLI environment, might not be the best. I haven't designed a solution yet; mine was more of a suggestion/idea.

@NyanHelsing

> @NyanHelsing I think you make a convincing argument, but there’s a few practical issues we keep running into with the current approach (which aligns with your suggestion):
>
> * Users expect to open a monorepo in their editor and have it “just work” with nested packages that may or may not have customized configs. If we don’t add monorepo support to Biome we would put the burden on our extension developers to fix this issue.

i'm confident this can be ameliorated with a top-level section in the documentation (demonstrated here with pnpm):

Biome and Workspaces

Biome works great with workspaces; install it in each of the packages:

pnpm -r add @biomejs/biome

Biome can now be run in each package.

pnpm -r exec biome

> * Competing tools such as ESLint have created an environment where users expect nested configuration files to just work. This is not limited to monorepo setups, but it’s commonly used there. (Interestingly, ESLint seems to want to migrate away from this with their new flat configs, but IMO it’s a gamble to see how users will receive this change).

if this is the desire, it isn't obvious that it should be dependent on workspaces; this is just about putting multiple biome.json files in any folder structure and expecting them to work?

> * Not a current concern, but an anticipated one: When we implement type inference or bundling, we may need to pull information from other packages. This typically includes third-party (NPM) packages, but in a monorepo may also include first-party packages stored within the same repositories. When we need to do this, we need to have an understanding of all the packages within the repository and their (inter)dependencies. In other words, we’ll need a holistic understanding of the monorepo anyway.

it still sounds like this is saying it isn't needed for linting or formatting.

In the lint/format space, biome fills a decidedly lint/formatting-shaped hole. There's jslint/jshint, which nobody should use because they're old and slow and don't work on modern code, and there's prettier/eslint, which are more modern but still slow. Biome is needed here.

Since there are already lots of tools that do bundling, we'd expect lots of users to continue to use one of the many bundlers (rollup, rspack, webpack, swcpack, turbopack, even grunt) that are all very good (except maybe grunt, which is old, and webpack, which is slow). A biome bundler (AFAIK) doesn't exist yet; assuming it did exist and filled some space that isn't already occupied by one of the previous bundlers, I could imagine users being a little upset at the prospect of having many bundlers installed in their project that might not even be used.

It's strongly encouraged to make a bundler a part of a dedicated and separate install (or better, provide tight integration with the existing bundlers) rather than bloating the tool that creates a hygienic environment for us.

> So while I agree with your argument in principle, I’m afraid there’s too many practical downsides for Biome to keep it limited to a per-package scope.

@arendjr
Contributor

arendjr commented Apr 2, 2024

> i'm confident this can be ameliorated with a top-level section in the documentation (demonstrated here with pnpm):

I'm sorry, but the solution you offered has nothing to do with the problem I highlighted. The problem is that when users open a repository in their IDE, the extension currently only supports using the top-level biome.json for all files in that repository. If the repository is a monorepo, it won't discover the nested biome.json files and thus applies the wrong configuration to files.

> It's strongly encouraged to make a bundler a part of a dedicated and separate install (or better, provide tight integration with the existing bundlers) rather than bloating the tool that creates a hygienic environment for us.

Please look again at the tagline for Biome: One toolchain for your web project

It is the project's explicit goal to create a single unified tool that can cover several needs.

@NyanHelsing

>> i'm confident this can be ameliorated with a top-level section in the documentation (demonstrated here with pnpm):
>
> I'm sorry, but the solution you offered has nothing to do with the problem I highlighted. The problem is that when users open a repository in their IDE, the extension currently only supports using the top-level biome.json for all files in that repository. If the repository is a monorepo, it won't discover the nested biome.json files and thus applies the wrong configuration to files.

This sounds like a problem with the IDE plugins then? Not with biome per se; the plugin just needs to make the command configurable, and then that should be documented at the top level so it's easily seen by folks.

>> It's strongly encouraged to make a bundler a part of a dedicated and separate install (or better, provide tight integration with the existing bundlers) rather than bloating the tool that creates a hygienic environment for us.
>
> Please look again at the tagline for Biome: One toolchain for your web project
>
> It is the project's explicit goal to create a single unified tool that can cover several needs.

A tool chain contains multiple tools.

@arendjr
Contributor

arendjr commented Apr 2, 2024

> This sounds like a problem with the IDE plugins then? Not with biome per se; the plugin just needs to make the command configurable, and then that should be documented at the top level so it's easily seen by folks.

It's not a matter of documentation or configuring the extension. Extensions would need the ability to switch between configurations on the fly, depending on which file is currently opened. Please see this comment for why we don't want to implement such logic purely within the extension: #1573 (comment)

> A tool chain contains multiple tools.

Sorry, but I don't see this discussion heading in a productive direction. Feel free to disagree, but Biome has been clear in its approach: It provides multiple tools within a single command/binary/toolchain.

@anthony-hayes

anthony-hayes commented Apr 17, 2024

"in the future Biome will also transform/compile users' code, so it requires awareness of the manifest file and the dependency graph"

Are there any details on these compilation plans? For context, right now I'm using Biome in a monorepo, entirely for its prettier-compliant formatter.

@arendjr
Contributor

arendjr commented Apr 18, 2024

@anthony-hayes It’s briefly mentioned in the roadmap: https://biomejs.dev/blog/roadmap-2024/#transformations

There’s also already a biome_js_transformation crate and I think the TS => JS transform is already implemented. Additionally, we’re working on implementing GritQL plugins which will eventually allow user-defined transformations as well. So bits and pieces are being worked on, but I don’t think the compiler functionality itself is a concrete focus right now.

@Faithfinder

I wouldn't strictly call "Central file" common denominator.

There's a monorepo tool called Rush, and it basically does things in the opposite way to more popular tools. And while it has fewer downloads than turbo or nx, it is used for large-scale monorepos at companies like TikTok, Microsoft, HBO.

Rush's way is basically to isolate packages from each other for portability. There are no root config files, each dependency is explicitly listed in each package.json, and common configs are distributed as packages themselves.

@omairvaiyani

@Faithfinder Maybe I'm missing the crux here. Whilst Rush doesn't have a central package.json file or equivalent, it does have monorepo-level configuration that can alter the behaviour of commands run within individual projects.

@Faithfinder

Faithfinder commented May 2, 2024

> @Faithfinder Maybe I'm missing the crux here. Whilst Rush doesn't have a central package.json file or equivalent, it does have monorepo-level configuration that can alter the behaviour of commands run within individual projects.

Well, my main point was that a lot of tools don't work on Rush because they expect a root level package.json or a lock file. Other tools are using node_modules as a build target by default (prisma, panda CSS). Either approach doesn't work well in Rush's case.

And I do believe Rush gets many things right.

Just trying to put other approaches on Biome's radar before they make a design decision that's hard to back out of

@ematipico
Member Author

@Faithfinder thank you for providing a different example. My assumption was based on my working experience and the projects that I've seen around. I know Rush, and I wanted to use it; however, when I was evaluating the project, I understood that it still needed a package manager under the hood. Of course, I might be wrong.

@Faithfinder

Oh, it uses a package manager under the hood, but it wraps it and imposes additional restrictions on top. At this point it would be easier for you to try it. Their own repo is a moderately sized Rush repo and works well as a study: https://github.com/microsoft/rushstack


6 participants