
Improve monorepo developer experience #2480

Open
mattste opened this issue Dec 6, 2022 · 0 comments
I've been experiencing some pain points with our monorepo and its infrastructure tooling. While this is all fresh on my mind, I wanted to create an issue detailing approaches taken and their pros/cons. The hope is to create a space for Earthly users to discuss monorepo pain points, solutions and what potential solutions might exist.

This excellent post from @gf3 outlines what a realistic monorepo set-up may look like. This key point was mentioned:

> > Hi @ahsf - see also this simple example: https://github.com/earthly/earthly/tree/main/examples/monorepo
>
> unfortunately this example is a little too simplistic and doesn't accurately represent the structure and needs of most monorepos. most of the monorepos i've worked with have shared these three characteristics:
>
> 1. multiple applications or binaries
> 2. multiple shared libraries used by the above applications
> 3. shared dependencies across both the applications and libraries

Approaches

There are a few possible approaches one can take with a monorepo. I'll describe and outline the pros/cons of each approach. For the examples, I'll be using an Elixir/Node monorepo with the following structure:

repo
├── apps
│  ├── app1      # elixir
│  ├── app2      # node
│  └── app3      # node
├── packages
│  ├── package1
│  ├── package2
│  └── package3
├── package.json
└── yarn.lock

Approach 1: Earthly top-level

Repo Structure

# Earthfile

all:
  ARG DEPLOY_ENABLED
  WAIT
    BUILD +test
  END
  WAIT
    IF [ "$DEPLOY_ENABLED" = "true" ]
      BUILD +deploy
    END
  END

deploy:
  # These deploys should only happen if we actually made changes to their respective targets
  BUILD ./apps/app1+deploy
  BUILD ./apps/app2+deploy
  BUILD ./apps/app3+deploy

test:
  BUILD ./apps/app1+test
  BUILD ./apps/app2+test
  BUILD ./apps/app3+test

# apps/app1/Earthfile

test:
  BUILD +test-self

test-self:
  FROM earthly/dind:alpine
  ... # Full stack integration tests

# apps/app2/Earthfile

test:
  BUILD +test-self
  BUILD ../../packages/package1+test
  BUILD ../../packages/package2+test

test-self:
  FROM earthly/dind:alpine
  ... # Full stack integration tests

# apps/app3/Earthfile

test:
  BUILD +test-self
  BUILD ../../packages/package1+test
  BUILD ../../packages/package3+test

test-self:
  FROM earthly/dind:alpine
  ... # Full stack integration tests

# .github/workflows/ci.yml
name: CI
on:
  - push
jobs:
  all:
    name: +all
    env:
      DEPLOY_ENABLED: ${{ github.ref == format('refs/heads/{0}', github.event.repository.default_branch) }}
    runs-on: ubuntu-latest
    steps:
      # set-up steps here
      - name: Run test and deploy
        run: |
          earthly \
            --push -P \
            +all --DEPLOY_ENABLED=$DEPLOY_ENABLED

Analysis

This set-up allows Earthly to construct a DAG and, in theory, do the minimal work required.

It's also amazing that within each Earthly build target, you can install and run whatever tooling your heart desires (more on this later).

Open questions:

How do we only deploy the affected targets?

Currently, Earthly has no way to skip the deploy for app3 when it detects changes only in app1 or its child packages.

NX has the --affected flag.
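Outside of NX, a rough equivalent can be sketched in shell by comparing changed paths against each project's directory (the helper name and paths here are illustrative; a real version would also have to map shared package dependencies, which is exactly what NX's project graph does for you):

```shell
#!/bin/sh
# Hypothetical helper: decide whether a project is affected by a set of
# changed files. In CI, changed_files would come from something like
# `git diff --name-only "$BASE_SHA"...HEAD`.
affected() {
  prefix="$1"    # project directory, e.g. apps/app1
  changed="$2"   # newline-separated changed file paths
  echo "$changed" | grep -q "^$prefix/"
}

# Illustrative change set: only app1 and package1 were touched
changed_files=$(printf 'apps/app1/lib/foo.ex\npackages/package1/index.js\n')

if affected "apps/app1" "$changed_files"; then
  echo "deploy app1"
fi
if affected "apps/app3" "$changed_files"; then
  echo "deploy app3"
fi
# prints only "deploy app1"
```

The hard part Earthly would need to solve is not this path check but the dependency mapping: app3 must also count as affected when package1 or package3 changes.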

How do we spread the compute load?

Right now this would all be running on a single node. For our monorepo, local machines grind to a halt when both Node and Elixir builds are running at the same time. This does not scale well as more projects are added.

Perhaps we could annotate each BUILD command with which satellite should be used. Or it might be better to have one giant shared cache, with only the compute spread around.

How do we make this more ergonomic?

As build args are added for targets, everything must be passed along. The --pass-args flag proposal could definitely go a long way here.
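For illustration, today every level has to redeclare and forward each arg by hand (the arg name here is hypothetical):

```
# Without --pass-args, each target redeclares and forwards every arg:
deploy:
  ARG ENV_NAME
  BUILD ./apps/app1+deploy --ENV_NAME=$ENV_NAME
  BUILD ./apps/app2+deploy --ENV_NAME=$ENV_NAME
  BUILD ./apps/app3+deploy --ENV_NAME=$ENV_NAME
```

With a --pass-args mechanism, the explicit `--ENV_NAME=$ENV_NAME` forwarding on every BUILD line could go away.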

Is Earthly's caching good enough?

This FAQ of Bazel vs Earthly makes the point that Earthly does not do file-level compilation caching. The new CACHE command fixes these issues in my experience. In Elixir, you can cache the _build directory, which holds compiled Elixir files.
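A minimal sketch of what this could look like for Elixir (the base image tag, paths, and mix invocations are illustrative):

```
test:
  FROM elixir:1.14-alpine
  WORKDIR /app
  COPY mix.exs mix.lock ./
  RUN mix deps.get
  COPY . .
  # Persist compiled artifacts between runs so only changed modules recompile
  CACHE _build
  RUN mix test
```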

Can we integrate better with existing package manager tooling?

Both app2 and app3 would declare in their package.json that they depend on their JavaScript packages. Tooling such as NX automatically detects those dependencies.

Perhaps we add a project.json file that Earthly can use to detect and declaratively configure child dependencies. NX does an excellent job with this configuration in my experience. How would this work with Earthly's imperative nature?
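As a rough illustration of the kind of detection NX does, here is a shell sketch that derives an app's internal dependencies from its package.json and emits the matching BUILD lines (the file contents, the `package<N>` naming convention, and the crude grep-based parsing are all assumptions):

```shell
#!/bin/sh
# Illustrative package.json for app2, written here so the sketch is
# self-contained
cat > /tmp/app2-package.json <<'EOF'
{
  "name": "app2",
  "dependencies": {
    "package1": "workspace:*",
    "package2": "workspace:*",
    "react": "^18.0.0"
  }
}
EOF

# List the dependencies that come from our internal packages/ directory
# (assumes internal packages are all named package<N>)
internal_deps() {
  grep -o '"package[0-9]*"' "$1" | tr -d '"' | sort -u
}

# Emit the Earthfile BUILD lines a generator could produce
for dep in $(internal_deps /tmp/app2-package.json); do
  echo "BUILD ../../packages/$dep+test"
done
```

A real implementation would parse the JSON properly and resolve workspace metadata, but the point stands: the dependency information already exists, and Earthly currently makes you restate it by hand.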

How about local development?

Earthly is great but it has overhead. Can we have a fast and efficient watch mode? This issue covers this. It's important to note that even tools such as NX and Turborepo do a poor job at supporting this.

How do we handle important CI tasks such as commenting on a pull request with deploy status?

Imagine we have a target such as +pull-request-deploy. We want to update the pull request with the produced deployment URL and Docker image tag.

Since we're using Earthly, we could theoretically just write our own bash script that calls the GitHub API to post a comment. I do think there are improved ergonomics to be found here in making this more declarative. Earthly and Docker excel at "inheritance" but aren't as great at "composition." That's perhaps a topic for another issue.
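A hedged sketch of what that bash script could look like, using the GitHub REST API's issue-comments endpoint (all variable values are illustrative; in CI they would come from the workflow environment and the output of +pull-request-deploy):

```shell
#!/bin/sh
# Hypothetical values; in CI these come from the environment
GITHUB_REPO="owner/repo"
PR_NUMBER="123"
DEPLOY_URL="https://pr-123.example.com"
IMAGE_TAG="registry.example.com/app1:pr-123"

# Build the comment payload
body="Deployed to $DEPLOY_URL (image: $IMAGE_TAG)"
payload=$(printf '{"body": "%s"}' "$body")
echo "$payload"

# In CI (requires a token; shown but commented out here):
# curl -sS -X POST \
#   -H "Authorization: Bearer $GITHUB_TOKEN" \
#   -H "Accept: application/vnd.github+json" \
#   "https://api.github.com/repos/$GITHUB_REPO/issues/$PR_NUMBER/comments" \
#   -d "$payload"
```

This works, but it's exactly the kind of imperative glue that a more declarative, composable mechanism could absorb.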

Approach 2: App/Service Level

Repo Structure

# Earthfile

# various targets

# apps/app1/Earthfile

test:
  BUILD +test-self

test-self:
  FROM earthly/dind:alpine
  ... # Full stack integration tests

# apps/app2/Earthfile

test:
  BUILD +test-self
  BUILD ../../packages/package1+test
  BUILD ../../packages/package2+test

test-self:
  FROM earthly/dind:alpine
  ... # Full stack integration tests

# apps/app3/Earthfile

test:
  BUILD +test-self
  BUILD ../../packages/package1+test
  BUILD ../../packages/package3+test

test-self:
  FROM earthly/dind:alpine
  ... # Full stack integration tests

# .github/workflows/app2-ci.yml
name: CI
on:
  - push
jobs:
  deploy:
    name: +deploy
    runs-on: ubuntu-latest
    steps:
      # set-up steps here
      - name: deploy
        run: |
          earthly \
            --push -P \
            ./apps/app2+deploy
  test:
    name: +test
    runs-on: ubuntu-latest
    steps:
      # set-up steps here
      - name: test
        run: |
          earthly \
            --push -P \
            ./apps/app2+test

# .github/workflows/app3-ci.yml
# The same as app2 but for app3 instead

# .github/workflows/app1-ci.yml
# The same as app2 but for app1 instead

Analysis

This splits our CI at the app level. We no longer call a top-level Earthfile target; instead, we invoke targets per service.

Open questions:

How can we avoid repeatedly testing the dependent packages?

If our app2 and app3 GitHub Actions workflows run in parallel, their shared dependent package (package1) will be tested repeatedly. Satellite caching may help with this, but it may also race. This approach wastes satellite compute minutes (assuming the workflows use the same satellite, which, as discussed above, may not be a good idea).

Several of the questions mentioned in approach 1 also apply here. Most notably: how do we only run app deploy targets when they're actually affected by code changes?


Approach 3: App/Service/Package Level

This is very similar to approach 2, except that apps/services don't test their child dependencies. Those are run separately.

Repo Structure

# Earthfile

# various targets

# apps/app1/Earthfile

test:
  BUILD +test-self

test-self:
  FROM earthly/dind:alpine
  ... # Full stack integration tests

# apps/app2/Earthfile

test:
  FROM earthly/dind:alpine
  ... # Full stack integration tests

# apps/app3/Earthfile

test:
  FROM earthly/dind:alpine
  ... # Full stack integration tests

# packages/pkg1/Earthfile

test:
  FROM +node
  RUN yarn run test

# .github/workflows/app2-ci.yml
name: CI
on:
  - push
jobs:
  deploy:
    name: +deploy
    # GitHub Actions doesn't actually let you do this easily. You have to split things up and it's a horrible DX. See [here](https://stackoverflow.com/questions/58457140/dependencies-between-workflows-on-github-actions/64733705#64733705) for how this works.
    depends-on: pkg1-ci
    runs-on: ubuntu-latest
    steps:
      # set-up steps here
      - name: deploy
        run: |
          earthly \
            --push -P \
            ./apps/app2+deploy
  test:
    name: +test
    runs-on: ubuntu-latest
    steps:
      # set-up steps here
      - name: test
        run: |
          earthly \
            --push -P \
            ./apps/app2+test

# .github/workflows/pkg1-ci.yml
# The same as app2 but for pkg1 instead
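The depends-on key in the snippet above is pseudo-syntax; the closest built-in mechanism today is the workflow_run trigger, sketched here (workflow names are illustrative, and note that workflow_run only fires for workflows on the default branch):

```yaml
# .github/workflows/app2-ci.yml (sketch)
name: app2 CI
on:
  workflow_run:
    workflows: ["pkg1 CI"]   # must match the `name:` of the pkg1 workflow
    types: [completed]
jobs:
  deploy:
    # only proceed if the upstream workflow succeeded
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      # set-up steps here
      - name: deploy
        run: |
          earthly \
            --push -P \
            ./apps/app2+deploy
```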

Analysis

This changes approach 2 to instead let Github Actions handle the DAG. This allows for multiple services to avoid repeating work if they both use the same package.

Why are we splitting up our DAG when Earthly should be able to handle it all?

I've found working with CI providers to be a rough experience. I think part of that frustration is recreating a DAG that our build tools (such as Earthly) already know about. Instead of letting CI declare the compute split, what if we integrated that better within Earthly?

Approach 4: Just use a monorepo build tool (NX, Turborepo, Bazel, etc.)

This is very similar to approach 1 but leverages a dedicated monorepo build tool (in this example, NX).

Repo Structure

# Earthfile

deploy:
  COPY . .
  RUN --push nx affected --target=deploy

test:
  COPY . .
  RUN nx affected --target=test

# .github/workflows/ci.yml
name: CI
on:
  - push
jobs:
  deploy:
    name: +deploy
    if: ${{ github.ref == format('refs/heads/{0}', github.event.repository.default_branch) }}
    runs-on: ubuntu-latest
    steps:
      # set-up steps here
      - name: Run deploy
        run: |
          earthly \
            --push -P \
            +deploy
  test:
    name: +test
    runs-on: ubuntu-latest
    steps:
      # set-up steps here
      - name: Run test
        run: |
          earthly \
            --push -P \
            +test

Analysis

This seems great because existing tools can be leveraged to handle the monorepo side of things.

Open questions:

How do we avoid one giant Docker image?

Our top-level targets would require us to install all of the tooling they need to run. This means one giant image with Elixir, Node, and any other languages we use. It also means one image with all of the binaries used at build/test time.

It doesn't let us leverage Earthly to split up the Docker instructions for the build tools each app/package requires. Perhaps UDCs (user-defined commands) have a place here, but that's far from an ideal solution.

How do we spread the compute?

Some of these build tools support distributed task execution, but then we lose the benefits of Earthly: we have to use the provider's tooling to declare the necessary dependencies.

How can we improve the copy operation?

There are real performance considerations when copying every file in the repo. The DX isn't great either, but the issue on nested .earthlyignore files could improve some of these pain points.

Summary

I think Earthly is in a unique position to take over more of the stack. These are real issues that I've run into while working on my team's multi-language monorepo. We are a small team with only two major languages in use.
