Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improve TS SDK runtime performances #7096

Merged
merged 14 commits into from
May 22, 2024

Conversation

TomChv
Copy link
Member

@TomChv TomChv commented Apr 15, 2024

Following #7093, I found that most of the time is spent downloading dependencies, it seems I can drastically reduce the download time using yarn and the --production flag to only install production dependencies.

This is a first step toward improving TS performances, we might switch to pnpm later, this require some tests though because I'm hitting an issue (see #7093 (comment))

@TomChv TomChv requested review from vito and helderco April 15, 2024 18:59
@TomChv TomChv self-assigned this Apr 15, 2024
@TomChv TomChv force-pushed the feat/improve-ts-performances branch from e35d7c0 to fe36957 Compare April 16, 2024 08:52
sdk/typescript/runtime/main.go Outdated Show resolved Hide resolved
sdk/typescript/runtime/main.go Outdated Show resolved Hide resolved
sdk/typescript/runtime/main.go Outdated Show resolved Hide resolved
@TomChv TomChv force-pushed the feat/improve-ts-performances branch 2 times, most recently from 77828af to 1988c2b Compare April 26, 2024 10:05
Copy link
Contributor

This PR is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@jedevc
Copy link
Member

jedevc commented May 13, 2024

@TomChv how's this one going? Anything you need a hand with?

@TomChv
Copy link
Member Author

TomChv commented May 14, 2024

how's this one going? Anything you need a hand with?

I'm trying to figure out how to reduce the installation time of the global package, it seems using yarn make it slower so I'm trying other ways.
I might be able to hack some ways to use pnpm at some point, still digging into it

@TomChv TomChv force-pushed the feat/improve-ts-performances branch from 1988c2b to d1e14c4 Compare May 14, 2024 18:08
@TomChv
Copy link
Member Author

TomChv commented May 14, 2024

It seems theses changes in the setup introduce issues in the tests, sometime the dependencies cannot be installed properly and it's a weird typescript error.
I'm digging into it because this does not make sense

@TomChv
Copy link
Member Author

TomChv commented May 14, 2024

I can reproduce the error using dagger call --source=. test custom --run="TestModuleTypescriptInit"

I'm going to try to reproduce it locally, it seems something is going wrong when using dagger init with the new setup, I'll update this comment with my researches.

Update 1:
It seems the sdk folder doesn't contain the source anymore, this is the origin of the CI issue.
I'm trying to figure out if that's because withDirectory override the current sdk file or if it's something else.
Normally it should merge, I'm doing tests to understand what's going on.

Update 2:
I found out that mounting a directory in the path was overwritting the current SDK.
Thanks to @vito, we found a solution: mount the module dir with the SDK in it.
I updated the code and it worked. the only issue of that is the copy of the SDK directory with the node_modules in it, it doesn't seem to affect performances that much though.

Update 3:
I splitted part of the Base function into sub function to make it easier to understand and find some ways to improve the current process.

Update 4:
I saw that I copy the runtime directory into base container, it may be possible to improve performances by ignoring part of it.
I think I could apply the same logic in the sdk directory to ignore tests files that are useless during the execution.

Update 5:
I'm hitting an issue with TestModuleTypescriptInit/respect existing package.json test.

I'm still trying to understand what is happening.

                                       
    multi.go:85:     ┃ Error: Unsupported type Container                                                                                                                                                                                                                                                                     
    multi.go:85:     ┃     at typeToTypedef (/src/sdk/introspector/scanner/utils.ts:115:13)                                                                                                                                                                                                                                  
    multi.go:85:     ┃     at Method.get returnType (/src/sdk/introspector/scanner/abtractions/method.ts:114:12)                                                                                                                                                                                                             
    multi.go:85:     ┃     at Method.get typeDef (/src/sdk/introspector/scanner/abtractions/method.ts:130:24)                                                                                                                                                                                                                
    multi.go:85:     ┃     at <anonymous> (/src/sdk/entrypoint/register.ts:44:57)                                                                                                                                                                                                                                            
    multi.go:85:     ┃     at Array.forEach (<anonymous>)                                                                                                                                                                                                                                                                    
    multi.go:85:     ┃     at <anonymous> (/src/sdk/entrypoint/register.ts:43:35)                                                                                                                                                                                                                                            
    multi.go:85:     ┃     at Array.forEach (<anonymous>)                                                                                                                                                                                                                                                                    
    multi.go:85:     ┃     at register (/src/sdk/entrypoint/register.ts:36:33)                                                                                                                                                                                                                                               
    multi.go:85:     ┃     at result.startActiveSpan (/src/sdk/entrypoint/entrypoint.ts:42:26)                                                                                                                                                                                                                               
    multi.go:85:     ┃     at <anonymous> (/src/sdk/telemetry/tracer.ts:48:22)                                                                                                                                                                                                                                               
    multi.go:85:     ┃                                                                                                                                                                                                                                                                                                       
    multi.go:85:     ┃ Node.js v21.3.0                                                                                                                                                                                                                                                                                       
    multi.go:85:       ✔ exec yarn install 0.7s
    multi.go:85:       ┃ yarn install v1.22.19                                                                                                                                                                                                                                                                               
    multi.go:85:       ┃ info No lockfile found.                                                                                                                                                                                                                                                                             
    multi.go:85:       ┃ [1/4] Resolving packages...                                                                                                                                                                                                                                                                         
    multi.go:85:       ┃ [2/4] Fetching packages...                                                                                                                                                                                                                                                                          
    multi.go:85:       ┃ [3/4] Linking dependencies...                                                                                                                                                                                                                                                                       
    multi.go:85:       ┃ [4/4] Building fresh packages...                                                                                                                                                                                                                                                                    
    multi.go:85:       ┃ success Saved lockfile.                                                                                                                                                                                                                                                                             
    multi.go:85:       ┃ Done in 0.05s.                                                                                                                                                                                                                                                                                      
    multi.go:85:       ✘ exec tsx --no-deprecation --tsconfig /src/tsconfig.json /src/src/__dagger.entrypoint.ts 3.1s
    multi.go:85:       ! process "tsx --no-deprecation --tsconfig /src/tsconfig.json /src/src/__dagger.entrypoint.ts" did not complete successfully: exit code: 1
    multi.go:85: 
    multi.go:85: 
    suite_test.go:299: 

I think dagger is not correctly instead in the package.json in that dagger, I think I can workaround this with another install after only this that case.

Update 5
I'm currently hitting a regression of performances after fixing the issue, I'm trying to figure out how to get back my old performances.

My changes also extends the time of export the file because it copies the node_modules, which lost of 3 precious seconds, I'm trying to figure out how to copy the directory to the host.

Update 6
I could optimize the installation to make the new implementation faster. I'm still trying to figure out how to remove the node_modules from the generated files so we can be at least 3-4seconds faster.

executor = "bun"
}

ctr = ctr.WithExec([]string{executor, "/opt/module/bin/__tsconfig.updator.ts"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not put this logic in Go since it's faster?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, however I think I should take care of that in a follow up

@helderco
Copy link
Contributor

I saw that I copy the runtime directory into base container, it may be possible to improve performances by ignoring part of it. I think I could apply the same logic in the sdk directory to ignore tests files that are useless during the execution.

You may have forgotten but putting the whole sdk directory was just a stop gap so we could work on more important things first. But the intent was to eventually cycle back and refactor the SDK to decouple the parts that aren't needed in modules. Python needs the same.

@gerhard
Copy link
Member

gerhard commented May 16, 2024

I would very much like to see a version of this land in the next milestone, v0.11.5:

Even if we don't finish everything, any improvement that we can make will go a long way.

@gerhard gerhard added this to the v0.11.5 milestone May 16, 2024
@gerhard
Copy link
Member

gerhard commented May 21, 2024

I tried this with a simple TypeScript pipeline for a Next.js app:

/**
 * Build automation powered by Dagger
 */
import { dag, Directory, Secret, object, func } from "@dagger.io/dagger"

@object()
class App {
  /**
   * Publish image: dagger call image --source=.:app --token=env:GH_TOKEN
   */
  @func()
  async image(source: Directory, token: Secret): Promise<string> {
    return dag
      .container()
      // https://hub.docker.com/_/node/tags?page=&page_size=&ordering=&name=lts
      .from("node:lts-iron")
      // https://github.com/krallin/tini
      .withExec(["apt", "update"])
      .withExec(["apt", "install", "--yes", "tini"])
      .withDirectory("/app", source)
      .withWorkdir("/app")
      .withExec(["npm", "install"])
      .withExec(["npm", "run", "build"])
      .withEntrypoint(["tini", "--", "npm", "run", "start"])
      .withRegistryAuth("ghcr.io", "gerhard", token)
      .publish("ghcr.io/gerhard/nextjs")
  }
}

I am connecting to a remote Docker instance via SSH:

Client: Docker Engine - Community
 Version:           26.0.0
 OS/Arch:           darwin/amd64

Server: Docker Engine - Community
 Engine:
  Version:          26.1.3
  OS/Arch:          linux/amd64
 containerd:
  Version:          1.6.31
 runc:
  Version:          1.1.12

Linux 6.8 x86_64 host where Docker / Dagger is running has:

  • 16 CPUs
  • 32GB DDR5
  • NVMe disk

These are the results for various runs:

COMMAND v0.11.4 - BEFORE THIS CHANGE dev - AFTER THIS CHANGE
1st run - no cache 95s 108s
2nd run - cached 11s 9s
3rd run - touch foo 50s 49s
4th run - module comment change 32s 19s
5th run - module comment change 22s 20s

Based on the above measurements, I observed one improvement, 4th run, when I just kept updating a comment in the module, which was 40% quicker: 19s vs 32s

Before this change, that was inconsistent: 5th run was only 10% quicker: 20s vs 22s

I also observed one deterioration: 1st run - no cache was 13% slower: 108s vs 95s

Here are Dagger Cloud traces for the interesting runs above:

Based on the above measurements, this does not seem a net performance improvement to me. What do you think @TomChv:

  1. Do my measurements look correct?
  2. Was I measuring the right thing?
  3. Do you still think that it's worth continuing in this direction?

@TomChv
Copy link
Member Author

TomChv commented May 21, 2024

Thanks for your amazing feedbacks! It really helps me understand what's going on with a real world use case.

Do my measurements look correct?

Yep absolutely, I could hit the same kind of behaviour and flakiness on my tests! Specially between 4 and 5.

The improvement of 40% to 10% is also something I got which confirm my work! I could hit a 2x improvement with an old version but it seams that it wasn't supporting every cases (to be more specfic: the hasPkgJson & hasTsConfig) was failing.

Was I measuring the right thing?

Yes! Perfect, I needed the run and the traces, to see how much time take every runs and which steps is longer.

Do you still think that it's worth continuing in this direction?

I think what I've done in this PR is a good start for further improvement, there's still some mystery like this example below where I cannot understand how is that possible.

Screenshot 2024-05-21 at 21 48 54

I'm super confused about the fact that yarn was waaay slower than npm when the withExec has multiple instruction, and it's even weirder because npm is only supposed to update the package.json (that's why --package-lock-only is done) but then the real npm install took only 2.3s which is super confusing because it should have install all the package.

I think I'm missing something about yarn, which keep us from using the cache. I'm thinking that at some point, we could hit higher performance if we were running the TS SDK in runtime language but that seams quite complex to do.
The overall installation of dependencies takes time, the cache isn't hitting every time (which doesn't make sense since nothing should have before adding user's source in CodegenBase.

I think it's still worth merging the PR, because it makes the module much clearer but there's still a long way to go before getting a super fast TS SDK

@TomChv
Copy link
Member Author

TomChv commented May 21, 2024

I'm also having a concern about the CI and the flakiness this PR introduce, on multiple runs, I could see different tests passing & failing, with always this kind of error:

couldn't find a package.json file in "/usr/local/share/.cache/yarn/v6/npm-@dagger-io-dagger-0.0.0-1cdff291-b199-40fc-9ef6-3ff1ac6d6d1d-1716293036796/node_modules/@dagger.io/dagger" 

Meaning that there's an issue with yarn cache and I cannot understand why is that so, this is something that isn't happening with npm at all.

As a fix, I'm trying to use yarn v4 in order to fixe both cache and download performance, according to some benchmark I was it can make things much faster

I also disabled yarn cache file to avoid flackiness, it might still become faster than npm thanks to yarn v4 hopefully the CI will be green

@TomChv
Copy link
Member Author

TomChv commented May 21, 2024

By doing so, I could successfully get a green success on initialization, which is a good news

Screenshot 2024-05-21 at 23 46 44

Tom Chauveau added 6 commits May 21, 2024 23:57
Signed-off-by: Tom Chauveau <tom@epitech.eu>
Signed-off-by: Tom Chauveau <tom@epitech.eu>
Signed-off-by: Tom Chauveau <tom@epitech.eu>
Move SDK depenencies installation before copying source code so we do
not re downaload dependencies after a user
changes (make things much faster because we
benefit from cache).
Update runtime to only use dagger command.

I did some tests, from a fresh engine we got 2 to 3 time differences.

Signed-off-by: Tom Chauveau <tom@epitech.eu>
Signed-off-by: Tom Chauveau <tom@epitech.eu>
Signed-off-by: Tom Chauveau <tom@epitech.eu>
Tom Chauveau added 4 commits May 21, 2024 23:57
Signed-off-by: Tom Chauveau <tom@epitech.eu>
Signed-off-by: Tom Chauveau <tom@epitech.eu>
Signed-off-by: Tom Chauveau <tom@epitech.eu>
Signed-off-by: Tom Chauveau <tom@epitech.eu>
@TomChv TomChv force-pushed the feat/improve-ts-performances branch from af52ea4 to 2013233 Compare May 21, 2024 21:57
@TomChv
Copy link
Member Author

TomChv commented May 21, 2024

@gerhard It seems I could fix the CI, could you do one more benchmark with your example just to see if there's any changes? I think it might get slower, I'll need to find why the yarn cache fails with collision.
A really simple check would be to add a specific cache-key for each module but I'm not sure we want to do that.

@TomChv
Copy link
Member Author

TomChv commented May 21, 2024

Conclusion of this PR

This PR changed several internal things in the TypeScript SDK runtime.

I'm not sure I could succeed in my path to reduce time by 2 but I still want to give a summary of what will happens with this PR:

  • I switched from npm to yarn v4 which in theory should make things faster.
  • I reorganized the SDK initialization to install the SDK before adding the user's source which should in theory improving the cache.
  • I improved the TS SDK code to be more readable and more clear.
  • I improve the user's package.json update using npm pkg set instead of a simple npm install which definitelly give at least a second of improvement.
  • I moved @dagger.io/dagger from devDependencies to dependencies. That allows to use --production flag which in theory is faster than installing all dependencies.
  • I deduplicate the TS SDK yarn.lock to remove any no-essential dependencies from the lockfile, it's a way to reduce the overall download time.
  • I gained valuable knowledge around the runtime and how package manager works.

Few things to do as next steps

  • Use a faster language to update tsconfig.json, this step takes 2s every time.
  • Add back cache key on yarn module. I still need to understand what's going on with that but we can save it for later, I think it will take some time and current resources on the web are not that helpful in our specific context.
  • Dig more into cache system, the SDK installation is always trigger but it shouldn't, my theory is that its because of the list files operations, I need to find a way to do it in a different way or reorder steps.

Copy link
Member

@jedevc jedevc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the refactor here, very nice 🎉

sdk/typescript/runtime/main.go Outdated Show resolved Hide resolved
sdk/typescript/runtime/main.go Show resolved Hide resolved
).
WithWorkdir(filepath.Join(ModSourceDirPath, subPath))

return t.setupModule(ctx, base, detectedRuntime, name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good refactor here 🎉

This was very hard to understand before, super glad to see this cleaned up! ❤️

Signed-off-by: Tom Chauveau <tom@epitech.eu>
Copy link
Member

@jedevc jedevc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tentative approval here, I think this definitely lays the foundations to do some more cache analysis/perf improvements here 🎉

Nice start!

@TomChv
Copy link
Member Author

TomChv commented May 22, 2024

Tentative approval here, I think this definitely lays the foundations to do some more cache analysis/perf improvements here 🎉

Nice start!

Thanks for the approval, I'm gonna merge the PR once CI is green.

@gerhard
Copy link
Member

gerhard commented May 22, 2024

A lot of helpful detail in your recent comments @TomChv, thank you for going the extra mile 🙌

I am doing one more test run now:

COMMAND v0.11.4 - BEFORE THIS CHANGE +PR7096 - AFTER THIS CHANGE
1st run - no cache 90s 93s
2nd run - module comment change 22s 19s ⚡️ 16% quicker
3rd run - module comment change 28s 19s ⚡️ 47% quicker
4th run - module comment change 22s 20s ⚡️ 9% quicker

👆 Links to Dagger Cloud Traces inline.

💪 LGTM 🚀

@TomChv
Copy link
Member Author

TomChv commented May 22, 2024

I am doing one more test now and will share the results - will update this comment.

Thanks, I'll still merge the PR no matter what, the improvement in the readability is super important and will help for further optimizations.
My guess is that it might be slighly slower than the npm implementation but I'm investing in the future with this PR.

@gerhard
Copy link
Member

gerhard commented May 22, 2024

Before merging, can you please add a changie @TomChv?

@TomChv
Copy link
Member Author

TomChv commented May 22, 2024

👆 Links to Dagger Cloud Traces inline.

💪 LGTM 🚀

Thanks for these amazing benchmark! I'll add the changie and we can merge it

Signed-off-by: Tom Chauveau <tom@epitech.eu>
@TomChv
Copy link
Member Author

TomChv commented May 22, 2024

Release note added in c423bf9

@TomChv TomChv merged commit f8cfeca into dagger:main May 22, 2024
93 checks passed
@TomChv TomChv deleted the feat/improve-ts-performances branch May 22, 2024 14:21
@gerhard
Copy link
Member

gerhard commented May 22, 2024

This introduced a bug in provisioning, we will not be able to release without a fix @TomChv: https://github.com/dagger/dagger/actions/runs/9193090706/job/25283718931#step:6:11

image

TomChv pushed a commit to TomChv/dagger that referenced this pull request Jun 6, 2024
dagger#7096 introduced a regression by using `yarn install --production`.
If the field `packagaManager` is set, it might lead to failure because
yarn post v1.22.11 doesn't support the `--production` flag anymore,
it has been removed in favor of `yarn workspaces focus --production`.

However, depending on the yarn major version (v2, v3, v4), the options
are different but also the results and the way they handle
subdependencies.

It seems `npm` is the only package manager that actually force install
all the sub dependencies, which ensure the TS SDK runtime properly
works.

Signed-off-by: Tom Chauveau <tom@epitech.eu>
gerhard pushed a commit that referenced this pull request Jun 11, 2024
* fix(ts-sdk) rollback to npm to ignore packageManager field

#7096 introduced a regression by using `yarn install --production`.
If the field `packagaManager` is set, it might lead to failure because
yarn post v1.22.11 doesn't support the `--production` flag anymore,
it has been removed in favor of `yarn workspaces focus --production`.

However, depending on the yarn major version (v2, v3, v4), the options
are different but also the results and the way they handle
subdependencies.

It seems `npm` is the only package manager that actually force install
all the sub dependencies, which ensure the TS SDK runtime properly
works.

Signed-off-by: Tom Chauveau <tom@epitech.eu>

* chore: add release note

Signed-off-by: Tom Chauveau <tom@epitech.eu>

* feat: move corepack init inside SDK installation

Signed-off-by: Tom Chauveau <tom@epitech.eu>

* fix: remove `--production` on user's install deps

Signed-off-by: Tom Chauveau <tom@epitech.eu>

---------

Signed-off-by: Tom Chauveau <tom@epitech.eu>
Co-authored-by: Tom Chauveau <tom@epitech.eu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants