build(docker): optimize; add Chromium bundled ver #9626

Merged
merged 8 commits into from
Apr 28, 2022

@Rongronggg9 Rongronggg9 commented Apr 25, 2022

Involved issue

Close #7612
Close #7613

Example for the proposed route(s)

NOROUTE

New RSS Script Checklist

  • New Route
  • Documentation
    • CN
    • EN
  • Fulltext
    • Use Cache
  • Anti-bot or rate limit?
    • If yes, does your code reflect this?
  • Date and time
    • Parsed
    • Correct TimeZone
  • New package added
  • Puppeteer

Note

Bump Debian from stretch to bullseye

node:14-slim is based on Debian stretch, which will be EOL after 2022/6/30.

Bump node from 14 to 16

#9626 (comment)

Optimizations

Rewrite Dockerfile

Rewrote the Dockerfile to speed up the build process and maximize the cache hit rate and concurrency. Some unnecessary dependencies are no longer installed. Typically speeds up the build workflow by >5min (initial cache miss builds).
Check the comments in Dockerfile for more details.

Optimize minify-docker.js

Copy only the files located in node_modules/. Typically speeds up the build workflow by >30s.
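
The idea behind this optimization could be sketched roughly as below. This is a minimal illustration and not the actual minify-docker.js: the helper names `selectNodeModulesFiles` and `copyFiles` are hypothetical, and the input file list is assumed to come from whatever dependency tracing the script already performs.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: keep only the entries that live under node_modules/.
// Paths are assumed to be relative to the project root.
function selectNodeModulesFiles(fileList) {
    return fileList.filter((file) => {
        // Normalize separators and a leading "./" before comparing.
        const normalized = path.posix.normalize(file.split(path.sep).join('/'));
        return normalized.startsWith('node_modules/');
    });
}

// Hypothetical helper: copy each selected file into destDir,
// preserving its path relative to srcDir.
function copyFiles(files, srcDir, destDir) {
    for (const file of files) {
        const target = path.join(destDir, file);
        fs.mkdirSync(path.dirname(target), { recursive: true });
        fs.copyFileSync(path.join(srcDir, file), target);
    }
}
```

Skipping everything outside node_modules/ means fewer files to copy in this step, which is where the time saving would come from.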

Separate the minifying stage and only install production deps in the dep-builder stage

The separate stage docker-minifier has only necessary dependencies of minify-docker.js and the dep-builder stage no longer needs dev dependencies. Thus, installing fewer dependencies effectively cuts down the build time. It also helps further shrink the cache size, which saves GitHub Actions cache space and speeds up the cache export process.

Typically speeds up the build workflow by >1min (initial cache miss builds).

Add Chromium-bundled version

There are 4 reasons why we should build a Chromium-bundled version:

  1. browserless/chrome is huge and clumsy (~900MB compressed), which makes pulling its image painful, while the Chromium-bundled version is fairly lightweight (~220MB compressed, of which ~70MB is RSSHub and its deps).
  2. browserless/chrome consumes more memory. Even when idle, it still uses >100MB, and more when it is active.
  3. Puppeteer never officially guarantees to work with browserless/chrome. Instead, "Each version of Puppeteer bundles a specific version of Chromium – the only version it is guaranteed to work with". Thus, there might be compatibility issues.
  4. Sometimes using Docker Compose is simply not an option. For example, there are some workarounds to deploy RSSHub to Railway.app, but it only supports Docker. There are other workarounds that run browserless/chrome as another service and connect the two, but then it will be exposed to the Internet (unnecessary and dangerous!). If this PR gets merged, I will create another PR to make the deployment to Railway.app easier.
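
To make the difference between the two deployment modes concrete, here is a minimal sketch of how an application might choose between connecting to a remote browser service and launching a bundled Chromium. The function name `resolveBrowserStrategy` and the environment variable names are assumptions made for this illustration; they are not taken from this PR.

```javascript
// Decide how Puppeteer should obtain a browser.
// Both environment variable names below are assumptions for this sketch.
function resolveBrowserStrategy(env) {
    if (env.PUPPETEER_WS_ENDPOINT) {
        // A separate browser service (e.g. a browserless/chrome container) is configured.
        return {
            method: 'connect',
            options: { browserWSEndpoint: env.PUPPETEER_WS_ENDPOINT },
        };
    }
    // Otherwise launch a local Chromium, optionally from an explicit path.
    const options = {};
    if (env.CHROMIUM_EXECUTABLE_PATH) {
        options.executablePath = env.CHROMIUM_EXECUTABLE_PATH;
    }
    return { method: 'launch', options };
}
```

With Puppeteer installed, the result could be used as `const { method, options } = resolveBrowserStrategy(process.env); const browser = await puppeteer[method](options);`. In the bundled case no second service has to be exposed or linked, which is the point of reason 4.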

The Chromium-bundled Docker image is able to reuse all caches from the ordinary image to speed up its build process. It typically needs <1min even if the cache for Chromium misses.

The previous Dockerfile has been unable to build Chromium-bundled images for a long time, so this is also a fix.

I did not change the docker-compose.yml to adopt the Chromium-bundled version but just added some prompts. If such a change is considered acceptable, I may also change it to use the Chromium-bundled version by default.

To verify that it works fine, check https://<REDACTED>.up.railway.app/pincong/hot


Note about unintended cache miss on some arch:

I've noticed this issue for a long time. I've managed to test using registry cache instead of GitHub Actions cache, only to find that nothing changed:

moby/buildkit#2822

Thus, even though I managed to adopt a lot of caching techniques, non-initial builds are still sometimes as slow as initial ones.

It seems that dropping one arch may help since usually only one arch loses its cache. But that is not a good idea before figuring out how many users still run RSSHub on ARMv7 devices (only the repo owner can check the last pull time of each tag and each arch on Docker Hub).

Signed-off-by: Rongrong <i@rong.moe>
@Rongronggg9
Contributor Author

I've done some additional optimization (refer to the modified issue description) to obtain a >1min speedup.
Now an initial cache-miss build only needs 17min to finish.

I've also requested a secondary run attempt to see if it helps solve the multi-arch caching problem and it seemed to help!

@Rongronggg9
Contributor Author

I am still figuring out if the image size of the Chromium-bundled version can be further shrunk. Refer to puppeteer/puppeteer#7822 (comment)

for Chromium-bundled version

@Rongronggg9
Contributor Author

Rongronggg9 commented Apr 26, 2022

OK. Now I have minimized the dependencies of the bundled Chromium. ~50MB is saved from the compressed image size (~130MB from the extracted image size).

I've also done some tests to prove that it still works fine with minimized dependencies.

To verify that it works fine, check https://<REDACTED>.up.railway.app/pincong/hot, https://<REDACTED>.up.railway.app/nytimes, etc.

@Rongronggg9
Contributor Author

Rongronggg9 commented Apr 27, 2022

I believe that I've fixed the cache issue, and a cached build (with routes and package.json/yarn.lock updated) now needs just 13min to finish.
If package.json/yarn.lock had not been updated, it would be even faster.
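
The reasoning here can be illustrated with a simplified model of a content-based layer cache. This is only a sketch: it assumes the dependency-install step copies just package.json and yarn.lock, and it does not reproduce how BuildKit actually computes its cache keys. The names `layerCacheKey` and `dependencyLayerKey` are hypothetical.

```javascript
const crypto = require('crypto');

// Simplified model: the cache key of a build step depends only on the
// names and contents of the files that step copies into the image.
function layerCacheKey(files) {
    const hash = crypto.createHash('sha256');
    for (const name of Object.keys(files).sort()) {
        hash.update(name);
        hash.update('\0');
        hash.update(files[name]);
        hash.update('\0');
    }
    return hash.digest('hex');
}

// Assumption: the dependency layer sees only these two files,
// so changes to route files cannot invalidate it.
function dependencyLayerKey(repo) {
    return layerCacheKey({
        'package.json': repo['package.json'],
        'yarn.lock': repo['yarn.lock'],
    });
}
```

Under this model, a commit that only touches routes leaves the dependency layer key unchanged, so the expensive install step can be served from cache, while a yarn.lock change forces it to rerun.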

@TonyRL TonyRL merged commit d8c00ec into DIYgod:master Apr 28, 2022
@Rongronggg9 Rongronggg9 deleted the docker-optimization branch May 5, 2022 06:12
@Rongronggg9
Contributor Author

If this PR gets merged, I will create another PR to make the deployment to Railway.app easier.

That plan is now dead. I will not put any more effort into it unless it becomes worthwhile again. There is a preview, though: master...Rongronggg9:railway-ci


https://railway.app/changelog/2022-05-27

TL;DR on proposals so far

  • Private repos are back on the free plan
  • Credit card to verify (FREE)
  • Prepaid plans (purchase credits up-front)
  • Project member limits
  • $5 Dev Plan credits instead of $10
  • Service execution limits on free plan

https://blog.railway.app/p/updates-on-plans

  • Aggressive dependency scanning is coming (main concern)

Stronger Fraud Scanning: We’ll be scanning more aggressively on dependencies in this tier.

  • Free Plan is about to require credit card verification

Verification Center: Verify via Credit Card

Reasoning: We deal with a lot of bots trying to deploy on the platform, hence, we were vague in the past on how to get your account into a good state. As a result, the rules around verification aren’t clearly communicated within the product.
Proposal: We have now separated the verification flow from purchasing. You can add a credit card and not have to purchase anything to prove your humanity.

  • Developer Plan will lose its financial attraction soon ($10 credit -> $5; a lightly used RSSHub deployment that uses Puppeteer consumes at least ~$3/mo)

...we are looking to adjust the current Developer plan down from $10 a month to $5 of credit...

@TonyRL
Collaborator

TonyRL commented May 29, 2022

Since Railway decided on this move, it will lose its competitiveness among other free hosting platforms that can be found on https://free-for.dev.

These are just a few from there for a brief comparison:

| Provider | RAM | Bandwidth | Auto Sleep | Function Timeout | Usage limit | Regions | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Deta | 128MB* | ? | Yes | 10s* | N/A | ? | RAM can be increased to 512/1024MB; timeout can be increased to 15/20s |
| fly.io | 256MB | 160GB* | Always-on | N/A | Pay outside free tier | APAC, EU, US | C.C. required; 100GB EU+US, 30GB India, 30GB rest of APAC + others |
| Heroku | 512MB | 2TB soft limit | 30 mins | 30s | 550 / 1000 hrs * | EU, US | C.C. required for 1000hrs |
| Koyeb | 256MB x 2 or 512MB | 100GB | Always-on | N/A | $5.40 | APAC, EU, US | |
| Qoddi | 256MB | | Always-on | N/A | N/A | ? | ID might be required |
| Railway | 512MB / ∞ | 100GB soft limit | Always-on | N/A -> TDA | $5/$10 → 500hrs / $5 | US | C.C. will be required; unknown list of banned dependencies and keywords with aggressive |
| Render | 512MB | 100GB | 15 mins | N/A | 750 hrs | APAC, EU, US | |
| Vercel | N/A | 100GB | Always-on | 10s | 100 GB-hours | APAC, EU, US | |

The "banned dependency" also hit me for deploying RSSHub on Railway at the end of November last year. I've asked the support to review this. It was escalated to the upper staff and I haven't received a single word since then.

For me, their detection system feels like "ban-everything-i-don't-like-on-steroids", e.g., one of their customers asked this:

[image]

Successfully merging this pull request may close these issues.

Unable to build the Docker image with Chrome
The Docker image built with Chrome cannot find Chrome at runtime