Skip to content

Conversation

@MQ37
Copy link
Contributor

@MQ37 MQ37 commented Nov 25, 2025

This PR updates the node packages before enabling the limited permissions, just in case - encountered issue with bad cheerio related type imports and then with the Dockerfile build process.

Tested on my personal account

Copy link
Collaborator

@jirispilka jirispilka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use similar dockerfile as in website-content-crawler: https://github.com/apify/store-website-content-crawler/blob/main/.actor/Dockerfile

Ideally I would keep it as close as possible.

Copy link
Collaborator

@matyascimbulka matyascimbulka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, pre-approving. There are some lint errors that should be fixed before merging.

@MQ37
Copy link
Contributor Author

MQ37 commented Nov 25, 2025

Should we use similar dockerfile as in website-content-crawler: https://github.com/apify/store-website-content-crawler/blob/main/.actor/Dockerfile

Ideally I would keep it as close as possible.

Honestly I would not do that, we can customize the Dockerfile to server our needs and these are different to WCC. This change sped up the build times from ~ 3 minutes to under 1 minute. Also when I saw their build process depending on some random Russian API to get current time I would not also do that 😅 (see https://github.com/apify/store-website-content-crawler/blob/1159fd27675a61639b2a8191e167c6d2e0bbf237/.actor/Dockerfile#L22C1-L23C73). Sometimes it's better to "reinvent" the wheel than to stick to the standardized solution that you just blindly copy, even if they work - they will break one day and fixing something that you did not write sucks, for example the langchain-apify build github workflow, we are just using shared one and it was migrated to uv from poetry without us knowing and it broke the pipeline, if we used our own it would be fine. So I am strongly against this.

@jirispilka
Copy link
Collaborator

Build time is not that important when running in prod.

I never said we should copy it blindly. Let me explain my reasoning: the tooling team is actively working on this so why not learn from their Dockerfile and reuse what makes sense for us.

For example: TZ settings, patches (if needed), policies, etc.

@MQ37
Copy link
Contributor Author

MQ37 commented Nov 26, 2025

Build time is not that important when running in prod.

I never said we should copy it blindly. Let me explain my reasoning: the tooling team is actively working on this so why not learn from their Dockerfile and reuse what makes sense for us.

For example: TZ settings, patches (if needed), policies, etc.

Makes sense, will implement the relevant stuff from the WCC dockerfile 👍

@MQ37 MQ37 changed the title feat: update libs for limited perms, fix and speed up Dockerfile build, fix wrong imports feat: update libs for limited perms, fix and speed up Dockerfile build, fix wrong imports, firefox patches and policies, NY TZ Nov 26, 2025
MQ37 added 2 commits November 26, 2025 10:52
commit 7aadecd
Author: MQ37 <themq37@gmail.com>
Date:   Wed Nov 26 11:14:09 2025 +0100

    add var for better readability

commit fd02b0a
Author: MQ37 <themq37@gmail.com>
Date:   Wed Nov 26 11:09:57 2025 +0100

    make code more typescripty
@MQ37 MQ37 merged commit 43cd82a into fix-google-num-results Nov 26, 2025
@MQ37 MQ37 deleted the limited-perms branch November 26, 2025 10:31
MQ37 added a commit that referenced this pull request Nov 26, 2025
… WCC (#81)

* fix: number of google pages results

* Update src/crawlers.ts

Co-authored-by: Matyáš Cimbulka <matyas.cimbulka@apify.com>

* Update src/crawlers.ts

Co-authored-by: Matyáš Cimbulka <matyas.cimbulka@apify.com>

* refactor: user data handling for the serp pagination

* refactor: remove duplicate code

* fix: update request handler to use addRequests for search crawler

* make code more typescripty

* add var for better readability

* feat: update libs for limited perms, fix and speed up Dockerfile build, fix wrong imports, firefox patches and policies, NY TZ (#82)

* feat: update libs for limited perms, fix and speed up Dockerfile build, fix wrong imports

* magic number to const

* lint

* fix lint, update crawlee

* add firefox policies, patches, set NY TZ

* add ghostery blocker

* Squashed commit of the following:

commit 7aadecd
Author: MQ37 <themq37@gmail.com>
Date:   Wed Nov 26 11:14:09 2025 +0100

    add var for better readability

commit fd02b0a
Author: MQ37 <themq37@gmail.com>
Date:   Wed Nov 26 11:09:57 2025 +0100

    make code more typescripty

* fix playwright version, fix types.ts

* fix playwright version for the patch

* fix merge issue

* Update src/utils.ts

Co-authored-by: Matyáš Cimbulka <matyas.cimbulka@apify.com>

---------

Co-authored-by: Matyáš Cimbulka <matyas.cimbulka@apify.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants