Roadmap for v1.0 #8

Open
7 of 15 tasks
thomasdondorf opened this issue Sep 5, 2018 · 10 comments
Labels
discussion Talk about features or implementation

@thomasdondorf
Owner

thomasdondorf commented Sep 5, 2018

I'm thinking about what functionality this library should provide before it is released as v1. I might edit the list in the future:

My goals:

Maybe:

  • Provide a simple but robust data store with the library
  • Rename API: Some parts of the API are rather unfortunately named
    • concurrency should be concurrencyType
    • maxConcurrency maybe maxWorkers?
  • Provide a queue function to the task function for a more functional syntax (so that you don't need to access cluster from inside the task)
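
To make the last idea concrete, here is a minimal sketch of what a task receiving its own queue function could look like. It is not puppeteer-cluster's API: createMiniCluster and the shape of the task argument are hypothetical, and the tiny in-memory cluster runs synchronously without a browser so the example stays self-contained.

```javascript
// Hypothetical sketch: a tiny in-memory "cluster" that hands a `queue`
// function to the task, so the task never touches the cluster object.
// None of these names are puppeteer-cluster's actual API.
function createMiniCluster(task) {
  const pending = [];   // jobs waiting to run, in FIFO order
  const processed = []; // jobs that have already run

  const queue = (data) => {
    pending.push(data);
  };

  function run() {
    // Tasks may enqueue follow-up jobs through the injected `queue`.
    while (pending.length > 0) {
      const data = pending.shift();
      task({ data, queue });
      processed.push(data);
    }
    return processed;
  }

  return { queue, run };
}

// The task only uses what it is given: its data and the queue function.
const miniCluster = createMiniCluster(({ data, queue }) => {
  if (data.depth < 2) {
    queue({ url: `${data.url}/next`, depth: data.depth + 1 });
  }
});

miniCluster.queue({ url: 'https://example.com', depth: 0 });
const visited = miniCluster.run().map((job) => job.url);
console.log(visited);
```

A real task would be async and receive a page as well; the point here is only that follow-up jobs are queued without referencing the cluster from inside the task.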

Not planned (for now):

  • Mixed concurrency models (see Roadmap for v1.0 #8 (comment))
    • Reason: It does not work well with the idea of having a sandbox (which part of the browser/page/context should be sandboxed then?)
@thomasdondorf thomasdondorf added the enhancement New feature or request label Sep 5, 2018
@thomasdondorf thomasdondorf added this to the v0.1 milestone Sep 5, 2018
@thomasdondorf thomasdondorf changed the title Ideas/Goals for v1.0 Ideas/Roadmap for v1.0 Sep 6, 2018
@thomasdondorf thomasdondorf added discussion Talk about features or implementation and removed enhancement New feature or request labels Sep 6, 2018
@barpaw

barpaw commented Sep 11, 2018

I have a question. How many browsers can I spawn in parallel per processor core? Let's say my server has a processor with 4 cores. How many browsers can I spawn at one time for my tests to pass?

@thomasdondorf
Owner Author

Next time, please open a separate issue if it has nothing to do with this issue.

Regarding your question: It depends on your use case. For simple DOM handling I was able to run ~10 workers on my machine (i5 quad core). Just give it a try with the option (monitor: true) and see how your machine handles the tasks.
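
A minimal sketch of such a trial run, assuming the puppeteer-cluster and puppeteer packages are installed. The helper clusterOptions, the worker count of 10, and the example URL are illustrative choices, not recommendations; concurrency, maxConcurrency and monitor are the option names used in this thread.

```javascript
// Build the launch options separately so the worker count is easy to change
// between trial runs. `clusterOptions` is a hypothetical helper name.
function clusterOptions(concurrency, workers) {
  return {
    concurrency,             // e.g. Cluster.CONCURRENCY_CONTEXT
    maxConcurrency: workers, // number of parallel workers to try
    monitor: true,           // print live statistics while jobs run
  };
}

async function main() {
  // Required lazily so the helper above can be used without the package.
  const { Cluster } = require('puppeteer-cluster');

  const cluster = await Cluster.launch(
    clusterOptions(Cluster.CONCURRENCY_CONTEXT, 10)
  );

  // Simple DOM handling: open the page and read its title.
  await cluster.task(async ({ page, data: url }) => {
    await page.goto(url);
    console.log(url, await page.title());
  });

  cluster.queue('https://example.com');

  await cluster.idle();
  await cluster.close();
}

// To try it on your machine: main().catch(console.error);
```

Raise or lower the worker count between runs and watch the monitor output to find the point where your machine stops keeping up.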

@j-manu

j-manu commented Sep 14, 2018

  1. Add a mixed concurrency model, i.e. for the PAGE or CONTEXT concurrency model, have the option to distribute the jobs across more than one browser instance. That way a crash won't affect all jobs, which offers a good balance between reliability and resource usage.

  2. Add an API to return the length of the queue, the time when the oldest item in the queue was added, and the number of jobs processed in the last minute. For a continuously operating cluster, i.e. one where jobs are being added continuously, this information is valuable.
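
The three numbers from the second point could be tracked along these lines. This is a hypothetical helper, not part of puppeteer-cluster; the class and method names are made up for illustration, it assumes jobs leave the queue in FIFO order, and timestamps are passed in as milliseconds so the behaviour is easy to check.

```javascript
// Hypothetical sketch of queue statistics; not puppeteer-cluster's API.
class QueueStats {
  constructor() {
    this.enqueuedAt = []; // enqueue timestamps of waiting jobs, oldest first
    this.finishedAt = []; // completion timestamps of finished jobs
  }

  jobQueued(now) {
    this.enqueuedAt.push(now);
  }

  // Assumes FIFO: the job that starts is always the oldest waiting one.
  jobStarted() {
    this.enqueuedAt.shift();
  }

  jobFinished(now) {
    this.finishedAt.push(now);
  }

  queueLength() {
    return this.enqueuedAt.length;
  }

  // Timestamp of the oldest waiting job, or null if the queue is empty.
  oldestQueuedAt() {
    return this.enqueuedAt.length > 0 ? this.enqueuedAt[0] : null;
  }

  processedInLastMinute(now) {
    const cutoff = now - 60000;
    // Drop old entries so memory stays bounded on long-running clusters.
    this.finishedAt = this.finishedAt.filter((t) => t > cutoff);
    return this.finishedAt.length;
  }
}

const stats = new QueueStats();
stats.jobQueued(1000);
stats.jobQueued(2000);
stats.jobQueued(3000);
stats.jobStarted();
stats.jobFinished(5000);
stats.jobStarted();
stats.jobFinished(70000);

console.log(stats.queueLength());                // 1
console.log(stats.oldestQueuedAt());             // 3000
console.log(stats.processedInLastMinute(75000)); // 1
```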

@cyxou

cyxou commented Dec 21, 2018

Unfortunately, the current implementation of custom concurrency doesn't address the case where you need to pass custom puppeteer parameters to jobInstances. IMHO supporting this would effectively solve #36 with puppeteer args: [ '--incognito', `--proxy-server=${proxyServer}` ] and await page.authenticate(credentials).

@thomasdondorf, what do you think about this?
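
A small sketch of the two pieces mentioned above, assuming a standard Puppeteer Page object. The function names and the proxy address are hypothetical; how the args would be handed to an individual job is exactly the open question in this comment, so that part is left out.

```javascript
// Builds the Chromium launch args for one proxy. The flags are the ones
// quoted in the comment above; `buildProxyArgs` is a hypothetical name.
function buildProxyArgs(proxyServer) {
  // Template literal (backticks) so the proxy address is interpolated.
  return ['--incognito', `--proxy-server=${proxyServer}`];
}

// Assumption: `page` is a Puppeteer Page and `credentials` has the shape
// { username, password } expected by page.authenticate().
async function preparePage(page, credentials) {
  await page.authenticate(credentials);
  return page;
}

const proxyArgs = buildProxyArgs('http://proxy.example:8080');
console.log(proxyArgs);
```

The resulting array is what would go into the args of the Puppeteer launch options for the browser serving that proxy.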

@thomasdondorf
Owner Author

I'm currently thinking about completely reworking the concurrency implementations. Then there would be no more "WorkerInstance" and "JobInstance". Just one function that is called when a page is needed. Then the concurrency implementation would have 100% flexibility when a puppeteer instance is started and when one is reused.
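
As a rough illustration of that idea (not the planned implementation), each concurrency strategy below is just one async function that returns a page and a way to release it. All names are hypothetical, and fakeLaunch stands in for puppeteer.launch so the sketch runs without a browser.

```javascript
// Strategy 1: reuse one shared browser, open a new page per request.
function sharedBrowser(launchBrowser) {
  let browserPromise = null;
  return async function requestPage() {
    if (browserPromise === null) {
      browserPromise = launchBrowser(); // launched lazily, only once
    }
    const browser = await browserPromise;
    const page = await browser.newPage();
    return { page, release: () => page.close() };
  };
}

// Strategy 2: start a fresh browser per request and close it on release.
function browserPerRequest(launchBrowser) {
  return async function requestPage() {
    const browser = await launchBrowser();
    const page = await browser.newPage();
    return { page, release: () => browser.close() };
  };
}

// Fake launcher mimicking only the methods used above.
let launches = 0;
async function fakeLaunch() {
  launches += 1;
  return {
    newPage: async () => ({ close: async () => {} }),
    close: async () => {},
  };
}

async function demo() {
  const shared = sharedBrowser(fakeLaunch);
  const a = await shared();
  const b = await shared();
  await a.release();
  await b.release();
  const afterShared = launches; // one launch served both pages

  const fresh = browserPerRequest(fakeLaunch);
  const c = await fresh();
  const d = await fresh();
  await c.release();
  await d.release();

  return { afterShared, total: launches };
}

const demoResult = demo();
demoResult.then((r) => console.log(r)); // { afterShared: 1, total: 3 }
```

With this shape the caller never needs to know whether a browser was started or reused; that decision lives entirely inside the function.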

Expect some code changes in the next two weeks ;)

@cyxou

cyxou commented Dec 22, 2018

Cool, glad to hear that. Feel free to ping me if you need any help)

@thomasdondorf thomasdondorf changed the title Ideas/Roadmap for v1.0 Roadmap for v1.0 Mar 20, 2019
@thomasdondorf thomasdondorf pinned this issue Mar 20, 2019
@strarsis

+1 for Docker container support.
https://github.com/skalfyfan/dockerized-puppeteer

@ermolaev1337

Is there a way to connect the puppeteer-cluster to a remote instance of chromium? (“connect” instead of “launch”)

@generic11

Hello - just wanted to get a feel for how active this project is. I see puppeteer cluster as being useful for several projects I'd like to work on. However, I'm hesitant to use it if development will be abandoned. Is development still happening? Thanks!

@rennokki

(Long-term runs of puppteer-cluster #25) Make sure it's reliable and crawl more than 10 million pages with it (so far the maximum I crawled was ~800k pages)

I use k6 benchmarks in my CI tests for soketi, making sure all releases are passing benchmarks in most of the cases.

Would it be a great idea to set it up for you for page rendering testing?
