
forking v8 as an optimization #170

Open
heavyk opened this issue Sep 17, 2015 · 16 comments

@heavyk
Contributor

heavyk commented Sep 17, 2015

continuing discussion started in nodejs/node#2133 (comment)


it looks like you can fork v8 even after the context has been created. however, if what you're wanting to accomplish is to save memory, you will probably need to keep that memory outside of v8's gc'd memory so the OS doesn't copy and reallocate it when garbage is collected in the forked process.

also, a potential optimization for NodeOS (I still haven't tried it, so I don't really know) may be to prepare a template process with your desired base v8 context in it (because creating a context is expensive) and just fork that template process for each process spawn; the first thing the forked process would do is ask the OS which script to run and where to find it (kinda like a process boot loader)
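
Node can't fork() an already-initialized process from JS, but the boot-loader idea can be approximated with pre-warmed children that have already paid the startup cost and just wait to be told what to run. A rough sketch; `loader.js` and the message shape are made up for illustration:

```js
// parent.js -- keep a pre-warmed child around, then hand it a script to run
var fork = require('child_process').fork;

var template = fork('loader.js');               // startup/context cost paid here
// ... later, when a "process spawn" is requested:
template.send({ script: '/path/to/app.js' });   // hypothetical message shape

// loader.js -- the "process boot loader": wait to be told what to run
process.once('message', function (msg) {
  require(msg.script);                          // only now load the actual program
});
```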

I don't know enough about these subjects to say anything concrete, though I do know that if NodeOS is working, some smart chap will figure out how to optimize it. NodeOS does look interesting...

cheers

@ronkorving

Afaik the OS should already limit the physical copies of the .text section of a program to 1. It's a simple optimisation that I think kernels have been doing for years. Someone please correct me if I'm wrong.

@heavyk
Contributor Author

heavyk commented Sep 18, 2015

I think you're right about the .text. NodeOS looks to be based on linux, and linux should already be doing that

perhaps what he was saying was to not copy the memory of the js scripts/ast -- which I'm not at all sure is possible (I think it is)... but the thing is, I don't know how v8 stores its stuff... some experimentation would have to be done.

I'm gonna have to take a look at NodeOS now. I wonder how some things are working now... I really like the idea

@ronkorving

A fork would probably do a copy-on-write there, but yeah the next GC cycle may mess all that up.

@piranna
Member

piranna commented Sep 18, 2015

Sorry for not replying before. As @ronkorving pointed out, my idea is NOT to fork() after the v8 engine has been initialized (that would be crazy, with two processes sharing the heap memory space; that's like threads, and for that I think it's better to use multi-contexts + Workers) but before it, so no process heap or stack is shared and each spawn starts as a new, clean process. I don't know if the Linux kernel makes the optimization you are talking about; it could be possible if it keeps a map between binaries and the memory locations where they were loaded, so they could be reused. But if that's the case, it doesn't make sense that each new process spawned by child_process.fork() uses 10mb by itself, since that's roughly the size of the Node.js binary itself and it should already be in memory... I don't believe v8 wastes so much memory on internal structures for each instance...
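
To see where those 10mb actually live, each process can compare its shared vs private resident pages (a Linux-only sketch; /proc/self/statm reports page counts, and the 4096-byte page size is an assumption):

```js
// prints how much of this process's resident memory is shared with other
// processes (the mmap'd node binary, shared libs) vs private (v8 heap, etc.)
var fs = require('fs');

var pageSize = 4096; // common default; could also be read with `getconf PAGESIZE`
var fields = fs.readFileSync('/proc/self/statm', 'utf8').trim().split(/\s+/);
var residentMB = fields[1] * pageSize / 1024 / 1024;
var sharedMB   = fields[2] * pageSize / 1024 / 1024;

console.log('resident:', residentMB.toFixed(1), 'MB');
console.log('shared  :', sharedMB.toFixed(1), 'MB');
console.log('private :', (residentMB - sharedMB).toFixed(1), 'MB');
console.log('v8 heap :', (process.memoryUsage().heapUsed / 1024 / 1024).toFixed(1), 'MB');
```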

By the way @heavyk, I think your idea of the process-loader would be of interest to runtime.js... :-)

@heavyk
Contributor Author

heavyk commented Sep 18, 2015

hmmm, runtimejs does look cool. honestly though, I would prefer having node as the runtime env, just simply for the ecosystem and the massive amount of modules which already run in it. that's why nodeOS looks so desirable.

I can assure you that the .text of the binary is not duplicated in memory for linux. I'll have to look, but I think those 10mb are actually the js bytecode of the core library. what nodeOS (I really hate capitalizing the first 'N' for some reason) could maybe do is create a slimmer "core" library -- actually a fully stripped down version of the core library, and then have all of those core libraries located in a globally accessible node_modules dir (eg. one for 'net', 'http', 'fs', etc.). that way, only the "core" library components used in each program would be loaded as needed. without modifications to your version of node, there would be no sharing of that memory. however, shared memory isn't hard at all and v8 could be extended to look in some sort of shared memory cache of required files... it potentially already exists, as I'm going to guess that chrome makes some effort to not duplicate memory when multiple tabs of the same website are open.
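
As a data point for how much of the core a given program really touches, Node of that era keeps an undocumented process.moduleLoadList array that records every built-in module and binding loaded so far:

```js
// process.moduleLoadList (internal, Node 0.x/4.x era) lists every built-in JS
// module and C++ binding pulled in so far -- a rough measure of what a
// stripped-down, per-program core would actually need
require('http');

process.moduleLoadList.forEach(function (entry) {
  if (/^NativeModule /.test(entry)) console.log(entry);
});
// prints NativeModule http, NativeModule net, NativeModule events, ...
```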

@piranna have you tried getting node 4.0.0 to run in nodeOS? I know your intention is to get the workers in, so I imagine you've at least attempted something.

@piranna
Member

piranna commented Sep 19, 2015

hmmm, runtimejs does look cool. honestly though, I would prefer having node as the runtime env, just simply for the ecosystem and the massive amount of modules which already run in it. that's why nodeOS looks so desirable.

That's why it's planned to have a Node.js compatibility layer in the future :-)

I can assure you that the .text of the binary is not duplicated in memory for linux.

Also when starting them with two different execve() calls?

I'll have to look, but I think those 10mb are actually the js bytecode of the core library. what nodeOS (I really hate capitalizing the first 'N' for some reason) could maybe do is create a slimmer "core" library -- actually a fully stripped down version of the core library, and then have all of those core libraries located in a globally accessible node_modules dir (eg. one for 'net', 'http', 'fs', etc.). that way, only the "core" library components used in each program would be loaded as needed.

Interesting... I've read before about people proposing to move all the built-in modules out of the Node.js core code and leave almost nothing but the require() function (that's pretty much what runtime.js would export :-P ), and in fact they are publishing them on npm, so maybe it's a doable thing... I don't believe those 10mb are all from the js bytecode, but doing this would certainly help :-)

without modifications to your version of node, there would be no sharing of that memory. however shared memory isn't hard at all and v8 could be extended to look in some sort of shared memory cache of required files...

People shouldn't, but they are able to patch modules after they are loaded (the require() cache is accessible to the apps), so sharing it can be dangerous; I would not take this path.
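
For reference, this is the kind of patching that makes a naively shared cache risky; a minimal sketch using the public require.cache API (the ./config module is a made-up local file):

```js
// any app can reach into require.cache and swap a loaded module's exports;
// if that cache were shared between processes, one app could affect all of them
require('./config');                            // hypothetical local module
var key = require.resolve('./config');          // the cache is keyed by resolved filename
require.cache[key].exports = { patched: true }; // overwrite what every later require() sees

console.log(require('./config'));               // -> { patched: true }
```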

I'm going to guess that chrome makes some effort to not duplicate memory when multiple tabs of the same website are open

Bad example, Chrome is a memory hog :-P Hope some day they fix their memory waste...

@piranna have you tried getting node 4.0.0 to run in nodeOS? I know your intention is to get the workers in, so I imagine you've at least attempted something.

It's on the to-do list, but I'm currently working on my bachelor thesis (about NodeOS itself) and don't want to modify the code until I get it finished. It's only a matter of upgrading the version of Node.js and checking if it works, so if you want to do it I will accept your pull-request :-)

@piranna
Member

piranna commented Sep 19, 2015

@heavyk, would you be able to open an issue on Node.js about your idea of moving the built-in modules to the /lib/node_modules folder? I think it could help others too :-) We could do it ourselves on our fork of Node.js (it's mostly the lib folder on their repo), but maintaining it would be a lot of work... :-/

@heavyk
Contributor Author

heavyk commented Sep 19, 2015

Also when starting them with two different execve() calls?

dunno for sure, but I would assume so. it's essentially just an mmap of the file contents. it should be marked as read-only and executable, and any other identical mappings should not be duplicated. nowadays, I think even if it's not read-only, it won't be copied until the memory is written to, just the same as fork()
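
One way to check this from inside Node on Linux (a rough sketch, /proc is Linux-only): the node binary's .text shows up in the process's memory map as r-xp, i.e. read-only and executable, which is what lets the kernel share those pages between processes:

```js
// /proc/self/maps lists every mapping of the current process; the lines that
// map the node binary with "r-xp" permissions are the shareable .text pages
var fs = require('fs');

fs.readFileSync('/proc/self/maps', 'utf8')
  .split('\n')
  .filter(function (line) { return line.indexOf(process.execPath) !== -1; })
  .forEach(function (line) { console.log(line); });
```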

People shouldn't, but they are able to patch modules after they are loaded (the require() cache is accessible to the apps), so sharing it can be dangerous; I would not take this path.

it would have to be a copy-on-write require cache shared globally, or some sort of system where the cache is based on the hash of the file contents. actually, I just realized -- this could potentially be prototyped in node. hmmm.
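
A rough way to prototype the hash-keyed part inside a single node process; Module.prototype._compile is a private API, so this is only an experiment, not something to ship:

```js
// dedupe compiled modules by the sha1 of their source instead of their path;
// two files with identical contents end up sharing one exports object
var crypto = require('crypto');
var Module = require('module');

var byHash = Object.create(null);
var originalCompile = Module.prototype._compile;

Module.prototype._compile = function (content, filename) {
  var hash = crypto.createHash('sha1').update(content).digest('hex');
  if (byHash[hash]) {
    this.exports = byHash[hash];   // reuse the already-evaluated module
    return;
  }
  var result = originalCompile.call(this, content, filename);
  byHash[hash] = this.exports;
  return result;
};
```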

It's only a matter of upgrading the version of Node.js and check if it works, so if you want to do it I will accept your pull-request :-)

sure I will, but I'm on a mac right now, so I will look into it "soon". however, I may want to invest my time in docker for the moment. we use coreos on our servers, so pushing out tiny nodeOS images would be a huge win for us.

@heavyk
Contributor Author

heavyk commented Sep 19, 2015

would you be able to open an issue on Node.js about your idea of moving the built-in modules to the /lib/node_modules folder?

this isn't necessary. all you have to do is overwrite this section of the gyp file with the path to your require function and the module loader. to test it, modify these lines in the configure script and override them (instead of appending). you'll need module.js and maybe a few others. after that, a script could easily be produced to generate the node_modules directories; it'd be little more than moving files around, and it could easily be added before the compile phase (after code checkout)
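
As an illustration of the "moving files around" step, a minimal sketch (the lib/ and out/node_modules paths are assumptions):

```js
// sketch: copy each built-in lib/<name>.js into node_modules/<name>/index.js
// with a tiny package.json, so it can be required like a regular package
var fs = require('fs');
var path = require('path');

var libDir = 'lib';                // the Node source tree's built-ins (assumption)
var outDir = 'out/node_modules';   // destination directory (assumption)

if (!fs.existsSync(outDir)) fs.mkdirSync(outDir);

fs.readdirSync(libDir).forEach(function (file) {
  if (path.extname(file) !== '.js') return;
  var name = path.basename(file, '.js');
  var pkgDir = path.join(outDir, name);
  if (!fs.existsSync(pkgDir)) fs.mkdirSync(pkgDir);
  fs.writeFileSync(path.join(pkgDir, 'index.js'),
                   fs.readFileSync(path.join(libDir, file)));
  fs.writeFileSync(path.join(pkgDir, 'package.json'),
                   JSON.stringify({ name: name, version: '0.0.0', main: 'index.js' }, null, 2));
});
```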

@piranna
Member

piranna commented Sep 19, 2015

dunno for sure, but I would assume so. it's just a mmap of the file contents essentially.

Hadn't thought about it from this point of view... It's probably true that it's already optimized... We would need to know where exactly that 10mb comes from.

it would have to be a copy-on-write require cache shared globally, or some sort of system where the cache is based on the hash of the file contents. actually, I just realized -- this could potentially be prototyped in node. hmmm.

Node.js modules are cached based on the file path. Maybe there could be a global cache, and Object.observe() or Proxy objects could later wrap it and take into account changes the applications make to the global objects... Another option would be to freeze the objects in the cache and throw an error if someone wants to change them; that would be easier and more secure by not allowing apps to change global modules (whoever wants to do that should create its own module and inherit from the global ones).
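
A rough sketch of the freeze option (Object.freeze is standard; the deepFreeze helper and applying it to a shared module are just illustrations):

```js
// freeze a module's exports so apps get an error (in strict mode) instead of
// silently patching a module that other programs would share
'use strict';

function deepFreeze(obj) {
  Object.freeze(obj);                 // freeze first so cycles don't recurse forever
  Object.getOwnPropertyNames(obj).forEach(function (name) {
    var value = obj[name];
    if (value && typeof value === 'object' && !Object.isFrozen(value)) deepFreeze(value);
  });
  return obj;
}

var shared = deepFreeze(require('os'));   // pretend 'os' lives in the global cache
try {
  shared.platform = function () { return 'hacked'; };
} catch (e) {
  console.log('blocked:', e.message);     // TypeError: cannot assign to read-only property
}
```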

I may want to invest my time in docker for the moment.

I'm not too much into Docker and the current images are outdated, but I have planned to update them before releasing v0.1.0.

we use coreos on our servers, so pushing out tiny nodeOS images would be a huge win for us.

Are you planning to use NodeOS in production? That would be cool! :-D Would you be able to tell me the name of your organization so I can add it to my bachelor thesis? :-)

this isn't necessary. all you have to do is overwrite this section of the gyp file with the path to your require function and the module loader. to test it, modify these lines in the configure script and override them (instead of appending). you'll need module.js and maybe a few others. after that, a script could easily be produced to generate the node_modules directories; it'd be little more than moving files around, and it could easily be added before the compile phase (after code checkout)

Interesting, especially the linked_library flag :-) It seems the work is almost done, and maybe it could be added as a configure option (--build-ins=[yes]/external/no).

@heavyk
Contributor Author

heavyk commented Sep 21, 2015

Are you planning to use NodeOS in production? That would be cool! :-D Would you be able to tell me the name of your organization so I can add it to my bachelor thesis? :-)

I certainly would like to :) we're named affinaty. we haven't technically launched yet; that's happening this week. after the launch I'll be revisiting the servers and testing out nodeos on docker all of the following week. currently, we're just using an ubuntu docker container with iojs inside of it. so, assuming that I'm able to get all native modules compiled and working, and it also gives a boost in performance (it should), we'll convert over to nodeos on docker. expect prs next week as I try to get it working. I'll update you as things develop, but it'd be cool to do some sort of collaboration in the future. I think the way you're taking this is the right direction.

Interesting, especially the linked_library flag :-) It seems the work is almost done, and maybe it could be added as a configure option (--build-ins=[yes]/external/no).

yeah I'll give what I wrote above a try when I test this out next week and update you on where that memory usage comes from. that option is really nice because that means I can put all of the core server code right into the executable.


side thought: for desktop, perhaps it doesn't make a whole lot of sense to do a lot of work getting a shell going; it might be accepted quicker if you were to just spawn webviews over xvfb like the ones found here: https://github.com/kapouer/node-webkitgtk

@piranna
Member

piranna commented Sep 21, 2015

we're named affinaty

That Affinaty?! Are you working here in Madrid, Spain, only 30 minutes from my home?!? That's definitely a signal!!! :-P If you are interested we can talk in private, my email is in my profile :-)

assuming that I'm able to get all native modules compiled and working

It's an unmodified Linux kernel and Node.js, so it should, maybe with minor fix-ups on the modules. As I told you, it's currently using Node.js v0.11.14 due to a problem with the v8 version in upcoming releases. Yesterday I tried to upgrade it to v4.1.0, but it requires gcc 4.8 or greater (we are using 4.7.3), and I wasn't able to make any musl-patched version work... :-( It needs more testing, but at least it helped me to review and do some clean-ups on the build mechanism :-)

and it also gives a boost in performance (it should)

One of the assignments from my thesis tutor is to do some benchmarks against regular Node.js on Ubuntu, so let's see what happens when I do them... ;-)

that option is really nice because that means I can put all of the core server code right into the executable

Do you know if that's only for pure-Javascript modules, or can it be used with compiled ones too? Do they follow recursive require()s, or do you need to add each one of them manually? I would be interested in adding kexec...

side thought: for desktop, perhaps it doesn't make a whole lot of sense to do a lot of work getting a shell going; it might be accepted quicker if you were to just spawn webviews over xvfb like the ones found here: https://github.com/kapouer/node-webkitgtk

Interesting option, but as discussed in other topics, the graphical interface will be done in html5 with a client-server architecture, so as a quick solution until a native web renderer is available, any web browser can be used to connect remotely to a NodeOS server instance.

@heavyk
Contributor Author

heavyk commented Sep 22, 2015

That Affinaty?! Are you working here in Madrid, Spain, only 30 minutes from my home?!

yes, yes :) that's the one... I'm kenny :) ok, I'll send you an email this afternoon. right now I'm located in torrelodones, and work 98% from home. occasionally I go down to madrid (when I need to). however, my plan for expanding the team is actually to do mostly remote work (because I'm not a huge fan of large offices, like when I worked for tuenti) and instead encourage the team to organize their own gatherings as often as desired (cause sometimes programming in the same space is much more effective), so it doesn't really matter where you live. but yeah, being in close geographic proximity we could get together and work on things a bit easier.

Do you know if that's only for pure-Javascript modules, or can it be used with compiled ones too?

pure js modules will be compiled using the js2c command-line tool, which converts them into something v8 understands (I think it's essentially inlining the source into the binary -- nothing special -- but I have to look). for compiled modules, you just add each of them to the source and require them as usual. an example is the natives module, which is the interface used by lib to access most of libuv.
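
For reference, in Node of that era you can peek at what js2c embedded without rebuilding anything, via the internal natives binding (undocumented, not guaranteed stable):

```js
// process.binding('natives') maps each built-in module name to the JS source
// string that js2c baked into the binary (internal API, may change or go away)
var natives = process.binding('natives');

console.log(Object.keys(natives));                        // 'http', 'net', 'fs', ...
console.log('http source is', natives.http.length, 'characters');
```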

Do they follow recursive require()s, or do you need to add each one of them manually?

I will need to look, but I think that they do not have deep nesting, meaning that if you have two modules which share a dep, they will both need to use the same version. I also don't know if having a module in the node_modules folder with the same name can override a core module. these need to be tested.
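
A quick way to probe the override question from a test project (purely diagnostic; it doesn't change resolution, it just reports it):

```js
// require.resolve() shows what require() would actually load: core modules
// resolve to their bare name, userland ones to a path inside node_modules
console.log(require.resolve('http'));
// -> 'http' if the built-in wins, or
// -> '/path/to/project/node_modules/http/index.js' if a local package is picked up
```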

@piranna
Member

piranna commented Sep 22, 2015

I'll send you an email this afternoon

Cool! I'll wait for it :-) It's not good to derail this topic any further :-P

I also don't know if having a module in the node_modules folder with the same name can override a core module

I believe that's the behaviour, that the module in the node_modules folder has precedence; that's one of the reasons why people are packaging the core modules, to be able to access new functionality with old versions of Node.js.

@piranna
Member

piranna commented Oct 10, 2015

On Node.js they are thinking about [including a list of the built-ins currently included in the binary](https://github.com/nodejs/node/issues/3307), and there are also modules that show that list. I think this can be a good guide to start moving all the built-in modules out to an external node_modules folder :-)

@mitsukaki
Contributor

bumping this thread because I'm not sure of its progress.
