Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementation of the NodeJS URL class #40

Closed
wants to merge 9 commits into from

Conversation

etiennemartin
Copy link
Contributor

@etiennemartin etiennemartin commented Nov 25, 2022

We've made extensive use of the goja runtime without our projects. One of the missing pieces was the ability to use NodeJS's URL functionality. We've decided to create this PR and give back the implementation we wrote for others to use. We hope this helps others seeking for URL support with goja.

While the implementation is very close to the NodeJS documentation, we did leave out a few corner cases where the behaviour might differ.

Limitations:

  • Doesn't support searchParams property (yet, coming soon)
  • Parsing ports will not support any string with non-alphanumeric values
  • Some corner cases with the constructor aren't supported:
myURL = new URL('foo://Example.com/', 'https://example.org/');
// foo://Example.com/

myURL = new URL('https:Example.com/', 'https://example.org/');
// https://example.org/Example.com/

The unit tests written in Javascript are a port of all the examples given in the NodeJS documentation: https://nodejs.org/api/url.html

@etiennemartin etiennemartin marked this pull request as draft November 25, 2022 20:55
@etiennemartin
Copy link
Contributor Author

I've added support for searchParams property on the the URL object as specified in https://nodejs.org/api/url.html. The only aspect missing is the global constructors of the Class URLSearchParams. The implementation is tied to the URL class for the time being.

I think the current implementation has enough for a starting point in the current state. We can always evolve this implementation if the need arises.

@etiennemartin etiennemartin marked this pull request as ready for review November 27, 2022 05:13
@etiennemartin etiennemartin mentioned this pull request Nov 29, 2022
@dop251
Copy link
Owner

dop251 commented Nov 30, 2022

Thanks for submitting! I haven't had time to look into the details, but the overall implementation framework needs some adjustments. In particular, creating the prototype for each instance is not the right way, both from performance and correctness point of view (for example, instanceof won't work property).

Overall it should look something like this:

package url

import (
	"net/url"
	"reflect"

	"github.com/dop251/goja"
	"github.com/dop251/goja_nodejs/require"
)

const ModuleName = "node:URL"

var reflectTypeURL = reflect.TypeOf((*url.URL)(nil))

func toURL(r *goja.Runtime, v goja.Value) *url.URL {
	if v.ExportType() == reflectTypeURL {
		if u := v.Export().(*url.URL); u != nil {
			return u
		}
	}
	panic(r.NewTypeError("Expected URL"))
}

func defineURLAccessorProp(r *goja.Runtime, p *goja.Object, name string, getter func(*url.URL) interface{}, setter func(*url.URL, goja.Value)) {
	var getterVal, setterVal goja.Value
	if getter != nil {
		getterVal = r.ToValue(func(call goja.FunctionCall) goja.Value {
			return r.ToValue(getter(toURL(r, call.This)))
		})
	}
	if setter != nil {
		setterVal = r.ToValue(func(call goja.FunctionCall) goja.Value {
			setter(toURL(r, call.This), call.Argument(0))
			return goja.Undefined()
		})
	}
	p.DefineAccessorProperty(name, getterVal, setterVal, goja.FLAG_FALSE, goja.FLAG_TRUE)
}

func createURL(r *goja.Runtime) goja.Value {
	proto := r.NewObject()

	defineURLAccessorProp(r, proto, "host", func(u *url.URL) interface{} {
		return u.Host
	}, func(u *url.URL, arg goja.Value) {
		u.Host = arg.String() // TODO: validate and ignore if not a valid host
	})
	// TODO add the remaining properties

	f := r.ToValue(func(call goja.ConstructorCall) *goja.Object {
		var u *url.URL

		u, _ = url.Parse(call.Argument(0).String())
		// TODO: implement proper constructor logic

		res := r.ToValue(u).(*goja.Object)
		res.SetPrototype(call.This.Prototype())
		return res
	}).(*goja.Object)

	f.Set("prototype", proto)
	return f
}

func Require(runtime *goja.Runtime, module *goja.Object) {
	module.Set("exports", createURL(runtime))
}

func Enable(runtime *goja.Runtime) {
	runtime.Set("URL", require.Require(runtime, ModuleName))
}

func init() {
	require.RegisterNativeModule(ModuleName, Require)
}

@etiennemartin
Copy link
Contributor Author

Thanks for the feedback! I'll post an update to the PR with the changes you're suggesting. Can you elaborate a bit on the performance aspect of the change? I'm asking because I would love to avoid performance issues in the rest of our code base. Is there something specific I should be looking for.

Again thanks for the feedback, it's very much appreciated. 🙌🏻

@dop251
Copy link
Owner

dop251 commented Dec 2, 2022

Nothing specific, it's just that allocating and populating a prototype object for each instance obviously takes its toll on performance.

@etiennemartin
Copy link
Contributor Author

@dop251 I've updated the code to reflect the changes you put in your example.

I've removed the searchParams part of the implementation for now. I'll revisit it later since I'm not 100% sure how we can share data between two objects. Previously I had a shared wrapper around the url itself which is no longer the case. So editing the searchParams should update the URL and vice versa. I need to figure out how that's going to work before re-introducing it.

Did you see any issues with keep an external reference to the url.URL like I had previously? Or is there a way to update the parent relationship between the searchParam and the instance of URL.

@dop251
Copy link
Owner

dop251 commented Dec 5, 2022

I would probably have a wrapper struct around url.URL, as such:

type urlWrapper struct {
	u *url.URL
	query url.Values
}

And have another type for the query:

type urlQuery urlWrapper

Then you could define the getter like this:

	defineURLAccessorProp(r, proto, "searchParams", func(u *urlWrapper) interface{} {
		res := r.ToValue((*urlQuery)(u)).(*goja.Object)
		res.SetPrototype(searchParamsProto)
		return res
	}, nil)

Then you could have defineURLParamsAccessorProp and define all getters and setters on searchParamsProto a similar way.

The urlWrapper.query should be set lazily the first time it's accessed. This prevents calling url.Query() each time because it does the parsing. If URLSearchParams is used as a standalone instance, urlWrapper.u will be nil.

BTW, I didn't look closely, the 'node:url' module does not export the URL class as its root, rather:

> require("node:url")
{
  Url: [Function: Url],
  parse: [Function: urlParse],
  resolve: [Function: urlResolve],
  resolveObject: [Function: urlResolveObject],
  format: [Function: urlFormat],
  URL: [class URL],
  URLSearchParams: [class URLSearchParams],
  domainToASCII: [Function: domainToASCII],
  domainToUnicode: [Function: domainToUnicode],
  pathToFileURL: [Function: pathToFileURL],
  fileURLToPath: [Function: fileURLToPath],
  urlToHttpOptions: [Function: urlToHttpOptions]
}

@etiennemartin
Copy link
Contributor Author

BTW, I didn't look closely, the 'node:url' module does not export the URL class as its root, rather:

I was thinking about doing another PR for the searchParams. If we address the issue with the root would you be ok merging this PR in? I'm asking because we would love to reference the URL implementation from this repo instead of custom one we have right now.

BTW, Most of this work is helping power a load testing tool (DDoS) we use internally here at Shopify. We appreciate the work you've put into goja. This tool helps us scale test our systems.

@etiennemartin
Copy link
Contributor Author

I've renamed the method that creates the constructor and the prototype so the Enable call works as expected. I'm confused how we can model the exports of the module to represent the structure you posted:

> require("node:url")
{
  Url: [Function: Url],
  parse: [Function: urlParse],
  resolve: [Function: urlResolve],
  resolveObject: [Function: urlResolveObject],
  format: [Function: urlFormat],
  URL: [class URL],
  URLSearchParams: [class URLSearchParams],
  domainToASCII: [Function: domainToASCII],
  domainToUnicode: [Function: domainToUnicode],
  pathToFileURL: [Function: pathToFileURL],
  fileURLToPath: [Function: fileURLToPath],
  urlToHttpOptions: [Function: urlToHttpOptions]
}

Since classes are a concept of ECMA 6, should the exports of the module simply be an empty object? Like follows:

func Require(runtime *goja.Runtime, module *goja.Object) {
	module.Set("exports", runtime.NewObject()) // Empty object
}

This would allow the URL functionality to be enabled via the Enable call, and allow us to expand on the URL exports in the future. (i.e. domainToASCII, domainToUnicode, etc...)

url/module.go Outdated
}

func Require(runtime *goja.Runtime, module *goja.Object) {
module.Set("exports", runtime.NewObject()) // Empty object
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
module.Set("exports", runtime.NewObject()) // Empty object
exports := module.Get("exports").(*goja.Object)
exports.Set("URL", createURLConstructor(runtime))

url/module.go Outdated
}

func Enable(runtime *goja.Runtime) {
runtime.Set("URL", createURLConstructor(runtime))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
runtime.Set("URL", createURLConstructor(runtime))
runtime.Set("URL", require.Require(runtime, ModuleName).ToObject(runtime).Get("URL"))

@etiennemartin
Copy link
Contributor Author

Added the recommended changes. Let me know what you think.

url/module.go Outdated
Comment on lines 126 to 131
f, _ := strconv.ParseFloat(arg.String(), 64)
max := ^uint16(0) // 65535
if f > float64(max) {
f = float64(max)
}
u.Host = u.Hostname() + ":" + fmt.Sprintf("%d", int(f))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f, _ := strconv.ParseFloat(arg.String(), 64)
max := ^uint16(0) // 65535
if f > float64(max) {
f = float64(max)
}
u.Host = u.Hostname() + ":" + fmt.Sprintf("%d", int(f))
if arg.String() == "" {
u.Host = u.Hostname() // clear port
} else if num := arg.ToInteger(); num >= 0 && num <= math.MaxUint16 {
u.Host = u.Hostname() + ":" + fmt.Sprintf("%d", num)
}

throw new Error(`Failed setting port. got: ${myURL.href}, expected: https://example.org:1234`)
}
myURL.port = 123456789;
if (myURL.href != "https://example.org:65535") {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, maybe this is different in other versions of node, but in v16, this is different:

Welcome to Node.js v16.11.1.
Type ".help" for more information.
> myURL = new URL('https://example.org/abc/xyz?123');
URL {
  href: 'https://example.org/abc/xyz?123',
  origin: 'https://example.org',
  protocol: 'https:',
  username: '',
  password: '',
  host: 'example.org',
  hostname: 'example.org',
  port: '',
  pathname: '/abc/xyz',
  search: '?123',
  searchParams: URLSearchParams { '123' => '' },
  hash: ''
}
> myURL.port = 123456789;
123456789
> myURL
URL {
  href: 'https://example.org/abc/xyz?123',
  origin: 'https://example.org',
  protocol: 'https:',
  username: '',
  password: '',
  host: 'example.org',
  hostname: 'example.org',
  port: '',
  pathname: '/abc/xyz',
  search: '?123',
  searchParams: URLSearchParams { '123' => '' },
  hash: ''
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow ok, that's a bit odd, I guess they ignore invalid cases. Makes sense since this the current behaviour would set something you didn't expect. I'll correct that.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually documented here: https://nodejs.org/api/url.html#urlport
Even with my fix it's still not quite right (the default ports for protocol are not handled, neither are strings beginning with numbers)...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright good call. I can get that worked in here. I'll post an update soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed another commit that addresses the differences. I've also ported over all the examples from the docs into the unit tests. So it should have the same behaviour now.

@etiennemartin
Copy link
Contributor Author

Nice catch on the port. I've updated the code to reflect your changes and updated the unit tests as well.

@dop251
Copy link
Owner

dop251 commented Dec 10, 2022

I'm making some changes to this PR, and I have a question: did you try running test/url_test.js in nodejs? If so, what version?

@dop251
Copy link
Owner

dop251 commented Dec 11, 2022

Ok, I've made some changes and merged it in #43. It's quite tricky to get full compatibility because escaping rules are slightly different. It will be even more tricky for the query parameters (I believe to get full compatibility it would require a custom parser).

@dop251 dop251 closed this Dec 11, 2022
@etiennemartin
Copy link
Contributor Author

I took a look at the final changes you made. Looks good. Some really good clean up and code structure in there. Originally I wasn't sure how close we wanted the implementation to be. I can see now we want to be exact. I'll try and take a stab at the search parameters if that's ok.

Thanks again for the advice and merging this in! 🙌🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants