Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proxy like abstraction for an attrset #8187

Open
lucasew opened this issue Apr 8, 2023 · 19 comments
Open

Proxy like abstraction for an attrset #8187

lucasew opened this issue Apr 8, 2023 · 19 comments
Labels
feature Feature request or proposal language The Nix expression language; parser, interpreter, primops, evaluation, etc

Comments

@lucasew
Copy link

lucasew commented Apr 8, 2023

Is your feature request related to a problem? Please describe.
Sometimes we pay a big cost evaluating stuff just to use a small set of what is available. A feature like this could reduce the impact for smaller evaluations.

Describe the solution you'd like
In the JavaScript world, there is a tool called proxy. Basically it allows you to create an object (like our attrset) that looks like a static object but any access or operation is handled by a function instead of a lookup.

Actually, the idea came from RFC 140. With this system, a lookup could be trivially implemented by a slice and sum of strings. A builtins.attrNames could be very expensive, though.

A clear and concise description of what you want to happen.
A builtin function to create a proxy attrset. Example:

proxy = builtins.mkProxy {
  lookup = key: "Hello, ${key}"; # proxy.world would return "Hello, world"
  list = []; # what builtins.attrNames would return, this is the default value
}

The keys could be memoized, like a evaluation cache. I have no idea actually about the difference in number of evaluation thunks for something like nixpkgs though.

Describe alternatives you've considered
(pkgs "package").outPath maybe

Additional context
Add any other context or screenshots about the feature request here.

Priorities

Add 👍 to issues you find important.

@lucasew lucasew added the feature Feature request or proposal label Apr 8, 2023
@roberth roberth added the language The Nix expression language; parser, interpreter, primops, evaluation, etc label Apr 9, 2023
@roberth
Copy link
Member

roberth commented Apr 9, 2023

This is a potential solution for

The keys could be memoized, like a evaluation cache.

Not memoizing could be a feature, as it allows the result to be garbage collected, potentially.
Memoization can be implemented as a separate feature

mkDynamicAttrs = builtins.mkProxy {
  lookup = builtins.memoise key: "Hello, ${key}";
}

I have no idea actually about the difference in number of evaluation thunks for something like nixpkgs though.

The difference would be asymptotic, as many packages use the same dependency. Upper bound may be exponential in the depth of transitive dependencies.

Describe alternatives you've considered
(pkgs "package").outPath maybe

Many accesses are through callPackage, which could take advantage of a plain function, but NixOS still needs pkgs to be an attrset.

list = []; # what builtins.attrNames would return, this is the default value

Supporting attribute names that are not in the list is of course not so nice. I think we might have a need for generic functions to distinguish between two classes of proxy attrsets: those that are properly enumerable (but may be expensive) and those that are not. Perhaps when list = null; then a new builtins.isEnumerable could return false for that proxy value, and e.g. Nixpkgs lib could act accordingly.

I think multiple dispatch in tAttrs should not be a concern, because we can exploit the same mechanism for other features that improve performance

It also opens the door to other features such as

@tomberek
Copy link
Contributor

A portion of this is implemented in master...flox:nix:dynamic_select I’ve got a continuation that implements this with a finite and infinite setting.

Called it a “getter” rather than a proxy, but I don’t have a strong opinion on the name.

{
__getter = key: “hi ${key}”;
__attrNames = [ “Alice” “Bob” ];
}

As @roberth points out, this in combination with a memoize primitive can be quite powerful. I had used the double-underscore pattern as this seems similar to the functor and rec-overrides usage. Another approach could be to make this construct memoize by-default and then it can be used to create a general memoizer out of it.

@roberth
Copy link
Member

roberth commented Apr 10, 2023

If implemented as a primop that returns a sibling of tAttrs, we don't have to worry about underscores.

Another approach could be to make this construct memoize by-default and then it can be used to create a general memoizer out of it.

This would restrict the memoizer to strings only, and disallow string context. This approach reduces the potential of the memoizer and the proxy (by doing too much), so I'd rather have the separation of concerns.

Regarding naming, lookup corresponds to ExprSelect in the AST. Perhaps this is a good opportunity to rename select to lookup everywhere? (For instance, inherit (foo) bar baz; arguably selects attributes, but is not a single lookup; in other words selection is close to projection (as in math), whereas lookup suggest only one key.)

@tomberek
Copy link
Contributor

If implemented as a primop that returns a sibling of tAttrs, we don't have to worry about underscores.

So we'd have normal attrSets, recursive attrSets, dynamic attrSets. Very similar, but perhaps sometimes unclear which one you are working with. ie: given an attrSet, which type is it?

@roberth
Copy link
Member

roberth commented Apr 10, 2023

I don't think recursive is quite the right name for #4154, because we already have rec { }, which doesn't need a special representation. Dynamic attribute sets could be confused with dynamic attributes, { ${foo} = bar; }.

Let's call them

perhaps sometimes unclear which one you are working with

You don't have to care about this when they have the same semantics. In order of increasing scariness:

Caching attrsets should have no reason to behave differently.

If we have lazily merged attrsets, either

  • they're implemented such that they are the default behavior, making strict spine attrsets a performance optimization for some primops, or
  • you know when you have one when you've built it yourself.

Proxy attrsets can break the invariant that x?${a} == (find a (attrNames x) != null), but otherwise they behave normally. This is my I think we should throw or return null instead of returning [], to avoid confusion.

Although proxies are the scariest, I think they're more powerful and useful than lazily merged ones. If I had to pick one of the two, I'd go with proxies.

@gytis-ivaskevicius
Copy link
Contributor

This would be really nice for nix2data (terranix, nix2vim, etc) projects 👍

@roberth
Copy link
Member

roberth commented Apr 26, 2023

This would be really nice for nix2data (terranix, nix2vim, etc) projects +1

I think language bindings or an IPC protocol for evaluation would be more helpful for such applications, especially terranix. The proxy type discussed here does not interact with the outside world. Even the adjacent use cases for the implementation work that I've mentioned don't interact much with it.

@roberth
Copy link
Member

roberth commented May 6, 2023

This feature would allow vtable-based method dispatch to be implemented transparently; a memory optimization that allows a different programming style.
Currently, "methods" must be bound early and stored in attributes that each have a per instance overhead. A proxy attrset could bind the methods late, so that they don't all have to be stored as individual closures.

let
new = class: data: mkProxy {
  hasAttr = k: class.methods?${k} || data?${k};
  getAttr = k: if class.methods?${k}
    then class.methods.${k} data 
    else data.${k};
}

class can also contain 0-arity members such as a constant _type.

Whether all of this is desirable, I don't know. The possible downsides don't seem so bad.

  • Unidiomatic Nix code style
    • Not that I like the current style. The incentive towards exposing everything makes things messy.
  • Harder to optimize in the evaluator than a native, fixed-functionality class system.
    • Prototyping outside the evaluator is good though; a necessary first step?

@roberth
Copy link
Member

roberth commented May 31, 2023

Another use case is functions like

Evaluating a cleaned up tree of packages based on recurseForDerivations is expensive, because the leaves have to be evaluated. However, by implementing such cleanup functions and transformations, we can optimize the use cases individually.

  • Getting a single package. Instead of having to construct the listing, we can check the relevant attributes only, and be about as strict as a regular lookup.
  • Checking whether an attribute exists: very similar to getting a single package.
  • Listing all the packages. This remains a bit slow, but could be further optimized by allowing a prefix tree of attribute names to be returned. That would keep the command line autocompletion quick. The prefix tree method of the proxy attrset would only be used by autocompletion as far as I can tell, but is worth the extra performance. Most evaluations won't have to list the packages, as they only depend on specific attributes, not all attrNames. Proxy attrsets make that distinction possible.

A simple // would defeat these optimizations though, so it would make sense to implement a special thunk or evaluation mode that would also achieve the goal of

@tomberek
Copy link
Contributor

Another fun possibility (though probably not an ideal or recommended one) is an alternative to NixOS/rfcs#148 using the more familiar method chaining syntax.

@Atry
Copy link
Contributor

Atry commented Apr 3, 2024

Given that a functor is an attrset used as a function, then a proxy should be to a function used as an attrset. Therefore I think the follow syntax would make sense:

myProxy = __proxy:
  if __proxy == "attrNames" then [ "name1" "name2" ]
  else if __proxy == "getAttr" then (name: "Hello, ${name}")
  else if __proxy == "hasAttr" then (name: name == "name1" || name == "name2")
  else throw "Unsupported proxy operator"

The argument name __proxy is a magic to denote the function is a proxy. attrNames, getAttr and hasAttr are names stolen from existing built-in functions.

This implementation can be easy and effient because:

  • it does not introduce a new data type to the language
  • it does not affect the runtime performance in happy path

@Atry
Copy link
Contributor

Atry commented Apr 3, 2024

A nix library (maybe in nixpkgs-lib?) can implement a proxy from another attrset as easy as:

mkProxy = vtable: __proxy: vtable.${__proxy}

@Atry
Copy link
Contributor

Atry commented Apr 3, 2024

@roberth Do you think if it would help if I turn it into an RFC?

@Atry
Copy link
Contributor

Atry commented Apr 3, 2024

lazyUpdate can be implemented as following:

lazyUpdate = first: second: __proxy:
  if __proxy == "attrNames" then
    lib.lists.unique ((builtins.attrNames first) ++ (builtins.attrNames second))
  else if __proxy == "getAttr" then
    second.${name} or first.${name}
  else if __proxy == "hasAttr" then
    second ? name || first ? name
  else
    throw "Unsupported proxy operator"

@roberth
Copy link
Member

roberth commented Apr 4, 2024

This implementation can be easy and effient because:

* it does not introduce a new data type to the language

* it does not affect the runtime performance in happy path

@Atry Do I understand correctly that you're proposing not to change the language at all, or to put it in a catch block?

I'm a bit concerned about the use of bare functions

  • The explicit intent is lost, making error messages worse.
  • isAttrs, isFunction and typeOf won't behave transparently.

Considering lucasew's motivating use case, a goal should be to make it as transparent as possible, so that a well-behaved proxy can be used in place of an equivalent attrset safely.

I believe an extra conditional jump is worth the cost, especially if we'll have other uses for it, such as attrsets that are backed by an evaluation cache. If we don't want to touch the hot path, we could probably still implement this in catch blocks, even if it's a new internalType.

@Atry
Copy link
Contributor

Atry commented Apr 4, 2024

Do I understand correctly that you're proposing not to change the language at all, or to put it in a catch block?

I was proposing to change the language but only in the "unhappy path", for example here.

I understand your concern and I agree they are valid points when desigining a new language.

Here I more value precedents and backward compatibility than transparent behavior.

nix-repl> builtins.isFunction { __functor = self: x: x; }
false

@Atry
Copy link
Contributor

Atry commented Apr 4, 2024

such as attrsets that are backed by an evaluation cache.

I think as a pure functional language, evaluation cache can be decoupled from the internal data type. The cache is not neccesarily backed by an attrset-like data structure in memory. I wonder if it is possible to memoize functions in SQLite.

@roberth
Copy link
Member

roberth commented Apr 4, 2024

Here I more value precedents and backward compatibility than transparent behavior.

nix-repl> builtins.isFunction { __functor = self: x: x; }
false

This is why lib.isFunction exists, and it is a drag on all code that may or may not need it. I do not want to repeat this mistake.

I think as a pure functional language, evaluation cache can be decoupled from the internal data type.

Possibly. It could also be done with a new thunk type, in which case a.k requires the retrieval of attrNames a from the cache, which is not optimal, but not terrible.

I wonder if it is possible to memoize functions in SQLite.

It is, with any key value store, but it is far from trivial.

@2xsaiko
Copy link

2xsaiko commented Apr 4, 2024

I don't think attaching special meaning to parameter binding names is a good idea since that's really awkward and also not done anywhere else in the language. This is also inherently a backwards-incompatible change anyway since the '.' operator needs to be extended to work on these new dynamic sets.

If anything, I would use the __functor-like style of "attrset with special functions":

{
  # all 5 of these need to be set for this to act like a dynamic set
  type = "dynamic";
  # `self' receives the containing set with the `type' attribute removed so that
  # set operators on `self' work normally here
  __attrNames = self: builtins.attrNames self.data;
  __hasAttr = self: name: self.data ? name;
  __getAttr = self: name: self.data.${name};
  __update = self: other: { type = "dynamic"; inherit (self) __attrNames __getAttr __update; data = self.data // other; };

  data = {a = 1; b = 2;};
}

But I would also much rather have a function like builtins.mkProxy since that's a lot less awkward to use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Feature request or proposal language The Nix expression language; parser, interpreter, primops, evaluation, etc
Projects
None yet
Development

No branches or pull requests

6 participants