Support for parameterized methods #2

BiggerNoise · 2013-11-18T20:01:28Z

Hello.

I just finished publishing https://github.com/KoanHealth/forget-me-not. While tipping the hat to @sferik, he pointed me to this gem. I agree with him that the world doesn't need these ideas developed in two places, so I'd like to offer to help in any way that I can.

Two things that I incorporated into forget-me-not are arguments and caching.

Arguments are pretty straightforward.

Caching is intended to be more of a system wide memoization. We started from @sferik's code to enable some classes that chew on aggregate medical claims data to share the results of their initial queries.

My big question is can these two ideas find a home in this gem?

mbj · 2013-11-18T20:10:07Z

@BiggerNoise I do not like the idea of memoizing the result of a send with arguments - at all.

IMHO it is an antipattern that should be avoided at all costs.

@dkubb There was a very verbose discussion we had in a comment thread, I cant find it. You?

snusnu · 2013-11-18T20:12:42Z

@mbj while i don't see any reason why it would be an antipattern, i agree that if anything, it's "dangerous", and should be used with care. I probably wouldn't mind supporting it, even tho i'd use it rarely. That being said, I would use it in certain places, with great care.

snusnu · 2013-11-18T20:15:37Z

To elaborate a bit, the usecases I can think of involve having a somewhat low tech cache, when adding some "proper cache" could be counted as overengineering. I agree that there are probably always ways around this, tho.

snusnu · 2013-11-18T20:16:39Z

@mbj would you "feel better about it" if the cache was external to the object?

mbj · 2013-11-18T20:16:56Z

@snusnu Isnt support for something dangerous in the public API an antipattern?

mbj · 2013-11-18T20:20:00Z

@snusnu

Mhh:

receiver.selector(argument)

can be generically externalized as:

Cache.new(receiver, selector, argument).call

Around that API some sugar could be added:

cache = reciever.cache(:selector)
cache.call(argument)

In this way the cache is externalized and I'd support it.

snusnu · 2013-11-18T20:20:08Z

@mbj actually, imo, no. Although I do agree that public API should make the fact that there might be dragons, very obvious, ugly even.

BiggerNoise · 2013-11-18T20:20:19Z

@mbj - I'd really be interested in reading the comment thread you mention. I am having a hard time grasping why this would be considered an anti-pattern or even dangerous.

snusnu · 2013-11-18T20:21:49Z

@BiggerNoise I'd like to see the thread too, although I vaguely remember it. I guess by dangerous, we mean that using something like this might blow up memory, if the workings of client code aren't well thought of.

misfo · 2013-11-18T20:26:30Z

It was discussed in this PR to Adamantium:
dkubb/adamantium#21

mbj · 2013-11-18T20:28:46Z

I'd support this in case we can find a clean use case where caching nonzero arity methods is cleaner than using an external wrapper.

snusnu · 2013-11-18T20:33:40Z

@mbj I wonder if that's the point tho. If the cache is external to the object, the feature can be thought of as simply being a convenience DSL.

BiggerNoise · 2013-11-18T20:35:38Z

OK. That's definitely a reasonable caution. Not sure I would call it an anti-pattern, but could definitely be misused in ugly ways.

Our original use cases generally involved a very small set of arguments. Something like figuring out financial details for different kinds of medical visit types (e.g., Emergency, Inpatient, etc.). The argument sets were generally known and/or enumerable. The logic only varied by the visit type.

These methods were on a service object and we would create one per request. The raw SQL results were cached, but the 'chewed' results were merely memoized.

mbj · 2013-11-18T20:39:23Z

@snusnu If the object does not have a reference to the cache you have a small grained cache invalidation. Its just as composition over inheritance. Adding more complexity to an entity or combine two entities of low complexity.

snusnu · 2013-11-18T20:39:32Z

@BiggerNoise fwiw, I had similar usecases in mind. Caching a limited number of objects that need some building up. On the implementation side, one way to stop complexity from growing, would be to expose this feature with an explicit method (instead of making memoize do different things based on the arity)

snusnu · 2013-11-18T20:40:58Z

@mbj i'm not entirely sure i get it. If the implementation is done externalized, what harm does it do to provide extra DSL to make using that feature more comfortable?

All that said, I have no strong opinion ...

mbj · 2013-11-18T20:43:29Z

@snusnu Lets start with the least invasive way and reduce the public interface. I'd love to have an immutable method that only supports zero arity. So we could extend the memoize one in need. But I still thing the added complexity is not needed at all.

@dkubb We planned to alias memoize to immutable ?

BiggerNoise · 2013-11-19T01:03:09Z

This is a pretty important use case for me, so I suppose we'll just let the gems develop independently.

Although I feel that the larger issue is memoizing in any long lived object, I did update the documentation in my gem to reflect the additional hazard posed by argument based memoization. I appreciate the discussion here that helped flesh these thoughts out.

dkubb · 2013-11-19T04:48:17Z

@mbj I think we talked about aliasing memoize to idempotent. I'd still support that since it is technically correct.

@BiggerNoise I feel like there may be a solution somewhere, probably with a cross-impl weakref lib. At the very least we'd need something that is well supported and handles all the rubies defined in our travis config (which is nearly every ruby they provide). My biggest concern is holding references to the original arguments passed into the method, thus causing a memory leak since they can't be GC'd.

One thing we have going for us is that I refactored the method building logic into a separate class. This class can handle memoizing the zero-arity case quite well; we could have a second class that handles the more complex case when the arity is non-zero. I'd really like to keep the zero-arity case as simple as possible though, since that's going to be my common case.

If anyone is interested in working on this I could open up a branch with some initial ideas and then grant you commit access to play around until we get a solution that meets our needs and is hard to use incorrectly.

BiggerNoise · 2013-11-19T05:12:25Z

My implementation does not hold the original arguments. Since we initially adapted the code to make a Cacheable class and we wanted the cache key to be a string, the original implementation produced a string key which included a portion based on to_s()ing the supplied arguments.

We have since replaced that part of the key with simply taking the hash() of the supplied arguments (whether there are any or not). Since we don't collapse the hash value into a limited number of buckets, the chances of a collision for any real set of arguments is pretty small.

misfo · 2013-11-19T07:54:57Z

My biggest concern is holding references to the original arguments passed into the method, thus causing a memory leak since they can't be GC'd.

We can avoid keeping references to the original arguments entirely and keep only the value of hashing each argument instead. See the PR I referenced above.

The return values need to be held onto, though, which could still cause problems if there wasn't any cache expiration logic added

mbj · 2013-11-19T09:49:06Z

I still think we should discuss this with a use case. One that demonstrates an external cache is the more complex solution.

BiggerNoise · 2013-11-19T15:14:46Z

I disagree that there needs to be any real difference between methods that take args vs those that don't. Perhaps a different call to caution the caller that there are potential memory issues hiding out, but the use cases as well as the underlying implementation are identical.

I outlined our specific use case above, but it essentially involves an Aggregator service object that chews on the results of cached database queries to implement methods like per_member_per_month_cost(visit_type). Visit type is an enumerated set of types (inpatient, outpatient, emergency, etc.). The logic in per_member_per_month_cost only differs by visit type.

If I had a different method total_per_member_per_month_cost(), it would be memoizable without args. I am trying, but failing, to discern the difference that makes memoizing one fine but the other a problem.

Memoizing in any long lived object is always going to pose problems for the naive user. I would propose that those issues, as well as the additional issues caused by extra memory consumption, are best handled with documentation; I've now got a pretty extensive caution section now in https://github.com/KoanHealth/forget-me-not.

sferik · 2013-11-29T19:47:53Z

I’m reopening this issue because I think memoizing methods with arguments is an important use case. One of the canonical examples of memoization (discussed on the Ruby Forum) is memozing a recursive Fibonacci function.

def fib(num)
  return num if num < 2
  fib(num - 1) + fib(num - 2)
end

A memoized version of this function will run significantly faster than the non-memoized equivalent. Granted, it uses more memory but this’s the essential tradeoff of memoization: memory in exchange for CPU cycles. If CPU cycles are the bottleneck, memoization may be a good choice; if memory is the bottleneck, re-computing the values may make more sense.

I don’t think anyone said definitively that we would not add this feature, so I’m going to leave this issue open for discussion until we reach a conclusion, one way or the other.

Personally, I’m okay with giving users power tools, even if it means they can shoot themselves in the foot. Ruby itself has many such features.

BiggerNoise · 2013-11-29T20:26:06Z

Thanks for re-opening.

There was another question in my original post that seems to have been swamped. I opened another issue for any discussion related to cacheable.

I'm not sure what I can add to the discussion on memoization, but I'll keep an eye on the discussion.

sferik · 2013-11-30T07:47:13Z

@BiggerNoise Thanks. I think it makes sense to discuss these two issues separately. As such, I’ve changed the title of this issue.

dkubb · 2013-11-30T20:06:41Z

I'm beginning to put together a succinct list of requirements I would have for this specific feature. This specific thread is more like a discussion, which tend to ramble a bit. Since one of us is (maybe) going to implement this I wanted a single unambiguous place to list my requirements for such a feature:

https://gist.github.com/dkubb/7723595

For each of these I can provide justification when it is not otherwise obvious. These constraints are ones that I've already defined for Adamantium and ROM gems, which both use (or will be using) memoizable under the hood.

asthasr · 2013-12-08T00:21:19Z

This gem was recently removed from one of my company's projects specifically because it doesn't meet the use case of memoizing methods with arguments. I'd really like to see this implemented (a simple hash of provided arguments allowing the creation of a lookup table, for example), because without it we've resorted to a bit of a homegrown solution that's not nearly as nice as just saying memoize method.

There are memory issues if the objects are long lived or the results huge, but most of the time things that I'm interested in memoizing (as opposed to caching) are neither. It's a method call that's used many times in a view or something similar, and whose results should be garbage collected when appropriate, along with the object. Using an external cache costs latency, even in the case of very fast caches; the memory penalty seems a small price to pay when I'm probably memoizing a few strings.

Even blocks should be hashable, and we're interested only in the {hash_of_args => result}, so I can see very few cases where a method without side effects would be unmemoizable.

asthasr · 2013-12-12T13:47:27Z

Upon doing more research, I found that Procs can't be memoized. 👎

mbj · 2013-12-12T13:56:29Z

@asthasr I'm generally against this feature. And I think procs cannot/shouldnot be memoized. But what is your actual reason for the statement: "Procs cannot be memoized". I'm interested ;)

asthasr · 2013-12-12T14:06:56Z

Hashes of identical procs aren't the same; there is no way to compare them without doing something terrible like source introspection.

asthasr · 2013-12-12T14:32:51Z

Regarding the argument of whether to memoize functions with arguments, please note that the original paper defining "memo functions" intended them to be used with arguments.

benissimo · 2013-12-12T14:34:56Z

When it dropped support for memoizing arguments, I stopped using this gem.

dkubb · 2013-12-12T17:18:49Z

@asthasr yeah, you discovered the reason behind that last item on my requirements list. I've implemented memoization like this at least a half dozen times in DataMapper and elsewhere, and the things that people are rediscovering through this process will very likely result in agreement on the list I pasted above.

One thing I'd like to highlight is that the memoization code in this gem was extracted from a gem that was written in 2009. Its gone through quite a lot of evolution. Memoization with arguments was attempted with almost all of the suggestions here and in other threads, but it was backed out due to various issues (like objects not being GC'd in long running processes). The gist above is my current best understanding of how we could actually implement this and not hit one of the land mines that have already been found.

However, feel free to continue with experimentation using other approaches. If you can find an approach that works, but does not require something like a weak reference, then I'd love to see it in the form of a gist or better yet a pull request.

As far as whether we should memoize functions with arguments, I'm not against it as long as we don't compromise correctness in the process. The memoization code is/will be used in quite a few gems and commercial applications that I maintain, and I will not introduce anything that compromises the correctness of the systems. I realize this is probably frustrating to people who want this feature now, but I'd rather hold off doing something rather than doing it half-assed.

BiggerNoise · 2013-12-12T17:37:16Z

@dkubb I am curious if you ever tried hashing the arguments rather than retaining them. If so, did you ever run into issues with hash collisions?

Given your last paragraph, I can understand the reluctance to use that approach as it cannot be proven to be correct. We chose that approach because it we felt that for any real set of arguments, it was extraordinarily unlikely to be incorrect.

BiggerNoise · 2013-12-12T17:46:06Z

@dkubb - I saw your comment on the gist and am replying there

asthasr · 2013-12-12T17:58:16Z

I think the hash approach is the best, with a warning in the documentation that any parameters must have a "real" hash method (i.e. if a == b then a.hash == b.hash) or some benefit will be lost. That way, GC issues are avoided, and it'll still work for most cases.

dkubb · 2013-12-12T17:59:15Z

@BiggerNoise it was attempted in dkubb/adamantium#21

Aside from the obvious hash collisions, there's also the case where #hash is not implemented properly. This is surprisingly common and one of the reasons I wrote equalizer.

In fact just two days ago I was actually debugging this exact issue in a production server. Someone had created a cache key without factoring in all the variables, so two different objects hashed to the same value. The cache was returning the original object's cached result instead of computing and caching results with the newer object. This wasn't with memoizable, but something home grown (I did not write it! ;)).

mbj · 2013-12-12T18:09:32Z

I'm still thinking: If someone would need to memoize return values of methods with arguments, he should just factor this out into a caching object.

receiver.selector(argument) => Cache.new(receiver, selector).call(argument). This way cache invalidation is not bound to the livecycle of receiver. And people can fail theirselves implementing Cache#call ;)

mbj · 2013-12-12T18:16:26Z

Here is the caching object from above, and pls lets stop discussion ;)

class Cache
  include Concord.new(:receiver, :selector)

  def call(*arguments)
    raise "Calls with blocks cannot be memoized" if block_given?

    do_cool_caching(arguments) do
      receiver.public_send(*arguments)
    end
  end

  def do_cool_caching(arguments)
     # fail here
  end
end

And now lets finish ROM in our limited OSS time!

BiggerNoise · 2013-12-12T19:21:08Z

@mbj - I don't think that an external cache adds much to this discussion. I don't know why you would memoize methods that didn't take args (transparent to a class's users) and use an external cache for methods that did (very visible to a class's users); the use case for memoizing with and without arguments is the same.

I'll grant that creating a provably correct implementation is a much tougher beast when you have arguments. Any implementation is probably going to have to choose some set of compromises; I am not sure that all of the bases can be covered by a single implementation.

Perhaps a pluggable policy is the best approach. By default, you get the no arguments approach which is maximally correct. Changing an option would allow you to sacrifice some correctness for additional functionality.

@dkubb - If an approach like that is something that you would consider, then I'll take a look at the code in this gem and make a recommendation on where I think it might fit. That way you guys can focus on ROM.

mbj · 2013-12-12T19:49:14Z

@BiggerNoise I think it does add value to the discussion. Because it allows people to decouple the cache live-cycle from the receiving objects live-cycle, instantiating a new cache object does also allow to explicitly invalidate the cache. Using that caching object could maybe mitigate this feature request to memoizable itself.

dkubb · 2013-12-12T20:02:38Z

@BiggerNoise I would support a plugin approach, probably with a Memoizable::MethodBuilder subclass. I could move the #assert_zero_arity line to #call, which would allow the #initialize method to be reused.

The memoize method would probably need to accept an options Hash, which is fine since it is the same interface in adamantium, which is what memoizable was extracted from. The options could specify which method builder class to use.

The plugin could be in your application or in a separate gem for now.

BiggerNoise · 2013-12-12T20:43:25Z

@mbj - If it is important to a developer to decouple the lifetimes of the receiver and the cache as well as take control over cache invalidation, then the argument count is a peripheral concern. What said developer really needs (and what you are describing) is an alternative to memoization.

@dkubb - Let me take a look at that, it will probably be this weekend.

mbj · 2015-08-21T21:24:39Z

@dkubb Lets close this issue.

IMO its an misfeature. And we can have a plugin for people who want it. It'll not be shipped from this repo.

BiggerNoise · 2015-08-21T21:58:47Z

I am not pursuing this any further, I have implemented the feature in my own gem.

BiggerNoise closed this as completed Nov 19, 2013

sferik reopened this Nov 29, 2013

BiggerNoise mentioned this issue Nov 29, 2013

Cacheable capability? #3

Closed

sferik mentioned this issue Dec 18, 2013

Add devtools as a development dependency #4

Closed

BiggerNoise closed this as completed Aug 21, 2015

backus mentioned this issue Nov 30, 2015

Use a leading underscore for memoization thoughtbot/guides#373

Merged

Support for parameterized methods #2

Support for parameterized methods #2

Comments

BiggerNoise commented Nov 18, 2013

mbj commented Nov 18, 2013

snusnu commented Nov 18, 2013

snusnu commented Nov 18, 2013

snusnu commented Nov 18, 2013

mbj commented Nov 18, 2013

mbj commented Nov 18, 2013

snusnu commented Nov 18, 2013

BiggerNoise commented Nov 18, 2013

snusnu commented Nov 18, 2013

misfo commented Nov 18, 2013

mbj commented Nov 18, 2013

snusnu commented Nov 18, 2013

BiggerNoise commented Nov 18, 2013

mbj commented Nov 18, 2013

snusnu commented Nov 18, 2013

snusnu commented Nov 18, 2013

mbj commented Nov 18, 2013

BiggerNoise commented Nov 19, 2013

dkubb commented Nov 19, 2013

BiggerNoise commented Nov 19, 2013

misfo commented Nov 19, 2013

mbj commented Nov 19, 2013

BiggerNoise commented Nov 19, 2013

sferik commented Nov 29, 2013

BiggerNoise commented Nov 29, 2013

sferik commented Nov 30, 2013

dkubb commented Nov 30, 2013

asthasr commented Dec 8, 2013

asthasr commented Dec 12, 2013

mbj commented Dec 12, 2013

asthasr commented Dec 12, 2013

asthasr commented Dec 12, 2013

benissimo commented Dec 12, 2013

dkubb commented Dec 12, 2013

BiggerNoise commented Dec 12, 2013

BiggerNoise commented Dec 12, 2013

asthasr commented Dec 12, 2013

dkubb commented Dec 12, 2013

mbj commented Dec 12, 2013

mbj commented Dec 12, 2013

BiggerNoise commented Dec 12, 2013

mbj commented Dec 12, 2013

dkubb commented Dec 12, 2013

BiggerNoise commented Dec 12, 2013

mbj commented Aug 21, 2015

BiggerNoise commented Aug 21, 2015