
Improve the case with splitting the big object graph #265

Closed
dadhi opened this issue Apr 14, 2020 · 8 comments · Fixed by #268

dadhi (Owner) commented Apr 14, 2020

Related to #258

The idea I had previously was to replace the splitting via Resolve(<5+ arguments>) with a more lightweight and more sophisticated approach.

The current way

But first, we need to recall what problem the splitting is solving.
It is supposed to solve the problem of a "big" object graph (500+ dependencies).

Why is it a problem?

  1. Because the big graph consumes a lot of memory
  2. Because the resulting compiled dynamic method's IL is huge

How does splitting via a Resolve call help with 1 and 2?

  1. The Resolve call expression is supposed to be smaller than the sub-graph it replaces.
  2. The Resolve call breaks the sub-graph out into a separately compiled method.

Let's see if 1 is true - I don't think so:

// Original graph
r => new D1(new D2(new D3(new D4(new D5(new D6(new D7()))))))

// Graph split with WithDependencyDepthToSplitObjectGraph(6)
r => new D1(new D2(new D3(new D4(new D5(
    r.Resolve(typeof(D6), null, IfUnresolved.Throw, null,
        preResolveParentRequest, // constant which holds the 5 ancestor request objects up to the root
        null)
)))))

It probably makes sense for a big sub-graph replaced by Resolve,
but the current algorithm splits by counting the depth from the root downward
(because looking ahead is harder...?).

Does it solve 2? It seems so, with the added cost of an additional entry in the resolution cache and the lookup time.

The new way

Ideally, I would rather not split the graph at all and instead solve all the problems with minimized overall memory usage (expression memoization, caching, etc.) and smarter IL generation.

In addition, things like this may help:
#211 (comment)

But given that we still need to split, say for a very big graph consisting of Transient services - those endless new(new(new(...))) chains - here is the idea.

Briefly, we may just split the expression by wrapping it in a lambda and then invoking that lambda.
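
For illustration, here is a rough sketch (my own, not DryIoc's actual code; the D5/D6/D7 types are just placeholders) of wrapping a sub-graph into a lambda plus its invocation with System.Linq.Expressions:

using System;
using System.Linq.Expressions;

class D7 { }
class D6 { public D6(D7 d) { } }
class D5 { public D5(D6 d) { } }

static class SplitSketch
{
    static void Main()
    {
        // The sub-graph to be split out: new D6(new D7())
        Expression subGraph = Expression.New(
            typeof(D6).GetConstructor(new[] { typeof(D7) }),
            Expression.New(typeof(D7)));

        // Wrap the sub-graph into a parameterless lambda and invoke it,
        // so the lambda body may be compiled as a separate unit.
        Expression split = Expression.Invoke(Expression.Lambda(subGraph));

        // The parent graph uses the invocation in place of the original sub-graph.
        var root = Expression.Lambda<Func<D5>>(
            Expression.New(typeof(D5).GetConstructor(new[] { typeof(D6) }), split));

        Console.WriteLine(root);                    // () => new D5(Invoke(() => new D6(new D7())))
        Console.WriteLine(root.Compile().Invoke()); // D5
    }
}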

It may not solve 1 (it seems Resolve doesn't solve it either), but it will solve 2:

We may dig deeper into the sub-graph, count up to the threshold plus the minimal depth of the sub-graph to split, then traverse back and mark the corresponding parent request for the split. Then, when all the dependency expressions are constructed, we may check whether the request is marked for the split and wrap the resulting service expression. In addition, we should avoid an excessive split if the sub-graph is already split, e.g. by a Scoped service lambda or a Func wrapper.
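
A minimal sketch of that final check, assuming hypothetical markedForSplit and alreadySplit flags computed during the traversal (the names are illustrative, not DryIoc's actual API):

using System.Linq.Expressions;

static class SplitDecision
{
    // Wrap the constructed service expression into its own compilation unit
    // when the request was marked for the split, unless the sub-graph already
    // sits behind its own lambda (e.g. a Scoped service lambda or a Func wrapper).
    public static Expression WrapIfMarked(Expression serviceExpr, bool markedForSplit, bool alreadySplit)
    {
        if (!markedForSplit || alreadySplit)
            return serviceExpr;

        return Expression.Invoke(Expression.Lambda(serviceExpr));
    }
}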

Btw, FEC inlines the parameterless lambda invocation, so we need to change that in order to compile the nested lambda into a separate unit:
dadhi/FastExpressionCompiler#230

@Havunen, what do you think?

dadhi added the bug and performance labels on Apr 14, 2020
Havunen (Contributor) commented Apr 15, 2020

In my opinion, DryIoc already has many settings which can be configured in user land. It would make sense to reduce these settings, even at a sacrifice in performance, if in trade it provides reliability and correctness.

To me it feels like the first priority should be fixing these bugs, and then later the performance can be optimized if it gives measurable gains.

I'm too new to the DryIoc implementation details to say what would be the correct way to go forward here in this case.

dadhi (Owner, Author) commented Apr 15, 2020

In my opinion, DryIoc already has many settings which can be configured in user land. It would make sense to reduce these settings, even at a sacrifice in performance, if in trade it provides reliability and correctness.

Agreed, some of the settings will be obsolete in V5, and I'll happily get rid of more. Reducing the settings should also improve performance and simplify the code. On the other hand, some of them provide an escape path in case of problems...

So I'll test my ideas about splitting and will see what to do.

dadhi (Owner, Author) commented Apr 21, 2020

Update:

First, I did not have a lot of time to play with it.

I tried to split the graph bottom-up instead of the current top-down approach. The idea was that it requires fewer nested lambdas, and so less compilation.
But either I did it wrong or something else is off - the ReducedLoad tests started to take 1 minute instead of 3 seconds.
Currently, I think that in order to understand what's happening I need more diagnostic info. It would be good to count the total number of nodes in the graph, the number of nodes in the split nested lambdas, and the number of lambdas.

Thinking about battling the compilation times, there is another strategy - gradually compile the nested lambdas, using interpretation for the remaining part and compiling more on each resolve until there is no more interpretation.
DryIoc has all the instruments for doing that.
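
A rough sketch of that strategy (my own illustration, not DryIoc's implementation), where the first resolutions go through an interpreter - here stood in for by Compile(preferInterpretation: true) - and the compiled delegate is swapped in once it is ready:

using System;
using System.Linq.Expressions;
using System.Threading;
using System.Threading.Tasks;

sealed class GraduallyCompiledFactory<T>
{
    readonly Expression<Func<T>> _expr;
    Func<T> _resolve;

    public GraduallyCompiledFactory(Expression<Func<T>> expr)
    {
        _expr = expr;

        // Start with the interpreter so the first resolve is cheap to produce.
        _resolve = expr.Compile(preferInterpretation: true);

        // Compile the real delegate in the background and swap it in when ready.
        Task.Run(() =>
        {
            var compiled = _expr.Compile();
            Interlocked.Exchange(ref _resolve, compiled);
        });
    }

    public T Resolve() => _resolve();
}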

dadhi (Owner, Author) commented Apr 30, 2020

Given the much clearer picture from #258, here is the new idea:

Track the sub-expression usage count in the ExpressionCache, then wrap the sub-expression in a compilation unit (lambda) if it is used two or more times and its size is bigger than a certain threshold.
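
A hypothetical sketch of that bookkeeping (the cache entry shape and names are illustrative only, not the actual ExpressionCache):

using System;
using System.Collections.Generic;
using System.Linq.Expressions;

sealed class CacheEntry
{
    public Expression Expr;
    public int UsageCount;
    public int DependencyCount;
}

static class ExpressionCacheSketch
{
    const int SizeThreshold = 256; // e.g. the dependency count later hardcoded in #268

    public static Expression GetOrSplit(Dictionary<Type, CacheEntry> cache, Type serviceType)
    {
        var entry = cache[serviceType];
        entry.UsageCount++;

        // Wrap into a separate compilation unit (lambda + invocation) only when the
        // expression is reused, is big enough, and is not already a wrapped invocation.
        if (entry.UsageCount >= 2 &&
            entry.DependencyCount >= SizeThreshold &&
            !(entry.Expr is InvocationExpression))
            entry.Expr = Expression.Invoke(Expression.Lambda(entry.Expr));

        return entry.Expr;
    }
}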

dzmitry-lahoda (Contributor) commented:

Alternatively, one could collect dependency graph data from a production application and suggest a graph split and rebalance by manually changing the code.

dadhi (Owner, Author) commented May 4, 2020

OK, I likely got something working in #268:

  • Split the sub-graph by wrapping it in a lambda and its invocation.
  • Fixed the bug in FEC with a StackOverflow when collecting the closure constants for the Invocation node.
  • Counting the total number of dependencies for each sub-graph.
  • If a sub-graph is interpreted into a singleton, then the dependency count for all ancestors is decreased accordingly.
  • When the sub-expression is about to be cached (and possibly reused), I check the dependency count and do the wrapping / split if needed, caching the resulting lambda invocation. The ancestors' dependency count is decreased by the split sub-graph's count.

Currently, the splitting is hardcoded in #268 to >= 256 dependencies, which produces 39 lambdas in the ReducedLoadTest (with DecorateeReuse: true).

Btw, interpretation may optimize the split lambda invocation away - TBD

dadhi (Owner, Author) commented May 4, 2020

I can still keep the split via inserting the Resolve call for use without FEC, because the System Compile doesn't like (or ignores) the split by lambda invocation - TBD.

As I see it now, the split by depth does not make sense, so I will disable it now and remove it in V5.

dadhi (Owner, Author) commented May 4, 2020

Another observation: in a test I saw that almost half of the compilation time is taken by collecting the closure constants. So I am back to the idea of collecting them while resolving the expression, then constructing the closure info (including the nested lambda info) in DryIoc and using FEC.TryCompileWithPreCreatedClosure(...) to avoid the closure-collecting step in FEC altogether.

dadhi changed the title from "Improve the case with WithDependencyDepthToSplitObjectGraph splitting on the leafs and consuming more than a leaf" to "Improve the case with splitting the big object graph" on May 18, 2020
dadhi self-assigned this on May 18, 2020
dadhi added this to the v4.2.0 milestone on May 18, 2020
dadhi linked a pull request on May 18, 2020 that will close this issue
dadhi closed this as completed on May 18, 2020