
Reduce amount of subscription data stored in Elixir registry #1006

Conversation


zoldar commented Dec 22, 2020

This PR is a follow-up to a proposal that was initially posted on the Elixir Slack and on Elixir Forum.

Instead of storing the precomputed initial_phases, where the options (along with the context) are repeated several times over, we store just enough information to reconstruct them using Absinthe.Pipeline.for_document/2. That comes down to building a list of phases with the options inserted in a number of spots, which should not be computationally expensive.
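A rough sketch of the idea as proposed here (illustrative only, not the exact diff; schema and options stand in for whatever the subscription entry already carries):

# Rather than persisting the full, precomputed phase list, keep only the inputs
# needed to rebuild it when the subscription fires.
rebuilt_phases = Absinthe.Pipeline.for_document(schema, options)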

If I haven't missed anything obvious, that change should be transparent and non-breaking 😅

I have run a series of simple tests with absinthe's test suite as well as the test suite of a larger app I'm working on that uses subscriptions extensively. For the purpose of the experiment I added a line calculating the byte size of the serialized map stored in the Elixir Registry by Absinthe.Subscription.subscribe, like this:

# doc_value is the term stored in the Registry for the subscription; serializing it
# with :erlang.term_to_binary/1 gives a rough, relative measure of its size.
doc_value |> :erlang.term_to_binary() |> byte_size() |> IO.inspect(label: :PAYLOAD)

This is not the same amount of space it takes up in ETS memory, but it gives an idea in relative terms. I then ran the tests for both absinthe and the app without and with the patch applied. Here are the results:

https://gist.github.com/zoldar/9f06e95bfced82cd25793c25f2e632c5

As can be seen, memory usage can be lowered more than 10x this way. Given that, in our case, roughly 90% of VM memory in production is taken up by those ETS entries for subscriptions (multiple subscriptions per user), this would be a substantial win. What do you think?

UPDATE: Here's one more round of test runs against our app where the sum of memory occupied by the ETS tables for the key partitions is checked (10648 is subtracted from each table because that seems to be the base size on this particular setup):

https://gist.github.com/zoldar/d99c636a933725c73d2a4f7131fdd936
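For reference, a measurement along those lines could look roughly like the snippet below (illustrative only, not the exact code from the gist; partition_tables is an assumed list of the registry partition ETS table ids):

word_size = :erlang.system_info(:wordsize)

# :ets.info/2 reports the table size in machine words, so multiply by the word
# size to get bytes.
total_bytes =
  partition_tables
  |> Enum.map(fn table -> :ets.info(table, :memory) * word_size end)
  |> Enum.sum()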

@zoldar force-pushed the zoldar/reduce-size-of-subscription-data-kept-in-elixir-registry branch from aa4760e to 26ce61b on December 22, 2020 09:52
@benwilson512

Hey @zoldar this is really great! The current structure of some of this stuff definitely causes issues when the structures are flattened inside of :ets. The changes you're making here are very small and look pretty safe, but I need to make sure that the phases are the correct ones. I'll pull this down today and take a look, thanks!

@benwilson512 left a comment

I think the specific approach here can't work; for_document/2 is simply not equivalent to doc.initial_phases. However, if there are a lot of memory savings to be had, then perhaps we can do some "compression" of sorts on the pipeline that is registered, to minimize the duplication that occurs when it is flattened in the :ets table. Let me take a look and see.

lib/absinthe/subscription.ex
@zoldar force-pushed the zoldar/reduce-size-of-subscription-data-kept-in-elixir-registry branch from 26ce61b to 187e655 on December 28, 2020 13:10

zoldar commented Dec 28, 2020

@benwilson512 You are right, of course; I got hung up on Absinthe.run using that default pipeline 🤦

I have made a PoC of an alternative approach in 187e655 - as described in #1006 (comment).

I have run that version against our app's test suite again and the savings still seem pretty sizable:

https://gist.github.com/zoldar/f185d0896a217e36db13bc1f67f7f12e

I'm still going to verify it with additional tests, but when inspecting the serialized/deserialized data, it seems to behave correctly. What do you think about this approach?

@binaryseed

I'm curious if we can look at the broader context of this... what is it that is winding up in options that is getting duplicated & taking up a lot of memory?

@benwilson512

@binaryseed that's a good question. Often one killer thing for folks is the context; a large context can explode things very fast. Large inputs (variables) can do the same thing.

In Absinthe 2.0 I want to build a proper %Absinthe.Pipeline{} struct that would contain a simple phases key (a list of just the modules) and an options key; I think per-phase options add unnecessary complications.
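A purely hypothetical sketch of that idea (the struct below is not an existing Absinthe API; the phase modules are only examples):

defmodule Pipeline do
  # Phases are a plain list of modules; options live once at the pipeline level
  # instead of being repeated per phase.
  defstruct phases: [], options: []
end

pipeline = %Pipeline{
  phases: [Absinthe.Phase.Parse, Absinthe.Phase.Blueprint],
  options: [context: %{}, variables: %{}]
}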

I think the best option right now is to sort of re-create that structure by de-duplicating the options. Off the top of my head:

defp compress([], compressed_phases, opts_map, _index) do
  # Invert opts_map (opts => label) into label => opts for decompression.
  {Enum.reverse(compressed_phases), Map.new(opts_map, fn {opts, label} -> {label, opts} end)}
end

defp compress([phase | phases], compressed_phases, opts_map, index) do
  case phase do
    {phase, opts} ->
      {label, opts_map, index} = compress_opts(opts_map, opts, index)
      compress(phases, [{phase, label} | compressed_phases], opts_map, index)

    phase ->
      compress(phases, [phase | compressed_phases], opts_map, index)
  end
end

defp compress_opts(opts_map, opts, index) do
  case Map.fetch(opts_map, opts) do
    # Already seen these opts - reuse their existing label.
    {:ok, label} ->
      {label, opts_map, index}

    # New opts value - assign the next label.
    :error ->
      index = index + 1
      {index, Map.put(opts_map, opts, index), index}
  end
end

Basically, this turns a list of {phase, opts} tuples into a list of {phase, integer} tuples where the integer maps to a unique opts value. Haven't actually run this but I'm pretty sure it's roughly correct. Decompressing is as simple as:

Enum.map(compressed_phases, fn
  {phase, i} -> {phase, Map.fetch!(opts_map, i)}
  phase -> phase
end)
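Hypothetical usage of the sketch above (the initial accumulator values are an assumption, not something spelled out in this comment):

{compressed_phases, opts_map} = compress(doc.initial_phases, [], %{}, 0)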


zoldar commented Dec 29, 2020

@benwilson512 Just to double-check - have you seen 187e655? 😅 (mentioned in #1006 (comment))

@benwilson512 left a comment

Ah lol, I missed that, sorry. Looks great, and I'm glad to see there are still size savings.

lib/absinthe/subscription/phase_serializer.ex (outdated)

{{phase, options_label}, options_reverse_map}
else
new_index = map_size(options_reverse_map)
@benwilson512 left a comment

Ah clever, using the map size as an implicit indexer.
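Illustrative only: because each new label is taken as map_size/1 of the accumulator, every distinct options value gets the next integer without tracking a separate counter.

options_reverse_map = %{}

label_a = map_size(options_reverse_map)                                     # 0
options_reverse_map = Map.put(options_reverse_map, [context: %{}], label_a)

label_b = map_size(options_reverse_map)                                     # 1
options_reverse_map = Map.put(options_reverse_map, [context: %{a: 1}], label_b)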


zoldar commented Dec 29, 2020

@benwilson512 Cool! I'll get it into shape (apply feedback and add serializer tests) soon.


zoldar commented Dec 30, 2020

@benwilson512 The PR should now be fully ready for review 😅

One observation, though. I'm currently keeping a fork of an older tag of absinthe with the same patch applied. The fork does not contain this change: b407000. The telemetry phases there repeat the original options twice, with a customized event for each respective phase. For the current version of absinthe that means (provided this default pipeline serves as a base) that even the packed version will always store at least 3 full copies of the options from the get-go, with significant overlap.

I'm wondering whether it would be worth the effort to make the serializer handle this with an extra processing step (perhaps diffing the existing options and finding a common base)? Maybe there's a better way to handle it, though.

(That's definitely something beyond the scope of this PR).
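A hedged sketch of that "common base" idea (purely illustrative; none of these keys or events are Absinthe internals):

# Store the shared options once, plus only the per-phase extras, and rebuild the
# full options with Keyword.merge/2 on read.
base_opts = [context: %{current_user: 123}, variables: %{"id" => "42"}]

per_phase_extras = %{
  0 => [],
  1 => [event: [:my_app, :some_phase, :start]]
}

rebuild = fn label -> Keyword.merge(base_opts, Map.fetch!(per_phase_extras, label)) end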


zoldar commented Jan 6, 2021

If there's anything that needs addressing to make this patch acceptable, please let me know. Thanks!

@benwilson512 merged commit 5e54c50 into absinthe-graphql:master on Jan 9, 2021
@benwilson512

This looks great, thank you!
