Nested awful.spawn line callbacks aren't GCing something #1490

Closed
thekrampus opened this issue Jan 30, 2017 · 14 comments
@thekrampus
Contributor

awesome v4.0-61-g466988eb (Harder, Better, Faster, Stronger)
 • Compiled against Lua 5.3.3 (running with Lua 5.3)
 • D-Bus support: ✔
 • execinfo support: ✔
 • RandR 1.5 support: ✔
 • LGI version: 0.9.1

In my personal config, I have a widget on an update timer. I call awful.spawn.easy_async on timeout to poll some system resources. In the callback for that first call, I call easy_async again to poll something else, and the callback for that updates the widget.

So, I wrote that, it seemed to work fine, and the next day awesome had eaten up a couple GB of memory. My quick-fix was to call collectgarbage() at the end of the first callback from easy_async and that did the trick.

I'll give a minimal example, but here's the original bug in my config in case I missed something important. I don't know much about Lua's garbage collection, sorry.

How to reproduce the issue:

Call awful.spawn.easy_async in a callback from another call to awful.spawn.easy_async. Here's a working example:

local test_cmd = "cat /proc/meminfo" -- or anything that'll write a bunch to stdout...
function async_test(n)
  print("using " .. collectgarbage("count") .. " kb")
  -- I logged the above to a file for analysis, but that's not needed to reproduce
  if n > 0 then
    awful.spawn.easy_async(test_cmd, function() async_test(n-1) end)
    -- using this instead does NOT show the same resource leak:
    -- awful.spawn.with_line_callback(test_cmd, {exit = function() async_test(n-1) end})
  end
end

-- call with some large argument to test:
async_test(10000)

Note: you don't need to build a huge stack of callbacks to reproduce this, but it shows the problem pretty fast.

Actual result:

Memory usage grows without bound, even when garbage collection is run. I don't know anything about memory profiling in Lua, so I just wrote collectgarbage("count") to a file each iteration and plotted the results:
[plot: memory usage (collectgarbage("count")) per iteration with nested easy_async — a sawtooth that trends upward]
Usage falls when a GC cycle is triggered, but clearly something's not getting marked.

Expected result:

Here's the memory usage plot for the same example, but calling awful.spawn.with_line_callback with the function as an exit callback:
[plot: memory usage per iteration with with_line_callback — usage stays flat]
Looks to me like everything's getting collected.

My best guess is that something's screwy with awful.spawn.read_lines. I don't really know anything about lgi, but I'll look into this a little further sometime...

@SammysHP
Contributor

SammysHP commented Jan 30, 2017

Please also see my issue #1489 where I reported a similar behavior. I was able to reproduce it with two simple lines:

testwidget = wibox.widget.textbox()
gears.timer.start_new(0.1, function() testwidget.text = math.random(100,999); return true end)

The memory usage increases (although it stabilizes sometimes for a while) if the widget is shown in the wibar. If it is not shown or I call collectgarbage() just before the return in the callback, the memory usage stays almost constant.

PS: Nice graphs!

@psychon
Member

psychon commented Jan 30, 2017

local test_cmd = "cat /proc/meminfo" -- or anything that'll write a bunch to stdout..

Doesn't have to write to stdout. It also happens with { "sh", "-c", "exit" } (although the increase might be less steep...?)

Here's the memory usage plot for the same example, but calling awful.spawn.with_line_callback with the function as an exit callback:

That difference goes away when you also provide a stdout callback (stdout = function() end) so that the result of cat is read instead of sent to awesome's stdout.

When running awesome under valgrind --tool=massif, the last data point that it grabs begins with:

87.60% (54,631,750B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->49.58% (30,925,160B) 0x8021D5F: ??? (in /usr/lib64/liblua-5.3.so)
| ->42.52% (26,518,768B) 0x801DF5C: ??? (in /usr/lib64/liblua-5.3.so)
| | ->39.11% (24,391,846B) 0x802899F: ??? (in /usr/lib64/liblua-5.3.so)
| | | ->39.11% (24,391,846B) 0x80125FC: lua_newuserdata (in /usr/lib64/liblua-5.3.so)
| | |   ->23.27% (14,511,088B) 0x16928AE3: ??? (in /usr/lib64/lua/5.3/lgi/corelgilua51.so)
| | |   | ->23.05% (14,378,496B) 0x1692A534: lgi_callable_create (in /usr/lib64/lua/5.3/lgi/corelgilua51.so)
| | |   | | ->22.94% (14,308,720B) 0x1693026F: lgi_marshal_2c (in /usr/lib64/lua/5.3/lgi/corelgilua51.so)

So about a quarter of the memory was allocated by LGI for references to callables.
Also, with the following patch I can see that none of the GDataInputStream instances are garbage-collected (however, they are if I add calls to collectgarbage("collect")):

diff --git a/awesomerc.lua b/awesomerc.lua
index 0c7fd01..368f533 100644
--- a/awesomerc.lua
+++ b/awesomerc.lua
@@ -567,3 +567,18 @@ end)
 client.connect_signal("focus", function(c) c.border_color = beautiful.border_focus end)
 client.connect_signal("unfocus", function(c) c.border_color = beautiful.border_normal end)
 -- }}}
+
+local test_cmd = "cat /proc/meminfo" -- or anything that'll write a bunch to stdout...
+local count = 500
+function async_test(n)
+  print(count-n, #awful.spawn.data,"using " .. collectgarbage("count") .. " kb")
+  -- I logged the above to a file for analysis, but that's not needed to reproduce
+  if n > 0 then
+    --awful.spawn.easy_async(test_cmd, function() async_test(n-1) end)
+    -- using this instead does NOT show the same resource leak:
+    awful.spawn.with_line_callback(test_cmd, {exit = function() async_test(n-1) end, stdout=function() end})
+  end
+end
+
+-- call with some large argument to test:
+async_test(count)
diff --git a/lib/awful/spawn.lua b/lib/awful/spawn.lua
index bcfb68d..b18c965 100644
--- a/lib/awful/spawn.lua
+++ b/lib/awful/spawn.lua
@@ -377,8 +377,10 @@ end
 -- @tparam[opt] function done_callback Function that is called when the
 --   operation finishes (e.g. due to end of file).
 -- @tparam[opt=false] boolean close Should the stream be closed after end-of-file?
+spawn.data = setmetatable({}, { __mode = "v"})
 function spawn.read_lines(input_stream, line_callback, done_callback, close)
     local stream = Gio.DataInputStream.new(input_stream)
+    table.insert(spawn.data, stream)
     local function done()
         if close then
             stream:close()

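The `__mode = "v"` trick in the patch above works because entries in a weak-value table are cleared once the GC collects their values, so `#awful.spawn.data` counts the streams that are still alive. A minimal standalone sketch of the same trick (plain Lua, no awesome required; the table contents are stand-ins):

```lua
-- Weak-value table: entries disappear once their values are collected.
local tracked = setmetatable({}, { __mode = "v" })

local obj = { name = "stream" }   -- stand-in for a GDataInputStream
tracked[1] = obj

collectgarbage("collect")
assert(tracked[1] ~= nil)         -- still alive: 'obj' holds a strong reference

obj = nil                         -- drop the only strong reference
collectgarbage("collect")
assert(tracked[1] == nil)         -- the weak entry was cleared
print("weak-value entry cleared")
```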
Edit: I tried to reproduce this with "just Lua", but I failed. Still, I'll leave my attempt at doing so here:

local lgi = require("lgi")
local glib = lgi.GLib
local gio = lgi.Gio

local count = 1000

local main = glib.MainLoop()

local function launch(i)
	print(i, collectgarbage("count"))

	local launcher = gio.SubprocessLauncher.new(gio.SubprocessFlags.STDOUT_PIPE)
	local proc = launcher:spawnv({"/bin/cat", "/proc/meminfo"})
	local pipe = proc:get_stdout_pipe()
	pipe = gio.DataInputStream.new(pipe)

	while true do
		local line, length = pipe:async_read_line()
		if not line then break end
	end
	pipe:async_close()

	assert(proc:async_wait())
	assert(proc:get_successful())
	if i < count then
		launch(i+1)
	else
		main:quit()
	end
end

gio.Async.start(function()
	local success, msg = pcall(launch, 1)
	if not success then
		print(msg)
	end
end)()

main:run()

@psychon
Member

psychon commented Jan 30, 2017

Ok, I had some more luck with showing that this is not (directly) awesome's fault. The following Lua program reproduces the problem. Does anyone want to debug it or open a bug report for LGI?

local lgi = require("lgi")
local glib = lgi.GLib
local gio = lgi.Gio

local count = 1000

local main = glib.MainLoop()

local function launch(i)
	print(i, collectgarbage("count"))

	local launcher = gio.SubprocessLauncher.new(gio.SubprocessFlags.STDOUT_PIPE)
	local proc = launcher:spawnv({"/bin/cat", "/proc/meminfo"})
	local pipe = proc:get_stdout_pipe()
	pipe = gio.DataInputStream.new(pipe)

	local start_read, finish_read
	start_read = function()
		pipe:read_line_async(glib.PRIORITY_DEFAULT, nil, finish_read)
	end
	finish_read = function(obj, res)
		local line, length = obj:read_line_finish(res)
		if line then
			start_read()
		else
			assert(obj:close())
			assert(proc:wait())
			assert(proc:get_successful())
			if i < count then
				launch(i+1)
			else
				main:quit()
			end
		end
	end
	start_read()
end

launch(1)

main:run()

Edit: This might just be how Lua's garbage collector is supposed to work? I'm not sure.

@psychon
Member

psychon commented Feb 2, 2017

Forwarded upstream: lgi-devs/lgi#157
As far as I can see, there is nothing we can do about this (besides using more aggressive GC settings or calling collectgarbage("collect") sometimes).
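"More aggressive GC settings" would mean tuning the standard collectgarbage knobs; a sketch along those lines (the specific values below are illustrative assumptions, not tuned recommendations):

```lua
-- Illustrative only: make Lua's incremental GC run more eagerly.
collectgarbage("setpause", 110)    -- start the next cycle when usage grows 10%
                                   -- past the last cycle's baseline (default 200)
collectgarbage("setstepmul", 400)  -- do more GC work per allocation step
                                   -- (default 200)
print("GC tuned:", collectgarbage("count"), "KB in use")
```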

@psychon psychon closed this as completed Feb 2, 2017
@psychon
Member

psychon commented Feb 4, 2017

@thekrampus What's your Lua version? I actually cannot reproduce with Lua 5.2...

@xinhaoyuan
Contributor

Is this issue resolved? I was investigating memory leaks in my config and saw leaked Gio.*InputStream from spawn.lua. (4.3-git with Lua 5.2/5.3)

@psychon
Member

psychon commented May 13, 2020

Not really, I guess. Workarounds include "kicking" Lua's GC by either using more aggressive settings or starting a timer that does GC steps.
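The "timer that does GC steps" workaround could look like this in rc.lua (a sketch using gears.timer; the 10-second interval is an arbitrary choice, and this fragment only runs inside awesome):

```lua
-- Sketch: periodically nudge the GC from rc.lua.
local gears = require("gears")

gears.timer {
    timeout   = 10,        -- arbitrary interval, in seconds
    autostart = true,
    callback  = function()
        -- either a full collection...
        collectgarbage("collect")
        -- ...or, gentler, an incremental step: collectgarbage("step", 100)
    end,
}
```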

How exactly did you see leaked Gio.*InputStream from spawn.lua? (More specifically: Do you have some fancy tool for diagnosing memory leaks in Lua that I do not know about?)

@xinhaoyuan
Contributor

I'm still figuring out the best way to release personal code. It's not so fancy, but here is what I did:

  1. Use LD_PRELOAD to instrument lua_pushlightuserdata and lua_touserdata, adding an additional lua_pcall before returning.
  2. In rc.lua, implement the callback function, and use Lua's debug facilities to attach timestamps and stack traces to the userdata in a weak table.
  3. Use awesome-client to dump the weak table once in a while.

@psychon
Member

psychon commented May 15, 2020

Use awesome-client to dump the weak table once in a while.

Every time you do this, you are creating a strong reference to every entry of the weak table. This means that the object in question cannot be GC'd in this GC cycle. If you manage to do this every GC cycle, you are effectively preventing the GC from collecting your objects.

(No idea how often GC cycles occur, but I guess a lot of our GC problems would go away with more often GC cycles. Which would require either more aggressive GC settings or less memory usage...)
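The effect described above is visible in plain Lua: merely copying an entry out of a weak table pins it for the next cycle (the table contents are stand-ins):

```lua
-- Holding a strong reference (e.g. while dumping) keeps a weak entry alive.
local weak = setmetatable({}, { __mode = "v" })
weak.stream = { fd = 3 }            -- stand-in for a leaked stream object

local dumped = weak.stream          -- "dumping" grabs a strong reference
collectgarbage("collect")
assert(weak.stream ~= nil)          -- survived this cycle because of 'dumped'

dumped = nil
collectgarbage("collect")
assert(weak.stream == nil)          -- collected once the dump reference is gone
```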

@psychon
Member

psychon commented May 15, 2020

Oh and: Wow, that's a nice trick/hack. Very creative.

@xinhaoyuan
Contributor

Thanks :) Before dumping the table, it calls collectgarbage, so it should work well with GC cycles?

@psychon
Member

psychon commented May 15, 2020

Uhm. But then this is a genuine memory leak and not just a case of "the GC is too slow to collect my garbage".

You could try debugging that via my traverse.lua hack: #2983 (comment)

Beginning at _G (and debug.getregistry()) this "gets to" all reachable objects. It can then print how it managed to reach some object. For example, the version in the gist prints paths to screens: https://gist.github.com/psychon/5a9c3dbd04a3a4d2589ec6f264696c10#file-traverse-lua-L160

Based on this, you'd need to somehow make it print how it reached the GIoStream objects that are not being collected. That should tell us where the leak comes from.
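The core idea is a simple graph walk; a toy version (hypothetical, much simpler than the real traverse.lua, which also follows metatables, upvalues, and the registry):

```lua
-- Toy reachability walk: visit every table reachable from _G and remember
-- one access path per object.
local paths = {}
local function walk(obj, path)
    if type(obj) ~= "table" or paths[obj] then return end
    paths[obj] = path
    for k, v in pairs(obj) do
        walk(v, path .. "." .. tostring(k))
    end
end
walk(_G, "_G")
-- paths[some_leaked_object] now tells you how it is still reachable, e.g.:
print(paths[os])
```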

(Also, I guess it makes sense to open a new issue instead of continuing to spam this on)

@xinhaoyuan
Contributor

xinhaoyuan commented May 15, 2020

Ah good idea, thanks! I'll open a new issue once I get more information.

Edit: After some investigation, I found that one reason for the many userdata left in the weak table might be that I included light userdata, which are not cleaned up by GC cycles. Anyway, I don't see memory usage growing after I enforce a full GC every minute, so I'm not worried about it for now.

@hoelzro
Contributor

hoelzro commented Aug 10, 2022

@psychon It seems I'm a bit late to this party, but I stumbled upon this issue (among several others) while researching my own Awesome memory "leak" issues. I managed to abuse lgi's marshal module (which seems to be largely for internal use) to prepare a single closure value that gets re-used across calls to read_line_async:

local lgi = require 'lgi'
local core = require 'lgi.core'
local glib = lgi.GLib
local gio = lgi.Gio

-- read lines from standard input
local stdin = gio.UnixInputStream.new(0, false)
local stream = gio.DataInputStream.new(stdin)

local loop = glib.MainLoop.new()

local start_read, finish_read
function start_read()
  stream:read_line_async(glib.PRIORITY_DEFAULT, nil, finish_read)
end

function finish_read(obj, res)
  local line, length = obj:read_line_finish(res)

  assert(type(length) == 'number')
  if line then
    start_read()
  else
    loop:quit()
  end
end

-- create a "prepared" closure on the C-side exactly once, to avoid creating as many registry references
local ti = stream.read_line_async.params[4].typeinfo
local guard -- not sure what to do with this guard value :|
guard, finish_read = core.marshal.callback(ti.interface, finish_read)

start_read()

print(#debug.getregistry())
loop:run()
print(#debug.getregistry())

Before the core.marshal.callback hack, I saw roughly N*2 registry references created per line of input; now I see about N. The N references are all thread values, and they're all the same thread. I have some thoughts on that, but I can put those in lgi's issue tracker.
