
gcylc: large memory footprint in graph mode - possible leak? #1620

Closed
arjclark opened this issue Sep 25, 2015 · 22 comments · Fixed by #2966
Comments

@arjclark (Contributor)

When using the graph view in gcylc the memory footprint grows over time to worryingly large values. My suspicion is that we have some sort of memory leak either in our code or the underlying graphviz library.

Following a report from a user about large memory footprints for their gcylc sessions of over a gig each (1.4G and 1.9G) I've been investigating this and reproducing signs of a problem with a version of the um's rose-stem suite running in simulation mode with each task set to take 60-120s to run. As per the user's report I've been setting the view to graph mode, ungrouping all families and filtering out succeeded tasks.

What leads me to believe we have a memory leak is that, logically, as the number of tasks in the view decreases the overheads should shrink, whereas what I'm seeing is the memory footprint increasing over time until the suite completes, at which point (with a blank graph) it remains at the higher value. Swapping to one of the other views doesn't reduce the footprint, and swapping back to the graph view sets it increasing again. If I close and re-open the GUI the footprint starts small and then keeps increasing, adding to the suggestion of a memory leak (although not as badly as before in this case, presumably due to a smaller initial number of graph nodes and edges).

This doesn't appear to be a recent development (I had thought it might be down to some of the caching we've started doing), as I've been able to reproduce it from master back to earlier 6.x versions (as far back as 6.1.0 so far, before hitting a task naming convention error).

What I'm not sure of is what's going on. Is this a cylc issue or a graphviz one? It doesn't affect the other views. We generally wouldn't recommend running the graph view for suites as large as the one I'm investigating, but the problem presumably exists for other suites too.

I'll look to produce a simple example suite to investigate with next week.

@arjclark arjclark added this to the soon milestone Sep 25, 2015
@arjclark (Contributor, Author)

Example suite to play with:

#!jinja2
[scheduling]
    [[dependencies]]
        graph = """

            INSTALL:succeed-all => FOO_FAM
            MAKERS:succeed-all => FOO_FAM
            {% for i0 in range(40) -%}
            really_long_task_name_foo_{{ i0 }} => BAR_SUBFAM_{{ i0 }}:succeed-all => really_long_task_name_baz_{{ i0 }} => housekeep
            {% endfor %}
        """
[runtime]
    [[root]]
        [[[simulation mode]]]
            run time range = 60,120 # seconds
    [[INSTALL]]
    [[MAKERS]]
    [[FOO_FAM]]
    [[BAR_FAM]]
    [[BAZ_FAM]]
    {% for i0 in range(40) -%}
    [[BAR_SUBFAM_{{i0}}]]
    [[really_long_task_name_install_{{i0}}]]
        inherit = INSTALL
    [[really_long_task_name_maker_{{i0}}]]
        inherit = MAKERS
    [[really_long_task_name_foo_{{i0}}]]
        inherit = FOO_FAM
    {% for i1 in range(5) -%}
    [[really_long_task_name_bar_{{i0}}_{{i1}}]]
        inherit = BAR_FAM, BAR_SUBFAM_{{i0}}
    {% endfor %}
    [[really_long_task_name_baz_{{i0}}]]
        inherit = BAZ_FAM
    {% endfor %}
    [[housekeep]]

Increase the number of tasks to make the problem more prominent.

@hjoliver (Member) commented Oct 5, 2015

With your example suite above, I get a very similar gross memory profile with a constant task run length, regardless of its value, e.g. for script = sleep 10:

[memory profile plot: mprof-sleep10]

and script = sleep 120 is very similar:

[memory profile plot: mprof-sleep120]

But for varied run lengths, e.g. your simulation mode settings above, or live mode with sleep $(( 60 + RANDOM % 60 )), it is considerably worse:

[memory profile plot: mprof-sleep60-120]

In the latter case presumably the suite graph updates more often because fewer things happen near-simultaneously.

All this would seem to suggest that graph view memory goes up with each graph update, even though (certainly in this non-cycling case) the graph is not actually growing any bigger.

@hjoliver (Member) commented Oct 5, 2015

On a simpler suite, graph = foo => FAM:succeed-all => bar (with FAM having 100 members) the form of the memory curve remains the same but its steepness increases with increasing task name length (tested 3 chars and 100) ... which might support the theory that we are leaking graph nodes - which contain task names - in the graph view.

@hjoliver (Member) commented Oct 5, 2015

Same result at cylc-5.4.7.

@hjoliver (Member) commented Oct 5, 2015

I've been doing some internal function profiling too - https://pypi.python.org/pypi/memory_profiler - but have not found the problem as yet. It could be in the xdot code.
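(A minimal sketch of this kind of leak hunt, using only the standard library's tracemalloc module as a complement to memory_profiler. The leaky_redraw function below is a hypothetical stand-in for the real graph-redraw code, not taken from cylc or xdot; the point is the snapshot-diff technique for finding which source line accumulates memory across repeated updates.)

```python
# Hunting a per-redraw memory leak with stdlib tracemalloc.
# leaky_redraw is a hypothetical stand-in for the real draw code.
import tracemalloc

_retained = []  # simulates state that the real code accidentally keeps alive


def leaky_redraw(n_nodes=100):
    # Hypothetical redraw: builds a label string per node and (wrongly)
    # retains every one, so memory grows on each call.
    for i in range(n_nodes):
        _retained.append("node-%d-" % i + "x" * 90)


def top_growth(n_updates=50):
    """Return the allocation site whose memory grew most across updates."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(n_updates):
        leaky_redraw()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Statistics are sorted with the biggest size difference first, so the
    # first entry points at the line responsible for the growth.
    return after.compare_to(before, "lineno")[0]
```

In a real session you would run the GUI for a while between the two snapshots; a line whose size_diff keeps climbing on every update, even though the graph is unchanged, is a leak candidate.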

@arjclark (Contributor, Author) commented Oct 5, 2015

> All this would seem to suggest that graph view memory goes up with each graph update, even though (certainly in this non-cycling case) the graph is not actually growing any bigger.

Agreed. I think it's definitely related to the updates themselves.

> the form of the memory curve remains the same but its steepness increases with increasing task name length

Interesting, that probably explains why the user's suite was so badly affected - lots of very long task names.

> It could be in the xdot code

I've been coming to that conclusion myself, but struggling to narrow down where in it the problem occurs. Commenting out the line

                   self.update_xdot(no_zoom=needed_no_redraw)

in def run() in update_graph.py seems to remove the problem (as the graph view is then not redrawn).

@hjoliver (Member) commented Oct 5, 2015

Filename: /home/oliverh/cylc/cylc.git/lib/xdot.py

Line #    Mem usage    Increment   Line Contents
================================================
   468     70.4 MiB      0.0 MiB       @profile
   469                                 def draw(self, cr, highlight_items=None):
   470     70.4 MiB      0.0 MiB           if highlight_items is None:
   471     70.4 MiB      0.0 MiB               highlight_items = ()
   472     70.4 MiB      0.0 MiB           cr.set_source_rgba(0.0, 0.0, 0.0, 1.0)
   473                             
   474     70.4 MiB      0.0 MiB           cr.set_line_cap(cairo.LINE_CAP_BUTT)
   475     70.4 MiB      0.0 MiB           cr.set_line_join(cairo.LINE_JOIN_MITER)
   476                             
   477     70.4 MiB      0.0 MiB           for shape in self.shapes:
   478     70.4 MiB      0.0 MiB               shape.draw(cr)
   479     70.4 MiB      0.0 MiB           for edge in self.edges:
   480     70.4 MiB      0.0 MiB               edge.draw(cr, highlight=(edge in highlight_items))
   481     70.9 MiB  *** 0.6 MiB ***       for node in self.nodes:
   482     70.9 MiB      0.0 MiB               node.draw(cr, highlight=(node in highlight_items))

For my 100-member suite with 100-char task names (above), line 481 here increments 0.6MiB for every graph update.

From further investigation, it seems that text rendering in xdot is the problem: returning immediately from lib/xdot.py:TextShape.draw() gets rid of the leak (but the resulting graph nodes are not text-labelled of course).
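(To illustrate the class of bug being described, here is a pure-Python analogue, not the actual xdot/pygtk code, where the real leak sat in the Pango C bindings. The class names and the dict-based "layout" are invented for the sketch. The essence: each redraw creates a text-layout object and something keeps every one alive, so memory grows with every update even though the graph is unchanged; keying the cache by the text itself fixes it.)

```python
import itertools


class LeakyTextShape:
    """Hypothetical analogue of a text shape that leaks one layout per draw."""
    _cache = {}                    # unbounded: gains one entry per draw() call
    _counter = itertools.count()

    def __init__(self, text):
        self.text = text

    def draw(self):
        # Stand-in for creating a Pango layout on every redraw.
        layout = {"text": self.text, "glyphs": list(self.text)}
        self._cache[next(self._counter)] = layout   # never evicted: the leak
        return layout


class FixedTextShape(LeakyTextShape):
    _cache = {}

    def draw(self):
        # Fix: key the cache by the text, so redrawing an unchanged graph
        # reuses the existing layout instead of accumulating new ones.
        if self.text not in self._cache:
            self._cache[self.text] = {"text": self.text,
                                      "glyphs": list(self.text)}
        return self._cache[self.text]


def simulate(n_updates=1000):
    """Redraw both shapes n_updates times; return the two cache sizes."""
    leaky = LeakyTextShape("really_long_task_name_foo_0")
    fixed = FixedTextShape("really_long_task_name_foo_0")
    for _ in range(n_updates):
        leaky.draw()
        fixed.draw()
    return len(LeakyTextShape._cache), len(FixedTextShape._cache)
```

This also matches the observation above that the leak rate scales with task name length: each retained layout carries a copy of the text.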

@hjoliver (Member) commented Oct 7, 2015

I suspect the problem is due to this extant pygtk bug: https://bugzilla.gnome.org/show_bug.cgi?id=599730

@matthewrmshin (Contributor)

> I suspect the problem is due to this extant pygtk bug: https://bugzilla.gnome.org/show_bug.cgi?id=599730

I have also come to the same conclusion having done various Google searches.

@hjoliver (Member) commented Oct 7, 2015

Not sure what to do about this. Presumably we could provide a pygtk patch ourselves, but having to build pygtk would be a significant complication for cylc installation!

@matthewrmshin (Contributor)

Given that it is not really a cylc bug: if we can confirm that the bug is fixed in the latest release of GTK and has been rolled out in the latest releases of major Linux distros such as RHEL, Ubuntu, (SuSE?), etc., then we should close this issue.

@hjoliver (Member) commented Oct 8, 2015

Unfortunately that's not going to work. I took a git clone of pygtk; the last release, 2.24.0, was more than four years ago. It looks like the bug was fixed two years ago, but I'm guessing it will never be released because pygtk is being superseded by pygobject:

$ git log --date=short --pretty="%cd %s" | head -n 8
2013-04-05 Add John Stowers as a maintainer to pygtk.doap
2013-04-05 Fix leaks of Pango objects  <<<<-----------------------------------
2011-10-02 reference: pygtk-gtkdialog.xml: fix link to gtk.VBox
2011-08-06 typo in configure variable
2011-08-06 use objective-c for quartz
2011-04-02 Post release version bump
2011-04-01 Update for 2.24.0 release   <<<<-----------------------------------
2011-03-30 Update README

@matthewrmshin (Contributor)

That's unfortunate. Perhaps there is a need to move to pygobject (or other more modern GUI technology) sooner than we think.

@hjoliver (Member) commented Oct 8, 2015

See #112

@arjclark (Contributor, Author) commented Jun 9, 2016

@matthewrmshin / @hjoliver - as this is a non-cylc issue and will be resolved/irrelevant when we later move to a different GUI technology, can we close this down?

@hjoliver hjoliver modified the milestones: some-day, soon Jun 9, 2016
@hjoliver (Member) commented Jun 9, 2016

@arjclark - I think we should keep this open while we're still using the current GUI technology, as a reference for anyone who notices the problem. Also, I remain skeptical that the current GUI can be superseded very quickly, unless perhaps we go to pygobject before developing a web-based GUI.

@benfitzpatrick (Contributor)

Can we test this on an up-to-date distro?

@benfitzpatrick benfitzpatrick modified the milestones: soon, some-day Jun 15, 2016
@hjoliver hjoliver modified the milestones: never, soon Jun 19, 2016
@hjoliver (Member)

@hjoliver to mention this issue on the new FAQ

We can't do anything about it until #112 is feasible.

@matthewrmshin (Contributor)

Solving #1873 will hopefully kill this issue.

@kinow (Member) commented Mar 10, 2019

We are now working on the new Web GUI, which will replace the previous desktop PyGTK GUI in our next major release, Cylc 8.

This issue is being closed as it does not apply to the new Web GUI.

@kinow kinow closed this as completed Mar 10, 2019
Python3 automation moved this from To do to Done Mar 10, 2019
@matthewrmshin matthewrmshin removed this from the never milestone Mar 10, 2019