Examples of analyzing heap_dump #10

Open
kbaum opened this issue Apr 25, 2013 · 7 comments

Comments

@kbaum

kbaum commented Apr 25, 2013

Are there any example scripts that analyze this heap dump and perhaps produce some type of visualization with a tool like Graphviz? The heap_dump looks incredibly useful, but I am having a difficult time figuring out how to use it to trace a memory leak.

thx!

-karl

@Vasfed
Owner

Vasfed commented Apr 25, 2013

I usually do the analysis with grep and small scripts around it.
But there's also a pre-pre-alpha tool by @oruen: https://github.com/oruen/trailblazer

@kbaum
Author

kbaum commented Apr 25, 2013

I have a memory leak and I have successfully dumped the heap, but now I am really having a hard time knowing where to begin.

@Vasfed
Owner

Vasfed commented Apr 25, 2013

Do you have a way to replicate the leak?
The best way is to examine a leak in progress: watch HeapDump.count_objects with your namespace as the leak grows.
This way you can usually figure out what types of objects are leaking.
Also, there are usually some objects, like sessions/controllers/requests, whose counts are proportional to the load you throw at the app and which should be cleaned up once the load is over.
HeapDump runs GC by default, so whatever shows up in the count/dump is either leaked or in use at that moment.
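Watching counts as the leak grows boils down to diffing snapshots taken at different moments. A minimal sketch, assuming each snapshot has already been turned into a Hash of type name to count (for example by parsing count_objects output with JSON.parse); `count_growth` is a hypothetical helper, not part of heap_dump:

```ruby
# Hypothetical helper: compares two count snapshots (Hash of type => count)
# and returns only the types whose count grew, largest growth first.
# Types that shrank or stayed flat are dropped.
def count_growth(before, after)
  after.each_with_object({}) do |(type, count), growth|
    delta = count - before.fetch(type, 0)
    growth[type] = delta if delta > 0
  end.sort_by { |_type, delta| -delta }.to_h
end
```

Take one snapshot before applying load and another after the load is over; types that keep showing up in the growth list across several rounds are the first suspects.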

Then trace these objects' references back to some root objects (globals, class variables, etc.). Also look for Symbol#to_proc (usually used in constructs like arr.map(&:this_is_the_symbol)): it tends to leave references to its context (VM env/object self) lingering in a cache.
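Tracing an object back to a root is a graph search over "who references this object" edges. A sketch using breadth-first search, assuming you have already built a hypothetical `referrers` Hash (object id => ids of the objects that reference it), e.g. from grep results over the dump, plus a collection of root ids you consider global; the sketch does not assume heap_dump produces either structure:

```ruby
# Breadth-first search from a suspect object towards any root, following
# referrer edges. Returns the shortest chain [suspect, ..., root], or nil
# if no root is reachable. The `parent` hash also guards against cycles.
def path_to_root(suspect, referrers, roots)
  parent = { suspect => nil }
  queue = [suspect]
  until queue.empty?
    current = queue.shift
    if roots.include?(current)
      path = []
      node = current
      while node
        path << node
        node = parent[node]
      end
      return path.reverse
    end
    referrers.fetch(current, []).each do |ref|
      next if parent.key?(ref) # already visited
      parent[ref] = current
      queue << ref
    end
  end
  nil
end
```

The returned chain tells you which object is holding on to the suspect; breaking the first unexpected link in it is usually where the fix goes.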

@kbaum
Author

kbaum commented Apr 26, 2013

Regarding HeapDump.count_objects, I don't have a namespace that I use for all of my objects. When I run HeapDump.count_objects, I just get something like:

[4] pry(main)> puts HeapDump.count_objects
{
    "total_slots": 754211,
    "free_slots": 103849,
    "basic_types": {
        "T_OBJECT": 16798,
        "T_CLASS": 9724,
        "T_MODULE": 1909,
        "T_FLOAT": 9,
        "T_STRING": 344722,
        "T_REGEXP": 3528,
        "T_ARRAY": 154334,
        "T_HASH": 12191,
        "T_STRUCT": 608,
        "T_BIGNUM": 14,
        "T_FILE": 5,
        "T_DATA": 59356,
        "T_MATCH": 87,
        "T_COMPLEX": 1,
        "T_RATIONAL": 77,
        "T_NODE": 44205,
        "T_ICLASS": 2794
    },
    "user_types": {

    }
}
=> nil
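Output like the above can be summarized with a few lines of Ruby. A sketch, assuming the input is the JSON string printed above with the `total_slots`, `free_slots` and `basic_types` keys shown; `summarize_counts` is a hypothetical helper:

```ruby
require "json"

# Hypothetical helper: parses count_objects-style JSON and reports how many
# slots are in use plus the most numerous basic types.
def summarize_counts(json, top: 5)
  data = JSON.parse(json)
  live = data.fetch("total_slots") - data.fetch("free_slots")
  top_types = data.fetch("basic_types").sort_by { |_type, count| -count }.first(top)
  { live_slots: live, top_types: top_types }
end
```

For the numbers above, the live slots are 754211 - 103849 = 650362, with T_STRING and T_ARRAY dominating; on their own these basic types say little about which of your classes is leaking.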

Re: tracing these object references, I am having a hard time understanding how to trace references. I think the problem is that I don't fully understand the meaning of all the fields in the JSON. How do I know if one object references another?

Thanks for your help!

@Vasfed
Owner

Vasfed commented Apr 26, 2013

count_objects cannot determine that for you, as only you know the structure of your code.

By namespace I mean the C++ term; you can think of it as a root module/class.
For example:

module ThisIsNamespace
  class SomeClass
    # ...
  end
  class SomeAnotherClass
    # ...
  end
end

Classes will be named ThisIsNamespace::SomeClass, ThisIsNamespace::SomeAnotherClass, etc., so you can count their instances without naming them all: HeapDump.count_objects([ThisIsNamespace, SomeOtherNamespace])
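To see the idea spelled out, here is a plain-Ruby approximation of counting by namespace. It is not HeapDump's implementation: it walks `ObjectSpace` (MRI) and matches class names by prefix, and `count_namespace_instances` is a hypothetical name.

```ruby
# Hypothetical helper: counts live instances whose class is nested under one
# of the given namespaces (modules/classes), keyed by class name.
def count_namespace_instances(*namespaces)
  prefixes = namespaces.map { |ns| "#{ns.name}::" }
  GC.start # like HeapDump, collect garbage first so mostly reachable objects remain
  counts = Hash.new(0)
  ObjectSpace.each_object(Object) do |obj|
    name = obj.class.name
    # Skip anonymous classes and classes that override .name with a non-String.
    next unless name.is_a?(String)
    counts[name] += 1 if prefixes.any? { |prefix| name.start_with?(prefix) }
  end
  counts
end
```

Usage would mirror the call above, e.g. `count_namespace_instances(ThisIsNamespace, SomeOtherNamespace)`.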

One object references another if (simplified) it has the other object's id stored in it. Most long numbers in the dump are ids. Each line contains one object, so you can simply grep for the target id: this gives you the object itself and all objects that reference it.
Unfortunately, in the current version there's no way to tell a plain number from an id, but you usually know whether you store id values alongside references (@foo = other_obj.object_id will not produce a hard reference, but it will show up in the dump).
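The grep step can also be done from Ruby. A sketch that relies only on the two facts stated above (one object per line, ids appear as long numbers) and assumes nothing about field names; `lines_mentioning` is hypothetical. Because a plain number can't be told from an id, expect occasional false positives.

```ruby
# Hypothetical helper: returns [line, line_number] pairs for every line of the
# dump that mentions target_id as a whole number (so 123 does not match 1234).
def lines_mentioning(path, target_id)
  pattern = /(?<!\d)#{Integer(target_id)}(?!\d)/
  File.foreach(path).with_index(1).select { |line, _number| line.match?(pattern) }
end
```

One of the matching lines is the object itself; the rest are its referrers, whose ids you can grep for in turn to walk towards a root.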

@kbaum
Author

kbaum commented Apr 26, 2013

In regards to count_objects, I have no idea what classes are leaking, so my first instinct is to just count them all. How do I know where to start? Why not just have a way to count all objects?

RE: tracing object references. I think the format of the JSON should allow generic scripts to make sense of the heap dump without understanding the developer's object model. Imagine how much more usage heap_dump would get if it came with some reusable logic that could analyze your heap for you.

Memprof doesn't work with Ruby 1.9+, but have you seen this presentation?

http://www.scribd.com/doc/30739474/Debugging-Ruby-with-MongoDB.

I think the nice thing about memprof is that it produces json that allows for reusable introspection of the data. The example mongo queries within the presentation should work for anyone's heap.

thx!

@Vasfed
Owner

Vasfed commented Apr 27, 2013

Unfortunately, there's no silver bullet. No one can tell whether an object is actually leaked without understanding the object model and what the program does in general.
Some global variables may store objects for a long time and on purpose, while the same behaviour in other cases may be a leak.

The idea is to separate basic types and library types from yours so that you have less noise, and to compare what you observe with what you expect.
For example, you know (if you are debugging Rails or similar) that there is one controller instance per request, which should be deleted after the request is processed, so make a counter for controllers and see which ones are not deleted.
You do not have to find the leak all at once: go in steps, isolate parts of the program, etc.
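A controller counter along these lines could be sketched with `ObjectSpace.each_object`, which yields live instances of a class and its subclasses (MRI). `live_instances` is hypothetical, and passing `ApplicationController` assumes the usual Rails base class. MRI's GC is conservative, so a few already-dead objects may still be counted; look for counts that keep growing across rounds rather than exact zeros.

```ruby
# Hypothetical helper: counts live instances of base_class and its
# subclasses, keyed by class name, after a full GC.
def live_instances(base_class)
  GC.start
  counts = Hash.new(0)
  ObjectSpace.each_object(base_class) do |obj|
    counts[obj.class.name || obj.class.inspect] += 1
  end
  counts
end
```

Calling something like `live_instances(ApplicationController)` after each batch of requests, once the load is over, shows which controller classes are not being released.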
