
High memory consumption when scanning more than 8M files #28

Closed
jurajazz opened this issue Sep 30, 2020 · 10 comments

@jurajazz

Hi,

I've been using diskusage.exe for more than a year to monitor the most space-consuming directories. It is a very simple and handy tool.
However, while scanning one of my disks (containing logs) with:
Overall info:
Total time: 4h35m55.7915268s
Total dirs: 499520
Total files: 8887953
Total links: 0
Total size: 10.45 Tb
diskusage.exe (on 64-bit Windows) consumed more than 3 GB of RAM. It also took more than 4 hours.

Is there some way to optimize at least the memory consumption, and ideally also the time needed?

Jurajazz

@aleksaan
Owner

aleksaan commented Oct 2, 2020

You're probably right. I have not done memory optimization yet.

@jurajazz
Author

jurajazz commented Oct 2, 2020

I use '-depth 1', so in this case the program only needs to keep the first level in memory while scanning. I think this should not require too much memory. If memory were freed while scanning (e.g. based on the -depth level), it would definitely help keep memory usage low. I believe it can be solved easily.

For information, here is the call stack after exhausting all 4 GB of free RAM while scanning:
diskusage-log-when-memory-exhausted.txt

All the best.

@aleksaan
Owner

aleksaan commented Oct 2, 2020

> I think this should not require too much memory.

It isn't so. Depth defines only which results will be printed; a full scan must still be done to know the size of the root directory.

@jurajazz
Author

jurajazz commented Oct 4, 2020

Yes, a full scan must be done; however, when the scan of one branch is finished, all nodes below '-depth x' could be forgotten (freed), because they are not used later in the printing phase. This can save a large amount of memory while scanning.

Example of a directory structure and printing with '-depth 2':
a

  • a1
    -- a11
    -- a12
    --- a121
  • a2
    -- a21
    -- a22
  • a3

b

  • b1
    -- b11
  • b2
  • b3

c

  • c1
  • c2
    -- c21

  • nodes a11, a12, a121 can be forgotten (freed) when the scan of the a1 branch finishes, because a1 holds all the information needed for printing a and a1

  • nodes a21, a22 could likewise be forgotten (freed) after scanning the a2 branch (see the sketch below)
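
A minimal Go sketch of this idea (hypothetical names and types, not diskusage's actual code): sizes are aggregated bottom-up, and child nodes deeper than the requested depth are dropped as soon as their branch is finished, leaving them to the garbage collector.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// node is a hypothetical directory-tree node; diskusage's real
// structures may differ.
type node struct {
	name     string
	size     int64
	children []*node
}

// scan walks dir, aggregating sizes bottom-up. Children deeper than
// maxDepth are not retained: their sizes are added to the parent and
// the subtree is released for garbage collection.
func scan(dir string, depth, maxDepth int) (*node, error) {
	n := &node{name: dir}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return n, err
	}
	for _, e := range entries {
		if e.IsDir() {
			child, err := scan(filepath.Join(dir, e.Name()), depth+1, maxDepth)
			if err != nil {
				continue // skip unreadable branches
			}
			n.size += child.size
			if depth < maxDepth {
				n.children = append(n.children, child) // keep for printing
			} // else: child and its whole subtree are dropped here
		} else if info, err := e.Info(); err == nil {
			n.size += info.Size()
		}
	}
	return n, nil
}

func main() {
	root, _ := scan(".", 0, 2)
	fmt.Printf("%s: %d bytes, %d retained children\n", root.name, root.size, len(root.children))
}
```

The full tree is still visited, so the root size stays exact; only the levels needed for printing survive the walk.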

@aleksaan
Owner

aleksaan commented Oct 4, 2020

You are right! But that is only a memory optimization; it doesn't decrease the time.

@jurajazz
Author

jurajazz commented Oct 4, 2020

  1. Yes, my suggestion addresses only the memory issue, which blocks me from using diskusage on my logs directories. Implementing the memory optimization would let me use diskusage for monitoring.

  2. For time optimization, you can use one of the nice features of Go: goroutines. E.g. the scan of each root directory (or even each branch) could run in a separate goroutine, though it would probably require some limiting and synchronization; see the sketch after this comment.
    https://www.youtube.com/watch?v=Zg7GK759ZzA

A nice example of using multiple threads to scan the disk tree is WinDirStat (https://windirstat.net/), which runs multithreaded with a limit on the number of threads.
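
A minimal sketch of the limiting and synchronization mentioned above, using a buffered channel as a semaphore to cap concurrent scanners and a WaitGroup to wait for them (hypothetical code, not from diskusage):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
	"sync/atomic"
)

// sem limits the number of concurrently scanning goroutines,
// similar in spirit to WinDirStat's bounded thread count.
var sem = make(chan struct{}, 8)

// scanDir sums file sizes under dir, spawning a bounded goroutine
// per subdirectory. total is updated atomically.
func scanDir(dir string, total *int64, wg *sync.WaitGroup) {
	defer wg.Done()
	sem <- struct{}{}        // acquire a worker slot
	defer func() { <-sem }() // release it when done
	entries, err := os.ReadDir(dir)
	if err != nil {
		return
	}
	for _, e := range entries {
		if e.IsDir() {
			wg.Add(1)
			go scanDir(filepath.Join(dir, e.Name()), total, wg)
		} else if info, err := e.Info(); err == nil {
			atomic.AddInt64(total, info.Size())
		}
	}
}

func main() {
	var total int64
	var wg sync.WaitGroup
	wg.Add(1)
	go scanDir(".", &total, &wg)
	wg.Wait()
	fmt.Printf("total size: %d bytes\n", total)
}
```

Goroutines beyond the cap block cheaply on the semaphore; a production version might use a work queue instead, and in practice disk speed, not CPU, tends to dominate (as aleksaan notes below).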

@aleksaan
Owner

aleksaan commented Oct 21, 2020

@jurajazz, hi

Can you test the new version before I publish it?
diskusage.zip

I made it so that file metrics are not kept in memory if the real depth of a file is greater than the "depth" parameter. I also added a metric for allocated system memory to the results.

I didn't specifically optimize for time; I think execution time depends more on disk speed than on parallel computation.
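
Reading an allocated-memory metric from the Go runtime can be done like this (a sketch; which MemStats field diskusage actually reports is an assumption):

```go
package main

import (
	"fmt"
	"runtime"
)

// reportMemory prints how much memory the Go runtime has obtained
// from the OS; a plausible basis for a "total used memory" metric
// (assumption: the real code may use a different MemStats field).
func reportMemory() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("Total used memory: %.2f Mb\n", float64(m.Sys)/1024/1024)
}

func main() {
	reportMemory()
}
```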

@aleksaan
Owner

@jurajazz please see above

@jurajazz
Author

Hi Alexander,

I started using the Linux du command compiled for Windows (shipped as part of the Git package), which does something similar with minimal memory consumption.

I can also test your new version with the memory optimization; however, instead of a compiled executable, I would prefer to compile it myself for security reasons. Could you please commit the source changes, e.g. into a special branch, or just zip the sources? It would also save me some time if you could describe how you suggest compiling it on Windows.

Juraj
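
For reference, compiling a Go program on Windows usually needs nothing beyond an installed Go toolchain; assuming the sources live at github.com/aleksaan/diskusage (the exact repository path is an assumption):

```
git clone https://github.com/aleksaan/diskusage.git
cd diskusage
go build -o diskusage.exe .
```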

@aleksaan
Owner

aleksaan commented Oct 28, 2020

Oh, OK, the sources are updated.
Also, read the readme: I've added a new metric, "Total used memory".
Please compare this metric with the actual memory consumption.

In my case it was 203 Mb without optimization and 28 Mb with it (for a 101.62 Gb disk, depth=2, limit=20).
And there is a side effect of reducing the total time from 1.45 min to 1.15 min.
