Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String concatenation memory leak #389

Open
flossy83 opened this issue Feb 21, 2024 · 17 comments
Open

String concatenation memory leak #389

flossy83 opened this issue Feb 21, 2024 · 17 comments

Comments

@flossy83
Copy link

As referenced here

ColorBars().ConvertToYV12().KillAudio()

ScriptClip(last, "Leak(last, current_frame)", after_frame=true, local=false)

function Leak(clip c, int current_frame){

##################################### doesn't leak ####################################
/* string = 
\ "stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring" 
*/

#################################### leaks 1MB/sec ####################################
string = 
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" 

c
}
@pinterf
Copy link

pinterf commented Feb 21, 2024

Until a deeper investigation, as a memo: concatenation is done here:
https://github.com/AviSynth/AviSynthPlus/blob/master/avs_core/core/parser/expression.cpp#L345

I suppose the "memory leak" (string heap increase) is more expressed when you have so many additions. The function VSprintf always makes a copy of the new values atop the string heap as seen here:

https://github.com/AviSynth/AviSynthPlus/blob/master/avs_core/core/avisynth.cpp#L1504

First "string"+"string", then "stringstring" + "string", then "stringstringstring" + "string", etc. are stored. The above example contains 104 elements, 6, 12, 18, 24, ... 624 bytes (over 32000 bytes plus overhead) are stored on the heap after the additions, and it seems that this is done in each function call during the function body evaluation.

@flossy83
Copy link
Author

flossy83 commented Feb 26, 2024

Random idea: add a new Internal String Function like eg. CatStr(string) which uses different code to do the concatenation without without risking breaking any + operator functionality.

@flossy83
Copy link
Author

Oh Format can do concatenation:

  s1="string1"  s3="string3"
# cat = s1 + "string2" + s3 + "string4"
  cat = Format("{s1}{}{s3}{}", "string2", "string4") 

  s5="string5"  s7="string7"
# cat = cat + "\n" + s5 + "string6" + s7 + "string8
  cat = Format("{cat}{}{s5}{}{s7}{}", "\n", "string6", "string8") 

result:
string1string2string3string4
string5string6string7string8

Memory usage seems to depend on how you use it...

#################################### leaks 1MB/sec ####################################
cat = "string"
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")		cat = Format("{cat}{}", "string") 


#################################### leaks 20kB/sec ###################################
/* 
cat = Format("{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string")  */

@pinterf
Copy link

pinterf commented Feb 27, 2024

Yep, it depends on when strings are pushed on the (always incrementing) heap. In the second - small leak - case the string constants already exist they are probably evaluated once, during function instantiating. There is one single result (cat=) that must be pushed on the heap (and the 8*13 "string" of course). Inside "Format" there is no extra memory penalty since the temporary results are c++ std::string and not real Avisynth "AVSValue".

The first - big leak - one is pushing each intermediate result on the string heap, since each evaluation is a new AVSValue which must be stored.

I had a deeper look into the parser, but I could not find a way how to recognise when multiple string constants are added together and replace them with a single multi-addition.

@flossy83
Copy link
Author

How come math operations like a = b + c don't leak? Are the numbers not pushed onto the heap as well?

I thought at the end of the ScriptClip all local variables would be cleared anyway since they are no longer accessible at the next frame (which is why I have to make all my runtime vars globals so they don't disappear at the next frame).

In any case I think using Format to join strings is a good enough solution and maybe it's not worth pulling apart the Avisynth core to try and get the root cause, unless you feel motivated enough to do it that is.

@pinterf
Copy link

pinterf commented Feb 29, 2024

Strings are pointers to buffers. Avisynth does not know if a pointer points to a buffer that exists forever (e.g. static constant in program code) or just exists in a temporary buffer. To be sure, Avisynth always has to make a safe copy from all strings that would be later used in script. Avisynth cannot determine by just a pointer if the specific memory area is volatile or not. The heap (better said: string store) is ever increasing. If this happens in runtime, with significant amount of new strings per frame than we'll see a more serious memory consumption.

@Asd-g
Copy link

Asd-g commented Apr 9, 2024

The recent days I experienced this behavior. I had to store Max(RplaneMax, GPlaneMax, BPlaneMax) (1) for every frame and then sort the values ascending. The easiest way to sort values I found is RT_TxtSort (Function to sort a Chr(10) separated multiline string.) I stored the value (1) for every frame in such Chr(10) separated multiline string. The memory usage was acceptable for a clip with few hundred frames but the memory usage was ~24GB after ~70000 frames. I decided to store the values (1) in temporary file because then easily they can be loaded as a Chr(10) separated multiline string.

@pinterf
Copy link

pinterf commented Apr 11, 2024

I experimented with a string cache, which helps in the topic starter script.

Before

Frames processed:                   43460 (0 - 43459)
FPS (min | max | average):          373.7 | 1837 | 1665
Process memory usage (max):         1419 MiB

After

Frames processed:                   42140 (0 - 42139)
FPS (min | max | average):          369.9 | 1741 | 1626
Process memory usage (max):         60 MiB

The speed is a bit slower in this specific test. The extra check whether the strings are in the cache, require a little more time. Despite of this example is an artificial one, this solution probably hugely helps in such cases when most of the strings in ScriptClip are repeatable.

@pinterf
Copy link

pinterf commented Apr 15, 2024

Test build, x64, no commit yet, I'd just like to see how it is working on your side in your usual workflow.
Crossposted to #379
See readme txt.
https://drive.google.com/uc?export=download&id=1IznUhi6-7o8bRJoGHQsF6zAaBWJkKeNg

@flossy83
Copy link
Author

flossy83 commented Apr 16, 2024

Test build, x64, no commit yet, I'd just like to see how it is working on your side in your usual workflow. Crossposted to #379 See readme txt. https://drive.google.com/uc?export=download&id=1IznUhi6-7o8bRJoGHQsF6zAaBWJkKeNg

I can report the synthetic test in the OP is resolved on my end.

As far as my real world usage goes the issue persists. Possibly as I'm concatenating in a different way inside my ScriptClips, eg. string = String(some_int) + "some text" + String(some_float) + "some more text". I'll try and provide you with a demo script if you're motivated to continue with this. I am satisfied with the Format workaround and have already converted all my scripts to use Format instead of +, but can help with this issue further if needed.

@pinterf
Copy link

pinterf commented Apr 16, 2024

Yes, in cases, where the string is built from formatted numbers, which are probably different for each frame, then my method would not help (unless integers 16-240 are involved, because they form a finite number of the possible strings). This change does not do any harm otherwise, and sometimes it helps a lot, so I'm gonna keep this mod. Thanks, I don't need more demo scripts, I see the problem of the other use cases.

@pinterf
Copy link

pinterf commented Apr 16, 2024

Anyway, I wonder if Asd-g's 24 GB@70000 frames has been decreased a bit or not.

@Asd-g
Copy link

Asd-g commented Apr 16, 2024

No change.

Just for info:

  • This is the draft I use for the test.
  • This is the used sample.
  • This is the script (requires also RT_Stats, avsresize):
FFVideoSource("Sony 4K HDR Demo - Camping in Nature.mp4")
Loop(20)
HDR10_Content_Metadata()

@pinterf
Copy link

pinterf commented Apr 17, 2024

Hi, downloaded the sample, despite the zip file name "Sony Camping In Nature HDR 4K Demo.zip", inside there is a "Sony Camp 4K Demo.mp4" file instead of "Sony 4K HDR Demo - Camping in Nature.mp4" you are using in the script.
And the script exits with "HDR10_Content_Metadata: the clip must have _Transfer frame property 16.". I don't know if they are really different and you just renamed the original one?
I'm using a latest https://github.com/FFMS/ffms2/releases/tag/5.0-RC3

@Asd-g
Copy link

Asd-g commented Apr 17, 2024

Sorry, from here you can download Sony 4K HDR Demo - Camping in Nature.mp4.

This is the ffms2 version I use.

@pinterf
Copy link

pinterf commented Apr 17, 2024

@Asd-g
Copy link

Asd-g commented Apr 18, 2024

Thanks, I will test it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants