Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

task::spawn cause memory leak #702

Closed
gmg137 opened this issue Feb 11, 2020 · 9 comments
Closed

task::spawn cause memory leak #702

gmg137 opened this issue Feb 11, 2020 · 9 comments

Comments

@gmg137
Copy link

gmg137 commented Feb 11, 2020

I have a download function:

pub(crate) async fn download_music(url: &str, path: &str) -> Result<(), surf::Exception> {
    if url.starts_with("http://") || url.starts_with("https://") {
        let music_url = url.replace("https:", "http:");
        let client = surf::Client::new();
        let buffer = client.get(&music_url).recv_bytes().await?;
        if !buffer.is_empty() {
            fs::write(path, &buffer).await?;
        }
    }
    Ok(())
}

Executing with task :: spawn will cause a significant increase in memory usage and will not be released after the download is complete.

    for _ in 0..100 {
        task::spawn(async {
            let url = "https://***.mp3";
            download_music(url, "***.mp3").await.ok();
        });
    }

With task :: block_on there is no memory leak.

    for _ in 0..100 {
        task::block_on(async {
            let url = "https://***.mp3";
            download_music(url, "***.mp3").await.ok();
        });
    }
@ghost
Copy link

ghost commented Feb 11, 2020

Can you try the following?

task::spawn(async {});
for _ in 0..100 {
    task::block_on(async {
        let url = "https://***.mp3";
        download_music(url, "***.mp3").await.ok();
    });
}

And also this?

for _ in 0..100 {
    task::spawn(async {
        let url = "https://***.mp3";
        download_music(url, "***.mp3").await.ok();
    })
    .await;
}

I wonder if those two pieces of code also exhibit the memory leak?

@skade
Copy link
Collaborator

skade commented Feb 11, 2020

Can you quantify "significant"? task::spawn will start the runtime lazily on the first call, which will be kept around. This spins up the runtime threads. Given that threads on e.g. Linux use at least one memory page (4MB on my machine), you will see a jump depending on the number of cores your machine has. Those threads will be kept around, this is not a leak.

block_on will use the current thread to drive and not start the runtime.

Do you see a leak per spawn?

@gmg137
Copy link
Author

gmg137 commented Feb 12, 2020

@stjepang Have tested your code, the former did not find a memory leak, the latter still has a memory leak!

@gmg137
Copy link
Author

gmg137 commented Feb 12, 2020

@skade Hello, not every time spawn is called.
I tested downloading two files of different sizes separately:
File 1
File size: 4.3M
Starting memory: 709k
End of download memory: 59.1 M
Memory increase every time: 2-10M

File 2
File size: 462k
Starting memory: 704k
End of download memory: 5.1M
Memory increase per time: 0.2-1 M

According to observations, not every spawn call will increase the fixed memory.

@ghost
Copy link

ghost commented Feb 14, 2020

Can you perhaps try running with a different memory allocator? Two suggestions:

I wonder if the memory use is simply due to the allocator lazily returning memory back to the OS?

@gmg137
Copy link
Author

gmg137 commented Feb 15, 2020

@stjepang
I tested jemallocator and mamallocator
The results are as follows:
jemallocator
Memory footprint during download: 530 M
After downloading: 158 M

mimallocator
Memory footprint during download: 627 M
After downloading: 627 M (not free)

@gmg137
Copy link
Author

gmg137 commented Feb 15, 2020

I found that the following code also causes a memory leak

    task::spawn(async {
        for _ in 0..100 {
            let url = "https://***.mp3";
            download_music(url, "***.mp3").await.ok();
        }
    });

@ghost
Copy link

ghost commented Feb 15, 2020

Okay, I think I see what is happening. The allocator has a separate memory pool for each thread. At first, memory usage is low because the task is running on just one thread. However, then another thread steals the task and keeps running it.

Every time a new thread steals the task, the allocator will initialize its thread-local memory pool, and thus the total memory usage keeps growing. So this is not really a memory leak, it's just how allocators work.

I think the new scheduler would exhibit lower total memory usage: #631
The reason why I believe that is because the new scheduler steals tasks much less aggressively.

@gmg137
Copy link
Author

gmg137 commented Feb 16, 2020

@stjepang Thanks i think i understand.

@gmg137 gmg137 closed this as completed Feb 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants