What happened:
My script would terminate after spitting out the following warning:
distributed.worker - WARNING - Memory use is high but worker has no data to store to disk. Perhaps some other process is leaking memory? Process memory: yyy GB -- Worker memory limit: xxx GB
I tried increasing the permissible memory limit, but the same failure would simply occur at a later point. After profiling the function passed to client.submit with memory_profiler, memory use appeared to grow occasionally and incrementally at the line containing time.sleep(zzz). If I comment out the time.sleep call, I never get the memory-leak warning. However, for my current task I require the sleep inside the function passed to client.submit.
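For context, the profiling mentioned above was done roughly like this (a sketch; dummyfunc here stands in for the real submitted function, and memory_profiler must be installed separately):

```python
import time

try:
    from memory_profiler import profile  # line-by-line memory profiler
except ImportError:
    # Fall back to a no-op decorator so the sketch still runs without it.
    def profile(func):
        return func

@profile
def dummyfunc(x):
    # memory_profiler reports per-line memory use, including this sleep.
    time.sleep(x)

if __name__ == "__main__":
    dummyfunc(0.5)
```

Running the script directly (or via `python -m memory_profiler script.py`) prints a per-line memory report for the decorated function, which is how the growth at the time.sleep line showed up.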
Minimal Complete Verifiable Example:
This should be a minimal, barebones example replicating the original script's design/flow.
import time
import psutil
from distributed import Client


def dummyfunc(x, verbose=False):
    # Simulate work; the sleep is where memory growth was observed.
    time.sleep(x)
    if verbose:
        mem = psutil.Process().memory_info().rss / 1024 ** 2
        print("Worker #{:<2} - Mem: {:<.4f} MB".format(x, mem))


def is_workers_available(futures, client):
    # A slot is free if there are fewer pending futures than worker threads.
    return len(futures) < sum(client.nthreads().values())


if __name__ == "__main__":
    n_workers = 3
    runlen = 20
    client = Client(n_workers=n_workers, processes=True, threads_per_worker=1)
    futures = []
    counter = 0
    while counter < runlen:
        if is_workers_available(futures, client):
            futures.append(client.submit(dummyfunc, counter))
            mem = psutil.Process().memory_info().rss / 1024 ** 2
            print("Parent #{:<2} - Mem: {:<.4f} MB".format(counter, mem))
            counter += 1
        # Drop finished futures so new tasks can be submitted.
        futures = [future for future in futures if not future.done()]
    client.close()
    print("Terminating...")
Sample output:
Parent #0 - Mem: 97.8398 MB
Parent #1 - Mem: 97.8633 MB
Parent #2 - Mem: 97.8633 MB
Parent #3 - Mem: 97.8633 MB
Parent #4 - Mem: 98.8164 MB
Parent #5 - Mem: 98.8164 MB
Parent #6 - Mem: 98.9648 MB
Parent #7 - Mem: 98.9648 MB
Parent #8 - Mem: 98.9648 MB
Parent #9 - Mem: 99.1953 MB
Parent #10 - Mem: 99.2578 MB
Parent #11 - Mem: 99.2578 MB
Parent #12 - Mem: 99.5430 MB
Parent #13 - Mem: 99.5430 MB
Parent #14 - Mem: 99.8008 MB
Parent #15 - Mem: 99.8008 MB
Parent #16 - Mem: 100.0781 MB
Parent #17 - Mem: 100.0781 MB
Parent #18 - Mem: 100.3477 MB
Parent #19 - Mem: 100.7656 MB
Terminating...
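To check whether the growth is in the parent process or in the workers, the workers can also be queried directly. A minimal sketch (rss_mb is an illustrative helper; client.run executes a function once on every worker process and returns a dict keyed by worker address):

```python
import psutil

def rss_mb():
    """Resident set size of the calling process, in MB."""
    return psutil.Process().memory_info().rss / 1024 ** 2

if __name__ == "__main__":
    from distributed import Client

    client = Client(n_workers=3, processes=True, threads_per_worker=1)
    # Run rss_mb on each worker process and print the per-worker numbers.
    for addr, mb in client.run(rss_mb).items():
        print("Worker {} - Mem: {:<.4f} MB".format(addr, mb))
    client.close()
```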
I am not sure whether it is expected behaviour for memory to keep growing like this. In my case, when the sleeps are of longer duration (>80 s), the memory blows up quite fast.
The above code snippet was tested on the two environments below:
Environment (1):
- Dask version: 2.27.0
- Python version: 3.6.9
- Operating System: Ubuntu
- Install method (conda, pip, source): pip
Environment (2):
- Dask version: 2.30.0
- Python version: 3.7.9
- Operating System: Ubuntu
- Install method (conda, pip, source): pip
Additional info:
The following may be related: