# Monitoring of CARS memory consumption during full_res_dsm_pipeline step
This notebook shows how to load the data and plot graphs to monitor memory consumption during execution of the CARS `full_res_dsm_pipeline` step with Dask.

## Necessary imports

In [1]:
import numpy as np
import matplotlib as mp
import matplotlib.pyplot as plt
import os

## Reading Data
* `output_dir` should be replaced by the output folder of the compute DSM step.
* `nb_workers_per_pbs_job` is the number of worker processes per PBS job (2 by default).
* `nb_pbs_jobs` is the number of PBS jobs (the number of workers divided by `nb_workers_per_pbs_job`).

In [2]:
output_dir = 'TODO'
nb_workers_per_pbs_job = 2
nb_pbs_jobs = 100

The next cell reads the data files that are available for each worker. Note that the data are updated as `compute_dsm` runs, so you can track progress by re-executing the cell.

In [9]:
data = []
for i in range(0,nb_workers_per_pbs_job):
    for j in range(0,nb_pbs_jobs):
        data_file = os.path.join(output_dir,'dask_log','memory_{}-{}.npy'.format(j,i))
        if os.path.isfile(data_file):
            data.append(np.load(data_file))
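
The loading loop above discards which worker each array came from. As a minimal sketch, the variant below keeps that information by returning a dictionary keyed by `(job, worker)`. The function name `load_worker_logs` and the `COL_*` constants are hypothetical names introduced here, and the column layout is only inferred from the plotting cells of this notebook, not taken from the CARS documentation.

```python
import os
import numpy as np

# Column layout inferred from the plots in this notebook (an assumption):
# 0: elapsed time, 1: number of point cloud datasets, 2: point cloud memory,
# 3: number of raster datasets, 4: raster memory, 5: worker process memory.
(COL_TIME, COL_NB_POINTS, COL_MEM_POINTS,
 COL_NB_RASTERS, COL_MEM_RASTERS, COL_MEM_PROCESS) = range(6)


def load_worker_logs(output_dir, nb_pbs_jobs, nb_workers_per_pbs_job):
    """Return a dict mapping (job, worker) to the array loaded for that worker.

    Workers whose log file does not exist yet are skipped.
    """
    logs = {}
    for job in range(nb_pbs_jobs):
        for worker in range(nb_workers_per_pbs_job):
            path = os.path.join(
                output_dir, "dask_log", "memory_{}-{}.npy".format(job, worker)
            )
            if os.path.isfile(path):
                logs[(job, worker)] = np.load(path)
    return logs
```

Calling `list(load_worker_logs(output_dir, nb_pbs_jobs, nb_workers_per_pbs_job).values())` should give the same arrays as the `data` list, in a different order.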

## Number of datasets in memory
The next cell shows the number of datasets (point clouds, rasters and total, from left to right) for each worker.

In [None]:
fig, axes = plt.subplots(ncols=3, nrows=1, figsize=(20, 5), sharey=True)
titles = ["Number of point cloud datasets stored against time",
          "Number of raster datasets stored against time",
          "Total number of datasets stored against time"]
# Set up each axis once instead of once per worker
for ax, title in zip(axes, titles):
    ax.set_title(title)
    ax.set_xlabel("Elapsed time in seconds")
    ax.set_ylabel("Number of datasets")
    ax.grid(True)
# One curve per worker: column 0 is elapsed time, columns 1 and 3 are dataset counts
for d in data:
    axes[0].plot(d[:, 0], d[:, 1])
    axes[1].plot(d[:, 0], d[:, 3])
    axes[2].plot(d[:, 0], d[:, 1] + d[:, 3])

## Estimated memory consumed by datasets
The next cell shows the estimated memory consumed by those datasets (point clouds, rasters and total, from left to right) for each worker.

In [None]:
fig, axes = plt.subplots(ncols=3, nrows=1, figsize=(20, 5), sharey=True)
titles = ["Estimated point cloud used memory against time",
          "Estimated raster used memory against time",
          "Total estimated memory for datasets against time"]
# Set up each axis once instead of once per worker
for ax, title in zip(axes, titles):
    ax.set_title(title)
    ax.set_xlabel("Elapsed time in seconds")
    ax.set_ylabel("Memory in MB")
    ax.grid(True)
# One curve per worker: columns 2 and 4 are estimated memory, converted to MB
for d in data:
    axes[0].plot(d[:, 0], d[:, 2] / 1000000)
    axes[1].plot(d[:, 0], d[:, 4] / 1000000)
    axes[2].plot(d[:, 0], (d[:, 2] + d[:, 4]) / 1000000)
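
When many workers are plotted, individual curves can be hard to read. As a rough complement to the figure, the sketch below reduces each worker's record to one number, the peak of the total estimated dataset memory. The helper name `peak_dataset_memory_mb` is hypothetical, and the code assumes, as the plots above do, that columns 2 and 4 hold byte counts and that each array has at least one row.

```python
import numpy as np


def peak_dataset_memory_mb(d):
    """Peak of (point cloud + raster) estimated memory for one worker, in MB.

    Assumes columns 2 and 4 hold byte counts, as in the plots above.
    """
    total = d[:, 2] + d[:, 4]
    return float(total.max()) / 1000000
```

For example, `max(peak_dataset_memory_mb(d) for d in data)` would give the largest peak across all loaded workers.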

## Memory used by workers processes
In addition, the figure below shows the memory consumed by the full Python processes of the workers, as estimated by `psutil`. The right graph shows the total estimated memory for datasets, for the sake of comparison.

In [None]:
fig, (ax0, ax1) = plt.subplots(ncols=2, nrows=1, figsize=(20, 5), sharey=True)
titles = ["Worker process memory against time",
          "Total estimated memory for datasets against time"]
# Set up each axis once instead of once per worker
for ax, title in zip((ax0, ax1), titles):
    ax.set_title(title)
    ax.set_xlabel("Elapsed time in seconds")
    ax.set_ylabel("Memory in MB")
    ax.grid(True)
# One curve per worker: column 5 is the process memory, converted to MB
for d in data:
    ax0.plot(d[:, 0], d[:, 5] / 1000000)
    ax1.plot(d[:, 0], (d[:, 2] + d[:, 4]) / 1000000)
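
The comparison between the two graphs can also be made numerically. The sketch below computes, for one worker, the gap between the process memory and the estimated dataset memory at each sample. The helper name `memory_overhead_mb` is hypothetical, and the column meanings (2 and 4 for dataset memory, 5 for process memory, all in bytes) are assumed from the plots in this notebook.

```python
import numpy as np


def memory_overhead_mb(d):
    """Process memory minus estimated dataset memory for one worker, in MB.

    Returns one value per time sample. Assumes columns 2, 4 and 5 hold
    byte counts, as in the plots above.
    """
    return (d[:, 5] - (d[:, 2] + d[:, 4])) / 1000000
```

A large or growing value of `memory_overhead_mb(d)` suggests that memory is held by something other than the tracked datasets.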