## Evaluation exercise for Google Summer of Code / Energy cost of scientific software 2023

This exercise profiles a Python package I have written that preprocesses monsoon data and performs principal component analysis on it. The profiling is done with Scalene, an open-source, high-precision CPU and memory profiler.

In [2]:
from mssa import preprocessing as pre
from mssa import mssa

# Load Scalene
%load_ext scalene

Scalene extension successfully loaded. Note: Scalene currently only
supports CPU+GPU profiling inside Jupyter notebooks. For full Scalene
profiling, use the command line version.
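
Because the notebook extension reports only CPU and GPU time, a rough cross-check of wall time and peak memory can be made inside the notebook with the standard library. The sketch below is not part of Scalene or of the `mssa` package; `measure` is a hypothetical helper, and the list-building lambda is only a stand-in workload. `tracemalloc` sees only allocations made through Python's memory allocators, so memory allocated inside native libraries may be missed.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        # Record timing and peak traced memory even if func raises.
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


# Stand-in workload (assumption); in the notebook this could wrap a call
# such as pre.read_data(...).
squares, elapsed, peak = measure(lambda n: [i * i for i in range(n)], 100_000)
print(f"{elapsed:.3f} s, peak {peak / 1e6:.2f} MB")
```

These figures are coarser than Scalene's line-level attribution and are best treated as a sanity check on the command-line profiles.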


**Profile of the data preprocessing:**

In [8]:
%scrun pre.read_data("test-data/TRMM-GPM_pr_Indian_region_1998.nc")

In [10]:
%scrun rainfall_data = pre.moving_mean(pre.stack(pre.read_data("test-data/TRMM-GPM_pr_Indian_region_1998.nc")), 60, 'time')
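
The call above chains three steps: reading the file, stacking, and a 60-step moving mean along `time`. How `pre.moving_mean` is implemented is not shown in this notebook. Purely as an illustration of the operation, a trailing moving mean over a 1-D series might look like the following. The name `moving_mean_1d` and the choice to return NaN until the window is full are assumptions, not the package's behavior.

```python
import math
from collections import deque


def moving_mean_1d(values, window):
    """Trailing moving mean; positions without a full window are NaN."""
    if window < 1:
        raise ValueError("window must be at least 1")
    buf = deque()      # values currently inside the window
    running = 0.0      # sum of the values in buf
    out = []
    for v in values:
        buf.append(v)
        running += v
        if len(buf) > window:
            running -= buf.popleft()
        out.append(running / window if len(buf) == window else math.nan)
    return out


smoothed = moving_mean_1d([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
print(smoothed)  # [nan, nan, 2.0, 3.0, 4.0]
```

The running sum keeps the cost at one addition and one subtraction per sample, independent of the window length.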

In [14]:
!scalene --cli --reduced-profile mssa/preprocessing.py

Memory usage: ■■■▀▀ (max: 30.196 MB, growth rate: 101%)
mssa/preprocessing.py: % of time = 100.00% (1.776s) out of 1.776s.
       ╷       ╷       ╷       ╷        ╷       ╷               ╷       ╷
       │Time   │–––––– │–––––– │Memory  │–––––– │–––––––––––    │Copy   │
  Line │Python │native │system │Python  │peak   │timeline/%     │(MB/s) │mss…
╺━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━┿━━━━━━━┿━━━━━╸
     1 │    9% │    6% │   2%

**Profile of the principal component analysis:**

In [13]:
%scrun mssa.pca(rainfall_data.dropna('time'))
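
How `mssa.pca` computes the components is not shown in this notebook. As a generic illustration only, PCA of a (samples × features) matrix can be sketched with an SVD of the centered data using NumPy. The function `pca_svd`, its return values, and the random test matrix are hypothetical and are not the package's API or data.

```python
import numpy as np


def pca_svd(data, n_components):
    """PCA of a (samples x features) array via SVD of the centered data."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained_variance = s**2 / (data.shape[0] - 1)
    scores = u[:, :n_components] * s[:n_components]
    return scores, vt[:n_components], explained_variance[:n_components]


# Synthetic stand-in data (assumption): 200 time steps, 5 grid points.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))
scores, components, variance = pca_svd(x, n_components=2)
print(scores.shape, components.shape)
```

Most of the work here happens inside the native SVD routine, so a profiler would be expected to attribute it to native rather than Python time.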

In [15]:
!scalene --cli --reduced-profile mssa/mssa.py

Memory usage: ▄■■▀▀ (max: 40.101 MB, growth rate: 100%)
mssa/mssa.py: % of time = 100.00% (5.651s) out of 5.651s.
       ╷       ╷       ╷       ╷        ╷       ╷               ╷       ╷
       │Time   │–––––– │–––––– │Memory  │–––––– │–––––––––––    │Copy   │
  Line │Python │native │system │Python  │peak   │timeline/%     │(MB/s) │mss…
╺━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━┿━━━━━━━━┿━━━━━━━┿━━━━━━━━━━━━━━━┿━━━━━━━┿━━━━━╸
     1 │   60% │   26% │  14%