# Memory Profiler

`memory_profiler` is a Python module for monitoring the memory consumption of a process and for line-by-line analysis of memory consumption in Python programs. It is a pure Python module that depends on the **[psutil](http://pypi.python.org/pypi/psutil)** module. Memory profiling measures the space complexity (memory consumption) of a program or process. <br>

Keep in mind that memory_profiler itself consumes a **significant amount of memory**. Use it during development and avoid it in production. <br>
**Ways to profile the memory usage of Python code with memory_profiler** <br>
- **@profile decorator** - Profiles the memory usage of individual Python functions and reports statistics for each line of code.
- **mprof shell command** - Profiles the memory usage of a whole Python script (".py" file) as a function of time, so usage can be analyzed over the run rather than per line.
- **memory_usage() function** - Profiles the memory usage of a process, Python statement, or Python function over a specified time interval.
- **mprun & memit line/cell magic commands for Jupyter Notebook** - Profile the memory usage of a single Python statement or of a whole cell in Jupyter Notebook.

**To learn more about memory_profiler, visit the [GitHub repository](https://github.com/pythonprofilers/memory_profiler).**

## Installation

Install via pip:

    $ pip install -U memory_profiler

The package is also available on [conda-forge](https://github.com/conda-forge/memory_profiler-feedstock).

To install from source, download the package, extract and type:

    $ pip install .

## 1. Example to use @profile Decorator

Place the **@profile** decorator above the function you want to profile. <br>
Create a file **example1.py** with the code below, **uncommented**.<br>

In [1]:
# from memory_profiler import profile
# import random

# @profile
# def random_number_generator():
#     arr1 = [random.randint(1,10) for i in range(100000)]
#     arr2 = [random.randint(1,10) for i in range(100000)]
#     arr3 = [arr1[i]+arr2[i] for i in range(100000)]
#     del arr1
#     del arr2
#     tot = sum(arr3)
#     del arr3
#     print(tot)

# if __name__ == "__main__":
#     random_number_generator()

In [2]:
!python -m memory_profiler example1.py

1101973
Filename: example1.py

Line #    Mem usage    Increment  Occurrences   Line Contents
     4     41.2 MiB     41.2 MiB           1   @profile
     5                                         def random_number_generator():
     6     42.5 MiB      1.3 MiB      100003       arr1 = [random.randint(1,10) for i in range(100000)]
     7     43.4 MiB      0.9 MiB      100003       arr2 = [random.randint(1,10) for i in range(100000)]
     8     44.2 MiB      0.9 MiB      100003       arr3 = [arr1[i]+arr2[i] for i in range(100000)]
     9     43.5 MiB     -0.8 MiB           1       del arr1
    10     42.7 MiB     -0.8 MiB           1       del arr2
    11     42.7 MiB      0.0 MiB           1       tot = sum(arr3)
    12     42.0 MiB     -0.8 MiB           1       del arr3
    13     42.0 MiB      0.0 MiB           1       print(tot)




The output has 5 columns:

- **Line #**: Line number in the source file.
- **Mem usage**: Memory used by the Python interpreter after the line has executed.
- **Increment**: Difference in memory usage between the current line and the previous one, i.e. the memory consumed (or released, when negative) by that line.
- **Occurrences**: Number of times the line was executed.
- **Line Contents**: Python code at that line number.

Use the Mem usage column to track the total memory held by the Python interpreter and the Increment column to see what each line costs. Together they show how much memory the program uses overall and which statements or variables account for it, which is the information needed to reduce memory consumption before code goes to production.
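Because the report is plain text, its columns can also be read programmatically. The following is a minimal sketch, assuming the column layout shown in the output above; `parse_report` and the regular expression are hypothetical helpers written for this guide and are not part of memory_profiler.

```python
import re

# One measured row of the report: line number, "Mem usage", "Increment",
# "Occurrences", then the source text. Rows without numbers do not match.
ROW = re.compile(
    r"^\s*(\d+)\s+(-?\d+(?:\.\d+)?) MiB\s+(-?\d+(?:\.\d+)?) MiB\s+(\d+)\s*(.*)$"
)


def parse_report(text):
    """Return a list of dicts, one per measured row of the report text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header, blank line, or a row with no measurement
        line_no, mem, inc, occ, code = match.groups()
        rows.append({
            "line": int(line_no),
            "mem_mib": float(mem),
            "inc_mib": float(inc),
            "occurrences": int(occ),
            "code": code.strip(),
        })
    return rows


# A few rows copied from the example1.py report above.
SAMPLE = """\
Line #    Mem usage    Increment  Occurrences   Line Contents
     4     41.2 MiB     41.2 MiB           1   @profile
     5                                         def random_number_generator():
     6     42.5 MiB      1.3 MiB      100003       arr1 = [random.randint(1,10) for i in range(100000)]
     7     43.4 MiB      0.9 MiB      100003       arr2 = [random.randint(1,10) for i in range(100000)]
     9     43.5 MiB     -0.8 MiB           1       del arr1
"""

rows = parse_report(SAMPLE)
# The first measured row carries the interpreter baseline, so skip it.
costliest = max(rows[1:], key=lambda row: row["inc_mib"])
print(costliest["line"], costliest["inc_mib"])  # prints: 6 1.3
```

Skipping the first row is a design choice: its Increment equals the memory already held when the function starts, so it would otherwise always win.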

### Example to use high precision and save results to log file

Create a file **example2.py** with the code below, **uncommented**.<br>

In [3]:
# from memory_profiler import profile
# import random
# fp = open("./Memory_Profiler/example_report.log", "w+")
# @profile(precision=4,stream=fp)
# def random_number_generator():
#     arr1 = [random.randint(1,10) for i in range(100000)]
#     arr2 = [random.randint(1,10) for i in range(100000)]
#     arr3 = [arr1[i]+arr2[i] for i in range(100000)]
#     del arr1
#     del arr2
#     tot = sum(arr3)
#     del arr3
#     print(tot)

# if __name__ == "__main__":
#     random_number_generator()

In [4]:
!python -m memory_profiler example2.py

1101948


To view the results, use the **cat** command:

In [5]:
!cat "example_report.log"

Filename: example2.py

Line #    Mem usage    Increment  Occurrences   Line Contents
     4  41.2031 MiB  41.2031 MiB           1   @profile(precision=4,stream=fp)
     5                                         def random_number_generator():
     6  42.7539 MiB   1.5508 MiB      100003       arr1 = [random.randint(1,10) for i in range(100000)]
     7  43.4219 MiB   0.6680 MiB      100003       arr2 = [random.randint(1,10) for i in range(100000)]
     8  44.2266 MiB   0.8047 MiB      100003       arr3 = [arr1[i]+arr2[i] for i in range(100000)]
     9  43.5273 MiB  -0.6992 MiB           1       del arr1
    10  42.7617 MiB  -0.7656 MiB           1       del arr2
    11  42.7617 MiB   0.0000 MiB           1       tot = sum(arr3)
    12  41.9961 MiB  -0.7656 MiB           1       del arr3
    13  41.9961 MiB   0.0000 MiB           1       print(tot)




### Example to use **memory_profiler** for **intelligent_indexing** ref kit 

The **[Intelligent Indexing](https://github.com/oneapi-src/intelligent-indexing)** ref kit demonstrates one way of building an NLP pipeline that classifies documents by topic and describes how the **Intel® AI Analytics Toolkit (AI Kit)** can be used to accelerate that pipeline.

The **Intel® AI Analytics Toolkit (AI Kit)** is used to get results quickly even when a model's data set is very large. It allows code written in different languages to be reused so that hardware utilization is optimized.

The **Intelligent Indexing** ref kit enables several Intel® oneAPI optimizations, including:
- **[Intel® Distribution of Modin*](https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-of-modin.html#gs.v03x2l)** <br>
  The Intel® Distribution of Modin* is a performant, parallel, and distributed dataframe system designed to make data scientists more productive. It provides drop-in acceleration for existing **pandas** workflows with no new API to learn, integrates with the Python* ecosystem, and scales across multiple cores with Ray* and Dask* clusters.
- **[Intel® Extension for Scikit-learn*](https://www.intel.com/content/www/us/en/developer/tools/oneapi/scikit-learn.html)** <br>
  Intel® Extension for Scikit-learn* is designed for data scientists as a seamless way to speed up scikit-learn applications. The extension dynamically patches scikit-learn estimators to use the Intel® oneAPI Data Analytics Library (oneDAL) as the underlying solver, which speeds up machine learning algorithms out of the box.

**NOTE** Visit the **[Intelligent Indexing](https://github.com/oneapi-src/intelligent-indexing)** ref kit page to learn more about the kit.
- Follow the steps in the GitHub repo to clone it and create the environments.
- After creating the environments, install **memory_profiler** in both **doc_class_stock** and **doc_class_intel** using
```
pip install -U memory_profiler
```
We will use **memory_profiler** to profile this workload.

Add the **@profile** decorator above each function you want to profile. <br>
Modify **run_benchmarks.py** (located at '../intelligent-indexing/src/run_benchmarks.py') to add **@profile** decorators as shown below. <br>

**get_data() function**
```
from memory_profiler import profile
@profile(precision=4,stream = fp)
def get_data(path_to_csv: str) -> pd.DataFrame:
    """Read in and clean data
    Args:
        path_to_csv (str): processed data
    """
    data = pd.read_csv(path_to_csv)[
        ['category', 'headline', 'short_description', 'link']
    ]
    data = data.dropna(subset=['headline', 'short_description', 'link'])

    data.link = data.link.apply(clean_link)
    data.short_description = data.short_description \
        .apply(clean_short_description)
    data.headline = data.headline.apply(clean_headline)

    data['text'] = data.link + " " + data.short_description \
        + " " + data.headline
    data['tokens'] = data.text.apply(tokenize)
    return data
```
<br>

**Create a function train_data()**
```
@profile(precision=4,stream = fp)
def train_data(train,test):
    vectorizer = TfidfVectorizer(
        min_df=50,
        lowercase=False,
        tokenizer=lambda x: x)

    svc = SVC()
    svc.fit(vectorizer.fit_transform(train.tokens), train.category)
    training_time = time.time()
    y_pred = svc.predict(vectorizer.transform(test.tokens))
    return svc, training_time, y_pred
```

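**NOTE**: To save the log files at the appropriate locations, define `fp` before the decorated functions, since `stream=fp` is evaluated when each function is defined. <br>
**While profiling the stock packages**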
```
fp = open("../../Profiling_Guide/Memory_Profiler/Memory_Profiler_Results/stock_results/stock_report.log", "w+") ## For stock results
```
**While profiling the Intel-optimized packages**
```
fp = open("../../Profiling_Guide/Memory_Profiler/Memory_Profiler_Results/oneapi_optimized_results/intel_report.log", "w+") ## For oneapi_optimized_results
```
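Opening a file in `"w+"` mode raises `FileNotFoundError` when the parent directory does not exist, which is easy to hit with nested result paths like the ones above. A minimal sketch of one way to guard against that is shown here; `open_report` is a hypothetical helper, and the temporary directory only stands in for the real results folder.

```python
import os
import tempfile


def open_report(path):
    """Create the parent directory if needed, then open the log for writing."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return open(path, "w+")


# Demonstration in a throwaway directory; in run_benchmarks.py the argument
# would be one of the report paths shown above.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "stock_results", "stock_report.log")

fp = open_report(demo_path)
fp.write("profiler output is written here\n")
fp.close()  # close the stream once profiling has finished

print(os.path.exists(demo_path))  # prints: True
```

The returned file object can then be passed as `stream=fp` in the decorators, in the same way as the plain `open(...)` calls above.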

#### Profile Intelligent Indexing Ref Kit with Stock packages

To run the profiler on the Intelligent Indexing ref kit: <br>
- Navigate to the **intelligent-indexing/src/** directory in a terminal
- ```conda activate doc_class_stock```
- Execute the command below

In [None]:
# python -m memory_profiler run_benchmarks.py -l "../logs/stock_stock.log"

To view the results, run the command below from the **intelligent-indexing/src/** directory

In [None]:
# cat "../../Profiling_Guide/Memory_Profiler/Memory_Profiler_Results/stock_results/stock_report.log"

#### Profile Intelligent Indexing Ref Kit with Intel oneAPI optimized packages

To run the profiler on the Intelligent Indexing ref kit: <br>
- Navigate to the **intelligent-indexing/src/** directory in a terminal
- ```conda activate doc_class_intel```
- Execute the command below

In [None]:
# python -m memory_profiler run_benchmarks.py -i -l "../logs/intel_intel.log"

To view the results, run the command below from the **intelligent-indexing/src/** directory

In [None]:
# cat "../../Profiling_Guide/Memory_Profiler/Memory_Profiler_Results/oneapi_optimized_results/intel_report.log"

## 2. Example to use **mprof** 

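**mprof** records the memory usage of the whole script over time. The **@profile** decorator is optional here; when it is present, the line-by-line report is still printed, as in the output below. <br>
Create a file **example1.py** with the code below, **uncommented**.<br>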

In [6]:
# from memory_profiler import profile
# import random

# @profile
# def random_number_generator():
#     arr1 = [random.randint(1,10) for i in range(100000)]
#     arr2 = [random.randint(1,10) for i in range(100000)]
#     arr3 = [arr1[i]+arr2[i] for i in range(100000)]
#     del arr1
#     del arr2
#     tot = sum(arr3)
#     del arr3
#     print(tot)

# if __name__ == "__main__":
#     random_number_generator()

In [7]:
!mprof run example1.py

mprof: Sampling memory every 0.1s
running new process
running as a Python program...
1098869
Filename: example1.py

Line #    Mem usage    Increment  Occurrences   Line Contents
     4     41.1 MiB     41.1 MiB           1   @profile
     5                                         def random_number_generator():
     6     42.5 MiB      1.4 MiB      100003       arr1 = [random.randint(1,10) for i in range(100000)]
     7     43.4 MiB      0.9 MiB      100003       arr2 = [random.randint(1,10) for i in range(100000)]
     8     44.1 MiB      0.7 MiB      100003       arr3 = [arr1[i]+arr2[i] for i in range(100000)]
     9     43.4 MiB     -0.7 MiB           1       del arr1
    10     42.6 MiB     -0.8 MiB           1       del arr2
    11     42.6 MiB      0.0 MiB           1       tot = sum(arr3)
    12     41.9 MiB     -0.8 MiB           1       del arr3
    13     41.9 MiB      0.0 MiB           1       print(tot)




The command above executes the script and writes the samples to a new file named **mprofile_[current_datetime].dat**.

**mprof run** accepts the following parameters:
- **--interval INTERVAL or -T INTERVAL** - "mprof" samples memory usage every 0.1 seconds by default, as the output above shows. This parameter overrides the sampling interval, given in seconds.
- **--timeout TIMEOUT or -t TIMEOUT** - By default, "mprof" monitors the program for its whole execution. This parameter stops monitoring after the given number of seconds.
- **--output FILENAME or -o FILENAME** - Writes the profiling results to the given file instead of the default "mprofile_datetime.dat".
- **--backend BACKEND** - Selects the backend used for profiling. The default is "psutil"; sticking to the default is recommended, as the other backends do not seem reliable yet.
- **--include-children** - Monitors memory usage across all child processes and shows the combined usage as one line in the chart.
- **--multiprocess** - Plots a separate line for each sub-process showing its memory usage over time.
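To show how these flags combine on one command line, here is a small sketch that assembles an `mprof run` invocation as an argument list. `build_mprof_command` is a hypothetical helper written for this guide; only the flags listed above are used, and the file names are placeholders.

```python
import shlex


def build_mprof_command(script, interval=None, timeout=None, output=None,
                        include_children=False):
    """Assemble an `mprof run` command as an argument list."""
    cmd = ["mprof", "run"]
    if interval is not None:
        cmd += ["--interval", str(interval)]  # sampling period in seconds
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]    # stop monitoring after N seconds
    if output is not None:
        cmd += ["--output", output]           # replaces mprofile_<datetime>.dat
    if include_children:
        cmd.append("--include-children")      # add child-process memory
    cmd.append(script)
    return cmd


cmd = build_mprof_command("example1.py", interval=0.5, timeout=30,
                          output="example1.dat")
print(shlex.join(cmd))
# prints: mprof run --interval 0.5 --timeout 30 --output example1.dat example1.py
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` in an environment where memory_profiler is installed, or the printed line can be pasted into a terminal.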

In [9]:
!mprof plot -o mprof_example1.png

Using last profile data.


**mprof plot** accepts parameters such as:
- **-o**: Save the plot to an output file
- **-t**: Set the title of the plot <br>

By default it plots the most recent .dat file created by mprof.

### Example to use **mprof** for **intelligent_indexing** ref kit 

Add the **@profile** decorator above each function you want to profile. <br>
Modify run_benchmarks.py to add **@profile** decorators as shown in the **intelligent-indexing** example above.<br>


#### Profile Intelligent Indexing Ref Kit with Stock packages

To run the profiler on the Intelligent Indexing ref kit: <br>
- Navigate to the **intelligent-indexing/src/** directory in a terminal
- ```conda activate doc_class_stock```
- Execute the command below

In [None]:
# mprof run --output '../../Profiling_Guide/Memory_Profiler/Memory_Profiler_Results/stock_results/stock_output.dat' --python python run_benchmarks.py -l "../logs/stock_stock.log"

To visualize the results, run the command below from the **Memory_Profiler/Memory_Profiler_Results/stock_results/** directory

In [None]:
# mprof plot -o stock_output.png

#### Profile Intelligent Indexing Ref Kit with Intel oneAPI optimized packages

To run the profiler on the Intelligent Indexing ref kit: <br>
- Navigate to the **intelligent-indexing/src/** directory in a terminal
- ```conda activate doc_class_intel```
- Execute the command below

In [None]:
# mprof run --output '../../Profiling_Guide/Memory_Profiler/Memory_Profiler_Results/oneapi_optimized_results/intel_output.dat' --python python run_benchmarks.py -i -l "../logs/intel_intel.log"

To visualize the results, run the command below from the **Memory_Profiler/Memory_Profiler_Results/oneapi_optimized_results/** directory

In [None]:
# mprof plot -o intel_output.png
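Besides plotting, the two `.dat` files can be compared numerically. The sketch below assumes that each sample is stored as a text record of the form `MEM <MiB> <timestamp>`, which is how mprof data files are commonly laid out; check this against your own files before relying on it. The helper names are hypothetical, and the records are synthetic placeholders, not measured results.

```python
def read_mem_samples(lines):
    """Collect (memory_MiB, timestamp) pairs from the MEM records."""
    samples = []
    for line in lines:
        fields = line.split()
        # Assumed record layout: "MEM <MiB> <timestamp>"; other records are skipped.
        if len(fields) >= 3 and fields[0] == "MEM":
            samples.append((float(fields[1]), float(fields[2])))
    return samples


def peak_mib(samples):
    """Return the highest memory value among the samples."""
    return max(mem for mem, _ in samples)


# Synthetic records for illustration only; these are NOT measured results.
RUN_A = ["CMDLINE python run_benchmarks.py",
         "MEM 100.0 0.0", "MEM 250.5 0.5", "MEM 180.0 1.0"]
RUN_B = ["CMDLINE python run_benchmarks.py -i",
         "MEM 100.0 0.0", "MEM 210.25 0.5", "MEM 150.0 1.0"]

peak_a = peak_mib(read_mem_samples(RUN_A))
peak_b = peak_mib(read_mem_samples(RUN_B))
print(f"run A peak: {peak_a} MiB, run B peak: {peak_b} MiB, "
      f"difference: {peak_a - peak_b} MiB")
# prints: run A peak: 250.5 MiB, run B peak: 210.25 MiB, difference: 40.25 MiB
```

With real data, pass an open file instead of a list, for example `read_mem_samples(open("stock_output.dat"))`.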

## 3. Example to use **memory_usage()**

In [10]:
from memory_profiler import memory_usage
import time
import numpy as np

def random_generator(sz=1000):
    time.sleep(5)
    arr1 = np.random.randint(1,100, size=(sz, sz))
    avg = arr1.mean()
    return avg

In [12]:
mem_usage = memory_usage((random_generator, (1000,), ), timestamps=True, interval=0.1)
mem_usage

[(65.984375, 1682322349.1823976),
 (66.21875, 1682322349.2092044),
 (66.21875, 1682322349.3104758),
 (66.21875, 1682322349.4110591),
 (66.21875, 1682322349.511631),
 (66.21875, 1682322349.612216),
 (66.21875, 1682322349.7127843),
 (66.21875, 1682322349.8133283),
 (66.21875, 1682322349.9138525),
 (66.21875, 1682322350.0143762),
 (66.21875, 1682322350.1149023),
 (66.21875, 1682322350.2154295),
 (66.21875, 1682322350.3159516),
 (66.21875, 1682322350.416493),
 (66.21875, 1682322350.517037),
 (66.21875, 1682322350.6175709),
 (66.21875, 1682322350.718097),
 (66.21875, 1682322350.8186202),
 (66.21875, 1682322350.9191444),
 (66.21875, 1682322351.019676),
 (66.21875, 1682322351.120203),
 (66.21875, 1682322351.220736),
 (66.21875, 1682322351.32128),
 (66.21875, 1682322351.4218082),
 (66.21875, 1682322351.5223436),
 (66.21875, 1682322351.6228833),
 (66.21875, 1682322351.7234464),
 (66.21875, 1682322351.824025),
 (66.21875, 1682322351.9245908),
 (66.21875, 1682322352.0251849),
 ... (output truncated)

To profile with **memory_usage**, pass one of the following as the first argument:
- **Process**: The process ID, as an integer or string.
- **Python function**: A tuple containing the function followed by its arguments. <br>

In the above example we use
- **Python function**: random_generator
- **interval**: 0.1 seconds
- **timestamps**: True, will return the timestamps at which memory_usage was recorded
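With `timestamps=True` the result is a list of `(memory_MiB, timestamp)` pairs, which is easy to reduce to a few headline numbers. The following is a minimal sketch using the first four samples from the output above; `summarize_usage` is a hypothetical helper, not part of memory_profiler.

```python
def summarize_usage(samples):
    """Summarize (memory_MiB, timestamp) pairs returned with timestamps=True."""
    start_mem, start_time = samples[0]
    # max() returns the first sample that reaches the highest memory value.
    peak_mem, peak_time = max(samples, key=lambda sample: sample[0])
    return {
        "baseline_mib": start_mem,
        "peak_mib": peak_mem,
        "growth_mib": peak_mem - start_mem,
        "seconds_to_peak": peak_time - start_time,
    }


# First four samples copied from the output above.
samples = [
    (65.984375, 1682322349.1823976),
    (66.21875, 1682322349.2092044),
    (66.21875, 1682322349.3104758),
    (66.21875, 1682322349.4110591),
]

summary = summarize_usage(samples)
print(summary["peak_mib"], summary["growth_mib"])  # prints: 66.21875 0.234375
print(round(summary["seconds_to_peak"], 2))        # prints: 0.03
```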

## 4. Example to use **mprun** and **memit** magic line commands

memory_profiler provides 2 line magic commands and 2 cell magic commands for use in Jupyter notebooks.

- **Line Magic Commands**: %mprun & %memit
- **Cell Magic Commands**: %%mprun & %%memit

To **enable memory_profiler** in a Jupyter notebook, load the extension:

In [13]:
%load_ext memory_profiler

### Create an example3.py with the following code

**%mprun** selects the functions to profile with **-f**, so the file needs no **@profile** decorator. A bare **@profile** without an import would raise a `NameError` when the notebook imports the module.

In [14]:
# import time
# import numpy as np

# def very_slow_random_generator():
#     time.sleep(5)
#     arr1 = np.random.randint(1,100, size=(1000,1000))
#     avg = arr1.mean()
#     return avg

# def slow_random_generator():
#     time.sleep(2)
#     arr1 = np.random.randint(1,100, size=(1000,1000))
#     avg = arr1.mean()
#     return avg

# def main_func():
#     avg1 = slow_random_generator()
#     avg2 = very_slow_random_generator()

#     print("Averages: {:.3f}, {:.3f}".format(avg1,avg2))

# if __name__ == '__main__':
#     main_func()

#### **%mprun** magic command

In [15]:
from example3 import (
    very_slow_random_generator,
    slow_random_generator,
    main_func,
)

%mprun -f very_slow_random_generator -T slow_profile_dump.log -f slow_random_generator -f main_func main_func()

Averages: 50.001, 49.966


*** Profile printout saved to text file slow_profile_dump.log. 


Filename: /ws2/yfulwani/Profiling_Guide/Memory_Profiler/example3.py

Line #    Mem usage    Increment  Occurrences   Line Contents
     4     75.8 MiB     75.8 MiB           1   def very_slow_random_generator():
     5     75.8 MiB      0.0 MiB           1       time.sleep(5)
     6     75.8 MiB      0.0 MiB           1       arr1 = np.random.randint(1,100, size=(1000,1000))
     7     75.8 MiB      0.0 MiB           1       avg = arr1.mean()
     8     75.8 MiB      0.0 MiB           1       return avg


Filename: /ws2/yfulwani/Profiling_Guide/Memory_Profiler/example3.py

Line #    Mem usage    Increment  Occurrences   Line Contents
    11     68.2 MiB     68.2 MiB           1   def slow_random_generator():
    12     68.2 MiB      0.0 MiB           1       time.sleep(2)
    13     75.8 MiB      7.5 MiB           1       arr1 = np.random.randint(1,100, size=(1000,1000))
    14     75.8 MiB      0.0 MiB           1       avg = arr1.mean()
    15     75.8 MiB      0.0 MiB           1   

**mprun** has the following parameters
- **-f**: Specifies a function to profile; repeat it for each function
- **-T**: Saves the profile printout to the given text file <br>


In the above example **very_slow_random_generator**, **slow_random_generator**, and **main_func** are profiled, and the printout is saved to **slow_profile_dump.log**

#### **%memit** Line Magic Command

The **%memit** line command measures a single statement and reports its **peak memory usage** and increment; the **%%memit** cell command does the same for a whole cell. We run it for the random_generator function defined above

In [16]:
%memit random_generator()

peak memory: 94.23 MiB, increment: 0.01 MiB
