# Memory Profiler

**Ways to profile the memory usage of Python code with "memory_profiler"** <br>
- **@profile Decorator** - Profiles the memory usage of individual Python functions and reports statistics line by line.
- **mprof Shell/Command Line Command** - Profiles the memory usage of a whole Python script (".py" file) as a function of time, so you can analyze how memory evolves while the code runs rather than line by line.
- **memory_usage() function** - Samples the memory usage of a process or a Python function at a specified time interval.
- **mprun & memit cell/line Magic commands of Jupyter notebook** - `%mprun` profiles functions line by line, and `%memit` reports the peak memory of a single Python statement or of a whole cell in a Jupyter notebook.

**To learn more about memory_profiler, visit the [GitHub repository](https://github.com/pythonprofilers/memory_profiler).**

## Installation

Install via pip:

    $ pip install -U memory_profiler

The package is also available on [conda-forge](https://github.com/conda-forge/memory_profiler-feedstock).

To install from source, download the package, extract it, and run:

    $ pip install .

## 1. Example to use @profile Decorator

Put the @profile decorator above the function you want to profile. <br>
Save the following code as **example1.py**:

```
from memory_profiler import profile
import random

@profile
def random_number_generator():
    arr1 = [random.randint(1,10) for i in range(100000)]
    arr2 = [random.randint(1,10) for i in range(100000)]
    arr3 = [arr1[i]+arr2[i] for i in range(100000)]
    del arr1
    del arr2
    tot = sum(arr3)
    del arr3
    print(tot)

if __name__ == "__main__":
    random_number_generator()
```

```
!python -m memory_profiler example1.py
```

The output has 5 columns:

- **Line #**: Line Number
- **Line Contents**: Python code at each line number
- **Mem usage**: Memory used by the Python interpreter after the line has been executed.
- **Increment**: Difference in memory usage between the current line and the previous one, i.e. the memory consumed (or, if negative, freed) by that line of Python code.
- **Occurrences**: Number of times the line was executed.

The **Mem usage** column shows the total memory held by the Python interpreter, while the **Increment** column shows the memory attributable to each individual line. Lines with large increments are the first candidates for optimization (for example, deleting large intermediate objects once they are no longer needed, as the `del` statements in example1.py do), which helps in writing memory-efficient, production-ready code.

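### Example to use higher precision and save results to a log file

Save the following code as **example.py**. `precision=4` prints memory values with four decimal places, and `stream=fp` writes the report to the log file instead of the console.

```
from memory_profiler import profile
import random

# The ./Memory_Profiler directory must already exist.
fp = open("./Memory_Profiler/example_report.log", "w+")

@profile(precision=4, stream=fp)
def random_number_generator():
    arr1 = [random.randint(1,10) for i in range(100000)]
    arr2 = [random.randint(1,10) for i in range(100000)]
    arr3 = [arr1[i]+arr2[i] for i in range(100000)]
    del arr1
    del arr2
    tot = sum(arr3)
    del arr3
    print(tot)

if __name__ == "__main__":
    random_number_generator()
    fp.close()  # flush the report to disk
```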

```
!python -m memory_profiler example.py
```

To view the results, print the log file with the **cat** command:

```
!cat "./Memory_Profiler/example_report.log"
```

### Example to use **memory_profiler** for **intelligent_indexing** ref kit 

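Import the decorator with `from memory_profiler import profile` and add **@profile** above each function you want to profile. <br>
Modify the run_benchmarks.py file to add the **@profile** decorators as shown below; `fp` is the log file opened as described in the **NOTE** further down. <br>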

**get_data() function**
```
@profile(precision=4, stream=fp)
def get_data(path_to_csv: str) -> pd.DataFrame:
    """Read in and clean data
    Args:
        path_to_csv (str): processed data
    """
    data = pd.read_csv(path_to_csv)[
        ['category', 'headline', 'short_description', 'link']
    ]
    data = data.dropna(subset=['headline', 'short_description', 'link'])

    data.link = data.link.apply(clean_link)
    data.short_description = data.short_description \
        .apply(clean_short_description)
    data.headline = data.headline.apply(clean_headline)

    data['text'] = data.link + " " + data.short_description \
        + " " + data.headline
    data['tokens'] = data.text.apply(tokenize)
    return data
```
<br>

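**Create a train_data() function**
```
@profile(precision=4, stream=fp)
def train_data(train, test):
    # Assumes time, TfidfVectorizer and SVC are already imported in run_benchmarks.py
    vectorizer = TfidfVectorizer(
        min_df=50,
        lowercase=False,
        tokenizer=lambda x: x)  # data.tokens already holds token lists

    svc = SVC()
    start = time.time()
    svc.fit(vectorizer.fit_transform(train.tokens), train.category)
    training_time = time.time() - start  # seconds spent vectorizing and fitting
    y_pred = svc.predict(vectorizer.transform(test.tokens))
    return svc, training_time, y_pred
```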

#### To run for stock

```
!python -m memory_profiler "../intelligent-indexing/src/run_benchmarks.py" -l "../intelligent-indexing/logs/stock_stock.log"
```

#### To run for IPEX

```
!python -m memory_profiler "../intelligent-indexing/src/run_benchmarks.py" -i -l "../intelligent-indexing/logs/intel_intel.log"
```

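**NOTE**: To save the log files to the appropriate locations, add one of the following lines near the top of run_benchmarks.py, before the decorated functions are defined (the target directory must already exist). <br>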
**While profiling stock**
```
fp = open("./Memory_Profiler_Results/stock_ressults/stock_report.log", "w+") ## For stock results
```
**While profiling intel extension**
```
fp = open("./Memory_Profiler_Results/ipex_ressults/ipex_report.log", "w+") ## For ipex results
```

## 2. Example to use **mprof** 

Put the @profile decorator above the function you want to profile. <br>
This example uses the same **example1.py** file:

```
from memory_profiler import profile
import random

@profile
def random_number_generator():
    arr1 = [random.randint(1,10) for i in range(100000)]
    arr2 = [random.randint(1,10) for i in range(100000)]
    arr3 = [arr1[i]+arr2[i] for i in range(100000)]
    del arr1
    del arr2
    tot = sum(arr3)
    del arr3
    print(tot)

if __name__ == "__main__":
    random_number_generator()
```

```
!mprof run example1.py
```

This command runs the script and saves the memory samples to a new file named **mprofile_[current_datetime].dat** in the current directory.

`mprof run` accepts the following parameters:
- **--interval INTERVAL or -T INTERVAL** - By default, "mprof" records memory usage every 0.1 seconds. Use this parameter to set a different sampling interval, in seconds.
- **--timeout TIMEOUT or -t TIMEOUT** - By default, "mprof" monitors the program/process until it finishes. Use this parameter to stop monitoring after the given number of seconds.
- **--output FILENAME or -o FILENAME** - Writes the profiling results to FILENAME instead of the default "mprofile_[current_datetime].dat".
- **--backend BACKEND** - Selects the backend used to measure memory. The default is "psutil"; we recommend sticking to it, as the other backends do not seem reliable yet.
- **--include-children** - Adds the memory used by all child processes to that of the main process and plots the total as a single line chart.
- **--multiprocess** - Plots a separate line chart of memory usage over time for each child process.

```
!mprof plot
```

`mprof plot` accepts parameters such as
- **-o**: save the plot to an output file
- **-t**: give the plot a title

By default it plots the latest .dat file created by mprof.
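If you want the numbers behind the plot rather than the picture, the `.dat` file is plain text. The sketch below assumes the layout current mprof versions write, in which each memory sample is a line of the form `MEM <memory_in_MiB> <unix_timestamp>` and other line types (such as the leading `CMDLINE` line) can be skipped; the helpers `read_mprof_samples`, `summarize` and `latest_dat_file` are hypothetical, not part of memory_profiler.

```python
from pathlib import Path


def read_mprof_samples(dat_path):
    """Return (timestamp, MiB) pairs from the MEM lines of an mprof .dat file.

    Assumes sample lines look like "MEM <mem_in_MiB> <unix_timestamp>";
    every other line (CMDLINE, FUNC, ...) is ignored.
    """
    samples = []
    for line in Path(dat_path).read_text().splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "MEM":
            samples.append((float(parts[2]), float(parts[1])))
    return samples


def summarize(samples):
    """Return (peak MiB, seconds from start to peak, total duration in seconds)."""
    if not samples:
        raise ValueError("no MEM samples found")
    t0 = samples[0][0]
    peak_time, peak_mem = max(samples, key=lambda s: s[1])
    return peak_mem, peak_time - t0, samples[-1][0] - t0


def latest_dat_file(directory="."):
    """Return the newest mprofile_*.dat in directory, or None if there is none.

    The file names embed the creation date and time, so sorting by name
    puts the most recent file last.
    """
    candidates = sorted(Path(directory).glob("mprofile_*.dat"))
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    dat = latest_dat_file()
    if dat is None:
        print("no mprofile_*.dat file found; run `mprof run` first")
    else:
        peak, t_peak, duration = summarize(read_mprof_samples(dat))
        print(f"{dat.name}: peak {peak:.1f} MiB at {t_peak:.1f} s of {duration:.1f} s")
```

Run it from the directory where `mprof run` wrote its output; it reports the same peak you would read off the top of the `mprof plot` chart.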

### Example to use **mprof** for **intelligent_indexing** ref kit 

Import the decorator with `from memory_profiler import profile` and add **@profile** above each function you want to profile. <br>
Modify the run_benchmarks.py file as shown in the memory_profiler example for **intelligent-indexing** above. <br>


#### To run for stock

```
!mprof run "../intelligent-indexing/src/run_benchmarks.py" -l "../intelligent-indexing/logs/stock_stock.log"
```

#### To run for IPEX

```
!mprof run "../intelligent-indexing/src/run_benchmarks.py" -i -l "../intelligent-indexing/logs/intel_intel.log"
```

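**NOTE**: To save the log files to the appropriate locations, add one of the following lines near the top of run_benchmarks.py, before the decorated functions are defined (the target directory must already exist). <br>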
**While profiling stock**
```
fp = open("./Memory_Profiler_Results/stock_ressults/stock_report.log", "w+") ## For stock results
```
**While profiling intel extension**
```
fp = open("./Memory_Profiler_Results/ipex_ressults/ipex_report.log", "w+") ## For ipex results
```

##### **To visualize the plots**

Go to the folder containing the respective .dat file (by default, the directory in which `mprof run` was executed) and execute
```
!mprof plot
```

## 3. Example to use **memory_usage()**

```
from memory_profiler import memory_usage
import time
import numpy as np

def very_slow_random_generator(sz=1000):
    time.sleep(5)
    arr1 = np.random.randint(1,100, size=(sz, sz))
    avg = arr1.mean()
    return avg
```

```
mem_usage = memory_usage((very_slow_random_generator, (10000,)), timestamps=True, interval=0.1, stream=open("example_memory_usage.txt", "w"))
mem_usage
```

To profile with **memory_usage()**, pass one of the following as its first argument:
- **Process**: the process ID as an integer or string (`-1`, the default, means the current process), or a `subprocess.Popen` object.
- **Python Function**: a tuple of the function, its positional arguments, and optionally its keyword arguments, i.e. `(f, args, kwargs)`. <br>

In the above example we use
- **Python function**: very_slow_random_generator, called with `sz=10000`
- **interval**: 0.1 seconds between memory samples
- **timestamps**: True, so each sample is returned together with the time at which it was recorded
- **stream**: a file object to which the results are written

## 4. Example to use **mprun** and **memit** magic commands

memory_profiler provides 2 line magic commands and 2 cell magic commands for Jupyter notebooks. Load them first by running `%load_ext memory_profiler` in a cell.

- **Line Magic Commands**: %mprun & %memit
- **Cell Magic Commands**: %%mprun & %%memit

### Create an **example.py** with the following code

```
import time
import numpy as np

# No @profile decorators here: %mprun's -f option selects the functions to profile,
# and an undecorated module can be imported into the notebook without a NameError.

def very_slow_random_generator():
    time.sleep(5)
    arr1 = np.random.randint(1, 100, size=(1000, 1000))
    avg = arr1.mean()
    return avg

def slow_random_generator():
    time.sleep(2)
    arr1 = np.random.randint(1, 100, size=(1000, 1000))
    avg = arr1.mean()
    return avg

def main_func():
    avg1 = slow_random_generator()
    avg2 = very_slow_random_generator()

    print("Averages: {:.3f}, {:.3f}".format(avg1, avg2))

if __name__ == '__main__':
    main_func()
```

#### **%mprun** magic command

```
from example import very_slow_random_generator, slow_random_generator, main_func

%mprun -T 'slow_profile_dump.log' -f very_slow_random_generator -f slow_random_generator -f main_func main_func()
```

**mprun** has the following parameters
- **-f**: a function to profile; repeat it once per function
- **-T**: save the full profiling report to a text file at the given path <br>

In the above example, **very_slow_random_generator**, **slow_random_generator**, and **main_func** are profiled while `main_func()` runs, and the report is saved to **slow_profile_dump.log**.

#### **%memit** Line Magic Command

The **%memit** line magic reports the **peak memory usage** of a single Python statement; the **%%memit** cell magic does the same for the whole cell.

```
%memit very_slow_random_generator()
```