## Line Profiler

Function profiling tools only time function calls. This is a good first step for locating hotspots in one's program and is frequently all one needs to do to optimize the program. **However, sometimes the cause of the hotspot is actually a single line in the function, and that line may not be obvious from just reading the source code**. These cases are particularly frequent in scientific computing. Functions tend to be larger (sometimes because of legitimate algorithmic complexity, sometimes because the programmer is still trying to write FORTRAN code), and a single statement without function calls can trigger lots of computation when using libraries like numpy. cProfile only times explicit function calls, not special methods called because of syntax. Consequently, a relatively slow numpy operation on large arrays like this,
```
a[large_index_array] = some_other_large_array
```
is a hotspot that never gets broken out by cProfile because there is no explicit function call in that statement.
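The effect can be reproduced with the standard library alone. The sketch below is an illustration, not code from the original example: it replaces the numpy statement with a pure-Python slice assignment so that it runs without numpy, and the function name `copy_halves` is hypothetical. The recorded entries should include `copy_halves` as a whole, with nothing attributed to the slice assignment itself.

```python
import cProfile
import io
import pstats


def copy_halves(n=200_000):
    """Hypothetical function whose cost sits in one call-free statement."""
    data = list(range(n))
    # Slice assignment is syntax, not an explicit function call,
    # so cProfile has no separate entry to attribute this work to.
    data[::2] = data[1::2]
    return data


profiler = cProfile.Profile()
profiler.enable()
result = copy_halves()
profiler.disable()

# Collect the names of everything cProfile recorded.
stats = pstats.Stats(profiler, stream=io.StringIO())
recorded = sorted(func_name for (_file, _line, func_name) in stats.stats)
print(recorded)
```

A line-level profiler would instead report the time spent on the `data[::2] = data[1::2]` line directly.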

***LineProfiler can be given functions to profile, and it will time the execution of each individual line inside those functions.*** <br>

For more information, see the [line_profiler repository](https://github.com/pyutils/line_profiler). <br>

**To Install line_profiler** <br>
**Using Conda**
```
conda install line_profiler
```
**Using pip**
```
$ pip install line_profiler

# To install a compatible IPython version using pip:
$ pip install line_profiler[ipython]

# To check out the development sources, you can use Git:
$ git clone https://github.com/pyutils/line_profiler.git
```

Ways to profile Python code using line_profiler:
- **kernprof**: Command prompt/shell command. It lets us profile a whole Python script from the command line.
- **LineProfiler**: Object in a Python script. It lets us profile individual functions by creating a profiler object in the script itself.
- **%lprun**: Jupyter Notebook magic command. It lets us profile functions in Jupyter Notebooks using the "%lprun" line magic.

### Example: Using kernprof and the LineProfiler Object

Create a Python file named **random_number_average.py** containing the code below with the comment markers removed. A copy of this file is already provided for your reference.

In [None]:
# import time
# import random

# @profile
# def very_slow_random_generator():
#     time.sleep(5)
#     arr = [random.randint(1,100) for i in range(100000)]
#     return sum(arr) / len(arr)

# @profile
# def slow_random_generator():
#     time.sleep(2)
#     arr = [random.randint(1,100) for i in range(100000)]
#     return sum(arr) / len(arr)
    
# @profile
# def main_func():
#     result = slow_random_generator()
#     print(result)

#     result = very_slow_random_generator()
#     print(result)

# main_func()

#### KernProf

To profile this file, add the **@profile** decorator above each function you want to profile, then run the script with **kernprof -l**:

In [None]:
!kernprof -l random_number_average.py
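kernprof makes `profile` available as a builtin while it runs the script, so the same file run with plain `python` fails with a `NameError` at the first `@profile`. One possible guard, sketched below, falls back to a decorator that does nothing. The function `random_average` is a hypothetical stand-in for the generators in random_number_average.py.

```python
import builtins
import random

# kernprof injects `profile` into builtins; fall back to a decorator that
# returns the function unchanged when the script runs under plain python.
profile = getattr(builtins, "profile", lambda func: func)


@profile
def random_average(count=1_000):
    """Hypothetical stand-in for the generators in random_number_average.py."""
    arr = [random.randint(1, 100) for _ in range(count)]
    return sum(arr) / len(arr)


average = random_average()
print(average)
```

With this guard the script behaves the same under `python` and is still profiled line by line under `kernprof -l`.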

kernprof writes the results to **random_number_average.py.lprof**. To view them, use the cell below

In [None]:
!python -m line_profiler random_number_average.py.lprof

To understand the results above, the column definitions are given below
- **Line #:** The first column is the line number in the source file.
- **Hits:** The second column is the number of times that line was executed. In our example it is mostly 1, but it is higher for lines that run repeatedly, for example inside loops.
- **Time:** The third column is the total time spent on that line across all hits. It is expressed in the timer unit printed at the top of the output (microseconds by default).
- **Per Hit:** The fourth column is the average time per execution of that line (Time divided by Hits).
- **% Time:** The fifth column is the share of the function's total recorded time spent on that line.
- **Line Contents:** The sixth column is the source code of that line.
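To make the relationship between the columns concrete, the sketch below recomputes Per Hit and % Time from Hits and Time. The numbers are made up for illustration and are not measurements from the example script.

```python
# Hypothetical (hits, total_time) pairs for three lines of one function,
# with total_time expressed in timer units. The numbers are made up.
rows = {
    "time.sleep(2)": (1, 2_000_000),
    "arr = [...]": (1, 90_000),
    "return sum(arr) / len(arr)": (1, 10_000),
}

# Total recorded time for the function is the sum over its lines.
function_total = sum(total for _hits, total in rows.values())

report = {}
for line, (hits, total) in rows.items():
    per_hit = total / hits                  # Per Hit = Time / Hits
    percent = 100 * total / function_total  # % Time = Time / function total
    report[line] = (per_hit, round(percent, 1))
    print(f"{hits:>5} {total:>10} {per_hit:>12.1f} {percent:>7.1f}  {line}")
```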

#### Line Profiler Object

First, create an **instance of the LineProfiler class**. Then wrap main_func() by calling the LineProfiler instance with main_func as its argument. Calling the returned wrapper runs main_func() under the profiler.

Add this code to your Python file. To profile additional functions that main_func calls, register them with **add_function**. Because **@profile** is only defined when the script runs under kernprof, the decorators should not be present when the file is run this way.
```
from line_profiler import LineProfiler

lprofiler = LineProfiler()

lprofiler.add_function(very_slow_random_generator)
lprofiler.add_function(slow_random_generator)

lp_wrapper = lprofiler(main_func)

lp_wrapper()

lprofiler.print_stats()
```

An example file named **random_number_average_with_lp_object.py** is provided. To view the results, run the file:

In [None]:
!python random_number_average_with_lp_object.py

## Profile [Intelligent Indexing](https://github.com/oneapi-src/intelligent-indexing) Ref kit using Line Profiler


The **[Intelligent Indexing](https://github.com/oneapi-src/intelligent-indexing)** ref kit demonstrates one way of building an NLP pipeline that classifies documents by topic, and describes how the **Intel® AI Analytics Toolkit (AI Kit)** can be used to accelerate that pipeline.

The **Intel® AI Analytics Toolkit (AI Kit)** is used to obtain results quickly even when the dataset for a model is very large. It allows existing code written in different languages to be reused so that hardware utilization is optimized.

The **Intelligent Indexing** ref kit enables several Intel® oneAPI optimizations, including:
- **[Intel® Distribution of Modin*](https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-of-modin.html#gs.v03x2l)**
The Intel® Distribution of Modin* is a performant, parallel, and distributed dataframe system designed to make data scientists more productive. It provides drop-in acceleration for existing **pandas** workflows, so there is no upfront cost of learning a new API. It integrates with the Python* ecosystem and scales across multiple cores with Ray* and Dask* clusters (run on and with what you have).
- **[Intel® Extension for Scikit-learn*](https://www.intel.com/content/www/us/en/developer/tools/oneapi/scikit-learn.html)**
Designed for data scientists, Intel® Extension for Scikit-learn* is a seamless way to speed up scikit-learn applications for machine learning on real-world problems. This extension package dynamically patches scikit-learn estimators to use the Intel® oneAPI Data Analytics Library (oneDAL) as the underlying solver, speeding up machine learning algorithms out of the box.

**NOTE:** Please visit the **[Intelligent Indexing](https://github.com/oneapi-src/intelligent-indexing)** ref kit page to learn more about the kit.
- Follow the steps in the GitHub repo to clone it and create the environments.
- After creating the environments, install **line_profiler** in both of them (**doc_class_stock** and **doc_class_intel**) using
```
conda install line_profiler
```
We will be using **line_profiler** to profile this workload below.

Add the **@profile** decorator directly above each function you want to profile. <br>
Modify the **run_benchmarks.py** file (location: '../intelligent-indexing/src/run_benchmarks.py') to add the **@profile** decorators as shown below. <br>
We have provided a **run_benchmarks_modified.py** for your reference. <br>

**get_data() function**
```
@profile
def get_data(path_to_csv: str) -> pd.DataFrame:
    """Read in and clean data
    Args:
        path_to_csv (str): processed data
    """
    data = pd.read_csv(path_to_csv)[
        ['category', 'headline', 'short_description', 'link']
    ]
    data = data.dropna(subset=['headline', 'short_description', 'link'])

    data.link = data.link.apply(clean_link)
    data.short_description = data.short_description \
        .apply(clean_short_description)
    data.headline = data.headline.apply(clean_headline)

    data['text'] = data.link + " " + data.short_description \
        + " " + data.headline
    data['tokens'] = data.text.apply(tokenize)
    return data
```
<br>

**Create a function train_data()**
```
@profile
def train_data(train, test):
    # The tokens are already split, so the tokenizer passes them through.
    vectorizer = TfidfVectorizer(
        min_df=50,
        lowercase=False,
        tokenizer=lambda x: x)

    svc = SVC()
    svc.fit(vectorizer.fit_transform(train.tokens), train.category)
    # Timestamp taken once fitting has finished.
    training_time = time.time()
    y_pred = svc.predict(vectorizer.transform(test.tokens))
    return svc, training_time, y_pred
```

### Profile Intelligent Indexing Ref Kit with Stock packages

To run the profiler on the intelligent indexing ref kit <br>
- Navigate to directory **intelligent-indexing/src/** in terminal
- ```conda activate doc_class_stock```
- Run the command below

In [None]:
# kernprof -l -o '../../Profiling_Guide/Line_Profiler/Line_Profiler_results/stock_results/line_stock.txt' run_benchmarks.py -l ../logs/stock_stock.log

To view the results, run the command below from the **intelligent-indexing/src/** directory

In [None]:
# python -m line_profiler '../../Profiling_Guide/Line_Profiler/Line_Profiler_results/stock_results/line_stock.txt'

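You can also save function-level profiling stats by running kernprof without **-l** and with the **-o** flag (for example as fname.prof), then use **snakeviz** to visualize them. snakeviz reads cProfile-format stats, so it is not expected to open the line-level file written with **-l**.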

### Profile Intelligent Indexing Ref Kit with Intel oneAPI optimized packages

To run the profiler on the intelligent indexing ref kit <br>
- Navigate to directory **intelligent-indexing/src/** in terminal
- ```conda activate doc_class_intel```
- Run the command below

In [None]:
# kernprof -l -o '../../Profiling_Guide/Line_Profiler/Line_Profiler_results/oneapi_optimized_results/line_intel.txt' run_benchmarks.py -i -l ../logs/intel_intel.log

To view the results, run the command below from the **intelligent-indexing/src/** directory

In [None]:
# python -m line_profiler '../../Profiling_Guide/Line_Profiler/Line_Profiler_results/oneapi_optimized_results/line_intel.txt'

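You can also save function-level profiling stats by running kernprof without **-l** and with the **-o** flag (for example as fname.prof), then use **snakeviz** to visualize them. snakeviz reads cProfile-format stats, so it is not expected to open the line-level file written with **-l**.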
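Once both runs have produced stats files, the per-function totals can be compared side by side. The sketch below is one possible way to do that and rests on assumptions: the helper names `function_totals` and `speedups` are hypothetical, and the timings are made-up placeholders, not measurements from the ref kit. With real runs, `line_profiler.load_stats(path)` should return an object whose `timings` and `unit` attributes can be passed in instead.

```python
def function_totals(timings, unit):
    """Sum per-line times into seconds per function.

    `timings` is assumed to follow the layout of line_profiler's
    LineStats.timings:
    {(filename, first_lineno, func_name): [(lineno, hits, time), ...]}.
    """
    return {
        func_name: sum(time for _lineno, _hits, time in lines) * unit
        for (_filename, _first_lineno, func_name), lines in timings.items()
    }


def speedups(stock_totals, intel_totals):
    """Return stock/intel time ratios for functions present in both runs."""
    return {
        name: stock_totals[name] / intel_totals[name]
        for name in stock_totals
        if intel_totals.get(name)
    }


# Made-up timings (in units of 1e-6 s) standing in for the two stats files.
stock = {("run_benchmarks.py", 10, "get_data"): [(11, 1, 4_000_000), (12, 1, 2_000_000)]}
intel = {("run_benchmarks.py", 10, "get_data"): [(11, 1, 2_000_000), (12, 1, 1_000_000)]}

stock_totals = function_totals(stock, unit=1e-6)
intel_totals = function_totals(intel, unit=1e-6)
ratios = speedups(stock_totals, intel_totals)
print(ratios)
```

A ratio above 1 for a function means the Intel-optimized run spent less recorded time in it than the stock run.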