# Scalene Profiler

**Scalene** is a high-performance CPU, GPU, and memory profiler for Python that does a number of things other Python profilers cannot do. It runs orders of magnitude faster than many other profilers while delivering detailed information. It is also the first profiler to incorporate **AI-powered proposed optimizations**. To enable these, you need to enter an **OpenAI key**. <br>
Once a valid key is entered, click the lightning bolt (⚡) beside any line, or the explosion (💥) for an entire region of code, to generate a proposed optimization. Click a proposed optimization to copy it to the clipboard. <br>
Scalene can profile GPU workloads and individual threads, and it reports memory consumption. It does not provide call stack information. <br>
To learn more about Scalene, visit its **[GitHub repository](https://github.com/plasma-umass/scalene)**.

## To Install Scalene

**Note:** Use Scalene **version 1.3.12** to profile the **[Intelligent-Indexing](https://github.com/oneapi-src/intelligent-indexing)** toolkit explained below. <br>
**Using pip**

`pip install scalene==1.3.12`

**For general installation** <br>
**Using pip**

`python3 -m pip install -U scalene`

**Using Conda**

`conda install -c conda-forge scalene`
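
Because the reference kit walkthrough below expects a pinned version, it can help to confirm which version actually ended up in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of Scalene.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string, or None when Scalene is missing from this environment.
    print("scalene:", installed_version("scalene"))
```

If the printed version differs from the one you intended to pin, reinstall with the pinned `pip` command shown above.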

## Ways to Profile Python Code Using Scalene

- **1. Using Scalene in CLI:** No code modification is required in this case.
- **2. Using Scalene to profile only specific functions via the @profile decorator:** Code modification is required in this case.

**Web-based GUI** <br>
Scalene has both a CLI and a web-based GUI (**[demo here](http://plasma-umass.org/scalene-gui/)**). <br>

By default, once Scalene has profiled your program, it will open a tab in a web browser with an interactive user interface (all processing is done locally). Hover over bars to see breakdowns of CPU and memory consumption, and click on underlined column headers to sort the columns. The generated file profile.html is self-contained and can be saved for later use.

### 1. Using Scalene in CLI

Create a Python file named **example1.py** containing the code below. We have already created one for your reference, so you are free to skip this step.

```
import time
import numpy as np


def very_slow_random_generator():
    """Return the mean of a random 1000x1000 integer array, slowly."""
    time.sleep(5)                     # 5-second sleep to simulate a slow generator
    arr1 = np.random.randint(1, 100, size=(1000, 1000))
    avg = arr1.mean()
    return avg


def slow_random_generator():
    """Return the mean of a random 1000x1000 integer array, less slowly."""
    time.sleep(2)                     # 2-second sleep to simulate a slow generator
    arr1 = np.random.randint(1, 100, size=(1000, 1000))
    avg = arr1.mean()
    return avg


def main_func():
    avg1 = slow_random_generator()
    avg2 = very_slow_random_generator()

    print("Averages: {:.3f}, {:.3f}".format(avg1, avg2))


if __name__ == '__main__':
    main_func()
```

To see the command-line options Scalene accepts, use **--help**:

```
!scalene --help
```

```
usage: scalene [-h] [--version] [--column-width COLUMN_WIDTH]
               [--outfile OUTFILE] [--html] [--json] [--cli] [--stacks]
               [--web] [--viewer] [--reduced-profile]
               [--profile-interval PROFILE_INTERVAL] [--cpu] [--cpu-only]
               [--gpu] [--memory] [--profile-all]
               [--profile-only PROFILE_ONLY]
               [--profile-exclude PROFILE_EXCLUDE] [--use-virtual-time]
               [--cpu-percent-threshold CPU_PERCENT_THRESHOLD]
               [--cpu-sampling-rate CPU_SAMPLING_RATE]
               [--allocation-sampling-window ALLOCATION_SAMPLING_WINDOW]
```

Run the following command to profile **example1.py** using **Scalene**:

```
!scalene --html --outfile example1_output.html example1.py
```

```
Averages: 49.945, 50.002
```
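
The same invocation can be assembled from a Python script, which is convenient when several programs need to be profiled in a row. This is a sketch only: the helper `build_scalene_command` is a hypothetical name, and it uses just the `--html` and `--outfile` flags that appear in the command above.

```python
import shlex
from typing import List, Sequence


def build_scalene_command(
    script: str, outfile: str, script_args: Sequence[str] = ()
) -> List[str]:
    """Build the argument list for an HTML profiling run of `script`."""
    # Arguments after the script name are passed through to the profiled program.
    return ["scalene", "--html", "--outfile", outfile, script, *script_args]


if __name__ == "__main__":
    cmd = build_scalene_command("example1.py", "example1_output.html")
    # Show the command as it would be typed in a shell.
    print(shlex.join(cmd))
    # To actually run it (assumes Scalene is installed and on PATH):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting issues with paths that contain spaces.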


#### Scalene Example1 Results Interpretation 

In the Jupyter file browser, click the `example1_output.html` file. It opens in a new tab.

The result has the following columns:

- **Memory usage:** At the top, visualized by "sparklines", memory consumption over the runtime of the profiled code.
- **Time Python:** How much time was spent in Python code.
- **native:** How much time was spent in non-Python code (e.g., libraries written in C/C++).
- **system:** How much time was spent in the system (e.g., I/O).
- **GPU:** (not shown here) How much time was spent on the GPU, if your system has an NVIDIA GPU installed.
- **Memory Python:** How much of the memory allocation happened on the Python side of the code, as opposed to in non-Python code (e.g., libraries written in C/C++).
- **net:** Positive net memory numbers indicate total memory allocation in megabytes; negative net memory numbers indicate memory reclamation.
- **timeline / %:** Visualized by "sparklines", the memory consumption generated by this line over the program runtime, and the percentage of total memory activity this line represents.
- **Copy (MB/s):** The number of megabytes being copied per second (see "About Scalene" in the Scalene documentation).
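
To make the time and memory columns above more concrete, here is a small sketch of functions that would typically be expected to land mostly in one column each. The function names are hypothetical, and the attribution described in the comments is an assumption about how such code usually shows up, not measured output.

```python
import time


def python_heavy(n: int) -> int:
    """A pure-Python loop: expected to show up mainly under 'Time Python'."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def native_heavy(n: int) -> int:
    """sum() over a range runs inside C code: expected mainly under 'native'."""
    return sum(range(n))


def system_heavy(seconds: float) -> None:
    """Sleeping waits in the operating system: expected mainly under 'system'."""
    time.sleep(seconds)


def allocate_and_release(n_bytes: int) -> int:
    """Allocate a buffer (positive 'net'), then drop it (memory reclamation)."""
    buffer = bytearray(n_bytes)
    size = len(buffer)
    del buffer
    return size


if __name__ == "__main__":
    python_heavy(2_000_000)
    native_heavy(20_000_000)
    system_heavy(1.0)
    allocate_and_release(200 * 1024 * 1024)
```

Saving this as a script and profiling it the same way as `example1.py` gives a quick feel for how each column reacts to a different kind of work.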

### 2. Using Scalene to profile specific functions in code

Create a Python file named **example2.py** containing the code below. We have already created one for your reference, so you are free to skip this step.

```
import time
import numpy as np


@profile                              # Add the @profile decorator to profile this function
def very_slow_random_generator():
    """Return the mean of a random 1000x1000 integer array, slowly."""
    time.sleep(5)                     # 5-second sleep to simulate a slow generator
    arr1 = np.random.randint(1, 100, size=(1000, 1000))
    avg = arr1.mean()
    return avg


@profile                              # Add the @profile decorator to profile this function
def slow_random_generator():
    """Return the mean of a random 1000x1000 integer array, less slowly."""
    time.sleep(2)                     # 2-second sleep to simulate a slow generator
    arr1 = np.random.randint(1, 100, size=(1000, 1000))
    avg = arr1.mean()
    return avg


def main_func():
    avg1 = slow_random_generator()
    avg2 = very_slow_random_generator()
    print("Averages: {:.3f}, {:.3f}".format(avg1, avg2))


if __name__ == '__main__':
    main_func()
```

Run the following command to profile **example2.py** using **Scalene**:

```
!scalene --html --outfile example2_output.html example2.py
```

```
Averages: 49.983, 49.998
```


In the Jupyter file browser, click the `example2_output.html` file. It opens in a new tab.


The report for **example2** has the same layout as the one for **example1**. The difference is that in **example2** only the functions decorated with **@profile** are profiled, so we can choose which functions we want to profile.
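
Note that **example2.py** uses `@profile` without importing it, which means running the file with plain `python` would normally fail with a `NameError`. A common workaround is a no-op fallback, sketched below. It assumes the profiler makes `profile` available as a global name at run time; the function `slow_square` is purely illustrative.

```python
import time

try:
    # Under the profiler this name is expected to exist already.
    profile
except NameError:
    def profile(func):
        """No-op stand-in so the script also runs without a profiler."""
        return func


@profile
def slow_square(x: int) -> int:
    time.sleep(0.01)  # small delay standing in for real work
    return x * x


if __name__ == "__main__":
    print(slow_square(3))
```

With this guard in place, the same file can be run directly during development and under Scalene when profiling.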

### Using **Scalene** for **intelligent_indexing** ref kit 

The **[Intelligent Indexing](https://github.com/oneapi-src/intelligent-indexing)** ref kit demonstrates one way of building an NLP pipeline for classifying documents by topic and describes how the **Intel® AI Analytics Toolkit (AI Kit)** can be used to accelerate that pipeline.

The **Intel® AI Analytics Toolkit (AI Kit)** helps achieve quick results even when the data for a model is very large. It lets existing code written in different languages be reused while optimizing hardware utilization.

The **Intelligent Indexing** ref kit has several Intel® oneAPI optimizations enabled, including:
- **[Intel® Distribution of Modin*](https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-of-modin.html#gs.v03x2l)** <br>
  The Intel® Distribution of Modin* is a performant, parallel, and distributed dataframe system designed to make data scientists more productive. It provides drop-in acceleration for existing **pandas** workflows, has no upfront cost of learning a new API, integrates with the Python* ecosystem, and scales seamlessly across multiple cores with Ray* and Dask* clusters (run on and with what you have).
- **[Intel® Extension for Scikit-learn*](https://www.intel.com/content/www/us/en/developer/tools/oneapi/scikit-learn.html)** <br>
  Designed for data scientists, Intel® Extension for Scikit-learn* is a seamless way to speed up scikit-learn applications for machine learning on real-world problems. This extension package dynamically patches scikit-learn estimators to use the Intel® oneAPI Data Analytics Library (oneDAL) as the underlying solver, speeding up your machine learning algorithms out of the box.
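
The "dynamic patching" mentioned above is usually switched on with a single call before any scikit-learn estimators are imported. The sketch below wraps that call so a script still runs in an environment without the extension; `patch_sklearn` comes from the `sklearnex` package, while the wrapper name `enable_sklearn_acceleration` is hypothetical. How the ref kit itself enables the patch is not shown here.

```python
def enable_sklearn_acceleration() -> bool:
    """Patch scikit-learn with Intel's extension if it is installed.

    Returns True when the patch was applied, False when `sklearnex`
    is not available and stock scikit-learn will be used instead.
    """
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        return False
    patch_sklearn()
    return True


if __name__ == "__main__":
    accelerated = enable_sklearn_acceleration()
    print("Intel Extension for Scikit-learn active:", accelerated)
    # Import estimators such as sklearn.svm.SVC only after this point,
    # so that the patched implementations are the ones picked up.
```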

**NOTE:** Visit the **[Intelligent Indexing](https://github.com/oneapi-src/intelligent-indexing)** ref kit page to learn more about the kit.
- Follow the steps in the GitHub repo to clone it and create the environments.
- After creating the environments, install **scalene** in both of them, **doc_class_stock** and **doc_class_intel**, using:

`pip install scalene==1.3.12`

We will use **scalene** to profile this workload below. Note that Intel® Distribution of Modin* is disabled for this profiling, because Scalene does not currently support Intel® Distribution of Modin* with the Ray* backend, which is what the Intelligent Indexing ref kit uses.
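
One way to picture "disabling Modin" is a switch that decides which dataframe module a script imports as `pd`. The following is a hedged sketch, not the ref kit's actual mechanism: the function `pick_dataframe_module` and its `use_modin` flag are hypothetical names introduced for illustration.

```python
import importlib.util


def pick_dataframe_module(use_modin: bool) -> str:
    """Return the module name to import as `pd`.

    Falls back to plain pandas when Modin is disabled (as it is while
    profiling with Scalene) or when Modin is not installed.
    """
    if use_modin and importlib.util.find_spec("modin") is not None:
        return "modin.pandas"
    return "pandas"


if __name__ == "__main__":
    module_name = pick_dataframe_module(use_modin=False)
    print("dataframe backend:", module_name)
    # A script could then do:
    #   import importlib
    #   pd = importlib.import_module(module_name)
```

Because Modin mirrors the pandas API, the rest of the script can keep using `pd` unchanged whichever module is selected.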

#### Modify the run_benchmarks.py File of the Reference Kit

Add the **@profile** decorator directly above each function you want to profile. <br>

Modify **run_benchmarks.py** (location: `../intelligent-indexing/src/run_benchmarks.py`) to add **@profile** decorators as shown below. We have provided a **run_benchmarks_scalene.py** for your reference. To skip this step, copy that file to `../intelligent-indexing/src/run_benchmarks_scalene.py` and move on to the next section. <br>

**Modify the get_data() function**
```
@profile
def get_data(path_to_csv: str) -> pd.DataFrame:
    """Read in and clean data
    Args:
        path_to_csv (str): processed data
    """
    data = pd.read_csv(path_to_csv)[
        ['category', 'headline', 'short_description', 'link']
    ]
    data = data.dropna(subset=['headline', 'short_description', 'link'])

    data.link = data.link.apply(clean_link)
    data.short_description = data.short_description \
        .apply(clean_short_description)
    data.headline = data.headline.apply(clean_headline)

    data['text'] = data.link + " " + data.short_description \
        + " " + data.headline
    data['tokens'] = data.text.apply(tokenize)
    return data
```
<br>

**Create a function train_data()**
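```
@profile
def train_data(train, test):
    # `tokens` is already tokenized by get_data(), so the tokenizer
    # passes each entry through unchanged.
    vectorizer = TfidfVectorizer(
        min_df=50,
        lowercase=False,
        tokenizer=lambda x: x)

    svc = SVC()
    svc.fit(vectorizer.fit_transform(train.tokens), train.category)
    # Timestamp taken right after fitting (a point in time, not a duration).
    training_time = time.time()
    y_pred = svc.predict(vectorizer.transform(test.tokens))
    return svc, training_time, y_pred
```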
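
In `train_data()` above, `training_time` holds the value of `time.time()` taken right after fitting, so the caller has to subtract a start timestamp to obtain a duration. If an elapsed time is what is wanted, one option is a small timing helper like the sketch below. The name `timed` is hypothetical and the helper is not part of the ref kit.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call `fn` and return (result, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic clock, suited to measuring intervals
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    total, seconds = timed(sum, range(1_000_000))
    print(f"sum={total}, took {seconds:.4f} s")
```

In the training function, the `svc.fit(...)` call could be wrapped the same way, for example `_, fit_seconds = timed(svc.fit, features, labels)`, where `features` and `labels` stand for the arguments already passed to `fit`.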

#### Profile Intelligent Indexing Ref Kit with Stock packages

To ignore warnings, run the cell below. If the result directories do not exist yet, uncomment the `mkdir` lines to create them.

```
import warnings
warnings.filterwarnings('ignore')
# !mkdir -p Scalene_Profiler_Results  # create the `Scalene_Profiler_Results` dir in the parent dir if not present
# !mkdir -p Scalene_Profiler_Results/stock_results  # create the `stock_results` dir in Scalene_Profiler_Results if not present
# !mkdir -p Scalene_Profiler_Results/oneapi_optimized_results  # create the `oneapi_optimized_results` dir in Scalene_Profiler_Results if not present
```
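
The same directory layout can also be created from Python, which behaves identically on every platform. This is a minimal sketch using `pathlib`; the function name `ensure_result_dirs` is hypothetical, and the directory names are the ones used in this guide.

```python
from pathlib import Path
from typing import List

# Sub-directories used for the stock and the oneAPI-optimized runs.
RESULT_SUBDIRS = ("stock_results", "oneapi_optimized_results")


def ensure_result_dirs(parent: Path) -> List[Path]:
    """Create Scalene_Profiler_Results and its sub-directories under `parent`."""
    root = parent / "Scalene_Profiler_Results"
    created = []
    for name in RESULT_SUBDIRS:
        target = root / name
        # parents=True creates the root as needed; exist_ok=True makes reruns safe.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created


if __name__ == "__main__":
    for path in ensure_result_dirs(Path.cwd()):
        print("ready:", path)
```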

**Ensure that you have the run_benchmarks_scalene.py file.** <br>
To run the profiler on the Intelligent Indexing ref kit: <br>
1. Navigate to the directory **intelligent-indexing/src/** in a terminal. <br>
2. Activate the stock environment:

`conda activate doc_class_stock`

3. Execute the following command in the terminal:

`scalene --html --outfile ../../Profiling_Guide/Scalene_Profiler/Scalene_Profiler_Results/stock_results/scalene_stock.html run_benchmarks_scalene.py -l ../logs/stock_stock.log`

**To visualize the results**, go to the **Scalene_Profiler_Results/stock_results** directory in the Jupyter file browser and open the file **scalene_stock.html**.

#### Profile Intelligent Indexing Ref Kit with Intel oneAPI optimized packages

**Ensure that you have the run_benchmarks_scalene.py file.** <br>
To run the profiler on the Intelligent Indexing ref kit: <br>
1. Navigate to the directory **intelligent-indexing/src/** in a terminal. <br>
2. Activate the Intel-optimized environment:

`conda activate doc_class_intel`

3. Execute the following command in the terminal:

`scalene --html --outfile ../../Profiling_Guide/Scalene_Profiler/Scalene_Profiler_Results/oneapi_optimized_results/scalene_intel.html run_benchmarks_scalene.py -i -l ../logs/intel_intel.log`

**To visualize the results**, go to the **Scalene_Profiler_Results/oneapi_optimized_results** directory in the Jupyter file browser and open the file **scalene_intel.html**.