# Bodo Getting Started Tutorial

In a nutshell, Bodo provides a just-in-time (JIT) compilation workflow using the `@bodo.jit` decorator. It replaces decorated Python functions with an optimized and parallelized binary version.

In this tutorial, we will cover the basics of using Bodo and explain its important concepts. We strongly recommend reading this page before using Bodo.

Let's get started!

## Environment Setup

##### (NOTE: This step  will be ignored on the Bodo Cloud Platform)

Please follow the [README.md](README.md) to set up bodo, ipyparallel, and your environment. Fore more details please read [Bodo installation](https://docs.bodo.ai/installation_and_setup/install/) and [Jupyter Notebook Setup](https://docs.bodo.ai/2022.6/installation_and_setup/ipyparallel/#testinstall) pages to setup the environment, then initialize the `ipyparallel` environment:

In [1]:
import os
if os.environ.get("BODO_PLATFORM_WORKSPACE_UUID",'NA') == 'NA':
    print("You are not on Bodo Platform, running ipyparallel.. ")
    import ipyparallel as ipp
    import psutil; n = min(psutil.cpu_count(logical=False), 8)
    rc = ipp.Cluster(engines='mpi', n=n).start_and_connect_sync(activate=True)

You are not on Bodo Platform, running ipyparallel.. 
Starting 8 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:09<00:00,  1.18s/engine]


In [2]:
!where python

/opt/miniconda3/envs/Bodo/bin/python


Run the following code to verify that your IPyParallel cluster is set up correctly:

In [3]:
%%px
import bodo
print(f"Hello World from rank {bodo.get_rank()}. Total ranks={bodo.get_size()}")

%px:   0%|                                                                                                                         | 0/8 [00:13<?, ?tasks/s]

[stdout:6] Hello World from rank 6. Total ranks=8


[stdout:7] Hello World from rank 7. Total ranks=8


[stdout:3] Hello World from rank 3. Total ranks=8


[stdout:0] Hello World from rank 0. Total ranks=8


[stdout:1] Hello World from rank 1. Total ranks=8


[stdout:5] Hello World from rank 5. Total ranks=8


[stdout:2] Hello World from rank 2. Total ranks=8


%px: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:13<00:00,  1.66s/tasks]


[stdout:4] Hello World from rank 4. Total ranks=8


## Parallel Pandas with Bodo
First, we demonstrate how Bodo automatically parallelizes and optimizes standard Python programs that make use of pandas and NumPy, without the need to rewrite your code. Bodo can scale your analytics code to thousands of cores, providing orders of magnitude speed up depending on program characteristics.

After this initialization, any code that we run in the notebook with `%%px` is sent for execution on all MPI engines. Note, if the machine on which you are running the example has less then 8 cores, it will run on that number of cores. For the sake of this tutorial, when executing code with 8 cores, we will limit the maximum number of rows printed to standard out, to reduce clutter:

In [4]:
%%px
import pandas as pd
pd.options.display.max_rows = 3

### Generate data
To begin, let's generate a simple dataset and write to a [Parquet](http://parquet.apache.org/) file:

In [5]:
import pandas as pd
import numpy as np

# 10m data points
df = pd.DataFrame(
    {
        "A": np.repeat(pd.date_range("2013-01-03", periods=1000), 10_000),
        "B": np.arange(10_000_000),
    }
)
# set some values to NA
df.iloc[np.arange(1000) * 3, 0] = pd.NA
# using row_group_size helps with efficient parallel read of data later
df.to_parquet("pd_example.pq", row_group_size=100000)
print(df)

                 A        B
0              NaT        0
1       2013-01-03        1
2       2013-01-03        2
3              NaT        3
4       2013-01-03        4
...            ...      ...
9999995 2015-09-29  9999995
9999996 2015-09-29  9999996
9999997 2015-09-29  9999997
9999998 2015-09-29  9999998
9999999 2015-09-29  9999999

[10000000 rows x 2 columns]


### Example Code in Pandas
Here is a simple data transformation code in Pandas that processes a column of datetime values and creates two new columns:

In [6]:
import time
import pandas as pd

def data_transform():
    t0 = time.time()
    df = pd.read_parquet("pd_example.pq")
    df["B"] = df.apply(lambda r: "NA" if pd.isna(r.A) else "P1" if r.A.month < 5 else "P2", axis=1)
    df["C"] = df.A.dt.month
    t2 = time.time()
    print("Total time: {:.2f}".format(time.time()-t0))
    return df

data_transform()

Total time: 115.97


Unnamed: 0,A,B,C
0,NaT,,
1,2013-01-03,P1,1.0
2,2013-01-03,P1,1.0
3,NaT,,
4,2013-01-03,P1,1.0
...,...,...,...
9999995,2015-09-29,P2,9.0
9999996,2015-09-29,P2,9.0
9999997,2015-09-29,P2,9.0
9999998,2015-09-29,P2,9.0


Standard Python is quite slow for these data transforms since
1. The use of custom code inside apply() does not let Pandas run an optimized prebuilt C library in its backend. Therefore, the Python interpreter overheads dominate.
2. Python uses just a single CPU core and does not parallelize computation.

Bodo solves both of these problems as we demonstrate below.

### Using Bodo JIT Decorator
Bodo optimizes and parallelizes data workloads by providing just-in-time (JIT) compilation. To run the code with Bodo, all that we have to do is add the `bodo.jit` decorator to the function.

In [9]:
import pandas as pd
import bodo
import time

@bodo.jit
def data_transform():
    t0 = time.time()
    df = pd.read_parquet("pd_example.pq")
    df["B"] = df.apply(lambda r: "NA" if pd.isna(r.A) else "P1" if r.A.month < 5 else "P2", axis=1)
    df["C"] = df.A.dt.month
    print("Total time: {:.2f}".format(time.time()-t0))
    return df

data_transform()

Total time: 0.71


Unnamed: 0,A,B,C
0,NaT,,
1,2013-01-03,P1,1
2,2013-01-03,P1,1
3,NaT,,
4,2013-01-03,P1,1
...,...,...,...
9999995,2015-09-29,P2,9
9999996,2015-09-29,P2,9
9999997,2015-09-29,P2,9
9999998,2015-09-29,P2,9


Even though the code is still running on a single core, it is 94x faster because Bodo compiles the function into a native binary, eliminating the interpreter overheads in apply.

Now let’s run the code on 8 cores using using the `%%px` [*magic*](https://ipyparallel.readthedocs.io/en/latest/tutorial/magics.html) to run on MPI engines:

In [10]:
%%px
import pandas as pd
import time

@bodo.jit
def data_transform():
    t0 = time.time()
    df = pd.read_parquet("pd_example.pq")
    t1 = time.time()
    df["B"] = df.apply(lambda r: "NA" if pd.isna(r.A) else "P1" if r.A.month < 5 else "P2", axis=1)
    df["C"] = df.A.dt.month
    t2 = time.time()
    print("IO time: {:.2f}".format(t2-t1))
    print("Compute time: {:.2f}".format(time.time()-t0))
    print("Total time: {:.2f}".format(time.time()-t0))
    return df

data_transform()

%px:   0%|                                                                                                                         | 0/8 [00:08<?, ?tasks/s]

Unnamed: 0,A,B,C
6250000,2014-09-20,P2,9
...,...,...,...
7499999,2015-01-22,P1,1


[stdout:0] IO time: 0.14
Compute time: 0.65
Total time: 0.65


%px:  25%|████████████████████████████▎                                                                                    | 2/8 [00:08<00:00, 18.97tasks/s]

Unnamed: 0,A,B,C
7500000,2015-01-23,P1,1
...,...,...,...
8749999,2015-05-27,P2,5


Unnamed: 0,A,B,C
3750000,2014-01-13,P1,1
...,...,...,...
4999999,2014-05-17,P2,5


Unnamed: 0,A,B,C
0,NaT,,
...,...,...,...
1249999,2013-05-07,P2,5


%px: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:08<00:00,  1.12s/tasks]

Unnamed: 0,A,B,C
2500000,2013-09-10,P2,9
...,...,...,...
3749999,2014-01-12,P1,1





Unnamed: 0,A,B,C
5000000,2014-05-18,P2,5
...,...,...,...
6249999,2014-09-19,P2,9


Unnamed: 0,A,B,C
8750000,2015-05-28,P2,5
...,...,...,...
9999999,2015-09-29,P2,9


Unnamed: 0,A,B,C
1250000,2013-05-08,P2,5
...,...,...,...
2499999,2013-09-09,P2,9


Although the program appears to be a regular sequential Python program, Bodo compiles and *transforms* the decorated code (the `data_transform` function in this example) under the hood, so that it can run in parallel on many cores. Each core operates on a different chunk of the data and communicates with other cores when necessary. The speedup depends on the data and program characteristics, as well as the number of cores used. Usually, we can continue scaling to many more cores as long as the data is large enough.

### Compilation Time and Caching
Bodo’s JIT workflow compiles the function the first time it is called, but reuses the compiled version for subsequent calls. In the previous example, we added timers inside the function to avoid measuring compilation time. Let’s move the timers outside and call the function twice:

In [9]:
@bodo.jit
def data_transform():
    df = pd.read_parquet("pd_example.pq")
    df["B"] = df.apply(lambda r: "NA" if pd.isna(r.A) else "P1" if r.A.month < 5 else "P2", axis=1)
    df["C"] = df.A.dt.month
    df.to_parquet("bodo_output.pq")


t0 = time.time()
data_transform()
print("Total time first call: {:.2f}".format(time.time()-t0))
t0 = time.time()
data_transform()
print("Total time second call: {:.2f}".format(time.time()-t0))

Total time first call: 4.40
Total time second call: 1.97


The first call is slower due to compilation of the function, but the second call reuses the compiled version and runs faster. See [Caching](https://docs.bodo.ai/performance/caching/?h=caching) for more information.

### Parallel Python Processes
![Groupby shuffle communication pattern](img/groupby.jpg)

Bodo uses the MPI parallelism model, which runs the full program on all cores from the beginning. Essentially, mpiexec launches idential Python processes but Bodo divides the data and computation in JIT functions to exploit parallelism.

In [10]:
%%px

def load_data_pandas():
    df = pd.read_parquet("pd_example.pq")
    print("pandas dataframe: \n", df)

@bodo.jit
def load_data_bodo():
    df = pd.read_parquet("pd_example.pq")
    print("Bodo dataframe: \n", df)

load_data_pandas()
load_data_bodo()

%px:   0%|                                                                                               | 0/8 [00:00<?, ?tasks/s]

[stdout:3] pandas dataframe: 
                  A        B
0              NaT        0
...            ...      ...
9999999 2015-09-29  9999999

[10000000 rows x 2 columns]
Bodo dataframe: 
                  A        B
3750000 2014-01-13  3750000
...            ...      ...
4999999 2014-05-17  4999999

[1250000 rows x 2 columns]


[stdout:2] pandas dataframe: 
                  A        B
0              NaT        0
...            ...      ...
9999999 2015-09-29  9999999

[10000000 rows x 2 columns]
Bodo dataframe: 
                  A        B
2500000 2013-09-10  2500000
...            ...      ...
3749999 2014-01-12  3749999

[1250000 rows x 2 columns]


[stdout:5] pandas dataframe: 
                  A        B
0              NaT        0
...            ...      ...
9999999 2015-09-29  9999999

[10000000 rows x 2 columns]
Bodo dataframe: 
                  A        B
6250000 2014-09-20  6250000
...            ...      ...
7499999 2015-01-22  7499999

[1250000 rows x 2 columns]


[stdout:7] pandas dataframe: 
                  A        B
0              NaT        0
...            ...      ...
9999999 2015-09-29  9999999

[10000000 rows x 2 columns]
Bodo dataframe: 
                  A        B
8750000 2015-05-28  8750000
...            ...      ...
9999999 2015-09-29  9999999

[1250000 rows x 2 columns]


[stdout:6] pandas dataframe: 
                  A        B
0              NaT        0
...            ...      ...
9999999 2015-09-29  9999999

[10000000 rows x 2 columns]
Bodo dataframe: 
                  A        B
7500000 2015-01-23  7500000
...            ...      ...
8749999 2015-05-27  8749999

[1250000 rows x 2 columns]


[stdout:4] pandas dataframe: 
                  A        B
0              NaT        0
...            ...      ...
9999999 2015-09-29  9999999

[10000000 rows x 2 columns]
Bodo dataframe: 
                  A        B
5000000 2014-05-18  5000000
...            ...      ...
6249999 2014-09-19  6249999

[1250000 rows x 2 columns]


%px:   0%|                                                                                               | 0/8 [00:00<?, ?tasks/s]

[stdout:0] pandas dataframe: 
                  A        B
0              NaT        0
...            ...      ...
9999999 2015-09-29  9999999

[10000000 rows x 2 columns]
Bodo dataframe: 
                  A        B
0              NaT        0
...            ...      ...
1249999 2013-05-07  1249999

[1250000 rows x 2 columns]


[stdout:1] pandas dataframe: 
                  A        B
0              NaT        0
...            ...      ...
9999999 2015-09-29  9999999

[10000000 rows x 2 columns]
Bodo dataframe: 
                  A        B
1250000 2013-05-08  1250000
...            ...      ...
2499999 2013-09-09  2499999

[1250000 rows x 2 columns]


%px: 100%|███████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:04<00:00,  1.84tasks/s]


The first eight dataframes printed are regular Pandas dataframes which are replicated on both processes and have all 10 million rows. However, the last eight dataframes printed are Bodo parallelized Pandas dataframes, with 1.25 million rows each. In this case, Bodo parallelizes read_parquet automatically and loads different chunks of data into different cores. Therefore, the non-JIT parts of the Python program are replicated across cores whereas Bodo JIT functions are parallelized. For more information on handling distributed data in python/JIT code, see [Handling distributed data](https://docs.bodo.ai/file_io/?h=data)

### Parallel Computation
Bodo automatically divides computation and manages communication across cores as this example demonstrates:

In [11]:
%%px

@bodo.jit
def data_groupby():
    df = pd.read_parquet("pd_example.pq")
    df2 = df.groupby("A", as_index=False).sum()
    print(df2)

This program uses groupby which requires rows with the same key to be aggregated together. Therefore, Bodo shuffles the data automatically under the hoods using MPI, and the user doesn’t need to worry about parallelism challenges like communication.

### Bodo JIT Requirements
Bodo JIT supports specific APIs in Pandas currently, and other APIs cannot be used inside JIT functions. For example:

In [12]:
%%px

@bodo.jit
def df_unsupported():
    df = pd.DataFrame({"A": [1, 2, 3]})
    df2 = df.transpose()
    return df2

df_unsupported()

%px: 100%|██████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 166.49tasks/s]


[6:execute]
[0;31m---------------------------------------------------------------------------[0m
[0;31mBodoError[0m                                 Traceback (most recent call last)
Input [0;32mIn [5][0m, in [0;36m<module>[0;34m[0m
[1;32m      4[0m     df2 [38;5;241m=[39m df[38;5;241m.[39mtranspose()
[1;32m      5[0m     [38;5;28;01mreturn[39;00m df2
[0;32m----> 7[0m [43mdf_unsupported[49m[43m([49m[43m)[49m

File [0;32m~/opt/anaconda3/envs/bodo/lib/python3.9/site-packages/bodo/numba_compat.py:764[0m, in [0;36m_compile_for_args[0;34m(***failed resolving arguments***)[0m
[1;32m    762[0m     [38;5;28;01mdel[39;00m args
[1;32m    763[0m     [38;5;28;01mif[39;00m error:
[0;32m--> 764[0m         [38;5;28;01mraise[39;00m error
[1;32m    765[0m [38;5;28;01mreturn[39;00m bee__phvsr

[0;31mBodoError[0m: [1m[1m[1m[1m[1mDataFrame.transpose() not supported yet[0m
[1m
File "../../../../var/folders/69/zhshxt5s1qb53jfcc1pg95mr0000gn/T/ipykernel

AlreadyDisplayedError: 8 errors

As the error indicates, Bodo doesn’t currently support the transpose call in JIT functions. In these cases, an alternative API should be used or this portion of the code should be done in regular Python. See [Pandas Operations](https://docs.bodo.ai/api_docs/pandas/general/#pdcrosstab) for the complete list of supported Pandas operations.

### Type Stability
The key requirement of JIT compilation is being able to infer data types for all variables and values. In Bodo, column names are part of dataframe data types, so Bodo tries to infer column name related inputs in all operations. For example, key names in groupby are used to determine the output data type and need to be known to Bodo:

In [13]:
@bodo.jit
def get_keys():
    keys = []
    keys.append("A")
    return keys

@bodo.jit
def groupby_keys():
    df = pd.read_parquet("pd_example.pq")
    keys = get_keys()  # some computation that cannot be inferred
    df2 = df.groupby(keys).sum()
    print(df2)
    
groupby_keys()

BodoError: [1m[1m[1m[1m[1mgroupby(): 'by' parameter only supports a constant column label or column labels, not list(unicode_type)<iv=None>.[0m
[1m
File "../../../../var/folders/69/zhshxt5s1qb53jfcc1pg95mr0000gn/T/ipykernel_31579/3283720622.py", line 11:[0m
[1m<source missing, REPL/exec in use?>[0m
[0m[0m[0m[0m

In this case, the list of groupby keys is determined by a separate get_keys() function, and Bodo is not able to infer it from the program during compilation time. The alternative is to pass the keys as an argument to the JIT function to make the values known to Bodo:

In [14]:
def get_keys():
    keys = []
    keys.append("A")
    return keys

@bodo.jit
def groupby_keys(keys):
    df = pd.read_parquet("pd_example.pq")
    df2 = df.groupby(keys).sum()
    print(df2)
    
groupby_keys(get_keys())

                      B
A                      
2013-01-03     48496500
2013-01-04    149995000
2013-01-05    249995000
2013-01-06    349995000
2013-01-07    449995000
...                 ...
2015-09-25  99549995000
2015-09-26  99649995000
2015-09-27  99749995000
2015-09-28  99849995000
2015-09-29  99949995000

[1000 rows x 1 columns]


This program works since `keys` is passed from regular Python to the JIT function. In addition, we recommend small functions like `get_keys` that don’t use large datasets to be in regular Python in general.

For more information on out type stability requirement, see our [Documentation on compile time constants](https://docs.bodo.ai/bodo_parallelism/compile_time_constants/?h=compile+time+constan)

### Python Features

Bodo uses [Numba](https://numba.pydata.org/) for compiling regular Python features and some of Numba’s requirements applyto Bodo as well. For example, values in data structures like lists should have the same data type. This example fails since list values are either integers or strings:

In [15]:
@bodo.jit
def create_list():
    out = []
    out.append(0)
    out.append("A")
    out.append(1)
    out.append("B")
    return out

create_list()

TypingError: [1m[1m[1mInvalid use of BoundFunction(list.append for list(int64)<iv=None>) with parameters (Literal[str](A))
[0m[0m[0m
[1m
File "../../../../var/folders/69/zhshxt5s1qb53jfcc1pg95mr0000gn/T/ipykernel_31579/3649282494.py", line 5:[0m
[1m<source missing, REPL/exec in use?>[0m

[1m
File "../../../../var/folders/69/zhshxt5s1qb53jfcc1pg95mr0000gn/T/ipykernel_31579/3649282494.py", line 5:[0m
[1m<source missing, REPL/exec in use?>[0m


Using tuples can often solve these problems since tuples can hold values of different types:

In [16]:
@bodo.jit
def create_list():
    out = []
    out.append((0, "A"))
    out.append((1, "B"))
    return out
create_list()

[(0, 'A'), (1, 'B')]

See [Unsupported Python Programs](https://docs.bodo.ai/2022.6/bodo_parallelism/not_supported/?h=bodo+pr) for more details.