# Bodo Getting Started Tutorial

Bodo is a simple and efficient analytics engine. It accelerates and scales data science programs
automatically and enables instant deployment, eliminating the need to rewrite Python analytics code in Spark/Scala, SQL, or MPI/C++.
In this tutorial, we cover the basics of using Bodo and explain how it works under the hood.

In a nutshell, Bodo provides a just-in-time (JIT) compilation workflow through the `@bodo.jit` decorator, which replaces the decorated Python function with an optimized, parallelized binary version produced by advanced compilation techniques.

Let's get started!

## Environment Setup
Make sure the MPI engines are started in the `IPython Clusters` tab (or with `ipcluster start -n 4 --profile=mpi`), then initialize the `ipyparallel` environment:

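```python
import ipyparallel as ipp
c = ipp.Client(profile='mpi')  # connect to the engines started with the 'mpi' profile
view = c[:]                    # a view over all engines
view.activate()                # enable the %px / %%px magics for this view
```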

## Pi Example
Let's start with a simple example that computes the value of Pi using Monte Carlo integration, to get familiar with the execution environment. Here is the plain Python version:

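```python
import numpy as np
import time

def calc_pi(n):
    t1 = time.time()
    # Sample n points uniformly in the square [-1, 1) x [-1, 1)
    x = 2 * np.random.ranf(n) - 1
    y = 2 * np.random.ranf(n) - 1
    # The fraction of points inside the unit circle approximates pi/4
    pi = 4 * np.sum(x**2 + y**2 < 1) / n
    print("Execution time:", time.time()-t1, "\nresult:", pi)
    return pi

calc_pi(2 * 10**8)
```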

```
Execution time: 11.916785955429077
result: 3.14167832

3.14167832
```

Now let's add the `@bodo.jit` decorator and run it (still without the parallel engines):

```python
import bodo

@bodo.jit
def calc_pi(n):
    t1 = time.time()
    x = 2 * np.random.ranf(n) - 1
    y = 2 * np.random.ranf(n) - 1
    pi = 4 * np.sum(x**2 + y**2 < 1) / n
    print("Execution time:", time.time()-t1, "\nresult:", pi)
    return pi

calc_pi(2 * 10**8)
```

```
Execution time: 3.017206907272339
result: 3.14152218

3.14152218
```

We see a significant speedup from compiler optimization alone. Now let's run the same code on the parallel engines using the `%%px` magic:

```python
%%px --block
import bodo
import numpy as np
import time

@bodo.jit
def calc_pi(n):
    t1 = time.time()
    x = 2 * np.random.ranf(n) - 1
    y = 2 * np.random.ranf(n) - 1
    pi = 4 * np.sum(x**2 + y**2 < 1) / n
    print("Execution time:", time.time()-t1, "\nresult:", pi)
    return pi

p = calc_pi(2 * 10**8)
```

```
[stdout:0]
Execution time: 1.522454023361206
result: 3.1416625
```


Bodo automatically parallelizes this code and distributes the work among the parallel engines, so we see an additional speedup that depends on the number of cores used by the engines.
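To make the idea concrete, here is a single-process sketch in plain NumPy of how the Monte Carlo work can be split: each simulated worker draws its share of the samples and counts hits locally, and the per-worker counts are summed at the end, analogous to an MPI sum-reduction. The worker loop and the `calc_pi_partitioned` name are hypothetical stand-ins for MPI ranks; this is not the code Bodo generates.

```python
import numpy as np

def calc_pi_partitioned(n, nranks, seed=0):
    """Hypothetical sketch: Monte Carlo Pi with the samples split over nranks workers."""
    rng = np.random.default_rng(seed)
    # Split n samples as evenly as possible; the first n % nranks workers get one extra.
    sizes = [n // nranks + (1 if r < n % nranks else 0) for r in range(nranks)]
    local_hits = []
    for local_n in sizes:  # each iteration stands in for one MPI rank
        x = 2 * rng.random(local_n) - 1
        y = 2 * rng.random(local_n) - 1
        local_hits.append(int(np.sum(x**2 + y**2 < 1)))
    # Reduction step: combine the partial counts into one global count.
    total_hits = sum(local_hits)
    return 4 * total_hits / n

print(calc_pi_partitioned(10**6, 4))
```

Only one integer per worker has to be communicated, which is why this example scales well.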

## File Input
<img style="float: right;" src="img/file-read.jpg">
Efficient parallel data processing requires data I/O to be parallelized effectively as well. Bodo provides parallel file I/O for formats such as CSV, Parquet, and HDF5 (HDF5 is not covered in this tutorial). The diagram shows how Bodo partitions chunks of data among the execution engines.

### Parquet
Parquet is a commonly used file format in analytics due to its efficient columnar storage.
Let's read a Parquet file:


```python
%%px --block
import pandas as pd

@bodo.jit
def pq_read():
    df = pd.read_parquet('cycling_dataset.pq')
    return df.head()

pq_read()
```

```
Unnamed: 0.1,Unnamed: 0,altitude,cadence,distance,hr,latitude,longitude,power,speed,time
0,0,185.800003,51,3.46,81,30.313309,-97.732711,45,3.459,2016-10-20 22:01:26
1,1,185.800003,68,7.17,82,30.313277,-97.732715,0,3.71,2016-10-20 22:01:27
2,2,186.399994,38,11.04,82,30.313243,-97.732717,42,3.874,2016-10-20 22:01:28
3,3,186.800003,38,15.18,83,30.313212,-97.73272,5,4.135,2016-10-20 22:01:29
4,4,186.600006,38,19.43,83,30.313172,-97.732723,1,4.25,2016-10-20 22:01:30
```

Bodo parallelizes `pd.read_parquet` and reads equal chunks of the dataframe into the engines. Hence, the `df` dataframe inside the function is fully distributed and ready for scalable computation.
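A 1D block distribution gives each engine one contiguous range of rows. The helper below is a hypothetical sketch of one common way to compute those ranges when the row count does not divide evenly: the first `n % nranks` ranks get one extra row. Bodo's exact chunking rule is an assumption here and may place the remainder differently.

```python
def block_bounds(n, nranks, rank):
    """Hypothetical helper: [start, end) rows owned by `rank` under a 1D block split.

    Assumption: the first n % nranks ranks each receive one extra row.
    """
    base, extra = divmod(n, nranks)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end

# 3902 rows (the size of this dataset) spread over 4 engines
for r in range(4):
    print(r, block_bounds(3902, 4, r))
```

Under this scheme the four engines own 976, 976, 975, and 975 rows, and the ranges are contiguous, so each engine can read its slice of the file independently.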

### CSV
Let's read the same data from a CSV file:

```python
%%px --block

@bodo.jit
def csv_example(fname):
    coltypes = {'altitude': np.float64,
                'cadence': np.float64,
                'distance': np.float64,
                'hr': np.float64,
                'latitude': np.float64,
                'longitude': np.float64,
                'power': np.float64,
                'speed': np.float64,
                'time': str}
    df = pd.read_csv(fname, names=coltypes.keys(), dtype=coltypes)
    return df.head()

fname = 'cycling_dataset.csv'
csv_example(fname)
```

```
Unnamed: 0,altitude,cadence,distance,hr,latitude,longitude,power,speed,time
0,0.0,0.0,185.800003,51.0,3.46,81.0,30.313309,-97.732711,45
1,1.0,1.0,185.800003,68.0,7.17,82.0,30.313277,-97.732715,0
2,2.0,2.0,186.399994,38.0,11.04,82.0,30.313243,-97.732717,42
3,3.0,3.0,186.800003,38.0,15.18,83.0,30.313212,-97.73272,5
4,4.0,4.0,186.600006,38.0,19.43,83.0,30.313172,-97.732723,1
```

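In this example, the file name is passed in as an argument rather than being a constant inside the function, so Bodo cannot inspect the file at compile time to infer its schema. We therefore provide the column names and types explicitly, which lets Bodo know the data types during compilation.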
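The `names` and `dtype` arguments come from the pandas `read_csv` API: `names` lists the columns in file order and `dtype` maps each name to the type it should be parsed as. The sketch below runs ordinary pandas (not Bodo) on a tiny made-up, headerless CSV held in memory; the column subset and values are illustrative only.

```python
import io

import numpy as np
import pandas as pd

# Tiny headerless CSV standing in for a data file (made-up rows).
csv_text = (
    "185.8,51,81,2016-10-20 22:01:26\n"
    "186.4,38,82,2016-10-20 22:01:28\n"
)

# Column names in file order, each mapped to the type it should be parsed as.
coltypes = {'altitude': np.float64,
            'cadence': np.float64,
            'hr': np.float64,
            'time': str}

df = pd.read_csv(io.StringIO(csv_text), names=list(coltypes), dtype=coltypes)
print(df.dtypes)
```

Because the mapping is written out in the source code, its types are known without opening the file, which is the property the Bodo example relies on.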

<img style="float: right;" src="img/data-parallel.jpg">

## Data-Parallel Operations
Many operations in NumPy and Pandas are fully data-parallel, which lets Bodo parallelize them across data blocks without any communication between processors.
Examples include many math operators, filtering, combining columns, normalization, and dropping rows or columns.
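The reason no communication is needed can be checked with a small NumPy sketch: applying a row-local operation (here, a filter plus a derived column) to each block separately and concatenating the pieces gives exactly the same result as applying it to the whole array. The arrays and the `process_chunk` helper are made up for illustration.

```python
import numpy as np

# Made-up values standing in for the power and heart-rate columns.
power = np.array([45, 0, 42, 5, 1, 0, 130, 88])
hr = np.array([81, 82, 82, 83, 83, 84, 90, 91])

def process_chunk(power_chunk, hr_chunk):
    # Row-local work only: keep rows with nonzero power, derive a new column.
    keep = power_chunk != 0
    return hr_chunk[keep] / 60.0  # hypothetical derived column (beats per second)

# Reference: process the whole array at once.
expected = process_chunk(power, hr)

# Same work on 4 blocks, each processed independently with no data exchange.
pieces = [process_chunk(p, h)
          for p, h in zip(np.array_split(power, 4), np.array_split(hr, 4))]
result = np.concatenate(pieces)

print([len(p) for p in pieces])  # chunk sizes after filtering can differ
```

Note that filtering can leave the blocks with unequal sizes, even though they started out equal.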

Let's drop some rows and columns, and create a new column by extracting the month from the time column.

```python
%%px --block

@bodo.jit
def data_par():
    df = pd.read_parquet('cycling_dataset.pq')
    df = df[df.power!=0]
    df['month'] = df.time.dt.month
    df = df.drop(['latitude', 'longitude', 'power', 'time'], axis=1)
    return df.head()

data_par()
```

```
Unnamed: 0.1,Unnamed: 0,altitude,cadence,distance,hr,speed,month
0,0,185.800003,51,3.46,81,3.459,10
2,2,186.399994,38,11.04,82,3.874,10
3,3,186.800003,38,15.18,83,4.135,10
4,4,186.600006,38,19.43,83,4.25,10
12,12,186.199997,0,51.610001,80,3.029,10
```

<img style="float: right;" src="img/reduction.jpg">

## Reduction Operations
Some operations, such as `sum`, require a reduction across all of the data, which implies communication across data blocks. Bodo handles these operations using efficient MPI communication and makes the output available on all processors.
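A mean shows why the reduction has to be designed carefully: each processor can compute a local sum and count, and summing those pairs across processors (an allreduce) gives every processor the same global mean. Averaging the per-chunk means instead is wrong whenever the chunks have different sizes. This single-process NumPy sketch simulates the chunks with `np.array_split`; it illustrates the idea and is not Bodo's implementation.

```python
import numpy as np

def distributed_mean(chunks):
    # Local step on each simulated rank: a (sum, count) pair.
    partials = [(float(np.sum(c)), len(c)) for c in chunks]
    # Allreduce-style step: sum the pairs; every rank gets the same totals.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

# Made-up power readings split into 4 uneven chunks (sizes 2, 2, 2, 1).
power = np.array([45.0, 0.0, 42.0, 5.0, 1.0, 130.0, 88.0])
chunks = np.array_split(power, 4)

correct = distributed_mean(chunks)
# Pitfall: averaging the per-chunk means weights the short chunk too heavily.
naive = np.mean([c.mean() for c in chunks])
print(correct, naive)
```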

As an example, let's compute the mean of the `power` column.

```python
%%px --block

@bodo.jit
def mean_power():
    df = pd.read_parquet('cycling_dataset.pq')
    return df.power.mean()

print(mean_power())
```

```
[stdout:0] 102.07842132239877
[stdout:1] 102.07842132239877
[stdout:2] 102.07842132239877
[stdout:3] 102.07842132239877
```


<img style="float: right;" src="img/groupby.jpg">

## GroupBy/Aggregation
Grouping operations, which are typically followed by aggregations/reductions, are more challenging in parallel and distributed environments. Bodo uses efficient MPI communication primitives to provide fast and scalable groupby/aggregations.
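One common way to implement a distributed groupby, shown here as a single-process Python sketch, is to pre-aggregate locally on each processor, then "shuffle" the partial results so that all values for a given key end up on the same processor (here chosen by `hash(key) % nranks`), and finally combine them there. The `groupby_mean` function and the data are hypothetical; Bodo's actual algorithm may differ in its details.

```python
from collections import defaultdict

def groupby_mean(chunks, nranks):
    # Step 1: local pre-aggregation on each rank: key -> [sum, count].
    local = []
    for chunk in chunks:
        acc = defaultdict(lambda: [0.0, 0])
        for key, val in chunk:
            acc[key][0] += val
            acc[key][1] += 1
        local.append(acc)
    # Step 2: shuffle; each key's partial result goes to rank hash(key) % nranks.
    inbox = [defaultdict(lambda: [0.0, 0]) for _ in range(nranks)]
    for acc in local:
        for key, (s, n) in acc.items():
            dest = hash(key) % nranks
            inbox[dest][key][0] += s
            inbox[dest][key][1] += n
    # Step 3: each rank finalizes the keys it owns (gathered here for display).
    result = {}
    for owned in inbox:
        for key, (s, n) in owned.items():
            result[key] = s / n
    return result

# (hour, power) pairs split across 2 hypothetical ranks.
chunks = [[(22, 45.0), (22, 42.0), (23, 10.0)],
          [(22, 3.0), (23, 20.0), (23, 30.0)]]
print(groupby_mean(chunks, nranks=2))
```

Pre-aggregating first means only one (sum, count) pair per key and rank crosses the network, instead of every row.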

Let's compute the average power output per hour:

```python
%%px --block
import pandas as pd
import numpy as np
import bodo

@bodo.jit
def mean_power_pm():
    df = pd.read_parquet('cycling_dataset.pq')
    df['hour'] = df.time.dt.hour
    grp = df.groupby('hour')
    mean_df = grp['power'].mean()
    return mean_df.head()

mean_power_pm()
```

```
Out[0:3]:
22    110.625821
23     71.754079
Name: power, dtype: float64
```

<img style="float: right;" src="img/rolling.jpg">

## Sliding Windows
Some popular analytics operations, especially in time-series analysis, are based on sliding windows. Examples include moving averages and percentage change. In a distributed setup, these require communication patterns beyond map-reduce (the basis of many systems such as Spark). Bodo handles these cases using efficient communication patterns from high-performance computing (HPC).
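The extra communication is a "halo" (or ghost-cell) exchange: to compute a window of size `w` for its first rows, each processor needs the last `w - 1` values held by the previous processor. The NumPy sketch below simulates this on a small heart-rate series whose first five values are the first five `hr` values of the dataset (the rest are made up), and checks the result against a serial rolling mean. It assumes every chunk holds at least `w - 1` rows; the function names are hypothetical.

```python
import numpy as np

def rolling_mean(x, w):
    """Serial reference: mean over windows of w rows, NaN until a full window."""
    out = np.full(len(x), np.nan)
    for i in range(w - 1, len(x)):
        out[i] = x[i - w + 1:i + 1].mean()
    return out

def rolling_mean_halo(chunks, w):
    """Rolling mean over block chunks, using a halo from the previous chunk."""
    results = []
    for r, chunk in enumerate(chunks):
        if r == 0:
            halo = np.empty(0)  # the first rank has no predecessor
        else:
            prev = chunks[r - 1]
            assert len(prev) >= w - 1, "sketch assumes chunks hold >= w-1 rows"
            halo = prev[len(prev) - (w - 1):]  # "received" from rank r-1
        extended = np.concatenate([halo, chunk])
        # Drop the halo rows; keep results only for rows this rank owns.
        results.append(rolling_mean(extended, w)[len(halo):])
    return np.concatenate(results)

hr = np.array([81, 82, 82, 83, 83, 84, 85, 85, 86, 87], dtype=float)
chunks = np.array_split(hr, 3)   # chunk sizes 4, 3, 3

expected = rolling_mean(hr, 4)
with_halo = rolling_mean_halo(chunks, 4)
# Without the halo, each chunk starts over and produces extra NaNs.
no_halo = np.concatenate([rolling_mean(c, 4) for c in chunks])
```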

Let's compute the moving average of the heart rate.

```python
%%px --block

@bodo.jit
def mov_avg():
    df = pd.read_parquet('cycling_dataset.pq')
    mv_av = df.hr.rolling(4).mean()
    return mv_av.head()

mov_avg()
```

```
Out[0:4]:
0     NaN
1     NaN
2     NaN
3    82.0
4    82.5
Name: hr, dtype: float64
```

## Join
Bodo can also join dataframes efficiently, using a communication pattern similar to groupby.
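Like groupby, a distributed join can be built on a shuffle: rows from both dataframes are routed by a hash of the join key so that matching keys meet on the same processor, which then performs an ordinary local hash join. The single-process sketch below uses integer keys (seconds since the start of the ride) in place of the `time` column, with made-up chunking; `shuffle` and `hash_join` are hypothetical helpers, and Bodo's actual join algorithm may differ. One practical detail: Python's built-in `hash` for strings is randomized per process, so a real multi-process implementation needs a hash function that is consistent across processes.

```python
from collections import defaultdict

def shuffle(chunks, nranks):
    """Route every row to the rank that owns its key (key is row[0])."""
    inbox = [[] for _ in range(nranks)]
    for chunk in chunks:              # rows held by one source rank
        for row in chunk:
            # int keys hash deterministically; str hashes are per-process random
            inbox[hash(row[0]) % nranks].append(row)
    return inbox

def hash_join(left_chunks, right_chunks, nranks):
    left = shuffle(left_chunks, nranks)
    right = shuffle(right_chunks, nranks)
    out = []
    for rank in range(nranks):
        # Local hash join on each rank: build on the right, probe with the left.
        table = defaultdict(list)
        for key, *rvals in right[rank]:
            table[key].append(rvals)
        for key, *lvals in left[rank]:
            for rvals in table.get(key, []):
                out.append((key, *lvals, *rvals))
    return out

# (time in seconds, altitude) and (time in seconds, power), split over 2 ranks.
left_chunks = [[(0, 185.8), (1, 185.8)], [(2, 186.4), (3, 186.8)]]
right_chunks = [[(2, 42), (0, 45)], [(3, 5), (1, 0)]]

print(sorted(hash_join(left_chunks, right_chunks, nranks=2)))
```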

Let's read the data, split it into two dataframes, and rejoin them on the `time` column.

```python
%%px --block

@bodo.jit
def merge_dfs():
    df = pd.read_parquet('cycling_dataset.pq')
    df1 = df[['altitude', 'cadence', 'distance', 'hr', 'time']]
    df2 = df[['latitude', 'longitude', 'power', 'speed', 'time']]
    df3 = df1.merge(df2, on='time')
    return df3.head()

merge_dfs()
```

```
Unnamed: 0,altitude,cadence,distance,hr,time,latitude,longitude,power,speed
0,185.800003,51,3.46,81,2016-10-20 22:01:26,30.313309,-97.732711,45,3.459
1,185.800003,68,7.17,82,2016-10-20 22:01:27,30.313277,-97.732715,0,3.71
2,186.399994,38,11.04,82,2016-10-20 22:01:28,30.313243,-97.732717,42,3.874
3,186.800003,38,15.18,83,2016-10-20 22:01:29,30.313212,-97.73272,5,4.135
4,186.600006,38,19.43,83,2016-10-20 22:01:30,30.313172,-97.732723,1,4.25
```


## Automatic Distribution

Bodo automatically distributes the data and computation of the target function by analyzing it for parallelism. It chooses the best and *safest* possible distribution. For example, returning distributed data is not necessarily *safe*, since the code outside of Bodo would need to handle chunks of data instead of the full data. Consider the example below:

```python
%%px --block

@bodo.jit
def read_pq():
    df = pd.read_parquet('cycling_dataset.pq')
    return df

df = read_pq()
print(df.shape)
read_pq.distributed_diagnostics()
```

```
[stdout:0]
(3902, 10)
Distributed diagnostics for function read_pq, <ipython-input-14-9a3196442774> (2)

Data distributions:
   Unnamed: 0.11257           REP
   altitude.11258             REP
   cadence.11259              REP
   distance.11260             REP
   hr.11261                   REP
   latitude.11262             REP
   longitude.11263            REP
   power.11264                REP
   speed.11265                REP
   time.11266                 REP
   __index_level_0__.11267    REP
   $0.23.11389                REP
   $df.11422                  REP
   $0.6                       REP

Parfor distributions:
No parfors to distribute.

Distributed listing for function read_pq, <ipython-input-14-9a3196442774> (2)
--------------------------------------------------| parfor_id/variable: distribution
@bodo.jit                                         | 
def read_pq():                                    | 
    df = pd.read_parquet('cycling_dataset.pq')----| Unnamed: 0.11257: REP, alti

[stderr:0]
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see http://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics for help.

File "<ipython-input-14-9a3196442774>", line 3:
@bodo.jit
def read_pq():
^

  self.func_ir.loc))
```

The `distributed_diagnostics` function prints diagnostic information about Bodo's distribution analysis. In this case, all variables are assigned the `REP` distribution, which means they are replicated and there is no distribution of data. The reason is that `df` is returned, which forces `REP` on it; this then propagates to all other variables, since they are involved in parallel computation with `df`.

We can change this behavior with a simple annotation for `df`:

```python
%%px --block

@bodo.jit(distributed=['df'])
def read_pq():
    df = pd.read_parquet('cycling_dataset.pq')
    return df

df = read_pq()
print(df.shape)
read_pq.distributed_diagnostics()
```

```
[stdout:0]
(976, 10)
Distributed diagnostics for function read_pq, <ipython-input-15-33fdda938c29> (2)

Data distributions:
   Unnamed: 0.11531            1D_Block
   altitude.11532              1D_Block
   cadence.11533               1D_Block
   distance.11534              1D_Block
   hr.11535                    1D_Block
   latitude.11536              1D_Block
   longitude.11537             1D_Block
   power.11538                 1D_Block
   speed.11539                 1D_Block
   time.11540                  1D_Block
   __index_level_0__.11541     1D_Block
   $0.23.11673                 1D_Block
   $df.11706                   1D_Block
   distributed_return.11600    1D_Block
   $dist_return.11597.11707    1D_Block

Parfor distributions:
No parfors to distribute.

Distributed listing for function read_pq, <ipython-input-15-33fdda938c29> (2)
--------------------------------------------------| parfor_id/variable: distribution
@bodo.jit(distributed=['df'])                     | 
def read_

[stderr:0]
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see http://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics for help.

File "<ipython-input-15-33fdda938c29>", line 3:
@bodo.jit(distributed=['df'])
def read_pq():
^

  self.func_ir.loc))
```

In this case, all variables are assigned the `1D_Block` distribution, which means they are divided into approximately equal chunks among processors. The returned dataframe on each processor is therefore a chunk of the full dataset (976 of the 3902 rows on engine 0 in the output above). This is useful, for example, when computation on chunks is desired outside the scope of Bodo (e.g., mixing Bodo code with custom non-Bodo code and other packages such as TensorFlow).