## Parallel Computation with Bodo

+ Finally, let's get some practice using Bodo for parallel computing on a cluster.

---

```
State: RUNNING
Number of instances: 2
Instance Type: c5.2xlarge
Bodo Version: 2022.3.5
```

+ This notebook is executed on a cluster with 2 instances.
+ Each instance has 4 physical cores, giving 8 engines in total.


---

```python
import psutil
# Physical core count; cpu_count can return None, so fall back to 0 and use at least 2.
n_proc = max(psutil.cpu_count(logical=False) or 0, 2)
```

```python
import ipyparallel as ipp
rc = (ipp.Cluster(engines='mpi', n=n_proc)
         .start_and_connect_sync(activate=True))
```

---

+ If you're not running on Bodo's Cloud Platform, you'll have to do some setup.
+ The specific way to choose `n_proc` depends on your setup.
+ The `psutil.cpu_count(logical=False)` line asks your operating system for the number of physical cores, which sets the number of engines (at least 2).
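
Since the right value of `n_proc` depends on the setup, here is a minimal sketch of one way to choose it defensively. The helper name `choose_n_proc` and the fallback to `os.cpu_count()` are assumptions made for illustration; they are not part of the original notebook.

```python
import os

def choose_n_proc(minimum=2):
    """Pick an engine count: physical cores if psutil is available, else logical CPUs."""
    try:
        import psutil  # optional dependency
        cores = psutil.cpu_count(logical=False)
    except ImportError:
        cores = None
    if cores is None:
        # Fall back to the logical CPU count reported by the standard library.
        cores = os.cpu_count() or minimum
    return max(cores, minimum)

n_proc = choose_n_proc()
print(n_proc)
```

On machines with hyper-threading, the logical count is typically larger than the physical one, so the fallback may start more engines than there are physical cores.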

---

+ Next, you have to initialise engines for `ipyparallel`.
+ The `rc` object created in this fashion serves as the client interface to those engines.
+ If you're running on Bodo's Cloud Platform, these lines are unnecessary.

---

```python
%%px
import pandas as pd, numpy as np
import time
import bodo
pd.set_option('display.precision', 2)
```

```
Starting 8 engines with <class 'ipyparallel.cluster.launcher.MPIEngineSetLauncher'>
```

+ Let's prepare each engine on the cluster by importing key libraries.
+ Remember, we use the `%%px` cell magic to execute code on all engines.
+ We'll use `px` magics in most cells of this notebook.

---

### A Groupby Example

```python
@bodo.jit
def compute_groupby_mean_bodo():
    DATA_ROOT = 'bodo-examples-data/bodo-training-fundamentals/DATA'
    DATA_SRC  = f's3://{DATA_ROOT}/PARQUET_050'
    t0 = time.time()
    df = pd.read_parquet(DATA_SRC)
    avgs = df.groupby('Product')['Purchase_Amount'].mean()
    t1 = time.time()
    return avgs, t1 - t0
```

```python
avgs, elapsed = compute_groupby_mean_bodo()
print(f'Time elapsed: {elapsed:10.4f} s')
```

```
Time elapsed:    96.9399 s
```

---

+ Returning to the previous groupby example...

---

+ ...we computed the `Purchase_Amount` averages grouped by `Product` category.
+ We used a Bodo-jitted function to compute the resulting Series
  + & return it with the computation time.
+ The data in this function comprised 50 million observations stored in 50 Parquet files.

---

+ That was computed on my laptop with 16 GiB of RAM in about a minute & a half.

---

```python
%%px
@bodo.jit
def compute_groupby_mean_bodo():
    DATA_ROOT = 'bodo-examples-data/bodo-training-fundamentals/DATA'
    DATA_SRC  = f's3://{DATA_ROOT}/PARQUET' # Use full dataset
    t0 = time.time()
    df = pd.read_parquet(DATA_SRC)
    avgs = df.groupby('Product')['Purchase_Amount'].mean()
    t1 = time.time()
    return avgs, t1 - t0
```

+ We'll modify the preceding function:
  + this time, let's use *the entire 500 million row dataset* stored in *500 files*.
+ Again, we use the `%%px` magic to define the compiled function on *all* engines.

---

```python
%%px
avgs, elapsed = compute_groupby_mean_bodo()
```

+ This computation works cleanly on this cluster.
+ This compiled version takes advantage of all engines available and distributes the 500 million rows accordingly.
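
To make the idea of a distributed groupby mean concrete, here is a small pure-Python sketch. It is not Bodo's implementation; the three "engines", the sample amounts, and the helper names `partial_sums` and `combine` are all made up for illustration. Each engine reduces its own rows to a (sum, count) pair per product, and the pairs are merged before dividing.

```python
# Made-up purchase amounts held by three hypothetical engines.
chunks = [
    [("Toys", 40.0), ("Food", 8.0), ("Toys", 46.0)],
    [("Food", 9.0)],
    [("Toys", 43.0), ("Food", 10.0), ("Food", 9.0)],
]

def partial_sums(rows):
    """Per-engine step: accumulate (sum, count) for each product."""
    acc = {}
    for product, amount in rows:
        total, count = acc.get(product, (0.0, 0))
        acc[product] = (total + amount, count + 1)
    return acc

def combine(partials):
    """Combine step: merge the (sum, count) pairs, then divide once per product."""
    merged = {}
    for acc in partials:
        for product, (total, count) in acc.items():
            t, c = merged.get(product, (0.0, 0))
            merged[product] = (t + total, c + count)
    return {product: t / c for product, (t, c) in merged.items()}

means = combine(partial_sums(rows) for rows in chunks)
print(means)  # {'Toys': 43.0, 'Food': 9.0}
```

Carrying sums and counts matters: averaging the three per-engine means for `Food` (8.0, 9.0 and 9.5) would give about 8.83 rather than the correct 9.0, because the engines hold different numbers of rows.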

---

```python
%px avgs
```

```
Out[0:4]: Series([], Name: Purchase_Amount, dtype: float64)

Out[1:4]: 
Product
Sporting-Goods    129.0
Name: Purchase_Amount, dtype: float64

Out[5:4]: 
Product
Automotive    688.0
Name: Purchase_Amount, dtype: float64

Out[4:4]: Series([], Name: Purchase_Amount, dtype: float64)

    <More output...>
```

+ Trying to display the average purchase amounts grouped by product is a bit messy.
+ The Series computed are spread out among the engines.
+ Some of these engines hold empty Series and some hold a portion of the final result.
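
One plausible way to picture why some engines end up empty is to assign each group key to exactly one engine. The sketch below does this with a deterministic hash; using `zlib.crc32` and the helper name `owner` are assumptions for illustration only, since the notebook does not show how Bodo actually places groups.

```python
import zlib

products = ["Sporting-Goods", "Electronics", "Computers", "Toys", "Music", "Books",
            "Automotive", "Food", "Clothes", "Health", "Beauty"]
n_engines = 8

def owner(key, n):
    """Assign a group key to one engine using a deterministic hash (illustrative only)."""
    return zlib.crc32(key.encode("utf-8")) % n

# Collect the keys each engine would own under this assignment.
groups_by_engine = {rank: [] for rank in range(n_engines)}
for product in products:
    groups_by_engine[owner(product, n_engines)].append(product)

for rank, keys in groups_by_engine.items():
    print(rank, keys)
```

With only 11 keys spread over 8 engines, an assignment like this can leave some engines with no keys and others with several, which matches the pattern of empty and non-empty Series seen above.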

---

```python
%%px
times = bodo.gatherv(np.array([elapsed]))
purchase_means = bodo.gatherv(avgs)
```

+ To make more sense of the results, let's *gather* them onto the `root` engine.
+ Notice that to gather the scalar `elapsed` values onto the `root` engine, each one needs to be wrapped in a singleton NumPy array on its engine.
+ By contrast, the Series `avgs` can be gathered from all engines directly.
+ Standard NumPy arrays, Pandas Series & DataFrames all share this behavior with `bodo.gatherv`.
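
The gather step can be pictured with plain Python lists. This is only a conceptual sketch: `simulate_gather` is a hypothetical helper, the timings are made up, and nothing here calls Bodo. It shows why a scalar has to be wrapped in a one-element container before it can be concatenated with the values from other engines.

```python
# Made-up timings, one per hypothetical engine.
elapsed_by_rank = [1.0, 1.2, 1.1, 1.3]

# A bare float cannot be concatenated, so each engine wraps its value
# in a one-element list (the notebook uses a one-element NumPy array).
wrapped = [[t] for t in elapsed_by_rank]

def simulate_gather(chunks_by_rank):
    """Concatenate per-engine chunks in rank order, as the root engine would see them."""
    gathered = []
    for chunk in chunks_by_rank:
        gathered.extend(chunk)
    return gathered

times = simulate_gather(wrapped)
print(min(times), max(times))  # 1.0 1.3
```

Chunks that are already sequences, including empty ones, need no wrapping: `simulate_gather([[], ["a"], [], ["b", "c"]])` simply yields the three items in rank order.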

---

```python
%%px
if bodo.get_rank() == 0:
    print(f'[{times.min():.2f}, {times.max():.2f}]')
    display(pd.DataFrame(purchase_means).transpose())
```

```
[stdout:0] [32.39, 32.48]

[output:0]
```

| Product | Sporting-Goods | Electronics | Computers | Toys | Music | Books | Automotive | Food | Clothes | Health | Beauty |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Purchase_Amount | 129.0 | 86.0 | 860.0 | 43.0 | 17.2 | 34.4 | 688.0 | 8.6 | 64.5 | 21.5 | 25.8 |


+ The desired outputs—`times` & `purchase_means`—have been gathered onto the `root` engine now.
+ The array `times` stores computation times recorded from *all* the groupbys & aggregations on all engines.
+ Rather than printing a single time, we'll print the range from the shortest to the longest—32.39 to 32.48 seconds.
+ We'll also display the computed averages grouped by product as a DataFrame & transposed to tidy the output.

---

```python
%px compute_groupby_mean_bodo.distributed_diagnostics()
```

```
Data distributions:
   pq_table.1               1D_Block
   pq_index.2               1D_Block
   Purchase_Amount.73       1D_Block_Var
   Product.72               1D_Block_Var
   _v14call_method_6_103    1D_Block_Var
   _v16call_method_7_93     1D_Block_Var
   dist_return_tp.23        [<Distribution.OneD_Var: 4>, None]
   dist_return.24           [<Distribution.OneD_Var: 4>, None]

Parfor distributions:
No parfors to distribute.

   <More output ...>
```

+ The `distributed_diagnostics` method uncovers how local objects and data within the jitted function are distributed to compute engines.
+ Notice the compiled version of the code makes use of the fact that only two columns need to be read from the Parquet file:
  + the `Product` and `Purchase_Amount` columns
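+ Basically, these are distributed to the engines in one-dimensional blocks: `1D_Block` for the Parquet reads and `1D_Block_Var` (blocks whose sizes may vary between engines) for the columns and results derived from them.
+ Remember, in contrast to traditional MPI or `ipyparallel` programming, the Bodo JIT compiler handles this data distribution for us.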
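
As a rough picture of what a one-dimensional block distribution means, the sketch below splits a range of row indices into contiguous, nearly equal blocks, one per engine. The helper `block_bounds` and the "first blocks get the remainder" convention are assumptions for illustration; the notebook does not show Bodo's exact chunking rule.

```python
def block_bounds(n_rows, n_engines, rank):
    """Return the [start, stop) row range owned by `rank` under an even block split."""
    base, extra = divmod(n_rows, n_engines)
    # The first `extra` engines each take one additional row.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

n_rows, n_engines = 500_000_000, 8
for rank in range(n_engines):
    start, stop = block_bounds(n_rows, n_engines, rank)
    print(rank, start, stop, stop - start)
```

With 500 million rows and 8 engines, each block in this scheme holds 62,500,000 rows. When the row count does not divide evenly, for example 10 rows over 4 engines, the blocks have sizes 3, 3, 2 and 2.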

---

### A Transformation & Groupby Example

```python
%%px
def extract_score(row):
    scores = {'Terrible':1, 'Fine':2, 'Good':3, 'Great':4}
    return (np.nan if pd.isna(row.Product_Review)
                   else scores[row.Product_Review.split()[0]])
```

+ The other, more complicated example we looked at previously did some string processing.
+ Specifically, the function `extract_score` is mapped onto the rows of the DataFrame.
  + Its purpose is to extract a numerical score from each row's `Product_Review`.
+ Remember, we need to define it with the `%%px` magic to ensure it gets defined on all engines.
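
To see what `extract_score` does in isolation, here is a small local check with plain pandas (no Bodo, no `%%px`). The sample reviews are made up; the real dataset's review text may look different, and the function assumes each review begins with one of the four rating words.

```python
import numpy as np
import pandas as pd

def extract_score(row):
    scores = {'Terrible': 1, 'Fine': 2, 'Good': 3, 'Great': 4}
    # Missing reviews map to NaN; otherwise the first word picks the score.
    return (np.nan if pd.isna(row.Product_Review)
            else scores[row.Product_Review.split()[0]])

# Made-up sample rows for illustration only.
sample = pd.DataFrame({
    'Product': ['Toys', 'Food', 'Books', 'Music'],
    'Product_Review': ['Great value overall', None,
                       'Terrible packaging', 'Fine for the price'],
})
sample['Score'] = sample.apply(extract_score, axis=1)
print(sample[['Product', 'Score']])
```

A review whose first word is not one of the four keys would raise a `KeyError`, so this mapping relies on the review format being consistent.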

---

```python
%%px
@bodo.jit
def compute_scores_groupby_means_bodo():
    DATA_ROOT = 'bodo-examples-data/bodo-training-fundamentals/DATA'
    DATA_SRC  = f's3://{DATA_ROOT}/PARQUET' # Use full dataset
    t0 = time.time()
    df = pd.read_parquet(DATA_SRC)
    df['Score'] = df.apply(extract_score, axis=1)
    result = df.groupby('Product')[['Purchase_Amount', 'Score']].mean()
    t1 = time.time()
    return result, t1 - t0
```

+ With `extract_score` defined, we build the function to apply it to every row and store the result in a new `Score` column.
+ Following that, the averages of `Score` and `Purchase_Amount` are computed, grouped by `Product`.
+ Again, we'll wrap this in a function and apply the Bodo JIT decorator on *all* engines.
+ This time, we'll use the full dataset of 500 million records.

---

```python
%%px
avgs, elapsed = compute_scores_groupby_means_bodo()
```

+ The function `compute_scores_groupby_means_bodo` is executed on the cluster with all the data.
+ Remember, this would crash on a single node.
+ The compiled Bodo function distributes the data to all available nodes.

---

```python
%%px
purchase_score_means = bodo.gatherv(avgs)
times = bodo.gatherv(np.array([elapsed]))

if bodo.get_rank() == 0:
    print(f'[{times.min():.2f}, {times.max():.2f}]')
    display(pd.DataFrame(purchase_score_means).transpose())
```

```
[stdout:0] [119.27, 119.46]

[output:0]
```

| Product | Sporting-Goods | Electronics | Computers | Toys | Music | Books | Automotive | Food | Clothes | Health | Beauty |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Purchase_Amount | 129.0 | 86.0 | 860.0 | 43.0 | 17.2 | 34.4 | 688.0 | 8.6 | 64.5 | 21.5 | 25.8 |
| Score | 2.5 | 2.54 | 2.52 | 2.51 | 2.49 | 2.51 | 2.54 | 2.52 | 2.48 | 2.51 | 2.5 |


+ This example took about 2 minutes on every engine to process 500 million records.
+ Earlier, my laptop crashed when attempting this computation!

---

```python
%px compute_scores_groupby_means_bodo.distributed_diagnostics()
```

```
Data distributions:
   pq_table.371             1D_Block
   pq_index.372             1D_Block
   c0_449                   1D_Block
   S0_476                   1D_Block
   T2_743                   1D_Block
   Purchase_Amount.811      1D_Block_Var
   Score.812                1D_Block_Var
   Product.810              1D_Block_Var
   _v14call_method_6_854    1D_Block_Var
   result                   1D_Block_Var
   dist_return_tp.393       [<Distribution.OneD_Var: 4>, None]
   dist_return.394          [<Distribution.OneD_Var: 4>, None]

Parfor distributions:
   1                    1D_Block
   
   <More output ...>
```

+ Again, the `distributed_diagnostics` method reveals a lot about the compiled function.
+ In particular, there are many `1D_Block`s distributed to the engines...
   + ...to reduce the individual memory load and to share the computation.
+ This is a lot to digest; the documentation provides some insight on how to read this.

---

```python
# (not needed on Bodo Cloud platform)
# Run the following command to stop the cluster:
rc.cluster.stop_cluster_sync()
```

+ Should you be running the notebooks on your laptop or some other machine, you'll need to deactivate the engine processes as above.
+ Again, this is taken care of for you on the Bodo Cloud platform.

---

## Summary

+ `bodo.jit` + `%%px` (or `%px`) for the win!

+ We've seen examples here to illustrate the use of Bodo in distributed computing.
+ By combining JIT compilation with parallel workflows, Bodo enables scaling out more easily.
+ This is particularly useful with datasets approaching super-computing sizes.

---