## Basic workflow of Primula

This Jupyter notebook walks through a basic workflow for Primula. It is designed to run on any laptop with a Python version above 3.4 installed, since all the computationally intensive tasks are performed by serverless functions. We show several sort operations on CSV-formatted datasets.

In [None]:
# The config.json file must first be filled in with data extracted from the user's IBM Cloud account
# (see README.md).
import json

import pywren_ibm_cloud as pywren

# Load config.json (closing the file afterwards) and instantiate an IBM-PyWren executor from it.
with open('config.json') as config_file:
    config = json.load(config_file)
pywren_executor = pywren.ibm_cf_executor(config=config)
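
Before instantiating the executor, it can help to check that `config.json` has actually been filled in. The sketch below is one possible way to do so. The section names (`pywren`, `ibm_cf`, `ibm_cos`) are an assumption based on the usual IBM-PyWren configuration layout, and `missing_sections` and `load_config` are hypothetical helpers, not part of Primula. README.md remains the reference for the required fields.

```python
import json

# Assumed top-level sections of config.json; check README.md for the real list.
ASSUMED_SECTIONS = ("pywren", "ibm_cf", "ibm_cos")


def missing_sections(config, required=ASSUMED_SECTIONS):
    """Return the required sections that are absent or empty in `config`."""
    return [name for name in required if not config.get(name)]


def load_config(path="config.json"):
    """Load the JSON configuration and fail early if a section is missing."""
    with open(path) as config_file:
        config = json.load(config_file)
    missing = missing_sections(config)
    if missing:
        raise ValueError("config.json is missing sections: " + ", ".join(missing))
    return config
```

Failing early with a clear message is usually easier to debug than an authentication error raised later from inside the executor.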

In [None]:
# If we want to automate our workflow by letting Primula infer the optimal number of
# workers, we must first provide some connection parameters (bandwidth and throughput) between
# IBM Cloud Functions and IBM Cloud Object Storage.
# The following function calculates these parameters automatically, which takes a few minutes.
# Once the connection parameters have been determined, they are saved in pywren's local directory
# and do not have to be recalculated for later executions.
pywren_executor.setup_shuffle_model()
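
To give some intuition for why bandwidth and throughput determine a good worker count, here is a toy cost model. It is not Primula's actual model. It assumes that each of `workers` functions moves its share of the data at `bandwidth` bytes per second, and that the shuffle issues on the order of `workers ** 2` storage requests, served at `throughput` requests per second. All function names and formulas below are illustrative assumptions.

```python
def estimated_sort_time(workers, data_bytes, bandwidth, throughput):
    """Toy estimate (seconds) of a two-phase shuffle sort with `workers` functions.

    bandwidth: bytes/s one function can move to or from object storage.
    throughput: storage requests/s sustained across all functions.
    """
    # Each of the two phases reads and writes the whole dataset once,
    # split evenly across the workers.
    transfer = 4 * data_bytes / (workers * bandwidth)
    # Every mapper writes one partition per reducer, and every reducer reads
    # one partition per mapper: about 2 * workers**2 requests in total.
    requests = 2 * workers ** 2 / throughput
    return transfer + requests


def best_worker_count(data_bytes, bandwidth, throughput, max_workers=1000):
    """Return the worker count in [1, max_workers] with the lowest estimate."""
    return min(
        range(1, max_workers + 1),
        key=lambda w: estimated_sort_time(w, data_bytes, bandwidth, throughput),
    )
```

In this model, adding workers shortens the transfer term but inflates the request term, so the best count lies somewhere in between. That trade-off is why both parameters have to be measured.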

In [None]:
# Primula's sort call is very simple, as it is able to infer the number of workers
# automatically.
pywren_executor.sort("cos://us-east/german-data/Brain02_Bregma1-42_02_v2.csv", primary_key_column=1)
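
The input dataset is addressed with a `cos://` URI. Reading it as `cos://<region>/<bucket>/<object key>` is an assumption drawn from the example path, and `split_cos_uri` is a hypothetical helper shown only to make that structure explicit. It is not part of Primula or IBM-PyWren.

```python
from urllib.parse import urlsplit


def split_cos_uri(uri):
    """Split 'cos://<region>/<bucket>/<key>' into (region, bucket, key)."""
    parts = urlsplit(uri)
    if parts.scheme != "cos":
        raise ValueError("expected a cos:// URI, got: " + uri)
    # The path looks like '/<bucket>/<key>'; the key may itself contain slashes.
    bucket, _, key = parts.path.lstrip("/").partition("/")
    if not parts.netloc or not bucket or not key:
        raise ValueError("URI must contain a region, a bucket and an object key")
    return parts.netloc, bucket, key
```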

In [None]:
# The connection parameters have already been calculated, so sort operations can now be automated
# and pipelined with close to zero user intervention.
pywren_executor.sort("cos://us-east/german-data/CT26_xenograft.csv", primary_key_column=1)

In [None]:
# We can also give the sort function some hints for efficiency, for instance the data type of
# each column.
pywren_executor.sort("cos://us-east/german-data/CT26_xenograft.csv", primary_key_column=1, dtypes=['int32','float64','float32'])
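
One reason column types matter is that a key compared as text sorts differently from the same key compared as a number. The sketch below parses rows with explicit per-column converters and sorts on the first column. The mapping from dtype names to Python types is a simplification for illustration, and it does not reflect how Primula handles `dtypes` internally.

```python
import csv
import io

# Simplified stand-ins for NumPy-style dtype names (precision is ignored here).
CONVERTERS = {"int32": int, "float32": float, "float64": float}


def parse_rows(text, dtypes):
    """Parse CSV text into tuples, converting each column with its declared type."""
    converters = [CONVERTERS[name] for name in dtypes]
    return [
        tuple(convert(cell) for convert, cell in zip(converters, row))
        for row in csv.reader(io.StringIO(text))
        if row  # skip blank lines
    ]


sample = "9,0.5,1.0\n10,0.25,2.0\n100,0.75,3.0\n"

# With declared types, the key column is compared numerically.
typed = sorted(parse_rows(sample, ["int32", "float64", "float32"]), key=lambda r: r[0])

# Without types, every cell is a string and the key is compared character by character.
untyped = sorted(csv.reader(io.StringIO(sample)), key=lambda r: r[0])
```

Declaring the types up front also spares the parser from having to guess them from the data.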

In [None]:
# The user can also specify the number of workers. Here we specify the number of final
# segments (40) that the sort algorithm will output. The sort algorithm will therefore launch
# 40 parallel workers in the map phase and 40 parallel workers in the reduce phase.
pywren_executor.sort("cos://us-east/german-data/X089-Mousebrain.csv", primary_key_column=1, segm_n=40)
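
The map and reduce phases mentioned in the comments above can be pictured with a small in-memory sketch. This is a generic range-partitioned sort, not Primula's implementation. `two_phase_sort` is a hypothetical function, the boundaries are taken from the full sorted key list for simplicity (a real system would more likely sample), and the loops stand in for what would be parallel serverless workers.

```python
import bisect


def two_phase_sort(chunks, segm_n):
    """Sort numbers spread over `chunks` into `segm_n` ordered segments."""
    keys = sorted(key for chunk in chunks for key in chunk)
    if not keys:
        return [[] for _ in range(segm_n)]
    # Pick segm_n - 1 boundaries that split the key range into even parts.
    boundaries = [keys[len(keys) * i // segm_n] for i in range(1, segm_n)]

    # Map phase: each worker routes the values of its chunk to a segment by key range.
    partitions = [[[] for _ in range(segm_n)] for _ in chunks]
    for worker, chunk in enumerate(chunks):
        for value in chunk:
            partitions[worker][bisect.bisect_right(boundaries, value)].append(value)

    # Reduce phase: worker i gathers partition i from every mapper and sorts it.
    return [
        sorted(value for mapper in partitions for value in mapper[i])
        for i in range(segm_n)
    ]
```

Because the segments cover disjoint, increasing key ranges, concatenating them in order yields the fully sorted dataset without a final merge step.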