# nbmodular

> Convert notebooks to modular code.

Convert data science notebooks with poor modularity to fully modular notebooks that are automatically exported as python modules.

## Motivation

In data science, it is common to develop quickly and experimentally in notebooks, with little regard for software engineering practices and modularity. It can be challenging to start working on someone else's notebooks when there is no modularity in terms of separate functions and a large amount of code is duplicated across notebooks. This makes it difficult to understand the logic in terms of semantically separate units, to see what the notebooks have in common and how they differ, and to extend, generalize, and configure the current solution.

## Objectives

`nbmodular` is a library conceived to help convert the cells of a notebook into separate functions with clear dependencies in terms of inputs and outputs. This is done through a combination of tools that semi-automatically understand the data flow in the code, based on mild assumptions about its structure. It also helps test the current logic and compare it against a modularized solution, to make sure that the refactored code is equivalent to the original one.
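
As a rough illustration of that equivalence check, the sketch below runs a cell's original code with `exec` in a fresh namespace and compares the resulting variables with the values returned by a hand-refactored function. The helpers `run_cell` and `check_equivalent` and the function `add_values` are hypothetical names used for illustration; they are not part of `nbmodular`'s API.

```python
def run_cell(code, namespace=None):
    """Hypothetical helper: execute cell source and return the resulting namespace."""
    namespace = {} if namespace is None else dict(namespace)  # copy the inputs
    exec(code, namespace)
    namespace.pop("__builtins__", None)  # exec inserts this key; drop it
    return namespace


def check_equivalent(code, func, names, inputs=None):
    """Hypothetical helper: compare the variables produced by the original cell
    with the values returned (as a dict) by the refactored function."""
    inputs = inputs or {}
    original = run_cell(code, inputs)
    refactored = func(**inputs)
    return all(original[name] == refactored[name] for name in names)


# Original notebook cell
cell = "a = 2\nb = 3\nc = a + b\n"


# Hypothetical refactored version of the same cell
def add_values():
    a = 2
    b = 3
    c = a + b
    return {"a": a, "b": b, "c": c}


print(check_equivalent(cell, add_values, ["a", "b", "c"]))  # True
```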

## Features

- [x] Convert cells to functions.
- [x] The logic of a single function can be written across multiple cells.
- [x] Functions can be either regular functions or unit test functions.
- [x] Functions and tests are exported to separate python modules. 
- [ ] TODO: use nbdev to sync the exported python module with the notebook code, so that changes to the module are reflected back in the notebook.
- [x] Processed cells can continue to operate as cells or be only used as functions.
- [x] A pipeline function is automatically created and updated. This pipeline provides the data-flow from the first to the last function call in the notebook.
- [x] Functions act as nodes in a dependency graph. These nodes can optionally hold the values of local variables for inspection outside of the function. This is similar to having a single global scope, which is the original situation. Since this is memory-consuming, storing local variables is optional.
- [x] Local variables are persisted to disk, so that previous results can be reused without running the whole notebook.
- [ ] TODO: Once we are able to construct a graph, we may be able to draw it or show it as text, and pass it to DAG processors that can run functions sequentially or in parallel.
- [ ] TODO: if we have the dependency graph and persisted inputs / outputs, we may decide to only run those cells that are predecessors of the current one, i.e., the ones that provide the inputs needed by the current cell. 
- [ ] TODO: if we associate a hash code to input data, we may only run the cells when the input data changes. Similarly, if we associate a hash code with AST-converted function code, we may only run those cells whose code has been updated. 
- [ ] TODO: The output of a test cell can be used for assertions, where we require that the current output is the same as the original one.
- [ ] TODO: Compare the result of the pipeline with the result of running the original notebook.
- [ ] TODO: Currently, AST processing is used to assess whether variables are modified in the cell or only read. This only gives an estimate. We may want to compare the values of existing variables before and after running the code in the cell. We may also use a type checker such as mypy to assess whether a variable is immutable in the cell (e.g., mark the variable as `Final` and see if mypy complains).
- [ ] TODO: Allow designated tests to be used as examples in docstrings. Have an optional flag indicate that the next cell's output should be converted to text and included as example output in the docstring.
- [ ] TODO: Allow writing the tests in the same module as the functions, where each test goes right after the function it tests. This can serve as a form of documentation for the function, especially if the test code is not included in the function's docstring.
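
The dependency-graph items in the list above (running only the predecessors of the current cell, or passing the graph to a DAG processor) could be prototyped with the standard library's `graphlib` module (Python 3.9+). This sketch is an assumption about one possible design, not `nbmodular`'s implementation: the graph maps each function to the functions that provide its inputs, and the function names are made up.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each function -> the functions that provide its inputs
graph = {
    "load_data": set(),
    "clean_data": {"load_data"},
    "make_features": {"clean_data"},
    "plot_summary": {"clean_data"},
    "train_model": {"make_features"},
}


def predecessors(graph, target):
    """Return every function that `target` depends on, directly or indirectly."""
    seen = set()
    stack = list(graph.get(target, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def run_order(graph, target):
    """Topologically sorted list of the functions needed to run `target`."""
    needed = predecessors(graph, target) | {target}
    # Keep only the part of the graph that `target` actually needs
    subgraph = {node: graph[node] & needed for node in needed}
    return list(TopologicalSorter(subgraph).static_order())


print(run_order(graph, "train_model"))
# ['load_data', 'clean_data', 'make_features', 'train_model']
```

Only `load_data`, `clean_data` and `make_features` are needed before `train_model`, so `plot_summary` is skipped.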

## Install

```sh
pip install nbmodular
```

## Usage

Load the IPython extension:


<div style="background-color: rgb(250, 250, 250);">
```python
%load_ext nbmodular.core.cell2func
```
</div>

This allows us to use the following magic commands, among others:

- `function <name_of_function_to_define>`
- `print <name_of_previous_function>`
- `function_info <name_of_previous_function>`
- `print_pipeline`

Let's go through them one by one.

### function

#### Basic usage

The magic command `function` runs the code in the cell as it would normally run, and at the same time performs a number of additional steps. Let's go over each one in turn through the following example:

<div style="background-color: rgb(250, 250, 250);">
```python
%%function two_plus_three
a = 2
b = 3
c = a+b
print (f'The result of adding {a}+{b} is {c}')
```
</div>

```
The result of adding 2+3 is 5
```


```python
(a, b, c)
```

```
(2, 3, 5)
```

As we can see, the previous cell runs just as it normally would. In addition, the code is analyzed through its abstract syntax tree (AST), and the result of this analysis is stored in a new object called `two_plus_three_info`. Let's look at some of the information provided by this object.

First, the object stores the list of variables that were created inside this function:

```python
two_plus_three_info.created_variables
```

```
['a', 'b', 'c']
```

By default, this object also stores the values of those variables:

```python
two_plus_three_info.current_values
```

```
{'a': 2, 'b': 3, 'c': 5}
```

It also stores the names of the variables that are used by this function but were created before it is called:

```python
two_plus_three_info.previous_variables
```

```
[]
```
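
To give an idea of how this kind of analysis can be done with Python's standard `ast` module, here is a minimal sketch. It is an approximation under simplifying assumptions (statement-level granularity, plain name assignments, no attribute or subscript mutation, no nested scopes) and is not `nbmodular`'s actual implementation. `analyze_cell` is a hypothetical helper, and `existing_names` stands for the set of names defined before the cell runs (for instance, the keys of the notebook's global namespace).

```python
import ast


def analyze_cell(code, existing_names):
    """Hypothetical helper approximating created and previous variables.

    created:  names assigned in the cell, in order of first assignment.
    previous: names read in the cell that already existed before it ran and
              had not yet been reassigned inside the cell at that point.
    """
    created, previous = [], []
    assigned = set()
    for statement in ast.parse(code).body:
        names = [n for n in ast.walk(statement) if isinstance(n, ast.Name)]
        # Handle reads before writes: in `x = x + 1` the right-hand side runs first.
        for node in names:
            if (isinstance(node.ctx, ast.Load) and node.id in existing_names
                    and node.id not in assigned and node.id not in previous):
                previous.append(node.id)
        for node in names:
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)
                if node.id not in created:
                    created.append(node.id)
    return created, previous


print(analyze_cell("a = 2\nb = 3\nc = a+b\n", existing_names=set()))
# (['a', 'b', 'c'], [])
print(analyze_cell("x = x + 1\n", existing_names={"x"}))
# (['x'], ['x'])
```

A variable that is both read from an earlier cell and reassigned, like `x` here, ends up in both lists.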


In the previous example, there are no previous variables. Let's see a new example with previous variables:

```python
my_previous_variable = 10
```

<div style="background-color: rgb(250, 250, 250);">
```python
%%function add_100
my_previous_variable = my_previous_variable + 100
print (f'The result of adding 100 to my_previous_variable is {my_previous_variable}')
```
</div>

```
The result of adding 100 to my_previous_variable is 110
```


```python
add_100_info.previous_variables
```

```
['my_previous_variable']
```

`my_previous_variable` is also included in the list of `created_variables`, since a new value for this variable has been generated:

```python
add_100_info.created_variables
```

```
['my_previous_variable']
```

```python
add_100_info.arguments
```

```
[]
```


In addition to the previous analysis, each call to `%%function` creates a new function, called `two_plus_three` and `add_100` respectively. We can print the code of each function:

<div style="background-color: rgb(250, 250, 250);">
```python
%print two_plus_three
```
</div>

```python
def two_plus_three():
    a = 2
    b = 3
    c = a+b
    print (f'The result of adding {a}+{b} is {c}')
```

<div style="background-color: rgb(250, 250, 250);">
```python
%print add_100
```
</div>

```python
def add_100(my_previous_variable):
    my_previous_variable = my_previous_variable + 100
    print (f'The result of adding 100 to my_previous_variable is {my_previous_variable}')
```

The created functions can be called like any other function:

```python
two_plus_three()
```

```
The result of adding 2+3 is 5
```


```python
add_100(45)
```

```
The result of adding 100 to my_previous_variable is 145
```
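
For intuition, here is a simplified sketch of how cell code could be turned into functions like the ones printed and called above: the cell body is indented with `textwrap.indent`, placed under a `def` line whose parameters are the previous variables, and the resulting source is executed to obtain a callable. `make_function` is a hypothetical helper; it ignores return values and the other options that `nbmodular` handles.

```python
import textwrap


def make_function(name, cell_code, arguments=()):
    """Hypothetical helper: wrap cell source into `def name(arguments): ...`.

    Returns the generated source and the resulting function object.
    """
    body = textwrap.indent(cell_code.strip("\n"), "    ")
    source = f"def {name}({', '.join(arguments)}):\n{body}\n"
    namespace = {}
    exec(source, namespace)  # define the function in an isolated namespace
    return source, namespace[name]


source, add_100 = make_function(
    "add_100",
    "my_previous_variable = my_previous_variable + 100\n"
    "print(f'The result of adding 100 to my_previous_variable is {my_previous_variable}')\n",
    arguments=["my_previous_variable"],
)
print(source)
add_100(45)  # prints: The result of adding 100 to my_previous_variable is 145
```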


All the functions created so far can be printed at once using `%print all`:

<div style="background-color: rgb(250, 250, 250);">
```python
%print all
```
</div>

```python
def two_plus_three():
    a = 2
    b = 3
    c = a+b
    print (f'The result of adding {a}+{b} is {c}')

def add_100(my_previous_variable):
    my_previous_variable = my_previous_variable + 100
    print (f'The result of adding 100 to my_previous_variable is {my_previous_variable}')
```

They are also written to a Python module with the same name as the notebook (here the notebook is called `index.ipynb`, so the module is `index.py`):

```python
!cat ../nbmodular/index.py
```

```python
def two_plus_three():
    a = 2
    b = 3
    c = a+b
    print (f'The result of adding {a}+{b} is {c}')

def add_100(my_previous_variable):
    my_previous_variable = my_previous_variable + 100
    print (f'The result of adding 100 to my_previous_variable is {my_previous_variable}')


# -----------------------------------------------------
# pipeline
# -----------------------------------------------------
def index_pipeline (test=False, load=True, save=True, result_file_name="index_pipeline"):
    """Pipeline calling each one of the functions defined in this module."""

    # load result
    result_file_name += '.pk'
    path_variables = Path ("index") / result_file_name
    if load and path_variables.exists():
        result = joblib.load (path_variables)
        return result

    two_plus_three ()
    add_100 (my_previous_variable)

    # save result
    result = Bunch ()
    if save:
        path_variables.parent.mkdir (parents=True, exist_ok=True)
        joblib.dump (res
```
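
The generated `index_pipeline` follows a load-or-run-then-save pattern for its results. A standard-library sketch of that pattern is shown below. It uses `pickle` instead of `joblib` and a plain `dict` instead of `Bunch`; the step functions, the `cache_dir` parameter and the file layout are illustrative assumptions, not what `nbmodular` generates.

```python
import pickle
import tempfile
from pathlib import Path


# Hypothetical pipeline steps, each returning the variables it creates
def step_one():
    return {"a": 2, "b": 3}


def step_two(a, b):
    return {"c": a + b}


def example_pipeline(load=True, save=True, cache_dir="example",
                     result_file_name="example_pipeline"):
    """Run the steps in order, reusing a pickled result if one exists."""
    path_variables = Path(cache_dir) / f"{result_file_name}.pk"
    if load and path_variables.exists():
        with path_variables.open("rb") as f:
            return pickle.load(f)

    result = {}
    result.update(step_one())
    result.update(step_two(result["a"], result["b"]))

    if save:
        path_variables.parent.mkdir(parents=True, exist_ok=True)
        with path_variables.open("wb") as f:
            pickle.dump(result, f)
    return result


with tempfile.TemporaryDirectory() as tmp:
    first = example_pipeline(cache_dir=tmp)   # computes and saves
    second = example_pipeline(cache_dir=tmp)  # loads from disk
    print(first, second == first)  # {'a': 2, 'b': 3, 'c': 5} True
```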




ipdb>  b 822


Breakpoint 5 at /home/jaumeamllo/workspace/mine/nbmodular/nbmodular/core/cell2func.py:822


ipdb>  c


> [0;32m/home/jaumeamllo/workspace/mine/nbmodular/nbmodular/core/cell2func.py[0m(822)[0;36mcreate_function_and_run_code[0;34m()[0m
[0;32m    821 [0;31m            [0;31m# return values[0m[0;34m[0m[0;34m[0m[0m
[0m[1;31m5[0;32m-> 822 [0;31m            return_values=([] if unknown_output and not self.current_function.defined else 
[0m[0;32m    823 [0;31m                               [0mself[0m[0;34m.[0m[0mcurrent_function[0m[0;34m.[0m[0mreturn_values[0m [0;32mif[0m [0mself[0m[0;34m.[0m[0mcurrent_function[0m[0;34m.[0m[0mdefined[0m [0;32melse[0m[0;34m[0m[0;34m[0m[0m
[0m


ipdb>  self.current_function.previous_variables


[]


ipdb>  ll


[1;31m1[1;32m   721 [0m    def create_function_and_run_code(
[1;32m    722 [0m        [0mself[0m[0;34m,[0m[0;34m[0m[0;34m[0m[0m
[1;32m    723 [0m        [0mfunc[0m[0;34m,[0m[0;34m[0m[0;34m[0m[0m
[1;32m    724 [0m        [0mcell[0m[0;34m,[0m[0;34m[0m[0;34m[0m[0m
[1;32m    725 [0m        [0mcall[0m[0;34m=[0m[0;32mNone[0m[0;34m,[0m[0;34m[0m[0;34m[0m[0m
[1;32m    726 [0m        [0minput[0m[0;34m=[0m[0;32mNone[0m[0;34m,[0m[0;34m[0m[0;34m[0m[0m
[1;32m    727 [0m        [0munknown_input[0m[0;34m=[0m[0;32mTrue[0m[0;34m,[0m[0;34m[0m[0;34m[0m[0m
[1;32m    728 [0m        [0moutput[0m[0;34m=[0m[0;32mNone[0m[0;34m,[0m[0;34m[0m[0;34m[0m[0m
[1;32m    729 [0m        [0munknown_output[0m[0;34m=[0m[0;32mTrue[0m[0;34m,[0m[0;34m[0m[0;34m[0m[0m
[1;32m    730 [0m        [0mnot_store[0m[0;34m=[0m[0;32mFalse[0m[0;34m,[0m[0;34m[0m[0;34m[0m[0m
[1;32m    731 [0m        [0mmake_function

ipdb>  r


--Return--
FunctionProce...ct_keys(['x'])
> [0;32m/home/jaumeamllo/workspace/mine/nbmodular/nbmodular/core/cell2func.py[0m(890)[0;36mcreate_function_and_run_code[0;34m()[0m
[0;32m    889 [0;31m[0;34m[0m[0m
[0m[0;32m--> 890 [0;31m        [0;32mreturn[0m [0mself[0m[0;34m.[0m[0mcurrent_function[0m[0;34m[0m[0;34m[0m[0m
[0m[0;32m    891 [0;31m[0;34m[0m[0m
[0m


ipdb>  self.current_function.previous_variables


[]


ipdb>  s


> [0;32m/home/jaumeamllo/workspace/mine/nbmodular/nbmodular/core/cell2func.py[0m(920)[0;36mfunction[0;34m()[0m
[0;32m    919 [0;31m        [0mthis_function[0m [0;34m=[0m [0mself[0m[0;34m.[0m[0mcreate_function_and_run_code[0m[0;34m([0m[0mfunc[0m[0;34m,[0m [0mcell[0m[0;34m,[0m [0mshow[0m[0;34m=[0m[0mshow[0m[0;34m,[0m [0mtest[0m[0;34m=[0m[0mtest[0m[0;34m,[0m [0mdata[0m[0;34m=[0m[0mdata[0m[0;34m,[0m [0midx[0m[0;34m=[0m[0midx[0m[0;34m,[0m [0;34m**[0m[0mkwargs[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m[0;32m--> 920 [0;31m        [0;32mif[0m [0mexisting[0m [0;32mand[0m [0;34m([0m[0mfunc[0m [0;32min[0m [0mself[0m[0;34m.[0m[0mfunction_info[0m[0;34m)[0m [0;32mand[0m [0mmerge[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[0m[0;32m    921 [0;31m            [0mthis_function[0m [0;34m=[0m [0mself[0m[0;34m.[0m[0mmerge_functions[0m [0;34m([0m[0mself[0m[0;34m.[0m[0mfunction_info[0m[0;34m[[0m[0

ipdb>  this_function.previous_variables


[]


ipdb>  n


> [0;32m/home/jaumeamllo/workspace/mine/nbmodular/nbmodular/core/cell2func.py[0m(924)[0;36mfunction[0;34m()[0m
[0;32m    923 [0;31m[0;34m[0m[0m
[0m[0;32m--> 924 [0;31m        [0mfunction_name[0m [0;34m=[0m [0mthis_function[0m[0;34m.[0m[0mname[0m[0;34m[0m[0;34m[0m[0m
[0m[0;32m    925 [0;31m        [0;32mif[0m [0mthis_function[0m[0;34m.[0m[0mtest[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[0m


ipdb>  existing


True


ipdb>  merge


False


ipdb>  n


> [0;32m/home/jaumeamllo/workspace/mine/nbmodular/nbmodular/core/cell2func.py[0m(925)[0;36mfunction[0;34m()[0m
[0;32m    924 [0;31m        [0mfunction_name[0m [0;34m=[0m [0mthis_function[0m[0;34m.[0m[0mname[0m[0;34m[0m[0;34m[0m[0m
[0m[0;32m--> 925 [0;31m        [0;32mif[0m [0mthis_function[0m[0;34m.[0m[0mtest[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[0m[0;32m    926 [0;31m            [0;32mif[0m [0mthis_function[0m[0;34m.[0m[0mdata[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[0m


ipdb>  


> [0;32m/home/jaumeamllo/workspace/mine/nbmodular/nbmodular/core/cell2func.py[0m(934)[0;36mfunction[0;34m()[0m
[0;32m    933 [0;31m        [0;32melse[0m[0;34m:[0m[0;34m[0m[0;34m[0m[0m
[0m[0;32m--> 934 [0;31m            [0mself[0m[0;34m.[0m[0mfunction_info[0m[0;34m[[0m[0mfunction_name[0m[0;34m][0m [0;34m=[0m [0mthis_function[0m[0;34m[0m[0;34m[0m[0m
[0m[0;32m    935 [0;31m            [0mself[0m[0;34m.[0m[0mfunction_list[0m [0;34m=[0m [0madd_function_to_list[0m [0;34m([0m[0mthis_function[0m[0;34m,[0m [0mself[0m[0;34m.[0m[0mfunction_list[0m[0;34m,[0m [0midx[0m[0;34m=[0m[0midx[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0m


ipdb>  idx


0


ipdb>  this_function.previous_variables


[]


ipdb>  q


#### Dynamic inputs & outputs

So far, none of the created functions return any result. This is because no other function needs any of the variables created inside either `two_plus_three` or `add_100`. Let's see what happens when we add a new function that requires the variable `c`, created in `two_plus_three`:

The code in the previous cell runs as it normally would, but at the same time it defines a function named `two_plus_three`, which we can show with the magic command `print`:

```ipython
%print two_plus_three 
```

This function is defined in the notebook space, so we can invoke it:

In [None]:
%print all

def get_initial_values(test=False):
    a = 2
    b = 3
    c = a+b
    print (a+b)



The inputs and outputs of the functions change dynamically every time we add a new function cell. For example, if we add a new function `get_d`:

In [None]:
%%function get_d
d = 10

In [None]:
%print get_d

def get_d():
    d = 10



And then a function `add_all` that depends on the previous two functions:

In [None]:
%%function add_all
a = a + d
b = b + d
c = c + d

In [None]:
f = %function_info add_all

In [None]:
print(f.code)

def add_all(d, b, c, a):
    a = a + d
    b = b + d
    c = c + d



In [None]:
%print add_all

def add_all(d, b, c, a):
    a = a + d
    b = b + d
    c = c + d



In [None]:
%print_pipeline --test


from sklearn.utils import Bunch
from pathlib import Path
import joblib
import pandas as pd
import numpy as np

def test_index_pipeline (test=True, prev_result=None, result_file_name="index_pipeline"):
    result = index_pipeline (test=test, load=True, save=True, result_file_name=result_file_name)
    if prev_result is None:
        prev_result = index_pipeline (test=test, load=True, save=True, result_file_name=f"test_{result_file_name}")
    for k in prev_result:
        assert k in result
        if type(prev_result[k]) is pd.DataFrame:    
            pd.testing.assert_frame_equal (result[k], prev_result[k])
        elif type(prev_result[k]) is np.array:
            np.testing.assert_array_equal (result[k], prev_result[k])
        else:
            assert result[k]==prev_result[k]



In [None]:
%print_pipeline


def index_pipeline (test=False, load=True, save=True, result_file_name="index_pipeline"):

    # load result
    result_file_name += '.pk'
    path_variables = Path ("index") / result_file_name
    if load and path_variables.exists():
        result = joblib.load (path_variables)
        return result

    b, c, a = get_initial_values (test=test)
    d = get_d ()
    add_all (d, b, c, a)

    # save result
    result = Bunch (b=b,c=c,a=a,d=d)
    if save:    
        path_variables.parent.mkdir (parents=True, exist_ok=True)
        joblib.dump (result, path_variables)
    return result



In [None]:
%print add_all

def add_all(d, b, c, a):
    a = a + d
    b = b + d
    c = c + d



We can see that the outputs of `two_plus_three` and `get_d` change as needed. We can look at all the functions defined so far by using `%print all`:

In [None]:
%print all

def get_initial_values(test=False):
    a = 2
    b = 3
    c = a+b
    print (a+b)
    return b,c,a

def get_d():
    d = 10
    return d

def add_all(d, b, c, a):
    a = a + d
    b = b + d
    c = c + d



Similarly, the outputs of the last function, `add_all`, change after we add other functions that depend on it:

In [None]:
%%function print_all
print (a, b, c, d)

12 13 15 10
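The way inputs and outputs shift as cells are added can be approximated with Python's standard `ast` module. The sketch below is a simplification, not nbmodular's actual algorithm: `stored_and_loaded` and `infer_io` are hypothetical helpers that treat a function's arguments as the names it reads that an earlier cell creates, and its return values as the names it creates that a later cell reads. It ignores scoping details such as comprehension variables.

```python
import ast


def stored_and_loaded(code):
    """Return the sets of plain names a snippet assigns (Store) and reads (Load)."""
    stored, loaded = set(), set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                stored.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
    return stored, loaded


def infer_io(cells):
    """For each cell, return (arguments, return_values):
    arguments     = names read here that some earlier cell creates,
    return_values = names created here that some later cell reads."""
    analysis = [stored_and_loaded(code) for code in cells]
    io = []
    for i, (stored, loaded) in enumerate(analysis):
        created_before = set().union(*(s for s, _ in analysis[:i]))
        read_after = set().union(*(l for _, l in analysis[i + 1:]))
        io.append((sorted(loaded & created_before), sorted(stored & read_after)))
    return io


cells = [
    "a = 2\nb = 3\nc = a+b\nprint (a+b)",  # get_initial_values
    "d = 10",                              # get_d
    "a = a + d\nb = b + d\nc = c + d",     # add_all
]
print(infer_io(cells))
# [([], ['a', 'b', 'c']), ([], ['d']), (['a', 'b', 'c', 'd'], [])]

cells.append("print (a, b, c, d)")         # print_all reads a, b, c, d
print(infer_io(cells)[2])
# (['a', 'b', 'c', 'd'], ['a', 'b', 'c'])  -> add_all now has outputs
```

As in the notebook, `add_all` returns nothing until `print_all` reads its variables.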


### print

We can see each of the defined functions with `%print my_function`, and list all of them with `%print all`:

In [None]:
%print all

def get_initial_values(test=False):
    a = 2
    b = 3
    c = a+b
    print (a+b)
    return b,c,a

def get_d():
    d = 10
    return d

def add_all(d, b, c, a):
    a = a + d
    b = b + d
    c = c + d
    return b,c,a

def print_all(b, d, a, c):
    print (a, b, c, d)



### print_pipeline

As we add functions to the notebook, a pipeline function is created and kept up to date. We can print this pipeline with the magic `print_pipeline`:

In [None]:
%print_pipeline


def index_pipeline (test=False, load=True, save=True, result_file_name="index_pipeline"):

    # load result
    result_file_name += '.pk'
    path_variables = Path ("index") / result_file_name
    if load and path_variables.exists():
        result = joblib.load (path_variables)
        return result

    b, c, a = get_initial_values (test=test)
    d = get_d ()
    b, c, a = add_all (d, b, c, a)
    print_all (b, d, a, c)

    # save result
    result = Bunch (b=b,d=d,c=c,a=a)
    if save:    
        path_variables.parent.mkdir (parents=True, exist_ok=True)
        joblib.dump (result, path_variables)
    return result



This shows the data flow in terms of inputs and outputs.

And run it:

In [None]:
self = %cell_processor

In [None]:
self.function_list

[FunctionProcessor with name get_initial_values, and fields: dict_keys(['original_code', 'name', 'call', 'tab_size', 'arguments', 'return_values', 'unknown_input', 'unknown_output', 'test', 'data', 'defined', 'permanent', 'signature', 'norun', 'created_variables', 'loaded_names', 'previous_variables', 'argument_variables', 'read_only_variables', 'posterior_variables', 'all_variables', 'idx', 'previous_values', 'current_values', 'all_values', 'code'])
     Arguments: []
     Output: ['b', 'c', 'a']
     Locals: dict_keys(['a', 'b', 'c']),
 FunctionProcessor with name get_d, and fields: dict_keys(['original_code', 'name', 'call', 'tab_size', 'arguments', 'return_values', 'unknown_input', 'unknown_output', 'test', 'data', 'defined', 'permanent', 'signature', 'norun', 'created_variables', 'loaded_names', 'previous_variables', 'argument_variables', 'read_only_variables', 'posterior_variables', 'all_variables', 'idx', 'previous_values', 'current_values', 'all_values', 'code'])
     Arguments

In [None]:
%print all

def get_initial_values(test=False):
    a = 2
    b = 3
    c = a+b
    print (a+b)
    return b,c,a

def get_d():
    d = 10
    return d

def add_all(d, b, c, a):
    a = a + d
    b = b + d
    c = c + d
    return b,c,a

def print_all(b, d, a, c):
    print (a, b, c, d)



In [None]:
index_pipeline()

{'d': 10, 'b': 13, 'a': 12, 'c': 15}
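For reference, here is a standard-library-only sketch of the same data flow with result caching. It approximates the generated `index_pipeline` rather than reproducing it: `pickle` and a plain `dict` stand in for `joblib` and `Bunch`, the `folder` parameter is a hypothetical addition, and the `print` calls are omitted.

```python
import pickle
from pathlib import Path


def get_initial_values(test=False):
    a = 2
    b = 3
    c = a + b
    return b, c, a


def get_d():
    d = 10
    return d


def add_all(d, b, c, a):
    a = a + d
    b = b + d
    c = c + d
    return b, c, a


def index_pipeline(load=True, save=True, folder="index", result_file_name="index_pipeline"):
    path = Path(folder) / f"{result_file_name}.pk"
    # Reuse a previously saved result if one exists.
    if load and path.exists():
        with path.open("rb") as f:
            return pickle.load(f)

    # Data flow: outputs of each function feed the next one.
    b, c, a = get_initial_values()
    d = get_d()
    b, c, a = add_all(d, b, c, a)

    result = dict(a=a, b=b, c=c, d=d)
    if save:
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as f:
            pickle.dump(result, f)
    return result


print(index_pipeline(load=False, save=False))  # {'a': 12, 'b': 13, 'c': 15, 'd': 10}
```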

### function_info

We can get access to many of the details of each of the defined functions by calling `function_info` on a given function name:

In [None]:
two_plus_three_info = %function_info two_plus_three

This allows us to see:

- The name and value (at the time of running) of the local variables, arguments and results from the function:

In [None]:
two_plus_three_info.arguments

[]

In [None]:
two_plus_three_info.current_values

{'a': 2, 'b': 3, 'c': 5}

In [None]:
two_plus_three_info.return_values

['b', 'c', 'a']
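Conceptually, the defined function combines the cell's code with the inferred arguments and `return_values`. A minimal sketch of that wrapping step, assuming plain string assembly; `cell_to_function` is a hypothetical helper, not part of nbmodular's API:

```python
import textwrap


def cell_to_function(name, code, arguments=(), return_values=(), extra_params=()):
    """Hypothetical helper: wrap a cell's code in a function definition."""
    params = ", ".join([*arguments, *extra_params])
    body = textwrap.indent(code.strip("\n"), "    ")  # indent every code line
    source = f"def {name}({params}):\n{body}\n"
    if return_values:
        source += f"    return {','.join(return_values)}\n"
    return source


cell = "a = 2\nb = 3\nc = a+b\nprint (a+b)\n"
source = cell_to_function(
    "get_initial_values", cell, return_values=["b", "c", "a"], extra_params=("test=False",)
)
print(source)

namespace = {}
exec(source, namespace)                     # defines get_initial_values in `namespace`
print(namespace["get_initial_values"]())    # prints 5, then (3, 5, 2)
```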

We can also inspect the original code written in the cell...

In [None]:
print (two_plus_three_info.original_code)

a = 2
b = 3
c = a+b
print (a+b)



the code of the defined function:

In [None]:
print (two_plus_three_info.code)

def get_initial_values(test=False):
    a = 2
    b = 3
    c = a+b
    print (a+b)
    return b,c,a



... and the AST trees:

In [None]:
print (two_plus_three_info.get_ast (code=two_plus_three_info.original_code))

Module(
  body=[
    Assign(
      targets=[
        Name(id='a', ctx=Store())],
      value=Constant(value=2)),
    Assign(
      targets=[
        Name(id='b', ctx=Store())],
      value=Constant(value=3)),
    Assign(
      targets=[
        Name(id='c', ctx=Store())],
      value=BinOp(
        left=Name(id='a', ctx=Load()),
        op=Add(),
        right=Name(id='b', ctx=Load()))),
    Expr(
      value=Call(
        func=Name(id='print', ctx=Load()),
        args=[
          BinOp(
            left=Name(id='a', ctx=Load()),
            op=Add(),
            right=Name(id='b', ctx=Load()))],
        keywords=[]))],
  type_ignores=[])
None
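A tree like the one above can be printed with the standard library alone. This assumes Python 3.9 or later, where `ast.dump` accepts `indent`; newer versions may omit empty fields such as `type_ignores=[]`. The `Store` and `Load` contexts are what distinguish the variables a cell creates from the variables it reads:

```python
import ast

code = "a = 2\nb = 3\nc = a+b\nprint (a+b)\n"
tree = ast.parse(code)

dumped = ast.dump(tree, indent=2)  # indent= requires Python 3.9+
print(dumped)

# Names in a Store context are assigned by the cell.
created = sorted(
    {n.id for n in ast.walk(tree) if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
)
print(created)  # ['a', 'b', 'c']
```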


In [None]:
print (two_plus_three_info.get_ast (code=two_plus_three_info.code))

Module(
  body=[
    FunctionDef(
      name='get_initial_values',
      args=arguments(
        posonlyargs=[],
        args=[
          arg(arg='test')],
        kwonlyargs=[],
        kw_defaults=[],
        defaults=[
          Constant(value=False)]),
      body=[
        Assign(
          targets=[
            Name(id='a', ctx=Store())],
          value=Constant(value=2)),
        Assign(
          targets=[
            Name(id='b', ctx=Store())],
          value=Constant(value=3)),
        Assign(
          targets=[
            Name(id='c', ctx=Store())],
          value=BinOp(
            left=Name(id='a', ctx=Load()),
            op=Add(),
            right=Name(id='b', ctx=Load()))),
        Expr(
          value=Call(
            func=Name(id='print', ctx=Load()),
            args=[
              BinOp(
                left=Name(id='a', ctx=Load()),
                op=Add(),
                right=Name(id='b', ctx=Load()))],
            keywords=[])),
        Return(
       

Now, we can define another function in a cell that uses variables from the previous function.

### cell_processor

This magic gives us access to the `CellProcessor` object that manages the logic for running the above magic commands, which can come in handy:

In [None]:
cell_processor = %cell_processor

## Merging function cells

In order to explore intermediate results, it is convenient to split the code of a function among different cells. This can be done by passing the flag `--merge`:

In [None]:
%%function analyze
x = [1, 2, 3]
y = [100, 200, 300]
z = [u+v for u,v in zip(x,y)]

In [None]:
z

[101, 202, 303]

In [None]:
%print analyze

def analyze():
    x = [1, 2, 3]
    y = [100, 200, 300]
    z = [u+v for u,v in zip(x,y)]



In [None]:
%%function analyze --merge
product = [u*v for u, v in zip(x,y)]

In [None]:
%print analyze

def analyze():
    x = [1, 2, 3]
    y = [100, 200, 300]
    z = [u+v for u,v in zip(x,y)]
    product = [u*v for u, v in zip(x,y)]
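Merging can be pictured as accumulating cell code under one function name. The sketch below assumes the merged body is simply the cells' code concatenated in order; `register_cell` is a hypothetical helper:

```python
def register_cell(registry, name, code, merge=False):
    """Hypothetical helper: store a cell's code under a function name,
    appending to the existing code when merge=True, replacing it otherwise."""
    code = code.strip("\n")
    if merge and name in registry:
        registry[name] += "\n" + code
    else:
        registry[name] = code
    return registry[name]


registry = {}
register_cell(registry, "analyze", "x = [1, 2, 3]\ny = [100, 200, 300]\nz = [u+v for u,v in zip(x,y)]")
merged = register_cell(registry, "analyze", "product = [u*v for u, v in zip(x,y)]", merge=True)
print(merged)

namespace = {}
exec(merged, namespace)  # run the accumulated body as one unit
print(namespace["z"], namespace["product"])  # [101, 202, 303] [100, 400, 900]
```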



# Test functions

By passing the flag `--test`, we can indicate that the logic in the cell is dedicated to testing other functions in the notebook. The test function is written with the well-known `pytest` library in mind as the test engine.

This has the following consequences:

- The dependency analysis does not link the test function to variables found in other cells.
- Test functions do not appear in the overall pipeline.
- The data variables used by the test function can be defined in separate test data cells, which in turn are converted to functions. These functions are called at the beginning of the test function.

Let's see an example:

In [None]:
%%function input_add_all --data --test
a = 5
b = 3
c = 6
d = 7

In [None]:
add_all(d, a, b, c)

(12, 10, 13)

In [None]:
%%function add_all --test
# test function add_all
assert add_all(d, a, b, c)==(12, 10, 13)

In [None]:
%print test_add_all --test

def test_add_all():
    b,c,a,d = test_input_add_all()
    # test function add_all
    assert add_all(d, a, b, c)==(12, 10, 13)



In [None]:
%print test_input_add_all --test --data

def test_input_add_all(test=False):
    a = 5
    b = 3
    c = 6
    d = 7
    return b,c,a,d
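Putting the two cells together, the exported test module might look roughly like the following. This is a sketch, not the exact exported file: `add_all` is defined inline so the block runs on its own, whereas a real test module would presumably import it from the exported module.

```python
def add_all(d, b, c, a):
    a = a + d
    b = b + d
    c = c + d
    return b, c, a


def test_input_add_all(test=False):
    # Test data cell (--data --test) turned into a function.
    a = 5
    b = 3
    c = 6
    d = 7
    return b, c, a, d


def test_add_all():
    b, c, a, d = test_input_add_all()
    # Arguments are permuted on purpose, exactly as in the notebook cell.
    assert add_all(d, a, b, c) == (12, 10, 13)


test_add_all()  # under pytest, this function would be discovered and run automatically
```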



Test functions are written to a separate test module, with the prefix `test_`:

In [None]:
!ls ../tests

index.ipynb  test_example.py


# Imports

In order to include libraries in our Python module, we can use the `imports` magic. These imports are written at the beginning of the module:

In [None]:
%%imports
import pandas as pd

Imports can be indicated separately for the test module by passing the flag `--test`:

In [None]:
%%imports --test
import matplotlib.pyplot as plt
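Assuming the exporter places import lines before the function definitions, module assembly can be sketched as follows. `assemble_module` is hypothetical, and `compile` only checks the syntax, so `pandas` is never actually imported:

```python
def assemble_module(imports, functions):
    """Hypothetical helper: imports first, then each function, separated by blank lines."""
    parts = ["\n".join(line.strip() for line in imports if line.strip())]
    parts += [f.strip("\n") for f in functions]
    return "\n\n".join(parts) + "\n"


module_text = assemble_module(
    ["import pandas as pd"],
    ["def get_d():\n    d = 10\n    return d"],
)
print(module_text)

compile(module_text, "<exported module>", "exec")  # syntax check only, nothing is executed
```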

# Defined functions

Functions can also be written fully defined, with their own signature and return values. The only caveat is that, if we want the function to be executed, the variables in the argument list must already exist outside the function. Otherwise, we need to pass the flag `--norun` to avoid errors:

In [None]:
%%function --norun
def myfunc (x, y, a=1, b=3):
    print ('hello', a, b)
    c = a+b
    return c

Although the internal code of the function is not executed, it is still parsed into an AST. This allows nbmodular to provide very tentative *warnings* about names not found in the argument list:

In [None]:
%%function --norun
def other_func (x, y):
    print ('hello', a, b)
    c = a+b
    return c

Detected the following previous variables that are not in the argument list: ['b', 'a']
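A rough version of this check can be written with `ast`: collect the parameter names, then report names that are read but neither assigned locally, passed as parameters, nor builtins. `undeclared_names` is a hypothetical helper, and it ignores globals, imports, and nested scopes:

```python
import ast
import builtins


def undeclared_names(source):
    """Hypothetical helper: names read in a function that are not parameters,
    local assignments, or builtins."""
    func = ast.parse(source).body[0]
    args = func.args
    params = {a.arg for a in [*args.posonlyargs, *args.args, *args.kwonlyargs]}
    if args.vararg:
        params.add(args.vararg.arg)
    if args.kwarg:
        params.add(args.kwarg.arg)

    stored, loaded = set(), set()
    for node in ast.walk(func):
        if isinstance(node, ast.Name):
            (stored if isinstance(node.ctx, ast.Store) else loaded).add(node.id)
    return sorted(loaded - stored - params - set(dir(builtins)))


other_func = "def other_func (x, y):\n    print ('hello', a, b)\n    c = a+b\n    return c\n"
print(undeclared_names(other_func))  # ['a', 'b']
```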


Let's do the same but running the function:

In [None]:
a=1
b=3

In [None]:
%%function
def myfunc (x, y, a=1, b=3):
    print ('hello', a, b)
    c = a+b
    return c

hello 1 3


In [None]:
myfunc (10, 20)

hello 1 3


4

In [None]:
myfunc_info = %function_info myfunc

In [None]:
myfunc_info

FunctionProcessor with name myfunc, and fields: dict_keys(['original_code', 'name', 'call', 'tab_size', 'arguments', 'return_values', 'unknown_input', 'unknown_output', 'test', 'data', 'defined', 'permanent', 'signature', 'norun', 'created_variables', 'loaded_names', 'previous_variables', 'argument_variables', 'read_only_variables', 'posterior_variables', 'all_variables', 'idx', 'previous_values', 'current_values', 'all_values', 'code'])
    Arguments: ['x', 'y', 'a', 'b']
    Output: ['c']
    Locals: dict_keys(['c'])

In [None]:
myfunc_info.c

4

# Storing local variables in memory

By default, when we run a cell function, its local variables are stored in a dictionary called `current_values`:

In [None]:
%%function my_new_function
my_new_local = 3
my_other_new_local = 4

The stored variables can be accessed by calling the magic `function_info`:

In [None]:
my_new_function_info = %function_info my_new_function

In [None]:
my_new_function_info.current_values

{'my_new_local': 3, 'my_other_new_local': 4}

This default behaviour can be overridden by passing the flag `--not-store`:

In [None]:
%%function my_second_new_function --not-store
my_second_variable = 100
my_second_other_variable = 200

In [None]:
my_second_new_function_info = %function_info my_second_new_function

In [None]:
my_second_new_function_info.current_values

{}
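A minimal sketch of this storing behaviour, assuming cell code is run with `exec` and its new names are captured in a separate dictionary. `run_cell` and its `store` parameter are hypothetical stand-ins for the default behaviour and the `--not-store` flag:

```python
def run_cell(code, namespace, store=True):
    """Hypothetical helper: run cell code, publish its variables to the notebook
    namespace, and return them as a dict only when store=True."""
    local_values = {}
    exec(code, namespace, local_values)  # assignments land in local_values
    namespace.update(local_values)       # the cell still behaves like a normal cell
    return dict(local_values) if store else {}


notebook = {}
current_values = run_cell("my_new_local = 3\nmy_other_new_local = 4", notebook)
print(current_values)  # {'my_new_local': 3, 'my_other_new_local': 4}

print(run_cell("my_second_variable = 100", notebook, store=False))  # {}
print(notebook["my_second_variable"])  # 100
```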

# (Un)packing Bunch I/O

In [None]:
%load_ext nbmodular.core.cell2func

In [None]:
from sklearn.utils import Bunch

In [None]:
%%function bunch_data
x = Bunch (a=1, b=2)

In [None]:
%%function bunch_processor --unpack-bunch x --include-input "day=1"
c = 3
a = 4

In [None]:
%print bunch_processor

def bunch_processor(x, day):
    a = x["a"]
    b = x["b"]
    c = 3
    a = 4
    x["a"] = a
    x["c"] = c
    x["day"] = day
    return x
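Because the generated function only uses item access on `x`, a plain `dict` can stand in for `Bunch` (which is a `dict` subclass) to see what it does at runtime. The fields the cell reads are unpacked at the start, and the created variables plus the included input `day` are written back into the same object:

```python
def bunch_processor(x, day):
    # Unpack the fields stored in the bunch.
    a = x["a"]
    b = x["b"]
    # Original cell code.
    c = 3
    a = 4
    # Write created/modified variables and the included input back into the bunch.
    x["a"] = a
    x["c"] = c
    x["day"] = day
    return x


data = {"a": 1, "b": 2}
result = bunch_processor(data, day=1)
print(result)  # {'a': 4, 'b': 2, 'c': 3, 'day': 1}
```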



# Function's info object holding local variables

In [None]:
#| hide
import pandas as pd

In [None]:
df = pd.DataFrame (dict(Year=[1,2,3], Month=[1,2,3], Day=[1,2,3]))
fy = '2023'

In [None]:
%%function
def days (df, fy, x=1, /, y=3, *, n=4):
    df_group = df.groupby(['Year','Month']).agg({'Day': lambda x: len (x)})
    df_group = df.reset_index()
    print ('other args: fy', fy, 'x', x, 'y', y)
    return df_group

other args: fy 2023 x 1 y 3
Stored the following local variables in the days current_values dictionary: ['df_group']
Detected the following previous variables that are not in the argument list: ['x', 'df', 'fy']


An info object named `<function_name>_info` is created in memory and can be used to access the function's local variables:

In [None]:
days_info.df_group

|   | index | Year | Month | Day |
|---|-------|------|-------|-----|
| 0 | 0     | 1    | 1     | 1   |
| 1 | 1     | 2    | 2     | 2   |
| 2 | 2     | 3    | 3     | 3   |


There is more information in this object: previous variables, code, etc.

In [None]:
days_info.current_values

{'df_group':    index  Year  Month  Day
 0      0     1      1    1
 1      1     2      2    2
 2      2     3      3    3}

In [None]:
days_info

FunctionProcessor with name days, and fields: dict_keys(['original_code', 'name', 'call', 'tab_size', 'arguments', 'return_values', 'unknown_input', 'unknown_output', 'test', 'data', 'defined', 'permanent', 'signature', 'not_run', 'previous_values', 'current_values', 'returns_dict', 'returns_bunch', 'unpack_bunch', 'include_input', 'exclude_input', 'include_output', 'exclude_output', 'store_locals_in_disk', 'created_variables', 'loaded_names', 'previous_variables', 'argument_variables', 'read_only_variables', 'posterior_variables', 'all_variables', 'idx'])
    Arguments: ['df', 'fy', 'x', 'y']
    Output: ['df_group']
    Locals: dict_keys(['df_group'])

The function can also be called directly:

In [None]:
days (df*100, 100, x=4)

other args: fy 100 x 4 y 3


|   | index | Year | Month | Day |
|---|-------|------|-------|-----|
| 0 | 0     | 100  | 100   | 100 |
| 1 | 1     | 200  | 200   | 200 |
| 2 | 2     | 300  | 300   | 300 |
