In [1]:
from cartesian_explorer import Explorer
import numpy as np
import datetime
%load_ext autoreload
%autoreload 2


# 🗺️ Mapping tutorial

Cartesian Explorer is a tool that simplifies exploring multi-input functions.
Let's start with the first stage: the improved `map` API.

## Iterating over cartesian product of arguments

The goal of the `Explorer.map` function is to provide a handy way of writing the following construct:
```python
res = []
for v_1 in args_1:
    for v_2 in args_2:
        for v_3 in args_3:
            res.append(my_function(v_1, v_2, v_3))

res = np.array(res).reshape(len(args_1), len(args_2), len(args_3))
```

The equivalent syntax for a function that takes arguments `a, b, c` is:

```python
ex = Explorer()
res = ex.map(my_function, a=args_1, b=args_2, c=args_3)
```
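The nested loop above can be sketched in plain Python with `itertools.product`, which visits combinations in the same order as the loops (the last argument varies fastest). `naive_map` below is a hypothetical helper written for illustration, not Cartesian Explorer's implementation. It returns the flat list of results together with the shape that `reshape` would use.

```python
import itertools


def naive_map(func, **kwargs):
    """Hypothetical stand-in for Explorer.map (not the library code).

    Calls func once per combination of keyword values and returns the flat
    list of results plus the shape of the cartesian product."""
    names = list(kwargs)
    value_lists = [list(kwargs[name]) for name in names]
    shape = tuple(len(values) for values in value_lists)
    # itertools.product varies the last argument fastest, like the innermost loop
    results = [
        func(**dict(zip(names, combo)))
        for combo in itertools.product(*value_lists)
    ]
    return results, shape


def add3(a, b, c):
    return a + b + c


flat, shape = naive_map(add3, a=[1, 2], b=[10, 20, 30], c=[100])
print(shape)  # (2, 3, 1)
print(flat)   # [111, 121, 131, 112, 122, 132]
```

Reshaping `flat` to `shape` in row-major order gives the same array as the loop version.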

As an example of a function with many arguments, let's use `date_string`, which returns the ISO 8601 string representation of a datetime:

In [2]:
def date_string(year, month, day, hour=0, minute=0, second=0):
    return datetime.datetime(year, month, day, hour, minute, second).isoformat()

In [3]:
ex = Explorer()
result = ex.map(date_string, 
    year = [2023],
    month = [5, 6],
    day = [1, 15],
    hour = [12, 6],
    minute = [0],
    second = [1, 2, 3]
)
result.shape

(1, 2, 2, 2, 1, 3)

The result is a `numpy` array whose shape is given by the lengths of the arguments, in the order they were passed.
It can be seen as a `len(shape)`-dimensional tensor with one dimension per argument.
We can now access an element by its index along each input dimension, for example:

In [4]:
result[0, 0, 0, 0, 0, 0]

'2023-05-01T12:00:01'

Check that this output corresponds to the first value of every input argument: May 1, 2023, 12:00:01.
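Assuming `map` evaluates every combination exactly like the nested loop, the element at index `(i_1, ..., i_n)` is the function called with the `i_k`-th value of the `k`-th argument. The hypothetical helper `call_at` below makes this correspondence explicit without using Cartesian Explorer:

```python
import datetime


def date_string(year, month, day, hour=0, minute=0, second=0):
    return datetime.datetime(year, month, day, hour, minute, second).isoformat()


inputs = dict(year=[2023], month=[5, 6], day=[1, 15], hour=[12, 6],
              minute=[0], second=[1, 2, 3])


def call_at(func, inputs, index):
    """Hypothetical helper: call func with the value picked by one index per input."""
    chosen = {name: values[i] for (name, values), i in zip(inputs.items(), index)}
    return func(**chosen)


print(call_at(date_string, inputs, (0, 0, 0, 0, 0, 0)))  # 2023-05-01T12:00:01
print(call_at(date_string, inputs, (0, 1, 1, 1, 0, 2)))  # 2023-06-15T06:00:03
```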

### Caveats of mapping syntax

Passing keyword arguments directly is convenient, but in general it can cause problems.
Cartesian Explorer addresses them with a fallback syntax based on the `constants` and `variables` keyword arguments.
This syntax is used consistently across all `map_*` functions in Cartesian Explorer.

#### Iterable keyword arguments
Each argument must be an iterable, and `map` iterates over it. Be careful with strings:

In [5]:
def string_reverse(s):
    return s[::-1]

ex.map(string_reverse, s='Mary had a little lamb')

array(['M', 'a', 'r', 'y', ' ', 'h', 'a', 'd', ' ', 'a', ' ', 'l', 'i',
       't', 't', 'l', 'e', ' ', 'l', 'a', 'm', 'b'], dtype='<U1')

Just like `for v in args_1`, this iterates over the characters of the string.
To use the whole string as a single input value, wrap it in a list:

In [6]:
ex.map(string_reverse, s=['Mary had a little lamb'])

array(['bmal elttil a dah yraM'], dtype='<U22')

The output is a 1-dimensional array of length 1.
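The same pitfall appears with plain `itertools.product`, which this sketch assumes consumes each argument the way `map` does: a bare string contributes one value per character, while a one-element list contributes a single value.

```python
import itertools


def string_reverse(s):
    return s[::-1]


text = 'Mary had a little lamb'
# Iterating over a bare string yields one combination per character...
per_char = [string_reverse(s) for (s,) in itertools.product(text)]
# ...while a one-element list yields exactly one combination.
whole = [string_reverse(s) for (s,) in itertools.product([text])]
print(len(per_char), len(whole))  # 22 1
print(whole[0])                   # bmal elttil a dah yraM
```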

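Note also that if the function itself returns a numpy array, `map` does not stack the results into an extra output dimension by default; the `to_array` example below demonstrates this.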

#### Setting constant arguments

Sometimes it is handy to set constant arguments without giving them a dedicated dimension in the output.
For this, use the `constants` keyword argument:

In [7]:
ex.map(date_string, second=[1, 2, 3], constants=dict(year=2023, month=5, day=21))

array(['2023-05-21T00:00:01', '2023-05-21T00:00:02',
       '2023-05-21T00:00:03'], dtype='<U19')
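Extending the same `itertools.product` idea, constants can be merged into every call without contributing a dimension to the shape. `map_with_constants` is a hypothetical illustration, not the library's code; in particular, letting iterated values override constants of the same name is an assumption of this sketch.

```python
import datetime
import itertools


def date_string(year, month, day, hour=0, minute=0, second=0):
    return datetime.datetime(year, month, day, hour, minute, second).isoformat()


def map_with_constants(func, constants=None, **variables):
    """Hypothetical sketch: iterate over `variables`, pass `constants` to every call."""
    constants = constants or {}
    names = list(variables)
    value_lists = [list(variables[name]) for name in names]
    # only the iterated arguments contribute to the output shape
    shape = tuple(len(values) for values in value_lists)
    flat = []
    for combo in itertools.product(*value_lists):
        # iterated values override constants with the same name (an assumption)
        call_kwargs = {**constants, **dict(zip(names, combo))}
        flat.append(func(**call_kwargs))
    return flat, shape


flat, shape = map_with_constants(date_string, second=[1, 2, 3],
                                 constants=dict(year=2023, month=5, day=21))
print(shape)  # (3,)
print(flat)   # ['2023-05-21T00:00:01', '2023-05-21T00:00:02', '2023-05-21T00:00:03']
```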

#### Avoiding conflict with special arguments

If your function has an actual argument called `constants` (or another special keyword argument of `map`), you can always pass the iterated arguments through the `variables` argument instead:

* `variables` takes precedence
* you should not use both `variables` and the keyword argument syntax


In [8]:
def keyword_format(**kwargs):
    return ', '.join([f'{k}={v}' for k, v in kwargs.items()])

keywords = ex.map(keyword_format,
       variables=dict(constants=[1, 2, 3]),
       constants=dict(year=2023, constants=5, day=21)
       )
assert keywords[0] == 'constants=1, year=2023, day=21'
keywords

array(['constants=1, year=2023, day=21', 'constants=2, year=2023, day=21',
       'constants=3, year=2023, day=21'], dtype='<U30')
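The fallback can be sketched by reading the iterated arguments from an explicit `variables` dict, so that a function argument literally named `constants` no longer clashes with the keyword of the same name. `map_with_fallback` is hypothetical; its precedence rules (iterated values win over constants, and `variables` replaces any plain keyword arguments) are assumptions chosen to reproduce the output above.

```python
import itertools


def keyword_format(**kwargs):
    return ', '.join([f'{k}={v}' for k, v in kwargs.items()])


def map_with_fallback(func, variables=None, constants=None, **kwargs):
    """Hypothetical sketch of the `variables`/`constants` fallback syntax.

    If `variables` is given it is used instead of **kwargs, so any name,
    including `constants`, can be iterated."""
    variables = variables if variables is not None else kwargs
    constants = constants or {}
    names = list(variables)
    flat = []
    for combo in itertools.product(*(variables[name] for name in names)):
        call_kwargs = dict(zip(names, combo))
        # add constants only where no iterated value has the same name
        for key, value in constants.items():
            call_kwargs.setdefault(key, value)
        flat.append(func(**call_kwargs))
    return flat


out = map_with_fallback(keyword_format,
                        variables=dict(constants=[1, 2, 3]),
                        constants=dict(year=2023, constants=5, day=21))
print(out[0])  # constants=1, year=2023, day=21
```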

#### Functions that return arrays

In [9]:
def to_array(s):
    return np.array([s, s])

array_of_lists = ex.map(to_array, s='Mary had a little lamb')
print("Shape of output:", array_of_lists.shape)
print("Converted shape:", np.array(array_of_lists.tolist()).shape)

Shape of output: (22,)
Converted shape: (22, 2)


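The result is not a 2-d array but a nested one: a 1-d array of 22 elements, each holding the length-2 array returned by `to_array`. Converting it with `tolist()` recovers the `(22, 2)` shape. There is a special way to specify that your function returns a vector.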
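The conversion `np.array(array_of_lists.tolist())` in the cell above turns the length of each returned vector into a new trailing dimension. A hypothetical stdlib-only sketch of the same idea, which also checks that every vector has the same length (otherwise no rectangular array exists):

```python
def stack_vectors(outputs):
    """Hypothetical helper: turn equal-length vectors into a 2-D nested list.

    Returns the nested list and its shape (number of vectors, vector length)."""
    outputs = [list(vector) for vector in outputs]
    lengths = {len(vector) for vector in outputs}
    if len(lengths) != 1:
        raise ValueError(f"expected one common vector length, got {sorted(lengths)}")
    return outputs, (len(outputs), lengths.pop())


stacked, shape = stack_vectors([[ch, ch] for ch in 'Mary had a little lamb'])
print(shape)       # (22, 2)
print(stacked[0])  # ['M', 'M']
```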

## Using annotated `xarray` arrays as output

NumPy is popular and supported by default, but as the examples above show, results are much easier to work with when their dimensions have names.
This is the perfect use case for [xarray](https://docs.xarray.dev/en/stable/) package.
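To illustrate what named dimensions buy, here is a hypothetical pure-Python sketch with no xarray involved: results are stored under their coordinate tuples, and `select` filters by argument name, loosely mimicking `DataArray.sel` with scalar labels. Neither `labeled_map` nor `select` is part of Cartesian Explorer or xarray.

```python
import itertools


def keyword_format(**kwargs):
    return ', '.join([f'{k}={v}' for k, v in kwargs.items()])


def labeled_map(func, **inputs):
    """Hypothetical helper: store each result under its tuple of coordinate
    values and return the dimension names alongside."""
    names = list(inputs)
    table = {}
    for combo in itertools.product(*inputs.values()):
        table[combo] = func(**dict(zip(names, combo)))
    return names, table


def select(names, table, **fixed):
    """Keep entries whose coordinates match `fixed`; fixed dimensions are dropped."""
    out = {}
    for combo, value in table.items():
        coords = dict(zip(names, combo))
        if all(coords[name] == label for name, label in fixed.items()):
            remaining = tuple(c for n, c in zip(names, combo) if n not in fixed)
            out[remaining] = value
    return out


names, table = labeled_map(keyword_format, year=[2023], month=[5, 6],
                           label=['abc', 'def', 'fgh'])
print(select(names, table, year=2023, label='fgh'))
# {(5,): 'year=2023, month=5, label=fgh', (6,): 'year=2023, month=6, label=fgh'}
```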

In [10]:
result_x = ex.map_xarray(keyword_format, 
    year = [2023],
    month = [5, 6],
    label = ['abc', 'def', 'fgh'],
    mystr = ['Mary had a little lamb'],
    constants=dict(C=('Constant', 'tuple'))
)
slice_x = result_x.sel(year=2023, label='fgh')
slice_x

`xarray` is a feature-rich library which gives you powerful tools to process the data and integrate it with other parts of your code.

Alternatively, you can convert the [DataArray](https://docs.xarray.dev/en/latest/generated/xarray.DataArray.html) to a pandas [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html).

In [11]:
result_x.to_dataframe(name='keyword_format')

| year | month | label | mystr | C | keyword_format |
|---|---|---|---|---|---|
| 2023 | 5 | abc | Mary had a little lamb | (Constant, tuple) | year=2023, month=5, label=abc, mystr=Mary had ... |
| 2023 | 5 | def | Mary had a little lamb | (Constant, tuple) | year=2023, month=5, label=def, mystr=Mary had ... |
| 2023 | 5 | fgh | Mary had a little lamb | (Constant, tuple) | year=2023, month=5, label=fgh, mystr=Mary had ... |
| 2023 | 6 | abc | Mary had a little lamb | (Constant, tuple) | year=2023, month=6, label=abc, mystr=Mary had ... |
| 2023 | 6 | def | Mary had a little lamb | (Constant, tuple) | year=2023, month=6, label=def, mystr=Mary had ... |
| 2023 | 6 | fgh | Mary had a little lamb | (Constant, tuple) | year=2023, month=6, label=fgh, mystr=Mary had ... |


In [12]:
result_x.to_dataframe(name='keyword_format').reset_index()

| | year | month | label | mystr | C | keyword_format |
|---|---|---|---|---|---|---|
| 0 | 2023 | 5 | abc | Mary had a little lamb | (Constant, tuple) | year=2023, month=5, label=abc, mystr=Mary had ... |
| 1 | 2023 | 5 | def | Mary had a little lamb | (Constant, tuple) | year=2023, month=5, label=def, mystr=Mary had ... |
| 2 | 2023 | 5 | fgh | Mary had a little lamb | (Constant, tuple) | year=2023, month=5, label=fgh, mystr=Mary had ... |
| 3 | 2023 | 6 | abc | Mary had a little lamb | (Constant, tuple) | year=2023, month=6, label=abc, mystr=Mary had ... |
| 4 | 2023 | 6 | def | Mary had a little lamb | (Constant, tuple) | year=2023, month=6, label=def, mystr=Mary had ... |
| 5 | 2023 | 6 | fgh | Mary had a little lamb | (Constant, tuple) | year=2023, month=6, label=fgh, mystr=Mary had ... |

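The `reset_index` table is a long (tidy) layout: one row per combination of inputs, one column per argument, plus the constants and the function output. A hypothetical pure-Python sketch that produces the same rows as a list of dicts (the helper `to_records` is illustrative only, not part of Cartesian Explorer):

```python
import itertools


def keyword_format(**kwargs):
    return ', '.join([f'{k}={v}' for k, v in kwargs.items()])


def to_records(func, constants, **inputs):
    """Hypothetical helper: one dict per combination of inputs, with the
    constants repeated on every row and the result stored under func's name."""
    names = list(inputs)
    rows = []
    for combo in itertools.product(*inputs.values()):
        coords = dict(zip(names, combo))
        result = func(**coords, **constants)
        rows.append({**coords, **constants, func.__name__: result})
    return rows


rows = to_records(
    keyword_format,
    constants=dict(C=('Constant', 'tuple')),
    year=[2023],
    month=[5, 6],
    label=['abc', 'def', 'fgh'],
    mystr=['Mary had a little lamb'],
)
print(len(rows))      # 6
print(list(rows[0]))  # ['year', 'month', 'label', 'mystr', 'C', 'keyword_format']
```

Passing `rows` to `pandas.DataFrame` would give a frame with these columns and a default integer index.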

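The `variables` and `constants` arguments of `map` correspond to selecting a slice of an xarray `DataArray`: an iterated argument behaves like `sel` with a list of labels, and a constant behaves like `sel` with a single label.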

In [13]:
D = ex.map_xarray(keyword_format, a=[1, 2, 3], b=['alpha', 'beta'])
slice_array = dict(a=[1, 2], b=['alpha'])
D_slice = D.sel(**slice_array)
D_map_sel = ex.map_xarray(D.sel, **slice_array)
assert D_slice.equals(D_map_sel)
D_slice

In [14]:
slice_constant = dict(a=[1, 2], b='alpha')
C_slice = D.sel(**slice_constant)
C_map_sel = ex.map_xarray(D.sel, a=[1, 2], constants=dict(b='alpha'))
assert C_slice.equals(C_map_sel)
C_slice

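Note the difference between `C_slice` and `D_slice`: selecting `b` with the list `['alpha']` keeps `b` as a dimension of length 1, while selecting it with the scalar `'alpha'` drops that dimension.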
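The rule behind that difference can be written down as a toy model. `shape_after_sel` is a hypothetical helper that only tracks dimension sizes for label-based selection: a list of labels keeps the dimension with a new length, and a single label removes it.

```python
def shape_after_sel(dims, **selection):
    """Toy model of how label-based DataArray.sel changes dimension sizes.

    dims maps dimension name -> size. A list of labels keeps the dimension
    (one entry per label); a single scalar label drops it."""
    shape = {}
    for dim, size in dims.items():
        if dim not in selection:
            shape[dim] = size
        elif isinstance(selection[dim], list):
            shape[dim] = len(selection[dim])
        # otherwise a scalar label was given and the dimension disappears
    return shape


dims = {'a': 3, 'b': 2}
print(shape_after_sel(dims, a=[1, 2], b=['alpha']))  # {'a': 2, 'b': 1}  (like D_slice)
print(shape_after_sel(dims, a=[1, 2], b='alpha'))    # {'a': 2}          (like C_slice)
```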


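You may wonder: why not support the same interface as the `xarray.DataArray.sel` method and drop the `constants` keyword argument? The reason is generators. An `xarray.DataArray` lives in memory, so there are few use cases for passing generators as variables to `sel`. For `map`, on the other hand, it is useful to support generators, for example to load large files lazily. Since `map` accepts arbitrary iterables (and a string, for instance, is iterable too), the type of a value cannot tell whether it should become a dimension or stay constant, so constants get their own keyword.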
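How Cartesian Explorer consumes generators internally is not shown here, so the following is only a hypothetical sketch of why lazy inputs matter. The outermost argument is streamed one item at a time, while the inner arguments are materialized into lists because the nested loop revisits them for every outer value. `load_files` stands in for an expensive loader and records what it has loaded; `lazy_outer_map` and `pair` are likewise illustrative names.

```python
import itertools

loaded = []  # names of files "loaded" so far


def load_files(names):
    """Hypothetical stand-in for an expensive loader; records each name it loads."""
    for name in names:
        loaded.append(name)
        yield f"<contents of {name}>"


def lazy_outer_map(func, outer, *inner):
    """Hypothetical sketch: stream `outer` lazily, materialize the inner arguments."""
    # inner arguments are revisited for every outer value, so they must be stored
    inner_lists = [list(values) for values in inner]
    for x in outer:  # pulls one item from the generator at a time
        for combo in itertools.product(*inner_lists):
            yield func(x, *combo)


def pair(data, k):
    return (data, k)


results = lazy_outer_map(pair, load_files(['a.txt', 'b.txt', 'c.txt']), [1, 2])
print(next(results))  # ('<contents of a.txt>', 1)
print(loaded)         # ['a.txt']
```

Only one file has been read when the first result is produced; the remaining files are loaded as iteration continues.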

Another reason is that `map` aims to behave as closely as possible to the nested-loop construct shown at the beginning of this tutorial.

---