In [1]:
import eindex.numpy as EX
import numpy as np

In [2]:
# let's load data for this demo
from data import DemoData
data = DemoData()

## Introduction:

Several heroes travelled to various worlds.

Their travel records have been documented:

In [3]:
# we take only 2 years and 4 months in each year
data.years, data.months

([2020, 2021], ['August', 'September', 'October', 'Neverbruary'])

In [4]:
# we use a magic function to visualize arrays

# We pass an array of indices and describe contents in a special format.
# Below, an array data.travels contains values [world, place] for every combination of hero+year+month
# we additionally specified how to display it:
#       x-axis (columns) is hero, while y-axis is year and month

data.visualize(data.travels, '[world, place] hero year month', cols='hero', rows='year month')

**Important:** If the output above looks like a list rather than a table, rerun the notebook or mark it as trusted.

In [5]:
# shape is [2, n_heroes, n_years, n_months]
# pay attention to that first axis of length 2 - after we specified 'free variables' hero, year, month,
# we get 1-d array with two values - world and place.
print(data.travels.shape) 


# now by comparing it to the pattern '[world, place] hero year month', you see where 2 comes from
# this special axis comes first

(2, 5, 2, 4)


In [6]:
# data is not very readable as we basically operate with indices everywhere,
# thus we use data.visualize throughout the tutorial
data.travels.flatten()

array([0, 3, 2, 3, 3, 1, 2, 3, 0, 0, 4, 2, 2, 4, 3, 2, 0, 0, 1, 2, 3, 4,
       4, 3, 1, 4, 2, 1, 2, 0, 1, 4, 1, 1, 1, 1, 2, 0, 3, 1, 0, 1, 0, 2,
       0, 3, 3, 0, 0, 0, 0, 3, 1, 0, 0, 3, 3, 2, 0, 2, 3, 3, 1, 0, 2, 3,
       3, 2, 1, 3, 0, 3, 0, 2, 0, 2, 0, 2, 0, 1])

In [7]:
# we also know the temperature for every month in each of these places
data.temp.shape

(5, 4, 2, 4)

In [8]:
# let's visualize temperatures
data.visualize([data.temp], '[temp] world place year month', cols='world place', rows='year month')

## Multidimensional argmax and argmin

And weather investigations

In [9]:
# it is simple to find argmin or argmax over a single axis in numpy.

# In this code we compute argmin over last axis (months),
# i.e. for every world+place+year we compute month with lowest temperature
np.argmin(data.temp, axis=-1).shape

(5, 4, 2)

In [10]:
# however numpy and other tensor engines can't directly compute argmin over several axes.
# That's how you can do it:
coldest_times = EX.argmin(data.temp, 'world place year month -> [year, month] world place')
# look carefully at output part: it says that we want to get year+month for every world+place,
# thus we have taken argmin over two axes (year and month, they were moved into brackets and are not free variables anymore)

# shape is 2 x n_worlds x n_places. Actually, you can see it after looking at output pattern once again:
# '[year, month] world place' -> 2 values for every world and place
print(coldest_times.shape)

(2, 5, 4)
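For comparison, here is a plain-NumPy sketch of the same multi-axis argmin. It assumes a random array as a stand-in for `data.temp` (the names `temp` and `coldest_times_np` are hypothetical), and it is only meant to show what the pattern `'world place year month -> [year, month] world place'` computes, not how `eindex` implements it.

```python
import numpy as np

# Hypothetical stand-in for data.temp with axes: world, place, year, month
rng = np.random.default_rng(0)
temp = rng.normal(size=(5, 4, 2, 4))
n_world, n_place, n_year, n_month = temp.shape

# Merge (year, month) into one axis and take an ordinary argmin over it
flat_index = temp.reshape(n_world, n_place, n_year * n_month).argmin(axis=-1)

# Split the flat position back into separate year and month indices
year_idx, month_idx = np.unravel_index(flat_index, (n_year, n_month))

# Put the index axis first, matching '[year, month] world place'
coldest_times_np = np.stack([year_idx, month_idx])
print(coldest_times_np.shape)
```

The reshape/unravel bookkeeping grows quickly once the reduced axes are not adjacent, which is the part the pattern notation hides.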


In [11]:
# we use our magic function to visualize result.
# result shape means the same as before: tensor contains year&month for every combination of world&place

# magic function knows how every index converts to year, month, world, place, etc.
# so we can get our nice pretty output, not just numbers.
data.visualize(coldest_times, '[year, month] world place', cols='world', rows='place')


In [12]:
# we can take argmin over three axes:
coldest_time_and_place = EX.argmin(data.temp, 'world place year month -> [year, month, place] world')

data.visualize(coldest_time_and_place, '[year, month, place] world', cols='', rows='world')

In [13]:
# or even argmin on all four axes
coldest_of_all = EX.argmin(data.temp, 'world place year month -> [world, place, year, month]')
data.visualize(coldest_of_all, '[world, place, year, month]', '')

# this one is easy to replicate with np.argmin and np.unravel_index:
coldest_of_all_np = np.unravel_index(np.argmin(data.temp, axis=None), data.temp.shape)
# let's print both
print(' eindex:', coldest_of_all, '\n numpy:  ', coldest_of_all_np)


 eindex: [2 1 0 0] 
 numpy:   (2, 1, 0, 0)


In [14]:
# exercise: For every year and every world, find month & place with the highest temperature
# Solution
data.visualize(
    EX.argmax(data.temp, 'world place year month -> [place, month] world year'),
    '[place, month] world year', 'world', 'year',
)

## Argsorting: multidimensional ranking

In [15]:
# argsorting over multiple dimensions is similar to argmin/argmax,
# but additionally introduces an 'order' axis. This axis enumerates elements in their sorted order
places_ordered_by_temperature = EX.argsort(data.temp, 'world place year month -> [world, place] year month order')

print('Coldest place in every month')
# here is the coldest place
data.visualize(places_ordered_by_temperature[..., 0], '[world, place] year month', cols='year month')

print('Top 5 coldest places in every month')
# top 5 coldest places
data.visualize(places_ordered_by_temperature[..., :5], '[world, place] year month order', cols='year month',
               additional_axes=dict(order=range(5)))

Coldest place in every month


Top 5 coldest places in every month
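As a rough illustration of what the `order` axis contains, the following plain-NumPy sketch reproduces a multi-axis argsort by hand. The array `temp` is random stand-in data and `places_ordered_np` is a hypothetical name; this is a sketch of the idea, not the library's implementation.

```python
import numpy as np

# Hypothetical stand-in for data.temp with axes: world, place, year, month
rng = np.random.default_rng(0)
temp = rng.normal(size=(5, 4, 2, 4))
n_world, n_place, n_year, n_month = temp.shape

# Move the axes being ranked (world, place) to the end and merge them
merged = temp.transpose(2, 3, 0, 1).reshape(n_year, n_month, n_world * n_place)

# Ordinary argsort along the merged axis; this axis plays the role of 'order'
flat_order = merged.argsort(axis=-1)

# Recover separate world and place indices for every rank
world_idx, place_idx = np.unravel_index(flat_order, (n_world, n_place))

# Layout matches '[world, place] year month order'
places_ordered_np = np.stack([world_idx, place_idx])
```

Slicing `places_ordered_np[..., 0]` then gives the coldest place per month, and `[..., :5]` the five coldest.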


In [16]:
# exercise: find 2 hottest months for every place in every world
# solution:
print('Two hottest months')
hottest_months = EX.argsort(data.temp, 'world place year month -> [year, month] world place order')[..., -2:]
data.visualize(hottest_months, '[year, month] world place order', cols='world place',
               additional_axes=dict(order=range(2)))



Two hottest months


In [17]:
# Exercise: for every place, order months from coldest to warmest.
# solution:
sorted_months = EX.argsort(data.temp, 'world place year month -> [year, month] world place order')
data.visualize(sorted_months, '[year, month] world place order', cols='world place',
               additional_axes=dict(order=range(data.n_month * data.n_year)))

## Indexing

In [18]:
# since we know for every month&year where every hero was,
# and we know all temperatures, how about we combine these two?

# for every trip we pick corresponding record from array of temperatures.
# note that there are four axes that we use for indexing right now: world, place, year, and month
# and two of them are free variables, and two of them aren't
temp_for_heroes = EX.gather(
    data.temp, data.travels,
    'world place year month, [world, place] hero year month -> hero year month',
)

data.visualize([temp_for_heroes], '[temp] hero year month', cols='year month')
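The same lookup can be written with NumPy advanced indexing, which may help in reading the pattern: bracketed variables (`world`, `place`) come from the index array, while free variables (`year`, `month`) are matched position by position. The sketch below uses random stand-in data; `temp`, `travels`, and `temp_for_heroes_np` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
n_world, n_place, n_hero, n_year, n_month = 5, 4, 5, 2, 4

# Hypothetical stand-ins for data.temp and data.travels
temp = rng.normal(size=(n_world, n_place, n_year, n_month))
travels = np.stack([
    rng.integers(n_world, size=(n_hero, n_year, n_month)),
    rng.integers(n_place, size=(n_hero, n_year, n_month)),
])

world, place = travels  # each has axes: hero year month

# Free variables need explicit index arrays that broadcast against hero
year = np.arange(n_year)[None, :, None]
month = np.arange(n_month)[None, None, :]

# Result has axes: hero year month
temp_for_heroes_np = temp[world, place, year, month]
```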

## Argmaxing + indexing

In [19]:
# we compute for every place its warmest time
warmest_times = EX.argmax(data.temp, 'world place year month -> [year, month] world place')
# take temperature at this moment
warmest_temp = EX.gather(data.temp, warmest_times, 'world place year month, [year, month] world place -> world place')
# and number of visitors at this moment in this place
n_visitors = EX.gather(data.visitors, warmest_times, 'world place year month, [year, month] world place -> world place')


print('temperature:')
data.visualize([warmest_temp], '[temp] world place', cols='world')

print('n visitors:')
data.visualize([n_visitors], '[other] world place', cols='world')

temperature:


n visitors:


In [20]:
# we can check highest temperature directly, as reductions in numpy can work over multiple axes:
warmest_temp_np = np.max(data.temp, axis=(2, 3))
assert np.array_equal(warmest_temp_np, warmest_temp)
# but we can't compute number of visitors the same way
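In plain NumPy, picking a value from a second array at the argmax of the first needs manual flattening. A minimal sketch, assuming random stand-ins for `data.temp` and `data.visitors` (all names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins with axes: world, place, year, month
temp = rng.normal(size=(5, 4, 2, 4))
visitors = rng.integers(100, size=(5, 4, 2, 4))

# Merge (year, month) so that a single-axis argmax covers both
temp_flat = temp.reshape(5, 4, -1)
visitors_flat = visitors.reshape(5, 4, -1)
warmest_flat = temp_flat.argmax(axis=-1)  # axes: world place

# Pick the visitor count at the warmest moment of every place
n_visitors_np = np.take_along_axis(
    visitors_flat, warmest_flat[..., None], axis=-1
)[..., 0]
```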

## Aggregating with gather

In [21]:
# sometimes we want aggregated information,
# e.g. to compute 'average temperature over year' for every hero, we could:
# 1) compute temperature for every month 2) average over months within every year

# Gather allows you to combine indexing and reduction.
# In example below we skip month in output and specify that we want mean-reduction
temp_for_heroes_yearly = EX.gather(
    data.temp, data.travels,
    'world place year month, [world, place] hero year month -> hero year', 'mean',
)

data.visualize([temp_for_heroes_yearly], '[temp] hero year', cols='year')
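Conceptually, dropping `month` from the output is the two-step recipe from the comments above: index first, then reduce. A plain-NumPy sketch with random stand-in data (hypothetical names throughout) makes the two steps explicit:

```python
import numpy as np

rng = np.random.default_rng(0)
n_world, n_place, n_hero, n_year, n_month = 5, 4, 5, 2, 4

# Hypothetical stand-ins for data.temp and the two rows of data.travels
temp = rng.normal(size=(n_world, n_place, n_year, n_month))
world = rng.integers(n_world, size=(n_hero, n_year, n_month))
place = rng.integers(n_place, size=(n_hero, n_year, n_month))

year = np.arange(n_year)[None, :, None]
month = np.arange(n_month)[None, None, :]

# Step 1: temperature for every hero, year, and month
per_month = temp[world, place, year, month]

# Step 2: reduce over the month axis that is missing from the output
yearly_mean = per_month.mean(axis=-1)  # axes: hero year
yearly_max = per_month.max(axis=-1)    # axes: hero year
```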

In [22]:
# take maximal temperature for every hero and in every year
temp_for_heroes_yearly = EX.gather(
    data.temp, data.travels,
    'world place year month, [world, place] hero year month -> hero year', 'max',
)
data.visualize([temp_for_heroes_yearly], '[temp] hero year', cols='year')


# exercise: for every character compute the lowest temperature experienced
# solution
temp_for_heroes = EX.gather(
    data.temp, data.travels,
    'world place year month, [world, place] hero year month -> hero', 'min',
)
data.visualize([temp_for_heroes], '[temp] hero', cols='', rows='hero')

## Scattering

In [23]:
# we also have records of how many photos were taken by heroes. These records are monthly:
data.visualize([data.photos], '[other] hero year month', 'year month')

In [24]:
# now let's compute how many photos were taken in every city and in every month.
# scatter does this.
# note that we need to provide additional information, as otherwise result output is unclear
n_photos = EX.scatter(data.photos, data.travels,
                      'hero year month, [world, place] hero year month -> world place year month',
                      place=data.n_place, world=data.n_world)
data.visualize([n_photos], '[other] world place year month', 'world place')
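A sum-scatter like this corresponds to an unbuffered `np.add.at` in NumPy, where the output size has to be given up front (which is why `place=` and `world=` are passed above). The sketch below assumes random stand-in data; `photos`, `world`, `place`, and `n_photos_np` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
n_world, n_place, n_hero, n_year, n_month = 5, 4, 5, 2, 4

# Hypothetical stand-ins for data.travels and data.photos
world = rng.integers(n_world, size=(n_hero, n_year, n_month))
place = rng.integers(n_place, size=(n_hero, n_year, n_month))
photos = rng.integers(10, size=(n_hero, n_year, n_month))

# Full-shape index arrays for the free variables year and month
_, year, month = np.indices((n_hero, n_year, n_month))

# Output axes: world place year month; repeated targets are accumulated
n_photos_np = np.zeros((n_world, n_place, n_year, n_month), dtype=photos.dtype)
np.add.at(n_photos_np, (world, place, year, month), photos)
```

`np.add.at` is used instead of `n_photos_np[...] += photos` because the latter would count only one hero when several heroes visit the same place in the same month.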

In [25]:
# Total number of photos for every year and every world.
# We aggregated over month and heroes

# We also do not use place below, and it was prefixed with underscore to show this.
n_photos = EX.scatter(data.photos, data.travels, 
                      'hero year month, [world, _place] hero year month -> world year', "sum", 
                      world=data.n_world)
data.visualize([n_photos], '[other] world year', 'world')

In [26]:
# however, we can use only one indexer (world), because we don't need an indexer over place:
# After looking at this example, it should be clear why indexers are the first dimension,
# and why they are comma-separated in square brackets
[world, _place] = data.travels # reminder that travels are "[world, place] hero year month"
n_photos = EX.scatter(data.photos, [world], 
                      'hero year month, [world] hero year month -> world year', "sum",
                      world=data.n_world)
data.visualize([n_photos], '[other] world year', 'world')

### How to remember the difference between gather and scatter

Common rule:

- in gather, you index input array (thus, index specifies where you pick values from)
- in scatter, you index output array (thus, index specifies where you put values to)

In `eindex`, this rule transforms into its equivalent:
- gather: bracketed variables can be in input
- scatter: bracketed variables can be in output
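The common rule can be seen on a tiny one-dimensional example in plain NumPy (the arrays below are made up for illustration):

```python
import numpy as np

index = np.array([2, 0, 2])

# gather: the index addresses the INPUT array (where values are picked from)
values = np.array([10.0, 20.0, 30.0, 40.0])
gathered = values[index]

# scatter: the index addresses the OUTPUT array (where values are put to)
updates = np.array([1.0, 2.0, 3.0])
scattered = np.zeros(4)
np.add.at(scattered, index, updates)

print(gathered)   # [30. 10. 30.]
print(scattered)  # [2. 0. 4. 0.]
```

In gather the result has the shape of the index; in scatter the result has the shape of the array being indexed, and values sent to the same position are summed.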

In [27]:
# how many heroes had been to the city at that moment?
n_heroes = EX.scatter(np.ones([]), data.travels, 
                      ' , [world, place] hero year month -> world place year month', 
                      place=data.n_place, world=data.n_world)

data.visualize([n_heroes], '[other] world place year month', 'world place')

n_heroes_at_top_temp = EX.gather(n_heroes, warmest_times, 
                                 'world place year month, [year, month] world place -> world place')

data.visualize([n_heroes_at_top_temp], '[other] world place', 'world')

# how many heroes were seen by another hero?
seen_including_self = EX.gather(n_heroes, data.travels, 
                                'world place year month, [world, place] hero year month -> hero', 'sum')
data.visualize([seen_including_self - data.n_month * data.n_year], '[other] hero', '')

## Hero movement patterns

In [28]:
# So as we know, every month a hero moves to another place
# Let's compute for every pair of places how many heroes traveled from first to second

# connecting next and previous location by skipping last month in first copy, and first month in the second
travel_from_to = [*data.travels[...,  :-1], *data.travels[..., 1:]]

transfers_worlds = EX.scatter(
    np.ones([]), travel_from_to,
    ' , [world_from, _place_from, world_to, _place_to] hero year month_from -> world_from world_to',
    world_from=data.n_world, world_to=data.n_world
)

data.visualize([transfers_worlds], "[other] world_from world_to", "world_to")
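Counting transitions is again a sum-scatter of ones. A plain-NumPy sketch of the world-to-world count, assuming a random `world` array as a stand-in for the first row of `data.travels` (hypothetical names):

```python
import numpy as np

rng = np.random.default_rng(0)
n_world, n_hero, n_year, n_month = 5, 5, 2, 4

# Hypothetical stand-in for the world row of data.travels: hero year month
world = rng.integers(n_world, size=(n_hero, n_year, n_month))

# Pair each month with the following month of the same year
world_from = world[..., :-1]
world_to = world[..., 1:]

# Every (from, to) pair adds one to the corresponding cell
transfers_np = np.zeros((n_world, n_world), dtype=int)
np.add.at(transfers_np, (world_from, world_to), 1)
```

As in the cell above, the transition from the last month of one year to the first month of the next is not counted.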

In [29]:
# with minimal effort we can achieve very different granularity.
# I've added year to output in this example, so now statistic is split over years.

transfers_worlds = EX.scatter(
    np.ones([]), travel_from_to,
    ' , [world_from, _place_from, world_to, _place_to] hero year month_from -> year world_from world_to',
    world_from=data.n_world, world_to=data.n_world
)
data.visualize([transfers_worlds], "[other] year world_from world_to", "world_from year")

# Exercise: try adding place_from or replacing world_to with place_to, or adding month_from

# Practical recommendations

Some useful tips are collected below. Click to expand

<details markdown=1>
<summary markdown=1>
<b>Use numpy indexing when dealing with a single variable</b>
</summary>

Numpy indexing actually works very well when you need to operate only on a single axis at a time.

E.g. if you want to subselect/reorder months:

```python
temp[:, :, :, month_order]
```

is way shorter than
```python
EX.gather(temp, [month_order], 'world place year month_old, [month_old] month -> world place year month')
```

It is quite simple to keep things in your head because only one axis changes and the order of axes stays the same.

</details>

<details markdown=1>
<summary markdown=1>
<b>Short names for axes can be ok</b>
</summary>

Write for your reader. If it is obvious from the context which axis is which, you can just use short names:

```
EX.gather(..., 'world place year month, [world, place] hero year month  -> hero year month')
EX.gather(..., 'w p y m, [w,p] h y m -> h y m')
```

Being polite to the reader of your code is a good idea. In most cases it will be you :)

</details>

<details markdown=1>
<summary markdown=1>
<b>Use visual keys to group variables</b>
</summary>

It takes time to start reading patterns fluently.

Visual marking can help the reader:
e.g. capitalizing indexing variables makes it immediately visible which axes were indexed in the input.

Another way is to prepend the name with I (indexer):

```
EX.gather(..., 'w p y m, [w,p] h y m  -> h y m')
EX.gather(..., 'W P y m, [W,P] h y m  -> h y m')
EX.gather(..., 'Iw Ip y m, [Iw,Ip] h y m  -> h y m')
```

Shortcuts become handy when your code is packed with indexing.
</details>



<details markdown=1>
<summary markdown=1>
<b>Do not use indexing if you don't need to</b>
</summary>

Our simple minimal example here could normally be handled by polars/pandas/sqlite by simply keeping flat records.
Dataframe/records-based solutions may be less memory-efficient, but they are quite convenient.

Use eindex when you need to deal with tensors.
</details>

