# 01 - Getting started with AxisUtilities

## Introduction
`AxisUtilities` was originally developed to manage time axes and time-related operations, with a main focus on the
Earth and atmospheric science community. For example, you might have daily, 3D spatially distributed temperature data
and want to calculate its monthly average. The result has the same spatial coordinates but a different time
axis/coordinate.

However, similar operations can be performed on any one-dimensional axis. Let's say your data is distributed along the
z-coordinate in a certain way, and you now want to average it onto a different vertical distribution. Although your
source axis is no longer time, the mathematical operation being performed is the same. For this reason, the package
was renamed from [`TimeAxis`](https://github.com/maboualidev/TimeAxis) to
[`AxisUtilities`](https://github.com/coderepocenter/AxisUtilities).

During an axis conversion (from a source axis to a destination axis), for example computing the monthly mean
from daily data, many of the computations do not involve the data itself. This
means that we can cache these computations and reuse them to achieve better performance. As long as the source and
destination axes have not changed, the cached computations can be used to perform the axis conversion. One of the
features that `AxisUtilities` provides is caching these computations and letting you reuse them. The same concept
is used in other packages such as
[`ESMF`](https://www.earthsystemcog.org/projects/esmf/), 
[`SCRIP`](https://github.com/SCRIP-Project/SCRIP), and 
[`2D and 3D Remapping`](https://www.mathworks.com/matlabcentral/fileexchange/41669-2d-and-3d-remapping). In those 
packages, the cached computation is referred to as ***Remapping Weights***.
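
To make the idea of remapping weights concrete, here is a minimal sketch in plain Python. It is not how `AxisUtilities` is implemented; the helper names `overlap_weights` and `weighted_average` are hypothetical, and the weighting rule (the fraction of each source interval that falls inside a destination interval) is an assumption chosen for illustration. The point is only that the weights depend on the two axes and not on the data, so they can be computed once and reused.

```python
def overlap_weights(src_lower, src_upper, dst_lower, dst_upper):
    """Return one row of weights per destination interval.

    weights[i][j] is the fraction of source interval j that lies inside
    destination interval i (0.0 when the two do not overlap).
    """
    weights = []
    for d_lo, d_hi in zip(dst_lower, dst_upper):
        row = []
        for s_lo, s_hi in zip(src_lower, src_upper):
            overlap = max(0, min(d_hi, s_hi) - max(d_lo, s_lo))
            row.append(overlap / (s_hi - s_lo))
        weights.append(row)
    return weights


def weighted_average(weights, data):
    """Apply precomputed weights to one data series."""
    return [
        sum(w * x for w, x in zip(row, data)) / sum(row)
        for row in weights
    ]


# Six daily intervals (in hours) averaged into two 3-day intervals.
src_lower = [0, 24, 48, 72, 96, 120]
src_upper = [24, 48, 72, 96, 120, 144]
dst_lower = [0, 72]
dst_upper = [72, 144]

# Computed once: this step never looks at the data.
weights = overlap_weights(src_lower, src_upper, dst_lower, dst_upper)

# Reused for as many data series as needed.
temperature = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
humidity = [50.0, 55.0, 60.0, 65.0, 70.0, 75.0]
temperature_avg = weighted_average(weights, temperature)
humidity_avg = weighted_average(weights, humidity)
print(temperature_avg)  # [12.0, 18.0]
print(humidity_avg)     # [55.0, 70.0]
```

In this sketch the nested loop over the two axes is the part worth caching; applying the weights to a new series is a single pass over the data.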

## How To Install?
### using pip
As usual, you can install the package with `pip`:

```
pip install axisutilities
```

### using conda
You can install `AxisUtilities` with conda from the `aciacs` channel as follows:

```
conda install -c conda-forge -c aciacs axisutilities
```

It is a good idea to create an environment for your project. In that case, you can issue:

```
conda create -c conda-forge -c aciacs --name your_environment_name axisutilities
```

## Now Really Getting Started with using `AxisUtilities`
The general procedure is:

0. Create a source axis, i.e. the axis that your original data is on,
1. Create a destination axis, i.e. the axis that you want to convert your data to,
2. Create an `AxisRemapper` object by passing the source and destination axis you created previously,
3. Finally, convert your data from the source axis to the destination axis, using the `AxisRemapper` object you created
in the previous step.

You can repeat step (3) as many times as you want, as long as the source and destination axes stay the same. The true
benefit of this approach is in the reuse of the same computations, a.k.a. ***remapping weights***.

### Example 0: Manually creating The axis
Let's say we have daily data for 21 days and we want to average it over each 7-day period.
In this example we are going to create both the source and the destination axis completely manually.

#### Step 0: Creating the source axis
You can use the `Axis` class to manually create an axis. You need three types of information:

0. Lower bounds,
1. Upper bounds, and
2. information about data ticks.

The last item is not used yet and is there for future development; however,
you still need to provide it in some form.

As the names suggest, the lower and upper bounds define the start and end of the interval for each
element (data entry) of the axis. The bounds must be integer values. The
`Axis` class does not support units yet, so it is the user's responsibility to make sure
that the numbers use a consistent unit and reference, particularly across the source and
destination axes.
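
One way to keep the unit and reference consistent is to derive every bound from calendar dates with a single conversion helper. The sketch below uses only the standard library; `REFERENCE` and `hours_since_reference` are hypothetical names introduced for illustration and are not part of `AxisUtilities`.

```python
from datetime import datetime

# Hypothetical reference instant shared by the source and destination axes.
REFERENCE = datetime(2020, 1, 1)


def hours_since_reference(moment, reference=REFERENCE):
    """Convert a datetime to whole hours elapsed since the reference."""
    seconds = (moment - reference).total_seconds()
    hours, remainder = divmod(seconds, 3600)
    if remainder:
        raise ValueError("moment is not on a whole-hour boundary")
    return int(hours)


# Bounds of the first three days, expressed in hours since the reference.
day_starts = [datetime(2020, 1, day) for day in (1, 2, 3)]
day_ends = [datetime(2020, 1, day) for day in (2, 3, 4)]
lower_hours = [hours_since_reference(m) for m in day_starts]
upper_hours = [hours_since_reference(m) for m in day_ends]
print(lower_hours)  # [0, 24, 48]
print(upper_hours)  # [24, 48, 72]
```

Using the same helper for both axes guarantees that their integer bounds are directly comparable.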

In this example, we said that the source data covers 21 days. Let's
say we are going to use hours as our unit. The lower and upper bounds for the first 3
days are then:

| Day | 0 | 1 | 2 |
|---|---|---|---|
| lower bound (hours) | 0 | 24 | 48 |
| upper bound (hours) | 24 | 48 | 72 |

Let's create our lower and upper bounds:

In [35]:
import numpy as np

lower_bound = np.arange(21) * 24
upper_bound = lower_bound + 24

print("lower bound: ", lower_bound)
print("upper bound: ", upper_bound)

lower bound:  [  0  24  48  72  96 120 144 168 192 216 240 264 288 312 336 360 384 408
 432 456 480]
upper bound:  [ 24  48  72  96 120 144 168 192 216 240 264 288 312 336 360 384 408 432
 456 480 504]


Now the data ticks! Although they are not used yet, you still need to provide them. There are
multiple ways to do so:

0. Directly provide the values for the data ticks. For example, let's assume
we want to bind the data to the middle of each interval (day). We could then create the
data ticks as follows:

In [36]:
data_ticks = lower_bound + 12
print("data ticks: ", data_ticks)

data ticks:  [ 12  36  60  84 108 132 156 180 204 228 252 276 300 324 348 372 396 420
 444 468 492]


1. Another option is to provide a fraction between 0 and 1 defining where in the interval you
want the data to be bound. For example, if you want to bind the data to the middle, as we did above,
you could pass `fraction = 0.5`.

2. The other option is to explicitly define the binding using words for one of the more
well-known locations, i.e. "beginning", "middle", and "end".
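
The relationship between these options can be sketched in plain Python. The formula `tick = lower + fraction * (upper - lower)` matches the values shown above for the middle of the interval; mapping "beginning" to 0 and "end" to 1 is an assumption made for illustration, and `BINDING_TO_FRACTION` and `ticks_from_fraction` are hypothetical names, not part of the `AxisUtilities` API.

```python
# Assumed mapping from the named bindings to a fraction of the interval.
BINDING_TO_FRACTION = {"beginning": 0.0, "middle": 0.5, "end": 1.0}


def ticks_from_fraction(lower, upper, fraction):
    """Place one data tick inside each [lower, upper] interval."""
    return [lo + fraction * (hi - lo) for lo, hi in zip(lower, upper)]


lower = [0, 24, 48]
upper = [24, 48, 72]

middle_ticks = ticks_from_fraction(lower, upper, BINDING_TO_FRACTION["middle"])
end_ticks = ticks_from_fraction(lower, upper, BINDING_TO_FRACTION["end"])
print(middle_ticks)  # [12.0, 36.0, 60.0]
print(end_ticks)     # [24.0, 48.0, 72.0]
```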

Let's create our source axis using all three methods above:

In [37]:
from axisutilities import Axis

source_axis_method_0 = Axis(lower_bound, upper_bound, data_ticks=data_ticks)
source_axis_method_1 = Axis(lower_bound, upper_bound, fraction=0.5)
source_axis_method_2 = Axis(lower_bound, upper_bound, binding="middle")

All of the source axes above are equal; this just shows the different ways you could
create one. Without losing any generality, let's stick to the last method. (I find it more
expressive when someone else is reading the code.)

In [38]:
src_axis = Axis(lower_bound, upper_bound, binding="middle")

print("Source Axis: \n", src_axis)

Source Axis: 
 <timeaxis.TimeAxis>

  > nelem:
	21
  > lower_bound:
	[0 ... 480]
  > upper_bound:
	[24 ... 504]
  > data_ticks:
	[12 ... 492]
  > fraction:
	[0.5 ... 0.5]
  > binding:
	middle


#### Step 1: Creating the destination axis
Likewise, we can create the destination axis. In our example, we want a weekly
destination axis covering the 21 days, hence:

In [39]:
dst_axis = Axis(
    lower_bound=np.arange(3) * 7 * 24,
    upper_bound=np.arange(1, 4) * 7 * 24,
    binding="middle"
)

print("Destination Axis: \n", dst_axis)


Destination Axis: 
 <timeaxis.TimeAxis>

  > nelem:
	3
  > lower_bound:
	[0 ... 336]
  > upper_bound:
	[168 ... 504]
  > data_ticks:
	[84 ... 420]
  > fraction:
	[0.5 ... 0.5]
  > binding:
	middle


#### Step 2: Creating the remapper object

At the moment there is only one way to create the remapper object, and that is by providing
a source axis and a destination axis. I have plans to make it even easier, but here
is how you can create a remapper now:

In [40]:
from axisutilities import AxisRemapper

axis_remapper = AxisRemapper(from_axis=src_axis, to_axis=dst_axis)



#### Step 3: Start remapping data

Once you have your remapper, performing operations is straightforward.
But let's create some sample data first:

In [41]:
data_0 = np.arange(21, dtype="float")

Now for the actual operations.

##### Calculating average, min, and max:

In [42]:
data_0_avg = axis_remapper.average(data_0)
data_0_min = axis_remapper.min(data_0)
data_0_max = axis_remapper.max(data_0)
print("Weekly Average: \n", data_0_avg)
print("Weekly min: \n", data_0_min)
print("Weekly max: \n", data_0_max)

Weekly Average: 
 [[ 3.]
 [10.]
 [17.]]
Weekly min: 
 [[ 0.]
 [ 7.]
 [14.]]
Weekly max: 
 [[ 6.]
 [13.]
 [20.]]
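
As a sanity check, the same weekly values can be reproduced with plain Python slicing. This shortcut only works here because each weekly interval lines up exactly with seven daily intervals; it is a cross-check of the numbers above, not a description of what the remapper does internally.

```python
# Same sample data as above: 21 daily values, 0.0 through 20.0.
data = [float(i) for i in range(21)]

# Split the 21 days into three consecutive 7-day blocks.
weeks = [data[start:start + 7] for start in range(0, 21, 7)]

weekly_avg = [sum(week) / len(week) for week in weeks]
weekly_min = [min(week) for week in weeks]
weekly_max = [max(week) for week in weeks]

print(weekly_avg)  # [3.0, 10.0, 17.0]
print(weekly_min)  # [0.0, 7.0, 14.0]
print(weekly_max)  # [6.0, 13.0, 20.0]
```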


##### Applying a user defined function:
You can also write your own function and apply it to the data while
remapping. Let's say we want to calculate the coefficient of variation, i.e. the standard
deviation divided by the mean. First we need to create a function that performs the calculation
that we want:

In [43]:
def cv(data):
    return np.nanstd(data) / np.nanmean(data)


Then apply this function as follows:

In [44]:
data_0_cv = axis_remapper.apply_function(data_0, cv)

print("Weekly Coefficient of Variation: \n", data_0_cv)

Weekly Coefficient of Variation: 
 [[0.66666667]
 [0.2       ]
 [0.11764706]]
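
These values can be verified by hand. Every week holds seven consecutive integers, so the population standard deviation is 2 in each week and only the mean changes (3, 10, and 17). The sketch below repeats that check with the standard library; `statistics.pstdev` is the population standard deviation, which corresponds to the default `ddof=0` behaviour of `np.nanstd`.

```python
from statistics import mean, pstdev

# Same sample data as above, split into three 7-day blocks.
data = [float(i) for i in range(21)]
weeks = [data[start:start + 7] for start in range(0, 21, 7)]

# Coefficient of variation per week: population std divided by mean.
weekly_cv = [pstdev(week) / mean(week) for week in weeks]
print([round(value, 8) for value in weekly_cv])
# [0.66666667, 0.2, 0.11764706]
```

Note that the coefficient of variation is undefined when the mean of a block is zero, so a production version of such a function may need to guard against that case.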


### Example 1: Re-using the remapper object
The real benefit of this approach shows when you have multiple data sets on the same
source axis and you want to convert them to the same destination axis. In these cases, you don't
need to recreate your remapper, and you can reuse all the pre-computations that were done
before.

Let's create another data sample:

In [45]:
data_1 = np.random.random(21)

data_1_avg = axis_remapper.average(data_1)
data_1_min = axis_remapper.min(data_1)
data_1_max = axis_remapper.max(data_1)
data_1_cv  = axis_remapper.apply_function(data_1, cv)

### Example 2: Processing multi-dimensional arrays
You can use the same remapper on multi-dimensional arrays as well. Here is an example
showing how it is done:

First let's create a sample multi-dimensional data:

In [46]:
data_2 = np.random.random((21, 90, 360))

Now let's use the same remapper:

In [47]:
data_2_avg = axis_remapper.average(data_2)
data_2_min = axis_remapper.min(data_2)
data_2_max = axis_remapper.max(data_2)

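How about the coefficient of variation? Remember the implementation of the `cv` function: it calls
`np.nanstd` and `np.nanmean` without an `axis` argument, so it reduces the whole array to a single
number and does not support multi-dimensional arrays. So, we need to update it first: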

In [48]:
def cv2(data):
    return np.nanstd(data, axis=0) / np.nanmean(data, axis=0)

Now we can use the new implementation, `cv2`:

In [49]:
data_2_cv = axis_remapper.apply_function(data_2, cv2)

### Example 3: More on Processing multi-dimensional array

You can use the same remapper on multi-dimensional arrays where the source axis is not the first dimension.

Let's create a data set and show how it is done:

In [50]:
data_3 = np.random.random((90, 21, 360))

Now we can re-use the same remapper as follows:

In [51]:
data_3_avg = axis_remapper.average(data_3, dimension=1)
data_3_min = axis_remapper.min(data_3, dimension=1)
data_3_max = axis_remapper.max(data_3, dimension=1)
data_3_cv = axis_remapper.apply_function(data_3, cv2, dimension=1)

print("data_2_avg.shape: ", data_2_avg.shape)
print("data_3_avg.shape: ", data_3_avg.shape)

data_2_avg.shape:  (3, 90, 360)
data_3_avg.shape:  (90, 3, 360)


*NOTE*: We did not change the `cv2` implementation to operate on the second dimension.
When you are writing your own custom function, always write it assuming the source axis
is the first dimension, i.e. `axis=0`.
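
One plausible way to picture why this works: the source dimension is moved to the front, the custom function is applied there, and the result is moved back. The sketch below illustrates that idea with 2D nested lists in plain Python. It is an assumption about the mechanism rather than the actual `AxisUtilities` implementation, and `transpose`, `column_mean`, `apply_along`, and the `window` parameter are hypothetical names.

```python
def transpose(rows):
    """Swap the two dimensions of a 2D nested list."""
    return [list(column) for column in zip(*rows)]


def column_mean(block):
    """Custom function written with the source axis first: one mean per column."""
    return [sum(column) / len(column) for column in zip(*block)]


def apply_along(data, func, window, dimension=0):
    """Apply func to consecutive windows along the chosen dimension."""
    if dimension == 1:
        data = transpose(data)      # bring the source axis to the front
    result = [func(data[i:i + window]) for i in range(0, len(data), window)]
    if dimension == 1:
        result = transpose(result)  # restore the original dimension order
    return result


# Shape (2 locations, 4 time steps): the source axis is the second dimension.
data = [
    [1.0, 3.0, 5.0, 7.0],
    [10.0, 30.0, 50.0, 70.0],
]

pair_means = apply_along(data, column_mean, window=2, dimension=1)
print(pair_means)  # [[2.0, 6.0], [20.0, 60.0]]
```

Here `column_mean` never needs to know which dimension held the source axis; it always sees it first, which is why a single implementation serves both layouts.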

## Conclusions
In this module, you learned:

- how to create an `Axis` object manually from scratch
- how to create a remapper
- how to apply the remapper to calculate min, average, max
- how to write your own customized function and use the remapper to apply it
- how to use the same remapper for multiple data fields
- how to use the same remapper on multi-dimensional data
- how to use the same remapper and define the dimension that corresponds to the source axis

In the next module, we will look at easier ways of creating an `Axis`. Instead of creating
axes manually, you can use one of many utility functions to create
your source and destination axes. You can even create rolling-window axes easily.




