## NumPy Interoperability

Here, I'm working through the NumPy docs on interoperability and sharing my notes,
learnings and experiments. <br>
Covered docs: <br>
- [Interoperability with
  NumPy](https://numpy.org/doc/stable/user/basics.interoperability.html)
- [Writing custom array containers](https://numpy.org/doc/stable/user/basics.dispatch.html#basics-dispatch)
- [Array API standard compatibility](https://numpy.org/doc/stable/reference/array_api.html)

While NumPy provides an implementation of array types and operations based on "strided
in-RAM storage", other libraries have re-implemented or extended the NumPy API for their
own needs, including GPU arrays (CuPy), sparse arrays (scipy.sparse, PyData/Sparse) and
parallel arrays (Dask arrays), as well as TensorFlow and PyTorch; xarray and JAX also
build on top of the NumPy API.

Interoperability between array libraries lets users apply the same syntax across them with
minimal changes. The NumPy docs describe three groups of features used for
interoperability with NumPy; these notes add the array API standard as a fourth topic:

1. Methods of turning a foreign object into an ndarray;

2. Methods of deferring execution from a NumPy function to another array library;

3. Methods that use NumPy functions and return an instance of a foreign object;

4. The array API standard.

### 1. Using arbitrary objects in NumPy

- foreign objects are treated as NumPy arrays whenever possible
- that is possible if they do at least one of the following:
     - provide an `__array_interface__` attribute to access the data buffer
     - provide an `__array__()` method with the signature `__array__(self, dtype=None, copy=None)`
     - adhere to the [DLPack Protocol](https://dmlc.github.io/dlpack/latest/python_spec.html#python-spec)

`__array_interface__`

In [1]:
import numpy as np

x = np.array([1, 2, 5.0, 8])
x.__array_interface__

{'data': (96280064543792, False),
 'strides': None,
 'descr': [('', '<f8')],
 'typestr': '<f8',
 'shape': (4,),
 'version': 3}
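To make the fields of this dictionary more concrete, here is a small sketch that reads them back and compares them with the array's own attributes. It only relies on existing NumPy attributes (`ctypes.data`, `dtype`); the variable names are my own and purely illustrative.

```python
import numpy as np

x = np.array([1, 2, 5.0, 8])
iface = x.__array_interface__

# 'data' is a (pointer, read_only) pair; the pointer is the buffer address
ptr, read_only = iface["data"]
same_address = ptr == x.ctypes.data

# 'typestr' encodes byte order, kind and item size, and maps back to a dtype
same_dtype = np.dtype(iface["typestr"]) == x.dtype

# 'strides' is None for a C-contiguous array and an explicit tuple otherwise
contiguous_strides = iface["strides"]
sliced_strides = x[::2].__array_interface__["strides"]

print(same_address, read_only, same_dtype)
print(contiguous_strides, sliced_strides)
```

The strided slice `x[::2]` skips every second 8-byte item, which is why its interface reports explicit strides instead of `None`.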

In [3]:
# __array_interface__ can be used to build a new array on top of the same data buffer,
# here with a different shape (each attribute access returns a fresh dict):

class wrapper():
    pass

arr = np.array([1, 2, 3, 4])
buf = arr.__array_interface__
buf
# {'data': (96280064143728, False),
#  'strides': None,
#  'descr': [('', '<i8')],
#  'typestr': '<i8',
#  'shape': (4,),
#  'version': 3}

buf['shape'] = (2, 2)
w = wrapper()
w.__array_interface__ = buf
new_arr = np.array(w, copy=False)
new_arr

array([[1, 2],
       [3, 4]])

In [8]:
# `arr` and `new_arr` share the same data buffer:

new_arr[0, 0] = 1000
new_arr
# array([[1000,    2],
#        [   3,    4]])

arr

array([1000,    2,    3,    4])
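One way to double-check the buffer sharing without inspecting values by hand is `np.shares_memory`. The following is a minimal sketch of the same experiment; the `Wrapper` class is a hypothetical stand-in, and I use `np.asarray` here on the assumption that it avoids a copy for objects exposing `__array_interface__`.

```python
import numpy as np


class Wrapper:
    """Hypothetical empty container that only carries `__array_interface__`."""


arr = np.array([1, 2, 3, 4])
iface = arr.__array_interface__   # a fresh dict describing arr's buffer
iface["shape"] = (2, 2)           # reinterpret the same 4 items as 2x2

w = Wrapper()
w.__array_interface__ = iface
new_arr = np.asarray(w)           # builds an array on top of arr's buffer

shared = np.shares_memory(arr, new_arr)
new_arr[1, 1] = -1                # write through the reshaped array
print(shared, arr)
```

Note that `arr` has to stay alive for as long as `new_arr` is used, since the wrapper only holds a raw pointer to its data.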

`__array__()`

In [11]:
# writing a custom numpy container and providing the __array__() method:

class DiagonalArray:

    def __init__(self, N, value):
        self._N = N
        self._i = value

    def __repr__(self):
        return f"{self.__class__.__name__}(N={self._N}, value={self._i})"
    
    def __array__(self, dtype=None, copy=None):
        if copy is False:
            raise ValueError(
                "`copy=False` isn't supported. A copy is always created."
            )
        return self._i * np.eye(self._N, dtype=dtype)

arr = DiagonalArray(5, 1)

type(arr)
# __main__.DiagonalArray

arr

DiagonalArray(N=5, value=1)

In [12]:
# using np.array or np.asarray calls its __array__ method:

np.asarray(arr)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
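The two optional arguments of `__array__` can be exercised as well. This is a small sketch that repeats the class from above (without `__repr__`) so that it runs on its own; calling `__array__(copy=False)` directly is my own shortcut to trigger the error path independently of the NumPy version.

```python
import numpy as np


class DiagonalArray:
    def __init__(self, N, value):
        self._N = N
        self._i = value

    def __array__(self, dtype=None, copy=None):
        if copy is False:
            raise ValueError(
                "`copy=False` isn't supported. A copy is always created."
            )
        return self._i * np.eye(self._N, dtype=dtype)


arr = DiagonalArray(3, 2)

# the requested dtype ends up in the resulting ndarray
as_int = np.asarray(arr, dtype=np.int64)
print(as_int.dtype)

# asking explicitly for "no copy" is rejected by the container
try:
    arr.__array__(copy=False)
    refused = False
except ValueError:
    refused = True
print(refused)
```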

DLPack protocol
- `np.from_dlpack()` accepts (array) objects with a `__dlpack__` method and uses that
  method to construct a new array containing the same data

In [None]:
import torch
x_torch = torch.arange(5)
x_torch
# tensor([0, 1, 2, 3, 4])

x_np = np.from_dlpack(x_torch)
x_np
# array([0, 1, 2, 3, 4])

# x_np is a view of x_torch:
x_torch[1] = 100
x_torch
# tensor([  0, 100,   2,   3,   4])

x_np

array([  0, 100,   2,   3,   4])
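The same behaviour can be sketched without PyTorch, since NumPy arrays implement `__dlpack__` themselves. This assumes a NumPy version that provides `np.from_dlpack`; the variable names are illustrative.

```python
import numpy as np

src = np.arange(5)
view = np.from_dlpack(src)   # ndarray acts as the DLPack exporter here

shared = np.shares_memory(src, view)
src[1] = 100                 # modify the exporter ...
print(shared, view)          # ... and the change is visible through `view`
```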


### 2. Operating on foreign objects without converting
- allows execution of a NumPy function on objects from another array library

In [13]:
def f(x):
    # `np.exp` is a ufunc, meaning it operates on ndarrays in an element-by-element fashion
    # `np.mean` is not a ufunc: it reduces the array along its axes (here over all elements)
    return np.mean(np.exp(x))

# applying `f` to a numpy object:
x = np.array([1, 2, 3, 4])
f(x)

np.float64(21.1977562209304)

NumPy-like array objects that implement either `__array_ufunc__` or `__array_function__`
can handle computations in a custom-defined way without the need for explicit
conversion.
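As a rough illustration of the `__array_ufunc__` side, here is a minimal sketch of a container that intercepts ufunc calls. `LabeledArray` and its `label` attribute are hypothetical names of my own, and the sketch deliberately ignores details such as the `out=` argument and ufunc methods other than a plain call.

```python
import numpy as np


class LabeledArray:
    """Hypothetical container: an ndarray plus a text label."""

    def __init__(self, data, label):
        self.data = np.asarray(data)
        self.label = label

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__":
            # reductions, accumulations etc. are not handled in this sketch
            return NotImplemented
        # unwrap our own instances, pass everything else through unchanged
        raw = [x.data if isinstance(x, LabeledArray) else x for x in inputs]
        result = ufunc(*raw, **kwargs)
        # re-wrap so the caller gets the container type back
        return LabeledArray(result, self.label)


t = LabeledArray([1, 2, 3], "demo")
added = np.add(t, 1)
print(type(added).__name__, added.data, added.label)
```

The ufunc never sees the container itself: NumPy hands the call over to `__array_ufunc__`, which decides how to compute and what type to return.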

In [16]:
import pandas as pd
ser = pd.Series([1, 2, 3, 4])
type(ser)
# pandas.core.series.Series

# since pandas Series implement `__array_ufunc__`, we can use numpy ufuncs on them:
np.exp(ser)

0     2.718282
1     7.389056
2    20.085537
3    54.598150
dtype: float64

In [21]:
# and we can mix types:
np.add(ser, np.array([5, 6, 7, 8]))
# 0     6
# 1     8
# 2    10
# 3    12
# dtype: int64

f(ser)
# np.float64(21.1977562209304)

result = ser.__array__()
result
# array([1, 2, 3, 4])

type(result)

numpy.ndarray
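The function `f` above also calls `np.mean`, which is not a ufunc; for such functions the dispatch goes through `__array_function__` instead. A minimal sketch, assuming a hypothetical container `MeanOnly` that supports `np.mean` and nothing else:

```python
import numpy as np


class MeanOnly:
    """Hypothetical container that only supports `np.mean`."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is not np.mean:
            # tells NumPy that this container cannot handle `func`
            return NotImplemented
        unwrapped = [a.data if isinstance(a, MeanOnly) else a for a in args]
        return func(*unwrapped, **kwargs)


m = MeanOnly([1.0, 2.0, 3.0])
mean_value = np.mean(m)

# any other NumPy function is rejected with a TypeError
try:
    np.sum(m)
    sum_supported = True
except TypeError:
    sum_supported = False

print(mean_value, sum_supported)
```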

### 3. Returning foreign objects

- the idea is to use the NumPy function implementation and then convert the return value
  back into an instance of the foreign object

- the `__array_finalize__` and `__array_wrap__` methods let an array type control what
  type a NumPy function returns
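To see `__array_finalize__` at work, here is a minimal sketch with an ndarray subclass that carries an extra attribute. `InfoArray` and `info` are illustrative names and not part of NumPy's API.

```python
import numpy as np


class InfoArray(np.ndarray):
    """Hypothetical ndarray subclass with an extra `info` attribute."""

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # called whenever a new InfoArray is created, e.g. by view casting,
        # slicing, or when a ufunc result is wrapped back into the subclass
        if obj is None:
            return
        self.info = getattr(obj, "info", None)


a = InfoArray([1.0, 2.0, 3.0], info="metres")
sliced = a[1:]
result = np.exp(a)
print(type(result).__name__, result.info, sliced.info)
```

The result of `np.exp` comes back as the subclass rather than a plain ndarray, and the attribute is carried over.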

In [24]:
# np.exp returns an object of the same type as its input (here a torch.Tensor)

import torch
data = [[1, 2],[3, 4]]
x_np = np.array(data)
x_tensor = torch.tensor(data)

np.exp(x_tensor)

# the call emits a warning (not reproduced here) that mentions `__array_wrap__`,
# which indicates that `__array_wrap__` is used internally :)

tensor([[ 2.7183,  7.3891],
        [20.0855, 54.5982]], dtype=torch.float64)

### 4. Array API standard 
The docs refer to scipy's [Support for the array API standard](https://docs.scipy.org/doc/scipy/dev/api-dev/array_api.html) and scikit-learn's [Array API support (experimental)](https://scikit-learn.org/stable/modules/array_api.html).
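As a rough sketch of what "array-agnostic" code in the spirit of the standard can look like, the function `f` from section 2 can be rewritten to ask the input for its namespace instead of hard-coding `np`. The helper `get_namespace` is hypothetical; it falls back to NumPy on the assumption that older NumPy arrays do not expose `__array_namespace__`.

```python
import numpy as np


def get_namespace(x):
    """Hypothetical helper: use the array's own namespace if it exposes one."""
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np  # fallback for arrays without `__array_namespace__`


def f_agnostic(x):
    xp = get_namespace(x)
    # only functions defined by the standard are used: `exp` and `mean`
    return xp.mean(xp.exp(x))


value = float(f_agnostic(np.array([1.0, 2.0, 3.0, 4.0])))
print(value)
```

Any array library whose objects return a compatible namespace from `__array_namespace__` could in principle be passed to `f_agnostic` unchanged.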