(This notebook documents and catalogs the exceptions and unwanted behaviour encountered when stitching together NumPy and PySyft.)

**Note: some cells in Problems #8 and #9 of this notebook will crash your kernel. Please tread carefully there.**

Needs further testing:
- custom numpy subclasses
- record arrays: I thought this was a minor feature, but several challenges towards the end were about record arrays


In [1]:
import numpy as np

In [2]:
from syft.core.node.new.action_object import ActionObject as AO

In [3]:
data = np.random.rand(5,5)

In [4]:
n = AO.from_obj(data)

In [5]:
type(n)

syft.core.node.new.numpy.NumpyArrayObject

# 0. Speed differences between Mock Objects and NumPy

- We're still in the ballpark of microseconds, but it might be worth keeping an eye on the differences, since we're roughly 100x slower for small objects
- **Edit: We can ignore this. The discrepancy shrinks dramatically as the array size grows**

In [6]:
%%timeit
data.min()

1.51 µs ± 138 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [7]:
%%timeit
n.min()

244 µs ± 12.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [8]:
%%timeit
np.min(data)

3.14 µs ± 168 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [9]:
%%timeit
np.min(n)

174 µs ± 1.37 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


**This 100x discrepancy shrinks dramatically as the arrays get larger**

In [10]:
big_data = np.random.rand(1_000_000)
big_mock = AO.from_obj(big_data)

In [11]:
%%timeit
big_data.min()

461 µs ± 22.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [12]:
%%timeit
big_mock.min()

751 µs ± 46.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
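The timings above look consistent with a roughly fixed per-call cost (about 240 µs at 25 elements, about 290 µs at one million) rather than a cost that scales with the data. The sketch below illustrates that idea with plain NumPy only. `Wrapped` and `overhead_ratio` are hypothetical names made up for this example and are not part of PySyft; the bookkeeping inside `min()` is just a stand-in for whatever the real mock object does.

```python
import timeit

import numpy as np


class Wrapped:
    """Hypothetical stand-in for a mock object (not the real ActionObject)."""

    def __init__(self, data):
        self.data = data

    def min(self):
        # Simulated fixed bookkeeping done on every call, whatever the array size.
        _record = {"op": "min", "args": (), "id": id(self)}
        return self.data.min()


def overhead_ratio(size, repeats=200):
    """Return (wrapped time) / (raw time) for min() on an array of `size` elements."""
    raw = np.random.rand(size)
    wrapped = Wrapped(raw)
    t_raw = timeit.timeit(raw.min, number=repeats)
    t_wrapped = timeit.timeit(wrapped.min, number=repeats)
    return t_wrapped / t_raw


for size in (25, 1_000_000):
    print(size, round(overhead_ratio(size), 2))
```

The ratio should sit closer to 1 for the large array, because the fixed cost is amortized over more work. Exact numbers depend on the machine.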


# 1. ActionObjects can be cajoled into NumPy arrays, DataFrames, Tensors without qualms

- The conversion drops the ActionObject wrapper, though: the result is a plain array or DataFrame
- Changing dtype using `.astype()` is fine (the result is still a `NumpyArrayObject`)

In [13]:
np.asarray(n)

array([[0.75623255, 0.89143062, 0.78156582, 0.83336781, 0.3335968 ],
       [0.39325524, 0.147437  , 0.27887704, 0.52646702, 0.67009805],
       [0.19971797, 0.46231332, 0.71630058, 0.78401388, 0.21372319],
       [0.92247225, 0.39762143, 0.46081398, 0.5228097 , 0.16019485],
       [0.15422984, 0.44108681, 0.49187304, 0.11544011, 0.70909087]])

In [14]:
type(np.asarray(n))

numpy.ndarray

In [15]:
type(np.max(n))

syft.core.node.new.numpy.NumpyScalarObject

In [16]:
type(np.square(n))

ufunc being called


syft.core.node.new.numpy.NumpyArrayObject

In [17]:
type(np.all(n))

syft.core.node.new.numpy.NumpyBoolObject

In [18]:
import pandas as pd

In [19]:
p = pd.DataFrame(n)

In [20]:
type(p)

pandas.core.frame.DataFrame

In [21]:
type(n.astype(np.int64))

syft.core.node.new.numpy.NumpyArrayObject
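One plausible reason `np.asarray(n)` comes back as a plain `numpy.ndarray` is NumPy's `__array__` hook, which any object can implement to hand NumPy its raw data. Whether ActionObject actually uses this hook is an assumption here. The sketch uses a hypothetical `Wrapper` class and plain NumPy only.

```python
import numpy as np


class Wrapper:
    """Hypothetical wrapper for illustration; not the real ActionObject."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook whenever it needs a plain ndarray.
        return self.data if dtype is None else self.data.astype(dtype)


w = Wrapper([[1.0, 2.0], [3.0, 4.0]])
plain = np.asarray(w)

# The wrapper is gone: NumPy only keeps what __array__ returned.
print(type(plain).__name__)
```

Anything that calls `np.asarray` internally (which many NumPy functions and `pd.DataFrame` do) would lose the wrapper in the same way.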

# 2. NumPy methods that return tuples

(challenge 10)

- The resultant tuple gets put in a single ActionObject, as opposed to each of the results being an ActionObject
- It is unclear how the lineage IDs of the elements in the tuple should relate to the lineage ID of the tuple itself

In [22]:
np.nonzero(data)

(array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4,
        4, 4, 4]),
 array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1,
        2, 3, 4]))

In [23]:
type(np.nonzero(data))

tuple

In [24]:
np.nonzero(n)

(array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4,
       4, 4, 4]), array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1,
       2, 3, 4]))

In [26]:
type(np.nonzero(n))

syft.core.node.new.action_object.AnyActionObject

**What's kinda awesome though is that it will auto-detect what the items in the tuple are when you ask for them!**

In [27]:
type(np.nonzero(n)[0])

syft.core.node.new.numpy.NumpyArrayObject

**I'm not sure if there should be any patterns with regards to Lineage IDs here though**

In [28]:
result = np.nonzero(n)

In [29]:
result.syft_lineage_id

<LineageID: 2638b60de4cd4bcface696e69b40e5c7 - 1402117398183524927>

In [30]:
result[0].syft_lineage_id

<LineageID: eea93dda75454fe18aa8b60d9791a4fc - 2302083646917190723>

In [31]:
result.syft_history_hash

1402117398183524927

In [32]:
result[0].syft_history_hash

434629797836514065

# 3. NumPy functions that create new arrays return plain NumPy arrays, not ActionObjects

(Challenge 16, Challenge 21, ...)

- Easy examples of this are `np.pad` and `np.tile`, which build a new array
- Some functions that extract data, such as `np.diag`, have the same problem

In [33]:
np.pad(data, pad_width=1)

array([[0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        ],
       [0.        , 0.75623255, 0.89143062, 0.78156582, 0.83336781,
        0.3335968 , 0.        ],
       [0.        , 0.39325524, 0.147437  , 0.27887704, 0.52646702,
        0.67009805, 0.        ],
       [0.        , 0.19971797, 0.46231332, 0.71630058, 0.78401388,
        0.21372319, 0.        ],
       [0.        , 0.92247225, 0.39762143, 0.46081398, 0.5228097 ,
        0.16019485, 0.        ],
       [0.        , 0.15422984, 0.44108681, 0.49187304, 0.11544011,
        0.70909087, 0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        ]])

In [34]:
np.pad(n, pad_width=1)

array([[0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        ],
       [0.        , 0.75623255, 0.89143062, 0.78156582, 0.83336781,
        0.3335968 , 0.        ],
       [0.        , 0.39325524, 0.147437  , 0.27887704, 0.52646702,
        0.67009805, 0.        ],
       [0.        , 0.19971797, 0.46231332, 0.71630058, 0.78401388,
        0.21372319, 0.        ],
       [0.        , 0.92247225, 0.39762143, 0.46081398, 0.5228097 ,
        0.16019485, 0.        ],
       [0.        , 0.15422984, 0.44108681, 0.49187304, 0.11544011,
        0.70909087, 0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        ]])

In [35]:
type(np.pad(n, pad_width=1))

numpy.ndarray

In [36]:
np.diag(n)

array([0.75623255, 0.147437  , 0.71630058, 0.5228097 , 0.70909087])

In [37]:
type(np.diag(n))

numpy.ndarray
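A possible way for a wrapper to stay wrapped through functions like `np.pad` and `np.diag` is NumPy's `__array_function__` protocol (NumPy 1.17+). The sketch below is only an illustration on a hypothetical `Tracked` class; it is not how PySyft is implemented, and it only unwraps top-level positional arguments.

```python
import numpy as np


class Tracked:
    """Hypothetical wrapper for illustration; not the real ActionObject."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Unwrap top-level Tracked arguments, run the NumPy function on the
        # raw arrays, then wrap any ndarray result again.
        raw_args = [a.data if isinstance(a, Tracked) else a for a in args]
        result = func(*raw_args, **kwargs)
        return Tracked(result) if isinstance(result, np.ndarray) else result


t = Tracked(np.arange(9.0).reshape(3, 3))
padded = np.pad(t, pad_width=1)
diagonal = np.diag(t)

print(type(padded).__name__, padded.data.shape)
print(type(diagonal).__name__, diagonal.data)
```

A real implementation would also need to handle wrapped objects nested inside lists or keyword arguments, and tuple results such as the one from `np.nonzero`.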

# 4. Handling of NaNs:

(Challenge 17)

- You could say that NaN behaviour is **unpla-NaN-able**

In [38]:
array = AO.from_obj(np.empty((3,3)) * np.nan)

In [39]:
array

array([[nan, nan, nan],
       [nan, nan, nan],
       [nan, nan, nan]])

In [40]:
np.nan in array

False

In [41]:
type(array)

syft.core.node.new.numpy.NumpyArrayObject

In [42]:
print(0 * np.nan)
print(np.nan == np.nan)
print(np.inf > np.nan)
print(np.nan - np.nan)
print(np.nan in set([np.nan]))
print(0.3 == 3 * 0.1)

nan
False
False
nan
True
False
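The `False` from `np.nan in array` is standard NumPy behaviour rather than something specific to ActionObjects: `in` on an array compares with `==`, and `nan == nan` is `False`. The set case returns `True` because Python containers check identity before equality. A small plain-NumPy sketch of the usual alternative, `np.isnan`:

```python
import numpy as np

arr = np.full((3, 3), np.nan)

# `in` compares element-wise with ==, and nan never equals nan.
contains_via_in = np.nan in arr

# np.isnan tests for NaN directly.
contains_via_isnan = bool(np.isnan(arr).any())

# Python containers check identity first, so the same nan object is found.
contains_in_set = np.nan in {np.nan}

print(contains_via_in, contains_via_isnan, contains_in_set)
```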


# 5 & 6. Metadata is an ActionObject too!!! & ActionObjects can't always be a replacement for integers

(Challenge 20)

In [43]:
np.unravel_index(12, data.shape)

(2, 2)

In [44]:
np.unravel_index(12, n.shape)

TypeError: 'tuple' object cannot be interpreted as an integer

**This actually happens because `.shape` returns an ActionObject instead of a plain tuple of integers**

In [45]:
type(n.shape)

syft.core.node.new.action_object.AnyActionObject
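A possible workaround is to turn the wrapped shape back into a plain tuple of Python integers before handing it to NumPy. The sketch below uses a hypothetical `ShapeProxy` class as a stand-in; whether the real `AnyActionObject` returned by `n.shape` can be iterated like this has not been tested here.

```python
import numpy as np


class ShapeProxy:
    """Hypothetical stand-in for a wrapped shape: iterable, but not a tuple."""

    def __init__(self, dims):
        self._dims = dims

    def __iter__(self):
        return iter(self._dims)


proxy = ShapeProxy((5, 5))

# Rebuild a plain tuple of ints, which is what np.unravel_index expects.
plain_shape = tuple(int(d) for d in proxy)
index = np.unravel_index(12, plain_shape)

print(tuple(int(i) for i in index))  # (2, 2)
```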

# 7. Chaining operations in line can sometimes cause errors

(Challenge 22)

- This was discovered when trying to normalize an array

**This works:**

In [46]:
n - n.mean()

array([[ 0.26167136,  0.39686943,  0.28700463,  0.33880662, -0.16096439],
       [-0.10130595, -0.34712419, -0.21568415,  0.03190582,  0.17553686],
       [-0.29484322, -0.03224787,  0.22173939,  0.28945268, -0.280838  ],
       [ 0.42791106, -0.09693976, -0.03374721,  0.02824851, -0.33436634],
       [-0.34033135, -0.05347438, -0.00268815, -0.37912108,  0.21452967]])

**This works:**

In [47]:
n.std()

0.2508100284940524

**This works:**

In [48]:
(n - n.mean())/0.2937

ufunc being called


array([[ 0.89094777,  1.35127488,  0.97720337,  1.15358058, -0.54805716],
       [-0.34493002, -1.18190055, -0.73436893,  0.10863406,  0.59767401],
       [-1.00389249, -0.10979867,  0.754986  ,  0.98553859, -0.95620701],
       [ 1.45696649, -0.33006386, -0.11490367,  0.09618152, -1.13846217],
       [-1.15877206, -0.18207144, -0.00915271, -1.29084466,  0.73043811]])

**So surely this should work:**

In [49]:
(n-n.mean())/n.std()

ufunc being called


---------------------------------------------------------------------------
SyftException
---------------------------------------------------------------------------
Exception: Must init <class 'syft.core.node.new.numpy.NumpyArrayObject'> with <class 'numpy.ndarray'> not <class 'numpy.float64'>
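The error message says a `numpy.float64` arrived where a `numpy.ndarray` was expected. In plain NumPy, `.std()` on a full array does return a `numpy.float64` scalar, and `np.asarray` turns such a scalar into a 0-dimensional `ndarray`. Whether that is the root cause inside PySyft is only a guess. The sketch below is a plain-NumPy reference for the normalization itself, which can be used to check the mock result once the chained expression works.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((5, 5))

mean = data.mean()
std = data.std()
print(type(std).__name__)  # a NumPy scalar type, not ndarray

# A 0-d array behaves like the scalar in arithmetic but *is* an ndarray.
std_as_array = np.asarray(std)

normalized = (data - mean) / std_as_array
print(normalized.mean(), normalized.std())
```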


# 8. In-place modifications can fail and kill the kernel

(Challenge 25 & 35)

- **Reminder from present Ishan to future Ishan: the last cell WILL crash your kernel**

In [50]:
data

array([[0.75623255, 0.89143062, 0.78156582, 0.83336781, 0.3335968 ],
       [0.39325524, 0.147437  , 0.27887704, 0.52646702, 0.67009805],
       [0.19971797, 0.46231332, 0.71630058, 0.78401388, 0.21372319],
       [0.92247225, 0.39762143, 0.46081398, 0.5228097 , 0.16019485],
       [0.15422984, 0.44108681, 0.49187304, 0.11544011, 0.70909087]])

In [51]:
data[data > 0.5] *= 2

In [52]:
data

array([[1.5124651 , 1.78286125, 1.56313164, 1.66673562, 0.3335968 ],
       [0.39325524, 0.147437  , 0.27887704, 1.05293403, 1.3401961 ],
       [0.19971797, 0.46231332, 1.43260116, 1.56802775, 0.21372319],
       [1.8449445 , 0.39762143, 0.46081398, 1.04561941, 0.16019485],
       [0.15422984, 0.44108681, 0.49187304, 0.11544011, 1.41818173]])

**The mock object reflects the changed data:**

In [53]:
n

array([[1.5124651 , 1.78286125, 1.56313164, 1.66673562, 0.3335968 ],
       [0.39325524, 0.147437  , 0.27887704, 1.05293403, 1.3401961 ],
       [0.19971797, 0.46231332, 1.43260116, 1.56802775, 0.21372319],
       [1.8449445 , 0.39762143, 0.46081398, 1.04561941, 0.16019485],
       [0.15422984, 0.44108681, 0.49187304, 0.11544011, 1.41818173]])

In [54]:
n > 0.5

array([[ True,  True,  True,  True, False],
       [False, False, False,  True,  True],
       [False, False,  True,  True, False],
       [ True, False, False,  True, False],
       [False, False, False, False,  True]])

In [55]:
type(n > 0.5)

syft.core.node.new.numpy.NumpyArrayObject

**DON'T RUN THIS**

In [None]:
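while False:  # guard: the body never runs, so executing this cell is safe
    n[n > 0.5] *= 2

    # Similarly, ufuncs that write back into the mock object
    # (mock_obj stands for any ActionObject, e.g. n)
    np.add(mock_obj, mock_obj, out=mock_obj)
    np.subtract(mock_obj, mock_obj, out=mock_obj)  # np.sub does not exist
    np.multiply(mock_obj, mock_obj, out=mock_obj)
    np.divide(mock_obj, mock_obj, out=mock_obj)

    # Actually even this crashes you
    n += 1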
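Until in-place operations are safe, an out-of-place formulation may be worth trying. In plain NumPy, `np.where` gives the same values as the masked in-place multiply without mutating the input. Whether this avoids the crash on an ActionObject is untested; the sketch only shows the equivalence on ordinary arrays.

```python
import numpy as np

data = np.array([[0.2, 0.7], [0.9, 0.1]])
original = data.copy()

# In-place version (the pattern that crashes on the mock object).
in_place = data.copy()
in_place[in_place > 0.5] *= 2

# Out-of-place version: builds a new array and leaves `data` untouched.
out_of_place = np.where(data > 0.5, data * 2, data)

print(np.array_equal(in_place, out_of_place))  # True
```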

# 9. MemoryErrors are not triggered on certain operations

(Challenge 26)

- **Reminder from present Ishan to future Ishan: the last cell WILL crash your kernel**

In [56]:
np.random.rand(156816, 36, 53806)

MemoryError: Unable to allocate 2.21 TiB for an array with shape (156816, 36, 53806) and data type float64

In [68]:
while False:
    np.sum(range(int(1e16)))  # might also work with arange
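One defensive option is to estimate the allocation size before running an operation and refuse requests above a limit. The sketch below reproduces the 2.21 TiB figure from the `MemoryError` above. `estimated_bytes`, `fits` and the 1 GiB default limit are hypothetical names and values chosen for this example. Note that this only covers explicit array shapes: it would not catch `np.sum(range(...))`, where the cost comes from NumPy converting the range into an array.

```python
import math

import numpy as np


def estimated_bytes(shape, dtype=np.float64):
    """Bytes needed for a dense array of this shape and dtype."""
    return math.prod(shape) * np.dtype(dtype).itemsize


def fits(shape, dtype=np.float64, limit_bytes=2**30):
    """True if the array would stay under the (hypothetical) size limit."""
    return estimated_bytes(shape, dtype) <= limit_bytes


size = estimated_bytes((156816, 36, 53806))
print(f"{size / 2**40:.2f} TiB")  # 2.21 TiB
print(fits((156816, 36, 53806)), fits((5, 5)))  # False True
```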

# 10. NumPy Flags and Settings

(Challenge 43)

In [57]:
n.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

In [58]:
np.ones(10).flags.writeable = False

In [59]:
n.flags.writeable = False

ValueError: "AnyActionObject" object has no field "writeable"
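For reference, this is what the flag does on a plain NumPy array. Setting `flags.writeable` and calling `ndarray.setflags(write=...)` are equivalent in NumPy; whether `setflags` is forwarded correctly by an ActionObject has not been tested here.

```python
import numpy as np

arr = np.ones(10)
arr.flags.writeable = False

write_error = None
try:
    arr[0] = 5.0  # writing to a read-only array raises ValueError
except ValueError as exc:
    write_error = exc

# setflags is the method-call equivalent of assigning to flags.writeable.
arr.setflags(write=True)
arr[0] = 5.0

print(type(write_error).__name__, arr[0])
```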

# 11. Modifying NumPy domain node permissions

(Challenge 49)

- If NumPy is imported using `np = client.numpy`, then giving users the ability to toggle things like `np.errstate(all="ignore")` could be dangerous

In [60]:
import syft as sy
W = sy.Worker()

> Starting Worker: Crazy Dolgov - 135ed68c572a420b90ec84ce5c11ce9f - NodeType.DOMAIN - [<class 'syft.core.node.new.user_service.UserService'>, <class 'syft.core.node.new.metadata_service.MetadataService'>, <class 'syft.core.node.new.action_service.ActionService'>, <class 'syft.core.node.new.test_service.TestService'>, <class 'syft.core.node.new.dataset_service.DatasetService'>, <class 'syft.core.node.new.user_code_service.UserCodeService'>, <class 'syft.core.node.new.request_service.RequestService'>, <class 'syft.core.node.new.data_subject_service.DataSubjectService'>, <class 'syft.core.node.new.network_service.NetworkService'>, <class 'syft.core.node.new.policy_service.PolicyService'>, <class 'syft.core.node.new.message_service.MessageService'>, <class 'syft.core.node.new.project_service.ProjectService'>, <class 'syft.core.node.new.data_subject_member_service.DataSubjectMemberService'>]


In [61]:
from syft.core.node.new.client import SyftClient
client = SyftClient.from_node(W).login(email="info@openmined.org", password="changethis")

In [62]:
nump = client.numpy

Using numpy version: 1.24.2


In [63]:
nump.errstate(all="ignore")

<numpy.errstate at 0x7f3ebfba0100>
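In plain NumPy, `np.errstate` is a context manager: constructing it (as in the cell above) changes nothing by itself, and the settings only apply inside a `with` block. The sketch below shows that scoping locally. How this plays out when the call is executed on a domain node is not covered here.

```python
import numpy as np

before = np.geterr()

# Constructing the object alone does not change any error setting.
unused = np.errstate(all="ignore")
after_construct = np.geterr()

with np.errstate(all="ignore"):
    inside = np.geterr()
    # Division by zero is silenced only inside this block.
    quiet = np.array([1.0]) / np.array([0.0])

after_block = np.geterr()

print(before == after_construct, inside, before == after_block)
```

The global setter `np.seterr` is the riskier call, since its effect persists until it is changed back.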

# 12. Custom NumPy classes 

(Challenge 63)

- Still to test edge cases thoroughly

In [79]:
np.random.shuffle(n)

  np.random.shuffle(n)


# 13. Record Arrays are very painful

(Numerous challenges towards the end T.T)

In [64]:
Z_data = np.array([("Hello", 2.5, 3),
              ("World", 3.6, 2)])

np.core.records.fromarrays(
    Z_data.T,
    names='col1, col2, col3',
    formats = 'S8, f8, i8')

rec.array([(b'Hello', 2.5, 3), (b'World', 3.6, 2)],
          dtype=[('col1', 'S8'), ('col2', '<f8'), ('col3', '<i8')])

In [65]:
Z = AO.from_obj(np.array([("Hello", 2.5, 3),
              ("World", 3.6, 2)]))

In [66]:
Z

array([['Hello', '2.5', '3'],
       ['World', '3.6', '2']], dtype='<U32')

In [67]:
np.core.records.fromarrays(
    Z.T,
    names='col1, col2, col3',
    formats = 'S8, f8, i8')

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
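Part of the pain is visible in the `Z` output above: without an explicit dtype, `np.array` coerces the mixed tuples to strings (`<U32`). A plain-NumPy sketch of an alternative is to declare a structured dtype up front and view the result as a record array. Whether `AO.from_obj` preserves a structured dtype is untested, so treat this as a NumPy-side reference only.

```python
import numpy as np

rows = [("Hello", 2.5, 3), ("World", 3.6, 2)]

# Without a dtype, NumPy coerces every field to a string.
coerced = np.array(rows)

# With a structured dtype, each column keeps its own type.
record_dtype = [("col1", "S8"), ("col2", "f8"), ("col3", "i8")]
records = np.array(rows, dtype=record_dtype).view(np.recarray)

print(coerced.dtype.kind)  # 'U'
print(records.col1, records.col2, records.col3)
```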