(This notebook documents and catalogs the exceptions and unwanted behaviour encountered when stitching together NumPy and PySyft.)

**Note: there are cells in Problems #8 and #9 of this notebook which will crash your kernel if run unguarded. Please tread carefully there.**

## Still to test:

- datetime compatibility
- set_printoptions
- change dtype using view()
- custom numpy subclasses
- recordarray


## Definite problems:

- flags like readonly arrays (Z.flags.writeable = False)
- setting errors off entirely (np.seterr(all="ignore"))

In [1]:
import numpy as np

In [2]:
from syft.core.node.new.action_object import ActionObject as AO

In [3]:
data = np.random.rand(5,5)

In [4]:
n = AO.from_obj(data)

In [5]:
type(n)

syft.core.node.new.numpy.NumpyArrayObject

# 0. Speed differences between Mock Objects and NumPy

- We're still in the ballpark of microseconds, but it might be worth keeping an eye on the difference, since we're on the order of 100x slower for small objects (about 160x for `n.min()` and about 50x for `np.min(n)` on a 5x5 array).
- **Edit: We can ignore this. The discrepancy shrinks dramatically as the array size grows.**

In [30]:
%%timeit
data.min()

1.16 µs ± 19.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [32]:
%%timeit
n.min()

188 µs ± 2.45 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [31]:
%%timeit
np.min(data)

2.96 µs ± 77.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [33]:
%%timeit
np.min(n)

145 µs ± 8.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


**This 100x discrepancy shrinks dramatically as the arrays get larger**

In [34]:
big_data = np.random.rand(1_000_000)
big_mock = AO.from_obj(big_data)

In [35]:
%%timeit
big_data.min()

307 µs ± 45.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [36]:
%%timeit
big_mock.min()

561 µs ± 24.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
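
A quick back-of-the-envelope check on the timings reported above suggests the wrapper adds a roughly fixed cost per call rather than a multiplicative one. The numbers are copied from the cell outputs; the variable names are only for illustration.

```python
# Mean timings in microseconds, copied from the %%timeit outputs in this section.
small_numpy, small_mock = 1.16, 188.0   # data.min() vs n.min() on a 5x5 array
big_numpy, big_mock_time = 307.0, 561.0  # same call on 1_000_000 elements

# Relative slowdown of the mock object compared with the plain ndarray.
small_ratio = small_mock / small_numpy
big_ratio = big_mock_time / big_numpy

# Absolute extra time per call.
small_overhead = small_mock - small_numpy
big_overhead = big_mock_time - big_numpy

print(f"5x5:       {small_ratio:.0f}x slower, +{small_overhead:.0f} µs")
print(f"1M floats: {big_ratio:.1f}x slower, +{big_overhead:.0f} µs")
```

The absolute overhead stays within a few hundred microseconds in both cases, which is why the ratio collapses once the NumPy call itself takes hundreds of microseconds.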


# 1. ActionObjects can be cajoled into NumPy arrays, DataFrames, Tensors without qualms

- This discards the ActionObject wrapper though: the result is a plain `numpy.ndarray` or `DataFrame`
- Changing dtype using `astype()` is fine and keeps the ActionObject wrapper
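
One way a wrapper type can end up with this behaviour is NumPy's `__array__` hook. The sketch below uses a hypothetical `Wrapper` class (not PySyft's ActionObject, whose internals are not shown in this notebook) to illustrate why `np.asarray` hands back a bare ndarray while a method such as `astype` can keep the wrapper.

```python
import numpy as np


class Wrapper:
    """Hypothetical stand-in for a wrapper type; this is NOT PySyft's ActionObject."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook whenever it needs a real ndarray.
        return self.data if dtype is None else self.data.astype(dtype)

    def astype(self, dtype):
        # A method on the wrapper can re-wrap its result, so the wrapper survives.
        return Wrapper(self.data.astype(dtype))


w = Wrapper([[0.2, 1.7], [2.9, 3.1]])
unwrapped = np.asarray(w)        # goes through __array__, wrapper is dropped
converted = w.astype(np.int64)   # goes through the method, wrapper is kept

print(type(unwrapped).__name__, type(converted).__name__)
```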

In [6]:
np.asarray(n)

array([[0.29357242, 0.07640433, 0.51935303, 0.15106088, 0.63918414],
       [0.66828965, 0.63527341, 0.03281293, 0.83297766, 0.27529034],
       [0.95863122, 0.92791087, 0.07548366, 0.66861583, 0.17561004],
       [0.38817218, 0.51143325, 0.84398429, 0.68755306, 0.81060654],
       [0.18598649, 0.64289139, 0.9122598 , 0.19709636, 0.3717986 ]])

In [7]:
type(np.asarray(n))

numpy.ndarray

In [8]:
type(np.max(n))

syft.core.node.new.numpy.NumpyScalarObject

In [9]:
type(np.square(n))

ufunc being called


syft.core.node.new.numpy.NumpyArrayObject

In [10]:
type(np.all(n))

syft.core.node.new.numpy.NumpyBoolObject

In [11]:
import pandas as pd

In [12]:
p = pd.DataFrame(n)

In [13]:
type(p)

pandas.core.frame.DataFrame

In [14]:
type(n.astype(np.int64))

syft.core.node.new.numpy.NumpyArrayObject

# 2. NumPy methods that return tuples

(challenge 10)

- The resultant tuple gets put in a single ActionObject, as opposed to each of the results being an ActionObject
- Open question: how should the Lineage IDs of the elements in the tuple relate to the Lineage ID of the tuple itself?
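
For reference, this is what the tuple looks like on a plain ndarray: `np.nonzero` returns one index array per dimension, so a per-element wrapping scheme would produce as many ActionObjects as the array has dimensions. The small `grid` array is made up for illustration.

```python
import numpy as np

# Illustrative 2-D array with three nonzero entries.
grid = np.array([[0, 3, 0],
                 [4, 0, 5]])

result = np.nonzero(grid)
rows, cols = result              # one index array per dimension

# Pair the indices up to get the coordinates of each nonzero entry.
pairs = list(zip(rows.tolist(), cols.tolist()))
print(len(result), pairs)
```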

In [26]:
np.nonzero(data)

(array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4,
        4, 4, 4]),
 array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1,
        2, 3, 4]))

In [27]:
type(np.nonzero(data))

tuple

In [28]:
np.nonzero(n)

(array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4,
       4, 4, 4]), array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1,
       2, 3, 4]))

In [29]:
type(np.nonzero(n))

syft.core.node.new.action_object.AnyActionObject

**What's kinda awesome though is that it will auto-detect what the items in the tuple are when you ask for them!**

In [19]:
type(np.nonzero(n)[0])

syft.core.node.new.numpy.NumpyArrayObject

**I'm not sure if there should be any patterns with regards to Lineage IDs here though**

In [21]:
result = np.nonzero(n)

In [22]:
result.syft_lineage_id

<LineageID: 7f920971c034426399e52b4bd402f5d3 - 7386270402550823267>

In [23]:
result[0].syft_lineage_id

<LineageID: c9c190d23886468ab8504986f5f2f37a - 458470066705737730>

In [24]:
result.syft_history_hash

7386270402550823267

In [25]:
result[0].syft_history_hash

717061301373998378

# 3. NumPy functions that create new arrays return plain NumPy arrays, not ActionObjects

(Challenge 16, Challenge 21)

- An easy example of this is `np.pad`, which returns a new array, or `np.tile`
- Some functions that extract values, such as `np.diag`, will also have this problem
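
A plausible explanation, not confirmed against the PySyft source, is the difference between ufuncs and ordinary NumPy functions. A type that implements `__array_ufunc__` can intercept ufuncs such as `np.square`, but functions like `np.pad` and `np.diag` are not ufuncs: without `__array_function__` they simply convert their input with `np.asarray` and return an ndarray. The `Wrapper` class below is hypothetical and only illustrates that mechanism.

```python
import numpy as np


class Wrapper:
    """Hypothetical wrapper; implements __array_ufunc__ but not __array_function__."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        return self.data if dtype is None else self.data.astype(dtype)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap the inputs, run the ufunc on raw ndarrays, re-wrap the result.
        raw = [x.data if isinstance(x, Wrapper) else x for x in inputs]
        return Wrapper(getattr(ufunc, method)(*raw, **kwargs))


w = Wrapper(np.arange(4.0).reshape(2, 2))

squared = np.square(w)            # ufunc: intercepted, stays wrapped
padded = np.pad(w, pad_width=1)   # not a ufunc: falls back to np.asarray(w)
diagonal = np.diag(w)             # same fallback

print(type(squared).__name__, type(padded).__name__, type(diagonal).__name__)
```

If this is indeed the mechanism, covering these functions would need either `__array_function__` or an explicit wrapper per function.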

In [38]:
np.pad(data, pad_width=1)

array([[0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        ],
       [0.        , 0.29357242, 0.07640433, 0.51935303, 0.15106088,
        0.63918414, 0.        ],
       [0.        , 0.66828965, 0.63527341, 0.03281293, 0.83297766,
        0.27529034, 0.        ],
       [0.        , 0.95863122, 0.92791087, 0.07548366, 0.66861583,
        0.17561004, 0.        ],
       [0.        , 0.38817218, 0.51143325, 0.84398429, 0.68755306,
        0.81060654, 0.        ],
       [0.        , 0.18598649, 0.64289139, 0.9122598 , 0.19709636,
        0.3717986 , 0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        ]])

In [39]:
np.pad(n, pad_width=1)

array([[0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        ],
       [0.        , 0.29357242, 0.07640433, 0.51935303, 0.15106088,
        0.63918414, 0.        ],
       [0.        , 0.66828965, 0.63527341, 0.03281293, 0.83297766,
        0.27529034, 0.        ],
       [0.        , 0.95863122, 0.92791087, 0.07548366, 0.66861583,
        0.17561004, 0.        ],
       [0.        , 0.38817218, 0.51143325, 0.84398429, 0.68755306,
        0.81060654, 0.        ],
       [0.        , 0.18598649, 0.64289139, 0.9122598 , 0.19709636,
        0.3717986 , 0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        , 0.        ]])

In [40]:
type(np.pad(n, pad_width=1))

numpy.ndarray

In [45]:
np.diag(n)

array([0.29357242, 0.63527341, 0.07548366, 0.68755306, 0.3717986 ])

In [46]:
type(np.diag(n))

numpy.ndarray

# 4. Handling of NaNs:

(Challenge 17)

- You could say that NaN behaviour is **unpla-NaN-able**

In [42]:
array = AO.from_obj(np.empty((3,3)) * np.nan)

In [43]:
array

array([[nan, nan, nan],
       [nan, nan, nan],
       [nan, nan, nan]])

In [44]:
np.nan in array

False

In [26]:
type(array)

syft.core.node.new.numpy.NumpyArrayObject

In [27]:
print(0 * np.nan)
print(np.nan == np.nan)
print(np.inf > np.nan)
print(np.nan - np.nan)
print(np.nan in set([np.nan]))
print(0.3 == 3 * 0.1)

nan
False
False
nan
True
False
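
The outputs above follow from two rules: `in` on an array compares elementwise with `==`, and NaN never equals itself; `in` on a set checks object identity first, so the very same `np.nan` object is found. A minimal plain-NumPy sketch (no ActionObjects involved) that also shows the usual way to test for NaNs:

```python
import numpy as np

arr = np.full((3, 3), np.nan)

# `in` on an ndarray boils down to (arr == value).any(), and nan != nan.
found_with_in = np.nan in arr

# The reliable test is np.isnan.
found_with_isnan = bool(np.isnan(arr).any())

# Sets check identity before equality, so the same nan object is found...
same_object_in_set = np.nan in {np.nan}
# ...but a different nan object is not.
other_object_in_set = float("nan") in {np.nan}

print(found_with_in, found_with_isnan, same_object_in_set, other_object_in_set)
```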


# 5 & 6. Metadata is an ActionObject too & ActionObjects can't always be a replacement for integers

(Challenge 20)

In [47]:
np.unravel_index(12, data.shape)

(2, 2)

In [48]:
np.unravel_index(12, n.shape)

TypeError: 'tuple' object cannot be interpreted as an integer

**This actually happens because `.shape` returns an ActionObject instead of a tuple of integers**

In [49]:
type(n.shape)

syft.core.node.new.action_object.AnyActionObject
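
A possible workaround is to coerce the shape to a plain tuple of ints before handing it to NumPy. The helper `as_int_tuple` below is hypothetical, and the sketch assumes the wrapped shape is iterable and yields integer-like values, which this notebook does not verify for ActionObjects. An integer array stands in for the non-tuple shape.

```python
import numpy as np


def as_int_tuple(shape_like):
    """Coerce an iterable of integer-like values into a plain tuple of ints."""
    return tuple(int(dim) for dim in shape_like)


data = np.random.rand(5, 5)

# Stand-in for a shape that is not a plain tuple (here: an integer ndarray).
wrapped_shape = np.asarray(data.shape)

shape = as_int_tuple(wrapped_shape)
index = np.unravel_index(12, shape)
print(shape, index)
```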

# 7. Chaining operations in line can sometimes cause errors

(Challenge 22)

- This was discovered when trying to normalize an array

**This works:**

In [51]:
n - n.mean()

array([[-0.20571768, -0.42288576,  0.02006294, -0.34822921,  0.13989404],
       [ 0.16899955,  0.13598332, -0.46647716,  0.33368756, -0.22399975],
       [ 0.45934112,  0.42862077, -0.42380643,  0.16932574, -0.32368006],
       [-0.11111792,  0.01214315,  0.3446942 ,  0.18826296,  0.31131645],
       [-0.3133036 ,  0.1436013 ,  0.41296971, -0.30219373, -0.1274915 ]])

**This works:**

In [52]:
n.std()

0.29372750787685786

**This works:**

In [54]:
(n - n.mean())/0.2937

ufunc being called


array([[-0.70043472, -1.43985618,  0.06831099, -1.18566297,  0.47631611],
       [ 0.57541557,  0.46300074, -1.58827771,  1.13615104, -0.76268217],
       [ 1.56398066,  1.45938295, -1.44299091,  0.57652618, -1.10207715],
       [-0.37833815,  0.04134543,  1.17362682,  0.6410043 ,  1.05998109],
       [-1.06674702,  0.48893871,  1.40609365, -1.02891976, -0.4340875 ]])

**So surely this should work:**

In [53]:
(n-n.mean())/n.std()

ufunc being called


---------------------------------------------------------------------------
SyftException
---------------------------------------------------------------------------
Exception: Must init <class 'syft.core.node.new.numpy.NumpyArrayObject'> with <class 'numpy.ndarray'> not <class 'numpy.float64'>
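
For reference, the same chained normalization is valid on a plain ndarray. Since dividing by the literal `0.2937` worked above, converting the standard deviation to a plain Python float first might sidestep the exception; that idea is untested against ActionObjects here, and the sketch only shows the pattern on ordinary NumPy data.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((5, 5))

centered = data - data.mean()
scale = float(data.std())         # plain Python float, like the literal 0.2937 above
normalized = centered / scale

# A normalized array has mean ~0 and standard deviation ~1.
mean_ok = bool(np.isclose(normalized.mean(), 0.0))
std_ok = bool(np.isclose(normalized.std(), 1.0))
print(mean_ok, std_ok)
```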


# 8. In-place modifications can fail and kill the kernel

(Challenge 25 & 35)

- **Reminder from present Ishan to future Ishan: the last cell WILL crash your kernel**

In [62]:
data

array([[0.29357242, 0.07640433, 1.03870606, 0.15106088, 1.27836827],
       [1.3365793 , 1.27054682, 0.03281293, 1.66595531, 0.27529034],
       [1.91726243, 1.85582173, 0.07548366, 1.33723167, 0.17561004],
       [0.38817218, 1.02286649, 1.68796858, 1.37510611, 1.62121308],
       [0.18598649, 1.28578278, 1.8245196 , 0.19709636, 0.3717986 ]])

In [63]:
data[data > 0.5] *= 2

In [64]:
data

array([[0.29357242, 0.07640433, 2.07741212, 0.15106088, 2.55673655],
       [2.67315859, 2.54109364, 0.03281293, 3.33191062, 0.27529034],
       [3.83452486, 3.71164347, 0.07548366, 2.67446334, 0.17561004],
       [0.38817218, 2.04573298, 3.37593716, 2.75021222, 3.24242616],
       [0.18598649, 2.57156557, 3.6490392 , 0.19709636, 0.3717986 ]])

**The mock object reflects the changed data:**

In [65]:
n

array([[0.29357242, 0.07640433, 2.07741212, 0.15106088, 2.55673655],
       [2.67315859, 2.54109364, 0.03281293, 3.33191062, 0.27529034],
       [3.83452486, 3.71164347, 0.07548366, 2.67446334, 0.17561004],
       [0.38817218, 2.04573298, 3.37593716, 2.75021222, 3.24242616],
       [0.18598649, 2.57156557, 3.6490392 , 0.19709636, 0.3717986 ]])

In [66]:
n > 0.5

array([[False, False,  True, False,  True],
       [ True,  True, False,  True, False],
       [ True,  True, False,  True, False],
       [False,  True,  True,  True,  True],
       [False,  True,  True, False, False]])

In [67]:
type(n > 0.5)

syft.core.node.new.numpy.NumpyArrayObject

**DON'T RUN THIS**

In [None]:
while False:  # guard so this cell never executes
    n[n > 0.5] *= 2

    # Similarly (mock_obj is any mock ActionObject, such as n)
    np.add(mock_obj, mock_obj, out=mock_obj)
    np.subtract(mock_obj, mock_obj, out=mock_obj)
    np.multiply(mock_obj, mock_obj, out=mock_obj)
    np.divide(mock_obj, mock_obj, out=mock_obj)
    
    
    # Actually even this crashes you
    n += 1
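
Until in-place updates are safe, an out-of-place formulation gives the same values without writing into the original buffer. The sketch below runs on plain ndarrays only; whether `np.where` behaves well on ActionObjects has not been tested in this notebook, and Problem #3 suggests the result may come back as a plain ndarray.

```python
import numpy as np

data = np.array([[0.2, 0.7],
                 [1.5, 0.1]])

# In-place version (the pattern that crashes the kernel on the mock object).
in_place = data.copy()
in_place[in_place > 0.5] *= 2

# Out-of-place equivalent: builds a new array and leaves the input untouched.
out_of_place = np.where(data > 0.5, data * 2, data)

print(np.array_equal(in_place, out_of_place))
```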

# 9. MemoryErrors are not triggered on certain operations

(Challenge 26)

- **Reminder from present Ishan to future Ishan: the last cell WILL crash your kernel**

In [69]:
np.random.rand(156816, 36, 53806)

MemoryError: Unable to allocate 2.21 TiB for an array with shape (156816, 36, 53806) and data type float64
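
The size in that message can be reproduced from the shape alone, which is also a cheap way to pre-check a request before attempting the allocation:

```python
import math

import numpy as np

shape = (156816, 36, 53806)

n_elements = math.prod(shape)
n_bytes = n_elements * np.dtype(np.float64).itemsize  # 8 bytes per float64
tib = n_bytes / 2**40                                  # bytes -> tebibytes

print(f"{n_elements:,} elements -> {tib:.2f} TiB")
```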

In [68]:
while False:
    np.sum(range(int(1e16)))  # might also work with arange

# 10. NumPy Flags and Settings

(Challenge 43)

In [70]:
n.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

In [71]:
np.ones(10).flags.writeable = False

In [72]:
n.flags.writeable = False

ValueError: "AnyActionObject" object has no field "writeable"
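
For comparison, this is the behaviour on a plain ndarray that the mock object would need to reproduce. Note that the flag has to be set on an array you keep a reference to; setting it on a temporary such as `np.ones(10)` has no lasting effect.

```python
import numpy as np

z = np.ones(10)
z.flags.writeable = False   # mark the array read-only

try:
    z[0] = 5.0
    wrote = True
except ValueError as err:
    # NumPy refuses to write into a read-only array.
    wrote = False
    print(f"write blocked: {err}")
```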

# 11. Modifying NumPy domain node permissions

(Challenge 49)

- If NumPy is imported using `np = client.numpy`, then giving users the ability to toggle things like `np.errstate(all="ignore")` could be dangerous

In [74]:
import syft as sy
W = sy.Worker()

> Starting Worker: Sweet Pearl - 6b1b8281028040d7b4198154c91cf147 - NodeType.DOMAIN - [<class 'syft.core.node.new.user_service.UserService'>, <class 'syft.core.node.new.metadata_service.MetadataService'>, <class 'syft.core.node.new.action_service.ActionService'>, <class 'syft.core.node.new.test_service.TestService'>, <class 'syft.core.node.new.dataset_service.DatasetService'>, <class 'syft.core.node.new.user_code_service.UserCodeService'>, <class 'syft.core.node.new.request_service.RequestService'>, <class 'syft.core.node.new.data_subject_service.DataSubjectService'>, <class 'syft.core.node.new.network_service.NetworkService'>, <class 'syft.core.node.new.policy_service.PolicyService'>, <class 'syft.core.node.new.message_service.MessageService'>, <class 'syft.core.node.new.project_service.ProjectService'>, <class 'syft.core.node.new.data_subject_member_service.DataSubjectMemberService'>]


In [76]:
from syft.core.node.new.client import SyftClient
client = SyftClient.from_node(W).login(email="info@openmined.org", password="changethis")

In [77]:
nump = client.numpy

Using numpy version: 1.24.2


In [78]:
nump.errstate(all="ignore")

<numpy.errstate at 0x7f3d854c45b0>
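
One detail worth keeping in mind when assessing the risk: in plain NumPy, `errstate` is a context manager, so merely constructing it (as in the cell above) does not change the error state. Only entering it with `with` does, and the previous state is restored on exit. `np.seterr` is the call that changes the state persistently. The sketch below checks this with local NumPy; how `client.numpy` forwards these calls to the node is not examined here.

```python
import numpy as np

before = np.geterr()

np.errstate(all="ignore")          # constructed but never entered: no effect
after_construct = np.geterr()

with np.errstate(all="ignore"):    # entered: state changes inside the block
    inside = np.geterr()

after_with = np.geterr()           # restored on exit

print(before == after_construct, set(inside.values()), before == after_with)
```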

# 12. Custom NumPy classes 

(Challenge 63)

- Still to test edge cases thoroughly

In [79]:
np.random.shuffle(n)

  np.random.shuffle(n)


# 13. Record Arrays are very painful

(Numerous challenges towards the end T.T)

In [85]:
Z_data = np.array([("Hello", 2.5, 3),
              ("World", 3.6, 2)])

np.core.records.fromarrays(
    Z_data.T,
    names='col1, col2, col3',
    formats = 'S8, f8, i8')

rec.array([(b'Hello', 2.5, 3), (b'World', 3.6, 2)],
          dtype=[('col1', 'S8'), ('col2', '<f8'), ('col3', '<i8')])

In [80]:
Z = AO.from_obj(np.array([("Hello", 2.5, 3),
              ("World", 3.6, 2)]))

In [81]:
Z

array([['Hello', '2.5', '3'],
       ['World', '3.6', '2']], dtype='<U32')

In [83]:
np.core.records.fromarrays(
    Z.T,
    names='col1, col2, col3',
    formats = 'S8, f8, i8')

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.
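
Part of the pain is that `np.array` on mixed tuples without a dtype promotes every value to one string type (the `<U32` output above), so the columns have to be re-parsed from text afterwards. On plain NumPy, passing a structured dtype up front avoids that round trip. This is a sketch of the plain-ndarray side only; it does not address how ActionObjects handle structured dtypes.

```python
import numpy as np

rec_dtype = np.dtype([("col1", "S8"), ("col2", "f8"), ("col3", "i8")])

# Tuples plus an explicit structured dtype keep each column's own type.
Z_struct = np.array([(b"Hello", 2.5, 3), (b"World", 3.6, 2)], dtype=rec_dtype)
R = Z_struct.view(np.recarray)     # attribute-style access to the fields

# Without a dtype, NumPy promotes everything to a single string type.
Z_plain = np.array([("Hello", 2.5, 3), ("World", 3.6, 2)])

print(R.col2, R.col3, Z_plain.dtype.kind)
```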