## Copies and views

https://numpy.org/doc/stable/user/basics.copies.html

In [1]:
import numpy as np

Basic indexing that selects a sub-array (rather than a single element, which yields an array scalar) creates a *view*.

In the following example:
 - `a` has underlying data;
 - `b` does not have its own data, it accesses `a`'s data.

We say that `b = a[1]` creates a view. 

A list of all types available in NumPy, such as the `np.int32` used below, is provided here: https://numpy.org/doc/stable/user/basics.types.html


In [2]:
a = np.array([[200, 201, 202, 203], [210, 211, 212, 213], [220, 221, 222, 223]], dtype = np.int32)
b = a[1] # same as a[1, :]; `=` only binds the name b to the view, no data is copied

print('a =')
print(a)
print('')

print('b =')
print(b)
print('')

b[2] = 88 # writes into the data shared with a
print('after b[2] = 88')
print('')

print('a =')
print(a)
print('')

print('b =')
print(b)
print('')

a =
[[200 201 202 203]
 [210 211 212 213]
 [220 221 222 223]]

b =
[210 211 212 213]

after b[2] = 88

a =
[[200 201 202 203]
 [210 211  88 213]
 [220 221 222 223]]

b =
[210 211  88 213]
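
Whether an array is a view can also be checked directly instead of inferred from a side effect. The sketch below rebuilds the same `a` and `b` and inspects them with the `.base` attribute and `np.shares_memory`; the names `is_view` and `overlaps` are only illustrative.

```python
import numpy as np

a = np.array([[200, 201, 202, 203],
              [210, 211, 212, 213],
              [220, 221, 222, 223]], dtype=np.int32)
b = a[1]  # basic indexing of one row

# A view records the array that owns its data in `.base`.
is_view = b.base is a
# `np.shares_memory` reports whether two arrays overlap in memory.
overlaps = np.shares_memory(a, b)

print(is_view, overlaps)
```

Both checks should report `True` here, which matches the behavior seen above when writing to `b[2]`.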



`dtype` (data type) in NumPy specifies the type of elements stored in an array. It's a way to tell NumPy how to interpret each element of the array, which has a significant impact on the amount of memory used to store the array and how operations on the array are performed.

See https://numpy.org/doc/stable/user/basics.types.html for the full list of data types.

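We commonly use: `np.bool_`, `np.uint8`, `np.float32`, `np.float64`, `np.int32`, and `np.int64`. (The spelling `np.bool` is unavailable in some NumPy 1.x releases; `np.bool_` works across versions.)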

If we were to talk about audio, we'd need `np.int16`, and sometimes `np.int32`, too.
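
To make the memory impact of `dtype` concrete, the following sketch stores the same four values under several data types and prints the bytes per element (`itemsize`) and the total bytes (`nbytes`). The variable names are illustrative only.

```python
import numpy as np

values = [1, 2, 3, 4]

# Same values, different element types: compare the storage cost.
for dtype in (np.uint8, np.int16, np.int32, np.float32, np.float64):
    arr = np.array(values, dtype=dtype)
    print(arr.dtype, arr.itemsize, arr.nbytes)

small = np.array(values, dtype=np.uint8)    # 1 byte per element
large = np.array(values, dtype=np.float64)  # 8 bytes per element
```

For large arrays such as images or audio buffers, this factor between the smallest and largest type can decide whether the data fits in memory.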

In [4]:
a = np.array([[200, 201, 202, 203], [210, 211, 212, 213], [220, 221, 222, 223]], dtype = np.int32)
b = a[1].copy() # `.copy()` allocates new data, so b no longer shares memory with a

print('a =')
print(a)
print('')

print('b =')
print(b)
print('')

b[2] = 88
print('after b[2] = 88')
print('')

print('a =')
print(a)
print('')

print('b =')
print(b)
print('')

a =
[[200 201 202 203]
 [210 211 212 213]
 [220 221 222 223]]

b =
[210 211 212 213]

after b[2] = 88

a =
[[200 201 202 203]
 [210 211 212 213]
 [220 221 222 223]]

b =
[210 211  88 213]



Two reasons for using `copy`:
 - You genuinely want to copy the data so you can edit it without affecting the original data.
 - You have taken a smaller part of some very large data, and no longer need the rest of the very large data.
   Then
   ```
    b = a[indexing].copy()
    del a
   ```
   is appropriate: once no other references to `a` remain, Python can free the large data. A view would instead keep all of `a`'s data alive.
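
The second reason can be illustrated with a small sketch. It assumes a hypothetical large array `big` and compares a tiny view of it with a tiny copy: the view still refers to the whole original through `.base`, while the copy owns only its own few bytes.

```python
import numpy as np

big = np.zeros((1000, 1000), dtype=np.float64)  # about 8 MB of data
view = big[:2, :2]           # tiny view, still tied to all of `big`
small = big[:2, :2].copy()   # independent 2x2 array

print(view.base is big)      # the view keeps a reference to `big`
print(small.base is None)    # the copy owns its own data
print(view.base.nbytes, small.nbytes)
```

After `del big`, the data behind `view` would remain allocated because `view.base` still refers to it, whereas `small` does not depend on the original at all.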

We have seen some basic slicing for images. There are a few other things to know about slicing:
 - slicing creates a view;
 - slicing can be combined with usual indexing;
   - slicing preserves an axis;
   - an index causes that axis to be "indexed away";
 - when fewer indices and slices are provided than the number of axes, 
   the missing trailing indices are considered complete slices;
 - `...` can be used to fill in as many complete slices `:` as needed to cover the remaining axes.

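The examples below build their test arrays with `np.fromfunction`, so that each element's value shows its own indices. Note that `dtype` must be passed by keyword:

```python
np.fromfunction(function, shape_of_array, dtype=element_type)
```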

In [7]:
a = lambda i0, i1, i2: 3000 + 100*i0 + 10*i1 + 1*i2
a

<function __main__.<lambda>(i0, i1, i2)>

In [9]:
a = np.fromfunction(lambda i0, i1, i2: 3000 + 100*i0 + 10*i1 + 1*i2, (2, 4, 5), dtype=np.int32)
print(a)

[[[3000 3001 3002 3003 3004]
  [3010 3011 3012 3013 3014]
  [3020 3021 3022 3023 3024]
  [3030 3031 3032 3033 3034]]

 [[3100 3101 3102 3103 3104]
  [3110 3111 3112 3113 3114]
  [3120 3121 3122 3123 3124]
  [3130 3131 3132 3133 3134]]]


**Grid of Indices**: NumPy constructs a grid of indices where each dimension corresponds to one of the function's arguments. For a 3D array of shape (2, 4, 5), the indices for each dimension would be:

For i0: Two values (0 and 1), since the first dimension's size is 2. <br>
For i1: Four values (0, 1, 2, 3), since the second dimension's size is 4. <br>
For i2: Five values (0, 1, 2, 3, 4), since the third dimension's size is 5.

**Function Call**: The function is called once, not once per element. Each argument is a whole array of shape (2, 4, 5): `i0` holds the first index of every position, `i1` the second, and `i2` the third. Together they cover every combination from (i0=0, i1=0, i2=0) up to (i0=1, i1=3, i2=4). The function must therefore be written with operations that work elementwise on arrays.

**Element Computation**: The expression `3000 + 100*i0 + 10*i1 + 1*i2` is evaluated elementwise on these index arrays, so the indices directly determine each element's value. For example, the element at position (1, 2, 3) is 3123.
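
How `np.fromfunction` invokes its function can be checked with a small experiment. The sketch below uses a hypothetical helper `f` that records the shapes of the arguments it receives, then builds the same array from `np.indices` for comparison.

```python
import numpy as np

calls = []

def f(i0, i1, i2):
    # Record the shape of each argument to see what NumPy passes in.
    calls.append((i0.shape, i1.shape, i2.shape))
    return 3000 + 100*i0 + 10*i1 + 1*i2

a = np.fromfunction(f, (2, 4, 5), dtype=np.int32)

# The same array built from explicit index grids.
i0, i1, i2 = np.indices((2, 4, 5), dtype=np.int32)
a2 = 3000 + 100*i0 + 10*i1 + 1*i2

print(len(calls), calls[0])
print(np.array_equal(a, a2))
```

The function runs a single time and receives three index arrays of the full shape, so it behaves like an elementwise expression over the grids returned by `np.indices`.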

In [10]:
b = a[:,1:3,::2] # keep the first axis whole; take indices 1, 2 of the second axis and 0, 2, 4 of the last axis
print(b)

[[[3010 3012 3014]
  [3020 3022 3024]]

 [[3110 3112 3114]
  [3120 3122 3124]]]


In [11]:
b[1,0,2] = 88 # affects the array a since the previous assignment created a view

print('a =')
print(a)
print('')

print('b =')
print(b)
print('')

a =
[[[3000 3001 3002 3003 3004]
  [3010 3011 3012 3013 3014]
  [3020 3021 3022 3023 3024]
  [3030 3031 3032 3033 3034]]

 [[3100 3101 3102 3103 3104]
  [3110 3111 3112 3113   88]
  [3120 3121 3122 3123 3124]
  [3130 3131 3132 3133 3134]]]

b =
[[[3010 3012 3014]
  [3020 3022 3024]]

 [[3110 3112   88]
  [3120 3122 3124]]]



In [12]:
a = np.fromfunction(lambda i0, i1, i2: 3000 + 100*i0 + 10*i1 + 1*i2, (2, 4, 5), dtype=np.int32)
b = a[0,1:,1:] # the first axis will be lost; the second will become smaller by 1, as will the third

print('a =')
print(a)
print('shape =', a.shape)
print('')

print('b =')
print(b)
print('shape =', b.shape)
print('')

a =
[[[3000 3001 3002 3003 3004]
  [3010 3011 3012 3013 3014]
  [3020 3021 3022 3023 3024]
  [3030 3031 3032 3033 3034]]

 [[3100 3101 3102 3103 3104]
  [3110 3111 3112 3113 3114]
  [3120 3121 3122 3123 3124]
  [3130 3131 3132 3133 3134]]]
shape = (2, 4, 5)

b =
[[3011 3012 3013 3014]
 [3021 3022 3023 3024]
 [3031 3032 3033 3034]]
shape = (3, 4)



In [13]:
a = np.array([[1, 2, 3], [4, 5, 6]], dtype = np.int32)
b = a[1:2,2:3]  # b is an array with 2 axes
c = a[1,2]      # c is an array scalar with shape (), not a view of a

print('b =')
print(b)
print('shape =', b.shape)
print('')

print('c =')
print(c)
print('shape =', c.shape)
print('')

b =
[[6]]
shape = (1, 1)

c =
6
shape = ()
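
The practical difference between the two results shows up after a later write to `a`. This sketch reuses the same `a`, `b`, and `c`, then changes the element they both came from.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
b = a[1:2, 2:3]  # slices on both axes: a (1, 1) view
c = a[1, 2]      # indices on both axes: an array scalar

a[1, 2] = 99     # write through the original array

print(b)         # the view reflects the change
print(c)         # the scalar keeps the value it was created with
print(type(c).__name__, isinstance(c, np.ndarray))
```

The sliced result follows the change because it shares data with `a`, while the array scalar is an independent value and not an `ndarray`.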



In [14]:
a = np.fromfunction(lambda i0, i1, i2: 3000 + 100*i0 + 10*i1 + 1*i2, (2, 4, 5), dtype=np.int32)
b = a[0,2:4]    # b will end up viewing
c = a[0,2:4,:]  # the same data as c

print('b =')
print(b)
print('')

print('c =')
print(c)
print('')

b =
[[3020 3021 3022 3023 3024]
 [3030 3031 3032 3033 3034]]

c =
[[3020 3021 3022 3023 3024]
 [3030 3031 3032 3033 3034]]



In [15]:
a = np.fromfunction(lambda i0, i1, i2: 3000 + 100*i0 + 10*i1 + 1*i2, (2, 4, 5), dtype=np.int32)
b = a[...,2]  # the same as b = a[:,:,2]

print('a =')
print(a)
print('shape =', a.shape)
print('')

print('b =')
print(b)
print('shape =', b.shape)
print('')

a =
[[[3000 3001 3002 3003 3004]
  [3010 3011 3012 3013 3014]
  [3020 3021 3022 3023 3024]
  [3030 3031 3032 3033 3034]]

 [[3100 3101 3102 3103 3104]
  [3110 3111 3112 3113 3114]
  [3120 3121 3122 3123 3124]
  [3130 3131 3132 3133 3134]]]
shape = (2, 4, 5)

b =
[[3002 3012 3022 3032]
 [3102 3112 3122 3132]]
shape = (2, 4)
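
The `...` notation is not limited to the leading position. As a brief sketch on an array of the same shape (filled with `np.arange` here for brevity), the comparisons below check that `...` at the start, at the end, and in the middle matches the spelled-out slices.

```python
import numpy as np

a = np.arange(2 * 4 * 5).reshape(2, 4, 5)

# `...` stands in for as many complete slices as are needed.
same_last = np.array_equal(a[..., 2], a[:, :, 2])
same_first = np.array_equal(a[1, ...], a[1, :, :])
same_middle = np.array_equal(a[0, ..., 1:3], a[0, :, 1:3])

print(same_last, same_first, same_middle)
print(a[..., 2].shape, a[1, ...].shape, a[0, ..., 1:3].shape)
```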



## Converting types

The `astype` method returns a copy of the array converted to the requested data type. Converting floats to integers discards the fractional part.


In [16]:
a = np.arange(254.5, 258.5, 0.5, dtype = np.float32)
b = a.astype(np.int32)
c = b.astype(np.uint32)

print('a =')
print(a)
print('dtype =', a.dtype)
print('')

print('b =')
print(b)
print('dtype =', b.dtype)
print('')

print('c =')
print(c)
print('dtype =', c.dtype)
print('')

a =
[254.5 255.  255.5 256.  256.5 257.  257.5 258. ]
dtype = float32

b =
[254 255 255 256 256 257 257 258]
dtype = int32

c =
[254 255 255 256 256 257 257 258]
dtype = uint32
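
The example above uses only positive values, which hides two details of `astype`. The sketch below, with illustrative values chosen for this purpose, shows that float-to-integer conversion drops the fractional part toward zero and that a negative integer converted to an unsigned type wraps around.

```python
import numpy as np

f = np.array([-1.5, -0.5, 0.5, 1.5], dtype=np.float32)
i = f.astype(np.int32)   # fractional part is dropped, toward zero
print(i)

n = np.array([-1, 0, 1], dtype=np.int32)
u = n.astype(np.uint32)  # -1 has no unsigned counterpart and wraps around
print(u)

# By default astype returns a new array rather than a view.
print(np.shares_memory(n, u))
```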



The data types in NumPy are different from base Python, and are closer to what C++ has. In particular, integers **have fixed-width representations** and may overflow.

By contrast, integers in plain Python (outside NumPy) are **arbitrary-precision** and will not overflow.

In [19]:
c = np.array([254, 255, 255, 256, 256, 257, 257, 258])
d = c.astype(np.uint8)

print('d =')
print(d)
print('dtype =', d.dtype)
print('')

d =
[254 255 255   0   0   1   1   2]
dtype = uint8



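Compare `d` with `c`: the values 254 and 255 are unchanged, while 256, 257, and 258 become 0, 1, and 2. A `uint8` can only hold 0 to 255, so larger values wrap around modulo 256.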
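
The wrap-around can be checked numerically. This sketch uses `np.iinfo` to read the range of `uint8`, compares the converted array with `c % 256`, and contrasts fixed-width array arithmetic with plain Python integers; the array `x` is only an illustration.

```python
import numpy as np

c = np.array([254, 255, 256, 257, 258])
d = c.astype(np.uint8)

info = np.iinfo(np.uint8)
print(info.min, info.max)          # representable range of uint8
print(np.array_equal(d, c % 256))  # out-of-range values wrap modulo 256

x = np.array([255], dtype=np.uint8)
print(x + x)      # fixed-width addition wraps around
print(255 + 255)  # plain Python integers keep growing
```

NumPy performs this array conversion and addition without raising an error, so choosing a wide enough `dtype` is the programmer's responsibility.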