# CPSC380: 2_numpy_1_array_creation

In this notebook, you will learn:
 - Numpy array creation functions: `np.array`, `np.arange`, `np.zeros`, `np.ones`, `np.full`, `np.identity`, `np.eye`, `np.random.randint`, `np.random.random`, `np.random.normal`, `np.random.choice`
 - Numpy array attributes: `ndim`, `shape`, `dtype`, `size`, `itemsize`, `nbytes`
 - Numpy array reshaping: `reshape`, `flatten`, `transpose`
 
Read more: 
 - [textbook](https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html) and
 - [Numpy website](https://numpy.org/).

Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.

To use Numpy, we first need to import the `numpy` package:

In [36]:
import numpy as np

### 1. Numpy array creation

A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. 
 - The number of dimensions is the **rank** of the array; 
 - The **shape** of an array is a tuple of integers giving the size of the array along each dimension.

In [37]:
# np.rank has been removed from NumPy; np.ndim below reports the same information
# print(np.rank(1))                            # rank 0
# print(np.rank([1,2,3]))                      # rank 1
# print(np.rank(np.array([[1,2,3],[4,5,6]])))  # rank 2

print(np.ndim(1))                            # rank 0
print(np.ndim([1,2,3]))                      # rank 1
print(np.ndim(np.array([[1,2,3],[4,5,6]])))  # rank 2

0
1
2


We can initialize numpy arrays from nested Python lists, and access elements using square brackets:

In [38]:
# Create a rank 1 array with np.array
alist = [1, 2, 3]
a = np.array(alist)
print(alist, a)

[1, 2, 3] [1 2 3]


In [39]:
# Create a rank 2 array with np.array
blist=[[1,2,3],
       [4,5,6]]
b = np.array(blist) 
print (blist, '\n\n', b)

[[1, 2, 3], [4, 5, 6]] 

 [[1 2 3]
 [4 5 6]]
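
The square-bracket element access mentioned earlier can be sketched as follows. The names `vec` and `mat` are made up for this illustration and do not appear elsewhere in the notebook.

```python
import numpy as np

vec = np.array([1, 2, 3])
mat = np.array([[1, 2, 3],
                [4, 5, 6]])

first = vec[0]        # first element of the rank 1 array
last = vec[-1]        # negative indices count from the end
corner = mat[1, 2]    # row 1, column 2 of the rank 2 array
row = mat[0]          # a single index on a rank 2 array selects a whole row

print(first, last, corner, row)
```

A rank 2 array takes one index per dimension, separated by commas inside a single pair of brackets.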


Numpy also provides many functions to create arrays:

In [40]:
a = np.arange(1, 20, 3) # similar to range(start, stop, step)
print (a)

[ 1  4  7 10 13 16 19]


In [41]:
print(dir(a))

['T', '__abs__', '__add__', '__and__', '__array__', '__array_finalize__', '__array_function__', '__array_interface__', '__array_prepare__', '__array_priority__', '__array_struct__', '__array_ufunc__', '__array_wrap__', '__bool__', '__class__', '__complex__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__iand__', '__ifloordiv__', '__ilshift__', '__imatmul__', '__imod__', '__imul__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__lshift__', '__lt__', '__matmul__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift_

In [68]:
c = np.array(5)

print(a.shape, b.shape, c.shape) # a -> (7,)  b -> (2, 3)  c -> ()

(7,) (2, 3) ()


**Note**: The number of dimensions and items in an array is defined by its shape, which is a **tuple** of N non-negative integers that specify the size of each dimension. A one-element tuple needs a trailing **comma**: without it, `(7)` is just the integer 7, while `(7,)` is a tuple with the single element 7.
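
The role of the trailing comma can be checked directly in plain Python. This is a small sketch; the variable names and the value 7 are chosen only for illustration.

```python
import numpy as np

not_a_tuple = (7)    # parentheses alone only group an expression: this is the int 7
one_tuple = (7,)     # the trailing comma makes a one-element tuple

print(type(not_a_tuple).__name__, type(one_tuple).__name__)

# np.zeros accepts either form as a shape, and both give a rank 1 array
from_int = np.zeros(7)
from_tuple = np.zeros((7,))
print(from_int.shape, from_tuple.shape)
```

Numpy always reports the shape as a tuple, which is why a rank 1 array prints its shape with the comma.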

In [43]:
a = np.zeros((2, 5), int)  # Create an array of all zeros
print (a)

[[0 0 0 0 0]
 [0 0 0 0 0]]


In [44]:
b = np.ones((2, 5), int)  # Create an array of all ones
print (b)

[[1 1 1 1 1]
 [1 1 1 1 1]]


In [45]:
c = np.full(shape=(2, 5), fill_value=5, dtype=int) # Create a constant array with full
print (c) 

[[5 5 5 5 5]
 [5 5 5 5 5]]


In [46]:
c = np.full((2,5), fill_value=5, dtype=int)
print (c)

[[5 5 5 5 5]
 [5 5 5 5 5]]


In [47]:
d = np.identity(5)       # Create a 5x5 identity matrix
print (d)

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
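
`np.eye` is listed among the creation functions at the top of this notebook but has no cell of its own in this section. Below is a minimal sketch of how it relates to `np.identity`; the sizes are arbitrary choices for illustration.

```python
import numpy as np

square = np.eye(3)          # same result as np.identity(3)
rect = np.eye(2, 3)         # eye also allows a non-square shape (2 rows, 3 columns)
shifted = np.eye(3, k=1)    # k=1 places the ones on the diagonal above the main one

print(square, '\n')
print(rect, '\n')
print(shifted)
```

`np.identity` only builds square matrices, while `np.eye` takes separate row and column counts plus a diagonal offset `k`.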


In [48]:
np.random.seed(21) 
np.random.randint(low = 0, high = 10, size = (2, 2, 3)) 

array([[[9, 8, 4],
        [0, 0, 8]],

       [[3, 2, 1],
        [8, 9, 6]]])
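
The random-number cells in this notebook call `np.random.seed` before drawing values. The sketch below shows the likely reason, assuming the goal is reproducible output. The variable names are hypothetical.

```python
import numpy as np

np.random.seed(21)
first_draw = np.random.randint(low=0, high=10, size=(2, 3))

np.random.seed(21)   # resetting the seed restarts the same pseudo-random sequence
second_draw = np.random.randint(low=0, high=10, size=(2, 3))

print(np.array_equal(first_draw, second_draw))

# low is inclusive and high is exclusive, so every value lies in 0..9
print(first_draw.min() >= 0, first_draw.max() <= 9)
```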

In [49]:
np.random.seed(42) 
np.random.normal(size = (4, 5))  # about 95% of the values fall in [-1.96, 1.96]

array([[ 0.49671415, -0.1382643 ,  0.64768854,  1.52302986, -0.23415337],
       [-0.23413696,  1.57921282,  0.76743473, -0.46947439,  0.54256004],
       [-0.46341769, -0.46572975,  0.24196227, -1.91328024, -1.72491783],
       [-0.56228753, -1.01283112,  0.31424733, -0.90802408, -1.4123037 ]])
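
The 95% figure for the standard normal distribution can be checked empirically with a larger sample. This is a rough sketch: the seed and the sample size are arbitrary, and the printed fraction only approximates 0.95.

```python
import numpy as np

np.random.seed(0)                          # arbitrary seed chosen for this sketch
samples = np.random.normal(size=100_000)   # standard normal: mean 0, standard deviation 1

inside = np.abs(samples) <= 1.96           # boolean array marking values in [-1.96, 1.96]
fraction = inside.mean()                   # the mean of booleans is the fraction of True values

print(round(fraction, 3))
```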

In [50]:
np.random.seed(1) 
np.random.choice(np.arange(100), size = 5)

array([37, 12, 72,  9, 75])

In [51]:
np.random.seed(77) 
np.random.choice(np.arange(6), size = 8, replace = True) 

array([4, 4, 3, 5, 0, 0, 1, 5])
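
The `replace` argument controls whether the same value may be drawn more than once. A short sketch contrasting the two settings (the variable names are made up for illustration):

```python
import numpy as np

np.random.seed(77)
with_repeats = np.random.choice(np.arange(6), size=8, replace=True)    # values may repeat

np.random.seed(77)
no_repeats = np.random.choice(np.arange(6), size=6, replace=False)     # each value at most once

print(with_repeats)
print(no_repeats)
```

With `replace=False`, drawing all 6 values yields a random ordering of 0..5, and asking for more values than the population holds raises a `ValueError`.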

In [52]:
# random generates values in the half-open interval [0, 1)
np.random.seed(42) 
np.random.random(size = (4, 5))  

array([[0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864],
       [0.15599452, 0.05808361, 0.86617615, 0.60111501, 0.70807258],
       [0.02058449, 0.96990985, 0.83244264, 0.21233911, 0.18182497],
       [0.18340451, 0.30424224, 0.52475643, 0.43194502, 0.29122914]])

### 2. Array Attributes
- **ndim**: the number of dimensions
- **shape**: the size of the array along each dimension, returned as a tuple
- **dtype**: the data type of the array's elements
- **size**: the total number of elements in the array
- **itemsize**: the size of each array element in bytes
- **nbytes**: the total size of the array in bytes, equal to `size * itemsize`

In [53]:
# 1-d array
a = np.array([1,2,3])
print('#Dim:       ', a.ndim)
print('Shape:      ', a.shape)
print('Data Type:  ', a.dtype)
print('Total size: ', a.size)
print('Itemsize:   ', a.itemsize)
print('Nbytes:     ', a.nbytes)

#Dim:        1
Shape:       (3,)
Data Type:   int32
Total size:  3
Itemsize:    4
Nbytes:      12


In [54]:
np.random.seed(42) 
a = np.random.normal(size = (4, 5))
print (a)
print('#Dim:       ', a.ndim)
print('Shape:      ', a.shape)
print('Data Type:  ', a.dtype)
print('Total size: ', a.size)
print('Itemsize:   ', a.itemsize)
print('Nbytes:     ', a.nbytes)

[[ 0.49671415 -0.1382643   0.64768854  1.52302986 -0.23415337]
 [-0.23413696  1.57921282  0.76743473 -0.46947439  0.54256004]
 [-0.46341769 -0.46572975  0.24196227 -1.91328024 -1.72491783]
 [-0.56228753 -1.01283112  0.31424733 -0.90802408 -1.4123037 ]]
#Dim:        2
Shape:       (4, 5)
Data Type:   float64
Total size:  20
Itemsize:    8
Nbytes:      160
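
The relationships between these attributes can be verified directly. The sketch below fixes the dtypes explicitly so that the numbers do not depend on the platform's default integer type. The array names are hypothetical.

```python
import numpy as np

a32 = np.zeros((4, 5), dtype=np.int32)     # 4 bytes per element
a64 = np.zeros((4, 5), dtype=np.float64)   # 8 bytes per element

for arr in (a32, a64):
    # nbytes is always size * itemsize
    print(arr.dtype, arr.size, arr.itemsize, arr.nbytes,
          arr.nbytes == arr.size * arr.itemsize)

# size is the product of the entries of shape
print(a32.size == a32.shape[0] * a32.shape[1])
```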


### 3. Array reshaping
- reshape: change the shape of an array; the total number of elements must stay the same.
- flatten: collapse any n-dimensional array into 1D using the `.flatten` method
- transpose: swap rows and columns (more generally, permute the axes)

In [55]:
a = np.arange(30)
print(a)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29]


In [56]:
# Note: the original and the reshaped array must have the same total number of elements (30 here).
b = a.reshape((5,6))
print(b, '\n')

b = a.reshape((10,3))
print(b, '\n')

b = a.reshape((2,15))
print(b, '\n')

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]
 [24 25 26 27 28 29]] 

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]
 [12 13 14]
 [15 16 17]
 [18 19 20]
 [21 22 23]
 [24 25 26]
 [27 28 29]] 

[[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
 [15 16 17 18 19 20 21 22 23 24 25 26 27 28 29]] 



In [57]:
# What if the total number of elements does not match?
# 3 * 11 = 33 != 30, so uncommenting the lines below raises a ValueError
# b = a.reshape((3, 11))
# print(b, '\n')
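
To see the failure without stopping the notebook, the mismatched reshape can be wrapped in a `try`/`except`. This is a minimal sketch; `caught` and `message` are names introduced here for illustration.

```python
import numpy as np

a = np.arange(30)
caught = False
try:
    a.reshape((3, 11))           # 3 * 11 = 33, but a has only 30 elements
except ValueError as err:
    caught = True
    message = str(err)
    print('ValueError:', message)
```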

In [73]:
# One dimension of the shape can be -1; numpy infers it from the array size and the other dimensions
a.reshape(3, -1)

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

In [59]:
a.reshape(-1) # flattened

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])

In [60]:
a.reshape((-1,2,5)) # equivalent to (3,2,5)

array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9]],

       [[10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]])

In [61]:
# ndarray.flatten(order):

# 'C' means to flatten in row-major (C-style) order.
# 'F' means to flatten in column-major (Fortran-style) order.
f = np.arange(20).reshape(5,-1)
print(f, '\n')
print(f.flatten('C'), '\n') #order C
print(f.flatten('F'), '\n') #order F

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]] 

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19] 

[ 0  4  8 12 16  1  5  9 13 17  2  6 10 14 18  3  7 11 15 19] 
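
Both `flatten` and `reshape(-1)` produce a 1D result, but they differ in whether the data is copied. The sketch below uses a small contiguous array; the names `copied` and `viewed` are made up for illustration.

```python
import numpy as np

f = np.arange(6).reshape(2, 3)

copied = f.flatten()      # flatten always returns a new copy
copied[0] = 99            # so this does not touch f
print(f[0, 0])

viewed = f.reshape(-1)    # for a contiguous array like f, reshape returns a view
viewed[0] = 99            # so this writes through to f
print(f[0, 0])

print(np.shares_memory(f, copied), np.shares_memory(f, viewed))
```

Use `flatten` when the result will be modified independently of the original array.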



In [62]:
# 2-d array transpose
t = np.arange(30).reshape(5,-1)
print('original array:\n', t, '\n')
print('transposed:\n', t.T) #transpose
print()

original array:
 [[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]
 [24 25 26 27 28 29]] 

transposed:
 [[ 0  6 12 18 24]
 [ 1  7 13 19 25]
 [ 2  8 14 20 26]
 [ 3  9 15 21 27]
 [ 4 10 16 22 28]
 [ 5 11 17 23 29]]



In [63]:
arr = np.arange(16).reshape((2, 2, 4))
print('original array:\n', arr, '\n')
print('transposed:\n', arr.transpose((1, 0, 2))) #transpose
print()


original array:
 [[[ 0  1  2  3]
  [ 4  5  6  7]]

 [[ 8  9 10 11]
  [12 13 14 15]]] 

transposed:
 [[[ 0  1  2  3]
  [ 8  9 10 11]]

 [[ 4  5  6  7]
  [12 13 14 15]]]



In [64]:
arr = np.arange(16).reshape((2, 2, 4))
print('original array:\n', arr, '\n')
print('transposed:\n', arr.transpose((2, 1, 0))) #transpose
print()


original array:
 [[[ 0  1  2  3]
  [ 4  5  6  7]]

 [[ 8  9 10 11]
  [12 13 14 15]]] 

transposed:
 [[[ 0  8]
  [ 4 12]]

 [[ 1  9]
  [ 5 13]]

 [[ 2 10]
  [ 6 14]]

 [[ 3 11]
  [ 7 15]]]
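
The tuple passed to `transpose` says which original axis ends up in each position of the result. A short sketch that checks the shapes and a single element (the names `swapped` and `reversed_axes` are hypothetical):

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))
swapped = arr.transpose((1, 0, 2))         # swap axes 0 and 1, keep axis 2
reversed_axes = arr.transpose((2, 1, 0))   # reverse the order of all axes

print(arr.shape, swapped.shape, reversed_axes.shape)

# The indices are permuted the same way as the axes
print(swapped[1, 0, 3] == arr[0, 1, 3])
print(reversed_axes[3, 1, 0] == arr[0, 1, 3])

# .T is shorthand for reversing all axes
print(np.array_equal(reversed_axes, arr.T))
```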



### 4. Datatypes

Every numpy array is a grid of elements of the same type. Numpy provides a large set of numeric datatypes that you can use to construct arrays. Numpy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype. Here is an example:

In [65]:
w = np.array([True, False])           # Let numpy choose the datatype
x = np.array([1, 2])                  # Let numpy choose the datatype
y = np.array([1.0, 2.0])              # Let numpy choose the datatype
z = np.array([1, 2], dtype=np.int64)  # Force a particular datatype

print (w.dtype, x.dtype, y.dtype, z.dtype)

bool int32 float64 int64
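
The default integer type that numpy picks can differ between platforms and numpy versions, so the `int32` printed for `x` may appear as `int64` on another machine. Passing `dtype` explicitly removes that ambiguity. The sketch below shows the optional `dtype` argument on a few creation functions, plus `astype` for converting an existing array. The variable names are chosen for illustration.

```python
import numpy as np

zeros_f32 = np.zeros((2, 2), dtype=np.float32)   # 4-byte floats instead of the default float64
steps_f64 = np.arange(3, dtype=np.float64)       # float array built from integer arguments
as_int = np.array([1.7, 2.2]).astype(np.int64)   # astype converts a copy; fractions are truncated

print(zeros_f32.dtype, steps_f64.dtype, as_int.dtype)
print(as_int)
```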
