# DSC4001-01 Exercise 04

**This exercise notebook will go through the data types in Python:**


* NumPy Arrays
* NumPy Array Operations


NumPy (sometimes written Numpy) is a linear algebra library for Python. It is central to data science with Python because almost all of the libraries in the PyData ecosystem rely on NumPy as one of their main building blocks.

Once you've installed NumPy you can import it as a library.

NumPy has many built-in functions and capabilities. We will focus on some of the most important aspects of NumPy: arrays, number generation, ... 

In [2]:
import numpy as np
np.random.seed(0)

## NumPy Arrays

### Creating NumPy Arrays

We can create an array (or matrix) by directly converting a list (or list of lists).

In [5]:
my_list = [1,4,2,5,3]
my_list
print(my_list)

[1, 4, 2, 5, 3]


In [4]:
my_arr = np.array(my_list)
my_arr

array([1, 4, 2, 5, 3])

In [6]:
my_matrix = [[1,2,3],[4,5,6],[7,8,9]]
print(my_matrix)

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]


In [7]:
np.array(my_matrix)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [8]:
my_arr = np.array(my_list, dtype='float32')
my_arr

array([1., 4., 2., 5., 3.], dtype=float32)

There are lots of built-in ways to generate Arrays:

* ``arange``: return evenly spaced values (mostly integers) within a given interval
* ``zeros`` and ``ones``: generate arrays of zeros or ones
* ``linspace``: return evenly spaced numbers (mostly floats) over a specified interval
* ``eye``: create an identity matrix

In [9]:
print(np.arange(0,10))
print(np.arange(0,11,2))

[0 1 2 3 4 5 6 7 8 9]
[ 0  2  4  6  8 10]


In [10]:
print(np.zeros(3))
print(np.zeros((3,4)))

[0. 0. 0.]
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


In [11]:
print(np.ones(3))
print(np.ones((3,2,2)))

[1. 1. 1.]
[[[1. 1.]
  [1. 1.]]

 [[1. 1.]
  [1. 1.]]

 [[1. 1.]
  [1. 1.]]]


In [12]:
print(np.linspace(0,10,3))
print(np.linspace(0,10,7))

[ 0.  5. 10.]
[ 0.          1.66666667  3.33333333  5.          6.66666667  8.33333333
 10.        ]
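
The difference between ``arange`` and ``linspace`` is easy to miss, so here is a minimal sketch of it. The variable names ``stepped`` and ``counted`` are made up for this illustration: ``arange`` is given a step and leaves out the stop value, while ``linspace`` is given a count and includes the stop value by default.

```python
import numpy as np

# arange takes a step and excludes the stop value
stepped = np.arange(0, 10, 2)

# linspace takes a number of points and includes the stop value by default
counted = np.linspace(0, 10, 6)

print(stepped)   # [0 2 4 6 8]
print(counted)   # [ 0.  2.  4.  6.  8. 10.]
```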


In [13]:
print(np.eye(4))

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


NumPy also has lots of ways to create random number arrays:

* ``rand``: create an array of the given shape with random samples from a uniform distribution over $[0,1)$
* ``randn``: create an array of the given shape with random samples from the standard normal distribution
* ``randint``: return random integers from ``low`` (inclusive) to ``high`` (exclusive)



In [14]:
print(np.random.rand(2))
print(np.random.rand(3,4))

[0.5488135  0.71518937]
[[0.60276338 0.54488318 0.4236548  0.64589411]
 [0.43758721 0.891773   0.96366276 0.38344152]
 [0.79172504 0.52889492 0.56804456 0.92559664]]


In [15]:
print(np.random.randn(2))  # samples follow the standard normal distribution
print(np.random.randn(3,4))

[0.44386323 0.33367433]
[[ 1.49407907 -0.20515826  0.3130677  -0.85409574]
 [-2.55298982  0.6536186   0.8644362  -0.74216502]
 [ 2.26975462 -1.45436567  0.04575852 -0.18718385]]


In [16]:
print(np.random.randint(2, size=5))  # lower bound is inclusive, upper bound is exclusive (draws randomly from 0 and 1)
print(np.random.randint(1, size=5))
print(np.random.randint(1, 100, 10))  # draw 10 integers from 1 to 99

[0 0 0 1 1]
[0 0 0 0 0]
[ 5 43 59 32  2 66 42 58 36 12]


In [17]:
# 2x4 array of ints between 0 and 4, inclusive:
print(np.random.randint(5, size=(2,4)))

# 10x3 array with 3 different upper bounds:
print(np.random.randint(1, [3,5,10], size=(10,3)))  # values in each column stay below 3, 5, and 10 respectively

[[2 3 0 3]
 [4 1 2 4]]
[[2 1 7]
 [1 4 5]
 [2 1 5]
 [1 1 5]
 [2 3 8]
 [2 2 6]
 [2 1 2]
 [2 2 4]
 [1 4 6]
 [1 1 2]]
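
The notebook calls ``np.random.seed(0)`` at the top without showing what it does. As a small sketch (the seed value 42 and the names ``first`` and ``second`` are arbitrary choices for this illustration), resetting the seed restarts the same pseudo-random sequence, which is what makes the random outputs reproducible when the cells are run in order:

```python
import numpy as np

np.random.seed(42)            # 42 is an arbitrary seed chosen for this sketch
first = np.random.rand(3)

np.random.seed(42)            # resetting the seed restarts the same sequence
second = np.random.rand(3)

print(np.array_equal(first, second))   # True
```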


### Array Attributes and Methods

Let's play with some useful attributes and methods.

In [23]:
arr = np.random.randint(10, size=(3,4,2))
arr

array([[[8, 4],
        [6, 5],
        [8, 2],
        [3, 9]],

       [[7, 5],
        [3, 4],
        [5, 3],
        [3, 7]],

       [[9, 9],
        [9, 7],
        [3, 2],
        [3, 9]]])

In [19]:
print('arr ndim: ', arr.ndim)
print('arr shape: ', arr.shape)
print('arr size: ', arr.size)
print('arr dtype: ', arr.dtype)
print('arr itemsize: ', arr.itemsize, 'bytes')
print('arr nbytes: ', arr.nbytes, 'bytes')

arr ndim:  3
arr shape:  (3, 4, 2)
arr size:  24
arr dtype:  int64
arr itemsize:  8 bytes
arr nbytes:  192 bytes
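
These attributes are related to one another. The sketch below checks the relationships on a stand-in array named ``demo`` (a hypothetical name) with the same shape as ``arr``; the dtype is fixed to ``int64`` here because the default integer type can differ by platform.

```python
import numpy as np

demo = np.zeros((3, 4, 2), dtype=np.int64)   # same shape as arr above; dtype fixed for this sketch

print(demo.ndim == len(demo.shape))               # True: ndim is the number of axes
print(demo.size == 3 * 4 * 2)                     # True: 24 elements in total
print(demo.nbytes == demo.size * demo.itemsize)   # True: 24 * 8 = 192 bytes
```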


Reshape 

* ``reshape``: return an array containing the same data with a new shape


In [24]:
print(arr)
print(arr.shape)

[[[8 4]
  [6 5]
  [8 2]
  [3 9]]

 [[7 5]
  [3 4]
  [5 3]
  [3 7]]

 [[9 9]
  [9 7]
  [3 2]
  [3 9]]]
(3, 4, 2)


In [25]:
arr.reshape(4,3,2)

array([[[8, 4],
        [6, 5],
        [8, 2]],

       [[3, 9],
        [7, 5],
        [3, 4]],

       [[5, 3],
        [3, 7],
        [9, 9]],

       [[9, 7],
        [3, 2],
        [3, 9]]])

In [26]:
arr.reshape(2,2,6)

array([[[8, 4, 6, 5, 8, 2],
        [3, 9, 7, 5, 3, 4]],

       [[5, 3, 3, 7, 9, 9],
        [9, 7, 3, 2, 3, 9]]])
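
A new shape must hold exactly the same number of elements as the old one. The sketch below (with made-up names ``flat`` and ``inferred``) shows two related behaviours that may be handy: passing ``-1`` lets NumPy infer one dimension, and an incompatible shape raises a ``ValueError``.

```python
import numpy as np

flat = np.arange(24)

inferred = flat.reshape(4, -1)     # -1 asks NumPy to infer this dimension: 24 / 4 = 6
print(inferred.shape)              # (4, 6)

try:
    flat.reshape(5, 5)             # 5 * 5 = 25 elements, but flat has 24
except ValueError as err:
    print("reshape failed:", err)
```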

max, min, argmax, argmin


In [27]:
rand_arr = np.random.randint(0,50,10)
rand_arr

array([10, 27, 45,  7, 39, 21, 33, 44, 34, 34])

In [28]:
print(rand_arr.max())
print(rand_arr.min())

print(rand_arr.argmax())
print(rand_arr.argmin())

45
7
2
3
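
``argmax`` and ``argmin`` return positions, not values. A short sketch with hand-typed arrays (``values`` and ``grid`` are hypothetical names): indexing with the returned position gives back the max or min, and for a multi-dimensional array the position refers to the flattened array unless an ``axis`` is given, so ``np.unravel_index`` can convert it to row and column.

```python
import numpy as np

values = np.array([10, 27, 45, 7, 39])

print(values[values.argmax()] == values.max())   # True
print(values[values.argmin()] == values.min())   # True

grid = np.array([[3, 9, 1],
                 [8, 2, 7]])
flat_index = grid.argmax()                           # position in the flattened array
row, col = np.unravel_index(flat_index, grid.shape)  # convert back to (row, column)
print(flat_index, row, col)                          # 1 0 1
```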


## Indexing and Slicing

* ``arr[1st dim index, 2nd dim index, ...]``
* Remember that index starts from ``0`` ...
* ... and negative indexing starts from ``-1`` (the last element)
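
As a quick illustration of the rules above, here is a sketch on a small hand-built array (``grid`` is a made-up name) that mixes zero-based and negative indices:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(grid[0, 0])      # 0: first row, first column
print(grid[-1, -1])    # 11: last row, last column
print(grid[-1])        # [ 8  9 10 11]: the last row
print(grid[:, -2:])    # the last two columns of every row
```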

In [69]:
arr = np.random.randint(10, size=(3,4,2))
arr

array([[[2, 1],
        [1, 2],
        [1, 4],
        [2, 5]],

       [[5, 5],
        [2, 5],
        [7, 7],
        [6, 1]],

       [[6, 7],
        [2, 3],
        [1, 9],
        [5, 9]]])

In [30]:
# Get a value at an index
print(arr[0,0,0])
print(arr[0,1,1])
print(arr[2,2,1])

2
2
9


In [51]:
print(arr[:1, :2, 0])

[[2 1]]


In [54]:
print(arr[:2, ::2,0 ])
arr[:2, ::2,0 ].shape

[[2 1]
 [5 7]]


(2, 2)

In [55]:
print(arr[:2, ::2,:1 ])
arr[:2, ::2,:1 ].shape

[[[2]
  [1]]

 [[5]
  [7]]]


(2, 2, 1)

In [71]:
print(arr[:2,0,0])

[2 5]


In [42]:
print(arr[1:,:2,:1])

[[[5]
  [2]]

 [[6]
  [2]]]


In [75]:
a=[[[2, 1],
        [1, 2],
        [1, 4],
        [2, 5]],

       [[5, 5],
        [2, 5],
        [7, 7],
        [6, 1]],

       [[6, 7],
        [2, 3],
        [1, 9],
        [5, 9]]]

print(a)


[[[2, 1], [1, 2], [1, 4], [2, 5]], [[5, 5], [2, 5], [7, 7], [6, 1]], [[6, 7], [2, 3], [1, 9], [5, 9]]]


In [82]:
print(a[:2][0][0])

[2, 1]
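
The list result differs from ``arr[:2,0,0]`` because chained list indexing works one step at a time. The sketch below uses a smaller hand-typed example (``nested`` and ``as_array`` are hypothetical names) to contrast the two behaviours:

```python
import numpy as np

nested = [[[2, 1], [1, 2]],
          [[5, 5], [2, 5]]]
as_array = np.array(nested)

# List: nested[:2] is still a list of blocks, so [0][0] is the first row of the first block
print(nested[:2][0][0])     # [2, 1]

# Array: slice the first axis, then pick index 0 on each of the other two axes
print(as_array[:2, 0, 0])   # [2 5]

# A plain list rejects comma-separated (tuple) indices altogether
try:
    nested[:2, 0, 0]
except TypeError as err:
    print("list indexing failed:", err)
```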


``Broadcasting``: NumPy arrays differ from a normal Python list in their ability to broadcast: assigning a single value to a slice sets every element of that slice.

``view``: Array slices return views rather than copies of the array data. If we modify a subarray, the original array changes too. Returning views avoids copying data, which saves memory.

``copy``: To get an independent copy of the data, use ``.copy()``.
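
One way to check whether two arrays are backed by the same data is ``np.shares_memory``. The following is a minimal sketch on a small hand-built array (``base``, ``view``, and ``clone`` are names invented for this illustration):

```python
import numpy as np

base = np.arange(6).reshape(2, 3)

view = base[0]           # basic slicing returns a view
clone = base[0].copy()   # .copy() allocates new memory

print(np.shares_memory(base, view))    # True
print(np.shares_memory(base, clone))   # False

view[0] = 99             # writes through to base
clone[1] = -1            # leaves base untouched
print(base[0])           # [99  1  2]
```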


In [56]:
sub_arr = arr[1]
print(sub_arr)

[[5 5]
 [2 5]
 [7 7]
 [6 1]]


In [57]:
sub_arr[:,1] = 5
print(sub_arr)

[[5 5]
 [2 5]
 [7 5]
 [6 5]]


In [58]:
print(arr)

[[[2 1]
  [1 2]
  [1 4]
  [2 5]]

 [[5 5]
  [2 5]
  [7 5]
  [6 5]]

 [[6 7]
  [2 3]
  [1 9]
  [5 9]]]


In [59]:
arr_copy = arr.copy()
arr_copy

array([[[2, 1],
        [1, 2],
        [1, 4],
        [2, 5]],

       [[5, 5],
        [2, 5],
        [7, 5],
        [6, 5]],

       [[6, 7],
        [2, 3],
        [1, 9],
        [5, 9]]])

In [60]:
arr_copy[1,:,1] = 100
print(arr_copy)
print(arr)

[[[  2   1]
  [  1   2]
  [  1   4]
  [  2   5]]

 [[  5 100]
  [  2 100]
  [  7 100]
  [  6 100]]

 [[  6   7]
  [  2   3]
  [  1   9]
  [  5   9]]]
[[[2 1]
  [1 2]
  [1 4]
  [2 5]]

 [[5 5]
  [2 5]
  [7 5]
  [6 5]]

 [[6 7]
  [2 3]
  [1 9]
  [5 9]]]


``Selection``: put a boolean array (the result of a comparison) inside the brackets to select only the elements where it is ``True``.

In [61]:
arr = np.arange(1,11)
arr 

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [62]:
bool_arr = arr > 4
bool_arr

array([False, False, False, False,  True,  True,  True,  True,  True,
        True])

In [63]:
arr[bool_arr]

array([ 5,  6,  7,  8,  9, 10])

In [64]:
x = 7
arr[arr < x]

array([1, 2, 3, 4, 5, 6])
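
Conditions can also be combined. The sketch below is an illustration rather than part of the original exercise (``middle`` and ``edges`` are made-up names): it uses the element-wise operators ``&`` (and) and ``|`` (or), with each comparison wrapped in parentheses because these operators bind more tightly than comparisons.

```python
import numpy as np

arr = np.arange(1, 11)

# & is element-wise "and", | is element-wise "or"; parenthesize each comparison
middle = arr[(arr > 3) & (arr < 8)]
edges = arr[(arr <= 2) | (arr >= 9)]

print(middle)   # [4 5 6 7]
print(edges)    # [ 1  2  9 10]
```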

## NumPy Operations


### Arithmetic operations

We can easily perform element-wise arithmetic between arrays:

* ``+`` or ``np.add``: addition
* ``-`` or ``np.subtract``: subtraction
* ``*`` or ``np.multiply``: multiplication
* ``/`` or ``np.divide``: division
* ``**`` or ``np.power``: exponentiation
* ``%`` or ``np.mod``: modulus / remainder

In [76]:
arr = np.arange(0, 10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [77]:
arr + arr

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [78]:
arr - arr

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [79]:
arr * arr

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

In [80]:
# 0 / 0 gives ``nan`` (a nonzero value divided by zero would give ``inf``)
arr / arr

  


array([nan,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
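
Division by zero does not stop the program; NumPy emits a ``RuntimeWarning`` and fills in ``nan`` or ``inf``. As a hedged sketch of one way to handle this (``ratio`` and ``inverse`` are names chosen for the illustration), ``np.errstate`` can silence the warnings for a block of code, and ``np.isnan`` / ``np.isinf`` locate the affected entries afterwards:

```python
import numpy as np

arr = np.arange(0, 4)

# Silence the divide-by-zero and invalid-value warnings inside this block only
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = arr / arr        # 0 / 0 -> nan in the first position
    inverse = 1 / arr        # 1 / 0 -> inf in the first position

print(np.isnan(ratio))       # True only where the result is nan
print(np.isinf(inverse))     # True only where the result is inf
```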

In [81]:
arr ** 3

array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])

Broadcasting

In [83]:
arr_1 = np.arange(3)
print(arr_1)

arr_1 + 5

[0 1 2]


array([5, 6, 7])

In [84]:
arr_2 = np.ones((3,3))
print(arr_1)
print(arr_2)

arr_1 + arr_2

[0 1 2]
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

In [85]:
print(arr_1)
print(arr_1.reshape(3,1))

arr_1 + arr_1.reshape((3,1))

[0 1 2]
[[0]
 [1]
 [2]]


array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])
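
The three cases above follow one rule: shapes are compared from the rightmost dimension, and two dimensions are compatible when they are equal or one of them is 1. A small sketch under that rule (``row`` and ``col`` are hypothetical names), including a case that fails:

```python
import numpy as np

row = np.arange(3)                  # shape (3,)
col = np.arange(3).reshape(3, 1)    # shape (3, 1)

# (3,) and (3, 1) are compatible: the size-1 dimensions are stretched to 3
print((row + col).shape)            # (3, 3)

try:
    np.arange(3) + np.arange(4)     # (3,) vs (4,): sizes differ and neither is 1
except ValueError as err:
    print("broadcast failed:", err)
```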

### Universal Array Functions

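NumPy comes with many universal array functions (ufuncs) such as ``np.sqrt`` and ``np.exp``, which apply a mathematical operation to every element of an array, and aggregate functions such as ``np.sum`` and ``np.mean``, which reduce an array to a summary value.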

* ``np.sum``: compute sum of elements
* ``np.prod``: compute product of elements
* ``np.mean``: compute mean of elements
* ``np.std``: compute standard deviation
* ``np.var``: compute variance
* ``np.median``: compute median of elements
* ``np.sqrt``: compute square root
* ``np.exp``: exponentiation
* ``np.log``: natural logarithm
* ``np.abs``: absolute value


In [86]:
arr = np.arange(0, 10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [87]:
np.sum(arr)

45

In [88]:
np.prod(arr[1:4])

6

In [89]:
print(np.mean(arr))
print(np.std(arr))
print(np.var(arr))

4.5
2.8722813232690143
8.25


In [90]:
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [91]:
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

In [92]:
print(np.log(arr))
print(np.log2(arr))

[      -inf 0.         0.69314718 1.09861229 1.38629436 1.60943791
 1.79175947 1.94591015 2.07944154 2.19722458]
[      -inf 0.         1.         1.5849625  2.         2.32192809
 2.5849625  2.80735492 3.         3.169925  ]


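We can specify the ``axis`` along which the aggregate is computed. Note that the axis index also starts from ``0``: for a 2-D array, ``axis=0`` aggregates down each column and ``axis=1`` aggregates across each row.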

In [93]:
arr = np.random.randint(1,10,size=(3,4))
arr

array([[3, 1, 2, 1],
       [7, 1, 5, 9],
       [5, 4, 4, 9]])

In [94]:
print(arr.sum())

51


In [95]:
print(arr.sum(axis=0))
print(arr.sum(axis=1))

[15  6 11 19]
[ 7 22 22]


In [96]:
print(arr.mean())
print(arr.mean(axis=0))
print(arr.mean(axis=1))

4.25
[5.         2.         3.66666667 6.33333333]
[1.75 5.5  5.5 ]


In [97]:
print(arr.max())
print(arr.max(axis=0))
print(arr.max(axis=1))

9
[7 4 5 9]
[3 9 9]
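
A useful way to read ``axis`` is "the axis that disappears". The sketch below retypes the 3x4 example by hand and looks at the result shapes; the ``keepdims=True`` line is an extra illustration not used elsewhere in this notebook, showing how to keep the collapsed axis with length 1.

```python
import numpy as np

arr = np.array([[3, 1, 2, 1],
                [7, 1, 5, 9],
                [5, 4, 4, 9]])   # the 3x4 example above, typed in by hand

col_sums = arr.sum(axis=0)   # collapses the 3 rows: one value per column
row_sums = arr.sum(axis=1)   # collapses the 4 columns: one value per row

print(col_sums.shape, row_sums.shape)          # (4,) (3,)
print(arr.sum(axis=1, keepdims=True).shape)    # (3, 1)
```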


Comparison operators

* ``==`` or ``np.equal``: equal
* ``!=`` or ``np.not_equal``: not equal
* ``<`` or ``np.less``: less than
* ``<=`` or ``np.less_equal``: less than or equal
* ``>`` or ``np.greater``: greater than
* ``>=`` or ``np.greater_equal``: greater than or equal

In [98]:
arr = np.arange(0, 10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [99]:
np.less(arr, 5)

array([ True,  True,  True,  True,  True, False, False, False, False,
       False])

In [100]:
np.less_equal(arr, 5)

array([ True,  True,  True,  True,  True,  True, False, False, False,
       False])
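
The operator and function forms give the same boolean array. As a short sketch (``mask`` is a made-up name), the result can also be summarized, since ``True`` counts as 1 when summed:

```python
import numpy as np

arr = np.arange(0, 10)

mask = arr < 5
print(np.array_equal(mask, np.less(arr, 5)))   # True: both forms agree

print(mask.sum())                # 5: number of elements below 5
print(mask.any(), mask.all())    # True False
```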

Sorting arrays:

* ``np.sort``: return a sorted version of the array
* ``np.argsort``: return the indices of the sorted elements

In [101]:
arr = np.random.randint(0,50,10)
arr

array([27, 29, 46, 23, 32, 19,  8,  7, 23, 13])

In [102]:
arr.sort()
arr

array([ 7,  8, 13, 19, 23, 23, 27, 29, 32, 46])

In [103]:
print(np.sort(arr))  # np.sort returns a sorted copy; arr itself is not modified
print(arr)

[ 7  8 13 19 23 23 27 29 32 46]
[ 7  8 13 19 23 23 27 29 32 46]
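
Because ``arr`` was already sorted in place by ``arr.sort()``, the cell above cannot show the difference between the two forms, and ``np.argsort`` has not been demonstrated yet. The following sketch uses a small hand-typed array (``scores``, ``sorted_copy``, and ``order`` are hypothetical names) to show all three:

```python
import numpy as np

scores = np.array([27, 7, 46, 13])

sorted_copy = np.sort(scores)     # new array; scores keeps its original order
order = np.argsort(scores)        # indices that would sort scores

print(sorted_copy)      # [ 7 13 27 46]
print(scores)           # [27  7 46 13]
print(order)            # [1 3 0 2]
print(scores[order])    # [ 7 13 27 46]

scores.sort()           # sorts in place and returns None
print(scores)           # [ 7 13 27 46]
```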


In [104]:
arr = np.random.randint(0,50,size=(3,4))
arr

array([[17,  0, 11, 28],
       [36, 25, 32, 42],
       [14, 22, 28, 20]])

In [105]:
# default axis = -1
print(np.sort(arr))
print(np.sort(arr, axis=0))
print(np.sort(arr, axis=1))

[[ 0 11 17 28]
 [25 32 36 42]
 [14 20 22 28]]
[[14  0 11 20]
 [17 22 28 28]
 [36 25 32 42]]
[[ 0 11 17 28]
 [25 32 36 42]
 [14 20 22 28]]


## Quiz

Now we've learned about NumPy Arrays. Let's test your knowledge.

1. Create an array of 10 zeros

In [142]:
print(np.zeros(10))

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


2. Create an array of 10 ones

In [143]:
print(np.ones(10))

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]


3. Create an array of the integers from 10 to 25

In [169]:
np.arange(10,26)

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25])

4. Create an array of 5 values evenly spaced between 0 and 1 (inclusive): Output should be ``array([0., 0.25, 0.5, 0.75, 1.])``

In [145]:
print(np.linspace(0,1,5))

[0.   0.25 0.5  0.75 1.  ]


5. Create a 4 x 3 matrix with integers ranging from 1 to 12

hint: use arange and reshape

In [171]:
print(np.arange(1,13).reshape(4,3))

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


6. Generate a random number between 0 and 1

In [147]:
print(np.random.rand(1))

[0.18984791]


7. Create a 5x5 matrix with 25 random numbers sampled from a standard normal distribution

In [148]:
print(np.random.randn(5,5))

[[ 0.46756761 -1.54079701  0.06326199  0.15650654  0.23218104]
 [-0.59731607 -0.23792173 -1.42406091 -0.49331988 -0.54286148]
 [ 0.41605005 -1.15618243  0.7811981   1.49448454 -2.06998503]
 [ 0.42625873  0.67690804 -0.63743703 -0.39727181 -0.13288058]
 [-0.29779088 -0.30901297 -1.67600381  1.15233156  1.07961859]]


8. Create the following matrix: 

``array([[0.01, 0.02, 0.03, 0.04, 0.05], [0.06, 0.07, 0.08, 0.09, 0.1], [0.11, 0.12, 0.13, 0.14, 0.15], [0.16, 0.17, 0.18, 0.19, 0.2]])``

hint: first create a matrix with integers ranging from 1 to 20, then divide it by 100


In [175]:
np.arange(1,21).reshape(4,5)/100

array([[0.01, 0.02, 0.03, 0.04, 0.05],
       [0.06, 0.07, 0.08, 0.09, 0.1 ],
       [0.11, 0.12, 0.13, 0.14, 0.15],
       [0.16, 0.17, 0.18, 0.19, 0.2 ]])


You will be given a matrix and asked to reproduce the resulting outputs by using **indexing, slicing, and selection**.

In [176]:
arr=np.linspace(0.01,0.2,20)
print(arr.reshape(4,5))


[[0.01 0.02 0.03 0.04 0.05]
 [0.06 0.07 0.08 0.09 0.1 ]
 [0.11 0.12 0.13 0.14 0.15]
 [0.16 0.17 0.18 0.19 0.2 ]]


In [150]:
mat = np.arange(1,17).reshape(4,4)
mat

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

9. Output should be 
``array([[5,6,7,8],[9,10,11,12]])``

In [178]:
mat[1:3]

array([[ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

10. Output should be ``10``

In [152]:
print(mat[2,1])

10


11. Output should be ``array([[3], [7], [11]])``

Note that the output is not ``array([3, 7, 11])``: ``3``, ``7``, and ``11`` should each be in their own brackets, i.e., the output shape should be ``(3,1)``, not ``(3,)``.

In [181]:
print(mat[:3,2:3])

[[ 3]
 [ 7]
 [11]]


12. Output should be ``array([[6,7],[14,15]])``

In [159]:
print(mat[1::2,1:3])

[[ 6  7]
 [14 15]]


With new matrix:

In [155]:
rand_mat = np.array([[4, 19, 48, 4], [43, 13, 39, 36], [23, 6, 24, 44]])
rand_mat

array([[ 4, 19, 48,  4],
       [43, 13, 39, 36],
       [23,  6, 24, 44]])

13. Compute the sum of all values in rand_mat

In [162]:
rand_mat.sum()

303

14. Compute the standard deviation of the values in rand_mat

In [182]:
np.std(rand_mat)

15.700981922584758

15. Get the mean of all the columns in rand_mat

In [184]:
np.mean(rand_mat,axis=0)

array([23.33333333, 12.66666667, 37.        , 28.        ])

16. Get the max of each column in rand_mat and the row index where it occurs

In [186]:
# max
rand_mat.max(axis=0)

array([43, 19, 48, 44])

In [166]:
# argmax
rand_mat.argmax(axis=0)

array([1, 0, 0, 2])