## Chapter 2 : NumPy

### Overview: 
    2.1 NumPy: Numerical Python
    2.2 Array indexing
    2.3 Array slicing
    2.4 NumPy Universal Functions
    
### 2.1 NumPy : Numerical Python
    ▪ NumPy (Numerical Python) is a specialized package for handling numerical arrays
    ▪ together with Pandas, it forms the core of almost all data science tools

In [1]:
import numpy as np

In [2]:
np.__version__

'1.20.1'

In [6]:
# Python variables vs. NumPy arrays: 

    # ▪ Python variables are more than just their value
        #→ contain extra information about the type of the value
        
# Python lists can hold items of any data type.
        
L = [True, "2", 3.0, 4]

# let's check the type of each item in L

[type(item) for item in L]

[bool, str, float, int]

In [7]:
# Creating NumPy arrays: 

#    ▪ NumPy arrays can be created from (existing) lists
#      → Syntax: np.array([<list>]) (with optional dtype (data type) parameter)
#    ▪ arrays can be multidimensional (i.e., nested lists / list of lists)


np.array([1, 2, 3, 4])

array([1, 2, 3, 4])

In [8]:
np.array([1.0, 2, 3, 4]) # an array holds a single data type, so the integers are upcast to floats

array([1., 2., 3., 4.])

In [9]:
np.array([1, 2, 3, 4], dtype='float') # we can specify the data types. 

array([1., 2., 3., 4.])
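
The upcasting rule can be checked directly through the `dtype` attribute. The following is a small sketch; the variable names `ints`, `mixed`, and `texty` are only illustrative and do not appear elsewhere in this chapter.

```python
import numpy as np

ints = np.array([1, 2, 3])          # only integers -> integer dtype
mixed = np.array([1, 2.5, 3])       # integers + float -> everything becomes float
texty = np.array([1, "two", 3.0])   # numbers + string -> everything becomes a string

print(ints.dtype.kind)    # 'i' (integer; the exact width depends on the platform)
print(mixed.dtype)        # float64
print(texty.dtype.kind)   # 'U' (Unicode string)
```

The `kind` code is used for the integer case because the default integer width can differ between platforms.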

In [14]:
# nested lists result in multi-dimensional arrays: 
np.array([[1,2,3], [4,5,6], [7,8,9]])

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

### Routines to create arrays from scratch (selection)

    Command                   Meaning
    np.zeros                  array of zeros 
    np.ones                   array of ones
    np.full                   array filled with a given value 
    np.arange                 linear sequence with a fixed step
    np.linspace               fixed number of evenly spaced values
    np.random.random          uniformly distributed random values
    np.random.normal          normally distributed random values
    np.random.randint         random integer values

## Examples: 

In [17]:
# 2x3 array filled with ones

np.ones((2,3))  # two rows and three columns

array([[1., 1., 1.],
       [1., 1., 1.]])

In [18]:
# Create an array of five values evenly spaced between 0 and 1

np.linspace(0, 1, 5) # 5 values between 0 and 1, both endpoints included

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [20]:
# Array filled with a linear sequence, starting at 0, stopping before 20, stepping by 2
np.arange(0, 20, 2) # from 0 up to (but not including) 20, in steps of 2

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [22]:
# 2x2 array of uniformly distributed random values between 0 and 1
np.random.random((2, 2)) # the tuple (2, 2) is the shape: two rows, two columns

array([[0.93369614, 0.12061934],
       [0.77391497, 0.69068591]])

In [23]:
# 2x2 array of random integers in the interval [0, 10), not including 10
np.random.randint(0, 10, (2, 2))

array([[6, 6],
       [3, 4]])
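
The routines from the table that are not used in the examples above work the same way. This is a short sketch; the shapes and fill values are arbitrary choices for illustration.

```python
import numpy as np

zeros = np.zeros(5, dtype=int)            # 1D array of five integer zeros
filled = np.full((2, 3), 3.14)            # 2x3 array, every entry is 3.14
normal = np.random.normal(0, 1, (3, 3))   # 3x3 array, mean 0 and standard deviation 1

print(zeros)         # [0 0 0 0 0]
print(filled.shape)  # (2, 3)
print(normal.shape)  # (3, 3)
```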

### Exercise:

    In the following table, you see some facts about the Beatles.
        → Create a numpy array that contains the first name of each Beatle (call it firstname). 
        → Create a numpy array that contains the last name of each Beatle (call it lastname). 
        → Bonus: Create a numpy array that contains the information in the died column.
 

In [26]:
firstname = np.array(["John", "Paul", "George", "Ringo"])
firstname

array(['John', 'Paul', 'George', 'Ringo'], dtype='<U6')

In [28]:
lastname = np.array(["Lennon", "McCartney", "Harrison", "Starr"])
lastname

array(['Lennon', 'McCartney', 'Harrison', 'Starr'], dtype='<U9')

In [31]:
died = np.array([1980, "alive", 2001, "alive"])  # mixing integers and strings: NumPy stores everything as strings
died

# exercise is done!

array(['1980', 'alive', '2001', 'alive'], dtype='<U21')
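
Because an array has a single dtype, the years above end up as strings. If the years should stay numeric, one possible alternative (an assumption on my side, not part of the exercise) is to encode "alive" as `np.nan`. The names `died_str` and `died_num` are hypothetical.

```python
import numpy as np

died_str = np.array([1980, "alive", 2001, "alive"])   # mixed input -> string array
died_num = np.array([1980, np.nan, 2001, np.nan])     # assumption: NaN stands for "alive"

print(died_str.dtype.kind)   # 'U'
print(died_num.dtype)        # float64
print(np.isnan(died_num))    # [False  True False  True]
```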

### Basic array manipulations include ...
    ▪ Attributes of arrays: Determining the size, shape, memory consumption, and data types of arrays
    ▪ Indexing of arrays: Getting and setting the value of individual array elements 
    ▪ Slicing of arrays : Getting and setting smaller subarrays within a larger array
    
#### Most Useful NumPy Attributes: 
    ▪ ndim : number of dimensions
    ▪ shape : size of each dimension
    ▪ size : the total size of the array (i. e., number of elements)
    ▪ dtype : data type

In [67]:
np.random.seed(42)  # seed for reproducibility 

x1 = np.random.randint(10, size=6) # 1D array
x2 = np.random.randint(10, size=(3,4)) # 2D array
x3 = np.random.randint(10, size=(3,4,5)) # 3D array: three 4x5 blocks of random integers
x1, x2, x3

(array([6, 3, 7, 4, 6, 9]),
 array([[2, 6, 7, 4],
        [3, 7, 7, 2],
        [5, 4, 1, 7]]),
 array([[[5, 1, 4, 0, 9],
         [5, 8, 0, 9, 2],
         [6, 3, 8, 2, 4],
         [2, 6, 4, 8, 6]],
 
        [[1, 3, 8, 1, 9],
         [8, 9, 4, 1, 3],
         [6, 7, 2, 0, 3],
         [1, 7, 3, 1, 5]],
 
        [[5, 9, 3, 5, 1],
         [9, 1, 9, 3, 7],
         [6, 8, 7, 4, 1],
         [4, 7, 9, 8, 8]]]))

In [45]:
# to inspect the dimensions, shape, size, and data type of an array (x3.ndim alone gives the number of dimensions):

print("x3 ndim: ", x3.ndim)
print("x3 shape: ", x3.shape)
print("x3 size: ", x3.size)
print("x3 dtype: ", x3.dtype)

x3 ndim:  3
x3 shape:  (3, 4, 5)
x3 size:  60
x3 dtype:  int64
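
Memory consumption, mentioned in the list of attributes above, can be read from `itemsize` and `nbytes`. A minimal sketch, using a hypothetical array `a` with the same shape as `x3` and an explicit `int64` dtype so that the numbers are the same on every platform:

```python
import numpy as np

a = np.zeros((3, 4, 5), dtype=np.int64)

print(a.itemsize)   # 8   -> bytes per element
print(a.nbytes)     # 480 -> total bytes, equal to a.size * a.itemsize
```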


### 2.2: Array Indexing

    Indexing: Accessing single elements in arrays
        ▪ works like Python’s standard list indexing (Python is zero-indexed)
        ▪ specify desired index in square brackets (e. g., x[1] )
            → in multidimensional arrays, use comma-separated tuple of values (e. g., x[1,1] )

In [68]:
x1

array([6, 3, 7, 4, 6, 9])

In [48]:
x1[0] # first element (remember zero indexing!)

6

In [52]:
x1[-1], x1[-2], x1[-3] # index from end of array using negative values

(9, 6, 4)

In [53]:
x2

array([[2, 6, 7, 4],
       [3, 7, 7, 2],
       [5, 4, 1, 7]])

In [56]:
x2[0,0] # multi-dimensional indexing: element in the first row and first column

2

In [57]:
x2[2, -2] # third row, second-to-last column

1
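
The same index notation also sets values and extends to more than two dimensions. A small sketch with made-up arrays `grid` and `cube` (built with `np.arange` and `reshape`, so every value is predictable):

```python
import numpy as np

grid = np.array([[2, 6, 7, 4],
                 [3, 7, 7, 2],
                 [5, 4, 1, 7]])
grid[0, 0] = 12                 # set the element in the first row, first column
print(grid[0, 0])               # 12

cube = np.arange(24).reshape(2, 3, 4)   # 3D array holding 0..23
print(cube[0, 1, 2])            # 6  (block 0, row 1, column 2)
print(cube[1, 2, 3])            # 23 (last element)
```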

In [71]:
## Caveat when modifying values: 
    # NumPy arrays have a fixed type (unlike Python lists), so values can sometimes be silently truncated!

x1[0] = 3.1415   # this will be truncated, because the array has an integer dtype
x1
x1

array([3, 3, 7, 4, 6, 9])
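
One way to avoid the truncation is to convert the array to a float dtype first. This is a sketch with hypothetical names `ints` and `floats`; `astype` returns a new array and leaves the original unchanged.

```python
import numpy as np

ints = np.array([6, 3, 7])
ints[0] = 3.1415               # silently truncated to 3
print(ints)                    # [3 3 7]

floats = ints.astype(float)    # new array with a float dtype
floats[0] = 3.1415             # now the decimals are kept
print(floats[0])               # 3.1415
```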

### 2.3 : Array slicing


    Slicing: Accessing parts of an array
        ▪ to access whole parts of arrays (i. e., subarrays) use the colon ( : ) character 
        ▪ Syntax: x[start:stop:step]

In [74]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [76]:
x[0:5] # x[:5] does the same thing

array([0, 1, 2, 3, 4])

In [78]:
x[:5] # first five elements 

array([0, 1, 2, 3, 4])

In [79]:
x[5:] # all elements from index 5 onward

array([5, 6, 7, 8, 9])

In [80]:
x[4:7]  # middle subarray

array([4, 5, 6])

In [81]:
x[::2] # every other element (step of 2)

array([0, 2, 4, 6, 8])

In [82]:
x[::-1] # reversed array

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
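
Start, stop, and step can also be combined in a single slice. A few more cases as a sketch, using the same kind of array as above:

```python
import numpy as np

x = np.arange(10)

print(x[1:8:3])   # [1 4 7]  -> from index 1 up to (not including) 8, every third element
print(x[5::-2])   # [5 3 1]  -> backwards from index 5, every second element
print(x[-3:])     # [7 8 9]  -> the last three elements
```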

In [83]:
x2

array([[2, 6, 7, 4],
       [3, 7, 7, 2],
       [5, 4, 1, 7]])

In [84]:
x2[:2, :3] # first two rows, first three columns

array([[2, 6, 7],
       [3, 7, 7]])

In [85]:
x2[:3, ::2] # all rows, every other column

array([[2, 7],
       [3, 7],
       [5, 1]])

#### Indexing and slicing combined:

In [86]:
x2

array([[2, 6, 7, 4],
       [3, 7, 7, 2],
       [5, 4, 1, 7]])

In [89]:
print(x2[:, 0]) # combine indexing and slicing and print first column of x2

[2 3 5]


In [90]:
print(x2[:, 1]) # second column 

[6 7 4]


In [91]:
print(x2[0, :]) # first row of x2

[2 6 7 4]
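
A few more combinations, as a sketch on a hypothetical array `grid` that holds the same values as `x2`: a trailing `:` can be left out, negative indices select from the end, and negative steps reverse an axis.

```python
import numpy as np

grid = np.array([[2, 6, 7, 4],
                 [3, 7, 7, 2],
                 [5, 4, 1, 7]])

print(grid[0])            # [2 6 7 4] -> same as grid[0, :]
print(grid[:, -1])        # [4 2 7]   -> last column
print(grid[::-1, ::-1])   # rows and columns both reversed
```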


#### Array slicing: No-copy views

    Views vs. copies
    
    ▪ array slices return views rather than copies of the data 
        → different from Python lists (there, slices are copies)
    ▪ if subarrays are changed, the original array is changed as well!
        → useful when working with large datasets: you can work on pieces of data without the need to copy
          the whole data buffer
    ▪ array copies can be created with the .copy() method
        → Example: x2[0:2, 1:3].copy()

In [92]:
# print original array
print(x2)

[[2 6 7 4]
 [3 7 7 2]
 [5 4 1 7]]


In [93]:
# extract 2x2 subarray

x2_sub = x2[:2, :2]
x2_sub

array([[2, 6],
       [3, 7]])

In [95]:
# modify subarray

x2_sub[0,0] = 99
x2_sub

array([[99,  6],
       [ 3,  7]])

In [97]:
# inspect original array
x2 # the first element has been changed to 99 in the original array as well! 

# so we have to be really careful when changing elements of a subarray, because the original changes too. 

array([[99,  6,  7,  4],
       [ 3,  7,  7,  2],
       [ 5,  4,  1,  7]])
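
For comparison, here is a sketch of the `.copy()` variant next to a plain Python list. The names `grid`, `view`, `sub_copy`, `nums`, and `part` are only illustrative; `np.shares_memory` reports whether two arrays use the same data buffer.

```python
import numpy as np

grid = np.array([[2, 6, 7, 4],
                 [3, 7, 7, 2],
                 [5, 4, 1, 7]])

view = grid[:2, :2]              # slice -> view on the same data
sub_copy = grid[:2, :2].copy()   # explicit, independent copy

sub_copy[0, 0] = 99              # only the copy changes
print(grid[0, 0])                        # 2
print(np.shares_memory(grid, view))      # True
print(np.shares_memory(grid, sub_copy))  # False

nums = [0, 1, 2, 3]              # Python list: slices are copies
part = nums[:2]
part[0] = 99
print(nums[0])                   # 0
```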

### Exercise: 

     Use the two numpy arrays from the last exercise (firstname and lastname).
        → Print all first names!
        → Print the first element of the lastname object.
        → Print the last element of the lastname object – without explicitly using the last index as number.
          Hint: What happens if you put a negative number in square brackets to index a python object?
        → Print the second and third element of the lastname object.
 

In [103]:
print(firstname)
firstname[::]
lastname

['John' 'Paul' 'George' 'Ringo']


array(['Lennon', 'McCartney', 'Harrison', 'Starr'], dtype='<U9')

In [101]:
lastname[0]

'Lennon'

In [102]:
lastname[-1]

'Starr'

In [104]:
lastname[1:3] 

# the task is done!

array(['McCartney', 'Harrison'], dtype='<U9')

### 2.4 : NumPy Universal Functions (UFuncs)

    Using vectorized functions
    ▪ often it is useful to use vectorized code instead of explicit loops 
        → easier to read, often feels more natural, usually much faster
    ▪ NumPy provides universal functions for vectorization, almost always more efficient than loops

In [115]:
import numpy as np

np.random.seed(42)     # for reproducibility


def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output 

data = np.random.randint(1, 10, size=3)   # start at 1 so that no reciprocal divides by zero
data


%timeit compute_reciprocals(data)   # computation via loop
%timeit 1.0 / data                  # vectorized version 

# the vectorized version is much faster than the loop!

# conclusion: prefer the vectorized version / universal functions!

7.08 µs ± 512 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
980 ns ± 27 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
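
Outside of a notebook, where `%timeit` is not available, the comparison can be sketched with the standard `timeit` module. The array size and the number of repetitions below are arbitrary choices, and the measured times depend on the machine, so no expected timings are given.

```python
import timeit
import numpy as np

def compute_reciprocals(values):
    # same loop-based function as in the cell above
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

big = np.arange(1, 100_001)   # 100,000 values, none of them zero

loop_time = timeit.timeit(lambda: compute_reciprocals(big), number=3)
vec_time = timeit.timeit(lambda: 1.0 / big, number=3)
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")

# both versions should produce the same numbers
same = np.allclose(compute_reciprocals(big), 1.0 / big)
print(same)   # True
```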


#### Often-used universal functions: 


    Arithmetic operators are wrappers for universal functions
    
    
    Operator         Equivalent ufunc     Description
    
    +                   np.add              Addition (e.g., 1 + 1 = 2)
    -                   np.subtract         Subtraction (e.g., 3 - 2 = 1)
    -                   np.negative         Unary negation (e.g., -2)
    *                   np.multiply         Multiplication (e.g., 2 * 3 = 6)
    /                   np.divide           Division (e.g., 3 / 2 = 1.5)
    //                  np.floor_divide     Floor division (e.g., 3 // 2 = 1)
    **                  np.power            Exponentiation (e.g., 2 ** 3 = 8)
    %                   np.mod              Modulus/remainder (e.g., 9 % 4 = 1)
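
The table can be verified with a short sketch: each operator gives the same result as its ufunc. The array `x` is only an example.

```python
import numpy as np

x = np.arange(4)   # array([0, 1, 2, 3])

print(x + 5)    # [5 6 7 8]
print(-x)       # [ 0 -1 -2 -3]
print(x // 2)   # [0 0 1 1]
print(x ** 2)   # [0 1 4 9]
print(x % 2)    # [0 1 0 1]

# operator and ufunc give identical results
print(np.array_equal(x + 5, np.add(x, 5)))            # True
print(np.array_equal(x // 2, np.floor_divide(x, 2)))  # True
```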
    

    Other useful universal functions for data science
    
        ▪ np.absolute, np.sqrt (square root)
        ▪ np.sin, np.cos, np.tan (and their inverse counterparts np.arcsin, ...)
        ▪ np.exp (i. e., e^x), np.exp2 (i. e., 2^x), np.power (e. g., 3^x via np.power(3,x)) 
        ▪ np.log (ln), np.log2(x), np.log10(x)
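
A quick sketch of some of these functions on small, hand-picked inputs. `np.allclose` is used for the checks because floating-point results are only approximately exact.

```python
import numpy as np

print(np.absolute(np.array([-2, -1, 0, 1])))   # [2 1 0 1]
print(np.exp2(np.array([0, 1, 2, 3])))         # [1. 2. 4. 8.]
print(np.power(3, np.array([0, 1, 2])))        # [1 3 9]
print(np.log10(np.array([1, 10, 100])))        # [0. 1. 2.]

# sine and its inverse: arcsin(sin(pi/6)) gives back pi/6
angle = np.pi / 6
print(np.allclose(np.arcsin(np.sin(angle)), angle))   # True
```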
 
  
  

### Exercise: 

    → Given any numpy array x, what would the result of the following code be? 
        np.sqrt(np.exp(np.log(x ** 2)))
        
    → Which of the following calculations are equivalent to each other? What is different? 
        
        x = np.arange(1,4)
        y = np.arange(2,5)
        
        x+y 
        sum(x) + sum(y) 
        np.add(x,y) 
        x.sum() + y.sum()
 

In [118]:
z = np.array([2, 3, 4, 5])
z

array([2, 3, 4, 5])

In [121]:
z1 = np.sqrt(np.exp(np.log(z ** 2)))
z1  # the original values come back (as floats); in general the result is the absolute value of z

array([2., 3., 4., 5.])
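
With negative entries the expression does not return the input itself but its absolute value, because squaring removes the sign. A sketch with a made-up array `w` (zeros are left out here, since `np.log(0)` triggers a runtime warning):

```python
import numpy as np

w = np.array([-2.0, 3.0, -4.0])
result = np.sqrt(np.exp(np.log(w ** 2)))

# compare with the absolute value, allowing for floating-point rounding
print(np.allclose(result, np.abs(w)))   # True
```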

In [124]:
x = np.arange(1,4)
y = np.arange(2,5)
x,y

(array([1, 2, 3]), array([2, 3, 4]))

In [125]:
x + y # element-wise sum

array([3, 5, 7])

In [126]:
sum(x) + sum(y) # sum all elements of x, sum all elements of y, then add the two totals

15

In [127]:
np.add(x,y) # same as x + y

array([3, 5, 7])

In [129]:
x.sum() + y.sum() # same result as sum(x) + sum(y), using NumPy's own sum method

15

## NumPy Chapter is DONE!