# <h1><center>Week 1 Lab (NumPy)</center></h1>
## <h1><center>COSC 3337 Dr. Rizk</center></h1>

## Intro to NumPy

NumPy is the fundamental package for scientific computing with Python. It's used for working with arrays and contains functions for linear algebra, Fourier transforms, and matrices. In Python we have lists, which serve the purpose of arrays, so why do we bother learning NumPy in the first place? Well, NumPy arrays are much faster than traditional Python lists for numerical work and provide many supporting functions that make working with arrays easier. Part of why they're significantly faster is that the parts that require fast computation are written in C or C++.

Let's begin by importing NumPy. If for some reason you don't have NumPy installed, you will first have to go to your terminal (or Anaconda Prompt if on Windows) and enter the following:

    conda install numpy
    
Make sure you've already installed [Anaconda](https://www.anaconda.com/).

In [1]:
import numpy as np

### Creating Arrays and Common Methods

The first way of creating a NumPy array is by converting your existing Python list.

In [2]:
python_list = [1, 2, 3, 4, 5]
print(python_list)
print(type(python_list))

[1, 2, 3, 4, 5]
<class 'list'>


In [3]:
# create a Python list
python_list = [1,3,24,11,20]
print(python_list)
print(type(python_list))

[1, 3, 24, 11, 20]
<class 'list'>


In [4]:
# create numpy array
numpy_array = np.array(python_list)
print(numpy_array)
print(type(numpy_array))

[ 1  3 24 11 20]
<class 'numpy.ndarray'>


In [5]:
python_2d_array = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
print(python_2d_array)
print(type(python_2d_array))

[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
<class 'list'>


In [6]:
# create 2d python array
python_2d_array = [[5, 36, 47, 45, 14], [38, 23, 20, 11, 40], [31, 25, 4, 48, 27]]
print(python_2d_array)
print(type(python_2d_array))

[[5, 36, 47, 45, 14], [38, 23, 20, 11, 40], [31, 25, 4, 48, 27]]
<class 'list'>

In [None]:
numpy_2d_array = np.array(python_2d_array)
print(numpy_2d_array)
print(type(numpy_2d_array))

In [None]:
# create 2d numpy array
numpy_2d_array = np.array(python_2d_array)
print(numpy_2d_array)
print(type(numpy_2d_array))
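
Converting a list to an ndarray changes how operators behave, which is a large part of why the conversion is worth doing. The short sketch below uses made-up values to contrast the two; `tolist()` is shown as one way to go back to a plain list.

```python
import numpy as np

python_list = [1, 2, 3]
numpy_array = np.array(python_list)

repeated = python_list * 2        # list repetition: [1, 2, 3, 1, 2, 3]
doubled = numpy_array * 2         # element-wise multiplication: [2 4 6]
back_to_list = doubled.tolist()   # convert an ndarray back to a plain list

print(repeated)
print(doubled)
print(back_to_list)
```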

However, you are more likely to use some of NumPy's built in methods to generate ndarrays. Here we'll introduce you to a few of these built in methods.

***arange(start, stop, step)*** will return evenly spaced values within a given interval. The start value is included, the stop value is not, and the default step size is 1.

In [None]:
np.arange(0, 15)

In [None]:
# generate a numpy array of size 30 from 0-29
np.arange(0, 30)

In [None]:
np.arange(0, 15, 2)

In [None]:
# generate a numpy array containing multiples of 3 from 0 up to (but not including) 30
np.arange(0, 30, 3)

What if we wanted a 2d array instead? We can call ***reshape(rows, columns)*** on an existing NumPy array. Please note that the product of rows and columns must evaluate to the total number of elements in your current NumPy array.

In [None]:
np.arange(0, 15).reshape(3, 5)

In [None]:
# create a 10x10 numpy array containing 0-99
np.arange(0,100).reshape(10,10)
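
Here is a small sketch of the size rule mentioned above. It assumes a 12-element array chosen just for illustration: a matching shape works, passing -1 for one dimension asks NumPy to infer it, and a shape whose product doesn't match raises a ValueError.

```python
import numpy as np

values = np.arange(12)             # 12 elements: 0..11

grid = values.reshape(3, 4)        # 3 * 4 == 12, so this works
inferred = values.reshape(2, -1)   # -1 lets NumPy infer the missing dimension

try:
    values.reshape(5, 3)           # 5 * 3 == 15, which is not 12
    mismatch_error = None
except ValueError as err:
    mismatch_error = err           # keep the error so we can inspect it

print(grid.shape)       # (3, 4)
print(inferred.shape)   # (2, 6)
print(mismatch_error)
```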

If we'd like to generate an ndarray of zeros or ones (useful with certain calculations), we could do so by simply calling ***zeros*** or ***ones***. For example:

In [None]:
np.zeros(15)

In [None]:
# create a size 100 numpy array of 0's
np.zeros(100)

In [None]:
np.zeros(15).reshape(3, 5)

In [None]:
# create a numpy 10x10 array of 100 0's
np.zeros(100).reshape(10, 10)

In [None]:
np.ones(15)

In [None]:
np.ones(15).reshape(3, 5)

In [None]:
# create a numpy 10x10 array of 1's
np.ones(100).reshape(10,10)

You might have noticed that these values defaulted to floats. If for some reason you'd like to use a different type, say, int, you can pass an additional parameter such as ***dtype=int***. See more on dtypes [here](https://numpy.org/doc/stable/user/basics.types.html).

In [None]:
np.ones(15, dtype=int).reshape(3, 5)

In [None]:
# create numpy array of 0's that are integers
np.zeros(100, dtype=int).reshape(10,10)
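
If you're ever unsure which type an array holds, the `dtype` attribute will tell you. The sketch below also uses `astype`, which is one way to convert an array that already exists; the variable names are only for illustration.

```python
import numpy as np

float_ones = np.ones(3)              # default dtype is float64
int_ones = np.ones(3, dtype=int)     # request integers at creation time
converted = float_ones.astype(int)   # convert an existing array to integers

print(float_ones.dtype)   # float64
print(int_ones.dtype)     # an integer dtype; the exact width depends on your platform
print(converted)          # [1 1 1]
```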

A common matrix used in linear algebra is the identity matrix (an n × n square matrix with ones on the main diagonal and zeros elsewhere). We can generate this in NumPy using ***eye(n)***. 

In [None]:
np.eye(5)

In [None]:
# create an identity matrix of size 10 defaulted to integers
np.eye(10,dtype=int)
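
What makes the identity matrix special is that multiplying any compatible matrix by it gives the same matrix back. A quick check, using an arbitrary 3 × 3 matrix and `np.matmul` (covered later in this lab):

```python
import numpy as np

M = np.arange(1, 10).reshape(3, 3)   # any 3x3 matrix would work here
identity = np.eye(3, dtype=int)

product = np.matmul(M, identity)     # matrix product, not element-wise *
print(np.array_equal(product, M))    # True
```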

Another common use case is to generate an ndarray of random numbers. Three functions cover most needs: ***rand*** fills the ndarray with random samples from a uniform distribution over [0, 1), ***randn*** returns a sample (or samples) from the standard normal distribution, and ***randint(low, high, size)*** generates one or more random integers from [low, high). The size parameter specifies how many we'd like. Let's see a few examples.

In [None]:
np.random.rand(5)

In [None]:
# create an array of random numbers 0-1 with rand()
np.random.rand(10)

In [None]:
np.random.randn(5)

In [None]:
# create an array of 10 samples from the standard normal distribution with randn()
np.random.randn(10)

In [None]:
np.random.randint(1,100)

In [None]:
np.random.randint(1, 100, 15)

In [None]:
# create a size 100 array of random integers from 1 to 999
np.random.randint(1,1000,100)
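
To make the ranges of the three functions concrete, here is a rough sketch that draws many samples and checks where they fall. The seed and the sample size of 1000 are assumptions made only so that reruns give the same numbers.

```python
import numpy as np

np.random.seed(0)  # assumption: fixed seed so the results are repeatable

uniform = np.random.rand(1000)             # uniform over [0, 1)
normal = np.random.randn(1000)             # standard normal, so negatives are expected
integers = np.random.randint(1, 4, 1000)   # integers from [1, 4): 1, 2, or 3

print(uniform.min() >= 0 and uniform.max() < 1)   # True
print((normal < 0).any())                         # True
print(np.unique(integers))                        # [1 2 3]
```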

Other common methods that you're likely to encounter in this class include ***min***, ***max***, ***argmin***, and ***argmax***. The arg versions differ only in that they return the index position of the min/max value rather than the value itself. For a 2d array, that position is an index into the flattened array. For example:

In [None]:
A = np.random.randint(0, 100, 20).reshape(4, 5)
A

In [None]:
print(f'The smallest value in A is {A.min()}, and is located at position {A.argmin()}')
print(f'The largest value in A is {A.max()}, and is located at position {A.argmax()}')

In [None]:
# create a numpy 10x10 array of random integers from 0 to 999
new_numpy = np.random.randint(0,1000,100).reshape(10,10)
new_numpy

In [None]:
print(f'The smallest value in new_numpy is {new_numpy.min()}, and is located at position {new_numpy.argmin()}')
print(f'The largest value in new_numpy is {new_numpy.max()}, and is located at position {new_numpy.argmax()}')
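
On a 2d array, ***argmin*** and ***argmax*** return a single position in the flattened array rather than a (row, col) pair. If you want the row and column, `np.unravel_index` can convert it. A minimal sketch with a small hand-written matrix:

```python
import numpy as np

M = np.array([[7, 3, 9],
              [4, 1, 8]])

flat_index = M.argmin()                            # index into the flattened array
row, col = np.unravel_index(flat_index, M.shape)   # convert to (row, col)

print(flat_index)    # 4
print(row, col)      # 1 1
print(M[row, col])   # 1
```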

Lastly, we'll often find ourselves wanting to know the shape (dimensions) of our ndarray. This can be done using ***shape***.

In [None]:
A.shape

In [None]:
np.shape(A)

Great! You now know how to create NumPy arrays and some of the common methods. Let's now look into some common operations we can perform on these arrays.

### Common Operations

In [17]:
A = np.arange(1, 16)
B = np.arange(1, 30, 2)
C = np.arange(0, 4).reshape(2, 2)
D = np.arange(0, 4).reshape(2, 2)
E = np.arange(1, 16).reshape(3, 5)
F = np.arange(11)
print(f'A: {A}')
print(f'B: {B}')
print('C:')
print(C)
print('D:')
print(D)
print('E:')
print(E)
print(f'F: {F}')

A: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15]
B: [ 1  3  5  7  9 11 13 15 17 19 21 23 25 27 29]
C:
[[0 1]
 [2 3]]
D:
[[0 1]
 [2 3]]
E:
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]]
F: [ 0  1  2  3  4  5  6  7  8  9 10]


#### Arithmetic

In [None]:
A + B

In [None]:
A + 5

In [None]:
A - B

In [None]:
A - 5

In [None]:
A * B

In [None]:
A * 5

In [None]:
A / B

In [None]:
A / 2

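Note: Be careful when dividing by zero. NumPy emits a RuntimeWarning and returns ***inf*** (or ***-inf***) for a nonzero value divided by zero, and ***nan***, short for "not a number", for 0/0.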
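
To see what division by zero actually produces, here is a small sketch with hand-picked values. `np.errstate` is used only to silence the warning inside the block; the results are the same without it.

```python
import numpy as np

numerators = np.array([1.0, 0.0, -1.0])
zeros = np.zeros(3)

# Silence the RuntimeWarning for this block only; the results are unchanged.
with np.errstate(divide='ignore', invalid='ignore'):
    result = numerators / zeros

print(result)             # [ inf  nan -inf]
print(np.isnan(result))   # [False  True False]
```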

As you may have noticed, the standard operations \*, +, -, / work element-wise on arrays. If you'd like to instead do matrix multiplication, ***matmul*** can be used. [Here's](https://www.mathsisfun.com/algebra/matrix-multiplying.html) a quick reference in case you forgot how matrix multiplication works.

In [None]:
np.matmul(C, D)
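
To make the difference concrete, the sketch below puts the element-wise product and the matrix product side by side, using the same C and D as above. The `@` operator is shorthand for matrix multiplication on ndarrays.

```python
import numpy as np

C = np.arange(0, 4).reshape(2, 2)   # [[0 1], [2 3]]
D = np.arange(0, 4).reshape(2, 2)

elementwise = C * D        # multiplies matching entries
matrix_product = C @ D     # same result as np.matmul(C, D)

print(elementwise)
print(matrix_product)
```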

In [15]:
#modified operations
A = np.random.randint(0, 100, 20).reshape(4, 5)
B = np.random.randint(0, 100, 20).reshape(4, 5)
print(A)
print(B)
print(A + B)
print(A * B)

[[ 7  4 56 22 50]
 [96 19 47 50 12]
 [60 59 67 10 10]
 [93 83 72 48 50]]
[[84 63 41 83 88]
 [49 95 59 55 10]
 [49 42 61 27 34]
 [65 28 81 24 31]]
[[ 91  67  97 105 138]
 [145 114 106 105  22]
 [109 101 128  37  44]
 [158 111 153  72  81]]
[[ 588  252 2296 1826 4400]
 [4704 1805 2773 2750  120]
 [2940 2478 4087  270  340]
 [6045 2324 5832 1152 1550]]


#### Universal Functions

NumPy also contains [universal functions](https://numpy.org/doc/stable/reference/ufuncs.html), which are functions that operate on ndarrays in an element-by-element fashion. Let's see a few examples.

In [18]:
np.sqrt(E)

array([[1.        , 1.41421356, 1.73205081, 2.        , 2.23606798],
       [2.44948974, 2.64575131, 2.82842712, 3.        , 3.16227766],
       [3.31662479, 3.46410162, 3.60555128, 3.74165739, 3.87298335]])

In [19]:
np.log(E)

array([[0.        , 0.69314718, 1.09861229, 1.38629436, 1.60943791],
       [1.79175947, 1.94591015, 2.07944154, 2.19722458, 2.30258509],
       [2.39789527, 2.48490665, 2.56494936, 2.63905733, 2.7080502 ]])

Something that will often come in handy is the ***where(condition, x, y)*** function. It checks the condition for every element of your ndarray and returns a new ndarray that takes its value from x where the condition is met and from y where it is not. In the example below we're replacing all even numbers with -1 and multiplying all odd numbers by 100. Try changing the -1 to F in the example below and see what happens.

In [None]:
np.where(F%2==0, -1, F*100)
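
If ***where*** still feels abstract, it may help to compare it with the plain Python loop it replaces. This is only an illustrative sketch on a tiny made-up array; both versions produce the same values.

```python
import numpy as np

values = np.array([1, 2, 3, 4])

# Plain Python: decide element by element.
loop_result = np.array([v * 100 if v % 2 == 1 else -1 for v in values])

# NumPy: the same decision expressed once for the whole array.
vectorized = np.where(values % 2 == 1, values * 100, -1)

print(loop_result)
print(vectorized)
print(np.array_equal(loop_result, vectorized))   # True
```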

In [31]:
A = np.random.randint(0,100,10)
np.where(A % 2 == 1, np.log(A), np.sqrt(A))

array([4.47213595, 3.61091791, 8.24621125, 8.71779789, 3.29583687,
       4.00733319, 8.24621125, 8.60232527, 6.32455532, 4.57471098])

Nice! Now that you know how to create NumPy arrays and perform basic operations on them, let's take a look at how to index and select certain elements.

### Indexing

#### Indexing 1d array
Note: recall that counting starts from 0. Given an array [5, 6, 7, 8], we say that value 5 is at index/position 0.

In [None]:
A = np.arange(15)
A

Obtaining a single element will look similar to Python lists:

In [None]:
print(f'A[0]: {A[0]}')
print(f'A[5]: {A[5]}')
print(f'A[14]: {A[14]}')

We can grab a section using ***A[start_index : stop_index]***. stop_index is not inclusive.

In [None]:
A[0:10]

We can also modify values in this way.

In [None]:
A[0:10] = 500
A
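
One thing worth knowing when modifying values through a slice: a basic slice is a view onto the original array, not a separate copy. The sketch below, with names chosen just for illustration, shows the difference and uses `.copy()` when an independent array is wanted.

```python
import numpy as np

original = np.arange(5)
view = original[1:4]                 # a slice shares memory with `original`
view[:] = -1                         # so this also changes `original`

independent = original[1:4].copy()   # .copy() makes a separate array
independent[:] = 99                  # `original` is unaffected by this

print(original)      # [ 0 -1 -1 -1  4]
print(independent)   # [99 99 99]
```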

In [40]:
# 1D array indexing
B = np.random.randint(0,100,100)
print(f'B[69]: {B[69]}')

B[69]: 33


#### Indexing a 2d array

In [None]:
B = np.arange(50).reshape(5, 10)
B

Obtaining a single element can be done using ***B[row, col]***.

In [None]:
print(f'0th row and 2nd col in B: {B[0, 2]}')
print(f'3rd row and 3rd col in B: {B[3, 3]}')
print(f'4th row and 2nd col in B: {B[4, 2]}')

Similar to before, we can also grab a section of interest from this 2d array. Only now we have to specify both the rows of interest and the columns of interest. ***B[0:2, 3:5]*** says: I want rows 0 and 1 from B, but only the elements in those rows that fall in columns 3 and 4. Recall that stop_index is not inclusive. Here stop_index for the rows is 2, and stop_index for the columns is 5.

In [None]:
B[0:2, 3:5]
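
The same notation also covers whole rows and whole columns: a bare `:` means "everything along this axis". A short sketch on the same 5 × 10 array as above:

```python
import numpy as np

B = np.arange(50).reshape(5, 10)

block = B[0:2, 3:5]    # rows 0 and 1, columns 3 and 4
second_row = B[1, :]   # every column of row 1
third_col = B[:, 2]    # every row of column 2

print(block)           # values 3, 4 from row 0 and 13, 14 from row 1
print(second_row)      # values 10 through 19
print(third_col)       # values 2, 12, 22, 32, 42
```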

Again, we can also modify values in this way.

In [None]:
B[0:2, 3:5] = -1
B

In [44]:
# 2D array indexing
B = np.random.randint(0,100,100).reshape(10,10)
print(B[0:2, 5:10])

[[79 92 12 80 15]
 [72 89 65 14 69]]


#### Boolean Array Indexing

In [45]:
C = np.arange(50).reshape(10, 5)
C

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49]])

A really cool feature of NumPy is that we can index arrays using comparison operators. This lets us modify or select only elements meeting some condition. To demonstrate this, let's first see what is returned when we try to use comparison operators with our arrays.

In [46]:
C%2 == 0

array([[ True, False,  True, False,  True],
       [False,  True, False,  True, False],
       [ True, False,  True, False,  True],
       [False,  True, False,  True, False],
       [ True, False,  True, False,  True],
       [False,  True, False,  True, False],
       [ True, False,  True, False,  True],
       [False,  True, False,  True, False],
       [ True, False,  True, False,  True],
       [False,  True, False,  True, False]])

We get back an array of the same shape telling us which values in ***C*** satisfy the condition C%2==0 (even values). Recall that in arithmetic False counts as 0 and True counts as 1. Because of this, we can actually call the sum function right off of this array, which will evaluate to the total number of even numbers in ***C***.

In [47]:
(C%2 == 0).sum()

25

Something we'll find ourselves doing more often is passing the boolean array as an index. What this will do is filter out the False elements and only leave us with the elements corresponding to True (even values in our case).

In [48]:
C[C%2 == 0]

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,
       34, 36, 38, 40, 42, 44, 46, 48])
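
Boolean arrays can also be combined and used for assignment. The sketch below uses a small 1d array for readability; note that conditions are joined with `&` (not `and`) and each one needs its own parentheses.

```python
import numpy as np

values = np.arange(10)

mask = (values % 2 == 0) & (values > 4)   # even AND greater than 4
selected = values[mask]                   # boolean indexing returns a new array

values[values % 2 == 1] = 0               # set every odd element to 0 in place

print(selected)   # [6 8]
print(values)     # [0 0 2 0 4 0 6 0 8 0]
```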

Congratulations! You now know how to create NumPy arrays, perform common operations on them, and how to index them. There's so much that NumPy can do, but this should cover just about all that you'll need to succeed in this course. Other popular Python libraries used for data science such as Pandas and Matplotlib are built on top of NumPy, so you'll be using a lot of these features alongside those libraries. 