## NumPy

### Credit, Thanks, and Purpose
To Shashank Kalanithi, whose warmth, courage to err and correct, knowledge, and hours of labor have gone into producing dozens of videos and Notion Documents on data science and analytics, from theory to technique -- all taught masterfully -- thank you.

I produced this notebook as a repository for my own understanding and plan to add to it as I learn more. I hope it might serve a similarly useful pedagogic purpose to others at the beginning of their Python journey, and that those with greater expertise might correct any errors or misapprehensions they notice, to help create a useful set of references that our future selves will be happy to have.

Much of the fundamentals of this knowledge are based on Shashank's video "Python for Data Scientists and Data Analysts"; I strongly encourage you to support him in any way you can and make use of the explanations, documents, and courses he produces. I have found them to be excellent resources and trust that anyone wanting to brush up on their knowledge or learn the basics will as well.

https://www.youtube.com/watch?v=sZDgJKI8DAM&ab_channel=ShashankKalanithi leads to "Python for Data Scientists and Analysts"
https://numpy.org/doc/stable/user/absolute_beginners.html leads to the NumPy documentation -- an excellent instructional tool and the original source for much of the information here

In [4]:
# Numpy is a mathematical library for Python which forms the foundation of many more advanced and useful libraries for data analysis...
# ... as such, a thorough understanding of it isn't required but is useful.

import numpy as np

### Arrays and Reshaping

In [5]:
# NumPy arrays resemble Python lists, but they hold elements of a single data type, are faster, and can be manipulated by NumPy methods.

numpy_array_test = np.array([1, 3, 5, 7, 9, 11]) # <-- Note that within the parentheses, list brackets must be included.
numpy_array_test

array([ 1,  3,  5,  7,  9, 11])
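To make the difference from lists concrete, here is a minimal sketch. The names `py_list` and `np_arr` are my own, chosen only for illustration. The same `*` operator repeats a list but multiplies an array element by element:

```python
import numpy as np

py_list = [1, 3, 5]
np_arr = np.array(py_list)

doubled_list = py_list * 2   # list repetition: [1, 3, 5, 1, 3, 5]
doubled_arr = np_arr * 2     # elementwise multiplication: [2, 6, 10]

print(doubled_list)
print(doubled_arr)
print(np_arr.dtype)          # a single dtype for the whole array (the exact integer type depends on the platform)
```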

In [6]:
# Numpy arrays can be reshaped from 1D arrays, as shown above, to 2D and 3D arrays (effectively tables):

numpy_array_test1 = np.array([1, 3, 5, 7, 9, 11]).reshape(2,3) # <-- 2 denotes the # of rows and 3 the # of columns
numpy_array_test1                                              # .reshape is called on the array to perform the reshaping

array([[ 1,  3,  5],
       [ 7,  9, 11]])

In [7]:
numpy_array_test3 = np.array([1, 3, 5, 7, 9, 11, 13, 15]).reshape(2,2,2) # <- 2 tables, each with 2 rows and 2 columns
numpy_array_test3

array([[[ 1,  3],
        [ 5,  7]],

       [[ 9, 11],
        [13, 15]]])

### Numpy Arithmetic Methods

In [8]:
q = np.array([1,2,3,4,5,6])

print(q.sum())
print(q.min())
print(q.max())
print(q.mean())
print(q.std())

21
1
6
3.5
1.707825127659933
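As a check on the last figure, the sketch below recomputes the standard deviation by hand. It assumes the default behaviour of `.std()`, which divides by the number of elements (the population standard deviation); the `ddof=1` variant divides by one less and gives the sample standard deviation instead. The names `manual_std` and `sample_std` are illustrative only.

```python
import numpy as np

q = np.array([1, 2, 3, 4, 5, 6])

mean = q.sum() / q.size
# Population standard deviation: root of the mean squared deviation from the mean
manual_std = np.sqrt(((q - mean) ** 2).sum() / q.size)

# Sample standard deviation: divides by (n - 1) instead of n
sample_std = q.std(ddof=1)

print(manual_std)   # matches q.std()
print(sample_std)   # slightly larger than the population value
```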


### Numpy Special Arrays

In [9]:
np.zeros(5)                 # Creates an array of zeros with 5 elements


array([0., 0., 0., 0., 0.])

In [10]:
np.linspace(0, 5, 10)       # Creates an array of 10 evenly spaced values from 0 through 5 (both endpoints included)

array([0.        , 0.55555556, 1.11111111, 1.66666667, 2.22222222,
       2.77777778, 3.33333333, 3.88888889, 4.44444444, 5.        ])

In [11]:
np.random.rand(5)           # Five random values between 0 and 1 (0 included, 1 excluded)

array([0.16732833, 0.13612664, 0.82299208, 0.0640486 , 0.53673605])

In [12]:
np.random.randint(1, 100)  # Generates a random integer from 1 up to, but not including, 100 (that is, 1 through 99)

64
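The upper bound of `np.random.randint` is exclusive, which is easy to forget. A small sketch, with a fixed seed chosen arbitrarily so the run is repeatable, draws many values and checks the range they fall in:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the sketch is repeatable

draws = np.random.randint(1, 100, size=10_000)
print(draws.min(), draws.max())   # never below 1, never above 99

# To simulate a coin flip, the upper bound must therefore be 2, not 1
coin = np.random.randint(0, 2, size=10)
print(coin)
```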

In [13]:
np.arange(5)               # Provides a range of integers from x up to (but not including) y; here 0 through 4, since the start
                             # was unspecified and defaults to 0. Start and end are separated by a comma ( , ), e.g. np.arange(2, 5)

array([0, 1, 2, 3, 4])
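The endpoint rules of `np.arange` and `np.linspace` differ, so here is a short side-by-side sketch (variable names are illustrative): `arange` excludes its stop value, while `linspace` includes both ends and takes the number of values rather than a step.

```python
import numpy as np

r1 = np.arange(5)            # 0, 1, 2, 3, 4 (the stop value 5 is excluded)
r2 = np.arange(2, 11, 2)     # 2, 4, 6, 8, 10 (start, stop, step)
l1 = np.linspace(0, 5, 10)   # 10 values; both 0 and 5 are included

# 10 values means 9 gaps, so the spacing is 5 / 9
step = l1[1] - l1[0]

print(r1, r2)
print(l1[0], l1[-1], step)
```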

## Reading & Understanding the NumPy Documentation Guide

In [14]:
import numpy as np # np is the conventional alias for numpy, and as such, should always be used

### NumPy Arrays: Why they're better than Python lists, how they're indexed & elements accessed, and what determines their rank and shape

In [15]:
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]) # Here, a Numpy array is generated. Arrays are similar to Python lists-
# -however, they are uniformly classified by dtype (the type of data in the array) in a homogeneous manner, which allows for more-
# -efficient, problem-free calculations. Moreover, the Numpy array is more compact and therefore faster, meaning that complex operations-
# -are themselves more efficient.

# As shown in the above array, a nested list allows for each element of the array to itself be composed of several sub-elements, in this-
# -case numbers. Arrays can be indexed "by a tuple of nonnegative integers, by booleans, by another array, or by integers." Their rank-
# -is determined by the number of dimensions in the array. In this example, the array has 2 dimensions (height and width). Their shape is-
# -determined by a "tuple of integers giving the size of the array along each dimension", for example height 100, width 5 for a 2d array.

In [16]:
# Accessing (indexing) the information stored in an array is done with brackets, as follows, and the index of elements begins count at 0-
# -as is the case in standard Python. As such, if you wanted to access the first element [1, 2, 3, 4] from the above array, you would write:
print(a[0])

[1 2 3 4]
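Beyond a whole row, a 2D array can be indexed with a row and a column together. The sketch below rebuilds the same array and picks out a single value, a column, and a small block; the variable names are my own.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

first_row = a[0]        # [1 2 3 4]
one_value = a[1, 2]     # row 1, column 2 -> 7
first_col = a[:, 0]     # every row, column 0 -> [1 5 9]
block = a[0:2, 1:3]     # rows 0-1, columns 1-2 -> [[2 3] [6 7]]

print(first_row, one_value, first_col)
print(block)
```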


### Explanation of Numpy Arrays & Relevant Terminology (Vectors, Matrixes, & Array Dimensionality/Axes)

In [17]:
# ndarray is the array class in NumPy. It stands for N-dimensional array, that is to say, an array of any number of dimensions.
# A 1D array is a VECTOR, a 2D array is a MATRIX, and an array with 3 or more dimensions is called a TENSOR.

# Dimensions are called axes in NumPy, much like the axes we imagine in a Cartesian coordinate system.
# 1D would compose a line, 2D would compose a rectangle, and 3D would compose a cuboid.

In [18]:
[[0., 0., 0.],
 [1., 1., 1.]]

# Here, the array has 2 dimensions (axes). The first axis (vertical) has a length of 2, and the second (horizontal) has a length of 3
# Therefore, a rectangular shape is formed, which can be thought of as 2 in height and 3 in width.

[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
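The cell above is a plain Python list; wrapping it in `np.array` lets NumPy report the axes directly. This sketch (the name `m` is illustrative) also shows what "along an axis" means for a sum: `axis=0` collapses the rows, `axis=1` collapses the columns.

```python
import numpy as np

m = np.array([[0., 0., 0.],
              [1., 1., 1.]])

print(m.ndim)    # 2 axes
print(m.shape)   # (2, 3): length 2 along axis 0, length 3 along axis 1

col_sums = m.sum(axis=0)   # one value per column: [1. 1. 1.]
row_sums = m.sum(axis=1)   # one value per row:    [0. 3.]
print(col_sums, row_sums)
```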

### Array Creation: Different Types of Arrays

In [19]:
a = np.array([1, 2, 3])
print(a)
# Above, a 1D array containing the numbers 1, 2, 3 is created. It has 1 axis, so it is neither a row nor a column.

# Instead of specifying the particular contents within each element (cell) of the array, one can specify the number of elements and-
# -fill them with the number specified by the name, for example:
b = np.zeros(2) # The 2 inside the parentheses specifies the number of elements and the method "zeros" fills them with zeros.
print(b)
np.ones(2)
# NOTE: In a notebook, the value of the last expression in a cell is displayed automatically, without print(). Here, the first 2 results-
# -are printed explicitly. np.ones(2) is not the last line of this cell, so its result (an array of two ones) is not displayed.

c = np.empty(14) # This creates an array WITHOUT initializing its elements: they hold whatever happened to be in memory. NOTE: the-
print(c) # -elements are floats by default and are NOT random, so np.empty should not be used to build a random dataset or random IDs-
# -(use np.random for that). Its purpose is speed: it skips the filling step when every element will be overwritten anyway. That the-
# -output here is all zeros is incidental and can differ between runs and array sizes.

np.arange(5) # This creates a range of integers from 0 up to, but not including, the number within the parentheses; here 0 through 4:

# The range can further be specified by passing up to 3 numbers as arguments: 1) the first number of the range, 2) the end of the-
# -range (not itself included), and 3) the "step" size between consecutive numbers -- an arithmetic sequence, essentially, as shown here:

np.arange(1, 20, 3.66)

# Alternatively, using the linspace function, one can specify the first and last numbers of an array and then the number of evenly-
# -spaced values to generate (first and last included). With num=2, only the two endpoints are returned:
np.linspace(0, 10, num=2)

# Lastly, within the arguments for any of the above methods and, indeed, nearly all methods in Numpy, one can specify the datatype of-
# -the data produced, for instance:
d = np.empty(14, dtype=np.int64)
print(d)

[1 2 3]
[0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0]
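Since `np.empty` only reserves memory, the sketch below contrasts it with `np.zeros` and with genuinely random values. It uses `np.random.default_rng`, NumPy's newer random generator interface, with an arbitrarily chosen seed; the variable names are illustrative.

```python
import numpy as np

# np.empty: contents are arbitrary until written, so fill the array before reading it
uninitialised = np.empty(4)
uninitialised[:] = 7.0

# np.zeros: contents are guaranteed to be 0.0
zeros = np.zeros(4)

# Actual random data comes from a random generator
rng = np.random.default_rng(seed=42)   # seed chosen arbitrarily, for repeatability
random_floats = rng.random(4)          # floats in [0, 1)
unique_ids = rng.permutation(14)       # each of 0..13 exactly once, in random order

print(uninitialised, zeros)
print(random_floats)
print(unique_ids)
```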


### Adding, Removing, and Sorting Elements

This will cover np.sort() and np.concatenate()

In [20]:
arr = np.array([2, 1, 5, 3, 7, 4, 6, 8]) # Here, we have merely created a specific array of unsorted numbers, to be used later
print(arr)

[2 1 5 3 7 4 6 8]


In [21]:
np.sort(arr) # Here, the sort() function returns a sorted copy of arr in ascending order; arr itself is left unchanged

# The following sorting mechanisms exist also:

# argsort(a, axis=-1, kind=None, order=None) which "Perform[s] an indirect sort along the given axis using the algorithm specified by-
# -the kind keyword". In other words, it returns the indices that would sort the array, so arr[np.argsort(arr)] equals np.sort(arr).

# lexsort(keys, axis=-1) which "Given multiple sorting keys, which can be interpreted as columns in a spreadsheet, lexsort returns-
# -an array of integer indices that describes the sort order by multiple columns." NOTE: Here is an extended example of how lexsort can-
# -be used:

surnames =    ('Hertz',    'Galilei', 'Hertz') # Two tuples are defined: the surnames here, and below->
first_names = ('Heinrich', 'Galileo', 'Gustav') # -the first names.
ind = np.lexsort((first_names, surnames)) # lexsort sorts by the LAST key first (surnames), then by first_names, and returns the indices in sorted order (index 1, Galilei, comes first)
ind

array([1, 2, 0], dtype=int64)

In [22]:
[surnames[i] + ", " + first_names[i] for i in ind] # Here, a list comprehension loops over the indices in ind returned by lexsort.-
# -For each index i, it joins the surname and first name at that position, split by a comma and space, so the names come out sorted.

['Galilei, Galileo', 'Hertz, Gustav', 'Hertz, Heinrich']
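`argsort` works the same way as `lexsort` but with a single key: it returns positions rather than values. A minimal sketch follows, where `names` is a hypothetical array of labels I made up to show why the indices are useful: they let a second array be reordered to match the first.

```python
import numpy as np

arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])
order = np.argsort(arr)          # positions of the elements in ascending order

print(order)                     # [1 0 3 5 2 6 4 7]
print(arr[order])                # same result as np.sort(arr)

# Hypothetical labels paired with arr, reordered with the same indices
names = np.array(['b', 'a', 'e', 'c', 'g', 'd', 'f', 'h'])
names_sorted = names[order]
print(names_sorted)
```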

In [23]:
# CONCATENATION, as with the same function in Excel, is used to combine several elements together
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
np.concatenate((a, b)) # As demonstrated, the two arrays a and b are combined with the concatenate function

array([1, 2, 3, 4, 5, 6, 7, 8])

In [24]:
# Like the example above, 2D arrays can be concatenated with one another to produce a longer 2D array. Note that y below is itself 2D-
# -(shape (1, 2), because of the double brackets); with axis=0 it is appended as a new row:

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6]])
np.concatenate((x, y), axis=0) # As demonstrated, the two arrays are concatenated into a larger 2d array 

array([[1, 2],
       [3, 4],
       [5, 6]])
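The `axis` argument decides the direction of the join, and the other dimensions must match. As a sketch with the same `x` and `y`: joining along `axis=1` adds columns, so `y` has to be transposed into a column first, and joining the untransposed arrays along `axis=1` raises a `ValueError`.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])   # shape (2, 2)
y = np.array([[5, 6]])           # shape (1, 2)

# y.T has shape (2, 1), so it can be attached as a new column
side_by_side = np.concatenate((x, y.T), axis=1)
print(side_by_side)              # [[1 2 5] [3 4 6]]

# Without the transpose the row counts differ (2 vs 1), which NumPy rejects
try:
    np.concatenate((x, y), axis=1)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```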

### Knowing the Shape and Size of Arrays:
This will cover ndarray.ndim, ndarray.size, ndarray.shape

In [25]:
# ndarray.ndim reports back the number of dimensions/axes of an array: a 2D array has, naturally, 2 axes

# ndarray.size reports back the number of elements of an array, that is, the product of individual axis elements: a 2D array
# with 3 rows and 2 columns will have 6 elements (2*3=6)

# ndarray.shape reports back a tuple of the number of individual axis elements of an array: the 2D array from above, with 3 rows-
# -and 2 columns would be returned as (3, 2)

In [26]:
# Demonstration of the above methods:
array_example = np.array([[[0, 1, 2, 3],
                           [4, 5, 6, 7]],

                          [[0, 1, 2, 3],
                           [4, 5, 6, 7]],

                          [[0 ,1 ,2, 3],
                           [4, 5, 6, 7]]])

In [27]:
array_example.ndim #This will return 3, as there are 3 axes in our array -> a prism

3

In [28]:
array_example.size #This will return 24, as this is the product of the individual axis elements (how long each axis is)
# in this case, the axes have lengths 3 (blocks), 2 (rows per block), and 4 (columns per row), therefore: 3*2*4=24

24

In [29]:
array_example.shape #This will simply return a tuple of the number of individual axis elements, that match the calculations above

(3, 2, 4)
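The relationship between the three attributes can be checked directly. In this sketch an array of zeros with the same shape stands in for `array_example`, since only the shape matters here:

```python
import numpy as np

array_example = np.zeros((3, 2, 4))   # stand-in with the same shape as above

n_axes = array_example.ndim                      # 3, the length of the shape tuple
n_elements = int(np.prod(array_example.shape))   # 3 * 2 * 4 = 24
outer_length = len(array_example)                # 3, the length of the first axis only

print(n_axes, n_elements, outer_length)
```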

### Reshaping Arrays
Using arr.reshape() and np.reshape

In [30]:
# Given an array, arr.reshape can transform it into a differently shaped array with an equal number of elements. Ex:

a = np.arange(6)
print(a)

b = a.reshape(3, 2)
print(b)

# Here, the original 1D array with 6 elements was transformed into a 2D array with 3 rows and 2 columns, having also 6 elements.

[0 1 2 3 4 5]
[[0 1]
 [2 3]
 [4 5]]
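The "equal number of elements" rule is enforced: asking for a shape that does not fit raises a `ValueError`. The sketch below shows that, along with a convenience not used above: passing `-1` for one dimension asks NumPy to work that dimension out from the rest.

```python
import numpy as np

a = np.arange(6)

# 4 * 2 = 8 elements would be needed, but a only has 6
try:
    a.reshape(4, 2)
    raised = False
except ValueError:
    raised = True
print(raised)

# -1 means "whatever fits": 6 elements in 2 rows gives 3 columns
inferred = a.reshape(2, -1)
print(inferred.shape)
```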


In [31]:
# One can use the abovementioned np.reshape to give further arguments to how the array should be reshaped, though this is somewhat-
# -niche in its use:

np.reshape(a, (6, 1), order='F')

# The arguments go as follows: a is the array to reshape, defined above; (6, 1) is the new shape, which must hold the same number of-
# -elements as the original (the keyword for this argument was newshape in older NumPy releases, so it is passed positionally here);-
# -order='', where the possible options include 'C' and 'F', indicates the order in which elements are read and placed. C order is-
# -row-major whilst F order is column-major; the names refer to the C and Fortran languages respectively. In essence, C order fills-
# -the array row-by-row (the last axis index changes fastest), whereas F order fills it column-by-column (the first axis index-
# -changes fastest). THIS argument is optional and defaults to 'C'. For a single column, as here, both orders give the same result.

array([[0],
       [1],
       [2],
       [3],
       [4],
       [5]])
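With a single column the `order` argument makes no visible difference, so here is a sketch on a (2, 3) target shape, where the two orders do diverge. The variable names are illustrative.

```python
import numpy as np

a = np.arange(6)

c_order = np.reshape(a, (2, 3), order='C')   # filled row by row
f_order = np.reshape(a, (2, 3), order='F')   # filled column by column

print(c_order)   # [[0 1 2] [3 4 5]]
print(f_order)   # [[0 2 4] [1 3 5]]
```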

### On the Conversion of 1D Arrays into 2D Arrays & Other Transformations
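Covers np.newaxis and np.expand_dims, which add an axis of length 1 to an array; this is useful when an operation expects a 2D row or column rather than a 1D array.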

In [32]:
a = np.array([1, 2, 3, 4, 5, 6])
a.shape
# The above array is the example on which np.newaxis will be used. Notice that the .shape attribute tells us-
# -that the shape is (6,): a one-element tuple (the trailing comma is how Python writes one), indicating 1 axis of length 6.

(6,)

In [33]:
a2 = a[np.newaxis, :] # Note the positioning of the ":" argument, as its position is directly relevant to the transformation
print(a2.shape)
print(a2)

# Notice that the .shape attribute tells us the shape is now (1, 6), which fundamentally does not change the elements in the array,
# -but rather changes the manner in which the array is read, now having 1 row of 6 columns.

(1, 6)
[[1 2 3 4 5 6]]


In [34]:
a3 = a[:, np.newaxis] # Note that the ":" is now in front of np.newaxis, which indicates where the new axis
print(a3.shape) #-should be inserted. Remember also that the order of a 2D shape is (rows, columns), and therefore
print(a3) #- by putting np.newaxis in the second slot, we create a column axis.

# Note how the shape changes between the first and second use of np.newaxis, from (1, 6) to (6, 1): it is again-
# -indicative of the fact that the first number in the tuple is read as the rows and the second as the columns.

(6, 1)
[[1]
 [2]
 [3]
 [4]
 [5]
 [6]]
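The section heading also names `np.expand_dims`, which does the same job as `np.newaxis` but takes the position of the new axis as a number. A minimal sketch, with illustrative variable names:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

row = np.expand_dims(a, axis=0)   # new axis in front: shape (1, 6), like a[np.newaxis, :]
col = np.expand_dims(a, axis=1)   # new axis second:   shape (6, 1), like a[:, np.newaxis]

print(row.shape, col.shape)
```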


### Indexing and Slicing
If you want to take only one section of a Numpy array and analyze it, you must first take a subset, slice, and/or index of said array.
This is the manner by which that is done:

NOTE: Where the index of elements begins at 0 from top->bottom, it can also be read in the negative. That is to say: the last element of
an array 4 elements long can be thought of having both the index 3 (0,1,2,3 from top->bottom) and the index -1 (-1,-2,-3,-4 from bottom->top).
This fact can be used in indexing, for instance to reach the last element without knowing the array's length.
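Negative indices count from the end, starting at -1 (there is no -0; it is the same as 0). A short sketch with a made-up 4-element array `v`:

```python
import numpy as np

v = np.array([10, 20, 30, 40])

last = v[-1]        # 40, same element as v[3]
first = v[-4]       # 10, same element as v[0]
last_two = v[-2:]   # [30 40], a slice taken from the end

print(last, first, last_two)
```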

In [35]:
c = np.array([[1 , 3, 4, 1], [2, 2, 17, 2], [4, 13, 8, 9]]) # Here we have a 2d array, with 3 rows and 4 columns
print(c)


[[ 1  3  4  1]
 [ 2  2 17  2]
 [ 4 13  8  9]]


In [39]:
# Given this array, there are many slicing, indexing, and identifying methods we can apply to gain a better understanding of it
print(c[c<5]) # Here, we print the elements of c that are less than 5

# Similarly, we might want to see a boolean replace the number when it meets our conditions, such as here:
print(c<5) 

# Naturally, "less than (<)" as demonstrated above, is not the only operator we can use. <=, %x==0, and even combinations of conditions
# -created using the & and | operators can be used to index an array as you like it, rather like SQL. Here are some examples:

c_divby2 = c[c%2==0] # Here, we produce all the numbers which are divisible by 2 (that's what the %x==0 signifies)
print(c_divby2)

c_2conditions = c[(c >= 4) & (c <= 13)] # Here, we produce all the numbers greater or equal to 4 AND lesser or equal to 13
print(c_2conditions)

[1 3 4 1 2 2 2 4]
[[ True  True  True  True]
 [ True  True False  True]
 [ True False False False]]
[4 2 2 2 4 8]
[ 4  4 13  8  9]
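The `|` (OR) operator mentioned above works the same way as `&`. The sketch below applies it to the same array and adds two things not shown earlier: `~`, which inverts a boolean mask, and `.sum()` on a mask, which counts the matches because True counts as 1.

```python
import numpy as np

c = np.array([[1, 3, 4, 1], [2, 2, 17, 2], [4, 13, 8, 9]])

extremes = c[(c < 2) | (c > 10)]   # less than 2 OR greater than 10
not_small = c[~(c < 5)]            # everything that is NOT less than 5
count_small = (c < 5).sum()        # number of elements less than 5

print(extremes)      # [ 1  1 17 13]
print(not_small)     # [17 13  8  9]
print(count_small)   # 8
```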
