# Introduction to NumPy
##### Made by Maria Musiał

---------


NumPy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays.

Pure Python is slow for numerical work compared with compiled languages, yet data mining is a field that requires a lot of processing power. So why would we ever want to use Python for ML? The answer is simple: strictly speaking, we do not. We use Python mainly as an interface to packages that contain very efficient, optimized code, such as NumPy, SciPy, scikit-learn, TensorFlow, PyTorch, Keras... The list is long. Python is flexible and simple. We like its syntax and its ability to combine with programs written in other languages, not necessarily its efficiency. `NumPy` stores data in multidimensional arrays called `ndarray` and provides vectorized operations on these arrays.
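
To make the vectorization point concrete, here is a minimal sketch that computes the same sum of squares with a plain Python loop and with a single NumPy expression. The array size and variable names are illustrative assumptions, and the measured times depend on the machine, so no timings are quoted here.

```python
import time
import numpy as np

n = 200_000                        # illustrative size, chosen arbitrarily
values = np.arange(n, dtype=np.float64)

# Pure-Python loop: the interpreter handles one element at a time.
start = time.perf_counter()
loop_total = 0.0
for v in values:
    loop_total += v * v
loop_time = time.perf_counter() - start

# Vectorized: one expression, the loop runs inside NumPy's compiled code.
start = time.perf_counter()
vec_total = np.sum(values * values)
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
print("same result:", np.isclose(loop_total, vec_total))
```

On typical hardware the vectorized version should be noticeably faster, but the exact ratio varies.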

Unlike Python lists, NumPy arrays represent tensors (e.g. a rank-1 tensor is a vector, a rank-2 tensor is a matrix). A Python list is just a list of things; in particular it can be a list of lists of any lengths, with no further restrictions. A tensor is more constrained:
- every element must be of the same type and size
- if an array contains nested arrays, they must all have the same shape

After all this:

\begin{matrix} 1 & 2 & 3 \\ 4 & 5 & 6 & 8 \\ 7 & 8 &  \end{matrix}

is not a matrix.

While this:



In [1]:
lst = [[1,2,3],[4,5,6,8],[7,8]]

is a list.
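
As a rough illustration of the difference, the sketch below feeds the same ragged list to NumPy. It passes `dtype=object` explicitly, because recent NumPy versions reject ragged input otherwise; the variable names `ragged` and `regular` are only illustrative.

```python
import numpy as np

lst = [[1, 2, 3], [4, 5, 6, 8], [7, 8]]      # rows of different lengths

# With dtype=object NumPy keeps the three inner lists as opaque Python objects,
# so the result is a 1-D array of lists, not a 2-D matrix.
ragged = np.array(lst, dtype=object)
print(ragged.shape)    # (3,)

# Rows of equal length form a proper 2-D array.
regular = np.array([[1, 2, 3], [4, 5, 6]])
print(regular.shape)   # (2, 3)
```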


A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.

In [3]:
import numpy as np

In [25]:
a = np.array([1,2,3])

In [23]:
print(a[0], a[1])

1 2


In [12]:
b= np.array([[1,2,3],[4,5,6]])
print(b.shape)  # we have 2 rows and 3 columns

(2, 3)


In [11]:
print(b[0,0], b[0,1], b[1,0]) 

1 2 4


As you can see, handling ndarrays is quite similar to handling plain Python lists. Now let's explore some NumPy functions for creating special arrays. They will prove useful later on.

In [17]:
a = np.zeros((2,2))
print("Array of zeros: \n", a)

b = np.ones((3,2))
print("Array of ones: \n", b)

c = np.full((3,3), 7)
print("Array of constants: \n", c)

d = np.eye(3)
print("Identity matrix: \n", d)

e = np.random.random((2,2))
print("Array of random values: \n", e)

Array of zeros: 
 [[0. 0.]
 [0. 0.]]
Array of ones: 
 [[1. 1.]
 [1. 1.]
 [1. 1.]]
Array of constants: 
 [[7 7 7]
 [7 7 7]
 [7 7 7]]
Identity matrix: 
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
Array of random values: 
 [[0.70254182 0.45544815]
 [0.7957028  0.70041184]]


You can see that some of the printed values end with a dot (`.`). It means that the data type is a float. An unintended float type can affect our calculations, so we can specify the data type explicitly.

In [20]:
print(type(a[0, 0]))    #<class 'numpy.float64'>

# Create an array with an explicit dtype
a = np.array([[1,2,3],[4,5,6]], dtype=np.int8)
print(type(a[0, 0]))    #<class 'numpy.int8'>

<class 'numpy.float64'>
<class 'numpy.int8'>
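
The same `dtype` argument is accepted by creation helpers such as `np.zeros`, and an existing array can be converted with `astype`. The following is a small sketch with made-up values; note that converting floats to integers drops the fractional part rather than rounding.

```python
import numpy as np

# Creation helpers accept dtype, so the values are integers from the start.
zeros_int = np.zeros((2, 2), dtype=np.int64)
print(zeros_int)

# astype returns a converted copy; float -> int truncates toward zero.
floats = np.array([1.7, 2.2, -0.5])
ints = floats.astype(np.int64)
print(ints)            # [1 2 0]
```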


In [27]:
a = np.array([
    [
    [1, 2, 3, 4, 5],
    [1, 1, 2, 3, 5]
    ],
    [
    [1, 2, 3, 4, 5],
    [1, 4, 9, 16, 25]
    ]
])

print(f"""
Shape (sizes of dimensions): {a.shape}
Number of dimensions: {a.ndim}
Length (size of the first dimension): {len(a)}
Size (total number of elements): {a.size}
Type : {type(a)}
Data type (type of array elements): {a.dtype}
""")


Shape (sizes of dimensions): (2, 2, 5)
Number of dimensions: 3
Length (size of the first dimension): 2
Size (total number of elements): 20
Type : <class 'numpy.ndarray'>
Data type (type of array elements): int32



If we mix data types, every element is converted to a common type (here a character string) and we lose the cool perks of NumPy:

In [36]:
a = np.array([1, 2, 'mary', 'had', 2.5, 'lambs'])
a
a.dtype

dtype('<U32')
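
A short sketch of what this conversion means in practice: the numbers in a mixed array become strings, so numeric work requires converting back. The arrays below are illustrative and not part of the original notebook.

```python
import numpy as np

mixed = np.array([1, 2, 'mary', 2.5])
print(mixed.dtype.kind)        # 'U' means fixed-width unicode strings
print(mixed[0] == '1')         # True: the integer 1 is now the string '1'

# Strings that look like numbers can be converted back with astype.
numeric = np.array(['1', '2', '2.5']).astype(np.float64)
print(numeric.sum())           # 5.5
```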

We can also change the shape of our array whenever we want, as long as the total number of elements stays the same:

In [37]:
a = np.array(list(range(12)))
print(a.shape)

(12,)


In [38]:
a.shape = (3,4)  #reshape to 3 rows and 4 columns
print(a)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [39]:
a = a.reshape(6, 2)   #reshape to 6 rows and 2 columns
print(a)

[[ 0  1]
 [ 2  3]
 [ 4  5]
 [ 6  7]
 [ 8  9]
 [10 11]]
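
Two details of `reshape` are worth a quick sketch: one dimension can be given as `-1` so that NumPy infers it, and a shape that does not match the number of elements raises an error. The variable names here are illustrative.

```python
import numpy as np

a = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size (12 / 3 = 4).
b = a.reshape(-1, 3)
print(b.shape)         # (4, 3)

# A shape whose sizes do not multiply to 12 is rejected.
try:
    a.reshape(5, 3)
except ValueError as err:
    print("Error:", err)
```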


In [63]:
print(a.T)  #transpose
print(np.transpose(a))

[[ 0  2  4  6  8 10]
 [ 1  3  5  7  9 11]]
[[ 0  2  4  6  8 10]
 [ 1  3  5  7  9 11]]


### Indexing arrays:

In [50]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print("Array of shape (3,4):\n", a)

# Two ways of accessing the data in the middle row of the array.
# Mixing integer indexing with slices yields an array of lower rank,
# while using only slices yields an array of the same rank as the
# original array:
row_r1 = a[1, :]    # Rank 1 view of the second row of a
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
print("row of rank 1:\n", row_r1, row_r1.shape)  
print("row of rank 2:\n", row_r2, row_r2.shape)  

# We can make the same distinction when accessing columns of an array:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print("column of rank 1:\n", col_r1, col_r1.shape)  
print("column of rank 2:\n" ,col_r2, col_r2.shape)  

Array of shape (3,4):
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
row of rank 1:
 [5 6 7 8] (4,)
row of rank 2:
 [[5 6 7 8]] (1, 4)
column of rank 1:
 [ 2  6 10] (3,)
column of rank 2:
 [[ 2]
 [ 6]
 [10]] (3, 1)
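
The comments above call the slices "views". The sketch below, using hypothetical names `block` and `safe`, shows what that implies: writing to a slice changes the original array, while an explicit `.copy()` does not.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

block = a[:2, 1:3]     # view of rows 0-1, columns 1-2
block[0, 0] = 77       # writes through to a[0, 1]
print(a[0, 1])         # 77

safe = a[:2, 1:3].copy()   # independent copy
safe[0, 0] = -1
print(a[0, 1])         # still 77
```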


In conclusion:

In [52]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

print(f"""{a}

Element at second row, third column: {a[1,2]}
Entire first row: {a[0]}
Entire first row as 2-d array: {a[0, None]}
First and second rows, last column: {a[:2,-1]}
Second and third rows, middle columns: {a[1:, 1:3]}
""")

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Element at second row, third column: 7
Entire first row: [1 2 3 4]
Entire first row as 2-d array: [[1 2 3 4]]
First and second rows, last column: [4 8]
Second and third rows, middle columns: [[ 6  7]
 [10 11]]



### Operations on arrays

In [54]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

In [55]:
# Elementwise sum; both forms produce the same array
print(x + y)
print(np.add(x, y))

[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]


In [56]:
# Elementwise difference; both forms produce the same array
print(x - y)
print(np.subtract(x, y))

[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]


In [57]:
# Elementwise product; both forms produce the same array
print(x * y)
print(np.multiply(x, y))

[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]


In [58]:
#Matrix multiplication
print(x @ y)
print(np.dot(x, y))

[[19. 22.]
 [43. 50.]]
[[19. 22.]
 [43. 50.]]
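
Matrix multiplication and the elementwise product follow different shape rules, which is easier to see with non-square arrays. This is a small sketch with illustrative arrays `m` and `n`.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)     # shape (2, 3)
n = np.arange(12).reshape(3, 4)    # shape (3, 4)

# The inner dimensions (3 and 3) match, so the product has shape (2, 4).
print((m @ n).shape)   # (2, 4)

# The elementwise product needs compatible shapes instead, so m * n fails here.
try:
    m * n
except ValueError as err:
    print("Error:", err)
```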


In [59]:
# Elementwise division; both forms produce the same array
print(x / y)
print(np.divide(x, y))

[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]


In [60]:
# Elementwise square root; produces the array
print(np.sqrt(x))

[[1.         1.41421356]
 [1.73205081 2.        ]]


In [62]:
# Sum along an axis: axis=0 collapses the rows (one sum per column), axis=1 collapses the columns (one sum per row)
print(np.sum(x))  #sum of all elements
print(np.sum(x, axis=0))  #sum of each column
print(np.sum(x, axis=1))  #sum of each row

10.0
[4. 6.]
[3. 7.]
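
With a square array it is easy to confuse the two axes, so here is the same idea sketched on a 2x3 array, where the result shapes differ. The `keepdims` argument is shown as an optional extra that keeps the collapsed axis with size 1.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])    # shape (2, 3)

col_sums = np.sum(x, axis=0)   # axis 0 is collapsed: one value per column
row_sums = np.sum(x, axis=1)   # axis 1 is collapsed: one value per row
print(col_sums, col_sums.shape)   # [5 7 9] (3,)
print(row_sums, row_sums.shape)   # [ 6 15] (2,)

# keepdims=True keeps the collapsed axis, giving a column of shape (2, 1).
print(np.sum(x, axis=1, keepdims=True).shape)   # (2, 1)
```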


### Broadcasting


Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

In [64]:
a = np.array([1, 2, 3])
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Perform element-wise addition
result = a + b
print("Array a:\n", a)
print("Array b:\n", b)
print("Result of a + b:\n", result)

Array a:
 [1 2 3]
Array b:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
Result of a + b:
 [[ 2  4  6]
 [ 5  7  9]
 [ 8 10 12]]


As you can see, NumPy broadcast `a` across `b`: the 1-D array of shape (3,) was treated as if it were repeated for each of the three rows, so `a` was added to every row of `b`.
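
One way to picture this is to build the repeated array by hand. The sketch below uses `np.tile` to materialize the copies and checks that the result equals the broadcast sum; `a_repeated` is an illustrative name.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Explicitly repeat a three times to match b's shape (3, 3).
a_repeated = np.tile(a, (3, 1))
print(a_repeated)

# Broadcasting gives the same result without materializing the copies.
print(np.array_equal(a + b, a_repeated + b))   # True
```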

In [65]:
c = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
d = 10

# Perform multiplication
result = c * d
print("Array c:\n", c)
print("Scalar d:", d)
print("Result of c * d:\n", result)

Array c:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
Scalar d: 10
Result of c * d:
 [[10 20 30]
 [40 50 60]
 [70 80 90]]


In [None]:
# Define the arrays
e = np.array([1, 2, 3, 4])
f = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Perform element-wise addition
result = e + f
print("Array e:\n", e)
print("Array f:\n", f)
print("Result of e + f:\n", result)


Array e:
 [1 2 3 4]
Array f:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
Result of e + f:
 [[ 2  4  6  8]
 [ 6  8 10 12]
 [10 12 14 16]]


In [68]:
g = np.array([1, 2, 3])
# Attempt to perform element-wise addition
try:
    result = g + f
    print("Array g:\n", g)
    print("Array f:\n", f)
    print("Result of g + f:\n", result)
except ValueError as err:  # named err so it does not shadow the array e
    print("Error:", err)

Error: operands could not be broadcast together with shapes (3,) (3,4) 


Why do we have an error this time?
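
A hint for investigating the question: NumPy compares shapes starting from the last dimension, and two dimensions are compatible when they are equal or when one of them is 1. The sketch below checks this with `np.broadcast_shapes` (assumed available, which requires NumPy 1.20 or later) and shows one possible fix; `g_col` is an illustrative name.

```python
import numpy as np

f = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])   # shape (3, 4)
g = np.array([1, 2, 3])                                        # shape (3,)

# (3,) against (3, 4): the trailing dimensions are 3 and 4, which do not match.
try:
    np.broadcast_shapes(g.shape, f.shape)
except ValueError as err:
    print("Error:", err)

# Turning g into a column of shape (3, 1) makes the shapes compatible:
# the 1 stretches to 4, and 3 matches 3.
g_col = g[:, np.newaxis]
print(np.broadcast_shapes(g_col.shape, f.shape))   # (3, 4)
print(g_col + f)
```

With the column shape, each element of `g` is added to a whole row of `f` rather than to a column position.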

### Array manipulation

The fun part of NumPy is that we can manipulate arrays however we want. We don't need to write loops that are messy and hard to read.

In [81]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Concatenate a and b along the first axis
result_1d = np.concatenate((a, b))
print("Concatenate 1D arrays:\n", result_1d)

c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[7, 8, 9], [10, 11, 12]])

# Concatenate c and d along axis 0
result_2d = np.concatenate((c, d), axis=0)
print("\nConcatenate 2D arrays along axis 0:\n", result_2d)


Concatenate 1D arrays:
 [1 2 3 4 5 6]

Concatenate 2D arrays along axis 0:
 [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [82]:
a = np.array([1, 2, 3])
b = np.array([1, 2, 3])
c = np.array([1, 2, 3])

# Stack a, b, and c along a new axis
result = np.stack((a, b, c))
print("Stacked arrays along a new axis:\n", result)
print("Shape of stacked array:", result.shape)


Stacked arrays along a new axis:
 [[1 2 3]
 [1 2 3]
 [1 2 3]]
Shape of stacked array: (3, 3)


In [83]:
a = np.array([1, 2, 3, 4, 5, 6])

# Split array into three sub-arrays
result = np.split(a, 3)
print("Split array into three parts:\n", result)

# Split a 2D array along axis 1
b = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
result_2d = np.split(b, 2, axis=1)
print("\nSplit 2D array along axis 1:\n", result_2d)


Split array into three parts:
 [array([1, 2]), array([3, 4]), array([5, 6])]

Split 2D array along axis 1:
 [array([[ 1,  2],
       [ 5,  6],
       [ 9, 10],
       [13, 14]]), array([[ 3,  4],
       [ 7,  8],
       [11, 12],
       [15, 16]])]
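
`np.split` requires the array to divide into equal parts. As a brief sketch of the alternatives, `np.array_split` accepts uneven parts, and passing a list of indices to `np.split` cuts at explicit positions. The 7-element array is an illustrative choice.

```python
import numpy as np

a = np.arange(7)

# np.split insists on equal parts, and 7 elements cannot form 3 equal parts.
try:
    np.split(a, 3)
except ValueError as err:
    print("Error:", err)

# np.array_split allows uneven parts; the earlier parts get the extra elements.
parts = np.array_split(a, 3)
print([p.tolist() for p in parts])   # [[0, 1, 2], [3, 4], [5, 6]]

# A list of indices splits at explicit positions.
head, tail = np.split(a, [2])
print(head, tail)                    # [0 1] [2 3 4 5 6]
```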


Stacking arrays horizontally or vertically

In [84]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Stack arrays horizontally
result = np.hstack((a, b))
print("Horizontally stacked arrays:\n", result)

# Horizontally stack 2D arrays
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[7, 8, 9], [10, 11, 12]])
result_2d = np.hstack((c, d))
print("\nHorizontally stacked 2D arrays:\n", result_2d)

# Stack arrays vertically
result = np.vstack((a, b))
print("\nVertically stacked arrays:\n", result)

result_2d = np.vstack((c, d))
print("\nVertically stacked 2D arrays:\n", result_2d)



Horizontally stacked arrays:
 [1 2 3 4 5 6]

Horizontally stacked 2D arrays:
 [[ 1  2  3  7  8  9]
 [ 4  5  6 10 11 12]]

Vertically stacked arrays:
 [[1 2 3]
 [4 5 6]]

Vertically stacked 2D arrays:
 [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [85]:
a = np.array([[1, 2], [3, 4], [5, 6]])

# Flatten the array    
result = a.flatten()
print("Original array:\n", a)
print("Flattened array:\n", result)


Original array:
 [[1 2]
 [3 4]
 [5 6]]
Flattened array:
 [1 2 3 4 5 6]
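
`flatten` always returns a copy. A related method, `ravel`, returns a view when the memory layout allows it, which avoids copying but means changes can propagate to the original. A minimal sketch of the difference, with illustrative names:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

flat_copy = a.flatten()    # always a new array
flat_copy[0] = 100
print(a[0, 0])             # 1, the original is untouched

flat_view = a.ravel()      # a view when the memory layout allows it
print(np.shares_memory(a, flat_view))   # True for this contiguous array
```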


### Boolean indexing

We can also index an array with a boolean mask, which is useful for filtering data.

In [88]:
arr = np.arange(1, 21)
print("Original array:\n", arr)

# Use Boolean indexing to filter even numbers
even_numbers = arr[arr % 2 == 0]
print("\nEven numbers:\n", even_numbers)


Original array:
 [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20]

Even numbers:
 [ 2  4  6  8 10 12 14 16 18 20]


In [89]:
np.random.seed(0)
arr = np.random.randint(-10, 11, 10)
print("Original array:\n", arr)

# Use Boolean indexing to replace negative values with 0
arr[arr < 0] = 0
print("\nArray with negatives replaced by 0:\n", arr)

Original array:
 [  2   5 -10  -7  -7  -3  -1   9   8  -6]

Array with negatives replaced by 0:
 [2 5 0 0 0 0 0 9 8 0]


In [91]:
filtered_elements = arr[(arr > 0) & (arr <= 8)]
print("\nElements greater than 0 and at most 8:\n", filtered_elements)


Elements greater than 0 and at most 8:
 [2 5 8]
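
Masks can also be combined with `|` (or) and inverted with `~` (not), and `np.where` offers a way to replace values without modifying the original array. The sketch below uses a small hand-written array rather than the random one above.

```python
import numpy as np

arr = np.array([2, 5, -10, -7, 9, 8, -6])

# | is elementwise OR, ~ is elementwise NOT; each comparison needs parentheses
# because & and | bind more tightly than comparison operators.
outside = arr[(arr < 0) | (arr > 8)]
print(outside)                 # [-10  -7   9  -6]

not_negative = arr[~(arr < 0)]
print(not_negative)            # [2 5 9 8]

# np.where builds a new array and leaves arr unchanged.
clipped = np.where(arr < 0, 0, arr)
print(clipped)                 # [2 5 0 0 9 8 0]
```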


### Exercise: Boolean indexing
Create a 1D array of integers from 1 to 15. Replace all values that are either divisible by 3 or greater than 10 with -1.

In [95]:
#TODO

### Exercise: Element-wise Operations on Vectors
Create two random vectors a and b of length 10. 

1. Calculate the dot product of a and b.
2. Use element-wise operations to compute a^2 + b^2 for each pair of elements and store the results in a new array c.
3. Find the maximum value in c, and then the index of this maximum.

In [94]:
#TODO

### Exercise: Conditional selection and normalization
Given a 5x5 matrix with random integers between -50 and 50, perform the following tasks:

1. Replace all negative numbers with their absolute values. (np.abs())
2. Extract all elements that are greater than 25 and find the mean of these elements. (.mean())
3. Normalize the matrix so that its values lie between 0 and 1, using .min(), .max(), and the formula:
- x' = (x - min(x))/(max(x) - min(x))

In [96]:
np.random.seed(0)
matrix = np.random.randint(-50, 51, (5, 5))

#TODO