# Lab Session: Introduction to NumPy

**Objective:**
By the end of this lab, you will:
- Understand the rationale behind NumPy and its advantages over basic Python.
- Learn about NumPy arrays and vectorization.
- Explore basic NumPy operations, including ufuncs and masking.

---

In [1]:
!pip install numpy matplotlib seaborn



## 1. Introduction to NumPy

### 1.1 What is NumPy?
NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions.

**Key Features:**
- **Efficiency:** NumPy arrays store elements of a single type in one contiguous block of memory, which takes less space than a Python list and allows faster computation.
- **Vectorization:** Allows operations on entire arrays without writing explicit loops.

### 1.2 Why Use NumPy?
- **Performance:** NumPy operations are implemented in C, making them much faster than equivalent Python code.
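To make the memory claim concrete, here is a rough sketch comparing the footprint of a Python list with that of a NumPy array holding the same integers. The list estimate is approximate: `sys.getsizeof` counts the list's pointer table and each `int` object separately, and the exact numbers depend on the Python version.

```python
import sys
import numpy as np

n = 1_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list holds pointers to separate Python int objects,
# so we add the size of the list itself and of every element.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array holds the raw 8-byte values in one contiguous buffer.
array_bytes = np_array.nbytes

print(f"Python list (approx.): {list_bytes} bytes")
print(f"NumPy array data:      {array_bytes} bytes")
```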

---

In [2]:
# Core imports
import time
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

### **Exercise 1: Basic Array Creation**
Create a 1D NumPy array named `data_vector` containing the integers 5, 10, 15, 20, and a $3 \times 3$ array named `identity_matrix` filled with ones. (Despite its name, an all-ones array is not the identity matrix; `np.eye(3)` would produce that.)

In [12]:
# 1D array containing the integers 5, 10, 15, 20
data_vector = np.array([5, 10, 15, 20])
data_vector

array([ 5, 10, 15, 20])

In [13]:
identity_matrix = np.ones((3, 3))
identity_matrix

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
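Beyond `np.array` and `np.ones`, NumPy offers several other creation routines. The sketch below shows a few common ones; the variable names are illustrative and not part of the exercise.

```python
import numpy as np

ones = np.ones((3, 3))              # 3x3 array filled with 1.0
identity = np.eye(3)                # 3x3 identity: ones on the diagonal only
zeros = np.zeros((2, 3), dtype=int) # 2x3 array of integer zeros
steps = np.arange(5, 25, 5)         # start, stop (exclusive), step
grid = np.linspace(0.0, 1.0, 5)     # 5 evenly spaced points, endpoints included

print(identity)
print(steps)  # [ 5 10 15 20]
print(grid)   # [0.   0.25 0.5  0.75 1.  ]
```

Note that `np.ones` and `np.eye` return floating-point arrays by default; pass `dtype=int` if integers are needed.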

You can inspect an array's dimensions with the `.shape` attribute, which returns the length of each axis, and the `.ndim` attribute, which returns the number of axes.

In [None]:
data_vector.shape  # (n_elements,) for a 1D array

(4,)

In [None]:
identity_matrix.shape  # (rows, columns)

(3, 3)

In [None]:
data_vector.ndim #number of dimensions (1D array)

1

In [None]:
identity_matrix.ndim #number of dimensions (2D array)

2

## 2. The Power of Vectorization

**Vectorization** means replacing explicit Python loops with operations on entire arrays. It is fast in NumPy because the loop over elements runs in compiled C code rather than in the Python interpreter.

The following code demonstrates the significant performance benefit of using vectorized NumPy operations over a standard Python list comprehension.

In [None]:
# Create a list and a NumPy array of 50 million random numbers
python_list = list(np.random.rand(50_000_000))
numpy_array = np.random.rand(50_000_000)

# Time the Python loop
start_time = time.time()
squared_list = [x**2 for x in python_list]
loop_time = time.time() - start_time

# Time the NumPy operation
start_time = time.time()
squared_array = numpy_array**2
numpy_time = time.time() - start_time

print(f"Time with Python loop: {loop_time:.5f} seconds")
print(f"Time with NumPy: {numpy_time:.5f} seconds")
print(f"NumPy is approximately {loop_time / numpy_time:.0f}x faster.")

Time with Python loop: 3.24232 seconds
Time with NumPy: 0.25680 seconds
NumPy is approximately 13x faster.
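A single `time.time()` measurement can be noisy, and the exact speedup depends on the machine. As a lighter-weight alternative sketch, the standard-library `timeit` module repeats each measurement, and a smaller array keeps memory use low. The array size and repeat counts below are arbitrary choices. Note also that `list(array)` produces a list of NumPy scalars, whereas `array.tolist()` produces plain Python floats, which is the fairer baseline for a pure-Python loop.

```python
import timeit
import numpy as np

rng = np.random.default_rng(seed=0)
values = rng.random(100_000)
values_list = values.tolist()  # plain Python floats

# Both approaches compute the same result
loop_result = [x**2 for x in values_list]
vector_result = values**2
same = np.allclose(loop_result, vector_result)

# Best of 3 runs, each run executing the statement 5 times
loop_time = min(timeit.repeat(lambda: [x**2 for x in values_list], number=5, repeat=3))
numpy_time = min(timeit.repeat(lambda: values**2, number=5, repeat=3))

print(f"Results match: {same}")
print(f"Python loop: {loop_time:.5f} s, NumPy: {numpy_time:.5f} s")
```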


### **Exercise 2: Vectorized Scaling**
Given a NumPy array `temperatures` (in Celsius), use vectorization to convert all values to Fahrenheit using the formula: $F = C \times 1.8 + 32$.

In [None]:
temperatures = np.array([28, 17, 11, 25])
fahrenheit = (temperatures * 1.8) + 32
fahrenheit

array([82.4, 62.6, 51.8, 77. ])

---
## 3. Universal Functions (ufuncs) and Broadcasting
    

### 3.1 What are Universal Functions (ufuncs)?
**Universal Functions**, or **ufuncs**, are NumPy functions that operate on arrays in an **element-by-element** fashion. They are highly optimized for speed.

When applied to NumPy arrays, the standard arithmetic operators (`+`, `-`, `*`, `/`) call the corresponding ufuncs (`np.add`, `np.subtract`, `np.multiply`, `np.divide`).

**Common ufuncs (examples):**
- Arithmetic: `np.add`, `np.subtract`, `np.multiply`, `np.divide`
- Mathematical: `np.sin`, `np.exp`, `np.log`, `np.sqrt`
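
The mathematical ufuncs work the same way as the arithmetic ones. The short sketch below applies a few of them to a small example array and also shows two optional features: the `out` argument, which writes into an existing array, and the `reduce` method available on binary ufuncs.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])

roots = np.sqrt(x)              # element-wise square root
round_trip = np.log(np.exp(x))  # log undoes exp, element by element

# Write the result into a preallocated array instead of creating a new one
shifted = np.empty_like(x)
np.add(x, 10, out=shifted)

# Binary ufuncs have a reduce method; np.add.reduce sums the elements
total = np.add.reduce(x)

print(roots)    # [1. 2. 3.]
print(shifted)  # [11. 14. 19.]
print(total)    # 14.0
```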


In [None]:
# Create two arrays
A = np.array([1, 2, 3])
B = np.array([5, 4, 3])

# Using the * operator (shortcut for np.multiply)
multiplication_op = A * B

# Using the explicit ufunc
multiplication_ufunc = np.multiply(A, B)

print("A * B (operator):", multiplication_op)
print("np.multiply(A, B) (ufunc):", multiplication_ufunc)

A * B (operator): [5 8 9]
np.multiply(A, B) (ufunc): [5 8 9]


### 3.2 Broadcasting
**Broadcasting** is a mechanism that allows ufuncs to perform operations on arrays of **different shapes**. The smaller array is automatically expanded (or "stretched") to match the shape of the larger one, without actually copying the data.

**Example:** Adding a 1D vector to every row of a 2D matrix.

In [40]:
# Create a 1D vector
vector_1D = np.array([1, 2, 3, 4])
# Create a 2D matrix
matrix_2D = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
# Add the 1D vector to each row of the 2D matrix using broadcasting
np.add(vector_1D, matrix_2D)

array([[ 2,  4,  6,  8],
       [ 6,  8, 10, 12]])
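
Broadcasting compares shapes from the trailing axis backwards: two axes are compatible when their lengths are equal or one of them is 1. The sketch below, using made-up arrays, shows a column and a row being stretched against each other, and what happens when shapes are incompatible.

```python
import numpy as np

row = np.array([1, 2, 3, 4])     # shape (4,)
column = np.array([[10], [20]])  # shape (2, 1)

# (2, 1) and (4,) broadcast to (2, 4)
table = column + row
print(table)
# [[11 12 13 14]
#  [21 22 23 24]]

# Shapes (2, 4) and (3,) are incompatible: trailing axes are 4 and 3
mismatch_error = None
try:
    np.ones((2, 4)) + np.ones(3)
except ValueError as err:
    mismatch_error = err
print("Broadcast failed:", mismatch_error)
```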

### **Exercise 3: Broadcasting Subtraction**
Create a $4 \times 2$ matrix of random integers between 1 and 10 (inclusive). Subtract the scalar 5 from every element of the array.

In [41]:
matrix = np.random.randint(1, 11, size=(4, 2))
print("Original Matrix:")
print(matrix, "\n")
matrix_minus_5 = matrix - 5
print("Matrix after subtracting 5:")
print(matrix_minus_5)

Original Matrix:
[[6 3]
 [8 6]
 [1 3]
 [5 6]] 

Matrix after subtracting 5:
[[ 1 -2]
 [ 3  1]
 [-4 -2]
 [ 0  1]]


---
## 4. Array Manipulation and Aggregation

### 4.1 Indexing and Slicing
Accessing elements in a NumPy array is similar to Python lists, but extends to multiple dimensions.

In [42]:
matrix_2D

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [43]:
matrix_2D[0,:] # First row

array([1, 2, 3, 4])

In [44]:
matrix_2D[:2,:2] # First two rows and first two columns

array([[1, 2],
       [5, 6]])
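
One difference from Python lists is worth a quick sketch: slicing a NumPy array returns a view onto the same data, not a copy, so writing to the slice changes the original. The example below uses its own small array and also shows negative indices and step slicing.

```python
import numpy as np

m = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

last_column = m[:, -1]   # negative index: last column
every_other = m[0, ::2]  # step of 2 along the first row

# A slice is a view: modifying it modifies m as well
view = m[:, :2]
view[1, 1] = 60

# Use .copy() when an independent array is needed
independent = m[:, :2].copy()
independent[0, 0] = -1

print(last_column)  # [4 8]
print(every_other)  # [1 3]
print(m[1, 1])      # 60
print(m[0, 0])      # 1
```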

### 4.2 Aggregation Functions
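NumPy provides functions to compute statistics on arrays. The `axis` argument names the axis that is collapsed: `axis=0` aggregates down the rows and returns one value per column, while `axis=1` aggregates across the columns and returns one value per row.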

In [47]:
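# Example matrix used in the rest of this section
random_matrix = np.array([[5, 5], [3, 0], [1, 9], [2, 7]])
random_matrix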

array([[5, 5],
       [3, 0],
       [1, 9],
       [2, 7]])

In [None]:
random_matrix.sum(axis=0) # Sum of each column

array([11, 21])

In [48]:
random_matrix.sum(axis=1) # Sum of each row

array([10,  3, 10,  9])

In [49]:
np.mean(random_matrix, axis=0) # Mean of each column

array([2.75, 5.25])

In [50]:
np.mean(random_matrix, axis=1) # Mean of each row

array([5. , 1.5, 5. , 4.5])
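
The same `axis` convention applies to other aggregations. As a small sketch on the same values, the example below omits `axis` to aggregate over the whole array, uses `max` and `argmax`, and passes `keepdims=True` so that the result can be broadcast back against the original matrix.

```python
import numpy as np

m = np.array([[5, 5], [3, 0], [1, 9], [2, 7]])

total = m.sum()                 # no axis: aggregate over every element
col_max = m.max(axis=0)         # largest value in each column
row_argmax = m.argmax(axis=1)   # column index of the largest value in each row

# keepdims=True keeps the result as shape (4, 1) instead of (4,)
row_means = m.mean(axis=1, keepdims=True)
centered = m - row_means        # subtract each row's mean from that row

print(total)       # 32
print(col_max)     # [5 9]
print(row_argmax)  # [0 0 1 1]
print(centered)
```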

### 4.3 Transpose and Diagonal

The **transpose** of a matrix, $A^T$, flips the matrix over its main diagonal. Use the `.T` attribute (concise) or `np.transpose()`.

The **diagonal** of a 2D array, consisting of elements $a_{i, i}$, is extracted using `np.diagonal()`. The `offset` parameter controls which diagonal is returned.

In [51]:
random_matrix.T

array([[5, 3, 1, 2],
       [5, 0, 9, 7]])

In [53]:
np.diagonal(random_matrix)

array([5, 0])
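
The effect of `offset` is easier to see on a square matrix. In this sketch (the array is just an example), a positive offset selects a diagonal above the main one and a negative offset selects one below it.

```python
import numpy as np

square = np.arange(1, 10).reshape(3, 3)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

main = np.diagonal(square)              # elements a[i, i]
upper = np.diagonal(square, offset=1)   # elements a[i, i+1]
lower = np.diagonal(square, offset=-1)  # elements a[i+1, i]

print(main)   # [1 5 9]
print(upper)  # [2 6]
print(lower)  # [4 8]

# .T and np.transpose give the same result
print(np.array_equal(square.T, np.transpose(square)))  # True
```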

### 4.4 Boolean Masking

**Boolean Masking** is a powerful technique for selecting or modifying a subset of array elements based on a condition. The process involves two steps:
1.  **Create a Boolean Array (the Mask):** An array of `True`/`False` values is generated by comparing the array against a condition.
2.  **Apply the Mask:** The boolean array is used to index the original array, returning only the elements corresponding to `True` values.


In [None]:
testing_array = np.array([[1, 2, 3, 4, 5]])
testing_array > 3  # boolean array (the mask)

array([[False, False, False,  True,  True]])

In [None]:
testing_array[testing_array > 3] #applying the mask to get values greater than 3

array([4, 5])
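
Masks can also be combined and used for assignment. The sketch below, on an illustrative array, combines two conditions with `&` (each comparison needs its own parentheses), overwrites the selected elements in place, and counts how many elements satisfy a condition.

```python
import numpy as np

readings = np.array([1, 2, 3, 4, 5, 6])

# Combine conditions with & (and), | (or), ~ (not)
in_range = (readings > 2) & (readings < 6)
selected = readings[in_range]

# Assign through a mask to modify only the matching elements
capped = readings.copy()
capped[capped > 4] = 4

# True counts as 1, so counting matches is a one-liner
n_above_3 = np.count_nonzero(readings > 3)

print(selected)   # [3 4 5]
print(capped)     # [1 2 3 4 4 4]
print(n_above_3)  # 3
```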

### 4.5 Fancy Indexing

Fancy indexing is a NumPy technique that uses arrays of integers or boolean arrays to select arbitrary subsets of data. Unlike standard slicing, which can only select regularly spaced elements, fancy indexing allows you to:

1. Select arbitrary, non-contiguous elements.
2. Select elements in a custom, non-sorted order.

In [68]:
test_scores = np.array(
    [[75, 49, 98, 86],
    [85, 48, 48, 66],
    [79, 83, 99, 54],
    [45, 33, 23, 59],
    [73, 83, 93, 63]]
)

students_indices = np.array([0, 1, 4])  # indices of the students we want to select
test_indices = np.array([1, 2])  # indices of the tests we want to select

test_scores[students_indices[:, np.newaxis], test_indices]  # select specific students and tests

array([[49, 98],
       [48, 48],
       [83, 93]])
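
The `np.newaxis` step matters because two index arrays are broadcast against each other. As a sketch on a small example grid, the code below contrasts three cases: reordering a 1D array, selecting a rows-by-columns block (with `np.newaxis` or the equivalent helper `np.ix_`), and passing two plain index lists, which pairs them up element by element instead.

```python
import numpy as np

# 1D: pick elements in any order
letters = np.array(["a", "b", "c", "d", "e"])
picked = letters[[4, 0, 2]]

grid = np.arange(20).reshape(5, 4)
rows = np.array([0, 1, 4])
cols = np.array([1, 2])

# Block selection: every chosen row combined with every chosen column
block_newaxis = grid[rows[:, np.newaxis], cols]
block_ix = grid[np.ix_(rows, cols)]

# Paired selection: elements (0, 1) and (1, 2) only
pairs = grid[[0, 1], [1, 2]]

print(picked)    # ['e' 'a' 'c']
print(block_ix)
# [[ 1  2]
#  [ 5  6]
#  [17 18]]
print(pairs)     # [1 6]
```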

### **Exercise 4: Filtering with Masking**
Given the array `test_scores` defined above, use boolean masking to select and print only the scores that are greater than or equal to 75.

In [70]:
#Boolean mask for scores >= 75
mask = test_scores >= 75

# Apply mask to select only scores >= 75
high_scores = test_scores[mask]
high_scores

array([75, 98, 86, 85, 79, 83, 99, 83, 93])

---
## 5. More Capabilities


### 5.1 Linear Algebra

NumPy is commonly used for linear algebra; one application is solving systems of linear equations.

In [None]:
# CODE HERE
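
One possible way to fill in this cell is sketched below. The system of equations is an invented example ($2x + y = 5$ and $x + 3y = 10$), written in matrix form $A\mathbf{x} = \mathbf{b}$ and solved with `np.linalg.solve`.

```python
import numpy as np

# Coefficient matrix and right-hand side for:
#   2x +  y = 5
#    x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

solution = np.linalg.solve(A, b)
print(solution)  # [1. 3.]

# Check the answer by substituting it back in
residual_ok = np.allclose(A @ solution, b)
print(residual_ok)  # True
```

`np.linalg.solve` is generally preferable to computing `np.linalg.inv(A) @ b`, since it avoids forming the inverse explicitly.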

### 5.2 Random Numbers and Statistics

NumPy includes utilities for statistics and for drawing random numbers from common probability distributions. Here, we will perform a stock return simulation.

In [None]:
# CODE HERE
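
A minimal sketch of such a simulation follows. It assumes daily returns are independent draws from a normal distribution; the mean, volatility, starting price, and number of trading days are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded for reproducibility

# Hypothetical parameters (assumptions, not taken from real data)
n_days = 252          # roughly one year of trading days
daily_mean = 0.0005   # average daily return
daily_vol = 0.01      # daily standard deviation
start_price = 100.0

# Draw one return per day, then compound them into a price path
daily_returns = rng.normal(loc=daily_mean, scale=daily_vol, size=n_days)
prices = start_price * np.cumprod(1 + daily_returns)

print(f"Mean daily return: {daily_returns.mean():.5f}")
print(f"Std of daily returns: {daily_returns.std():.5f}")
print(f"Final simulated price: {prices[-1]:.2f}")
```

Because the generator is seeded, rerunning the cell gives the same path; change or remove the seed to see different outcomes.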

---
## 6. Image Processing with NumPy

In this exercise, we will use NumPy's array manipulation and ufuncs to convert a three-color (RGB) image into a single-channel greyscale image.

### **The Task**

1.  Load the provided three-color image data from the binary file **`nvddt.npy`** using `np.load()`.
2.  Define a function to convert the color image (shape `(H, W, 3)`) to a greyscale image (shape `(H, W)`). Use the **weighted average** formula:
    $$\text{Grayscale} = 0.2989 \cdot R + 0.5870 \cdot G + 0.1140 \cdot B$$
    Use NumPy's vectorization for this calculation.
3.  Plot the resulting greyscale image using `plt.imshow()`, specifying `cmap="gray"`.

In [None]:
# CODE HERE
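
One way to approach step 2 is sketched below. Since `nvddt.npy` is not available here, the function is demonstrated on a tiny synthetic image; the function name `rgb_to_greyscale` and the demo array are illustrative choices. The weighted average is computed as a matrix product between the last axis of the image and the weight vector, so no Python loop over pixels is needed.

```python
import numpy as np

def rgb_to_greyscale(image):
    """Convert an (H, W, 3) RGB array to an (H, W) greyscale array."""
    weights = np.array([0.2989, 0.5870, 0.1140])
    # For each pixel: 0.2989*R + 0.5870*G + 0.1140*B
    return image @ weights

# Synthetic 2x2 image: pure red, pure green, pure blue, and white
demo = np.zeros((2, 2, 3))
demo[0, 0] = [1, 0, 0]
demo[0, 1] = [0, 1, 0]
demo[1, 0] = [0, 0, 1]
demo[1, 1] = [1, 1, 1]

grey = rgb_to_greyscale(demo)
print(grey.shape)  # (2, 2)
print(grey)
```

With the lab file, the same function would be applied to the array returned by `np.load("nvddt.npy")`, and the result passed to `plt.imshow(..., cmap="gray")`.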