version : **v 2.0.0.rc2**  
author : G. Fuhr  
date : 27/09/2025 

Changelog:
- v 2.0.0.rc
    - added some comments in the indexing section, in particular on fancy indexing
    - formatted the Python code sections
    - added final challenges

- v 2.0.0.rc1
    - modified exercise instructions for fancy indexing

- v 2.0.0.rc2
    - fixed typos
    - added summaries at the end of sections
    - corrected some exercises

# Python and Data Representations

#### Load the usual modules for data manipulation

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mp
```

# I. Introduction to NumPy for Data Science

---

## I.1. Why NumPy?

In Python, we can represent collections of numbers with **lists**. However, lists are not very efficient for numerical computations. NumPy (Numerical Python) provides a special object called an **array**, which:

- Is much faster than lists for numerical operations.
- Comes with many built-in mathematical operations.
- Allows easy creation of vectors, matrices, and higher-dimensional data structures.

Let's get started!

---

## I.2. Importing NumPy

```python
import numpy as np  # by convention, we import numpy as np
```

---
## I.3. Creating NumPy Arrays

There are multiple ways to create arrays.

### From Python lists
```python
# 1D array (vector)
arr1d = np.array([1, 2, 3, 4, 5])
print("1D array:", arr1d)
print("Type:", type(arr1d))

# 2D array (matrix)
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print("\n2D array:")
print(arr2d)

# 3D array (tensor)
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print("\n3D array of dimension:")
print(arr3d.shape)
print("\n3D array:")
print(arr3d)

# 4D array (4D tensor)
arr4d = np.random.randint(10, size=(2, 4, 5, 6))
print("\n4D array of dimension:")
print(arr4d.shape)
print("\n4D array:")
print(arr4d)

```

Output:
```python
1D array: [1 2 3 4 5]
Type: <class 'numpy.ndarray'>

2D array:
[[1 2 3]
 [4 5 6]]

3D array of dimension:
(3, 2, 3)

3D array:
[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]

4D array of dimension:
(2, 4, 5, 6)

# Note: arr4d contains random integers, so the printed values will differ each time you run the cell
4D array:
[[[[4 4 9 1 9 9]
   [8 6 2 8 6 4]
   [8 8 1 2 7 9]
   [1 9 4 8 5 4]
   [8 4 2 7 8 7]]

  [[9 7 3 8 8 4]
   [7 6 4 3 4 0]
   [8 2 1 3 6 0]
   [1 4 5 2 8 2]
   [4 4 3 1 1 8]]

  [[1 2 1 8 1 8]
   [2 2 2 3 7 4]
   [9 9 2 8 0 5]
   [2 0 8 2 4 0]
   [6 3 5 0 5 2]]

  [[0 9 2 2 4 4]
   [5 3 4 5 6 8]
   [3 6 1 8 0 7]
   [2 7 0 9 7 5]
   [0 0 3 4 5 6]]]


 [[[6 4 8 7 5 5]
   [3 4 8 7 3 0]
   [9 9 8 2 2 5]
   [9 4 4 4 7 0]
   [0 3 8 8 7 0]]

  [[4 3 6 2 6 7]
   [9 7 8 2 7 3]
   [8 5 4 4 8 0]
   [6 6 4 0 6 9]
   [1 1 5 5 6 2]]

  [[8 4 6 0 6 3]
   [1 1 3 0 2 1]
   [3 2 1 9 2 7]
   [7 6 4 9 6 3]
   [8 8 3 0 7 0]]

  [[0 3 4 6 1 6]
   [0 3 0 5 1 4]
   [1 1 3 9 7 8]
   [2 4 3 4 4 6]
   [8 3 9 7 1 0]]]]
```

### Special arrays
```python
# Array of zeros
zeros = np.zeros((3, 4))
print("\nArray of zeros:")
print(zeros)

# Array of ones
ones = np.ones((2, 5))
print("\nArray of ones:")
print(ones)

# Identity matrix
identity = np.eye(4)
print("\nIdentity matrix:")
print(identity)

# Array with random values
rand_arr = np.random.rand(2, 3)
print("\nRandom array:")
print(rand_arr)
```
Output:
```python
Array of zeros:
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

Array of ones:
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]

Identity matrix:
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

Random array:
[[0.53932036 0.25805738 0.93616974]
 [0.53823472 0.87859448 0.99116551]]
```
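The exercises below (I.3.3) also need range-based constructors that are not shown above. A minimal sketch with arbitrary values: `np.arange` works like Python's `range` (the end is excluded), `np.linspace` returns a fixed number of evenly spaced points (the end is included by default), and `np.full` fills an array of a given shape with a single value.

```python
import numpy as np

# np.arange(start, stop, step): stop is excluded, like Python's range()
evens = np.arange(0, 10, 2)
print("arange:", evens)            # [0 2 4 6 8]

# np.linspace(start, stop, num): num evenly spaced points, stop included by default
points = np.linspace(0, 1, 5)
print("linspace:", points)         # values 0, 0.25, 0.5, 0.75, 1

# np.full(shape, fill_value): every element set to fill_value
filled = np.full((2, 3), 4)
print("full:\n", filled)
```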

---

### I.3.1. Array Dimensions and Shapes
```python
print("arr1 dimensions:", arr1d.ndim)
print("arr1 shape:", arr1d.shape)
print("arr2 dimensions:", arr2d.ndim)
print("arr2 shape:", arr2d.shape)
print("arr3 dimensions:", arr3d.ndim)
print("arr3 shape:", arr3d.shape)
print("arr4 dimensions:", arr4d.ndim)
print("arr4 shape:", arr4d.shape)
```

- `ndim` → number of dimensions
- `shape` → tuple with the size along each dimension
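The two attributes are linked, which makes a handy sanity check. In this sketch, `demo` is a hypothetical all-zeros array: `ndim` always equals the length of the `shape` tuple, and `.size` (the total number of elements) is the product of the entries of `shape`.

```python
import numpy as np

demo = np.zeros((2, 4, 5))          # hypothetical 3D array, for illustration only

print("ndim :", demo.ndim)          # 3
print("shape:", demo.shape)         # (2, 4, 5)
print("size :", demo.size)          # 40 = 2 * 4 * 5

# ndim is the number of entries in the shape tuple
print(demo.ndim == len(demo.shape))        # True
# size is the product of the sizes along each axis
print(demo.size == np.prod(demo.shape))    # True
```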

---

### I.3.2. Indexing and Slicing

NumPy indexing serves two purposes:
1. **Reading data**: Extract values to examine or use in calculations
2. **Writing data**: Modify values directly in the array

Let's explore both sides systematically.

#### Reading Data with Indexing
```python
arr1d = np.array([1, 2, 3, 4, 5])
arr2d = np.array([[1, 2, 3], [4, 5, 6]])

# Single elements
print("First element:", arr1d[0])          # → 1
print("Element at (1,2):", arr2d[1, 2])   # → 6

# Slices (ranges of elements)
print("Indices 1 to 3:", arr1d[1:4])      # → [2, 3, 4]
print("First row:", arr2d[0, :])          # → [1, 2, 3]
print("Second column:", arr2d[:, 1])      # → [2, 5]
```

#### Writing Data with Indexing
Now here's the powerful part: the same indexing syntax works for assignment.

```python
# Modify single elements
arr1d[0] = 99
print("Modified:", arr1d)                 # → [99, 2, 3, 4, 5]

# Modify ranges
arr1d[1:4] = 8
print("After slice assignment:", arr1d)   # → [99, 8, 8, 8, 5]

# Modify 2D selections
arr2d[0, :] = 42                         # Set entire first row to 42
print("Modified 2D:\n", arr2d)
```

**The pattern:** whatever you can read with indexing, you can also write to with assignment.

#### Why This Matters
This dual nature makes NumPy incredibly powerful for data manipulation:

- **Data cleaning**: `arr[arr < 0] = 0` (set negative values to zero)
- **Feature engineering**: `arr[:, 0] *= 2` (double the first column)
- **Conditional updates**: `arr[arr > threshold] = threshold` (cap extreme values)

Understanding that indexing is both a "getter" and a "setter" opens up many data processing possibilities.
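The three one-liners above can be tried on a small sample. This is only a sketch: the `data` values and the `threshold` cap are hypothetical, chosen so that each line visibly changes something.

```python
import numpy as np

# Hypothetical 2-column data with some negative and some large values
data = np.array([[ 1.0, -2.0],
                 [-3.0,  4.0],
                 [ 5.0, 60.0]])
threshold = 10.0                      # hypothetical cap value

data[data < 0] = 0                    # data cleaning: negatives -> 0
data[:, 0] *= 2                       # feature engineering: double column 0
data[data > threshold] = threshold    # conditional update: cap large values

print(data)
```

The order matters: the first column is doubled before capping, so the `5.0` in the last row becomes `10.0` and is not above the cap, while `60.0` is cut down to `10.0`.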

---

#### Recap: Indexing for Extraction vs Modification

- **Extraction**: Selecting elements to read their values.
- **Modification**: Selecting elements to change their values directly in the array.

---

### ✅ What You've Learned
- How to create NumPy arrays from Python lists
- The difference between 1D, 2D, 3D, and 4D arrays
- How to use special array creation functions: `np.zeros()`, `np.ones()`, `np.eye()`, `np.random.rand()`
- How to check array properties: `.shape`, `.ndim`
- Basic indexing with square brackets: `arr[0]`, `arr[1, 2]`
- The difference between extracting values and modifying values through indexing

**Key Concept**: NumPy arrays are faster and more powerful than Python lists for numerical operations.

### I.3.3 Exercises

Try solving these small exercises:

1. Create a 1D array named `vec_1d` containing the numbers from 10 to 19.
2. Create a 1D array named `vec_1d_step` containing the numbers from 10 to 199 with a step of 5, then print its shape and its last value.
3. Create a 3×3 array named `matrix` filled with the number 7.
4. Create a 1D array named `vec_1d_50` with 50 evenly spaced numbers between 0 and 5 (both included).
5. Access the center element of a 9×9 identity matrix named `identity_3d`.

---

✅ You now know how to:
- Create NumPy arrays (from lists, ranges, special functions).
- Check dimensions and shapes.
- Index and slice arrays.


**Do the exercises in the following cell.**


**Run this cell to check that your answers are correct.**

```python
assert vec_1d.shape == (10,), "shape for vec_1d is wrong"
assert np.array_equal(vec_1d, np.asarray([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])), "vec_1d does not contain the expected values"
assert vec_1d_step.shape == (38,), "shape for vec_1d_step is wrong"
assert vec_1d_step[-1] == 195, "vec_1d_step does not match the expected range"
assert matrix.shape == (3, 3), "shape for matrix is wrong"
assert matrix.sum() == 63, "values in matrix do not sum to 63 (3 x 3 x 7)"
assert vec_1d_50.min() == 0, "vec_1d_50 should start at 0"
assert vec_1d_50.max() == 5, "vec_1d_50 should end at 5"
assert vec_1d_50.shape == (50,), "shape for vec_1d_50 is wrong"
```


## 🔍 Before You Continue - Quick Check
Can you do these without looking back?
1. Create a 2D array with shape (3, 4)
2. Access the element in the second row, third column
3. Change the first element of an array to 99
4. Explain why `arr.shape` shows dimensions in the order it does

If any of these feel unclear, review Section I.3 before continuing.

## I.4 Understanding the `:` Operator in NumPy Slicing

The colon `:` is used to select a **range of elements** in arrays.  
It works similarly to Python lists, but it becomes much more powerful with NumPy’s multi-dimensional arrays.

---

### General Syntax
**start:end:step**


- **start** → index where the slice begins (included)  
- **end** → index where the slice stops (excluded!)  
- **step** → the increment between selected indices (e.g. `2` takes every second element)  

If you leave one of them empty, NumPy uses a **default value**:
- Empty start → begin from the start of the axis (index `0`)  
- Empty end → go all the way to the end of the axis  
- Empty step → default step is `1`

---

#### Examples with 1D Arrays
```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])
print("Array:", arr)

# 1. Slice from index 1 to 4 (excluding 4)
print("arr[1:4] ->", arr[1:4])   # [20, 30, 40]

# 2. Slice from start to index 3 (exclusive)
print("arr[:3] ->", arr[:3])     # [10, 20, 30]

# 3. Slice from index 2 to the end
print("arr[2:] ->", arr[2:])     # [30, 40, 50, 60]

# 4. Whole array (a view on the same data, not a copy)
print("arr[:] ->", arr[:])       # [10, 20, 30, 40, 50, 60]

# 5. With step: every 2 elements
print("arr[::2] ->", arr[::2])   # [10, 30, 50]

# 6. Reverse array
print("arr[::-1] ->", arr[::-1]) # [60, 50, 40, 30, 20, 10]
```

#### Examples with 2D Arrays
```python
mat = np.arange(1, 13).reshape(3, 4)
print("Matrix:\n", mat)
```

#### 1. Slice first 2 rows
```python
print("mat[:2, :] ->\n", mat[:2, :])
```

#### 2. Slice last 2 columns
```python
print("mat[:, -2:] ->\n", mat[:, -2:])
```

#### 3. Every second row
```python
print("mat[::2, :] ->\n", mat[::2, :])
```

#### 4. Submatrix (rows 0-1, cols 1-3)
```python
print("mat[0:2, 1:4] ->\n", mat[0:2, 1:4])
```




### Negative Indexing and Negative Step

Python and NumPy also support **negative indices**, which count from the end of the axis:
- `-1` → last element
- `-2` → second-to-last element

Negative values are allowed for the **start** and **end** positions of a slice, not only for the `step`.  
If `step` is negative, the array is traversed backwards.

**Key rules:**
- Negative indices are converted to `len(axis) + index` internally.
- When using a negative step, make sure the `start` index is greater than the `end` index, otherwise the result is empty.
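Both rules can be checked directly on the array used in the examples above. A short sketch: the negative index `-2` refers to the same element as `len(arr) - 2`, and a negative step with a `start` smaller than `end` produces an empty array rather than an error.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])
n = len(arr)                          # 6

# Negative index -2 is converted to n + (-2) = 4
print(arr[-2], arr[n - 2])            # 50 50

# Negative indices in a slice follow the same conversion: -4:-1 is 2:5
print(arr[-4:-1], arr[2:5])           # [30 40 50] [30 40 50]

# Negative step: start (4) is greater than end (1)
print(arr[4:1:-1])                    # [50 40 30]

# Start (1) smaller than end (4) with a negative step: empty result
print(arr[1:4:-1])                    # []
```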


### ✅ What You've Learned
- The `:` operator syntax: `start:end:step`
- How empty positions use defaults: `[:3]` = "from start to 3", `[2:]` = "from 2 to end"
- Negative indices count from the end: `[-1]` = last element, `[-2:]` = last two elements
- Negative steps reverse direction: `[::-1]` reverses the entire array
- How slicing works across multiple dimensions: `arr[1:3, 0:2]`

**Key Concept**: Slicing creates "views" of your data - you're looking at the same memory, just through different windows.

---

#### More Examples with 1D Arrays (including negative indices)
```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])
print("Array:", arr)

# Basic slices
print("arr[1:4] ->", arr[1:4])     # [20, 30, 40]
print("arr[:3] ->", arr[:3])       # [10, 20, 30]
print("arr[2:] ->", arr[2:])       # [30, 40, 50, 60]
print("arr[:] ->", arr[:])         # [10, 20, 30, 40, 50, 60]

# Step
print("arr[::2] ->", arr[::2])     # [10, 30, 50]
print("arr[::-1] ->", arr[::-1])   # reversed [60, 50, 40, 30, 20, 10]

# Negative indices
print("arr[-1] ->", arr[-1])       # 60 (last element)
print("arr[-3] ->", arr[-3])       # 40 (third last)
print("arr[-3:] ->", arr[-3:])     # [40, 50, 60]
print("arr[:-2] ->", arr[:-2])     # [10, 20, 30, 40]
print("arr[1:-1] ->", arr[1:-1])   # [20, 30, 40, 50]
print("arr[-5:-1] ->", arr[-5:-1]) # [20, 30, 40, 50]

# Negative step with explicit start:end
print("arr[4:1:-1] ->", arr[4:1:-1]) # [50, 40, 30]
```

---

#### More Examples with 2D Arrays (including negative indices)
```python
mat = np.arange(1, 13).reshape(3, 4)
print("Matrix:\n", mat)

# Row slicing
print("mat[:2, :] ->\n", mat[:2, :])     # first 2 rows
print("mat[::2, :] ->\n", mat[::2, :])   # every second row
print("mat[::-1, :] ->\n", mat[::-1, :]) # all rows in reverse row order

# Column slicing
print("mat[:, -2:] ->\n", mat[:, -2:])   # last 2 columns
print("mat[:, -1] ->", mat[:, -1])       # last column (1D)

# Combining row and column slices
print("mat[:2, 1:4] ->\n", mat[:2, 1:4]) # submatrix with first 2 rows and columns 1 to 3 included
print("mat[:2, -2:] ->\n", mat[:2, -2:]) # top-right 2x2

# Whole rows/columns selected with negative indices
print("mat[-1, :] ->", mat[-1, :])       # last row
print("mat[:, -1] ->", mat[:, -1])       # last column
```

---

**📝 Exercises: Practice with `:` and Negative Indices**

1. Create a 1D array `[0, 1, 2, …, 15]`.  
   - Extract the first 5 elements.  
   - Extract the last 5 elements (use negative indexing).  
   - Extract every 3rd element.  
   - Reverse the array using a negative step.  
   - Take the slice from index 3 up to (but excluding) index -3.  

2. Create a `4×5` array with numbers `0 → 19`.  
   - Select the first 2 rows and last 3 columns.  
   - Select all even-indexed rows.  
   - Select the submatrix consisting of rows 1–2 and columns 2–4.  
   - Select the entire last column.  
   - Select the top-right `2×2` block using negative indexing.  

3. Small puzzles to build intuition:  
   - For `arr = np.arange(10)`, what does `arr[7:2:-1]` return? Explain why.  
   - For `arr = np.arange(10)`, what does `arr[-2:2:-1]` return? Try it and reason about start/end/step.  

## I.6. Element Indexing

---
### Indexing one element in 4D
```python
print("Element at [1, 2, 3, 4] =", arr4d[1, 2, 3, 4])
```

### Slicing across dimensions
```python
print("All elements with index 0 along axis 0:", arr4d[0])
print("Shape:", arr4d[0].shape)
```
**Explanation:**
- A 4D array is harder to imagine, but conceptually it’s just an extension of the table idea.
- Each number in the brackets `[...]` selects an index along one axis.
- `arr4d[1, 2, 3, 4]` means: go to the 2nd block (axis 0 = 1), then the 3rd slice (axis 1 = 2), then the 4th row (axis 2 = 3), then the 5th column (axis 3 = 4).
- Think of it as a **nested list**: each additional axis is one more level of nesting.
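The nested-list picture can be verified in code. The sketch below uses a hypothetical array `a4` with the same shape as `arr4d`, filled with `np.arange` instead of random values so the numbers are reproducible. One bracket with four indices reaches the same element as four chained brackets, and each index you supply removes one axis from the result.

```python
import numpy as np

# Hypothetical array with the same shape as arr4d, but predictable values
a4 = np.arange(2 * 4 * 5 * 6).reshape(2, 4, 5, 6)

# One bracket with four indices == four chained brackets
print(a4[1, 2, 3, 4], a4[1][2][3][4])   # 202 202

# Each fixed index removes one axis from the result
print(a4[1].shape)          # (4, 5, 6)
print(a4[1, 2].shape)       # (5, 6)
print(a4[1, 2, 3].shape)    # (6,)
```

Prefer the single-bracket form `a4[1, 2, 3, 4]`: the chained form builds an intermediate array view at every step.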


---


## I.7. Boolean Indexing


You can select elements based on conditions.


```python
arr = np.array([5, 12, 18, 25, 30, 40])

# Extract all values greater than 20
print(arr[arr > 20])        # -> [25 30 40]

# Modify all values greater than 20
arr[arr > 20] = -1
print(arr)                  # -> [ 5 12 18 -1 -1 -1]



# Directly in one line
print("Even values:", arr[arr % 2 == 0])
```

**Explanation:**
- A boolean mask is an array of `True`/`False` values.
- Wherever the mask is `True`, the corresponding element is selected.
- This is more powerful than lists, because NumPy applies the condition to the whole array at once.
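Several extra exercises below need combined conditions (e.g. "divisible by both 2 and 3", "between 40 and 60"). A sketch on an arbitrary sample array: combine masks with `&` (and), `|` (or) and `~` (not), and put each comparison in parentheses, because `&` and `|` take precedence over comparison operators such as `>`. Python's `and`/`or` do not work here: they raise a `ValueError` saying the truth value of an array is ambiguous.

```python
import numpy as np

values = np.array([3, 8, 12, 15, 22, 30])   # arbitrary sample values

# Both conditions: greater than 5 AND even
both = values[(values > 5) & (values % 2 == 0)]
print(both)                     # [ 8 12 22 30]

# Either condition: less than 5 OR greater than 20
either = values[(values < 5) | (values > 20)]
print(either)                   # [ 3 22 30]

# Negation: everything that is NOT a multiple of 3
not_mult3 = values[~(values % 3 == 0)]
print(not_mult3)                # [ 8 22]
```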


For multi-dimensional arrays:
```python
print("\narr2 =\n", arr2d)
mask2 = arr2d > 3
print("Boolean mask:\n", mask2)
print("Values greater than 3:", arr2d[mask2])

# Modify all values greater than or equal to 2, replacing them with -1
arr2d[arr2d >= 2] = -1
print(arr2d)
```
**Explanation:**
- In 2D or higher, the mask has the same shape as the array.
- When applied, all values where the condition is true are flattened into a 1D result.
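To make the flattening concrete, a sketch with an arbitrary 2×3 array `grid`: the mask has the same shape as `grid`, while the selected values come out as a 1D array in row-by-row order (the same order as `grid.ravel()`).

```python
import numpy as np

grid = np.array([[1, 7, 2],
                 [9, 4, 8]])   # arbitrary example values
mask = grid > 3

print(mask.shape)                  # (2, 3): same shape as grid
selected = grid[mask]
print(selected, selected.shape)    # [7 9 4 8] (4,)

# Same values, in the same order, as filtering the flattened array
print(np.array_equal(selected, grid.ravel()[mask.ravel()]))   # True
```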


### ✅ What You've Learned
- How to access single elements in multi-dimensional arrays
- Boolean indexing: using conditions to filter arrays (`arr[arr > 5]`)
- Boolean masks are arrays of True/False values
- How boolean indexing flattens results into 1D arrays
- The power of vectorized comparisons: `arr > 5` applies to all elements at once

**Key Concept**: Boolean indexing lets you select data based on conditions, not just positions.

---

**Warm-up Exercises: Boolean Indexing**

- Create a 1D array of integers from 0 → 19.
    - Extract all even numbers.
    - Replace all numbers divisible by 3 with -99.

- Create a 5×5 array with values 0 → 24.
    - Extract all numbers greater than 15.
    - Replace all odd numbers with 0.


## 🔍 Before You Continue - Quick Check
Make sure you understand:
1. Basic slicing: `arr[1:3]` vs `arr[1]`
2. 2D slicing: `arr[0:2, 1:3]`
3. Boolean indexing: `arr[arr > 5]`
4. The difference between a view and a copy

**Test yourself**: If `arr = np.array([[1,2,3], [4,5,6]])`, what does `arr[:, 1]` return?
**Answer**: `[2, 5]` - the second column as a 1D array.

## I.8 Fancy Indexing (Using Lists or Arrays of Indices)

Fancy indexing means selecting elements using a **list/array of indices** instead of a single index or slice. Think of it as "I want these specific positions."


It works for:
- **1D arrays** (pick specific positions).
- **2D arrays** (pick rows/columns or individual coordinates).
- **3D+ arrays** (extend the same logic across higher dimensions).

---

### I.8.1 1D Example - Building Intuition
```python
arr = np.array([10, 20, 30, 40, 50])
print("Original:", arr)

# Instead of one position: arr[2] → 30
# We can ask for multiple positions:
print(arr[[0, 2, 4]])  # → [10, 30, 50]
print(arr[[1, 1, 3]])  # → [20, 20, 40]  # Notice: we can repeat!
```
Key insight: `[0, 2, 4]` is a list of index positions. NumPy goes to each position and collects the values.
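The list of positions does not have to be a Python list: an integer NumPy array works the same way, and the result takes the **shape of the index array**. A short sketch with the same `arr`; the index arrays are arbitrary.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

idx = np.array([4, 0, 2])          # integer array of positions
print(arr[idx])                    # [50 10 30]

# A 2D index array gives a 2D result with the same shape
idx2d = np.array([[0, 1],
                  [3, 4]])
print(arr[idx2d])                  # [[10 20]
                                   #  [40 50]]
print(arr[idx2d].shape)            # (2, 2)
```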


### I.8.2 The Critical Understanding for 2D+ Arrays

When you move to higher dimensions, **each dimension needs its own list of indices**.
```python
mat = np.arange(10, 100, 10).reshape(3, 3)
print("Matrix:\n", mat)
# [[10 20 30]
#  [40 50 60]
#  [70 80 90]]

# Get elements at positions (0,0), (1,1), (2,2) - the diagonal
print(mat[[0, 1, 2], [0, 1, 2]])  # → [10, 50, 90]
```

What's happening here?
- First list `[0, 1, 2]` controls the row indices
- Second list `[0, 1, 2]` controls the column indices
- NumPy pairs them up: `(0,0), (1,1), (2,2)`

**This is NOT going to position (0,1,2) in 3D space!**
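The pairing rule can be spelled out with a loop. In this sketch (row and column lists chosen arbitrarily), `mat[rows, cols]` returns the same values as visiting each `(row, col)` pair with `zip`:

```python
import numpy as np

mat = np.arange(10, 100, 10).reshape(3, 3)
rows = [0, 2, 1]
cols = [2, 0, 1]

fancy = mat[rows, cols]                                    # pairs (0,2), (2,0), (1,1)
looped = np.array([mat[r, c] for r, c in zip(rows, cols)])  # same pairs, one by one

print(fancy)                          # [30 70 50]
print(np.array_equal(fancy, looped))  # True
```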

### Example: 2D Fancy Indexing
```python
mat = np.arange(10, 100, 10).reshape(3, 3)

# Extract elements at positions (0,0), (1,1), (2,2)
print(mat[[0, 1, 2], [0, 1, 2]])   # diagonal -> [10, 50, 90]

# Modify them
mat[[0, 1, 2], [0, 1, 2]] = 0
print(mat)
```

👉 Reading with fancy indexing returns a **copy** of the data (unlike slicing, which returns a view). Modifying that copy leaves the original unchanged; assigning *through* fancy indexing, as in `mat[[0, 1, 2], [0, 1, 2]] = 0` above, writes directly into the original array.
More details are given at the end of this section.

### I.8.3 The Most Common Confusion (3D and 4D)

Let's tackle the example that trips up many Python beginners:
```python
tensor = np.arange(1, 49).reshape(2, 3, 4, 2)  # 4D array
print("Shape:", tensor.shape)  # (2, 3, 4, 2)

# What does this do?
result = tensor[[0,0,0], [1,1,1]]
print("Result shape:", result.shape)  # (3, 4, 2)
```

Breaking it down:
- `[0,0,0]` means "I want index 0 from axis-0, three times"
- `[1,1,1]` means "for each of those, I want index 1 from axis-1"
- We're left with all of axis-2 and axis-3 (the 4, 2 part)
- Since we asked 3 times, we get 3 copies: shape (3, 4, 2)

**The result is: tensor `[0,1,:,:]` repeated 3 times.**
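The bold statement above can be checked with the same `tensor`: stacking three copies of `tensor[0, 1]` with `np.stack` reproduces `result` exactly.

```python
import numpy as np

tensor = np.arange(1, 49).reshape(2, 3, 4, 2)
result = tensor[[0, 0, 0], [1, 1, 1]]

expected = np.stack([tensor[0, 1]] * 3)   # three copies of the (4, 2) block
print(expected.shape)                     # (3, 4, 2)
print(np.array_equal(result, expected))   # True
```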

### I.8.4 Practical Examples
```python
# Example: Extract specific "pages" from a 3D array
cube = np.arange(24).reshape(2, 3, 4)

# Get the first page twice and the second page once
pages = cube[[0, 0, 1]]  # Shape: (3, 3, 4)
print("Selected pages shape:", pages.shape)

# Example: Extract diagonal elements efficiently
matrix = np.arange(16).reshape(4, 4)
diagonal_indices = [0, 1, 2, 3]
diagonal = matrix[diagonal_indices, diagonal_indices]  # Much cleaner than a loop!
```
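A quick cross-check of the diagonal example: the fancy-indexed result matches NumPy's built-in `np.diagonal` and an explicit loop. In this sketch `np.arange(4)` replaces the hand-written index list; both give the same indices.

```python
import numpy as np

matrix = np.arange(16).reshape(4, 4)
idx = np.arange(4)                        # [0, 1, 2, 3]

diagonal = matrix[idx, idx]
print(diagonal)                           # [ 0  5 10 15]

# Same values via the built-in helper and via a loop
print(np.array_equal(diagonal, np.diagonal(matrix)))                 # True
print(np.array_equal(diagonal, [matrix[i, i] for i in range(4)]))    # True
```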


### I.8.5 Example: 4D Fancy Indexing

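Fancy indexing generalizes: to select individual elements, you need one index array per dimension. If you give fewer index arrays than there are dimensions, the remaining axes are kept whole (as in the example of I.8.3).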

```python
tensor = np.arange(1, 81).reshape(2, 2, 4, 5)

# Extract elements at positions:
# (0,0,0,0), (1,1,1,1)
print(tensor[[0, 1], [0, 1], [0, 1], [0, 1]])  # -> [ 1 67]

# Modify them
tensor[[0, 1], [0, 1], [0, 1], [0, 1]] = 999
print(tensor)

# Note the difference with:
print(tensor[[0, 0, 0, 0], [1, 1, 1, 1]])  # shape (4, 4, 5)
# Output:
# [[[21 22 23 24 25]
#   [26 27 28 29 30]
#   [31 32 33 34 35]
#   [36 37 38 39 40]]
#
#  [[21 22 23 24 25]
#   [26 27 28 29 30]
#   [31 32 33 34 35]
#   [36 37 38 39 40]]
#
#  [[21 22 23 24 25]
#   [26 27 28 29 30]
#   [31 32 33 34 35]
#   [36 37 38 39 40]]
#
#  [[21 22 23 24 25]
#   [26 27 28 29 30]
#   [31 32 33 34 35]
#   [36 37 38 39 40]]]

# Key insight: in fancy indexing, [0,0,0,0] means "index 0 along the first axis, 4 times",
# and [1,1,1,1] pairs each of them with index 1 along the second axis.
# We are therefore asking for tensor[0, 1] four times. tensor[0, 1] holds all values
# along the last two axes (a 4x5 matrix), so we get 4 copies of that same matrix.
```

👉 The principle is the same:

- Each dimension gets its own index array.
- The index arrays must have the same shape (more generally, shapes that NumPy can broadcast together), so that NumPy can pair them element by element.
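When the index arrays cannot be paired, NumPy raises an error instead of guessing. A minimal sketch with arbitrary index lists: three row indices with only two column indices fail with an `IndexError`, whereas a single integer is simply reused for every pair.

```python
import numpy as np

mat = np.arange(10, 100, 10).reshape(3, 3)

# 3 row indices but only 2 column indices: NumPy cannot pair them
try:
    mat[[0, 1, 2], [0, 1]]
    pairing_failed = False
except IndexError as err:
    pairing_failed = True
    print("IndexError:", err)

# A single integer is reused for every pair: row 0 with columns 0 and 2
row0 = mat[0, [0, 2]]
print(row0)                            # [10 30]
```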



### ✅ What You've Learned
- Fancy indexing uses lists/arrays of indices instead of single numbers
- `arr[[0, 2, 4]]` selects elements at positions 0, 2, and 4
- For 2D+ arrays, you need one index list per dimension you want to specify
- **Critical insight**: `arr[[0,0,0]]` means "give me index 0 three times" not "go to position (0,0,0)"
- Fancy indexing creates copies, while slicing creates views
- You can assign through fancy indexing: `arr[[1,3]] = 0`


### ⚠️ Important: Fancy Indexing vs Slicing (Copy vs View)

One of the **biggest differences** between slicing and fancy indexing is how NumPy treats memory.

- **Slicing** → returns a **view** (a window) on the original array.  
  If you modify the slice, you modify the original array.

- **Fancy indexing** → returns a **copy** of the data.  
  If you modify the result of fancy indexing, the original array stays unchanged.  
  However, if you use fancy indexing **on the left side of an assignment**, NumPy updates the original array.


---

### Case 1: Slicing (View)
```python
arr = np.array([10, 20, 30, 40, 50])

slice_view = arr[1:4]   # elements 20, 30, 40
slice_view[:] = 0       # modify the slice

print(slice_view)       # -> [0, 0, 0]
print(arr)              # -> [10, 0, 0, 0, 50]  <-- original changed!
```
👉 Slicing does **not** copy the data; it’s a reference (view).

### Case 2: Fancy Indexing (Copy)
```python
arr = np.array([10, 20, 30, 40, 50])

fancy_copy = arr[[1, 2, 3]]   # elements 20, 30, 40
fancy_copy[:] = 0             # modify the fancy-indexed result

print(fancy_copy)             # -> [0, 0, 0]
print(arr)                    # -> [10, 20, 30, 40, 50]  <-- original unchanged!
```

👉 Fancy indexing **creates a copy**. The original array is safe.

### Case 3: Fancy Indexing on Left Side (Assignment)
```python
arr = np.array([10, 20, 30, 40, 50])

# Assign directly using fancy indexing
arr[[1, 2, 3]] = 0

print(arr)   # -> [10, 0, 0, 0, 50]
```

👉 In this case, NumPy **updates the original array**, because we are assigning into it: no copy is returned, and the new values are written directly at the selected positions.

---

### 📋 Summary Table

| Operation                          | Behavior                           | Modifies Original? |
|-----------------------------------|------------------------------------|---------------------|
| `slice_view = arr[1:4]`           | Returns a **view** (points to data) | ✅ Yes (if modified) |
| `fancy_copy = arr[[1,2,3]]`       | Returns a **copy** (new array)      | ❌ No |
| `arr[[1,2,3]] = new_values`       | Assignment via fancy indexing       | ✅ Yes (updates directly) |

---

✅ **Key takeaway**:  
- **Slicing → view → modifies original**.  
- **Fancy indexing → copy → does NOT modify original** (unless used in assignment).  
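Instead of guessing whether a result is a view or a copy, you can ask NumPy with `np.shares_memory`, which reports whether two arrays use overlapping memory. A sketch using the array from the cases above; `.copy()` is the usual way to get an independent slice.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

slice_view = arr[1:4]          # slicing -> view
fancy_copy = arr[[1, 2, 3]]    # fancy indexing -> copy

print(np.shares_memory(arr, slice_view))   # True
print(np.shares_memory(arr, fancy_copy))   # False

# If you need an independent slice, ask for a copy explicitly
safe_slice = arr[1:4].copy()
print(np.shares_memory(arr, safe_slice))   # False
```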


**📝 Warm-up Exercises: Copy vs View**
- Create a 1D array named `arr_a` containing values `[0, 10, 20, 30, 40, 50]`.
    - Use slicing to extract `[10, 20, 30]` into a variable `dummy_var_1`.
    - Change all values in this variable to -1.
    - Print the original array and this variable. What happened?

- Create another 1D array named `arr_b` containing values `[0, 10, 20, 30, 40, 50]`.
    - Use fancy indexing (`arr_b[[1,2,3]]`) and store the result in a variable `dummy_var_2`.
    - Change all values in `dummy_var_2` to -1.
    - Print the original array. What’s different?

- Finally, use fancy indexing **on the left-hand side** of an assignment to directly set indices `[1,2,3]` of `arr_b` to -1.
    - Print the original array. What’s the result now?
  

    
**📝 Warm-up Exercises: Fancy Indexing in 3D+**

- Create a 3×3×3 cube `(np.arange(27).reshape(3,3,3))`.
    - Extract the elements `(0,0,0), (1,1,1), (2,2,2)`.
    - Replace them with -1.
- Create a 2×3×4 array.
    - Extract the first element of each 2D submatrix (positions (0,0,0) and (1,0,0)).
    - Replace them with 100.
- Create a 2×2×4×5 array.
    - Extract elements at `(0,0,0,0)` and `(1,1,1,1)`.
    - Replace them with -99.

## I.9. Finding Values (where)


Use `np.where` to find indices that match a condition.


```python
indices = np.where(arr1d > 2)
print("Indices where arr1 > 2:", indices)
print("Values:", arr1d[indices])


indices2 = np.where(arr2d % 2 == 1)
print("\nIndices of odd numbers in arr2:", indices2)
print("Values:", arr2d[indices2])
```


For 4D arrays:
```python
indices4d = np.where(arr4d % 10 == 0)
print("\nIndices in arr4d where value is multiple of 10:")
print(indices4d)


print("Corresponding values:", arr4d[indices4d])
```
**Explanation:**
- `np.where(condition)` returns the **indices** where the condition holds, as a tuple with one index array per axis.
- For example, in 2D:
    - the first array contains the row indices;
    - the second array contains the column indices.
- Applying this tuple of indices to the array retrieves the corresponding values.
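To see the per-axis grouping, here is a sketch on a small arbitrary array `small` (not `arr2d`): zipping the row and column arrays recovers the `(row, col)` coordinates, and `np.argwhere` returns the same coordinates arranged one match per row.

```python
import numpy as np

small = np.array([[1, 4, 7],
                  [2, 9, 3]])   # arbitrary example values

rows, cols = np.where(small > 3)
print(rows, cols)                                  # [0 0 1] [1 2 1]
print(list(zip(rows.tolist(), cols.tolist())))     # [(0, 1), (0, 2), (1, 1)]
print(small[rows, cols])                           # [4 7 9]

# np.argwhere: one row per match, one column per axis
print(np.argwhere(small > 3))
```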

### ✅ What You've Learned
- `np.where()` finds the indices where conditions are true
- How to combine different indexing methods for complex data selection
- How the tuple of index arrays returned by `np.where()` can be used directly to index the original array
- How to apply these concepts to solve real data analysis problems

**Key Concept**: NumPy indexing is a powerful toolkit - the right combination of slicing, boolean indexing, and fancy indexing can solve most data selection problems.

---



## I.10. Extra Exercises


### I.10.1 1D indexing and slicing
1. Create a 1D array of integers from 0 to 29. Extract every 3rd element starting at index 2.
2. Reverse the array using slicing.


### I.10.2 2D slicing
3. Create a 5×5 array filled with numbers from 0 to 24.
4. Extract the 2×2 subarray from the center.
5. Extract the last row and the last column.


### I.10.3  Boolean indexing in 2D
6. Create a 4×4 array with integer values from 10 to 25.
7. Select all elements that are multiples of 3.
8. Replace all values greater than 20 with 0.


### I.10.4  Fancy indexing
9. Create an array of shape (6,) and extract the elements at indices `[0, 2, 4]`.
10. For a 3×3 array, and then for a 10×10 array, extract the diagonal elements using fancy indexing (the `np.arange` function might help for the larger one).

### I.10.5  `np.where` practice
11. Create a 1D array with 50 random integers between 0 and 100. Find the indices of all numbers greater than 80.
12. Find the indices of all even numbers and print the corresponding values.


### I.10.6 4D indexing challenge
13. Create an array of shape (2, 3, 4, 5) filled with numbers from 0 to 119.
14. Extract the entire “slice” at indices `[0, 1, :, :]`.
15. From that same array, extract all values where the last index is equal to 2.
16. Flatten the array and find the first 10 numbers greater than 50.


---

### I.10.7  1D Arrays
17. Create a 1D array of integers from 100 down to 81. Extract every second element in reverse order.
18. Create an array of 20 random integers between 0 and 50. Replace all numbers smaller than 10 with -1.
19. Create an array of 15 elements and set all values at odd indices to 0.

### I.10.8  2D Arrays
20. Create a 6×6 array of values from 0 to 35. Extract the border (first and last rows, first and last columns).
21. From the same 6×6 array, extract the 3×3 block in the top-right corner.
22. Create a 4×4 array filled with random integers between 1 and 9. Replace all numbers less than 5 with 0 using boolean indexing.

### I.10.9 Boolean and Fancy Indexing
23. Create an array of 12 random integers between 1 and 100. Extract all values that are divisible by both 2 and 3.
24. Create a 1D array of 10 values, then select the elements at positions [1, 3, 5, 7, 9].
25. In a 5×5 array filled with consecutive numbers, extract the anti-diagonal (from top-right to bottom-left).

### I.10.10 Multi-dimensional Arrays
26. Create a 3D array of shape (3, 3, 3) filled with values from 0 to 26. Extract the “middle slice” along the second axis.
27. From the same 3D array, extract all values greater than 10 and less than 20.
28. Create a 4D array of shape (2, 2, 3, 3). Extract all the values from the last “block” (index [1, 1, :, :]).

### I.10.11 Searching and np.where
29. Create a 1D array of 30 random integers between 0 and 100. Find the indices of all values between 40 and 60.
30. Create a 2D array of shape (5, 5) filled with numbers from 0 to 24. Use np.where to find all indices where the values are odd.
31. Create a 3D array of shape (2, 3, 4). Find the indices of all numbers divisible by 5.


---


### I.10.12 🎯 Mini Challenge: Slicing and Extracting from a 4D Array


Let’s practice slicing on higher dimensions (4D arrays).


```python
# Create a 4D array: shape (3, 4, 5, 6)
data = np.arange(3*4*5*6).reshape(3, 4, 5, 6)
print("Data shape:", data.shape)
```


#### Tasks
1. Extract the first 2 blocks along the **first axis** (axis=0).
2. From each block, take only the **last 2 rows** along axis=1.
3. From each row, extract the **middle 3 columns** (axis=2).
4. From the result, keep only **even-indexed elements** along the last axis.
5. Create a sub-dataset consisting of all blocks but in **reverse order** along the first axis.
6. Try combining steps: extract a `(2, 2, 3, 3)` subarray from the original `data` using slicing only.


👉 This exercise helps you think about slicing in higher dimensions, which is crucial in data science when working with tensors (e.g., images, sequences, deep learning data).


---

```python
# I.10.1

# I.10.2

# I.10.3

# I.10.4

# I.10.5

# I.10.6

# I.10.7

# I.10.8

# I.10.9

# I.10.10

# I.10.11

# I.10.12
```

## 🔍 Before You Continue - Quick Check
You should be comfortable with:
1. Extracting specific rows or columns from 2D arrays
2. Using boolean conditions to filter data
3. Understanding what `.shape` tells you about your array
4. The difference between `arr[0]` and `arr[[0]]`

**Quick test**: Create a 3x4 array and extract the last two rows. Can you do this in multiple ways?


## I.11. Mathematical and Statistical Operations in NumPy

NumPy provides many mathematical and statistical functions that can be applied to arrays.  
A key concept here is the **axis** argument, which defines *along which dimension* the operation is performed.

---

### Basic mathematical operations

```python
arr = np.arange(1, 10).reshape(3, 3)
print("Array:\n", arr)

print("Sum of all elements:", np.sum(arr))
print("Product of all elements:", np.prod(arr))

print("Row-wise sum:", np.sum(arr, axis=1))  # sum across columns
print("Column-wise sum:", np.sum(arr, axis=0))  # sum across rows
```
Output:
```
Array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
Sum of all elements: 45
Product of all elements: 362880
Row-wise sum: [ 6 15 24]
Column-wise sum: [12 15 18]
```


**Explanation:**

Without axis, NumPy collapses the entire array into a single value.  
With axis=0, the operation is applied down the rows (i.e., column by column).  
With axis=1, the operation is applied across the columns (i.e., row by row).  
Think of axis=0 as “compressing rows” and axis=1 as “compressing columns”.  
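A practical way to remember the `axis` rule: the axis you pass is the one that disappears from the result's shape. A sketch with a hypothetical all-ones array `cube`; the `keepdims=True` option keeps the reduced axis with length 1, so the result keeps the same number of dimensions.

```python
import numpy as np

cube = np.ones((2, 3, 4))            # hypothetical array of shape (2, 3, 4)

print(np.sum(cube, axis=0).shape)    # (3, 4): axis 0 removed
print(np.sum(cube, axis=1).shape)    # (2, 4): axis 1 removed
print(np.sum(cube, axis=2).shape)    # (2, 3): axis 2 removed

# keepdims=True keeps the reduced axis with length 1
print(np.sum(cube, axis=1, keepdims=True).shape)   # (2, 1, 4)
```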

✅ Warm-up exercise:
- Create a 4×4 array with values from 1 to 16.
- Compute the sum of each row.
- Compute the product of each column.


#### Mean and average
```python
print("Mean of all elements:", np.mean(arr))
print("Row-wise mean:", np.mean(arr, axis=1))
print("Column-wise mean:", np.mean(arr, axis=0))
```
Output : 
```
Mean of all elements: 5.0
Row-wise mean: [2. 5. 8.]
Column-wise mean: [4. 5. 6.]
```

**Explanation:**
The mean is the arithmetic average: the sum of the values divided by their count.
Again, axis determines whether we compute it for rows, columns, or the whole array.
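The heading also mentions averages. `np.average` gives the same result as `np.mean` when no weights are passed, but it also accepts a `weights` argument. Here is a minimal sketch. The weights `[1, 1, 2]` are arbitrary values chosen only for illustration.

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)

# Without weights, np.average and np.mean agree
print(np.average(arr), np.mean(arr))          # 5.0 5.0

# With weights: the third column counts twice as much as the others
weights = np.array([1, 1, 2])
weighted_row_avg = np.average(arr, axis=1, weights=weights)
print("Weighted row averages:", weighted_row_avg)
```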


### ✅ What You've Learned
- Statistical functions: `np.sum()`, `np.mean()`, `np.std()`, `np.var()`
- The `axis` parameter controls which dimension to reduce
- `axis=0` operates "down the rows" (column-wise), `axis=1` operates "across columns" (row-wise)
- How to combine slicing with statistical operations for targeted analysis
- Boolean indexing for conditional statistics

**Key Concept**: The `axis` parameter is your steering wheel - it controls which direction the operation flows through your data.
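`np.std()` and `np.var()` follow exactly the same `axis` logic as `np.sum()` and `np.mean()`. The short sketch below uses the same 3×3 array. By default, both functions compute the population versions (`ddof=0`).

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)

print("Std of all elements:", np.std(arr))
print("Row-wise variance:", np.var(arr, axis=1))     # one value per row
print("Column-wise std:", np.std(arr, axis=0))       # one value per column

# ddof=1 gives the sample versions (divide by n - 1 instead of n)
print("Sample variance of each row:", np.var(arr, axis=1, ddof=1))
```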

✅ Warm-up exercise:
- Create a 3×5 random array with values between 0 and 1.
- Compute the mean of the whole array.
- Compute the mean of each column.

### Mixing slicing with operations (extended to higher dimensions)

So far, we used slicing with 2D arrays. But the idea extends naturally to higher dimensions.

#### Example with a 3D array
```python
arr3d = np.arange(2*3*4).reshape(2, 3, 4)
print("3D array shape:", arr3d.shape)
print(arr3d)

# Slice: take the first "block" (axis 0 = 0)
print("\nFirst block (arr3d[0]):\n", arr3d[0])

# Compute statistics only on that block
print("Mean of first block:", np.mean(arr3d[0]))

# Slice: take all blocks, but only the last row
print("\nLast row across all blocks:\n", arr3d[:, -1, :])
print("Sum of last row across all blocks:", np.sum(arr3d[:, -1, :]))
 ```
Output : 
```
3D array shape: (2, 3, 4)
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

First block (arr3d[0]):
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Mean of first block: 5.5

Last row across all blocks:
 [[ 8  9 10 11]
 [20 21 22 23]]
Sum of last row across all blocks: 124
```

**Explanation:**

A 3D array can be imagined as a stack of 2D tables (or “blocks”).  
`arr3d[0]` selects the first block.  
`arr3d[:, -1, :]` means: for all blocks, take the last row, all columns.  
You can combine slicing with operations to focus on very specific regions of the array.  
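You can also combine slicing with the `axis` argument. `axis` accepts a tuple, which lets you collapse several axes at once. The sketch below uses the same `arr3d` and produces one result per block instead of a single number.

```python
import numpy as np

arr3d = np.arange(2*3*4).reshape(2, 3, 4)

# One mean per block: collapse rows (axis 1) and columns (axis 2) together
block_means = np.mean(arr3d, axis=(1, 2))
print("Mean of each block:", block_means)        # shape (2,)

# Sum of the last row in each block separately (instead of one grand total)
last_row_sums = np.sum(arr3d[:, -1, :], axis=1)
print("Last-row sum per block:", last_row_sums)  # shape (2,)
```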

✅ Warm-up exercise:

- Create a 3×4×5 array with values from 0 to 59.
- Compute the mean of the second block (index 1 along axis 0).
- Compute the sum of the first column across all blocks and all rows.

#### Example with a 4D array
```python
arr4d = np.arange(2*3*4*5).reshape(2, 3, 4, 5)
print("4D array shape:", arr4d.shape)
```
- Take the first block along axis 0
```python
print("\nFirst 3D block (arr4d[0]). Shape:", arr4d[0].shape)
```

- Compute the mean of this 3D block
```python
print("Mean of first 3D block:", np.mean(arr4d[0]))
```

- Slice: take all blocks, but only the second slice along axis 1
```python
print("\nSecond slice across all 3D blocks:\n", arr4d[:, 1, :, :])
print("Sum of this slice:", np.sum(arr4d[:, 1, :, :]))
```

- Slice: take the last column (axis 3 = -1) across everything
```python
print("\nLast column across all dimensions:\n", arr4d[:, :, :, -1])
print("Mean of last column across all dimensions:", np.mean(arr4d[:, :, :, -1]))
```

**Explanation:**

A 4D array is harder to visualize, but you can think of it as a collection of 3D blocks.  
Each index in the bracket drills down one level of nesting.  
For instance, `arr4d[:, 1, :, :]` means: take all elements where axis 1 = 1, but keep everything else.  
Operations like np.sum or np.mean can then be applied only to this subarray.  
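The same idea works in 4D. Collapsing the last two axes with `axis=(2, 3)` gives one mean per 4×5 table, so the result has shape `(2, 3)`. A brief sketch:

```python
import numpy as np

arr4d = np.arange(2*3*4*5).reshape(2, 3, 4, 5)

# Mean of each 4x5 table: collapse the last two axes -> shape (2, 3)
table_means = np.mean(arr4d, axis=(2, 3))
print(table_means.shape)
print(table_means)

# Same reduction, restricted to the slice arr4d[:, 1, :, :] (shape (2, 4, 5))
slice_means = np.mean(arr4d[:, 1, :, :], axis=(1, 2))   # one mean per block
print(slice_means)
```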

✅ Warm-up exercise:
- Create a 2×3×4×6 array with numbers from 0 to 143.
- Compute the sum of the third slice (index 2 along axis 1) across all blocks.
- Compute the mean of the last column (axis 3 = -1) across the entire array.
- Compute the variance of the sub-block `arr4d[0, :2, :2, :]` (first block, first 2 slices, first 2 rows, all columns; shape `(2, 2, 6)`).

👉 The idea is always the same:

Use slicing to isolate the region of interest.  
Apply the operation (sum, mean, min, etc.) only on that slice.  
Optionally use the axis parameter to control how the operation is applied within the slice.  

--- 
### More advanced slicing: selecting multiple columns & boolean indexing

So far, we have sliced either single rows/columns or ranges of consecutive ones. NumPy also allows more flexible selections.


#### Selecting multiple columns with slicing
```python
arr2d = np.arange(1, 17).reshape(4, 4)
print("Array:\n", arr2d)

# Take rows 1 to 3 (indices 1:4), but only columns 0 and 2
print("\nRows 1-3, columns 0 and 2:\n", arr2d[1:4, [0, 2]])

# Take every second row, and the middle two columns
print("\nEvery second row, middle two columns:\n", arr2d[::2, 1:3])
```

**Explanation:**

You can select multiple non-consecutive columns (or rows) by passing a list of indices, e.g., `[0, 2]`.

You can also use slicing on columns: `1:3` means take columns 1 and 2.

This gives much more control compared to plain list slicing.
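Be careful when you pass lists for both rows and columns at the same time. NumPy pairs the two lists element by element; it does not build a sub-grid. The sketch below contrasts this behavior with `np.ix_`, NumPy's helper for building a row × column grid of indices.

```python
import numpy as np

arr2d = np.arange(1, 17).reshape(4, 4)

# Two lists are paired element by element: this picks (1, 0) and (3, 2)
pairs = arr2d[[1, 3], [0, 2]]
print(pairs)            # [ 5 15]

# To get the sub-grid "rows 1 and 3 x columns 0 and 2", use np.ix_ ...
grid = arr2d[np.ix_([1, 3], [0, 2])]
print(grid)

# ... or index the rows and the columns in two separate steps
grid_two_steps = arr2d[[1, 3]][:, [0, 2]]
```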

✅ Warm-up exercise:
- Create a 6×6 array with values from 0 to 35.
- Extract rows 2 to 4 and columns 1, 3, and 5.
- Extract every other row and the last two columns.

#### Boolean indexing with slicing (2D example)
```python
arr2d = np.arange(1, 17).reshape(4, 4)
print("Array:\n", arr2d)
```
- Boolean mask on rows: select rows where the first column > 8
```python
row_mask = arr2d[:, 0] > 8
print("\nRow mask:", row_mask)

print("Rows where first column > 8:\n", arr2d[row_mask, :])
```
- Boolean mask on columns: select columns where the column sum > 30
```python
col_mask = np.sum(arr2d, axis=0) > 30
print("\nColumn mask:", col_mask)

print("Columns where sum > 30:\n", arr2d[:, col_mask])
```

**Explanation:**

Boolean indexing can be combined with slicing.

`arr2d[:, 0] > 8` creates a mask on rows depending on values in the first column.

Similarly, you can compute statistics per column and use them to filter columns.

This is very powerful when you want to select parts of the array based on conditions rather than positions.
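You can combine several conditions with `&` (and), `|` (or) and `~` (not). Wrap each condition in parentheses, because these operators bind more tightly than comparisons. The sketch below uses the same 4×4 array. The thresholds 4 and 16 are arbitrary.

```python
import numpy as np

arr2d = np.arange(1, 17).reshape(4, 4)

# Element-wise "and" between two row conditions (parentheses are required)
mask = (arr2d[:, 0] > 4) & (arr2d[:, -1] < 16)
print("Mask:", mask)
print("Rows with first column > 4 and last column < 16:\n", arr2d[mask])

# Note: Python's `and` / `or` are not element-wise and raise a ValueError on such arrays
```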

✅ Warm-up exercise:
- Create a 5×5 array with values from 10 to 34.  
- Select all rows where the last element is greater than 30.  
- Select all columns where the mean of the column is less than 20.  

#### Boolean indexing in 3D arrays
```python
arr3d = np.arange(2*3*4).reshape(2, 3, 4)
print("3D array shape:", arr3d.shape)
```

#### Example: keep only slices (along axis 1) where the sum is greater than 80
```python
slice_mask = np.sum(arr3d, axis=(0, 2)) > 80   # slice sums are [60, 92, 124]; a threshold of 20 would keep them all
print("\nSlice mask (axis=1):", slice_mask)
print("Slices where sum > 80:\n", arr3d[:, slice_mask, :])
```

**Explanation:**

In a 3D array, boolean masks can be built along one axis.  
`np.sum(arr3d, axis=(0, 2))` collapses axes 0 and 2, leaving axis 1.  
The mask then selects which “slices” (axis 1) to keep across all blocks.  

✅ Warm-up exercise:  
- Create a 2×4×5 array with values from 0 to 39.
- Select slices (axis=1) where the maximum value is greater than 30.
- Select only the columns (axis=2) where the mean is less than 15.

---



### Applying mathematical/statistical functions on advanced slices

So far, we applied functions like `sum` or `mean` on whole arrays or simple slices.  
Now let’s combine them with **more advanced slicing techniques**.

#### 1. Operations on multiple non-consecutive columns
```python
arr = np.arange(1, 17).reshape(4, 4)
print("Array:\n", arr)

# Select columns 0 and 2 (non-consecutive) and compute row-wise sum
print("\nRow-wise sum of columns 0 and 2:", np.sum(arr[:, [0, 2]], axis=1))

# Select rows 1 and 3, compute column-wise mean
print("Column-wise mean of rows 1 and 3:", np.mean(arr[[1, 3], :], axis=0))
```

**Explanation:**

Fancy indexing (`[0, 2]` or `[1, 3]`) lets you select non-adjacent rows or columns.  
Once you have this slice, you can apply sum, mean, etc. with the usual axis argument.  
axis=1 → compress across columns (row-wise operation).  
axis=0 → compress across rows (column-wise operation).  

✅ Warm-up exercise:
- Create a 6×6 array with values from 0 to 35.  
- Compute the sum of columns 1, 3, and 5 for each row.  
- Compute the mean of rows 0, 2, and 4 for each column.  

#### 2. Operations with boolean row/column selection

```python
arr = np.arange(10, 26).reshape(4, 4)
print("Array:\n", arr)
```

- Select rows where the last element > 20, then compute mean of each row

```python
row_mask = arr[:, -1] > 20
print("\nRows where last element > 20:\n", arr[row_mask])
print("Mean of these rows:", np.mean(arr[row_mask], axis=1))
```

- Select columns where the column mean < 18, then compute sum
```python
col_mask = np.mean(arr, axis=0) < 18
print("\nColumns where mean < 18:\n", arr[:, col_mask])
print("Sum of these columns:", np.sum(arr[:, col_mask], axis=0))
```

**Explanation:**

Boolean masks filter rows or columns dynamically based on conditions.  
After filtering, the result is just another NumPy array → we can apply statistical functions.  
This allows conditional statistics (e.g., “average of rows where last element > 20”).  
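You can also apply a mask to individual elements instead of whole rows or columns. In that case, the result is a flat 1D array. Summing a boolean mask counts its `True` values. A sketch on the same array:

```python
import numpy as np

arr = np.arange(10, 26).reshape(4, 4)

# Mask on individual elements: the selection is flattened to 1D
big_values = arr[arr > 20]
print("Elements > 20:", big_values)
print("Mean of elements > 20:", np.mean(big_values))

# True counts as 1 and False as 0, so summing along axis 1 counts per row
count_per_row = np.sum(arr > 20, axis=1)
print("Count per row:", count_per_row)
```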

✅ Warm-up exercise:
- Create a 5×5 array with values from 0 to 24.
- Compute the mean of rows where the first element is even.
- Compute the sum of columns where the maximum is greater than 20.

#### 3. Operations on higher-dimensional slices (3D/4D)
```python
arr3d = np.arange(2*3*4).reshape(2, 3, 4)
print("3D array shape:", arr3d.shape)
```

- Take columns 1 and 3 across all blocks, then compute one mean per slice (axis 1)

```python
print("\nMean of columns 1 and 3, per slice (axis 1):", np.mean(arr3d[:, :, [1, 3]], axis=(0, 2)))
```

- Take only slices where the total sum > 80, then compute the variance of each
```python
slice_mask = np.sum(arr3d, axis=(0, 2)) > 80   # slice sums are [60, 92, 124]
print("\nSlices where sum > 80:\n", arr3d[:, slice_mask, :])
print("Variance of these slices:", np.var(arr3d[:, slice_mask, :], axis=(0, 2)))
```

**Explanation:**

You can select specific columns across all blocks and slices (e.g., `[:, :, [1, 3]]` selects columns 1 and 3 in every block and slice).

`axis=(0, 2)` collapses both the block and the column dimensions, leaving one value per slice (axis 1).

Boolean masks also work in 3D, allowing conditional statistics.

✅ Warm-up exercise:
- Create a 2×4×5 array with values from 0 to 39.
- Compute the mean of columns [0, 2, 4] across all blocks and slices.
- Compute the variance of slices where the maximum > 30.

#### 4. Example with 4D arrays
```python
arr4d = np.arange(2*3*4*5).reshape(2, 3, 4, 5)
print("4D array shape:", arr4d.shape)
```

- Select the last two columns across all dimensions, then compute the mean of each of these two columns
```python
print("\nMean of each of the last two columns:", np.mean(arr4d[:, :, :, -2:], axis=(0, 1, 2)))
```

- Select only the second slice along axis 1, compute sum across all blocks
```python
print("Sum of second slice (axis=1, index=1):", np.sum(arr4d[:, 1, :, :]))
```

- Boolean mask: keep slices (axis=1) whose overall mean > 50
```python
slice_mask = np.mean(arr4d, axis=(0, 2, 3)) > 50   # slice means are [39.5, 59.5, 79.5]
print("\nShape of slices where mean > 50:", arr4d[:, slice_mask, :, :].shape)
print("Std of those slices:", np.std(arr4d[:, slice_mask, :, :]))
```

**Explanation:**

With 4D arrays, slicing becomes very expressive:  
`arr4d[:, :, :, -2:]` → last two columns across all dimensions.  
`arr4d[:, 1, :, :]` → second slice along axis 1 across all blocks.  
Boolean masks let us filter entire slices, then compute statistics on the filtered subarray.

✅ Warm-up exercise:
- Create a 2×3×4×6 array with values from 0 to 143.
- Compute the mean of the last three columns across the whole array.
- Compute the sum of the first slice along axis 1.
- Compute the standard deviation of all slices (axis=1) whose mean, taken over axes 0, 2 and 3, is greater than 50.

---

## 🏆 Mini-Challenge Section: Combining Slicing, Boolean Masks, and Statistics

These challenges are designed to feel closer to real data analysis, where you combine several techniques at once.

### Challenge 1: Selective Column Analysis
Create a `6×8` array filled with numbers from 0 to 47.
1. Extract only **columns 2, 4, and 6**.
2. From these columns, compute the **mean of the rows where the first element > 20**.

---

### Challenge 2: Conditional Slice Statistics (3D)
Create a `2×4×6` array with values from 0 to 47.
1. Select only the **even-numbered columns** (axis=2).
2. From those, keep only the slices (axis=1) where the **sum is greater than 100**.
3. Compute the **variance** of the resulting subarray.

---

### Challenge 3: Targeted Analysis in 4D
Create a `2×3×4×5` array with values from 0 to 119.
1. Extract the **last two rows** (axis=2) from every slice.
2. From these rows, keep only the **columns where the column mean > 50**.
3. Compute the **standard deviation** of the final selection.



---

### Challenge 4: Realistic Data Filtering
Imagine each row of a `10×5` array represents a "sample" with 5 measured features.
1. Generate the array with random integers between 10 and 99.
2. Keep only the samples (rows) where:
   - the **first feature > 50** AND
   - the **last feature < 80**.
3. From these filtered samples, compute:
   - the **mean of each feature**,
   - the **index of the feature with the maximum mean**.

---

👉 These challenges combine:  
- **Fancy indexing and slicing** (`[2, 4, 6]` or `[:, -2:]`),  
- **Boolean masks** (conditions applied on rows/columns/slices),  
- **Mathematical/statistical functions** (`mean`, `var`, `std`, `argmax`).  

Take your time to reason about *which axis* you are reducing over at each step!

---

## I.12. Loops with NumPy Arrays in Data Science

NumPy is optimized to avoid Python loops by using **vectorized operations**.  
👉 However, there are situations where **loops are still useful**:
- When applying a custom rule that cannot be expressed with vectorized NumPy functions.
- When iterating over rows/samples (e.g., computing per-sample statistics in a dataset).
- When building algorithms that work step by step (e.g., gradient descent updates, simulations).

---

#### Example 1: Iterating over rows
```python
arr = np.arange(1, 13).reshape(4, 3)
print("Array:\n", arr)

# Compute row sums using a loop
row_sums = []
for row in arr:
    row_sums.append(np.sum(row))

print("Row sums (with loop):", row_sums)
```

**Explanation:**

Here we iterate over rows (`for row in arr`): each row is a 1D NumPy array.  
Loops make sense if you want to compute statistics row by row (per-sample analysis).
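For comparison, you can compute the same row sums without an explicit loop by reducing along `axis=1`. A minimal sketch:

```python
import numpy as np

arr = np.arange(1, 13).reshape(4, 3)

row_sums_loop = [np.sum(row) for row in arr]    # loop version (compact form)
row_sums_vec = np.sum(arr, axis=1)              # vectorized version

print(row_sums_vec)     # [ 6 15 24 33]
```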

#### Example 2: Custom element-wise transformation
```python
arr = np.array([10, 15, 20, 25, 30])
print("Array:", arr)
```

- Apply custom transformation: if value > 20, subtract 5, else add 5
```python
transformed = []
for x in arr:
    if x > 20:
        transformed.append(x - 5)
    else:
        transformed.append(x + 5)

print("Transformed array:", np.array(transformed))
```

**Explanation:**

Loops are useful for applying non-standard rules that cannot be vectorized easily.

In real-world data, you may want to apply different operations depending on conditions.
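This particular rule can in fact be vectorized. `np.where(condition, x, y)` takes values from `x` where the condition is true and from `y` elsewhere. The sketch below reproduces the loop above:

```python
import numpy as np

arr = np.array([10, 15, 20, 25, 30])

# np.where(condition, value_if_true, value_if_false), evaluated element by element
transformed = np.where(arr > 20, arr - 5, arr + 5)
print("Transformed array:", transformed)   # [15 20 25 20 25]
```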

#### Example 3: Iterating over 2D array with indices
```python
arr = np.arange(1, 10).reshape(3, 3)
print("Array:\n", arr)

# Replace diagonal elements with their squared value
for i in range(arr.shape[0]):
    arr[i, i] = arr[i, i] ** 2

print("Diagonal squared:\n", arr)
```

**Explanation:**

Sometimes you need to access both values and positions → indexing in the loop.

This is common in algorithms where position matters (e.g., updating diagonal, symmetric matrices).
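If you know all the positions to update in advance, fancy indexing can update them in one step. Here is a sketch of the diagonal example that builds the `(i, i)` index pairs with `np.arange`:

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)

# Row indices and column indices are the same: (0, 0), (1, 1), (2, 2)
idx = np.arange(arr.shape[0])
arr[idx, idx] = arr[idx, idx] ** 2
print("Diagonal squared:\n", arr)
```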

#### Example 4: Progressive statistics with loops
```python
arr = np.random.randint(1, 100, size=10)
print("Array:", arr)

running_mean = []
total = 0
for i in range(len(arr)):
    total += arr[i]
    running_mean.append(total / (i+1))

print("Running mean:", running_mean)
```

**Explanation:**

Loops allow progressive calculations, like running averages.

This is useful for data streaming or online algorithms (processing one sample at a time).
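You can also compute a running mean without a loop by using `np.cumsum`. The sketch below uses a fixed array rather than random values, so that you can check the result by hand:

```python
import numpy as np

arr = np.array([4, 8, 6, 2])   # fixed values so the result is easy to check

# Cumulative sums divided by the number of elements seen so far
running_mean = np.cumsum(arr) / np.arange(1, len(arr) + 1)
print("Running mean:", running_mean)   # [4. 6. 6. 5.]
```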

#### Example 5: Nested loops in multidimensional arrays
```python
arr = np.arange(1, 13).reshape(3, 4)
print("Array:\n", arr)

# Count even numbers using nested loops
count_even = 0
for row in arr:
    for x in row:
        if x % 2 == 0:
            count_even += 1

print("Number of even elements:", count_even)
```

**Explanation:**

Nested loops allow iteration element by element.

This is rare in NumPy-heavy code, but it is useful for seeing the connection with nested Python lists.
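For comparison, counting with a boolean array requires no loop at all. A minimal sketch using `np.count_nonzero`:

```python
import numpy as np

arr = np.arange(1, 13).reshape(3, 4)

# arr % 2 == 0 is a boolean array; count its True values
count_even = np.count_nonzero(arr % 2 == 0)
print("Number of even elements:", count_even)   # 6
```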

**📝 Exercises**

- Create a 5×5 array with numbers from 0 to 24.
    - Use a loop to compute the sum of each row.
- Generate a 1D array of 10 random integers between 1 and 100.
    - Use a loop to count how many values are divisible by 3.

- Create a 3×4 array with values from 1 to 12.
    - Write a loop that replaces all values greater than 8 with 0.
- Generate a 1D array of 15 random integers between 10 and 50.
    - Use a loop to compute a cumulative product (running product).

- Create a 4×4 array with values from 1 to 16.
    - Use a nested loop to build a new array where each element is squared if it is odd, and halved if it is even.

🏆 **Mini-Challenge: Loop-Based Data Cleaning**

Imagine you have a dataset stored in a 6×5 array (rows = samples, columns = features) with random integers between 0 and 99.
Using loops, create a cleaned dataset where:

- Any value < 10 is replaced with 10.
- Any value > 90 is replaced with 90.

After cleaning, compute for each row:

- The mean of the row,
- The index of the maximum element in that row.

Store these results in two separate arrays (`row_means`, `row_argmax`).

👉 This mimics a common real-world preprocessing step: cleaning extreme values and then extracting per-sample statistics.


---

# Mixing Loops, Slicing, and Statistical Functions

Now that we know:
- how to slice arrays to extract subparts,
- how to loop through rows/columns,
- how to compute statistics with NumPy functions,

👉 we can **combine these techniques** to perform more advanced data manipulations.  
But it’s also important to understand **when to use slicing and when to use loops**.

---

#### Example 1: Row-wise statistics with slicing inside a loop
```python
arr = np.arange(1, 21).reshape(5, 4)
print("Array:\n", arr)

row_means = []
for i in range(arr.shape[0]):     # loop over rows
    row_slice = arr[i, :]         # slice one row
    row_means.append(np.mean(row_slice))

print("Row means:", row_means)
```
**Why a loop?** We want the mean of each row separately, so we iterate row by row.

**Why slicing?** Inside the loop, `arr[i, :]` selects a single row. This is shorter and clearer than indexing every element by hand.

#### Example 2: Column filtering inside a loop
```python
arr = np.random.randint(1, 100, size=(6, 6))
print("Array:\n", arr)

selected_means = []
for j in range(arr.shape[1]):            # loop over columns
    col_slice = arr[:, j]                # slice one column
    if np.mean(col_slice) > 50:          # condition on mean
        selected_means.append(np.mean(col_slice))

print("Means of selected columns:", selected_means)
```

**Why a loop?** We need to test each column separately against a condition (mean > 50). The loop applies this condition column by column.

**Why slicing?** `arr[:, j]` extracts one column at a time, which is much cleaner than a nested loop over its elements.
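You can write the same selection without a loop. First compute all column means with `axis=0`, then filter them with a boolean mask. The sketch below keeps the columns in the same order as the loop version:

```python
import numpy as np

arr = np.random.randint(1, 100, size=(6, 6))

col_means = np.mean(arr, axis=0)             # one mean per column
selected_means = col_means[col_means > 50]   # keep those above 50, in column order
print("Means of selected columns:", selected_means)
```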

#### Example 3: Combining slicing and nested loops
```python
arr = np.arange(1, 25).reshape(4, 6)
print("Array:\n", arr)

row_stats = []
for i in range(arr.shape[0]):             # loop over rows
    row = arr[i, :]                       # slice row
    stats = []
    for j in range(0, arr.shape[1], 2):   # loop over columns in steps of 2
        sub_slice = row[j:j+2]            # slice 2 consecutive columns
        stats.append(np.std(sub_slice))   # compute std deviation
    row_stats.append(stats)

print("Row-wise stats:", row_stats)
```

Why an outer loop? We need to repeat the same operation for each row.  
Why an inner loop? We also want to split each row into smaller column segments (pairs of values). Looping with step size 2 is natural here.  
Why slicing? Inside the inner loop, `row[j:j+2]` selects exactly the two columns we want in each segment.  

**Exercises**

For each exercise, think:  
Do I need a loop? (Am I repeating the same process row by row, or column by column?)  
Do I need a slice? (Am I extracting a subpart of the row/column before applying a statistic?)

**Ex 1.**
- Create an 8×5 array with random integers between 10 and 99.
    - Loop over the rows. For each row, compute the maximum of the middle three columns (use slicing).
    - Store results in a list.
(Loop = needed because we want a result for each row. Slice = needed to get the middle 3 columns of each row.)

**Ex 2.**
- Create a 10×6 array with numbers from 0 to 59.
    - For each column, compute the mean of the first 5 rows only (slice + loop).
    - Collect all column means.
(Loop = needed because we check column by column. Slice = restrict to first 5 rows.)

**Ex 3.**
- Create a 6×6 array with values from 1 to 36.
    - For each row, slice it into two halves.
    - Compute the sum of each half and store them in a tuple (sum_left, sum_right).
    - Collect all tuples in a list.
(Loop = needed for row-wise processing. Slice = to split row into halves.)

**Ex 4.**
- Create a 4×8 array with random integers between 1 and 50.
    - For each row, consider only the even-indexed columns (0, 2, 4, 6).
    - Compute the variance of these values.
(Loop = needed for row-by-row analysis. Slice = to select even-indexed columns.)

**Ex 5.**
- Create a 5×5 array with random integers between 0 and 20.
    - Loop over columns. For each column:
        - Keep only the values greater than 10 (boolean slicing).
        - Compute the mean of those filtered values.
    - Collect all column means.
(Loop = needed for column-by-column processing. Slice = used twice: once to extract the column, once to apply the boolean mask.)

### Mini-Challenge: Segment Analysis with Loops + Slicing + Statistics

1. Create a `12×6` array with random integers between 0 and 99.  
2. For each row:  
   - Slice it into **three segments** of equal length (2 columns each).  
   - For each segment, compute the **standard deviation**.  
3. Store the results in a nested list, where each row is represented as `[std_segment1, std_segment2, std_segment3]`.  
4. Finally, compute the **average standard deviation across all rows and segments**.

**Why loops?** We need to repeat the segmentation for every row.  
**Why slicing?** We must extract each 2-column segment before applying the statistical function.  

---

#### ✅ Vectorized Solution (Alternative)

Loops are clear and explicit, but NumPy allows us to do this without writing loops manually:

```python
import numpy as np

# Step 1: create the array
arr = np.random.randint(0, 100, size=(12, 6))
print("Array:\n", arr)

# Step 2: reshape so that each row is split into 3 segments of 2 columns
reshaped = arr.reshape(arr.shape[0], 3, 2)  # shape (12, 3, 2)

# Step 3: compute std deviation along the last axis (the 2-column segments)
segment_stds = np.std(reshaped, axis=2)     # shape (12, 3)
print("\nStandard deviation per segment (vectorized):\n", segment_stds)

# Step 4: compute average std deviation across all rows and segments
average_std = np.mean(segment_stds)
print("\nAverage standard deviation:", average_std)
```

Here:

- `reshape` automatically splits each row into 3 segments of size 2.
- `np.std(..., axis=2)` computes the standard deviation inside each segment in one go.
- No explicit Python `for` loop is needed.

#### 🧭 Choosing Between Loops and Vectorization

When deciding, ask yourself:

Can the operation be expressed as a bulk mathematical operation?  
→ If yes, prefer vectorization (faster, more “NumPy-style”).  
Example: computing the mean/std of entire rows or reshaping.

Do you need to apply a custom rule that is hard to write with NumPy?  
→ Then use loops.  
Example: applying different transformations depending on multiple conditions.

Hybrid approach: You can often use slicing + vectorized functions inside a loop.  
This balances clarity (loop over rows) and efficiency (vectorized statistics on each slice).

👉 Rule of thumb:

Use vectorization first if possible.

Use loops if the operation is not naturally vectorizable, or if clarity matters more than speed.
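You can check the speed claim on your own machine with the rough sketch below, which uses `time.perf_counter`. The array size is arbitrary, and timings vary from one computer to another, so no numbers are given here.

```python
import time
import numpy as np

arr = np.random.randint(0, 100, size=(20_000, 6))

# Loop version: one np.mean call per row
start = time.perf_counter()
means_loop = np.array([np.mean(arr[i, :]) for i in range(arr.shape[0])])
t_loop = time.perf_counter() - start

# Vectorized version: a single call reducing axis 1
start = time.perf_counter()
means_vec = np.mean(arr, axis=1)
t_vec = time.perf_counter() - start

print("Same result:", np.allclose(means_loop, means_vec))
print(f"Loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s")
```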

# Functions with NumPy Arrays

In data science, we often repeat the same analysis multiple times.  
Instead of rewriting code each time, we can **wrap operations into functions**.  

This makes the code:
- Reusable
- Easier to read
- Easier to test on different data

---

## Part 1: Wrapping slicing + statistics into functions

#### Example 1: Row-wise mean
```python
import numpy as np

def row_means(arr):
    """Return the mean of each row in a 2D array."""
    means = []
    for i in range(arr.shape[0]):   # loop over rows
        row = arr[i, :]             # slice the row
        means.append(np.mean(row))
    return np.array(means)

arr = np.arange(1, 13).reshape(4, 3)
print("Array:\n", arr)
print("Row means:", row_means(arr))
```
Here we encapsulated the loop + slicing + statistic into a single function.


#### Example 2: Column filtering by condition

```python
def filter_columns_by_mean(arr, threshold=50):
    """Return only the columns whose mean is above a threshold."""
    selected_cols = []
    for j in range(arr.shape[1]):      # loop over columns
        col = arr[:, j]                # slice column
        if np.mean(col) > threshold:   # check condition
            selected_cols.append(col)
    if not selected_cols:              # no column passed: return an empty (rows, 0) array
        return np.empty((arr.shape[0], 0), dtype=arr.dtype)
    return np.array(selected_cols).T   # columns were stacked as rows, so transpose back

arr = np.random.randint(1, 100, size=(6, 6))
print("Array:\n", arr)
print("Filtered columns:\n", filter_columns_by_mean(arr, 50))
```
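For comparison, here is a vectorized variant that replaces the loop with a column mask. The name `filter_columns_by_mean_vectorized` is introduced only for this sketch. Like the loop version, it returns a 2D array even when no column passes.

```python
import numpy as np

def filter_columns_by_mean_vectorized(arr, threshold=50):
    """Hypothetical vectorized variant: keep the columns whose mean is above threshold."""
    col_mask = np.mean(arr, axis=0) > threshold   # one boolean per column
    return arr[:, col_mask]                       # stays 2D, even with zero columns

arr = np.array([[10, 60, 90],
                [20, 70, 80]])
print(filter_columns_by_mean_vectorized(arr, 50))
```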

## Part 2: Creating custom functions to apply to arrays

We can also apply our own Python functions to individual elements, rows, or slices of an array.

#### Example 3: Custom element-wise function
```python
def custom_transform(x):
    """If x > 20, subtract 5; else add 5."""
    if x > 20:
        return x - 5
    else:
        return x + 5

arr = np.array([10, 15, 25, 30])
print("Original:", arr)
```

- Apply the function to each element with a loop (written here as a list comprehension)
```python
transformed = np.array([custom_transform(v) for v in arr])
print("Transformed:", transformed)
```

#### Example 4: Custom row-level function
```python
def row_range(row):
    """Return the difference between max and min of a row."""
    return np.max(row) - np.min(row)

arr = np.random.randint(0, 50, size=(5, 4))
print("Array:\n", arr)

row_results = [row_range(arr[i, :]) for i in range(arr.shape[0])]
print("Row ranges:", row_results)
```
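NumPy also provides two helpers for this pattern. `np.vectorize` wraps a scalar function so that it accepts arrays. `np.apply_along_axis` calls a function on each 1D slice along a given axis. Both still loop in Python internally, so they make the code more convenient but are not expected to make it faster. The sketch below reuses the two functions defined above:

```python
import numpy as np

def custom_transform(x):
    """If x > 20, subtract 5; else add 5."""
    return x - 5 if x > 20 else x + 5

def row_range(row):
    """Return the difference between max and min of a row."""
    return np.max(row) - np.min(row)

# np.vectorize: call custom_transform on every element
vec_transform = np.vectorize(custom_transform)
transformed = vec_transform(np.array([10, 15, 25, 30]))
print(transformed)                                   # [15 20 20 25]

# np.apply_along_axis: call row_range on every 1D slice along axis 1 (each row)
arr = np.array([[3, 9, 1], [4, 4, 8]])
ranges = np.apply_along_axis(row_range, 1, arr)
print(ranges)                                        # [8 4]
```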

#### Warm-Up Exercises
- **Part 1**: Functions for slicing + statistics
    - Write a function `column_variances(arr)` that returns the variance of each column in a 2D array.
    - Write a function `max_of_middle(arr)` that returns the maximum value of the middle column(s) of the array.
    - Write a function `row_halves_sum(arr)` that, for each row, returns a tuple `(sum_left, sum_right)` of the sums of the two halves.

- **Part 2**: Functions for custom transformations
    - Write a function `threshold_clip(x, min_val, max_val)` that clips a single value between `min_val` and `max_val`. Apply it to all elements of an array.
    - Write a function `normalize_row(row)` that subtracts the mean of the row and divides by its standard deviation. Apply it row by row to a 2D array.
    - Write a function `custom_replace(x)` that replaces even numbers with -1 and odd numbers with 1. Apply it to a random integer array.

## Part 3: Mixing Both Ideas (Guided Exercises)

These exercises combine:  
- Writing functions,  
- Using slicing and/or loops,  
- Applying NumPy statistics.  

---

### Exercise 1: Segment Standard Deviation
We want a function that splits each row into **segments of 2 columns** and computes the standard deviation for each segment.  

1. Create a random `6×6` array.  
2. Slice the first row into 3 segments of 2 columns each. Print them.  
3. Compute the standard deviation of each segment.  
4. Wrap the above steps into a function `segment_std(arr)` that works for the whole array.  
   - Output: a new 2D array where each row has the standard deviations of its segments.  

---

### Exercise 2: Filtering Rows by Sum
We want a function that **keeps only rows with sum above a threshold**.  

1. Generate a `5×5` array with random integers between 0 and 20.  
2. Write code to compute the sum of each row.  
3. Use boolean indexing to select rows with sum > 40.  
4. Wrap the logic into a function `filter_rows_by_sum(arr, threshold)`.  
   - Input: array + threshold  
   - Output: new array with only rows that pass the condition.  
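The steps map directly onto a row sum, a comparison, and a boolean index. Below is a minimal sketch; reading "between 0 and 20" as inclusive of 20 is our assumption, hence the exclusive upper bound of 21 passed to `np.random.randint`.

```python
import numpy as np

def filter_rows_by_sum(arr, threshold):
    """Keep only the rows of a 2D array whose sum is strictly greater than threshold."""
    row_sums = arr.sum(axis=1)    # one sum per row, shape (n_rows,)
    mask = row_sums > threshold   # boolean array, shape (n_rows,)
    return arr[mask]              # boolean indexing keeps the True rows

arr = np.random.randint(0, 21, size=(5, 5))   # 0..20 inclusive: the upper bound is exclusive

print("Row sums:", arr.sum(axis=1))
print("Mask:", arr.sum(axis=1) > 40)
print(filter_rows_by_sum(arr, 40))
```

If no row passes, the result is an empty array of shape `(0, 5)` rather than an error.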

---

### Exercise 3: Custom Row Score
We define a "score" for each row as:  

$$\text{score} = \frac{\text{mean(row)} + \text{max(row)}}{\text{min(row)}}$$

1. For a single row, slice it and compute its mean, max, and min.  
2. Combine them into the formula `(mean + max) / min`.  
3. Test it on one row of an array, making sure `min(row)` is not zero, since the formula divides by it.  
4. Wrap it into a function `custom_score(row)` that works for a single row.  
5. Apply this function row by row to a 2D array, storing the results in a list or NumPy array.  
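A possible sketch of the score. Raising `ValueError` when the minimum is 0 is our own choice for handling the undefined case; the sample array is drawn from 1 to 9 so that case never occurs here.

```python
import numpy as np

def custom_score(row):
    """(mean + max) / min of a 1D row; raises if min(row) is 0 (our choice)."""
    row_min = row.min()
    if row_min == 0:
        raise ValueError("score is undefined when min(row) == 0")
    return (row.mean() + row.max()) / row_min

arr = np.random.randint(1, 10, size=(4, 5))   # values 1..9, so min is never 0

# Steps 1-3 on a single row
row = arr[0, :]
print(row.mean(), row.max(), row.min(), custom_score(row))

# Step 5: row by row, collected into a NumPy array
scores = np.array([custom_score(arr[i, :]) for i in range(arr.shape[0])])
print(scores)
```

For example, the row `[2, 4, 6]` has mean 4, max 6 and min 2, so its score is (4 + 6) / 2 = 5. The same scores can be computed for all rows at once with `(arr.mean(axis=1) + arr.max(axis=1)) / arr.min(axis=1)`.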



# Final Challenges

## 🧩 Advanced Practice: Working with 4D Arrays

We will use a base dataset for all exercises:

```python
import numpy as np

# Shape (3, 4, 5, 6) = 3 blocks, each 4 rows × 5 cols × 6 values
data = np.random.randint(1, 100, size=(3, 4, 5, 6))
print("Data shape:", data.shape)
```


---

### **Exercise 1: Basic Extraction + Stats (★☆☆)**
- Extract the last two blocks (axis = 0).
- From these, keep only the first two rows (axis = 1).
- Compute the **mean** of each `(5, 6)` slice.
- Compare your result with the mean of the whole selection computed directly with a NumPy function such as `np.mean`.
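"Compare" can be read two ways, so this sketch checks both. First, the per-slice means from explicit loops should match a single `mean` call over axes 2 and 3. Second, because every `(5, 6)` slice holds the same number of values, the average of the slice means equals the global mean of the selection.

```python
import numpy as np

data = np.random.randint(1, 100, size=(3, 4, 5, 6))

selection = data[-2:, :2]   # last two blocks, first two rows -> shape (2, 2, 5, 6)

# Mean of each (5, 6) slice with explicit loops -> shape (2, 2)
loop_means = np.empty(selection.shape[:2])
for b in range(selection.shape[0]):
    for r in range(selection.shape[1]):
        loop_means[b, r] = selection[b, r].mean()

# One NumPy call: average over the last two axes together
direct_means = selection.mean(axis=(2, 3))
print(np.allclose(loop_means, direct_means))

# Every slice has 30 values, so the mean of the slice means is the global mean
print(np.isclose(loop_means.mean(), np.mean(selection)))
```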

---

### **Exercise 2: Loop over Blocks (★☆☆)**
- Write a **for loop** that goes through each block (axis = 0).
- Inside the loop:
  - Extract the last column (axis = 3).
  - Compute the **maximum** of this column for each row (axis = 1).
- Collect results into a new 2D array of shape `(3, 4)`.
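One way to write the loop is sketched below. It assumes "the maximum of this column for each row (axis = 1)" means one maximum per index along axis 1, taken over the 5 entries of axis 2. That reading is what produces the requested `(3, 4)` shape.

```python
import numpy as np

data = np.random.randint(1, 100, size=(3, 4, 5, 6))

results = []
for block in data:                          # block has shape (4, 5, 6)
    last_col = block[:, :, -1]              # last index of axis 3 -> shape (4, 5)
    results.append(last_col.max(axis=1))    # max over the 5 entries -> shape (4,)

col_max = np.array(results)                 # stack the 3 results -> shape (3, 4)
print(col_max.shape)
```

The loop-free equivalent is `data[:, :, :, -1].max(axis=2)`: once axis 3 is removed by the integer index, the old axis 2 becomes axis 2 of the resulting 3D array.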

---

### **Exercise 3: Mixing Slicing and Custom Function (★★☆)**
- Define a function `row_range(arr)` that takes a 2D slice `(5, 6)` and returns the **range** (maximum − minimum) of each row.
- Apply this function to **every row of every block** in the dataset.
- Store the results in a new array of shape `(3, 4, 5)`.
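A minimal sketch that loops over blocks and rows and fills a preallocated `(3, 4, 5)` array, one `(5, 6)` slice at a time:

```python
import numpy as np

def row_range(arr):
    """Range (max - min) of each row of a 2D slice; a (5, 6) slice gives shape (5,)."""
    return arr.max(axis=1) - arr.min(axis=1)

data = np.random.randint(1, 100, size=(3, 4, 5, 6))

ranges = np.empty(data.shape[:3], dtype=data.dtype)   # shape (3, 4, 5)
for b in range(data.shape[0]):
    for r in range(data.shape[1]):
        ranges[b, r] = row_range(data[b, r])          # data[b, r] has shape (5, 6)

print(ranges.shape)
```

Since `row_range` only reduces the last axis of its input, the whole computation collapses to `data.max(axis=3) - data.min(axis=3)`, which NumPy also provides as `np.ptp(data, axis=3)`.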

---

### **Exercise 4: Axis Control in Statistics (★★☆)**
- Using slicing, select only the **middle 3 columns** along axis = 2.
- Compute the **standard deviation** along axis = 3 (last axis).
- Then, write a **custom function** to compute the same result using a **for loop** over the last axis.
- Compare the outputs of the NumPy function and your custom function.
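A possible sketch follows. The helper name `std_last_axis` is hypothetical. It computes the population standard deviation (`ddof=0`, the default of `np.std`) with two explicit passes over the last axis: one for the mean and one for the squared deviations.

```python
import numpy as np

data = np.random.randint(1, 100, size=(3, 4, 5, 6))

middle = data[:, :, 1:4, :]     # middle 3 of the 5 indices on axis 2 -> (3, 4, 3, 6)
np_std = middle.std(axis=3)     # NumPy reference -> (3, 4, 3)

def std_last_axis(arr):
    """Population std (ddof=0) along the last axis, looping over that axis."""
    n = arr.shape[-1]
    total = np.zeros(arr.shape[:-1])
    for k in range(n):                       # first pass: accumulate the sum
        total += arr[..., k]
    mean = total / n
    sq_dev = np.zeros(arr.shape[:-1])
    for k in range(n):                       # second pass: squared deviations
        sq_dev += (arr[..., k] - mean) ** 2
    return np.sqrt(sq_dev / n)

custom = std_last_axis(middle)
print(np.allclose(np_std, custom))
```

`np.allclose` is used instead of `==` because the two methods may differ in the last floating-point digits.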

---

### **Exercise 5: Mini Challenge (★★★)**
1. From the dataset, extract only the blocks with **even index** along axis = 0.  
2. For each extracted block, keep only rows with indices `1` and `3`.  
3. Write a function `custom_score(arr)` that, for a `(5, 6)` slice:  
   - Computes the mean of even-indexed columns,  
   - Adds the maximum value of odd-indexed columns,  
   - Divides the result by the minimum value in the slice.  
4. Apply this function across all selected slices (with slicing + loops or vectorized methods).  
5. Return an array summarizing the score for each `(5, 6)` slice.  
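A sketch of one solution, showing both the loop and the vectorized route. It assumes "the mean of even-indexed columns" is a single mean over all values in columns 0, 2 and 4. Note that this `custom_score` takes a `(5, 6)` slice, unlike the row-based function of the same name in the guided exercises.

```python
import numpy as np

def custom_score(arr):
    """(mean of even-indexed columns + max of odd-indexed columns) / min of a (5, 6) slice."""
    even_cols = arr[:, 0::2]    # columns 0, 2, 4
    odd_cols = arr[:, 1::2]     # columns 1, 3, 5
    return (even_cols.mean() + odd_cols.max()) / arr.min()

data = np.random.randint(1, 100, size=(3, 4, 5, 6))   # values >= 1, so min is never 0

selected = data[::2][:, [1, 3]]   # blocks 0 and 2, then rows 1 and 3 -> (2, 2, 5, 6)

# Loop version: one score per (5, 6) slice
scores = np.empty(selected.shape[:2])
for b in range(selected.shape[0]):
    for r in range(selected.shape[1]):
        scores[b, r] = custom_score(selected[b, r])

# Vectorized version: reduce over the last two axes at once
vec_scores = (selected[..., 0::2].mean(axis=(2, 3))
              + selected[..., 1::2].max(axis=(2, 3))) / selected.min(axis=(2, 3))
print(scores.shape, np.allclose(scores, vec_scores))
```

As a hand check, for `np.arange(1, 31).reshape(5, 6)` the even-indexed columns average 15, the largest odd-indexed value is 30 and the minimum is 1, giving a score of 45.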

---