# NumPy Indexing and Slicing

This notebook covers how to access and manipulate elements in NumPy arrays.
Indexing and slicing are essential for feature selection, data filtering,
and preprocessing in machine learning workflows.

Focus areas:
- Basic indexing
- Slicing
- Boolean masking
- Practical ML-related examples


In [None]:
import numpy as np

We will use small, well-defined arrays to clearly demonstrate
indexing and slicing behavior.

In [None]:
# 1D array
arr_1d = np.array([10, 20, 30, 40, 50])

# 2D array
arr_2d = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

print("1D Array:", arr_1d)
print("\n2D Array:\n", arr_2d)

## Basic Indexing

NumPy uses zero-based indexing, similar to Python lists.
Indexing allows access to individual elements in an array.


In [None]:
print("First element:", arr_1d[0])
print("Last element:", arr_1d[-1])

For 2D arrays, indexing follows the format:
`array[row_index, column_index]`

In ML terms, this reads the value of one feature (column) for one sample (row).

In [None]:
# Element at row index 1, column index 2 (second row, third column)
element = arr_2d[1, 2]
print("Element at (1, 2):", element)
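
The `array[row_index, column_index]` format extends naturally to whole rows and to negative indices. The cell below is a minimal sketch of that; the array is redefined so the cell runs on its own, and the names `row`, `last`, and `chained` are illustrative only.

```python
import numpy as np

# Same 3x3 array as above, redefined so this cell is self-contained
arr_2d = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

row = arr_2d[1]         # a single index selects a whole row (one sample)
last = arr_2d[-1, -1]   # negative indices count from the end of each axis
chained = arr_2d[1][2]  # same element as arr_2d[1, 2], reached in two steps

print("Row at index 1:", row)
print("Bottom-right element:", last)
print("Chained indexing:", chained)
```

`arr_2d[1, 2]` is usually preferred over `arr_2d[1][2]` because it indexes once instead of first creating an intermediate row.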

## Slicing

Slicing selects a range of elements.
The syntax follows:
`start:stop:step`

`start` is inclusive, `stop` is exclusive, and any of the three parts may be omitted.


In [None]:
# Elements at indices 1 through 3 (index 4 is excluded)
slice_1d = arr_1d[1:4]
print("Sliced 1D array:", slice_1d)
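
The example above uses only `start` and `stop`. As a small sketch of the remaining parts of `start:stop:step`, the cell below omits some parts and uses a `step`; the variable names are illustrative.

```python
import numpy as np

arr_1d = np.array([10, 20, 30, 40, 50])

every_other = arr_1d[::2]    # start and stop omitted, step of 2
from_two = arr_1d[2:]        # stop omitted: runs to the end
reversed_arr = arr_1d[::-1]  # negative step walks the array backwards

print("Every other element:", every_other)  # [10 30 50]
print("From index 2 onward:", from_two)     # [30 40 50]
print("Reversed:", reversed_arr)            # [50 40 30 20 10]
```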

Slicing in 2D arrays enables selection of:
- Multiple rows
- Multiple columns
- Submatrices

This is frequently used during feature selection.

In [None]:
# Select first two rows and last two columns
slice_2d = arr_2d[:2, 1:]
print("Sliced 2D array:\n", slice_2d)
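
One detail worth checking when selecting a single column: an integer index drops that axis, while a slice of length one keeps it. The following sketch compares the two shapes, with the array redefined so the cell runs on its own.

```python
import numpy as np

arr_2d = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

col_1d = arr_2d[:, 1]    # integer index: result is 1D, shape (3,)
col_2d = arr_2d[:, 1:2]  # slice: result stays 2D, shape (3, 1)

print("Integer index:", col_1d, col_1d.shape)
print("Slice:\n", col_2d, col_2d.shape)
```

The 2D form is often the one you want when a downstream function expects input of shape `(n_samples, n_features)`.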

## Boolean Indexing

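Boolean indexing filters array elements based on conditions.
A comparison such as `arr_1d > 30` produces a boolean array of the same shape,
and indexing with it keeps only the elements where the mask is `True`.
This pattern is common in ML data preprocessing.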

In [None]:
# Select elements greater than 30
filtered = arr_1d[arr_1d > 30]
print("Elements greater than 30:", filtered)
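
Conditions can also be combined. The sketch below uses NumPy's element-wise operators `&` (and), `|` (or), and `~` (not); each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`. Variable names are illustrative.

```python
import numpy as np

arr_1d = np.array([10, 20, 30, 40, 50])

in_range = arr_1d[(arr_1d > 10) & (arr_1d < 50)]  # both conditions hold
extremes = arr_1d[(arr_1d < 20) | (arr_1d > 40)]  # either condition holds
not_thirty = arr_1d[~(arr_1d == 30)]              # condition negated

print("Between 10 and 50:", in_range)    # [20 30 40]
print("Below 20 or above 40:", extremes) # [10 50]
print("Everything but 30:", not_thirty)  # [10 20 40 50]
```

Python's `and` / `or` keywords do not work here, because they try to reduce a whole array to a single truth value.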

Boolean masks can be applied to 2D arrays as well.
Note that the result is a flattened 1D array of the matching elements, not a 2D array.

In [None]:
mask = arr_2d > 5
print("Boolean mask:\n", mask)

filtered_2d = arr_2d[mask]
print("Filtered elements:", filtered_2d)
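
Two related patterns show up often in preprocessing: keeping whole rows based on a condition on one column, and assigning through a mask. The cell below is a sketch of both, assuming the same 3x3 array; the threshold values are arbitrary.

```python
import numpy as np

arr_2d = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# A 1D mask built from the first column selects entire rows
row_mask = arr_2d[:, 0] > 1
rows = arr_2d[row_mask]

# Assigning through a mask changes only the matching elements;
# work on a copy so the original array is left untouched
clipped = arr_2d.copy()
clipped[clipped > 5] = 5

print("Rows whose first column exceeds 1:\n", rows)
print("Values capped at 5:\n", clipped)
```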

## Practical Example: Feature Selection

In machine learning, datasets are typically represented as:
- Rows → samples
- Columns → features

Indexing and slicing are used to select subsets of features.


In [None]:
# Example dataset: rows = samples, columns = features
X = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [6.2, 3.4, 5.4, 2.3]
])

# Select only the first two features
X_selected = X[:, :2]

print("Original dataset:\n", X)
print("\nSelected features:\n", X_selected)
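
Two further selections are sketched below on the same dataset. Treating the last column as a target is purely an assumption for illustration, and the column positions in `picked` are arbitrary. Passing a list of indices is integer-array indexing rather than slicing, and it returns a copy.

```python
import numpy as np

X = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [6.2, 3.4, 5.4, 2.3]
])

# Hypothetical split: treat the last column as the target
features = X[:, :-1]  # every column except the last
target = X[:, -1]     # last column only, as a 1D array

# Non-contiguous columns: pass a list of column indices
picked = X[:, [0, 2]]

print("Features shape:", features.shape)  # (3, 3)
print("Target:", target)
print("Columns 0 and 2:\n", picked)
```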

## View vs Copy

Basic slicing returns a *view* of the original array, not a copy,
so modifying the view also modifies the original data.
Boolean indexing, by contrast, returns a copy.

This can lead to bugs if not handled carefully.


In [None]:
# A basic slice shares memory with arr_1d
view = arr_1d[1:4]
view[0] = 999  # also overwrites arr_1d[1]

print("Modified view:", view)
print("Original array after modification:", arr_1d)

In [None]:
# Reset the array, then take an explicit copy of the slice
arr_1d = np.array([10, 20, 30, 40, 50])
copy = arr_1d[1:4].copy()
copy[0] = 999  # arr_1d is left unchanged

print("Copy:", copy)
print("Original array:", arr_1d)
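
When it is unclear whether a result is a view or a copy, it can be checked directly. The sketch below uses `np.shares_memory` and the `.base` attribute on a fresh array, named `base_arr` here to avoid touching `arr_1d`.

```python
import numpy as np

base_arr = np.array([10, 20, 30, 40, 50])

sliced = base_arr[1:4]            # basic slice
masked = base_arr[base_arr > 20]  # boolean indexing
copied = base_arr[1:4].copy()     # explicit copy

print("Slice shares memory:", np.shares_memory(base_arr, sliced))  # True
print("Mask shares memory:", np.shares_memory(base_arr, masked))   # False
print("Copy shares memory:", np.shares_memory(base_arr, copied))   # False

# A view keeps a reference to the array that owns its data
print("sliced.base is base_arr:", sliced.base is base_arr)         # True
```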

## Key Takeaways

- Indexing accesses individual elements, using `array[row_index, column_index]` for 2D arrays
- Slicing extracts subarrays as views, without copying data
- Boolean indexing filters elements by condition and returns a copy
- Feature selection relies heavily on column slicing such as `X[:, :2]`
- Understand views vs copies to avoid unintended data changes

These concepts carry over to pandas and scikit-learn workflows, both of which build on NumPy arrays.
