# Understanding NumPy
- NumPy (Numerical Python) is the foundation library for scientific computing in Python. It provides a powerful N-dimensional array object and tools for working with these arrays. 
 - Think of NumPy as the engine that powers most data science libraries - pandas uses NumPy arrays internally, scikit-learn expects NumPy arrays for machine learning, and matplotlib uses NumPy for plotting.


- Speed: NumPy operations are implemented in C, which often makes them 10-100x faster than equivalent pure-Python loops, depending on the operation and array size
- NumPy arrays store data more compactly than Python lists
- Vectorization: Perform operations on entire arrays without writing loops
- Broadcasting: Combine arrays of different but compatible shapes without writing loops
- Foundation for pandas, scikit-learn, matplotlib, and more
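
To make the vectorization point concrete, here is a minimal sketch that computes the same result with a pure-Python loop and with a single array expression. The variable names are illustrative, and no timings are shown because the actual speedup depends on array size and hardware.

```python
import numpy as np

values = list(range(1_000))

# Pure Python: square each element one at a time
squares_loop = [v * v for v in values]

# NumPy: the same computation expressed on the whole array at once
arr = np.array(values)
squares_vec = arr * arr

# Both approaches produce the same numbers
print(np.array_equal(squares_vec, squares_loop))   # True
```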

In [1]:
# import all necessary libraries 

import numpy as np
import matplotlib.pyplot as plt
import time

# Check the installed NumPy version
print(f"NumPy version: {np.__version__}")

# Display settings for cleaner output
np.set_printoptions(precision=3, suppress=True)

NumPy version: 2.3.2


**NumPy Data Structure**

- NumPy arrays are fundamentally different from Python lists:

 - Homogeneous: All elements must be the same data type
 - Fixed size: Size is determined at creation (though you can create new arrays)
 - Memory efficient: Elements stored in contiguous memory blocks
 - Vectorized operations: Mathematical operations work on entire arrays
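
A short sketch of the "homogeneous" and "vectorized" points above, assuming a small list that mixes integers and a float (the names are illustrative only):

```python
import numpy as np

# A Python list can mix types freely
mixed_list = [1, 2.5, 3]

# NumPy picks one dtype that can hold every element (here float64)
mixed_array = np.array(mixed_list)
print(mixed_array.dtype)   # float64

# Arithmetic applies to every element; on a list, * means repetition
print(mixed_array * 2)     # [2. 5. 6.]
print(mixed_list * 2)      # [1, 2.5, 3, 1, 2.5, 3]
```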

#### Creating NumPy Arrays

In [2]:
# Creating arrays from Python lists

# 1D array: a simple sequence of numbers
array1d = np.array([1, 2, 3, 4, 5])

# 2D array: This is like a matrix or table with rows and columns
array2d = np.array([[1,2,3],
                    [4, 5, 6]])


# 3D array: This is like a stack of 2D arrays. It is useful for images, time series, etc.
array3d = np.array([[[1,2, 3], [4, 5, 6]],
                    [[7, 8, 9], [10, 11, 12]]])

print("1D array:", array1d)
print("2D array:\n", array2d)
print("3D array:\n", array3d)

1D array: [1 2 3 4 5]
2D array:
 [[1 2 3]
 [4 5 6]]
3D array:
 [[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]


 When a nested list is passed into np.array(), NumPy automatically determines the dimensions. The 1D array is like a single row, the 2D array is like a spreadsheet and the 3D array is like multiple spreadsheets stacked together.
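
One way to check how NumPy interpreted the nesting is to inspect `ndim` and `shape`. The arrays below are small illustrative examples, not the ones from the cell above:

```python
import numpy as np

row = np.array([1, 2, 3])                        # one level of nesting
table = np.array([[1, 2, 3], [4, 5, 6]])         # two levels
stack = np.array([[[1, 2], [3, 4]],
                  [[5, 6], [7, 8]]])             # three levels

# Each extra level of nesting adds one dimension
for name, a in [("row", row), ("table", table), ("stack", stack)]:
    print(name, "ndim:", a.ndim, "shape:", a.shape)
```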
 

##### Creating Special Arrays in Python

In [3]:
# Creating arrays filled with zeros - Useful for initializing arrays
# Shape (3, 4) means 3 rows and 4 columns
zeros = np.zeros((3, 4))

# Creating arrays filled with ones - Often used as starting points
ones = np.ones((2, 3, 4))  # 3D array: 2 layers, 3 rows and 4 columns

# Creating empty arrays - faster than zeros/ones, but the contents are uninitialized (arbitrary values)
# Use when you will immediately fill the array with real data
empty = np.empty((2, 2))

print("Arrays with zeros in 3 by 4 shape:\n", zeros)
print("The shape of arrays with ones:\n", ones.shape)
print("Empty arrays which contains random values:\n", empty, "\n")

print(ones)

Arrays with zeros in 3 by 4 shape:
 [[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
The shape of arrays with ones:
 (2, 3, 4)
Empty arrays which contains random values:
 [[0. 0.]
 [0. 0.]] 

[[[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]]


`zeros()` and `ones()` are memory-efficient ways to create arrays of specific sizes. `empty()` is fastest but contains garbage values, so only use it when you'll immediately overwrite the contents.
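
A minimal sketch of the intended `empty()` pattern: allocate first, then overwrite every slot before reading anything. The loop is only for illustration; in practice a vectorized expression would usually replace it.

```python
import numpy as np

n = 5
result = np.empty(n)          # contents are uninitialized at this point

# Every slot is overwritten before anything reads it
for i in range(n):
    result[i] = i ** 2

print(result)                 # [ 0.  1.  4.  9. 16.]
```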

In [4]:
# Range arrays - This works like Python's range but is more powerful
range_array = np.arange(0, 12, 2)     # Values from 0 up to (but not including) 12, in steps of 2
print("Range array", range_array)

# Linearly spaced arrays - evenly spaced values over an interval
# From 0 to 20 with exactly 5 points (including endpoints)
linspace_array = np.linspace(0, 20, 5)
print("Linspace array", linspace_array)

# Logarithmically spaced arrays - useful for scientific data
# From 10^0 to 10^2 (1 to 100) with 5 points
logspace_array = np.logspace(0, 2, 5)
print("Logspace array", logspace_array)


Range array [ 0  2  4  6  8 10]
Linspace array [ 0.  5. 10. 15. 20.]
Logspace array [  1.      3.162  10.     31.623 100.   ]


- `arange()` works like Python's range() but returns a NumPy array and works with floats.
- `linspace()` returns a fixed number of evenly spaced points over an interval, with both endpoints included by default - useful for plotting smooth curves.
- `logspace()` creates points that are evenly spaced on a logarithmic scale.
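
The practical difference between the three functions can be sketched as follows. The interval and step are arbitrary values chosen so the results are easy to read:

```python
import numpy as np

# arange: you choose the step; the stop value is excluded
stepped = np.arange(0, 1, 0.25)
print(stepped)                      # [0.   0.25 0.5  0.75]

# linspace: you choose the number of points; the stop value is included
spaced = np.linspace(0, 1, 5)
print(spaced)                       # [0.   0.25 0.5  0.75 1.  ]

# logspace(a, b, n) is equivalent to 10 ** linspace(a, b, n)
log_points = np.logspace(0, 2, 5)
print(np.allclose(log_points, 10 ** np.linspace(0, 2, 5)))   # True
```

When the step is a float that cannot be represented exactly, `linspace()` is usually the safer choice because the number of points is fixed up front.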

In [None]:
# Identity matrix - ones on the diagonal, zeros elsewhere
# Essential for linear algebra operations
identity = np.eye(4)        # 4 x 4 identity matrix
print("Identity matrix\n", identity)

# Diagonal matrix: This puts values on the diagonal
diagonal = np.diag([1, 2, 3, 4])
print("\nDiagonal array\n", diagonal)

# Array filled with specified values
rand = np.random.randint(0, 10, size=4)     # 4 random integers from 0 to 9
full_array = np.full((3, 4), rand)     # 3 x 4 array: the 4 values are broadcast, so every row repeats them
print("\nFull array\n", full_array)



Identity matrix
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

Diagonal array
 [[1 0 0 0]
 [0 2 0 0]
 [0 0 3 0]
 [0 0 0 4]]

Full array
 [[6 7 6 6]
 [6 7 6 6]
 [6 7 6 6]]


Identity matrices are crucial in linear algebra - multiplying any matrix by an identity matrix of compatible size returns the original matrix.
Diagonal matrices are useful for scaling operations.
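
Both statements can be checked with the matrix-multiplication operator `@`. This is a small sketch with an arbitrary 2 x 2 matrix and scaling factors picked for readability:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
I = np.eye(2)

# Multiplying by the identity (on either side) leaves A unchanged
print(np.array_equal(A @ I, A))     # True
print(np.array_equal(I @ A, A))     # True

# A diagonal matrix scales each component of a vector independently
D = np.diag([2.0, 10.0])
v = np.array([1.0, 1.0])
scaled = D @ v
print(scaled)                       # [ 2. 10.]
```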

#### NumPy Data Types (dtypes)

* Understanding data types is crucial for memory efficiency and numerical precision

In [6]:
# Explicit data types - control memory usage and precision
int_array = np.array([1, 2, 3], dtype=np.int32)
float_array = np.array([1, 2, 3], dtype= np.float64)
bool_array = np.array([True, False, True], dtype=np.bool_)

# Type conversion - change dtype of existing array
converted = int_array.astype(np.float32)        # Convert int32 to float32

print("Integer array: ", int_array)
print("Float array: ", float_array)
print("Boolean array: ", bool_array)
print("Converted dtype array: ", converted)

# Memory usage comparison
print(f"\nint32 uses {int_array.itemsize} bytes per element")
print(f"float64 uses {float_array.itemsize} bytes per element")

Integer array:  [1 2 3]
Float array:  [1. 2. 3.]
Boolean array:  [ True False  True]
Converted dtype array:  [1. 2. 3.]

int32 uses 4 bytes per element
float64 uses 8 bytes per element


- int8: 1 byte, range -128 to 127
- int32: 4 bytes, range about ±2.1 billion
- float32: 4 bytes, ~7 decimal digits precision
- float64: 8 bytes, ~15 decimal digits precision

- Choose smaller types to save memory, larger types for precision
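
The ranges and sizes listed above do not need to be memorized; they can be queried with `np.iinfo` and compared through `nbytes`. A brief sketch, assuming a one-million-element array as the example size:

```python
import numpy as np

# Integer ranges come straight from the dtype
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)   # -128 127
print(np.iinfo(np.int32).max)                          # 2147483647

# Same number of elements, half the memory
big64 = np.zeros(1_000_000, dtype=np.float64)
big32 = big64.astype(np.float32)
print(big64.nbytes, big32.nbytes)                      # 8000000 4000000

# The cost: float32 cannot represent 0.1 as closely as float64
same = np.float32(0.1) == np.float64(0.1)
print(same)                                            # False
```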

**Array Properties & Attributes**

- Understanding array properties helps you work effectively with your data and debug issues.

In [7]:
# Create a simple 3D array for demonstration
# Think of this as 3 layers, each with 4 rows and 5 columns
array = np.random.randn(3, 4, 5)            # Shape: (3, 4, 5)

# Shape: The dimensions of the array (layers, rows, and columns)
print("Shape:", array.shape)

# Size: Total number of elements (3 x 4 x 5 = 60)
print("\nSize:", array.size)

# Ndim: Number of dimensions, 3D in this case
print("\nN-Dimension:", array.ndim)

# Dtype: Data type of each element
print("\nDtype:", array.dtype)     # Usually float64 for random numbers

# Itemsize: Memory size of each element in bytes
print("\nItemsize:", array.itemsize)       # 8 bytes for float64

# Total memory usage in bytes and KB
print("\nMemory usage:", array.nbytes, "bytes")       # size x itemsize
print("Memory usage:", array.nbytes / 1024, "KB")


Shape: (3, 4, 5)

Size: 60

N-Dimension: 3

Dtype: float64

Itemsize: 8

Memory usage: 480 bytes
Memory usage: 0.46875 KB


These properties are essential for understanding your data's structure and memory requirements. Large datasets require careful attention to memory usage.
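
Because `nbytes` is simply `size x itemsize`, the memory for an array can be estimated before allocating it. The helper below is a hypothetical convenience function (not part of NumPy) and counts only the data buffer:

```python
import math
import numpy as np

def estimated_nbytes(shape, dtype):
    """Bytes needed for the data buffer of an array with this shape and dtype."""
    return math.prod(shape) * np.dtype(dtype).itemsize

# Matches the 3 x 4 x 5 float64 example: 60 elements * 8 bytes
print(estimated_nbytes((3, 4, 5), np.float64))          # 480

# A hypothetical 10,000 x 10,000 float64 array, in MiB
print(estimated_nbytes((10_000, 10_000), np.float64) / 1024**2)
```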

**Array Indexing & Slicing**

 **Basic Indexing - Accessing Individual Elements**

- NumPy indexing is similar to Python lists but more powerful for multi-dimensional arrays

In [8]:
# 1D array indexing - similar to Python lists
array1d = np.array([10, 20, 30, 40, 50])

print("First element:", array1d[0])     # Index 0 is 10
print("Last element:", array1d[-1])     # Negative indexing: 50
print("Slice [1:4]:", array1d[1:4])     # Elements 1, 2, 3 : [20, 30, 40]
print("Every 2nd element:", array1d[::2])   # Step of 2: [10, 30, 50]

First element: 10
Last element: 50
Slice [1:4]: [20 30 40]
Every 2nd element: [10 30 50]


Negative indices count from the end (-1 is the last element). Slicing uses [start:stop:step], where stop is exclusive.
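
A few more `[start:stop:step]` combinations, sketched on the same five values. Any part of the slice can be omitted, and a negative step walks backwards:

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

print(a[:2])      # start omitted: [10 20]
print(a[3:])      # stop omitted:  [40 50]
print(a[-2:])     # last two:      [40 50]
print(a[::-1])    # negative step reverses: [50 40 30 20 10]
print(a[1:4:2])   # start 1, stop 4 (excluded), step 2: [20 40]
```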

In [9]:
# 2D array indexing - row and column access
array2d = np.array([[1, 2, 3, 4],
                    [5, 6, 7, 8],
                    [9, 10, 11, 12]])

# Access specific element [row, column]
print("Element at row 1, column 2:", array2d[1, 2])

# Access entire rows 
print("\nFirst row:", array2d[0, :])        # All columns of row 0
print("Second row:", array2d[1, :])         # All columns of row 1
print("Third row:", array2d[2, :])          # All columns of row 2

# Access entire columns
print("\nFirst column:", array2d[:, 0])     # All rows of column 0
print("Second column:", array2d[:, 1])      # All rows of column 1
print("Third column:", array2d[:, 2])       # All rows of column 2
print("Fourth column:", array2d[:, 3])      # All rows of column 3

Element at row 1, column 2: 7

First row: [1 2 3 4]
Second row: [5 6 7 8]
Third row: [ 9 10 11 12]

First column: [1 5 9]
Second column: [ 2  6 10]
Third column: [ 3  7 11]
Fourth column: [ 4  8 12]


The comma separates dimensions. `:` means "all elements along this dimension". Basic slicing returns a view of the original data, not a copy.
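
The view behavior matters in practice because writing through a slice modifies the original array. A minimal sketch, using an illustrative 3 x 4 array and `np.shares_memory` to confirm which results share a buffer:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

# A slice is a view: writing through it changes the original
first_row = grid[0, :]
first_row[0] = 99
print(grid[0, 0])                 # 99

# An explicit copy is independent of the original
row_copy = grid[1, :].copy()
row_copy[0] = -1
print(grid[1, 0])                 # 4 (unchanged)

print(np.shares_memory(grid, first_row), np.shares_memory(grid, row_copy))   # True False
```

Calling `.copy()` is the usual way to get an independent array when the original must stay untouched.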

**Advanced Indexing - Powerful Selection Methods**

In [None]:
# Fancy indexing - use arrays of indices to select elements
arr = np.array([10, 20, 30, 40, 50, 60])
indices = np.array([1, 2, 4])       # This selects elements at positions 1, 2, and 4
print("Fancy indexing:", arr[indices])

Fancy indexing: [20 30 50]


Fancy indexing lets you select elements in any order, repeat elements, and select non-contiguous elements. Very useful for data sampling and reordering.
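
The reordering and repetition mentioned above can be sketched like this. The last lines also show that, unlike a slice, a fancy-indexed result is a copy:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

print(arr[[4, 0, 2]])      # any order:        [50 10 30]
print(arr[[1, 1, 5]])      # repeats allowed:  [20 20 60]

# Unlike a slice, fancy indexing returns a copy
picked = arr[[0, 1]]
picked[0] = -1
print(arr[0])              # 10 (original is unchanged)
```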

In [30]:
# 2D fancy indexing - select specific row/column combinations       
arr2d = np.arange(12).reshape(3, 4)
print("Original 2D array:\n", arr2d)

# Select element at (row, column) pairs: (0, 1) and (2, 3)
rows = np.array([0, 2])
columns = np.array([1, 3])
print("Elements at (0, 1) and (2, 3):", arr2d[rows, columns])

Original 2D array:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Elements at (0, 1) and (2, 3): [ 1 11]


When you provide arrays for both dimensions, NumPy pairs them element-wise. This is different from slicing, which creates a rectangular subarray.
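
If the rectangular selection is what you want from index arrays, `np.ix_` builds it. The sketch below contrasts the two behaviors on the same 3 x 4 array:

```python
import numpy as np

arr2d = np.arange(12).reshape(3, 4)
rows = np.array([0, 2])
columns = np.array([1, 3])

# Paired: picks (0, 1) and (2, 3)
paired = arr2d[rows, columns]
print(paired)                           # [ 1 11]

# Rectangular: every combination of rows {0, 2} with columns {1, 3}
block = arr2d[np.ix_(rows, columns)]
print(block)
# [[ 1  3]
#  [ 9 11]]
```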

**Array Reshaping & Manipulation**

- Reshaping changes the shape used to index the same data; the values themselves and their order are unchanged.


In [None]:
# Start with 1D array
array_1D = np.arange(12)  # [0, 1, 2, ..., 11]

# To reshape to 2D, breaking it to 3 rows and 4 columns
reshaped_2d = array_1D.reshape(3, 4)
print("The 1D array is now reshaped to a 2D array with 3 rows and 4 columns:\n", reshaped_2d)

# Reshape to 3D: 2 layers x 2 rows x 3 columns
reshaped_3d = reshaped_2d.reshape(2, 2, 3)
print("Reshaped to 2 x 2 x 3:\n", reshaped_3d)

# Use -1 to let NumPy calculate one dimension automatically
auto_reshape = array_1D.reshape(4, -1)       # 4 rows; NumPy infers 3 columns. Use -1 specifically: it is the documented placeholder
print("Auto-reshaped to 4x:\n", auto_reshape)

The 1D array is now reshaped to a 2D array with 3 rows and 4 columns:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Reshaped to 2 x 2 x 3:
 [[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]]
Auto-reshaped to 4x:
 [[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]


The total number of elements must remain the same (12 in this case). Using -1 tells NumPy to calculate that dimension automatically, and only one dimension may be -1. Reshaping returns a view when possible, not a copy.
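
These three rules can be verified directly. The following is a small sketch with illustrative variable names:

```python
import numpy as np

base = np.arange(12)
as_grid = base.reshape(3, -1)      # -1 resolves to 4 because 12 / 3 = 4
print(as_grid.shape)               # (3, 4)

# For a contiguous array the reshape is a view onto the same buffer
print(np.shares_memory(base, as_grid))   # True

# A shape whose element count is not 12 is rejected
try:
    base.reshape(5, 3)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```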

In [None]:
# Flattening: This converts multi-dimensional arrays to 1D
array2d = np.array([[1, 2, 3], [4, 5, 6]])

# flatten() always returns a copy
flattened = array2d.flatten()
print("Flattened (copy):", flattened)

# ravel() returns a view if possible (faster, memory efficient)
ravel = array2d.ravel()
print("Ravel (view if possible):", ravel)

Flattened (copy): [1 2 3 4 5 6]
Ravel (view if possible): [1 2 3 4 5 6]
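
The printed results look identical, so the copy-versus-view difference only shows up when you write to them. A minimal sketch, assuming the same 2 x 3 array (which is contiguous, so `ravel()` can return a view):

```python
import numpy as np

array2d = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = array2d.flatten()
flat_view = array2d.ravel()

flat_copy[0] = 100     # does not touch array2d
print(array2d[0, 0])   # 1
after_copy_write = int(array2d[0, 0])

flat_view[0] = 100     # array2d is contiguous, so ravel returned a view
print(array2d[0, 0])   # 100
```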
