# NumPy Arrays and dtypes

This notebook introduces NumPy arrays, their core properties, and why they are
fundamental to machine learning workflows.

Focus areas:
- NumPy arrays vs Python lists
- Array shape, dimension, and dtype
- Why numerical types matter in ML


In [1]:
import numpy as np

## Python Lists vs NumPy Arrays

Python lists are flexible, but each element is a separate Python object, so
numerical loops run in the interpreter one element at a time.
NumPy arrays store elements of a single type in a contiguous memory block and
run operations in compiled C code, which typically makes them much faster for
ML workloads.


In [2]:
# Python list
py_list = [1, 2, 3, 4, 5]

# NumPy array
np_array = np.array(py_list)

print("Python list:", py_list)
print("NumPy array:", np_array)

Python list: [1, 2, 3, 4, 5]
NumPy array: [1 2 3 4 5]
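
To make the speed claim concrete, the sketch below times the same element-wise doubling on a list and on an array. The array size, repeat count, and variable names are illustrative assumptions, and the measured ratio varies by machine, so no timings are shown.

```python
import timeit

import numpy as np

# Illustrative data size (an assumption, not taken from the cells above)
n = 100_000
data_list = list(range(n))
data_array = np.arange(n)

# Element-wise doubling: interpreted loop vs. vectorized operation
list_result = [x * 2 for x in data_list]
array_result = data_array * 2

# Time 20 repetitions of each approach
list_time = timeit.timeit(lambda: [x * 2 for x in data_list], number=20)
array_time = timeit.timeit(lambda: data_array * 2, number=20)

print(f"List comprehension: {list_time:.4f} s")
print(f"NumPy vectorized:   {array_time:.4f} s")
```

Both approaches produce the same values; only the execution path differs.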


## Array Properties

Key properties of NumPy arrays:
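- `shape`: tuple giving the size of the array along each axis
- `ndim`: number of axes (the length of `shape`)
- `dtype`: data type shared by all elements

Most ML libraries expect inputs of a particular shape and dtype, so mismatches
in these properties are a common source of errors.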


In [3]:
print("Shape:", np_array.shape)
print("Number of dimensions:", np_array.ndim)
print("Data type:", np_array.dtype)

Shape: (5,)
Number of dimensions: 1
Data type: int64
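
The array above is one-dimensional. As a minimal sketch of how the same properties look for tabular data, the example below builds a hypothetical feature matrix; the name `features` and the 4 x 3 size are assumptions made for illustration.

```python
import numpy as np

# A hypothetical feature matrix: 4 samples (rows) x 3 features (columns)
features = np.arange(12, dtype=np.float32).reshape(4, 3)

print("Shape:", features.shape)
print("Number of dimensions:", features.ndim)
print("Data type:", features.dtype)

# Indexing a single row yields a 1-D array again
first_sample = features[0]
print("Row shape:", first_sample.shape)
```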


## Data Types (dtype)

Machine learning algorithms rely heavily on numerical precision.
Choosing the correct dtype impacts:
- Memory usage
- Computation speed
- Numerical stability

In [4]:
int_array = np.array([1, 2, 3], dtype=np.int32)
float_array = np.array([1, 2, 3], dtype=np.float64)

print("int_array dtype:", int_array.dtype)
print("float_array dtype:", float_array.dtype)


int_array dtype: int32
float_array dtype: float64
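
The memory and precision effects listed above can be checked directly. This is a small sketch, with array sizes and variable names chosen only for illustration.

```python
import numpy as np

# Memory: float32 uses 4 bytes per element, float64 uses 8
a32 = np.ones(1000, dtype=np.float32)
a64 = np.ones(1000, dtype=np.float64)
print("float32 bytes:", a32.nbytes)
print("float64 bytes:", a64.nbytes)

# Precision: float32 has a 24-bit significand, so 2**24 + 1 is not
# representable and the addition rounds back to 2**24
big32 = np.float32(2**24)
big64 = np.float64(2**24)
lost_in_32 = (big32 + np.float32(1)) == big32
lost_in_64 = (big64 + np.float64(1)) == big64
print("Increment lost in float32:", lost_in_32)
print("Increment lost in float64:", lost_in_64)
```

Halving the element size halves memory use, at the cost of fewer significant digits.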


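In ML:
- Features are usually floating-point numbers
- Labels may be integers
- Incorrect dtypes can cause silent bugs (for example, truncated values) or
  performance issues (for example, unnecessary conversions and extra memory)

Most ML frameworks expect `float32` or `float64` inputs. Deep learning
frameworks such as PyTorch and TensorFlow default to `float32`, while
scikit-learn estimators typically work in `float64`.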


In [5]:
# Convert integer array to float
converted = int_array.astype(np.float64)

print("Before:", int_array.dtype)
print("After:", converted.dtype)


Before: int32
After: float64
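
One kind of silent bug mentioned above is truncation: writing fractional values into an integer array discards the fraction without raising an error. The following is a minimal sketch with illustrative variable names.

```python
import numpy as np

# Assigning a float into an integer array truncates silently
counts = np.array([1, 2, 3])
counts[0] = 1.7
print("Integer array after assignment:", counts)

# Casting floats to integers also truncates toward zero
values = np.array([1.9, -1.9])
truncated = values.astype(np.int32)
print("Floats cast to int32:", truncated)

# Converting to float first preserves the fractional part
safe = np.array([1, 2, 3]).astype(np.float64)
safe[0] = 1.7
print("Float array after assignment:", safe)
```

Checking `dtype` before in-place updates, or converting with `astype` first, avoids this class of error.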


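## Array Initialization

NumPy provides utility functions for creating pre-filled arrays, commonly used
in ML for weights, biases, and placeholders. `np.zeros`, `np.ones`, and `np.eye`
all return `float64` arrays unless a `dtype` argument is given.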


In [6]:
zeros = np.zeros((3, 3))   # 3x3 array filled with 0.0
ones = np.ones((2, 4))     # 2x4 array filled with 1.0
identity = np.eye(3)       # 3x3 identity matrix

print("Zeros:\n", zeros)
print("\nOnes:\n", ones)
print("\nIdentity matrix:\n", identity)


Zeros:
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

Ones:
 [[1. 1. 1. 1.]
 [1. 1. 1. 1.]]

Identity matrix:
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
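
As a sketch of how these initializers might be used for model parameters, the example below creates a small weight matrix and bias vector in `float32`. The layer sizes, the `0.01` scale, and the variable names are hypothetical choices for illustration, not a recommended initialization scheme.

```python
import numpy as np

# Hypothetical layer sizes, chosen only for illustration
n_inputs, n_outputs = 4, 2

# Seeded generator so the example is reproducible
rng = np.random.default_rng(seed=0)

# Small random weights and zero biases, both stored as float32
weights = rng.standard_normal((n_inputs, n_outputs), dtype=np.float32) * 0.01
biases = np.zeros(n_outputs, dtype=np.float32)

print("weights:", weights.shape, weights.dtype)
print("biases: ", biases.shape, biases.dtype)
```

Passing `dtype` at creation time avoids a later `astype` copy.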


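## Memory Efficiency

NumPy arrays store homogeneous data in a single buffer, which makes them more
memory-efficient than Python lists. Note that `sys.getsizeof` on a list counts
only the list's own array of pointers, not the integer objects it references,
so the comparison below understates the list's true footprint.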

In [7]:
import sys

list_data = list(range(1000))
array_data = np.array(list_data)

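# getsizeof counts only the list's pointer array, not the int objects it references
print("List memory (bytes):", sys.getsizeof(list_data))
# nbytes counts the array's data buffer (element count x itemsize)
print("NumPy array memory (bytes):", array_data.nbytes)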


List memory (bytes): 8056
NumPy array memory (bytes): 8000
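
The two numbers above are close because `sys.getsizeof` on a list excludes the integer objects the list points to. A fuller estimate, sketched below, adds the size of each element. Exact byte counts depend on the Python version and platform, so no output is shown.

```python
import sys

import numpy as np

list_data = list(range(1000))
array_data = np.array(list_data)

# The list object itself holds only references (pointers) to its elements
container_bytes = sys.getsizeof(list_data)
# Each element is a separate Python int object with its own overhead
element_bytes = sum(sys.getsizeof(x) for x in list_data)
list_total = container_bytes + element_bytes

print("List container (bytes):", container_bytes)
print("List elements (bytes): ", element_bytes)
print("List total (bytes):    ", list_total)
print("NumPy array data (bytes):", array_data.nbytes)

# A narrower dtype shrinks the array further; 0..999 fits in int16
compact = array_data.astype(np.int16)
print("int16 array data (bytes):", compact.nbytes)
```

This is an estimate rather than an exact accounting: CPython shares some small integer objects between containers, and the array carries a small fixed header that `nbytes` does not include.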


## Key Takeaways

- NumPy arrays are the foundation of ML computation
- Check `shape`, `ndim`, and `dtype` before passing an array to a model
- Choosing an appropriate dtype affects memory use, speed, and numerical stability
- For numerical work, NumPy arrays are faster and more memory-efficient than Python lists

These concepts carry over to Pandas, scikit-learn, and deep learning frameworks.
