# NumPy Fundamentals

## What is NumPy?

NumPy (Numerical Python) is a foundational library for scientific computing and data analysis in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these efficiently.

## Why NumPy is Essential for This Course

Throughout this course, you'll use NumPy extensively:
- **Weeks 2-13**: Practical notebooks for data analysis
- **Materials science**: Lattice vectors, atomic positions, energy calculations
- **Machine learning**: Feature extraction, data preprocessing
- **High-throughput**: Processing large datasets efficiently

## Learning Objectives

By the end of this chapter, you should be able to:
- Create and manipulate NumPy arrays
- Understand array indexing, slicing, and broadcasting
- Perform numerical operations efficiently
- Use NumPy's vectorized operations for performance

## Topics

1. Array Creation and Shapes
2. Array Indexing and Slicing
3. Data Types
4. Array Operations
5. Broadcasting
6. Linear Algebra with NumPy
7. Performance Considerations

---

## Array Creation and Shapes

### Creating Arrays

```python
import numpy as np

# 1D array (vector)
energy_levels = np.array([0.5, 1.2, 2.5, 3.0])
print(f"1D array shape: {energy_levels.shape}")  # (4,)

# 2D array (matrix): three lattice vectors, one per row
lattice_vectors = np.array([[5.43, 0, 0],
                            [0, 5.43, 0],
                            [0, 0, 5.43]])
print(f"2D array shape: {lattice_vectors.shape}")  # (3, 3)

# Scalar: wrapping a plain Python float gives a 0-dimensional array
temperature = 298.0  # Room temperature in Kelvin
print(f"Scalar: {temperature}, shape: {np.array(temperature).shape}")  # ()
```
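Besides converting Python lists with `np.array()`, NumPy provides constructor functions for common patterns. A short sketch using standard constructors; the sizes and grid bounds are arbitrary illustration values.

```python
import numpy as np

zeros = np.zeros((2, 3))          # 2x3 array filled with 0.0
ones = np.ones(4)                 # 1D array of four 1.0 values
steps = np.arange(0, 10, 2)       # 0, 2, 4, 6, 8: the stop value is excluded
grid = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced points, both endpoints included
identity = np.eye(3)              # 3x3 identity matrix

print(f"zeros shape: {zeros.shape}")  # (2, 3)
print(f"arange: {steps}")
print(f"linspace: {grid}")
print(f"identity:\n{identity}")
```

`np.arange` is convenient for integer steps, while `np.linspace` is usually the safer choice for floating-point grids because you specify the number of points rather than a step size.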

### Array Shapes

The **shape** attribute returns a tuple with one entry per axis: `(n,)` for a 1D array, `(nrows, ncols)` for a 2D array, and `()` for a 0-dimensional (scalar) array. The **ndim** attribute gives the number of axes, which equals `len(shape)`.

```python
# Check shapes
print(f"Energy levels shape: {energy_levels.shape}")  # (4,)
print(f"Energy levels ndim: {energy_levels.ndim}")  # 1
print(f"Lattice vectors ndim: {lattice_vectors.ndim}")  # 2
print(f"Temperature ndim: {np.array(temperature).ndim}")  # 0
```
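The shape can be changed without touching the data using `reshape`, as long as the total number of elements (`size`, the product of the shape entries) stays the same. A minimal sketch:

```python
import numpy as np

values = np.arange(12)            # shape (12,), ndim 1
table = values.reshape(3, 4)      # same 12 elements arranged as 3 rows x 4 columns
column = values.reshape(-1, 1)    # -1 lets NumPy infer that axis: shape (12, 1)

print(f"table shape: {table.shape}, ndim: {table.ndim}, size: {table.size}")  # (3, 4), 2, 12
print(f"column shape: {column.shape}")  # (12, 1)

# A size mismatch is an error: 12 elements cannot fill a 5x3 array
try:
    values.reshape(5, 3)
except ValueError as err:
    print(f"ValueError: {err}")
```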

---

## Array Indexing and Slicing

### Basic Indexing

```python
# Create a 3x3 matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

print("Matrix:")
print(matrix)
print()

# Access elements with [row, column]
print(f"matrix[0, 0]: {matrix[0, 0]}")  # 1
print(f"matrix[1, 0]: {matrix[1, 0]}")  # 4
print(f"matrix[2, 2]: {matrix[2, 2]}")  # 9
```
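Negative indices count from the end, and a single index on a 2D array selects a whole row. The sketch below also contrasts chained indexing `matrix[1][0]` with the single-operation form `matrix[1, 0]`: both return the same element, but the chained form first builds an intermediate row.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

print(matrix[-1, -1])  # 9: last row, last column
print(matrix[1])       # the whole second row: 4, 5, 6
print(matrix[1][0])    # 4, via an intermediate row array
print(matrix[1, 0])    # 4, in one indexing operation (preferred)
```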

### Slicing with Ranges

```python
# Slice first row
print(f"First row: {matrix[0, :]}")  # [1 2 3]

# Slice last two rows
print(f"Last two rows:\n{matrix[-2:, :]}")  # rows 1 and 2

# Slice columns
print(f"First two columns:\n{matrix[:, :2]}")  # columns 0 and 1
```
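Slices also accept a step (`start:stop:step`), and the choice between an integer index and a slice decides whether an axis survives. A short sketch on a matrix with the same values as above:

```python
import numpy as np

matrix = np.arange(1, 10).reshape(3, 3)  # same values as the matrix above

every_other = matrix[::2, ::2]   # rows 0 and 2, columns 0 and 2
reversed_rows = matrix[::-1]     # rows in reverse order

row_1d = matrix[0]               # integer index drops the axis: shape (3,)
row_2d = matrix[0:1]             # slice keeps the axis: shape (1, 3)
print(every_other)
print(row_1d.shape, row_2d.shape)  # (3,) (1, 3)
```

Keeping the axis with a slice is useful when the result must still broadcast as a row of a 2D array.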

### Boolean Indexing

```python
# Boolean indexing: a comparison produces a mask of True/False values
grades = np.array([70, 85, 92, 78, 88, 65, 72, 95])
passing = grades >= 75
print(f"Passing grades: {grades[passing]}")  # [85 92 78 88 95]
print(f"Number passing: {np.sum(passing)}")  # 5 (each True counts as 1)
```
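Masks can be combined and reused. This sketch shows combining conditions with `&` (and), `|` (or) and `~` (not), which need parentheses around each comparison; choosing values element-wise with `np.where`; and assigning through a mask. The "+5 points" adjustment is only an illustration.

```python
import numpy as np

grades = np.array([70, 85, 92, 78, 88, 65, 72, 95])

# Grades from 80 up to (but not including) 90
b_range = (grades >= 80) & (grades < 90)
print(grades[b_range])  # [85 88]

# np.where picks "pass" or "fail" for each element
labels = np.where(grades >= 75, "pass", "fail")
print(labels)

# Masked assignment: add 5 points to failing grades only, on a copy
curved = grades.copy()
curved[curved < 75] += 5
print(curved)
```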

### Applications in Materials Science

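Atomic positions are naturally stored as an N×3 array with one row per atom. Example: silicon in the diamond structure, with 8 atoms in the conventional cubic cell, given in fractional coordinates.

```python
# Atomic positions: N x 3 array of fractional coordinates
# Silicon, diamond structure: 8 atoms in the conventional cubic cell
si_positions = np.array([
    [0.00, 0.00, 0.00],
    [0.00, 0.50, 0.50],
    [0.50, 0.00, 0.50],
    [0.50, 0.50, 0.00],
    [0.25, 0.25, 0.25],
    [0.25, 0.75, 0.75],
    [0.75, 0.25, 0.75],
    [0.75, 0.75, 0.25],
])

a_si = 5.43  # Si lattice constant in angstroms (not stored in the positions array)
print(f"Silicon positions shape: {si_positions.shape}")  # (8, 3)
print(f"Second atom, fractional: {si_positions[1]}")
print(f"Second atom, Cartesian (Å): {si_positions[1] * a_si}")  # valid for a cubic cell
```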
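For a general cell, fractional coordinates become Cartesian coordinates (in Å) by matrix-multiplying with the lattice-vector matrix, one lattice vector per row. A sketch assuming the cubic Si cell with a = 5.43 Å; the nearest-neighbour distance it computes should match the diamond-structure value a√3/4 ≈ 2.35 Å.

```python
import numpy as np

a_si = 5.43                      # Si lattice constant in angstroms
lattice = a_si * np.eye(3)       # cubic cell: lattice vectors as rows

# Two neighbouring atoms of the diamond basis, in fractional coordinates
frac = np.array([[0.00, 0.00, 0.00],
                 [0.25, 0.25, 0.25]])

cart = frac @ lattice            # (2, 3) @ (3, 3) -> (2, 3), in angstroms
bond = np.linalg.norm(cart[1] - cart[0])
print(f"Si-Si bond length: {bond:.3f} Å")  # 2.351 Å
```

The same `frac @ lattice` line works unchanged for non-cubic cells, which is why the matrix form is preferred over scaling by a single constant.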

---

## Array Operations

### Element-wise Operations

```python
# Basic operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(f"a + b = {a + b}")  # [5, 7, 9]
print(f"b - a = {b - a}")  # [3, 3, 3]
print(f"a * b = {a * b}")  # [4, 10, 18]
print(f"a / b = {a / b}")  # [0.25, 0.4, 0.5]
print(f"a ** 2 = {a ** 2}")  # [1, 4, 9]
```
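Note that `*` multiplies element by element, whereas `@` computes a dot (or matrix) product. Universal functions such as `np.sqrt` and comparisons are also applied element-wise. A short sketch with the same `a` and `b`:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a * b)       # element-wise product: 4, 10, 18
print(a @ b)       # dot product: 1*4 + 2*5 + 3*6 = 32
print(np.sqrt(b))  # square root of every element
print(a > 1)       # element-wise comparison: False, True, True
```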

### Statistical Operations

```python
# Statistics on array
data = np.array([23.5, 28.2, 27.1, 26.3, 24.9, 25.0,
                 23.8, 24.2, 20.3, 22.7, 21.1, 20.5])

print(f"Mean: {np.mean(data):.2f}")
print(f"Standard deviation: {np.std(data):.2f}")  # population std (ddof=0, NumPy's default)
print(f"Min: {np.min(data):.2f}")
print(f"Max: {np.max(data):.2f}")
print(f"Sum: {np.sum(data):.2f}")
print(f"Median: {np.median(data):.2f}")
```
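For multi-dimensional data the same functions accept an `axis` argument that selects which axis is collapsed. As an illustration, assume (hypothetically) that the 12 values above are 3 days of 4 readings each:

```python
import numpy as np

data = np.array([23.5, 28.2, 27.1, 26.3, 24.9, 25.0,
                 23.8, 24.2, 20.3, 22.7, 21.1, 20.5])
readings = data.reshape(3, 4)        # hypothetical layout: 3 days x 4 readings

daily_mean = readings.mean(axis=1)   # collapse the columns: one mean per day, shape (3,)
slot_max = readings.max(axis=0)      # collapse the rows: one max per reading slot, shape (4,)
print(f"Daily means: {daily_mean}")
print(f"Max per slot: {slot_max}")
```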

### Materials Science Example: Bulk Modulus

```python
# Materials science example: bulk moduli of five materials (illustrative values)
bulk_moduli = np.array([100, 150, 200, 300, 250])  # GPa
print(f"Mean bulk modulus: {np.mean(bulk_moduli):.2f} GPa")
print(f"Standard deviation: {np.std(bulk_moduli):.2f} GPa")
```
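`np.std` divides by N by default (`ddof=0`, the population standard deviation). For a small sample such as these five values, the sample standard deviation (`ddof=1`, dividing by N - 1) is often the more appropriate estimate. A sketch comparing the two on the same illustrative numbers:

```python
import numpy as np

bulk_moduli = np.array([100, 150, 200, 300, 250])  # GPa, illustrative values

pop_std = np.std(bulk_moduli)              # divides by N:     sqrt(25000 / 5) ≈ 70.71
sample_std = np.std(bulk_moduli, ddof=1)   # divides by N - 1: sqrt(25000 / 4) ≈ 79.06
print(f"Population std: {pop_std:.2f} GPa")
print(f"Sample std:     {sample_std:.2f} GPa")
```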

---

## Broadcasting

### What is Broadcasting?

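NumPy's broadcasting mechanism allows operations between arrays of different shapes without explicit loops or copies of the smaller array. Shapes are compared from the last axis backwards: two sizes are compatible when they are equal or when one of them is 1, and an array with fewer axes is treated as if it had extra leading axes of size 1. If any pair of sizes is incompatible, NumPy raises a `ValueError`.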
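A short sketch of one compatible and one incompatible pair of shapes, using only standard NumPy calls:

```python
import numpy as np

column = np.array([[1], [2], [3]])  # shape (3, 1)
row = np.array([10, 20, 30])        # shape (3,), treated as (1, 3)

table = column + row                # (3, 1) + (1, 3) -> (3, 3)
print(table)

# (3,) and (4,): the trailing sizes 3 and 4 differ and neither is 1
try:
    np.ones(3) + np.ones(4)
except ValueError as err:
    print(f"Incompatible shapes: {err}")
```

`np.broadcast_shapes` returns the result shape for a set of shapes without creating any arrays, which is handy for checking compatibility.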

### Basic Broadcasting Example

```python
# Example: scale each column of a 3x3 array of energy levels by its own factor
energy_levels = np.array([[0.5, 1.0, 2.0],
                          [0.8, 1.2, 2.0],
                          [1.6, 1.4, 2.0]])

scaling_factors = np.array([0.8, 1.1, 1.2])

# Broadcasting: (3, 3) * (3,). The (3,) array is treated as (1, 3) and
# repeated across the rows, so column j is multiplied by scaling_factors[j]
scaled_energies = energy_levels * scaling_factors
print(f"Scaled energies shape: {scaled_energies.shape}")  # (3, 3)
print(f"Scaled energies:\n{scaled_energies}")
```
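Because a (3,) array lines up with the last axis, a product like the one above scales each column. To scale each row instead, give the factors shape (3, 1) with `np.newaxis` (equivalently `reshape(-1, 1)`). A sketch reusing the same values:

```python
import numpy as np

energy_levels = np.array([[0.5, 1.0, 2.0],
                          [0.8, 1.2, 2.0],
                          [1.6, 1.4, 2.0]])
scaling_factors = np.array([0.8, 1.1, 1.2])

by_column = energy_levels * scaling_factors              # (3, 3) * (3,): column j scaled by factor j
by_row = energy_levels * scaling_factors[:, np.newaxis]  # (3, 3) * (3, 1): row i scaled by factor i

print(by_row[0])  # first row scaled by 0.8
```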

---

## Performance Considerations

### Vectorization

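A vectorized operation acts on a whole array in one call: the element-by-element loop then runs in compiled C code instead of the Python interpreter, which removes the per-element interpreter overhead. The exact speedup depends on the machine and the operation, but for simple arithmetic it is often several orders of magnitude.

```python
import time

# BAD: Python loop that touches one element at a time
a = np.random.rand(1000, 1000)

start = time.perf_counter()
result_loop = np.zeros_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        result_loop[i, j] = a[i, j] * 2
time_loop = time.perf_counter() - start

# GOOD: Vectorized operation, one call whose loop runs in C
start = time.perf_counter()
result_vectorized = a * 2
time_vectorized = time.perf_counter() - start

assert np.array_equal(result_loop, result_vectorized)  # same result either way
print(f"Loop time: {time_loop:.4f} seconds")
print(f"Vectorized time: {time_vectorized:.4f} seconds")
print(f"Speedup: {time_loop / time_vectorized:.1f}x")
```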
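Vectorization applies to whole formulas, not just single operators. As a materials-flavoured sketch, the Boltzmann populations p_i = exp(-E_i / k_B T) / Σ_j exp(-E_j / k_B T) of a set of energy levels can be computed without any Python loop. The energy levels below are illustrative values in eV; k_B = 8.617333262e-5 eV/K.

```python
import numpy as np

k_B = 8.617333262e-5                        # Boltzmann constant in eV/K
T = 298.0                                   # temperature in K
levels = np.array([0.0, 0.025, 0.05, 0.1])  # illustrative energy levels in eV

weights = np.exp(-levels / (k_B * T))       # one exponential per level, no Python loop
populations = weights / weights.sum()       # normalize so the populations sum to 1
print(populations)
```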

### Memory Efficiency

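```python
# Create large array: 1,000,000 float64 values = 8 MB
large_array = np.ones((1000, 1000))
print(f"Size: {large_array.nbytes / 1e6:.0f} MB")

# A full copy allocates another 8 MB; only copy when you need independent data
# large_array_copy = large_array.copy()

# Basic slicing returns a view: column 100 is exposed without copying any data
large_array_view = large_array[:, 100]
print(f"View shape: {large_array_view.shape}")  # (1000,)
print(f"Shares memory: {np.shares_memory(large_array_view, large_array)}")  # True

# Caution: writing through a view modifies the original array
large_array_view[0] = 42.0
print(large_array[0, 100])  # 42.0
```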
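The element type also determines memory use. A sketch comparing the default `float64` with `float32`. Note that `astype` always returns a new array (a copy), and that `float32` keeps only about 7 significant decimal digits, so whether it is acceptable depends on the calculation.

```python
import numpy as np

a64 = np.ones((1000, 1000))     # default dtype float64: 8 bytes per element
a32 = a64.astype(np.float32)    # new array, 4 bytes per element

print(a64.dtype, a64.nbytes / 1e6, "MB")  # float64 8.0 MB
print(a32.dtype, a32.nbytes / 1e6, "MB")  # float32 4.0 MB
```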

---

## Key Takeaways

1. **Use `np.array()`** instead of lists for numerical data
2. **Understand `ndim`**: Check array dimensionality before operations
3. **Use slicing `:`** instead of loops when possible
4. **Prefer vectorized operations**: `*` for element-wise products, `@` for matrix multiplication
5. **Check data types**: Use appropriate types to save memory
6. **Use `np.sum()`, `np.mean()`, etc.** instead of Python's built-in `sum()` or `statistics.mean()` on arrays
7. **Be aware of broadcasting**: Understand shape compatibility rules