# Advanced NumPy for Data Analysis and Scientific Computing

## 1. NumPy Fundamentals

NumPy (Numerical Python) is a fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.

### Why NumPy?

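1. **Performance**: Most NumPy operations run as compiled C loops over whole arrays, which avoids per-element interpreter overhead and is typically much faster than an equivalent Python loop.
2. **Memory Efficiency**: NumPy arrays store elements of a single data type in one contiguous buffer, so they usually need far less memory per element than a Python list of separate objects.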
3. **Convenience**: NumPy provides a wide range of mathematical operations on arrays, making scientific computing more accessible.

### NumPy Arrays vs Python Lists

Let's compare NumPy arrays with Python lists:

In [None]:
import numpy as np
import sys

# Python list
py_list = [1, 2, 3, 4, 5]

# NumPy array
np_array = np.array([1, 2, 3, 4, 5])

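# A list stores pointers; the integer objects it refers to take space as well
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print(f"Python list size: {list_bytes} bytes")
# nbytes counts the array's data buffer only, not the array object's header
print(f"NumPy array size: {np_array.nbytes} bytes")

The array's data buffer is much smaller than the list plus its integer objects. The array object also carries a fixed header, so the saving matters most for large arrays.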
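The memory an array needs depends on its element type. The sketch below stores the same five values with two different dtypes; the variable names are illustrative only.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# The same five values stored with different element types.
as_int64 = np.array(values, dtype=np.int64)
as_int8 = np.array(values, dtype=np.int8)

# itemsize is the number of bytes per element; nbytes = itemsize * size.
print("int64:", as_int64.itemsize, "bytes/element,", as_int64.nbytes, "bytes total")
print("int8: ", as_int8.itemsize, "bytes/element,", as_int8.nbytes, "bytes total")
```

Choosing a smaller dtype only makes sense when every value fits in it; `int8` holds values from -128 to 127.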

### Performance Comparison

Let's compare the performance of NumPy arrays vs Python lists for a simple operation:

In [None]:
import time

# Python list operation
start_time = time.perf_counter()
result = [x**2 for x in range(1000000)]
end_time = time.perf_counter()
print(f"Python list operation time: {end_time - start_time:.4f} seconds")

# NumPy array operation (int64 so the squares cannot overflow a 32-bit default integer)
start_time = time.perf_counter()
np_array = np.arange(1000000, dtype=np.int64)
result = np_array**2
end_time = time.perf_counter()
print(f"NumPy array operation time: {end_time - start_time:.4f} seconds")

The NumPy version is typically much faster because the loop over the elements runs in compiled code, and the gap widens as the array grows. Exact timings depend on the machine.
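A single timed run can be noisy. As a rough alternative, the standard-library `timeit` module repeats each measurement; the sketch below uses a smaller `n` so it finishes quickly, and it times only the squaring step because the array is built beforehand.

```python
import timeit
import numpy as np

n = 100_000  # smaller than above so the repeated runs finish quickly
arr = np.arange(n, dtype=np.int64)
runs = 5

# Repeat each measurement three times and keep the fastest (least disturbed) one.
list_time = min(timeit.repeat(lambda: [x**2 for x in range(n)], number=runs, repeat=3))
numpy_time = min(timeit.repeat(lambda: arr**2, number=runs, repeat=3))

print(f"List comprehension: {list_time / runs:.6f} s per run")
print(f"NumPy expression:   {numpy_time / runs:.6f} s per run")
```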

## 2. Creating and Manipulating NumPy Arrays

### Creating Arrays

NumPy provides several functions to create arrays:

In [None]:
# Create array from Python list
a = np.array([1, 2, 3, 4, 5])
print("From list:", a)

# Create array of zeros
b = np.zeros((3, 3))
print("Zeros:\n", b)

# Create array of ones
c = np.ones((2, 4))
print("Ones:\n", c)

# Create array with a range of elements
d = np.arange(0, 10, 2)
print("Arange:", d)

# Create array with evenly spaced elements
e = np.linspace(0, 1, 5)
print("Linspace:", e)
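Two details of these constructors are easy to miss: `np.arange` leaves out the stop value while `np.linspace` includes it by default, and most constructors accept a `dtype` argument. A short sketch with illustrative names:

```python
import numpy as np

# arange excludes the stop value; linspace includes it by default.
stepped = np.arange(0, 10, 2)     # 5 values, 10 is not included
spaced = np.linspace(0, 10, 6)    # 6 values, 10 is included

# arange infers its dtype from the arguments; linspace returns floats.
print("arange:  ", stepped, stepped.dtype)
print("linspace:", spaced, spaced.dtype)

# dtype= overrides the default float64 of zeros and ones.
int_zeros = np.zeros((2, 3), dtype=np.int64)
print("Integer zeros:\n", int_zeros)

# np.full fills a new array with a constant; np.eye builds an identity matrix.
sevens = np.full((2, 2), 7)
identity = np.eye(3)
print("Full:\n", sevens)
print("Identity:\n", identity)
```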

### Reshaping Arrays

You can change the shape of an array without changing its data:

In [None]:
a = np.arange(12)
print("Original:", a)

# Reshape to 2D array
b = a.reshape((3, 4))
print("Reshaped:\n", b)

# Flatten array
c = b.ravel()
print("Flattened:", c)
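Reshaping usually returns a view onto the same memory rather than a copy, which matters as soon as you write to the result. The sketch below also shows `-1` as a placeholder dimension and contrasts `ravel()` with `flatten()`.

```python
import numpy as np

a = np.arange(12)

# -1 asks NumPy to infer that dimension from the array's size (12 / 3 = 4).
b = a.reshape(3, -1)
print("Inferred shape:", b.shape)

# reshape on a contiguous array returns a view: writes show up in `a`.
b[0, 0] = 99
print("a[0] after writing through b:", a[0])

# flatten() always copies, so writes to the copy leave `b` untouched.
flat_copy = b.flatten()
flat_copy[1] = -1
print("b[0, 1] is still:", b[0, 1])

# np.shares_memory reports whether two arrays overlap in memory.
print("ravel shares memory:  ", np.shares_memory(b, b.ravel()))
print("flatten shares memory:", np.shares_memory(b, flat_copy))
```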

### Array Attributes

NumPy arrays have several useful attributes:

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
print("Array:\n", a)
print("Shape:", a.shape)
print("Dimensions:", a.ndim)
print("Size:", a.size)
print("Data type:", a.dtype)

### Array Slicing and Indexing

NumPy provides powerful indexing capabilities:

In [None]:
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print("Array:\n", a)

# Basic indexing and slicing
print("First row:", a[0])
print("First two rows:\n", a[:2])

# Boolean indexing
print("Elements greater than 5:\n", a[a > 5])

# Fancy (integer-array) indexing
print("First and third column:\n", a[:, [0, 2]])
indices = np.array([0, 2])
print("First and third row:\n", a[indices])
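The kind of indexing also decides whether you get a view or a copy. A minimal sketch, with illustrative variable names:

```python
import numpy as np

a = np.arange(1, 13).reshape(3, 4)

# Basic slicing returns a view: writing to it changes `a`.
first_row = a[0, :]
first_row[0] = 100
value_after_view_write = int(a[0, 0])
print("a[0, 0] after editing the slice:", value_after_view_write)

# Integer-array (fancy) indexing returns a copy: `a` is unaffected.
picked_rows = a[[0, 2]]
picked_rows[0, 1] = -1
value_after_copy_write = int(a[0, 1])
print("a[0, 1] after editing the fancy-indexed result:", value_after_copy_write)

# A boolean mask on the left-hand side assigns in place.
a[a > 10] = 0
print("After zeroing values greater than 10:\n", a)
```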

## 3. NumPy Operations

### Universal Functions (ufuncs)

Ufuncs operate element-wise on arrays:

In [None]:
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

print("a + b =", np.add(a, b))
print("a - b =", np.subtract(a, b))
print("a * b =", np.multiply(a, b))
print("a / b =", np.divide(a, b))
print("a ** 2 =", np.power(a, 2))
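The arithmetic operators call the same ufuncs, and ufuncs offer more than a plain function call. The sketch below shows the operator equivalence, the `reduce` and `accumulate` methods, and the `out` argument.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# The arithmetic operators dispatch to the same ufuncs.
print("a + b equals np.add(a, b):", np.array_equal(a + b, np.add(a, b)))

# True division always yields floats, even for integer inputs.
quotient = a / b
print("dtype of a / b:", quotient.dtype)

# Ufuncs also expose methods such as reduce and accumulate.
total = np.add.reduce(a)
running_product = np.multiply.accumulate(a)
print("Sum via np.add.reduce:", total)
print("Running product:", running_product)

# out= writes the result into an existing array instead of allocating a new one.
result = np.empty_like(a)
np.multiply(a, b, out=result)
print("a * b written into result:", result)
```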

### Broadcasting

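Broadcasting lets NumPy combine arrays of different shapes in element-wise operations. Shapes are compared from the last axis backwards, and two axes are compatible when they are equal or one of them is 1; the size-1 axis is stretched to match. In the example below, `b` with shape `(3,)` is added to every row of `a`: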

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([10, 20, 30])

print("a:\n", a)
print("b:", b)
print("a + b:\n", a + b)
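Broadcasting also works along the other axis, and it fails loudly when shapes do not line up. A small sketch with an assumed column vector `col`:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])           # shape (3,), treated as (1, 3)
col = np.array([[100], [200]])         # shape (2, 1)

# The size-1 axis of col is stretched across the three columns of a.
a_plus_col = a + col
print("a + col:\n", a_plus_col)

# A (2, 1) column and a (3,) row broadcast to a full (2, 3) grid.
grid = col + row
print("col + row:\n", grid)

# Mismatched trailing axes (3 vs 2) raise a ValueError.
try:
    a + np.array([1, 2])
except ValueError as exc:
    print("Broadcast error:", exc)
```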

### Statistical Functions

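NumPy provides many statistical functions. By default they reduce over the whole array, and `np.std` and `np.var` compute population statistics (`ddof=0`):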

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
print("Array:\n", a)
print("Mean:", np.mean(a))
print("Median:", np.median(a))
print("Standard deviation:", np.std(a))
print("Variance:", np.var(a))
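Most of these functions take an `axis` argument to reduce along rows or columns, and `np.std` and `np.var` take `ddof` to switch to the sample estimate. A brief sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 collapses the rows, giving one value per column.
column_means = a.mean(axis=0)
print("Column means:", column_means)

# axis=1 collapses the columns, giving one value per row.
row_means = a.mean(axis=1)
print("Row means:", row_means)

# ddof=0 (the default) divides by N; ddof=1 divides by N - 1.
population_var = np.var(a)
sample_var = np.var(a, ddof=1)
print("Population variance:", population_var)
print("Sample variance:    ", sample_var)
```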

### Linear Algebra Operations

NumPy includes basic linear algebra operations:

In [None]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print("Matrix multiplication:\n", np.dot(a, b))
print("Determinant:", np.linalg.det(a))
print("Inverse:\n", np.linalg.inv(a))
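For 2-D arrays the `@` operator gives the same product as `np.dot`. To solve a linear system, `np.linalg.solve` is generally preferable to computing the inverse first. The right-hand side `rhs` below is made up for illustration.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])

# The @ operator is matrix multiplication for 2-D arrays.
print("a @ b matches np.dot:", np.array_equal(a @ b, np.dot(a, b)))

# Solve a x = rhs directly rather than forming the inverse first.
rhs = np.array([5.0, 11.0])
x = np.linalg.solve(a, rhs)
print("Solution x:", x)

# Check the result by substituting it back into the system.
print("a @ x reproduces rhs:", np.allclose(a @ x, rhs))
```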

## 4. Advanced Array Manipulation

### Stacking Arrays

You can combine arrays using various stacking functions:

In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print("Vertical stack:\n", np.vstack((a, b)))
print("Horizontal stack:", np.hstack((a, b)))
print("Column stack:\n", np.column_stack((a, b)))
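The stacking helpers are special cases of two more general functions: `np.concatenate` joins along an existing axis, and `np.stack` creates a new one. A sketch using two small 2-D arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# concatenate joins along an existing axis.
rows = np.concatenate((a, b), axis=0)   # same result as vstack for 2-D input
cols = np.concatenate((a, b), axis=1)   # same result as hstack for 2-D input
print("Joined rows:", rows.shape)
print("Joined columns:", cols.shape)

# stack inserts a new axis: two (2, 2) arrays become one (2, 2, 2) array.
stacked = np.stack((a, b), axis=0)
print("Stacked:", stacked.shape)
```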

### Splitting Arrays

Arrays can be split into smaller arrays:

In [None]:
a = np.arange(9).reshape(3, 3)
print("Original array:\n", a)

print("Horizontal split:", np.hsplit(a, 3))
print("Vertical split:", np.vsplit(a, 3))
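The split functions require the array to divide evenly. When it does not, `np.array_split` accepts unequal parts, and `np.split` can take explicit split positions instead of a count. A sketch:

```python
import numpy as np

a = np.arange(10)

# np.split needs equal parts; 10 elements cannot be divided into 3.
try:
    np.split(a, 3)
except ValueError as exc:
    print("np.split failed:", exc)

# np.array_split allows unequal parts (sizes 4, 3, 3 here).
parts = np.array_split(a, 3)
print("Uneven parts:", [p.tolist() for p in parts])

# A list of indices splits at those positions instead.
head, middle, tail = np.split(a, [2, 7])
print("Head:", head, "Middle:", middle, "Tail:", tail)
```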

### Repeating Arrays

You can repeat arrays using `repeat()`, which repeats each element, and `tile()`, which repeats the whole array:

In [None]:
a = np.array([1, 2, 3])
print("Original array:", a)
print("Repeat:", np.repeat(a, 3))
print("Tile:", np.tile(a, 3))
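Both functions also work on 2-D arrays: `repeat` takes an `axis`, and `tile` takes a tuple of repetitions per axis. A sketch with an illustrative matrix `m`:

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

# repeat along axis 0 duplicates each row in place.
rows_repeated = np.repeat(m, 2, axis=0)
print("Rows repeated:\n", rows_repeated)

# Without an axis, repeat flattens the input first.
flat_repeated = np.repeat(m, 2)
print("Flattened repeat:", flat_repeated)

# tile with a tuple repeats the whole block along each axis.
tiled = np.tile(m, (2, 2))
print("Tiled 2 x 2:\n", tiled)
```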

## 5. Random Number Generation

NumPy's random module provides various functions for random number generation. The functions used here share one global random state, which `np.random.seed()` resets:

In [None]:
# Set seed for reproducibility
np.random.seed(0)

# Generate random floats, uniform on [0, 1)
print("Random floats:", np.random.rand(5))

# Generate random integers from 1 to 9 (the upper bound 10 is excluded)
print("Random integers:", np.random.randint(1, 10, 5))

# Generate random floats from the standard normal distribution
print("Random normal:", np.random.randn(5))
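Newer NumPy code generally uses a `Generator` object from `np.random.default_rng()` instead of the global functions. The sketch below draws the same kinds of samples; note that it produces different numbers than the seeded calls above, since the underlying generator differs.

```python
import numpy as np

# Each Generator carries its own state, so no global seed is involved.
rng = np.random.default_rng(seed=0)

floats = rng.random(5)                             # uniform on [0, 1)
integers = rng.integers(1, 10, size=5)             # 1 to 9, upper bound excluded
normals = rng.normal(loc=0.0, scale=1.0, size=5)   # standard normal

print("Random floats:", floats)
print("Random integers:", integers)
print("Random normal:", normals)

# A second generator with the same seed reproduces the same stream.
rng_again = np.random.default_rng(seed=0)
print("Reproducible:", np.array_equal(floats, rng_again.random(5)))
```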

## 6. File I/O with NumPy

NumPy provides functions to save and load array data:

In [None]:
# Save array to file
a = np.array([1, 2, 3, 4, 5])
np.save('my_array.npy', a)

# Load array from file
b = np.load('my_array.npy')
print("Loaded array:", b)

# Save and load multiple arrays
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
np.savez('arrays.npz', x=x, y=y)

loaded = np.load('arrays.npz')
print("Loaded x:", loaded['x'])
print("Loaded y:", loaded['y'])

# Working with CSV files (values are written and read back as floats by default)
np.savetxt('array.csv', a, delimiter=',')
c = np.genfromtxt('array.csv', delimiter=',')
print("Loaded from CSV:", c)
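To keep integers as integers in a text file, pass a format to `np.savetxt` and a dtype when loading. The sketch below also opens the `.npz` archive in a `with` block so the file is closed afterwards. It writes into a temporary directory; the file names are illustrative.

```python
import os
import tempfile
import numpy as np

a = np.array([1, 2, 3, 4, 5])

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "array.csv")
    npz_path = os.path.join(tmp_dir, "arrays.npz")

    # fmt="%d" writes plain integers instead of the default "%.18e" floats.
    np.savetxt(csv_path, a, fmt="%d", delimiter=",")
    from_csv = np.loadtxt(csv_path, delimiter=",", dtype=np.int64)

    # np.load on an .npz file keeps the archive open; the with-block closes it.
    np.savez(npz_path, x=a, y=a * 2)
    with np.load(npz_path) as data:
        names = sorted(data.files)
        y = data["y"]

print("Loaded from CSV:", from_csv, from_csv.dtype)
print("Arrays in archive:", names)
print("Loaded y:", y)
```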

## 7. Performance Optimization

### Vectorization

Vectorization is the process of replacing explicit loops with array operations. It's a key technique for optimizing NumPy code:

In [None]:
import time

# Non-vectorized operation
def square_loop(n):
    result = []
    for i in range(n):
        result.append(i**2)
    return result

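# Vectorized operation (int64 so the squares cannot overflow a 32-bit default integer)
def square_vector(n):
    return np.arange(n, dtype=np.int64)**2

n = 1000000

start = time.perf_counter()
result_loop = square_loop(n)
print(f"Loop time: {time.perf_counter() - start:.4f} seconds")

start = time.perf_counter()
result_vector = square_vector(n)
print(f"Vector time: {time.perf_counter() - start:.4f} seconds")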
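Loops that contain a condition can often be vectorized as well, with a boolean mask taking the place of the `if`. The two functions below are hypothetical examples that compute the sum of the squares of the even values.

```python
import numpy as np

def sum_even_squares_loop(values):
    """Add up v * v for every even v, one element at a time."""
    total = 0
    for v in values:
        if v % 2 == 0:
            total += v * v
    return total

def sum_even_squares_vector(values):
    """Same result using a boolean mask and a single reduction."""
    arr = np.asarray(values, dtype=np.int64)
    even = arr[arr % 2 == 0]           # the mask replaces the `if`
    return int(np.sum(even * even))    # the reduction replaces the accumulator

data = list(range(1000))
loop_result = sum_even_squares_loop(data)
vector_result = sum_even_squares_vector(data)
print("Results match:", loop_result == vector_result)
```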

### Using np.vectorize()

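For custom functions that work on scalars, you can use `np.vectorize()` to create a version that accepts arrays. It is provided for convenience rather than speed, because the wrapped function is still called once per element in Python: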

In [None]:
def custom_func(x):
    if x < 0:
        return x**2
    else:
        return x**3

vectorized_func = np.vectorize(custom_func)

a = np.array([-2, -1, 0, 1, 2])
print("Original array:", a)
print("After custom function:", vectorized_func(a))
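When the branches can be written as array expressions, `np.where` gives the same result without a Python-level call per element. A sketch that mirrors `custom_func` above:

```python
import numpy as np

a = np.array([-2, -1, 0, 1, 2])

# np.where evaluates both branches for the whole array,
# then picks from the first where the condition holds and the second elsewhere.
result = np.where(a < 0, a**2, a**3)
print("Using np.where:", result)  # [4 1 0 1 8]
```

Because both branches are computed for every element, this approach suits cheap expressions that are valid for all inputs.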