# Advanced NumPy Features

This notebook covers advanced NumPy features including structured arrays, masked arrays, memory mapping, and performance optimization techniques.

## Import NumPy

In [None]:
import numpy as np

## Structured Arrays

Structured arrays store heterogeneous data types in a single array: each element is a record with named, typed fields, much like a row in a database table or a pandas DataFrame.

In [None]:
# Define data types for structured array
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('height', 'f8')])

# Create structured array
people = np.array([('Alice', 25, 5.5), ('Bob', 30, 6.0), ('Charlie', 35, 5.8)], dtype=dt)

print('Structured array:')
print(people)
print('\nData types:', people.dtype)

# Access fields
print('\nNames:', people['name'])
print('Ages:', people['age'])
print('Heights:', people['height'])

# Access individual elements
print('\nFirst person:', people[0])
print('Alice age:', people[0]['age'])

# Modify values
people[1]['age'] = 31
print('\nUpdated ages:', people['age'])
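
Because each field behaves like an ordinary array, a structured array can be filtered and sorted by field. The following is a minimal sketch that rebuilds the `people` array from the cell above (with the original ages) so it runs on its own; the variable names `over_28` and `by_height` are illustrative only.

```python
import numpy as np

# Same layout as the cell above: a 10-character string, a 32-bit int, a 64-bit float
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('height', 'f8')])
people = np.array([('Alice', 25, 5.5), ('Bob', 30, 6.0), ('Charlie', 35, 5.8)], dtype=dt)

# Boolean indexing on one field selects whole records
over_28 = people[people['age'] > 28]
print('Older than 28:', over_28['name'])

# np.sort with order= sorts records by the named field (ascending)
by_height = np.sort(people, order='height')
print('Sorted by height:', by_height['name'])
```

Sorting returns a new array and leaves `people` unchanged; `people.sort(order='height')` would sort in place instead.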

## Masked Arrays

Masked arrays pair the data with a boolean mask that marks missing or invalid elements, so that computations such as `mean` and `sum` skip them.

In [None]:
import numpy.ma as ma

# Create array with some invalid data
data = np.array([1, 2, -999, 4, 5, -999, 7])
print('Original data:', data)

# Mask every element equal to the sentinel value -999
masked_data = ma.masked_values(data, -999)
print('Masked data:', masked_data)
print('Mask:', masked_data.mask)

# Operations on masked arrays
print('Mean (ignoring masked):', masked_data.mean())
print('Sum (ignoring masked):', masked_data.sum())

# Create mask manually
arr = np.array([1, 2, 3, 4, 5])
mask = [False, True, False, True, False]
masked_arr = ma.masked_array(arr, mask=mask)
print('\nManually masked array:', masked_arr)
print('Compressed (non-masked only):', masked_arr.compressed())
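
Missing values in floating-point data are often stored as NaN rather than as a sentinel such as -999. As a sketch under that assumption, `ma.masked_invalid` masks NaN and infinite entries, and `filled` converts the result back to a plain array with a substitute value. The `readings` array here is made-up sample data.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical sensor readings with a NaN and an infinity mixed in
readings = np.array([1.0, np.nan, 3.0, np.inf, 5.0])

# Mask every NaN and +/-inf entry
clean = ma.masked_invalid(readings)
print('Masked readings:', clean)
print('Valid count:', clean.count())
print('Mean of valid values:', clean.mean())

# Replace masked entries with 0.0 and get back an ordinary ndarray
filled = clean.filled(0.0)
print('Filled with 0.0:', filled)
```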

## Memory Mapping

Memory mapping lets you work with arrays that are too large to fit in memory. The file stays on disk and is mapped into the process's address space, so only the parts you actually index are read.

In [None]:
# Create an array and save it (about 8 MB here; real use cases are far larger)
large_array = np.random.rand(1000, 1000)
np.save('large_array.npy', large_array)
print('Saved large array')

# Memory map the file
mmapped_array = np.load('large_array.npy', mmap_mode='r')
print('Memory mapped array shape:', mmapped_array.shape)
print('Memory mapped array dtype:', mmapped_array.dtype)

# Work with the memory mapped array (only loads data as needed)
subset = mmapped_array[100:200, 200:300]
print('Subset shape:', subset.shape)
print('Subset mean:', subset.mean())

# Memory mapping modes:
# 'r'  - read-only
# 'r+' - read-write on an existing file
# 'w+' - create or overwrite the file, then open it read-write
# 'c'  - copy-on-write: changes stay in memory and are not saved to disk

print('\nMemory mapping allows efficient access to large files without loading everything into memory.')
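
The cell above maps an existing `.npy` file read-only. To create a disk-backed array from scratch, `np.memmap` can be used directly with mode `'w+'`. The sketch below assumes a raw binary file in a temporary directory (the file name `scratch.dat` is arbitrary). Unlike `.npy`, a raw file has no header, so the dtype and shape must be supplied again when reopening it.

```python
import os
import tempfile
import numpy as np

# Put the scratch file in a temporary directory (path is illustrative)
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'scratch.dat')

# 'w+' creates (or overwrites) the file and opens it read-write
writer = np.memmap(path, dtype=np.float32, mode='w+', shape=(4, 5))
writer[:] = np.arange(20, dtype=np.float32).reshape(4, 5)
writer.flush()  # push pending changes to disk
del writer      # drop the reference to the mapping

# Reopen read-only; dtype and shape must match what was written
reader = np.memmap(path, dtype=np.float32, mode='r', shape=(4, 5))
row_sum = float(reader[1].sum())
print('Shape:', reader.shape)
print('Sum of row 1:', row_sum)
```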

## Performance Optimization Tips

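NumPy is designed for performance, but how you use it matters. The cell below compares three habits: vectorized operations instead of Python loops, smaller data types where the precision allows, and in-place operations that avoid allocating temporary arrays.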

In [None]:
import time

# Vectorized operations vs loops
arr = np.random.rand(1000000)

# Vectorized (fast): one call processes the whole array in compiled code
start = time.perf_counter()
result_vec = np.sin(arr) + np.cos(arr)
vec_time = time.perf_counter() - start

# Loop (slow): one Python-level iteration per element
start = time.perf_counter()
result_loop = np.zeros_like(arr)
for i in range(len(arr)):
    result_loop[i] = np.sin(arr[i]) + np.cos(arr[i])
loop_time = time.perf_counter() - start

print(f'Vectorized time: {vec_time:.4f} seconds')
print(f'Loop time: {loop_time:.4f} seconds')
print(f'Speedup: {loop_time / vec_time:.1f}x')

# Use appropriate data types
print('\nData type memory usage:')
arr_float64 = np.array([1.0, 2.0, 3.0], dtype=np.float64)
arr_float32 = np.array([1.0, 2.0, 3.0], dtype=np.float32)

print(f'float64: {arr_float64.nbytes} bytes')
print(f'float32: {arr_float32.nbytes} bytes')

# In-place operations
a = np.random.rand(1000, 1000)
b = np.random.rand(1000, 1000)

# Not in-place (allocates a new array for the result)
start = time.perf_counter()
c = a + b
not_inplace_time = time.perf_counter() - start

# In-place (overwrites a, so no new array is allocated)
start = time.perf_counter()
a += b
inplace_time = time.perf_counter() - start

print(f'\nNot in-place time: {not_inplace_time:.4f} seconds')
print(f'In-place time: {inplace_time:.4f} seconds')
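
Single-run timings like those above can vary noticeably from run to run. A more robust sketch, assuming the standard-library `timeit` module is acceptable here, repeats each measurement and keeps the best result. It also shows the `out=` argument of `np.add`, which writes into a preallocated array without overwriting either input. The helper names `allocate` and `reuse` are illustrative only.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((500, 500))
b = rng.random((500, 500))
out = np.empty_like(a)  # preallocated result buffer

def allocate():
    # Allocates a fresh result array on every call
    return a + b

def reuse():
    # Writes the result into the existing buffer
    np.add(a, b, out=out)
    return out

# Run each function 20 times per trial, 5 trials, and keep the fastest trial
t_alloc = min(timeit.repeat(allocate, number=20, repeat=5))
t_reuse = min(timeit.repeat(reuse, number=20, repeat=5))

print(f'New array each call: {t_alloc:.4f} seconds per 20 calls')
print(f'Reused buffer:       {t_reuse:.4f} seconds per 20 calls')
```

The size of the difference depends on array size, hardware, and the memory allocator, so it is worth measuring on your own workload.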

## Summary

You have explored advanced NumPy features:
- Structured arrays for heterogeneous data
- Masked arrays for handling missing data
- Memory mapping for large datasets
- Performance optimization techniques

These advanced features make NumPy suitable for complex scientific computing and data analysis tasks.