# Episode 2: Working with NumPy Arrays

NumPy is the foundation of scientific computing in Python. In this notebook, we'll learn how to work with NumPy arrays to analyze inflammation data efficiently.

## Learning Objectives
- Import and use NumPy
- Create and manipulate NumPy arrays
- Load data from files using NumPy
- Perform mathematical operations on arrays
- Calculate summary statistics

## Introduction

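NumPy (Numerical Python) provides an N-dimensional array type and functions that operate on whole arrays at once. Because these operations run in compiled code over elements of a single data type, they are typically much faster than looping over regular Python lists for numerical computations.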
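
As a rough illustration of the difference in style, the sketch below doubles a set of readings once with a list comprehension and once with a NumPy array. The `readings` values are made up for the example, and no timing is claimed here; it only shows that both approaches give the same values.

```python
import numpy as np

readings = list(range(1000))  # hypothetical measurements

# Plain Python: the list comprehension visits each element in turn.
doubled_list = [value * 2 for value in readings]

# NumPy: a single expression is applied to every element of the array.
readings_array = np.array(readings)
doubled_array = readings_array * 2

print("List result: ", doubled_list[:5])
print("Array result:", doubled_array[:5])
print("Same values:", doubled_list == doubled_array.tolist())
```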

## 1. Importing NumPy

First, let's import NumPy. The standard convention is to import it as `np`:

In [None]:
import numpy as np
print("NumPy version:", np.__version__)

## 2. Creating NumPy Arrays

Let's create some arrays to represent inflammation data:

In [None]:
# Create arrays from lists
patient_1_data = np.array([0, 1, 3, 5, 4, 2, 1, 0])
patient_2_data = np.array([0, 2, 4, 6, 5, 3, 2, 0])

print("Patient 1 inflammation data:", patient_1_data)
print("Patient 2 inflammation data:", patient_2_data)
print("Array type:", type(patient_1_data))
print("Data type:", patient_1_data.dtype)
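
Every element of an array shares one data type, which NumPy infers from the values unless told otherwise. A small sketch with hypothetical readings, showing how one float changes the inferred type and how `dtype` can be set explicitly:

```python
import numpy as np

int_readings = np.array([0, 1, 3, 5])               # all integers -> an integer dtype
mixed_readings = np.array([0, 1.5, 3, 5])           # one float -> float64 for every element
forced_float = np.array([0, 1, 3, 5], dtype=float)  # request floats explicitly

print("Integers only:", int_readings.dtype)
print("Mixed values: ", mixed_readings.dtype)
print("dtype=float:  ", forced_float.dtype)
```

The exact integer type (for example `int64` or `int32`) can depend on the platform and NumPy version.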

In [None]:
# Create arrays with specific functions
zeros_array = np.zeros(10)  # Array of zeros
ones_array = np.ones(5)     # Array of ones
range_array = np.arange(0, 10, 2)  # Array with range
linspace_array = np.linspace(0, 1, 5)  # Array with equally spaced values

print("Zeros array:", zeros_array)
print("Ones array:", ones_array)
print("Range array:", range_array)
print("Linspace array:", linspace_array)
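
`np.arange` and `np.linspace` look similar but treat the end point differently. The sketch below, using values chosen only for illustration, shows that `np.arange` takes a step and excludes the stop value, while `np.linspace` takes a number of points and includes the stop value by default:

```python
import numpy as np

# np.arange(start, stop, step): the stop value is excluded
by_step = np.arange(0, 1, 0.25)

# np.linspace(start, stop, num): the stop value is included by default
by_count = np.linspace(0, 1, 5)

print("arange:  ", by_step)
print("linspace:", by_count)
```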

## 3. Array Properties

NumPy arrays have many useful properties:

In [None]:
# Array properties
print("Array shape:", patient_1_data.shape)
print("Array size:", patient_1_data.size)
print("Number of dimensions:", patient_1_data.ndim)
print("Data type:", patient_1_data.dtype)
print("Item size (bytes):", patient_1_data.itemsize)
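
The size and item size together determine how much memory the array's data occupies, which is also available as the `nbytes` attribute. In this sketch the dtype is fixed to `np.int64` so the numbers are the same on every platform; that choice is an assumption for the example, not something the data requires.

```python
import numpy as np

# dtype is set explicitly so the byte counts do not depend on the platform
readings = np.array([0, 1, 3, 5, 4, 2, 1, 0], dtype=np.int64)

print("Item size (bytes):", readings.itemsize)
print("Total bytes:", readings.nbytes)
print("size * itemsize:", readings.size * readings.itemsize)

# Converting to a smaller integer type reduces memory use
compact = readings.astype(np.int8)
print("Total bytes as int8:", compact.nbytes)
```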

## 4. Multi-dimensional Arrays

Real inflammation data is often stored as a 2D array with one row per patient and one column per day (patients × days):

In [None]:
# Create a 2D array: 3 patients × 8 days
inflammation_data = np.array([
    [0, 1, 3, 5, 4, 2, 1, 0],  # Patient 1
    [0, 2, 4, 6, 5, 3, 2, 0],  # Patient 2
    [0, 1, 2, 4, 3, 2, 1, 0]   # Patient 3
])

print("Inflammation data shape:", inflammation_data.shape)
print("Number of patients:", inflammation_data.shape[0])
print("Number of days:", inflammation_data.shape[1])
print("\nInflammation data:")
print(inflammation_data)

## 5. Array Indexing and Slicing

Accessing specific elements and subsets of arrays:

In [None]:
# Indexing (remember: Python uses 0-based indexing)
print("First patient's data:", inflammation_data[0])
print("Patient 1, Day 3 inflammation:", inflammation_data[0, 2])
print("Patient 2, Day 5 inflammation:", inflammation_data[1, 4])

# Slicing
print("\nFirst two patients:", inflammation_data[:2])
print("All patients, first 4 days:", inflammation_data[:, :4])
print("Patient 2, days 3-6:", inflammation_data[1, 2:6])
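
One detail worth knowing about slicing: a basic slice of a NumPy array is a view onto the same memory, not a copy, so writing to the slice changes the original. The sketch below illustrates this with the same inflammation values; the names `first_patient_view` and `first_patient_copy` are introduced only for this example.

```python
import numpy as np

inflammation_data = np.array([
    [0, 1, 3, 5, 4, 2, 1, 0],
    [0, 2, 4, 6, 5, 3, 2, 0],
    [0, 1, 2, 4, 3, 2, 1, 0],
])

first_patient_view = inflammation_data[0, :4]         # slice: a view of the original
first_patient_copy = inflammation_data[0, :4].copy()  # independent copy

first_patient_copy[0] = 99  # does not touch inflammation_data
first_patient_view[1] = 7   # also changes inflammation_data[0, 1]

print("Original first row:", inflammation_data[0])
print("View shares memory:", np.shares_memory(inflammation_data, first_patient_view))
print("Copy shares memory:", np.shares_memory(inflammation_data, first_patient_copy))
```

Calling `.copy()` is the usual way to get data you can modify safely.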

### Exercise 2.1
Extract the inflammation data for the last patient on the last three days:

In [None]:
# Exercise 2.1 - Your code here

## 6. Mathematical Operations

NumPy allows element-wise operations on entire arrays:

In [None]:
# Element-wise operations
doubled_data = inflammation_data * 2
celsius_to_fahrenheit = inflammation_data * 9/5 + 32  # If these were temperatures
normalized_data = inflammation_data / np.max(inflammation_data)

print("Original data (first patient):")
print(inflammation_data[0])
print("\nDoubled data (first patient):")
print(doubled_data[0])
print("\nNormalized data (first patient):")
print(normalized_data[0])
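
Element-wise operations also work between arrays of different but compatible shapes, a mechanism NumPy calls broadcasting. As a sketch, the example below normalizes each patient by that patient's own maximum rather than the global maximum; `patient_max` and `per_patient_normalized` are names chosen for this illustration.

```python
import numpy as np

inflammation_data = np.array([
    [0, 1, 3, 5, 4, 2, 1, 0],
    [0, 2, 4, 6, 5, 3, 2, 0],
    [0, 1, 2, 4, 3, 2, 1, 0],
])

# keepdims=True keeps the result as a column of shape (3, 1),
# so it lines up with the rows of the (3, 8) array
patient_max = inflammation_data.max(axis=1, keepdims=True)

# The (3, 1) column is stretched across all 8 days of each row
per_patient_normalized = inflammation_data / patient_max

print("Per-patient maxima shape:", patient_max.shape)
print("Normalized data:")
print(per_patient_normalized)
```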

In [None]:
# Mathematical functions
sqrt_data = np.sqrt(inflammation_data)
log_data = np.log(inflammation_data + 1)  # Add 1 to avoid log(0)
exp_data = np.exp(inflammation_data / 10)  # Scale inputs down so results stay small

print("Square root of inflammation data (first patient):")
print(sqrt_data[0])
print("\nLog of inflammation data (first patient):")
print(log_data[0])
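
For the "add 1 before taking the log" pattern, NumPy also offers `np.log1p`, which computes log(1 + x) directly. A brief sketch, using the first patient's values, showing that the two forms agree:

```python
import numpy as np

first_patient = np.array([0, 1, 3, 5, 4, 2, 1, 0])

log_manual = np.log(first_patient + 1)  # add 1 by hand
log_builtin = np.log1p(first_patient)   # log(1 + x) in one call

print("np.log(x + 1):", log_manual)
print("np.log1p(x):  ", log_builtin)
print("Results agree:", np.allclose(log_manual, log_builtin))
```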

## 7. Statistical Operations

NumPy provides many statistical functions:

In [None]:
# Basic statistics for the entire dataset
print("Dataset Statistics:")
print(f"Mean inflammation: {np.mean(inflammation_data):.2f}")
print(f"Standard deviation: {np.std(inflammation_data):.2f}")
print(f"Minimum value: {np.min(inflammation_data)}")
print(f"Maximum value: {np.max(inflammation_data)}")
print(f"Median value: {np.median(inflammation_data):.2f}")
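
Note that `np.std` computes the population standard deviation by default (dividing by N). If the readings are treated as a sample, passing `ddof=1` divides by N - 1 instead. Which one is appropriate depends on the analysis, so the sketch below only shows the difference:

```python
import numpy as np

inflammation_data = np.array([
    [0, 1, 3, 5, 4, 2, 1, 0],
    [0, 2, 4, 6, 5, 3, 2, 0],
    [0, 1, 2, 4, 3, 2, 1, 0],
])

population_std = np.std(inflammation_data)      # default ddof=0: divide by N
sample_std = np.std(inflammation_data, ddof=1)  # ddof=1: divide by N - 1

print(f"Population standard deviation: {population_std:.3f}")
print(f"Sample standard deviation:     {sample_std:.3f}")
```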

In [None]:
# Statistics along specific axes
# axis=0: collapse the patient axis (rows), giving one value per day
# axis=1: collapse the day axis (columns), giving one value per patient

daily_averages = np.mean(inflammation_data, axis=0)
patient_averages = np.mean(inflammation_data, axis=1)

print("Daily averages (across all patients):")
print(daily_averages)
print("\nPatient averages (across all days):")
print(patient_averages)

# Maximum inflammation per day
daily_max = np.max(inflammation_data, axis=0)
print("\nDaily maximum inflammation:")
print(daily_max)
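
A quick way to check which axis was collapsed is to look at the shape of the result. The sketch below also uses `np.argmax` with `axis=1` to find the (zero-based) day index on which each patient peaked; `peak_day_index` is a name introduced for this example.

```python
import numpy as np

inflammation_data = np.array([
    [0, 1, 3, 5, 4, 2, 1, 0],
    [0, 2, 4, 6, 5, 3, 2, 0],
    [0, 1, 2, 4, 3, 2, 1, 0],
])

# One value per day: the patient axis has been collapsed
print("axis=0 result shape:", np.mean(inflammation_data, axis=0).shape)

# One value per patient: the day axis has been collapsed
print("axis=1 result shape:", np.mean(inflammation_data, axis=1).shape)

# Zero-based index of the day with the highest reading, per patient
peak_day_index = np.argmax(inflammation_data, axis=1)
print("Peak day index per patient:", peak_day_index)
```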

## 8. Boolean Indexing

Select data based on conditions:

In [None]:
# Boolean conditions
high_inflammation = inflammation_data > 3
print("High inflammation (>3):")
print(high_inflammation)

# Count high inflammation readings
num_high_readings = np.sum(high_inflammation)
print(f"\nNumber of high inflammation readings: {num_high_readings}")

# Extract values that meet condition
high_values = inflammation_data[high_inflammation]
print(f"High inflammation values: {high_values}")
print(f"Average of high values: {np.mean(high_values):.2f}")

In [None]:
# Complex conditions
moderate_inflammation = (inflammation_data >= 2) & (inflammation_data <= 4)
extreme_values = (inflammation_data == 0) | (inflammation_data >= 5)

print(f"Moderate inflammation readings (2-4): {np.sum(moderate_inflammation)}")
print(f"Extreme inflammation readings (0 or >=5): {np.sum(extreme_values)}")
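
Boolean arrays can also be reduced along an axis with `np.any` and `np.all`. As a sketch, the example below asks which patients had at least one reading of 5 or more, and on which days every patient had a reading of 0. The threshold of 5 is only an illustrative choice.

```python
import numpy as np

inflammation_data = np.array([
    [0, 1, 3, 5, 4, 2, 1, 0],
    [0, 2, 4, 6, 5, 3, 2, 0],
    [0, 1, 2, 4, 3, 2, 1, 0],
])

# True for each patient (row) with at least one reading >= 5
patients_with_severe = np.any(inflammation_data >= 5, axis=1)

# True for each day (column) on which all patients read 0
days_all_zero = np.all(inflammation_data == 0, axis=0)

print("Patients with a reading >= 5:", patients_with_severe)
print("Days where every patient is 0:", days_all_zero)
```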

### Exercise 2.2
Find the percentage of readings that show no inflammation (value = 0):

In [None]:
# Exercise 2.2 - Your code here

## 9. Array Manipulation

Reshape, concatenate, and split arrays:

In [None]:
# Reshape arrays
flat_data = inflammation_data.flatten()  # Convert to 1D (returns a copy)
reshaped_data = flat_data.reshape(4, 6)  # Reshape to 4×6

print("Original shape:", inflammation_data.shape)
print("Flattened shape:", flat_data.shape)
print("Reshaped to 4×6:", reshaped_data.shape)
print("\nReshaped data:")
print(reshaped_data)
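
Reshaping only works when the new shape holds exactly the same number of elements. A short sketch, using `np.arange(24)` as a stand-in for the 24 flattened readings, shows how `-1` lets NumPy infer one dimension and what happens when the sizes do not match:

```python
import numpy as np

flat_data = np.arange(24)  # stand-in for 24 flattened readings

# -1 asks NumPy to work out that dimension: 24 / 6 = 4 rows
inferred = flat_data.reshape(-1, 6)
print("Shape with inferred rows:", inferred.shape)

# 5 x 5 = 25 elements does not match 24, so NumPy raises ValueError
try:
    flat_data.reshape(5, 5)
except ValueError as error:
    print("Cannot reshape:", error)
```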

In [None]:
# Transpose arrays
transposed_data = inflammation_data.T  # or np.transpose(inflammation_data)
print("Original shape:", inflammation_data.shape)
print("Transposed shape:", transposed_data.shape)
print("\nTransposed data (days × patients):")
print(transposed_data)
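
Arrays can also be joined and divided along an axis. The sketch below appends a row for an extra patient with `np.concatenate` and splits the days into two halves with `np.split`. The `new_patient` values are invented for the example.

```python
import numpy as np

inflammation_data = np.array([
    [0, 1, 3, 5, 4, 2, 1, 0],
    [0, 2, 4, 6, 5, 3, 2, 0],
    [0, 1, 2, 4, 3, 2, 1, 0],
])

# Hypothetical fourth patient; shape (1, 8) so it can be stacked as a row
new_patient = np.array([[0, 1, 1, 3, 2, 1, 0, 0]])

# Join along axis=0 (rows): 3 patients + 1 patient -> 4 patients
extended_data = np.concatenate([inflammation_data, new_patient], axis=0)
print("Extended shape:", extended_data.shape)

# Split along axis=1 (columns) into 2 equal parts: days 1-4 and days 5-8
first_half, second_half = np.split(inflammation_data, 2, axis=1)
print("First half shape:", first_half.shape)
print("Second half shape:", second_half.shape)
```

`np.split` requires the axis length to divide evenly into the requested number of parts.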

## 10. Loading Data from Files

In practice, you'll often load data from files. Let's simulate this by writing a small random dataset to a CSV file and reading it back:

In [None]:
# Create a sample data file of random integers from 0 to 9 (5 rows × 10 columns)
sample_data = np.random.randint(0, 10, size=(5, 10))
np.savetxt('sample_inflammation.csv', sample_data, delimiter=',', fmt='%d')

# Load data from file (np.loadtxt returns floats unless a dtype is given)
loaded_data = np.loadtxt('sample_inflammation.csv', delimiter=',')

print("Loaded data shape:", loaded_data.shape)
print("First few rows:")
print(loaded_data[:3, :5])  # First 3 rows, first 5 columns

# Verify data matches (array_equal compares values, so int vs. float dtypes still match)
print("\nData loading successful:", np.array_equal(sample_data, loaded_data))
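
By default `np.loadtxt` reads values as floating-point numbers, even when the file contains only integers. A minimal sketch, reading from an in-memory string instead of a real file (the `csv_text` contents are made up), shows the default and the effect of passing `dtype=int`:

```python
import io

import numpy as np

csv_text = "0,1,3\n0,2,4\n"  # hypothetical file contents

# io.StringIO lets np.loadtxt read the text as if it were a file
as_float = np.loadtxt(io.StringIO(csv_text), delimiter=",")
as_int = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=int)

print("Default dtype:", as_float.dtype)
print("With dtype=int:", as_int.dtype)
print(as_int)
```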

## 11. Advanced Array Operations

Some useful advanced operations:

In [None]:
# Find indices of specific values
max_indices = np.where(inflammation_data == np.max(inflammation_data))
print("Indices of maximum values:")
print(f"Patient indices: {max_indices[0]}, Day indices: {max_indices[1]}")  # zero-based

# Sorting
sorted_patient_1 = np.sort(inflammation_data[0])
print(f"\nPatient 1 data sorted: {sorted_patient_1}")

# Unique values
unique_values = np.unique(inflammation_data)
print(f"Unique inflammation values: {unique_values}")
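
`np.unique` can also report how often each value occurs when called with `return_counts=True`. A short sketch on the same dataset:

```python
import numpy as np

inflammation_data = np.array([
    [0, 1, 3, 5, 4, 2, 1, 0],
    [0, 2, 4, 6, 5, 3, 2, 0],
    [0, 1, 2, 4, 3, 2, 1, 0],
])

# values holds the sorted unique readings; counts holds how often each appears
values, counts = np.unique(inflammation_data, return_counts=True)

for value, count in zip(values, counts):
    print(f"Value {value} appears {count} time(s)")
```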

In [None]:
# Cumulative operations
cumulative_sum = np.cumsum(inflammation_data[0])
cumulative_max = np.maximum.accumulate(inflammation_data[0])

print("Patient 1 daily inflammation:", inflammation_data[0])
print("Cumulative sum:", cumulative_sum)
print("Cumulative maximum:", cumulative_max)

### Exercise 2.3
Calculate the day-to-day change in inflammation for each patient (difference between consecutive days):

In [None]:
# Exercise 2.3 - Your code here
# Hint: Use np.diff() function

## Summary

In this episode, we learned:
- **NumPy basics**: Importing and creating arrays
- **Array properties**: Shape, size, dtype
- **Indexing and slicing**: Accessing data subsets
- **Mathematical operations**: Element-wise calculations
- **Statistical functions**: Mean, std, min, max, median
- **Boolean indexing**: Conditional data selection
- **Array manipulation**: Reshape, transpose, flatten
- **File I/O**: Loading and saving data
- **Advanced operations**: Finding indices, sorting, unique values, cumulative sums

NumPy is essential for efficient scientific computing in Python!

## Clean up

Remove the temporary file we created:

In [None]:
import os
if os.path.exists('sample_inflammation.csv'):
    os.remove('sample_inflammation.csv')
    print("Temporary file removed.")