# Introduction to NumPy

Welcome to your first step in learning data analysis with Python!

### What is NumPy?

NumPy (Numerical Python) is a powerful library that lets you:

- Work efficiently with numerical data
- Store data in arrays (like tables of numbers)
- Perform fast calculations and statistics
- Analyze experimental data in biology and chemistry

> NumPy is fast and efficient because it stores data in compact arrays and applies operations to whole arrays at once, so you rarely need to write explicit Python loops.


---


### In this chapter, you’ll learn how to:

1. Create and use NumPy arrays  
2. Perform basic math and statistics  
3. Analyze real-world data like protein concentrations and enzyme activity



---

## Quick Introduction to Useful NumPy Syntax

Before we jump into real examples, here are some of the most common things you'll do with NumPy:

- **Creating arrays**  
  `np.array([1, 2, 3])` → turns a Python list into a NumPy array  

- **Generating ranges of numbers**  
  `np.arange(0, 10, 2)` → `[0 2 4 6 8]`  

- **Basic math operations (element-wise)**  
  `array * 2`, `array + 5`, `array / 10` → applies to every element automatically  

- **Useful stats functions**  
  `np.mean(array)`, `np.std(array)`, `np.min(array)`, `np.max(array)`  

- **Differences between values**  
  `np.diff(array)` → calculates the difference between each pair of consecutive elements (the result has one element fewer than the input)  

- **Unit conversion**  
  You can convert values easily:  
  `nmol = umol_array * 1000`  

These are the building blocks we'll use in the following biochemistry examples.


In [1]:
# First, import NumPy
import numpy as np

---

## Creating Arrays

### From Python lists:

In [2]:
arr = np.array([1, 2, 3, 4])
print(arr)

[1 2 3 4]


---

### Multidimensional array:

In [3]:
mat = np.array([[1, 2], [3, 4]])
print(mat)

[[1 2]
 [3 4]]


---

## Array Attributes

In [4]:
print(arr.shape)      # Shape of array
print(arr.dtype)      # Data type
print(arr.ndim)       # Number of dimensions

(4,)
int64
1
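
The same attributes are often more informative on a 2D array. The following is a minimal sketch using a made-up float array (the values are illustrative, not measurements from this chapter); it also uses `size`, an attribute not shown above. Note that the integer `dtype` printed above can differ by platform and NumPy version.

```python
import numpy as np

# A 2D array of floats: 2 rows, 3 columns (illustrative values)
mat = np.array([[1.5, 2.0, 2.5],
                [3.0, 3.5, 4.0]])

print(mat.shape)   # (2, 3): rows, then columns
print(mat.ndim)    # 2
print(mat.size)    # 6 elements in total
print(mat.dtype)   # float64
```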


---

## Creating Arrays with Defaults

In [5]:
# In a notebook, only the value of the last expression in a cell is displayed
np.zeros((2, 3))        # 2x3 array of zeros
np.ones((3,))           # 1D array of ones
np.arange(0, 10, 2)     # [0, 2, 4, 6, 8]
np.linspace(0, 1, 5)    # 5 evenly spaced points from 0 to 1

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
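
`np.arange` and `np.linspace` are easy to confuse, so here is a small sketch of the difference: `np.arange` takes a step size and leaves out the end value, while `np.linspace` takes a number of points and includes the end value by default. The variable names are chosen for illustration only.

```python
import numpy as np

# np.arange takes a step size and stops before the end value
steps = np.arange(0, 10, 2)
print(steps)          # [0 2 4 6 8]

# np.linspace takes a number of points and includes the end value
points = np.linspace(0, 10, 5)
print(points)         # five points: 0, 2.5, 5, 7.5, 10
```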

--- 

## Array Indexing and Slicing

In [6]:
arr = np.array([10, 20, 30, 40])
print(arr[1])           # 20
print(arr[1:3])         # [20 30]
mat = np.array([[1, 2, 3], [4, 5, 6]])
print(mat[1, 2])        # 6

20
[20 30]
6
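
A few more indexing patterns come up often when working with tabular data. This sketch reuses the same two arrays to show negative indices, open-ended slices, and selecting a whole row or column.

```python
import numpy as np

arr = np.array([10, 20, 30, 40])
mat = np.array([[1, 2, 3], [4, 5, 6]])

print(arr[-1])      # 40: negative indices count from the end
print(arr[:2])      # [10 20]: the stop index is excluded
print(mat[0])       # [1 2 3]: the first row
print(mat[:, 1])    # [2 5]: the second column
```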


---

## Vectorized Operations

In [7]:
arr = np.array([1, 2, 3])
print(arr + 5)           # [6 7 8]
print(arr * 2)           # [2 4 6]

[6 7 8]
[2 4 6]
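
To make "vectorized" concrete, the sketch below compares the same calculation written with plain Python lists and with a NumPy array. It also shows one common pitfall: multiplying a list by a number repeats the list instead of scaling its values.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# Plain Python: an explicit loop (here a list comprehension)
doubled_list = [v * 2 for v in values]

# NumPy: one expression applied to every element
doubled_arr = arr * 2

print(doubled_list)   # [2, 4, 6]
print(doubled_arr)    # [2 4 6]

# Two arrays of the same shape combine element by element
summed = arr + np.array([10, 20, 30])
print(summed)         # [11 22 33]

# Pitfall: multiplying a list repeats it instead of scaling it
repeated = values * 2
print(repeated)       # [1, 2, 3, 1, 2, 3]
```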


---

## Mathematical Functions

In [8]:
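# arr is [1 2 3]; only the last expression's value is displayed in a notebook
np.mean(arr)    # 2.0
np.std(arr)     # about 0.816
np.sum(arr)     # 6
np.max(arr)     # 3
np.min(arr)     # 1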

np.int64(1)
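
These functions also work on 2D arrays, where the optional `axis` argument controls whether the calculation runs over everything, down the columns, or along the rows. The sketch below assumes a hypothetical layout in which each row is a sample and each column is a replicate reading.

```python
import numpy as np

# Hypothetical layout: each row is a sample, each column a replicate reading
readings = np.array([[1, 2, 3],
                     [4, 5, 6]])

print(np.mean(readings))           # 3.5: mean of all six values
print(np.mean(readings, axis=0))   # [2.5 3.5 4.5]: one mean per column
print(np.mean(readings, axis=1))   # [2. 5.]: one mean per row
print(readings.max(axis=1))        # [3 6]: largest value in each row
```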

---

## Reshaping and Transposing

In [9]:
arr = np.arange(6).reshape(2, 3)  # 2 rows, 3 columns
print(arr)

print(arr.T)                     # Transpose

[[0 1 2]
 [3 4 5]]
[[0 3]
 [1 4]
 [2 5]]
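
Reshaping only works when the new shape holds exactly the same number of elements. The sketch below shows the `-1` shorthand, which asks NumPy to work out one dimension for you, and what happens when the sizes do not match.

```python
import numpy as np

flat = np.arange(6)            # [0 1 2 3 4 5]

# -1 asks NumPy to work out that dimension from the total size
three_rows = flat.reshape(3, -1)
print(three_rows.shape)        # (3, 2)

# The element count must match: 6 values cannot fill a 4x2 array
reshape_failed = False
try:
    flat.reshape(4, 2)
except ValueError as err:
    reshape_failed = True
    print("Reshape failed:", err)

# Transposing swaps the axes, so the shape is reversed
print(three_rows.T.shape)      # (2, 3)
```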


---

## Example 1: Protein Concentration Measurements

Let's say you've measured the concentration of a protein (in mg/mL) in several different samples.

We'll use a NumPy array to store and analyze these values.

In [10]:
# Create a NumPy array of protein concentrations (mg/mL)
protein_conc = np.array([1.2, 2.5, 2.3, 1.8, 2.1, 1.9, 2.6])

# Print the raw data
print("Protein concentrations (mg/mL):", protein_conc)

Protein concentrations (mg/mL): [1.2 2.5 2.3 1.8 2.1 1.9 2.6]


In [11]:
# Calculate the average (mean) concentration
mean_conc = np.mean(protein_conc)
print("Mean concentration:", mean_conc)

# Calculate the standard deviation (spread of values); np.std returns the population value by default
std_conc = np.std(protein_conc)
print("Standard deviation:", std_conc)

# Find the maximum and minimum values
max_conc = np.max(protein_conc)
min_conc = np.min(protein_conc)
print("Max concentration:", max_conc)
print("Min concentration:", min_conc)

Mean concentration: 2.0571428571428574
Standard deviation: 0.4435478484645721
Max concentration: 2.6
Min concentration: 1.2
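
By default, `np.std` divides by the number of values (the population standard deviation). For a handful of measurements taken from a larger population, the sample standard deviation, which divides by n - 1, is often the one reported. The sketch below shows both using the `ddof` argument, plus the standard error of the mean as one possible follow-up statistic; which one is appropriate depends on your experiment.

```python
import numpy as np

protein_conc = np.array([1.2, 2.5, 2.3, 1.8, 2.1, 1.9, 2.6])

# Default: population standard deviation (divides by n)
std_population = np.std(protein_conc)

# ddof=1: sample standard deviation (divides by n - 1)
std_sample = np.std(protein_conc, ddof=1)

# Standard error of the mean: sample SD divided by sqrt(n)
sem = std_sample / np.sqrt(protein_conc.size)

print("Population SD:", round(float(std_population), 3))
print("Sample SD:", round(float(std_sample), 3))
print("Standard error of the mean:", round(float(sem), 3))
```

The sample value is always slightly larger than the population value, and the gap shrinks as the number of measurements grows.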


## Example 2: Enzyme Activity Over Time

Suppose you're studying an enzyme reaction and you measure activity (µmol/min) every 5 minutes.

Let’s store the time and activity data and calculate the rate of change (how quickly the activity changes from one measurement to the next).

In [12]:
# Time in minutes and enzyme activity in µmol/min
time_minutes = np.array([0, 5, 10, 15, 20, 25, 30])
enzyme_activity = np.array([0, 2.3, 4.1, 5.8, 7.2, 7.9, 8.1])

print("Time (minutes):", time_minutes)
print("Enzyme activity (µmol/min):", enzyme_activity)

Time (minutes): [ 0  5 10 15 20 25 30]
Enzyme activity (µmol/min): [0.  2.3 4.1 5.8 7.2 7.9 8.1]


In [13]:
# Calculate the rate of change between time points (Δactivity / Δtime)
rate_of_change = np.diff(enzyme_activity) / np.diff(time_minutes)
print("Rate of change (µmol/min per minute):", rate_of_change)

Rate of change (µmol/min per minute): [0.46 0.36 0.34 0.28 0.14 0.04]
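
To see what `np.diff` is doing here, the sketch below reproduces it with slicing and confirms that seven measurements give six intervals. Pairing each rate with the midpoint of its time interval is one reasonable convention for labelling the results, not the only one.

```python
import numpy as np

time_minutes = np.array([0, 5, 10, 15, 20, 25, 30])
enzyme_activity = np.array([0, 2.3, 4.1, 5.8, 7.2, 7.9, 8.1])

# np.diff(x) is each element minus the one before it
manual_diff = enzyme_activity[1:] - enzyme_activity[:-1]
print(np.allclose(manual_diff, np.diff(enzyme_activity)))   # True

# n values give n - 1 differences
print(enzyme_activity.size, np.diff(enzyme_activity).size)  # 7 6

# Each rate describes an interval, so pair it with the interval midpoint
midpoints = (time_minutes[:-1] + time_minutes[1:]) / 2
rate_of_change = np.diff(enzyme_activity) / np.diff(time_minutes)
for t, r in zip(midpoints, rate_of_change):
    print(f"{t:4.1f} min: {r:.2f}")
```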


In [14]:
# Convert activity to nmol/min (1 µmol = 1000 nmol)
enzyme_activity_nmol = enzyme_activity * 1000
print("Enzyme activity in nmol/min:", enzyme_activity_nmol)

Enzyme activity in nmol/min: [   0. 2300. 4100. 5800. 7200. 7900. 8100.]


## Example 3: DNA Melting Temperature vs. Salt Concentration

DNA melting temperature (Tm) depends on salt concentration. We'll explore this relationship with a small set of example values.

Here’s how Tm changes at different NaCl concentrations.

In [15]:
# Salt concentration in mM and DNA Tm in °C
salt_conc_mM = np.array([50, 100, 150, 200, 250])
tm_values = np.array([65.2, 67.5, 68.9, 70.1, 71.0])

print("Salt concentrations (mM):", salt_conc_mM)
print("Melting temperatures (°C):", tm_values)

Salt concentrations (mM): [ 50 100 150 200 250]
Melting temperatures (°C): [65.2 67.5 68.9 70.1 71. ]


In [16]:
# Calculate the Tm increase between each salt step
delta_tm = np.diff(tm_values)
print("Increase in Tm between salt steps (°C):", delta_tm)

# Average increase per 50 mM
avg_increase_per_50mM = np.mean(delta_tm)
print("Average increase per 50 mM NaCl:", avg_increase_per_50mM, "°C")

Increase in Tm between salt steps (°C): [2.3 1.4 1.2 0.9]
Average increase per 50 mM NaCl: 1.4499999999999993 °C
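
Two details of this result are worth a closer look. First, the mean of consecutive differences depends only on the first and last values, because the intermediate terms cancel. Second, the long decimal in the output is ordinary floating-point noise, which can be hidden by rounding for display. A minimal sketch of both points:

```python
import numpy as np

tm_values = np.array([65.2, 67.5, 68.9, 70.1, 71.0])
delta_tm = np.diff(tm_values)

# The differences telescope: their sum is last value minus first value
total_rise = tm_values[-1] - tm_values[0]
print(np.isclose(delta_tm.sum(), total_rise))      # True

# So the mean step is the total rise divided by the number of steps
mean_step = total_rise / delta_tm.size
print(np.isclose(mean_step, np.mean(delta_tm)))    # True

# Round for display to hide floating-point noise
rounded_mean = round(float(np.mean(delta_tm)), 2)
print("Average increase per 50 mM NaCl:", rounded_mean, "°C")   # 1.45 °C
```

Because the average hides the trend, it is still worth looking at `delta_tm` itself: the steps shrink from 2.3 °C to 0.9 °C as salt concentration rises.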


## Summary

In this chapter, you learned how to:

- Use NumPy to store experimental data in arrays
- Calculate basic statistics (mean, std, max, min)
- Compute differences and apply unit conversions
- Analyze bioscience data like enzyme activity and DNA melting temperatures
