
# **Learn Python for Biological Data Analysis**
## **Chapter 5:** NumPy for Numerical Data

This course is designed and taught by **Dr. Ashfaq Ahmad**. Throughout the course, I will draw all examples from the biological and life sciences.

## 📅 Course Outline

---

## 🏗️ Foundation (Weeks 1–2)

### 📘 Chapter 1: Getting Started with Python and Colab [Watch Lecture](https://youtu.be/BKe2CmiG_TU)
- Introduction to Google Colab interface
- Basic Python syntax and data types
- Variables, strings, and basic operations
- Print statements and comments

### 📘 Chapter 2: Control Structures [Watch Lecture](https://youtu.be/uPHeqVb4Mo0)
- Conditional statements (`if`/`else`)
- Loops (`for` and `while`)
- Basic functions and scope

---

## 🧬 Data Handling (Weeks 3–4)

### 📘 Chapter 3: Data Structures for Biology [Watch Lecture](https://youtu.be/x1IJwSYhNZg)
- Lists and tuples (storing sequences, experimental data)
- Dictionaries (gene annotations, species data)
- Sets (unique identifiers, sample collections)

### 📘 Chapter 4: Working with Files [Watch Lecture](https://youtu.be/D27MyLpSdks)
- Reading and writing text files
- Handling CSV files (experimental data)
- Basic file operations for biological datasets

---

## 📊 Scientific Computing (Weeks 5–7)

### 📘 Chapter 5: NumPy for Numerical Data
- Arrays for storing experimental measurements
- Mathematical operations on datasets
- Statistical calculations (mean, median, standard deviation)

### 📘 Chapter 6: Pandas for Data Analysis
- DataFrames for structured biological data
- Data cleaning and manipulation
- Filtering and grouping experimental results
- Handling missing data

### 📘 Chapter 7: Data Visualization
- Matplotlib basics for scientific plots
- Creating publication-quality figures
- Specialized plots for biological data (histograms, scatter plots, box plots)

---

## 🔬 Biological Applications (Weeks 8–10)

### 📘 Chapter 8: Sequence Analysis
- String manipulation for DNA/RNA sequences
- Basic sequence operations (reverse complement, transcription)
- Reading FASTA files
- Simple sequence statistics

### 📘 Chapter 9: Statistical Analysis for Biology
- Hypothesis testing basics
- t-tests and chi-square tests
- Correlation analysis
- Introduction to `scipy.stats`

### 📘 Chapter 10: Practical Projects
- Analyzing gene expression data
- Population genetics calculations
- Ecological data analysis
- Creating reproducible research workflows

---

## 🚀 Advanced Topics *(Optional – Weeks 11–12)*

### 📘 Chapter 11: Bioinformatics Libraries
- Introduction to Biopython
- Working with biological databases
- Phylogenetic analysis basics

### 📘 Chapter 12: Best Practices
- Code organization and documentation
- Error handling
- Reproducible research practices
- Sharing code and results

---

✅ We will move from basic programming concepts to practical biological applications, ensuring students can immediately apply what they learn to their research and coursework.


## Introduction
NumPy is a foundational library in Python for numerical computing. In biology, we often deal with large datasets, from gene expression levels to protein concentrations. NumPy provides a powerful and efficient way to handle these numerical data.

## What is a NumPy Array?
A NumPy array is a grid of values, all of the same type, and is indexed by a tuple of non-negative integers. Think of it as a super-charged list that's designed for numerical operations. While a standard Python list can hold different data types and is great for general purpose use, a NumPy array is optimized for performing mathematical operations on entire datasets at once.

### Creating an Array
We start by importing the NumPy library. By convention it is imported under the alias `np`, which keeps the code concise.

In [None]:
import numpy as np

# Creating an array from a list
measurements = np.array([2.5, 3.1, 2.8, 3.5, 2.9])
print(measurements)

In the context of biology, this array could represent the lengths of bacterial cells in micrometers, or the optical density readings of a culture over time.
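As a small illustration of the grid-of-values description, the sketch below inspects the array created above and then indexes a two-dimensional array with a tuple. The `plate` readings are made-up values used only for illustration.

```python
import numpy as np

measurements = np.array([2.5, 3.1, 2.8, 3.5, 2.9])

# Every element shares one data type; the shape gives the size of each dimension
print(measurements.dtype)   # float64
print(measurements.shape)   # (5,)

# Indexing starts at 0, and slicing works as it does for lists
first = measurements[0]
middle = measurements[1:4]
print(first, middle)

# Hypothetical optical density readings: 2 cultures (rows) x 3 time points (columns)
plate = np.array([[0.10, 0.22, 0.41],
                  [0.12, 0.25, 0.47]])

# A 2-D array is indexed by a (row, column) tuple
reading = plate[1, 2]       # second culture, third time point
print(plate.shape, reading)
```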

### Mathematical Operations on Datasets
One of NumPy's key strengths is element-wise operations: a mathematical operation is applied to every element of an array in a single line of code. This is more concise than a `for` loop and, for large arrays, usually much faster, because the looping happens inside NumPy's compiled code.
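To make the comparison concrete, here is a minimal sketch that performs the same conversion twice, once with a loop and once element-wise. The incubation temperatures are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical incubation temperatures in degrees Celsius
temps_c = np.array([25.0, 30.0, 37.0, 42.0])

# Loop version: convert one value at a time and collect the results
temps_f_loop = []
for t in temps_c:
    temps_f_loop.append(t * 9 / 5 + 32)

# Element-wise version: the whole array in one expression
temps_f = temps_c * 9 / 5 + 32

print(temps_f)
print("Same result:", np.allclose(temps_f_loop, temps_f))
```

Both versions give the same numbers; the element-wise form is simply shorter and leaves the looping to NumPy.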

### Scalar Operations
You can add, subtract, multiply, or divide an array by a single number (a scalar). For example, if we need to convert our bacterial cell lengths from **micrometers** to **nanometers**, we'd simply multiply the entire array by 1000.

In [None]:
# Our measurements in micrometers
micrometers = np.array([2.5, 3.1, 2.8, 3.5, 2.9])

# Convert to nanometers
nanometers = micrometers * 1000
print(f"Measurements in nanometers: {nanometers}")

### Array-to-Array Operations
You can also perform operations between two arrays. This is useful for comparing or combining different datasets. For example, if we have a second set of measurements and want to find the difference between the two, we can do this directly.

In [None]:
# First set of cell lengths (micrometers)
set1 = np.array([2.5, 3.1, 2.8, 3.5, 2.9])

# Second set of cell lengths
set2 = np.array([2.6, 3.0, 2.9, 3.4, 3.0])

# Find the difference between the two sets
difference = set1 - set2
print(f"Difference between sets: {difference}")
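Array-to-array operations pair up elements by position, so the two arrays here need the same length. As a hedged extension of the example above, the sketch below computes the percent change from `set1` to `set2` and flags which cells were longer in the second set. The names `percent_change` and `longer_in_set2` are introduced only for this illustration.

```python
import numpy as np

set1 = np.array([2.5, 3.1, 2.8, 3.5, 2.9])
set2 = np.array([2.6, 3.0, 2.9, 3.4, 3.0])

# Both arrays have the same shape, so elements are paired by position
print(set1.shape == set2.shape)

# Percent change of each measurement relative to the first set
percent_change = (set2 - set1) / set1 * 100
print("Percent change:", percent_change)

# Comparisons are element-wise too and return an array of booleans
longer_in_set2 = set2 > set1
print("Longer in set 2:", longer_in_set2)
```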

### Statistical Calculations
NumPy makes it incredibly easy to compute common statistical measures on our biological data. These functions are built directly into the NumPy library and operate on the entire array.

### Mean
The mean (average) is a central measure of a dataset. In biology, we often calculate the mean of replicates to determine the average effect of a treatment.

In [None]:
data = np.array([120, 125, 118, 130, 122])  # Heart rate measurements

# Calculate the mean
average_heart_rate = np.mean(data)
print(f"Average heart rate: {average_heart_rate}")

### Median
The median is the middle value of a dataset. It's especially useful when your data might have outliers, as the median is less affected by extreme values than the mean.

In [None]:
# A dataset with an outlier
gene_expression = np.array([10, 12, 11, 13, 200])

# Calculate the median
median_expression = np.median(gene_expression)
print(f"Median gene expression: {median_expression}")
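To see how strongly the outlier affects each measure, the short sketch below computes the mean and the median of the same dataset side by side.

```python
import numpy as np

# Same dataset as above: four similar values and one outlier (200)
gene_expression = np.array([10, 12, 11, 13, 200])

mean_expression = np.mean(gene_expression)
median_expression = np.median(gene_expression)

# The single outlier pulls the mean far above the typical values,
# while the median stays with the bulk of the data
print(f"Mean:   {mean_expression}")    # 49.2
print(f"Median: {median_expression}")  # 12.0
```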

### Standard Deviation
The standard deviation measures the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean, while a high standard deviation indicates that the values are spread out over a wider range. This is crucial for understanding the variability in our biological experiments.

In [None]:
# Let's use our heart rate data again
data = np.array([120, 125, 118, 130, 122])

# Calculate the standard deviation
std_dev = np.std(data)
print(f"Standard deviation of heart rates: {std_dev}")
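One detail worth knowing: by default `np.std` divides by the number of values (the population standard deviation). When the data are a sample of replicates, the sample standard deviation, which divides by one less than the number of values, is often reported instead; `np.std` computes it when `ddof=1` is passed. The sketch below shows both on the heart rate data; which one is appropriate depends on your analysis.

```python
import numpy as np

data = np.array([120, 125, 118, 130, 122])  # Heart rate measurements

# Default (ddof=0): divide by n, the population standard deviation
population_sd = np.std(data)

# ddof=1: divide by n - 1, the sample standard deviation
sample_sd = np.std(data, ddof=1)

print(f"Population SD (ddof=0): {population_sd:.3f}")
print(f"Sample SD (ddof=1):     {sample_sd:.3f}")
```

The sample value is always slightly larger, and the gap shrinks as the number of measurements grows.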

### Example 1: Arrays for Storing Experimental Measurements
Suppose we measured enzyme activity (in μmol/min) across 5 samples:

In [None]:
enzyme_activity = np.array([12.5, 15.3, 14.8, 13.1, 16.0])
print("Enzyme activity:", enzyme_activity)

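### Normalize Enzyme Activity
To compare values across experiments, we often normalize the data. Here each value is divided by the mean, so a normalized value of 1.0 corresponds to the average activity: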

In [None]:
mean_activity = np.mean(enzyme_activity)
normalized_activity = enzyme_activity / mean_activity
print("Normalized activity:", normalized_activity)
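As a quick check on this normalization, the sketch below confirms that the normalized values average to 1. It also shows one common alternative, the z-score, which is a different technique from the mean normalization used above: it subtracts the mean and divides by the standard deviation. The name `z_scores` is introduced only for this illustration.

```python
import numpy as np

enzyme_activity = np.array([12.5, 15.3, 14.8, 13.1, 16.0])

# Mean normalization: values above 1 are above-average samples
normalized_activity = enzyme_activity / np.mean(enzyme_activity)
print("Mean of normalized data:", np.mean(normalized_activity))

# Alternative (not the method above): z-score standardization
z_scores = (enzyme_activity - np.mean(enzyme_activity)) / np.std(enzyme_activity)
print("Z-scores:", z_scores)
```

After z-scoring, the data have a mean of about 0 and a standard deviation of about 1, which can be convenient when comparing measurements recorded on different scales.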

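### Fold Change in Gene Expression Data
Fold change is the ratio of expression in the treated sample to expression in the control. Dividing the two arrays gives one ratio per gene: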

In [None]:
control_expr = np.array([100, 200, 150])
treated_expr = np.array([300, 400, 225])

fold_change = treated_expr / control_expr
print("Fold change:", fold_change)
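Fold changes are often reported on a log2 scale, so that a doubling becomes +1 and a halving becomes -1. The sketch below applies `np.log2` to the same fold changes; the name `log2_fold_change` is introduced only for this illustration.

```python
import numpy as np

control_expr = np.array([100, 200, 150])
treated_expr = np.array([300, 400, 225])

# Ratio of treated to control for each gene
fold_change = treated_expr / control_expr

# log2 scale: a fold change of 2 maps to 1.0, a fold change of 1 maps to 0.0
log2_fold_change = np.log2(fold_change)

print("Fold change:     ", fold_change)
print("log2 fold change:", log2_fold_change)
```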


### Mean, Median, Standard Deviation
The same three statistics, applied to the enzyme activity data:

In [None]:
print("Mean:", np.mean(enzyme_activity))
print("Median:", np.median(enzyme_activity))
print("Standard Deviation:", np.std(enzyme_activity))

### Exercise for you

In [None]:
protein_conc = np.array([2.5, 3.0, 2.8, 3.2, 2.9, 3.1])

Calculate the mean, median, and standard deviation of the array above, then normalize the dataset. You can use the empty cells below.

### Thank you for taking an interest in this class.