<a href="https://colab.research.google.com/github/Preetirai-tech/Python-Tutorials-on-Structural-Biology/blob/main/chapter_1_foundation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Learn Python for Biological Science**

This course is designed and taught by **Dr. Ashfaq Ahmad**. Throughout the course, I will draw all examples from the biological and life sciences.

## 📅 Course Outline

---

## 🏗️ Foundation (Weeks 1–2)

### 📘 Chapter 1: Getting Started with Python and Colab
- Introduction to Google Colab interface
- Basic Python syntax and data types
- Variables, strings, and basic operations
- Print statements and comments

### 📘 Chapter 2: Control Structures
- Conditional statements (`if`/`else`)
- Loops (`for` and `while`)
- Basic functions and scope

---

## 🧬 Data Handling (Weeks 3–4)

### 📘 Chapter 3: Data Structures for Biology
- Lists and tuples (storing sequences, experimental data)
- Dictionaries (gene annotations, species data)
- Sets (unique identifiers, sample collections)

### 📘 Chapter 4: Working with Files
- Reading and writing text files
- Handling CSV files (experimental data)
- Basic file operations for biological datasets

---

## 📊 Scientific Computing (Weeks 5–7)

### 📘 Chapter 5: NumPy for Numerical Data
- Arrays for storing experimental measurements
- Mathematical operations on datasets
- Statistical calculations (mean, median, standard deviation)

### 📘 Chapter 6: Pandas for Data Analysis
- DataFrames for structured biological data
- Data cleaning and manipulation
- Filtering and grouping experimental results
- Handling missing data

### 📘 Chapter 7: Data Visualization
- Matplotlib basics for scientific plots
- Creating publication-quality figures
- Specialized plots for biological data (histograms, scatter plots, box plots)

---

## 🔬 Biological Applications (Weeks 8–10)

### 📘 Chapter 8: Sequence Analysis
- String manipulation for DNA/RNA sequences
- Basic sequence operations (reverse complement, transcription)
- Reading FASTA files
- Simple sequence statistics

### 📘 Chapter 9: Statistical Analysis for Biology
- Hypothesis testing basics
- t-tests and chi-square tests
- Correlation analysis
- Introduction to `scipy.stats`

### 📘 Chapter 10: Practical Projects
- Analyzing gene expression data
- Population genetics calculations
- Ecological data analysis
- Creating reproducible research workflows

---

## 🚀 Advanced Topics *(Optional – Weeks 11–12)*

### 📘 Chapter 11: Bioinformatics Libraries
- Introduction to Biopython
- Working with biological databases
- Phylogenetic analysis basics

### 📘 Chapter 12: Best Practices
- Code organization and documentation
- Error handling
- Reproducible research practices
- Sharing code and results

---

## 🧠 Key Teaching Strategies

1. Start each chapter with biological context – explain why the programming concept matters for their field.
2. Use biological datasets throughout – gene sequences, experimental measurements, species data.
3. Include hands-on exercises after each concept.
4. Emphasize reproducibility – show how code documents their analysis process.
5. Build complexity gradually – start with simple examples, then real research scenarios.

---

✅ This progression moves from basic programming concepts to practical biological applications, ensuring students can immediately apply what they learn to their research and coursework.


# **Chapter 1:** Getting Started with Python and Colab
# Python for Biological Sciences


Welcome to your first Python programming lesson!

This notebook is designed specifically for biological science students.
You'll learn programming concepts using examples relevant to biology,
from DNA sequences to experimental data analysis.

By the end of this chapter, you'll be able to:
- Navigate Google Colab
- Write basic Python code
- Work with different data types
- Understand variables and basic operations
- Use print statements effectively


# ========================================================================
# SECTION 1: INTRODUCTION TO GOOGLE COLAB
# =======================================================================


🔬 **WHAT IS GOOGLE COLAB?**

Google Colab is a free online platform that lets you write and run Python code
in your browser - no installation required! It's perfect for:
- Data analysis
- Creating graphs and visualizations
- Sharing your research code
- Collaborating with colleagues

🧬 **WHY PYTHON FOR BIOLOGY?**
- Analyze DNA sequences
- Process experimental data
- Create publication-quality graphs
- Automate repetitive calculations
- Share reproducible research


# =============================================================
# SECTION 2: YOUR FIRST PYTHON CODE
# ==============================================================

# Let's start with the traditional first program - but with a biological twist!

In [None]:
print("Welcome to Python for Biology!")

Welcome to Python for Biology!


In [None]:
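# Without quotes, Python reads the words as code instead of text and raises a SyntaxError.
# Uncomment the next line to see the error:
# print(Welcome to Python for Biology)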

# =================================================================
# SECTION 3: COMMENTS - DOCUMENTING YOUR CODE
# ==================================================================


Comments are text that Python ignores - they're for humans to read.
In research, good comments are essential for:
- Explaining what your analysis does
- Helping collaborators understand your code
- Remembering what you did months later


In [None]:
print("This code will run")  # Comment at the end of a line

This code will run


In [None]:
# This is a single-line comment: Python ignores everything after the #

"""
This is a multi-line comment written as a triple-quoted string.
Use it for longer explanations of your code.
Perfect for describing experimental methods!
"""

In [None]:
organism = "E. coli"

In [None]:
temperature = 37

In [None]:
print(f"Growing {organism} at {temperature}°C")

In [None]:
Boy = "David"
Age = 25
Country = "USA"

In [None]:
print(f"{Boy} is {Age} years old and lives in {Country}.")

# ===============================================================
# SECTION 4: VARIABLES - STORING YOUR DATA
# =================================================================

Variables are containers for storing data values.
Think of them as labeled test tubes in your lab!

🧪 NAMING RULES FOR VARIABLES:
- Must start with a letter or underscore
- Can contain letters, numbers, and underscores
- Cannot be a reserved keyword (such as `class`, `for`, or `if`)
- Case-sensitive (DNA ≠ dna)
- Use descriptive names
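You can check these rules directly in Python. The sketch below uses the built-in string method `str.isidentifier()` and the standard-library `keyword` module. The candidate names are only illustrative examples.

```python
import keyword

# isidentifier() checks the "letters, digits, underscores, no leading digit" rules
print("gene_name".isidentifier())   # True
print("2gene".isidentifier())       # False: starts with a number
print("gene-name".isidentifier())   # False: contains a hyphen

# Reserved words pass isidentifier() but still cannot be used as names
print("class".isidentifier())       # True
print(keyword.iskeyword("class"))   # True: 'class' is reserved

# Names are case-sensitive: DNA and dna are two different variables
DNA = "ATGC"
dna = "atgc"
print(DNA == dna)                   # False
```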

## 🧮 VARIABLES in Python

### ✅ What is a Variable?

A **variable** is a **name** that stores a **value** in memory so you can use it later in your code.

Think of it like a labeled container that holds data.

---

In [None]:
# Examples of good variable names for biology
gene_name = "BRCA1"
patient_age = 45
dna_concentration = 250.5
is_mutation_present = True

In [None]:
# Examples of poor variable names
x = "BRCA1"  # Works, but not descriptive
# 2gene = "BRCA1"      # ERROR! Starts with a number (SyntaxError)
# gene-name = "BRCA1"  # ERROR! Contains a hyphen (SyntaxError)

In [None]:
# Print variables to see their values
print("Gene:", gene_name)
print("Patient age:", patient_age)
print("DNA concentration:", dna_concentration, "ng/μL")
print("Mutation present:", is_mutation_present)

### **EXERCISE:** Create variables for your own biological data

### Example: species_name, sample_size, experiment_date

# ===============================================================
# SECTION 5: DATA TYPES - DIFFERENT KINDS OF INFORMATION
# ================================================================


Python has several basic data types. Let's explore them with biological examples:


## 🧬 DATA TYPES in Python

Python has several **built-in data types** that describe the kind of data a variable holds.

Understanding data types is essential for writing correct and efficient code.

---

### ✅ Common Data Types

| Type       | Example         | Description                          |
|------------|------------------|--------------------------------------|
| `int`      | `5`              | Integer (whole number)               |
| `float`    | `3.14`           | Decimal number                       |
| `str`      | `"ATGC"`         | String (text or sequence)            |
| `bool`     | `True`, `False`  | Boolean (logical values)             |
| `list`     | `[1, 2, 3]`      | Ordered, mutable collection          |
| `tuple`    | `(1, 2)`         | Ordered, **immutable** collection    |
| `dict`     | `{"gene": "BRCA1"}` | Key-value pairs                   |
| `set`      | `{1, 2, 3}`      | Unordered, **unique** values         |
| `NoneType` | `None`           | Represents no value or "empty"       |

---

### 🔬 Examples in Biology

```python
gene = "TP53"                # str
length = 393                 # int
gc_content = 0.55            # float
is_protein_coding = True     # bool
bases = ["A", "T", "G", "C"] # list
```

In [None]:
# 1. STRINGS (text) - for names, sequences, descriptions
species = "Homo sapiens"
dna_sequence = "ATCGTAGCTA"
experiment_notes = "Sample collected from healthy tissue"

In [None]:
print("Species:", species)

In [None]:
print("DNA sequence:", dna_sequence)

In [None]:
print("Notes:", experiment_notes)

In [None]:
# 2. INTEGERS (whole numbers) - for counts, ages, generations
chromosome_count = 46
patient_count = 120
generation_number = 5

In [None]:
print("Chromosomes:", chromosome_count)

In [None]:
print("Patients in study:", patient_count)

In [None]:
print("Generation:", generation_number)

In [None]:
# 3. FLOATS (decimal numbers) - for measurements, concentrations
body_temperature = 37.2
ph_level = 7.4
protein_concentration = 15.8

In [None]:
print("Temperature:", body_temperature, "°C")
print("pH level:", ph_level)
print("Protein concentration:", protein_concentration, "mg/mL")

In [None]:
# 4. BOOLEANS (True/False) - for yes/no questions
is_healthy = True
mutation_detected = False
treatment_effective = True

In [None]:
print("Patient healthy:", is_healthy)
print("Mutation detected:", mutation_detected)
print("Treatment effective:", treatment_effective)

In [None]:
# Check the type of a variable
print("\nData types:")
print("Type of species:", type(species))
print("Type of chromosome_count:", type(chromosome_count))
print("Type of body_temperature:", type(body_temperature))
print("Type of is_healthy:", type(is_healthy))
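The cell above checks four of the types in the table. The same check works for the collection types and `None`. The variable names and values here are illustrative.

```python
# Illustrative values for the remaining built-in types in the table
bases = ["A", "T", "G", "C"]                      # list: ordered, can be changed
stop_codons = ("TAA", "TAG", "TGA")               # tuple: ordered, cannot be changed
gene_info = {"gene": "TP53", "chromosome": "17"}  # dict: key-value pairs
sample_ids = {"S001", "S002", "S001"}             # set: the duplicate is dropped
missing_value = None                              # NoneType: "no value yet"

print("Type of bases:", type(bases))
print("Type of stop_codons:", type(stop_codons))
print("Type of gene_info:", type(gene_info))
print("Type of sample_ids:", type(sample_ids), "->", len(sample_ids), "unique IDs")
print("Type of missing_value:", type(missing_value))
```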

# ================================================================
# SECTION 6: BASIC OPERATIONS
# =================================================================

Let's perform some basic operations with biological data.

In [None]:
# ARITHMETIC OPERATIONS
sample_1_cells = 1000000
sample_2_cells = 750000
total_cells = sample_1_cells + sample_2_cells

In [None]:
print("Sample 1 cells:", sample_1_cells)
print("Sample 2 cells:", sample_2_cells)
print("Total cells:", total_cells)
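Python has more arithmetic operators than `+`. The example below splits the pooled cells across a 96-well plate to show how `/`, `//`, `%` and `-` behave. The plate scenario is an assumption added for illustration; it is not part of the original exercise.

```python
# Assumed scenario: split the pooled cells evenly across a 96-well plate
total_cells = 1000000 + 750000   # same total as above
wells = 96

cells_per_well = total_cells / wells    # true division: always returns a float
whole_cells = total_cells // wells      # floor division: whole cells per well
leftover = total_cells % wells          # modulo: cells left over after dividing
difference = 1000000 - 750000           # subtraction

print(f"Exact cells per well: {cells_per_well:.2f}")
print(f"Whole cells per well: {whole_cells}")
print(f"Leftover cells: {leftover}")
print(f"Sample 1 has {difference} more cells than sample 2")
```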

In [None]:
# More arithmetic operations
initial_bacteria = 100
growth_rate = 2  # fold increase per hour (the population doubles every hour)
time_hours = 3

In [None]:
# Bacterial growth calculation
final_bacteria = initial_bacteria * (growth_rate ** time_hours)
print(f"After {time_hours} hours: {final_bacteria} bacteria")

After 3 hours: 800 bacteria
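The cell above assumes the population doubles exactly once per hour. A more general form uses the doubling time: N = N0 × 2^(t / doubling time). The 20-minute doubling time below is an assumption, roughly the figure often quoted for *E. coli* in rich medium at 37 °C. It is not a value from the original example.

```python
# Exponential growth with an explicit doubling time (assumed value)
initial_bacteria = 100
doubling_time_min = 20       # assumption: ~20 min doubling time
time_min = 3 * 60            # 3 hours expressed in minutes

generations = time_min / doubling_time_min          # number of doublings
final_bacteria = initial_bacteria * 2 ** generations

print(f"Generations: {generations:.0f}")
print(f"After {time_min} minutes: {final_bacteria:,.0f} bacteria")
```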


In [None]:
# STRING OPERATIONS
gene_id = "ENSG00000139618"
gene_symbol = "BRCA2"
full_identifier = gene_id + "_" + gene_symbol

print("Full gene identifier:", full_identifier)
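`+` only joins strings to other strings. To build an identifier that includes a number, convert the number with `str()` first, or let an f-string do the conversion. The exon number here is a hypothetical example value.

```python
gene_symbol = "BRCA2"
exon_number = 11   # hypothetical example value

# gene_symbol + "_exon" + exon_number would raise a TypeError (str + int)
exon_label = gene_symbol + "_exon" + str(exon_number)
print("Exon label:", exon_label)

# An f-string converts the number for you
print(f"Exon label: {gene_symbol}_exon{exon_number}")
```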

In [None]:
# STRING REPETITION (useful for creating sequences)
codon = "ATG"
start_sequence = codon * 3
print("Repeated start codon:", start_sequence)

# ===============================================================
# SECTION 7: PRINT STATEMENTS - DISPLAYING YOUR RESULTS
# ===============================================================
The print() function is your main tool for displaying results.
Let's explore different ways to format output for biological data.


In [None]:
# Basic printing
organism = "Drosophila melanogaster"
print(organism)

In [None]:
# Printing multiple items
gene = "white"
chromosome = "X"
print("Gene:", gene, "Location:", chromosome)

Gene: white Location: X


In [None]:
# Using f-strings (formatted strings) - RECOMMENDED for biology
temperature = 25.5
humidity = 65
print(f"Culture conditions: {temperature}°C, {humidity}% humidity")

In [None]:
# Including units and proper formatting
concentration = 0.05
print(f"Antibiotic concentration: {concentration} mg/mL")

In [None]:
# Formatting numbers
pi = 3.14159265359
print(f"π rounded to 2 decimals: {pi:.2f}")
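A few other format specifications are handy for lab data. `,` adds thousands separators and `e` switches to scientific notation. The values below are illustrative.

```python
cell_count = 1500000          # illustrative values
molar_concentration = 0.000125

print(f"Cells: {cell_count:,}")                                # thousands separator
print(f"Concentration: {molar_concentration:.2e} M")           # scientific notation
print(f"Concentration: {molar_concentration * 1e6:.1f} μM")    # converted to micromolar
```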

In [None]:
# Multiple variables in one f-string
sample_id = "S001"
cell_count = 250000
viability = 0.95
print(f"Sample {sample_id}: {cell_count} cells, {viability:.1%} viable")

**What does `:.1%` mean?**
It tells Python to:
- multiply the number by 100 and show it with a `%` sign
- keep 1 digit after the decimal point

So `0.95` is displayed as `95.0%`.

# ============================================================
# SECTION 8: INTERACTIVE EXERCISES
# ============================================================


EXERCISE 1: Create Your Lab Profile
Create variables for your information and print them nicely.

In [None]:
# TODO: Fill in your information
your_name = "Your Name Here"
research_area = "Your Research Area"
favorite_organism = "Your Favorite Organism"
years_experience = 0


In [None]:
print("=== LAB MEMBER PROFILE ===")
print(f"Name: {your_name}")
print(f"Research Area: {research_area}")
print(f"Favorite Organism: {favorite_organism}")
print(f"Years of Experience: {years_experience}")

EXERCISE 2: DNA Sequence Analysis
Work with a DNA sequence and extract basic information.

In [None]:
# Given DNA sequence
dna_seq = "ATGCGATCGTAGCTAGCATGC"

# TODO: Calculate and print the sequence length, base counts, and GC content
sequence_length = len(dna_seq)

In Python, `len()` is a built-in function that returns the number of items in an object. For a string, that is the number of characters (here, bases).

In [None]:
print(f"DNA sequence: {dna_seq}")
print(f"Length: {sequence_length} base pairs")

In [None]:
# Count specific nucleotides (we'll learn better ways later!)
a_count = dna_seq.count('A')
t_count = dna_seq.count('T')
g_count = dna_seq.count('G')
c_count = dna_seq.count('C')

In Python, `.count()` is a method of strings (and of lists and tuples) that counts how many times a given element or substring appears. For strings, it counts non-overlapping matches.
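`.count()` also works for longer substrings such as dinucleotides or codons. However, it scans left to right and counts only *non-overlapping* matches. The sketch reuses the exercise sequence, plus a short illustrative string.

```python
dna_seq = "ATGCGATCGTAGCTAGCATGC"   # same sequence as the exercise

print("CG dinucleotides:", dna_seq.count("CG"))
print("ATG occurrences:", dna_seq.count("ATG"))

# Non-overlapping: "AAAA" contains three overlapping "AA" pairs,
# but count() only finds two
print("AA in AAAA:", "AAAA".count("AA"))
```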

In [None]:
print(f"A: {a_count}, T: {t_count}, G: {g_count}, C: {c_count}")

In [None]:
# Calculate GC content (important for DNA analysis!)
gc_content = (g_count + c_count) / sequence_length

In [None]:
print(f"GC content: {gc_content:.2%}")

GC content: 52.38%
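Sequence calculations benefit from a quick sanity check: the four base counts should add up to the sequence length. If they do not, the sequence contains other characters, such as `N` for an unknown base or lowercase letters, and the GC content above would be off. A minimal sketch, reusing the exercise sequence:

```python
dna_seq = "ATGCGATCGTAGCTAGCATGC"
sequence_length = len(dna_seq)

a_count = dna_seq.count("A")
t_count = dna_seq.count("T")
g_count = dna_seq.count("G")
c_count = dna_seq.count("C")

# True only if every character is one of uppercase A, T, G, C
counts_match = (a_count + t_count + g_count + c_count) == sequence_length
print("Counts add up to length:", counts_match)

# For a clean sequence, AT content and GC content add up to 100%
at_content = (a_count + t_count) / sequence_length
gc_content = (g_count + c_count) / sequence_length
print(f"AT content: {at_content:.2%}")
print(f"AT + GC: {at_content + gc_content:.0%}")
```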


EXERCISE 3: Experimental Data
Practice with typical lab measurements.

In [None]:
# Experimental data
experiment_name = "Protein Expression Analysis"
control_reading = 0.125
treatment_reading = 0.847
background = 0.05

In [None]:
# TODO: Calculate fold change
corrected_control = control_reading - background
corrected_treatment = treatment_reading - background
fold_change = corrected_treatment / corrected_control

In [None]:
print(f"Experiment: {experiment_name}")
print(f"Control (corrected): {corrected_control:.3f}")
print(f"Treatment (corrected): {corrected_treatment:.3f}")
print(f"Fold change: {fold_change:.2f}x")
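Expression studies often report fold changes on a log2 scale, so that a doubling is +1 and a halving is −1. A minimal sketch using the standard-library `math.log2`. This transformation is an addition; it is not part of the original exercise.

```python
import math

control_reading = 0.125
treatment_reading = 0.847
background = 0.05

corrected_control = control_reading - background
corrected_treatment = treatment_reading - background
fold_change = corrected_treatment / corrected_control

# log2 scale: 2-fold up -> 1.0, 2-fold down -> -1.0
log2_fold_change = math.log2(fold_change)
print(f"Fold change: {fold_change:.2f}x")
print(f"log2 fold change: {log2_fold_change:.2f}")
```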

# ================================================================
# SECTION 9: COMMON MISTAKES AND DEBUGGING
# =================================================================


🚨 COMMON MISTAKES TO AVOID:

1. Forgetting quotes around strings
2. Using reserved words as variable names
3. Case sensitivity issues
4. Mixing up data types

In [None]:
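# MISTAKE 1: Forgetting quotes (uncomment the next line to see the error)
# gene = BRCA1  # ERROR! NameError: Python looks for a variable called BRCA1
gene = "BRCA1"  # CORRECT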

In [None]:
# MISTAKE 2: Using reserved words (uncomment the next line to see the error)
# class = "Mammalia"  # ERROR! 'class' is reserved (SyntaxError)
organism_class = "Mammalia"  # CORRECT

In [None]:
# MISTAKE 3: Case sensitivity
Gene = "BRCA1"
gene = "brca1"
print(f"These are different variables: {Gene} vs {gene}")

In [None]:
# MISTAKE 4: Mixing data types incorrectly
age = "25"  # This is a string, not a number
# age_next_year = age + 1  # ERROR! Can't add number to string
age_next_year = int(age) + 1  # CORRECT: convert to integer first
print(f"Next year's age: {age_next_year}")
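The same fix applies to decimal measurements. Values typed by a user or read from a text file arrive as strings, so convert them with `float()` before doing arithmetic. The readings below are illustrative.

```python
od_reading = "0.847"     # a measurement stored as text (illustrative)
background = "0.05"

# float() turns numeric text into a decimal number
corrected = float(od_reading) - float(background)
print(f"Corrected OD: {corrected:.3f}")

# int() only accepts whole-number text: int("0.847") raises a ValueError
replicates = int("3")
print("Replicates:", replicates, type(replicates))
```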

# =================================================================
# SECTION 10: PRACTICE PROBLEMS
# ===================================================================


🧬 PRACTICE PROBLEM 1: Culture Media Preparation
You need to prepare 500 mL of culture media with specific concentrations.

In [None]:
# Given information
total_volume = 500  # mL
glucose_concentration = 2.0  # g/L
nacl_concentration = 0.5  # g/L

TODO: Calculate how much of each component you need.

Hint: convert the volume from mL to L (divide by 1000), then multiply by the concentration in g/L to get grams.

In [None]:
glucose_needed = (glucose_concentration * total_volume) / 1000
nacl_needed = (nacl_concentration * total_volume) / 1000

In [None]:
print("=== CULTURE MEDIA RECIPE ===")
print(f"For {total_volume} mL of media:")
print(f"Glucose: {glucose_needed} g")
print(f"NaCl: {nacl_needed} g")
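The same unit-conversion pattern extends to molar solutions: mass (g) = molarity (mol/L) × volume (L) × molar mass (g/mol). The 150 mM target is an assumed example value. 58.44 g/mol is the molar mass of NaCl.

```python
# Mass needed for a molar solution: mass = molarity x volume x molar mass
total_volume = 500            # mL (same volume as above)
target_molarity = 0.150       # mol/L, i.e. 150 mM (assumed example)
nacl_molar_mass = 58.44       # g/mol

volume_l = total_volume / 1000                 # convert mL to L
nacl_mass = target_molarity * volume_l * nacl_molar_mass

print(f"NaCl for {target_molarity * 1000:.0f} mM in {total_volume} mL: {nacl_mass:.3f} g")
```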

PRACTICE PROBLEM 2: Cell Culture Passage
Calculate dilution factors for cell culture.

In [None]:
# Given information
current_density = 2000000  # cells/mL
target_density = 500000   # cells/mL
culture_volume = 10       # mL

In [None]:
# TODO: Calculate dilution factor and volumes needed
dilution_factor = current_density / target_density
volume_needed = culture_volume / dilution_factor
media_to_add = culture_volume - volume_needed

In [None]:
print("=== CELL PASSAGE CALCULATION ===")
print(f"Current density: {current_density:,} cells/mL")
print(f"Target density: {target_density:,} cells/mL")
print(f"Dilution factor: 1:{dilution_factor:.1f}")
print(f"Take {volume_needed:.2f} mL of culture")
print(f"Add {media_to_add:.2f} mL of fresh media")
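You can cross-check the dilution with the familiar C1V1 = C2V2 relationship by solving for V1, the volume of culture to take. It should give the same answer as the dilution-factor approach; this check is an addition, not a different method from the one above.

```python
current_density = 2000000   # C1, cells/mL
target_density = 500000     # C2, cells/mL
culture_volume = 10         # V2, mL (final volume)

# C1 * V1 = C2 * V2  ->  V1 = C2 * V2 / C1
v1 = target_density * culture_volume / current_density
media = culture_volume - v1

print(f"Take {v1:.2f} mL of culture and add {media:.2f} mL of fresh media")
```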

🏠 HOMEWORK:
1. Create a notebook with information about your research project
2. Practice calculating molarity and dilutions
3. Try working with different DNA sequences
4. Experiment with different print formatting options

# ==============================================================
# SECTION 11: SUMMARY AND NEXT STEPS
# ================================================================

🎉 CONGRATULATIONS! You've completed Chapter 1!

You've learned:
✅ How to use Google Colab
✅ Basic Python syntax
✅ Variables and data types
✅ Basic operations
✅ Print statements and formatting
✅ Common mistakes to avoid

🔬 BIOLOGICAL APPLICATIONS COVERED:
- DNA sequence analysis
- Experimental data calculations
- Lab calculations (concentrations, dilutions)
- Data formatting for research