# 🔢 Python Data Types for Biology

## Understanding Different Types of Data

In biology, we work with many types of data:
- **Whole numbers**: cell counts, sample sizes, PCR cycle numbers
- **Text**: gene names, species, sequences
- **Decimals**: pH values, weights, concentrations
- **True/False**: is it mutated? is the control positive?

Python has specific **data types** for each kind of information. Let's explore them!

## 🎯 Why Data Types Matter

Using the correct data type is like using the right container in the lab:
- You wouldn't store liquid nitrogen in a plastic cup! ❄️
- You wouldn't use a volumetric flask to weigh solids! ⚖️

Similarly, Python needs the right data type for each kind of information to work properly.

## 1️⃣ Integers (int) - Whole Numbers

Integers are whole numbers without decimal points. Perfect for counting!

In [None]:
# Integers - whole numbers only
cell_count = 50000
sample_size = 24
pcr_cycles = 35
chromosome_number = 46  # Human chromosomes

# Check the type
print(f"cell_count = {cell_count}")
print(f"Type: {type(cell_count)}")
print()

# Integers can be negative
temperature_celsius = -80  # Freezer temperature
print(f"Freezer temp: {temperature_celsius}°C")

### Integer Operations

In [None]:
# Basic math with integers
initial_cells = 1000
doubling_cycles = 3

# Calculate final cell count after doublings
final_cells = initial_cells * (2 ** doubling_cycles)
print(f"After {doubling_cycles} doublings: {final_cells} cells")

# The / operator always returns a float, even when the result is a whole number!
cells_per_well = final_cells / 8
print(f"Cells per well: {cells_per_well}")
print(f"Type after division: {type(cells_per_well)}")  # Float!
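Two more things are worth knowing about integers: underscores make long literals easier to read, and Python integers stay exact however large they grow. Here is a small sketch; the 60 doublings are a made-up number chosen only to produce a very large value.

```python
# Underscores inside a number are ignored by Python; they only help the reader
cell_count = 50_000
print(f"cell_count = {cell_count}")

# Integers never overflow or lose precision, even when they get huge
initial_cells = 1_000
doublings = 60  # hypothetical value, for illustration only
final_cells = initial_cells * 2 ** doublings
print(f"After {doublings} doublings: {final_cells} cells")
print(f"Type: {type(final_cells)}")
```

The result is still an `int`, so every digit is exact.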

## 2️⃣ Floats (float) - Decimal Numbers

Floats store numbers with decimal points. Essential for measurements!

In [None]:
# Floats - numbers with decimals
ph_value = 7.4
protein_concentration = 2.5  # mg/mL
od600 = 0.485  # Optical density
molecular_weight = 342.3  # Daltons

print(f"pH = {ph_value}")
print(f"Type: {type(ph_value)}")
print()

# Scientific notation
avogadro = 6.022e23  # 6.022 × 10²³
nano_to_milli = 1e-6  # 0.000001
print(f"Avogadro's number: {avogadro}")
print(f"Nano to milli conversion: {nano_to_milli}")
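When printing floats, a format specifier inside the f-string controls how many digits appear. The sketch below reuses values from this section; `.1f` means one digit after the decimal point and `.3e` means scientific notation with three.

```python
molecular_weight = 342.3  # Daltons
avogadro = 6.022e23
nano_to_milli = 1e-6

# Fixed number of decimal places
print(f"Molecular weight: {molecular_weight:.1f} Da")

# Scientific notation with a chosen number of digits
avogadro_text = f"{avogadro:.3e}"
conversion_text = f"{nano_to_milli:.1e}"
print(f"Avogadro's number: {avogadro_text}")
print(f"Nano to milli: {conversion_text}")

# float() also understands scientific notation written as text
parsed = float("6.022e23")
print(f"Parsed from text: {parsed}")
```

Formatting changes only how the number is displayed; the stored value stays the same.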

### Float Precision

In [None]:
# Floats are stored in binary, so decimals like 0.1 are only approximations
volume1 = 0.1
volume2 = 0.2
total = volume1 + volume2

print(f"0.1 + 0.2 = {total}")
print(f"Is it exactly 0.3? {total == 0.3}")
print(f"Actual value: {total:.20f}")  # Show 20 decimal places

# For lab work, round appropriately
rounded_total = round(total, 2)
print(f"Rounded for pipetting: {rounded_total} mL")
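Because of this tiny rounding error, comparing floats with `==` can give surprising answers. One common approach is `math.isclose()` from the standard library, which checks whether two values agree within a tolerance. The 0.001 mL tolerance below is an assumed value for illustration, not a lab standard.

```python
import math

volume1 = 0.1
volume2 = 0.2
total = volume1 + volume2

# Direct equality fails because of the binary rounding error
exactly_equal = (total == 0.3)
print(f"total == 0.3? {exactly_equal}")

# math.isclose() compares within a small relative tolerance by default
close_enough = math.isclose(total, 0.3)
print(f"math.isclose(total, 0.3)? {close_enough}")

# abs_tol sets an absolute tolerance (assumed here: 0.001 mL)
within_tolerance = math.isclose(total, 0.3, abs_tol=0.001)
print(f"Within 0.001 mL? {within_tolerance}")
```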

## 3️⃣ Strings (str) - Text Data

Strings store text - gene names, sequences, labels, and more!

In [None]:
# Strings - text data
gene_name = "BRCA1"
species = "Homo sapiens"
dna_sequence = "ATCGATCGTAGC"
protein_sequence = "MLKTVFD"

print(f"Gene: {gene_name}")
print(f"Type: {type(gene_name)}")
print()

# Strings can use single or double quotes
enzyme1 = "DNA Polymerase"  # Double quotes
enzyme2 = 'RNA Polymerase'  # Single quotes
print(f"Both are strings: {type(enzyme1)}, {type(enzyme2)}")

### String Operations

In [None]:
# Working with biological sequences
promoter = "TATAAA"
gene = "ATGGCTAGC"
terminator = "AATAAA"

# Concatenation (joining strings)
full_sequence = promoter + gene + terminator
print(f"Full sequence: {full_sequence}")

# Length
print(f"Sequence length: {len(full_sequence)} bp")

# Case conversion (uses `gene` from this cell, so the cell runs on its own)
print(f"Lowercase: {gene.lower()}")
print(f"Back to uppercase: {gene.lower().upper()}")

# Checking content
print(f"Contains 'ATG': {'ATG' in gene}")
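Strings offer several more methods that are handy for sequences. The sketch below shows slicing, counting, replacing, and one possible way to build a reverse complement with `str.maketrans()` and `translate()`. It assumes the sequence contains only uppercase A, T, C and G.

```python
gene = "ATGGCTAGC"

# Slicing: positions 0, 1 and 2 (the end index is not included)
first_codon = gene[0:3]
print(f"First codon: {first_codon}")

# Counting how often a base appears
g_count = gene.count("G")
print(f"Number of G's: {g_count}")

# Transcription: replace every T with U
mrna = gene.replace("T", "U")
print(f"mRNA: {mrna}")

# Reverse complement: swap each base for its partner, then reverse the string
complement_table = str.maketrans("ATCG", "TAGC")
reverse_complement = gene.translate(complement_table)[::-1]
print(f"Reverse complement: {reverse_complement}")
```

Strings are immutable, so each of these operations returns a new string and leaves `gene` unchanged.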

## 4️⃣ Booleans (bool) - True/False Values

Booleans represent yes/no, true/false conditions - perfect for biological states!

In [None]:
# Booleans - True or False
is_mutant = True
has_phenotype = False
control_positive = True
treatment_applied = False

print(f"Is mutant? {is_mutant}")
print(f"Type: {type(is_mutant)}")
print()

# Booleans from comparisons
ph = 7.2
is_neutral = (ph == 7.0)
is_basic = (ph > 7.0)
is_acidic = (ph < 7.0)

print(f"pH {ph} is neutral? {is_neutral}")
print(f"pH {ph} is basic? {is_basic}")
print(f"pH {ph} is acidic? {is_acidic}")
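Booleans become more useful when you combine them with `and`, `or`, and `not`. Here is a small sketch; the temperature reading and the acceptable ranges are hypothetical values picked for illustration, not taken from any protocol.

```python
ph = 7.2
temperature_c = 37.0     # assumed incubator reading
is_contaminated = False  # assumed inspection result

# Chained comparisons check that a value lies inside a range
ph_ok = 7.0 <= ph <= 7.6                        # hypothetical range
temperature_ok = 36.5 <= temperature_c <= 37.5  # hypothetical range

# and: True only when every condition is True
culture_healthy = ph_ok and temperature_ok and not is_contaminated

# or: True when at least one condition is True
needs_attention = (not ph_ok) or (not temperature_ok) or is_contaminated

print(f"pH in range? {ph_ok}")
print(f"Temperature in range? {temperature_ok}")
print(f"Culture healthy? {culture_healthy}")
print(f"Needs attention? {needs_attention}")
```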

## 🔄 Type Conversion

Sometimes we need to convert between types, for example when a measurement arrives as text and we need it as a number!

In [None]:
# Converting between types

# String to number
measurement = "25.5"
actual_value = float(measurement)
print(f"String '{measurement}' → Float {actual_value}")

# Number to string
cell_count = 50000
label = "Sample has " + str(cell_count) + " cells"
print(label)

# Float to integer (loses the decimal part!)
exact_ph = 7.86
truncated_ph = int(exact_ph)  # int() truncates toward zero; it does not round
print(f"pH {exact_ph} → {truncated_ph} (decimal lost!)")

# round() returns the nearest integer instead
properly_rounded = round(exact_ph)
print(f"Properly rounded: {properly_rounded}")
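Conversions can fail: `float()` and `int()` raise a `ValueError` when the text is not a valid number. A minimal sketch of handling that with `try`/`except` follows; the list of readings is invented to stand in for messy instrument output.

```python
# Hypothetical readings, including one that is not a number
readings = ["25.5", "30.1", "N/A", "28"]

values = []
skipped = []
for reading in readings:
    try:
        values.append(float(reading))
    except ValueError:
        # float() could not interpret this text as a number
        skipped.append(reading)

print(f"Converted values: {values}")
print(f"Skipped entries: {skipped}")
```

Keeping a list of skipped entries makes it easy to see which readings need a second look instead of silently losing them.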

## 🔍 Checking Data Types

Use `type()` to check what kind of data you have:

In [None]:
# Mystery variables - what type are they?
mystery1 = 42
mystery2 = 42.0
mystery3 = "42"
mystery4 = True

print(f"mystery1 = {mystery1}, type: {type(mystery1)}")
print(f"mystery2 = {mystery2}, type: {type(mystery2)}")
print(f"mystery3 = {mystery3}, type: {type(mystery3)}")
print(f"mystery4 = {mystery4}, type: {type(mystery4)}")

# Using isinstance() to check types
print(f"\nIs mystery1 an integer? {isinstance(mystery1, int)}")
print(f"Is mystery2 a float? {isinstance(mystery2, float)}")
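One detail can surprise you when checking types: in Python, `bool` is a subclass of `int`, so `isinstance(True, int)` is `True`. The sketch below shows the effect and one way to deal with it. The helper `type_label` and the list of flags are hypothetical names made up for this example.

```python
# True behaves like 1 and False like 0
print(f"Is True an int? {isinstance(True, int)}")

# Handy side effect: sum() counts the True values
mutant_flags = [True, False, True, True]  # hypothetical genotyping results
mutant_count = sum(mutant_flags)
print(f"Mutants: {mutant_count} of {len(mutant_flags)}")

def type_label(value):
    """Return a short type name, testing bool before int."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    return "other"

for item in [42, 42.0, "42", True]:
    print(f"{item!r} is a {type_label(item)}")
```

Testing for `bool` first matters: if the `int` check came first, `True` would be labeled as an integer.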

## 🧬 Real Biology Example: Analyzing a Gene

Let's use different data types to analyze a gene:

In [None]:
# Gene analysis using multiple data types

# String data
gene_name = "TP53"  # Official gene symbol; p53 is the protein it encodes
full_name = "Tumor protein p53"
sequence = "ATGGAGGAGCCGCAGTCAGATCCTAGC"

# Integer data
sequence_length = len(sequence)
exon_count = 11
chromosome = 17

# Float data  
molecular_weight = 43653.0  # Daltons
expression_level = 2.45  # Fold change

# Boolean data
is_tumor_suppressor = True
is_oncogene = False
has_start_codon = sequence.startswith("ATG")

# Analysis
print(f"=== Gene Analysis: {gene_name} ===")
print(f"Full name: {full_name}")
print(f"Chromosome: {chromosome}")
print(f"Sequence length: {sequence_length} bp")
print(f"Number of exons: {exon_count}")
print(f"Molecular weight: {molecular_weight/1000:.1f} kDa")
print(f"Expression level: {expression_level:.2f}-fold")
print("\nFunctional properties:")
print(f"  Tumor suppressor? {is_tumor_suppressor}")
print(f"  Oncogene? {is_oncogene}")
print(f"  Has ATG start? {has_start_codon}")

# GC content calculation
gc_count = sequence.count('G') + sequence.count('C')
gc_content = (gc_count / sequence_length) * 100
print(f"\nGC content: {gc_content:.1f}%")
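If you need GC content for more than one sequence, the calculation can be wrapped in a function. This is one possible sketch, not the only way to do it: the name `gc_content` is made up here, and the function assumes lowercase letters should count and that an empty sequence should give 0.0 rather than an error.

```python
def gc_content(sequence):
    """Return the GC content of a DNA sequence as a percentage."""
    sequence = sequence.upper()  # accept lowercase input too
    if len(sequence) == 0:
        return 0.0  # avoid dividing by zero for an empty sequence
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence) * 100

sequence = "ATGGAGGAGCCGCAGTCAGATCCTAGC"
print(f"GC content: {gc_content(sequence):.1f}%")
print(f"Lowercase input: {gc_content('atgc'):.1f}%")
print(f"Empty sequence: {gc_content(''):.1f}%")
```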

## ⚠️ Common Type Errors

Watch out for these common mistakes:

In [None]:
# Error 1: Can't add strings and numbers
concentration = 2.5
# This will error:
# label = "Concentration: " + concentration  # TypeError!

# Fix: Convert to string
label = "Concentration: " + str(concentration)
print(label)

# Or use f-strings (easier!)
label = f"Concentration: {concentration} mg/mL"
print(label)

In [None]:
# Error 2: String math doesn't work as expected
num1 = "10"
num2 = "20"
result = num1 + num2
print(f"'10' + '20' = {result}")  # Not 30!

# Fix: Convert to numbers first
result = int(num1) + int(num2)
print(f"10 + 20 = {result}")

In [None]:
# Error 3: Integer division surprise
cells = 100
wells = 3

# In Python 2 (now retired), / between two integers discarded the remainder:
# cells / wells would have given 33.

# In Python 3, / always returns a float
cells_per_well = cells / wells
print(f"Cells per well: {cells_per_well}")

# For integer division use //
cells_per_well_int = cells // wells
remaining = cells % wells
print(f"Integer division: {cells_per_well_int} per well, {remaining} remaining")
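When you need both the quotient and the remainder, the built-in `divmod()` returns them together as a pair. A short sketch using the same numbers:

```python
cells = 100
wells = 3

# divmod(a, b) returns (a // b, a % b) in one step
cells_per_well, remaining = divmod(cells, wells)
print(f"{cells_per_well} cells per well, {remaining} left over")

# Sanity check: the parts add back up to the original count
print(f"Adds up? {cells_per_well * wells + remaining == cells}")
```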

## 💡 Choosing the Right Type

### Quick Guide:

| Use Case | Data Type | Example |
|----------|-----------|----------|
| Counting | `int` | `cell_count = 50000` |
| Measurements | `float` | `ph = 7.4` |
| Names/Sequences | `str` | `gene = "BRCA1"` |
| Yes/No States | `bool` | `is_mutant = True` |

## 🎯 Practice Exercises

### Exercise 1: Identify the Types

What data type would you use for each of these?

In [None]:
# TODO: Create variables with appropriate types

# 1. Number of mice in experiment
# mice_count = ???

# 2. Average mouse weight in grams  
# avg_weight = ???

# 3. Mouse strain name
# strain = ???

# 4. Is the mouse transgenic?
# is_transgenic = ???

# 5. DNA sequence
# dna = ???

# 6. Temperature in Celsius
# temp = ???

### Exercise 2: Type Conversion Challenge

In [None]:
# Fix these type errors

# 1. Concatenation error
sample_id = 42
# label = "Sample #" + sample_id  # This errors!
# TODO: Fix it

# 2. Math with strings
concentration1 = "2.5"
concentration2 = "3.5"
# total = concentration1 + concentration2  # Doesn't add!
# TODO: Fix to get 6.0

# 3. Boolean from string
user_input = "True"
# is_positive = user_input  # This is a string, not boolean!
# TODO: Convert to actual boolean

### Exercise 3: Build a Sample Tracker

In [None]:
# Create a sample tracking system using all data types

# TODO: Define these variables with appropriate types
# sample_id = ???           # Should be string like "S001"
# organism = ???            # Species name
# cell_count = ???          # Whole number
# viability_percent = ???   # Decimal 0-100
# is_treated = ???          # Boolean
# treatment_dose_mg = ???   # Decimal

# Create a summary using f-strings
# summary = f"Sample {sample_id}: {organism}, {cell_count} cells, {viability_percent}% viable"
# print(summary)

### Exercise 4: Debug the Lab Calculator

In [None]:
# This dilution calculator has type errors. Fix them!

# User inputs (these come as strings from input())
stock_conc_input = "1000"  # mg/mL
desired_conc_input = "50"  # mg/mL  
final_vol_input = "10.0"   # mL

# Calculate dilution (this has errors!)
# dilution_factor = stock_conc_input / desired_conc_input
# stock_volume = final_vol_input / dilution_factor
# water_volume = final_vol_input - stock_volume

# print(f"Add {stock_volume} mL stock")
# print(f"Add {water_volume} mL water")

# TODO: Fix the type errors to make this work

## 🎉 Summary

You've learned Python's four basic data types:

1. **`int`** - Whole numbers for counting
2. **`float`** - Decimals for measurements
3. **`str`** - Text for names and sequences
4. **`bool`** - True/False for states

### Key Takeaways:
- ✅ Each type has a specific purpose
- ✅ Use `type()` to check data types
- ✅ Convert between types when needed
- ✅ F-strings convert values to text for you, so you don't need `str()` when building labels
- ✅ Type errors are common - now you can fix them!

### 🚀 Next Steps

In future lessons, we'll explore:
- Collections (lists, dictionaries) for organizing data
- Working with files and datasets
- NumPy arrays for scientific computing
- Pandas DataFrames for data analysis

**Keep practicing!** 🧬💻