# 📊 Matplotlib Practice Toolkit

## Visualizing Biological Data

Welcome to your **matplotlib practice toolkit**! This notebook helps you practice the visualization patterns from Lecture 4.

**What you'll practice:**
- 📈 Line plots and customization
- 📊 Histograms and distributions
- 🎯 Scatter plots for relationships
- 📉 Box plots for comparisons
- 🔲 Subplots and layouts

**Skills:** Figure/axes, plot customization, multi-panel figures

**Data:** Real gene expression data from DepMap cancer cell lines

## 📥 Setup and Data Loading

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

print("✓ Libraries loaded!")

In [None]:
# Load gene expression data
url = "https://zenodo.org/records/17377786/files/expression_filtered.csv?download=1"
gene_df = pd.read_csv(url)

print(f"✓ Loaded {gene_df.shape[0]} cell lines")
# The subtraction assumes 9 of the columns hold metadata rather than gene expression
print(f"✓ Loaded {gene_df.shape[1] - 9} genes")
print(f"\nCancer types: {gene_df['oncotree_lineage'].unique()}")
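
Several challenges below name specific genes, and a filtered file may not contain all of them. A minimal sketch of a column check you could run before plotting; `toy_df`, its values, and `NOT_A_GENE` are made up for illustration, and only the `oncotree_lineage` column name comes from this notebook:

```python
import pandas as pd

# Tiny synthetic table standing in for gene_df (values are invented)
toy_df = pd.DataFrame({
    'oncotree_lineage': ['Breast', 'Lung', 'Myeloid'],
    'BRCA1': [4.1, 3.8, 5.0],
    'TP53': [5.2, 4.9, 6.1],
})

# Split the genes you want to plot into those present and those absent
wanted = ['BRCA1', 'TP53', 'NOT_A_GENE']
available = [g for g in wanted if g in toy_df.columns]
missing = [g for g in wanted if g not in toy_df.columns]

print(f"Available: {available}")
print(f"Missing: {missing}")
```

With the real data, replace `toy_df` with `gene_df` and `wanted` with the genes from the challenge you are working on.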

---
## 📈 Part 1: Line Plots and Customization

**Concept:** Line plots show trends over a sequence. Here the sequence is cell lines ranked by expression, so each curve shows how a gene's expression is spread across the panel.

**Pattern:**
```python
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel('X label')
ax.set_ylabel('Y label')
ax.set_title('Title')
```

### Example 1: Basic Line Plot

Plot the expression of BRCA1 across cell lines (sorted by expression level).

In [None]:
# Sort by BRCA1 expression
sorted_brca1 = gene_df['BRCA1'].sort_values().values

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(sorted_brca1, color='steelblue', linewidth=2)
ax.set_xlabel('Cell Line (sorted)')
ax.set_ylabel('BRCA1 Expression (log2 TPM+1)')
ax.set_title('BRCA1 Expression Across Cell Lines')
ax.grid(alpha=0.3)
plt.show()

### Example 2: Multiple Lines with Legend

Compare BRCA1 and BRCA2 expression patterns.

In [None]:
sorted_brca1 = gene_df['BRCA1'].sort_values().values
sorted_brca2 = gene_df['BRCA2'].sort_values().values

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(sorted_brca1, color='steelblue', linewidth=2, label='BRCA1')
ax.plot(sorted_brca2, color='coral', linewidth=2, label='BRCA2')
ax.set_xlabel('Cell Line (sorted)')
ax.set_ylabel('Expression (log2 TPM+1)')
ax.set_title('BRCA1 vs BRCA2 Expression Patterns')
ax.legend()
ax.grid(alpha=0.3)
plt.show()

### 🎯 Challenge 1.1: Plot TP53 with Markers

Create a line plot of TP53 expression with:
- Green color
- Circular markers (`marker='o'`)
- Dashed line style (`linestyle='--'`)
- Appropriate labels

In [None]:
# Your code here:


### 🎯 Challenge 1.2: Compare Three Oncogenes

Plot MYC, KRAS, and BRAF on the same axes with different colors and styles. Add a legend.

In [None]:
# Your code here:


---
## 📊 Part 2: Histograms and Distributions

**Concept:** Histograms show the distribution of a single variable.

**Pattern:**
```python
fig, ax = plt.subplots()
ax.hist(data, bins=20, color='blue', alpha=0.7)
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')
```

### Example 3: Basic Histogram

Show the distribution of BRCA1 expression.

In [None]:
fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(gene_df['BRCA1'], bins=20, color='skyblue', edgecolor='black', alpha=0.7)
ax.set_xlabel('BRCA1 Expression (log2 TPM+1)')
ax.set_ylabel('Number of Cell Lines')
ax.set_title('Distribution of BRCA1 Expression')
ax.grid(axis='y', alpha=0.3)
plt.show()

### Example 4: Overlaid Histograms

Compare the distribution of BRCA1 expression in Breast vs Myeloid cell lines.

In [None]:
breast_brca1 = gene_df[gene_df['oncotree_lineage'] == 'Breast']['BRCA1']
myeloid_brca1 = gene_df[gene_df['oncotree_lineage'] == 'Myeloid']['BRCA1']

fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(breast_brca1, bins=15, alpha=0.6, label='Breast', color='pink', edgecolor='black')
ax.hist(myeloid_brca1, bins=15, alpha=0.6, label='Myeloid', color='purple', edgecolor='black')
ax.set_xlabel('BRCA1 Expression')
ax.set_ylabel('Frequency')
ax.set_title('BRCA1: Breast vs Myeloid Cancer')
ax.legend()
ax.grid(axis='y', alpha=0.3)
plt.show()
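
One caveat with overlaid histograms: passing an integer such as `bins=15` to each `ax.hist()` call lets Matplotlib compute the bin edges separately for each group, so the bars of the two groups may not line up. A possible refinement, sketched here on synthetic data (the arrays `group_a` and `group_b` are invented stand-ins, not DepMap values), is to compute one set of edges from the pooled values and reuse it:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins for two groups of expression values
rng = np.random.default_rng(0)
group_a = rng.normal(loc=4.0, scale=0.8, size=60)
group_b = rng.normal(loc=5.0, scale=1.0, size=40)

# One set of 15 bins (16 edges) computed from the pooled values
pooled = np.concatenate([group_a, group_b])
shared_edges = np.histogram_bin_edges(pooled, bins=15)

fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(group_a, bins=shared_edges, alpha=0.6, label='Group A', edgecolor='black')
ax.hist(group_b, bins=shared_edges, alpha=0.6, label='Group B', edgecolor='black')
ax.set_xlabel('Expression (synthetic)')
ax.set_ylabel('Frequency')
ax.legend()
plt.show()
```

The same idea should carry over to `breast_brca1` and `myeloid_brca1` by pooling those two Series instead.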

### 🎯 Challenge 2.1: MYC Distribution

Create a histogram of MYC expression with 30 bins. Add a vertical line showing the mean value using `ax.axvline()`.

**Hint:** `mean_val = gene_df['MYC'].mean()` then `ax.axvline(mean_val, color='red', linestyle='--')`

In [None]:
# Your code here:


### 🎯 Challenge 2.2: Density Plots

Create overlaid density plots (KDE) comparing TP53 expression in three cancer types of your choice.

**Hint:** Use `data.plot(kind='density', ax=ax, label='...')` for each cancer type, then call `ax.legend()`. Density plots in pandas require SciPy to be installed.

In [None]:
# Your code here:


---
## 🎯 Part 3: Scatter Plots for Relationships

**Concept:** Scatter plots show relationships between two variables.

**Pattern:**
```python
fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.6, s=50)
ax.set_xlabel('X variable')
ax.set_ylabel('Y variable')
```

### Example 5: Basic Scatter Plot

Explore the relationship between BRCA1 and BRCA2.

In [None]:
fig, ax = plt.subplots(figsize=(7, 7))
ax.scatter(gene_df['BRCA1'], gene_df['BRCA2'], 
           alpha=0.6, s=60, color='steelblue', edgecolor='black', linewidth=0.5)
ax.set_xlabel('BRCA1 Expression')
ax.set_ylabel('BRCA2 Expression')
ax.set_title('BRCA1 vs BRCA2 Correlation')
ax.grid(alpha=0.3)
plt.show()

### Example 6: Color by Category

Color points by cancer type to reveal subgroups.

In [None]:
fig, ax = plt.subplots(figsize=(9, 7))

# Plot each cancer type with different color
for lineage in ['Breast', 'Myeloid', 'Lung']:
    data = gene_df[gene_df['oncotree_lineage'] == lineage]
    ax.scatter(data['BRCA1'], data['BRCA2'], 
               alpha=0.6, s=60, label=lineage, edgecolor='black', linewidth=0.5)

ax.set_xlabel('BRCA1 Expression')
ax.set_ylabel('BRCA2 Expression')
ax.set_title('BRCA1 vs BRCA2 by Cancer Type')
ax.legend()
ax.grid(alpha=0.3)
plt.show()

### 🎯 Challenge 3.1: Find a Strong Correlation

Create scatter plots for these gene pairs and identify which shows the strongest correlation:
1. TP53 vs MDM2
2. MYC vs MAX
3. AKT1 vs AKT2

Use subplots to show all three side-by-side.
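
Judging correlation strength by eye can be hard, so it may help to print Pearson's r in each panel title. The sketch below shows the general shape on synthetic data; `toy_df` and the `GENE_A`/`GENE_B`/`GENE_C` columns are hypothetical names, not genes from the dataset:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic table: GENE_B is built from GENE_A, GENE_C is independent noise
rng = np.random.default_rng(1)
base = rng.normal(size=100)
toy_df = pd.DataFrame({
    'GENE_A': base,
    'GENE_B': base * 0.9 + rng.normal(scale=0.3, size=100),
    'GENE_C': rng.normal(size=100),
})

pairs = [('GENE_A', 'GENE_B'), ('GENE_A', 'GENE_C')]
fig, axes = plt.subplots(1, len(pairs), figsize=(10, 4))

pearson_r = {}
for ax, (gene_x, gene_y) in zip(axes, pairs):
    # Series.corr computes the Pearson correlation by default
    r = toy_df[gene_x].corr(toy_df[gene_y])
    pearson_r[(gene_x, gene_y)] = r
    ax.scatter(toy_df[gene_x], toy_df[gene_y], alpha=0.6, s=30)
    ax.set_xlabel(gene_x)
    ax.set_ylabel(gene_y)
    ax.set_title(f'{gene_x} vs {gene_y} (r = {r:.2f})')

plt.tight_layout()
plt.show()
```

With one row of panels, `plt.subplots` returns a 1-D array of axes, which is why `zip(axes, pairs)` works here without 2-D indexing.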

In [None]:
# Your code here:


### 🎯 Challenge 3.2: Advanced Scatter

Create a scatter plot of TSC1 vs TSC2 where:
- Points are colored by cancer type
- Point size is proportional to MYC expression
- Add a legend

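**Hint:** `s=gene_df['MYC']*10` for size scaling. If you plot one cancer type at a time, take the sizes from the same filtered subset so they line up with the points.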
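
When each group is drawn with its own `ax.scatter()` call, the array passed to `s` must have one entry per point in that group. A minimal sketch on synthetic data, where `toy_df` and its column names (`group`, `x_gene`, `y_gene`, `size_gene`) are invented for illustration:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic table with a categorical column and three numeric columns
rng = np.random.default_rng(2)
toy_df = pd.DataFrame({
    'group': rng.choice(['Type1', 'Type2', 'Type3'], size=90),
    'x_gene': rng.normal(5, 1, size=90),
    'y_gene': rng.normal(5, 1, size=90),
    'size_gene': rng.uniform(1, 8, size=90),
})

fig, ax = plt.subplots(figsize=(8, 6))
for name, subset in toy_df.groupby('group'):
    # Sizes come from the same subset as the x/y values, so lengths match
    ax.scatter(subset['x_gene'], subset['y_gene'],
               s=subset['size_gene'] * 10,
               alpha=0.6, label=name, edgecolor='black', linewidth=0.5)

ax.set_xlabel('x_gene (synthetic)')
ax.set_ylabel('y_gene (synthetic)')
ax.legend(title='Group')
plt.show()
```

Each call without an explicit `color` picks the next color in Matplotlib's default cycle, which is how the groups end up with different colors.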

In [None]:
# Your code here:


---
## 📉 Part 4: Box Plots for Group Comparisons

**Concept:** Box plots summarize the distribution of each group (median, quartiles, and outliers), which makes groups easy to compare side by side.

**Pattern:**
```python
fig, ax = plt.subplots()
ax.boxplot([group1, group2, group3])
ax.set_xticklabels(['Group1', 'Group2', 'Group3'])
```

### Example 7: Compare BRCA1 Across Cancer Types

In [None]:
# Prepare data for each cancer type
cancer_types = ['Breast', 'Lung', 'Myeloid']
brca1_by_cancer = [gene_df[gene_df['oncotree_lineage'] == ct]['BRCA1'] for ct in cancer_types]

fig, ax = plt.subplots(figsize=(8, 6))
ax.boxplot(brca1_by_cancer)
ax.set_xticklabels(cancer_types)  # the labels= argument is deprecated since Matplotlib 3.9
ax.set_ylabel('BRCA1 Expression')
ax.set_title('BRCA1 Expression by Cancer Type')
ax.grid(axis='y', alpha=0.3)
plt.show()

### 🎯 Challenge 4.1: Compare Multiple Genes

Create a box plot comparing the expression of BRCA1, TP53, and MYC in Breast cancer cell lines.

**Hint:** Filter for Breast cancer first, then create list of values for each gene

In [None]:
# Your code here:


### 🎯 Challenge 4.2: Side-by-Side Comparison

Create two subplots:
1. Box plot of TP53 across all cancer types
2. Box plot of MYC across the same cancer types

Which gene shows more variation across cancer types?

In [None]:
# Your code here:


---
## 🔲 Part 5: Subplots and Multi-Panel Figures

**Concept:** Combine multiple plots into a single figure.

**Pattern:**
```python
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes[0, 0].plot(x, y)  # Top-left
axes[0, 1].hist(data)  # Top-right
# etc...
plt.tight_layout()
```

### Example 8: 2×2 Grid of Different Plot Types

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Top-left: Line plot
axes[0, 0].plot(gene_df['BRCA1'].sort_values().values, color='steelblue')
axes[0, 0].set_title('Line: BRCA1 Sorted')
axes[0, 0].set_ylabel('Expression')

# Top-right: Histogram
axes[0, 1].hist(gene_df['BRCA1'], bins=20, color='coral', edgecolor='black')
axes[0, 1].set_title('Histogram: BRCA1 Distribution')
axes[0, 1].set_xlabel('Expression')

# Bottom-left: Scatter
axes[1, 0].scatter(gene_df['BRCA1'], gene_df['BRCA2'], alpha=0.6, s=30)
axes[1, 0].set_title('Scatter: BRCA1 vs BRCA2')
axes[1, 0].set_xlabel('BRCA1')
axes[1, 0].set_ylabel('BRCA2')

# Bottom-right: Box plot
cancer_types = ['Breast', 'Lung', 'Myeloid']
data = [gene_df[gene_df['oncotree_lineage'] == ct]['BRCA1'] for ct in cancer_types]
axes[1, 1].boxplot(data)
axes[1, 1].set_xticklabels(cancer_types)  # the labels= argument is deprecated since Matplotlib 3.9
axes[1, 1].set_title('Box: BRCA1 by Cancer Type')
axes[1, 1].set_ylabel('Expression')

plt.tight_layout()
plt.show()

### 🎯 Challenge 5.1: Gene Comparison Dashboard

Create a 2×3 grid showing:
- Row 1: Histogram, scatter plot, box plot for TP53
- Row 2: Histogram, scatter plot, box plot for MYC

For the scatter plots, plot each gene against BRCA1. For the box plots, compare expression across cancer types.

In [None]:
# Your code here:


### 🎯 Challenge 5.2: Publication-Quality Figure

Create a figure with:
- Top panel: Large scatter plot of your choice, colored by cancer type
- Bottom left: Histogram for gene 1
- Bottom right: Histogram for gene 2

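**Hint:** `plt.subplots(2, 2)` creates four equal panels, so for a top panel that spans both columns use `gs = fig.add_gridspec(2, 2)` with `fig.add_subplot(gs[0, :])`, or use `plt.subplot_mosaic()` for a named layout.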
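
A minimal sketch of a layout where the top panel spans both columns, using `fig.add_gridspec`. The data are synthetic, and the `height_ratios` values and panel titles are arbitrary choices for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic x/y values standing in for two genes
rng = np.random.default_rng(3)
x = rng.normal(size=80)
y = x + rng.normal(scale=0.5, size=80)

fig = plt.figure(figsize=(10, 8))
# 2 rows x 2 columns, with the top row twice as tall as the bottom row
gs = fig.add_gridspec(2, 2, height_ratios=[2, 1])

ax_top = fig.add_subplot(gs[0, :])    # row 0, all columns
ax_left = fig.add_subplot(gs[1, 0])   # row 1, left column
ax_right = fig.add_subplot(gs[1, 1])  # row 1, right column

ax_top.scatter(x, y, alpha=0.6)
ax_top.set_title('Top panel: scatter')
ax_left.hist(x, bins=15, edgecolor='black')
ax_left.set_title('Bottom left: x')
ax_right.hist(y, bins=15, edgecolor='black')
ax_right.set_title('Bottom right: y')

fig.tight_layout()
plt.show()
```

Slicing the gridspec (`gs[0, :]`) works like NumPy indexing, so other spans such as `gs[:, 0]` for a full-height left panel follow the same pattern.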

In [None]:
# Your code here:


---
## 🚀 Advanced Challenges

Test your skills with these more complex tasks!

### 🎯 Advanced Challenge 1: Correlation Matrix Visualization

Create a 4×4 grid showing all pairwise scatter plots for 4 genes of your choice.

**Hint:** Diagonal plots can show histograms instead of scatter plots (gene vs itself)
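
The loop structure for a pairwise grid can be sketched on a smaller 3×3 case. Everything here is synthetic: `toy_df` and the gene names `G1`, `G2`, `G3` are placeholders, and the styling choices are arbitrary:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic data: G1 and G2 share a common signal, G3 is independent
rng = np.random.default_rng(4)
shared = rng.normal(size=100)
toy_df = pd.DataFrame({
    'G1': shared + rng.normal(scale=0.3, size=100),
    'G2': shared + rng.normal(scale=0.6, size=100),
    'G3': rng.normal(size=100),
})
genes = list(toy_df.columns)
n = len(genes)

fig, axes = plt.subplots(n, n, figsize=(9, 9))
for i, gene_y in enumerate(genes):        # i indexes rows
    for j, gene_x in enumerate(genes):    # j indexes columns
        ax = axes[i, j]
        if i == j:
            # Diagonal: distribution of a single gene
            ax.hist(toy_df[gene_x], bins=15, color='lightgray', edgecolor='black')
        else:
            # Off-diagonal: column gene on x, row gene on y
            ax.scatter(toy_df[gene_x], toy_df[gene_y], s=10, alpha=0.5)
        # Label only the outer edge to keep the grid readable
        if i == n - 1:
            ax.set_xlabel(gene_x)
        if j == 0:
            ax.set_ylabel(gene_y)

plt.tight_layout()
plt.show()
```

Note that on the diagonal the y-axis shows counts, not expression, even though the row label names a gene.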

In [None]:
# Your code here:


### 🎯 Advanced Challenge 2: Expression Heatmap

Create a visualization showing expression of 5 genes across all cancer types:
- X-axis: Genes
- Y-axis: Cancer types
- Color intensity: Mean expression

**Hint:** Calculate mean expression per cancer type, then use `ax.imshow()` or `ax.pcolormesh()`
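
The two steps in the hint (aggregate, then draw) might look like this on synthetic data. The table `toy_df`, its `group` column, and the gene names are invented for illustration; with the real data the grouping column would be `oncotree_lineage`:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic table: 3 groups of 20 rows each, 3 numeric columns
rng = np.random.default_rng(5)
genes = ['G1', 'G2', 'G3']
toy_df = pd.DataFrame(rng.normal(5, 1, size=(60, 3)), columns=genes)
toy_df['group'] = np.repeat(['Type1', 'Type2', 'Type3'], 20)

# Step 1: mean expression per group (rows = groups, columns = genes)
mean_expr = toy_df.groupby('group')[genes].mean()

# Step 2: draw the matrix as an image and label the ticks
fig, ax = plt.subplots(figsize=(6, 4))
im = ax.imshow(mean_expr.values, aspect='auto', cmap='viridis')
ax.set_xticks(range(len(genes)))
ax.set_xticklabels(genes)
ax.set_yticks(range(len(mean_expr.index)))
ax.set_yticklabels(mean_expr.index)
ax.set_xlabel('Genes')
ax.set_ylabel('Group')
fig.colorbar(im, ax=ax, label='Mean expression (synthetic)')
plt.show()
```

`imshow` places row 0 at the top by default, so the first group in `mean_expr.index` appears in the top row of the heatmap.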

In [None]:
# Your code here:


### 🎯 Advanced Challenge 3: Interactive Exploration

Create a function that takes two gene names and generates a 4-panel figure:
1. Scatter plot of gene1 vs gene2 colored by cancer type
2. Histogram of gene1
3. Histogram of gene2
4. Box plot comparing both genes across cancer types

Test your function with different gene pairs!

In [None]:
# Your code here:
def explore_gene_pair(gene1, gene2):
    """
    Create comprehensive visualization comparing two genes
    """
    # Your implementation here
    pass

# Test your function
# explore_gene_pair('BRCA1', 'BRCA2')

---
## 🎯 Summary

You've practiced the essential matplotlib patterns:

**1. Figure/Axes Pattern:**
```python
fig, ax = plt.subplots()
ax.plot(...)  # or ax.scatter(...), ax.hist(...), ax.boxplot(...)
ax.set_xlabel(...)
ax.set_ylabel(...)
ax.set_title(...)
```

**2. Customization:**
- `color`, `alpha`, `linewidth`, `marker`
- `label` + `ax.legend()`
- `edgecolor`, `linestyle`

**3. Multiple Plots:**
```python
fig, axes = plt.subplots(2, 2)
axes[0, 0].plot(...)
plt.tight_layout()
```

**4. Plot Types:**
- Line plots: trends over sequence
- Histograms: distribution of values
- Scatter plots: relationships between variables
- Box plots: compare groups

**Key Tip:** Always use `fig, ax = plt.subplots()` for explicit control!

---

## 📚 Next Steps

- Explore `seaborn` for more advanced statistical visualizations
- Learn about `plotly` for interactive plots
- Study color theory for better plot design
- Practice creating publication-quality figures

**Keep practicing!** 📊🔬