# Perturbio Quickstart Tutorial

**Time:** 5-10 minutes  
**Level:** Beginner  

## Learning Objectives

In this tutorial, you'll learn how to:
1. Load Crop-Seq data into Perturbio
2. Extract CRISPR guide barcodes from cells
3. Run differential expression analysis
4. Generate and interpret basic visualizations

## The Magic of Perturbio

Perturbio takes you from raw Crop-Seq data to biological insights in just a few lines of code!

## Step 1: Import Libraries and Load Data

First, let's import Perturbio and create some example data for this tutorial.

In [None]:
# Import required libraries
import perturbio as pt
import scanpy as sc
import numpy as np
import pandas as pd

# For this tutorial, we'll create synthetic data
# In real usage, you would load your own h5ad file:
# adata = sc.read_h5ad("your_cropseq_data.h5ad")

print("✓ Perturbio version:", pt.__version__)

## Step 2: Create Example Data

Let's create a small synthetic Crop-Seq dataset to demonstrate Perturbio's capabilities.

**Note:** In real analysis, you would skip this step and load your actual data!

In [None]:
from scipy.sparse import csr_matrix

# Set random seed for reproducibility
np.random.seed(42)

# Create synthetic data: 200 cells × 100 genes
n_cells = 200
n_genes = 100

# Generate a dense count matrix first: assigning into a CSR matrix afterwards
# is inefficient and emits a SparseEfficiencyWarning
counts = np.random.poisson(2, size=(n_cells, n_genes)).astype(np.float32)

# Create gene names (the first four columns are the guide RNAs)
guide_names = ['BRCA1_guide1', 'MYC_guide1', 'TP53_guide1', 'non-targeting_1']
regular_genes = [f'Gene_{i}' for i in range(n_genes - len(guide_names))]
var_names = guide_names + regular_genes

# Add guide expression to simulate Crop-Seq data
# Cells 0-49: BRCA1_guide1
counts[0:50, 0] = np.random.poisson(10, size=50)
# Cells 50-99: MYC_guide1
counts[50:100, 1] = np.random.poisson(10, size=50)
# Cells 100-149: TP53_guide1
counts[100:150, 2] = np.random.poisson(10, size=50)
# Cells 150-199: non-targeting control
counts[150:200, 3] = np.random.poisson(10, size=50)

# Create AnnData object with a sparse count matrix
adata = sc.AnnData(
    X=csr_matrix(counts),
    obs=pd.DataFrame(index=[f'Cell_{i}' for i in range(n_cells)]),
    var=pd.DataFrame(index=var_names)
)

print(f"Created example dataset: {adata.n_obs} cells × {adata.n_vars} genes")
print(f"Guide RNAs in dataset: {guide_names}")

## Step 3: Create Guide Library

A guide library tells Perturbio which genes each guide RNA targets.

In [None]:
# Create guide library
guide_library = pd.DataFrame({
    'guide_id': ['BRCA1_guide1', 'MYC_guide1', 'TP53_guide1', 'non-targeting_1'],
    'target_gene': ['BRCA1', 'MYC', 'TP53', 'control'],
    'guide_sequence': ['GCACTCAGGAAACAGCTATG', 'GTACTTGGTGAGGCCAGCGC', 
                       'CCATTGTTCAATATCGTCCG', 'GTAGCGAACGTGTCCGGCGT']
})

# Save it temporarily
guide_library.to_csv('temp_guides.csv', index=False)

print("Guide library:")
print(guide_library)
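
To make the role of the guide library concrete, here is a minimal sketch of how guide counts could be turned into per-cell perturbation labels. It is not Perturbio's actual implementation: the `assign_perturbations` helper, the `min_count` threshold, and the "most abundant guide wins" rule are all assumptions made for illustration, using pandas only.

```python
import pandas as pd

# Hypothetical guide-count table: rows are cells, columns are guide IDs.
guide_counts = pd.DataFrame(
    {
        "BRCA1_guide1": [12, 0, 1, 0],
        "MYC_guide1": [0, 9, 0, 1],
        "non-targeting_1": [1, 0, 11, 0],
    },
    index=["Cell_0", "Cell_1", "Cell_2", "Cell_3"],
)

# Guide library mapping each guide to its target gene.
library = pd.DataFrame(
    {
        "guide_id": ["BRCA1_guide1", "MYC_guide1", "non-targeting_1"],
        "target_gene": ["BRCA1", "MYC", "control"],
    }
)


def assign_perturbations(counts, library, min_count=5):
    """Label each cell with the target of its most abundant guide.

    Illustrative rule only (an assumption, not Perturbio's method).
    """
    guide_to_target = library.set_index("guide_id")["target_gene"]
    top_guide = counts.idxmax(axis=1)   # guide with the highest count per cell
    top_count = counts.max(axis=1)      # that guide's count
    assigned = top_guide.map(guide_to_target)
    # Cells whose best guide is below the threshold stay unassigned.
    return assigned.where(top_count >= min_count, other="unassigned")


perturbation = assign_perturbations(guide_counts, library)
print(perturbation)
```

In this toy table, `Cell_3` has at most one read for any guide, so it falls below the hypothetical threshold and is left unassigned rather than being forced into a group.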

## Step 4: The Magic - One-Line Analysis! ✨

This is the **magical moment**: after initializing the analyzer, a single `run` call performs the complete Crop-Seq analysis!

In [None]:
# Initialize analyzer
analyzer = pt.CropSeqAnalyzer(adata)

# Run complete analysis (extract guides → differential expression → done!)
results = analyzer.run(guide_file='temp_guides.csv', min_cells=10)
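
This tutorial does not describe which statistics `run` uses internally, so the following is only a sketch of what a differential expression step between perturbed and control cells can look like. The choice of a Welch t-test on log-transformed counts, the pseudocount, and the `benjamini_hochberg` helper are assumptions for illustration, built on NumPy and SciPy rather than on Perturbio.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical counts: 50 control and 50 perturbed cells, 20 genes.
control = rng.poisson(5, size=(50, 20)).astype(float)
perturbed = rng.poisson(5, size=(50, 20)).astype(float)
perturbed[:, 0] = rng.poisson(20, size=50)  # gene 0 is strongly up-regulated


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values (FDR)."""
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    order = np.argsort(pvals)
    scaled = pvals[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


# Per-gene Welch t-test on log1p-transformed counts.
t_stat, pvals = stats.ttest_ind(
    np.log1p(perturbed), np.log1p(control), axis=0, equal_var=False
)
# A pseudocount of 1 avoids division by zero in the fold change.
log2fc = np.log2((perturbed.mean(axis=0) + 1) / (control.mean(axis=0) + 1))
fdr = benjamini_hochberg(pvals)

print("Genes with FDR < 0.05:", np.flatnonzero(fdr < 0.05))
```

Correcting for multiple testing matters here because one test is run per gene; without it, a dataset with thousands of genes would report many false positives at a raw p-value cutoff of 0.05.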

## Step 5: Explore Results

Let's look at what Perturbio found!

In [None]:
# Print summary
print(results.summary())

In [None]:
# View top differentially expressed genes for BRCA1 perturbation
top_brca1 = results.top_hits('BRCA1', n=10)
print("\nTop 10 genes affected by BRCA1 knockout:")
print(top_brca1)

In [None]:
# View all perturbations tested
print("\nPerturbations analyzed:")
print(results.differential_expression.perturbations)

## Step 6: Visualize Results

Perturbio creates publication-quality plots automatically!

In [None]:
# Plot 1: Cell counts per perturbation
fig = analyzer.plot_perturbation_counts()
fig.suptitle('How many cells have each perturbation?', fontsize=14, y=1.02)

In [None]:
# Plot 2: Volcano plot showing genes affected by BRCA1 knockout
fig = analyzer.plot_volcano('BRCA1', fdr_threshold=0.05)

In [None]:
# Plot 3: Volcano plot for MYC perturbation
fig = analyzer.plot_volcano('MYC', fdr_threshold=0.05)
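
To help interpret a volcano plot, here is a small sketch of how its axes and hit labels are typically derived: the x-axis is the log2 fold change and the y-axis is the negative log10 of the adjusted p-value. The example values, the `classify_hits` helper, and the `min_abs_log2fc` cutoff are hypothetical; only the `fdr_threshold=0.05` value mirrors the calls above.

```python
import numpy as np

# Hypothetical per-gene statistics, e.g. from a differential expression table.
log2fc = np.array([2.1, -1.8, 0.2, 1.5, -0.1])
fdr = np.array([0.001, 0.02, 0.60, 0.20, 0.04])


def classify_hits(log2fc, fdr, fdr_threshold=0.05, min_abs_log2fc=1.0):
    """Label genes as 'up', 'down', or 'ns' (not significant)."""
    labels = np.full(log2fc.shape, "ns", dtype=object)
    significant = (fdr < fdr_threshold) & (np.abs(log2fc) >= min_abs_log2fc)
    labels[significant & (log2fc > 0)] = "up"
    labels[significant & (log2fc < 0)] = "down"
    return labels


# Volcano plot coordinates: effect size on x, significance on y.
x = log2fc
y = -np.log10(fdr)
labels = classify_hits(log2fc, fdr)

for gene_idx, (xi, yi, label) in enumerate(zip(x, y, labels)):
    print(f"gene {gene_idx}: log2FC={xi:+.1f}, -log10(FDR)={yi:.2f}, {label}")
```

Genes in the upper left and upper right corners of the plot are the ones that pass both cutoffs. The last gene in this toy example is statistically significant but has a negligible effect size, so it stays labeled `ns`.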

## Step 7: Access the Annotated Data

Perturbio adds perturbation information directly to your AnnData object!

In [None]:
# Check what was added to your data
print("New columns in adata.obs:")
print(list(analyzer.adata.obs.columns))

# View perturbation assignments
print("\nPerturbation assignments:")
print(analyzer.adata.obs['perturbation'].value_counts())
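
The `value_counts()` output is also a quick way to see which groups are large enough to analyze. The sketch below assumes that a cutoff such as the `min_cells=10` passed to `run` earlier excludes perturbations with fewer cells than the threshold; that interpretation, the example labels, and the `groups_passing_filter` helper are assumptions for illustration.

```python
import pandas as pd

# Hypothetical per-cell assignments, standing in for an obs["perturbation"] column.
perturbation = pd.Series(
    ["BRCA1"] * 12 + ["MYC"] * 4 + ["TP53"] * 15 + ["control"] * 20,
    name="perturbation",
)


def groups_passing_filter(labels, min_cells=10):
    """Return the perturbation groups with at least `min_cells` cells."""
    counts = labels.value_counts()
    return sorted(counts[counts >= min_cells].index)


kept = groups_passing_filter(perturbation, min_cells=10)
print("Groups with enough cells:", kept)
```

Here the hypothetical MYC group has only 4 cells, so it would be dropped under this rule.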

## Step 8: Export Results

Save everything for later analysis or sharing!

In [None]:
# Export all results to a directory
output_dir = analyzer.export(output_dir='quickstart_results')

print(f"\n✓ Results saved to: {output_dir}")
print("\nYou can find:")
print("  • Differential expression results (CSV)")
print("  • Annotated data with perturbations (H5AD)")
print("  • Summary statistics (TXT)")
print("  • All figures (PNG)")

## 🎉 Congratulations!

You've completed your first Perturbio analysis! In just a few minutes, you:

1. ✅ Loaded Crop-Seq data
2. ✅ Extracted CRISPR guide barcodes
3. ✅ Identified differentially expressed genes
4. ✅ Generated publication-quality visualizations
5. ✅ Exported results for downstream analysis

## Next Steps

Ready to learn more? Check out:

- **Tutorial 02**: Complete workflow with real data preprocessing
- **Tutorial 03**: Advanced customization and scanpy integration
- **Tutorial 04**: Command-line usage for batch processing

## Need Help?

- 📖 [Documentation](https://perturbio.readthedocs.io)
- 💬 [GitHub Discussions](https://github.com/perturbio/perturbio/discussions)
- 🐛 [Report Issues](https://github.com/perturbio/perturbio/issues)

In [None]:
# Cleanup
import os
if os.path.exists('temp_guides.csv'):
    os.remove('temp_guides.csv')
print("Tutorial complete! ✨")