# Xist Amplicon SNP Analysis Pipeline

This notebook provides a complete, automated workflow for analyzing Nanopore amplicon sequencing data to quantify allele-specific expression (Cast vs. B6) in the *Xist* gene.

## 1. Environment and Setup
The analysis requires the `bioinfo` conda environment.
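
As a quick sanity check, the active environment can be compared against the expected name. This is a minimal sketch: it relies on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets, and the helper name `in_expected_env` is hypothetical. A Jupyter kernel can run in a different environment than the shell that launched it, so treat the result as a hint only.

```python
import os

def in_expected_env(expected="bioinfo", environ=None):
    """Return True if the active conda environment matches `expected`.

    `environ` defaults to os.environ; passing a dict makes the check testable.
    """
    environ = os.environ if environ is None else environ
    # CONDA_DEFAULT_ENV holds the name of the currently activated environment.
    return environ.get("CONDA_DEFAULT_ENV") == expected

if not in_expected_env():
    print("WARNING: the 'bioinfo' conda environment does not appear to be active.")
```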

In [None]:
import os

# Assumes the notebook was launched from the project root
PROJECT_ROOT = os.getcwd()
print(f"Project Root: {PROJECT_ROOT}")

# Define core paths
DATA_DIR = os.path.join(PROJECT_ROOT, "data")
RESULTS_DIR = os.path.join(PROJECT_ROOT, "results")

if not os.path.isdir(DATA_DIR):
    print("WARNING: data/ directory not found. Please ensure FASTQ files are placed there.")

## 2. Reference Initialization
We start by extracting the amplicon sequence from the genome using the validated primers. This step also identifies the known B6/Cast SNPs within the targeted region.
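
To illustrate the idea of primer-based extraction, here is a minimal in-silico PCR sketch using exact string matching. It is not the logic of `scripts/initialize_reference.py`, which is not shown here and may rely on alignment instead. The function names and the toy sequences are assumptions for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def extract_amplicon(template, fwd_primer, rev_primer):
    """Return the template slice spanning both primers, or None if a primer is missing.

    The reverse primer anneals to the opposite strand, so we search the
    template for its reverse complement downstream of the forward primer.
    """
    start = template.find(fwd_primer)
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]

# Toy example with made-up sequences
template = "TTTTACGTACGGGCCCTTGACAAAAA"
print(extract_amplicon(template, "ACGTAC", "TGTCAA"))
```

Exact matching fails as soon as a primer carries a mismatch, which is one reason real pipelines usually map primers with an aligner.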

**Input**: `ValidatedPrimers.fa`  
**Outputs**: `results/ref_seq/target_amplicon.fa`, `results/ref_seq/snps.json`, `results/ref_seq/amplicon_to_genome.sam`

In [None]:
!python scripts/initialize_reference.py --output_dir .

## 3. Data Processing (QC & Alignment)
Next, we perform quality control on the raw Nanopore reads and align them to our newly generated reference sequence.
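
For orientation, a per-read QC filter might look like the sketch below. It assumes Phred+33 quality strings and averages error probabilities rather than raw Phred scores. The thresholds `min_len` and `min_q` are placeholder values, not the settings used by `scripts/fastq_qc.py`.

```python
import math

def mean_phred(quality_string, offset=33):
    """Mean read quality: average the error probabilities, then convert back to Phred."""
    if not quality_string:
        return 0.0
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))

def passes_qc(seq, qual, min_len=200, min_q=10.0):
    """Keep a read only if it is long enough and its mean quality is high enough.

    min_len and min_q are illustrative assumptions.
    """
    return len(seq) >= min_len and mean_phred(qual) >= min_q
```

Averaging probabilities instead of Phred scores keeps a few very poor bases from being masked by many good ones.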

In [None]:
print("Running FASTQ QC...")
!python scripts/fastq_qc.py --data_dir data --output_dir .

print("\nAligning reads...")
!python scripts/align_reads.py --data_dir data --output_dir .

print("\nGenerating alignment stats...")
!python scripts/alignment_stats.py --output_dir .

## 4. Allele Quantification
This step examines each aligned read, identifies the base at each targeted SNP position, and assigns the read to an allele (B6 or Cast) by majority rule across those positions.
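
The majority-rule idea can be sketched as follows. The SNP table, its positions and bases, and the `"Unassigned"` label for ties or uninformative reads are all assumptions for illustration. The actual behavior of `scripts/quantify_alleles.py` may differ.

```python
from collections import Counter

# Hypothetical SNP table: amplicon position -> (B6 base, Cast base).
SNPS = {120: ("A", "G"), 345: ("C", "T"), 610: ("G", "A")}

def assign_allele(observed, snps=SNPS):
    """Assign a read to 'B6' or 'Cast' by majority vote over SNP positions.

    `observed` maps position -> base called in the read. Positions that are
    missing or carry a base matching neither allele cast no vote.
    """
    votes = Counter()
    for pos, (b6_base, cast_base) in snps.items():
        base = observed.get(pos)
        if base == b6_base:
            votes["B6"] += 1
        elif base == cast_base:
            votes["Cast"] += 1
    if votes["B6"] > votes["Cast"]:
        return "B6"
    if votes["Cast"] > votes["B6"]:
        return "Cast"
    # Tie or no informative positions (assumed handling)
    return "Unassigned"
```

For example, a read with B6 bases at two positions and a Cast base at the third is called B6, while a one-to-one split is left unassigned.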

In [None]:
!python scripts/quantify_alleles.py --output_dir .

## 5. Stoichiometry & Reliability
To gauge how reliable the allele calls are, we analyze the stoichiometry of SNP co-occurrence. A high-quality read spanning the whole amplicon should ideally carry all target SNPs together.
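
One simple way to look at co-occurrence is a histogram of how many target SNP positions each read covers with a valid base. The sketch below assumes reads are represented as position-to-base dictionaries. The function names are hypothetical and do not reflect the internals of the stoichiometry scripts.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}

def snp_coverage_histogram(reads, snp_positions):
    """Count reads by the number of target SNP positions with a valid base call."""
    histogram = Counter()
    for observed in reads:
        n_seen = sum(1 for pos in snp_positions if observed.get(pos) in VALID_BASES)
        histogram[n_seen] += 1
    return histogram

def fraction_complete(histogram, n_snps):
    """Fraction of reads in which all n_snps target positions were observed."""
    total = sum(histogram.values())
    return histogram[n_snps] / total if total else 0.0
```

A low complete fraction would suggest truncated reads or deletions at SNP sites, which is the kind of issue a diagnostic step would then investigate.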

In [None]:
!python scripts/analyze_stoichiometry.py --output_dir .
!python scripts/diagnose_stoichiometry.py --output_dir .

## 6. Report Generation
Finally, we consolidate all individual results into a reader-friendly Markdown report.

In [None]:
!python scripts/generate_reports.py

print("\nPipeline complete! Report generated at: results/reports/Automated_Summary_Report.md")