# 🧬 NBDScanner - Non-B DNA Motif Detection System

**Comprehensive Standalone Notebook for Non-B DNA Structure Analysis**

---

## 📋 Table of Contents
1. [Introduction](#introduction)
2. [Installation & Setup](#installation)
3. [Quick Start](#quickstart)
4. [Motif Detection](#detection)
5. [Visualization](#visualization)
6. [Export Results](#export)
7. [Advanced Usage](#advanced)

---

## 📖 Introduction <a id="introduction"></a>

NBDScanner detects Non-B DNA motifs in **11 major classes** and **22+ subclasses**:

| Class | Name | Subclasses | Key Features |
|-------|------|------------|------------|
| **1** | Curved DNA | Global Curvature, Local Curvature | A-tract mediated curvature |
| **2** | Slipped DNA | Direct Repeat, STR | Tandem repeats, slipped structures |
| **3** | Cruciform | Palindromic Inverted Repeat | Four-way junctions |
| **4** | R-Loop | R-loop formation sites | RNA-DNA hybrids |
| **5** | Triplex | Mirror Repeat, Sticky DNA | Three-stranded structures |
| **6** | G-Quadruplex | 7 subclasses | Four-stranded G-rich structures |
| **7** | i-Motif | Canonical, Relaxed, AC-Motif | C-rich structures |
| **8** | Z-DNA | Classic Z-DNA, eGZ | Left-handed double helix |
| **9** | A-philic | A-philic DNA | A-rich protein binding sites |
| **10** | Hybrid | Multi-class Overlap | Overlapping motifs |
| **11** | Cluster | Motif Hotspot | High-density regions |
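
To make the "Key Features" column concrete, the sketch below searches for the textbook canonical G-quadruplex pattern (four runs of three or more guanines separated by loops of 1 to 7 bases) with a plain regular expression. It only illustrates what one motif class looks like at the sequence level. It is not NBDScanner's detection or scoring algorithm, and `find_canonical_g4` is a hypothetical helper name introduced here.

```python
import re

# Textbook canonical G4 pattern: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+
# (illustrative only; NBDScanner's own detectors and scoring may differ).
CANONICAL_G4 = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_canonical_g4(sequence):
    """Return (start, end, text) tuples with 0-based, end-exclusive positions."""
    sequence = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in CANONICAL_G4.finditer(sequence)]


# A telomere-like repeat flanked by four unrelated bases on each side.
hits = find_canonical_g4("ACGTGGGTTAGGGTTAGGGTTAGGGACGT")
for start, end, text in hits:
    print(start, end, text)
```

A regular expression like this reports only non-overlapping matches on the given strand; C-rich matches on the complementary strand would need a second pattern or a reverse-complement pass.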

---

## ⚙️ Installation & Setup <a id="installation"></a>

### Prerequisites
- Python 3.8+
- Jupyter Notebook

### Install Dependencies

In [None]:
# Install required packages into the active kernel's environment
%pip install -q pandas numpy matplotlib seaborn biopython

print("✅ Dependency installation step finished.")

### Import Modules

In [None]:
# Import NBDScanner modules
import sys
import os

# Add current directory to path if not already there
if '.' not in sys.path:
    sys.path.insert(0, '.')

# Import core modules
from scanner import analyze_sequence, get_motif_classification_info, export_results_to_dataframe
from utilities import parse_fasta, get_basic_stats, export_to_csv, export_to_bed, export_to_json
from visualizations import (
    plot_motif_distribution, plot_coverage_map, plot_length_distribution, 
    plot_nested_pie_chart, MOTIF_CLASS_COLORS
)

# Standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from collections import Counter
from IPython.display import display, Markdown, HTML

# Configure matplotlib for better displays
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['figure.dpi'] = 100

print("✅ NBDScanner modules loaded successfully!")
print("📊 Ready for Non-B DNA motif analysis")

---

## 🚀 Quick Start <a id="quickstart"></a>

### Example 1: Analyze a Simple Sequence

In [None]:
# Example DNA sequence with multiple Non-B DNA motifs
example_sequence = """
GGGTTAGGGTTAGGGTTAGGGAAAAAAAATTTTTTCACACACACACACACA
CGCGCGCGCGCGCGCGCCCCTAACCCTAACCCTAACCCAAAAAATTTTTT
ATATATATATATATATATGAAGAAGAAGAAGAAGAAGAAGAAGAAGAAGAA
""".replace('\n', '')

print(f"🧬 Example Sequence: {len(example_sequence)} bp")
print(f"Sequence: {example_sequence[:80]}...")
print("\n🔍 Running NBDScanner analysis...\n")

# Run analysis
motifs = analyze_sequence(example_sequence, "example_seq")

# Display results
print(f"✅ Analysis complete! Found {len(motifs)} motifs")
print("\n📊 Motif Classes Detected:")
class_counts = Counter(m.get('Class', 'Unknown') for m in motifs)
for motif_class, count in class_counts.most_common():
    print(f"  • {motif_class}: {count} motifs")

### View Detected Motifs

In [None]:
# Convert to DataFrame for better visualization
if motifs:
    df = export_results_to_dataframe(motifs)
    
    # Display key columns
    display_cols = ['Class', 'Subclass', 'Start', 'End', 'Length', 'Score']
    available_cols = [col for col in display_cols if col in df.columns]
    
    print("\n📋 Detailed Motif Table:\n")
    display(df[available_cols].head(20))
else:
    print("No motifs detected in this sequence.")

---

## 🔬 Motif Detection <a id="detection"></a>

### Load Your Own Sequence

In [None]:
# Option 1: Paste your sequence directly
your_sequence = """
>Your_Sequence_Name
ATCGATCGATCGAAAATTTTATTTAAATTTAAATTTGGGTTAGGGTTAGGGTTAGGG
CCCCCTCCCCCTCCCCCTCCCCATCGATCGCGCGCGCGATCGCACACACACAGCTGC
"""

# Parse FASTA if header is present
if your_sequence.strip().startswith('>'):
    sequences = parse_fasta(your_sequence)
    seq_name = list(sequences.keys())[0]
    seq = list(sequences.values())[0]
else:
    seq = your_sequence.replace('\n', '').replace(' ', '').upper()
    seq_name = "custom_sequence"

print(f"📏 Loaded sequence: {seq_name}")
print(f"   Length: {len(seq)} bp")

# Get basic statistics
stats = get_basic_stats(seq)
print("\n📊 Sequence Statistics:")
for key, value in stats.items():
    print(f"   {key}: {value}")

In [None]:
# Option 2: Load from FASTA file
# Uncomment and modify the path below:

# fasta_file_path = "path/to/your/sequence.fasta"
# with open(fasta_file_path, 'r') as f:
#     fasta_content = f.read()
# 
# sequences = parse_fasta(fasta_content)
# seq_name = list(sequences.keys())[0]
# seq = list(sequences.values())[0]
# 
# print(f"✅ Loaded from file: {fasta_file_path}")
# print(f"   Sequence: {seq_name}")
# print(f"   Length: {len(seq)} bp")
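
The loading cells above call `.keys()` and `.values()` on the result of `parse_fasta`, so it evidently returns a mapping from record name to sequence. For readers who want to see that shape without the `utilities` module, here is a minimal stand-in. `parse_fasta_text` is a hypothetical name, and the real `parse_fasta` may treat headers, case, or duplicate names differently.

```python
def parse_fasta_text(text):
    """Parse FASTA-formatted text into {record_name: sequence}.

    Assumptions of this sketch: the record name is the first
    whitespace-delimited token of the header, sequences are upper-cased,
    and lines that appear before any header are ignored.
    """
    records = {}
    name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else f"sequence_{len(records) + 1}"
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    # Join the per-record line fragments into one string per record.
    return {key: "".join(parts) for key, parts in records.items()}


demo = parse_fasta_text(">seq1 demo record\nacgt\nGGCC\n>seq2\nTTAA\n")
print(demo)
```

With a mapping like this, `next(iter(demo.items()))` returns the first record's name and sequence in one step, which avoids building two lists just to read index 0.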

### Run Complete Analysis

In [None]:
import time

print("🔍 Running comprehensive NBDScanner analysis...")
print(f"   Analyzing {len(seq):,} bp sequence\n")

# Run analysis, timing it with a monotonic high-resolution clock
start_time = time.perf_counter()
motifs = analyze_sequence(seq, seq_name)
elapsed_time = time.perf_counter() - start_time

# Calculate performance metrics
bp_per_second = len(seq) / elapsed_time if elapsed_time > 0 else 0

print("\n✅ Analysis complete!")
print(f"   Time taken: {elapsed_time:.2f} seconds")
print(f"   Speed: {bp_per_second:,.0f} bp/second")
print(f"   Total motifs found: {len(motifs)}")

# Separate regular motifs from hybrid/cluster
regular_motifs = [m for m in motifs if m.get('Class') not in ['Hybrid', 'Non-B_DNA_Clusters']]
hybrid_cluster_motifs = [m for m in motifs if m.get('Class') in ['Hybrid', 'Non-B_DNA_Clusters']]

print("\n📊 Breakdown:")
print(f"   Regular motifs: {len(regular_motifs)}")
print(f"   Hybrid motifs: {len([m for m in hybrid_cluster_motifs if m.get('Class') == 'Hybrid'])}")
print(f"   Cluster motifs: {len([m for m in hybrid_cluster_motifs if m.get('Class') == 'Non-B_DNA_Clusters'])}")

# Display class distribution, most frequent class first
print("\n📈 Motif Class Distribution:")
class_counts = Counter(m.get('Class', 'Unknown') for m in regular_motifs)
for motif_class, count in class_counts.most_common():
    print(f"   • {motif_class}: {count} motifs")

### View Detection Results

In [None]:
# Create detailed results table
if regular_motifs:
    df_motifs = export_results_to_dataframe(regular_motifs)
    
    print("\n📋 Detailed Motif Detection Results:\n")
    print(f"Showing all {len(df_motifs)} detected motifs:")
    print("="*80)
    
    # Display with styling; colour by score only when a Score column exists
    styled = df_motifs.style.set_properties(**{'text-align': 'left'})
    if 'Score' in df_motifs.columns:
        styled = (styled.background_gradient(subset=['Score'], cmap='YlOrRd')
                        .format({'Score': '{:.3f}'}))
    display(styled)
else:
    print("⚠️ No regular motifs detected in this sequence.")

---

## 📊 Visualization <a id="visualization"></a>

### Motif Distribution Charts

In [None]:
if regular_motifs:
    print("📊 Generating visualizations...\n")
    
    # 1. Class Distribution
    fig1 = plot_motif_distribution(regular_motifs, by='Class', 
                                   title=f"Motif Class Distribution - {seq_name}")
    plt.tight_layout()
    plt.show()
    
    # 2. Subclass Distribution
    fig2 = plot_motif_distribution(regular_motifs, by='Subclass', 
                                   title=f"Motif Subclass Distribution - {seq_name}")
    plt.tight_layout()
    plt.show()
else:
    print("⚠️ No motifs to visualize.")

### Coverage Map

In [None]:
if regular_motifs:
    fig3 = plot_coverage_map(regular_motifs, len(seq), 
                            title=f"Sequence Coverage - {seq_name}")
    plt.tight_layout()
    plt.show()
else:
    print("⚠️ No motifs to visualize.")
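
A coverage map is easier to interpret next to a single number: the fraction of the sequence covered by at least one motif. The sketch below computes it by merging overlapping intervals. It assumes each motif is a dict with integer `Start` and `End` keys in 1-based inclusive coordinates, which is an assumption about NBDScanner's output rather than something this notebook states; `covered_fraction` is a hypothetical helper.

```python
def covered_fraction(motifs, sequence_length):
    """Fraction of bases covered by at least one motif (overlaps counted once)."""
    if sequence_length <= 0:
        return 0.0
    # Convert the assumed 1-based inclusive coordinates to 0-based half-open.
    intervals = sorted((m["Start"] - 1, m["End"]) for m in motifs)
    covered = 0
    current_start = current_end = None
    for start, end in intervals:
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / sequence_length


demo_motifs = [
    {"Start": 1, "End": 10},
    {"Start": 5, "End": 20},   # overlaps the first motif
    {"Start": 31, "End": 40},
]
print(f"{covered_fraction(demo_motifs, 100):.1%}")
```

If the scanner turns out to report 0-based coordinates, drop the `- 1` in the conversion line; the merging logic is unchanged.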

### Length Distribution

In [None]:
if regular_motifs:
    fig4 = plot_length_distribution(regular_motifs, by_class=True, 
                                   title=f"Motif Length Distribution - {seq_name}")
    plt.tight_layout()
    plt.show()
else:
    print("⚠️ No motifs to visualize.")

### Nested Pie Chart (Class-Subclass Hierarchy)

In [None]:
if regular_motifs:
    fig5 = plot_nested_pie_chart(regular_motifs, 
                                 title=f"Class-Subclass Distribution - {seq_name}")
    plt.tight_layout()
    plt.show()
else:
    print("⚠️ No motifs to visualize.")

---

## 💾 Export Results <a id="export"></a>

### Export to CSV

In [None]:
if regular_motifs:
    # Export to CSV
    csv_output = export_to_csv(regular_motifs)
    
    # Save to file (newline='' writes the CSV text's own line endings unchanged)
    csv_filename = f"{seq_name}_motifs.csv"
    with open(csv_filename, 'w', encoding='utf-8', newline='') as f:
        f.write(csv_output)
    
    print(f"✅ Exported to CSV: {csv_filename}")
    print(f"   File size: {len(csv_output.encode('utf-8'))} bytes")
    print(f"   Motifs exported: {len(regular_motifs)}")
else:
    print("⚠️ No motifs to export.")

### Export to BED Format

In [None]:
if regular_motifs:
    # Export to BED format (for genome browsers)
    bed_output = export_to_bed(regular_motifs, seq_name)
    
    # Save to file
    bed_filename = f"{seq_name}_motifs.bed"
    with open(bed_filename, 'w') as f:
        f.write(bed_output)
    
    print(f"✅ Exported to BED: {bed_filename}")
    print("   Compatible with: UCSC Genome Browser, IGV")
    print(f"   Motifs exported: {len(regular_motifs)}")
else:
    print("⚠️ No motifs to export.")
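
BED files use 0-based start and end-exclusive coordinates, which is a common source of off-by-one errors in genome browsers. The sketch below shows the conversion for a minimal four-column BED record. It assumes motif dicts carry 1-based inclusive `Start` and `End` values, and `motifs_to_bed_lines` is a hypothetical helper; `export_to_bed` may emit more columns or use a different naming scheme.

```python
def motifs_to_bed_lines(motifs, chrom):
    """Build BED4 lines (chrom, chromStart, chromEnd, name) from motif dicts."""
    lines = []
    for motif in sorted(motifs, key=lambda item: (item["Start"], item["End"])):
        # Assumed 1-based inclusive input -> 0-based start, exclusive end.
        chrom_start = motif["Start"] - 1
        chrom_end = motif["End"]
        name = f'{motif.get("Class", "Unknown")}:{motif.get("Subclass", "Unknown")}'
        # Avoid whitespace in the name field so it stays one column.
        name = name.replace(" ", "_")
        lines.append(f"{chrom}\t{chrom_start}\t{chrom_end}\t{name}")
    return lines


bed_lines = motifs_to_bed_lines(
    [{"Class": "Z-DNA", "Subclass": "Classic Z-DNA", "Start": 10, "End": 20}],
    chrom="chr_demo",
)
print("\n".join(bed_lines))
```

For a genome browser to place the features correctly, the first column must match a chromosome or contig name in the loaded reference, so a free-form sequence name only works with a matching custom reference.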

### Export to JSON

In [None]:
if regular_motifs:
    # Export to JSON
    json_output = export_to_json(regular_motifs, pretty=True)
    
    # Save to file
    json_filename = f"{seq_name}_motifs.json"
    with open(json_filename, 'w') as f:
        f.write(json_output)
    
    print(f"✅ Exported to JSON: {json_filename}")
    print("   Format: Pretty-printed JSON")
    print(f"   Motifs exported: {len(regular_motifs)}")
else:
    print("⚠️ No motifs to export.")

---

## 🔧 Advanced Usage <a id="advanced"></a>

### Batch Analysis of Multiple Sequences

In [None]:
# Example: Analyze multiple sequences
multi_fasta = """
>Sequence_1_G4_Rich
GGGTTAGGGTTAGGGTTAGGGCCCCTAACCCTAACCCTAACCC
>Sequence_2_Z_DNA
CGCGCGCGCGCGCGCGATATATATATATATATAT
>Sequence_3_Mixed
AAAAATTTTAAAAATTTTGAAGAAGAAGAAGAAGAA
"""

# Parse multiple sequences
batch_sequences = parse_fasta(multi_fasta)

print(f"📚 Batch Analysis of {len(batch_sequences)} sequences\n")

# Analyze each sequence. Batch-specific variable names keep `seq`, `sequences`,
# and `motifs` from the earlier cells intact, so those cells can be re-run safely.
batch_results = {}
for name, batch_seq in batch_sequences.items():
    print(f"Analyzing: {name} ({len(batch_seq)} bp)")
    batch_motifs = analyze_sequence(batch_seq, name)
    batch_results[name] = batch_motifs
    print(f"  Found: {len(batch_motifs)} motifs\n")

# Summary
print("\n📊 Batch Analysis Summary:")
for name, batch_motifs in batch_results.items():
    regular = [m for m in batch_motifs if m.get('Class') not in ['Hybrid', 'Non-B_DNA_Clusters']]
    classes = len(set(m.get('Class') for m in regular))
    print(f"  {name}: {len(regular)} motifs, {classes} classes")

### Get System Information

In [None]:
# Get NBDScanner classification info
info = get_motif_classification_info()

print("🔬 NBDScanner System Information\n")
print(f"Version: {info.get('version', 'N/A')}")
print(f"Architecture: {info.get('architecture', 'N/A')}")
print(f"Total Classes: {info.get('total_classes', 'N/A')}")
print(f"Total Subclasses: {info.get('total_subclasses', 'N/A')}")

if 'total_detectors' in info:
    print(f"Total Detectors: {info['total_detectors']}")
if 'total_patterns' in info:
    print(f"Total Patterns: {info['total_patterns']}")

print("\n📋 Motif Classification:")
if 'classification' in info:
    for class_id, class_info in sorted(info['classification'].items()):
        print(f"\n  Class {class_id}: {class_info['name']}")
        print(f"    Subclasses: {', '.join(class_info['subclasses'])}")

### Custom Analysis Parameters

In [None]:
# Example: Filter motifs by score threshold
if regular_motifs:
    score_threshold = 0.5
    high_score_motifs = [m for m in regular_motifs if m.get('Score', 0) >= score_threshold]
    
    print(f"🎯 High-confidence motifs (Score >= {score_threshold}):")
    print(f"   Total: {len(high_score_motifs)} out of {len(regular_motifs)}")
    print(f"   Percentage: {100*len(high_score_motifs)/len(regular_motifs):.1f}%")
    
    if high_score_motifs:
        df_high = export_results_to_dataframe(high_score_motifs)
        # Show only the summary columns that are actually present
        summary_cols = ['Class', 'Subclass', 'Start', 'End', 'Score']
        display(df_high[[col for col in summary_cols if col in df_high.columns]].head(10))
else:
    print("⚠️ No motifs to filter.")
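
A single global threshold can hide how each class behaves, especially if different detectors score on different scales (whether NBDScanner's scores are comparable across classes is not stated here, so treat that as an open question). The sketch below summarizes scores per class before filtering. It assumes motif dicts with `Class` and numeric `Score` keys, as used in the cell above; `summarize_scores_by_class` is a hypothetical helper.

```python
from collections import defaultdict


def summarize_scores_by_class(motifs, score_threshold=0.5):
    """Per-class totals, counts at or above the threshold, and maximum score."""
    scores_by_class = defaultdict(list)
    for motif in motifs:
        motif_class = motif.get("Class", "Unknown")
        scores_by_class[motif_class].append(float(motif.get("Score", 0)))

    summary = {}
    for motif_class, scores in scores_by_class.items():
        summary[motif_class] = {
            "total": len(scores),
            "passing": sum(score >= score_threshold for score in scores),
            "max_score": max(scores),
        }
    return summary


demo_summary = summarize_scores_by_class(
    [
        {"Class": "Z-DNA", "Score": 0.9},
        {"Class": "Z-DNA", "Score": 0.2},
        {"Class": "Cruciform", "Score": 0.5},
    ]
)
for motif_class, row in demo_summary.items():
    print(motif_class, row)
```

A class whose `max_score` sits well below the threshold would be removed entirely by the filter, which is worth knowing before interpreting the filtered table.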

---

## 📚 Documentation & References

### Scientific References

- **G4Hunter**: Bedrat et al., 2016, Nucleic Acids Research
- **QmRLFS**: Jenjaroenpun & Wongsurawat, 2016
- **Z-DNA**: Ho et al., 1986, Nature
- **Curved DNA**: Olson et al., 1998, PNAS
- **A-philic DNA**: Vinogradov, 2003, Bioinformatics

### Contact

**Dr. Venkata Rajesh Yella**
- Email: yvrajesh_bt@kluniversity.in
- GitHub: [@VRYella](https://github.com/VRYella)

### Citation

If you use NBDScanner in your research, please cite:

```
NonBScanner: Comprehensive Detection and Analysis of Non-B DNA Motifs
Dr. Venkata Rajesh Yella
GitHub: https://github.com/VRYella/NonBScanner
```

---

## ✨ Tips & Best Practices

1. **Sequence Quality**: Ensure your input sequences contain only valid DNA bases (A, C, G, T).
2. **Sequence Length**: NBDScanner works efficiently on sequences from 100 bp to more than 1 Mbp.
3. **Score Interpretation**: Higher scores indicate stronger motif confidence.
4. **Hybrid/Cluster Motifs**: These represent complex overlapping regions; analyze them separately from regular motifs.
5. **Export Formats**:
   - Use CSV for spreadsheet analysis
   - Use BED for genome browser visualization
   - Use JSON for programmatic access
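
The first tip can be checked before running an analysis. The sketch below counts characters other than A, C, G, and T so that ambiguity codes such as N, gaps, or stray symbols are visible up front. `find_invalid_bases` is a hypothetical helper, and how NBDScanner itself handles such characters is not covered here.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def find_invalid_bases(sequence):
    """Count characters that are not A, C, G, or T (case-insensitive).

    Whitespace and line breaks are removed first, so a pasted multi-line
    sequence does not report newlines as invalid bases.
    """
    cleaned = "".join(sequence.split()).upper()
    return Counter(base for base in cleaned if base not in VALID_BASES)


report = find_invalid_bases("ACGTNNacgt-RY")
if report:
    print("Invalid characters found:", dict(report))
else:
    print("Sequence contains only A, C, G, T.")
```

An empty `Counter` is falsy, so the result can be used directly in an `if` statement to decide whether to proceed.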

---

**End of NBDScanner Standalone Notebook**

Thank you for using NBDScanner! 🧬