# 🧬 NBDFinder: Advanced Batch Analysis Tool

**Comprehensive Non-B DNA Motif Detection and Positional Analysis**

*Developed by Dr. Venkata Rajesh Yella*

---

## 📖 Overview

This Jupyter notebook provides advanced batch analysis capabilities for detecting Non-B DNA structures across multiple sequences of equal length. It offers:

- **Nucleotide positional occurrence analysis** of all motif types
- **Comparative analysis** across multiple sequences
- **Statistical analysis** and visualization of motif patterns
- **Export capabilities** for downstream analysis
- **Professional visualizations** suitable for publications

### 🔬 Supported Non-B DNA Motifs

1. **G-quadruplex-related**: Canonical G4, Relaxed G4, Bulged G4, Bipartite G4, Multimeric G4, Imperfect G4
2. **G-Triplex**: Structures formed by three G-tracts (proposed G-quadruplex folding intermediates)
3. **i-motif related**: Canonical i-Motif, AC-motif
4. **Helix deviations**: Z-DNA, eGZ (Extruded-G), Curved DNA
5. **Repeat/junction**: Slipped DNA, Cruciform, Sticky DNA, Triplex DNA, R-Loop
6. **Hybrid**: Combinations of overlapping motifs
7. **Non-B DNA Clusters**: Hotspots of multiple structures

## 🚀 Installation and Setup
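
Before running the import cell below, it can help to confirm that the third-party packages it needs are actually installed. The following is a minimal sketch using only the standard library; the helper name `find_missing_packages` is illustrative and not part of NBDFinder. Note that Biopython is imported as `Bio`, and that `motifs` is the local NBDFinder module, so it has to be importable from the notebook's working directory or Python path.

```python
from importlib.util import find_spec

# Import names used by this notebook (Biopython installs as "Bio").
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib", "seaborn", "plotly", "scipy", "Bio"]


def find_missing_packages(module_names):
    """Return the top-level module names that cannot be located (hypothetical helper)."""
    return [name for name in module_names if find_spec(name) is None]


missing = find_missing_packages(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

If anything is reported as missing, uncomment the `pip install` line in the cell below and run it once.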

In [None]:
# Install required packages (run only if needed)
# !pip install pandas numpy matplotlib seaborn plotly biopython scipy

# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
from collections import defaultdict, Counter
import re
from scipy import stats
from Bio import SeqIO
import io
import warnings
warnings.filterwarnings('ignore')

# Import the NBDFinder motif detection functions
# (the local `motifs` module must be importable from this notebook's working directory)
from motifs import all_motifs, parse_fasta

print("✅ All libraries imported successfully!")
print("📊 NBDFinder Advanced Batch Analysis Tool - Ready!")

## 📥 Data Input and Preparation
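
The cell below parses FASTA text and checks that all sequences share one length, since the positional analysis compares the same coordinate across sequences. As a rough, non-interactive sketch of the same idea, assuming plain multi-line FASTA input: the helpers `read_fasta_records` and `trim_to_shortest` are hypothetical names used only for illustration and do not call NBDFinder's `parse_fasta`.

```python
def read_fasta_records(fasta_text):
    """Parse FASTA text into a list of (name, sequence) tuples (illustrative helper)."""
    records = []
    name, chunks = None, []
    for line in fasta_text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if name is not None:
                records.append((name, "".join(chunks).upper()))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks).upper()))
    return records


def trim_to_shortest(records):
    """Trim every sequence to the length of the shortest one."""
    if not records:
        return records
    min_length = min(len(seq) for _, seq in records)
    return [(name, seq[:min_length]) for name, seq in records]


demo = ">a\nGGGTTAGGG\nTTA\n>b\nCGCGCGCG\n"
records = trim_to_shortest(read_fasta_records(demo))
for name, seq in records:
    print(name, seq, len(seq))
```

Trimming silently discards the 3' end of longer sequences, so it is only appropriate when the sequences are already aligned at their first base (for example, windows anchored on a common feature).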

In [None]:
def load_sequences_from_fasta(fasta_content):
    """
    Load sequences from FASTA format content.
    Ensures all sequences are of equal length for positional analysis.
    """
    sequences = []
    names = []
    
    # Parse FASTA content
    current_seq = ""
    current_name = ""
    
    for line in fasta_content.strip().split('\n'):
        if line.startswith('>'):
            if current_seq:
                sequences.append(parse_fasta(current_seq))
                names.append(current_name)
            current_name = line[1:].strip()
            current_seq = ""
        else:
            current_seq += line.strip()
    
    # Add the last sequence
    if current_seq:
        sequences.append(parse_fasta(current_seq))
        names.append(current_name)
    
    # Check length consistency
    lengths = [len(seq) for seq in sequences]
    if len(set(lengths)) > 1:
        print(f"⚠️  Warning: Sequences have different lengths: {set(lengths)}")
        print("For positional analysis, sequences should be of equal length.")
        
        # Option to trim to minimum length
        min_length = min(lengths)
        response = input(f"Trim all sequences to {min_length} bp? (y/n): ")
        if response.lower() == 'y':
            sequences = [seq[:min_length] for seq in sequences]
            print(f"✅ All sequences trimmed to {min_length} bp")
    
    return sequences, names

# Example FASTA content (replace with your data)
EXAMPLE_FASTA = """
>Sequence_1_G4_Rich
TTAGGGTTAGGGTTAGGGTTAGGGAAAAATCCGTCGAGCAGAGTTAAAAAGGGGTTAGGGTTAGGGTTAGGGCCCCCTCCCCCTCCCCCTCCCC
>Sequence_2_Z_DNA_Rich
CGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGCGATCGATCGATCGATCGATCGTACGTACGTACGTACGTACGTACGTACGTAC
>Sequence_3_Mixed_Structures
GGGTTAGGGTTAGGGTTAGGGCCCCCTCCCCCTCCCCCTCCCCGCGCGCGCGCGCGCGCGCGCGCGAAAAATTTTTAAAAATTTTTAAAAA
"""

# Load example data or replace with your FASTA content
sequences, sequence_names = load_sequences_from_fasta(EXAMPLE_FASTA)

print(f"📊 Loaded {len(sequences)} sequences:")
for i, (name, seq) in enumerate(zip(sequence_names, sequences)):
    print(f"  {i+1}. {name}: {len(seq)} bp")

sequence_length = len(sequences[0]) if sequences else 0
print(f"\n🧬 Sequence length: {sequence_length} bp")

## 🔍 Comprehensive Motif Detection
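
The batch step in the cell below runs the detector once per sequence, tags every motif record with the index and name of its source sequence, and pools the records into one table. The sketch below illustrates that flow with a stand-in detector. The regex is a deliberately simple placeholder and is not NBDFinder's algorithm; `toy_detect` and `tag_batch` are hypothetical names, and the record keys mirror the columns the notebook displays (`Class`, `Start`, `End`, `Length`).

```python
import re

# Placeholder pattern: four runs of 3+ G separated by loops of 1-7 nt.
# This is NOT how NBDFinder scores G-quadruplexes; it only produces example records.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}?G{3,}){3}")


def toy_detect(seq, sequence_name=""):
    """Return motif records as dicts with 1-based, inclusive Start/End (assumed convention)."""
    hits = []
    for match in G4_PATTERN.finditer(seq):
        hits.append({
            "Class": "G-Quadruplex (toy)",
            "Start": match.start() + 1,  # convert 0-based index to 1-based position
            "End": match.end(),          # exclusive 0-based end equals inclusive 1-based end
            "Length": match.end() - match.start(),
        })
    return hits


def tag_batch(sequences, names, detect=toy_detect):
    """Run `detect` on each sequence and tag every record with its source sequence."""
    all_records = []
    for index, (seq, name) in enumerate(zip(sequences, names)):
        for record in detect(seq, sequence_name=name):
            record["Sequence_Index"] = index
            record["Sequence_Name"] = name
            all_records.append(record)
    return all_records


records = tag_batch(["TTAGGGTTAGGGTTAGGGTTAGGG", "ATATATATAT"], ["telomere", "at_repeat"])
for record in records:
    print(record)
```

A list of flat dictionaries like this converts directly with `pd.DataFrame(records)`, which is what the cell below does with the real detector output.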

In [None]:
def analyze_sequences_batch(sequences, names):
    """
    Perform comprehensive motif analysis on multiple sequences.
    """
    all_results = []
    
    print("🔍 Analyzing sequences for Non-B DNA motifs...")
    
    for i, (seq, name) in enumerate(zip(sequences, names)):
        print(f"  Processing {i+1}/{len(sequences)}: {name}")
        
        # Detect all motifs
        motifs = all_motifs(seq, sequence_name=name)
        
        # Add sequence information
        for motif in motifs:
            motif['Sequence_Index'] = i
            motif['Sequence_Name'] = name
        
        all_results.extend(motifs)
    
    # Convert to DataFrame for easier analysis
    df = pd.DataFrame(all_results)
    
    print(f"\n✅ Analysis complete!")
    print(f"📊 Total motifs detected: {len(df)}")
    
    if not df.empty:
        print(f"🎯 Motif classes found: {df['Class'].nunique()}")
        print(f"📋 Class distribution:")
        class_counts = df['Class'].value_counts()
        for motif_class, count in class_counts.head(10).items():
            print(f"    {motif_class}: {count}")
    
    return df

# Perform the analysis
results_df = analyze_sequences_batch(sequences, sequence_names)

# Display summary
if not results_df.empty:
    print("\n📋 Sample results:")
    display(results_df[['Sequence_Name', 'Class', 'Subtype', 'Start', 'End', 'Length', 'Score']].head(10))
else:
    print("⚠️  No motifs detected in the provided sequences.")

## 📍 Nucleotide Positional Occurrence Analysis
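
The cell below builds a position-by-class matrix. For a single motif class, the value at a position is the number of motifs covering that position divided by the number of sequences analyzed. A minimal pure-Python sketch of that calculation follows, assuming `Start` and `End` are 1-based and inclusive (which is how the notebook's own loop treats them); the function name `positional_frequency` is illustrative.

```python
def positional_frequency(intervals, sequence_length, num_sequences):
    """
    Compute per-position motif frequency for one motif class.

    intervals: iterable of (start, end) pairs, 1-based and inclusive.
    Returns a list where element i holds the frequency at position i + 1.
    """
    counts = [0] * sequence_length
    for start, end in intervals:
        # Clip each interval to the valid coordinate range before counting
        for pos in range(max(start, 1), min(end, sequence_length) + 1):
            counts[pos - 1] += 1
    return [count / num_sequences for count in counts]


# Two sequences of length 10: a motif at 2-5 in the first, a motif at 4-8 in the second
freq = positional_frequency([(2, 5), (4, 8)], sequence_length=10, num_sequences=2)
print(freq)
```

Because overlapping motifs within one sequence are each counted, a value can exceed 1.0; the result is an average coverage depth per sequence rather than a strict probability.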

In [None]:
def create_positional_occurrence_matrix(results_df, sequence_length, motif_classes=None, num_sequences=None):
    """
    Create a matrix showing motif occurrence at each nucleotide position.

    Values are per-position motif counts divided by the number of sequences.
    Pass num_sequences explicitly so that sequences without any detected motif
    still count toward the denominator; if omitted, only sequences with at
    least one motif in results_df are counted.
    """
    if results_df.empty:
        return pd.DataFrame()

    # Determine the denominator before any class filtering
    if num_sequences is None:
        num_sequences = results_df['Sequence_Index'].nunique()

    # Filter for specific motif classes if provided
    if motif_classes:
        results_df = results_df[results_df['Class'].isin(motif_classes)]

    # Get unique motif classes
    classes = sorted(results_df['Class'].unique())

    # Initialize position matrix
    position_matrix = pd.DataFrame(
        0,
        index=range(1, sequence_length + 1),  # 1-based positions
        columns=classes
    )

    # Fill the matrix; Start and End are treated as 1-based, inclusive coordinates
    for _, motif in results_df.iterrows():
        start_pos = max(int(motif['Start']), 1)
        end_pos = min(int(motif['End']), sequence_length)

        # Mark all positions covered by this motif (label slicing with .loc is inclusive)
        if start_pos <= end_pos:
            position_matrix.loc[start_pos:end_pos, motif['Class']] += 1

    # Convert to frequencies (normalize by number of sequences)
    position_matrix = position_matrix / num_sequences

    return position_matrix

# Create positional occurrence matrix
position_matrix = create_positional_occurrence_matrix(
    results_df, sequence_length, num_sequences=len(sequences)
)

if not position_matrix.empty:
    print(f"📊 Positional occurrence matrix created: {position_matrix.shape}")
    print(f"📍 Positions: {position_matrix.shape[0]}")
    print(f"🎯 Motif classes: {position_matrix.shape[1]}")
    
    # Display summary statistics
    print("\n📈 Summary statistics:")
    summary_stats = position_matrix.describe()
    display(summary_stats)
else:
    print("⚠️  No positional data available.")

## 📊 Advanced Visualizations

In [None]:
def create_positional_heatmap(position_matrix, title="Non-B DNA Motif Positional Occurrence"):
    """
    Create an interactive heatmap showing motif occurrence across positions.
    """
    if position_matrix.empty:
        print("⚠️  No data available for heatmap.")
        return
    
    # Create heatmap
    fig = go.Figure(data=go.Heatmap(
        z=position_matrix.T.values,
        x=position_matrix.index,
        y=position_matrix.columns,
        colorscale='Viridis',
        hoverongaps=False,
        hovertemplate='Position: %{x}<br>Motif: %{y}<br>Frequency: %{z:.3f}<extra></extra>'
    ))
    
    fig.update_layout(
        title=title,
        xaxis_title="Nucleotide Position",
        yaxis_title="Motif Class",
        width=1000,
        height=600
    )
    
    fig.show()
    return fig

def create_motif_density_plot(position_matrix):
    """
    Create density plots for each motif class across positions.
    """
    if position_matrix.empty:
        print("⚠️  No data available for density plot.")
        return
    
    # Calculate total density per position
    total_density = position_matrix.sum(axis=1)
    
    # Create subplots
    n_motifs = len(position_matrix.columns)
    fig = make_subplots(
        rows=n_motifs + 1, cols=1,
        subplot_titles=['Total Motif Density'] + list(position_matrix.columns),
        vertical_spacing=0.02
    )
    
    # Add total density plot
    fig.add_trace(
        go.Scatter(
            x=position_matrix.index,
            y=total_density,
            mode='lines',
            name='Total Density',
            line=dict(color='black', width=2)
        ),
        row=1, col=1
    )
    
    # Add individual motif density plots
    colors = px.colors.qualitative.Set3
    for i, motif in enumerate(position_matrix.columns):
        fig.add_trace(
            go.Scatter(
                x=position_matrix.index,
                y=position_matrix[motif],
                mode='lines',
                name=motif,
                line=dict(color=colors[i % len(colors)]),
                fill='tozeroy',
                fillcolor=colors[i % len(colors)]
            ),
            row=i+2, col=1
        )
    
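    fig.update_layout(
        title="Motif Density Across Nucleotide Positions",
        height=200 * (n_motifs + 1),
        showlegend=False
    )
    # Label the x-axis of the bottom subplot; xaxis_title in update_layout
    # would only label the first (top) subplot
    fig.update_xaxes(title_text="Nucleotide Position", row=n_motifs + 1, col=1)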
    
    fig.show()
    return fig

# Create visualizations
if not position_matrix.empty:
    print("🎨 Creating positional heatmap...")
    heatmap_fig = create_positional_heatmap(position_matrix)
    
    print("\n📈 Creating density plots...")
    density_fig = create_motif_density_plot(position_matrix)
else:
    print("⚠️  No data available for visualization.")

## 📊 Statistical Analysis
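
Among other summaries, the cell below flags "hotspots": positions whose total motif density is at or above the 90th percentile of all positions. The sketch below reproduces that rule in plain Python, assuming linear interpolation between order statistics (the default method in pandas and NumPy). The names `percentile` and `find_hotspots` are hypothetical helpers, and the density values are made up for the demonstration.

```python
def percentile(values, q):
    """Percentile for q in [0, 1] using linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = q * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def find_hotspots(density_by_position, q=0.9):
    """Return (threshold, {position: density}) for positions at or above the q-th percentile."""
    threshold = percentile(density_by_position.values(), q)
    hotspots = {pos: d for pos, d in density_by_position.items() if d >= threshold}
    return threshold, hotspots


# Made-up total densities for an 11-position toy sequence
density = {1: 0.0, 2: 0.5, 3: 1.0, 4: 2.0, 5: 0.5, 6: 0.0,
           7: 0.0, 8: 0.0, 9: 0.0, 10: 0.0, 11: 0.0}
threshold, hotspots = find_hotspots(density)
print(threshold, hotspots)
```

One caveat of this rule: if more than 90% of positions have zero density, the threshold itself is zero and the `>=` comparison marks every position as a hotspot. In sparse data it may be worth additionally requiring a density greater than zero.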

In [None]:
def perform_statistical_analysis(results_df, position_matrix):
    """
    Perform comprehensive statistical analysis of motif patterns.
    """
    print("📊 Performing statistical analysis...")
    
    if results_df.empty:
        print("⚠️  No data available for statistical analysis.")
        return
    
    # 1. Basic statistics
    print("\n1️⃣ Basic Statistics:")
    print(f"   Total motifs detected: {len(results_df)}")
    print(f"   Unique motif classes: {results_df['Class'].nunique()}")
    print(f"   Sequences analyzed: {results_df['Sequence_Index'].nunique()}")
    
    # 2. Motif length distribution
    print("\n2️⃣ Motif Length Statistics:")
    length_stats = results_df['Length'].describe()
    print(f"   Mean length: {length_stats['mean']:.1f} bp")
    print(f"   Median length: {length_stats['50%']:.1f} bp")
    print(f"   Range: {length_stats['min']:.0f}-{length_stats['max']:.0f} bp")
    
    # 3. Score distribution
    if 'Score' in results_df.columns:
        print("\n3️⃣ Score Distribution:")
        score_stats = results_df['Score'].describe()
        print(f"   Mean score: {score_stats['mean']:.3f}")
        print(f"   Median score: {score_stats['50%']:.3f}")
        print(f"   Range: {score_stats['min']:.3f}-{score_stats['max']:.3f}")
    
    # 4. Positional preferences
    if not position_matrix.empty:
        print("\n4️⃣ Positional Preferences:")
        
        # Find hotspots (positions with high motif density)
        total_density = position_matrix.sum(axis=1)
        hotspot_threshold = total_density.quantile(0.9)
        hotspots = total_density[total_density >= hotspot_threshold]
        
        print(f"   Hotspot threshold (90th percentile): {hotspot_threshold:.3f}")
        print(f"   Number of hotspot positions: {len(hotspots)}")
        
        if len(hotspots) > 0:
            print(f"   Top hotspot positions:")
            for pos, density in hotspots.nlargest(5).items():
                print(f"     Position {pos}: {density:.3f}")
    
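    # 5. Class-specific analysis
    print("\n5️⃣ Class-specific Analysis:")
    # Only aggregate Score when the column exists; a missing column in the
    # aggregation spec would raise a KeyError
    agg_spec = {'Length': ['count', 'mean', 'std']}
    if 'Score' in results_df.columns:
        agg_spec['Score'] = ['mean', 'std']
    class_summary = results_df.groupby('Class').agg(agg_spec).round(3)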
    
    print("   Top 5 most frequent motif classes:")
    class_counts = results_df['Class'].value_counts().head(5)
    for motif_class, count in class_counts.items():
        avg_length = results_df[results_df['Class'] == motif_class]['Length'].mean()
        print(f"     {motif_class}: {count} occurrences (avg length: {avg_length:.1f} bp)")
    
    return {
        'basic_stats': results_df.describe(),
        'class_summary': class_summary,
        'hotspots': hotspots if not position_matrix.empty else None
    }

# Perform statistical analysis
stats_results = perform_statistical_analysis(results_df, position_matrix)

## 💾 Export Results

In [None]:
def export_results(results_df, position_matrix, sequence_names):
    """
    Export analysis results to various formats.
    """
    print("💾 Exporting results...")
    
    # 1. Export detailed motif results
    if not results_df.empty:
        results_df.to_csv('NBDFinder_Detailed_Results.csv', index=False)
        print("✅ Detailed results exported to: NBDFinder_Detailed_Results.csv")
    
    # 2. Export positional occurrence matrix
    if not position_matrix.empty:
        position_matrix.to_csv('NBDFinder_Positional_Matrix.csv')
        print("✅ Positional matrix exported to: NBDFinder_Positional_Matrix.csv")
    
    # 3. Export summary statistics
    if not results_df.empty:
        summary_stats = []
        
        # Overall summary
        summary_stats.append({
            'Metric': 'Total Sequences',
            'Value': len(sequence_names)
        })
        summary_stats.append({
            'Metric': 'Total Motifs Detected',
            'Value': len(results_df)
        })
        summary_stats.append({
            'Metric': 'Unique Motif Classes',
            'Value': results_df['Class'].nunique()
        })
        
        # Per-sequence summary
        for i, name in enumerate(sequence_names):
            seq_motifs = results_df[results_df['Sequence_Index'] == i]
            summary_stats.append({
                'Metric': f'Motifs in {name}',
                'Value': len(seq_motifs)
            })
        
        # Class distribution
        class_counts = results_df['Class'].value_counts()
        for motif_class, count in class_counts.items():
            summary_stats.append({
                'Metric': f'{motif_class} Count',
                'Value': count
            })
        
        summary_df = pd.DataFrame(summary_stats)
        summary_df.to_csv('NBDFinder_Summary_Statistics.csv', index=False)
        print("✅ Summary statistics exported to: NBDFinder_Summary_Statistics.csv")
    
    print("\n📁 All results exported successfully!")
    print("📋 Files created:")
    print("   • NBDFinder_Detailed_Results.csv - Complete motif detection results")
    print("   • NBDFinder_Positional_Matrix.csv - Nucleotide positional occurrence data")
    print("   • NBDFinder_Summary_Statistics.csv - Summary statistics and counts")

# Export results
export_results(results_df, position_matrix, sequence_names)

## 🔬 Advanced Analysis: Comparative Sequence Analysis
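
The comparison in the cell below reduces each sequence to a few summary metrics: total motifs, number of distinct classes, mean motif length, and the most common class. A small standard-library sketch of that per-sequence reduction follows. The record layout mirrors the notebook's columns, `summarize_by_sequence` is an illustrative name, and the input records are made up.

```python
from collections import Counter, defaultdict


def summarize_by_sequence(records, sequence_names):
    """Build one summary row per sequence, including sequences with no motifs."""
    by_sequence = defaultdict(list)
    for record in records:
        by_sequence[record["Sequence_Index"]].append(record)

    summary = []
    for index, name in enumerate(sequence_names):
        motifs = by_sequence.get(index, [])
        class_counts = Counter(m["Class"] for m in motifs)
        avg_length = sum(m["Length"] for m in motifs) / len(motifs) if motifs else 0
        summary.append({
            "Sequence": name,
            "Total_Motifs": len(motifs),
            "Unique_Classes": len(class_counts),
            "Avg_Motif_Length": round(avg_length, 2),
            "Most_Common_Class": class_counts.most_common(1)[0][0] if motifs else "None",
        })
    return summary


# Made-up records: three motifs in the first sequence, none in the second
demo_records = [
    {"Sequence_Index": 0, "Class": "Z-DNA", "Length": 12},
    {"Sequence_Index": 0, "Class": "Z-DNA", "Length": 20},
    {"Sequence_Index": 0, "Class": "Cruciform", "Length": 16},
]
for row in summarize_by_sequence(demo_records, ["seq_a", "seq_b"]):
    print(row)
```

Iterating over `sequence_names` rather than over the grouped records keeps motif-free sequences in the table with zero counts, which matters when comparing sequences side by side. When two classes tie, `Counter.most_common` returns the one seen first, whereas pandas `mode()` returns the alphabetically first value, so the two approaches can disagree on ties.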

In [None]:
def comparative_sequence_analysis(results_df, sequence_names):
    """
    Perform comparative analysis between sequences.
    """
    if results_df.empty:
        print("⚠️  No data available for comparative analysis.")
        return
    
    print("🔬 Performing comparative sequence analysis...")
    
    # Create comparison matrix
    comparison_data = []
    
    for i, seq_name in enumerate(sequence_names):
        seq_motifs = results_df[results_df['Sequence_Index'] == i]
        
        # Calculate metrics for this sequence
        total_motifs = len(seq_motifs)
        unique_classes = seq_motifs['Class'].nunique()
        avg_length = seq_motifs['Length'].mean() if total_motifs > 0 else 0
        avg_score = seq_motifs['Score'].mean() if 'Score' in seq_motifs.columns and total_motifs > 0 else 0
        
        # Most common motif class
        most_common = seq_motifs['Class'].mode().iloc[0] if total_motifs > 0 else "None"
        
        comparison_data.append({
            'Sequence': seq_name,
            'Total_Motifs': total_motifs,
            'Unique_Classes': unique_classes,
            'Avg_Motif_Length': round(avg_length, 2),
            'Avg_Score': round(avg_score, 3),
            'Most_Common_Class': most_common
        })
    
    comparison_df = pd.DataFrame(comparison_data)
    
    print("\n📊 Sequence Comparison Table:")
    display(comparison_df)
    
    # Create comparison visualization
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=(
            'Total Motifs per Sequence',
            'Unique Classes per Sequence', 
            'Average Motif Length',
            'Average Score'
        )
    )
    
    # Total motifs
    fig.add_trace(
        go.Bar(x=comparison_df['Sequence'], y=comparison_df['Total_Motifs'], name='Total Motifs'),
        row=1, col=1
    )
    
    # Unique classes
    fig.add_trace(
        go.Bar(x=comparison_df['Sequence'], y=comparison_df['Unique_Classes'], name='Unique Classes'),
        row=1, col=2
    )
    
    # Average length
    fig.add_trace(
        go.Bar(x=comparison_df['Sequence'], y=comparison_df['Avg_Motif_Length'], name='Avg Length'),
        row=2, col=1
    )
    
    # Average score
    fig.add_trace(
        go.Bar(x=comparison_df['Sequence'], y=comparison_df['Avg_Score'], name='Avg Score'),
        row=2, col=2
    )
    
    fig.update_layout(
        title="Comparative Sequence Analysis",
        height=600,
        showlegend=False
    )
    
    fig.show()
    
    return comparison_df

# Perform comparative analysis
comparison_results = comparative_sequence_analysis(results_df, sequence_names)

## 📈 Publication-Ready Figures

In [None]:
def create_publication_figure(results_df, position_matrix, sequence_names):
    """
    Create a comprehensive publication-ready figure.
    """
    if results_df.empty:
        print("⚠️  No data available for publication figure.")
        return
    
    print("📈 Creating publication-ready figure...")
    
    # Set publication style
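    # 'seaborn-v0_8' only exists in newer matplotlib releases; fall back to the default style otherwise
    plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'default')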
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    fig.suptitle('Non-B DNA Motif Analysis: Comprehensive Results', fontsize=16, fontweight='bold')
    
    # Panel A: Motif class distribution
    class_counts = results_df['Class'].value_counts().head(10)
    axes[0, 0].bar(range(len(class_counts)), class_counts.values, color='steelblue')
    axes[0, 0].set_xlabel('Motif Class')
    axes[0, 0].set_ylabel('Count')
    axes[0, 0].set_title('A. Motif Class Distribution')
    axes[0, 0].set_xticks(range(len(class_counts)))
    axes[0, 0].set_xticklabels(class_counts.index, rotation=45, ha='right')
    
    # Panel B: Length distribution
    axes[0, 1].hist(results_df['Length'], bins=20, color='lightcoral', alpha=0.7, edgecolor='black')
    axes[0, 1].set_xlabel('Motif Length (bp)')
    axes[0, 1].set_ylabel('Frequency')
    axes[0, 1].set_title('B. Motif Length Distribution')
    axes[0, 1].axvline(results_df['Length'].mean(), color='red', linestyle='--', label=f'Mean: {results_df["Length"].mean():.1f} bp')
    axes[0, 1].legend()
    
    # Panel C: Positional density heatmap (simplified)
    if not position_matrix.empty:
        # Select top motif classes for clarity
        top_classes = results_df['Class'].value_counts().head(5).index
        subset_matrix = position_matrix[top_classes]
        
        im = axes[1, 0].imshow(subset_matrix.T.values, aspect='auto', cmap='viridis')
        axes[1, 0].set_xlabel('Nucleotide Position')
        axes[1, 0].set_ylabel('Motif Class')
        axes[1, 0].set_title('C. Positional Occurrence Heatmap')
        axes[1, 0].set_yticks(range(len(top_classes)))
        axes[1, 0].set_yticklabels(top_classes)
        
        # Add colorbar
        cbar = plt.colorbar(im, ax=axes[1, 0])
        cbar.set_label('Occurrence Frequency')
    
    # Panel D: Score distribution by class
    if 'Score' in results_df.columns:
        top_classes_for_score = results_df['Class'].value_counts().head(5).index
        score_data = [results_df[results_df['Class'] == cls]['Score'].values for cls in top_classes_for_score]
        
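        # Set tick labels separately; the `labels` keyword of boxplot is deprecated in recent matplotlib
        axes[1, 1].boxplot(score_data)
        axes[1, 1].set_xticklabels(top_classes_for_score)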
        axes[1, 1].set_xlabel('Motif Class')
        axes[1, 1].set_ylabel('Score')
        axes[1, 1].set_title('D. Score Distribution by Class')
        axes[1, 1].tick_params(axis='x', rotation=45)
    else:
        # Alternative: sequence comparison
        seq_counts = [len(results_df[results_df['Sequence_Index'] == i]) for i in range(len(sequence_names))]
        axes[1, 1].bar(sequence_names, seq_counts, color='gold')
        axes[1, 1].set_xlabel('Sequence')
        axes[1, 1].set_ylabel('Total Motifs')
        axes[1, 1].set_title('D. Motifs per Sequence')
        axes[1, 1].tick_params(axis='x', rotation=45)
    
    plt.tight_layout()
    plt.savefig('NBDFinder_Publication_Figure.png', dpi=300, bbox_inches='tight')
    plt.savefig('NBDFinder_Publication_Figure.pdf', bbox_inches='tight')
    plt.show()
    
    print("✅ Publication-ready figures saved:")
    print("   • NBDFinder_Publication_Figure.png (300 DPI)")
    print("   • NBDFinder_Publication_Figure.pdf (Vector format)")

# Create publication figure
create_publication_figure(results_df, position_matrix, sequence_names)

## 📋 Analysis Summary Report

In [None]:
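def generate_analysis_report(results_df, sequence_names, position_matrix, sequence_length=None):
    """
    Generate a comprehensive analysis report.

    sequence_length is optional; when omitted it is taken from the number of
    rows in position_matrix (one row per nucleotide position).
    """
    print("📋 Generating comprehensive analysis report...")
    
    report = []
    report.append("=" * 80)
    report.append("🧬 NBDFinder: Advanced Batch Analysis Report")
    report.append("=" * 80)
    report.append(f"Generated on: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}")
    report.append("Developed by: Dr. Venkata Rajesh Yella")
    report.append("")
    
    # Input summary
    report.append("📥 INPUT SUMMARY")
    report.append("-" * 50)
    report.append(f"Number of sequences analyzed: {len(sequence_names)}")
    
    # Derive the length from the arguments instead of the notebook-level `sequences` variable
    if sequence_length is None and not position_matrix.empty:
        sequence_length = len(position_matrix)
    if sequence_length:
        report.append(f"Sequence length: {sequence_length} bp")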
    
    report.append("Sequences:")
    for i, name in enumerate(sequence_names):
        report.append(f"  {i+1}. {name}")
    report.append("")
    
    if not results_df.empty:
        # Detection summary
        report.append("🔍 DETECTION SUMMARY")
        report.append("-" * 50)
        report.append(f"Total motifs detected: {len(results_df)}")
        report.append(f"Unique motif classes: {results_df['Class'].nunique()}")
        report.append(f"Average motifs per sequence: {len(results_df)/len(sequence_names):.1f}")
        report.append("")
        
        # Class distribution
        report.append("📊 MOTIF CLASS DISTRIBUTION")
        report.append("-" * 50)
        class_counts = results_df['Class'].value_counts()
        for motif_class, count in class_counts.items():
            percentage = (count / len(results_df)) * 100
            report.append(f"{motif_class}: {count} ({percentage:.1f}%)")
        report.append("")
        
        # Length statistics
        report.append("📏 LENGTH STATISTICS")
        report.append("-" * 50)
        length_stats = results_df['Length'].describe()
        report.append(f"Mean length: {length_stats['mean']:.1f} bp")
        report.append(f"Median length: {length_stats['50%']:.1f} bp")
        report.append(f"Standard deviation: {length_stats['std']:.1f} bp")
        report.append(f"Range: {length_stats['min']:.0f} - {length_stats['max']:.0f} bp")
        report.append("")
        
        # Per-sequence analysis
        report.append("🧬 PER-SEQUENCE ANALYSIS")
        report.append("-" * 50)
        for i, seq_name in enumerate(sequence_names):
            seq_motifs = results_df[results_df['Sequence_Index'] == i]
            report.append(f"{seq_name}:")
            report.append(f"  Total motifs: {len(seq_motifs)}")
            report.append(f"  Unique classes: {seq_motifs['Class'].nunique()}")
            
            if len(seq_motifs) > 0:
                top_class = seq_motifs['Class'].mode().iloc[0]
                top_count = seq_motifs['Class'].value_counts().iloc[0]
                report.append(f"  Most frequent: {top_class} ({top_count} occurrences)")
            report.append("")
        
        # Positional analysis
        if not position_matrix.empty:
            report.append("📍 POSITIONAL ANALYSIS")
            report.append("-" * 50)
            total_density = position_matrix.sum(axis=1)
            max_density_pos = total_density.idxmax()
            max_density_val = total_density.max()
            
            report.append(f"Highest density position: {max_density_pos} (density: {max_density_val:.3f})")
            report.append(f"Average density per position: {total_density.mean():.3f}")
            
            # Hotspots
            hotspot_threshold = total_density.quantile(0.9)
            hotspots = total_density[total_density >= hotspot_threshold]
            report.append(f"Hotspot positions (at or above the 90th percentile): {len(hotspots)}")
            report.append("")
    
    else:
        report.append("⚠️  No motifs detected in the analyzed sequences.")
        report.append("")
    
    # Footer
    report.append("📚 REFERENCES")
    report.append("-" * 50)
    report.append("NBDFinder implements scientifically validated algorithms for Non-B DNA detection.")
    report.append("For detailed references, see the main application documentation.")
    report.append("")
    report.append("=" * 80)
    
    # Save report
    report_text = "\n".join(report)
    
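    # Write as UTF-8 so the emoji in the report do not fail on platforms
    # whose default text encoding is not UTF-8
    with open('NBDFinder_Analysis_Report.txt', 'w', encoding='utf-8') as f: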
        f.write(report_text)
    
    print(report_text)
    print("\n✅ Analysis report saved to: NBDFinder_Analysis_Report.txt")

# Generate comprehensive report
generate_analysis_report(results_df, sequence_names, position_matrix)

## 🎯 Conclusion

This advanced batch analysis tool provides comprehensive capabilities for analyzing Non-B DNA structures across multiple sequences. The analysis includes:

✅ **Comprehensive motif detection** across 19 distinct Non-B DNA structure types  
✅ **Nucleotide positional occurrence analysis** for detailed structural mapping  
✅ **Statistical analysis** with descriptive statistics and hotspot identification  
✅ **Comparative sequence analysis** for identifying differences between sequences  
✅ **Professional visualizations** suitable for publication  
✅ **Multiple export formats** for downstream analysis  

### 📁 Output Files Generated

- **NBDFinder_Detailed_Results.csv** - Complete motif detection results
- **NBDFinder_Positional_Matrix.csv** - Nucleotide positional occurrence data
- **NBDFinder_Summary_Statistics.csv** - Summary statistics and counts
- **NBDFinder_Analysis_Report.txt** - Comprehensive text report
- **NBDFinder_Publication_Figure.png/pdf** - Publication-ready figures

### 🔬 Next Steps

1. **Customize Analysis**: Modify the input sequences and parameters for your specific research needs
2. **Extended Analysis**: Use the exported data for additional statistical analysis or machine learning
3. **Publication**: Use the generated figures and statistics for manuscript preparation
4. **Integration**: Incorporate results into larger genomic analysis pipelines
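
As one possible starting point for the integration step, motif records can be converted to 4-column BED lines, a format many genomic tools accept. The sketch below assumes that `Start` and `End` are 1-based and inclusive (as the positional matrix code treats them) and uses `Sequence_Name` as the BED `chrom` field. For real genome coordinates, the sequence names would need to be actual chromosome names and the positions offset to genomic coordinates. The function name `motifs_to_bed_lines` and the example records are hypothetical.

```python
def motifs_to_bed_lines(records):
    """Convert motif dicts to BED4 lines: chrom, chromStart, chromEnd, name."""
    lines = []
    for record in sorted(records, key=lambda r: (r["Sequence_Name"], int(r["Start"]))):
        chrom_start = int(record["Start"]) - 1  # BED starts are 0-based
        chrom_end = int(record["End"])          # BED ends are exclusive
        name = str(record["Class"]).replace(" ", "_")  # avoid whitespace inside a field
        lines.append(f'{record["Sequence_Name"]}\t{chrom_start}\t{chrom_end}\t{name}')
    return lines


# Made-up records for illustration
example_records = [
    {"Sequence_Name": "seq1", "Class": "Z-DNA", "Start": 10, "End": 21},
    {"Sequence_Name": "seq1", "Class": "Curved DNA", "Start": 1, "End": 5},
]
for line in motifs_to_bed_lines(example_records):
    print(line)
```

With the notebook's results table, the input could be produced by `results_df.to_dict("records")`. The score column is left out on purpose, because BED scores are expected to be integers from 0 to 1000 and the motif scores would need rescaling first.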

---

*For questions or support, contact: Dr. Venkata Rajesh Yella (yvrajesh_bt@kluniversity.in)*