# NARDINI Online FASTA Analysis

This tool, developed by Cohan et al., conducts statistical analysis of **amino acid patterning** within intrinsically disordered regions (**IDRs**).

The inputs and outputs are the same as command-line NARDINI. The input is a **.fasta** file of IDRs; the output is a **.zip** containing **.tsv** and **.png** files.

This notebook uploads your FASTA file to an external server, where the NARDINI statistical analysis is performed. **You can close this notebook while the analysis runs, then come back later to retrieve your results.**


# Usage Instructions

## Getting Started
1. **Install requirements**: Make sure you have the `requests` and `ipykernel` packages installed:
   ```bash
   pip install requests ipykernel
   ```

2. **Prepare your FASTA file**: Place your FASTA file in one of these locations:
   - Current directory (same as this notebook)
   - `data/fasta_inputs/` folder
   - Or specify a custom path

3. **Run the cells in order**: Execute each cell from top to bottom
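
Before uploading, it can help to sanity-check the file prepared in step 2. The following is a minimal sketch, not part of NARDINI or this notebook's `core` module: `read_fasta` and `find_problems` are hypothetical helper names, and the default alphabet of the 20 standard amino acids is an assumption about what the backend expects.

```python
from pathlib import Path


def read_fasta(path):
    """Return a list of (header, sequence) pairs from a FASTA file."""
    records = []
    header, chunks = None, []
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_problems(records, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """List human-readable problems; an empty list means nothing was flagged.

    The default alphabet (20 standard amino acids) is an assumption.
    """
    problems = []
    if not records:
        problems.append("no records found")
    for header, seq in records:
        if not seq:
            problems.append(f"{header}: empty sequence")
        unexpected = sorted(set(seq.upper()) - set(alphabet))
        if unexpected:
            problems.append(f"{header}: unexpected characters {unexpected}")
    return problems
```

For example, `find_problems(read_fasta("my_idrs.fasta"))` returns an empty list when every record has a non-empty sequence made only of the assumed alphabet.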

## Workflow
1. **Setup** - Install dependencies and configure environment
2. **Test Connection** - Verify backend service is available
3. **Select FASTA** - Choose your input file
4. **Run Analysis** - Submit file for processing (get Run ID)
5. **Check Progress** - Monitor analysis status
6. **Download Results** - Get your results when complete
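
Steps 5 and 6 can also be automated instead of re-running the progress cell by hand. The sketch below is one possible approach under stated assumptions: `wait_for_run` is a hypothetical helper (not part of `core`), the polling interval and retry limit are arbitrary defaults, and the only status values relied on are `in_progress` and `completed`, which the progress cell in this notebook checks for.

```python
import time


def wait_for_run(fetch_status, poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Poll fetch_status() until the run is no longer 'in_progress'.

    fetch_status is any zero-argument callable returning a dict with a
    'status' key. Returns the last response seen, which may still report
    'in_progress' if max_polls was reached.
    """
    last = {}
    for attempt in range(max_polls):
        last = fetch_status()
        status = last.get("status", "unknown") if isinstance(last, dict) else "unknown"
        if status != "in_progress":
            return last
        # Wait before the next poll, but not after the final attempt
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    return last
```

In this notebook it could be called as `wait_for_run(lambda: get_run_status(BACKEND_URL, run_id))`, after which the download cell can be run if the returned status is `completed`. Passing the status function in as a callable keeps the helper independent of the backend and easy to test.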

## Output
Results are saved to the `data/zip_outputs/` folder, which contains:
- A zip file with your analysis results
- Run information text files for reference
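
To see what a downloaded archive contains without extracting it, something like the following sketch could be used. It relies only on Python's standard `zipfile` module; `summarize_zip` is a hypothetical helper name, and the internal layout of the archive is not assumed.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_zip(zip_path):
    """Group archive member names by file extension (e.g. '.tsv', '.png')."""
    by_ext = defaultdict(list)
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            ext = PurePosixPath(info.filename).suffix.lower()
            by_ext[ext].append(info.filename)
    return {ext: sorted(names) for ext, names in by_ext.items()}
```

Calling `summarize_zip` on the path printed by the download cell returns a dictionary mapping each extension to the files that carry it.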


In [None]:
# ## Setup and Configuration
# Import required packages for NARDINI analysis

import os
from pathlib import Path

from config import INPUTS_DIR, OUTPUTS_DIR, ensure_dirs, get_backend_url, JSON_PATH
from core import (
    download_zip,
    get_run_status,
    test_health,
    upload_fasta,
    save_run_info,
    get_available_runs
)

BACKEND_URL = get_backend_url(dev=True)

In [None]:
# Test Connection to Server
print("Testing connection to NARDINI backend...")
test_result = test_health(BACKEND_URL)

if test_result.get("error"):
    print("❌ Connection failed!")
    print(f"Response: {test_result}")
    print("\nTroubleshooting:")
    print("1. Check your internet connection")
    print("2. Verify the backend URL is correct")
    print("3. The server may be temporarily unavailable")
else:
    print("✅ Connection to server established!")
    print(f"Backend URL: {BACKEND_URL}")
    print(f"Response: {test_result}")

# Create directories
ensure_dirs()

In [None]:
get_available_runs(JSON_PATH)


In [None]:
# ## Select FASTA File 📁

# Look for FASTA files in inputs directory
search_paths = [INPUTS_DIR]

fasta_files = []
for search_path in search_paths:
    if search_path.exists():
        # Find FASTA files (common extensions)
        for pattern in ["*.fasta", "*.fa", "*.fas"]:
            fasta_files.extend(search_path.glob(pattern))

# Convert paths to strings, remove duplicates, and sort
fasta_files = sorted({str(f) for f in fasta_files})

print("Available FASTA files:")
if fasta_files:
    for i, file in enumerate(fasta_files, 1):
        file_size = os.path.getsize(file) / 1024  # KB
        print(f"  {i}. {file} ({file_size:.1f} KB)")

    print(f"\nFound {len(fasta_files)} FASTA file(s)")
    print("To select a file, set the FASTA_FILEPATH variable in the next cell")
else:
    print("  No FASTA files found!")
    print("\nMake sure your FASTA files are in one of these locations:")
    for path in search_paths:
        print(f"  - {path}/")
    print("\nOr manually set FASTA_FILEPATH to the path of your FASTA file.")

In [None]:
# ## Run NARDINI Analysis ⚙️
# Submit your FASTA file for analysis

# STEP 1: Set the path to your FASTA file
FASTA_FILEPATH = "../data/fasta_inputs/test_seqs.fasta"

# STEP 2: Submit for analysis
if FASTA_FILEPATH and Path(FASTA_FILEPATH).exists():
    print(f"\n🔬 Submitting {FASTA_FILEPATH} for NARDINI analysis...")

    try:
        result = upload_fasta(BACKEND_URL, FASTA_FILEPATH)
        if isinstance(result, dict) and "run_id" in result:
            run_id = result["run_id"]
            print("✅ Analysis started successfully!")
            print(f"🆔 Run ID: {run_id}")

            # Save run information
            save_run_info(run_id, FASTA_FILEPATH, JSON_PATH)

            print("\n📝 Next steps:")
            print("1. Use the 'Check Progress' cell to monitor analysis")
            print("2. Use the 'Download Results' cell when complete")
            print(f"3. Your Run ID is: {run_id}")
        else:
            print(f"❌ Error submitting file: {result}")
            run_id = None
    except Exception as e:
        print(f"❌ Error occurred: {e}")
        run_id = None
elif FASTA_FILEPATH:
    print(f"❌ File not found: {FASTA_FILEPATH}")
    run_id = None
else:
    print("⚠️  Please set FASTA_FILEPATH to the path of your FASTA file first!")
    run_id = None

In [None]:
# ## Check Progress 🔍
# Monitor the status of your NARDINI analysis
# You can either use the run_id from the previous cell or enter one manually
# (falls back to None if the analysis cell has not been run in this session)
check_run_id = globals().get("run_id")

# Option: Manually enter a run ID if needed (uncomment and modify)
# check_run_id = "your-run-id-here"

if not check_run_id:
    print("⚠️  No Run ID available!")
    print("Either run the analysis cell above first, or manually set check_run_id")
else:
    print(f"\n🔍 Checking progress for: {check_run_id}")

    try:
        status_dict = get_run_status(BACKEND_URL, check_run_id)

        if isinstance(status_dict, dict):
            status = status_dict.get("status", "unknown")
            print(f"📊 Analysis Status: {status.upper()}")

            if status == "in_progress":
                print("⏳ Analysis is running...")

                # Show pending sequences
                pending_sequences = status_dict.get("pending_sequences", [])
                if pending_sequences:
                    remaining_count = len(pending_sequences)
                    print("\n📈 Progress Details:")
                    print(f"⏱️  {remaining_count} sequences remaining to process")

                    # Show first few pending sequences (limit output)
                    display_limit = min(5, len(pending_sequences))
                    for i, sequence in enumerate(pending_sequences[:display_limit], 1):
                        # Show only first 30 chars of sequence to keep output manageable
                        short_seq = (
                            sequence[:30] + "..." if len(sequence) > 30 else sequence
                        )
                        print(f"  ⏳ {i}. {short_seq}")

                    if len(pending_sequences) > display_limit:
                        print(
                            f"  ... and {len(pending_sequences) - display_limit} more sequences"
                        )
                else:
                    print("🔄 Processing has started, checking sequence completion...")

            elif status == "completed":
                print("🎉 Analysis completed successfully!")
                print("📥 You can now download the results using the next cell.")

            else:
                print(f"ℹ️  Status: {status}")

        else:
            print(f"❌ Error checking status: {status_dict}")

    except Exception as e:
        print(f"❌ Error occurred while checking status: {e}")

print("\n💡 Tip: Re-run this cell to get updated progress information")

In [None]:
# ## Download Results 📥
# Download the completed NARDINI analysis results

# Use the run_id from previous cells or enter one manually
# (falls back to None if the analysis cell has not been run in this session)
download_run_id = globals().get("run_id")

# Option: Manually enter a run ID if needed (uncomment and modify)
# download_run_id = "your-run-id-here"

if not download_run_id:
    print("⚠️  No Run ID available!")
    print("Either run the analysis cell above first, or manually set download_run_id")
else:
    print(f"\n📥 Downloading results for: {download_run_id}")

    # Attempt download directly
    try:
        results = download_zip(BACKEND_URL, download_run_id, OUTPUTS_DIR)
        if results.get("error"):
            print(f"❌ Download failed: {results.get('error')}")
            print("The analysis may still be in progress or an error occurred")
            print("Use the 'Check Progress' cell to verify the analysis status")
        else:
            print("\n🎉 Download successful!")
            print(f"📁 Results saved to: {results.get('destination_filepath')}")

    except Exception as e:
        print(f"❌ Error occurred during download: {e}")
        print("The analysis may still be in progress or there was a connection issue")

# Credits

**✨ Made by Tanuj Vasudeva and Ethan Caine, 2025 ✨**

This notebook has been adapted for use in any Jupyter environment, not just Google Colab.



# Acknowledgments

We thank Dr. John Woolford at Carnegie Mellon University, for whose lab this notebook was made, for his support of this project; Modal for hosting this service; and Katherine Parry for helpful advice.



# References

Cohan, M. C., Shinn, M. K., Lalmansingh, J. M., & Pappu, R. V. (2021). Uncovering non-random binary patterns within sequences of intrinsically disordered proteins. *Journal of Molecular Biology*, 434(2), 167373.

## Additional Information

- **NARDINI Tool**: This notebook provides a user-friendly interface to the NARDINI analysis tool
- **Backend Service**: Analysis is performed on remote servers for optimal performance
- **Output Format**: Results include statistical data (.tsv files) and visualization plots (.png files)
- **Caching**: Previously analyzed sequences are cached to speed up repeat analyses
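
The `.tsv` files mentioned under Output Format can be read straight from the downloaded archive. The sketch below is a generic reader built on the standard library; `read_tsv_from_zip` is a hypothetical helper, UTF-8 encoding and a header row are assumptions, and no particular NARDINI column names are assumed.

```python
import csv
import io
import zipfile


def read_tsv_from_zip(zip_path, member):
    """Read one tab-separated file inside the archive into a list of dicts.

    Assumes the file is UTF-8 encoded and its first row holds column names.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # newline="" lets the csv module handle line endings itself
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text, delimiter="\t"))
```

Each returned row is a dictionary keyed by whatever column headers the file contains, with all values left as strings.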