# NARDINI Online FASTA Analysis

This tool, developed by Cohan et al., conducts statistical analysis of **amino acid patterning** within intrinsically disordered regions (**IDRs**).

The inputs and outputs are the same as command-line NARDINI. The input is a **.fasta** file of IDRs; the output is a **.zip** containing **.tsv** and **.png** files.

This notebook uploads your FASTA file to an external server, where the NARDINI statistical analysis is performed. **The analysis keeps running if you close this notebook; you can come back later and retrieve your results with the Run ID.**


# Usage Instructions

## Getting Started
1. **Install requirements**: Make sure the `requests` and `ipykernel` packages are installed:
   ```bash
   pip install requests ipykernel
   ```

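2. **Prepare your FASTA file**: Make it available in one of these ways:
   - Place it in the `data/fasta_inputs/` folder, which the file-selection cell scans automatically (resolved as `../data/fasta_inputs/` relative to this notebook)
   - Or keep it anywhere else, such as the current directory, and set `FASTA_FILEPATH` to its path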

3. **Run the cells in order**: Execute each cell from top to bottom
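
Before uploading, it can help to check locally that the file parses as FASTA. The sketch below is one minimal way to do that; `read_fasta` and `find_problems` are hypothetical helpers, not part of NARDINI, and the set of accepted residues is an assumption (the backend's own validation rules are not described in this notebook).

```python
from pathlib import Path

# Assumption: only the 20 standard one-letter amino acid codes are expected.
# How the backend treats ambiguous codes (X, B, Z, ...) is not documented here.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("Sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_problems(records):
    """Return human-readable warnings for empty or non-standard sequences."""
    problems = []
    for header, sequence in records:
        if not sequence:
            problems.append(f"{header}: empty sequence")
            continue
        unexpected = sorted(set(sequence) - STANDARD_RESIDUES)
        if unexpected:
            problems.append(f"{header}: unexpected characters {unexpected}")
    return problems
```

For example, `find_problems(read_fasta(FASTA_FILEPATH))` returns an empty list when every record has a non-empty sequence made of standard residues.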

## Workflow
1. **Setup** - Import dependencies and configure paths and the backend URL
2. **Test Connection** - Verify backend service is available
3. **Select FASTA** - Choose your input file
4. **Run Analysis** - Submit file for processing (get Run ID)
5. **Check Progress** - Monitor analysis status
6. **Download Results** - Get your results when complete

## Output
Results are saved to the `data/zip_outputs/` folder, which will contain:
- A zip file with your analysis results
- Run information text files for reference
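
Because each run information file written by `save_run_info` contains a `Run ID:` line, a Run ID can be recovered after the notebook has been closed and reopened. The sketch below assumes that file format and the `run_info_<run_id>.txt` naming used in this notebook; `find_saved_run_ids` is a hypothetical helper name.

```python
from pathlib import Path


def find_saved_run_ids(outputs_dir):
    """Return (run_id, info_file) pairs, most recently modified file first."""
    info_files = sorted(
        Path(outputs_dir).glob("run_info_*.txt"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    runs = []
    for info_file in info_files:
        for line in info_file.read_text().splitlines():
            # save_run_info writes a line of the form "Run ID: <id>"
            if line.startswith("Run ID:"):
                runs.append((line.split(":", 1)[1].strip(), info_file))
                break
    return runs
```

Calling `find_saved_run_ids("../data/zip_outputs")` lists earlier runs; the first element of any pair can then be assigned to `check_run_id` or `download_run_id` in the later cells.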


In [None]:
# ## Setup and Configuration
# Import required packages for NARDINI analysis
 
import requests
from pathlib import Path
import os
import datetime

# Configuration
ROOT_DIR = Path("..")
INPUTS_DIR = ROOT_DIR / "data" / "fasta_inputs"
OUTPUTS_DIR = ROOT_DIR / "data" / "zip_outputs"
BACKEND_URL = "https://tangentleman--nardini-backend-fastapi-app.modal.run"

# Create the outputs directory if it doesn't exist
OUTPUTS_DIR.mkdir(parents=True, exist_ok=True)

# Test the health endpoint
def test_health(url: str):
    """Test if the NARDINI backend service is healthy."""
    try:
        # A timeout keeps the call from hanging if the server never responds
        health_response = requests.get(f"{url}/health", timeout=30)
        if health_response.ok:
            return health_response.json()
        else:
            return f"Error: {health_response.status_code} {health_response.text}"
    except Exception as e:
        return f"Connection error: {e}"

# Main function to run Nardini
def run_nardini(url: str, fasta_filepath: str):
    """Submit a FASTA file for NARDINI analysis."""
    if not Path(fasta_filepath).exists():
        raise FileNotFoundError(f"File {fasta_filepath} does not exist")

    with open(fasta_filepath, "rb") as f:
        files = {"file": f}
        response = requests.post(f"{url}/upload_fasta", files=files, timeout=120)
    if response.ok:
        return response.json()
    else:
        return f"Error: {response.status_code} {response.text}"

def check_run_status(url: str, run_id: str):
    """Check the status of a NARDINI analysis run."""
    try:
        status_response = requests.get(f"{url}/status/{run_id}", timeout=30)
        if status_response.ok:
            return status_response.json()
        else:
            return f"Error: {status_response.status_code} {status_response.text}"
    except Exception as e:
        return f"Connection error: {e}"

def download_results(url: str, run_id: str, destination_dir: Path = OUTPUTS_DIR):
    """Download the results zip file for a completed analysis."""
    if not run_id:
        print("Please provide a valid Run ID.")
        return False

    try:
        response = requests.get(f"{url}/download/{run_id}", stream=True, timeout=60)
        if response.ok:
            destination_filepath = destination_dir / f"{run_id}.zip"
            with open(destination_filepath, "wb") as f:
                for chunk in response.iter_content(chunk_size=8192):
                    f.write(chunk)
            print(f"Downloaded results to: {destination_filepath}")
            return True
        else:
            print(f"Error downloading file: {response.status_code} {response.text}")
            print("Analysis may still be in progress or an error occurred.")
            return False
    except Exception as e:
        print(f"Download error: {e}")
        return False

def save_run_info(run_id: str, fasta_filepath: str):
    """Save run information to a text file for reference."""
    timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    output_filename = OUTPUTS_DIR / f"run_info_{run_id}.txt"
    
    with open(output_filename, "w") as f:
        f.write("NARDINI Analysis Run Information\n")
        f.write("================================\n")
        f.write(f"Timestamp: {timestamp}\n")
        f.write(f"FASTA File: {fasta_filepath}\n")
        f.write(f"Run ID: {run_id}\n")
        f.write(f"Backend URL: {BACKEND_URL}\n")
    
    print(f"Run information saved to: {output_filename}")
    return output_filename

print("NARDINI analysis environment setup complete!")
print(f"Outputs will be saved to: {OUTPUTS_DIR}")

NARDINI analysis environment setup complete!
Outputs will be saved to: ../data/zip_outputs


In [5]:
# ## Test Connection to Server
# Verify that the NARDINI backend service is available

print("Testing connection to NARDINI backend...")
test_result = test_health(BACKEND_URL)

if isinstance(test_result, dict) and test_result.get('status') == 'healthy' and test_result.get('service') == 'nardini-backend':
    print('✅ Connection to server established!')
    print(f"Backend URL: {BACKEND_URL}")
else:
    print('❌ Connection failed!')
    print(f'Response: {test_result}')
    print("\nTroubleshooting:")
    print("1. Check your internet connection")
    print("2. Verify the backend URL is correct")
    print("3. The server may be temporarily unavailable")

Testing connection to NARDINI backend...
✅ Connection to server established!
Backend URL: https://tangentleman--nardini-backend-fastapi-app.modal.run


In [6]:
# ## Select FASTA File 📁

# Look for FASTA files in inputs directory
search_paths = [
    INPUTS_DIR
]

fasta_files = []
for search_path in search_paths:
    if search_path.exists():
        # Find FASTA files (common extensions)
        for pattern in ["*.fasta", "*.fa", "*.fas"]:
            fasta_files.extend(search_path.glob(pattern))

# Convert paths to strings, remove duplicates, and sort
fasta_files = sorted({str(f) for f in fasta_files})

print("Available FASTA files:")
if fasta_files:
    for i, file in enumerate(fasta_files, 1):
        file_size = os.path.getsize(file) / 1024  # KB
        print(f"  {i}. {file} ({file_size:.1f} KB)")
    
    print(f"\nFound {len(fasta_files)} FASTA file(s)")
    print("To select a file, set the FASTA_FILEPATH variable in the next cell")
else:
    print("  No FASTA files found!")
    print("\nMake sure your FASTA files are in one of these locations:")
    for path in search_paths:
        print(f"  - {path}/")
    print("\nOr manually set FASTA_FILEPATH to the path of your FASTA file.")

Available FASTA files:
  1. ../data/fasta_inputs/Halophile-pHtolerant-yeast-first16.fasta (3.4 KB)

Found 1 FASTA file(s)
To select a file, set the FASTA_FILEPATH variable in the next cell


In [8]:
# ## Run NARDINI Analysis ⚙️
# Submit your FASTA file for analysis

# STEP 1: Set the path to your FASTA file
# Either select from the files found above or provide your own path
FASTA_FILEPATH = "../data/fasta_inputs/Halophile-pHtolerant-yeast-first16.fasta"

# Option A: Select from detected files (uncomment and modify the index)
if fasta_files:
    # FASTA_FILEPATH = fasta_files[0]  # Uncomment and change index (0, 1, 2, etc.)
    print("Available files:")
    for i, file in enumerate(fasta_files):
        print(f"  {i}: {file}")
    print("\nTo select a file, uncomment and modify the line above")
    print("Example: FASTA_FILEPATH = fasta_files[0]  # selects the first file")

# Option B: Provide your own path (uncomment and modify)
# FASTA_FILEPATH = "path/to/your/file.fasta"

print(f"\nSelected FASTA file: {FASTA_FILEPATH}")

# STEP 2: Submit for analysis
if FASTA_FILEPATH and Path(FASTA_FILEPATH).exists():
    print(f"\n🔬 Submitting {FASTA_FILEPATH} for NARDINI analysis...")
    
    try:
        result = run_nardini(BACKEND_URL, FASTA_FILEPATH)
        if isinstance(result, dict) and 'run_id' in result:
            run_id = result['run_id']
            print(f"✅ Analysis started successfully!")
            print(f"🆔 Run ID: {run_id}")
            
            # Save run information
            save_run_info(run_id, FASTA_FILEPATH)
            
            print(f"\n📝 Next steps:")
            print(f"1. Use the 'Check Progress' cell to monitor analysis")
            print(f"2. Use the 'Download Results' cell when complete")
            print(f"3. Your Run ID is: {run_id}")
        else:
            print(f"❌ Error submitting file: {result}")
            run_id = None
    except Exception as e:
        print(f"❌ Error occurred: {e}")
        run_id = None
elif FASTA_FILEPATH:
    print(f"❌ File not found: {FASTA_FILEPATH}")
    run_id = None
else:
    print("⚠️  Please set FASTA_FILEPATH to the path of your FASTA file first!")
    run_id = None


Available files:
  0: ../data/fasta_inputs/Halophile-pHtolerant-yeast-first16.fasta

To select a file, uncomment and modify the line above
Example: FASTA_FILEPATH = fasta_files[0]  # selects the first file

Selected FASTA file: ../data/fasta_inputs/Halophile-pHtolerant-yeast-first16.fasta

🔬 Submitting ../data/fasta_inputs/Halophile-pHtolerant-yeast-first16.fasta for NARDINI analysis...
✅ Analysis started successfully!
🆔 Run ID: 6cf73f1e-8b48-423e-93c3-fae4db1a7cc9
Run information saved to: ../data/zip_outputs/run_info_6cf73f1e-8b48-423e-93c3-fae4db1a7cc9.txt

📝 Next steps:
1. Use the 'Check Progress' cell to monitor analysis
2. Use the 'Download Results' cell when complete
3. Your Run ID is: 6cf73f1e-8b48-423e-93c3-fae4db1a7cc9


In [9]:
# ## Check Progress 🔍
# Monitor the status of your NARDINI analysis

# You can either use the run_id from the previous cell or enter one manually
check_run_id = run_id if 'run_id' in globals() and run_id else None

# Option: Manually enter a run ID if needed (uncomment and modify)
# check_run_id = "your-run-id-here"

print(f"Checking status for Run ID: {check_run_id}")

if not check_run_id:
    print("⚠️  No Run ID available!")
    print("Either run the analysis cell above first, or manually set check_run_id")
else:
    print(f"\n🔍 Checking progress for: {check_run_id}")
    
    try:
        status_dict = check_run_status(BACKEND_URL, check_run_id)
        
        if isinstance(status_dict, dict):
            status = status_dict.get('status', 'unknown')
            print(f"📊 Analysis Status: {status.upper()}")
            
            if status == 'in_progress':
                print("⏳ Analysis is running...")
                
                # Count completed sequences
                progress = status_dict.get('progress', {})
                if progress:
                    completed_count = 0
                    total_sequences = len(progress)
                    
                    print(f"\n📈 Progress Details:")
                    for sequence, result in progress.items():
                        if isinstance(result, str) and result.endswith('.zip'):
                            completed_count += 1
                            # Show only first 50 chars of sequence to keep output manageable
                            short_seq = sequence[:50] + "..." if len(sequence) > 50 else sequence
                            print(f"  ✅ {completed_count}. {short_seq}")
                    
                    remaining = total_sequences - completed_count
                    print(f"\n📊 Summary: {completed_count}/{total_sequences} sequences completed")
                    if remaining > 0:
                        print(f"⏱️  {remaining} sequences remaining")
                else:
                    print("🔄 Processing has started, waiting for progress updates...")
                    
            elif status == 'completed' or status == 'done':
                print("🎉 Analysis completed successfully!")
                print("📥 You can now download the results using the next cell.")
                
            elif status == 'failed' or status == 'error':
                print("❌ Analysis failed!")
                if 'error' in status_dict:
                    print(f"Error details: {status_dict['error']}")
                    
            else:
                print(f"ℹ️  Status: {status}")
                
        else:
            print(f"❌ Error checking status: {status_dict}")
            
    except Exception as e:
        print(f"❌ Error occurred while checking status: {e}")

print(f"\n💡 Tip: Re-run this cell to get updated progress information")

Checking status for Run ID: 6cf73f1e-8b48-423e-93c3-fae4db1a7cc9

🔍 Checking progress for: 6cf73f1e-8b48-423e-93c3-fae4db1a7cc9
📊 Analysis Status: COMPLETED
🎉 Analysis completed successfully!
📥 You can now download the results using the next cell.

💡 Tip: Re-run this cell to get updated progress information
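
As an alternative to re-running the cell by hand, the status check can be wrapped in a polling loop. The following is a sketch only: `wait_for_run` is a hypothetical helper, the 30-second interval and one-hour limit are arbitrary assumptions, and the status strings are the ones this notebook already checks for (`in_progress`, `completed`/`done`, `failed`/`error`).

```python
import time

FINISHED = {"completed", "done"}
FAILED = {"failed", "error"}


def wait_for_run(get_status, poll_seconds=30, max_wait_seconds=3600, sleep=time.sleep):
    """Poll get_status() until the run finishes, fails, or the wait budget runs out.

    get_status is any zero-argument callable returning what check_run_status
    returns: a dict with a 'status' key, or an error string.
    Returns the final status string, or 'timeout' if the budget is exhausted.
    """
    waited = 0
    while True:
        result = get_status()
        # Error strings (e.g. a transient connection error) count as 'unknown'
        status = result.get("status", "unknown") if isinstance(result, dict) else "unknown"
        if status in FINISHED or status in FAILED:
            return status
        if waited >= max_wait_seconds:
            return "timeout"
        sleep(poll_seconds)
        waited += poll_seconds
```

In this notebook it could be called as `wait_for_run(lambda: check_run_status(BACKEND_URL, check_run_id))`. Passing the status check as a callable keeps the loop independent of the network code, so it can be tried out with a fake status source first.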


In [10]:
# ## Download Results 📥
# Download the completed NARDINI analysis results

# Use the run_id from previous cells or enter one manually
download_run_id = globals().get('run_id')

# Option: Manually enter a run ID if needed (uncomment and modify)
# download_run_id = "your-run-id-here"

if not download_run_id:
    print("⚠️  No Run ID available!")
    print("Either run the analysis cell above first, or manually set download_run_id")
else:
    print(f"\n📥 Downloading results for: {download_run_id}")
    
    # First check if the analysis is complete
    try:
        status_dict = check_run_status(BACKEND_URL, download_run_id)
        
        if isinstance(status_dict, dict):
            status = status_dict.get('status', 'unknown')
            
            if status in ['completed', 'done']:
                print("✅ Analysis is complete, proceeding with download...")
                
                # Attempt download
                success = download_results(BACKEND_URL, download_run_id, OUTPUTS_DIR)
                
                if success:
                    results_file = OUTPUTS_DIR / f"{download_run_id}.zip"
                    file_size = os.path.getsize(results_file) / (1024 * 1024)  # MB
                    
                    print(f"\n🎉 Download successful!")
                    print(f"📁 Results saved to: {results_file}")
                    print(f"📊 File size: {file_size:.1f} MB")
                    print(f"\n📂 To extract the results:")
                    print(f"   1. Navigate to: {OUTPUTS_DIR}")
                    print(f"   2. Extract: {download_run_id}.zip")
                    print(f"   3. The zip contains .tsv data files and .png plots")
                
            elif status == 'in_progress':
                print("⏳ Analysis is still in progress")
                print("Please wait for the analysis to complete before downloading")
                print("Use the 'Check Progress' cell to monitor status")
                
            elif status in ['failed', 'error']:
                print("❌ Analysis failed - no results available for download")
                if 'error' in status_dict:
                    print(f"Error details: {status_dict['error']}")
                    
            else:
                print(f"ℹ️  Current status: {status}")
                print("Download may not be available yet")
                
        else:
            print(f"❌ Error checking status: {status_dict}")
            print("Attempting download anyway...")
            download_results(BACKEND_URL, download_run_id, OUTPUTS_DIR)
            
    except Exception as e:
        print(f"❌ Error occurred: {e}")
        print("You can try downloading anyway...")
        download_results(BACKEND_URL, download_run_id, OUTPUTS_DIR)


📥 Downloading results for: 6cf73f1e-8b48-423e-93c3-fae4db1a7cc9
✅ Analysis is complete, proceeding with download...
Downloaded results to: ../data/zip_outputs/6cf73f1e-8b48-423e-93c3-fae4db1a7cc9.zip

🎉 Download successful!
📁 Results saved to: ../data/zip_outputs/6cf73f1e-8b48-423e-93c3-fae4db1a7cc9.zip
📊 File size: 1.0 MB

📂 To extract the results:
   1. Navigate to: ../data/zip_outputs
   2. Extract: 6cf73f1e-8b48-423e-93c3-fae4db1a7cc9.zip
   3. The zip contains .tsv data files and .png plots
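
The extraction steps above can also be done from Python with the standard `zipfile` module. This sketch makes no assumption about the archive's internal folder layout, which is not described here; it groups member names by extension and extracts everything into a folder named after the zip file. `summarize_zip` and `extract_zip` are hypothetical helper names.

```python
import zipfile
from collections import defaultdict
from pathlib import Path, PurePosixPath


def summarize_zip(zip_path):
    """Group the member names of a zip archive by lowercase file extension."""
    by_extension = defaultdict(list)
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            by_extension[PurePosixPath(name).suffix.lower()].append(name)
    return dict(by_extension)


def extract_zip(zip_path, destination_dir):
    """Extract the archive into destination_dir/<zip stem>/ and return that folder."""
    target = Path(destination_dir) / Path(zip_path).stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

For example, `summarize_zip(results_file).get(".tsv", [])` lists the tabular outputs, and `extract_zip(results_file, OUTPUTS_DIR)` unpacks them next to the downloaded archive.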


# Credits

**✨ Made by Tanuj Vasudeva and Ethan Caine, 2025 ✨**

This notebook has been adapted for use in any Jupyter environment, not just Google Colab.



# Acknowledgments

We thank Dr. John Woolford at Carnegie Mellon University, for whose lab this notebook was made, for his support of this project; Modal for hosting this service; and Katherine Parry for helpful advice.



# References

Cohan, M. C., Shinn, M. K., Lalmansingh, J. M., & Pappu, R. V. (2021). Uncovering non-random binary patterns within sequences of intrinsically disordered proteins. *Journal of Molecular Biology*, 434(2), 167373.

## Additional Information

- **NARDINI Tool**: This notebook provides a user-friendly interface to the NARDINI analysis tool
- **Backend Service**: Analysis runs on a remote server hosted on Modal, so it continues independently of this notebook session
- **Output Format**: Results include statistical data (.tsv files) and visualization plots (.png files)
- **Caching**: Previously analyzed sequences are cached to speed up repeat analyses