# ClassyFire Chemical Classification Tutorial

**NOTE**: currently, it does not run. It seems to be a problem with the service.

ClassyFire is a web-based application for the automated structural classification of chemical entities. This tutorial demonstrates how to use the `ClassyFireAPI` class from the `provesid` package to classify chemical compounds using their structural features.

ClassyFire provides hierarchical chemical classification based on:
- Chemical structure analysis
- Functional group identification
- Taxonomic classification into superclass, class, subclass levels
- Molecular framework analysis
- Chemical fingerprinting

The ClassyFire system can classify compounds into over 4,800 chemical categories and is particularly useful for:
- Metabolomics research
- Chemical database organization
- Drug discovery
- Natural product analysis
- Chemical space exploration

**Service URL**: http://classyfire.wishartlab.com
**Input Types**: SMILES, InChI, chemical names, structure files

In [2]:
from provesid import ClassyFireAPI
import time
import json

# Initialize ClassyFire API
print("ClassyFire API initialized successfully!")
print(f"Service URL: {ClassyFireAPI.URL}")
print("Ready to classify chemical compounds!")

# Note: ClassyFire is a web service that requires submitting queries and waiting for results
print("\nImportant: ClassyFire processing involves:")
print("1. Submit a query (with SMILES, InChI, or chemical name)")
print("2. Wait for processing (can take several seconds to minutes)")
print("3. Retrieve classification results")
print("4. Parse the hierarchical classification data")

ClassyFire API initialized successfully!
Service URL: http://classyfire.wishartlab.com
Ready to classify chemical compounds!

Important: ClassyFire processing involves:
1. Submit a query (with SMILES, InChI, or chemical name)
2. Wait for processing (can take several seconds to minutes)
3. Retrieve classification results
4. Parse the hierarchical classification data


## 1. Basic Usage - Submitting a Classification Query

The basic workflow involves submitting a query and then retrieving results. Let's start with a simple example:

In [5]:
# Example 1: Classify aspirin using SMILES
aspirin_smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"

print("Submitting classification query for aspirin...")
print(f"SMILES: {aspirin_smiles}")

# Submit the query
response = ClassyFireAPI.submit_query("Aspirin Classification", aspirin_smiles)

if response.status_code == 200:
    query_result = response.json()
    query_id = query_result['id']
    print(f"✓ Query submitted successfully!")
    print(f"  Query ID: {query_id}")
    print(f"  Label: {query_result.get('label', 'N/A')}")
    print(f"  Status: {query_result.get('classification_status', 'N/A')}")
    
    # Store the query ID for later use
    aspirin_query_id = query_id
    
else:
    print(f"✗ Failed to submit query. Status code: {response.status_code}")
    print(f"  Response: {response.text}")
    aspirin_query_id = None

Submitting classification query for aspirin...
SMILES: CC(=O)OC1=CC=CC=C1C(=O)O
✗ Failed to submit query. Status code: 201
  Response: {"id":12606536,"label":"Aspirin Classification","finished_at":null,"created_at":"2023-03-02T06:39:54.000Z","updated_at":"2023-03-02T06:39:54.000Z","query_errors":null,"finished_processing_at":null,"query_type":"STRUCTURE","fstruc_file_name":null,"fstruc_content_type":null,"fstruc_file_size":null,"fstruc_updated_at":null,"query_input":"CC(=O)OC1=CC=CC=C1C(=O)O","tag_list":["Aspirin Classification"]}
✗ Failed to submit query. Status code: 201
  Response: {"id":12606536,"label":"Aspirin Classification","finished_at":null,"created_at":"2023-03-02T06:39:54.000Z","updated_at":"2023-03-02T06:39:54.000Z","query_errors":null,"finished_processing_at":null,"query_type":"STRUCTURE","fstruc_file_name":null,"fstruc_content_type":null,"fstruc_file_size":null,"fstruc_updated_at":null,"query_input":"CC(=O)OC1=CC=CC=C1C(=O)O","tag_list":["Aspirin Classification"]}


## 2. Checking Query Status

After submitting a query, we need to check its status before retrieving results:

In [6]:
# Check query status for aspirin
if aspirin_query_id:
    print(f"Checking status for query ID: {aspirin_query_id}")
    
    status_response = ClassyFireAPI.query_status(aspirin_query_id)
    
    if status_response and status_response.status_code == 200:
        status_data = status_response.json()
        print(f"✓ Status check successful:")
        print(f"  Classification Status: {status_data.get('classification_status', 'Unknown')}")
        print(f"  Submission Time: {status_data.get('created_at', 'Unknown')}")
        
        # Check if classification is complete
        if status_data.get('classification_status') == 'Done':
            print("  🎉 Classification is complete! Ready to retrieve results.")
        elif status_data.get('classification_status') == 'In progress':
            print("  ⏳ Classification is still in progress. Please wait...")
        else:
            print(f"  ⚠️  Status: {status_data.get('classification_status')}")
            
    else:
        print("✗ Failed to check query status")
        if status_response:
            print(f"  Status code: {status_response.status_code}")
else:
    print("No query ID available to check status")

print()
print("Note: ClassyFire processing typically takes 30 seconds to several minutes")
print("depending on the complexity of the molecule and server load.")

No query ID available to check status

Note: ClassyFire processing typically takes 30 seconds to several minutes
depending on the complexity of the molecule and server load.


## 3. Retrieving Classification Results

Once the classification is complete, we can retrieve the detailed results in various formats:

In [None]:
# Wait a bit to allow processing (in real use, you might need longer waits)
print("Waiting for classification to complete...")
time.sleep(5)  # Adjust as needed

# Retrieve classification results for aspirin
if aspirin_query_id:
    print(f"Retrieving classification results for query ID: {aspirin_query_id}")
    
    # Get results in JSON format
    result_response = ClassyFireAPI.get_query(aspirin_query_id, format="json")
    
    if result_response.status_code == 200:
        classification_results = result_response.json()
        print("✓ Classification results retrieved successfully!")
        print()
        
        # Display basic information
        print("=== COMPOUND INFORMATION ===")
        print(f"Query Label: {classification_results.get('label', 'N/A')}")
        print(f"SMILES: {classification_results.get('smiles', 'N/A')}")
        print(f"InChI: {classification_results.get('inchi', 'N/A')}")
        print(f"InChI Key: {classification_results.get('inchikey', 'N/A')}")
        print(f"Molecular Formula: {classification_results.get('molecular_formula', 'N/A')}")
        print()
        
        # Display hierarchical classification
        print("=== HIERARCHICAL CLASSIFICATION ===")
        entities = classification_results.get('entities', [])
        if entities:
            for entity in entities:
                kingdom = entity.get('kingdom', {})
                superclass = entity.get('superclass', {})
                class_info = entity.get('class', {})
                subclass = entity.get('subclass', {})
                
                print(f"Kingdom: {kingdom.get('name', 'N/A')} ({kingdom.get('description', 'No description')})")
                print(f"Superclass: {superclass.get('name', 'N/A')} ({superclass.get('description', 'No description')})")
                print(f"Class: {class_info.get('name', 'N/A')} ({class_info.get('description', 'No description')})")
                print(f"Subclass: {subclass.get('name', 'N/A')} ({subclass.get('description', 'No description')})")
                
                # Show intermediate nodes if available
                intermediate_nodes = entity.get('intermediate_nodes', [])
                if intermediate_nodes:
                    print(f"Intermediate Nodes ({len(intermediate_nodes)}):")
                    for i, node in enumerate(intermediate_nodes[:3], 1):  # Show first 3
                        print(f"  {i}. {node.get('name', 'N/A')}: {node.get('description', 'No description')}")
                    if len(intermediate_nodes) > 3:
                        print(f"  ... and {len(intermediate_nodes) - 3} more")
                
                # Show direct parent
                direct_parent = entity.get('direct_parent', {})
                if direct_parent:
                    print(f"Direct Parent: {direct_parent.get('name', 'N/A')}")
                
                print()
        
        # Display molecular framework (if available)
        molecular_framework = classification_results.get('molecular_framework', 'N/A')
        if molecular_framework != 'N/A':
            print(f"=== MOLECULAR FRAMEWORK ===")
            print(f"Framework: {molecular_framework}")
            print()
            
    elif result_response.status_code == 202:
        print("⏳ Classification is still in progress. Please wait longer and try again.")
    else:
        print(f"✗ Failed to retrieve results. Status code: {result_response.status_code}")
        print(f"  Response: {result_response.text}")
else:
    print("No query ID available to retrieve results")

## 4. Classifying Multiple Compounds

Let's classify several different types of compounds to see the diversity of ClassyFire classifications:

In [None]:
# Define different types of compounds to classify
compounds = {
    "Caffeine": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "Glucose": "C([C@@H]1[C@H]([C@@H]([C@H]([C@H](O1)O)O)O)O)O",
    "Cholesterol": "C[C@H](CCCC(C)C)[C@H]1CC[C@@H]2[C@@]1(CC[C@H]3[C@H]2CC=C4[C@@]3(CC[C@@H](C4)O)C)C",
    "Ethanol": "CCO",
    "Benzene": "C1=CC=CC=C1"
}

# Submit queries for multiple compounds
query_ids = {}
print("Submitting classification queries for multiple compounds:")
print("=" * 60)

for name, smiles in compounds.items():
    print(f"Submitting query for {name}...")
    print(f"  SMILES: {smiles}")
    
    response = ClassyFireAPI.submit_query(f"{name} Classification", smiles)
    
    if response.status_code == 200:
        query_result = response.json()
        query_id = query_result['id']
        query_ids[name] = query_id
        print(f"  ✓ Success! Query ID: {query_id}")
    else:
        print(f"  ✗ Failed! Status code: {response.status_code}")
        query_ids[name] = None
    
    print()
    # Add a small delay between requests to be respectful to the server
    time.sleep(1)

print(f"Submitted {len([q for q in query_ids.values() if q])} successful queries out of {len(compounds)} compounds.")
print("\\nNote: Processing may take several minutes. You can check status individually.")

## 5. Different Output Formats

ClassyFire supports multiple output formats. Let's demonstrate JSON, CSV, and SDF formats:

In [None]:
# Demonstrate different output formats using aspirin query (if available)
if aspirin_query_id:
    print("Demonstrating different output formats for aspirin classification:")
    print("=" * 65)
    
    # JSON format (most detailed)
    print("1. JSON Format (detailed):")
    json_response = ClassyFireAPI.get_query(aspirin_query_id, format="json")
    if json_response.status_code == 200:
        json_data = json_response.json()
        print(f"   ✓ JSON data retrieved ({len(json.dumps(json_data))} characters)")
        print(f"   Contains keys: {list(json_data.keys())}")
        
        # Show a sample of the JSON structure
        print("   Sample JSON structure:")
        sample_data = {
            "smiles": json_data.get("smiles"),
            "molecular_formula": json_data.get("molecular_formula"),
            "kingdom": json_data.get("entities", [{}])[0].get("kingdom", {}).get("name") if json_data.get("entities") else None
        }
        print(f"   {json.dumps(sample_data, indent=4)}")
    else:
        print(f"   ✗ Failed to get JSON format. Status: {json_response.status_code}")
    
    print()
    
    # CSV format
    print("2. CSV Format (tabular):")
    csv_response = ClassyFireAPI.get_query(aspirin_query_id, format="csv")
    if csv_response.status_code == 200:
        csv_data = csv_response.text
        print(f"   ✓ CSV data retrieved ({len(csv_data)} characters)")
        # Show first few lines of CSV
        csv_lines = csv_data.split('\\n')[:5]
        print("   First few lines of CSV:")
        for i, line in enumerate(csv_lines, 1):
            if line.strip():
                print(f"   {i}: {line[:100]}..." if len(line) > 100 else f"   {i}: {line}")
    else:
        print(f"   ✗ Failed to get CSV format. Status: {csv_response.status_code}")
    
    print()
    
    # SDF format
    print("3. SDF Format (structure file):")
    sdf_response = ClassyFireAPI.get_query(aspirin_query_id, format="sdf")
    if sdf_response.status_code == 200:
        sdf_data = sdf_response.text
        print(f"   ✓ SDF data retrieved ({len(sdf_data)} characters)")
        # Show first few lines of SDF
        sdf_lines = sdf_data.split('\\n')[:10]
        print("   First few lines of SDF:")
        for i, line in enumerate(sdf_lines, 1):
            if line.strip():
                print(f"   {i}: {line}")
    else:
        print(f"   ✗ Failed to get SDF format. Status: {sdf_response.status_code}")
        
else:
    print("No aspirin query ID available to demonstrate output formats")

print()
print("Format recommendations:")
print("• JSON: Best for programmatic analysis and detailed classification data")
print("• CSV: Good for spreadsheet analysis and simple data processing")  
print("• SDF: Ideal for integration with chemical structure software")

## 6. Helper Functions for Automated Classification

Let's create some helper functions to streamline the classification process:

In [None]:
def classify_compound_complete(name, smiles, max_wait_time=300, check_interval=10):
    """
    Complete classification workflow with automatic waiting and result retrieval.
    
    Args:
        name (str): Descriptive name for the compound
        smiles (str): SMILES string of the compound
        max_wait_time (int): Maximum time to wait for completion (seconds)
        check_interval (int): How often to check status (seconds)
    
    Returns:
        dict: Classification results or error information
    """
    print(f"Starting complete classification for {name}...")
    
    # Submit query
    response = ClassyFireAPI.submit_query(f"{name} Classification", smiles)
    
    if response.status_code != 200:
        return {
            "success": False,
            "error": f"Failed to submit query: {response.status_code}",
            "name": name,
            "smiles": smiles
        }
    
    query_result = response.json()
    query_id = query_result['id']
    print(f"  Query submitted. ID: {query_id}")
    
    # Wait for completion
    elapsed_time = 0
    while elapsed_time < max_wait_time:
        print(f"  Checking status... (elapsed: {elapsed_time}s)")
        
        status_response = ClassyFireAPI.query_status(query_id)
        if status_response and status_response.status_code == 200:
            status_data = status_response.json()
            status = status_data.get('classification_status', 'Unknown')
            
            if status == 'Done':
                print(f"  ✓ Classification complete!")
                break
            elif status in ['In progress', 'Queued']:
                print(f"  ⏳ Status: {status}")
            else:
                print(f"  ⚠️  Unexpected status: {status}")
        
        time.sleep(check_interval)
        elapsed_time += check_interval
    
    if elapsed_time >= max_wait_time:
        return {
            "success": False,
            "error": "Timeout waiting for classification to complete",
            "name": name,
            "smiles": smiles,
            "query_id": query_id
        }
    
    # Retrieve results
    result_response = ClassyFireAPI.get_query(query_id, format="json")
    
    if result_response.status_code == 200:
        classification_results = result_response.json()
        return {
            "success": True,
            "name": name,
            "smiles": smiles,
            "query_id": query_id,
            "results": classification_results
        }
    else:
        return {
            "success": False,
            "error": f"Failed to retrieve results: {result_response.status_code}",
            "name": name,
            "smiles": smiles,
            "query_id": query_id
        }

def extract_classification_summary(classification_data):
    """
    Extract key classification information from ClassyFire results.
    
    Args:
        classification_data (dict): Full ClassyFire classification results
    
    Returns:
        dict: Simplified classification summary
    """
    if not classification_data.get("success"):
        return {"error": classification_data.get("error", "Unknown error")}
    
    results = classification_data["results"]
    entities = results.get("entities", [])
    
    if not entities:
        return {"error": "No classification entities found"}
    
    entity = entities[0]  # Use first entity
    
    return {
        "compound_name": classification_data["name"],
        "smiles": results.get("smiles"),
        "molecular_formula": results.get("molecular_formula"),
        "inchi_key": results.get("inchikey"),
        "kingdom": entity.get("kingdom", {}).get("name"),
        "superclass": entity.get("superclass", {}).get("name"),
        "class": entity.get("class", {}).get("name"),
        "subclass": entity.get("subclass", {}).get("name"),
        "direct_parent": entity.get("direct_parent", {}).get("name"),
        "molecular_framework": results.get("molecular_framework"),
        "num_intermediate_nodes": len(entity.get("intermediate_nodes", []))
    }

# Example usage of helper functions
print("Testing helper functions with a simple compound (ethanol):")
print("=" * 60)

# Note: This is a demonstration - actual execution may take several minutes
ethanol_smiles = "CCO"
print(f"Using SMILES: {ethanol_smiles}")
print()
print("In a real application, you would run:")
print("ethanol_classification = classify_compound_complete('Ethanol', ethanol_smiles)")
print("ethanol_summary = extract_classification_summary(ethanol_classification)")
print()
print("This would return a complete classification summary including:")
print("- Kingdom, Superclass, Class, Subclass hierarchy")
print("- Molecular framework information")
print("- Chemical identifiers (SMILES, InChI Key, etc.)")
print("- Intermediate classification nodes")

## 7. Error Handling and Best Practices

ClassyFire classification can encounter various issues. Let's demonstrate proper error handling:

In [None]:
# Test error handling with various scenarios
print("Testing error handling scenarios:")
print("=" * 40)

# Test 1: Invalid SMILES
print("1. Testing with invalid SMILES:")
invalid_smiles = "INVALID_SMILES_STRING"
try:
    response = ClassyFireAPI.submit_query("Invalid SMILES Test", invalid_smiles)
    print(f"   Status Code: {response.status_code}")
    if response.status_code != 200:
        print(f"   ✓ Correctly handled invalid SMILES")
        print(f"   Response: {response.text[:100]}...")
    else:
        print(f"   Unexpected success with invalid SMILES")
except Exception as e:
    print(f"   Exception caught: {e}")

print()

# Test 2: Very long SMILES (may cause issues)
print("2. Testing with very long SMILES:")
long_smiles = "C" * 1000  # Very long alkane chain
try:
    response = ClassyFireAPI.submit_query("Long SMILES Test", long_smiles)
    print(f"   Status Code: {response.status_code}")
    if response.status_code == 200:
        print(f"   ✓ Long SMILES accepted")
    else:
        print(f"   Long SMILES rejected: {response.text[:100]}...")
except Exception as e:
    print(f"   Exception caught: {e}")

print()

# Test 3: Query with invalid ID
print("3. Testing with invalid query ID:")
try:
    invalid_id = "invalid_query_id_12345"
    status_response = ClassyFireAPI.query_status(invalid_id)
    if status_response:
        print(f"   Status Code: {status_response.status_code}")
        if status_response.status_code == 404:
            print(f"   ✓ Correctly returned 404 for invalid ID")
        else:
            print(f"   Response: {status_response.text[:100]}...")
    else:
        print(f"   ✓ Correctly returned None for invalid ID")
except Exception as e:
    print(f"   Exception caught: {e}")

print()

# Best practices recommendations
print("=== BEST PRACTICES ===")
print()
print("1. Input Validation:")
print("   • Validate SMILES strings before submission")
print("   • Check molecular size (very large molecules may fail)")
print("   • Ensure proper chemical structure representation")
print()
print("2. Rate Limiting:")
print("   • Add delays between requests (1-2 seconds minimum)")
print("   • Don't submit too many queries simultaneously")
print("   • Monitor server response times")
print()
print("3. Error Handling:")
print("   • Always check response status codes")
print("   • Implement retry logic for temporary failures")
print("   • Handle timeout scenarios gracefully")
print()
print("4. Waiting for Results:")
print("   • Classification can take 30 seconds to several minutes")
print("   • Check status periodically rather than continuously")
print("   • Implement reasonable timeout limits")
print()
print("5. Data Processing:")
print("   • Parse JSON results carefully (structure may vary)")
print("   • Handle missing classification levels gracefully")
print("   • Store intermediate results for large batches")

print()
print("=== COMMON ISSUES ===")
print()
print("• Server overload: Classification may be slow during peak times")
print("• Invalid structures: Some SMILES may not be classifiable")
print("• Network issues: Implement retry logic for connection problems")
print("• Large molecules: Complex structures may timeout or fail")
print("• API changes: Monitor for service updates and changes")

## 8. Practical Applications

Here are some practical use cases for ClassyFire chemical classification:

In [None]:
# Use Case 1: Building a Chemical Class Database
def build_classification_database(compounds_dict, delay=2):
    """
    Build a database of chemical classifications for multiple compounds.
    
    Args:
        compounds_dict (dict): Dictionary of {name: smiles} pairs
        delay (int): Delay between submissions in seconds
    
    Returns:
        dict: Classification database
    """
    print(f"Building classification database for {len(compounds_dict)} compounds...")
    print("Note: This is a demonstration of the workflow")
    
    database = {}
    
    for name, smiles in compounds_dict.items():
        print(f"\\n{name}:")
        print(f"  SMILES: {smiles}")
        
        # In a real implementation, you would:
        # 1. Submit the query
        # 2. Wait for completion
        # 3. Extract classification data
        
        # Simulated database entry
        database[name] = {
            "smiles": smiles,
            "submitted": True,
            "classification_levels": {
                "kingdom": "Organic compounds",  # Example
                "superclass": "Unknown (would be determined)",
                "class": "Unknown (would be determined)",
                "subclass": "Unknown (would be determined)"
            },
            "molecular_framework": "Unknown (would be determined)",
            "status": "Would be determined after classification"
        }
        
        print(f"  Database entry created (simulated)")
        
        if delay > 0:
            time.sleep(delay)  # Respectful delay
    
    return database

# Example with pharmaceutical compounds
pharma_compounds = {
    "Aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "Paracetamol": "CC(=O)NC1=CC=C(C=C1)O",
    "Ibuprofen": "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",
    "Caffeine": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
}

print("Use Case 1: Building a Classification Database")
print("=" * 50)

pharma_db = build_classification_database(pharma_compounds)

print(f"\\nDatabase created with {len(pharma_db)} entries:")
for name, data in pharma_db.items():
    print(f"  {name}: {data['status']}")

print()
print("In a real implementation, this database would contain:")
print("• Complete hierarchical classifications")
print("• Molecular frameworks")
print("• Chemical fingerprints")
print("• Structural features")
print("• Cross-references to chemical databases")

In [None]:
# Use Case 2: Metabolomics Classification Analysis
def analyze_metabolite_classes(metabolite_classifications):
    """
    Analyze the distribution of chemical classes in a metabolomics dataset.
    
    Args:
        metabolite_classifications (list): List of classification results
    
    Returns:
        dict: Analysis summary
    """
    print("Use Case 2: Metabolomics Classification Analysis")
    print("=" * 50)
    
    # Simulated metabolomics classification data
    simulated_data = [
        {"name": "Glucose", "superclass": "Organic oxygen compounds", "class": "Organooxygen compounds"},
        {"name": "Alanine", "superclass": "Organic acids and derivatives", "class": "Carboxylic acids and derivatives"},
        {"name": "Cholesterol", "superclass": "Lipids and lipid-like molecules", "class": "Steroids and steroid derivatives"},
        {"name": "Caffeine", "superclass": "Organoheterocyclic compounds", "class": "Purinones"},
        {"name": "Glucose-6-phosphate", "superclass": "Organic oxygen compounds", "class": "Organooxygen compounds"},
        {"name": "Tryptophan", "superclass": "Organic acids and derivatives", "class": "Carboxylic acids and derivatives"}
    ]
    
    print("Analyzing chemical class distribution in metabolomics data:")
    print()
    
    # Count superclasses
    superclass_counts = {}
    class_counts = {}
    
    for metabolite in simulated_data:
        superclass = metabolite["superclass"]
        class_name = metabolite["class"]
        
        superclass_counts[superclass] = superclass_counts.get(superclass, 0) + 1
        class_counts[class_name] = class_counts.get(class_name, 0) + 1
    
    print("Superclass Distribution:")
    for superclass, count in sorted(superclass_counts.items(), key=lambda x: x[1], reverse=True):
        percentage = (count / len(simulated_data)) * 100
        print(f"  {superclass}: {count} ({percentage:.1f}%)")
    
    print()
    print("Class Distribution:")
    for class_name, count in sorted(class_counts.items(), key=lambda x: x[1], reverse=True):
        percentage = (count / len(simulated_data)) * 100
        print(f"  {class_name}: {count} ({percentage:.1f}%)")
    
    print()
    print("Analysis Summary:")
    print(f"  Total metabolites analyzed: {len(simulated_data)}")
    print(f"  Unique superclasses: {len(superclass_counts)}")
    print(f"  Unique classes: {len(class_counts)}")
    print(f"  Most common superclass: {max(superclass_counts, key=superclass_counts.get)}")
    print(f"  Chemical diversity index: {len(class_counts) / len(simulated_data):.2f}")
    
    return {
        "total_metabolites": len(simulated_data),
        "superclass_distribution": superclass_counts,
        "class_distribution": class_counts,
        "diversity_metrics": {
            "unique_superclasses": len(superclass_counts),
            "unique_classes": len(class_counts),
            "diversity_index": len(class_counts) / len(simulated_data)
        }
    }

# Run metabolomics analysis
metabolomics_analysis = analyze_metabolite_classes([])

print()
print("Applications in Metabolomics:")
print("• Pathway enrichment analysis")
print("• Chemical space visualization")
print("• Biomarker classification")
print("• Metabolite identification support")
print("• Chemical similarity assessment")

## Summary

The `ClassyFireAPI` class provides comprehensive access to the ClassyFire chemical classification service:

### Main ClassyFireAPI Methods:
1. **`submit_query(label, input, type='STRUCTURE')`**: Submit a classification query
2. **`query_status(query_id)`**: Check the status of a submitted query
3. **`get_query(query_id, format="json")`**: Retrieve classification results in various formats

### Supported Input Types:
- **SMILES notation**: Most common format for chemical structures
- **InChI strings**: International chemical identifiers
- **Chemical names**: Common or IUPAC names (may have variable success)
- **Structure files**: SDF and other chemical file formats

### Output Formats:
- **JSON**: Complete hierarchical classification data with all details
- **CSV**: Tabular format suitable for spreadsheet analysis
- **SDF**: Structure-Data File format for chemical software integration

### Classification Hierarchy:
ClassyFire organizes compounds into a hierarchical taxonomy:
1. **Kingdom**: Broadest level (e.g., "Organic compounds")
2. **Superclass**: Major chemical groups (e.g., "Organoheterocyclic compounds")
3. **Class**: More specific classifications (e.g., "Purinones")
4. **Subclass**: Detailed sub-categories
5. **Intermediate Nodes**: Additional classification levels
6. **Direct Parent**: Most specific classification level
7. **Molecular Framework**: Core structural motif

### Key Features:
- ✅ **Hierarchical Classification**: Multi-level taxonomic organization
- ✅ **Structural Analysis**: Based on molecular structure features
- ✅ **Multiple Formats**: JSON, CSV, SDF output options
- ✅ **Free Service**: No API key required
- ✅ **Comprehensive Database**: Over 4,800 chemical categories
- ✅ **Research-Quality**: Peer-reviewed classification system
- ✅ **Batch Processing**: Can handle multiple compounds

### Workflow Pattern:
1. **Submit**: Send SMILES/InChI to ClassyFire
2. **Wait**: Classification takes 30 seconds to several minutes
3. **Check**: Monitor query status until complete
4. **Retrieve**: Get results in desired format
5. **Parse**: Extract relevant classification information

### Best Use Cases:
- **Metabolomics Research**: Classify detected metabolites
- **Drug Discovery**: Organize compound libraries by chemical class
- **Natural Products**: Systematic classification of natural compounds
- **Chemical Database Curation**: Standardize chemical classifications
- **Pathway Analysis**: Group compounds by chemical function
- **Chemical Space Exploration**: Understand molecular diversity

### Limitations:
- **Processing Time**: Can take several minutes per compound
- **Server Dependent**: Relies on external web service availability
- **Complex Molecules**: Very large structures may fail or timeout
- **Rate Limits**: Should respect server capacity with delays between requests
- **Network Dependency**: Requires stable internet connection

### Service Information:
- **Provider**: Wishart Research Group, University of Alberta
- **URL**: http://classyfire.wishartlab.com
- **Method**: RESTful API with JSON responses
- **Citation**: Required for academic use (check ClassyFire website)
- **Updates**: Classification database is periodically updated

ClassyFire is an essential tool for chemical classification and is particularly valuable for researchers working with large chemical datasets who need systematic, hierarchical organization of molecular structures. The service provides research-quality classifications that are widely accepted in the chemical and biochemical research communities.