In [None]:
# Orpha Disease Preprocessing System - Installation Guide

Welcome to the Orpha Disease Preprocessing System! This notebook will guide you through the installation and setup process.

## Learning Objectives

By the end of this notebook, you will be able to:
- ✅ Install all required dependencies
- ✅ Verify your Python environment
- ✅ Process Orphanet XML data into the optimized format
- ✅ Validate your installation with basic queries
- ✅ Understand the directory structure

## Prerequisites

- Python 3.8 or higher
- Internet connection for downloading packages
- Basic familiarity with command-line tools
- Orphanet XML data file (Step 3 shows where to place it)

## Time Estimate
**15-20 minutes**

---

Let's get started! 🚀


In [None]:
## Step 1: Environment Check

First, let's verify that your Python environment meets the requirements.


In [None]:
import sys
import os
from pathlib import Path

print("🐍 Python Environment Check")
print("=" * 30)

# Check Python version
python_version = sys.version_info
print(f"Python version: {python_version.major}.{python_version.minor}.{python_version.micro}")

if python_version >= (3, 8):
    print("✅ Python version is compatible")
else:
    print("❌ Python 3.8+ is required")

# Check current working directory
current_dir = Path.cwd()
print(f"\nCurrent directory: {current_dir}")

# Check if we're in the right project structure
expected_dirs = ["utils", "data", "cookbooks", "tools"]
missing_dirs = []

for dir_name in expected_dirs:
    if not (current_dir / dir_name).exists():
        missing_dirs.append(dir_name)

if not missing_dirs:
    print("✅ Project structure looks correct")
else:
    print(f"⚠️  Missing directories: {missing_dirs}")
    print("💡 Make sure you're running this from the project root directory")

print(f"\nPython executable: {sys.executable}")
print(f"Platform: {sys.platform}")
print(f"System: {os.name}")

print("\n✅ Environment check complete!")


In [None]:
## Step 2: Install Required Packages

Let's install the necessary Python packages for the Orpha system.
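
The check in this step works by attempting to import each package. A lighter variant, sketched here, asks the import system whether a module can be found without actually importing it. The helper name `is_importable` is hypothetical, and the sketch assumes each package's import name matches its pip name, which holds for the packages listed in this guide.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the import system can locate module_name."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised for dotted names with a missing parent or malformed names
        return False


# Package names taken from this guide
required = ["pydantic", "lxml", "jupyter", "matplotlib", "pandas", "numpy"]
missing = [name for name in required if not is_importable(name)]
print("Missing packages:", missing if missing else "none")
```

Because nothing is imported, this runs quickly and avoids side effects from package initialization, but it cannot detect a package that is present yet broken.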


In [None]:
import subprocess
import sys

def install_package(package_name):
    """Install a package using pip"""
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
        return True
    except subprocess.CalledProcessError:
        return False

def check_package(package_name):
    """Check if a package is installed"""
    try:
        __import__(package_name)
        return True
    except ImportError:
        return False

# Required packages
required_packages = [
    "pydantic",
    "lxml", 
    "jupyter",
    "matplotlib",
    "pandas",
    "numpy"
]

print("📦 Package Installation Check")
print("=" * 35)

failed_packages = []  # Packages that could not be installed

for package in required_packages:
    print(f"\nChecking {package}...")

    if check_package(package):
        print(f"✅ {package} is already installed")
    else:
        print(f"⚠️  {package} not found, installing...")
        if install_package(package):
            print(f"✅ Successfully installed {package}")
        else:
            print(f"❌ Failed to install {package}")
            failed_packages.append(package)

# Report success only when every package is actually available
if failed_packages:
    print(f"\n❌ Could not install: {', '.join(failed_packages)}")
    print("💡 Install them manually using:")
    print(f"   pip install {' '.join(failed_packages)}")
else:
    print("\n✅ All required packages are available!")


In [None]:
## Step 3: Data Setup

The Orpha system requires processed XML data. Let's check if you have the required data and help you set it up.
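
The directory layout this step expects can be sketched as follows. The directory names (`data/input`, `data/processed`, and the `taxonomy`, `instances`, and `cache` subdirectories) come from this guide; the helper names `prepare_input_dir` and `missing_processed_subdirs` are hypothetical and are not part of the Orpha system.

```python
from pathlib import Path
from typing import List

# Subdirectories of data/processed/ that this guide checks for
PROCESSED_SUBDIRS = ("taxonomy", "instances", "cache")


def prepare_input_dir(project_root: Path) -> Path:
    """Create data/input/ under project_root if needed and return its path."""
    input_dir = project_root / "data" / "input"
    input_dir.mkdir(parents=True, exist_ok=True)
    return input_dir


def missing_processed_subdirs(project_root: Path) -> List[str]:
    """List the expected data/processed/ subdirectories that do not exist yet."""
    processed_dir = project_root / "data" / "processed"
    return [name for name in PROCESSED_SUBDIRS
            if not (processed_dir / name).is_dir()]
```

Calling `prepare_input_dir(Path.cwd())` from the project root gives you a place to drop the Orphanet XML file, and `missing_processed_subdirs(Path.cwd())` returns an empty list once preprocessing has produced all three subdirectories.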


In [None]:
from pathlib import Path

print("📁 Data Setup Check")
print("=" * 25)

# Check data directory structure
data_dir = Path("data")
processed_dir = data_dir / "processed"
input_dir = data_dir / "input"

print(f"Data directory: {data_dir.absolute()}")

# Check if processed data exists
if processed_dir.exists():
    required_subdirs = ["taxonomy", "instances", "cache"]
    missing_subdirs = []
    
    for subdir in required_subdirs:
        if not (processed_dir / subdir).exists():
            missing_subdirs.append(subdir)
    
    if not missing_subdirs:
        print("✅ Processed data found and appears complete")
        
        # Show data file info
        taxonomy_files = list((processed_dir / "taxonomy").glob("*.json"))
        instance_files = list((processed_dir / "instances").glob("*.json"))
        
        print(f"  📂 Taxonomy files: {len(taxonomy_files)}")
        print(f"  🏥 Instance files: {len(instance_files)}")
        
        if taxonomy_files:
            print(f"  📊 Example taxonomy file: {taxonomy_files[0].name}")
        
        data_ready = True
    else:
        print(f"⚠️  Processed data incomplete, missing: {missing_subdirs}")
        data_ready = False
else:
    print("⚠️  No processed data found")
    data_ready = False

if not data_ready:
    print("\n📋 To set up data:")
    print("1. Place your Orphanet XML file in data/input/")
    print("2. Run: python tools/disease_preprocessing.py data/input/your_file.xml data/processed/")
    print("3. This will create the optimized JSON files")
    
    # Check if input directory exists and has XML files
    if input_dir.exists():
        xml_files = list(input_dir.glob("*.xml"))
        if xml_files:
            print(f"\n💡 Found {len(xml_files)} XML file(s) in data/input/:")
            for xml_file in xml_files:
                print(f"  - {xml_file.name}")
            print("You can process these using the preprocessing script!")
        else:
            print("\n💡 No XML files found in data/input/")
    else:
        print("\n💡 data/input/ directory doesn't exist yet")
        print("   Create it and place your Orphanet XML file there")

print(f"\n📊 Data setup status: {'✅ Ready' if data_ready else '⚠️  Setup needed'}")


In [None]:
## Step 4: Installation Validation

Let's test that everything is working correctly by trying to import and initialize the Orpha system.


In [None]:
print("🧪 Installation Validation")
print("=" * 30)

# Test 1: Import the Orpha modules
try:
    from utils.orpha import OrphaTaxonomy, TaxonomyGraph, DiseaseInstances
    print("✅ Successfully imported Orpha modules")
    import_success = True
except ImportError as e:
    print(f"❌ Import failed: {e}")
    import_success = False

if import_success:
    # Test 2: Initialize the system (only if data exists)
    try:
        if data_ready:  # Use variable from previous cell
            print("\n🚀 Testing system initialization...")
            taxonomy = OrphaTaxonomy()
            print("✅ System initialized successfully!")
            
            # Test 3: Basic functionality
            print("\n📊 Testing basic functionality...")
            stats = taxonomy.get_statistics()
            print(f"  - Total categories: {stats['combined']['total_categories']}")
            print(f"  - Total diseases: {stats['combined']['total_diseases']}")
            print(f"  - System version: {stats.get('version', 'N/A')}")
            
            # Test 4: Simple query
            print("\n🔍 Testing simple query...")
            all_categories = taxonomy.taxonomy.get_all_categories()
            if all_categories:
                first_category = all_categories[0]
                print(f"  - First category: {first_category.name}")
                print(f"  - Category ID: {first_category.id}")
                print("✅ Basic query successful!")
            else:
                print("⚠️  No categories found")
            
            print("\n🎉 All tests passed! Installation is successful!")
            
        else:
            print("\n⚠️  Skipping system tests - processed data not available")
            print("    Complete the data setup first, then re-run this cell")
            
    except Exception as e:
        print(f"❌ System initialization failed: {e}")
        print("💡 Check that your processed data is complete and valid")

else:
    print("\n❌ Cannot proceed with validation due to import errors")
    print("💡 Check that you're running from the project root and that all required packages are installed")

print("\n" + "=" * 50)
print("🎯 Installation Summary:")
print(f"  Python Environment: {'✅' if python_version >= (3, 8) else '❌'}")
print(f"  Orpha Module Imports: {'✅' if import_success else '❌'}")
print(f"  Data Setup: {'✅' if data_ready else '⚠️'}")
print(f"  System Ready: {'✅' if import_success and data_ready else '⚠️'}")
print("=" * 50)


In [None]:
## 🎉 Installation Complete!

If every item in the installation summary above shows ✅, you have successfully set up the Orpha Disease Preprocessing System.

### What You've Accomplished

- ✅ Verified your Python environment
- ✅ Installed all required packages
- ✅ Set up the data directory structure
- ✅ Validated the system installation

### Next Steps

Now you're ready to explore the Orpha system! Here's what to do next:

1. **Learn the Basics**: Continue with [02_basic_concepts.ipynb](02_basic_concepts.ipynb)
2. **Try Your First Queries**: Move on to [03_first_queries.ipynb](03_first_queries.ipynb)
3. **Explore Examples**: Check out the `utils/orpha/examples/basic_usage.py` file
4. **Read Documentation**: Review the `utils/orpha/docs/README.md` for detailed information

### If You Encountered Issues

- **Data Setup Problems**: Make sure you have an Orphanet XML file in `data/input/` and run the preprocessing script
- **Package Installation Issues**: Try installing packages manually: `pip install pydantic lxml jupyter matplotlib pandas numpy`
- **Import Errors**: Ensure you're running from the project root directory
- **Other Issues**: Check the [Troubleshooting Guide](../../../utils/orpha/docs/TROUBLESHOOTING.md)
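
For import errors caused by the working directory, one possible workaround is to locate the project root and add it to `sys.path` before importing. The sketch below assumes the project root is the nearest ancestor directory containing the `utils`, `data`, `cookbooks`, and `tools` folders checked in Step 1; `find_project_root` is a hypothetical helper, not part of the Orpha system.

```python
import sys
from pathlib import Path
from typing import Optional, Sequence

# Directory names checked in Step 1 of this guide
MARKER_DIRS = ("utils", "data", "cookbooks", "tools")


def find_project_root(start: Path,
                      markers: Sequence[str] = MARKER_DIRS) -> Optional[Path]:
    """Walk upward from start and return the first directory containing all markers."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if all((candidate / name).is_dir() for name in markers):
            return candidate
    return None


root = find_project_root(Path.cwd())
if root is None:
    print("Project root not found; start Jupyter from the project root instead")
elif str(root) not in sys.path:
    # Make `from utils.orpha import ...` resolvable from any subdirectory
    sys.path.insert(0, str(root))
```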

### Quick Test

If everything is working, you should be able to run this simple test:

```python
from utils.orpha import OrphaTaxonomy
taxonomy = OrphaTaxonomy()
stats = taxonomy.get_statistics()
print(f"Total diseases: {stats['combined']['total_diseases']}")
```

### Resources

- **📚 Documentation**: `utils/orpha/docs/`
- **🔗 API Reference**: `utils/orpha/docs/API_REFERENCE.md`
- **💡 Examples**: `utils/orpha/examples/`
- **📝 Cookbooks**: `cookbooks/orpha/`

---

**Happy exploring!** 🚀

The next notebook, `02_basic_concepts.ipynb`, introduces the core concepts and architecture of the Orpha Disease Preprocessing System.
