<a href="https://colab.research.google.com/github/GladiatorGeneral/aegis-health-chain/blob/main/notebooks/00_colab_setup.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Step 1: Check GPU & Environment

In [1]:
import torch
import sys

# Check GPU
print(f"PyTorch version: {torch.__version__}")
print(f"GPU Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU Device: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("⚠️ GPU not available. Enable it in Runtime → Change runtime type → GPU")

print(f"\nPython version: {sys.version}")

PyTorch version: 2.9.0+cu126
GPU Available: True
GPU Device: Tesla T4
GPU Memory: 15.83 GB

Python version: 3.12.12 (main, Oct 10 2025, 08:52:57) [GCC 11.4.0]
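
Later cells can reuse this check to choose a device instead of assuming a GPU is present. A minimal sketch, assuming a helper named `pick_device` (hypothetical, not part of the repository) that falls back to CPU when CUDA, or PyTorch itself, is unavailable:

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The returned string can be passed to calls such as `model.to(device)`.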


## Step 2: Clone Repository & Install Dependencies

In [2]:
import socket
import urllib.request

print("🔍 Network Connectivity Check\n")
checks_ok = True

# Test 1: DNS Resolution
print("1️⃣ Testing DNS resolution for github.com...")
try:
    ip = socket.gethostbyname('github.com')
    print(f"✅ github.com resolves to: {ip}")
except socket.gaierror as e:
    checks_ok = False
    print(f"❌ DNS resolution failed: {e}")

# Test 2: HTTP connectivity
print("\n2️⃣ Testing HTTP connection to GitHub...")
try:
    with urllib.request.urlopen('https://api.github.com', timeout=5) as response:
        print(f"✅ GitHub API reachable (Status: {response.status})")
except Exception as e:
    checks_ok = False
    print(f"❌ HTTP connection failed: {e}")

# Only report success when both checks passed
if checks_ok:
    print("\n✅ Network checks complete. Proceeding to clone...")
else:
    print("\n⚠️ Network checks failed. The clone step may not succeed.")

🔍 Network Connectivity Check

1️⃣ Testing DNS resolution for github.com...
✅ github.com resolves to: 140.82.114.3

2️⃣ Testing HTTP connection to GitHub...
✅ GitHub API reachable (Status: 200)

✅ Network checks complete. Proceeding to clone...
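
The DNS and HTTP checks above can be complemented by a plain TCP probe, which tells you whether a port is reachable without depending on the GitHub API. A minimal sketch, assuming a hypothetical helper `can_connect` (not part of the repository):

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host name and connects in one call
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False
```

For example, `can_connect("github.com", 443)` probes the HTTPS port that `git clone` uses.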


In [3]:
import os
import shutil
import subprocess
from pathlib import Path
import time

# Clone repository
repo_path = '/content/aegis-health-chain'

if not Path(repo_path).exists():
    print("📦 Attempting to clone repository...\n")

    # Method 1: Standard HTTPS clone with retry logic
    max_retries = 3
    for attempt in range(1, max_retries + 1):
        print(f"Attempt {attempt}/{max_retries}:")

        try:
            result = subprocess.run(
                ['git', 'clone', '--depth', '1', 'https://github.com/GladiatorGeneral/aegis-health-chain.git', repo_path],
                capture_output=True,
                text=True,
                timeout=60
            )
            returncode, stderr = result.returncode, result.stderr
        except subprocess.TimeoutExpired:
            # Count a timeout as a failed attempt and remove any partial clone
            returncode, stderr = 1, "git clone timed out after 60s"
            shutil.rmtree(repo_path, ignore_errors=True)

        if returncode == 0:
            print("✅ Repository cloned successfully\n")
            break
        else:
            error_msg = stderr[:200]
            print(f"  ❌ Failed: {error_msg}")

            if attempt < max_retries:
                wait_time = 5 * attempt
                print(f"  ⏳ Waiting {wait_time}s before retry...\n")
                time.sleep(wait_time)
            else:
                print("\n❌ All clone attempts failed")
                print("\n🔧 Troubleshooting options:")
                print("1. Check if GitHub is accessible: https://www.githubstatus.com")
                print("2. Try again in a few moments (may be a temporary network issue)")
                print("3. Manually download from: https://github.com/GladiatorGeneral/aegis-health-chain")
                raise RuntimeError(f"Could not clone repository after {max_retries} attempts. Error: {stderr}")
else:
    print("✅ Repository already exists")

# Change to repo directory
try:
    os.chdir(repo_path)
    print(f"✅ Changed to: {os.getcwd()}")
except Exception as e:
    print(f"❌ Failed to change directory: {e}")
    raise

# Pull latest changes
print("\n📥 Pulling latest changes...")
result = subprocess.run(['git', 'pull'], capture_output=True, text=True, timeout=60)
if result.returncode == 0:
    print("✅ Latest changes pulled")
else:
    # git reports errors on stderr, so show that instead of stdout
    print(f"⚠️ Pull note: {result.stderr[:100]}")

print(f"\n📁 Working directory: {os.getcwd()}")
print("\n📂 Project structure:")
subprocess.run(['ls', '-la'], check=False)

📦 Attempting to clone repository...

Attempt 1/3:
✅ Repository cloned successfully

✅ Changed to: /content/aegis-health-chain

📥 Pulling latest changes...
✅ Latest changes pulled

📁 Working directory: /content/aegis-health-chain

📂 Project structure:


CompletedProcess(args=['ls', '-la'], returncode=0)
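
The clone cell retries with a linearly growing wait (5s, then 10s). The same pattern can be factored into a reusable function. A minimal sketch, assuming a hypothetical helper `retry` (not part of the repository); the `sleep` parameter exists only so the wait can be replaced in tests:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 3, base_delay: float = 5.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call action until it succeeds, waiting base_delay * attempt between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:
            if attempt == attempts:
                raise RuntimeError(f"failed after {attempts} attempts") from exc
            sleep(base_delay * attempt)  # 5s, 10s, ... with the defaults
    raise ValueError("attempts must be at least 1")
```

To use it for the clone, wrap the `subprocess.run(...)` call in a function that passes `check=True`, so a non-zero exit code raises and triggers the next attempt.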

In [4]:
import subprocess
import os
from pathlib import Path

# If git clone failed, try downloading as ZIP
repo_path = '/content/aegis-health-chain'

if not Path(repo_path).exists():
    print("📥 Attempting to download repository as ZIP...\n")

    try:
        # Download ZIP file
        zip_url = "https://github.com/GladiatorGeneral/aegis-health-chain/archive/refs/heads/main.zip"
        zip_file = "/tmp/aegis-health-chain.zip"

        result = subprocess.run(
            ['wget', zip_url, '-O', zip_file],
            capture_output=True,
            text=True,
            timeout=60
        )

        if result.returncode != 0:
            # Try curl as fallback
            result = subprocess.run(
                ['curl', '-L', zip_url, '-o', zip_file],
                capture_output=True,
                text=True,
                timeout=60
            )

        if result.returncode == 0 and Path(zip_file).exists():
            print("✅ ZIP downloaded successfully")

            # Extract ZIP
            print("📦 Extracting ZIP...")
            result = subprocess.run(
                ['unzip', '-q', zip_file, '-d', '/content/'],
                capture_output=True,
                text=True,
                timeout=60
            )

            if result.returncode == 0:
                # Move extracted folder to correct location; check=True raises if the move fails
                subprocess.run(['mv', '/content/aegis-health-chain-main', repo_path],
                             capture_output=True, check=True)
                print("✅ Repository extracted successfully")
            else:
                print(f"❌ Extract failed: {result.stderr}")
                raise RuntimeError("Could not extract repository ZIP")
        else:
            print("❌ ZIP download failed")
            print("\n⚠️ Manual setup needed:")
            print("1. Download: https://github.com/GladiatorGeneral/aegis-health-chain/archive/refs/heads/main.zip")
            print("2. Upload to Colab via Files panel (left sidebar)")
            print("3. Extract in Colab: !unzip aegis-health-chain-main.zip")
            raise RuntimeError("Could not download repository")

    except Exception as e:
        print(f"❌ Download failed: {e}")
        raise

print(f"\n✅ Repository ready at: {repo_path}")
os.chdir(repo_path)
print(f"Working directory: {os.getcwd()}")


✅ Repository ready at: /content/aegis-health-chain
Working directory: /content/aegis-health-chain
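
The ZIP fallback above depends on the `unzip` and `mv` binaries. The extraction step can also be done with the Python standard library. A minimal sketch, assuming a hypothetical helper `extract_repo_zip` (not part of the repository) and an archive that contains exactly one top-level folder, as GitHub branch archives do:

```python
import shutil
import zipfile
from pathlib import Path


def extract_repo_zip(zip_file: str, dest: str) -> Path:
    """Extract an archive and move its single top-level folder to dest."""
    dest_path = Path(dest)
    if dest_path.exists():
        raise FileExistsError(f"{dest_path} already exists")
    # Extract into a temporary sibling folder first, then move the contents
    staging = dest_path.parent / (dest_path.name + "-extract")
    with zipfile.ZipFile(zip_file) as archive:
        archive.extractall(staging)
    top_level = [p for p in staging.iterdir() if p.is_dir()]
    if len(top_level) != 1:
        raise RuntimeError(f"expected one top-level folder, found {len(top_level)}")
    shutil.move(str(top_level[0]), str(dest_path))
    shutil.rmtree(staging, ignore_errors=True)
    return dest_path
```

With the paths used in this notebook, the call would be `extract_repo_zip("/tmp/aegis-health-chain.zip", "/content/aegis-health-chain")`.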


In [5]:
# Install dependencies
import subprocess
import sys

print("📥 Installing dependencies...")
print("This may take 3-5 minutes on first run...\n")

# Run pip through the current interpreter so packages land in the kernel's environment
result = subprocess.run(
    [sys.executable, '-m', 'pip', 'install', '-q', '-e', '.'],
    cwd=repo_path,
    capture_output=True,
    text=True,
    timeout=300
)

if result.returncode == 0:
    print("✅ Dependencies installed successfully")
else:
    # A non-zero exit code means pip failed; show the tail of stderr for diagnosis
    print(f"⚠️ Installation note: {result.stderr[-200:] if result.stderr else 'no error output'}")

📥 Installing dependencies...
This may take 3-5 minutes on first run...

⚠️ Installation note: rror

× Getting requirements to build editable did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
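
The note in the output above begins with "rror" because the cell keeps only the last 200 characters of stderr, which can cut a line mid-word. Trimming by lines avoids that. A minimal sketch, assuming a hypothetical helper `tail_lines` (not part of the repository):

```python
def tail_lines(text: str, max_lines: int = 20) -> str:
    """Return the last max_lines lines of text without cutting a line in half."""
    if max_lines <= 0:
        return ""
    lines = text.splitlines()
    return "\n".join(lines[-max_lines:])
```

In the install cell, `tail_lines(result.stderr)` could replace the `result.stderr[-200:]` slice.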



## Step 3: Verify Installation

In [6]:
# Add the repository root to the path so the `src` package can be imported
import sys
sys.path.insert(0, '/content/aegis-health-chain')

# Test imports
print("🔍 Testing imports...\n")
failed = []

try:
    from src.udm_mapper import udm_mapper
    print("✅ UDM Mapper imported successfully")
except Exception as e:
    failed.append("UDM Mapper")
    print(f"❌ UDM Mapper error: {e}")

try:
    from src.data_pipeline import data_pipeline
    print("✅ Data Pipeline imported successfully")
except Exception as e:
    failed.append("Data Pipeline")
    print(f"❌ Data Pipeline error: {e}")

try:
    from src.huggingface_models import clinical_models
    print("✅ HuggingFace Models imported successfully")
    print(f"   Available models: {clinical_models.list_available_models()}")
except Exception as e:
    failed.append("HuggingFace Models")
    print(f"❌ HuggingFace Models error: {e}")

# Report success only when every import worked
if failed:
    print(f"\n❌ Failed to import: {', '.join(failed)}")
else:
    print("\n✅ All core modules imported successfully!")

🔍 Testing imports...

✅ UDM Mapper imported successfully
✅ Data Pipeline imported successfully
✅ HuggingFace Models imported successfully
   Available models: ['clinical_bert', 'sentence_transformer', 'clinical_pubmed_bert']

✅ All core modules imported successfully!
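
The three `try` blocks above repeat the same pattern, which `importlib` can express as a loop. A minimal sketch, assuming a hypothetical helper `check_imports` (not part of the repository):

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return a dict mapping failed names to error text."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # import-time errors are not limited to ImportError
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

For this project the call would be `check_imports(["src.udm_mapper", "src.data_pipeline", "src.huggingface_models"])`; an empty dict means every module imported.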


## Step 4: Run Tests

In [7]:
import subprocess

print("🧪 Running test suite...\n")
result = subprocess.run(
    ['pytest', 'tests/', '-v', '--tb=short'],
    cwd='/content/aegis-health-chain',
    capture_output=True,
    text=True
)

print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)

if result.returncode == 0:
    print("\n✅ All tests passed!")
else:
    print(f"\n⚠️ Some tests failed (exit code: {result.returncode})")

🧪 Running test suite...

platform linux -- Python 3.12.12, pytest-8.4.2, pluggy-1.6.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/aegis-health-chain
configfile: pyproject.toml
plugins: langsmith-0.4.43, typeguard-4.4.4, anyio-4.11.0
collecting ... collected 6 items

tests/test_models.py::TestClinicalModels::test_model_loading PASSED      [ 16%]
tests/test_models.py::TestClinicalModels::test_embedding_generation PASSED [ 33%]
tests/test_models.py::TestClinicalModels::test_available_models PASSED   [ 50%]
tests/test_udm.py::TestUDMMapper::test_epic_mapping PASSED               [ 66%]
tests/test_udm.py::TestUDMMapper::test_gender_mapping PASSED             [ 83%]
tests/test_udm.py::TestUDMMapper::test_date_standardization PASSED       [100%]



✅ All tests passed!
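
Because the cell captures pytest's output as text, the per-test lines can be tallied programmatically, for example to log a one-line summary. A minimal sketch, assuming a hypothetical helper `summarize_pytest` (not part of the repository) and the verbose line format shown above:

```python
import re
from collections import Counter

# Matches verbose result lines such as "tests/test_udm.py::TestX::test_y PASSED [ 66%]"
RESULT_LINE = re.compile(r"^(\S+::\S+)\s+(PASSED|FAILED|ERROR|SKIPPED|XFAIL|XPASS)\b")


def summarize_pytest(stdout: str) -> Counter:
    """Count test outcomes in the output of `pytest -v`."""
    counts = Counter()
    for line in stdout.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            counts[match.group(2)] += 1
    return counts
```

Calling `summarize_pytest(result.stdout)` after the test cell returns a `Counter` keyed by outcome. Test IDs that contain spaces (some parametrized tests) are not matched by this pattern.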


## Step 5: Quick Demo - Test UDM Mapper

In [8]:
import json
from src.udm_mapper import udm_mapper

print("🧬 Testing UDM Mapper with sample data...\n")

# Test Epic EHR data
epic_data = {
    "PAT_MRN": "COLAB001",
    "BIRTH_DATE": "1985-03-15",
    "SEX": "F",
    "RACE": "Asian",
    "ETHNICITY": "Not Hispanic"
}

print("Input (Epic EHR):")
print(json.dumps(epic_data, indent=2))

result = udm_mapper.map_ehr_to_udm(epic_data, "epic")

print("\nOutput (UDM Format):")
print(json.dumps(result, indent=2))
print("\n✅ UDM mapping successful!")

🧬 Testing UDM Mapper with sample data...

Input (Epic EHR):
{
  "PAT_MRN": "COLAB001",
  "BIRTH_DATE": "1985-03-15",
  "SEX": "F",
  "RACE": "Asian",
  "ETHNICITY": "Not Hispanic"
}

Output (UDM Format):
{
  "patient": {
    "resourceType": "Patient",
    "id": "COLAB001",
    "birthDate": "1985-03-15",
    "gender": "female",
    "race": {
      "text": "Asian"
    },
    "ethnicity": {
      "text": "Not Hispanic"
    }
  }
}

✅ UDM mapping successful!
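
The input and output above imply a field-by-field mapping from flat Epic columns to a nested patient resource. The sketch below reproduces that one example for illustration only. It is not the project's `udm_mapper` implementation, and the function name `map_epic_patient` and the `GENDER_MAP` table are assumptions:

```python
# Hypothetical lookup table; the real mapper may support more codes
GENDER_MAP = {"F": "female", "M": "male"}


def map_epic_patient(record: dict) -> dict:
    """Map a flat Epic-style record to the nested patient shape shown above."""
    return {
        "patient": {
            "resourceType": "Patient",
            "id": record["PAT_MRN"],
            "birthDate": record["BIRTH_DATE"],
            # Unrecognized or missing codes fall back to "unknown"
            "gender": GENDER_MAP.get(record.get("SEX"), "unknown"),
            "race": {"text": record.get("RACE")},
            "ethnicity": {"text": record.get("ETHNICITY")},
        }
    }


epic_data = {
    "PAT_MRN": "COLAB001",
    "BIRTH_DATE": "1985-03-15",
    "SEX": "F",
    "RACE": "Asian",
    "ETHNICITY": "Not Hispanic",
}
mapped = map_epic_patient(epic_data)
```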


## Step 6: Generate Synthetic Data

In [9]:
from src.data_pipeline import data_pipeline
import pandas as pd

print("🧪 Generating synthetic patient data...\n")

# Generate 50 synthetic patients
synthetic_data = data_pipeline.generate_synthetic_data(50)

# Extract and display statistics
# Note: 'unknown' is only the .get() fallback for records whose 'patient' dict has no 'gender' key
genders = [p['patient'].get('gender', 'unknown') for p in synthetic_data if 'patient' in p]
gender_counts = pd.Series(genders).value_counts()

print(f"Generated {len(synthetic_data)} synthetic patient records\n")
print("Gender Distribution:")
print(gender_counts)

print("\n✅ Synthetic data generation successful!")

🧪 Generating synthetic patient data...

Generated 50 synthetic patient records

Gender Distribution:
unknown    50
Name: count, dtype: int64

✅ Synthetic data generation successful!
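
All 50 records report `unknown`, which is the fallback value in the cell's `.get('gender', 'unknown')` call. That suggests the generated records carry a `patient` section without a `gender` key, or store it elsewhere. A minimal sketch for checking where a field is present, assuming a hypothetical helper `field_coverage` (not part of the repository):

```python
from collections import Counter


def field_coverage(records, section, field):
    """Count records that have, lack, or hold an empty value for section[field]."""
    counts = Counter()
    for record in records:
        values = record.get(section)
        if not isinstance(values, dict):
            counts["missing_section"] += 1
        elif field not in values:
            counts["missing_field"] += 1
        elif values[field] in (None, ""):
            counts["empty"] += 1
        else:
            counts["present"] += 1
    return counts
```

Running `field_coverage(synthetic_data, "patient", "gender")` shows which case applies; printing `synthetic_data[0]` is a quick way to see the actual record layout.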


## Step 7: (Optional) Configure GitHub Authentication for Push

If you want to push changes back to GitHub, uncomment and run the cell below, following these instructions:

```
# This creates a personal access token setup
# Go to https://github.com/settings/tokens/new
# Create a token with 'repo' scope
# Paste it when prompted
```

In [10]:
# Optional: Configure Git for commits
# Uncomment and run if you want to push changes to GitHub

# import getpass
#
# print("📝 Configuring Git for GitHub...\n")
#
# # Configure Git user
# os.system('git config user.email "your-email@example.com"')
# os.system('git config user.name "Your Name"')
#
# # Get GitHub token
# token = getpass.getpass("Enter your GitHub personal access token: ")
#
# # Update remote URL to use token
# os.system(f'git remote set-url origin https://{token}@github.com/GladiatorGeneral/aegis-health-chain.git')
#
# print("\n✅ Git configured for pushing to GitHub")
# print("You can now run: git add . && git commit -m 'message' && git push")
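
The commented-out cell writes the token into the remote URL, so commands such as `git remote -v` will print it in full. If you log remote URLs, redact the credentials first. A minimal sketch, assuming a hypothetical helper `redact_credentials` (not part of the repository):

```python
from urllib.parse import urlsplit, urlunsplit


def redact_credentials(url: str) -> str:
    """Replace any user-info part of a URL with "***" so it is safe to print."""
    parts = urlsplit(url)
    if "@" not in parts.netloc:
        return url  # no credentials embedded
    host = parts.netloc.rsplit("@", 1)[1]
    return urlunsplit((parts.scheme, f"***@{host}", parts.path, parts.query, parts.fragment))
```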

## 🎉 Setup Complete!

You're ready to start developing. Here's what you can do next:

### 1️⃣ Run Existing Notebooks
Open `notebooks/01_data_exploration.ipynb` to explore data and test the UDM mapper.

### 2️⃣ Test HuggingFace Models
Run `notebooks/03_huggingface_test.ipynb` to test clinical embeddings (requires downloading models).

### 3️⃣ Start Development
Use the cells below to write your own code:

```python
from src.udm_mapper import udm_mapper
from src.data_pipeline import data_pipeline
from src.huggingface_models import clinical_models

# Map EHR data (your_data is a dict of source fields, like epic_data in Step 5)
result = udm_mapper.map_ehr_to_udm(your_data, "epic")

# Generate synthetic data
synthetic = data_pipeline.generate_synthetic_data(100)

# Get clinical embeddings
embeddings = clinical_models.get_clinical_embeddings("patient text")
```

### 4️⃣ Push Changes (Optional)
When ready, commit and push changes back to GitHub.

**Happy coding! 🚀**

## 💡 Useful Colab Tips

- **GPU:** Go to Runtime → Change runtime type → Hardware accelerator: GPU
- **Save Notebook:** Click "Copy to Drive" to save a copy in your Google Drive
- **Install Packages:** Use `pip install package_name` or `!pip install package_name`
- **Run Terminal Commands:** Use `os.system('command')` or `!command`
- **Mount Google Drive:** Run `from google.colab import drive`, then `drive.mount('/content/drive')`
- **Download Files:** Use the download icon in the Files panel, or call `files.download(path)` from `google.colab` in code
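
The Drive and download tips rely on the `google.colab` package, which exists only inside Colab. If the same notebook may also run locally, guard those calls. A minimal sketch, assuming a hypothetical helper `in_colab` (not part of the repository):

```python
import importlib.util


def in_colab() -> bool:
    """Return True when the google.colab package is importable."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except ImportError:
        # Raised when the parent "google" package is absent
        return False


if in_colab():
    from google.colab import drive
    drive.mount("/content/drive")
else:
    print("Not running in Colab; skipping Drive mount")
```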