# Train the Neural Mesh Simplification model

## Set up the environment

### [*only required for remote runs*] Remote environment setup

If you are running this notebook remotely (e.g. on Google Colab), set up the environment by:
* Downloading the repository from GitHub
* Setting up the Python environment

If you are opening this notebook locally, by running `jupyter lab` from the repository root with the right conda environment activated, the steps above are not required.

#### Step 1. Check out the repo
That's where the source code for mesh simplification, along with its dependency definitions and other utilities, lives.

In [None]:
!git clone https://github.com/dw-janubeus/neural-mesh-simplification.git neural-mesh-simplification

#### Step 2. Install Python 3.12 using apt-get

Check the current Python version by running the following command. This notebook requires Python 3.12. Either install it via your notebook environment settings and jump to Step 6, or follow all the steps below.

In [None]:
!python --version

In [None]:
!sudo apt-get update
!sudo apt-get install python3.12

#### Step 3. Update alternatives to use the new Python version

In [None]:
!sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1
!sudo update-alternatives --config python3

#### Step 4. Install pip and the required packages for the new Python version.

In [None]:
!rm -f get-pip*.py
!wget https://bootstrap.pypa.io/get-pip.py
!python get-pip.py
!python -m pip install ipykernel
!python -m ipykernel install --user --name python3.12 --display-name "Python 3.12"

#### Step 5. Restart and verify
At this point you may need to restart the session. Afterwards, verify that `python` reports the right version (`3.12`).

In [None]:
!python --version

#### Step 6. Upgrade pip and setuptools

In [None]:
!pip install --upgrade pip setuptools wheel
!pip install --upgrade build

### Set repository as the working directory
Change into the repository cloned above.

In [None]:
%cd neural-mesh-simplification

### Package requirements

PyTorch and the PyTorch Geometric libraries ship different binaries for CPU and for each CUDA version, so you need to install the ones that match your system. You can install them via:

In [None]:
!pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
!pip install torch_cluster==1.6.3 torch_geometric==2.5.3 torch_scatter==2.1.2 torch_sparse==0.6.18 -f https://data.pyg.org/whl/torch-2.4.0+cu121.html

Replace `cu121` with the tag that matches the CUDA version on your system (or `cpu` for a CPU-only install). If you don't know your CUDA version, run `nvidia-smi`, which reports the highest CUDA version your driver supports.
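
As a minimal sketch, both wheel locations used in the install commands can be derived from a single tag, which keeps the two `pip install` lines in sync when you switch CUDA versions. The helper name `wheel_urls` is hypothetical; the URL patterns are copied from the commands above.

```python
TORCH_VERSION = "2.4.0"  # matches the version pinned in the install command


def wheel_urls(tag: str) -> tuple[str, str]:
    """Return (PyTorch index URL, PyG wheel page) for a tag such as 'cu121' or 'cpu'."""
    torch_index = f"https://download.pytorch.org/whl/{tag}"
    pyg_page = f"https://data.pyg.org/whl/torch-{TORCH_VERSION}+{tag}.html"
    return torch_index, pyg_page


torch_index, pyg_page = wheel_urls("cu121")
print(torch_index)  # value for --index-url
print(pyg_page)     # value for -f
```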

Once PyTorch and PyTorch Geometric are installed, install the remaining requirements via pip:

In [None]:
!pip install -r requirements.txt
!pip install GPUtil  # For GPU monitoring in Colab
!pip uninstall -y neural-mesh-simplification
!pip install .

---
## Download the training data
We can use the Hugging Face API to download some mesh data to use for training and evaluation.

In [None]:
import os
import shutil
from huggingface_hub import snapshot_download

target_folder = "data/raw"
wip_folder = os.path.join(target_folder, "wip")
os.makedirs(wip_folder, exist_ok=True)

# abc_train is really large (+5k meshes), so download just a sample
folder_patterns = ["abc_extra_noisy/03_meshes/*.ply", "abc_train/03_meshes/*.ply"]

# Download
snapshot_download(
    repo_id="perler/ppsurf",
    repo_type="dataset",
    cache_dir=wip_folder,
    allow_patterns=folder_patterns[0],
)

# Move files from wip folder to target folder
for root, _, files in os.walk(wip_folder):
    for file in files:
        if file.endswith(".ply"):
            src_file = os.path.join(root, file)
            dest_file = os.path.join(target_folder, file)
            shutil.copy2(src_file, dest_file)
            os.remove(src_file)

# Remove the wip folder
shutil.rmtree(wip_folder)
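
Before moving on, it can help to confirm that the meshes actually landed in `data/raw`. The following is a small sketch using only the standard library; the helper name `summarize_meshes` is an assumption for illustration and not part of the repository.

```python
from pathlib import Path


def summarize_meshes(folder: str, suffix: str = ".ply") -> tuple[int, float]:
    """Return (number of mesh files, total size in MB) directly inside `folder`."""
    files = [p for p in Path(folder).glob(f"*{suffix}") if p.is_file()]
    total_mb = sum(p.stat().st_size for p in files) / (1024 * 1024)
    return len(files), total_mb


# Only report when the folder exists, so the cell is safe to run at any point
if Path("data/raw").is_dir():
    count, total_mb = summarize_meshes("data/raw")
    print(f"{count} mesh file(s), {total_mb:.1f} MB in data/raw")
```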

## Prepare the data
The downloaded data needs to be prepared for training. We can use a script from the repository we checked out for that.

In [None]:
!mkdir -p data/processed
!python scripts/preprocess_data.py

---
## Model Training

When using a GPU, verify that PyTorch can see it and that the environment is configured properly, so that training actually runs on the GPU.

In [None]:
import torch
import psutil
import GPUtil

# GPU verification and memory management for Colab T4
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device count: {torch.cuda.device_count()}")
    for i in range(torch.cuda.device_count()):
        print(f"Device {i}: {torch.cuda.get_device_name(i)}")
        print(f"Memory total: {torch.cuda.get_device_properties(i).total_memory / 1024**3:.2f} GB")
    
    # Clear cache and optimize for T4
    torch.cuda.empty_cache()
    torch.backends.cudnn.benchmark = True
    
print(f"\nSystem RAM: {psutil.virtual_memory().total / 1024**3:.2f} GB")
print(f"Available RAM: {psutil.virtual_memory().available / 1024**3:.2f} GB")

!nvcc --version

### Set up Google Drive (Optional)
Mount Google Drive to persist checkpoints across Colab sessions:

In [None]:
# Uncomment to mount Google Drive for checkpoint persistence
# from google.colab import drive
# drive.mount('/content/drive')

# Set checkpoint directory to Google Drive (uncomment if using Drive)
# checkpoint_dir = '/content/drive/MyDrive/neural-mesh-checkpoints'
# !mkdir -p "{checkpoint_dir}"

# Or use local storage (will be lost when session ends)
checkpoint_dir = 'checkpoints'
!mkdir -p {checkpoint_dir}

print(f"Checkpoint directory: {checkpoint_dir}")

### Training Configuration
Configure training parameters optimized for Colab T4:

In [None]:
# Training configuration for T4
import os

# Check available data
data_path = "data/processed"
if os.path.exists(data_path):
    data_files = os.listdir(data_path)
    print(f"Found {len(data_files)} processed data files")
else:
    print("Warning: Processed data directory not found!")

# Set training arguments
config_path = "configs/default.yaml"
resume_checkpoint = None  # Set to path if resuming training

print(f"Data path: {data_path}")
print(f"Config path: {config_path}")
print(f"Checkpoint directory: {checkpoint_dir}")
print(f"Resume from: {resume_checkpoint or 'None (fresh training)'}")

### Start Training with Enhanced Arguments
Run the training script with resource monitoring (`--monitor`) and debug logging (`--debug`) enabled:

In [None]:
# Enhanced training command with all available arguments
cmd = f"python scripts/train.py \
    --data-path {data_path} \
    --config {config_path} \
    --checkpoint-dir {checkpoint_dir} \
    --monitor \
    --debug"

# Add resume option if checkpoint exists
if resume_checkpoint:
    cmd += f" --resume {resume_checkpoint}"

print(f"Training command:\n{cmd}\n")
print("Starting training...")
print("=" * 50)

!{cmd}
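
If any of the paths contains spaces (a Google Drive folder, for example), the command string above would be split incorrectly by the shell. One possible alternative, sketched here under the assumption that the flags are exactly those shown in the training cell, is to build an argument list and quote it with `shlex.join`. The function name `build_train_command` and the example folder `my checkpoints` are hypothetical.

```python
import shlex


def build_train_command(data_path, config_path, checkpoint_dir, resume=None):
    """Assemble the train.py invocation as an argument list (flags from the cell above)."""
    args = [
        "python", "scripts/train.py",
        "--data-path", data_path,
        "--config", config_path,
        "--checkpoint-dir", checkpoint_dir,
        "--monitor",
        "--debug",
    ]
    if resume:
        args += ["--resume", resume]
    return args


args = build_train_command("data/processed", "configs/default.yaml", "my checkpoints")
print(shlex.join(args))  # arguments containing spaces are quoted for the shell
```

The list can be passed to `subprocess.run(args, check=True)`, or the quoted string can be run with `!` as in the cell above.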

---
## Post-Training Analysis

### Check Available Checkpoints
List and analyze saved checkpoints:

In [None]:
import os
import glob

# Find all checkpoint files
checkpoint_pattern = os.path.join(checkpoint_dir, '*.pth')
checkpoints = glob.glob(checkpoint_pattern)

if checkpoints:
    print(f"Found {len(checkpoints)} checkpoint(s):")
    for cp in sorted(checkpoints):
        size_mb = os.path.getsize(cp) / (1024*1024)
        print(f"  - {os.path.basename(cp)} ({size_mb:.1f} MB)")
    
    # Use the most recent checkpoint
    latest_checkpoint = max(checkpoints, key=os.path.getmtime)
    print(f"\nMost recent checkpoint: {os.path.basename(latest_checkpoint)}")
else:
    print("No checkpoints found. Training may have failed or not started.")
    latest_checkpoint = None

### Model Evaluation
Evaluate the trained model on test data:

In [None]:
if latest_checkpoint:
    # Run evaluation on test set
    eval_cmd = f"python scripts/evaluate.py \
        --eval-data-path {data_path} \
        --checkpoint {latest_checkpoint} \
        --config {config_path}"
    
    print(f"Evaluation command:\n{eval_cmd}\n")
    print("Running evaluation...")
    print("=" * 30)
    
    !{eval_cmd}
else:
    print("No checkpoint available for evaluation.")

### Inference Example
Test the trained model on a sample mesh:

In [None]:
if latest_checkpoint:
    # Find a sample mesh file
    sample_meshes = glob.glob("data/raw/*.ply")
    
    if sample_meshes:
        sample_mesh = sample_meshes[0]
        output_mesh = "simplified_sample.obj"
        
        infer_cmd = f"python scripts/infer.py \
            --input-file {sample_mesh} \
            --output-file {output_mesh} \
            --model-checkpoint {latest_checkpoint} \
            --device cuda"
        
        print(f"Inference command:\n{infer_cmd}\n")
        print(f"Simplifying: {os.path.basename(sample_mesh)}")
        print("=" * 30)
        
        !{infer_cmd}
        
        # Check if output was created
        if os.path.exists(output_mesh):
            output_size = os.path.getsize(output_mesh) / 1024
            input_size = os.path.getsize(sample_mesh) / 1024
            print(f"\nSimplification complete!")
            print(f"Input size: {input_size:.1f} KB")
            print(f"Output size: {output_size:.1f} KB")
            print(f"Compression ratio: {input_size/output_size:.2f}x")
        else:
            print("\nInference failed - no output file generated.")
    
    else:
        print("No sample meshes found in data/raw/ for inference test.")
else:
    print("No checkpoint available for inference.")
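
File size is only a rough proxy for simplification, since the input is PLY and the output is OBJ. A more direct comparison is the face count. The sketch below reads it from the PLY header and counts `f` records in the OBJ file using only the standard library; both helper names are hypothetical, and it assumes the OBJ file stores one face per `f` line.

```python
from pathlib import Path


def ply_face_count(path):
    """Read the face count from a PLY header (the header is ASCII even in binary PLY)."""
    with open(path, "rb") as f:
        for raw in f:
            line = raw.decode("ascii", errors="ignore").strip()
            if line.startswith("element face"):
                return int(line.split()[2])
            if line == "end_header":
                break
    return 0


def obj_face_count(path):
    """Count face records ('f ...' lines) in a Wavefront OBJ file."""
    with open(path, "r", errors="ignore") as f:
        return sum(1 for line in f if line.startswith("f "))


# Report the output face count only if the inference cell produced a file
if Path("simplified_sample.obj").exists():
    print(f"Output faces: {obj_face_count('simplified_sample.obj')}")
```

Calling `ply_face_count(sample_mesh)` and `obj_face_count(output_mesh)` after the inference cell gives the two numbers to compare.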

### Training Tips for Colab T4

**For resuming interrupted training:**
1. Set `resume_checkpoint` to the path of your latest checkpoint
2. Re-run the training cell
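
Rather than typing the path by hand, the latest checkpoint can be looked up automatically. This is a minimal sketch that assumes checkpoints use the `.pth` extension, as in the checkpoint-listing cell; the helper name `find_latest_checkpoint` is hypothetical.

```python
import glob
import os


def find_latest_checkpoint(checkpoint_dir, pattern="*.pth"):
    """Return the most recently modified checkpoint path, or None if there is none."""
    candidates = glob.glob(os.path.join(checkpoint_dir, pattern))
    return max(candidates, key=os.path.getmtime) if candidates else None


resume_checkpoint = find_latest_checkpoint("checkpoints")
print(f"Resume from: {resume_checkpoint or 'None (fresh training)'}")
```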

**For Google Drive persistence:**
1. Uncomment the Google Drive mounting code
2. Set `checkpoint_dir` to your Drive folder
3. Your checkpoints will survive Colab session restarts

**Memory management:**
- The default batch size (2) is optimized for the T4's ~15 GB of memory
- If you get OOM errors, reduce batch size in `configs/default.yaml`
- Monitor GPU memory usage with the included monitoring
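
For a quick manual check between runs, free and total GPU memory can also be queried from PyTorch directly. The sketch below uses `torch.cuda.mem_get_info` and falls back gracefully when PyTorch or a CUDA device is unavailable; the helper names are hypothetical and independent of the repository's own monitoring.

```python
def format_gb(n_bytes: int) -> str:
    """Format a byte count as gigabytes with two decimals."""
    return f"{n_bytes / 1024**3:.2f} GB"


def gpu_memory_report(device: int = 0):
    """Return a 'free of total' string for a CUDA device, or None without torch or CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free, total = torch.cuda.mem_get_info(device)
    return f"GPU {device}: {format_gb(free)} free of {format_gb(total)}"


print(gpu_memory_report() or "No CUDA device visible")
```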

**Training monitoring:**
- Debug logs provide detailed training progress
- Resource monitoring tracks CPU/GPU usage
- Checkpoints are saved automatically for recovery