# Mass Simulation on Google Colab - Setup & Execution

This notebook runs your mass simulation on Google Colab and, by default, spreads the work across all available CPU cores.

## 📋 Prerequisites
- Have the following files ready, either already in your Google Drive or on your computer for direct upload:
  - `mass_simulation_LHS_G.py`
  - `main_simulation_G.py`
  - `simulation_utils_G.py`

## 🚀 Quick Start
1. Run each cell in order (Shift+Enter)
2. Upload the 3 Python files when prompted
3. Configure your simulation parameters
4. Start the simulation
5. Download results when complete

## 1️⃣ Check Available Resources

In [None]:
# Check CPU and RAM availability
import multiprocessing
import psutil

n_cpus = multiprocessing.cpu_count()
ram_gb = psutil.virtual_memory().total / (1024**3)

print("="*60)
print("🖥️  GOOGLE COLAB RESOURCES")
print("="*60)
print(f"CPU Cores Available: {n_cpus}")
print(f"Total RAM: {ram_gb:.1f} GB")
print("="*60)
print("\n💡 TIP: Paid compute units can unlock runtimes with more CPU cores.")
print("   With N_CORES = None, the simulation uses every core reported above.")
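
`multiprocessing.cpu_count()` reports the logical cores of the machine, which can be more than the current process is allowed to use. A minimal sketch of a more conservative count, assuming a Linux runtime such as Colab; the helper name `usable_cpu_count` is illustrative and not part of the simulation scripts:

```python
import os


def usable_cpu_count():
    """Return the number of CPU cores this process may actually run on."""
    # os.sched_getaffinity is available on Linux only; it reflects the CPU
    # affinity mask of this process, which os.cpu_count() ignores.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback for other platforms; os.cpu_count() may return None.
    return os.cpu_count() or 1


print(f"Usable CPU cores: {usable_cpu_count()}")
```

If this number is lower than the one printed above, it is probably the safer value to pass as `N_CORES`.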

## 2️⃣ Install Required Packages

In [None]:
# Install required packages
!pip install -q numpy pandas matplotlib scipy numba

print("✅ Package installation finished. Check the output above for any pip errors.")

## 3️⃣ Option A: Upload Files Directly (Recommended for First Time)

In [None]:
from google.colab import files
import os

print("📤 Please upload the following 3 files:")
print("   1. mass_simulation_LHS_G.py")
print("   2. main_simulation_G.py")
print("   3. simulation_utils_G.py")
print("\nClick 'Choose Files' and select all 3 files at once.\n")

uploaded = files.upload()

# Verify that all required files are in the working directory.
# Checking the disk (rather than only `uploaded`) means you can re-run this
# cell and upload just the files that are still missing.
required_files = ['mass_simulation_LHS_G.py', 'main_simulation_G.py', 'simulation_utils_G.py']
missing = [f for f in required_files if not os.path.exists(f)]

if missing:
    print(f"\n⚠️  Missing files: {missing}")
    print("Please run this cell again and upload the missing files.")
else:
    print("\n✅ All required files uploaded successfully!")

## 3️⃣ Option B: Mount Google Drive (if you've already uploaded files there)

In [None]:
# Uncomment and run this cell if you want to use Google Drive
# from google.colab import drive
# drive.mount('/content/drive')

# # Copy files from your Google Drive folder to the working directory
# drive_folder = '/content/drive/MyDrive/your_folder_name'  # CHANGE THIS PATH
# !cp {drive_folder}/*_G.py .
# print("✅ Files copied from Google Drive!")
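
The shell copy above does not stop the cell if nothing matches the pattern. A pure-Python alternative that raises an error instead could look like the sketch below; `copy_simulation_scripts` is a hypothetical helper, and the `*_G.py` pattern is taken from the file names listed in the prerequisites:

```python
import glob
import os
import shutil


def copy_simulation_scripts(source_dir, dest_dir=".", pattern="*_G.py"):
    """Copy files matching `pattern` from source_dir into dest_dir.

    Returns the sorted list of copied file names.
    """
    matches = sorted(glob.glob(os.path.join(source_dir, pattern)))
    if not matches:
        raise FileNotFoundError(f"No files matching {pattern!r} in {source_dir!r}")
    os.makedirs(dest_dir, exist_ok=True)
    for path in matches:
        # copy2 also preserves file timestamps.
        shutil.copy2(path, dest_dir)
    return [os.path.basename(p) for p in matches]
```

After mounting Drive, it would be called as `copy_simulation_scripts('/content/drive/MyDrive/your_folder_name')`, with the path changed to your own folder.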

## 4️⃣ Configure Simulation Parameters

Adjust these parameters based on your needs and available compute units:

In [None]:
# ============= SIMULATION CONFIGURATION =============

# Number of parameter sets to sample (LHS sampling)
N_PARAM_SETS = 100  # Increase this for more parameter coverage

# Number of replicate simulations per parameter-scenario combination
N_REPLICATES = 10  # Each replicate uses a different random seed

# Number of CPU cores to use (None = use all available)
N_CORES = None  # Set to a specific number if you want to limit CPU usage

# Output directory name
OUTPUT_DIR = 'mass_sim_results'

# Random seed for reproducibility
RANDOM_SEED = 42

# Combine all results into a single file at the end?
# (Warning: This requires more memory for large simulations)
COMBINE_RESULTS = False  # Set to True if you want one big CSV file

# ====================================================

# Calculate total simulations
n_scenarios = 6  # Hard-coded; must match the number of scenarios your scripts generate
total_sims = N_PARAM_SETS * n_scenarios * N_REPLICATES

print("="*60)
print("⚙️  SIMULATION CONFIGURATION")
print("="*60)
print(f"Parameter Sets: {N_PARAM_SETS}")
print(f"Replicates per param-scenario: {N_REPLICATES}")
print(f"CPU Cores: {N_CORES if N_CORES else 'All available'}")
print(f"Output Directory: {OUTPUT_DIR}")
print(f"\n📊 TOTAL SIMULATIONS: {total_sims:,}")
print(f"   ({N_PARAM_SETS} params × {n_scenarios} scenarios × {N_REPLICATES} replicates)")
print("="*60)
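
The configuration above states that each replicate uses a different random seed. How `run_mass_simulations` derives those seeds from `base_seed` is not shown in this notebook, so the following is only a sketch of one simple scheme. The helper `derive_seed` and its index layout are assumptions made for illustration:

```python
def derive_seed(base_seed, param_idx, scenario_idx, replicate_idx,
                n_scenarios, n_replicates):
    """Map a (parameter set, scenario, replicate) triple to a unique seed."""
    # Flatten the three indices into one run index (row-major order),
    # then offset it by the base seed.
    run_index = (param_idx * n_scenarios + scenario_idx) * n_replicates + replicate_idx
    return base_seed + run_index


# Default configuration: 100 parameter sets, 6 scenarios, 10 replicates.
seeds = {
    derive_seed(42, p, s, r, n_scenarios=6, n_replicates=10)
    for p in range(100) for s in range(6) for r in range(10)
}
print(len(seeds))  # 6000 distinct seeds, one per simulation
```

Consecutive integer seeds keep the example short. If the scripts use NumPy generators, `numpy.random.SeedSequence(base_seed).spawn(n)` is a more robust way to obtain independent streams.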

## 5️⃣ Run the Mass Simulation

This will take a while depending on your configuration. Progress will be displayed below.
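
For a rough idea of how long "a while" is, the total simulation count can be turned into a wall-clock estimate. The sketch below assumes perfect scaling across cores and a per-simulation time that you measure yourself; the 2 seconds and 8 cores used here are hypothetical values, not measurements:

```python
import math


def estimate_wall_time_hours(total_sims, seconds_per_sim, n_cores):
    """Rough wall-clock estimate assuming perfect scaling across cores."""
    # Each core handles ceil(total_sims / n_cores) simulations in sequence.
    sims_per_core = math.ceil(total_sims / n_cores)
    return sims_per_core * seconds_per_sim / 3600


# Default configuration: 100 parameter sets x 6 scenarios x 10 replicates.
total = 100 * 6 * 10
print(total)  # 6000
# Hypothetical 2 s per simulation on 8 cores.
print(f"{estimate_wall_time_hours(total, 2.0, 8):.2f} h")  # 0.42 h
```

Real runs will be slower than this because of process start-up, file writes, and uneven simulation lengths.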

In [None]:
# Import and run the mass simulation
from mass_simulation_LHS_G import run_mass_simulations
from datetime import datetime

print("\n" + "="*60)
print("🚀 STARTING MASS SIMULATION")
print("="*60)
print(f"Start Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("="*60 + "\n")

# Run the simulation
results = run_mass_simulations(
    n_param_sets=N_PARAM_SETS,
    n_replicates=N_REPLICATES,
    n_cores=N_CORES,
    output_dir=OUTPUT_DIR,
    base_seed=RANDOM_SEED,
    combine_at_end=COMBINE_RESULTS,
    verbose=True
)

print("\n" + "="*60)
print("✅ SIMULATION COMPLETE!")
print("="*60)
print(f"End Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("="*60)

## 6️⃣ Preview Results

In [None]:
# List all generated files
import os
import glob

result_files = glob.glob(f"{OUTPUT_DIR}/*.csv")
result_files.sort()

print(f"\n📁 Generated {len(result_files)} result files:")
print("="*60)

total_size = 0
for f in result_files:
    size_mb = os.path.getsize(f) / (1024**2)
    total_size += size_mb
    print(f"  {os.path.basename(f):50s} {size_mb:8.2f} MB")

print("="*60)
print(f"Total size: {total_size:.2f} MB")
print("="*60)

## 7️⃣ Quick Data Check

In [None]:
import pandas as pd

# Load and preview sampled parameters
params_df = pd.read_csv(f"{OUTPUT_DIR}/sampled_parameters.csv")
print("\n📊 Sampled Parameters (first 5 rows):")
print(params_df.head())

# Load and preview scenarios
scenarios_df = pd.read_csv(f"{OUTPUT_DIR}/scenarios.csv")
print("\n📋 Scenarios:")
print(scenarios_df)

# Load and preview first parameter set results
first_result = pd.read_csv(f"{OUTPUT_DIR}/simulation_results_param_set_0.csv")
print("\n📈 Sample Results (first 5 rows of param_set_0):")
print(first_result.head())
print(f"\nShape: {first_result.shape[0]} rows × {first_result.shape[1]} columns")

## 8️⃣ Create a ZIP file for Download

In [None]:
import os  # needed for os.path.getsize below if earlier cells were skipped
import shutil
from datetime import datetime

# Create a zip file of all results
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
zip_filename = f'simulation_results_{timestamp}'

print(f"\n📦 Creating ZIP archive: {zip_filename}.zip")
shutil.make_archive(zip_filename, 'zip', OUTPUT_DIR)

zip_size = os.path.getsize(f'{zip_filename}.zip') / (1024**2)
print(f"✅ ZIP created successfully! Size: {zip_size:.2f} MB")
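
Before downloading a large archive, it can be worth confirming that it is readable. A minimal sketch using the standard `zipfile` module; the helper name `check_zip` is illustrative:

```python
import zipfile


def check_zip(zip_path):
    """Return the archived file names after verifying every member's CRC."""
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        bad_member = archive.testzip()
        if bad_member is not None:
            raise zipfile.BadZipFile(f"Corrupt member: {bad_member}")
        return archive.namelist()
```

In this notebook it would be called as `check_zip(f'{zip_filename}.zip')`, and the returned list should contain the same CSV files shown in the results preview.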

## 9️⃣ Download Results

In [None]:
from google.colab import files

print("⬇️  Starting download...")
print("   (This may take a moment depending on file size)\n")

files.download(f'{zip_filename}.zip')

print("\n✅ Download initiated! Check your browser's download folder.")

## 🔟 (Optional) Save Results to Google Drive

In [None]:
# Uncomment and run this if you want to save to Google Drive
# from google.colab import drive
# import shutil

# # Mount Google Drive if not already mounted
# drive.mount('/content/drive', force_remount=False)

# # Copy the entire results folder to Google Drive
# drive_destination = '/content/drive/MyDrive/simulation_results/'  # CHANGE THIS PATH
# shutil.copytree(OUTPUT_DIR, drive_destination, dirs_exist_ok=True)

# print(f"✅ Results saved to Google Drive: {drive_destination}")

## 📝 Notes

### About CPU Usage
- The simulation automatically detects and uses all available CPU cores
- Paid compute units can give access to runtimes with more CPU cores; section 1 shows how many cores your current runtime actually has
- Progress is displayed for each parameter set

### About Results
- Results are saved incrementally (one file per parameter set)
- This allows you to resume if the session disconnects
- Each file contains all scenarios and replicates for that parameter set
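
Because there is one file per parameter set, it is easy to see which sets still need to run after an interruption. The sketch below uses the `simulation_results_param_set_<i>.csv` naming from the Quick Data Check cell. Whether `run_mass_simulations` performs this check itself is not confirmed here, and `pending_param_sets` is a hypothetical helper:

```python
import os


def pending_param_sets(output_dir, n_param_sets):
    """List parameter-set indices that have no usable result file yet."""
    pending = []
    for i in range(n_param_sets):
        path = os.path.join(output_dir, f"simulation_results_param_set_{i}.csv")
        # Treat empty files as incomplete: they may come from an interrupted write.
        if not os.path.exists(path) or os.path.getsize(path) == 0:
            pending.append(i)
    return pending
```

For example, `pending_param_sets(OUTPUT_DIR, N_PARAM_SETS)` returns an empty list once every parameter set has produced a non-empty file.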

### Troubleshooting
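- **Session timeout**: If your session times out, re-run sections 1 through 4, then re-run section 5 (the run cell), which will resume from where it left off. Files on the Colab runtime disk are deleted when the runtime is recycled, so resuming only works if the output directory survived (for example, because it was saved to Google Drive)
- **Out of memory**: Reduce `N_PARAM_SETS` or `N_REPLICATES`, keep `COMBINE_RESULTS = False`, or set `N_CORES` to a smaller number so fewer simulations run at the same time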
- **Too slow**: Check that you have compute units enabled and are using all CPUs
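
If you need a single combined CSV but `COMBINE_RESULTS = True` runs out of memory, the per-parameter-set files can be concatenated one line at a time instead of being loaded together. This is a sketch only: it assumes every file has the same header row and that no field contains an embedded newline, and `combine_csvs` is a hypothetical helper rather than part of the simulation scripts:

```python
import glob
import os


def combine_csvs(output_dir, combined_path,
                 pattern="simulation_results_param_set_*.csv"):
    """Concatenate CSV files line by line, keeping only the first header.

    Returns the number of data rows written.
    """
    # Note: sorted() is lexicographic, so param_set_10 comes before param_set_2.
    paths = sorted(glob.glob(os.path.join(output_dir, pattern)))
    rows_written = 0
    with open(combined_path, "w", newline="") as out:
        for file_index, path in enumerate(paths):
            with open(path, newline="") as src:
                header = src.readline()
                if file_index == 0:
                    out.write(header)
                for line in src:
                    # Guard against a final line without a trailing newline.
                    if not line.endswith("\n"):
                        line += "\n"
                    out.write(line)
                    rows_written += 1
    return rows_written
```

Write the combined file under a name that does not match the pattern, such as `combined.csv`, so that a second run does not pick it up as an input.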

### Next Steps
After downloading, you can analyze results using your local R or Python environment.