# Kaggle SSH Tunnel Setup (ngrok)

This notebook establishes an SSH tunnel to the Kaggle VM using ngrok, allowing remote development via the VS Code/Cursor Remote-SSH extension.

## Prerequisites

1. **Kaggle Secrets** configured in Account Settings:
   - `DVC_SERVICE_ACCOUNT_JSON`: Google Service Account JSON
   - `WANDB_API_KEY`: Weights & Biases API key
   - `NGROK_TOKEN`: ngrok authentication token

2. **GPU** enabled for this notebook

3. **Internet** enabled for this notebook

## Usage

1. Run all cells in order
2. Copy the SSH command and password from Cell 4 output
3. Connect via SSH from your local machine (password is randomly generated per session)
4. Or use VS Code/Cursor Remote-SSH extension
5. Keep this notebook running throughout your training session

## Security Note

- SSH password is **randomly generated** each session for security
- Save the password from Cell 4 output immediately
- Password changes every time you restart the notebook


## Cell 1: Install ngrok


In [1]:
print("Installing ngrok...")
!wget -q https://bin.equinox.io/c/bNyj1mQVY4c/ngrok-v3-stable-linux-amd64.tgz
!tar xzf ngrok-v3-stable-linux-amd64.tgz -C /usr/local/bin
!rm ngrok-v3-stable-linux-amd64.tgz
!ngrok version
print("✓ ngrok installed")


Installing ngrok...
ngrok version 3.34.0
✓ ngrok installed


## Cell 2: Setup SSH Service


In [2]:
print("Setting up SSH service...")
import subprocess
import secrets

# Install SSH server
!apt-get update -qq && apt-get install -y -qq openssh-server > /dev/null

# Generate random SSH password per session (store globally for later cells)
SSH_PASSWORD = secrets.token_urlsafe(16)

# Configure SSH
!echo "root:{SSH_PASSWORD}" | chpasswd
!echo "PermitRootLogin yes" >> /etc/ssh/sshd_config
!echo "PasswordAuthentication yes" >> /etc/ssh/sshd_config

# Start service
!service ssh restart

print("✓ SSH service started on port 22")
print(f"🔐 SSH Password (save this!): {SSH_PASSWORD}")
print("⚠️  Password is randomly generated each session for security")


Setting up SSH service...
W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)
 * Restarting OpenBSD Secure Shell server sshd
   ...done.
✓ SSH service started on port 22
🔐 SSH Password (save this!): onb4l44pb5PHr9Eq0GNvQw
⚠️  Password is randomly generated each session for security


## Cell 3: Inject Kaggle Secrets as Environment Variables


In [3]:
print("Injecting Kaggle Secrets as environment variables...")
import os

# ✅ CORRECT WAY: Use kaggle_secrets to access Kaggle Secrets
from kaggle_secrets import UserSecretsClient

user_secrets = UserSecretsClient()

# Read secrets using Kaggle Secrets API
# Note: Secret names must match exactly what you set in Kaggle Account Settings
try:
    dvc_json = user_secrets.get_secret("DVC_SERVICE_ACCOUNT_JSON")
    print(f"✓ DVC_SERVICE_ACCOUNT_JSON loaded: {len(dvc_json)} characters")
except Exception as e:
    print(f"❌ Error loading DVC_SERVICE_ACCOUNT_JSON: {e}")
    print("   Make sure you've added this secret in Kaggle Account Settings")
    dvc_json = ""

try:
    wandb_key = user_secrets.get_secret("WANDB_API_KEY")
    print(f"✓ WANDB_API_KEY loaded: {len(wandb_key)} characters")
except Exception as e:
    print(f"❌ Error loading WANDB_API_KEY: {e}")
    print("   Make sure you've added this secret in Kaggle Account Settings")
    wandb_key = ""

try:
    ngrok_token = user_secrets.get_secret("NGROK_TOKEN")
    print(f"✓ NGROK_TOKEN loaded: {len(ngrok_token)} characters")
except Exception as e:
    print(f"❌ Error loading NGROK_TOKEN: {e}")
    print("   Make sure you've added this secret in Kaggle Account Settings")
    ngrok_token = ""

# Validate secrets were loaded
if not dvc_json or not wandb_key or not ngrok_token:
    print("\n❌ FAILED: One or more secrets are missing!")
    print("\nPlease check:")
    print("  1. Go to Kaggle Account Settings → Secrets")
    print("  2. Add secrets with EXACT names:")
    print("     - DVC_SERVICE_ACCOUNT_JSON")
    print("     - WANDB_API_KEY")
    print("     - NGROK_TOKEN")
    print("  3. Make sure 'Add-ons' are enabled for this notebook")
else:
    # Expose as environment variables
    os.environ["KAGGLE_SECRET_DVC_JSON"] = dvc_json
    os.environ["KAGGLE_SECRET_WANDB_KEY"] = wandb_key
    os.environ["KAGGLE_SECRET_NGROK_TOKEN"] = ngrok_token

    # Persist to .bashrc for SSH sessions
    # Use base64 encoding to avoid bash injection vulnerabilities
    import base64

    dvc_json_b64 = base64.b64encode(dvc_json.encode()).decode()
    wandb_key_b64 = base64.b64encode(wandb_key.encode()).decode()
    ngrok_token_b64 = base64.b64encode(ngrok_token.encode()).decode()

    with open("/root/.bashrc", "a") as f:
        f.write("\n# Kaggle Secrets for Training (base64-encoded for security)\n")
        f.write(f'export KAGGLE_SECRET_DVC_JSON_B64="{dvc_json_b64}"\n')
        f.write(f'export KAGGLE_SECRET_WANDB_KEY_B64="{wandb_key_b64}"\n')
        f.write(f'export KAGGLE_SECRET_NGROK_TOKEN_B64="{ngrok_token_b64}"\n')

    print("\n✅ Secrets injected successfully!")
    print(f"   - KAGGLE_SECRET_DVC_JSON: {len(dvc_json)} characters")
    print(f"   - KAGGLE_SECRET_WANDB_KEY: {len(wandb_key)} characters")
    print(f"   - KAGGLE_SECRET_NGROK_TOKEN: {len(ngrok_token)} characters")
    print("\n✓ Secrets are now available in SSH sessions")

Injecting Kaggle Secrets as environment variables...
✓ DVC_SERVICE_ACCOUNT_JSON loaded: 2398 characters
✓ WANDB_API_KEY loaded: 40 characters
✓ NGROK_TOKEN loaded: 49 characters

✅ Secrets injected successfully!
   - KAGGLE_SECRET_DVC_JSON: 2398 characters
   - KAGGLE_SECRET_WANDB_KEY: 40 characters
   - KAGGLE_SECRET_NGROK_TOKEN: 49 characters

✓ Secrets are now available in SSH sessions
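
Cell 3 appends the secrets to `/root/.bashrc` in base64 form under the `*_B64` variable names, so code running inside an SSH session has to decode them before use. The following is a minimal sketch of that decoding step; the helper name `read_b64_secret` and the stand-in `demo_env` dictionary are hypothetical and are not part of the notebook.

```python
import base64
import os


def read_b64_secret(name: str, environ=os.environ) -> str:
    """Decode a secret stored base64-encoded under the variable `<name>_B64`."""
    encoded = environ.get(f"{name}_B64", "")
    if not encoded:
        raise KeyError(f"{name}_B64 is not set; did Cell 3 run successfully?")
    return base64.b64decode(encoded).decode()


# Illustration with a stand-in environment (hypothetical value).
# In a real SSH session, omit the second argument to read from os.environ.
demo_env = {
    "KAGGLE_SECRET_WANDB_KEY_B64": base64.b64encode(b"example-key").decode()
}
print(read_b64_secret("KAGGLE_SECRET_WANDB_KEY", demo_env))
```

Raising an error for a missing variable, instead of returning an empty string, makes a misconfigured session fail early rather than partway through training.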


## Cell 4: Start ngrok Tunnel (Keep Running!)

⚠️ **IMPORTANT:** This cell will run indefinitely. Keep this notebook running!


In [4]:
print("=" * 60)
print("Starting ngrok tunnel...")
print("=" * 60)
print("")
print("INSTRUCTIONS:")
print("1. Copy the SSH command from output below")
print("2. Connect via SSH from your local machine:")
print("")
print("   ssh root@<ngrok-host> -p <port>")
print(f"   Password: {SSH_PASSWORD}")
print("")
print("3. Or connect via VS Code/Cursor Remote-SSH extension")
print("")
print("=" * 60)
print("")

# Keep tunnel alive: authenticate ngrok, start tunnel, then keep kernel alive
import os
import re
import time
import json
import datetime
import subprocess
from pathlib import Path

KEEPALIVE_SLEEP = 15  # heartbeat interval in seconds

# Get ngrok token from environment
ngrok_token = os.environ.get("KAGGLE_SECRET_NGROK_TOKEN", "")

if not ngrok_token:
    print("❌ NGROK_TOKEN not found in environment!")
    print("   Make sure Cell 3 ran successfully and the secret is configured.")
    raise SystemExit(1)

# Kill previous ngrok instances
subprocess.run(
    ["pkill", "-f", "ngrok"],
    check=False,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Wait for previous instances to fully terminate (avoid race condition)
time.sleep(2)

# Authenticate ngrok
print("Authenticating ngrok...")
auth_result = subprocess.run(
    ["ngrok", "config", "add-authtoken", ngrok_token], capture_output=True, text=True
)

if auth_result.returncode != 0:
    print("❌ Failed to authenticate ngrok:")
    print(f"   stdout: {auth_result.stdout}")
    print(f"   stderr: {auth_result.stderr}")
    raise SystemExit(1)

print("✓ ngrok authenticated")
print(f"   Config output: {auth_result.stdout.strip()}")

# Start ngrok tunnel for SSH (port 22) - use daemon mode + API query
print("Starting ngrok tunnel on port 22...")
print("⚠️  Note: Free tier only allows 1 agent online at a time!")
print("   If you have another ngrok tunnel running, it will be disconnected.\n")

# Start ngrok in background (write logs to file)
log_file = Path("/kaggle/working/ngrok.log")
proc = subprocess.Popen(
    ["ngrok", "tcp", "22", "--log", str(log_file)],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)

# Wait for ngrok to initialize (increased for reliability)
print("Initializing ngrok...", end="", flush=True)
time.sleep(5)

if proc.poll() is not None:
    print("\n❌ ngrok died immediately! Check log:")
    print(f"   cat {log_file}")
    raise SystemExit(1)

print(" OK")

# Query ngrok API for tunnel info (more reliable than parsing stdout)
print("Querying ngrok API...", end="", flush=True)

tunnel_host = None
tunnel_port = None
timeout = 20  # seconds
start_time = time.time()

# Query ngrok local API (http://localhost:4040) for tunnel info
import urllib.request

while time.time() - start_time < timeout:
    try:
        # Check if process died
        if proc.poll() is not None:
            print("\n❌ ngrok process died!")
            print(f"   Check log: cat {log_file}")
            break

        # Query API
        response = urllib.request.urlopen(
            "http://localhost:4040/api/tunnels", timeout=2
        )
        data = json.loads(response.read().decode())

        tunnels = data.get("tunnels", [])
        for tunnel in tunnels:
            if tunnel.get("proto") == "tcp":
                public_url = tunnel.get("public_url", "")
                match = re.search(r"tcp://([^:]+):(\d+)", public_url)
                if match:
                    tunnel_host = match.group(1)
                    tunnel_port = match.group(2)
                    break

        if tunnel_host:
            break

    except Exception:
        # API not ready yet
        pass

    time.sleep(1)
    print(".", end="", flush=True)

print("")  # newline

# Report result
if not tunnel_host or not tunnel_port:
    print("⚠️  Could not retrieve tunnel URL from ngrok API")
    print("   This usually means ngrok is still initializing or there's an API issue.")
    print("\n💡 Solutions:")
    print("   1. Check ngrok dashboard: https://dashboard.ngrok.com/tunnels/agents")
    print("   2. Run the next cell to manually query the URL")
    print("   3. Or check API directly: curl http://localhost:4040/api/tunnels")
    print(f"\n📁 Log file: {log_file}")
    print("\n⚠️  Note: Tunnel may still be running! Check dashboard or the next cell.")
else:
    print("\n✅ Tunnel established! Connect via:")
    print(f"   ssh root@{tunnel_host} -p {tunnel_port}")
    print(f"   Password: {SSH_PASSWORD}")
    print("\n⚠️  KEEPING KERNEL ALIVE. DO NOT CLOSE THIS TAB!")
    print("💡 Tip: Drag this tab to a separate window and DON'T minimize it.\n")

    try:
        counter = 0
        while True:
            time.sleep(KEEPALIVE_SLEEP)

            # Check if ngrok is still running
            if proc.poll() is not None:
                print("\n❌ ngrok died unexpectedly!")
                break

            # Print heartbeat every minute
            counter += 1
            if counter % 4 == 0:  # Every 60s (15s * 4)
                timestamp = datetime.datetime.now().strftime("%H:%M:%S")
                print(f"[{timestamp}] Kernel is active... (Tunnel OK)")

    except KeyboardInterrupt:
        print("\n🛑 User interrupted.")
        proc.terminate()
    except Exception as e:
        print(f"\n❌ Error: {e}")
        proc.terminate()

Starting ngrok tunnel...

INSTRUCTIONS:
1. Copy the SSH command from output below
2. Connect via SSH from your local machine:

   ssh root@<ngrok-host> -p <port>
   Password: onb4l44pb5PHr9Eq0GNvQw

3. Or connect via VS Code/Cursor Remote-SSH extension


Authenticating ngrok...
✓ ngrok authenticated
   Config output: Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml
Starting ngrok tunnel on port 22...
⚠️  Note: Free tier only allows 1 agent online at a time!
   If you have another ngrok tunnel running, it will be disconnected.

Initializing ngrok... OK
Querying ngrok API...

✅ Tunnel established! Connect via:
   ssh root@8.tcp.ngrok.io -p 15193
   Password: onb4l44pb5PHr9Eq0GNvQw

⚠️  KEEPING KERNEL ALIVE. DO NOT CLOSE THIS TAB!
💡 Tip: Drag this tab to a separate window and DON'T minimize it.

[14:05:07] Kernel is active... (Tunnel OK)
[14:06:07] Kernel is active... (Tunnel OK)
[14:07:07] Kernel is active... (Tunnel OK)
[14:08:07] Kernel is active... (

## Cell 5: Get Tunnel URL (if Cell 4 cannot retrieve it)

If the previous cell reports "Could not retrieve tunnel URL from ngrok API", run this cell to query the ngrok API again.


In [None]:
import json
import urllib.request

try:
    # Query ngrok local API
    response = urllib.request.urlopen("http://localhost:4040/api/tunnels", timeout=5)
    data = json.loads(response.read().decode())

    tunnels = data.get("tunnels", [])

    if not tunnels:
        print("❌ No active tunnels found!")
        print("   ngrok may still be starting up. Wait 5-10 seconds and try again.")
    else:
        print("✅ Active tunnels:")
        for tunnel in tunnels:
            proto = tunnel.get("proto", "")
            public_url = tunnel.get("public_url", "")

            if proto == "tcp" and "ngrok.io" in public_url:
                # Parse tcp://host:port
                import re

                match = re.search(r"tcp://([^:]+):(\d+)", public_url)
                if match:
                    host = match.group(1)
                    port = match.group(2)
                    print(f"\n🔗 SSH Connection:")
                    print(f"   ssh root@{host} -p {port}")
                    # SSH_PASSWORD is the per-session password generated in Cell 2
                    print(f"   Password: {SSH_PASSWORD}")
                    print(f"\n📋 For VS Code/Cursor Remote-SSH, add to ~/.ssh/config:")
                    print(f"   Host kaggle-gpu")
                    print(f"       HostName {host}")
                    print(f"       User root")
                    print(f"       Port {port}")
                    print(f"       StrictHostKeyChecking no")
                else:
                    print(f"   {proto}: {public_url}")
            else:
                print(f"   {proto}: {public_url}")

except urllib.error.URLError as e:
    print("❌ Cannot connect to ngrok API (http://localhost:4040)")
    print("   This means ngrok is not running or not ready yet.")
    print(f"   Error: {e}")
except Exception as e:
    print(f"❌ Error: {e}")
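
The URL parsing and `~/.ssh/config` formatting in the cell above can also be factored into two small functions, which makes them easier to check without a live tunnel. This is only a sketch: the helper names `parse_tunnel_url` and `ssh_config_entry` are hypothetical, and it swaps the regular expression for the standard library's `urllib.parse.urlsplit`.

```python
from typing import Tuple
from urllib.parse import urlsplit


def parse_tunnel_url(public_url: str) -> Tuple[str, int]:
    """Split an ngrok TCP URL such as tcp://host:port into (host, port)."""
    parts = urlsplit(public_url)
    if parts.scheme != "tcp" or not parts.hostname or parts.port is None:
        raise ValueError(f"Not a tcp://host:port URL: {public_url!r}")
    return parts.hostname, parts.port


def ssh_config_entry(host: str, port: int, alias: str = "kaggle-gpu") -> str:
    """Format the same ~/.ssh/config stanza that the cell above prints."""
    return "\n".join(
        [
            f"Host {alias}",
            f"    HostName {host}",
            "    User root",
            f"    Port {port}",
            "    StrictHostKeyChecking no",
        ]
    )


# Example using the tunnel address from the Cell 4 output above
host, port = parse_tunnel_url("tcp://8.tcp.ngrok.io:15193")
print(f"ssh root@{host} -p {port}")
print(ssh_config_entry(host, port))
```

Since the ngrok host and port can differ between sessions, the `HostName` and `Port` entries in `~/.ssh/config` need to be updated whenever the tunnel is restarted.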