# AutoTSP: run non-synthetic datasets with 30s timeout on Colab

This notebook clones the repo, installs dependencies, ingests the TSPLIB-style datasets in `Instance Datasets/datasets` into a JSONL problem file, and runs *all* algorithms with a 30-second timeout per algorithm per instance.

> Notes: The ingest step can produce a large JSONL file (hundreds of MB). To keep it manageable, cap the number of files with `LIMIT` or skip very large instances with `MAX_CITIES`. If `pyconcorde` fails to build on Colab, rerun the install cell after enabling GPU, or temporarily remove the Concorde solver from `SOLVER_SPECS`. The run cell also drops `concorde_exact` automatically when the `concorde` module cannot be imported.

To skip the ingest step, upload a prebuilt JSONL to the path set in `PROBLEMS` (e.g., `/content/problems.jsonl`). The ingest cell detects the existing file and exits early, and the run cell uses it directly.
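
Because the problem file can reach hundreds of MB, it can help to check its size and record count before starting a long run. The sketch below assumes one JSON record per non-empty line; `jsonl_stats` is a hypothetical helper for illustration and is not part of the AutoTSP repo.

```python
import os


def jsonl_stats(path):
    """Return (size_in_mb, record_count) for a JSONL file, streaming line by line."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    with open(path, "r", encoding="utf-8") as fh:
        # Count non-empty lines; each one is assumed to hold a single JSON record.
        records = sum(1 for line in fh if line.strip())
    return size_mb, records


if __name__ == "__main__":
    path = "/content/problems.jsonl"  # same default as PROBLEMS in the parameters cell
    if os.path.exists(path):
        size_mb, records = jsonl_stats(path)
        print(f"{path}: {size_mb:.1f} MB, {records} records")
    else:
        print(f"{path} not found; the ingest cell will generate it")
```

The file is streamed rather than loaded whole, so the check stays cheap in memory even for large inputs.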


In [None]:
# Clone the repository (start from /content so rerunning this cell does not nest clones)
%cd /content
!rm -rf AutoTSP
!git clone https://github.com/OzDuys/AutoTSP.git
%cd AutoTSP
!git status -s

In [None]:
# Install dependencies (best-effort for pyconcorde)
!python -m pip install --upgrade pip
# Install everything except pyconcorde first (keeps failures from stopping the rest)
!grep -v pyconcorde requirements.txt | python -m pip install -r /dev/stdin
# Try to install pyconcorde; if it fails, we'll skip Concorde in the run step
!python -m pip install git+https://github.com/jvkersch/pyconcorde || echo "pyconcorde failed; Concorde solver will be skipped"

In [None]:
# Parameters
PROBLEMS = "/content/problems.jsonl"  # set to your uploaded full dataset JSONL
RESULTS = "Instance-Algorithm Datasets/Full Dataset/results_nonsynth_colab.jsonl"
TIME_LIMIT = 30.0  # seconds per algorithm per instance

# Optional caps if you decide to regenerate instead of using the uploaded file. Set to "" to disable.
LIMIT = ""       # e.g., 500 to ingest only first 500 files
MAX_CITIES = ""  # e.g., 5000 to skip instances with >5000 cities


In [None]:
%%bash -s "$PROBLEMS" "$LIMIT" "$MAX_CITIES"
# Ingest TSPLIB-style datasets into JSONL (skips if the output file already exists).
# The %%bash magic must be the first line of the cell, so comments go below it.
PROBLEMS_PATH="$1"
LIMIT="$2"
MAX_CITIES="$3"

set -e
if [ -f "$PROBLEMS_PATH" ]; then
  echo "Found $PROBLEMS_PATH; skipping ingest."
  exit 0
fi

ARGS=(--root "Instance Datasets/datasets" --output "$PROBLEMS_PATH")
if [ -n "$LIMIT" ]; then ARGS+=(--limit "$LIMIT"); fi
if [ -n "$MAX_CITIES" ]; then ARGS+=(--max-cities "$MAX_CITIES"); fi

python "Instance Datasets/ingest_instances_from_datasets.py" "${ARGS[@]}"
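
The bash logic above appends `--limit` and `--max-cities` only when the corresponding parameter is non-empty. For readers who prefer to stay in Python, the same idea can be sketched as follows. `build_ingest_args` is a hypothetical helper; only the flag names and the dataset root are taken from the cell above.

```python
def build_ingest_args(problems_path, limit="", max_cities=""):
    """Mirror the bash ARGS array: optional flags are added only when set."""
    args = ["--root", "Instance Datasets/datasets", "--output", problems_path]
    if str(limit) != "":
        args += ["--limit", str(limit)]
    if str(max_cities) != "":
        args += ["--max-cities", str(max_cities)]
    return args


# Example: cap ingestion at 500 files and leave MAX_CITIES disabled.
print(build_ingest_args("/content/problems.jsonl", limit=500))
```

The returned list could then be passed to `subprocess.run([sys.executable, "Instance Datasets/ingest_instances_from_datasets.py", *args], check=True)`, which avoids shell quoting issues with the space in the script path.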


In [None]:
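%%bash -s "$PROBLEMS" "$RESULTS" "$TIME_LIMIT"
# Run algorithms with a timeout of TIME_LIMIT seconds (30s by default) per algorithm per instance.
# If pyconcorde failed to build, Concorde will be excluded automatically.
# The %%bash magic must be the first line of the cell, so comments go below it.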
PROBLEMS_PATH="$1"
RESULTS_PATH="$2"
TIME_LIMIT="$3"

set -e

# Build algorithm list, dropping concorde_exact if pyconcorde is unavailable
ALGOS=$(python - <<'PY'
import importlib
from AutoTSP import SOLVER_SPECS
has_concorde = True
try:
    importlib.import_module("concorde")
except Exception:
    has_concorde = False
algos = []
for name in SOLVER_SPECS:
    if not has_concorde and name == "concorde_exact":
        continue
    algos.append(name)
print(" ".join(algos))
PY
)
echo "Running algorithms: $ALGOS"

python "Instance-Algorithm Datasets/run_algorithms.py" \
  --problems "$PROBLEMS_PATH" \
  --results "$RESULTS_PATH" \
  --time-limit "$TIME_LIMIT" \
  --algorithms $ALGOS \
  --overwrite
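
The filtering step above can be generalized to any solver with an optional dependency. The sketch below is one way to do it, under stated assumptions: `available_algorithms` and the solver names other than `concorde_exact` are hypothetical. It uses `importlib.util.find_spec`, which only checks that a module can be located, whereas the cell above actually imports `concorde` and therefore also catches a build that installs but fails to load.

```python
import importlib.util


def available_algorithms(solver_names, optional_deps):
    """Keep solvers whose optional top-level dependency (if any) can be located."""
    kept = []
    for name in solver_names:
        module = optional_deps.get(name)
        if module is not None and importlib.util.find_spec(module) is None:
            continue  # dependency missing: skip this solver
        kept.append(name)
    return kept


# "nearest_neighbor" and "two_opt" are placeholder names for illustration only.
names = ["nearest_neighbor", "two_opt", "concorde_exact"]
print(" ".join(available_algorithms(names, {"concorde_exact": "concorde"})))
```

Printing the names space-separated matches how the cell above feeds `$ALGOS` to `--algorithms`.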

In [None]:
# Inspect a few result rows
import json
from itertools import islice

results_path = RESULTS  # defined in the parameters cell, so the path stays in sync with the run step
with open(results_path, "r", encoding="utf-8") as fh:
    for row in islice(fh, 5):
        print(json.loads(row))
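
Printing raw rows shows the data but not its overall shape. As a complementary check, the sketch below counts how often each top-level key appears in the first rows, without assuming any particular field names. It does assume that every non-empty line is a JSON object; `key_counts` is a hypothetical helper, not part of the repo.

```python
import json
from collections import Counter
from itertools import islice


def key_counts(path, max_rows=1000):
    """Count how often each top-level key appears in the first max_rows lines."""
    counts = Counter()
    with open(path, "r", encoding="utf-8") as fh:
        for line in islice(fh, max_rows):
            if line.strip():
                # Each record is assumed to be a JSON object (dict).
                counts.update(json.loads(line).keys())
    return counts
```

Calling `key_counts(results_path).most_common()` after the run lists the fields present in the results file. A key that appears in fewer rows than the others may point to records that were written differently, which is worth inspecting by hand.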