# Best Routes Extractor (Top Profit per Day)

This notebook scans a folder of CSV files named like:

- `routes_<HUB>_<AIRCRAFT>_<TRIPS>.csv`  
  Example: `routes_LIM_a388_2.csv`
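
As a quick illustration, the naming scheme can be parsed with a regular expression. The sketch below reuses the pattern that the notebook defines as `FILENAME_RE` in the core-functions cell; the helper name `parse_route_filename` is hypothetical and exists only for this example.

```python
import re

# Same pattern as FILENAME_RE in the core-functions cell.
FILENAME_RE = re.compile(
    r"^routes_(?P<hub>[^_]+)_(?P<aircraft>[^_]+)_(?P<trips>\d+)\.csv$",
    re.IGNORECASE,
)


def parse_route_filename(name):
    """Return (hub, aircraft, trips) for a matching filename, else None.

    Hypothetical helper for illustration; not part of the notebook.
    """
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    return m.group("hub"), m.group("aircraft"), int(m.group("trips"))


print(parse_route_filename("routes_LIM_a388_2.csv"))  # ('LIM', 'a388', 2)
print(parse_route_filename("notes.txt"))              # None
```

Because the pattern is compiled with `re.IGNORECASE`, a name such as `ROUTES_sin_b748f_3.CSV` is accepted as well.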

It reads **only the first N rows of each file** (default 50, not counting the header) and computes, for each route:

- **profit_per_day** = `profit_pt * trips_pd_pa`
- **hours_per_trip** = `24 / trips_pd_pa`

It then exports the **Top K routes** by `profit_per_day` (function default 500; the configuration cell sets `TOP_K = 60`) to a single CSV.
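
To make the two formulas concrete, here is a small worked example. The numbers are made up for illustration and are not taken from any route file, and `route_metrics` is a hypothetical helper name.

```python
from typing import Tuple


def route_metrics(profit_pt: float, trips_pd_pa: float) -> Tuple[float, float]:
    """Return (profit_per_day, hours_per_trip) for one route."""
    if trips_pd_pa == 0:
        # The notebook drops such rows; here we simply refuse them.
        raise ValueError("trips_pd_pa must be non-zero")
    profit_per_day = profit_pt * trips_pd_pa
    hours_per_trip = 24 / trips_pd_pa
    return profit_per_day, hours_per_trip


# Illustrative values: 1,000,000 profit per trip, 2 trips per day.
print(route_metrics(1_000_000, 2))  # (2000000, 12.0)
```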

---


In [1]:
# Install / import dependencies
# If needed: !pip install pandas

from __future__ import annotations

import re
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Optional

import pandas as pd


A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.2.6 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\leand\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\ipykernel_l

AttributeError: _ARRAY_API not found
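
The warning above points to a mismatch between the installed NumPy and a module compiled against NumPy 1.x. Before downgrading or upgrading anything, it can help to see which versions are actually installed. The sketch below uses only the standard library (`importlib.metadata`), so it does not need to import NumPy or pandas; the package list and the helper name `installed_versions` are assumptions for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print(installed_versions(["numpy", "pandas"]))
```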



In [2]:
# Configuration (edit these values, or call functions directly)

BASE_FOLDER = Path.cwd()
INPUT_FOLDER = BASE_FOLDER  # change to your folder
AIRCRAFT_FILTER = "b748f"            # e.g. "a388" or None
TRIPS_FILTER = None                 # e.g. 3 or None
LINES_PER_FILE = 50                 # first N lines only
TOP_K = 60                         # best routes to keep
# Fall back to "all" so the filename is not "best_routes_None.csv" when no filter is set
OUTPUT_CSV = BASE_FOLDER / f"best_routes_{AIRCRAFT_FILTER or 'all'}.csv"


In [3]:
# Core functions

FILENAME_RE = re.compile(
    r"^routes_(?P<hub>[^_]+)_(?P<aircraft>[^_]+)_(?P<trips>\d+)\.csv$",
    re.IGNORECASE,
)

# Columns that always exist (for both passenger & cargo aircraft)
BASE_REQUIRED_COLUMNS = [
    "dest.id",
    "dest.name",
    "dest.country",
    "stop.name",
    "stop.country",
    "trips_pd_pa",
    "profit_pt",
]

# Passenger aircraft columns (most aircraft)
PAX_COLUMNS = ["cfg.y", "cfg.j", "cfg.f", "tkt.y", "tkt.j", "tkt.f"]

# Cargo aircraft columns (aircraft name ends with 'f' or 'F', e.g. b748f)
CARGO_COLUMNS = ["cfg.l", "cfg.h", "tkt.l", "tkt.h"]


def normalize_aircraft_name(raw: str) -> str:
    """
    Normalize aircraft name to determine if it is cargo.

    Examples:
      'a388[sfc]' -> 'a388'
      'b748f'     -> 'b748f'
      'B748F[SFC]'-> 'B748F'
    """
    if raw is None:
        return ""
    raw = str(raw).strip()
    # remove bracket suffix like [sfc]
    return re.sub(r"\[.*\]$", "", raw).strip()


def is_cargo_aircraft(aircraft_name: str) -> bool:
    a = normalize_aircraft_name(aircraft_name)
    return bool(a) and a[-1].lower() == "f"


def required_columns_for_aircraft(aircraft_name: str) -> list[str]:
    return BASE_REQUIRED_COLUMNS + (CARGO_COLUMNS if is_cargo_aircraft(aircraft_name) else PAX_COLUMNS)


def output_columns_for_aircraft(aircraft_name: str) -> list[str]:
    # Keep the output schema aligned to the detected aircraft type
    seat_cols = CARGO_COLUMNS if is_cargo_aircraft(aircraft_name) else PAX_COLUMNS
    return [
        "HUB",
        "dest.id",
        "dest.name",
        "dest.country",
        "stop.name",
        "stop.country",
        *seat_cols,
        "hours_per_trip",   # 24 / trips_pd_pa
        "profit_per_day",   # profit_pt * trips_pd_pa
    ]


@dataclass(frozen=True)
class RouteFileMeta:
    path: Path
    hub: str
    aircraft: str
    trips_in_name: int


def iter_route_files(
    folder: Path,
    aircraft_filter: Optional[str] = None,
    trips_filter: Optional[int] = None,
) -> Iterable[RouteFileMeta]:
    """Yield metadata for all matching routes_*.csv files in folder."""
    folder = Path(folder)
    for p in folder.iterdir():
        if not p.is_file():
            continue
        m = FILENAME_RE.match(p.name)
        if not m:
            continue

        hub = m.group("hub")
        aircraft = m.group("aircraft")
        trips_in_name = int(m.group("trips"))

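        # Match the filter against the parsed aircraft field only, so a filter
        # such as "lim" cannot accidentally match the hub part of the filename.
        if aircraft_filter and aircraft_filter.lower() not in aircraft.lower():
            continue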
        if trips_filter is not None and trips_in_name != trips_filter:
            continue

        yield RouteFileMeta(path=p, hub=hub, aircraft=aircraft, trips_in_name=trips_in_name)


def read_top_lines_csv(path: Path, nrows: int = 50) -> pd.DataFrame:
    """Read only the first `nrows` rows from a CSV."""
    return pd.read_csv(path, nrows=nrows)


def compute_profit_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Compute hours_per_trip and profit_per_day (returns a copy)."""
    out = df.copy()

    # numeric coercion
    out["trips_pd_pa"] = pd.to_numeric(out["trips_pd_pa"], errors="coerce")
    out["profit_pt"] = pd.to_numeric(out["profit_pt"], errors="coerce")

    # drop invalid rows
    out = out.dropna(subset=["trips_pd_pa", "profit_pt"])
    out = out[out["trips_pd_pa"] != 0]

    out["hours_per_trip"] = 24.0 / out["trips_pd_pa"]
    out["profit_per_day"] = out["profit_pt"] * out["trips_pd_pa"]

    return out


def extract_best_routes(
    folder: Path,
    aircraft_filter: Optional[str] = None,
    trips_filter: Optional[int] = None,
    lines_per_file: int = 50,
    top_k: int = 500,
) -> pd.DataFrame:
    """Return a DataFrame with the top K routes by profit_per_day.

    The notebook auto-detects cargo aircraft if the aircraft name ends with 'f' or 'F'
    (e.g. b748f). In that case it uses cfg.l/cfg.h and tkt.l/tkt.h instead of cfg.y/j/f
    and tkt.y/j/f.
    """
    folder = Path(folder)
    all_rows = []
    matched_files = 0
    detected_aircraft_type: Optional[bool] = None  # True=cargo, False=pax
    detected_aircraft_name: Optional[str] = None

    for meta in iter_route_files(folder, aircraft_filter=aircraft_filter, trips_filter=trips_filter):
        matched_files += 1

        # Enforce a single aircraft type in a single run (recommended)
        this_is_cargo = is_cargo_aircraft(meta.aircraft)
        if detected_aircraft_type is None:
            detected_aircraft_type = this_is_cargo
            detected_aircraft_name = meta.aircraft
        elif detected_aircraft_type != this_is_cargo:
            raise ValueError(
                "Mixed passenger/cargo route files detected in the same run. "
                "Filter by aircraft so you only process one type at a time."
            )

        try:
            df = read_top_lines_csv(meta.path, nrows=lines_per_file)
        except Exception as e:
            print(f"[WARN] Could not read {meta.path.name}: {e}")
            continue

        required = required_columns_for_aircraft(meta.aircraft)
        missing = [c for c in required if c not in df.columns]
        if missing:
            kind = "CARGO" if this_is_cargo else "PAX"
            print(f"[WARN] Skipping {meta.path.name} ({kind} missing columns: {missing})")
            continue

        df = df[required].copy()
        df.insert(0, "HUB", meta.hub)

        df = compute_profit_metrics(df)
        all_rows.append(df)

    if matched_files == 0:
        raise ValueError(
            "No files matched the pattern routes_<HUB>_<AIRCRAFT>_<TRIPS>.csv "
            "with your current filters."
        )

    if not all_rows:
        raise ValueError("Files matched, but no usable rows were found (read/column issues).")

    big = pd.concat(all_rows, ignore_index=True)
    big = big.sort_values("profit_per_day", ascending=False).head(top_k)

    # Output columns depend on detected aircraft type
    out_cols = output_columns_for_aircraft(detected_aircraft_name or (aircraft_filter or ""))
    return big[out_cols].copy()


def export_best_routes_to_csv(
    folder: Path,
    output_csv: Path,
    aircraft_filter: Optional[str] = None,
    trips_filter: Optional[int] = None,
    lines_per_file: int = 50,
    top_k: int = 500,
) -> Path:
    """Extract best routes and export them to a CSV. Returns output path."""
    df = extract_best_routes(
        folder=folder,
        aircraft_filter=aircraft_filter,
        trips_filter=trips_filter,
        lines_per_file=lines_per_file,
        top_k=top_k,
    )

    output_csv = Path(output_csv)
    output_csv.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(output_csv, index=False)
    return output_csv
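
The passenger/cargo switch rests entirely on the naming rule described in the docstrings: strip a trailing bracket suffix such as `[sfc]`, then check whether the name ends in `f`. The standalone sketch below restates `normalize_aircraft_name` and `is_cargo_aircraft` in compact form so the rule can be tried without the rest of the notebook. The list of sample codes is made up for the example.

```python
import re


def normalize_aircraft_name(raw) -> str:
    """Strip whitespace and a trailing bracket suffix such as '[sfc]'."""
    if raw is None:
        return ""
    return re.sub(r"\[.*\]$", "", str(raw).strip()).strip()


def is_cargo_aircraft(aircraft_name) -> bool:
    """Cargo is assumed when the normalized name ends with 'f' or 'F'."""
    name = normalize_aircraft_name(aircraft_name)
    return bool(name) and name[-1].lower() == "f"


for code in ["a388", "a388[sfc]", "b748f", "B748F[SFC]", ""]:
    kind = "cargo" if is_cargo_aircraft(code) else "pax"
    print(f"{code!r:14} -> {kind}")
```

One consequence of this rule, offered as an observation rather than something the notebook documents: any passenger aircraft whose code happened to end in `f` would be treated as cargo.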


## Example usage (run these cells)

You can call the functions from other code cells, for example:
- `extract_best_routes(...)` returns a DataFrame
- `export_best_routes_to_csv(...)` writes the CSV
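
For a feel of what `extract_best_routes` does without needing real route files or pandas, the sketch below walks through the same steps on in-memory CSV text using only the standard library: read the first N rows, compute `profit_per_day`, pool the rows from all hubs, and keep the top K. The sample data, the `top_routes` name, and the use of `csv` and `heapq` are assumptions made for illustration; the notebook itself uses pandas.

```python
import csv
import heapq
import io
from itertools import islice

# Made-up sample data: hub -> CSV text (not real game values).
SAMPLE_FILES = {
    "AAA": "dest.name,trips_pd_pa,profit_pt\nAlpha,2,100\nBravo,2,300\nCharlie,2,200\n",
    "BBB": "dest.name,trips_pd_pa,profit_pt\nDelta,3,150\nEcho,0,999\n",
}


def top_routes(files, lines_per_file=50, top_k=3):
    """Pool the first `lines_per_file` rows per hub and return the top K."""
    rows = []
    for hub, text in files.items():
        reader = csv.DictReader(io.StringIO(text))
        for rec in islice(reader, lines_per_file):
            try:
                trips = float(rec["trips_pd_pa"])
                profit = float(rec["profit_pt"])
            except (TypeError, ValueError):
                continue  # skip rows with non-numeric values
            if trips == 0:
                continue  # avoid division by zero, as the notebook does
            rows.append({
                "HUB": hub,
                "dest.name": rec["dest.name"],
                "hours_per_trip": 24.0 / trips,
                "profit_per_day": profit * trips,
            })
    return heapq.nlargest(top_k, rows, key=lambda r: r["profit_per_day"])


for r in top_routes(SAMPLE_FILES):
    print(r["HUB"], r["dest.name"], r["profit_per_day"])
# AAA Bravo 600.0
# BBB Delta 450.0
# AAA Charlie 400.0
```

With `lines_per_file=2`, Charlie is never read, so Alpha takes third place instead. The same holds for the notebook: only rows within the first N of each file can reach the final ranking.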


In [4]:
# Example 1: Get a DataFrame (top routes) and preview it

df_best = extract_best_routes(
    folder=INPUT_FOLDER,
    aircraft_filter=AIRCRAFT_FILTER,
    trips_filter=TRIPS_FILTER,
    lines_per_file=LINES_PER_FILE,
    top_k=TOP_K,
)

df_best.head(10)


[WARN] Could not read routes_LIM_b748f_2.csv: Error tokenizing data. C error: Expected 1 fields in line 27, saw 2



|     | HUB | dest.id | dest.name | dest.country | stop.name | stop.country | cfg.l | cfg.h | tkt.l | tkt.h | hours_per_trip | profit_per_day |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 150 | SIN | 3077 | Puerto Asís | Colombia | Norway House | Canada | 93 | 7 | 21.59 | 15.05 | 12.0 | 7625855.0 |
| 151 | SIN | 3114 | Pasto | Colombia | Grande Prairie | Canada | 100 | 0 | 21.459999 | 14.96 | 12.0 | 7603898.0 |
| 152 | SIN | 3060 | Macas | Ecuador | Maré | New Caledonia | 89 | 11 | 21.540001 | 15.02 | 12.0 | 7591305.0 |
| 100 | PEK | 2902 | Santa Rosa | Argentina | Richmond | United States | 100 | 0 | 21.4 | 14.91 | 12.0 | 7581079.0 |
| 153 | SIN | 3167 | Iquitos | Perú | Port Elizabeth | South Africa | 100 | 0 | 21.389999 | 14.91 | 12.0 | 7577078.0 |
| 0 | HKG | 2872 | Salta | Argentina | Denham | Australia | 88 | 12 | 21.51 | 14.99 | 12.0 | 7575280.0 |
| 154 | SIN | 3113 | Popayán | Colombia | Kasabonika | Canada | 100 | 0 | 21.360001 | 14.89 | 12.0 | 7565620.0 |
| 155 | SIN | 3093 | Florencia | Colombia | Charlo | Canada | 83 | 17 | 21.469999 | 14.96 | 12.0 | 7540552.0 |
| 50 | ICN | 2904 | Tandil | Argentina | Isla De Pascua | Chile | 100 | 0 | 21.290001 | 14.83 | 12.0 | 7539146.0 |
| 101 | PEK | 2896 | Tres Arroyos | Argentina | Kavala | Greece | 82 | 18 | 21.469999 | 14.97 | 12.0 | 7537457.0 |


In [5]:
# Example 2: Export to CSV

out_path = export_best_routes_to_csv(
    folder=INPUT_FOLDER,
    output_csv=OUTPUT_CSV,
    aircraft_filter=AIRCRAFT_FILTER,
    trips_filter=TRIPS_FILTER,
    lines_per_file=LINES_PER_FILE,
    top_k=TOP_K,
)

out_path


[WARN] Could not read routes_LIM_b748f_2.csv: Error tokenizing data. C error: Expected 1 fields in line 27, saw 2



WindowsPath('c:/Users/leand/OneDrive/Bureau/PERSO/Airline Manager 4/best_routes_b748f.csv')

## Notes / customization

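- If you set `TRIPS_FILTER = 3`, only files whose name ends with `_3.csv` are selected.
- If you leave `TRIPS_FILTER = None`, files for every trip count are included for the selected aircraft.
- If you set `AIRCRAFT_FILTER = None`, every aircraft found in the filenames is included. A run that mixes passenger and cargo files raises a `ValueError`, so this only works when all matched files are of one type.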
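
The effect of the two filters can be previewed on a plain list of filenames. The sketch below is a simplified stand-in for `iter_route_files`: it works on strings rather than a folder, and it matches the aircraft filter against the parsed aircraft field. The filenames and the `select_files` name are made up for the example.

```python
import re

FILENAME_RE = re.compile(
    r"^routes_(?P<hub>[^_]+)_(?P<aircraft>[^_]+)_(?P<trips>\d+)\.csv$",
    re.IGNORECASE,
)


def select_files(names, aircraft_filter=None, trips_filter=None):
    """Return the names that match the pattern and both optional filters."""
    selected = []
    for name in names:
        m = FILENAME_RE.match(name)
        if m is None:
            continue  # not a routes_<HUB>_<AIRCRAFT>_<TRIPS>.csv file
        if aircraft_filter and aircraft_filter.lower() not in m.group("aircraft").lower():
            continue
        if trips_filter is not None and int(m.group("trips")) != trips_filter:
            continue
        selected.append(name)
    return selected


NAMES = [
    "routes_LIM_a388_2.csv",
    "routes_LIM_a388_3.csv",
    "routes_SIN_b748f_2.csv",
    "readme.txt",
]

print(select_files(NAMES, aircraft_filter="a388", trips_filter=3))
# ['routes_LIM_a388_3.csv']
print(select_files(NAMES, aircraft_filter="a388"))
# ['routes_LIM_a388_2.csv', 'routes_LIM_a388_3.csv']
print(select_files(NAMES))
# ['routes_LIM_a388_2.csv', 'routes_LIM_a388_3.csv', 'routes_SIN_b748f_2.csv']
```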
