# Active Scans Report (BMC Discovery)

This notebook fetches discovery run activity from a BMC Discovery appliance using the REST API and loads the results into pandas for inspection and export.

## Requirements

We use `requests` for HTTP, `pandas` for tabular data, and `PyYAML` to read configuration. If they are not installed, uncomment the `%pip install` line in the cell below.

In [None]:
# If needed, install dependencies
# %pip install -q requests pandas pyyaml

import pandas as pd
import requests
import yaml
from pathlib import Path
from urllib.parse import urljoin
import json, os

## Select Appliance (optional)

If your `config.yaml` defines multiple appliances under the `appliances:` list, set `APPLIANCE_NAME` to one of their names (recommended) or set `APPLIANCE_INDEX` to pick by position. Leave both as-is to default to the first appliance.
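
The selection rule described above can be sketched as a small standalone function. This is a minimal illustration, not part of the notebook: `pick_appliance` is a hypothetical helper name, and the appliance entries and hostnames are made up.

```python
from typing import Optional

def pick_appliance(apps: list, name: Optional[str] = None, index: int = 0) -> Optional[dict]:
    """Return the entry chosen by name, else by index, else the first entry."""
    if not apps:
        return None  # no appliances list: settings are read from the top level
    if name:
        match = next((a for a in apps if a.get('name') == name), None)
        if match is None:
            raise ValueError(f"No appliance named '{name}'")
        return match
    try:
        return apps[int(index)]
    except (IndexError, TypeError, ValueError):
        return apps[0]  # an unusable index falls back to the first entry

# Hypothetical entries, for illustration only
apps = [
    {'name': 'prod', 'target': 'discovery-prod.example.com'},
    {'name': 'dev', 'target': 'discovery-dev.example.com'},
]
print(pick_appliance(apps, name='dev')['target'])
print(pick_appliance(apps, index=5)['name'])
```

Selecting by name is less fragile than selecting by index, because reordering the list in `config.yaml` does not change which appliance is picked.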

In [None]:
APPLIANCE_NAME = None   # e.g., 'prod' or 'dev'
APPLIANCE_INDEX = 0     # integer index if not using name selection
#APPLIANCE_INDEX = 1

## Configuration (from config.yaml)

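This cell reads settings from `config.yaml`, which it finds by searching upward from the current working directory (normally the project root, one level above the `notebooks/` directory). Each key may appear at the top level or inside the selected entry of `appliances:`; the per-appliance value wins. Expected keys:
- `target`: hostname or base URL of the Discovery appliance (scheme optional)
- `token` or `token_file`: API token string or path to a file containing it (the `Bearer ` prefix is optional)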
- `api_version` (optional): defaults to `v1.14` if not provided
- `verify_ssl` (optional): boolean, defaults to `True`

The report CSV is saved to `output_<target>` in the project root, matching the CLI tool's layout.
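
As a rough sketch of how these settings resolve, the snippet below prefers a per-appliance value, then a top-level value, then a default, and builds the `output_<target>` folder name by replacing `.`, `:` and `/` with `_`. The helper names `resolve_setting` and `output_folder_name` and the sample values are assumptions for illustration; the dictionaries stand in for what `yaml.safe_load` might return.

```python
def resolve_setting(selected, cfg, key, default=None):
    """Prefer the selected appliance's value, then the top-level value, then the default."""
    value = (selected or {}).get(key)
    if value is None:
        value = cfg.get(key)
    return default if value is None else value

def output_folder_name(target: str) -> str:
    """Build the output_<target> folder name from a hostname or URL."""
    sanitized = target.replace('.', '_').replace(':', '_').replace('/', '_')
    return f'output_{sanitized}'

# Hypothetical configuration, for illustration only
cfg = {'api_version': 'v1.14', 'verify_ssl': True}
selected = {'name': 'dev', 'target': 'discovery-dev.example.com', 'verify_ssl': False}

print(resolve_setting(selected, cfg, 'verify_ssl', True))
print(resolve_setting(selected, cfg, 'api_version', 'v1.14'))
print(output_folder_name(selected['target']))
```

Testing for `None` rather than truthiness keeps an explicit `verify_ssl: false` on the appliance from being overridden by a top-level `true`.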

In [None]:
# Locate the project root (the directory containing config.yaml) by walking up
# from the current working directory, without using Path.resolve()
def _find_repo_root(start: Path) -> Path:
    for p in [start] + list(start.parents):
        if (p / 'config.yaml').exists():
            return p
    # Fallback to parent of CWD (useful when running from notebooks/)
    return start.parent

repo_root = _find_repo_root(Path.cwd())
config_path = repo_root / 'config.yaml'
if not config_path.exists():
    raise FileNotFoundError(f'config.yaml not found at {config_path}')

with open(config_path, 'r') as fh:
    cfg = yaml.safe_load(fh) or {}

# Select appliance from list if present
selected = None
apps = cfg.get('appliances') or []
if isinstance(apps, list) and apps:
    if APPLIANCE_NAME:
        selected = next((a for a in apps if a.get('name') == APPLIANCE_NAME), None)
        if selected is None:
            raise ValueError(f"No appliance named '{APPLIANCE_NAME}' in config.yaml")
    else:
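        try:
            selected = apps[int(APPLIANCE_INDEX)]
        except (IndexError, TypeError, ValueError):
            # Invalid or out-of-range index: fall back to the first appliance
            print(f'APPLIANCE_INDEX={APPLIANCE_INDEX!r} is not usable; using the first appliance')
            selected = apps[0]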

# Resolve target and base URL
target = ((selected or {}).get('target') or cfg.get('target') or '').strip()
if not target:
    raise ValueError('config.yaml missing "target"')
BASE_URL = target if ('://' in target) else f'https://{target}'

# Resolve token or token file
token = (((selected or {}).get('token') or cfg.get('token') or '').strip())
token_file = (selected or {}).get('token_file') or cfg.get('token_file') or cfg.get('f_token')
if not token and token_file:
    tf_path = Path(token_file)
    if not tf_path.is_absolute():
        tf_path = repo_root / tf_path
    with open(tf_path, 'r') as tf:
        token = tf.read().strip()
if not token:
    raise ValueError('API token not found in config.yaml (token or token_file)')

API_VERSION = str((selected or {}).get('api_version') or cfg.get('api_version') or 'v1.14')
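_verify = (selected or {}).get('verify_ssl', cfg.get('verify_ssl', True))
if isinstance(_verify, str):
    # A quoted value such as "false" is a non-empty string, so bool() alone would treat it as True
    _verify = _verify.strip().lower() not in ('false', '0', 'no', 'off')
VERIFY_SSL = bool(_verify)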

# Prepare output directory consistent with CLI naming
sanitized = target.replace('.', '_').replace(':', '_').replace('/', '_')
output_dir = repo_root / f'output_{sanitized}'
output_dir.mkdir(parents=True, exist_ok=True)

print('Appliance     :', (selected or {}).get('name', '(single)'))
print('Base URL      :', BASE_URL)
print('API Version   :', API_VERSION)
print('Verify SSL    :', VERIFY_SSL)
print('Output folder :', output_dir)

## Create a session and helper functions

Set up a `requests` session with the Authorization header. Add small helpers to build API URLs and safely parse JSON responses.
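
The two helpers can be illustrated without a live appliance. The sketch below uses only the standard library; `build_api_url` and `auth_header` are hypothetical names, and the hostname and token are placeholders. It shows why the base needs a trailing slash and the path must not start with one: `urljoin` treats a leading `/` as "replace the whole path".

```python
from urllib.parse import urljoin

def build_api_url(base_url: str, api_version: str, path: str) -> str:
    """Join the appliance base URL, the API version prefix and an endpoint path."""
    base = base_url.rstrip('/') + f'/api/{api_version}/'
    return urljoin(base, path.lstrip('/'))

def auth_header(token: str) -> str:
    """Accept a token with or without the 'Bearer ' prefix."""
    return token if token.lower().startswith('bearer ') else f'Bearer {token}'

# Placeholder host and token, for illustration only
print(build_api_url('https://discovery.example.com/', 'v1.14', '/discovery/runs'))
print(auth_header('abc123'))

# Without lstrip('/'), the leading slash would discard the /api/<version>/ prefix
print(urljoin('https://discovery.example.com/api/v1.14/', '/discovery/runs'))
```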

In [None]:
session = requests.Session()
# Allow token with or without 'Bearer ' prefix
auth_value = token if token.lower().startswith('bearer ') else f'Bearer {token}'
session.headers.update({
    'Authorization': auth_value,
    'Accept': 'application/json'
})
session.verify = VERIFY_SSL

def api_url(path: str) -> str:
    base = BASE_URL.rstrip('/') + f'/api/{API_VERSION}/'
    return urljoin(base, path.lstrip('/'))

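def get_json(url: str, **kwargs):
    # Default timeout so a stalled appliance cannot hang the notebook indefinitely
    kwargs.setdefault('timeout', 30)
    resp = session.get(url, **kwargs)
    if resp.status_code != 200:
        print(f'Error {resp.status_code} fetching {url}: {resp.text[:200]}')
        return {}
    try:
        return resp.json()
    except ValueError as e:  # raised when the body is not valid JSON
        print(f'Failed to decode JSON: {e}')
        return {}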

## Fetch discovery runs

Call the `discovery/runs` endpoint, which lists discovery runs, and flatten the JSON response into a pandas DataFrame with `pd.json_normalize`.
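
To make the normalization step concrete, here is a pure-Python approximation of what happens to one response. `extract_records` mirrors the list-or-`results` handling, and `flatten` imitates how `pd.json_normalize` turns nested dictionaries into dotted column names. Both function names are hypothetical, the sample payload is invented, and the nested `details` field is an assumption rather than a documented part of the API response.

```python
def extract_records(raw):
    """Return the list of run records from a list response or a {'results': [...]} object."""
    if isinstance(raw, dict) and 'results' in raw:
        return raw['results']
    if isinstance(raw, list):
        return raw
    return []  # includes the empty dict returned after a failed request

def flatten(record: dict, prefix: str = '') -> dict:
    """Flatten nested dicts into dotted keys; lists and scalars are left as they are."""
    flat = {}
    for key, value in record.items():
        name = f'{prefix}{key}'
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f'{name}.'))
        else:
            flat[name] = value
    return flat

# Invented payload, for illustration only
sample = {'results': [{'label': 'nightly', 'finished': False, 'details': {'note': 'x'}}]}
rows = [flatten(r) for r in extract_records(sample)]
print(rows)
```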

In [None]:
runs_url = api_url('discovery/runs')
raw = get_json(runs_url)

# Handle either a list response or an object with a 'results' list
if isinstance(raw, dict) and 'results' in raw:
    records = raw['results']
elif isinstance(raw, list):
    records = raw
else:
    records = []

df = pd.json_normalize(records)
print(f'Total runs retrieved: {len(df)}')
df.head()

## Inspect common fields

Show a few relevant columns such as labels, timing and counts when present.

In [None]:
if not df.empty:
    cols = [c for c in ['label','starttime','outpost_name','done','total','finished','scan_level','scan_type'] if c in df.columns]
    display(df[cols].head(20) if cols else df.head(20))
else:
    print('No runs returned.')

## Filter in-progress runs

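Filter the DataFrame to runs whose `finished` field is `False`. Rows with a missing `finished` value are excluded, and the cell prints a message instead when the column is absent or the DataFrame is empty.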

In [None]:
if not df.empty and 'finished' in df.columns:
    in_progress = df.loc[df['finished'].eq(False)]
    print(f'In-progress runs: {len(in_progress)}')
    display(in_progress.head(10))
else:
    print('The dataset has no "finished" column or is empty.')

## Save to CSV (optional)

Persist the full dataset to the project output directory (`output_<target>`).

This cell formats the output to match the DisMAL CLI report for Active Scans by:
- Inserting a 'Discovery Instance' column as the first column.
- Casting numeric fields (done, pre_scanning, scanning, total) to integers when present.
- Sorting remaining columns alphabetically to mirror json2csv header ordering.
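
The column ordering rule above can be checked in isolation. The sketch below uses a hypothetical helper, `ordered_columns`, on a made-up column list. Note that Python's default string sort is case-sensitive, so capitalized names sort before lowercase ones; whether that matches the CLI output exactly for mixed-case headers is not verified here.

```python
def ordered_columns(columns, first='Discovery Instance'):
    """Put the instance column first and sort the remaining names alphabetically."""
    rest = sorted(c for c in columns if c != first)
    return [first] + rest

# Made-up column list, for illustration only
print(ordered_columns(['total', 'label', 'Discovery Instance', 'done']))
```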

In [None]:
# Prepare output
df_out = df.copy()
# Cast numeric columns when present
for col in ['done', 'pre_scanning', 'scanning', 'total']:
    if col in df_out.columns:
        df_out[col] = pd.to_numeric(df_out[col], errors='coerce').astype('Int64')

# Insert 'Discovery Instance' first
df_out.insert(0, 'Discovery Instance', target)

# Reorder columns: 'Discovery Instance' + sorted remaining
other_cols = sorted([c for c in df_out.columns if c != 'Discovery Instance'])
df_out = df_out[['Discovery Instance'] + other_cols]

# Check Results
display(df_out.head(10))

In [None]:
# Save CSV
OUTPUT_CSV = str(output_dir / 'active_scans.csv')
df_out.to_csv(OUTPUT_CSV, index=False)
print(f'Saved to {OUTPUT_CSV}')

---
### Notes
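- If your appliance uses a self-signed certificate, set `verify_ssl: false` in `config.yaml`; the notebook reads it into `VERIFY_SSL`.
- If the appliance exposes a different API version, set `api_version` in `config.yaml`; the notebook reads it into `API_VERSION`.
- You can flatten additional nested fields with `pandas.json_normalize` or join the result with other datasets if needed.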