# Honeypot Results

This notebook reads the JSON-lines logfile written by the honeypot (default: `honeypot.log` in the project root), parses the connection entries, shows a table of recent events, and draws simple summaries (connections per port, top peers).

Instructions:
1. Ensure this notebook's current working directory is the project root (the folder that contains `honeypot.log` or the honeypot script). If not, change the working directory or update `logpath` in the next cell.
2. If no logfile exists yet, start the honeypot (for local testing you can run it bound to `127.0.0.1`) and then re-run the cells.

The cells below try to use pandas/matplotlib if available; they fall back to plain Python printing if not.
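
For local testing without a running honeypot, a small sample logfile can be generated by hand. The sketch below assumes a log format inferred only from the fields the cells below read (`timestamp`, `port`, `peer`); the real honeypot may write additional or differently shaped fields. The helper name `write_sample_log`, the file name `honeypot.sample.log`, and the peer addresses (taken from the reserved documentation ranges) are hypothetical.

```python
import json
import os
import tempfile
from datetime import datetime, timedelta, timezone


def write_sample_log(path, n=5):
    """Write n made-up connection entries as JSON lines and return them."""
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    ports = [2222, 8080]                                      # assumed example ports
    peers = ['192.0.2.10', '198.51.100.7', '203.0.113.5']     # documentation-only IPs
    entries = []
    for i in range(n):
        entries.append({
            'timestamp': (start + timedelta(seconds=30 * i)).isoformat(),
            'port': ports[i % len(ports)],
            'peer': peers[i % len(peers)],
        })
    # One JSON object per line, matching what the parsing cell expects
    with open(path, 'w', encoding='utf-8') as f:
        for entry in entries:
            f.write(json.dumps(entry) + '\n')
    return entries


# Write to the temp directory so a real honeypot.log is never overwritten
sample_path = os.path.join(tempfile.gettempdir(), 'honeypot.sample.log')
sample = write_sample_log(sample_path)
print(f'Wrote {len(sample)} sample entries to {sample_path}')
```

To try the notebook against this file, set `logpath = sample_path` before running the parsing cell.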

In [None]:
# Cell 2: load and parse honeypot logfile (JSON lines)
import os
import json

# Adjust this if your logfile is in a different place
project_root = os.path.abspath('.')
logpath = os.path.join(project_root, 'honeypot.log')

print('Project root:', project_root)
print('Looking for logfile at:', logpath)

if not os.path.exists(logpath):
    print('\nNo logfile found at the path above.')
    print('If you haven\'t run the honeypot yet, start it (e.g. `python scripts/honeypot.py --host 127.0.0.1 --ports 2222 --banner \\n`) and then re-run this cell.')
    entries = []
else:
    entries = []
    with open(logpath, 'r', encoding='utf-8') as f:
        for i, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                obj = json.loads(line)
            except Exception as e:
                print(f'Failed to parse line {i}:', e)
                # show beginning of the problematic line
                print(line[:400])
                continue
            entries.append(obj)
    print(f'Loaded {len(entries)} entries from {logpath}')

# Try to present using pandas if available
try:
    import pandas as pd
    if entries:
        df = pd.DataFrame(entries)
        # Normalize timestamp column if present
        if 'timestamp' in df.columns:
            df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')
        # Show basic info
        print('\nDataframe shape:', df.shape)
        display(df.head(50))
    else:
        print('No entries to display.')
except Exception as e:
    print('pandas not available or other error:', e)
    # Fallback: simple print of first entries
    from pprint import pprint
    pprint(entries[:20])

In [None]:
# Cell 3: summary statistics and simple plots (plots are skipped if matplotlib is missing)
try:
    import matplotlib.pyplot as plt
    plotting_ok = True
except Exception as e:
    print('matplotlib not available, skipping plots:', e)
    plotting_ok = False

if 'entries' not in globals() or not entries:
    print('No entries loaded; run the previous cell after you have honeypot logs.')
else:
    # Ports distribution
    ports = [e.get('port') for e in entries if 'port' in e]
    peers = [e.get('peer') for e in entries if 'peer' in e]

    from collections import Counter
    port_counts = Counter(ports)
    peer_counts = Counter(peers)

    print('\nConnections per port:')
    for port, cnt in port_counts.most_common():
        print(f'  {port}: {cnt}')

    print('\nTop peers:')
    for peer, cnt in peer_counts.most_common(10):
        print(f'  {peer}: {cnt}')

    if plotting_ok:
        try:
            fig, ax = plt.subplots(1, 2, figsize=(12, 4))
            # Ports bar
            ports_sorted = sorted(port_counts.items())
            if ports_sorted:
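                xs, ys = zip(*ports_sorted)
                # Use string labels so ports are evenly spaced categories
                # rather than thin bars at positions on a numeric axis
                ax[0].bar([str(x) for x in xs], ys)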
                ax[0].set_title('Connections per port')
                ax[0].set_xlabel('Port')
                ax[0].set_ylabel('Connections')
            else:
                ax[0].text(0.5, 0.5, 'No data', ha='center')

            # Top peers pie (if enough data)
            top_peers = peer_counts.most_common(6)
            if top_peers:
                labels, sizes = zip(*top_peers)
                ax[1].pie(sizes, labels=labels, autopct='%1.1f%%')
                ax[1].set_title('Top peers')
            else:
                ax[1].text(0.5, 0.5, 'No data', ha='center')

            plt.tight_layout()
            plt.show()
        except Exception as e:
            print('Plotting failed:', e)

In [None]:
# Cell 4: show raw last N entries (tail)
N = 20
if 'entries' in globals() and entries:
    tail = entries[-N:]
    try:
        import pandas as pd
        display(pd.DataFrame(tail))
    except Exception:
        from pprint import pprint
        pprint(tail)
else:
    print('No entries available to tail.')

## Next steps

- If you want the notebook to start the honeypot for you, I can add a cell that launches the honeypot as a background subprocess and then re-runs the parsing cells. That cell will warn about running processes and recommend using localhost binding or the firewall helper.
- I can also add a live-tail cell that polls the logfile and updates the display in real time.
- If you prefer a CSV/Excel export of the results, I can add that too.

Tell me which of these you'd like and I will add the corresponding cell.
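
As an illustration of the CSV export option, here is a minimal sketch using only the standard library, so it works without pandas. The function name `export_entries_csv`, the output file name, and the demo entries are hypothetical; in the notebook you would pass the `entries` list loaded by the parsing cell.

```python
import csv
import os
import tempfile


def export_entries_csv(entries, path):
    """Write dict entries to a CSV file; return the number of rows written."""
    # Skip anything that is not a JSON object (e.g. a stray list or string)
    rows = [e for e in entries if isinstance(e, dict)]
    # Collect the union of keys, preserving first-seen order
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, 'w', newline='', encoding='utf-8') as f:
        # restval fills in fields that a given entry does not have
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval='')
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Made-up entries for demonstration only
demo_entries = [
    {'timestamp': '2024-01-01T00:00:00+00:00', 'port': 2222, 'peer': '192.0.2.10'},
    {'port': 8080, 'peer': '198.51.100.7'},
]
csv_path = os.path.join(tempfile.gettempdir(), 'honeypot_export.csv')
written = export_entries_csv(demo_entries, csv_path)
print(f'Wrote {written} rows to {csv_path}')
```

Entries with missing fields get empty cells, and nested values (if the honeypot logs any) are written using their Python string form.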