# 03 — Geo Anomalies & VPN Masking

This notebook flags identity anomalies in authentication logs that may indicate location masking (VPN or proxy use) and correlates them with travel records.

**Data:** `id_auth.csv`, `travel_records.csv`

In [None]:
import pandas as pd
import numpy as np
from pathlib import Path

data_dir = Path('data')
auth = pd.read_csv(data_dir/'id_auth.csv', parse_dates=['timestamp'])
travel = pd.read_csv(data_dir/'travel_records.csv')

auth.head()

## Suspicious patterns

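1) **Impossible travel**: consecutive successful logins from different source countries within a short time window (2 hours in the cell below)
2) **Travel mismatch**: the travel system records foreign travel on a given date, but that day's logins “appear domestic” (e.g., VPN exit country = UK)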


In [None]:
a = auth[auth['user']=='engineer.a'].copy().sort_values('timestamp')
logins = a[a['event']=='login_success'].copy()

# Impossible travel: consecutive logins within 2 hours from different src_country
logins['prev_time'] = logins['timestamp'].shift(1)
logins['prev_country'] = logins['src_country'].shift(1)
logins['delta'] = (logins['timestamp'] - logins['prev_time'])

impossible = logins[(logins['delta']<=pd.Timedelta('2h')) & (logins['src_country']!=logins['prev_country'])]
impossible[['timestamp','src_country','prev_time','prev_country','delta','vpn_used','vpn_exit_country','asn','device']].head(20)
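
The cell above is scoped to a single user. If the same check is needed across every account, one option is to shift within each user group so that a login is only ever compared with that user's previous login. The following is a minimal sketch under stated assumptions: the sample rows and the helper name `flag_impossible_travel` are hypothetical, and only the column names (`user`, `event`, `timestamp`, `src_country`) are taken from the notebook.

```python
import pandas as pd

# Hypothetical sample rows; column names mirror those used in the cell above.
sample = pd.DataFrame({
    'user': ['engineer.a', 'engineer.a', 'engineer.a', 'analyst.b', 'analyst.b'],
    'event': ['login_success'] * 5,
    'timestamp': pd.to_datetime([
        '2024-01-01 08:00', '2024-01-01 09:00', '2024-01-01 15:00',
        '2024-01-01 08:30', '2024-01-01 09:15',
    ]),
    'src_country': ['UK', 'FR', 'FR', 'UK', 'UK'],
})

def flag_impossible_travel(df, window='2h'):
    """Return successful logins that follow a login from another country within `window`."""
    ok = df[df['event'] == 'login_success'].sort_values(['user', 'timestamp']).copy()
    # Shift inside each user group so rows are never compared across users.
    ok['prev_time'] = ok.groupby('user')['timestamp'].shift(1)
    ok['prev_country'] = ok.groupby('user')['src_country'].shift(1)
    ok['delta'] = ok['timestamp'] - ok['prev_time']
    hit = (
        ok['prev_country'].notna()                      # skip each user's first login
        & (ok['delta'] <= pd.Timedelta(window))
        & (ok['src_country'] != ok['prev_country'])
    )
    return ok[hit]

flagged = flag_impossible_travel(sample)
print(flagged[['user', 'timestamp', 'src_country', 'prev_country', 'delta']])
```

The explicit `notna()` guard makes the handling of each user's first login visible rather than relying on `NaT` comparisons evaluating to `False`.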

In [None]:
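# Travel mismatch: left-join logins to travel records on (date, user).
# travel['date'] is read as a plain string, so it must already be in YYYY-MM-DD form for the keys to match.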
logins['date'] = logins['timestamp'].dt.date.astype(str)
merged = logins.merge(travel, on=['date','user'], how='left')

mismatch = merged[(merged['travel_country'].notna()) & (merged['vpn_exit_country']=='UK') & (merged['travel_country']!='UK')]
mismatch[['timestamp','src_country','vpn_used','vpn_exit_country','travel_country','source','asn','device']].sort_values('timestamp')
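
The join above only matches when both `date` keys have the same string form and the travel table has at most one row per user and day. Neither is confirmed by the notebook, so the sketch below shows one way to make those assumptions explicit. The demo frames, the `DD/MM/YYYY` travel date format, and the `HOME_COUNTRY` constant are hypothetical; `validate` and `indicator` are standard `DataFrame.merge` parameters.

```python
import pandas as pd

HOME_COUNTRY = 'UK'  # assumption: the "domestic" country used in the filter above

# Hypothetical rows standing in for the notebook's logins and travel tables.
logins_demo = pd.DataFrame({
    'user': ['engineer.a', 'engineer.a'],
    'timestamp': pd.to_datetime(['2024-03-04 09:10', '2024-03-06 10:00']),
    'vpn_exit_country': ['UK', 'UK'],
})
travel_demo = pd.DataFrame({
    'user': ['engineer.a'],
    'date': ['04/03/2024'],  # deliberately not ISO formatted
    'travel_country': ['SG'],
})

# Normalize both join keys to the same ISO date string.
logins_demo['date'] = logins_demo['timestamp'].dt.strftime('%Y-%m-%d')
travel_demo['date'] = (
    pd.to_datetime(travel_demo['date'], format='%d/%m/%Y').dt.strftime('%Y-%m-%d')
)

merged_demo = logins_demo.merge(
    travel_demo, on=['date', 'user'], how='left',
    validate='many_to_one',  # raises MergeError if travel has duplicate (date, user) keys
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)

mismatch_demo = merged_demo[
    (merged_demo['_merge'] == 'both')
    & (merged_demo['vpn_exit_country'] == HOME_COUNTRY)
    & (merged_demo['travel_country'] != HOME_COUNTRY)
]
print(mismatch_demo[['timestamp', 'vpn_exit_country', 'travel_country']])
```

Inspecting the `_merge` column before filtering is a quick way to see how many logins found no travel record at all, which helps distinguish "no travel that day" from "keys did not line up".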

## Output

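Use the `impossible` and `mismatch` tables as part of the escalation packet: they list the identity anomalies together with supporting evidence (timestamps, countries, VPN details, ASN, device).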
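
If a single table is more convenient for the packet, the two results could be stacked with a label column recording which check produced each row. This is only a sketch: the demo frames stand in for the notebook's results, and the `finding` column and the output file name are hypothetical choices, not part of the notebook.

```python
import pandas as pd

# Hypothetical stand-ins for the impossible-travel and travel-mismatch results.
impossible_demo = pd.DataFrame({
    'timestamp': pd.to_datetime(['2024-03-04 09:10']),
    'src_country': ['FR'],
    'prev_country': ['UK'],
})
mismatch_demo = pd.DataFrame({
    'timestamp': pd.to_datetime(['2024-03-04 08:00']),
    'src_country': ['UK'],
    'travel_country': ['SG'],
})

# Stack both result sets; columns missing from one frame are filled with NaN.
packet = pd.concat(
    [
        impossible_demo.assign(finding='impossible_travel'),
        mismatch_demo.assign(finding='travel_mismatch'),
    ],
    ignore_index=True,
    sort=False,
).sort_values('timestamp').reset_index(drop=True)

# Writing to disk is left commented out; the file name is a placeholder.
# packet.to_csv('escalation_packet.csv', index=False)
print(packet)
```

Sorting by `timestamp` gives reviewers one chronological view of both anomaly types.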