Merged
27 commits
3416e9f
restructuring as pip installable sdk
rayruizhiliao Nov 17, 2025
8457d30
install dev dependencies
rayruizhiliao Nov 17, 2025
bffa37e
fix: entry point for execute_routines
rayruizhiliao Nov 17, 2025
42afe3d
update readme
rayruizhiliao Nov 17, 2025
4e82bbe
update readme
rayruizhiliao Nov 17, 2025
efa52ee
include a quickstart script
rayruizhiliao Nov 17, 2025
54e288c
update readme
rayruizhiliao Nov 17, 2025
a4462d1
use built-in collection types
rayruizhiliao Nov 18, 2025
50b8e4d
add a quickstart python script
rayruizhiliao Nov 18, 2025
f4ba5c2
remove quickstart bash script
rayruizhiliao Nov 18, 2025
fe5ed52
fix: box alignment
rayruizhiliao Nov 18, 2025
0d3c5e7
improve step 1 of quickstart
rayruizhiliao Nov 18, 2025
40e3081
remove error message for requests import
rayruizhiliao Nov 19, 2025
f5e7311
Update scripts/quickstart.py
rayruizhiliao Nov 19, 2025
5c2f52c
clarify log messaging
rayruizhiliao Nov 19, 2025
776534d
improve task input experience
rayruizhiliao Nov 19, 2025
39e9511
remove unused import
rayruizhiliao Nov 19, 2025
5eeb100
rename vars
rayruizhiliao Nov 19, 2025
a7eeab4
allow user to skip steps
rayruizhiliao Nov 19, 2025
09b4147
ask the user whether to rm existing data
rayruizhiliao Nov 19, 2025
38b36b0
move quickstart.py
rayruizhiliao Nov 19, 2025
7e847fd
update open_url_in_chrome messaging
rayruizhiliao Nov 19, 2025
53e4c32
do not open the documentation page if chrome was already running
rayruizhiliao Nov 19, 2025
cf953dd
move imports
rayruizhiliao Nov 19, 2025
b5dfdf1
graceful exit
rayruizhiliao Nov 19, 2025
14b5d78
enable quickstart script to close chrome
rayruizhiliao Nov 20, 2025
da0522e
give user high-level context before starting
rayruizhiliao Nov 20, 2025
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -36,7 +36,7 @@ jobs:
key: ${{ runner.os }}-uv-${{ hashFiles('pyproject.toml') }}

- name: Install dependencies
run: uv sync
run: uv sync --extra dev

- name: Lint
run: uv run pylint $(git ls-files '*.py')
170 changes: 107 additions & 63 deletions README.md
@@ -148,113 +148,151 @@ This substitutes parameter values and injects `auth_token` from cookies. The JSO

- Python 3.12+
- Google Chrome (stable)
- [uv (Python package manager)](https://github.com/astral-sh/uv)
- [uv (Python package manager)](https://github.com/astral-sh/uv) (optional, for development)
- macOS/Linux: `curl -LsSf https://astral.sh/uv/install.sh | sh`
- Windows (PowerShell): `iwr https://astral.sh/uv/install.ps1 -UseBasicParsing | iex`
- OpenAI API key

## Set up Your Environment 🔧
## Installation

### Linux
### From PyPI (Recommended)

**Note:** We recommend using a virtual environment to avoid dependency conflicts.

```bash
# Create and activate a virtual environment
# Option 1: Using uv (recommended - handles Python version automatically)
uv venv web-hacker-env
source web-hacker-env/bin/activate # On Windows: web-hacker-env\Scripts\activate
uv pip install web-hacker

# Option 2: Using python3 (if Python 3.12+ is your default)
python3 -m venv web-hacker-env
source web-hacker-env/bin/activate # On Windows: web-hacker-env\Scripts\activate
pip install web-hacker

# Option 3: Using pyenv (if you need a specific Python version)
pyenv install 3.12.3 # if not already installed
pyenv local 3.12.3
python -m venv web-hacker-env
source web-hacker-env/bin/activate # On Windows: web-hacker-env\Scripts\activate
pip install web-hacker

# Troubleshooting: If pip is not found, recreate the venv or use:
python -m ensurepip --upgrade # Install pip in the venv
pip install web-hacker
```
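
The install commands above assume a compatible interpreter. The `pyproject.toml` in this PR declares `requires-python = ">=3.12.3,<3.13"`, which is narrower than the "Python 3.12+" prerequisite, so an install on 3.13 would be rejected. A minimal sketch of a pre-install check follows; the helper name `python_is_supported` is hypothetical and not part of web-hacker.

```python
import sys


def python_is_supported(version: tuple[int, int, int]) -> bool:
    """Return True when `version` falls in the >=3.12.3,<3.13 range from pyproject.toml."""
    # Tuple comparison is lexicographic, which matches major.minor.patch ordering.
    return (3, 12, 3) <= version < (3, 13, 0)


current = tuple(sys.version_info[:3])
print(f"Python {'.'.join(map(str, current))} supported: {python_is_supported(current)}")
```

Running this inside the activated virtual environment shows whether `pip install web-hacker` is likely to pass the version check before you attempt it.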

### From Source (Development)

For development or if you want the latest code:

```bash
# 1) Clone and enter the repo
# Clone the repository
git clone https://github.com/VectorlyApp/web-hacker.git
cd web-hacker

# 2) Create & activate virtual environment (uv)
uv venv --prompt web-hacker
source .venv/bin/activate # Windows: .venv\\Scripts\\activate
# Create and activate virtual environment
python3 -m venv web-hacker-env
source web-hacker-env/bin/activate # On Windows: web-hacker-env\Scripts\activate

# 3) Install exactly what lockfile says
uv sync
# Install in editable mode
pip install -e .

# 4) Install in editable mode via uv (pip-compatible interface)
# Or using uv (faster)
uv venv web-hacker-env
source web-hacker-env/bin/activate
uv pip install -e .

# 5) Configure environment
cp .env.example .env # then edit values
# or set directly
export OPENAI_API_KEY="sk-..."
```

### Windows
## Quickstart (Easiest Way) 🚀

```powershell
# 1) Clone and enter the repo
git clone https://github.com/VectorlyApp/web-hacker.git
cd web-hacker

# 2) Install uv (if not already installed)
iwr https://astral.sh/uv/install.ps1 -UseBasicParsing | iex
The fastest way to get started is using the quickstart script, which automates the entire workflow:

# 3) Create & activate virtual environment (uv)
uv venv --prompt web-hacker
.venv\Scripts\activate
```bash
# Make sure web-hacker is installed
pip install web-hacker

# 4) Install in editable mode via uv (pip-compatible interface)
uv pip install -e .
# Set your OpenAI API key
export OPENAI_API_KEY="sk-..."

# 5) Configure environment
copy .env.example .env # then edit values
# or set directly
$env:OPENAI_API_KEY="sk-..."
# Run the quickstart script
python quickstart.py
```

The quickstart script will:
1. ✅ Automatically launch Chrome in debug mode
2. 📊 Start browser monitoring (you perform actions)
3. 🤖 Discover routines from captured data
4. 📝 Show you how to execute the discovered routine

**Note:** The quickstart script is included in the repository. If you installed from PyPI, you can download it from the [GitHub repository](https://github.com/VectorlyApp/web-hacker/blob/main/quickstart.py).

## Launch Chrome in Debug Mode 🐞

### Instructions for MacOS
> 💡 **Tip:** The [quickstart script](#quickstart-easiest-way-🚀) automatically launches Chrome for you. You only need these manual instructions if you're not using the quickstart script.

```
# You should see JSON containing a webSocketDebuggerUrl like:
# ws://127.0.0.1:9222/devtools/browser/*************************************
# Create temporary chrome user directory
mkdir $HOME/tmp
mkdir $HOME/tmp/chrome
### macOS

```bash
# Create temporary Chrome user directory
mkdir -p $HOME/tmp/chrome

# Launch Chrome app in debug mode (this exposes websocket for controlling and monitoring the browser)
# Launch Chrome in debug mode
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--remote-debugging-address=127.0.0.1 \
--remote-debugging-port=9222 \
--user-data-dir="$HOME/tmp/chrome" \
'--remote-allow-origins=*' \
--remote-allow-origins=* \
--no-first-run \
--no-default-browser-check


# Verify chrome is running in debug mode
# Verify Chrome is running
curl http://127.0.0.1:9222/json/version

# You should see JSON containing a webSocketDebuggerUrl like:
# ws://127.0.0.1:9222/devtools/browser/*************************************
```

### Instructions for Windows
### Windows

```
```powershell
# Create temporary Chrome user directory
New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\\tmp\\chrome" | Out-Null
New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\tmp\chrome" | Out-Null

# Locate Chrome (adjust path if Chrome is installed elsewhere)
$chrome = "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
# Locate Chrome
$chrome = "C:\Program Files\Google\Chrome\Application\chrome.exe"
if (!(Test-Path $chrome)) {
$chrome = "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe"
$chrome = "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"
}

# Launch Chrome in debug mode (exposes DevTools WebSocket)
# Launch Chrome in debug mode
& $chrome `
--remote-debugging-address=127.0.0.1 `
--remote-debugging-port=9222 `
--user-data-dir="$env:USERPROFILE\\tmp\\chrome" `
--user-data-dir="$env:USERPROFILE\tmp\chrome" `
--remote-allow-origins=* `
--no-first-run `
--no-default-browser-check


# Verify Chrome is running in debug mode
# Verify Chrome is running
(Invoke-WebRequest http://127.0.0.1:9222/json/version).Content
```

### Linux

```bash
# Create temporary Chrome user directory
mkdir -p $HOME/tmp/chrome

# You should see JSON containing a webSocketDebuggerUrl like:
# ws://127.0.0.1:9222/devtools/browser/*************************************
# Launch Chrome in debug mode (adjust path if needed)
google-chrome \
--remote-debugging-address=127.0.0.1 \
--remote-debugging-port=9222 \
--user-data-dir="$HOME/tmp/chrome" \
--remote-allow-origins=* \
--no-first-run \
--no-default-browser-check

# Verify Chrome is running
curl http://127.0.0.1:9222/json/version
```
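
The `curl` and `Invoke-WebRequest` checks above read the `/json/version` endpoint and look for a `webSocketDebuggerUrl` field. The same check can be sketched in Python using only the standard library. The function names here are hypothetical and not part of web-hacker; the host and port defaults mirror the launch flags shown above.

```python
import json
from urllib.request import urlopen


def extract_ws_url(version_payload: str) -> str:
    """Return the webSocketDebuggerUrl field from a /json/version response body."""
    data = json.loads(version_payload)
    ws_url = data.get("webSocketDebuggerUrl")
    if not ws_url:
        raise ValueError("no webSocketDebuggerUrl in response; is Chrome in debug mode?")
    return ws_url


def fetch_ws_url(host: str = "127.0.0.1", port: int = 9222, timeout: float = 5.0) -> str:
    """Query Chrome's /json/version endpoint and return the browser WebSocket URL."""
    with urlopen(f"http://{host}:{port}/json/version", timeout=timeout) as response:
        return extract_ws_url(response.read().decode("utf-8"))
```

With Chrome running in debug mode, `print(fetch_ws_url())` should print a `ws://127.0.0.1:9222/devtools/browser/...` URL. A connection error usually means Chrome was not started with `--remote-debugging-port=9222`.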

## HACK (reverse engineer) WEB APPS 👨🏻‍💻
@@ -265,6 +303,12 @@ The reverse engineering process follows a simple three-step workflow:
2. **Discover** — Let the AI agent analyze the captured data and generate a reusable Routine
3. **Execute** — Run the discovered Routine with different parameters to automate the task

### Quick Start (Recommended)

**Easiest way:** Use the [quickstart script](#quickstart-easiest-way-🚀) which automates the entire workflow.

### Manual Workflow (Step-by-Step)

Each step is detailed below. Start by ensuring Chrome is running in debug mode (see [Launch Chrome in Debug Mode](#launch-chrome-in-debug-mode-🐞) above).

### 0. Legal & Privacy Notice ⚠️
@@ -277,7 +321,7 @@ Use the CDP browser monitor to block trackers and capture network, storage, and
**Run this command to start monitoring:**

```bash
python scripts/browser_monitor.py --host 127.0.0.1 --port 9222 --output-dir ./cdp_captures --url about:blank --incognito
web-hacker-monitor --host 127.0.0.1 --port 9222 --output-dir ./cdp_captures --url about:blank --incognito
```

The script will open a new tab (starting at `about:blank`). Navigate to your target website, then manually perform the actions you want to automate (e.g., search, login, export report). Keep Chrome focused during this process. Press `Ctrl+C` and the script will consolidate transactions and produce a HAR automatically.
@@ -313,7 +357,7 @@ Use the **routine-discovery pipeline** to analyze captured data and synthesize a

**Linux/macOS (bash):**
```bash
python scripts/discover_routines.py \
web-hacker-discover \
--task "Recover API endpoints for searching for trains and their prices" \
--cdp-captures-dir ./cdp_captures \
--output-dir ./routine_discovery_output \
@@ -323,7 +367,7 @@ python scripts/discover_routines.py \
**Windows (PowerShell):**
```powershell
# Simple task (no quotes inside):
python scripts/discover_routines.py --task "Recover the API endpoints for searching for trains and their prices" --cdp-captures-dir ./cdp_captures --output-dir ./routine_discovery_output --llm-model gpt-5
web-hacker-discover --task "Recover the API endpoints for searching for trains and their prices" --cdp-captures-dir ./cdp_captures --output-dir ./routine_discovery_output --llm-model gpt-5
```

**Example tasks:**
@@ -372,21 +416,21 @@ Run the example routine:
```bash
# Using a parameters file:

python scripts/execute_routine.py \
web-hacker-execute \
--routine-path example_routines/amtrak_one_way_train_search_routine.json \
--parameters-path example_routines/amtrak_one_way_train_search_input.json

# Or pass parameters inline (JSON string):

python scripts/execute_routine.py \
web-hacker-execute \
--routine-path example_routines/amtrak_one_way_train_search_routine.json \
--parameters-dict '{"origin": "BOS", "destination": "NYP", "departureDate": "2026-03-22"}'
```

Run a discovered routine:

```bash
python scripts/execute_routine.py \
web-hacker-execute \
--routine-path routine_discovery_output/routine.json \
--parameters-path routine_discovery_output/test_parameters.json
```
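
When the inline `--parameters-dict` form is driven from a script, hand-quoting the JSON string is an easy place to make mistakes. The sketch below builds the argument list programmatically, assuming only the `web-hacker-execute` flags shown above. The helper `build_execute_command` is hypothetical and not part of the package.

```python
import json
import shlex


def build_execute_command(routine_path: str, parameters: dict) -> list[str]:
    """Build the argv list for web-hacker-execute with inline JSON parameters."""
    return [
        "web-hacker-execute",
        "--routine-path", routine_path,
        # json.dumps produces the JSON string that --parameters-dict expects.
        "--parameters-dict", json.dumps(parameters),
    ]


command = build_execute_command(
    "example_routines/amtrak_one_way_train_search_routine.json",
    {"origin": "BOS", "destination": "NYP", "departureDate": "2026-03-22"},
)
# shlex.join quotes the JSON argument so the line can be pasted into a POSIX shell.
print(shlex.join(command))
```

Passing the list straight to `subprocess.run(command, check=True)` avoids shell quoting altogether, which also sidesteps the PowerShell quoting differences.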
63 changes: 56 additions & 7 deletions pyproject.toml
@@ -6,22 +6,71 @@ build-backend = "hatchling.build"

[project]
name = "web-hacker"
version = "0.1.0"
description = " Reverse engineer any web app!"
version = "1.1.0"
description = "SDK for reverse engineering web apps"
readme = "README.md"
requires-python = ">=3.12.3,<3.13" # pinning to 3.12.x
requires-python = ">=3.12.3,<3.13"
license = {text = "Apache-2.0"}
authors = [
{name = "Vectorly", email = "contact@vectorly.app"}
]
keywords = [
"web-scraping",
"automation",
"cdp",
"chrome-devtools",
"api-discovery",
"reverse-engineering",
"browser-automation",
"sdk",
]
classifiers = [
"Development Status :: 4 - Beta",
"Intended Audience :: Developers",
"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.12",
"Topic :: Software Development :: Libraries :: Python Modules",
"Topic :: Internet :: WWW/HTTP :: Browsers",
"Topic :: Software Development :: Testing",
]
dependencies = [
"ipykernel>=6.29.5",
"openai>=2.6.1",
"pydantic>=2.11.4",
"pylint>=3.0.0",
"pytest>=8.3.5",
"python-dotenv>=1.2.1",
"requests>=2.31.0",
"websockets>=15.0.1",
"websocket-client>=1.6.0",
"beautifulsoup4>=4.14.2",
]

[project.optional-dependencies]
dev = [
"ipykernel>=6.29.5",
"pylint>=3.0.0",
"pytest>=8.3.5",
]

[project.scripts]
web-hacker-monitor = "web_hacker.scripts.browser_monitor:main"
web-hacker-discover = "web_hacker.scripts.discover_routines:main"
web-hacker-execute = "web_hacker.scripts.execute_routine:main"

[project.urls]
Homepage = "https://www.vectorly.app"
Documentation = "https://github.com/VectorlyApp/web-hacker#readme"
Repository = "https://github.com/VectorlyApp/web-hacker"
Issues = "https://github.com/VectorlyApp/web-hacker/issues"

[tool.hatch.build.targets.wheel]
packages = ["src"]
packages = ["web_hacker"]

[tool.hatch.build.targets.sdist]
include = [
"/web_hacker",
"/tests",
"/example_routines",
"README.md",
"LICENSE",
"pyproject.toml",
]
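
The `[project.scripts]` table above is what makes `web-hacker-monitor`, `web-hacker-discover`, and `web-hacker-execute` available as commands: each value is a `module:function` reference that the installer wraps in a small launcher. As a rough illustration of how such a reference resolves to a callable, here is a simplified sketch. It is not the installer's actual implementation, and `resolve_entry_point` is a hypothetical helper.

```python
from importlib import import_module
from typing import Callable


def resolve_entry_point(spec: str) -> Callable:
    """Resolve a 'package.module:function' spec to the callable it names."""
    module_name, _, attr_path = spec.partition(":")
    if not module_name or not attr_path:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    target = import_module(module_name)
    # Walk dotted attribute paths such as 'Class.method' one step at a time.
    for attr in attr_path.split("."):
        target = getattr(target, attr)
    return target


# Demonstrated with a standard-library target so the sketch runs without web-hacker installed.
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```

With the package installed, `resolve_entry_point("web_hacker.scripts.execute_routine:main")` should return the same function that the `web-hacker-execute` command calls, assuming the module layout matches the table above.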