Gh0stW4lk

A self-hosted web noise generator that runs in a Docker container. Gh0stW4lk creates fake browsing traffic by autonomously visiting and navigating websites using a real headless Chromium browser — with support for multiple concurrent tabs browsing different sites simultaneously. It pollutes your data trail with meaningless noise, making it significantly harder for trackers, ISPs, and algorithms to build an accurate profile of your real browsing behavior.

Unlike simple HTTP request generators, Gh0stW4lk uses a full browser engine: every visit executes JavaScript, loads analytics scripts, sets cookies, and fires tracking pixels the way a real user's browser would. That makes the noise much harder for trackers to separate from genuine browsing than plain HTTP requests would be.

Think of it as leaving thousands of fake footprints everywhere so your real ones can't be followed.


How It Works

Gh0stW4lk runs a headless Chromium browser inside a Docker container. Once started, it:

  1. Picks a random site from your approved site list
  2. Visits the site and loads it like a real browser (executes JavaScript, renders the page, etc.)
  3. Crawls 2-5 internal links on that site — it finds all same-origin links on the page, filters out downloads and external links, and randomly clicks through them
  4. Waits a randomized delay between each page load to simulate natural human browsing patterns
  5. Moves on to another random site from the list and repeats
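
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the project's noise_engine.py: `browse_once`, `visit`, `get_links`, and `sleep` are hypothetical names, and the browser is replaced by injected callbacks so the control flow is visible on its own. The default delay range is borrowed from the 15-30 second row under Speed Recommendations.

```python
import random


def browse_once(sites, visit, get_links, sleep, rng,
                speed_min=15.0, speed_max=30.0):
    """One rotation: load a random site, then follow 2-5 internal links.

    visit(url) loads a page, get_links(url) returns its crawlable links,
    and sleep(seconds) waits. All three are stand-ins for the real browser.
    """
    site = rng.choice(sites)          # step 1: random site, no fixed order
    visit(site)                       # step 2: load it
    visited = [site]
    current = site
    for _ in range(rng.randint(2, 5)):                # step 3: 2-5 links
        links = get_links(current)
        if not links:
            break                     # nothing to follow, rotate early
        sleep(rng.uniform(speed_min, speed_max))      # step 4: random delay
        current = rng.choice(links)
        visit(current)
        visited.append(current)
    return visited                    # step 5: the caller picks the next site
```

Passing `time.sleep` and a real page loader in place of the stubs would turn this into a blocking crawler; the actual engine is described as async, so treat this only as a picture of the ordering.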

This runs 24/7 in the background on your server — no browser window needed, no extension to install, no computer that needs to stay open.

Site Selection & Navigation

  • Sites are chosen randomly from your list on each rotation — there's no sequential order
  • On each site, it follows 2 to 5 internal links before moving on
  • It only follows same-origin links (stays on the same domain, never opens external links)
  • Links that open new windows/tabs are ignored
  • File downloads are skipped — PDFs, images, ZIPs, executables, Office documents, media files, etc.
  • If a site fails to load, it logs the error and moves on to the next one
  • If no internal links are found on a page, it logs it and rotates to the next site
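
The link rules above could be expressed roughly as follows. This is a sketch under assumptions: `SKIP_EXTENSIONS` is a hypothetical, shortened list (the project's real extension list is not shown here), and the "opens a new window/tab" rule is left out because it depends on the link's `target` attribute in the DOM rather than on the URL.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical skip list for illustration only.
SKIP_EXTENSIONS = (".pdf", ".zip", ".exe", ".jpg", ".png", ".docx", ".mp4", ".mp3")


def same_origin(a: str, b: str) -> bool:
    """True when both URLs share scheme and host (including any port)."""
    pa, pb = urlparse(a), urlparse(b)
    return (pa.scheme, pa.netloc.lower()) == (pb.scheme, pb.netloc.lower())


def crawlable_links(page_url: str, hrefs: list) -> list:
    """Resolve hrefs against the page; keep same-origin, non-download links."""
    kept = []
    for href in hrefs:
        absolute = urljoin(page_url, href)
        if not same_origin(page_url, absolute):
            continue                  # external link, mailto:, javascript:, ...
        if urlparse(absolute).path.lower().endswith(SKIP_EXTENSIONS):
            continue                  # looks like a file download
        kept.append(absolute)
    return kept
```

An empty result corresponds to the "no internal links found" case, where the engine logs the page and rotates to the next site.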

Realistic Traffic

  • User-Agent rotation — cycles through 6 real browser User-Agent strings (Chrome, Firefox, Safari, Edge on Windows/Mac/Linux) so each visit looks like it's coming from a different browser
  • Randomized timing — delays between page loads are randomized within your custom range, not fixed intervals
  • Full browser rendering — uses Playwright with Chromium, so JavaScript-heavy sites (SPAs, dynamic content) work just like in a real browser
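
A minimal sketch of the first two points, assuming a simple round-robin over the User-Agent list (the README says it "cycles", but the exact selection order is an assumption). The strings in `USER_AGENTS` are placeholders, not the six strings the project ships, and `next_delay` is a hypothetical helper name.

```python
import itertools
import random

# Placeholder values; substitute real browser User-Agent strings.
USER_AGENTS = [
    "ExampleBrowser/1.0 (Windows)",
    "ExampleBrowser/1.0 (Macintosh)",
    "ExampleBrowser/1.0 (Linux)",
]


def user_agent_cycle(agents):
    """Endless iterator that hands out the agents in order, then wraps."""
    return itertools.cycle(agents)


def next_delay(speed_min: float, speed_max: float, rng: random.Random) -> float:
    """Uniformly random delay in seconds within the configured range."""
    if speed_min > speed_max:
        raise ValueError("speed_min must not exceed speed_max")
    return rng.uniform(speed_min, speed_max)
```

In Playwright, a per-visit User-Agent would typically be applied when creating a browser context (its `user_agent` option); how Gh0stW4lk wires this up is not shown in this README.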

Features

  • Web Dashboard — Matrix-themed UI with animated falling code rain background
  • Start/Stop Control — one-click to toggle noise generation
  • Verified Activity Log — real-time WebSocket feed showing every page visited with page title, body size, and link count so you can confirm pages are actually rendering (not bot-blocked)
  • Clear Log — wipe the activity log with one click
  • Custom Speed — set your own min/max delay in seconds (e.g., 5-10s for fast, 60-120s for stealth)
  • Concurrent Browsing — run multiple browser tabs simultaneously (1-N workers), each independently crawling different sites. Each tab uses ~50-150MB RAM. Set the number of tabs in the dashboard and restart the engine to apply
  • Lite Mode — blocks images, fonts, CSS, and media to save ~80% bandwidth while still generating the same tracker-visible traffic (DNS lookups, HTTP requests, URL paths)
  • 100 Preloaded Sites — ships with a diverse default list spanning news, shopping, tech, education, health, finance, sports, travel, food, science, entertainment, real estate, and more
  • Site Management — add or remove individual sites, or clear all sites at once
  • Import/Export — upload a JSON or YAML file with your own site list, or export your current list
  • Persistent Config — your site list and settings survive container restarts via a mounted volume
  • Auto-Start — optionally start generating noise as soon as the container launches
  • Stats Tracking — see total pages visited, number of sites in rotation, and uptime at a glance
  • Pulsing Status Indicator — glowing green dot when running, glowing red when stopped
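
Lite Mode's resource blocking can be approximated with Playwright's request routing. The sketch below is an assumption about how such a filter might look, not the project's code: `should_block` and `lite_mode_handler` are hypothetical names, while `page.route`, `route.abort()`, `route.continue_()`, and `request.resource_type` are part of Playwright's Python API.

```python
# Playwright resource types that match "images, fonts, CSS, and media".
BLOCKED_RESOURCE_TYPES = {"image", "font", "stylesheet", "media"}


def should_block(resource_type: str, lite_mode: bool) -> bool:
    """Decide whether a request of this type is dropped in Lite Mode."""
    return lite_mode and resource_type in BLOCKED_RESOURCE_TYPES


async def lite_mode_handler(route, lite_mode: bool = True):
    """Route handler: abort blocked resource types, let the rest through."""
    if should_block(route.request.resource_type, lite_mode):
        await route.abort()
    else:
        await route.continue_()

# With Playwright's async API, the handler would be registered roughly as:
#     await page.route("**/*", lite_mode_handler)
```

Keeping the decision in a plain function makes it easy to test without launching a browser.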

Quick Start

Prerequisites

  • Docker installed and running

Run It

git clone https://github.com/AlwaysLearning-dev/Gh0stW4lk.git
cd Gh0stW4lk
docker compose up --build

Open http://localhost:8888 in your browser.

Click Start and watch the activity log fill up.

Auto-Start on Launch

To start generating noise automatically when the container starts, set the environment variable in docker-compose.yml:

environment:
  - GH0STW4LK_AUTO_START=true
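
On the Python side, a flag like this is usually read once at startup. A small sketch, assuming the app treats the value case-insensitively; the README only shows `true` and `false`, so the extra accepted spellings and the helper name `auto_start_enabled` are assumptions.

```python
import os


def auto_start_enabled(environ=None) -> bool:
    """Return True when GH0STW4LK_AUTO_START is set to a truthy value."""
    env = os.environ if environ is None else environ
    value = env.get("GH0STW4LK_AUTO_START", "false")
    # Accepting "1", "yes", and "on" is an assumption, not documented behavior.
    return value.strip().lower() in {"1", "true", "yes", "on"}
```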

Configuration

docker-compose.yml

services:
  gh0stwalk:
    build: .
    ports:
      - "8888:8080"
    volumes:
      - ./data:/app/data    # Persistent config storage
    environment:
      - GH0STW4LK_AUTO_START=false
    restart: unless-stopped

Speed Recommendations

| Delay | Bandwidth/hour | Notes |
|---|---|---|
| 1-3 sec | 2-6 GB | Will get CAPTCHAs and IP bans on most sites |
| 5-10 sec | 500 MB - 2 GB | Good for testing. Some strict sites may notice |
| 15-30 sec | 200-800 MB | Safe for most sites. Looks like normal browsing |
| 30-60 sec | 100-400 MB | Invisible. No site will care |

Enable Lite Mode to cut bandwidth by ~80% — it blocks images, fonts, CSS, and media while still generating all the DNS/HTTP traffic that trackers see.
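
To turn the table into a rough budget, the arithmetic can be scripted. This is a back-of-the-envelope sketch: the ~80% figure and the hourly ranges come from this README, the helper names are made up for illustration, and 1 GB is taken as 1000 MB.

```python
def with_lite_mode(low_mb: float, high_mb: float, reduction: float = 0.80):
    """Apply the approximate Lite Mode reduction to an hourly MB range."""
    keep = 1.0 - reduction
    return round(low_mb * keep, 1), round(high_mb * keep, 1)


def per_day_gb(low_mb: float, high_mb: float, hours: float = 24.0):
    """Convert an hourly MB range into a daily GB range (1 GB = 1000 MB)."""
    return round(low_mb * hours / 1000, 1), round(high_mb * hours / 1000, 1)
```

For the 15-30 second row (200-800 MB/hour), `per_day_gb(200, 800)` gives roughly 4.8 to 19.2 GB per day, and `with_lite_mode(200, 800)` brings the hourly range down to about 40 to 160 MB. Real usage depends heavily on which sites are in rotation.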

Importing a Site List

You can upload a site list file through the dashboard using the Upload button. Supported formats:

JSON (array):

[
  "https://www.example.com",
  "https://www.another-site.org"
]

JSON (object with key):

{
  "sites": [
    "https://www.example.com",
    "https://www.another-site.org"
  ]
}

YAML:

sites:
  - https://www.example.com
  - https://www.another-site.org

Imported sites are added to your existing list (not replaced). Use Clear All first if you want to start fresh.
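
The two JSON shapes and the "added, not replaced" behavior can be sketched as below. This is an illustration with hypothetical function names; YAML is omitted because it needs a third-party parser, and skipping duplicates during the merge is an assumption the README does not confirm.

```python
import json


def parse_site_list(text: str) -> list:
    """Accept either a JSON array of URLs or an object with a "sites" array."""
    data = json.loads(text)
    if isinstance(data, dict):
        data = data.get("sites", [])
    if not isinstance(data, list):
        raise ValueError('expected a JSON array or an object with a "sites" array')
    return [s.strip() for s in data if isinstance(s, str) and s.strip()]


def merge_sites(existing: list, imported: list) -> list:
    """Append imported URLs to the existing list, keeping order.

    Dropping URLs that are already present is an assumption.
    """
    merged = list(existing)
    seen = set(existing)
    for url in imported:
        if url not in seen:
            seen.add(url)
            merged.append(url)
    return merged
```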

Exporting Your Site List

Click Export in the dashboard to download your current site list as a JSON file.

Default Site List

Gh0stW4lk ships with 100 sites across deliberately diverse categories to create a scattered, incoherent browsing profile:

| Category | Examples |
|---|---|
| News | Reuters, BBC, CNN, Al Jazeera, NPR, Guardian |
| Shopping | Amazon, eBay, Etsy, Target, Walmart, Nordstrom |
| Tech | GitHub, Stack Overflow, Hacker News, Wired, Ars Technica |
| Education | Wikipedia, Khan Academy, Coursera, edX, MIT |
| Health | WebMD, Mayo Clinic, Healthline, NIH |
| Finance | Investopedia, NerdWallet, Yahoo Finance, Bloomberg |
| Travel | TripAdvisor, Lonely Planet, Booking.com, Airbnb |
| Food | Allrecipes, Food Network, Bon Appetit, Serious Eats |
| Sports | ESPN, NBA, NFL, MLB |
| Science | Nature, Scientific American, National Geographic, NASA |
| Real Estate | Zillow, Realtor, Redfin |
| Auto | Cars.com, AutoTrader, Carfax |
| Outdoor/Retail | REI, Patagonia, Nike, Adidas |
| Books | Goodreads, Audible, Barnes & Noble |
| Pets | Petfinder, AKC, ASPCA |
| Social/Forums | Reddit, Medium, Quora |

Project Structure

gh0stw4lk/
├── Dockerfile              # Playwright + Chromium base image
├── docker-compose.yml      # One-command deployment
├── requirements.txt        # Python dependencies
├── default_sites.json      # 100 preloaded sites
├── data/                   # Persistent config (volume mount)
│   └── config.json         # Auto-generated on first run
└── app/
    ├── main.py             # FastAPI app, REST API, WebSocket
    ├── noise_engine.py     # Headless browser crawling engine
    ├── config.py           # Config persistence layer
    ├── models.py           # Data models
    └── static/
        └── index.html      # Matrix-themed dashboard (single file)

API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | / | Dashboard UI |
| GET | /api/status | Engine status, stats |
| POST | /api/start | Start noise generation |
| POST | /api/stop | Stop noise generation |
| GET | /api/sites | List all sites |
| POST | /api/sites | Add a site ({"url": "..."}) |
| DELETE | /api/sites/{id} | Remove a site |
| DELETE | /api/sites | Remove all sites |
| GET | /api/sites/export | Export site list as JSON |
| POST | /api/sites/import | Import site list (multipart file upload) |
| GET | /api/settings | Current settings |
| PUT | /api/settings | Update settings (speed_min, speed_max, lite_mode, etc.) |
| GET | /api/log | Recent activity log |
| DELETE | /api/log | Clear activity log |
| WS | /ws/log | Live activity stream |
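
The REST endpoints can be driven from a script using only the standard library. The sketch below assumes the dashboard is reachable at http://localhost:8888 (the port from Quick Start) and that responses are JSON; `build_request` and `send` are hypothetical helpers, and the response shapes are not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8888"  # host port from docker-compose.yml


def build_request(method: str, path: str, payload=None) -> urllib.request.Request:
    """Build (but do not send) a request for one of the API endpoints."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def send(req: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode a JSON body (assumed response format)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
    return json.loads(body) if body else None
```

For example, `send(build_request("PUT", "/api/settings", {"speed_min": 15, "speed_max": 30, "lite_mode": True}))` followed by `send(build_request("POST", "/api/start"))` would apply the settings and start the engine, provided the container is running.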

Verifying It Works

Each log entry shows verification data so you can confirm real browsing is happening:

04:12:33  ✓  https://www.reuters.com       HTTP 200 | "Reuters | Breaking International News" | 342.1KB | 87 links
04:13:01  ✓  https://www.reuters.com/world  HTTP 200 | "World News | Reuters" | 218.5KB | 54 links
04:13:45  –  https://www.reuters.com/world  No onsite links found
04:14:12  ✓  https://www.amazon.com         HTTP 200 | "Amazon.com: Online Shopping" | 891.2KB | 203 links
04:14:33  ✗  https://www.example.com        HTTP 403

  • Page title — confirms a real page loaded, not a captcha or block page
  • Body size (KB) — a real page is typically 50KB+; a bot-block page is usually <5KB
  • Link count — confirms the DOM rendered and links were parsed for crawling
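
The three checks above can be applied mechanically to a log line. This is a sketch that assumes the entry format shown in the sample log; the regular expression and the `assess` helper are illustrative, and the 5KB and 50KB thresholds are the rules of thumb from the list, not hard limits.

```python
import re

# Matches: HTTP 200 | "Page title" | 342.1KB | 87 links
# The title may itself contain "|", so it is captured greedily inside quotes.
ENTRY = re.compile(r'HTTP (\d{3}) \| "(.*)" \| ([\d.]+)KB \| (\d+) links')


def assess(line: str) -> str:
    """Classify a log line as 'looks real', 'likely blocked', or unclear."""
    match = ENTRY.search(line)
    if match is None:
        return "no page data"          # errors and "no links" entries
    status = int(match[1])
    size_kb = float(match[3])
    links = int(match[4])
    if status != 200 or size_kb < 5:
        return "likely blocked"        # tiny body: probably a block page
    if size_kb >= 50 and links > 0:
        return "looks real"
    return "uncertain"
```

Entries that come back "uncertain" are worth checking by title, since a captcha page usually names itself there.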

You can also verify Chromium is actively running inside the container:

docker compose exec gh0stwalk ps aux | grep chrom

Security

Container Isolation

  • Read-only filesystem — the container's root filesystem is mounted read-only. Only /app/data (config) and /tmp (browser cache) are writable, so even if Chromium is exploited, nothing persistent can be written
  • Dropped capabilities — all Linux capabilities are dropped (cap_drop: ALL). The container cannot mount filesystems, change networking, load kernel modules, etc.
  • No privilege escalation — no-new-privileges prevents any process inside from gaining additional privileges via setuid/setgid
  • Non-root user — runs as pwuser, keeping Chromium's sandbox enabled
  • Tmpfs for browser data — Chromium's temp files live in memory-backed tmpfs, wiped on every restart
  • LAN isolation (optional) — uncomment the networks section in docker-compose.yml to block the container from accessing your local network entirely, only allowing internet traffic
  • Memory limit (2GB) — prevents a compromised browser from consuming all host memory
  • CPU limit (1 core) — prevents crypto-miner exploits from pegging all cores
  • PID limit (300) — blocks fork bomb attacks. Chromium spawns renderer processes per concurrent tab (~10-15 per tab)
  • Log rotation — Docker container logs capped at 10MB with 3 rotated files (30MB total), preventing disk exhaustion

Application Security

  • SSRF protection — blocks localhost, private IPs, loopback, and link-local addresses from being added as sites
  • XSS prevention — all user-supplied data is HTML-escaped before DOM insertion
  • Upload limits — file imports capped at 1MB
  • URL validation — only http:// and https:// schemes are allowed
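
The scheme and address checks might look something like the sketch below, built on the standard `ipaddress` and `urllib.parse` modules. It is an illustration, not the project's validator: `is_allowed_site` is a hypothetical name, and it only inspects the literal host in the URL. A hostname that resolves to a private address via DNS is not caught here and would need a separate lookup.

```python
import ipaddress
from urllib.parse import urlparse


def is_allowed_site(url: str) -> bool:
    """Allow only http(s) URLs whose host is not local or private."""
    try:
        parsed = urlparse(url)
    except ValueError:
        return False                   # malformed URL, e.g. a broken IPv6 literal
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname
    if not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True                    # a hostname, not an IP literal
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```

For XSS prevention on the server side, Python's `html.escape` covers the usual HTML escaping; the dashboard itself is described as vanilla JS, so its escaping happens in the browser.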

What About Malicious Sites?

The headless browser visits real websites and executes their JavaScript, which carries inherent risk. The mitigations above ensure that even if a site serves a browser exploit:

  1. The attacker lands in a read-only container with no capabilities — there's almost nothing to do
  2. They're running as non-root with no privilege escalation path
  3. They cannot reach your LAN if you enable the isolated network
  4. Everything in /tmp is wiped on restart — no persistence
  5. Chromium's own sandbox is active (unlike when running as root)

Roadmap

  • Concurrent browsing — run multiple browser tabs simultaneously within a single Chromium instance. Set the number of tabs in the dashboard. Each tab uses ~50-150MB RAM.
  • API key authentication — protect the dashboard and API with a token via GH0STW4LK_API_KEY environment variable, preventing unauthorized access when exposed beyond localhost
  • Localhost-only binding — change port mapping to 127.0.0.1:8888:8080 so only the host machine can access the dashboard, not other devices on the LAN
  • DNS-over-HTTPS — route container DNS through a DoH proxy so your ISP can't see which domains the browser is resolving, even though the browsing itself is HTTPS

Tech Stack

  • Python 3.12 + FastAPI — async web framework
  • Playwright — headless Chromium browser automation
  • Uvicorn — ASGI server
  • Vanilla HTML/CSS/JS — zero-dependency dashboard with Matrix rain canvas animation

Disclaimer

Yes, this was made with the help of language models. Yes, I am a pseudo-dev. Yes, I understand cybersecurity (as much as I can). Yes, I understand dependencies (usually). Yes, I love The Matrix.

Support

If you find this useful and want to buy me a coffee:

Bitcoin: 3LHdWbp4NBcP3EowD9fNRBgRQx3whmq3tP (only on BTC network)

Dogecoin: DDRDQeAZZxfPEb6XptEeWqwjKMG3R1W6SR (only on Doge network)

License

MIT
