Cloak by Praxis

Chat with cloud LLMs while a local model swaps your PII out before the prompt leaves your machine — and rehydrates the answer back to your real entities.

The idea

Every time you ask a cloud AI about something personal — a city you're moving to, a company you work for, a name — you're sending that data to an external server. Cloak intercepts each prompt, runs a local PII detector, replaces sensitive entities with realistic stand-ins (not [REDACTED] — actual fake names and cities), sends the scrubbed prompt to the cloud, then maps the answer back to your real entities before you see it.

The round-trip substitution is the differentiator: the cloud model sees a coherent, answerable prompt; you get a useful, real answer; and the actual PII never left your device.
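The round trip can be sketched in a few lines of Python. This is a minimal illustration, not Cloak's actual implementation: the function names `substitute` and `rehydrate`, the example entities, and the hand-written mapping are all assumptions made for the demo. In the real pipeline the mapping comes from the local models.

```python
import re


def substitute(text: str, mapping: dict[str, str]) -> str:
    """Replace every key of `mapping` found in `text` with its value.

    A single regex pass is used (longest keys first) so that a replacement
    is never itself replaced, and "New York City" wins over "New York".
    """
    if not mapping:
        return text
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


def rehydrate(answer: str, mapping: dict[str, str]) -> str:
    """Invert the real->fake mapping and restore real entities in the answer."""
    inverse = {fake: real for real, fake in mapping.items()}
    return substitute(answer, inverse)


# Hypothetical example values, chosen only for illustration.
mapping = {"Maria Okafor": "Elena Brandt", "Lisbon": "Valencia"}
prompt = "Maria Okafor is moving to Lisbon. What should she budget for rent?"

scrubbed = substitute(prompt, mapping)  # this is what the cloud model would see
cloud_answer = "Elena Brandt should budget for rent in Valencia carefully."
restored = rehydrate(cloud_answer, mapping)  # this is what the user would see

print(scrubbed)
print(restored)
```

The sketch only handles exact string matches. If the cloud model rephrases a stand-in (for example, using only the first name), a real rehydration step would need something more tolerant.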

This is disclosure control, not a privacy guarantee. The pipeline reduces PII exposure significantly — but a determined adversary with enough queries could still reconstruct context. The goal is to make routine LLM use substantially less leaky, without sacrificing answer quality.

Quickstart

Prerequisites: Ollama installed and running.

# 1. Install Ollama (if not already):
#    https://ollama.com/download

# 2. Clone and launch:
git clone https://github.com/praxis-society/praxis-cloak.git
cd praxis-cloak
./scripts/run.sh

run.sh will:

Download praxis/spanfinder-3b and praxis/relevance-3b from HuggingFace and build them into Ollama locally (first run only, ~3.8 GB total — no Ollama account needed)
Create a Python virtual environment and install dependencies
Build the frontend (requires Node; skipped if npm is not found)
Start the local server at http://127.0.0.1:8765 and open it in your browser

Settings: Paste your Anthropic or OpenAI API key in Settings to route scrubbed prompts through your preferred provider.

Models

| Model | Weights | Size (q4_k_m) | Role |
| --- | --- | --- | --- |
| `praxis/spanfinder-3b` | HuggingFace | ~1.8 GB | Detects PII spans in the prompt |
| `praxis/relevance-3b` | HuggingFace | ~1.8 GB | Decides which detected entities to scrub vs. keep (needed to answer the question) |

run.sh downloads these GGUFs from HuggingFace and builds them into Ollama for you (the relevance model ships a chat template via its Modelfile, so build it locally rather than pulling the bare GGUF). Both are Qwen2.5-3B-Instruct LoRA fine-tunes; q8_0 reference quantizations (~3.1 GB each) and the LoRA adapters are also on HuggingFace. To try the span detector standalone: ollama run hf.co/praxis-nation/spanfinder-3b:Q4_K_M.
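For scripted experiments, the same standalone model can probably be reached through Ollama's local REST API instead of the interactive CLI. The sketch below assumes Ollama is listening on its default port (11434) and that the model has already been pulled, for example by running the `ollama run` command above once. Sending the raw text as the prompt is also an assumption; the prompt and output format the span detector expects are not described here, so the function returns the model's reply unparsed.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your install uses another port.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"
# Model reference copied from the `ollama run` command above.
MODEL = "hf.co/praxis-nation/spanfinder-3b:Q4_K_M"


def build_request(text: str, model: str = MODEL, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming /api/generate request for `text`."""
    payload = {"model": model, "prompt": text, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect_spans_raw(text: str) -> str:
    """Send `text` to the local model and return its raw reply (needs Ollama running)."""
    with urllib.request.urlopen(build_request(text), timeout=120) as resp:
        return json.loads(resp.read())["response"]


# Usage, with Ollama running locally:
#   print(detect_spans_raw("I'm moving to Lisbon next month."))
```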

How it works

User prompt
    │
    ▼
[regex fast-pass]          ← catches obvious patterns (email, phone)
    │
    ▼
[spanfinder-3b]            ← on-device; detects PII spans (names, orgs, locations)
    │
    ▼
[relevance-3b]             ← decides: KEEP (load-bearing) or SCRUB (incidental)
    │
    ▼
[substitution]             ← replaces SCRUB entities with realistic fakes
    │                         (not [REDACTED]; actual coherent stand-ins)
    ▼
[cloud LLM]                ← sees a coherent, answerable, scrubbed prompt
    │
    ▼
[rehydration]              ← maps fake entities in the answer back to real ones
    │
    ▼
Answer (real entities)
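As an illustration of the first stage in the diagram, a regex fast-pass for emails and phone numbers might look like the following. The patterns, the `Span` type, and the function name `regex_fast_pass` are assumptions made for this sketch, not the patterns Cloak ships. They are deliberately loose, since the model-based stages run afterwards.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    label: str
    text: str


# Illustrative patterns only; real-world email and phone formats vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def regex_fast_pass(prompt: str) -> list[Span]:
    """Return email and phone spans found in `prompt`, ordered by position."""
    spans = []
    for label, pattern in (("EMAIL", EMAIL_RE), ("PHONE", PHONE_RE)):
        for m in pattern.finditer(prompt):
            spans.append(Span(m.start(), m.end(), label, m.group(0)))
    return sorted(spans)


for span in regex_fast_pass("Reach me at jo@example.com or +1 415-555-0100 tomorrow."):
    print(span.label, repr(span.text))
```

Returning character offsets along with the matched text lets later stages merge these spans with the ones the span detector finds.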

Why a relevance step: keyword scrubbing over-removes. "What's the tax rate in Toronto?" needs "Toronto" to produce a useful answer. The relevance model judges whether each detected entity is needed to answer the question, and only the incidental ones get substituted.
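The KEEP/SCRUB split can be sketched as a filter over the detected entities. Everything named here is hypothetical: `split_by_relevance`, the `Judge` signature, and especially `toy_judge`, which is a canned lookup standing in for the relevance model. Treating anything other than an explicit KEEP as SCRUB (failing closed) is a design assumption of this sketch, not documented behavior.

```python
from typing import Callable

KEEP, SCRUB = "KEEP", "SCRUB"

# A judge takes (prompt, entity) and returns KEEP or SCRUB.
Judge = Callable[[str, str], str]


def split_by_relevance(
    prompt: str, entities: list[str], judge: Judge
) -> tuple[list[str], list[str]]:
    """Partition entities into (keep, scrub) according to the judge."""
    keep, scrub = [], []
    for entity in entities:
        # Anything that is not an explicit KEEP gets scrubbed (fail closed).
        (keep if judge(prompt, entity) == KEEP else scrub).append(entity)
    return keep, scrub


def toy_judge(prompt: str, entity: str) -> str:
    """Canned decisions for the demo; a stand-in for the relevance model."""
    canned = {"Toronto": KEEP, "Priya Nair": SCRUB}
    return canned.get(entity, SCRUB)


prompt = "My colleague Priya Nair asked: what's the tax rate in Toronto?"
keep, scrub = split_by_relevance(prompt, ["Priya Nair", "Toronto", "Acme Corp"], toy_judge)
print("keep:", keep)
print("scrub:", scrub)
```

Only the `scrub` list would go on to substitution; the `keep` entities are sent to the cloud as written.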

Why realistic fakes: [REDACTED] breaks the cloud model's ability to reason coherently. A fake city name preserves sentence structure and answer quality, and the rehydration step maps it back to the real one.
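There is also a mechanical reason distinct stand-ins help: mapping back only works if each real entity has its own unique replacement. The sketch below contrasts the two approaches. The stand-in pools, `assign_fakes`, and `is_invertible` are hypothetical names and data invented for this example; how Cloak actually generates its fakes is not shown here.

```python
REDACTED = "[REDACTED]"

# Hypothetical stand-in pools, keyed by entity label.
FAKE_POOLS = {
    "PERSON": ["Elena Brandt", "Tomas Reyes", "Ingrid Holm"],
    "CITY": ["Valencia", "Graz", "Aarhus"],
}


def assign_fakes(entities: list[tuple[str, str]]) -> dict[str, str]:
    """Give each distinct (text, label) entity its own stand-in of the same type.

    Stand-ins that collide with a real entity in the prompt are skipped,
    so rehydration can never confuse a fake with something the user wrote.
    """
    real_texts = {text for text, _ in entities}
    mapping: dict[str, str] = {}
    taken: set[str] = set()
    for text, label in entities:
        if text in mapping:
            continue  # repeated mentions reuse the same stand-in
        fake = next(
            (c for c in FAKE_POOLS[label] if c not in taken and c not in real_texts),
            None,
        )
        if fake is None:
            raise ValueError(f"no unused {label} stand-in left")
        mapping[text] = fake
        taken.add(fake)
    return mapping


def is_invertible(mapping: dict[str, str]) -> bool:
    """True if every real entity maps to a different replacement."""
    return len(set(mapping.values())) == len(mapping)


entities = [
    ("Maria Okafor", "PERSON"),
    ("Sam Li", "PERSON"),
    ("Lisbon", "CITY"),
    ("Maria Okafor", "PERSON"),  # second mention of the same person
]
fakes = assign_fakes(entities)
redacted = {text: REDACTED for text, _ in entities}

print(fakes, is_invertible(fakes))
print(redacted, is_invertible(redacted))
```

With a single [REDACTED] token, an answer that mentions "[REDACTED]" cannot be traced back to one specific person or city. With per-entity fakes, the mapping can be inverted unambiguously.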

Repo layout

app/
  backend/          FastAPI server (scrub pipeline, cloud calls, rehydration)
  frontend/         React UI
models/
  spanfinder-3b/    HuggingFace model card
  relevance-3b/     HuggingFace model card
  Modelfile.spanfinder
  Modelfile.relevance
  LICENSE_AND_ATTRIBUTION.md
scripts/
  run.sh            One-command launcher
  install.sh        Dependency installer (no server start)
src/                Core library (span detection, substitution, pipeline)

License

Model artifacts (*.gguf, *.safetensors, LoRA adapter weights): Qwen Research License (non-commercial). Commercial use requires a separate license from Alibaba Cloud.

Built with Qwen — praxis/spanfinder-3b and praxis/relevance-3b are fine-tuned derivatives of Qwen/Qwen2.5-3B-Instruct. Qwen is licensed under the Qwen RESEARCH LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.

See NOTICE and models/LICENSE_AND_ATTRIBUTION.md for full attribution including the CAPID dataset (MIT) used to train the relevance model.
