# SelfChat

Code for running self-chat between two role-inverted instances of a language model on a local Ollama server, plus analysis tooling for studying attractor states in the resulting transcripts. The current setup compares an int4-quantized Gemma-4-31B-it official checkpoint against an abliterated variant of the same checkpoint under matched quantization.
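
For orientation, here is a minimal sketch of the role-inverted self-chat loop against Ollama's `/api/chat` endpoint. The model tag and seed message below are placeholders, not the repo's actual values; the real driver (seeding, logging, variant handling) lives in `selfchat.runs.run_experiment`.

```python
# Minimal sketch of role-inverted self-chat against a local Ollama server.
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "gemma-selfchat:int4"  # placeholder tag

def reply(history):
    """Send one side's view of the conversation, return the model's next turn."""
    resp = requests.post(
        OLLAMA_URL, json={"model": MODEL, "messages": history, "stream": False}
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

transcript = ["Hello! You are talking to another instance of yourself."]  # seed
for _ in range(50):  # cf. run_experiment's --turns flag
    speaker = len(transcript) % 2  # message i was spoken by instance i % 2
    # Role inversion: the side about to speak sees its own past messages as
    # "assistant" turns and the other instance's messages as "user" turns.
    history = [
        {"role": "assistant" if i % 2 == speaker else "user", "content": m}
        for i, m in enumerate(transcript)
    ]
    transcript.append(reply(history))
```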

## Setup

1. Install uv (if you don't have it):

   ```bash
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```

   Or on Windows:

   ```powershell
   powershell -ExecutionPolicy Bypass -c "irm https://astral.sh/uv/install.ps1 | iex"
   ```

2. Install dependencies:

   ```bash
   uv sync
   ```

## Get the data

Transcripts and embedding artifacts are published as a HuggingFace dataset: alliedtoasters/forbidden-backrooms-gemma-4-31B-it. The `hf` CLI ships with `uv sync` (it's a project dep), so step 2 above already put it in `.venv/bin/hf`. Use it rather than `git clone`: the npz/jsonl blobs are Git-LFS-tracked, and `hf download` resolves LFS pointers natively.

```bash
# Download the dataset into a sibling directory of this repo:
.venv/bin/hf download alliedtoasters/forbidden-backrooms-gemma-4-31B-it \
  --repo-type dataset \
  --local-dir ../forbidden-backrooms-data

# Symlink the data into the repo so all paths resolve as the code expects:
ln -s ../forbidden-backrooms-data/transcripts transcripts
ln -s ../forbidden-backrooms-data/artifacts artifacts
```

If you already have local `transcripts/` or `artifacts/` directories from your own runs, move them aside first; the `ln -s` calls won't overwrite real directories.

## Browse the data

```bash
.venv/bin/streamlit run selfchat/viz/browse.py
```

The default page shows a terminal-state PCA over runs; click a point to load that run's transcript in the side panel. The cluster lab page (in the sidebar) does interactive per-message KMeans with PCA or t-SNE projection, optionally coloring points by Llama Guard 3 verdicts.
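
For reference, the cluster-lab computation amounts to something like the sketch below: KMeans over per-message embeddings, then a 2-D projection for plotting. The npz filename and array key are assumptions for illustration; the actual loading code lives under `selfchat/viz/`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Hypothetical artifact path; inspect the downloaded artifacts/ directory
# for the real embedding files.
data = np.load("artifacts/message_embeddings.npz")
X = data[data.files[0]]  # shape (n_messages, embedding_dim)

labels = KMeans(n_clusters=8, n_init="auto", random_state=0).fit_predict(X)
coords = PCA(n_components=2).fit_transform(X)  # one 2-D point per message

for k in range(8):
    print(f"cluster {k}: {(labels == k).sum()} messages")
```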

## Generate new transcripts

Requires Ollama (http://localhost:11434) with both model tags pulled and served under matched int4 quantization (so quantization noise isn't a confound between them):

```bash
.venv/bin/python -m selfchat.runs.run_experiment \
  --variants vanilla jailbroken \
  --seeds freedom freedom_dark task \
  --runs 20 --turns 50
```
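
A quick preflight check that the server is reachable and both tags are present can look like this. Ollama lists local models at `GET /api/tags`; the tag names here are placeholders for whatever your vanilla/jailbroken variants map to in your Ollama store.

```python
import requests

EXPECTED = {"gemma-official:int4", "gemma-heretic:int4"}  # placeholder tags

tags = requests.get("http://localhost:11434/api/tags").json()
available = {m["name"] for m in tags["models"]}
missing = EXPECTED - available
if missing:
    print("missing model tags:", ", ".join(sorted(missing)))
else:
    print("both tags present; ready to run")
```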

Sample-size table per (variant, seed); the `sed` strips the 32-hex run id and everything after it from each filename:

```bash
ls transcripts/ | sed 's/_[0-9a-f]\{32\}_.*//' | sort | uniq -c | sort -rn
```

## Safety

The jailbroken variant is the abliterated gemma-4-31B-it-uncensored-heretic fine-tune. Outputs may contain content that the official model would refuse. Every message in the published dataset has been screened by Llama Guard 3 8B; per-message and per-run verdicts live in `artifacts/vet_results.jsonl`. The author also manually reviewed the highest-`p_unsafe` messages and judged the content non-graphic. See the dataset card for the full vetting protocol and content notes.
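
To poke at the screening results yourself, something like the following works. The record schema is an assumption (only `p_unsafe` is named above); check the dataset card for the real fields.

```python
import json

# Field names other than p_unsafe are assumptions; see the dataset card.
with open("artifacts/vet_results.jsonl") as f:
    records = [json.loads(line) for line in f]

records.sort(key=lambda r: r.get("p_unsafe", 0.0), reverse=True)
print(f"{len(records)} screened messages")
for r in records[:10]:  # the ten highest-p_unsafe messages
    print(f"p_unsafe={r.get('p_unsafe', 0.0):.3f}  {str(r)[:100]}")
```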

The pipeline records raw model outputs verbatim; nothing is sanitized, redacted, or content-filtered, so the experimental signal is preserved. Neither model checkpoint is committed to this repo; both are pulled from HuggingFace into your local Ollama store.
