Collecting the vibes of coding — one log at a time.
We’re building an open dataset to capture real-world coding interactions between developers and AI coding assistants — and we need your help!
This dataset will help researchers and developers better understand how humans and code models interact across different tools, and improve the future of AI-assisted software development.
The VibeCoding Dataset aims to collect anonymized client ↔ server message logs from popular AI coding tools and interfaces. These logs will form the basis of an open dataset hosted on Hugging Face and GitHub.
- Hugging Face: https://huggingface.co/datasets/QuixiAI/VibeCoding
- GitHub: https://github.com/QuixiAI/vibecoding
We’re collecting interaction logs from the following coding assistants and CLIs:
- Claude Code
- OpenAI Codex
- Gemini CLI
- Open-Code
- Cline
- Roo Code
- Continue.dev
- Cursor
- Windsurf
- Goose
- OpenHands
- Aider
- Factory Droid CLI
- charmbracelet/crush
If you regularly use any of these — you’re exactly who we need!
- **Set up a logging proxy.** Use a lightweight tool like LiteLLM or Dolphin Logger to capture your coding assistant's request/response data.
- **Record your sessions.** As you use your AI coding tool normally, the proxy will record the message logs exchanged between your client and the model API.
- **Anonymize and submit.** Before submission, make sure your logs contain no private or sensitive information. See our Data Cleaning Guide (coming soon).
- **Contribute your data.** Submit your anonymized logs via:
  - Pull request to this repo, or
  - Upload through the Hugging Face dataset page
Here’s how to get started capturing logs safely and easily.
LiteLLM is a drop-in proxy for OpenAI-compatible APIs.
```bash
pip install litellm
litellm --port 4000 --log --log_file logs/vibecoding.jsonl
```

Change your AI coding tool or CLI to point to:

```bash
OPENAI_API_BASE=http://localhost:4000
```
Keep your normal API key set as usual.
LiteLLM will log all incoming/outgoing messages in logs/vibecoding.jsonl.
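If you want to double-check what was captured before going further, a minimal sketch like the one below can help. It assumes the log is plain JSONL (one JSON object per line, as the `.jsonl` extension suggests); the field names inside each entry depend on your proxy's log format, so this only counts entries and peeks at their top-level keys.

```python
import json
from pathlib import Path

# Quick sanity check: count logged entries and peek at their top-level keys.
# The schema inside each entry depends on the proxy's log format.
log_path = Path("logs/vibecoding.jsonl")

with log_path.open() as f:
    entries = [json.loads(line) for line in f if line.strip()]

print(f"{len(entries)} logged entries")
if entries:
    print("Top-level keys of first entry:", sorted(entries[0].keys()))
```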
Dolphin Logger provides an intercepting proxy that records JSON message streams.
```bash
git clone https://github.com/yoheinakajima/dolphin-logger.git
cd dolphin-logger
npm install
npm start
```

By default, this runs on http://localhost:3000.
Point your coding assistant’s API endpoint or environment variable to:
```bash
HTTP_PROXY=http://localhost:3000
```
Your logs will appear in the logs/ directory as JSON files.
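Since this setup writes individual JSON files rather than a single JSONL stream, you may find it handy to merge them before cleaning. A rough sketch, assuming one JSON object per file in `logs/` (the exact layout and naming the proxy produces may differ):

```python
import json
from pathlib import Path

# Merge per-request JSON files from the logs/ directory into one JSONL file
# for easier cleaning and submission. Adjust paths if your layout differs.
log_dir = Path("logs")
out_path = Path("merged_logs.jsonl")

count = 0
with out_path.open("w") as out:
    for json_file in sorted(log_dir.glob("*.json")):
        entry = json.loads(json_file.read_text())
        out.write(json.dumps(entry) + "\n")
        count += 1

print(f"Merged {count} files into {out_path}")
```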
Before submission, please remove or redact:
- Any personal identifiers (e.g., email, usernames)
- Proprietary or confidential code
- Project names or unique file paths
You can anonymize text manually or use our upcoming sanitize_logs.py script.
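Until the official script lands, a very rough starting point for redaction might look like the sketch below. The regexes and field handling here are illustrative only, not the behavior of the upcoming sanitize_logs.py, and they won't catch proprietary code or project names, so always review the output by hand.

```python
import json
import re
from pathlib import Path

# Illustrative redaction pass only -- NOT the upcoming sanitize_logs.py.
# Masks obvious e-mail addresses and home-directory paths in every string
# field; proprietary code and project names still need manual review.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
HOME_RE = re.compile(r"(/home/|/Users/)[^\s/]+")

def scrub(value):
    if isinstance(value, str):
        value = EMAIL_RE.sub("<email>", value)
        return HOME_RE.sub(r"\1<user>", value)
    if isinstance(value, list):
        return [scrub(v) for v in value]
    if isinstance(value, dict):
        return {k: scrub(v) for k, v in value.items()}
    return value

src = Path("logs/vibecoding.jsonl")
dst = Path("logs/vibecoding.cleaned.jsonl")

with src.open() as fin, dst.open("w") as fout:
    for line in fin:
        if line.strip():
            fout.write(json.dumps(scrub(json.loads(line))) + "\n")
```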
When your logs are ready:
- Fork this repository
- Create a folder under `submissions/<your_handle>/`
- Add your cleaned `.json` or `.jsonl` logs
- Open a pull request
Alternatively, you can upload them directly to our Hugging Face dataset.
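If you prefer the Hugging Face route and are comfortable with Python, the `huggingface_hub` library can push a file to the dataset repo directly. This assumes you've authenticated with `huggingface-cli login` and have the necessary access to the dataset; the `path_in_repo` layout below simply mirrors the GitHub submission folder and is only a suggestion.

```python
from huggingface_hub import HfApi

# Upload a cleaned log file to the VibeCoding dataset repo.
# Assumes prior `huggingface-cli login` and access to the dataset;
# the path_in_repo layout is just a suggestion.
api = HfApi()
api.upload_file(
    path_or_fileobj="logs/vibecoding.cleaned.jsonl",
    path_in_repo="submissions/<your_handle>/vibecoding.cleaned.jsonl",
    repo_id="QuixiAI/VibeCoding",
    repo_type="dataset",
    commit_message="Add anonymized VibeCoding logs",
)
```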
All volunteers who contribute cleaned and usable logs will be credited by name or handle in:
- The dataset release notes
- The model card
- The GitHub contributors section
We appreciate your help in making open-source AI more transparent and human-centered!
Join the discussion in our dedicated channel:
👉 #vibecoding-dataset-project
Ask questions, share your setup, or get help with proxy configuration.
This project follows the principles of open, ethical data collection:
- No private or proprietary data
- No identifying information
- Only voluntary, informed contributions
Dataset licensed under Apache 2.0.
- 🧠 Dataset: Hugging Face – QuixiAI/VibeCoding
- 💻 Code & Instructions: GitHub – QuixiAI/vibecoding
- 💬 Discussion: #vibecoding-dataset-project
Help us capture the rhythm of coding — one conversation at a time.