A cookiecutter template for data-science, statistics, and text-analysis projects. Born as the internal template we use at Crow Intelligence for day-to-day analytics, modelling, and NLP research — and now available for anyone to use.
A project ready to work in, with no manual setup:
- Python at the version you choose, installed and pinned by uv (downloads a prebuilt python-build-standalone binary — no system build deps required)
- uv as package manager with a locked `uv.lock`
- ruff for linting and formatting + ty for type checking, both wired into pre-commit
- DVC for data versioning with a GCS remote
- MLflow with local tracking and GCS artifact store (optional)
- structlog for structured logging
- pydantic-settings + python-dotenv for typed config and GCS credential management
- Sphinx docs scaffold (autodoc, napoleon, RTD theme)
- Claude Code integration — a per-project `CLAUDE.md` and pre-installed skill packs (see below)
- Sensible `.gitignore` and `.dvcignore` defaults
- A `Makefile` with `make help`, `make lint`, `make test`, `make dvc-push`, `make install-skills`, and more
```
my-project/
├── data/
│   ├── raw/              ← original, immutable data (DVC-tracked)
│   ├── processed/        ← transformed data (DVC-tracked)
│   └── external/
├── docs/                 ← Sphinx docs
├── models/               ← serialised models and embeddings (DVC-tracked)
├── notebooks/            ← exploratory analysis
├── reports/figures/      ← generated graphics
└── src/my_project/       ← importable Python package
    ├── config.py         ← typed settings via pydantic-settings
    ├── logging.py        ← structlog setup
    └── tracking.py       ← MLflow helpers (if enabled)
```
```
uvx cookiecutter https://github.com/crow-intelligence/corvus.git
```

Or, if you don't have uv installed:

```
pip install cookiecutter
cookiecutter https://github.com/crow-intelligence/corvus.git
```

Follow the prompts. When done, cd into your new project and:

```
cp .env.template .env   # fill in your GCP credentials
uv sync
uv run dvc pull         # once data exists on the remote
make help               # see all available commands
```

| Prompt | Default | Notes |
|---|---|---|
| `project_name` | `my-project` | Slug and package name are derived automatically |
| `python_version` | `3.11` | Installed via `uv python install` — 3.11 resolves to the latest patch |
| `licence` | `MIT` | MIT, BSD 2/3, GPL/LGPL/AGPL v3, CC BY/BY-SA/BY-NC, Proprietary, Custom |
| `gcs_bucket` | `gs://my-bucket` | Used for DVC remote and MLflow artifacts |
| `gcp_project_id` | `my-gcp-project` | Required for GCS bucket operations |
| `gcs_region` | `EU` | Region for bucket creation |
| `use_mlflow` | `yes` | Adds MLflow and generates `tracking.py` |
| `mlflow_experiment` | project name | MLflow experiment name |
| `use_spacy` | `no` | Adds spaCy to runtime deps |
| `install_claude_skills_python` | `yes` | Vendors Matthew Honnibal's Python code-quality skills (MIT) |
| `install_claude_skills_analytics` | `yes` | Fetches nimrodfisher's 30 data-analytics skills at generation time |
| `install_claude_skills_anthropic` | `no` | Fetches Anthropic's official skill library (Apache-2.0) |
Before running corvus, you need:
- uv — package manager and Python installer (`curl -LsSf https://astral.sh/uv/install.sh | sh`)
- Google Cloud SDK — for DVC and MLflow on GCS (optional; you can configure GCS later)
The tracking store is local (`.mlruns/`, gitignored); artifacts go to GCS.
```python
from my_project.tracking import init_experiment, start_run
import mlflow

init_experiment()
with start_run("feature-extraction"):
    mlflow.log_param("window_size", 5)
    mlflow.log_metric("coverage", 0.94)
    mlflow.log_artifact("reports/figures/vocab_growth.png")
```

View runs locally:

```
uv run mlflow ui
```

```
# After adding new raw data:
uv run dvc add data/raw/corpus.jsonl
git add data/raw/corpus.jsonl.dvc data/raw/.gitignore
git commit -m "data: add raw corpus"
uv run dvc push

# On another machine:
uv run dvc pull
```

Every generated project ships with:

- `CLAUDE.md` at the project root — per-project context (package name, Python version, layout, commands, MLflow/spaCy flags) that Claude Code loads automatically at session start.
- `.claude/skills/` — pre-installed skill packs that Claude Code discovers on open.
| Pack | Source | Default | Licence |
|---|---|---|---|
| `python-quality` | honnibal/claude-skills — vendored into this template | yes | MIT |
| `data-analytics` | nimrodfisher/data-analytics-skills — fetched at generation | yes | unspecified upstream |
| `anthropic` | anthropics/skills — fetched at generation | no | Apache-2.0 |
Flip any of these off at cookiecutter time. To change your mind later, edit `.claude/skills/MANIFEST.yaml` in the generated project and run `make install-skills`.

These skills are project-scoped — they live in the project's git repo and apply only when Claude Code is open in that project. For machine-wide skills, either symlink entries into `~/.claude/skills/` or install them there directly; project-level skills take precedence when both exist.
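The symlink route can be scripted. A minimal stdlib sketch — the `link_skill` helper and its arguments are hypothetical, not part of the template:

```python
from pathlib import Path


def link_skill(project_root: Path, skills_home: Path, pack: str) -> Path:
    """Symlink a project-scoped skill pack into a machine-wide skills dir."""
    src = project_root / ".claude" / "skills" / pack
    dest = skills_home / pack
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.is_symlink() and not dest.exists():
        dest.symlink_to(src, target_is_directory=True)
    return dest
```

For example, `link_skill(Path.cwd(), Path.home() / ".claude" / "skills", "python-quality")` would expose the vendored pack machine-wide while the project copy stays authoritative.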
Crow Intelligence is an independent research group and boutique consultancy specialising in language, cognition, and AI. We apply computational methods to understand how language shapes thought — across historical corpora, political discourse, financial narratives, and beyond.
Corvus is the template we use for every new project. We're sharing it because good scaffolding shouldn't be reinvented each time.
PRs are welcome — see CONTRIBUTING.md.