
workspace import-dir: default-exclude .git, .databricks, node_modules#5118

Merged
jamesbroadhead merged 6 commits into main from jb/import-dir-default-exclude-git on May 20, 2026

@jamesbroadhead (Contributor) commented Apr 29, 2026

Summary

databricks workspace import-dir walks the source tree and copies every entry into the workspace verbatim; it has no awareness of .gitignore or of any default exclusions. This change skips directories named .git, .databricks, or node_modules when the walk encounters them. .gitignore and other dotfiles at the root are still copied. If a user explicitly passes .git (or one of the other two names) as the source root, that root is still copied: the skip rule applies only to entries encountered during recursion.
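A minimal sketch of the walk rule described above, written against Go's standard `io/fs` API. `filesToImport`, `defaultSkipDirs`, and `demoFS` are hypothetical names used only for illustration, not the CLI's actual code, and the sketch only lists the files that would be copied rather than uploading anything.

```go
package main

import (
	"io/fs"
	"testing/fstest"
)

// defaultSkipDirs holds the directory names skipped during recursion.
// Assumption: a plain name match at any depth, as the PR description states.
var defaultSkipDirs = map[string]bool{
	".git":         true,
	".databricks":  true,
	"node_modules": true,
}

// filesToImport (hypothetical) returns the file paths under root that a
// name-based import walk would copy.
func filesToImport(fsys fs.FS, root string) ([]string, error) {
	var files []string
	err := fs.WalkDir(fsys, root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			// The source root is never skipped: this is the escape hatch for
			// importing a .git (or similar) directory deliberately.
			if p != root && defaultSkipDirs[d.Name()] {
				return fs.SkipDir
			}
			return nil
		}
		files = append(files, p)
		return nil
	})
	return files, err
}

// demoFS is a made-up in-memory source tree for illustration.
var demoFS = fstest.MapFS{
	"app/main.py":                {Data: []byte("print('hi')\n")},
	"app/.gitignore":             {Data: []byte("node_modules/\n")},
	"app/.git/config":            {Data: []byte("[core]\n")},
	"app/.databricks/cache.json": {Data: []byte("{}")},
	"app/web/node_modules/x.js":  {Data: []byte("")},
	"app/lib/.git/HEAD":          {Data: []byte("ref: refs/heads/main\n")},
}
```

With this tree, walking `app` yields only `app/.gitignore` and `app/main.py` (the root `.git`, the nested `lib/.git`, `.databricks`, and `web/node_modules` are all pruned), while passing `app/.git` as the root yields `app/.git/config`. Returning `fs.SkipDir` prunes the whole subtree instead of filtering its files one by one.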

Motivation: align import-dir with sync's existing defaults

databricks sync already hard-codes skips for the same two directories that cause the most trouble:

  • libs/git/repository.go: at the comment "// Always ignore root .git directory.", .git is added to the default ignore rules unconditionally.
  • libs/git/view.go (SetupDefaults): "// Hard code .databricks ignore pattern so that we never sync it (irrespective of .gitignore patterns)."

So sync and import-dir currently produce different workspace contents for the same source tree: sync skips .git/ and .databricks/, import-dir copies them. This PR closes that gap for import-dir so the two commands behave consistently.

node_modules is the one entry that goes beyond sync's hard-coded defaults. In any project with a typical .gitignore, sync already skips it through the gitignore rules; import-dir ignores .gitignore entirely, so putting node_modules on the name-based skip list gives users the same result they would get from sync.

Why this matters in practice

databricks workspace import-dir is commonly reached for as the inverse of databricks workspace export-dir. Without these defaults, the imported tree carries:

  1. The local repo's .git/ directory, including its config and history.
  2. The local .databricks/ bundle cache, which can clobber state that bundle commands maintain remotely.
  3. node_modules/ for JS/TS projects — large, slow to upload, and recreated by the runtime install step anyway.

The canonical answer is databricks sync, which respects .gitignore and already excludes the first two by default. This PR is not a substitute for sync — it just brings import-dir's defaults into line for users who reach for it anyway.

Test plan

  • Unit tests covering: root .git/ skipped, nested .git/ skipped, .databricks/ skipped, node_modules/ skipped, .gitignore file kept, explicit .git root copied (escape hatch).
  • go test ./cmd/workspace/workspace/ — pass
  • golangci-lint run ./cmd/workspace/workspace/ — clean
  • Existing integration TestImportDir — unchanged, no .git in its testdata so behavior is identical.

This pull request and its description were written by Isaac.

@jamesbroadhead jamesbroadhead requested a review from pietern May 5, 2026 14:52
@jamesbroadhead jamesbroadhead changed the title workspace import-dir: default-exclude .git and .databricks directories workspace import-dir: default-exclude .git, .databricks, node_modules May 5, 2026
The previous walker copied every entry under the source tree into the
workspace verbatim. That has two practical consequences for users
deploying Databricks Apps via `databricks workspace import-dir` followed
by `databricks apps deploy`:

1. The local repo's `.git/config` (often containing the template-repo
   origin URL, sometimes cached credentials) ends up at
   `/app/python/source_code/.git/` in the running app container.
2. Local bundle cache `.databricks/` overwrites whatever the bundle
   pipeline put in the remote workspace.

Empirically reproduced on a probe deployment (deploy04-probe-jb on
e2-dogfood.staging) — the running container had a full `.git/` tree
including HEAD, config, objects, refs, hooks. CoDA
(github.com/datasciencemonkey/coding-agents-databricks-apps) ships an
in-app `_reinit_app_git()` to scrub this on every startup, and its
CLAUDE.md warns "never move .git folder to the workspace if you're
running workspace import" — that workaround is the bug surface this
change closes.

Reported as DEPLOY-04 #2 in Tushar's "Apps Gaps That Matter to EMEA
Apps" doc.

Skip is name-based and applied during the walk; if a user explicitly
passes `.git` (or `.databricks`) as the source root, that root is still
copied — the rule only fires on entries encountered during recursion.
`.gitignore` and other dot-files at the root remain copied as before.

Co-authored-by: Isaac

Skip node_modules as well, for the same rationale as .git/.databricks:
it gets uploaded by accident, it is large, and it is re-installed in the
runtime anyway.

Co-authored-by: Isaac
@jamesbroadhead jamesbroadhead force-pushed the jb/import-dir-default-exclude-git branch from bdaa5f8 to 1a24a14 Compare May 18, 2026 22:38
@simonfaltum (Member) left a comment

Looks good. I reviewed the import-dir exclusion behavior and follow-up help/changelog updates; the remaining CI status can proceed independently.

@jamesbroadhead jamesbroadhead added this pull request to the merge queue May 20, 2026
Merged via the queue into main with commit 60d4de5 May 20, 2026
23 checks passed
@jamesbroadhead jamesbroadhead deleted the jb/import-dir-default-exclude-git branch May 20, 2026 18:24
deco-sdk-tagging Bot added a commit that referenced this pull request May 21, 2026
## Release v1.0.0

### Notable Changes

* The Databricks CLI is now generally available with version v1.0.0 as the first major release 🚀. From this version on, the CLI follows semantic versioning (see [README](README.md)). This change does not impact DABs or other existing commands beyond the changes listed below.
* The 0.299.x line continues to receive security-critical patches through May 20, 2027; see [SECURITY](SECURITY.md) for the support policy.
* Starting with v1.0.0, the CLI will use [immutable release tags](https://docs.github.com/en/code-security/concepts/supply-chain-security/immutable-releases) to increase security against supply chain attacks.
* Breaking change: OAuth tokens for interactive logins (`auth_type = databricks-cli`) are now stored in the OS-native secure store by default (Keychain on macOS, Credential Manager on Windows, Secret Service on Linux) instead of `~/.databricks/token-cache.json`. After upgrading, run `databricks auth login` once per profile to re-authenticate; cached tokens from older versions are not migrated. To keep the previous file-backed storage, set `DATABRICKS_AUTH_STORAGE=plaintext` or add `auth_storage = plaintext` under `[__settings__]` in `~/.databrickscfg` (the env var takes precedence over the config setting), then re-run `databricks auth login`. On systems where the OS keyring is not reachable (e.g. Linux containers without a D-Bus session bus), the CLI transparently falls back to the file cache when reading tokens so legacy `token-cache.json` entries remain accessible without manual configuration.

### CLI

* Added `databricks aitools` command group for installing Databricks skills into your coding agents (Claude Code, Cursor, Codex CLI, OpenCode, GitHub Copilot, Antigravity). Skills are fetched from [github.com/databricks/databricks-agent-skills](https://github.com/databricks/databricks-agent-skills) and either symlinked into each agent's skills directory or copied into the current project. Use `databricks aitools install` to set up, `update` to pull newer versions, `list` to see what's available, and `uninstall` to remove them. Pick where they go with `--scope=project|global` (`--scope=both` is accepted on `update` and `list`).
* `[__settings__].default_profile` is now consulted as a fallback by `databricks api`, `databricks auth token`, and bundle commands when neither `--profile` nor `DATABRICKS_CONFIG_PROFILE` is set. `databricks auth token` continues to give precedence to `DATABRICKS_HOST` over `default_profile`. For bundle commands, `default_profile` only applies when the bundle does not pin its own `workspace.host`.
* Fixed a bug where auth commands did not properly load the DEFAULT profile when the auth type is `databricks-cli`.
* `databricks workspace import-dir` now skips `.git`, `.databricks`, and `node_modules` directories during recursive imports. To import one of these directories deliberately, pass it as `SOURCE_PATH` ([#5118](#5118)).
* `databricks postgres create-role --help` now documents the `--json` body shape and rejects the common mistake of wrapping the body in `{"role": ...}` client-side with a hint pointing at the correct shape ([#5111](#5111)).
* `databricks aitools list` honors `--output json`, emitting a structured `{release, skills[...], summary{}}` document so coding agents and CI can consume the skill/version/installation matrix without scraping the tabular text output ([#5233](#5233)).
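The `default_profile` fallback order described in the CLI notes can be sketched as a pure function. `profileInputs` and `resolveProfile` are hypothetical names, and an empty result stands for "no profile selected" (for example, the host comes from `DATABRICKS_HOST` or from the bundle itself).

```go
package main

// profileInputs (hypothetical) collects the inputs the release note mentions.
type profileInputs struct {
	Flag           string // value of --profile
	EnvProfile     string // DATABRICKS_CONFIG_PROFILE
	DefaultProfile string // [__settings__].default_profile
	IsAuthToken    bool   // command is `databricks auth token`
	EnvHost        string // DATABRICKS_HOST
	IsBundle       bool   // command is a bundle command
	BundlePinsHost bool   // bundle sets its own workspace.host
}

// resolveProfile (hypothetical) returns the profile to use, or "" for none.
func resolveProfile(in profileInputs) string {
	if in.Flag != "" {
		return in.Flag
	}
	if in.EnvProfile != "" {
		return in.EnvProfile
	}
	// `databricks auth token` keeps giving DATABRICKS_HOST precedence.
	if in.IsAuthToken && in.EnvHost != "" {
		return ""
	}
	// Bundles that pin workspace.host do not fall back to default_profile.
	if in.IsBundle && in.BundlePinsHost {
		return ""
	}
	return in.DefaultProfile
}
```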

### Bundles
* Make sure warnings asking for approval are understood by agents ([#5239](#5239)).
* Support `replace_existing: true` on `postgres_branches` and `postgres_endpoints` so bundles can manage the implicitly-created production branch and primary read-write endpoint of a Lakebase project.
* Add `postgres_catalogs` resource to bind a Unity Catalog catalog to a Postgres database on a Lakebase Autoscaling branch ([#5265](#5265)).
* Add `postgres_synced_tables` resource to sync a Unity Catalog Delta table into a Postgres table on a Lakebase Autoscaling branch ([#5268](#5268)).
* engine/direct: Changes to the state file are now persisted to the .wal file immediately instead of being saved at the end ([#5149](#5149)).