workspace import-dir: default-exclude .git, .databricks, node_modules #5118
Merged
Conversation
The previous walker copied every entry under the source tree into the workspace verbatim. That has two practical consequences for users deploying Databricks Apps via `databricks workspace import-dir` followed by `databricks apps deploy`:

1. The local repo's `.git/config` (often containing the template-repo origin URL, sometimes cached credentials) ends up at `/app/python/source_code/.git/` in the running app container.
2. Local bundle cache `.databricks/` overwrites whatever the bundle pipeline put in the remote workspace.

Empirically reproduced on a probe deployment (deploy04-probe-jb on e2-dogfood.staging): the running container had a full `.git/` tree including HEAD, config, objects, refs, hooks.

CoDA (github.com/datasciencemonkey/coding-agents-databricks-apps) ships an in-app `_reinit_app_git()` to scrub this on every startup, and its CLAUDE.md warns "never move .git folder to the workspace if you're running workspace import"; that workaround is the bug surface this change closes. Reported as DEPLOY-04 #2 in Tushar's "Apps Gaps That Matter to EMEA Apps" doc.

Skip is name-based and applied during the walk; if a user explicitly passes `.git` (or `.databricks`) as the source root, that root is still copied, because the rule only fires on entries encountered during recursion. `.gitignore` and other dot-files at the root remain copied as before.

Co-authored-by: Isaac
Same rationale as `.git`/`.databricks`: `node_modules` gets uploaded by accident, is large, and is re-installed in the runtime anyway. Co-authored-by: Isaac
bdaa5f8 to
1a24a14
Compare
simonfaltum
approved these changes
May 20, 2026
Member
simonfaltum
left a comment
Looks good. I reviewed the import-dir exclusion behavior and follow-up help/changelog updates; the remaining CI status can proceed independently.
deco-sdk-tagging Bot
added a commit
that referenced
this pull request
May 21, 2026
## Release v1.0.0

### Notable Changes

* The Databricks CLI is now generally available with version v1.0.0 as the first major release 🚀. From this version on, the CLI follows semantic versioning (see [README](README.md)). This change does not impact DABs or other existing commands beyond the changes listed below.
* The 0.299.x line continues to receive security-critical patches through May 20, 2027; see [SECURITY](SECURITY.md) for the support policy.
* Starting with v1.0.0, the CLI will use [immutable release tags](https://docs.github.com/en/code-security/concepts/supply-chain-security/immutable-releases) to increase security against supply chain attacks.
* Breaking change: OAuth tokens for interactive logins (`auth_type = databricks-cli`) are now stored in the OS-native secure store by default (Keychain on macOS, Credential Manager on Windows, Secret Service on Linux) instead of `~/.databricks/token-cache.json`. After upgrading, run `databricks auth login` once per profile to re-authenticate; cached tokens from older versions are not migrated. To keep the previous file-backed storage, set `DATABRICKS_AUTH_STORAGE=plaintext` or add `auth_storage = plaintext` under `[__settings__]` in `~/.databrickscfg` (the env var takes precedence over the config setting), then re-run `databricks auth login`. On systems where the OS keyring is not reachable (e.g. Linux containers without a D-Bus session bus), the CLI transparently falls back to the file cache when reading tokens so legacy `token-cache.json` entries remain accessible without manual configuration.

### CLI

* Added `databricks aitools` command group for installing Databricks skills into your coding agents (Claude Code, Cursor, Codex CLI, OpenCode, GitHub Copilot, Antigravity). Skills are fetched from [github.com/databricks/databricks-agent-skills](https://github.com/databricks/databricks-agent-skills) and either symlinked into each agent's skills directory or copied into the current project. Use `databricks aitools install` to set up, `update` to pull newer versions, `list` to see what's available, and `uninstall` to remove them. Pick where they go with `--scope=project|global` (`--scope=both` is accepted on `update` and `list`).
* `[__settings__].default_profile` is now consulted as a fallback by `databricks api`, `databricks auth token`, and bundle commands when neither `--profile` nor `DATABRICKS_CONFIG_PROFILE` is set. `databricks auth token` continues to give precedence to `DATABRICKS_HOST` over `default_profile`. For bundle commands, `default_profile` only applies when the bundle does not pin its own `workspace.host`.
* Fixed bug where auth commands did not load the DEFAULT profile properly during auth where type is `databricks-cli`.
* `databricks workspace import-dir` now skips `.git`, `.databricks`, and `node_modules` directories during recursive imports. To import one of these directories deliberately, pass it as `SOURCE_PATH` ([#5118](#5118)).
* `databricks postgres create-role --help` now documents the `--json` body shape and rejects the common mistake of wrapping the body in `{"role": ...}` client-side with a hint pointing at the correct shape ([#5111](#5111)).
* `databricks aitools list` honors `--output json`, emitting a structured `{release, skills[...], summary{}}` document so coding agents and CI can consume the skill/version/installation matrix without scraping the tabular text output ([#5233](#5233)).

### Bundles

* Make sure warnings asking for approval are understood by agents ([#5239](#5239))
* Support `replace_existing: true` on `postgres_branches` and `postgres_endpoints` so bundles can manage the implicitly-created production branch and primary read-write endpoint of a Lakebase project.
* Add `postgres_catalogs` resource to bind a Unity Catalog catalog to a Postgres database on a Lakebase Autoscaling branch ([#5265](#5265)).
* Add `postgres_synced_tables` resource to sync a Unity Catalog Delta table into a Postgres table on a Lakebase Autoscaling branch ([#5268](#5268)).
* engine/direct: Changes to state file now persisted to .wal file right away instead of being saved in the end ([#5149](#5149))
Summary
`databricks workspace import-dir` walks the source tree and copies every entry into the workspace verbatim; it has no awareness of `.gitignore` or default exclusions. This change adds a name-based skip for `.git`, `.databricks`, and `node_modules` directories during the walk.

`.gitignore` and other dotfiles at the root remain copied. If a user explicitly passes `.git` (or any of the others) as the source root, that root is still copied; the skip rule applies to entries encountered during recursion.

Motivation: align `import-dir` with `sync`'s existing defaults

`databricks sync` already hard-codes skips for the same two directories that cause the most trouble:

* `libs/git/repository.go`: `// Always ignore root .git directory.` adds `.git` to the default ignore rules unconditionally.
* `libs/git/view.go` (`SetupDefaults`): `// Hard code .databricks ignore pattern so that we never sync it (irrespective of .gitignore patterns).`

So `sync` and `import-dir` currently produce different workspace contents for the same source tree: `sync` skips `.git/` and `.databricks/`, `import-dir` copies them. This PR closes that gap for `import-dir` so the two commands behave consistently.

`node_modules` is the one entry that goes beyond what `sync` does by default. For any project with a typical `.gitignore`, `sync` would already skip it via gitignore rules; `import-dir` ignores `.gitignore` entirely, so adding it to the name-based skip list keeps the behavior aligned with what users get from `sync`.

Why this matters in practice

`databricks workspace import-dir` is commonly reached for as the inverse of `databricks workspace export-dir`. Without these defaults, the imported tree carries:

* `.git/` directory, including its config and history.
* `.databricks/` bundle cache, which can clobber state that bundle commands maintain remotely.
* `node_modules/` for JS/TS projects: large, slow to upload, and recreated by the runtime install step anyway.

The canonical answer is `databricks sync`, which respects `.gitignore` and already excludes the first two by default. This PR is not a substitute for `sync`; it just brings `import-dir`'s defaults into line for users who reach for it anyway.

Test plan

* `.git/` skipped, nested `.git/` skipped, `.databricks/` skipped, `node_modules/` skipped, `.gitignore` file kept, explicit `.git` root copied (escape hatch).
* `go test ./cmd/workspace/workspace/`: pass
* `golangci-lint run ./cmd/workspace/workspace/`: clean
* `TestImportDir`: unchanged, no `.git` in its testdata so behavior is identical.

This pull request and its description were written by Isaac.