fix(core): clone repositories from warm mirror cache#341

Open
skulidropek wants to merge 4 commits into main from vk/2562-github-138


@skulidropek
Member

Fixes #138

Summary

  • Adds a shared git mirror cache mounted into project containers at /home/dev/.docker-git/.cache/git-mirrors.
  • Reuses a warm bare mirror as clone source only after successful authenticated refresh from the upstream remote.
  • Repairs/verifies mirror HEAD before cache reuse so stale issue-* fallback heads cannot break future clones.
  • Keeps PR/MR refs safe: clone from cache can happen first, but the requested PR/MR ref is fetched from the authenticated upstream URL.
  • Adds .kanban/changes/ABC-2 audit trail for the Vibe Kanban pipeline run.

Verification

  • bun run --cwd packages/lib test -- core/templates.test.ts -> 44 tests passed
  • bun run typecheck -> passed
  • bun run lint:effect -> passed
  • bun run test -> 124 test files / 744 tests passed
  • bun run lint -> passed
  • bun run build -> passed
  • DOCKER_GIT_API_URL=http://172.18.0.3:3336 DOCKER_GIT_API_CONTAINER_NAME=docker-git-api-cloudflared DOCKER_GIT_E2E_CLONE_CACHE_TIMEOUT=900s bash scripts/e2e/clone-cache.sh -> cache reuse verified

Mathematical Guarantees

  • refresh_success(cache, remote) -> may_clone_from(cache)
  • refresh_failure(cache, remote) -> clone_source = authenticated_remote
  • may_clone_from(cache) -> exists(cache.HEAD) && cache.HEAD in refs/heads/*
  • repoUrl equality -> same mirror key
  • requested repoRef preserved in final working tree
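
The first three guarantees can be read as one small decision function. The sketch below is illustrative only: `select_clone_source` and its arguments are hypothetical names rather than the entrypoint's actual code, and the refresh outcome is passed in as an exit status instead of coming from a real `git fetch`.

```bash
# Hypothetical helper: choose the clone source from the refresh outcome.
# $1 = exit status of the authenticated mirror refresh (0 = success)
# $2 = verified mirror HEAD ref (empty if none was found)
# $3 = path of the bare mirror
# $4 = authenticated upstream URL
select_clone_source() {
  refresh_status="$1"
  head_ref="$2"
  cache_dir="$3"
  auth_url="$4"
  if [ "$refresh_status" -eq 0 ]; then
    case "$head_ref" in
      # refresh succeeded and HEAD points into refs/heads/*: the cache may be used
      refs/heads/?*) printf '%s\n' "$cache_dir"; return 0 ;;
    esac
  fi
  # refresh failed or no usable HEAD: stay on the authenticated remote
  printf '%s\n' "$auth_url"
}
```

Under these assumptions the cache path is returned only when both conditions hold; every other combination falls through to the authenticated URL.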

No merge was performed from the Kanban Ship stage.

@coderabbitai

coderabbitai Bot commented May 21, 2026


📝 Walkthrough

Summary by CodeRabbit

Documentation

  • New features

    • Improved the caching system for repository cloning using mirrors in a Docker environment.
    • Added automatic validation and repair of repository state before use from the cache.
    • Implemented a fallback strategy to the original source when the cache is unavailable.
  • Documentation

    • Added complete documentation of the caching process and the checks performed.

Walkthrough

This PR implements a git repository clone cache system by introducing a bare mirror repository per URL in shared Docker volume. The implementation conditionally switches clone operations to use the cached mirror after validating mirror HEAD health, while maintaining authenticated fetch for access verification and specific reference resolution.
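
A per-URL mirror needs a stable, filesystem-safe directory name. The review analysis later in this thread describes the key as a SHA-256 of `REPO_URL` with a sanitizing fallback; the sketch below follows that description. The function name `repo_cache_key` is hypothetical, and the real entrypoint may differ in detail.

```bash
# Hypothetical helper: derive a filesystem-safe cache key from a repo URL.
repo_cache_key() {
  url="$1"
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$url" | sha256sum | cut -d ' ' -f 1
  elif command -v shasum >/dev/null 2>&1; then
    printf '%s' "$url" | shasum -a 256 | cut -d ' ' -f 1
  else
    # Fallback: keep only alphanumerics, underscore, dot and hyphen,
    # so no "/" survives in the key.
    printf '%s' "$url" | tr -c '[:alnum:]_.-' '_'
    printf '\n'
  fi
}
```

Equal URLs produce equal keys, which is what the "repoUrl equality -> same mirror key" guarantee in the PR description relies on.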

Changes

Clone-cache repository implementation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Documentation and specifications | .kanban/changes/ABC-2/README.md, .kanban/changes/ABC-2/files.md, .kanban/changes/ABC-2/review.md, .kanban/changes/ABC-2/verification.md | Audit trail establishes clone-cache behavior: bare mirror per URL, limited refspecs for refresh (heads/tags only), HEAD validation and repair before use, and cache source switching only after successful refresh. Risk review documents P1 mitigations (invalid mirror HEAD and unsafe cache reuse). Verification checklist confirms passed checks (tests, typecheck, Docker build, lint) and e2e runtime behavior. |
| Cache initialization and mirror refresh | packages/app/src/lib/core/templates-entrypoint/tasks.ts, packages/lib/src/core/templates-entrypoint/tasks.ts | Initialize CLONE_SOURCE_REPO_URL with AUTH_REPO_URL at cache setup. Implement mirror refresh with narrowed refspecs excluding pull/merge refs. Check for usable HEAD (preferring main/master, falling back to any heads ref), repair HEAD with git symbolic-ref, and conditionally switch CLONE_SOURCE_REPO_URL to mirror path and set CLONE_CACHE_ARGS=--no-local only on successful usable HEAD discovery. |
| Clone operations with dynamic source switching | packages/app/src/lib/core/templates-entrypoint/tasks.ts, packages/lib/src/core/templates-entrypoint/tasks.ts | Switch git clone to use CLONE_SOURCE_REPO_URL instead of direct AUTH_REPO_URL. For pull/merge refs, clone from cache then fetch specific ref from auth source before checkout. For branch refs, attempt clone with --branch and retry without flag on failure. For unspecified refs, clone directly from cache source. Maintain auth source fetch for access verification and reference resolution. |
| Test validation of cache behavior | packages/lib/tests/core/templates.test.ts | Add assertions validating CLONE_SOURCE_REPO_URL initialization and conditional CLONE_CACHE_ARGS setting. Verify limited refspecs exclude broad pull/merge-requests. Confirm mirror HEAD validation and repair sequence. Assert --reference-if-able not used in warm-cache path. Validate refresh failure block explicitly prevents CLONE_SOURCE_REPO_URL reassignment to $CACHE_REPO_DIR. |
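
The three clone paths summarized above (pull/merge refs, branch refs, no ref) amount to a classification of the requested ref. The following is a minimal sketch of that classification; `clone_strategy` is a hypothetical name, and the ref patterns are an assumption based on the usual GitHub (`refs/pull/*`) and GitLab (`refs/merge-requests/*`) namespaces, since the exact patterns the entrypoint matches are not shown in this thread.

```bash
# Hypothetical helper: classify the requested ref to pick a clone strategy.
clone_strategy() {
  case "$1" in
    # no ref requested: plain clone from the current clone source
    '') echo default ;;
    # PR/MR ref: clone first, then fetch this ref from the authenticated URL
    refs/pull/* | refs/merge-requests/*) echo review-ref ;;
    # anything else is treated as a branch: try --branch, retry without it on failure
    *) echo branch ;;
  esac
}
```

For example, `clone_strategy refs/pull/341/head` selects the path where the working tree comes from the cache but the requested ref is still fetched from the authenticated upstream.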

🎯 3 (Moderate) | ⏱️ ~20 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Requirements Alignment | ❌ Error | Review flagged unaddressed permission issues: git commands (symbolic-ref/show-ref/for-each-ref) run as root while fetch runs as sshUser; issues not fixed or documented in audit trail. | Fix permission inconsistency by wrapping git commands in 'su - ${config.sshUser} -c' or document as known limitation with justification. |
| Security Regression | ⚠️ Warning | Command injection: AUTH_REPO_URL with tokens expanded inside su -c double quotes allows shell metacharacters to execute arbitrary commands without proper escaping. | Escape AUTH_REPO_URL using printf %q before su -c, or validate tokens to exclude metacharacters; use safe argument passing instead of shell interpolation. |
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The pull request title clearly summarizes the main change: implementing use of warm mirror cache for repository clones. |
| Description check | ✅ Passed | The description covers all required sections: linked issue (#138), clear summary of changes, verification steps with specific results, and mathematical guarantees. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #138 requirements: implements shared repository cache, reuses warm mirror cache for clones, performs authenticated refresh before reuse, verifies/repairs mirror HEAD, and preserves requested refs. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with issue #138 objectives: cache implementation, mirror refresh logic, HEAD verification, and test coverage for the new cache behavior. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.kanban/changes/ABC-2/review.md:
- Around line 17-33: The review found a command-injection risk where
AUTH_REPO_URL (and derived RESOLVED_GIT_AUTH_TOKEN/RESOLVED_GIT_AUTH_USER) get
interpolated into su - <sshUser> -c "... '$AUTH_REPO_URL' ..." inside single
quotes; update the code to eliminate that injection vector by never placing
untrusted values directly into a shell -c string: either (a) fully
validate/sanitize AUTH_REPO_URL and tokens to disallow any single-quote or shell
metacharacters (reject on invalid chars) in
validateGithubToken/validateGitlabToken and the REPO_URL handling, or (b) avoid
shell interpolation altogether by invoking git via exec/array args or a safe API
under su (e.g., use sudo -u/exec with argument array) so that AUTH_REPO_URL,
RESOLVED_GIT_AUTH_TOKEN and RESOLVED_GIT_AUTH_USER are passed as arguments
rather than embedded in a single-quoted command; apply this change to the code
paths that form the su -c command that include AUTH_REPO_URL and to any helpers
that build that command.

In `@packages/app/src/lib/core/templates-entrypoint/tasks.ts`:
- Around line 151-155: The git commands that read refs (commands setting and
checking CACHE_HEAD_REF using symbolic-ref, show-ref, for-each-ref against
CACHE_REPO_DIR) are being run as root while the prior fetch runs under
${config.sshUser}, causing permission issues; change the script so those
commands run as the same user (use the same su - ${config.sshUser} context or
run them via sudo -u ${config.sshUser}) when operating on CACHE_REPO_DIR and
when assigning CACHE_HEAD_REF, ensuring all git invocations (fetch,
symbolic-ref, show-ref, for-each-ref) execute with the repository owner’s
privileges.

In `@packages/lib/src/core/templates-entrypoint/tasks.ts`:
- Around line 151-155: The git cache commands (symbolic-ref, show-ref,
for-each-ref) are being executed as root while fetch runs as ${config.sshUser},
causing permission inconsistencies; update the calls that set and check
CACHE_HEAD_REF (the git --git-dir "$CACHE_REPO_DIR" symbolic-ref, show-ref and
for-each-ref invocations) to run under the same user as fetch (e.g., prefix
those git commands with the same sudo/runuser invocation used for fetch or
otherwise switch to ${config.sshUser} before executing them) so the cache
directory is always accessed by the same SSH user.
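
One way to keep every cache access under the same account is a single wrapper that all git invocations go through. This is a sketch under assumptions, not the project's fix: `as_cache_user` and `CACHE_USER` are hypothetical names (the latter standing in for `${config.sshUser}`), and it assumes `runuser` from util-linux is available in the container and that the entrypoint runs as root.

```bash
# Hypothetical wrapper: run a command as the cache owner.
# CACHE_USER is an assumed variable standing in for ${config.sshUser}.
as_cache_user() {
  if [ "$(id -un)" = "$CACHE_USER" ]; then
    # already running as the cache owner
    "$@"
  else
    # arguments are passed as argv, not re-parsed by a shell
    runuser -u "$CACHE_USER" -- "$@"
  fi
}

# Example call shape (not executed here):
#   as_cache_user git --git-dir "$CACHE_REPO_DIR" symbolic-ref -q HEAD
```

Unlike `su -`, this form does not start a login shell, so environment such as `HOME` may need to be set explicitly if the git commands depend on it.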

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: eb112aec-4a31-4d41-8302-be6d215a7630

📥 Commits

Reviewing files that changed from the base of the PR and between 9140d58 and 5410944.

📒 Files selected for processing (7)
  • .kanban/changes/ABC-2/README.md
  • .kanban/changes/ABC-2/files.md
  • .kanban/changes/ABC-2/review.md
  • .kanban/changes/ABC-2/verification.md
  • packages/app/src/lib/core/templates-entrypoint/tasks.ts
  • packages/lib/src/core/templates-entrypoint/tasks.ts
  • packages/lib/tests/core/templates.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: E2E (Browser command)
  • GitHub Check: E2E (Clone cache)
  • GitHub Check: Test
  • GitHub Check: E2E (OpenCode)
  • GitHub Check: E2E (Runtime volumes + SSH)
  • GitHub Check: E2E (Clone auto-open SSH)
  • GitHub Check: Lint
  • GitHub Check: E2E (Login context)
  • GitHub Check: Final build (windows-latest)
🧰 Additional context used
📓 Path-based instructions (9)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Implement Functional Core, Imperative Shell (FCIS) pattern: CORE layer contains only pure functions with immutable data and mathematical operations; SHELL layer isolates all effects (IO, network, database). Strict dependency direction: SHELL → CORE (never reverse).
Never use any, unknown, eslint-disable, ts-ignore, or as type assertions (except in rigorously justified cases with documentation). Always use exhaustive union type analysis through .exhaustive() pattern matching.
All external dependencies must be wrapped through typed interfaces and injected via Effect-TS Layer pattern. Never call external services directly from CORE functions.
Use monadic composition with Effect-TS for all effects: Effect<Success, Error, Requirements>. Compose effects through pipe() and Effect.flatMap(). Implement dependency injection via Layer pattern. Handle errors without try/catch blocks.
All functions must be pure in the CORE layer: no side effects (logging, console output, IO operations, mutations). Separate all side effects into the SHELL layer.
Use exhaustive pattern matching with Effect.Match instead of switch statements. Example: Match.value(item).pipe(Match.when(...), Match.exhaustive).
Document all functions with comprehensive TSDoc including: @pure (true/false), @effect (required services), @invariant (mathematical invariants), @precondition, @postcondition, @complexity (time and space), @throws Never (errors must be typed in Effect).
Use functional comment markers for code clarity: CHANGE (brief description), WHY (mathematical/architectural justification), QUOTE(ТЗ) (requirement citation), REF (RTM or message ID), SOURCE (external source with quote), FORMAT THEOREM (∀x ∈ Domain: P(x) → Q(f(x))), PURITY (CORE|SHELL), EFFECT (Effect type signature), INVARIANT (mathematical invariant), COMPLEXITY (time/space).
Define all external service dependencies as Context.Tag classes with fully typed methods returning Effect types. Example: `class Da...

Files:

  • packages/lib/tests/core/templates.test.ts
  • packages/app/src/lib/core/templates-entrypoint/tasks.ts
  • packages/lib/src/core/templates-entrypoint/tasks.ts
**/*.test.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.test.{ts,tsx}: Implement property-based testing using fast-check for mathematical properties and invariants. Example: fc.property(fc.array(messageArbitrary), (messages) => isChronologicallySorted(sortMessagesByTimestamp(messages))).
Mock external dependencies in unit tests using Effect's testing utilities. Run tests without Effect runtime for speed. Example: Effect.provide(MockService), Effect.runPromise.

Files:

  • packages/lib/tests/core/templates.test.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

**/*.{ts,tsx,js,jsx}: Forbidden constructs in CORE code: any, eslint-disable, ts-ignore, async/await, raw Promise chains (then/catch), Promise.all, try/catch for logic control, console.*, switch statements (use Match with .exhaustive() instead)
All functions must use Effect-TS for composing effects: Effect<Success, Error, Requirements>. No direct async/await, Promise chains, or try/catch in product logic.
Functional comments must include: CHANGE, WHY, QUOTE(ТЗ) or n/a, REF, SOURCE or n/a, FORMAT THEOREM, PURITY (CORE|SHELL), EFFECT signature for SHELL functions, INVARIANT, and COMPLEXITY.
All data mutations must use immutable patterns (ReadonlyArray, readonly properties, Object.freeze); mutation in SHELL only when absolutely necessary and documented.

Files:

  • packages/lib/tests/core/templates.test.ts
  • packages/app/src/lib/core/templates-entrypoint/tasks.ts
  • packages/lib/src/core/templates-entrypoint/tasks.ts
**/*.{test,spec}.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Property-based tests (fast-check) must verify mathematical invariants; unit tests must use Effect test utilities without async/await.

Files:

  • packages/lib/tests/core/templates.test.ts
**/*.{sh,bash,py,js,ts,jsx,tsx,go,java,rb,php}

📄 CodeRabbit inference engine (Custom checks)

Fail if changed files introduce command injection or unsafe shell/process execution with user-controlled input

Files:

  • packages/lib/tests/core/templates.test.ts
  • packages/app/src/lib/core/templates-entrypoint/tasks.ts
  • packages/lib/src/core/templates-entrypoint/tasks.ts
**/*.{py,js,ts,jsx,tsx,go,java,rb,php,sh,bash,c,cpp}

📄 CodeRabbit inference engine (Custom checks)

Fail if changed files introduce path traversal or writes outside intended project/container state directories

Files:

  • packages/lib/tests/core/templates.test.ts
  • packages/app/src/lib/core/templates-entrypoint/tasks.ts
  • packages/lib/src/core/templates-entrypoint/tasks.ts
**/*.{js,ts,jsx,tsx,py,java,go,rb,php,sh,bash,yml,yaml,json,env*,toml,cfg,config,dockerfile,dockerignore}

📄 CodeRabbit inference engine (Custom checks)

Fail if changed files expose credentials, tokens, private-keys, or PII in source, generated config, logs, or CI output

Files:

  • packages/lib/tests/core/templates.test.ts
  • packages/app/src/lib/core/templates-entrypoint/tasks.ts
  • packages/lib/src/core/templates-entrypoint/tasks.ts
**/*

⚙️ CodeRabbit configuration file

**/*: You are a strict SPEC DRIVEN DEVELOPMENT reviewer.

Before drawing conclusions, study README.md, other *.md files, linked issues,
PR description, PR comments/discussion, and the relevant codebase.

Check the changes against the original requirements/spec and the discussion. Flag any
deviation from the spec, undocumented behavior change, missing tests
for claimed behavior, and security risk. If the spec is not visible,
ask the author to add it to the issue or PR description.

Check the solution from a formal verification standpoint: which invariants,
preconditions, and postconditions can be proven mathematically, and where provability
is weak. Evaluate the solution from a game theory standpoint: whether the incentives are stable,
whether there is a profitable way around the rules, and which solution would be stronger.

Files:

  • packages/lib/tests/core/templates.test.ts
  • packages/app/src/lib/core/templates-entrypoint/tasks.ts
  • packages/lib/src/core/templates-entrypoint/tasks.ts
packages/app/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (README.md)

App layer (APP) should work only with API and not have direct access to LIB layer

Files:

  • packages/app/src/lib/core/templates-entrypoint/tasks.ts
🪛 LanguageTool
.kanban/changes/ABC-2/files.md

[style] ~22-~22: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...cing cache use after refresh failure. - Guards mirror HEAD validation/repair before ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🔇 Additional comments (9)
packages/app/src/lib/core/templates-entrypoint/tasks.ts (2)

133-133: LGTM!

Also applies to: 156-161


184-184: LGTM!

Also applies to: 188-188, 194-194, 196-196, 208-208

packages/lib/src/core/templates-entrypoint/tasks.ts (2)

133-133: LGTM!

Also applies to: 156-161


184-184: LGTM!

Also applies to: 188-188, 194-194, 196-196, 208-208

packages/lib/tests/core/templates.test.ts (1)

302-324: LGTM!

.kanban/changes/ABC-2/README.md (1)

1-52: LGTM!

.kanban/changes/ABC-2/verification.md (1)

1-46: LGTM!

.kanban/changes/ABC-2/review.md (1)

47-55: ⚡ Quick win

Missing input: Provide the original review comment inside <review_comment>...</review_comment> (and any verification outputs/results you have) so I can rewrite it to the required format.

.kanban/changes/ABC-2/files.md (1)

11-13: 🏗️ Heavy lift

Provide the original <review_comment>...</review_comment> text (and any relevant diff/snippet) so I can rewrite it.

Comment on lines +17 to +33
The entrypoint now computes `CACHE_HEAD_REF`, verifies it exists with:

```bash
git --git-dir "$CACHE_REPO_DIR" show-ref --verify --quiet "$CACHE_HEAD_REF"
```

If it is missing, the entrypoint selects the first existing branch from:

```bash
refs/heads/main refs/heads/master refs/heads
```

Then it repairs `HEAD` via:

```bash
git --git-dir "$CACHE_REPO_DIR" symbolic-ref HEAD "$CACHE_HEAD_REF"
```
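
The selection step described above can also be written as an explicit preference loop. This sketch is an illustration rather than the entrypoint's code: `pick_head_ref` is a hypothetical helper that reads existing ref names on stdin, which keeps it independent of any particular repository.

```bash
# Hypothetical helper: pick a HEAD target from a list of existing refs on stdin.
pick_head_ref() {
  refs="$(cat)"
  for candidate in refs/heads/main refs/heads/master; do
    # exact, whole-line match against the list of existing refs
    if printf '%s\n' "$refs" | grep -Fxq "$candidate"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  # Fall back to the first branch of any name; prints nothing if there is none.
  printf '%s\n' "$refs" | grep '^refs/heads/' | head -n 1
}

# Example call shape (not executed here):
#   git --git-dir "$CACHE_REPO_DIR" for-each-ref --format='%(refname)' refs/heads | pick_head_ref
```

Because `git for-each-ref` sorts its output by ref name by default, an explicit loop like this makes the main/master preference independent of how the branches happen to sort.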

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for variables used in shell commands without validation

echo "=== Checking for variable usage in git commands ==="
rg -n -A3 -B3 'git.*\$CACHE_REPO_DIR|symbolic-ref.*\$CACHE_HEAD_REF' \
  packages/lib/src/core/templates-entrypoint/tasks.ts \
  packages/app/src/lib/core/templates-entrypoint/tasks.ts

echo ""
echo "=== Checking for sha256 hashing and validation ==="
rg -n 'sha256|CACHE_REPO_DIR.*=|CACHE_HEAD_REF.*=' \
  packages/lib/src/core/templates-entrypoint/tasks.ts \
  packages/app/src/lib/core/templates-entrypoint/tasks.ts

echo ""
echo "=== Checking for input sanitization patterns ==="
rg -n 'validate|sanitize|escape|refs/heads' \
  packages/lib/src/core/templates-entrypoint/tasks.ts \
  packages/app/src/lib/core/templates-entrypoint/tasks.ts

Repository: ProverCoderAI/docker-git

Length of output: 8521


🏁 Script executed:

#!/bin/bash
set -euo pipefail

files=(
  "packages/app/src/lib/core/templates-entrypoint/tasks.ts"
  "packages/lib/src/core/templates-entrypoint/tasks.ts"
)

for f in "${files[@]}"; do
  echo "===== $f (lines 110-180) ====="
  nl -ba "$f" | sed -n '110,180p'
  echo
done

echo "===== Search: CACHE_HEAD_REF assignments ====="
rg -n "CACHE_HEAD_REF" packages/app/src/lib/core/templates-entrypoint/tasks.ts packages/lib/src/core/templates-entrypoint/tasks.ts

echo
echo "===== Search: symbolic-ref usage ====="
rg -n "symbolic-ref" packages/app/src/lib/core/templates-entrypoint/tasks.ts packages/lib/src/core/templates-entrypoint/tasks.ts

echo
echo "===== Search: REPO_URL / AUTH_REPO_URL usage ====="
rg -n "REPO_URL|AUTH_REPO_URL|GIT_TERMINAL_PROMPT|cloneCacheRefreshRefspecs" packages/app/src/lib/core/templates-entrypoint/tasks.ts packages/lib/src/core/templates-entrypoint/tasks.ts

echo
echo "===== Search: refspec injection points ====="
rg -n "for-each-ref|show-ref --verify|--format='%\(\)?" packages/app/src/lib/core/templates-entrypoint/tasks.ts packages/lib/src/core/templates-entrypoint/tasks.ts || true

Repository: ProverCoderAI/docker-git

Length of output: 214


🏁 Script executed:

#!/bin/bash
set -euo pipefail

files=(
  "packages/app/src/lib/core/templates-entrypoint/tasks.ts"
  "packages/lib/src/core/templates-entrypoint/tasks.ts"
)

for f in "${files[@]}"; do
  echo "===== $f (lines 110-180) ====="
  nl -ba "$f" | sed -n '110,180p'
  echo
done

echo "===== Search: CACHE_HEAD_REF assignments ====="
rg -n "CACHE_HEAD_REF" packages/app/src/lib/core/templates-entrypoint/tasks.ts packages/lib/src/core/templates-entrypoint/tasks.ts

echo
echo "===== Search: symbolic-ref usage ====="
rg -n "symbolic-ref" packages/app/src/lib/core/templates-entrypoint/tasks.ts packages/lib/src/core/templates-entrypoint/tasks.ts

echo
echo "===== Search: REPO_URL / AUTH_REPO_URL usage ====="
rg -n "REPO_URL|AUTH_REPO_URL|GIT_TERMINAL_PROMPT|cloneCacheRefreshRefspecs" packages/app/src/lib/core/templates-entrypoint/tasks.ts packages/lib/src/core/templates-entrypoint/tasks.ts

echo
echo "===== Search: refspec injection points ====="
rg -n "for-each-ref|show-ref --verify|--format='%(refname)'" packages/app/src/lib/core/templates-entrypoint/tasks.ts packages/lib/src/core/templates-entrypoint/tasks.ts || true

Repository: ProverCoderAI/docker-git

Length of output: 214


🏁 Script executed:

#!/bin/bash
set -euo pipefail

files=(
  "packages/app/src/lib/core/templates-entrypoint/tasks.ts"
  "packages/lib/src/core/templates-entrypoint/tasks.ts"
)

show_range() {
  local f="$1" start="$2" end="$3"
  echo "===== $f (lines ${start}-${end}) ====="
  awk -v s="$start" -v e="$end" '{ if (NR>=s && NR<=e) printf "%6d: %s\n", NR, $0 }' "$f"
  echo
}

for f in "${files[@]}"; do
  show_range "$f" 110 200
done

echo "===== Search: CACHE_HEAD_REF assignments ====="
rg -n "CACHE_HEAD_REF" packages/app/src/lib/core/templates-entrypoint/tasks.ts packages/lib/src/core/templates-entrypoint/tasks.ts

echo
echo "===== Search: symbolic-ref usage ====="
rg -n "symbolic-ref" packages/app/src/lib/core/templates-entrypoint/tasks.ts packages/lib/src/core/templates-entrypoint/tasks.ts

echo
echo "===== Search: REPO_URL / AUTH_REPO_URL usage ====="
rg -n "REPO_URL|AUTH_REPO_URL|GIT_TERMINAL_PROMPT|cloneCacheRefreshRefspecs" packages/app/src/lib/core/templates-entrypoint/tasks.ts packages/lib/src/core/templates-entrypoint/tasks.ts

echo
echo "===== Search: refspec injection points ====="
rg -n "for-each-ref|show-ref --verify|for-each-ref|--format='%\(\)?" packages/app/src/lib/core/templates-entrypoint/tasks.ts packages/lib/src/core/templates-entrypoint/tasks.ts || true

Repository: ProverCoderAI/docker-git

Length of output: 24979


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "REPO_URL|repoUrl|repo_url" packages -S --glob='*.ts' --glob='*.tsx' --glob='*.js' --glob='*.mjs' --glob='*.cjs'

echo
echo "=== Search for sanitization/validation of REPO_URL ==="
rg -n "sanitize|escape|replace\\(|replaceAll\\(|encode|urlencode|decode|regex|match|RegExp|allowed|whitelist|blacklist|validate|assert" packages -S --glob='*.ts' --glob='*.tsx' --glob='*.js'

echo
echo "=== Search where '$AUTH_REPO_URL' is injected into bash templates ==="
rg -n "\\$AUTH_REPO_URL" packages -S --glob='*.ts'

echo
echo "=== Search for validation/formatting of RESOLVED_GIT_AUTH_TOKEN/user (single quotes) ==="
rg -n "RESOLVED_GIT_AUTH_TOKEN|RESOLVED_GIT_AUTH_USER|token" packages/lib packages/app -S --glob='*.ts'

Repository: ProverCoderAI/docker-git

Length of output: 50451


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Find the repoUrl parsing implementations
rg -n "parseGithubRepoUrl" packages/app -S --glob='*.ts'
rg -n "parseGitlabRepoUrl" packages/app -S --glob='*.ts'

echo "----"

# 2) Show the key parts of the repo parsers (limit the output)
# (Use awk because nl is not available.)
for f in \
  "$(rg -l "parseGithubRepoUrl" packages/app --glob='*.ts' | head -n 1)" \
  "$(rg -l "parseGitlabRepoUrl" packages/app --glob='*.ts' | head -n 1)"
do
  echo "===== $f (trying to show lines 1-260) ====="
  awk 'NR>=1 && NR<=260 {printf "%6d: %s\n", NR, $0}' "$f"
  echo
done

echo "----"

# 3) Check whether there are explicit restrictions/validation of the URL at the API/schema level
rg -n "Schema\\.(String|.*)=|zod|superstruct|refine|regex|pattern|whitelist|sanitize|escape|reject" packages/api packages/app -S --glob='*.ts' --glob='*.tsx'

Repository: ProverCoderAI/docker-git

Length of output: 45395


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# helper to print a line range with line numbers without `nl`
print_range() {
  local f="$1" a="$2" b="$3"
  echo "===== $f ($a-$b) ====="
  awk -v s="$a" -v e="$b" '{ if (NR>=s && NR<=e) printf "%6d: %s\n", NR, $0 }' "$f"
  echo
}

# repo parsing / normalization
print_range "packages/app/src/lib/core/repo.ts" 180 420

# entrypoint tasks: how tokens are selected / how bash templates set REPO_URL/AUTH vars
print_range "packages/app/src/lib/core/templates-entrypoint/tasks.ts" 1 220

Repository: ProverCoderAI/docker-git

Length of output: 21529


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "clone-cache|cloneCacheRefreshRefspecs|symbolic-ref HEAD|CACHE_HEAD_REF|mirror refresh|git --git-dir '.*CACHE_REPO_DIR' fetch" packages/lib/tests packages/app/tests -S --glob='*.ts' --glob='*.tsx' --glob='*.js' --glob='*.mjs'

echo "----"

rg -n "templates\\.test\\.ts" -S packages/lib/tests/core/templates.test.ts
sed -n '600,760p' packages/lib/tests/core/templates.test.ts 2>/dev/null || true
sed -n '280,360p' packages/lib/tests/core/templates.test.ts 2>/dev/null || true

Repository: ProverCoderAI/docker-git

Length of output: 1688


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "REPO_URL\\s*:|AUTH_REPO_URL|GIT_AUTH_TOKEN|GIT_AUTH_USER|RESOLVED_GIT_AUTH_TOKEN|RESOLVED_GIT_AUTH_USER|GITHUB_AUTH_SKIP|GITLAB_TOKEN|GIT_AUTH_LABEL|GIT_AUTH_USER" packages/app/src/lib/core/templates packages/app/src/lib/core/templates-entrypoint packages/app/src/lib/core/templates/docker-compose.ts packages/app/src/lib/core -S --glob='*.ts'

echo "----"

# show docker-compose env blocks around REPO_URL
f="packages/app/src/lib/core/templates/docker-compose.ts"
if [ -f "$f" ]; then
  awk 'NR>=200 && NR<=280 {printf "%6d: %s\n", NR, $0}' "$f"
fi

echo "----"

# find where entrypoint variables are exported in entrypoint scripts
rg -n "export .*REPO_URL|REPO_URL=\"\\$\\{?config\\.repoUrl\\}?\"|REPO_URL:|GIT_AUTH_TOKEN:|GITLAB_TOKEN:|GITHUB_AUTH_LABEL:" packages/app/src/lib/core -S --glob='*.ts' --glob='*.tsx'

Repository: ProverCoderAI/docker-git

Length of output: 26151


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "GIT_AUTH_TOKEN|GITHUB_TOKEN|LABELED_GIT_TOKEN|RESOLVED_GIT_AUTH_TOKEN|GIT_AUTH_USER|RESOLVED_GIT_AUTH_USER|REPO_URL" packages/app/src packages/api/src -S --glob='*.ts' --glob='*.tsx'

echo "----"

rg -n "token.*(regex|pattern|whitelist|sanitize|escape|reject|shell|metachar|quote|single quote|%27)|'\\s*\\+|%27|escapeForSingleQuotes" packages/app/src -S --glob='*.ts' --glob='*.tsx'

echo "----"

# Key check: do the parsers for tokens/repos forbid `'`/shell metacharacters
rg -n "rejects|disallow|forbid|shell metachar|metachar|unsafe|\\bquote\\b|single quote|\\x27|%27" packages/app/tests packages/app/src -S --glob='*.ts' --glob='*.tsx'

Repository: ProverCoderAI/docker-git

Length of output: 50380


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Show the clone-cache test block around the found lines ~285-350 and ~630-670
f="packages/lib/tests/core/templates.test.ts"
echo "===== $f: clone-cache tests (~260-360) ====="
awk 'NR>=260 && NR<=370 {printf "%6d: %s\n", NR, $0}' "$f"

echo
echo "===== $f: clone-cache refspecs test (~620-690) ====="
awk 'NR>=620 && NR<=690 {printf "%6d: %s\n", NR, $0}' "$f"

echo
# 2) Search for token tests/validations and rejection of quotes/metacharacters
echo "===== Search: token validation / rejecting quotes/metacharacters ====="
rg -n "rejects|invalid|validate|whitelist|pattern|regex|metachar|single quote|\\x27|%27|shell|command injection|unsafe" packages/app/tests packages/app/src/lib/usecases packages/api/src -S --glob='*.ts' --glob='*.tsx'

Repository: ProverCoderAI/docker-git

Length of output: 50380


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show the token validation implementations compactly
show_range() {
  f="$1"; a="$2"; b="$3";
  echo "===== $f ($a-$b) ====="
  awk -v s="$a" -v e="$b" '{ if (NR>=s && NR<=e) printf "%6d: %s\n", NR, $0 }' "$f"
  echo
}

show_range "packages/app/src/lib/usecases/github-token-validation.ts" 1 220
show_range "packages/app/src/lib/usecases/gitlab-token-validation.ts" 1 240

echo "----"

# Sometimes token validation is wrapped in a preflight; show the wrapper
show_range "packages/app/src/lib/usecases/github-token-preflight.ts" 1 260
show_range "packages/app/src/lib/usecases/gitlab-token-preflight.ts" 1 260

Repository: ProverCoderAI/docker-git

Length of output: 24478


Assessed clone-cache protection against path traversal and injection; it cannot be fully confirmed because of AUTH_REPO_URL

  • Path traversal: CACHE_REPO_DIR is built from REPO_CACHE_KEY, the sha256sum/shasum of "$REPO_URL" (or, if that is unavailable, a fallback that strips the value down to [:alnum:]_.-), and is then used as a subdirectory of "$CACHE_ROOT". In this form, writes cannot escape the cache directory.
  • Injection via CACHE_HEAD_REF: the value comes from Git (symbolic-ref/for-each-ref) inside the specific mirror repository and is additionally checked with git show-ref --verify --quiet "$CACHE_HEAD_REF"; the subsequent symbolic-ref HEAD "$CACHE_HEAD_REF" uses double quotes, so no classic shell injection from the ref value is apparent.
  • A vulnerable spot (command injection) remains via AUTH_REPO_URL: in su - <sshUser> -c "... '$AUTH_REPO_URL' ..." AUTH_REPO_URL is substituted inside single quotes. Token validation (validateGithubToken/validateGitlabToken) checks over the network that the token works, but does not restrict its format (for example, the absence of '/shell metacharacters). If RESOLVED_GIT_AUTH_TOKEN/RESOLVED_GIT_AUTH_USER or REPO_URL itself can contain ', this can break the single-quoted fragment in -c and lead to injection.
  • clone-cache tests: they check "repair HEAD" via symbolic-ref/show-ref and the refspec restriction to refs/heads/* + refs/tags/* only, but do not cover cases with quotes or dangerous characters in AUTH_REPO_URL.
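
The missing format restriction noted in the third finding could take the form of a character allowlist applied before the value reaches any shell string. The sketch below is one possible shape, not the project's validator: `is_safe_repo_url` is a hypothetical name, and the allowlist itself is an assumption that may need widening for real tokens or URLs.

```bash
# Hypothetical validator: accept only characters from a conservative allowlist.
is_safe_repo_url() {
  case "$1" in
    # empty, or contains at least one character outside the allowlist
    '' | *[!A-Za-z0-9@:/._~%+=-]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

A value such as `https://example.invalid/a'; touch /tmp/pwned; '.git` is rejected because of the quote, space and semicolon, so it never reaches the `su -c` string.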
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.kanban/changes/ABC-2/review.md around lines 17 - 33, The review found a
command-injection risk where AUTH_REPO_URL (and derived
RESOLVED_GIT_AUTH_TOKEN/RESOLVED_GIT_AUTH_USER) get interpolated into su -
<sshUser> -c "... '$AUTH_REPO_URL' ..." inside single quotes; update the code to
eliminate that injection vector by never placing untrusted values directly into
a shell -c string: either (a) fully validate/sanitize AUTH_REPO_URL and tokens
to disallow any single-quote or shell metacharacters (reject on invalid chars)
in validateGithubToken/validateGitlabToken and the REPO_URL handling, or (b)
avoid shell interpolation altogether by invoking git via exec/array args or a
safe API under su (e.g., use sudo -u/exec with argument array) so that
AUTH_REPO_URL, RESOLVED_GIT_AUTH_TOKEN and RESOLVED_GIT_AUTH_USER are passed as
arguments rather than embedded in a single-quoted command; apply this change to
the code paths that form the su -c command that include AUTH_REPO_URL and to any
helpers that build that command.
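
Both options in the prompt above can be sketched without `su` or `git`. The URL below is a made-up value standing in for AUTH_REPO_URL, `shell_quote` is a hypothetical helper, and `printf` stands in for the real `git` invocation so the sketch has no side effects. How the constant command text is handed to `su` in the actual entrypoint is not shown here and would need to be checked against the `su` implementation in the container.

```shell
# Made-up value with an embedded single quote, the character that breaks
# a '...'-quoted fragment inside a -c command string.
AUTH_REPO_URL="https://user:tok'; echo INJECTED; '@example.com/repo.git"

# Option (b): keep the command text constant and pass the URL as "$1".
# The value is never parsed as shell syntax.
safe_out="$(sh -c 'printf "%s" "$1"' sh "$AUTH_REPO_URL")"

# Option (a) variant: when one command string is unavoidable, escape each
# single quote as '\'' before wrapping the value in single quotes.
shell_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}
quoted_out="$(sh -c "printf '%s' $(shell_quote "$AUTH_REPO_URL")")"
```

In both cases the inner shell receives the URL byte for byte, including the quote and the `; echo INJECTED;` fragment, as data rather than as commands.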

Comment on lines +151 to +155
CACHE_HEAD_REF="$(git --git-dir "$CACHE_REPO_DIR" symbolic-ref -q HEAD 2>/dev/null || true)"
if [[ -z "$CACHE_HEAD_REF" ]] || ! git --git-dir "$CACHE_REPO_DIR" show-ref --verify --quiet "$CACHE_HEAD_REF"; then
CACHE_HEAD_REF="$(git --git-dir "$CACHE_REPO_DIR" for-each-ref --format='%(refname)' refs/heads/main refs/heads/master refs/heads | head -n 1 || true)"
fi
if [[ -n "$CACHE_HEAD_REF" ]] && git --git-dir "$CACHE_REPO_DIR" symbolic-ref HEAD "$CACHE_HEAD_REF"; then

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Inconsistent permissions when running git commands against the cache.

The fetch command on line 150 runs as ${config.sshUser} via su -, but the subsequent symbolic-ref, show-ref, and for-each-ref commands (lines 151-155) run as root. This can cause permission problems if the cache directory is owned by the dev user.

🐛 Proposed fix: run the git commands as the same user
-          CACHE_HEAD_REF="$(git --git-dir "$CACHE_REPO_DIR" symbolic-ref -q HEAD 2>/dev/null || true)"
-          if [[ -z "$CACHE_HEAD_REF" ]] || ! git --git-dir "$CACHE_REPO_DIR" show-ref --verify --quiet "$CACHE_HEAD_REF"; then
-            CACHE_HEAD_REF="$(git --git-dir "$CACHE_REPO_DIR" for-each-ref --format='%(refname)' refs/heads/main refs/heads/master refs/heads | head -n 1 || true)"
-          fi
-          if [[ -n "$CACHE_HEAD_REF" ]] && git --git-dir "$CACHE_REPO_DIR" symbolic-ref HEAD "$CACHE_HEAD_REF"; then
+          CACHE_HEAD_REF="$(su - ${config.sshUser} -c "git --git-dir '$CACHE_REPO_DIR' symbolic-ref -q HEAD 2>/dev/null || true")"
+          if [[ -z "$CACHE_HEAD_REF" ]] || ! su - ${config.sshUser} -c "git --git-dir '$CACHE_REPO_DIR' show-ref --verify --quiet '$CACHE_HEAD_REF'"; then
+            CACHE_HEAD_REF="$(su - ${config.sshUser} -c "git --git-dir '$CACHE_REPO_DIR' for-each-ref --format='%(refname)' refs/heads/main refs/heads/master refs/heads | head -n 1 || true")"
+          fi
+          if [[ -n "$CACHE_HEAD_REF" ]] && su - ${config.sshUser} -c "git --git-dir '$CACHE_REPO_DIR' symbolic-ref HEAD '$CACHE_HEAD_REF'"; then
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/app/src/lib/core/templates-entrypoint/tasks.ts` around lines 151 -
155, The git commands that read refs (commands setting and checking
CACHE_HEAD_REF using symbolic-ref, show-ref, for-each-ref against
CACHE_REPO_DIR) are being run as root while the prior fetch runs under
${config.sshUser}, causing permission issues; change the script so those
commands run as the same user (use the same su - ${config.sshUser} context or
run them via sudo -u ${config.sshUser}) when operating on CACHE_REPO_DIR and
when assigning CACHE_HEAD_REF, ensuring all git invocations (fetch,
symbolic-ref, show-ref, for-each-ref) execute with the repository owner’s
privileges.
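
Whichever user ends up running these commands, the HEAD-repair step they implement can be factored into one function and exercised on a throwaway bare repository. This is a sketch under assumptions: `repair_cache_head` is a hypothetical name, and the explicit main, then master, then first-branch priority is a choice made here for illustration, not a description of the script under review.

```shell
# Hypothetical helper; $1 is the path to a bare mirror
# (CACHE_REPO_DIR in the reviewed script).
repair_cache_head() {
  dir="$1"
  ref="$(git --git-dir "$dir" symbolic-ref -q HEAD 2>/dev/null || true)"
  # Nothing to do when HEAD already points at an existing ref.
  if [ -n "$ref" ] && git --git-dir "$dir" show-ref --verify --quiet "$ref"; then
    return 0
  fi
  # Assumed priority: prefer main, then master.
  for candidate in refs/heads/main refs/heads/master; do
    if git --git-dir "$dir" show-ref --verify --quiet "$candidate"; then
      git --git-dir "$dir" symbolic-ref HEAD "$candidate"
      return $?
    fi
  done
  # Otherwise take the first branch in for-each-ref's default refname order.
  ref="$(git --git-dir "$dir" for-each-ref --count=1 --format='%(refname)' refs/heads)"
  [ -n "$ref" ] && git --git-dir "$dir" symbolic-ref HEAD "$ref"
}
```

Calling the function from a single place makes it easier to run the whole step under one user, which is what the comment asks for.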

Comment on lines +151 to +155
CACHE_HEAD_REF="$(git --git-dir "$CACHE_REPO_DIR" symbolic-ref -q HEAD 2>/dev/null || true)"
if [[ -z "$CACHE_HEAD_REF" ]] || ! git --git-dir "$CACHE_REPO_DIR" show-ref --verify --quiet "$CACHE_HEAD_REF"; then
CACHE_HEAD_REF="$(git --git-dir "$CACHE_REPO_DIR" for-each-ref --format='%(refname)' refs/heads/main refs/heads/master refs/heads | head -n 1 || true)"
fi
if [[ -n "$CACHE_HEAD_REF" ]] && git --git-dir "$CACHE_REPO_DIR" symbolic-ref HEAD "$CACHE_HEAD_REF"; then

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Inconsistent permissions when running git commands against the cache.

The same permissions problem applies here: symbolic-ref, show-ref, and for-each-ref run as root, while fetch runs as ${config.sshUser}. The cache directory is created and populated as that user, so the git commands should run as the same user for consistency.

🐛 Proposed fix
-          CACHE_HEAD_REF="$(git --git-dir "$CACHE_REPO_DIR" symbolic-ref -q HEAD 2>/dev/null || true)"
-          if [[ -z "$CACHE_HEAD_REF" ]] || ! git --git-dir "$CACHE_REPO_DIR" show-ref --verify --quiet "$CACHE_HEAD_REF"; then
-            CACHE_HEAD_REF="$(git --git-dir "$CACHE_REPO_DIR" for-each-ref --format='%(refname)' refs/heads/main refs/heads/master refs/heads | head -n 1 || true)"
-          fi
-          if [[ -n "$CACHE_HEAD_REF" ]] && git --git-dir "$CACHE_REPO_DIR" symbolic-ref HEAD "$CACHE_HEAD_REF"; then
+          CACHE_HEAD_REF="$(su - ${config.sshUser} -c "git --git-dir '$CACHE_REPO_DIR' symbolic-ref -q HEAD 2>/dev/null || true")"
+          if [[ -z "$CACHE_HEAD_REF" ]] || ! su - ${config.sshUser} -c "git --git-dir '$CACHE_REPO_DIR' show-ref --verify --quiet '$CACHE_HEAD_REF'"; then
+            CACHE_HEAD_REF="$(su - ${config.sshUser} -c "git --git-dir '$CACHE_REPO_DIR' for-each-ref --format='%(refname)' refs/heads/main refs/heads/master refs/heads | head -n 1 || true")"
+          fi
+          if [[ -n "$CACHE_HEAD_REF" ]] && su - ${config.sshUser} -c "git --git-dir '$CACHE_REPO_DIR' symbolic-ref HEAD '$CACHE_HEAD_REF'"; then
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/lib/src/core/templates-entrypoint/tasks.ts` around lines 151 - 155,
The git cache commands (symbolic-ref, show-ref, for-each-ref) are being executed
as root while fetch runs as ${config.sshUser}, causing permission
inconsistencies; update the calls that set and check CACHE_HEAD_REF (the git
--git-dir "$CACHE_REPO_DIR" symbolic-ref, show-ref and for-each-ref invocations)
to run under the same user as fetch (e.g., prefix those git commands with the
same sudo/runuser invocation used for fetch or otherwise switch to
${config.sshUser} before executing them) so the cache directory is always
accessed by the same SSH user.

Development

Successfully merging this pull request may close these issues.

Need to set up a cache for repository downloads