Skip to content

Refactor CI: pull chroma from Docker image, track SHA via Heroku conf…#6

Merged
Gabrielebattimelli merged 1 commit intomainfrom
add-agent-skill
Apr 23, 2026
Merged

Refactor CI: pull chroma from Docker image, track SHA via Heroku conf…#6
Gabrielebattimelli merged 1 commit intomainfrom
add-agent-skill

Conversation

@Gabrielebattimelli
Copy link
Copy Markdown
Member

…ig var

Copilot AI review requested due to automatic review settings April 23, 2026 06:11
@Gabrielebattimelli Gabrielebattimelli merged commit ddd170a into main Apr 23, 2026
1 check passed
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the weekly indexing GitHub Actions workflow to make indexing incremental by restoring chroma/ from the currently deployed Heroku Docker image and tracking the last indexed PhysLib commit SHA via a Heroku config var (instead of committing state back to Heroku Git).

Changes:

  • Install Heroku CLI in the workflow and pull chroma/ from registry.heroku.com/... to seed incremental runs.
  • Replace .last-physlib-sha file tracking with LAST_PHYSLIB_SHA stored as a Heroku config var.
  • Remove the “push updated index to Heroku git remote” path and rely on rebuilding/releasing the container image.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +33 to +44
# Pull chroma/ from the live Docker image so the pipeline runs incrementally
- name: Extract ChromaDB from current Heroku image
env:
HEROKU_API_KEY: ${{ secrets.HEROKU_API_KEY }}
run: |
git remote add heroku "https://heroku:$HEROKU_API_KEY@git.heroku.com/physlibsearch.git"

# Restore chroma/ and the SHA tracker from the heroku git remote
- name: Restore ChromaDB and SHA tracker
run: |
git fetch heroku main
git checkout heroku/main -- chroma/ .last-physlib-sha 2>/dev/null \
|| echo "No prior index on heroku/main — starting fresh."
heroku container:login
docker pull registry.heroku.com/physlibsearch/web || echo "No existing image — starting fresh."
CID=$(docker create registry.heroku.com/physlibsearch/web 2>/dev/null) || true
if [ -n "$CID" ]; then
docker cp "$CID:/app/chroma" . 2>/dev/null || echo "No chroma/ in image — starting fresh."
docker rm "$CID"
fi
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ChromaDB extraction step runs unconditionally, even when PhysLib hasn't changed and the rest of the job is skipped. This adds a potentially large docker pull to every scheduled run; consider moving the PhysLib change check earlier (or gating the extraction step on has_changes) so the workflow can exit quickly when there's nothing to update.

Copilot uses AI. Check for mistakes.
Comment on lines 52 to 53
run: |
CURRENT_SHA=$(git ls-remote "$PHYSLIB_REPO" HEAD | cut -f1)
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CURRENT_SHA=$(git ls-remote ... | cut -f1) can silently produce an empty SHA if git ls-remote fails, because without pipefail the pipeline exit status is from cut. That can make the workflow treat a transient network failure as a change and later persist an empty/incorrect LAST_PHYSLIB_SHA. Add set -o pipefail (or avoid the pipeline) and explicitly fail if CURRENT_SHA is empty.

Suggested change
run: |
CURRENT_SHA=$(git ls-remote "$PHYSLIB_REPO" HEAD | cut -f1)
run: |
set -o pipefail
CURRENT_SHA=$(git ls-remote "$PHYSLIB_REPO" HEAD | cut -f1)
if [ -z "$CURRENT_SHA" ]; then
echo "Failed to resolve current PhysLib HEAD SHA from $PHYSLIB_REPO" >&2
exit 1
fi

Copilot uses AI. Check for mistakes.
run: |
CURRENT_SHA=$(git ls-remote "$PHYSLIB_REPO" HEAD | cut -f1)
LAST_SHA=$(cat .last-physlib-sha 2>/dev/null || echo "")
LAST_SHA=$(heroku config:get LAST_PHYSLIB_SHA --app physlibsearch 2>/dev/null || echo "")
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heroku config:get ... 2>/dev/null || echo "" suppresses all errors (including auth/permission issues) and treats them as "no prior SHA", which can trigger an unnecessary full reindex and then overwrite the stored SHA. Consider only defaulting to empty when the config var is genuinely missing, and otherwise failing the job so misconfiguration doesn't silently reset state.

Suggested change
LAST_SHA=$(heroku config:get LAST_PHYSLIB_SHA --app physlibsearch 2>/dev/null || echo "")
LAST_SHA_OUTPUT=$(heroku config:get LAST_PHYSLIB_SHA --app physlibsearch 2>/dev/null)
HEROKU_CONFIG_GET_STATUS=$?
if [ "$HEROKU_CONFIG_GET_STATUS" -ne 0 ]; then
echo "Failed to read LAST_PHYSLIB_SHA from Heroku app physlibsearch." >&2
exit "$HEROKU_CONFIG_GET_STATUS"
fi
LAST_SHA=$LAST_SHA_OUTPUT

Copilot uses AI. Check for mistakes.
Comment on lines +39 to +40
docker pull registry.heroku.com/physlibsearch/web || echo "No existing image — starting fresh."
CID=$(docker create registry.heroku.com/physlibsearch/web 2>/dev/null) || true
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docker pull ... || echo "No existing image" treats any pull failure (network hiccup, rate limit, registry outage) as "no existing image" and proceeds with a fresh index, which can hide operational issues and cause unexpectedly long runs. Consider failing on pull errors except for the specific "manifest unknown"/"not found" case.

Suggested change
docker pull registry.heroku.com/physlibsearch/web || echo "No existing image — starting fresh."
CID=$(docker create registry.heroku.com/physlibsearch/web 2>/dev/null) || true
CID=""
PULL_OUTPUT=$(docker pull registry.heroku.com/physlibsearch/web 2>&1)
PULL_STATUS=$?
if [ "$PULL_STATUS" -eq 0 ]; then
echo "$PULL_OUTPUT"
CID=$(docker create registry.heroku.com/physlibsearch/web 2>/dev/null) || true
elif printf '%s\n' "$PULL_OUTPUT" | grep -qiE 'manifest unknown|not found'; then
echo "No existing image — starting fresh."
else
echo "$PULL_OUTPUT"
exit "$PULL_STATUS"
fi

Copilot uses AI. Check for mistakes.
Comment on lines 157 to +159
heroku container:push web --app physlibsearch
heroku container:release web --app physlibsearch
heroku config:set LAST_PHYSLIB_SHA=${{ steps.check.outputs.current_sha }} --app physlibsearch
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Heroku app name (physlibsearch) is hard-coded in several commands (registry path, config:get, container:*). To reduce duplication and the chance of inconsistent edits later, consider defining it once as an env var (e.g., HEROKU_APP) and reusing it across steps.

Copilot uses AI. Check for mistakes.
Comment on lines +30 to +31
- name: Install Heroku CLI
run: curl https://cli-assets.heroku.com/install.sh | sh
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Installing the Heroku CLI via curl ... | sh executes a remote script without verification, which is a supply-chain risk. Prefer a pinned install method (e.g., package manager with version pinning, or downloading a specific release artifact and verifying its checksum/signature).

Suggested change
- name: Install Heroku CLI
run: curl https://cli-assets.heroku.com/install.sh | sh
- name: Set up Node.js
uses: actions/setup-node@v4
with:
node-version: '20'
- name: Install Heroku CLI
run: npm install --global heroku@9.3.0

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants