Conversation
Bumps [actions/upload-pages-artifact](https://github.com/actions/upload-pages-artifact) from 3 to 4.
- [Release notes](https://github.com/actions/upload-pages-artifact/releases)
- [Commits](actions/upload-pages-artifact@v3...v4)

---
updated-dependencies:
- dependency-name: actions/upload-pages-artifact
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
anulum added a commit that referenced this pull request on Mar 21, 2026
Update domain benchmark section with calibrated measurements:
- PubMedQA: score range [0.01, 0.77], best F1=62.1% at t=0.50
- FinanceBench: score range [0.007, 0.63], 80%+ FPR without KB
- Key finding: NLI-only scoring needs KB grounding for discrimination
- Competitive positioning: every claim sourced with measurement date
- Honest Limitations: NLI-only domain scoring weak without KB as #1

Co-Authored-By: Arcane Sapience <protoscience@anulum.li>
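The "best F1 at t=0.50" figures above come from sweeping a decision threshold over the score range. A minimal sketch of that sweep, assuming binary labels and scores in [0, 1] (the function name and grid are illustrative, not the project's API):

```python
import numpy as np

def best_f1_threshold(scores, labels, thresholds=None):
    """Sweep decision thresholds; return (best_f1, best_threshold).

    scores: per-claim support scores in [0, 1]
    labels: binary ground truth (1 = supported)
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    if thresholds is None:
        # uniform grid over the observed score range
        thresholds = np.linspace(scores.min(), scores.max(), 101)
    best_f1, best_t = 0.0, float(thresholds[0])
    for t in thresholds:
        preds = (scores >= t).astype(int)
        tp = int(((preds == 1) & (labels == 1)).sum())
        fp = int(((preds == 1) & (labels == 0)).sum())
        fn = int(((preds == 0) & (labels == 1)).sum())
        if tp == 0:
            continue  # F1 undefined/zero without true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_t = f1, float(t)
    return best_f1, best_t
```

Note that a threshold tuned this way on the test set is an optimistic upper bound, which is one reason the commit insists on reporting the score range alongside the best-F1 operating point.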
anulum added a commit that referenced this pull request on Apr 12, 2026
…n score_and_save
This fixes the F1 metric mislabel that has confused the entire
FPR-reduction campaign. ``AggreFactMetrics.avg_balanced_acc`` computes the
**per-dataset mean** of balanced accuracies (an unweighted average
across the 11 AggreFact datasets), yet ``score_and_save()`` stored
this value under the field name ``global_balanced_accuracy`` with
a docstring claiming "sample-pooled BA computed once across all
29,320 samples", which it was not.
Consequence: FactCG's stored ``global_balanced_accuracy: 0.7558``
is the per-dataset mean at the global threshold, NOT sample-pooled.
Direct computation shows FactCG's TRUE sample-pooled BA is 0.8142
at the same threshold. We had been comparing the champion's
sample-pooled 82.11 % against FactCG's per-dataset mean 75.58 %
as if they were the same metric, overstating the gap by ~6 pp.
This commit:
* Adds ``_compute_sample_pooled_ba(predictions, labels) -> float``
helper that computes true sample-pooled balanced accuracy on the
flat (preds, labels) pool.
* ``score_and_save()`` now writes FOUR explicit metric fields in a
2×2 matrix of {per-dataset-mean, sample-pooled} × {global
threshold, per-dataset thresholds}:
- ``per_dataset_mean_balanced_accuracy_at_global_threshold``
(= AggreFact leaderboard convention, verified verbatim from
https://llm-aggrefact.github.io/ on 2026-04-12)
- ``per_dataset_mean_balanced_accuracy_at_per_dataset_thresholds``
(post-hoc tuned — our FactCG "77.76 % potential #1" number)
- ``sample_pooled_balanced_accuracy_at_global_threshold``
(true sample-pooled, new)
- ``sample_pooled_balanced_accuracy_at_per_dataset_thresholds``
(true sample-pooled with per-ds tuning, new)
* Legacy aliases ``global_balanced_accuracy`` and
``per_dataset_avg_balanced_accuracy`` are kept for back-compat
and map to the per-dataset-mean variants. A deprecation comment
documents the migration path.
* ``AggreFactMetrics.avg_balanced_acc`` is renamed to
  ``per_dataset_mean_balanced_acc`` (canonical), with ``avg_balanced_acc``
  kept as a deprecated alias. The docstring explains both metrics
  and cites the leaderboard verification.
Full audit trail: docs/internal/experiments_log_2026-04-12.md
Entries 18 and 19.
Co-Authored-By: Arcane Sapience <protoscience@anulum.li>
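The distinction the commit above fixes can be made concrete. A minimal sketch, with illustrative names (not the project's actual helpers), showing how the per-dataset mean and the sample-pooled balanced accuracy diverge on the very same predictions when dataset sizes differ:

```python
import numpy as np

def balanced_accuracy(preds, labels):
    """(TPR + TNR) / 2 on one flat pool of predictions."""
    preds = np.asarray(preds, dtype=int)
    labels = np.asarray(labels, dtype=int)
    tpr = ((preds == 1) & (labels == 1)).sum() / max((labels == 1).sum(), 1)
    tnr = ((preds == 0) & (labels == 0)).sum() / max((labels == 0).sum(), 1)
    return (tpr + tnr) / 2

def per_dataset_mean_ba(per_dataset):
    # leaderboard convention: BA per dataset, then an unweighted mean
    return float(np.mean([balanced_accuracy(p, l) for p, l in per_dataset]))

def sample_pooled_ba(per_dataset):
    # flatten every dataset into one pool, then compute BA once
    preds = np.concatenate([p for p, _ in per_dataset])
    labels = np.concatenate([l for _, l in per_dataset])
    return float(balanced_accuracy(preds, labels))
```

Because the pooled metric weights each sample equally while the per-dataset mean weights each dataset equally, the two only coincide in special cases, which is exactly why storing one under the other's name overstated the gap.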
anulum added a commit that referenced this pull request on Apr 12, 2026
- 75.8% → 75.6% per-dataset mean (leaderboard #6, verified 2026-04-12)
- Add FactCG-tuned 77.76% (potential #1, ahead of Bespoke-MiniCheck 77.4%)
- Add leaderboard rank column
- Remove MiniCheck-DeBERTa-L row (not on published leaderboard)
- Simplify Gemma routed callout (remove sample-pooled per-family breakdown, which was mixing metrics)

Co-Authored-By: Arcane Sapience <protoscience@anulum.li>
anulum added a commit that referenced this pull request on Apr 18, 2026
director_ai.core.trajectory ships the foundation for the 2026-04-21 roadmap Tier 1 #1 feature: pre-execution Monte-Carlo halt based on N simulated draws from an injected actor.

TrajectorySimulator runs n_simulations independent draws (default 8) with deterministic per-draw seeds (base_seed + i), feeds each trajectory's joined text to a CoherenceScorer-shaped verdict producer, and aggregates the results into a PreflightVerdict:
- halt_rate / mean_coherence / std_coherence
- 95% empirical credible interval over the per-trajectory scores
- recommended action (``proceed`` / ``warn`` / ``halt``) based on two halt-rate thresholds (warn 0.25, halt 0.50 by default)
- the raw TrajectoryResult list so operators can inspect which draws failed

Seeded determinism means two preflight calls with the same prompt produce byte-identical verdicts: reproducibility for forensic incident review and for regression tests on preflight decisions. An optional on_trajectory callback runs per draw; exceptions from the callback are swallowed so a broken sink cannot abort the loop.

Follow-ups are tracked separately (distilled-actor integration, CoherenceAgent wiring, conformal calibration against historical traces, Rust-accelerated Monte-Carlo loop). Foundation scope matches the roadmap memo.

Coverage: 17 tests covering construction validation, proceed / warn / halt bands, deterministic replay, seed variation, per-trajectory callback, callback failure isolation, verdict shape, and min/max/std aggregation. mypy clean on 194 source files.

Co-Authored-By: Arcane Sapience <protoscience@anulum.li>
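The seed-and-aggregate flow described above can be sketched as follows. This is a hedged approximation: the class and parameter names mirror the commit description, but the real director_ai API may differ, and the scorer here is reduced to a plain callable for illustration.

```python
import statistics
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class TrajectoryResult:
    seed: int
    coherence: float
    halted: bool

@dataclass
class PreflightVerdict:
    halt_rate: float
    mean_coherence: float
    std_coherence: float
    action: str  # "proceed" / "warn" / "halt"
    trajectories: List[TrajectoryResult] = field(default_factory=list)

def preflight(score: Callable[[int], float],
              n_simulations: int = 8,
              base_seed: int = 0,
              halt_below: float = 0.5,
              warn_rate: float = 0.25,
              halt_rate_threshold: float = 0.50) -> PreflightVerdict:
    """Run N seeded draws through an injected scorer and aggregate."""
    results = []
    for i in range(n_simulations):
        seed = base_seed + i          # deterministic per-draw seed
        c = score(seed)               # same seed -> same score -> replayable
        results.append(TrajectoryResult(seed, c, c < halt_below))
    halt_rate = sum(r.halted for r in results) / n_simulations
    scores = [r.coherence for r in results]
    if halt_rate >= halt_rate_threshold:
        action = "halt"
    elif halt_rate >= warn_rate:
        action = "warn"
    else:
        action = "proceed"
    return PreflightVerdict(halt_rate, statistics.mean(scores),
                            statistics.pstdev(scores), action, results)
```

Because every draw's seed is a pure function of base_seed, calling preflight twice with the same inputs yields identical verdicts, which is the property the regression tests on preflight decisions rely on.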
Bumps actions/upload-pages-artifact from 3 to 4.
Release notes
Sourced from actions/upload-pages-artifact's releases.
Commits
- 7b1f4a7 Merge pull request #127 from heavymachinery/pin-sha
- 4cc19c7 Pin actions/upload-artifact to SHA
- 2d163be Merge pull request #107 from KittyChiu/main
- c704843 fix: linted README
- 9605915 Merge pull request #106 from KittyChiu/kittychiu/update-readme-1
- e59cdfe Update README.md
- a2d6704 doc: updated usage section in readme
- 984864e Merge pull request #105 from actions/Jcambass-patch-1
- 45dc788 Add workflow file for publishing releases to immutable action package
- efaad07 Merge pull request #102 from actions/hidden-files

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- @dependabot rebase will rebase this PR
- @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
- @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
- @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)