Migrate from Azure Pipelines to GitHub Actions #1
- Add .github/workflows/ci.yml: push/PR CI on windows-2022 and ubuntu-22.04
- Add .github/workflows/publish.yml: tag-triggered publish (NuGet, NPM, VSCode extension) with environment: production gate and workflow_dispatch simulate mode
- Add .github/actions/build-and-test/action.yml: composite action for build+test
- Add .github/dependabot.yml: weekly github-actions dependency updates
- Patch eng/publish/PublishPolyglotNotebooksHelper.psm1:
  - FindChildItem null guard before Get-FileHash
  - vsce verify-signature captures stderr (2>&1 | Out-String) to prevent NullRef
  - NuGet push uses --skip-duplicate
- azure/login guarded with if: !inputs.simulate

Drops all Microsoft-internal infra (MicroBuild signing, 1ES SDL, TSA, OneLocBuild, symbol publishing). Publish uses OIDC/managed identity for VS Code Marketplace; PATs for NuGet and NPM. AZP files are NOT yet deleted (shadow-CI phase).
Pull request overview
Migrates the repository from Azure Pipelines to GitHub Actions: introduces CI and Publish workflows plus a shared composite build/test action, sets up weekly Dependabot updates for GitHub Actions, and adjusts the existing publish PowerShell helpers to be tolerant of test-signed builds and idempotent NuGet pushes. Microsoft-internal/1ES tooling (signing, SDL, TSA, OneLocBuild, etc.) is intentionally dropped. The legacy AZP YAMLs are kept temporarily for shadow-CI validation.
Changes:
- Add `.github/workflows/ci.yml`, `.github/workflows/publish.yml`, composite `build-and-test/action.yml`, and `dependabot.yml`.
- Switch VS Code Marketplace publish auth to OIDC/`azure/login`; switch NPM to npmjs.org; NuGet via `NUGET_API_KEY`.
- Soften `vsce verify-signature` to a warning and add `--skip-duplicate` to `dotnet nuget push` in `PublishPolyglotNotebooksHelper.psm1`.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| .github/workflows/ci.yml | New push/PR CI on Windows + Linux delegating to the composite action. |
| .github/workflows/publish.yml | Tag/dispatch-triggered publish pipeline for NuGet, NPM, and VS Code Marketplace with OIDC auth. |
| .github/actions/build-and-test/action.yml | Shared composite build/test action used by CI and Publish, with platform-specific steps and caching. |
| .github/dependabot.yml | Weekly grouped Dependabot updates for GitHub Actions. |
| eng/publish/PublishPolyglotNotebooksHelper.psm1 | Adds missing-file guard, downgrades signature verification to a warning, and uses --skip-duplicate for NuGet push. |
DRY: hoist NPM_CONFIG_REGISTRY and NODE_VERSION to workflow-level env; add npm-version input to composite action; add paths-ignore comment.
Best practices: add concurrency groups (ci cancel-in-progress, publish no-cancel); add workflow_dispatch to ci.yml; move checks:write to per-job in ci.yml; add timeout-minutes to publish-nuget/npm/vscode; pin ubuntu-latest to ubuntu-22.04; pass NUGET_API_KEY via env var instead of CLI arg.
Versions: Node.js 22.18.0 -> 22.22.3, npm 11.6.0 -> 11.14.1.
Dependabot: add npm (/src) and nuget (/) ecosystem scans.
…ON to composite action
npm 11.x strict npm ci validation requires all optional dependencies to have a resolved package entry in the lock file, even macOS-only optional packages like fsevents. The polyglot-notebooks-ui-components lock file was generated without a resolved fsevents entry (only the range specifier), causing npm ci to fail with 'Missing: fsevents@2.3.3 from lock file'. Adds the node_modules/fsevents entry (packages section) and the legacy fsevents entry (dependencies section) matching the version used in the other package lock files in this repo.
HIGH - signature verification now fails closed on production publishes
(PublishPolyglotNotebooksHelper.psm1): gate soft-fail on simulate
parameter; non-simulate builds exit 1 if signature check fails.
HIGH - NuGet push loop no longer swallows errors (publish.yml):
replace `find ... | while read` subshell pattern with `for` loop and
set -eo pipefail so dotnet nuget push failures propagate correctly.
HIGH - npm publish glob replaced with explicit for-loop (publish.yml):
validate exactly one tarball exists before publishing; fail clearly if
zero or multiple tarballs are found.
MEDIUM - NuGet cache path corrected (action.yml):
Arcade sets NUGET_PACKAGES to {repo}/.packages when -ci is passed
(useGlobalNuGetCache=false). Changed cache path from ~/.nuget/packages
to github.workspace/.packages so the cache actually hits.
LOW - cp error swallowing narrowed (action.yml):
check eng/resources exists before copy; only suppress empty-glob with
nullglob rather than silently ignoring all errors.
LOW - comments added:
inputs.simulate empty-string behavior documented in publish.yml.
pwsh dependency on ubuntu-22.04 documented in action.yml.
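For illustration, the "exactly one tarball" check and the nullglob handling described above could look roughly like the sketch below. `find_single_tarball` is a hypothetical helper name, not something taken from publish.yml or action.yml, and the sketch assumes bash.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the single *.tgz in a directory, or fail clearly.
# The function body is a subshell, so enabling nullglob does not leak out.
find_single_tarball() (
  shopt -s nullglob              # an unmatched glob expands to nothing
  tarballs=("$1"/*.tgz)
  if [ "${#tarballs[@]}" -ne 1 ]; then
    echo "Expected exactly one .tgz in '$1', found ${#tarballs[@]}" >&2
    exit 1                       # exits only the subshell, so the function returns 1
  fi
  printf '%s\n' "${tarballs[0]}"
)
```

A publish step could then call something like `npm publish "$(find_single_tarball "$dir")"`, where `$dir` is whatever directory holds the packed tarball.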
- make simulate behavior explicit via prepare job output in publish workflow
- validate extracted package version matches tag version on tag-triggered runs
- harden Linux resource copy and add explicit pwsh availability precheck
- align NuGet cache path and NUGET_PACKAGES usage in build-and-test action
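A minimal sketch of the tag-versus-package version check mentioned above. It assumes tags look like `v1.2.3`; the actual tag format is not shown in this conversation, and `check_version_matches_tag` is a hypothetical name.

```shell
#!/usr/bin/env bash

# Hypothetical helper: fail when the version encoded in a tag differs from
# the version extracted from the built package.
# Assumption: tags are the package version with an optional leading "v".
check_version_matches_tag() {
  local tag="$1" package_version="$2"
  local tag_version="${tag#v}"   # strip one leading "v" if present
  if [ "$tag_version" != "$package_version" ]; then
    echo "Tag version '$tag_version' does not match package version '$package_version'" >&2
    return 1
  fi
}
```

On a tag-triggered run the tag name is available as `GITHUB_REF_NAME`, so a step might call `check_version_matches_tag "$GITHUB_REF_NAME" "$PACKAGE_VERSION"`, with `PACKAGE_VERSION` standing in for however the workflow exposes the extracted version.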
Skip blame-hang collection for Microsoft.DotNet.Interactive.Jupyter.Tests on non-Windows to avoid false-positive Test Run Aborted failures while preserving test execution.
- add timestamped progress logging to test-retry-runner - add workflow_dispatch diagnostics workflow for Jupyter Linux test hangs
- run isolated Linux Jupyter diagnostics on pull requests and manual dispatch - add timestamped test progress logs in test-retry-runner
- build prerequisites before running isolated Jupyter diagnostics - run diagnostics step with continue-on-error to collect artifacts without breaking PR
- run targeted Jupyter blame-hang diagnostics when Linux test step fails - upload diagnostics via existing artifacts/TestResults path
Subscribe to Jupyter response streams before sending requests so fast in-memory playback cannot emit replies before the awaiting code is subscribed. This prevents recorded Jupyter tests from missing responses and hanging after test execution completes.
- Remove linux-hang-diagnostics.yml temporary workflow
- Remove temporary diagnostics step from build-and-test action
- Restore blame-hang collection for Jupyter tests (root cause now fixed)
- Keep timestamped logging utility functions for debugging

CI run 25950724045 confirmed the race fix resolves the hang issue. Both build-linux and build-windows jobs passed without timeouts.
Pull request overview
Copilot reviewed 10 out of 11 changed files in this pull request and generated 9 comments.
Files not reviewed (1)
- src/polyglot-notebooks-ui-components/package-lock.json: Language not supported
Comments suppressed due to low confidence (1)
.github/actions/build-and-test/action.yml:170
- The diagnostics test step uses `--no-build` and runs only when the preceding regular test run failed. If the failure was a build/compile error (vs. a test hang), the build outputs needed by `--no-build` will not be present and this step will fail with a confusing "could not find assembly" error rather than helping diagnose the hang. Consider guarding this with a check that the test assembly exists, or dropping `--no-build` so it falls back to building on demand.
    - name: Report test results (Linux)
      if: ${{ inputs.platform == 'linux' && !cancelled() && (github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repository) }}
      uses: dorny/test-reporter@v3
      with:
        name: Linux Tests
        path: artifacts/TestResults/${{ inputs.build-config }}/**/*.trx
        reporter: dotnet-trx
        fail-on-error: false
- dependabot.yml: Replace /src with per-package directories using 'directories:' (plural) so Dependabot actually finds npm manifests
- publish.yml: Anchor symbols-exclusion regex to '\.symbols\.nupkg$' instead of the looser 'symbols' substring match
- publish.yml: Add global-json-file to setup-dotnet in publish-nuget job to pin the SDK version used for 'dotnet nuget push'
- publish.yml: Replace fixed Start-Sleep 180 with a polling loop that queries the nuget.org v3 flat-container API every 10s up to 15min
- publish.yml: Add clarifying comment explaining the intentional double-push pattern (publish-nuget owns primary OIDC push; PS script re-push is safe)
- publish.yml: Add comments explaining why working-directory is load-bearing for PackVSCodeExtension.ps1 and PackNpmPackage.ps1
- build-and-test/action.yml: Set npm cache to workspace-relative path .npm-cache (via 'npm config set cache') so it works on both Linux and Windows runners (Windows default cache is %AppData%\npm-cache, not ~/.npm)
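To make the anchored symbols exclusion concrete, here is a rough bash sketch that combines it with a for-loop push, so a failed push stops the loop instead of being lost in a pipeline subshell. `is_symbols_package` and `push_packages` are hypothetical names, and the push command is passed in by the caller rather than hard-coded; this is not the code in publish.yml.

```shell
#!/usr/bin/env bash

# Succeeds only for names ending in ".symbols.nupkg". Because the regex is
# anchored at the end, "my.symbols.tool.1.0.0.nupkg" is NOT treated as symbols.
is_symbols_package() {
  [[ "$1" =~ \.symbols\.nupkg$ ]]
}

# Hypothetical helper: run "<command> <package>" for every non-symbols package
# in a directory. Usage: push_packages <dir> <command> [args...]
push_packages() {
  local dir="$1"; shift
  if [ "$#" -lt 1 ]; then
    echo "push_packages: missing push command" >&2
    return 2
  fi
  local pkg
  for pkg in "$dir"/*.nupkg; do
    [ -e "$pkg" ] || continue        # unmatched glob: nothing to push
    if is_symbols_package "$pkg"; then
      continue
    fi
    # The loop runs in the current shell, so a failure propagates to the caller.
    "$@" "$pkg" || return 1
  done
}
```

A caller might define `push_one() { dotnet nuget push "$1" --skip-duplicate --source https://api.nuget.org/v3/index.json --api-key "$NUGET_API_KEY"; }` and then run `push_packages "$dir" push_one`, where `$dir` is the package output directory.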
Pull request overview
Copilot reviewed 10 out of 11 changed files in this pull request and generated 8 comments.
Files not reviewed (1)
- src/polyglot-notebooks-ui-components/package-lock.json: Language not supported
Comments suppressed due to low confidence (2)
.github/workflows/publish.yml:312
- The polling loop always sleeps 10 seconds before its first probe, which means publish-vscode is delayed by ≥10s even when nuget.org has already indexed the package. More importantly, after a failed `Invoke-RestMethod` (e.g., transient 5xx), the catch branch logs the error but does not break. That's fine on its own, but the `do { ... } while ([DateTimeOffset]::UtcNow -lt $timeout)` only re-evaluates the time check after the body completes, so if many consecutive failures occur the loop will continue until timeout without ever succeeding. Consider probing once before sleeping (so the happy path returns immediately) and treating prolonged HTTP failures as a hard error rather than spinning silently.
    - name: Wait for nuget.org propagation
      if: needs.prepare.outputs.simulate != 'true'
      shell: pwsh
      run: |
        $version = '${{ needs.build.outputs.version }}'
        $packageId = 'Microsoft.dotnet-interactive'
        $url = "https://api.nuget.org/v3-flatcontainer/$($packageId.ToLower())/index.json"
        $timeout = [DateTimeOffset]::UtcNow.AddMinutes(15)
        do {
          Start-Sleep -Seconds 10
          try {
            $index = (Invoke-RestMethod -Uri $url).versions
            if ($index -contains $version) {
              Write-Host "Version $version is indexed on nuget.org"
              break
            }
            Write-Host "Waiting for version $version to appear on nuget.org..."
          } catch {
            Write-Host "Failed to query nuget.org: $_"
          }
        } while ([DateTimeOffset]::UtcNow -lt $timeout)
        if ([DateTimeOffset]::UtcNow -ge $timeout) {
          Write-Error "Timed out waiting for version $version to appear on nuget.org"
          exit 1
        }
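The "probe once before sleeping" suggestion can be sketched as follows. This is a bash illustration only; the workflow step itself is PowerShell. `wait_until` is a hypothetical helper, and the probe is whatever command the caller supplies.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a probe command until it succeeds or the attempt
# budget is used up. It probes first and sleeps only between attempts, so an
# already-satisfied condition returns immediately.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
  local max_attempts="$1" delay_seconds="$2"; shift 2
  local attempt
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if "$@"; then
      return 0
    fi
    if (( attempt < max_attempts )); then
      sleep "$delay_seconds"
    fi
  done
  echo "Timed out after $max_attempts attempts" >&2
  return 1
}
```

With a 10 second delay, 90 attempts gives roughly the same 15 minute budget as the step above. A probe could be a small function that fetches the flat-container `index.json` with `curl -fsS` and greps for the quoted version; that probe is an assumption and is not shown here.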
.github/workflows/publish.yml:307
- In `try { $index = (Invoke-RestMethod -Uri $url).versions; ... } catch { ... }`, when the request returns a 404 (which the v3 flat-container endpoint returns for a not-yet-indexed package id), `Invoke-RestMethod` throws and the catch swallows the error as "Failed to query nuget.org". That's fine for the very first push of a new package id but means transient network errors and "version not yet indexed" are indistinguishable in the logs. Consider differentiating 404s (expected during indexing) from other failures so operators can diagnose stuck publishes.
    try {
      $index = (Invoke-RestMethod -Uri $url).versions
      if ($index -contains $version) {
        Write-Host "Version $version is indexed on nuget.org"
        break
      }
      Write-Host "Waiting for version $version to appear on nuget.org..."
    } catch {
      Write-Host "Failed to query nuget.org: $_"
    }
When CIBuild.cmd/cibuild.sh run on GitHub Actions they set DisableArcade=1,
which prevents the Arcade SDK from being imported in Directory.Build.targets.
Without the Arcade import, 'dotnet build' writes test binaries to the default
'bin/{Config}/{TFM}/' directory instead of 'artifacts/bin/{project}/…'.
However 'dotnet test --no-build' (run in a separate GHA step without
DisableArcade=1) evaluates the project with Arcade and expects to find the DLL
at the Arcade-convention path 'artifacts/bin/{project}/{Config}/{TFM}/'. This
mismatch produced the 'test source file not found' failure.
Fix: in Directory.Build.targets, when DisableArcade=1, set ArtifactsDir,
ArtifactsBinDir, and OutputPath to match the Arcade layout. This makes the
build output land in the same place that the test-runner evaluation expects.
Double-dashes are invalid in XML comments. Replace '--no-build' and the em-dash ellipsis with plain ASCII equivalents.
The build scripts (CIBuild.cmd / cibuild.sh) set DisableArcade=1, but
env vars set inside a step are not inherited by subsequent GHA steps
unless written to GITHUB_ENV. The test-retry-runner.ps1 step therefore
evaluated project files without DisableArcade=1, so Arcade was imported
and OutDir was computed as artifacts/bin/{project}/... which is where the
DLLs were not found (they were built to the default bin/ path).
Fix: after setting DisableArcade=1, also append it to GITHUB_ENV so
that the Run-tests step inherits the variable and dotnet test --no-build
resolves the output path consistently with how the build ran.
Also revert the Directory.Build.targets PropertyGroup added in earlier
commits (which incorrectly caused dotnet pack to look for DLLs at
artifacts/bin/... while the build was still writing to bin/).
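As a rough bash illustration of the GITHUB_ENV mechanism this commit relies on: a variable exported inside one step is gone when the step ends, while a `NAME=value` line appended to the file named by `GITHUB_ENV` is loaded into later steps of the same job. `persist_env` is a hypothetical helper name and handles single-line values only.

```shell
#!/usr/bin/env bash

# Hypothetical helper: make NAME=value visible both to the rest of the
# current step and to later steps in the same job.
persist_env() {
  local name="$1" value="$2"
  export "$name=$value"                                # rest of this step
  printf '%s=%s\n' "$name" "$value" >> "$GITHUB_ENV"   # later steps
}
```

In cibuild.sh this would amount to calling `persist_env DisableArcade 1` in place of a plain `export`.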
The CMD echo redirect in CIBuild.cmd was broken: 'echo DisableArcade=1>>' is parsed by CMD as echoing 'DisableArcade=' with fd1 redirected to the file, so GITHUB_ENV received an empty value. Similarly, relying on GITHUB_ENV from a sibling step was fragile. The clean fix is to explicitly set DisableArcade in the 'env:' block of both test steps in action.yml. This ensures dotnet test evaluates projects without Arcade (matching how they were built), so it finds DLLs at the default bin/ path instead of artifacts/bin/.
- Forward unknown CIBuild.cmd arguments consistently with cibuild.sh and parse arbitrary SignType values
- Add no-build/no-restore flags to pack in CI scripts
- Harden publish NuGet package loops and nuget.org polling behavior
- Document VSIX artifact layout expected by publish helpers
- Pin npm registry setup and rewrite npm lockfile resolved URLs to npmjs.org
- Remove unrelated Directory.Build.targets whitespace
Keep CI pack in no-build/no-restore mode, but disable project reference builds so SDK 10 solution pack does not invoke Build with NoBuild=true and fail with NETSDK1085.
Force-pushed from 1067ff8 to 59d6f4e
- Pin GitHub-published actions to existing v4 major tags - Resolve publish simulate mode explicitly by event type - Extract publish version only from the bare Microsoft.dotnet-interactive package - Quote forwarded Windows CI args and document expected token shape
- Set checkout to v6 - Set upload-artifact to v7 and download-artifact to v8 - Set setup-dotnet to v5, setup-node to v6, and cache to v5
Summary
Replaces all 5 Azure Pipelines YAML files with GitHub Actions workflows.
Non-CI fixes included
- `JupyterKernel`: creates the `ToTask`-derived task before awaiting `SendAsync` to avoid missing hot-observable completion.
- `test-retry-runner.ps1`: adds timestamped retry logging so CI retry attempts are easier to correlate in logs.
New files
- `.github/workflows/ci.yml`: push/PR CI on Windows (`windows-2022`) and Linux (`ubuntu-22.04`)
- `.github/workflows/publish.yml`: tag-triggered publish (NuGet, NPM, VS Code Marketplace); `workflow_dispatch` simulate mode
- `.github/actions/build-and-test/action.yml`: composite action (build + test, reused by both CI and publish)
- `.github/dependabot.yml`: weekly github-actions dependency updates
What's dropped (no replacement needed)
- Microsoft-internal infra: MicroBuild signing, 1ES SDL, TSA, OneLocBuild, symbol publishing
Publish authentication
- VS Code Marketplace: OIDC (`azure/login@v3` + DefaultAzureCredential)
- NuGet: `NUGET_API_KEY` secret
- NPM: `NPM_TOKEN` secret
Required repo setup (before enabling publish)
- `production` GitHub Environment → add Required Reviewers
- `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_SUBSCRIPTION_ID`, `NUGET_API_KEY`, `NPM_TOKEN`
- `repo:BenjaminMichaelis/interactive:environment:production`
AZP files not yet deleted
Shadow-CI phase: run GHA alongside AZP until validated, then delete:
- `azure-pipelines.yml`
- `azure-pipelines-official.yml`
- `azure-publish-stable-polyglot-notebooks.yml`
- `azure-publish-insiders-polyglot-notebooks.yml`
- `apiscan-compliance.yml` / `es-metadata.yml`
Review history
4 rounds of GPT-5.5 × Opus 4.6 cross-review; all critical findings applied.