
CLI auth #72

Merged: m-peko merged 3 commits into main from feature/cli-auth on Mar 25, 2026

@m-peko (Collaborator) commented Mar 25, 2026

No description provided.

@m-peko m-peko requested a review from garrettallen14 March 25, 2026 20:02
@m-peko m-peko merged commit e8a8033 into main Mar 25, 2026
7 checks passed
@m-peko m-peko deleted the feature/cli-auth branch March 25, 2026 20:57
mmercuri added a commit that referenced this pull request Mar 30, 2026
…n main)

Rebased onto latest main (e8a8033) which includes:
- CLI with auth (PR #72)
- layerlens.instrument tracing + adapters (PR #66, #69)
- Scorers resource, integrations resource
- API naming convention fixes (PR #61)

No impact on samples: Stratix() constructor is backward-compatible,
use_bearer_auth defaults to False, all existing API signatures unchanged.
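The note that `use_bearer_auth` defaults to `False` describes the usual way to add an option without breaking existing callers: a new keyword argument with a default value. The sketch below illustrates that pattern with a hypothetical `HypotheticalClient` class. It is not the real `Stratix()` constructor, and the header names (`Authorization: Bearer <key>` and `X-API-Key`) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class HypotheticalClient:
    """Stand-in for a client constructor; NOT the real Stratix class."""

    api_key: str
    # New keyword argument added with a default, so existing call sites
    # such as HypotheticalClient("key") keep working unchanged.
    use_bearer_auth: bool = False

    def auth_headers(self) -> dict:
        # Header names are illustrative assumptions, not the SDK's real headers.
        if self.use_bearer_auth:
            return {"Authorization": f"Bearer {self.api_key}"}
        return {"X-API-Key": self.api_key}


legacy = HypotheticalClient("secret")                        # pre-existing call style
bearer = HypotheticalClient("secret", use_bearer_auth=True)  # opt-in to the new mode
print(legacy.auth_headers())
print(bearer.auth_headers())
```

Because the flag is opt-in, old code gets the old behavior and only callers that pass `use_bearer_auth=True` see the new header.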

Samples include: core (18), industry (10), cowork (5), modalities (3),
integrations (2), cicd (2+workflow), openclaw (10+skill), mcp (1),
copilotkit (2+UI), claude-code skills (6), sample data (23 files).

469 non-live tests passing. 54 live tests available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
m-peko pushed a commit that referenced this pull request Apr 2, 2026
m-peko added a commit that referenced this pull request Apr 21, 2026
)

* SDK samples: 70+ production-ready samples, docs, and tests (rebased on main)

Rebased onto latest main (e8a8033) which includes:
- CLI with auth (PR #72)
- layerlens.instrument tracing + adapters (PR #66, #69)
- Scorers resource, integrations resource
- API naming convention fixes (PR #61)

No impact on samples: Stratix() constructor is backward-compatible,
use_bearer_auth defaults to False, all existing API signatures unchanged.

Samples include: core (18), industry (10), cowork (5), modalities (3),
integrations (2), cicd (2+workflow), openclaw (10+skill), mcp (1),
copilotkit (2+UI), claude-code skills (6), sample data (23 files).

469 non-live tests passing. 54 live tests available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove marc-only/ from tracking, add to .gitignore

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Move examples/cli/ to samples/cli/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add instrumentation and integration management samples from examples/

Copy 3 new files from examples/ that had no equivalent in samples/:
- samples/integrations/openai_instrumented.py (instrument_openai + @trace + span)
- samples/integrations/langchain_instrumented.py (LangChainCallbackHandler)
- samples/core/integration_management.py (client.integrations CRUD)

Update docs/instrumentation/providers.md and frameworks.md with Related Samples links.
Update samples/integrations/README.md and samples/core/README.md.
Update samples/README.md integrations count (2 → 4).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Consolidate examples/ into samples/: remove duplicates, integrate unique patterns

- Remove 14 examples/ files already covered by samples/core equivalents
- Create samples/core/benchmark_evaluation.py for model+benchmark workflow
  (evaluations.create → wait_for_completion → results.get/get_all)
- Integrate 12 unique patterns from remaining examples/ into samples/:
  - trace_evaluation.py: add get_results().steps iteration, get_many() without filter
  - compare_evaluations.py: add compare_models(), outcome_filter, result field access
  - judge_optimization.py: add BadRequestError catch, optimization result fields
  - model_benchmark_management.py: add models.add/remove, benchmarks.add/remove, filters
  - evaluation_filtering.py: document both camelCase and snake_case sort_by conventions
  - paginated_results.py: add results.get_by_id() alternative
  - public_catalog.py: add evaluation summary fields, get_prompts search/sort params
  - async_workflow.py: add evaluation instance methods (wait_for_completion_async, etc.)
- Add Related Samples to docs/examples/creating-evaluations.md
- Add Related Samples to docs/instrumentation/providers.md and frameworks.md
- Update all READMEs for new files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
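The `benchmark_evaluation.py` workflow described above (create an evaluation, wait for it to finish, then read results one page at a time or all at once) can be sketched against an in-memory stand-in. Everything here (`FakeEvaluations`, `FakeEvaluation`, `FakeResults`, the `page`/`page_size` parameters and the polling loop) is hypothetical and only mirrors the call order `evaluations.create → wait_for_completion → results.get/get_all`; it does not reproduce the real SDK's signatures.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeEvaluation:
    """Hypothetical evaluation handle; not the SDK's Evaluation type."""

    id: str
    polls_until_done: int = 2
    status: str = "pending"

    def wait_for_completion(self, interval: float = 0.0, max_polls: int = 10) -> "FakeEvaluation":
        # Poll the simulated backend until it reports completion.
        for _ in range(max_polls):
            if self.polls_until_done <= 0:
                self.status = "success"
                return self
            self.polls_until_done -= 1
            time.sleep(interval)
        raise TimeoutError(f"evaluation {self.id} did not finish")


@dataclass
class FakeResults:
    """Hypothetical results resource backed by an in-memory dict."""

    store: dict = field(default_factory=dict)

    def get(self, evaluation_id: str, page: int = 1, page_size: int = 2) -> list:
        # Return a single page of result rows.
        rows = self.store.get(evaluation_id, [])
        start = (page - 1) * page_size
        return rows[start:start + page_size]

    def get_all(self, evaluation_id: str) -> list:
        # Return every row, ignoring pagination.
        return list(self.store.get(evaluation_id, []))


class FakeEvaluations:
    """Hypothetical evaluations resource that seeds placeholder results on create."""

    def __init__(self, results: FakeResults) -> None:
        self._results = results
        self._count = 0

    def create(self, model: str, benchmark: str) -> FakeEvaluation:
        self._count += 1
        evaluation = FakeEvaluation(id=f"eval-{self._count}")
        # Seed three placeholder rows; a real backend would compute these.
        self._results.store[evaluation.id] = [
            {"model": model, "benchmark": benchmark, "score": s} for s in (1.0, 0.5, 0.0)
        ]
        return evaluation


results = FakeResults()
evaluations = FakeEvaluations(results)

# create -> wait_for_completion -> results.get / results.get_all
evaluation = evaluations.create(model="model-a", benchmark="bench-x").wait_for_completion()
first_page = results.get(evaluation.id, page=1)
all_rows = results.get_all(evaluation.id)
print(evaluation.status, len(first_page), len(all_rows))
```

Paging through `get` suits large result sets, while `get_all` trades memory for simplicity when the full list is small.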

* Remove hardcoded retrieval score from rag_assessment.py (CLAUDE.md Rule 3)

The "0.92" similarity score was fabricated and displayed as if computed
by a real retrieval engine. Removed the fake score -- retrieval is by
document ID, and actual quality scoring comes from the judge evaluation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add per-sample SDK call assertions for all 58 samples (10/10 compliance)

Every sample now has specific assertions verifying which SDK methods
it calls (not just "didn't crash"). Covers:
- 20 core samples (benchmark_evaluation, integration_management added)
- 5 cowork samples (code_review, pair_programming, rag_assessment, etc.)
- 3 modality samples (text, brand, document evaluation)
- 4 integration samples (openai/anthropic traced + instrumented)
- 2 cicd samples

Also adds mock setup for client.integrations and client.trace_evaluations.get_many.
495 non-live tests passing, 58 live tests deselected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
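The per-sample assertions described above check which SDK methods a sample calls, rather than only that it ran without crashing. A minimal sketch of that style with `unittest.mock.MagicMock` follows; `hypothetical_sample` and the `client.integrations.get_many` / `get` method names are illustrative assumptions, not the repository's actual tests or the SDK's real surface.

```python
from unittest.mock import MagicMock


def hypothetical_sample(client) -> int:
    """Stand-in for a sample script; method names are illustrative only."""
    integrations = client.integrations.get_many()
    for item in integrations:
        client.integrations.get(item["id"])
    return len(integrations)


# A MagicMock records every attribute access and call made on it.
client = MagicMock()
client.integrations.get_many.return_value = [{"id": "int-1"}, {"id": "int-2"}]

count = hypothetical_sample(client)

# Assert on the specific SDK calls, not merely that the sample did not crash.
client.integrations.get_many.assert_called_once_with()
assert client.integrations.get.call_count == 2
client.integrations.get.assert_called_with("int-2")  # checks the most recent call only
print(count)
```

Setting `return_value` on the mocked resource is what the "mock setup" in the commit refers to in spirit: it gives the sample realistic data to iterate over so the downstream calls actually happen.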

* Remove examples/ entirely, remap all 53 doc references to samples/

All example files have been either:
- Removed (14 duplicates already covered by samples/core equivalents)
- Removed after integrating unique patterns into samples/ (12 files)
- Replaced by samples/core/benchmark_evaluation.py (3 client workflow files)

Updated all 53 doc references in docs/examples/ to point to samples/core/.
Updated docs/examples/README.md with new file table.
examples/ directory no longer exists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add comprehensive MCP server tests (29 tests)

Tests cover all 6 tool handlers, dispatch logic, error handling,
asyncio.to_thread wrapping, and helper functions:

- TestToolCatalogue: server creation and handler existence
- TestHandleListTraces: summary output, default limit, empty/null responses
- TestHandleGetTrace: detail output, not-found handling
- TestHandleRunEvaluation: creation output, failure handling
- TestHandleGetEvaluation: status+results, not-found, pending state
- TestHandleCreateJudge: creation output, failure handling
- TestHandleListJudges: list output, empty/null responses
- TestDispatchAndErrors: unknown tool, SDK exceptions, helper functions
- TestAsyncWrapping: all 5 handlers verified to use asyncio.to_thread

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
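The `TestAsyncWrapping` checks verify that tool handlers offload blocking SDK calls with `asyncio.to_thread`. One way to test that is to patch `asyncio.to_thread` with a recording wrapper that still delegates to the real function. The handler `handle_list_traces` and the `blocking_list_traces` SDK stand-in below are hypothetical; this is a sketch of the technique, not the repository's MCP server or its tests. It needs Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
from unittest.mock import patch


def blocking_list_traces(limit: int) -> list:
    # Hypothetical stand-in for a synchronous SDK call.
    return [{"id": f"trace-{i}"} for i in range(limit)]


async def handle_list_traces(arguments: dict) -> str:
    # Hypothetical MCP tool handler: offload the blocking call to a worker
    # thread so the server's event loop stays responsive.
    limit = arguments.get("limit", 10)
    traces = await asyncio.to_thread(blocking_list_traces, limit)
    return "\n".join(t["id"] for t in traces)


async def run_with_spy():
    calls = []
    real_to_thread = asyncio.to_thread

    def recording_to_thread(func, /, *args, **kwargs):
        # Record what was offloaded, then delegate to the real implementation.
        calls.append((func, args))
        return real_to_thread(func, *args, **kwargs)

    with patch("asyncio.to_thread", new=recording_to_thread):
        output = await handle_list_traces({"limit": 2})
    return output, calls


output, calls = asyncio.run(run_with_spy())
assert calls == [(blocking_list_traces, (2,))]
print(output)
```

Wrapping instead of fully mocking keeps the handler's real behavior intact, so the same test also confirms the output is built correctly from the offloaded call.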

* Fix samples

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: m-peko <marinpeko5@gmail.com>