CLI auth by m-peko · Pull Request #72 · LayerLens/stratix-python

m-peko · 2026-03-25T20:02:49Z

No description provided.


* SDK samples: 70+ production-ready samples, docs, and tests (rebased on main)

  Rebased onto latest main (e8a8033) which includes:
  - CLI with auth (PR #72)
  - layerlens.instrument tracing + adapters (PR #66, #69)
  - Scorers resource, integrations resource
  - API naming convention fixes (PR #61)

  No impact on samples: Stratix() constructor is backward-compatible, use_bearer_auth defaults to False, all existing API signatures unchanged.

  Samples include: core (18), industry (10), cowork (5), modalities (3), integrations (2), cicd (2+workflow), openclaw (10+skill), mcp (1), copilotkit (2+UI), claude-code skills (6), sample data (23 files).

  469 non-live tests passing. 54 live tests available.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove marc-only/ from tracking, add to .gitignore

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Move examples/cli/ to samples/cli/

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add instrumentation and integration management samples from examples/

  Copy 3 new files from examples/ that had no equivalent in samples/:
  - samples/integrations/openai_instrumented.py (instrument_openai + @trace + span)
  - samples/integrations/langchain_instrumented.py (LangChainCallbackHandler)
  - samples/core/integration_management.py (client.integrations CRUD)

  Update docs/instrumentation/providers.md and frameworks.md with Related Samples links.
  Update samples/integrations/README.md and samples/core/README.md.
  Update samples/README.md integrations count (2 → 4).

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Consolidate examples/ into samples/: remove duplicates, integrate unique patterns

  - Remove 14 examples/ files already covered by samples/core equivalents
  - Create samples/core/benchmark_evaluation.py for model+benchmark workflow (evaluations.create → wait_for_completion → results.get/get_all)
  - Integrate 12 unique patterns from remaining examples/ into samples/:
    - trace_evaluation.py: add get_results().steps iteration, get_many() without filter
    - compare_evaluations.py: add compare_models(), outcome_filter, result field access
    - judge_optimization.py: add BadRequestError catch, optimization result fields
    - model_benchmark_management.py: add models.add/remove, benchmarks.add/remove, filters
    - evaluation_filtering.py: document both camelCase and snake_case sort_by conventions
    - paginated_results.py: add results.get_by_id() alternative
    - public_catalog.py: add evaluation summary fields, get_prompts search/sort params
    - async_workflow.py: add evaluation instance methods (wait_for_completion_async, etc)
  - Add Related Samples to docs/examples/creating-evaluations.md
  - Add Related Samples to docs/instrumentation/providers.md and frameworks.md
  - Update all READMEs for new files

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove hardcoded retrieval score from rag_assessment.py (CLAUDE.md Rule 3)

  The "0.92" similarity score was fabricated and displayed as if computed by a real retrieval engine. Removed the fake score -- retrieval is by document ID, and actual quality scoring comes from the judge evaluation.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add per-sample SDK call assertions for all 58 samples (10/10 compliance)

  Every sample now has specific assertions verifying which SDK methods it calls (not just "didn't crash"). Covers:
  - 20 core samples (benchmark_evaluation, integration_management added)
  - 5 cowork samples (code_review, pair_programming, rag_assessment, etc)
  - 3 modality samples (text, brand, document evaluation)
  - 4 integration samples (openai/anthropic traced + instrumented)
  - 2 cicd samples

  Also adds mock setup for client.integrations and client.trace_evaluations.get_many.

  495 non-live tests passing, 58 live tests deselected.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Remove examples/ entirely, remap all 53 doc references to samples/

  All example files have been either:
  - Removed (14 duplicates already covered by samples/core equivalents)
  - Removed after integrating unique patterns into samples/ (12 files)
  - Replaced by samples/core/benchmark_evaluation.py (3 client workflow files)

  Updated all 53 doc references in docs/examples/ to point to samples/core/.
  Updated docs/examples/README.md with new file table.
  examples/ directory no longer exists.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add comprehensive MCP server tests (29 tests)

  Tests cover all 6 tool handlers, dispatch logic, error handling, asyncio.to_thread wrapping, and helper functions:
  - TestToolCatalogue: server creation and handler existence
  - TestHandleListTraces: summary output, default limit, empty/null responses
  - TestHandleGetTrace: detail output, not-found handling
  - TestHandleRunEvaluation: creation output, failure handling
  - TestHandleGetEvaluation: status+results, not-found, pending state
  - TestHandleCreateJudge: creation output, failure handling
  - TestHandleListJudges: list output, empty/null responses
  - TestDispatchAndErrors: unknown tool, SDK exceptions, helper functions
  - TestAsyncWrapping: all 5 handlers verified to use asyncio.to_thread

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix samples

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: m-peko <marinpeko5@gmail.com>
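
For readers unfamiliar with the `asyncio.to_thread` wrapping that the MCP server tests above verify, here is a minimal sketch of the general pattern. It is not taken from the repository: `list_traces_blocking` and `handle_list_traces` are hypothetical names, and the blocking function is only a stand-in for a synchronous SDK call. Only `asyncio.to_thread` itself is a real standard-library API (Python 3.9+).

```python
import asyncio
import time


def list_traces_blocking(limit: int) -> list[str]:
    """Hypothetical stand-in for a synchronous SDK call (not the real API)."""
    time.sleep(0.01)  # simulate blocking network I/O
    return [f"trace-{i}" for i in range(limit)]


async def handle_list_traces(limit: int = 3) -> str:
    """Hypothetical async tool handler.

    The blocking call runs in a worker thread via asyncio.to_thread,
    so the event loop stays free to serve other requests meanwhile.
    """
    traces = await asyncio.to_thread(list_traces_blocking, limit)
    if not traces:
        return "No traces found."
    return "\n".join(traces)


summary = asyncio.run(handle_list_traces())
print(summary)
```

A test for this kind of wrapping would typically patch `asyncio.to_thread` and assert that the handler awaited it, which appears to be what `TestAsyncWrapping` checks, though the actual test code is not shown here.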

CLI auth

221d593

m-peko requested a review from garrettallen14 March 25, 2026 20:02

m-peko added 2 commits March 25, 2026 21:05

Fix format/lint

afe9659

Fix format/lint

1019c28

garrettallen14 approved these changes Mar 25, 2026


m-peko merged commit e8a8033 into main Mar 25, 2026
7 checks passed

m-peko deleted the feature/cli-auth branch March 25, 2026 20:57

m-peko merged 3 commits into main from feature/cli-auth
