CLI auth#72
Merged
Merged
Conversation
garrettallen14
approved these changes
Mar 25, 2026
mmercuri
added a commit
that referenced
this pull request
Mar 30, 2026
…n main) Rebased onto latest main (e8a8033) which includes: - CLI with auth (PR #72) - layerlens.instrument tracing + adapters (PR #66, #69) - Scorers resource, integrations resource - API naming convention fixes (PR #61) No impact on samples: Stratix() constructor is backward-compatible, use_bearer_auth defaults to False, all existing API signatures unchanged. Samples include: core (18), industry (10), cowork (5), modalities (3), integrations (2), cicd (2+workflow), openclaw (10+skill), mcp (1), copilotkit (2+UI), claude-code skills (6), sample data (23 files). 469 non-live tests passing. 54 live tests available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
m-peko
pushed a commit
that referenced
this pull request
Apr 2, 2026
…n main) Rebased onto latest main (e8a8033) which includes: - CLI with auth (PR #72) - layerlens.instrument tracing + adapters (PR #66, #69) - Scorers resource, integrations resource - API naming convention fixes (PR #61) No impact on samples: Stratix() constructor is backward-compatible, use_bearer_auth defaults to False, all existing API signatures unchanged. Samples include: core (18), industry (10), cowork (5), modalities (3), integrations (2), cicd (2+workflow), openclaw (10+skill), mcp (1), copilotkit (2+UI), claude-code skills (6), sample data (23 files). 469 non-live tests passing. 54 live tests available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
m-peko
added a commit
that referenced
this pull request
Apr 21, 2026
) * SDK samples: 70+ production-ready samples, docs, and tests (rebased on main) Rebased onto latest main (e8a8033) which includes: - CLI with auth (PR #72) - layerlens.instrument tracing + adapters (PR #66, #69) - Scorers resource, integrations resource - API naming convention fixes (PR #61) No impact on samples: Stratix() constructor is backward-compatible, use_bearer_auth defaults to False, all existing API signatures unchanged. Samples include: core (18), industry (10), cowork (5), modalities (3), integrations (2), cicd (2+workflow), openclaw (10+skill), mcp (1), copilotkit (2+UI), claude-code skills (6), sample data (23 files). 469 non-live tests passing. 54 live tests available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Remove marc-only/ from tracking, add to .gitignore Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Move examples/cli/ to samples/cli/ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add instrumentation and integration management samples from examples/ Copy 3 new files from examples/ that had no equivalent in samples/: - samples/integrations/openai_instrumented.py (instrument_openai + @trace + span) - samples/integrations/langchain_instrumented.py (LangChainCallbackHandler) - samples/core/integration_management.py (client.integrations CRUD) Update docs/instrumentation/providers.md and frameworks.md with Related Samples links. Update samples/integrations/README.md and samples/core/README.md. Update samples/README.md integrations count (2 → 4). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Consolidate examples/ into samples/: remove duplicates, integrate unique patterns - Remove 14 examples/ files already covered by samples/core equivalents - Create samples/core/benchmark_evaluation.py for model+benchmark workflow (evaluations.create → wait_for_completion → results.get/get_all) - Integrate 12 unique patterns from remaining examples/ into samples/: - trace_evaluation.py: add get_results().steps iteration, get_many() without filter - compare_evaluations.py: add compare_models(), outcome_filter, result field access - judge_optimization.py: add BadRequestError catch, optimization result fields - model_benchmark_management.py: add models.add/remove, benchmarks.add/remove, filters - evaluation_filtering.py: document both camelCase and snake_case sort_by conventions - paginated_results.py: add results.get_by_id() alternative - public_catalog.py: add evaluation summary fields, get_prompts search/sort params - async_workflow.py: add evaluation instance methods (wait_for_completion_async, etc) - Add Related Samples to docs/examples/creating-evaluations.md - Add Related Samples to docs/instrumentation/providers.md and frameworks.md - Update all READMEs for new files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Remove hardcoded retrieval score from rag_assessment.py (CLAUDE.md Rule 3) The "0.92" similarity score was fabricated and displayed as if computed by a real retrieval engine. Removed the fake score -- retrieval is by document ID, and actual quality scoring comes from the judge evaluation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add per-sample SDK call assertions for all 58 samples (10/10 compliance) Every sample now has specific assertions verifying which SDK methods it calls (not just "didn't crash"). Covers: - 20 core samples (benchmark_evaluation, integration_management added) - 5 cowork samples (code_review, pair_programming, rag_assessment, etc) - 3 modality samples (text, brand, document evaluation) - 4 integration samples (openai/anthropic traced + instrumented) - 2 cicd samples Also adds mock setup for client.integrations and client.trace_evaluations.get_many. 495 non-live tests passing, 58 live tests deselected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Remove examples/ entirely, remap all 53 doc references to samples/ All example files have been either: - Removed (14 duplicates already covered by samples/core equivalents) - Removed after integrating unique patterns into samples/ (12 files) - Replaced by samples/core/benchmark_evaluation.py (3 client workflow files) Updated all 53 doc references in docs/examples/ to point to samples/core/. Updated docs/examples/README.md with new file table. examples/ directory no longer exists. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add comprehensive MCP server tests (29 tests) Tests cover all 6 tool handlers, dispatch logic, error handling, asyncio.to_thread wrapping, and helper functions: - TestToolCatalogue: server creation and handler existence - TestHandleListTraces: summary output, default limit, empty/null responses - TestHandleGetTrace: detail output, not-found handling - TestHandleRunEvaluation: creation output, failure handling - TestHandleGetEvaluation: status+results, not-found, pending state - TestHandleCreateJudge: creation output, failure handling - TestHandleListJudges: list output, empty/null responses - TestDispatchAndErrors: unknown tool, SDK exceptions, helper functions - TestAsyncWrapping: all 5 handlers verified to use asyncio.to_thread Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Fix samples --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: m-peko <marinpeko5@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.