
[DRAFT] adding generated and custom code for custom training #45951

Open

jayesh-tanna wants to merge 69 commits into main from jatanna/trainingv1

Conversation


@jayesh-tanna jayesh-tanna commented Mar 27, 2026

Description

TypeSpec pull request: Azure/azure-rest-api-specs#41619

Add Training Jobs support to azure-ai-projects SDK

Overview

This PR introduces CommandJob support under client.beta.training.jobs (sync) and
async_client.beta.training.jobs (async), enabling users to create, get, list, update,
cancel, and delete training jobs from the Azure AI Projects SDK without wrapping boilerplate.

Many of our customers currently use azure-ai-ml, so this surface is designed to feel familiar to them: same patterns, same mental model. That way, when they are ready to move to Azure AI Foundry, the migration is a small step rather than a full rewrite.


Design Choices

1. Flat CommandJob surface — no envelope required
Callers pass CommandJob directly to create_or_update and receive CommandJob back from
get/list. The SDK wraps/unwraps the Job(properties=...) wire envelope transparently.

2. Custom CommandJob subclass (model patch)
CommandJob extends the auto-generated _RestCommandJob and exposes read-only name and id
properties promoted from the outer Job envelope returned by the service.
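The promotion can be sketched like this; the base-class body and attribute names are illustrative stand-ins, and only the read-only-property shape reflects the design:

```python
from typing import Any, Optional

class _RestCommandJob:  # stand-in for the generated model
    def __init__(self, **kwargs: Any) -> None:
        self.command = kwargs.get("command", "")

class CommandJob(_RestCommandJob):
    """Generated model plus read-only identity promoted from the Job envelope."""

    def __init__(self, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self._name: Optional[str] = None  # populated from the envelope by the SDK
        self._id: Optional[str] = None

    @property
    def name(self) -> Optional[str]:  # property without a setter: read-only
        return self._name

    @property
    def id(self) -> Optional[str]:
        return self._id

job = CommandJob(command="python train.py")
job._name = "my-job"  # what the SDK would do internally after a service call
assert job.name == "my-job"
```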

3. _from_rest_object factory method
A classmethod on CommandJob constructs the flat model from any service response object,
with explicit ValueError/TypeError on unexpected shapes rather than silent None fields.
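A sketch of the factory's error discipline; `SimpleNamespace` stands in for the raw service response, and the field names (`job_type`, `properties`) are assumptions for illustration:

```python
from types import SimpleNamespace
from typing import Any

class CommandJob:
    def __init__(self, command: str = "") -> None:
        self.command = command
        self.name = None

    @classmethod
    def _from_rest_object(cls, obj: Any) -> "CommandJob":
        """Build a flat CommandJob from a service envelope, failing loudly."""
        props = getattr(obj, "properties", None)
        if props is None:
            raise ValueError("Service response has no 'properties' envelope")
        if getattr(props, "job_type", None) != "Command":
            raise TypeError(f"Expected a Command job, got {getattr(props, 'job_type', None)!r}")
        job = cls(command=props.command)
        job.name = getattr(obj, "name", None)  # promoted from the envelope
        return job

envelope = SimpleNamespace(
    name="run-1",
    properties=SimpleNamespace(job_type="Command", command="python train.py"),
)
job = CommandJob._from_rest_object(envelope)
assert (job.name, job.command) == ("run-1", "python train.py")
```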

4. CommandJobLimits.timeout accepts int, float, or timedelta
The patched CommandJobLimits.__init__ converts plain numeric seconds to timedelta before
forwarding to the generated model, eliminating a common serialization foot-gun.
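The coercion amounts to a few lines; this is a self-contained sketch, not the actual patched class, which forwards to the generated model's `__init__`:

```python
from datetime import timedelta
from typing import Optional, Union

class CommandJobLimits:
    """Sketch of the patched __init__: numeric seconds become a timedelta."""

    def __init__(self, *, timeout: Union[int, float, timedelta, None] = None) -> None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(timeout, (int, float)) and not isinstance(timeout, bool):
            timeout = timedelta(seconds=timeout)
        # The real patch forwards this to the generated model.
        self.timeout: Optional[timedelta] = timeout

assert CommandJobLimits(timeout=7200).timeout == timedelta(hours=2)
assert CommandJobLimits(timeout=1.5).timeout == timedelta(seconds=1.5)
assert CommandJobLimits(timeout=timedelta(minutes=30)).timeout == timedelta(minutes=30)
```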

5. Auto-injection of Foundry-Features preview header
Every operation (list, get, create_or_update, begin_delete, begin_cancel) automatically injects
Foundry-Features: Jobs=V1Preview so callers never need to pass it manually as a custom header.

6. Automatic local-path resolution for code and inputs
If code or an input path is a local file or folder, the SDK transparently uploads it as a
dataset asset and swaps in the returned datastore URI before the request is sent.

7. Input validation before every create/update
create_or_update validates name, command, environment_image_reference, and compute
are non-empty upfront, surfacing clear ValueErrors instead of opaque HTTP 400 responses.

8. Full async mirror (_patch_jobs_async.py)
All sync customizations are mirrored in the async operations class using async/await and
distributed_trace_async, including async dataset upload resolution for code and inputs.
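The async mirror follows the same logic with an awaited transport; `begin_cancel` and `fake_send` here are illustrative stand-ins, not the actual operation signatures:

```python
import asyncio
from typing import Awaitable, Callable

async def begin_cancel(name: str, send: Callable[[str], Awaitable[str]]) -> str:
    """Async mirror of the sync operation: same logic, awaited transport.

    In the real patch this is decorated with @distributed_trace_async.
    """
    return await send(f"jobs/{name}:cancel")

async def _demo() -> str:
    async def fake_send(path: str) -> str:  # stands in for the async HTTP pipeline
        return f"cancelled via {path}"
    return await begin_cancel("run-1", fake_send)

assert asyncio.run(_demo()) == "cancelled via jobs/run-1:cancel"
```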


Customizations Summary

| Customization | What it does |
| --- | --- |
| Flat CommandJob model with name and id properties | The service returns jobs wrapped in an outer Job envelope. We subclass the generated model to surface name and id directly on the object so callers never need to unwrap job.properties.name. |
| CommandJob._from_rest_object factory | Converts a raw service Job response into a flat CommandJob in one place, with typed error messages if the response shape is unexpected (missing properties, wrong job type). |
| Job envelope wrapping in create_or_update | The service wire format requires Job(properties=CommandJob(...)). The patch wraps the caller's flat CommandJob into the envelope automatically before the HTTP call, keeping the public API clean. |
| CommandJobLimits timeout coercion | Overrides __init__ to accept plain int/float seconds in addition to timedelta, converting them automatically. Removes a class of runtime serialization errors when callers pass numeric timeouts. |
| Foundry-Features preview header injection | Injects Foundry-Features: Jobs=V1Preview into every request from _inject_preview_header, so the preview feature flag is always active without callers needing to know about it. |
| Local-path auto-upload for code and inputs | Before sending a job, any local file or folder in code or an input path is uploaded to a new dataset via DatasetsOperations and the field is replaced with the returned datastore URI transparently. |
| Dataset name:version short-form resolution | An input URI in name:version or azureai:name:version form is resolved to a full datastore URI by fetching the existing dataset, removing the need for callers to look up URIs manually. |
| Pre-flight _validate guard | Checks name, command, environment_image_reference, and compute are non-empty before any network call, giving callers an immediate ValueError with a clear message instead of a cryptic HTTP 400. |
| Async mirror of all sync customizations | Every sync customization (envelope wrap/unwrap, validation, path resolution, header injection) is duplicated with async/await in _patch_jobs_async.py so the async client has identical behaviour. |

Pending / Future Work

  • command() factory function — Following the same pattern as azure-ai-ml's top-level
    command() function (see azure.ai.ml.entities._builders.command_func), a standalone
    command(*, command, environment, compute, inputs, outputs, ...) helper will be added so users
    can write job = command(...); client.beta.training.jobs.create_or_update(name, job) without
    constructing CommandJob directly.
  • Unit & live test coverage — Tests for the patch layer (validation, local-path resolution,
    header injection, _from_rest_object error paths, async equivalents) will be added to this PR
    in a follow-up commit.

Sample code

```python
job = CommandJob(
    command="python train.py --epochs 10 --lr 0.001 --output $AZUREML_MODEL_DIR/outputs",
    environment_image_reference="mcr.microsoft.com/azureml/minimal-ubuntu22.04-py39-cuda11.8-gpu-inference",
    compute=compute_id,
    display_name="Sample Command Job - Full",
    description="A sample job created via the Azure AI Projects SDK.",
    tags={"framework": "pytorch", "priority": "low", "team": "ai-platform"},
    properties={"experiment_id": "exp-42", "model_version": "1.0"},
    code="./src",
    environment_variables={
        "NCCL_DEBUG": "INFO",
        "PYTHONPATH": "/opt/conda/lib/python3.9/site-packages",
    },
    inputs={
        "training_data": Input(
            type=AssetTypes.URI_FILE,
            path="./data/train.csv",
            mode=InputOutputModes.READ_ONLY_MOUNT,
            description="CIFAR-10 training split",
        ),
    },
    outputs={
        "model_output": Output(
            type=AssetTypes.URI_FOLDER,
            path="azureai://datastores/workspaceblobstore/paths/outputs/cifar10-model/",
            mode=InputOutputModes.UPLOAD,
            asset_name="cifar10-trained-model",
            description="Trained CIFAR-10 model",
        ),
    },
    resources=JobResourceConfiguration(
        instance_count=2,
        instance_type="Standard_NC6s_v3",
        shm_size="8g",
        docker_args="--ipc=host",
        properties={"AISuperComputer": {"slaTier": "Premium", "priority": "high"}},
    ),
    distribution=PyTorchDistribution(process_count_per_instance=1),
    limits=CommandJobLimits(timeout=7200),
    queue_settings=QueueSettings(job_tier="Spot"),
    is_archived=False,
)
job = project_client.beta.training.jobs.create_or_update(name="job_name", body=job)
print(job)
```

All SDK Contribution checklist:

  • The pull request does not introduce breaking changes.
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

dargilco and others added 30 commits March 6, 2026 10:29
…#45611)

* marking finetuning pause and resume operations as live extended tests

* updating recording

---------

Co-authored-by: Jayesh Tanna <jatanna@microsoft.com>
* rename env vars

* rename env var

* resolved comments

* remove chat completion

* resolved comment
* Add CSV and synthetic data generation evaluation samples

Add two new evaluation samples under sdk/ai/azure-ai-projects/samples/evaluations/:

- sample_evaluations_builtin_with_csv.py: Demonstrates evaluating pre-computed
  responses from a CSV file using the csv data source type. Uploads a CSV file
  via the datasets API, runs coherence/violence/f1 evaluators, and polls results.

- sample_synthetic_data_evaluation.py: Demonstrates synthetic data evaluation
  (preview) that generates test queries from a prompt, sends them to a model
  target, and evaluates responses with coherence/violence evaluators.

Also adds:
- data_folder/sample_data_evaluation.csv: Sample CSV data file with 3 rows
- README.md: Updated sample index with both new samples

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update synthetic eval sample: agent target + cleaner dataset ID retrieval

- Switch from model target to agent target (azure_ai_agent)
- Create agent version via agents.create_version() before evaluation
- Simplify output_dataset_id retrieval using getattr instead of nested hasattr/isinstance checks
- Add AZURE_AI_AGENT_NAME env var requirement
- Remove input_messages (not needed for agent target)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add model target synthetic eval sample, cross-reference both

- Add sample_synthetic_data_model_evaluation.py for model target with
  input_messages system prompt
- Update sample_synthetic_data_evaluation.py docstring with cross-reference
- Update README.md with both synthetic samples (agent and model)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Rename synthetic agent sample, clarify README, add prompt/files comments

- Rename sample_synthetic_data_evaluation.py to sample_synthetic_data_agent_evaluation.py
- Clarify README: JSONL dataset vs CSV dataset descriptions
- Remove (preview) from synthetic sample descriptions in README
- Add comments about prompt and reference_files options in both synthetic samples

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Skip new eval samples in recording tests

Add sample_evaluations_builtin_with_csv.py, sample_synthetic_data_agent_evaluation.py,
and sample_synthetic_data_model_evaluation.py to samples_to_skip list since they
require file upload prerequisites or are long-running preview features.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Rename env vars per PR review: FOUNDRY_PROJECT_ENDPOINT, FOUNDRY_MODEL_NAME

Address review comments from howieleung:
- AZURE_AI_PROJECT_ENDPOINT -> FOUNDRY_PROJECT_ENDPOINT
- AZURE_AI_MODEL_DEPLOYMENT_NAME -> FOUNDRY_MODEL_NAME
Updated in all 3 new samples and README.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Rename AZURE_AI_AGENT_NAME to FOUNDRY_AGENT_NAME per review

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update changelog with new sample entries

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* New samples

* resolved comments
* LLM validation use 5.2 and chat completion

* change log

* Resolved comments
* Adding-Upload-Evaluator

* Adding-Upload-Evaluator

* Adding-Upload-Evaluator

* Adding-Upload-Evaluator-aio

* rename

* added - eval and eval run

* fix

* adding tests

* updated as per review
@dargilco

Please hold off on merging the PR. I want to get Johan's feedback on introducing nested sub-clients like .beta.training.jobs. I'll start a thread with him and you.

dargilco and others added 18 commits March 30, 2026 20:51
* Sample-Fix

* fix samples
* Instruction now not provided in test function.

* fix test
…r upload (#46063)

* fix(azure-ai-projects): skip all dot-prefixed directories in evaluator upload

Change skip_dirs filter to exclude any directory starting with '.' instead
of only '.git' and '.venv'. This covers .venv, .git, .mypy_cache, .tox,
.pytest_cache, and any other hidden/tool directories.

Applied to both sync and async upload functions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(azure-ai-projects): skip dot-prefixed files in evaluator upload

Extend the existing skip logic to also exclude dot-prefixed files
(e.g. .env, .DS_Store, .gitignore) from evaluator uploads, matching
the treatment already applied to dot-prefixed directories.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(azure-ai-projects): add file_pattern and folder_exclusions_pattern to evaluators.upload

Add optional regex-based filtering parameters to _upload_folder_to_blob
and upload methods, consistent with datasets.upload_folder pattern:
- file_pattern: filter which files to upload by name
- folder_exclusions_pattern: exclude directories by name pattern

Applied to both sync and async implementations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: remove hardcoded skip lists, let customer control via patterns

Remove hardcoded skip_dirs and skip_extensions. Filtering is now
fully controlled by the optional file_pattern and
folder_exclusions_pattern parameters. Docstrings include recommended
excludes for typical Python evaluator projects.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: restore sample_eval_upload_friendly_evaluator.py accidentally emptied

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: update upload tests for customer-controlled pattern filtering

Replace test_upload_skips_pycache_and_pyc_files with two new tests:
- test_upload_skips_pycache_and_pyc_files_with_patterns: verifies
  filtering works when patterns are provided
- test_upload_uploads_all_files_without_patterns: verifies all files
  are uploaded when no patterns are given

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: fix Sphinx docstring continuation line alignment

Align continuation lines under the directive name (e.g. 'p' in :param)
instead of using deeper indentation, per Sphinx requirements.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot commented Apr 6, 2026

API Change Check

APIView identified API level changes in this PR and created the following API reviews

azure-ai-projects

@howieleung howieleung force-pushed the feature/azure-ai-projects/2.0.2 branch from eac227e to 223cb73 Compare April 16, 2026 06:39
Base automatically changed from feature/azure-ai-projects/2.0.2 to main April 17, 2026 15:34
Co-authored-by: Copilot <copilot@github.com>
Copilot AI review requested due to automatic review settings April 30, 2026 07:37

Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

Jayesh Tanna and others added 4 commits April 30, 2026 18:17


8 participants