Master every aspect of Azure AI Foundry — from hub creation to multi-agent orchestration, RAG, fine-tuning, evaluation, responsible AI, and enterprise deployment.
```text
Azure-AI_Foundry/
├── README.md                          ← YOU ARE HERE
├── requirements.txt                   ← Python dependencies
├── .env.example                       ← Environment variable template
├── .gitignore
├── course/                            ← Self-paced training course
│   ├── syllabus.md                    ← 12-week study plan
│   ├── hands-on-labs.md               ← Lab index with walkthroughs
│   ├── module-01-introduction/        ← What is AI Foundry, architecture
│   ├── module-02-projects-and-hubs/   ← Hubs, projects, RBAC, networking
│   ├── module-03-model-catalog/       ← Model catalog, deployments, PTU
│   ├── module-04-prompt-engineering/  ← Playground, prompts, prompt flow
│   ├── module-05-rag-and-grounding/   ← RAG, AI Search, vector search
│   ├── module-06-agents/              ← Agent Service, tools, multi-agent
│   ├── module-07-fine-tuning/         ← Data prep, fine-tuning, distillation
│   ├── module-08-evaluation-and-monitoring/ ← Evaluators, tracing, monitoring
│   ├── module-09-responsible-ai/      ← Content Safety, prompt shields
│   ├── module-10-mlops-and-deployment/ ← CI/CD, versioning, endpoints
│   ├── module-11-integrations/        ← Semantic Kernel, LangChain, APIM
│   └── module-12-advanced-scenarios/  ← Multi-modal, batch, enterprise
├── docs/                              ← Reference materials
│   ├── quick-reference-cards.md       ← Cheat sheets, decision trees
│   ├── reference-architectures.md     ← Mermaid architecture diagrams
│   └── tips-and-tricks.md             ← 30+ practical tips
├── infra/                             ← Bicep IaC templates
│   ├── README.md                      ← Template catalog
│   ├── ai-foundry-hub.bicep           ← Hub + dependencies
│   ├── ai-foundry-project.bicep       ← Project within hub
│   ├── ai-foundry-openai.bicep        ← OpenAI + model deployments
│   ├── ai-foundry-search.bicep        ← AI Search resource
│   └── ai-foundry-complete.bicep      ← Full end-to-end deployment
├── scripts/
│   ├── python/
│   │   ├── foundry-basics/            ← Hub/project SDK operations
│   │   ├── models/                    ← Chat completions, deployments
│   │   ├── agents/                    ← Agent Service, multi-agent
│   │   ├── rag/                       ← RAG with AI Search
│   │   ├── evaluation/                ← Evaluators, safety testing
│   │   ├── fine-tuning/               ← Fine-tune workflow
│   │   └── deployment/                ← Endpoint management
│   ├── rest-api/                      ← .http files for REST examples
│   └── powershell/                    ← Environment setup/cleanup
└── images/                            ← Architecture diagrams
```
| Order | File | What You'll Learn |
|---|---|---|
| 1 | overview.md | What is AI Foundry, capabilities, when to use it |
| 2 | lesson-1-foundry-architecture.md | Hub/Project hierarchy, underlying resources, networking |
| 3 | lesson-2-getting-started.md | Create your first hub & project, portal tour, SDK setup |
| 4 | knowledge-check.md | 15 scenario-based questions |
Bicep: ai-foundry-hub.bicep | ai-foundry-project.bicep
Scripts: create_project.py
| Order | File | What You'll Learn |
|---|---|---|
| 1 | overview.md | Resource organization, hub vs project |
| 2 | lesson-1-hub-management.md | Creating/configuring hubs, connections, multi-hub topologies |
| 3 | lesson-2-project-management.md | Project RBAC, assets, quotas, sharing models |
| 4 | lesson-3-networking-security.md | Private endpoints, managed VNet, Key Vault, managed identity |
| 5 | knowledge-check.md | 15 scenario-based questions |
Bicep: ai-foundry-hub.bicep | ai-foundry-project.bicep
| Order | File | What You'll Learn |
|---|---|---|
| 1 | overview.md | Model catalog, deployment types, pricing tiers |
| 2 | lesson-1-explore-models.md | Browse catalog, model cards, benchmarks, filtering |
| 3 | lesson-2-deploy-models.md | Deploy OpenAI/OSS models, PTU vs PAYG, content filters |
| 4 | lesson-3-model-management.md | Monitoring, scaling, retirement, quota management |
| 5 | knowledge-check.md | 15 scenario-based questions |
Bicep: ai-foundry-openai.bicep
Scripts: deploy_model.py | chat_completions.py
| Order | File | What You'll Learn |
|---|---|---|
| 1 | overview.md | Playground features, prompt engineering fundamentals |
| 2 | lesson-1-playground-deep-dive.md | Chat/Completions/Images/Audio playgrounds, config export |
| 3 | lesson-2-prompt-techniques.md | Zero-shot, few-shot, CoT, JSON mode, function calling |
| 4 | lesson-3-prompt-flow.md | Prompt Flow: create, test, debug, deploy flows |
| 5 | knowledge-check.md | 15 scenario-based questions |
Scripts: chat_completions.py
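Few-shot prompting and JSON mode from lesson 2 combine naturally in a single request. The sketch below only assembles the message list and request parameters locally (nothing is sent); `response_format: {"type": "json_object"}` is the documented JSON-mode switch for chat completions, while the example task and texts are illustrative:

```python
def build_messages(task: str, examples: list[tuple[str, str]], query: str) -> list[dict]:
    """System prompt + alternating few-shot user/assistant pairs + the real query."""
    msgs = [{"role": "system", "content": task}]
    for user, assistant in examples:
        msgs += [{"role": "user", "content": user},
                 {"role": "assistant", "content": assistant}]
    msgs.append({"role": "user", "content": query})
    return msgs

messages = build_messages(
    'Classify sentiment. Reply as JSON: {"sentiment": ...}',
    [("Great product!", '{"sentiment": "positive"}'),
     ("Terrible.", '{"sentiment": "negative"}')],
    "It was fine, I guess.",
)
# Temperature 0 for a deterministic classification task, JSON mode for parsing.
params = {"temperature": 0, "response_format": {"type": "json_object"}}
print(len(messages))  # → 6
```

Pass `messages` and `params` to your chat-completions client of choice; the two worked examples anchor both the output schema and the label set.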
| Order | File | What You'll Learn |
|---|---|---|
| 1 | overview.md | RAG pattern, why grounding matters |
| 2 | lesson-1-add-your-data.md | Add your data in portal, chunking, embedding models |
| 3 | lesson-2-azure-ai-search-integration.md | AI Search setup, hybrid search, semantic reranking |
| 4 | lesson-3-advanced-rag-patterns.md | Multi-index, conversational, agentic RAG, security |
| 5 | knowledge-check.md | 15 scenario-based questions |
Bicep: ai-foundry-search.bicep
Scripts: rag_with_ai_search.py
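The retrieval step at the heart of RAG can be illustrated without any Azure services: rank document chunks by cosine similarity between a query vector and precomputed chunk vectors. A toy sketch — real pipelines get these vectors from an embedding model and delegate search to Azure AI Search rather than hand-made three-dimensional vectors:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

chunks = [
    ("Hubs provide shared resources.", [0.9, 0.1, 0.0]),
    ("Projects live inside a hub.",    [0.8, 0.2, 0.1]),
    ("PTU is reserved capacity.",      [0.0, 0.1, 0.9]),
]
print(top_k([1.0, 0.0, 0.0], chunks, k=2))
# → ['Hubs provide shared resources.', 'Projects live inside a hub.']
```

The retrieved texts are then injected into the prompt as grounding context — that injection is the "augmented" part of RAG.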
| Order | File | What You'll Learn |
|---|---|---|
| 1 | overview.md | Agent concepts, Agent Service vs frameworks |
| 2 | lesson-1-agent-fundamentals.md | Create agents, threads, runs, tool configuration |
| 3 | lesson-2-agent-tools.md | Code interpreter, file search, Bing, functions, Azure Functions |
| 4 | lesson-3-multi-agent-orchestration.md | Multi-agent patterns, Semantic Kernel, AutoGen |
| 5 | knowledge-check.md | 15 scenario-based questions |
Scripts: create_agent.py | multi_agent.py
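Whatever SDK you use, function-calling agents reduce to the same loop: the model emits a tool name plus JSON-encoded arguments, and your code dispatches to a real function and returns the result. A framework-free sketch of that dispatch step — the tool name and payload shape here are illustrative, not the Agent Service wire format:

```python
import json

def get_weather(city: str) -> str:
    # Stand-in for a real API call your agent would make.
    return f"Sunny in {city}"

# Registry mapping the tool names the model knows about to Python callables.
TOOLS = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Execute one model-requested tool call: {'name': ..., 'arguments': '<json>'}."""
    fn = TOOLS[tool_call["name"]]
    args = json.loads(tool_call["arguments"])
    return fn(**args)

print(dispatch({"name": "get_weather", "arguments": '{"city": "Oslo"}'}))
# → Sunny in Oslo
```

The tool's return value is sent back to the model as a tool message, and the run continues until the model produces a final answer.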
| Order | File | What You'll Learn |
|---|---|---|
| 1 | overview.md | When to fine-tune vs RAG vs prompt engineering |
| 2 | lesson-1-data-preparation.md | JSONL format, quality requirements, validation split |
| 3 | lesson-2-fine-tuning-workflow.md | Create job, hyperparameters, deploy fine-tuned model |
| 4 | lesson-3-advanced-fine-tuning.md | Continuous fine-tuning, vision fine-tuning, distillation |
| 5 | knowledge-check.md | 15 scenario-based questions |
Scripts: fine_tune_model.py
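Module 7's data-prep lesson centers on the chat-format JSONL file. The sketch below serializes training examples into that format; the message structure follows the OpenAI chat fine-tuning convention (one `{"messages": [...]}` object per line) — validate against the current Azure OpenAI docs before uploading:

```python
import json

def to_jsonl(examples: list[tuple[str, str]], system: str) -> str:
    """Each line: {"messages": [system, user, assistant]} — chat fine-tune format."""
    lines = []
    for user, assistant in examples:
        lines.append(json.dumps({"messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ]}))
    return "\n".join(lines)

data = to_jsonl(
    [("What is a hub?", "A hub is the top-level container in AI Foundry.")],
    system="Answer in one sentence.",
)
print(data.splitlines()[0][:60])
```

Write the result to `train.jsonl` (and a held-out `validation.jsonl` split) before creating the fine-tuning job.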
| Order | File | What You'll Learn |
|---|---|---|
| 1 | overview.md | Why evaluation matters, built-in evaluators |
| 2 | lesson-1-built-in-evaluators.md | Groundedness, relevance, coherence, safety evaluators |
| 3 | lesson-2-custom-evaluators.md | Custom evaluators, LLM-as-judge, red-teaming |
| 4 | lesson-3-production-monitoring.md | Azure Monitor, App Insights, OpenTelemetry tracing |
| 5 | knowledge-check.md | 15 scenario-based questions |
Scripts: run_evaluation.py
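Custom evaluators in the Azure evaluation SDK are essentially callables that accept a test case's fields and return a score dict, so the shape is easy to show in plain Python. A toy keyword-overlap "groundedness" stand-in — real evaluations would use the SDK's LLM-judged groundedness evaluator, and this metric name is invented for the example:

```python
class KeywordOverlapEvaluator:
    """Scores the fraction of response words that also appear in the context."""

    def __call__(self, *, response: str, context: str) -> dict:
        resp_words = set(response.lower().split())
        ctx_words = set(context.lower().split())
        overlap = len(resp_words & ctx_words) / len(resp_words) if resp_words else 0.0
        return {"keyword_overlap": round(overlap, 3)}

evaluator = KeywordOverlapEvaluator()
print(evaluator(response="hubs share storage",
                context="hubs share storage and key vault"))
# → {'keyword_overlap': 1.0}
```

Because it is just a callable with keyword arguments, the same object can be passed alongside built-in evaluators when running a batch evaluation.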
| Order | File | What You'll Learn |
|---|---|---|
| 1 | overview.md | Microsoft RAI principles, Content Safety |
| 2 | lesson-1-content-safety.md | Text/image moderation, severity levels, blocklists |
| 3 | lesson-2-prompt-shields-and-safety.md | Prompt injection, jailbreak detection, groundedness |
| 4 | lesson-3-governance-and-compliance.md | AI governance, data privacy, EU AI Act, auditing |
| 5 | knowledge-check.md | 15 scenario-based questions |
| Order | File | What You'll Learn |
|---|---|---|
| 1 | overview.md | MLOps lifecycle, IaC for AI Foundry |
| 2 | lesson-1-endpoint-management.md | Real-time/batch endpoints, auth, scaling, A/B testing |
| 3 | lesson-2-cicd-pipelines.md | GitHub Actions, Azure DevOps, eval gates, env promotion |
| 4 | lesson-3-versioning-and-rollback.md | Model/prompt versioning, blue-green, canary, DR |
| 5 | knowledge-check.md | 15 scenario-based questions |
Scripts: deploy_endpoint.py
Bicep: ai-foundry-complete.bicep
| Order | File | What You'll Learn |
|---|---|---|
| 1 | overview.md | Integration ecosystem overview |
| 2 | lesson-1-developer-tools.md | VS Code, GitHub Copilot, Jupyter, CLI, SDK, REST |
| 3 | lesson-2-frameworks.md | Semantic Kernel, LangChain, AutoGen, LlamaIndex |
| 4 | lesson-3-enterprise-integrations.md | APIM AI Gateway, Functions, Logic Apps, Teams, Fabric |
| 5 | knowledge-check.md | 15 scenario-based questions |
| Order | File | What You'll Learn |
|---|---|---|
| 1 | overview.md | Enterprise patterns, multi-modal, batch processing |
| 2 | lesson-1-multimodal-ai.md | GPT-4o vision, Whisper, TTS, video analysis |
| 3 | lesson-2-batch-and-streaming.md | Global Batch API, SSE streaming, async patterns |
| 4 | lesson-3-enterprise-patterns.md | Multi-region, HA/DR, retry patterns, cost allocation |
| 5 | knowledge-check.md | 15 scenario-based questions |
- Read this README to understand the repo layout
- Open `course/syllabus.md` — your 12-week study plan
- Set up your lab environment — see Lab Environment Setup below
- Start with Module 1 — Introduction to Azure AI Foundry
- Explore the AI Foundry portal — https://ai.azure.com
```bash
# 1. Clone this repo
git clone https://github.com/Ab3y/Azure-AI_Foundry.git && cd Azure-AI_Foundry

# 2. Set up Python environment
python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\Activate.ps1 on Windows
pip install -r requirements.txt

# 3. Copy and configure environment variables
cp .env.example .env
# Edit .env with your Azure resource endpoints and keys

# 4. Log into Azure
az login
az group create --name ai-foundry-labs-rg --location eastus2

# 5. Deploy AI Foundry Hub (includes Storage, Key Vault, AI Services)
az deployment group create \
  --resource-group ai-foundry-labs-rg \
  --template-file infra/ai-foundry-hub.bicep \
  --parameters hubName=my-foundry-hub

# 6. Deploy a project within the hub
az deployment group create \
  --resource-group ai-foundry-labs-rg \
  --template-file infra/ai-foundry-project.bicep \
  --parameters projectName=my-first-project

# 7. Clean up when done (IMPORTANT — avoid charges!)
az group delete --name ai-foundry-labs-rg --yes --no-wait
```

Estimated lab cost: $20–$50 if completed within 2–3 weeks and resources deleted promptly. Use free tiers where available.
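With the environment deployed, a quick sanity check saves debugging later, since every lab script reads its endpoints and keys from `.env`. A minimal sketch — the two variable names are assumptions for illustration; use whatever your `.env.example` actually lists:

```python
import os

def missing_vars(env: dict, required: list[str]) -> list[str]:
    """Return the required variable names that are absent or empty."""
    return [name for name in required if not env.get(name)]

# Hypothetical names — check your own .env.example for the real list.
REQUIRED = ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY"]

gaps = missing_vars(dict(os.environ), REQUIRED)
if gaps:
    print("Missing:", ", ".join(gaps))  # fix .env before starting the labs
else:
    print("Environment looks complete.")
```

Running this once after step 3 catches a half-filled `.env` before it surfaces as a confusing authentication error mid-lab.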
| Resource | Link |
|---|---|
| Azure AI Foundry Overview | https://learn.microsoft.com/en-us/azure/ai-studio/what-is-ai-studio |
| AI Foundry Portal | https://ai.azure.com |
| What is an AI Foundry Hub? | https://learn.microsoft.com/en-us/azure/ai-studio/concepts/ai-resources |
| Create a Hub Resource | https://learn.microsoft.com/en-us/azure/ai-studio/how-to/create-azure-ai-resource |
| Create a Project | https://learn.microsoft.com/en-us/azure/ai-studio/how-to/create-projects |
| RBAC Roles for AI Foundry | https://learn.microsoft.com/en-us/azure/ai-studio/concepts/rbac-ai-studio |
| Configure Private Link | https://learn.microsoft.com/en-us/azure/ai-studio/how-to/configure-private-link |
| Managed VNet | https://learn.microsoft.com/en-us/azure/ai-studio/how-to/configure-managed-network |
| Resource | Link |
|---|---|
| Model Catalog Overview | https://learn.microsoft.com/en-us/azure/ai-studio/how-to/model-catalog-overview |
| Deploy Azure OpenAI Models | https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource |
| Deploy Serverless API Models | https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-serverless |
| Provisioned Throughput (PTU) | https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/provisioned-throughput |
| Model Retirement & Upgrades | https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/model-retirements |
| Quota Management | https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/quota |
| Resource | Link |
|---|---|
| Prompt Flow Overview | https://learn.microsoft.com/en-us/azure/ai-studio/how-to/prompt-flow |
| Build a Flow | https://learn.microsoft.com/en-us/azure/ai-studio/how-to/flow-develop |
| Deploy a Flow | https://learn.microsoft.com/en-us/azure/ai-studio/how-to/flow-deploy |
| Resource | Link |
|---|---|
| RAG with AI Foundry | https://learn.microsoft.com/en-us/azure/ai-studio/concepts/retrieval-augmented-generation |
| Add Your Data | https://learn.microsoft.com/en-us/azure/ai-studio/how-to/rag-data-add |
| Azure AI Search Documentation | https://learn.microsoft.com/en-us/azure/search/ |
| Vector Search | https://learn.microsoft.com/en-us/azure/search/vector-search-overview |
| Hybrid Search | https://learn.microsoft.com/en-us/azure/search/hybrid-search-overview |
| Semantic Reranking | https://learn.microsoft.com/en-us/azure/search/semantic-search-overview |
| Integrated Vectorization | https://learn.microsoft.com/en-us/azure/search/vector-search-integrated-vectorization |
| Resource | Link |
|---|---|
| Azure AI Agent Service Overview | https://learn.microsoft.com/en-us/azure/ai-services/agents/overview |
| Agent Service Quickstart | https://learn.microsoft.com/en-us/azure/ai-services/agents/quickstart |
| Agent Tools Overview | https://learn.microsoft.com/en-us/azure/ai-services/agents/how-to/tools/overview |
| Code Interpreter | https://learn.microsoft.com/en-us/azure/ai-services/agents/how-to/tools/code-interpreter |
| File Search | https://learn.microsoft.com/en-us/azure/ai-services/agents/how-to/tools/file-search |
| Function Calling (Agents) | https://learn.microsoft.com/en-us/azure/ai-services/agents/how-to/tools/function-calling |
| Bing Grounding | https://learn.microsoft.com/en-us/azure/ai-services/agents/how-to/tools/bing-grounding |
| Azure AI Search Tool | https://learn.microsoft.com/en-us/azure/ai-services/agents/how-to/tools/azure-ai-search |
| Azure Functions Tool | https://learn.microsoft.com/en-us/azure/ai-services/agents/how-to/tools/azure-functions |
| Resource | Link |
|---|---|
| Evaluate Generative AI Apps | https://learn.microsoft.com/en-us/azure/ai-studio/how-to/evaluate-generative-ai-app |
| Evaluation SDK | https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/evaluate-sdk |
| Built-in Evaluators | https://learn.microsoft.com/en-us/azure/ai-studio/how-to/evaluate-results |
| Safety Evaluations | https://learn.microsoft.com/en-us/azure/ai-studio/how-to/evaluate-prompts-playground |
| Resource | Link |
|---|---|
| Azure AI Content Safety | https://learn.microsoft.com/en-us/azure/ai-services/content-safety/ |
| Prompt Shields | https://learn.microsoft.com/en-us/azure/ai-services/content-safety/concepts/jailbreak-detection |
| Groundedness Detection | https://learn.microsoft.com/en-us/azure/ai-services/content-safety/concepts/groundedness |
| Protected Material Detection | https://learn.microsoft.com/en-us/azure/ai-services/content-safety/concepts/protected-material |
| Custom Categories | https://learn.microsoft.com/en-us/azure/ai-services/content-safety/concepts/custom-categories |
| Microsoft Responsible AI | https://www.microsoft.com/en-us/ai/responsible-ai |
| Resource | Link |
|---|---|
| azure-ai-projects SDK | https://learn.microsoft.com/en-us/python/api/overview/azure/ai-projects-readme |
| azure-ai-inference SDK | https://learn.microsoft.com/en-us/python/api/overview/azure/ai-inference-readme |
| azure-ai-evaluation SDK | https://learn.microsoft.com/en-us/python/api/overview/azure/ai-evaluation-readme |
| openai Python SDK (Azure) | https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/switching-endpoints |
| REST API Reference | https://learn.microsoft.com/en-us/rest/api/azureai/ |
| Azure AI Foundry SDK Docs | https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/sdk-overview |
| Resource | Link |
|---|---|
| Semantic Kernel | https://learn.microsoft.com/en-us/semantic-kernel/overview/ |
| LangChain with Azure OpenAI | https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/integration-langchain |
| APIM as AI Gateway | https://learn.microsoft.com/en-us/azure/api-management/genai-gateway-capabilities |
| VS Code AI Toolkit | https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/vscode |
| Training Path | Link |
|---|---|
| Get Started with Azure AI Foundry | https://learn.microsoft.com/en-us/training/paths/create-custom-copilots-ai-studio/ |
| Build AI Apps with Azure AI Foundry | https://learn.microsoft.com/en-us/training/paths/build-ai-solutions-with-azure-ai-studio/ |
| Develop Generative AI Solutions | https://learn.microsoft.com/en-us/training/paths/develop-ai-solutions-azure-openai/ |
| Implement RAG with Azure AI Search | https://learn.microsoft.com/en-us/training/paths/implement-knowledge-mining-azure-cognitive-search/ |
| Azure AI Agent Service | https://learn.microsoft.com/en-us/training/paths/build-ai-agents-azure/ |
- Always start at ai.azure.com — the Foundry portal is your command center for everything
- Use managed identity from day one — never hard-code keys; use `DefaultAzureCredential` in all scripts
- Create separate hubs for dev/staging/prod — hub-level isolation prevents accidental cross-environment access
- Tag every resource with `environment`, `project`, and `owner` tags for cost tracking
- Use GPT-4o-mini for development — it's 10-30x cheaper than GPT-4o and fast enough for iteration
- Global deployments for non-latency-sensitive workloads — lower cost, higher availability
- Global Batch for bulk processing — 50% cheaper than standard API calls
- Delete deployments when not in use — idle PTU deployments still cost money
- Monitor token usage with Azure Monitor — set alerts before you get a surprise bill
- Use serverless API for OSS models — pay-per-token, no idle compute costs
- Enable semantic reranking in AI Search — dramatically improves RAG relevance with minimal cost
- Use streaming for user-facing chat — reduces perceived latency by showing tokens as they arrive
- Cache embeddings — don't regenerate embeddings for unchanged documents
- Right-size your AI Search tier — Basic tier handles most dev/test scenarios; Standard for production
- Use connection pooling — reuse `AzureOpenAI` client instances instead of creating new ones per request
- Private endpoints for production — never expose AI Foundry hubs to the public internet in production
- RBAC over access keys — use Azure AD authentication for all service-to-service calls
- Key Vault for secrets — store API keys, connection strings, and certificates in Key Vault
- Enable diagnostic logging — audit all API calls for compliance and troubleshooting
- Content filters are ON by default — understand the default filter levels before customizing
- System prompt is king — invest 80% of your prompt engineering effort in the system prompt
- Use JSON mode for structured outputs — add `response_format: {"type": "json_object"}` for reliable parsing
- Few-shot > zero-shot for consistency — provide 2-3 examples to set the pattern
- Temperature 0 for deterministic tasks — use temperature 0 for classification, extraction, and analysis
- Chain-of-thought for complex reasoning — add "Think step by step" to improve accuracy on multi-step tasks
- Start with single-agent, add complexity gradually — multi-agent is powerful but harder to debug
- Use function calling for real-time data — code interpreter can't call your APIs, but function calling can
- Set guardrails early — define what the agent should NOT do in its instructions
- Monitor agent runs — log every tool call and response for debugging and auditing
- Evaluate every change — even small prompt changes can significantly impact output quality; always run evaluations before deploying changes to production
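Two of the tips above — never hard-code keys, and reuse client instances — combine into a single client-factory pattern: build the client once from ambient credentials and cache it. A sketch with the Azure pieces stubbed out; a real version would return an `AzureOpenAI` client authenticated through `DefaultAzureCredential` rather than a placeholder dict:

```python
import functools
import os

@functools.lru_cache(maxsize=1)
def get_client() -> dict:
    """Build the inference client once and reuse it everywhere.

    Stubbed for illustration: a real implementation would construct an
    AzureOpenAI client with a token provider from DefaultAzureCredential
    instead of returning this dict.
    """
    endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT", "https://example.invalid")
    return {"endpoint": endpoint}  # placeholder for the real client object

# Every call site shares the same instance — no per-request construction cost.
assert get_client() is get_client()
```

`functools.lru_cache(maxsize=1)` is a cheap way to get a lazily created singleton; the client's underlying HTTP connection pool is then shared across all requests.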
| Directory | Scripts | Module |
|---|---|---|
| `foundry-basics/` | `create_project.py` | Modules 1-2 |
| `models/` | `deploy_model.py`, `chat_completions.py` | Modules 3-4 |
| `rag/` | `rag_with_ai_search.py` | Module 5 |
| `agents/` | `create_agent.py`, `multi_agent.py` | Module 6 |
| `evaluation/` | `run_evaluation.py` | Module 8 |
| `fine-tuning/` | `fine_tune_model.py` | Module 7 |
| `deployment/` | `deploy_endpoint.py` | Module 10 |
| File | Endpoint |
|---|---|
| `chat-completions.http` | Azure OpenAI — chat, streaming, function calling, vision |
| `embeddings.http` | Azure OpenAI — text embeddings |
| `agents.http` | Agent Service — create, thread, message, run |
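All of the `.http` files boil down to one request shape against the Azure OpenAI deployment endpoint. The sketch below only assembles the URL, headers, and body — nothing is sent; the URL pattern and `api-version` value reflect the documented REST convention but should be checked against the current REST reference:

```python
import json

def build_chat_request(endpoint: str, deployment: str, messages: list[dict],
                       api_version: str = "2024-06-01") -> tuple[str, dict, str]:
    """Assemble URL, headers, and JSON body for a chat-completions call (not sent)."""
    url = (f"{endpoint}/openai/deployments/{deployment}"
           f"/chat/completions?api-version={api_version}")
    # In real code, pull the key from Key Vault or use an Entra ID bearer token.
    headers = {"Content-Type": "application/json", "api-key": "<from-key-vault>"}
    body = json.dumps({"messages": messages, "max_tokens": 256})
    return url, headers, body

url, headers, body = build_chat_request(
    "https://myres.openai.azure.com", "gpt-4o-mini",
    [{"role": "user", "content": "Hello"}],
)
print(url)
```

The same three pieces map one-to-one onto a REST Client `.http` file: the request line, the header block, and the JSON body.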
| Script | Purpose |
|---|---|
| `setup-environment.ps1` | Deploy full lab environment |
| `cleanup-environment.ps1` | Delete all lab resources |
| Template | Resources |
|---|---|
| `ai-foundry-hub.bicep` | Hub + Storage + Key Vault + AI Services |
| `ai-foundry-project.bicep` | Project within a hub |
| `ai-foundry-openai.bicep` | Azure OpenAI + model deployments |
| `ai-foundry-search.bicep` | Azure AI Search |
| `ai-foundry-complete.bicep` | Complete end-to-end deployment |
- Visual Studio Code with extensions:
- Python
- Jupyter
- REST Client (for .http files)
- Bicep
- Azure Account
- GitHub Copilot
- Azure AI Toolkit (Preview)
- Python 3.10+
- Azure CLI
- Git
- Azure Free Account
| Term | Definition |
|---|---|
| Hub | Top-level container in AI Foundry that provides shared resources (Storage, Key Vault, AI Services) for multiple projects |
| Project | A workspace within a hub for developing AI applications; contains models, data, evaluations, and endpoints |
| Connection | A link from a hub/project to an external resource (e.g., Azure OpenAI, AI Search, Blob Storage) |
| Deployment | An instance of a model made available via an endpoint for inference |
| PTU | Provisioned Throughput Units — reserved capacity for consistent, high-throughput inference |
| PAYG | Pay-As-You-Go — token-based pricing with no reserved capacity |
| RAG | Retrieval-Augmented Generation — pattern that grounds model responses in your own data |
| Prompt Flow | Visual authoring tool for building LLM-powered workflows with multiple nodes |
| Evaluator | A metric (built-in or custom) used to score model outputs on quality or safety dimensions |
| Agent | An AI entity that can use tools, maintain conversation state, and execute multi-step tasks autonomously |
| Content Safety | Azure service for detecting harmful content (hate, violence, sexual, self-harm) |
| Prompt Shields | Protection against prompt injection and jailbreak attacks |
| Semantic Kernel | Microsoft's open-source SDK for building AI agents and integrating LLMs into applications |
| Groundedness | Evaluation metric measuring whether model responses are factually supported by provided context |
| File | Purpose |
|---|---|
| `course/syllabus.md` | 12-week self-paced study plan |
| `course/hands-on-labs.md` | Complete lab index with prerequisites |
| `docs/quick-reference-cards.md` | Cheat sheets and decision trees |
| `docs/reference-architectures.md` | Mermaid architecture diagrams |
| `docs/tips-and-tricks.md` | 30+ practical tips organized by category |
| `infra/README.md` | Bicep template catalog |
| `requirements.txt` | Python dependencies |
| `.env.example` | Environment variable template |
Last updated: April 2026

| Resource | Link |
|---|---|
| Azure AI Search | https://learn.microsoft.com/en-us/azure/search/ |
| Azure AI Content Safety | Content Safety docs |
| Azure AI Agent Service | Agent Service docs |
| Fine-Tuning (OpenAI) | Fine-tune Azure OpenAI models |
| Azure AI Document Intelligence | Document Intelligence docs |
| Azure AI Speech | Speech Service docs |
| Azure AI Vision | Computer Vision docs |
| Azure AI Language | Language Service docs |
| Azure AI Translator | Translator docs |
| Tool / SDK | URL |
|---|---|
| Azure AI Foundry SDK (Python) | azure-ai-projects on PyPI |
| Azure AI Inference SDK | azure-ai-inference on PyPI |
| Azure AI Evaluation SDK | azure-ai-evaluation on PyPI |
| Semantic Kernel (Python) | Semantic Kernel docs |
| Semantic Kernel (C#) | Semantic Kernel .NET |
| LangChain Azure Integration | langchain-openai on PyPI |
| AutoGen | AutoGen docs |
| Prompt Flow SDK | promptflow on PyPI |
| OpenAI Python SDK | openai on PyPI |
| Training Path | URL |
|---|---|
| Get started with Azure AI Foundry | MS Learn: AI Foundry fundamentals |
| Develop Generative AI solutions with Azure OpenAI | MS Learn: Azure OpenAI path |
| Implement RAG with Azure OpenAI | MS Learn: RAG path |
| Build a RAG-based copilot solution | MS Learn: Copilot with RAG |
| Azure AI Search training | MS Learn: AI Search |
| Responsible Generative AI | MS Learn: Responsible AI |
| AI-102: Azure AI Engineer Associate | MS Learn: AI-102 study path |
| Resource | URL |
|---|---|
| Azure Architecture Center — AI | AI architecture guidance |
| RAG Reference Architecture | Baseline RAG with AI Search |
| Enterprise Chat Reference | Enterprise chat with GPT |
| API Management AI Gateway | APIM as AI Gateway |
| Well-Architected Framework — AI | WAF AI workloads |
| # | Tip | Details |
|---|---|---|
| 1 | Use GPT-4o-mini as default | It's 10-30x cheaper than GPT-4o and handles 80%+ of use cases well |
| 2 | Leverage Batch API for bulk work | 50% discount on token costs for non-real-time workloads |
| 3 | Set token limits on deployments | Use max_tokens parameter to cap response length and prevent runaway costs |
| 4 | Use PTU for predictable workloads | Provisioned Throughput Units give lower per-token cost at committed volume |
| 5 | Delete idle deployments | Managed compute deployments incur cost even when idle — delete when not in use |
| 6 | Cache repeated queries | Use APIM semantic caching or application-level caching for common questions |
| 7 | Right-size AI Search | Start with Free tier for dev; Basic for small workloads; S1+ for production |
| # | Tip | Details |
|---|---|---|
| 8 | Use streaming for UX | Stream responses with SSE — users see tokens immediately instead of waiting |
| 9 | Deploy to the nearest region | Reduce latency by deploying to the region closest to your users |
| 10 | Optimize chunk size for RAG | 512 tokens is a good default; test 256-1024 to find the sweet spot for your data |
| 11 | Use global deployments for scale | Global deployments let Microsoft route to the lowest-latency region automatically |
| 12 | Precompute embeddings | Don't embed at query time — pre-embed your corpus during indexing |
| 13 | Use integrated vectorization | Let AI Search handle embedding during indexing — fewer moving parts |
| # | Tip | Details |
|---|---|---|
| 14 | Always use Managed Identity | Avoid storing API keys — use DefaultAzureCredential in code |
| 15 | Enable private endpoints | Keep AI traffic off the public internet in production |
| 16 | Layer content safety defenses | Combine content filters + prompt shields + system prompt guardrails |
| 17 | Rotate keys regularly | If you must use keys, rotate every 90 days via Key Vault |
| 18 | Restrict model access with RBAC | Use Azure AI Inference Deployment Operator for least-privilege access |
| # | Pitfall | Solution |
|---|---|---|
| 19 | Ignoring token limits | Always check the model's context window — GPT-4o supports 128K tokens |
| 20 | Skipping evaluation | Never deploy without running groundedness + safety evaluations first |
| 21 | Over-engineering prompts | Start simple, add complexity only when needed — test each change |
| 22 | Not handling rate limits | Implement exponential backoff and retry logic in production code |
| 23 | Mixing dev and prod in one hub | Use separate hubs for dev/staging/prod for proper isolation |
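Pitfall 22 (rate limits) is usually handled with exponential backoff plus jitter. A dependency-free sketch — production code would catch the SDK's specific rate-limit exception (e.g. `openai.RateLimitError`) rather than a bare `Exception`, and would keep real `time.sleep` as the default:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2^attempt (+ jitter) and retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # retries exhausted — surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)

# Demo: a function that fails twice with a simulated 429, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

print(with_backoff(flaky, sleep=lambda s: None))  # → ok
```

The injected `sleep` parameter also makes the retry logic unit-testable without real delays.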
| # | Tip | Details |
|---|---|---|
| 24 | Use system prompt versioning | Store prompts in git and track changes like code — prompts are code |
| 25 | Test with adversarial inputs | Before deploying, test with prompt injection attempts and edge cases |
| 26 | Monitor token usage per user | Track consumption with APIM policies or custom telemetry |
| 27 | Use `response_format: json_object` | For structured outputs, force JSON mode to avoid parsing failures |
| 28 | Combine RAG + fine-tuning | RAG for facts/knowledge, fine-tuning for style/format — they complement |
| 29 | Keep AI Search index fresh | Schedule indexer runs or use change detection for near-real-time updates |
| 30 | Export playground configs | Use "View code" in the playground to export your settings to Python/curl |
| Directory | Purpose | Key Scripts |
|---|---|---|
| `scripts/python/foundry-basics/` | Hub & project management | Create hub, list projects, manage connections |
| `scripts/python/models/` | Model catalog & inference | Deploy models, run inference, compare outputs |
| `scripts/python/rag/` | RAG pipeline | Index documents, embed, search, generate |
| `scripts/python/agents/` | Agent Service | Create agents, manage threads, use tools |
| `scripts/python/fine-tuning/` | Fine-tuning | Upload data, create jobs, monitor training |
| `scripts/python/evaluation/` | Evaluation pipeline | Run built-in & custom evaluators |
| `scripts/python/deployment/` | Deployment automation | Deploy endpoints, traffic splitting |
| Directory | Purpose | Key Scripts |
|---|---|---|
| `scripts/powershell/` | Azure resource management | Provision, configure, and clean up Azure resources |
| Directory | Purpose | Key Scripts |
|---|---|---|
| `scripts/rest-api/` | Raw HTTP calls | Direct API calls using curl/HTTP files for every operation |
| Directory | Purpose | Key Files |
|---|---|---|
| `infra/` | Bicep templates | Hub, project, OpenAI, AI Search, Storage, Key Vault, networking |
| Extension | Publisher | Purpose |
|---|---|---|
| Azure AI Foundry | Microsoft | Manage hubs, projects, and deployments from VS Code |
| Azure Account | Microsoft | Azure authentication and subscription management |
| Azure Resources | Microsoft | Browse and manage Azure resources |
| Bicep | Microsoft | Syntax highlighting and IntelliSense for Bicep files |
| Python | Microsoft | Python language support |
| Jupyter | Microsoft | Interactive notebooks for experimentation |
| REST Client | Huachao Mao | Send HTTP requests directly from .http files |
| GitHub Copilot | GitHub | AI-powered code completion |
| Prompt Flow | Microsoft | Visual prompt flow editor |
| Package | Purpose | Install |
|---|---|---|
| `azure-ai-projects` | AI Foundry project management | `pip install azure-ai-projects` |
| `azure-ai-inference` | Model inference (chat, completions, embeddings) | `pip install azure-ai-inference` |
| `azure-ai-evaluation` | Evaluation framework | `pip install azure-ai-evaluation` |
| `azure-ai-contentsafety` | Content Safety API | `pip install azure-ai-contentsafety` |
| `azure-search-documents` | Azure AI Search client | `pip install azure-search-documents` |
| `azure-identity` | Authentication (`DefaultAzureCredential`) | `pip install azure-identity` |
| `openai` | OpenAI Python SDK (works with Azure) | `pip install openai` |
| `semantic-kernel` | Semantic Kernel SDK | `pip install semantic-kernel` |
| `langchain-openai` | LangChain + Azure OpenAI | `pip install langchain-openai` |
| `promptflow` | Prompt Flow SDK | `pip install promptflow` |
| `tiktoken` | Token counting for OpenAI models | `pip install tiktoken` |
| `python-dotenv` | Environment variable management | `pip install python-dotenv` |
| Tool | Purpose | Install |
|---|---|---|
| `az` (Azure CLI) | Azure resource management | Install Azure CLI |
| `az ml` (ML Extension) | Azure ML CLI commands | `az extension add -n ml` |
| `az ai` (AI Extension) | Azure AI CLI commands | `az extension add -n ai` |
| `pf` (Prompt Flow CLI) | Prompt Flow from the command line | `pip install promptflow-tools` |
| `bicep` | Infrastructure as Code | `az bicep install` |
| Term | Definition |
|---|---|
| Azure AI Foundry | Microsoft's unified platform for building, evaluating, and deploying generative AI applications. Formerly known as Azure AI Studio. |
| Hub | A top-level Azure resource that provides shared infrastructure (networking, identity, connections) for one or more AI projects. |
| Project | An isolated workspace within a hub where developers build, test, and deploy AI solutions. |
| Model Catalog | A curated gallery of 1,700+ foundation models from OpenAI, Meta, Mistral, Microsoft, and others available for deployment. |
| MaaS (Models as a Service) | Serverless model deployment — no infrastructure to manage, pay-per-token billing. Also called "Serverless API." |
| MaaP (Models as a Platform) | Managed compute model deployment — dedicated VMs, full control over the hosting environment. |
| PTU (Provisioned Throughput Unit) | Reserved capacity for Azure OpenAI models that guarantees a specific throughput level at committed pricing. |
| RAG (Retrieval-Augmented Generation) | A pattern that retrieves relevant documents and injects them into the LLM prompt to ground responses in your data. |
| Grounding | The practice of anchoring LLM responses in factual, source-backed information to reduce hallucinations. |
| Prompt Flow | A visual tool for building, testing, and deploying LLM-based workflows as directed acyclic graphs (DAGs). |
| Semantic Kernel | Microsoft's open-source SDK for building AI agents and integrating LLMs into applications using plugins and planners. |
| Agent | An AI system that can autonomously use tools, execute code, search files, and maintain conversation threads. |
| Thread | A persistent conversation session in the Agent Service that maintains message history across interactions. |
| Content Safety | Azure service that detects and filters harmful content (hate, violence, sexual, self-harm) in text and images. |
| Prompt Shield | A Content Safety feature that detects prompt injection attacks in both user inputs and grounding documents. |
| Groundedness Detection | A Content Safety feature that identifies when LLM outputs contain information not supported by the source context. |
| Fine-Tuning | The process of further training a pre-trained model on your own dataset to specialize its behavior for a specific task. |
| Distillation | Training a smaller, cheaper model to replicate the behavior of a larger model using the larger model's outputs as training data. |
| Evaluator | A metric or scoring function used to measure the quality, safety, or performance of an AI application's outputs. |
| LLM-as-Judge | Using a powerful LLM (e.g., GPT-4o) to evaluate the quality of another model's outputs — a form of automated evaluation. |
| Blue-Green Deployment | A deployment strategy that runs two identical environments (blue = current, green = new), switching traffic with zero downtime. |
| Token | The basic unit of text processing in LLMs. Roughly 4 characters or ¾ of a word in English. |
| Embedding | A numerical vector representation of text used for similarity search. Models like text-embedding-3-large produce 3,072-dimensional vectors. |
| Vector Search | A search method that finds semantically similar content by comparing embedding vectors using algorithms like HNSW. |
| Hybrid Search | Combining traditional keyword search with vector search and optional semantic reranking for the best retrieval quality. |
| Chunking | Splitting large documents into smaller segments for embedding and indexing in a vector store. |
| APIM AI Gateway | Using Azure API Management as a gateway for AI endpoints — adds rate limiting, caching, load balancing, and observability. |
| DefaultAzureCredential | The recommended authentication class from azure-identity that tries multiple auth methods (Managed Identity, CLI, etc.) in order. |
| Managed Identity | An Azure AD identity automatically managed by Azure, eliminating the need for storing credentials in code. |
| Serverless API | A deployment option where the model is hosted by Microsoft — no infrastructure to manage, billed per token consumed. |
| Online Endpoint | A REST API endpoint that serves real-time model inference requests with auto-scaling and traffic management. |
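Several glossary terms — token, chunking, embedding — come together in the indexing step of RAG. A minimal chunker sketch using the rough 4-characters-per-token heuristic from the table above; production pipelines chunk by real token counts (e.g. with `tiktoken`) and often split on sentence or paragraph boundaries:

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token in English."""
    return max(1, len(text) // 4)

def chunk(text: str, max_tokens: int = 512, overlap_tokens: int = 64) -> list[str]:
    """Split text into ~max_tokens pieces, with overlap between neighbours
    so that facts straddling a boundary appear in both chunks."""
    max_chars = max_tokens * 4
    step = (max_tokens - overlap_tokens) * 4
    return [text[i:i + max_chars] for i in range(0, len(text), step)]

doc = "word " * 1000  # ~5,000 characters ≈ 1,250 tokens
pieces = chunk(doc, max_tokens=512, overlap_tokens=64)
print(len(pieces), approx_tokens(pieces[0]))  # → 3 512
```

Each piece would then be embedded and stored in the vector index; the 512-token default matches the starting point suggested in the performance tips above.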
Built with ❤️ by Abe Abraham — learning Azure AI Foundry one module at a time.
Last updated: 2025