update aoai

30DaysOf · Apr 8, 2024 · 250c16b · 250c16b
1 parent 989d04d
commit 250c16b
Show file tree

Hide file tree

Showing 2 changed files with 99 additions and 5 deletions.
diff --git a/docs/src/content/docs/400/400-00-aoia-intro.ipynb b/docs/src/content/docs/400/400-00-aoia-intro.ipynb
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# 400.0 Introduction To Azure Open AI\n",
+    "# 400.0 Introduction To Azure Open AI (AOAI)\n",
     "\n",
     "This notebook will introduce [Azure Open AI](https://learn.microsoft.com/en-us/azure/ai-services/openai/) core concepts and provide a basic walkthrough of code-first approaches to using the service from Python. Azure OpenAI is one of the core [Azure AI services](https://learn.microsoft.com/en-us/azure/ai-services/what-are-ai-services) provided by the broader [Azure AI platform](https://ai.azure.com) which focuses on [Azure AI Studio](https://learn.microsoft.com/en-us/azure/ai-studio/what-is-ai-studio?tabs=home) as the unified end-to-end development solution for generative AI apps.\n"
    ]
@@ -17,9 +17,37 @@
     "\n",
     "## 1. Core Concepts\n",
     "\n",
-    "1. **Large Language Model** (LLM) - A large language model is an AI model trained on large datasets of text data from the internet, allowing it to perform natual language processing tasks like question-answering, summarization, translation and more. They are currently used to drive **generative AI ** applications.\n",
-    "1. **Generative AI Applications** - These are AI apps that use LLMs to _generate_ human-like content including text, code, images, and audio. Their ability totake natual language input (\"prompts\") makes them a powerful interface for a wide range of applications from chatbots to code completion.\n",
-    "These are applications that use large language models to generate human-like text, code, images, and more. They are used in a wide range of applications including chatbots, code completion, and more."
+    "### 1.1 Basic Terminology\n",
+    "\n",
+    "| Concept | Explainer |\n",
+    "| --- | --- |\n",
+    "|**Large Language Model** (LLM) | A large language model is an AI model trained on large datasets of text data from the internet, allowing it to perform natual language processing tasks like question-answering, summarization, translation and more. They are currently used to drive **generative AI** applications - and defined by [model size and parameter counts](https://medium.com/@greg.broadhead/a-brief-guide-to-llm-numbers-parameter-count-vs-training-size-894a81c9258) |\n",
+    "| **Small Language Model** (SLM) | A small language model is a factor or more smaller in size (GPT is 1.76T parameters, Mistral is 7B paramteris) and may use a different architecture (e.g., GPT is self-attention, Mistral is sliding-attention). SLM are typically domain-specialized (trained on curated dataset) with potentially faster inference, less bias in outcomes. |\n",
+    "|**Generative AI Applications** | These are AI apps that use LLMs to _generate_ human-like content including text, code, images, and audio. Their ability to take natual language input (\"prompts\") makes them a powerful user-friendly interface for a wide range of applications from chatbots to code completion.|\n",
+    "| **Transformer** (Architecture)| The [transformer](https://en.wikipedia.org/wiki/Transformer_(deep_learning_architecture)) is a deep-learning architecture based on a [_multi-head attention mechanism_](https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf) - see [\"The Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/) post and [Generative AI exists because of the transformer](https://ig.ft.com/generative-ai/) scrollytelling article for details. Core concepts are _tokenization_, _embeddings_, _encoders_, _decoders_, _weights_, _biases_, _parameters_ and _self-attention_.|\n",
+    "|**Generative Pre-Trained Transformer** (GPT) | Is the type of LLM driving _generative_ AI and popularized by the [GPT-n series](https://www.makeuseof.com/gpt-models-explained-and-compared/) of foundational models introduced by OpenAI (GPT-1 in 2018, GPT-4 in 2023) but also referring broadly to the many generative AI pre-trained models in use today |\n",
+    "|**Foundational Model** | Is a machine learning model trained on _broad datasets_ to create \"general purpose\" technologies that can support a wide range of application scenarios. Foundation models can be built for task categories like text generation (GPT-n), image generation (DALL-E), speech (Whisper) etc.|\n",
+    "| **Fine-Tuned Model**| Is the approach to _transfer learning_ from a pre-trained model to a specialized version for an application domain by re-training it with curated data for that domain. The tradeoffs are in cost, response quality and complexity for maintenance. The challenge is in _finding the right dataset_ for fine-tuning. |\n",
+    "| | |\n",
+    "\n",
+    "### 1.2 Azure OpenAI Concepts\n",
+    "\n",
+    "| Concept | Explainer |\n",
+    "| --- | --- |\n",
+    "| **Model Deployment** | Refers to hosting a pre-trained or fine-tuned model in the cloud, exposing an endpoiint (API) that can be integrated with applications for generative AI request/response interactions. For example: this can be a _completion API_ that takes a text input (prompt) and returns the generated output (text completion)for authorized clients. |\n",
+    "| [**Quota**](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/quota?tabs=rest) | Pre-trained Models have associated costs (priced in tokens-per-minute) and usage rate limits (enforceable by model, by region). The Azure OpenAI \"Quota\" lets you assign rate limits to your deployments up to a global limit (your quota). New deployments can't be made for a particular region/model if that quota is depleted. Your options then are to choose a different region, choose a different model, or request a quota increase.,  |\n",
+    "|[**Dynamic Quota**](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/dynamic-quota) | Dynamic quota is an Azure OpenAI feature that enables a standard (pay-as-you-go) deployment to opportunistically take advantage of more quota when extra capacity is available. It only provides a _temporary increase_ and is not predictable or guaranteed so factor that into usage.|\n",
+    "| [**Prompt Engineering**](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/prompt-engineering)| OpenAI GPT models are prompt-based making their responses sensitive to the design (structure and content) of the prompt. Understand prompt components (instruction, completion, context), primary content (examples, cues), supporting content (context, system), efficiency (tokens) and best practices (be clear & descriptive, give it time to think, have defined outs, templates for consistency) and process (iterate, evaluate)  |\n",
+    "| [**Function Calling**](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/function-calling?tabs=python) | Function calls help [connect LLM to external tools](https://platform.openai.com/docs/guides/function-calling) to enhance the default completion workflow. The LLM does **not** call the function - but it can review the definition and (a) determine if a function is relevant to workflow and (b) generate the function request object from the prompt for you. You can now call that function, the _add returned response as a new message to conversation_ and allow LLM to complete response to user.  |\n",
+    "|[**Embeddings**](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/understand-embeddings)| An embedding is a special format of data representation that machine learning models and algorithms can use easily e.g., to power vector search. AOAI uses _cosine similarity_ to compute similarity between search query (prompt) and target (documents) - and has different embedding models good for specific tasks (similarity, text search, code search). [This tutorial](https://learn.microsoft.com/en-us/azure/ai-services/openai/tutorials/embeddings?tabs=python-new%2Ccommand-line&pivots=programming-language-python) shows AOAI embeddings usage for document search  |\n",
+    "| [**Retrieval Augmented Generation** (RAG)](https://learn.microsoft.com/en-us/azure/machine-learning/concept-retrieval-augmented-generation?view=azureml-api-2&preserve-view=true) | Is a technique that can improve respponse quality by working _with your own data_ to boost response relevance. The pattern relies on sourcing and chunking data, converting text into embeddings (vectors) and creating search indexes for your data. Then orchestrating data flow from input (prompt) to completion (response) with efficient data retrieval & augmentation. |\n",
+    "| [**Fine Tuning**](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/fine-tuning-considerations) | This \"retrains an existing Large Language Model using example data, resulting in a new \"custom\" Large Language Model that has been optimized using the provided examples\". Fine tuning can improve quality and reduce quota usage - but adds complexity and costs in compute and data processing. [Compare options](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/customizing-llms) and evaluate usage carefully before you commit.|\n",
+    "| [**Content Filtering**](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/content-filter?tabs=warning%2Cpython-new) | This adds responsible AI practices into your deployment workflow (production) by gating your input (prompt) and output (completion) with customizable content filters that can _detect and act upon_ key categories ofharmful content (e.g., hate, sexual, violence, self-harm) in multiple languages. |\n",
+    "| [**Assistants**](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/assistants) | Think of \"Assistants\" as autonomous agents that can perform complex tasks and automate multi-task workflows for you with _persistent automatically managed threads_. It abstracts away issues like conversation state management and token window optimization in multi-turn conversations with the stateless completion API. And it can access powerful tools like Code Interpreter and Function Calling (code-first) or be developed in the playground (no code) | \n",
+    "| [Assistants + Code Interpreter](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/code-interpreter?tabs=python)|Code Interpreter allows the Assistants API to write and run Python code in a sandboxed execution environment. With Code Interpreter enabled, your Assistant can run code iteratively to solve more challenging code, math, and data analysis problems. |\n",
+    "| [Assistants + Function Calling ](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/assistant-functions?tabs=python)| The Assistants API supports function calling, which allows you to describe the structure of functions to an Assistant and then return the functions that need to be called along with their arguments.|\n",
+    "| [**Red Teaming**](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/red-teaming)| This is a concept related to _safety_ considerations in responsible AI. Red teaming is about using _systematic adversarial attacks_ on your application to test for security vulnerabilities. The Azure AI platform provides tools and guidance to help [operationalize Responsible AI](https://dev.to/azure/new-azure-ai-tools-to-help-operationalize-responsible-ai-for-generative-ai-apps-11ig) in your workflows |\n",
+    "\n"
    ]
   },
   {

diff --git a/docs/src/content/docs/400/400-09-oai-fine-tuning.ipynb b/docs/src/content/docs/400/400-09-oai-fine-tuning.ipynb
@@ -6,7 +6,7 @@
    "source": [
     "# 400.9 | Fine Tuning LLM on OpenAI\n",
     "\n",
-    "Notes and exercises based on the [Fine-tuning](https://platform.openai.com/docs/guides/fine-tuning) guidance from Open AI documentation. Please reference that link directly for the latest updates. My goal here is to create a sandbox where I can explore these ideas and build my own intuition interactively."
+    "Notes and exercises based on the [Fine-tuning](https://platform.openai.com/docs/guides/fine-tuning) guidance from Open AI documentation. Please reference that link directly for the latest updates. My goal here is to create a sandbox where I can explore these ideas and build my own intuition interactively. We'll start with the [How to fine-tune chat models](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_finetune_chat_models.ipynb) notebook to get hands-on experience with the process"
    ]
   },
   {
@@ -19,6 +19,72 @@
     "\n",
     "Before you begin working on exercises in this section, make sure you are setup to work with the right LLM Providers by running the relevant code cells from the [LLM Setup](./400-00-llm-setup.ipynb) notebook."
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "\n",
+    "### 0. OpenAI Guidance"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "\n",
+    "### 1. Tutorial Setup"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "\n",
+    "### 2. Data Preparation"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "\n",
+    "### 3. Fine Tuning"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "\n",
+    "### 4. Inference"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---\n",
+    "\n",
+    "### 5. Conclusion"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "---"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
   }
  ],
  "metadata": {