---
title: How to deploy Phi-3 family of small language models with Azure Machine Learning
titleSuffix: Azure Machine Learning
description: Learn how to deploy the Phi-3 family of small language models with Azure Machine Learning.
manager: scottpolly
ms.service: machine-learning
ms.subservice: inferencing
ms.topic: how-to
ms.date: 07/01/2024
ms.author: mopeakande
author: msakande
ms.reviewer: kritifaujdar
---
In this article, you learn about the Phi-3 family of small language models (SLMs). You also learn to use Azure Machine Learning studio to deploy models from this family as serverless APIs with pay-as-you-go token-based billing.
The Phi-3 family of SLMs is a collection of instruction-tuned generative text models. Phi-3 models are the most capable and cost-effective small language models available, outperforming models of the same size and the next size up across a variety of language, reasoning, coding, and math benchmarks.
Phi-3 Mini is a 3.8B-parameter, lightweight, state-of-the-art open model. Phi-3-Mini was trained with Phi-3 datasets that include both synthetic data and filtered, publicly available website data, with a focus on high-quality and reasoning-dense properties.
The model belongs to the Phi-3 model family, and the Mini version comes in two variants, 4K and 128K, which denote the context length (in tokens) that each model variant can support.
- Phi-3-mini-4k-Instruct
- Phi-3-mini-128k-Instruct
The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks that test common sense, language understanding, math, code, long context, and logical reasoning, Phi-3-Mini-4K-Instruct and Phi-3-Mini-128K-Instruct showcased robust, state-of-the-art performance among models with fewer than 13 billion parameters.
Phi-3 Small is a 7B-parameter, lightweight, state-of-the-art open model. Phi-3-Small was trained with Phi-3 datasets that include both synthetic data and filtered, publicly available website data, with a focus on high-quality and reasoning-dense properties.
The model belongs to the Phi-3 model family, and the Small version comes in two variants, 8K and 128K, which denote the context length (in tokens) that each model variant can support.
- Phi-3-small-8k-Instruct
- Phi-3-small-128k-Instruct
The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks that test common sense, language understanding, math, code, long context, and logical reasoning, Phi-3-Small-8k-Instruct and Phi-3-Small-128k-Instruct showcased robust, state-of-the-art performance among models with fewer than 13 billion parameters.
Phi-3 Medium is a 14B-parameter, lightweight, state-of-the-art open model. Phi-3-Medium was trained with Phi-3 datasets that include both synthetic data and filtered, publicly available website data, with a focus on high-quality and reasoning-dense properties.
The model belongs to the Phi-3 model family, and the Medium version comes in two variants, 4K and 128K, which denote the context length (in tokens) that each model variant can support.
- Phi-3-medium-4k-Instruct
- Phi-3-medium-128k-Instruct
The model underwent a rigorous enhancement process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures. When assessed against benchmarks that test common sense, language understanding, math, code, long context, and logical reasoning, Phi-3-Medium-4k-Instruct and Phi-3-Medium-128k-Instruct showcased robust, state-of-the-art performance among models with fewer than 13 billion parameters.
[!INCLUDE machine-learning-preview-generic-disclaimer]
Certain models in the model catalog can be deployed as a serverless API with pay-as-you-go billing. This kind of deployment provides a way to consume models as an API without hosting them on your subscription, while keeping the enterprise security and compliance that organizations need. This deployment option doesn't require quota from your subscription.
- An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a paid Azure account to begin.
- An Azure Machine Learning workspace. If you don't have a workspace, use the steps in the Quickstart: Create workspace resources article to create one. The serverless API model deployment offering for Phi-3 is only available with workspaces created in these regions:
- East US 2
- Sweden Central
For a list of regions that are available for each of the models supporting serverless API endpoint deployments, see Region availability for models in serverless API endpoints.
- Azure role-based access control (Azure RBAC) is used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the Azure AI Developer role on the resource group. For more information on permissions, see Manage access to an Azure Machine Learning workspace.
To create a deployment:
1. Select the workspace in which you want to deploy your models. To use the serverless API model deployment offering, your workspace must belong to one of the regions listed in the prerequisites section.
2. Choose the model you want to deploy, for example Phi-3-medium-128k-Instruct, from the model catalog.
3. On the model's overview page in the model catalog, select Deploy and then Serverless API with Azure AI Content Safety.

   Alternatively, you can initiate deployment by going to your workspace and selecting Endpoints > Serverless endpoints > Create. Then, you can select a model.
4. In the deployment wizard, select the Pricing and terms tab to learn about pricing for the selected model.
5. Give the deployment a name. This name becomes part of the deployment API URL, which must be unique in each Azure region.
6. Select Deploy. Wait until the deployment is ready and you're redirected to the Deployments page. This step requires that your account has the Azure AI Developer role permissions on the resource group, as listed in the prerequisites.
7. Take note of the Target URI and the secret Key, which you can use to call the deployment and generate completions. For more information on using the APIs, see Reference: Chat Completions.
8. Select the Test tab to start interacting with the model.
9. You can always find the endpoint's details, URI, and access keys by navigating to Workspace > Endpoints > Serverless endpoints.
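The preceding steps use the studio UI. As an alternative, the following is a minimal sketch of the same deployment using the azure-ai-ml Python SDK; it assumes SDK version 1.13 or later, the subscription, resource group, workspace, and endpoint names are placeholders you'd replace with your own, and you should check the model catalog for the exact model ID to use.

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ServerlessEndpoint
from azure.identity import DefaultAzureCredential

# Connect to the workspace (placeholder identifiers).
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Assumed ID format for a model in the azureml system registry.
model_id = "azureml://registries/azureml/models/Phi-3-medium-128k-instruct/labels/latest"

# The endpoint name becomes part of the API URL and must be unique in the region.
endpoint = ml_client.serverless_endpoints.begin_create_or_update(
    ServerlessEndpoint(name="phi-3-medium-128k", model_id=model_id)
).result()

# Equivalent to the Target URI and Key shown on the Deployments page.
keys = ml_client.serverless_endpoints.get_keys(endpoint.name)
print(endpoint.scoring_uri)
print(keys.primary_key)
```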
Models in the Phi-3 family deployed as serverless APIs can be consumed using the chat completions API. To consume the model:
- In the workspace, select Endpoints > Serverless endpoints.
- Find and select the deployment you created.
- Copy the Target URI and the Key values.
- Make an API request to the `/v1/chat/completions` API, using `<target_url>/v1/chat/completions`. For more information on using the APIs, see Reference: Chat Completions.
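For illustration, here's a sketch of such a request using Python's requests library. The endpoint URL and key are placeholders for the Target URI and Key values you copied, and the payload shape follows the chat completions schema; adjust parameters such as max_tokens and temperature for your scenario.

```python
import requests

# Placeholders for the Target URI and Key from the Serverless endpoints page.
endpoint_url = "https://<your-endpoint>.<region>.models.ai.azure.com"
api_key = "<your-key>"

response = requests.post(
    f"{endpoint_url}/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    },
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What makes small language models cost-effective?"},
        ],
        "max_tokens": 256,
        "temperature": 0.7,
    },
    timeout=60,
)
response.raise_for_status()

# The response follows the familiar chat completions shape.
print(response.json()["choices"][0]["message"]["content"])
```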
You can find the pricing information on the Pricing and terms tab of the deployment wizard when deploying the model.
Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per workspace. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
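Because these limits apply per deployment, client code should be prepared for throttling under load. The helper below is a hypothetical sketch, assuming the endpoint signals throttling with HTTP 429 and may include a Retry-After header; the function name and backoff policy are illustrative, not part of the service contract.

```python
import time
import requests

def post_with_retry(url, headers, payload, max_retries=5):
    """POST to a serverless endpoint, backing off when the rate limit is hit."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload, timeout=60)
        if response.status_code != 429:  # 429 = assumed throttling signal
            response.raise_for_status()
            return response.json()
        # Honor Retry-After if the service provides it; otherwise back off exponentially.
        delay = float(response.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay)
    raise RuntimeError("Request still throttled after retries")
```

For example, `post_with_retry(f"{endpoint_url}/v1/chat/completions", headers, payload)` would wrap the request from the earlier consumption sketch.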