
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning">
</div>



# LAB - Building a Multi-stage AI System

In this lab, you will construct a multi-stage reasoning system using Databricks' features and LangChain.

You will start by building the first chain, which searches a dataset of product descriptions from Etsy. Next, you will build the second chain, which generates an image for the proposed product. Finally, you will integrate these chains into a complete multi-stage AI system.
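Before you build the real chains, it can help to see the overall shape of a multi-stage system in plain Python. The sketch below is illustrative only. It assumes each stage is a function that reads a shared dictionary and returns new keys. `compose`, `search_stage`, and `image_prompt_stage` are hypothetical names, and the two-item catalog is a made-up stand-in for the Etsy product descriptions. In LangChain, piping runnables together with `|` also feeds one component's output into the next, but the chains you build in this lab will look different.

```python
from typing import Callable

# Hypothetical stage type: each stage reads a dict and returns new keys to add.
Stage = Callable[[dict], dict]


def compose(*stages: Stage) -> Stage:
    """Run stages in order, feeding each one the accumulated state."""
    def pipeline(inputs: dict) -> dict:
        state = dict(inputs)
        for stage in stages:
            # Merge each stage's output so later stages see earlier results.
            state.update(stage(state))
        return state
    return pipeline


def search_stage(state: dict) -> dict:
    # Hypothetical stand-in for the vector store search chain (Task 2).
    catalog = {
        "mug": "Hand-thrown ceramic mug with a speckled glaze",
        "scarf": "Hand-knit wool scarf in forest green",
    }
    query = state["query"].lower()
    hits = [desc for key, desc in catalog.items() if key in query]
    return {"product_description": hits[0] if hits else "No match found"}


def image_prompt_stage(state: dict) -> dict:
    # Hypothetical stand-in for the image chain (Task 3); it only builds a
    # text prompt and does not call an image model.
    return {"image_prompt": f"Product photo of: {state['product_description']}"}


multi_stage = compose(search_stage, image_prompt_stage)
result = multi_stage({"query": "I want a mug"})
print(result["image_prompt"])
```

Because the intermediate `product_description` stays in the state, you can see exactly what the first stage passed to the second. This is useful when debugging a multi-stage system.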


**Lab Outline:**

In this lab, you will complete the following tasks:

* **Task 1:** Create a Vector Store

* **Task 2:** Build the First Chain (Vector Store Search)

* **Task 3:** Build the Second Chain (Product Image)

* **Task 4:** Integrate Chains into a Multi-chain System
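Tasks 1 and 2 depend on similarity search: documents are turned into vectors, and a query returns the closest ones. The following rough sketch uses a toy bag-of-words "embedding" in place of a real embedding model. `ToyVectorStore` and its helpers are hypothetical. The search method is named after the `similarity_search` method that LangChain vector stores expose, but this class is not that API. Databricks Vector Search uses learned embeddings and a managed index rather than word counts.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase bag-of-words term counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector index (Task 1)."""

    def __init__(self):
        self.rows = []  # list of (id, text, vector)

    def add(self, doc_id: int, text: str) -> None:
        self.rows.append((doc_id, text, embed(text)))

    def similarity_search(self, query: str, k: int = 2):
        # Rank every stored document by cosine similarity to the query (Task 2).
        q = embed(query)
        scored = [(cosine(q, vec), doc_id, text) for doc_id, text, vec in self.rows]
        scored.sort(key=lambda row: row[0], reverse=True)
        return [(doc_id, text) for _, doc_id, text in scored[:k]]


store = ToyVectorStore()
store.add(1, "personal access tokens for the REST API")
store.add(2, "audit log delivery to an S3 bucket")
store.add(3, "enable authentication to the Ideas Portal")
print(store.similarity_search("how do I revoke personal access tokens", k=1))
```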

**📝 Your task:** Complete the **`<FILL_IN>`** sections in the code blocks and follow the other steps as instructed.

## REQUIRED - SELECT CLASSIC COMPUTE
Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:
1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

2. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

   - Click **More** in the drop-down.
   
   - In the **Attach to an existing compute resource** window, use the first drop-down to select your unique cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

2. Find the triangle icon to the right of your compute cluster name and click it.

3. Wait a few minutes for the cluster to start.

4. Once the cluster is running, complete the steps above to select your cluster.

## Requirements

Please review the following requirements before starting the lab:

* To run this notebook, you need to use one of the following Databricks runtime(s): **15.4.x-cpu-ml-scala2.12**
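To confirm the runtime from inside the notebook, you could use a small check like the one below. It assumes the cluster sets the `DATABRICKS_RUNTIME_VERSION` environment variable to a string that begins with the major and minor version (for example, `15.4`). `parse_runtime` and `runtime_matches` are hypothetical helpers. They compare only major.minor and do not verify that the runtime is the CPU ML variant.

```python
import os


def parse_runtime(version: str) -> tuple:
    # "15.4.x-cpu-ml-scala2.12" -> (15, 4); anything after major.minor is ignored.
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def runtime_matches(version, required=(15, 4)) -> bool:
    """True when the runtime's major.minor equals `required` (ML flavor not checked)."""
    if not version:
        return False
    try:
        return parse_runtime(version) == required
    except ValueError:
        # Non-numeric or incomplete version strings are treated as a mismatch.
        return False


# Assumption: the cluster exposes its runtime in DATABRICKS_RUNTIME_VERSION.
current = os.environ.get("DATABRICKS_RUNTIME_VERSION")
print(current, runtime_matches(current))
```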


## Classroom Setup

Before starting the lab, install the required libraries and run the provided classroom setup script, which defines the configuration variables the lab needs. Execute the following cells:

In [0]:
%pip install -U -qq databricks-sdk databricks-vectorsearch langchain-databricks langchain==0.3.7 langchain-community==0.3.7

dbutils.library.restartPython()

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.
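The install cell pins `langchain` and `langchain-community` to 0.3.7. After Python restarts, you can optionally confirm that those pins took effect. This minimal sketch uses the standard library's `importlib.metadata`. `PINNED` and `check_pins` are hypothetical names, and the unpinned packages from the `%pip` line are deliberately left out.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical mapping of the versions pinned in the %pip cell above.
PINNED = {"langchain": "0.3.7", "langchain-community": "0.3.7"}


def check_pins(pins, get_version=version):
    """Return {package: installed_version_or_None} for every pin that does not match."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[package] = installed
    return mismatches


print(check_pins(PINNED) or "All pinned versions installed")
```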


In [0]:
%run ../Includes/Classroom-Setup-02LAB






The examples and models presented in this course are intended solely for demonstration and educational purposes. Please note that the models and prompt examples may sometimes contain offensive, inaccurate, biased, or harmful content.


**Other Conventions:**

Throughout this lab, we'll refer to the object `DA`. This object, provided by Databricks Academy, contains variables such as your username, catalog name, schema name, working directory, and dataset locations. Run the code block below to view these details:

In [0]:
print(f"Username:          {DA.username}")
print(f"Catalog Name:      {DA.catalog_name}")
print(f"Schema Name:       {DA.schema_name}")
print(f"Working Directory: {DA.paths.working_dir}")
print(f"Dataset Location:  {DA.paths.datasets}")

Username:          labuser11182879_1754974636@vocareum.com
Catalog Name:      dbacademy
Schema Name:       labuser11182879_1754974636
Working Directory: /Volumes/dbacademy/ops/labuser11182879_1754974636@vocareum_com
Dataset Location:  NestedNamespace (dais='/Volumes/dbacademy_dais/v01', docs='/Volumes/dbacademy_docs/v01')
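The catalog and schema names above are later combined into a three-level Unity Catalog name (`catalog.schema.table`). Plain string formatting works here because these names contain only letters, digits, and underscores. If a name contained characters such as hyphens, each part would need backtick quoting. This minimal sketch assumes Spark SQL's convention of escaping a backtick inside a quoted identifier by doubling it. `quote_identifier` and `full_table_name` are hypothetical helpers.

```python
def quote_identifier(name: str) -> str:
    # Hypothetical helper: wrap in backticks, doubling any embedded backtick.
    return "`" + name.replace("`", "``") + "`"


def full_table_name(catalog: str, schema: str, table: str) -> str:
    """Build a catalog.schema.table name with each part quoted."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))


print(full_table_name("dbacademy", "labuser11182879_1754974636", "docs"))
```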


## Load Dataset

Before you start building the AI chain, you need to load and prepare the dataset and save it as a Delta table.  
For this lab, we will use the **[Databricks Documentation Dataset](/marketplace/consumer/listings/03bbb5c0-983d-4523-833a-57e994d76b3b?o=1120757972560637)** available from the Databricks Marketplace.

This dataset contains documentation pages with associated `id`, `url`, and `content`.  
We will format the data to create a single unified `document` field combining the URL and content, which will then be used to build a Vector Store.
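Judging from the sample output further down, each `document` value has the form `## URL: <url> ## Content: <content>`. The classroom setup builds this table for you. The plain-Python sketch below only illustrates the transformation; `build_document` and the one-row `rows` list are hypothetical. In PySpark, a function such as `concat_ws` or `concat` could produce the same column.

```python
def build_document(url: str, content: str) -> str:
    # Hypothetical helper mirroring the layout seen in the sample output.
    return f"## URL: {url} ## Content: {content}"


# Hypothetical input row with separate url and content fields.
rows = [
    {"id": 22694,
     "url": "https://docs.databricks.com/en/admin/access-control/tokens.html",
     "content": "Monitor and manage personal access tokens"},
]

# Keep the id and replace url/content with the single combined field.
docs = [{"id": r["id"], "document": build_document(r["url"], r["content"])} for r in rows]
print(docs[0]["document"])
```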

The table will be created for you in the next code block.

In [0]:
# Build the formatted docs table in your schema (create_docs_table is provided by the classroom setup script)
vs_source_table_fullname = f"{DA.catalog_name}.{DA.schema_name}.docs"
create_docs_table(vs_source_table_fullname)
# Display the table; the notebook renders a preview of the rows
display(spark.sql(f"SELECT * FROM {vs_source_table_fullname}"))

Validation of table dbacademy_docs.v01.docs complete. No errors found.


id,document
22693,"## URL: https://docs.databricks.com/en/admin/access-control/auth-external.html ## Content: Enable authentication to external Databricks services Databricks administrators can enable users to authenticate directly to external Databricks services like the Ideas Portal and the Help Center using their Databricks workspace credentials. Requirements Requirements This feature requires the Premium plan or above. Ideas Portal Ideas Portal The Ideas Portal, powered by Aha!, lets you provide product feedback, vote up product ideas, and check on the progress of your favorite ideas. When you enable authentication to the Ideas Portal, all of your users will be able to access the Ideas Portal using delegated Databricks authentication: as long as they have an active Databricks workspace session, they will be signed into the Ideas Portal automatically. They can go to the workspace help menu (click the question mark in the upper right) and click Feedback, or they can use the Databricks SSO login option when they go to ideas.databricks.com. If they do not have an active Databricks workspace session, they will be prompted to log into Databricks and then taken to the Ideas Portal. For more information about the Ideas Portal, see Submit product feedback. Access to the Ideas Portal is enabled by default for all users in your workspace. To change access: Go to the settings page. Click the Advanced tab. Click the Authentication to Ideas Portal toggle. Click Confirm. Help Center Help Center If your organization has a Databricks Support contract, your authorized support contacts can use the Help Center, powered by Salesforce, to submit and monitor support cases. Users who are not support contacts can use the Help Center to search for help across multiple Databricks content sites without having to authenticate. 
When you enable authentication to the Help Center, users will be able to access the case submission and monitoring interface in the Help Center using delegated Databricks authentication, as long as their Databricks username for this workspace is registered in Salesforce as an authorized support contact. Delegated Databricks authentication signs such users in to the Help Center automatically if they have an active workspace session in their browser. They can go to the workspace help menu (click the question mark in the upper right) and click Support, or they can use the Databricks Sign-in option when they go to the Help Center. If they do not have an active Databricks workspace session when they attempt to sign in from the Help Center, they will be prompted to log into Databricks and then taken to the Help Center. For more information about the support process, see Support. Access to the Help Center is enabled by default for all users who meet the criteria described above. To change access: Go to the settings page. Click the Advanced tab. Click the Authentication to Help Center toggle. Click Confirm."
22694,"## URL: https://docs.databricks.com/en/admin/access-control/tokens.html ## Content: Monitor and manage personal access tokens To authenticate to the Databricks REST API, a user can create a personal access token and use it in their REST API request. This article explains how workspace admins can manage personal access tokens in their workspace. To create a personal access token, see Databricks personal access token authentication. To create a personal access token on behalf of a service principal, see Manage tokens for a service principal. Overview of personal access token management Overview of personal access token management Personal access tokens are enabled by default for all Databricks workspaces that were created in 2018 or later. When personal access tokens are enabled on a workspace, users with the CAN USE permission can generate personal access tokens to access Databricks REST APIs, and they can generate these tokens with any expiration date they like, including an indefinite lifetime. By default, no non-admin workspace users have the CAN USE permission, meaning that they cannot create or use personal access tokens. As a Databricks workspace admin, you can disable personal access tokens for a workspace, monitor and revoke tokens, control which non-admin users can create tokens and use tokens, and set a maximum lifetime for new tokens. Managing personal access tokens in your workspace requires the Premium plan or above. To create a personal access token, see Databricks personal access token authentication. Enable or disable personal access token authentication for the workspace Enable or disable personal access token authentication for the workspace Personal access token authentication is enabled by default for all Databricks workspaces that were created in 2018 or later. You can change this setting in the workspace settings page. 
When personal access tokens are disabled for a workspace, personal access tokens cannot be used to authenticate to Databricks and workspace users and service principals cannot create new tokens. No tokens are deleted when you disable personal access token authentication for a workspace. If tokens are re-enabled later, any non-expired tokens are available for use. If you want to disable token access for a subset of users, you can keep personal access token authentication enabled for the workspace and set fine-grained permissions for users and groups. See Control who can create and use tokens. Warning Partner Connect, partner integrations, and service principals require personal access tokens to be enabled on a workspace. To disable the ability to create and use personal access tokens for the workspace: Go to the settings page. Click the Advanced tab. Click the Personal Access Tokens toggle. Click Confirm. This change may take a few seconds to take effect. You can also use the Workspace configuration API to disable personal access tokens for the workspace. Control who can create and use tokens Control who can create and use tokens Workspace admins can set permissions on personal access tokens to control which users, service principals, and groups can create and use tokens. For details on how to configure personal access token permissions, see Manage access to Databricks automation. Set maximum lifetime of new tokens Set maximum lifetime of new tokens You can manage the maximum lifetime of new tokens in your workspace using the Databricks CLI. This limit applies only to new tokens. Set maxTokenLifetimeDays to the maximum token lifetime of new tokens in days, as an integer. If you set it to zero, new tokens are permitted to have no lifetime limit. For example: databricks workspace-conf set-status --json '{ ""maxTokenLifetimeDays"": ""90"" }' You can also use the Workspace configuration API to manage the maximum lifetime for new tokens in a workspace. 
Monitor and revoke tokens Monitor and revoke tokens This section describes how to use the Databricks CLI to manage existing tokens in the workspace. You can also use the Token Management API. Get tokens for the workspace To get the workspace’s tokens: databricks token-management list You can filter results by a user by using the flags created-by-id (to filter by the user ID) or created-by-username (to filter by the username). For example: databricks token-management list --created-by-username user@company.com Example response: ID Created By Comment token-id user@company.com dev Delete (revoke) a token To delete a token, replace TOKEN_ID with the id of the token to delete: databricks token-management delete TOKEN_ID"
22695,"## URL: https://docs.databricks.com/en/admin/account-settings-e2/audit-aws-call-api.html ## Content: Step 4: Call the log delivery API This article describes how to call the log delivery API. This is the last step in the audit log delivery configuration. To configure log delivery, you must call the log delivery API. Required values Required values In your API call, specify the following values that you copied in the previous steps: credentials_id: Your Databricks credential configuration ID, which represents your cross-account role credentials. storage_configuration_id: Your Databricks storage configuration ID, which represents your root S3 bucket. Also set the following fields: log_type: Set to AUDIT_LOGS. output_format: Set to JSON. delivery_path_prefix: (Optional) Set to the path prefix. This must match the path prefix that you used in your role policy. The delivery path is //workspaceId=/date=/auditlogs_.json. If you configure audit log delivery for the entire account, account-level audit events that are not associated with any single workspace are delivered to the workspaceId=0 partition. workspace_ids_filter: (Optional) To ensure delivery of account-level events, including Unity Catalog and Delta Sharing events, leave workspace_ids_filter empty. If you only want logs for select workspaces, set to an array of workspace IDs (each one is an int64). If you add specific workspace IDs in this field, you won’t receive account-level logs and or logs for workspaces created in the future. 
API call example API call example Here is an example call to the log delivery API: curl -X POST 'https://accounts.cloud.databricks.com/api/2.0/accounts//log-delivery' \ --header 'Authorization: Bearer $OAUTH_TOKEN' \ -d '{ ""log_delivery_configuration"": { ""log_type"": ""AUDIT_LOGS"", ""config_name"": ""audit log config"", ""output_format"": ""JSON"", ""credentials_id"": """", ""storage_configuration_id"": """", ""delivery_path_prefix"": ""auditlogs-data"", ""workspace_ids_filter"": [ 6383650456894062, 4102272838062927 ] } }' Example response: { ""log_delivery_configuration"": { ""config_id"": """", ""config_name"": ""audit log config"", ""log_type"": ""AUDIT_LOGS"", ""output_format"": ""JSON"", ""account_id"": """", ""credentials_id"": """", ""storage_configuration_id"": """", ""workspace_ids_filter"": [ 6383650456894062, 4102272838062927 ], ""delivery_path_prefix"": ""auditlogs-data"", ""status"": ""ENABLED"", ""creation_time"": 1591638409000, ""update_time"": 1593108904000, ""log_delivery_status"": { ""status"": ""CREATED"", ""message"": ""Log Delivery Configuration is successfully created. Status will be updated after the first delivery attempt."" } } } Note After initial setup or other log delivery configuration changes, expect a delay of up to one hour until changes take effect. After logging delivery begins, auditable events are typically logged within 15 minutes. Next steps Next steps Once you’ve configured your audit log delivery, learn more about the log schema and available logs by referencing the Audit log reference."
22696,"## URL: https://docs.databricks.com/en/admin/account-settings-e2/audit-aws-credentials.html ## Content: Step 2: Configure credentials for audit log delivery This article describes how to set up IAM services for audit log delivery. To use different credentials for different workspaces, repeat the procedures in this article for each workspace or group of workspaces. Note To use different S3 bucket names, you need to create separate IAM roles. Create the IAM role Create the IAM role Log into your AWS Console as a user with administrator privileges and go to the IAM service. Click the Roles tab in the sidebar. Click Create role. In Select type of trusted entity, click AWS service. Under Use Case, select EC2. Click the Next button. Click the Next button. In the Role name field, enter a role name. Click Create role. The list of roles displays. Create the inline policy Create the inline policy In the list of roles, click the role you created. Add an inline policy. On the Permissions tab, click Add permissions then Create inline policy. In the policy editor, click the JSON tab. Copy this access policy and modify it. Replace the following values in the policy with your own configuration values: : The bucket name of your AWS S3 bucket. : (Optional) The path to the delivery location in the S3 bucket. If unspecified, the logs are delivered to the root of the bucket. This path must match the delivery_path_prefix argument when you call the log delivery API. 
{ ""Version"":""2012-10-17"", ""Statement"":[ { ""Effect"":""Allow"", ""Action"":[ ""s3:GetBucketLocation"" ], ""Resource"":[ ""arn:aws:s3:::"" ] }, { ""Effect"":""Allow"", ""Action"":[ ""s3:PutObject"", ""s3:GetObject"", ""s3:DeleteObject"", ""s3:PutObjectAcl"", ""s3:AbortMultipartUpload"" ], ""Resource"":[ ""arn:aws:s3::://"", ""arn:aws:s3::://*"" ] }, { ""Effect"":""Allow"", ""Action"":[ ""s3:ListBucket"", ""s3:ListMultipartUploadParts"", ""s3:ListBucketMultipartUploads"" ], ""Resource"":""arn:aws:s3:::"", ""Condition"":{ ""StringLike"":{ ""s3:prefix"":[ """", ""/*"" ] } } } ] } You can customize the policy usage of the path prefix in the following ways: If you do not want to use the bucket path prefix, remove / (including the final slash) from the policy each time it appears. If you want log delivery configurations for different workspaces that share the S3 bucket but use different path prefixes, you can include multiple path prefixes. There are two separate parts of the policy that reference . For each case, duplicate the two lines that reference the path prefix. For example: { ""Resource"":[ ""arn:aws:s3:::/field-team/"", ""arn:aws:s3:::/field-team/*"", ""arn:aws:s3:::/finance-team/"", ""arn:aws:s3:::/finance-team/*"" ] } Click Review policy. In the Name field, enter a policy name. Click Create policy. If you use service control policies to deny certain actions at the AWS account level, ensure that sts:AssumeRole is whitelisted so Databricks can assume the cross-account role. Create the trust policy"
22697,"## URL: https://docs.databricks.com/en/admin/account-settings-e2/audit-aws-credentials.html ## Content: Create the trust policy On the role summary page, click the Trust Relationships tab. Paste this access policy into the editor, replacing with your Databricks account ID. The policy uses the Databricks AWS account ID 414351767826. If you are are using Databricks on AWS GovCloud use the Databricks account ID 044793339203. { ""Version"":""2012-10-17"", ""Statement"":[ { ""Effect"":""Allow"", ""Principal"":{ ""AWS"":""arn:aws:iam::414351767826:role/SaasUsageDeliveryRole-prod-IAMRole-3PLHICCRR1TK"" }, ""Action"":""sts:AssumeRole"", ""Condition"":{ ""StringEquals"":{ ""sts:ExternalId"":[ """" ] } } } ] } In the role summary, copy the Role ARN. You need this value to call the create credential configuration API in the next step. Call the create credential configuration API Call the create credential configuration API To finish settings up your credentials, call the Create credential configuration API. This request establishes cross-account trust and returns a reference ID you can use when creating a new workspace. Replace with your Databricks account ID. Set credentials_name to a name that is unique within your account. Set role_arn to the role ARN that you just created. The response body includes a credentials_id field. Copy this field so you can use it to create the log delivery configuration in Step 4. 
For example: curl -X POST -n \ 'https://accounts.cloud.databricks.com/api/2.0/accounts//credentials' \ -d '{ ""credentials_name"": ""databricks-credentials-v1"", ""aws_credentials"": { ""sts_role"": { ""role_arn"": ""arn:aws:iam:::role/my-company-example-role"" } } }' Example response: { ""credentials_id"": """", ""account_id"": """", ""aws_credentials"": { ""sts_role"": { ""role_arn"": ""arn:aws:iam:::role/my-company-example-role"", ""external_id"": """" } }, ""credentials_name"": ""databricks-credentials-v1"", ""creation_time"": 1579753556257 } Again, copy the credentials_id field from the response for later use. Next steps Next steps If you need to set up cross-account delivery (your S3 bucket is in a different AWS account than the IAM role used for log delivery), see Step 3: Configure cross-account support (Optional). If your S3 bucket is in the same AWS account as your IAM role used for log delivery, skip to the final step of calling the log delivery API. See Step 4: Call the log delivery API."
22698,"## URL: https://docs.databricks.com/en/admin/account-settings-e2/audit-aws-cross-account.html ## Content: Step 3: Configure cross-account support (Optional) This article describes how to set up cross-account audit log delivery. If your S3 bucket is in the same AWS account as your IAM role used for log delivery, skip this step. To deliver logs to an AWS account other than the one used for your Databricks workspace, you must add the S3 bucket policy provided in this step. This policy references IDs for the cross-account IAM role that you created in Step 2: Configure credentials for audit log delivery. In the AWS Console, go to the S3 service. Click the bucket name. Click the Permissions tab. Click the Bucket Policy button. Click the Edit button. Copy and modify this bucket policy. Replace with the S3 bucket name, with the role ID of your newly-created IAM role, and with the bucket path prefix you want. { ""Version"": ""2012-10-17"", ""Statement"": [ { ""Effect"": ""Allow"", ""Principal"": { ""AWS"": [""arn:aws:iam::""] }, ""Action"": ""s3:GetBucketLocation"", ""Resource"": ""arn:aws:s3:::"" }, { ""Effect"": ""Allow"", ""Principal"": { ""AWS"": ""arn:aws:iam::"" }, ""Action"": [ ""s3:PutObject"", ""s3:GetObject"", ""s3:DeleteObject"", ""s3:PutObjectAcl"", ""s3:AbortMultipartUpload"", ""s3:ListMultipartUploadParts"" ], ""Resource"": [ ""arn:aws:s3::://"", ""arn:aws:s3::://*"" ] }, { ""Effect"": ""Allow"", ""Principal"": { ""AWS"": ""arn:aws:iam::"" }, ""Action"": ""s3:ListBucket"", ""Resource"": ""arn:aws:s3:::"", ""Condition"": { ""StringLike"": { ""s3:prefix"": [ """", ""/*"" ] } } } ] } Customize path prefixes Customize path prefixes You can customize the policy use of the path prefix: If you do not want to use the bucket path prefix, remove / (including the final slash) from the policy each time it appears. 
If you want log delivery configurations for multiple workspaces that share the S3 bucket but use different path prefixes, you can include multiple path prefixes. There are two separate parts of the policy that reference . For each case, duplicate the two lines that reference the path prefix. For example: { ""Resource"":[ ""arn:aws:s3:::/field-team/"", ""arn:aws:s3:::/field-team/*"", ""arn:aws:s3:::/finance-team/"", ""arn:aws:s3:::/finance-team/*"" ] } Next steps Next steps Finally, you’ll call the log delivery API to finish setting up delivery. See Step 4: Call the log delivery API."
22699,"## URL: https://docs.databricks.com/en/admin/account-settings-e2/audit-aws-storage.html ## Content: Step 1: Configure audit log storage This article explains how to set up an AWS S3 storage bucket for low-latency delivery of audit logs. Create the S3 bucket Create the S3 bucket Log into your AWS Console as a user with administrator privileges and go to the S3 service. Click the Create bucket button. In Bucket name, enter a name for your bucket. For more bucket naming guidance, see the AWS bucket naming rules. Click Create bucket. Create a Databricks storage configuration record Create a Databricks storage configuration record Next, you need to create a Databricks storage configuration record that represents your new S3 bucket. Specify your S3 bucket by calling the create new storage configuration API. Pass the following values: storage_configuration_name: New unique storage configuration name. root_bucket_info: A JSON object that contains a bucket_name field that contains your S3 bucket name. For example: curl -X POST 'https://accounts.cloud.databricks.com/api/2.0/accounts//storage-configurations' \ --header 'Authorization: Bearer $OAUTH_TOKEN' \ -d '{ ""storage_configuration_name"": ""databricks-workspace-storageconf-v1"", ""root_bucket_info"": { ""bucket_name"": ""my-company-example-bucket"" } }' Response: { ""storage_configuration_id"": """", ""account_id"": """", ""root_bucket_info"": { ""bucket_name"": ""my-company-example-bucket"" }, ""storage_configuration_name"": ""databricks-workspace-storageconf-v1"", ""creation_time"": 1579754875555 } Copy the storage_configuration_id value returned in the response body. You’ll need it when you call the log delivery API. Next steps Next steps Next, configure an IAM role and create a credential in Databricks. See Step 2: Configure credentials for audit log delivery."
22700,"## URL: https://docs.databricks.com/en/admin/account-settings-e2/credentials.html ## Content: Create an IAM role for workspace deployment This article describes how to: Create and configure a cross-account IAM role for Databricks workspace deployment. This role gives Databricks limited access to your AWS account for the purposes of creating and managing compute and VPC resources. Use the Databricks account console to create a credential configuration that references the IAM role. Requirements Requirements You need to be a Databricks account admin. Automate IAM role creation Automate IAM role creation You can automate the IAM role creation by using one of the following automation options: The AWS Quick Start (CloudFormation) to deploy your workspace. This is the recommended workspace deployment method. The Databricks Terraform provider. See Create Databricks workspaces using Terraform. Manual IAM role creation Manual IAM role creation The following steps apply to a custom AWS workspace deployment. You only need to follow these steps if you are deploying a workspace using the Custom AWS configuration option. Step 1: Create a cross-account IAM role Step 2: Create an access policy Step 3: Create a credential configuration for the role in Databricks Step 1: Create a cross-account IAM role Step 1: Create a cross-account IAM role Get your Databricks account ID. See Locate your account ID. Log into your AWS Console as a user with administrator privileges and go to the IAM console. Click the Roles tab in the sidebar. Click Create role. In Select type of trusted entity, click the AWS account tile. Select the Another AWS account checkbox. In the Account ID field, enter the Databricks account ID 414351767826. This is not the Account ID you copied from the Databricks account console. If you are are using Databricks on AWS GovCloud use the Databricks account ID 044793339203. Select the Require external ID checkbox. 
In the External ID field, enter your Databricks account ID, which you copied from the Databricks account console. Click the Next button. In the Add Permissions page, click the Next button. You should now be on the Name, review, and create page. In the Role name field, enter a role name. Click Create role. The list of roles appears. Step 2: Create an access policy"
22701,"## URL: https://docs.databricks.com/en/admin/account-settings-e2/credentials.html ## Content: Step 2: Create an access policy The access policy you add to the role depends on your Amazon VPC (Virtual Private Cloud) deployment type. For information about how Databricks uses each permission, see IAM permissions for Databricks-managed VPCs. Use the policy instructions that describe your deployment: Option 1: Default. A single VPC that Databricks creates and configures in your AWS account. This is the default configuration. Option 2: Customer-managed VPC with default restrictions. Create your Databricks workspaces in your own VPC, using a feature known as customer-managed VPC. Option 3: Customer-managed VPC with custom restrictions. Create your Databricks workspaces in your own VPC with custom restrictions for account ID, VPC ID, AWS Region, and security group. Option 1: Default deployment policy In the Roles section of the IAM console, click the IAM role you created in Step 1. Click the Add permissions drop-down and select Create inline policy. In the policy editor, click the JSON tab. 


## Create a Vector Store

In this step, you will compute embeddings for the documentation dataset and store them in a Vector Search index using Databricks Vector Search.

**🚨IMPORTANT: Vector Search endpoints must be created before running the rest of this lab. They are already created for you in the Databricks Lab environment.**
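To make the indexing step concrete, here is a minimal, self-contained sketch of what a vector index does conceptually: embed every document once, then rank documents by cosine similarity to the embedded query and keep the top `k`. This is not how Databricks Vector Search is implemented; the bag-of-words `embed` function is a hypothetical stand-in for a real embedding model, and the three-document corpus is made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for the documentation table
DOCS = [
    "Create a Delta table with CREATE TABLE and the USING DELTA clause.",
    "Configure a cross-account IAM role for workspace deployment.",
    "Vector Search endpoints serve similarity queries over an index.",
]

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-count vector (not a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "Index" step: embed each document once and keep the vector next to the text
INDEX = [(doc, embed(doc)) for doc in DOCS]

def search(query: str, k: int = 3) -> list[str]:
    """Return the k documents whose vectors are most similar to the query vector."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(search("How do I create a Delta table?", k=1))
```

The retriever built later in this lab requests the top 3 matches (`k=3`) in the same spirit, but the similarity search itself runs inside the Vector Search endpoint.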


In [0]:
## Assign Vector Search endpoint by username
vs_endpoint_prefix = "vs_endpoint_"
vs_endpoint_name = vs_endpoint_prefix + str(get_fixed_integer(DA.unique_name("_")))
print(f"Assigned Vector Search endpoint name: {vs_endpoint_name}.")

Assigned Vector Search endpoint name: vs_endpoint_4.
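The helper `get_fixed_integer` is defined outside this section, so its exact logic is not shown here. A common way to give each user a stable endpoint assignment is to hash the username and take the result modulo the number of endpoints. The sketch below illustrates that idea with a hypothetical `fixed_integer` function and an assumed pool of five endpoints; it is not the lab's implementation and will not necessarily reproduce the endpoint number printed above.

```python
import hashlib

def fixed_integer(name: str, n_endpoints: int = 5) -> int:
    """Hypothetical: map a name to a stable endpoint number in 1..n_endpoints."""
    # SHA-256 is deterministic across runs, unlike Python's per-process salted hash() for str
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_endpoints + 1

endpoint = "vs_endpoint_" + str(fixed_integer("labuser_example"))
print(endpoint)
```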


In [0]:
## Index table name
vs_index_table_fullname = f"{DA.catalog_name}.{DA.schema_name}.doc_embeddings"

## Store embeddings in the vector store
## NOTE: 'document' is the column used to compute embeddings
create_vs_index(vs_endpoint_name, vs_index_table_fullname, vs_source_table_fullname, "document")

[NOTICE] Using a notebook authentication token. Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True to VectorSearchClient().
Endpoint named vs_endpoint_4 is ready.


## Task 1: Build the First Chain (Vector Store Search)

In this task, you will create the first chain, which retrieves relevant documentation snippets from the Vector Store and uses them to answer the user's question.

**Instructions:**
   - Configure the components of the first chain: the chat model, the prompt template, and the Vector Store retriever.
   - Use the indexed dataset as the retrieval source: the retriever returns the top matching snippets, which fill the prompt's `{context}` variable.
   - Combine retrieval and generation so that the chain answers each question from the retrieved snippets.
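Conceptually, the first chain retrieves a few snippets, joins them into the prompt's `{context}` variable, and passes the filled prompt to the model. `create_retrieval_chain` returns a dictionary that includes `input`, `context`, and `answer` keys, which is why Task 3 later reads `x["answer"]`. The plain-Python sketch below mimics that flow with a hypothetical retriever and model passed in as functions; it approximates, rather than reproduces, LangChain's internals.

```python
from typing import Callable, Dict, List

# Trimmed-down version of the lab's prompt; {context} and {input} are filled in below
PROMPT = """Use the following retrieved document snippets as context:
<context>
{context}
</context>
Question: {input}"""

def retrieval_chain(
    question: str,
    retriever: Callable[[str], List[str]],
    llm: Callable[[str], str],
    separator: str = "\n\n",
) -> Dict[str, object]:
    """Retrieve snippets, 'stuff' them into one prompt, and ask the model."""
    docs = retriever(question)          # e.g. the top-k matches from the index
    context = separator.join(docs)      # stuff all snippets into a single string
    prompt = PROMPT.format(context=context, input=question)
    return {"input": question, "context": docs, "answer": llm(prompt)}

# Hypothetical stand-ins: a fixed retriever and an "LLM" that echoes its prompt
result = retrieval_chain(
    "How do I create a Delta table?",
    retriever=lambda q: ["Snippet about CREATE TABLE.", "Snippet about Delta Lake."],
    llm=lambda prompt: prompt,
)
print(result["answer"])
```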


In [0]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import PromptTemplate
from langchain_databricks import ChatDatabricks, DatabricksVectorSearch
from langchain_core.output_parsers import StrOutputParser

## Define the Databricks Chat model: llama-3
llm_llama = ChatDatabricks(endpoint="databricks-meta-llama-3-3-70b-instruct", max_tokens=1000)

## Define the prompt template that answers the question using the retrieved context
prompt_template_vs = PromptTemplate.from_template(
    """
    You are a documentation assistant. Based on the following context from a technical document, generate a concise summary or relevant content snippet for answering the user’s question.

    Write a response that is aligned with the tone and format of technical documentation and helps the user understand or resolve their query.

    Maximum 300 words.

    Use the following retrieved document snippets as context:

    <context>
    {context}
    </context>

    Question: {input}
    """
)

## Build the Vector Store retriever
## `persist_dir` is unused, but MLflow calls loader_fn(persist_dir) when it loads the logged model
def get_retriever(persist_dir=None):
    vectorstore = DatabricksVectorSearch(
        endpoint=vs_endpoint_name,
        index_name=vs_index_table_fullname,
    )
    # Return the 3 most similar documents for each query
    return vectorstore.as_retriever(search_kwargs={"k": 3})

## Construct the chain for question-answering
question_answer_chain = create_stuff_documents_chain(llm_llama, prompt_template_vs)
chain1 = create_retrieval_chain(get_retriever(), question_answer_chain)

## Invoke the chain with an example query   
response = chain1.invoke({"input": "How do I create a Delta table?"})
print(response['answer'])



[NOTICE] Using a notebook authentication token. Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True to VectorSearchClient().
**Creating a Delta Table**

To create a Delta table, you can use the `CREATE OR REFRESH LIVE TABLE` syntax in SQL. The following example creates a table by loading data from a CSV file stored in a Unity Catalog volume:

```sql
CREATE OR REFRESH LIVE TABLE table_name
COMMENT "Table comment"
AS SELECT * FROM read_files('/path/to/file.csv', format => 'csv', header => true, mode => 'FAILFAST')
```

Replace `table_name` with the desired name for your table, and `/path/to/file.csv` with the path to your CSV file.

**Example Use Case**
--------------------

The following example creates a table named `baby_names_sql_raw` by loading data from a CSV file:
```sql
CREATE OR REFRESH LIVE TABLE baby_names_sql_raw
COMMENT "Popular baby first names in New York. This data
```


## Task 2: Build the Second Chain (Optimization)

In this step, you will create a second chain that refines the documentation content generated by the first chain, making it clearer, more concise, and consistent with the tone of Databricks documentation. In a real-world scenario, this model could be trained on your internal data or fine-tuned to align with your specific business objectives.

**Instructions:**

- Define a second chain using `databricks-meta-llama-3-3-70b-instruct`.  

- Create a prompt to optimize the generated documentation snippet. For example:  
  *"You are a technical writer. Improve the following documentation snippet to make it clearer, more concise, and aligned with the tone used in Databricks documentation."*

- Use `doc_snippet` as the parameter to be passed into the prompt.  

- Implement the chain and test it with a sample input.  
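The `|` operator in `doc_optimization_prompt | llm_llama3 | StrOutputParser()` chains steps so that each step's output becomes the next step's input. The sketch below reimplements that idea with a tiny hypothetical `Step` class and a fake model that upper-cases its prompt; it only illustrates sequential composition and is not LangChain's `Runnable` implementation.

```python
class Step:
    """Minimal stand-in for a LangChain Runnable: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # a | b returns a new Step that feeds a's output into b
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

TEMPLATE = "You are a technical writer. Improve: {doc_snippet}"
prompt = Step(lambda inputs: TEMPLATE.format(**inputs))   # fills the template
fake_llm = Step(lambda text: {"content": text.upper()})   # hypothetical model response
parser = Step(lambda message: message["content"])         # plays the role of StrOutputParser

chain2_sketch = prompt | fake_llm | parser
print(chain2_sketch.invoke({"doc_snippet": "use create table"}))
# prints: YOU ARE A TECHNICAL WRITER. IMPROVE: USE CREATE TABLE
```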


In [0]:
## Define the Databricks Chat model using llama-3-3-70b-instruct
llm_llama3 = ChatDatabricks(endpoint="databricks-meta-llama-3-3-70b-instruct", max_tokens=1000)

## Define the prompt template for refining documentation output
doc_optimization_prompt = PromptTemplate.from_template(
    """
    You are a technical writer. Improve the following documentation snippet to make it clearer, more concise, and aligned with the tone used in Databricks documentation.

    Documentation snippet: {doc_snippet}

    Return only the revised documentation content.
    """
)

## Define chain 2
chain2 = doc_optimization_prompt | llm_llama3 | StrOutputParser()

## Test the chain
chain2.invoke({"doc_snippet": "Query testing product with mobile app control"})




## Task 3: Integrate Chains into a Multi-chain System

In this task, you will link the chains created in Task 1 and Task 2 to form a multi-chain system that can handle multi-stage reasoning.

**Instructions:**

- Use the Databricks **`Llama Chat model`** defined in Task 1 to process text inputs.

- Create a prompt template that generates an **`HTML section`** displaying the generated documentation content.

- Construct the **`Multi-Chain System`** by combining the outputs of the previous chains. **Important**: You need to rename the outputs of the first and second chains as you pass them to the next stage. The sequence is: **chain3 = chain1 > (`doc_snippet`) > chain2 > (`optimized_doc`) > prompt3**.

- Invoke the multi-chain system with the sample query to generate the HTML output.
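The renaming step is the easiest part of this task to get wrong, so here is a plain-Python sketch of the data flow using hypothetical stand-in functions: the first stage returns a dictionary with an `answer` key, that value is passed on as `doc_snippet`, and the second stage's plain-string result is wrapped as `optimized_doc` for the final prompt. In the LangChain code, the two `RunnableMap` steps perform this re-keying.

```python
from typing import Dict

def fake_chain1(inputs: Dict[str, str]) -> Dict[str, object]:
    # Stand-in for the retrieval chain: returns input, context, and answer
    return {"input": inputs["input"], "context": [], "answer": f"Draft answer for: {inputs['input']}"}

def fake_chain2(inputs: Dict[str, str]) -> str:
    # Stand-in for the optimization chain: expects a 'doc_snippet' key, returns a string
    return inputs["doc_snippet"] + " (revised)"

def fake_prompt3(inputs: Dict[str, str]) -> str:
    # Stand-in for prompt_template_3: expects an 'optimized_doc' key
    return f"Create an HTML section for: {inputs['optimized_doc']}"

def multi_stage(query: Dict[str, str]) -> str:
    out1 = fake_chain1(query)
    out2 = fake_chain2({"doc_snippet": out1["answer"]})  # rename 'answer' -> 'doc_snippet'
    return fake_prompt3({"optimized_doc": out2})         # wrap the string as 'optimized_doc'

print(multi_stage({"input": "How do I create a Delta table?"}))
```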


In [0]:
from langchain_core.runnables import RunnableMap
from langchain_core.output_parsers import StrOutputParser
from IPython.display import display, HTML

## Define the prompt template for generating the HTML page
prompt_template_3 = PromptTemplate.from_template(
    """Create an HTML section for the following technical documentation snippet:
    
    Content: {optimized_doc}

    Return valid HTML (no head/body tags).
    """
)

## Construct multi-stage chain
chain3 = (
    chain1
    | RunnableMap({"doc_snippet": lambda x: x["answer"]})
    | chain2
    | RunnableMap({"optimized_doc": lambda x: x})
    | prompt_template_3
    | llm_llama
    | StrOutputParser()
)

## Sample query
query = {
    "input": "How do I create a Delta table in Databricks?"
}

output_html = chain3.invoke(query)

## Display the generated HTML output
display(HTML(output_html))

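Because `prompt_template_3` asks for HTML without head or body tags, it can be worth checking the model's output before rendering it with `display(HTML(...))`. The sketch below uses Python's standard `html.parser` module to collect start tags and report any page-level ones; the `check_fragment` helper and the set of forbidden tags are illustrative choices, not part of the lab.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Collects the names of all start tags in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)  # HTMLParser passes tag names in lowercase

FORBIDDEN = {"html", "head", "body"}

def check_fragment(html: str) -> set:
    """Return any page-level tags found; an empty set means the fragment looks usable."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags & FORBIDDEN

print(check_fragment("<section><h2>Create a Delta table</h2><p>...</p></section>"))
```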

## Task 4: Save the Chain to Model Registry in UC

In this task, you will save the multi-stage chain system to Unity Catalog.

**Instructions:**

- Set the MLflow model registry to Unity Catalog and define the model name.

- Log and register the final multi-chain system.

- To test the registered model, load it back from the model registry and query it with a sample query.

After registering the chain, you can view the chain and models in the **Catalog Explorer**.
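Models registered in Unity Catalog use a three-level `catalog.schema.model` name, and a specific version is loaded with a `models:/<name>/<version>` URI, as in the code below. This small hypothetical helper builds both strings and rejects name parts that are empty or contain a dot; that validation rule is a simplifying assumption, not Unity Catalog's full naming specification.

```python
def uc_model_name(catalog: str, schema: str, model: str) -> str:
    """Join the three Unity Catalog name parts (simplified validation)."""
    parts = (catalog, schema, model)
    for part in parts:
        if not part or "." in part:
            raise ValueError(f"invalid name part: {part!r}")
    return ".".join(parts)

def model_version_uri(name: str, version: int) -> str:
    """Build the MLflow URI used to load one registered model version."""
    return f"models:/{name}/{version}"

name = uc_model_name("my_catalog", "my_schema", "multi_stage_doc_chain")
print(model_version_uri(name, 1))  # models:/my_catalog.my_schema.multi_stage_doc_chain/1
```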

In [0]:
from mlflow.models import infer_signature
import mlflow

## Set model registry to UC
mlflow.set_registry_uri("databricks-uc")
model_name = f"{DA.catalog_name}.{DA.schema_name}.multi_stage_doc_chain"

## Log the model
with mlflow.start_run(run_name="multi_stage_doc_chain") as run:
    signature = infer_signature(query, output_html)
    model_info = mlflow.langchain.log_model(
        chain3,
        loader_fn=get_retriever, 
        artifact_path="chain",
        registered_model_name=model_name,
        input_example=query,
        signature=signature
    )

## Load and test the model
model_uri = f"models:/{model_name}/{model_info.registered_model_version}"
model = mlflow.langchain.load_model(model_uri)

output_html = model.invoke(query)
display(HTML(output_html))

2025/08/12 05:49:00 INFO mlflow: Attempting to auto-detect Databricks resource dependencies for the current langchain model. Dependency auto-detection is best-effort and may not capture all dependencies of your langchain model, resulting in authorization errors when serving or querying your model. We recommend that you explicitly pass `resources` to mlflow.langchain.log_model() to ensure authorization to dependent resources succeeds when the model is deployed.


[NOTICE] Using a notebook authentication token. Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True to VectorSearchClient().



Successfully registered model 'dbacademy.labuser11182879_1754974636.multi_stage_doc_chain'.



Created version '1' of model 'dbacademy.labuser11182879_1754974636.multi_stage_doc_chain'.







## Conclusion

In this lab, you've learned how to build a multi-stage AI system using Databricks and LangChain. By chaining a retrieval step, an optimization step, and an HTML-generation step, you can split a complex task into stages, such as retrieving relevant documentation and then refining the response to fit your needs. You also registered the combined chain in Unity Catalog so that it can be loaded and reused.



&copy; 2025 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="blank">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy" target="blank">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use" target="blank">Terms of Use</a> | 
<a href="https://help.databricks.com/" target="blank">Support</a>