<div>
<img src="https://cme-solution-accelerators-images.s3-us-west-2.amazonaws.com/toxicity/solution-accelerator-logo.png" width="50%">
</div>


# About This Series of Notebooks

This series of notebooks is intended to help you use the AI_QUERY function and other AI Functions in Databricks to identify common sentiment and frequently mentioned features in your customer feedback.

In support of this goal, we will:

- Load customer feedback data from Amazon
- Use out-of-the-box [AI functions](https://docs.databricks.com/aws/en/large-language-models/ai-functions) in Databricks to run batch sentiment analysis on your data in only a few lines of code
- Create a single, simple pipeline to detect sentiment. This pipeline can then be used to manage tables for reporting, ad hoc queries, and/or decision support.
- Create a Genie room so you can explore your sentiment data with natural language interactions
- Create a dashboard that reports sentiment back to the business to drive insights and action


# Introduction

In the previous notebook, the review data was downloaded, ingested into Databricks, and stored in a table in Unity Catalog.

In this notebook, we'll cover the basics of using AI_QUERY() to understand the process, before applying it as a batch job across our entire dataset.

The end result will be a new table containing our AI outputs, structured by the prompt for downstream analysis by our analytical teams.

In [0]:
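# Create text widgets (with defaults) for the catalog, schema and volume.
# dbutils.widgets.text() returns None, so its result is not assigned; values are read in the next cell.
dbutils.widgets.text("catalog", 'mido_edw_dev')
dbutils.widgets.text("schema", 'sentiment_analysis')
dbutils.widgets.text("volume", 'reviews')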

In [0]:
catalog = dbutils.widgets.get("catalog")
schema = dbutils.widgets.get("schema")
volume = dbutils.widgets.get("volume")
transpiledtable = 'amazon_reviews_sentiment'
transpiledtable = f'{catalog}.{schema}.{transpiledtable}'
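Because `catalog` and `schema` come from free-text widgets, a value containing a hyphen or another special character would break the unquoted three-part name built above. A minimal sketch, assuming you want to guard against that, backtick-quotes each part and doubles any embedded backtick (Spark SQL's escape rule for quoted identifiers). `qualified_name` is a hypothetical helper, not part of the accelerator.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return a backtick-quoted three-part Unity Catalog name (hypothetical helper)."""
    def quote(part: str) -> str:
        # Inside a backtick-quoted identifier, Spark SQL escapes a backtick by doubling it
        return "`" + part.replace("`", "``") + "`"
    return ".".join(quote(part) for part in (catalog, schema, table))


# Example using this notebook's default widget values
print(qualified_name("mido_edw_dev", "sentiment_analysis", "amazon_reviews_sentiment"))
# `mido_edw_dev`.`sentiment_analysis`.`amazon_reviews_sentiment`
```

You could then write `transpiledtable = qualified_name(catalog, schema, 'amazon_reviews_sentiment')`; the unquoted form above works as long as the names are plain identifiers.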


# Inference

This next step is critical to the process. For clarity, and in keeping with the AI theme, inference is defined as:

_...process of using a trained machine learning model to make predictions or generate outputs based on new, unseen data. It’s the “deployment” phase of an ML model — where the model is no longer learning, but applying what it has already learned._

_In Databricks, batch inference refers to running predictions on large datasets at rest (as opposed to streaming), often using tools like AI_QUERY to apply models at scale directly within SQL workflows._

In this scenario, we'll perform batch inference on our customer review data by passing a prompt (instruction) combined with each review, to get outputs we can then use and act upon.

Our first step will be to define the prompt that AI_QUERY sends to the foundation model to score our sentiment.

The prompt can be customised for your use case; the example below is designed to return JSON-friendly output that can be processed and used in downstream analytics.

To begin with, we'll run AI_QUERY on a single prompt to demonstrate the functionality.

[AI_QUERY](https://learn.microsoft.com/en-us/azure/databricks/large-language-models/ai-query-batch-inference) sends a prompt to a model hosted in or outside of Databricks and returns the model's output.

You can use either [Databricks-hosted foundation models](https://docs.databricks.com/aws/en/machine-learning/foundation-model-apis/) (as in this example) or your own custom or fine-tuned models.

***NOTE*** For a lighter-touch approach to sentiment analysis that does not require a full prompt but still classifies sentiment, check out the [ai_analyze_sentiment](https://docs.databricks.com/aws/en/sql/language-manual/functions/ai_analyze_sentiment) AI Function if you don't need the level of customisation shown in this accelerator.

![](images/inference.png)

## Single Row Use of AI_Query

In our first example, we pass a simple, single prompt to a model to view the output.

In [0]:
%sql
-- This example calls AI_QUERY with a single-row prompt, using the provided databricks-meta-llama-3-3-70b-instruct endpoint for inference.

DECLARE llmprompt string;

SET var llmprompt = "Tell me why sentiment analysis is so important for product companies";

SELECT AI_QUERY('databricks-meta-llama-3-3-70b-instruct', llmprompt) as Output;

# How Much Can It Help?

In [0]:
%sql
-- This example calls AI_QUERY with a second single-row prompt, again using the databricks-meta-llama-3-3-70b-instruct endpoint.

DECLARE llmpromptexample string;

SET var llmpromptexample = "What percent improvement/impact could such a sentiment analysis solution have for a product company?";

-- Pass the variable declared in this cell (llmpromptexample), not llmprompt from the previous cell
SELECT AI_QUERY('databricks-meta-llama-3-3-70b-instruct', llmpromptexample) as Output;

# Batch Inference

With that example demonstrated, we can take this a step further with [Batch Inference](https://docs.databricks.com/aws/en/large-language-models/ai-query-batch-inference). This lets us perform a similar analysis at scale, using Databricks to run high-volume inference in a few lines of code.

As before, we'll define a prompt. This time it will be much more detailed, allowing us to capture the necessary outputs in terms of sentiment, themes and suggestions that provide clear insights for our downstream teams to act upon.

The output will be stored in a table, providing a performant, scalable source for our Genie room and AI/BI dashboards.

In [0]:
%python
# Detailed prompt for batch inference: asks the model for JSON-only output
prompt = """
You are an AI sentiment analysis assistant specializing in product feedback evaluation.  
Given the following product reviews, perform the following tasks:  

1. **Sentiment Analysis:**  
   - Classify the sentiment as **Positive, Neutral, or Negative**.  
   - Provide a **sentiment score** between **-1 (very negative) and +1 (very positive)**.  

2. **Key Themes & Insights:**  
   - Identify the **main topics** customers are mentioning (e.g., usability, features, pricing).  
   - Summarize key takeaways based on the feedback provided.  

3. **Improvement Suggestions:**  
   - If there are concerns or negative feedback, suggest potential **improvements**.  

4. **Highlight Customer Quotes:**  
   - Extract **1-2 key quotes** that best represent the overall sentiment.  


Give the response in a JSON format with the following fields:

sentiment: Positive/Neutral/Negative,
sentiment_score: FLOAT_VALUE,
key_themes: [Theme 1, Theme 2, Theme 3],
suggestions: Actionable recommendations for improvement.,
highlight_quotes: [Quote 1, Quote 2]

Do not include any text before the sentiment, sentiment score, key themes, suggestions, or highlight quotes, such as 'Here is the analysis of the product review in JSON format:'

Do not include any backticks, such as ```json

Ensure the key_themes are standardised across the returned dataset to allow for consistent analysis.
"""


## Batch Inference with AI_Query

With our prompt defined, we move on to the critical step. Here, we:

- Pass our prompt to the AI_QUERY function, along with the review data from our reviews table
- Perform batch inference at scale, producing outputs as defined by the prompt instructions
- Store the output in the transpiled table (`transpiledtable`) for downstream analysis

As a reminder, we can perform a quick query on our source table to view the relevant fields for our LLM to analyse:

In [0]:
#Query base table for example review and schema

spark.sql(f"select * from {catalog}.{schema}.amazon_reviews limit 10").display()


## Calling AI_QUERY

We now call the AI_QUERY function, passing in our prompt and data. Note that in this example we concatenate the title and text, as both contain useful information for the LLM.

The output will be stored in the table named by `transpiledtable`.

Depending on your own dataset and models used, this next process may take a few minutes.

In [0]:
%python

# Replace the endpoint with your own and amend the table details as needed.
# concat_ws skips NULLs, so a review with a missing title or text still produces a model input
# (concat would return NULL for the whole input in that case).

query = f"""
CREATE OR REPLACE TABLE {transpiledtable}
AS
SELECT
    parent_asin,
    ai_query(
        'databricks-meta-llama-3-3-70b-instruct',
        "{prompt}" || concat_ws(',', title, text)
    ) AS output
FROM {catalog}.{schema}.amazon_reviews
LIMIT 10000
"""

display(spark.sql(query))

# Alternative endpoint to try, if available in your workspace: meta-llama-3-1-405b-instruct
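The cell above splices the prompt into the SQL text inside double quotes, so the statement would break if the prompt were ever edited to contain a double quote. A minimal sketch, assuming you keep building the statement as a Python string, escapes the prompt as a single-quoted Spark SQL literal using backslash escapes, and uses `concat_ws` (which skips NULLs) to join title and text. `sql_string_literal` and `build_batch_inference_sql` are hypothetical helpers; the table names passed in are assumed to be already-qualified identifiers.

```python
def sql_string_literal(value: str) -> str:
    """Render a Python string as a single-quoted Spark SQL string literal."""
    # Escape backslashes first, then single quotes, so the added backslashes are not doubled
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_batch_inference_sql(target_table: str, source_table: str,
                              endpoint: str, prompt: str, limit: int = 10000) -> str:
    """Build a CTAS statement like the one above, with the prompt and endpoint escaped."""
    return f"""
CREATE OR REPLACE TABLE {target_table} AS
SELECT
    parent_asin,
    ai_query(
        {sql_string_literal(endpoint)},
        {sql_string_literal(prompt)} || concat_ws(', ', title, text)
    ) AS output
FROM {source_table}
LIMIT {int(limit)}
"""


print(sql_string_literal('It\'s a "great" value'))  # 'It\'s a "great" value'

# In the notebook (requires a Spark session):
# spark.sql(build_batch_inference_sql(transpiledtable, f"{catalog}.{schema}.amazon_reviews",
#                                     "databricks-meta-llama-3-3-70b-instruct", prompt))
```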


We can now view the inference output by querying the transpiled table:

In [0]:
spark.sql(f"select * from {transpiledtable}").display()

# Conclusion

In this section, we used the AI_QUERY function to process our review data with the prompt defined earlier.

The LLM output has been stored in a table that can now be used in the next steps to build our Genie space and dashboards.

Now proceed to the final notebook, 003 Serving, to complete this accelerator.

