<div>
<img src="https://cme-solution-accelerators-images.s3-us-west-2.amazonaws.com/toxicity/solution-accelerator-logo.png" width="50%">
</div>


# About This Series of Notebooks

This series of notebooks shows how to use Databricks AI Functions, such as `ai_query`, to identify common sentiment and recurring features in your customer feedback.

In support of this goal, we will:

- Load customer review data from Amazon
- Use out-of-the-box [AI functions](https://docs.databricks.com/aws/en/large-language-models/ai-functions) in Databricks to run batch-inference sentiment analysis on your data in only a few lines of code
- Create a single, simple pipeline to detect sentiment. This pipeline can then maintain tables for reporting, ad hoc queries and decision support.
- Create a Genie room so you can explore your sentiment data through natural-language questions
- Create a dashboard that reports sentiment back to the business to drive insights and action


# About This Data


<table>
  <tr>
    <td style="vertical-align: top; width: 70%;">
      <p>This dataset is the Amazon Reviews'23 dataset provided by McAuley Lab (UC San Diego) and includes rich features such as:</p>
      <ol>
        <li>User Reviews (ratings, text, helpfulness votes, etc.)</li>
        <li>Item Metadata (descriptions, price, raw image, etc.)</li>
        <li>Links (user-item and bought-together graphs)</li>
      </ol>
      <p>Full details of the dataset and its usage are available 
      <a href="https://amazon-reviews-2023.github.io/main.html" target="_blank">here</a>.</p>
    </td>
    <td style="vertical-align: top; text-align: right; width: 30%;">
      <img src="/Workspace/Users/mike.dobing@databricks.com/Sentiment Analysis  With AI Functions/images/ucsd_logo.png" width="150">
    </td>
  </tr>
</table>


# High-Level Flow - Ingestion

The accelerator follows the flow shown in the diagram below, with each step marked on it.

This ingestion notebook covers the area highlighted in red: downloading the data into a volume in Databricks and then ingesting it into managed tables in Unity Catalog.

![](/Workspace/Users/mike.dobing@databricks.com/customer_sentiment_analysis_with_ai_functions/images/ingest.png)


# Step 1 - Set Up Our Catalog, Schema and Volume

First, we read the catalog, schema and volume names used by this accelerator from the notebook widgets.

In [0]:
# Read the target catalog, schema and volume names from the notebook widgets
catalog = dbutils.widgets.get("catalog")
schema = dbutils.widgets.get("schema")
volume = dbutils.widgets.get("volume")

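The next cell interpolates these widget values directly into SQL with f-strings. If a name could ever contain special characters, quoting it as an identifier is safer. The following is a minimal sketch using hypothetical helpers `quote_identifier` and `qualified_name` (they are not part of this accelerator), assuming Spark SQL's convention of backtick-quoted identifiers with any embedded backtick doubled:

```python
def quote_identifier(name: str) -> str:
    """Wrap a Spark SQL identifier in backticks, doubling any embedded backticks."""
    if not name:
        raise ValueError("identifier must be a non-empty string")
    return "`" + name.replace("`", "``") + "`"


def qualified_name(*parts: str) -> str:
    """Join identifier parts into a dotted, fully quoted name, e.g. `cat`.`sch`.`vol`."""
    return ".".join(quote_identifier(part) for part in parts)


# Illustrative names only; in the notebook these would be the widget values.
print(qualified_name("main", "sentiment", "raw_files"))
```

For example, `qualified_name(catalog, schema)` could stand in for `{catalog}.{schema}` inside the `create schema` statement.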

With the catalog, schema and volume names defined, we can now create the objects the accelerator needs.

In [0]:
# Create the catalog, schema and volume if they do not already exist
query = f"create catalog if not exists {catalog}"
spark.sql(query)
query = f"create schema if not exists {catalog}.{schema}"
spark.sql(query)
query = f"create volume if not exists {catalog}.{schema}.{volume}"
spark.sql(query)

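Files in a Unity Catalog volume are read and written through paths of the form `/Volumes/<catalog>/<schema>/<volume>/<path>`. As a small sketch, a hypothetical helper `volume_path` can build such a path from the same three names; the subfolder and file names in the example are illustrative only:

```python
from pathlib import PurePosixPath


def volume_path(catalog: str, schema: str, volume: str, *subpath: str) -> str:
    """Build a /Volumes/<catalog>/<schema>/<volume>/... path for a Unity Catalog volume."""
    for part in (catalog, schema, volume):
        # Each name must be a single, non-empty path component.
        if not part or "/" in part:
            raise ValueError(f"invalid name component: {part!r}")
    for part in subpath:
        # An absolute sub-path would silently discard the /Volumes prefix.
        if part.startswith("/"):
            raise ValueError(f"sub-path must be relative: {part!r}")
    return str(PurePosixPath("/Volumes", catalog, schema, volume, *subpath))


# Hypothetical names for illustration.
print(volume_path("main", "sentiment", "raw_files", "reviews", "part-0.jsonl"))
```

On Databricks, a path built this way could then be passed to tools that accept volume paths, such as `dbutils.fs.ls` or a Spark reader.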
# Step 2 - Set Up Raw Data

Run the following cell. It downloads the Amazon review data (only if it is not already present) and stores it in the catalog, schema and volume supplied above for the analysis that follows.

In [0]:
%run "./util/00_Setup"

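The contents of `./util/00_Setup` are not shown here. As a hypothetical sketch of the "download only if needed" behavior described above (not the setup notebook's actual code), a helper can skip files that already exist and write new downloads to a temporary file before renaming them into place. The `fetch` parameter defaults to `urllib.request.urlretrieve` and can be swapped out for testing:

```python
import os
import tempfile
import urllib.request
from typing import Callable


def download_if_missing(
    url: str,
    dest_path: str,
    fetch: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> bool:
    """Download url to dest_path unless the file already exists.

    Returns True if a download happened, False if the existing file was reused.
    """
    if os.path.exists(dest_path):
        return False  # already downloaded on a previous run
    dest_dir = os.path.dirname(dest_path) or "."
    os.makedirs(dest_dir, exist_ok=True)
    # Download into a temporary file in the same directory first.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    os.close(fd)
    try:
        fetch(url, tmp_path)
        os.replace(tmp_path, dest_path)  # move the finished file into place
    except BaseException:
        # Clean up the partial file so a later run retries the download.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

Writing to a temporary `.part` file and then calling `os.replace` means an interrupted download does not leave a half-written file that a later run would mistake for a complete one. Whether the rename is atomic depends on the underlying filesystem, which is worth checking for a volume mount.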

# Step 3 - Verify Our Raw Data

Our data is now ready for initial querying and validation. Run the following cells to explore the review data.

In [0]:
spark.sql(f"select * from {catalog}.{schema}.amazon_reviews").display()

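Beyond eyeballing the rows, a few simple checks can flag obvious data-quality problems. The sketch below assumes each review row carries `rating` (expected between 1 and 5) and `text` fields, as in the Amazon Reviews'23 review files; the column names in `amazon_reviews` may differ, and `summarize_reviews` is a hypothetical helper:

```python
from typing import Any, Dict, Iterable, Mapping


def summarize_reviews(rows: Iterable[Mapping[str, Any]]) -> Dict[str, int]:
    """Count rows, rows with empty review text, and rows with a missing or out-of-range rating."""
    summary = {"rows": 0, "missing_text": 0, "invalid_rating": 0}
    for row in rows:
        summary["rows"] += 1
        text = row.get("text")
        if text is None or not str(text).strip():
            summary["missing_text"] += 1
        rating = row.get("rating")
        if rating is None or not (1 <= rating <= 5):
            summary["invalid_rating"] += 1
    return summary
```

On Databricks this could be applied to a small sample, for example `summarize_reviews(r.asDict() for r in df.limit(1000).collect())` with `df = spark.table(f"{catalog}.{schema}.amazon_reviews")`; for the full table, the same checks are better expressed as a Spark SQL aggregate.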
We can run a similar query against the item table to review the item metadata captured in the source data.

In [0]:
spark.sql(f"select main_category, title, parent_asin, rating_number, average_rating from {catalog}.{schema}.amazon_items").display()

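The two tables can be linked on `parent_asin`, which appears in the item query above and, in the Amazon Reviews'23 schema, on each review as well (an assumption about the ingested `amazon_reviews` columns). One useful integrity check is whether any reviews refer to items that have no metadata. A minimal pure-Python sketch with a hypothetical `orphan_review_counts` helper:

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def orphan_review_counts(
    reviews: Iterable[Mapping[str, Any]],
    items: Iterable[Mapping[str, Any]],
) -> Counter:
    """Count reviews per parent_asin for which no item metadata row exists."""
    known = {item["parent_asin"] for item in items}
    return Counter(r["parent_asin"] for r in reviews if r["parent_asin"] not in known)
```

At table scale, the same check maps to a `LEFT ANTI JOIN` from the reviews table to the items table on `parent_asin`.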

# Conclusion

The review data has now been ingested and cleansed, and is ready for batch inference with `ai_query`.

Continue to the [Inference](https://adb-984752964297111.11.azuredatabricks.net/editor/notebooks/1332008936123138?o=984752964297111) notebook for the next step, where we introduce `ai_query`.