# 1. Agent Bricks: Review Aspect Extraction Agent

This notebook walks through building an Information Extraction Agent in Agent Bricks that extracts aspect-level insights and sentiment from raw reviews.

Data Flow: 
**raw reviews -> review aspect extractions**  -> location aspect daily -> flag all issues -> issue diagnosis and recommendations

## Build Information Extraction Agent

Extract Structured Insights from Raw Reviews

Example: 

- Information Extraction Agent: Schema Config [`ie_agent_config.json`](/Workspace/Users/cindy.wu@databricks.com/voc_industry_demo/ie_agent_config.json)
- Instructions: 

```
1. Extract ALL relevant metadata mentioned or implied in the review.
   - Include fields such as: star rating, review date, length of stay, and overall sentiment.
   - If metadata is missing, set to null — do not guess or invent.

2. Extract EVERY relevant aspect from the review text.
   - Use the predefined aspect list (Arrival & Departure, Staff & Service, In-Room Experience, Food & Beverage, Facilities & Amenities, Environment & Location, Value & Loyalty).
   - For each aspect:
       • Identify sentiment (very_positive, positive, neutral, negative, very_negative).
       • Include short, verbatim evidence (phrases directly from the review).
       • opinion_terms: array of short polarity-bearing words/phrases tied to this aspect (e.g., “spotless,” “friendly,” “overpriced,” “noisy AC”); use verbatim spans when possible
   - Deduplicate aspects — each aspect should appear at most once.
   - Do not miss subtle mentions, mixed opinions, or multiple details for the same aspect.
   - Capture both positive and negative details accurately, without omitting context.

3. Extract all **entities** explicitly or implicitly mentioned in the review.
   - Entities include: staff roles or names, attractions, nearby locations etc.
   - Keep entity names consistent and distinct.
   - Do not fabricate entities; if unclear, set to null.
   - Avoid redundancy: each unique entity should appear only once.

4. Output clean, valid JSON following the specified schema (no extra text, no commentary).
   - Ensure consistency across reviews.
   ```
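
The rules above can be spot-checked on a sample of agent outputs before running batch inference. The following is a minimal sketch, not part of the agent itself: the allowed aspects and sentiments are copied from the instructions, while the key names `aspects`, `aspect`, `sentiment`, and `evidence` are assumptions, since the actual keys are defined in `ie_agent_config.json`.

```python
import json

# Allowed values copied from the instructions above.
ASPECTS = {
    "Arrival & Departure", "Staff & Service", "In-Room Experience",
    "Food & Beverage", "Facilities & Amenities", "Environment & Location",
    "Value & Loyalty",
}
SENTIMENTS = {"very_positive", "positive", "neutral", "negative", "very_negative"}


def validate_extraction(raw_json: str, review_text: str) -> list[str]:
    """Return a list of rule violations; an empty list means the output passes."""
    try:
        doc = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    seen = set()
    # "aspects", "aspect", "sentiment", and "evidence" are assumed key names.
    for item in doc.get("aspects") or []:
        name = item.get("aspect")
        if name not in ASPECTS:
            problems.append(f"unknown aspect: {name!r}")
        if name in seen:
            problems.append(f"duplicate aspect: {name!r}")
        seen.add(name)
        if item.get("sentiment") not in SENTIMENTS:
            problems.append(f"bad sentiment for {name!r}: {item.get('sentiment')!r}")
        # Evidence should be a verbatim span of the review (case-insensitive check).
        evidence = item.get("evidence") or ""
        if evidence and evidence.lower() not in review_text.lower():
            problems.append(f"evidence not verbatim for {name!r}")
    return problems
```

An empty return value means the output respects the aspect list, the sentiment scale, the deduplication rule, and the verbatim-evidence rule. Any other result lists the specific violations.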

## Batch Inference With IE Agent Endpoint and `AI_QUERY`
#### [IE Query in SQL Editor](/Workspace/Users/cindy.wu@databricks.com/voc_industry_demo/queries/AI Query Extraction with KIE.dbquery.ipynb)


In [0]:
%sql
CREATE OR REPLACE TABLE lakehouse_inn_catalog.voc.review_extractions AS
WITH query_results AS (
  SELECT
    review_id,
    review_text,
    ai_query(
      'kie-b59e4876-endpoint',
      review_text,
      failOnError => false
    ) AS response
  FROM (
    SELECT review_id, review_text
    FROM lakehouse_inn_catalog.voc.raw_reviews
  )
)
SELECT
  review_id,
  review_text,
  response.result AS response,
  response.errorMessage AS error
FROM query_results;
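
Because the query sets `failOnError => false`, failed rows are kept in the table with `error` populated instead of aborting the whole job. The sketch below shows one way to separate successes from failures so that failed reviews can be retried. It works on plain dictionaries; the sample rows and the error message are hypothetical, and on Databricks the rows could instead come from reading `lakehouse_inn_catalog.voc.review_extractions`.

```python
def split_results(rows):
    """Partition extraction rows into successes and failures.

    Each row is a dict with the columns written by the query above:
    review_id, review_text, response, error.
    """
    ok, failed = [], []
    for row in rows:
        # Treat a row as failed if an error message is set or no result came back.
        if row.get("error") or row.get("response") is None:
            failed.append(row)
        else:
            ok.append(row)
    return ok, failed


# Hypothetical sample rows for illustration only.
sample = [
    {"review_id": 1, "response": '{"aspects": []}', "error": None},
    {"review_id": 2, "response": None, "error": "request timed out"},
]
ok, failed = split_results(sample)
retry_ids = [row["review_id"] for row in failed]
print(f"{len(ok)} succeeded, {len(failed)} failed; retry ids: {retry_ids}")
```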


## Multi-Agent Supervisor (MAS) with Knowledge Assistant (KA) and Genie Room

0. Data Generation
- Generate raw review data with `ai_query` from a list of locations and channels
- Generate a hotel runbook from the list of aspects
- [Lakebase] Sync the hotel runbook to Lakebase

1. Extract Insights
- [Agent Bricks, SQL] Build the Information Extraction Agent and use `AI_QUERY` to extract insights from raw reviews

2. Diagnosis
- [Notebook] Aggregate insights by location-aspect and calculate metrics
- [Notebook] Create issues table based on extracted location-aspect metrics
- [Lakebase] Sync latest issues table to Lakebase
- [Agent Bricks] Use Custom LLM to identify issue causes based on relevant reviews and provide a summary
- [Lakebase, Apps] Sync issues table to Lakebase
- [AI/BI] Build Genie room for raw reviews and issues table for deep dive
- [AI/BI, Apps] Build dashboard to visualize issues by location in a map view

3. Recommendations
- [Notebook] Generate emails in the App based on the issue and the hotel runbook
- [Agent Bricks, Apps] Build MAS with Genie and KA 
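
The aggregation and issue-flagging steps under Diagnosis can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the notebook's actual logic: the metric (share of negative mentions), the thresholds `min_mentions` and `max_negative_share`, the row keys, and the sample locations are all hypothetical.

```python
from collections import defaultdict

NEGATIVE = {"negative", "very_negative"}


def location_aspect_metrics(extractions):
    """Aggregate per (location, aspect): mention count and negative share."""
    counts = defaultdict(lambda: {"mentions": 0, "negative": 0})
    for row in extractions:
        key = (row["location"], row["aspect"])
        counts[key]["mentions"] += 1
        if row["sentiment"] in NEGATIVE:
            counts[key]["negative"] += 1
    return {
        key: {**c, "negative_share": c["negative"] / c["mentions"]}
        for key, c in counts.items()
    }


def flag_issues(metrics, min_mentions=3, max_negative_share=0.5):
    """Return the (location, aspect) keys that breach the assumed thresholds."""
    return sorted(
        key for key, m in metrics.items()
        if m["mentions"] >= min_mentions and m["negative_share"] > max_negative_share
    )


# Hypothetical extracted rows: one entry per aspect mention in a review.
rows = (
    [{"location": "Austin", "aspect": "In-Room Experience", "sentiment": s}
     for s in ("negative", "negative", "very_negative", "positive")]
    + [{"location": "Austin", "aspect": "Staff & Service", "sentiment": "positive"}] * 3
    + [{"location": "Denver", "aspect": "Food & Beverage", "sentiment": "negative"}]
)
metrics = location_aspect_metrics(rows)
issues = flag_issues(metrics)
print(issues)
```

The minimum-mention threshold keeps a single negative review from opening an issue on its own, which is why the Denver row is not flagged in this sample.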


In [0]:
from pyspark.sql import functions as F, Window as W
diag = "lakehouse_inn_catalog.voc.open_issues_diagnosis"

# Load issues table
issues_df = spark.table(diag)

# Window to get the latest opened_at per (aspect, location)
w = W.partitionBy("aspect", "location").orderBy(F.col("opened_at").desc())

# Add row number to identify the latest issue per group
issues_ranked = issues_df.withColumn("rn", F.row_number().over(w))

# Create new status column: only the latest as Open, others as Closed
issues_final = (
    issues_ranked.withColumn(
        "status",
        F.when(F.col("rn") == 1, F.lit("Open")).otherwise(F.lit("Closed"))
    )
    .drop("rn")
)

# Overwrite the table with the updated status column
issues_final.write.format("delta").mode("overwrite").option("overwriteSchema", "true").saveAsTable(diag)
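
To see what the window logic above does without a cluster, the same rule (newest issue per aspect and location stays Open, older ones become Closed) can be reproduced on a few in-memory rows. This is a plain-Python sketch for illustration; the sample rows are hypothetical and `opened_at` is assumed to be an ISO-formatted date string so that string ordering matches chronological ordering.

```python
from itertools import groupby


def group_key(row):
    """Grouping key used by the window: (aspect, location)."""
    return (row["aspect"], row["location"])


def assign_status(issues):
    """Mark the most recent issue per (aspect, location) Open, the rest Closed."""
    # Sort newest first, then stable-sort by group so each group stays newest first.
    ordered = sorted(issues, key=lambda r: r["opened_at"], reverse=True)
    ordered.sort(key=group_key)
    out = []
    for _, group in groupby(ordered, key=group_key):
        for position, row in enumerate(group):
            out.append({**row, "status": "Open" if position == 0 else "Closed"})
    return out


# Hypothetical issues for illustration only.
sample_issues = [
    {"aspect": "Food & Beverage", "location": "Austin", "opened_at": "2025-01-05"},
    {"aspect": "Food & Beverage", "location": "Austin", "opened_at": "2025-02-10"},
    {"aspect": "Staff & Service", "location": "Denver", "opened_at": "2025-01-20"},
]
for row in assign_status(sample_issues):
    print(row["aspect"], row["location"], row["opened_at"], row["status"])
```

Like `row_number()` in the PySpark cell, this picks exactly one Open row per group, so two issues with an identical `opened_at` would be ordered arbitrarily unless a tie-breaking column is added.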