# Process and analyse text with built-in Databricks SQL AI functions

Databricks SQL provides [built-in GenAI capabilities](https://docs.databricks.com/en/large-language-models/ai-functions.html) that let you run ad hoc operations on your data using state-of-the-art LLMs optimized for these tasks.

These functions include the following:

- `ai_analyze_sentiment`
- `ai_classify`
- `ai_extract`
- `ai_fix_grammar`
- `ai_gen`
- `ai_mask`
- `ai_forecast`
- `ai_similarity`
- `ai_summarize`
- `ai_translate`
- ....

Using these functions is pretty straightforward. Under the hood, they call specialized LLMs with a custom prompt, providing fast answers.

You can use them on any text column in any SQL query.


Make sure to use a SQL warehouse or serverless compute when running AI Functions.

https://learn.microsoft.com/en-us/azure/databricks/large-language-models/ai-functions

In [0]:
USE CATALOG users;
USE SCHEMA david_hurley

## ai_analyze_sentiment
**`ai_analyze_sentiment(content)`** - Returns the sentiment of a text.


In [0]:
SELECT ai_analyze_sentiment('Its a nice day today')

## ai_fix_grammar
**`ai_fix_grammar(content)`** - Corrects grammatical errors in a given text.

In [0]:
SELECT ai_fix_grammar('Its there first day of work');

## ai_similarity
**`ai_similarity(strExpr1, strExpr2)`** - Compares two strings and computes the semantic similarity score.

In [0]:
SELECT ai_similarity('Databricks', 'Apache Spark'),
ai_similarity('Apache Spark', 'The Apache Spark Engine'),
ai_similarity('Databricks LTD', 'Databricks Limted')

## ai_classify 
**`ai_classify(content, labels)`** - Classifies the provided content into one of the provided labels.

In [0]:
SELECT ai_classify('Apple', array('Fruit', 'Vegetable'))
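
The function list at the top also names `ai_mask` and `ai_translate`, which are not demonstrated elsewhere in this notebook. As a minimal sketch, assuming the documented `ai_mask(content, labels)` and `ai_translate(content, to_lang)` signatures, they can be called the same way as the functions above. The sample strings and the label list are made up for illustration.

```sql
-- Sketch only: the input strings and labels are illustrative, not from the work order data.
SELECT
  -- Mask the entity types named in the label array
  ai_mask(
    'John Doe lives in New York. His email is john.doe@example.com.',
    array('person', 'email')
  ) AS masked_text,
  -- Translate English text to Spanish ('es' is the target language code)
  ai_translate('The pump was replaced after the inspection.', 'es') AS translated_text;
```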

# Example: Work Order Data

In [0]:
SELECT * 
FROM users.david_hurley.opg_wo_example

In [0]:
SELECT 
  *,
  ai_classify(
    work_order_description,
    array('Repair', 'Inspect') 
  ) AS impact
FROM
  users.david_hurley.opg_wo_example
LIMIT 
  15
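
Because `ai_classify` returns an ordinary string column, its output can feed into regular SQL. The sketch below counts work orders per predicted label; it assumes the same table and column names as the cell above, and the `classified` CTE name and the `work_orders` alias are chosen only for illustration.

```sql
-- Sketch: count work orders per predicted label.
-- Assumes users.david_hurley.opg_wo_example and work_order_description from the cell above.
WITH classified AS (
  SELECT
    ai_classify(
      work_order_description,
      array('Repair', 'Inspect')
    ) AS impact
  FROM users.david_hurley.opg_wo_example
  LIMIT 15
)
SELECT
  impact,
  COUNT(*) AS work_orders
FROM classified
GROUP BY impact
ORDER BY work_orders DESC;
```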

### To explore the work order data further, we can use **ai_query()**

In [0]:
CREATE OR REPLACE TEMP VIEW work_order_temp_subset AS
SELECT
  *,
  ai_query(
    "databricks-gpt-5-2",
    -- The prompt ends with an explicit separator so the instructions and the description do not run together
    "You are an engineer. You are tasked with reading descriptions from work orders and labelling into the following: 'minor wear noted', 'major wear noted', and 'no wear noted'. Use only these labels. Description: " || work_order_description
  ) AS damage_found
FROM
  users.david_hurley.opg_wo_example
LIMIT 15

In [0]:
SELECT * FROM work_order_temp_subset
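
A quick way to check whether the model kept to the allowed labels is to count the distinct values it returned. This is a sketch under the assumption that the `damage_found` column from the view above holds plain text; the `lower(trim(...))` normalisation and the `damage_label` and `work_orders` aliases are additions for illustration. Note that a temp view is not materialised, so each query against it runs `ai_query` again.

```sql
-- Sketch: tally the labels returned by ai_query.
-- Normalises case and whitespace so near-identical labels are grouped together.
SELECT
  lower(trim(damage_found)) AS damage_label,
  COUNT(*) AS work_orders
FROM work_order_temp_subset
GROUP BY lower(trim(damage_found))
ORDER BY work_orders DESC;
```

Any label outside the three requested values would show up as its own row.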

In [0]:
SELECT 
  *,
  ai_summarize(
    work_order_description
) AS impact_summary
FROM users.david_hurley.opg_wo_example

## Using ai_extract to create structured data

In [0]:
SELECT 
  *,
  ai_extract(
    work_order_description,
    array('issue', 'severity')
  ) AS extracted_fields
FROM work_order_temp_subset

## Combining things

In [0]:
SELECT 
  ai_gen(
    CONCAT(
      'Generate a concise one sentence title based on the description. ',
      'Asset: ', asset_id, '. ',
      'Description: ', work_order_description
    )
  ) AS comment
FROM 
  work_order_temp_subset



### Imagine that all we have is free-text comment data.
Now extract the maintenance type, severity, and post-maintenance condition using `ai_extract`.

In [0]:
SELECT * FROM work_order_temp_subset

In [0]:
SELECT 
  *,
  ai_extract(
    work_order_description,
    array('Maintenance_Type', 'Severity', 'Post_Maintenance_Condition')
  ) AS extracted
FROM
  work_order_temp_subset
  

Generate new columns from the extracted fields

In [0]:
SELECT
  *,
  ai_extract(
    work_order_description,
    array('Maintenance_Type',  'Severity', 'Post_Maintenance_Condition')
  )
  AS extracted,
  extracted.Maintenance_Type    AS main_type,
  extracted.Severity           AS severity,
  extracted.Post_Maintenance_Condition        AS current_condition
FROM work_order_temp_subset;
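
Once the fields are columns, they can be filtered like any other data. The sketch below keeps only rows where a severity was found. It wraps the extraction in a CTE because a column alias defined in the `SELECT` list cannot be referenced in the `WHERE` clause of the same query; the `extracted_orders` name is hypothetical.

```sql
-- Sketch: keep only work orders where ai_extract found a severity.
-- Assumes the work_order_temp_subset view and column names used above.
WITH extracted_orders AS (
  SELECT
    *,
    ai_extract(
      work_order_description,
      array('Maintenance_Type', 'Severity', 'Post_Maintenance_Condition')
    ) AS extracted
  FROM work_order_temp_subset
)
SELECT
  work_order_description,
  extracted.Maintenance_Type AS main_type,
  extracted.Severity AS severity
FROM extracted_orders
WHERE extracted.Severity IS NOT NULL;
```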

### ai_parse_document()

In [0]:
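-- Replace the {input_files_path} and {output_results_path} placeholders with your own paths before running
WITH parsed_documents AS (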
    SELECT
      path,
      ai_parse_document(
        content,
        map(
          'imageOutputPath', '{output_results_path}',
          'descriptionElementTypes', '*'
        )
      ) AS parsed
    FROM READ_FILES('{input_files_path}', format => 'binaryFile')
  )
SELECT 
  current_timestamp() AS createdAt,
  path,
  parsed
FROM parsed_documents