# AI SQL

AISQL delivers an easy-to-use composable query language that handles both text and multimodal data through familiar SQL syntax. Our high-performance batch engine processes queries faster through intelligent query optimization, while delivering lower costs than traditional AI solutions. Natively ingest structured and unstructured data into unified multimodal tables, enabling comprehensive insights through familiar SQL analytics for all your data.

### AI SQL Benefits:

1. Easy-to-use SQL syntax to build AI pipelines without complex coding
2. High-performance processing across all modalities (text, image, audio)
3. Lower-cost batch processing through an optimized architecture that supports faster and larger batch jobs

## Overview: Major function calls

### [AI_COMPLETE](https://docs.snowflake.com/sql-reference/functions/ai_complete)

Generates a completion for a given text string or image using a selected LLM. Use this function for most generative AI tasks. This is the updated version of COMPLETE (SNOWFLAKE.CORTEX). In short, AI_COMPLETE sends a prompt to an LLM and returns the model's answer.

#### Single Response

```sql
SELECT AI_COMPLETE('snowflake-arctic', 'What are large language models?');
```

#### Responses from table column

The following example generates a response for each row in the reviews table, using the content column as input. Each query result contains a critique of the corresponding review.

```sql
SELECT AI_COMPLETE(
        'mistral-large',
        CONCAT('Critique this review in bullet points: <review>', content, '</review>'))
FROM reviews
LIMIT 10;
```

#### Detailed Output

```sql
SELECT AI_COMPLETE(
    model => 'llama2-70b-chat',
    prompt => 'how does a snowflake get its unique pattern?',
    model_parameters => {
        'temperature': 0.7,
        'max_tokens': 10
    },
    show_details => true
);
```

> The response is a JSON object with the model’s message and related details. The `max_tokens` setting in `model_parameters` was used to truncate the output.

```json
{
    "choices": [
        {
            "messages": " The unique pattern on a snowflake is"
        }
    ],
    "created": 1708536426,
    "model": "llama2-70b-chat",
    "usage": {
        "completion_tokens": 10,
        "prompt_tokens": 22,
        "guardrail_tokens": 0,
        "total_tokens": 32
    }
}
```

> A token is roughly equivalent to four characters of text.

------

### [AI_CLASSIFY](https://docs.snowflake.com/sql-reference/functions/ai_classify)

Classifies text or images into categories that you specify.

```sql
SELECT AI_CLASSIFY('One day I will see the world', ['travel', 'cooking']);
```

produces

```json
{
  "labels": ["travel"]
}
```

------

### [AI_FILTER](https://docs.snowflake.com/en/sql-reference/functions/ai_filter)

AI-powered SQL operator for semantic filtering. You can use the same syntax on a single table (for filtering) or join multiple tables together upon a semantic relationship. Returns True or False for a given text or image input, allowing you to filter results in SELECT, WHERE, or JOIN ... ON clauses.

```sql
SELECT AI_FILTER('Is Canada in North America?');
```

```text
TRUE
```

You can use `AI_FILTER` in a `JOIN` to link two tables with a natural language prompt that the model evaluates for each pair of rows.

The following example joins the RESUMES table with the JOBS table using a prompt with the AI_FILTER function.

```sql
SELECT *
FROM RESUMES
JOIN JOBS
ON AI_FILTER(PROMPT('Evaluate if this resume {0} fits this job description {1}', RESUMES.contents, JOBS.jd));
```

> Important: When performing JOIN operations that utilize the AI_FILTER function, each table in the JOIN can’t exceed 500 rows.

The `PROMPT` function constructs a structured OBJECT containing a template string and a list of arguments. This object is useful for dynamically formatting messages, constructing structured prompts, or storing formatted data for further processing, such as by Cortex AI functions. The object has the following shape:

```text
{
  'template': 'Evaluate if this resume {0} fits this job description {1}',
  'args': ARRAY(<value_1>, <value_2>, ...)
}
```
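
To make the placeholder semantics concrete, here is a minimal Python sketch of how a template and its argument list relate. It only mimics the `{0}`, `{1}` substitution with Python's own `str.format`. The `build_prompt` and `render` helpers are hypothetical and not part of any Snowflake API, and how Cortex itself renders the object is an assumption this document does not confirm.

```python
def build_prompt(template, *args):
    """Hypothetical helper: mirror the OBJECT shape shown above as a dict."""
    return {"template": template, "args": list(args)}


def render(prompt_obj):
    """Substitute positional placeholders ({0}, {1}, ...) with the args."""
    return prompt_obj["template"].format(*prompt_obj["args"])


# Placeholder values stand in for RESUMES.contents and JOBS.jd.
prompt_obj = build_prompt(
    "Evaluate if this resume {0} fits this job description {1}",
    "<resume text>",
    "<job description text>",
)
rendered = render(prompt_obj)
print(rendered)
```

Each `{n}` in the template refers to the n-th entry of `args`, which is why the join example passes the resume column first and the job description column second.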

------

### [AI_AGG](https://docs.snowflake.com/sql-reference/functions/ai_agg)

Aggregates a text column and returns insights across multiple rows based on a user-defined prompt. This function isn’t subject to context window limitations.

Reduces a column of text data using a natural language task description.

For example, point this function at the reviews column and it will return a summary of user feedback.

```sql
-- book_reviews is a placeholder table name; substitute your own table.
SELECT AI_AGG(reviews, 'Summarize the book reviews in 200 words')
FROM book_reviews;
```

------

### [AI_SUMMARIZE_AGG](https://docs.snowflake.com/sql-reference/functions/ai_summarize_agg)

Summarizes a column of text data.

For example, `AI_SUMMARIZE_AGG(churn_reason)` will return a summary of the churn_reason column.

Unlike AI_COMPLETE and SUMMARIZE (SNOWFLAKE.CORTEX), this function supports datasets larger than the maximum language model context window.

------

### [SUMMARIZE (SNOWFLAKE.CORTEX)](https://docs.snowflake.com/sql-reference/functions/summarize-snowflake-cortex)

Summarizes the given English-language input text. When you pass a table column, the function returns one summary per row.

------

### [AI_EMBED](https://docs.snowflake.com/sql-reference/functions/ai_embed)

Creates an embedding vector from text or an image. Embeddings are abstract numerical representations of the features of a piece of text or an image that can be used to determine the degree of similarity between pieces of text or images, which can be used for semantic search, clustering, classification, and other tasks. A vector is a mathematical object that has both magnitude (or length) and direction. It can be visualized as an arrow, where the arrow's length represents the magnitude and the arrow's head indicates the direction.

```sql
SELECT AI_EMBED('snowflake-arctic-embed-l-v2.0', 'hello world');
```

produces a vector like the following (middle values elided for readability):

```text
[-0.026520,0.016800,-0.047821,-0.003788,-0.012810,0.003830, ... ,-0.040436,0.040985,-0.000519,-0.019272]
```

------

### [AI_SIMILARITY](https://docs.snowflake.com/sql-reference/functions/ai_similarity)

Computes a similarity score based on the vector cosine similarity value of the inputs’ embedding vectors. Currently supports both text and image similarity computation.

```sql
SELECT AI_SIMILARITY('I like this dish', 'This dish is very good');
```

produces

```text
0.8349131147
```

- A score close to 1 indicates that the vectors point in nearly the same direction, meaning the inputs are similar.
- A score of 0 indicates that the vectors are orthogonal, meaning they have no directional similarity.
- A score of -1 indicates that the vectors point in exactly opposite directions.
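
The three cases above can be checked with a small, self-contained Python sketch of cosine similarity. This illustrates the underlying formula on toy 2-D vectors only; it is not Snowflake's implementation, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same = cosine_similarity([1.0, 2.0], [2.0, 4.0])        # same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])  # perpendicular
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])  # opposite direction
print(same, orthogonal, opposite)
```

Because the score depends only on direction, scaling a vector (as in the first pair) does not change the result.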

------

### [AI_TRANSCRIBE](https://docs.snowflake.com/en/sql-reference/functions/ai_transcribe)

Transcribes text from an audio file with optional timestamps and speaker labels. AI_TRANSCRIBE supports numerous languages, and audio can contain more than one language. Timestamps and speaker labels are extracted based on the specified timestamp granularity, as shown in the table below.

| Timestamp Granularity | Result |
| --------------------- | ------ |
| Default               | Transcription of entire audio file in one piece |
| Word                  | Transcription with timestamps for each word |
| Speaker               | Indicates who is speaking, and a timestamp, at each change of speaker |

The following example transcribes [an audio file](https://docs.snowflake.com/en/_downloads/a1ea941b9694993063b55e621dae1cd0/consultation.wav) stored in the financial_consultation stage, returning a text transcript of the entire file. The [TO_FILE function](https://docs.snowflake.com/en/sql-reference/functions/to_file) converts the staged file to a file reference.

```sql
SELECT AI_TRANSCRIBE(TO_FILE(
    '@financial_consultation', 'consultation.wav'));
```

produces

```json
{
  "audio_duration": 321.78,
  "text": "Good afternoon, Robert. Thanks for calling in today. I understand you had some concerns about your portfolio you wanted to discuss. Yes, I'm really worried. I've been watching the news and the market's been all over the place lately. I'm thinking maybe I should just sell everything, all my stocks and mutual funds and put it all in bonds or CDs. At least then I could sleep at night. I can definitely understand that concern, Robert. Market volatility can be unsettling, especially when you're seeing those daily swings in the headlines. Before we talk about any major moves, can you help me understand what specifically is driving this anxiety? Is it the recent tech sector pullback or something more general? It's everything. I'm 52 years old and I keep thinking about what happened in 2008. I lost so much then and I'm worried we're heading for another crash with this new administration. I can't afford to lose my retirement savings. Those are absolutely valid concerns, and I appreciate you sharing that context. That was a really challenging time for everyone. Let me ask you this. When we last reviewed your portfolio in March, we had you allocated at about 70% equities and 30% bonds, correct? And your target retirement age is still 62%. That's right. But honestly, 70% in stocks feels way too risky right now. I'm thinking more like 20% stocks, 80% bonds, maybe even less in stocks. I understand that instinct, Robert. Let's walk through this together. First, I want to remind you of something important. Your current portfolio is already designed with volatility in mind. You're not in individual stocks. You're in diversified index funds and some actively managed funds across different sectors and even international markets. But they're still going down. My quarterly statement showed I was down 8% this quarter alone. You're absolutely right, and that's painful to see, but let's put this in perspective. 
Over the past 12 months, even with this recent volatility, your portfolio is still up about 3%. The market has given back some gains, but we're not in crisis territory. Remember, we built your allocation specifically because you have 10 years until retirement. That time horizon is actually your biggest asset here. So you're saying I should just do nothing? Not exactly nothing, but I am suggesting we don't make dramatic changes based on short-term market movements. However, I do hear your concern about risk tolerance. What if we made a smaller adjustment? Instead of going to 20% stocks, what if we moved to 60% stocks and 40% bonds? That would reduce your equity exposure by 10%, which might help you sleep better, but wouldn't take you completely out of the growth potential you need for retirement. That actually sounds more reasonable, but I'm still worried about losing more money. I understand completely. Let me ask you this. What's your bigger worry, the volatility of the next year, two or two, or having enough money to retire comfortably at 62? Because if we get too conservative now, inflation alone could erode your purchasing power over the next decade. I didn't really thought about inflation that way. I guess I've been so focused on not losing money that I forgot about the money I might not make. Exactly. And remember, Robert, you're not alone in this. I've had this conversation with many clients over the past few weeks. The ones who stayed disciplined during previous market downturns are generally glad they did. What if we also set up a plan where we review your portfolio monthly for the next few months? That way you'll have regular check-ins and won't feel like you're just riding this out blindly. Monthly reviews would definitely help. And maybe the 60-40 split is a good compromise. I just, I don't want to be stupid about this. Robert, wanting to protect your retirement isn't stupid. It's exactly what you should be thinking about. 
The key is making sure we're protecting it in the right way. Staying invested in a diversified portfolio, even with some volatility, has historically been the best way to preserve and grow wealth over time. Okay, I think I can live with moving to 60% stocks, but if things get really bad... If things get really bad, we'll talk again. That's what I'm here for. And remember, we'll be reviewing this monthly anyway. You're not locked into anything forever. But I do want to emphasize that market timing is incredibly difficult, even for professionals. The goal isn't to avoid all volatility. It's to stay invested long enough to benefit from the market's long-term upward trend. All right, Sarah, let's do the rebalancing to 60-40 and I'll try to stop checking my account balance every day. It sounds like a solid plan, Robert. And yes, definitely limit the daily balance checking. That's a recipe for anxiety. I'll send you some research on historical market recoveries after our call and we'll schedule our first monthly review for next month. How does that sound? That sounds good. Thanks for talking me through this, Sarah. I feel a lot better than when I called. I'm so glad to hear that, Robert. Remember, staying invested requires patience, but your future self will thank you for it. I'll have the rebalancing done by tomorrow morning, and you should see the changes reflected in your account by Thursday. Perfect. Thanks again, Sarah. I thank you deeply for your patience and understanding. I'll talk to you next month."
}
```

------

### [AI_SENTIMENT](https://docs.snowflake.com/sql-reference/functions/ai_sentiment)

Returns the overall sentiment of the input text and, optionally, the sentiment for each category you specify.

```sql
SELECT AI_SENTIMENT('A tourist\'s delight, in low urban light,
    Recommended gem, a pizza night sight. Swift arrival, a pleasure so right,
    Yet, pockets felt lighter, a slight pricey bite. 💰🍕🚀');
```

produces

```json
{
  "categories": [
    {
      "name": "overall",
      "sentiment": "mixed"
    }
  ]
}
```

------

### [AI_EXTRACT](https://docs.snowflake.com/en/sql-reference/functions/ai_extract)

Extracts information from an input string or file and returns a JSON object containing the extracted information.

```sql
select AI_EXTRACT(TEXT => 'Employee Dan Gillis has his address listed as 77 Massachusetts Avenue, Cambridge, MA 02139',
                  RESPONSEFORMAT => {'last_name': 'What is the last name of the employee?', 'address': 'What is the address of the employee?'}) as response;
```

produces

```json
{
  "response": {
    "address": "77 Massachusetts Avenue, Cambridge, MA 02139",
    "last_name": "Gillis"
  }
}
```

------

### [AI_PARSE_DOCUMENT](https://docs.snowflake.com/en/sql-reference/functions/ai_parse_document)

Returns the extracted content from a document on a Snowflake stage as a JSON-formatted string. This function supports two types of extraction: Optical Character Recognition (OCR), and layout.

For detailed information and examples, see [AISQL AI_PARSE_DOCUMENT](https://docs.snowflake.com/en/user-guide/snowflake-cortex/parse-document).

## Exercise: AISQL Functions For Equity Research

Like traditional database operators, Snowflake's AI-powered functions perform column-level operations, except that they leverage LLMs. This is extremely helpful for processing unstructured data for downstream analytics. In this notebook, we are going to:

1. Parse PDF documents into text using [AI_PARSE_DOCUMENT](https://docs.snowflake.com/en/sql-reference/functions/ai_parse_document)
2. Extract entities using [AI_COMPLETE Structured Outputs](https://docs.snowflake.com/en/user-guide/snowflake-cortex/complete-structured-outputs)
3. Use Top-K and AI joins to map entities to S&P 500 tickers ([AI_FILTER](https://docs.snowflake.com/en/sql-reference/functions/ai_filter))
4. Summarize research insights across multiple articles for a given ticker (using [AI_AGG](https://docs.snowflake.com/sql-reference/functions/ai_agg))

This lab uses 7 PDF documents with equity research information in various forms. Each PDF contains bulleted data, table data, and text data. We'll use the [AI_PARSE_DOCUMENT](https://docs.snowflake.com/en/sql-reference/functions/ai_parse_document) function to extract this information into a table so that we can feed it into Cortex. We then extract the entities mentioned in those PDF documents into a table and join them to stock tickers using a marketplace listing of common stock tickers and company names. We will do this join using AI_FILTER so that LLM reasoning can match and clean up what the initial Top-K join returns. Finally, we will summarize the findings on one of the stocks using AI_AGG.

------

### **Step 1 - Complete the Prerequisites**

Use the provided file [financial_service_equity_research_setup.sql](./financial_service_equity_research_setup.sql) as an example. Ensure that your [Notebook Session Context](https://docs.snowflake.com/en/user-guide/ui-snowsight/notebooks-sessions) is set to match the database and schema names you choose.

------

### **Step 2 - File Validation**

To validate that the files are present in the stage, run the following:

```sql
LIST @equitydocs;
```

------

### **Step 3 - Document Parsing**

Let's parse the research pdf documents into text using [AI_PARSE_DOCUMENT](https://docs.snowflake.com/en/sql-reference/functions/ai_parse_document).

> NOTE: This step can take about a minute on an XS warehouse because of the calls made to the Cortex AI inference service.

```sql
CREATE OR REPLACE TABLE raw_docs_text AS
SELECT TO_FILE('@equitydocs', relative_path)                                 AS staged_file,
       TO_OBJECT(PARSE_JSON('{"mode": "layout", "page_split": false}'))      AS ai_parse_document_options,
       TO_VARIANT(ai_parse_document(staged_file, ai_parse_document_options)) AS raw_text_dict,
       raw_text_dict:content                                                 AS raw_text
FROM DIRECTORY(@equitydocs);
```


#### **Review: Document Parsing Process**

The parsing operation above performs several key steps to extract content from PDF documents:

##### 📁 **Stage Access & File Discovery**

- **Queried Directory Table**: Uses `FROM DIRECTORY(@equitydocs)` to access file metadata
  - Directory tables are implicit objects that provide file-level metadata without creating separate database objects
  - Returns information like file paths, sizes, and timestamps for all files in the stage

##### 🔗 **File Data Type Creation**  

- **Generated File References**: Used [TO_FILE()](https://docs.snowflake.com/en/sql-reference/functions/to_file) to create [FILE data types](https://docs.snowflake.com/en/sql-reference/data-types-unstructured#label-data-types-file)
  - Combines stage name with relative file paths from the directory table
  - Creates proper file references that AI functions can process

##### ⚙️ **AI_PARSE_DOCUMENT `<options>` Creation**

Used `TO_OBJECT(PARSE_JSON(...))` to create the [AI_PARSE_DOCUMENT](https://docs.snowflake.com/en/sql-reference/functions/ai_parse_document) `<options>` argument:
  - `PARSE_JSON()` converts the JSON string `{"mode": "layout", "page_split": false}` into a [VARIANT](https://docs.snowflake.com/en/sql-reference/data-types-semistructured#variant) data type
  - `TO_OBJECT()` then converts this `VARIANT` into the [OBJECT](https://docs.snowflake.com/en/sql-reference/data-types-semistructured#object) data type required by the function

##### 🧠 **AI_PARSE_DOCUMENT Function**

The [AI_PARSE_DOCUMENT](https://docs.snowflake.com/en/sql-reference/functions/ai_parse_document) function extracts content from documents stored on Snowflake stages and returns results as an **OBJECT** data type. We then used `TO_VARIANT()` to convert the OBJECT results into a VARIANT column, which enables flexible querying of semi-structured data and easy extraction of nested content using path notation such as `raw_text_dict:content`.

###### **Extraction Modes:**

| Mode | Output | Best For |
|------|--------|----------|
| **Layout** | Structured markdown with tables, headers, formatting | Preserving document structure and visual organization |
| **OCR** | Plain text only | Simple text extraction without formatting |

**Current Options:**
- We're using the `LAYOUT` mode option to maintain document structure, which allows us to work with both textual content and organizational elements like tables and formatting.
- The `"page_split": false` option ensures the entire document is processed as a single continuous text block, maintaining content flow across pages rather than treating each page as a separate entity.

---

### **Step 4 - Parsed Data Verification** ✅
Execute the following SQL to verify that [AI_PARSE_DOCUMENT](https://docs.snowflake.com/en/sql-reference/functions/ai_parse_document) successfully extracted information from the PDFs, including bullet points and chart data:

```sql
SELECT * FROM raw_docs_text;
```

------

### **Step 5 - Extract Company and Sentiment**

Now we can extract the company and a sentiment from the document using [AI_COMPLETE](https://docs.snowflake.com/en/sql-reference/functions/ai_complete) and [AI COMPLETE Structured Outputs](https://docs.snowflake.com/en/user-guide/snowflake-cortex/complete-structured-outputs) reading from the documents we just parsed above.

[AI COMPLETE Structured Outputs](https://docs.snowflake.com/en/user-guide/snowflake-cortex/complete-structured-outputs) lets you supply a [JSON schema](https://json-schema.org/) that completion responses must follow. This reduces the need for post-processing in your AI data pipelines and enables seamless integration with systems that require deterministic response formatting. `AI_COMPLETE` verifies each generated token against your JSON schema to ensure that the response conforms to the supplied schema.

Every model supported by [AI_COMPLETE](https://docs.snowflake.com/en/sql-reference/functions/ai_complete) supports structured output, but the most powerful models typically generate higher quality responses.

```sql
CREATE OR REPLACE TABLE ENTITY_EXTRACTION_EXAMPLE AS
SELECT *,
    -- Structured output: one {company, sentiment} object per company found
    AI_COMPLETE(
      model => 'claude-3-7-sonnet',
      prompt => 'You are tasked with extracting companies from a research article. Extract "company" for each company that is identified and the "sentiment" which describes how the company was referenced:\n\n' || RAW_TEXT::text,
      response_format => {
          'type': 'json',
          'schema': {
              'type': 'object',
              'properties': {
                  'company_sentiment': {
                      'type': 'array',
                      'items': {
                          'type': 'object',
                          'properties': {
                              'company': {'type': 'string'},
                              'sentiment': {'type': 'string'}
                          }
                      }
                  }
              }
          }
      }
    ) AS extraction,
    -- Free-text summary of the same document
    AI_COMPLETE('llama3.1-70b', 'Summarize the text below: ' || raw_text) AS summary
FROM raw_docs_text;
```

The extraction column holds a JSON object whose `company_sentiment` array lists each company together with its sentiment. Double-click a cell in the extraction column to see the full value.

In [None]:
select extraction, * exclude extraction from  ENTITY_EXTRACTION_EXAMPLE;

### **Step 6 - Load Stock Tickers from Marketplace Listing**

Let's use the Marketplace listing we added during setup to get a list of common tickers. If you skipped that step, please refer to the prerequisites documentation.

In [None]:
CREATE OR REPLACE TABLE TICKERS_LIST AS 
-- DISTINCT applies to the whole (company_name, ticker) row, so no GROUP BY is needed.
SELECT DISTINCT
       company_name,
       ticker
FROM S__P_500_BY_DOMAIN_AND_AGGREGATED_BY_TICKERS_SAMPLE.DATAFEEDS.SP_500;

In [None]:
select * from TICKERS_LIST;

### **Step 7 - Map Company Entity to Ticker**

Let's map each extracted company entity to its S&P 500 ticker using a Top-K join and [AI_FILTER](https://docs.snowflake.com/en/sql-reference/functions/ai_filter). First we need to flatten the JSON object so that each company gets its own row, which makes the joins easier.

In [None]:
create or replace view flattened_extraction as 
SELECT 
    staged_file                       as staged_file,
    raw_text                          as raw_text,
    summary                           as summary,
    flattened.value:company::STRING   as company,
    flattened.value:sentiment::STRING as sentiment,
    extraction 
FROM 
    entity_extraction_example,
    LATERAL FLATTEN(INPUT => extraction:company_sentiment) AS flattened;

In [None]:
select company, sentiment, * exclude (company, sentiment) from flattened_extraction;
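
The flattening step above can be sketched in plain Python to show the row-per-element behavior. This is an illustration under assumptions: the `flatten` helper, the file names, and the company values are all invented, and the real work is done by `LATERAL FLATTEN` in SQL.

```python
# Hypothetical parsed rows; file names and values are invented.
rows = [
    {
        "staged_file": "report_a.pdf",
        "extraction": {
            "company_sentiment": [
                {"company": "Acme Corp", "sentiment": "positive"},
                {"company": "Globex", "sentiment": "negative"},
            ]
        },
    },
    # A document in which no company was found.
    {"staged_file": "report_b.pdf", "extraction": {"company_sentiment": []}},
]


def flatten(rows):
    """Yield one output row per element of extraction['company_sentiment']."""
    for row in rows:
        for item in row["extraction"].get("company_sentiment", []):
            yield {
                "staged_file": row["staged_file"],
                "company": item.get("company"),
                "sentiment": item.get("sentiment"),
            }


flat = list(flatten(rows))
for r in flat:
    print(r)
```

Note that `report_b.pdf` contributes no rows. `FLATTEN` behaves the same way by default: inputs with nothing to expand are dropped unless `OUTER => TRUE` is set.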

#### Company-to-Ticker Mapping with Similarity Matching

Now we'll map the extracted companies to S&P 500 tickers using a **Top-K similarity approach** followed by AI validation:

##### 1. **Top-K Join**: Find the most similar company names using embedding similarity scores

> **Note**: Similarity-based matching is one of several approaches for entity resolution. Alternative methods include exact string matching, fuzzy matching algorithms, or rule-based mapping systems.

In [None]:
CREATE OR REPLACE TABLE top_candidates AS
SELECT fl_get_relative_path(c.staged_file)                                                 AS file_name,
       c.raw_text                                                                          AS raw_text,
       c.summary                                                                           AS summary,
       c.company                                                                           AS company,
       c.sentiment                                                                         AS sentiment,
       c.extraction                                                                        AS extraction,
       d.company_name                                                                      AS company_name,
       d.ticker                                                                            AS ticker,
       ai_similarity(c.company, d.company_name, {'model':'snowflake-arctic-embed-m-v1.5'}) AS sim_score
FROM flattened_extraction c
         CROSS JOIN tickers_list d
QUALIFY ROW_NUMBER() OVER (PARTITION BY c.company, fl_get_relative_path(c.staged_file) ORDER BY sim_score DESC) <= 2;

The results show that the Top-K join always returns the two closest tickers for each extracted company, whether or not either one is a real match. Similarity alone is therefore not enough: some of the pairings it produces make no sense.

In [None]:
-- take a look from top k match - a lot of false positives due to this approach
-- is there a better way to approach this problem?
SELECT company      as extracted, 
       company_name as mapped_company,
       ticker       as mapped_ticker
FROM top_candidates;
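
To see why a Top-K join produces false positives, here is a small Python sketch of the same idea: score every extracted name against every ticker and keep the best `k` per name. It assumes a cosine-style similarity score; the two-dimensional vectors, the names, and the `top_k` helper are invented for illustration and stand in for real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Invented toy "embeddings"; real embedding vectors are much longer.
extracted = {"Acme": [1.0, 0.0], "Unlisted Startup": [0.0, 1.0]}
tickers = {"ACME": [0.9, 0.1], "GLBX": [0.7, 0.7], "INTC": [1.0, -0.2]}


def top_k(extracted, tickers, k=2):
    """Cross join every name with every ticker, then keep the k best scores."""
    result = {}
    for name, vec in extracted.items():
        scored = sorted(
            ((cosine(vec, tvec), ticker) for ticker, tvec in tickers.items()),
            reverse=True,
        )
        result[name] = [(ticker, round(score, 3)) for score, ticker in scored[:k]]
    return result


candidates = top_k(extracted, tickers)
for name, matches in candidates.items():
    print(name, matches)
```

Every name receives exactly `k` candidates, including "Unlisted Startup", which has no genuine counterpart in the ticker list. A ranking has no notion of "no match", which is why a second validation step is needed.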

##### 2. **AI_FILTER Validation**: Use LLM reasoning to confirm matches and eliminate false positives. We will use [AI_FILTER](https://docs.snowflake.com/en/sql-reference/functions/ai_filter) to keep only the candidate pairs that the model judges to be the same company.

In [None]:
-- Entity disambiguation: use AI_FILTER to narrow the Top-K candidates down to true matches.

CREATE OR REPLACE TABLE matched_candidates AS
SELECT file_name    AS file_name,
       raw_text     AS raw_text,
       summary      AS summary,
       company      AS extracted,
       company_name AS mapped_company,
       ticker       AS mapped_ticker
FROM top_candidates
WHERE ai_filter('Does this extracted company: ' || company || ' refer to the same company as this S&P 500 company: ' || company_name || '?')
ORDER BY file_name;

In [None]:
select extracted, mapped_company, mapped_ticker from matched_candidates;
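
It can help to see the exact text the model receives for each candidate pair. The sketch below builds a prompt of the same shape as the concatenation in the `WHERE` clause; the `build_filter_prompt` helper and the company pairs are hypothetical, and no model is called.

```python
def build_filter_prompt(company, company_name):
    """Build a yes/no question for one candidate pair (illustrative helper)."""
    return (
        "Does this extracted company: " + company
        + " refer to the same company as this S&P 500 company: "
        + company_name + "?"
    )


# Invented candidate pairs: one plausible match and one false positive.
pairs = [("Acme", "Acme Corporation"), ("Acme", "Apex Industries")]
prompts = [build_filter_prompt(c, n) for c, n in pairs]
for p in prompts:
    print(p)
```

Each candidate row becomes one self-contained yes/no question, so the filter is evaluated once per row of `top_candidates`. Keeping K small in the previous step limits how many questions are asked.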

### **Step 8 - Cross-Document Intelligence Synthesis**

Now we'll use [AI_AGG](https://docs.snowflake.com/en/sql-reference/functions/ai_agg) to consolidate research insights across all documents mentioning each ticker. This function aggregates multiple text sources and generates unified summaries that capture key themes, sentiment trends, and analytical perspectives from the entire document corpus.
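
As a rough mental model of what is handed to the aggregation, the Python sketch below groups rows by ticker and builds the per-row text with its ticker prefix. The rows are invented and the summarization itself is not reproduced; only the grouping and the input construction are shown.

```python
from collections import defaultdict

# Invented matched rows, for illustration only.
matched = [
    {"mapped_ticker": "MSFT", "raw_text": "Article one text."},
    {"mapped_ticker": "MSFT", "raw_text": "Article two text."},
    {"mapped_ticker": "NVDA", "raw_text": "Article three text."},
]

groups = defaultdict(list)
for row in matched:
    # Per-row input: the ticker prefix, a newline, then the document text.
    groups[row["mapped_ticker"]].append(
        "TICKER: " + row["mapped_ticker"] + "\n" + row["raw_text"]
    )

# Equivalent of count(*) per group.
count_research = {ticker: len(texts) for ticker, texts in groups.items()}
print(count_research)
```

Each group's list of texts is what the instruction prompt is applied to, producing one summary per ticker.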

In [None]:
select 
    mapped_ticker,
    count(*) as count_research,
    AI_AGG('TICKER: ' || mapped_ticker || '\n' || raw_text, 'You are provided with a few research articles about the company. Summarize, in bullet points, the discussions relevant to the company.') as aggregated_summary
from matched_candidates
where mapped_ticker = 'MSFT' -- other tickers you can also check are CRM, NVDA
group by mapped_ticker;

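Use a pandas DataFrame to display the `aggregated_summary` column from the previous cell, which contains the bullet points for MSFT. This assumes the previous SQL cell is named `AGGREGATED_INSIGHTS`, so that its result can be referenced from Python.

In [None]:
# AGGREGATED_INSIGHTS refers to the result of the previous SQL cell (by cell name).
df = AGGREGATED_INSIGHTS.to_pandas()
# Unquoted Snowflake column names are returned in uppercase.
print(df['AGGREGATED_SUMMARY'].iloc[0])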