# Indexing Delimited Files on Azure AI Search using Console and Notebook

## Overview
LLMs are often paired with vector databases (DBs) so that their answers can be grounded in your own data. Several tutorials in this repo build vector DBs from unstructured data such as PDF documents. Here we build one from structured data, which requires a few additional steps: we vectorize (embed) a CSV file, index it with Azure AI Search, and then query the index using a GPT model deployed within Azure AI Studio.
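
To make the idea of embedding structured data more concrete, the sketch below shows one common way to turn each CSV row into a single text string before it is embedded. This is only an illustration, not necessarily what Azure AI Search does internally: the sample rows are made-up placeholders, `row_to_text` is a hypothetical helper, and the column names are assumed to match those used in the query later in this notebook.

```python
import csv
import io

# Made-up placeholder rows; the column names are assumed to match the grant CSV.
SAMPLE_CSV = """Project_Number,Project_Title,Total_Cost
R01CA000001,Imaging biomarkers for breast cancer,350000
R21AI000002,Novel vaccine adjuvants,200000
"""

def row_to_text(row):
    """Flatten one CSV row into a 'column: value' string that could be embedded."""
    return "; ".join(f"{key}: {value}" for key, value in row.items())

# Read the CSV into dictionaries, then serialize each row to text.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
documents = [row_to_text(row) for row in rows]
print(documents[0])
```

Each string in `documents` would then be passed to an embedding model, so that every row of the table becomes one vector in the index.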

This notebook differs slightly from the tutorial titled `AzureAIStudio_index_structured_notebook.ipynb` in that here we create the index within Azure AI Search directly, rather than in the notebook. We also use NIH grant data here rather than a Kaggle dataset. 

## Prerequisites
We assume you have access to Azure AI Studio and an Azure AI Search service, and that you have already deployed an LLM.

## Learning objectives

This tutorial will cover the following topics:
+ Introduce embeddings from structured data
+ Create an Azure AI Search index from the console
+ Query an Azure AI Search index from the command line using LLMs

## Get started

### Install packages

Use the Python3 (ipykernel) kernel.

In [None]:
%pip install langchain openai azure-search-documents pandas

Import libraries

In [None]:
import os
import pandas as pd
from openai import AzureOpenAI


### Connect to an index
This is the index you created via [these instructions](https://github.com/STRIDES/NIHCloudLabAzure/blob/main/docs/create_index_from_csv.md).
Look [here](https://learn.microsoft.com/en-us/azure/search/search-create-service-portal#name-the-service) for your endpoint name, and [here](https://learn.microsoft.com/en-us/azure/search/search-security-api-keys?tabs=portal-use%2Cportal-find%2Cportal-query#find-existing-keys) for your index key (the API key of your search service).

In [None]:
endpoint="<Your AI Search Endpoint>"
index_name="<Your Index Name>"
index_key='<Your Index Key>'

In [None]:
#connect to vector store   
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential

search_client = SearchClient(endpoint, index_name, AzureKeyCredential(index_key))

### Connect to your model
First, make sure you have a [model deployed](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-openai), and if not, deploy a model.
To get your endpoint, key, and version number, just go to the Chat Playground and click **View Code** at the top.

In [None]:
#connect to model
os.environ["AZURE_OPENAI_ENDPOINT"] = "<Azure AI Studio Endpoint>"
os.environ["AZURE_OPENAI_API_KEY"] = "<Azure AI Studio API Key>"

client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),  # must match the variable name set above
    api_version="2023-08-01-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)
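
Before querying, it can help to confirm that the endpoint and key were actually filled in. The following is a minimal sketch, assuming the two environment variable names used in the cell above; `missing_settings` is a hypothetical helper (not part of the `openai` package), and the example endpoint URL is a made-up placeholder.

```python
import os

# Names of the settings the model client reads (taken from the cell above).
REQUIRED = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY")

def missing_settings(names=REQUIRED, environ=None):
    """Return the names that are unset, empty, or still hold a '<...>' placeholder."""
    environ = os.environ if environ is None else environ
    missing = []
    for name in names:
        value = environ.get(name, "")
        if not value or value.startswith("<"):
            missing.append(name)
    return missing

# Demonstration with an explicit dictionary instead of the real environment.
example_env = {
    "AZURE_OPENAI_ENDPOINT": "https://example.openai.azure.com/",  # placeholder URL
    "AZURE_OPENAI_API_KEY": "<Azure AI Studio API Key>",           # still a placeholder
}
print(missing_settings(environ=example_env))
```

Calling `missing_settings()` with no arguments checks the real environment; an empty list suggests both values have been set.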

### Query the Vector Store

First, enter your question. Feel free to experiment with different variations or prompts.

In [None]:
query = " \
    Your input data is a list of grants. \
    Based on only the 'Project_Title' \
    list the 'Project_Number' and 'Total_Cost' \
    of all grants related to breast cancer \
"

Now we retrieve matching records from the index and pass them, along with the query, to our LLM as context.

In [None]:
#retrieve matching records from the index, then send them to the model as context
search_results = str(list(search_client.search(query)))
response = client.chat.completions.create(
    model="gpt-4",  # use the name of your own model deployment
    messages=[
        {"role": "system", "content": "You are an NIH Program Officer"},
        {"role": "user", "content": "Context: "+ search_results + "\n\n Query: " + query}
    ],
)
#view model output
response.choices[0].message.content.strip()
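
Passing `str(list(...))` to the model sends every field of every result, including search metadata such as `@search.score`. As one possible refinement, the sketch below keeps only the fields the question needs and caps the number of records. `build_context` is a hypothetical helper, the sample records are made up, and the field names are assumed to match the columns named in the query above.

```python
def build_context(results, fields=("Project_Number", "Project_Title", "Total_Cost"), max_docs=20):
    """Format dictionary-like search results as one compact line per record."""
    lines = []
    for position, doc in enumerate(results):
        if position >= max_docs:
            break  # cap the context size so the prompt stays small
        lines.append(" | ".join(f"{field}: {doc.get(field, '')}" for field in fields))
    return "\n".join(lines)

# Made-up records shaped like search results (plain dictionaries plus a score).
sample_results = [
    {"Project_Number": "R01CA000001", "Project_Title": "Imaging biomarkers for breast cancer",
     "Total_Cost": 350000, "@search.score": 1.7},
    {"Project_Number": "R21AI000002", "Project_Title": "Novel vaccine adjuvants",
     "Total_Cost": 200000, "@search.score": 0.4},
]
print(build_context(sample_results, max_docs=1))
```

With the live client, something like `build_context(search_client.search(query))` could be used in place of `str(list(search_client.search(query)))` when assembling the user message.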

## Conclusion
That's it! You have built a simple chatbot that runs queries against structured data. This is a hard problem, and many blog posts describe more sophisticated architectures. We encourage you to explore them and see whether you can come up with a better solution for your use case.

Key skills you learned were to: 
+ Create embeddings and a vector store using Azure AI Search in the console
+ Send prompts to the LLM grounded on your structured data from the command line

## Clean up

Remember to shut down your Azure ML compute, delete your Azure AI Search resource, and optionally delete your deployed models in AI Studio.