# Create an Azure AI Search index and upload documents

This notebook demonstrates how to create the index and index the documents that the notebooks 'find_duplicates.ipynb' and 'rerank_chunks_and_generate_answer.ipynb' require.

The output is an index in the Azure AI Search service, populated with the indexed documents.

## Prerequisites

+ An Azure subscription, with [access to Azure OpenAI](https://aka.ms/oai/access).
+ An Azure OpenAI service with the service name and an API key.
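+ A deployment of the text-embedding-ada-002 embedding model on the Azure OpenAI Service. The setup cell also reads the name of a second embedding deployment from `AZURE_OPENAI_EMBEDDING_NAME_LARGE_3`, so that variable must be set as well.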
+ An Azure AI Search service with its endpoint, an admin API key (creating an index and uploading documents requires write access), and the name of the index to create.

We used Python 3.12.5, [Visual Studio Code with the Python extension](https://code.visualstudio.com/docs/python/python-tutorial), and the [Jupyter extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) to test this example.

### Set up a Python virtual environment in Visual Studio Code

1. Open the Command Palette (Ctrl+Shift+P).
1. Search for **Python: Create Environment**.
1. Select **Venv**.
1. Select a Python interpreter. Choose 3.10 or later.

It can take a minute to set up. If you run into problems, see [Python environments in VS Code](https://code.visualstudio.com/docs/python/environments).

### Install packages

In [None]:
! pip install openai
! pip install azure-search-documents
! pip install python-dotenv

## Import packages and create AOAI and AI Search clients

In [1]:
import os
import time
from dotenv import load_dotenv
from openai import AzureOpenAI
from azure.core.credentials import AzureKeyCredential
import sys
# Add the directory two levels up to the import path so that rag_utils can be imported
sys.path.append('../..')
import rag_utils

# Load environment variables from .env
load_dotenv(override=True)

# AZURE AI SEARCH
ai_search_endpoint = os.environ["SEARCH_SERVICE_ENDPOINT"]
# Creating an index and uploading documents requires an admin key; a query key is read-only
ai_search_apikey = os.environ["SEARCH_SERVICE_QUERY_KEY"]
ai_search_index_name = os.environ["SEARCH_INDEX_NAME"]
ai_search_credential = AzureKeyCredential(ai_search_apikey)

# AZURE OPENAI FOR EMBEDDING
aoai_embedding_endpoint = os.environ["AZURE_OPENAI_EMBEDDING_ENDPOINT"]
azure_openai_embedding_key = os.environ["AZURE_OPENAI_EMBEDDING_API_KEY"]
embedding_model_name_ada = os.environ["AZURE_OPENAI_EMBEDDING_NAME_ADA"]
embedding_model_name_large_3 = os.environ["AZURE_OPENAI_EMBEDDING_NAME_LARGE_3"]

# API version shared by both AOAI embedding clients
aoai_api_version = '2024-02-15-preview'

# AOAI client for embedding creation (ADA)
aoai_embedding_client_ada = AzureOpenAI(
    azure_deployment=embedding_model_name_ada,
    api_version=aoai_api_version,
    azure_endpoint=aoai_embedding_endpoint,
    api_key=azure_openai_embedding_key
)

# AOAI client for embedding creation (large-3)
aoai_embedding_client_large_3 = AzureOpenAI(
    azure_deployment=embedding_model_name_large_3,
    api_version=aoai_api_version,
    azure_endpoint=aoai_embedding_endpoint,
    api_key=azure_openai_embedding_key
)
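
The setup cell reads each setting with `os.environ[...]`, so a missing entry in `.env` surfaces as a bare `KeyError` on the first variable it hits. A minimal sketch of an up-front check follows. The variable names are copied from the cell above; the helper `missing_env_vars` is hypothetical and not part of `rag_utils`.

```python
import os
from typing import Mapping

# Variable names copied from the setup cell
REQUIRED_ENV_VARS = [
    "SEARCH_SERVICE_ENDPOINT",
    "SEARCH_SERVICE_QUERY_KEY",
    "SEARCH_INDEX_NAME",
    "AZURE_OPENAI_EMBEDDING_ENDPOINT",
    "AZURE_OPENAI_EMBEDDING_API_KEY",
    "AZURE_OPENAI_EMBEDDING_NAME_ADA",
    "AZURE_OPENAI_EMBEDDING_NAME_LARGE_3",
]


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank in env.

    Hypothetical helper for illustration; not part of rag_utils.
    """
    return [name for name in REQUIRED_ENV_VARS if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` right after `load_dotenv(override=True)` and raising an error when the returned list is non-empty reports every missing setting at once.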

## Create the Azure AI Search index

In [2]:
from rag_utils import create_index
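
The body of `rag_utils.create_index` is not shown in this notebook. As a rough illustration of what a helper like it might send to the service, the sketch below builds an index definition as a plain dictionary in the shape of the Azure AI Search REST "Create Index" request body. The field names (`id`, `content`, `embedding`), the profile and algorithm names, and the `'large-3'` model key are assumptions made for this example, and the 3072 dimension assumes that the large-3 deployment is text-embedding-3-large with its default output size.

```python
# Default output sizes of the embedding models (assumption: 'large-3' means
# text-embedding-3-large at its default dimensionality)
EMBEDDING_DIMENSIONS = {"ada": 1536, "large-3": 3072}


def build_index_definition(index_name: str, model_key: str) -> dict:
    """Build a REST-style index definition with one vector field.

    Hypothetical schema for illustration; the real rag_utils.create_index
    may use different fields and settings.
    """
    dimensions = EMBEDDING_DIMENSIONS[model_key]
    return {
        "name": index_name,
        "fields": [
            # Document key; every index needs exactly one key field
            {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
            # Chunk text, available for full-text search
            {"name": "content", "type": "Edm.String", "searchable": True},
            # Embedding vector; its size must match the embedding model
            {
                "name": "embedding",
                "type": "Collection(Edm.Single)",
                "searchable": True,
                "dimensions": dimensions,
                "vectorSearchProfile": "default-profile",
            },
        ],
        "vectorSearch": {
            "algorithms": [{"name": "default-hnsw", "kind": "hnsw"}],
            "profiles": [{"name": "default-profile", "algorithm": "default-hnsw"}],
        },
    }
```

The point of passing a model key such as `'ada'` is visible here: the vector field's dimension is fixed when the index is created, so an index built for one embedding model cannot store vectors from a model with a different output size.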

## Index documents in the Azure AI Search index

In [3]:
from rag_utils import index_documents
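
The body of `rag_utils.index_documents` is likewise not shown. The sketch below illustrates one plausible shape for the work it does: turn each chunk into a document with a key, the text, and an embedding, then group the documents into batches for upload. It assumes that the chunks arrive as a mapping from file name to text, and it takes the embedding function as a parameter so the logic can be tried without calling Azure OpenAI. The names `make_document_key` and `build_batches` and the document fields are hypothetical.

```python
import base64
from typing import Callable, Iterator, Mapping


def make_document_key(file_name: str) -> str:
    """Encode a file name using only characters that are valid in a document key.

    Document keys may not contain characters such as '.', so the name is
    URL-safe base64 encoded (letters, digits, '-', '_', '=').
    """
    return base64.urlsafe_b64encode(file_name.encode("utf-8")).decode("ascii")


def build_batches(
    chunks: Mapping[str, str],
    embed: Callable[[str], list[float]],
    batch_size: int = 100,
) -> Iterator[list[dict]]:
    """Yield lists of at most batch_size documents ready for upload.

    Hypothetical helper; assumes chunks maps file name -> chunk text.
    """
    batch: list[dict] = []
    for file_name, text in chunks.items():
        batch.append(
            {
                "id": make_document_key(file_name),
                "content": text,
                "embedding": embed(text),
            }
        )
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the final, possibly smaller, batch
    if batch:
        yield batch
```

In this notebook, `embed` could wrap `aoai_embedding_client_ada.embeddings.create(input=text, model=embedding_model_name_ada).data[0].embedding`, and each batch could be passed to `SearchClient.upload_documents`. Batching keeps each upload request small and makes a failed request cheaper to retry.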

## Create the Azure AI Search index and index markdown chunked documents

In [None]:
# Create the index
create_index(ai_search_endpoint, ai_search_credential, 'project_assurance_ada', 'ada')

# Read the markdown chunk files (stored with a .txt extension) from the input directory
input_dir = '../../data_out/chunk_files'
chunk_contents = rag_utils.load_files(input_dir, '.txt')

# Index the chunks with the ada embedding model
index_documents(ai_search_endpoint, ai_search_credential, 'project_assurance_ada', aoai_embedding_client_ada, 'ada', chunk_contents)
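
The cell above relies on `rag_utils.load_files` to read the chunk files, and its implementation and return type are not shown here. A minimal sketch of a loader with the same two arguments follows, assuming it returns a dictionary that maps each file name to the file's text. The function name `load_text_files` is hypothetical.

```python
from pathlib import Path


def load_text_files(input_dir: str, extension: str) -> dict[str, str]:
    """Return {file name: file text} for files in input_dir ending with extension.

    Hypothetical stand-in for rag_utils.load_files; the real helper may
    return a different structure.
    """
    contents: dict[str, str] = {}
    # Sort the paths so the chunks are processed in a stable order
    for path in sorted(Path(input_dir).iterdir()):
        if path.is_file() and path.suffix == extension:
            contents[path.name] = path.read_text(encoding="utf-8")
    return contents
```

Reading with an explicit `encoding="utf-8"` avoids platform-dependent defaults, which matters when the chunks contain non-ASCII characters.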
