### Load and Index Data - Vector Store

We will use [Azure AI Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) to load and index the data. Azure AI Search is a cloud search service with built-in AI capabilities that enrich many types of information so that relevant content can be identified and explored at scale. It uses the same integrated Microsoft natural language stack that Bing and Office have used for more than a decade, along with AI services across vision, language, and speech, to deliver knowledge from structured and unstructured data.

Azure AI Search supports vector search, a technique that can significantly improve the semantic relevance of search results. A machine learning model embeds text into a vector space in which the distance between vectors measures semantic similarity, so a similarity search over those vectors returns the results closest in meaning to the query.
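To make "distance between vectors" concrete, the following is a minimal sketch of cosine similarity, one common similarity measure for embeddings. The three-dimensional vectors and document names are made up for illustration; real embedding models produce vectors with many more dimensions, and the search service performs this comparison internally.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values, not produced by a real model)
query_vec = [0.9, 0.1, 0.0]
documents = {
    "fabric-overview": [0.8, 0.2, 0.1],
    "cooking-recipe": [0.0, 0.1, 0.9],
}

# Rank documents from most to least similar to the query
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query_vec, documents[name]),
    reverse=True,
)
print(ranked)
```

The document whose vector points in nearly the same direction as the query vector ranks first, which is the behavior a vector search relies on.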

Azure AI Search can index and store vectors, but it doesn't generate them out of the box. The documents that you push to your search service must already contain vectors in the payload. Alternatively, you can use an indexer to pull vectors from data sources such as JSON or CSV files in Blob Storage, or use a custom skill to generate embeddings as part of the AI enrichment process.
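As a rough illustration of a document that already carries its vector in the payload, the sketch below builds one such document as a plain dictionary. The `id`, `content`, and `sourcefile` field names match the fields printed by the search cells later in this section; the `contentVector` field name, the four-dimensional vector, and the `build_document` helper are assumptions made for this example, not the workshop's actual index schema.

```python
import json
from typing import Dict, List

# Hypothetical dimension; in practice it must match the index's vector field
EMBEDDING_DIMENSIONS = 4


def build_document(
    doc_id: str, content: str, source_file: str, vector: List[float]
) -> Dict[str, object]:
    """Assemble one search document that includes a precomputed vector."""
    if len(vector) != EMBEDDING_DIMENSIONS:
        raise ValueError(
            f"expected {EMBEDDING_DIMENSIONS} dimensions, got {len(vector)}"
        )
    return {
        "id": doc_id,
        "content": content,
        "sourcefile": source_file,
        "contentVector": vector,  # assumed field name for the embedding
    }


document = build_document(
    "Fabric_Get_Started_pdf-1",
    "Microsoft Fabric is a unified platform that can meet your "
    "organization's data and analytics needs.",
    "Fabric Get Started.pdf",
    [0.12, -0.03, 0.88, 0.45],  # made-up values, not a real embedding
)
print(json.dumps(document, indent=2))
```

Validating the vector length before upload is a design choice in this sketch: a dimension mismatch is easier to diagnose on the client than as a rejected indexing request.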

See the [sample repo](https://github.com/Azure/cognitive-search-vector-pr) to get started with vector search.

#### Set the Environment Variables

In [1]:
# Import Python libraries
import os
import openai
from Utilities.envVars import *

# Set OpenAI API key and endpoint
openai.api_type = "azure"
openai.api_version = OpenAiVersion
openai_api_key = OpenAiKey
assert openai_api_key, "ERROR: Azure OpenAI Key is missing"
openai.api_key = openai_api_key
openAiEndPoint = f"{OpenAiEndPoint}"
openai.api_base = openAiEndPoint

# Search index name, e.g. "chatindex" (set in the .env file)
indexName = SearchIndex

#### Import Required Libraries

### LangChain
[LangChain](https://python.langchain.com/en/latest/index.html) is a framework for developing applications powered by language models. Its authors argue that the most powerful and differentiated applications will not only call a language model, but will also be:
- Data-aware: connect a language model to other sources of data
- Agentic: allow a language model to interact with its environment

The LangChain framework is designed around these principles.

We will use the LangChain framework for the rest of the workshop.

In [2]:
# Import required libraries
from langchain.llms.openai import AzureOpenAI, OpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import (
    PDFMinerLoader,
    UnstructuredFileLoader,
)
from Utilities.cogSearch import createSearchIndex, indexSections

In [3]:
# Flexibility to change the call to OpenAI or Azure OpenAI
embeddingModelType = "azureopenai"

#### Load the PDF, split it into chunks, and push them to Azure AI Search

In [4]:
# Set the file name and the path to the PDF
fileName = "Fabric Get Started.pdf"
fabricGetStartedPath = "Data/PDF/" + fileName
# Load the PDF with the PDFMinerLoader document loader from LangChain
loader = PDFMinerLoader(fabricGetStartedPath)
rawDocs = loader.load()
# Record the source path in each document's metadata
for doc in rawDocs:
    doc.metadata['source'] = fabricGetStartedPath

# Split the documents into chunks of at most 1500 characters with no overlap
textSplitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=0)
docs = textSplitter.split_documents(rawDocs)

#### What is a chunk, and why do chunk_size and chunk_overlap matter?
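A chunk is a piece of the source text that is embedded and indexed as its own unit. In `RecursiveCharacterTextSplitter`, `chunk_size` caps the length of each chunk (counted in characters by default) and `chunk_overlap` sets how much text consecutive chunks share. Smaller chunks give more focused matches but carry less context each, while overlap reduces the chance that a sentence is cut in half at a chunk boundary, at the cost of indexing some text twice. The sketch below is a deliberately simplified fixed-window splitter written for illustration (the `chunk_text` function is hypothetical); the real LangChain splitter additionally tries to break on separators such as paragraph and line breaks, so its chunks will differ.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


sample = "abcdefghij"
no_overlap = chunk_text(sample, chunk_size=4, chunk_overlap=0)
with_overlap = chunk_text(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)    # ['abcd', 'efgh', 'ij']
print(with_overlap)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

With overlap, each chunk repeats the last two characters of the previous one, so more chunks are needed to cover the same text.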

In [5]:
# Call the helper function to create the search index
createSearchIndex(SearchService, SearchKey, indexName)

Search index chatindex already exists


In [6]:
# Call the helper function to index the chunks into the search index
indexSections(OpenAiEndPoint, OpenAiKey, OpenAiVersion, OpenAiApiKey, SearchService, SearchKey, embeddingModelType, OpenAiEmbedding, fileName, indexName, docs)

Total docs: 58
Indexing sections from 'Fabric Get Started.pdf' into search index 'chatindex'
	Indexed 58 sections, 58 succeeded


#### Perform Vector Search

In [7]:
from Utilities.cogSearch import performCogSearch

# Pure Vector Search
query = "What is Microsoft Fabric"  

results = performCogSearch(OpenAiEndPoint, OpenAiKey, OpenAiVersion, OpenAiApiKey, SearchService, SearchKey, embeddingModelType, OpenAiEmbedding, query, indexName, 3)

for result in results:  
    print(f"Id: {result['id']}")  
    print(f"Content: {result['content']}")  
    print(f"Source File: {result['sourcefile']}\n") 

Id: Fabric_Get_Started_pdf-1
Content: Tell us about your PDF experience.

Microsoft Fabric get started
documentation

Microsoft Fabric is a unified platform that can meet your organization's data and
analytics needs. Discover the Fabric shared and platform documentation from this page.

About Microsoft Fabric

ｅ OVERVIEW

What is Fabric?

Fabric terminology

ｂ GET STARTED

Start a Fabric trial

Fabric home navigation

End-to-end tutorials

Context sensitive Help pane

Get started with Fabric items

ｐ CONCEPT

Find items in OneLake data hub

Promote and certify items

ｃ HOW-TO GUIDE

Apply sensitivity labels

Workspaces

ｐ CONCEPT

Fabric workspace

Workspace roles

ｂ GET STARTED

Create a workspace

ｃ HOW-TO GUIDE

Workspace access control

What is Microsoft Fabric?

Article • 05/23/2023

Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything

from data movement to data science, Real-Time Analytics, and business intelligence. It

offers a comprehe

In [8]:
# Vector Search with Multi-language support
query = "¿Qué es Microsoft Fabric?"

results = performCogSearch(OpenAiEndPoint, OpenAiKey, OpenAiVersion, OpenAiApiKey, SearchService, SearchKey, embeddingModelType, OpenAiEmbedding, query, indexName, 3)
  
for result in results:  
    print(f"Id: {result['id']}")  
    print(f"Content: {result['content']}")  
    print(f"Source File: {result['sourcefile']}\n") 

Id: Fabric_Get_Started_pdf-1
Content: Tell us about your PDF experience.

Microsoft Fabric get started
documentation

Microsoft Fabric is a unified platform that can meet your organization's data and
analytics needs. Discover the Fabric shared and platform documentation from this page.

About Microsoft Fabric

ｅ OVERVIEW

What is Fabric?

Fabric terminology

ｂ GET STARTED

Start a Fabric trial

Fabric home navigation

End-to-end tutorials

Context sensitive Help pane

Get started with Fabric items

ｐ CONCEPT

Find items in OneLake data hub

Promote and certify items

ｃ HOW-TO GUIDE

Apply sensitivity labels

Workspaces

ｐ CONCEPT

Fabric workspace

Workspace roles

ｂ GET STARTED

Create a workspace

ｃ HOW-TO GUIDE

Workspace access control

What is Microsoft Fabric?

Article • 05/23/2023

Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything

from data movement to data science, Real-Time Analytics, and business intelligence. It

offers a comprehe