### Load and Index Data - Vector Store

We will use [Azure Cognitive Search](https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search) to load and index the data.  Azure Cognitive Search is a cloud search service with built-in AI capabilities that enrich all types of information to easily identify and explore relevant content at scale. It uses the same integrated Microsoft natural language stack that Bing and Office have used for more than a decade, and AI services across vision, language, and speech, to deliver knowledge from structured and unstructured data.

Azure Cognitive Search now supports vector search. When done correctly, vector search is a proven technique for significantly increasing the semantic relevance of search results. It uses machine learning to embed text into a vector space, where the distance between vectors is a measure of semantic similarity, so a similarity search over those vectors returns the most relevant results. [Sign up](https://aka.ms/VectorSearchSignUp) for the Private Preview of Vector Search.
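
To make the idea of "distance between vectors" concrete, here is a minimal sketch of similarity ranking in plain Python. The three-dimensional vectors and the document names are made up for illustration; real embeddings have many more dimensions, and a search service would typically use an approximate nearest-neighbor index rather than this brute-force loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the ids of the k documents most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings" (hypothetical values, for illustration only)
toy_vectors = {
    "doc-analytics": [0.9, 0.1, 0.0],
    "doc-cooking": [0.0, 0.2, 0.9],
    "doc-data": [0.8, 0.3, 0.1],
}
query_vector = [1.0, 0.2, 0.0]

nearest = top_k(query_vector, toy_vectors, k=2)
print(nearest)
```

The query vector points in nearly the same direction as the two data-related documents, so they rank ahead of the unrelated one.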

Cognitive Search can index and store vectors, but it doesn't generate them out of the box. The documents that you push to your search service must contain vectors within the payload. Alternatively, you can use the Indexer to pull vectors from your data sources such as Blob Storage JSON files or CSVs. You can also use a Custom Skill to generate embeddings as part of the AI Enrichment process.
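
As a rough illustration of "vectors within the payload", the sketch below builds a document that carries its own embedding. The field names `id`, `content`, and `sourcefile` mirror the fields printed later in this notebook; the `contentVector` field name and the `fake_embed` function are assumptions made for this example, not part of the helper code used here. In practice the vector would come from an embedding model.

```python
import json


def fake_embed(text, dims=4):
    """Stand-in for a real embedding model: returns a deterministic toy vector."""
    total = sum(ord(ch) for ch in text)
    return [round((total % (i + 7)) / (i + 7), 3) for i in range(dims)]


def build_document(doc_id, content, source_file, embed=fake_embed):
    """Assemble one search document that includes its vector."""
    return {
        "id": doc_id,
        "content": content,
        "sourcefile": source_file,
        "contentVector": embed(content),  # hypothetical vector field name
    }


doc = build_document(
    "example-1",
    "Microsoft Fabric is an analytics platform.",
    "example.pdf",
)
print(json.dumps([doc], indent=2))
```

The point of the sketch is only that the search service receives the vector alongside the text; it does not compute the vector itself.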


See the [sample repo](https://github.com/Azure/cognitive-search-vector-pr) to get started with vector search.

#### Prerequisites
- To run the code, install the package below from the local wheel file. Alternatively, install azure-search-documents==11.4.0a20230509004 from the Dev Feed. For instructions on how to connect to the Dev Feed, see the Azure SDK for Python Azure Search Documents [Dev Feed](https://dev.azure.com/azure-sdk/public/_artifacts/feed/azure-sdk-for-python/connect/pip).
- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/).
- An Azure Cognitive Search service (any tier, any region). [Create a service](https://learn.microsoft.com/en-us/azure/search/search-create-service-portal) or find an [existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) under your current subscription.

In [None]:
%pip install ./azure_search_documents-11.4.0b4-py3-none-any.whl

In [None]:
# Install langchain
%pip install langchain

#### Set the Environment Variables

In [1]:
import os  
import json  
import openai
from Utilities.envVars import *

# Set the search index name (the search service name and key are passed to the helper functions later)
indexName = SearchIndex

# Set OpenAI API key and endpoint
openai.api_type = "azure"
openai.api_version = OpenAiVersion
openai_api_key = OpenAiKey
assert openai_api_key, "ERROR: Azure OpenAI Key is missing"
openai.api_key = openai_api_key
# Validate the service name before building the URL; the formatted string itself is never empty
assert OpenAiService, "ERROR: Azure OpenAI Endpoint is missing"
openAiEndPoint = f"https://{OpenAiService}.openai.azure.com"
assert "openai.azure.com" in openAiEndPoint.lower(), "ERROR: Azure OpenAI Endpoint should be in the form: \n\n\t<your unique endpoint identifier>.openai.azure.com"
openai.api_base = openAiEndPoint
davincimodel = OpenAiDavinci
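
The contents of `Utilities.envVars` are not shown in this notebook. As a hedged sketch only, a settings loader along these lines would fail early when a value is missing. The setting names are taken from the variables used in this notebook; the `load_settings` and `azure_openai_endpoint` functions are hypothetical and are not the actual module.

```python
import os

# Names taken from the variables used in this notebook
REQUIRED_SETTINGS = (
    "OpenAiService",
    "OpenAiKey",
    "OpenAiVersion",
    "SearchService",
    "SearchKey",
    "SearchIndex",
)


def load_settings(env=None):
    """Read the required settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_SETTINGS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing settings: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_SETTINGS}


def azure_openai_endpoint(service_name):
    """Build the endpoint URL in the same form the cell above uses."""
    return f"https://{service_name}.openai.azure.com"


# Demo with placeholder values instead of real secrets
demo_env = {name: f"<{name}>" for name in REQUIRED_SETTINGS}
demo_env["OpenAiService"] = "my-openai"
settings = load_settings(demo_env)
endpoint = azure_openai_endpoint(settings["OpenAiService"])
print(endpoint)
```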


#### Import Required Libraries

In [2]:
# Import required libraries
from langchain.llms.openai import AzureOpenAI, OpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import (
    PDFMinerLoader,
    UnstructuredFileLoader,
)
from Utilities.cogSearch import createSearchIndex, indexSections

#### Load the PDF, split it into chunks, and push them to Azure Cognitive Search

In [4]:
# Flexibility to change the call to OpenAI or Azure OpenAI
embeddingModelType = "openai"

In [4]:
# Set the file name and the path of the PDF to index
fileName = "Fabric Get Started.pdf"
fabricGetStartedPath = "Data/PDF/" + fileName
# Load the PDF with a document loader from LangChain
loader = PDFMinerLoader(fabricGetStartedPath)
rawDocs = loader.load()
# Set the source 
for doc in rawDocs:
    doc.metadata['source'] = fabricGetStartedPath

textSplitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=0)
docs = textSplitter.split_documents(rawDocs)
# Call the helper functions to create the index and index the sections
createSearchIndex(SearchService, SearchKey, indexName)
indexSections(OpenAiService, OpenAiKey, OpenAiVersion, OpenAiApiKey, SearchService, SearchKey, embeddingModelType, fileName, indexName, docs)
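
The helper functions in `Utilities.cogSearch` are not shown here, so the following is only a simplified sketch of the two ideas in the cell above: splitting text into chunks and giving each chunk a key. `chunk_text` is a plain fixed-width splitter, which is simpler than `RecursiveCharacterTextSplitter` (the LangChain splitter prefers to break on separators such as paragraphs and lines). `section_id` is an assumed scheme inferred from ids such as `Fabric_Get_Started_pdf-3` in the search results; the real helper may build ids differently.

```python
import re


def chunk_text(text, chunk_size=1500, chunk_overlap=0):
    """Split text into fixed-width chunks, optionally overlapping."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def section_id(file_name, index):
    """Build a key from the file name and chunk index (assumed scheme)."""
    safe_name = re.sub(r"[^0-9a-zA-Z_-]", "_", file_name)
    return f"{safe_name}-{index}"


sample_chunks = chunk_text("a" * 3200)
sample_ids = [section_id("Fabric Get Started.pdf", i)
              for i in range(len(sample_chunks))]
print([len(chunk) for chunk in sample_chunks])
print(sample_ids)
```

With `chunk_overlap=0`, as in the cell above, chunks do not share text, so a sentence that straddles a boundary is split between two sections; a small overlap is a common way to soften that.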

#### Perform Vector Search

In [5]:
from Utilities.cogSearch import performCogSearch

# Pure Vector Search
query = "What is Microsoft Fabric"  

results = performCogSearch(OpenAiService, OpenAiKey, OpenAiVersion, OpenAiApiKey, SearchService, SearchKey, embeddingModelType, query, indexName, 3)

for result in results:  
    print(f"Id: {result['id']}")  
    print(f"Content: {result['content']}")  
    print(f"Source File: {result['sourcefile']}\n") 

Id: Fabric_Get_Started_pdf-3
Content: Fabric allows creators to concentrate on producing their best work, freeing them from

the need to integrate, manage, or understand the underlying infrastructure that

supports the experience.

Components of Microsoft Fabric

Microsoft Fabric offers the comprehensive set of analytics experiences designed to work

together seamlessly. Each experience is tailored to a specific persona and a specific task.

Fabric includes industry-leading experiences in the following categories for an end-to-

end analytical need.

Data Engineering - Data Engineering experience provides a world class Spark

platform with great authoring experiences, enabling data engineers to perform

large scale data transformation and democratize data through the lakehouse.

Microsoft Fabric Spark's integration with Data Factory enables notebooks and

spark jobs to be scheduled and orchestrated. For more information, see What is

Data engineering in Microsoft Fabric?

Data Factory

In [6]:
# Vector Search with Multi-language support
query = "¿Qué es Microsoft Fabric?"

results = performCogSearch(OpenAiService, OpenAiKey, OpenAiVersion, OpenAiApiKey, SearchService, SearchKey, embeddingModelType, query, indexName, 3)
  
for result in results:  
    print(f"Id: {result['id']}")  
    print(f"Content: {result['content']}")  
    print(f"Source File: {result['sourcefile']}\n") 

Id: Fabric_Get_Started_pdf-1
Content: Tell us about your PDF experience.

Microsoft Fabric get started
documentation

Microsoft Fabric is a unified platform that can meet your organization's data and
analytics needs. Discover the Fabric shared and platform documentation from this page.

About Microsoft Fabric

ｅ OVERVIEW

What is Fabric?

Fabric terminology

ｂ GET STARTED

Start a Fabric trial

Fabric home navigation

End-to-end tutorials

Context sensitive Help pane

Get started with Fabric items

ｐ CONCEPT

Find items in OneLake data hub

Promote and certify items

ｃ HOW-TO GUIDE

Apply sensitivity labels

Workspaces

ｐ CONCEPT

Fabric workspace

Workspace roles

ｂ GET STARTED

Create a workspace

ｃ HOW-TO GUIDE

Workspace access control

What is Microsoft Fabric?

Article • 05/23/2023

Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything

from data movement to data science, Real-Time Analytics, and business intelligence. It

offers a comprehe