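# Create a vector index

In Azure AI Search, a vector store has an index schema that defines vector and nonvector fields, a vector configuration for the algorithms that create and compress the embedding space, and settings on the vector field definitions that are used in query requests.

The Create or Update Index API creates the vector store. Use the following steps to index vector data:

- Define a schema with vector algorithms and optional compression
- Add vector field definitions
- Load prevectorized data as a separate step, or use integrated vectorization for data chunking and encoding during indexing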

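[Prerequisites]

- Azure AI Search, in any region and on any tier. Most existing services support vector search. A small subset of services created before January 2019 can't create a vector index; in that case, you must create a new service. If you use integrated vectorization (skillsets that call Azure AI), Azure AI Search must be in the same region as Azure OpenAI or Azure AI services.

- Pre-existing vector embeddings, or integrated vectorization, which calls an embedding model from the indexing pipeline.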

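- You must know the dimensions limit of the model used to create the embeddings. Valid values are 2 through 3072. In Azure OpenAI, the length of the numeric vector for text-embedding-ada-002 is 1536. By default, the vector length is 1536 for text-embedding-3-small and 3072 for text-embedding-3-large.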

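- You must also know which similarity metrics are supported. For Azure OpenAI, similarity is computed using `cosine`.

- You should be familiar with creating an index. The schema must include a field for the document key, other fields you want to search or filter, and other configurations for behaviors needed during indexing and queries.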
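
For intuition about the cosine metric named in the prerequisites, here is a minimal sketch in plain Python. The function name `cosine_similarity` and the toy vectors are illustrative assumptions, not part of the Azure SDK; the service computes similarity internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors (hypothetical; real embeddings have e.g. 1536 dimensions)
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal
```

Vectors that point in the same direction score close to 1 regardless of their length, which is why the metric suits embeddings whose magnitude carries little meaning.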


> 📝 Note
>
> For more details, see [Create a vector index](https://learn.microsoft.com/ko-kr/azure/search/vector-search-how-to-create-index).

```
TODO
- Deploy an Azure OpenAI (aoai) model -> link to a reference site
```

In [1]:
import os
import json
import requests

from openai import AzureOpenAI
from dotenv import load_dotenv
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents import SearchClient
from azure.search.documents import SearchIndexingBufferedSender
from azure.search.documents.models import VectorizedQuery
from azure.search.documents.models import VectorizableTextQuery
from azure.search.documents.models import VectorFilterMode
from azure.search.documents.models import QueryType, QueryCaptionType, QueryAnswerType
from azure.search.documents.indexes.models import (
    SimpleField,
    SearchFieldDataType,
    SearchableField,
    SearchField,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    SemanticConfiguration,
    SemanticPrioritizedFields,
    SemanticField,
    SemanticSearch,
    ComplexField,
    SearchIndex,
    AzureOpenAIVectorizer,
    AzureOpenAIParameters
)

load_dotenv(override=True)

endpoint = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")
credential = AzureKeyCredential(os.getenv("AZURE_SEARCH_ADMIN_KEY", "")) if len(os.getenv("AZURE_SEARCH_ADMIN_KEY", "")) > 0 else DefaultAzureCredential()

azure_openai_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
azure_openai_key = os.getenv("AZURE_OPENAI_KEY", "") if len(os.getenv("AZURE_OPENAI_KEY", "")) > 0 else None
azure_openai_embedding_deployment = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", "text-embedding-ada-002")
azure_openai_embedding_dimensions = int(os.getenv("AZURE_OPENAI_EMBEDDING_DIMENSIONS", 1536))
azure_openai_api_version = os.getenv("AZURE_OPENAI_API_VERSION", "2024-06-01")

index_name = "hotel_quickstart_vector"

### Create the vector index

In [10]:
# Fields: hotelNameVector and descriptionVector are the vector fields
fields = [
    SimpleField(name="HotelId", type=SearchFieldDataType.String, key=True),
    SearchableField(name="HotelName", type=SearchFieldDataType.String, sortable=True),
    SearchableField(name="Description", type=SearchFieldDataType.String, analyzer_name="en.lucene"),
    SearchableField(name="Description_fr", type=SearchFieldDataType.String, analyzer_name="fr.lucene"),
    SearchableField(name="Category", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),

    SearchableField(name="Tags", collection=True, type=SearchFieldDataType.String, facetable=True, filterable=True),

    SimpleField(name="ParkingIncluded", type=SearchFieldDataType.Boolean, facetable=True, filterable=True, sortable=True),
    SimpleField(name="LastRenovationDate", type=SearchFieldDataType.DateTimeOffset, facetable=True, filterable=True, sortable=True),
    SimpleField(name="Rating", type=SearchFieldDataType.Double, facetable=True, filterable=True, sortable=True),

    ComplexField(name="Address", fields=[
        SearchableField(name="StreetAddress", type=SearchFieldDataType.String),
        SearchableField(name="City", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
        SearchableField(name="StateProvince", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
        SearchableField(name="PostalCode", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
        SearchableField(name="Country", type=SearchFieldDataType.String, facetable=True, filterable=True, sortable=True),
    ]),
    SimpleField(name="Location", type=SearchFieldDataType.GeographyPoint, filterable=True, sortable=True),
    ComplexField(name="Rooms", collection=True, fields=[
        SearchableField(name="Description", type=SearchFieldDataType.String, analyzer_name="en.lucene"),
        SearchableField(name="Description_fr", type=SearchFieldDataType.String, analyzer_name="fr.lucene"),
        SearchableField(name="Type", type=SearchFieldDataType.String, facetable=True, filterable=True),
        SimpleField(name="BaseRate", type=SearchFieldDataType.Double, facetable=True, filterable=True),
        SearchableField(name="BedOptions", type=SearchFieldDataType.String, facetable=True, filterable=True),
        SimpleField(name="SleepsCount", type=SearchFieldDataType.Int32, facetable=True, filterable=True),
        SimpleField(name="SmokingAllowed", type=SearchFieldDataType.Boolean, facetable=True, filterable=True),
        SearchableField(name="Tags", type=SearchFieldDataType.String, collection=True, facetable=True, filterable=True),
    ]),
    
    SearchField(name="hotelNameVector",
                type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True, 
                vector_search_dimensions=azure_openai_embedding_dimensions, 
                vector_search_profile_name="myHnswProfile"),
    SearchField(name="descriptionVector", 
                type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
                searchable=True, 
                vector_search_dimensions=azure_openai_embedding_dimensions, 
                vector_search_profile_name="myHnswProfile"),
]


# Vector search configuration
vector_search = VectorSearch(
    profiles=[
        VectorSearchProfile(
            name="myHnswProfile",
            algorithm_configuration_name="myHnsw",
            vectorizer="myVectorizer"
        )
    ],
    algorithms=[
        HnswAlgorithmConfiguration(
            name="myHnsw"
        )
    ],
    vectorizers=[
        AzureOpenAIVectorizer(
            name = "myVectorizer",
            azure_open_ai_parameters = AzureOpenAIParameters(
                resource_uri = azure_openai_endpoint,
                deployment_id = azure_openai_embedding_deployment,
                model_name = azure_openai_embedding_deployment,
                api_key = azure_openai_key
            )
        )
    ]
)

semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name="HotelName"),
        keywords_fields=[SemanticField(field_name="Category")],
        content_fields=[SemanticField(field_name="Description")]
    )
)

# Semantic search settings
semantic_search = SemanticSearch(configurations=[semantic_config])

index_client = SearchIndexClient(endpoint=endpoint, credential=credential)

# Delete the index if it already exists
try:
    index_client.get_index(index_name)
    index_client.delete_index(index_name)
    print(f"Deleted index '{index_name}'.")
except Exception as e:
    print(f"Index '{index_name}' does not exist or could not be deleted: {e}")

# Create the index with the vector and semantic settings
index = SearchIndex(
    name = index_name,
    fields = fields,
    vector_search = vector_search, 
    semantic_search = semantic_search)
result = index_client.create_or_update_index(index)

print(f"Index '{result.name}' created")

Deleted index 'hotel_quickstart_vector'.
Index 'hotel_quickstart_vector' created


### Generate embeddings

Read the documents to index, embed specific fields (HotelName, Description), and then index them.
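
The embedding step can be sketched independently of the service. The helper below splits the input into batches and checks that every returned vector has the expected length before it is attached to a document. `embed_in_batches`, `batch_size`, and the `fake_embed` stand-in are hypothetical names used for illustration; in this notebook, the callable would wrap `openai_client.embeddings.create`.

```python
from typing import Callable, List, Sequence

def embed_in_batches(
    texts: Sequence[str],
    embed: Callable[[List[str]], List[List[float]]],
    expected_dimensions: int,
    batch_size: int = 16,
) -> List[List[float]]:
    """Embed texts batch by batch and validate each vector's length."""
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        result = embed(batch)
        if len(result) != len(batch):
            raise ValueError("embedding count does not match input count")
        for vector in result:
            if len(vector) != expected_dimensions:
                raise ValueError(
                    f"expected {expected_dimensions} dimensions, got {len(vector)}"
                )
        vectors.extend(result)
    return vectors

# Stand-in for a real embedding call (assumption: returns one vector per input)
def fake_embed(batch: List[str]) -> List[List[float]]:
    return [[float(len(text)), 0.0, 1.0] for text in batch]

names = ["Stay-Kay City Hotel", "Old Century Hotel", "Treehouse Hotel"]
vectors = embed_in_batches(names, fake_embed, expected_dimensions=3, batch_size=2)
print(len(vectors), len(vectors[0]))
```

Checking the length up front surfaces a mismatch between the model and the index's `vector_search_dimensions` before any documents are uploaded.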

In [3]:
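# Azure OpenAI client used to create embeddings (default deployment: text-embedding-ada-002)
openai_client = AzureOpenAI(
    api_version=azure_openai_api_version,
    azure_endpoint=azure_openai_endpoint,
    api_key=azure_openai_key,
    # With no API key configured, fall back to Microsoft Entra ID authentication
    azure_ad_token_provider=None if azure_openai_key else get_bearer_token_provider(
        DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    ),
)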

In [4]:
# Load the data to index

# Read hotels_data.json, embed it, and save the result to the output directory.
# If you already ran this cell, you can skip it to avoid the cost of embedding again.
hotels_data_file_path = './data/hotels_data.json'

with open(file=hotels_data_file_path, mode='r', encoding='utf-8-sig') as file:
    hotel_documents = json.load(file)['value']

# Embed the documents

# Embed the HotelName and Description fields
hotel_name = [item['HotelName'] for item in hotel_documents]
description = [item['Description'] for item in hotel_documents]

hotel_name_response = openai_client.embeddings.create(
    model = azure_openai_embedding_deployment,
    input = hotel_name
)
hotel_name_embeddings = [item.embedding for item in hotel_name_response.data]

description_response = openai_client.embeddings.create(
    model = azure_openai_embedding_deployment, 
    input = description
)
description_embeddings = [item.embedding for item in description_response.data]

for i, item in enumerate(hotel_documents):
    item['hotelNameVector'] = hotel_name_embeddings[i]
    item['descriptionVector'] = description_embeddings[i]

# Write the documents and their embeddings to docVectors.json
output_path = os.path.join('.', 'output', 'docVectors.json')
output_directory = os.path.dirname(output_path)
if not os.path.exists(output_directory):
    os.makedirs(output_directory)
with open(output_path, "w") as f:
    json.dump(hotel_documents, f)

In [11]:
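### Upload the text and embedded vectors to the vector index

# If embedded data from an earlier run exists, you can reuse it as is.
# To save the resources spent on embedding when rerunning, read the saved data and index it.

# Read docVectors.json, the saved embedding output, from the output directory and index it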
output_path = os.path.join('.', 'output', 'docVectors.json')
output_directory = os.path.dirname(output_path)

if not os.path.exists(output_directory):
    os.makedirs(output_directory)
with open(output_path, 'r') as file:  
    documents = json.load(file)  

search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)
result = search_client.upload_documents(documents)
print(f"Documents indexed: {len(documents)}") 

Documents indexed: 50


### Vector similarity search

This example runs a basic vector search.

The search text is embedded, and then a vector search is run against the descriptionVector field.

See [VectorizedQuery](https://learn.microsoft.com/ko-kr/python/api/azure-search-documents/azure.search.documents.models.vectorizedquery?view=azure-python).

In [None]:
# Vector search
query = "traditional hotels with free wifi"  

# Embed the query.
embedding = openai_client.embeddings.create(
    input = query,
    model = azure_openai_embedding_deployment
).data[0].embedding

# Search the descriptionVector field with the embedded query and return the 3 nearest items
vector_query = VectorizedQuery(
    vector = embedding, 
    k_nearest_neighbors = 3, 
    fields = "descriptionVector")
  
results = search_client.search(  
    search_text = None,  
    vector_queries = [vector_query],
    select=["HotelName", "Description", "Category"],
)  
  
for result in results:  
    print(f"HotelName: {result['HotelName']}")  
    print(f"Score: {result['@search.score']}")  
    print(f"Description: {result['Description']}")  
    print(f"Category: {result['Category']}\n")  

HotelName: Countryside Hotel
Score: 0.8845056
Description: Save up to 50% off traditional hotels. Free WiFi, great location near downtown, full kitchen, washer & dryer, 24/7 support, bowling alley, fitness center and more.
Category: Extended-Stay

HotelName: Friendly Motor Inn
Score: 0.8612631
Description: Close to historic sites, local attractions, and urban parks. Free Shuttle to the airport and casinos. Free breakfast and WiFi.
Category: Budget

HotelName: Lion's Den Inn
Score: 0.8593013
Description: Full breakfast buffet for 2 for only $1. Excited to show off our room upgrades, faster high speed WiFi, updated corridors & meeting space. Come relax and enjoy your stay.
Category: Budget



### Search without embedding the query yourself

Instead of embedding the search text separately, pass the text as is; the vectorizer configured on the index embeds the query before the search runs.

See [VectorizableTextQuery](https://learn.microsoft.com/ko-kr/python/api/azure-search-documents/azure.search.documents.models.vectorizabletextquery).

In [13]:
query = "traditional hotels with free wifi"
  
vector_query = VectorizableTextQuery(
    text=query, 
    k_nearest_neighbors=3, 
    fields="descriptionVector"
)
  
results = search_client.search(  
    search_text=None,  
    vector_queries= [vector_query],
    select=["HotelName", "Description", "Category"],)  
  
for result in results:  
    print(f"HotelName: {result['HotelName']}")  
    print(f"Score: {result['@search.score']}")  
    print(f"Description: {result['Description']}")  
    print(f"Category: {result['Category']}\n")  

HotelName: Countryside Hotel
Score: 0.8845056
Description: Save up to 50% off traditional hotels. Free WiFi, great location near downtown, full kitchen, washer & dryer, 24/7 support, bowling alley, fitness center and more.
Category: Extended-Stay

HotelName: Friendly Motor Inn
Score: 0.8612631
Description: Close to historic sites, local attractions, and urban parks. Free Shuttle to the airport and casinos. Free breakfast and WiFi.
Category: Budget

HotelName: Lion's Den Inn
Score: 0.8593013
Description: Full breakfast buffet for 2 for only $1. Excited to show off our room upgrades, faster high speed WiFi, updated corridors & meeting space. Come relax and enjoy your stay.
Category: Budget



### Exhaustive KNN search

This query uses exhaustive KNN instead of the algorithm configured on the index. Because it searches exactly, its results can serve as ground-truth values.
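
Conceptually, exhaustive KNN scores the query against every vector instead of walking an approximate graph such as HNSW. A minimal brute-force sketch follows; the function names, the toy vectors, and the use of cosine similarity are assumptions for illustration, and this is not the service's implementation.

```python
import heapq
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def exhaustive_knn(query, vectors, k):
    """Return the k (doc_id, similarity) pairs most similar to the query."""
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Hypothetical document vectors keyed by document id
toy_vectors = {
    "hotel-a": [0.9, 0.1, 0.0],
    "hotel-b": [0.1, 0.9, 0.0],
    "hotel-c": [0.7, 0.7, 0.1],
    "hotel-d": [0.0, 0.1, 0.9],
}
ground_truth = exhaustive_knn([1.0, 0.2, 0.0], toy_vectors, k=2)
print([doc_id for doc_id, _ in ground_truth])

# Recall of a hypothetical approximate result against the ground truth
approximate_ids = {"hotel-a", "hotel-b"}
truth_ids = {doc_id for doc_id, _ in ground_truth}
recall = len(approximate_ids & truth_ids) / len(truth_ids)
print(recall)
```

Comparing an approximate result set against the exhaustive one in this way gives a recall figure that can be used to judge whether the index's algorithm settings are adequate.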

In [14]:
query = "traditional hotels with free wifi"

# The only difference from the previous example is exhaustive=True
vector_query = VectorizableTextQuery(
    text=query, 
    k_nearest_neighbors=3, 
    fields="descriptionVector", 
    exhaustive=True)
  
results = search_client.search(  
    search_text=None,  
    vector_queries= [vector_query],
    select=["HotelName", "Description", "Category"],)  
  
for result in results:  
    print(f"HotelName: {result['HotelName']}")  
    print(f"Score: {result['@search.score']}")  
    print(f"Description: {result['Description']}")  
    print(f"Category: {result['Category']}\n")  

HotelName: Countryside Hotel
Score: 0.88450634
Description: Save up to 50% off traditional hotels. Free WiFi, great location near downtown, full kitchen, washer & dryer, 24/7 support, bowling alley, fitness center and more.
Category: Extended-Stay

HotelName: Friendly Motor Inn
Score: 0.8612631
Description: Close to historic sites, local attractions, and urban parks. Free Shuttle to the airport and casinos. Free breakfast and WiFi.
Category: Budget

HotelName: Lion's Den Inn
Score: 0.8593009
Description: Full breakfast buffet for 2 for only $1. Excited to show off our room upgrades, faster high speed WiFi, updated corridors & meeting space. Come relax and enjoy your stay.
Category: Budget



### Cross-field vector search

A cross-field vector search queries multiple vector fields at the same time.

This example searches the hotelNameVector field along with the descriptionVector field.

In [15]:
query = "traditional hotels with free wifi"  
  
vector_query = VectorizableTextQuery(
    text=query,
    k_nearest_neighbors=3,
    fields="descriptionVector, hotelNameVector")

results = search_client.search(  
    search_text=None,  
    vector_queries= [vector_query],
    select=["HotelName", "Description", "Category"],
)  
  
for result in results:  
    print(f"HotelName: {result['HotelName']}")  
    print(f"Score: {result['@search.score']}")  
    print(f"Description: {result['Description']}")  
    print(f"Category: {result['Category']}\n")   

HotelName: Countryside Hotel
Score: 0.01666666753590107
Description: Save up to 50% off traditional hotels. Free WiFi, great location near downtown, full kitchen, washer & dryer, 24/7 support, bowling alley, fitness center and more.
Category: Extended-Stay

HotelName: Swirling Currents Hotel
Score: 0.01666666753590107
Description: Spacious rooms, glamorous suites and residences, rooftop pool, walking access to shopping, dining, entertainment and the city center. Each room comes equipped with a microwave, a coffee maker and a minifridge. In-room entertainment includes complimentary W-Fi and flat-screen TVs. 
Category: Suite

HotelName: Old Century Hotel
Score: 0.016393441706895828
Description: The hotel is situated in a nineteenth century plaza, which has been expanded and renovated to the highest architectural standards to create a modern, functional and first-class hotel in which art and unique historical elements coexist with the most modern comforts. The hotel also regularly hosts e

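### Multi-vector search

This example passes multiple query vectors to query multiple vector fields at the same time.

In this case, you can pass query vectors from two different embedding models to the corresponding vector fields in the index.

Because a vector search runs against each vector field separately, each query can have its own search settings, such as weight and exhaustive KNN.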
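
When several queries run in one request, their ranked lists have to be merged into a single result set. Azure AI Search uses Reciprocal Rank Fusion (RRF) for queries that execute in parallel. The sketch below shows a weighted form of the idea; the constant `k=60`, the 1-based ranks, the way `weight` multiplies a query's contribution, and the document ids are assumptions of this example rather than a statement of the service's exact formula.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights=None, k=60):
    """Fuse ranked lists of document ids with weighted Reciprocal Rank Fusion.

    Each document receives weight / (k + rank) from every list it appears in,
    where rank starts at 1. The constant k=60 is an assumption of this sketch.
    """
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical rankings returned by two vector queries
name_ranking = ["doc-1", "doc-2", "doc-3"]
description_ranking = ["doc-3", "doc-1", "doc-4"]

fused = weighted_rrf([name_ranking, description_ranking], weights=[1.0, 0.7])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Because only ranks enter the formula, fused scores are small values on the order of 1/k. That is consistent with the `@search.score` values printed for the cross-field, multi-vector, and hybrid queries in this notebook, which are far smaller than the single-field similarity scores.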

In [16]:
query = "traditional hotels with free wifi"  
  
vector_query_1 = VectorizableTextQuery(text=query, k_nearest_neighbors=3, fields="hotelNameVector", weight=1, exhaustive=True)
vector_query_2 = VectorizableTextQuery(text=query, k_nearest_neighbors=3, fields="descriptionVector", weight=0.7)

results = search_client.search(  
    search_text=None,  
    vector_queries=[vector_query_1, vector_query_2],
    select=["HotelName", "Description", "Category"],
)  
  
for result in results:  
    print(f"HotelName: {result['HotelName']}")  
    print(f"Score: {result['@search.score']}")  
    print(f"Description: {result['Description']}")  
    print(f"Category: {result['Category']}\n")   

HotelName: Swirling Currents Hotel
Score: 0.01666666753590107
Description: Spacious rooms, glamorous suites and residences, rooftop pool, walking access to shopping, dining, entertainment and the city center. Each room comes equipped with a microwave, a coffee maker and a minifridge. In-room entertainment includes complimentary W-Fi and flat-screen TVs. 
Category: Suite

HotelName: Old Century Hotel
Score: 0.016393441706895828
Description: The hotel is situated in a nineteenth century plaza, which has been expanded and renovated to the highest architectural standards to create a modern, functional and first-class hotel in which art and unique historical elements coexist with the most modern comforts. The hotel also regularly hosts events like wine tastings, beer dinners, and live music.
Category: Boutique

HotelName: Twin Vortex Hotel
Score: 0.016129031777381897
Description: New experience in the making. Be the first to experience the luxury of the Twin Vortex. Reserve one of our newly

### Using a filter in vector search

This example shows how to apply a filter to a search.

You can choose between pre-filtering (the default) and post-filtering.
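
The difference between the two modes can be illustrated without the service. In this sketch, pre-filtering narrows the candidates before the nearest-neighbor ranking, while post-filtering ranks first and filters the top k afterward, so it may return fewer than k results. The data, scores, and function names are made up for illustration.

```python
def pre_filter_search(docs, predicate, k):
    """Filter first, then take the k highest-scoring matches."""
    candidates = [d for d in docs if predicate(d)]
    return sorted(candidates, key=lambda d: d["score"], reverse=True)[:k]

def post_filter_search(docs, predicate, k):
    """Take the k highest-scoring documents first, then apply the filter."""
    top_k = sorted(docs, key=lambda d: d["score"], reverse=True)[:k]
    return [d for d in top_k if predicate(d)]

# Hypothetical documents with a precomputed similarity score
docs = [
    {"name": "A", "category": "Extended-Stay", "score": 0.88},
    {"name": "B", "category": "Budget", "score": 0.86},
    {"name": "C", "category": "Budget", "score": 0.85},
    {"name": "D", "category": "Luxury", "score": 0.84},
    {"name": "E", "category": "Budget", "score": 0.83},
]

def is_budget(doc):
    return doc["category"] == "Budget"

pre_names = [d["name"] for d in pre_filter_search(docs, is_budget, k=3)]
post_names = [d["name"] for d in post_filter_search(docs, is_budget, k=3)]
print(pre_names)
print(post_names)
```

Pre-filtering returns three Budget documents here, while post-filtering returns only the two that happened to be in the unfiltered top three.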

In [17]:
query = "traditional hotels with free wifi"  
  
vector_query = VectorizableTextQuery(
    text=query,
    k_nearest_neighbors=3,
    fields="descriptionVector")

results = search_client.search(  
    search_text=None,  
    vector_queries= [vector_query],
    vector_filter_mode=VectorFilterMode.PRE_FILTER,
    filter="Category eq 'Budget'",
    select=["HotelName", "Description", "Category"],
)
  
for result in results:  
    print(f"HotelName: {result['HotelName']}")  
    print(f"Score: {result['@search.score']}")  
    print(f"Description: {result['Description']}")  
    print(f"Category: {result['Category']}\n") 

HotelName: Friendly Motor Inn
Score: 0.8612631
Description: Close to historic sites, local attractions, and urban parks. Free Shuttle to the airport and casinos. Free breakfast and WiFi.
Category: Budget

HotelName: Lion's Den Inn
Score: 0.8593013
Description: Full breakfast buffet for 2 for only $1. Excited to show off our room upgrades, faster high speed WiFi, updated corridors & meeting space. Come relax and enjoy your stay.
Category: Budget

HotelName: Thunderbird Motel
Score: 0.85543555
Description: Book Now & Save. Clean, Comfortable rooms at the lowest price. Enjoy complimentary coffee and tea in common areas.
Category: Budget



### Hybrid search

A hybrid search runs a lexical search and a vector search together and returns the combined results.

Combining similarity-based vector search with exact lexical matching can improve search quality.

In [18]:
query = "near downtown hotels"  
  
vector_query = VectorizableTextQuery(
    text=query, 
    k_nearest_neighbors=3, 
    fields="descriptionVector")

# Combine lexical search (search_text=query) with vector search (vector_queries=[vector_query]).
results = search_client.search(  
    search_text=query,  
    vector_queries=[vector_query],
    select=["HotelName", "Description", "Category"],
    top=3
)  

for result in results:  
    print(f"HotelName: {result['HotelName']}")  
    print(f"Score: {result['@search.score']}")  
    print(f"Description: {result['Description']}")  
    print(f"Category: {result['Category']}\n")  

HotelName: Treehouse Hotel
Score: 0.03177805617451668
Description: Near the beating heart of our vibrant downtown and bustling business district. Experience the warmth of our hotel. Enjoy free WiFi, local transportation and Milk & Cookies.
Category: Budget

HotelName: Luxury Lion Resort
Score: 0.03159204125404358
Description: Unmatched Luxury. Visit our downtown hotel to indulge in luxury accommodations. Moments from the stadium and transportation hubs, we feature the best in convenience and comfort.
Category: Luxury

HotelName: Hotel on the Harbor
Score: 0.03021353855729103
Description: Stunning Downtown Hotel with indoor Pool. Ideally located close to theatres, museums and the convention center. Indoor Pool and Sauna and fitness centre. Popular Bar & Restaurant
Category: Luxury



In [19]:
# Compare with a vector-only search (search_text=None): the ranking and scores differ from the hybrid results above.
vector_query = VectorizableTextQuery(
    text=query,
    k_nearest_neighbors=3,
    fields="descriptionVector")

results = search_client.search(  
    search_text=None,  
    vector_queries=[vector_query],
    select=["HotelName", "Description", "Category"],
    top=3
)  
  
for result in results:  
    print(f"HotelName: {result['HotelName']}")  
    print(f"Score: {result['@search.score']}")  
    print(f"Description: {result['Description']}")  
    print(f"Category: {result['Category']}\n")  

HotelName: Luxury Lion Resort
Score: 0.88745934
Description: Unmatched Luxury. Visit our downtown hotel to indulge in luxury accommodations. Moments from the stadium and transportation hubs, we feature the best in convenience and comfort.
Category: Luxury

HotelName: Treehouse Hotel
Score: 0.882512
Description: Near the beating heart of our vibrant downtown and bustling business district. Experience the warmth of our hotel. Enjoy free WiFi, local transportation and Milk & Cookies.
Category: Budget

HotelName: Hotel on the Harbor
Score: 0.87716687
Description: Stunning Downtown Hotel with indoor Pool. Ideally located close to theatres, museums and the convention center. Indoor Pool and Sauna and fitness centre. Popular Bar & Restaurant
Category: Luxury



### Applying a weight in hybrid search

You can adjust the weight of the vector query. The example below sets weight=0.2 for the vector query.

In [20]:
query = "near downtown hotels"  
  
vector_query = VectorizableTextQuery(
    text=query, 
    k_nearest_neighbors=3, 
    fields="descriptionVector", 
    weight=0.2)

results = search_client.search(  
    search_text=query,  
    vector_queries=[vector_query],
    select=["HotelName", "Description", "Category"],
    top=3
)  
  
for result in results:  
    print(f"HotelName: {result['HotelName']}")  
    print(f"Score: {result['@search.score']}")  
    print(f"Description: {result['Description']}")  
    print(f"Category: {result['Category']}\n")  

HotelName: Treehouse Hotel
Score: 0.018663303926587105
Description: Near the beating heart of our vibrant downtown and bustling business district. Experience the warmth of our hotel. Enjoy free WiFi, local transportation and Milk & Cookies.
Category: Budget

HotelName: Luxury Lion Resort
Score: 0.018258705735206604
Description: Unmatched Luxury. Visit our downtown hotel to indulge in luxury accommodations. Moments from the stadium and transportation hubs, we feature the best in convenience and comfort.
Category: Luxury

HotelName: Hotel on the Harbor
Score: 0.017310313880443573
Description: Stunning Downtown Hotel with indoor Pool. Ideally located close to theatres, museums and the convention center. Indoor Pool and Sauna and fitness centre. Popular Bar & Restaurant
Category: Luxury



### Semantic hybrid search

The semantic ranker reranks the hybrid search results and returns them with captions.

A semantic configuration ([SemanticConfiguration](https://learn.microsoft.com/ko-kr/python/api/azure-search-documents/azure.search.documents.indexes.models.semanticconfiguration)) was added when the vector index was created. It sets the title, content, and keyword fields used for semantic ranking, captions, highlights, and answers.

``` python
semantic_config = SemanticConfiguration(
    name="my-semantic-config",
    prioritized_fields=SemanticPrioritizedFields(
        title_field=SemanticField(field_name="HotelName"),
        keywords_fields=[SemanticField(field_name="Category")],
        content_fields=[SemanticField(field_name="Description")]
    )
)
```
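
Captions and answers from the semantic ranker can include highlight markup, as the `<em>` tags in the output of this section show. If plain text is needed, for example for logging, the tags can be stripped. The helper name `strip_highlights` is hypothetical, the sample string is a shortened version of the printed answer, and the sketch assumes `<em>` is the only tag present.

```python
import re

_EM_TAG = re.compile(r"</?em>")

def strip_highlights(text: str) -> str:
    """Remove <em> and </em> markers and collapse doubled spaces."""
    without_tags = _EM_TAG.sub("", text)
    return re.sub(r" {2,}", " ", without_tags).strip()

# Shortened sample in the same shape as a highlighted caption
sample = "This<em> classic hotel </em>is<em> fully-refurbished and ideally located</em> in New York."
plain = strip_highlights(sample)
print(plain)
```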



In [21]:
query = "Good hotels for times square"

vector_query = VectorizableTextQuery(
    text=query, 
    k_nearest_neighbors=3, 
    fields="descriptionVector", 
    exhaustive=True)

# Hybrid search with semantic ranking
results = search_client.search(  
    search_text=query,  
    vector_queries=[vector_query],
    select=["HotelName", "Description", "Category"],
    query_type=QueryType.SEMANTIC, 
    semantic_configuration_name='my-semantic-config', 
    query_caption=QueryCaptionType.EXTRACTIVE, 
    query_answer=QueryAnswerType.EXTRACTIVE,
    top=3
)

semantic_answers = results.get_answers()
for answer in semantic_answers:
    if answer.highlights:
        print(f"Semantic Answer: {answer.highlights}")
    else:
        print(f"Semantic Answer: {answer.text}")
    print(f"Semantic Answer Score: {answer.score}\n")

for result in results:
    print(f"HotelName: {result['HotelName']}")
    print(f"Reranker Score: {result['@search.reranker_score']}")
    print(f"Description: {result['Description']}")
    print(f"Category: {result['Category']}")

    captions = result["@search.captions"]
    if captions:
        caption = captions[0]
        if caption.highlights:
            print(f"Caption: {caption.highlights}\n")
        else:
            print(f"Caption: {caption.text}\n")

Semantic Answer: This<em> classic hotel </em>is<em> fully-refurbished and ideally located on the main commercial artery of the city in the heart of New York.</em> A few minutes away is Times Square and the historic centre of the city, as well as other places of interest that make New York one of America's most attractive and cosmopolitan cities.
Semantic Answer Score: 0.9769999980926514

HotelName: Stay-Kay City Hotel
Reranker Score: 2.979752779006958
Description: This classic hotel is fully-refurbished and ideally located on the main commercial artery of the city in the heart of New York. A few minutes away is Times Square and the historic centre of the city, as well as other places of interest that make New York one of America's most attractive and cosmopolitan cities.
Category: Boutique
Caption: This<em> classic hotel </em>is<em> fully-refurbished and ideally located on the main commercial artery of the city in the heart of New York.</em> A few minutes away is Times Square and the h