# OpenSearch Warming Up (using a custom analyzer)

## [Important] Prerequisite notebooks
The two setup notebooks below must be run before this notebook.
- (1) Setup notebook
    - Path: aws-ai-ml-workshop-kr/genai/aws-gen-ai-kr/00_setup/setup.ipynb
    - [Setup Notebook](https://github.com/aws-samples/aws-ai-ml-workshop-kr/blob/master/genai/aws-gen-ai-kr/00_setup/setup.ipynb)
- (2) Amazon OpenSearch installation notebook
    - Path: aws-ai-ml-workshop-kr/genai/aws-gen-ai-kr/00_setup/setup_opensearch.ipynb
    - [Setup OpenSearch](https://github.com/aws-samples/aws-ai-ml-workshop-kr/blob/master/genai/aws-gen-ai-kr/00_setup/setup_opensearch.ipynb)
    - This notebook uses the following values produced by that notebook:
        - opensearch_domain_endpoint
        - opensearch_user_id
        - opensearch_user_password



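This notebook assumes that OpenSearch is already installed and walks through using a custom analyzer. The examples below use an English analyzer; the references cover the Korean morphological analyzer (Nori).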

---
## Ref: 
- [Using the 'Nori' plugin for Korean analysis in Amazon OpenSearch Service (in Korean)](https://aws.amazon.com/ko/blogs/tech/amazon-opensearch-service-korean-nori-plugin-for-analysis/)
- [Implementing search with Amazon OpenSearch Service (in Korean)](https://catalog.us-east-1.prod.workshops.aws/workshops/de4e38cb-a0d9-4ffe-a777-bf00d498fa49/ko-KR/indexing/blog-reindex)
- [OpenSearch Python Client](https://opensearch.org/docs/1.3/clients/python-high-level/)
- [nori_part_of_speech token filter](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-nori-speech.html)
- [Using Elasticsearch as a search engine (1): improving search with the Nori Korean morphological analyzer (in Korean)](https://hanamon.kr/elasticsearch-%EA%B2%80%EC%83%89%EC%97%94%EC%A7%84-nori-%ED%98%95%ED%83%9C%EC%86%8C-%EB%B6%84%EC%84%9D%EA%B8%B0-%EA%B2%80%EC%83%89-%EA%B3%A0%EB%8F%84%ED%99%94-%EB%B0%A9%EB%B2%95/)

# 1. Environment Setup

In [1]:
%load_ext autoreload
%autoreload 2

import sys, os
def add_python_path(module_path):
    if os.path.abspath(module_path) not in sys.path:
        sys.path.append(os.path.abspath(module_path))
        print(f"python path: {os.path.abspath(module_path)} is added")
    else:
        print(f"python path: {os.path.abspath(module_path)} already exists")
    print("sys.path: ", sys.path)

module_path = ".."
add_python_path(module_path)


python path: /home/sagemaker-user/aws-ai-ml-workshop-kr/genai/aws-gen-ai-kr/20_applications/02_qa_chatbot/01_preprocess_docs is added
sys.path:  ['/home/sagemaker-user/aws-ai-ml-workshop-kr/genai/aws-gen-ai-kr/20_applications/02_qa_chatbot/01_preprocess_docs/warming_up', '/opt/conda/lib/python310.zip', '/opt/conda/lib/python3.10', '/opt/conda/lib/python3.10/lib-dynload', '', '/opt/conda/lib/python3.10/site-packages', '/home/sagemaker-user/aws-ai-ml-workshop-kr/genai/aws-gen-ai-kr/20_applications/02_qa_chatbot/01_preprocess_docs']


In [2]:
sys.path

['/home/sagemaker-user/aws-ai-ml-workshop-kr/genai/aws-gen-ai-kr/20_applications/02_qa_chatbot/01_preprocess_docs/warming_up',
 '/opt/conda/lib/python310.zip',
 '/opt/conda/lib/python3.10',
 '/opt/conda/lib/python3.10/lib-dynload',
 '',
 '/opt/conda/lib/python3.10/site-packages',
 '/home/sagemaker-user/aws-ai-ml-workshop-kr/genai/aws-gen-ai-kr/20_applications/02_qa_chatbot/01_preprocess_docs']

# 2. Create the Bedrock Client

In [3]:
import json
import boto3
from pprint import pprint
from termcolor import colored
from local_utils import bedrock, print_ww
from local_utils.bedrock import bedrock_info

# ---- ⚠️ Un-comment and edit the below lines as needed for your AWS setup ⚠️ ----

# os.environ["AWS_DEFAULT_REGION"] = "<REGION_NAME>"  # E.g. "us-east-1"
# os.environ["AWS_PROFILE"] = "<YOUR_PROFILE>"
# os.environ["BEDROCK_ASSUME_ROLE"] = "<YOUR_ROLE_ARN>"  # E.g. "arn:aws:..."
# os.environ["BEDROCK_ENDPOINT_URL"] = "<YOUR_ENDPOINT_URL>"  # E.g. "https://..."


boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    endpoint_url=os.environ.get("BEDROCK_ENDPOINT_URL", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None),
)

print(colored("\n== FM lists ==", "green"))
pprint(bedrock_info.get_list_fm_models())

Create new client
  Using region: us-west-2
  Using profile: None
boto3 Bedrock client successfully created!
bedrock-runtime(https://bedrock-runtime.us-west-2.amazonaws.com)

== FM lists ==
{'Claude-Instant-V1': 'anthropic.claude-instant-v1',
 'Claude-V1': 'anthropic.claude-v1',
 'Claude-V2': 'anthropic.claude-v2',
 'Claude-V2-1': 'anthropic.claude-v2:1',
 'Claude-V3-Sonnet': 'anthropic.claude-3-sonnet-20240229-v1:0',
 'Cohere-Embeddings-En': 'cohere.embed-english-v3',
 'Cohere-Embeddings-Multilingual': 'cohere.embed-multilingual-v3',
 'Command': 'cohere.command-text-v14',
 'Command-Light': 'cohere.command-light-text-v14',
 'Jurassic-2-Mid': 'ai21.j2-mid-v1',
 'Jurassic-2-Ultra': 'ai21.j2-ultra-v1',
 'Llama2-13b-Chat': 'meta.llama2-13b-chat-v1',
 'Titan-Embeddings-G1': 'amazon.titan-embed-text-v1',
 'Titan-Text-G1': 'amazon.titan-text-express-v1',
 'Titan-Text-G1-Light': 'amazon.titan-text-lite-v1'}


# 3. Load the Titan Embedding Model

In [4]:
# We will be using the Titan Embeddings Model to generate our Embeddings.
from langchain.embeddings import BedrockEmbeddings
from langchain.llms.bedrock import Bedrock

llm_emb = BedrockEmbeddings(client=boto3_bedrock)
llm_emb

BedrockEmbeddings(client=<botocore.client.BedrockRuntime object at 0x7fb73ebb8c40>, region_name=None, credentials_profile_name=None, model_id='amazon.titan-embed-text-v1', model_kwargs=None, endpoint_url=None, normalize=False)

# 4. Create the OpenSearch Client

## Set the OpenSearch domain and authentication information

- [langchain.vectorstores.opensearch_vector_search.OpenSearchVectorSearch](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.opensearch_vector_search.OpenSearchVectorSearch.html)

In [5]:
from local_utils.proc_docs import get_parameter

In [9]:
import boto3
ssm = boto3.client('ssm', 'us-west-2')

opensearch_domain_endpoint = get_parameter(
    boto3_clinet = ssm,
    parameter_name = 'opensearch_domain_endpoint',
)

opensearch_user_id = get_parameter(
    boto3_clinet = ssm,
    parameter_name = 'opensearch_user_id',
)

opensearch_user_password = get_parameter(
    boto3_clinet = ssm,
    parameter_name = 'opensearch_user_password',
)


In [10]:
opensearch_domain_endpoint = opensearch_domain_endpoint
rag_user_name = opensearch_user_id
rag_user_password = opensearch_user_password

http_auth = (rag_user_name, rag_user_password) # Master username, Master password

In [11]:
from local_utils.opensearch import opensearch_utils

In [12]:
aws_region = os.environ.get("AWS_DEFAULT_REGION", None)

os_client = opensearch_utils.create_aws_opensearch_client(
    aws_region,
    opensearch_domain_endpoint,
    http_auth
)

# 5. Default Index Creation
- The index is kept simple: two `text`-type fields, a title (`metadata.title`) and a body (`text`), plus a vector field.

In [13]:
from local_utils.rag import create_aws_opensearch_client, check_if_index_exists, delete_index
from local_utils.rag import create_index, add_doc, search_document

## Define the index name

In [14]:
index_name = '01-warming-up-english-index'

## Delete the existing index

In [15]:
index_exists = opensearch_utils.check_if_index_exists(
    os_client,
    index_name
)


if index_exists:
    opensearch_utils.delete_index(
        os_client,
        index_name
    )
else:
    print("Index does not exist")    

index_name=01-warming-up-english-index, exists=False
Index does not exist


## Define the index schema

In [16]:
index_body = {
    'settings': {
        'analysis': {
            'analyzer': {
                'my_analyzer': {
                    'type': 'custom',
                    'tokenizer': 'custom_english_tokenizer',
                    'filter': [
                        'lowercase',
                        'custom_english_stop',
                        'custom_english_stemmer',
                    ],
                }
            },
            'tokenizer': {
                'custom_english_tokenizer': {
                    'type': 'standard',
                    'max_token_length': 20
                }
            },
            'filter': {
                'custom_english_stop': {
                    'type': 'stop',
                    'stopwords': '_english_'
                },
                'custom_english_stemmer': {
                    'type': 'stemmer',
                    'language': 'english'
                }                
            }        
        },
        'index': {
            'knn': True,
            'knn.space_type': 'cosinesimil'  # Example space type
        }
    },
    'mappings': {
        'properties': {
            'metadata': {
                'properties': {
                    'title': {                                        
                        'analyzer': 'my_analyzer',
                        'search_analyzer': 'my_analyzer',
                        'type': 'text'},  # For full-text search
                }
            },            
            'text': {
                'analyzer': 'my_analyzer',
                'search_analyzer': 'my_analyzer',
                'type': 'text'
            },
            'vector_field': {
                'type': 'knn_vector',
                'dimension': 1536  # Replace with your vector dimension
            }
        }
    }
}


In [17]:

opensearch_utils.create_index(os_client, index_name, index_body)
index_info = os_client.indices.get(index=index_name)
index_info


Creating index:
{'acknowledged': True, 'shards_acknowledged': True, 'index': '01-warming-up-english-index'}


{'01-warming-up-english-index': {'aliases': {},
  'mappings': {'properties': {'metadata': {'properties': {'title': {'type': 'text',
       'analyzer': 'my_analyzer'}}},
    'text': {'type': 'text', 'analyzer': 'my_analyzer'},
    'vector_field': {'type': 'knn_vector', 'dimension': 1536}}},
  'settings': {'index': {'replication': {'type': 'DOCUMENT'},
    'number_of_shards': '5',
    'provided_name': '01-warming-up-english-index',
    'knn.space_type': 'cosinesimil',
    'knn': 'true',
    'creation_date': '1710125758751',
    'analysis': {'filter': {'custom_english_stemmer': {'type': 'stemmer',
       'language': 'english'},
      'custom_english_stop': {'type': 'stop', 'stopwords': '_english_'}},
     'analyzer': {'my_analyzer': {'filter': ['lowercase',
        'custom_english_stop',
        'custom_english_stemmer'],
       'type': 'custom',
       'tokenizer': 'custom_english_tokenizer'}},
     'tokenizer': {'custom_english_tokenizer': {'type': 'standard',
       'max_token_length':

# 6. Adding a Doc to the Default Index
- Add a single document as shown below.

In [18]:
def create_doc_body(title, text, llm_emb):
    # Example document
    text_emb = llm_emb.embed_query(text)
    doc_body = {
        "text": text,
        "vector_field": text_emb,  # Replace with your vector
        "metadata" : [
            {         
            "title": title, 
            }
        ]
    }

    return doc_body




In [19]:
doc_body = create_doc_body(title = "categories",
                # text = "It’s fun to contribute a brand-new PR or 2 to OpenSearch!",
                text =  "The guy contributes to an OpenSearch in 2024 and he is great !!!",
                llm_emb = llm_emb)

pprint(doc_body)

{'metadata': [{'title': 'categories'}],
 'text': 'The guy contributes to an OpenSearch in 2024 and he is great !!!',
 'vector_field': [0.45117188,
                  0.20410156,
                  0.53125,
                  -0.072265625,
                  0.48632812,
                  -0.41796875,
                  -0.33007812,
                  0.00026893616,
                  -0.82421875,
                  0.18261719,
                  -0.083984375,
                  0.21289062,
                  0.16894531,
                  -0.29492188,
                  -0.2734375,
                  -0.08691406,
                  0.36523438,
                  0.3828125,
                  -0.71875,
                  0.36132812,
                  -0.8515625,
                  0.12792969,
                  0.033203125,
                  0.12890625,
                  0.057373047,
                  0.45507812,
                  -0.0060424805,
                  -0.5546875,
                  -0.44921875,
 

In [20]:
doc_id = '1'
opensearch_utils.add_doc(os_client, index_name, doc_body, id=doc_id)



Adding document:
{'_index': '01-warming-up-english-index', '_id': '1', '_version': 1, 'result': 'created', 'forced_refresh': True, '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1}


## Checking the Term Vector

At the bottom of the result below, you can see that "opensearch" is stored as a single term.

- According to the index definition, the given text "The guy contributes to an OpenSearch in 2024 and he is great !!!" is stored as only the following tokens.
- Only the terms [gui, contribut, opensearch, 2024, he, great] end up in the index.
- This is because the analyzer's filters apply lowercasing, stop-word removal, and stemming.

In [21]:
os_client.termvectors(index=index_name, id= doc_id, fields='text')

{'_index': '01-warming-up-english-index',
 '_id': '1',
 '_version': 1,
 'found': True,
 'took': 20,
 'term_vectors': {'text': {'field_statistics': {'sum_doc_freq': 6,
    'doc_count': 1,
    'sum_ttf': 6},
   'terms': {'2024': {'term_freq': 1,
     'tokens': [{'position': 7, 'start_offset': 40, 'end_offset': 44}]},
    'contribut': {'term_freq': 1,
     'tokens': [{'position': 2, 'start_offset': 8, 'end_offset': 19}]},
    'great': {'term_freq': 1,
     'tokens': [{'position': 11, 'start_offset': 55, 'end_offset': 60}]},
    'gui': {'term_freq': 1,
     'tokens': [{'position': 1, 'start_offset': 4, 'end_offset': 7}]},
    'he': {'term_freq': 1,
     'tokens': [{'position': 9, 'start_offset': 49, 'end_offset': 51}]},
    'opensearch': {'term_freq': 1,
     'tokens': [{'position': 5, 'start_offset': 26, 'end_offset': 36}]}}}}}

The terms above are easier to check in the Dev Tools console.
Copy the queries below and try them.
```
GET /_analyze
{
  "analyzer" : "standard",
  "text" : "The guy contributes to an OpenSearch in 2024 and he is great !!!",
   "explain": false
}


GET /01-warming-up-english-index/_analyze
{
  "analyzer" : "my_analyzer",
  "text" : "The guy contributes to an OpenSearch in 2024 and he is great !!!",
   "explain": false
}

GET /01-warming-up-english-index/_analyze
{
  "analyzer" : "my_analyzer",
  "text" : "Devices, Campaigns,Resellers, EMM groups, & Roles",
   "explain": false
}
```
- ![opensearch_dev_tool.png](img/opensearch_dev_tool.png)

# 7. Document Search

## Lexical search

#### Search with the single keyword opensearch

In [22]:
q = 'opensearch'
query = {
  "query": {
    "match": {
      "text": {
        "query": f"{q}"
      }
    }
  }
}
print("query: ", query)
response = opensearch_utils.search_document(os_client, query, index_name)    
response

query:  {'query': {'match': {'text': {'query': 'opensearch'}}}}


{'took': 9,
 'timed_out': False,
 '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1, 'relation': 'eq'},
  'max_score': 0.2876821,
  'hits': [{'_index': '01-warming-up-english-index',
    '_id': '1',
    '_score': 0.2876821,
    '_source': {'text': 'The guy contributes to an OpenSearch in 2024 and he is great !!!',
     'vector_field': [0.45117188,
      0.20410156,
      0.53125,
      -0.072265625,
      0.48632812,
      -0.41796875,
      -0.33007812,
      0.00026893616,
      -0.82421875,
      0.18261719,
      -0.083984375,
      0.21289062,
      0.16894531,
      -0.29492188,
      -0.2734375,
      -0.08691406,
      0.36523438,
      0.3828125,
      -0.71875,
      0.36132812,
      -0.8515625,
      0.12792969,
      0.033203125,
      0.12890625,
      0.057373047,
      0.45507812,
      -0.0060424805,
      -0.5546875,
      -0.44921875,
      -0.5703125,
      -0.016357422,
      0.46679688,
      0.51171875,
      -0.0

#### Search with uppercase letters

In [23]:
q = 'OpenSearch'
query = {
  "query": {
    "match": {
      "text": {
        "query": f"{q}"
      }
    }
  }
}
print("query: ", query)
response = opensearch_utils.search_document(os_client, query, index_name)    
response

query:  {'query': {'match': {'text': {'query': 'OpenSearch'}}}}


{'took': 5,
 'timed_out': False,
 '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1, 'relation': 'eq'},
  'max_score': 0.2876821,
  'hits': [{'_index': '01-warming-up-english-index',
    '_id': '1',
    '_score': 0.2876821,
    '_source': {'text': 'The guy contributes to an OpenSearch in 2024 and he is great !!!',
     'vector_field': [0.45117188,
      0.20410156,
      0.53125,
      -0.072265625,
      0.48632812,
      -0.41796875,
      -0.33007812,
      0.00026893616,
      -0.82421875,
      0.18261719,
      -0.083984375,
      0.21289062,
      0.16894531,
      -0.29492188,
      -0.2734375,
      -0.08691406,
      0.36523438,
      0.3828125,
      -0.71875,
      0.36132812,
      -0.8515625,
      0.12792969,
      0.033203125,
      0.12890625,
      0.057373047,
      0.45507812,
      -0.0060424805,
      -0.5546875,
      -0.44921875,
      -0.5703125,
      -0.016357422,
      0.46679688,
      0.51171875,
      -0.0

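#### Search with the base verb form contribute
- Although the indexed text contains "contributes" ("The guy contributes to an OpenSearch in 2024 and he is great !!!"), the stemming filter reduces both "contributes" and "contribute" to the same term, contribut, so the document is still found.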

In [24]:
q = 'contribute'
query = {
  "query": {
    "match": {
      "text": {
        "query": f"{q}"
      }
    }
  }
}
print("query: ", query)
response = opensearch_utils.search_document(os_client, query, index_name)    
response

query:  {'query': {'match': {'text': {'query': 'contribute'}}}}


{'took': 5,
 'timed_out': False,
 '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1, 'relation': 'eq'},
  'max_score': 0.2876821,
  'hits': [{'_index': '01-warming-up-english-index',
    '_id': '1',
    '_score': 0.2876821,
    '_source': {'text': 'The guy contributes to an OpenSearch in 2024 and he is great !!!',
     'vector_field': [0.45117188,
      0.20410156,
      0.53125,
      -0.072265625,
      0.48632812,
      -0.41796875,
      -0.33007812,
      0.00026893616,
      -0.82421875,
      0.18261719,
      -0.083984375,
      0.21289062,
      0.16894531,
      -0.29492188,
      -0.2734375,
      -0.08691406,
      0.36523438,
      0.3828125,
      -0.71875,
      0.36132812,
      -0.8515625,
      0.12792969,
      0.033203125,
      0.12890625,
      0.057373047,
      0.45507812,
      -0.0060424805,
      -0.5546875,
      -0.44921875,
      -0.5703125,
      -0.016357422,
      0.46679688,
      0.51171875,
      -0.0

### Minimum Should Match search
- [OpenSearch Minimum should match](https://opensearch.org/docs/latest/query-dsl/minimum-should-match/)
- [Elasticsearch Minimum_should_match](https://opster.com/guides/elasticsearch/search-apis/elasticsearch-minimum-should-match/)
- The query 'who is contributing in 2023' is tokenized as follows:
    - [who, contribut, 2023]
    - ![minimum_should_match.png](img/minimum_should_match.png)
- With minimum_should_match set to 60% and 3 effective terms, 3 * 0.6 = 1.8, which rounds down to 1, so a document is returned if at least 1 query term matches it.
    - Terms of the document: [gui, contribut, opensearch, 2024, he, great]. Only contribut also appears in the query.
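
The arithmetic above can be sketched in a few lines of Python. `required_matches` is a hypothetical helper that only covers positive percentages rounded down, as described in the bullets; the term lists are copied from this notebook.

```python
def required_matches(num_terms: int, percent: int) -> int:
    """Terms that must match for a positive percentage, rounded down."""
    return num_terms * percent // 100


query_terms = ["who", "contribut", "2023"]
doc_terms = {"gui", "contribut", "opensearch", "2024", "he", "great"}

# Query terms that also exist in the indexed document.
matched = [t for t in query_terms if t in doc_terms]
print(matched)
# → ['contribut']

for percent in (60, 70):
    needed = required_matches(len(query_terms), percent)
    print(percent, needed, len(matched) >= needed)
# → 60 1 True
# → 70 2 False
```

Integer arithmetic is used instead of `3 * 0.7` to avoid floating-point rounding surprises near whole numbers.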



In [25]:
q = 'who is contributing in 2023'
query = {
    'query': {
        'bool': {
            'filter': [],
            'must': [
                {
                    'match': {
                        'text': {
                            'query': q,
                            'minimum_should_match': '60%',
                            'operator': 'or'
                        }
                    }
                }
            ]
        }
    },
    'size': 3
}
print("query: ", query)
response = opensearch_utils.search_document(os_client, query, index_name)    
response



query:  {'query': {'bool': {'filter': [], 'must': [{'match': {'text': {'query': 'who is contributing in 2023', 'minimum_should_match': '60%', 'operator': 'or'}}}]}}, 'size': 3}


{'took': 6,
 'timed_out': False,
 '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1, 'relation': 'eq'},
  'max_score': 0.2876821,
  'hits': [{'_index': '01-warming-up-english-index',
    '_id': '1',
    '_score': 0.2876821,
    '_source': {'text': 'The guy contributes to an OpenSearch in 2024 and he is great !!!',
     'vector_field': [0.45117188,
      0.20410156,
      0.53125,
      -0.072265625,
      0.48632812,
      -0.41796875,
      -0.33007812,
      0.00026893616,
      -0.82421875,
      0.18261719,
      -0.083984375,
      0.21289062,
      0.16894531,
      -0.29492188,
      -0.2734375,
      -0.08691406,
      0.36523438,
      0.3828125,
      -0.71875,
      0.36132812,
      -0.8515625,
      0.12792969,
      0.033203125,
      0.12890625,
      0.057373047,
      0.45507812,
      -0.0060424805,
      -0.5546875,
      -0.44921875,
      -0.5703125,
      -0.016357422,
      0.46679688,
      0.51171875,
      -0.0

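- With minimum_should_match set to 70% and 3 effective terms, 3 * 0.7 = 2.1, which rounds down to 2, so a document is returned only if at least 2 query terms match it.
    - Only one term (contribut) matches here rather than two, so nothing is returned.
    - Terms of the document: [gui, contribut, opensearch, 2024, he, great]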

In [27]:
q = 'who is contributing in 2023'
query = {
    'query': {
        'bool': {
            'filter': [],
            'must': [
                {
                    'match': {
                        'text': {
                            'query': q,
                            'minimum_should_match': '70%',
                            'operator': 'or'
                        }
                    }
                }
            ]
        }
    },
    'size': 3
}
print("query: ", query)
response = opensearch_utils.search_document(os_client, query, index_name)    
response



query:  {'query': {'bool': {'filter': [], 'must': [{'match': {'text': {'query': 'who is contributing in 2023', 'minimum_should_match': '70%', 'operator': 'or'}}}]}}, 'size': 3}


{'took': 2,
 'timed_out': False,
 '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 0, 'relation': 'eq'},
  'max_score': None,
  'hits': []}}

### Searching text and metadata.title together
- The query "which category in 2023?" has no matching term in the text field, but after analysis "category" matches the title. So the first query below returns no result; the screenshot shows the result of the second query.
- ![multi_field.png](img/multi_field.png)

You can also copy and use the queries below.
```
GET /01-warming-up-english-index/_search
{
  "query": {
    "multi_match": {
      "query": "which category in 2023?",
      "fields": [
        "text"
      ],
      "minimum_should_match": "50%"
    }
  }
}

GET /01-warming-up-english-index/_search
{
  "query": {
    "multi_match": {
      "query": "which category in 2023?",
      "fields": [
        "text",
        "metadata.title"
      ],
      "minimum_should_match": "50%"
    }
  }
}
```
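
To run both variants from Python rather than from Dev Tools, one option is to build the two query bodies with a small helper. This is only a sketch: `build_multi_match_query` is a hypothetical function, and the commented-out calls assume the `opensearch_utils.search_document` helper used elsewhere in this notebook.

```python
def build_multi_match_query(q, fields, minimum_should_match="50%"):
    """Build a multi_match query body over the given fields."""
    return {
        "query": {
            "multi_match": {
                "query": q,
                "fields": list(fields),
                "minimum_should_match": minimum_should_match,
            }
        }
    }


q = "which category in 2023?"
text_only = build_multi_match_query(q, ["text"])
text_and_title = build_multi_match_query(q, ["text", "metadata.title"])

print(text_only["query"]["multi_match"]["fields"])
# → ['text']
print(text_and_title["query"]["multi_match"]["fields"])
# → ['text', 'metadata.title']

# With the notebook's client (not executed here):
# opensearch_utils.search_document(os_client, text_only, index_name)
# opensearch_utils.search_document(os_client, text_and_title, index_name)
```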

In [28]:
q = "which category in 2023?"
query = {
  "query": {
    "multi_match": {
      "query": q,
      "fields": [
        "text",
        "metadata.title"
      ],
      "minimum_should_match": "50%"
    }
  }
}
print("query: ", query)
response = opensearch_utils.search_document(os_client, query, index_name)    
response



query:  {'query': {'multi_match': {'query': 'which category in 2023?', 'fields': ['text', 'metadata.title'], 'minimum_should_match': '50%'}}}


{'took': 7,
 'timed_out': False,
 '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1, 'relation': 'eq'},
  'max_score': 0.2876821,
  'hits': [{'_index': '01-warming-up-english-index',
    '_id': '1',
    '_score': 0.2876821,
    '_source': {'text': 'The guy contributes to an OpenSearch in 2024 and he is great !!!',
     'vector_field': [0.45117188,
      0.20410156,
      0.53125,
      -0.072265625,
      0.48632812,
      -0.41796875,
      -0.33007812,
      0.00026893616,
      -0.82421875,
      0.18261719,
      -0.083984375,
      0.21289062,
      0.16894531,
      -0.29492188,
      -0.2734375,
      -0.08691406,
      0.36523438,
      0.3828125,
      -0.71875,
      0.36132812,
      -0.8515625,
      0.12792969,
      0.033203125,
      0.12890625,
      0.057373047,
      0.45507812,
      -0.0060424805,
      -0.5546875,
      -0.44921875,
      -0.5703125,
      -0.016357422,
      0.46679688,
      0.51171875,
      -0.0

## Semantic search

In [29]:
text = "who contributes in 2024"
text_emb = llm_emb.embed_query(text)

In [30]:
query = {
  "size": 2,  
  "query": {
    "script_score": {
      "query": {
        "match_all": {}  
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, doc['vector_field']) + 1.0",
        "params": {
          "query_vector": text_emb  
        }
      }
    }
  }
}

# print("query: ", query)
response = opensearch_utils.search_document(os_client, query, index_name)    
response


{'took': 223,
 'timed_out': False,
 '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 1, 'relation': 'eq'},
  'max_score': 1.2429131,
  'hits': [{'_index': '01-warming-up-english-index',
    '_id': '1',
    '_score': 1.2429131,
    '_source': {'text': 'The guy contributes to an OpenSearch in 2024 and he is great !!!',
     'vector_field': [0.45117188,
      0.20410156,
      0.53125,
      -0.072265625,
      0.48632812,
      -0.41796875,
      -0.33007812,
      0.00026893616,
      -0.82421875,
      0.18261719,
      -0.083984375,
      0.21289062,
      0.16894531,
      -0.29492188,
      -0.2734375,
      -0.08691406,
      0.36523438,
      0.3828125,
      -0.71875,
      0.36132812,
      -0.8515625,
      0.12792969,
      0.033203125,
      0.12890625,
      0.057373047,
      0.45507812,
      -0.0060424805,
      -0.5546875,
      -0.44921875,
      -0.5703125,
      -0.016357422,
      0.46679688,
      0.51171875,
      -0
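
The script in the query above scores each document as `cosineSimilarity(...) + 1.0`. As a plain-Python sketch of that formula (not the engine's actual implementation), the helper below computes the same quantity for small vectors; `cosine_similarity` and `script_score` are hypothetical names used only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def script_score(query_vector, doc_vector):
    """Mirror of the script source: cosine similarity shifted by 1.0."""
    return cosine_similarity(query_vector, doc_vector) + 1.0


print(script_score([1.0, 0.0], [1.0, 0.0]))   # same direction → 2.0
print(script_score([1.0, 0.0], [0.0, 1.0]))   # orthogonal → 1.0
print(script_score([1.0, 0.0], [-1.0, 0.0]))  # opposite direction → 0.0
```

Under this reading, the shift keeps the score in the range 0 to 2, and the `_score` of 1.2429131 above would correspond to a cosine similarity of roughly 0.243 between the query and the document embeddings.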

# 8. Delete the Created Index

In [28]:
index_exists = opensearch_utils.check_if_index_exists(
    os_client,
    index_name
)


if index_exists:
    opensearch_utils.delete_index(
        os_client,
        index_name
    )
else:
    print("Index does not exist")    

index_name=01-warming-up-english-index, exists=True

Deleting index:
{'acknowledged': True}
