# Objectives

- How to deploy the neural-sparse model (a built-in OpenSearch model) and the neural-dense model (Bedrock: Cohere Embedding)
- How to use these models at ingestion and query time
- How to run a two-way hybrid query

In [None]:
!pip install requests_aws4auth
!pip install boto3

## 0. Prerequisites
- Open Dev Tools in OpenSearch Dashboards <br>
  For an OpenSearch cluster deployed in a VPC, access has to be relayed through an EC2 instance in that VPC; see [this workshop section](https://catalog.us-east-1.prod.workshops.aws/workshops/158a2497-7cbe-4ba4-8bee-2307cb01c08a/en-US/4-runpipeline/setupknowledgebase) for details. <br> <br>
- Register a model_group
    ```http
    POST /_plugins/_ml/model_groups/_register
    {
      "name": "remote_model_group",
      "description": "A model group for remote models"
    }
    ```
    <br>
    Output:
    
    ```json
    {
      "model_group_id": "zksL94sB0S9ucTLoj1u0",
      "status": "CREATED"
    }
    ```

## 1. Deploy the neural-sparse model

### Version 2.11 (deployed on SageMaker)

- Open the Integrations page of OpenSearch

  ![integration_1.png](./integration_1.png)
  After you fill in the parameters required by the CloudFormation template, the model is deployed on SageMaker and is already registered in OpenSearch.

- Verify the deployed neural-sparse model
  + Open the corresponding CloudFormation stack, switch to the Outputs tab, and note the model ID, the connector ID, and the SageMaker endpoint
    ![nerual-sparse.png](./nerual-sparse.png)
  + Run the following request in Dev Tools
    ```http
    GET /_plugins/_ml/models/<model_id>
    ```
    <br>
    Output
    
    ```json
    {
      "name": "sagemaker-model-for-connector-1UsW_4sB0S9ucTLoyVsG",
      "model_group_id": "1ksW_4sB0S9ucTLoyVul",
      "algorithm": "REMOTE",
      "model_version": "1",
      "description": "Sagemaker Model for connector 1UsW_4sB0S9ucTLoyVsG",
      "model_state": "DEPLOYED",
      "created_time": 1700791765463,
      "last_updated_time": 1700791765549,
      "last_deployed_time": 1700791765549,
      "planning_worker_node_count": 2,
      "current_worker_node_count": 2,
      "planning_worker_nodes": [
        "W5BqyJqbRr2GPVkjwgoaqQ",
        "dbjOCw5sSBuIKgKz3CKXjQ"
      ],
      "deploy_to_all_nodes": true,
      "connector_id": "1UsW_4sB0S9ucTLoyVsG"
    }
    ```

- Test the deployed SPLADE model
  ```http
  POST /_plugins/_ml/models/2EsW_4sB0S9ucTLoyVvY/_predict
  {
    "parameters": {
      "inputs": "Hi Altman"
    }
  }
  ```
  <br>
  Output
  <br>
  
  ```json
  {
    "inference_results": [
      {
        "output": [
          {
            "name": "response",
            "dataAsMap": {
              "response": [
                {
                  "e": 0.1419215202331543,
                  "he": 0.33063653111457825,
                  "his": 0.424188494682312,
                  "she": 0.10910777002573013,
                  "him": 0.05982781946659088,
                  "who": 0.47575441002845764,
                  "american": 0.011252160184085369,
                  ...
                }
              ]
            }
          }
        ],
        "status_code": 200
      }
    ]
  }
  ```
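
The response maps each expanded token to a weight. As a minimal sketch for inspecting such a response, the helper below (a hypothetical name, not part of any OpenSearch client) ranks the tokens by weight. The sample values are copied from the truncated output above, so the ranking covers only the tokens shown there.

```python
def top_tokens(predict_response, n=5):
    """Return the n highest-weighted (token, weight) pairs from a _predict response."""
    output = predict_response["inference_results"][0]["output"][0]
    token_weights = output["dataAsMap"]["response"][0]
    ranked = sorted(token_weights.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


# Sample values copied from the (truncated) output above.
sample = {
    "inference_results": [{
        "output": [{
            "name": "response",
            "dataAsMap": {"response": [{
                "e": 0.1419215202331543,
                "he": 0.33063653111457825,
                "his": 0.424188494682312,
                "she": 0.10910777002573013,
                "him": 0.05982781946659088,
                "who": 0.47575441002845764,
                "american": 0.011252160184085369,
            }]},
        }],
        "status_code": 200,
    }]
}

for token, weight in top_tokens(sample, n=3):
    print(f"{token}: {weight:.3f}")
```

Looking at the top tokens is a quick way to check that the sparse encoder expands the input into related terms rather than returning only the literal input words.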


### Version 2.12 (deployed directly in the OpenSearch cluster; not yet released)

- Configure permissions (cluster settings)
  ```http
    PUT /_cluster/settings
    {
        "persistent": {
            "plugins.ml_commons.only_run_on_ml_node": false,
            "plugins.ml_commons.connector_access_control_enabled": true,
            "plugins.ml_commons.model_access_control_enabled": true,
            "plugins.ml_commons.trusted_connector_endpoints_regex": [
              "^https://runtime\\.sagemaker\\..*[a-z0-9-]\\.amazonaws\\.com/.*$",
              "^https://api\\.openai\\.com/.*$",
              "^https://api\\.cohere\\.ai/.*$",
              "^https://bedrock-runtime\\..*[a-z0-9-]\\.amazonaws\\.com/.*$"
            ]
        }
    }
    ```
    <br>
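- Deploy the sparse encoding model (bi-encoder mode, i.e. term expansion is applied at both ingestion and query time)
  - In Dev Tools of OpenSearch Dashboards, run the following request to register the bi-encoder model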
    ```http
    POST /_plugins/_ml/models/_register
    {
      "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-v1",
      "version": "1.0.1",
      "model_format": "TORCH_SCRIPT"
    }
    ```
    
  - Deploy the model
    ```http
    POST /_plugins/_ml/models/<model_id>/_deploy
    ```

- Deploy the sparse encoding models (doc-only mode)
  - In Dev Tools of OpenSearch Dashboards, run the following requests to register the doc-only model and the tokenizer model
    + Ingestion side
        ```http
        POST /_plugins/_ml/models/_register
        {
          "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v1",
          "version": "1.0.1",
          "model_format": "TORCH_SCRIPT"
        }
        ```
    + Query side
        ```http
        POST /_plugins/_ml/models/_register
        {
          "name": "amazon/neural-sparse/opensearch-neural-sparse-tokenizer-v1",
          "version": "1.0.1",
          "model_format": "TORCH_SCRIPT"
        }
        ```
  - Deploy both models
    ```http
    POST /_plugins/_ml/models/<model_id_ingest>/_deploy
    POST /_plugins/_ml/models/<model_id_search>/_deploy
    ```
    + Note:
      + model_id_ingest is the ID of the ingestion-time model, opensearch-neural-sparse-encoding-doc-v1
      + model_id_search is the ID of the query-time model, opensearch-neural-sparse-tokenizer-v1
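
Registering a pretrained model runs asynchronously: the register call returns a task, and the `<model_id>` used in the deploy requests is read from that task once it finishes (`GET /_plugins/_ml/tasks/<task_id>`). The sketch below polls for it. It assumes the task body exposes `state` and `model_id` fields, and `fetch_task` is a hypothetical callable that you supply to perform the HTTP request, so nothing here talks to a cluster by itself.

```python
import time


def wait_for_model_id(fetch_task, task_id, timeout_s=300, poll_s=5, sleep=time.sleep):
    """Poll a register task until it completes and return its model_id.

    fetch_task is a hypothetical callable: it should issue
    GET /_plugins/_ml/tasks/<task_id> and return the parsed JSON body.
    """
    waited = 0
    while waited <= timeout_s:
        task = fetch_task(task_id)
        state = task.get("state")
        if state == "COMPLETED":
            return task["model_id"]
        if state == "FAILED":
            raise RuntimeError(f"task {task_id} failed: {task.get('error')}")
        sleep(poll_s)
        waited += poll_s
    raise TimeoutError(f"task {task_id} did not complete within {timeout_s}s")
```

In doc-only mode you would call this once per registered model and use the two results as `<model_id_ingest>` and `<model_id_search>`.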


## 2. Register the Bedrock Cohere embedding model


- Check whether the Bedrock Cohere model is available to the current account
  + You can run the cell below to test it

In [None]:
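import json

import boto3

# Assumes the default session has a region configured; otherwise set it explicitly, e.g. "us-west-2".
region = boto3.Session().region_name

BEDROCK_EMBEDDING_MODELID = "cohere.embed-multilingual-v3"
bedrock = boto3.client(service_name='bedrock-runtime', region_name=region)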

def get_embedding_bedrock(text_arrs):
    body = json.dumps({
        "texts": text_arrs,
        "input_type": "search_document"
    })
    bedrock_resp = bedrock.invoke_model(
            body=body,
            modelId=BEDROCK_EMBEDDING_MODELID,
            accept="application/json",
            contentType="application/json"
        )
    response_body = json.loads(bedrock_resp.get('body').read())
    embeddings = response_body['embeddings']
    return embeddings

print(get_embedding_bedrock(["hello world", "see you later"]))
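
The cell above always sends `input_type` as `search_document`. Documents and queries are embedded with different `input_type` values in this guide, so a small variation can build the request body for either case. This is only a sketch: `build_embed_body` is a hypothetical helper, and it restricts itself to the two input types this guide uses.

```python
import json

# The two input types used in this guide: one for documents, one for queries.
VALID_INPUT_TYPES = {"search_document", "search_query"}


def build_embed_body(texts, input_type="search_document"):
    """Build the JSON request body for the Cohere embedding model."""
    if input_type not in VALID_INPUT_TYPES:
        raise ValueError(f"unsupported input_type: {input_type!r}")
    if isinstance(texts, str):
        texts = [texts]  # accept a single string for convenience
    return json.dumps({"texts": list(texts), "input_type": input_type})


print(build_embed_body(["OpenAI Inc"], input_type="search_query"))
```

The returned string can be passed as `body=` to `bedrock.invoke_model` in place of the hard-coded body above.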

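- Create the connector
  + Notes:
    1. The request cannot be run directly in Dev Tools
    2. It cannot be run from this notebook either, because the AOS domain is in a VPC; copy the code below to an EC2 instance in the same VPC as the AOS domain and run it there
    3. The connector needs an IAM role, OpenSearchAndBedrockRole; see https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ml-amazon-connector.html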
      ```json
        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "VisualEditor0",
                    "Effect": "Allow",
                    "Action": [
                        "bedrock:InvokeModel",
                        "bedrock:InvokeModelWithResponseStream",
                        "sagemaker:InvokeEndpointAsync",
                        "sagemaker:InvokeEndpoint"
                    ],
                    "Resource": "*"
                },
                {
                    "Effect": "Allow",
                    "Action": "iam:PassRole",
                    "Resource": "arn:aws:iam::106839800180:role/OpenSearchAndBedrockRole"
                },
                {
                    "Effect": "Allow",
                    "Action": "es:ESHttpPost",
                    "Resource": "arn:aws:es:us-west-2:106839800180:domain/domain66ac69e0-ijsmtgwnje5s/*"
                }
            ]
        }      
      ```
      <br>
  + Run the code. Replace the hard-coded values first, such as host, role_name, account_id, and model_name. For the connector used at search time, set input_type to search_query.
    ```python
    import boto3
    import requests 
    from requests_aws4auth import AWS4Auth

    service = 'es'
    session = boto3.Session()
    credentials = session.get_credentials()
    region = session.region_name
    awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

    path = '/_plugins/_ml/connectors/_create'
    host = 'https://vpc-domain66ac69e0-ijsmtgwnje5s-oa63og27tmacx2fstb5hpstta4.us-west-2.es.amazonaws.com'
    url = host + path

    role_name = "OpenSearchAndBedrockRole"
    account_id = "106839800180"
    role_arn = "arn:aws:iam::{}:role/{}".format(account_id, role_name)
    model_name = "cohere.embed-multilingual-v3"

    bedrock_url = "https://bedrock-runtime.{}.amazonaws.com/model/{}/invoke".format(region, model_name)

    payload = {
      "name": "Amazon Bedrock Connector: Cohere doc embedding",
      "description": "The connector to the Bedrock Cohere multilingual doc embedding model",
      "version": 1,
      "protocol": "aws_sigv4",
      "parameters": {
        "region": region,
        "service_name": "bedrock"
      },
      "credential": {
        "roleArn": role_arn
      },
      "actions": [
        {
          "action_type": "predict",
          "method": "POST",
          "url": bedrock_url,
          "headers": {
            "content-type": "application/json",
            "x-amz-content-sha256": "required"
          },
          "request_body": "{ \"texts\": ${parameters.texts}, \"input_type\": \"search_document\" }",
          "pre_process_function": "connector.pre_process.cohere.embedding",
          "post_process_function": "connector.post_process.cohere.embedding"
        }
      ]
    }
    headers = {"Content-Type": "application/json"}

    r = requests.post(url, auth=awsauth, json=payload, headers=headers)
    print(r.status_code)
    print(r.text)
    ```
    <br>
  + Output <br>
    + Connector for embedding documents
        ```json
        {"connector_id":"3ktHC4wB0S9ucTLoRFvx"}
        ```
        <br>
    + Connector for embedding queries <br> 
        ```json
        {"connector_id":"4kt2C4wB0S9ucTLoNVsP"}
        ```
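
Two connectors are created because only the `input_type` in the request body differs between documents and queries. One way to avoid editing the script by hand is to generate both payloads from a single function. The sketch below mirrors the payload used above; the function name and the account ID `123456789012` are placeholders, not values from this guide.

```python
def build_connector_payload(region, role_arn, model_name, input_type):
    """Build a connector payload; input_type is search_document or search_query."""
    bedrock_url = "https://bedrock-runtime.{}.amazonaws.com/model/{}/invoke".format(
        region, model_name
    )
    # ${parameters.texts} is filled in by OpenSearch at request time, not by Python.
    request_body = '{ "texts": ${parameters.texts}, "input_type": "%s" }' % input_type
    return {
        "name": "Amazon Bedrock Connector: Cohere {} embedding".format(input_type),
        "description": "Connector to the Bedrock Cohere multilingual embedding model",
        "version": 1,
        "protocol": "aws_sigv4",
        "parameters": {"region": region, "service_name": "bedrock"},
        "credential": {"roleArn": role_arn},
        "actions": [{
            "action_type": "predict",
            "method": "POST",
            "url": bedrock_url,
            "headers": {
                "content-type": "application/json",
                "x-amz-content-sha256": "required",
            },
            "request_body": request_body,
            "pre_process_function": "connector.pre_process.cohere.embedding",
            "post_process_function": "connector.post_process.cohere.embedding",
        }],
    }


# Placeholder role ARN; replace with your own account and role.
role_arn = "arn:aws:iam::123456789012:role/OpenSearchAndBedrockRole"
payloads = {
    input_type: build_connector_payload(
        "us-west-2", role_arn, "cohere.embed-multilingual-v3", input_type
    )
    for input_type in ("search_document", "search_query")
}
print(payloads["search_query"]["actions"][0]["request_body"])
```

Each payload can then be posted to `/_plugins/_ml/connectors/_create` with the same signed `requests.post` call as in the script above.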

- Register the remote model <br>
    Replace the variables with your own values. <br>
    You can either add the deploy=true parameter or deploy afterwards with `POST /_plugins/_ml/models/<model_id>/_deploy`
    ```http
    POST /_plugins/_ml/models/_register?deploy=true
    {
        "name": "cohere embed-multilingual-v3",
        "function_name": "remote",
        "model_group_id": "zksL94sB0S9ucTLoj1u0",
        "description": "embedding for multilingual",
        "connector_id": "3ktHC4wB0S9ucTLoRFvx"
    }
    ```
    Output: 
    ```json
    {
      "task_id": "30tIC4wB0S9ucTLoWVtV",
      "status": "CREATED",
      "model_id": "4EtIC4wB0S9ucTLoWVtt" // for doc: 4EtIC4wB0S9ucTLoWVtt; for query: 5Et6C4wB0S9ucTLo1Vsj
    }
    ```

- Test the remote model
    ```http
    POST /_plugins/_ml/models/<model_id>/_predict
    {
      "parameters": {
        "texts": ["Hello word", "Hi Altman"]
      }
    }
    ```

## 3. Configure the sparse encoding model at ingestion time

- Create the ingest pipeline
```http
PUT /_ingest/pipeline/neural-sparse-pipeline
{
  "description": "neural sparse encoding pipeline",
  "processors" : [
    {
      "sparse_encoding": {
        "model_id": "<neural_sparse_model_id>",
        "field_map": {
           "content": "sparse_embedding"
        }
      }
    },
    {
      "text_embedding": {
        "model_id": "<cohere_ingest_model_id>",
        "field_map": {
          "doc": "embedding"
        }
      }
    }
  ]
}
```

- Create an OpenSearch index that includes the sparse encoding field
```http
PUT chatbot-index
{
    "settings" : {
        "index":{
            "number_of_shards" : 1,
            "number_of_replicas" : 0,
            "knn": "true",
            "knn.algo_param.ef_search": 32
        }, 
        "default_pipeline": "neural-sparse-pipeline"
    },
    "mappings": {
        "properties": {
            "publish_date" : {
                "type": "date",
                "format": "yyyy-MM-dd HH:mm:ss"
            },
            "idx" : {
                "type": "integer"
            },
            "doc_type" : {
                "type" : "keyword"
            },
            "doc": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart"
            },
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart"
            },
            "doc_title": {
                "type": "keyword"
            },
            "doc_author": {
                "type": "keyword"
            },
            "doc_category": {
                "type": "keyword"
            },
            "embedding": {
                "type": "knn_vector",
                "dimension": 1024,
                "method": {
                    "name": "hnsw",
                    "engine": "nmslib",
                    "space_type": "innerproduct",
                    "parameters": {}
                }            
            },
            "sparse_embedding": {
                "type": "rank_features"
            }
        }
    }
}
```
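
The pipeline and the index have to agree: each `field_map` entry reads a source field and writes a target field, and the target must be mapped with a type the processor can fill (`rank_features` for `sparse_encoding`, `knn_vector` for `text_embedding`). The sketch below is a hypothetical pre-flight check written for this guide, run against trimmed copies of the two definitions above.

```python
# Target field type each processor is expected to write into (as used in this guide).
EXPECTED_TYPES = {"sparse_encoding": "rank_features", "text_embedding": "knn_vector"}


def check_pipeline_targets(pipeline, index_body):
    """Return a list of mismatches between pipeline field_maps and index mappings."""
    problems = []
    properties = index_body["mappings"]["properties"]
    for processor in pipeline["processors"]:
        for proc_type, config in processor.items():
            expected = EXPECTED_TYPES.get(proc_type)
            if expected is None:
                continue  # processor type not covered by this check
            for source, target in config["field_map"].items():
                if source not in properties:
                    problems.append(f"{proc_type}: source field '{source}' is not mapped")
                actual = properties.get(target, {}).get("type")
                if actual != expected:
                    problems.append(
                        f"{proc_type}: target '{target}' has type {actual!r}, expected {expected!r}"
                    )
    return problems


# Trimmed copies of the pipeline and index definitions above.
pipeline = {"processors": [
    {"sparse_encoding": {"model_id": "<neural_sparse_model_id>",
                         "field_map": {"content": "sparse_embedding"}}},
    {"text_embedding": {"model_id": "<cohere_ingest_model_id>",
                        "field_map": {"doc": "embedding"}}},
]}
index_body = {"mappings": {"properties": {
    "doc": {"type": "text"},
    "content": {"type": "text"},
    "embedding": {"type": "knn_vector", "dimension": 1024},
    "sparse_embedding": {"type": "rank_features"},
}}}

print(check_pipeline_targets(pipeline, index_body))
```

An empty list means every processor writes into a field of the expected type.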

## 4. Ingestion test

```http
PUT /chatbot-index/_doc/1
{
    "publish_date" : "2023-11-22 10:00:00",
    "idx" : 1,
    "doc_type" : "paragraph",
    "doc" : "Sam Altman returns to OpenAI in a bizarre reversal of fortunes",
    "content" : "New York - Sam Altman has agreed to return to lead OpenAI, the company said in a Tuesday post on X, just days after his surprise ouster as chief executive sparked an employee revolt that threatened to undermine what has been the leading company in the fledgling artificial intelligence industry.",
    "doc_title" : "",
    "doc_author" : "Altman",
    "doc_category" : ""
}
```
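
For more than a handful of documents, the `_bulk` API is usually more practical than one `PUT` per document, and the index's `default_pipeline` still applies. The sketch below only builds the newline-delimited request body; `build_bulk_body` is a hypothetical helper, and the second document is a made-up placeholder for illustration.

```python
import json


def build_bulk_body(index_name, docs):
    """Build an NDJSON body for POST /_bulk from (doc_id, document) pairs."""
    lines = []
    for doc_id, doc in docs:
        # Action line, followed by the document source line.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    ("1", {
        "publish_date": "2023-11-22 10:00:00",
        "idx": 1,
        "doc_type": "paragraph",
        "doc": "Sam Altman returns to OpenAI in a bizarre reversal of fortunes",
        "content": "New York - Sam Altman has agreed to return to lead OpenAI",
        "doc_author": "Altman",
    }),
    # Placeholder document, not taken from this guide.
    ("2", {
        "publish_date": "2023-11-23 10:00:00",
        "idx": 2,
        "doc_type": "paragraph",
        "doc": "placeholder title",
        "content": "placeholder body text",
        "doc_author": "",
    }),
]

print(build_bulk_body("chatbot-index", docs))
```

The resulting string is sent as the body of `POST /_bulk`, typically with the `Content-Type: application/x-ndjson` header.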

## 5. Query tests
- Test a neural_sparse query
```http
GET /chatbot-index/_search?explain=true
{
  "query": {
      "neural_sparse": {
          "sparse_embedding": {
            "query_text": "OpenAI Inc",
            "model_id": "<neural_sparse_model_id>",
            "max_token_score": 3.5
          }
      }
  }
}
```

- Test a Cohere neural (dense) query
```http
GET /chatbot-index/_search/
{
    "query": {
        "neural": {
            "embedding": {
              "query_text": "OpenAI Inc",
              "model_id": "<cohere_search_model_id>",
              "k": 10
            }
        }
    }
}
```

- Test the two-way hybrid query
  - Create the search_pipeline
    ```http
    PUT /_search/pipeline/dense-sparse-pipeline
    {
      "description": "Post processor for hybrid search",
      "phase_results_processors": [
        {
          "normalization-processor": {
            "normalization": {
              "technique": "l2"
            },
            "combination": {
              "technique": "arithmetic_mean",
              "parameters": {
                "weights": [
                  0.3,
                  0.7
                ]
              }
            }
          }
        }
      ]
    }
    ```
    <br>
  - Run the hybrid query <br>
    ```http
    GET /chatbot-index/_search?search_pipeline=dense-sparse-pipeline
    {
      "query": {
        "hybrid": {
          "queries": [
            {
              "neural_sparse": {
                  "sparse_embedding": {
                    "query_text": "OpenAI Inc",
                    "model_id": "<splade_model_id>",
                    "max_token_score": 2.0
                  }
              }
            },
            {
              "neural": {
                "embedding": {
                  "query_text": "OpenAI Inc",
                  "model_id": "<cohere_search_model_id>",
                  "k": 10
                }
              }
            }
          ]
        }
      }
    }
    ```
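
To make the effect of the `l2` normalization and the weighted `arithmetic_mean` more concrete, here is a simplified re-implementation of the arithmetic. It is an illustration under stated assumptions, not the plugin's code: each sub-query's scores are divided by their L2 norm, a document missing from a sub-query contributes 0, and the weights follow the order of the `queries` array (0.3 for `neural_sparse`, 0.7 for `neural`). The raw scores are made up.

```python
import math


def l2_normalize(scores):
    """Divide each score by the L2 norm of all scores from one sub-query."""
    norm = math.sqrt(sum(s * s for s in scores.values()))
    if norm == 0:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: s / norm for doc_id, s in scores.items()}


def combine(sub_query_scores, weights):
    """Weighted arithmetic mean over sub-queries; a missing document counts as 0."""
    normalized = [l2_normalize(scores) for scores in sub_query_scores]
    doc_ids = set().union(*normalized)
    total_weight = sum(weights)
    return {
        doc_id: sum(w * n.get(doc_id, 0.0) for w, n in zip(weights, normalized)) / total_weight
        for doc_id in doc_ids
    }


# Hypothetical raw scores, in the same order as the hybrid "queries" array.
sparse_scores = {"doc1": 3.0, "doc2": 4.0}  # neural_sparse sub-query
dense_scores = {"doc1": 0.8, "doc3": 0.6}   # neural sub-query
combined = combine([sparse_scores, dense_scores], weights=[0.3, 0.7])

for doc_id, score in sorted(combined.items(), key=lambda kv: kv[1], reverse=True):
    print(doc_id, round(score, 3))
```

With these inputs `doc1` ranks first because it is returned by both sub-queries, and `doc3` outranks `doc2` because the dense sub-query carries the larger weight.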

## 6. How to delete models and connectors

- Undeploy the deployed remote model (undeploy first if you need to redeploy it)
    ```http
    POST /_plugins/_ml/models/<model_id>/_undeploy
    ```
    Output =>
    ```json
    {
      "W5BqyJqbRr2GPVkjwgoaqQ": {
        "stats": {
          "vdkQ94sBMoMEFe1F4cRP": "undeployed"
        }
      },
      "dbjOCw5sSBuIKgKz3CKXjQ": {
        "stats": {
          "vdkQ94sBMoMEFe1F4cRP": "undeployed"
        }
      }
    }
    ```
    <br>
- Delete the model (the reverse of _register)
    ```http
    DELETE /_plugins/_ml/models/<model_id> # v9kB-4sBMoMEFe1FCcRr
    ```
    Output => 
    ```json
    {
      "_index": ".plugins-ml-model",
      "_id": "v9kB-4sBMoMEFe1FCcRr",
      "_version": 5,
      "result": "deleted",
      "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
      },
      "_seq_no": 4,
      "_primary_term": 1
    }
    ```
    <br>
- Delete the connector (when you need to recreate it)
    ```http
    DELETE /_plugins/_ml/connectors/<connector_id>
    ```
    <br>Output =>
    ```json
    {
      "_index": ".plugins-ml-connector",
      "_id": "u9lo9osBMoMEFe1FhcTh",
      "_version": 2,
      "result": "deleted",
      "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
      },
      "_seq_no": 1,
      "_primary_term": 1
    }
    ```
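
When several models and connectors have to be removed, it helps to generate the requests in the order used in this section: undeploy each model, delete it, then delete the connectors. The sketch below only produces the list of (method, path) pairs; `teardown_requests` is a hypothetical helper, and the IDs are the sample IDs from section 2.

```python
def teardown_requests(model_ids, connector_ids):
    """Return (method, path) pairs in the order used in this section."""
    steps = []
    for model_id in model_ids:
        steps.append(("POST", f"/_plugins/_ml/models/{model_id}/_undeploy"))
        steps.append(("DELETE", f"/_plugins/_ml/models/{model_id}"))
    for connector_id in connector_ids:
        steps.append(("DELETE", f"/_plugins/_ml/connectors/{connector_id}"))
    return steps


# Sample IDs from section 2: doc/query models and their connectors.
for method, path in teardown_requests(
    ["4EtIC4wB0S9ucTLoWVtt", "5Et6C4wB0S9ucTLo1Vsj"],
    ["3ktHC4wB0S9ucTLoRFvx", "4kt2C4wB0S9ucTLoNVsP"],
):
    print(method, path)
```

Each line can be pasted into Dev Tools as-is, or sent through a signed HTTP client from the EC2 host.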

## 7. FAQ

Q: What about batch inference? <br>
A: The neural-sparse model does not appear to support batch inference; Bedrock Cohere Embedding does.

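Q: What should I watch out for when passing parameters to the Cohere model? <br>
A: Do not rely entirely on the Cohere website. You can pass texts and input_type; passing truncate causes an error. modelId does not need to be passed.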

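Q: Which version is required to use the neural-sparse model? <br>
A: Version 2.11 or later. On 2.11 the model is served through SageMaker (ml.g4dn.xlarge offers the best price-performance); on 2.12 it is expected to be deployable inside the OpenSearch cluster.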

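Q: Which models need to be deployed? <br>
A: If neural-sparse runs in bi-encoder mode (term expansion for both queries and documents), only one model is needed. In doc-only mode, two models are needed: one to expand documents and one to tokenize queries. For Cohere, nothing is actually deployed because the Bedrock API is called, but documents and queries are embedded with different parameters, so you need to create two connectors and two OpenSearch models.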

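Q: When creating a connector, what do pre_process_function and post_process_function do? <br>
A: pre_process_function converts the input coming from the OpenSearch pipeline into the parameter format the model API expects. post_process_function converts the model's response into the format the pipeline needs. Prefer the predefined functions shipped with OpenSearch; when a model API changes, OpenSearch maintains them accordingly.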

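Q: What is the max_token_score parameter in a neural sparse query for? <br>
A: It is the upper bound on the score a single token can contribute in the inverted index. It exists to fit Lucene's WAND pruning algorithm and speeds up queries. For the two models here, use the fixed values 3.5 (bi-encoder mode) and 2 (doc-only mode); they do not change the search results and only affect speed. The next AOS version includes a number of Lucene optimizations that make the parameter unnecessary, so it will be deprecated.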
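
A small helper can keep the two fixed values in one place. This is a sketch with hypothetical names; the 3.5 and 2.0 values are the ones given in the answer above.

```python
# Fixed max_token_score per mode, as stated in the answer above.
MAX_TOKEN_SCORE = {"bi-encoder": 3.5, "doc-only": 2.0}


def neural_sparse_query(query_text, model_id, mode, field="sparse_embedding"):
    """Build a neural_sparse query body with the max_token_score for the given mode."""
    if mode not in MAX_TOKEN_SCORE:
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "query": {
            "neural_sparse": {
                field: {
                    "query_text": query_text,
                    "model_id": model_id,
                    "max_token_score": MAX_TOKEN_SCORE[mode],
                }
            }
        }
    }


print(neural_sparse_query("OpenAI Inc", "<model_id_search>", "doc-only"))
```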

Q: Can explain be used with neural_sparse? <br>
A: Yes, but not in a hybrid query.