Commit
feature: update to version 0.8.2
skeptrunedev committed May 9, 2024
1 parent 6cf5003 commit a8c38b9
Showing 124 changed files with 221 additions and 211 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -3,8 +3,8 @@ Trieve OpenAPI Specification. This document describes all of the operations avai

This Python package is automatically generated by the [OpenAPI Generator](https://openapi-generator.tech) project:

- API version: 0.8.0
- Package version: 0.8.0
- API version: 0.8.2
- Package version: 0.8.2
- Generator version: 7.4.0
- Build package: org.openapitools.codegen.languages.PythonClientCodegen
For more information, please visit [https://trieve.ai](https://trieve.ai)
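To make the version bump concrete, here is a minimal install-and-verify sketch; the PyPI distribution name and the `__version__` attribute are assumptions based on typical openapi-generator Python output, not something this diff confirms.

```python
# Assumed distribution name; install with: pip install "trieve-py-client==0.8.2"
import trieve_py_client

# Generated clients normally expose the package version; expected to print "0.8.2".
print(trieve_py_client.__version__)
```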
2 changes: 1 addition & 1 deletion docs/ChunkData.md
@@ -5,7 +5,7 @@

Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**chunk_html** | **str** | HTML content of the chunk. This can also be plaintext. The innerText of the HTML will be used to create the embedding vector. The point of using HTML is for convenience, as some users have applications where users submit HTML content. | [optional]
**chunk_html** | **str** | HTML content of the chunk. This can also be plaintext. The innerText of the HTML will be used to create the embedding vector. The point of using HTML is for convenience, as some users have applications where users submit HTML content. |
**chunk_vector** | **List[float]** | Chunk_vector is a vector of floats which can be used instead of generating a new embedding. This is useful for when you are using a pre-embedded dataset. If this is not provided, the innerText of the chunk_html will be used to create the embedding. | [optional]
**convert_html_to_text** | **bool** | Convert HTML to raw text before processing to avoid adding noise to the vector embeddings. By default this is true. If you are using HTML content that you want to be included in the vector embeddings, set this to false. | [optional]
**group_ids** | **List[str]** | Group ids are the ids of the groups that the chunk should be placed into. This is useful for when you want to create a chunk and add it to a group or multiple groups in one request. Necessary because this route queues the chunk for ingestion and the chunk may not exist yet immediately after response. | [optional]
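The notable change above is that `chunk_html` loses its `[optional]` marker, making it a required field of ChunkData. A minimal construction sketch, assuming the generated model lives at `trieve_py_client.models.chunk_data` and accepts keyword arguments, as openapi-generator's Python output typically does:

```python
from trieve_py_client.models.chunk_data import ChunkData  # import path assumed

# chunk_html is required as of 0.8.2; chunk_vector stays optional and, when given,
# replaces the embedding that would otherwise be built from the HTML's innerText.
chunk = ChunkData(
    chunk_html="<p>Retrieval-augmented generation pairs search with an LLM.</p>",
    convert_html_to_text=True,      # strip tags before embedding (the default)
    group_ids=["docs-2024-q1"],     # hypothetical group id; ingestion is queued
)
print(chunk.to_json())
```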
1 change: 0 additions & 1 deletion docs/ChunkMetadata.md
@@ -6,7 +6,6 @@
Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**chunk_html** | **str** | | [optional]
**content** | **str** | |
**created_at** | **datetime** | |
**dataset_id** | **str** | |
**id** | **str** | |
1 change: 0 additions & 1 deletion docs/ChunkMetadataWithScore.md
@@ -6,7 +6,6 @@
Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**chunk_html** | **str** | | [optional]
**content** | **str** | |
**created_at** | **datetime** | |
**dataset_id** | **str** | |
**id** | **str** | |
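Both metadata models drop their `content` row in this release, so code that read `content` from responses needs to fall back to `chunk_html`. A small adjustment sketch, assuming the returned objects expose attributes named as in the tables above:

```python
# `chunk` is assumed to be a ChunkMetadata or ChunkMetadataWithScore instance.
def chunk_text(chunk) -> str:
    # 0.8.x removes `content`; chunk_html (which may be None) is the remaining text field.
    return chunk.chunk_html or ""
```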
2 changes: 1 addition & 1 deletion docs/CreateChunkData.md
@@ -5,7 +5,7 @@

Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**chunk_html** | **str** | HTML content of the chunk. This can also be plaintext. The innerText of the HTML will be used to create the embedding vector. The point of using HTML is for convenience, as some users have applications where users submit HTML content. | [optional]
**chunk_html** | **str** | HTML content of the chunk. This can also be plaintext. The innerText of the HTML will be used to create the embedding vector. The point of using HTML is for convenience, as some users have applications where users submit HTML content. |
**chunk_vector** | **List[float]** | Chunk_vector is a vector of floats which can be used instead of generating a new embedding. This is useful for when you are using a pre-embedded dataset. If this is not provided, the innerText of the chunk_html will be used to create the embedding. | [optional]
**convert_html_to_text** | **bool** | Convert HTML to raw text before processing to avoid adding noise to the vector embeddings. By default this is true. If you are using HTML content that you want to be included in the vector embeddings, set this to false. | [optional]
**group_ids** | **List[str]** | Group ids are the ids of the groups that the chunk should be placed into. This is useful for when you want to create a chunk and add it to a group or multiple groups in one request. Necessary because this route queues the chunk for ingestion and the chunk may not exist yet immediately after response. | [optional]
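The same `[optional]` marker is dropped here, so the create request body must also carry `chunk_html`. A request-body sketch mirroring the documented fields (names from the table above, placeholder values):

```python
# Plain-dict sketch of a CreateChunkData body; only chunk_html is required in 0.8.2.
create_chunk_body = {
    "chunk_html": "<article>Quarterly report summary</article>",
    "convert_html_to_text": True,
    # group_ids attaches the chunk to groups in the same request, useful because
    # ingestion is queued and the chunk may not exist immediately after the response.
    "group_ids": ["docs-2024-q1"],  # hypothetical group id
}
```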
2 changes: 2 additions & 0 deletions docs/GroupScoreSlimChunks.md
@@ -6,6 +6,8 @@
Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**group_id** | **str** | |
**group_name** | **str** | | [optional]
**group_tracking_id** | **str** | | [optional]
**metadata** | [**List[ScoreSlimChunks]**](ScoreSlimChunks.md) | |

## Example
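The two new optional fields let callers label group results without a second lookup. A consumption sketch, assuming a GroupScoreSlimChunks result exposes attributes named as documented:

```python
# `group_result` is assumed to be a GroupScoreSlimChunks instance from a group search.
def group_label(group_result) -> str:
    # group_name and group_tracking_id are new, optional fields in 0.8.2.
    return group_result.group_name or group_result.group_tracking_id or group_result.group_id
```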
2 changes: 1 addition & 1 deletion docs/RecommendChunksRequest.md
@@ -12,7 +12,7 @@ Name | Type | Description | Notes
**positive_chunk_ids** | **List[str]** | The ids of the chunks to be used as positive examples for the recommendation. The chunks in this array will be used to find similar chunks. | [optional]
**positive_tracking_ids** | **List[str]** | The tracking_ids of the chunks to be used as positive examples for the recommendation. The chunks in this array will be used to find similar chunks. | [optional]
**recommend_type** | **str** | The type of recommendation to make. This lets you choose whether to recommend based off of `semantic` or `fulltext` similarity. The default is `semantic`. | [optional]
**slim_chunks** | **bool** | Set slim_chunks to true to avoid returning the content and chunk_html of the chunks. This is useful when you want to reduce the amount of data over the wire for latency improvement. Default is false. | [optional]
**slim_chunks** | **bool** | Set slim_chunks to true to avoid returning the content and chunk_html of the chunks. This is useful when you want to reduce the amount of data over the wire for latency improvement (typically 10-50ms). Default is false. | [optional]
**strategy** | **str** | Strategy to use for recommendations, either \"average_vector\" or \"best_score\". The default is \"average_vector\". The \"average_vector\" strategy will construct a single average vector from the positive and negative samples then use it to perform a pseudo-search. The \"best_score\" strategy is more advanced and navigates the HNSW with a heuristic of picking edges where the point is closer to the positive samples than it is the negatives. | [optional]

## Example
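A minimal request-construction sketch, assuming the generated RecommendChunksRequest model accepts keyword arguments and is importable from the usual generated layout:

```python
from trieve_py_client.models.recommend_chunks_request import RecommendChunksRequest  # path assumed

request = RecommendChunksRequest(
    positive_chunk_ids=["chunk-id-1", "chunk-id-2"],  # hypothetical chunk ids
    recommend_type="semantic",      # or "fulltext"
    strategy="average_vector",      # or "best_score" for the HNSW edge heuristic
    slim_chunks=True,               # omit content/chunk_html to trim the response
)
```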
2 changes: 1 addition & 1 deletion docs/RecommendGroupChunksRequest.md
@@ -13,7 +13,7 @@ Name | Type | Description | Notes
**positive_group_ids** | **List[str]** | The ids of the groups to be used as positive examples for the recommendation. The groups in this array will be used to find similar groups. | [optional]
**positive_group_tracking_ids** | **List[str]** | The tracking_ids of the groups to be used as positive examples for the recommendation. The groups in this array will be used to find similar groups. | [optional]
**recommend_type** | **str** | The type of recommendation to make. This lets you choose whether to recommend based off of `semantic` or `fulltext` similarity. The default is `semantic`. | [optional]
**slim_chunks** | **bool** | Set slim_chunks to true to avoid returning the content and chunk_html of the chunks. This is useful when you want to reduce the amount of data over the wire for latency improvement. Default is false. | [optional]
**slim_chunks** | **bool** | Set slim_chunks to true to avoid returning the content and chunk_html of the chunks. This is useful when you want to reduce the amount of data over the wire for latency improvement (typically 10-50ms). Default is false. | [optional]
**strategy** | **str** | Strategy to use for recommendations, either \"average_vector\" or \"best_score\". The default is \"average_vector\". The \"average_vector\" strategy will construct a single average vector from the positive and negative samples then use it to perform a pseudo-search. The \"best_score\" strategy is more advanced and navigates the HNSW with a heuristic of picking edges where the point is closer to the positive samples than it is the negatives. | [optional]

## Example
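The group-level variant is built the same way, but takes groups rather than chunks as the positive and negative examples. A sketch under the same keyword-argument assumption:

```python
from trieve_py_client.models.recommend_group_chunks_request import RecommendGroupChunksRequest  # path assumed

request = RecommendGroupChunksRequest(
    positive_group_tracking_ids=["onboarding-docs"],  # hypothetical tracking id
    recommend_type="fulltext",
    slim_chunks=True,
)
```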
8 changes: 4 additions & 4 deletions docs/SearchChunkData.md
@@ -5,18 +5,18 @@

Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**date_bias** | **bool** | Set date_bias to true to bias search results towards more recent chunks. This will work best in hybrid search mode. | [optional]
**filters** | [**ChunkFilter**](ChunkFilter.md) | | [optional]
**get_collisions** | **bool** | Set get_collisions to true to get the collisions for each chunk. This will only apply if environment variable COLLISIONS_ENABLED is set to true. | [optional]
**get_total_pages** | **bool** | Get total page count for the query accounting for the applied filters. Defaults to true, but can be set to false to reduce latency in edge cases performance. | [optional]
**get_total_pages** | **bool** | Get total page count for the query accounting for the applied filters. Defaults to false, but can be set to true when the latency penalty is acceptable (typically 50-200ms). | [optional]
**highlight_delimiters** | **List[str]** | Set highlight_delimiters to a list of strings to use as delimiters for highlighting. If not specified, this defaults to [\"?\", \",\", \".\", \"!\"]. | [optional]
**highlight_results** | **bool** | Set highlight_results to true to highlight the results. If not specified, this defaults to true. | [optional]
**highlight_results** | **bool** | Set highlight_results to false for a slight latency improvement (1-10ms). If not specified, this defaults to true. This will add <b><mark> tags to the chunk_html of the chunks to highlight matching sub-sentences. | [optional]
**page** | **int** | Page of chunks to fetch. Page is 1-indexed. | [optional]
**page_size** | **int** | Page size is the number of chunks to fetch. This can be used to fetch more than 10 chunks at a time. | [optional]
**query** | **str** | Query is the search query. This can be any string. The query will be used to create an embedding vector and/or SPLADE vector which will be used to find the result set. |
**recency_bias** | **float** | Recency Bias lets you determine how much of an effect the recency of chunks will have on the search results. If not specified, this defaults to 0.0. We recommend setting this to 1.0 for a gentle reranking of the results, >3.0 for a strong reranking of the results. | [optional]
**score_threshold** | **float** | Set score_threshold to a float to filter out chunks with a score below the threshold. | [optional]
**search_type** | **str** | Can be either \"semantic\", \"fulltext\", or \"hybrid\". \"hybrid\" will pull in one page (10 chunks) of both semantic and full-text results then re-rank them using BAAI/bge-reranker-large. \"semantic\" will pull in one page (10 chunks) of the nearest cosine distant vectors. \"fulltext\" will pull in one page (10 chunks) of full-text results based on SPLADE. |
**slim_chunks** | **bool** | Set slim_chunks to true to avoid returning the content and chunk_html of the chunks. This is useful when you want to reduce the amount of data over the wire for latency improvement. Default is false. | [optional]
**slim_chunks** | **bool** | Set slim_chunks to true to avoid returning the content and chunk_html of the chunks. This is useful when you want to reduce the amount of data over the wire for latency improvement (typically 10-50ms). Default is false. | [optional]
**use_weights** | **bool** | Set use_weights to true to use the weights of the chunks in the result set in order to sort them. If not specified, this defaults to true. | [optional]

## Example
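Several of the rewritten descriptions above are latency oriented: get_total_pages now documents a default of false, and highlight_results and slim_chunks each state their approximate cost or saving. A search-request sketch, assuming the generated SearchChunkData model and keyword-argument construction:

```python
from trieve_py_client.models.search_chunk_data import SearchChunkData  # path assumed

search = SearchChunkData(
    query="how do I rotate api keys",
    search_type="hybrid",        # semantic + SPLADE full-text, then re-ranked
    page=1,
    page_size=20,
    get_total_pages=True,        # opt back in; 0.8.2 documents the default as false
    highlight_results=False,     # skip <b><mark> highlighting for a small latency win
    slim_chunks=True,            # drop content/chunk_html from the response
)
```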
6 changes: 3 additions & 3 deletions docs/SearchOverGroupsData.md
@@ -7,16 +7,16 @@ Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**filters** | [**ChunkFilter**](ChunkFilter.md) | | [optional]
**get_collisions** | **bool** | Set get_collisions to true to get the collisions for each chunk. This will only apply if environment variable COLLISIONS_ENABLED is set to true. | [optional]
**get_total_pages** | **bool** | Get total page count for the query accounting for the applied filters. Defaults to true, but can be set to false to reduce latency in edge cases performance. | [optional]
**get_total_pages** | **bool** | Get total page count for the query accounting for the applied filters. Defaults to false, but can be set to true when the latency penalty is acceptable (typically 50-200ms). | [optional]
**group_size** | **int** | Group_size is the number of chunks to fetch for each group. The default is 3. If a group has less than group_size chunks, all chunks will be returned. If this is set to a large number, we recommend setting slim_chunks to true to avoid returning the content and chunk_html of the chunks so as to lower the amount of time required for content download and serialization. | [optional]
**highlight_delimiters** | **List[str]** | Set highlight_delimiters to a list of strings to use as delimiters for highlighting. If not specified, this defaults to [\"?\", \",\", \".\", \"!\"]. | [optional]
**highlight_results** | **bool** | Set highlight_results to true to highlight the results. If not specified, this defaults to true. | [optional]
**highlight_results** | **bool** | Set highlight_results to false for a slight latency improvement (1-10ms). If not specified, this defaults to true. This will add <b><mark> tags to the chunk_html of the chunks to highlight matching sub-sentences. | [optional]
**page** | **int** | Page of group results to fetch. Page is 1-indexed. | [optional]
**page_size** | **int** | Page size is the number of group results to fetch. The default is 10. | [optional]
**query** | **str** | Query is the search query. This can be any string. The query will be used to create an embedding vector and/or SPLADE vector which will be used to find the result set. |
**score_threshold** | **float** | Set score_threshold to a float to filter out chunks with a score below the threshold. | [optional]
**search_type** | **str** | Can be either \"semantic\", \"fulltext\", or \"hybrid\". \"hybrid\" will pull in one page (10 chunks) of both semantic and full-text results then re-rank them using BAAI/bge-reranker-large. \"semantic\" will pull in one page (10 chunks) of the nearest cosine distant vectors. \"fulltext\" will pull in one page (10 chunks) of full-text results based on SPLADE. |
**slim_chunks** | **bool** | Set slim_chunks to true to avoid returning the content and chunk_html of the chunks. This is useful when you want to reduce the amount of data over the wire for latency improvement. Default is false. | [optional]
**slim_chunks** | **bool** | Set slim_chunks to true to avoid returning the content and chunk_html of the chunks. This is useful when you want to reduce the amount of data over the wire for latency improvement (typically 10-50ms). Default is false. | [optional]

## Example

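For the grouped variant, group_size controls how many chunks come back per group, and the description above recommends slim_chunks when that number is large. A sketch under the same assumptions as the previous examples:

```python
from trieve_py_client.models.search_over_groups_data import SearchOverGroupsData  # path assumed

search = SearchOverGroupsData(
    query="incident postmortems",
    search_type="semantic",
    group_size=3,        # chunks returned per group; 3 is the documented default
    slim_chunks=True,    # recommended when group_size is large
)
```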
8 changes: 4 additions & 4 deletions docs/SearchWithinGroupData.md
@@ -5,19 +5,19 @@

Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**date_bias** | **bool** | Set date_bias to true to bias search results towards more recent chunks. This will work best in hybrid search mode. | [optional]
**filters** | [**ChunkFilter**](ChunkFilter.md) | | [optional]
**get_total_pages** | **bool** | Get total page count for the query accounting for the applied filters. Defaults to true, but can be set to false to reduce latency in edge cases performance. | [optional]
**get_total_pages** | **bool** | Get total page count for the query accounting for the applied filters. Defaults to false, but can be set to true when the latency penalty is acceptable (typically 50-200ms). | [optional]
**group_id** | **str** | Group specifies the group to search within. Results will only consist of chunks which are bookmarks within the specified group. | [optional]
**group_tracking_id** | **str** | Group_tracking_id specifies the group to search within by tracking id. Results will only consist of chunks which are bookmarks within the specified group. If both group_id and group_tracking_id are provided, group_id will be used. | [optional]
**highlight_delimiters** | **List[str]** | Set highlight_delimiters to a list of strings to use as delimiters for highlighting. If not specified, this defaults to [\"?\", \",\", \".\", \"!\"]. | [optional]
**highlight_results** | **bool** | Set highlight_results to true to highlight the results. If not specified, this defaults to true. | [optional]
**highlight_results** | **bool** | Set highlight_results to false for a slight latency improvement (1-10ms). If not specified, this defaults to true. This will add <b><mark> tags to the chunk_html of the chunks to highlight matching sub-sentences. | [optional]
**page** | **int** | The page of chunks to fetch. Page is 1-indexed. | [optional]
**page_size** | **int** | The page size is the number of chunks to fetch. This can be used to fetch more than 10 chunks at a time. | [optional]
**query** | **str** | The query is the search query. This can be any string. The query will be used to create an embedding vector and/or SPLADE vector which will be used to find the result set. |
**recency_bias** | **float** | Recency Bias lets you determine how much of an effect the recency of chunks will have on the search results. If not specified, this defaults to 0.0. | [optional]
**score_threshold** | **float** | Set score_threshold to a float to filter out chunks with a score below the threshold. | [optional]
**search_type** | **str** | Search_type can be either \"semantic\", \"fulltext\", or \"hybrid\". \"hybrid\" will pull in one page (10 chunks) of both semantic and full-text results then re-rank them using BAAI/bge-reranker-large. \"semantic\" will pull in one page (10 chunks) of the nearest cosine distant vectors. \"fulltext\" will pull in one page (10 chunks) of full-text results based on SPLADE. |
**slim_chunks** | **bool** | Set slim_chunks to true to avoid returning the content and chunk_html of the chunks. This is useful when you want to reduce the amount of data over the wire for latency improvement. Default is false. | [optional]
**slim_chunks** | **bool** | Set slim_chunks to true to avoid returning the content and chunk_html of the chunks. This is useful when you want to reduce the amount of data over the wire for latency improvement (typically 10-50ms). Default is false. | [optional]
**use_weights** | **bool** | Set use_weights to true to use the weights of the chunks in the result set in order to sort them. If not specified, this defaults to true. | [optional]

## Example
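Searching within a single group takes either group_id or group_tracking_id, with group_id taking precedence if both are supplied. A sketch under the same model assumptions:

```python
from trieve_py_client.models.search_within_group_data import SearchWithinGroupData  # path assumed

search = SearchWithinGroupData(
    query="refund policy",
    search_type="fulltext",
    group_tracking_id="support-kb",  # hypothetical; ignored if group_id is also set
    page_size=10,
)
```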
4 changes: 2 additions & 2 deletions openapi-generator.yaml
@@ -1,7 +1,7 @@
generatorName: python
outputDir: ./generated-code/python-client
packageName: trieve_py_client
packageVersion: 0.8.0
packageVersion: 0.8.2

additionalProperties:
projectName: trieve_py_client
@@ -10,4 +10,4 @@ additionalProperties:
packageDescription: "Python client for Trieve API generated from its OpenAPI specification using openapi-generator."
packageAuthor: "Trieve"
packageAuthorEmail: "developers@trieve.ai"
packageVersion: "0.8.0"
packageVersion: "0.8.2"
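For reference, a regeneration sketch showing how this config file is typically consumed; the spec path is a placeholder and the invocation assumes a standard openapi-generator-cli install on PATH:

```python
# Regenerate the client from the spec using the config above (packageVersion: 0.8.2).
import subprocess

subprocess.run(
    [
        "openapi-generator-cli", "generate",
        "-i", "openapi.json",            # placeholder path/URL to the Trieve OpenAPI spec
        "-c", "openapi-generator.yaml",  # supplies generatorName, outputDir, packageVersion
    ],
    check=True,
)
```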