
Failing to deserialize VectorStoreFileObject #229

Closed
hiibolt opened this issue Jun 6, 2024 · 4 comments · Fixed by #230
Labels
bug Something isn't working

Comments

@hiibolt
Contributor

hiibolt commented Jun 6, 2024

Error:

2024-06-06T04:57:33.665329Z ERROR async_openai::error: failed deserialization of: {
  "id": "file-cYrRWFomydf8Ng9gvqs4zWBD",
  "object": "vector_store.file",
  "usage_bytes": 0,
  "created_at": 1717649854,
  "vector_store_id": "vs_gN12Na14YvsPhjPPxS91WX3b",
  "status": "in_progress",
  "last_error": null,
  "chunking_strategy": {
    "type": "static",
    "static": {
      "max_chunk_size_tokens": 800,
      "chunk_overlap_tokens": 400
    }
  }
}

Failing code:

client
   .files()
   .create(CreateFileRequest {
       file: FileInput::from_vec_u8("meoww.txt".into(), memory.clone().into_bytes()),
       purpose: FilePurpose::Assistants,
   })
   .await
   .expect("Failed to upload memory as file!");

Looks like the issue is here:

/// Static Chunking Strategy
#[derive(Clone, Serialize, Debug, Deserialize, PartialEq, Default)]
pub struct StaticChunkingStrategy {
    /// The maximum number of tokens in each chunk. The default value is `800`. The minimum value is `100` and the maximum value is `4096`.
    max_chunk_size_tokens: u16,
    /// The number of tokens that overlap between chunks. The default value is `400`.
    ///
    /// Note that the overlap must not exceed half of `max_chunk_size_tokens`.
    chunk_overlap_tokens: u16,
}

I'm new to Rust, but it seems like the max_chunk_size_tokens and chunk_overlap_tokens fields may need to be pub?

@64bit
Owner

64bit commented Jun 6, 2024

Thank you for reporting the error.

Your observation that both of the fields should be `pub` is a good one; however, it seems that the correct representation for the "shape" of the data is the following:

pub enum VectorStoreFileObjectChunkingStrategy {
    /// This is returned when the chunking strategy is unknown. Typically, this is because the file was indexed before the `chunking_strategy` concept was introduced in the API.
    Other,
    // `static` is a reserved keyword in Rust, so the field needs the raw identifier `r#static`
    // (serde strips the `r#` prefix when serializing and deserializing).
    Static { r#static: StaticChunkingStrategy },
}

You're welcome to verify that the above works, and a PR is most welcome!
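
For completeness, a minimal self-contained sketch of that shape, assuming an internally tagged serde enum on `"type"` and a raw identifier for the `static` field (the actual wiring in the crate may differ); it deserializes the `chunking_strategy` object from the error log:

use serde::{Deserialize, Serialize};

/// Stand-in for the crate's StaticChunkingStrategy, with the fields made pub.
#[derive(Clone, Serialize, Debug, Deserialize, PartialEq, Default)]
pub struct StaticChunkingStrategy {
    pub max_chunk_size_tokens: u16,
    pub chunk_overlap_tokens: u16,
}

/// Internally tagged on "type": `{"type": "static", "static": {...}}` maps to `Static`,
/// and `{"type": "other"}` maps to `Other`.
#[derive(Clone, Serialize, Debug, Deserialize, PartialEq)]
#[serde(tag = "type", rename_all = "lowercase")]
pub enum VectorStoreFileObjectChunkingStrategy {
    Other,
    Static { r#static: StaticChunkingStrategy },
}

fn main() {
    let json = r#"{
        "type": "static",
        "static": { "max_chunk_size_tokens": 800, "chunk_overlap_tokens": 400 }
    }"#;
    let parsed: VectorStoreFileObjectChunkingStrategy =
        serde_json::from_str(json).expect("chunking_strategy should deserialize");
    assert_eq!(
        parsed,
        VectorStoreFileObjectChunkingStrategy::Static {
            r#static: StaticChunkingStrategy {
                max_chunk_size_tokens: 800,
                chunk_overlap_tokens: 400,
            }
        }
    );
}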

@64bit 64bit added the bug Something isn't working label Jun 6, 2024
@hiibolt
Contributor Author

hiibolt commented Jun 6, 2024

It appears that OpenAI also now requires a name when creating a vector store:
Error: JSONDeserialize(Error("invalid type: null, expected a string", line: 4, column: 14))

(Reproducing code)

// Create a vector store
let vector_store_handle = client
    .vector_stores()
    .create(CreateVectorStoreRequest {
        file_ids: None,
        name: None,
        expires_after: None,
        chunking_strategy: None,
        metadata: None,
    })
    .await?;

The fix seems to be changing the type of `name` from `Option<String>` to `String` in the following snippet:

pub struct CreateVectorStoreRequest {
    ...
    #[serde(skip_serializing_if = "Option::is_none")]
    pub name: Option<String>,
    ...
}

I'll write a test covering both and open a pull request shortly, thank you for your fast help!

@64bit
Owner

64bit commented Jun 7, 2024

Thank you for the PR. The change related to the vector store name and the error message isn't making sense to me; can you elaborate?

@64bit
Owner

64bit commented Jun 7, 2024

Upon further investigation, it appears that there's another bug (because of an inconsistency in the spec).

You can actually create a vector store by providing just `file_ids`, in which case `name` will be null:

2024-06-07T21:37:30.842537Z ERROR async_openai::error: failed deserialization of: {
  "id": "vs_kyLc5xI5qptNc3Wd1JHyhmU7",
  "object": "vector_store",
  "name": null,
  "status": "in_progress",
  "usage_bytes": 0,
  "created_at": 1717796250,
  "file_counts": {
    "in_progress": 1,
    "completed": 0,
    "failed": 0,
    "cancelled": 0,
    "total": 1
  },
  "metadata": {},
  "expires_after": null,
  "expires_at": null,
  "last_active_at": 1717796250
}
Error: JSONDeserialize(Error("invalid type: null, expected a string", line: 4, column: 14))

That means the actual bug is in `VectorStoreObject`, where `name` needs to be `Option<String>` instead of `String`.

So you can safely undo the `name` field change for `CreateVectorStoreRequest` in your PR, and feel free to include the changes for the observation above.
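
For reference, a trimmed-down sketch of that response shape (only the fields relevant to the error; the real `VectorStoreObject` has more), showing that `Option<String>` accepts the `"name": null` from the log above:

use serde::{Deserialize, Serialize};

/// Trimmed sketch of VectorStoreObject: `name` can come back as null,
/// so it must be Option<String> on the response side.
#[derive(Clone, Serialize, Debug, Deserialize, PartialEq)]
pub struct VectorStoreObject {
    pub id: String,
    pub object: String,
    pub name: Option<String>,
    pub status: String,
    pub usage_bytes: u64,
    pub created_at: u64,
}

fn main() {
    let json = r#"{
        "id": "vs_kyLc5xI5qptNc3Wd1JHyhmU7",
        "object": "vector_store",
        "name": null,
        "status": "in_progress",
        "usage_bytes": 0,
        "created_at": 1717796250
    }"#;
    let store: VectorStoreObject =
        serde_json::from_str(json).expect("null name should deserialize");
    assert_eq!(store.name, None);
}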

@64bit 64bit closed this as completed in #230 Jun 7, 2024
64bit added a commit that referenced this issue Jun 7, 2024
…Strategy (#230)

* fix: Update vector store file chunking strategy to use StaticChunkingStrategy

This targets the changes mentioned in #229, because OpenAI requires a non-null name when creating a vector store. This also fixes attaching a file to a vector store, where it would fail to deserialize. Also adds a test for both errors to catch them during development in the future!

#229

* bugfix: Made optional name field consistent to OpenAI spec

Reverted according to recent comments on #229, reasoning can be found on thread.

* chore: Modified redundant code in test

Related details can be found in PR #230

* test: Update tests to double check for a failure in add file to vector store

Related PR: #230

Co-authored-by: Himanshu Neema <himanshun.iitkgp@gmail.com>

---------

Co-authored-by: Himanshu Neema <himanshun.iitkgp@gmail.com>