Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential memory leaks #54

Open
polterguy opened this issue Jun 8, 2023 · 2 comments
Open

Potential memory leaks #54

polterguy opened this issue Jun 8, 2023 · 2 comments

Comments

@polterguy
Copy link
Contributor

First I want to thank you for an incredible library. It seems to be consistently returning results in the milliseconds timeframe, where my previous logic had to apply a table scan, manually apply the dot product in C#, resulting in 3 to 5 orders of magnitudes faster results. Najs work! :)

Then my question which is about memory leaks. I run this in a Kubernetes environment, and I've noticed it keeps on eating memory, especially when you re-index, and/or query. I know C++ quite well, even though I haven't touched it for more than a decade, and I can see the following code. My comments are prefixed with QUESTION for brevity;

std::vector<float> * vectorFromBlobValue(sqlite3_value*value, const char ** pzErrMsg) {
  int n = sqlite3_value_bytes(value);
  const void * b;
  char header;
  char type;

  if(n < (2)) {
    *pzErrMsg = "Vector blob size less than header length";
    return NULL;
  }
  b = sqlite3_value_blob(value);
  memcpy(&header, ((char *) b + 0), sizeof(char));
  memcpy(&type,   ((char *) b + 1), sizeof(char));

  if(header != VECTOR_BLOB_HEADER_BYTE) {
    *pzErrMsg = "Blob not well-formatted vector blob";
    return NULL;
  }
  if(type != VECTOR_BLOB_HEADER_TYPE) {
    *pzErrMsg = "Blob type not right";
    return NULL;
  }
  int numElements = (n - 2)/sizeof(float);
  float * v = (float *) ((char *)b + 2);


  // QUESTION; Creates new std vector as heap memory and returns to method below
  return new std::vector<float>(v, v+numElements);
}

static std::vector<float>* valueAsVector(sqlite3_value*value) {
  // Option 1: If the value is a "vectorf32v0" pointer, create vector from that
  VectorFloat* v = (VectorFloat*) sqlite3_value_pointer(value, VECTOR_FLOAT_POINTER_NAME);


  // QUESTION; Creates new std vector as heap memory and returns to method below
  if (v!=NULL) return new std::vector<float>(v->data, v->data + v->size);
  std::vector<float> * vec;

  // Option 2: value is a blob in vector format
  if(sqlite3_value_type(value) == SQLITE_BLOB) {
    const char * pzErrMsg = 0;
    if((vec = vectorFromBlobValue(value, &pzErrMsg)) != NULL) {
      return vec;
    }
    if((vec = vectorFromRawBlobValue(value, &pzErrMsg)) != NULL) {
      return vec;
    }
  }
  // Option 3: if value is a JSON array coercible to float vector, use that
  //if(sqlite3_value_subtype(value) == JSON_SUBTYPE) {
  if(sqlite3_value_type(value) == SQLITE_TEXT) {
    if((vec = vectorFromTextValue(value)) != NULL) {
      return vec;
    }else {
      return NULL;
    }
  }

  // else, value isn't a vector
  return NULL;
}

static void vector_value_at(sqlite3_context *context, int argc, sqlite3_value **argv) {

  // QUESTION: Invokes above method returning heaped memory
  std::vector<float>*v = valueAsVector(argv[0]);
  if(v == NULL) return;
  int at = sqlite3_value_int(argv[1]);
  try {
    float result = v->at(at);
    sqlite3_result_double(context, result);
  }
   catch (const std::out_of_range& oor) {
    char * errmsg = sqlite3_mprintf("%d out of range: %s", at, oor.what());
    if(errmsg != NULL){
      sqlite3_result_error(context, errmsg, -1);
      sqlite3_free(errmsg);
    }
    else sqlite3_result_error_nomem(context);
  }

  // QUESTION: v pointer is never deleted?
}

This is only one example of where I suspect the library might be losing memory. There are other examples, but I am not acquainted enough with SQLite plugins to be sure, and there might be some hidden internals in STLite freeing up the memory that I can't see. However, I see your library is using massive amounts of memory over time, and it seems to be increasing.

Suggestion

Use auto_ptr to move heaped memory around to avoid lost memory?

@polterguy
Copy link
Contributor Author

Created PR

@romiras
Copy link

romiras commented Apr 25, 2024

Where is mentioned PR? Could you please share?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants