Cursor not found error while adding vector fields to the documents #3

Open
ricurious opened this issue May 5, 2024 · 0 comments

In lab_3_mongodb_vector_search, when a collection contains a large number of documents, a "cursor not found" error is raised while iterating through the cursor returned by collection.find(). The error goes away if the cursor is first converted to a list and the loop then iterates over that list.
Before:

def add_collection_content_vector_field(collection_name: str):
    collection = db[collection_name]
    bulk_operations = []
    for doc in collection.find():
        # remove any previous contentVector embeddings
        if "contentVector" in doc:
            print('content vector exists')
            del doc["contentVector"]

        # generate embeddings for the document string representation
        content = json.dumps(doc, default=str)
        content_vector = generate_embeddings(content)

        bulk_operations.append(pymongo.UpdateOne(
            {"_id": doc["_id"]},
            {"$set": {"contentVector": content_vector}},
            upsert=True
        ))
    # execute bulk operations
    collection.bulk_write(bulk_operations)


After converting the cursor to a list:

def add_collection_content_vector_field(collection_name: str):
    collection = db[collection_name]
    bulk_operations = []
    documents = list(collection.find())
    for doc in documents:
        # remove any previous contentVector embeddings
        if "contentVector" in doc:
            print('content vector exists')
            del doc["contentVector"]

        # generate embeddings for the document string representation
        content = json.dumps(doc, default=str)
        content_vector = generate_embeddings(content)

        bulk_operations.append(pymongo.UpdateOne(
            {"_id": doc["_id"]},
            {"$set": {"contentVector": content_vector}},
            upsert=True
        ))
    # execute bulk operations
    collection.bulk_write(bulk_operations)
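
A possible refinement, offered only as a rough sketch: once the documents have been read into a list, the updates could be built and written in fixed-size batches instead of one large bulk_write call. My guess (not confirmed here) is that the cursor expires on the server while the loop waits on the embedding calls, which would explain why reading everything up front helps. The helper names in_batches and build_vector_updates and the batch size are hypothetical and not part of the lab; the embedding function is passed in as a parameter so the snippet runs without a database.

```python
import json
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Doc = Dict[str, Any]
# (filter, update) pair; hypothetical shape chosen so it can be passed
# straight to pymongo.UpdateOne(filter, update) by the caller.
UpdateSpec = Tuple[Doc, Doc]


def in_batches(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def build_vector_updates(
    docs: Iterable[Doc],
    embed: Callable[[str], List[float]],
) -> List[UpdateSpec]:
    """Build one (filter, update) pair per document."""
    updates: List[UpdateSpec] = []
    for doc in docs:
        # Work on a shallow copy so the caller's documents are not mutated.
        doc = dict(doc)
        # Remove any previous embedding so it is not embedded again.
        doc.pop("contentVector", None)
        # Embed the string representation of the remaining fields.
        content = json.dumps(doc, default=str)
        updates.append(
            ({"_id": doc["_id"]}, {"$set": {"contentVector": embed(content)}})
        )
    return updates


if __name__ == "__main__":
    # Stand-in data and embedding function, for illustration only.
    sample_docs = [{"_id": i, "name": f"item {i}"} for i in range(5)]
    fake_embed = lambda text: [float(len(text))]
    for batch in in_batches(sample_docs, 2):
        print(build_vector_updates(batch, fake_embed))
```

In the lab function, this would presumably be used as: documents = list(collection.find()), then for each batch from in_batches(documents, 100), wrap every pair with pymongo.UpdateOne(filter, update) and pass the resulting list to collection.bulk_write. Batching also avoids calling bulk_write with an empty list when the collection has no documents.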


Can I raise a PR to make this change?
