[Bug]: An error occurred while processing the file: 'File' object has no attribute 'file' #2784

Open
dangnhdev opened this issue Jun 29, 2024 · 5 comments
Labels
area: backend Related to backend functionality or under the /backend directory bug Something isn't working Stale

Comments

@dangnhdev

What happened?

Newly installed quivr instance on a fresh VM. Logged in and uploaded the attached document to a brain.
woocommerce-api-v3.md

Notification:

Processing File woocommerceapiv3.md
An error occurred while processing the file: 'File' object has no attribute 'file'

Relevant log output

worker        | [2024-06-29 07:34:21,772: INFO/MainProcess] Task process_file_and_notify[e7a3c704-c721-4c1f-9509-a36c2c71f9ad] received
backend-core  | INFO:     192.168.1.15:65181 - "POST /upload?brain_id=07121c8f-4d1c-42b6-b0c1-c78d0a6a0eec&chat_id=e742ea72-83ef-424a-b1d4-6ee49ffdedcf HTTP/1.1" 200 OK
worker        | [2024-06-29 07:34:21,795: INFO/ForkPoolWorker-22] HTTP Request: GET http://host.docker.internal:54321/storage/v1/object/quivr/07121c8f-4d1c-42b6-b0c1-c78d0a6a0eec/woocommerceapiv3.md "HTTP/1.1 200 OK"
backend-core  | INFO:     192.168.1.15:65181 - "DELETE /chat/e742ea72-83ef-424a-b1d4-6ee49ffdedcf HTTP/1.1" 200 OK
worker        | [2024-06-29 07:34:21,812: INFO/ForkPoolWorker-22] HTTP Request: GET http://host.docker.internal:54321/rest/v1/vectors?select=id&file_sha1=eq.None "HTTP/1.1 200 OK"
worker        | [2024-06-29 07:34:21,822: INFO/ForkPoolWorker-22] HTTP Request: GET http://host.docker.internal:54321/rest/v1/vectors?select=id&file_sha1=eq.None "HTTP/1.1 200 OK"
worker        | [2024-06-29 07:34:21,827: INFO/ForkPoolWorker-22] HTTP Request: GET http://host.docker.internal:54321/rest/v1/brains_vectors?select=brain_id%2C%20vector_id&brain_id=eq.07121c8f-4d1c-42b6-b0c1-c78d0a6a0eec&file_sha1=eq.None "HTTP/1.1 200 OK"
worker        | [2024-06-29 07:34:21,832: WARNING/ForkPoolWorker-22] Error processing file: 'File' object has no attribute 'file'
worker        | [ERROR] quivr_api.celery_worker [celery_worker.py:91]: 'File' object has no attribute 'file'
worker        | Traceback (most recent call last):
worker        |   File "/code/api/quivr_api/celery_worker.py", line 69, in process_file_and_notify
worker        |     filter_file(
worker        |   File "/code/api/quivr_api/packages/files/processors.py", line 103, in filter_file
worker        |     raise e
worker        |   File "/code/api/quivr_api/packages/files/processors.py", line 86, in filter_file
worker        |     result = file_processors[file.file_extension](
worker        |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
worker        |   File "/code/api/quivr_api/packages/files/parsers/markdown.py", line 10, in process_markdown
worker        |     return process_file(
worker        |            ^^^^^^^^^^^^^
worker        |   File "/code/api/quivr_api/packages/files/parsers/common.py", line 36, in process_file
worker        |     doc = file.file
worker        |           ^^^^^^^^^
worker        |   File "/usr/local/lib/python3.11/site-packages/pydantic/main.py", line 811, in __getattr__
worker        |     raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')
worker        | AttributeError: 'File' object has no attribute 'file'
worker        | [2024-06-29 07:34:21,841: INFO/ForkPoolWorker-22] HTTP Request: PATCH http://host.docker.internal:54321/rest/v1/notifications?id=eq.1d11cf53-d076-4ad7-ac3e-d410e1525954 "HTTP/1.1 200 OK"
worker        | [2024-06-29 07:34:21,843: INFO/ForkPoolWorker-22] Task process_file_and_notify[e7a3c704-c721-4c1f-9509-a36c2c71f9ad] succeeded in 0.06813362699995196s: None
backend-core  | INFO:     192.168.1.15:65181 - "GET /knowledge?brain_id=07121c8f-4d1c-42b6-b0c1-c78d0a6a0eec HTTP/1.1" 200 OK

Twitter / LinkedIn details

No response

@dangnhdev dangnhdev added the bug Something isn't working label Jun 29, 2024
@dosubot dosubot bot added the area: backend Related to backend functionality or under the /backend directory label Jun 29, 2024

dosubot bot commented Jun 29, 2024

To resolve the error "'File' object has no attribute 'file'" when processing an uploaded file in quivr, you need to ensure that you are not mistakenly trying to access a non-existent attribute file on the File object. The File object should have the correct attributes and methods defined.

Here is the corrected part of the code that handles the File object:

  1. Ensure the File class is correctly defined:

    class File:
        def __init__(self, file_name, tmp_file_path, bytes_content, file_size, file_extension):
            self.file_name = file_name
            self.tmp_file_path = tmp_file_path
            self.bytes_content = bytes_content
            self.file_size = file_size
            self.file_extension = file_extension
    
        def file_already_exists(self):
            # Implementation
            pass
    
        def file_already_exists_in_brain(self, brain_id):
            # Implementation
            pass
    
        def file_is_empty(self):
            return self.file_size == 0
    
        def link_file_to_brain(self, brain_id):
            # Implementation
            pass

  2. Update the process_file_and_notify function:

    @celery.task(name="process_file_and_notify")
    def process_file_and_notify(
        file_name: str,
        file_original_name: str,
        brain_id,
        notification_id=None,
        integration=None,
        delete_file=False,
    ):
        try:
            supabase_client = get_supabase_client()
            tmp_name = file_name.replace("/", "_")
            base_file_name = os.path.basename(file_name)
            _, file_extension = os.path.splitext(base_file_name)
    
            with NamedTemporaryFile(
                suffix="_" + tmp_name,
            ) as tmp_file:
                res = supabase_client.storage.from_("quivr").download(file_name)
                tmp_file.write(res)
                tmp_file.flush()
                file_instance = File(
                    file_name=base_file_name,
                    tmp_file_path=tmp_file.name,
                    bytes_content=res,
                    file_size=len(res),
                    file_extension=file_extension,
                )
                brain_vector_service = BrainVectorService(brain_id)
                if delete_file:
                    brain_vector_service.delete_file_from_brain(
                        file_original_name, only_vectors=True
                    )
    
                filter_file(
                    file=file_instance,
                    brain_id=brain_id,
                    original_file_name=file_original_name,
                )
    
                if notification_id:
                    notification_service.update_notification_by_id(
                        notification_id,
                        NotificationUpdatableProperties(
                            status=NotificationsStatusEnum.SUCCESS,
                            description="Your file has been properly uploaded!",
                        ),
                    )
                brain_service.update_brain_last_update_time(brain_id)
    
                return True
    
        except TimeoutError:
            logger.error("TimeoutError")
    
        except Exception as e:
            logger.exception(e)
            notification_service.update_notification_by_id(
                notification_id,
                NotificationUpdatableProperties(
                    status=NotificationsStatusEnum.ERROR,
                    description=f"An error occurred while processing the file: {e}",
                ),
            )

  3. Ensure the filter_file function is correctly implemented:

    def filter_file(
        file,
        brain_id,
        original_file_name=None,
    ):
        file_exists = file.file_already_exists()
        file_exists_in_brain = file.file_already_exists_in_brain(brain_id)
        using_file_name = file.file_name
    
        brain = brain_service.get_brain_by_id(brain_id)
        if brain is None:
            raise Exception("It seems like you're uploading knowledge to an unknown brain.")
    
        if file_exists_in_brain:
            return create_response(
                f"🤔 {using_file_name} already exists in brain {brain.name}.",
                "warning",
            )
        elif file.file_is_empty():
            return create_response(
                f"❌ {original_file_name} is empty.",
                "error",
            )
        elif file_exists:
            file.link_file_to_brain(brain_id)
            return create_response(
                f"✅ {using_file_name} has been uploaded to brain {brain.name}.",
                "success",
            )
    
        if file.file_extension in file_processors:
            try:
                result = file_processors[file.file_extension](
                    file=file,
                    brain_id=brain_id,
                    original_file_name=original_file_name,
                )
                if result is None or result == 0:
                    return create_response(
                        f"? {using_file_name} has been uploaded to brain. There might have been an error while reading it, please make sure the file is not illformed or just an image",
                        "warning",
                    )
                return create_response(
                    f"✅ {using_file_name} has been uploaded to brain {brain.name} in {result} chunks",
                    "success",
                )
            except Exception as e:
                print(f"Error processing file: {e}")
                raise e
    
        return create_response(
            f"❌ {using_file_name} is not supported.",
            "error",
        )

By ensuring that the File class and its methods are correctly defined and used, you should be able to resolve the error [1][2][3][4].
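One detail in the log output is worth a hedged illustration: the worker prints the traceback and then reports `Task process_file_and_notify[...] succeeded in ...: None`. That is consistent with the `except Exception` branch shown above, which logs the error and updates the notification but does not re-raise, so the function returns `None` normally. The sketch below uses hypothetical stand-in functions (`swallowing_task`, `reraising_task`, `failing_work`) rather than the real Celery task, and only shows the difference between swallowing and re-raising.

```python
import logging

logger = logging.getLogger("worker_sketch")  # hypothetical logger name


def failing_work():
    """Stand-in for the file processor that fails in the traceback."""
    raise AttributeError("'File' object has no attribute 'file'")


def swallowing_task(work):
    """Same try/except shape as above: log the error, do not re-raise."""
    try:
        work()
        return True
    except Exception as e:
        logger.exception(e)
        # No `raise` here, so the caller sees an ordinary return of None.


def reraising_task(work):
    """Variant that logs and then propagates the error to the caller."""
    try:
        work()
        return True
    except Exception:
        logger.exception("processing failed")
        raise
```

With the first shape a task runner only sees a normal return value, which is why a failed upload can still appear as a succeeded task; the second shape lets the failure surface in the task state as well as in the notification.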

To continue talking to Dosu, mention @dosu.

@dangnhdev
Author

Update: every file type I upload to Quivr produces the same error. A PDF, for example, gives the log below.
Running on the latest Ubuntu 22.04 with docker-compose.yml. I can't figure out what I'm missing here...

worker        | [2024-06-29 09:37:48,388: INFO/ForkPoolWorker-22] HTTP Request: GET http://host.docker.internal:54321/rest/v1/brains_vectors?select=brain_id%2C%20vector_id&brain_id=eq.07121c8f-4d1c-42b6-b0c1-c78d0a6a0eec&file_sha1=eq.None "HTTP/1.1 200 OK"
worker        | [2024-06-29 09:37:48,393: WARNING/ForkPoolWorker-22] Error processing file: 'File' object has no attribute 'file'
worker        | [ERROR] quivr_api.celery_worker [celery_worker.py:91]: 'File' object has no attribute 'file'
worker        | Traceback (most recent call last):
worker        |   File "/code/api/quivr_api/celery_worker.py", line 69, in process_file_and_notify
worker        |     filter_file(
worker        |   File "/code/api/quivr_api/packages/files/processors.py", line 103, in filter_file
worker        |     raise e
worker        |   File "/code/api/quivr_api/packages/files/processors.py", line 86, in filter_file
worker        |     result = file_processors[file.file_extension](
worker        |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
worker        |   File "/code/api/quivr_api/packages/files/parsers/pdf.py", line 14, in process_pdf
worker        |     return process_file(
worker        |            ^^^^^^^^^^^^^
worker        |   File "/code/api/quivr_api/packages/files/parsers/common.py", line 36, in process_file
worker        |     doc = file.file
worker        |           ^^^^^^^^^
worker        |   File "/usr/local/lib/python3.11/site-packages/pydantic/main.py", line 811, in __getattr__
worker        |     raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')
worker        | AttributeError: 'File' object has no attribute 'file'
worker        | [2024-06-29 09:37:48,402: INFO/ForkPoolWorker-22] HTTP Request: PATCH http://host.docker.internal:54321/rest/v1/notifications?id=eq.85f784e8-43b4-4d3b-8fd1-1fa8e4eaa150 "HTTP/1.1 200 OK"
worker        | [2024-06-29 09:37:48,405: INFO/ForkPoolWorker-22] Task process_file_and_notify[86e3f8bf-e69f-44b5-9f83-75795cef444f] succeeded in 0.09197982099999535s: None
backend-core  | INFO:     192.168.1.15:63821 - "GET /knowledge?brain_id=07121c8f-4d1c-42b6-b0c1-c78d0a6a0eec HTTP/1.1" 200 OK

@dangnhdev
Author

I got it: temporarily removing LLAMA_CLOUD_API_KEY fixed the problem. The failing line sits behind this check:

    if os.getenv("LLAMA_CLOUD_API_KEY"):
        doc = file.file
        document_ext = os.path.splitext(doc.filename)[1]
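As a minimal sketch of why unsetting the variable helps, and of one possible guard: the attribute is only read when `LLAMA_CLOUD_API_KEY` is set, so without the key the branch is skipped entirely. The `File` class and the `document_extension` helper below are hypothetical stand-ins, not the project's code; the real `File` is a pydantic model, and this sketch assumes only that it has `file_name` and lacks `file`, as the traceback suggests.

```python
import os


class File:
    """Hypothetical stand-in for the upload model; it has no `file` attribute."""

    def __init__(self, file_name: str, tmp_file_path: str):
        self.file_name = file_name
        self.tmp_file_path = tmp_file_path


def document_extension(file: File) -> str:
    """Return the extension, reading `file.file.filename` only if it exists."""
    if os.getenv("LLAMA_CLOUD_API_KEY"):
        # getattr with a default yields None instead of raising AttributeError.
        doc = getattr(file, "file", None)
        if doc is not None:
            return os.path.splitext(doc.filename)[1]
    # Fallback (an assumption of this sketch): use the stored file name.
    return os.path.splitext(file.file_name)[1]
```

Whether falling back to `file_name` is the right behavior for the parser is a judgment for the maintainers; the point is only that the unguarded `file.file` access is what fails when the key is present.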

Contributor

Thanks for your contributions, we'll be closing this issue as it has gone stale. Feel free to reopen if you'd like to continue the discussion.

@github-actions github-actions bot added the Stale label Sep 27, 2024