🐛 Bug Report: Large zip breaking stream endpoint #859

Open
2 tasks done
pabik opened this issue Feb 21, 2024 · 10 comments
Labels: backend, bug (Something isn't working), hacktoberfest, help wanted (Extra attention is needed)

Comments

@pabik
Collaborator

pabik commented Feb 21, 2024

📜 Description

The stream endpoint doesn't return an answer when a file embedded in the zip archive is long.

👟 Reproduction steps

  1. Upload a zip file (attached: docs_tester.zip)
  2. Try chatting

👍 Expected behavior

DocsGPT should provide an answer.

👎 Actual Behavior with Screenshots

No answer, stream endpoint breaks.

💻 Operating system

MacOS

What browsers are you seeing the problem on?

Chrome

🤖 What development environment are you experiencing this bug on?

Docker

🔒 Did you set the correct environment variables in the right path? List the environment variable names (not values please!)

No response

📃 Provide any additional context for the Bug.

No response

📖 Relevant log output

No response

👀 Have you spent some time to check if this bug has been raised before?

  • I checked and didn't find similar issue

🔗 Are you willing to submit PR?

None

🧑‍⚖️ Code of Conduct

  • I agree to follow this project's Code of Conduct
@dartpain added the help wanted (Extra attention is needed) and bug (Something isn't working) labels Feb 22, 2024
@pabik changed the title from "🐛 Bug Report:" to "🐛 Bug Report: Large zip breaking stream endpoint" Mar 1, 2024
@dartpain
Contributor

dartpain commented Jun 7, 2024

Looks like this happens because the file is not being chunked properly (or at all) when answering, resulting in a context token overload.
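A minimal sketch of the failure mode described above, assuming a hypothetical `select_context` step that packs retrieved chunks into a fixed token budget, and using a whitespace word count as a stand-in for the model's real tokenizer. An unchunked file arrives as one oversized document that can never fit, while the same text split into chunks lets part of it through. This is illustrative only, not DocsGPT's actual retrieval code.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Crude token estimate: count whitespace-separated words (assumption)."""
    return len(text.split())


def select_context(chunks: List[str], max_tokens: int) -> List[str]:
    """Greedily keep retrieved chunks, in order, that still fit in the budget.

    A chunk that alone exceeds the budget is skipped instead of being sent,
    which is exactly the case an unchunked file produces.
    """
    selected: List[str] = []
    used = 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > max_tokens:
            continue  # would overflow the context window; skip this chunk
        selected.append(chunk)
        used += cost
    return selected


if __name__ == "__main__":
    big_file = "word " * 5000  # one long file with no chunking
    print(len(select_context([big_file], max_tokens=2000)))  # 0: nothing fits
    chunks = ["word " * 500 for _ in range(10)]  # the same kind of text, chunked
    print(len(select_context(chunks, max_tokens=2000)))  # 4 chunks fit
```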

@nayelimdejesus
Contributor

Hi, I would like to work on this issue.

@nayelimdejesus
Contributor

When you upload a big zip file, what answer should it provide?

@dartpain
Contributor

It just shouldn't break. Basically, make sure that it doesn't error out.
Try running it with the attached file.

@jayantp2003

I am interested in working on this issue.

@jayantp2003

I was experimenting with the zip file and a couple of other files, and I found that this isn't an issue with the chunking code; the problem is in the RstParser class. When I changed the file extensions to a plain-text extension, it worked fine.


I'm currently going through the RstParser class to figure out what changes are required.

@jayantp2003

The issue is in the implementation of the RST parser. For each file, it looks for a header and the text below it, but the zip file we are testing contains just a single file with no header, so that file is never chunked. This header-and-text breakdown also seems to be a problem for the Markdown parser. The file should be chunked by tokens or bytes, and this tuple implementation also needs to be updated.
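To illustrate the header-based split described above, and the size-based fallback it needs, here is a sketch under simplifying assumptions: only underline-style RST headers are recognised, and `split_by_headers`, `split_by_bytes`, and `parse_rst` are hypothetical names, not the project's RstParser API. A headerless file comes back as a single `(None, text)` tuple, which the byte fallback then cuts into bounded pieces.

```python
from typing import List, Optional, Tuple

Section = Tuple[Optional[str], str]

# Characters RST accepts for section underlines (a subset, for illustration).
UNDERLINE_CHARS = set("=-~^\"'`#*+:._")


def is_underline(line: str, title: str) -> bool:
    """True if `line` is a run of one punctuation char at least as long as `title`."""
    s = line.strip()
    return (bool(title.strip()) and len(s) >= len(title.strip())
            and len(set(s)) == 1 and s[0] in UNDERLINE_CHARS)


def split_by_headers(text: str) -> List[Section]:
    """Split text into (header, body) tuples on underline-style headers."""
    lines = text.splitlines()
    sections: List[Section] = []
    header: Optional[str] = None
    body: List[str] = []
    i = 0
    while i < len(lines):
        if i + 1 < len(lines) and is_underline(lines[i + 1], lines[i]):
            if header is not None or "".join(body).strip():
                sections.append((header, "\n".join(body).strip()))
            header, body = lines[i].strip(), []
            i += 2  # skip the title line and its underline
            continue
        body.append(lines[i])
        i += 1
    if header is not None or "".join(body).strip():
        sections.append((header, "\n".join(body).strip()))
    return sections


def split_by_bytes(text: str, max_bytes: int) -> List[str]:
    """Cut text into pieces of at most `max_bytes` UTF-8 bytes.

    Works character by character so multi-byte characters are never split.
    """
    pieces: List[str] = []
    current: List[str] = []
    size = 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if size + n > max_bytes and current:
            pieces.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        pieces.append("".join(current))
    return pieces


def parse_rst(text: str, max_bytes: int = 2000) -> List[Section]:
    """Header-based split, with every section bounded by `max_bytes`."""
    result: List[Section] = []
    for header, body in split_by_headers(text):
        for piece in split_by_bytes(body, max_bytes) or [""]:
            result.append((header, piece))
    return result


if __name__ == "__main__":
    headerless = "plain text " * 1000  # a single file with no headers
    print(len(split_by_headers(headerless)))  # 1: the whole file is one section
    print(len(parse_rst(headerless, max_bytes=2000)))  # 6 bounded pieces
```

With a single headerless file, `split_by_headers` returns one section, and `parse_rst` turns it into several pieces of at most `max_bytes` bytes each, so no single chunk can grow with the file.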

@dartpain
Contributor

Yeah, that seems to be the issue. Maybe we could add another token-size handler to it?
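One possible shape for the token-size handler suggested above: a post-processing pass over the parser's `(header, text)` tuples that splits any oversized section. This is a sketch, assuming a hypothetical `enforce_max_tokens` function and a whitespace word count in place of a real tokenizer.

```python
from typing import List, Optional, Tuple

Section = Tuple[Optional[str], str]


def count_tokens(text: str) -> int:
    """Whitespace word count as a stand-in for a real tokenizer (assumption)."""
    return len(text.split())


def enforce_max_tokens(sections: List[Section], max_tokens: int) -> List[Section]:
    """Split any (header, text) tuple whose text exceeds `max_tokens`.

    Each resulting piece keeps the original header, so retrieved chunks
    still show which section they came from.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    out: List[Section] = []
    for header, text in sections:
        if count_tokens(text) <= max_tokens:
            out.append((header, text))  # already small enough; keep as-is
            continue
        words = text.split()
        for start in range(0, len(words), max_tokens):
            out.append((header, " ".join(words[start:start + max_tokens])))
    return out


if __name__ == "__main__":
    sections = [(None, "token " * 2500), ("Usage", "short section")]
    result = enforce_max_tokens(sections, max_tokens=1000)
    print(len(result))  # 4: the headerless text becomes 3 pieces
```

Because this version splits on whitespace, line breaks inside oversized sections collapse to single spaces; a tokenizer-based variant would also count tokens exactly as the model does.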

@jayantp2003

Hey, I have updated the code and created a PR. Can you review and approve it? I am new to open-source contributions and don't know how the process works after opening a PR. Open to feedback.

@jayantp2003

@dartpain Can you review my changes, provide feedback, and approve if the implementation seems correct?


4 participants