Update document_writer.py #7593

M-JULIANI

Related Issues

Proposed Changes:

Guarding against writing null bytes in /components/document_writer.py

How did you test it?

Manually. With this change, the pipeline can write the text parsed from a PDF to my Postgres database.

Notes for the reviewer

Checklist

@M-JULIANI M-JULIANI requested a review from a team as a code owner April 24, 2024 13:21
@M-JULIANI M-JULIANI requested review from masci and removed request for a team April 24, 2024 13:21
@github-actions github-actions bot added the 2.x Related to Haystack v2.0 label Apr 24, 2024
@masci

masci commented May 9, 2024

@M-JULIANI thanks for the PR, but I don't think this is the right place for the fix. It would always remove the null character from any data passing through a document writer, and I can see how that might be a problem if you need that character in other types of documents. I would probably do this in a pre-processor, like DocumentCleaner.
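
For illustration, the pre-processing approach suggested above could look roughly like the sketch below. It is plain Python and does not use Haystack's own classes: `Doc` and `strip_nul_bytes` are hypothetical names invented for this example, and the real fix would need to be adapted to whatever document type and cleaning component the pipeline uses.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a pipeline document (not Haystack's Document)."""
    content: Optional[str]


def strip_nul_bytes(docs: List[Doc]) -> List[Doc]:
    """Return copies of the documents with NUL (0x00) characters removed.

    Meant to run as an opt-in cleaning step before the writer, so that
    pipelines which need to keep NUL characters can simply skip it.
    """
    cleaned = []
    for doc in docs:
        if doc.content is None:
            # Nothing to clean; pass the document through unchanged.
            cleaned.append(doc)
        else:
            cleaned.append(replace(doc, content=doc.content.replace("\x00", "")))
    return cleaned
```

Because the step is separate from the writer, only pipelines that target a store rejecting NUL characters (such as PostgreSQL text fields) would need to include it.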

@masci masci closed this May 9, 2024
Development

Successfully merging this pull request may close these issues.

psycopg.DataError: PostgreSQL text fields cannot contain NUL (0x00) bytes