[FEATURE] extend multi-modality support with DataURL for documents like pdf #4481
Comments
🤖 Hello @davidberenstein1957! I'm Dosu, a friendly bot here to assist you with any bugs, questions, or contributions while the human maintainers are away. Let's work together to make Argilla even better! 😊

Thank you for your feature request, davidberenstein1957. As of the current version of Argilla (v1.21.0), there is no built-in support for embedding document types like PDF, pptx, docx, xlsx, etc., within the TextField. The TextFieldSettings class only supports the 'type' and 'use_markdown' properties, and no property or method suggests support for embedding documents.

However, your suggestion to extend the […]

As a maintainer, I can't create pull requests or issues, but I encourage you or other contributors to consider implementing this feature. You might want to start by extending the […]

Please refer to the Argilla repository, and specifically the fields.py file, for more information on how the current data types are handled. If you decide to work on this feature, please follow the contribution guidelines outlined in the CONTRIBUTING.md file.

I hope this information is helpful. If you have any further questions or need more clarification, feel free to ask.

Best regards,

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
Nice! I'll take a look, this idea is really interesting and useful!
@sdiazlor add an example for the https://huggingface.co/HuggingFaceM4 dataset.
🤖 Thank you for your feature request. Extending the […]

Currently, Argilla's […]

To extend this functionality to support documents, we would need to modify these methods to handle the new media types. This would involve creating new utility functions similar to […]

For example, a […]

    def pdf_to_html(pdf_data):
        # pdf_data is a URL (or DataURL) pointing to the PDF document.
        # Attribute values are quoted so that characters such as "=" in the source stay valid HTML.
        return f'<embed src="{pdf_data}" type="application/pdf" width="700px" height="700px">'

This function would take the PDF data (in this case, a URL to the PDF document) and return a string of HTML that embeds the PDF in the page. This HTML could then be rendered in the TextField. The […]

Please note that this is a high-level overview of what the implementation might look like, and the actual implementation could be more complex. This feature would also need to be thoroughly tested to ensure that it works correctly and doesn't introduce any new issues.

I hope this gives you a good idea of what this feature might entail. If you have any further questions or suggestions, please don't hesitate to let me know.
# Description

Closes #4481

**Type of change**

- [ ] New feature (non-breaking change which adds functionality)
- [x] Refactor (change restructuring the codebase without changing functionality)
- [x] Improvement (change adding some improvement to an existing functionality)

**How Has This Been Tested**

- [ ] Test A
- [ ] Test B

**Checklist**

- [ ] I added relevant documentation
- [ ] I followed the style guidelines of this project
- [ ] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK) (see text above)
- [ ] I have added relevant notes to the `CHANGELOG.md` file (See https://keepachangelog.com/)

---------

Co-authored-by: leire <leire@recogn.ai>
Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>
Is your feature request related to a problem? Please describe.
We recently got a request from a user about support for PDF documents within the TextField as DataURL.
Leire found that the following worked:
'<embed src=https://arxiv.org/pdf/2307.00808.pdf type=application/pdf width=700px height=700px/></embed>'
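As a rough sketch, the same tag could be built programmatically. The helper name `embed_pdf` and its parameters are hypothetical and not part of Argilla. The sketch quotes and escapes the attribute values, which starts to matter once `src` carries a DataURL, because base64 padding (`=`) is not valid inside an unquoted HTML attribute value.

```python
from html import escape


def embed_pdf(src: str, width: str = "700px", height: str = "700px") -> str:
    """Return an <embed> tag for a PDF. Hypothetical helper, not an Argilla API."""
    # Escape each value so quotes or ampersands in the source cannot break the tag.
    return (
        f'<embed src="{escape(src, quote=True)}" type="application/pdf" '
        f'width="{escape(width, quote=True)}" height="{escape(height, quote=True)}">'
    )


html = embed_pdf("https://arxiv.org/pdf/2307.00808.pdf")
print(html)
```

The resulting string could then be stored as the value of a TextField that has markdown rendering enabled, in the same way as the hand-written tag above.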
Describe the solution you'd like
I would like to see extended support for documents like PDF, and maybe pptx, docx, xlsx and others to load them into Argilla.
from argilla.client.feedback.utils import audio_to_html, image_to_html, video_to_html
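A `pdf_to_html` companion to these helpers might look like the sketch below. The name, signature, and behaviour are assumptions for illustration only; this is not an existing Argilla function and it does not claim to mirror how the audio, image, and video helpers are implemented. It base64-encodes raw PDF bytes into a DataURL and wraps the result in the same `<embed>` tag.

```python
import base64
from pathlib import Path
from typing import Union


def pdf_to_html(pdf: Union[str, Path, bytes], width: str = "700px", height: str = "700px") -> str:
    """Hypothetical helper: embed a PDF (file path or raw bytes) as a DataURL."""
    data = pdf if isinstance(pdf, bytes) else Path(pdf).read_bytes()
    # PDF files begin with the "%PDF-" header; reject anything else early.
    if not data.startswith(b"%PDF-"):
        raise ValueError("input does not look like a PDF file")
    encoded = base64.b64encode(data).decode("ascii")
    data_url = f"data:application/pdf;base64,{encoded}"
    # Attribute values are quoted because base64 output may end in "=" padding.
    return f'<embed src="{data_url}" type="application/pdf" width="{width}" height="{height}">'


snippet = pdf_to_html(b"%PDF-1.4 minimal placeholder bytes")
```

Base64 inflates the payload by roughly a third, so a DataURL is probably best suited to small documents; for large files, a plain URL in `src` keeps the record lighter.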
Describe alternatives you've considered
N.A.
Additional context