[FEATURE] extend multi-modality support with DataURL for documents like pdf #4481

Closed
davidberenstein1957 opened this issue Jan 11, 2024 · 4 comments · Fixed by #4495
Assignees
Labels
area: ui Indicates that an issue or pull request is related to the User Interface (UI) language: python Pull requests or issues that update Python code team: frontend Indicates that the issue or pull request is owned by the frontend team team: interns Indicates that the issue or pull request is owned by the machine learning interns team type: enhancement Indicates new feature requests type: popular request Indicates that several people outside of the Argilla organization are interested in this feature
Milestone

Comments

@davidberenstein1957
Member

davidberenstein1957 commented Jan 11, 2024

Is your feature request related to a problem? Please describe.
We recently got a request from a user about support for PDF documents within the TextField as DataURL.
Leire found that the following worked:
'<embed src=https://arxiv.org/pdf/2307.00808.pdf type=application/pdf width=700px height=700px/></embed>'

Describe the solution you'd like
I would like to see extended support for documents like PDF, and maybe pptx, docx, xlsx and others to load them into Argilla.
This could follow the pattern of the existing media helpers:
from argilla.client.feedback.utils import audio_to_html, image_to_html, video_to_html
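
To make the DataURL idea concrete, here is a minimal sketch of how a PDF could be base64-encoded into a data URL and wrapped in the `<embed>` tag shown above. The names `pdf_to_data_url` and `pdf_embed_html` are hypothetical and not part of Argilla; whether a browser renders a `data:` PDF inside the UI would still need to be checked.

```python
import base64
import html


def pdf_to_data_url(pdf_bytes: bytes) -> str:
    """Encode raw PDF bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return f"data:application/pdf;base64,{encoded}"


def pdf_embed_html(src: str, width: int = 700, height: int = 700) -> str:
    """Wrap a URL or data URL in an <embed> tag with quoted, escaped attributes."""
    return (
        f'<embed src="{html.escape(src, quote=True)}" '
        f'type="application/pdf" width="{width}" height="{height}">'
    )


# Usage: works with a remote URL or with bytes read from a local file.
remote = pdf_embed_html("https://arxiv.org/pdf/2307.00808.pdf")
inline = pdf_embed_html(pdf_to_data_url(b"%PDF-1.4"))
```

The resulting string could then be stored in a text field that is rendered as markdown/HTML, the same way the image, audio and video helpers are used.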

Describe alternatives you've considered
N.A.

Additional context
image

dosubot bot commented Jan 11, 2024

🤖

Hello @davidberenstein1957!

I'm Dosu, a friendly bot here to assist you with any bugs, questions, or contributions while the human maintainers are away. Let's work together to make Argilla even better! 😊

Thank you for your feature request, davidberenstein1957. As of the current version of Argilla (v1.21.0), there is no built-in support for embedding document types like PDF, pptx, docx, xlsx, etc., within the TextField. The TextFieldSettings class only supports 'type' and 'use_markdown' properties, and there is no property or method that suggests the support for embedding documents.

However, your suggestion to extend the multi-modality support with DataURL for documents is a valuable one. This would indeed enhance the functionality of Argilla by allowing users to load various document types into the platform.

As a maintainer, I can't create pull requests or issues, but I encourage you or other contributors to consider implementing this feature. You might want to start by extending the TextFieldSettings class or creating a new class for handling document types.

Please refer to the Argilla repository and specifically the fields.py file for more information on how the current data types are handled.

If you decide to work on this feature, please ensure to follow the contribution guidelines outlined in the CONTRIBUTING.md file.

I hope this information is helpful. If you have any further questions or need more clarification, feel free to ask.

Best regards,
Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@dosubot dosubot bot added area: ui Indicates that an issue or pull request is related to the User Interface (UI) language: python Pull requests or issues that update Python code team: frontend Indicates that the issue or pull request is owned by the frontend team type: enhancement Indicates new feature requests labels Jan 11, 2024
@davidberenstein1957 davidberenstein1957 assigned leiyre and sdiazlor and unassigned leiyre Jan 11, 2024
@sdiazlor
Contributor

Nice! I'll take a look, this idea is really interesting and useful!

@davidberenstein1957 davidberenstein1957 added the team: interns Indicates that the issue or pull request is owned by the machine learning interns team label Jan 15, 2024
@sdiazlor sdiazlor linked a pull request Jan 16, 2024 that will close this issue
13 tasks
@davidberenstein1957 davidberenstein1957 added this to the v1.23.0 milestone Jan 17, 2024
@davidberenstein1957
Member Author

@sdiazlor add an example for the https://huggingface.co/HuggingFaceM4 dataset.

@nataliaElv nataliaElv added the type: popular request Indicates that several people outside of the Argilla organization are interested in this feature label Jan 22, 2024
dosubot bot commented Jan 23, 2024

🤖

Thank you for your feature request. Extending the multi-modality support to include documents like PDFs, pptx, docx, xlsx, and others is a great idea and would certainly enhance the functionality of Argilla.

Currently, Argilla's multi-modality support is implemented through the for_multi_modal_classification and for_multi_modal_transcription methods in the TaskTemplateMixin class. These methods handle video, audio, and image data. The rendering of different media types in the TextField is handled by the use_markdown property. If use_markdown is set to True, the text field will be rendered using markdown. If it's set to False, the text field will be rendered as plain text.

To extend this functionality to support documents, we would need to modify these methods to handle the new media types. This would involve creating new utility functions similar to audio_to_html, image_to_html, and video_to_html for each new media type. These functions would convert the media data into a format that can be rendered in the TextField.

For example, a pdf_to_html function might look something like this:

def pdf_to_html(pdf_data: str) -> str:
    # Quote the attribute values; <embed> is a void element, so it takes no closing tag.
    return f'<embed src="{pdf_data}" type="application/pdf" width="700" height="700">'

This function would take the PDF data (in this case, a URL to the PDF document) and return a string of HTML that embeds the PDF in the page. This HTML could then be rendered in the TextField.

The for_multi_modal_classification and for_multi_modal_transcription methods would also need to be updated to handle the new media types. This could involve adding new parameters to these methods to specify the media type, and then using the appropriate utility function to convert the media data into a format that can be rendered in the TextField.
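
The "media type parameter plus matching utility function" idea could be sketched as a small dispatch table. This is only an illustration: `CONVERTERS`, `media_to_html` and the two private helpers are hypothetical names and do not reflect the actual signatures in Argilla.

```python
from typing import Callable, Dict


def _image_html(src: str) -> str:
    # Hypothetical stand-in for an image helper.
    return f'<img src="{src}">'


def _pdf_html(src: str) -> str:
    # Hypothetical stand-in for a PDF helper.
    return f'<embed src="{src}" type="application/pdf" width="700" height="700">'


# Map each supported media type to the function that renders it as HTML.
CONVERTERS: Dict[str, Callable[[str], str]] = {
    "image": _image_html,
    "pdf": _pdf_html,
}


def media_to_html(src: str, media_type: str) -> str:
    """Pick the converter for media_type and fail loudly on unknown types."""
    try:
        converter = CONVERTERS[media_type]
    except KeyError:
        raise ValueError(f"Unsupported media type: {media_type!r}") from None
    return converter(src)
```

With this shape, supporting another document type would mean registering one more converter rather than adding a branch to every template method.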

Please note that this is a high-level overview of what the implementation might look like, and the actual implementation could be more complex. This feature would also need to be thoroughly tested to ensure that it works correctly and doesn't introduce any new issues.

I hope this gives you a good idea of what this feature might entail. If you have any further questions or suggestions, please don't hesitate to let me know.


sdiazlor added a commit that referenced this issue Jan 24, 2024
# Description

Closes #4481

**Type of change**

- [ ] New feature (non-breaking change which adds functionality)
- [x] Refactor (change restructuring the codebase without changing
functionality)
- [x] Improvement (change adding some improvement to an existing
functionality)

**Checklist**

- [ ] I added relevant documentation
- [ ] I followed the style guidelines of this project
- [ ] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK)
(see text above)
- [ ] I have added relevant notes to the `CHANGELOG.md` file (See
https://keepachangelog.com/)

---------

Co-authored-by: leire <leire@recogn.ai>
Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>
4 participants