[FEATURE] extend `multi-modality` support with `DataURL` for documents like `pdf` #4481

davidberenstein1957 · 2024-01-11T14:49:55Z

Is your feature request related to a problem? Please describe.
We recently received a user request to support PDF documents inside the `TextField` as a `DataURL`.
Leire found that the following worked:
`<embed src="https://arxiv.org/pdf/2307.00808.pdf" type="application/pdf" width="700px" height="700px">`
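The title asks for `DataURL` support, meaning the file bytes are inlined in the record instead of linking to a hosted copy like the arXiv URL above. Below is a minimal standard-library sketch of that idea. The function names `pdf_bytes_to_data_url` and `pdf_data_url_to_embed` are hypothetical and are not part of Argilla. The `<embed>` attributes mirror the snippet above.

```python
import base64
import html


def pdf_bytes_to_data_url(pdf_bytes: bytes) -> str:
    """Encode raw PDF bytes as a base64 data URL (RFC 2397)."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return f"data:application/pdf;base64,{encoded}"


def pdf_data_url_to_embed(src: str, width: str = "700px", height: str = "700px") -> str:
    """Wrap a URL or data URL in an <embed> tag with quoted, escaped attributes."""
    # html.escape keeps a stray quote in `src` from breaking out of the attribute.
    return (
        f'<embed src="{html.escape(src, quote=True)}" type="application/pdf" '
        f'width="{width}" height="{height}">'
    )
```

To use it with a local file, read the file with `open("paper.pdf", "rb")` and pass `f.read()` through both functions. Base64 inflates the payload by roughly a third, so large PDFs produce large records.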

Describe the solution you'd like
I would like to see extended support for loading documents such as PDF, and possibly pptx, docx, xlsx, and others, into Argilla, in the same way as the existing media helpers:
`from argilla.client.feedback.utils import audio_to_html, image_to_html, video_to_html`
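A document helper in the style of `audio_to_html`, `image_to_html` and `video_to_html` would need each file's MIME type to build a data URL. Below is a minimal sketch that uses an explicit lookup table. `DOCUMENT_MIME_TYPES` and `document_to_data_url` are hypothetical names. Browsers generally preview only PDFs inline, so docx, pptx and xlsx would likely need converting (for example to PDF) before an `<embed>` shows anything useful. That is an assumption worth verifying.

```python
import base64
from pathlib import Path

# Hypothetical lookup table mapping file extensions to their registered MIME types.
DOCUMENT_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def document_to_data_url(filename: str, data: bytes) -> str:
    """Build a base64 data URL for a supported document type (hypothetical helper)."""
    suffix = Path(filename).suffix.lower()  # ".PDF" and ".pdf" are treated alike
    try:
        mime_type = DOCUMENT_MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported document type: {suffix or filename!r}") from None
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

An explicit table is used instead of `mimetypes.guess_type` because the latter depends on the host's MIME configuration, and it may not know the Office formats on every system.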

Describe alternatives you've considered
N.A.

Additional context


dosubot · 2024-01-11T14:50:16Z

🤖

Hello @davidberenstein1957!

I'm Dosu, a friendly bot here to assist you with any bugs, questions, or contributions while the human maintainers are away. Let's work together to make Argilla even better! 😊

Thank you for your feature request, davidberenstein1957. As of the current version of Argilla (v1.21.0), there is no built-in support for embedding document types such as PDF, pptx, docx, or xlsx within the `TextField`. The `TextFieldSettings` class only supports the `type` and `use_markdown` properties, and no property or method suggests support for embedding documents.

However, your suggestion to extend the multi-modality support with DataURL for documents is a valuable one. This would indeed enhance the functionality of Argilla by allowing users to load various document types into the platform.

I can't create pull requests or issues myself, but I encourage you or other contributors to consider implementing this feature. A good starting point would be extending the `TextFieldSettings` class or creating a new class for handling document types.

Please refer to the Argilla repository and specifically the fields.py file for more information on how the current data types are handled.

If you decide to work on this feature, please follow the contribution guidelines in the `CONTRIBUTING.md` file.

I hope this information is helpful. If you have any further questions or need more clarification, feel free to ask.

Best regards,
Dosu


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

sdiazlor · 2024-01-11T15:44:10Z

Nice! I'll take a look, this idea is really interesting and useful!

davidberenstein1957 · 2024-01-22T10:32:24Z

@sdiazlor add an example for the https://huggingface.co/HuggingFaceM4 dataset.

dosubot · 2024-01-23T23:06:54Z

🤖

Thank you for your feature request. Extending the multi-modality support to include documents like PDFs, pptx, docx, xlsx, and others is a great idea and would certainly enhance the functionality of Argilla.

Currently, Argilla's multi-modality support is implemented through the `for_multi_modal_classification` and `for_multi_modal_transcription` methods in the `TaskTemplateMixin` class. These methods handle video, audio, and image data. Rendering of media in the `TextField` depends on the `use_markdown` property. If it is `True`, the field is rendered as markdown. If it is `False`, the field is rendered as plain text.

To extend this functionality to documents, these methods would need to handle the new media types. This would involve creating a utility function for each new media type, similar to `audio_to_html`, `image_to_html`, and `video_to_html`, that converts the media data into a format the `TextField` can render.

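For example, a `pdf_to_html` function might look something like this:

```python
def pdf_to_html(pdf_data: str) -> str:
    # Embed the PDF (a URL or data URL) so a markdown-enabled TextField can render it
    return f'<embed src="{pdf_data}" type="application/pdf" width="700px" height="700px">'
```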

This function takes the PDF data (here, a URL to the PDF document) and returns an HTML string that embeds the PDF in the page. The `TextField` can then render this HTML.

The `for_multi_modal_classification` and `for_multi_modal_transcription` methods would also need to be updated to handle the new media types. For example, they could take a new parameter that specifies the media type and then use the matching utility function to convert the media data into a format the `TextField` can render.
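The media-type parameter idea can be sketched as a small dispatch table that routes each source to one converter. This is independent of Argilla. `MEDIA_CONVERTERS` and `media_to_html` are hypothetical names. The real `for_multi_modal_classification` and `for_multi_modal_transcription` signatures are not reproduced here, and the markup is only illustrative. The existing Argilla helpers may emit different tags.

```python
import html
from typing import Callable, Dict


def _audio(src: str) -> str:
    return f'<audio controls src="{src}"></audio>'


def _video(src: str) -> str:
    return f'<video controls src="{src}"></video>'


def _image(src: str) -> str:
    return f'<img src="{src}">'


def _pdf(src: str) -> str:
    return f'<embed src="{src}" type="application/pdf" width="700px" height="700px">'


# Hypothetical registry: one converter per media type, so adding "pdf"
# does not touch the audio, video, or image code paths.
MEDIA_CONVERTERS: Dict[str, Callable[[str], str]] = {
    "audio": _audio,
    "video": _video,
    "image": _image,
    "pdf": _pdf,
}


def media_to_html(src: str, media_type: str) -> str:
    """Validate `media_type`, escape `src`, and render it with the matching converter."""
    if media_type not in MEDIA_CONVERTERS:
        supported = ", ".join(sorted(MEDIA_CONVERTERS))
        raise ValueError(f"media_type must be one of: {supported}")
    return MEDIA_CONVERTERS[media_type](html.escape(src, quote=True))
```

With a registry, a template method would only need to validate `media_type` and look up the converter. Supporting another document type then means adding one entry.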

Please note that this is a high-level overview of what the implementation might look like, and the actual implementation could be more complex. This feature would also need to be thoroughly tested to ensure that it works correctly and doesn't introduce any new issues.

I hope this gives you a good idea of what this feature might entail. If you have any further questions or suggestions, please don't hesitate to let me know.


# Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context. List any dependencies that are required for this change.

Closes #4481

**Type of change**

(Please delete options that are not relevant. Remember to title the PR according to the type of change)

- [ ] New feature (non-breaking change which adds functionality)
- [x] Refactor (change restructuring the codebase without changing functionality)
- [x] Improvement (change adding some improvement to an existing functionality)

**How Has This Been Tested**

(Please describe the tests that you ran to verify your changes. And ideally, reference `tests`)

- [ ] Test A
- [ ] Test B

**Checklist**

- [ ] I added relevant documentation
- [ ] I followed the style guidelines of this project
- [ ] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK) (see text above)
- [ ] I have added relevant notes to the `CHANGELOG.md` file (See https://keepachangelog.com/)

---------

Co-authored-by: leire <leire@recogn.ai>
Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

davidberenstein1957 assigned leiyre and sdiazlor and unassigned leiyre Jan 11, 2024

davidberenstein1957 added the `team: interns` label (indicates that the issue or pull request is owned by the machine learning interns team) Jan 15, 2024

sdiazlor mentioned this issue Jan 16, 2024 in feat: extend multimodality #4495 (merged)

sdiazlor linked a pull request Jan 16, 2024 that will close this issue: feat: extend multimodality #4495 (merged)

davidberenstein1957 added this to the v1.23.0 milestone Jan 17, 2024

nataliaElv added the `type: popular request` label (indicates that several people outside of the Argilla organization are interested in this feature) Jan 22, 2024

sdiazlor closed this as completed in #4495 Jan 24, 2024
