# Making the Most of Markdown within Argilla TextFields

## Introduction

As you may have noticed, Argilla supports Markdown within its text fields. This allows you to add formatting to your text, such as **bold** or *italic* text, or even [links](https://www.google.com). It also lets you embed HTML content, such as images, videos, and even iframes, which is a powerful tool to have at your disposal.

Within this notebook, we will go over the basics of Markdown and how to use it within Argilla, covering the following topics:

- Multi-modality
    - Image
    - Video
    - Audio
- Tables
- Exploiting displaCy
    - NER
    - Relationships

## Installing Dependencies

We will be working with built-in Python libraries, as well as the `argilla` library. Additionally, we will use a processor for unstructured documents with an externally callable public API (to avoid overhead). This tool is called [IBM Deep Search](https://github.com/DS4SD/deepsearch-toolkit), but for a fully open source alternative, I recommend taking a look at [Unstructured](https://unstructured.io). To install `argilla` and the Deep Search toolkit, run the following commands:

In [4]:
!pip install argilla==1.17 
!pip install deepsearch-toolkit


### Sign up for Deep Search


Go to https://deepsearch-experience.res.ibm.com/ and sign up for an account using the Google OAuth integration. Afterwards, you can use the following command to configure a profile for the command-line tool.

![authenticate](img/making-most-of-markdown/deepsearch.png)


```bash
deepsearch profile config --profile-name "ds-experience" --host "https://deepsearch-experience.res.ibm.com/" --verify-ssl --username "<your-email>"
```

When the terminal prompts you, enter your API key.

## Get Coding

### Multi-Modality

A Data URL is a way to encode binary data into a string, which can then be used to embed the data into a webpage. This is very useful, as it allows us to embed images, videos, and audio files directly into HTML without having to host them externally. This is done by prepending the base64-encoded data with a header that specifies the type of data being encoded and the encoding used. We will define three different functions, one for each modality, which take a file path as input and return an HTML snippet that embeds the file as a Data URL.
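
Before defining those functions, the format itself can be illustrated with a tiny payload. The following is a minimal sketch using only the standard library; the payload and the `text/plain` MIME type are hypothetical values chosen for illustration and do not appear elsewhere in this notebook.

```python
import base64

# Hypothetical payload and MIME type, chosen only for illustration.
payload = b"Hello, Argilla!"
mime_type = "text/plain"

# A Data URL has the form: data:<MIME type>;base64,<base64-encoded bytes>
data_url = f"data:{mime_type};base64,{base64.b64encode(payload).decode('ascii')}"

# Split the header from the encoded body and decode it back to the original bytes.
header, encoded = data_url.split(",", 1)
decoded = base64.b64decode(encoded)

print(header)
print(decoded == payload)
```

Everything before the first comma is the header, and everything after it is the encoded file content, which is why the helpers below only need to concatenate a prefix and the base64 string.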

In [21]:
import base64
from pathlib import Path

def get_file_type(path):
    return Path(path).suffix[1:]

def video_to_dataurl(path, file_type: str = None):
    # Open the video file and read its contents
    with open(path, 'rb') as f:
        video_data = f.read()

    # Encode the video data as base64
    video_base64 = base64.b64encode(video_data).decode('utf-8')

    # Get the file type (e.g. mp4)
    file_type = file_type or get_file_type(path)
    
    # Prepend the Data URL prefix to the base64-encoded data
    data_url = f'data:video/{file_type};base64,' + video_base64

    # Create HTML. There are several ways to embed the Data URL
    # (<video>, <a>, <iframe>, <embed>); only the <embed> variant is returned.
    html = f"<video controls><source src='{data_url}' type='video/{file_type}'></video>"
    html_href = f"<a href='{data_url}'>Click</a>"
    html_iframe = f"<iframe src='{data_url}'></iframe>"
    html_embed = f"<embed src='{data_url}'></embed>"
    return html_embed
    
def audio_to_dataurl(path, file_type: str = None):
    # Open the audio file and read its contents
    with open(path, 'rb') as f:
        audio_data = f.read()
    
    # Encode the audio data as base64
    audio_base64 = base64.b64encode(audio_data).decode('utf-8')
    
    # Get the file type (e.g. mp3)
    file_type = file_type or get_file_type(path)
    
    # Prepend the Data URL prefix to the base64-encoded data
    data_url = f'data:audio/{file_type};base64,' + audio_base64
    
    # Create HTML
    html = f"<audio controls autoplay><source src='{data_url}' type='audio/{file_type}'></audio>"
    return html

def image_to_dataurl(path, file_type: str = None):
    # open the image file and read its contents
    with open(path, 'rb') as f:
        image_data = f.read()
        
    # Encode the image data as base64
    image_base64 = base64.b64encode(image_data).decode('utf-8')
    
    # Get the file type (e.g. png)
    file_type = file_type or get_file_type(path)
    
    # Prepend the Data URL prefix to the base64-encoded data
    data_url = f'data:image/{file_type};base64,' + image_base64
    
    # Create HTML
    html = f'<img src="{data_url}">'
    return html
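
The helpers above use the file extension as the MIME subtype, which gives `audio/mp3` for `.mp3` files and `image/jpg` for `.jpg` files, whereas the registered types are `audio/mpeg` and `image/jpeg`. Browsers are often lenient about this, but if a file does not render, the MIME type is worth checking. The following is a minimal sketch of an alternative that looks the type up with the standard-library `mimetypes` module. `file_to_data_url` is a hypothetical helper name and is not part of Argilla.

```python
import base64
import mimetypes
from pathlib import Path


def file_to_data_url(path):
    """Return a Data URL for `path`, guessing the MIME type from its name.

    `file_to_data_url` is a hypothetical helper, not part of Argilla.
    """
    # guess_type returns a (type, encoding) tuple; type is None if unknown.
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type is None:
        raise ValueError(f"Could not guess a MIME type for {path!r}")

    # Read the raw bytes and encode them as base64 text.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string could then be placed in the `src` attribute of an `<img>`, `<audio>`, or `<video>` tag in the same way as in the functions above.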


In [22]:
data_url_video = video_to_dataurl("img/making-most-of-markdown/snapshot.mp4")
data_url_audio_1 = audio_to_dataurl("img/making-most-of-markdown/heath_ledger.mp3")
data_url_audio_2 = audio_to_dataurl("img/making-most-of-markdown/heath_ledger_2.mp3")
data_url_image = image_to_dataurl("img/making-most-of-markdown/deepsearch.png")
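
Because base64 represents every 3 bytes as 4 characters, each embedded file grows by roughly a third, which is worth keeping in mind for larger videos. The following is a minimal sketch of that arithmetic; `base64_length` is a hypothetical helper, and the zero-filled buffer is placeholder data standing in for a real media file.

```python
import base64


def base64_length(num_bytes):
    """Length in characters of the padded base64 encoding of `num_bytes` bytes."""
    # Every group of up to 3 input bytes becomes exactly 4 output characters.
    return 4 * ((num_bytes + 2) // 3)


# 1,000,000 zero bytes of placeholder data standing in for a media file.
raw = bytes(1_000_000)
encoded = base64.b64encode(raw)

print(len(raw), len(encoded), base64_length(len(raw)))
```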

In [24]:
import argilla as rg

records = [
    rg.FeedbackRecord(fields={"content": data_url_audio_1}),
    rg.FeedbackRecord(fields={"content": data_url_audio_2}),
    rg.FeedbackRecord(fields={"content": data_url_image}),
    rg.FeedbackRecord(fields={"content": data_url_video})
]
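try:
    # Create the dataset, add the records, and push it to Argilla
    ds = rg.FeedbackDataset(
        fields=[rg.TextField(name="content", use_markdown=True)],
        questions=[rg.TextQuestion(name="describe")],
    )
    ds.add_records(records)
    ds = ds.push_to_argilla("multi-modal")
except Exception:
    # Pushing failed (for example, because the dataset already exists),
    # so load the existing dataset and add the records to it
    ds = rg.FeedbackDataset.from_argilla("multi-modal")
    ds.add_records(records)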


Pushing records to Argilla...: 100%|██████████| 1/1 [00:00<00:00,  1.27it/s]
