![JohnSnowLabs](https://sparknlp.org/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp/blob/master/examples/python/transformers/SparkNLP_Reader2Image_Demo.ipynb)

# Introducing Reader2Image in SparkNLP

This notebook showcases the newly added `Reader2Image` annotator in Spark NLP. It provides a streamlined interface for reading the images contained in documents and passing them to vision-language model (VLM) annotators in Spark NLP. The annotator is useful for preprocessing data in pipelines that rely on information contained within images. Currently, it supports HTML and Markdown files as input.

## Setup and Initialization
Let's keep in mind a few things before we start 😊

Support for **Reader2Image** was introduced in Spark NLP 6.1.3. Please make sure you are running that release or a later one.

- Let's install and set up Spark NLP in Google Colab. This part is easy with our setup script.

In [None]:
! wget -q http://setup.johnsnowlabs.com/colab.sh -O - | bash

In [4]:
import sparknlp

# let's start Spark with Spark NLP
spark = sparknlp.start()
print(sparknlp.version())

print("Apache Spark version: {}".format(spark.version))

Apache Spark version: 3.5.1
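
Because `Reader2Image` needs Spark NLP 6.1.3 or later, it can help to check the version string in code rather than by eye. The sketch below is one possible way to do that. `parse_version` and `supports_reader2image` are hypothetical helper names, not part of Spark NLP, and the parsing assumes a plain `major.minor.patch` style string.

```python
import re

# Minimum release that ships Reader2Image, as stated in this notebook.
MIN_VERSION = (6, 1, 3)

def parse_version(version: str) -> tuple:
    """Return the first three numeric components, padded with zeros."""
    numbers = [int(n) for n in re.findall(r"\d+", version)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))

def supports_reader2image(version: str) -> bool:
    """True when the version is at least MIN_VERSION (hypothetical helper)."""
    return parse_version(version) >= MIN_VERSION

# Sample strings only; they are not read from an installed package.
for candidate in ("6.1.2", "6.1.3", "6.2.0"):
    print(candidate, supports_reader2image(candidate))
```

In this notebook the call would be `supports_reader2image(sparknlp.version())`.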


To illustrate the use of this reader, let’s define an HTML document containing image data and display a preview.

In [14]:
from IPython.display import display, HTML

html_code = """
<!DOCTYPE html>
<html>
<head>
    <title>Image Parsing Test</title>
</head>
<body>
<h1>Test Images</h1>

<!-- Base64 inline PNG -->
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA
  AAAFCAYAAACNbyblAAAAHElEQVQI12P4
  //8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=="
     alt="Base64 Red Dot" width="5" height="5">

<!-- External image -->
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/React-icon.svg/1024px-React-icon.svg.png"
     alt="React Logo" width="50" height="50">
</body>
</html>
"""

display(HTML(html_code))

As you can see in the preview above, the document contains two images: a small red dot and an atom-shaped logo (the React icon). We expect the VLM to generate a description of each image.
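
To make the two image sources concrete, here is a small standard-library sketch that lists the `<img>` tags in the HTML above, labels each as inline (a `data:` URI) or external (a URL), and reads the pixel size of the inline PNG from its header. This only illustrates what the document contains. It is not how `Reader2Image` works internally, and `ImgCollector`, `source_kind`, and `inline_png_size` are hypothetical names introduced for this example.

```python
import base64
import struct
from html.parser import HTMLParser

# Trimmed copy of the notebook's HTML, kept here so the sketch runs on its own.
SAMPLE_HTML = """
<body>
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA
  AAAFCAYAAACNbyblAAAAHElEQVQI12P4
  //8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=="
     alt="Base64 Red Dot" width="5" height="5">
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/React-icon.svg/1024px-React-icon.svg.png"
     alt="React Logo" width="50" height="50">
</body>
"""

class ImgCollector(HTMLParser):
    """Collects the attributes of every <img> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))

def source_kind(src: str) -> str:
    """Label an image source as an inline data URI or an external URL."""
    return "inline" if src.startswith("data:") else "external"

def inline_png_size(data_uri: str) -> tuple:
    """Decode a base64 PNG data URI and read width/height from its IHDR chunk."""
    payload = data_uri.split(",", 1)[1]
    raw = base64.b64decode("".join(payload.split()))  # drop line breaks first
    # Bytes 16..23 of a PNG hold the big-endian width and height.
    return struct.unpack(">II", raw[16:24])

collector = ImgCollector()
collector.feed(SAMPLE_HTML)
summary = [(img["alt"], source_kind(img["src"])) for img in collector.images]
print(summary)
print(inline_png_size(collector.images[0]["src"]))
```

The external image is only identified by its URL here; nothing is downloaded.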

In [6]:
with open("example-images.html", "w") as f:
    f.write(html_code)

In [7]:
empty_df = spark.createDataFrame([], "string").toDF("text")

In [8]:
from pyspark.ml import Pipeline
from sparknlp.reader.reader2image import Reader2Image

reader2image = Reader2Image() \
    .setContentType("text/html") \
    .setContentPath("./example-images.html") \
    .setOutputCol("image")

pipeline = Pipeline(stages=[reader2image])
model = pipeline.fit(empty_df)

image_df = model.transform(empty_df)
image_df.show()

+--------------------+--------------------+--------------------+-------------------+--------------------+
|                path|             content|           partition|           fileName|               image|
+--------------------+--------------------+--------------------+-------------------+--------------------+
|file:/content/exa...|\n<!DOCTYPE html>...|[{Title, Test Ima...|example-images.html|[{image, example-...|
|file:/content/exa...|\n<!DOCTYPE html>...|[{Title, Test Ima...|example-images.html|[{image, example-...|
+--------------------+--------------------+--------------------+-------------------+--------------------+



For this example, we will use the `Qwen2VLTransformer`. Let's add a text prompt column for visual question answering (VQA).

In [9]:
from pyspark.sql.functions import lit
prompt_df = image_df.withColumn(
    "text",
    lit(
        "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
        "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>"
        "Describe this image.<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
)
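
The prompt above follows a chat-style template: a system turn, a user turn that carries the image placeholder tokens plus the question, and an open assistant turn. If you want to ask a different question, a small helper can rebuild the same string. This is a minimal sketch; `build_qwen2vl_prompt` is a hypothetical name, and the template is copied from the cell above rather than taken from any Spark NLP API.

```python
def build_qwen2vl_prompt(question: str,
                         system: str = "You are a helpful assistant.") -> str:
    """Assemble the chat-style prompt used in this notebook (hypothetical helper)."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        # The vision tokens mark where the image is placed in the user turn.
        "<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|>"
        f"{question}<|im_end|>\n"
        # The assistant turn is left open so the model writes the answer.
        "<|im_start|>assistant\n"
    )

prompt = build_qwen2vl_prompt("Describe this image.")
print(prompt)
```

The resulting string can be passed to `lit(...)` in the same way as the hard-coded prompt.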

In [10]:
prompt_df.show()

+--------------------+--------------------+--------------------+-------------------+--------------------+--------------------+
|                path|             content|           partition|           fileName|               image|                text|
+--------------------+--------------------+--------------------+-------------------+--------------------+--------------------+
|file:/content/exa...|\n<!DOCTYPE html>...|[{Title, Test Ima...|example-images.html|[{image, example-...|<|im_start|>syste...|
|file:/content/exa...|\n<!DOCTYPE html>...|[{Title, Test Ima...|example-images.html|[{image, example-...|<|im_start|>syste...|
+--------------------+--------------------+--------------------+-------------------+--------------------+--------------------+



In [11]:
from sparknlp.annotator import Qwen2VLTransformer

visualQAClassifier = (
    Qwen2VLTransformer.pretrained()
    .setInputCols("image")
    .setOutputCol("answer")
)

qwen2_vl_2b_instruct_int4 download started this may take some time.
Approximate size to download 1.4 GB
[OK!]


In [12]:
pipeline = Pipeline().setStages([visualQAClassifier])
result_df = pipeline.fit(prompt_df).transform(prompt_df)

In [13]:
result_df.select("image.origin", "answer.result").show(truncate=False)

+---------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|origin               |result                                                                                              

Voilà! As you can see above, `Qwen2VLTransformer` generated accurate descriptions of both images.