## Images

This section covers how to load images, such as JPG or PNG files, into a document format that can be used downstream.

In [1]:
from langchain.document_loaders.image import UnstructuredImageLoader
        
loader = UnstructuredImageLoader("./papers/figure-18-1.jpg")
data = loader.load()
data[0]

Downloading (…)l0.05_quantized.onnx: 100%|██████████| 54.6M/54.6M [00:04<00:00, 11.6MB/s]


Document(page_content="What are the meals that | can cook with these?\n\nWith the variety of food items stored in the refrigerator, you can prepare several meals. For example, you can create a fruit salad using the strawberries, blueberries, and carrots. Additionally, you can make a delicious smoothie using the fruits and vegetables, which would be a healthy and nutritious snack or breakfast option. The milk and eggs can be used to create various dishes, such as a milk-based smoothie, a baked casserole, or scrambled eggs. The juice can also be used to make a fresh fruit and juice mix or as a refreshing homemade popsicle. The possibilities are vast, and the meals you can create will depend on your preferences and dietary needs.\n\n| would like to do the fruit salad. Show me the recipe.\n\nCertainly! Here's a simple fruit salad recipe using the items in your refrigerator:\n\nIngredients:\n\n- 2 cups fresh strawberries, sliced\n\n- 1 cup fresh blueberries\n\n- 1 large carrot, grated\n\n- 

In [4]:
loader = UnstructuredImageLoader("./papers/figure-18-1.jpg", mode="elements")
data = loader.load()

In [5]:
data[0]

Document(page_content='What are the meals that | can cook with these?', metadata={'source': './papers/figure-18-1.jpg', 'coordinates': {'points': ((702.8848266601562, 187.0473175048828), (702.8848266601562, 204.4428253173828), (1024.319091796875, 204.4428253173828), (1024.319091796875, 187.0473175048828)), 'system': 'PixelSpace', 'layout_width': 1100, 'layout_height': 855}, 'filename': 'figure-18-1.jpg', 'file_directory': './papers', 'last_modified': '2023-11-07T09:20:57', 'filetype': 'JPEG', 'page_number': 1, 'detection_class_prob': 0.388230562210083, 'category': 'NarrativeText'})

In [6]:
data[1]

Document(page_content='With the variety of food items stored in the refrigerator, you can prepare several meals. For example, you can create a fruit salad using the strawberries, blueberries, and carrots. Additionally, you can make a delicious smoothie using the fruits and vegetables, which would be a healthy and nutritious snack or breakfast option. The milk and eggs can be used to create various dishes, such as a milk-based smoothie, a baked casserole, or scrambled eggs. The juice can also be used to make a fresh fruit and juice mix or as a refreshing homemade popsicle. The possibilities are vast, and the meals you can create will depend on your preferences and dietary needs.', metadata={'source': './papers/figure-18-1.jpg', 'coordinates': {'points': ((88.8476333618164, 231.98199462890625), (88.8476333618164, 327.2592468261719), (1005.2608642578125, 327.2592468261719), (1005.2608642578125, 231.98199462890625)), 'system': 'PixelSpace', 'layout_width': 1100, 'layout_height': 855}, 'fil
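
The element metadata shown above can be post-processed with plain Python. The following is a minimal sketch, not part of the original notebook: `Doc` is a hypothetical stand-in for the LangChain `Document` class (it only mirrors the `page_content` and `metadata` fields), and `bounding_box` and `select_elements` are illustrative helper names. The sample values are copied from the `data[0]` output, with coordinates rounded.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (same two fields)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def bounding_box(doc):
    """Return (x_min, y_min, x_max, y_max) from the element's corner points."""
    points = doc.metadata["coordinates"]["points"]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def select_elements(docs, category, min_prob=0.0):
    """Keep elements of one category whose detection probability is high enough."""
    return [
        d for d in docs
        if d.metadata.get("category") == category
        and d.metadata.get("detection_class_prob", 0.0) >= min_prob
    ]


# Sample element copied from the data[0] output (coordinates rounded).
sample = Doc(
    page_content="What are the meals that | can cook with these?",
    metadata={
        "coordinates": {
            "points": (
                (702.88, 187.05),
                (702.88, 204.44),
                (1024.32, 204.44),
                (1024.32, 187.05),
            ),
            "system": "PixelSpace",
        },
        "detection_class_prob": 0.388230562210083,
        "category": "NarrativeText",
    },
)

x_min, y_min, x_max, y_max = bounding_box(sample)
print(f"width={x_max - x_min:.2f}px height={y_max - y_min:.2f}px")

# With a 0.5 threshold this low-confidence element is filtered out.
print(len(select_elements([sample], "NarrativeText", min_prob=0.5)))
```

With real loader output, the same helpers could be applied to the `data` list directly, since each element exposes the same `metadata` keys.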

In [9]:
import os

# Collect the path of every .jpg file in the folder
folder_path = './papers/'
list_image = [
    os.path.join(folder_path, filename)
    for filename in os.listdir(folder_path)
    if filename.endswith('.jpg')
]

# Load each image and gather the resulting documents in a single list
documents = []
for image in list_image:
    documents.extend(UnstructuredImageLoader(image).load())
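
The cell above only picks up files ending in a lowercase `.jpg`, while the introduction also mentions PNG. A minimal sketch of a more tolerant file listing follows; `find_images` and `IMAGE_SUFFIXES` are hypothetical names, and the set of suffixes is an assumption to adjust as needed. The demo runs against a temporary folder so that it does not depend on `./papers/` existing.

```python
import tempfile
from pathlib import Path

# Assumed set of image suffixes; extend it for other formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_images(folder, suffixes=IMAGE_SUFFIXES):
    """Return sorted paths (as strings) of image files directly inside folder."""
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )


# Demo on a throwaway folder: the suffix check is case-insensitive,
# and non-image files are skipped.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.JPG", "b.png", "notes.txt"):
        (Path(tmp) / name).touch()
    print([Path(p).name for p in find_images(tmp)])
```

The returned list could then be used in place of `list_image`, for example `list_image = find_images('./papers/')`.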

*Note: UnstructuredImageLoader only extracts text that appears in the image; it does not interpret the visual content of an image that has no text.*

## Image captions

This notebook shows how to use the ImageCaptionLoader to generate a queryable index of image captions.

By default, the loader uses the pre-trained [Salesforce BLIP image captioning model](https://huggingface.co/Salesforce/blip-image-captioning-base).

In [8]:
from langchain.document_loaders import ImageCaptionLoader
import os

folder_path = './papers/'
list_image = []
for filename in os.listdir(folder_path):
    if filename.endswith('.jpg'):
        list_image.append(folder_path + filename)

loader = ImageCaptionLoader(path_images=list_image)
list_docs = loader.load()
list_docs

Downloading (…)rocessor_config.json: 100%|██████████| 287/287 [00:00<00:00, 2.71MB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 506/506 [00:00<00:00, 3.24MB/s]
Downloading (…)solve/main/vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 459kB/s]
Downloading (…)/main/tokenizer.json: 100%|██████████| 711k/711k [00:01<00:00, 704kB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 125/125 [00:00<00:00, 1.37MB/s]
Downloading (…)lve/main/config.json: 100%|██████████| 4.56k/4.56k [00:00<00:00, 44.9MB/s]
Downloading pytorch_model.bin: 100%|██████████| 990M/990M [01:25<00:00, 11.6MB/s] 


[Document(page_content='an image of a man riding a bike [SEP]', metadata={'image_path': './papers/figure-7-1.jpg'}),
 Document(page_content='an image of a boat and a text that reads what is the water? [SEP]', metadata={'image_path': './papers/figure-18-2.jpg'}),
 Document(page_content="an image of a woman's face with the words'you are not allowed'[SEP]", metadata={'image_path': './papers/figure-19-2.jpg'}),
 Document(page_content='an image of a pan with chicken on it [SEP]', metadata={'image_path': './papers/figure-8-1.jpg'}),
 Document(page_content='an image of a parking garage with a lot of luggage [SEP]', metadata={'image_path': './papers/figure-3-1.jpg'}),
 Document(page_content="an image of a woman with a sword and a text that reads,'the princess is a", metadata={'image_path': './papers/figure-19-1.jpg'}),
 Document(page_content='an image of a text description for a customer [SEP]', metadata={'image_path': './papers/figure-18-1.jpg'})]

*Note: ImageCaptionLoader produces only a brief caption for each image, with little explanatory detail.*
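
Most captions in the output above end with a `[SEP]` token left over from the model's tokenizer. The sketch below shows one way to strip it and run a naive keyword lookup over the captions. `clean_caption` and `search_captions` are hypothetical helpers, and plain substring matching is only a stand-in for a real queryable index, which this section does not show. The sample captions are copied from the output above.

```python
def clean_caption(caption):
    """Strip a trailing '[SEP]' token and surrounding whitespace."""
    caption = caption.strip()
    if caption.endswith("[SEP]"):
        caption = caption[: -len("[SEP]")]
    return caption.strip()


def search_captions(captions_by_path, keyword):
    """Return image paths whose cleaned caption contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [
        path
        for path, caption in captions_by_path.items()
        if keyword in clean_caption(caption).lower()
    ]


# Captions copied from the loader output, keyed by image_path.
captions = {
    "./papers/figure-7-1.jpg": "an image of a man riding a bike [SEP]",
    "./papers/figure-8-1.jpg": "an image of a pan with chicken on it [SEP]",
    "./papers/figure-19-2.jpg": "an image of a woman's face with the words'you are not allowed'[SEP]",
}

print(clean_caption(captions["./papers/figure-7-1.jpg"]))
print(search_captions(captions, "bike"))
```

With real loader output, the dictionary could be built from `list_docs` using each document's `page_content` and `metadata['image_path']`.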