# A Menu Converter Project

## Introduction

In this script, we automate the conversion of PDF menus to JPG images.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [9]:
directory = '/content/drive/MyDrive/AI Projects/RAG Course/5_Project_Using_OpenAI-Menu_Digitalizer'

## PDF To Image Converter

### Install and Import Required Libraries

In [3]:
!pip install PyMuPDF Pillow

Collecting PyMuPDF
  Downloading pymupdf-1.26.4-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (3.4 kB)
Downloading pymupdf-1.26.4-cp39-abi3-manylinux_2_28_x86_64.whl (24.1 MB)
Installing collected packages: PyMuPDF
Successfully installed PyMuPDF-1.26.4


* **PyMuPDF**: A Python binding for MuPDF, which allows for PDF and image file processing.
* **Pillow**: The maintained fork of the Python Imaging Library (PIL), which adds image processing capabilities to your Python interpreter.

In [4]:
import fitz # PyMuPDF
from PIL import Image # Pillow
import os

* `fitz` provides functions to read and manipulate PDF files.
* `Image` from `PIL` allows us to create and modify images.
* `os` helps in interacting with the operating system, such as reading files and directories.
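
As a small illustration of the `os` calls this notebook relies on, the sketch below lists the PDF files in a folder. The helper name `find_pdfs` is hypothetical and not part of the original notebook. It compares extensions in lower case, so it also matches names such as `Menu.PDF`; whether you want that behaviour is an assumption.

```python
import os

def find_pdfs(directory):
  """Return the sorted names of PDF files in `directory` (hypothetical helper)."""
  pdf_names = []
  for filename in os.listdir(directory):
    # splitext splits "Menu.PDF" into ("Menu", ".PDF")
    extension = os.path.splitext(filename)[1]
    # Compare in lower case so ".PDF" and ".pdf" both match
    if extension.lower() == ".pdf":
      pdf_names.append(filename)
  return sorted(pdf_names)
```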

### Define The PDF To JPG Conversion Function

We define a function `pdf_to_jpg` that converts every page of every PDF file in the specified directory to a JPG image.

In [5]:
def pdf_to_jpg(directory):
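  # Make sure the output folder exists before any image is saved into it
  os.makedirs(os.path.join(directory, "images"), exist_ok=True)
  # Iterate over all files in the specified directory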
  for filename in os.listdir(directory):
    # Check if the file is a PDF
    if filename.endswith('.pdf'):
      # Construct the full file path
      pdf_path = os.path.join(directory, filename)
      # Open the PDF file
      pdf_document = fitz.open(pdf_path)
      # Iterate over each page in the PDF
      for page_number in range(len(pdf_document)):
        # Get the page by its index
        page = pdf_document.load_page(page_number)
        # Render the page as a pixmap (an in-memory image)
        pixmap = page.get_pixmap()

        # Construct the output image file path
        image_path = os.path.join(
            directory,
            f"images/{os.path.splitext(filename)[0]}_page{page_number+1}.jpg"
        )
        # Convert the pixmap's raw RGB bytes into a Pillow image
        img = Image.frombytes("RGB", (pixmap.width, pixmap.height), pixmap.samples)
        # Save the image to the specified path
        img.save(image_path, "JPEG")
      # Close the PDF once all of its pages have been saved
      pdf_document.close()

  # Print a message when all conversions are done
  print("All PDF files have been converted")

In this function:

* We loop through all files in the directory and select those that end with `.pdf`.
* Each PDF is opened using `fitz.open()`.
* We iterate through each page of the PDF.
* Each page is rendered to a pixmap using `page.get_pixmap()`.
* The pixmap is converted to an image using `Image.frombytes()`.
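* The image is saved as a JPG file in the `images` subfolder of that directory, named after the PDF and its page number (for example, `<pdf name>_page1.jpg`).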
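
The output path used in the function can be sketched on its own. The helper below is hypothetical (the name `build_image_path` is not from the notebook). It mirrors the naming scheme used in `pdf_to_jpg` and makes sure the `images` subfolder exists before a file is saved there, since `img.save()` fails if the target folder is missing.

```python
import os

def build_image_path(directory, filename, page_number):
  """Return the JPG path for a zero-based page of a PDF (hypothetical helper)."""
  images_dir = os.path.join(directory, "images")
  # Create the output folder if needed; no error if it already exists
  os.makedirs(images_dir, exist_ok=True)
  # "menu.pdf" -> "menu"
  base_name = os.path.splitext(filename)[0]
  # Pages are numbered from 1 in the file name
  return os.path.join(images_dir, f"{base_name}_page{page_number + 1}.jpg")
```

For a file named `menu.pdf` (an illustrative name), `build_image_path(directory, "menu.pdf", 0)` returns a path whose last two components are `images` and `menu_page1.jpg`.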

In [11]:
# Apply the function
pdf_to_jpg(directory)

All PDF files have been converted
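
To confirm what was written, one option is to list the JPG files in the `images` subfolder. This is a minimal sketch: `list_converted_images` is a hypothetical helper, and it assumes the output layout used by `pdf_to_jpg`.

```python
import os

def list_converted_images(directory):
  """Return sorted JPG file names found in `<directory>/images` (hypothetical helper)."""
  images_dir = os.path.join(directory, "images")
  # Nothing to list if the output folder was never created
  if not os.path.isdir(images_dir):
    return []
  return sorted(
      name for name in os.listdir(images_dir)
      if name.lower().endswith(".jpg")
  )
```

Calling `list_converted_images(directory)` after the conversion should return one entry per PDF page, assuming the folder held no other JPG files beforehand.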
