### Import Libraries and Set Up Directory

In this cell, we import `fitz` (the import name of the PyMuPDF library) and `os`, then create the directory where the extracted images will be stored.


In [1]:
import fitz  # PyMuPDF
import os

image_dir = "extracted_images"
os.makedirs(image_dir, exist_ok=True)

### Define Function to Extract Images

In this cell, we define the `extract_images_from_pdf` function that extracts all images from a PDF file.

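- Opens the PDF with `fitz`.
- Iterates over every page and extracts each image listed on it.
- Saves each image in the directory passed as `image_dir` (here `extracted_images`) under a name of the form `image_<page>_<index>.png`. The bytes are written in their embedded format, so a file contains true PNG data only if the source image was a PNG.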

The function also prints the name of each extracted image and a count of total images extracted.
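
A PDF can reference the same image object (the same xref) from more than one page, in which case a per-page loop like the one described above writes that image once per page. The following is a minimal sketch of skipping repeats, not part of the notebook's function: `first_occurrences` is a hypothetical helper, and the input is hand-made data shaped like the items of `page.get_images(full=True)`, where item 0 is the xref.

```python
def first_occurrences(images_by_page):
    """Yield (page_num, img_index, xref) the first time each xref is seen.

    `images_by_page` holds one list per page; each list contains tuples
    whose first item is the image xref.
    """
    seen_xrefs = set()
    for page_num, image_list in enumerate(images_by_page):
        for img_index, img in enumerate(image_list):
            xref = img[0]
            if xref in seen_xrefs:
                continue  # already handled on an earlier page
            seen_xrefs.add(xref)
            yield page_num, img_index, xref


# Hand-made example: xref 12 is referenced from both pages.
pages = [[(12,), (15,)], [(12,), (20,)]]
print(list(first_occurrences(pages)))
# → [(0, 0, 12), (0, 1, 15), (1, 1, 20)]
```

Tracking xrefs in a set keeps the page and index of the first occurrence, so the existing file-naming scheme still applies to the images that are kept.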


In [2]:
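def extract_images_from_pdf(pdf_path, image_dir):
    """Save every image embedded in the PDF to image_dir and print a summary."""
    image_count = 0

    # The context manager closes the document even if extraction fails.
    with fitz.open(pdf_path) as pdf_document:
        for page_num in range(pdf_document.page_count):
            page = pdf_document.load_page(page_num)

            # Each entry describes one image on the page; item 0 is its xref.
            image_list = page.get_images(full=True)

            for img_index, img in enumerate(image_list):
                xref = img[0]
                base_image = pdf_document.extract_image(xref)
                image_bytes = base_image["image"]

                # Files are named image_<page>_<index>.png, both 1-based.
                # Note: the bytes keep their embedded format (base_image["ext"]),
                # so a file named .png is not necessarily PNG-encoded.
                image_filename = os.path.join(
                    image_dir, f"image_{page_num + 1}_{img_index + 1}.png")

                with open(image_filename, "wb") as img_file:
                    img_file.write(image_bytes)

                image_count += 1
                print(f"Extracted image {image_filename}")

    print(f"Extraction complete. Total images extracted: {image_count}")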
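
The function above always uses a `.png` suffix, while the dictionary returned by `extract_image` reports the image's embedded format under the `"ext"` key. As a minimal sketch of naming files after that format instead, the hypothetical helper below builds the output path from that dictionary. The helper name and the fallback to `"png"` when the key is missing are assumptions of this example, and the sample dictionary is hand-made rather than real PyMuPDF output.

```python
import os


def build_image_filename(image_dir, page_num, img_index, base_image):
    """Return an output path whose extension matches the embedded format.

    `base_image` is assumed to be shaped like the dictionary returned by
    `extract_image`, with the format stored under "ext".
    """
    # Fall back to "png" if the key is missing (an assumption of this sketch).
    ext = base_image.get("ext", "png")
    return os.path.join(
        image_dir, f"image_{page_num + 1}_{img_index + 1}.{ext}")


# Hand-made dictionary standing in for the result of extract_image.
sample = {"ext": "jpeg", "image": b""}
print(build_image_filename("extracted_images", 0, 0, sample))
```

Inside the extraction loop, the returned path could replace the hard-coded `.png` filename so that each file's suffix matches its contents.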

### Extract Images from the PDF

In this cell, we provide the path to the PDF file and call the `extract_images_from_pdf` function to extract the images. The images will be saved in the `extracted_images` directory.


In [3]:
# Example usage
pdf_path = "physicschapter6.pdf"
extract_images_from_pdf(pdf_path, image_dir)

Extracted image extracted_images\image_1_1.png
Extracted image extracted_images\image_1_2.png
Extracted image extracted_images\image_1_3.png
Extracted image extracted_images\image_1_4.png
Extracted image extracted_images\image_5_1.png
Extracted image extracted_images\image_5_2.png
Extracted image extracted_images\image_5_3.png
Extracted image extracted_images\image_5_4.png
Extracted image extracted_images\image_7_1.png
Extracted image extracted_images\image_7_2.png
Extracted image extracted_images\image_7_3.png
Extracted image extracted_images\image_7_4.png
Extracted image extracted_images\image_9_1.png
Extracted image extracted_images\image_9_2.png
Extracted image extracted_images\image_9_3.png
Extracted image extracted_images\image_9_4.png
Extracted image extracted_images\image_9_5.png
Extracted image extracted_images\image_9_6.png
Extracted image extracted_images\image_10_1.png
Extracted image extracted_images\image_10_2.png
Extracted image extracted_images\image_10_3.png
Extracted 