## PDF Page Rotation Angle Detection Task

Objective:
Implement the `determine_rotation_angle` function within the given code structure to detect the rotation angle of each page in a PDF file.

Code Structure:
The main function `rotate_all_pages_upright` is already implemented, though you may change it if necessary. Your task is to complete the `determine_rotation_angle` function.

Input:
- A PDF file path (the function should be able to handle various PDF files)

Output:
- A list of integers, where each integer represents the rotation angle needed for a page in the PDF

Rotation Angle:
- The rotation angle should be in degrees, normalized to the range [0, 359].
- 0 means the page is already upright
- 90 means the page needs to be rotated 90 degrees clockwise to be upright
- 180 and 270 follow the same clockwise convention; any other integer in the range is also valid (e.g. 210)
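
As a minimal sketch of the normalization described above (the helper name `normalize_angle` is hypothetical and not part of the provided code), Python's modulo operator maps any angle, including negative ones, into the range [0, 359]:

```python
def normalize_angle(angle: float) -> int:
    """Map any angle in degrees to an integer in the range [0, 359]."""
    # Round first so that e.g. 359.6 becomes 360 and then wraps to 0
    return int(round(angle)) % 360


print(normalize_angle(-90))    # 270
print(normalize_angle(450))    # 90
print(normalize_angle(359.6))  # 0
```

Rounding before the modulo keeps the result inside the range even when a detector returns a fractional angle just below 360.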

Task:
1. Implement the `determine_rotation_angle` function:
   - Input: A single page object (PdfReader.PageObject)
   - Output: An integer representing the rotation angle in degrees

2. The function should analyze the content of the page and determine the angle needed to make the page upright.

Requirements:
1. The function should work with different PDF files, not just a specific one.
2. Implement robust methods to determine the correct rotation angle.
3. Handle potential exceptions or edge cases (e.g., pages with mixed orientations, complex layouts).
4. Optimize for both accuracy and processing speed, as the function will be called for each page in the PDF.

Additional Considerations:
- You are allowed to use up to 40GB of GPU VRAM if necessary for your implementation.
- You may create as many additional functions as needed to support your implementation.
- You may use additional libraries if required, but ensure they are imported properly.
- Provide clear comments in your code to explain your rotation detection logic.

Testing:
- Test your implementation with various types of PDFs to ensure its robustness and generalizability.
- The main script provides a way to test your implementation on a file named "grouped_documents.pdf".

Note:
The task involves determining the rotation angle only. The actual rotation of the pages is not required in this implementation.

In [1]:
from typing import List
from PyPDF2 import PdfReader, PdfWriter
import io
import numpy as np

def rotate_all_pages_upright(input_pdf: str) -> List[int]:
    """
    Analyze all pages in the input PDF and determine the rotation angle needed for each page.

    Args:
    input_pdf (str): The file path of the input PDF.

    Returns:
    List[int]: A list of rotation angles (in degrees) for each page. 
               The angles are normalized to be in the range [0, 359].
               0 means no rotation needed, 90 means 90 degrees clockwise, etc.
    """
    reader = PdfReader(input_pdf)
    writer = PdfWriter()
    
    angles = []
    for page_number in range(len(reader.pages)):
        current_page = reader.pages[page_number]
        
        rotation_angle = determine_rotation_angle(current_page)
        angles.append(rotation_angle)
    
    return angles

def determine_rotation_angle(page: 'PdfReader.PageObject') -> int:
    """
    Determine the rotation angle needed to make the page upright.

    Args:
    page (PdfReader.PageObject): A single page from a PDF.

    Returns:
    int: The rotation angle in degrees (e.g. 0, 90, 210).
         The rotation angle is normalized to be in the range [0, 359].
         0 means the page is already upright, 90 means 90 degrees clockwise, etc.
    """
    # TODO: Implement the logic to determine the rotation angle of the pdf page
    return 0

# Usage
input_pdf: str = "grouped_documents.pdf"
rotation_angles: List[int] = rotate_all_pages_upright(input_pdf)
print(f"Rotation angles for each page: {rotation_angles}")

incorrect startxref pointer(1)


Rotation angles for each page: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


In [5]:
from typing import List
from PyPDF2 import PdfReader, PdfWriter
from PIL import Image
import pytesseract
import io

def rotate_all_pages_upright(input_pdf: str) -> List[int]:
    """
    Analyze all pages in the input PDF and determine the rotation angle needed for each page.

    Args:
    input_pdf (str): The file path of the input PDF.

    Returns:
    List[int]: A list of rotation angles (in degrees) for each page. 
               The angles are normalized to be in the range [0, 359].
               0 means no rotation needed, 90 means 90 degrees clockwise, etc.
    """
    reader = PdfReader(input_pdf)
    angles = []
    for page_number in range(len(reader.pages)):
        current_page = reader.pages[page_number]
        rotation_angle = determine_rotation_angle(current_page)
        angles.append(rotation_angle)
    return angles

def determine_rotation_angle(page: 'PdfReader.PageObject') -> int:
    """
    Determine the rotation angle needed to make the page upright.

    Args:
    page (PdfReader.PageObject): A single page from a PDF.

    Returns:
    int: The rotation angle in degrees (e.g. 0, 90, 210).
         The rotation angle is normalized to be in the range [0, 359].
         0 means the page is already upright, 90 means 90 degrees clockwise, etc.
    """
    # Extract the text content of the page
    text = extract_text_from_page(page)
    if not text.strip():
        return 0  # If no text is found, assume the page is upright
    
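    # Use Tesseract OSD on the rendered page image (not on the extracted text);
    # Output.DICT returns a dict that exposes the 'rotate' field
    page_image = convert_pdf_page_to_image(page)
    orientation = pytesseract.image_to_osd(page_image, output_type=pytesseract.Output.DICT)['rotate']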
    
    # Normalize the orientation to be in the range [0, 359]
    rotation_angle = (360 - orientation) % 360
    return rotation_angle

def extract_text_from_page(page: 'PdfReader.PageObject') -> str:
    """
    Extract text from a PDF page using OCR.

    Args:
    page (PdfReader.PageObject): A single page from a PDF.

    Returns:
    str: The extracted text content.
    """
    # Convert the PDF page to an image
    page_image = convert_pdf_page_to_image(page)
    
    # Use Tesseract to extract text from the image
    text = pytesseract.image_to_string(page_image)
    return text

def convert_pdf_page_to_image(page: 'PdfReader.PageObject') -> 'Image':
    """
    Convert a PDF page to an image.

    Args:
    page (PdfReader.PageObject): A single page from a PDF.

    Returns:
    Image: The converted image.
    """
    pdf_writer = PdfWriter()
    pdf_writer.add_page(page)
    
    # Save the single-page PDF to a byte buffer
    pdf_bytes = io.BytesIO()
    pdf_writer.write(pdf_bytes)
    pdf_bytes.seek(0)
    
    # Convert the single-page PDF to an image using PIL
    image = Image.open(pdf_bytes)
    return image

# Usage
input_pdf = "grouped_documents.pdf"
rotation_angles = rotate_all_pages_upright(input_pdf)
print(f"Rotation angles for each page: {rotation_angles}")

UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x000001F273579AD0>
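
The error above is expected: Pillow's `Image.open` decodes raster image formats and does not render PDF, so the buffer written by `PdfWriter` still holds PDF data. A small sketch (the helper name `looks_like_pdf` is hypothetical) that confirms what a buffer contains by checking for the `%PDF-` header that PDF files begin with:

```python
import io


def looks_like_pdf(buffer: io.BytesIO) -> bool:
    """Return True if the buffer starts with the PDF magic bytes."""
    position = buffer.tell()
    buffer.seek(0)
    header = buffer.read(5)
    buffer.seek(position)  # restore the caller's read position
    return header == b"%PDF-"


# Stand-ins for real file contents (assumption for the demo)
pdf_bytes = io.BytesIO(b"%PDF-1.3\n...")
png_bytes = io.BytesIO(b"\x89PNG\r\n\x1a\n")
print(looks_like_pdf(pdf_bytes))  # True
print(looks_like_pdf(png_bytes))  # False
```

A buffer like this has to go through a PDF rasterizer (for example pdf2image, which wraps Poppler) before it can be handed to Pillow or Tesseract.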

In [11]:
from typing import List
from PyPDF2 import PdfReader, PdfWriter
from pdf2image import convert_from_path
import pytesseract
import tempfile

def rotate_all_pages_upright(input_pdf: str) -> List[int]:
    """
    Analyze all pages in the input PDF and determine the rotation angle needed for each page.

    Args:
    input_pdf (str): The file path of the input PDF.

    Returns:
    List[int]: A list of rotation angles (in degrees) for each page. 
               The angles are normalized to be in the range [0, 359].
               0 means no rotation needed, 90 means 90 degrees clockwise, etc.
    """
    reader = PdfReader(input_pdf)
    angles = []
    for page_number in range(len(reader.pages)):
        current_page = reader.pages[page_number]
        rotation_angle = determine_rotation_angle(current_page)
        angles.append(rotation_angle)
    return angles

def determine_rotation_angle(page: 'PdfReader.PageObject') -> int:
    """
    Determine the rotation angle needed to make the page upright.

    Args:
    page (PdfReader.PageObject): A single page from a PDF.

    Returns:
    int: The rotation angle in degrees (e.g. 0, 90, 210).
         The rotation angle is normalized to be in the range [0, 359].
         0 means the page is already upright, 90 means 90 degrees clockwise, etc.
    """
    # Extract the image of the page
    page_image = convert_pdf_page_to_image(page)
    
    # Use Tesseract to determine the orientation
    try:
        osd = pytesseract.image_to_osd(page_image)
        rotation_angle = int(osd.split("Rotate:")[1].split("\n")[0].strip())
    except Exception:
        rotation_angle = 0
    
    # Normalize the orientation to be in the range [0, 359]
    rotation_angle = (360 - rotation_angle) % 360
    return rotation_angle

def convert_pdf_page_to_image(page: 'PdfReader.PageObject') -> 'Image':
    """
    Convert a PDF page to an image.

    Args:
    page (PdfReader.PageObject): A single page from a PDF.

    Returns:
    Image: The converted image.
    """
    pdf_writer = PdfWriter()
    pdf_writer.add_page(page)
    
    # Save the single-page PDF to a temporary file
    with tempfile.NamedTemporaryFile(suffix=".pdf") as temp_pdf:
        pdf_writer.write(temp_pdf)
        temp_pdf.flush()
        
        # Convert the single-page PDF to an image using pdf2image
        images = convert_from_path(temp_pdf.name)
    
    return images[0]  # There should be only one page, hence one image

# Usage
input_pdf = "grouped_documents.pdf"
rotation_angles = rotate_all_pages_upright(input_pdf)
print(f"Rotation angles for each page: {rotation_angles}")


C:\Users\arues\AppData\Local\Temp\tmp7x32dra1.pdf


PDFPageCountError: Unable to get page count.
I/O Error: Couldn't open file 'C:\Users\arues\AppData\Local\Temp\tmp7x32dra1.pdf': No error.
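
On Windows, a file created by `tempfile.NamedTemporaryFile` cannot be opened a second time by name while the original handle is still open, which matches the "Couldn't open file" message above. One way around it, sketched here with hypothetical helper names, is to write the file inside a temporary directory and close it before passing the path on:

```python
import os
import tempfile


def with_closed_temp_file(data: bytes, suffix: str, consumer):
    """Write data to a temp file, close it, then pass its path to consumer."""
    with tempfile.TemporaryDirectory() as temp_dir:
        path = os.path.join(temp_dir, "page" + suffix)
        with open(path, "wb") as handle:
            handle.write(data)
        # The handle is closed here, so another process may open the path.
        # The directory is removed only after consumer has returned.
        return consumer(path)


def file_size(path: str) -> int:
    """Example consumer: report how many bytes were written."""
    return os.path.getsize(path)


print(with_closed_temp_file(b"%PDF-1.3\n", ".pdf", file_size))  # 9
```

In the notebook's setting the consumer would be a call such as `convert_from_path`; the directory and file are cleaned up automatically once it returns.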


In [16]:
from typing import List
from PyPDF2 import PdfReader, PdfWriter
from pdf2image import convert_from_path
from IPython.display import display
import pytesseract
import tempfile

def rotate_all_pages_upright(input_pdf: str) -> List[int]:
    """
    Analyze all pages in the input PDF and determine the rotation angle needed for each page.

    Args:
    input_pdf (str): The file path of the input PDF.

    Returns:
    List[int]: A list of rotation angles (in degrees) for each page. 
               The angles are normalized to be in the range [0, 359].
               0 means no rotation needed, 90 means 90 degrees clockwise, etc.
    """
    reader = PdfReader(input_pdf)
    angles = []
    for page_number in range(len(reader.pages)):
        current_page = reader.pages[page_number]
        rotation_angle = determine_rotation_angle(current_page)
        angles.append(rotation_angle)
    return angles

def determine_rotation_angle(page: 'PdfReader.PageObject') -> int:
    """
    Determine the rotation angle needed to make the page upright.

    Args:
    page (PdfReader.PageObject): A single page from a PDF.

    Returns:
    int: The rotation angle in degrees (e.g. 0, 90, 210).
         The rotation angle is normalized to be in the range [0, 359].
         0 means the page is already upright, 90 means 90 degrees clockwise, etc.
    """
    # Extract the image of the page
    page_image = convert_pdf_page_to_image(page)
    
    #display(page_image)
    
    # Use Tesseract to determine the orientation
    try:
        osd = pytesseract.image_to_osd(page_image)
        print(osd)
        rotation_angle = int(osd.split("Rotate:")[1].split("\n")[0].strip())
    except Exception:
        rotation_angle = 0
    
    # Normalize the orientation to be in the range [0, 359]
    rotation_angle = (360 - rotation_angle) % 360
    return rotation_angle

def convert_pdf_page_to_image(page: 'PdfReader.PageObject') -> 'Image':
    """
    Convert a PDF page to an image.

    Args:
    page (PdfReader.PageObject): A single page from a PDF.

    Returns:
    Image: The converted image.
    """
    pdf_writer = PdfWriter()
    pdf_writer.add_page(page)
    
    # Save the single-page PDF to a temporary file
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as temp_pdf:
        pdf_writer.write(temp_pdf)
        temp_pdf_path = temp_pdf.name
    
    # Convert the single-page PDF to an image using pdf2image
    images = convert_from_path(temp_pdf_path)
    
    # Clean up the temporary file
    import os
    os.remove(temp_pdf_path)
    
    return images[0]  # There should be only one page, hence one image

# Usage
input_pdf = "grouped_documents.pdf"
rotation_angles = rotate_all_pages_upright(input_pdf)
print(f"Rotation angles for each page: {rotation_angles}")


Rotation angles for each page: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


In [1]:
from typing import List
from PyPDF2 import PdfReader, PdfWriter
from pdf2image import convert_from_path
import pytesseract
import tempfile
import os
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = r'"C:\\Program Files\\Tesseract-OCR\\tesseract.exe"'

def rotate_all_pages_upright(input_pdf: str) -> List[int]:
    """
    Analyze all pages in the input PDF and determine the rotation angle needed for each page.

    Args:
    input_pdf (str): The file path of the input PDF.

    Returns:
    List[int]: A list of rotation angles (in degrees) for each page. 
               The angles are normalized to be in the range [0, 359].
               0 means no rotation needed, 90 means 90 degrees clockwise, etc.
    """
    reader = PdfReader(input_pdf)
    angles = []
    for page_number in range(len(reader.pages)):
        current_page = reader.pages[page_number]
        rotation_angle = determine_rotation_angle(current_page)
        angles.append(rotation_angle)
    return angles

def determine_rotation_angle(page: 'PdfReader.PageObject') -> int:
    """
    Determine the rotation angle needed to make the page upright.

    Args:
    page (PdfReader.PageObject): A single page from a PDF.

    Returns:
    int: The rotation angle in degrees (e.g. 0, 90, 210).
         The rotation angle is normalized to be in the range [0, 359].
         0 means the page is already upright, 90 means 90 degrees clockwise, etc.
    """
    # Extract the image of the page
    page_image = convert_pdf_page_to_image(page)
    
    # Use Tesseract to determine the orientation
    try:
        osd = pytesseract.image_to_osd(page_image)
        print(f"Tesseract OSD output: {osd}")  # Debug print
        rotation_angle = int(osd.split("Rotate:")[1].split("\n")[0].strip())
    except Exception as e:
        print(f"Error in OSD: {e}")  # Debug print
        rotation_angle = 0
    
    # Normalize the orientation to be in the range [0, 359]
    rotation_angle = (360 - rotation_angle) % 360
    return rotation_angle

def convert_pdf_page_to_image(page: 'PdfReader.PageObject') -> 'Image':
    """
    Convert a PDF page to an image.

    Args:
    page (PdfReader.PageObject): A single page from a PDF.

    Returns:
    Image: The converted image.
    """
    pdf_writer = PdfWriter()
    pdf_writer.add_page(page)
    
    # Save the single-page PDF to a temporary file
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as temp_pdf:
        pdf_writer.write(temp_pdf)
        temp_pdf_path = temp_pdf.name
    
    # Convert the single-page PDF to an image using pdf2image
    images = convert_from_path(temp_pdf_path)
    
    # Clean up the temporary file
    os.remove(temp_pdf_path)
    
    return images[0]  # There should be only one page, hence one image

# Usage
output_pdf = "grouped_documents.pdf"
#create_rotated_text_pdf(output_pdf)
rotation_angles = rotate_all_pages_upright(output_pdf)
print(f"Rotation angles for each page: {rotation_angles}")


Error in OSD: [WinError 5] Access is denied
Error in OSD: [WinError 5] Access is denied
Error in OSD: [WinError 5] Access is denied
Error in OSD: [WinError 5] Access is denied
Error in OSD: [WinError 5] Access is denied
Error in OSD: [WinError 5] Access is denied
Error in OSD: [WinError 5] Access is denied
Error in OSD: [WinError 5] Access is denied
Error in OSD: [WinError 5] Access is denied
Error in OSD: [WinError 5] Access is denied
Error in OSD: [WinError 5] Access is denied
Error in OSD: [WinError 5] Access is denied
Error in OSD: [WinError 5] Access is denied
Error in OSD: [WinError 5] Access is denied
Error in OSD: [WinError 5] Access is denied
Error in OSD: [WinError 5] Access is denied
Error in OSD: [WinError 5] Access is denied
Error in OSD: [WinError 5] Access is denied
Rotation angles for each page: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
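
Every page failed with `[WinError 5] Access is denied`, which suggests the configured executable path is not usable as written. A likely culprit (an assumption, not confirmed by the log) is the pair of literal double quotes embedded in the raw string. A sketch for sanity-checking a path before assigning it to `pytesseract.pytesseract.tesseract_cmd`, using hypothetical helper names:

```python
import os


def clean_executable_path(raw_path: str) -> str:
    """Strip whitespace and surrounding quote characters from a path string."""
    return raw_path.strip().strip('"').strip("'")


def is_usable_executable(raw_path: str) -> bool:
    """True if the cleaned path points at an existing regular file."""
    return os.path.isfile(clean_executable_path(raw_path))


quoted = r'"C:\Program Files\Tesseract-OCR\tesseract.exe"'
print(clean_executable_path(quoted))
print(is_usable_executable(quoted))  # depends on the local installation
```

Checking the path up front turns a per-page OSD failure into a single, clearer configuration error.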


In [3]:
from typing import List
from PyPDF2 import PdfReader, PdfWriter
from pdf2image import convert_from_path
import pytesseract
import tempfile
import os
from PIL import Image

# Specify the Tesseract executable path
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'  # Update this path if necessary

def rotate_all_pages_upright(input_pdf: str) -> List[int]:
    """
    Analyze all pages in the input PDF and determine the rotation angle needed for each page.

    Args:
    input_pdf (str): The file path of the input PDF.

    Returns:
    List[int]: A list of rotation angles (in degrees) for each page. 
               The angles are normalized to be in the range [0, 359].
               0 means no rotation needed, 90 means 90 degrees clockwise, etc.
    """
    reader = PdfReader(input_pdf)
    angles = []
    for page_number in range(len(reader.pages)):
        current_page = reader.pages[page_number]
        rotation_angle = determine_rotation_angle(current_page)
        angles.append(rotation_angle)
    return angles

def determine_rotation_angle(page: 'PdfReader.PageObject') -> int:
    """
    Determine the rotation angle needed to make the page upright.

    Args:
    page (PdfReader.PageObject): A single page from a PDF.

    Returns:
    int: The rotation angle in degrees (e.g. 0, 90, 210).
         The rotation angle is normalized to be in the range [0, 359].
         0 means the page is already upright, 90 means 90 degrees clockwise, etc.
    """
    # Extract the image of the page
    page_image = convert_pdf_page_to_image(page)
    
    # Use Tesseract to determine the orientation
    try:
        osd = pytesseract.image_to_osd(page_image)
        print(f"Tesseract OSD output: {osd}")  # Debug print
        rotation_angle = int(osd.split("Rotate:")[1].split("\n")[0].strip())
    except Exception as e:
        print(f"Error in OSD: {e}")  # Debug print
        rotation_angle = 0
    
    # Normalize the orientation to be in the range [0, 359]
    rotation_angle = (360 - rotation_angle) % 360
    return rotation_angle

def convert_pdf_page_to_image(page: 'PdfReader.PageObject') -> 'Image':
    """
    Convert a PDF page to an image.

    Args:
    page (PdfReader.PageObject): A single page from a PDF.

    Returns:
    Image: The converted image.
    """
    pdf_writer = PdfWriter()
    pdf_writer.add_page(page)
    
    # Save the single-page PDF to a temporary file
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as temp_pdf:
        pdf_writer.write(temp_pdf)
        temp_pdf_path = temp_pdf.name
    
    # Convert the single-page PDF to an image using pdf2image
    images = convert_from_path(temp_pdf_path)
    
    # Clean up the temporary file
    os.remove(temp_pdf_path)
    
    return images[0]  # There should be only one page, hence one image

# Usage
output_pdf = "grouped_documents.pdf"
#create_rotated_text_pdf(output_pdf)
rotation_angles = rotate_all_pages_upright(output_pdf)
print(f"Rotation angles for each page: {rotation_angles}")


Rotation angles for each page: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


In [6]:
from typing import List
from PyPDF2 import PdfReader, PdfWriter
from pdf2image import convert_from_path
import pytesseract
import tempfile
import os
from PIL import Image

# Specify the Tesseract executable path
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'  # Update this path if necessary

def rotate_all_pages_upright(input_pdf: str) -> List[int]:
    """
    Analyze all pages in the input PDF and determine the rotation angle needed for each page.

    Args:
    input_pdf (str): The file path of the input PDF.

    Returns:
    List[int]: A list of rotation angles (in degrees) for each page. 
               The angles are normalized to be in the range [0, 359].
               0 means no rotation needed, 90 means 90 degrees clockwise, etc.
    """
    reader = PdfReader(input_pdf)
    angles = []
    for page_number in range(len(reader.pages)):
        current_page = reader.pages[page_number]
        rotation_angle = determine_rotation_angle(current_page)
        angles.append(rotation_angle)
    return angles

def determine_rotation_angle(page: 'PdfReader.PageObject') -> int:
    """
    Determine the rotation angle needed to make the page upright.

    Args:
    page (PdfReader.PageObject): A single page from a PDF.

    Returns:
    int: The rotation angle in degrees (e.g. 0, 90, 210).
         The rotation angle is normalized to be in the range [0, 359].
         0 means the page is already upright, 90 means 90 degrees clockwise, etc.
    """
    # Extract the image of the page
    page_image = convert_pdf_page_to_image(page)
    
    # Use Tesseract to determine the orientation
    try:
        osd = pytesseract.image_to_osd(page_image)
        print(f"Tesseract OSD output: {osd}")  # Debug print
        rotation_angle = int(osd.split("Rotate:")[1].split("\n")[0].strip())
    except Exception as e:
        print(f"Error in OSD: {e}")  # Debug print
        rotation_angle = 0
    
    # Normalize the orientation to be in the range [0, 359]
    rotation_angle = (360 - rotation_angle) % 360
    return rotation_angle

def convert_pdf_page_to_image(page: 'PdfReader.PageObject') -> 'Image':
    """
    Convert a PDF page to an image.

    Args:
    page (PdfReader.PageObject): A single page from a PDF.

    Returns:
    Image: The converted image.
    """
    pdf_writer = PdfWriter()
    pdf_writer.add_page(page)
    
    # Save the single-page PDF to a temporary file
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as temp_pdf:
        pdf_writer.write(temp_pdf)
        temp_pdf_path = temp_pdf.name
    
    # Convert the single-page PDF to an image using pdf2image with higher resolution
    images = convert_from_path(temp_pdf_path, dpi=300)
    
    # Clean up the temporary file
    os.remove(temp_pdf_path)
    
    return images[0]  # There should be only one page, hence one image

# Usage
output_pdf = "grouped_documents.pdf"
rotation_angles = rotate_all_pages_upright(output_pdf)
print(f"Rotation angles for each page: {rotation_angles}")


Tesseract OSD output: Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 1.11
Script: Latin
Script confidence: 3.33

Tesseract OSD output: Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 0.53
Script: Latin
Script confidence: 6.67

Tesseract OSD output: Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 0.75
Script: Latin
Script confidence: 16.67

Tesseract OSD output: Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 0.18
Script: Latin
Script confidence: 6.67

Tesseract OSD output: Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 0.60
Script: Latin
Script confidence: 6.67

Rotation angles for each page: [0, 0, 0, 0, 0, 0, 90, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
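
The orientation confidences printed above are low (between 0.18 and 1.11), so the single 90-degree result may not be reliable. A sketch, assuming the plain-text OSD format shown in this output, that parses both fields and falls back to 0 below a threshold. The function names and the threshold value are hypothetical and would need tuning on real documents:

```python
import re


def parse_osd(osd_text: str) -> dict:
    """Extract the 'Rotate' and 'Orientation confidence' fields from OSD text."""
    rotate = re.search(r"Rotate:\s*(\d+)", osd_text)
    confidence = re.search(r"Orientation confidence:\s*([\d.]+)", osd_text)
    return {
        "rotate": int(rotate.group(1)) if rotate else None,
        "confidence": float(confidence.group(1)) if confidence else None,
    }


def rotation_from_osd(osd_text: str, min_confidence: float = 1.0) -> int:
    """Apply the cell's (360 - Rotate) % 360 mapping, or return 0 when unsure."""
    fields = parse_osd(osd_text)
    if fields["rotate"] is None or fields["confidence"] is None:
        return 0
    if fields["confidence"] < min_confidence:
        return 0  # hypothetical threshold: treat weak detections as upright
    return (360 - fields["rotate"]) % 360


sample = """Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 0.75
Script: Latin
Script confidence: 16.67
"""
print(parse_osd(sample))                              # {'rotate': 270, 'confidence': 0.75}
print(rotation_from_osd(sample, min_confidence=0.5))  # 90
print(rotation_from_osd(sample, min_confidence=1.0))  # 0
```

Parsing with a regular expression rather than chained `split` calls also avoids an `IndexError` when a field is missing from the output.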


In [3]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms, models
from PIL import Image
from pdf2image import convert_from_path
import os
import tempfile
from PyPDF2 import PdfReader, PdfWriter
from typing import List

# Define the CNN model for angle regression
class AngleRegressionModel(nn.Module):
    def __init__(self):
        super(AngleRegressionModel, self).__init__()
        self.model = models.resnet18(pretrained=True)
        self.model.fc = nn.Linear(self.model.fc.in_features, 1)  # Regression output

    def forward(self, x):
        return self.model(x)

# Function to load the pre-trained model
def load_angle_regression_model(model_path: str):
    model = AngleRegressionModel()
    model.load_state_dict(torch.load(model_path, map_location=torch.device('cpu')))
    model.eval()
    return model

# Function to predict the rotation angle of an image
def predict_rotation_angle(model, image: 'Image') -> float:
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    
    image_tensor = preprocess(image).unsqueeze(0)  # Add batch dimension
    with torch.no_grad():
        outputs = model(image_tensor)
    angle = outputs.item()
    return angle % 360

# Function to convert a PDF page to an image
def convert_pdf_page_to_image(page: 'PdfReader.PageObject') -> 'Image':
    pdf_writer = PdfWriter()
    pdf_writer.add_page(page)
    
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as temp_pdf:
        pdf_writer.write(temp_pdf)
        temp_pdf_path = temp_pdf.name
    
    images = convert_from_path(temp_pdf_path, dpi=300)
    os.remove(temp_pdf_path)
    
    return images[0]

# Function to determine the exact rotation angle
def determine_rotation_angle(model, page: 'PdfReader.PageObject') -> float:
    page_image = convert_pdf_page_to_image(page)
    rotation_angle = predict_rotation_angle(model, page_image)
    return rotation_angle

# Main function to rotate all pages upright
def rotate_all_pages_upright(input_pdf: str, model_path: str) -> List[float]:
    model = load_angle_regression_model(model_path)
    reader = PdfReader(input_pdf)
    angles = []
    for page_number in range(len(reader.pages)):
        current_page = reader.pages[page_number]
        rotation_angle = determine_rotation_angle(model, current_page)
        angles.append(rotation_angle)
    return angles

# Usage
output_pdf = "grouped_documents.pdf"
model_path = "angle_regression_model.pth"  # Path to your pre-trained model
rotation_angles = rotate_all_pages_upright(output_pdf, model_path)
print(f"Rotation angles for each page: {rotation_angles}")


RuntimeError: Error(s) in loading state_dict for AngleRegressionModel:
	Missing key(s) in state_dict: "model.fc.weight", "model.fc.bias". 
	Unexpected key(s) in state_dict: "model.layer1.2.conv1.weight", "model.layer1.2.bn1.weight", "model.layer1.2.bn1.bias", "model.layer1.2.bn1.running_mean", "model.layer1.2.bn1.running_var", "model.layer1.2.bn1.num_batches_tracked", "model.layer1.2.conv2.weight", "model.layer1.2.bn2.weight", "model.layer1.2.bn2.bias", "model.layer1.2.bn2.running_mean", "model.layer1.2.bn2.running_var", "model.layer1.2.bn2.num_batches_tracked", "model.layer1.2.conv3.weight", "model.layer1.2.bn3.weight", "model.layer1.2.bn3.bias", "model.layer1.2.bn3.running_mean", "model.layer1.2.bn3.running_var", "model.layer1.2.bn3.num_batches_tracked", "model.layer1.0.conv3.weight", "model.layer1.0.bn3.weight", "model.layer1.0.bn3.bias", "model.layer1.0.bn3.running_mean", "model.layer1.0.bn3.running_var", "model.layer1.0.bn3.num_batches_tracked", "model.layer1.0.downsample.0.weight", "model.layer1.0.downsample.1.weight", "model.layer1.0.downsample.1.bias", "model.layer1.0.downsample.1.running_mean", "model.layer1.0.downsample.1.running_var", "model.layer1.0.downsample.1.num_batches_tracked", "model.layer1.1.conv3.weight", "model.layer1.1.bn3.weight", "model.layer1.1.bn3.bias", "model.layer1.1.bn3.running_mean", "model.layer1.1.bn3.running_var", "model.layer1.1.bn3.num_batches_tracked", "model.layer2.2.conv1.weight", "model.layer2.2.bn1.weight", "model.layer2.2.bn1.bias", "model.layer2.2.bn1.running_mean", "model.layer2.2.bn1.running_var", "model.layer2.2.bn1.num_batches_tracked", "model.layer2.2.conv2.weight", "model.layer2.2.bn2.weight", "model.layer2.2.bn2.bias", "model.layer2.2.bn2.running_mean", "model.layer2.2.bn2.running_var", "model.layer2.2.bn2.num_batches_tracked", "model.layer2.2.conv3.weight", "model.layer2.2.bn3.weight", "model.layer2.2.bn3.bias", "model.layer2.2.bn3.running_mean", "model.layer2.2.bn3.running_var", "model.layer2.2.bn3.num_batches_tracked", "model.layer2.3.conv1.weight", "model.layer2.3.bn1.weight", "model.layer2.3.bn1.bias", "model.layer2.3.bn1.running_mean", 
"model.layer2.3.bn1.running_var", "model.layer2.3.bn1.num_batches_tracked", "model.layer2.3.conv2.weight", "model.layer2.3.bn2.weight", "model.layer2.3.bn2.bias", "model.layer2.3.bn2.running_mean", "model.layer2.3.bn2.running_var", "model.layer2.3.bn2.num_batches_tracked", "model.layer2.3.conv3.weight", "model.layer2.3.bn3.weight", "model.layer2.3.bn3.bias", "model.layer2.3.bn3.running_mean", "model.layer2.3.bn3.running_var", "model.layer2.3.bn3.num_batches_tracked", "model.layer2.0.conv3.weight", "model.layer2.0.bn3.weight", "model.layer2.0.bn3.bias", "model.layer2.0.bn3.running_mean", "model.layer2.0.bn3.running_var", "model.layer2.0.bn3.num_batches_tracked", "model.layer2.1.conv3.weight", "model.layer2.1.bn3.weight", "model.layer2.1.bn3.bias", "model.layer2.1.bn3.running_mean", "model.layer2.1.bn3.running_var", "model.layer2.1.bn3.num_batches_tracked", "model.layer3.2.conv1.weight", "model.layer3.2.bn1.weight", "model.layer3.2.bn1.bias", "model.layer3.2.bn1.running_mean", "model.layer3.2.bn1.running_var", "model.layer3.2.bn1.num_batches_tracked", "model.layer3.2.conv2.weight", "model.layer3.2.bn2.weight", "model.layer3.2.bn2.bias", "model.layer3.2.bn2.running_mean", "model.layer3.2.bn2.running_var", "model.layer3.2.bn2.num_batches_tracked", "model.layer3.2.conv3.weight", "model.layer3.2.bn3.weight", "model.layer3.2.bn3.bias", "model.layer3.2.bn3.running_mean", "model.layer3.2.bn3.running_var", "model.layer3.2.bn3.num_batches_tracked", "model.layer3.3.conv1.weight", "model.layer3.3.bn1.weight", "model.layer3.3.bn1.bias", "model.layer3.3.bn1.running_mean", "model.layer3.3.bn1.running_var", "model.layer3.3.bn1.num_batches_tracked", "model.layer3.3.conv2.weight", "model.layer3.3.bn2.weight", "model.layer3.3.bn2.bias", "model.layer3.3.bn2.running_mean", "model.layer3.3.bn2.running_var", "model.layer3.3.bn2.num_batches_tracked", "model.layer3.3.conv3.weight", "model.layer3.3.bn3.weight", "model.layer3.3.bn3.bias", "model.layer3.3.bn3.running_mean", 
"model.layer3.3.bn3.running_var", "model.layer3.3.bn3.num_batches_tracked", "model.layer3.4.conv1.weight", "model.layer3.4.bn1.weight", "model.layer3.4.bn1.bias", "model.layer3.4.bn1.running_mean", "model.layer3.4.bn1.running_var", "model.layer3.4.bn1.num_batches_tracked", "model.layer3.4.conv2.weight", "model.layer3.4.bn2.weight", "model.layer3.4.bn2.bias", "model.layer3.4.bn2.running_mean", "model.layer3.4.bn2.running_var", "model.layer3.4.bn2.num_batches_tracked", "model.layer3.4.conv3.weight", "model.layer3.4.bn3.weight", "model.layer3.4.bn3.bias", "model.layer3.4.bn3.running_mean", "model.layer3.4.bn3.running_var", "model.layer3.4.bn3.num_batches_tracked", "model.layer3.5.conv1.weight", "model.layer3.5.bn1.weight", "model.layer3.5.bn1.bias", "model.layer3.5.bn1.running_mean", "model.layer3.5.bn1.running_var", "model.layer3.5.bn1.num_batches_tracked", "model.layer3.5.conv2.weight", "model.layer3.5.bn2.weight", "model.layer3.5.bn2.bias", "model.layer3.5.bn2.running_mean", "model.layer3.5.bn2.running_var", "model.layer3.5.bn2.num_batches_tracked", "model.layer3.5.conv3.weight", "model.layer3.5.bn3.weight", "model.layer3.5.bn3.bias", "model.layer3.5.bn3.running_mean", "model.layer3.5.bn3.running_var", "model.layer3.5.bn3.num_batches_tracked", "model.layer3.0.conv3.weight", "model.layer3.0.bn3.weight", "model.layer3.0.bn3.bias", "model.layer3.0.bn3.running_mean", "model.layer3.0.bn3.running_var", "model.layer3.0.bn3.num_batches_tracked", "model.layer3.1.conv3.weight", "model.layer3.1.bn3.weight", "model.layer3.1.bn3.bias", "model.layer3.1.bn3.running_mean", "model.layer3.1.bn3.running_var", "model.layer3.1.bn3.num_batches_tracked", "model.layer4.2.conv1.weight", "model.layer4.2.bn1.weight", "model.layer4.2.bn1.bias", "model.layer4.2.bn1.running_mean", "model.layer4.2.bn1.running_var", "model.layer4.2.bn1.num_batches_tracked", "model.layer4.2.conv2.weight", "model.layer4.2.bn2.weight", "model.layer4.2.bn2.bias", "model.layer4.2.bn2.running_mean", 
"model.layer4.2.bn2.running_var", "model.layer4.2.bn2.num_batches_tracked", "model.layer4.2.conv3.weight", "model.layer4.2.bn3.weight", "model.layer4.2.bn3.bias", "model.layer4.2.bn3.running_mean", "model.layer4.2.bn3.running_var", "model.layer4.2.bn3.num_batches_tracked", "model.layer4.0.conv3.weight", "model.layer4.0.bn3.weight", "model.layer4.0.bn3.bias", "model.layer4.0.bn3.running_mean", "model.layer4.0.bn3.running_var", "model.layer4.0.bn3.num_batches_tracked", "model.layer4.1.conv3.weight", "model.layer4.1.bn3.weight", "model.layer4.1.bn3.bias", "model.layer4.1.bn3.running_mean", "model.layer4.1.bn3.running_var", "model.layer4.1.bn3.num_batches_tracked", "model.fc.1.weight", "model.fc.1.bias". 
	size mismatch for model.layer1.0.conv1.weight: copying a param with shape torch.Size([64, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
	size mismatch for model.layer1.1.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
	size mismatch for model.layer2.0.conv1.weight: copying a param with shape torch.Size([128, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 3, 3]).
	size mismatch for model.layer2.0.downsample.0.weight: copying a param with shape torch.Size([512, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 64, 1, 1]).
	size mismatch for model.layer2.0.downsample.1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
	size mismatch for model.layer2.0.downsample.1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
	size mismatch for model.layer2.0.downsample.1.running_mean: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
	size mismatch for model.layer2.0.downsample.1.running_var: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([128]).
	size mismatch for model.layer2.1.conv1.weight: copying a param with shape torch.Size([128, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([128, 128, 3, 3]).
	size mismatch for model.layer3.0.conv1.weight: copying a param with shape torch.Size([256, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 3, 3]).
	size mismatch for model.layer3.0.downsample.0.weight: copying a param with shape torch.Size([1024, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1]).
	size mismatch for model.layer3.0.downsample.1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for model.layer3.0.downsample.1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for model.layer3.0.downsample.1.running_mean: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for model.layer3.0.downsample.1.running_var: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for model.layer3.1.conv1.weight: copying a param with shape torch.Size([256, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 256, 3, 3]).
	size mismatch for model.layer4.0.conv1.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 3, 3]).
	size mismatch for model.layer4.0.downsample.0.weight: copying a param with shape torch.Size([2048, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 1, 1]).
	size mismatch for model.layer4.0.downsample.1.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for model.layer4.0.downsample.1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for model.layer4.0.downsample.1.running_mean: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for model.layer4.0.downsample.1.running_var: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for model.layer4.1.conv1.weight: copying a param with shape torch.Size([512, 2048, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 512, 3, 3]).
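
The mismatches above appear consistent with a checkpoint saved from a bottleneck-style ResNet (1x1 `conv1` kernels, `conv3`/`bn3` entries) being loaded into a basic-block ResNet (3x3 `conv1` kernels). A minimal sketch of how such a mismatch could be inspected before calling `load_state_dict` follows. The helper `diff_shapes` is hypothetical and works on plain name-to-shape dictionaries; with PyTorch these could be built as `{k: tuple(v.shape) for k, v in model.state_dict().items()}`. Two of the toy shapes are copied from the log, and the `model.fc.1.bias` shape is an illustrative assumption.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_shapes(
    model_shapes: Dict[str, Shape], ckpt_shapes: Dict[str, Shape]
) -> Tuple[List[str], List[str], List[Tuple[str, Shape, Shape]]]:
    """Compare parameter names and shapes of a model and a checkpoint.

    Returns (missing, unexpected, mismatched), where each mismatched entry
    is (name, checkpoint_shape, model_shape).
    """
    # Keys the model expects but the checkpoint does not provide
    missing = sorted(k for k in model_shapes if k not in ckpt_shapes)
    # Keys the checkpoint provides but the model has no slot for
    unexpected = sorted(k for k in ckpt_shapes if k not in model_shapes)
    # Keys present on both sides whose shapes disagree
    mismatched = sorted(
        (k, ckpt_shapes[k], model_shapes[k])
        for k in model_shapes
        if k in ckpt_shapes and ckpt_shapes[k] != model_shapes[k]
    )
    return missing, unexpected, mismatched


# Toy inputs: the first two shapes are taken from the log above,
# the fc entry is an illustrative assumption.
model_shapes = {
    "model.layer1.0.conv1.weight": (64, 64, 3, 3),
    "model.layer2.0.downsample.1.weight": (128,),
}
ckpt_shapes = {
    "model.layer1.0.conv1.weight": (64, 64, 1, 1),
    "model.layer2.0.downsample.1.weight": (512,),
    "model.fc.1.bias": (2,),
}

missing, unexpected, mismatched = diff_shapes(model_shapes, ckpt_shapes)
for name, ckpt_shape, model_shape in mismatched:
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

If the report shows systematic differences like these, the likely remedy is to instantiate the same architecture that produced the checkpoint rather than to load with `strict=False`.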

In [11]:
import torch
import torch.nn as nn
from torchvision import transforms, models
from PIL import Image
from pdf2image import convert_from_path
import os
import tempfile
from PyPDF2 import PdfReader, PdfWriter
from typing import List

# Define the CNN model for angle regression
class AngleRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        # No ImageNet download needed here: load_state_dict() overwrites every weight
        self.model = models.resnet50(weights=None)
        self.model.fc = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(self.model.fc.in_features, 2)  # Predicting sine and cosine of the angle
        )

    def forward(self, x):
        return self.model(x)

# Function to load the pre-trained model
def load_angle_regression_model(model_path: str):
    model = AngleRegressionModel()
    model.load_state_dict(torch.load(model_path, map_location=torch.device('cpu')))
    model.eval()
    return model

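# Function to convert a 2-element prediction to an angle in degrees within [0, 360)
def vector_to_angle(vector):
    # atan2(y, x): vector[1] is treated as the sine and vector[0] as the cosine
    angle_rad = torch.atan2(vector[1], vector[0])
    angle_deg = torch.rad2deg(angle_rad)
    # atan2 yields values in (-180, 180]; shift negative angles into [0, 360)
    return angle_deg if angle_deg >= 0 else angle_deg + 360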

# Function to predict the rotation angle of an image
def predict_rotation_angle(model, image: Image.Image) -> float:
    # Standard ImageNet preprocessing for a ResNet backbone
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    # Force 3 channels so Normalize always receives an RGB tensor, then add a batch dimension
    image_tensor = preprocess(image.convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        angle = vector_to_angle(model(image_tensor)[0])
    return angle.item() % 360

# Function to convert a PDF page to an image
def convert_pdf_page_to_image(page: 'PdfReader.PageObject') -> Image.Image:
    # Write the single page to its own temporary PDF so pdf2image can rasterize it
    pdf_writer = PdfWriter()
    pdf_writer.add_page(page)

    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as temp_pdf:
        pdf_writer.write(temp_pdf)
        temp_pdf_path = temp_pdf.name

    try:
        images = convert_from_path(temp_pdf_path, dpi=300)
    finally:
        # Remove the temporary file even if rasterization fails
        os.remove(temp_pdf_path)

    return images[0]

# Function to determine the exact rotation angle
def determine_rotation_angle(model, page: 'PdfReader.PageObject') -> float:
    page_image = convert_pdf_page_to_image(page)
    rotation_angle = predict_rotation_angle(model, page_image)
    return rotation_angle

# Main function to rotate all pages upright
def rotate_all_pages_upright(input_pdf: str, model_path: str) -> List[float]:
    model = load_angle_regression_model(model_path)
    reader = PdfReader(input_pdf)
    angles = []
    for current_page in reader.pages:
        rotation_angle = determine_rotation_angle(model, current_page)
        angles.append(rotation_angle)
    return angles

# Usage
input_pdf = "grouped_documents.pdf"  # PDF whose pages are analyzed
model_path = "angle_regression_model_3.pth"  # Path to your pre-trained model
rotation_angles = rotate_all_pages_upright(input_pdf, model_path)
print(f"Rotation angles for each page: {rotation_angles}")


Rotation angles for each page: [2.4669904708862305, 2.5704703330993652, 2.5704689025878906, 175.28688049316406, 172.16053771972656, 176.12037658691406, 20.410871505737305, 20.410871505737305, 20.410871505737305, 4.188563823699951, 4.191892147064209, 4.190777778625488, 20.410871505737305, 20.410871505737305, 20.410871505737305, 174.5769805908203, 172.40902709960938, 20.410871505737305]
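
The task statement asks for integer angles normalized to [0, 359], whereas the output above contains floats. A minimal sketch of a post-processing step is given below. Both helper names are hypothetical, and `snap_to_right_angle` additionally assumes that pages are only ever rotated by multiples of 90 degrees, which the task statement suggests but does not guarantee.

```python
def normalize_angle(angle: float) -> int:
    """Round to the nearest degree and wrap into [0, 359]."""
    return int(round(angle)) % 360


def snap_to_right_angle(angle: float) -> int:
    """Snap to the nearest multiple of 90 degrees and wrap into [0, 359].

    Assumption: scanned pages are rotated only by 0, 90, 180 or 270 degrees.
    """
    return (int(round(angle / 90.0)) * 90) % 360


# Three values copied from the printed output above
raw_angles = [2.4669904708862305, 175.28688049316406, 20.410871505737305]

print([normalize_angle(a) for a in raw_angles])      # [2, 175, 20]
print([snap_to_right_angle(a) for a in raw_angles])  # [0, 180, 0]
```

Snapping hides small regression errors but also hides systematic ones: a value such as 20.41 repeated on many pages would be mapped to 0 without any indication that the model may not be discriminating between those pages.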
