# Assignment_12

## 1. In what modes should the File objects for PdfFileReader() and PdfFileWriter() be opened?

#Solution
The File object passed to PdfFileReader() should be opened in read-binary mode ('rb'), and the File object passed to the write() method of a PdfFileWriter object should be opened in write-binary mode ('wb'). Binary mode is required because PDF files contain non-text data that text mode would try to decode. PdfFileReader() also accepts a file path string, in which case the library opens the file itself. Note that these class names belong to PyPDF2 versions before 3.0; later releases renamed them to PdfReader and PdfWriter.
Here's an example:

    from PyPDF2 import PdfFileReader, PdfFileWriter

    # Open the input PDF file in read-binary mode
    pdf_reader = PdfFileReader(open('input.pdf', 'rb'))

    # Create a PdfFileWriter object for writing output
    pdf_writer = PdfFileWriter()
    
In the example above, we use the open() function to open the input PDF file in read-binary mode ('rb') and pass the resulting File object to the PdfFileReader() constructor.
PdfFileWriter() itself takes no File object. We create a PdfFileWriter() object and use its methods to add pages to it. Only when we're ready to save the output PDF do we open a file in write-binary mode ('wb') and pass that File object to the writer's write() method.

Here's an example:

    from PyPDF2 import PdfFileWriter

    # Create a PdfFileWriter object for writing output
    pdf_writer = PdfFileWriter()

    # Add pages or content to pdf_writer

    # Open the output PDF file in write-binary mode and write data
    with open('output.pdf', 'wb') as output_pdf:
        pdf_writer.write(output_pdf)
        
So, to summarize, we typically open the input PDF file in read-binary mode outside the PdfFileReader() constructor, and we open the output PDF file in write-binary mode using the open() function when we're ready to save the modified PDF created with PdfFileWriter().
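
To see why binary mode matters, here is a minimal sketch that uses only the standard library. The byte string `fake_pdf_bytes` is a made-up stand-in for real PDF content (an assumption for illustration, not a valid PDF): it round-trips unchanged through 'wb' and 'rb', while reading the same file in text mode fails to decode.

```python
import os
import tempfile

# Hypothetical stand-in bytes: a PDF-style header followed by non-text bytes.
fake_pdf_bytes = b"%PDF-1.4\n\x80\x81\xfe\xff binary payload"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.pdf")

    # Write in 'wb' mode: the bytes go to disk unchanged.
    with open(path, "wb") as out_file:
        out_file.write(fake_pdf_bytes)

    # Read in 'rb' mode: we get exactly the same bytes back.
    with open(path, "rb") as in_file:
        round_tripped = in_file.read()

    # Read in text mode as UTF-8: decoding the non-text bytes fails.
    try:
        with open(path, "r", encoding="utf-8") as text_file:
            text_file.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True

print(round_tripped == fake_pdf_bytes)
print(text_mode_failed)
```

Both checks should come out true: the binary round trip preserves every byte, and the text-mode read raises UnicodeDecodeError.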


## 2. From a PdfFileReader object, how do you get a Page object for page 5?

#Solution
To get a Page object for a specific page, such as page 5, from a PdfFileReader object in Python's PyPDF2 library, we can use the getPage() method. Here's how we can do it:

    from PyPDF2 import PdfFileReader

    # Open the input PDF file in read-binary mode ('rb')
    pdf_reader = PdfFileReader(open('input.pdf', 'rb'))

    # Get a Page object for page 5 (pages are 0-indexed)
    page_number = 4  # Page 5 corresponds to index 4
    page = pdf_reader.getPage(page_number)

    # Now you can work with the 'page' object

In the example above, we first open the input PDF file in read-binary mode and create a PdfFileReader object (pdf_reader). Then, we use the getPage() method to retrieve the Page object for page 5. Note that page numbers are 0-indexed, so page 5 corresponds to index 4 in this case.
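
Off-by-one mistakes are easy to make here, so a small helper can make the conversion explicit. The function `page_index` below is a hypothetical name introduced for illustration; it is not part of PyPDF2.

```python
def page_index(page_number, num_pages):
    """Convert a 1-based page number to a 0-based index, validating the range."""
    if not 1 <= page_number <= num_pages:
        raise ValueError(f"page {page_number} is outside 1..{num_pages}")
    return page_number - 1


print(page_index(5, 10))  # → 4
```

The returned index is the value we would pass to getPage().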

## 3. What PdfFileReader variable stores the number of pages in the PDF document?

#Solution
In Python's PyPDF2 library, the number of pages in a PDF document is stored in the numPages attribute of a PdfFileReader object. We can access this attribute to determine the total number of pages in the PDF document. Here's an example:

    from PyPDF2 import PdfFileReader

    # Open the input PDF file in read-binary mode ('rb')
    pdf_reader = PdfFileReader(open('input.pdf', 'rb'))

    # Get the number of pages in the PDF document
    num_pages = pdf_reader.numPages

    # Print the total number of pages
    print(f'The PDF document contains {num_pages} pages.')
    
In the example above, we use the numPages attribute of the pdf_reader object to retrieve the total number of pages in the PDF document and then print it.
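
A common use of numPages is to loop over every page. The sketch below assumes only that the reader exposes numPages and getPage(), as described above. To keep it runnable without a PDF file, it uses `FakeReader`, a hypothetical stand-in class; `iter_pages` is likewise a name invented for this example.

```python
def iter_pages(reader):
    """Yield (page_number, page) pairs, with page_number starting at 1."""
    for index in range(reader.numPages):
        yield index + 1, reader.getPage(index)


class FakeReader:
    """Hypothetical stand-in exposing only numPages and getPage()."""

    def __init__(self, pages):
        self._pages = pages
        self.numPages = len(pages)

    def getPage(self, index):
        return self._pages[index]


numbered = list(iter_pages(FakeReader(["cover", "intro", "body"])))
print(numbered)  # → [(1, 'cover'), (2, 'intro'), (3, 'body')]
```

With a real PdfFileReader object in place of FakeReader, the same loop would yield Page objects.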

## 4. If a PdfFileReader object’s PDF is encrypted with the password swordfish, what must you do before you can obtain Page objects from it?

#Solution
If a PdfFileReader object's PDF is encrypted with a password (e.g., "swordfish"), we need to decrypt it using that password before we can obtain Page objects from it. Here's how we can do it:

    from PyPDF2 import PdfFileReader

    # Open the input PDF file in read-binary mode ('rb')
    pdf_reader = PdfFileReader(open('encrypted.pdf', 'rb'))

    # Check if the PDF is encrypted
    if pdf_reader.isEncrypted:
        # Decrypt the PDF with the password
        pdf_reader.decrypt('swordfish')

    # Now we can obtain Page objects and work with the PDF
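
The example above ignores the return value of decrypt(). In PyPDF2 1.x, decrypt() returns 0 when the password does not match, so it can be worth checking. The sketch below shows one way to do that; `ensure_decrypted` and `FakeEncryptedReader` are hypothetical names used so the example runs without an encrypted file.

```python
def ensure_decrypted(reader, password):
    """Decrypt reader if needed; raise ValueError when the password is rejected."""
    if reader.isEncrypted:
        # Assumption: decrypt() returns 0 on a wrong password (PyPDF2 1.x behavior).
        if reader.decrypt(password) == 0:
            raise ValueError("wrong password")
    return reader


class FakeEncryptedReader:
    """Hypothetical stand-in that mimics isEncrypted and decrypt()."""

    isEncrypted = True

    def __init__(self, correct_password):
        self._correct_password = correct_password

    def decrypt(self, password):
        return 1 if password == self._correct_password else 0


reader = FakeEncryptedReader("swordfish")
ensure_decrypted(reader, "swordfish")  # accepted, no exception

try:
    ensure_decrypted(reader, "guess")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # → True
```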


## 5. What methods do you use to rotate a page?

#Solution
To rotate a page in a PDF using Python's PyPDF2 library (or similar libraries for working with PDFs), we can use the rotateClockwise() and rotateCounterClockwise() methods of a Page object. Here's how we can use these methods to rotate a page:

    from PyPDF2 import PdfFileReader, PdfFileWriter

    # Open the input PDF file in read-binary mode ('rb')
    pdf_reader = PdfFileReader(open('input.pdf', 'rb'))
    page_number = 4  # Page 5 corresponds to index 4 (0-indexed)
    page = pdf_reader.getPage(page_number)

    # Rotate the page clockwise by 90 degrees
    page.rotateClockwise(90)

    # Rotate the page counterclockwise by 90 degrees
    # (this undoes the rotation above; normally we would call only one of the two)
    page.rotateCounterClockwise(90)

    # Create a PdfFileWriter object and add the modified page to it
    pdf_writer = PdfFileWriter()
    pdf_writer.addPage(page)

    # Save the modified PDF to a new file
    with open('output.pdf', 'wb') as output_pdf:
        pdf_writer.write(output_pdf)
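
Both rotation methods expect an angle that is a multiple of 90 degrees, and successive rotations add up. The sketch below models that arithmetic in plain Python; `net_rotation` is a hypothetical helper, not a PyPDF2 function. It shows that calling rotateClockwise(90) and then rotateCounterClockwise(90), as the example above does, leaves the page unrotated.

```python
def net_rotation(*turns):
    """Return the net clockwise rotation in degrees, normalized to 0, 90, 180 or 270.

    Positive values are clockwise turns, negative values are counterclockwise.
    """
    total = 0
    for degrees in turns:
        if degrees % 90 != 0:
            raise ValueError("rotation must be a multiple of 90 degrees")
        total += degrees
    return total % 360


print(net_rotation(90, -90))  # → 0
print(net_rotation(90, 180))  # → 270
print(net_rotation(-90))      # → 270
```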

## 6. What is the difference between a Run object and a Paragraph object?

#Solution
In the python-docx library, a Word document is modeled as a Document object that contains Paragraph objects, and each Paragraph object contains one or more Run objects. A Paragraph is the unit of block-level layout, while a Run is the unit of character formatting. The two differ as follows:

1. Paragraph Object (python-docx):
In the python-docx library, a Paragraph object represents a single paragraph of text within a Word document, similar to Microsoft Word.
A paragraph is a block of text that typically ends with a paragraph mark (newline character).
We can access and manipulate various properties of a paragraph, such as its text content, alignment, indentation, and spacing.

2. Run Object (python-docx):
In python-docx, a Run object represents a contiguous run of text within a paragraph, much like in Microsoft Word.
Runs are used to apply specific formatting attributes to a portion of text within a paragraph. For example, we can have a single paragraph with different runs to represent bold, italic, or colored text within the same paragraph.
We can access and modify the formatting properties of a run, such as font size, style, and color.

Here's a simplified example of how we might work with Paragraph and Run objects using the python-docx library:

    from docx import Document

    # Create a new Word document
    doc = Document()

    # Add a paragraph
    paragraph = doc.add_paragraph('This is a paragraph.')

    # Add a run within the paragraph with specific formatting
    run = paragraph.add_run(' This is bold and italic text.')
    run.bold = True
    run.italic = True

    # Save the document
    doc.save('example.docx')
    
In this example, we first create a paragraph and then add a run within that paragraph to apply bold and italic formatting to a portion of the text.
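
The same split exists in the file format itself: inside a .docx file, paragraphs are stored as `w:p` elements and runs as `w:r` elements nested within them. The sketch below parses a hand-written XML fragment with the standard library to show that nesting. The fragment is a simplified illustration, not a complete document, and the variable names are our own.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hand-written fragment for illustration; real files carry much more markup.
fragment = f"""
<w:body xmlns:w="{W_NS}">
  <w:p>
    <w:r><w:t xml:space="preserve">This is a paragraph.</w:t></w:r>
    <w:r><w:rPr><w:b/><w:i/></w:rPr><w:t xml:space="preserve"> This is bold and italic text.</w:t></w:r>
  </w:p>
</w:body>
"""

body = ET.fromstring(fragment.strip())
summary = []
for paragraph in body.findall("w:p", NS):
    # Each run carries its own formatting properties in an optional w:rPr child.
    for run in paragraph.findall("w:r", NS):
        text = run.find("w:t", NS).text
        is_bold = run.find("w:rPr/w:b", NS) is not None
        summary.append((text, is_bold))

for text, is_bold in summary:
    print(repr(text), "bold" if is_bold else "plain")
```

One paragraph holds two runs here: the first has no formatting properties, and the second carries the bold and italic markers.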

## 7. How do you obtain a list of Paragraph objects for a Document object that’s stored in a variable named doc?

#Solution
In the python-docx library, the paragraphs attribute of a Document object is already a list of Paragraph objects, so no loop is needed to build one. Here's how we can use it:

    import docx

    # Load the document
    doc = docx.Document("your_document.docx")  # Replace with your document file path

    # doc.paragraphs is a list of Paragraph objects, in document order
    paragraphs = doc.paragraphs

    # Iterate only when each paragraph needs processing, for example to print its text
    for paragraph in paragraphs:
        print(paragraph.text)


## 8. What type of object has bold, underline, italic, strike, and outline variables?

#Solution
A Run object has these text-formatting variables. In the python-docx library, a Run represents a stretch of text within a paragraph that shares one set of character formatting.
In current python-docx releases, bold, italic, and underline are available directly on the Run object, while strike and outline are reached through the run's font attribute (run.font.strike, run.font.outline). Here's an example of how we might use these properties:

    from docx import Document
    from docx.shared import Pt

    # Create a new document
    doc = Document()

    # Add a paragraph with formatted text
    paragraph = doc.add_paragraph()
    run = paragraph.add_run("This is formatted text")

    # Apply formatting
    run.bold = True
    run.underline = True
    run.italic = True
    run.font.strike = True  # strike lives on the run's Font object
    run.font.size = Pt(12)

In this example, run is a Run object, and we can use its properties to control the formatting of the text.

## 9. What is the difference between False, True, and None for the bold variable?

#Solution
In the python-docx library, the bold variable of a Run object can take three values: True, False, and None. These values represent different states for the bold formatting of text:
1. True: The text is always bold, regardless of what the style applied to the run or paragraph specifies. For example:
    run = paragraph.add_run("This is bold text")
    run.bold = True
    In this case, "This is bold text" will be displayed in a bold font.
2. False: The text is never bold, even if the style applied to the run or paragraph is bold.
    run = paragraph.add_run("This is regular text")
    run.bold = False  # Explicitly turns bold off, overriding the style.
    In this case, "This is regular text" will be displayed without bold formatting.
3. None: This is the default for a new run. No bold setting is stored on the run itself, so the text inherits from the parent style or document default. If the parent style is bold, the text will appear bold; if the parent style is not bold, the text will appear as regular text.
    run = paragraph.add_run("This inherits boldness from parent style")
    run.bold = None  # Removes any explicit setting; the style decides.
In this case, the boldness of the text depends on the parent style or document default style.
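
The three states can be summarized as a small resolution rule. The sketch below is a simplified model, assuming a single style level rather than Word's full style hierarchy; `effective_bold` is a hypothetical helper, not part of python-docx.

```python
def effective_bold(run_bold, style_bold):
    """Resolve a run's tri-state bold value against its style's bold setting."""
    if run_bold is None:
        # No explicit setting on the run: fall back to the style.
        return style_bold
    # True or False on the run overrides whatever the style says.
    return run_bold


for run_bold in (True, False, None):
    for style_bold in (True, False):
        print(run_bold, style_bold, "->", effective_bold(run_bold, style_bold))
```

Only the rows where the run's value is None change with the style; True and False always win.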

## 10. How do you create a Document object for a new Word document?

#Solution

To create a Document object for a new Word document, we call Document() with no arguments. The python-docx library must be installed first, for example by running pip install python-docx in a terminal.

    from docx import Document

    # Create a new Word document
    doc = Document()

    # Add content to the document
    doc.add_paragraph("This is a new Word document.")
    doc.add_paragraph("We can add more paragraphs and elements here.")

    # Save the document to a file
    doc.save("new_document.docx")


## 11. How do you add a paragraph with the text 'Hello, there!' to a Document object stored in a variable named doc?

#Solution
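We call the add_paragraph() method of the Document object:

    from docx import Document

    # Create the Document object here so the example runs on its own;
    # in practice 'doc' would already exist
    doc = Document()

    # Add a paragraph with the desired text
    doc.add_paragraph('Hello, there!')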


## 12. What integers represent the levels of headings available in Word documents?

#Solution
In the python-docx library, the add_heading() method accepts integer levels from 0 to 9. Level 0 applies the Title style, and levels 1 to 9 correspond to Word's built-in styles Heading 1 through Heading 9. Lower numbers indicate higher-level headings (main headings or section titles), and higher numbers represent lower-level headings (subheadings or subsections). Here's the breakdown for the nine heading styles:

1. Heading 1: Level 1
2. Heading 2: Level 2
3. Heading 3: Level 3
4. Heading 4: Level 4
5. Heading 5: Level 5
6. Heading 6: Level 6
7. Heading 7: Level 7
8. Heading 8: Level 8
9. Heading 9: Level 9

These heading levels help organize the structure and hierarchy of a document, making it easier to create a table of contents, navigate through the document, and apply consistent formatting to different sections. The actual appearance and formatting of headings can vary depending on the document's style and formatting settings.
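
As a sketch of how an integer level maps to a style name, the function below models the behavior we would expect from python-docx's add_heading(), on the assumption that level 0 selects the Title style and levels 1 to 9 select Heading 1 through Heading 9. `heading_style_name` is a hypothetical helper written for this example, not a library function.

```python
def heading_style_name(level):
    """Map an integer heading level to the name of the Word style it selects."""
    if not 0 <= level <= 9:
        raise ValueError(f"level must be in range 0-9, got {level}")
    # Assumption: level 0 is the document title, not a numbered heading.
    return "Title" if level == 0 else f"Heading {level}"


print(heading_style_name(0))  # → Title
print(heading_style_name(1))  # → Heading 1
print(heading_style_name(9))  # → Heading 9
```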
