In [None]:
1. In what modes should the File objects for PdfFileReader() and PdfFileWriter() be opened?

In [None]:
In the PyPDF2 library (commonly used for working with PDF files in Python), PdfFileReader() and
PdfFileWriter() do not take a mode argument themselves. The mode is set on the File object returned by
open(): use 'rb' (read binary) for the file passed to PdfFileReader() and 'wb' (write binary) for the
file passed to the write() method of a PdfFileWriter object.

Here's how you would commonly use these objects:

In [None]:
1. PdfFileReader:

To open an existing PDF file for reading, you should use 'rb' (read binary) mode:

In [None]:
from PyPDF2 import PdfFileReader

with open('example.pdf', 'rb') as pdf_file:
    pdf_reader = PdfFileReader(pdf_file)
    # You can now use pdf_reader to read and manipulate the PDF content.

In [None]:
2. PdfFileWriter:

To create a new PDF file or save a modified copy of an existing one, you should use 'wb' (write binary) mode:

In [None]:
from PyPDF2 import PdfFileWriter

pdf_writer = PdfFileWriter()
# Add pages to pdf_writer here (for example, with addPage()).

with open('new_file.pdf', 'wb') as pdf_output_file:
    # The File object opened in 'wb' mode is passed to write().
    pdf_writer.write(pdf_output_file)

In [None]:
The 'rb' and 'wb' modes ensure that the PDF file is treated as a binary file, which is necessary because
PDF files contain binary data. Using these modes helps prevent any platform-specific issues with line
endings and ensures that the file is read or written correctly.

So, in summary, you should open PDF files in 'rb' mode for reading with PdfFileReader() and 'wb' mode
for writing with PdfFileWriter() in the PyPDF2 library.
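
In [None]:
The effect of binary mode can be checked without PyPDF2. The sketch below uses only the standard
library; the payload and the file name sample.bin are made up for illustration and are not a real PDF.
It writes a few bytes in 'wb' mode, reads them back unchanged in 'rb' mode, and shows that reading the
same file in text mode fails because the bytes are not valid UTF-8.

In [None]:
```python
import os
import tempfile

# Hypothetical payload: a PDF-style header followed by bytes that are not valid UTF-8.
payload = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\r\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.bin")

    # 'wb' writes the bytes exactly as given.
    with open(path, "wb") as f:
        f.write(payload)

    # 'rb' returns the same bytes, including the '\r\n' sequence.
    with open(path, "rb") as f:
        binary_data = f.read()

    # Text mode tries to decode the bytes as text, which fails for this payload.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        text_mode_ok = True
    except UnicodeDecodeError:
        text_mode_ok = False

print(binary_data == payload)
print(text_mode_ok)
```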

In [None]:
2. From a PdfFileReader object, how do you get a Page object for page 5?

In [None]:
To get a Page object for page 5 from a PdfFileReader object in the PyPDF2 library, you can use the
getPage() method and pass the page number (zero-based index) as an argument. Here's how you can do
it:

In [None]:
from PyPDF2 import PdfFileReader

# Open the PDF file for reading in binary mode
with open('example.pdf', 'rb') as pdf_file:
    pdf_reader = PdfFileReader(pdf_file)

    # Get a Page object for page 5 (zero-based index)
    page_number = 4  # Page 5 corresponds to index 4
    page = pdf_reader.getPage(page_number)

    # Now you can work with the 'page' object, such as extracting text or modifying it.

In [None]:
In this code:

. We open the PDF file in binary mode using the 'rb' mode.
. We create a PdfFileReader object, pdf_reader, to read the PDF content.
. We specify the index of the page we want to access (page 5 corresponds to index 4, because getPage()
  uses zero-based indexes).
. We use the getPage() method to retrieve a Page object for page 5, and you can then perform various
  operations on this Page object, such as extracting text or making modifications.
    
Make sure to replace 'example.pdf' with the actual path to your PDF file.
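
In [None]:
The off-by-one conversion is easy to get wrong, so it can help to isolate it. The helper below is a
hypothetical function (page_index is not part of PyPDF2) that turns a 1-based page number into the
zero-based index and rejects numbers outside the document. It is a minimal sketch, assuming the page
count has already been read from the reader.

In [None]:
```python
def page_index(page_number, num_pages):
    """Convert a 1-based page number to the 0-based index getPage() expects."""
    if not 1 <= page_number <= num_pages:
        raise ValueError(f"page {page_number} is outside 1..{num_pages}")
    return page_number - 1


# Page 5 of a 10-page document is index 4.
print(page_index(5, 10))
```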

In [None]:
3. What PdfFileReader variable stores the number of pages in the PDF document?

In [None]:
In the PyPDF2 library, the number of pages in a PDF document can be obtained using the numPages attribute
of a PdfFileReader object. This attribute stores the total number of pages in the PDF document. Here's how
you can access it:

In [None]:
from PyPDF2 import PdfFileReader

# Open the PDF file for reading in binary mode
with open('example.pdf', 'rb') as pdf_file:
    pdf_reader = PdfFileReader(pdf_file)

    # Get the number of pages in the PDF document
    num_pages = pdf_reader.numPages

    # Now 'num_pages' contains the total number of pages in the PDF.

In [None]:
In this code:

. We open the PDF file in binary mode using the 'rb' mode.
. We create a PdfFileReader object, pdf_reader, to read the PDF content.
. We access the numPages attribute to retrieve the total number of pages in the PDF document, which is
  stored in the num_pages variable.

You can then use the num_pages variable to perform operations or calculations based on the total number
of pages in the PDF.

In [None]:
4. If a PdfFileReader object’s PDF is encrypted with the password swordfish, what must you do
   before you can obtain Page objects from it?

In [None]:

If a PdfFileReader object's PDF is encrypted with a password (e.g., "swordfish"), you must provide the
correct password to decrypt the PDF before you can obtain Page objects or perform any operations on it.
To do this, you can use the decrypt() method of the PdfFileReader object.

Here's how you can open an encrypted PDF with a password:

In [None]:
from PyPDF2 import PdfFileReader

# Open the encrypted PDF file for reading in binary mode
with open('encrypted.pdf', 'rb') as pdf_file:
    pdf_reader = PdfFileReader(pdf_file)

    # Provide the password to decrypt the PDF
    password = 'swordfish'

    # Use the decrypt() method to decrypt the PDF with the password
    if pdf_reader.decrypt(password):
        # The PDF is successfully decrypted with the provided password
        num_pages = pdf_reader.numPages
        # Now you can obtain Page objects and work with the PDF as needed.
    else:
        # The provided password is incorrect; handle the decryption failure accordingly.
        print("Failed to decrypt the PDF. Incorrect password.")


In [None]:
In this code:

. We open the encrypted PDF file in binary mode using the 'rb' mode.
. We create a PdfFileReader object, pdf_reader, to read the PDF content.
. We provide the correct password ('swordfish') to the decrypt() method to decrypt the PDF. If the
  password is correct, the method returns a nonzero integer (1 for a user password match, 2 for an owner
  password match), which the if statement treats as true.
. If the provided password is incorrect, the decrypt() method returns 0, which the if statement treats as
  false, and you should handle the decryption failure accordingly.
    
After successful decryption, you can obtain Page objects and perform operations on the PDF as usual.
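
In [None]:
In the PyPDF2 1.x releases that provide PdfFileReader, decrypt() is documented as returning an integer
rather than a boolean: 0, 1, or 2. The helper below is hypothetical (describe_decrypt_result is not
part of PyPDF2) and only illustrates how those codes could be turned into messages, and why a plain
truth test on the result still works.

In [None]:
```python
def describe_decrypt_result(result):
    """Map a decrypt() return code to a short description."""
    messages = {
        0: "password did not match",
        1: "matched the user password",
        2: "matched the owner password",
    }
    return messages.get(result, "unexpected return value")


for code in (0, 1, 2):
    # bool(0) is False, while bool(1) and bool(2) are True.
    print(code, bool(code), describe_decrypt_result(code))
```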

In [None]:
5. What methods do you use to rotate a page?

In [None]:

To rotate a page in a PDF using the PyPDF2 library in Python, you can use the rotateClockwise() or
rotateCounterClockwise() methods of a PageObject obtained from a PdfFileReader or PdfFileWriter object.
Here's how you can use these methods:

In [None]:
1. rotateClockwise(degrees):

This method rotates the page clockwise by the specified number of degrees, which must be a multiple of
90 (for example 90, 180, or 270). The page object is modified in place.

In [None]:
from PyPDF2 import PdfFileReader, PdfFileWriter

# Open the input PDF file for reading in binary mode
with open('input.pdf', 'rb') as pdf_file:
    pdf_reader = PdfFileReader(pdf_file)
    pdf_writer = PdfFileWriter()

    # Rotate the first page of the PDF clockwise by 90 degrees
    page = pdf_reader.getPage(0)
    page.rotateClockwise(90)

    # Add the rotated page to the output PDF
    pdf_writer.addPage(page)

    # Save the modified PDF to a new file
    with open('output.pdf', 'wb') as output_file:
        pdf_writer.write(output_file)

In [None]:
2. rotateCounterClockwise(degrees):

This method rotates the page counterclockwise by the specified number of degrees, which must be a
multiple of 90 (for example 90, 180, or 270). The page object is modified in place.

In [None]:
from PyPDF2 import PdfFileReader, PdfFileWriter

# Open the input PDF file for reading in binary mode
with open('input.pdf', 'rb') as pdf_file:
    pdf_reader = PdfFileReader(pdf_file)
    pdf_writer = PdfFileWriter()

    # Rotate the first page of the PDF counterclockwise by 90 degrees
    page = pdf_reader.getPage(0)
    page.rotateCounterClockwise(90)

    # Add the rotated page to the output PDF
    pdf_writer.addPage(page)

    # Save the modified PDF to a new file
    with open('output.pdf', 'wb') as output_file:
        pdf_writer.write(output_file)

In [None]:
In both examples, we first open the input PDF file for reading. Then, we obtain a PageObject using
pdf_reader.getPage(0) to access the first page (you can specify the index of the page you want to rotate).
Finally, we use either rotateClockwise() or rotateCounterClockwise() to rotate the page as needed, add it
to a PdfFileWriter, and save the result to a new file opened for writing.
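
In [None]:
The two methods are related: a counterclockwise turn is a clockwise turn by the complementary angle.
The function below is a hypothetical helper (not part of PyPDF2) that sketches the arithmetic, assuming
a page's rotation is tracked as a clockwise angle that is a multiple of 90 degrees. Negative values of
delta stand for counterclockwise turns.

In [None]:
```python
def combined_rotation(current, delta):
    """Return the resulting clockwise rotation in degrees (0, 90, 180, or 270)."""
    if current % 90 != 0 or delta % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    return (current + delta) % 360


# A 90-degree counterclockwise turn equals a 270-degree clockwise turn.
print(combined_rotation(0, -90))
# Four clockwise quarter turns bring the page back to its starting orientation.
print(combined_rotation(270, 90))
```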

In [None]:
6. What is the difference between a Run object and a Paragraph object?

In [None]:
In the context of Microsoft Word documents and document processing libraries like python-docx, "Run" and
"Paragraph" are two fundamental elements that represent different parts of the text within a document.

In [None]:
1. Run Object:

. A "Run" is the smallest, inline element within a paragraph.
. It represents a contiguous run of text with the same formatting properties (e.g., font style, size, color).
. Runs are used for applying specific formatting to portions of text within a paragraph. For example, you
  can have a single paragraph with different runs for bold, italic, and underlined text.
. Runs can contain text, but they do not represent entire paragraphs or line breaks.
. You can apply formatting to runs, such as bold, italic, underline, font color, and more.

2. Paragraph Object:

. A "Paragraph" is a larger block-level element that represents a collection of text.
. It represents a single unit of text separated by paragraph breaks (i.e., pressing the Enter key).
. Paragraphs can contain one or more runs of text with various formatting.
. They are used to structure the document and represent distinct sections or blocks of text.
. Paragraphs can have their own paragraph-level formatting, such as alignment, indentation, spacing,
  and borders.
    
Here's a simple example of how you might use both Run and Paragraph objects in a Word document using the
python-docx library:

In [None]:
from docx import Document

# Create a new Document
doc = Document()

# Create a paragraph
p = doc.add_paragraph()

# Add text with different formatting using Runs
run1 = p.add_run("This is bold text.")
run1.bold = True

run2 = p.add_run(" This is italic text.")
run2.italic = True

# Create another paragraph
p2 = doc.add_paragraph("This is a regular paragraph with no formatting.")

# Save the document
doc.save("example.docx")

In [None]:
In this example, you create a document with two paragraphs. The first paragraph contains two runs with
different formatting (bold and italic), while the second paragraph is a regular paragraph with no specific
run-level formatting.

In summary, Runs are used to format portions of text within a paragraph, while Paragraphs are used to
structure and represent blocks of text in a document.
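
In [None]:
The containment relationship can also be sketched with a toy model. ToyRun and ToyParagraph below are
hypothetical classes written for illustration only; they are not the python-docx classes and model
just two ideas from the text above: a paragraph holds a list of runs, and the paragraph's text is the
concatenation of its runs' text.

In [None]:
```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyRun:
    """A stretch of text that shares one set of character formatting."""
    text: str
    bold: Optional[bool] = None
    italic: Optional[bool] = None


@dataclass
class ToyParagraph:
    """A block of text made up of one or more runs."""
    runs: List[ToyRun] = field(default_factory=list)

    def add_run(self, text):
        run = ToyRun(text)
        self.runs.append(run)
        return run

    @property
    def text(self):
        # The paragraph text is the run texts joined in order.
        return "".join(run.text for run in self.runs)


p = ToyParagraph()
p.add_run("This is bold text.").bold = True
p.add_run(" This is italic text.").italic = True

print(len(p.runs))
print(p.text)
```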

In [None]:
7. How do you obtain a list of Paragraph objects for a Document object that’s stored in a variable
named doc?

In [None]:
To obtain a list of Paragraph objects for a Document object stored in a variable named doc using the
python-docx library, you can use the doc.paragraphs attribute.
Here's how you can do it:

In [None]:
from docx import Document

# Load a document (replace 'your_document.docx' with the actual file path)
doc = Document('your_document.docx')

# Get a list of Paragraph objects in the document
paragraphs = doc.paragraphs

# Now, 'paragraphs' is a list containing all the Paragraph objects in the document


In [None]:
In this code:

. We use the Document constructor to load a Word document, replacing 'your_document.docx' with the actual
  file path of your document.

. We access the paragraphs attribute of the Document object doc. This attribute contains a list of all the
  Paragraph objects in the document.

After obtaining the list of Paragraph objects in the paragraphs variable, you can iterate through them
to access and manipulate the text and formatting of each paragraph in the document.

In [None]:
8. What type of object has bold, underline, italic, strike, and outline variables?

In [None]:
The attributes bold, underline, italic, strike, and outline are typically associated with a Run object
in document processing libraries like python-docx for working with Microsoft Word documents.

A Run object represents a contiguous run of text within a paragraph, and it allows you to apply specific
formatting to that text. These attributes are used to control the formatting of the text within the Run.
Here's what each of these attributes does:

. bold: This attribute is used to make the text within the Run bold. You can set it to True or False to
  apply or remove bold formatting.

. underline: It is used to underline the text within the Run. You can set it to True for a single
  underline, False to turn underlining off, None to inherit the setting, or a member of the
  WD_UNDERLINE enumeration (such as WD_UNDERLINE.DOUBLE) for other underline styles.

. italic: This attribute is used to italicize the text within the Run. You can set it to True or False to
  apply or remove italic formatting.

. strike: It is used to apply strikethrough formatting to the text within the Run. You can set it to True
  or False to apply or remove strikethrough.

. outline: This attribute is used to apply outline formatting, which displays the characters of the Run as
  outlines. You can set it to True or False to apply or remove outline formatting.

Here's an example of how you might use these attributes with a Run object in the python-docx library:

In [None]:
from docx import Document

# Create a new Document
doc = Document()

# Create a paragraph
p = doc.add_paragraph()

# Add text with formatting using a Run
run = p.add_run("Formatted Text")

# Apply formatting attributes
run.bold = True
run.underline = True
run.italic = True
run.strike = True
run.outline = True

# Save the document
doc.save("formatted_example.docx")


In [None]:
In this example, we create a Run object within a paragraph and apply various formatting attributes to
the text within that Run. These attributes allow you to control the appearance of text in your Word
document.

In [None]:
9. What is the difference between False, True, and None for the bold variable?

In [None]:
In the context of document processing libraries like python-docx, the bold variable, which is used with
a Run object to control the bold formatting of text, can take three different values: False, True, and
None. Here's what each of these values means:

1. False:

When bold is set to False, it means that the text within the Run should not be displayed in bold.
This is used to remove bold formatting from the text. 

2. True:

When bold is set to True, it means that the text within the Run should be displayed in bold.
This is used to apply bold formatting to the text.

3. None:

. When bold is set to None, it typically means that the text within the Run should use the default formatting
  for bold.
. This means that the text's boldness is determined by the document's default styles or the styles applied
  to the paragraph or the text run. Setting bold to None is often used when you want to inherit the boldness
  from higher-level styles and not explicitly apply or remove bold formatting.

Here's an example that demonstrates the use of these values:

In [None]:
from docx import Document

# Create a new Document
doc = Document()

# Create a paragraph
p = doc.add_paragraph()

# Add text with formatting using a Run
run = p.add_run("Formatted Text")

# Apply different bold formatting
run.bold = True  # Apply bold formatting
run2 = p.add_run(" Unbolded Text")
run2.bold = False  # Remove bold formatting

run3 = p.add_run(" Inherited Boldness")
run3.bold = None  # Inherit bold formatting from higher-level styles

# Save the document
doc.save("bold_example.docx")


In [None]:
In this example:

. The first Run is set to bold (True), applying bold formatting.
. The second Run is added right after the first one, and its bold formatting is explicitly set to False,
  removing the bold formatting.
. The third Run is set to None, which means it inherits the boldness from higher-level styles, such as
  paragraph or document styles.
    
Using these values, you can control how text is displayed in terms of boldness in your Word documents.
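
In [None]:
The three-valued logic can be summarized in a few lines. The function below is a hypothetical helper
(not part of python-docx) that sketches how an effective setting could be resolved: an explicit True or
False on the run wins, and None falls back to whatever the style supplies. The inherited_bold parameter
is an assumption standing in for the style's setting.

In [None]:
```python
def effective_bold(run_bold, inherited_bold=False):
    """Resolve a run's tri-state bold value against an inherited style value."""
    if run_bold is None:
        # No explicit setting on the run: use the inherited value.
        return inherited_bold
    return run_bold


# An explicit value overrides the style; None defers to it.
print(effective_bold(True, inherited_bold=False))
print(effective_bold(False, inherited_bold=True))
print(effective_bold(None, inherited_bold=True))
```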

In [None]:
10. How do you create a Document object for a new Word document?

In [None]:

To create a Document object for a new Word document using the python-docx library in Python, you can
follow these steps:

1. Import the Document class: First, you need to import the Document class from the docx module.

2. Create a new Document object: Use the Document() constructor to create a new, empty Word document.

3. Add content to the document: You can add content to the document, such as paragraphs, text, tables,
   and more, using the methods provided by the Document object.

4. Save the document: After adding content, save the document to a file using the save() method of the
   Document object.

Here's an example of how to create a new Word document and add a paragraph of text to it:

In [None]:
from docx import Document

# Create a new Document object
doc = Document()

# Add content to the document
doc.add_heading('My New Document', 0)  # Add a heading
doc.add_paragraph('This is a paragraph of text.')  # Add a paragraph of text

# Save the document to a file
doc.save('new_document.docx')

# You can now open 'new_document.docx' in a Word processor to see the result.


In [None]:
In this example:

. We import the Document class from the docx module.
. We create a new Document object named doc.
. We add content to the document using add_heading() to insert a heading and add_paragraph() to add a
  paragraph of text.
. Finally, we save the document to a file named 'new_document.docx' using save().

You can further customize and populate the document with various elements like tables, images, bullet
points, and more by using the appropriate methods provided by the Document object.

In [None]:
11. How do you add a paragraph with the text 'Hello, there!' to a Document object stored in a
variable named doc?

In [None]:
To add a paragraph with the text 'Hello, there!' to a Document object stored in a variable named doc
using the python-docx library, you can use the add_paragraph() method. Here's how you can do it:

In [None]:
from docx import Document

# The question assumes a Document object already exists in 'doc'.
# A new one is created here only so that the example runs on its own.
doc = Document()

# Add a paragraph with the text 'Hello, there!'
paragraph_text = 'Hello, there!'
doc.add_paragraph(paragraph_text)

# Save the document if needed
# doc.save('your_document.docx')


In [None]:
In this code:

. We import the Document class from the docx module.

. Assuming that you already have a Document object stored in the doc variable, we add a new paragraph to it
  using the add_paragraph() method.

. The text 'Hello, there!' is stored in the paragraph_text variable, which is then passed as an argument to
  the add_paragraph() method.

. Optionally, you can save the document to a file using the save() method if you want to persist the changes
  to the document. Uncomment the doc.save('your_document.docx') line and replace 'your_document.docx' with
  the desired file name and path.

This code will add a new paragraph with the specified text to the existing Document object in the doc
variable.

In [None]:
12. What integers represent the levels of headings available in Word documents?

In [None]:
In Word documents, headings are organized into levels, and each level is identified by an integer. These
integers represent the hierarchical structure of the document. Word's built-in heading styles run from
Heading 1 to Heading 9, and python-docx adds level 0 for the document title:

Title: Level 0 (integer value: 0)
Heading 1: Level 1 (integer value: 1)
Heading 2: Level 2 (integer value: 2)
Heading 3: Level 3 (integer value: 3)
Heading 4: Level 4 (integer value: 4)
Heading 5: Level 5 (integer value: 5)
Heading 6: Level 6 (integer value: 6)
Heading 7: Level 7 (integer value: 7)
Heading 8: Level 8 (integer value: 8)
Heading 9: Level 9 (integer value: 9)

These levels are used to create a hierarchy of headings in a Word document, with Heading 1 being the
highest level and Heading 9 the lowest. The choice of heading level affects the formatting of the text in
the document, and it helps readers understand the document's structure.

You can apply these heading levels in Word processors like Microsoft Word to format your document's
headings and create a table of contents or outline. Additionally, when working with document processing
libraries like python-docx, you can programmatically apply these heading levels to your document's content.
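
In [None]:
In python-docx, the add_heading() method of a Document object takes the level as an integer from 0 to
9, where 0 selects the Title style. The function below is a hypothetical helper (not part of
python-docx) that sketches that mapping from integer to style name, including the rejection of values
outside the range.

In [None]:
```python
def heading_style_name(level):
    """Return the Word style name that corresponds to a heading level."""
    if not 0 <= level <= 9:
        raise ValueError(f"level must be in range 0-9, got {level}")
    return "Title" if level == 0 else f"Heading {level}"


for level in (0, 1, 4, 9):
    print(level, heading_style_name(level))
```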