Commit c9e6c2f

read comments and markdowns for more clarity
1 parent fb54740 commit c9e6c2f

1 file changed

+277
-0
lines changed

Assignment-12.ipynb

+277
@@ -0,0 +1,277 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"## PyPDF2 module: reading text content from PDFs and crafting new PDFs from existing documents. PyPDF2 cannot extract images, charts, or other media from PDF documents, but it can extract text and return it as a Python string."
8+
]
9+
},
10+
{
11+
"cell_type": "code",
12+
"execution_count": 1,
13+
"metadata": {},
14+
"outputs": [
15+
{
16+
"data": {
17+
"text/plain": [
18+
"' A Simple PDF File This is a small demonstration .pdf file - just for use in the Virtual Mechanics tutorials. More text. And more text. And more text. And more text. And more text. And more text. And more text. And more text. And more text. And more text. And more text. Boring, zzzzz. And more text. And more text. And more text. And more text. And more text. And more text. And more text. And more text. And more text. And more text. And more text. And more text. And more text. And more text. And more text. And more text. Even more. Continued on page 2 ...'"
19+
]
20+
},
21+
"execution_count": 1,
22+
"metadata": {},
23+
"output_type": "execute_result"
24+
}
25+
],
26+
"source": [
27+
"## Open the PDF in binary read mode and pass the file object to PdfFileReader.\n",
28+
"import PyPDF2 as pf\n",
29+
"pdfFileObj = open('Sample.pdf', 'rb')\n",
30+
"pdfReader = pf.PdfFileReader(pdfFileObj)\n",
31+
"page = pdfReader.getPage(0)\n",
32+
"page.extractText()"
33+
]
34+
},
35+
{
36+
"cell_type": "markdown",
37+
"metadata": {},
38+
"source": [
39+
"## Open the file in 'rb' mode for PdfFileReader(), and in 'wb' mode for the output file passed to PdfFileWriter's write() method."
40+
]
41+
},
42+
{
43+
"cell_type": "code",
44+
"execution_count": 5,
45+
"metadata": {},
46+
"outputs": [],
47+
"source": [
48+
"## To read the 5th page, pass 4 to getPage(), because page numbering starts at 0. Here getPage(0) returns the first page.\n",
49+
"pdfFileReader = pdfReader.getPage(0)"
50+
]
51+
},
52+
{
53+
"cell_type": "code",
54+
"execution_count": 4,
55+
"metadata": {},
56+
"outputs": [
57+
{
58+
"data": {
59+
"text/plain": [
60+
"PyPDF2.pdf.PageObject"
61+
]
62+
},
63+
"execution_count": 4,
64+
"metadata": {},
65+
"output_type": "execute_result"
66+
}
67+
],
68+
"source": [
69+
"## getPage() returns a PageObject\n",
70+
"type(pdfFileReader)"
71+
]
72+
},
73+
{
74+
"cell_type": "code",
75+
"execution_count": null,
76+
"metadata": {},
77+
"outputs": [],
78+
"source": [
79+
"## If the PDF is encrypted with the password 'swordfish', call the reader's decrypt() method with 'swordfish' as the argument before reading any pages.\n",
80+
"pdfReader.decrypt('swordfish')"
81+
]
82+
},
83+
{
84+
"cell_type": "markdown",
85+
"metadata": {},
86+
"source": [
87+
"## To rotate a page, call the PageObject methods rotateClockwise() or rotateCounterClockwise() with the angle in degrees as an integer argument; the angle must be a multiple of 90."
88+
]
89+
},
90+
{
91+
"cell_type": "markdown",
92+
"metadata": {},
93+
"source": [
94+
"## DOCX Module:"
95+
]
96+
},
97+
{
98+
"cell_type": "markdown",
99+
"metadata": {},
100+
"source": [
101+
"### A Paragraph object represents a whole paragraph and can contain text in several styles, such as bold or italic. A Run object is a contiguous stretch of text within a paragraph that shares one style, so a paragraph is made up of one or more runs."
102+
]
103+
},
104+
{
105+
"cell_type": "code",
106+
"execution_count": 22,
107+
"metadata": {},
108+
"outputs": [
109+
{
110+
"data": {
111+
"text/plain": [
112+
"5"
113+
]
114+
},
115+
"execution_count": 22,
116+
"metadata": {},
117+
"output_type": "execute_result"
118+
}
119+
],
120+
"source": [
121+
"import docx\n",
122+
"doc = docx.Document('file-sample_100kB.docx')\n",
123+
"len(doc.paragraphs) "
124+
]
125+
},
126+
{
127+
"cell_type": "code",
128+
"execution_count": 29,
129+
"metadata": {},
130+
"outputs": [
131+
{
132+
"name": "stdout",
133+
"output_type": "stream",
134+
"text": [
135+
"This is Title: DATASCIENCE \n",
136+
"This is 1st Paragraph: My name is AkashBorgalli I live in Mumbai.\n",
137+
"This is 2nd Paragraph: I am a MuleSoft Developer in Capgemini. \n"
138+
]
139+
}
140+
],
141+
"source": [
142+
"print(\"This is Title: \",doc.paragraphs[0].text)\n",
143+
"print(\"This is 1st Paragraph: \",doc.paragraphs[3].text)\n",
144+
"print(\"This is 2nd Paragraph: \",doc.paragraphs[4].text)"
145+
]
146+
},
147+
{
148+
"cell_type": "code",
149+
"execution_count": 24,
150+
"metadata": {},
151+
"outputs": [
152+
{
153+
"data": {
154+
"text/plain": [
155+
"9"
156+
]
157+
},
158+
"execution_count": 24,
159+
"metadata": {},
160+
"output_type": "execute_result"
161+
}
162+
],
163+
"source": [
164+
"## Count the runs in the 1st paragraph (doc.paragraphs[3]); this document has 9\n",
165+
"len(doc.paragraphs[3].runs)"
166+
]
167+
},
168+
{
169+
"cell_type": "code",
170+
"execution_count": 25,
171+
"metadata": {},
172+
"outputs": [
173+
{
174+
"data": {
175+
"text/plain": [
176+
"'My name'"
177+
]
178+
},
179+
"execution_count": 25,
180+
"metadata": {},
181+
"output_type": "execute_result"
182+
}
183+
],
184+
"source": [
185+
"## The first run, 'My name', is plain text\n",
186+
"doc.paragraphs[3].runs[0].text"
187+
]
188+
},
189+
{
190+
"cell_type": "code",
191+
"execution_count": 28,
192+
"metadata": {},
193+
"outputs": [
194+
{
195+
"name": "stdout",
196+
"output_type": "stream",
197+
"text": [
198+
"is \n",
199+
"AkashBorgalli\n"
200+
]
201+
}
202+
],
203+
"source": [
204+
"print(doc.paragraphs[3].runs[2].text)\n",
205+
"print(doc.paragraphs[3].runs[3].text)"
206+
]
207+
},
208+
{
209+
"cell_type": "markdown",
210+
"metadata": {},
211+
"source": [
212+
"## To get the list of Paragraph objects in a document, I will use doc.paragraphs"
213+
]
214+
},
215+
{
216+
"cell_type": "markdown",
217+
"metadata": {},
218+
"source": [
219+
"## A Run object has the bold, italic, and outline variables"
220+
]
221+
},
222+
{
223+
"cell_type": "markdown",
224+
"metadata": {},
225+
"source": [
226+
"- True makes the Run object bold\n",
227+
"- False makes the Run object not bold, no matter what the style’s bold setting is.\n",
228+
"- None will make the Run object just use the style’s bold setting."
229+
]
230+
},
231+
{
232+
"cell_type": "markdown",
233+
"metadata": {},
234+
"source": [
235+
"## To create a Document object for a new Word document, I will call the docx.Document() function"
236+
]
237+
},
238+
{
239+
"cell_type": "code",
240+
"execution_count": null,
241+
"metadata": {},
242+
"outputs": [],
243+
"source": [
244+
"## Add a paragraph with the text 'Hello there!' to the document\n",
245+
"doc.add_paragraph('Hello there!')"
246+
]
247+
},
248+
{
249+
"cell_type": "markdown",
250+
"metadata": {},
251+
"source": [
252+
"## Heading levels are given by the integers 0, 1, 2, 3, and 4, where 0 is the biggest heading and 4 is the smallest."
253+
]
254+
}
255+
],
256+
"metadata": {
257+
"kernelspec": {
258+
"display_name": "Python 3",
259+
"language": "python",
260+
"name": "python3"
261+
},
262+
"language_info": {
263+
"codemirror_mode": {
264+
"name": "ipython",
265+
"version": 3
266+
},
267+
"file_extension": ".py",
268+
"mimetype": "text/x-python",
269+
"name": "python",
270+
"nbconvert_exporter": "python",
271+
"pygments_lexer": "ipython3",
272+
"version": "3.8.5"
273+
}
274+
},
275+
"nbformat": 4,
276+
"nbformat_minor": 4
277+
}
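
The notebook's description of runs (contiguous text within a paragraph that shares one style) can be illustrated with a small pure-Python sketch. It does not use python-docx; `split_into_runs` and the sample styled text are hypothetical, introduced only for illustration, and a real Word file may split runs in additional places.

```python
from itertools import groupby


def split_into_runs(chars):
    """Group (character, style) pairs into (text, style) runs.

    A new run starts whenever the style differs from that of the
    previous character. This is a hypothetical helper, not part of
    python-docx.
    """
    runs = []
    for style, group in groupby(chars, key=lambda pair: pair[1]):
        text = "".join(ch for ch, _ in group)
        runs.append((text, style))
    return runs


# Hypothetical styled text: "My name" plain, " is " bold, "Akash" italic.
styled = (
    [(c, "plain") for c in "My name"]
    + [(c, "bold") for c in " is "]
    + [(c, "italic") for c in "Akash"]
)

runs = split_into_runs(styled)
print(len(runs))   # 3
print(runs[0][0])  # My name
```

Each tuple plays the role of one Run: its text corresponds to `run.text`, and the style label stands in for the run's formatting.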

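
The True / False / None rule for a run's bold setting, listed in the notebook, can be sketched in plain Python. `effective_bold` is a hypothetical helper written for illustration; it is not a python-docx function, and it assumes the style's own setting is a simple True or False.

```python
def effective_bold(run_bold, style_bold):
    """Resolve a run's bold setting against its style's bold setting.

    run_bold is True, False, or None; style_bold is True or False.
    None means "defer to the style", as described in the notebook.
    """
    if run_bold is None:
        return style_bold
    return run_bold


# With a style that is bold, only an explicit False turns bold off.
for run_bold in (True, False, None):
    print(run_bold, effective_bold(run_bold, style_bold=True))
```

The check uses `is None` rather than truthiness so that an explicit False is not confused with "no setting".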