PDF-EXTRACTOR

Convert PDF To Text

Have you ever wondered how to redact text in PDF documents? One solution is to convert the PDF document to text first. With the help of Optical Character Recognition (OCR), you can extract the text from a PDF document into a simple text file.
And it’s simple: just upload your PDF and let us do the rest. After you provide your file, PDF2Go will use OCR to get the text from your PDF and save it as a TXT file.
main.py file contents

resume PDF Example

text.txt file contents, extracted from the PDF

REQUIREMENTS

pip install PyPDF2

ABOUT

PyPDF2 is a pure-Python library built as a PDF toolkit. It is capable of:

- extracting document information (title, author, …)
- splitting documents page by page
- merging documents page by page
- cropping pages
- merging multiple pages into a single page
- encrypting and decrypting PDF files
- and more!

By being Pure-Python, it should run on any Python platform without any dependencies on external libraries. It can also work entirely on StringIO objects rather than file streams, allowing for PDF manipulation in memory. It is therefore a useful tool for websites that manage or manipulate PDFs.
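As a rough illustration of the in-memory use described above, the sketch below wraps raw PDF bytes in a stream and hands it to a reader. It assumes Python 3, where the in-memory stream would be `io.BytesIO` rather than `StringIO`, and a PyPDF2 release older than 3.0 that still provides `PdfFileReader`. The function name `page_count_from_bytes` and the `reader_cls` parameter are hypothetical, added only so the reader class can be swapped out.

```python
import io


def page_count_from_bytes(pdf_bytes, reader_cls=None):
    """Return the page count of a PDF held entirely in memory.

    reader_cls is a hypothetical hook for supplying a different reader
    class; when omitted, the legacy PyPDF2 class is used.
    """
    if reader_cls is None:
        # Assumption: PyPDF2 < 3.0, where PdfFileReader is still available.
        from PyPDF2 import PdfFileReader as reader_cls
    stream = io.BytesIO(pdf_bytes)  # binary stream in memory, no file on disk
    return reader_cls(stream).getNumPages()
```

On a website, `pdf_bytes` could be the body of an uploaded file, so the PDF never has to be written to disk before it is inspected.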

The PdfFileReader Class

class PyPDF2.PdfFileReader(stream, strict=True, warndest=None, overwriteWarnings=True)

Initializes a PdfFileReader object. This operation can take some time, as the PDF stream’s cross-reference tables are read into memory.

Parameters:

- stream – A File object or an object that supports the standard read and seek methods similar to a File object. Could also be a string representing a path to a PDF file.
- strict (bool) – Determines whether user should be warned of all problems and also causes some correctable problems to be fatal. Defaults to True.
- warndest – Destination for logging warnings (defaults to sys.stderr).
- overwriteWarnings (bool) – Determines whether to override Python’s warnings.py module with a custom implementation (defaults to True).
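The constructor described above might be used as in the following minimal sketch. It assumes a PyPDF2 release older than 3.0 (newer releases replaced this class with `PdfReader`), and it passes `strict=False` so that correctable problems are not treated as fatal. The helper names `summarize_info` and `describe_pdf` are hypothetical and not part of this repository.

```python
def summarize_info(info, num_pages):
    """Build a plain dict from a document-info object.

    Fields the PDF does not provide (or a missing info object) become None.
    """
    return {
        "title": getattr(info, "title", None),
        "author": getattr(info, "author", None),
        "pages": num_pages,
    }


def describe_pdf(path):
    """Open a PDF with the legacy reader and return its basic metadata."""
    # Assumption: PyPDF2 < 3.0, where PdfFileReader is still available.
    from PyPDF2 import PdfFileReader

    with open(path, "rb") as stream:  # the reader needs a binary, seekable stream
        reader = PdfFileReader(stream, strict=False)
        return summarize_info(reader.getDocumentInfo(), reader.getNumPages())
```

Calling `describe_pdf("mohit_resSep12.pdf")` with the sample file from this repository would return a dict with the keys `title`, `author`, and `pages`; the values depend on what the PDF itself stores.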

getDocumentInfo()
getNumPages()
extractText()
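Put together, the methods listed above could drive a PDF-to-text conversion along these lines. This is a sketch, not the repository's actual main.py: it assumes PyPDF2 older than 3.0, and the helper names `join_pages` and `pdf_to_text` are hypothetical. Note that `extractText()` reads the text embedded in each page, so a scanned, image-only page may come back empty.

```python
def join_pages(page_texts):
    """Join per-page text with a blank line between pages, skipping empty pages."""
    return "\n\n".join(t.strip() for t in page_texts if t and t.strip())


def pdf_to_text(pdf_path, txt_path):
    """Extract the text of every page in pdf_path and write it to txt_path.

    Returns the number of pages that were read.
    """
    # Assumption: PyPDF2 < 3.0, where PdfFileReader is still available.
    from PyPDF2 import PdfFileReader

    with open(pdf_path, "rb") as stream:
        reader = PdfFileReader(stream, strict=False)
        texts = [
            reader.getPage(i).extractText()  # embedded text of page i
            for i in range(reader.getNumPages())
        ]
    with open(txt_path, "w", encoding="utf-8") as out:
        out.write(join_pages(texts))
    return len(texts)


# Example call, using the file names from this repository:
# pdf_to_text("mohit_resSep12.pdf", "text.txt")
```

Keeping the joining step in its own function makes it easy to change how pages are separated in the output file without touching the extraction loop.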

MohitKumarMandhre/PDF-EXTRACTOR