This repository contains a Python application that converts PDF files to Excel files. It relies on several libraries for PDF text extraction, OCR, and the GUI.
To set up the environment for this project, you need to install the following Python libraries:
- pandas
  - Description: A powerful data manipulation and analysis library.
  - Installation: pip install pandas
- PyPDF2
  - Description: A pure Python PDF toolkit for splitting, merging, cropping, and transforming PDF pages.
  - Installation: pip install PyPDF2
- pdf2image
  - Description: A Python wrapper for pdftoppm and pdfinfo that converts PDF pages to images. Both tools ship with Poppler, which must be installed separately.
  - Installation: pip install pdf2image
- pytesseract
  - Description: A Python wrapper for the Tesseract OCR engine that recognizes and "reads" the text embedded in images. The Tesseract binary must be installed separately.
  - Installation: pip install pytesseract
- Pillow
  - Description: The Python Imaging Library (PIL) fork that adds some user-friendly features.
  - Installation: pip install Pillow
- pdfplumber
  - Description: A Python library for extracting content from PDFs, built on top of pdfminer.six.
  - Installation: pip install pdfplumber
- openpyxl
  - Description: A Python library to read/write Excel 2010 xlsx/xlsm/xltx/xltm files.
  - Installation: pip install openpyxl
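
After installing, it can help to confirm that every library is importable. The script below is a minimal sketch and not part of this repository; the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names introduced for illustration. Note that Pillow is installed under that name but imported as `PIL`.

```python
from importlib.util import find_spec

# Maps each pip package name to the module name used in an import statement.
# The two differ for Pillow, which is imported as PIL.
REQUIRED = {
    "pandas": "pandas",
    "PyPDF2": "PyPDF2",
    "pdf2image": "pdf2image",
    "pytesseract": "pytesseract",
    "Pillow": "PIL",
    "pdfplumber": "pdfplumber",
    "openpyxl": "openpyxl",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```

This only checks the Python packages. It does not detect system tools that some of them call at runtime, such as Poppler for pdf2image or the Tesseract binary for pytesseract.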
The following built-in Python modules are also used in the project:
- os
  - Description: Provides a way of using operating system dependent functionality like reading or writing to the file system.
- re
  - Description: Provides support for regular expressions in Python.
- tkinter
  - Description: The standard GUI library for Python.
- pathlib
  - Description: Provides an object-oriented interface for filesystem paths.
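
To show how the modules listed above could fit together, here is a rough sketch of a PDF-to-Excel flow. It is an illustration under stated assumptions, not the application's actual code: the function names (`parse_fields`, `excel_path_for`, `extract_text`, `convert`) and the "Label: value" line layout are hypothetical, and the OCR fallback is one plausible design rather than the confirmed one. The GUI is left out.

```python
import re
from pathlib import Path

# Hypothetical layout: one "Label: value" pair per line of extracted text.
FIELD_PATTERN = re.compile(
    r"^[ \t]*([A-Za-z][A-Za-z ]*?)[ \t]*:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE
)


def parse_fields(text):
    """Collect "Label: value" pairs from extracted text into a dict."""
    return dict(FIELD_PATTERN.findall(text))


def excel_path_for(pdf_path):
    """Return the .xlsx path that sits next to the given PDF."""
    return Path(pdf_path).with_suffix(".xlsx")


def extract_text(pdf_path):
    """Read the text layer with pdfplumber; fall back to OCR if it is empty."""
    import pdfplumber  # imported lazily so the helpers above work without it

    with pdfplumber.open(str(pdf_path)) as pdf:
        text = "\n".join(page.extract_text() or "" for page in pdf.pages)
    if text.strip():
        return text

    # No text layer (for example a scanned letter): render pages and run OCR.
    # pdf2image needs Poppler and pytesseract needs the Tesseract binary.
    from pdf2image import convert_from_path
    import pytesseract

    images = convert_from_path(str(pdf_path))
    return "\n".join(pytesseract.image_to_string(image) for image in images)


def convert(pdf_path):
    """Write one row of parsed fields to an Excel file and return its path."""
    import pandas as pd

    fields = parse_fields(extract_text(pdf_path))
    out_path = excel_path_for(pdf_path)
    pd.DataFrame([fields]).to_excel(out_path, index=False)  # uses openpyxl
    return out_path
```

Under these assumptions, calling `convert("letter.pdf")` would write `letter.xlsx` next to the input file. The third-party imports sit inside the functions so that the parsing and path helpers can be tried on their own before the full toolchain is installed.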
To use the application, follow these steps:
- Clone the repository:
git clone https://github.com/Pradyumna744/PDF-TO-EXCEL-PYTHON-FOR-ADVICE-LETTERS.git