
An efficient tool for extracting list items from PDFs into CSV and for merging CSV files, built on Python libraries.


GeroZayas/PDF-itemslist-extractor


📄 PDF Items List Extractor and CSV Utility Tool

A versatile tool designed to streamline the extraction of list items from PDF documents and the merging of CSV files, ensuring unique identification across datasets.

🛠️ Features

  • Extract Items from PDF: Convert list-like structures in PDF documents into structured CSV format.
  • Merge CSV Files: Combine multiple CSV files into a single file, maintaining unique IDs through a newly generated sequential ID column.

🖥️ Prerequisites

  • Python 3.6+
  • PyMuPDF (fitz)
  • Pandas
  • Typer

🚀 Installation

Clone the repository and install dependencies:

git clone https://github.com/GeroZayas/PDF-itemslist-extractor.git

cd PDF-itemslist-extractor

pip install -r requirements.txt

📝 Usage

Extract Items from PDF

python your_script_name.py extract_and_save ./path/to/your/pdf/file.pdf ./desired/output/path/
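For readers curious how such an extraction step might work, here is a minimal sketch, not the repository's actual code. It assumes a line counts as a list item when it starts with a bullet character or a number followed by "." or ")". The names `ITEM_PATTERN`, `parse_items`, and `extract_items_to_csv`, as well as the single `item` column, are hypothetical choices made for illustration.

```python
import csv
import re
from typing import List

# Assumption: an item line starts with a bullet (-, *, •) or a number
# followed by "." or ")", then whitespace, then the item text.
ITEM_PATTERN = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+(.*\S)\s*$")


def parse_items(text: str) -> List[str]:
    """Return the text of every line that looks like a list item."""
    items = []
    for line in text.splitlines():
        match = ITEM_PATTERN.match(line)
        if match:
            items.append(match.group(1))
    return items


def extract_items_to_csv(pdf_path: str, csv_path: str) -> int:
    """Read a PDF with PyMuPDF and write one CSV row per detected item."""
    import fitz  # PyMuPDF; imported lazily so parse_items works without it

    # Concatenate the plain text of every page in the document.
    with fitz.open(pdf_path) as doc:
        text = "\n".join(page.get_text() for page in doc)

    items = parse_items(text)
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["item"])  # hypothetical header name
        writer.writerows([item] for item in items)
    return len(items)
```

Keeping the text parsing separate from the PDF reading makes the item-detection rule easy to adjust for PDFs that use different list markers.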

Merge Multiple CSV Files

python your_script_name.py merge_csv_files ./file1.csv ./file2.csv ./merged_output.csv
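The merge step described under Features (combining files and generating a new sequential ID column) could look roughly like the sketch below. This is an illustration under stated assumptions, not the repository's implementation: the function signature, the column name `id`, and the decision to discard any `id` column already present in the inputs are all hypothetical.

```python
from typing import List

import pandas as pd


def merge_csv_files(input_paths: List[str], output_path: str) -> pd.DataFrame:
    """Concatenate CSV files and number the merged rows 1..N."""
    frames = [pd.read_csv(path) for path in input_paths]
    merged = pd.concat(frames, ignore_index=True)

    # Assumption: IDs from the source files may collide, so any existing
    # "id" column is dropped and replaced by a fresh sequential one.
    merged = merged.drop(columns=["id"], errors="ignore")
    merged.insert(0, "id", range(1, len(merged) + 1))

    merged.to_csv(output_path, index=False)
    return merged
```

Regenerating the ID after concatenation is what keeps identifiers unique even when each input file numbers its own rows from 1.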

📁 Example

Assuming you have a PDF named example.pdf and two CSV files named data1.csv and data2.csv, you can extract items from the PDF and merge the CSV files as follows:

python your_script_name.py extract_and_save ./example.pdf ./extracted_items.csv

python your_script_name.py merge_csv_files ./data1.csv ./data2.csv ./merged_data.csv

🎯 Contributing

Contributions are welcome! Feel free to submit a pull request or open an issue to discuss improvements or report bugs.

👤 Author

Gero Zayas - @gerozayas

📧 Contact

📧 gerozayas@gmail.com

🌐 Gero Zayas Portfolio
