PyAI Extract & Summarize provides a unified command-line interface to extract and summarize content from files like PDFs, images, spreadsheets, and CSVs.
It combines Python utilities with AI models powered by the OpenAI API, helping you quickly turn raw documents into clear, usable insights.
This repository is powered by: Py.ai
Clone this repository:
git clone git@github.com:cedsic/pyai-extract-summarize.git
cd pyai-extract-summarize
Create a virtual environment and install dependencies:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -e .
Some tools require OpenAI API access. Create a .env file in the root and add your API key:
OPENAI_API_KEY=your_api_key_here
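As an illustration of how a key stored this way can be picked up, here is a minimal sketch that reads KEY=VALUE lines from a .env file using only the standard library. The helper names `load_env_file` and `get_openai_api_key` are hypothetical, and this is not necessarily how PyAI Extract & Summarize loads its configuration.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_openai_api_key(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    return os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
```

Calling `get_openai_api_key()` from the project root returns the key if it is set in either place, and `None` otherwise, which makes a missing key easy to detect before any API call is attempted.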
For image OCR, make sure Tesseract is installed:
- Ubuntu/Debian:
sudo apt update
sudo apt install tesseract-ocr
- macOS:
brew install tesseract
- Windows:
Download and install Tesseract OCR, then add it to your PATH.
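To confirm that the installation steps above worked, a short check along these lines can help. It is only a sketch: `find_tesseract` is a hypothetical helper, and it assumes the executable is named `tesseract`, as in the install commands above.

```python
import shutil


def find_tesseract(binary_name="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(binary_name)


if __name__ == "__main__":
    path = find_tesseract()
    if path:
        print(f"Tesseract found at {path}")
    else:
        print("Tesseract not found on PATH; image OCR will not work")
```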
Access all tools through the unified CLI:
py-ai --help
The CLI provides two main commands:
- extract: Extract raw text or data from a file.
- summarize: Extract and then summarize content using AI.
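For readers curious how a two-command interface like this can be structured, the following is a minimal sketch using `argparse` subcommands. It is an assumption about the general shape, not the project's actual code: `build_parser` is a hypothetical name, and the `--max-chars` option is modeled as a plain integer because the README does not define its exact semantics.

```python
import argparse


def build_parser():
    """Build a parser with 'extract' and 'summarize' subcommands (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="py-ai",
        description="Extract and summarize content from files.",
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    extract = subparsers.add_parser("extract", help="Extract raw text or data from a file.")
    extract.add_argument("path", help="Path to the input file.")

    summarize = subparsers.add_parser(
        "summarize", help="Extract and then summarize content using AI."
    )
    summarize.add_argument("path", help="Path to the input file.")
    # Assumption: an integer character limit; the real tool defines what it applies to.
    summarize.add_argument("--max-chars", type=int, default=None, help="Character limit.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Keeping each command in its own subparser means options such as `--max-chars` are only accepted where they make sense.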
Example usage:
py-ai extract /path/to/file.pdf
py-ai summarize /path/to/file.xlsx --max-chars 800
- Extract text from a PDF
  Description: Extracts plain text from PDFs.
  CLI: py-ai extract /path/to/file.pdf
- Summarize a PDF using AI
  Description: Summarizes PDFs using AI models (OpenAI API). Useful for quickly understanding long documents.
  CLI: py-ai summarize /path/to/file.pdf --max-chars 800
- Extract text from an image
  Description: Extracts plain text from images using Python (OpenCV + pytesseract). Useful for scanned documents, screenshots, or any image containing text.
  CLI: py-ai extract /path/to/file.png
- Summarize an image using AI
  Description: First extracts text from an image, then summarizes it using AI models (OpenAI API). Ideal for quickly understanding text-heavy images.
  CLI: py-ai summarize /path/to/file.png --max-chars 800
- Extract text from an XLSX or CSV file
  Description: Extracts plain text from spreadsheets (XLSX or CSV). It combines all cell values into a readable format.
  CLI: py-ai extract /path/to/file.xlsx
       py-ai extract /path/to/file.csv
- Summarize an XLSX or CSV using AI
  Description: Summarizes the contents of XLSX or CSV files using AI models (OpenAI API). Great for generating quick overviews of large datasets or reports.
  CLI: py-ai summarize /path/to/file.xlsx --max-chars 800
       py-ai summarize /path/to/file.csv --max-chars 800
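The spreadsheet tools above are described as combining all cell values into a readable format. One way to picture that step for CSV input is the sketch below, which uses only the standard `csv` module. The function name `csv_to_text` and the " | " separator are assumptions made for illustration; the project's real output format may differ.

```python
import csv
import io


def csv_to_text(csv_source, cell_separator=" | "):
    """Join the cells of each row into one line, skipping rows with no content."""
    lines = []
    for row in csv.reader(csv_source):
        cells = [cell.strip() for cell in row]
        # Keep the row only if at least one cell is non-empty.
        if any(cells):
            lines.append(cell_separator.join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = io.StringIO("name,qty\napple,3\n\npear,5\n")
    print(csv_to_text(sample))
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `csv_to_text`, since the `csv` module expects to handle newlines itself.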
We welcome contributions! See Contributing for guidelines.
If you find issues or have suggestions, you can contact us.
This project is licensed under the MIT License.