Skip to content

A Python + AI command-line tool for extracting and summarizing text from documents (PDF, image, XLSX, CSV) using the OpenAI API.

License

Notifications You must be signed in to change notification settings

cedsic/pyai-extract-summarize

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PyAI Extract & Summarize

Python License

PyAI Extract & Summarize provides a unified command-line interface to extract and summarize content from files like PDFs, images, spreadsheets, and CSVs.
It combines Python utilities with AI models powered by the OpenAI API, helping you quickly turn raw documents into clear, usable insights.

This repository is powered by: Py.ai


Table of Contents


Installation

Clone this repository:

git clone git@github.com:cedsic/pyai-extract-summarize.git
cd py_ai

Create a virtual environment and install dependencies:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e .

Some tools require OpenAI API access. Create a .env file in the root and add your API key:

OPENAI_API_KEY=your_api_key_here

For image OCR, make sure Tesseract is installed:

  • Ubuntu/Debian:
sudo apt update
sudo apt install tesseract-ocr
  • MacOS:
brew install tesseract
  • Windows:
    Download and install from Tesseract OCR and add it to your PATH.

Usage

Access all tools through the unified CLI:

py-ai --help

The CLI provides two main commands:

  • extract: Extract raw text or data from a file.
  • summarize: Extract and then summarize content using AI.

Example usage:

py-ai extract /path/to/file.pdf
py-ai summarize /path/to/file.xlsx --max-chars 800

Tools

PDF Tools

  1. Extract text from a PDF
    Description: Extracts plain text from PDFs.
    CLI:

    py-ai extract /path/to/file.pdf
  2. Summarize a PDF using AI
    Description: Summarizes PDFs using AI models (We are using the OpenAI API). Useful for quickly understanding long documents.
    CLI:

    py-ai summarize /path/to/file.pdf --max-chars 800

Image Tools

  1. Extract text from an image
    Description: Extracts plain text from images using Python (OpenCV + pytesseract). Useful for scanned documents, screenshots, or any image containing text.
    CLI:

    py-ai extract /path/to/file.png
  2. Summarize an image using AI
    Description: First extracts text from an image, then summarizes it using AI models (OpenAI API). Ideal for quickly understanding text-heavy images.
    CLI:

    py-ai summarize /path/to/file.png --max-chars 800

XLSX/CSV Tools

  1. Extract text from an XLSX or CSV file
    Description: Extracts plain text from spreadsheets (XLSX or CSV). It combines all cell values into a readable format.
    CLI:

    py-ai extract /path/to/file.xlsx
    py-ai extract /path/to/file.csv
  2. Summarize an XLSX or CSV using AI
    Description: Summarizes the contents of XLSX or CSV files using AI models (OpenAI API). Great for generating quick overviews of large datasets or reports.
    CLI:

    py-ai summarize /path/to/file.xlsx --max-chars 800
    py-ai summarize /path/to/file.csv --max-chars 800

Contributing

We welcome contributions! See Contributing for guidelines.

If you find issues or have suggestions, you can contact us.

License

This project is licensed under the MIT License.

Links

This repository is powered by: Py.ai

About

A Python + AI command-line tool for extracting and summarizing text from documents (PDF, image, XLSX, CSV) using the OpenAI API.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages