PyAI Extract & Summarize

PyAI Extract & Summarize provides a unified command-line interface to extract and summarize content from files like PDFs, images, spreadsheets, and CSVs.
It combines Python utilities with AI models powered by the OpenAI API, helping you quickly turn raw documents into clear, usable insights.

This repository is powered by: Py.ai

Installation

Clone this repository:

git clone git@github.com:cedsic/pyai-extract-summarize.git
cd py_ai

Create a virtual environment and install dependencies:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e .

Some tools require OpenAI API access. Create a .env file in the root and add your API key:

OPENAI_API_KEY=your_api_key_here

For image OCR, make sure Tesseract is installed:

Ubuntu/Debian:

sudo apt update
sudo apt install tesseract-ocr

MacOS:

brew install tesseract

Windows:
Download and install from Tesseract OCR and add it to your PATH.

Usage

Access all tools through the unified CLI:

py-ai --help

The CLI provides two main commands:

extract: Extract raw text or data from a file.
summarize: Extract and then summarize content using AI.

Example usage:

py-ai extract /path/to/file.pdf
py-ai summarize /path/to/file.xlsx --max-chars 800

Tools

PDF Tools

Extract text from a PDF
Description: Extracts plain text from PDFs.
CLI:
```
py-ai extract /path/to/file.pdf
```
Summarize a PDF using AI
Description: Summarizes PDFs using AI models (We are using the OpenAI API). Useful for quickly understanding long documents.
CLI:
```
py-ai summarize /path/to/file.pdf --max-chars 800
```

Image Tools

Extract text from an image
Description: Extracts plain text from images using Python (OpenCV + pytesseract). Useful for scanned documents, screenshots, or any image containing text.
CLI:
```
py-ai extract /path/to/file.png
```
Summarize an image using AI
Description: First extracts text from an image, then summarizes it using AI models (OpenAI API). Ideal for quickly understanding text-heavy images.
CLI:
```
py-ai summarize /path/to/file.png --max-chars 800
```

XLSX/CSV Tools

Extract text from an XLSX or CSV file
Description: Extracts plain text from spreadsheets (XLSX or CSV). It combines all cell values into a readable format.
CLI:
```
py-ai extract /path/to/file.xlsx
py-ai extract /path/to/file.csv
```
Summarize an XLSX or CSV using AI
Description: Summarizes the contents of XLSX or CSV files using AI models (OpenAI API). Great for generating quick overviews of large datasets or reports.
CLI:
```
py-ai summarize /path/to/file.xlsx --max-chars 800
py-ai summarize /path/to/file.csv --max-chars 800
```

Contributing

We welcome contributions! See Contributing for guidelines.

If you find issues or have suggestions, you can contact us.

License

This project is licensed under the MIT License.

Links

This repository is powered by: Py.ai

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
py_ai		py_ai
tests		tests
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

PyAI Extract & Summarize

Table of Contents

Installation

Usage

Tools

PDF Tools

Image Tools

XLSX/CSV Tools

Contributing

License

Links

About

Uh oh!

Releases

Packages

Languages

License

cedsic/pyai-extract-summarize

Folders and files

Latest commit

History

Repository files navigation

PyAI Extract & Summarize

Table of Contents

Installation

Usage

Tools

PDF Tools

Image Tools

XLSX/CSV Tools

Contributing

License

Links

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages