Simple AI Processor

A config-driven script to process markdown files with AI and save the outputs. Perfect for GitHub Actions automation.

Features

  • Config-based: All settings in a single JSON file
  • Incremental processing: Tracks last processed file, only processes new entries
  • Ordered processing: Processes files from oldest to newest (by modification time)
  • GitHub Actions ready: No manual input required, can run fully automated
  • Auto-updating: Config file is updated after each batch is processed successfully

Setup

  1. Ensure you have the appropriate API key in your .env file or environment variables (a key-lookup sketch follows this list):
    • For OpenAI models (gpt-4o, gpt-4, etc.): OPENAI_API_KEY
    • For Claude models (claude-sonnet-4, etc.): ANTHROPIC_API_KEY
  2. Create a config file (see example below)
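
For illustration, here is a minimal sketch of how the key lookup could work, assuming the script reads .env via python-dotenv and picks the variable by model family; the helper get_api_key is hypothetical, not the script's actual API:

import os
from dotenv import load_dotenv

load_dotenv()  # read .env from the working directory, if present

def get_api_key(model: str) -> str:
    """Pick the env var matching the configured model family."""
    var = "ANTHROPIC_API_KEY" if model.startswith("claude") else "OPENAI_API_KEY"
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Missing {var} for model {model!r}")
    return key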

Config File Format

Create a JSON config file with the following fields:

{
  "input_folder": "journals",
  "output_folder": "ai_outputs",
  "prompt": "Summarize the main themes and insights from this journal entry.",
  "model": "gpt-4o",
  "max_batch_size": null,
  "last_processed_file": null
}

Config Fields

  • input_folder (required): Path to folder containing markdown files to process (searches recursively)
  • output_folder (required): Where to save AI-generated outputs
  • prompt / prompt_file / prompt_files (required): The instruction(s) to send to the AI (a resolution sketch follows this list). Can be:
    • "prompt": A single inline prompt string
    • "prompt_file": A single file path containing the prompt
    • "prompt_files": An array of prompts/files to combine (e.g., ["prompts/base.txt", "prompts/format.txt"])
  • model (optional): AI model to use. Supports OpenAI models (gpt-4o, gpt-4, etc.) and Claude models (claude-sonnet-4-20250514, etc.). Default: gpt-4o
  • max_batch_size (optional): Number of files to concatenate and send to the AI in a single request. Set to null or 1 to process files individually. Useful for analyzing multiple entries together or for controlling API costs. For example, a value of 3 sends three files to the AI as one combined request.
  • last_processed_file (auto-managed): Path to the last successfully processed file. Set to null to process all files.
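
Since prompt, prompt_file, and prompt_files are alternatives, it may help to see how they could collapse into a single string. The sketch below is an assumption about the merging behavior, not the script's actual code; resolve_prompt is a hypothetical name:

from pathlib import Path

def resolve_prompt(config: dict) -> str:
    if "prompt" in config:
        return config["prompt"]
    if "prompt_file" in config:
        return Path(config["prompt_file"]).read_text()
    if "prompt_files" in config:
        # Entries may be inline prompts or file paths; treat entries that
        # exist on disk as files and everything else as literal text.
        parts = []
        for item in config["prompt_files"]:
            p = Path(item)
            parts.append(p.read_text() if p.is_file() else item)
        return "\n\n".join(parts)
    raise ValueError("Config needs prompt, prompt_file, or prompt_files")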

Usage

Basic Example

python3 simple_ai_processor.py --config processor_config.json

How It Works

  1. Script reads the config file
  2. Finds all .md files in input_folder (recursively)
  3. Sorts files by modification time (oldest first)
  4. Skips files up to and including last_processed_file
  5. Groups remaining files into batches of size max_batch_size (or processes individually if not set)
  6. For each batch:
    • Concatenates all files in the batch together with separators
    • Sends the combined content to the AI model in a single request
    • Saves output to output_folder (filename based on batch, e.g., 2024-01-10_to_2024-01-12.txt)
    • Updates last_processed_file to the last file in the batch
  7. If an error occurs, processing stops and config is not updated (allows retry)
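
Condensed into code, the flow might look like the sketch below. run_ai and save_output are placeholders for the model call and output writing, and resolve_prompt is the hypothetical helper sketched earlier; only the control flow is meant to mirror the steps above:

import json
from pathlib import Path

def process(config_path: str) -> None:
    config = json.loads(Path(config_path).read_text())
    files = sorted(Path(config["input_folder"]).rglob("*.md"),
                   key=lambda p: p.stat().st_mtime)  # oldest first

    last = config.get("last_processed_file")
    if last:  # skip everything up to and including the last processed file
        paths = [str(p) for p in files]
        if last in paths:
            files = files[paths.index(last) + 1:]

    size = config.get("max_batch_size") or 1
    for i in range(0, len(files), size):
        batch = files[i:i + size]
        combined = "\n\n---\n\n".join(p.read_text() for p in batch)
        # run_ai / save_output are placeholder helpers (see note above)
        output = run_ai(config["model"], resolve_prompt(config), combined)
        save_output(config["output_folder"], batch, output)
        # Persist progress only after the batch succeeds, so a failed
        # run leaves the config untouched and can simply be retried.
        config["last_processed_file"] = str(batch[-1])
        Path(config_path).write_text(json.dumps(config, indent=2))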

Output Files

Output filenames depend on batch size:

  • Single file (max_batch_size = 1 or null): Uses the input file's name with a .txt extension
    • Input: journals/2024-01-15.md
    • Output: ai_outputs/2024-01-15.txt
  • Multiple files (max_batch_size > 1): Uses a range format built from the first and last file names
    • Input: journals/2024-01-10.md, journals/2024-01-12.md, journals/2024-01-15.md
    • Output: ai_outputs/2024-01-10_to_2024-01-15.txt
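
The naming rule itself is simple; here is a sketch (derive_output_name is a hypothetical helper, not part of the script's documented API):

from pathlib import Path

def derive_output_name(batch: list[Path]) -> str:
    if len(batch) == 1:
        # 2024-01-15.md -> 2024-01-15.txt
        return batch[0].stem + ".txt"
    # first and last stems joined with "_to_"
    return f"{batch[0].stem}_to_{batch[-1].stem}.txt"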

GitHub Actions Integration

This script is designed to work in GitHub Actions:

- name: Process new markdown files
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  run: |
    python3 simple_ai_processor.py --config processor_config.json

Note: Only include the API key(s) that match the model specified in your config.

The config file can be committed to the repo, and last_processed_file will be updated automatically as part of the workflow.

Example Workflows

Example 1: Individual Processing (max_batch_size = null or 1)

First run (config has "last_processed_file": null):

Found 3 file(s) to process
Processing: journals/2024-01-10.md
✓ Saved output to: ai_outputs/2024-01-10.txt
Processing: journals/2024-01-15.md
✓ Saved output to: ai_outputs/2024-01-15.txt
Processing: journals/2024-01-20.md
✓ Saved output to: ai_outputs/2024-01-20.txt

Example 2: Batch Processing (max_batch_size = 3)

First run (config has "last_processed_file": null and "max_batch_size": 3):

Found 5 file(s) to process

Processing batch of 3 files:
  - journals/2024-01-10.md
  - journals/2024-01-15.md
  - journals/2024-01-20.md
✓ Saved output to: ai_outputs/2024-01-10_to_2024-01-20.txt

Processing batch of 2 files:
  - journals/2024-01-25.md
  - journals/2024-01-30.md
✓ Saved output to: ai_outputs/2024-01-25_to_2024-01-30.txt

Second run (no new files):

No new files to process.
