Markdown Agent

A Python-based tool (requires Python 3.10+) that extracts content from various document formats (PDF, DOCX, etc.) using MarkItDown and refines the resulting Markdown using Google AI Studio (gemini-3.1-flash-lite-preview).

Warning

Data Privacy Notice: When using this tool, your document's Markdown content is sent to Google's AI Studio servers for processing. Do not use this tool with highly sensitive or strictly confidential data if your organization's policy prohibits sharing data with third-party LLM providers.

Features

Document Extraction: Uses markitdown to convert complex documents into raw Markdown.
LLM Cleanup: Leverages Gemini 3.1 Flash Lite via Google AI Studio to fix broken tables, inconsistent headings, and formatting artifacts.
Safety & Validation: Validates input and output paths and handles API and file errors with specific exceptions.
Easy Configuration: Reads settings from environment variables, which can be kept in a .env file.

Prerequisites

Python 3.10+
A Google AI Studio API key.

Installation

From Source

Clone the repository and install the package in editable mode:

git clone <GITHUB_REPO_URL>
cd markdown-agent
pip install -e .

Conda

Create and activate a ready-to-use environment from the provided environment.yml:

conda env create -f environment.yml
conda activate markdown-agent

Setup

1. Obtain Google AI Studio API Key

Go to Google AI Studio.
Click on "Get API key" in the left sidebar.
Click "Create API key in new project" (or use an existing one).
Copy your key.

2. Configure Environment Variables

Create a .env file in your working directory and add your Google API key:

GOOGLE_AI_STUDIO_KEY=your_google_api_key_here

The mda command will automatically load this file on startup.
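
For readers who want to see what loading the key involves, here is a minimal standard-library sketch. It is not the tool's actual loader, which this README does not show. The helper names `read_env_file` and `get_api_key` are hypothetical, and the precedence (process environment first, then the .env file) is an assumption.

```python
import os
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(env_path: Path = Path(".env")) -> str:
    """Return the key from the process environment, else from the .env file."""
    key = os.environ.get("GOOGLE_AI_STUDIO_KEY")
    if not key and env_path.is_file():
        key = read_env_file(env_path).get("GOOGLE_AI_STUDIO_KEY")
    if not key:
        raise RuntimeError("GOOGLE_AI_STUDIO_KEY is not set")
    return key
```

Calling `get_api_key()` from the directory that contains the .env file returns the configured key, or raises a `RuntimeError` when the variable is missing from both places.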

Usage

mda path/to/document.pdf -o path/to/output.md

Arguments

| Argument | Description |
| --- | --- |
| `input` | Path to the source document (PDF, DOCX, etc.) |
| `-o`, `--output` | Path to the output Markdown file (parent directory is created automatically) |
| `-v`, `--verbose` | Enable verbose (DEBUG) logging |
| `-s`, `--silent` | Disable all but ERROR logging |
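
The arguments above map naturally onto an `argparse` parser. The following is a hypothetical reconstruction for illustration, not the code in markdown_agent/cli.py. Treating `-v` and `-s` as mutually exclusive, leaving `-o` optional, and defaulting to INFO logging are all assumptions. The names `build_parser`, `log_level`, and `prepare_output` are invented for this sketch.

```python
import argparse
import logging
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mda")
    parser.add_argument("input", type=Path,
                        help="Path to the source document (PDF, DOCX, etc.)")
    parser.add_argument("-o", "--output", type=Path,
                        help="Path to the output Markdown file")
    # Assumption: the two logging flags cannot be combined.
    noise = parser.add_mutually_exclusive_group()
    noise.add_argument("-v", "--verbose", action="store_true",
                       help="Enable verbose (DEBUG) logging")
    noise.add_argument("-s", "--silent", action="store_true",
                       help="Disable all but ERROR logging")
    return parser


def log_level(args: argparse.Namespace) -> int:
    """Translate the flags into a logging level (INFO default is assumed)."""
    if args.verbose:
        return logging.DEBUG
    if args.silent:
        return logging.ERROR
    return logging.INFO


def prepare_output(path: Path) -> Path:
    """Create the parent directory of the output file if it is missing."""
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

For example, `build_parser().parse_args(["report.pdf", "-o", "out/report.md", "-v"])` yields a namespace whose `input` and `output` are `Path` objects and for which `log_level` returns `logging.DEBUG`.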

Reliability & Troubleshooting

Rate Limits

This tool uses Gemini 3.1 Flash Lite Preview via Google AI Studio, which has the following free-tier quotas:

RPM (Requests Per Minute): 15
TPM (Tokens Per Minute): 250,000
RPD (Requests Per Day): 500

Note

Requests that exceed any of these limits fail with a 429: Too Many Requests error.
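
When converting many documents in a row, it can help to throttle requests on the client side. At 15 RPM the sustainable average is one request every 60 / 15 = 4 seconds. The sketch below is one possible sliding-window limiter, not something the tool is known to include. The class name `SlidingWindowLimiter` is hypothetical, and it covers only the per-minute request limit, not TPM or RPD.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_calls` requests within any `period`-second window."""

    def __init__(self, max_calls: int = 15, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps: deque[float] = deque()

    def acquire(self) -> float:
        """Block until a request slot is free; return the seconds waited."""
        now = self._clock()
        # Forget requests that have already left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        waited = 0.0
        if len(self._stamps) >= self.max_calls:
            # Wait until the oldest request in the window expires.
            waited = self.period - (now - self._stamps[0])
            self._sleep(waited)
            self._stamps.popleft()
        self._stamps.append(self._clock())
        return waited
```

Calling `limiter.acquire()` before each API request lets the first 15 calls through immediately and delays later calls until the oldest request in the window is a full minute old. The `clock` and `sleep` parameters exist so the behavior can be tested without real waiting.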

Common Issues

API Errors (503/429): The Gemini API may occasionally return service errors (503) or rate-limit errors (429). If you encounter one, or a connection error, wait a few seconds and retry.
Token Limits: The model can handle large contexts (up to 1M tokens), but extremely long documents might still hit limits or process more slowly.
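
The "wait a few seconds and retry" advice can be automated with exponential backoff. This is a minimal sketch under stated assumptions: `TransientAPIError` is a hypothetical stand-in for whatever exception the real client raises on a 429 or 503, and the defaults (5 attempts, 2-second base delay, up to 1 second of jitter) are illustrative choices, not values taken from the tool.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientAPIError(Exception):
    """Hypothetical stand-in for an HTTP error response from the API."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_retry(func: Callable[[], T], max_attempts: int = 5,
                    base_delay: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `func`, retrying on 429/503 with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientAPIError as exc:
            # Give up on non-retryable statuses or after the last attempt.
            if exc.status not in (429, 503) or attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 2 s, 4 s, 8 s, ...
            sleep(delay + random.uniform(0, 1))  # jitter spreads out retries
    raise RuntimeError("max_attempts must be at least 1")
```

Wrapping the API call as `call_with_retry(lambda: send_request(doc))`, where `send_request` is a placeholder for the real call, retries transient failures and re-raises anything else unchanged.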

Project Structure

markdown_agent/cli.py: Core logic for extraction, LLM processing, and validation.
pyproject.toml: Package configuration and dependencies.
environment.yml: Conda environment definition.
.env: Environment variable configuration (not committed to git).
