## [OmniParse](https://github.com/adithya-s-k/omniparse)
Seamlessly ingest any data and get structured, actionable output.

![OmniParse](https://raw.githubusercontent.com/adithya-s-k/omniparse/main/docs/assets/hero_image.png)
[![GitHub Stars](https://img.shields.io/github/stars/adithya-s-k/omniparse?style=social)](https://github.com/adithya-s-k/omniparse/stargazers)
[![GitHub Forks](https://img.shields.io/github/forks/adithya-s-k/omniparse?style=social)](https://github.com/adithya-s-k/omniparse/network/members)
[![GitHub Issues](https://img.shields.io/github/issues/adithya-s-k/omniparse)](https://github.com/adithya-s-k/omniparse/issues)
[![GitHub Pull Requests](https://img.shields.io/github/issues-pr/adithya-s-k/omniparse)](https://github.com/adithya-s-k/omniparse/pulls)
[![License](https://img.shields.io/github/license/adithya-s-k/omniparse)](https://github.com/adithya-s-k/omniparse/blob/main/LICENSE)



## Features
✅ Completely local, no external APIs  
✅ Supports 10+ file types  
✅ Convert documents, multimedia, and web pages to high-quality structured markdown  
✅ Table extraction, image extraction/captioning, audio/video transcription, web page crawling  
✅ Easily deployable using Docker and SkyPilot  
✅ Colab friendly  

### Problem Statement:
It's challenging to process data as it comes in different shapes and sizes. OmniParse aims to be an ingestion/parsing platform where you can ingest any type of data, such as documents, images, audio, video, and web content, and get the most structured and actionable output that is GenAI (LLM) friendly.

## Coming Soon
⭐ Dynamic chunking and structured data extraction based on a specified schema  
🛠️ One magic API: just feed in your file and a prompt describing what you want, and we will take care of the rest  
🔧 Dynamic model selection and support for external APIs  
📄 Batch processing for handling multiple files at once  
🦙 New open-source model to replace Surya OCR and Marker  

**Final goal** - replace all the different models currently being used with a single multimodal model that parses any type of data and returns the data you need

📄 - [Documentation](https://docs.cognitivelab.in/) \
Created by [Adithya](https://x.com/adithya_s_k).

| Original PDF                                                                                                                                                                               | OmniParse-API                                                                                                                                                                           | PyPDF                                                                                                                                                               |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [![Original PDF](https://github.com/adithya-s-k/marker-api/raw/master/data/images/original\_pdf.png)](https://github.com/adithya-s-k/marker-api/blob/master/data/images/original\_pdf.png) | [![OmniParse-API](https://github.com/adithya-s-k/marker-api/raw/master/data/images/marker\_api.png)](https://github.com/adithya-s-k/marker-api/blob/master/data/images/marker\_api.png) | [![PyPDF](https://github.com/adithya-s-k/marker-api/raw/master/data/images/pypdf.png)](https://github.com/adithya-s-k/marker-api/blob/master/data/images/pypdf.png) |

In [21]:
## Clone the repository

!git clone https://github.com/adithya-s-k/omniparse.git
%cd omniparse
%pwd

Cloning into 'omniparse'...
remote: Enumerating objects: 657, done.
remote: Counting objects: 100% (122/122), done.
remote: Compressing objects: 100% (48/48), done.
remote: Total 657 (delta 92), reused 74 (delta 74), pack-reused 535 (from 1)
Receiving objects: 100% (657/657), 619.27 KiB | 18.21 MiB/s, done.
Resolving deltas: 100% (336/336), done.
/content/omniparse/omniparse/omniparse


'/content/omniparse/omniparse/omniparse'

In [22]:
## Install dependencies
## if you get a restart session warning you can ignore it

%pip install -e .

Obtaining file:///content/omniparse/omniparse/omniparse
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting flash-attn<3.0.0,>=2.5.9 (from omniparse==0.0.1)
  Using cached flash_attn-2.7.4.post1-cp311-cp311-linux_x86_64.whl
Collecting markupsafe~=2.0 (from gradio<5.0.0,>=4.37.1->omniparse==0.0.1)
  Using cached MarkupSafe-2.1.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)
Collecting triton<3,>=2.0.0 (from openai-whisper<20231118,>=20231117->omniparse==0.0.1)
  Using cached triton-2.3.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.4 kB)
INFO: pip is looking at multiple versions of torch to determine which version is compatible with other requirements. This could take a while.
Collecting torch<3.0.0,>=2.2.2 (from omnipa

In [23]:
%pip install transformers==4.41.2



In [24]:
# Update and install necessary packages
!apt-get update && apt-get install -y --no-install-recommends \
    wget \
    curl \
    unzip \
    git \
    libgl1 \
    libglib2.0-0 \
    gnupg2 \
    ca-certificates \
    apt-transport-https \
    software-properties-common \
    libreoffice \
    ffmpeg \
    git-lfs \
    xvfb \
    && ln -sf /usr/bin/python3 /usr/bin/python \
    && curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash \
    && wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | apt-key add - \
    && echo "deb http://dl.google.com/linux/chrome/deb/ stable main" > /etc/apt/sources.list.d/google-chrome.list \
    && apt-get update \
    && apt-get install -y --no-install-recommends python3-packaging \
    && apt-get install -y --no-install-recommends google-chrome-stable \
    && rm -rf /var/lib/apt/lists/*

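# Download and install ChromeDriver
# Note: this legacy endpoint appears to stop at ChromeDriver 114, so the driver may not match a newer google-chrome-stable.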
!CHROMEDRIVER_VERSION=$(curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE) && \
    wget -q -N https://chromedriver.storage.googleapis.com/$CHROMEDRIVER_VERSION/chromedriver_linux64.zip -P /tmp && \
    unzip -o /tmp/chromedriver_linux64.zip -d /tmp && \
    mv /tmp/chromedriver /usr/local/bin/chromedriver && \
    chmod +x /usr/local/bin/chromedriver && \
    rm /tmp/chromedriver_linux64.zip

# Set environment variables
import os
os.environ['CHROME_BIN'] = '/usr/bin/google-chrome'
os.environ['CHROMEDRIVER'] = '/usr/local/bin/chromedriver'
os.environ['DISPLAY'] = ':99'
os.environ['DBUS_SESSION_BUS_ADDRESS'] = '/dev/null'
os.environ['PYTHONUNBUFFERED'] = '1'

print("✅ Set up complete")

Hit:1 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:2 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:3 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:5 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:6 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:7 https://dl.google.com/linux/chrome/deb stable InRelease
Hit:8 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:9 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Hit:10 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:11 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Hit:12 https://packagecloud.io/github/
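
Before starting the server, it can help to confirm that the system tools installed above are actually on `PATH`. The check below is a minimal sketch: the helper `missing_tools` is hypothetical, and the executable names in `REQUIRED` are assumptions based on the packages and paths used in the setup cell.

```python
import shutil

def missing_tools(names):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]

# Assumed executable names for the packages installed in the setup cell.
REQUIRED = ["google-chrome", "chromedriver", "ffmpeg", "libreoffice", "git-lfs"]

missing = missing_tools(REQUIRED)
print("Missing tools:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, re-running the setup cell and reading its full output is usually the quickest way to find the failing step.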

### Using Cloudflare tunnels (Recommended)
Once the server is running and the Cloudflare tunnel URL has been printed, open `/docs` on that URL to browse all the API endpoints.

In [None]:
!wget https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
!dpkg -i cloudflared-linux-amd64.deb

import subprocess
import threading
import time
import socket

def iframe_thread(port):
    # Poll until the OmniParse server accepts connections on the given port.
    while True:
        time.sleep(0.5)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            if sock.connect_ex(('127.0.0.1', port)) == 0:
                break
    print("\nOmniParse API finished loading, trying to launch cloudflared (if it gets stuck here cloudflared is having issues)\n")

    # cloudflared writes its log, including the public URL, to stderr.
    p = subprocess.Popen(["cloudflared", "tunnel", "--url", "http://127.0.0.1:{}".format(port)], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    for line in p.stderr:
        text = line.decode()
        if "trycloudflare.com " in text:
            print("This is the URL to access OmniParse:", text[text.find("http"):], end='')
        # print(text, end='')  # uncomment to see the full cloudflared log


threading.Thread(target=iframe_thread, daemon=True, args=(8000,)).start()

!python server.py --host 127.0.0.1 --port 8000 --documents --media --web

--2025-04-18 07:16:13--  https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/cloudflare/cloudflared/releases/download/2025.4.0/cloudflared-linux-amd64.deb [following]
--2025-04-18 07:16:14--  https://github.com/cloudflare/cloudflared/releases/download/2025.4.0/cloudflared-linux-amd64.deb
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/106867604/d7e7703c-c0be-4512-b40f-145c402e03fd?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20250418%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250418T071614Z&X-Amz-Expires=300&X-Amz-Signature=ab9acd7a1240afc5b96e3b620dd1a49aa3dbab5609b3b7c8a07856f002ab
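
The `/docs` page can also be queried programmatically. The sketch below assumes the server is a FastAPI app that serves its schema at the default `/openapi.json` path; that is an assumption suggested by the `/docs` route, not something this notebook confirms. The helpers `fetch_openapi` and `list_endpoints` are hypothetical names introduced for illustration.

```python
import json
import urllib.request

# HTTP methods that can appear as keys of an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}

def fetch_openapi(base_url, timeout=10):
    """Download the OpenAPI schema (assumed to live at /openapi.json)."""
    url = base_url.rstrip("/") + "/openapi.json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

def list_endpoints(spec):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    endpoints = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for method in sorted(item):
            if method in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return endpoints

# Example usage once the server from the cell above is running:
# for method, path in list_endpoints(fetch_openapi("http://127.0.0.1:8000")):
#     print(method, path)
```

The same call should work against the printed `trycloudflare.com` URL, since the tunnel forwards to the local port.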

In [30]:
# Make sure to use the correct CUDA version (e.g., CUDA 11.8)
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Or install the latest stable PyTorch
!pip install -U torch

Looking in indexes: https://download.pytorch.org/whl/cu118
Collecting torch
  Using cached torch-2.6.0-cp311-cp311-manylinux1_x86_64.whl.metadata (28 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Using cached nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)
  Using cached nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch)
  Using cached nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)
  Using cached nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch)
  Using cached nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch)
  Using cached nvidia_cufft_

In [33]:
# Uninstall the existing version
!pip uninstall flash-attn

# Install with build isolation disabled so the build uses the PyTorch in the current environment
!pip install flash-attn --no-build-isolation

Found existing installation: flash_attn 2.7.4.post1
Uninstalling flash_attn-2.7.4.post1:
  Would remove:
    /usr/local/lib/python3.11/dist-packages/flash_attn-2.7.4.post1.dist-info/*
    /usr/local/lib/python3.11/dist-packages/flash_attn/*
    /usr/local/lib/python3.11/dist-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so
    /usr/local/lib/python3.11/dist-packages/hopper/*
Proceed (Y/n)? y
  Successfully uninstalled flash_attn-2.7.4.post1
Collecting flash-attn
  Using cached flash_attn-2.7.4.post1-cp311-cp311-linux_x86_64.whl
Installing collected packages: flash-attn
Successfully installed flash-attn-2.7.4.post1
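
After reinstalling `torch` and `flash-attn`, it may be worth confirming which versions actually ended up in the environment before launching the server. This is a minimal sketch using only the standard library; the helper `installed_versions` is a hypothetical name, and the package list simply mirrors the packages installed in the cells above.

```python
from importlib import metadata

def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(["torch", "flash-attn", "transformers"]).items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

This reads package metadata only, so it does not import `torch` or touch the GPU.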


### Forward using localtunnel

In [None]:
!npm install -g localtunnel

import subprocess
import threading
import time
import socket
import urllib.request

def iframe_thread(port):
    # Poll until the OmniParse server accepts connections on the given port.
    while True:
        time.sleep(0.5)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            if sock.connect_ex(('127.0.0.1', port)) == 0:
                break
    print("\nOmniParse finished loading, trying to launch localtunnel (if it gets stuck here localtunnel is having issues)\n")

    # The public IP printed here is what localtunnel asks for as the password.
    print("The password/endpoint IP for localtunnel is:", urllib.request.urlopen('https://ipv4.icanhazip.com').read().decode('utf8').strip("\n"))
    p = subprocess.Popen(["lt", "--port", "{}".format(port)], stdout=subprocess.PIPE)
    for line in p.stdout:
        print(line.decode(), end='')


threading.Thread(target=iframe_thread, daemon=True, args=(8000,)).start()

!python server.py --host 127.0.0.1 --port 8000 --documents --media --web
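
Both tunnel cells wait for the server by polling the port with no upper bound, so a server that fails to start leaves the thread waiting forever. A variant with a timeout might look like the sketch below; `wait_for_port` and its default values are hypothetical and not part of OmniParse.

```python
import socket
import time

def wait_for_port(port, host="127.0.0.1", timeout=60.0, interval=0.5):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(interval)
            if sock.connect_ex((host, port)) == 0:
                return True
        time.sleep(interval)
    return False

# Example usage inside a tunnel thread:
# if not wait_for_port(8000, timeout=300):
#     print("Server did not come up in time; check the server log above.")
```

Model loading can take several minutes on a fresh runtime, so a generous timeout is probably the safer choice.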