ChatPDF

Chat with your own PDF files 😀

Demo version of an end-to-end service for analyzing PDF documents using LLMs.

This is a two-stage solution. First, robust OCR engineering with DocTR generates the dataset. Then an LLM is fine-tuned using LangChain and OpenAI. Finally, the chat is exposed through FastAPI.

OCR Engineering

Chat using FastAPI

System requirements

Ubuntu 20
Python >=3.10

Getting Started

Clone the repo
Create and activate a virtual environment

python3 -m venv .venv
source .venv/bin/activate

Install dependencies

python3 -m pip install --upgrade pip setuptools wheel
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install "python-doctr[torch]"
pip install langchain
pip install -r requirements.txt
export PYTHONPATH="${PYTHONPATH}:${PWD}"

Set your OpenAI key in the .env file: OPENAI_API_KEY="your-open-ai-key"
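To check that the key is picked up before starting the service, something like the following can help. It is a minimal sketch using only the standard library. The helper names load_env_file and get_openai_key are hypothetical and are not part of this repo, which may load the .env file in a different way.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, as in OPENAI_API_KEY="...".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_openai_key(path=".env"):
    """Return the key from the environment, falling back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to the .env file")
    return key
```

Calling get_openai_key() from the repo root raises a RuntimeError when the key is missing, so a misconfigured environment is caught before any request is sent to OpenAI.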

Run Demo

Run Demo OCR - Engineering

python3 src/main.py showocr

Run OCR engineering on the default project (Amazon report - 2022)

python3 src/main.py ocrengineering

Run FastAPI Endpoints

uvicorn app.main:app --port 5000

Open the interactive API docs

http://127.0.0.1:5000/docs

Train endpoint

http://127.0.0.1:5000/docs#/chatpdf/train_chatpdf_route_train_chatpdf_post

POST the project name anual_report

Chat with files

http://127.0.0.1:5000/docs#/chatpdf/chatpdf_route_chatpdf_post
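The two endpoints can also be called from a script instead of the docs page. The sketch below uses only the standard library and rests on assumptions: the paths /train_chatpdf and /chatpdf are inferred from the operation names in the docs URLs above, and the JSON field names "project" and "question" are hypothetical. Check the schema shown at /docs for the real request bodies.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"


def build_post(path, payload, base_url=BASE_URL):
    """Build a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Assumed paths and hypothetical field names; verify both against /docs.
train_request = build_post("/train_chatpdf", {"project": "anual_report"})
chat_request = build_post(
    "/chatpdf",
    {"project": "anual_report", "question": "What was the net revenue?"},
)

# With the uvicorn server running, the requests can be sent like this:
# print(send(train_request))
# print(send(chat_request))
```

Building the request separately from sending it makes it possible to inspect the URL and body before the server is started.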

Use with your own PDF file

Manually place your PDF files in this structure:

chatpdf

    +--data/
        +--projects/
            +--project_name/
                +--documents/
                    1-file.pdf
                    ....
                    n-file.pdf
                +--text_files/

The OCR dataloaders search for PDF files in the documents folder and write the generated text files to the text_files folder.
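The folder layout above can be created with a short script rather than by hand. This is a sketch, not code from the repo: create_project and expected_outputs are hypothetical helpers, and the one-.txt-per-PDF naming is an assumption about how the loaders name their output.

```python
from pathlib import Path


def create_project(root, project_name):
    """Create data/projects/<project_name>/{documents,text_files} under root."""
    project_dir = Path(root) / "data" / "projects" / project_name
    documents = project_dir / "documents"
    text_files = project_dir / "text_files"
    documents.mkdir(parents=True, exist_ok=True)
    text_files.mkdir(parents=True, exist_ok=True)
    return documents, text_files


def expected_outputs(documents, text_files):
    """Map each PDF in documents to a text file path in text_files."""
    # The .txt naming is an assumption; the real loaders may name files differently.
    return {
        pdf: text_files / (pdf.stem + ".txt")
        for pdf in sorted(documents.glob("*.pdf"))
    }
```

For example, create_project(".", "project_name") run from the repo root builds the two folders, after which the PDF files can be copied into documents.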
