Crismarquez/chatpdf

ChatPDF

Fastapi

Chat with your own PDF files 😀

Demo version of an end-to-end service for analyzing PDF documents using LLMs.

This is a two-stage solution. First, robust OCR engineering with DocTR generates the dataset. Then an LLM is fine-tuned using LangChain and OpenAI. Finally, the chat is exposed through FastAPI.

OCR Engineering

[Image: OCR engineering]

Chat using FastAPI

[Images: chat using FastAPI]

System requirements

  • Ubuntu 20
  • Python >=3.10

Getting Started

  1. Clone the repo

  2. Create and activate a virtual environment

python3 -m venv .venv
source .venv/bin/activate

  3. Install dependencies

python3 -m pip install --upgrade pip setuptools wheel
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
pip install "python-doctr[torch]"
pip install langchain
pip install -r requirements.txt
export PYTHONPATH="${PYTHONPATH}:${PWD}"

  4. Set your OpenAI key in a .env file: OPENAI_API_KEY="your-open-ai-key"
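
Before starting the services, it can help to confirm that the key is actually readable. The following is a minimal sketch using only the standard library; `load_env_file` and `require_openai_key` are hypothetical helper names, not part of this repo, and the project may load the .env file through a different mechanism.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse KEY=value lines from a .env file, skipping comments and blank lines."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, as in OPENAI_API_KEY="your-open-ai-key".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_openai_key(path: str = ".env") -> str:
    """Return the key from the environment or the .env file, or raise if missing."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to the .env file")
    return key
```

Calling `require_openai_key()` from the repo root fails early with a clear message instead of failing later inside an OpenAI call.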

Run Demo

Run Demo OCR - Engineering

python3 src/main.py showocr

Run OCR engineering on the default project (Amazon report, 2022)

python3 src/main.py ocrengineering

Run FastAPI Endpoints

uvicorn app.main:app --port 5000

Open the interactive docs

http://127.0.0.1:5000/docs
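
To check from a script that the server started by uvicorn is answering, a small probe like the one below may be enough. This is a sketch under the assumption that the docs page returns HTTP 200 once the app is up; `docs_is_up` is a hypothetical helper name.

```python
import urllib.request

DOCS_URL = "http://127.0.0.1:5000/docs"  # docs URL given in this README


def docs_is_up(url: str = DOCS_URL, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200, False if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts all derive from OSError.
        return False
```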

Train endpoint

http://127.0.0.1:5000/docs#/chatpdf/train_chatpdf_route_train_chatpdf_post

Post the project anual_report to this endpoint.
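
The same call can be made outside the docs page. The sketch below builds the POST request with the standard library. The route path `/train_chatpdf` is only inferred from the docs anchor above, and the `project_name` body field is a hypothetical name; check the schema shown at /docs before relying on either.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"


def build_train_request(project: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the train endpoint."""
    # ASSUMPTION: the path and the "project_name" field are guesses, not confirmed by the repo.
    body = json.dumps({"project_name": project}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/train_chatpdf",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 600.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `send(build_train_request("anual_report"))` would submit the project named above.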

Chat with files

http://127.0.0.1:5000/docs#/chatpdf/chatpdf_route_chatpdf_post
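
A question can also be sent from a script. This is a sketch only: the `/chatpdf` path is inferred from the docs anchor, the `project_name` and `question` fields are hypothetical, and the response is assumed to be JSON, which is FastAPI's default.

```python
import json
import urllib.request

CHAT_URL = "http://127.0.0.1:5000/chatpdf"  # ASSUMPTION: path inferred from the docs anchor


def chat_payload(project: str, question: str) -> bytes:
    """Encode the request body for one question."""
    # ASSUMPTION: both field names are hypothetical; confirm them at /docs.
    if not question.strip():
        raise ValueError("question must not be empty")
    return json.dumps({"project_name": project, "question": question}).encode("utf-8")


def ask(project: str, question: str, url: str = CHAT_URL, timeout: float = 60.0):
    """POST one question and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=chat_payload(project, question),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```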

Use with your own pdf file

Manually place your PDF files in this structure:

chatpdf

    +--data/
        +--projects/
            +--project_name/
                +--documents/
                    1-file.pdf
                    ....
                    n-file.pdf
                +--text_files/
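
The folders above can also be created from a script. A minimal sketch with `pathlib`, where `create_project` and `list_pdfs` are hypothetical helper names and `root` is the path to the cloned chatpdf repo:

```python
from pathlib import Path


def create_project(root: str, project_name: str) -> Path:
    """Create data/projects/<project_name>/{documents,text_files} under root."""
    project_dir = Path(root) / "data" / "projects" / project_name
    for sub in ("documents", "text_files"):
        (project_dir / sub).mkdir(parents=True, exist_ok=True)
    return project_dir


def list_pdfs(project_dir: Path) -> list[Path]:
    """Return the PDF files in the project's documents folder, sorted by name."""
    return sorted((project_dir / "documents").glob("*.pdf"))
```

After `create_project(".", "project_name")`, copy the PDF files into the new documents folder.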

The OCR dataloaders search for PDF files in the documents folder and write the generated text files to the text_files folder.
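
The documents-to-text_files flow can be sketched as follows. This is not the repo's dataloader code: `doctr_ocr` and `pdfs_to_text` are hypothetical names, one `.txt` file per PDF is an assumed output layout, and the default OCR step uses DocTR's stock pretrained predictor rather than whatever configuration the project applies.

```python
from pathlib import Path
from typing import Callable


def doctr_ocr(pdf_path: Path) -> str:
    """OCR one PDF with DocTR's default pretrained predictor (needs python-doctr)."""
    # Imported lazily so the rest of the sketch runs without DocTR installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    pages = DocumentFile.from_pdf(str(pdf_path))
    # Building the predictor per file is simple but slow; reuse it for real workloads.
    predictor = ocr_predictor(pretrained=True)
    return predictor(pages).render()


def pdfs_to_text(project_dir: Path, ocr: Callable[[Path], str] = doctr_ocr) -> list[Path]:
    """Run ocr on every PDF in documents/ and write <stem>.txt into text_files/."""
    documents = project_dir / "documents"
    text_files = project_dir / "text_files"
    text_files.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for pdf in sorted(documents.glob("*.pdf")):
        target = text_files / f"{pdf.stem}.txt"
        target.write_text(ocr(pdf), encoding="utf-8")
        written.append(target)
    return written
```

Passing the OCR step as a callable keeps the file handling testable without a GPU or the DocTR models.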
