Unstructured-IO/irs-manual-demo

Chat with IRS Manuals

This directory contains an application for chatting with IRS manuals. Once the data is available, the chat application uses only self-hosted models and can run in a disconnected environment. To get started with the chatbot, follow the steps below.

Installation

pip install -r requirements.txt

Environment Variables

Note: other options exist for these connections, but these are the ones this implementation references.

OpenAI

OPENAI_API_KEY

Pinecone

PINECONE_API_KEY
PINECONE_API_ENV
PINECONE_INDEX_NAME
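
Before running the later steps, it can help to confirm that all four variables are set. The following is a minimal sketch, not part of this repository: the helper name `missing_env_vars` is hypothetical, and only the variable names come from the list above.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_API_ENV",
    "PINECONE_INDEX_NAME",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    `environ` defaults to the process environment; passing a plain dict
    makes the check easy to exercise without touching os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current process environment and returns an empty list when everything is configured.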

Download PDFs from the IRS website

python download_data.py <Base URL> <Page Start> <Page End> <Target Directory>

Run PDFs against unstructured-ingest

# --partition-by-api is optional: it sends documents to the *NEW* API instead of processing them locally
PYTHONPATH=. ./unstructured/ingest/main.py \
  --local-input-path <ingest-input-dir> \
  --structured-output-dir <ingest-output-dir> \
  --partition-by-api

Here's an example of the structured JSON output

JSON
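
To inspect the structured output before seeding the vector database, a short script can load the files and summarize them. This is a rough sketch under two assumptions: each output file holds a JSON array of element objects, and each element carries a `type` field alongside its `text`. The helper names `load_elements` and `count_by_type` are hypothetical and are not part of this repository.

```python
import json
from collections import Counter
from pathlib import Path


def load_elements(output_dir):
    """Read every *.json file under output_dir into one flat list of elements."""
    elements = []
    for path in sorted(Path(output_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        # Assumption: each file is a JSON array of element objects.
        # Files with any other top-level shape are skipped.
        if isinstance(data, list):
            elements.extend(item for item in data if isinstance(item, dict))
    return elements


def count_by_type(elements):
    """Count elements per `type` value (assumed field name)."""
    return Counter(element.get("type", "unknown") for element in elements)
```

For example, `count_by_type(load_elements("<ingest-output-dir>"))` gives a quick view of how many elements of each type were extracted.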

Seed and utilize vector db

python ingest_data.py <path-to-structured-json-file-directory>
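
The internals of `ingest_data.py` are not reproduced here. As a general illustration of the seeding step, vector stores are commonly loaded in batches of records built from the element text. The sketch below shows only that preparation, with no embedding or database calls; `to_records`, `batched`, the `doc` ID prefix, and the `text`/`metadata` field names are all assumptions made for the example.

```python
from itertools import islice


def to_records(elements, prefix="doc"):
    """Build {id, text, metadata} records, skipping elements with no text."""
    records = []
    for index, element in enumerate(elements):
        text = (element.get("text") or "").strip()
        if not text:
            continue
        records.append(
            {
                "id": f"{prefix}-{index}",  # hypothetical ID scheme
                "text": text,
                "metadata": element.get("metadata", {}),
            }
        )
    return records


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each batch would then be embedded and written to the index in one request; that part depends on the embedding model and vector database client in use.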

Run the chat CLI

python cli_app.py

Chat with our hosted instance here
