This directory contains an application for chatting with IRS manuals. Once the data is available, the chat application uses only self-hosted models and can run in a disconnected environment. To get started with the chatbot:
pip install -r requirements.txt
Set the following environment variables. Other options exist for these connections, but these are the ones referenced in this implementation:
PINECONE_API_KEY
PINECONE_API_ENV
OPENAI_API_KEY
PINECONE_INDEX_NAME
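
Before running the later steps, it can help to confirm that all four variables are set. The following is a minimal sketch, not part of this repository: the helper name `missing_env_vars` is hypothetical, and only the variable names come from the list above.

```python
import os

# Variable names are taken from the list above; the helper itself is hypothetical.
REQUIRED_VARS = (
    "PINECONE_API_KEY",
    "PINECONE_API_ENV",
    "OPENAI_API_KEY",
    "PINECONE_INDEX_NAME",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    Pass a dict to check something other than the real process environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no argument checks the current process environment and returns an empty list when everything is configured.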
python download_data.py <Base URL> <Page Start> <Page End> <Target Directory>
# --partition-by-api is optional: it sends documents to the *NEW* API instead of processing them locally
PYTHONPATH=. ./unstructured/ingest/main.py \
  --local-input-path <ingest-input-dir> \
  --structured-output-dir <ingest-output-dir> \
  --partition-by-api
The command above writes its structured JSON output to <ingest-output-dir>.
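
To get a quick look at the structured output before ingesting it, the files can be summarized with a few lines of Python. This is a rough sketch under stated assumptions: it assumes each `.json` file in the output directory holds a list of objects and that each object may carry a `"type"` key. Neither detail is confirmed by this README, and `summarize_elements` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_elements(output_dir):
    """Count element types across every .json file under output_dir.

    Assumes each file holds a JSON list of objects and that each object
    may carry a "type" key; both are assumptions about the output format.
    """
    counts = Counter()
    for path in sorted(Path(output_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            elements = json.load(handle)
        if not isinstance(elements, list):
            continue  # skip files that do not match the assumed layout
        for element in elements:
            if isinstance(element, dict):
                counts[element.get("type", "unknown")] += 1
    return counts
```

Passing the same directory used for `--structured-output-dir` returns a `Counter` mapping each element type to its count; an empty result suggests the ingest step produced no files or that the layout differs from what is assumed here.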
python ingest_data.py <path-to-structured-json-file-directory>
python cli_app.py