🏢 YSS — Internal Knowledge Base Chatbot

A secure, RAG-powered internal knowledge base chatbot built with Python and Streamlit. Employees can ask natural language questions about company policies, HR documents, IT guides, and sales materials. Admins can upload documents through a password-protected interface.

Features

RAG-powered answers — retrieves relevant document excerpts before answering; no hallucinations
Multi-format ingestion — PDF, DOCX, TXT, Markdown
Admin upload panel — password-protected, categorized document management
Knowledge base browser — view all indexed docs, test retrieval queries
Streaming responses — answers appear token by token like a real chatbot
Source citations — every answer cites which documents were used
Conversation memory — maintains context across a session (last 10 turns)
Docker-ready — one command to deploy on any internal cloud

Tech Stack

Layer	Tool
Frontend	Streamlit
LLM	Anthropic Claude (claude-sonnet)
Vector Search	TF-IDF + Cosine Similarity (built-in, no GPU needed)
Document Parsing	PyPDF2, python-docx
Deployment	Docker + Docker Compose

Quick Start (Local)

1. Clone and enter project

git clone <your-repo-url>
cd company-kb

2. Create virtual environment

python -m venv venv
source venv/bin/activate        # macOS/Linux
# venv\Scripts\Activate.ps1    # Windows PowerShell

3. Install dependencies

pip install -r requirements.txt

4. Configure environment

cp .env.example .env

Edit .env and set your ANTHROPIC_API_KEY. Get one at https://console.anthropic.com

# .env
ANTHROPIC_API_KEY=sk-ant-...
ADMIN_PASSWORD=your-secure-password

5. Load the .env variables

# macOS/Linux
export $(cat .env | xargs)

# Windows PowerShell
Get-Content .env | ForEach-Object { $k,$v = $_ -split '=',2; [System.Environment]::SetEnvironmentVariable($k,$v) }

6. Seed sample documents (optional)

python seed_sample_docs.py

7. Run the app

streamlit run app.py

Open http://localhost:8501

Uploading Documents

Go to Admin Upload in the sidebar
Enter the admin password (default: admin123 — change this in .env)
Select a category, drag and drop files, click Ingest
Documents are immediately searchable in the chat

Supported formats: PDF, DOCX, TXT, Markdown

Docker Deployment (Internal Cloud)

Build and run

# Set your API key
echo "ANTHROPIC_API_KEY=sk-ant-..." > .env
echo "ADMIN_PASSWORD=your-secure-password" >> .env

# Build and start
docker-compose up -d

Access at http://your-server-ip:8501

Useful Docker commands

docker-compose logs -f          # View logs
docker-compose down             # Stop
docker-compose up -d --build    # Rebuild after code changes
docker-compose ps               # Check status

Documents and the vector store are persisted in Docker volumes — safe across restarts.

Project Structure

company-kb/
├── app.py                          # Main chat interface
├── pages/
│   ├── 1_Admin_Upload.py           # Document upload (password-protected)
│   └── 2_Knowledge_Base.py         # Browse docs + test search
├── utils/
│   ├── rag_engine.py               # RAG core: chunking, TF-IDF, retrieval
│   └── llm_client.py               # Anthropic Claude API wrapper
├── sample_docs/                    # Sample documents for demo
├── documents/                      # Uploaded documents (auto-created)
├── vector_store/                   # Index + metadata (auto-created)
├── .streamlit/config.toml          # Streamlit theme config
├── seed_sample_docs.py             # One-time sample data loader
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
└── .env.example

Upgrading to Semantic Search (Optional)

The current engine uses TF-IDF which works well and requires no GPU. To upgrade to semantic embeddings (better for conceptual questions):

Uncomment sentence-transformers in requirements.txt
In rag_engine.py, replace the TFIDFStore with a sentence-transformers + faiss implementation

Security Notes

Change ADMIN_PASSWORD from the default before deploying
Keep ANTHROPIC_API_KEY secret — never commit .env to version control
For production, add SSO/LDAP authentication in front of Streamlit
The Docker setup runs on an internal port — do not expose directly to the internet without a reverse proxy (nginx) and HTTPS

License

Internal use only.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🏢 YSS — Internal Knowledge Base Chatbot

Features

Tech Stack

Quick Start (Local)

1. Clone and enter project

2. Create virtual environment

3. Install dependencies

4. Configure environment

5. Load the .env variables

6. Seed sample documents (optional)

7. Run the app

Uploading Documents

Docker Deployment (Internal Cloud)

Build and run

Useful Docker commands

Project Structure

Upgrading to Semantic Search (Optional)

Security Notes

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
1_Admin_Upload.py		1_Admin_Upload.py
2_Knowledge_Base.py		2_Knowledge_Base.py
README.md		README.md
app.py		app.py
company-kb.tar.gz		company-kb.tar.gz
gitignore		gitignore
llm_client.py		llm_client.py
rag_engine.py		rag_engine.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

🏢 YSS — Internal Knowledge Base Chatbot

Features

Tech Stack

Quick Start (Local)

1. Clone and enter project

2. Create virtual environment

3. Install dependencies

4. Configure environment

5. Load the .env variables

6. Seed sample documents (optional)

7. Run the app

Uploading Documents

Docker Deployment (Internal Cloud)

Build and run

Useful Docker commands

Project Structure

Upgrading to Semantic Search (Optional)

Security Notes

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages