# AInalyst: AI-Powered Financial Document Analysis Platform
AInalyst is a sophisticated Retrieval-Augmented Generation (RAG) system that provides intelligent analysis of SEC filings using OpenAI's language models. Built with a modern tech stack, it enables users to query financial documents through natural language and receive contextual insights backed by official SEC data.
The platform consists of three main components:
- Data Pipeline: Automated SEC filing download and processing
- RAG Backend: FastAPI service with FAISS vector search and OpenAI integration
- Frontend Interface: Next.js chat application with real-time responses
```
AInalyst/
├── api/
│   └── app.py                     # FastAPI backend with RAG endpoints
├── frontend/                      # Next.js 15 frontend application
│   ├── src/
│   │   ├── app/
│   │   │   ├── page.tsx           # Landing page with animated UI
│   │   │   └── chat/
│   │   │       └── page.tsx       # Chat interface
│   │   └── components/            # Reusable UI components
│   └── package.json               # Frontend dependencies
├── data/                          # Downloaded SEC filings (JSON format)
├── download_filings.py            # SEC EDGAR filing downloader
├── incremental_chunk_embed.py     # Document chunking and embedding
├── query_rag.py                   # CLI retrieval testing tool
├── requirements.txt               # Python dependencies
├── faiss_index.idx                # FAISS vector index (generated)
└── faiss_metadata.json            # Document metadata (generated)
```
## Prerequisites

- Python 3.8+
- Node.js 18+ and npm
- OpenAI API key (with access to embeddings and chat completions)
## Quick Start

Clone the repository and set up your environment:

```bash
git clone https://github.com/your-username/AInalyst.git
cd AInalyst
```

Create a `.env` file in the project root:

```env
OPENAI_API_KEY=sk-your-openai-api-key-here
START_DATE=2023-01-01
MODE=DEMO
USER_AGENT="Your Name Your Project <your.email@example.com>"
CORS_ORIGINS=http://localhost:3000
```

Backend:

```bash
pip install -r requirements.txt
```

Frontend:

```bash
cd frontend
npm install
cd ..
```

Download SEC filings (starts with Apple in DEMO mode):

```bash
python download_filings.py
```

Create embeddings and build the search index:

```bash
python incremental_chunk_embed.py
```

Start the backend API:

```bash
uvicorn api.app:app --reload --host 0.0.0.0 --port 8000
```

Start the frontend (in a new terminal):

```bash
cd frontend
npm run dev
```

Visit http://localhost:3000 to access AInalyst.
The `MODE` setting controls how much data is collected:

- DEMO: Downloads filings for Apple only (fast setup)
- FULL: Downloads all S&P 500 company filings (comprehensive dataset)
## Configuration

| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | Your OpenAI API key | Required |
| `START_DATE` | Beginning date for filing collection | `2023-01-01` |
| `MODE` | Data collection mode (`DEMO` or `FULL`) | `DEMO` |
| `USER_AGENT` | SEC API user agent (required) | Required |
| `CORS_ORIGINS` | Allowed frontend origins | `http://localhost:3000` |
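The variable names and defaults above can be read with a small helper; this is an illustrative sketch (the actual scripts in the repo may load configuration differently, e.g. via `python-dotenv`):

```python
import os

def load_config() -> dict:
    """Read AInalyst settings from the environment, applying the documented defaults."""
    config = {
        "openai_api_key": os.environ["OPENAI_API_KEY"],  # required, no default
        "start_date": os.environ.get("START_DATE", "2023-01-01"),
        "mode": os.environ.get("MODE", "DEMO"),
        "user_agent": os.environ["USER_AGENT"],          # required by SEC EDGAR
        "cors_origins": os.environ.get("CORS_ORIGINS", "http://localhost:3000").split(","),
    }
    if config["mode"] not in {"DEMO", "FULL"}:
        raise ValueError(f"MODE must be DEMO or FULL, got {config['mode']!r}")
    return config
```

Accessing the two required keys with `os.environ[...]` (rather than `.get`) makes a missing key fail loudly at startup instead of midway through a download.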
Test the retrieval system directly:

```bash
python query_rag.py --query "What are Apple's main revenue streams?" --k 5
```

### API

`POST /ask`

Request body:

```json
{
  "query": "What were Tesla's R&D expenses last year?",
  "k": 5,
  "api_key": "sk-your-key",
  "chat_model": "gpt-4.1-mini-2025-04-14"
}
```

Response:

```json
{
  "answer": "Based on Tesla's financial filings...",
  "context": [
    {
      "ticker": "TSLA",
      "accession": "0000950170-23-027673",
      "text": "Research and development expenses...",
      "score": 0.85,
      "filing_date": "2023-01-26",
      "form": "10-K",
      "url": "https://www.sec.gov/Archives/edgar/data/..."
    }
  ]
}
```

### How It Works

Ingestion:

- SEC Filing Download: Fetches 10-K, 10-Q, and Company Facts from the SEC EDGAR API
- Text Extraction: Cleans HTML/XML and extracts relevant content sections
- Document Chunking: Splits documents into 1000-token chunks with 200-token overlap
- Vector Embedding: Uses OpenAI's `text-embedding-3-small` model
- FAISS Indexing: Stores embeddings for efficient similarity search
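The chunking step can be sketched as follows. For simplicity this treats "tokens" as items of any list; the real script presumably uses a model-aware tokenizer (such as tiktoken), which is an assumption here:

```python
def chunk_tokens(tokens, chunk_size=1000, overlap=200):
    """Split a token list into fixed-size chunks whose tails repeat in the next chunk.

    The overlap keeps sentences that straddle a chunk boundary retrievable
    from either neighboring chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # final chunk reached the end of the document
    return chunks

words = [f"w{i}" for i in range(2500)]
chunks = chunk_tokens(words, chunk_size=1000, overlap=200)
# three chunks: tokens[0:1000], tokens[800:1800], tokens[1600:2500]
```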
Query:

- Retrieval: FAISS cosine similarity search finds top-K relevant chunks
- Augmentation: Assembles context from retrieved documents
- Generation: OpenAI chat completion with retrieved context
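The retrieval step above can be sketched in plain NumPy: cosine similarity reduces to an inner product once every vector is L2-normalized, which is how a FAISS `IndexFlatIP` is typically used (the embeddings here are random stand-ins, not real model output):

```python
import numpy as np

def top_k_cosine(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 5):
    """Return (indices, scores) of the k documents most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                      # one inner product per document
    order = np.argsort(-scores)[:k]     # highest similarity first
    return order, scores[order]

# Toy corpus: five random "embeddings", with the query built to align with doc 3
rng = np.random.default_rng(0)
docs = rng.normal(size=(5, 8))
query = docs[3] + 0.01 * rng.normal(size=8)
idx, scores = top_k_cosine(query, docs, k=2)  # idx[0] is 3
```

FAISS performs the same computation with an optimized index structure, which is what makes top-K search fast over many thousands of chunks.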
- Animated Landing Page: Cyberpunk-themed interface with spotlight effects
- Real-time Chat: WebSocket-like experience with streaming responses
- Source Attribution: Links to original SEC filings for verification
- Dark/Light Mode: Adaptive theme support
- Responsive Design: Mobile and desktop optimized
## Deployment

For deployment, update environment variables:

```env
NEXT_PUBLIC_BACKEND_URL=https://your-api-domain.com
CORS_ORIGINS=https://your-frontend-domain.com,https://your-frontend-*.vercel.app
```

### Backend (Render)

- Log in to Render (or create an account).
- Click New → Web Service.
- Connect your GitHub repo (select EDEN757/AInalyst).
- Configure the service:
  - Name: e.g. `ainalyst-backend`
  - Region: Choose a region close to you (e.g., Oregon or Frankfurt).
  - Root Directory: `api` (so Render builds from `AInalyst/api/`).
  - Runtime: Python 3.
  - Leave Build Command and Start Command blank for now.
- Click Create Web Service. Render will provision a placeholder service awaiting your build settings.
- In your new Web Service, go to Settings → Environment.
- Add the following variables (one at a time):
| Key | Value | Description |
|---|---|---|
| `OPENAI_API_KEY` | `sk-your-openai-key` | Your private OpenAI key used for embedding generation and fallback chat completions. |
| `START_DATE` | `2023-01-01` | Earliest filing date for `download_filings.py`. |
| `MODE` | `DEMO` | Mode flag used by your ingestion scripts. |
| `USER_AGENT` | `yourname youremail@example.com` | Custom User-Agent when fetching SEC EDGAR filings. |
| `CORS_ORIGINS` | `http://localhost:3000,https://a-inalyst.vercel.app` | Comma-separated list of allowed origins (development + production). |
Note:

- Replace `sk-your-openai-key` with your own OpenAI API key.
- Replace `https://a-inalyst.vercel.app` with your actual Vercel frontend URL once it's live.
- Click Save after entering each variable.
- In the Web Service's Settings, scroll to Build & Deploy.
- Populate the Build Command field with:

  ```bash
  pip install -r requirements.txt
  ```

  This installs all Python dependencies.

- Populate the Start Command field with:

  ```bash
  uvicorn api.app:app --host 0.0.0.0 --port $PORT
  ```

  This launches the FastAPI server via Uvicorn; `$PORT` is provided by Render at runtime.
After deployment, the following scripts need to be run manually from the shell.

Download SEC filings:

```bash
python download_filings.py
```

Create embeddings and build the search index:

```bash
python incremental_chunk_embed.py
```

To keep the data fresh, set up cron jobs to run the data processing scripts periodically:
```bash
# Edit crontab
crontab -e

# Add the following lines to run daily at 2 AM
0 2 * * * cd /path/to/AInalyst && python download_filings.py
30 2 * * * cd /path/to/AInalyst && python incremental_chunk_embed.py
```

This ensures:
- New SEC filings are downloaded daily
- Embeddings are updated with fresh data
- The search index stays current with the latest financial documents
- Instance Type:
  - On the free tier, select the free instance.
  - Optionally upgrade to a paid instance for SSH access or persistent disks.
- Ensure Auto-Deploy is toggled On (default). Pushing to main will automatically trigger a rebuild.
- Click Save Changes (or Update Service). Render will queue a new deployment with your updated commands.
### Frontend (Vercel)

- Log in to Vercel.
- Click New Project.
- Under Import Git Repository, select your GitHub account and choose EDEN757/AInalyst.
- Configure project settings:
  - Root Directory: `frontend` (so Vercel builds from `AInalyst/frontend/`).
  - Framework Preset: Should auto-detect Next.js.
  - Build Command: `npm run build` (default).
  - Output Directory: `.next` (default).
- Click Deploy. Vercel will install dependencies and run `npm run build` in the `frontend/` folder.
The backend includes intelligent CORS management for Vercel deployments, automatically allowing preview and production URLs while maintaining security.
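One common way to implement such a policy is an exact-match allowlist plus a pattern for Vercel preview URLs. This is an illustrative sketch, not necessarily the repo's exact code (FastAPI's `CORSMiddleware` accepts a similar `allow_origin_regex` parameter):

```python
import re

# Exact development/production origins, e.g. parsed from CORS_ORIGINS
ALLOWED_ORIGINS = {"http://localhost:3000", "https://a-inalyst.vercel.app"}

# Vercel preview deployments get generated subdomains under vercel.app
VERCEL_PREVIEW = re.compile(r"^https://[a-z0-9-]+\.vercel\.app$")

def origin_allowed(origin: str) -> bool:
    """Allow configured origins plus any HTTPS *.vercel.app preview URL."""
    return origin in ALLOWED_ORIGINS or VERCEL_PREVIEW.fullmatch(origin) is not None
```

Anchoring the regex at both ends matters: without `$`, an attacker-controlled origin like `https://foo.vercel.app.evil.com` would slip through.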
The embedding system supports incremental updates: only new documents are processed when `incremental_chunk_embed.py` is run again.
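Incremental indexing is typically implemented by remembering which documents have already been embedded. A hypothetical sketch of that bookkeeping (the file name echoes the project's `faiss_metadata.json`, but this exact scheme is an assumption):

```python
import json
from pathlib import Path

def select_new_docs(doc_ids, metadata_path):
    """Return only the doc IDs not yet recorded in the metadata file."""
    path = Path(metadata_path)
    seen = set(json.loads(path.read_text())) if path.exists() else set()
    new_ids = [d for d in doc_ids if d not in seen]
    # Persist the union so the next run skips everything processed so far
    path.write_text(json.dumps(sorted(seen | set(new_ids))))
    return new_ids
```

Run twice with overlapping inputs, the second call returns only the IDs the first call hasn't recorded, which is the behavior that keeps re-runs cheap.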
- Multiple Document Types: Supports 10-K, 10-Q, and Company Facts
- Configurable Chunking: Adjustable chunk sizes and overlap
- Model Flexibility: Easy switching between OpenAI models
- Vector Store Agnostic: FAISS can be replaced with other vector databases
## Contributing

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- SEC EDGAR API for providing access to financial data
- OpenAI for embedding and language model capabilities
- FAISS (Facebook AI Similarity Search) for efficient vector operations
- Vercel for seamless frontend deployment
Built with ❤️ for financial analysis and AI-powered insights
For questions or support, please open an issue on GitHub.