AInalyst

AI-Powered Financial Document Analysis Platform

AInalyst is a Retrieval-Augmented Generation (RAG) system for analyzing SEC filings with OpenAI's language models. Users query financial documents in natural language and receive answers grounded in, and linked back to, official SEC data.

🏗️ Architecture

The platform consists of three main components:

Data Pipeline: Automated SEC filing download and processing
RAG Backend: FastAPI service with FAISS vector search and OpenAI integration
Frontend Interface: Next.js chat application with real-time responses

📁 Project Structure

AInalyst/
├── api/
│   └── app.py                      # FastAPI backend with RAG endpoints
├── frontend/                       # Next.js 15 frontend application
│   ├── src/
│   │   ├── app/
│   │   │   ├── page.tsx           # Landing page with animated UI
│   │   │   └── chat/
│   │   │       └── page.tsx       # Chat interface
│   │   └── components/            # Reusable UI components
│   └── package.json               # Frontend dependencies
├── data/                          # Downloaded SEC filings (JSON format)
├── download_filings.py            # SEC EDGAR filing downloader
├── incremental_chunk_embed.py     # Document chunking and embedding
├── query_rag.py                   # CLI retrieval testing tool
├── requirements.txt               # Python dependencies
├── faiss_index.idx               # FAISS vector index (generated)
└── faiss_metadata.json          # Document metadata (generated)

⚙️ Prerequisites

Python 3.8+
Node.js 18+ and npm
OpenAI API Key (with access to embeddings and chat completions)

🚀 Quick Start

1. Environment Setup

Clone the repository and set up your environment:

git clone https://github.com/your-username/AInalyst.git
cd AInalyst

Create a .env file in the project root:

OPENAI_API_KEY=sk-your-openai-api-key-here
START_DATE=2023-01-01
MODE=DEMO
USER_AGENT="Your Name Your Project <your.email@example.com>"
CORS_ORIGINS=http://localhost:3000

2. Install Dependencies

Backend:

pip install -r requirements.txt

Frontend:

cd frontend
npm install
cd ..

3. Download and Process Data

Download SEC filings (Apple only in DEMO mode):

python download_filings.py

Create embeddings and build the search index:

python incremental_chunk_embed.py

4. Launch the Application

Start the backend API:

uvicorn api.app:app --reload --host 0.0.0.0 --port 8000

Start the frontend (in a new terminal):

cd frontend
npm run dev

Visit http://localhost:3000 to access AInalyst.

🔧 Configuration Options

Data Collection Modes

DEMO: Downloads filings for Apple only (fast setup)
FULL: Downloads all S&P 500 company filings (comprehensive dataset)
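As a rough illustration of how the two modes could drive the downloader, the sketch below maps MODE to a set of companies and builds (without sending) an EDGAR request that carries the required User-Agent. The function names and the `DEMO_COMPANIES` mapping are hypothetical, and whether download_filings.py uses the `data.sec.gov/submissions` endpoint is an assumption; only Apple's CIK and the endpoint format come from EDGAR itself.

```python
import urllib.request

# Apple's CIK on EDGAR. Treating DEMO as exactly this one company is an
# assumption based on the mode description above.
DEMO_COMPANIES = {"AAPL": "0000320193"}


def companies_for_mode(mode, full_companies):
    """Return the ticker -> CIK mapping to download for a MODE value.

    `full_companies` stands in for an S&P 500 mapping loaded elsewhere.
    """
    mode = mode.strip().upper()
    if mode == "DEMO":
        return dict(DEMO_COMPANIES)
    if mode == "FULL":
        return dict(full_companies)
    raise ValueError(f"Unknown MODE: {mode!r} (expected DEMO or FULL)")


def build_submissions_request(cik, user_agent):
    """Build, but do not send, a request for a company's filing index."""
    url = f"https://data.sec.gov/submissions/CIK{cik}.json"
    # SEC asks automated clients to identify themselves via User-Agent.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

Passing the resulting request to `urllib.request.urlopen` would perform the actual download; keeping request construction separate makes the mode logic easy to test offline.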

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `OPENAI_API_KEY` | Your OpenAI API key | Required |
| `START_DATE` | Beginning date for filing collection | `2023-01-01` |
| `MODE` | Data collection mode (`DEMO` or `FULL`) | `DEMO` |
| `USER_AGENT` | SEC API user agent (required) | Required |
| `CORS_ORIGINS` | Allowed frontend origins | `http://localhost:3000` |
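One way to read these variables with the defaults listed above is sketched here. The `Settings` class and `load_settings` function are hypothetical names, not part of the repository; the sketch only mirrors the table (two required values, three defaults).

```python
import os
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    start_date: date
    mode: str
    user_agent: str
    cors_origins: list


def load_settings(env=None):
    """Read configuration from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # OPENAI_API_KEY and USER_AGENT have no default, so fail early.
    missing = [n for n in ("OPENAI_API_KEY", "USER_AGENT") if not env.get(n)]
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
    origins = env.get("CORS_ORIGINS", "http://localhost:3000")
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        start_date=date.fromisoformat(env.get("START_DATE", "2023-01-01")),
        mode=env.get("MODE", "DEMO").upper(),
        user_agent=env["USER_AGENT"],
        # CORS_ORIGINS is treated as a comma-separated list.
        cors_origins=[o.strip() for o in origins.split(",") if o.strip()],
    )
```

Accepting a plain mapping instead of reading `os.environ` directly keeps the function usable in tests without touching the real environment.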

💡 Usage Examples

CLI Testing

Test the retrieval system directly:

python query_rag.py --query "What are Apple's main revenue streams?" --k 5

API Endpoints

POST /ask

{
  "query": "What were Tesla's R&D expenses last year?",
  "k": 5,
  "api_key": "sk-your-key",
  "chat_model": "gpt-4.1-mini-2025-04-14"
}

Response:

{
  "answer": "Based on Tesla's financial filings...",
  "context": [
    {
      "ticker": "TSLA",
      "accession": "0000950170-23-027673",
      "text": "Research and development expenses...",
      "score": 0.85,
      "filing_date": "2023-01-26",
      "form": "10-K",
      "url": "https://www.sec.gov/Archives/edgar/data/..."
    }
  ]
}
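A minimal client sketch for this endpoint, using only the standard library. The helper names are hypothetical, and treating `api_key` and `chat_model` as optional fields is an assumption; the field names themselves come from the request and response examples above.

```python
import json
import urllib.request


def build_ask_request(base_url, query, k=5, api_key=None, chat_model=None):
    """Build, but do not send, a POST /ask request."""
    payload = {"query": query, "k": k}
    # Assumption: these two fields may be omitted from the body.
    if api_key:
        payload["api_key"] = api_key
    if chat_model:
        payload["chat_model"] = chat_model
    return urllib.request.Request(
        base_url.rstrip("/") + "/ask",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def format_sources(response):
    """Turn the `context` list of a response into one citation line per chunk."""
    return [
        f"{c['ticker']} {c['form']} ({c['filing_date']}), "
        f"score {c['score']:.2f}: {c['url']}"
        for c in response.get("context", [])
    ]
```

With the backend running locally, `json.load(urllib.request.urlopen(build_ask_request("http://localhost:8000", "...")))` would return the parsed response, which `format_sources` can then summarize.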

🛠️ Technical Details

Data Processing Pipeline

SEC Filing Download: Fetches 10-K, 10-Q, and Company Facts from SEC EDGAR API
Text Extraction: Cleans HTML/XML and extracts relevant content sections
Document Chunking: Splits documents into 1000-token chunks with 200-token overlap
Vector Embedding: Uses OpenAI's text-embedding-3-small model
FAISS Indexing: Stores embeddings for efficient similarity search
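The chunking step can be sketched as a sliding window over a token list. This is an illustration under stated assumptions, not the code in incremental_chunk_embed.py: `chunk_tokens` is a hypothetical name, and the tokens are whatever the tokenizer produces (the actual tokenizer is not specified here).

```python
def chunk_tokens(tokens, chunk_size=1000, overlap=200):
    """Split a token list into windows of `chunk_size` that share
    `overlap` tokens with the previous window."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With the defaults, each window advances by 800 tokens, so a 2,500-token document yields three chunks, the last one shorter than the rest. The overlap means a sentence cut at a chunk boundary still appears whole in one of the two neighboring chunks.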

RAG Implementation

Retrieval: FAISS cosine similarity search finds top-K relevant chunks
Augmentation: Assembles context from retrieved documents
Generation: OpenAI chat completion with retrieved context
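The retrieval and augmentation steps can be illustrated without FAISS or OpenAI. The sketch below is a pure-Python stand-in: it scores chunks by cosine similarity with a brute-force scan (FAISS does this far more efficiently) and then assembles a prompt. The function names, the chunk dictionary layout, and the prompt wording are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=5):
    """Return the k (score, chunk) pairs most similar to the query.

    Each chunk is assumed to be a dict with an `embedding` list plus
    metadata such as `ticker`, `form`, `filing_date`, and `text`.
    """
    scored = [(cosine_similarity(query_vec, c["embedding"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question, retrieved):
    """Assemble retrieved chunks into a single context-plus-question prompt."""
    context = "\n\n".join(
        f"[{c['ticker']} {c['form']} {c['filing_date']}]\n{c['text']}"
        for _, c in retrieved
    )
    return (
        "Answer using only the SEC filing excerpts below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

The string returned by `build_prompt` is what would be sent to the chat model in the generation step; labeling each excerpt with its ticker, form, and date makes it easier for the answer to cite its sources.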

Frontend Features

Animated Landing Page: Cyberpunk-themed interface with spotlight effects
Real-time Chat: WebSocket-like experience with streaming responses
Source Attribution: Links to original SEC filings for verification
Dark/Light Mode: Adaptive theme support
Responsive Design: Mobile and desktop optimized

📊 Deployment

Production Configuration

For deployment, update environment variables:

NEXT_PUBLIC_BACKEND_URL=https://your-api-domain.com
CORS_ORIGINS=https://your-frontend-domain.com,https://your-frontend-*.vercel.app

6. Backend Deployment on Render

6.1 Create a Render Web Service

Log in to Render (or create an account).
Click New → Web Service.
Connect your GitHub repo (select EDEN757/AInalyst).
Configure the service:
- Name: e.g. ainalyst-backend
- Region: Choose a region close to you (e.g., Oregon or Frankfurt).
- Root Directory: api (so Render builds from AInalyst/api/).
- Runtime: Python 3.
- Leave Build Command and Start Command blank for now.
Click Create Web Service. Render will provision a placeholder service awaiting your build settings.

6.2 Set Environment Variables on Render

In your new Web Service, go to Settings → Environment.
Add the following variables (one at a time):

| Key | Value | Description |
|-----|-------|-------------|
| `OPENAI_API_KEY` | `sk-your-openai-key` | Your private OpenAI key used for embedding generation and fallback chat completions. |
| `START_DATE` | `2023-01-01` | Earliest filing date for download_filings.py. |
| `MODE` | `DEMO` | Mode flag used by your ingestion scripts. |
| `USER_AGENT` | `yourname youremail@example.com` | Custom User-Agent when fetching SEC EDGAR filings. |
| `CORS_ORIGINS` | `http://localhost:3000,https://a-inalyst.vercel.app` | Comma-separated list of allowed origins (development + production). |

Note:

Replace sk-your-openai-key with your own OpenAI API key.
Replace https://a-inalyst.vercel.app with your actual Vercel frontend URL once it's live.

Click Save after entering each variable.

6.3 Set Build & Start Commands on Render

In the Web Service's Settings, scroll to Build & Deploy.
Populate the Build Command field with:

pip install -r requirements.txt

Installs all Python dependencies.

Populate the Start Command field with:

uvicorn api.app:app --host 0.0.0.0 --port $PORT

Launches the FastAPI server via Uvicorn.
$PORT is provided by Render at runtime.

6.4 Data Processing Scripts

After deployment, the following scripts need to be run manually from the shell:

Download SEC filings:

python download_filings.py

Create embeddings and build search index:

python incremental_chunk_embed.py

6.5 Automated Data Updates (Cron Jobs)

To keep the data fresh, set up cron jobs to run the data processing scripts periodically:

# Edit crontab
crontab -e

# Add the following lines to run daily at 2 AM
0 2 * * * cd /path/to/AInalyst && python download_filings.py
30 2 * * * cd /path/to/AInalyst && python incremental_chunk_embed.py

This ensures:

New SEC filings are downloaded daily
Embeddings are updated with fresh data
The search index stays current with the latest financial documents

Remaining Render settings (under Build & Deploy):
- Instance Type: On the free tier, select the free instance. Optionally upgrade to a paid instance for SSH access or persistent disks.
- Auto-Deploy: Ensure it is toggled On (default). Pushing to main will automatically trigger a rebuild.
- Click Save Changes (or Update Service). Render will queue a new deployment with your updated commands.

7. Frontend Deployment on Vercel

7.1 Create a Vercel Project

Log in to Vercel.
Click New Project.
Under Import Git Repository, select your GitHub account and choose EDEN757/AInalyst.
Configure project settings:
- Root Directory: frontend (so Vercel builds from AInalyst/frontend/).
- Framework Preset: Should auto-detect Next.js.
- Build Command: npm run build (default).
- Output Directory: .next (default).
Click Deploy. Vercel will install dependencies and run npm run build in the frontend/ folder.

🔍 Advanced Features

Custom CORS Handling

The backend's CORS handling targets Vercel deployments: it automatically allows preview and production URLs without opening the API to arbitrary origins.
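How such matching might work is sketched below, assuming CORS_ORIGINS holds a comma-separated list in which entries may contain a `*` wildcard (as in the production example `https://your-frontend-*.vercel.app`). The helper names are hypothetical and this is not necessarily the logic in api/app.py.

```python
from fnmatch import fnmatchcase


def parse_origins(raw):
    """Split a comma-separated CORS_ORIGINS value into patterns."""
    return [o.strip() for o in raw.split(",") if o.strip()]


def is_origin_allowed(origin, patterns):
    """True if the origin equals, or wildcard-matches, any allowed pattern."""
    # fnmatchcase is case-sensitive; note that `*` also matches dots, so
    # patterns should keep the fixed domain suffix after the wildcard.
    return any(fnmatchcase(origin, p) for p in patterns)
```

In a FastAPI app, a similar effect is available through the `allow_origin_regex` parameter of `CORSMiddleware`, which accepts a regular expression instead of shell-style wildcards.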

Incremental Updates

The embedding system supports incremental updates: rerunning incremental_chunk_embed.py processes only new documents.
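The idea can be sketched as a filter over filings keyed by accession number. This assumes each entry in faiss_metadata.json records the `accession` of the filing it came from (the field appears in the API response, but the metadata layout is an assumption), and `select_new_filings` is a hypothetical name.

```python
def select_new_filings(filings, metadata):
    """Return the filings whose accession number is not yet in the index.

    `filings` and `metadata` are lists of dicts with an `accession` key.
    """
    seen = {entry["accession"] for entry in metadata}
    new = []
    for filing in filings:
        if filing["accession"] not in seen:
            new.append(filing)
            # Also guard against duplicates within the same batch.
            seen.add(filing["accession"])
    return new
```

Only the filings returned here would need to be chunked and embedded, which avoids repeating embedding API calls for documents that are already indexed.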

Extensible Architecture

Multiple Document Types: Supports 10-K, 10-Q, and Company Facts
Configurable Chunking: Adjustable chunk sizes and overlap
Model Flexibility: Easy switching between OpenAI models
Vector Store Agnostic: FAISS can be replaced with other vector databases

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

SEC EDGAR API for providing access to financial data
OpenAI for embedding and language model capabilities
FAISS (Facebook AI Similarity Search) for efficient vector operations
Vercel for seamless frontend deployment

Built with ❤️ for financial analysis and AI-powered insights

For questions or support, please open an issue on GitHub.
