F1-AI is a Retrieval-Augmented Generation (RAG) application built for Formula 1 information. It features an intelligent web scraper that automatically discovers and extracts Formula 1-related content from the web and stores it in a vector database, enabling natural-language querying of the stored information.
- Web scraping of Formula 1 content with automatic content extraction
- Vector database storage using Pinecone for efficient similarity search
- OpenRouter integration for advanced LLM capabilities
- RAG-powered question answering with contextual understanding and source citations
- Command-line interface for automation and scripting
- User-friendly Streamlit web interface with chat history
- Asynchronous data ingestion and processing for improved performance
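The ingest-then-query flow behind these features can be illustrated with a small, dependency-free sketch. It uses a bag-of-words stand-in for the real embedding model and Pinecone index; all names here are illustrative, not taken from the codebase:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words vector. The real app uses a
    # model-based embedding (e.g. via Ollama) instead of word counts.
    return Counter(w.strip(".,?!").lower() for w in text.split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorStore:
    # Stands in for Pinecone: stores (vector, text) pairs and returns
    # the documents most similar to a query.
    def __init__(self) -> None:
        self.docs: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        self.docs.append((embed(text), text))

    def query(self, question: str, k: int = 1) -> list[str]:
        qv = embed(question)
        ranked = sorted(self.docs, key=lambda d: cosine(qv, d[0]), reverse=True)
        return [text for _, text in ranked[:k]]

store = ToyVectorStore()
store.add("Max Verstappen won the 2023 Formula 1 World Championship.")
store.add("The Monaco Grand Prix is held on a street circuit.")
context = store.query("Who won the 2023 World Championship?")
```

In the real pipeline, LangChain wires the embedding model and the Pinecone index together, so this logic lives behind library calls rather than hand-rolled code.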
F1-AI is built on a modern tech stack:
- LangChain: Orchestrates the RAG pipeline and manages interactions between components
- Pinecone: Vector database for storing and retrieving embeddings
- OpenRouter: Primary LLM provider with Mistral-7B-Instruct model
- Ollama: Alternative local LLM provider for embeddings
- Playwright: Handles web scraping with JavaScript support
- BeautifulSoup4: Processes HTML content and extracts relevant information
- Streamlit: Provides an interactive web interface with chat functionality
- Python 3.8 or higher
- OpenRouter API key (set as OPENROUTER_API_KEY environment variable)
- Pinecone API key (set as PINECONE_API_KEY environment variable)
- 8GB RAM minimum (16GB recommended)
- Internet connection for web scraping
- Ollama installed (optional, for local embeddings)
- Download from Ollama's website
- Pull the required model:

  ```bash
  ollama pull all-minilm-l6-v2
  ```
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd FASTapi-RAG
  ```
- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Install Playwright browsers:

  ```bash
  playwright install
  ```
- Set up environment variables: create a .env file with:

  ```bash
  OPENROUTER_API_KEY=your_api_key_here  # Required for LLM functionality
  PINECONE_API_KEY=your_api_key_here    # Required for vector storage
  ```
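A quick way to fail fast on missing keys before running any command is a small startup check. This is a hedged sketch; `check_env` is not part of the codebase:

```python
import os

REQUIRED = ("OPENROUTER_API_KEY", "PINECONE_API_KEY")

def check_env() -> list[str]:
    # Return the names of required variables that are unset or empty.
    return [name for name in REQUIRED if not os.environ.get(name)]

missing = check_env()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```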
- Scrape and ingest F1 content:

  ```bash
  python f1_scraper.py --start-urls https://www.formula1.com/ --max-pages 100 --depth 2 --ingest
  ```
Options:
- --start-urls: Space-separated list of URLs to start crawling from
- --max-pages: Maximum number of pages to crawl (default: 100)
- --depth: Maximum crawl depth (default: 2)
- --ingest: Flag to ingest discovered content into the RAG system
- --max-chunks: Maximum chunks per URL for ingestion (default: 50)
- --llm-provider: LLM provider to use (openrouter or ollama)
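The flags above could be declared with argparse roughly as follows. This is a sketch mirroring the documented defaults, not the actual f1_scraper.py source, and the openrouter default is an assumption:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the documented f1_scraper.py options and defaults.
    p = argparse.ArgumentParser(description="F1 content crawler (sketch)")
    p.add_argument("--start-urls", nargs="+", required=True,
                   help="URLs to start crawling from")
    p.add_argument("--max-pages", type=int, default=100,
                   help="Maximum number of pages to crawl")
    p.add_argument("--depth", type=int, default=2,
                   help="Maximum crawl depth")
    p.add_argument("--ingest", action="store_true",
                   help="Ingest discovered content into the RAG system")
    p.add_argument("--max-chunks", type=int, default=50,
                   help="Maximum chunks per URL for ingestion")
    p.add_argument("--llm-provider", choices=["openrouter", "ollama"],
                   default="openrouter",  # assumed default
                   help="LLM provider to use")
    return p

args = build_parser().parse_args(
    ["--start-urls", "https://www.formula1.com/", "--depth", "2", "--ingest"]
)
```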
- Ask questions about Formula 1:

  ```bash
  python f1_ai.py ask "Who won the 2023 F1 World Championship?"
  ```
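Under the hood, a RAG answer is produced by stuffing retrieved chunks into the prompt alongside the question. A typical assembly step looks roughly like this; the actual prompt template in f1_ai.py may differ:

```python
def build_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    # chunks: (text, source_url) pairs retrieved from the vector store.
    context = "\n\n".join(
        f"[{i}] {text} (source: {url})"
        for i, (text, url) in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below, "
        "and cite sources by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "Who won the 2023 F1 World Championship?",
    [("Max Verstappen won the 2023 title.", "https://www.formula1.com/")],
)
```

Numbering the chunks is what lets the model emit the source citations shown in the answers.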
Run the Streamlit app:

```bash
streamlit run streamlit_app.py
```

This will open a web interface where you can:
- Ask questions about Formula 1
- View responses in a chat-like interface
- See source citations for answers
- Track conversation history
- Get real-time updates on response generation
f1_scraper.py: Intelligent web crawler implementation
- Automatically discovers F1-related content
- Handles content relevance detection
- Manages crawling depth and limits
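Content relevance detection can be as simple as keyword scoring. Here is a hypothetical version of the idea; the real heuristic in f1_scraper.py may be more sophisticated:

```python
F1_KEYWORDS = (
    "formula 1", "formula one", "grand prix", "qualifying",
    "pole position", "pit stop", "constructor", "fia",
)

def looks_f1_related(text: str, threshold: int = 2) -> bool:
    # Count keyword hits in the page text; pages scoring below the
    # threshold are skipped by the crawler.
    t = text.lower()
    return sum(kw in t for kw in F1_KEYWORDS) >= threshold
```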
f1_ai.py: Core RAG application implementation
- Handles data ingestion and chunking
- Manages vector database operations
- Implements question-answering logic
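Chunking splits each page into overlapping windows before embedding. A minimal fixed-size sketch of the idea follows; the real pipeline likely delegates this to LangChain's text splitters:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Fixed-size character windows with overlap, so sentences cut at a
    # boundary still appear whole in at least one chunk.
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```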
llm_manager.py: LLM provider management
- Integrates with OpenRouter for advanced LLM capabilities
- Handles embeddings generation
- Manages API interactions
streamlit_app.py: Streamlit web interface
- Provides a chat-based UI
- Manages conversation history
- Handles async operations
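Because Streamlit callbacks are synchronous while the RAG pipeline is async, the app has to bridge between the two. One common pattern is shown below with a placeholder coroutine; this is a sketch, not the actual streamlit_app.py code:

```python
import asyncio

async def answer(question: str) -> str:
    # Placeholder for the real async RAG call (retrieve, prompt, LLM).
    await asyncio.sleep(0)
    return f"(answer to: {question})"

def ask_sync(question: str) -> str:
    # asyncio.run creates an event loop, awaits the coroutine, and tears
    # the loop down again: a simple bridge for synchronous UI callbacks.
    return asyncio.run(answer(question))

reply = ask_sync("Who won in 2023?")
```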
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Submit a Pull Request
