An AI-driven scientific planning assistant designed to streamline experimental workflows and research processes.
Helix is a platform that leverages large language models and vector retrieval to help scientists and researchers generate comprehensive experiment plans. By analyzing existing scientific literature, protocol templates, and historical experiment data, the system provides actionable insights and detailed planning documents.
- Automated Experiment Planning: Generate detailed protocols, material lists, and timelines based on a research hypothesis.
- Context-Aware Retrieval: Integrated search across scientific papers (OpenAlex) and protocol repositories (Pinecone) using vector embeddings.
- Explainability (Retrieval Trace): Every plan includes a trace of why specific literature was selected and how it influenced the final protocol.
- Literature Quality Control: Automated assessment of retrieved papers for methodology relevance and evidence quality.
- Novelty Assessment: Categorizes each plan as Exact, Similar, or New relative to existing literature, so researchers can see whether a proposed experiment duplicates prior work.
- Reproducibility: Stores full context snapshots (prompt, model, retrieved data) for every plan version.
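As an illustration of the Novelty Assessment feature, the three categories could be derived from the highest similarity score between a plan and the retrieved literature. This is a minimal sketch, not the project's actual logic: the function name `classify_novelty` and both threshold values are hypothetical.

```python
from enum import Enum
from typing import Iterable


class Novelty(str, Enum):
    EXACT = "Exact"
    SIMILAR = "Similar"
    NEW = "New"


def classify_novelty(
    similarities: Iterable[float],
    exact_threshold: float = 0.95,    # hypothetical cutoff, not from the project
    similar_threshold: float = 0.75,  # hypothetical cutoff, not from the project
) -> Novelty:
    """Map the best similarity score against prior work to a novelty category."""
    best = max(similarities, default=0.0)  # no retrieved items means nothing similar
    if best >= exact_threshold:
        return Novelty.EXACT
    if best >= similar_threshold:
        return Novelty.SIMILAR
    return Novelty.NEW


if __name__ == "__main__":
    print(classify_novelty([0.42, 0.81]).value)
```

In practice the thresholds would need tuning against the embedding model in use, since similarity score distributions differ between models.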
The system is built on a robust RAG (Retrieval-Augmented Generation) pipeline that prioritizes scientific accuracy and reproducibility.
- Input: User submits a scientific hypothesis or question.
- Embedding: The query is converted into a vector using the multilingual-e5-large model.
- Retrieval: Top-k relevant papers, protocols, and past experiments are fetched from vector and relational databases.
- Literature QC: An automated agent reviews retrieved items for relevance and domain alignment.
- Generation: A structured prompt containing the hypothesis and the retrieved context is sent to an LLM (e.g., Llama-3 70B via Groq).
- Persistence: The plan, novelty assessment, and a "Context Snapshot" are saved for full traceability.
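The retrieval and generation steps above can be sketched in plain Python. This is a toy, in-memory illustration under stated assumptions: the vectors are hand-written rather than produced by multilingual-e5-large, and the names `Document`, `retrieve_top_k`, and `build_prompt` are hypothetical rather than taken from the Helix codebase. In the real system, Pinecone performs the similarity search.

```python
import math
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str
    vector: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vector: list[float], documents: list[Document], k: int = 3):
    """Return the k (score, document) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vector, d.vector), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(hypothesis: str, retrieved) -> str:
    """Assemble a structured prompt from the hypothesis and retrieved context."""
    context = "\n".join(f"[{doc.doc_id}] {doc.text}" for _, doc in retrieved)
    return (
        f"Hypothesis: {hypothesis}\n\n"
        f"Context:\n{context}\n\n"
        "Write a detailed experiment plan grounded in the context above."
    )


if __name__ == "__main__":
    docs = [
        Document("a", "PCR protocol for gene X", [1.0, 0.0]),
        Document("b", "Soil sampling guide", [0.0, 1.0]),
        Document("c", "qPCR optimisation notes", [1.0, 1.0]),
    ]
    hits = retrieve_top_k([1.0, 0.0], docs, k=2)
    print(build_prompt("Gene X expression rises under heat stress", hits))
```

Keeping the (score, document) pairs together through to the prompt step is one way to support a retrieval trace, since each item's score can be stored alongside the final plan.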
- Frontend: React (TypeScript), Vite, Tailwind CSS, shadcn/ui, TanStack Query.
- Backend: FastAPI (Python), PostgreSQL (Supabase), Pinecone (Vector Search).
- LLM/AI: Groq (Llama-3), OpenAI/Gemini (via unified service layer), Pinecone Inference.
├── backend/
│ ├── models/ # SQLAlchemy and Pydantic schemas
│ ├── routers/ # API endpoints (Generate, History, Lab State)
│ ├── services/ # Core logic (Retrieval, QC, Generation, DB)
│ └── scripts/ # Data seeding and maintenance scripts
└── frontend-new/ # Main React web application
- Node.js (v18+)
- Python (v3.10+)
- PostgreSQL (or Supabase)
Create a .env file in the backend/ directory with the following variables:
# AI Services
GROQ_API_KEY=your_groq_key
PINECONE_API_KEY=your_pinecone_key
GEMINI_API_KEY=your_gemini_key
# Database
DATABASE_URL=your_postgresql_url
# Retrieval Settings
PINECONE_INDEX_NAME=experiment-protocols
OPENALEX_EMAIL=your_email@example.com
# Demo Mode
DEMO_MODE=true # Set to false for live API integration
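For reference, the variables above could be read and validated at startup roughly as follows. This is a sketch using only the standard library, not the project's actual configuration code: the function `load_settings`, the choice of which keys are mandatory in live mode, and the default of demo mode being on are all assumptions.

```python
import os
from typing import Mapping, Optional

# Assumption: these keys are required only when DEMO_MODE is off.
REQUIRED_WHEN_LIVE = ("GROQ_API_KEY", "PINECONE_API_KEY", "DATABASE_URL")
TRUTHY = {"1", "true", "yes", "on"}


def _clean(raw: str) -> str:
    """Drop a trailing ' # comment' and surrounding whitespace from a raw value."""
    return raw.split(" #", 1)[0].strip()


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build a settings dict from environment variables (or a supplied mapping)."""
    source = os.environ if env is None else env
    # Assumption: demo mode defaults to on when the variable is absent.
    demo_mode = _clean(source.get("DEMO_MODE", "true")).lower() in TRUTHY

    if not demo_mode:
        missing = [name for name in REQUIRED_WHEN_LIVE if not _clean(source.get(name, ""))]
        if missing:
            raise RuntimeError(f"Missing required settings: {', '.join(missing)}")

    return {
        "demo_mode": demo_mode,
        "groq_api_key": _clean(source.get("GROQ_API_KEY", "")),
        "pinecone_api_key": _clean(source.get("PINECONE_API_KEY", "")),
        "gemini_api_key": _clean(source.get("GEMINI_API_KEY", "")),
        "database_url": _clean(source.get("DATABASE_URL", "")),
        "pinecone_index_name": _clean(source.get("PINECONE_INDEX_NAME", "experiment-protocols")),
        "openalex_email": _clean(source.get("OPENALEX_EMAIL", "")),
    }


if __name__ == "__main__":
    print(load_settings({"DEMO_MODE": "true"})["demo_mode"])
```

Failing fast on missing keys in live mode surfaces configuration mistakes at startup rather than on the first request.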
- Clone the repository:
  git clone https://github.com/AlexKurian77/helix.git
  cd helix
- Frontend Setup:
  cd frontend-new
  npm install
  npm run dev
- Backend Setup:
  cd backend
  pip install -r requirements.txt
  uvicorn main:app --reload
- Demo Mode: The project includes a DEMO_MODE that allows you to run the full UI and API without external keys by using deterministic mock data.
- History & Trace: Explore the history of experiments and view the retrieval traces to understand the AI's "thought process."
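To show what "deterministic mock data" can mean in practice, here is one possible approach: seed a random generator from a hash of the hypothesis, so the same input always yields the same mock plan. This is an assumption about how such a mode might work, not a description of the Helix implementation. The function `mock_plan`, the field names, and the material list are hypothetical.

```python
import hashlib
import random

# Hypothetical placeholder data, for illustration only.
MOCK_MATERIALS = ["pipette tips", "agar plates", "PCR master mix", "buffer solution", "microcentrifuge tubes"]


def mock_plan(hypothesis: str) -> dict:
    """Return a reproducible fake plan derived only from the hypothesis text."""
    digest = hashlib.sha256(hypothesis.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")  # stable across runs and machines
    rng = random.Random(seed)                 # local generator, no global state
    return {
        "hypothesis": hypothesis,
        "materials": rng.sample(MOCK_MATERIALS, 3),
        "estimated_days": rng.randint(3, 14),
        "novelty": rng.choice(["Exact", "Similar", "New"]),
    }


if __name__ == "__main__":
    print(mock_plan("Gene X expression rises under heat stress"))
```

Hashing with `hashlib` rather than the built-in `hash()` matters here, because Python randomizes string hashes per process by default.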
This project is licensed under the terms specified in the repository.