FloatChat is an intelligent conversational interface for exploring ARGO ocean float data. Ask natural-language questions about ocean temperature, salinity, and depth measurements, and get engaging, educational responses powered by AI.
A complete ocean data exploration platform featuring:
- AI Chat Interface: Ask questions like "What's the deepest measurement in the Indian Ocean?"
- Interactive Maps: Visualize float locations and trajectories
- Data Profiles: Plot temperature, salinity, and depth profiles
- Smart Search: AI-powered retrieval from a comprehensive ocean database
- Python 3.8+
- PostgreSQL database
- Groq API key (free tier available)
- Git
```bash
# Clone the repository
git clone <your-repo-url>
cd FloatChat_Full_Project
```
```bash
# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
```
```bash
# Install dependencies
pip install -r requirements.txt

# Copy environment template
cp .env.example .env
```
Edit `.env` with your credentials. Required: `GROQ_API_KEY` (get a free key at https://groq.com/) and your PostgreSQL connection details. Example `.env` configuration:
```env
GROQ_API_KEY=your_groq_api_key_here
DB_HOST=localhost
DB_PORT=5432
DB_NAME=float_chat_database
DB_USER=postgres
DB_PASSWORD=your_password
```

Option A: Docker (Recommended)
```bash
# Start PostgreSQL with Docker
docker-compose up -d postgres
```

Option B: Local PostgreSQL
```bash
# Create database manually
createdb float_chat_database
```

Download ARGO float data (this will take a few minutes):
```bash
# Download sample floats for demonstration
python download_argo.py --floats 1900121 2902110 2902111 2901256 2901258 2901305 2902094 2902097 2901257

# Or download specific floats by ID
python download_argo.py --floats 1900121 2901256
```

What happens: Downloads NetCDF files containing real ocean measurements from ARGO floats worldwide.
Convert raw NetCDF files to structured data:
```bash
python clean_all_nc.py --input argo_floats --output cleaned_data
```

What happens:
- Extracts measurements (temperature, salinity, pressure)
- Converts to CSV format
- Organizes profiles, trajectories, and metadata
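The output of this step can be pictured as follows. This is a minimal sketch, not the actual `clean_all_nc.py` logic: it assumes measurements have already been read out of a NetCDF file into Python lists, and the column names are illustrative.

```python
import pandas as pd

# Hypothetical values extracted from one ARGO profile (illustrative only)
profile = {
    "float_id": ["1900121"] * 3,
    "pressure_dbar": [5.2, 250.7, 1987.0],  # pressure in decibars, roughly depth in meters
    "temperature_c": [28.1, 12.4, 2.85],
    "salinity_psu": [34.9, 35.1, 34.7],
}

# Organize the measurements into a tabular structure and write CSV
df = pd.DataFrame(profile)
df.to_csv("profile_1900121.csv", index=False)
print(df.shape)  # (3, 4)
```

The real script additionally separates profiles, trajectories, and metadata into distinct outputs.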
Create database schema and import cleaned data:
```bash
python load_to_postgres.py --input cleaned_data
```

What happens:
- Creates optimized database tables
- Loads profile data, measurements, trajectories
- Sets up indexes for fast querying
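A simplified sketch of what such a schema might look like (table and column names here are illustrative; see `schema.sql` for the real definitions):

```sql
-- Illustrative only: the actual schema lives in schema.sql
CREATE TABLE profiles (
    profile_id   SERIAL PRIMARY KEY,
    float_id     TEXT NOT NULL,
    profile_date DATE,
    latitude     DOUBLE PRECISION,
    longitude    DOUBLE PRECISION
);

CREATE TABLE measurements (
    profile_id    INTEGER REFERENCES profiles(profile_id),
    pressure_dbar REAL,
    temperature_c REAL,
    salinity_psu  REAL
);

-- An index like this is what makes per-float queries fast
CREATE INDEX idx_profiles_float_id ON profiles(float_id);
```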
Create semantic search capabilities:
```bash
python create_embeddings.py --batch 200
```

What happens:
- Generates text summaries of ocean data
- Creates vector embeddings using sentence-transformers
- Stores in ChromaDB for semantic search
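The first part of this step, turning structured rows into text suitable for embedding, can be sketched like this. The function name and phrasing are hypothetical, not taken from `create_embeddings.py`; the real script then encodes such summaries with sentence-transformers and stores the vectors in ChromaDB.

```python
def profile_summary(float_id, date, lat, lon, max_depth_m, temp_c):
    """Build a short text summary of one profile for semantic embedding."""
    return (
        f"Float {float_id} profiled on {date} at ({lat:.2f}, {lon:.2f}), "
        f"reaching {max_depth_m:.0f} m where temperature was {temp_c:.2f} C."
    )

print(profile_summary("1900121", "2016-06-22", -12.5, 67.3, 1987, 2.85))
```

Summaries like this are what a semantic search can match against a free-form question such as "deepest measurement in the Indian Ocean".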
Start Backend:
```bash
uvicorn backend.main:app --reload --port 8000
```

Start Frontend (new terminal):

```bash
streamlit run frontend/streamlit_app.py
```

Open your browser to http://localhost:8501 and start asking questions:
- "What's the maximum depth reached in the Indian Ocean in 2016?"
- "Show me temperature profiles near the equator"
- "Which float measured the highest salinity?"
- "Plot the trajectory of float 1900121"
Natural Conversation:
```
You: "What's the deepest measurement you have?"

FloatChat: "The deepest dive in our data reached an incredible
1,987 meters! This amazing plunge happened in the Indian Ocean on
June 22, 2016. At that depth, the water was a chilly 2.85°C.
Want to explore more deep ocean mysteries?"
```
Data Visualization:
- Interactive maps showing float locations
- Temperature/salinity/depth profile plots
- Float trajectory animations
- Regional data filtering
```
FloatChat_Full_Project/
├── Data Pipeline
│   ├── download_argo.py           # Download ARGO float data
│   ├── clean_all_nc.py            # Process NetCDF files
│   └── load_to_postgres.py        # Database ingestion
│
├── AI Components
│   ├── create_embeddings.py       # Generate semantic embeddings
│   ├── langchain_agent.py         # AI chat agent
│   └── query_cache.py             # Response caching
│
├── Application
│   ├── backend/main.py            # FastAPI backend
│   ├── frontend/streamlit_app.py  # Streamlit UI
│   └── database_manager.py        # Database utilities
│
├── Configuration
│   ├── requirements.txt           # Python dependencies
│   ├── .env.example               # Environment template
│   ├── schema.sql                 # Database schema
│   └── docker-compose.yml         # Docker services
│
└── Data Directories (created during setup)
    ├── argo_floats/               # Raw NetCDF files
    ├── cleaned_data/              # Processed CSV files
    └── chroma_db/                 # Vector embeddings
```
Edit `langchain_agent.py` to use different models:

```python
# Change in langchain_agent.py
model_name = "llama3-8b-8192"  # or other Groq models
```

For large datasets, consider:
```bash
# Increase batch size for faster processing
python create_embeddings.py --batch 500

# Load data in parallel
python load_to_postgres.py --input cleaned_data --parallel 4
```

```bash
# Download additional floats
python download_argo.py --floats 2901234 2901235 2901236
python clean_all_nc.py --input argo_floats --output cleaned_data
python load_to_postgres.py --input cleaned_data --update
python create_embeddings.py --batch 200
```

1. Database Connection Error
```bash
# Check if PostgreSQL is running
pg_isready -h localhost -p 5432

# Verify database exists
psql -h localhost -U postgres -l
```

2. Groq API Issues
```bash
# Test API key
python -c "import os; from dotenv import load_dotenv; load_dotenv(); print('API Key loaded:', bool(os.getenv('GROQ_API_KEY')))"
```

3. Missing Dependencies
```bash
# Reinstall requirements
pip install -r requirements.txt --force-reinstall
```

4. Port Already in Use
```bash
# Use different ports
streamlit run frontend/streamlit_app.py --server.port 8502
uvicorn backend.main:app --port 8001
```

- Large Datasets: Increase the `--batch` size for embedding creation
- Slow Queries: Check that database indexes are created properly
- Memory Issues: Process floats in smaller batches
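Processing floats in smaller batches, as suggested above, comes down to simple chunking. A minimal sketch (illustrative only; the project's scripts expose `--batch` and `--floats` flags for this):

```python
def batched(items, batch_size):
    """Yield successive batches so large float lists fit in memory."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Hypothetical list of float IDs to process two at a time
float_ids = ["1900121", "2901256", "2902110", "2902111", "2901258"]
for batch in batched(float_ids, 2):
    print(batch)  # each batch has at most 2 IDs
```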
This project uses data from the International Argo Program:
- Global ocean temperature and salinity profiles
- Real-time and delayed mode quality-controlled data
- Measurements from autonomous profiling floats worldwide
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Make your changes and test thoroughly
- Submit a pull request with a clear description
- Follow PEP 8 style guidelines
- Add docstrings to new functions
- Include tests for new features
- Update this README for new functionality
This project is open source and available under the MIT License.
- International Argo Program for providing open ocean data
- Groq for fast LLM inference
- Streamlit for the interactive web interface
- ChromaDB for vector storage and retrieval
Having issues? Here's how to get help:
- Check the troubleshooting section above
- Search existing issues in the GitHub repository
- Create a new issue with detailed error messages and steps to reproduce
- Join discussions in the repository discussions tab
Ready to dive into ocean data? Follow the quick start guide and start exploring!