🌊 FloatChat - AI-Powered Ocean Data Explorer

FloatChat is an intelligent conversational interface for exploring ARGO ocean float data. Ask natural language questions about ocean temperature, salinity, and depth measurements, and get engaging, educational responses powered by AI.

🎯 What You'll Build

A complete ocean data exploration platform featuring:

  • AI Chat Interface: Ask questions like "What's the deepest measurement in the Indian Ocean?"
  • Interactive Maps: Visualize float locations and trajectories
  • Data Profiles: Plot temperature, salinity, and depth profiles
  • Smart Search: AI-powered retrieval from a comprehensive ocean database

🚀 Quick Start Guide

Prerequisites

  • Python 3.8+
  • PostgreSQL database
  • Groq API key (free tier available)
  • Git

1. 📥 Clone and Setup

# Clone the repository
git clone <your-repo-url>
cd FloatChat_Full_Project

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

2. 🔑 Configure Environment

# Copy environment template
cp .env.example .env

# Edit .env with your credentials
# Required: GROQ_API_KEY (get free key at https://groq.com/)
# Required: PostgreSQL connection details

Example .env configuration:

GROQ_API_KEY=your_groq_api_key_here
DB_HOST=localhost
DB_PORT=5432
DB_NAME=float_chat_database
DB_USER=postgres
DB_PASSWORD=your_password
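
To see how these variables fit together, here is a minimal sketch of assembling them into a PostgreSQL connection URL using only the standard library. The function name `build_db_url` is illustrative — the project's own code may read its configuration differently (e.g. via python-dotenv):

```python
import os

def build_db_url() -> str:
    """Assemble a PostgreSQL connection URL from the .env variables above.

    Illustrative sketch only; defaults mirror the example configuration.
    """
    host = os.environ.get("DB_HOST", "localhost")
    port = os.environ.get("DB_PORT", "5432")
    name = os.environ.get("DB_NAME", "float_chat_database")
    user = os.environ.get("DB_USER", "postgres")
    password = os.environ.get("DB_PASSWORD", "")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"
```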

3. πŸ—„οΈ Setup Database

Option A: Docker (Recommended)

# Start PostgreSQL with Docker
docker-compose up -d postgres

Option B: Local PostgreSQL

# Create database manually
createdb float_chat_database

4. 📡 Download Ocean Data

Download ARGO float data (this will take a few minutes):

# Download sample floats for demonstration
python download_argo.py --floats 1900121 2902110 2902111 2901256 2901258 2901305 2902094 2902097 2901257

# Or download specific floats by ID
python download_argo.py --floats 1900121 2901256

What happens: Downloads NetCDF files containing real ocean measurements from ARGO floats worldwide.
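
For orientation, Argo files on the GDAC mirror follow a predictable per-float layout. The sketch below builds such a URL; the DAC name passed in is an assumption (in practice it is looked up, e.g. in the GDAC index file), and `download_argo.py` may resolve files differently:

```python
# Illustrative sketch of the per-float file layout on the Argo GDAC mirror.
# The DAC (data assembly centre) for a given float is assumed known here;
# the real script may determine it another way.
GDAC_ROOT = "https://data-argo.ifremer.fr/dac"

def profile_url(dac: str, float_id: str) -> str:
    """Build the URL of a float's combined profile NetCDF file."""
    return f"{GDAC_ROOT}/{dac}/{float_id}/{float_id}_prof.nc"
```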

5. 🧹 Clean and Process Data

Convert raw NetCDF files to structured data:

python clean_all_nc.py --input argo_floats --output cleaned_data

What happens:

  • Extracts measurements (temperature, salinity, pressure)
  • Converts to CSV format
  • Organizes profiles, trajectories, and metadata
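
The flattening idea behind this step can be sketched with the standard library alone. The field names below are illustrative — `clean_all_nc.py` extracts many more variables from the NetCDF files:

```python
import csv
import io

def profile_to_rows(float_id, cycle, levels):
    """Flatten one profile (a list of per-level dicts) into flat CSV rows.

    Field names are illustrative stand-ins for the real NetCDF variables.
    """
    for level in levels:
        yield {
            "float_id": float_id,
            "cycle": cycle,
            "pressure_dbar": level["pres"],
            "temperature_c": level["temp"],
            "salinity_psu": level["psal"],
        }

def rows_to_csv(rows):
    """Serialize the flattened rows to a CSV string with a header."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["float_id", "cycle", "pressure_dbar",
                    "temperature_c", "salinity_psu"],
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```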

6. 💾 Load Data to Database

Create database schema and import cleaned data:

python load_to_postgres.py --input cleaned_data

What happens:

  • Creates optimized database tables
  • Loads profile data, measurements, trajectories
  • Sets up indexes for fast querying
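
The core ideas of this step — schema creation, bulk insert, and indexing — are sketched below. The standard-library `sqlite3` module stands in for PostgreSQL purely so the example is self-contained; the actual `load_to_postgres.py` targets the Postgres schema in `schema.sql`, and the table layout here is illustrative:

```python
import sqlite3

# sqlite3 stands in for PostgreSQL so this sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE measurements (
           float_id TEXT,
           cycle INTEGER,
           pressure_dbar REAL,
           temperature_c REAL,
           salinity_psu REAL
       )"""
)

# Bulk insert with executemany, as a loader would for each cleaned CSV.
rows = [
    ("1900121", 1, 5.0, 28.1, 35.2),
    ("1900121", 1, 1000.0, 4.3, 34.7),
]
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?, ?, ?)", rows)

# Index on float_id so per-float queries stay fast as the table grows.
conn.execute("CREATE INDEX idx_measurements_float ON measurements (float_id)")

deepest = conn.execute(
    "SELECT MAX(pressure_dbar) FROM measurements WHERE float_id = ?",
    ("1900121",),
).fetchone()[0]
```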

7. 🧠 Generate AI Embeddings

Create semantic search capabilities:

python create_embeddings.py --batch 200

What happens:

  • Generates text summaries of ocean data
  • Creates vector embeddings using sentence-transformers
  • Stores in ChromaDB for semantic search
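
The retrieval idea behind this step: every profile summary becomes a vector, and a question is answered by returning the summaries whose vectors are most similar to the question's vector. The real pipeline uses sentence-transformers embeddings stored in ChromaDB; the hand-rolled bag-of-words vectors below are only a toy stand-in to show the mechanism:

```python
import math

def embed(text, vocab):
    """Toy embedding: word counts over a fixed vocabulary."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_match(query, summaries, vocab):
    """Return the stored summary most similar to the query."""
    qv = embed(query, vocab)
    return max(summaries, key=lambda s: cosine(qv, embed(s, vocab)))
```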

8. 🚢 Launch the Application

Start Backend:

uvicorn backend.main:app --reload --port 8000

Start Frontend (new terminal):

streamlit run frontend/streamlit_app.py

9. 🎉 Explore Ocean Data!

Open your browser to http://localhost:8501 and start asking questions:

  • "What's the maximum depth reached in the Indian Ocean in 2016?"
  • "Show me temperature profiles near the equator"
  • "Which float measured the highest salinity?"
  • "Plot the trajectory of float 1900121"

📊 Example Interactions

Natural Conversation:

You: "What's the deepest measurement you have?"

FloatChat: "🌊 The deepest dive in our data reached an incredible 
1,987 meters! This amazing plunge happened in the Indian Ocean on 
June 22, 2016. At that depth, the water was a chilly 2.85°C. 
Want to explore more deep ocean mysteries? 🌍"

Data Visualization:

  • Interactive maps showing float locations
  • Temperature/salinity/depth profile plots
  • Float trajectory animations
  • Regional data filtering

πŸ› οΈ Project Structure

FloatChat_Full_Project/
├── 📥 Data Pipeline
│   ├── download_argo.py          # Download ARGO float data
│   ├── clean_all_nc.py           # Process NetCDF files
│   └── load_to_postgres.py       # Database ingestion
│
├── 🧠 AI Components
│   ├── create_embeddings.py      # Generate semantic embeddings
│   ├── langchain_agent.py        # AI chat agent
│   └── query_cache.py            # Response caching
│
├── 🖥️ Application
│   ├── backend/main.py           # FastAPI backend
│   ├── frontend/streamlit_app.py # Streamlit UI
│   └── database_manager.py       # Database utilities
│
├── 📋 Configuration
│   ├── requirements.txt          # Python dependencies
│   ├── .env.example              # Environment template
│   ├── schema.sql                # Database schema
│   └── docker-compose.yml        # Docker services
│
└── 📁 Data Directories (created during setup)
    ├── argo_floats/              # Raw NetCDF files
    ├── cleaned_data/             # Processed CSV files
    └── chroma_db/                # Vector embeddings

🔧 Advanced Configuration

Custom LLM Model

Edit langchain_agent.py to use different models:

# Change in langchain_agent.py
model_name = "llama3-8b-8192"  # or other Groq models

Database Performance

For large datasets, consider:

# Increase batch size for faster processing
python create_embeddings.py --batch 500

# Load data in parallel
python load_to_postgres.py --input cleaned_data --parallel 4

Adding More Floats

# Download additional floats
python download_argo.py --floats 2901234 2901235 2901236
python clean_all_nc.py --input argo_floats --output cleaned_data
python load_to_postgres.py --input cleaned_data --update
python create_embeddings.py --batch 200

πŸ› Troubleshooting

Common Issues

1. Database Connection Error

# Check if PostgreSQL is running
pg_isready -h localhost -p 5432

# Verify database exists
psql -h localhost -U postgres -l

2. Groq API Issues

# Test API key
python -c "import os; from dotenv import load_dotenv; load_dotenv(); print('API Key loaded:', bool(os.getenv('GROQ_API_KEY')))"

3. Missing Dependencies

# Reinstall requirements
pip install -r requirements.txt --force-reinstall

4. Port Already in Use

# Use different ports
streamlit run frontend/streamlit_app.py --server.port 8502
uvicorn backend.main:app --port 8001

Performance Tips

  • Large Datasets: Increase --batch size for embedding creation
  • Slow Queries: Verify that the database indexes were created during the load step
  • Memory Issues: Process floats in smaller batches
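
The batching advice above boils down to processing work in bounded chunks. Here is a tiny illustrative helper that splits a sequence of work items (float IDs, summaries to embed, rows to insert) into fixed-size batches so memory stays bounded:

```python
def batched(items, size):
    """Yield consecutive chunks of at most `size` items from a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```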

🌍 Data Sources

This project uses data from the International Argo Program:

  • Global ocean temperature and salinity profiles
  • Real-time and delayed mode quality-controlled data
  • Measurements from autonomous profiling floats worldwide

🤝 Contributing

We welcome contributions! Here's how to get started:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes and test thoroughly
  4. Submit a pull request with a clear description

Development Guidelines

  • Follow PEP 8 style guidelines
  • Add docstrings to new functions
  • Include tests for new features
  • Update this README for new functionality

📄 License

This project is open source and available under the MIT License.


πŸ™ Acknowledgments

  • International Argo Program for providing open ocean data
  • Groq for fast LLM inference
  • Streamlit for the interactive web interface
  • ChromaDB for vector storage and retrieval

📞 Support

Having issues? Here's how to get help:

  1. Check the troubleshooting section above
  2. Search existing issues in the GitHub repository
  3. Create a new issue with detailed error messages and steps to reproduce
  4. Join discussions in the repository discussions tab

Ready to dive into ocean data? 🌊 Follow the quick start guide and start exploring!
