FloatChat is an intelligent conversational interface for exploring ARGO ocean float data. Ask natural-language questions about ocean temperature, salinity, and depth measurements, and get engaging, educational responses powered by AI.
A complete ocean data exploration platform featuring:
- AI Chat Interface: Ask questions like "What's the deepest measurement in the Indian Ocean?"
- Interactive Maps: Visualize float locations and trajectories
- Data Profiles: Plot temperature, salinity, and depth profiles
- Smart Search: AI-powered retrieval from a comprehensive ocean database
- Python 3.8+
- PostgreSQL database
- Groq API key (free tier available)
- Git
```bash
# Clone the repository
git clone <your-repo-url>
cd FloatChat_Full_Project
```
```bash
# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
```
```bash
# Install dependencies
pip install -r requirements.txt

# Copy environment template
cp .env.example .env
```
Edit `.env` with your credentials. Required: `GROQ_API_KEY` (get a free key at https://groq.com/) and your PostgreSQL connection details. Example `.env` configuration:
```env
GROQ_API_KEY=your_groq_api_key_here
DB_HOST=localhost
DB_PORT=5432
DB_NAME=float_chat_database
DB_USER=postgres
DB_PASSWORD=your_password
```

Option A: Docker (Recommended)
```bash
# Start PostgreSQL with Docker
docker-compose up -d postgres
```

Option B: Local PostgreSQL
```bash
# Create database manually
createdb float_chat_database
```

Download ARGO float data (this will take a few minutes):
```bash
# Download sample floats for demonstration
python download_argo.py --floats 1900121 2902110 2902111 2901256 2901258 2901305 2902094 2902097 2901257

# Or download specific floats by ID
python download_argo.py --floats 1900121 2901256
```

What happens: Downloads NetCDF files containing real ocean measurements from ARGO floats worldwide.
Convert raw NetCDF files to structured data:
```bash
python clean_all_nc.py --input argo_floats --output cleaned_data
```

What happens:
- Extracts measurements (temperature, salinity, pressure)
- Converts to CSV format
- Organizes profiles, trajectories, and metadata
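The output of this step can be pictured as follows. This is a minimal sketch, not the actual `clean_all_nc.py` logic: it assumes measurements have already been read out of a NetCDF file into Python lists, and the column names are illustrative.

```python
import pandas as pd

# Hypothetical values extracted from one ARGO profile (illustrative only)
profile = {
    "float_id": ["1900121"] * 3,
    "pressure_dbar": [5.2, 250.7, 1987.0],  # pressure in decibars, roughly depth in meters
    "temperature_c": [28.1, 12.4, 2.85],
    "salinity_psu": [34.9, 35.1, 34.7],
}

# Organize the measurements into a tabular structure and write CSV
df = pd.DataFrame(profile)
df.to_csv("profile_1900121.csv", index=False)
print(df.shape)  # (3, 4)
```

The real script additionally separates profiles, trajectories, and metadata into distinct outputs.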
Create database schema and import cleaned data:
```bash
python load_to_postgres.py --input cleaned_data
```

What happens:
- Creates optimized database tables
- Loads profile data, measurements, trajectories
- Sets up indexes for fast querying
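A simplified sketch of what such a schema might look like (table and column names here are illustrative; see `schema.sql` for the real definitions):

```sql
-- Illustrative only: the actual schema lives in schema.sql
CREATE TABLE profiles (
    profile_id   SERIAL PRIMARY KEY,
    float_id     TEXT NOT NULL,
    profile_date DATE,
    latitude     DOUBLE PRECISION,
    longitude    DOUBLE PRECISION
);

CREATE TABLE measurements (
    profile_id    INTEGER REFERENCES profiles(profile_id),
    pressure_dbar REAL,
    temperature_c REAL,
    salinity_psu  REAL
);

-- An index like this is what makes per-float queries fast
CREATE INDEX idx_profiles_float_id ON profiles(float_id);
```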
Create semantic search capabilities:
```bash
python create_embeddings.py --batch 200
```

What happens:
- Generates text summaries of ocean data
- Creates vector embeddings using sentence-transformers
- Stores in ChromaDB for semantic search
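The first part of this step, turning structured rows into text suitable for embedding, can be sketched like this. The function name and phrasing are hypothetical, not taken from `create_embeddings.py`; the real script then encodes such summaries with sentence-transformers and stores the vectors in ChromaDB.

```python
def profile_summary(float_id, date, lat, lon, max_depth_m, temp_c):
    """Build a short text summary of one profile for semantic embedding."""
    return (
        f"Float {float_id} profiled on {date} at ({lat:.2f}, {lon:.2f}), "
        f"reaching {max_depth_m:.0f} m where temperature was {temp_c:.2f} C."
    )

print(profile_summary("1900121", "2016-06-22", -12.5, 67.3, 1987, 2.85))
```

Summaries like this are what a semantic search can match against a free-form question such as "deepest measurement in the Indian Ocean".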
Start Backend:
```bash
uvicorn backend.main:app --reload --port 8000
```

Start Frontend (new terminal):

```bash
streamlit run frontend/streamlit_app.py
```

Open your browser to http://localhost:8501 and start asking questions:
- "What's the maximum depth reached in the Indian Ocean in 2016?"
- "Show me temperature profiles near the equator"
- "Which float measured the highest salinity?"
- "Plot the trajectory of float 1900121"
Natural Conversation:
```
You: "What's the deepest measurement you have?"

FloatChat: "The deepest dive in our data reached an incredible
1,987 meters! This amazing plunge happened in the Indian Ocean on
June 22, 2016. At that depth, the water was a chilly 2.85°C.
Want to explore more deep ocean mysteries?"
```
Data Visualization:
- Interactive maps showing float locations
- Temperature/salinity/depth profile plots
- Float trajectory animations
- Regional data filtering
```
FloatChat_Full_Project/
├── Data Pipeline
│   ├── download_argo.py           # Download ARGO float data
│   ├── clean_all_nc.py            # Process NetCDF files
│   └── load_to_postgres.py        # Database ingestion
│
├── AI Components
│   ├── create_embeddings.py       # Generate semantic embeddings
│   ├── langchain_agent.py         # AI chat agent
│   └── query_cache.py             # Response caching
│
├── Application
│   ├── backend/main.py            # FastAPI backend
│   ├── frontend/streamlit_app.py  # Streamlit UI
│   └── database_manager.py        # Database utilities
│
├── Configuration
│   ├── requirements.txt           # Python dependencies
│   ├── .env.example               # Environment template
│   ├── schema.sql                 # Database schema
│   └── docker-compose.yml         # Docker services
│
└── Data Directories (created during setup)
    ├── argo_floats/               # Raw NetCDF files
    ├── cleaned_data/              # Processed CSV files
    └── chroma_db/                 # Vector embeddings
```
Edit `langchain_agent.py` to use different models:

```python
# Change in langchain_agent.py
model_name = "llama3-8b-8192"  # or other Groq models
```

For large datasets, consider:
```bash
# Increase batch size for faster processing
python create_embeddings.py --batch 500

# Load data in parallel
python load_to_postgres.py --input cleaned_data --parallel 4
```

```bash
# Download additional floats
python download_argo.py --floats 2901234 2901235 2901236
python clean_all_nc.py --input argo_floats --output cleaned_data
python load_to_postgres.py --input cleaned_data --update
python create_embeddings.py --batch 200
```

1. Database Connection Error
```bash
# Check if PostgreSQL is running
pg_isready -h localhost -p 5432

# Verify database exists
psql -h localhost -U postgres -l
```

2. Groq API Issues
```bash
# Test API key
python -c "import os; from dotenv import load_dotenv; load_dotenv(); print('API Key loaded:', bool(os.getenv('GROQ_API_KEY')))"
```

3. Missing Dependencies
```bash
# Reinstall requirements
pip install -r requirements.txt --force-reinstall
```

4. Port Already in Use
```bash
# Use different ports
streamlit run frontend/streamlit_app.py --server.port 8502
uvicorn backend.main:app --port 8001
```

- Large Datasets: Increase the `--batch` size for embedding creation
- Slow Queries: Check that database indexes are created properly
- Memory Issues: Process floats in smaller batches
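Processing floats in smaller batches, as suggested above, comes down to simple chunking. A minimal sketch (illustrative only; the project's scripts expose `--batch` and `--floats` flags for this):

```python
def batched(items, batch_size):
    """Yield successive batches so large float lists fit in memory."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Hypothetical list of float IDs to process two at a time
float_ids = ["1900121", "2901256", "2902110", "2902111", "2901258"]
for batch in batched(float_ids, 2):
    print(batch)  # each batch has at most 2 IDs
```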
This project uses data from the International Argo Program:
- Global ocean temperature and salinity profiles
- Real-time and delayed mode quality-controlled data
- Measurements from autonomous profiling floats worldwide
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Make your changes and test thoroughly
- Submit a pull request with a clear description
- Follow PEP 8 style guidelines
- Add docstrings to new functions
- Include tests for new features
- Update this README for new functionality
This project is open source and available under the MIT License.
- International Argo Program for providing open ocean data
- Groq for fast LLM inference
- Streamlit for the interactive web interface
- ChromaDB for vector storage and retrieval
Having issues? Here's how to get help:
- Check the troubleshooting section above
- Search existing issues in the GitHub repository
- Create a new issue with detailed error messages and steps to reproduce
- Join discussions in the repository discussions tab
Ready to dive into ocean data? Follow the quick start guide and start exploring!