A production-style geospatial data pipeline that ingests satellite datasets, extracts spatial features, and enables natural-language exploration through an AI-powered query agent.

    Satellite / Environmental Data
                 │
                 ▼
    ┌─────────────────────────┐
    │   Apache Airflow DAG    │
    │  ┌───────────────────┐  │
    │  │  Check Raw Data   │  │
    │  └─────────┬─────────┘  │
    │            ▼            │
    │  ┌───────────────────┐  │
    │  │  Validate Schema  │  │
    │  └─────────┬─────────┘  │
    │            ▼            │
    │  ┌─────────┴─────────┐  │
    │  │     PARALLEL      │  │
    │  │ ┌──────┐ ┌──────┐ │  │
    │  │ │ NDVI │ │Slope │ │  │
    │  │ └──┬───┘ └───┬──┘ │  │
    │  └────┼─────────┼────┘  │
    │       ▼         ▼       │
    │  ┌───────────────────┐  │
    │  │  Feature Extract  │  │
    │  └─────────┬─────────┘  │
    │            ▼            │
    │  ┌───────────────────┐  │
    │  │  Load to PostGIS  │  │
    │  └───────────────────┘  │
    └─────────────────────────┘
                 │
                 ▼
    ┌─────────────────────────┐
    │   PostGIS / SQLite DB   │
    │  ┌───────────────────┐  │
    │  │ spatial_features  │  │
    │  │ regions           │  │
    │  │ ndvi_data         │  │
    │  │ terrain_data      │  │
    │  │ land_cover        │  │
    │  └───────────────────┘  │
    └────────────┬────────────┘
                 │
             ┌───┴────────────────┐
             ▼                    ▼
    ┌──────────────────┐ ┌──────────────────┐
    │    dbt Models    │ │  AI Query Agent  │
    │  ┌────────────┐  │ │  ┌────────────┐  │
    │  │  Staging   │  │ │  │ LangChain  │  │
    │  │  Marts     │  │ │  │ Text→SQL   │  │
    │  └────────────┘  │ │  │ RAG        │  │
    └──────────────────┘ │  └────────────┘  │
                         └────────┬─────────┘
                                  │
                                  ▼
                      ┌────────────────────────┐
                      │  Streamlit Dashboard   │
                      │ ┌─────────┐┌─────────┐ │
                      │ │   Map   ││  Chat   │ │
                      │ │Explorer ││  Agent  │ │
                      │ └─────────┘└─────────┘ │
                      │ ┌─────────┐┌─────────┐ │
                      │ │Analytics││ Status  │ │
                      │ └─────────┘└─────────┘ │
                      └────────────────────────┘

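The task ordering in the diagram can be illustrated without Airflow by using the standard library's `graphlib`. This is only a sketch of the dependency structure, not the project's DAG file; the task names are hypothetical labels derived from the diagram boxes.

```python
from graphlib import TopologicalSorter

# Hypothetical task names mirroring the boxes in the architecture diagram.
# Each key maps to the set of tasks that must finish before it can start.
DEPENDENCIES = {
    "check_raw_data": set(),
    "validate_schema": {"check_raw_data"},
    "compute_ndvi": {"validate_schema"},
    "compute_slope": {"validate_schema"},
    "feature_extract": {"compute_ndvi", "compute_slope"},
    "load_to_postgis": {"feature_extract"},
}


def execution_waves(dependencies):
    """Group tasks into waves; tasks in the same wave may run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose inputs are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
        print(number, wave)
```

The NDVI and slope tasks land in the same wave because both depend only on schema validation, which is what makes the parallel branch possible.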
| Feature | Description |
|---|---|
| NDVI Processing | Compute vegetation index from multi-band satellite data |
| Terrain Analysis | Slope, aspect, elevation gradients, and roughness |
| Solar Suitability | Score regions for solar panel installation potential |
| Airflow Pipeline | Automated DAG with parallel processing and validation |
| PostGIS Schema | Spatial indexing with 7 tables and 3 analytics views |
| dbt Models | 3 staging + 3 mart models with schema tests |
| AI Query Agent | Natural language → SQL with LangChain + Gemini/OpenAI |
| RAG Pipeline | Domain knowledge retrieval for contextual responses |
| Interactive Maps | Folium maps with choropleth layers and heatmaps |
| Analytics Charts | Plotly visualizations with correlation analysis |
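
The NDVI and slope computations listed in the table can be sketched with NumPy. This is a minimal illustration, not the code in `geoinsight/processing`; the function names and the 30 m default cell size are assumptions. NDVI uses the standard definition (NIR - RED) / (NIR + RED), and slope is estimated from finite differences of an elevation grid.

```python
import numpy as np


def compute_ndvi(nir, red):
    """Return (NIR - RED) / (NIR + RED), with NaN where NIR + RED is zero."""
    nir = np.asarray(nir, dtype="float64")
    red = np.asarray(red, dtype="float64")
    denom = nir + red
    ndvi = np.full(denom.shape, np.nan)
    # Divide only where the denominator is non-zero; other cells stay NaN.
    np.divide(nir - red, denom, out=ndvi, where=denom != 0)
    return ndvi


def slope_degrees(elevation, cell_size=30.0):
    """Return slope in degrees from a 2-D elevation grid.

    cell_size is the grid spacing in the same units as the elevations
    (30.0 is an assumed default, not a project setting).
    """
    elevation = np.asarray(elevation, dtype="float64")
    # np.gradient returns one array per axis: rows (y) first, then columns (x).
    dz_dy, dz_dx = np.gradient(elevation, cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
```

Returning NaN for cells with a zero denominator keeps no-data pixels out of later averages instead of raising a division warning.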
1. Clone the repository:

       git clone https://github.com/yourusername/geoinsight-ai.git
       cd geoinsight-ai

2. Create and activate a virtual environment:

       python -m venv venv
       # Windows
       venv\Scripts\activate
       # macOS/Linux
       source venv/bin/activate

3. Install the dependencies:

       pip install -r requirements.txt

4. Configure the environment:

       cp .env.example .env
       # Edit .env with your API key:
       # GEMINI_API_KEY=your_key_here

5. Generate sample data and run the pipeline:

       # Generate synthetic geospatial data
       python scripts/generate_sample_data.py
       # Process NDVI and terrain
       python -m geoinsight.processing.ndvi
       python -m geoinsight.processing.terrain
       # Extract spatial features
       python -m geoinsight.processing.features
       # Load into database
       python -m pipeline.ingestion.ingest

6. Launch the dashboard:

       streamlit run dashboard/app.py

   Open http://localhost:8501 and start exploring!

💡 Tip: The dashboard has a one-click "Generate & Load Sample Data" button if you skip step 5.

The agent converts natural language to geospatial SQL:
| Query | What It Does |
|---|---|
| "Show areas with high vegetation density and low terrain slope" | Filters NDVI > 0.5 and slope < 15° |
| "Find regions with tree canopy coverage greater than 60%" | Queries canopy_coverage_pct > 60 |
| "Which regions have the highest NDVI values?" | Ranks by ndvi_mean DESC |
| "Identify locations suitable for solar panel installation" | Filters slope < 15° and sunlight > 6 hrs |
| "Compare vegetation across different land types" | GROUP BY land_type aggregation |
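
To make the first row concrete, here is a sketch of the kind of SQL such a question could translate to, run against an in-memory SQLite database. The `spatial_features` table name and the `ndvi_mean` column appear in this README; `region_name`, `slope_mean`, and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3

# Assumed schema: region_name and slope_mean are hypothetical column names.
SCHEMA = """
CREATE TABLE spatial_features (
    region_name TEXT,
    ndvi_mean   REAL,
    slope_mean  REAL
)
"""

# "Show areas with high vegetation density and low terrain slope"
QUERY = """
SELECT region_name, ndvi_mean, slope_mean
FROM spatial_features
WHERE ndvi_mean > ? AND slope_mean < ?
ORDER BY ndvi_mean DESC
"""


def demo_connection():
    """Create an in-memory database with three made-up regions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO spatial_features VALUES (?, ?, ?)",
        [("valley", 0.72, 4.0), ("ridge", 0.65, 28.0), ("plain", 0.31, 2.0)],
    )
    return conn


def high_vegetation_low_slope(conn, min_ndvi=0.5, max_slope=15.0):
    """Return regions with NDVI above min_ndvi and slope below max_slope."""
    return conn.execute(QUERY, (min_ndvi, max_slope)).fetchall()


if __name__ == "__main__":
    print(high_vegetation_low_slope(demo_connection()))
```

Passing the thresholds as bound parameters rather than formatting them into the string is a sensible default for any generated SQL.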
The agent works in two modes:
- LLM Mode (with API key): Uses Gemini/OpenAI for intelligent SQL generation
- Rule-Based Mode (no API key): Uses keyword matching for common query patterns
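
A rough sketch of how the two modes could be selected, and how keyword matching might work, is shown here. It is not the project's `query_agent.py`: the rule list, the SQL templates, the `slope_mean` column, and the function names are all hypothetical, and only `GEMINI_API_KEY` is taken from the setup steps.

```python
import re

# Hypothetical rules: each pairs a set of trigger words with a SQL template.
# More specific rules come first because the first match wins.
RULES = [
    ({"canopy"},
     "SELECT * FROM spatial_features WHERE canopy_coverage_pct > 60"),
    ({"highest", "ndvi"},
     "SELECT * FROM spatial_features ORDER BY ndvi_mean DESC"),
    ({"vegetation", "slope"},
     "SELECT * FROM spatial_features WHERE ndvi_mean > 0.5 AND slope_mean < 15"),
    ({"compare", "vegetation"},
     "SELECT land_type, AVG(ndvi_mean) FROM spatial_features GROUP BY land_type"),
]


def select_mode(environ):
    """Return 'llm' when an API key is configured, otherwise 'rule_based'."""
    return "llm" if environ.get("GEMINI_API_KEY") else "rule_based"


def match_rule(question):
    """Return the SQL for the first rule whose keywords all occur in the question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    for keywords, sql in RULES:
        if keywords <= words:  # every trigger word is present
            return sql
    return None  # no rule matched; the caller decides how to respond


if __name__ == "__main__":
    print(select_mode({}))
    print(match_rule("Which regions have the highest NDVI values?"))
```

Keyword matching of this kind covers only the patterns it was written for, which is why it suits a no-API-key fallback rather than general questions.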

    geoinsight-ai/
    ├── geoinsight/                          # Core processing package
    │   ├── processing/
    │   │   ├── ndvi.py                      # NDVI computation & classification
    │   │   ├── terrain.py                   # Slope, aspect, solar analysis
    │   │   ├── features.py                  # Zonal stats & feature extraction
    │   │   └── utils.py                     # Coordinate & I/O utilities
    │   └── agent/
    │       ├── query_agent.py               # LangChain text-to-SQL agent
    │       ├── prompts.py                   # System prompts & few-shot examples
    │       ├── rag.py                       # RAG knowledge retrieval
    │       └── tools.py                     # Custom geospatial tools
    │
    ├── pipeline/                            # Data pipeline
    │   ├── dags/
    │   │   └── geospatial_ingestion_dag.py  # Airflow DAG
    │   ├── ingestion/
    │   │   └── ingest.py                    # Data loading logic
    │   └── validation/
    │       └── validators.py                # Quality checks
    │
    ├── database/
    │   ├── schema.sql                       # PostGIS DDL (7 tables, 3 views)
    │   └── connection.py                    # DB manager (PostGIS + SQLite)
    │
    ├── dbt_models/
    │   ├── dbt_project.yml
    │   └── models/
    │       ├── staging/                     # stg_regions, stg_ndvi, stg_terrain
    │       └── marts/                       # vegetation, terrain, region summary
    │
    ├── dashboard/
    │   ├── app.py                           # Streamlit main app
    │   ├── components/
    │   │   ├── map_view.py                  # Folium interactive maps
    │   │   ├── chat_interface.py            # AI agent chat UI
    │   │   └── analytics.py                 # Plotly charts
    │   └── static/
    │       └── style.css                    # Premium dark theme
    │
    ├── scripts/
    │   └── generate_sample_data.py          # Synthetic data generator
    │
    ├── tests/                               # 36 unit tests
    │   ├── test_processing.py
    │   ├── test_pipeline.py
    │   └── test_agent.py
    │
    ├── data/
    │   ├── raw/                             # Input data (generated)
    │   └── processed/                       # Processed outputs
    │
    ├── docker-compose.yml                   # PostGIS + Airflow
    ├── requirements.txt
    ├── pyproject.toml
    ├── Makefile
    └── README.md

| Layer | Technologies |
|---|---|
| Geospatial | GeoPandas, Rasterio, Shapely, NumPy, SciPy |
| Data Pipeline | Apache Airflow, Python, dbt |
| Database | PostgreSQL + PostGIS (production) / SQLite (dev) |
| AI Agent | LangChain, Google Gemini / OpenAI, RAG |
| Dashboard | Streamlit, Folium, Plotly |
| Testing | pytest |
For full PostGIS + Airflow deployment:

    docker-compose up -d

Services:
- PostGIS: localhost:5432 (user: geoinsight, pass: geoinsight)
- Airflow: localhost:8080 (user: admin, pass: admin)

Run the unit tests with pytest:

    python -m pytest tests/ -v

Data flows through seven stages, in order:

- Ingestion: Raw satellite/terrain data (GeoTIFF, GeoJSON)
- Validation: Schema, CRS, and quality checks
- NDVI Processing: Vegetation index from NIR/RED bands
- Terrain Analysis: Slope, aspect, solar suitability
- Feature Extraction: Zonal statistics per region
- Database Loading: PostGIS with spatial indexes
- dbt Transformation: Analytics-ready mart tables
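
The feature extraction stage relies on zonal statistics, which can be sketched with NumPy as follows. This assumes regions are available as an integer raster of zone IDs aligned with the value raster; the project may instead work from vector polygons, and the function name is hypothetical.

```python
import numpy as np


def zonal_mean(values, zones):
    """Return {zone_id: mean value} per zone, ignoring NaN cells.

    values: 2-D array of raster values (for example NDVI).
    zones:  2-D array of non-negative integer zone IDs, same shape as values.
    """
    values = np.asarray(values, dtype="float64").ravel()
    zones = np.asarray(zones).ravel()
    valid = ~np.isnan(values)
    # bincount sums the weights per zone ID in a single pass.
    sums = np.bincount(zones[valid], weights=values[valid])
    counts = np.bincount(zones[valid])
    return {
        int(zone): float(sums[zone] / counts[zone])
        for zone in np.flatnonzero(counts)  # skip zone IDs with no valid cells
    }


if __name__ == "__main__":
    ndvi = [[0.2, 0.4], [0.6, float("nan")]]
    region_ids = [[1, 1], [2, 2]]
    print(zonal_mean(ndvi, region_ids))
```

The resulting per-region values are what would be loaded into the database as one row per region.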

GeoInsight AI: Intelligent Geospatial Data Platform

Python, GeoPandas, Airflow, PostGIS, dbt, LangChain

- Built a production-style geospatial pipeline that ingests satellite datasets and extracts spatial features, including the NDVI vegetation index and terrain slope.
- Developed an Airflow-orchestrated ELT pipeline with PostGIS storage and modular dbt models for spatial analytics.
- Implemented an AI-powered data agent using LangChain to convert natural-language queries into geospatial SQL and visualize the results on interactive maps.

MIT License. See the LICENSE file for details.

Built with ❤️ for Geospatial Intelligence