codecaffin4346/Geospatial-Intelligence-Platform

🌍 GeoInsight AI

Intelligent Geospatial Data Platform

A production-style geospatial data pipeline that ingests satellite datasets, extracts spatial features, and enables natural-language exploration through an AI-powered query agent.

Python PostGIS Airflow LangChain Streamlit


πŸ—οΈ System Architecture

              Satellite / Environmental Data
                          β”‚
                          β–Ό
            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
            β”‚    Apache Airflow DAG    β”‚
            β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
            β”‚  β”‚  Check Raw Data   β”‚  β”‚
            β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
            β”‚           β–Ό             β”‚
            β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
            β”‚  β”‚  Validate Schema  β”‚  β”‚
            β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
            β”‚           β–Ό             β”‚
            β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
            β”‚  β”‚     PARALLEL      β”‚  β”‚
            β”‚  β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”€β”  β”‚  β”‚
            β”‚  β”‚ β”‚ NDVI β”‚β”‚Slope β”‚  β”‚  β”‚
            β”‚  β”‚ β””β”€β”€β”¬β”€β”€β”€β”˜β””β”€β”€β”¬β”€β”€β”€β”˜  β”‚  β”‚
            β”‚  β””β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”˜  β”‚
            β”‚       β–Ό       β–Ό         β”‚
            β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
            β”‚  β”‚ Feature Extract   β”‚  β”‚
            β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
            β”‚           β–Ό             β”‚
            β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
            β”‚  β”‚  Load to PostGIS  β”‚  β”‚
            β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                          β”‚
                          β–Ό
            β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
            β”‚   PostGIS / SQLite DB    β”‚
            β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
            β”‚  β”‚  spatial_features β”‚  β”‚
            β”‚  β”‚  regions          β”‚  β”‚
            β”‚  β”‚  ndvi_data        β”‚  β”‚
            β”‚  β”‚  terrain_data     β”‚  β”‚
            β”‚  β”‚  land_cover       β”‚  β”‚
            β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                         β”‚
              β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
              β–Ό                     β–Ό
    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚    dbt Models     β”‚  β”‚  AI Query Agent  β”‚
    β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
    β”‚  β”‚  Staging   β”‚  β”‚  β”‚  β”‚ LangChain  β”‚  β”‚
    │  │  Marts     │  │  │  │ Text→SQL   │  │
    β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚  β”‚  β”‚ RAG        β”‚  β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
                          β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                   β”‚
                                   β–Ό
                      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                      β”‚   Streamlit Dashboard    β”‚
                      β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
                      β”‚  β”‚  Map   β”‚ β”‚  Chat  β”‚  β”‚
                      β”‚  β”‚Explorerβ”‚ β”‚ Agent  β”‚  β”‚
                      β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
                      β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
                      β”‚  β”‚Analyticsβ”‚ β”‚ Status β”‚  β”‚
                      β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
                      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
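
The task ordering inside the Airflow box can be sketched without Airflow itself. The following is a minimal illustration using Python's standard `graphlib` module; the task names are assumptions chosen to mirror the diagram, not the task IDs of the real DAG.

```python
from graphlib import TopologicalSorter

# Task names are illustrative; the real DAG's task IDs may differ.
# Each key maps a task to the set of tasks it depends on.
dependencies = {
    "validate_schema": {"check_raw_data"},
    "compute_ndvi": {"validate_schema"},
    "compute_slope": {"validate_schema"},
    "extract_features": {"compute_ndvi", "compute_slope"},
    "load_to_postgis": {"extract_features"},
}


def execution_waves(deps):
    """Group tasks into waves; tasks in the same wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

The third wave contains both `compute_ndvi` and `compute_slope`, which corresponds to the PARALLEL box in the diagram.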

✨ Key Features

| Feature | Description |
| --- | --- |
| 🌿 NDVI Processing | Compute vegetation index from multi-band satellite data |
| ⛰️ Terrain Analysis | Slope, aspect, elevation gradients, and roughness |
| ☀️ Solar Suitability | Score regions for solar panel installation potential |
| 🔄 Airflow Pipeline | Automated DAG with parallel processing and validation |
| 🗄️ PostGIS Schema | Spatial indexing with 7 tables and 3 analytics views |
| 📐 dbt Models | 3 staging + 3 mart models with schema tests |
| 🤖 AI Query Agent | Natural language → SQL with LangChain + Gemini/OpenAI |
| 📚 RAG Pipeline | Domain knowledge retrieval for contextual responses |
| 🗺️ Interactive Maps | Folium maps with choropleth layers and heatmaps |
| 📊 Analytics Charts | Plotly visualizations with correlation analysis |
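
To make the NDVI feature concrete, here is a minimal sketch of the standard formula, NDVI = (NIR - RED) / (NIR + RED), using NumPy. The function name `compute_ndvi` and the NaN handling for empty pixels are assumptions for illustration and may not match `geoinsight/processing/ndvi.py`.

```python
import numpy as np


def compute_ndvi(nir, red):
    """Return (NIR - RED) / (NIR + RED); pixels where NIR + RED == 0 become NaN."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    ndvi = np.full(total.shape, np.nan)
    # Divide only where the denominator is non-zero; other pixels stay NaN.
    np.divide(nir - red, total, out=ndvi, where=total != 0)
    return ndvi


# Tiny 2x2 example: reflectance values for the near-infrared and red bands.
nir = np.array([[0.8, 0.6], [0.3, 0.0]])
red = np.array([[0.2, 0.2], [0.3, 0.0]])
ndvi = compute_ndvi(nir, red)
print(ndvi)
```

Values fall in [-1, 1], with dense vegetation toward the upper end of the range.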

🚀 Quick Start

1. Clone the Repository

git clone https://github.com/codecaffin4346/Geospatial-Intelligence-Platform.git
cd Geospatial-Intelligence-Platform

2. Set Up Virtual Environment

python -m venv venv
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Configure Environment

cp .env.example .env
# Edit .env with your API key:
#   GEMINI_API_KEY=your_key_here

5. Generate Sample Data & Run Pipeline

# Generate synthetic geospatial data
python scripts/generate_sample_data.py

# Process NDVI and terrain
python -m geoinsight.processing.ndvi
python -m geoinsight.processing.terrain

# Extract spatial features
python -m geoinsight.processing.features

# Load into database
python -m pipeline.ingestion.ingest

6. Launch Dashboard

streamlit run dashboard/app.py

Open http://localhost:8501 and start exploring!

💡 Tip: If you skip step 5, use the dashboard's one-click "Generate & Load Sample Data" button instead.


🤖 AI Agent – Example Queries

The agent converts natural language to geospatial SQL:

| Query | What It Does |
| --- | --- |
| "Show areas with high vegetation density and low terrain slope" | Filters NDVI > 0.5 and slope < 15° |
| "Find regions with tree canopy coverage greater than 60%" | Queries canopy_coverage_pct > 60 |
| "Which regions have the highest NDVI values?" | Ranks by ndvi_mean DESC |
| "Identify locations suitable for solar panel installation" | Filters slope < 15° and sunlight > 6 hrs |
| "Compare vegetation across different land types" | GROUP BY land_type aggregation |

The agent works in two modes:

  • LLM Mode (with API key): Uses Gemini/OpenAI for intelligent SQL generation
  • Rule-Based Mode (no API key): Uses keyword matching for common query patterns
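
As a rough sketch of how the two modes could fit together, the example below picks a mode from the environment and, in rule-based mode, maps keywords to SQL and runs it against an in-memory SQLite table. The rules, the helper names `select_mode` and `rule_based_sql`, and the `region` and `slope_mean` columns are assumptions for illustration; only `spatial_features`, `ndvi_mean`, `canopy_coverage_pct`, `land_type`, and `GEMINI_API_KEY` come from this README.

```python
import os
import sqlite3

# Illustrative keyword rules: every keyword in the tuple must occur in the question.
RULES = [
    (("vegetation", "slope"),
     "SELECT region FROM spatial_features WHERE ndvi_mean > 0.5 AND slope_mean < 15"),
    (("canopy",),
     "SELECT region FROM spatial_features WHERE canopy_coverage_pct > 60"),
    (("highest", "ndvi"),
     "SELECT region FROM spatial_features ORDER BY ndvi_mean DESC"),
]


def select_mode(env=None):
    """Use LLM mode only when an API key is configured."""
    env = os.environ if env is None else env
    return "llm" if env.get("GEMINI_API_KEY") else "rule_based"


def rule_based_sql(question):
    """Return SQL for the first rule whose keywords all appear in the question."""
    text = question.lower()
    for keywords, sql in RULES:
        if all(keyword in text for keyword in keywords):
            return sql
    return None


# Synthetic rows in an in-memory database (assumed column layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spatial_features (region TEXT, land_type TEXT, "
    "ndvi_mean REAL, slope_mean REAL, canopy_coverage_pct REAL)"
)
conn.executemany(
    "INSERT INTO spatial_features VALUES (?, ?, ?, ?, ?)",
    [
        ("A", "forest", 0.72, 8.0, 75.0),
        ("B", "forest", 0.65, 22.0, 64.0),
        ("C", "cropland", 0.55, 4.0, 10.0),
        ("D", "urban", 0.18, 3.0, 5.0),
    ],
)

question = "Show areas with high vegetation density and low terrain slope"
sql = rule_based_sql(question)
rows = conn.execute(sql).fetchall()
print(select_mode({}), sql, rows)
```

Keyword matching like this only covers phrasings it has rules for, which is presumably why LLM mode is preferred when a key is available.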

πŸ“ Project Structure

geoinsight-ai/
β”œβ”€β”€ geoinsight/                  # Core processing package
β”‚   β”œβ”€β”€ processing/
β”‚   β”‚   β”œβ”€β”€ ndvi.py              # NDVI computation & classification
β”‚   β”‚   β”œβ”€β”€ terrain.py           # Slope, aspect, solar analysis
β”‚   β”‚   β”œβ”€β”€ features.py          # Zonal stats & feature extraction
β”‚   β”‚   └── utils.py             # Coordinate & I/O utilities
β”‚   └── agent/
β”‚       β”œβ”€β”€ query_agent.py       # LangChain text-to-SQL agent
β”‚       β”œβ”€β”€ prompts.py           # System prompts & few-shot examples
β”‚       β”œβ”€β”€ rag.py               # RAG knowledge retrieval
β”‚       └── tools.py             # Custom geospatial tools
β”‚
β”œβ”€β”€ pipeline/                    # Data pipeline
β”‚   β”œβ”€β”€ dags/
β”‚   β”‚   └── geospatial_ingestion_dag.py  # Airflow DAG
β”‚   β”œβ”€β”€ ingestion/
β”‚   β”‚   └── ingest.py            # Data loading logic
β”‚   └── validation/
β”‚       └── validators.py        # Quality checks
β”‚
β”œβ”€β”€ database/
β”‚   β”œβ”€β”€ schema.sql               # PostGIS DDL (7 tables, 3 views)
β”‚   └── connection.py            # DB manager (PostGIS + SQLite)
β”‚
β”œβ”€β”€ dbt_models/
β”‚   β”œβ”€β”€ dbt_project.yml
β”‚   └── models/
β”‚       β”œβ”€β”€ staging/             # stg_regions, stg_ndvi, stg_terrain
β”‚       └── marts/               # vegetation, terrain, region summary
β”‚
β”œβ”€β”€ dashboard/
β”‚   β”œβ”€β”€ app.py                   # Streamlit main app
β”‚   β”œβ”€β”€ components/
β”‚   β”‚   β”œβ”€β”€ map_view.py          # Folium interactive maps
β”‚   β”‚   β”œβ”€β”€ chat_interface.py    # AI agent chat UI
β”‚   β”‚   └── analytics.py         # Plotly charts
β”‚   └── static/
β”‚       └── style.css            # Premium dark theme
β”‚
β”œβ”€β”€ scripts/
β”‚   └── generate_sample_data.py  # Synthetic data generator
β”‚
β”œβ”€β”€ tests/                       # 36 unit tests
β”‚   β”œβ”€β”€ test_processing.py
β”‚   β”œβ”€β”€ test_pipeline.py
β”‚   └── test_agent.py
β”‚
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ raw/                     # Input data (generated)
β”‚   └── processed/               # Processed outputs
β”‚
β”œβ”€β”€ docker-compose.yml           # PostGIS + Airflow
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ pyproject.toml
β”œβ”€β”€ Makefile
└── README.md

πŸ› οΈ Technology Stack

Layer Technologies
Geospatial GeoPandas, Rasterio, Shapely, NumPy, SciPy
Data Pipeline Apache Airflow, Python, dbt
Database PostgreSQL + PostGIS (production) / SQLite (dev)
AI Agent LangChain, Google Gemini / OpenAI, RAG
Dashboard Streamlit, Folium, Plotly
Testing pytest

🐳 Docker Setup (Production)

For full PostGIS + Airflow deployment:

docker-compose up -d

Services:

  • PostGIS: localhost:5432 (user: geoinsight, pass: geoinsight)
  • Airflow: localhost:8080 (user: admin, pass: admin)

🧪 Running Tests

python -m pytest tests/ -v
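
For orientation, a pytest-style test for a validation check might look like the sketch below. The `validate_record` helper, its field names, and the expected CRS are hypothetical; the actual checks live in `pipeline/validation/validators.py` and may differ.

```python
def validate_record(record):
    """Return a list of problems found in one feature record (empty if valid)."""
    problems = []
    # Hypothetical required fields for a region-level feature record.
    for field in ("region_id", "ndvi_mean", "crs"):
        if field not in record:
            problems.append(f"missing field: {field}")
    ndvi = record.get("ndvi_mean")
    if ndvi is not None and not -1.0 <= ndvi <= 1.0:
        problems.append("ndvi_mean outside [-1, 1]")
    # Assumes WGS 84 (EPSG:4326) is the expected CRS.
    if record.get("crs") not in (None, "EPSG:4326"):
        problems.append("unexpected CRS")
    return problems


def test_valid_record_passes():
    record = {"region_id": 1, "ndvi_mean": 0.4, "crs": "EPSG:4326"}
    assert validate_record(record) == []


def test_out_of_range_ndvi_is_reported():
    record = {"region_id": 1, "ndvi_mean": 1.7, "crs": "EPSG:4326"}
    assert validate_record(record) == ["ndvi_mean outside [-1, 1]"]
```

pytest collects any function whose name starts with `test_`, so no registration is needed.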

📊 Data Pipeline Stages

  1. Ingestion – Raw satellite/terrain data (GeoTIFF, GeoJSON)
  2. Validation – Schema, CRS, and quality checks
  3. NDVI Processing – Vegetation index from NIR/RED bands
  4. Terrain Analysis – Slope, aspect, solar suitability
  5. Feature Extraction – Zonal statistics per region
  6. Database Loading – PostGIS with spatial indexes
  7. dbt Transformation – Analytics-ready mart tables
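
Stages 4 and 5 can be illustrated on a toy grid. This sketch computes slope with a simple finite-difference gradient (not necessarily the method used in `terrain.py`) and then averages a raster per region, which is the idea behind zonal statistics. The function names and the integer zone grid are assumptions for illustration.

```python
import numpy as np


def slope_degrees(elevation, cell_size):
    """Slope in degrees from an elevation grid with square cells of cell_size metres."""
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    dz_dy, dz_dx = np.gradient(np.asarray(elevation, dtype=float), cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


def zonal_mean(values, zones):
    """Mean of values per zone ID; zones is an integer grid of the same shape."""
    values = np.asarray(values, dtype=float)
    zones = np.asarray(zones)
    return {int(z): float(values[zones == z].mean()) for z in np.unique(zones)}


# A plane rising 10 m per 10 m cell along x, so the slope is 45 degrees everywhere.
elevation = np.tile(np.arange(4) * 10.0, (4, 1))
slope = slope_degrees(elevation, cell_size=10.0)

# Two regions (zone 1 = left half, zone 2 = right half) over a toy NDVI raster.
zones = np.array([[1, 1, 2, 2]] * 4)
ndvi = np.array([[0.2, 0.4, 0.6, 0.8]] * 4)
stats = zonal_mean(ndvi, zones)
print(slope[0, 0], stats)
```

In a real pipeline the zones would typically come from rasterized region polygons rather than a hand-written array.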

🎯 Resume Entry

GeoInsight AI – Intelligent Geospatial Data Platform
Python, GeoPandas, Airflow, PostGIS, dbt, LangChain

• Built a production-style geospatial pipeline that ingests satellite datasets
  and extracts spatial features including NDVI vegetation index and terrain slope.

• Developed an Airflow-orchestrated ELT pipeline with PostGIS storage and
  modular dbt models for spatial analytics.

• Implemented an AI-powered data agent using LangChain to convert natural-language
  queries into geospatial SQL queries and visualize results on interactive maps.

📄 License

MIT License – See LICENSE file for details.


Built with ❤️ for Geospatial Intelligence
