<a href="https://colab.research.google.com/github/das-apratim/GeospatialDeepLearning/blob/main/LULC_Classification_Using_MachineLearning_Part1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Image Classification Using K-Means Clustering**  

K-Means clustering is an unsupervised machine learning technique used for classifying satellite images into different land cover types based on spectral similarity. It groups pixels with similar spectral characteristics into clusters, making it a useful approach for land use and land cover (LULC) classification without requiring labeled training data.  

#### **How It Works**  
1. The algorithm initializes one centroid per desired class, either at random or with a seeding scheme such as k-means++ (the scikit-learn default).  
2. Each pixel is assigned to the nearest centroid based on its spectral values.  
3. Each centroid is recalculated as the mean of its assigned pixels, and steps 2 and 3 repeat until the cluster assignments stabilize.  
4. The final clusters are spectrally distinct groups that the analyst then interprets as land cover types, such as vegetation, water, and urban areas.  
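
The steps above can be sketched in plain NumPy. This is a minimal illustration only, not the scikit-learn implementation used later in this notebook; the function name `kmeans_sketch` and the synthetic pixel values are assumptions made for the example.

```python
import numpy as np

def kmeans_sketch(pixels, n_clusters, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm on an (n_samples, n_features) float array."""
    rng = np.random.default_rng(seed)
    # Step 1: pick n_clusters distinct pixels as the initial centroids.
    centroids = pixels[rng.choice(len(pixels), size=n_clusters, replace=False)]
    labels = np.full(len(pixels), -1)
    for _ in range(n_iter):
        # Step 2: assign each pixel to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(
            pixels[:, None, :] - centroids[None, :, :], axis=2
        )
        new_labels = distances.argmin(axis=1)
        # Stop once the assignments no longer change.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned pixels;
        # an empty cluster keeps its previous centroid.
        for k in range(n_clusters):
            members = pixels[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return labels, centroids

# Two well-separated synthetic "spectral" groups (hypothetical values).
rng = np.random.default_rng(42)
dark = rng.normal(loc=0.05, scale=0.01, size=(50, 4))
bright = rng.normal(loc=0.60, scale=0.01, size=(50, 4))
demo_pixels = np.vstack([dark, bright])

demo_labels, demo_centroids = kmeans_sketch(demo_pixels, n_clusters=2)
print(np.bincount(demo_labels))  # number of pixels per cluster
```

The cluster IDs themselves carry no meaning: which group receives label 0 or 1 depends on the initialization, so the mapping to land cover names is always a separate interpretation step.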

#### **Applications**  
- Land cover classification  
- Change detection analysis  
- Water body identification  
- Agricultural monitoring  

K-Means is a simple yet effective method for quick and unsupervised image classification, but it requires careful selection of the number of clusters and may benefit from additional preprocessing techniques such as Principal Component Analysis (PCA) or spectral indices to enhance classification accuracy.
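
One common way to guide the choice of cluster count is to compare silhouette scores across several candidate values. The sketch below assumes synthetic data with three obvious groups; the candidate range, the blob centers, and the variable names are all illustrative and not taken from this notebook's dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for pixel features: three tight groups in 3-band space.
rng = np.random.default_rng(0)
centers = np.array([
    [0.05, 0.05, 0.05],
    [0.30, 0.50, 0.10],
    [0.60, 0.20, 0.70],
])
features = np.vstack([rng.normal(c, 0.02, size=(200, 3)) for c in centers])
features_norm = StandardScaler().fit_transform(features)

# Fit K-Means for each candidate k and record the mean silhouette score
# (closer to 1 means tighter, better separated clusters).
scores = {}
for k in range(2, 7):
    candidate_labels = KMeans(
        n_clusters=k, random_state=42, n_init=10
    ).fit_predict(features_norm)
    scores[k] = silhouette_score(features_norm, candidate_labels)

best_k = max(scores, key=scores.get)
print(scores)
print("Best k by silhouette score:", best_k)
```

On a full satellite scene the silhouette computation is expensive because it relies on pairwise distances, so scoring a random subset of pixels (for example through the `sample_size` argument of `silhouette_score`) is usually more practical. The score is a guide rather than a rule: the number of classes that is useful for a map may differ from the statistically best value.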

## Download Sample Data and Unzip

In [None]:
!wget "https://github.com/das-apratim/GeospatialDeepLearning/blob/main/data/sampled_data.zip?raw=true" -O sampled_data.zip
!unzip -q sampled_data.zip -d nz_imagery_sample

## Install Required Libraries

In [None]:
!pip install -q rasterio

## Import Libraries

In [None]:
import numpy as np
import rasterio
from rasterio.plot import show
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import pandas as pd
import seaborn as sns
from tqdm import tqdm
import os
from scipy.ndimage import generic_filter
from glob import glob
from sklearn.preprocessing import StandardScaler

## Read Sentinel Bands

In [None]:
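# Sentinel-2 band files (modify this for your dataset)
band_files = sorted(glob("nz_imagery_sample/*.tif"))

bands = {}

# Use the first band as the spatial reference.
# The profile already carries the transform and CRS.
with rasterio.open(band_files[0]) as src:
    out_profile = src.profile.copy()
    transform = src.transform
    crs = src.crs
    h = src.height
    w = src.width

for f in band_files:
    # Open each file in a context manager so it is closed after reading
    with rasterio.open(f) as src:
        # Sentinel-2 bands are usually stored as unsigned integers; cast to
        # float32 so the index arithmetic below cannot wrap around
        ras_data = src.read(1).astype(np.float32)
    # Crop to the reference size so all bands stack cleanly
    ras_data = ras_data[0:h, 0:w]
    # Key by file name without extension, e.g. "B04_10m"
    bands[os.path.basename(f).split(".")[0]] = ras_data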

## Calculate Supportive Indices (NDVI, NDBI, NDWI)
### **NDVI, NDBI, NDWI and Their Role in Image Classification**  

In remote sensing, spectral indices like **NDVI (Normalized Difference Vegetation Index), NDBI (Normalized Difference Built-up Index), and NDWI (Normalized Difference Water Index)** help enhance specific land cover features for classification. These indices are derived from multispectral satellite imagery and play a crucial role in distinguishing vegetation, urban areas, and water bodies.  

#### **1. NDVI (Normalized Difference Vegetation Index)**  
NDVI is used to assess vegetation health and distribution. It is calculated from the Near-Infrared (NIR) and Red bands as `NDVI = (NIR - Red) / (NIR + Red)`, giving values between -1 and 1:  
- Higher NDVI values indicate dense vegetation.  
- Lower NDVI values suggest barren land, urban areas, or water.  

#### **2. NDBI (Normalized Difference Built-up Index)**  
NDBI helps identify built-up and urbanized areas. It is derived from the Shortwave Infrared (SWIR) and Near-Infrared (NIR) bands as `NDBI = (SWIR - NIR) / (SWIR + NIR)`:  
- High NDBI values indicate built-up regions.  
- Low values typically represent vegetation or water. Bare soil can also score high, so NDBI alone may confuse it with built-up areas.  

#### **3. NDWI (Normalized Difference Water Index)**  
NDWI is used for water body detection. It is calculated from the Green and Near-Infrared (NIR) bands as `NDWI = (Green - NIR) / (Green + NIR)`:  
- Higher NDWI values highlight water bodies.  
- Lower values indicate land features like vegetation or urban areas.  
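
All three indices share the same normalized-difference form, so a single helper can compute them. The sketch below is an illustration under assumptions: the helper name `normalized_difference` and the tiny two-pixel arrays are hypothetical, with values chosen to resemble a vegetated pixel (left) and a water pixel (right).

```python
import numpy as np

def normalized_difference(band_a, band_b, epsilon=1e-6):
    """Return (a - b) / (a + b) as float32, guarding against division by zero."""
    # Cast first: subtracting unsigned integer arrays directly would wrap around
    # whenever band_b is larger than band_a.
    a = band_a.astype(np.float32)
    b = band_b.astype(np.float32)
    return (a - b) / (a + b + epsilon)

# Hypothetical digital numbers for two pixels: [vegetation, water].
demo_nir = np.array([[3000, 400]], dtype=np.uint16)
demo_red = np.array([[500, 600]], dtype=np.uint16)
demo_green = np.array([[700, 900]], dtype=np.uint16)
demo_swir = np.array([[1500, 300]], dtype=np.uint16)

demo_ndvi = normalized_difference(demo_nir, demo_red)    # (NIR - Red) / (NIR + Red)
demo_ndwi = normalized_difference(demo_green, demo_nir)  # (Green - NIR) / (Green + NIR)
demo_ndbi = normalized_difference(demo_swir, demo_nir)   # (SWIR - NIR) / (SWIR + NIR)

print("NDVI:", demo_ndvi)
print("NDWI:", demo_ndwi)
print("NDBI:", demo_ndbi)
```

In this toy example the vegetated pixel has a high NDVI and a negative NDWI, while the water pixel shows the opposite pattern, which is the contrast the indices are meant to expose.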

#### **Role in Image Classification**  
These indices serve as additional input bands in classification models such as **PCA + K-Means, Random Forest, or SVM** by:  
- Enhancing spectral differences between land cover types.  
- Improving accuracy in distinguishing vegetation, water, and urban areas.  
- Reducing misclassification by incorporating meaningful spectral features.  

By integrating NDVI, NDBI, and NDWI with multispectral bands, classification results become more precise, aiding in better Land Use/Land Cover (LULC) mapping.
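
As a rough illustration of the PCA + K-Means option mentioned above, the sketch below chains standardization, PCA, and K-Means in a scikit-learn pipeline. It runs on random data standing in for a 9-feature stack (six bands plus three indices); the array sizes, the number of components, and the number of clusters are arbitrary assumptions, not tuned values.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Random stand-in for a (height, width, features) stack of bands and indices.
rng = np.random.default_rng(0)
demo_h, demo_w, demo_features = 20, 30, 9
demo_stack = rng.random((demo_h, demo_w, demo_features)).astype(np.float32)

# Standardize each feature, compress to 3 principal components, then cluster.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=3),
    KMeans(n_clusters=4, random_state=42, n_init=10),
)

# Flatten to (n_pixels, n_features), cluster, and restore the image shape.
demo_labels_flat = pipeline.fit_predict(demo_stack.reshape(-1, demo_features))
demo_label_map = demo_labels_flat.reshape(demo_h, demo_w)
print(demo_label_map.shape, np.unique(demo_label_map))
```

Standardizing before PCA matters here because the raw bands and the indices live on very different numeric scales; without it, the bands with the largest values would dominate the principal components.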

In [None]:
## Add Derived Bands
epsilon = 1e-6

# Compute NDVI, NDWI, and NDBI
ndvi = (bands["B08_10m"] - bands["B04_10m"]) / (bands["B08_10m"] + bands["B04_10m"] + epsilon)
ndwi = (bands["B03_10m"] - bands["B08_10m"]) / (bands["B03_10m"] + bands["B08_10m"] + epsilon)
ndbi = (bands["resampled_B11_20m"] - bands["B08_10m"]) / (bands["resampled_B11_20m"] + bands["B08_10m"] + epsilon)


# Stack the six original bands and three derived indices into one (H, W, 9) array
stacked_image = np.stack(
    [
        bands["B02_10m"], bands["B03_10m"], bands["B04_10m"], bands["B08_10m"],
        bands["resampled_B11_20m"], bands["resampled_B12_20m"],
        ndvi, ndwi, ndbi,
    ],
    axis=-1,
)

## K-Means Clustering

#### Setting Up K-Means for 5 Primary classes and Saving the output

In [None]:
# Reshape for clustering: flatten to (n_samples, n_features).
# Use n_bands here so the `bands` dict from the earlier cell is not overwritten.
height, width, n_bands = stacked_image.shape
pixels = stacked_image.reshape(-1, n_bands)

# Standardize each feature to zero mean and unit variance
scaler = StandardScaler()
pixels_norm = scaler.fit_transform(pixels)

# Define number of clusters (LULC classes)
n_clusters = 5  # Example: Water, Vegetation, Urban, Bare Land, Agriculture

# Train K-Means
kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=20)
clusters = kmeans.fit_predict(pixels_norm)

# Reshape back to image dimensions
clustered_map = clusters.reshape(height, width)
lulc_map = clustered_map  # same array under the name used previously

# Save K-Means clustering output as a single-band uint8 raster.
# Clear nodata so that cluster 0 is not flagged as missing data.
out_profile.update({"count": 1, "dtype": "uint8", "nodata": None})

with rasterio.open("kmeans_indices.tif", "w", **out_profile) as dst:
    dst.write(clustered_map.astype(rasterio.uint8), 1)

print("Clustered image saved as 'kmeans_indices.tif'")

## Preview The Classified Data

In [None]:
with rasterio.open("kmeans_indices.tif") as src:
    classified = src.read(1)  # single band of cluster IDs (0 to n_clusters - 1)
    show(classified, cmap='viridis')  # You can change the colormap
    plt.show()