Updated with band information for V3.

# Land Cover Classification
The original land use recorded in the `Landuse_Ma` column of the massavi is outdated and does not match ground realities. The land use types and parcel counts in the given dataset are:

  | Land Use      | No. of Parcels | Area (Kanal, Approx.) |
  |---------------|----------------|------------------------|
  | Agriculture   | 5,759          | 32,729                 |
  | Stream        | 302            | 822                    |
  | Other         | 229            | 1,047                  |
  | Road/Street   | 172            | 1,668                  |
  | Graveyard     | 29             | 202                    |
  | Built-Up      | 24             | 602                    |


Some of the anomalies in this dataset are as follows:
- Some parcels recorded as agriculture in the land use column have since been converted to built-up and are not suitable for a crop classification dataset.
- Some parcels recorded as agriculture now contain both built-up and agricultural land and should be cleaned to make them suitable for crop classification.
- Some parcels that are agricultural on the ground are recorded as other in the land use column, so they may be left out of the crop classification dataset as it stands.
## Purpose
The purpose of this exercise is to produce a clean agriculture parcels dataset for crop classification using artificial intelligence algorithms such as random forest, SVM, etc.

## Data Cleansing Methodology
Since the `Landuse_Ma` column does not give a clear picture of the current land type, we have to compute the current land use type by other means. As a first step, we filter the 5,759 agriculture parcels and compute NDVI and NDBI.
1. Get imagery from Earth Engine (`ee`) for the time period. In this case I got 135 images for the year 2024.
2. Pure agriculture: if NDVI is >= 0.3 and <= 0.7, the current_land_type is agriculture.
3. Pure builtup: if NDBI is > 0 and NDVI is <= 0.2, the current_land_type is builtup.
4. Mix: compute the percentage of the parcel covered by agriculture and builtup pixels and use a threshold to classify it as either builtup or agriculture.

The resulting dataset will be a clean, revised set of agriculture parcels that can be used for crop classification.
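
The first-pass rule in steps 2 and 3 can be sketched in plain Python. This is only an illustration: the function name `classify_by_index` is hypothetical, and the thresholds are assumed to be on the usual NDVI scale of -1 to 1 (0.3, 0.7 and 0.2). The classification actually run later in this notebook uses different thresholds and also the pixel percentages.

```python
def classify_by_index(ndvi, ndbi, ndvi_low=0.3, ndvi_high=0.7, ndvi_built_max=0.2):
    """Return a first-pass land type from parcel-mean NDVI and NDBI (assumed thresholds)."""
    if ndvi_low <= ndvi <= ndvi_high:
        return "agriculture"
    if ndbi > 0 and ndvi <= ndvi_built_max:
        return "builtup"
    # Anything else needs the percentage-based check from step 4
    return "mixed"

print(classify_by_index(0.41, -0.11))  # agriculture
print(classify_by_index(0.15, 0.05))   # builtup
print(classify_by_index(0.25, 0.01))   # mixed
```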





In [2]:
# Step 1: Load the original dataset containing the Landuse_Ma column. This dataset also has crop information.

import geopandas as gpd
import pandas as pd
import numpy as np
import leafmap
import geemap
import ee

In [3]:
cropfile = 'pabbi_crop.geojson'
gdf = gpd.read_file(cropfile)
gdf['Landuse_Ma'].groupby(gdf['Landuse_Ma']).count().sort_values(ascending=False)

Landuse_Ma
Agriculture     5759
Stream           302
Other            229
Road/Streets     172
Graveyard         29
Built up          24
Name: Landuse_Ma, dtype: int64

In [4]:

selected_parcels = gdf[gdf['Landuse_Ma'] == 'Agriculture']
print(f'Total Agriculture Parcels are :{len(selected_parcels)}') 
# These are the parcels with Landuse_Ma as Agriculture and will be used for further analysis.

Total Agriculture Parcels are :5759


## Data Cleansing
1. Calculate NDVI and NDBI for the 5,759 selected agriculture parcels.

In [5]:
# Convert the GeoDataFrames to ee FeatureCollections so Earth Engine can process them
ee.Authenticate()  # authenticate first; initialization needs valid credentials
ee.Initialize()
boundary = geemap.gdf_to_ee(gdf)  # all parcels; used for clipping the images
selectedparcels_fc = geemap.gdf_to_ee(selected_parcels)  # agriculture parcels as a FeatureCollection


In [6]:
# Define a function to get the imagery and apply the date, bounds and cloud filters
def get_collection(start_date, end_date):
    collection = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
    filtered_collection = (
        collection.filterDate(start_date, end_date)
        .filterBounds(boundary.geometry())
        .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 10))
    )
    return filtered_collection

In [7]:
# Define the NDVI, NDBI, NDWI, BUI and UI indices

def get_ndvi(filtered_collection):
    # Calculate the annual NDVI from the filtered collection (filtered_collection)
    ndvi_collection = filtered_collection.map(lambda img: img.normalizedDifference(['B8', 'B4']).rename('NDVI'))
    # Calculate the mean NDVI for the year
    mean_ndvi = ndvi_collection.mean()
    return mean_ndvi
    
def get_ndbi(filtered_collection):
    # Calculate Normalized Difference Built-up Index (NDBI) (SWIR−NIR)/(SWIR+NIR) from the filtered collection filtered_collection
    # NDBI = (B11 - B8) / (B11 + B8)
    # Built‑up areas (positive values) vs. vegetation/water (negative)
    ndbi_collection = filtered_collection.map(lambda img: img.normalizedDifference(['B11', 'B8']).rename('NDBI'))
    # Calculate the mean NDBI for the year
    mean_ndbi = ndbi_collection.mean()
    return mean_ndbi

def get_ndwi(filtered_collection):
    # Calculate Normalized Difference Water Index (NDWI) (Green-NIR)/(Green+NIR) from the filtered collection filtered_collection
    # NDWI = (B3 - B8) / (B3 + B8)
    # Water (positive values) vs. built-up/vegetation (negative)
    ndwi_collection = filtered_collection.map(lambda img: img.normalizedDifference(['B3', 'B8']).rename('NDWI'))
    # Calculate the mean NDWI for the year
    mean_ndwi = ndwi_collection.mean()
    return mean_ndwi

def get_bui(filtered_collection):
    # Calculate the annual Built-Up Index (BUI) for an ee.ImageCollection `filtered_collection`.
    # BUI: (B11 + B4 - B8 - B2) / (B11 + B4 + B8 + B2)
    # Enhances urban density separation from other land‑covers
    bui_collection = filtered_collection.map(lambda img: (img.select('B11').add(img.select('B4')).subtract(img.select('B8'))
           .subtract(img.select('B2'))
           .divide(
               img.select('B11').add(img.select('B4'))
                  .add(img.select('B8'))
                  .add(img.select('B2'))
           )
           .rename('BUI')      
                    ))
    mean_bui = bui_collection.mean()
    return mean_bui
    
def get_ui(filtered_collection):
    # Calculate Urban Index (UI) (NDBI−NDVI)/(NDBI+NDVI) for the collection
    # Separates built‑up from vegetation more robustly
    mean_ndvi = get_ndvi(filtered_collection)
    mean_ndbi = get_ndbi(filtered_collection)
    mean_ui = mean_ndbi.subtract(mean_ndvi).divide(mean_ndbi.add(mean_ndvi)).rename('UI')
    return mean_ui
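
The index functions above all build on the same normalized-difference formula, `(a - b) / (a + b)`. A minimal sketch in plain Python, using made-up reflectance values rather than values from the imagery, shows how NDVI, NDBI and UI relate. The helper name `normalized_difference` is hypothetical.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), the formula that ee.Image.normalizedDifference applies per pixel."""
    return (a - b) / (a + b)

# Illustrative reflectance values for a vegetated pixel (assumed, not from the data)
nir, red, swir = 2800.0, 900.0, 2000.0

ndvi = normalized_difference(nir, red)    # NIR well above red -> positive NDVI
ndbi = normalized_difference(swir, nir)   # SWIR below NIR -> negative NDBI
ui = (ndbi - ndvi) / (ndbi + ndvi)        # same formula as get_ui

print(round(ndvi, 4), round(ndbi, 4), round(ui, 4))
```

Unlike NDVI and NDBI, UI is not confined to the range -1 to 1: its denominator `NDBI + NDVI` can be close to zero, which is why the parcel table further down contains UI values below -1.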



In [None]:
def get_composite(filtered_collection, ndvi_img, ndbi_img, bui_img, ui_img, ndwi_img):
    """
    Stack the five mean index images with the mean spectral bands
    into one composite, then clip it to the boundary.
    """
    selected_bands = ['B2', 'B3', 'B4', 'B8', 'B11']
    filtered_collection = filtered_collection.select(selected_bands)
    
    # Add the index bands to each image in the filtered collection
    filtered_collection = filtered_collection.map(lambda img: img.addBands(ndvi_img).addBands(ndbi_img).addBands(bui_img).addBands(ui_img).addBands(ndwi_img))
    
    # Reduce the collection to a single mean image
    composite_img = filtered_collection.mean()
    
    # Rename the bands in the same order they were added
    composite_img = composite_img.rename(['B2', 'B3', 'B4', 'B8', 'B11', 'NDVI', 'NDBI', 'BUI', 'UI','NDWI'])
    
    # Clip to the boundary
    composite_img = composite_img.clip(boundary.geometry())
    
    return composite_img


In [9]:
filtered_collection = get_collection('2022-01-01', '2024-12-31')
ndvi_img = get_ndvi(filtered_collection)
ndbi_img = get_ndbi(filtered_collection)
bui_img = get_bui(filtered_collection)
ui_img = get_ui(filtered_collection)
ndwi_img = get_ndwi(filtered_collection)

composite_img = get_composite(filtered_collection, ndvi_img, ndbi_img, bui_img, ui_img, ndwi_img)

In [10]:
# filtered_collection.size().getInfo()  # uncomment to get the number of images in the collection

# Show the band names of the composite image
def show_band_names(composite_img):
    band_names = composite_img.bandNames().getInfo()
    print("Band names in the composite image:")
    for name in band_names:
        print(name)

show_band_names(composite_img)

Band names in the composite image:
B2
B3
B4
B8
B11
NDVI
NDBI
BUI
UI
NDWI
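
Because `get_ndvi` computes the index for every image and then averages, the `NDVI` band in the composite is the mean of the per-image NDVI values. That is generally not the same number as the NDVI computed from the mean `B8` and `B4` bands. A small sketch with two made-up images (the values are assumptions for illustration only) shows the difference:

```python
# Two hypothetical images of the same pixel: (NIR, red)
images = [(3000.0, 600.0), (1500.0, 1200.0)]

def ndvi(nir, red):
    return (nir - red) / (nir + red)

# Mean of the per-image NDVI values (what get_ndvi produces)
mean_of_ndvi = sum(ndvi(nir, red) for nir, red in images) / len(images)

# NDVI of the mean bands (what you would get from the composite's B8 and B4)
mean_nir = sum(nir for nir, _ in images) / len(images)
mean_red = sum(red for _, red in images) / len(images)
ndvi_of_means = ndvi(mean_nir, mean_red)

print(round(mean_of_ndvi, 4), round(ndvi_of_means, 4))
```

So the index bands and the spectral bands in the composite should be treated as separate features rather than recomputed from each other.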


## Create a binary mask for each pixel in the composite
1. If the NDVI value of a pixel is greater than 0.25, it is an agriculture pixel.
2. If the BUI value of a pixel is greater than 0.0, it is a built-up pixel.
3. If the NDWI value of a pixel is greater than 0.3, it is water.
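
The thresholding itself is simple enough to sketch without Earth Engine. The example below uses hypothetical per-pixel values for one small parcel and turns them into 0/1 masks whose sums are the pixel counts. The variable names are illustrative, and the built-up index stands for whichever band is thresholded at 0 (the mask cell that follows uses the `BUI` band).

```python
# Hypothetical per-pixel index values for one small parcel
ndvi_pixels = [0.41, 0.12, 0.30, 0.18, 0.55]
built_index_pixels = [-0.04, 0.06, -0.01, 0.03, -0.10]

# 1 where the threshold is exceeded, 0 otherwise
agri_mask = [int(v > 0.25) for v in ndvi_pixels]
built_mask = [int(v > 0.0) for v in built_index_pixels]

# Summing a 0/1 mask gives the number of pixels in that class
agri_count = sum(agri_mask)
built_count = sum(built_mask)

print(agri_mask, agri_count)    # [1, 0, 1, 0, 1] 3
print(built_mask, built_count)  # [0, 1, 0, 1, 0] 2
```

The two masks are computed independently, so in real data a pixel can fall in both masks or in neither.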

In [None]:
builtmask = composite_img.select('BUI').gt(0)
agrimask = composite_img.select('NDVI').gt(0.25)


In [26]:
# Get per-parcel pixel counts (for coverage percentages) and per-parcel means (for the pure agri/built checks)
def sum_count(composite_img, builtmask, agrimask, parcels_fc):
    
    built = builtmask.rename('built_count').toInt()
    agri  = agrimask.rename('agri_count').toInt()
    
    mask_img = ee.Image.cat([built, agri])

    pixel_sum = mask_img.reduceRegions(
        collection=parcels_fc,
        reducer=ee.Reducer.sum(),
        scale=10,
        crs=composite_img.projection()
    )
    return pixel_sum

def mean_indices(composite_img, parcels_fc):
    #Select the bands of interest
    b2 = composite_img.select('B2')
    b3 = composite_img.select('B3')
    b4 = composite_img.select('B4')
    b8 = composite_img.select('B8')
    b11 = composite_img.select('B11')


    # Select the indices
    ndvi = composite_img.select('NDVI')
    ndbi = composite_img.select('NDBI')
    ui = composite_img.select('UI')
    bui = composite_img.select('BUI')
    ndwi = composite_img.select('NDWI')

    index_img = ee.Image.cat([b2,b3,b4,b8,b11,ndvi,ndbi,bui,ui,ndwi])
    mean_img = index_img.reduceRegions(
        collection=parcels_fc,
        reducer=ee.Reducer.mean(),
        scale=10,
        crs=composite_img.projection()
    )
    return mean_img


In [27]:
pxlsum = sum_count(composite_img, builtmask, agrimask,selectedparcels_fc)
pxlmean = mean_indices(composite_img, selectedparcels_fc)
# Convert the ee.FeatureCollections to GeoDataFrames
pxlsum_gdf= geemap.ee_to_gdf(pxlsum)
pxlmean_gdf= geemap.ee_to_gdf(pxlmean)  


In [28]:
pxlmean_gdf.head()

Unnamed: 0,geometry,Area_Acre,B11,B2,B3,B4,B8,BUI,Crop_Type,FFID,Landuse_Ma,Mouza_Name,NDBI,NDVI,NDWI,Parcel_ID,UI
0,"POLYGON ((71.74996 34.03193, 71.75027 34.03195...",0.101785,2147.930654,1080.794583,1314.424346,1412.001354,2250.015724,0.037398,,1,Agriculture,Khushmaqam,-0.016614,0.226656,-0.261929,668,-1.202087
1,"POLYGON ((71.74868 34.03205, 71.74869 34.03195...",0.315557,2148.237869,905.980655,1133.265953,1248.493593,2115.168267,0.060771,,3,Agriculture,Khushmaqam,0.013744,0.254974,-0.298905,632,-0.876391
2,"POLYGON ((71.75021 34.03218, 71.75022 34.0321,...",0.187644,2198.647527,989.403836,1234.108843,1331.561342,2231.942662,0.047527,,4,Agriculture,Khushmaqam,-0.003842,0.248897,-0.284861,669,-1.022695
3,"POLYGON ((71.75401 34.03224, 71.75403 34.03219...",0.161568,2193.297278,806.999988,1099.118312,1177.751882,2761.666157,-0.038227,,5,Agriculture,Khushmaqam,-0.109844,0.41013,-0.431634,693,-1.738528
4,"POLYGON ((71.74971 34.03192, 71.74996 34.03193...",0.229336,2126.134001,1160.127481,1376.297936,1480.144814,2196.267099,0.039058,,6,Agriculture,Khushmaqam,-0.011919,0.197473,-0.233127,667,-1.196795


In [29]:
# Both dataframes share the parcel attribute columns, so to avoid duplication
# we select only the columns we need before merging
mean_cols = ['geometry','NDVI','NDBI','BUI','UI','NDWI','B2','B3','B4','B8','B11']
plxmean_small = pxlmean_gdf[mean_cols]

# Now merge the two dataframes
stats_gdf = pxlsum_gdf.merge(plxmean_small, on='geometry', how='left')

print(stats_gdf.columns.tolist())

['geometry', 'Area_Acre', 'Crop_Type', 'FFID', 'Landuse_Ma', 'Mouza_Name', 'Parcel_ID', 'agri_count', 'built_count', 'NDVI', 'NDBI', 'BUI', 'UI', 'NDWI', 'B2', 'B3', 'B4', 'B8', 'B11']


In [30]:
# stats_gdf now has 'built_count' and 'agri_count' columns per parcel.
# Compute the share of built-up and agriculture pixels in each parcel.
# Parcels where both counts are 0 get NaN for both shares.
stats_gdf['pct_built'] = stats_gdf['built_count'] / (stats_gdf['built_count'] + stats_gdf['agri_count'])
stats_gdf['pct_agri']  = stats_gdf['agri_count']  / (stats_gdf['built_count'] + stats_gdf['agri_count'])
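
As a quick check of the arithmetic, the shares for a single parcel can be reproduced by hand. The sketch below uses the `built_count` and `agri_count` values of the first parcel in the table that follows. The helper `coverage_shares` is hypothetical and, as an assumption, returns `None` when a parcel has no classified pixels instead of dividing by zero.

```python
def coverage_shares(built_count, agri_count):
    """Return (pct_built, pct_agri), or (None, None) when no pixel was classified."""
    total = built_count + agri_count
    if total == 0:
        return None, None
    return built_count / total, agri_count / total

# Counts of the first parcel (Parcel_ID 668)
pct_built, pct_agri = coverage_shares(4.980392, 1.721569)
print(round(pct_built, 4), round(pct_agri, 4))  # 0.7431 0.2569
```

Because both shares are divided by the same total, they always sum to 1 whenever they are defined.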

In [31]:
stats_gdf

Unnamed: 0,geometry,Area_Acre,Crop_Type,FFID,Landuse_Ma,Mouza_Name,Parcel_ID,agri_count,built_count,NDVI,...,BUI,UI,NDWI,B2,B3,B4,B8,B11,pct_built,pct_agri
0,"POLYGON ((71.74996 34.03193, 71.75027 34.03195...",0.101785,,1,Agriculture,Khushmaqam,668,1.721569,4.980392,0.226656,...,0.037398,-1.202087,-0.261929,1080.794583,1314.424346,1412.001354,2250.015724,2147.930654,0.743125,0.256875
1,"POLYGON ((71.74868 34.03205, 71.74869 34.03195...",0.315557,,3,Agriculture,Khushmaqam,632,5.627451,15.454902,0.254974,...,0.060771,-0.876391,-0.298905,905.980655,1133.265953,1248.493593,2115.168267,2148.237869,0.733073,0.266927
2,"POLYGON ((71.75021 34.03218, 71.75022 34.0321,...",0.187644,,4,Agriculture,Khushmaqam,669,4.796078,9.149020,0.248897,...,0.047527,-1.022695,-0.284861,989.403836,1234.108843,1331.561342,2231.942662,2198.647527,0.656074,0.343926
3,"POLYGON ((71.75401 34.03224, 71.75403 34.03219...",0.161568,,5,Agriculture,Khushmaqam,693,6.192157,1.698039,0.410130,...,-0.038227,-1.738528,-0.431634,806.999988,1099.118312,1177.751882,2761.666157,2193.297278,0.215209,0.784791
4,"POLYGON ((71.74971 34.03192, 71.74996 34.03193...",0.229336,,6,Agriculture,Khushmaqam,667,2.674510,11.192157,0.197473,...,0.039058,-1.196795,-0.233127,1160.127481,1376.297936,1480.144814,2196.267099,2126.134001,0.807127,0.192873
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5754,"POLYGON ((71.76623 34.0439, 71.76626 34.04366,...",0.171500,,6723,Agriculture,Amankot,843,8.384314,0.000000,0.571489,...,-0.101405,-1.856395,-0.555056,460.745576,706.997455,669.423946,2584.963943,1786.159601,0.000000,1.000000
5755,"POLYGON ((71.76681 34.0447, 71.76711 34.04347,...",0.466952,,6724,Agriculture,Amankot,848,22.807843,0.000000,0.420987,...,-0.035676,-1.744243,-0.461200,606.152732,866.504644,990.755175,2359.760189,1915.876354,0.000000,1.000000
5756,"POLYGON ((71.77759 34.02906, 71.77783 34.02811...",1.469960,Wheat / Corn,6725,Agriculture,Amankot,1331,71.909804,0.000000,0.521036,...,-0.087365,-1.897647,-0.515865,566.059865,869.797291,863.894983,2831.684697,2005.133498,0.000000,1.000000
5757,"POLYGON ((71.79385 34.02814, 71.79411 34.02642...",3.634200,,6726,Agriculture,Amankot,2070,177.721569,27.772549,0.417301,...,-0.031983,-1.711403,-0.453149,676.936347,959.914195,1077.387722,2598.212482,2088.983032,0.135150,0.864850


In [38]:
def classify_hybrid(row,
                    ndvi_ag=0.6, ui_built=0.2,
                    maj_pct=0.75, min_pct=0.25):
    """Classify one parcel from its mean NDVI/UI and its pixel-share columns."""
    ndvi = row['NDVI']
    ui   = row['UI']
    pa   = row['pct_agri']
    pb   = row['pct_built']

    # 1) Spectral "easy wins"
    if ndvi >= ndvi_ag and ui <= 0:
        return 'Pure-Agriculture'
    if ui >= ui_built:
        return 'Pure-Builtup'

    # 2) Percent-based majority
    if pa >= maj_pct:
        return 'Pure-Agriculture'
    if pb >= maj_pct:
        return 'Pure-Builtup'

    # 3) Partial by percent
    if pa >= min_pct and pa > pb:
        return 'Partial-Agriculture'
    if pb >= min_pct and pb > pa:
        return 'Partial-Builtup'
    # Note: pct_agri + pct_built == 1, so pa >= pb implies pa >= 0.5 and this
    # branch cannot fire with the default min_pct of 0.25.
    if pa >= pb and pa < min_pct:
        return 'Barren'

    # 4) Everything else (ties and parcels with NaN shares) -> Mixed
    return 'Mixed'

# Apply and inspect
stats_gdf['landuse_class'] = stats_gdf.apply(classify_hybrid, axis=1)
print(stats_gdf['landuse_class'].value_counts())


landuse_class
Pure-Agriculture       3147
Partial-Agriculture    1100
Pure-Builtup            801
Partial-Builtup         632
Mixed                    79
Name: count, dtype: int64
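
To see which rule decides a given parcel, the sketch below re-implements the same decision order as a standalone function that also returns the rule that fired. `classify_sketch` is a hypothetical helper for illustration and is not part of the notebook. It leaves out the 'Barren' branch, because the two shares sum to 1 and that branch therefore cannot be reached with these thresholds. The three input rows are copied from rows 0, 3 and 4 of `stats_gdf` above.

```python
import math

def classify_sketch(ndvi, ui, pa, pb,
                    ndvi_ag=0.6, ui_built=0.2, maj_pct=0.75, min_pct=0.25):
    """Return (label, rule) so the deciding rule is visible."""
    if ndvi >= ndvi_ag and ui <= 0:
        return 'Pure-Agriculture', 'spectral'
    if ui >= ui_built:
        return 'Pure-Builtup', 'spectral'
    if pa >= maj_pct:
        return 'Pure-Agriculture', 'majority'
    if pb >= maj_pct:
        return 'Pure-Builtup', 'majority'
    if pa >= min_pct and pa > pb:
        return 'Partial-Agriculture', 'partial'
    if pb >= min_pct and pb > pa:
        return 'Partial-Builtup', 'partial'
    return 'Mixed', 'fallback'

# (NDVI, UI, pct_agri, pct_built) for rows 0, 3 and 4 of stats_gdf
rows = [
    (0.226656, -1.202087, 0.256875, 0.743125),
    (0.410130, -1.738528, 0.784791, 0.215209),
    (0.197473, -1.196795, 0.192873, 0.807127),
]
results = [classify_sketch(*r) for r in rows]
for r in results:
    print(r)

# Exact ties and parcels with undefined (NaN) shares fall through to 'Mixed'
tie = classify_sketch(0.30, -1.0, 0.5, 0.5)
empty = classify_sketch(0.30, -1.0, math.nan, math.nan)
print(tie, empty)
```

Row 0 has a built-up share of about 0.743, just under the 0.75 majority threshold, so it lands in 'Partial-Builtup' rather than 'Pure-Builtup'.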


In [39]:
pure_agri = stats_gdf[stats_gdf['landuse_class'] == 'Pure-Agriculture']
pure_built = stats_gdf[stats_gdf['landuse_class'] == 'Pure-Builtup']
partial_agri = stats_gdf[stats_gdf['landuse_class'] == 'Partial-Agriculture']
partial_built = stats_gdf[stats_gdf['landuse_class'] == 'Partial-Builtup']
mixed = stats_gdf[stats_gdf['landuse_class'] == 'Mixed']


In [43]:
m = leafmap.Map(center=(37.5, 70))
m.add_basemap('Google Satellite')

# 1. Create a minimal GeoDataFrame with only geometry + landuse_class
popup_gdf = stats_gdf[['geometry', 'landuse_class']].copy()


legend_dict = {
    'Pure Agriculture':           '#00AA00',  # dark green
    'Pure Builtup':               '#0000FF',  # blue
    'Partial Agri(25-75%)':       '#FFA500',  # orange
    'Partial Builtup(25-75%)':    '#800080',  # purple
    'Mixed':                      '#FFFF00',  # yellow
}


m.add_gdf(
    popup_gdf,
    layer_name='Landuse Class',
    style={                      # pick any style you like
        'color':   'black',
        'fillColor':'transparent',
        'fillOpacity': 0.1,
        'weight':  1
    },
    zoom_to_layer=True,
    info_mode='on_click'        # show popup when you click
)

# Pure Agriculture
m.add_gdf(
    pure_agri,
    layer_name='Pure Agriculture',
    style={
        'color': 'green',      # border
        'fillColor': 'green',  # interior
        'fillOpacity': 0.4,
        'weight': 1
    }, info_mode='off'
     
)
# Pure Builtup
m.add_gdf(
    pure_built,
    layer_name='Pure Builtup',
    style={
        'color': 'blue',
        'fillColor': 'blue',
        'fillOpacity': 0.4,
        'weight': 1
    }, info_mode='off'
    
)

# Partial Agriculture
m.add_gdf(
    partial_agri,
    layer_name='Partial Agri (25-75%)',
    style={
        'color': 'orange',
        'fillColor': 'orange',
        'fillOpacity': 0.4,
        'weight': 1
    }, info_mode='off'
    
)
# Partial Builtup
m.add_gdf(
    partial_built,
    layer_name='Partial Builtup(25-75%)',
    style={
        'color': 'purple',
        'fillColor': 'purple',
        'fillOpacity': 0.4,
        'weight': 1
    }, info_mode='off'
    
)

# Mixed
m.add_gdf(
    mixed,
    layer_name='Mixed',
    style={
        'color': 'yellow',
        'fillColor': 'yellow',
        'fillOpacity': 0.4,
        'weight': 1
    }, info_mode='off'
    
)

# Add legend
m.add_legend(
    legend_title='Landuse Classification',
    legend_dict=legend_dict,
    position='bottomleft'
)

m.add_layer_manager()

m

Map(center=[37.5, 70], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoom_out_t…

In [35]:
stats_gdf.to_file('Pabbi_RevisedLandtype-V3.geojson', driver='GeoJSON')

In [137]:
m = leafmap.Map(center=(37.5, 70), zoom=6)
m.add_basemap('Google Satellite')

# 1. Create a minimal GeoDataFrame with only geometry + landuse_class
popup_gdf = stats_gdf[['geometry', 'landuse_class']].copy()

# 2. Add your styling layers *without* popups
layers_list = [
    (pure_agri,    'Pure Agriculture',    legend_dict['Pure Agriculture']),
    (pure_built,   'Pure Builtup',        legend_dict['Pure Builtup']),
    (partial_agri, 'Partial Agriculture', legend_dict['Partial Agri(25-75%)']),
    (partial_built,'Partial Builtup',     legend_dict['Partial Builtup(25-75%)']),
    (mixed,        'Mixed',               legend_dict['Mixed']),
]

# Use layer_gdf as the loop variable so the original gdf is not overwritten
for layer_gdf, name, color in layers_list:
    # 2.1 Add the layer to the map
    m.add_gdf(
        layer_gdf,
        layer_name=name,
        style={
            'color':    color,
            'fillColor':color,
            'fillOpacity':0.3,
            'weight':   1
        },
        zoom_to_layer=True,
        info_mode='off'   # make sure no popups for these layers
    )

# 3. Add your minimal layer *last*, with popups ON only for landuse_class
m.add_gdf(
    popup_gdf,
    layer_name='Landuse Class',
    style={
        'color':      'black',
        'fillColor': 'transparent',
        'fillOpacity':0.1,
        'weight':     1
    },
    info_mode='on_click'  # only this layer shows a popup
)

# 4. Legend & Layer Manager
m.add_legend(
    legend_title='Landuse Classification',
    legend_dict=legend_dict,
    position='bottomleft'
)
m.add_layer_manager()

m


Map(center=[37.5, 70], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoom_in_title', 'zoom_out_t…