# Agriculture Crop type detection:

Goal: Classify different crop types in satellite imagery.
Classify wheat, maize, and sugarcane fields using multi-temporal imagery for tehsil pabbi
Dataset:
- Sentinel 2 Imagery Time peroid(-------)
- Crop Survey Data (Time peroid)


# Steps:
1. Load the pabbi crop geojson dataset.
2. 

In [247]:
import geopandas as gpd
import pandas as pd
import geemap
import ee


In [248]:
cropfile = 'pabbi_crop.geojson'
gdf = gpd.read_file(cropfile)
gdf.head()

Unnamed: 0,Mouza_Name,Landuse_Ma,Area_Acre,FFID,Parcel_ID,Crop_Type,geometry
0,Khushmaqam,Agriculture,0.101785,1,668.0,,"MULTIPOLYGON (((753923.077 3769111.141, 753894..."
1,Khushmaqam,Built up,0.036718,2,670.0,,"MULTIPOLYGON (((753959.894 3769126.291, 753959..."
2,Khushmaqam,Agriculture,0.315557,3,632.0,,"MULTIPOLYGON (((753839.267 3769129.693, 753841..."
3,Khushmaqam,Agriculture,0.187644,4,669.0,,"MULTIPOLYGON (((753952.852 3769131.737, 753946..."
4,Khushmaqam,Agriculture,0.161568,5,693.0,,"MULTIPOLYGON (((754361.202 3769166.424, 754269..."


In [249]:
gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 6740 entries, 0 to 6739
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   Mouza_Name  6740 non-null   object  
 1   Landuse_Ma  6515 non-null   object  
 2   Area_Acre   6740 non-null   float64 
 3   FFID        6740 non-null   int32   
 4   Parcel_ID   6740 non-null   float64 
 5   Crop_Type   511 non-null    object  
 6   geometry    6740 non-null   geometry
dtypes: float64(2), geometry(1), int32(1), object(3)
memory usage: 342.4+ KB


In [250]:
gdf['Crop_Type'].unique()

array([None, 'Indian Squash', 'Tomato', 'Dairy Farm / Wheat', 'Corn',
       'Buitup', 'Builtup/Tomato/Wheat', 'Builtup',
       'Tube well / Persian Clover / Wheat', 'Persian Clover', 'Wheat',
       'Builtup / Persian Clover', 'Trees', 'Orchard', 'Builtup / Tomato',
       'Builtup / Barren', 'Orchard / Persian Clover',
       'Persian Clover / Wheat', 'Barren',
       'Wheat / Graveyard / Persian Clover / Barren', 'Eucalyptus',
       'Track / Persian Clover / Eucalyptus', 'Builtup / Eucalyptus',
       'Wheat / Tomato', 'Builtup / Corn', 'Sugarcane',
       'Tomato / Sugarcane', 'Builtup / Wheat',
       'Persian Clover / Builtup', 'Wheat / Sugarcane',
       'Sugarcane / Corn', 'Sugarcane / Builtup',
       'Persian Clover / Barren', 'Tomato / Indian Squash',
       'Corn / Tomato', 'Corn / Persian Clover',
       'Persian Clover / Tomato', 'No crop', 'Graveyard / Persian Clover',
       'Barren / Builtup', 'Mix', 'Wheat / Persian Clover',
       'Orchard / Wheat', 'Persian Clover

In [251]:
gdf.value_counts('Crop_Type')

Crop_Type
Tomato                                         94
Wheat                                          81
Persian Clover                                 57
Sugarcane                                      52
Barren                                         34
                                               ..
Wheat / Indian Squash / Persian Clover          1
Wheat / Graveyard / Persian Clover / Barren     1
Wheat / Sugarcane / Tomato                      1
Wheat / Sugarcane / Tomato / Indian Squash      1
Wheat / Water Channel                           1
Name: count, Length: 76, dtype: int64

In [252]:
gdf.crs

<Projected CRS: EPSG:32642>
Name: WGS 84 / UTM zone 42N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 66°E and 72°E, northern hemisphere between equator and 84°N, onshore and offshore. Afghanistan. India. Kazakhstan. Kyrgyzstan. Pakistan. Russian Federation. Tajikistan. Uzbekistan.
- bounds: (66.0, 0.0, 72.0, 84.0)
Coordinate Operation:
- name: UTM zone 42N
- method: Transverse Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [253]:
gdf['Crop_Type'].value_counts()

Crop_Type
Tomato                                         94
Wheat                                          81
Persian Clover                                 57
Sugarcane                                      52
Barren                                         34
                                               ..
Builtup / Sugarcane / Wheat / Water channel     1
Builtup / Tomato / Wheat / Persian Clover       1
Water Channel                                   1
Garlic                                          1
Other                                           1
Name: count, Length: 76, dtype: int64

In [254]:
crop_count = gdf['Crop_Type'].value_counts()
print(crop_count)


Crop_Type
Tomato                                         94
Wheat                                          81
Persian Clover                                 57
Sugarcane                                      52
Barren                                         34
                                               ..
Builtup / Sugarcane / Wheat / Water channel     1
Builtup / Tomato / Wheat / Persian Clover       1
Water Channel                                   1
Garlic                                          1
Other                                           1
Name: count, Length: 76, dtype: int64


In [255]:
# Filtering the composite labeling
#step 1: Remove rows with '/' in 'Crop_Type'
gdf_filtered = gdf[~gdf['Crop_Type'].fillna('').str.contains('/')]

#Step 2: Remove the nocrop type from the filtered data
nocrop = ['No crop' ,'Mix', 'Trees', 'Buitup' ,'Water body', 'Water Channel','Other','Builtup','Barren','Barley','Garlic','Egg Plant','Potato','Indian Squash' ]
gdf_filtered = gdf_filtered[~gdf_filtered['Crop_Type'].isin(nocrop)]

#Step 3: Final filtered data
gdf_filtered['Crop_Type'].value_counts()
print(gdf_filtered['Crop_Type'].value_counts())
print ('Mean of Crop_Type',gdf_filtered['Crop_Type'].value_counts().mean())

# 0) Drop any rows with missing Crop_Type
gdf_filtered = gdf_filtered.dropna(subset=['Crop_Type'])

Crop_Type
Tomato            94
Wheat             81
Persian Clover    57
Sugarcane         52
Corn              16
Orchard            6
Eucalyptus         5
Lady Finger        5
Name: count, dtype: int64
Mean of Crop_Type 39.5


In [256]:
major_crop = ['Tomato', 'Wheat','Persian Clover','Sugarcane','Corn']
minor_crop = [m for m in gdf_filtered['Crop_Type'].unique() if m not in major_crop]
print('Major Crop:',major_crop)
print('Minor Crop:',minor_crop)

Major Crop: ['Tomato', 'Wheat', 'Persian Clover', 'Sugarcane', 'Corn']
Minor Crop: ['Orchard', 'Eucalyptus', 'Lady Finger']


In [257]:
# 2) Compute target sample count for minor classes
counts = gdf_filtered['Crop_Type'].value_counts()
mean_imp = counts[major_crop].mean()      # average size of the major classes
target_min = int(mean_imp * 0.6)          # e.g. 60% of that
print(f"Major mean = {mean_imp:.1f}, so target for minor = {target_min}")


Major mean = 60.0, so target for minor = 36


In [258]:
# 3) Build the new stratified‐balanced GeoDataFrame
balanced_parts = []
for crop, group in gdf_filtered.groupby('Crop_Type'):
    n = len(group)
    if crop in major_crop:
        # keep all major‐crop samples
        balanced_parts.append(group)
    else:
        # oversample minor up to target_min
        if n < target_min:
            sampled = group.sample(n=target_min, replace=True, random_state=42)
        else:
            sampled = group
        balanced_parts.append(sampled)

gdf_strat = pd.concat(balanced_parts, ignore_index=True)

# 4) Verify the new distribution
print(gdf_strat['Crop_Type'].value_counts())

Crop_Type
Tomato            94
Wheat             81
Persian Clover    57
Sugarcane         52
Lady Finger       36
Orchard           36
Eucalyptus        36
Corn              16
Name: count, dtype: int64


# Calculating NDVI through GEE, 
using old code.

In [259]:
ee.Authenticate()
ee.Initialize()

In [260]:
# Converting the gdf_balanced to ee feature collection
crop_fc = geemap.gdf_to_ee(gdf_strat)

In [264]:
#Step 1 Processing satellite Imagery

# Adding Satellite Imagery and calculating NDVI using Sentinel-2
collection = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
# Filter the collection for a specific region and time period
start_date = '2024-09-01'
end_date = '2025-04-30'
# Filter the collection by date and region
filtered_collection = collection.filterDate(start_date, end_date).filterBounds(crop_fc).filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 10))

# Select the first image from the filtered collection
median_image = filtered_collection.median().clip(crop_fc)

# Calculate NDVI
ndvi = median_image.normalizedDifference(['B8', 'B4']).rename('NDVI')
# McFeeters NDWI = (Green - NIR) / (Green + NIR)
ndwi = median_image.normalizedDifference(['B3', 'B8']).rename('NDWI')

# Adding other bans and calculating their mean
bands = ['B2', 'B3', 'B4', 'B8', 'B11', 'B12']
band_img = median_image.select(bands).rename([f'{b}_mean' for b in bands])
# Calculating the texture
nir_int = median_image.select('B8').toInt32()
texture = nir_int.glcmTexture(size=3)
contrast = texture.select('B8_contrast').rename('contrast')



features_img = band_img.addBands([ndvi, ndwi])
features_img = features_img.addBands(contrast)



# Select the bands of interest (e.g.,NVDI)
ndvi_vis = {
    'min': 0.25,
    'max': 0.8,
    'palette': ['white', 'yellow', 'green', 'red']  # Red for highest vegetation
}


In [265]:
# Step 2: Calculate mean NDVI for each agri polygon
crop_features = features_img.reduceRegions(
    collection=crop_fc,
    reducer=ee.Reducer.mean(),
    scale=10,
)

In [266]:
# convert the result to a data frame
gdfcrop_features = geemap.ee_to_gdf(crop_features)

In [267]:
gdfcrop_features.columns.to_list()

['geometry',
 'Area_Acre',
 'B11_mean',
 'B12_mean',
 'B2_mean',
 'B3_mean',
 'B4_mean',
 'B8_mean',
 'Crop_Type',
 'FFID',
 'Landuse_Ma',
 'Mouza_Name',
 'NDVI',
 'NDWI',
 'Parcel_ID',
 'contrast']

In [268]:
gdfcrop_features.crs

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

In [269]:
# We need to calculate the area of each polygon in square meters
# Therefore we will convert the geometry to a projected coordinate system (EPSG:32643)
gdfcrop_features = gdfcrop_features.to_crs(epsg=32643)
# Calculating the area , perimetere and compactness
gdfcrop_features['Area_m2'] = gdfcrop_features.geometry.area
gdfcrop_features['Perimeter_m'] = gdfcrop_features.geometry.length
gdfcrop_features['Compactness'] = (4 * 3.14 * gdfcrop_features['Area_m2']) / (gdfcrop_features['Perimeter_m'] ** 2)

# Switch back to lat/lon if needed for mapping
gdfcrop_features = gdfcrop_features.to_crs(epsg=4326)

#Inspect the new columns
gdfcrop_features[['Area_m2','Perimeter_m','Compactness']].head()

Unnamed: 0,Area_m2,Perimeter_m,Compactness
0,9899.429387,530.225196,0.442261
1,4117.205775,308.88064,0.542014
2,1563.555794,203.633454,0.473592
3,2233.372236,243.888865,0.471592
4,3415.636358,274.419225,0.569682


In [272]:
# Check the result
#print(gdf_with_ndvi[['Crop_Type', 'NDVI', 'NDWI']].head())
# Combining both NDVI and NDWI into single feature
feature_cols = ['NDVI', 'NDWI', 'B2_mean', 'B3_mean', 'B4_mean', 'B8_mean', 'B11_mean', 'B12_mean','Area_m2','Perimeter_m','Compactness','contrast']
# Checking again

print(gdfcrop_features[['Crop_Type'] + feature_cols ].head())


  Crop_Type      NDVI      NDWI     B2_mean     B3_mean      B4_mean  \
0      Corn  0.363044 -0.411394  702.770639  952.190151  1070.423973   
1      Corn  0.506520 -0.509906  490.154028  726.520871   733.610589   
2      Corn  0.414448 -0.450394  671.161087  921.235173  1007.449010   
3      Corn  0.225400 -0.332920  684.700233  938.364668  1186.880602   
4      Corn  0.230121 -0.339825  682.845573  926.110963  1177.688676   

       B8_mean     B11_mean     B12_mean      Area_m2  Perimeter_m  \
0  2292.649464  1900.493145  1508.502821  9899.429387   530.225196   
1  2248.600489  1583.395723  1109.417265  4117.205775   308.880640   
2  2432.392662  1935.611697  1468.468750  1563.555794   203.633454   
3  1875.594514  1750.625073  1515.669019  2233.372236   243.888865   
4  1878.816074  1728.222592  1542.005415  3415.636358   274.419225   

   Compactness      contrast  
0     0.442261  75313.082728  
1     0.542014  52865.279005  
2     0.473592  20065.261516  
3     0.471592  40653.

# Lets create the test - training split


In [273]:
from sklearn.model_selection import train_test_split
# Splitting the data into training and testing sets

#Step 1 Define features (NDVI mean) and target label (Crop_Type)
X = gdfcrop_features[feature_cols].values
y = gdfcrop_features['Crop_Type'].values

# Train-test Split 80 % train and 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42,stratify=y)


# Train the Random Forest Model

In [274]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Initialize the model
rf_model = RandomForestClassifier(n_estimators=200,
    max_depth=15,
    class_weight='balanced_subsample',
    random_state=42)

# Train the model
rf_model.fit(X_train, y_train)

# Predict on the test set
y_pred = rf_model.predict(X_test)

# Evaluate the model
print("Classification Report:")
print(classification_report(y_test, y_pred,zero_division=0))

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))


Classification Report:
                precision    recall  f1-score   support

          Corn       0.00      0.00      0.00         3
    Eucalyptus       1.00      1.00      1.00         7
   Lady Finger       1.00      1.00      1.00         7
       Orchard       0.78      1.00      0.88         7
Persian Clover       0.62      0.67      0.64        12
     Sugarcane       0.33      0.18      0.24        11
        Tomato       0.48      0.58      0.52        19
         Wheat       0.35      0.38      0.36        16

      accuracy                           0.59        82
     macro avg       0.57      0.60      0.58        82
  weighted avg       0.55      0.59      0.56        82

Confusion Matrix:
[[ 0  0  0  0  0  0  0  3]
 [ 0  7  0  0  0  0  0  0]
 [ 0  0  7  0  0  0  0  0]
 [ 0  0  0  7  0  0  0  0]
 [ 0  0  0  0  8  0  2  2]
 [ 0  0  0  0  1  2  6  2]
 [ 0  0  0  1  2  1 11  4]
 [ 0  0  0  1  2  3  4  6]]
