# Soil Classification 
# Part 1: Multiclass Classification

**This project develops a multiclass classification model to distinguish Alluvial, Red, Black, and Clay soil images using deep learning, submitted for the Soil Classification Part 1 challenge.**


# Team Members

- **Team Name**: Expendables
- **Team Leader**: Sushmetha S R

| Name                 | Role/Title                                                               | Affiliation                          | Email                     | GitHub Handle         |
|----------------------|---------------------------------------------------------------------------|--------------------------------------|---------------------------|-----------------------|
| Abhinav Chaitanya R  | BTech in Electronics and Communication Engineering, 2025                | Vellore Institute of Technology, VIT Chennai | abhinavchaitanya6@gmail.com | Abhinav302004         |
| Arjun M              | BTech in Computer Science and Engineering, 2025                          | Vellore Institute of Technology, VIT Chennai | arjunm.0510@gmail.com     | ArjunM05              |
| Harshavardhan S      | BTech in Computer Science and Engineering, 2025                          | Vellore Institute of Technology, VIT Chennai | harsak7@gmail.com         | harsha152003          |
| Kiranchandran H      | BTech in Computer Science and Engineering (Cyber Physical Systems), 2025 | Vellore Institute of Technology, VIT Chennai | kiranchandranh@gmail.com  | kiranchh08            |
| Sushmetha S R        | BTech in Computer Science and Engineering (AI & ML Specialization), 2025 | Vellore Institute of Technology, VIT Chennai | sush7niaa@gmail.com       | sushniaa              |

# Objective

The objective of this project is to develop a deep learning model that predicts soil type from labeled soil images. The model uses transfer learning with the EfficientNetV2 architecture and addresses class imbalance with a class-weighted focal loss.

# Dataset

**Source:** The dataset is provided as part of the Soil Classification 2025 Challenge on Kaggle (`/kaggle/input/soil-classification/soil_classification-2025/`).
* **Training Set:**
  * Images stored in: `train/`
  * Labels provided in: `train_labels.csv`
* **Test Set:**
  * Images stored in: `test/`
  * Image IDs listed in: `test_ids.csv`
  * Unseen images in: `test-soil-image/`


**Dataset Statistics:**


* Total training samples: 1222
* Total test samples: 341
* Number of unique classes: 4 (Alluvial, Red, Clay, Black)
* Missing data: No missing values in image IDs or labels, based on `.isnull().sum()` checks
* Invalid or corrupt images: None detected based on sample inspection and successful loading


**Preprocessing Steps:**

* All images resized to 224×224, the input size used for EfficientNetV2-S in this project.

* Images converted to RGB format to ensure consistency.

* Applied ImageNet normalization: mean = [0.485, 0.456, 0.406], std = [0.229, 0.224, 0.225]

* Label encoding applied to convert string class labels to numerical targets for training.
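
The normalization step above can be sketched for a single pixel as follows. This is a minimal illustration in plain Python, not the project's actual pipeline; the helper name `normalize_pixel` is hypothetical. In a torchvision pipeline the same arithmetic is typically performed by `transforms.ToTensor()` followed by `transforms.Normalize(mean, std)`.

```python
# ImageNet channel statistics, as listed in the preprocessing steps.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel.

    Hypothetical helper for illustration only.
    """
    scaled = [channel / 255.0 for channel in rgb]
    return [
        (value - mean) / std
        for value, mean, std in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)
    ]


# A black pixel maps to -mean/std in every channel.
black = normalize_pixel((0, 0, 0))
# A mid-gray pixel lands close to zero after standardization.
mid_gray = normalize_pixel((128, 128, 128))
```

After this step the channel values are roughly zero-centered, which matches the input statistics the ImageNet-pretrained weights were trained on.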

**Label Description:**

* Labels represent soil types.

* Encoded using LabelEncoder to map string labels to integer classes: {'Alluvial soil': 0, 'Black Soil': 1, 'Clay soil': 2, 'Red soil': 3}
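
As a rough sketch of where this mapping comes from: scikit-learn's `LabelEncoder` assigns integers to the sorted unique labels, so the same mapping can be reproduced in plain Python. The sample label list below is made up for illustration.

```python
# Made-up sample of string labels, in arbitrary order.
labels = ["Red soil", "Alluvial soil", "Clay soil", "Black Soil", "Red soil"]

# Sort the unique labels and number them, mirroring LabelEncoder's behavior.
classes = sorted(set(labels))
class_to_index = {name: index for index, name in enumerate(classes)}

# Encode the labels for training and keep the inverse map for inference.
encoded = [class_to_index[name] for name in labels]
index_to_class = {index: name for name, index in class_to_index.items()}
```

The inverse map is what turns a predicted class index back into a soil-type string when writing predictions.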

# Model

**Architecture Used:**

* Utilized EfficientNetV2-S, pretrained on ImageNet (`torchvision.models.efficientnet_v2_s`).

* Replaced the final classifier head with a custom `nn.Sequential` block.

  
* The model outputs class probabilities for each soil type using softmax activation.

**Loss Function:**

* Employed Focal Loss to handle class imbalance.

* Gamma = 2.0 to focus learning on harder examples.

* Alpha = per-class weight vector based on inverse class frequency.

This helped prevent the model from being biased toward majority classes.
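
To make the loss concrete, here is a minimal single-sample sketch in plain Python of the focal loss, FL = -alpha_t * (1 - p_t)^gamma * log(p_t). The weight normalization `total / (num_classes * count)` is an assumption; the project only states that alpha is based on inverse class frequency. The function names and the toy label counts are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Per-class weights proportional to 1 / class frequency.

    Assumes every class appears at least once in `labels`.
    """
    counts = Counter(labels)
    total = len(labels)
    return [total / (num_classes * counts[c]) for c in range(num_classes)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def focal_loss(logits, target, alpha, gamma=2.0):
    """Focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = softmax(logits)[target]
    return -alpha[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Toy, imbalanced label set (hypothetical counts, not the real dataset).
toy_labels = [0] * 6 + [1] * 3 + [2] * 2 + [3] * 1
alpha = inverse_frequency_weights(toy_labels, num_classes=4)

# Same target class, once confidently correct and once confidently wrong.
easy = focal_loss([4.0, 0.0, 0.0, 0.0], target=0, alpha=alpha)
hard = focal_loss([0.0, 4.0, 0.0, 0.0], target=0, alpha=alpha)
```

The `(1 - p_t)**gamma` factor shrinks the loss of confidently correct samples, so `easy` is far smaller than `hard`. With `gamma=0` the expression reduces to class-weighted cross-entropy.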

**Optimization Details:**

* Optimizer: Adam with a learning rate of 1e-4.
  
* Scheduler: StepLR with a step size of 7 and a decay factor of 0.1 (`gamma=0.1`).

  
* Batch size: 32, with stratified train/validation split (80/20).
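
A stratified split keeps each class's share roughly equal in the training and validation sets. The sketch below shows one way to do this in plain Python; the function name, the seed, and the toy labels are illustrative assumptions, and in practice `sklearn.model_selection.train_test_split` with its `stratify` argument serves the same purpose.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Split sample indices so each class contributes ~val_fraction to validation.

    Hypothetical helper; the seed and rounding rule are illustrative choices.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Keep at least one validation sample per class.
        n_val = max(1, round(len(indices) * val_fraction))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy, imbalanced labels (not the real class counts).
toy_labels = ["Alluvial soil"] * 50 + ["Red soil"] * 10
train_idx, val_idx = stratified_split(toy_labels)
val_red = sum(1 for i in val_idx if toy_labels[i] == "Red soil")
```

With these toy labels the minority class still contributes 2 of its 10 images to validation, so per-class validation metrics are not computed on an empty class.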

The model was trained for 30 epochs, and the checkpoint with the best validation F1-score was saved.
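
For reference, the learning rate implied by these settings can be computed directly. The sketch below assumes the scheduler is stepped once per epoch, in which case StepLR follows `base_lr * gamma ** (epoch // step_size)`; the function name `step_lr` is hypothetical.

```python
def step_lr(epoch, base_lr=1e-4, step_size=7, gamma=0.1):
    """Learning rate at a zero-based epoch under a StepLR-style schedule.

    Assumes one scheduler step per epoch.
    """
    return base_lr * gamma ** (epoch // step_size)


# Learning rate for each of the 30 training epochs.
schedule = [step_lr(epoch) for epoch in range(30)]

# Epochs at which the learning rate drops by a factor of 10.
drop_epochs = [e for e in range(1, 30) if schedule[e] < schedule[e - 1]]
```

Under this assumption the rate drops at epochs 7, 14, 21, and 28, reaching about 1e-8 for the final two epochs, so most of the effective learning happens in the first half of training.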

# Evaluation

**Metrics**:
  - F1-score (primary metric for the competition).
  - Evaluated on the validation set.
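
As an illustration of the metric, per-class F1 can be computed from true and predicted class indices as F1 = 2·TP / (2·TP + FP + FN). The sketch below is plain Python with made-up labels; `sklearn.metrics.f1_score` with `average=None` returns the same per-class values.

```python
def per_class_f1(y_true, y_pred, num_classes):
    """Return the F1-score of each class as a list indexed by class id."""
    scores = []
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # Define F1 as 0.0 when a class has no true or predicted samples.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Made-up validation labels: one class-0 image is misclassified as class 1.
y_true = [0, 0, 1, 1, 2, 3]
y_pred = [0, 1, 1, 1, 2, 3]
f1 = per_class_f1(y_true, y_pred, num_classes=4)
```

Here the single mistake lowers the F1 of both class 0 (a missed sample) and class 1 (a false positive), while classes 2 and 3 stay at 1.0.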

**Best Scores**:

* F1_alluvial_soil: `[Value depends on run, e.g., 0.9500]`
* F1_black_soil: `[e.g., 0.9500]`
* F1_clay_soil: `[e.g., 0.9500]`
* F1_red_soil: `[e.g., 0.9500]`
* (Exact values are in `ml-metrics.json` generated by `Training.ipynb`).

**Visuals**:

* **Image Distribution:** Histogram of predicted class distribution on validation set to inspect prediction bias.
* **Training History**: Line plots showing training loss and validation F1-score across 30 epochs. Saved as: `training_history.png`
* **Confusion Matrix**: Visualized class-wise prediction accuracy on the validation set using a confusion matrix. Saved as: `confusion_matrix.png`


# Inference

To use the model for predictions:
1. Run `Inference.ipynb`, which includes all training steps and inference.
2. The notebook:
   - Loads the best saved EfficientNetV2-S model (`best_model.pth`).
   - Applies the same preprocessing pipeline to test images (resize, normalize).
   - Selects the class with the highest probability as the predicted label.
   - Writes the final predictions to `Submission.csv` in (image_id, soil_type) format.
3. For unseen images (e.g., `soil_test_1.jpg`, `soil_test_2.jpg`):
   - The notebook processes each image, computes its similarity, and predicts the label.
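
The prediction and export steps can be sketched as follows. This is a plain-Python illustration, not the notebook's code: the logits and image names are made up, and the output is written to an in-memory buffer rather than to `Submission.csv`.

```python
import csv
import io
import math

# Class order follows the label encoding used for training.
CLASS_NAMES = ["Alluvial soil", "Black Soil", "Clay soil", "Red soil"]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def predict_label(logits):
    """Return the most probable class name and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]


# Hypothetical model outputs keyed by hypothetical image IDs.
toy_outputs = {
    "img_a.jpg": [2.0, 0.1, 0.3, 0.5],
    "img_b.jpg": [0.2, 0.1, 0.3, 3.0],
}

# Write rows in (image_id, soil_type) format.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["image_id", "soil_type"])
for image_id, logits in toy_outputs.items():
    label, _ = predict_label(logits)
    writer.writerow([image_id, label])
submission_text = buffer.getvalue()
```

Replacing the buffer with a file opened via `open("Submission.csv", "w", newline="")` would produce the file on disk.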