# Soil Image Classification Project Overview (Binary Classification)

This notebook summarizes the binary Soil Image Classification task, which determines whether a given image belongs to a particular soil class (labeled `1`) or not (labeled `0`). The model is a convolutional neural network (CNN) based on the ResNet18 architecture. The project covers data preparation, model training, evaluation, and submission generation.

## Dataset Description

The dataset comprises high-resolution soil images along with binary labels:

- **Label 1**: Indicates the image belongs to the target soil class.
- **Label 0**: Indicates it does not belong to the target class.

**Data Sources:**

- Training images and labels: `train_labels.csv`
- Test image IDs: `test_ids.csv`
- Images are stored in separate train and test folders and are resized to `224×224` pixels before training.
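As a rough illustration of how the labels file might be read and then divided into training and validation sets, the sketch below uses only the standard library. The column names `image_id` and `label`, the folder name `train`, and the sample rows are assumptions made for the example, not details taken from the notebook. The split function is a plain stand-in for a library routine such as scikit-learn's `train_test_split` with its `stratify` argument.

```python
import csv
import io
import random
from collections import defaultdict
from pathlib import Path

# Hypothetical stand-in for train_labels.csv; the real column names may differ.
SAMPLE_CSV = """image_id,label
img_001.jpg,1
img_002.jpg,0
img_003.jpg,1
img_004.jpg,1
img_005.jpg,0
img_006.jpg,1
img_007.jpg,1
img_008.jpg,0
img_009.jpg,1
img_010.jpg,0
"""

def load_labels(csv_file, image_dir):
    """Return (path, label) pairs, with every label cast to int."""
    reader = csv.DictReader(csv_file)
    return [(Path(image_dir) / row["image_id"], int(row["label"])) for row in reader]

def stratified_split(samples, val_fraction=0.2, seed=42):
    """Send roughly val_fraction of each class to validation, the rest to training."""
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)
    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        # At least one validation item per class, so very small classes are still evaluated.
        n_val = max(1, round(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val

samples = load_labels(io.StringIO(SAMPLE_CSV), "train")
# A large fraction is used here only because the sample has ten rows.
train_samples, val_samples = stratified_split(samples, val_fraction=0.5)
```

Because the split is done per class, the 6:4 ratio of labels in the sample is preserved in both halves.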

## Methodology

The classification pipeline is composed of the following stages:

1. **Data Loading & Preparation**
   - Full file paths are constructed for each image.
   - Labels are cast to `int` so that they are strictly binary (`0` or `1`).
   - A stratified split preserves the class proportions in both the training and validation sets.

2. **Data Augmentation**
   - Applied using `Albumentations`:
     - Random horizontal flip
     - Brightness/contrast adjustment
     - Random rotation
     - Normalization
   - These transforms are intended to improve generalization and reduce overfitting.

3. **Model Architecture**
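   - **ResNet18** pre-trained on ImageNet is used as the backbone.
   - The final fully connected layer is replaced with one that outputs two logits, one per class. `CrossEntropyLoss` applies the softmax internally, so the layer itself does not emit probabilities.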

4. **Loss Function & Optimization**
   - `CrossEntropyLoss` is used as the loss function.
   - Parameters are optimized with `Adam` at a learning rate of `1e-4`.

5. **Training Loop**
   - The model trains for 5 epochs.
   - The **F1-score** is monitored on both the training and validation sets.
   - The best model, judged by validation F1-score, is saved.

6. **Inference**
   - The best-performing model is used for test prediction.
   - Predicted class indices are mapped back to the original labels (`0` or `1`).
   - Final predictions are saved to `submission.csv`.

## Evaluation Metrics

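Model performance is primarily measured using the **F1-score**, the harmonic mean of precision and recall. It is more informative than accuracy when the classes are imbalanced, because a model that always predicts the majority class can reach high accuracy while never detecting the target class. Training loss is also tracked.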
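To make the metric concrete, here is a small pure-Python sketch of the binary F1-score, assuming label `1` is the positive class. The notebook most likely relies on a library routine (for example scikit-learn's `f1_score`); the function name and the sample labels below are illustrative only.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Balanced-ish example: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = f1_score_binary(y_true, y_pred)

# Imbalanced example: always predicting 0 looks accurate but finds no positives.
imbalanced_true = [0] * 9 + [1]
always_zero = [0] * 10
imbalanced_acc = accuracy(imbalanced_true, always_zero)
imbalanced_f1 = f1_score_binary(imbalanced_true, always_zero)
```

In the second example the accuracy is high while the F1-score is zero, which is the failure mode the metric is meant to expose.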

### Sample Output from Training
```
Epoch 1: Train Loss = 0.1617, Train F1 = 0.8421
Validation F1: 0.8765
✅ New best model saved
...
```
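The "new best model saved" message in the log suggests a simple rule: keep a checkpoint whenever the validation F1-score improves on the best value seen so far. The sketch below shows that rule in isolation. The function name and the per-epoch scores are made up for illustration and are not results from the notebook.

```python
def track_best(val_f1_per_epoch):
    """Return the best epoch, its F1-score, and the epochs where a checkpoint would be saved."""
    best_f1 = float("-inf")
    best_epoch = None
    saved_epochs = []
    for epoch, val_f1 in enumerate(val_f1_per_epoch, start=1):
        if val_f1 > best_f1:
            best_f1, best_epoch = val_f1, epoch
            # A real training loop would write the model weights to disk here.
            saved_epochs.append(epoch)
    return best_epoch, best_f1, saved_epochs

# Illustrative validation F1 values for 5 epochs (not measured results).
val_f1_history = [0.80, 0.85, 0.83, 0.90, 0.88]
best_epoch, best_f1, saved_epochs = track_best(val_f1_history)
```

With these illustrative values a checkpoint is written at epochs 1, 2, and 4, and the epoch 4 weights would be the ones loaded for inference.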

## Conclusion

This project shows that a pipeline built on a pre-trained ResNet18 with standard image augmentation can classify soil images into the target class and all others. Stratified splitting and F1-score monitoring help keep the evaluation meaningful when the classes are imbalanced.