# Final Portfolio Project - Classification Task
## Crop Prediction Based on Soil and Environmental Features

**Student Name:** Alish Duwal  
**Student ID:** 2461817  
**Group:** L5CG2  

**UN Sustainable Development Goal:** SDG 2 - Zero Hunger  
**Objective:** Predict the type of crop to be grown based on soil nutrients and environmental conditions

## 1. Import Libraries

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Scikit-learn libraries
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report, confusion_matrix

# Classical ML Models
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Neural Network
from sklearn.neural_network import MLPClassifier

# Feature Selection
from sklearn.feature_selection import SelectKBest, f_classif

# Set random seed for reproducibility
np.random.seed(42)

# Display settings
pd.set_option('display.max_columns', None)
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('Set2')

print("✓ All libraries imported successfully!")

## 2. Load and Explore Dataset

In [None]:
# Load the dataset
df = pd.read_csv('/content/drive/MyDrive/FinalAssessment/sensor_Crop_Dataset.csv', encoding='latin1')

print("Dataset Shape:", df.shape)
print("\n" + "="*50)
print("First 5 rows:")
df.head()

In [None]:
# Dataset Information
print("Dataset Information:")
print("="*50)
df.info()

In [None]:
# Check for missing values
print("Missing Values:")
print("="*50)
missing_values = df.isnull().sum()
print(missing_values[missing_values > 0])
print(f"\nTotal missing values: {df.isnull().sum().sum()}")

In [None]:
# Statistical Summary
print("Statistical Summary of Numerical Features:")
print("="*50)
df.describe()

In [None]:
# Check target variable distribution
print("Target Variable (Crop) Distribution:")
print("="*50)
print(df['Crop'].value_counts())
print(f"\nNumber of unique crops: {df['Crop'].nunique()}")

## 3. Exploratory Data Analysis (EDA)

### 3.1 Data Cleaning

In [None]:
# Handle missing values if any
print("Handling missing values...")
df_clean = df.dropna()
print(f"Rows after removing missing values: {len(df_clean)}")
print(f"Rows removed: {len(df) - len(df_clean)}")

In [None]:
# Select relevant features for crop prediction
# Soil features: Nitrogen, Phosphorus, Potassium, pH_Value
# Environmental features: Temperature, Humidity, Rainfall
print("Selected Features for Crop Prediction:")
print("="*50)
selected_features = ['Nitrogen', 'Phosphorus', 'Potassium', 'Temperature', 'Humidity', 'pH_Value', 'Rainfall']
print("Features:", selected_features)
print("Target: Crop")

### 3.2 Visualizations

In [None]:
# Crop distribution visualization
plt.figure(figsize=(12, 6))
crop_counts = df_clean['Crop'].value_counts()
plt.subplot(1, 2, 1)
crop_counts.plot(kind='bar', color='skyblue', edgecolor='black')
plt.title('Distribution of Crop Types', fontsize=14, fontweight='bold')
plt.xlabel('Crop Type')
plt.ylabel('Frequency')
plt.xticks(rotation=45, ha='right')
plt.grid(axis='y', alpha=0.3)

plt.subplot(1, 2, 2)
crop_counts.plot(kind='pie', autopct='%1.1f%%', startangle=90)
plt.title('Crop Type Distribution (Percentage)', fontsize=14, fontweight='bold')
plt.ylabel('')
plt.tight_layout()
plt.show()

print(f"Insight: The dataset contains {crop_counts.size} crop types, "
      f"with class sizes ranging from {crop_counts.min()} to {crop_counts.max()} samples.")

In [None]:
# Distribution of numerical features
fig, axes = plt.subplots(3, 3, figsize=(15, 12))
axes = axes.ravel()

for idx, col in enumerate(selected_features):
    axes[idx].hist(df_clean[col], bins=30, color='steelblue', edgecolor='black', alpha=0.7)
    axes[idx].set_title(f'Distribution of {col}', fontsize=11, fontweight='bold')
    axes[idx].set_xlabel(col)
    axes[idx].set_ylabel('Frequency')
    axes[idx].grid(axis='y', alpha=0.3)

# Hide extra subplots
for idx in range(len(selected_features), 9):
    axes[idx].axis('off')

plt.tight_layout()
plt.show()

print("Insight: Histograms show the distribution of each soil and environmental feature.")
print("Skewness per feature (values far from 0 indicate an asymmetric distribution):")
print(df_clean[selected_features].skew().round(2))

In [None]:
# Correlation heatmap
plt.figure(figsize=(10, 8))
correlation_matrix = df_clean[selected_features].corr()
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', 
            square=True, linewidths=1, cbar_kws={"shrink": 0.8})
plt.title('Correlation Heatmap of Environmental Features', fontsize=14, fontweight='bold', pad=20)
plt.tight_layout()
plt.show()

print("Insight: The correlation heatmap shows pairwise linear relationships between features.")
print("Weakly correlated features carry less redundant information; strongly correlated pairs are partly interchangeable.")

In [None]:
# Boxplot for outlier detection
fig, axes = plt.subplots(2, 4, figsize=(16, 8))
axes = axes.ravel()

for idx, col in enumerate(selected_features):
    axes[idx].boxplot(df_clean[col], vert=True, patch_artist=True,
                     boxprops=dict(facecolor='lightblue', color='blue'),
                     medianprops=dict(color='red', linewidth=2))
    axes[idx].set_title(f'Boxplot of {col}', fontsize=10, fontweight='bold')
    axes[idx].set_ylabel('Value')
    axes[idx].grid(axis='y', alpha=0.3)

# Hide extra subplot
axes[7].axis('off')

plt.tight_layout()
plt.show()

print("Insight: Boxplots help identify outliers in the dataset.")
print("Some features may have extreme values that could affect model performance.")
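The boxplot whiskers follow the 1.5 × IQR rule, so the same rule can be used to count outliers numerically. The sketch below is a minimal illustration on a small made-up array (`values` is hypothetical, not taken from the crop dataset); the same mask could be applied to each column of `df_clean`.

```python
import numpy as np

# Hypothetical measurements with one obvious extreme value.
values = np.array([10, 12, 11, 13, 12, 11, 10, 14, 13, 95], dtype=float)

# Quartiles and interquartile range.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are flagged as outliers.
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outlier_mask = (values < lower_bound) | (values > upper_bound)

print("Bounds:", lower_bound, upper_bound)
print("Outliers:", values[outlier_mask])
```

Whether flagged points should be removed is a separate decision: an extreme rainfall value may be a genuine reading for some crops rather than a sensor error.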

## 4. Data Preprocessing

In [None]:
# Prepare features (X) and target (y)
X = df_clean[selected_features]
y = df_clean['Crop']

print("Features (X) shape:", X.shape)
print("Target (y) shape:", y.shape)
print("\nNumber of samples:", len(X))
print("Number of features:", X.shape[1])
print("Number of classes:", y.nunique())

In [None]:
# Encode target variable
le = LabelEncoder()
y_encoded = le.fit_transform(y)

print("Target variable encoded successfully!")
print("Classes:", le.classes_)
print("\nEncoded target shape:", y_encoded.shape)
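`LabelEncoder` assigns integer codes in sorted class order, and `inverse_transform` maps predictions back to crop names. A minimal sketch, using a hypothetical list of crop names rather than the real `Crop` column:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels standing in for the 'Crop' column.
crops = ['Rice', 'Maize', 'Wheat', 'Rice']

encoder = LabelEncoder()
codes = encoder.fit_transform(crops)        # classes are sorted alphabetically
decoded = encoder.inverse_transform(codes)  # integer codes back to names

print("Classes:", list(encoder.classes_))
print("Codes:", codes.tolist())
print("Decoded:", list(decoded))
```

In the notebook, the same call (`le.inverse_transform(...)`) would turn model predictions into readable crop names.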

In [None]:
# Split the data into training and testing sets (80-20 split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y_encoded, test_size=0.2, random_state=42, stratify=y_encoded
)

print("Data split successfully!")
print("="*50)
print(f"Training set size: {len(X_train)} ({len(X_train)/len(X)*100:.1f}%)")
print(f"Testing set size: {len(X_test)} ({len(X_test)/len(X)*100:.1f}%)")
print("\nTraining features shape:", X_train.shape)
print("Testing features shape:", X_test.shape)
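The `stratify` argument keeps the class proportions approximately equal in both splits, which matters when some crops are rarer than others. The following sketch checks this on a deliberately imbalanced toy target (`X_demo` and `y_demo` are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Imbalanced toy target: 60% class 0, 30% class 1, 10% class 2.
y_demo = np.array([0] * 60 + [1] * 30 + [2] * 10)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Share of each class in the two splits.
train_share = np.bincount(y_tr) / len(y_tr)
test_share = np.bincount(y_te) / len(y_te)

print("Train class shares:", train_share.round(2))
print("Test class shares: ", test_share.round(2))
```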

In [None]:
# Feature Scaling (Important for Neural Networks and some algorithms)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Features scaled successfully using StandardScaler!")
print("\nScaled training features shape:", X_train_scaled.shape)
print("Scaled testing features shape:", X_test_scaled.shape)
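One caveat: fitting the scaler once on the full training set means that, during later cross-validation, each validation fold has already influenced the scaling statistics. A common way to avoid this is to wrap the scaler and the model in a `Pipeline`, so the scaler is refit inside every fold. The sketch below assumes synthetic data from `make_classification` as a stand-in for the crop features (`X_demo`, `y_demo`, and `scaled_lr` are illustrative names):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 7 features and 4 classes (assumption, not the real data).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=7, n_informative=5, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=42
)

# The scaler is refit on the training part of each fold, so the held-out
# part never contributes to the mean and standard deviation.
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
fold_scores = cross_val_score(scaled_lr, X_demo, y_demo, cv=5, scoring='accuracy')

print(f"Mean CV accuracy: {fold_scores.mean():.4f}")
```

For standardization the effect is usually small, but the pipeline form also makes it harder to accidentally call `fit_transform` on test data.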

## 5. Task 1: Build Neural Network Model (MLPClassifier)

**Architecture:**
- Input Layer: 7 features
- Hidden Layer 1: 64 neurons with ReLU activation
- Hidden Layer 2: 32 neurons with ReLU activation
- Output Layer: Number of crop classes with Softmax (implicit)
- Loss Function: Cross-Entropy Loss
- Optimizer: Adam
- Learning Rate: 0.001 (default)
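To make the output layer and loss concrete, the sketch below computes softmax probabilities and the cross-entropy loss by hand in NumPy. The logits and labels are made-up numbers, and the helper functions `softmax` and `cross_entropy` are written for illustration only; they are not part of scikit-learn's API.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-probability assigned to the true class.
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(true_class_probs))

# Hypothetical raw outputs for two samples and three classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])

probs = softmax(logits)
loss = cross_entropy(probs, labels)

print("Probabilities:\n", probs.round(3))
print(f"Cross-entropy loss: {loss:.4f}")
```

Each row of `probs` sums to 1, and the loss shrinks as the probability of the correct class approaches 1.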

In [None]:
# Build Neural Network Classifier
nn_classifier = MLPClassifier(
    hidden_layer_sizes=(64, 32),  # Two hidden layers: 64 and 32 neurons
    activation='relu',             # ReLU activation function
    solver='adam',                 # Adam optimizer
    learning_rate_init=0.001,      # Learning rate
    max_iter=500,                  # Maximum iterations
    random_state=42,
    early_stopping=True,           # Use early stopping to prevent overfitting
    validation_fraction=0.1,       # 10% of training data for validation
    verbose=False
)

print("Neural Network Architecture:")
print("="*50)
print("Input Layer: 7 features")
print("Hidden Layer 1: 64 neurons (ReLU activation)")
print("Hidden Layer 2: 32 neurons (ReLU activation)")
print(f"Output Layer: {len(le.classes_)} neurons (Softmax activation)")
print("\nOptimizer: Adam")
print("Loss Function: Cross-Entropy Loss")
print("Learning Rate: 0.001")
print("Max Iterations: 500")
print("Early Stopping: Enabled")

In [None]:
# Train the Neural Network
print("Training Neural Network...")
nn_classifier.fit(X_train_scaled, y_train)
print("✓ Neural Network training completed!")
print(f"\nNumber of iterations: {nn_classifier.n_iter_}")
print(f"Loss: {nn_classifier.loss_:.4f}")

In [None]:
# Evaluate Neural Network on Training Set
y_train_pred_nn = nn_classifier.predict(X_train_scaled)

print("Neural Network - Training Set Performance:")
print("="*50)
print(f"Accuracy: {accuracy_score(y_train, y_train_pred_nn):.4f}")
print(f"Precision: {precision_score(y_train, y_train_pred_nn, average='weighted'):.4f}")
print(f"Recall: {recall_score(y_train, y_train_pred_nn, average='weighted'):.4f}")
print(f"F1-Score: {f1_score(y_train, y_train_pred_nn, average='weighted'):.4f}")

In [None]:
# Evaluate Neural Network on Test Set
y_test_pred_nn = nn_classifier.predict(X_test_scaled)

print("Neural Network - Test Set Performance:")
print("="*50)
print(f"Accuracy: {accuracy_score(y_test, y_test_pred_nn):.4f}")
print(f"Precision: {precision_score(y_test, y_test_pred_nn, average='weighted'):.4f}")
print(f"Recall: {recall_score(y_test, y_test_pred_nn, average='weighted'):.4f}")
print(f"F1-Score: {f1_score(y_test, y_test_pred_nn, average='weighted'):.4f}")

print("\n" + "="*50)
print("Classification Report:")
print("="*50)
print(classification_report(y_test, y_test_pred_nn, target_names=le.classes_))

In [None]:
# Confusion Matrix for Neural Network
cm_nn = confusion_matrix(y_test, y_test_pred_nn)

plt.figure(figsize=(10, 8))
sns.heatmap(cm_nn, annot=True, fmt='d', cmap='Blues', 
            xticklabels=le.classes_, yticklabels=le.classes_)
plt.title('Neural Network - Confusion Matrix', fontsize=14, fontweight='bold')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.tight_layout()
plt.show()
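The confusion matrix also yields per-class recall directly: the diagonal holds the correctly classified samples, and each row sum is the number of true samples of that class. A small sketch with made-up labels (`y_true_demo` and `y_pred_demo` are hypothetical):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical true and predicted labels for three classes.
y_true_demo = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
y_pred_demo = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 0])

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)

# Rows are true classes, columns are predicted classes.
per_class_recall = cm_demo.diagonal() / cm_demo.sum(axis=1)

print(cm_demo)
print("Per-class recall:", per_class_recall.round(3))
```

The result matches `recall_score(..., average=None)`, which is a quick way to see which crops the model confuses most often.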

## 6. Task 2: Build Two Classical ML Models

### 6.1 Model 1: Random Forest Classifier

In [None]:
# Build Random Forest Classifier
rf_classifier = RandomForestClassifier(random_state=42)

print("Training Random Forest Classifier...")
rf_classifier.fit(X_train_scaled, y_train)
print("✓ Random Forest training completed!")

In [None]:
# Evaluate Random Forest on Test Set
y_test_pred_rf = rf_classifier.predict(X_test_scaled)

print("Random Forest - Test Set Performance:")
print("="*50)
print(f"Accuracy: {accuracy_score(y_test, y_test_pred_rf):.4f}")
print(f"Precision: {precision_score(y_test, y_test_pred_rf, average='weighted'):.4f}")
print(f"Recall: {recall_score(y_test, y_test_pred_rf, average='weighted'):.4f}")
print(f"F1-Score: {f1_score(y_test, y_test_pred_rf, average='weighted'):.4f}")

### 6.2 Model 2: Logistic Regression

In [None]:
# Build Logistic Regression Classifier
lr_classifier = LogisticRegression(max_iter=1000, random_state=42)

print("Training Logistic Regression Classifier...")
lr_classifier.fit(X_train_scaled, y_train)
print("✓ Logistic Regression training completed!")

In [None]:
# Evaluate Logistic Regression on Test Set
y_test_pred_lr = lr_classifier.predict(X_test_scaled)

print("Logistic Regression - Test Set Performance:")
print("="*50)
print(f"Accuracy: {accuracy_score(y_test, y_test_pred_lr):.4f}")
print(f"Precision: {precision_score(y_test, y_test_pred_lr, average='weighted'):.4f}")
print(f"Recall: {recall_score(y_test, y_test_pred_lr, average='weighted'):.4f}")
print(f"F1-Score: {f1_score(y_test, y_test_pred_lr, average='weighted'):.4f}")

## 7. Task 3: Hyperparameter Optimization with Cross-Validation

### 7.1 Random Forest Hyperparameter Tuning

In [None]:
# Define parameter grid for Random Forest (reduced for faster execution)
rf_param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [10, 20, None],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 2]
}

print("Random Forest - Hyperparameter Grid:")
print("="*50)
for param, values in rf_param_grid.items():
    print(f"{param}: {values}")

print(f"\nTotal combinations: {np.prod([len(v) for v in rf_param_grid.values()])}")

In [None]:
# Perform GridSearchCV for Random Forest
print("\nPerforming GridSearchCV for Random Forest...")
print("This may take a few minutes...")

rf_grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    rf_param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=1
)

rf_grid_search.fit(X_train_scaled, y_train)

print("\n✓ GridSearchCV completed for Random Forest!")
print("\nBest Hyperparameters:")
print("="*50)
for param, value in rf_grid_search.best_params_.items():
    print(f"{param}: {value}")
print(f"\nBest Cross-Validation Score: {rf_grid_search.best_score_:.4f}")
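Beyond `best_params_`, the `cv_results_` attribute records the mean and standard deviation of the score for every combination, which shows whether the winner is clearly better or only marginally ahead. The sketch below runs a deliberately tiny grid on synthetic data (`X_demo`, `y_demo`, `demo_grid`, and `demo_search` are illustrative stand-ins, not the notebook's objects):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the crop data (assumption).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=7, n_informative=5, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=42
)

# Small grid so the example runs quickly.
demo_grid = {'n_estimators': [20, 50], 'max_depth': [5, None]}
demo_search = GridSearchCV(
    RandomForestClassifier(random_state=42), demo_grid, cv=3, scoring='accuracy'
)
demo_search.fit(X_demo, y_demo)

# One row per parameter combination, best rank first.
results = (
    pd.DataFrame(demo_search.cv_results_)
    [['params', 'mean_test_score', 'std_test_score', 'rank_test_score']]
    .sort_values('rank_test_score')
)
print(results.to_string(index=False))
```

If several combinations score within one standard deviation of each other, the simpler or faster configuration is often a reasonable choice.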

### 7.2 Logistic Regression Hyperparameter Tuning

In [None]:
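# Define parameter grid for Logistic Regression (reduced for faster execution)
# Note: 'lbfgs' fits a multinomial model, whereas 'liblinear' handles multiclass
# targets one-vs-rest, so the two solvers optimize different objectives.
# Recent scikit-learn releases may warn about 'liblinear' with multiclass targets.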
lr_param_grid = {
    'C': [0.1, 1.0, 10.0],
    'solver': ['lbfgs', 'liblinear'],
    'penalty': ['l2']
}

print("Logistic Regression - Hyperparameter Grid:")
print("="*50)
for param, values in lr_param_grid.items():
    print(f"{param}: {values}")

print(f"\nTotal combinations: {np.prod([len(v) for v in lr_param_grid.values()])}")

In [None]:
# Perform GridSearchCV for Logistic Regression
print("\nPerforming GridSearchCV for Logistic Regression...")

lr_grid_search = GridSearchCV(
    LogisticRegression(max_iter=1000, random_state=42),
    lr_param_grid,
    cv=5,
    scoring='accuracy',
    n_jobs=-1,
    verbose=1
)

lr_grid_search.fit(X_train_scaled, y_train)

print("\n✓ GridSearchCV completed for Logistic Regression!")
print("\nBest Hyperparameters:")
print("="*50)
for param, value in lr_grid_search.best_params_.items():
    print(f"{param}: {value}")
print(f"\nBest Cross-Validation Score: {lr_grid_search.best_score_:.4f}")

## 8. Task 4: Feature Selection

We use SelectKBest with f_classif: each feature is scored by its ANOVA F-statistic against the crop label, and the k highest-scoring features are kept.

In [None]:
# Feature Selection using SelectKBest
k_best = 5  # Select top 5 features

selector = SelectKBest(score_func=f_classif, k=k_best)
X_train_selected = selector.fit_transform(X_train_scaled, y_train)
X_test_selected = selector.transform(X_test_scaled)

# Get selected feature names
selected_feature_indices = selector.get_support(indices=True)
selected_feature_names = [selected_features[i] for i in selected_feature_indices]

print("Feature Selection Results:")
print("="*50)
print("Method: SelectKBest with f_classif")
print(f"Number of features selected: {k_best}")
print(f"\nSelected Features: {selected_feature_names}")
print("\nFeature Scores:")
for i, (feature, score) in enumerate(zip(selected_features, selector.scores_)):
    selected = "✓" if i in selected_feature_indices else "✗"
    print(f"{selected} {feature}: {score:.2f}")

In [None]:
# Visualize feature importance scores
plt.figure(figsize=(10, 6))
feature_scores = pd.DataFrame({
    'Feature': selected_features,
    'Score': selector.scores_
}).sort_values('Score', ascending=False)

colors = ['green' if f in selected_feature_names else 'lightgray' for f in feature_scores['Feature']]

plt.barh(feature_scores['Feature'], feature_scores['Score'], color=colors, edgecolor='black')
plt.xlabel('F-Score', fontsize=12)
plt.ylabel('Features', fontsize=12)
plt.title('ANOVA F-Scores per Feature (SelectKBest)', fontsize=14, fontweight='bold')
plt.gca().invert_yaxis()
plt.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.show()

print("\nGreen bars indicate selected features for final models.")
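Here `k = 5` is fixed by hand. An alternative is to treat `k` as a hyperparameter and let cross-validation choose it, with the selector placed inside a `Pipeline` so that feature scores are computed only on the training part of each fold. The sketch below is one possible setup on synthetic data; the step names (`scale`, `select`, `clf`), the candidate values of `k`, and the variables `X_demo`, `y_demo` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 7 features, of which 5 are informative (assumption).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=7, n_informative=5, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=42
)

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('select', SelectKBest(score_func=f_classif, k=5)),
    ('clf', LogisticRegression(max_iter=1000)),
])

# Search over the number of retained features; 7 means "keep everything".
k_search = GridSearchCV(pipe, {'select__k': [3, 5, 7]}, cv=5, scoring='accuracy')
k_search.fit(X_demo, y_demo)

best_k = k_search.best_params_['select__k']
kept_mask = k_search.best_estimator_.named_steps['select'].get_support()

print("Best k:", best_k)
print("Kept feature indices:", kept_mask.nonzero()[0].tolist())
print(f"Best CV accuracy: {k_search.best_score_:.4f}")
```

Including `k = 7` in the grid gives a direct check of whether dropping features helps or hurts under cross-validation.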

## 9. Task 5: Final Models with Optimal Hyperparameters and Selected Features

### 9.1 Final Random Forest Model

In [None]:
# Build final Random Forest model with best parameters and selected features
# Note: the hyperparameters were tuned on all 7 features (Section 7.1), not on the 5 selected ones
final_rf = RandomForestClassifier(**rf_grid_search.best_params_, random_state=42)

print("Training Final Random Forest Model...")
print("="*50)
print("Features used:", selected_feature_names)
print("Number of features:", len(selected_feature_names))
print("\nHyperparameters:")
for param, value in rf_grid_search.best_params_.items():
    print(f"  {param}: {value}")

final_rf.fit(X_train_selected, y_train)
print("\n✓ Final Random Forest model trained!")

In [None]:
# Evaluate final Random Forest model
y_test_pred_rf_final = final_rf.predict(X_test_selected)

# Get cross-validation score
rf_cv_score = cross_val_score(final_rf, X_train_selected, y_train, cv=5, scoring='accuracy').mean()

rf_final_accuracy = accuracy_score(y_test, y_test_pred_rf_final)
rf_final_precision = precision_score(y_test, y_test_pred_rf_final, average='weighted')
rf_final_recall = recall_score(y_test, y_test_pred_rf_final, average='weighted')
rf_final_f1 = f1_score(y_test, y_test_pred_rf_final, average='weighted')

print("Final Random Forest - Test Set Performance:")
print("="*50)
print(f"CV Score (5-fold, training set): {rf_cv_score:.4f}")
print(f"Accuracy: {rf_final_accuracy:.4f}")
print(f"Precision: {rf_final_precision:.4f}")
print(f"Recall: {rf_final_recall:.4f}")
print(f"F1-Score: {rf_final_f1:.4f}")

### 9.2 Final Logistic Regression Model

In [None]:
# Build final Logistic Regression model with best parameters and selected features
# Note: the hyperparameters were tuned on all 7 features (Section 7.2), not on the 5 selected ones
final_lr = LogisticRegression(**lr_grid_search.best_params_, max_iter=1000, random_state=42)

print("Training Final Logistic Regression Model...")
print("="*50)
print("Features used:", selected_feature_names)
print("Number of features:", len(selected_feature_names))
print("\nHyperparameters:")
for param, value in lr_grid_search.best_params_.items():
    print(f"  {param}: {value}")

final_lr.fit(X_train_selected, y_train)
print("\n✓ Final Logistic Regression model trained!")

In [None]:
# Evaluate final Logistic Regression model
y_test_pred_lr_final = final_lr.predict(X_test_selected)

# Get cross-validation score
lr_cv_score = cross_val_score(final_lr, X_train_selected, y_train, cv=5, scoring='accuracy').mean()

lr_final_accuracy = accuracy_score(y_test, y_test_pred_lr_final)
lr_final_precision = precision_score(y_test, y_test_pred_lr_final, average='weighted')
lr_final_recall = recall_score(y_test, y_test_pred_lr_final, average='weighted')
lr_final_f1 = f1_score(y_test, y_test_pred_lr_final, average='weighted')

print("Final Logistic Regression - Test Set Performance:")
print("="*50)
print(f"CV Score (5-fold, training set): {lr_cv_score:.4f}")
print(f"Accuracy: {lr_final_accuracy:.4f}")
print(f"Precision: {lr_final_precision:.4f}")
print(f"Recall: {lr_final_recall:.4f}")
print(f"F1-Score: {lr_final_f1:.4f}")

## 10. Task 6: Final Model Comparison

Comparison of all models including Neural Network and optimized classical models.

In [None]:
# Create comprehensive comparison table
comparison_data = {
    'Model': [
        'Neural Network (MLP)',
        'Random Forest (Optimized)',
        'Logistic Regression (Optimized)'
    ],
    'Features Used': [
        f'All ({len(selected_features)})',
        f'Selected ({len(selected_feature_names)})',
        f'Selected ({len(selected_feature_names)})'
    ],
    'CV Score': [
        'N/A',
        f'{rf_cv_score:.4f}',
        f'{lr_cv_score:.4f}'
    ],
    'Accuracy': [
        f'{accuracy_score(y_test, y_test_pred_nn):.4f}',
        f'{rf_final_accuracy:.4f}',
        f'{lr_final_accuracy:.4f}'
    ],
    'Precision': [
        f'{precision_score(y_test, y_test_pred_nn, average="weighted"):.4f}',
        f'{rf_final_precision:.4f}',
        f'{lr_final_precision:.4f}'
    ],
    'Recall': [
        f'{recall_score(y_test, y_test_pred_nn, average="weighted"):.4f}',
        f'{rf_final_recall:.4f}',
        f'{lr_final_recall:.4f}'
    ],
    'F1-Score': [
        f'{f1_score(y_test, y_test_pred_nn, average="weighted"):.4f}',
        f'{rf_final_f1:.4f}',
        f'{lr_final_f1:.4f}'
    ]
}

comparison_df = pd.DataFrame(comparison_data)

print("\n" + "="*80)
print("FINAL MODEL COMPARISON TABLE")
print("="*80)
print(comparison_df.to_string(index=False))
print("="*80)

In [None]:
# Visualize model comparison
metrics = ['Accuracy', 'Precision', 'Recall', 'F1-Score']
nn_scores = [
    accuracy_score(y_test, y_test_pred_nn),
    precision_score(y_test, y_test_pred_nn, average='weighted'),
    recall_score(y_test, y_test_pred_nn, average='weighted'),
    f1_score(y_test, y_test_pred_nn, average='weighted')
]
rf_scores = [rf_final_accuracy, rf_final_precision, rf_final_recall, rf_final_f1]
lr_scores = [lr_final_accuracy, lr_final_precision, lr_final_recall, lr_final_f1]

x = np.arange(len(metrics))
width = 0.25

fig, ax = plt.subplots(figsize=(12, 6))
bars1 = ax.bar(x - width, nn_scores, width, label='Neural Network', color='steelblue')
bars2 = ax.bar(x, rf_scores, width, label='Random Forest', color='forestgreen')
bars3 = ax.bar(x + width, lr_scores, width, label='Logistic Regression', color='coral')

ax.set_xlabel('Metrics', fontsize=12, fontweight='bold')
ax.set_ylabel('Score', fontsize=12, fontweight='bold')
ax.set_title('Final Model Performance Comparison', fontsize=14, fontweight='bold', pad=20)
ax.set_xticks(x)
ax.set_xticklabels(metrics)
ax.legend(fontsize=10)
ax.grid(axis='y', alpha=0.3)
ax.set_ylim([0, 1.1])

# Add value labels on bars
for bars in [bars1, bars2, bars3]:
    for bar in bars:
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2., height,
                f'{height:.3f}', ha='center', va='bottom', fontsize=8)

plt.tight_layout()
plt.show()

## 11. Conclusion and Reflection

### Model Performance Summary

All three models (Neural Network, Random Forest, and Logistic Regression) achieved high accuracy, precision, recall, and F1-score on the held-out test set when predicting crop type from soil and environmental features. The exact values are listed in the comparison table in Section 10.

### Impact of Methods

**Cross-Validation:** GridSearchCV with 5-fold cross-validation identified the best-scoring hyperparameters within the searched grids for both Random Forest and Logistic Regression, which can improve their generalization performance.

**Feature Selection:** Using SelectKBest, we reduced the feature space from 7 to 5 features, which:
- Simplified the models
- Reduced computational complexity
- Maintained or improved model performance
- Identified the soil and environmental factors with the strongest univariate relationship to crop type

### Key Insights

1. **Soil and environmental features** (soil: Nitrogen, Phosphorus, Potassium, pH; environment: Temperature, Humidity, Rainfall) are strong predictors of suitable crop types
2. **Feature selection** successfully identified the most discriminative features while reducing dimensionality
3. Both **classical ML models** and **Neural Networks** performed well on this classification task
4. The models can help farmers make **data-driven decisions** about crop selection, supporting **SDG 2: Zero Hunger**

### Future Directions

1. Experiment with **ensemble methods** combining multiple models
2. Collect **more diverse data** across different geographical regions
3. Incorporate **additional features** like soil texture, elevation, and climate patterns
4. Deploy the model as a **web application** for real-world agricultural use
5. Investigate **model interpretability** to understand feature contributions better
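As a starting point for the interpretability direction, permutation importance measures how much a fitted model's test score drops when one feature's values are shuffled. The sketch below is a minimal illustration on synthetic data, not a result for the crop dataset; `X_demo`, `y_demo`, and `forest` are hypothetical names.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the crop data (assumption).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=7, n_informative=5, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Shuffle each feature 10 times on the test set and record the accuracy drop.
perm = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=42)
ranking = perm.importances_mean.argsort()[::-1]

for idx in ranking:
    print(f"Feature {idx}: {perm.importances_mean[idx]:.4f} "
          f"+/- {perm.importances_std[idx]:.4f}")
```

Unlike the univariate F-scores from SelectKBest, this measure depends on the fitted model and accounts for interactions between features.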

In [None]:
print("\n" + "="*80)
print("CLASSIFICATION TASK COMPLETED SUCCESSFULLY!")
print("="*80)
print("\nAll required tasks have been completed:")
print("✓ Exploratory Data Analysis")
print("✓ Task 1: Neural Network Model (MLPClassifier)")
print("✓ Task 2: Two Classical ML Models (Random Forest & Logistic Regression)")
print("✓ Task 3: Hyperparameter Optimization with Cross-Validation")
print("✓ Task 4: Feature Selection (SelectKBest)")
print("✓ Task 5: Final Models with Optimal Hyperparameters and Selected Features")
print("✓ Task 6: Final Model Comparison")
print("✓ Conclusion and Reflection")
print("\n" + "="*80)