# Crop Recommendation Machine Learning Model

This notebook builds classification models that predict crop recommendations based on categorical soil and environmental inputs.

**Dataset:** Merged_Crop_Recommendation.csv (122 crops, 12,200 samples)
**Models:** Neural Network (MLPClassifier) and Logistic Regression
**Goal:** Accept user-friendly categorical inputs and predict crop labels

## 1. Import Required Libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, f1_score, recall_score, classification_report
import joblib
import warnings
warnings.filterwarnings('ignore')

print("Libraries imported successfully!")

Libraries imported successfully!


## 2. Define Categorization Functions

Convert numerical features to categorical ranges for user-friendly inputs.

In [2]:
def categorize_nitrogen(n):
    """Categorize Nitrogen levels"""
    if n <= 20:
        return 'Very Low'
    elif n <= 40:
        return 'Low'
    elif n <= 80:
        return 'Medium'
    elif n <= 120:
        return 'High'
    else:
        return 'Very High'

def categorize_phosphorous(p):
    """Categorize Phosphorous levels"""
    if p <= 25:
        return 'Very Low'
    elif p <= 50:
        return 'Low'
    elif p <= 75:
        return 'Medium'
    elif p <= 100:
        return 'High'
    else:
        return 'Very High'

def categorize_potassium(k):
    """Categorize Potassium levels"""
    if k <= 20:
        return 'Very Low'
    elif k <= 35:
        return 'Low'
    elif k <= 60:
        return 'Medium'
    elif k <= 100:
        return 'High'
    else:
        return 'Very High'

def categorize_temperature(temp):
    """Categorize Temperature levels"""
    if temp <= 18:
        return 'Cool'
    elif temp <= 25:
        return 'Mild'
    elif temp <= 32:
        return 'Warm'
    else:
        return 'Hot'

def categorize_humidity(humidity):
    """Categorize Humidity levels"""
    if humidity <= 40:
        return 'Dry'
    elif humidity <= 70:
        return 'Moderate'
    elif humidity <= 90:
        return 'Humid'
    else:
        return 'Very Humid'

def categorize_ph(ph):
    """Categorize pH levels"""
    if ph <= 6.0:
        return 'Acidic'
    elif ph <= 7.0:
        return 'Neutral'
    else:
        return 'Alkaline'

def categorize_rainfall(rainfall):
    """Categorize Rainfall levels"""
    if rainfall <= 60:
        return 'Low'
    elif rainfall <= 120:
        return 'Medium'
    elif rainfall <= 200:
        return 'High'
    else:
        return 'Very High'

print("Categorization functions defined successfully!")

Categorization functions defined successfully!
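
The seven functions above share one pattern: compare a value against ascending inclusive upper bounds and return the matching label. A minimal table-driven sketch of that pattern follows, shown for nitrogen with the thresholds from `categorize_nitrogen`. The names `NITROGEN_BOUNDS`, `NITROGEN_LABELS`, and `categorize` are hypothetical and not part of the notebook.

```python
from bisect import bisect_left

# Inclusive upper bounds copied from categorize_nitrogen above.
NITROGEN_BOUNDS = [20, 40, 80, 120]
NITROGEN_LABELS = ['Very Low', 'Low', 'Medium', 'High', 'Very High']

def categorize(value, bounds, labels):
    """Return the label of the first range whose inclusive upper bound is >= value."""
    # bisect_left counts the bounds strictly below value, which equals the
    # index of the first bound that satisfies value <= bound.
    return labels[bisect_left(bounds, value)]

print(categorize(20, NITROGEN_BOUNDS, NITROGEN_LABELS))
print(categorize(90.0, NITROGEN_BOUNDS, NITROGEN_LABELS))
```

The other six features would only need their own bounds and labels lists, which keeps the thresholds in one place if they are later tuned.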


## 3. Load and Explore Dataset

In [3]:
# Load dataset
file_path = r"C:\Users\chan-shinan\Documents\icbt\final project\kamil\dataset\Merged_Crop_Recommendation.csv"
df = pd.read_csv(file_path)

print("Dataset loaded successfully!")
print(f"Dataset shape: {df.shape}")
print(f"Columns: {df.columns.tolist()}")
print(f"\nUnique crops: {df['label'].nunique()}")
print(f"Crop types: {sorted(df['label'].unique())}")

Dataset loaded successfully!
Dataset shape: (12200, 8)
Columns: ['N', 'P', 'K', 'temperature', 'humidity', 'ph', 'rainfall', 'label']

Unique crops: 122
Crop types: ['almond', 'amaranth', 'apple', 'apricot', 'artichoke', 'asparagus', 'avocado', 'bamboo', 'banana', 'barley', 'basil', 'beetroot', 'betel', 'bilberry', 'blackberry', 'blackgram', 'blueberry', 'breadfruit', 'broccoli', 'buckwheat', 'cabbage', 'carambola', 'carrot', 'cashew', 'cassava', 'cauliflower', 'celery', 'chard', 'cherry', 'chia', 'chickpea', 'clementine', 'coconut', 'coffee', 'cotton', 'cranberry', 'cucumber', 'currant', 'date', 'dragonfruit', 'durian', 'eggplant', 'fig', 'garlic', 'ginger', 'gooseberry', 'grapes', 'guava', 'hazelnut', 'hemp', 'jackfruit', 'jambul', 'jute', 'kidneybeans', 'kiwi', 'leek', 'lemongrass', 'lentil', 'lettuce', 'longan', 'lychee', 'macadamia', 'maize', 'mandarin', 'mango', 'mangosteen', 'melon', 'millet', 'mint', 'mothbeans', 'mulberry', 'mungbean', 'muskmelon', 'nectarine', 'oats', 'okra',

In [4]:
# Display basic statistics
print("Dataset Info:")
df.info()
print("\nFirst 5 rows:")
df.head()

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12200 entries, 0 to 12199
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   N            12200 non-null  float64
 1   P            12200 non-null  float64
 2   K            12200 non-null  float64
 3   temperature  12200 non-null  float64
 4   humidity     12200 non-null  float64
 5   ph           12200 non-null  float64
 6   rainfall     12200 non-null  float64
 7   label        12200 non-null  object 
dtypes: float64(7), object(1)
memory usage: 762.6+ KB

First 5 rows:


| | N | P | K | temperature | humidity | ph | rainfall | label |
|---|---|---|---|---|---|---|---|---|
| 0 | 90.0 | 42.0 | 43.0 | 20.879744 | 82.002744 | 6.502985 | 202.935536 | rice |
| 1 | 85.0 | 58.0 | 41.0 | 21.770462 | 80.319644 | 7.038096 | 226.655537 | rice |
| 2 | 60.0 | 55.0 | 44.0 | 23.004459 | 82.320763 | 7.840207 | 263.964248 | rice |
| 3 | 74.0 | 35.0 | 40.0 | 26.491096 | 80.158363 | 6.980401 | 242.864034 | rice |
| 4 | 78.0 | 42.0 | 42.0 | 20.130175 | 81.604873 | 7.628473 | 262.71734 | rice |


In [5]:
# Display statistical summary
print("Statistical Summary:")
df.describe()

Statistical Summary:


| | N | P | K | temperature | humidity | ph | rainfall |
|---|---|---|---|---|---|---|---|
| count | 12200.0 | 12200.0 | 12200.0 | 12200.0 | 12200.0 | 12200.0 | 12200.0 |
| mean | 52.035687 | 54.132384 | 53.157218 | 25.608621 | 70.820496 | 6.467251 | 104.037551 |
| std | 34.285868 | 31.161659 | 44.253161 | 5.042113 | 20.718325 | 0.770112 | 52.405819 |
| min | 0.0 | 5.0 | 5.0 | 8.82 | 14.25804 | 3.5 | 20.21 |
| 25% | 25.0 | 30.37149 | 17.388519 | 22.321391 | 56.927912 | 5.950637 | 65.158502 |
| 50% | 49.0 | 52.795723 | 44.0007 | 25.598806 | 72.966506 | 6.454414 | 100.903906 |
| 75% | 77.468044 | 74.159922 | 78.737746 | 28.883787 | 87.751087 | 6.968015 | 138.859383 |
| max | 140.0 | 145.0 | 205.0 | 43.675493 | 99.981876 | 9.935091 | 298.560117 |


## 4. Data Preprocessing - Convert to Categorical Features

In [6]:
# Apply categorization functions to create categorical features
print("Converting numerical features to categorical ranges...")

df['N_cat'] = df['N'].apply(categorize_nitrogen)
df['P_cat'] = df['P'].apply(categorize_phosphorous)
df['K_cat'] = df['K'].apply(categorize_potassium)
df['temperature_cat'] = df['temperature'].apply(categorize_temperature)
df['humidity_cat'] = df['humidity'].apply(categorize_humidity)
df['ph_cat'] = df['ph'].apply(categorize_ph)
df['rainfall_cat'] = df['rainfall'].apply(categorize_rainfall)

print("Categorical features created successfully!")

# Display sample of categorical features
categorical_cols = ['N_cat', 'P_cat', 'K_cat', 'temperature_cat', 'humidity_cat', 'ph_cat', 'rainfall_cat']
print("\nSample of categorical features:")
df[categorical_cols + ['label']].head(10)

Converting numerical features to categorical ranges...
Categorical features created successfully!

Sample of categorical features:


| | N_cat | P_cat | K_cat | temperature_cat | humidity_cat | ph_cat | rainfall_cat | label |
|---|---|---|---|---|---|---|---|---|
| 0 | High | Low | Medium | Mild | Humid | Neutral | Very High | rice |
| 1 | High | Medium | Medium | Mild | Humid | Alkaline | Very High | rice |
| 2 | Medium | Medium | Medium | Mild | Humid | Alkaline | Very High | rice |
| 3 | Medium | Low | Medium | Warm | Humid | Neutral | Very High | rice |
| 4 | Medium | Low | Medium | Mild | Humid | Alkaline | Very High | rice |
| 5 | Medium | Low | Medium | Mild | Humid | Alkaline | Very High | rice |
| 6 | Medium | Medium | Medium | Mild | Humid | Acidic | Very High | rice |
| 7 | High | Medium | Medium | Mild | Humid | Acidic | Very High | rice |
| 8 | High | Medium | Medium | Mild | Humid | Neutral | Very High | rice |
| 9 | Medium | Medium | Medium | Mild | Humid | Neutral | Very High | rice |


In [7]:
# Check distribution of categorical features
print("Distribution of Categorical Features:")
for col in categorical_cols:
    print(f"\n{col}:")
    print(df[col].value_counts().sort_index())

Distribution of Categorical Features:

N_cat:
N_cat
High         2474
Low          2581
Medium       4334
Very High     329
Very Low     2482
Name: count, dtype: int64

P_cat:
P_cat
High         1959
Low          3287
Medium       3530
Very High     978
Very Low     2446
Name: count, dtype: int64

K_cat:
K_cat
High         2709
Low          1706
Medium       2584
Very High    1760
Very Low     3441
Name: count, dtype: int64

temperature_cat:
temperature_cat
Cool     799
Hot     1215
Mild    4672
Warm    5514
Name: count, dtype: int64

humidity_cat:
humidity_cat
Dry           1003
Humid         4006
Moderate      4566
Very Humid    2625
Name: count, dtype: int64

ph_cat:
ph_cat
Acidic      3279
Alkaline    2888
Neutral     6033
Name: count, dtype: int64

rainfall_cat:
rainfall_cat
High         3830
Low          2689
Medium       5176
Very High     505
Name: count, dtype: int64


## 5. Encode Categorical Features for Machine Learning

In [8]:
# Encode categorical features using Label Encoder
print("Encoding categorical features for ML models...")

categorical_columns = ['N_cat', 'P_cat', 'K_cat', 'temperature_cat', 
                      'humidity_cat', 'ph_cat', 'rainfall_cat']

encoders = {}
encoded_df = df.copy()

for col in categorical_columns:
    le = LabelEncoder()
    encoded_df[col + '_encoded'] = le.fit_transform(df[col])
    encoders[col] = le
    print(f"{col} encoded: {dict(zip(le.classes_, le.transform(le.classes_)))}")

# Prepare features and target
feature_columns = [col + '_encoded' for col in categorical_columns]
X = encoded_df[feature_columns]
y = encoded_df['label']

print(f"\nFeatures shape: {X.shape}")
print(f"Target shape: {y.shape}")
print(f"Feature columns: {feature_columns}")

Encoding categorical features for ML models...
N_cat encoded: {'High': np.int64(0), 'Low': np.int64(1), 'Medium': np.int64(2), 'Very High': np.int64(3), 'Very Low': np.int64(4)}
P_cat encoded: {'High': np.int64(0), 'Low': np.int64(1), 'Medium': np.int64(2), 'Very High': np.int64(3), 'Very Low': np.int64(4)}
K_cat encoded: {'High': np.int64(0), 'Low': np.int64(1), 'Medium': np.int64(2), 'Very High': np.int64(3), 'Very Low': np.int64(4)}
temperature_cat encoded: {'Cool': np.int64(0), 'Hot': np.int64(1), 'Mild': np.int64(2), 'Warm': np.int64(3)}
humidity_cat encoded: {'Dry': np.int64(0), 'Humid': np.int64(1), 'Moderate': np.int64(2), 'Very Humid': np.int64(3)}
ph_cat encoded: {'Acidic': np.int64(0), 'Alkaline': np.int64(1), 'Neutral': np.int64(2)}
rainfall_cat encoded: {'High': np.int64(0), 'Low': np.int64(1), 'Medium': np.int64(2), 'Very High': np.int64(3)}

Features shape: (12200, 7)
Target shape: (12200,)
Feature columns: ['N_cat_encoded', 'P_cat_encoded', 'K_cat_encoded', 'temperature
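
`LabelEncoder` sorts class names alphabetically, which is why the output above maps 'High' to 0 and 'Very Low' to 4. The integer codes therefore do not follow the natural low-to-high order of the levels. One possible alternative, sketched here under the assumption that an order-preserving code is wanted, is an explicit mapping. `LEVEL_ORDER`, `alphabetical`, and `ordinal_code` are illustrative names, not part of the notebook.

```python
# Natural low-to-high order for the five-level nutrient features (assumed here).
LEVEL_ORDER = ['Very Low', 'Low', 'Medium', 'High', 'Very High']

# Alphabetical codes, reproducing the N_cat mapping printed above.
alphabetical = {level: i for i, level in enumerate(sorted(LEVEL_ORDER))}

# Order-preserving codes: a larger integer always means a higher level.
ordinal_code = {level: i for i, level in enumerate(LEVEL_ORDER)}

print(alphabetical)
print(ordinal_code)
```

With pandas, such a mapping could be applied as `df['N_cat'].map(ordinal_code)`. scikit-learn's `OrdinalEncoder` also accepts an explicit `categories` argument for the same purpose. Whether this changes the scores reported later is not tested here.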

## 6. Split Data into Training and Testing Sets

In [9]:
# Split data (80% train, 20% test)
print("Splitting data into training and testing sets...")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Training set size: {len(X_train)} samples")
print(f"Test set size: {len(X_test)} samples")
print(f"Training set percentage: {len(X_train)/len(X)*100:.1f}%")
print(f"Test set percentage: {len(X_test)/len(X)*100:.1f}%")

# Check class distribution in splits
print("\nClass distribution in training set:")
print(y_train.value_counts().sort_index())
print("\nClass distribution in test set:")
print(y_test.value_counts().sort_index())

Splitting data into training and testing sets...
Training set size: 9760 samples
Test set size: 2440 samples
Training set percentage: 80.0%
Test set percentage: 20.0%

Class distribution in training set:
label
almond         65
amaranth       66
apple         143
apricot        61
artichoke      78
             ... 
walnut         62
watermelon    148
wheat          64
yam            75
zucchini       74
Name: count, Length: 122, dtype: int64

Class distribution in test set:
label
almond        16
amaranth      17
apple         36
apricot       15
artichoke     19
              ..
walnut        16
watermelon    37
wheat         16
yam           19
zucchini      18
Name: count, Length: 122, dtype: int64
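
Because `stratify=y` is passed, each crop should contribute roughly 20% of its samples to the test set. A quick check on the counts visible in the output above, limited to the ten classes that are printed since the rest are elided:

```python
# (train, test) counts copied from the truncated output above.
counts = {
    'almond': (65, 16), 'amaranth': (66, 17), 'apple': (143, 36),
    'apricot': (61, 15), 'artichoke': (78, 19), 'walnut': (62, 16),
    'watermelon': (148, 37), 'wheat': (64, 16), 'yam': (75, 19),
    'zucchini': (74, 18),
}

# Fraction of each crop's samples that landed in the test set.
test_share = {crop: test / (train + test) for crop, (train, test) in counts.items()}
for crop, share in test_share.items():
    print(f"{crop:<12}{share:.3f}")
```

The same counts also show that the classes are not equally sized (for example, apple has more than twice as many samples as apricot), which is one reason the notebook reports macro-averaged metrics alongside accuracy.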


## 7. Train Machine Learning Models

### 7.1 Neural Network (MLPClassifier)

In [10]:
# Train Neural Network model
print("Training Neural Network (MLPClassifier)...")

mlp = MLPClassifier(
    hidden_layer_sizes=(100, 50),
    max_iter=1000,
    random_state=42,
    learning_rate_init=0.001
)

mlp.fit(X_train, y_train)
mlp_pred = mlp.predict(X_test)

print("Neural Network training completed!")
print(f"Number of iterations: {mlp.n_iter_}")
print(f"Training score: {mlp.score(X_train, y_train):.4f}")
print(f"Test score: {mlp.score(X_test, y_test):.4f}")

Training Neural Network (MLPClassifier)...
Neural Network training completed!
Number of iterations: 588
Training score: 0.2405
Test score: 0.1594


### 7.2 Logistic Regression

In [11]:
# Train Logistic Regression model
print("Training Logistic Regression...")

lr = LogisticRegression(
    max_iter=1000,
    random_state=42,
    multi_class='ovr'
)

lr.fit(X_train, y_train)
lr_pred = lr.predict(X_test)

print("Logistic Regression training completed!")
print(f"Training score: {lr.score(X_train, y_train):.4f}")
print(f"Test score: {lr.score(X_test, y_test):.4f}")

Training Logistic Regression...
Logistic Regression training completed!
Training score: 0.1159
Test score: 0.1070
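
Both models above receive the integer codes as if they were numeric magnitudes. For a linear model such as logistic regression, one common alternative is one-hot encoding, where each level gets its own 0/1 column. The sketch below illustrates the idea in plain Python; it is not the notebook's method, and `one_hot` and `PH_LEVELS` are hypothetical names.

```python
def one_hot(value, levels):
    """Return a 0/1 list with a single 1 at the position of value in levels."""
    if value not in levels:
        raise ValueError(f"Unknown level: {value!r}")
    return [1 if level == value else 0 for level in levels]

# Levels taken from categorize_ph.
PH_LEVELS = ['Acidic', 'Neutral', 'Alkaline']
print(one_hot('Neutral', PH_LEVELS))
```

On the full DataFrame, `pd.get_dummies(df[categorical_columns])` or scikit-learn's `OneHotEncoder` would produce the equivalent columns. Whether this improves the scores above has not been measured here.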


## 8. Model Evaluation and Comparison

In [12]:
# Calculate evaluation metrics for both models
print("=" * 60)
print("MODEL PERFORMANCE COMPARISON")
print("=" * 60)

# Neural Network metrics
mlp_accuracy = accuracy_score(y_test, mlp_pred)
mlp_f1 = f1_score(y_test, mlp_pred, average='macro')
mlp_recall = recall_score(y_test, mlp_pred, average='macro')

# Logistic Regression metrics
lr_accuracy = accuracy_score(y_test, lr_pred)
lr_f1 = f1_score(y_test, lr_pred, average='macro')
lr_recall = recall_score(y_test, lr_pred, average='macro')

# Store results
models_results = {
    'Neural Network': {
        'accuracy': mlp_accuracy,
        'f1_score': mlp_f1,
        'recall': mlp_recall,
        'predictions': mlp_pred
    },
    'Logistic Regression': {
        'accuracy': lr_accuracy,
        'f1_score': lr_f1,
        'recall': lr_recall,
        'predictions': lr_pred
    }
}

# Display results
print("\nNeural Network (MLPClassifier):")
print(f"  Accuracy: {mlp_accuracy:.4f}")
print(f"  F1-Score (macro): {mlp_f1:.4f}")
print(f"  Recall (macro): {mlp_recall:.4f}")

print("\nLogistic Regression:")
print(f"  Accuracy: {lr_accuracy:.4f}")
print(f"  F1-Score (macro): {lr_f1:.4f}")
print(f"  Recall (macro): {lr_recall:.4f}")

# Determine best model based on F1-score
best_model_name = 'Neural Network' if mlp_f1 > lr_f1 else 'Logistic Regression'
best_score = max(mlp_f1, lr_f1)

print(f"\n{'='*60}")
print(f"BEST PERFORMING MODEL: {best_model_name}")
print(f"F1-Score: {best_score:.4f}")
print(f"{'='*60}")

MODEL PERFORMANCE COMPARISON

Neural Network (MLPClassifier):
  Accuracy: 0.1594
  F1-Score (macro): 0.0707
  Recall (macro): 0.0906

Logistic Regression:
  Accuracy: 0.1070
  F1-Score (macro): 0.0293
  Recall (macro): 0.0586

BEST PERFORMING MODEL: Neural Network
F1-Score: 0.0707
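
An accuracy of 0.16 is well above the chance level of 1/122 (about 0.008) for 122 classes, but still low. One thing worth checking is how much information survives the binning: if several crops fall into the same combination of categories, no classifier using those features can separate them. The sketch below estimates that ceiling by counting, for each distinct feature combination, only its most frequent label as correct. `majority_ceiling` is a hypothetical helper and the toy data is invented for illustration.

```python
from collections import Counter, defaultdict

def majority_ceiling(feature_rows, labels):
    """Best accuracy reachable on this data when rows with identical
    features must receive the same prediction."""
    by_combo = defaultdict(Counter)
    for row, label in zip(feature_rows, labels):
        by_combo[tuple(row)][label] += 1
    # Within each feature combination, only the most common label can be right.
    correct = sum(counter.most_common(1)[0][1] for counter in by_combo.values())
    return correct / len(labels)

# Toy data (invented): two crops share the combination ('High', 'Warm').
rows = [('High', 'Warm'), ('High', 'Warm'), ('High', 'Warm'), ('Low', 'Cool')]
crops = ['rice', 'rice', 'jute', 'lentil']
print(majority_ceiling(rows, crops))
```

On the notebook's data this could be called as `majority_ceiling(df[categorical_cols].values.tolist(), df['label'])`. The result is an optimistic bound because it is computed on the same rows it summarizes, but a low value would point to the binning, rather than the models, as the main limit.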


## 9. Detailed Classification Reports

In [13]:
# Neural Network Classification Report
print("NEURAL NETWORK - DETAILED CLASSIFICATION REPORT")
print("=" * 55)
print(classification_report(y_test, mlp_pred))

NEURAL NETWORK - DETAILED CLASSIFICATION REPORT
              precision    recall  f1-score   support

      almond       0.12      0.06      0.08        16
    amaranth       0.06      0.06      0.06        17
       apple       0.32      0.50      0.39        36
     apricot       0.00      0.00      0.00        15
   artichoke       0.00      0.00      0.00        19
   asparagus       0.00      0.00      0.00        16
     avocado       0.00      0.00      0.00        17
      bamboo       0.00      0.00      0.00        20
      banana       0.39      0.53      0.45        38
      barley       0.00      0.00      0.00        17
       basil       0.00      0.00      0.00        19
    beetroot       0.00      0.00      0.00        15
       betel       0.00      0.00      0.00        17
    bilberry       0.00      0.00      0.00        17
  blackberry       0.00      0.00      0.00        15
   blackgram       0.22      0.49      0.31        35
   blueberry       0.00      0.00

In [14]:
# Logistic Regression Classification Report
print("LOGISTIC REGRESSION - DETAILED CLASSIFICATION REPORT")
print("=" * 55)
print(classification_report(y_test, lr_pred))

LOGISTIC REGRESSION - DETAILED CLASSIFICATION REPORT
              precision    recall  f1-score   support

      almond       0.00      0.00      0.00        16
    amaranth       0.00      0.00      0.00        17
       apple       0.14      0.31      0.19        36
     apricot       0.00      0.00      0.00        15
   artichoke       0.00      0.00      0.00        19
   asparagus       0.00      0.00      0.00        16
     avocado       0.00      0.00      0.00        17
      bamboo       0.00      0.00      0.00        20
      banana       0.10      0.42      0.16        38
      barley       0.00      0.00      0.00        17
       basil       0.00      0.00      0.00        19
    beetroot       0.00      0.00      0.00        15
       betel       0.00      0.00      0.00        17
    bilberry       0.00      0.00      0.00        17
  blackberry       0.00      0.00      0.00        15
   blackgram       0.10      0.09      0.09        35
   blueberry       0.00     

## 10. Save Best Performing Model

In [15]:
# Save the best performing model
print(f"Saving best model: {best_model_name}")

best_model = mlp if best_model_name == 'Neural Network' else lr

# Create model package with encoders
model_package = {
    'model': best_model,
    'encoders': encoders,
    'model_type': best_model_name,
    'feature_columns': feature_columns,
    'categorical_columns': categorical_columns
}

# Save model
joblib.dump(model_package, 'crop_recommendation_model_new.pkl')
print("Model saved as 'crop_recommendation_model_new.pkl'")

print(f"\nModel package contains:")
print(f"- Trained {best_model_name} model")
print(f"- Feature encoders for categorical inputs")
print(f"- Model metadata")

Saving best model: Neural Network
Model saved as 'crop_recommendation_model_new.pkl'

Model package contains:
- Trained Neural Network model
- Feature encoders for categorical inputs
- Model metadata


## 11. Generate Performance Report File

In [16]:
# Generate detailed performance report
print("Generating detailed performance report...")

report_content = []
report_content.append("CROP RECOMMENDATION MODEL PERFORMANCE REPORT")
report_content.append("=" * 55)
report_content.append("")
report_content.append("Dataset: Merged_Crop_Recommendation.csv")
report_content.append(f"Total samples: {len(df)}")
report_content.append(f"Number of crops: {df['label'].nunique()}")
report_content.append(f"Train/Test split: 80%/20%")
report_content.append(f"Training samples: {len(X_train)}")
report_content.append(f"Test samples: {len(X_test)}")
report_content.append("")

for model_name, results in models_results.items():
    report_content.append(f"{model_name.upper()} RESULTS:")
    report_content.append("-" * 40)
    report_content.append(f"Accuracy: {results['accuracy']:.4f}")
    report_content.append(f"F1-Score (macro): {results['f1_score']:.4f}")
    report_content.append(f"Recall (macro): {results['recall']:.4f}")
    report_content.append("")
    report_content.append("Classification Report:")
    report_content.append(classification_report(y_test, results['predictions']))
    report_content.append("\n" + "="*55 + "\n")

report_content.append(f"BEST PERFORMING MODEL: {best_model_name}")
report_content.append(f"Best F1-Score: {best_score:.4f}")

# Save report to file
with open('model_performance_report.txt', 'w') as f:
    f.write('\n'.join(report_content))

print("Performance report saved as 'model_performance_report.txt'")

Generating detailed performance report...
Performance report saved as 'model_performance_report.txt'


## 12. Create User-Friendly Prediction Function

In [17]:
def predict_crop(N_level, P_level, K_level, temp_level, humidity_level, ph_level, rainfall_level):
    """
    Predict crop based on categorical input levels
    
    Parameters:
    - N_level: 'Very Low', 'Low', 'Medium', 'High', 'Very High'
    - P_level: 'Very Low', 'Low', 'Medium', 'High', 'Very High'
    - K_level: 'Very Low', 'Low', 'Medium', 'High', 'Very High'
    - temp_level: 'Cool', 'Mild', 'Warm', 'Hot'
    - humidity_level: 'Dry', 'Moderate', 'Humid', 'Very Humid'
    - ph_level: 'Acidic', 'Neutral', 'Alkaline'
    - rainfall_level: 'Low', 'Medium', 'High', 'Very High'
    
    Returns:
    - Predicted crop name and confidence score
    """
    try:
        # Load saved model
        model_package = joblib.load('crop_recommendation_model_new.pkl')
        model = model_package['model']
        encoders = model_package['encoders']
        
        # Create input dataframe
        input_data = pd.DataFrame({
            'N_cat': [N_level],
            'P_cat': [P_level],
            'K_cat': [K_level],
            'temperature_cat': [temp_level],
            'humidity_cat': [humidity_level],
            'ph_cat': [ph_level],
            'rainfall_cat': [rainfall_level]
        })
        
        # Encode categorical features
        for col in input_data.columns:
            input_data[col + '_encoded'] = encoders[col].transform(input_data[col])
        
        # Select encoded features
        feature_columns = [col + '_encoded' for col in input_data.columns if not col.endswith('_encoded')]
        X_input = input_data[feature_columns]
        
        # Make prediction
        prediction = model.predict(X_input)[0]
        
        # Get prediction probability if available
        if hasattr(model, 'predict_proba'):
            prediction_proba = model.predict_proba(X_input)[0]
            confidence = max(prediction_proba)
        else:
            confidence = None
        
        return prediction, confidence
        
    except Exception as e:
        return f"Error: {str(e)}", None

print("Prediction function created successfully!")
print("\nFunction usage:")
print("predict_crop(N_level, P_level, K_level, temp_level, humidity_level, ph_level, rainfall_level)")

Prediction function created successfully!

Function usage:
predict_crop(N_level, P_level, K_level, temp_level, humidity_level, ph_level, rainfall_level)
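
`predict_crop` reports an unrecognized level only through the encoder's exception, which it returns as an error string in place of the crop name. A sketch of validating the inputs up front, using the allowed values listed in the docstring, is given below. `ALLOWED_LEVELS` and `validate_levels` are hypothetical names, not part of the notebook.

```python
# Allowed values per feature, copied from the predict_crop docstring.
ALLOWED_LEVELS = {
    'N_cat': ['Very Low', 'Low', 'Medium', 'High', 'Very High'],
    'P_cat': ['Very Low', 'Low', 'Medium', 'High', 'Very High'],
    'K_cat': ['Very Low', 'Low', 'Medium', 'High', 'Very High'],
    'temperature_cat': ['Cool', 'Mild', 'Warm', 'Hot'],
    'humidity_cat': ['Dry', 'Moderate', 'Humid', 'Very Humid'],
    'ph_cat': ['Acidic', 'Neutral', 'Alkaline'],
    'rainfall_cat': ['Low', 'Medium', 'High', 'Very High'],
}

def validate_levels(inputs):
    """Raise ValueError naming every missing or unrecognized input level."""
    problems = []
    for column, allowed in ALLOWED_LEVELS.items():
        value = inputs.get(column)
        if value not in allowed:
            problems.append(f"{column}={value!r} (expected one of {allowed})")
    if problems:
        raise ValueError("Invalid input levels: " + "; ".join(problems))

validate_levels({
    'N_cat': 'High', 'P_cat': 'Medium', 'K_cat': 'Medium',
    'temperature_cat': 'Warm', 'humidity_cat': 'Humid',
    'ph_cat': 'Neutral', 'rainfall_cat': 'High',
})
print("Inputs are valid")
```

Raising an exception keeps the return type of the prediction function consistent, so callers never receive an error message where a crop name is expected.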


## 13. Demonstration - Making Predictions

In [18]:
# Demonstrate prediction with example inputs
print("=" * 60)
print("PREDICTION DEMONSTRATION")
print("=" * 60)

# Example 1
print("\nExample 1:")
print("Input conditions:")
print("  Nitrogen: High")
print("  Phosphorous: Medium")
print("  Potassium: Medium")
print("  Temperature: Warm")
print("  Humidity: Humid")
print("  pH: Neutral")
print("  Rainfall: High")

prediction, confidence = predict_crop(
    N_level='High',
    P_level='Medium', 
    K_level='Medium',
    temp_level='Warm',
    humidity_level='Humid',
    ph_level='Neutral',
    rainfall_level='High'
)

print(f"\nPredicted crop: {prediction}")
if confidence is not None:
    print(f"Confidence: {confidence:.4f} ({confidence*100:.2f}%)")

# Example 2
print("\n" + "-"*40)
print("\nExample 2:")
print("Input conditions:")
print("  Nitrogen: Low")
print("  Phosphorous: Very High")
print("  Potassium: High")
print("  Temperature: Cool")
print("  Humidity: Moderate")
print("  pH: Alkaline")
print("  Rainfall: Low")

prediction2, confidence2 = predict_crop(
    N_level='Low',
    P_level='Very High', 
    K_level='High',
    temp_level='Cool',
    humidity_level='Moderate',
    ph_level='Alkaline',
    rainfall_level='Low'
)

print(f"\nPredicted crop: {prediction2}")
if confidence2 is not None:
    print(f"Confidence: {confidence2:.4f} ({confidence2*100:.2f}%)")

PREDICTION DEMONSTRATION

Example 1:
Input conditions:
  Nitrogen: High
  Phosphorous: Medium
  Potassium: Medium
  Temperature: Warm
  Humidity: Humid
  pH: Neutral
  Rainfall: High

Predicted crop: jute
Confidence: 0.7596 (75.96%)

----------------------------------------

Example 2:
Input conditions:
  Nitrogen: Low
  Phosphorous: Very High
  Potassium: High
  Temperature: Cool
  Humidity: Moderate
  pH: Alkaline
  Rainfall: Low

Predicted crop: macadamia
Confidence: 0.0676 (6.76%)
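
The top probability in the second example is under 7%, so the single predicted label carries little weight. Listing the few most probable crops may be more informative. The sketch below assumes the class names and probabilities come from a fitted scikit-learn classifier's `classes_` attribute and `predict_proba` output; `top_k_crops` and the sample numbers are illustrative only.

```python
def top_k_crops(classes, probabilities, k=3):
    """Return the k (crop, probability) pairs with the highest probability."""
    ranked = sorted(zip(classes, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Invented probabilities for illustration only.
classes = ['jute', 'rice', 'coffee', 'maize']
probabilities = [0.10, 0.55, 0.30, 0.05]
print(top_k_crops(classes, probabilities, k=2))
```

Inside `predict_crop`, the call would look like `top_k_crops(model.classes_, model.predict_proba(X_input)[0])`.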


## 14. Summary and Instructions

In [1]:
print("=" * 70)
print("CROP RECOMMENDATION MODEL - TRAINING COMPLETED SUCCESSFULLY!")
print("=" * 70)

print("\nFiles Created:")
print("- crop_recommendation_model_new.pkl (best performing model)")
print("- model_performance_report.txt (detailed performance metrics)")

print(f"\nBest Model: {best_model_name}")
print(f"F1-Score: {best_score:.4f}")

print("\nTo make predictions, use categorical inputs:")
print("\nNitrogen levels: Very Low, Low, Medium, High, Very High")
print("Phosphorous levels: Very Low, Low, Medium, High, Very High")
print("Potassium levels: Very Low, Low, Medium, High, Very High")
print("Temperature levels: Cool, Mild, Warm, Hot")
print("Humidity levels: Dry, Moderate, Humid, Very Humid")
print("pH levels: Acidic, Neutral, Alkaline")
print("Rainfall levels: Low, Medium, High, Very High")

print("\nExample usage:")
print("prediction, confidence = predict_crop('High', 'Medium', 'Medium', 'Warm', 'Humid', 'Neutral', 'High')")

print("\nCrop types in dataset:")
crop_list = sorted(df['label'].unique())
for i, crop in enumerate(crop_list, 1):
    print(f"{i:2d}. {crop}")

print(f"\nTotal unique crops: {len(crop_list)}")

CROP RECOMMENDATION MODEL - TRAINING COMPLETED SUCCESSFULLY!

Files Created:
- crop_recommendation_model_new.pkl (best performing model)
- model_performance_report.txt (detailed performance metrics)


NameError: name 'best_model_name' is not defined
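
The `NameError` above, together with the execution count `In [1]`, suggests this cell ran in a fresh kernel where `best_model_name` no longer existed. The saved package stores the model name under `'model_type'`, so a summary cell could recover it from disk. The F1 score is not part of the package and would need to be recomputed or saved separately. A sketch, with `summary_from_package` as a hypothetical helper:

```python
def summary_from_package(model_package):
    """Extract the summary fields stored in the saved model package."""
    return {
        'best_model_name': model_package['model_type'],
        'categorical_columns': list(model_package['categorical_columns']),
        # Not stored in the package as saved in section 10, so usually None.
        'best_score': model_package.get('best_score'),
    }

# Stand-in dictionary with the same keys as the package saved in section 10.
# In the notebook it would come from
# joblib.load('crop_recommendation_model_new.pkl').
package = {
    'model': None,
    'encoders': {},
    'model_type': 'Neural Network',
    'feature_columns': ['N_cat_encoded'],
    'categorical_columns': ['N_cat'],
}
print(summary_from_package(package))
```

Rerunning all cells in order before the summary cell avoids the error without any code change.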