## Fertilizer Recommendation: Model Training and Comparison

In [1]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from joblib import dump


### Load and Inspect the Dataset

In [2]:
data = pd.read_csv('Fertilizer Prediction.csv')
print(data.head())
print('Shape of Dataset:', data.shape)

   Temparature  Humidity   Moisture Soil Type  Crop Type  Nitrogen  Potassium  \
0           26         52        38     Sandy      Maize        37          0   
1           29         52        45     Loamy  Sugarcane        12          0   
2           34         65        62     Black     Cotton         7          9   
3           32         62        34       Red    Tobacco        22          0   
4           28         54        46    Clayey      Paddy        35          0   

   Phosphorous Fertilizer Name  
0            0            Urea  
1           36             DAP  
2           30        14-35-14  
3           20           28-28  
4            0            Urea  
Shape of Dataset: (99, 9)
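
Before preprocessing, it can help to check for missing values and to see how the rows are distributed across fertilizer classes. The sketch below is illustrative only: `sample` is a hypothetical stand-in built from a few columns of the five rows printed above, not the full dataset, and the same two calls could be applied to `data`.

```python
import pandas as pd

# Hypothetical stand-in built from the five rows printed above; in the notebook,
# `data` is the full DataFrame loaded from 'Fertilizer Prediction.csv'.
sample = pd.DataFrame({
    'Temparature': [26, 29, 34, 32, 28],
    'Soil Type': ['Sandy', 'Loamy', 'Black', 'Red', 'Clayey'],
    'Crop Type': ['Maize', 'Sugarcane', 'Cotton', 'Tobacco', 'Paddy'],
    'Nitrogen': [37, 12, 7, 22, 35],
    'Fertilizer Name': ['Urea', 'DAP', '14-35-14', '28-28', 'Urea'],
})

# Count missing values per column.
missing_per_column = sample.isna().sum()

# Count how many rows belong to each fertilizer class.
class_counts = sample['Fertilizer Name'].value_counts()

print(missing_per_column)
print(class_counts)
```

On a dataset of only 99 rows, strongly uneven class counts would be worth knowing about before choosing a split strategy.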


### Data Preprocessing

Separate the target column (*Fertilizer Name*) from the features, then one-hot encode the two categorical features, *Soil Type* and *Crop Type*.

In [3]:
# Separate labels and features
labels = data['Fertilizer Name']
features = data.drop(['Fertilizer Name'], axis=1)

# Identify categorical columns and apply one-hot encoding
categorical_cols = ['Soil Type', 'Crop Type']
features = pd.get_dummies(features, columns=categorical_cols)

print('Labels:')
print(labels.head())

print('Features after encoding:')
print(features.head())

Labels:
0        Urea
1         DAP
2    14-35-14
3       28-28
4        Urea
Name: Fertilizer Name, dtype: object
Features after encoding:
   Temparature  Humidity   Moisture  Nitrogen  Potassium  Phosphorous  \
0           26         52        38        37          0            0   
1           29         52        45        12          0           36   
2           34         65        62         7          9           30   
3           32         62        34        22          0           20   
4           28         54        46        35          0            0   

   Soil Type_Black  Soil Type_Clayey  Soil Type_Loamy  Soil Type_Red  ...  \
0            False             False            False          False  ...   
1            False             False             True          False  ...   
2             True             False            False          False  ...   
3            False             False            False           True  ...   
4            False              True
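
One practical consequence of `pd.get_dummies` is that a new sample only produces dummy columns for the categories it actually contains, so its column layout can differ from the training layout. A minimal sketch of one way to handle this, assuming the training column order is kept around, is to reindex the encoded sample. The frames `train` and `new_sample` below are hypothetical and use only a few columns.

```python
import pandas as pd

categorical_cols = ['Soil Type', 'Crop Type']

# Hypothetical training rows (values copied from the head shown above).
train = pd.DataFrame({
    'Nitrogen': [37, 12, 7],
    'Soil Type': ['Sandy', 'Loamy', 'Black'],
    'Crop Type': ['Maize', 'Sugarcane', 'Cotton'],
})
train_encoded = pd.get_dummies(train, columns=categorical_cols)

# A single new sample only yields dummy columns for its own categories.
new_sample = pd.DataFrame({
    'Nitrogen': [20],
    'Soil Type': ['Loamy'],
    'Crop Type': ['Maize'],
})
new_encoded = pd.get_dummies(new_sample, columns=categorical_cols)

# Reindex to the training layout; categories absent from the sample become 0.
new_aligned = new_encoded.reindex(columns=train_encoded.columns, fill_value=0)

print(new_aligned)
```

An alternative with the same goal is scikit-learn's `OneHotEncoder`, which stores the category list when fitted.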

### Feature Scaling

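Standardize each feature to zero mean and unit variance with `StandardScaler`. Note that the scaler is fitted on the full dataset here, before the train/test split, so test-set statistics influence the scaling.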

In [4]:
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)
features = pd.DataFrame(scaled_features, columns=features.columns)

print('Scaled Features:')
print(features.head())

Scaled Features:
   Temparature  Humidity   Moisture  Nitrogen  Potassium  Phosphorous  \
0    -1.229084  -1.230737 -0.462064  1.567539  -0.584910    -1.387607   
1    -0.368145  -1.230737  0.162128 -0.598658  -0.584910     1.297209   
2     1.066752   1.006492  1.678023 -1.031898   0.970777     0.849740   
3     0.492793   0.490209 -0.818745  0.267821  -0.584910     0.103958   
4    -0.655125  -0.886548  0.251298  1.394244  -0.584910    -1.387607   

   Soil Type_Black  Soil Type_Clayey  Soil Type_Loamy  Soil Type_Red  ...  \
0        -0.487340         -0.503155        -0.518875      -0.487340  ...   
1        -0.487340         -0.503155         1.927248      -0.487340  ...   
2         2.051957         -0.503155        -0.518875      -0.487340  ...   
3        -0.487340         -0.503155        -0.518875       2.051957  ...   
4        -0.487340          1.987461        -0.518875      -0.487340  ...   

   Crop Type_Cotton  Crop Type_Ground Nuts  Crop Type_Maize  \
0         -0.37139
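
The cell above fits the scaler on every row. A common alternative, sketched here on synthetic data rather than the fertilizer dataset, is to split first and fit the scaler on the training rows only, so that no information from the test rows enters the transformation. The names `X`, `split_scaler`, and the `_raw`/`_scaled` variables are assumptions made for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic numeric features standing in for the encoded feature matrix.
rng = np.random.default_rng(42)
X = rng.normal(loc=30.0, scale=5.0, size=(99, 3))

X_train_raw, X_test_raw = train_test_split(X, test_size=0.2, random_state=42)

# Learn the mean and standard deviation from the training rows only...
split_scaler = StandardScaler().fit(X_train_raw)

# ...then apply the same transformation to both subsets.
X_train_scaled = split_scaler.transform(X_train_raw)
X_test_scaled = split_scaler.transform(X_test_raw)

print(X_train_scaled.mean(axis=0).round(6))
print(X_train_scaled.std(axis=0).round(6))
```

With this ordering the training columns have zero mean and unit variance, while the test columns are only approximately standardized, which is the expected behavior.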

### Split the Dataset

Hold out 20% of the rows as a test set; `random_state=42` makes the split reproducible.

In [5]:
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)
print('Training set size:', X_train.shape)
print('Testing set size:', X_test.shape)

Training set size: (79, 22)
Testing set size: (20, 22)
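
With a 20-row test set and several fertilizer classes, a purely random split may leave some classes under-represented in one of the subsets. The following sketch shows the `stratify` argument of `train_test_split`, which keeps class proportions roughly equal in both subsets. The labels are synthetic (hypothetical classes `A` to `G` with made-up counts), not the real class distribution.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic labels: 7 hypothetical classes with uneven counts (99 rows in total).
labels_demo = pd.Series(
    ['A'] * 22 + ['B'] * 18 + ['C'] * 17 + ['D'] * 14 + ['E'] * 14 + ['F'] * 7 + ['G'] * 7
)
features_demo = pd.DataFrame({'x': range(len(labels_demo))})

# stratify=labels_demo preserves the class proportions in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    features_demo, labels_demo, test_size=0.2, random_state=42, stratify=labels_demo
)

print(y_te.value_counts().sort_index())
```

Stratification requires every class to have at least two rows; otherwise `train_test_split` raises an error.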


### Model Training: Random Forest

In [6]:
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
y_pred_rf = rf_model.predict(X_test)
accuracy_rf = accuracy_score(y_test, y_pred_rf)
print('Random Forest Accuracy:', accuracy_rf)

Random Forest Accuracy: 0.95
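
Because the test set has only 20 rows, each misclassified row shifts the accuracy by 0.05, so a single split gives a fairly coarse estimate. A hedged sketch of a complementary check is stratified k-fold cross-validation, shown here on synthetic data from `make_classification` (an assumption for the example; the notebook itself does not do this).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic data with the same number of rows and columns as the training matrix.
X_demo, y_demo = make_classification(
    n_samples=99, n_features=22, n_informative=6, n_classes=3, random_state=42
)

# Five stratified folds; every row is used for testing exactly once.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=42),
    X_demo, y_demo, cv=cv, scoring='accuracy',
)

print('Fold accuracies:', scores)
print('Mean accuracy:', scores.mean())
```

The spread of the fold accuracies gives a rough sense of how much a single-split score could vary.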


### Model Training: K-Nearest Neighbors

In [7]:
knn_model = KNeighborsClassifier(n_neighbors=3)
knn_model.fit(X_train, y_train)
y_pred_knn = knn_model.predict(X_test)
accuracy_knn = accuracy_score(y_test, y_pred_knn)
print('KNN Accuracy:', accuracy_knn)

KNN Accuracy: 0.1
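
An accuracy of 0.1 on 20 test rows corresponds to 2 correct predictions. The notebook does not investigate the cause. One common next step, sketched below on synthetic data, is to tune `n_neighbors` with cross-validation inside a `Pipeline`, so that scaling is refitted on each training fold. The candidate values in `param_grid` are arbitrary choices for illustration, and nothing here implies that tuning would fix the result above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data used purely to demonstrate the mechanics.
X_demo, y_demo = make_classification(
    n_samples=99, n_features=8, n_informative=5, n_classes=3, random_state=0
)

# Scaling lives inside the pipeline, so each CV fold fits its own scaler.
knn_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier()),
])

# Candidate neighbor counts (arbitrary values chosen for this example).
param_grid = {'knn__n_neighbors': [1, 3, 5, 7, 9]}

search = GridSearchCV(knn_pipeline, param_grid, cv=5, scoring='accuracy')
search.fit(X_demo, y_demo)

print('Best parameters:', search.best_params_)
print('Best CV accuracy:', search.best_score_)
```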


### Compare Model Performances and Save the Best Model

Select the model with the higher test accuracy and save it together with the fitted scaler. If the two accuracies tie, the comparison below falls through to the KNN model.

In [8]:
if accuracy_rf > accuracy_knn:
    best_model = rf_model
    model_type = 'Random Forest'
else:
    best_model = knn_model
    model_type = 'K-Nearest Neighbors'

print(f'The better model based on accuracy is: {model_type}')

dump(best_model, 'fertilizer_model.pkl')
dump(scaler, 'fertilizer_scaler.pkl')
print('Best model and scaler saved to disk.')

The better model based on accuracy is: Random Forest
Best model and scaler saved to disk.
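
To use the saved artifacts later, both files are loaded with `joblib.load` and the scaler is applied before calling `predict`. The sketch below demonstrates that round trip on a tiny made-up dataset written to a temporary directory. The file names `demo_model.pkl` and `demo_scaler.pkl`, the two-column feature set, and all values are assumptions for the example. With the real model, a new sample would also need the same one-hot column layout as the training features.

```python
import os
import tempfile

import pandas as pd
from joblib import dump, load
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Tiny made-up training set (not the real data).
train = pd.DataFrame({
    'Nitrogen': [37, 12, 7, 22, 35, 10],
    'Phosphorous': [0, 36, 30, 20, 0, 38],
})
train_labels = ['Urea', 'DAP', '14-35-14', '28-28', 'Urea', 'DAP']

demo_scaler = StandardScaler().fit(train)
demo_model = RandomForestClassifier(n_estimators=10, random_state=42)
demo_model.fit(demo_scaler.transform(train), train_labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'demo_model.pkl')
    scaler_path = os.path.join(tmp_dir, 'demo_scaler.pkl')

    # Persist both objects, then read them back as a separate process would.
    dump(demo_model, model_path)
    dump(demo_scaler, scaler_path)
    loaded_model = load(model_path)
    loaded_scaler = load(scaler_path)

# Scale the new sample with the loaded scaler before predicting.
new_sample = pd.DataFrame({'Nitrogen': [36], 'Phosphorous': [0]})
prediction = loaded_model.predict(loaded_scaler.transform(new_sample))

print('Predicted fertilizer:', prediction[0])
```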
