## Crop Recommendation Model Training with Comparison

This notebook trains two classifiers, K-Nearest Neighbors (KNN) and Gaussian Naive Bayes, to recommend a crop from seven soil and weather features: nitrogen (N), phosphorus (P), and potassium (K) levels, temperature, humidity, pH, and rainfall. Both models are scored on the same held-out test set, and the more accurate one is saved.

### Import Necessary Libraries

Import the libraries needed for data handling, feature scaling, model training, and evaluation.

In [1]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from joblib import dump

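### Load and Inspect the Dataset

Load the crop recommendation data from the CSV file and inspect the first few rows to understand its structure.

In [2]: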
data = pd.read_csv('Crop_recommendation.csv')
print(data.head())
print('Shape of Dataset:', data.shape)

    N   P   K  temperature   humidity        ph    rainfall label
0  90  42  43    20.879744  82.002744  6.502985  202.935536  rice
1  85  58  41    21.770462  80.319644  7.038096  226.655537  rice
2  60  55  44    23.004459  82.320763  7.840207  263.964248  rice
3  74  35  40    26.491096  80.158363  6.980401  242.864034  rice
4  78  42  42    20.130175  81.604873  7.628473  262.717340  rice
Shape of Dataset: (2200, 8)
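
Beyond `head()` and `shape`, it can help to check for missing values and class balance before training. The following is a minimal sketch, not part of the original notebook: the `summarize` helper and the toy rows are hypothetical stand-ins, used so the example runs without `Crop_recommendation.csv`.

```python
import pandas as pd


def summarize(df: pd.DataFrame, label_col: str = "label") -> dict:
    """Return basic sanity-check facts about a labelled dataset (hypothetical helper)."""
    return {
        "n_rows": len(df),
        # Total number of missing cells across all columns
        "n_missing": int(df.isna().sum().sum()),
        # Number of rows per class, to spot imbalance
        "class_counts": df[label_col].value_counts().to_dict(),
    }


# Hypothetical toy rows; the real notebook would pass `data` instead
toy = pd.DataFrame({
    "N": [90, 85, 20, 25],
    "ph": [6.5, 7.0, 5.8, None],
    "label": ["rice", "rice", "maize", "maize"],
})

summary = summarize(toy)
print(summary)
```

In the notebook itself, the equivalent call would be `summarize(data)`.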


### Data Preprocessing

Separate the label column from the seven feature columns.

In [3]:
labels = data['label']
features = data.drop('label', axis=1)
print('Labels:')
print(labels.head())
print('Features:')
print(features.head())

Labels:
0    rice
1    rice
2    rice
3    rice
4    rice
Name: label, dtype: object
Features:
    N   P   K  temperature   humidity        ph    rainfall
0  90  42  43    20.879744  82.002744  6.502985  202.935536
1  85  58  41    21.770462  80.319644  7.038096  226.655537
2  60  55  44    23.004459  82.320763  7.840207  263.964248
3  74  35  40    26.491096  80.158363  6.980401  242.864034
4  78  42  42    20.130175  81.604873  7.628473  262.717340


Standardize each feature to zero mean and unit variance. KNN classifies by distance, so without scaling, the features with the widest numeric range would dominate.

In [4]:
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)
features = pd.DataFrame(scaled_features, columns=features.columns)
print(features.head())

          N         P         K  temperature  humidity        ph  rainfall
0  1.068797 -0.344551 -0.101688    -0.935587  0.472666  0.043302  1.810361
1  0.933329  0.140616 -0.141185    -0.759646  0.397051  0.734873  2.242058
2  0.255986  0.049647 -0.081939    -0.515898  0.486954  1.771510  2.921066
3  0.635298 -0.556811 -0.160933     0.172807  0.389805  0.660308  2.537048
4  0.743673 -0.344551 -0.121436    -1.083647  0.454792  1.497868  2.898373
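
To make the scaling step concrete, the sketch below compares `StandardScaler` with a hand-computed z-score, `(x - mean) / std`, using the population standard deviation (`ddof=0`). The toy column is a hypothetical example, not data from the notebook.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature column, shape (n_samples, n_features)
x = np.array([[10.0], [20.0], [30.0], [40.0]])

# What the notebook does: fit the scaler and transform in one call
scaled = StandardScaler().fit_transform(x)

# Manual z-score; np.std defaults to ddof=0, matching StandardScaler
manual = (x - x.mean(axis=0)) / x.std(axis=0)

print(np.allclose(scaled, manual))
```

After scaling, each column has mean 0 and standard deviation 1, which is why the values printed above are centered around zero.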


### Train/Test Split

Hold out 20% of the rows as a test set. Fixing `random_state=42` makes the split reproducible.

In [5]:
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)
print('Training set size:', X_train.shape)
print('Testing set size:', X_test.shape)

Training set size: (1760, 7)
Testing set size: (440, 7)
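
Note that cell 4 fits the scaler on all 2,200 rows before the split in cell 5, so the test rows contribute to the scaling statistics. One possible alternative, sketched here rather than taken from the notebook, is to split first and let a scikit-learn pipeline fit the scaler on the training rows only. The data comes from `make_classification` as a synthetic stand-in for the crop dataset, and `stratify=y` is an added assumption that keeps class proportions equal in both splits.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the crop data: 7 features, 3 classes (assumption)
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=5, n_redundant=0,
    n_classes=3, random_state=42,
)

# Split BEFORE scaling so the test rows stay unseen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The pipeline fits the scaler on X_train only, then trains the classifier
pipe = make_pipeline(StandardScaler(), GaussianNB())
pipe.fit(X_train, y_train)

# score() applies the training-set scaling to X_test before predicting
test_accuracy = pipe.score(X_test, y_test)
print("Test accuracy:", test_accuracy)
```

A further benefit of this design is that the pipeline can be saved as a single object, so the scaler and model cannot drift apart.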


### Model Training: Gaussian Naive Bayes

Train a Gaussian Naive Bayes classifier on the training set and measure its accuracy on the test set.

In [6]:
gnb_model = GaussianNB()
gnb_model.fit(X_train, y_train)
y_pred_gnb = gnb_model.predict(X_test)
accuracy_gnb = accuracy_score(y_test, y_pred_gnb)
print('Gaussian Naive Bayes Accuracy:', accuracy_gnb)

Gaussian Naive Bayes Accuracy: 0.9954545454545455


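### Model Training: K-Nearest Neighbors

Train a KNN classifier with `n_neighbors=3` and measure its accuracy on the same test set.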

In [7]:
knn_model = KNeighborsClassifier(n_neighbors=3)
knn_model.fit(X_train, y_train)
y_pred_knn = knn_model.predict(X_test)
accuracy_knn = accuracy_score(y_test, y_pred_knn)
print('KNN Accuracy:', accuracy_knn)

KNN Accuracy: 0.9681818181818181


### Model Comparison

Select the model with the higher test accuracy. If the two accuracies are equal, the expression below falls back to KNN.

In [8]:
better_model = gnb_model if accuracy_gnb > accuracy_knn else knn_model
model_type = 'Gaussian Naive Bayes' if accuracy_gnb > accuracy_knn else 'K-Nearest Neighbors'
print(f'The better model based on accuracy is {model_type}.')

The better model based on accuracy is Gaussian Naive Bayes.
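
The comparison above rests on a single 80/20 split, so one set of 440 test rows decides the winner. A hedged alternative is k-fold cross-validation, which averages accuracy over several splits. The sketch below is an illustration on synthetic data from `make_classification`, not a result for the crop dataset; the choice of 5 folds is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the crop data (assumption)
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=5, n_redundant=0,
    n_classes=3, random_state=42,
)

# Each candidate scales inside the pipeline, so scaling is refit per fold
candidates = {
    "Gaussian Naive Bayes": make_pipeline(StandardScaler(), GaussianNB()),
    "K-Nearest Neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=3)
    ),
}

mean_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")

# Pick the candidate with the highest mean cross-validated accuracy
best_name = max(mean_scores, key=mean_scores.get)
print("Best by cross-validation:", best_name)
```

The per-fold spread printed next to each mean gives a rough sense of whether the gap between the two models is larger than the variation between splits.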


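### Save the Model and Scaler

Persist the selected model and the fitted scaler with joblib so that new inputs can be scaled the same way at prediction time.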

In [9]:
dump(better_model, 'crop_model.pkl')
dump(scaler, 'crop_scaler.pkl')
print('Model and scaler saved to disk.')

Model and scaler saved to disk.
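
For completeness, here is a minimal sketch of how the saved files might be used later: load both objects, scale the new sample with the loaded scaler, then predict. The toy training data, the class names `crop_a` and `crop_b`, and the temporary directory are hypothetical so that the example runs on its own; only the file names match the notebook.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two well-separated classes with 7 features each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 7)), rng.normal(5.0, 1.0, (20, 7))])
y = np.array(["crop_a"] * 20 + ["crop_b"] * 20)

scaler = StandardScaler().fit(X)
model = GaussianNB().fit(scaler.transform(X), y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "crop_model.pkl")
    scaler_path = os.path.join(tmp, "crop_scaler.pkl")

    # Save, as in the notebook
    dump(model, model_path)
    dump(scaler, scaler_path)

    # Later (for example in a separate script): load both objects back
    loaded_model = load(model_path)
    loaded_scaler = load(scaler_path)

# New input must be scaled with the SAME fitted scaler before predicting
new_sample = np.full((1, 7), 5.0)
prediction = loaded_model.predict(loaded_scaler.transform(new_sample))
print(prediction[0])
```

Because the notebook fits its scaler and model on DataFrames, passing new samples as a DataFrame with the same seven column names in the same order should avoid scikit-learn's feature-name warnings.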
