# `CROP RECOMMENDATION`

Crop recommendation means suggesting a suitable crop for a given set of conditions and soil features. The choice of crops and cropping system plays a major role in improving farmers' productivity and profitability. A crop recommendation system supports this decision by considering parameters such as temperature, rainfall, season, and agro-ecological conditions.

![crop_recommendation](https://www.day1tech.com/wp-content/uploads/2021/06/Image-Crop-Recommendation@1x-1.png)

In this notebook we build a Machine Learning model that predicts a suitable crop for a particular piece of land from the NPK values of its soil (Nitrogen, Phosphorus, and Potassium), its pH value, and weather conditions such as temperature, humidity, and rainfall.

**The main steps we perform in this notebook are:**
* Installing and Importing Libraries
* Loading the Dataset
* Data Preprocessing (like LabelEncoding)
* Splitting the Dataset
* Creating the Machine Learning Pipeline
* Saving the best Model

## `1. Installing and Importing Libraries`

In [34]:
!pip install pandas numpy scikit-learn xgboost





In [35]:
import pandas as pd
import numpy as np

## `2. Loading and Analysing the Dataset`

In [36]:
df = pd.read_csv('data/crop_recommendation.csv')
df.head()

|   | N | P | K | temperature | humidity | ph | rainfall | label |
|---|---|---|---|---|---|---|---|---|
| 0 | 90 | 42 | 43 | 20.879744 | 82.002744 | 6.502985 | 202.935536 | rice |
| 1 | 85 | 58 | 41 | 21.770462 | 80.319644 | 7.038096 | 226.655537 | rice |
| 2 | 60 | 55 | 44 | 23.004459 | 82.320763 | 7.840207 | 263.964248 | rice |
| 3 | 74 | 35 | 40 | 26.491096 | 80.158363 | 6.980401 | 242.864034 | rice |
| 4 | 78 | 42 | 42 | 20.130175 | 81.604873 | 7.628473 | 262.71734 | rice |


In [37]:
df.isnull().sum()

N              0
P              0
K              0
temperature    0
humidity       0
ph             0
rainfall       0
label          0
dtype: int64

In [38]:
df.shape

(2200, 8)

**So the dataset has 2200 rows and 8 columns, and none of the columns contains null values.**

In [39]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2200 entries, 0 to 2199
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   N            2200 non-null   int64  
 1   P            2200 non-null   int64  
 2   K            2200 non-null   int64  
 3   temperature  2200 non-null   float64
 4   humidity     2200 non-null   float64
 5   ph           2200 non-null   float64
 6   rainfall     2200 non-null   float64
 7   label        2200 non-null   object 
dtypes: float64(4), int64(3), object(1)
memory usage: 137.6+ KB


In [40]:
df.describe()

|   | N | P | K | temperature | humidity | ph | rainfall |
|---|---|---|---|---|---|---|---|
| count | 2200.0 | 2200.0 | 2200.0 | 2200.0 | 2200.0 | 2200.0 | 2200.0 |
| mean | 50.551818 | 53.362727 | 48.149091 | 25.616244 | 71.481779 | 6.46948 | 103.463655 |
| std | 36.917334 | 32.985883 | 50.647931 | 5.063749 | 22.263812 | 0.773938 | 54.958389 |
| min | 0.0 | 5.0 | 5.0 | 8.825675 | 14.25804 | 3.504752 | 20.211267 |
| 25% | 21.0 | 28.0 | 20.0 | 22.769375 | 60.261953 | 5.971693 | 64.551686 |
| 50% | 37.0 | 51.0 | 32.0 | 25.598693 | 80.473146 | 6.425045 | 94.867624 |
| 75% | 84.25 | 68.0 | 49.0 | 28.561654 | 89.948771 | 6.923643 | 124.267508 |
| max | 140.0 | 145.0 | 205.0 | 43.675493 | 99.981876 | 9.935091 | 298.560117 |


In [41]:
df.label.value_counts()

rice           100
maize          100
jute           100
cotton         100
coconut        100
papaya         100
orange         100
apple          100
muskmelon      100
watermelon     100
grapes         100
mango          100
banana         100
pomegranate    100
lentil         100
blackgram      100
mungbean       100
mothbeans      100
pigeonpeas     100
kidneybeans    100
chickpea       100
coffee         100
Name: label, dtype: int64

So we have 100 samples for each of the 22 crops, which means the dataset is balanced.
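The balance check read off the `value_counts()` output can also be done programmatically. The sketch below is only an illustration: `class_balance` is a hypothetical helper and `toy_labels` is made-up data standing in for `df['label']`.

```python
import pandas as pd

def class_balance(labels):
    """Return (number of classes, smallest class size, True if all classes have equal size)."""
    counts = labels.value_counts()
    return len(counts), int(counts.min()), counts.nunique() == 1

# Toy stand-in for df['label'] (made-up data, not the real dataset)
toy_labels = pd.Series(["rice"] * 3 + ["maize"] * 3 + ["jute"] * 3)
print(class_balance(toy_labels))  # (3, 3, True)
```

On the real dataset, the same call on `df['label']` would be expected to report 22 classes of 100 samples each, matching the output above.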

## `3. Data Preprocessing`

### Label Encoding the `label` Column

In [42]:
from sklearn.preprocessing import LabelEncoder

In [43]:
le = LabelEncoder().fit(df['label'])

In [44]:
le_name_mapping = dict(zip(le.classes_, le.transform(le.classes_)))
print(le_name_mapping)

{'apple': 0, 'banana': 1, 'blackgram': 2, 'chickpea': 3, 'coconut': 4, 'coffee': 5, 'cotton': 6, 'grapes': 7, 'jute': 8, 'kidneybeans': 9, 'lentil': 10, 'maize': 11, 'mango': 12, 'mothbeans': 13, 'mungbean': 14, 'muskmelon': 15, 'orange': 16, 'papaya': 17, 'pigeonpeas': 18, 'pomegranate': 19, 'rice': 20, 'watermelon': 21}
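`LabelEncoder` assigns integer codes in sorted order of the class names, and `inverse_transform` maps codes back to names. A minimal sketch on toy labels (the `crops` list is made up for illustration):

```python
from sklearn.preprocessing import LabelEncoder

crops = ["rice", "maize", "apple", "rice"]  # toy labels, not the real dataset
encoder = LabelEncoder().fit(crops)

encoded = encoder.transform(crops)            # names -> integer codes
decoded = encoder.inverse_transform(encoded)  # integer codes -> names

print(list(encoder.classes_))  # ['apple', 'maize', 'rice']
print(encoded.tolist())        # [2, 1, 0, 2]
print(decoded.tolist())        # ['rice', 'maize', 'apple', 'rice']
```

The same `inverse_transform` call on the notebook's `le` object can turn a numeric prediction back into a crop name.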


In [45]:
df['label'] = le.transform(df['label'])

In [46]:
df.head()

|   | N | P | K | temperature | humidity | ph | rainfall | label |
|---|---|---|---|---|---|---|---|---|
| 0 | 90 | 42 | 43 | 20.879744 | 82.002744 | 6.502985 | 202.935536 | 20 |
| 1 | 85 | 58 | 41 | 21.770462 | 80.319644 | 7.038096 | 226.655537 | 20 |
| 2 | 60 | 55 | 44 | 23.004459 | 82.320763 | 7.840207 | 263.964248 | 20 |
| 3 | 74 | 35 | 40 | 26.491096 | 80.158363 | 6.980401 | 242.864034 | 20 |
| 4 | 78 | 42 | 42 | 20.130175 | 81.604873 | 7.628473 | 262.71734 | 20 |


## `4. Splitting the Data`

In [47]:
from sklearn.model_selection import train_test_split

In [48]:
X = df.drop(['label'],axis=1)
y = df['label']

In [49]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42)

In [50]:
X_train.head()

|   | N | P | K | temperature | humidity | ph | rainfall |
|---|---|---|---|---|---|---|---|
| 1656 | 17 | 16 | 14 | 16.396243 | 92.181519 | 6.625539 | 102.944161 |
| 752 | 37 | 79 | 19 | 27.543848 | 69.347863 | 7.143943 | 69.408782 |
| 892 | 7 | 73 | 25 | 27.521856 | 63.132153 | 7.288057 | 45.208411 |
| 1041 | 101 | 70 | 48 | 25.360592 | 75.031933 | 6.012697 | 116.553145 |
| 1179 | 0 | 17 | 30 | 35.474783 | 47.972305 | 6.279134 | 97.790725 |


In [51]:
X_test.head()

|   | N | P | K | temperature | humidity | ph | rainfall |
|---|---|---|---|---|---|---|---|
| 1451 | 101 | 17 | 47 | 29.494014 | 94.729813 | 6.185053 | 26.308209 |
| 1334 | 98 | 8 | 51 | 26.179346 | 86.522581 | 6.259336 | 49.43051 |
| 1761 | 59 | 62 | 49 | 43.360515 | 93.351916 | 6.941497 | 114.778071 |
| 1735 | 44 | 60 | 55 | 34.280461 | 90.555616 | 6.825371 | 98.540477 |
| 1576 | 30 | 137 | 200 | 22.9143 | 90.704756 | 5.603413 | 118.604465 |


In [52]:
X_train.shape,X_test.shape,y_train.shape,y_test.shape

((1760, 7), (440, 7), (1760,), (440,))
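The split above is purely random, so the 22 classes need not be equally represented in the test set. One optional variation, not used in this notebook, is to pass `stratify=y` so each class keeps the same share in both parts. The sketch below uses made-up arrays (`X_toy`, `y_toy`) rather than the real data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 7))    # 100 rows, 7 made-up features
y_toy = np.repeat(np.arange(5), 20)  # 5 classes, 20 rows each

# stratify=y_toy keeps the class proportions identical in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

test_counts = np.bincount(y_te)
print(test_counts)  # [4 4 4 4 4]
```

Applied to this dataset, the analogous call would be `train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)`. Note that it would produce a different split, so the accuracies reported below would change.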

## `5. Creating the Pipeline`

### Importing the Important Libraries

In [53]:
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score

In [54]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier,GradientBoostingClassifier,ExtraTreesClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

### KNN Classifier

In [55]:
step1 = ColumnTransformer(transformers=[
    ('scal_tnf',MinMaxScaler(),[0,1,2,3,4,5,6]),
],remainder='passthrough')

step2 = KNeighborsClassifier()

pipe=Pipeline([
    ('step1',step1),
    ('step2',step2)
])

pipe.fit(X_train,y_train)
y_pred=pipe.predict(X_test)
print('Accuracy: ',accuracy_score(y_test,y_pred))

Accuracy:  0.9681818181818181
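Every model in this section reuses the same scaling step, and since that `ColumnTransformer` applies `MinMaxScaler` to all seven columns, placing the scaler directly in the pipeline should be equivalent. The sketch below shows one way to avoid repeating the boilerplate. `build_pipe` is a hypothetical helper, and the data comes from `make_classification` rather than the crop dataset, so the scores are not comparable to the ones in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

def build_pipe(estimator):
    """Scale every feature to [0, 1], then apply the given classifier."""
    return Pipeline([("scale", MinMaxScaler()), ("model", estimator)])

# Synthetic stand-in data with 7 features and 3 classes
X_demo, y_demo = make_classification(
    n_samples=300, n_features=7, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)
Xa, Xb, ya, yb = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

candidates = {
    "knn": KNeighborsClassifier(),
    "tree": DecisionTreeClassifier(max_depth=10, random_state=0),
}

# Fit each candidate with the same preprocessing and record its test accuracy
scores = {}
for name, estimator in candidates.items():
    fitted = build_pipe(estimator).fit(Xa, ya)
    scores[name] = accuracy_score(yb, fitted.predict(Xb))

print(scores)
```

Because the scaler is fitted inside the pipeline, it only ever sees the training rows, which is the same leakage protection the cells in this section rely on.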


### SVM Classifier

In [56]:
step1 = ColumnTransformer(transformers=[
    ('scal_tnf',MinMaxScaler(),[0,1,2,3,4,5,6]),
],remainder='passthrough')

step2 = SVC()

pipe=Pipeline([
    ('step1',step1),
    ('step2',step2)
])

pipe.fit(X_train,y_train)
y_pred=pipe.predict(X_test)
print('Accuracy: ',accuracy_score(y_test,y_pred))

Accuracy:  0.9681818181818181


### Decision Tree Classifier

In [57]:
step1 = ColumnTransformer(transformers=[
    ('scal_tnf',MinMaxScaler(),[0,1,2,3,4,5,6]),
],remainder='passthrough')

step2 = DecisionTreeClassifier(max_depth=10)

pipe=Pipeline([
    ('step1',step1),
    ('step2',step2)
])

pipe.fit(X_train,y_train)
y_pred=pipe.predict(X_test)
print('Accuracy: ',accuracy_score(y_test,y_pred))

Accuracy:  0.9863636363636363


### Random Forest Classifier

In [64]:
step1 = ColumnTransformer(transformers=[
    ('scal_tnf',MinMaxScaler(),[0,1,2,3,4,5,6]),
],remainder='passthrough')

step2 = RandomForestClassifier(n_jobs=-1, 
                               random_state=42, 
                               n_estimators=10,
                               max_features=10,
                               max_depth=40, 
                               max_leaf_nodes=2**20)

pipe=Pipeline([
    ('step1',step1),
    ('step2',step2)
])

pipe.fit(X_train,y_train)
y_pred=pipe.predict(X_test)
print('Accuracy: ',accuracy_score(y_test,y_pred))

Accuracy:  0.990909090909091


### Extra Trees Classifier

In [59]:
step1 = ColumnTransformer(transformers=[
    ('scal_tnf',MinMaxScaler(),[0,1,2,3,4,5,6]),
],remainder='passthrough')

step2 = ExtraTreesClassifier(n_estimators=100,
                              random_state=3,
                              max_features=0.75,
                              max_depth=15)

pipe=Pipeline([
    ('step1',step1),
    ('step2',step2)
])

pipe.fit(X_train,y_train)
y_pred=pipe.predict(X_test)
print('Accuracy: ',accuracy_score(y_test,y_pred))

Accuracy:  0.990909090909091


### Gradient Boosting Classifier

In [60]:
step1 = ColumnTransformer(transformers=[
    ('scal_tnf',MinMaxScaler(),[0,1,2,3,4,5,6]),
],remainder='passthrough')

step2 = GradientBoostingClassifier()

pipe=Pipeline([
    ('step1',step1),
    ('step2',step2)
])

pipe.fit(X_train,y_train)
y_pred=pipe.predict(X_test)
print('Accuracy: ',accuracy_score(y_test,y_pred))

Accuracy:  0.9818181818181818


### XGBoost Classifier

In [61]:
step1 = ColumnTransformer(transformers=[
    ('scal_tnf',MinMaxScaler(),[0,1,2,3,4,5,6]),
],remainder='passthrough')

step2 = XGBClassifier(n_estimators=50,max_depth=10,learning_rate=0.5)

pipe=Pipeline([
    ('step1',step1),
    ('step2',step2)
])

pipe.fit(X_train,y_train)
y_pred=pipe.predict(X_test)
print('Accuracy: ',accuracy_score(y_test,y_pred))

Accuracy:  0.9886363636363636


In [62]:
from sklearn.ensemble import VotingClassifier

### Voting Classifier

In [63]:
step1 = ColumnTransformer(transformers=[
    ('scal_tnf',MinMaxScaler(),[0,1,2,3,4,5,6]),
],remainder='passthrough')

rf = RandomForestClassifier(n_jobs=-1, 
                               random_state=42, 
                               n_estimators=10,
                               max_features=10,
                               max_depth=40, 
                               max_leaf_nodes=2**20)

gbdt = GradientBoostingClassifier()

xgb = XGBClassifier(n_estimators=50,max_depth=10,learning_rate=0.5)

et = ExtraTreesClassifier(n_estimators=100,
                              random_state=3,
                              max_features=0.75,
                              max_depth=15)

step2 = VotingClassifier([('rf', rf), ('gbdt', gbdt), ('xgb',xgb), ('et',et)],weights=[5,1,2,2])

pipe=Pipeline([
    ('step1',step1),
    ('step2',step2)
])

pipe.fit(X_train,y_train)
y_pred=pipe.predict(X_test)
print('Accuracy: ',accuracy_score(y_test,y_pred))

Accuracy:  0.9886363636363636
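With `voting` left at its default, `VotingClassifier` uses hard voting: each model casts its predicted label, and `weights` scale how much each vote counts. The NumPy sketch below imitates that rule to the best of my understanding; `weighted_hard_vote` is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np

def weighted_hard_vote(predicted_labels, weights):
    """Return the label whose voters carry the largest total weight."""
    totals = np.bincount(
        np.asarray(predicted_labels), weights=np.asarray(weights, dtype=float)
    )
    return int(np.argmax(totals))  # on a tie, argmax picks the smallest label

weights = [5, 1, 2, 2]  # rf, gbdt, xgb, et, as in the cell above

# rf and et agree on label 20 (weight 7) against label 8 (weight 3)
print(weighted_hard_vote([20, 8, 8, 20], weights))  # 20

# rf alone (weight 5) against the other three combined (weight 5): a tie
print(weighted_hard_vote([20, 8, 8, 8], weights))   # 8
```

Since the Random Forest weight of 5 equals the combined weight of the other three models, the ensemble can differ from the Random Forest only when all three others agree against it and the tie happens to resolve their way.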


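We are finally going to use the **Random Forest Classifier** model, as it gives the highest accuracy on the testing data, **99.1%** (tied with the Extra Trees Classifier). Note that `pipe` must hold the fitted Random Forest pipeline before it is saved: the execution counts show that the Random Forest cell (`In [64]`) was re-run after the Voting Classifier cell (`In [63]`).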
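On a test set of 440 rows, an accuracy of 0.9909 corresponds to 436 correct predictions and 0.9886 to 435, so the top models differ by a single sample. Cross-validation is one way to get a steadier comparison. The sketch below is an optional check on synthetic data from `make_classification`, with illustrative hyperparameters, and is not part of the original workflow.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data with 7 features and 3 classes
X_cv, y_cv = make_classification(
    n_samples=300, n_features=7, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)

# Five stratified folds, shuffled with a fixed seed for repeatability
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "rf": RandomForestClassifier(n_estimators=10, random_state=42),
    "et": ExtraTreesClassifier(n_estimators=100, random_state=3),
}

cv_scores = {
    name: cross_val_score(model, X_cv, y_cv, cv=cv, scoring="accuracy")
    for name, model in models.items()
}

for name, fold_scores in cv_scores.items():
    print(f"{name}: mean={fold_scores.mean():.3f} std={fold_scores.std():.3f}")
```

If the mean scores of two models lie within about one standard deviation of each other, the single-split ranking is probably not meaningful on its own.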

## `6. Saving the Best Model (into pickle file)`

In [67]:
import pickle
# Save the fitted pipeline; the context manager closes the file handle
with open('pipe.pkl','wb') as f:
    pickle.dump(pipe,f)

In [30]:
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

In [69]:
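with open('pipe.pkl','rb') as f:
    pipe = pickle.load(f)

# One sample in training column order: N, P, K, temperature, humidity, ph, rainfall
sample = [[90,42,43,20.87,82.01,6.5,202.93]]

pipe.predict(sample)[0]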



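20

The predicted label `20` corresponds to `rice` in the label mapping printed in the Data Preprocessing section.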
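The pickled pipeline returns an integer label, so whoever loads it also needs the label encoder to recover the crop name. One option is to store both objects in the same file. This is a sketch under assumptions: the dictionary keys, the `bundle.pkl` file name, and the toy training rows are all made up for illustration.

```python
import os
import pickle
import tempfile

from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Toy training data (made-up values, three features only)
X_small = [[90, 42, 43], [85, 58, 41], [20, 30, 10], [25, 28, 12]]
y_names = ["rice", "rice", "apple", "apple"]

encoder = LabelEncoder().fit(y_names)
model = Pipeline([
    ("scale", MinMaxScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=1)),
])
model.fit(X_small, encoder.transform(y_names))

# Save the pipeline and the encoder together (hypothetical file name)
bundle_path = os.path.join(tempfile.mkdtemp(), "bundle.pkl")
with open(bundle_path, "wb") as f:
    pickle.dump({"pipe": model, "label_encoder": encoder}, f)

# Load the bundle and decode the numeric prediction into a crop name
with open(bundle_path, "rb") as f:
    bundle = pickle.load(f)

code = bundle["pipe"].predict([[88, 45, 42]])
crop = bundle["label_encoder"].inverse_transform(code)[0]
print(crop)  # rice
```

Pickle files are tied to the library versions that wrote them and should only be loaded from trusted sources, so it may be worth recording the scikit-learn version next to the saved file.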