Crop Recommendation System

The purpose of this crop recommendation system is to assist farmers and agricultural professionals in selecting the most suitable crops for cultivation based on environmental and soil conditions. 

To achieve this, the project uses a Jupyter Notebook (.ipynb) for data cleaning and analysis, where the raw agricultural data is inspected, processed, and prepared for modeling.

A machine learning model (a random forest classifier) is trained within the notebook to recommend a suitable crop from the soil nutrient levels (nitrogen, phosphorus, and potassium), soil pH, temperature, humidity, and rainfall.

The backend is developed using Flask, a lightweight Python web framework that processes user input and returns model predictions. 

The frontend is designed with HTML, offering a simple and interactive interface for users to enter data and receive recommendations. 

This system integrates data science, web development, and machine learning to support smarter, data-driven agricultural practices.

In [3]:
import numpy as np # import this library for matrix operations
import pandas as pd # import this library for data manipulation and analysis

In [4]:
crop = pd.read_csv("Crop_recommendation.csv") # load the CSV of soil and climate measurements, each labelled with
# its recommended crop, into a DataFrame
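The CSV file itself is not part of this document, so the sketch below reads two rows copied from the outputs further down from an in-memory string, and fails early if any expected column is missing. `EXPECTED_COLUMNS` and `load_crop_data` are hypothetical names used only for illustration.

```python
import io

import pandas as pd

# Column names as they appear in the DataFrame outputs of this notebook
EXPECTED_COLUMNS = ["N", "P", "K", "temperature", "humidity", "ph", "rainfall", "label"]


def load_crop_data(source) -> pd.DataFrame:
    """Hypothetical helper: read the crop CSV and stop if expected columns are absent."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


# Two rows taken from the dataset output shown in this notebook
sample_csv = io.StringIO(
    "N,P,K,temperature,humidity,ph,rainfall,label\n"
    "90,42,43,20.879744,82.002744,6.502985,202.935536,rice\n"
    "107,34,32,26.774637,66.413269,6.780064,177.774507,coffee\n"
)
sample = load_crop_data(sample_csv)
print(sample.shape)  # (2, 8)
```

With the real file, the call would be `load_crop_data("Crop_recommendation.csv")`.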

In [3]:
crop.head() # check that the DataFrame was created correctly by viewing its first 5 rows

    N   P   K  temperature   humidity        ph    rainfall  label
0  90  42  43    20.879744  82.002744  6.502985  202.935536   rice
1  85  58  41    21.770462  80.319644  7.038096  226.655537   rice
2  60  55  44    23.004459  82.320763  7.840207  263.964248   rice
3  74  35  40    26.491096  80.158363  6.980401  242.864034   rice
4  78  42  42    20.130175  81.604873  7.628473  262.717340   rice

In [4]:
crop.shape # checking dimensions of the dataframe

(2200, 8)

Hence, the DataFrame has 2200 rows and 8 columns.

In [6]:
crop.info() # checking the structure of the dataframe like data types of columns, null values, etc.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2200 entries, 0 to 2199
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   N            2200 non-null   int64  
 1   P            2200 non-null   int64  
 2   K            2200 non-null   int64  
 3   temperature  2200 non-null   float64
 4   humidity     2200 non-null   float64
 5   ph           2200 non-null   float64
 6   rainfall     2200 non-null   float64
 7   label        2200 non-null   object 
dtypes: float64(4), int64(3), object(1)
memory usage: 137.6+ KB


In [7]:
crop.isnull().sum() # checking the number of null cells in each column

N              0
P              0
K              0
temperature    0
humidity       0
ph             0
rainfall       0
label          0
dtype: int64

Hence, no column in the DataFrame has any missing values.

In [9]:
print(crop.duplicated().sum()) # checking the number of duplicated rows in the DataFrame

0


Hence, the DataFrame has no duplicate rows.

In [None]:
crop['label'].value_counts() # checking the number of occurrences of each unique value in the 'label' column

label
rice           100
maize          100
chickpea       100
kidneybeans    100
pigeonpeas     100
mothbeans      100
mungbean       100
blackgram      100
lentil         100
pomegranate    100
banana         100
mango          100
grapes         100
watermelon     100
muskmelon      100
apple          100
orange         100
papaya         100
coconut        100
cotton         100
jute           100
coffee         100
Name: count, dtype: int64

Hence, each of the 22 crops in the 'label' column occurs exactly 100 times, so the classes are perfectly balanced.
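The three checks above (missing values, duplicates, class balance) can be bundled into one reusable function. The sketch below is an assumption, not part of the original notebook: `summarize_quality` is a hypothetical helper, and the small frame it runs on is made up so that each check has something to find.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame, target: str = "label") -> dict:
    """Hypothetical helper bundling the null, duplicate and class-balance checks."""
    counts = df[target].value_counts()
    return {
        "null_cells": int(df.isnull().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "n_classes": int(counts.size),
        "balanced": bool(counts.nunique() == 1),  # True when every class has the same count
    }


# Made-up frame: two classes, one duplicated row, one missing value
toy = pd.DataFrame({
    "N": [90, 90, 107, None],
    "ph": [6.5, 6.5, 6.8, 6.1],
    "label": ["rice", "rice", "coffee", "coffee"],
})
print(summarize_quality(toy))
```

Run on `crop`, the outputs above imply 0 null cells, 0 duplicate rows, 22 classes, and a balanced target.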

In [5]:
# encode the crop names as numbers and create a new column in the dataframe having those numbers

crop_dict = {
    'rice': 1,
    'maize': 2,
    'jute': 3,
    'cotton': 4,
    'coconut': 5,
    'papaya': 6,
    'orange': 7,
    'apple': 8,
    'muskmelon': 9,
    'watermelon': 10,
    'grapes': 11,
    'mango': 12,
    'banana': 13,
    'pomegranate': 14,
    'lentil': 15,
    'blackgram': 16,
    'mungbean': 17,
    'mothbeans': 18,
    'pigeonpeas': 19,
    'kidneybeans': 20,
    'chickpea': 21,
    'coffee': 22
}

crop['crop_num'] = crop['label'].map(crop_dict)
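`Series.map` with a dictionary returns NaN for any label that is not a key, so a misspelled crop name would silently become a missing value. The sketch below, using a shortened copy of the mapping and a deliberately misspelled label, shows one way to surface such labels, and builds the reverse number-to-name mapping from the same dictionary instead of retyping it. `num_to_crop` is a hypothetical name.

```python
import pandas as pd

crop_dict = {"rice": 1, "maize": 2, "coffee": 22}  # shortened copy of the mapping above

labels = pd.Series(["rice", "coffee", "maize", "ric"])  # "ric" is a deliberate typo
encoded = labels.map(crop_dict)

# map() yields NaN for labels missing from the dictionary; list them before training
unmapped = labels[encoded.isna()].unique().tolist()
print(unmapped)  # ['ric']

# Derive the reverse mapping (number -> name) from the same dictionary
num_to_crop = {num: name for name, num in crop_dict.items()}
print(num_to_crop[22])  # coffee
```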

In [12]:
crop # checking the dataframe after adding this column

        N   P   K  temperature   humidity        ph    rainfall   label  crop_num
0      90  42  43    20.879744  82.002744  6.502985  202.935536    rice         1
1      85  58  41    21.770462  80.319644  7.038096  226.655537    rice         1
2      60  55  44    23.004459  82.320763  7.840207  263.964248    rice         1
3      74  35  40    26.491096  80.158363  6.980401  242.864034    rice         1
4      78  42  42    20.130175  81.604873  7.628473  262.717340    rice         1
...   ...  ..  ..          ...        ...       ...         ...     ...       ...
2195  107  34  32    26.774637  66.413269  6.780064  177.774507  coffee        22
2196   99  15  27    27.417112  56.636362  6.086922  127.924610  coffee        22
2197  118  33  30    24.131797  67.225123  6.362608  173.322839  coffee        22
2198  117  32  34    26.272418  52.127394  6.758793  127.175293  coffee        22


In [6]:
crop.drop(['label'],axis=1,inplace=True) # remove 'label' column as we don't need it anymore after encoding the crop names

In [14]:
crop # checking the dataframe after removing 'label' column

        N   P   K  temperature   humidity        ph    rainfall  crop_num
0      90  42  43    20.879744  82.002744  6.502985  202.935536         1
1      85  58  41    21.770462  80.319644  7.038096  226.655537         1
2      60  55  44    23.004459  82.320763  7.840207  263.964248         1
3      74  35  40    26.491096  80.158363  6.980401  242.864034         1
4      78  42  42    20.130175  81.604873  7.628473  262.717340         1
...   ...  ..  ..          ...        ...       ...         ...       ...
2195  107  34  32    26.774637  66.413269  6.780064  177.774507        22
2196   99  15  27    27.417112  56.636362  6.086922  127.924610        22
2197  118  33  30    24.131797  67.225123  6.362608  173.322839        22
2198  117  32  34    26.272418  52.127394  6.758793  127.175293        22


Now that the data is prepared, we can start building the model.

First, we separate the features (the soil and climate measurements) from the target (the encoded crop number).

In [7]:
# create 'X' (features: every column of 'crop' except 'crop_num') and 'y' (target: the 'crop_num' column)

X = crop.drop(['crop_num'],axis=1)
y = crop['crop_num']

In [14]:
X # checking the feature matrix

        N   P   K  temperature   humidity        ph    rainfall
0      90  42  43    20.879744  82.002744  6.502985  202.935536
1      85  58  41    21.770462  80.319644  7.038096  226.655537
2      60  55  44    23.004459  82.320763  7.840207  263.964248
3      74  35  40    26.491096  80.158363  6.980401  242.864034
4      78  42  42    20.130175  81.604873  7.628473  262.717340
...   ...  ..  ..          ...        ...       ...         ...
2195  107  34  32    26.774637  66.413269  6.780064  177.774507
2196   99  15  27    27.417112  56.636362  6.086922  127.924610
2197  118  33  30    24.131797  67.225123  6.362608  173.322839
2198  117  32  34    26.272418  52.127394  6.758793  127.175293


In [15]:
y # checking the target vector

0        1
1        1
2        1
3        1
4        1
        ..
2195    22
2196    22
2197    22
2198    22
2199    22
Name: crop_num, Length: 2200, dtype: int64

In [16]:
X.shape # checking dimensions of the feature matrix

(2200, 7)

Hence, X has 2200 rows and 7 feature columns.

In [17]:
y.shape # checking dimensions of the target vector

(2200,)

Hence, y is a one-dimensional Series of 2200 values, one crop number per row.

In [1]:
from sklearn.model_selection import train_test_split

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) # split the data into training and testing sets
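Because every crop has 100 rows, a plain random split already keeps the classes roughly even. Passing `stratify=y` to `train_test_split` goes further and preserves each class's proportion exactly in both sets. This is an optional variation, not what the notebook does; the sketch uses synthetic data (3 classes with 10 rows each) in place of the crop data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 3 classes with 10 rows each (the real data has 22 classes x 100 rows)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.random((30, 2)), columns=["ph", "rainfall"])
y_demo = pd.Series(np.repeat([1, 2, 3], 10), name="crop_num")

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratify, each class keeps its 1/3 share: 2 of the 6 test rows per class
print(y_te.value_counts().sort_index().tolist())  # [2, 2, 2]
```

On the crop data, the same option would put exactly 20 rows of each crop into the 440-row test set.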

In [9]:
X_train # checking the training data of dataframe 'X'

Unnamed: 0,N,P,K,temperature,humidity,ph,rainfall
1656,17,16,14,16.396243,92.181519,6.625539,102.944161
752,37,79,19,27.543848,69.347863,7.143943,69.408782
892,7,73,25,27.521856,63.132153,7.288057,45.208411
1041,101,70,48,25.360592,75.031933,6.012697,116.553145
1179,0,17,30,35.474783,47.972305,6.279134,97.790725
...,...,...,...,...,...,...,...
1638,10,5,5,21.213070,91.353492,7.817846,112.983436
1095,108,94,47,27.359116,84.546250,6.387431,90.812505
1130,11,36,31,27.920633,51.779659,6.475449,100.258567
1294,11,124,204,13.429886,80.066340,6.361141,71.400430


In [10]:
y_train # checking the training data of dataframe 'y'

1656     7
752     16
892     15
1041    13
1179    12
        ..
1638     7
1095    13
1130    12
1294    11
860     15
Name: crop_num, Length: 1760, dtype: int64

In [11]:
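# scale the features of 'X_train' and 'X_test' to the [0, 1] range using MinMaxScaler
from sklearn.preprocessing import MinMaxScaler
ms = MinMaxScaler()
X_train = ms.fit_transform(X_train)  # learn each feature's min and max from the training set only
X_test = ms.transform(X_test)        # apply the same training min/max to the test set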
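With its default `feature_range=(0, 1)`, `MinMaxScaler` rescales each feature as x' = (x - min) / (max - min), where min and max are learned from the data passed to `fit` (here `X_train`). The small synthetic example below checks that formula by hand and shows a consequence of fitting on the training set only: test values beyond the training range land outside [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Synthetic single-feature example
train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[25.0], [40.0]])

ms_demo = MinMaxScaler().fit(train)  # learns min=10 and max=30 from the training rows only

# x' = (x - min) / (max - min), computed manually with the training statistics
manual = (test - train.min(axis=0)) / (train.max(axis=0) - train.min(axis=0))
scaled = ms_demo.transform(test)

print(scaled.ravel().tolist())  # [0.75, 1.5]  (40 exceeds the training max, so it maps above 1)
assert np.allclose(scaled, manual)
```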

In [14]:
# train a random forest classifier on the scaled training data
from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier()
rfc.fit(X_train, y_train)
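The notebook trains the classifier but never scores it on the held-out `X_test`/`y_test`. A minimal evaluation sketch follows; it runs on synthetic data from `make_classification` (7 features, 3 classes) so that it is self-contained, and it makes no claim about the accuracy on the crop data. With the notebook's own variables the equivalent line would be `accuracy_score(y_test, rfc.predict(X_test))`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the crop data: 7 features, 3 classes
X_syn, y_syn = make_classification(
    n_samples=300, n_features=7, n_informative=5, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=42)

scaler = MinMaxScaler()
X_tr = scaler.fit_transform(X_tr)
X_te = scaler.transform(X_te)  # reuse the training min/max on the test set

model = RandomForestClassifier(random_state=42)  # fixed seed so reruns give the same forest
model.fit(X_tr, y_tr)

acc = accuracy_score(y_te, model.predict(X_te))
print(f"test accuracy: {acc:.3f}")
```

Setting `random_state` on the classifier is a suggested addition; the notebook leaves it unset, so its forest can differ between runs.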

In [15]:
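# create a function that takes the characteristics of a plot of land and returns the recommended crop number
def recommendation(N, P, K, temperature, humidity, ph, rainfall):
    # wrap the inputs in a one-row DataFrame with the same column names the scaler was fitted on
    features = pd.DataFrame([[N, P, K, temperature, humidity, ph, rainfall]], columns=X.columns)
    # use transform(), not fit_transform(): the scaler must reuse the min/max learned from X_train
    transformed_features = ms.transform(features)
    prediction = rfc.predict(transformed_features)
    print(prediction)
    return prediction[0]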
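Why `transform` rather than `fit_transform` matters at prediction time: fitting `MinMaxScaler` on a single input row makes each feature's min equal its max. scikit-learn treats that zero range as 1, so every feature is mapped to 0 and the model receives the same all-zero input whatever values are passed in. Calling it on `ms` would also overwrite the statistics learned from `X_train`, and `ms` is the object pickled at the end. The numbers below are synthetic.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[0.0, 5.0], [100.0, 9.0]])  # synthetic training rows (two features)
sample = np.array([[40.0, 7.0]])              # one new observation

ms_demo = MinMaxScaler().fit(train)
print(ms_demo.transform(sample))  # uses the training min/max: [[0.4 0.5]]

# Refitting on the single sample makes min == max, so every feature becomes 0
print(MinMaxScaler().fit_transform(sample))  # [[0. 0.]]
```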

In [16]:
# checking if our model works for a given input
N = 40
P = 50
K = 50
temperature = 40.0
humidity = 20
ph = 100
rainfall = 100

predict = recommendation(N, P, K, temperature, humidity, ph, rainfall)

# map the predicted number back to a crop name (the reverse of the encoding used earlier)
crop_dict = {
    1: "Rice", 2: "Maize", 3: "Jute", 4: "Cotton", 5: "Coconut", 6: "Papaya", 7: "Orange",
    8: "Apple", 9: "Muskmelon", 10: "Watermelon", 11: "Grapes", 12: "Mango", 13: "Banana",
    14: "Pomegranate", 15: "Lentil", 16: "Blackgram", 17: "Mungbean", 18: "Mothbeans",
    19: "Pigeonpeas", 20: "Kidneybeans", 21: "Chickpea", 22: "Coffee"
}

if predict in crop_dict:
    crop_name = crop_dict[predict]  # a new name, so the 'crop' DataFrame is not overwritten
    print("{} is the best crop to be cultivated ".format(crop_name))
else:
    print("Sorry but we are not able to recommend a proper crop for this environment")

[9]
Muskmelon is the best crop to be cultivated 


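As the output shows, the model returned class 9 (Muskmelon) for the given land conditions. Note, however, that ph = 100 lies far outside the physical pH scale (0 to 14), so this particular recommendation is not meaningful; realistic inputs should be used when testing the model.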
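One way to catch inputs like ph = 100 before predicting is to compare them with the range seen in the data. The sketch below assumes a hypothetical helper `out_of_range`; its reference rows are three rows copied from the dataset output above, whereas in the notebook the unscaled feature matrix `X` could serve as the reference.

```python
import pandas as pd


def out_of_range(sample: dict, reference: pd.DataFrame) -> list:
    """Hypothetical helper: names of inputs outside the min/max of the reference data."""
    lo, hi = reference.min(), reference.max()
    return [name for name, value in sample.items() if not lo[name] <= value <= hi[name]]


# Reference rows copied from the dataset output in this notebook (stand-in for X)
reference = pd.DataFrame({
    "temperature": [20.879744, 26.774637, 27.417112],
    "humidity": [82.002744, 66.413269, 56.636362],
    "ph": [6.502985, 6.780064, 6.086922],
})
test_input = {"temperature": 40.0, "humidity": 20, "ph": 100}
print(out_of_range(test_input, reference))  # ['temperature', 'humidity', 'ph']
```

An empty list means every input lies within the reference range; a non-empty one is a signal to check the inputs before trusting the recommendation.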

In [17]:
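# pickle the trained model and the fitted scaler so they can be loaded later (e.g. by the Flask backend)
import pickle
with open('model.pkl', 'wb') as f:
    pickle.dump(rfc, f)
with open('minmaxscaler.pkl', 'wb') as f:
    pickle.dump(ms, f)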
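With two separate pickles, the backend has to load both and remember to scale each input before calling the model. An alternative, not what the notebook does, is to bundle both steps in a scikit-learn `Pipeline` and pickle a single object. The sketch below uses synthetic data and a hypothetical file name, `pipeline.pkl`, in a temporary directory, and checks that the reloaded pipeline predicts exactly like the original. As with any pickle, only files from a trusted source should be loaded.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the crop data
X_syn, y_syn = make_classification(
    n_samples=200, n_features=7, n_informative=5, n_classes=3, random_state=0
)

# Bundle scaling and classification so they are always applied together, in order
pipe = make_pipeline(MinMaxScaler(), RandomForestClassifier(random_state=42))
pipe.fit(X_syn, y_syn)

path = os.path.join(tempfile.mkdtemp(), "pipeline.pkl")  # hypothetical file name
with open(path, "wb") as f:
    pickle.dump(pipe, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

# The reloaded pipeline scales raw inputs itself and predicts like the original
same = bool((loaded.predict(X_syn[:5]) == pipe.predict(X_syn[:5])).all())
print(same)  # True
```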