# **Fertilizer Recommendation (Model Training)**
#### **Objective** : To train a machine learning model that predicts the fertilizer label from the given features.
#### **Problem Type** : Multiclass Classification (predict fertilizer class)
#### The dataset has 8 independent features and 7 fertilizer classes.
#### **Dataset Link** - https://www.kaggle.com/datasets/gdabhishek/fertilizer-prediction

## **Exploratory Data Analysis**

In [None]:
# Importing required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Loading the downloaded dataset
path = r"/content/Fertilizer_Prediction.csv"
df = pd.read_csv(path)

# Rename the columns whose names contain spaces (including the target column)
df = df.rename({'Fertilizer Name': 'Fertilizer','Crop Type': 'Crop_Type','Soil Type': 'Soil_Type'}, axis=1)

df.sample(15)

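| | Temparature | Humidity | Moisture | Soil_Type | Crop_Type | Nitrogen | Potassium | Phosphorous | Fertilizer |
|---|---|---|---|---|---|---|---|---|---|
| 65 | 36 | 68 | 62 | Red | Cotton | 15 | 0 | 40 | DAP |
| 44 | 35 | 67 | 42 | Sandy | Barley | 10 | 0 | 35 | DAP |
| 63 | 28 | 54 | 47 | Sandy | Barley | 5 | 18 | 15 | 10-26-26 |
| 10 | 27 | 54 | 28 | Clayey | Pulses | 13 | 0 | 40 | DAP |
| 31 | 30 | 60 | 27 | Loamy | Sugarcane | 12 | 0 | 40 | DAP |
| 32 | 34 | 65 | 38 | Clayey | Paddy | 39 | 0 | 0 | Urea |
| 57 | 29 | 58 | 37 | Sandy | Millets | 8 | 0 | 15 | 20-20 |
| 20 | 30 | 60 | 44 | Sandy | Millets | 10 | 0 | 9 | 20-20 |
| 4 | 28 | 54 | 46 | Clayey | Paddy | 35 | 0 | 0 | Urea |
| 38 | 25 | 50 | 26 | Red | Ground Nuts | 15 | 14 | 11 | 17-17-17 |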


In [None]:
print("SHAPE : ", df.shape)
df.info()

SHAPE :  (99, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99 entries, 0 to 98
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Temparature  99 non-null     int64 
 1   Humidity     99 non-null     int64 
 2   Moisture     99 non-null     int64 
 3   Soil_Type    99 non-null     object
 4   Crop_Type    99 non-null     object
 5   Nitrogen     99 non-null     int64 
 6   Potassium    99 non-null     int64 
 7   Phosphorous  99 non-null     int64 
 8   Fertilizer   99 non-null     object
dtypes: int64(6), object(3)
memory usage: 7.1+ KB


In [None]:
df.describe()

| | Temparature | Humidity | Moisture | Nitrogen | Potassium | Phosphorous |
|---|---|---|---|---|---|---|
| count | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 | 99.0 |
| mean | 30.282828 | 59.151515 | 43.181818 | 18.909091 | 3.383838 | 18.606061 |
| std | 3.502304 | 5.840331 | 11.271568 | 11.599693 | 5.814667 | 13.476978 |
| min | 25.0 | 50.0 | 25.0 | 4.0 | 0.0 | 0.0 |
| 25% | 28.0 | 54.0 | 34.0 | 10.0 | 0.0 | 9.0 |
| 50% | 30.0 | 60.0 | 41.0 | 13.0 | 0.0 | 19.0 |
| 75% | 33.0 | 64.0 | 50.5 | 24.0 | 7.5 | 30.0 |
| max | 38.0 | 72.0 | 65.0 | 42.0 | 19.0 | 42.0 |


In [None]:
# Number of samples in each class
df["Fertilizer"].value_counts()

Urea        22
DAP         18
28-28       17
14-35-14    14
20-20       14
17-17-17     7
10-26-26     7
Name: Fertilizer, dtype: int64
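
The counts above show that the classes are not equally represented. A minimal sketch, using only the counts printed above (re-entered by hand so the block runs on its own), turns them into class shares and a largest-to-smallest ratio:

```python
import pandas as pd

# Class counts copied from the value_counts() output above
counts = pd.Series(
    {"Urea": 22, "DAP": 18, "28-28": 17, "14-35-14": 14,
     "20-20": 14, "17-17-17": 7, "10-26-26": 7},
    name="Fertilizer",
)

# Share of each class in the 99-row dataset
share = (counts / counts.sum()).round(3)

# Ratio between the most and least frequent class
imbalance_ratio = counts.max() / counts.min()

print(share)
print("Largest / smallest class:", round(imbalance_ratio, 2))
```

In the notebook itself, `df["Fertilizer"].value_counts(normalize=True)` gives the same shares directly.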

## **Analyze Independent Variables**

In [None]:
# list of all numerical variables in dataset
numerical_features = [feature for feature in df.columns if df[feature].dtypes != 'O']
print('Number of numerical variables: ', len(numerical_features), numerical_features)

# list of all discrete variables in dataset
discrete_features=[feature for feature in numerical_features if len(df[feature].unique())<25]
print('Number of Discrete variables: ', len(discrete_features), discrete_features)

# list of all continuous variables in dataset
continuous_features=[feature for feature in numerical_features if feature not in discrete_features]
print('Number of Continuous variables: ', len(continuous_features), continuous_features)

# list of all categorical variables in dataset
categorical_features=[feature for feature in df.columns if df[feature].dtypes=='O']
print('Number of categorical variables: ', len(categorical_features), categorical_features)


Number of numerical variables:  6 ['Temparature', 'Humidity ', 'Moisture', 'Nitrogen', 'Potassium', 'Phosphorous']
Number of Discrete variables:  4 ['Temparature', 'Humidity ', 'Nitrogen', 'Potassium']
Number of Continuous variables:  2 ['Moisture', 'Phosphorous']
Number of categorical variables:  3 ['Soil_Type', 'Crop_Type', 'Fertilizer']


In [None]:
# Cardinality, i.e. the number of categories in each categorical feature
# (the target column Fertilizer is included)
for feature in categorical_features:
    print('The feature is {} and no. of categories are {}'.format(feature, df[feature].nunique()))

The feature is Soil_Type and no. of categories are 5
The feature is Crop_Type and no. of categories are 11
The feature is Fertilizer and no. of categories are 7


## **One-Hot Encoding the Categorical Variables**

In [None]:
# list of categorical features in dataset
categorical_features=[feature for feature in df.columns if df[feature].dtype=='O']

# Remove the Target variable.
categorical_features.remove('Fertilizer')

# encode categorical features (dtype=int keeps the columns as 0/1 on newer pandas versions)
new_encoded_columns = pd.get_dummies(df[categorical_features], dtype=int)

# Concatenating with original dataframe
df = pd.concat([df,new_encoded_columns],axis="columns")

# dropping the categorical variables since they are redundant now.
df = df.drop(categorical_features,axis="columns")
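
To make the effect of the encoding step easier to see, here is a small sketch on three rows taken from the sample shown earlier. The frame `toy` is only an illustration, and it uses the `columns=` form of `pd.get_dummies`, which encodes and drops the original columns in one call (a variation on the concat-and-drop steps above, not the notebook's exact code):

```python
import pandas as pd

# Three rows copied from the df.sample() output, reduced to three columns
toy = pd.DataFrame({
    "Soil_Type": ["Red", "Sandy", "Clayey"],
    "Crop_Type": ["Cotton", "Barley", "Pulses"],
    "Nitrogen": [15, 10, 13],
})

# Each category becomes its own 0/1 column named <feature>_<category>;
# untouched columns come first, encoded columns follow in sorted order
encoded = pd.get_dummies(toy, columns=["Soil_Type", "Crop_Type"], dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

Categories that never occur in the data get no column, so the encoded layout depends on the rows that were present when the encoding was done.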

In [None]:
df.head(10)

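| | Temparature | Humidity | Moisture | Nitrogen | Potassium | Phosphorous | Fertilizer | Soil_Type_Black | Soil_Type_Clayey | Soil_Type_Loamy | ... | Crop_Type_Cotton | Crop_Type_Ground Nuts | Crop_Type_Maize | Crop_Type_Millets | Crop_Type_Oil seeds | Crop_Type_Paddy | Crop_Type_Pulses | Crop_Type_Sugarcane | Crop_Type_Tobacco | Crop_Type_Wheat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 26 | 52 | 38 | 37 | 0 | 0 | Urea | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 29 | 52 | 45 | 12 | 0 | 36 | DAP | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 34 | 65 | 62 | 7 | 9 | 30 | 14-35-14 | 1 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 32 | 62 | 34 | 22 | 0 | 20 | 28-28 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 28 | 54 | 46 | 35 | 0 | 0 | Urea | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 5 | 26 | 52 | 35 | 12 | 10 | 13 | 17-17-17 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 25 | 50 | 64 | 9 | 0 | 10 | 20-20 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | 33 | 64 | 50 | 41 | 0 | 0 | Urea | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 8 | 30 | 60 | 42 | 21 | 0 | 18 | 28-28 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 29 | 58 | 33 | 9 | 7 | 30 | 14-35-14 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |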


## **Training the Model**

In [None]:
x = df.drop("Fertilizer",axis=1)
x.head(10)

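| | Temparature | Humidity | Moisture | Nitrogen | Potassium | Phosphorous | Soil_Type_Black | Soil_Type_Clayey | Soil_Type_Loamy | Soil_Type_Red | ... | Crop_Type_Cotton | Crop_Type_Ground Nuts | Crop_Type_Maize | Crop_Type_Millets | Crop_Type_Oil seeds | Crop_Type_Paddy | Crop_Type_Pulses | Crop_Type_Sugarcane | Crop_Type_Tobacco | Crop_Type_Wheat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 26 | 52 | 38 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 29 | 52 | 45 | 12 | 0 | 36 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 34 | 65 | 62 | 7 | 9 | 30 | 1 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 32 | 62 | 34 | 22 | 0 | 20 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 28 | 54 | 46 | 35 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 5 | 26 | 52 | 35 | 12 | 10 | 13 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 25 | 50 | 64 | 9 | 0 | 10 | 0 | 0 | 0 | 1 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | 33 | 64 | 50 | 41 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 8 | 30 | 60 | 42 | 21 | 0 | 18 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 29 | 58 | 33 | 9 | 7 | 30 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |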


In [None]:
print("Columns : ", x.columns)
x.info()

Columns :  Index(['Temparature', 'Humidity ', 'Moisture', 'Nitrogen', 'Potassium',
       'Phosphorous', 'Soil_Type_Black', 'Soil_Type_Clayey', 'Soil_Type_Loamy',
       'Soil_Type_Red', 'Soil_Type_Sandy', 'Crop_Type_Barley',
       'Crop_Type_Cotton', 'Crop_Type_Ground Nuts', 'Crop_Type_Maize',
       'Crop_Type_Millets', 'Crop_Type_Oil seeds', 'Crop_Type_Paddy',
       'Crop_Type_Pulses', 'Crop_Type_Sugarcane', 'Crop_Type_Tobacco',
       'Crop_Type_Wheat'],
      dtype='object')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99 entries, 0 to 98
Data columns (total 22 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   Temparature            99 non-null     int64
 1   Humidity               99 non-null     int64
 2   Moisture               99 non-null     int64
 3   Nitrogen               99 non-null     int64
 4   Potassium              99 non-null     int64
 5   Phosphorous            99 non-null     int64
 6   Soil_

In [None]:
y = df["Fertilizer"]
y.head(10)

0        Urea
1         DAP
2    14-35-14
3       28-28
4        Urea
5    17-17-17
6       20-20
7        Urea
8       28-28
9    14-35-14
Name: Fertilizer, dtype: object

## **Data Splitting**

In [None]:
# DATA SPLITTING 
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, shuffle=True)
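
With only 99 rows and two classes of 7 samples each, a purely random split can leave a class thinly represented in the test set, and the split changes on every run. One possible variation, sketched here on stand-in data built from the class counts printed earlier, is to pass `stratify` and a fixed `random_state`. The names `x_demo` and `y_demo` and the seed value are assumptions made for the illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Class counts copied from the value_counts() output earlier in the notebook
class_counts = {"Urea": 22, "DAP": 18, "28-28": 17, "14-35-14": 14,
                "20-20": 14, "17-17-17": 7, "10-26-26": 7}

# Stand-in data: one dummy feature per row, labels repeated per class count
y_demo = np.repeat(list(class_counts), list(class_counts.values()))
x_demo = np.arange(len(y_demo)).reshape(-1, 1)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo,
    test_size=0.2,
    stratify=y_demo,   # keep class proportions similar in both parts
    random_state=42,   # arbitrary seed, used only to make the split repeatable
)

print(len(y_tr), len(y_te))
print(sorted(set(y_te)))
```

In the notebook, the same two arguments could be added to the existing call as `stratify=y` and a seed of your choice.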

In [None]:
x_train.values[:10]

array([[34, 65, 63, 14,  0, 38,  0,  0,  0,  1,  0,  0,  1,  0,  0,  0,
         0,  0,  0,  0,  0,  0],
       [27, 54, 30, 13,  0, 13,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  1,  0,  0,  0],
       [31, 62, 48, 14, 15, 12,  0,  0,  0,  0,  1,  0,  0,  0,  1,  0,
         0,  0,  0,  0,  0,  0],
       [34, 65, 48, 23,  0, 19,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  1],
       [28, 54, 65, 39,  0,  0,  1,  0,  0,  0,  0,  0,  1,  0,  0,  0,
         0,  0,  0,  0,  0,  0],
       [34, 65, 62,  7,  9, 30,  1,  0,  0,  0,  0,  0,  1,  0,  0,  0,
         0,  0,  0,  0,  0,  0],
       [30, 60, 27,  4, 17, 17,  0,  0,  0,  1,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  1,  0],
       [29, 58, 52, 13,  0, 36,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  1],
       [25, 50, 65, 36,  0,  0,  0,  0,  1,  0,  0,  0,  1,  0,  0,  0,
         0,  0,  0,  0,  0,  0],
       [29, 58, 43, 24,  0, 18,  0,  1,  0,  0,  0,  0,

In [None]:
y_train.values[:10]

array(['DAP', '20-20', '17-17-17', '28-28', 'Urea', '14-35-14',
       '10-26-26', 'DAP', 'Urea', '28-28'], dtype=object)

## **LightGBM Classifier Model**

In [None]:
# Creating a lightgbm model
import lightgbm as lgb

model = lgb.LGBMClassifier()

# Training the model using Training Data
model.fit(x_train,y_train)

LGBMClassifier()

In [None]:
# Make Prediction
output = model.predict([[27, 54, 28, 13, 0, 40, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]])
print("Predicted Fertilizer : ",output[0])

Predicted Fertilizer :  DAP
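
A single prediction does not show how the model behaves on the held-out rows. The helper below is a sketch of one way to score any fitted classifier with accuracy and macro-averaged F1. The function name `evaluate` is hypothetical, and the demo uses a majority-class `DummyClassifier` on toy labels purely so the block runs on its own. These are not results from the LightGBM model.

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score


def evaluate(model, x_test, y_test):
    """Return accuracy and macro-averaged F1 for a fitted classifier."""
    y_pred = model.predict(x_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        # Macro averaging weights every class equally, which matters
        # when some classes have very few samples
        "macro_f1": f1_score(y_test, y_pred, average="macro", zero_division=0),
    }


# Stand-in demo: a baseline that always predicts the most frequent label.
# Toy values only, not data or results from the notebook.
toy_x = [[0], [1], [2], [3]]
toy_y = ["Urea", "Urea", "DAP", "Urea"]
baseline = DummyClassifier(strategy="most_frequent").fit(toy_x, toy_y)

scores = evaluate(baseline, toy_x, toy_y)
print(scores)
```

In the notebook, the call would be `evaluate(model, x_test, y_test)` with the objects created in the earlier cells.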


## **Input Function**

In [None]:
import numpy as np

def get_input(x):
    """Convert [temperature, humidity, moisture, nitrogen, potassium,
    phosphorous, soil type, crop type] into the 22-value vector the model expects."""

    # Index of each variable in the encoded feature vector
    x_structure = {
        "Temparature": 0, "Humidity": 1, "Moisture": 2, "Nitrogen": 3,
        "Potassium": 4, "Phosphorous": 5, "Black": 6,  "Clayey": 7, "Loamy": 8,
        "Red": 9, "Sandy": 10, "Barley": 11, "Cotton": 12, "Ground Nuts": 13, "Maize": 14,
        "Millets": 15, "Oil seeds": 16, "Paddy": 17, "Pulses": 18, "Sugarcane": 19, "Tobacco": 20,
        "Wheat": 21
    }

    output = np.zeros(len(x_structure))
    # The first six entries are the numeric features, in training order
    output[:6] = x[:6]
    # Set the one-hot positions for the soil type and the crop type
    output[x_structure[x[6]]] = 1
    output[x_structure[x[7]]] = 1
    return output


input_x = get_input([27, 54, 28, 13, 0, 40, "Clayey", "Pulses"])
input_x

array([27., 54., 28., 13.,  0., 40.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.])
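
The index table in `get_input` has to be kept in sync with the training columns by hand. As an alternative sketch, the same vector can be built by encoding a one-row frame and aligning it to the training column order with `reindex`. The function name `encode_sample` and the constant `FEATURE_COLUMNS` are hypothetical. The column list is copied from the `x.columns` output shown earlier, including the trailing space in `'Humidity '`.

```python
import pandas as pd

# Column order copied from the x.columns output earlier in the notebook
# (note the trailing space in "Humidity ", as it appears in the dataset)
FEATURE_COLUMNS = [
    "Temparature", "Humidity ", "Moisture", "Nitrogen", "Potassium",
    "Phosphorous", "Soil_Type_Black", "Soil_Type_Clayey", "Soil_Type_Loamy",
    "Soil_Type_Red", "Soil_Type_Sandy", "Crop_Type_Barley",
    "Crop_Type_Cotton", "Crop_Type_Ground Nuts", "Crop_Type_Maize",
    "Crop_Type_Millets", "Crop_Type_Oil seeds", "Crop_Type_Paddy",
    "Crop_Type_Pulses", "Crop_Type_Sugarcane", "Crop_Type_Tobacco",
    "Crop_Type_Wheat",
]


def encode_sample(sample, columns=FEATURE_COLUMNS):
    """Encode one raw sample (a dict) into a one-row frame in training order."""
    row = pd.get_dummies(pd.DataFrame([sample]),
                         columns=["Soil_Type", "Crop_Type"], dtype=int)
    # Reject categories or features that were not present at training time
    unknown = set(row.columns) - set(columns)
    if unknown:
        raise ValueError(f"Unknown category or feature: {sorted(unknown)}")
    # Add the missing one-hot columns as 0 and fix the column order
    return row.reindex(columns=columns, fill_value=0)


encoded = encode_sample({
    "Temparature": 27, "Humidity ": 54, "Moisture": 28, "Nitrogen": 13,
    "Potassium": 0, "Phosphorous": 40,
    "Soil_Type": "Clayey", "Crop_Type": "Pulses",
})
print(encoded.iloc[0].tolist())
```

Unlike the dictionary lookup, this version raises an error that names the unexpected value when a soil or crop type was not seen during training.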

In [None]:
# Make Prediction
x1 = get_input([27, 54, 28, 13, 0, 40, "Clayey", "Pulses"])

y1 = model.predict([x1])
print("Predicted Fertilizer : ",y1[0])

Predicted Fertilizer :  DAP


In [387]:
# Save the model
model.booster_.save_model("fertilizer_model.txt")

<lightgbm.basic.Booster at 0x7f85daac1a30>
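
The text file written by `booster_.save_model` holds only the underlying Booster. When it is reloaded with `lgb.Booster(model_file="fertilizer_model.txt")`, `predict` returns one probability per class for a multiclass model rather than a fertilizer name, so the class labels have to be kept separately. The sketch below shows the decoding step with NumPy only. The label order is assumed to be the sorted class names, as exposed by `model.classes_` at training time, and the probability row is an illustrative value, not real model output.

```python
import numpy as np

# Assumed label order: sorted class names, which should match model.classes_
CLASSES = np.array(["10-26-26", "14-35-14", "17-17-17",
                    "20-20", "28-28", "DAP", "Urea"])


def decode(proba, classes=CLASSES):
    """Map rows of class probabilities to class names via argmax."""
    proba = np.atleast_2d(proba)
    return classes[np.argmax(proba, axis=1)]


# In the notebook the probabilities would come from the reloaded booster:
#   booster = lgb.Booster(model_file="fertilizer_model.txt")
#   proba = booster.predict([x1])
# Illustrative probability row (made up for the demo):
example = [0.01, 0.02, 0.02, 0.05, 0.05, 0.80, 0.05]
print(decode(example))
```

Saving `model.classes_` next to the model file, or pickling the whole `LGBMClassifier`, avoids having to hard-code the label order.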

## **CONCLUSION : The objective of this notebook has been achieved. We trained and saved our model, which we can now use in a production environment.**