# Multi-Label Classification with an Irrigation Machine

Multi-label classification problems differ from multi-class problems in that each observation can be labeled with zero or more classes. Because the classes (labels) are not mutually exclusive, you could water all, none, or any combination of farm parcels based on the inputs.

To account for this, we use an output layer with as many neurons as classes but, unlike in multi-class problems, each output neuron has its own sigmoid activation function. Each neuron in the output layer can therefore output a number between 0 and 1 independently of the others.
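
To make the difference concrete, here is a minimal sketch in plain Python that applies both sigmoid and softmax to the same three raw outputs. The logit values are made up for illustration and do not come from the dataset.

```python
import math

def sigmoid(z):
    """Squash a single logit into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs (logits) for the three parcels
logits = [2.0, 1.5, -3.0]

sigmoid_out = [sigmoid(z) for z in logits]
softmax_out = softmax(logits)

print("sigmoid:", [round(p, 3) for p in sigmoid_out], "sum =", round(sum(sigmoid_out), 3))
print("softmax:", [round(p, 3) for p in softmax_out], "sum =", round(sum(softmax_out), 3))
```

With sigmoid, the first two parcels can both score above 0.5 at the same time, whereas softmax forces the three outputs to compete for a total of 1.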

Here, we have a dataset from a farm field equipped with an array of 20 sensors distributed across 3 crop fields (parcels). These sensors measure, among other things, soil humidity and solar radiation. Our task is to use the combination of sensor measurements to decide which parcels to water, given that each parcel has different environmental requirements.

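Each sensor reports a whole-number reading between 0 and 13 volts (stored as floats in the CSV). The parcels to water are represented as a binary vector of length 3, where each index corresponds to one parcel. Because parcels can be watered simultaneously, more than one entry can be 1, so this is a multi-hot rather than a strictly one-hot encoding.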
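
As a small illustration of this label encoding, the sketch below converts a set of parcel indices into a length-3 binary vector and back. The helper names `encode_parcels` and `decode_parcels` are hypothetical and only serve this example.

```python
NUM_PARCELS = 3  # parcel_0, parcel_1, parcel_2

def encode_parcels(parcels_to_water):
    """Return a length-3 binary vector with a 1 for every parcel to water."""
    return [1 if i in parcels_to_water else 0 for i in range(NUM_PARCELS)]

def decode_parcels(label_vector):
    """Return the indices of the parcels flagged in a binary label vector."""
    return [i for i, flag in enumerate(label_vector) if flag == 1]

print(encode_parcels({0, 1}))     # parcels 0 and 1 watered together
print(encode_parcels(set()))      # no parcel watered
print(decode_parcels([0, 1, 0]))  # only parcel 1 watered
```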

In [1]:
# import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Import the sequential model and dense layer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.utils import to_categorical


In [2]:
# read the data
irrigation_machine = pd.read_csv('data/irrigation_machine.csv')
irrigation_machine.head()

,Unnamed: 0,sensor_0,sensor_1,sensor_2,sensor_3,sensor_4,sensor_5,sensor_6,sensor_7,sensor_8,...,sensor_13,sensor_14,sensor_15,sensor_16,sensor_17,sensor_18,sensor_19,parcel_0,parcel_1,parcel_2
0,0,1.0,2.0,1.0,7.0,0.0,1.0,1.0,4.0,0.0,...,8.0,1.0,0.0,2.0,1.0,9.0,2.0,0,1,0
1,1,5.0,1.0,3.0,5.0,2.0,2.0,1.0,2.0,3.0,...,4.0,5.0,5.0,2.0,2.0,2.0,7.0,0,0,0
2,2,3.0,1.0,4.0,3.0,4.0,0.0,1.0,6.0,0.0,...,3.0,3.0,1.0,0.0,3.0,1.0,0.0,1,1,0
3,3,2.0,2.0,4.0,3.0,5.0,0.0,3.0,2.0,2.0,...,4.0,1.0,1.0,4.0,1.0,3.0,2.0,0,0,0
4,4,4.0,3.0,3.0,2.0,5.0,1.0,3.0,1.0,1.0,...,1.0,3.0,2.0,2.0,1.0,1.0,0.0,1,1,0


In [4]:
irrigation_machine.describe()

,Unnamed: 0,sensor_0,sensor_1,sensor_2,sensor_3,sensor_4,sensor_5,sensor_6,sensor_7,sensor_8,...,sensor_13,sensor_14,sensor_15,sensor_16,sensor_17,sensor_18,sensor_19,parcel_0,parcel_1,parcel_2
count,2000.0,2000.0,2000.0,2000.0,2000.0,2000.0,2000.0,2000.0,2000.0,2000.0,...,2000.0,2000.0,2000.0,2000.0,2000.0,2000.0,2000.0,2000.0,2000.0,2000.0
mean,999.5,1.437,1.659,2.6545,2.6745,2.8875,1.411,3.3155,4.2015,1.214,...,2.7315,3.416,1.2065,2.325,1.7295,2.2745,1.8135,0.6355,0.7305,0.212
std,577.494589,1.321327,1.338512,1.699286,1.855875,1.816451,1.339394,2.206444,2.280241,1.386782,...,1.774537,1.960578,1.258034,1.715181,1.561265,1.67169,1.469285,0.48141,0.443811,0.408827
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,499.75,0.0,1.0,1.0,1.0,2.0,0.0,2.0,3.0,0.0,...,1.0,2.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0
50%,999.5,1.0,1.0,2.0,2.0,3.0,1.0,3.0,4.0,1.0,...,2.0,3.0,1.0,2.0,1.0,2.0,2.0,1.0,1.0,0.0
75%,1499.25,2.0,2.0,4.0,4.0,4.0,2.0,5.0,6.0,2.0,...,4.0,5.0,2.0,3.0,3.0,3.0,3.0,1.0,1.0,0.0
max,1999.0,8.0,9.0,10.0,11.0,12.0,7.0,13.0,12.0,8.0,...,11.0,11.0,6.0,10.0,11.0,10.0,7.0,1.0,1.0,1.0


In [6]:
irrigation_machine.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 24 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  2000 non-null   int64  
 1   sensor_0    2000 non-null   float64
 2   sensor_1    2000 non-null   float64
 3   sensor_2    2000 non-null   float64
 4   sensor_3    2000 non-null   float64
 5   sensor_4    2000 non-null   float64
 6   sensor_5    2000 non-null   float64
 7   sensor_6    2000 non-null   float64
 8   sensor_7    2000 non-null   float64
 9   sensor_8    2000 non-null   float64
 10  sensor_9    2000 non-null   float64
 11  sensor_10   2000 non-null   float64
 12  sensor_11   2000 non-null   float64
 13  sensor_12   2000 non-null   float64
 14  sensor_13   2000 non-null   float64
 15  sensor_14   2000 non-null   float64
 16  sensor_15   2000 non-null   float64
 17  sensor_16   2000 non-null   float64
 18  sensor_17   2000 non-null   float64
 19  sensor_18   2000 non-null  

According to `info()`, the DataFrame has a total of 24 columns: one leftover index column (`Unnamed: 0`) from the CSV export, 20 columns for the sensors, and 3 columns for the farm parcels.

There are no null values.

Next, we build and train a small neural network: it takes the 20 sensor readings as input, passes them through one hidden layer, and ends in 3 output neurons, one per parcel. We use sigmoid outputs because we no longer need the outputs to sum to 1, as a softmax would enforce. Instead, each output neuron should be able to take a value between 0 and 1 on its own, and the sigmoid activation constrains each neuron's output to that range.

For the loss function, we'll use binary cross-entropy when we compile our model. We can look at it as if we were solving several binary classification problems at once: for each output, we decide whether or not its corresponding label is present given the current input.
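
The "several binary problems" view can be written out by hand. The following is a minimal sketch of binary cross-entropy for a single sample, averaged over its three labels; the label vector and the two prediction vectors are invented for illustration, and the clipping constant `eps` is an assumption of this sketch rather than a value taken from the notebook.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over the labels of one sample."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

y_true = [1, 1, 0]           # water parcels 0 and 1, leave parcel 2 dry
good_pred = [0.9, 0.8, 0.1]  # confident and correct on every label
bad_pred = [0.2, 0.3, 0.9]   # confident and wrong on every label

loss_good = binary_cross_entropy(y_true, good_pred)
loss_bad = binary_cross_entropy(y_true, bad_pred)

print("loss for good prediction:", round(loss_good, 4))
print("loss for bad prediction: ", round(loss_bad, 4))
```

Each label contributes its own term to the loss, so a mistake on one parcel is penalized regardless of how the other two parcels were predicted.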

In [25]:
# Instantiate a Sequential model
model = Sequential()

# Declare the input: one value per sensor (20 features)
model.add(Input(shape=(20,)))

# Add a hidden layer of 64 neurons with ReLU activation
model.add(Dense(64, activation='relu'))

# Add an output layer of 3 neurons with sigmoid activation
model.add(Dense(3, activation='sigmoid'))

# Compile the model with binary cross-entropy loss
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.summary()
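
The output of `model.summary()` is not reproduced here, but the number of trainable parameters can be derived by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name used only for this sketch.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

hidden_params = dense_params(20, 64)  # 20 sensor inputs -> 64 hidden neurons
output_params = dense_params(64, 3)   # 64 hidden neurons -> 3 parcel outputs
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # → 1344 195 1539
```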

In [26]:
# Separate the 20 sensor features from the 3 parcel labels (dropping the leftover index column)
sensors = irrigation_machine.drop(columns=['Unnamed: 0', 'parcel_0', 'parcel_1', 'parcel_2']).values
parcels = irrigation_machine[['parcel_0', 'parcel_1', 'parcel_2']].values

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(sensors, parcels, test_size=0.2, random_state=42)

# Train for 100 epochs, holding out 20% of the training data for validation
model.fit(X_train, y_train, epochs=100, validation_split=0.2)

# predict on testing set
preds = model.predict(X_test)
preds_rounded = np.round(preds)

# print rounded preds
print('Rounded Predictions: \n', preds_rounded)

Epoch 1/100
40/40 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.7105 - loss: 1.0237 - val_accuracy: 0.4750 - val_loss: 0.5397
Epoch 2/100
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.4945 - loss: 0.5155 - val_accuracy: 0.5906 - val_loss: 0.4208
Epoch 3/100
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6056 - loss: 0.4141 - val_accuracy: 0.6500 - val_loss: 0.3625
Epoch 4/100
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6783 - loss: 0.3493 - val_accuracy: 0.6156 - val_loss: 0.3281
Epoch 5/100
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6125 - loss: 0.3215 - val_accuracy: 0.6562 - val_loss: 0.3113
Epoch 6/100
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.6811 - loss: 0.2906 - val_accuracy: 0.6594 - val_loss: 0.2925
Epoch 7/100
40/40 ━━━

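The `40/40` step counter in the training log follows from the split sizes. The arithmetic below is a sketch that assumes the Keras default batch size of 32, since no `batch_size` is passed to `fit` or `evaluate` in this notebook.

```python
import math

n_rows = 2000            # rows reported by irrigation_machine.info()
test_size = 0.2          # passed to train_test_split
validation_split = 0.2   # passed to model.fit
batch_size = 32          # assumption: Keras default when batch_size is not given

n_test = round(n_rows * test_size)       # rows held out for testing
n_fit = n_rows - n_test                  # rows handed to model.fit
n_val = round(n_fit * validation_split)  # rows Keras sets aside for validation
n_train = n_fit - n_val                  # rows actually used for gradient updates

steps_per_epoch = math.ceil(n_train / batch_size)
test_steps = math.ceil(n_test / batch_size)

print(n_train, n_val, n_test)       # → 1280 320 400
print(steps_per_epoch, test_steps)  # → 40 13
```

So each epoch updates the weights on 1280 rows and validates on 320, while the 400 test rows are only touched by `predict` and `evaluate`.
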
In [27]:
# evaluate model's accuracy on the test data
accuracy = model.evaluate(X_test, y_test)[1]

print('Accuracy:', accuracy)

13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.7160 - loss: 0.2974
Accuracy: 0.7099999785423279
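
A single accuracy number can be hard to interpret for a multi-label model. Depending on the Keras version, the `'accuracy'` string may be resolved to a categorical (argmax-based) accuracy when the output layer has more than one unit, which would not check each parcel separately. As a hedged alternative, the sketch below computes per-label accuracy, overall binary accuracy, and exact-match accuracy with NumPy. The function name `multilabel_report` and the demo arrays are invented for this example and are not results from the notebook.

```python
import numpy as np

def multilabel_report(y_true, probs, threshold=0.5):
    """Threshold predicted probabilities and summarize multi-label accuracy."""
    y_true = np.asarray(y_true)
    y_hat = (np.asarray(probs) >= threshold).astype(int)  # explicit cut-off per label
    correct = (y_hat == y_true)
    return {
        "per_label_accuracy": correct.mean(axis=0),  # one value per parcel
        "binary_accuracy": correct.mean(),           # fraction of all label decisions that are right
        "exact_match": correct.all(axis=1).mean(),   # fraction of rows with every parcel right
    }

# Tiny made-up example (not taken from the dataset)
y_true_demo = np.array([[0, 1, 0],
                        [1, 1, 0],
                        [0, 0, 0],
                        [1, 1, 0]])
probs_demo = np.array([[0.1, 0.9, 0.2],
                       [0.8, 0.4, 0.1],
                       [0.3, 0.2, 0.1],
                       [0.9, 0.7, 0.6]])

report = multilabel_report(y_true_demo, probs_demo)
for name, value in report.items():
    print(name, value)
```

On the notebook's own data, the same function could be called as `multilabel_report(y_test, preds)`. Using an explicit threshold instead of `np.round` also makes the treatment of a probability of exactly 0.5 unambiguous, since `np.round` rounds halves to the nearest even number.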
