# Practical Part: Naive Bayes and AdaBoost in Agronomy

In this practical session, we will implement Naive Bayes and AdaBoost classifiers. We'll simulate agronomy-related data to illustrate how these methods work in predicting crop health based on features like soil moisture, leaf color, and pest presence.

In [1]:
# Importing required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

## Dataset Creation

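Let's create a synthetic dataset in which every column is sampled uniformly at random (so the features carry no real information about the target) and where: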
- Each row represents a plant observation.
- Features include `soil_moisture`, `leaf_color`, and `pest_presence`.
- The target variable is `health`, with classes "healthy" and "unhealthy".

In [2]:
# Simulating a dataset with agronomy-related features
np.random.seed(42)
data = pd.DataFrame({
    'soil_moisture': np.random.choice(['low', 'medium', 'high'], size=100),
    'leaf_color': np.random.choice(['green', 'yellow', 'brown'], size=100),
    'pest_presence': np.random.choice(['present', 'absent'], size=100),
    'health': np.random.choice(['healthy', 'unhealthy'], size=100)
})

# Preview of the dataset
data.head()


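|   | soil_moisture | leaf_color | pest_presence | health    |
|---|---------------|------------|---------------|-----------|
| 0 | high          | brown      | absent        | unhealthy |
| 1 | low           | brown      | present       | unhealthy |
| 2 | high          | brown      | absent        | healthy   |
| 3 | high          | green      | present       | unhealthy |
| 4 | low           | brown      | absent        | healthy   |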


## Encoding Categorical Variables

The scikit-learn implementations of Naive Bayes and AdaBoost expect numerical input, so we will convert our categorical features into numeric form using one-hot encoding: each category becomes its own indicator column.

In [3]:
# Encoding categorical variables
data_encoded = pd.get_dummies(data, columns=['soil_moisture', 'leaf_color', 'pest_presence'])
X = data_encoded.drop('health', axis=1)
y = data['health'].apply(lambda x: 1 if x == 'healthy' else 0)  # Encoding target variable
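
To make the effect of one-hot encoding concrete, here is a minimal sketch on a tiny hypothetical frame (the `toy` data is invented for illustration and is not part of the dataset above). Passing `dtype=int` is a choice made here so the indicators print as 0/1 rather than booleans.

```python
import pandas as pd

# Hypothetical four-row frame, used only to show the shape of the encoding
toy = pd.DataFrame({'leaf_color': ['green', 'yellow', 'brown', 'green']})

# Each category of leaf_color becomes its own 0/1 indicator column
toy_encoded = pd.get_dummies(toy, columns=['leaf_color'], dtype=int)
print(toy_encoded)
```

Every row has exactly one indicator set to 1, namely the column that matches its original category.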


## Splitting the Dataset

We’ll split our data into training and testing sets to check our model’s performance on unseen data.


In [4]:
# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


## Naive Bayes Classifier

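We’ll train a Gaussian Naive Bayes model (`GaussianNB`) on the training data to classify plant health. Note that `GaussianNB` models each column as a continuous, normally distributed feature, which is only a rough approximation for 0/1 indicator columns.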


In [5]:
# Training the Naive Bayes classifier
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

# Making predictions and evaluating the Naive Bayes model
y_pred_nb = nb_model.predict(X_test)
print("Naive Bayes Accuracy:", accuracy_score(y_test, y_pred_nb))
print("Classification Report:\n", classification_report(y_test, y_pred_nb))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_nb))


Naive Bayes Accuracy: 0.7
Classification Report:
               precision    recall  f1-score   support

           0       0.78      0.64      0.70        11
           1       0.64      0.78      0.70         9

    accuracy                           0.70        20
   macro avg       0.71      0.71      0.70        20
weighted avg       0.71      0.70      0.70        20

Confusion Matrix:
 [[7 4]
 [2 7]]
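
Because the one-hot columns are binary, a Bernoulli Naive Bayes model is a natural alternative to the Gaussian variant. The sketch below is one possible way to try it, not part of the original workflow; it recreates the dataset so that it runs on its own, and the names `demo`, `X_demo`, `y_demo`, and `bnb_model` are introduced here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Recreate the same synthetic dataset so this block is self-contained
np.random.seed(42)
demo = pd.DataFrame({
    'soil_moisture': np.random.choice(['low', 'medium', 'high'], size=100),
    'leaf_color': np.random.choice(['green', 'yellow', 'brown'], size=100),
    'pest_presence': np.random.choice(['present', 'absent'], size=100),
    'health': np.random.choice(['healthy', 'unhealthy'], size=100)
})

# One-hot encode the features as 0/1 integers and encode the target as 1 = healthy
feature_cols = ['soil_moisture', 'leaf_color', 'pest_presence']
X_demo = pd.get_dummies(demo[feature_cols], columns=feature_cols, dtype=int)
y_demo = (demo['health'] == 'healthy').astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# BernoulliNB models each feature as a binary (present/absent) variable
bnb_model = BernoulliNB()
bnb_model.fit(X_tr, y_tr)
y_pred_bnb = bnb_model.predict(X_te)
print("Bernoulli Naive Bayes Accuracy:", accuracy_score(y_te, y_pred_bnb))
```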


### Real-Life Example with Naive Bayes

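Imagine we're classifying crop health based on leaf color and soil moisture. A farmer notices that plants with yellow leaves and low soil moisture often become unhealthy. Naive Bayes quantifies this by estimating, for each class, the prior probability of the class and the probability of each feature value given that class, then combining them with Bayes' rule. The "naive" part is the assumption that the features are conditionally independent given the class.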
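
The farmer's reasoning can be worked through by hand. The sketch below uses hypothetical probabilities (all numbers are invented for illustration, not estimated from the dataset above) to compute the posterior for a plant with yellow leaves and low soil moisture.

```python
# Hypothetical probabilities, invented purely for illustration
prior = {'healthy': 0.6, 'unhealthy': 0.4}        # P(class)
p_yellow = {'healthy': 0.1, 'unhealthy': 0.5}     # P(leaf_color = yellow | class)
p_low = {'healthy': 0.2, 'unhealthy': 0.6}        # P(soil_moisture = low | class)

# Naive Bayes: multiply the prior by each per-feature likelihood
unnormalized = {c: prior[c] * p_yellow[c] * p_low[c] for c in prior}

# Normalize so the posteriors sum to 1
total = sum(unnormalized.values())
posterior = {c: unnormalized[c] / total for c in unnormalized}

for c, p in posterior.items():
    print(f"P({c} | yellow, low) = {p:.3f}")
```

With these made-up numbers, the unhealthy class receives roughly 91% of the posterior mass, even though healthy plants are more common a priori.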


## AdaBoost Classifier

Now, let's train an AdaBoost classifier on the same plant health task. We'll use a depth-1 decision tree (a "decision stump") as the weak learner.


In [6]:
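# Training the AdaBoost classifier with a decision stump as the weak learner
# Note: `estimator` was called `base_estimator` before scikit-learn 1.2; the old name was removed in 1.4
ada_model = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=42)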
ada_model.fit(X_train, y_train)

# Making predictions and evaluating the AdaBoost model
y_pred_ada = ada_model.predict(X_test)
print("AdaBoost Accuracy:", accuracy_score(y_test, y_pred_ada))
print("Classification Report:\n", classification_report(y_test, y_pred_ada))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_ada))


AdaBoost Accuracy: 0.75
Classification Report:
               precision    recall  f1-score   support

           0       0.75      0.82      0.78        11
           1       0.75      0.67      0.71         9

    accuracy                           0.75        20
   macro avg       0.75      0.74      0.74        20
weighted avg       0.75      0.75      0.75        20

Confusion Matrix:
 [[9 2]
 [3 6]]
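
The figures in the classification report can be recovered directly from the confusion matrix. As a small check, the sketch below recomputes precision, recall, and accuracy for class 1 from the AdaBoost matrix printed above, using scikit-learn's convention that rows are true labels and columns are predicted labels.

```python
import numpy as np

# AdaBoost confusion matrix from the output above (rows: true, columns: predicted)
cm = np.array([[9, 2],
               [3, 6]])

# For binary labels 0/1, the flattened order is TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()

precision_1 = tp / (tp + fp)       # of the plants predicted healthy, how many were healthy
recall_1 = tp / (tp + fn)          # of the truly healthy plants, how many were found
accuracy = (tp + tn) / cm.sum()    # share of all test plants classified correctly

print(f"precision={precision_1:.2f} recall={recall_1:.2f} accuracy={accuracy:.2f}")
```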




### Real-Life Example with AdaBoost

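Imagine a scenario where we’re classifying crop health based on several observations. With AdaBoost, weak learners are trained one after another: after each round, the observations that were misclassified receive a larger weight, so the next learner concentrates on them. The final prediction is a weighted vote of all the weak learners.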
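
To illustrate the reweighting idea, here is a minimal sketch of one round of the classic discrete AdaBoost update on five hypothetical observations. The labels and predictions are invented, and the variant scikit-learn runs internally may differ in its details.

```python
import numpy as np

# Hypothetical labels in {-1, +1} and the output of one weak learner
y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, -1, 1])   # the observation at index 1 is misclassified

# Start with uniform weights
w = np.full(len(y_true), 1 / len(y_true))

# Weighted error of the weak learner
miss = y_pred != y_true
eps = w[miss].sum()

# Learner weight: larger when the weighted error is smaller
alpha = 0.5 * np.log((1 - eps) / eps)

# Increase weights of misclassified points, decrease the rest, then renormalize
w_new = w * np.exp(-alpha * y_true * y_pred)
w_new /= w_new.sum()

print("error:", eps, "alpha:", round(float(alpha), 3))
print("new weights:", w_new)
```

After the update, the single misclassified observation carries half of the total weight, which is what pushes the next weak learner to get it right.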


## Comparison of Naive Bayes and AdaBoost

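| Model       | Test accuracy (20 samples) |
|-------------|----------------------------|
| Naive Bayes | 70%                        |
| AdaBoost    | 75%                        |

AdaBoost scores higher on this split, but the gap is a single test sample out of 20 (15 versus 14 correct), so it should not be read as evidence that one method is better. In addition, the `health` labels in this dataset were drawn at random, independently of the features, so neither model has a real pattern to learn and both would be expected to hover around 50% accuracy on new data.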


In [7]:
# Displaying the comparison
print(f"Naive Bayes Accuracy: {accuracy_score(y_test, y_pred_nb):.2f}")
print(f"AdaBoost Accuracy: {accuracy_score(y_test, y_pred_ada):.2f}")


Naive Bayes Accuracy: 0.70
AdaBoost Accuracy: 0.75
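
A single 20-sample test set gives a noisy estimate, so one way to get a steadier comparison is k-fold cross-validation. The sketch below is a suggestion rather than part of the original workflow; it recreates the dataset so that it runs on its own, and the names `cv_data`, `X_cv`, `y_cv`, and `cv_scores` are introduced here. `AdaBoostClassifier` is left with its default weak learner, which is a depth-1 decision tree.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Recreate the same synthetic dataset so this block is self-contained
np.random.seed(42)
cv_data = pd.DataFrame({
    'soil_moisture': np.random.choice(['low', 'medium', 'high'], size=100),
    'leaf_color': np.random.choice(['green', 'yellow', 'brown'], size=100),
    'pest_presence': np.random.choice(['present', 'absent'], size=100),
    'health': np.random.choice(['healthy', 'unhealthy'], size=100)
})

feature_cols = ['soil_moisture', 'leaf_color', 'pest_presence']
X_cv = pd.get_dummies(cv_data[feature_cols], columns=feature_cols, dtype=int)
y_cv = (cv_data['health'] == 'healthy').astype(int)

models = {
    'Naive Bayes': GaussianNB(),
    'AdaBoost': AdaBoostClassifier(n_estimators=50, random_state=42),
}

# 5-fold cross-validated accuracy for each model
cv_scores = {
    name: cross_val_score(model, X_cv, y_cv, cv=5, scoring='accuracy')
    for name, model in models.items()
}

for name, scores in cv_scores.items():
    print(f"{name}: mean accuracy {scores.mean():.2f} (std {scores.std():.2f})")
```

Comparing the mean and the spread across folds gives a better sense of whether a difference between the two models is larger than the fold-to-fold noise.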


## Conclusion

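- **Naive Bayes** is fast and simple, and it works well when features are roughly conditionally independent given the class; strongly correlated features violate that assumption and can distort its probability estimates.
- **AdaBoost** builds a strong classifier from many weak learners by reweighting misclassified instances at each round, which lets it capture more complex patterns, although it can be sensitive to noisy labels.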
