# **OPEN-ARC**
---

### Project 8: Plant Stress Prediction Model:
**Challenge:** Create an AI model capable of predicting a plant's stress level.


### Terms and Use:
Learn more about the project's [LICENSE](https://github.com/Infinitode/OPEN-ARC/blob/main/LICENSE) and read our [CODE_OF_CONDUCT](https://github.com/Infinitode/OPEN-ARC/blob/main/CODE_OF_CONDUCT) before contributing to the project. You can contribute to this project from here: [https://github.com/Infinitode/OPEN-ARC/](https://github.com/Infinitode/OPEN-ARC/).

---

Please fill out this performance sheet to help others quickly see your model's performance **(optional)**:

### Performance Sheet:
| Contributor | Architecture Type | Platform | Base Model | Dataset | Accuracy | Link |
|-------------|-------------------|----------|------------|---------|----------|------|
| Infinitode  | XGBClassifier  | Kaggle   | ✔  | Plant-Health-Data | 99.1%    | [Notebook](https://github.com/Infinitode/OPEN-ARC/Project-8-PSPM/project-8-pspm.ipynb) |
| Username  | Unknown  | Kaggle   | ✗/✔  | Plant-Health-Data | Score    | [Notebook](https://github.com) |

---

### Model: XGBClassifier:
This implementation uses XGBoost's `XGBClassifier`, a gradient-boosted decision tree classifier. You can learn more about XGBoost classifiers here: [https://apmonitor.com/pds/index.php/Main/XGBoostClassifier](https://apmonitor.com/pds/index.php/Main/XGBoostClassifier)

```python
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from xgboost import XGBClassifier

# Load dataset
file_path = "/kaggle/input/plant-health-data/plant_health_data.csv"
data = pd.read_csv(file_path)

# Drop identifier columns that carry no predictive signal
data = data.drop(['Timestamp', 'Plant_ID'], axis=1)

# Data exploration
print(data.head())
data.info()  # info() prints its summary itself and returns None
print(data.describe())

# Encode the target labels as integers
health_mapping = {
    "Healthy": 0,
    "Moderate Stress": 1,
    "High Stress": 2
}
data['Plant_Health_Status'] = data['Plant_Health_Status'].map(health_mapping)

# Handle missing values (this also removes rows whose label was not in
# health_mapping, because map() turns unmapped labels into NaN)
data = data.dropna()

# Split dataset into features and target
X = data.drop(columns=["Plant_Health_Status"])
y = data["Plant_Health_Status"].astype(int)

# Train-test split (stratified so each class keeps its proportion)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Initialize and train the model
xgb = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=6,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=42
)
xgb.fit(X_train, y_train)

# Predictions and evaluation
y_pred = xgb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2%}")
print(classification_report(y_test, y_pred))
```
```
   Soil_Moisture  Ambient_Temperature  Soil_Temperature   Humidity  \
0      27.521109            22.240245         21.900435  55.291904
1      14.835566            21.706763         18.680892  63.949181
2      17.086362            21.180946         15.392939  67.837956
3      15.336156            22.593302         22.778394  58.190811
4      39.822216            28.929001         18.100937  63.772036

   Light_Intensity   Soil_pH  Nitrogen_Level  Phosphorus_Level  \
0       556.172805  5.581955       10.003650         45.806852
1       596.136721  7.135705       30.712562         25.394393
2       591.124627  5.656852       29.337002         27.573892
3       241.412476  5.584523       16.966621         26.180705
4       444.493830  5.919707       10.944961         37.898907

   Potassium_Level  Chlorophyll_Content  Electrochemical_Signal  \
0        39.076199            35.703006                0.941402
1        17.944826            27.993296
```

On the held-out test set (20% of the data, stratified by class), the model reaches an accuracy of `99.16%`. In other words, given the features in our dataset, it predicts the correct health status for almost every test plant.
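Accuracy is simply the fraction of test samples whose predicted label equals the true label. The sketch below computes it by hand, together with a 3×3 confusion matrix, using the same `health_mapping` as the notebook. The label lists are made-up toy values, not output from this model; `accuracy_score` and `classification_report` in the cell above are derived from the same true/predicted counts.

```python
# Same label encoding as in the notebook
health_mapping = {"Healthy": 0, "Moderate Stress": 1, "High Stress": 2}
reverse_health_mapping = {v: k for k, v in health_mapping.items()}


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Hypothetical toy labels (not real model output)
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 2, 2, 0, 2, 1]

print(f"Accuracy: {accuracy(y_true, y_pred):.2%}")  # 7 of 8 correct -> 87.50%
for label, row in enumerate(confusion_matrix(y_true, y_pred)):
    print(f"{reverse_health_mapping[label]:>15}: {row}")
```

Off-diagonal entries show which stress levels are being confused with each other, which a single accuracy number does not reveal.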

```python
# Save the trained model
xgb.save_model("xgb_plant_health.json")
```

### Testing the Model:

Below, we'll test the model on random samples from the test set, comparing its predictions with the true labels in the dataset.

```python
import random


def test_random_samples(model, X_test, y_test, n_samples=5):
    """
    Selects random samples from the test set, makes predictions, and compares them with the actual values.

    Parameters:
    - model: Trained XGBoost classifier.
    - X_test: Feature set for testing (pandas DataFrame).
    - y_test: True labels for testing (pandas Series).
    - n_samples: Number of random samples to test.

    Returns:
    None
    """
    # Reset the indices so rows can be addressed by position 0..len-1
    X_test_df = X_test.reset_index(drop=True)
    y_test_df = y_test.reset_index(drop=True)

    # Build the reverse mapping once (integer label -> description)
    reverse_health_mapping = {v: k for k, v in health_mapping.items()}

    # Pick random row positions without replacement
    random_indices = random.sample(range(len(X_test_df)), n_samples)

    print("Testing on Random Samples:")
    for idx in random_indices:
        # Double brackets keep a one-row DataFrame, preserving the feature names
        sample = X_test_df.iloc[[idx]]
        true_label = int(y_test_df.iloc[idx])

        # Predict using the model
        predicted_label = int(model.predict(sample)[0])

        # Map true and predicted labels to their descriptions
        true_label_description = reverse_health_mapping[true_label]
        predicted_label_description = reverse_health_mapping[predicted_label]

        # Output results
        print(f"Sample Index: {idx}")
        print(f"Features: {sample.values[0]}")
        print(f"True Label: {true_label}, Predicted Label: {predicted_label}")
        print(f"True Label (Description): {true_label_description}, Predicted Label (Description): {predicted_label_description}")
        print("-" * 40)


# Example usage
test_random_samples(xgb, X_test, y_test)
```

```
Testing on Random Samples:
Sample Index: 68
Features: [ 14.95127629  28.1448083   19.76793919  60.04383982 320.72798012
   7.44179672  36.14175518  30.49286292  42.79289468  21.38528052
   1.66735995]
True Label: 2, Predicted Label: 2
True Label (Description): High Stress, Predicted Label (Description): High Stress
----------------------------------------
Sample Index: 104
Features: [ 11.94587385  28.0830208   24.58021046  50.62789124 768.22040676
   6.21460401  46.363573    44.67257557  38.58131534  48.09147089
   1.79547927]
True Label: 2, Predicted Label: 2
True Label (Description): High Stress, Predicted Label (Description): High Stress
----------------------------------------
Sample Index: 65
Features: [ 22.01587866  29.60668838  21.42104682  63.83220953 392.30986702
   6.47412594  38.72747773  19.84868482  14.82515429  48.8969809
   1.26613011]
True Label: 1, Predicted Label: 1
True Label (Description): Moderate Stress, Predicted Label (Description): Moderate Stress
-------------
```
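The helper above calls `model.predict` once per row. A possible alternative, sketched below, draws the rows with pandas' `DataFrame.sample` (a fixed `random_state` makes the draw reproducible) and predicts them in a single call. It assumes `X_test` and `y_test` share the same index, as they do after `train_test_split`. `ThresholdModel` and `compare_random_samples` are hypothetical names: the stand-in model uses a toy soil-moisture rule only so the example runs on its own; in the notebook you would pass the trained `xgb` instead.

```python
import numpy as np
import pandas as pd

health_mapping = {"Healthy": 0, "Moderate Stress": 1, "High Stress": 2}
reverse_health_mapping = {v: k for k, v in health_mapping.items()}


class ThresholdModel:
    """Hypothetical stand-in for a trained classifier (toy rule, not the real model)."""

    def predict(self, X):
        moisture = X["Soil_Moisture"].to_numpy()
        # Lower soil moisture -> higher stress label
        return np.select([moisture < 15, moisture < 25], [2, 1], default=0)


def compare_random_samples(model, X_test, y_test, n_samples=5, seed=None):
    """Predict a random subset in one call and return true vs. predicted descriptions."""
    sample_X = X_test.sample(n=n_samples, random_state=seed)
    sample_y = y_test.loc[sample_X.index]  # assumes X_test and y_test share an index
    predictions = pd.Series(model.predict(sample_X), index=sample_X.index)
    return pd.DataFrame(
        {
            "True": sample_y.map(reverse_health_mapping),
            "Predicted": predictions.map(reverse_health_mapping),
        }
    )


# Toy data for the stand-in model (not the plant dataset)
X_demo = pd.DataFrame({"Soil_Moisture": [10.0, 20.0, 30.0, 12.0, 28.0, 18.0]})
y_demo = pd.Series([2, 1, 0, 2, 0, 1])
print(compare_random_samples(ThresholdModel(), X_demo, y_demo, n_samples=4, seed=0))
```

Predicting all sampled rows at once avoids per-row overhead, and the returned DataFrame keeps the original test-set index, so each row can be traced back to the dataset.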

### The End:

This is the end of this project notebook. Make sure to experiment and contribute to help improve the model and implementation. You can browse more of our free, open-source projects on our GitHub repository: https://github.com/Infinitode/OPEN-ARC. If you like this project, star the repo and contribute your implementation, or help others in the community.

~ Infinitode