
# Task
Generate the Python code for a smart pesticide recommendation system based on deep image analysis and environmental conditions, including data collection, preprocessing, model architecture, training, and evaluation.

## Data collection

### Subtask:
Gather a dataset of plant images showing various diseases and pests, along with corresponding environmental data (temperature, humidity, rainfall, etc.) and effective pesticide recommendations for each case.
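
Once images and field readings are collected from separate sources, they have to be joined on a shared key before any modeling. The following is a minimal sketch of that join, assuming each record carries a `sample_id`; the column names, file paths, and values are hypothetical placeholders, not part of any real dataset.

```python
import pandas as pd

# Hypothetical image manifest keyed by sample ID (all names and values are assumptions).
images = pd.DataFrame({
    "sample_id": ["s001", "s002", "s003", "s004"],
    "image_path": ["img/s001.jpg", "img/s002.jpg", "img/s003.jpg", "img/s004.jpg"],
    "pesticide_label": [0, 2, 1, 2],
})

# Hypothetical field readings: s004 has no reading and s005 has no image.
environment = pd.DataFrame({
    "sample_id": ["s003", "s001", "s002", "s005"],
    "temperature": [24.5, 31.0, 18.2, 27.3],
    "humidity": [62.0, 45.5, 78.1, 50.0],
    "rainfall": [0.0, 3.2, 9.1, 1.4],
})

# An inner join keeps only samples present in both sources; validate raises
# an error if a sample_id is duplicated on either side.
dataset = images.merge(environment, on="sample_id", how="inner", validate="one_to_one")
dataset = dataset.sort_values("sample_id").reset_index(drop=True)

print(dataset)
```

Samples missing from either source are dropped by the inner join, so it is worth comparing `len(dataset)` against the two inputs to see how many records were lost.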


## Data preprocessing

### Subtask:
Preprocess the image data (resizing, normalization, augmentation) and environmental data (scaling, handling missing values) to prepare it for model training.


**Reasoning**:
I need to load the image and environmental data, preprocess both datasets, align them, and split the combined data into training, validation, and testing sets. I will use dummy data generation for demonstration purposes as actual data is not available.



In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

# Step 1: Load the collected image and environmental data (using dummy data)
# Assuming image data is a numpy array of shape (num_samples, height, width, channels)
# Assuming environmental data is a pandas DataFrame
num_samples = 1000
img_height, img_width = 128, 128
# Dummy image data: random pixel values
image_data = np.random.rand(num_samples, img_height, img_width, 3)

# Dummy environmental data: temperature, humidity, rainfall, wind_speed
environmental_data = pd.DataFrame({
    'temperature': np.random.rand(num_samples) * 20 + 15, # 15-35 degrees
    'humidity': np.random.rand(num_samples) * 50 + 30,    # 30-80 %
    'rainfall': np.random.rand(num_samples) * 10,         # 0-10 mm
    'wind_speed': np.random.rand(num_samples) * 15        # 0-15 km/h
})

# Introduce some missing values in environmental data
for col in environmental_data.columns:
    environmental_data.loc[np.random.choice(num_samples, int(num_samples * 0.05), replace=False), col] = np.nan

# Dummy target variable (pesticide recommendation index)
target = np.random.randint(0, 5, num_samples) # Assuming 5 different pesticide recommendations

# Steps 2-4: Preprocess image data (resizing, normalization, augmentation)
# Resizing is a no-op here because the dummy images are generated at the target size.
# Normalization to [0, 1] is also implicit: np.random.rand already returns values in [0, 1).
# The augmentation generator is only defined here; the training cell passes the raw arrays to model.fit and never uses it.
image_datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

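# Step 5: Handle missing values in environmental data
# Using mean imputation as an example strategy.
# Note: the column means (and the scaler in Step 6) are computed over all samples,
# before the split in Step 8, so validation and test rows influence the statistics.
# With real data, compute them from the training split only.
environmental_data_filled = environmental_data.fillna(environmental_data.mean())

# Step 6: Scale the environmental data features
scaler = StandardScaler()
environmental_data_scaled = scaler.fit_transform(environmental_data_filled)
environmental_data_scaled_df = pd.DataFrame(environmental_data_scaled, columns=environmental_data_filled.columns)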


# Step 7: Ensure that the preprocessed image and environmental data are aligned correctly
# Since we generated dummy data with the same number of samples and in the same order,
# they are already aligned. In a real scenario, ensure alignment based on sample IDs.

# Step 8: Split the preprocessed data into training, validation, and testing sets.
# Combine image and environmental data for splitting
# We'll split the indices to keep image and environmental data separate but aligned
indices = np.arange(num_samples)
train_indices, test_indices, y_train, y_test = train_test_split(indices, target, test_size=0.2, random_state=42, stratify=target)
train_indices, val_indices, y_train, y_val = train_test_split(train_indices, y_train, test_size=0.25, random_state=42, stratify=y_train) # 0.25 * 0.8 = 0.2 of original data

x_train_img = image_data[train_indices]
x_val_img = image_data[val_indices]
x_test_img = image_data[test_indices]

x_train_env = environmental_data_scaled_df.iloc[train_indices]
x_val_env = environmental_data_scaled_df.iloc[val_indices]
x_test_env = environmental_data_scaled_df.iloc[test_indices]

print("Image data shapes:")
print(f"Training: {x_train_img.shape}")
print(f"Validation: {x_val_img.shape}")
print(f"Testing: {x_test_img.shape}")

print("\nEnvironmental data shapes:")
print(f"Training: {x_train_env.shape}")
print(f"Validation: {x_val_env.shape}")
print(f"Testing: {x_test_env.shape}")

print("\nTarget shapes:")
print(f"Training: {y_train.shape}")
print(f"Validation: {y_val.shape}")
print(f"Testing: {y_test.shape}")

Image data shapes:
Training: (600, 128, 128, 3)
Validation: (200, 128, 128, 3)
Testing: (200, 128, 128, 3)

Environmental data shapes:
Training: (600, 4)
Validation: (200, 4)
Testing: (200, 4)

Target shapes:
Training: (600,)
Validation: (200,)
Testing: (200,)
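
The cell above imputes and scales the environmental data before splitting, so statistics from the validation and test rows leak into the training features. The sketch below shows one way to avoid that with plain NumPy: the means and standard deviations come from the training rows only and are then reused for the other splits. It uses its own stand-in data and a simple shuffled 60/20/20 split (not stratified), so the variable names here are illustrative rather than drop-in replacements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in environmental matrix: 1000 samples x 4 features, roughly 5% missing.
env = rng.uniform(low=[15, 30, 0, 0], high=[35, 80, 10, 15], size=(1000, 4))
env[rng.random(env.shape) < 0.05] = np.nan

# Shuffle the row indices and split 60/20/20.
idx = rng.permutation(len(env))
train_idx, val_idx, test_idx = idx[:600], idx[600:800], idx[800:]

# Statistics are computed from the training rows only.
train_mean = np.nanmean(env[train_idx], axis=0)
train_filled = np.where(np.isnan(env[train_idx]), train_mean, env[train_idx])
train_std = train_filled.std(axis=0)

def preprocess(rows):
    """Impute with the training means, then standardize with training mean and std."""
    filled = np.where(np.isnan(rows), train_mean, rows)
    return (filled - train_mean) / train_std

x_train = preprocess(env[train_idx])
x_val = preprocess(env[val_idx])
x_test = preprocess(env[test_idx])

print(x_train.shape, x_val.shape, x_test.shape)
```

The same effect can be had with scikit-learn by calling `fit` on the training rows and `transform` on the validation and test rows.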


## Model architecture

### Subtask:
Design a deep learning model that can process both image data and environmental data. This could involve separate branches for each data type that are later combined.


**Reasoning**:
Design a deep learning model that can process both image data and environmental data by creating separate branches for each data type that are later combined using the Keras functional API.



In [None]:
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, concatenate
from tensorflow.keras.models import Model

# 1. Define the number of environmental features and the number of output classes
num_environmental_features = x_train_env.shape[1]
num_classes = len(np.unique(y_train))

# 2. Define the input layers
image_input = Input(shape=(img_height, img_width, 3), name='image_input')
environmental_input = Input(shape=(num_environmental_features,), name='environmental_input')

# 3. Create a convolutional base for processing the image input
x = Conv2D(32, (3, 3), activation='relu')(image_input)
x = MaxPooling2D((2, 2))(x)
x = Conv2D(64, (3, 3), activation='relu')(x)
x = MaxPooling2D((2, 2))(x)
x = Conv2D(128, (3, 3), activation='relu')(x)
x = MaxPooling2D((2, 2))(x)

# 4. Flatten the output of the convolutional base
image_features = Flatten()(x)

# 5. Create a separate branch for processing the environmental data
y = Dense(32, activation='relu')(environmental_input)
y = Dense(16, activation='relu')(y) # Additional dense layer for environmental data

# 6. Concatenate the flattened image features and the processed environmental features
combined_features = concatenate([image_features, y])

# 7. Add one or more Dense layers after the concatenation
z = Dense(64, activation='relu')(combined_features)
z = Dense(32, activation='relu')(z) # Additional dense layer for combined features

# 8. Add a final Dense layer with a 'softmax' activation function
output_layer = Dense(num_classes, activation='softmax', name='output')(z)

# 9. Create the Keras Model
model = Model(inputs=[image_input, environmental_input], outputs=output_layer)

# 10. Print a summary of the model architecture
model.summary()
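
Since the summary output is not shown, the layer sizes can be worked out by hand. The sketch below assumes the Keras defaults used in the cell above (3x3 kernels with `'valid'` padding and stride 1, 2x2 pooling with stride 2) and computes the width of each branch at the concatenation point.

```python
def conv_out(size, kernel=3):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool

size, channels = 128, 3
conv_params = 0
for filters in (32, 64, 128):
    conv_params += 3 * 3 * channels * filters + filters  # kernel weights + biases
    size = pool_out(conv_out(size))
    channels = filters

image_features = size * size * channels   # width of the flattened image branch
env_features = 16                         # width of the environmental branch
concat_width = image_features + env_features
first_dense_params = concat_width * 64 + 64

print(size, image_features, concat_width, first_dense_params)
# 14 25088 25104 1606720
```

The image branch contributes 25,088 of the 25,104 concatenated features, and the first dense layer after the concatenation holds about 1.6 million parameters, far more than the three convolutional layers combined (93,248). A pooling step before the concatenation would shrink this considerably.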

## Model training

### Subtask:
Train the integrated model on the prepared dataset to predict the appropriate pesticide based on image and environmental inputs.


**Reasoning**:
Compile and train the integrated model using the prepared training and validation data.
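
Note that `image_datagen` is defined during preprocessing but the training call receives the raw arrays, so no augmentation is actually applied. Augmenting a two-input model requires that the image, its environmental row, and its label stay paired. The sketch below illustrates the idea with a plain NumPy generator that applies only random horizontal flips; it is a stand-in for `ImageDataGenerator`, not an equivalent, and `paired_batches` is a hypothetical helper. Whether a given Keras version accepts this generator directly in `model.fit` has not been verified here.

```python
import numpy as np

def paired_batches(images, env, labels, batch_size=32, seed=0):
    """Yield ((image_batch, env_batch), label_batch) with random horizontal flips.

    Only the images are augmented. Environmental rows and labels are indexed
    with the same shuffled positions, so the three arrays stay aligned.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    while True:  # loops forever; the consumer decides how many batches form an epoch
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            rows = order[start:start + batch_size]
            batch = images[rows]                      # fancy indexing returns a copy
            flip = rng.random(len(rows)) < 0.5
            batch[flip] = batch[flip][:, :, ::-1, :]  # reverse the width axis
            yield (batch, env[rows]), labels[rows]

# Tiny demonstration with made-up arrays: env row i starts at 4*i, label i is i.
demo_images = np.random.default_rng(1).random((10, 8, 8, 3))
demo_env = np.arange(40, dtype=float).reshape(10, 4)
demo_labels = np.arange(10)

(img_b, env_b), y_b = next(paired_batches(demo_images, demo_env, demo_labels, batch_size=4))
print(img_b.shape, env_b.shape, y_b.shape)
```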



In [None]:
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Step 1: Compile the defined Keras model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Define callbacks
early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
model_checkpoint = ModelCheckpoint('best_pesticide_model.keras', monitor='val_loss', save_best_only=True)


# Step 2, 3 & 4: Train the model
epochs = 50
batch_size = 32

history = model.fit(
    [x_train_img, x_train_env],
    y_train,
    epochs=epochs,
    batch_size=batch_size,
    validation_data=([x_val_img, x_val_env], y_val),
    callbacks=[early_stopping, model_checkpoint] # Step 5: Include callbacks
)

print("Model training complete.")

Epoch 1/50
19/19 ━━━━━━━━━━━━━━━━━━━━ 22s 1s/step - accuracy: 0.2523 - loss: 1.6355 - val_accuracy: 0.2050 - val_loss: 1.6097
Epoch 2/50
19/19 ━━━━━━━━━━━━━━━━━━━━ 20s 1s/step - accuracy: 0.1960 - loss: 1.6094 - val_accuracy: 0.2050 - val_loss: 1.6090
Epoch 3/50
19/19 ━━━━━━━━━━━━━━━━━━━━ 22s 1s/step - accuracy: 0.1956 - loss: 1.6087 - val_accuracy: 0.2150 - val_loss: 1.6089
Epoch 4/50
19/19 ━━━━━━━━━━━━━━━━━━━━ 38s 965ms/step - accuracy: 0.2021 - loss: 1.6085 - val_accuracy: 0.2050 - val_loss: 1.6087
Epoch 5/50
19/19 ━━━━━━━━━━━━━━━━━━━━ 22s 1s/step - accuracy: 0.2071 - loss: 1.6105 - val_accuracy: 0.2300 - val_loss: 1.6090
Epoch 6/50
19/19 ━━━━━━━━━━━━━━━━━━━━ 18s 941ms/step - accuracy: 0.2180 - loss: 1.6089 - val_accuracy: 0.2250 - val_loss: 1.6093
Epoch 7/50
19/19 ━━━━

## Model evaluation

### Subtask:
Evaluate the model's performance using appropriate metrics (e.g., accuracy, precision, recall) on a separate test set.


**Reasoning**:
Load the best trained model, evaluate it on the test set, print the evaluation results, make predictions, and calculate and print classification metrics.



In [None]:
from tensorflow.keras.models import load_model
from sklearn.metrics import classification_report
import numpy as np

# Step 1: Load the best model (architecture and weights) saved by ModelCheckpoint
loaded_model = load_model('best_pesticide_model.keras')

# Step 2: Evaluate the loaded model on the test dataset
test_loss, test_accuracy = loaded_model.evaluate([x_test_img, x_test_env], y_test, verbose=0)

# Step 3: Print the test loss and test accuracy
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")

# Step 4: Make predictions on the test dataset
predictions = loaded_model.predict([x_test_img, x_test_env])

# Get predicted labels by taking the argmax
predicted_labels = np.argmax(predictions, axis=1)

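# Step 5: Calculate and print additional classification metrics
# zero_division=0 reports 0.0 (without a warning) for classes that are never predicted
print("\nClassification Report:")
print(classification_report(y_test, predicted_labels, zero_division=0))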

Test Loss: 1.6082
Test Accuracy: 0.2100
7/7 ━━━━━━━━━━━━━━━━━━━━ 2s 214ms/step

Classification Report:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00        41
           1       0.00      0.00      0.00        39
           2       0.21      1.00      0.35        42
           3       0.00      0.00      0.00        41
           4       0.00      0.00      0.00        37

    accuracy                           0.21       200
   macro avg       0.04      0.20      0.07       200
weighted avg       0.04      0.21      0.07       200
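
The report above is consistent with the model assigning every test sample to class 2: recall is 1.00 for that class and 0.00 for the rest. The sketch below rebuilds the confusion matrix under that assumption, using only the per-class supports printed in the report, and checks that it reproduces the reported accuracy and class 2 precision. The actual `predicted_labels` array is not available here, so `y_pred` is a reconstruction.

```python
import numpy as np

# Per-class supports copied from the classification report.
support = np.array([41, 39, 42, 41, 37])
y_true = np.repeat(np.arange(5), support)

# Assumption: every sample was predicted as class 2, as the recall column implies.
y_pred = np.full_like(y_true, 2)

# Rows are true classes, columns are predicted classes.
confusion = np.zeros((5, 5), dtype=int)
np.add.at(confusion, (y_true, y_pred), 1)

accuracy = np.trace(confusion) / confusion.sum()
precision_class_2 = confusion[2, 2] / confusion[:, 2].sum()

print(confusion)
print(accuracy, precision_class_2)
```

Both values come out to 42/200 = 0.21, matching the report, which supports reading the result as a collapse onto a single class.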





## Summary:

### Data Analysis Key Findings

*   The data collection phase only specifies what should be gathered: images of diseased or infested plants, corresponding environmental data (temperature, humidity, rainfall, wind speed), and the pesticide recommendation for each case. No real data was collected.
*   In the data preprocessing stage, dummy data was generated for image and environmental features. Missing values in the environmental data were imputed using the mean, and the features were scaled. The data was successfully split into training (60%), validation (20%), and testing (20%) sets, maintaining alignment between image and environmental data.
*   A deep learning model architecture was designed with separate branches for image (using convolutional layers) and environmental data (using dense layers), which were then concatenated and processed by further dense layers for final classification.
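*   The model was compiled using the Adam optimizer and sparse categorical crossentropy loss. Training was configured for up to 50 epochs with a batch size of 32, with an EarlyStopping callback (patience of 10 epochs on validation loss, restoring the best weights) and a ModelCheckpoint callback that saves the model with the lowest validation loss. The captured training log is truncated at epoch 7, so the number of epochs actually run is not recorded here.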
*   Model evaluation on the test set resulted in a Test Loss of 1.6082 and a Test Accuracy of 0.2100, which is chance level for five classes. The classification report shows that the model assigned every test sample to class 2: recall is 1.00 for class 2 and 0.00 for the other four classes, giving a macro-averaged F1-score of 0.07.
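
To put the test loss of 1.6082 in context, it helps to compute what a model with no information would score. The short calculation below gives the cross-entropy and accuracy of a predictor that outputs a uniform distribution over the five classes.

```python
import math

num_classes = 5
uniform_probability = 1.0 / num_classes

# Cross-entropy for one sample is -log(probability assigned to the true class).
# With a uniform prediction, this is the same for every true label.
chance_loss = -math.log(uniform_probability)
chance_accuracy = uniform_probability

print(round(chance_loss, 4), chance_accuracy)
# 1.6094 0.2
```

The reported loss and accuracy sit almost exactly on these baseline values, and the training and validation losses in the log stay near 1.609 from the second epoch onward.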

### Insights or Next Steps

*   The model performs at chance level. This is the expected outcome for this notebook: the dummy images, environmental features, and labels are independent random draws, so the inputs carry no information about the target, and no change to the architecture or training process can improve on chance until real data is used.
*   Further steps should start with collecting a real, labeled dataset of sufficient size. Once the inputs carry signal, it becomes worthwhile to explore stronger architectures (e.g., pre-trained image models) and hyperparameter tuning. The test-set supports (37 to 42 samples per class) are roughly balanced, so class imbalance does not explain the single-class predictions.
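
Until real data is available, one way to check that the pipeline can learn at all is to give the dummy labels a deliberate dependence on the inputs. The sketch below is a hypothetical smoke-test rule, not a real agronomic relationship: it assigns each sample the quintile of its humidity value as the class label.

```python
import numpy as np

rng = np.random.default_rng(42)
num_samples = 1000

# Hypothetical rule, for smoke testing only: the class is the humidity quintile.
humidity = rng.uniform(30, 80, size=num_samples)
bin_edges = np.quantile(humidity, [0.2, 0.4, 0.6, 0.8])
target = np.digitize(humidity, bin_edges)  # integer labels 0..4

class_counts = np.bincount(target, minlength=5)
print(class_counts)
```

If a model trained on labels like these still scores near 20% accuracy, the problem lies in the pipeline rather than in the data.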
