<a href="https://colab.research.google.com/github/ashutosh7i/Aaditya14x/blob/main/crop_recommender_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Task: Develop a simple recommendation system that suggests the best crops to plant based on soil and climate conditions such as pH, temperature, humidity, and rainfall.

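The plan is a web app in which a farmer enters the temperature, humidity, pH, and rainfall for their area. The soil nutrient levels (N, P, K) are left out because they are hard for a farmer to measure.

The app will suggest three crops to plant, shown with images and in the farmer's native language.

Below is the ML model, built with the random forest algorithm.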

In [68]:
# Step 1: Import libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score

In [69]:
# Step 2: Load and explore data
data = pd.read_csv('Crop_recommendation.csv')
print(data.head())
print(data.info())
print(data.describe())

print(data['label'].unique())

    N   P   K  temperature   humidity        ph    rainfall label
0  90  42  43    20.879744  82.002744  6.502985  202.935536  rice
1  85  58  41    21.770462  80.319644  7.038096  226.655537  rice
2  60  55  44    23.004459  82.320763  7.840207  263.964248  rice
3  74  35  40    26.491096  80.158363  6.980401  242.864034  rice
4  78  42  42    20.130175  81.604873  7.628473  262.717340  rice
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2200 entries, 0 to 2199
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   N            2200 non-null   int64  
 1   P            2200 non-null   int64  
 2   K            2200 non-null   int64  
 3   temperature  2200 non-null   float64
 4   humidity     2200 non-null   float64
 5   ph           2200 non-null   float64
 6   rainfall     2200 non-null   float64
 7   label        2200 non-null   object 
dtypes: float64(4), int64(3), object(1)
memory usage: 137.6+ KB
None
         

In [70]:
# Step 3: Data preprocessing
# Exclude N, P, and K from features
X = data.drop(['N', 'P', 'K', 'label'], axis=1)
y = data['label']

In [71]:
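# Step 4: Split data into train and test sets
# Splitting before scaling keeps test-set statistics out of the scaler
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [72]:
# Step 5: Feature scaling (fit on the training rows only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)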
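
One way to guarantee that the scaler only ever sees training rows is to wrap it together with the classifier in a scikit-learn `Pipeline`. The following is a minimal sketch under assumptions: the synthetic `make_classification` data merely stands in for `Crop_recommendation.csv`, and all variable names are illustrative rather than taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the four features (temperature, humidity, ph, rainfall)
X_demo, y_demo = make_classification(
    n_samples=600, n_features=4, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=42,
)

# Split first, so the test rows never reach the scaler's fit step
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on X_tr only, then trains the forest on the scaled rows
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("forest", RandomForestClassifier(random_state=42)),
])
pipeline.fit(X_tr, y_tr)

# score() applies the already-fitted scaler to X_te before predicting
demo_accuracy = pipeline.score(X_te, y_te)
print("Held-out accuracy on synthetic data:", demo_accuracy)
```

Tree-based models are largely insensitive to feature scaling, so the scaler mainly matters if the classifier is later swapped for a scale-sensitive one.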

In [73]:
# Step 6: Model selection and training
# A fixed seed makes the trained forest reproducible across runs
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

In [74]:
# Step 7: Model evaluation
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

Accuracy: 0.9636363636363636
              precision    recall  f1-score   support

       apple       0.96      0.96      0.96        23
      banana       1.00      1.00      1.00        21
   blackgram       0.95      1.00      0.98        20
    chickpea       1.00      1.00      1.00        26
     coconut       1.00      1.00      1.00        27
      coffee       0.94      1.00      0.97        17
      cotton       1.00      1.00      1.00        17
      grapes       1.00      1.00      1.00        14
        jute       0.92      1.00      0.96        23
 kidneybeans       1.00      1.00      1.00        20
      lentil       0.92      1.00      0.96        11
       maize       0.91      1.00      0.95        21
       mango       0.86      1.00      0.93        19
   mothbeans       1.00      0.88      0.93        24
    mungbean       1.00      0.95      0.97        19
   muskmelon       1.00      1.00      1.00        17
      orange       0.80      0.86      0.83        1
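
The figures above come from a single 80/20 split, so they depend on which rows happened to land in the test set. A hedged sketch of a steadier estimate using stratified 5-fold cross-validation is shown here; the synthetic data and the names `X_demo`, `y_demo`, `folds`, and `fold_scores` are assumptions for illustration, not part of the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the crop data: 4 features, 3 classes
X_demo, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Stratified folds keep the class proportions similar in every fold
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X_demo, y_demo, cv=folds, scoring="accuracy",
)

print("Per-fold accuracy:", fold_scores)
print("Mean accuracy:", fold_scores.mean(), "+/-", fold_scores.std())
```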

In [81]:
# Step 8: Prediction
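# Input from frontend, in training column order: temperature, humidity, ph, rainfall
# A DataFrame with the training column names keeps the input aligned with the fitted scaler
new_data = pd.DataFrame([[25.05802193, 84.97323747, 5.738678895, 110.4408803]], columns=X.columns)
new_data_scaled = scaler.transform(new_data)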
probabilities = model.predict_proba(new_data_scaled)[0]

# Get indices of top 3 probabilities
top_indices = probabilities.argsort()[-3:][::-1]

# Get corresponding crop labels
top_crops = model.classes_[top_indices]
print("Top 3 recommended crops:", top_crops)

Top 3 recommended crops: ['banana' 'pomegranate' 'grapes']
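
For the web app it may help to return each suggested crop together with its probability. The helper below is a minimal sketch of the same top-3 selection; `top_k_crops` is a hypothetical name, and the probabilities and labels in the example are made up for illustration.

```python
import numpy as np


def top_k_crops(probabilities, class_labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probabilities = np.asarray(probabilities, dtype=float)
    class_labels = np.asarray(class_labels)
    k = min(k, probabilities.size)  # never ask for more classes than exist
    # argsort is ascending, so reverse it and keep the first k indices
    top_indices = np.argsort(probabilities)[::-1][:k]
    return [(str(class_labels[i]), float(probabilities[i])) for i in top_indices]


# Made-up probabilities purely for illustration
example_probs = [0.05, 0.60, 0.10, 0.25]
example_labels = ["apple", "banana", "grapes", "pomegranate"]
top3 = top_k_crops(example_probs, example_labels)
print(top3)
```

In the notebook this could be called as `top_k_crops(model.predict_proba(new_data_scaled)[0], model.classes_)`.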




Creating an H5 model

In [83]:
import joblib
from tensorflow import keras

# Save the trained model as a pickle file
joblib.dump(model, 'crop_recommendation_model.pkl')

# Load the model from the pickle file
model = joblib.load('crop_recommendation_model.pkl')

# A random forest has no layer weights that map onto Dense layers, and
# feature_importances_ is a per-feature score vector, not a kernel/bias pair,
# so the forest cannot be converted to Keras. Instead, train a separate small
# neural network on the same data and save that as the H5 file.
from sklearn.preprocessing import LabelEncoder

# Encode crop names as integers, as sparse_categorical_crossentropy expects
label_encoder = LabelEncoder().fit(model.classes_)
y_train_enc = label_encoder.transform(y_train)
y_test_enc = label_encoder.transform(y_test)

keras_model = keras.Sequential([
    keras.layers.Input(shape=(X_train.shape[1],)),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(len(model.classes_), activation='softmax')
])
keras_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the network (the epoch count is an untuned assumption)
keras_model.fit(X_train, y_train_enc, epochs=50, validation_data=(X_test, y_test_enc), verbose=0)

# Save the keras model as an h5 file
keras_model.save('crop_recommendation_model.h5')
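
The prediction step needs the fitted `scaler` as well as the classifier, so a deployment would have to persist both. A minimal sketch, assuming it is acceptable to store them together in one joblib file; the tiny random dataset, the `demo_*` names, and the file name `crop_bundle_demo.pkl` are hypothetical stand-ins for the notebook's real objects.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Tiny made-up dataset standing in for the real training data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 4))
y_demo = np.repeat(["banana", "grapes", "rice"], 20)

demo_scaler = StandardScaler().fit(X_demo)
demo_model = RandomForestClassifier(n_estimators=20, random_state=0)
demo_model.fit(demo_scaler.transform(X_demo), y_demo)

# Store both objects in one file so they cannot drift apart
bundle_path = os.path.join(tempfile.mkdtemp(), "crop_bundle_demo.pkl")
joblib.dump({"scaler": demo_scaler, "model": demo_model}, bundle_path)

# Later, e.g. inside the web app: load once, then scale and predict
bundle = joblib.load(bundle_path)
sample = X_demo[:1]
restored_pred = bundle["model"].predict(bundle["scaler"].transform(sample))
original_pred = demo_model.predict(demo_scaler.transform(sample))
print(restored_pred, original_pred)
```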