### How Machine Learning Helps Farmers Select the Best Crops

<img src='farmer_in_a_field.jpg'>

Measuring soil metrics like nitrogen, phosphorus, potassium, and pH is key to assessing soil health but can be costly and time-consuming. Because of budget limits, farmers often need to decide which soil properties to test.

Choosing the right crop depends heavily on soil conditions, as each crop thrives under specific nutrient levels. A farmer wants help selecting the best crop based on soil data.

The dataset, soil_measures.csv, contains soil nutrient ratios ("N", "P", "K"), pH values, and the optimal crop for each field (the target). Each row holds the soil measurements from one field and the crop best suited to those conditions.

<h4>Goal: help the farmer identify which soil measurement is most important to test, based on this crop dataset</h4>

In [8]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.preprocessing import StandardScaler

crops = pd.read_csv("soil_measures.csv")

In [9]:
# EDA
crops.isnull().sum()  # no missing values
crops.dtypes  # all features are numeric; crop is a categorical string
len(crops['crop'].unique())  # 22 classes, so this is a 22-class classification problem

22

In [10]:
# Create features and target
X = crops.drop('crop', axis=1)
y = crops['crop']
# Encode the 22 crop classes as integers: the model expects a 1-D label array,
# not a one-hot encoded matrix
label_map = {label: idx for idx, label in enumerate(y.unique())}
y = y.map(label_map)  # integer codes 0, 1, ..., 21

X_train,X_test,y_train,y_test=train_test_split(X,y,shuffle=True,random_state=42)
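
The manual `label_map` above could also be expressed with scikit-learn's `LabelEncoder`. The following is a minimal sketch, not the notebook's method: the crop names in `y_demo` are hypothetical stand-ins for the `crop` column. Note that `LabelEncoder` assigns codes in sorted label order rather than order of first appearance.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical crop labels standing in for the 'crop' column
y_demo = pd.Series(["rice", "maize", "rice", "coffee"])

encoder = LabelEncoder()
# Classes are sorted alphabetically and mapped to 0..n_classes-1
y_encoded = encoder.fit_transform(y_demo)

# inverse_transform recovers the original string labels from the codes
y_decoded = encoder.inverse_transform(y_encoded)

print(list(encoder.classes_))
print(list(y_encoded), list(y_decoded))
```

Keeping the fitted encoder around makes it easy to turn predicted integer codes back into crop names later.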

In [12]:
# Predict the crop using each feature individually: one model per feature, so four models in total.

features_importance = {}

for feature in ["N", "P", "K", "ph"]:
    scaler = StandardScaler()
    # Standardize the feature to mean 0 and standard deviation 1. This helps the
    # solver converge faster; with several features it also puts different units
    # on a comparable scale.
    log_reg = LogisticRegression(solver="lbfgs", max_iter=500)  # handles multiclass targets

    # Fit the scaler on the training data only, then transform it.
    # Training on scaled data improves convergence and helps avoid convergence warnings.
    X_train_scaled = scaler.fit_transform(X_train[[feature]])  # [[...]] keeps the input 2-D

    # Transform the test data with the scaler fitted on the training data.
    # This prevents data leakage and keeps the transformation consistent.
    X_test_scaled = scaler.transform(X_test[[feature]])
    
    
    log_reg.fit(X_train_scaled, y_train)
    
    
    y_pred = log_reg.predict(X_test_scaled)
    
    # Calculate the weighted F1 score for this feature
    features_importance[feature] = metrics.f1_score(y_test, y_pred, average="weighted")
    

best_feature = max(features_importance, key=features_importance.get)
best_f1 = features_importance[best_feature]

best_predictive_feature = {best_feature: best_f1}

print('Most important is: ', best_feature, best_f1)


Most important is:  K 0.14300946985661483
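
The per-feature loop above can also be packaged as a reusable function. The sketch below is one possible arrangement under stated assumptions, not the notebook's own code: the helper name `single_feature_f1` is hypothetical, the data is synthetic (not `soil_measures.csv`), and `stratify=y` is an addition that the original split does not use. A `Pipeline` bundles the scaler with the model so the scaler is fitted on the training split only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def single_feature_f1(X, y, random_state=42):
    """Return {feature: weighted F1} from one logistic regression per column."""
    # stratify=y is an assumption added here; it keeps class proportions equal
    # across the train and test splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=random_state, stratify=y
    )
    scores = {}
    for feature in X.columns:
        # The pipeline fits the scaler on the training split only, then
        # reuses the fitted scaler when predicting on the test split.
        model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=500))
        model.fit(X_train[[feature]], y_train)
        y_pred = model.predict(X_test[[feature]])
        scores[feature] = f1_score(y_test, y_pred, average="weighted")
    return scores


# Synthetic stand-in data (NOT the soil dataset): one column tracks the class,
# the other two are pure noise.
rng = np.random.default_rng(0)
y_demo = pd.Series(np.repeat([0, 1, 2], 100))
X_demo = pd.DataFrame({
    "signal": y_demo * 10 + rng.normal(size=300),
    "noise_a": rng.normal(size=300),
    "noise_b": rng.normal(size=300),
})

demo_scores = single_feature_f1(X_demo, y_demo)
best_demo_feature = max(demo_scores, key=demo_scores.get)
print(best_demo_feature, demo_scores)
```

On the real data, the call would presumably be `single_feature_f1(X, y)` with the `X` and `y` built earlier; the scores will differ slightly from the loop above because of the stratified split.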
