# Sowing Success: How Machine Learning Helps Farmers Select the Best Crops

![Farmer in a field](farmer_in_a_field.jpg)

Measuring essential soil metrics such as nitrogen, phosphorus, and potassium levels and pH is an important part of assessing soil condition. However, it can be expensive and time-consuming, so farmers often have to prioritize which metrics to measure based on their budget.

Farmers have various options when it comes to deciding which crop to plant each season. Their primary objective is to maximize the yield of their crops, taking into account different factors. One crucial factor that affects crop growth is the condition of the soil in the field, which can be assessed by measuring basic elements such as nitrogen and potassium levels. Each crop has an ideal soil condition that ensures optimal growth and maximum yield.

A farmer has reached out to you, as a machine learning expert, for help selecting the best crop for their field. They've provided you with a dataset called `soil_measures.csv`, which contains:

- `"N"`: Nitrogen content ratio in the soil
- `"P"`: Phosphorus content ratio in the soil
- `"K"`: Potassium content ratio in the soil
- `"ph"`: pH value of the soil
- `"crop"`: categorical values that contain various crops (target variable).

Each row in this dataset represents various measures of the soil in a particular field. Based on these measurements, the crop specified in the `"crop"` column is the optimal choice for that field.  

In this project, you will build multi-class classification models to predict the type of `"crop"` and identify the single most important feature for predictive performance.

Plan:

- Understand the data: We'll load the dataset and take a look at its structure.
- Data Preprocessing: We'll split the data into features (`N`, `P`, `K`, `ph`) and the target (`crop`).
- Model Selection: We'll train a separate Logistic Regression model for each feature and evaluate its performance using a suitable metric (accuracy, F1-score, etc.).
- Find the Best Feature: After evaluating each feature individually, we'll identify the feature with the best predictive performance and store the result in a dictionary.

Step 1: Loading and Exploring the Dataset

In [1]:
# All required libraries are imported here for you.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics

# Load the dataset
crops = pd.read_csv("soil_measures.csv")

# Inspect the dataset
print(crops.head())
print(crops.info())


    N   P   K        ph  crop
0  90  42  43  6.502985  rice
1  85  58  41  7.038096  rice
2  60  55  44  7.840207  rice
3  74  35  40  6.980401  rice
4  78  42  42  7.628473  rice
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2200 entries, 0 to 2199
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   N       2200 non-null   int64  
 1   P       2200 non-null   int64  
 2   K       2200 non-null   int64  
 3   ph      2200 non-null   float64
 4   crop    2200 non-null   object 
dtypes: float64(1), int64(3), object(1)
memory usage: 86.1+ KB
None
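
Beyond `head()` and `info()`, it can help to confirm that no values are missing and to see how many rows each crop has, since class balance affects how accuracy should be read. The sketch below is illustrative only: `sample` is a hypothetical five-row stand-in with the same column names as `soil_measures.csv`, and `summarize` is a helper name introduced here, not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for soil_measures.csv (same column names, made-up rows).
sample = pd.DataFrame({
    "N": [90, 85, 60, 20, 25],
    "P": [42, 58, 55, 30, 35],
    "K": [43, 41, 44, 20, 22],
    "ph": [6.50, 7.04, 7.84, 5.90, 6.10],
    "crop": ["rice", "rice", "rice", "maize", "maize"],
})


def summarize(df: pd.DataFrame, target: str = "crop") -> dict:
    """Return missing-value counts per column and row counts per class."""
    return {
        "missing": df.isna().sum().to_dict(),
        "class_counts": df[target].value_counts().to_dict(),
    }


summary = summarize(sample)
print(summary["missing"])       # missing values per column
print(summary["class_counts"])  # rows per crop
```

On the real data, the same call would be `summarize(crops)`, run before the target column is encoded.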


Step 2: Data Preprocessing
We need to split the data into features and the target variable:

- Features: `N`, `P`, `K`, `ph`
- Target: `crop`

We also encode the categorical target variable (`crop`) as integer codes. scikit-learn classifiers, including `LogisticRegression`, accept string labels directly, so this step is a convenience rather than a requirement.

In [2]:
# Print all the column names in the dataset
print(crops.columns)


Index(['N', 'P', 'K', 'ph', 'crop'], dtype='object')


In [3]:
# Encode the categorical target variable 'crop'
crops['crop'] = crops['crop'].astype('category').cat.codes


# Split the data into features (X) and target (y)
X = crops[['N', 'P', 'K', 'ph']]  # Use 'ph' instead of 'pH'
y = crops['crop']  # Target variable
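
Overwriting `crop` with `.cat.codes` discards the crop names, so predictions can no longer be read as crops. One way to keep them recoverable, sketched here on a hypothetical four-row column rather than the real data, is to hold on to the category index before taking the codes. The names `crop_cat`, `code_to_crop`, and `decode` are assumptions made for this example.

```python
import pandas as pd

# Hypothetical miniature target column; in the notebook this would be crops["crop"]
# taken BEFORE it is overwritten with integer codes.
crop_names = pd.Series(["rice", "maize", "rice", "coffee"], name="crop")

crop_cat = crop_names.astype("category")
y_codes = crop_cat.cat.codes  # integer code per row

# Categories are sorted, so code i maps to the i-th category.
code_to_crop = dict(enumerate(crop_cat.cat.categories))


def decode(codes):
    """Translate integer codes back to crop names."""
    return [code_to_crop[int(c)] for c in codes]


print(code_to_crop)
print(decode(y_codes))
```

With a mapping like this, the output of `model.predict(...)` could be passed through `decode` to report crop names instead of integers.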



Step 3: Train Models and Evaluate Each Feature

Here's what we'll do in this step:

- For each feature, we'll train a Logistic Regression model to predict the crop type.
- We'll calculate the accuracy of each model on a held-out test set.
- Finally, we'll determine the feature that gives the best accuracy and store it in the `best_predictive_feature` dictionary.

Here’s the code to train a separate model for each feature and evaluate it:

In [4]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics

# Initialize the Logistic Regression model
model = LogisticRegression(max_iter=200)

# Dictionary to store the best feature and its score
best_predictive_feature = {}

# List of features
features = ['N', 'P', 'K', 'ph']

# Loop through each feature to train a model and evaluate performance
for feature in features:
    # Select the current feature for training
    X_feature = X[[feature]]  # Use only one feature at a time
    
    # Split into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X_feature, y, test_size=0.2, random_state=42)
    
    # Train the model
    model.fit(X_train, y_train)
    
    # Make predictions
    y_pred = model.predict(X_test)
    
    # Calculate accuracy
    accuracy = metrics.accuracy_score(y_test, y_pred)
    print(f"Accuracy for {feature}: {accuracy}")
    
    # Keep only the feature with the highest accuracy seen so far
    if not best_predictive_feature or accuracy > max(best_predictive_feature.values()):
        best_predictive_feature = {feature: accuracy}

# Print the best predictive feature and its accuracy score
print("Best Predictive Feature:", best_predictive_feature)


Accuracy for N: 0.14545454545454545
Accuracy for P: 0.19090909090909092
Accuracy for K: 0.29545454545454547
Accuracy for ph: 0.09772727272727273
Best Predictive Feature: {'K': 0.29545454545454547}
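
The plan lists F1-score as an alternative metric. A possible variant of the loop above, offered as a sketch rather than the notebook's method, uses a stratified split and weighted F1. The function name `score_single_features`, the `max_iter=2000` setting, and the synthetic `demo` frame are all assumptions made so the example runs without `soil_measures.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def score_single_features(df, features, target, seed=42):
    """Fit one LogisticRegression per feature; return weighted F1 per feature."""
    scores = {}
    for feature in features:
        # stratify keeps class proportions the same in train and test sets
        X_train, X_test, y_train, y_test = train_test_split(
            df[[feature]], df[target],
            test_size=0.2, random_state=seed, stratify=df[target],
        )
        model = LogisticRegression(max_iter=2000)  # fresh model per feature
        model.fit(X_train, y_train)
        scores[feature] = f1_score(
            y_test, model.predict(X_test), average="weighted"
        )
    return scores


# Synthetic demo data (not the soil dataset): one informative column, one noise column.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "informative": np.repeat([0.0, 5.0, 10.0], 100) + rng.normal(0, 1, 300),
    "noise": rng.normal(0, 1, 300),
    "label": np.repeat(["a", "b", "c"], 100),
})

demo_scores = score_single_features(demo, ["informative", "noise"], "label")
best = max(demo_scores, key=demo_scores.get)
print({best: demo_scores[best]})
```

On the real data the call would be `score_single_features(crops, ['N', 'P', 'K', 'ph'], 'crop')`. The scores it returns would not necessarily match the accuracies printed above, because both the split and the metric differ.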


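Based on the results:

The feature `K` (potassium content) has the highest predictive accuracy for classifying the crop type, at 29.55% on the test set.
It outperforms the other features: `N` (nitrogen, 14.55%), `P` (phosphorus, 19.09%), and `ph` (soil pH, 9.77%).
Even the best single feature identifies fewer than a third of the test crops correctly, so no single measurement is sufficient on its own.
The resulting dictionary is `best_predictive_feature = {'K': 0.29545454545454547}`.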
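
To judge whether an accuracy like this is meaningfully better than guessing, it can be compared against a trivial baseline. For a balanced problem with n classes, always predicting one class yields an accuracy of 1/n. The sketch below illustrates this with scikit-learn's `DummyClassifier` on synthetic four-class data. The data and variable names are assumptions for the example, and the number of crop classes in the real dataset is not printed in this notebook.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic, balanced 4-class data (NOT the soil dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 1))
y_demo = np.repeat(np.arange(4), 100)

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Baseline that ignores the features and always predicts the most frequent class.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
baseline_accuracy = accuracy_score(y_test, baseline.predict(X_test))

chance_level = 1 / len(np.unique(y_demo))
print(f"Baseline accuracy: {baseline_accuracy:.2f}, chance level: {chance_level:.2f}")
```

Running the same baseline on the crop data, with `crops['crop'].nunique()` as the class count, would show how far each single-feature model sits above chance.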