
How Machine Learning Helps Farmers Select the Best Crops


The dataset, `soil_measures.csv`, contains the following columns:

- `"N"`: Nitrogen content ratio in the soil
- `"P"`: Phosphorus content ratio in the soil
- `"K"`: Potassium content ratio in the soil
- `"ph"`: pH value of the soil
- `"crop"`: the crop grown in the field, a categorical value (target variable)

Each row in this dataset represents various measures of the soil in a particular field. Based on these measurements, the crop specified in the `"crop"` column is the optimal choice for that field.  

In this project I built one multi-class logistic regression model per soil feature to predict `"crop"` and to identify the single most important feature for predictive performance.

In [None]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics

crops = pd.read_csv("soil_measures.csv")
print(crops.columns)
print(crops.head())
print(crops.describe())
print(crops.isna().sum().sort_values())

Index(['N', 'P', 'K', 'ph', 'crop'], dtype='object')
    N   P   K        ph  crop
0  90  42  43  6.502985  rice
1  85  58  41  7.038096  rice
2  60  55  44  7.840207  rice
3  74  35  40  6.980401  rice
4  78  42  42  7.628473  rice
                 N            P            K           ph
count  2200.000000  2200.000000  2200.000000  2200.000000
mean     50.551818    53.362727    48.149091     6.469480
std      36.917334    32.985883    50.647931     0.773938
min       0.000000     5.000000     5.000000     3.504752
25%      21.000000    28.000000    20.000000     5.971693
50%      37.000000    51.000000    32.000000     6.425045
75%      84.250000    68.000000    49.000000     6.923643
max     140.000000   145.000000   205.000000     9.935091
N       0
P       0
K       0
ph      0
crop    0
dtype: int64
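
The summary statistics above show the four measurements on quite different scales: `K` reaches 205 while `ph` stays between roughly 3.5 and 9.9. As a minimal sketch of what standardization would do to such columns, the block below applies scikit-learn's `StandardScaler` to the five rows printed by `crops.head()`. The array name `head_rows` is illustrative, and in a real workflow the scaler would normally be fitted on the training split only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# The five rows shown by crops.head(); columns are N, P, K, ph
head_rows = np.array([
    [90, 42, 43, 6.502985],
    [85, 58, 41, 7.038096],
    [60, 55, 44, 7.840207],
    [74, 35, 40, 6.980401],
    [78, 42, 42, 7.628473],
])

# Fit on these rows and rescale each column to zero mean and unit variance
scaler = StandardScaler()
scaled_rows = scaler.fit_transform(head_rows)

print(scaled_rows.mean(axis=0).round(6))  # per-column means, approximately 0
print(scaled_rows.std(axis=0).round(6))   # per-column standard deviations, approximately 1
```

After the transform every column is expressed in standard deviations from its own mean, so a solver no longer sees `K` as numerically larger than `ph`.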


In [None]:
from sklearn.preprocessing import StandardScaler

# Note: the scaler is instantiated but never applied; the scores below come from unscaled features
scaler = StandardScaler()

# Separate the soil measurements (features) from the crop label (target)
X = crops.drop('crop', axis=1)
y = crops['crop'].values

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2)
# Dictionary to store performance results
features_dict = {}
for feature in ["N", "P", "K", "ph"]:
    # multi_class="multinomial" is deprecated since scikit-learn 1.5; the default lbfgs solver already fits a multinomial model
    logreg = LogisticRegression()
    logreg.fit(X_train[[feature]], y_train)  # Fit with one feature
    y_pred = logreg.predict(X_test[[feature]])  # Predict with the same feature
    # F1-score (weighted)
    f1 = metrics.f1_score(y_test, y_pred, average='weighted')
    # Balanced accuracy
    balanced_accuracy = metrics.balanced_accuracy_score(y_test, y_pred)
    # Store the results in the dictionary
    features_dict[feature] = {'F1-score': f1, 'Balanced Accuracy': balanced_accuracy}
    # Print the results for each feature
    print(f"Feature: {feature}")
    print(f"F1-score: {f1}")
    print(f"Balanced Accuracy: {balanced_accuracy}")
    print("-" * 30)



Feature: N
F1-score: 0.09149868209906838
Balanced Accuracy: 0.1476702830081481
------------------------------
Feature: P
F1-score: 0.14761942909728204
Balanced Accuracy: 0.21874937920777934
------------------------------
Feature: K
F1-score: 0.23896974566001802
Balanced Accuracy: 0.30589580340250316
------------------------------
Feature: ph
F1-score: 0.04532731061152114
Balanced Accuracy: 0.11066529867543097
------------------------------
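
The two scores printed above weigh classes differently: the weighted F1-score averages per-class F1 in proportion to each class's share of the test rows, while balanced accuracy is the unweighted mean of per-class recall. A from-scratch sketch of both definitions follows. The toy labels are invented for illustration, the helper names are hypothetical, and scikit-learn's own functions handle edge cases that this sketch ignores.

```python
def per_class_stats(y_true, y_pred):
    """Return {label: (recall, f1, support)} for every label present in y_true."""
    stats = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        stats[label] = (recall, f1, tp + fn)
    return stats


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    stats = per_class_stats(y_true, y_pred)
    return sum(f1 * support for _, f1, support in stats.values()) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    stats = per_class_stats(y_true, y_pred)
    return sum(recall for recall, _, _ in stats.values()) / len(stats)


# Invented toy labels: rice is over-represented and over-predicted
y_true = ["rice", "rice", "rice", "rice", "maize", "maize", "coffee", "coffee"]
y_pred = ["rice", "rice", "rice", "maize", "maize", "rice", "coffee", "rice"]

f1_w = weighted_f1(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"Weighted F1: {f1_w:.4f}")
print(f"Balanced accuracy: {bal_acc:.4f}")
```

Because balanced accuracy gives every crop the same weight regardless of how many test rows it has, it can diverge from the weighted F1-score when a model favors a few classes.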


In [None]:
# K has the highest weighted F1-score of the four single-feature models
best_predictive_feature = {'K': 0.23896974566001802}
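
Rather than typing the winner by hand, the same dictionary could be derived from the scores. The sketch below copies the F1-scores from the output above into a plain dictionary (`f1_scores`, an illustrative name) and picks the maximum; inside the notebook one would presumably read them from `features_dict` instead.

```python
# Weighted F1-scores copied from the per-feature output above
f1_scores = {
    "N": 0.09149868209906838,
    "P": 0.14761942909728204,
    "K": 0.23896974566001802,
    "ph": 0.04532731061152114,
}

# Pick the feature whose single-feature model scored highest
best_feature = max(f1_scores, key=f1_scores.get)
best_predictive_feature = {best_feature: f1_scores[best_feature]}
print(best_predictive_feature)
```

Deriving the result this way keeps it in sync with the scores if the split or the model settings change.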