<a href="https://www.kaggle.com/code/dalloliogm/autogluon-approach-to-fertilizer-prediction?scriptVersionId=243022174" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# AutoGluon approach

AutoGluon is an AutoML toolkit from Amazon. It has performed well in previous Playground competitions, so let's try it on this one.

In [23]:
# Go to Add-ons > Install Dependencies to install this into the environment
!pip install -q autogluon

## Parameters

In [24]:
import os
import pandas as pd
import numpy as np

def is_interactive_session():
    """Return True when the notebook runs in a Kaggle interactive session."""
    return os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '') == 'Interactive'

config = {
    "autogluon_time": 3600,  # Training time limit in seconds
    "autogluon_presets": "best_quality",
    # "reduce_features": 0,  # Set to >0 to use only the first n features
    "tail_rows": 0,  # Set to >0 to use only the last n rows in the file (not applied in the cells below)
}

if is_interactive_session():
    print("Interactive session")
    config["autogluon_time"] = 100
    #config["reduce_features"] = 200
    config["autogluon_presets"] = "medium_quality"
    config["tail_rows"] = 2000
    print(config)
else:
    print("running as job")
    print(config)

Interactive session
{'autogluon_time': 100, 'autogluon_presets': 'medium_quality', 'tail_rows': 2000}


## Read data

In [25]:
import pandas as pd

train = pd.read_csv('/kaggle/input/playground-series-s5e6/train.csv')
test = pd.read_csv('/kaggle/input/playground-series-s5e6/test.csv')
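
The `tail_rows` setting is defined in the config but never applied to the data (the training log reports all 750000 rows even in the interactive session). A minimal sketch of how it could be applied follows; `apply_tail_rows` is a hypothetical helper, not part of the original notebook, and the toy frame stands in for `train`.

```python
import pandas as pd

def apply_tail_rows(df, tail_rows):
    """Return only the last `tail_rows` rows when tail_rows > 0, else df unchanged.

    Hypothetical helper for illustration; the notebook does not define it.
    """
    if tail_rows and tail_rows > 0:
        return df.tail(tail_rows).reset_index(drop=True)
    return df

# Toy frame standing in for the real training data
demo = pd.DataFrame({"id": range(10), "Nitrogen": range(10, 20)})
subset = apply_tail_rows(demo, 3)   # keeps ids 7, 8, 9
full = apply_tail_rows(demo, 0)     # 0 means "use everything"
print(len(subset), len(full))
```

In the notebook this would presumably be called as `train = apply_tail_rows(train, config["tail_rows"])` right after reading the CSV, so that interactive runs train on a small sample.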


In [26]:
train.head()

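|   | id | Temparature | Humidity | Moisture | Soil Type | Crop Type | Nitrogen | Potassium | Phosphorous | Fertilizer Name |
|---|----|-------------|----------|----------|-----------|-----------|----------|-----------|-------------|-----------------|
| 0 | 0 | 37 | 70 | 36 | Clayey | Sugarcane | 36 | 4 | 5 | 28-28 |
| 1 | 1 | 27 | 69 | 65 | Sandy | Millets | 30 | 6 | 18 | 28-28 |
| 2 | 2 | 29 | 63 | 32 | Sandy | Millets | 24 | 12 | 16 | 17-17-17 |
| 3 | 3 | 35 | 62 | 54 | Sandy | Barley | 39 | 12 | 4 | 10-26-26 |
| 4 | 4 | 35 | 58 | 43 | Red | Paddy | 37 | 2 | 16 | DAP |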


In [27]:
from autogluon.tabular import TabularPredictor


## Some Feature Engineering

### Nutrient Ratios

Fertilizers are often chosen based on nutrient balance, not just raw amounts.



In [35]:
# Pairwise nutrient ratios; the +1 in the denominator avoids division by zero
train['N_P_ratio'] = train['Nitrogen'] / (train['Phosphorous'] + 1)
train['N_K_ratio'] = train['Nitrogen'] / (train['Potassium'] + 1)
train['P_K_ratio'] = train['Phosphorous'] / (train['Potassium'] + 1)
# Same features on the test set so both frames share the same columns
test['N_P_ratio'] = test['Nitrogen'] / (test['Phosphorous'] + 1)
test['N_K_ratio'] = test['Nitrogen'] / (test['Potassium'] + 1)
test['P_K_ratio'] = test['Phosphorous'] / (test['Potassium'] + 1)
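
Because the same three ratios are computed for both `train` and `test`, the logic could be wrapped in one function so the two frames cannot drift apart. This is only a sketch: `add_nutrient_ratios` is a hypothetical name, and the toy frame uses the dataset's column names with made-up rows.

```python
import pandas as pd

def add_nutrient_ratios(df):
    """Return a copy of df with N/P, N/K, and P/K ratio columns added.

    Hypothetical helper; mirrors the ratio formulas used in this notebook.
    """
    out = df.copy()
    # +1 in each denominator avoids division by zero when a nutrient is 0
    out["N_P_ratio"] = out["Nitrogen"] / (out["Phosphorous"] + 1)
    out["N_K_ratio"] = out["Nitrogen"] / (out["Potassium"] + 1)
    out["P_K_ratio"] = out["Phosphorous"] / (out["Potassium"] + 1)
    return out

# Toy rows; the second row has Potassium == 0 to exercise the +1 guard
demo = pd.DataFrame({"Nitrogen": [36, 30], "Potassium": [4, 0], "Phosphorous": [5, 18]})
demo = add_nutrient_ratios(demo)
print(demo[["N_P_ratio", "N_K_ratio", "P_K_ratio"]])
```

With such a helper, the cell reduces to `train = add_nutrient_ratios(train)` and `test = add_nutrient_ratios(test)`.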


### Weather and Soil-Crop Interactions

In [36]:
train['Temp_Humidity'] = train['Temparature'] * train['Humidity']
train['Soil_Crop'] = train['Soil Type'] + '_' + train['Crop Type']
test['Temp_Humidity'] = test['Temparature'] * test['Humidity']
test['Soil_Crop'] = test['Soil Type'] + '_' + test['Crop Type']


### Temperature binning

In [37]:
# Bin temperature into coarse ranges (the dataset spells the column 'Temparature')
temp_bins = [0, 20, 30, 40, 60]
temp_labels = ['Low', 'Med', 'High', 'Very High']

train['Temp_bin'] = pd.cut(train['Temparature'], bins=temp_bins, labels=temp_labels)
# Combine with crop
train['Crop_Temp_bin'] = train['Crop Type'] + '_' + train['Temp_bin'].astype(str)

test['Temp_bin'] = pd.cut(test['Temparature'], bins=temp_bins, labels=temp_labels)
# Combine with crop
test['Crop_Temp_bin'] = test['Crop Type'] + '_' + test['Temp_bin'].astype(str)
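
It may help to see how `pd.cut` treats the bin edges used above. By default the intervals are right-inclusive, so the bins are (0, 20], (20, 30], (30, 40], and (40, 60], and any value outside that range becomes missing. The temperatures below are made-up values chosen to hit the edges.

```python
import pandas as pd

bins = [0, 20, 30, 40, 60]
labels = ["Low", "Med", "High", "Very High"]

# Illustrative temperatures: two on a bin edge, one inside, one out of range
temps = pd.Series([20, 21, 30, 61])
binned = pd.cut(temps, bins=bins, labels=labels)

# 20 falls in (0, 20], 21 and 30 fall in (20, 30], 61 is outside every bin
print(binned.tolist())
```

Since out-of-range values become missing, it is worth checking `train['Temp_bin'].isna().sum()` before combining the bin with the crop type.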


## Train Predictor

AutoGluon tries several model types and hyperparameter settings within the given time limit. To start training, just call .fit().

In [31]:
label = 'Fertilizer Name'
predictor = TabularPredictor(label=label, eval_metric="log_loss").fit(
    train,
    presets=config["autogluon_presets"],
    time_limit=config["autogluon_time"],
)


No path specified. Models will be saved in: "AutogluonModels/ag-20250601_074235"
Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.3.1
Python Version:     3.11.11
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP PREEMPT_DYNAMIC Sun Nov 10 10:07:59 UTC 2024
CPU Count:          4
Memory Avail:       28.97 GB / 31.35 GB (92.4%)
Disk Space Avail:   19.26 GB / 19.52 GB (98.7%)
Presets specified: ['medium_quality']
Beginning AutoGluon training ... Time limit = 100s
AutoGluon will save models to "/kaggle/working/AutogluonModels/ag-20250601_074235"
Train Data Rows:    750000
Train Data Columns: 16
Label Column:       Fertilizer Name
AutoGluon infers your prediction problem is: 'multiclass' (because dtype of label-column == object).
	7 unique label values:  ['28-28', '17-17-17', '10-26-26', 'DAP', '20-20', '14-35-14', 'Urea']
	If 'multiclass' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may spec

In [38]:
probs = predictor.predict_proba(test)  
top3_preds = probs.apply(lambda row: row.nlargest(3).index.tolist(), axis=1)


In [41]:
# from sklearn.metrics import label_ranking_average_precision_score
# import numpy as np

# # Ground truth as binary indicator matrix
# y_true = pd.get_dummies(test['Fertilizer Name']).values
# y_score = probs[test.columns[1:]]  # drop 'id'

# map3 = label_ranking_average_precision_score(y_true, y_score)
# print(f'MAP@3: {map3:.4f}')


In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

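# Get leaderboard with scores
lb = predictor.leaderboard(silent=True)

# Keep only models that have a validation score
lb = lb[~lb['score_val'].isna()]

# Plot. The predictor was trained with eval_metric="log_loss", so score_val is
# the negative log loss (AutoGluon reports scores as higher-is-better), not MAP@3.
plt.figure(figsize=(10, 6))
sns.barplot(data=lb, x='score_val', y='model', palette='viridis')
plt.xlabel('Validation score (negative log loss, higher is better)')
plt.ylabel('Model')
plt.title('Validation Scores for AutoGluon Models')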
plt.tight_layout()
plt.show()


In [44]:
# def mapk(y_true, y_pred, k=3):
#     def apk(actual, predicted, k):
#         predicted = predicted[:k]
#         score = 0.0
#         num_hits = 0.0
#         for i, p in enumerate(predicted):
#             if p == actual and p not in predicted[:i]:
#                 num_hits += 1.0
#                 score += num_hits / (i + 1.0)
#                 break  # Only the first correct label counts
#         return score

#     return np.mean([apk(a, p, k) for a, p in zip(y_true, y_pred)])

# # Use it:
# true_labels = val["Fertilizer Name"].values
# map3_score = mapk(true_labels, top3_preds, k=3)
# print(f"Strict MAP@3: {map3_score:.4f}")
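
The commented-out cells above aim at MAP@3 but depend on names that do not exist in this notebook (`val`, and a label column on `test`). As a rough sketch of the metric for the single-label case, the function below scores each row as 1/rank when the true label is among the top 3 predictions and 0 otherwise, then averages over rows. `map_at_k` is a hypothetical helper and the probabilities are made up; in practice `probs` would come from `predictor.predict_proba` on a held-out split with known labels.

```python
import numpy as np
import pandas as pd

def map_at_k(y_true, probs, k=3):
    """Mean average precision at k when each row has exactly one true label.

    probs: DataFrame of class probabilities, one column per class label.
    Hypothetical helper for illustration.
    """
    # Column positions of the k largest probabilities per row, best first
    order = np.argsort(-probs.to_numpy(), axis=1)[:, :k]
    top_labels = probs.columns.to_numpy()[order]
    # hits[i, j] is True when the (j+1)-th guess for row i is the true label
    hits = top_labels == np.asarray(y_true)[:, None]
    # A hit at rank j+1 contributes 1 / (j+1); at most one hit per row
    weights = 1.0 / np.arange(1, k + 1)
    return float((hits * weights).sum(axis=1).mean())

# Made-up probabilities for three rows and four classes
demo_probs = pd.DataFrame(
    [[0.60, 0.25, 0.10, 0.05],   # true label ranked 1st -> 1.0
     [0.10, 0.50, 0.30, 0.10],   # true label ranked 2nd -> 0.5
     [0.40, 0.30, 0.20, 0.10]],  # true label not in top 3 -> 0.0
    columns=["Urea", "DAP", "28-28", "20-20"],
)
demo_true = ["Urea", "28-28", "20-20"]
score = map_at_k(demo_true, demo_probs, k=3)
print(f"MAP@3: {score:.4f}")
```

To use this on real data, one option is to hold out part of `train` before calling .fit() and pass the held-out labels and predicted probabilities to the function.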


In [None]:
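# Predict class probabilities for every test row
probs = predictor.predict_proba(test)
# Join the three most likely labels into one space-separated string per row
top3 = probs.apply(lambda x: ' '.join(x.nlargest(3).index), axis=1)
submission = pd.DataFrame({'id': test['id'], 'Fertilizer Name': top3})
submission.to_csv('submission.csv', index=False)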
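
Row-wise `apply` with `nlargest` can be slow on a large test set. A possible vectorized alternative, sketched under the assumption that `probs` has one column per class label, sorts each row with NumPy and joins the top three column names. `top_k_labels` is a hypothetical helper and the probabilities are made up.

```python
import numpy as np
import pandas as pd

def top_k_labels(probs, k=3):
    """Return a Series of space-separated top-k class labels per row.

    Hypothetical helper; assumes probs has one column per class label.
    """
    # Column positions of the k largest probabilities per row, best first
    order = np.argsort(-probs.to_numpy(), axis=1)[:, :k]
    labels = probs.columns.to_numpy()[order]
    return pd.Series([" ".join(row) for row in labels], index=probs.index)

# Made-up probabilities for two rows and four classes (no ties)
demo_probs = pd.DataFrame(
    [[0.15, 0.50, 0.30, 0.05],
     [0.60, 0.10, 0.05, 0.25]],
    columns=["Urea", "DAP", "28-28", "20-20"],
)
demo_top3 = top_k_labels(demo_probs)
print(demo_top3.tolist())
```

When two classes have exactly equal probability, the order between them may differ from what `nlargest` returns, so the two approaches are only guaranteed to agree when there are no ties.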
